This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Support for Symbolizing Thread-Local Data in .tbss? #246

Open
scottconstable opened this issue Oct 27, 2021 · 7 comments

Comments

@scottconstable

I'm trying to write a pass that inserts some instrumentation code, and which will store statistics in TLS (for example, .tbss). But I cannot figure out how to find MCSymbols for thread-local data. When I write:

for (auto &Symbol : BC.GlobalSymbols)
    outs() << Symbol.first() << '\n';

I see all of the symbols in .data and .bss, but I can't see any data that a given binary exposes in .tbss. I also do not see any members or methods on BinaryContext that reference TLS.

Is this supported? If not, are there plans to support it? Is there a workaround?

@yota9
Contributor

yota9 commented Oct 27, 2021

Hello @scottconstable!
The TLS sections are ignored: they are treated as non-allocatable by the BinarySection isAllocatable() check during discoverFileObjects. Also be aware that there are some later checks, such as the one for address == 0, and it is OK for a TLS symbol to have address 0, since the address is basically an offset from the TLS phdr. There is currently no reason to import these symbols, since BOLT doesn't change the .tbss/.tdata sections. To add something to them you would need to expand them, which also means patching the ELF sections and the phdr. That is not too easy to do.

@scottconstable
Author

Hi @yota9!
Thank you for the prompt response. I was considering either (a) inserting the new TLS symbol with BOLT, or (b) defining the new TLS symbol in the target program's source code. I would have preferred (a), but given your answer it sounds like I should settle for (b).

So suppose I have a program like this:

#include <stdint.h>
#include <stdio.h>

__thread uint64_t basic_block_count = 0;
int main() {
    printf("Hello, world!\n");
}

Ideally I would like to be able to compile a hello binary and then have BOLT load the binary, find the basic_block_count symbol, and insert an INC instruction at the beginning of each basic block to increment basic_block_count. I think I can do this with a lot of effort by scanning .dynsym and .dynstr to find basic_block_count's offset within .tbss and then manually constructing the FS-relative offset to form the memory address. But, is there an easier way to do this with BOLT APIs?

@yota9
Contributor

yota9 commented Oct 27, 2021

The second approach is easier, since you already have the symbol and the reserved space, that's true. After a few changes in discoverFileObjects you will be able to look up your symbol by name in your pass and insert target-specific code into each BB to access the TLS variable and increment it. There is no ready-made functionality for this, but it doesn't sound like a very difficult task.

@scottconstable
Author

@yota9 your analysis above was perfect. I added just 2 LoC into discoverFileObjects:

  1. I changed the isSymbolInMemory functor to return true if the symbol belongs to TLS.
  2. I changed the null-address check so that it is only applied to non-TLS symbols.
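As a rough illustration only, the two changes can be modeled outside of BOLT like this. `SymbolInfo` and `shouldImport` are hypothetical names made up for the sketch, and the real `isSymbolInMemory` functor in discoverFileObjects works on ELF symbol and section objects rather than on a plain struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the properties consulted while filtering ELF
// symbols. This is not a BOLT type.
struct SymbolInfo {
  uint64_t Address;          // st_value; for TLS, an offset into the TLS segment
  bool InAllocatableSection; // section passes the allocatable check
  bool IsTLS;                // symbol type is STT_TLS
};

// Change 1: treat TLS symbols as "in memory" even though their section is
// not considered allocatable.
inline bool isSymbolInMemory(const SymbolInfo &S) {
  return S.InAllocatableSection || S.IsTLS;
}

// Change 2: apply the null-address rejection to non-TLS symbols only,
// because a TLS symbol at offset 0 is valid.
inline bool shouldImport(const SymbolInfo &S) {
  if (!isSymbolInMemory(S))
    return false;
  if (!S.IsTLS && S.Address == 0)
    return false;
  return true;
}
```

With this model, a TLS symbol at offset 0 is kept, while a non-TLS symbol at address 0 is still rejected.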

Now my pass can see the TLS symbols in the BinaryContext symbol table. However, when I create an MCInst that references one of these TLS symbols, BOLT emits a RIP-relative memory access, whereas I would have expected something like:

incq %fs:FFFFFFFFFFFFFFF8

Here is my pass code:

  auto &MIB = *BC.MIB;
  auto *BasicBlockCounterSymbol = BC.Ctx->lookupSymbol("__basic_blocks");
  assert(BasicBlockCounterSymbol && "Could not find symbol");
  MCInst IncInst;
  MIB.createIncMemory(IncInst, BasicBlockCounterSymbol, BC.Ctx.get());
  for (auto &It : BC.getBinaryFunctions()) {
    BinaryFunction &Function = It.second;
    for (BinaryBasicBlock &BB : Function) {
      BB.insertInstruction(BB.begin(), IncInst);
    }
  }

Do you have any suggestions?

@yota9
Contributor

yota9 commented Oct 28, 2021

@scottconstable This is what I was talking about: you will need to create your own target-specific code. createIncMemory doesn't know that this is a TLS symbol. I'm not sure about x86, but I assume it is similar to ARM: you need to read the thread register and, knowing the data structure it points to, you can reach the variable using the offset (address) of the TLS variable. Since that offset won't change, you can use getAddress() as the offset. But createIncMemory doesn't access the thread register; it just increments the value at a memory address, which is not right in this case. You will need to write your own helper, something like createIncTlsMemory, to do it.
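For x86-64 specifically, a minimal sketch of the offset arithmetic follows. It assumes the variable lives in the main executable (local-exec model), that the symbol's address is its offset from the start of the TLS segment, and that the PT_TLS `p_vaddr` is a multiple of `p_align`. On x86-64 the thread pointer in `%fs` points just past the executable's TLS block, so the resulting offsets are negative. `tlsFsOffset` and `alignUp` are hypothetical helper names, not BOLT APIs.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (a power of two).
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// FS-relative offset of a TLS variable defined in the main executable on
// x86-64. Assumes p_vaddr of PT_TLS is a multiple of p_align.
//   SymValue : symbol value, i.e. offset from the start of the TLS segment
//   TlsMemSz : p_memsz of the PT_TLS program header
//   TlsAlign : p_align of the PT_TLS program header
inline int64_t tlsFsOffset(uint64_t SymValue, uint64_t TlsMemSz,
                           uint64_t TlsAlign) {
  return static_cast<int64_t>(SymValue) -
         static_cast<int64_t>(alignUp(TlsMemSz, TlsAlign));
}
```

For a binary whose only TLS data is one 8-byte variable at offset 0, `tlsFsOffset(0, 8, 8)` gives -8, which is 0xFFFFFFFFFFFFFFF8 as an unsigned 64-bit value. Variables in shared libraries use different TLS models and are not covered by this sketch.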

@scottconstable
Author

Thanks @yota9. I am now able to get the TLS offset in my pass by doing the following:

  const BinaryData *CounterBD = BC.getBinaryDataByName("__basic_blocks");
  uint64_t CounterOffset = CounterBD->getAddress();

I then use LLVM's existing MC infrastructure to build instructions with TLS-relative memory operands.
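For checking the emitted code in a disassembler, it can help to know what the target instruction looks like at the byte level. The sketch below hand-encodes `incq %fs:disp32` with no base or index register. It does not use LLVM's MC layer and is not what the pass itself does; `encodeIncqFs` is a hypothetical helper, and the displacement is assumed to be the already-computed FS-relative offset of the counter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode `incq %fs:Disp` using absolute disp32 addressing (no base, no index).
inline std::array<uint8_t, 9> encodeIncqFs(int32_t Disp) {
  const uint32_t D = static_cast<uint32_t>(Disp);
  return {
      0x64, // FS segment-override prefix
      0x48, // REX.W: 64-bit operand size
      0xFF, // opcode group 5; ModRM.reg = /0 selects INC
      0x04, // ModRM: mod=00, reg=000, rm=100 (SIB byte follows)
      0x25, // SIB: no index, base=101 with mod=00 means disp32 only
      // disp32, little-endian
      static_cast<uint8_t>(D), static_cast<uint8_t>(D >> 8),
      static_cast<uint8_t>(D >> 16), static_cast<uint8_t>(D >> 24)};
}
```

For an offset of -8 this yields `64 48 ff 04 25 f8 ff ff ff`. In MC terms, the equivalent memory operand has FS as the segment register, no base or index register, and the offset as the displacement.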

I have one more question. Sometimes the __basic_blocks symbol is imported by BOLT (or maybe by LLVM) with a numeric suffix, for example __basic_blocks/1. I can't figure out why this happens on some binaries but not others. I tried BinaryContext::postProcessSymbolTable() but this made no difference. Is there a reliable way to always find a unique symbol?

@maksfb
Contributor

maksfb commented Nov 2, 2021

/<N> is appended to all local symbol names to differentiate between multiple local symbols with the same name. Check the attributes of the symbol the compiler/linker is generating.
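If the symbol has to stay local, one possible workaround is to match the name with or without the numeric suffix while iterating over the known symbol names. The helper below is a hypothetical sketch, not a BOLT API, and it assumes the suffix always has the form `/` followed by decimal digits.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Returns true if Name is exactly Base, or Base followed by "/<digits>".
inline bool matchesLocalSymbol(std::string_view Name, std::string_view Base) {
  if (Name == Base)
    return true;
  // Need room for Base, the '/', and at least one digit.
  if (Name.size() <= Base.size() + 1)
    return false;
  if (Name.substr(0, Base.size()) != Base || Name[Base.size()] != '/')
    return false;
  for (char C : Name.substr(Base.size() + 1))
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
  return true;
}
```

If several local symbols share the base name, more than one entry will match, so the caller still has to decide which one is the intended counter.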
