Skip to content
This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Injected functions & AssignSections pass & PatchEntries pass #276

Open
ghost opened this issue Feb 9, 2022 · 3 comments
Open

Injected functions & AssignSections pass & PatchEntries pass #276

ghost opened this issue Feb 9, 2022 · 3 comments

Comments

@ghost
Copy link

ghost commented Feb 9, 2022

Hello there, I've been working on original code (which goes to .bolt.org.text section) modifications, in particular - replacing local function calls made via PLT with direct calls, but it's not that matters (yes, these things happen sometimes).

In my case, I had to create 1-instruction injected BinaryFunctions at the places I want to replace call target as far as we cannot modify original code functions body entirely. And I had to do it in RewriteInstance scope for some reason.
The solution was based on PatchEntries pass code for its simplicity.

And the issue I faced was: at every targeted call instruction offset in produced binary there were blocks of X equal replaced call instructions, where X is the total amount of injected functions BOLT made in process of PLT calls replacing, so the sample code:

  f76660:	34000fe0 	cbz	w0, f7685c <_Z19Online_UpdateModuleP16PROCESS_INSTANCE@@Base+0x2a4>
  f76664:	97ef626b 	bl	b4f010 <_Z28Online_IsNetworkAbortPendingv@plt>
  f76668:	b94206c8 	ldr	w8, [x22, #516]
  f7666c:	2a000108 	orr	w8, w8, w0
  f76670:	35000f68 	cbnz	w8, f7685c <_Z19Online_UpdateModuleP16PROCESS_INSTANCE@@Base+0x2a4>

became the following:

  f76660:	34000fe0 	cbz	w0, f7685c <_Z25Online_IsUserAbortPendingv@@Base+0x2a8>
  f76664:	94e29464 	bl	481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f76668:	94e29463 	bl	481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f7666c:	94e29462 	bl	481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f76670:	35000f68 	cbnz	w8, f7685c <_Z25Online_IsUserAbortPendingv@@Base+0x2a8>

in case of requested replacing of total 3 PLT calls (global binary scope). The left 2 places are the same - 3 copies of replaced calls (all with unique opcodes).

Here, the replaced call should be only at offset f76664.

The root of this issue is: BOLT has AssignSections pass, which assigns sections for the functions, Injected functions as well - they go to .text.injected section as, I believe, is made 'by design' of Injected functions variant.

Later, RewriteInstance::mapCodeSections() maps the functions to their sections (output is added manually):

Setting function SYM1.PLT.org.1/ ImageSize to section .text.injected, .text.injected size 12
Setting function SYM2.PLT.org.2/ ImageSize to section .text.injected, .text.injected size 12
Setting function SYM3.PLT.org.3/ ImageSize to section .text.injected, .text.injected size 12

And this is the point - .text.injected section size is computed of sum size of all injected functions that assigned to it.

But! This does not concern injected functions created by PatchEntries pass as it runs after AssignSections pass!

Setting function SYM1.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM1.org.0/ size 4
Setting function SYM2.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM2.org.0/ size 4
Setting function SYM3.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM3.org.0/ size 4

As I see, injected trampolines made by PatchEntries pass are assigned to local function sections, which are unique, and, thus, are resolved by size 4 (1 instruction).

To succeed my purposes, I had to add a special property to BinaryFunction class, and check for it in AssignSections::runOnFunctions() to avoid assigning .text.injected sections to those functions.

And finally, what I have to suggest here is to do one of the following:
a) Somehow document main idea of Injected BinaryFunction design, because from the point I see - they really used only in PatchEntries pass, but from the start they are meant to emit into separated binary section (in what case they are really not doing that);
b) somehow document the requirement of layout of every pass relative to the AssignSections pass. With FinalizeFunctions pass we have at least the comment about it should be running very last, same warning about ReorderFunctions pass etc.
c) Using PatchEntries pass without any comment about avoiding functions it creates being 'legacy-assigned' to separate injected section also looks like undocumented hack, because (at least) I had to spend not 1 hour inspecting the unique behaviour of these functions before I finally figured this out.

Consider this as a code documentation suggestion, its not looking like a bug, but its definitely some non-obvious usage of injected functions, which 'ought' to emit in one way, but 'could' be used the other.

@rafaelauler
Copy link
Contributor

Thanks for writing this, I appreciate your suggestions. I will take a look on how to translate this into better documentation for AssignSections and injected functions. By the way, we moved our repo to the upstream LLVM repo and this repo here is deprecated. We are tracking https://discourse.llvm.org/c/subprojects/bolt/68 for BOLT-related discussions. But I'm not sure if discourse is a good way to replace the github issues of facebookincubator repo. Any ideas, @maksfb ?

@maksfb
Copy link
Contributor

maksfb commented Feb 10, 2022

@alexmechanic, you can file new issues at https://github.com/llvm/llvm-project using BOLT tag. Did you look at PLTCall pass? It sounds similar to what you are trying to accomplish. Modifications are required for AArch64, though.

@ghost
Copy link
Author

ghost commented Feb 10, 2022

@maksfb thanks for the tip about issuing new topics, it’s been a while since I forked this project to improve.
Yes, I took a look at the PLTCall pass, but I have a different goal. Pass replaces PLT calls with GOT calls, and I need to fix-replace local calls made via PLT with direct calls.

  • I cannot really provide an example binary here, but I have a quite good idea when it can happen that binary has some calls for functions, which are local to the binary, but made via PLT: this can happen if initial binary No.1 (e.g. library) had exported functions which were used by some other binary No.2, of course, via PLT, but later they were merged into single object (e.g. using ar), so that calls from object No.2 that were non-local became local, but that tool did not take this fact into account and left those calls be. Obviously, this reduces overall performance of (now) useless PLT usage.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants