This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Issues with optimizing a shared object #175

Open
kmod opened this issue Jun 18, 2021 · 7 comments

Comments

@kmod

kmod commented Jun 18, 2021

We've been using bolt successfully on our binary, but when we compile our program with -fPIC and link as a shared object and apply bolt to it, the result doesn't work correctly. I'm not exactly sure what's going on but the two things I've noticed are:

  • The behavior differs from the pre-bolt library: the bolted library fails an assertion that should never fail
  • Debugging with gdb seems to imply that debug info is broken somehow: at the point of the assertion failure, the backtrace has broken frames in it, and the source locations are wrong (the function names are potentially right but the source locations aren't for the right files).

I assume these are related and imply that we didn't get good output from bolt, but I can't be sure.

Is there anything different we should be doing for optimizing a shared object / PIC code?

Here's how we produced the files:

LD_PRELOAD=libpython3.8-pyston2.2d.so.1.0.prebolt perf record -e cycles:u -j any,u -o libpython3.8-pyston2.2d.so.1.0.perf -- ./python3 run_profile_task.py
perf2bolt -p libpython3.8-pyston2.2d.so.1.0.perf -o libpython3.8-pyston2.2d.so.1.0.fdata libpython3.8-pyston2.2d.so.1.0.prebolt
llvm-bolt libpython3.8-pyston2.2d.so.1.0.prebolt -o libpython3.8-pyston2.2d.so.1.0 -data=pyston/build/cpython_dbgshared_install/usr/lib/libpython3.8-pyston2.2d.so.1.0.fdata -update-debug-sections -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -frame-opt=hot -use-gnu-stack
./python3 -c 'print("success")'

# This works:
LD_PRELOAD=libpython3.8-pyston2.2d.so.1.0.prebolt ./python3 -c 'print("success")'

Here are the files, let me know if there's any other info that I could provide that would be helpful.

@maksfb
Contributor

maksfb commented Jun 18, 2021

Thanks for reporting the issue.

The only thing known not to work with .so's is -split-eh, but that should be turned off automatically and you will see a warning. -inline-all can mess up debug info, but likely only to a limited extent. I would start by disabling all optimizations except code ordering and check whether the binary works. I would check it myself, but I need to set up a virtual machine first.

@kmod
Author

kmod commented Jun 18, 2021

Oh good idea, I removed all the command line flags:

$ llvm-bolt libpython3.8-pyston2.2d.so.1.0.prebolt -o libpython3.8-pyston2.2d.so.1.0
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 0c14e20238604a4c05e174e71676857d45c60a0f
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _PyEval_EvalFrameDefault
BOLT-INFO: 0 out of 7274 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 831 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: SCTC: patched 2 tail calls (2 forward) tail calls (0 backward) from a total of 2 while removing 0 double jumps and removing 2 basic blocks totalling 10 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: patched build-id (flipped last bit)

And the result still crashes:

$ ./python3 -c '1'
python3: ../../../Objects/dictobject.c:883: lookdict_unicode_nodummy: Assertion `ix != DKIX_DUMMY' failed.

It also crashes if I still pass the profile file.

@maksfb
Contributor

maksfb commented Jun 18, 2021

Thanks for trying that. I will take a look.

@kmod
Author

kmod commented Jun 19, 2021

When I pass the --update-debug-sections flag and no other flags, the source locations are correct now, but there are still a couple bad frames in the gdb backtrace. I believe that one of the two functions in question is _PyEval_EvalFrameDefault, which was mentioned during the bolt run as being notable for having a PIC jump table, in case that's helpful.

@maksfb
Contributor

maksfb commented Jul 1, 2021

There is an issue with what looks like a computed goto in _PyEval_EvalFrameDefault. I suspect the effect is limited to just this function (interpreter loop?), so you can try to disable its optimization with -skip-funcs=_PyEval_EvalFrameDefault while I think of a proper solution.

@kmod
Author

kmod commented Jul 8, 2021

That didn't quite do it, but after skipping every function mentioned by a "BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function XXX" message, I got things working.

Just in case it's relevant, we compile _PyEval_EvalFrameDefault with -Os
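
The workaround described above (skipping every function named in a "PIC jump table was detected" message) could be scripted roughly as follows. This is only a sketch: the helper name pic_jump_table_funcs and the bolt.log file name are made up for illustration, and it assumes that -skip-funcs accepts a comma-separated list of function names.

```shell
#!/bin/sh
# Hypothetical helper: read a saved llvm-bolt log on stdin and print the
# names of all functions flagged as having PIC jump tables, de-duplicated
# and joined with commas.
pic_jump_table_funcs() {
    sed -n 's/^BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function \(.*\)$/\1/p' |
        LC_ALL=C sort -u |
        paste -sd, -
}

# Possible usage (bolt.log is an assumed file name; not executed here):
#   llvm-bolt libpython3.8-pyston2.2d.so.1.0.prebolt -o /dev/null > bolt.log 2>&1
#   skip=$(pic_jump_table_funcs < bolt.log)
#   llvm-bolt libpython3.8-pyston2.2d.so.1.0.prebolt \
#       -o libpython3.8-pyston2.2d.so.1.0 -skip-funcs="$skip"
```

The first llvm-bolt run only collects the log; the second one passes the extracted names back in. The sort uses the C locale so that the order of names does not depend on the user's environment.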

@maksfb
Contributor

maksfb commented Jul 8, 2021

That's good to know, although it's quite unexpected. You can also disable processing of functions with jump tables using the -jump-tables=none option.
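
For reference, a minimal sketch of what that invocation might look like with the file names from earlier in this thread. The variable names are illustrative only, and keeping -update-debug-sections is an assumption based on the earlier comments. The script just prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Input and output names are the ones used earlier in this thread.
in=libpython3.8-pyston2.2d.so.1.0.prebolt
out=libpython3.8-pyston2.2d.so.1.0

# -jump-tables=none asks BOLT to leave functions with jump tables alone;
# -update-debug-sections keeps debug info (assumed to be wanted here).
bolt_cmd="llvm-bolt $in -o $out -update-debug-sections -jump-tables=none"

# Dry run: print the command instead of executing it.
echo "$bolt_cmd"
```

Compared with listing individual functions in -skip-funcs, this avoids having to collect the function names from the log first.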
