[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]` for `ld -z now` #202
Comments
Should we create a similar bug for binutils?
I think some binutils aarch64 maintainers watch this repository. This issue tracker may be the best place to discuss the possible change. The owners of this repository have a better idea which binutils aarch64 maintainers should be tagged :)
this prevents plt hooking at runtime (not just lazy binding). e.g. glibc ld.so supports LD_PROFILE and LD_AUDIT even with bind-now (then plt got is not readonly). but i think there are external tools that hook plts like ltrace (although i think it only places breakpoints on the plt, but this means it has to know the plt layout). so this is not an obvious change (might need additional elf marking).
From the perspective of just the sysvabi64 document I'd like to write down what the minimal requirements are for the PLT sequences. I think that the two extreme approaches are:
I think our approach so far has tended towards the maximalist as we've provided some dynamic tags such as I personally tend towards the minimalist approach in the specifications to provide more freedom for implementers who may choose different trade-offs. I'm wondering if there is a set of properties that we represent with a combination of dynamic tags so that we don't have to keep introducing new ones. For example:
I think For this particular case we have documented the calling convention for As an aside there are other possibilities for the PLT entries that could help:
A larger issue is that PLTs have become less efficient with the added BTI, which makes them 20 bytes, so they span multiple fetch blocks. In principle we don't need BTI in PLTs. To reduce PLT uses we could always create function addresses via a GOT load. Canonical PLTs still need a BTI. An extra thunk containing just a BTI would add significant overhead, so the thunk needs to load the address from the GOT and branch (making them non-lazy). It would be feasible to remove the ADD x16 by slightly changing the PLT: the default (unlinked) GOT entry could point to a branch associated with the PLT entry, which then branches to PLT[0]. x16 would then contain a unique address relating to the PLT entry. The branch can be at the beginning or end of the PLT sequence (e.g. ADRP/LDR/BR/B) or placed after all PLTs (which would allow using BTI for canonical PLTs).
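The size argument in the comment above can be checked with a little arithmetic. The following is only a rough sketch: `plt_entry_size` and `blocks_touched` are hypothetical helpers, the 16-byte fetch block is an assumed value (real cores differ), and the model ignores any padding a linker might add to an entry.

```python
INSN_BYTES = 4  # every A64 instruction is 4 bytes wide


def plt_entry_size(bti: bool, keep_add: bool) -> int:
    """Size in bytes of one PLT entry under this simplified model."""
    insns = ["adrp", "ldr", "br"]
    if keep_add:
        insns.insert(2, "add")   # add x16, x16, :lo12: &.got.plt[N]
    if bti:
        insns.insert(0, "bti c")  # landing pad for indirect branches
    return len(insns) * INSN_BYTES


def blocks_touched(start: int, size: int, block: int = 16) -> int:
    """Number of aligned fetch blocks covered by [start, start + size).

    The default block size of 16 bytes is an assumption for illustration.
    """
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1


if __name__ == "__main__":
    for bti in (False, True):
        for keep_add in (True, False):
            size = plt_entry_size(bti, keep_add)
            print(f"bti={bti} add={keep_add}: {size} bytes, "
                  f"{blocks_touched(0, size)} block(s) when block-aligned")
```

Under these assumptions, a BTI entry that keeps the `add` is 20 bytes and always straddles two 16-byte blocks, while dropping the `add` brings it back to 16 bytes.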
The `sysvabi64/sysvabi64.rst` document contains sample PLT sequences where x16 holds the address of the `.got.plt` entry. Some rtld implementations with lazy PLT resolver support, such as glibc and FreeBSD rtld, use x16 to compute the PLT entry index. However, for `ld -z now`, there is no need to compute the PLT entry index, so the `add x16, x16, ...` instruction can be removed, although the performance effect is likely negligible.
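To make the role of x16 concrete, here is a minimal sketch of how a lazy resolver could recover the PLT entry index from the `.got.plt` slot address left in x16 by the `adrp`/`ldr`/`add`/`br` sequence. `plt_index_from_x16` is a hypothetical helper, not an rtld API, and the sketch assumes 8-byte slots with the first three `.got.plt` slots reserved for the dynamic loader. Real resolvers do this in assembly and may scale the result differently (for example into a relocation offset).

```python
GOT_ENTRY_SIZE = 8          # bytes per .got.plt slot (LP64)
RESERVED_GOT_PLT_SLOTS = 3  # assumption: slots 0..2 belong to the dynamic loader


def plt_index_from_x16(x16: int, got_plt_base: int) -> int:
    """Return the PLT entry index for the .got.plt slot address held in x16."""
    offset = x16 - got_plt_base
    if offset % GOT_ENTRY_SIZE != 0:
        raise ValueError("x16 does not point at a .got.plt slot boundary")
    slot = offset // GOT_ENTRY_SIZE
    if slot < RESERVED_GOT_PLT_SLOTS:
        raise ValueError("x16 points at a reserved .got.plt slot")
    return slot - RESERVED_GOT_PLT_SLOTS


if __name__ == "__main__":
    base = 0x20000  # made-up address of .got.plt[0]
    for n in range(3):
        x16 = base + (RESERVED_GOT_PLT_SLOTS + n) * GOT_ENTRY_SIZE
        print(hex(x16), "->", plt_index_from_x16(x16, base))
```

With `-z now` every slot is already resolved at load time, so nothing ever needs this computation, which is why the `add` that completes the address in x16 becomes dead weight.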
There are multiple variants of PLT due to BTI and PAC. Allowing an optional `add x16, x16, ...` will add another dimension and increase the number of PLT types, so there is some debate about whether we should allow this flexibility.

I created the issue on this topic, as @appujee mentioned that the instruction seems unneeded on Android. Also CC @enh-google
Note that changing the PLT sequences may affect certain programs, including disassemblers, profilers, and debuggers. However, in my experience, they are unlikely to check for the existence of `add x16, x16, ...`.

If we consider `add x16, x16, ...` unneeded for `-z now`, ideally GNU ld and ld.lld can make the change for parity.