[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]]` for `ld -z now` #202

MaskRay · 2023-04-11T22:10:11Z

The sysvabi64/sysvabi64.rst document contains sample PLT sequences where x16 holds the address of the .got.plt entry. Some rtld implementations, such as glibc and FreeBSD rtld, with lazy PLT resolver support use x16 to compute the PLT entry index.

However, for ld -z now, there is no need to compute the PLT entry index, so the add x16, x16, ... instruction can be removed,
although the performance effect is likely negligible.
There are multiple variants of PLT due to BTI and PAC. Allowing optional add x16, x16, ... will add another dimension and increase the number of PLT types, so there is some debate about whether we should allow this flexibility.
I created the issue on this topic, as @appujee mentioned that the instruction seems unneeded on Android. Also CC @enh-google

Note that changing the PLT sequences may affect certain programs, including disassemblers, profilers, and debuggers. However, in my experience, they are unlikely to check the existence of add x16, x16, ....
If we consider add x16, x16, ... unneeded for -z now, ideally GNU ld and ld.lld can make the change for parity.

The text was updated successfully, but these errors were encountered:

appujee · 2023-04-11T22:15:52Z

Should we create a similar bug for binutils ?

MaskRay · 2023-04-11T22:18:22Z

I think some binutils aarch64 maintainers watch this repository. This issue tracker may be the best place to discuss the possible change. The owners of this repository have a better idea which binutils aarch64 maintainers should be tagged:)

nsz-arm · 2023-04-12T08:45:12Z

this prevents plt hooking at runtime (not just lazy binding). e.g. glibc ld.so supports LD_PROFILE and LD_AUDIT even with bind-now (then plt got is not readonly). but i think there are external tools that hook plts like ltrace (although i think it only places breakpoints on the plt, but this means it has to know the plt layout). so this is not an obvious change (might need additional elf marking).

smithp35 · 2023-04-12T09:01:17Z

From the perspective of just the sysvabi64 document I'd like to write down what the minimal requirements are for the PLT sequences. I think that the two extreme approaches are:

A minimalist document the calling conventions for PLT[0] and PLT[N]. No requirements on the instruction sequences, or requirements for uniform size of PLT[N]. This would maximise the freedom for linker implementers, but make it harder for disassembly/debugging tools as they would need to handle more possible implementations.
A maximalist documentation of the calling conventions and at least some of the heuristics used by disassemblers/debuggers to recognise PLT entries. This has the opposite properties of the first.

I think our approach so far has tended towards the maximalist as we've provided some dynamic tags such as DT_AARCH64_BTI_PLT and DT_AARCH64_PAC_PLT https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#642dynamic-section

I personally tend towards the minimalist approach in the specifications to provide more freedom for implementers who may choose different trade-offs.

I'm wondering if there is a set of properties that we represent with a combination of dynamic tags so that we don't have to keep introducing new ones. For example:

DT_AARCH64_PLT_SIZE = <size of each PLT entry, with a special value for variable sized entries>
DT_AARCH64_NOLAZY =
DT_AARCH64_PAC_PLT = <PLT entry expects .got.plt entries to be signed>
DT_AARCH64_OS_PLT = <PLT has OS specific behaviour described by one or more tags between DT_LOOS and DT_HIOS>

I think DT_AARCH64_BTI_PLT may not be required as DT_AARCH64_PLT_SIZE should work for that purpose.

For this particular case we have documented the calling convention for PLT[0], but not PLT[N] which is strongly implied by PLT[0]. I think this could be made explicit by stating that when lazy loading is permitted, ip0 (x16) contains the address of the .got.plt entry corresponding to PLT[N], contents unspecified otherwise.

As an aside there are other possibilities for the PLT entries that could help:

For small ELF files where the .got.plt is within 1 MiB of the of the .plt then adr and ldr could be used to save an adrp.
For BTI it doesn't have to embedded in the PLT entry itself. There could be a thunk/stub for every indirect branch to a PLT entry that direct branches to the PLT which does not have a BTI. This reduces the amount of BTIs at the expense of a slightly more expensive sequence for indirect calls.

Wilco1 · 2023-04-12T14:02:29Z

A larger issue is that PLTs have become less efficient with the added BTI making them 20 bytes and span multiple fetch blocks. In principle we don't need BTI in PLTs. To reduce PLT uses we could always create function addresses via a GOT load. Canonical PLTs still need a BTI. An extra thunk just containing BTI would add significant overhead, so the thunk needs to load the address from the GOT and branch (making them non-lazy).

It would be feasible to remove the ADD x16 by slightly changing the PLT: the default (unlinked) GOT entry could point to a branch associated with the PLT entry which then branches to PLT[0]. So x16 contains a unique address relating to the PLT entry. The branch can be at the begin/end of the PLT sequence (eg. ADRP/LDR/BR/B) or placed after all PLTs (which would allow using BTI for canonical PLTs).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]]` for `ld -z now` #202

[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]]` for `ld -z now` #202

MaskRay commented Apr 11, 2023

appujee commented Apr 11, 2023 •

edited

Loading

MaskRay commented Apr 11, 2023 •

edited

Loading

nsz-arm commented Apr 12, 2023

smithp35 commented Apr 12, 2023 •

edited

Loading

Wilco1 commented Apr 12, 2023

[sysvabi64] Optional add x16, x16, :lo12: &.got.plt[N]] for ld -z now #202

[sysvabi64] Optional add x16, x16, :lo12: &.got.plt[N]] for ld -z now #202

Comments

MaskRay commented Apr 11, 2023

appujee commented Apr 11, 2023 • edited Loading

MaskRay commented Apr 11, 2023 • edited Loading

nsz-arm commented Apr 12, 2023

smithp35 commented Apr 12, 2023 • edited Loading

Wilco1 commented Apr 12, 2023

[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]]` for `ld -z now` #202

[sysvabi64] Optional `add x16, x16, :lo12: &.got.plt[N]]` for `ld -z now` #202

appujee commented Apr 11, 2023 •

edited

Loading

MaskRay commented Apr 11, 2023 •

edited

Loading

smithp35 commented Apr 12, 2023 •

edited

Loading