-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
control over non-api symbol declarations #923
Comments
to show what kind of thing i currently have, e.g.
|
I submitted my proof of concept integration: chimera-linux/cports#2673 |
Very cool! I see what you mean -- I thought Already, this is a bit of trouble -- I have all these declarations in I think you are asking to upstream all the internal declarations with a |
that is what i am doing, the single i'm just about ready to land the PR soon (i'm mostly tweaking option values now, but i'm already running it on my machines, desktop and all)
|
or rather, instead of using this lets me expose the glue cleanly as well as tweak some stuff (i found that fully hardened build is actually broken and significantly slower in some scenarios, but i still want things like freelist encoding in place etc.) |
i have landed this by the way; i think the declarations patching makes the bigger bulk of the changes, everything else is some minimal plumbing changes (some in the libc, some in the allocator, pretty much just for process init + thread teardown + since we cannot do thread-local storage in libc code due to ldso shenanigans, i just store the thread-local heap pointer inside pthread non-abi section and that works good enough) |
Thank you for the info; (and if you make a PR please base it on the
|
the double free checking was pretty cheap in most things i tested (either benchmarks or real world things) so i left it at 4, however enabling padding made realloc-heavy things about 2.5x slower (notably our sort(1) went from 0.8s to 2s-ish for some things) padding checks result in firefox refusing to play videos with sound; some debugging reveals this is due to padding value checks (only those, the canary check is ok) failing in a free() in libnuma global constructor function when it's dlopened as a dependency of ffmpeg - this happens with my integration as well as when using mimalloc as a preload library with otherwise untouched libc - forcing a preload for libnuma too makes it stop happening, however i have not been able to figure out why this happens yet (i suspect the dlopen constructor stuff is a red herring and it happens either due to a bug in mimalloc or in firefox, as firefox does some mozalloc override shenanigans) - i was going to look into it later, since it happens with preload as well it's a lot safer like this |
i think what i have is pretty much as efficient as it can get; since it has access to the internal structure of the pthread implementation, all it has to do is 1) get the thread pointer (this is done via per-arch asm and is inlined) 2) from it retrieve the pthread struct (either it's that directly or it's at an offset depending on architecture) 3) get the right field (everything here is inlined) i ran mimalloc-bench and some other things and with MI_SECURE=0 it performs pretty much exactly like a regular standalone build, with my particular hardening setup it's somewhere inbetween standard MI_SECURE=0 and MI_SECURE=4, still performs very well i'll try to think of some clean abstraction in case we want to upstream this (my current thing is pretty specific to my particular musl integration, though otoh besides the declarations patching it's very slim so it's trivial to maintain downstream) |
to provide more detail for the firefox thing, this is what triggers the failure: https://github.com/numactl/numactl/blob/master/libnuma.c#L435 it's strange because the mallocs/reallocs of that happen in the same function it's triggered from numa_init: https://github.com/numactl/numactl/blob/master/libnuma.c#L95 the backtrace we get is something like so: https://img.ayaya.dev/9hNSGioGScjh what actually fails in free.c is this: https://github.com/microsoft/mimalloc/blob/dev/src/free.c#L465 (the padding is decoded correctly) this only happens when triggered from the constructor called by dlopen like http://git.musl-libc.org/cgit/musl/tree/ldso/dynlink.c#n2217 i initially thought this was related to the tls being grown which could be perhaps corrupting my heap pointer somehow, but further investigation ruled that out; everything seems to stay intact (and we don't use tls at all here technically, since we store the heap pointer inside pthread instead, and there is no other tls-allocated storage in mimalloc, and it happens identically with a preloaded mimalloc and otherwise untouched libc, which should be fine and supported) |
I don't know anything about if (nodemask_sz == 0) {/* fall back on error */
int pol;
unsigned long *mask = NULL;
nodemask_sz = 16;
do {
nodemask_sz <<= 1;
mask = realloc(mask, nodemask_sz / 8);
if (!mask)
return;
} while (get_mempolicy(&pol, mask, nodemask_sz + 1, 0, 0) < 0 && errno == EINVAL &&
nodemask_sz < 4096*8);
free(mask);
} https://github.com/numactl/numactl/blob/master/libnuma.c#L435
The docs say If the mode argument is not NULL, then get_mempolicy() will store the policy mode and any but that makes no sense (as Anyway, just my 2c. |
oh, you might actually be right - i was too preoccupied with "free() fails on stuff that was obviously reallocated in the same code block" that i did not consider that i will test this (that said, the padding checking is still too expensive in my testing to enable in prod, unfortunately) |
fixed it: chimera-linux/cports@9cc6ce3 the correct value is |
fixed it properly... i think: chimera-linux/cports@c9bd817 (why is arithmetic so hard :P) |
Ha, great that the mimalloc padding was working as intended and caught the bug :-) Bugs in "fail paths" are usually tricky to catch. Not sure how connected you are in the Linux community but perhaps the same issue exists in other distributions or libnuma and worth fixing upstream? |
I'm currently evaluating using mimalloc as libc allocator in Chimera Linux (https://chimera-linux.org). Currently, we have LLVM's Scudo integrated into our libc but testing shows that mimalloc may be a better fit.
However, the API of mimalloc is extensive, while I only require to expose a handful of symbols. I'd like to make the other symbols local in order to avoid polluting the static libc.a (in shared libc.so, symbol visibility takes care of it already). In order to integrate the allocator I am using the object build (i.e. everything is a single translation unit) so I can easily mark every function (except my specific allocator entrypoints)
static
and get what I want.For public API symbols, handling this is easy; I just do
#define mi_decl_export static
and that takes care of it. For internal symbols which do not have a tag this is harder; currently I have to patch every declaration the headers. Would it be possible to have a (by default empty) macro tagging these internal symbols so I get to reduce the amount of integration patching I have to do downstream? This should apply to any functions as well asextern
variables; e.g. my currentmimalloc.o
has(all of which are my symbols that hook into the libc internally, except the last two which become libc public API)
The text was updated successfully, but these errors were encountered: