hmem/cuda: avoid stub loading at runtime #10365
bot:aws:retest
Can you rename the PR/commit title to `hmem/cuda: avoid stub loading at runtime`?
9180553 to 1ca4e0a
Done, thanks. One other thing I'd ask reviewers to think about is to consider searching inside |
bot:aws:retest
Please wait for an ack from @bwbarrett before merging, as there was some disagreement on a similar change here and I want to make sure we're aligned.
Note that I objected to this commit in the OFI plugin because it doesn't match what Nvidia has done in the past with NCCL. If we want to make a change, it should be to adopt |
@bwbarrett such changes were made on the OFI plugin side; do you have a problem with this approach for libfabric?
When the CUDA toolkit is installed, a set of "stub" libraries are installed under /usr/local/cuda*/lib64/stubs/. These libraries include a SONAME field with a `.1` suffix, but the filenames of these stubs are bare, e.g.:

> $ readelf -d /usr/local/cuda-12.5/lib64/stubs/libnvidia-ml.so | grep soname
> 0x000000000000000e (SONAME) Library soname: [libnvidia-ml.so.1]

The CUDA toolkit does not include any library file with the name `libnvidia-ml.so.1` (or `libcuda.so.1`, etc.), as these are provided by the driver package. This disconnect between the stub filename in the toolkit and the SONAME within it is done intentionally to allow linking with the stub at build time, while ensuring it's never loaded at runtime.

In normal dynamic linking cases (i.e., without dlopen), the SONAME field of `libnvidia-ml.so.1` is used in the DT_NEEDED tag, where that filename can only come from a driver package, and this ensures that the stub library will never match.

Match the same behavior and provide `.1` suffixes to dlopen where appropriate for NVIDIA libraries.

Signed-off-by: Nicholas Sielicki <[email protected]>
1ca4e0a to 907688f
When the CUDA toolkit is installed, a set of "stub" libraries are installed under /usr/local/cuda*/lib64/stubs/. These libraries include a SONAME field with a `.1` suffix, but the filenames of these stubs are bare, e.g.:

> $ readelf -d /usr/local/cuda-12.5/lib64/stubs/libnvidia-ml.so | grep soname
> 0x000000000000000e (SONAME) Library soname: [libnvidia-ml.so.1]

The CUDA toolkit does not include any library file with the name `libnvidia-ml.so.1` (or `libcuda.so.1`, etc.), as these are provided by the driver package. This disconnect between the stub filename in the toolkit and the SONAME within it is done intentionally to allow linking with the stub at build time, while ensuring it's never loaded at runtime.

In normal dynamic linking cases (i.e., without dlopen), the SONAME field of `libnvidia-ml.so.1` is used in the DT_NEEDED tag, where that filename can only come from a driver package, and this ensures that the stub library will never match.

Match the same behavior and provide `.1` suffixes to dlopen where appropriate for NVIDIA libraries.
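
For reviewers who want to see the idea in isolation, here is a minimal sketch of passing the `.1`-suffixed name to dlopen. It is not the code in this PR: `versioned_soname` and `load_driver_lib` are hypothetical helper names, the 256-byte buffer is an arbitrary choice, and the sketch assumes every library of interest uses a `.1` SONAME suffix as described above.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the PR): build "<bare>.1" from a bare
 * library name such as "libnvidia-ml.so".
 * Returns 0 on success, -1 if the output buffer is too small. */
int versioned_soname(const char *bare, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s.1", bare);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Hypothetical helper (not from the PR): dlopen only the versioned name.
 * The toolkit stubs are installed under the bare filename, so asking for
 * the ".1" name cannot resolve to a file in the stubs directory.
 * Returns NULL when the versioned library cannot be found. */
void *load_driver_lib(const char *bare)
{
    char name[256]; /* arbitrary size chosen for this sketch */

    if (versioned_soname(bare, name, sizeof(name)) != 0)
        return NULL;
    return dlopen(name, RTLD_NOW);
}
```

A caller would use `load_driver_lib("libnvidia-ml.so")` and treat a NULL return as "no driver installed" rather than falling back to the bare name, since a fallback could pick up the stub again if the stubs directory happens to be on the library search path. On older glibc versions this needs to be linked with `-ldl`.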