Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for --shm_size and --gpus options in Rocker #304

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

SamratThapa120
Copy link

This PR introduces support for two new options, --shm_size and --gpus, which are equivalent to the respective arguments in Docker. These options provide greater flexibility and usability when customizing container environments, particularly for resource-intensive and GPU-dependent workloads.

Copy link
Collaborator

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall both of these would be great. I think that the --shm-size pass through completely makes sense. it can be merged quickly if we add tests.

The --gpus option should be thought about a bit more as to how it will dovetail with the current --nvidia option. And hopefully can replace the current recommendation for --device /dev/dri/card0


def get_docker_args(self, cliargs):
args = ''
shm_size = cliargs.get('gpus', None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some copy and paste here

setup.py Outdated
@@ -52,6 +52,8 @@
'expose = rocker.extensions:Expose',
'git = rocker.git_extension:Git',
'group_add = rocker.extensions:GroupAdd',
'shm_size = rocker.extensions:ShmSize',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in alphabetical order.

@SamratThapa120
Copy link
Author

SamratThapa120 commented Dec 7, 2024

Thank you for the review. I have made the requested changes, and added tests for both new features.

The --gpus option should be thought about a bit more as to how it will dovetail with the current --nvidia option. And hopefully can replace the current recommendation for --device /dev/dri/card0

In addition to replace the current recommendation for --device /dev/dri/card0, the new --gpus feature is helpul (in my case) compared to the --nvidia gpus option when you want to keep the image minimal without unnecessary opengl libraries.

@SamratThapa120 SamratThapa120 requested a review from tfoote December 7, 2024 08:21
@SamratThapa120
Copy link
Author

@tfoote I have also modified the --gpus argument to be compatible with --nvidia. If --gpus is used with --nvidia, the gpu ids specified by --gpus will be visible inside container, but if --gpus is not specified, it will result in default behaviour of --nvidia which is to set --gpus to all. If --gpus is used without nvidia, no OpenGL libraries are installed, and only the specified gpus are made available to the container.

Copy link
Collaborator

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests. This looks like it's basically ready to merge. The interactions passing through the gpu arguments to --nvidia is a great way to get them to work together. I have one request for the code organization, but then this looks good to go.

help="Set the size of the shared memory for the container (e.g., 512m, 1g).")


class Gpus(RockerExtension):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the cross coupling can you move this to the nvidia_extensions file instead? It's not explicitly nvidia but I'd like to keep them colocated. And that file could potentially be renamed in the future with it's expanded scope. The X11 class is already not nvidia tied.

args = p.get_docker_args(mock_cliargs)
self.assertIn('--shm-size 12g', args)

class GpusExtensionTest(unittest.TestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also go into the test_nvidia.py in parallel with moving the implementation

@SamratThapa120
Copy link
Author

@tfoote Thank you. I have made the requested changes.

Copy link
Collaborator

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup.

Unfortunately I tried testing this out on my non-nvidia machine and was quite disappointed to discover that it doesn't work. On deeper reading the --gpus option only works with NVIDIA gpus, and not generic gpus or integrated ones.

https://docs.docker.com/reference/cli/docker/container/run/#gpus
The --gpus flag allows you to access NVIDIA GPU resources. First you need to install the [nvidia-container-runtime](https://nvidia.github.io/nvidia-container-runtime/).

So I think that it makes sense to just collapse this into the nvidia plugin implementation. I'd like to have the argument be --nvidia-gpus but we can provide an alias to --gpus to mirror the upstream command line syntax.

And to support your use case of using nvidia without the content being added into the container we could extend the policy to support NVIDIA_GLVND_VALID_VERSIONS with None as a valid option.

Potentially as these tools get more integrated this might be able to become a default policy. A shortcut option --nvidia-no-glvnd that sets that policy would be reasonable to consider, but I'm not sure it's worth the extra complexity.


@staticmethod
def register_arguments(parser, defaults={}):
parser.add_argument('--gpus',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parser.add_argument('--gpus',
parser.add_argument('--nvidia-gpus', '--gpus',

And this would move up into the nvidia plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants