[RayJob][Feature] add light weight job submitter in kuberay image #2587

Open · wants to merge 2 commits into master

@rueian (Contributor) commented Nov 27, 2024

Why are these changes needed?

Currently, as noted in issue #2537, when a user creates a RayJob CR, KubeRay starts a separate container to submit the Ray job, and that container uses the same image as the RayCluster. If the container is scheduled on a node where the image is not preloaded, pulling the image and starting the container can take a long time, because that image is usually large.

This PR adds a lightweight submitter (45MB) to the KubeRay image, which is usually smaller than the image used by the RayCluster. The submitter mimics the behavior of ray job submit: it submits the job and then tails its logs. Users can try it by setting submitterPodTemplate in their RayJob CR.

Example RayJob CR YAML:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  rayClusterSpec:
    ...
  submitterPodTemplate:
    spec:
      restartPolicy: Never
      containers:
        - name: my-custom-rayjob-submitter-pod
          image: kuberay/operator:nightly
          command: ["/submitter"]
          args: ["--runtime-env-json", '{"pip":["requests==2.26.0","pendulum==2.1.2"],"env_vars":{"counter_name":"test_counter"}}', "--", "python", "/home/ray/samples/sample_code.py"]
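In the args above, submitter flags come before a "--" separator and the job entrypoint follows it. As a rough illustration of how such arguments could be turned into a submission request, here is a minimal Go sketch. It assumes the entrypoint and runtime_env body fields of Ray's job submission REST endpoint (POST /api/jobs/); submitRequest and buildSubmitRequest are hypothetical names, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"strings"
)

// submitRequest mirrors a subset of the JSON body accepted by Ray's job
// submission endpoint (POST /api/jobs/). The field choice is an assumption.
type submitRequest struct {
	Entrypoint string          `json:"entrypoint"`
	RuntimeEnv json.RawMessage `json:"runtime_env,omitempty"`
}

// buildSubmitRequest is a hypothetical helper: flags are parsed up to the
// "--" terminator, and the remaining tokens form the entrypoint command.
func buildSubmitRequest(args []string) (submitRequest, error) {
	fs := flag.NewFlagSet("submitter", flag.ContinueOnError)
	runtimeEnv := fs.String("runtime-env-json", "", "runtime_env as a JSON object")
	if err := fs.Parse(args); err != nil {
		return submitRequest{}, err
	}
	entry := fs.Args() // tokens after "--" (the terminator itself is dropped)
	if len(entry) == 0 {
		return submitRequest{}, errors.New(`missing entrypoint after "--"`)
	}
	req := submitRequest{Entrypoint: strings.Join(entry, " ")}
	if *runtimeEnv != "" {
		if !json.Valid([]byte(*runtimeEnv)) {
			return submitRequest{}, fmt.Errorf("invalid --runtime-env-json: %q", *runtimeEnv)
		}
		req.RuntimeEnv = json.RawMessage(*runtimeEnv)
	}
	return req, nil
}

func main() {
	// Same arguments as the RayJob example.
	args := []string{
		"--runtime-env-json",
		`{"pip":["requests==2.26.0","pendulum==2.1.2"],"env_vars":{"counter_name":"test_counter"}}`,
		"--", "python", "/home/ray/samples/sample_code.py",
	}
	req, err := buildSubmitRequest(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	body, err := json.Marshal(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Joining the entrypoint tokens with spaces is the simplest choice, but it drops shell quoting, so arguments that themselves contain spaces would need escaping in a real implementation.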

In addition, this submitter does not fail when the job has already been submitted, so it also resolves #2154.
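One way to get this "already submitted is not an error" behavior is to check whether the submission ID is known before submitting, and to also treat a duplicate-ID rejection as success in case of a race. The Go sketch below shows that pattern; jobClient, ErrAlreadyExists, ensureSubmitted, and the submission ID are hypothetical, and a real client would back Exists and Submit with calls to the Ray job API. It is not necessarily how this PR implements the fix for #2154.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical sentinel a client could return when the
// cluster rejects a submission ID that is already in use.
var ErrAlreadyExists = errors.New("job already exists")

// jobClient is a hypothetical abstraction over the Ray job API.
type jobClient interface {
	Exists(submissionID string) (bool, error)
	Submit(submissionID, entrypoint string) error
}

// ensureSubmitted submits the job only if it is not already known, so a
// restarted submitter can go straight to tailing logs. It reports whether
// this call created the job.
func ensureSubmitted(c jobClient, submissionID, entrypoint string) (bool, error) {
	exists, err := c.Exists(submissionID)
	if err != nil {
		return false, fmt.Errorf("checking job %s: %w", submissionID, err)
	}
	if exists {
		return false, nil
	}
	if err := c.Submit(submissionID, entrypoint); err != nil {
		if errors.Is(err, ErrAlreadyExists) {
			// Another submitter created it between the check and the submit.
			return false, nil
		}
		return false, fmt.Errorf("submitting job %s: %w", submissionID, err)
	}
	return true, nil
}

// fakeClient is an in-memory stand-in used only for this example.
type fakeClient struct{ jobs map[string]string }

func (f *fakeClient) Exists(id string) (bool, error) {
	_, ok := f.jobs[id]
	return ok, nil
}

func (f *fakeClient) Submit(id, entrypoint string) error {
	if _, ok := f.jobs[id]; ok {
		return ErrAlreadyExists
	}
	f.jobs[id] = entrypoint
	return nil
}

func main() {
	c := &fakeClient{jobs: map[string]string{}}
	entry := "python /home/ray/samples/sample_code.py"
	first, err1 := ensureSubmitted(c, "rayjob-sample-abc12", entry)  // creates the job
	second, err2 := ensureSubmitted(c, "rayjob-sample-abc12", entry) // no-op, not an error
	fmt.Println(first, err1, second, err2)
}
```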

Related issue number

#2537

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests

@rueian force-pushed the light-weight-job-submitter branch 2 times, most recently from 43016fd to 5daae7b (November 28, 2024 06:21)
@rueian force-pushed the light-weight-job-submitter branch 17 times, most recently from 1e6811e to 6a0092f (November 30, 2024 16:13)
@rueian marked this pull request as ready for review December 1, 2024 04:40
Labels: None yet
Projects: None yet
3 participants