[RayJob][Feature] add lightweight job submitter in kuberay image #2587
Why are these changes needed?
As noted in issue #2537, when a user submits a `RayJob` CR, KubeRay currently starts an extra container to submit the Ray job, and that container uses the same image as the RayCluster. That image is usually large, so if the container lands on a node where the image is not preloaded, pulling it and starting the container takes a long time.

This PR adds a lightweight submitter (45MB) to the KubeRay image. It mimics the `ray job submit` behavior (submit the job, then tail its logs), and the KubeRay image is usually much smaller than the image used by the RayCluster. Users can try it through the `submitterPodTemplate` field in their RayJob CR.

Example RayJob CR yaml:
In addition, this submitter does not fail when the job has already been submitted, so it should also resolve #2154.
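The "submit, then tail logs, and tolerate an already-submitted job" behavior described above can be sketched in Go as follows. This is a minimal illustration, not the submitter added in this PR: `JobClient`, `ErrAlreadySubmitted`, `SubmitAndTail`, and `fakeClient` are hypothetical names, and the in-memory client stands in for real calls to the Ray Jobs API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadySubmitted is a hypothetical sentinel error a client returns when a
// job with the same submission ID already exists on the cluster.
var ErrAlreadySubmitted = errors.New("job already submitted")

// JobClient is a hypothetical abstraction over the Ray Jobs API; it is not
// KubeRay's actual interface.
type JobClient interface {
	Submit(submissionID, entrypoint string) error
	TailLogs(submissionID string, emit func(line string)) error
}

// SubmitAndTail mimics `ray job submit`: submit the job, then stream its logs.
// An "already submitted" error is treated as success, so a restarted submitter
// attaches to the existing job instead of failing.
func SubmitAndTail(c JobClient, submissionID, entrypoint string, emit func(string)) error {
	err := c.Submit(submissionID, entrypoint)
	if err != nil && !errors.Is(err, ErrAlreadySubmitted) {
		return fmt.Errorf("submitting %q: %w", submissionID, err)
	}
	return c.TailLogs(submissionID, emit)
}

// fakeClient is an in-memory stand-in used only to exercise SubmitAndTail.
type fakeClient struct {
	submitted map[string]bool
	logs      []string
}

func (f *fakeClient) Submit(submissionID, entrypoint string) error {
	if f.submitted[submissionID] {
		return ErrAlreadySubmitted
	}
	f.submitted[submissionID] = true
	return nil
}

func (f *fakeClient) TailLogs(submissionID string, emit func(string)) error {
	for _, line := range f.logs {
		emit(line)
	}
	return nil
}

func main() {
	c := &fakeClient{submitted: map[string]bool{}, logs: []string{"job started", "job finished"}}
	printLine := func(line string) { fmt.Println(line) }
	// The second attempt simulates a submitter Pod that restarted after the
	// job had already been submitted; it still succeeds and tails the logs.
	for attempt := 1; attempt <= 2; attempt++ {
		if err := SubmitAndTail(c, "rayjob-sample", "python script.py", printLine); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Matching the sentinel with `errors.Is` keeps the retry path explicit: only the "already submitted" case is swallowed, and any other submission error still fails the submitter.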
Related issue number
#2537
Checks