[RayService][Bug] When doing multi-node serving with vLLM, non-primary worker pods report unready. #2552

Open
kanwang opened this issue Nov 18, 2024 · 5 comments
Labels
bug Something isn't working rayservice

Comments

@kanwang

kanwang commented Nov 18, 2024

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

I followed https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/vllm/ray-service.vllm.yaml to set up vLLM serving with RayService. It works, but I ran into an issue when enabling multi-node inference.

I configured PIPELINE_PARALLELISM = 2 and the service started up correctly, but the second pod always reports Ready = False. It looks like, with this example, the Serve deployment is placed on only one pod, and under the hood vLLM uses the second pod for distributed inference. As a result, the proxy isn't deployed to the second pod, and that pod's readiness check never passes.

It looks like this behavior was introduced by #1808.

While this isn't an immediate problem (serving works as expected), it doesn't seem right to report a pod as unready when it is actually ready. In addition, unready pods have other infrastructure impacts on our k8s cluster (e.g., they interfere with node lifecycle management).

Reproduction script

Following https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html or https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/vllm/ray-service.vllm.yaml and setting PIPELINE_PARALLELISM=2 will reproduce this issue.
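For context, here is a minimal sketch (not the verbatim sample code) of how the serve app behind ray-service.vllm.yaml consumes these env vars. TENSOR_PARALLELISM and PIPELINE_PARALLELISM come from the sample; MODEL_ID and the default values are illustrative assumptions:

```python
import os

from vllm.engine.arg_utils import AsyncEngineArgs

engine_args = AsyncEngineArgs(
    model=os.environ["MODEL_ID"],  # assumed env var, for illustration only
    tensor_parallel_size=int(os.environ.get("TENSOR_PARALLELISM", "1")),
    # With PIPELINE_PARALLELISM=2, vLLM shards the engine across two Ray
    # workers, so a single Serve replica spans two pods while the Serve
    # proxy (which backs the readiness check) runs only on the primary pod.
    pipeline_parallel_size=int(os.environ.get("PIPELINE_PARALLELISM", "1")),
)
```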

Anything else

  • It is always reproducible.

  • I don't have a solution in mind, but I am happy to brainstorm together and implement/try out any solution if there is a suggestion!

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kanwang kanwang added the bug and triage labels Nov 18, 2024
@andrewsykim
Collaborator

On Slack you mentioned overriding the readiness probe and then using a proxy like Envoy. This could work, but I think the implementation will be tricky to get right because it's not trivial to know which Pods are running the proxy actors without connecting to the Ray cluster. Another workaround is potentially just running a dummy proxy actor on every node?

@kanwang
Author

kanwang commented Nov 18, 2024

“potentially just running a dummy proxy actor on every node”

Yes, that's another idea I thought about: basically a dummy proxy that just routes traffic? I was actually hoping this config would do that: https://docs.ray.io/en/latest/serve/api/doc/ray.serve.config.ProxyLocation.html. But when I tested it, it seemed to ignore the second pod vLLM was using; it didn't deploy a proxy on that pod.
Is there a different way to start a proxy actor everywhere?
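For reference, this is roughly what I tried, as a sketch from the Python API (in a RayService you'd set the equivalent proxy_location field in the Serve config instead). Note the docs say EveryNode only starts a proxy on nodes that have at least one Serve replica, which would explain why the vLLM-only pod was skipped:

```python
from ray import serve
from ray.serve.config import ProxyLocation

# EveryNode runs a proxy on every node *with at least one Serve replica*;
# the secondary vLLM pod hosts only vLLM engine workers, so it gets none.
serve.start(proxy_location=ProxyLocation.EveryNode)
```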

@andrewsykim
Copy link
Collaborator

Can you just deploy another Ray Serve deployment that does nothing? You may need to add enough replicas / resources to ensure it runs on every node.
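Something like this rough sketch, maybe (the replica count, spreading, and resource sizing are assumptions you'd have to tune, and max_replicas_per_node requires a reasonably recent Ray version):

```python
from ray import serve

# No-op deployment whose only purpose is to land one replica on each worker
# node, so that a Serve proxy (and a passing readiness probe) exists there.
@serve.deployment(
    num_replicas=2,                       # one per worker pod (illustrative)
    max_replicas_per_node=1,              # force spreading across nodes
    ray_actor_options={"num_cpus": 0.1},  # keep the footprint tiny
)
class Noop:
    def __call__(self, request) -> str:
        return "ok"

noop_app = Noop.bind()
```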

@kanwang
Author

kanwang commented Nov 18, 2024

Will that work? I think the operator injects the readiness check based on the port of the RayService deployment (say, 8000). I won't be able to run another Ray Serve deployment on the same port on every node, right?

Also, I think the proxy actor needs to know where to route the traffic. I am not sure whether the second vLLM pod can handle requests?

@kevin85421
Member

Ray Serve doesn't seem to provide a way to deploy a proxy actor on a node when there are no replicas on it.

[screenshot attached]
