On Slack you mentioned overriding the readiness probe and then using a proxy like Envoy. This could work, but I think the implementation will be tricky to get right because it's not trivial to know which Pods are running the proxy actors without connecting to the Ray Cluster. Another workaround is potentially just running a dummy proxy actor on every node?
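For the probe-override idea, a minimal sketch of what it could look like, assuming KubeRay leaves a user-supplied readinessProbe in the worker pod template untouched. The probe below only checks node-level raylet health through the dashboard agent (the port and endpoint mirror KubeRay's default node health check and may differ by version), and the group/container names are illustrative rather than taken from the sample:

```yaml
# Hypothetical readinessProbe override in the RayService worker group pod template.
workerGroupSpecs:
  - groupName: gpu-group                # illustrative name
    template:
      spec:
        containers:
          - name: ray-worker            # illustrative name
            readinessProbe:
              exec:
                command:
                  - bash
                  - -c
                  # Node-level health only; the Serve proxy port is not checked, so a
                  # worker pod that never hosts a proxy can still report Ready.
                  - "wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success"
              initialDelaySeconds: 10
              periodSeconds: 5
```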
> potentially just running a dummy proxy actor on every node
Yes, that's another idea I thought about: basically a dummy proxy that just routes traffic? I was actually hoping this config would do that: https://docs.ray.io/en/latest/serve/api/doc/ray.serve.config.ProxyLocation.html. But when I was testing it, it seemed to ignore the second pod vLLM was using; it didn't deploy a proxy on that pod.
Is there a different way to start a proxy actor everywhere?
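For context, this is roughly the setting that was tried, as a top-level field in serveConfigV2 (field name per the Serve config schema; the values are EveryNode, HeadOnly, and Disabled). The documented behavior of EveryNode is to start a proxy only on nodes that host at least one Serve replica, which would explain why the second pod, which only runs vLLM's distributed-inference actors, never gets one:

```yaml
serveConfigV2: |
  proxy_location: EveryNode        # proxies only on nodes with at least one Serve replica
  http_options:
    host: 0.0.0.0
    port: 8000
  applications:
    - name: llm
      # ... vLLM application config from the sample ...
```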
Can you just deploy another Ray Serve Deployment that does nothing? You may need to add enough replicas/resources to ensure it runs on every node.
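A rough sketch of that idea, expressed as a second application in serveConfigV2. The noop_app:app import path, the replica count, and the resource numbers are hypothetical; the extra application would just need a trivial @serve.deployment behind it and is served under its own route_prefix:

```yaml
serveConfigV2: |
  applications:
    - name: llm
      route_prefix: /
      # ... vLLM application from the sample, unchanged ...
    - name: readiness-filler
      route_prefix: /noop              # separate prefix, no real functionality
      import_path: noop_app:app        # hypothetical module containing a no-op deployment
      deployments:
        - name: Noop
          num_replicas: 2              # ideally at least one replica per Ray node
          ray_actor_options:
            num_cpus: 0.1
          # Replicas are packed by the Serve scheduler, so this does not strictly
          # guarantee one replica (and hence one proxy) per node; newer Ray versions
          # also have a max_replicas_per_node option that can help force spreading.
```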
Will that work? I think the operator injects the readiness check based on the port of the RayService deployment (say 8000). I won't be able to create another Ray Serve deployment on the same port on every node, right?
Also, I think the proxy actor needs to know where to route the traffic. I am not sure whether the second vLLM pod's port can handle requests?
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
I followed https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/vllm/ray-service.vllm.yaml to set up vLLM serving with RayService. While it works, I see an issue when enabling multi-node inference.
I configured PIPELINE_PARALLELISM = 2 and the service started up correctly, but the second pod always reports Ready = False. With this example, the Serve deployment is only placed on one pod, and under the hood vLLM leverages the second pod for distributed inference. Thus, the proxy isn't deployed to the second pod and its readiness check doesn't pass. The behavior looks like it was introduced by #1808.
While this isn't an immediate problem (serving works as expected), it doesn't seem right to report a pod as unready when it actually is ready. In addition, unready pods have other infrastructure impacts on our k8s cluster (e.g. on node lifecycle management).
Reproduction script
Following https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html or https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/vllm/ray-service.vllm.yaml and setting PIPELINE_PARALLELISM=2 will reproduce this issue.
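Concretely, the only change relative to the sample is bumping the pipeline-parallelism environment variable that the serve script reads. A rough sketch (the surrounding application config stays as in the sample; the exact nesting of env_vars may differ between versions):

```yaml
serveConfigV2: |
  applications:
    - name: llm
      # ... import_path, deployments, etc. as in the sample ...
      runtime_env:
        env_vars:
          PIPELINE_PARALLELISM: "2"    # two pipeline stages, so vLLM spans two worker pods
```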
Anything else
It is always reproducible.
I don't have any solution in mind, but I am happy to brainstorm together and implement/try out any solution if there is a suggestion!
Are you willing to submit a PR?