Investigate client behaviour in a case of target pod/node restart #252
Comments
I think it might be because of keep-alive connections and HTTPS tunneling. How does the client learn that the pod is down and that it should retry elsewhere? (See the sketch below.)
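To illustrate the suspicion, here is a minimal sketch in TypeScript using Node's built-in `https` module (not the actual apify-client code): a keep-alive agent reuses pooled sockets, so the client typically only learns that the pod behind a connection is gone when a request on a stale socket fails, e.g. with `ECONNRESET`, and a retry that forces a fresh connection would then be routed to a healthy pod.

```ts
// Minimal sketch, not the actual apify-client implementation.
import https from 'https';

const keepAliveAgent = new https.Agent({ keepAlive: true });

function get(url: string, agent: https.Agent): Promise<string> {
  return new Promise((resolve, reject) => {
    https.get(url, { agent }, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

async function getWithRetry(url: string): Promise<string> {
  try {
    // The first attempt may go over a pooled socket to a pod that no longer exists.
    return await get(url, keepAliveAgent);
  } catch (err) {
    // A stale pooled socket usually surfaces as ECONNRESET / "socket hang up".
    // Retrying on a non-keep-alive agent forces a brand-new TCP connection,
    // which the load balancer can route to a healthy pod.
    console.warn('Retrying on a fresh connection:', (err as Error).message);
    return get(url, new https.Agent({ keepAlive: false }));
  }
}
```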
Note: We could test this on multistaging by starting two API pods, starting an actor which uses the API in a loop, and then killing one of the two pods (a rough sketch of such a test loop follows). We could also make a testing version of the client with some more debug logging to help us figure it out.
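A rough sketch of the test actor described above; the endpoint URL and the 1 s polling interval are placeholders, not taken from the issue:

```ts
// Hypothetical test loop for the multistaging experiment: call the API in a
// loop, log every result with a timestamp, then kill one of the two pods and
// watch how long requests keep failing before they succeed again.
const API_URL = 'https://api.example.com/v2/users/me'; // placeholder endpoint

async function loop(): Promise<void> {
  for (let i = 0; ; i++) {
    const started = Date.now();
    try {
      const res = await fetch(API_URL); // global fetch, Node 18+
      console.log(`#${i} status=${res.status} in ${Date.now() - started}ms`);
    } catch (err) {
      console.error(`#${i} FAILED after ${Date.now() - started}ms:`, (err as Error).message);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

loop();
```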
A 2-pod multistaging setup is here: https://github.com/apify/apify-core/pull/6934
It looks like keep-alive doesn't work: it does not propagate through the Application Load Balancer, and requests are distributed between pods.
If you restart one node, the client simply switches to the new one.
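One way to sanity-check the client side of this (a sketch, not part of the client): Node marks requests sent over a pooled socket via `request.reusedSocket`, so logging it per request shows whether the client reuses connections at all. Note that even when the socket to the ALB is reused, the ALB may still spread requests across pods, which would match the observation above. The URL is a placeholder.

```ts
// Sketch: probe whether the client actually reuses keep-alive sockets.
import https from 'https';

const agent = new https.Agent({ keepAlive: true });

function probe(url: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = https.get(url, { agent }, (res) => {
      res.resume(); // discard the body
      res.on('end', () => {
        console.log(`status=${res.statusCode} reusedSocket=${req.reusedSocket}`);
        resolve();
      });
    });
    req.on('error', reject);
  });
}

async function main(): Promise<void> {
  const url = 'https://api.example.com/health'; // placeholder URL
  for (let i = 0; i < 5; i++) await probe(url);
}

main();
```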
If we want to support keep-alive headers, we probably need some changes on the ALB or elsewhere in platform networking. I'm not sure whether the fact that it doesn't work right now affects users, but it has probably been broken since we started using the ALB. cc @dragonraid @mnmkng
I'm moving this to the icebox; we can follow up once the issue appears again. It looks like a network error or some other transient error, but it's hard to say two months later: we don't have any logs, and the issue hasn't appeared in the same actor again since this report.
From this discussion https://apifier.slack.com/archives/C013WC26144/p1653552365035479, it seems that there is sometimes a series of network errors, which leads to the suspicion that the client might be retrying requests to the same pod even though it is dead.