Investigate client behaviour in a case of target pod/node restart #252

Open
mtrunkat opened this issue May 30, 2022 · 6 comments
Labels
backend Issues related to the platform backend. bug Something isn't working. t-platform Issues with this label are in the ownership of the platform team.

Comments

@mtrunkat
Member

From this discussion https://apifier.slack.com/archives/C013WC26144/p1653552365035479, it seems that sometimes there is a series of network errors, which leads to a suspicion that the client might be retrying requests to the same pod even though it's dead.

2022-05-16T00:38:56.894Z WARN  ApifyClient: API request failed 4 times. Max attempts: 9.
2022-05-16T00:38:56.897Z Cause:Error: aborted
2022-05-16T00:38:56.899Z     at connResetException (node:internal/errors:692:14)
2022-05-16T00:38:56.901Z     at Socket.socketCloseListener (node:_http_client:414:19)
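
For context, a minimal sketch of the client-side retry configuration behind a log like the one above, assuming the JavaScript apify-client; the token and run ID are placeholders and the option values are illustrative, not necessarily what the failing actor used.

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
    maxRetries: 8, // 8 retries => up to 9 attempts, matching "Max attempts: 9" above
    minDelayBetweenRetriesMillis: 500,
});

(async () => {
    // Network errors such as the "aborted" / connResetException above are retried
    // with backoff, but a retry by itself does not guarantee a fresh TCP connection.
    const run = await client.run('RUN_ID_PLACEHOLDER').get();
    console.log(run.status);
})();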
@mtrunkat mtrunkat added the bug Something isn't working. label May 30, 2022
@mnmkng
Member

mnmkng commented May 30, 2022

I think it might be because of the keep-alive connections and HTTPS tunneling. How does the client learn that the pod is down and that it should retry elsewhere?
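
A plain Node.js sketch of the suspected mechanism (not the client's actual internals): with a keep-alive agent, a pooled socket is reused for subsequent requests, and if the peer behind that socket has died, the first request on the reused socket fails (typically ECONNRESET / "aborted") instead of being transparently routed elsewhere. Host and path are placeholders.

const https = require('https');

// Keep-alive agent with a single pooled socket, so reuse is easy to observe.
const agent = new https.Agent({ keepAlive: true, maxSockets: 1 });

function getOnce(path) {
    return new Promise((resolve, reject) => {
        const req = https.get({ host: 'api.apify.com', path, agent }, (res) => {
            res.resume(); // drain the body so the socket goes back to the pool
            resolve({ status: res.statusCode, reusedSocket: req.reusedSocket });
        });
        // If the pooled socket's peer is gone, the error surfaces here (e.g. ECONNRESET);
        // the caller must retry on a new socket, there is no automatic re-routing.
        req.on('error', reject);
    });
}

(async () => {
    console.log(await getOnce('/v2/acts')); // reusedSocket: false (new connection)
    console.log(await getOnce('/v2/acts')); // reusedSocket: true (pooled socket reused)
})();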

@fnesveda
Member

fnesveda commented Jun 1, 2022

Note: We could test this on multistaging by starting two API pods, starting an actor which uses the API in a loop, and then killing one of the two pods. We could also make a testing version of the client with some more debug logging to help us figure it out.
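
A possible shape for that test actor (a sketch only, not the actual harness; the token, run ID, and 500 ms interval are placeholders):

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

(async () => {
    // Hit the API in a loop from a single client instance and log every failure
    // with its timing, so we can see what happens when one of the two pods is killed.
    for (let i = 0; ; i++) {
        const started = Date.now();
        try {
            await client.run('RUN_ID_PLACEHOLDER').get();
            console.log(`#${i} ok in ${Date.now() - started} ms`);
        } catch (err) {
            console.log(`#${i} failed after ${Date.now() - started} ms: ${err.message}`);
        }
        await new Promise((resolve) => setTimeout(resolve, 500));
    }
})();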

@fnesveda fnesveda added medium priority Medium priority issues to be done in a couple of sprints. next sprint Check this out when planning next sprint. labels Jun 1, 2022
@fnesveda fnesveda added this to the 40th sprint - Platform team milestone Jun 6, 2022
@jirimoravcik
Member

The 2-pod multistaging is here: https://github.com/apify/apify-core/pull/6934

@drobnikj
Member

It looks like keep-alive doesn't work: it does not propagate through the application load balancer, and the requests are distributed between pods.
Below is the list of pods that served each API call; I was making a get-run API call from the same Apify client instance every 0.5 s.
Because I have just 2 pods and the ALB uses a round-robin scheme, the pods alternated on every request.

0: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
1: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
2: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
3: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
4: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
5: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
6: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
7: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
8: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
9: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
10: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
11: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
12: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
13: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
14: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
15: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
16: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
17: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
18: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
19: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
20: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
21: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
22: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
23: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
24: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
25: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
26: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
27: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
28: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
29: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
30: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
31: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
32: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
33: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
34: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"

If you restart one node, it simply switches to the new one.
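
A sketch of how a per-request pod list like the one above could be collected; it assumes the API exposes the serving pod in a response header (the 'x-served-by' name is hypothetical — the comment above does not say how the list was actually obtained), and the run ID is a placeholder.

const https = require('https');

const agent = new https.Agent({ keepAlive: true }); // single client, keep-alive enabled
const pods = [];

function probe(i) {
    const req = https.get({ host: 'api.apify.com', path: '/v2/actor-runs/RUN_ID_PLACEHOLDER', agent }, (res) => {
        res.resume();
        pods.push(res.headers['x-served-by']); // hypothetical pod-identifying header
        console.log(`${i}: "${pods[i]}"`); // with 2 pods behind a round-robin ALB the names alternate
    });
    req.on('error', (err) => console.log(`${i}: error ${err.code || err.message}`));
}

let i = 0;
setInterval(() => probe(i++), 500); // one get-run call every 0.5 s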

@drobnikj
Member

If we want to support keep-alive headers, we probably need some changes on the ALB or elsewhere in the platform networking. I'm not sure whether the fact that it is not working right now affects users, but it probably hasn't worked since we started using the ALB. cc @dragonraid @mnmkng

@drobnikj
Member

drobnikj commented Jul 11, 2022

I'm moving this to the icebox; we can follow up once the issue appears again. It looks like a network or some other transient error, but it's hard to say two months later: we do not have any logs, and the issue hasn't appeared in the same actor again since this report.

@drobnikj drobnikj removed next sprint Check this out when planning next sprint. medium priority Medium priority issues to be done in a couple of sprints. labels Jul 11, 2022
@fnesveda fnesveda added the t-platform Issues with this label are in the ownership of the platform team. label Jul 19, 2022
@fnesveda fnesveda removed this from the 42nd sprint - Platform team milestone Aug 9, 2022
@fnesveda fnesveda added the backend Issues related to the platform backend. label Nov 5, 2024