
Bug: Something weird with the schedules is going on #12

Open
francistogram opened this issue Feb 21, 2024 · 5 comments
Labels
bug (Something isn't working), planned 🎉

Comments


francistogram commented Feb 21, 2024

Not sure if this is related to the last issue or not, but something weird is going on this morning with schedule 8qhW7k67PONn0IQGzzeE, which is set to run every 5 minutes.

6:50am CST works fine

6:55am CST works fine (failure is something on my side)

Then it skips 7am and runs at 7:04:56am, so likely the job was delayed.

Runs at 7:05am

Then something weird happens again

Runs at 7:10:01, which is correct.

But it also runs at 7:10:56, which doesn't make any sense.

Any idea what's going on here @danmindru?

Not sure if it's related to last week's issue #9.
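For reference, an every-5-minutes schedule corresponds to a cron expression along the lines of `*/5 * * * *`, firing at :00, :05, :10, and so on (an assumption for illustration; the exact expression for this schedule isn't shown in the thread).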

francistogram (Author) commented

My guess is that it's not related to the last error; my suspicion is that it's actually related to the timeout, given that some other jobs scheduled to run every 5 minutes, e.g.

  • vIehCTtgKRhy6A9C2Tr8
  • EWlEr0PHHPbqKIudxky2

did not have any issues.


francistogram commented Feb 21, 2024

Seems like this execution at 6:55am CST ran for 943s, i.e. 15m 43s.

I checked the Vercel logs for the endpoint and don't see anything on my side. The other interesting thing is that I have the timeout for these serverless endpoints set to 4 minutes (the max is 5 minutes), so I'm not sure how it could've hung for 15m 43s.

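For context, this is roughly how that 4-minute cap is configured on Vercel: a minimal sketch assuming a Next.js App Router route handler (the actual project setup and route name aren't shown in this thread):

```ts
// app/api/scheduled-task/route.ts (hypothetical route name, for illustration).
// Next.js route segment config: Vercel uses this as the function's maximum
// execution time, in seconds (240s = 4 minutes; the plan limit here is 300s).
export const maxDuration = 240;

export async function GET(): Promise<Response> {
  // ...the actual scheduled work would go here...
  return new Response("ok");
}
```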

danmindru (Member) commented

Thank you for the detailed report!
Investigating the issue and will get back to you asap.

danmindru (Member) commented

Hi @francistogram.
Looking closer at the logs, it seems like the timeout indeed caused the issues here.

What is important to note is that this was not a timeout in your function, but a network timeout: the destination URL could not be reached at all (hard to know the reason; it could be DNS, a CDN, or maybe something simpler like a cold start or a deployment).
This would also explain why it's not visible in the Vercel logs.

Either way, Crontap currently waits for a response for up to 1h, which means a long-running request can overlap with future schedules.

From our logs it seems like the job at 12:55:00 (UTC, i.e. 6:55am CST) ran until 13:10:54, which lines up with the unexpected run you saw at 7:10:56am.
Without going into much detail, there is some investigation work attached below.

A potential solution here is to optionally allow customizing the maximum wait for each schedule. Here, a ~5min wait before the request is abandoned would have prevented the issue.
This could also be set automatically based on the schedule interval, but I wonder if setting it automatically would cause other problems on its own.

Either way, allowing this to be set manually should be a sensible option, albeit an advanced one.
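To illustrate the idea, here is a minimal sketch of such a per-schedule maximum wait (not Crontap's actual implementation; the `maxWaitMs` setting and the interval-based default are assumptions):

```ts
// Sketch: abandon an outbound scheduled request after a per-schedule
// maximum wait, so a hung request can't bleed into future runs.
async function fireSchedule(
  url: string,
  intervalMs: number,  // the schedule's interval, e.g. 5 * 60_000 for every 5 minutes
  maxWaitMs?: number,  // optional per-schedule override (the proposed setting)
): Promise<void> {
  // If not set explicitly, derive the wait from the schedule interval, so a
  // 5-minute schedule waits at most ~5 minutes before the request is abandoned.
  const waitMs = maxWaitMs ?? intervalMs;

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), waitMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    console.log(`schedule responded: ${res.status}`);
  } catch (err) {
    // An abort lands here: the max wait elapsed with no response
    // (e.g. a network-level hang like the one in this issue).
    console.error(`request abandoned after ${waitMs}ms`, err);
  } finally {
    clearTimeout(timer);
  }
}
```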

(Screenshot: internal log investigation)

danmindru added the bug and planned 🎉 labels Feb 21, 2024
francistogram (Author) commented

The maximum wait time sounds like a reasonable solution to me!

The destination URL could not be reached at all (hard to know the reason, can be DNS, CDN or maybe something simpler like a cold start or deployment)

I'll reach out to Vercel and see if they have any more context on this

Thanks for digging into this issue 🙏
