"sliding window" thresholds #2379
Comments
This is somewhat of a duplicate of #1136, but it's much better explained (:blush:) and the other issue has become more of a catch-all that just collects various semi-related threshold improvement ideas, so I'll leave both open for now...
Implementing this efficiently will be quite complicated though. Sliding time windows are probably easy and efficient to implement for
For now, as a workaround in some situations, you can approach the problem from the opposite direction... Instead of setting thresholds for time windows, you can set the thresholds for specific tags (sub-metrics) and use the recently introduced ability to manually set VU-wide custom metric tags through the
@na-- maybe we can use https://pkg.go.dev/github.com/RussellLuo/slidingwindow#section-readme, for example?
I am not sure this specific library could actually be used to calculate the sliding window thresholds for a
We are currently in the midst of some pretty big threshold refactoring (see #2356 and the connected issues, cc @oleiade), as the first step towards better thresholds. The problem is, we are still not sure about what steps 2, 3 and so on look like yet. We just know that there are plenty of deficiencies with the current thresholds, both in their capabilities and in their syntax, but we don't know exactly what the end goal looks like yet. For example, the syntax v2 might be PromQL-like, it might be something like what you propose (though
Somewhat connected to the above, we are also in the middle of refactoring how we handle metrics and metric samples. Recently we introduced a metrics registry (#1832) and likely upcoming changes include the tracking of distinct time series (#1831), user control of which metrics and sub-metrics k6 actually emits (#1321), and refactoring in how we store metrics in-memory, likely including transitioning to something like HDR histograms (#763) for
Finally, thresholds in
All of these things might introduce different tradeoffs and affect how we implement "sliding window" thresholds, and vice-versa. So, it's currently difficult to gauge if any one-off changes like the one you propose in this issue will be in the direction we want to go or in some different direction that ties our hands... 😞
I would do this with a custom metric. The other option I can think of is to add the ability to reset custom metrics and keep the threshold against that metric.
Feature Description
The current threshold mechanism only evaluates values aggregated over the whole test run.
For example:
When I set an "autostop" threshold at 5% errors, the errors have to reach 5% of all requests in the whole run before it triggers.
But degradation usually happens only once the RPS becomes really high.
If you ramp up to that RPS step by step, the threshold takes a while to react: the error rate can be
100% for the last minute and still be only, for example, 10% for the whole run.
As a numeric example:
we run the following configuration:
10 RPS for 1 minute: 600 requests in total, all of them 200 OK
20 RPS for 1 minute: 1200 requests in total, all of them 200 OK
30 RPS for 1 minute: 1800 requests in total, all of them 200 OK
50 RPS for 1 minute: 3000 requests in total, all of them 200 OK
60 RPS for 1 minute: 3600 requests in total, but the system crashes for the last 10 seconds, so 3000 requests are OK and 600 fail
So in the end we have
600 + 1200 + 1800 + 3000 + 3000 = 9600 "200 OK" responses
and 600 "500" failures.
Those 600 failures are only about 5.9% of the 10200 total requests.
But for the last 10 seconds, the error rate is 100%.
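The gap between the two error rates can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming each stage lasts exactly 60 seconds, the request rate is constant within a stage, and every request in the final 10 seconds fails; the variable names are made up for the illustration.

```python
# Load stages from the example: (requests per second, duration in seconds).
stages = [(10, 60), (20, 60), (30, 60), (50, 60), (60, 60)]
FAIL_LAST_SECONDS = 10  # assumption: every request in the final 10 s fails

# Build one (ok, failed) pair per second of the run.
per_second = []
for rps, duration in stages:
    per_second.extend([(rps, 0)] * duration)

# Turn the last seconds of the final stage into failures only.
last_rps = stages[-1][0]
per_second[-FAIL_LAST_SECONDS:] = [(0, last_rps)] * FAIL_LAST_SECONDS


def error_rate(samples):
    """Share of failed requests among all requests in `samples`."""
    ok = sum(s[0] for s in samples)
    failed = sum(s[1] for s in samples)
    total = ok + failed
    return failed / total if total else 0.0


whole_run = error_rate(per_second)
last_10s = error_rate(per_second[-FAIL_LAST_SECONDS:])
print(f"whole run: {whole_run:.1%}")  # whole run: 5.9%
print(f"last 10 s: {last_10s:.1%}")   # last 10 s: 100.0%
```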
Suggested Solution (optional)
My suggestion is to add "sliding windows" for thresholds.
For example, I could be interested in "error rate" only for the last 1 minute or even 10 seconds.
Something like:
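A minimal sketch of the semantics such a threshold could have, written in Python rather than in k6 threshold syntax. The class `SlidingWindowRate` and its methods are hypothetical and not part of k6; the sketch keeps only the samples from the last `window` seconds and reports the share of them that failed.

```python
from collections import deque


class SlidingWindowRate:
    """Share of failed samples among those seen in the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()  # (timestamp, failed) pairs, oldest first
        self.failed = 0         # number of failed samples currently in the window

    def add(self, timestamp, failed):
        """Record one sample and drop everything older than the window."""
        failed = int(bool(failed))
        self.samples.append((timestamp, failed))
        self.failed += failed
        self._evict(timestamp)

    def _evict(self, now):
        # A sample is kept while it is strictly newer than `now - window`.
        while self.samples and self.samples[0][0] <= now - self.window:
            _, was_failed = self.samples.popleft()
            self.failed -= was_failed

    def rate(self):
        return self.failed / len(self.samples) if self.samples else 0.0


# One sample per second for a minute; everything fails from second 50 onwards.
window = SlidingWindowRate(window=10)
for t in range(60):
    window.add(t, failed=(t >= 50))

print(window.rate())         # 1.0, although only 10 of the 60 samples failed
print(window.rate() > 0.05)  # True: a 5% threshold on the window would trigger
```

Storing every sample keeps the sketch short, but memory grows with the traffic inside the window; a real implementation would more likely aggregate samples into fixed-size time buckets.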
Already existing or connected issues / PRs (optional)
No response