should step size conversions result in a warning? #356

Open
brharrington opened this issue May 6, 2016 · 4 comments

Comments

@brharrington
Contributor

Currently if a user enters an invalid step size it will get silently converted to the next valid step. Should this result in a warning?

The side question here is usage of step in general. It is generally considered deprecated for direct use by the user.
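To make the question concrete, here is a minimal sketch of the behavior being discussed, with the warning added. It is not Atlas code: the `ALLOWED_STEPS` list and the `normalize_step` function are hypothetical and only illustrate rounding a request up to the next valid step.

```python
import bisect
import warnings

# Hypothetical list of valid step sizes in seconds; not Atlas's actual list.
ALLOWED_STEPS = [1, 5, 10, 30, 60, 300, 600, 1800, 3600]

def normalize_step(requested: int) -> int:
    """Round `requested` up to the next allowed step and warn if it changed."""
    if requested <= 0:
        raise ValueError("step must be positive")
    idx = bisect.bisect_left(ALLOWED_STEPS, requested)
    # Requests above the largest allowed step clamp to the largest one.
    chosen = ALLOWED_STEPS[min(idx, len(ALLOWED_STEPS) - 1)]
    if chosen != requested:
        warnings.warn(f"step {requested}s is not valid, using {chosen}s instead")
    return chosen
```

With this list, `normalize_step(2)` returns 5 and emits a warning instead of converting silently, while `normalize_step(60)` returns 60 with no warning.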

@brharrington brharrington added this to the 1.6.0 milestone May 6, 2016
@briangann

Being able to specify the step size is useful when the metrics being stored are at different intervals than the step-size of Atlas. It's useful for Grafana integration also :)

For example: Running a synthetic transaction every 2 minutes results in metrics being ingested every 2 minutes. When you look at the data at a 1-second step, the gaps between samples look like "holes", but at a 2-minute step the results are correct and any remaining holes are "true" misses.
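A small sketch of that effect, not Atlas code. It assumes a hypothetical `consolidate` helper that merges consecutive slots by taking the max of the non-null values, and it uses 1-minute slots instead of 1-second ones to keep the series short.

```python
from typing import List, Optional

def consolidate(values: List[Optional[float]], factor: int) -> List[Optional[float]]:
    """Merge each run of `factor` slots into one, ignoring nulls (hypothetical helper)."""
    out = []
    for i in range(0, len(values), factor):
        present = [v for v in values[i:i + factor] if v is not None]
        # A consolidated slot is null only when every input slot was null.
        out.append(max(present) if present else None)
    return out

# One slot per minute; the check runs every 2 minutes and the third run timed out.
fine = [3.0, None, 4.0, None, None, None, 5.0, None]
coarse = consolidate(fine, 2)
print(coarse)  # [3.0, 4.0, None, 5.0]
```

At the fine step every other slot is empty, while at the 2-minute step the only null left is the run that actually timed out.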

I was surprised when my queries to Atlas auto-jumped from 1s to 5s when I specified a step-size of 2s, and ended up modifying the code to allow 1s,2s,3s,4s,5s (and a few more)

Thanks!

Brian

@brharrington
Contributor Author

brharrington commented Dec 14, 2016

Thanks for the comment. How does Grafana use that setting?

> Being able to specify the step size is useful when the metrics being stored are at different intervals than the step-size of Atlas.

We typically avoid doing that. One example is cloudwatch S3 metrics that we import which get updated once a day. We report them into Atlas at minute level which has a number of benefits:

  • Visually it is easier for a user to see what is going on; they see the stair-step pattern clearly.
  • The user doesn't need to worry about the step size for the data. Otherwise the step size has to be understood to correctly compare with other signals, aggregate, etc., and in the past this was a source of a lot of confusion and mistakes.
  • We want gaps to mean that no data was available, not that the reporting interval left holes. The difference between measured values and nothing reporting is often quite important.

There is a bit of overhead with this, but for us it hasn't come up much, and it will get compressed to a constant block in storage so the overhead isn't that high. For use-cases where we do need different step sizes we run those as separate stacks; we don't mix them in the same instance.

> I was surprised when my queries to Atlas auto-jumped from 1s to 5s when I specified a step-size of 2s, and ended up modifying the code to allow 1s,2s,3s,4s,5s (and a few more)

I haven't looked at the auto-selection in a while. In general the steps were selected to divide evenly into common time units. For example, we wouldn't want 7 because if I have 1m blocks it would cross the block boundaries for a consolidated data point. We also reduced the number of available options to improve caching behavior.
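The divisibility rule can be sketched as follows. This is an illustration only, assuming steps and blocks are measured in seconds and a 60-second block; `is_aligned` is a hypothetical name, not an Atlas function.

```python
def is_aligned(step: int, block: int = 60) -> bool:
    """True if `step`-sized buckets line up with `block`-sized boundaries."""
    if step <= 0 or block <= 0:
        raise ValueError("step and block must be positive")
    if step <= block:
        # A smaller step must divide the block, or a bucket will straddle its edge.
        return block % step == 0
    # A larger step must cover a whole number of blocks.
    return step % block == 0

# Steps from 1s to 10s that line up with a 1m block.
candidates = [s for s in range(1, 11) if is_aligned(s)]
print(candidates)  # [1, 2, 3, 4, 5, 6, 10]
```

Under these assumptions a step of 7 fails because its ninth bucket would cover seconds 56 to 63 and cross the one-minute boundary, while 2, 3, and 4 would all line up.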

We could probably make it configurable.

@briangann

> How does Grafana use that setting?

When you expand or shrink the time range of the metric you are viewing, the Grafana datasource plugin for Atlas adjusts the step size. It doesn't "have to" do that, but that's how it was implemented.

What I see for the step-size question is this scenario:

I have a metric collection script (a synthetic transaction really) that can take more than a minute to execute, but always less than 2 minutes. This script is scheduled in Sensu to run every two minutes; any timeouts are "nulls" in Atlas.

I then have a check that queries Atlas (via Sensu) for the metric value, with a step size of 2 minutes, and alerts if there is a null, or if the metric exceeds a threshold value. This check is also run every 2 minutes.

The cloudwatch example I can understand - that's a bulk load of historical "minute stepped" data, but in my case it's always "2 minute stepped" data. It's nice to be able to set my step size to 2 minutes and get back a clean series of data, and if there are nulls, they are always timeouts.

Thanks for all the hard work on Atlas, it's working great for us :)

@svachalek
Contributor

Seems sensible to warn about, if it's specified explicitly. AFAIK our UIs don't add step= unless a user specifies it, which is pretty rare. Most of the time, it's trying to get a step size smaller than the minimum dictated by the time interval so it's probably best to be straightforward that it isn't going to work. The UI could also warn more directly but currently I don't think there's a sound way for the UI to know the minimum step size for a given interval, plus there are still plenty of queries produced manually.
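A rough sketch of warning only on explicit requests. The `resolve_step` function is hypothetical, and the minimum step for the time range is taken as an input because how it is derived is not covered here.

```python
from typing import Optional, Tuple

def resolve_step(requested: Optional[int], minimum: int) -> Tuple[int, Optional[str]]:
    """Return the step to use (seconds) and a warning message, if any."""
    if requested is None:
        # No explicit step=, so nothing to warn about.
        return minimum, None
    if requested < minimum:
        msg = (f"requested step {requested}s is below the minimum {minimum}s "
               f"for this time range; using {minimum}s")
        return minimum, msg
    return requested, None
```

For example, `resolve_step(None, 60)` returns `(60, None)`, while `resolve_step(5, 60)` returns 60 together with a message saying the request could not be honored.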

@brharrington brharrington modified the milestones: 1.6.0, 1.7.0 Jun 21, 2018
@brharrington brharrington modified the milestones: 1.7.0, 1.8.0 Mar 2, 2023
3 participants