Hello! On large setups, there's a problem: when Redis holds a lot of data, SCAN performs poorly and only adds more load to Redis. The check-single-keys option is meant to address this, but it has its own problem: it sets the metric to 0 if the key is not found. For example, I have 10 clusters managed by Sentinel, each with three nodes, and a total of 2000 queues across all of them. It's not known in advance which nodes will hold which queues, or which nodes will be masters on a given day, so we monitor everything. This results in 10 * 5 * 2000 = 100,000 metrics instead of the 2000 we get with SCAN: a significant increase in cardinality and a flood of useless series, which also overloads Grafana. Is it possible to add an option for check-single-keys that skips a key (LIST or STREAM) if it isn't found, the way SCAN behaves?
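A minimal sketch of the requested behaviour, assuming a hypothetical skipMissing flag and a hypothetical keyInfo/lookup abstraction in place of real Redis calls (this is not the exporter's actual code). With the flag off, missing keys are reported as 0, as described above; with it on, they produce no series at all, like the SCAN path.

```go
package main

import (
	"fmt"
	"sort"
)

// keyInfo is a hypothetical stand-in for what the exporter learns about a
// single key (e.g. via TYPE plus LLEN/XLEN); not the exporter's real type.
type keyInfo struct {
	exists bool
	length int64
}

// collectSingleKeys returns key -> length values to export as metrics.
// skipMissing=false: missing keys are exported as 0 (current behaviour).
// skipMissing=true: missing keys are skipped entirely (hypothetical option).
func collectSingleKeys(keys []string, lookup func(string) keyInfo, skipMissing bool) map[string]int64 {
	out := make(map[string]int64, len(keys))
	for _, k := range keys {
		info := lookup(k)
		if !info.exists {
			if skipMissing {
				continue // emit nothing for a key this node doesn't hold
			}
			out[k] = 0
			continue
		}
		out[k] = info.length
	}
	return out
}

func main() {
	// Simulated node that holds only one of the three configured queues.
	node := map[string]int64{"queue:a": 42}
	lookup := func(k string) keyInfo {
		n, ok := node[k]
		return keyInfo{exists: ok, length: n}
	}
	keys := []string{"queue:a", "queue:b", "queue:c"}

	for _, skip := range []bool{false, true} {
		m := collectSingleKeys(keys, lookup, skip)
		names := make([]string, 0, len(m))
		for k := range m {
			names = append(names, k)
		}
		sort.Strings(names)
		fmt.Println("skipMissing =", skip, "series:", len(m), names)
	}
}
```

On this simulated node the current behaviour yields 3 series and the skip option yields 1; multiplied across every node and every configured queue, that gap is what separates the 100,000 and 2000 figures above.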
It would also be good to consider a timeout for SCAN, i.e. aborting the scan once a certain time limit is reached. Otherwise, every Prometheus scrape ends up leaving a bunch of hanging scans.
This leads to a significant increase in cardinality and a plethora of useless metrics
That's true, and it isn't really desirable. I vaguely remember a past conversation about this, and I think it violates Prometheus best practices: it's preferred not to export a metric at all, rather than export zero, if no data was found.
This is a breaking change though so we'd have to be a bit careful about communicating this.
It would also be good to consider timeouts for scan
That totally makes sense, and it would be worth adding regardless of the issue above to make the exporter more robust. Let me know if you want to look into submitting a PR for that; otherwise we can keep this open in case someone else is interested in contributing.
@oliver006 Hello! Sorry for the delayed response. I won't be able to help with the implementation, as I'm not very familiar with Go. I'd like to suggest adding an option like skip-null-value that drops queues the exporter doesn't find. By the way, I tried to work around these issues via service discovery (SD). Your exporter accepts these options as HTTP parameters when scraping a target, but the parameter is named check-single-keys, and Prometheus doesn't allow hyphens in label names, so it can't be set through a __param_ relabeling label. Could you add versions of these options with underscores instead?