Sum bytes_downloaded and bytes_uploaded for duplicate common-names #15
Comments
From my point of view, aggregating the metrics is not a valid solution for the issue you are encountering. Dropping duplicate clients also does not seem like a good solution, as it would not be obvious why, at a given moment, the number of connected clients deviates from the number of client metrics provided - i.e. it would simply hide a possible underlying issue. If you are allowing more than one connection for the same common name, I suggest disabling per-client metrics by starting the collector with the corresponding flag.

From prior discussion:

> Given the background and further information - what would be your use-case for aggregating or dropping metrics?
@patrickjahns We are a commercial VPN provider team and we put a limit on our customers' accounts based on the amount of bandwidth they use - say a client has 100GB of traffic, and we track their bandwidth usage against that limit. We also provide other protocols besides OpenVPN, and they all report their bandwidth with Prometheus, so it would be better if we tracked OpenVPN bandwidth the same way. The problem is that clients can connect with multiple devices to the same server, and we want to measure the bandwidth used by all of those connections, not only the first one (and we definitely do not want a panic error in the duplicate-CN situation).
I think the best approach would be adding the port number to the label set, disabled by default and enabled only with an extra flag (a rough sketch follows below). We use VictoriaMetrics instead of the original Prometheus for handling metrics, and VictoriaMetrics is pretty good at handling data with extremely high cardinality - I'm pretty sure that even at small scales, cardinality wouldn't be a big problem. Take, for example, 100 clients that have all used every port at some point in history: that would be roughly 100 × 65,535 ≈ 6.5M series.

Edit: I partially take that back - it looks like 6M is too high for Prometheus and the safe boundary is 100K, even though VictoriaMetrics can handle much more than that.
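A rough sketch of how the label set could be built behind such an opt-in (illustrative Python only - the exporter itself is written in Go, and no real flag or function names are implied):

```python
# Illustrative sketch: build per-client labels, optionally including the port.
def client_labels(common_name, local_port, per_connection_metrics=False):
    """Return the label set for one client session.

    per_connection_metrics stands in for the proposed opt-in flag; when it is
    enabled, every (common_name, port) pair produces its own time series.
    """
    labels = {"common_name": common_name}
    if per_connection_metrics:
        labels["port"] = str(local_port)
    return labels
```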
Thanks for sharing the information 👍 - #16 addresses the panic for the time being.
This sounds reasonable to me - another thought that crossed my mind: we could simply add a suffix to the client names. Thanks for sharing the information on VictoriaMetrics - wasn't aware that there are good solutions for high-cardinality labels 👍
It's not just VictoriaMetrics - there are two other competitors for use with high-cardinality data (InfluxDB and TimescaleDB), but VM seems to beat them in performance. As for the label problem, I'm thinking of another solution right now: it may be possible to have a global dictionary, matching tuples of (common_name, local_port) to a counter and using that counter in another label. For example, a representation in Python:
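A rough sketch of that idea (illustrative names only, with one counter per common name assumed):

```python
# Illustrative sketch: map (common_name, local_port) tuples to a small counter
# value that can be used as an extra label instead of the raw port number.
from itertools import count

_counters = {}  # common_name -> itertools.count(1), one counter per client
_numbers = {}   # (common_name, local_port) -> assigned connection number

def connection_number(common_name, local_port):
    key = (common_name, local_port)
    if key not in _numbers:
        counter = _counters.setdefault(common_name, count(1))
        _numbers[key] = next(counter)
    return _numbers[key]
```

The first connection of a given common name gets number 1, the second (from a different port) gets 2, and so on - the assigned numbers only ever grow.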
Notice how I'm using a tuple of (common_name, local_port) as the dictionary key. The only problem here is that the counter would become a big number over time, and it could get higher than 65535 for each common name.

P.S. I'm coming from the Python world and I don't know what the equivalent of a dictionary or tuple is in Go 😅
Thanks for sharing the link about the time-series databases - will give it a read. Interesting idea of yours.
That's a typical caching and cache-invalidation problem - let me think about it a bit more.
A better way to give small numbers to connections would be having something like a number pool. What I mean is that on each new connection, we loop from the smallest number upwards and take the first one that is not currently assigned to another connection. This method will ensure the numbers remain unique for each connection's entire lifespan, and after a disconnection that number becomes usable again for the next connection.
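A minimal sketch of such a pool (hypothetical class, just to illustrate the acquire/release behaviour):

```python
# Illustrative sketch of a number pool: hand out the smallest free number on
# connect and return it to the pool on disconnect.
class NumberPool:
    def __init__(self):
        self._in_use = set()

    def acquire(self):
        n = 1
        while n in self._in_use:  # stop at the first free number
            n += 1
        self._in_use.add(n)
        return n

    def release(self, n):
        self._in_use.discard(n)
```

With one pool per common name, the numbers handed out stay as small as the number of simultaneous connections for that client.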
I'm doing a kind of brainstorm right now, but of course, just do whatever you think fits this situation best :) (And btw, thanks for your fast response and action - it's kind of hard to see that on GitHub nowadays)
I like the number pool idea better, as it will give way lower cardinality than using the ports, where each user can theoretically have 65k metrics. Using a number pool, the upper limit is the same as with the ports, but in reality it probably won't be higher than 5-10?
When multiple clients are connected to the server with the same common-name, the collector will panic.
The behaviour should be changed so that the bandwidth gets summed and the connected_since field becomes some aggregation of all the data (like the minimum connection date, or maybe the metric should even be dropped..?)
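For illustration, a minimal sketch of that aggregation (field names follow the issue title and are otherwise hypothetical, not the collector's actual code):

```python
# Illustrative sketch: merge status entries sharing a common_name by summing
# the byte counters and keeping the earliest connected_since.
def aggregate_clients(entries):
    merged = {}
    for entry in entries:
        cn = entry["common_name"]
        if cn not in merged:
            merged[cn] = dict(entry)
        else:
            merged[cn]["bytes_downloaded"] += entry["bytes_downloaded"]
            merged[cn]["bytes_uploaded"] += entry["bytes_uploaded"]
            # Keep the earliest connection time as the aggregate connected_since.
            merged[cn]["connected_since"] = min(
                merged[cn]["connected_since"], entry["connected_since"]
            )
    return list(merged.values())
```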