As discussed in September, we'd like to have metrics about renewal patterns by user-agent. The main question we'd like to answer is: which UAs commonly renew at a fixed offset from expiration or issuance, at a percentage of lifetime, on a particular day of the month or week, or on some other pattern?
The minimum information there would be:
- How many days (or seconds) of lifetime remain in the certificate being renewed, and
- How many days (or seconds) of lifetime the certificate being renewed had at issuance.
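To make the two numbers concrete, here is a minimal sketch of how they could be derived from a certificate's validity window and the time of the renewal request. The function and parameter names are hypothetical, and this is Python for illustration only, not Boulder code.

```python
from datetime import datetime, timezone


def renewal_timing(not_before, not_after, renewed_at):
    """Return (remaining_seconds, lifetime_seconds, fraction_elapsed).

    All three arguments are timezone-aware datetimes. The names are
    illustrative and do not correspond to any existing Boulder field.
    """
    lifetime = (not_after - not_before).total_seconds()
    if lifetime <= 0:
        raise ValueError("not_after must be later than not_before")
    remaining = (not_after - renewed_at).total_seconds()
    # Fraction of the lifetime already used when the renewal arrived.
    fraction_elapsed = 1 - remaining / lifetime
    return remaining, lifetime, fraction_elapsed


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expires = datetime(2024, 3, 31, tzinfo=timezone.utc)  # 90-day certificate
    renewed = datetime(2024, 3, 1, tzinfo=timezone.utc)   # 30 days before expiry
    print(renewal_timing(issued, expires, renewed))
```

With both values recorded, the "fixed offset" and "percentage of life" questions can each be answered from the same pair of numbers.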
Boulder is the only system component that currently has a chance to understand that a given Order is renewing a certificate, so such metrics would need to come from Boulder.
These could be emitted as Prometheus metrics. That is complicated, however, because we also want them broken down by User-Agent in some way, and we can't use raw UA strings as Prometheus label values because of cardinality. One option would be to add a loadable regular-expression map to the WFE, so we can write regexes to match against UAs and rewrite them as specified in the map (e.g. `{'^lego-cli/': 'lego'}`). UA strings that match nothing would be rendered as `other`.
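A rough sketch of the rewriting idea, in Python for brevity rather than as proposed WFE code. Only the `^lego-cli/` entry and the `other` fallback come from the description above; the second map entry and the function name are made up for illustration.

```python
import re

# Regex -> label map. In the proposal this would be loaded from config;
# here it is hard-coded. Only the first entry comes from the issue text.
UA_PATTERNS = {
    r"^lego-cli/": "lego",
    r"^example-client/": "example",  # hypothetical entry
}

# Compile once so each request only pays for matching.
_COMPILED = [(re.compile(pattern), label) for pattern, label in UA_PATTERNS.items()]


def normalize_ua(ua):
    """Map a raw User-Agent string to a low-cardinality label."""
    for pattern, label in _COMPILED:
        if pattern.search(ua):
            return label  # first matching pattern wins
    return "other"


if __name__ == "__main__":
    print(normalize_ua("lego-cli/4.0.0"))
    print(normalize_ua("something-else/1.2"))
```

Because the first match wins, the order of entries in the map matters if patterns can overlap, and the label set stays bounded by the number of map entries plus `other`.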
With those trade-offs in mind, I would also propose adding this information to the Boulder log when fulfilling an order that appears to be a renewal. Then we can handle user-agent cardinality during log analysis and not have to open that can of worms in Boulder. It's unfortunate that we'd need an extra parsing step to get any data out, but UA rewriting might actually hide information we need, such as clients that behave differently across versions, or significant differences between large integrators, which we likely would not include as a dimension in the Prometheus version.
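For the logging route, the analysis step could look roughly like the sketch below. It assumes one JSON object per log line with hypothetical field names `ua` and `remaining_s`; Boulder's real log format is different and would need its own parsing first.

```python
import json
from collections import defaultdict
from statistics import median


def remaining_days_by_ua(lines):
    """Group renewal records by raw UA and return median days remaining.

    Each line is assumed to be a JSON object with the hypothetical
    fields "ua" (raw User-Agent) and "remaining_s" (seconds left on
    the certificate being renewed).
    """
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["ua"]].append(record["remaining_s"] / 86400)
    return {ua: median(days) for ua, days in groups.items()}


if __name__ == "__main__":
    sample = [
        '{"ua": "example-client/1.0", "remaining_s": 2592000}',
        '{"ua": "example-client/1.0", "remaining_s": 2678400}',
        '{"ua": "example-client/2.0", "remaining_s": 5184000}',
    ]
    print(remaining_days_by_ua(sample))
```

Keeping the raw UA as the grouping key preserves per-version differences, and any coarser grouping can be applied afterwards without losing the underlying data.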
Finally, one could imagine doing both the Prometheus-with-UA-rewriting and the logging versions, so we have data both ways.
As an additional note: since the UA is only known to the WFE, and the state of the to-be-renewed certificate is processed in the RA, data's going to have to move around one way or another to accomplish this. I don't know that it makes any difference to the data processing.