Emit lifetimeRemaining metrics for each renewal, by UA #7792

Open
jcjones opened this issue Nov 7, 2024 · 0 comments
jcjones commented Nov 7, 2024

As discussed in September, we'd like to have metrics about renewal patterns by user agent (UA). The main question we'd like to answer is which UAs commonly renew at a fixed offset from expiration or issuance, at a fixed percentage of lifetime, on a particular day of the month or week, or on some other schedule.

The minimum information needed would be:

  1. How many days (or seconds) of lifetime remain in the certificate being renewed, and
  2. How many days (or seconds) of lifetime the certificate being renewed had at issuance.

Boulder is the only system component that currently has a chance to understand that a given Order is renewing a certificate, so such metrics would need to come from Boulder.

These metrics could be emitted as Prometheus metrics. However, that gets complicated because we also want them broken down by User Agent in some way, and we can't use raw UA strings as Prometheus label values because of cardinality. A possible option would be to add a loadable regular expression map to the WFE, so we can write regexes to match against UAs and rewrite them as specified in the map (e.g. {'^lego-cli/': 'lego'}). UA strings that have no matches would be rendered as other.
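
To make the regex-map idea concrete, here is a minimal sketch of the rewriting step using only Go's standard library. The names `uaRule`, `compileUARules`, and `normalizeUA` are hypothetical, as are the `example-client` rule and the UA strings in `main`; only the `^lego-cli/` to `lego` mapping comes from the example above. It assumes rules are kept in an ordered list so that the first match wins, which is a design choice this issue doesn't specify.

```go
package main

import (
	"fmt"
	"regexp"
)

// uaRule maps a compiled pattern to the low-cardinality label that replaces
// any UA string it matches. Hypothetical type, not part of Boulder.
type uaRule struct {
	pattern *regexp.Regexp
	label   string
}

// compileUARules compiles (pattern, label) pairs in order. An ordered slice
// is used instead of a map so that matching is deterministic when more than
// one pattern could match the same UA.
func compileUARules(raw [][2]string) ([]uaRule, error) {
	rules := make([]uaRule, 0, len(raw))
	for _, r := range raw {
		re, err := regexp.Compile(r[0])
		if err != nil {
			return nil, fmt.Errorf("compiling UA pattern %q: %w", r[0], err)
		}
		rules = append(rules, uaRule{pattern: re, label: r[1]})
	}
	return rules, nil
}

// normalizeUA returns the label of the first matching rule, or "other" when
// nothing matches, so the set of possible label values stays bounded.
func normalizeUA(rules []uaRule, ua string) string {
	for _, r := range rules {
		if r.pattern.MatchString(ua) {
			return r.label
		}
	}
	return "other"
}

func main() {
	rules, err := compileUARules([][2]string{
		{`^lego-cli/`, "lego"},
		{`^example-client/`, "example"}, // made-up rule for illustration
	})
	if err != nil {
		panic(err)
	}
	// Made-up UA strings for illustration.
	for _, ua := range []string{"lego-cli/4.0.0", "example-client/1.2", "curl/8.0.0"} {
		fmt.Printf("%q -> %s\n", ua, normalizeUA(rules, ua))
	}
}
```

The returned label, rather than the raw UA, is what would be used as the Prometheus label value, so the number of series is capped at the number of rules plus one.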

With those trade-offs in mind, I would also propose that we could add this information to the Boulder log when fulfilling an order that appears to be a renewal. Then we can handle user-agent cardinality during log analysis and not have to open that can of worms in Boulder. It's unfortunate that we'd need an extra parsing step to get any data out. On the other hand, UA rewriting might actually hide information we need, such as clients that behave differently across versions, or significant differences between large integrators, which we likely would not include as a dimension in the Prometheus version.

Finally, one could imagine doing both the Prometheus-with-UA-rewriting and the logging versions, so we have data both ways.

One additional note: since the UA is only known to the WFE, and the state of the to-be-renewed certificate is processed in the RA, data is going to have to move between the two one way or another to accomplish this. I don't know that it makes any difference to the data processing.

@jsha jsha self-assigned this Nov 8, 2024
@aarongable aarongable added this to the Sprint 2024-11-05 milestone Nov 8, 2024
@jsha jsha modified the milestones: Sprint 2024-11-05, 2024-11-12 Nov 12, 2024