Renew SSL certificate for updates.jenkins-ci.org #3500

Closed
MarkEWaite opened this issue Apr 8, 2023 · 12 comments

@MarkEWaite

Service(s)

Update center

Summary

The Jenkins update center is available from two different URLs: https://updates.jenkins.io and https://updates.jenkins-ci.org.

The SSL certificate on the first URL https://updates.jenkins.io expires May 9, 2023. That is 31 days from now. If we renew it within the next week, that will be a comfortable margin for renewal.

The SSL certificate on the second URL https://updates.jenkins-ci.org expires May 1, 2023. That is 23 days from now. If we renew it within the next week, that will be a comfortable margin for renewal.

Since both URLs are served by the same machine, it is a little surprising that the SSL certificates expire at different times.
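
The margins quoted above can be reproduced with a small date helper. This is only a sketch: the function name days_between is made up for illustration, and it assumes GNU date (the -d option), which is what a typical Linux host provides.

```shell
# Whole days between two dates, computed in UTC to avoid DST skew.
# Assumes GNU date (-d); BSD/macOS date uses different flags.
days_between() {
  start="$(date -u -d "$1" +%s)" || return 2
  end="$(date -u -d "$2" +%s)" || return 2
  echo $(( (end - start) / 86400 ))
}

days_between 2023-04-08 2023-05-09   # updates.jenkins.io, prints 31
days_between 2023-04-08 2023-05-01   # updates.jenkins-ci.org, prints 23
```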

Reproduction steps

  1. Open a web browser to https://updates.jenkins.io and check the SSL certificate expiration date. My browser reports it as 9 May 2023.
  2. Open a web browser to https://updates.jenkins-ci.org and check the SSL certificate expiration date. My browser reports it as 1 May 2023.
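
The same check can be scripted instead of clicked through in a browser. A minimal sketch, assuming openssl is installed and the host is reachable from where it runs; the function name cert_dates is hypothetical:

```shell
# Print notBefore/notAfter for the certificate a host serves.
# cert_dates is an illustrative name, not an existing tool.
cert_dates() {
  if [ -z "${1:-}" ]; then
    echo "usage: cert_dates HOST [PORT]" >&2
    return 2
  fi
  host="$1"
  port="${2:-443}"
  # -servername sends SNI so the right virtual host certificate is returned.
  echo Q | openssl s_client -servername "$host" -connect "$host:$port" 2>/dev/null \
    | openssl x509 -noout -dates
}

# Example calls (need network access, so they are left commented out):
# cert_dates updates.jenkins.io
# cert_dates updates.jenkins-ci.org
```
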
@MarkEWaite MarkEWaite added the triage Incoming issues that need review label Apr 8, 2023
@github-actions

github-actions bot commented Apr 8, 2023

Take a look at these similar issues to see if there isn't already a response to your problem:

  1. 73% Valid ssl certificate for cert.ci.jenkins.io #3337
  2. 70% Valid ssl certificate for trusted.ci.jenkins.io #3091

@dduportal
Contributor

Let's check again on Monday 10 April: the certbot renew command runs once a day, at 06:00 UTC, and is expected to renew certificates 1 month before expiration.

@MarkEWaite
Author

Worth another check on Tuesday 11 April 2023, since that will be 29 days before the expiration of the updates.jenkins.io certificate.

As of 22:40 UTC 10 April 2023 the certificate is not renewed.

@dduportal dduportal added this to the infra-team-sync-2023-04-11 milestone Apr 11, 2023
@dduportal dduportal removed the triage Incoming issues that need review label Apr 11, 2023
@dduportal
Contributor

Operations done with the help of @smerle33 earlier today, on the VM pkg.origin.jenkins.io (which hosts the 2 updates.jenkins* services):

  • We checked the letsencrypt installation: both certificates were present in /etc/letsencrypt/live, as symlinks. This is expected.
  • The crontab is present and installed (command certbot renew -q) as expected (per https://github.com/jenkins-infra/jenkins-infra/blob/b92ba6e7d1e99c9c157af1a2a77a12514be45072/hieradata/common.yaml#L89-L91)
  • We ran certbot renew --dry-run to ensure that the toolchain is present and valid (certbot CLI, the certbot plugins including apache and dns-azure, the expected python installation, etc.) => the dry run reported success (i.e. all certificates were renewed against the Let's Encrypt staging environment, without being persisted).
    • ⚠️ As @smerle33 reminded us, the --dry-run flag does NOT check whether a certificate is due for renewal (1 month before expiration): it runs on all certificates whether or not they have to be renewed.
  • Since the crontab runs certbot in quiet mode, we are blind.
  • The Let's Encrypt log in /var/log/letsencrypt/letsencrypt.log had already been overwritten by our dry run test.
  • Quiet mode hides most of the visible errors when they are caused by an external problem.
  • So we ran the certbot renew command manually on the VM:
    • The certificates for pkg.origin.jenkins* weren't renewed since they had already been renewed on 17 March 2023. This is expected (not in the "1 month before expiration" time window) ✅
    • Both updates.jenkins.io and updates.jenkins-ci.org certificates were renewed with the same new expiration date: 10 July 2023 ✅
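
Because --dry-run ignores the renewal window, a separate check is needed to know whether a given certificate is actually due. A hedged sketch using openssl x509 -checkend; the function name renewal_due is made up, the 30-day default mirrors the "1 month before expiration" window mentioned above, and the certificate path in the example is an assumption about how the lineage is named under /etc/letsencrypt/live:

```shell
# Exit status: 0 = expires within the window (renewal due),
#              1 = still outside the window,
#              2 = certificate file cannot be read.
# Note: an unparsable PEM file is also reported as "due" by this sketch.
renewal_due() {
  cert="$1"
  window_days="${2:-30}"
  [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -checkend "$(( window_days * 86400 ))" -noout -in "$cert" >/dev/null; then
    return 1
  fi
  return 0
}

# Example (assumed path, left commented out):
# renewal_due /etc/letsencrypt/live/updates.jenkins.io/cert.pem && echo "renewal due"
```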

@dduportal
Contributor

dduportal commented Apr 11, 2023

Next steps before closing this issue:

  • Enable the "log renewal to syslog" option we found today (ref. https://forge.puppet.com/modules/puppet/letsencrypt/reference#cron_output) on all the Puppet-managed letsencrypt installations
    • Rationale: that should remove the -q flag and keep a history of the renewal logs, making us less blind when a renewal fails
  • Add a jenkins-infra-team calendar event to check the next renewal, on 11 or 12 June 2023
    • Rationale: we are not sure why it failed. Most probably due to the numerous python/certbot config changes. Let's check again next quarter to avoid being caught off guard

@smerle33
Contributor

Event in place for Monday 12 June.

@dduportal
Contributor

dduportal commented Apr 11, 2023

@dduportal
Contributor

  • Cleaned up the following machines (which regenerated some certificates): cert.ci.jenkins.io, trusted.ci.jenkins.io, ci.jenkins.io and usage.jenkins.io with the following process:
cp -r /etc/letsencrypt /root/bkp-letsencrypt-20230411
apt-get remove --purge certbot
cp -r /root/bkp-letsencrypt-20230411 /etc/letsencrypt
certbot renew
systemctl reload apache2
systemctl restart apache2
  • Also fully removed any letsencrypt and certbot config and package on lettuce.jenkins.io.

@dduportal
Contributor

Reopening as the certificate has, again, not been renewed.

With @smerle33 we diagnosed the following elements:

So we hacked the crontab a bit on the pkg VM:

  • Disabled the puppet agent
  • Edited the root crontab with crontab -e to write the output to a log file, capture stderr along with stdout, remove the quiet flag, and run it at a time that fit our analysis (21 16 * * * certbot renew >/var/log/certbot-debug.log 2>&1)

=> it surfaced the following error: error: unknown command "renew", see 'snap help'.
=> trying 28 16 * * * which certbot >/var/log/certbot-debug.log 2>&1 showed that /usr/bin/certbot is used.

We realized that, on some machines, the /usr/bin/certbot file exists and is a symlink to /usr/bin/snap, which explains the error. Most probably a leftover from my attempts to use the snap package for certbot :'(

The symlink was removed from the following machines yesterday, and we'll wait today to see if the certificates are now renewed:

  • pkg (update center)
  • usage
  • archives
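
The symlink diagnosis above could be scripted so it can be repeated on each machine. A minimal sketch, assuming GNU readlink (the -f option); check_snap_shim is a hypothetical helper name:

```shell
# Show where a command in PATH really points and flag a snap shim.
# Exit status: 0 = resolves to a "snap" binary, 1 = does not, 2 = not in PATH.
check_snap_shim() {
  cmd_path="$(command -v "$1")" || { echo "$1: not found in PATH" >&2; return 2; }
  real_path="$(readlink -f "$cmd_path")"
  echo "$cmd_path -> $real_path"
  case "$real_path" in
    */snap) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (left commented out):
# check_snap_shim certbot && echo "certbot is a leftover snap symlink"
```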

@dduportal
Contributor

Checking the renewal today on pkg VM: no renewal.

After another crontab hack, 34 10 * * * certbot plugins --text > /var/log/certbot-renew.log 2>&1, the following error surfaced: /bin/sh: 1: certbot: not found

=> we should check that /usr/local/bin is part of the crontab's PATH
=> and/or we should stop relying on the Puppet module and manage the crontab entry ourselves to benefit from:

  • Controlling the binary used (using the full path)
  • Controlling the command arguments (no quiet mode)
  • Controlling the logs (to surface these errors without requiring hacking)
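
The three points above could be combined in a small wrapper that cron calls. This is a sketch under stated assumptions, not the crontab that was eventually applied: the binary location /usr/local/bin/certbot is only inferred from the PATH remark above, the log file name reuses the one from the debugging session, and run_renew is a made-up function name.

```shell
# Assumed defaults; override CERTBOT_BIN and LOG_FILE to match the host.
CERTBOT_BIN="${CERTBOT_BIN:-/usr/local/bin/certbot}"
LOG_FILE="${LOG_FILE:-/var/log/certbot-renew.log}"

ts() { date -u '+%Y-%m-%dT%H:%M:%SZ'; }

# Run "certbot renew" by full path, without -q, appending stdout and stderr
# to the log so failures stay visible after the fact.
run_renew() {
  if [ ! -x "$CERTBOT_BIN" ]; then
    echo "$(ts) ERROR: $CERTBOT_BIN is missing or not executable" >>"$LOG_FILE"
    return 127
  fi
  echo "$(ts) starting: $CERTBOT_BIN renew" >>"$LOG_FILE"
  rc=0
  "$CERTBOT_BIN" renew >>"$LOG_FILE" 2>&1 || rc=$?
  echo "$(ts) finished with exit code $rc" >>"$LOG_FILE"
  return "$rc"
}
```

A root crontab line could then source this file and call the function at the usual 06:00 UTC slot, for example 0 6 * * * . /usr/local/lib/certbot-renew.sh && run_renew (the file path is hypothetical). Because the binary is referenced by full path, the entry does not depend on cron's PATH.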

@dduportal
Contributor

New crontab applied, let's wait for tomorrow to check for certificates renewal

@dduportal
Contributor

Confirmed that the certificates were renewed successfully and automatically earlier today:

➜  SERVER_NAME=updates.jenkins.io; PORT=443; echo -n Q | openssl s_client -servername {SERVER_NAME} -connect {SERVER_NAME}:{PORT} | openssl x509 -noout -dates
getaddrinfo: nodename nor servname provided, or not known
connect:errno=22
unable to load certificate
8351849984:error:09FFF06C:PEM routines:CRYPTO_internal:no start line:/AppleInternal/Library/BuildRoots/ff32e6fb-db00-11ed-a068-428477786501/Library/Caches/com.apple.xbs/Sources/libressl/libressl-3.3/crypto/pem/pem_lib.c:694:Expecting: TRUSTED CERTIFICATE

➜ SERVER_NAME=pkg.origin.jenkins.io; echo -n Q | openssl s_client -servername ${SERVER_NAME} -connect ${SERVER_NAME}:443 | openssl x509 -noout -dates
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = pkg.origin.jenkins.io
verify return:1
DONE
notBefore=Jun 22 05:05:18 2023 GMT
notAfter=Sep 20 05:05:17 2023 GMT

➜ SERVER_NAME=updates.jenkins-ci.org; echo -n Q | openssl s_client -servername ${SERVER_NAME} -connect ${SERVER_NAME}:443 | openssl x509 -noout -dates
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = updates.jenkins-ci.org
verify return:1
DONE
notBefore=Jun 22 05:05:40 2023 GMT
notAfter=Sep 20 05:05:39 2023 GMT

Please note that ci.jenkins.io certificates were also renewed (same reasons, same blockage, same fix):

➜ SERVER_NAME=ci.jenkins.io; echo -n Q | openssl s_client -servername ${SERVER_NAME} -connect ${SERVER_NAME}:443 | openssl x509 -noout -dates
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = ci.jenkins.io
verify return:1
DONE
notBefore=Jun 22 05:00:45 2023 GMT
notAfter=Sep 20 05:00:44 2023 GMT
