ra: unittest failure on main #7812

Open
jsha opened this issue Nov 14, 2024 · 7 comments · Fixed by #7813 · May be fixed by #7868

@jsha
Contributor

jsha commented Nov 14, 2024

We just got this CI failure on main: https://github.com/letsencrypt/boulder/actions/runs/11846470871/job/33014173197

--- FAIL: TestPerformValidation_FailedThenSuccessfulValidationResetsPauseIdentifiersRatelimit (0.29s)
    ra_test.go:1154: err was unexpectedly nil and should not have been
FAIL
@jsha
Contributor Author

jsha commented Nov 15, 2024

Reopening. I'm still seeing this test flake pretty frequently.

@jsha
Contributor Author

jsha commented Nov 15, 2024

In #7820 I'm proposing to delete the test for now, and use this issue to track reinstating it once we figure out the flakiness.

@aarongable aarongable added this to the Sprint 2024-12-03 milestone Dec 3, 2024
@aarongable aarongable assigned aarongable and unassigned jsha Dec 3, 2024
@aarongable
Contributor

This may have been fixed by #7824, but I'll run it a thousand times to make sure.

@aarongable
Contributor

I've run this test, and the other surrounding TestPerformValidation_ unit tests, 1k times and reproduced zero failures.

@aarongable
Contributor

And yet it just showed up in CI again!

https://github.com/letsencrypt/boulder/actions/runs/12149110852/job/33879085426?pr=7859

--- FAIL: TestPerformValidation_FailedThenSuccessfulValidationResetsPauseIdentifiersRatelimit (0.22s)
    ra_test.go:1090: err was unexpectedly nil and should not have been
FAIL
FAIL	github.com/letsencrypt/boulder/ra	9.776s

@aarongable aarongable reopened this Dec 3, 2024
@aarongable
Contributor

Okay, at least I know why I was unable to reproduce the failure: in order to run it a thousand times, I ran it using `docker compose exec boulder go test -count 1000 -run TestPerformValidation ./ra`. But that doesn't set `BOULDER_CONFIG_DIR` in the environment, so this test was getting skipped every time!

boulder/ra/ra_test.go

Lines 994 to 996 in bac5602

if ra.limiter == nil {
	t.Skip("no redis limiter configured")
}

boulder/ra/ra_test.go

Lines 356 to 358 in bac5602

var limiter *ratelimits.Limiter
var txnBuilder *ratelimits.TransactionBuilder
if strings.Contains(os.Getenv("BOULDER_CONFIG_DIR"), "test/config-next") {
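
To make the interaction between the two excerpts above concrete, here is a minimal standalone sketch of the gating logic. It is not Boulder code: `limiterEnabled` and `outcome` are hypothetical names introduced for illustration, and only the `strings.Contains` check and the skip message are taken from the quoted lines.

```go
package main

import (
	"fmt"
	"strings"
)

// limiterEnabled mirrors the condition quoted from ra_test.go: the limiter is
// only set up when the config dir contains "test/config-next".
// (Hypothetical helper name, not part of Boulder.)
func limiterEnabled(configDir string) bool {
	return strings.Contains(configDir, "test/config-next")
}

// outcome is a hypothetical helper reporting what a limiter-dependent test
// would do for a given BOULDER_CONFIG_DIR value.
func outcome(configDir string) string {
	if !limiterEnabled(configDir) {
		return "SKIP: no redis limiter configured"
	}
	return "RUN"
}

func main() {
	// Variable unset, as in the docker compose exec invocation above.
	fmt.Println(outcome(""))
	// A value that satisfies the strings.Contains check.
	fmt.Println(outcome("test/config-next"))
}
```

Without `-v`, `go test` does not list skipped tests individually, so a thousand skipped runs look the same as a thousand passing ones. Running with `-v` prints a `--- SKIP` line for each skipped test and makes the problem visible.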

@aarongable
Copy link
Contributor

This test has a time.Sleep() in it. This is almost certainly the cause of the flake.

boulder/ra/ra_test.go

Lines 1056 to 1057 in bac5602

// Sleep so the RA has a chance to write to the SA
time.Sleep(100 * time.Millisecond)

@aarongable aarongable linked a pull request Dec 3, 2024 that will close this issue