

All cpld baselines are failing on Derecho #2501

Open
DeniseWorthen opened this issue Nov 18, 2024 · 12 comments
Labels
bug Something isn't working


@DeniseWorthen
Collaborator

Description

The UFS cpld tests run from top-of-develop on Derecho are all failing. A baseline directory is in place (develop-20241112), but Derecho was skipped for the last PR (the WW3 PIO PR). A note was left here saying the baselines were created OK, but apparently they were not.

To Reproduce:

Run top-of-develop on Derecho.

@DeniseWorthen DeniseWorthen added the bug Something isn't working label Nov 18, 2024
@DeniseWorthen
Collaborator Author

@jkbk2004 This is still an issue as of today, running top-of-develop against develop-20241121. How were the baselines generated on Derecho?

@jkbk2004
Collaborator

jkbk2004 commented Dec 3, 2024

@DeniseWorthen Rocoto is at least functional on Derecho. The current baseline for the develop branch is /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs/develop-20241127. Can you test against develop-20241127?

@DeniseWorthen
Collaborator Author

ecFlow is running fine for me, and the baselines still do not compare.

@DeniseWorthen
Collaborator Author

The point of maintaining two months' worth of baselines is that a developer can check out an older hash and run against it.

I'm testing 144ccb0, whose baseline date is 20241121. That baseline exists on Derecho:

ls -lrt  /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT//NEMSfv3gfs

....
drwxr-sr-x 142 fandrade  ncar 16384 Oct 21 14:44 develop-20241011
drwxr-sr-x  49 epicufsrt ncar  4096 Nov  8 11:38 input-data-20240501
drwxrwxr-x 145 epicufsrt ncar 16384 Nov 11 12:44 develop-20241031
drwxr-sr-x 144 epicufsrt ncar 16384 Nov 16 17:46 develop-20241112
drwxr-sr-x   3 epicufsrt ncar  4096 Nov 16 22:21 BM_IC-20220207
drwxr-sr-x 144 epicufsrt ncar 16384 Nov 21 10:05 develop-20241119
drwxr-sr-x 146 epicufsrt ncar 16384 Nov 26 14:11 develop-20241121
drwxr-sr-x 144 epicufsrt ncar 16384 Dec  1 07:44 develop-20241127
drwxr-sr-x 144 epicufsrt ncar 16384 Dec  4 06:51 develop-20241203

However, no logs were posted for that commit at UWM:
[Screenshot taken 2024-12-04 at 12:41:48 PM]

Why does a baseline exist for that commit if it was neither created by nor tested against that commit?

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

The workflow managers on Derecho weren't stable for a while. We tried to recover the baselines sporadically. We started maintaining the RT log again as of the last commit.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

I have no issues testing the develop branch against 20241127.

@DeniseWorthen
Collaborator Author

To reiterate, a developer should expect that checking out a hash and running against a baseline will pass. Why is the baseline present if it was not generated by or tested against that hash?

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

Let us know if you have any issues with the develop branch.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Dec 4, 2024

No! Why else maintain baselines if a developer cannot run against them? This is a fundamental principle.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

The system wasn't stable for a while.

@DeniseWorthen
Collaborator Author

That makes no sense. Was the hash used to generate the associated baseline? Yes or no.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

I reported that the Derecho baseline was fully recovered, with a full test log, as of 20241127. There were system issues before then. I am removing the baselines created during the period affected by the workflow issues.
