Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update nightqa multiprocessing and add new options for skipping some steps #2418

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

akremin
Copy link
Member

@akremin akremin commented Nov 25, 2024

Overview

This resolves Issues #2401 and #2412. It skips over steps where the output file already exists (except for HTML), unless --recompute is given. Previously it raised an error if the file existed and --recompute wasn't set. It also adds two new options, --skip-cal-steps and --skip-dark-steps. If --skip-cal-steps is set then none of the calibration steps are run, even if provided in --steps="list,of,steps". If --skip-dark-steps is set then none of the steps involving calibration darks are performed, even if specified in --steps="list,of,steps". These are both useful when rerunning nightqa after "cleaning up" science tiles on a night in daily operations where the calibrations haven't changed. The morning dark calibration step can be particularly long since it actually runs preproc on the exposure, both of these new commands avoid the need to do that.

This PR also changes it so that the HTML file is overwritten, even if --recompute is not set. This is a slight departure from normal behavior, but I think it's what we want since we want new steps to be reflected in the output HTML.

Finally, this also makes updates to the multiprocessing and preproc steps. For multiprocessing, it uses with multiprocessing.Pool rather than defining a variable as the Pool and then do with pool_variable. This may not have any practical impacts, but felt more robust. The temporary preproc files are also removed. From testing, this appears to be the cause of the job hanging when multiple desi_night_qa runs are done in succession.

Testing

I've run tests in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29
Logs are in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/logs and nightqa outputs in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/nightqa.

SUCCESS: Test of normal operation: night 20241111

export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

This ran in ~6.5 minutes, which is typical, and produced all of the outputs.

SUCCESS: Test that skips existing steps if --recompute not set: night 20241111

export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

No files were overwritten except for the nightqa.html and nightqa.css, and the code didn't exit with an error either.

SUCCESS: Test of --recompute operation: night 20241112

Ran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
Then reran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

All outputs were overwritten as expected. The code took roughly the same amount of time in each circumstance ~7.5 minutes.

SUCCESS: Test of --skip-cal-test operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-cal-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}.log

Produced the expected outputs and didn't produce the cal outputs.

SUCCESS: Test of --skip-dark-steps operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-dark-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_second.log

Reproduced the non-cal outputs, created the non-dark calibration outputs for the first time, and didn't produce the dark outputs.

SUCCESS: Test of --compute-missing-only operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --compute-missing-only --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_third.log

Created the dark outputs, updated the HTML file, and nothing else.

@akremin akremin requested a review from sbailey November 25, 2024 21:45
@coveralls
Copy link

Coverage Status

coverage: 30.144% (-0.001%) from 30.145%
when pulling 9b6cdb0 on nightqa_v29
into 08d4acd on main.

@coveralls
Copy link

Coverage Status

coverage: 30.138% (-0.007%) from 30.145%
when pulling 9b6cdb0 on nightqa_v29
into 08d4acd on main.

Copy link
Contributor

@sbailey sbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, thanks.

Please clarify: Are the --skip-dark-steps and --skip-cal-steps just convenience options instead of having to figure out which of a long list of --steps need to be dropped to achieve the same thing? Normally it is a red flag when options appear to be competing with each other. Thanks for documenting that --skip-BLAH-steps overrides --steps, but is there anything deeper going on here?

@akremin
Copy link
Member Author

akremin commented Nov 27, 2024

Please clarify: Are the --skip-dark-steps and --skip-cal-steps just convenience options instead of having to figure out which of a long list of --steps need to be dropped to achieve the same thing? Normally it is a red flag when options appear to be competing with each other. Thanks for documenting that --skip-BLAH-steps overrides --steps, but is there anything deeper going on here?

They are convenience options as you describe. The mechanism I used to implement them drops specific steps from the list of --steps. This saves the user from having to remember what all the steps are and also saves them from having to type out a long string of text into the arguments, since the default for --steps is a list of all steps.

I agree with you that conflicts are bad, though in this case the flags are very explicit. If someone defines their own list of --steps that includes e.g. morningdark but then also gives --skip-dark-steps, I would assume they copy-pasted a steps string from elsewhere and don't actually want to process the morning dark. That is why I chose to resolve the conflicts this way.

Please let me know if you would like further explanation in script logging or comments in the code, or if you just wanted clarification here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants