-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update nightqa multiprocessing and add new options for skipping some steps #2418
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, thanks.
Please clarify: Are the --skip-dark-steps
and --skip-cal-steps
just convenience options instead of having to figure out which of a long list of --steps
need to be dropped to achieve the same thing? Normally it is a red flag when options appear to be competing with each other. Thanks for documenting that --skip-BLAH-steps
overrides --steps
, but is there anything deeper going on here?
They are convenience options as you describe. The mechanism I used to implement them drops specific steps from the list of I agree with you that conflicts are bad, though in this case the flags are very explicit. If someone defines their own list of Please let me know if you would like further explanation in script logging or comments in the code, or if you just wanted clarification here. |
Overview
This resolves Issues #2401 and #2412. It skips over steps where the output file already exists (except for HTML), unless
--recompute
is given. Previously it raised an error if the file existed and--recompute
wasn't set. It also adds two new options,--skip-cal-steps
and--skip-dark-steps
. If--skip-cal-steps
is set then none of the calibration steps are run, even if provided in--steps="list,of,steps"
. If--skip-dark-steps
is set then none of the steps involving calibration darks are performed, even if specified in--steps="list,of,steps"
. These are both useful when rerunning nightqa after "cleaning up" science tiles on a night in daily operations where the calibrations haven't changed. The morning dark calibration step can be particularly long since it actually runs preproc on the exposure, both of these new commands avoid the need to do that.This PR also changes it so that the HTML file is overwritten, even if
--recompute
is not set. This is a slight departure from normal behavior, but I think it's what we want since we want new steps to be reflected in the output HTML.Finally, this also makes updates to the multiprocessing and preproc steps. For multiprocessing, it uses
with multiprocessing.Pool
rather than defining a variable as the Pool and then dowith pool_variable
. This may not have any practical impacts, but felt more robust. The temporary preproc files are also removed. From testing, this appears to be the cause of the job hanging when multiple desi_night_qa runs are done in succession.Testing
I've run tests in
/global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29
Logs are in
/global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/logs
and nightqa outputs in/global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/nightqa
.SUCCESS: Test of normal operation: night 20241111
export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
This ran in ~6.5 minutes, which is typical, and produced all of the outputs.
SUCCESS: Test that skips existing steps if
--recompute
not set: night 20241111export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
No files were overwritten except for the nightqa.html and nightqa.css, and the code didn't exit with an error either.
SUCCESS: Test of
--recompute
operation: night 20241112Ran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
Then reran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
All outputs were overwritten as expected. The code took roughly the same amount of time in each circumstance ~7.5 minutes.
SUCCESS: Test of
--skip-cal-test
operation: night 20241113export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-cal-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}.log
Produced the expected outputs and didn't produce the cal outputs.
SUCCESS: Test of
--skip-dark-steps
operation: night 20241113export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-dark-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_second.log
Reproduced the non-cal outputs, created the non-dark calibration outputs for the first time, and didn't produce the dark outputs.
SUCCESS: Test of
--compute-missing-only
operation: night 20241113export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --compute-missing-only --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_third.log
Created the dark outputs, updated the HTML file, and nothing else.