Standardized option for short jobs #546

Open
gpetretto opened this issue Feb 16, 2024 · 5 comments

@gpetretto (Contributor) commented Feb 16, 2024

In several Flows there is the need to define small, short Jobs. In atomate2, for example, there are Jobs that just calculate a supercell or generate perturbations. While these require minimal computational effort, identifying them and tuning their execution can be quite annoying.
For this reason I would be interested in defining a standard way of marking such jobs, so that managers can then automatically optimize their execution. In the case of jobflow-remote there could be an internal local worker that automatically executes those jobs. A similar helper could probably be defined for the fireworks manager as well.
The key point is that the Flow developer should be able to select these jobs directly, instead of leaving the burden on the user. I am not sure what the best way of doing that would be. I was thinking of a new Job (or JobConfig) attribute like fast, short or small that is False by default, so that it would be easily set and easily retrieved by the manager. For example:

from jobflow import job

@job(small=True)
def sum(a, b):
    return a + b
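
On the manager side the flag would then be trivial to check. A minimal sketch of what I mean (assuming the flag ends up on job.config; nothing here exists in jobflow yet):

def select_small_jobs(jobs):
    # "jobs" is whatever flat list of Job objects the manager already has;
    # "small" is the hypothetical flag proposed above, defaulting to False.
    return [j for j in jobs if getattr(j.config, "small", False)]

A manager like jobflow-remote could then route the selected jobs to a dedicated local worker instead of submitting them to the queue.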

Any comments or ideas about this feature?

@ml-evs (Contributor) commented Feb 16, 2024

Funnily enough, I had also been implementing this, but in jobflow-remote directly (and was discussing it earlier today with @VicTrqt, who was running into similar issues). I've started adding the option of a profile or exec_profile attribute as a free-text value, e.g.,

@job(profile="analysis")

or

@job(profile="postprocessing")

which can then be used in the jobflow-remote config to specify a default worker and exec config for jobs that match the profile. If this could be standardized at the jobflow level it would be super helpful, as other managers could also make use of it.
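
To make it concrete, this is roughly the lookup I would expect a manager to perform (the mapping and names below are made up for illustration, not the actual jobflow-remote config schema):

# Illustrative only: a profile -> worker mapping the manager could read from
# its own configuration; none of these keys come from jobflow-remote itself.
PROFILE_WORKERS = {
    "analysis": "local_shell",
    "postprocessing": "local_shell",
    "dft": "hpc_slurm",
}

def resolve_worker(job, default="hpc_slurm"):
    # "profile" is the hypothetical free-text attribute discussed above.
    profile = getattr(job.config, "profile", None)
    return PROFILE_WORKERS.get(profile, default)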

The same issue comes up with, e.g., jobs that require (or can at least make use of) GPUs. I don't know whether it is necessary to have a set of "known" profiles or whether this can be handled by convention (either way, the user probably has to choose the appropriate resources for a 'small' job).

@gpetretto (Contributor, Author) commented

Nice! I like the idea, as it allows more flexibility. On the other hand, it requires a bit more configuration work from the user. However, instructions for a standard worker that covers these cases could be provided in the documentation.

An additional point, which concerns jobflow-remote more specifically, is that it may be necessary to know whether a job can be executed with just its inputs or whether it needs access to files produced by previous jobs. For example, this function in atomate2: https://github.com/materialsproject/atomate2/blob/7f4d5a60d427295dee3a0f6a9b87deb5f47d7f8a/src/atomate2/common/jobs/defect.py#L187 is clearly something that could be executed quickly, but I think it needs to run on the machine where the previous jobs were executed. I am not sure if there is an easy way to define or identify these kinds of jobs; maybe another flag could at least mark them, as sketched below.
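
Something along these lines, perhaps (the extra attribute name is purely hypothetical):

from jobflow import job

# Hypothetical flags: "small" from the proposal above, plus a marker telling
# the manager that this job reads files written by its parent jobs and thus
# has to run where those parents ran (or fetch their files first).
@job(small=True, needs_parent_files=True)
def analyze_previous_outputs(prev_dir):
    ...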

In any case, I believe this needs to be implemented directly in jobflow to be effective.

@ml-evs (Contributor) commented Feb 19, 2024

Files are definitely a big blocker for me too; I'm not sure how to approach this with the current API (I have played around a bit with additional stores, but it doesn't quite make sense to me). Being able to launch a job from the context of an older job (as resolved by the manager) would be very helpful, as would resolving dependencies on data present in additional stores.

@Andrew-S-Rosen (Member) commented

Absolutely amazing idea, and I love the design proposed by @ml-evs. The way I have been getting around this with FireWorks is very hacky...

@Andrew-S-Rosen (Member) commented

One caveat here: if you send some jobs to the local compute resource, this would require all runtime dependencies to also be present there (which may not necessarily be the case).
