where to find inputs of a job #663

Open
zhubonan opened this issue Aug 6, 2024 · 5 comments

Comments

@zhubonan

zhubonan commented Aug 6, 2024

I am wondering where the inputs of a job get stored. It seems that the job store only stores the output of a job, which is also what JobStoreDocument suggests.
When I use jobflow-remote, the inputs are stored in its own collection. But that does not seem to be a collection aimed at long-term storage (or I could be wrong).

(Sorry, I accidentally hit Ctrl+Enter while typing previously.)

@zhubonan zhubonan changed the title from "Inputs of a job" to "where to find inputs of a job" Aug 6, 2024
@utf
Member

utf commented Aug 6, 2024

Currently, the inputs aren't stored when using run_locally. There is a WIP PR that aims to store the input references only (not the associated args/kwargs that these inputs correspond to). See #425.

The jobflow-remote storage should be seen as robust, however. I can see an argument for storing the job inputs in full, although this would result in a lot of duplicated data when using the Fireworks or jobflow-remote managers.
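To make the distinction between storing input references only and storing the inputs in full more concrete, here is a minimal plain-Python sketch. It does not use the jobflow API: `OutputRef`, `collect_refs`, `to_plain`, `input_record` and the record layout are all hypothetical names chosen for illustration, and the actual design in #425 may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OutputRef:
    """Hypothetical placeholder pointing at the output of another job."""

    uuid: str


def collect_refs(value: Any) -> list:
    """Recursively gather the uuid of every OutputRef nested in `value`."""
    if isinstance(value, OutputRef):
        return [value.uuid]
    if isinstance(value, dict):
        return [u for v in value.values() for u in collect_refs(v)]
    if isinstance(value, (list, tuple)):
        return [u for v in value for u in collect_refs(v)]
    return []


def to_plain(value: Any) -> Any:
    """Turn inputs into JSON-like data, replacing each OutputRef by {"@ref": uuid}."""
    if isinstance(value, OutputRef):
        return {"@ref": value.uuid}
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


def input_record(uuid: str, args: tuple, kwargs: dict, full: bool = False) -> dict:
    """Build a (hypothetical) document describing the inputs of one job.

    With full=False only the references to upstream jobs are kept;
    with full=True the args/kwargs themselves are stored as well.
    """
    record = {"uuid": uuid, "input_refs": collect_refs([list(args), kwargs])}
    if full:
        record["args"] = to_plain(list(args))
        record["kwargs"] = to_plain(kwargs)
    return record


refs_only = input_record("job-b", (OutputRef("job-a"), 2), {"scale": 1.5})
full = input_record("job-b", (OutputRef("job-a"), 2), {"scale": 1.5}, full=True)
```

In this sketch, `refs_only` records that `job-b` depends on `job-a` but not that it was called with `2` and `scale=1.5`, whereas `full` keeps both. The second form is what would duplicate data already held by a manager that stores the inputs itself.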

@zhubonan
Author

zhubonan commented Aug 6, 2024

Hmm, I see. The use case I have in mind is reconstructing the flow after it has run, which can be handy for reproducibility and, in practice, makes it easier to rerun a flow with minimal changes without the original submission script.

Is it something that is already possible with jobflow-remote?

@utf
Member

utf commented Aug 6, 2024

Yes, I believe that is possible using jobflow-remote. Although perhaps @gpetretto could comment?

@gpetretto
Contributor

I am not entirely sure I understood correctly the use case you have in mind.
If you just want to see the inputs used for each Job, these are indeed stored in jobflow-remote, and you can see the connections between the jobs.
Once a Flow is submitted (and executed), it would also be possible to modify some inputs and rerun the same job. However, if you would instead like to generate a new flow with modified inputs and submit it, this is not possible at the moment. It should be possible to reconstruct the inputs and connections from the data in the DB to generate a new Flow object, but I see some potential issues:

  • For each job in the DB, the inputs are available. But it might not be possible to distinguish between jobs that were there from the beginning and those that have been dynamically generated.
  • It will not be possible to get back the Flow Maker, so it will only be possible to modify the Flow object.

I think it might be interesting to add functionality to reconstruct the initial Flow object, but I am wondering if we should consider optionally storing the whole Flow input in a separate collection.
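As a rough illustration of the reconstruction idea described above, here is a plain-Python sketch that rebuilds the execution order from stored input records and reruns the jobs with one input changed. It is not the jobflow-remote schema or API: the record layout, the `"@ref"` convention, the `FUNCTIONS` registry and the `rerun` helper are all assumptions made for the example. It also sidesteps both issues listed above, since every job is treated as static and no Maker is recovered.

```python
from graphlib import TopologicalSorter

# Hypothetical registry mapping stored function names to callables.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "double": lambda x: 2 * x,
}

# Hypothetical stored records: one per job, with its inputs.
# {"@ref": uuid} marks an input that is the output of another job.
records = [
    {"uuid": "j2", "function": "double", "args": [{"@ref": "j1"}]},
    {"uuid": "j1", "function": "add", "args": [1, 2]},
]


def is_ref(arg):
    """Return True if `arg` is a reference to another job's output."""
    return isinstance(arg, dict) and "@ref" in arg


def rerun(records, overrides=None):
    """Re-execute the jobs in dependency order.

    `overrides` maps a job uuid to a replacement list of args. The
    dependency graph is built from the stored args, so overrides are
    assumed not to introduce references to additional jobs.
    """
    overrides = overrides or {}
    by_uuid = {r["uuid"]: r for r in records}
    # Map each job to the set of jobs whose outputs it consumes.
    graph = {
        r["uuid"]: {a["@ref"] for a in r["args"] if is_ref(a)}
        for r in records
    }
    outputs = {}
    for uuid in TopologicalSorter(graph).static_order():
        rec = by_uuid[uuid]
        args = overrides.get(uuid, rec["args"])
        resolved = [outputs[a["@ref"]] if is_ref(a) else a for a in args]
        outputs[uuid] = FUNCTIONS[rec["function"]](*resolved)
    return outputs


original = rerun(records)
modified = rerun(records, overrides={"j1": [10, 2]})
```

Here `original` reproduces the stored run, while `modified` reruns the same graph with the inputs of `j1` changed, which is roughly the "rerun with minimum changes" use case from earlier in the thread.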

@zhubonan
Author

zhubonan commented Aug 8, 2024

I see. Thanks for the detailed answer!


3 participants