
Insufficient information stored to recover parent-child relationship between jobs from their JobStore output docs? #374

Open
mkhorton opened this issue Jul 20, 2023 · 7 comments

Comments

@mkhorton
Member

Please advise if I've misinterpreted the code/docs.

Assume:

  • Storing Job outputs via JobStore.
  • Not using a workflow manager, i.e. using jobflow directly.

For a given document in the JobStore, I can see uuid, and I can also see hosts (which can be used to tell that two Jobs belong to the same Flow). However, as far as I can see, there is no way to recover the dependency relationship between two or more Job output documents. Is this correct?

If correct, is this intended usage? What would be a minimal way to retain this information, without adding a dependency on a specific workflow manager?
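
For reference, this is roughly how I'm inspecting the documents (a minimal sketch against the default JobStore; the job name is just a placeholder):

```python
from jobflow import SETTINGS

# Sketch only: pull one output document from the default JobStore and look at
# what's in it. "my_job" is a placeholder name.
store = SETTINGS.JOB_STORE
store.connect()

doc = store.query_one({"name": "my_job"})
print(doc["uuid"])   # identifies this job
print(doc["hosts"])  # uuids of the Flow(s) this job belongs to
# ...but nothing here tells me which other job uuids this job depends on.
```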

@utf
Member

utf commented Jul 27, 2023

Hi @mkhorton, this is something I've spoken to @gpetretto and @davidwaroquiers about. I believe the only other information you need to resolve the job dependencies is the OutputReferences in the job inputs. These are available through the job.input_references property.

The simplest way to enable this would be:

  1. At the beginning of the job.run function, copy the output of job.input_references. We have to copy them at the beginning because the job.resolve_args function resolves the references in place, so by the end of the function the original input references are no longer available.
  2. Add a new field, "input_references", to the data stored at the end of job.run. E.g., here:
    "uuid": self.uuid,

You should then be able to construct the entire flow (including nested flows) and the dependencies between jobs. The only information that will be missing is the names of the Flows (the names of the jobs are fine). The reason is that we don't store flows in the database directly.
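
To illustrate, once such a field exists, recovering the parent → child edges from the store could look something like this (a sketch only; it assumes the new "input_references" field stores the uuids of the referenced jobs):

```python
from collections import defaultdict

def build_dependency_graph(store):
    """Map each job uuid to the uuids of the jobs that depend on it.

    Sketch only: assumes Job.run has been updated to store an
    "input_references" field containing the uuids of the OutputReferences
    found in the job's inputs.
    """
    children_of = defaultdict(set)
    for doc in store.query(properties=["uuid", "input_references"]):
        for parent_uuid in doc.get("input_references", []):
            children_of[parent_uuid].add(doc["uuid"])
    return children_of
```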

@mkhorton
Member Author

mkhorton commented Aug 3, 2023

Thanks for the reply @utf, good to know I wasn't missing anything obvious.

I'll see if I can make a PR to add this, unless @gpetretto or @davidwaroquiers are already working on it? If it'd be welcome, I'd quite like to add a pydantic.BaseModel to describe the JobStore document format too.
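
Something along these lines, perhaps (just a sketch; the field names are my guesses at what Job.run stores today, plus the input_references field proposed above, so they'd need checking against the actual code):

```python
from datetime import datetime
from typing import Any, Optional

from pydantic import BaseModel, Field


class JobStoreDocument(BaseModel):
    """Sketch of a schema for JobStore output documents (field names guessed)."""

    uuid: str                                 # unique identifier of the job
    index: int = 1                            # run index of the job
    name: Optional[str] = None                # job name
    output: Any = None                        # serialized job output
    completed_at: Optional[datetime] = None
    metadata: dict = Field(default_factory=dict)
    hosts: list[str] = Field(default_factory=list)             # uuids of enclosing Flow(s)
    input_references: list[str] = Field(default_factory=list)  # proposed: parent job uuids
```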

@utf
Member

utf commented Aug 14, 2023

A PR would be very welcome. And yes, agreed that we should have a document model for the job store document.

@davidwaroquiers
Contributor

It would indeed be very useful to be able to "reconstruct" the Flow(s) after they have run (or while they are running) in order to visualize them. We have indeed already discussed this but haven't started working on it. This issue also falls within a set of other features that would be nice to have and are somewhat interconnected. I would like to raise the idea of having a meeting with the most active developers/contributors in order to list these out and plan the short/mid-term developments. @utf, what do you think?

@mcgalcode
Contributor

@mkhorton did you end up starting work on this? I offered to make some contributions to jobflow and would love to tackle this one; I'm planning to start working on it now. Happy to hold off or coordinate, though, if you have any concerns or a WIP.

@mkhorton
Member Author

By all means, Max, go ahead! I do not have a WIP. Let me know if you run into any problems, though (perhaps open a PR early so anyone interested can comment?).

@mcgalcode
Contributor

Sounds good Matt! Early PR is a good idea for sure.
