-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize approach for postprocessing data: use builders? #466
Comments
OverviewI think we should support both. Often you want to be able to use the outputs of flow in a subsequent calculation. E.g., the AMSET flow calculates the elastic constant as part of it. To avoid having to run the builders midway during a workflow, the elastic document should be calculated during the flow itself and stored in the database. Afterwards, if you want to recalculate the elastic constants using different settings, or say you calculated some more displacements, then you can run the builder to generate the elastic collection separately. In my opinion, the cleanest way to do this is to share as much code as possible between the builder and the calculation of properties during the workflow. The design pattern I adopted was to put class methods on the property task documents that construct the document from some input data. The process then works like this:
ExampleAs an example, the ElasticTaskDocument has a method atomate2/src/atomate2/common/schemas/elastic.py Lines 141 to 153 in 36e791a
In the elastic workflow, this is called using the outputs of jobs in the workflow. This construction happens in a atomate2/src/atomate2/vasp/jobs/elastic.py Lines 280 to 291 in 36e791a
In the elastic builder, the builder collects the relevant documents and then calls the exact same method to construct the document: atomate2/src/atomate2/vasp/builders/elastic.py Lines 266 to 274 in 36e791a
I think this way we get the best of both worlds. Once you have written the task document and the class method, it is very little effort to call this method in the workflow. Creating the builder is more effort but none of it is duplicated, as you can simply reuse the same code for constructing the document. My suggestion would be to implement this approach in the magnetic ordering workflow. I also recommended this be implemented in the ferroelectric workflow too. E.g., see this comment: https://github.com/materialsproject/atomate2/pull/196/files#r1143386198 DocumentationIf people agree with this approach, I suggest we add a page in the developer documentation about this design decision. Hopefully, once we have a few examples, this approach will be clearer to people. Obviously I would appreciate input from other developers as to whether there are downsides or issues with this approach. |
Excellent write-up! @utf I agree -- if the schema construction approach is used, this shouldn't be too much extra work to implement for a developer. My original hesitation with writing a postprocessing job was how to avoid getting stuck in workflows where failure of at least one job is very common (for example, a very unfavorable transformed structure cannot converge during relaxation). As long as we also have the builder, this provides some extra flexibility :) |
We discussed this briefly in the atomate2 meeting on 8/25/23 but weren't able to finish the discussion. A few notes to follow up on here:
|
Just to clarify, I didn't compare in detail but the looks similar to atomate2/src/atomate2/common/schemas/elastic.py Lines 111 to 252 in 36e791a
|
A point of confusion when writing a new
Flow
seems to be how to process the output data, particularly when it involves outputs from multiple jobs. Should we put postprocessing code in a job? Or should we use builders? Currently, there is only one builder in atomate2 (this is theElasticBuilder
):atomate2/src/atomate2/vasp/builders/elastic.py
Line 20 in 98bbc0e
IMO we should almost always recommend using builders, which is already the standard for MP (and more robust). I took this strategy in the magnetic ordering workflow (WIP #432), where I implemented a builder under
common
that can be easily adapted to results from any DFT code. This may be a good design pattern (where possible).Perhaps we should open this up for discussion?
@computron @utf
The text was updated successfully, but these errors were encountered: