Standardize approach for postprocessing data: use builders? #466

mattmcdermott · 2023-08-09T06:46:39Z

A point of confusion when writing a new Flow seems to be how to process the output data, particularly when it involves outputs from multiple jobs. Should we put postprocessing code in a job? Or should we use builders? Currently, there is only one builder in atomate2 (this is the ElasticBuilder):

atomate2/src/atomate2/vasp/builders/elastic.py

Line 20 in 98bbc0e

class ElasticBuilder(Builder):

IMO we should almost always recommend using builders, which is already the standard for MP (and more robust). I took this strategy in the magnetic ordering workflow (WIP #432), where I implemented a builder under common that can be easily adapted to results from any DFT code. This may be a good design pattern (where possible).

Perhaps we should open this up for discussion?

@computron @utf

The text was updated successfully, but these errors were encountered:

utf · 2023-08-14T09:05:14Z

Overview

I think we should support both. Often you want to be able to use the outputs of flow in a subsequent calculation. E.g., the AMSET flow calculates the elastic constant as part of it. To avoid having to run the builders midway during a workflow, the elastic document should be calculated during the flow itself and stored in the database. Afterwards, if you want to recalculate the elastic constants using different settings, or say you calculated some more displacements, then you can run the builder to generate the elastic collection separately.

In my opinion, the cleanest way to do this is to share as much code as possible between the builder and the calculation of properties during the workflow. The design pattern I adopted was to put class methods on the property task documents that construct the document from some input data. The process then works like this:

The workflow knows exactly which documents are needed to construct the property document as it has the full workflow graph. It can then just call the document class method to construct the property.
The builder's job is to decide which calculations are needed to construct the property. This should be all of the logic of the builder, just deciding which calculations to use, extracting the relevant fields from them, and then passing them to the document class method.

Example

As an example, the ElasticTaskDocument has a method from_stresses.

atomate2/src/atomate2/common/schemas/elastic.py

Lines 141 to 153 in 36e791a

    
           def from_stresses( 
        
               cls, 
        
               structure: Structure, 
        
               stresses: List[Stress], 
        
               deformations: List[Deformation], 
        
               uuids: List[str], 
        
               job_dirs: List[str], 
        
               fitting_method: str = SETTINGS.ELASTIC_FITTING_METHOD, 
        
               order: Optional[int] = None, 
        
               equilibrium_stress: Optional[Matrix3D] = None, 
        
               symprec: float = SETTINGS.SYMPREC, 
        
               allow_elastically_unstable_structs: bool = True, 
        
           ):

In the elastic workflow, this is called using the outputs of jobs in the workflow. This construction happens in a job:

atomate2/src/atomate2/vasp/jobs/elastic.py

Lines 280 to 291 in 36e791a

    
           return ElasticDocument.from_stresses( 
        
               structure, 
        
               stresses, 
        
               deformations, 
        
               uuids, 
        
               job_dirs, 
        
               fitting_method=fitting_method, 
        
               order=order, 
        
               equilibrium_stress=equilibrium_stress, 
        
               symprec=symprec, 
        
               allow_elastically_unstable_structs=allow_elastically_unstable_structs, 
        
           )

In the elastic builder, the builder collects the relevant documents and then calls the exact same method to construct the document:

atomate2/src/atomate2/vasp/builders/elastic.py

Lines 266 to 274 in 36e791a

    
           return ElasticDocument.from_stresses( 
        
               structure, 
        
               stresses, 
        
               deformations, 
        
               uuids, 
        
               job_dirs, 
        
               fitting_method=fitting_method, 
        
               symprec=symprec, 
        
           )

I think this way we get the best of both worlds. Once you have written the task document and the class method, it is very little effort to call this method in the workflow. Creating the builder is more effort but none of it is duplicated, as you can simply reuse the same code for constructing the document. My suggestion would be to implement this approach in the magnetic ordering workflow. I also recommended this be implemented in the ferroelectric workflow too. E.g., see this comment: https://github.com/materialsproject/atomate2/pull/196/files#r1143386198

Documentation

If people agree with this approach, I suggest we add a page in the developer documentation about this design decision. Hopefully, once we have a few examples, this approach will be clearer to people.

Obviously I would appreciate input from other developers as to whether there are downsides or issues with this approach.

mattmcdermott · 2023-08-14T18:26:40Z

Excellent write-up! @utf

I agree -- if the schema construction approach is used, this shouldn't be too much extra work to implement for a developer.

My original hesitation with writing a postprocessing job was how to avoid getting stuck in workflows where failure of at least one job is very common (for example, a very unfavorable transformed structure cannot converge during relaxation). As long as we also have the builder, this provides some extra flexibility :)

computron · 2023-08-25T18:49:26Z

We discussed this briefly in the atomate2 meeting on 8/25/23 but weren't able to finish the discussion. A few notes to follow up on here:

@janosh points out that the example schema for elasticity is reproduced in almost the same way in the emmett repo. Is there a reason to not consolidate these?
I brought up some concerns regarding data size and duplication (i.e. data consistency) if these builder documents are stored both within the Jobflow infrastructure and as part of an analysis collection. For 10 runs this is really not a problem to duplicate, for 10,000 it could be.
I wanted some more details about what job_dirs was and whether we had a better way to track directories when runs occur across multiple machines. Note that the emmett version of this scheme does not have job_dirs.

janosh · 2023-08-25T18:59:57Z

Just to clarify, I didn't compare in detail but the ElasticityDoc in emmet-core

https://github.com/materialsproject/emmet/blob/4f2c35976d05369f3947014dbfef6baf98e1500f/emmet-core/emmet/core/elasticity.py#L156-L353

looks similar to ElasticDocument in atomate2. So if there is potential for consolidation, we should pursue it in the same spirit as #486

atomate2/src/atomate2/common/schemas/elastic.py

Lines 111 to 252 in 36e791a

    
           class ElasticDocument(BaseModel): 
        
               """Document containing elastic tensor information and related properties.""" 
        
               structure: Structure = Field( 
        
                   None, description="The structure for which the elastic data is calculated." 
        
               ) 
        
               elastic_tensor: ElasticTensorDocument = Field( 
        
                   None, description="Fitted elastic tensor." 
        
               ) 
        
               eq_stress: Matrix3D = Field( 
        
                   None, description="The equilibrium stress of the structure." 
        
               ) 
        
               derived_properties: DerivedProperties = Field( 
        
                   None, description="Properties derived from the elastic tensor." 
        
               ) 
        
               formula_pretty: str = Field( 
        
                   None, 
        
                   description="Cleaned representation of the formula", 
        
               ) 
        
               fitting_data: FittingData = Field( 
        
                   None, description="Data used to fit the elastic tensor." 
        
               ) 
        
               fitting_method: str = Field( 
        
                   None, description="Method used to fit the elastic tensor." 
        
               ) 
        
               order: int = Field( 
        
                   None, description="Order of the expansion of the elastic tensor." 
        
               ) 
        
               @classmethod 
        
               def from_stresses( 
        
                   cls, 
        
                   structure: Structure, 
        
                   stresses: List[Stress], 
        
                   deformations: List[Deformation], 
        
                   uuids: List[str], 
        
                   job_dirs: List[str], 
        
                   fitting_method: str = SETTINGS.ELASTIC_FITTING_METHOD, 
        
                   order: Optional[int] = None, 
        
                   equilibrium_stress: Optional[Matrix3D] = None, 
        
                   symprec: float = SETTINGS.SYMPREC, 
        
                   allow_elastically_unstable_structs: bool = True, 
        
               ): 
        
                   """ 
        
                   Create an elastic document from strains and stresses. 
        
                   Parameters 
        
                   ---------- 
        
                   structure : .Structure 
        
                       The structure for which strains and stresses were calculated. 
        
                   stresses : list of Stress 
        
                       A list of corresponding stresses. 
        
                   deformations : list of Deformation 
        
                       A list of corresponding deformations. 
        
                   uuids: list of str 
        
                       A list of uuids, one for each deformation calculation. 
        
                   job_dirs : list of str 
        
                       A list of job directories, one for each deformation calculation. 
        
                   fitting_method : str 
        
                       The method used to fit the elastic tensor. See pymatgen for more details on 
        
                       the methods themselves. The options are: 
        
                       - "finite_difference" (note this is required if fitting a 3rd order tensor) 
        
                       - "independent" 
        
                       - "pseudoinverse" 
        
                   order : int or None 
        
                       Order of the tensor expansion to be fitted. Can be either 2 or 3. 
        
                   equilibrium_stress : list of list of float 
        
                       The stress on the equilibrium (relaxed) structure. 
        
                   symprec : float 
        
                       Symmetry precision for deriving symmetry equivalent deformations. If 
        
                       ``symprec=None``, then no symmetry operations will be applied. 
        
                   allow_elastically_unstable_structs : bool 
        
                       Whether to allow the ElasticDocument to still complete in the event that 
        
                       the structure is elastically unstable. 
        
                   """ 
        
                   strains = [d.green_lagrange_strain for d in deformations] 
        
                   if symprec is not None: 
        
                       strains, stresses, uuids, job_dirs = _expand_strains( 
        
                           structure, strains, stresses, uuids, job_dirs, symprec 
        
                       ) 
        
                   deformations = [s.get_deformation_matrix() for s in strains] 
        
                   # -0.1 to convert units from kBar to GPa and stress direction 
        
                   stresses = [-0.1 * s for s in stresses] 
        
                   eq_stress = None 
        
                   if equilibrium_stress: 
        
                       eq_stress = -0.1 * Stress(equilibrium_stress) 
        
                   pk_stresses = [s.piola_kirchoff_2(d) for s, d in zip(stresses, deformations)] 
        
                   if order is None: 
        
                       order = 2 if len(stresses) < 70 else 3  # TODO: Figure this out better 
        
                   if order > 2 or fitting_method == "finite_difference": 
        
                       # force finite diff if order > 2 
        
                       result = ElasticTensorExpansion.from_diff_fit( 
        
                           strains, pk_stresses, eq_stress=eq_stress, order=order 
        
                       ) 
        
                       if order == 2: 
        
                           result = ElasticTensor(result[0]) 
        
                   elif fitting_method == "pseudoinverse": 
        
                       result = ElasticTensor.from_pseudoinverse(strains, pk_stresses) 
        
                   elif fitting_method == "independent": 
        
                       result = ElasticTensor.from_independent_strains( 
        
                           strains, pk_stresses, eq_stress=eq_stress 
        
                       ) 
        
                   else: 
        
                       raise ValueError(f"Unsupported elastic fitting method {fitting_method}") 
        
                   ieee = result.convert_to_ieee(structure) 
        
                   property_tensor = ieee if order == 2 else ElasticTensor(ieee[0]) 
        
                   ignore_errors = bool(allow_elastically_unstable_structs) 
        
                   property_dict = property_tensor.get_structure_property_dict( 
        
                       structure, ignore_errors=ignore_errors 
        
                   ) 
        
                   derived_properties = DerivedProperties(**property_dict) 
        
                   eq_stress = eq_stress.tolist() if eq_stress is not None else eq_stress 
        
                   return cls( 
        
                       structure=structure, 
        
                       eq_stress=eq_stress, 
        
                       derived_properties=derived_properties, 
        
                       formula_pretty=structure.composition.reduced_formula, 
        
                       fitting_method=fitting_method, 
        
                       order=order, 
        
                       elastic_tensor=ElasticTensorDocument( 
        
                           raw=result.voigt.tolist(), ieee_format=ieee.voigt.tolist() 
        
                       ), 
        
                       fitting_data=FittingData( 
        
                           cauchy_stresses=[s.tolist() for s in stresses], 
        
                           strains=[s.tolist() for s in strains], 
        
                           pk_stresses=[p.tolist() for p in pk_stresses], 
        
                           deformations=[d.tolist() for d in deformations], 
        
                           uuids=uuids, 
        
                           job_dirs=job_dirs, 
        
                       ), 
        
                   )

computron added the discussion API design discussions label Aug 21, 2023

mattmcdermott mentioned this issue Sep 5, 2023

Add magnetic orderings workflow #432

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize approach for postprocessing data: use builders? #466

Standardize approach for postprocessing data: use builders? #466

mattmcdermott commented Aug 9, 2023 •

edited

Loading

utf commented Aug 14, 2023

mattmcdermott commented Aug 14, 2023 •

edited

Loading

computron commented Aug 25, 2023

janosh commented Aug 25, 2023

Standardize approach for postprocessing data: use builders? #466

Standardize approach for postprocessing data: use builders? #466

Comments

mattmcdermott commented Aug 9, 2023 • edited Loading

utf commented Aug 14, 2023

Overview

Example

Documentation

mattmcdermott commented Aug 14, 2023 • edited Loading

computron commented Aug 25, 2023

janosh commented Aug 25, 2023

mattmcdermott commented Aug 9, 2023 •

edited

Loading

mattmcdermott commented Aug 14, 2023 •

edited

Loading