Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPMorph Flows #938

Open
wants to merge 366 commits into
base: main
Choose a base branch
from

Conversation

BryantLi-BLI
Copy link

@BryantLi-BLI BryantLi-BLI commented Jul 30, 2024

This PR seeks to port the original MPMorph workflows from atomate to atomate2. These workflows are used to generate equilibrated amorphous/crystalline/glassy structures with MD runs via AIMD.

Tagging in @Tinaatucsd, @mcgalcode, @mattmcdermott, @esoteric-ephemera, @vir-k01, @sivonxay because they have used it recently or are apart of the original development team.

Completed

  • Volume equilibration workflow meant to equilibrate structure without NPT via a series of NVT runs (Equilibration + production volume)
  • Code independent equilibration and production workflow, with optional quench methods
  • VASP workflows used in MP papers below recreated in this PR:
  • There is an additional feature with this PR to incorporate MLFF MD runs with the same flow.
  • Multi quench options available:
  • Fast quench (double relax and static)
  • Slow quench (series of MD runs to simulate cooling with different temperature profiles)
  • Started to add tests for VASP
  • MLFF tests are completed
  • Pytests
  • Packmol conversion to pymatgen interface (used to generated initial amorphous structure)
  • Slow quench temperature profiles

MISC

  • Seeking feedback for code, missing features, usability

NOTE: Original PR closed and reopened due to some git commit history issues that caused large commit changes. This PR should have resolved the issue. Many thanks to @mcgalcode.

esoteric-ephemera and others added 30 commits March 1, 2024 15:11
…ucture, allow loading via MontyDecoder. Add check in relaxation for force convergence, attr in Forcefield taskdoc
…eal gas stress contribution only when MD trajectory outputs (temp / velocity) are stored
…0**3; add tests for MD NVE ensemble and specifying MolecularDynamics object as input
@BryantLi-BLI
Copy link
Author

@utf @janosh @JaGeo: Should be ready for review!

Comment on lines 105 to 135
working_outputs["relax"]["energy"] = [
sum(frame[-self.energy_average_frames :]) / self.energy_average_frames
for frame in working_outputs["relax"]["energies"]
]

self.postprocessor.fit(working_outputs)
working_outputs = dict(self.postprocessor.results)
for k in ("pressure", "energy"):
working_outputs["relax"].pop(k, None)

if (
working_outputs.get("V0") is None
): # breaks whole flow here if EOS is not fitted properly
return Response(output=working_outputs, stop_children=True)
if (
working_outputs.get("V0") <= working_outputs.get("Vmax")
and working_outputs.get("V0") >= working_outputs.get("Vmin")
) or (
self.max_attempts
and (
len(working_outputs["relax"]["volume"])
- self.postprocessor.min_data_points
)
>= self.max_attempts
):
# If the equilibrium volume is within the range of fitted volumes,
# or if the maximum number of attempts has been performed, stop
# and return structure at estimated equilibrium volume
final_structure = structure.copy()
final_structure.scale_lattice(working_outputs["V0"])
return final_structure
Copy link
Contributor

@orionarcher orionarcher Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this a bit hard to parse, consider:

            # Fit EOS to the running volume and energy from previous calculations
            self.postprocessor.fit(working_outputs)
            working_outputs = dict(self.postprocessor.results)

            # breaks whole flow here if EOS is not fitted properly
            if working_outputs.get("V0") is None:
                return Response(output=working_outputs, stop_children=True)

            # Check if the number of calculations has reached the maximum
            n_calcs = len(working_outputs["relax"]["volume"])
            max_calcs = self.postprocessor.min_data_points + (self.max_attempts or 1000)
            max_calcs_reached = n_calcs >= max_calcs

            v_0 = working_outputs.get("V0")
            v_min = working_outputs.get("Vmin")
            v_max = working_outputs.get("Vmax")

            # return final structure if calculated volume is in interpolation regime
            if max_calcs_reached or (v_min <= v_0 <= v_max):
                final_structure = structure.copy()
                final_structure.scale_lattice(v_0)
                return Response(output=final_structure)

            # Else, if the extrapolated equilibrium volume is outside the range of
            # fitted volumes, scale appropriately
            v_ref = v_min if v_0 < v_min else v_max
            eps_0 = (v_0 / v_ref) ** (1 / 3) - 1.0

            linear_strain = [np.sign(eps_0) * (abs(eps_0) + self.min_strain)]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tweaked this a little bit - let me know if that looks better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I might save intermediate variables for the if statements but ultimately a matter of taste.

Create a flow with MPMorph molecular dynamics (and relax+static).

By default, production run is broken up into multiple smaller steps.
Converegence and quench are optional and may be used to equilibrate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the docstr generally since it was confusing

Comment on lines 321 to 324
relax_maker: Maker = Maker
relax_maker2: Maker | None = None
static_maker: Maker = Maker

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any point in having generic Makers as defaults? nothing would happen unless you pass in explicit model/DFT Makers, no? in which case, maybe better not to set defaults at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was probably to enforce that these are required, but the workflow itself will fail if relax_maker and static_maker are unspecified. Changed their defaults to None

Comment on lines +66 to +89
def make(
self,
structure: Structure,
prev_dir: str | Path | None = None,
working_outputs: dict[str, Any] | None = None,
) -> Flow:
"""
Run an NVT+EOS equilibration flow.

Parameters
----------
structure : Structure
structure to equilibrate
prev_dir : str | Path | None (default)
path to copy files from
working_outputs : dict or None
contains the outputs of the flow as it recursively updates

Returns
-------
.Flow, an MPMorph flow
"""
if working_outputs is None:
if isinstance(self.initial_strain, float | int):
Copy link
Member

@janosh janosh Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this make looks like it could benefit from more checking of assumptions and raising helpful error messages if they fail. e.g.

 if isinstance(self.initial_strain, float | int):
    self.initial_strain = (
        -abs(self.initial_strain),
        abs(self.initial_strain),
    )
elif not isinstance(self.initial_strain, tuple) or len(self.initial_strain) != 2:
   raise ValueError(f"initial_strain must be float | tuple[float, float], got {initial_strain}")

name : str
Name of the flows produced by this maker.
convergence_md_maker : EquilibrateVolumeMaker
MDMaker to generate the equilibrium volumer searcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Comment on lines 201 to 204
convergence_md_maker: Maker | None = None # check logic on this line
# May need to fix next two into ForceFieldMDMakers later..)
production_md_maker: Maker | None = None
quench_maker: FastQuenchMaker | SlowQuenchMaker | None = None
Copy link
Member

@janosh janosh Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a __post_init__ to raise early if convergence_md_maker, production_md_maker, quench_maker are left as None. based on the doc string sounds like maybe production_md_maker shouldn't be allowed to be None anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a __post_init__ to this and the base quench maker classes in the same file

Comment on lines 262 to 273
@classmethod
def from_temperature_and_steps(
cls,
temperature: float,
n_steps_convergence: int = 5000,
n_steps_production: int = 10000,
end_temp: float | None = None,
md_maker: Maker = None,
quench_maker: FastQuenchMaker | SlowQuenchMaker | None = None,
) -> Self:
"""
Create an MPMorph flow from a temperature and number of steps.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is missing an @abstractmethod decorator?

Copy link
Contributor

@esoteric-ephemera esoteric-ephemera Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this decorator to a few other places where I can see it's needed (eos common flows, ASE base classes)

Comment on lines +470 to +476
def call_md_maker(
self,
structure: Structure,
temp: float,
prev_dir: str | Path | None = None,
) -> Flow | Job:
"""Call MD maker for slow quench.
Copy link
Member

@janosh janosh Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, should be marked @abstractmethod

----------
name : str
Name of the flows produced by this maker.
convergence_md_maker : EquilibrateVolumeMaker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name convergence_md_maker seems a bit inconsistent. maybe call this equilibrium_volume_maker?

MDMaker to generate the equilibrium volumer searcher
production_md_maker : Maker
MDMaker to generate the production run(s)
quench_maker : SlowQuenchMaker or FastQuenchMaker or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double space after :

"""

name: str = "Equilibrium Volume Maker"
md_maker: Maker | None = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

md_maker is allowed to be None but looks like make will fail if it is? why not make this a required arg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, this has a __post_init__ for checking this is not None now

Copy link
Member

@janosh janosh Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think adding a __post_init__ can be considered boilerplate. if you make md_maker a required arg and the user doesn't pass it, you dataclass should raise a similar error message

def __post_init__(self) -> None:
    if self.md_maker is None:
        raise ValueError("You must specify `md_maker` to use this flow.")

Copy link
Contributor

@esoteric-ephemera esoteric-ephemera Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - that's there already, unless I'm misunderstanding your suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know, i'm suggesting removing it and changing

- md_maker: Maker | None = None
+ md_maker: Maker

to make md_maker a required arg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yes sorry! Changed as requested, leaving in the __post_init__'s as well tho

def calculator(self) -> Calculator:
"""ASE calculator, method to be implemented in subclasses."""
raise NotImplementedError
return NotImplementedError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this change accidental? can't return NotImplementedError here, doesn't match the return type Calculator. should keep the raise and the @abstractmethod decorator

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swapped out return with raise.

I'm a bit confused on the @abstractmethod decorator. It seems that every forcefield that inherits the ASEMaker adapts their calculator as @property. Does this imply that all the calculators should be swapped from @property -> @abstractmethod ? or just the original base class where the calculator should be absent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops typo, think @BryantLi-BLI fixed that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that all the calculators should be swapped from @Property -> @AbstractMethod ? or just the original base class where the calculator should be absent?

just the base class should have the @abstractmethod decorator. but all the calculator methods should have the @property decorator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryantLi-BLI : that's my bad, the AseMaker.calculator should be both a property and abstractmethod. I accidentally removed this in 00d1434 (was trying to see if there was a nice way to serialize ASE calculators without having to subclass AseMaker, and just forgot to restore a few things)

pyproject.toml Outdated
@@ -138,6 +139,7 @@ changelog = "https://github.com/materialsproject/atomate2/blob/main/CHANGELOG.md

[tool.setuptools.package-data]
atomate2 = ["py.typed"]
"atomate2.common.jobs" = ["*.json.gz"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably this line is here to include src/atomate2/common/jobs/mp_avg_vol.json.gz with the PyPI package? probably best to pass the full file path to not auto-include future json.gz file in this dir by accident. but more importantly, that file is 650KB and needs to be downloaded by everyone who installs atomate2 if we merge the current config, whether they use MPMorph flows or not. up to @utf to make a decision here but i would vote against adding that file to the package. better to auto-download it when the user first needs it with a link where to get it if user doesn't have an internet connection.

finally, i looked into that file. looks like it's partially redundant. at least the oxi_state and count keys can likely be removed?

Screenshot 2024-10-08 at 17 35 38

Copy link
Contributor

@esoteric-ephemera esoteric-ephemera Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janosh the oxidation states are I think ignored by default, but the counts are needed. If a given chemical environment doesn't exist in a database, the code looks through subspaces of that chemical environment and does a weighted average of structure volumes. The weighted average depends on the counts

I started rewriting this in parquet to jointly store them - might be better to put the parquet file on OpenData and pull down/cache as needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so for clarity: for the ICSD data, the user can either query from oxidation-state dependent or agnostic data, since ICSD includes manually-assigned charges. I pulled the MP data from summary rather than bonds, so the oxidation state info there is ignored.

That's a slight incongruity that I can change if requested

The counts are needed for the same reason given above (the code will sample subspaces of the total chemical space to perform a weighted averaging of reference volumes/atom if the total chemical space is not present in the dataset)

The data is now pulled / cached from figshare

Stuck with JSON since the parquet speed bump is probably not super relevant here (reading basically once rather than reading many times, and the user has the option to manually specify a DataFrame as input, if generating many random structures)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to existing features forcefields Forcefield related md Molecular dynamics
Projects
None yet
Development

Successfully merging this pull request may close these issues.