Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiStateReporter variable pos/vel save frequency #712

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

richardjgowers
Copy link
Contributor

@richardjgowers richardjgowers commented Jul 13, 2023

Description

allows MultiStateReporter to write positions and velocities and different frequency to energies data.

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

  • Ready to go

Changelog message

MultiStateReporter now takes optional `position_interval` and `velocity_interval` keyword args to control the frequency these are saved at

@mikemhenry
Copy link
Contributor

Will we run into key errors if you try and use this with an .nc file from an older version?

@codecov
Copy link

codecov bot commented Jul 13, 2023

Codecov Report

Merging #712 (d8f80b9) into main (cbff4c8) will increase coverage by 0.00%.
The diff coverage is 92.85%.

❗ Current head d8f80b9 differs from pull request most recent head 981fe1b. Consider uploading reports for the commit 981fe1b to get more accurate results

Additional details and impacted files

@richardjgowers
Copy link
Contributor Author

@mikemhenry it shouldn't affect reading old files, I've only played with the writing code.

In terms of old code reading newer nc files, unless you've changed the defaults from writing positions/velocities on every energy write you should get the same. If you read files with missing data you now get:

>>> ds.variables['positions'][1, :, :]

masked_array(
  data=[[--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --],
        [--, --, --],

so it looks like netcdf isn't erroring but instead giving you the lack of data.

I have added two attributes to the DataSet, so if you write code that expects these you're going to have a bad time.

Is this what ncfile.ConventionVersion = '0.2' is made to handle? Should I be bumping that?

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! This looks good to me.

I do have questions about the motivation for this, since it adds a complexity that I don't seem to get why it is needed, can you expand on this?

Also, we probably want to write tests for this, since it's changing the behavior of the reporter and that can be critical for our users.

@richardjgowers
Copy link
Contributor Author

@ijpulidos the filesizes for MultiStateReporter can get pretty large, since you're saving ~11 trajectories. You typically want to save the energies at a high frequency (since you're very interesting in seeing correlation times) but positions you can have these at a lower frequency, velocities you might not even care about.

Can you point me towards the existing tests so I can make sure the tests I write fit in?

@ijpulidos
Copy link
Contributor

I see, yeah I had the feeling it was a storage issue. We probably need to clean things up in general but I think this will help. Thanks for clarifying the motivation.

As per the tests, I think having them as part of the TestReporter class should be the place, we might consider moving this class to its own test module, but we don't need that right now. Thanks!

@mikemhenry
Copy link
Contributor

In addition to adding tests, we also want to add

  • energy frequency

And double check what gets saved when

@mikemhenry
Copy link
Contributor

We want to figure out what should be saved in the checkpoint and what should be saved in simulation.nc

@ijpulidos
Copy link
Contributor

@mikemhenry Yes, I agree we probably need to discuss first what we want to store and when. And which of these are needed when resuming simulations and similar situations.

If we want to have all of these independently we probably want to sync them with the checkpointing. Such that the required values are up to date when resuming or extending simulations.

@mikemhenry
Copy link
Contributor

Yah I think we need to also make sure we figure out what our analysis tools expect. My hunch is that we need to save a lot of stuff to resume a simulation, but that should be a checkpoint, the time series data we save for analysis should be user configurable.

@ijpulidos
Copy link
Contributor

the time series data we save for analysis should be user configurable.

Yes, decoupling the checkpointing with the user-configurable time series data would probably work better here. I'd vote for that route if it makes sense.

As far as I can tell these are coupled. We would need to think a little bit further about this, since I'm not sure how big of a refactor this could be. There's a lot of moving back and forth between the MultiStateReporter and the MultiStateSampler.

for state, restored_state in zip(sampler_states, restored_sampler_states):
# missing values are returned as numpy masked array
# so we check that these arrays are all masked
assert restored_state.positions._value.mask.all()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is something that I'm not 100% sure of currently. netCDF will return a numpy masked array when you access data that isn't present (here we're saving vels every other frame, so accessing velocities here has no data). I hadn't ever encountered these, so maybe it's not the best thing to return? (or maybe it is if it's the netCDF normal return).

# so we check that these arrays are all masked
assert restored_state.positions._value.mask.all()
assert restored_state.velocities._value.mask.all()
assert restored_state.box_vectors is None # not periodic
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is something I need to double check, I'm a little confused why the checkpoint has a box and future frames don't, so this is still WIP till I figure that out

@richardjgowers richardjgowers changed the title MultiStateReporter variable pos/vel save frequency [wip] MultiStateReporter variable pos/vel save frequency Jan 12, 2024
@richardjgowers
Copy link
Contributor Author

@ijpulidos @mikemhenry this is much closer to ready, I've added some tests so might be a good time for you both to read through

@mikemhenry mikemhenry mentioned this pull request Feb 26, 2024
@mikemhenry mikemhenry changed the title [wip] MultiStateReporter variable pos/vel save frequency MultiStateReporter variable pos/vel save frequency Feb 26, 2024
@mikemhenry
Copy link
Contributor

@richardjgowers getting some errors in CI

Traceback (most recent call last):
  File "/home/runner/micromamba-root/envs/openmmtools-test/lib/python3.9/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/runner/work/openmmtools/openmmtools/openmmtools/tests/test_sampling.py", line 523, in test_write_sampler_states_no_pos
    assert restored_state.positions._value.mask.all()
AttributeError: 'numpy.ndarray' object has no attribute 'mask'

See the CI log here: https://github.com/choderalab/openmmtools/actions/runs/8052365133

Honestly if you make a commit to bump this PR, you should get the same results, I was having an issue re-starting CI

@ijpulidos
Copy link
Contributor

I wonder if this is what's biting us here, though I don't know why are we expecting these arrays to be masked? #701

@ijpulidos
Copy link
Contributor

Now, to be fair, I don't think the changes in that linked PR have anything to do with the serialized/deserialized positions or velocities. So I doubt it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Sprint - In Progress
Development

Successfully merging this pull request may close these issues.

3 participants