We really need a benchmark set or suite: a handful of diverse systems we can use to check the performance of different move proposal schemes, integrators, etc. We want to move away from running a few small simulations locally after a change and eyeballing whether acceptance roughly stays the same or improves, toward knowing EXACTLY how much different approaches affect sampling efficiency on a fixed set of systems. Ideally this ends up basically push-button: we run a utility on our queue and get back an assessment of the current level of performance.
Obviously, we should include toluene in lysozyme, since we've done so much with this already and it's easy to figure out exactly how to analyze the data to assess efficiency (number of transitions per unit time, convergence of populations, etc.). But what else should be in our tests? @nathanmlim - do you think we can get your initial test system to this stage too?
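For concreteness, here's a minimal sketch of the two efficiency metrics mentioned above (transitions per unit time and convergence of populations), assuming we already have a per-frame discrete state assignment (e.g. toluene binding mode 0/1). The names `states` and `frame_interval_ns` are illustrative, not an existing BLUES API:

```python
# Minimal sketch of the analysis metrics; only numpy is required.
import numpy as np

def transitions_per_ns(states, frame_interval_ns):
    """Count state-to-state transitions and normalize by simulation time."""
    states = np.asarray(states)
    n_transitions = np.count_nonzero(np.diff(states) != 0)
    total_time_ns = (len(states) - 1) * frame_interval_ns
    return n_transitions / total_time_ns

def running_populations(states, n_states=2):
    """Running estimate of each state's population, to eyeball convergence."""
    states = np.asarray(states)
    counts = np.zeros((len(states), n_states))
    counts[np.arange(len(states)), states] = 1.0
    return np.cumsum(counts, axis=0) / np.arange(1, len(states) + 1)[:, None]

# Example with a fake 2-state trajectory saved every 0.1 ns:
states = np.random.default_rng(0).integers(0, 2, size=1000)
print(transitions_per_ns(states, frame_interval_ns=0.1))
print(running_populations(states)[-1])  # final population estimates
```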
And what should we test? I'd think we'd normally want to look at each system and, for each one, vary the amount of relaxation over some range (how broad a range?) and look at measures of sampling efficiency.
I'd say we should cap the relaxation testing at 1000 nsteps_neq, since updating the lambdas is slow and anything longer than that would probably be painful to test. A reasonable testing range would be [10, 50, 100, 250, 500, 1000] for nsteps_neq.
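A rough sketch of what that scan could look like is below. `run_ncmc_benchmark` is a hypothetical stand-in for whatever driver the push-button utility ends up exposing; only the range of nsteps_neq values comes from this comment:

```python
# Sketch of the nsteps_neq scan; the driver function is a placeholder, not a real API.
NSTEPS_NEQ_RANGE = [10, 50, 100, 250, 500, 1000]

def run_ncmc_benchmark(system_name, nsteps_neq):
    """Placeholder: run the NCMC simulation for `system_name` with the given
    protocol length and return summary metrics (acceptance, wall-clock cost, ...)."""
    raise NotImplementedError("hook this up to the real benchmark driver")

def sweep_nsteps_neq(system_name):
    """Run one benchmark per nsteps_neq value and collect the results."""
    return {n: run_ncmc_benchmark(system_name, n) for n in NSTEPS_NEQ_RANGE}

# Intended usage once the driver exists:
#   results = sweep_nsteps_neq("toluene-lysozyme")
```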
We can also test on #48 once that's set up.
The comment just prior is out of date; it turns out we can resolve the speed issue by not updating the long-range (LR) dispersion correction every step.
I was discussing this with @nathanmlim today, and we should revive this issue and put together some meaningful benchmarks. @khburley's dipeptide system looks like it could come online as the first real benchmark: she's going to umbrella sample it, so we can confirm we get the "right" answer in terms of populations. This could then be something we always run whenever we make changes, to ensure the changes don't break our ability to get the right answer.
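A minimal sketch of that regression check, assuming we have population estimates from the benchmark run and from the umbrella-sampling reference; the tolerance and argument names are assumptions, not something we've decided on:

```python
# Compare populations against the umbrella-sampling reference and fail loudly on drift.
import numpy as np

def check_populations(observed, reference, tol=0.05):
    """Assert that each state's population matches the reference within tol."""
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = np.abs(observed - reference)
    if np.any(diff > tol):
        raise AssertionError(
            f"Population mismatch: observed={observed}, reference={reference}, "
            f"max deviation={diff.max():.3f} > tol={tol}"
        )
    return diff.max()
```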
We should probably also have a standard benchmark that varies the number of perturbation/propagation steps, similar to what saltswap has done (e.g. see the graph in openmm/openmm#1832), so we can easily check protocols for new move types we introduce.
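For the plotting side, something like the sketch below could produce the analogous figure from the sweep results above. The `acceptance` and `seconds_per_move` field names are illustrative placeholders for whatever summary metrics the driver returns; the cost normalization is there because acceptance alone rewards longer protocols:

```python
# Plot acceptance and cost-normalized efficiency versus nsteps_neq.
import matplotlib.pyplot as plt

def plot_protocol_scan(results, outfile="protocol_scan.png"):
    """`results` maps nsteps_neq -> dict with acceptance rate and seconds per move."""
    nsteps = sorted(results)
    acceptance = [results[n]["acceptance"] for n in nsteps]
    seconds = [results[n]["seconds_per_move"] for n in nsteps]
    # Normalize by wall-clock cost so the comparison reflects accepted moves per second.
    efficiency = [a / s for a, s in zip(acceptance, seconds)]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(nsteps, acceptance, "o-")
    ax1.set(xlabel="nsteps_neq", ylabel="acceptance rate")
    ax2.plot(nsteps, efficiency, "o-")
    ax2.set(xlabel="nsteps_neq", ylabel="accepted moves / second")
    fig.tight_layout()
    fig.savefig(outfile)
    return fig
```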