Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Chemical Environment Sampler with basic moves for Bonds, Angles, Torsions, and Impropers #112

Merged
merged 50 commits into from
Aug 23, 2016

Conversation

bannanc
Copy link
Collaborator

@bannanc bannanc commented Aug 11, 2016

I have made an updated version of sampler.py called sampler_smirky.py.
Instead of using string manipulations to sample atomtypes it uses chemical environment to sample Atoms, Bonds, Angles, Torsions, or Impropers.
This is still very much as a WIP stage, but it does run and now @cbayly13 can work on how to make moves.

I imagine some things will change in this set up, but here is an example of how to used the sampler, there are a few examples also in examples/smirky_sampler/

from smarty.sampler_smirky import *
from smarty.utils import *
temperature = 0.0001
ORdecorators = ['X1', 'X2', 'X3', 'X4', 'H0', 'H1', 'H2', 'H3','+0', '+1', '-1', 'a', 'A']
ANDdecorators = ['X1', 'X2', 'X3', 'X4', 'H0', 'H1', 'H2', 'H3','+0', '+1', '-1', 'a', 'A']
SMIRFF = "forcefield/Frosst_AlkEtOH.ffxml"
verbose = True
replacements = ['$halogen', '#9,#17,#35,#53']
elements = ['#%i' % i for i in range(1, 119)] 
typetag = 'Angle'
mol2file = 'test_filt1_tripos.mol2'
molecules = read_molecules(mol2file, verbose = verbose)
initialTypes = None

sampler = TypeSampler(molecules, typetag, elements, ORdecorators, ANDdecorators, replacements, initialTypes, SMIRFF, temperature, verbose)

niterations = 100
traj_file = "trajectory.csv"
sampler.run(niterations, traj_file)

This primarily addresses issue #97 , but I've made a few other changes while working on this that will address other issues.
#93 Chemical Environments now have a function getType() which, per @cbayly13 's request returns

'VdW', 'Bond', 'Angle', 'Torsion', 'Improper' or None (if there are 0 or more than 4 labeled atoms)
#96 I think needs to be discussed a little, I have included in the sampler checks that smirks are parseable when making moves, I'm not sure if this is a check we should include inside the chemical environments also or instead.
#98 I fixed smirksEnvDepict.ipynb in examples/chemicalEnvironments/ to copy the molecule so the original isn't flattened
#99 I updated the addAtom function in chemicalEnvironments to optionally check if the atom is "empty." We are currently thinking of empty atoms as those with only 1 ORtype decorator and 0 ANDtype decorators. @cbayly13 and I have discussed this a little offline and we don't think it makes sense to add empty atoms. @jchodera our understanding is that as long as our moves are reversible they should be allowed? So when we add atoms we will initiate them with 1 ORtype and removing can't happen unless there is only that one ORtype.

That being said the removeAtom also removes the associated bond, currently removeAtom doesn't look at the associated bond.
#110 This is not a creation of the ChemicalEnvironmentProposalEngine, but I think a lot of the frame work will end up in this sampler.

@bannanc bannanc changed the title Chemical Environment Sampler [WIP] Chemical Environment Sampler Aug 11, 2016
@bannanc
Copy link
Collaborator Author

bannanc commented Aug 11, 2016

Also, I haven't written any tests for the sampler_smirky.py yet. My plan is to mimic the ones we have for the current sampler, but if anyone thinks there is something different it should check, please let me know.

@jchodera
Copy link
Member

This is great!

Looking ahead, we'll want to divorce the sampler from the set of molecules so that we can incorporate this into a real parameter sampling scheme. You will want several separate components, rather than one monolithic component (roughly following this thread):

  • A sampler that you can run that makes proposals, rejects unsuitable ones, and accepts/rejects with respect to some posterior model. This takes as input initial types (initialTypes)
  • A posterior probability (or score) model that takes the molecules as a dataset. This holds the temperature
  • A ProposalEngine that takes only the current set of types and proposes a new set of types (and log probability). This takes as input information on how to generate new types (elements, ORdecorators, ANDdecorators, replacements)

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 11, 2016

You will want several separate components, rather than one monolithic component.

For now this will give @cbayly13 and me somewhere to figure out how to make moves in this space in a way that will reach the level of complexity we need. I guess we could have written the separate tools and used them here, but it was a bit easier working in the framework we already had for atomtypes.

@cbayly13 I imagine you will work primarily in the create_new_environment function. I think pretty immediately we are going to need to fix the way it chooses which move to make, currently it wastes A LOT of time decorating and un-decorating bonds.

@jchodera
Copy link
Member

I guess we could have written the separate tools and used them here, but it was a bit easier working in the framework we already had for atomtypes.

You don't need separate tools, but you will need to refactor the class into several different classes so we can keep things isolated in a way that allows us to experiment with different strategies for atom type proposals without breaking everything or duplicating a ton of code.

@davidlmobley
Copy link
Contributor

I agree with both of you here, @bannanc and @jchodera . We're heading in the direction @jchodera mentions, but this allows us to proceed with testing out the move proposals Christopher is working on with the chemical environment objects while John finalizes the PropertyCalculator initial implementation and we proceed to finalizing what the API will be for move proposals, etc.

@bannanc has been working on this for a while, since long before we started discussing the parameter sampling API over here (#110 (comment)); as we've discussed in prior calls, we headed this direction since (a) we need a descendant of smarty which allows us to test sampling of more than just vdW types, and (b) it wasn't time to figure out the general parameter sampling API until we had the ability to calculate properties. In other words, this allows us to continue to make forward progress on figuring out how to do reasonable SMIRKS sampling for other parameter types (and test the results) so that we will know what to put into the API once we sort out what it looks like. Otherwise, Christopher and Caitlin would just end up sitting on their hands waiting...

@jchodera
Copy link
Member

Oh, it's fine for now. I'm just saying you're going to need to refactor it soon (probably over the next ~2 weeks).

@davidlmobley
Copy link
Contributor

@jchodera 👍 . We're totally expecting to have to refactor, yes. We're on the same page. :)

@davidlmobley
Copy link
Contributor

@bannanc - I'm going over this a bit. It's actually looking really great!

I have a few notes/questions. These sort of started off as questions, but as my understanding developed they became more of notes. They're probably still useful for the record, however -- just realize that most of the questions actually get answered:

  1. Can you explain what this SMIRKS means? [*:1]~;!@[*:2]~;!@[*:3]. For example, I'm seeing things like: 1873: [*:1]~;!@[*:2]~;!@[*:3] matches a0002: [#1:1]-[#6X4:2]-[#1:3] ..., but I'm just not understanding the decoration with ;!@ for the middle atom. What does the "and" here join? Oh, wait -- these are decorating the bond, right? So it's saying any bond that's not a ring bond?
  2. The results are odd but perhaps correct; here for the startEmpty_Angle.py example, the first three angles we discover end up being discriminated only by whether they involve rings (if my interpretation in point 1 is correct) -- i.e. in my run I find [*:1]~[*:2]~[*:3] (matches [#6X4:1]-[#6X4:2]-[#6X4:3]), then [*:1]~;!@[*:2]~[*:3] (same but the 1-2 bond can't be in a ring; matches [a,A:1]-[#6X4:2]-[a,A:3]) and then [*:1]~;!@[*:2]~;!@[*:3] (same but 1-2 and 1-3 bonds can't be in a ring; matches [#1:1]-[#6X4:2]-[#1:3]) and then [*:1]~[#8:2]~[*:3] (matches [#6X4:1]-[#8X2:2]-[#1:3]). Do you think these are reasonable? How much relative effort are you spending sampling bonds versus sampling atoms? It's interesting to me that we're doing more discrimination here based on bonds than atoms but maybe this is because it is easier to find bond decorators that do something relative to atom decorators/definitions? (UPDATE: After looking closer, I see you're running 119 elements in your test, which dramatically impacts this. With only elements 1-9, bond decorators go almost entirely out the window (presumably because it's now much easier to get the atom decorating to do something useful) and you start recovering sensible things, like [*:1]~[#8:2]~[*;X1:3] as a more specialized version of [*:1]~[#8:2]~[*:3] (the latter matches CT-OS-CT and the former matches CT-OS-H*).

For the record, in this set there are currently six angle patterns, and above we got four when looking at 119 elements. The other two are [#6X4:1]-[#8X2:2]-[#6X4:3] and [#8X2:1]-[#6X4:2]-[#8X2:3], I believe. Presumably those will only be found by appropriate atom definitions, not just by decorating bonds. (Update: As noted above this works better when we search fewer elements.)

Tests-wise, you'll definitely want tests which correspond to your correctAtoms.py, correctBonds.py, etc. examples, as an important test (as for smarty) is whether, if you start with the "right answer", if you get 100%.

@davidlmobley
Copy link
Contributor

Next, @bannanc , comments/answers/partial answers on some of your questions:

#96 I think needs to be discussed a little, I have included in the sampler checks that smirks are parseable when making moves, I'm not sure if this is a check we should include inside the chemical environments also or instead.

It looks like currently you raise an exception if a SMIRKS isn't parseable. I think this is the right thing to do (as if we're doing this correctly, our smirks should all be parseable) and I would put it inside the chemical environment objects so that the user of the environment object doesn't have to remember to check. In other words, have your code check itself for bugs rather than having the user check... :)

So when we add atoms we will initiate them with 1 ORtype and removing can't happen unless there is only that one ORtype.

This sounds reasonable to me.

I'd like to hear a little more on your rationale for not adding empty atoms, though:

We are currently thinking of empty atoms as those with only 1 ORtype decorator and 0 ANDtype decorators. @cbayly13 and I have discussed this a little offline and we don't think it makes sense to add empty atoms.

At first I agreed completely, as I thought adding another atom doesn't provide any more information if it is empty. But then I thought through an example. If we're looking at an angle, for example, which has three labeled atoms, and we add a fourth atom, it initially didn't seem like it would add value to add a fourth atom if that atom is empty (since the only case where this could matter would be if it ended up clarifying the number of bonds an atom involved can make - for example if we have [*:1]~[*:2]~[*:3] and go to [*:1](~[*])~[*:2]~[*:3], then the first atom can no longer be hydrogen or [#8X2] or a halogen or...). But then when I got to this point, I suddenly realized, "but wait, that's actually a LOT of information we gained by just adding an empty atom."

Now, I realize that a more "traditional" way of saying that the first atom can't be a halogen or an oxygen that already has two bonds or (etc.) would be to decorate it with all those things, and maybe that will typically make more sense. So I don't necessarily have a problem with initiating atoms with one ORtype and allowing deletion of atoms with one ORtype, if you and Christopher are convinced that the information we gain by adding/deleting empty atoms isn't useful. I also note that there are disadvantages to adding/deleting only empty atoms, one of which is then that it takes more work (and more moves) to get to the type of decoration we often expect to be most useful, which is where we SPECIFY an alpha or beta substituent. Maybe this is enough reason to use the approach you mention.

That being said the removeAtom also removes the associated bond, currently removeAtom doesn't look at the associated bond.

Can you clarify this point? I don't actually understand what you're saying. Why would removeAtom need to look at the bond? Or are you saying that after the atom was added, the bond might have become decorated, and it doesn't check this before allowing atom removal? If the latter, this will need fixing. Remember, you have to have reversibility (Gaetano just gave a group meeting talk on this while you were gone and you should get a recap when you come back!) so if you have a can make moves from the state where you have an atom with one ORtype and a decorated bond to the state where you have deleted that atom and its bond, but you CAN'T make moves from the state where you have no atom/bond to the state where you have an atom with one ORtype and a decorated bond, then the whole machinery is no longer guaranteed to converge to a steady state in the limit of enough sampling.

Hopefully I hit all the major points. I haven't done that much code review, but functionally it looks good. Let me know when you're ready for more detailed code review.

…nb for use in smirksEnvMoves to actually attempt to perform them. These files can be used to test smirksEnvMoves.
@davidlmobley
Copy link
Contributor

@bannanc - tests are failing because of an issue with importing OpenEye toolkits. I don't think your modifications have caused this, but it's possible there is already a fix in master that you haven't pulled into this? Otherwise I may need to investigate as maybe Travis is failing at installing them.

…s in BAIT space (Bonds, Angle, Impropers, and Torsions). Now all functions have a doc string.
@bannanc
Copy link
Collaborator Author

bannanc commented Aug 22, 2016

@davidlmobley I haven't pulled from master in a while, so it's possible I missed a change there, I'll do that now.

edit: I just tried to pull origin master and it said I was up to date...

bannanc and others added 2 commits August 22, 2016 15:56
Bring Bayly's latest envMoves examples from his branch into environment branch, updating existing PR #112
@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

I was testing some things this morning, the plan was just to import environments and create some with smirks and make sure the getTypes() method was working correctly. When I imported smarty.environments I got the following warning:

In [49]: from smarty.environment import *
'module' object has no attribute '_vendor'
Warning: Cannot import openeye toolkit; not all functionality will be available.

I'm guessing that whatever is causing this warning could also be causing the checks to fail. Any ideas?

@davidlmobley
Copy link
Contributor

@bannanc - I'm not seeing vendor anywhere super obvious in the code. What about trying your experiment again with verbosity, i.e. python -v?

@davidlmobley
Copy link
Contributor

However, it looks like the "Cannot import..." message is coming from __init__.py so you might poke around there and see if you can tell what's going on, @bannanc .

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

Thanks for the direction @davidlmobley , I couldn't find vendor anywhere either, I will poke around and post anything I find.

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

Now it looks like python 2 tests are passing, but not 3? I probably accidentally added something that isn't python3 compatible accidentally, I'll investigate.

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

@davidlmobley I found it, there was an issue with how the smarty tools were being imported and a few issues with print statements. It looks like everything is passing now.

@davidlmobley
Copy link
Contributor

Nice work, @bannanc . But this is still [WIP], yes? Or is at the point where you can finalize a reasonable addition/modularization of this and get it merged and bring in other things later?

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

@davidlmobley I think there are going to be a lot of changes in the format of chemical environments and the sampler as I import @cbayly13 's moves so I don't think we should merge into master until I get those combined.

@davidlmobley
Copy link
Contributor

@bannanc - would it make sense to bring this one in as a functional unit before you do that, since it will involve a ton of other modifications/extensions/changes? Just asking.

@bannanc
Copy link
Collaborator Author

bannanc commented Aug 23, 2016

@davidlmobley
that actually might make sense before I start making changes maybe we should pull this in. It addressed a bunch of smaller issues early on. Why don't you merge this and I'll write up a new issue to explain my plan to merge Christopher's moves into the sampler and make a new pull request.

@bannanc bannanc changed the title [WIP] Chemical Environment Sampler Initial Chemical Environment Sampler with basic moves for Bonds, Angles, Torsions, and Impropers Aug 23, 2016
@davidlmobley davidlmobley merged commit ffbb7c6 into master Aug 23, 2016
@davidlmobley
Copy link
Contributor

OK, let's do that. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants