Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing information printed about number of matches in labelMolecules method in forcefield_labeler #95

Open
bannanc opened this issue Aug 3, 2016 · 3 comments
Assignees
Labels

Comments

@bannanc
Copy link
Collaborator

bannanc commented Aug 3, 2016

I'm playing with the labelMolecules method to figure out how to use it in the next generation of smarty.

Here is the test case I'm using now:

mol1 = OEMol()
OEParseSmiles(mol1, 'CCCC')
OEAddExplicitHydrogens(mol1)
labeler = ForceField_labeler(get_data_filename('forcefield/Frosst_AlkEtOH.ffxml'))
labels = labeler.labelMolecules([mol1], verbose = True)

Butane is an easy case to draw out and I'm seeing a bug in how the matches are counted. This doesn't seem to carry through to the labels output, it is just in the printed results if verbose = True
Lets focus on the bond results since that is the easiest one to count by hand.

There should be 3 bonds for "[#6X4:1]-[#6X4:2]" however the results say there are 6 so I think it is double counting for example carbon1-carbon2 and carbon2-carbon1 as the same bond maybe?
However it doesn't seem to do this for every SMIRKS because it isn't double counting the bonds for carbon to hydrogen, there are 10 of these in the molecule and the results for "[#6X4:1]-[#1:2]" print 10 matches.

@davidlmobley any ideas why this might be happening?

I haven't looked closely at the labelMolecules code. When I print out the information in labels it only lists each carbon-carbon bond once.

@davidlmobley
Copy link
Contributor

@bannanc - I don't actually think this is a bug, more of an issue of "confusing output". Specifically, imagine numbering your carbons 1-2-3-4. You have three carbon-carbon bonds, sure, except that the SMIRKS patterns match both ways, so you have a 1-2 bond, a 2-1 bond, a 2-3 bond, a 3-2 bond, a 3-4 bond, and a 4-3 bond. So, we actually get six matches even though there are only three bonds. So the output is "correct".

Now, when we actually go through and create force terms, we make sure we don't create the same force twice, so we don't end up creating a force on the 2-3 bond AND the 3-2 bond. But as far as SMIRKS matches are concerned, there ARE six matches.

(OK, technically, we could adjust an argument to the OpenEye tools to return only unique matches, so it would consider the 2-3 and 3-2 bonds as equivalent. HOWEVER, that setting breaks something important when it comes to torsions in four-membered rings (for reasons I don't need to go into here) so we're stuck asking for non-unique matches when doing the SMIRKS queries.)

In any case, what this means is that I don't see a way to make it so the verbose output reports the number of force terms rather than the raw number of matches. I'm open to suggestions as to how the information printed should be changed to make it more clear that this is not the number of force terms applied, just the number of (non-unique) matches identified.

(As a side note, the carbon-hydrogen bonds aren't "double counted" since these aren't symmetric so the SMIRKS pattern only matches in one direction.)

So, not a bug, though I admit the output could perhaps be clearer. Suggestions?

@davidlmobley davidlmobley removed the bug label Aug 4, 2016
@davidlmobley davidlmobley changed the title number of matches in labelMolecules method in forcefield_labeler Confusing information printed about number of matches in labelMolecules method in forcefield_labeler Aug 4, 2016
@bannanc
Copy link
Collaborator Author

bannanc commented Aug 4, 2016

Ok, this makes much more sense. I think the easiest way to clarify output would be to put a column header of some kind, maybe "SMIRKS pattern" and "number non-unique matches" ? Thank you for clarifying what it's doing.

I would like to know why unique causes problems with 4 membered rings, but that should be done elsewhere.

@davidlmobley davidlmobley self-assigned this Aug 8, 2016
@davidlmobley
Copy link
Contributor

(Update/note to self: This applies also to the main ForceField class which now replaces forcefield_labeler, so I should clarify the info printed there.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants