ChemicalEnvironment sampling API #110
Comments
@jchodera - I went over this with Christopher and what you have written here looks reasonable to both of us. However, there are a variety of aspects we're not clear about yet. One of these is just "what happens between this line":
Something has to pick which parent atom environment to modify, get it into a chemical environment (or maybe we have chemical environments for everything already stored in memory?) and then feed it into the proposal engine. How do you imagine that working, or is that something we should be laying out at this stage as well? Another issue is that the chemical environment objects are associated with particular parameters; are we expecting to be able to transition back and forth between these environment objects and SMIRKS/associated parameters as part of the move proposal process? I am imagining there are at least two general ways of proceeding:
It seems like it should be possible to achieve the same end goal via both means, and I'm not sure if one is more general or preferable. Note that we are thinking that, for other reasons, we are going to need the ability to go from SMIRKS to |
There will be a parameter sampling engine that sits on top of this and has "plugin" methods that are incorporated into the parameter sampling framework, but that can stay pretty invariant if we have the ability to plug in different child atom type proposal schemes. We should spec out an API for that separately.
The child parameter proposal scheme can be decoupled from the child
It would be ideal if there was an invertible one-to-one mapping between If it is very difficult, we'll have to maintain the |
SMIRKS to Chemical Environments is going to take some time to sort out, but I believe it's doable. I've spent some time thinking about it, but haven't drafted any formal code yet. It's really just slightly complicated string parsing.
Do we already know how to calculate a logP value with just one parent and one child parameter? Or would the engine know about all of the chemical environments currently describing that set of parameters? Maybe that's a discussion about internal details and not the API. |
@bannanc - the So, this is something which has to be built into the move proposal engine if it is biased, if I understand correctly. |
@davidlmobley that does make sense; I'm sure we'll have to discuss it in more detail later. Also, I realized I didn't comment on this in general: I think the API for the engine is completely reasonable, and we should be able to translate the moves part of the "smirky sampler" @cbayly13 and I are working on directly into it. |
Just to kick off some brainstorming about the sampler API level above this, we might have something like this:
# Define a Bayesian posterior model
model = BayesianForcefieldModel(thermoml_dataset)
# Create a parameter sampler
sampler = ParameterSampler(model, initial_parameter_set)
# Register some proposal types, along with weights corresponding to the relative probability for selecting each move type
# Nonbonded type creation/destruction proposal
sampler.registerProposal(NonbondedTypeProposal(nboptions), weight=1.0)
# Nonbonded parameter sampling proposal
sampler.registerProposal(NonbondedParameterProposal(), weight=1.0)
# Bond type creation/destruction proposal
sampler.registerProposal(BondTypeProposal(bondoptions), weight=1.0)
# Bond parameter sampling proposal
sampler.registerProposal(BondParameterProposal(bondoptions), weight=1.0)
..
# BCC sampling proposal
sampler.registerProposal(BondChargeCorrectionProposal(bccoptions), weight=0.1)
These
and could create/delete a parameter, propose a modification to a parameter, propose a change of functional form, or even propose coupled moves between multiple parameters. For example, the |
@jchodera - the above looks good to me. So then the And the |
All of the ProposalEngine subclasses will have the same interface. They will need to
That may be it for now. The sampler will have a 'run' method that runs a specified number of iterations of the sampler. The 'update' method will generate a single sample according to the probability assigned to each move type. The sam |
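To make the 'run' / 'update' behavior described above a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions: `registerProposal`, `update`, and `run` come from the discussion, while the `apply` method on proposals, the `seed` argument, and the `CountingProposal` toy move are hypothetical names invented for this example.

```python
import random


class CountingProposal:
    """Hypothetical toy move: it only records how often it was selected."""

    def __init__(self, name):
        self.name = name
        self.ncalls = 0

    def apply(self, model, parameter_set):
        # A real proposal would modify parameters and do an accept/reject step.
        self.ncalls += 1
        return parameter_set


class ParameterSampler:
    """Sketch of a sampler that picks move types by their relative weights."""

    def __init__(self, model, initial_parameter_set, seed=None):
        self.model = model
        self.parameter_set = initial_parameter_set
        self.proposals = []
        self.weights = []
        self.rng = random.Random(seed)

    def registerProposal(self, proposal, weight=1.0):
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self.proposals.append(proposal)
        self.weights.append(weight)

    def update(self):
        """Generate one sample: choose a move type by weight and apply it."""
        if not self.proposals:
            raise RuntimeError("no proposals registered")
        proposal = self.rng.choices(self.proposals, weights=self.weights, k=1)[0]
        self.parameter_set = proposal.apply(self.model, self.parameter_set)
        return proposal

    def run(self, niterations):
        """Run the given number of sampler iterations."""
        for _ in range(niterations):
            self.update()
```

Keeping the weights in the sampler rather than in the proposals means the same proposal class could be reused with different selection probabilities.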
@jchodera @davidlmobley @bannanc I just realized something important that I want to embody here.
Base [Decorators with Odds]
C [ ('X4', 20), ('X3', 20), ('X2', 5), ('X1', 1), ]
C [ ('H0', 1), ('H1', 1), ('H2', 1), ('H3', 1) ]
C [ ('-1', 1), ('+0', 50), ('+1', 0) ]
I cannot think of an instance where elements H, F, Cl, Br, I need a decorator. Substituents, yes... substituents with decorators... yes. |
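As a rough illustration of how element-specific odds like these could drive a proposal, here is a small Python sketch. The odds for carbon are copied from the table above; everything else is an assumption made for the example: the `DECORATOR_ODDS` layout, the function names, and in particular the choice to pick one of the three decorator groups uniformly before picking a decorator within it.

```python
import math
import random

# Odds for carbon copied from the comment above; one list per decorator group.
DECORATOR_ODDS = {
    "C": [
        [("X4", 20), ("X3", 20), ("X2", 5), ("X1", 1)],
        [("H0", 1), ("H1", 1), ("H2", 1), ("H3", 1)],
        [("-1", 1), ("+0", 50), ("+1", 0)],
    ],
}


def decorator_log_prob(element, name):
    """Log probability of drawing `name` for `element` under this scheme."""
    groups = DECORATOR_ODDS[element]
    for group in groups:
        odds = dict(group)
        if name in odds:
            if odds[name] == 0:
                return float("-inf")  # zero odds: never proposed
            # Assumption: groups are chosen uniformly, then odds apply within a group.
            return math.log(odds[name] / sum(odds.values())) - math.log(len(groups))
    raise KeyError(name)


def pick_decorator(element, rng):
    """Draw one decorator for `element`; return (decorator, log probability)."""
    group = rng.choice(DECORATOR_ODDS[element])
    names = [name for name, _ in group]
    weights = [weight for _, weight in group]
    name = rng.choices(names, weights=weights, k=1)[0]
    return name, decorator_log_prob(element, name)
```

Returning the log probability alongside the draw keeps the bias visible to whatever computes the acceptance probability.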
I totally agree with this idea, @cbayly13 . |
I certainly agree with the principle that adjusting the probabilities with which different decorators are proposed based on element type could lead to large sampling efficiency gains, but I'd prod you to see if you can think of a simple way we could have the sampling scheme discover or tune these probabilities itself. Downloading your chemical intuition is a great first step, @cbayly13, but it would be even better if we could come up with a general principle and have the sampling scheme discover the intuition using this simple principle. For example, what if, during "equilibration", we adjusted the weights of each decorator that was accepted by +1? Or alternatively, just counted the number of atoms each |
@jchodera - this is interesting. I'd been struggling to think of how we could "learn" what types of moves are productive without breaking the entire scheme, but I think your point about "equilibration" might be well-taken... We could run for some period with "adaptive" probabilities where we learn what types of moves seem to do any good and what don't and have that provide some feedback for how we fix probabilities for actually collecting data that samples from the correct distribution. |
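One way to picture the "adapt during equilibration, then fix the probabilities" idea is the sketch below. It assumes the simplest rule mentioned in the thread (add +1 to a decorator's weight each time a move using it is accepted) and freezes the weights before production sampling. The class and method names are hypothetical.

```python
class AdaptiveWeights:
    """Sketch: learn decorator weights during equilibration, then freeze them."""

    def __init__(self, names, initial=1.0):
        # Every decorator starts with the same weight (an assumption).
        self.weights = {name: float(initial) for name in names}
        self.frozen = False

    def record(self, name, accepted):
        """During equilibration, bump the weight of an accepted decorator by 1."""
        if accepted and not self.frozen:
            self.weights[name] += 1.0

    def freeze(self):
        """Stop adapting so production sampling uses fixed probabilities."""
        self.frozen = True

    def probabilities(self):
        total = sum(self.weights.values())
        return {name: weight / total for name, weight in self.weights.items()}
```

Freezing matters because proposal probabilities that keep changing during data collection would no longer guarantee sampling from the correct distribution.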
That said, I think we should actually have @cbayly13 continue along the same route he's on right now; he's not really an expert on machine learning/Monte Carlo/sampling schemes, so what will probably work best is for him to encode what he thinks the probabilities ought to be based on his chemical intuition, then when we can figure out a suitable principle/sampling scheme we can see what we discover and how well it matches with his intuition. To put it another way -- we've got him for ~2 more weeks and there is no way we'll be sampling SMIRKS patterns in combination with property calculations in the next two weeks, so if we wait to be able to learn this, then we'll have no way of comparing what we get with what he might have expected. On the other hand, if he goes ahead and builds these weights/probabilities himself manually, then when we get there we'll be able to compare with what he expected. |
I agree with the above discussion: with @jchodera on the goal of getting BaFFLE to learn good moves, and with @davidlmobley that I could not wisely undertake this in my remaining time (which is a huge bummer for me because this is getting more and more exciting and I would actually want to learn these adaptive methods if I had time). As Chemistry Guy, my main concern is that I have not yet succeeded in communicating how my "seasoned" human chemical perspective translates into the science we are doing. Hence my crude attempts, like couching expertise in terms of "...choosing decorators needs to be weighted differently for different atomic elements." At least it beats communicating expertise by pronouncing new atom types... |
I think this is actually a perfect way to maximize the use of your remaining time before switching to Parm@frosst -> SMIRFF, @cbayly13 . |
I definitely think @cbayly13 should forge ahead with his thoughtfully selected weights! I just don't want to lose the opportunity to get him to think about the "meta learning rule" that would be able to learn this knowledge on the fly: Can he come up with ideas that might be able to roughly approximate this knowledge using a simpler rule? |
I will try to think of formalizing higher-level rules as @jchodera suggests. |
@maxentile - this is also a related issue. |
@jchodera and @davidlmobley My instinct on a "meta learning rule" here is to think about what decorators do. They are supposed to be used to distinguish elements from one another. So a
I think this is sort of what I'm getting at, but removing any decorator that matches all of a certain element number, such as |
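A possible reading of this rule, sketched in Python: a decorator that is true for every atom of a given element cannot distinguish between those atoms, so it could be dropped from the proposal set for that element. The input format (a list of element and decorator-set pairs) and the function name are assumptions made for illustration, not something specified in the thread.

```python
def uninformative_decorators(atoms):
    """Return, per element, the decorators that match every atom of that element.

    `atoms` is a list of (element, decorators) pairs, where `decorators` is the
    set of decorators that are true for that atom (hypothetical input format).
    """
    common = {}
    for element, decorators in atoms:
        if element in common:
            # Keep only decorators shared by all atoms of this element so far.
            common[element] &= set(decorators)
        else:
            common[element] = set(decorators)
    return common
```

For example, if every hydrogen in a training set is neutral with one connection, then both of those decorators would be flagged for hydrogen and never proposed for it.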
I just wanted to make a brief proposal for a chemical environment sampling API.
The ChemicalEnvironment API (along with subclasses AtomEnvironment, BondEnvironment, AngleEnvironment, and TorsionEnvironment) is looking good. With a few additions, we will be able to utilize the tools from smarty to actually sample over different environments in the physical property sampler in https://github.com/open-forcefield-group/open-forcefield-tools.
I propose we encapsulate the generation of new ChemicalEnvironment proposals in subclasses of a ChemicalEnvironmentProposalEngine, with a very simple API, like:
where logP is the log probability of generating this particular child ChemicalEnvironment given the parent ChemicalEnvironment, which will be used in computing the Metropolis-Hastings acceptance probability.
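A minimal sketch of what such an engine might look like, with the caveat that it rests on assumptions: the `propose(parent)` method name, the toy `AppendDecoratorEngine`, and the representation of environments as plain tuples of strings are all hypothetical stand-ins. The `log_acceptance` helper is the standard Metropolis-Hastings rule, which needs the log probability of the reverse move as well as the forward one.

```python
import math
import random
from abc import ABC, abstractmethod


class ChemicalEnvironmentProposalEngine(ABC):
    """Hypothetical base class: proposes a child environment from a parent."""

    @abstractmethod
    def propose(self, parent):
        """Return (child, logP), where logP = log P(child | parent)."""


class AppendDecoratorEngine(ChemicalEnvironmentProposalEngine):
    """Toy engine: an environment is a tuple of strings; a move appends one decorator."""

    def __init__(self, decorators, rng=None):
        self.decorators = list(decorators)
        self.rng = rng or random.Random()

    def propose(self, parent):
        decorator = self.rng.choice(self.decorators)
        child = tuple(parent) + (decorator,)
        logP = -math.log(len(self.decorators))  # uniform choice over decorators
        return child, logP


def log_acceptance(log_post_parent, log_post_child, logP_forward, logP_reverse):
    """Log of the Metropolis-Hastings acceptance probability."""
    log_ratio = (log_post_child - log_post_parent) + (logP_reverse - logP_forward)
    return min(0.0, log_ratio)
```

A caller would draw a uniform random number u and accept the child when log(u) is less than the returned value.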