-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential speed up for smirky by removing chemical environment tracking #251
Comments
Is there anything quick you can do to test how much this would affect speed? Naively it seems like it would not be worth it at this point for SMIRKY for the paper, but will be helpful down the line. But I guess that depends on how much additional simulation you still need to do and how much the speed difference is. |
The fastest test I can foresee doing is making two simple scripts that store lists for many steps where the list is either SMIRKS strings or chemical environments. Then see if there is a substantial time difference. I'm not sure this is worth the effort now, but it seems like we could potentially be doing A LOT more smirky simulations unless we find why smarty isn't getting 100% |
@bannanc - I think it's worth trying to test the speed difference if you can do it without a huge investment of time. |
I forgot to add my notes here. Below I have data for a list where I sometimes added a new object. The first column is with storing smirks strings the second is for storing chemical environments the last is the difference in minutes (that is string column - environment column).
In this example I only let the list of smirks/environments get longer. The time difference doesn't seem so terrible on the time scale we use for smirky 10,000 with adding and removing. I looked a little at time/iteration and with strings it is pretty consistently 5-10E-06 minutes. With chemical environments there is less consistency, but time/iteration seems to get longer with longer iterations (which is probably to say that the time/iteration is more sensitive to the length of the list when it is storing chemical environments. I don't think it is worth the effort to re-code smirky now, but I think it is a reasonable thing to remember as we move forward and we may need to store information about the chemical perception for longer while sampling both the SMIRKS patterns and the parameters. |
I have the jupyter notebook I used in my person documents on google drive, but I can put it somewhere public or in the utilities here if we want. |
Thanks for checking this, @bannanc . I think for the record you want to share your notebook so that when someone revisits this, they can pick up from where you left off from this info. |
@davidlmobley should I just put that notebook in utilities here? |
Sounds good. |
I will start by saying I'm not sure if this is worth it at this point in the project, but this is definitely worth thinking about in the long run.
Right now, smirky keeps a list of chemical environments for the parameter it is sampling. It occurred to me yesterday that this is probably fairly memory intensive and unnecessary since it is easy to go from SMIRKS to
ChemicalEnvironments
and back to SMIRKS. Would it speed up the computational time to remove the chemical environments storage? We could instead only store the SMIRKS strings (and typenames) for the parameter list. Then thecreate_new_environment
function could take a smirks string and typename, generate an environment, make changes and return a smirks string and new typename. It seems like it might speed things up significantly since right now we are keeping a chemicalEnvironment for every parameter, which is in the 100's for Torsions and each of the environments is a complicated NetworkX graph with a lot of information that could just be tracked as a SMIRKS string.@davidlmobley
The text was updated successfully, but these errors were encountered: