Potential speed up for smirky by removing chemical environment tracking #251

Open
bannanc opened this issue May 9, 2017 · 8 comments

Comments

@bannanc
Collaborator

bannanc commented May 9, 2017

I will start by saying I'm not sure if this is worth it at this point in the project, but this is definitely worth thinking about in the long run.

Right now, smirky keeps a list of chemical environments for the parameter it is sampling. It occurred to me yesterday that this is probably fairly memory intensive and unnecessary, since it is easy to go from SMIRKS to ChemicalEnvironments and back to SMIRKS. Would removing the chemical environment storage reduce the computational time? We could instead store only the SMIRKS strings (and typenames) for the parameter list. Then the create_new_environment function could take a SMIRKS string and typename, generate an environment, make changes, and return a new SMIRKS string and typename. It seems like this might speed things up significantly: right now we keep a ChemicalEnvironment for every parameter, which is in the hundreds for torsions, and each environment is a complicated NetworkX graph holding a lot of information that could just be tracked as a SMIRKS string.
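
To make the proposal concrete, here is a minimal sketch of the string-only bookkeeping. Everything in it is an assumption for illustration: `StandInEnvironment` is a toy class, not smarty's ChemicalEnvironment, and the `decorator` argument and the typename suffix are hypothetical. The point is only that the parameter list holds `(SMIRKS, typename)` pairs and an environment object exists just for the duration of one move.

```python
class StandInEnvironment:
    """Toy stand-in (NOT smarty's ChemicalEnvironment) built from a SMIRKS string."""

    def __init__(self, smirks):
        self.smirks = smirks

    def add_decorator(self, decorator):
        # Toy change: AND a decorator onto the atom labeled :1.
        head, sep, tail = self.smirks.partition(":1]")
        self.smirks = head + ";" + decorator + sep + tail

    def as_smirks(self):
        return self.smirks


def create_new_environment(smirks, typename, decorator, env_class=StandInEnvironment):
    """Build a temporary environment, modify it, and return strings only."""
    env = env_class(smirks)          # short-lived object, discarded on return
    env.add_decorator(decorator)
    new_smirks = env.as_smirks()
    new_typename = typename + "_" + decorator  # hypothetical naming scheme
    return new_smirks, new_typename


# The parameter list stores only (SMIRKS, typename) pairs.
parameters = [("[*:1]~[*:2]~[*:3]~[*:4]", "generic")]
parameters.append(create_new_environment(*parameters[0], decorator="#6"))
print(parameters)
```

With this layout the cost of building an environment is paid once per proposed move rather than being carried in memory for every parameter in the list.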

@davidlmobley

@davidlmobley
Contributor

Is there anything quick you can do to test how much this would affect speed? Naively it seems like it would not be worth it at this point for SMIRKY for the paper, but will be helpful down the line. But I guess that depends on how much additional simulation you still need to do and how much the speed difference is.

@bannanc
Collaborator Author

bannanc commented May 9, 2017

The fastest test I can foresee is writing two simple scripts that store a list for many steps, where the list holds either SMIRKS strings or chemical environments, and then checking whether there is a substantial time difference. I'm not sure this is worth the effort now, but it seems like we could potentially be doing A LOT more smirky simulations unless we find out why smarty isn't getting 100%.
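
A rough sketch of what such a test could look like is below. It is not the script that was actually run: `make_stand_in_environment` is a hypothetical nested-dict object standing in for a graph-based environment, and growing the list by one item per step is an assumed protocol. Swapping in a real environment constructor would be needed to get meaningful numbers.

```python
import time


def make_stand_in_environment(n_atoms=4):
    """Hypothetical stand-in for a graph-based environment, built from nested dicts."""
    return {
        "atoms": {i: {"or_types": ["*"], "and_types": []} for i in range(1, n_atoms + 1)},
        "bonds": {(i, i + 1): {"or_types": ["~"]} for i in range(1, n_atoms)},
    }


def time_list_growth(make_item, n_steps, n_start=1):
    """Grow a list by one item per step; return (elapsed minutes, final list)."""
    items = [make_item() for _ in range(n_start)]
    start = time.perf_counter()
    for _ in range(n_steps):
        items.append(make_item())
    elapsed_minutes = (time.perf_counter() - start) / 60.0
    return elapsed_minutes, items


# Compare storing plain SMIRKS strings against storing stand-in objects.
string_minutes, string_list = time_list_growth(
    lambda: "[*:1]~[*:2]~[*:3]~[*:4]", n_steps=1000, n_start=10
)
object_minutes, object_list = time_list_growth(
    make_stand_in_environment, n_steps=1000, n_start=10
)
print(f"strings: {string_minutes:.2e} min, stand-in objects: {object_minutes:.2e} min")
```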

@davidlmobley
Contributor

@bannanc - I think it's worth trying to test the speed difference if you can do it without a huge investment of time.

@bannanc
Collaborator Author

bannanc commented May 24, 2017

I forgot to add my notes here. Below I have timing data for a list to which I sometimes added a new object. The first column is the time when storing SMIRKS strings, the second is the time when storing chemical environments, and the last is the difference in minutes (that is, environment column - string column).
The rows differ in the starting list: generic starts the simulation with only a generic torsion, short starts with 10 torsions, and long starts with 82 torsions from a recent smirky run.

               start    strings (min)    environments (min)    difference (min)

------------------------------  2 Iterations  ------------------------------
               short    1.97e-05         6.54e-05              4.57e-05
                long    1.93e-05         4.58e-04              4.39e-04
             generic    1.34e-05         1.82e-05              4.84e-06


------------------------------  10 Iterations  ------------------------------
               short    7.12e-05         1.16e-04              4.53e-05
                long    8.27e-05         5.40e-04              4.58e-04
             generic    6.60e-05         6.47e-05              -1.23e-06


------------------------------  100 Iterations  ------------------------------
               short    6.19e-04         7.01e-04              8.20e-05
                long    7.44e-04         1.36e-03              6.12e-04
             generic    5.49e-04         6.28e-04              7.92e-05


------------------------------  1000 Iterations  ------------------------------
               short    7.59e-03         1.73e-02              9.76e-03
                long    8.42e-03         2.10e-02              1.26e-02
             generic    6.89e-03         1.61e-02              9.20e-03


------------------------------  10000 Iterations  ------------------------------
               short    8.89e-02         1.09e+00              9.98e-01
                long    9.37e-02         1.17e+00              1.08e+00
             generic    7.18e-02         1.12e+00              1.05e+00


------------------------------  30000 Iterations  ------------------------------
               short    3.61e-01         1.04e+01              1.00e+01
                long    4.51e-01         1.08e+01              1.04e+01
             generic    3.13e-01         1.01e+01              9.76e+00

In this example I only let the list of SMIRKS/environments get longer. The time difference doesn't seem so terrible on the scale we use for smirky (10,000 iterations, with adding and removing). I looked a little at time/iteration, and with strings it is pretty consistently 5-10E-06 minutes. With chemical environments there is less consistency, but time/iteration seems to get longer as the number of iterations grows, which is probably to say that the time/iteration is more sensitive to the length of the list when it is storing chemical environments.
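
For anyone who wants to redo the time/iteration comparison, the arithmetic is just total minutes divided by iteration count. The short sketch below does this for the "long" rows of the table above; the values are copied from the table, and the function and variable names are only illustrative.

```python
# (iterations, string minutes, environment minutes) copied from the "long" rows.
long_rows = [
    (2, 1.93e-05, 4.58e-04),
    (10, 8.27e-05, 5.40e-04),
    (100, 7.44e-04, 1.36e-03),
    (1000, 8.42e-03, 2.10e-02),
    (10000, 9.37e-02, 1.17e+00),
    (30000, 4.51e-01, 1.08e+01),
]


def per_iteration(rows):
    """Convert total minutes into minutes per iteration for both storage types."""
    return [(n, strings / n, envs / n) for n, strings, envs in rows]


per_iter = per_iteration(long_rows)
for n, strings, envs in per_iter:
    print(f"{n:>6d}  strings {strings:.2e}  environments {envs:.2e}")
```

The very small runs may be dominated by fixed overhead, so the trend is easiest to read from 1000 iterations upward.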

I don't think it is worth the effort to re-code smirky now, but I think it is a reasonable thing to remember as we move forward and we may need to store information about the chemical perception for longer while sampling both the SMIRKS patterns and the parameters.

@bannanc
Collaborator Author

bannanc commented May 24, 2017

I have the Jupyter notebook I used in my personal documents on Google Drive, but I can put it somewhere public or in the utilities here if we want.

@davidlmobley
Contributor

Thanks for checking this, @bannanc. I think for the record you want to share your notebook so that when someone revisits this, they can pick up where you left off.

@bannanc
Collaborator Author

bannanc commented May 25, 2017

@davidlmobley should I just put that notebook in utilities here?

@davidlmobley
Contributor

Sounds good.
