Slow resolve reference for large output DB #627

Open
gpetretto opened this issue Jun 12, 2024 · 2 comments

@gpetretto
Contributor

I have realized that, as the output DB grows, resolving references becomes a bottleneck for job execution.
I have a DB with ~8000 job outputs from atomate2, and resolving the references for the store_inputs job of an elastic flow was taking hours. Adding a MongoDB index on the output collection with {uuid: 1, index: -1} led to a huge speedup.
Admittedly, I am not working with a very powerful DB, but I expect that this kind of problem would eventually affect more powerful machines too as the DB grows.
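
For anyone who wants to try the same thing, here is a minimal sketch of adding that index with pymongo. The helper names, the connection URI, and the database and collection names are all hypothetical; only the index keys `{uuid: 1, index: -1}` come from the description above. It assumes the slow lookup is an equality match on `uuid` combined with a descending sort on `index`, which is the query shape this compound index would serve.

```python
# Compound index matching the one described above: uuid ascending, index descending.
# pymongo's create_index accepts a list of (field, direction) tuples.
OUTPUT_INDEX_KEYS = [("uuid", 1), ("index", -1)]


def add_output_index(collection):
    """Create the compound index on a pymongo Collection and return the index name."""
    return collection.create_index(OUTPUT_INDEX_KEYS)


def connect_and_index(uri, db_name, collection_name):
    """Connect with pymongo and add the index (needs a reachable MongoDB server)."""
    # Imported lazily so the helper above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return add_output_index(client[db_name][collection_name])
    finally:
        client.close()
```

A call might look like `connect_and_index("mongodb://localhost:27017", "jobflow", "outputs")`, with the names replaced by those of the actual output store. Creating an index that already exists with the same keys is a no-op, so the call is safe to repeat.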

I am opening this issue to check whether I am the only one experiencing this kind of problem, and to find out if there is a set of suggested indexes to add to the output DB.
Maybe there is room for some optimization in the code? Or at least it would be good to analyze the most common queries and add a list of suggested indexes to the documentation.

@utf
Member

utf commented Jun 12, 2024

This is a very good point. We absolutely should be setting indexes on uuid, index, and also blob_uuid for the additional stores. This could also be related to #408.
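
As a rough sketch of what a set of suggested indexes could look like, the snippet below builds a per-collection plan and applies it to a pymongo `Database`. This is an illustration under assumptions, not what the library currently does: the function names and collection names are hypothetical, the main output collection is assumed to get the compound `uuid`/`index` index from the original post, and each additional store is assumed to get a single-field index on `blob_uuid`.

```python
def suggested_index_plan(main_collection, additional_collections):
    """Map each collection name to the list of index key specs to create on it."""
    # Main output collection: compound index on uuid (ascending) and index (descending).
    plan = {main_collection: [[("uuid", 1), ("index", -1)]]}
    # Additional stores: single-field index on blob_uuid (an assumption, see lead-in).
    for name in additional_collections:
        plan[name] = [[("blob_uuid", 1)]]
    return plan


def apply_index_plan(db, plan):
    """Create every planned index on a pymongo Database and return the index names."""
    created = []
    for name, specs in plan.items():
        for keys in specs:
            created.append(db[name].create_index(keys))
    return created
```

Keeping the plan as plain data separate from the code that applies it would also make it easy to publish the same list in the documentation.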

Just one thing to check: how much RAM does mongodb have available and what is the size of the DB?

@gpetretto
Contributor Author

It is an Atlas M10 DB, so just 2GB of RAM (https://www.mongodb.com/pricing). The size of the DB is around 2.5GB.
As I said, the DB is not very powerful, but since adding that single index took it from completely unusable back to working well, I think it would be worth investigating further.
