Restoring state-sync snapshot takes a long time #698
Comments
Our state-sync scripts run every Sunday, so this issue is rather timely. Nois testnet state-sync took a bit over an hour today. In contrast, Stargaze mainnet state-sync took about 10 minutes. I am not knowledgeable enough to articulate the reasons behind it. As Simon mentioned, Evmos and Sei state-sync is also slow and at times unreliable.
Would it make sense to make an inventory of the number of application.db KV entries for all these networks and some other very busy networks, to see whether the rule holds or not?
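For anyone who wants to build such an inventory, here is a minimal Go sketch that counts entries in a node's application.db, assuming the node uses goleveldb as its backend. The path argument and the `s/k:<store>/` key-prefix grouping are assumptions about the Cosmos SDK multistore layout, not something confirmed in this issue, so treat the per-store breakdown as a heuristic.

```go
// count_kv.go: rough inventory of application.db entries (stop the node first).
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: count_kv <path-to-application.db>")
	}
	db, err := leveldb.OpenFile(os.Args[1], nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	total := 0
	perStore := map[string]int{}
	iter := db.NewIterator(nil, nil)
	for iter.Next() {
		total++
		// Multistore keys typically look like "s/k:<store>/..."; best-effort grouping.
		key := iter.Key()
		if bytes.HasPrefix(key, []byte("s/k:")) {
			if end := bytes.IndexByte(key[4:], '/'); end >= 0 {
				perStore[string(key[4:4+end])]++
			}
		}
	}
	iter.Release()
	if err := iter.Error(); err != nil {
		log.Fatal(err)
	}

	fmt.Println("total entries:", total)
	for store, n := range perStore {
		fmt.Printf("  %s: %d\n", store, n)
	}
}
```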
Adding profiles. The binary can be built using: One thing to mention is that Stargaze still has fast node disabled by default (this will be changed to enabled in our next upgrade). It seems that fast node is contributing to more CPU cycles and garbage collection.
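The exact build command referenced above is not reproduced here. As a generic alternative, below is a minimal sketch of how CPU/heap profiles can be collected from any Go binary by wiring in net/http/pprof; node binaries may already expose an equivalent endpoint via their config, so this is an illustration rather than the build change @jhernandezb used. The port 6060 is an arbitrary choice.

```go
// Minimal pprof wiring sketch: exposes /debug/pprof/* on localhost:6060.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	go func() {
		// Collect a 30s CPU profile with:
		//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
		// or a heap snapshot with:
		//   go tool pprof http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the node's main loop
}
```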
There are a couple of issues in iavl to assist with this. Here is one PR that aims to increase the speed: #664. This issue could also be due to how large your current state is. I'll transfer this issue to iavl so it can be focused on where the actual problem may lie. Lots of the compaction behavior is being cleaned up as we speak.
With iavl v1 this should be significantly faster. Would you want to test with this version?
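One way to test such a build would be to pin the chain binary to iavl v1 with a go.mod replace directive, roughly as sketched below. The version tag shown is an assumption and should be matched to whatever the iavl maintainers recommend for the chain's SDK version.

```
// go.mod of the chain binary (hedged sketch, version tag is an assumption)
replace github.com/cosmos/iavl => github.com/cosmos/iavl v1.0.0
```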
I recently started to play with using state-sync to create a fresh node in the Nois testnet. There we observed that applying the last chunk of the snapshot takes approximately 1 hour and requires between 13 and 14 GB of memory. As many community members consider this unexpectedly long, I thought it was worth debugging what's going on here.
Network
Snapshot
System and utilization
Logs
The following logs show how the snapshot is downloaded easily, the first 17 chunks are applied within 1 minute each, and the last chunk takes 62 minutes to apply.
Data/wasm directory
Size:
Element count:
Wasm files are restored using the extension from wasmd, but those are only 11 small files stored to disk:
Profiling
Some initial profiling data provided by @jhernandezb shows a lot of compacting activity. Maybe he can elaborate more on the profiling results.
Other notes
My gut feeling here is that beyond a certain number of database elements, things become slow. But I am far from a Go profiling or database expert and don't know how to debug this further. For us it is not a big deal right now and we can try to use fewer KV elements in the app. But maybe this is a good chance to prevent other mainnets from running into problems.