CV Optimistic Export and Import #878
Conversation
Do you use memiavl in evmos? That should only take dozens of minutes to restore. |
Will try it. Thank you. What do you think of the concept in this PR? Is it worth reducing CPU usage and inserting keys from a trusted node (for example, if a validator runs 2 nodes and wants to copy keys between goleveldb databases)? |
What do you mean by "final key/value pairs"? Do you mean only handling leaf key/values and skipping branch nodes? |
Usually, for the restoring node to write the tree to disk, the hashes of all children must be computed before their parents can be written. However, it is unnecessary to rehash all data and rebuild the IAVL tree; the receiver can simply write down the result of all that work.
Assuming the source is trusted, it is faster to send the resulting keys and values: the sender exports the final key/value pairs, and the receiver inserts them directly.
Then the node will start as if all key/values were in the database for the current version/blockchain height. The goal is to make a restore take no longer than copying a CSV file over ftp/http and inserting all key/values into leveldb. |
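To make that concrete, here is a minimal sketch of the receiver-side idea (not code from this PR); it assumes the sender and receiver both use a cometbft-db backed goleveldb and that the raw IAVL node entries can be copied verbatim:

```go
// Sketch only: copy the final key/value pairs from a trusted source DB into a
// fresh receiver DB without rebuilding or rehashing the IAVL tree.
package optimistic

import (
	dbm "github.com/cometbft/cometbft-db"
)

func copyFinalKVs(src, dst dbm.DB) error {
	it, err := src.Iterator(nil, nil) // full key range
	if err != nil {
		return err
	}
	defer it.Close()

	batch := dst.NewBatch()
	defer batch.Close()

	for ; it.Valid(); it.Next() {
		// Values are written exactly as the sender stored them,
		// so no hashing or tree rebuild happens on the receiver.
		if err := batch.Set(it.Key(), it.Value()); err != nil {
			return err
		}
	}
	if err := it.Error(); err != nil {
		return err
	}
	return batch.Write()
}
```

In practice the pairs would stream over p2p in snapshot chunks rather than be read from a local DB, but the receiver's write path stays the same: plain inserts, no recomputation.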
I see, so the data written into |
Yes, the same result is written to goleveldb (or another disk-based IAVL backend).
Yes, memiavl is a solution when RAM is sufficient. This PR is a solution when disk write speeds are acceptable (with no change in expected RAM), utilizing a more disk-based strategy. |
memiavl doesn't actually need much memory in our practice. |
Will get back to you after some testing. Having a hard time getting the right rpc/peers for evmos statesync. |
Tried to use memiavl to restore evmos. Each time, the data directory is completely deleted and init is run with the chain-id evmos_9001-2. After getting all statesync parts, the node crashes and upon start, this message appears
The faster iavl build time is confirmed (3+ hours; the same even with local snapshot files, no network dependency, and a restore on a fast machine with high RAM + RAID0 NVMe), but the node itself doesn't seem to get the chain-id, so full operation could not be confirmed on evmos. Noticed that there is a fair amount of code changes and a dependency on the cronos chain repository (github.com/crypto-org-chain/cronos/memiavl) necessary to enable memiavl for a new chain. That might be more than a chain developer is able to do and maintain. Another thought is to bring memiavl mainline into cosmos/iavl or cosmos/memiavl. yihuang, you are most familiar, so you can probably give a good comment on the options or the best way to implement. This optimistic export and import allows minimal code changes and faster restores. |
What is your testing procedure, and how large is the snapshot file? It's dozens of minutes for us. Do you use the local state sync commands ( |
Did not realize the chain-id was pulled from client.toml. Fixed that, and statesync with memiavl works. Will let evmos run for a few more hours to generate snapshots for local restore testing. Noticed a failed statesync snapshot; suspect that memiavl pruned the oldest height before the statesync snapshot could finish, and future statesync snapshots were not generated. Increased the number of memiavl snapshots held to allow more time for the statesync export. Do you think it would help to document the relation between these app.toml settings so that a working configuration is practical?
Statesync for evmos took about 1h40m to download and apply 2924 chunks over p2p. The snapshot was old and took 4 hours to BlockSync to the latest block. Locally, disk reads are fast: reading all 2924 chunks takes only 1m30s. Will retry when a local snapshot is successfully created for a more recent height. |
do you have the error message?
There could be different combinations, but for most of the use cases, I think |
I think most of the code changes are for the versiondb integration? memiavl integration itself should be fairly simple.
It's a good question on long-term maintenance. We had a few discussions on storage workgroup calls before, but the decision was to keep it as a third-party thing. Personally, I'm totally fine with and supportive of maintaining it under the cosmos org. |
Sharing a problem encountered with snapshots using memiavl + evmosd on mainnet:
ERR failed to create state snapshot err="failed to generate snapshot chunk 0: snapshot is not created yet: height: 19266752" height=192667
Looking at the snapshots directory, the initial restore height of 19072000 and the next snapshot are available; the rest appear to fail
data/memiavl.db has data as expected
evmosd snapshots list only shows the 2 successful data/snapshot heights
These are the current app.toml settings
Are you able to see the same issue on your side? Any alternate settings that you recommend checking? |
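For reference, the relation between the settings discussed above is roughly: a state-sync snapshot at a given height can only be exported while memiavl still retains its own snapshot for that height, so memiavl's snapshot-interval and snapshot-keep-recent need to cover the time a state-sync export takes. The key names below follow the cronos memiavl app.toml template and the standard [state-sync] section; the values are illustrative only, not a recommendation:

```toml
[memiavl]
enable = true
snapshot-interval = 10000      # how often memiavl writes its own snapshot
snapshot-keep-recent = 2       # keep enough memiavl snapshots to cover a slow state-sync export

[state-sync]
snapshot-interval = 10000      # ideally aligned with (a multiple of) the memiavl interval
snapshot-keep-recent = 2
```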
The fifth parameter in memiavlstore.SetupMemIAVL is
I think the second option should be good enough for the majority of use cases, that is why we set the |
Adjusted snapshot settings and retrying
Cosmos code has many unnamed variables. This can make adding a configuration variable difficult (find whether a parameter exists, determine whether the parameter can be passed, add the parameter at the call site, edit the function signature in the called library). Is there any "guidance" against a code structure such as the options-struct pattern sketched below this comment?
Would you prefer to start that yourself? Or could we prepare a repository and move that to cosmos org once it makes sense? Do you think it is generally "safe" to add memiavl to any chain of any sdk version with this pattern of code?
|
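As a hypothetical illustration of that kind of structure (not code from memiavl or the sdk), grouping configuration into a named options struct means a new setting can be added without threading another positional parameter through every call site:

```go
// Hypothetical sketch; type and function names are illustrative only.
package setup

import "github.com/cosmos/cosmos-sdk/baseapp"

// MemIAVLOptions groups configuration that would otherwise travel as
// positional (unnamed) parameters through several layers of calls.
type MemIAVLOptions struct {
	SdkCompat          bool   // e.g. an sdk-compatibility toggle
	SnapshotInterval   uint32 // memiavl snapshot interval
	SnapshotKeepRecent uint32 // how many memiavl snapshots to retain
	CacheSize          int    // store cache size
}

// SetupWithOptions wires the store from a single options value, so adding a
// new field later does not change the function signature at any call site.
func SetupWithOptions(opts MemIAVLOptions, baseAppOptions []func(*baseapp.BaseApp)) []func(*baseapp.BaseApp) {
	// ... construct the memiavl-backed store from opts and append the
	// corresponding baseapp options here ...
	return baseAppOptions
}
```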
Yeah, agree on this specific option, but it's usually static; there are several other options that are configurable in |
I think so; we've been using it in production for a while. The multistore wrapper might need some adjustments when updating to different sdk versions; we have compatible versions for sdk 0.46 and 0.47, and 0.50 is in a draft PR. |
Are the compatible version branches on https://github.com/crypto-org-chain/cronos/tree/main/memiavl ? |
Yeah, currently you need to follow the cronos release branch. I'm preparing a standalone repo at https://github.com/crypto-org-chain/memiavl and will make it public soon. |
Please share when available. Would like to run on a v0.50 testnet |
It's available here: crypto-org-chain/cronos#1331. It's able to build and run a basic devnet, but there are still many tests to fix. |
Implemented memiavl for gaiad in PR cosmos/gaia#2973. memiavl works well for gaiad on the theta testnet: statesync a node over p2p, then put it into validator mode. Snapshot creation also works. No issues so far. If you have any comments/suggestions regarding the method of integration for chains, your feedback is always welcome. |
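For anyone evaluating the same integration, the overall shape is small: a store-setup helper returns extra baseapp options that install the memiavl-backed multistore, and the app constructor appends them. The sketch below is purely illustrative; memiavlstore.SetupMemIAVL is the real entry point mentioned in this thread, but its argument list varies between releases, so a stand-in helper is shown instead:

```go
// Hypothetical shape of the wiring; setupMemIAVLOptions stands in for
// memiavlstore.SetupMemIAVL, whose real signature differs between releases.
package app

import "github.com/cosmos/cosmos-sdk/baseapp"

func setupMemIAVLOptions(existing []func(*baseapp.BaseApp)) []func(*baseapp.BaseApp) {
	return append(existing, func(bapp *baseapp.BaseApp) {
		// When [memiavl] is enabled in app.toml, the memiavl-backed commit
		// multistore would be installed here (e.g. via bapp.SetCMS), replacing
		// the default IAVL-backed rootmulti store.
	})
}
```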
@yihuang Do you know if the statesync snapshot/restore works properly for memiavl + cosmwasm module chains? |
I think so; it only replaces the root multistore, and the other parts are still managed by the sdk, but we didn't test it though. |
Was not able to locate memiavl in: Found this with a cosmos-sdk v0.46 reference. Is this the right branch for v0.46? Also is it difficult to modify for cosmos-sdk v0.45? |
It's a bit hard to manage the release; it'll be easier with a standalone repo. I suggest following Cronos's https://github.com/crypto-org-chain/cronos/blob/release/v1.0.x/go.mod#L222
I think it's not difficult, since the store interfaces don't change much from 0.45 to 0.46. |
Attempting to create a v0.45 branch for memiavl based on Jackal Chain. Here is the current progress code with the current errors; rootmulti is not connecting properly yet.
https://github.com/chillyvee/cronos/tree/cv045wip
https://github.com/chillyvee/canine-chain/tree/cv3.1.3memiavl
The data/memiavl.db directory appears. With app.toml [state-sync] configured, the error is:
With app.toml [state-sync] removed, the error is:
The canined node starts normally after app.toml [memiavl] is removed. Could it be a problem with cosmwasm trying to pin codes early (InitializePinnedCodes)?
Removing wasm InitializePinnedCodes and disabling state-sync allows the app to start. Suspect this is okay since memiavl directly triggers the state-sync snapshot Create() after the memiavl snapshot is finished.
and editing canined/app/app.go
data/memiavl.db/* fills up with data quickly after start, but the restore fails near the end of the statesync restore, where cosmwasm attempts to restore the wasm files
Looks more like a cosmwasm + memiavl problem and not directly a cosmos-sdk v0.45 + memiavl problem. Disabling pinned codes after the wasm restore finalizes
But the restore fails with an appHash verification problem
Have you tried memiavl with other cosmwasm-enabled chains? If you have a moment, can you suggest how to handle cosmwasm wasm file restores? If you don't see the reason, please let us know so we can reach out to Confio. |
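For context on the pinning being disabled above: in the standard wasmd app template, pinned codes are initialized right after the latest version is loaded, which on a freshly state-synced node runs before the store contains any data. The excerpt below is adapted from the upstream wasmd app.go (canined's exact code may differ) and assumes the usual fmt/tmos/tmproto imports:

```go
// Adapted from the standard wasmd app.go constructor (approximate).
// Pinned codes are loaded from the wasm KVStore immediately after
// LoadLatestVersion, i.e. before a state-sync restore has populated it.
if loadLatest {
	if err := app.LoadLatestVersion(); err != nil {
		tmos.Exit(err.Error())
	}
	ctx := app.BaseApp.NewUncachedContext(true, tmproto.Header{})
	if err := app.WasmKeeper.InitializePinnedCodes(ctx); err != nil {
		tmos.Exit(fmt.Sprintf("failed to initialize pinned codes: %s", err))
	}
}
```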
wasm InitializePinnedCodes calls cosmos-sdk, which calls rootmulti GetKVStore.
In this case cronos/store/rootmulti.(*Store).GetKVStore(0x0?, {0x0?, 0x0?}) immediately panics. Adding cosmwasm seems to require a way to directly get a named store for the wasmd keeper to read and write to. Should cronos/store/rootmulti.GetKVStore() proxy a call to CacheMultiStore to avoid the panic?
Attempted to give direct access to stores in this commit:
This avoids wasm crashing on restore/startup, and the wasm files appear, but maybe the function call is wrong for GetKVStore, so the apphash doesn't match after the wasm restore. Statesync restore without memiavl (enabled=false) succeeds:
|
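To illustrate the question (purely hypothetical; the field and method layout below does not match the real cronos/store/rootmulti code), the proxying idea would look roughly like this:

```go
// Hypothetical sketch of proxying GetKVStore through a cache multistore so
// that callers such as wasmd's InitializePinnedCodes get a usable store
// instead of a nil dereference/panic.
func (rs *Store) GetKVStore(key storetypes.StoreKey) storetypes.KVStore {
	return rs.CacheMultiStore().GetKVStore(key)
}
```

Note that writes made to a cache multistore are only persisted when its Write() method is called; if the wasm restore writes through such a store and nothing flushes it into the committed state, an appHash mismatch like the one described above would be an expected symptom.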
You can try removing this check; it should be fine, especially for read-only cases |
For statesync cosmwasm restore, write access is required. Can you recommend a different object to return? |
Attempted to use the recommended branch/commit for the v0.46 target using the gitopia chain (without cosmwasm). Without memiavl, the chain statesync restore is successful. Followed the replace to utilize memiavl within the same branch |
This branch downgrades the commit above to match cosmos-sdk v0.46.13. This branch of gitopia uses the branch above. On statesync restore, there is an apphash mismatch error |
with sdk 0.46/0.45, the option |
Just tagged a new version, please update to |
Success on statesync for jackal, memiavlSdk46Compact = true
Had some trouble during go mod tidy due to some cosmos-sdk dependencies. Will report back once that is resolved. Tagging with matching cosmos-sdk versions can be helpful for picking the right version. |
great!
you'll need a small change in the versiondb setup code, please check the patch here: crypto-org-chain/cronos#1339 |
Thank you. Will try again when there is time and share the result |
I don't think a new Export is needed; instead of it, you can use immutable_tree/TraverseStateChanges and mutable_tree/SaveChangeSet.
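A rough sketch of what that suggestion would look like (signatures are approximate and may differ between iavl versions; the function and variable names are illustrative):

```go
package optimistic

import "github.com/cosmos/iavl"

// replayChanges copies state changes from a trusted source tree into a fresh
// target tree, one saved version per change set, instead of a new Export format.
// Signatures are approximate; check the iavl version in use.
func replayChanges(src *iavl.ImmutableTree, dst *iavl.MutableTree, first, latest int64) error {
	return src.TraverseStateChanges(first, latest, func(version int64, cs *iavl.ChangeSet) error {
		_, err := dst.SaveChangeSet(cs)
		return err
	})
}
```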
Optimistic Export and Import aims to reduce p2p statesync recovery times.
Validators attempting to recover nodes for cosmos-sdk chains with significant state size may be slashed for downtime prior to recovery.
Although all data is verified during the tree build, rebuilding the entire IAVL tree from the minimal data sent over statesync incurs high CPU overhead.
Evmos, for example, takes over 16 hours to rebuild IAVL on a "fast" CPU/disk machine and can appear to "never finish". The node must then fast-sync the 16 hours of missed blocks, which can take an additional 4-5 hours. Slashing for downtime kicks in within 30 hours, meaning a validator must succeed on the first try or be slashed.
Since statesync appears to "hang", validators download a 244GB file from Polkachu to restore data directories. This is possible because Polkachu is a trusted party. Validators do not appear to validate the correctness of the IAVL tree, so this method is less trusted but necessary. The problem is that Polkachu only makes one snapshot every 24 hours: a validator may need to fast-sync up to 23 hours of lost blocks, wait for the next Polkachu snapshot, or try another, less-known snapshot.
Optimistic statesync enables faster database import at the restoring node by directly inserting the final key/value pairs into the database. As long as validators statesync from a trusted node, this can be a good alternative to downloading data directory zip files.
Cosmos-sdk should broadcast a different snapshot format so that nodes can select optimistic statesync node sources.
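One way to express that (hypothetical; the constant below does not exist in cosmos-sdk today) is to advertise optimistic snapshots under their own format number, since peers already negotiate snapshot formats and reject ones they do not support:

```go
// Hypothetical format number for optimistic snapshots. cosmos-sdk currently
// advertises IAVL snapshots under snapshottypes.CurrentFormat, and a syncing
// node rejects formats it does not recognize, so a distinct value would let
// nodes opt in to optimistic statesync sources.
const SnapshotFormatOptimistic uint32 = 1000 // illustrative value only
```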
There are some code locations below where there may be more efficient methods of obtaining the raw key/value storage than recalculating it.
Comments welcome.