-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
398: Statediff missing parent blocks automatically. #399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also need to handle updating canonical
in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.
Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more. The DB procedure that's currently used should still work with this update to the indexing.
Unless I am misunderstanding, this would only ever be an issue if tracking head and outputting to CSV, but is that something that we would ever want to do? Updating 'canonical' status is just one of many problems in that case, since CSV also has no way for us to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we see find skipped parents when the chain is extended, or any other such inconsistency that is bound to arise, leaving a pretty unreliable record. CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream. |
The core issue of #398 is to ensure that the DB contents are valid and consistent when the chain is reorganized, and reflect the current state of the chain. That implies a determination about canonicity as well as completeness, so I don't think the concerns can be separated. |
It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.
This is true. I should have raised a concern earlier. We have no fallback mechanism for backfilling, so this needs to work. For detecting in-progress jobs I suppose we will at worst duplicate a few blocks. |
The COPYFROM mode is used while tracking head, not the CSV persisting mode. Currently we only use the CSV mode in eth-statediff-service for historical data. |
I stand corrected, as discussed the only thing using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to approve- LGTM 👍
Port of cerc-io/go-ethereum#399 and cerc-io/go-ethereum#397. Co-authored-by: Thomas E Lackey <[email protected]> Reviewed-on: https://git.vdb.to/cerc-io/plugeth-statediff/pulls/8
Required DB updates: cerc-io/ipld-eth-db#136
Related tests: https://git.vdb.to/cerc-io/system-tests/pulls/8
This fixes #398 . After statediffing a block which we received from a ChainEvent, we now check if its parent exists in the index. If not, we statediff it as well, checking for its parent, etc. up to a maximum backfill limit.
This allows us to account for both skipped blocks because the chain was extended or missed blocks because the chain was reorganized.
To keep everything straight in the DB, we use a new 'canonical' boolean flag on the block in eth.header_cids. Whenever a new block is inserted in the DB, it is marked as canonical. If a new block is inserted at the same height later (eg, because of a reorg), any pre-existing blocks are marked as non-canonical at the same time.
This flag will allow for greatly simplified canonical checks in ipld-eth-server (see also cerc-io/ipld-eth-server#251), eg: