398: Statediff missing parent blocks automatically. #399

telackey · 2023-07-13T21:45:14Z

Required DB updates: cerc-io/ipld-eth-db#136
Related tests: https://git.vdb.to/cerc-io/system-tests/pulls/8

This fixes #398 . After statediffing a block which we received from a ChainEvent, we now check if its parent exists in the index. If not, we statediff it as well, checking for its parent, etc. up to a maximum backfill limit.

This allows us to account for both skipped blocks because the chain was extended or missed blocks because the chain was reorganized.

To keep everything straight in the DB, we use a new 'canonical' boolean flag on the block in eth.header_cids. Whenever a new block is inserted in the DB, it is marked as canonical. If a new block is inserted at the same height later (eg, because of a reorg), any pre-existing blocks are marked as non-canonical at the same time.

Note: At the time, we only register with geth to be notified of ChainEvents, which are for canonical blocks. We do not listen for ChainSideEvents, which are for non-canonical blocks. So, as it stands, any block geth notifies us about is known to be canonical at the time of the notification.

This flag will allow for greatly simplified canonical checks in ipld-eth-server (see also cerc-io/ipld-eth-server#251), eg:

SELECT * from eth.state_cids s, eth.header_cids h WHERE
   s.state_leaf_key = $1
   and s.block_number <= $2
   and s.block_number = h.block_number
   and s.header_id = h.block_hash
   and h.canonical = true
   ORDER BY s.block_number DESC
   LIMIT 1;

roysc

We'll also need to handle updating canonical in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.

Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more. The DB procedure that's currently used should still work with this update to the indexing.

statediff/indexer/database/sql/writer.go

telackey · 2023-07-17T19:49:22Z

We'll also need to handle updating canonical in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.

Unless I am misunderstanding, this would only ever be an issue if tracking head and outputting to CSV, but is that something that we would ever want to do? Updating 'canonical' status is just one of many problems in that case, since CSV also has no way for us to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we see find skipped parents when the chain is extended, or any other such inconsistency that is bound to arise, leaving a pretty unreliable record.

CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream.

telackey · 2023-07-17T20:04:10Z

Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more.

The core issue of #398 is to ensure that the DB contents are valid and consistent when the chain is reorganized, and reflect the current state of the chain. That implies a determination about canonicity as well as completeness, so I don't think the concerns can be separated.

roysc · 2023-07-18T04:36:16Z

CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream.

It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.

Updating 'canonical' status is just one of many problems in that case, since CSV also has no way for us to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we see find skipped parents when the chain is extended, or any other such inconsistency that is bound to arise, leaving a pretty unreliable record.

This is true. I should have raised a concern earlier. We have no fallback mechanism for backfilling, so this needs to work. For detecting in-progress jobs I suppose we will at worst duplicate a few blocks.

i-norden · 2023-07-18T13:02:26Z

It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.

The COPYFROM mode is used while tracking head, not the CSV persisting mode. Currently we only use the CSV mode in eth-statediff-service for historical data.

roysc · 2023-07-18T14:12:46Z

I stand corrected, as discussed the only thing using file mode in production is eth-statediff-service so there's no problem here.

i-norden

Forgot to approve- LGTM 👍

Port of cerc-io/go-ethereum#399 and cerc-io/go-ethereum#397. Co-authored-by: Thomas E Lackey <[email protected]> Reviewed-on: https://git.vdb.to/cerc-io/plugeth-statediff/pulls/8

WIP: Statediff missing parent blocks automatically.

46665cc

telackey self-assigned this Jul 13, 2023

telackey added 6 commits July 13, 2023 18:41

gofmt

304270d

Use updated ipld-eth-db

f5881c3

Passing tests

3114513

Use local ipld-eth-db.

ffbb9e0

Build ipld-eth-db

317d79c

Missed the tag.

a0565e8

telackey requested review from i-norden and dboreham July 14, 2023 20:29

telackey marked this pull request as ready for review July 14, 2023 20:37

roysc reviewed Jul 17, 2023

View reviewed changes

statediff/indexer/database/sql/writer.go Show resolved Hide resolved

Use merged version of ipld-eth-db.

78168b7

i-norden approved these changes Jul 18, 2023

View reviewed changes

telackey merged commit 1b72d2f into v1.11.6-statediff-v5 Jul 18, 2023
5 checks passed

telackey deleted the telackey/398 branch July 18, 2023 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

398: Statediff missing parent blocks automatically. #399

398: Statediff missing parent blocks automatically. #399

telackey commented Jul 13, 2023 •

edited

Loading

roysc left a comment

telackey commented Jul 17, 2023 •

edited

Loading

telackey commented Jul 17, 2023 •

edited

Loading

roysc commented Jul 18, 2023

i-norden commented Jul 18, 2023

roysc commented Jul 18, 2023

i-norden left a comment

398: Statediff missing parent blocks automatically. #399

398: Statediff missing parent blocks automatically. #399

Conversation

telackey commented Jul 13, 2023 • edited Loading

roysc left a comment

Choose a reason for hiding this comment

telackey commented Jul 17, 2023 • edited Loading

telackey commented Jul 17, 2023 • edited Loading

roysc commented Jul 18, 2023

i-norden commented Jul 18, 2023

roysc commented Jul 18, 2023

i-norden left a comment

Choose a reason for hiding this comment

telackey commented Jul 13, 2023 •

edited

Loading

telackey commented Jul 17, 2023 •

edited

Loading

telackey commented Jul 17, 2023 •

edited

Loading