Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add canonical column to eth.header_cids #136

Merged
merged 6 commits into from
Jul 18, 2023
Merged

Add canonical column to eth.header_cids #136

merged 6 commits into from
Jul 18, 2023

Conversation

telackey
Copy link
Contributor

@telackey telackey commented Jul 13, 2023

Add boolean 'canonical' column to eth.header_cids.

At any given moment, there should only ever be one canonical block at a particular height.

@telackey telackey self-assigned this Jul 13, 2023
Copy link
Collaborator

@roysc roysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoyingly, we'll also need to update the TYPE header_result which replicates header_cids for function interfaces in 00016_create_stored_procedures.sql.

Also, from a design standpoint, I'm not sure about this, since canonicity isn't an intrinsic property of the block, and it means updating existing blocks on reorgs.

This is simple and performant (though the canonical check should never cause much bottleneck) but somewhat redundant when the vast majority of blocks are canonical anyway, and past a certain point we can safely ignore (or drop) non-canon blocks.

@dboreham
Copy link

it means updating existing blocks on reorgs

Isn't that the whole point? (or conversely : how would we store state about what's canonical without updating that state on reorg?)

@telackey
Copy link
Contributor Author

Additional context:

cerc-io/ipld-eth-server#251

@telackey
Copy link
Contributor Author

I updated TYPE header_result and I also rewrote the canonical_header_hash() function simply to use this column. I also switched its language to sql since a sql function returning a query can be inlined by the query planner. I haven't yet confirmed whether it is inlined in our actual queries though.

I would like to drop the function entirely (see cerc-io/ipld-eth-server#251) but this is a step in that direction.

@telackey telackey requested a review from roysc July 17, 2023 16:46
@roysc
Copy link
Collaborator

roysc commented Jul 17, 2023

Isn't that the whole point? (or conversely : how would we store state about what's canonical without updating that state on reorg?)

I mean that it doesn't have to be stored as part of the block. We could have a transient meta table keeping track of the non-canonical blocks from recent epochs. That would be append-only, so it would jibe with the copying-from-CSV pattern used in production.

@dboreham
Copy link

Isn't that the whole point? (or conversely : how would we store state about what's canonical without updating that state on reorg?)

I mean that it doesn't have to be stored as part of the block. We could have a transient meta table keeping track of the non-canonical blocks from recent epochs. That would be append-only, so it would jibe with the copying-from-CSV pattern used in production.

Does this assume that we delete the non-canonical blocks when they fall below the finality threshold?

@roysc
Copy link
Collaborator

roysc commented Jul 17, 2023

Does this assume that we delete the non-canonical blocks when they fall below the finality threshold?

Right, they could be pruned by some housekeeping service (maybe along with the blocks themselves)

@dboreham
Copy link

Does this assume that we delete the non-canonical blocks when they fall below the finality threshold?

Right, they could be pruned by some housekeeping service (maybe along with the blocks themselves)

I think we need to retain non-canonical blocks from the pre-merge period because they could be useful for some analytics.
It feels safer to mark blocks as non-canonical vs deleting.

@telackey
Copy link
Contributor Author

telackey commented Jul 17, 2023

There are various timeliness and correctness reasons that make this preferrable. When tracking head and handling a reorg, the related change in the statediffing code inserts the new, canonical block and marks the old blocks as non-canonical in the same DB transaction. That provides any instance of ipld-eth-db reading that DB with a consistent view of the chain at any moment. A secondary housekeeping service could not provide that consistency without increased difficulty and complexity.

@roysc
Copy link
Collaborator

roysc commented Jul 18, 2023

A secondary housekeeping service could not provide that consistency without increased difficulty and complexity.

To clarify, I'm referring to a service that garbage-collects the non-canonical markers from finalized epochs. There should be no ill effects from leaving them there, we would just want to keep that table tidy.

The transactional semantics would be essentially the same - when the new canon block is inserted, the hashes of any displaced blocks are added to the meta table in the same tx. Any hashes present in the table are known as non-canonical, and we don't have to touch existing data.

Copy link
Collaborator

@i-norden i-norden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the (not super strong) opinion that marking the blocks canonical in-place is fine, particularly since we do want to retain non-canonical blocks rather than garbage collect them below the finality threshold. Adding another table would require performing another join or query on said table in order to determine canonicity of the eth.header_cids content. If we were using this meta data table to inform garbage collection of eth.header_cids, such that we could then blindly treat the eth.header_cids as a pure, canonical view I would be more in favor of this approach.

The other issue is that we need to represent "canonicity" even in the non-final region of the chain, the latest two epochs. All we are really representing in this region is a consistent view of what geth considers the current working tip of the chain.

My only other comment was in regards to removing the canonical_header_hash function, but it looks like you already adjusted this to simply use this new field and are planning on removing it entirely after removing its usage in ipld-eth-server 👍

@telackey
Copy link
Contributor Author

My only other comment was in regards to removing the canonical_header_hash function, but it looks like you already adjusted this to simply use this new field and are planning on removing it entirely after removing its usage in ipld-eth-server 👍

Yep, exactly.

@telackey telackey merged commit 1b922db into v5 Jul 18, 2023
2 checks passed
@telackey telackey deleted the telackey/398 branch July 18, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants