Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(op-batcher): altda->ethda failover #13

Open
wants to merge 5 commits into
base: feat--multiframe-altda-channel
Choose a base branch
from

Conversation

samlaf
Copy link
Collaborator

@samlaf samlaf commented Nov 7, 2024

DON'T MERGE!

Plan is to upstream this to op's repo. Created here to get review from the team first.

This PR builds on top of the feat--multiframe-altda-channel changes (already upstreamed to op's repo, waiting for review there)

It contains 2 commits:

  1. failover test: 61d7578
  2. failover logic: 5712618

Right now failover is done to calldata txs because that was trivial whereas failing over to blobs or their auto mode that switches between blobs and calldata would need a nontrivial refactor and some thinking. Not sure its worth putting effort into this atm given that the whole point of failover is that it should happen very rarely and also not last very long, but let me know if you guys think otherwise.

@samlaf samlaf marked this pull request as draft November 7, 2024 21:19
@samlaf samlaf force-pushed the samlaf/feat--op-batcher-altda-failover-to-ethda branch 2 times, most recently from adfa7ce to 61d7578 Compare November 11, 2024 06:16
@samlaf samlaf changed the base branch from develop to feat--multiframe-altda-channel November 11, 2024 07:07
@samlaf samlaf marked this pull request as ready for review November 11, 2024 07:07
@samlaf samlaf requested review from bxue-l2 and epociask November 11, 2024 07:08
@bxue-l2
Copy link
Collaborator

bxue-l2 commented Nov 12, 2024

At the high level, it usually takes us about 1 hour to fix the problem

@@ -130,6 +134,10 @@ func (s *FakeDAServer) HandleGet(w http.ResponseWriter, r *http.Request) {

func (s *FakeDAServer) HandlePut(w http.ResponseWriter, r *http.Request) {
time.Sleep(s.putRequestLatency)
if s.failoverCount > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point to decrement failoverCount, then actually handle the put

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just to simplify testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we only want to cause a temporary failure. So this way we can say "pretend you are down for the next failoverCount calls", and after that the system becomes normal again.

)

// TestBatcher_FailoverToEthDA_FallbackToAltDA tests that the batcher will failover to ethDA
// if the da-server returns 503, and then fallback to altDA once altDA is available again
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we always try altda first for every dispersal, then retry for non-altDA after sufficient retry. Wording seems odd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the wording: 6928901


countEthDACommitment := uint64(0)

// Most likely, sequence of blocks will be: altDA, ethDA, ethDA, altDA, altDA, altDA.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why most likely? And why two ethDA in a row, because we set failoverCount=2, but why altDA in the beginning?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this new comment: 71d8697

@samlaf samlaf force-pushed the samlaf/feat--op-batcher-altda-failover-to-ethda branch from 71d8697 to a881df5 Compare November 26, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants