- Background
- Current status
- How to reproduce this issue
- Root cause
- How is the issue resolved
- How to workaround this issue
If etcd crashes while processing a defragmentation operation, then when the instance starts again it may re-apply some entries which have already been applied, and eventually the member's data & revision may become inconsistent with the other members. Please note that there is no impact if the defragmentation operation is performed offline using etcdutl.
Note that usually there is no data loss, and clients can always get the latest correct data. The only issue is that the problematic etcd member's revision may be a little larger than that of the other members. However, if etcd re-applies some conditional transactions, it may also cause data inconsistency.
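To make the conditional-transaction risk concrete, here is a hypothetical clientv3 transaction (the key names and values are made up, not from the original report) whose outcome depends on the current state. If the crashed member re-applies such an entry during WAL replay, the comparison can evaluate differently than it did the first time, so that member's data can diverge from members that applied the entry only once.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // assumption: local single endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// "If /state is still 'old', set it to 'new'; otherwise record a conflict."
	// Applying this once flips /state to 'new'. If the crashed member replays
	// the same entry a second time, the Else branch runs instead, so its data
	// diverges from the members that applied the entry only once.
	resp, err := cli.Txn(context.Background()).
		If(clientv3.Compare(clientv3.Value("/state"), "=", "old")).
		Then(clientv3.OpPut("/state", "new")).
		Else(clientv3.OpPut("/conflicts", "state-already-updated")).
		Commit()
	if err != nil {
		panic(err)
	}
	fmt.Println("took Then branch:", resp.Succeeded)
}
```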
This is a regression introduced in pull/12855, and all existing 3.5.x releases (3.5.0 ~ 3.5.5) are impacted. Note that the previous critical issue issues/13766 was also caused by the same PR (12855).
etcd 3.4 doesn't have this issue. Note that etcd 3.5 is also unaffected if the defragmentation operation is performed offline using etcdutl.
It should be very hard to reproduce this issue in a production environment, because:
- Users rarely execute the defragmentation operation.
- The probability of etcd crashing during a defragmentation operation is low.
- Even if etcd crashes during a defragmentation operation, the issue isn't guaranteed to occur. If there is no write traffic while the defragmentation is being performed, the issue will not occur.
I just delivered PR pull/14730 for the main branch (3.6.0) and will backport it to release-3.5 later.
The fix will be included in etcd v3.5.6.
It's really interesting and funny that the PR number 14730 is very similar to that of the previous important issue, 14370.
Run a load test against an etcd cluster, and perform a defragmentation operation on one member. Kill that member while the defragmentation operation is in progress. Afterwards, start the member again; its revision may then be inconsistent with the other members.
Usually the problematic member's revision will be larger than the other members', because etcd re-applies some duplicated entries.
You can also reproduce this issue by executing the E2E test case TestLinearizability.
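The manual steps above can be sketched with the clientv3 API as follows. This is only a minimal sketch: the endpoints and keys are assumptions, and killing the member (e.g. `kill -9` on the etcd process) and restarting it still has to be done out of band while `Defragment` is in flight.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		// Assumption: a local 3-member cluster on these client endpoints.
		Endpoints:   []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Generate continuous write traffic in the background.
	go func() {
		for i := 0; ; i++ {
			cli.Put(context.Background(), fmt.Sprintf("key-%d", i%100), "value")
		}
	}()

	// Trigger online defragmentation on one member. Kill that member's process
	// while this call is still in progress, then restart it and compare the
	// revisions reported by `etcdctl endpoint status` across members.
	if _, err := cli.Maintenance.Defragment(context.Background(), "127.0.0.1:2379"); err != nil {
		fmt.Println("defragment:", err)
	}
}
```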
When etcd processes a defragmentation operation, it commits all pending data into boltDB, but not the consistent index, so the persisted data may not match the consistent index. If etcd crashes for whatever reason during or immediately after the defragmentation operation, then when it starts again it replays the WAL entries starting from the latest snapshot. Accordingly, it may re-apply some entries which have already been applied, and eventually its revision becomes inconsistent with the other members.
Specifically, when etcd processes a defragmentation operation, it calls unsafeCommit, which doesn't call the OnPreCommitUnsafe hook, so the consistent index isn't persisted.
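As a rough illustration of the mechanism, here is a self-contained toy model (not etcd's actual backend code; only the names commit, unsafeCommit and OnPreCommitUnsafe are taken from the description above): the regular commit path runs a pre-commit hook that persists the consistent index in the same transaction, while calling unsafeCommit directly skips that hook.

```go
package main

import "fmt"

// hooks models the backend hook; OnPreCommitUnsafe is where the consistent
// index would be persisted before the transaction is committed.
type hooks interface {
	OnPreCommitUnsafe(tx *batchTx)
}

type indexHook struct{ consistentIndex *uint64 }

func (h indexHook) OnPreCommitUnsafe(tx *batchTx) {
	// Persist the consistent index together with the pending data.
	tx.persisted["consistent_index"] = fmt.Sprint(*h.consistentIndex)
}

type batchTx struct {
	hooks     hooks
	pending   map[string]string
	persisted map[string]string
}

// commit is the normal path: run the hook first, then do the actual commit.
func (t *batchTx) commit() {
	t.hooks.OnPreCommitUnsafe(t)
	t.unsafeCommit()
}

// unsafeCommit commits the pending data without running the hook; this is the
// path the defragmentation code used in the affected releases.
func (t *batchTx) unsafeCommit() {
	for k, v := range t.pending {
		t.persisted[k] = v
	}
	t.pending = map[string]string{}
}

func main() {
	ci := uint64(42)
	tx := &batchTx{
		hooks:     indexHook{consistentIndex: &ci},
		pending:   map[string]string{"key": "value"},
		persisted: map[string]string{},
	}

	// Defrag-style commit: the data is persisted, but the consistent index is
	// not, so a crash at this point makes the restarted member re-apply WAL
	// entries that are already reflected in the data.
	tx.unsafeCommit()
	fmt.Println("after unsafeCommit:", tx.persisted) // no consistent_index

	tx.pending["key2"] = "value2"
	tx.commit()
	fmt.Println("after commit:", tx.persisted) // consistent_index persisted
}
```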
The fix is simple: call OnPreCommitUnsafe in the method unsafeCommit instead of in commit. Please refer to pull/14730.
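In terms of the toy model above, the fix corresponds to running the pre-commit step inside unsafeCommit, so that every commit path, including the defragmentation one, persists the consistent index together with the data. The names and structure below are illustrative, not etcd's actual code.

```go
package main

import "fmt"

type batchTx struct {
	consistentIndex uint64
	pending         map[string]string
	persisted       map[string]string
}

// unsafeCommit now performs the pre-commit step itself before committing,
// which is the essence of the change in pull/14730 (sketched, not verbatim).
func (t *batchTx) unsafeCommit() {
	t.persisted["consistent_index"] = fmt.Sprint(t.consistentIndex) // formerly only done in commit
	for k, v := range t.pending {
		t.persisted[k] = v
	}
	t.pending = map[string]string{}
}

// commit simply delegates to unsafeCommit.
func (t *batchTx) commit() { t.unsafeCommit() }

func main() {
	tx := &batchTx{consistentIndex: 42, pending: map[string]string{"key": "value"}, persisted: map[string]string{}}
	tx.unsafeCommit() // a defrag-style commit now also persists the consistent index
	fmt.Println(tx.persisted)
}
```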
If you run into this issue, you need to remove the problematic member and clean up its local data. Afterwards, add the member back into the cluster; it will then sync the data from the leader automatically.
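A minimal sketch of those recovery steps using the clientv3 Cluster API; the endpoints, member ID, peer URL and data directory below are placeholders, not values from the original report. The same steps can be performed with `etcdctl member remove` and `etcdctl member add`.

```go
package main

import (
	"context"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the healthy members only.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:22379", "127.0.0.1:32379"}, // assumption
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// 1. Remove the problematic member (look up its ID via MemberList or
	//    `etcdctl member list`). The ID here is a placeholder.
	const badMemberID = uint64(0xdeadbeefcafe)
	if _, err := cli.MemberRemove(ctx, badMemberID); err != nil {
		panic(err)
	}

	// 2. Clean up the removed member's local data directory (path is a placeholder).
	if err := os.RemoveAll("/var/lib/etcd/member-1"); err != nil {
		panic(err)
	}

	// 3. Add the member back. Once restarted with an empty data directory and
	//    --initial-cluster-state=existing, it syncs a snapshot from the leader.
	if _, err := cli.MemberAdd(ctx, []string{"http://127.0.0.1:2380"}); err != nil {
		panic(err)
	}
}
```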