
RecoveryHelper to speed up recovery after restart #471

Open

wants to merge 2 commits into master
Conversation

justpresident

This patch introduces a change in the WAL and the recovery process.
The idea is to avoid the expensive scan of all table files, which can take longer than one hour for large tables, and to maintain a recovery record in the WAL instead. This record is written at the beginning of the WAL log and is not surrounded by 'begin' and 'end' markers.
The following situations are possible:
a. There is no recovery record, and there are normal records in the WAL
b. There is no recovery record and no other records in the WAL
c. There is a recovery record, and there are normal records in the WAL
d. There is a recovery record and no other records in the WAL
Since the recovery record is written at the beginning, it contains the latest offset only when there is nothing else in the log or the other records are invalid (temp files are deleted). So in cases a, c and d the recovery process will pick the committed file from the WAL with the highest offset, either from the recovery record or from the normal records.
In case b, when the WAL log is empty or doesn't exist, the latest offset will be discovered through a full recursive folder scan.
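As an editorial illustration of the selection described above (a hypothetical sketch, not the patch's actual code; the class name and the offset-in-filename pattern are assumptions based on the HDFS connector's usual committed-file naming):

import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecoverySelectionSketch {
    // Assumes committed file names end with "+<startOffset>+<endOffset>.<extension>".
    private static final Pattern END_OFFSET = Pattern.compile("\\+(\\d+)\\.[^.]+$");

    static long endOffset(String committedFile) {
        Matcher m = END_OFFSET.matcher(committedFile);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Cases a, c, d: candidates come from the WAL (recovery record plus normal records).
    // Case b: the WAL is empty or missing, so candidates come from a recursive folder scan.
    static Optional<String> latestCommittedFile(Collection<String> walFiles,
                                                Collection<String> scannedFiles) {
        Collection<String> candidates = walFiles.isEmpty() ? scannedFiles : walFiles;
        return candidates.stream().max(Comparator.comparingLong(RecoverySelectionSketch::endOffset));
    }
}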

@ghost

ghost commented Nov 18, 2019

@confluentinc It looks like @justpresident just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@alexandrfox

@kkonstantine would you please check this out? We've been running this in production for a while now.

@kkonstantine kkonstantine requested a review from a team December 10, 2019 17:25
Contributor

@ncliang left a comment

@justpresident Thanks for making this PR. I'm curious what sort of speedup you are seeing in your environment?

return instance;
}

private final Map<TopicPartition, List<String>> files = new HashMap<>();
Contributor

I think this singleton instance will potentially be accessed by multiple threads when we have multiple tasks running on a single worker. The map probably needs to be a ConcurrentHashMap, or its access protected by locks.
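For context, a minimal thread-safe variant could look like the sketch below. This is an assumption-laden illustration, not the PR's actual class: the two-argument addFile signature and the getFiles accessor are invented here for clarity.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.kafka.common.TopicPartition;

public class RecoveryHelper {
    private static final RecoveryHelper INSTANCE = new RecoveryHelper();

    // ConcurrentHashMap plus concurrent lists keep addFile() safe when several tasks
    // on the same worker share this singleton.
    private final Map<TopicPartition, List<String>> files = new ConcurrentHashMap<>();

    public static RecoveryHelper getInstance() {
        return INSTANCE;
    }

    // Hypothetical signature: the partition is passed in explicitly for illustration.
    public void addFile(TopicPartition tp, String committedFile) {
        files.computeIfAbsent(tp, k -> new CopyOnWriteArrayList<>()).add(committedFile);
    }

    public List<String> getFiles(TopicPartition tp) {
        return files.getOrDefault(tp, Collections.emptyList());
    }
}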

Author

Ah, correct. The way we launch it is quite special: we launch all instances in standalone mode in Kubernetes, one worker per pod, so I had overlooked the possibility of having multiple workers on the same machine. Fixed.

@@ -128,6 +130,9 @@ public void apply() throws ConnectException {
WALEntry mapKey = new WALEntry(key.getName());
WALEntry mapValue = new WALEntry(value.getName());
entries.put(mapKey, mapValue);
if (value.getName().equals(RecoveryHelper.RECOVERY_RECORD_KEY)) {
Contributor

Does this do anything? RECOVERY_RECORD_KEY is written to the key here - wal.append(RecoveryHelper.RECOVERY_RECORD_KEY, fileStatusWithMaxOffset.getPath().toString()).
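In other words, since the recovery record is appended with RECOVERY_RECORD_KEY on the key side of the WAL entry, the check presumably belongs on the key rather than the value; roughly (a hedged fragment using the types from the surrounding diff, not the author's actual fix):

// Recovery record: the key carries RECOVERY_RECORD_KEY, the value carries the committed file path.
if (key.getName().equals(RecoveryHelper.RECOVERY_RECORD_KEY)) {
    // value.getName() would then be the latest committed file recorded at the start of the WAL
}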

Author

That was a mistake. Fixed it.

@@ -120,6 +121,7 @@ public void apply() throws ConnectException {
for (Map.Entry<WALEntry, WALEntry> entry: entries.entrySet()) {
String tempFile = entry.getKey().getName();
String committedFile = entry.getValue().getName();
RecoveryHelper.getInstance().addFile(committedFile);
Contributor

I am concerned that the lists in the map grow without bounds and eventually may cause OOM if the process is run long enough.
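One way to bound that growth, assuming recovery only ever needs the newest committed file per partition, would be to keep a single (file, end offset) entry per partition instead of an unbounded list. A hypothetical sketch (class and method names are invented for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;

public class LatestCommittedFileTracker {
    // One (file, endOffset) pair per partition instead of an unbounded list.
    private static final class Committed {
        final String file;
        final long endOffset;
        Committed(String file, long endOffset) {
            this.file = file;
            this.endOffset = endOffset;
        }
    }

    private final Map<TopicPartition, Committed> latest = new ConcurrentHashMap<>();

    public void record(TopicPartition tp, String file, long endOffset) {
        // Keep whichever entry has the higher end offset.
        latest.merge(tp, new Committed(file, endOffset),
                (oldVal, newVal) -> newVal.endOffset >= oldVal.endOffset ? newVal : oldVal);
    }

    public String latestFile(TopicPartition tp) {
        Committed c = latest.get(tp);
        return c == null ? null : c.file;
    }
}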

@justpresident
Author

@justpresident Thanks for making this PR. I'm curious what sort of speedup you are seeing in your environment?

The speedup of course depends on the number of existing files in the table. The initial scan, which usually takes around one hour for large tables, is eliminated completely; startup is now instant.

@pedro93

pedro93 commented Mar 2, 2022

Hello,
Any updates on this PR?

@justpresident
Author

I don't work with kafka-connect anymore and don't have such a setup with thousands of HDFS files to test, but it seems the problem was solved in a very similar way in #556.
Can someone please test, and if there is no problem, this PR can be closed.

@cla-assistant

cla-assistant bot commented Aug 27, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Roman Studenikin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
