[HUDI-8590] fix: wrong file path for consistent-bucket-commit-marker-file #12344

Open
wants to merge 2 commits into
base: master

TheR1sing3un
Member

issue: #12338

Change Logs

  1. wrong file path for consistent-bucket-commit-marker-file

Describe context and summary for this change. Highlight if any code was copied.

Impact

Describe any public API or user-facing feature change or any performance impact.
none

Risk level (write none, low medium or high below)

low
If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

1. wrong file path for consistent-bucket-commit-marker-file

Signed-off-by: TheR1sing3un <[email protected]>
@TheR1sing3un
Member Author

@beyond1920 @danny0405 Hi, I found a bug: the path used for the consistent-bucket commit marker file is incorrect. Please take a look. Thanks!

@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] size:M PR with lines of changes in (100, 300] and removed size:S PR with lines of changes in (10, 100] labels Nov 27, 2024
@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch from df0fcce to da34c08 Compare November 27, 2024 10:01
1. fix unable to load latest committed consistent-bucket-hash-metadata

Signed-off-by: TheR1sing3un <[email protected]>
@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch from efde0e3 to fb1ea61 Compare November 27, 2024 10:51
@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

final List<StoragePathInfo> hashingMetaFiles = metaFiles.stream().filter(hashingMetadataFilePredicate)
.sorted(Comparator.comparing(f -> f.getPath().getName()))

final TreeMap<String/*instantTime*/, Pair<StoragePathInfo/*hash metadata file path*/, Boolean/*commited*/>> versionedHashMetadataFiles = metaFiles.stream()
Contributor

@beyond1920 can you review, and add test cases please.

Member Author

@TheR1sing3un TheR1sing3un Nov 28, 2024

@beyond1920 can you review, and add test cases please.

Thanks! I will add more test cases to verify metadata correctness.

@yuzhaojing
Contributor

Hi, could you please describe what problems will be caused if this fix is not made? Will there be any issues regarding correctness?

@TheR1sing3un
Member Author

Hi, could you please describe what problems will be caused if this fix is not made? Will there be any issues regarding correctness?

There are two problems. First, the commit marker file is always created at the wrong path. Because the scan path is different from the path the marker was written to, the reader can never tell that a given version of the hash metadata has been committed. It therefore has to go to the timeline every time and then recreate the marker file, which puts a lot of pressure on the timeline. Second, in the current code, if there is exactly one uncommitted hash metadata file and it is still in pending state, the loader returns empty instead of the most recently committed hash metadata. This PR fixes both problems.
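To illustrate the second problem, here is a minimal sketch of the intended selection rule: walk the hash metadata versions from newest to oldest and return the newest one that has a commit marker, skipping a newer version that is still pending. This is not the code in this PR. The class name, the method name, and the `.hashing_meta` / `.commit` suffixes are assumptions made for the example, and the timeline check for uncommitted versions is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hudi.
class LatestCommittedHashingMetadata {
    // Assumed file suffixes; the real constants may differ.
    static final String META_SUFFIX = ".hashing_meta";
    static final String COMMIT_SUFFIX = ".commit";

    // Returns the instant time of the newest metadata version that has a commit marker.
    static Optional<String> latestCommittedInstant(List<String> fileNames) {
        // instantTime -> committed flag, sorted by instant time.
        TreeMap<String, Boolean> versions = new TreeMap<>();
        for (String name : fileNames) {
            if (name.endsWith(META_SUFFIX)) {
                String instant = name.substring(0, name.length() - META_SUFFIX.length());
                versions.putIfAbsent(instant, false);
            }
        }
        // Mark a version as committed only if its marker sits in the scanned location.
        for (String name : fileNames) {
            if (name.endsWith(COMMIT_SUFFIX)) {
                String instant = name.substring(0, name.length() - COMMIT_SUFFIX.length());
                versions.computeIfPresent(instant, (k, v) -> true);
            }
        }
        // Newest first: skip pending versions instead of giving up on the first one.
        for (Map.Entry<String, Boolean> e : versions.descendingMap().entrySet()) {
            if (e.getValue()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, a listing that contains a committed version and a newer pending one yields the committed version. A listing whose markers were written elsewhere yields empty, which is the situation the first problem produces.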

Labels
size:M PR with lines of changes in (100, 300]
4 participants