Fallback when checkpoints indicated by _last_checkpoint hint is missing #582

Open
Sevenannn opened this issue Dec 10, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Sevenannn

Please describe why this is necessary.

When the checkpoint indicated by the _last_checkpoint file is missing, snapshot creation simply fails. A fallback mechanism that constructs the snapshot from a previous checkpoint version, or a simpler fallback that constructs the snapshot entirely from log files, would be useful in this case, along with warning messages informing users that an older checkpoint file is being used.

Describe the functionality you are proposing.

When the checkpoint indicated by the _last_checkpoint file is missing, construct the snapshot from the last valid checkpoint plus the subsequent log files, or, as a more naive implementation, construct the snapshot purely from log files.
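
A minimal sketch of the proposed fallback, to make the two options concrete. It is not delta-kernel-rs code: `ReplayPlan` and `plan_replay` are hypothetical names, and the function assumes the caller has already listed `_delta_log` and extracted the checkpoint and commit versions that are actually present. It picks the newest available checkpoint plus the commits after it, falls back to replaying commits only when no checkpoint exists, and returns a warning when the hinted checkpoint is missing.

```rust
/// Hypothetical replay plan; this is not a delta-kernel-rs type.
#[derive(Debug, PartialEq)]
enum ReplayPlan {
    /// Start from a checkpoint, then apply the listed commit versions in order.
    FromCheckpoint { checkpoint: u64, commits: Vec<u64> },
    /// No usable checkpoint: replay every listed commit.
    FromCommitsOnly { commits: Vec<u64> },
}

/// Chooses how to build a snapshot from the versions found in the log listing.
/// `hint` is the version named by `_last_checkpoint`, if that file was readable.
/// Returns the plan and an optional warning for the user.
fn plan_replay(
    checkpoints: &[u64],
    commits: &[u64],
    hint: Option<u64>,
) -> (ReplayPlan, Option<String>) {
    // Warn only when a hint exists but the checkpoint it names is gone.
    let warning = match hint {
        Some(h) if !checkpoints.contains(&h) => Some(format!(
            "_last_checkpoint points at version {h}, but that checkpoint is missing; falling back"
        )),
        _ => None,
    };

    let mut sorted: Vec<u64> = commits.to_vec();
    sorted.sort_unstable();

    let plan = match checkpoints.iter().copied().max() {
        // Newest checkpoint that is present, plus every commit after it.
        Some(cp) => ReplayPlan::FromCheckpoint {
            checkpoint: cp,
            commits: sorted.into_iter().filter(|v| *v > cp).collect(),
        },
        // The naive option: rebuild purely from commit files.
        None => ReplayPlan::FromCommitsOnly { commits: sorted },
    };
    (plan, warning)
}
```

The sketch deliberately skips validation. A real implementation would also need to check that the selected commits are contiguous and, for the commits-only plan, that they start at version 0, since older commits may have been removed by metadata cleanup.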

Additional context

N/A

@Sevenannn Sevenannn added the enhancement New feature or request label Dec 10, 2024
@zachschuermann
Collaborator

Hi @Sevenannn, thanks for raising! This has been a TODO for a while; I will try to get to it soon :)

Looks like delta-spark just lists from 0 in case there isn't a last checkpoint hint? Seems reasonable to do in the near-term

@scovich
Collaborator

scovich commented Dec 10, 2024

There are two different issues here:

  1. _last_checkpoint file is missing or stale (points to an older checkpoint). This can happen for various reasons and clients have to be ready to deal with it. We usually list from 0 because there's no way to guess where the listing should actually start.
  2. _last_checkpoint file is wrong (points to a checkpoint that doesn't exist, and no newer checkpoint exists). This cannot happen under normal circumstances because we always write the checkpoint before updating the _last_checkpoint file, and metadata cleanup should never delete the newest checkpoint. Because this situation should not occur, it's less clear that we should try to handle it gracefully. The main reason I've seen for it to arise is when people physically delete files from the _delta_log directory, either to "drop" and "recreate" the table (while a workload is running), or to "recover" to an earlier state by deleting newer commits (again, while a workload is running). Neither of those is a supported use case in Delta, though delta-spark seems to go out of its way to tolerate them.
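
The distinction between the two cases can be sketched as a small classifier over a `_delta_log` listing. This is only an illustration, not delta-kernel-rs code: `HintState`, `checkpoint_version`, and `classify_hint` are hypothetical names. It assumes checkpoint files are named with a 20-digit zero-padded version followed by `.checkpoint.`, and it does not check whether a multi-part checkpoint is complete. A hinted checkpoint that is missing while a newer one exists is treated as stale, because listing forward from the hint would still find a usable checkpoint.

```rust
/// Hypothetical classification of the `_last_checkpoint` hint.
#[derive(Debug, PartialEq)]
enum HintState {
    /// No hint at all (case 1): the client has to list from version 0.
    Missing,
    /// The hinted checkpoint exists and nothing newer was found.
    Accurate,
    /// A newer checkpoint exists than the one hinted (case 1).
    Stale { newest: u64 },
    /// The hinted checkpoint is absent and no newer one exists (case 2).
    Dangling,
}

/// Extracts the version from a checkpoint file name, e.g.
/// "00000000000000000010.checkpoint.parquet" yields Some(10).
fn checkpoint_version(file_name: &str) -> Option<u64> {
    let (version, rest) = file_name.split_once('.')?;
    let all_digits = version.bytes().all(|b| b.is_ascii_digit());
    if version.len() == 20 && all_digits && rest.starts_with("checkpoint.") {
        version.parse().ok()
    } else {
        None
    }
}

/// Compares the hinted version against the checkpoints present in the listing.
fn classify_hint(hint: Option<u64>, file_names: &[&str]) -> HintState {
    let hinted = match hint {
        Some(v) => v,
        None => return HintState::Missing,
    };
    let newest = file_names
        .iter()
        .filter_map(|name| checkpoint_version(name))
        .max();
    match newest {
        Some(n) if n > hinted => HintState::Stale { newest: n },
        Some(n) if n == hinted => HintState::Accurate,
        // Either no checkpoints at all, or only ones older than the hint.
        _ => HintState::Dangling,
    }
}
```

Under this sketch, only `Dangling` corresponds to the situation that "should not occur"; whether to fail or fall back there is the open question in this thread.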

@Sevenannn
Author

Thanks for the replies! @scovich Yeah, I can see the error "Had a _last_checkpoint hint but didn't find any checkpoints" under abnormal circumstances (e.g. a checkpoint file gets manually deleted). Do you suggest that this would be better left as an error for users to handle, instead of having a fallback within delta-kernel-rs?
