Please describe why this is necessary.
When the checkpoint indicated by the _last_checkpoint file is missing, snapshot creation simply fails. A fallback mechanism that constructs the snapshot from a previous checkpoint version, or a simpler fallback that constructs the snapshot entirely from log files, would be useful in this case, along with warning messages informing users that an older checkpoint file is being used.
Describe the functionality you are proposing.
When the checkpoint indicated by the _last_checkpoint file is missing, construct the snapshot from the last valid checkpoint plus the subsequent log files, or, as a more naive implementation, construct the snapshot purely from log files.
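To make the proposal concrete, here is a minimal sketch of the selection logic, not delta-kernel-rs code. `ReplayStart`, `checkpoint_version`, and `choose_replay_start` are hypothetical names invented for illustration. It assumes the caller already has the `_delta_log` file names and the version from the `_last_checkpoint` hint, handles only classic single-file checkpoints (`<20-digit version>.checkpoint.parquet`), and only looks at checkpoints at or below the hinted version.

```rust
// Hypothetical sketch: these names are illustrative, not the delta-kernel-rs API.

/// Where log replay should start.
#[derive(Debug, PartialEq)]
enum ReplayStart {
    /// The checkpoint named by the `_last_checkpoint` hint exists.
    HintedCheckpoint(u64),
    /// The hinted checkpoint is missing; an older checkpoint was found.
    OlderCheckpoint { hinted: u64, found: u64 },
    /// No usable checkpoint; replay every commit from version 0.
    CommitsOnly,
}

/// Parses the version from a classic single-file checkpoint name,
/// e.g. `00000000000000000010.checkpoint.parquet`.
/// Multi-part and other checkpoint formats are ignored in this sketch.
fn checkpoint_version(file_name: &str) -> Option<u64> {
    let prefix = file_name.strip_suffix(".checkpoint.parquet")?;
    if prefix.len() != 20 || !prefix.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    prefix.parse().ok()
}

/// Picks the newest checkpoint at or below the hinted version,
/// warning on stderr whenever the hinted checkpoint itself is missing.
fn choose_replay_start(log_files: &[&str], hinted: u64) -> ReplayStart {
    let best = log_files
        .iter()
        .filter_map(|f| checkpoint_version(f))
        .filter(|v| *v <= hinted)
        .max();

    match best {
        Some(v) if v == hinted => ReplayStart::HintedCheckpoint(v),
        Some(v) => {
            eprintln!("warning: checkpoint {hinted} is missing; using older checkpoint {v}");
            ReplayStart::OlderCheckpoint { hinted, found: v }
        }
        None => {
            eprintln!("warning: checkpoint {hinted} is missing; replaying commits only");
            ReplayStart::CommitsOnly
        }
    }
}
```

The commits-only path can only produce a correct snapshot if every commit from version 0 is still present in the log, so a real implementation would need to verify that before relying on it.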
Additional context
N/A
The _last_checkpoint file is missing or stale (points to an older checkpoint). This can happen for various reasons, and clients have to be ready to deal with it. We usually list from 0 because there's no way to guess where the listing should actually start.
The _last_checkpoint file is wrong (points to a checkpoint that doesn't exist, and no newer checkpoint exists). This cannot happen under normal circumstances, because we always write the checkpoint before updating the _last_checkpoint file, and metadata cleanup should never delete the newest checkpoint. Because this situation should not occur, it's less clear that we should try to handle it gracefully. The main reason I've seen for it to arise is people physically deleting files from the _delta_log directory, either to "drop" and "recreate" the table (while a workload is running), or to "recover" to an earlier state by deleting newer commits (again, while a workload is running). Neither of those is a supported use case in Delta, though delta-spark seems to go out of its way to tolerate them.
Thanks for the replies! @scovich Yeah, I can see the error "Had a _last_checkpoint hint but didn't find any checkpoints" under abnormal circumstances (e.g. when a checkpoint file gets manually deleted). Are you suggesting that this would be better left as an error for users to handle, instead of having a fallback within delta-kernel-rs?