Fallback when the checkpoint indicated by the _last_checkpoint hint is missing #582

Sevenannn · 2024-12-10T01:34:41Z

Please describe why this is necessary.

When the checkpoint indicated by the _last_checkpoint file is missing, snapshot creation simply fails. Two fallbacks would be useful here, along with warning messages telling users that an older checkpoint file is being used:

1. Construct the snapshot from a previous checkpoint version.
2. More simply, construct the snapshot entirely from the log files.

Describe the functionality you are proposing.

When the checkpoint indicated by the _last_checkpoint file is missing, construct the snapshot from the last valid checkpoint plus the subsequent log files. A more naive alternative is to construct the snapshot purely from the log files.
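A minimal Rust sketch of the proposed fallback follows. It assumes the versions of checkpoints and commits have already been obtained by listing `_delta_log`. It picks the newest checkpoint that actually exists and replays the later commits, or replays every commit from 0 when no checkpoint survives. `SnapshotPlan` and `plan_snapshot` are hypothetical names, not delta-kernel-rs APIs. The sketch does not check for gaps in the commit sequence or whether the checkpoint is readable.

```rust
/// Hypothetical snapshot plan: an optional checkpoint version to load, followed by
/// the commit (JSON) versions to replay on top of it.
#[derive(Debug, PartialEq)]
struct SnapshotPlan {
    checkpoint: Option<u64>,
    commits: Vec<u64>,
}

/// Sketch of the proposed fallback (hypothetical helper). `hint` is the version read
/// from `_last_checkpoint`, if any; `checkpoints` and `commits` are the versions
/// actually found by listing `_delta_log`.
fn plan_snapshot(hint: Option<u64>, checkpoints: &[u64], commits: &[u64]) -> SnapshotPlan {
    // Use the newest checkpoint that really exists, whatever the hint says.
    let chosen = checkpoints.iter().copied().max();

    // Warn when we end up older than the hint, or with no checkpoint at all.
    if let Some(h) = hint {
        if chosen.map_or(true, |c| c < h) {
            eprintln!("warning: checkpoint {h} from _last_checkpoint not found; using {chosen:?}");
        }
    }

    // Replay every commit after the chosen checkpoint, or every commit from
    // version 0 when no checkpoint is usable (pure log replay).
    let mut replay: Vec<u64> = commits
        .iter()
        .copied()
        .filter(|&v| chosen.map_or(true, |c| v > c))
        .collect();
    replay.sort_unstable();

    SnapshotPlan { checkpoint: chosen, commits: replay }
}
```

Suppose the hint says 20, only checkpoint 10 is left, and commits 0 through 22 exist. In that case, the plan loads checkpoint 10, replays commits 11 through 22, and prints a warning.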

Additional context

N/A

zachschuermann · 2024-12-10T05:23:27Z

Hi @Sevenannn, thanks for raising this! It has been a TODO for a while; I will try to get to it soon :)

Looks like delta-spark just lists from version 0 when there isn't a last checkpoint hint? That seems reasonable to do in the near term.
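Below is a rough Rust sketch of "list from 0". It scans every file name in `_delta_log` and takes the newest checkpoint, ignoring `_last_checkpoint` entirely. It assumes only two kinds of names: zero-padded 20-digit commit files (`<version>.json`) and single-part classic checkpoints (`<version>.checkpoint.parquet`). Multi-part checkpoints, V2 checkpoints, CRC files and anything else are skipped. `LogFile`, `parse_log_file` and `latest_checkpoint` are hypothetical names.

```rust
/// Hypothetical classification of a `_delta_log` entry (simplified).
#[derive(Debug, PartialEq)]
enum LogFile {
    Commit(u64),
    Checkpoint(u64),
}

/// Parse `<20-digit version>.json` or `<20-digit version>.checkpoint.parquet`.
/// Everything else (including `_last_checkpoint` itself) returns `None`.
fn parse_log_file(name: &str) -> Option<LogFile> {
    // `get` returns None if the name is too short or not split on a char boundary.
    let prefix = name.get(..20)?;
    let rest = name.get(20..)?;
    if !prefix.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let version: u64 = prefix.parse().ok()?;
    match rest {
        ".json" => Some(LogFile::Commit(version)),
        ".checkpoint.parquet" => Some(LogFile::Checkpoint(version)),
        _ => None, // multi-part / V2 checkpoints etc. are out of scope for this sketch
    }
}

/// Newest checkpoint version among the listed names, found without any hint.
fn latest_checkpoint<'a>(names: impl IntoIterator<Item = &'a str>) -> Option<u64> {
    names
        .into_iter()
        .filter_map(parse_log_file)
        .filter_map(|f| match f {
            LogFile::Checkpoint(v) => Some(v),
            LogFile::Commit(_) => None,
        })
        .max()
}
```

Listing from 0 always works because it does not trust any hint. The cost is a full directory listing, which is why the hint exists in the first place.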

scovich · 2024-12-10T22:31:23Z

There are two different issues here:

1. The _last_checkpoint file is missing or stale (points to an older checkpoint). This can happen for various reasons, and clients have to be ready to deal with it. We usually list from 0 because there's no way to guess where the listing should actually start.
2. The _last_checkpoint file is wrong (points to a checkpoint that doesn't exist, and no newer checkpoint exists). This cannot happen under normal circumstances, because we always write the checkpoint before updating the _last_checkpoint file, and metadata cleanup should never delete the newest checkpoint. Because this situation should not occur, it's less clear that we should try to handle it gracefully. The main reason I've seen for it to arise is people physically deleting files from the _delta_log directory while a workload is running. They do this either to "drop" and "recreate" the table, or to "recover" to an earlier state by deleting newer commits. Neither of those is a supported use case in Delta, though delta-spark seems to go out of its way to tolerate them.
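The two cases can be told apart after listing from the hinted version, as in the hedged Rust sketch below. If the newest checkpoint at or after the hint is the hint itself, the hint is current. If a newer one exists, the hint is merely stale (case 1). If nothing at or after the hint exists, the hint is wrong (case 2). A completely missing `_last_checkpoint` file is the other half of case 1 and would simply mean listing from 0. `HintStatus` and `classify_hint` are hypothetical names.

```rust
/// Hypothetical outcome of checking a `_last_checkpoint` hint against a listing.
#[derive(Debug, PartialEq)]
enum HintStatus {
    /// The hinted checkpoint exists and nothing newer does.
    Current(u64),
    /// A newer checkpoint exists (case 1: stale hint, a normal situation).
    Stale { hinted: u64, newest: u64 },
    /// Nothing at or after the hint exists (case 2: wrong hint, abnormal).
    Wrong { hinted: u64 },
}

/// Classify the hint given the checkpoint versions found in `_delta_log`.
fn classify_hint(hinted: u64, listed_checkpoints: &[u64]) -> HintStatus {
    // Only checkpoints at or after the hint are visible when listing from the hint.
    match listed_checkpoints.iter().copied().filter(|&v| v >= hinted).max() {
        Some(v) if v == hinted => HintStatus::Current(v),
        Some(newest) => HintStatus::Stale { hinted, newest },
        None => HintStatus::Wrong { hinted },
    }
}
```

Only the `Wrong` outcome corresponds to the situation that "should not occur"; `Stale` is routine and is resolved by using the newest checkpoint.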

Sevenannn · 2024-12-11T04:39:50Z

Thanks for the replies! @scovich Yeah, I can see the error "Had a _last_checkpoint hint but didn't find any checkpoints" under abnormal circumstances (e.g., when a checkpoint file gets manually deleted). Are you suggesting that this is better left as an error for users to handle, instead of adding a fallback within delta-kernel-rs?
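One way to frame the "error vs. fallback" question is as an explicit policy, as in this Rust sketch under stated assumptions. A non-wrong hint always resolves to the newest checkpoint at or after it. A wrong hint either returns the error message quoted above or falls back to the newest older checkpoint with a warning. A fallback result of `None` means replaying the log from 0. `MissingCheckpointPolicy` and `resolve_checkpoint` are hypothetical; the kernel's real error type is not modeled here.

```rust
/// Hypothetical knob for how to treat a hint that points at a missing checkpoint.
#[derive(Debug, Clone, Copy)]
enum MissingCheckpointPolicy {
    /// Surface the error to the caller, as described in this thread.
    Error,
    /// Use the newest older checkpoint (or full log replay) and print a warning.
    FallBackWithWarning,
}

/// Returns the checkpoint version to load (`None` = replay all commits from 0).
fn resolve_checkpoint(
    hinted: u64,
    listed_checkpoints: &[u64],
    policy: MissingCheckpointPolicy,
) -> Result<Option<u64>, String> {
    // Normal and stale hints: take the newest checkpoint at or after the hint.
    if let Some(v) = listed_checkpoints.iter().copied().filter(|&v| v >= hinted).max() {
        return Ok(Some(v));
    }
    match policy {
        MissingCheckpointPolicy::Error => Err(
            "Had a _last_checkpoint hint but didn't find any checkpoints".to_string(),
        ),
        MissingCheckpointPolicy::FallBackWithWarning => {
            // Every remaining checkpoint is older than the hint at this point.
            let older = listed_checkpoints.iter().copied().max();
            eprintln!("warning: checkpoint {hinted} is missing; falling back to {older:?}");
            Ok(older)
        }
    }
}
```

Keeping `Error` as the default would preserve today's behavior. `FallBackWithWarning` would tolerate manually deleted files the way delta-spark reportedly does.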

Sevenannn added the enhancement (New feature or request) label on Dec 10, 2024
