Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make ExpressionHandler::get_evaluator fallible #577

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

cg-cognition
Copy link

Fixes #566 . Makes get_evaluator() return DeltaResult<Arc> to properly handle potential errors. This change is intended to support get_evaluator() doing eager validation checking, but doesn't do the validation itself.

What changes are proposed in this pull request?

Here's a comprehensive summary of all changes made:

  1. Core Interface Change (lib.rs):
  • Modified ExpressionHandler trait to make get_evaluator return DeltaResult<Arc>
  • This change makes error handling explicit at the trait level
  1. Implementation Changes (arrow_expression.rs):
  • Updated ArrowExpressionHandler to return Ok(Arc::new(...))
  • Simple change since this implementation can't fail
  1. Error Handling Improvements (data_skipping.rs):
  • Simplified error handling using ok()? operator
  • Made the code more idiomatic while maintaining error propagation
  • Applied to select_stats_evaluator, skipping_evaluator, and filter_evaluator
  1. Iterator Changes (log_replay.rs):
  • Changed to boxed iterator to handle both success and error cases
  • Added explicit error handling path using iter::once(Err(e))
  • Maintains original behavior while properly propagating errors
  1. Call Site Updates:
  • Updated all get_evaluator calls to handle Result type
  • Modified in transaction.rs, scan/mod.rs, and table_changes/log_replay.rs
  • Added proper error propagation using ? operator

How was this change tested?

Tested with cargo test --all-features --all-targets -- --skip read_table_version_hdfs. Had an issue with getting Java setup but that test doesn't seem relevant to the PR. No new tests need to be written since it's a no-op refactor.

Additional Context

This PR was entirely written by Devin with a little bit of review from me. Happy to address any feedback and get this over the finish line.

Original Run: https://preview.devin.ai/sessions/e839a4e9bcb444b69b99205babdcf1af

Make get_evaluator() return DeltaResult<Arc<dyn ExpressionEvaluator>> to properly handle potential errors. Update all call sites to handle the Result type and simplify error handling using the ok()? operator where appropriate.
@github-actions github-actions bot added the breaking-change Change that will require a version bump label Dec 9, 2024
STATS_EXPR.clone(),
DataType::STRING,
)
.ok()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have at least a warn! log or similar, to alert when these errors happen? Otherwise I worry it could be very difficult to debug why data skipping does not occur?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, this is the one part I wasn't too sure about

table_schema: &SchemaRef,
predicate: Option<ExpressionRef>,
) -> impl Iterator<Item = DeltaResult<ScanData>> {
) -> Box<dyn Iterator<Item = DeltaResult<ScanData>>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider returning DeltaResult<impl Iterator>, so the error is immediately obvious instead of buried in the iterator itself. Plenty of examples of that elsewhere in the kernel.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@scovich scovich requested a review from nicklan December 10, 2024 20:38
devin-ai-integration bot and others added 8 commits December 11, 2024 01:14
…tions

- Modified scan_action_iter to return DeltaResult<impl Iterator>
- Fixed code formatting in log_replay.rs files
- Reordered imports in mod.rs for better readability
- Improved error propagation in transaction.rs

Co-Authored-By: Calvin Giroud <[email protected]>
Added warning logs when evaluator creation fails in DataSkippingFilter:
- Log stats selector evaluator failures
- Log skipping evaluator failures
- Log filter evaluator failures

This improves debuggability when data skipping does not occur as expected.

Co-Authored-By: Calvin Giroud <[email protected]>
@cg-cognition cg-cognition requested a review from scovich December 11, 2024 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that will require a version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Engine::get_evaluator should be fallible
2 participants