Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: move unity catalog integration to it's own crate #3044

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

hntd187
Copy link
Collaborator

@hntd187 hntd187 commented Dec 2, 2024

Description

This moves the catalog integration of unity catalog to it's own crate. It also replaces the mock server with the httpmock crate and the custom retry logic with request_middleware's retry middleware. I think this will be easier to manage as a client and I do not believe many people were using this integration anyways.

Related Issue(s)

Documentation

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Dec 2, 2024
Copy link

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 55.17241% with 104 lines in your changes missing coverage. Please review.

Project coverage is 72.40%. Comparing base (e665ca8) to head (54d777c).

Files with missing lines Patch % Lines
crates/catalog-unity/src/client/retry.rs 14.89% 40 Missing ⚠️
crates/catalog-unity/src/lib.rs 52.72% 17 Missing and 9 partials ⚠️
crates/catalog-unity/src/credential.rs 72.54% 5 Missing and 9 partials ⚠️
crates/catalog-unity/src/client/mod.rs 64.70% 4 Missing and 2 partials ⚠️
crates/core/src/writer/stats.rs 54.54% 5 Missing ⚠️
crates/catalog-unity/src/datafusion.rs 0.00% 3 Missing ⚠️
crates/core/src/delta_datafusion/expr.rs 0.00% 3 Missing ⚠️
crates/core/src/kernel/models/actions.rs 50.00% 3 Missing ⚠️
crates/core/src/kernel/snapshot/replay.rs 0.00% 0 Missing and 2 partials ⚠️
crates/core/src/kernel/snapshot/mod.rs 66.66% 0 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3044      +/-   ##
==========================================
- Coverage   72.62%   72.40%   -0.23%     
==========================================
  Files         129      128       -1     
  Lines       41626    41368     -258     
  Branches    41626    41368     -258     
==========================================
- Hits        30232    29951     -281     
- Misses       9419     9445      +26     
+ Partials     1975     1972       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 60 to 64
// impl From<RetryError> for reqwest::Error {
// fn from(value: RetryError) -> Self {
// Into::into(value)
// }
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/catalog-unity/src/credential.rs Outdated Show resolved Hide resolved
r#"
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build();

let _retry_config = RetryConfig::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if I was going to remove it at that point from the testing, so it's a side effect left over from that.

Response::new(Body::from(
r#"
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build();
let _retry_config = RetryConfig::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unused variable that should be removed? 🤔

crates/core/Cargo.toml Outdated Show resolved Hide resolved
crates/core/Cargo.toml Outdated Show resolved Hide resolved
@@ -137,4 +138,4 @@ datafusion = [
datafusion-ext = ["datafusion"]
json = ["parquet/json"]
python = ["arrow/pyarrow"]
unity-experimental = ["reqwest", "hyper"]
unity-experimental = ["reqwest", "reqwest-middleware"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this feature come off the core crate now that this code is moved into its own crate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a discussion point I wanted to talk about. The main trait for the catalog and the errors remained in the core crate. So core can build I assume generic catalog support and catalog crates just depend on core and implement. At first glance I kinda didn't have a cohesive error handling thing here, but I think thinking about it. There should be a generic catalog error in core, but have no conditionals features on it. And individual crates implement a conversion to that error type. Sound good?

@hntd187 hntd187 marked this pull request as ready for review December 10, 2024 20:44
let compare_expr = match generalize_filter(
let compare_expr = generalize_filter(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed a few warnings along the way, this statement could be just written with an ? operator instead.

Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just some minor questions.

More gene4rally speaking though, I think we need to make some more deliberate choices around how we want to integrate with catalogs.

AS I have been out for a while, I am not sure where we are in terms of eagerness to push the next release. I would hope that we can do some more cleaning and most importantly testing before we release get "more serious" with unity ...

w.r.t. testing I hope that we can even do some integration tests using OSS UC. Happy the help with that.

@@ -216,7 +226,10 @@ impl FromStr for UnityCatalogConfigKey {
"workspace_url" | "unity_workspace_url" | "databricks_workspace_url" => {
Ok(UnityCatalogConfigKey::WorkspaceUrl)
}
_ => Err(UnityCatalogError::UnknownConfigKey(s.into()).into()),
_ => Err(DataCatalogError::UnknownConfigKey {
catalog: "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in this case the "catalog" would be "Unity".

Btw... a lot fi the client impl was just stolen from Object store, where this field identifies the specific implementation (azure, gcp, ...). If we have this error live in a sepate create, this dioes not make much sense anymore.


Ok(resp.json().await?)
Ok(resp.json().await.map_err(UnityCatalogError::from)?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have an impl From<...> for the error, why do we need the map? does the function return a different kind of err.

"GOOGLE_SERVICE_ACCOUNT",
account_path.as_path().to_str().unwrap(),
);
unsafe {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something, or did we remove unsafe around set_var further above, specifically in crates/aws/src/storage.rs, and here we are adding it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to add it back to the other one, thanks for pointing it out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants