-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make all SchemaProvider trait APIs async #10339
Comments
This one is actually quite interesting. I hadnt gotten around to it yet but I was going to propose having both sync and async apis for |
+1 for this proposal. We ran into similar problems when implementing datafusion/iceberg integrations: apache/iceberg-rust#324 It's more nature to allow catalog/schema operations to be async. |
Yes, I agree allowing catalog APIs to be async makes sense to me One rationale against async, as I recall from @tustvold , was to encourage people to implement efficient catalog implementations (e.g. for many cases calling a network call for However, my personal opinion is that such encouragement can be done via documentation and if people want to implement RPC network calls during planning then the APIs shouldn't stop them I think the biggest challenge is, as @metesynnada hints at above, the viral nature of An alternate approach might be to implement, via some hackery and tokio channels, an struct that implements the |
I seem to remember omitting the methods other than table due to the sheer size of the resulting change, I have no objection to making them async |
I've made a stab at this.
The easiest way we've found to do this kind of thing is with a metadata cache. However, this cache gets invalidated and has cold start, etc. The problem with "warming the cache prior to the query" is that it is very difficult to determine which entries will be required by an SQL string. Loading the entire catalog into memory for a single query is prohibitively expensive for us.
Yes 😰 Although, most of the "down the stack" changes were not for the query side of the catalog but for the management side of the catalog (e.g. registering and deregistering tables) and this isn't something that is really relevant for caching anyways.
I could find no reasonable way to implement such hackery. |
Is your feature request related to a problem or challenge?
Currently, only
table
API is async, butregister_table
,deregister_table
,table_names
, andtable_exist
can be also async. Also, most of the usage is under async methods, thus there shouldn't be a big issue implementing it.However, this can be a huge change for the downstream usage, this is why I need some insides before me or someone else implements it.
cc @Dandandan @alamb
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: