Project Status #24

Open
norlandrhagen opened this issue Aug 5, 2024 · 9 comments

Comments

@norlandrhagen

Hi there @roeap!

I really appreciate all the work you've done here. I was wondering a bit about the future of the project. Do you still plan on maintaining it and adding features or is it on-hold for now?

@roeap
Owner

roeap commented Aug 5, 2024

@norlandrhagen - fair point, as it has been a while since I have been active here - not for a lack of interest though, just life got in the way 😆.

So I started tonight by merging all the open PRs, and now I want to do one round of review to see if we need some maintenance somewhere. And then do another release ASAP.

Going forward, I am planning on doing maintenance, reviewing PRs in a timely manner, and implementing the occasional feature if it makes sense. I am not going to be able to spend a large amount of time on this, but the effort will at least be continuous.

As this essentially mirrors the object store APIs, I am hoping that this will be sufficient to keep it attractive and, foremost, useful to users. Does that help?

@norlandrhagen
Author

Thank you @roeap! Totally understand and great to know your plans.

@djouallah

Do you mind updating the package, please?

@ByteBaker

@roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.

This would take away the overhead of maintaining the Rust code (and any optimizations/issues that come with it).

@kylebarron
Contributor

In kylebarron/arro3#229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See #3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release.

I'll likely make a Python-facing wrapper as well sometime soon. It won't have the same API as object-store-python, but I'm hoping my implementation will be easier to maintain and keep up to date.

@dsgibbons

> @roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.

@ByteBaker, isn't that what this project already does?

> In kylebarron/arro3#229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See #3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release (PyO3/pyo3-async-runtimes#1).

@kylebarron can you expand a bit more on the difference between your implementation and this project? I'm looking into writing Python bindings for SlateDB, which is also based on the object_store crate. I was hoping to take advantage of some prewritten PyO3 bindings for object_store.

@kylebarron
Contributor

Sure, @dsgibbons. And for clarity I moved the object-store integration into a separate repo here: https://github.com/developmentseed/object-store-rs

Overall differences

  • Better maintained. This is subjective, and my project still has a bus factor of 1, but e.g. I've had PRs here sit for 8 months (Add get_opts and get_ranges #9)
  • object-store-rs is not a fork of this repo; it's a reimplementation to try and have simpler end-user APIs and easier internal maintenance.
  • IMO this repo is a bit overly complicated. The code across lib.rs and builder.rs totals roughly 1200 lines, and does a lot of manual re-implementation of the core object-store methods. The body of my Python function to create an S3 store is 13 lines of code. It's this simple because of smart use of the FromPyObject trait. PyAmazonS3ConfigKey is a tiny wrapper around object_store::aws::AmazonS3ConfigKey, which validates that the Python string input is indeed a valid key before it even reaches my function. Then my function can take in a HashMap<PyAmazonS3ConfigKey, String>, and I don't need to do any validation in the body of my function; I can just pass it to builder.with_config.
  • Having these simple wrappers around upstream object_store config structs should hopefully mean less maintenance as well. If object_store adds a new key to object_store::aws::AmazonS3ConfigKey, I don't need to change anything on my side to support the new version; the validation will automatically still work.
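The validate-at-the-boundary idea in the two bullets above can be sketched without pyo3 at all. The following is a minimal, std-only illustration, not the code from object-store-rs: `std::str::FromStr` stands in for the `FromPyObject` trait, and `UpstreamKey`, `WrappedKey`, `StoreBuilder`, `parse_config`, and `build_store` are hypothetical names invented for this example. The point is only that the wrapper delegates parsing to the upstream key type, so the function body receives already-validated keys.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for an upstream config-key enum such as
/// `object_store::aws::AmazonS3ConfigKey`. Only three keys are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum UpstreamKey {
    Bucket,
    Region,
    Endpoint,
}

impl FromStr for UpstreamKey {
    type Err = String;

    /// The "upstream" parser: the single place that knows which keys exist.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bucket" => Ok(Self::Bucket),
            "region" => Ok(Self::Region),
            "endpoint" => Ok(Self::Endpoint),
            other => Err(format!("unknown config key: {other}")),
        }
    }
}

/// Thin wrapper around the upstream key. In a real binding this is where a
/// `FromPyObject` impl would live; here `FromStr` plays that role (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct WrappedKey(pub UpstreamKey);

impl FromStr for WrappedKey {
    type Err = String;

    /// Delegates to the upstream parser, so new upstream keys need no change here.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<UpstreamKey>().map(WrappedKey)
    }
}

/// Boundary conversion: every key is validated before any store code runs.
pub fn parse_config(raw: &[(&str, &str)]) -> Result<HashMap<WrappedKey, String>, String> {
    raw.iter()
        .map(|(k, v)| k.parse::<WrappedKey>().map(|key| (key, v.to_string())))
        .collect()
}

/// Hypothetical builder that just records the options it is given.
#[derive(Debug, Default)]
pub struct StoreBuilder {
    pub options: Vec<(UpstreamKey, String)>,
}

impl StoreBuilder {
    pub fn with_config(mut self, key: UpstreamKey, value: impl Into<String>) -> Self {
        self.options.push((key, value.into()));
        self
    }
}

/// The function body contains no validation: keys are valid by construction.
pub fn build_store(config: HashMap<WrappedKey, String>) -> StoreBuilder {
    let mut builder = StoreBuilder::default();
    for (key, value) in config {
        builder = builder.with_config(key.0, value);
    }
    builder
}
```

Calling `parse_config(&[("bucket", "my-bucket")])` yields a map that `build_store` can consume directly, while an unrecognized key is rejected before `build_store` is ever reached.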

Python facing differences

You can see my WIP API docs here

  • Fuller implementation, including stuff like multipart put (wip: add put_opts & put_multipart #14). So we can upload large files efficiently.
  • Uses Python native types where possible. This library overloads stuff like Path with custom Python classes. I want to handle whatever inputs the user already has, like str, and by handling this on the Rust side, any other Rust library that uses my integration will get it for free.
  • A streaming get implementation is WIP, based on Add stream method to object store #29. We should be able to provide an async or sync iterator to the user for streaming the bytes of a file or the items in a ListResult.
  • Doesn't need a full Python-side wrapper in Python code, for easier maintenance.
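To make the streaming bullet above a little more concrete, here is a rough, std-only sketch of what a synchronous chunk iterator could look like. It is not the WIP implementation mentioned above: `ChunkStream` is a hypothetical name, and it iterates over bytes already held in memory, whereas a real streaming get would pull chunks from the network as they arrive.

```rust
/// Hypothetical sync iterator that hands out a byte buffer in fixed-size chunks.
/// A real streaming `get` would fetch each chunk lazily instead of holding
/// the whole object in memory (assumption made to keep the sketch std-only).
pub struct ChunkStream {
    data: Vec<u8>,
    offset: usize,
    chunk_size: usize,
}

impl ChunkStream {
    pub fn new(data: Vec<u8>, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { data, offset: 0, chunk_size }
    }
}

impl Iterator for ChunkStream {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        // Stop once every byte has been handed out.
        if self.offset >= self.data.len() {
            return None;
        }
        // The final chunk may be shorter than `chunk_size`.
        let end = (self.offset + self.chunk_size).min(self.data.len());
        let chunk = self.data[self.offset..end].to_vec();
        self.offset = end;
        Some(chunk)
    }
}
```

On the Python side, an object like this would typically be exposed through the iterator protocol, so the user can write a plain `for` loop over the chunks.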

Rust-facing differences

I wanted a Rust-facing library because I want to use this from other Rust libraries exported to Python, including arro3, geoarrow-rs, icechunk, etc.

In pyo3-arrow I figured out a nice way to have pyo3-integration for Arrow data, where each Rust library doesn't need to export anything new to Python. But this works because Arrow is ABI stable, while ObjectStore is not. So having a Rust-facing pyo3 extension is slightly harder here because each Rust package will have to export its own Python classes that are built against its own library.

My crate uses the latest version of pyo3, v0.22. I can't publish this to crates.io yet because https://github.com/awestlake87/pyo3-asyncio is no longer maintained and the official fork https://github.com/PyO3/pyo3-async-runtimes hasn't published a 0.22 version yet (but is updated to 0.22 on git). I'm hoping that pyo3-async-runtimes will publish a 0.22 version very soon, and then I'll publish to crates.io.

All of these APIs under store are Python classes exported by pyo3-object_store, defined by register_store_module. And then all your own code has to do is accept PyObjectStore as a parameter, and then you can call into_inner to get an Arc<dyn ObjectStore>, and do whatever you want with it.
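The "accept PyObjectStore, then call into_inner" pattern described above can be illustrated with a small std-only sketch. This is not the pyo3-object_store API: the `ObjectStore` trait below is a drastically reduced, synchronous stand-in for the real async trait, `InMemory` and `copy_object` are hypothetical, and the exact signature of the real `into_inner` is an assumption. Only the names `PyObjectStore` and `into_inner` come from the comment above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical, much-reduced stand-in for the `object_store::ObjectStore` trait.
pub trait ObjectStore: Send + Sync {
    fn put(&self, path: &str, data: Vec<u8>);
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Toy in-memory backend used only to exercise the sketch.
#[derive(Default)]
pub struct InMemory {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl ObjectStore for InMemory {
    fn put(&self, path: &str, data: Vec<u8>) {
        self.objects.lock().unwrap().insert(path.to_string(), data);
    }

    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.lock().unwrap().get(path).cloned()
    }
}

/// Stand-in for the wrapper a binding crate would accept as a parameter.
/// It hides which concrete store the user constructed.
#[derive(Clone)]
pub struct PyObjectStore(Arc<dyn ObjectStore>);

impl PyObjectStore {
    pub fn new(store: Arc<dyn ObjectStore>) -> Self {
        Self(store)
    }

    /// Unwraps to the shared trait object (signature assumed for illustration).
    pub fn into_inner(self) -> Arc<dyn ObjectStore> {
        self.0
    }
}

/// Downstream code: take the wrapper, unwrap it once, then use the store freely.
/// Returns `false` when the source object does not exist.
pub fn copy_object(store: PyObjectStore, from: &str, to: &str) -> bool {
    let store = store.into_inner();
    match store.get(from) {
        Some(bytes) => {
            store.put(to, bytes);
            true
        }
        None => false,
    }
}
```

Because the wrapper only holds an `Arc`, cloning it is cheap, and the downstream function never needs to know whether the user built an S3, GCS, or in-memory store.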

@dsgibbons

Thank you for that @kylebarron. This is very helpful.

@kylebarron
Contributor

kylebarron commented Oct 21, 2024

I published my own version of an object_store wrapper, object-store-rs, to PyPI: https://github.com/developmentseed/object-store-rs

Edit: renamed to obstore: https://github.com/developmentseed/obstore
