Support all compression types for commonly used Parquet creation tools. #2

Open
thompsonmj opened this issue May 16, 2024 · 0 comments

Polars enables compression types including:
'lz4', 'uncompressed', 'snappy', 'gzip', 'lzo', 'brotli', 'zstd'

Pandas enables compression types including:
None, 'snappy', 'gzip', 'brotli', 'lz4', 'zstd'

PySpark enables compression types including:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd

Tested and working:

  • uncompressed
  • lz4
  • snappy
  • gzip
  • zstd

Tested and not working:

  • brotli

Untested:

  • lzo

I believe codec support is determined by the compression libraries available in the base image. Using Ubuntu instead of Alpine fixed some of the failures, but not all.
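
To recheck the lists above inside a given base image, something like the following could be run in the container. This is only a sketch: `probe_codecs` and `polars_writer` are hypothetical helper names, not part of this project, and the probe only exercises whatever callable is passed to it (here, a Polars write). A read through this project's own code path would need its own adapter.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable

# Codec names taken from the lists in this issue.
CODECS = ["uncompressed", "snappy", "gzip", "lz4", "zstd", "brotli", "lzo"]


def probe_codecs(
    try_fn: Callable[[Path, str], None],
    codecs: Iterable[str] = CODECS,
) -> Dict[str, str]:
    """Call try_fn once per codec; record 'ok' or the error that was raised."""
    results: Dict[str, str] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for codec in codecs:
            path = Path(tmp) / f"probe_{codec}.parquet"
            try:
                try_fn(path, codec)
                results[codec] = "ok"
            except Exception as exc:  # e.g. missing native library, unknown codec name
                results[codec] = f"failed: {type(exc).__name__}: {exc}"
    return results


def polars_writer(path: Path, codec: str) -> None:
    """Hypothetical adapter: write a tiny frame with Polars (imported lazily)."""
    import polars as pl

    pl.DataFrame({"x": [1, 2, 3]}).write_parquet(path, compression=codec)


if __name__ == "__main__":
    # Assumes Polars is installed in the image being checked.
    for codec, status in probe_codecs(polars_writer).items():
        print(f"{codec:>12}: {status}")
```

Running the same script in the Alpine and Ubuntu images and comparing the output should show which failures come from the image rather than from the writing tool.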
