Extend list of basic partitioner: FieldAndTimeBasedPartitioner.java & HeaderAndTimeBasedPartitioner.java #290

Open
ostetsenko opened this issue Jan 19, 2023 · 0 comments

ostetsenko commented Jan 19, 2023

We use Kafka Connect to dump topics to AWS S3. Analyzing the data is then straightforward with Athena, AWS Glue (Crawlers), and S3, which seems to be a common setup for AWS users.

Problem
The core problem occurs when we partition by fields from the Kafka message. Athena cannot create a table because each partition component of the S3 subpath becomes a separate column, and every JSON key in the record becomes a separate column too. A table cannot contain two columns with the same name.
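
To make the collision concrete, here is a minimal sketch. It assumes a Hive-style `name=value` path layout; the object key, the field names (`region`, `orderId`, `amount`), and the class itself are hypothetical and are not part of any connector.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ColumnCollisionSketch {

    // Collects partition column names from a Hive-style path, e.g.
    // "topics/orders/region=eu/year=2023/part-0.json" yields [region, year].
    public static List<String> partitionColumns(String s3Key) {
        List<String> names = new ArrayList<>();
        for (String segment : s3Key.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                names.add(segment.substring(0, eq));
            }
        }
        return names;
    }

    // Returns the names that would be defined twice: once as a partition
    // column taken from the path and once as a key of the JSON record.
    public static Set<String> duplicateColumns(String s3Key, Set<String> jsonKeys) {
        Set<String> duplicates = new LinkedHashSet<>(partitionColumns(s3Key));
        duplicates.retainAll(jsonKeys);
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical record keys; "region" is also used as a partition field.
        Set<String> jsonKeys = Set.of("region", "orderId", "amount");
        String key = "topics/orders/region=eu/year=2023/part-0.json";
        System.out.println(duplicateColumns(key, jsonKeys)); // prints [region]
    }
}
```

Any name reported here is one that the table definition would need twice, which is the situation described above.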

Solution
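It would be a good idea to add a partitioner based on a header field and time. A header value is not part of the JSON payload, so the partition column would not collide with a JSON key, as long as the header name differs from the keys in the record.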
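
A rough sketch of the path such a partitioner might produce is shown below. It is plain Java and does not implement the connector's partitioner interface. The class name, the `tenant` header, and the `year=/month=/day=/hour=` layout in UTC are all assumptions chosen for illustration. A real implementation would presumably read the value from the record's headers (for example via `SinkRecord.headers()`) rather than from a `Map`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

public class HeaderAndTimePathSketch {

    // Assumed time layout: hourly, Hive-style, in UTC.
    private static final DateTimeFormatter TIME_PATH =
        DateTimeFormatter.ofPattern("'year'=yyyy/'month'=MM/'day'=dd/'hour'=HH")
            .withZone(ZoneOffset.UTC);

    // Builds "<headerName>=<value>/year=.../month=.../day=.../hour=..."
    // from a header value and the record timestamp in epoch milliseconds.
    public static String encodePartition(Map<String, String> headers,
                                         String headerName,
                                         long timestampMs) {
        String value = headers.get(headerName);
        if (value == null) {
            throw new IllegalArgumentException("Missing header: " + headerName);
        }
        return headerName + "=" + value + "/"
            + TIME_PATH.format(Instant.ofEpochMilli(timestampMs));
    }

    public static void main(String[] args) {
        // Hypothetical header; it must not share a name with a JSON key.
        Map<String, String> headers = Map.of("tenant", "acme");
        long ts = Instant.parse("2023-01-19T10:30:00Z").toEpochMilli();
        System.out.println(encodePartition(headers, "tenant", ts));
        // prints tenant=acme/year=2023/month=01/day=19/hour=10
    }
}
```

This sketch rejects records without the header. Routing them to a fallback partition instead is another option the design would have to settle.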

Extra
There is also a good custom partitioner, FieldAndTimeBasedPartitioner, which could be included in this repo as one of the default partitioners.
