Field partitioner does not support multiple event type feature with schema references #170

Open
gokhansari opened this issue Jan 4, 2021 · 0 comments

gokhansari commented Jan 4, 2021

As of version 5.5, Confluent supports multiple event types in the same topic. Following this official documentation and this blog post, I tried to use the feature. I wrote a Kafka Streams application that produces Avro messages with different schemas by using Schema Registry schema references. I also consumed these messages with another Kafka Streams application to test the multiple event type functionality, and that worked.
This is the union Avro schema I used in my tests. The eventA and eventB schemas are also registered in Schema Registry:

[
    "com.xxx.xxx.eventA",
    "com.xxx.xxx.eventB"
]

Everything worked up to this point. Then I tried to sink these messages to HDFS with the Kafka Connect HDFS Sink Connector, using FieldPartitioner, and set the relevant configuration properties in the connector settings. This is the field I want to partition by:

"partition.field.name" : "field1"

The connector started successfully and read records from Kafka, but the partitioning step failed with errors. It seems the field partitioner looks for field1 at the root of the record, but the field is not there: because of the multiple event type functionality, the record is wrapped in a root field named eventA. (I think this wrapping is done by the toConnectSchema method of the AvroData class in Confluent's kafka-schema-registry-parent repository.)
Struct{eventA=Struct{field1=val1,field2=val2,field3=val3}}

So partition.field.name would have to be set to "eventA.field1". But this is not a workable approach, because the root field name changes with each event type. In effect, the multiple event type feature breaks field partitioning in Kafka Connect.
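
To make the problem concrete, here is a minimal Python sketch that models the Connect value as nested dicts. It is an illustration only, not the connector's code: the `lookup` and `try_lookup` helpers, the dict layout, and the eventB sample value are all assumptions made up for this example.

```python
def lookup(record, dotted_path):
    """Walk a nested dict along a dotted path; raises KeyError if a segment is missing."""
    current = record
    for segment in dotted_path.split("."):
        current = current[segment]
    return current


def try_lookup(record, dotted_path):
    """Same as lookup, but returns None instead of raising when the path does not resolve."""
    try:
        return lookup(record, dotted_path)
    except KeyError:
        return None


# Hypothetical records shaped like the Struct above: one wrapper field per event type.
event_a = {"eventA": {"field1": "val1", "field2": "val2", "field3": "val3"}}
event_b = {"eventB": {"field1": "other"}}  # made-up eventB payload

print(try_lookup(event_a, "field1"))         # not at the root
print(try_lookup(event_a, "eventA.field1"))  # resolves for eventA records
print(try_lookup(event_b, "eventA.field1"))  # fails again for eventB records
```

A single fixed path can match at most one of the event types, which is why no one value of partition.field.name covers the whole topic.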

As a workaround, should I implement a custom field partitioner, or is there a consistent solution that I missed?
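
For reference, the logic a custom partitioner might apply can be sketched as follows. This is only a Python model of the idea using dicts, under the assumption that the wrapper carries exactly one populated field per record; a real partitioner would be a Java class plugged in through the connector's partitioner.class setting. The names `unwrap_union` and `encode_partition` are hypothetical, and the `field=value` output only mirrors the directory naming I expect from FieldPartitioner.

```python
def unwrap_union(value):
    """Return the nested record if exactly one field is populated and it holds a record.

    Assumption: the union wrapper has one field per event type and only the
    field for the actual event type is non-null.
    """
    populated = [v for v in value.values() if v is not None]
    if len(populated) == 1 and isinstance(populated[0], dict):
        return populated[0]
    return value  # not wrapped: use the record as-is


def encode_partition(value, field_name):
    """Build a 'field=value' partition string from the unwrapped record."""
    record = unwrap_union(value)
    if field_name not in record:
        raise KeyError(f"partition field {field_name!r} not found in record")
    return f"{field_name}={record[field_name]}"


# Hypothetical wrapped records for two event types, plus an unwrapped one.
wrapped_a = {"eventA": {"field1": "val1", "field2": "val2"}, "eventB": None}
wrapped_b = {"eventA": None, "eventB": {"field1": "other"}}
flat = {"field1": "plain"}

for value in (wrapped_a, wrapped_b, flat):
    print(encode_partition(value, "field1"))
```

The unwrapping check here is deliberately loose: an ordinary record with a single nested field would also be unwrapped. A real implementation would likely need a stricter test, for example one based on the record's schema, which this sketch does not attempt.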

@gokhansari gokhansari changed the title Field partitioner does not support multiple event type feature Field partitioner does not support multiple event type feature with schema references Jan 5, 2021