With version 5.5+, Confluent supports multiple event types in the same topic. Based on the official documentation and a blog post, I tried this feature. I wrote a Kafka Streams application that produces Avro messages with different schemas using Schema Registry schema references. I also consumed these messages with another Kafka Streams application to verify the multiple-event-type functionality, and it worked as expected.
This is the union Avro schema I used in my tests; the eventA and eventB schemas are also registered in Schema Registry:
[
"com.xxx.xxx.eventA",
"com.xxx.xxx.eventB"
]
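For reference, a top-level union like this is registered through the Schema Registry REST API with schema references. A sketch of the request body (subject names and version numbers are assumptions for illustration):

```json
{
  "schemaType": "AVRO",
  "schema": "[\"com.xxx.xxx.eventA\",\"com.xxx.xxx.eventB\"]",
  "references": [
    { "name": "com.xxx.xxx.eventA", "subject": "eventA", "version": 1 },
    { "name": "com.xxx.xxx.eventB", "subject": "eventB", "version": 1 }
  ]
}
```

This would be POSTed to the subject's versions endpoint; on the producer side, the Avro serializer is typically configured with auto.register.schemas=false and use.latest.version=true so the registered union schema is used as-is.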
Everything was fine up to this point. Then I tried to sink these messages to HDFS with the Kafka Connect HDFS Sink Connector using FieldPartitioner, setting the relevant configuration properties on the connector. This is the field I want to partition by:
"partition.field.name" : "field1"
The connector started successfully and read records from Kafka, but the partitioning step failed with errors. FieldPartitioner was looking for field1, but the field is not at the root of the record. Because of the multiple-event-type feature, the record is wrapped in a root field named after the event type, eventA. (I believe this wrapping is done by the toConnectSchema method of the AvroData class in Confluent's kafka-schema-registry-parent repository.)
Struct{eventA=Struct{field1=val1,field2=val2,field3=val3}}
So partition.field.name would have to be set to "eventA.field1". But that is not a workable approach, because the root field name changes with each event type. In other words, the multiple-event-type feature breaks field partitioning in Kafka Connect.
As a workaround, should I implement a custom field partitioner, or is there a consistent solution that I missed?
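To frame the custom-partitioner workaround: the partitioner could unwrap the single-field root struct before looking up the partition field. The sketch below is self-contained and uses plain Maps in place of Kafka Connect Struct values purely to illustrate the lookup logic; a real implementation would plug into Confluent's storage partitioner interface instead, and the names unwrapRoot and partitionValue are hypothetical:

```java
import java.util.Map;

public class UnionFieldLookup {

    // If the value is a wrapper with exactly one field whose value is itself
    // a nested record (the multiple-event-type case), descend into it;
    // otherwise use the value as-is.
    @SuppressWarnings("unchecked")
    static Map<String, Object> unwrapRoot(Map<String, Object> value) {
        if (value.size() == 1) {
            Object inner = value.values().iterator().next();
            if (inner instanceof Map) {
                return (Map<String, Object>) inner;
            }
        }
        return value;
    }

    // Look up the partition field after unwrapping the event-type wrapper,
    // so "field1" resolves regardless of whether the root is eventA or eventB.
    static Object partitionValue(Map<String, Object> value, String fieldName) {
        return unwrapRoot(value).get(fieldName);
    }

    public static void main(String[] args) {
        // Simulates Struct{eventA=Struct{field1=val1,field2=val2}}
        Map<String, Object> wrapped =
                Map.of("eventA", Map.of("field1", "val1", "field2", "val2"));
        System.out.println(partitionValue(wrapped, "field1")); // prints val1
    }
}
```

Note the fallback: records that are not wrapped (a plain schema without references) still partition correctly, since unwrapRoot returns the value unchanged when the single-field-wrapper shape is not detected.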
gokhansari changed the title from "Field partitioner does not support multiple event type feature" to "Field partitioner does not support multiple event type feature with schema references" on Jan 5, 2021