Describe the problem you faced

I'm creating a table using the INSERT operation with the record level index enabled. The data and partitions are written to S3, but the job then fails while appending records to the record index log.

To Reproduce

Steps to reproduce the behavior: write the table with the Hoodie options listed under Additional context below.

Expected behavior

I should be able to create the record level index.

Environment Description

Hudi version : 0.15.0

Spark version : 3.4

Hive version : N/A

Hadoop version :

Storage (HDFS/S3/GCS..) : S3

Running on Docker? (yes/no) :

Additional context

Hoodie options
val hudiOptions = Map( // opening reconstructed; the variable name is illustrative
  DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
  HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
  HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
  "hoodie.parquet.small.file.limit" -> "1073741824",
  HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
  HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.metadata.record.index.enable" -> "true",
  HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
  HoodieWriteConfig.MARKERS_TYPE.key() -> "DIRECT",
  DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  "hoodie.metadata.record.index.max.filegroup.count" -> "100000",
  "hoodie.metadata.record.index.min.filegroup.count" -> "7500" // I have ~10 TB of data and am trying to keep the record index log files at around 400 MB each.
)
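For context, a minimal sketch of how a write using these options might look. The DataFrame name df, the table name, and the record key / partition path fields below are placeholders, not values from the original report; only the base path comes from the log path in the stacktrace.

// Minimal sketch (placeholders noted above): write the DataFrame as a Hudi table
// using the option map defined earlier.
import org.apache.spark.sql.SaveMode

df.write
  .format("hudi")
  .options(hudiOptions)                                                   // the Map shown above
  .option("hoodie.table.name", "my_table")                                // placeholder table name
  .option("hoodie.datasource.write.recordkey.field", "record_key")        // placeholder field
  .option("hoodie.datasource.write.partitionpath.field", "partition_col") // placeholder field
  .mode(SaveMode.Append)
  .save("s3://SomeS3Path/")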
Stacktrace
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://SomeS3Path/.hoodie/metadata/record_index/.record-index-0195-0_00000000000000012.log.2_912-39-236765
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:466)
at org.apache.hudi.io.HoodieAppendHandle.flushToDiskIfRequired(HoodieAppendHandle.java:599)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:428)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:337)
... 29 more
Caused by: org.apache.hudi.exception.HoodieIOException: IOException serializing records
at org.apache.hudi.common.util.HFileUtils.lambda$serializeRecordsToLogBlock$0(HFileUtils.java:219)
at java.util.TreeMap.forEach(TreeMap.java:1005)
at org.apache.hudi.common.util.HFileUtils.serializeRecordsToLogBlock(HFileUtils.java:213)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:108)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:117)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:163)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:458)
... 33 more
Caused by: java.io.IOException: Added a key not lexically larger than previous.
@dataproblems Can you try the upsert operation? With RLI, the index lookup phase does not add much cost anyway, so insert and upsert should perform similarly.
We will look into why this fails with insert.
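For reference, only the operation key in the option map above would change; a minimal sketch:

// Sketch: reuse the same options, switching the write operation from insert to upsert.
val upsertOptions = hudiOptions ++ Map(
  DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL
)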
@ad1happy2go - This seemed to be related to the text present within the record key field. If I removed that particular entry from my dataset, the operation went through.
Is there any documentation that captures the restrictions on which characters are allowed in a String record key field?
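A diagnostic sketch for narrowing down such entries, assuming the record key column is named record_key (a placeholder, as is df): listing keys that contain characters outside printable ASCII is a reasonable first check, given that removing the entry with unusual text in the record key field made the write succeed.

// Diagnostic sketch (not an official Hudi utility): surface record-key values containing
// characters outside printable ASCII before writing.
import org.apache.spark.sql.functions._

df.select(col("record_key"))
  .filter(col("record_key").rlike("[^\\x20-\\x7E]")) // anything outside printable ASCII
  .show(20, truncate = false)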