
[SUPPORT] Hoodie Insert operation failing while appending to record index log file #12320

Open
dataproblems opened this issue Nov 23, 2024 · 2 comments
Labels
index · metadata · metadata table · priority:critical (production down; pipelines stalled; need help ASAP)

Comments

@dataproblems

Describe the problem you faced

I'm creating a table using INSERT mode with the record level index enabled. I can see that the data and partitions are written to S3, but the job then fails while appending records to the record index log file.

To Reproduce

Steps to reproduce the behavior:

  1. spark.write.format("hudi").options(...).save("...") (full options are listed under Additional context, followed by a write sketch)

Expected behavior

I should be able to create the record level index.

Environment Description

  • Hudi version : 0.15.0

  • Spark version : 3.4

  • Hive version : N/A

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) :

Additional context

Hoodie options

  val hudiOptions = Map(
    DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
    HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
    HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
    "hoodie.parquet.small.file.limit" -> "1073741824",
    HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
    HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
    "hoodie.metadata.enable" -> "true",
    "hoodie.datasource.write.hive_style_partitioning" -> "true",
    "hoodie.metadata.record.index.enable" -> "true",
    HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
    HoodieWriteConfig.MARKERS_TYPE.key() -> "DIRECT",
    DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
    "hoodie.metadata.record.index.max.filegroup.count" -> "100000",
    "hoodie.metadata.record.index.min.filegroup.count" -> "7500" // ~10 TB of data; aiming to keep record index log files around 400 MB each.
  )
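
For reference, a minimal sketch of how these options plug into the write from step 1. The table name and the record key / partition path fields below are placeholders, not taken from the original report:

    // Sketch only: "my_table", "id", and "dt" are hypothetical names.
    df.write
      .format("hudi")
      .options(hudiOptions)
      .option("hoodie.table.name", "my_table")                      // assumed
      .option("hoodie.datasource.write.recordkey.field", "id")      // assumed
      .option("hoodie.datasource.write.partitionpath.field", "dt")  // assumed
      .mode("append")
      .save("s3://SomeS3Path/")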

Stacktrace

Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://SomeS3Path/.hoodie/metadata/record_index/.record-index-0195-0_00000000000000012.log.2_912-39-236765
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:466)
	at org.apache.hudi.io.HoodieAppendHandle.flushToDiskIfRequired(HoodieAppendHandle.java:599)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:428)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:337)
	... 29 more
Caused by: org.apache.hudi.exception.HoodieIOException: IOException serializing records
	at org.apache.hudi.common.util.HFileUtils.lambda$serializeRecordsToLogBlock$0(HFileUtils.java:219)
	at java.util.TreeMap.forEach(TreeMap.java:1005)
	at org.apache.hudi.common.util.HFileUtils.serializeRecordsToLogBlock(HFileUtils.java:213)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:108)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:117)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:163)
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:458)
	... 33 more
Caused by: java.io.IOException: Added a key not lexically larger than previous.
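
Editor's note on the final exception (context, not from the thread): the record index is written as HFile-format log blocks (see HoodieHFileDataBlock above), and the HFile writer rejects any key that is not strictly greater than the previous one in byte-lexicographic order. The stack trace shows the records being drained from a java.util.TreeMap, which orders String keys by UTF-16 code units, while HFile compares raw UTF-8 bytes; the two orderings can disagree when a key contains characters outside the Basic Multilingual Plane, such as emoji. A minimal sketch of the divergence, assuming this is the mechanism at play:

    import java.nio.charset.StandardCharsets

    // Two keys that order differently as Java Strings vs. as UTF-8 bytes.
    val a = "key\uffff"       // U+FFFF
    val b = "key\ud83d\ude00" // U+1F600 (an emoji), a surrogate pair in UTF-16

    // String ordering compares UTF-16 code units: 0xFFFF > 0xD83D, so a > b.
    println(a.compareTo(b) > 0) // true

    // Byte ordering of the UTF-8 encodings (what HFile compares):
    // U+FFFF -> EF BF BF, U+1F600 -> F0 9F 98 80, so a < b.
    def utf8Compare(x: String, y: String): Int = {
      val bx = x.getBytes(StandardCharsets.UTF_8)
      val by = y.getBytes(StandardCharsets.UTF_8)
      bx.zip(by).collectFirst { case (p, q) if p != q => (p & 0xff) - (q & 0xff) }
        .getOrElse(bx.length - by.length)
    }
    println(utf8Compare(a, b) < 0) // true

A key pair that flips order between the two comparisons would be consistent with the author's follow-up below, where removing a single record made the append succeed.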
@ad1happy2go
Collaborator

@dataproblems Can you try UPSERT mode? With RLI, the index lookup phase adds little cost anyway, so insert and upsert should perform similarly.
We will look further into why it is failing with insert.
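
Concretely, the suggested change is a one-key swap in the options map above (a sketch, assuming the hudiOptions binding from Additional context):

    // Same configuration, but with UPSERT instead of INSERT.
    val upsertOptions = hudiOptions +
      (DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)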

@dataproblems
Author

@ad1happy2go - This turned out to be related to the text within the record key field. Once I removed that particular entry from my dataset, the operation went through.

Is there documentation that captures the restrictions on which characters are allowed in a String record key field?
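
One way to hunt for suspect entries before writing (a hypothetical diagnostic, not an official Hudi utility) is to scan the key column for characters outside the printable ASCII range, where String ordering and byte ordering are most likely to diverge; record_key_col below stands in for the actual record key column:

    import org.apache.spark.sql.functions.col

    // Flag keys containing anything outside printable ASCII (0x20-0x7E).
    val suspectKeys = df
      .filter(col("record_key_col").rlike("[^\\x20-\\x7E]"))
      .select("record_key_col")

    suspectKeys.show(20, truncate = false)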
