Can't select table If drop the corresponding column after replacing or dropping partition spec field #11314

Open
1 of 3 tasks
bknbkn opened this issue Oct 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

bknbkn commented Oct 14, 2024

Apache Iceberg version

Master branch

Query engine

None

Please describe the bug 🐞

If we replace or drop a partition spec field and then drop the corresponding column, we can no longer select from the table:

Caused by: java.lang.NullPointerException: Type cannot be null
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:446)
at org.apache.iceberg.types.Types$NestedField.optional(Types.java:415)
at org.apache.iceberg.PartitionSpec.partitionType(PartitionSpec.java:141)
at org.apache.iceberg.Partitioning.buildPartitionProjectionType(Partitioning.java:273)
at org.apache.iceberg.Partitioning.partitionType(Partitioning.java:241)
at org.apache.iceberg.Partitioning.partitionType(Partitioning.java:237)
at org.apache.iceberg.spark.source.SparkTable.metadataColumns(SparkTable.java:249)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.metadataOutput$lzycompute(DataSourceV2Relation.scala:61)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.metadataOutput(DataSourceV2Relation.scala:51)
at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.metadataOutput(basicLogicalOperators.scala:1339)
at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:960)
at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:960)
at scala.collection.Iterator.exists(Iterator.scala:969)

It can be easily reproduced by adding sql("SELECT * FROM %s", tableName);
to TestAlterTablePartitionFields.testDropColumnOfOldPartitionFieldV1 or TestAlterTablePartitionFields.testDropColumnOfOldPartitionFieldV2.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
bknbkn added the bug label Oct 14, 2024
Contributor Author

bknbkn commented Oct 14, 2024

The cause seems to be that every spec is bound to the latest schema, and a historical spec may reference source fields that no longer exist in the latest schema.
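
To illustrate the suspected cause, here is a minimal, self-contained sketch. It is not Iceberg code: the class `StaleSpecSketch`, the column ids, and the map-based "schemas" are all hypothetical stand-ins, and the null check only mimics the precondition that appears in the stack trace.

```java
import java.util.Map;
import java.util.Objects;

public class StaleSpecSketch {
  // Simplified schemas: column id -> type name (illustrative ids and types).
  static final Map<Integer, String> ORIGINAL_SCHEMA =
      Map.of(1, "long", 2, "timestamp", 3, "date");
  // The same table after the old partition source column (id 3) was dropped.
  static final Map<Integer, String> LATEST_SCHEMA =
      Map.of(1, "long", 2, "timestamp");

  // Resolve the type of a partition field's source column in the given schema.
  // The null check stands in for the precondition seen in the stack trace.
  static String sourceType(int sourceColumnId, Map<Integer, String> schema) {
    String type = schema.get(sourceColumnId); // null if the column is gone
    return Objects.requireNonNull(type, "Type cannot be null");
  }

  public static void main(String[] args) {
    int oldSpecSourceId = 3; // the historical spec still points at column 3
    // Resolving against the schema the spec was created with works.
    System.out.println(sourceType(oldSpecSourceId, ORIGINAL_SCHEMA));
    // Resolving against the latest schema fails, as in the report.
    try {
      sourceType(oldSpecSourceId, LATEST_SCHEMA);
    } catch (NullPointerException e) {
      System.out.println("NPE: " + e.getMessage());
    }
  }
}
```

In this model the old spec is fine on its own; it only fails because it is resolved against a schema that no longer contains its source column.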

I think it is necessary to persist the schema id in each spec in metadata.json. With that, each PartitionSpec could look up its own schema when it is built.
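
A rough sketch of what I mean, again as a toy model and not a patch: the `Spec` class, its `schemaId` field, and `SCHEMAS_BY_ID` are hypothetical, and the sketch assumes the historical schemas remain available in the table metadata so they can be looked up by id.

```java
import java.util.Map;
import java.util.Objects;

public class SpecSchemaIdSketch {
  // Hypothetical spec that remembers the schema it was created against.
  static final class Spec {
    final int specId;
    final int schemaId;       // the proposed extra field persisted per spec
    final int sourceColumnId; // source column of the single partition field

    Spec(int specId, int schemaId, int sourceColumnId) {
      this.specId = specId;
      this.schemaId = schemaId;
      this.sourceColumnId = sourceColumnId;
    }
  }

  // Assumed: all historical schemas are kept, keyed by schema id.
  // Schema 0 still has column 3; schema 1 is the latest, with column 3 dropped.
  static final Map<Integer, Map<Integer, String>> SCHEMAS_BY_ID = Map.of(
      0, Map.of(1, "long", 2, "timestamp", 3, "date"),
      1, Map.of(1, "long", 2, "timestamp"));

  // Resolve the source type against the spec's own schema, not the latest one.
  static String sourceType(Spec spec, Map<Integer, Map<Integer, String>> schemasById) {
    Map<Integer, String> schema =
        Objects.requireNonNull(schemasById.get(spec.schemaId), "Unknown schema id");
    return Objects.requireNonNull(schema.get(spec.sourceColumnId), "Type cannot be null");
  }

  public static void main(String[] args) {
    Spec oldSpec = new Spec(0, 0, 3); // created before the column was dropped
    Spec newSpec = new Spec(1, 1, 2); // created against the latest schema
    System.out.println(sourceType(oldSpec, SCHEMAS_BY_ID));
    System.out.println(sourceType(newSpec, SCHEMAS_BY_ID));
  }
}
```

With the schema id stored per spec, the old spec resolves its source column in schema 0 and no longer depends on what the latest schema contains.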

bknbkn changed the title from "If we replaced or dropped partition spec field and drop the corresponding column, we can't select table again" to "Can't select table If drop the corresponding column after replacing or dropping partition spec field" Oct 16, 2024