How to query a table by a partition field? #11329

Open
jiyis opened this issue Oct 16, 2024 · 0 comments
Labels
question Further information is requested

jiyis commented Oct 16, 2024

Query engine

Iceberg API 1.6.1

Question

How should I query data by a partition (bucket) field?

Schema

The table is stored on S3.

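import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

TableIdentifier tableIdentifier = TableIdentifier.of("default", "example_table");
Schema schema = new Schema(
        Types.NestedField.optional(1, "event_id", Types.StringType.get()),
        Types.NestedField.optional(2, "username", Types.StringType.get()),
        Types.NestedField.optional(3, "userid", Types.IntegerType.get()),
        Types.NestedField.optional(4, "api_version", Types.StringType.get()),
        Types.NestedField.optional(5, "command", Types.StringType.get())
);

// Partition by bucket(event_id, 10): each row lands in one of 10 buckets (0-9)
// derived from a hash of its event_id.
PartitionSpec spec = PartitionSpec.builderFor(schema)
        .bucket("event_id", 10)
        .build();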
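To see which bucket a given event_id falls into, the bucket transform can be computed by hand. The Iceberg table spec defines it as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, hashing strings as their UTF-8 bytes with seed 0. The following is a minimal, self-contained sketch of that definition; the class and method names (`BucketSketch`, `murmur3`, `bucket`) are hypothetical and not part of the Iceberg API, so cross-check the result against Iceberg's own transform before relying on it.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not an Iceberg class) implementing the spec's bucket transform
 * for string values: (murmur3_x86_32(utf8(v)) & Integer.MAX_VALUE) % N, seed 0.
 */
final class BucketSketch {
    private BucketSketch() {}

    /** 32-bit Murmur3 (x86 variant) with seed 0 over the given bytes. */
    static int murmur3(byte[] data) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = 0; // seed
        int nblocks = data.length / 4;

        // Body: mix each little-endian 4-byte block into the hash.
        for (int i = 0; i < nblocks; i++) {
            int off = i * 4;
            int k1 = (data[off] & 0xff)
                    | (data[off + 1] & 0xff) << 8
                    | (data[off + 2] & 0xff) << 16
                    | (data[off + 3] & 0xff) << 24;
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 0-3 bytes.
        int tail = nblocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8; // fall through
            case 1:
                k1 ^= (data[tail] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization: fold in the length, then avalanche the bits.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Bucket id in [0, numBuckets) for a string column value. */
    static int bucket(String value, int numBuckets) {
        int hash = murmur3(value.getBytes(StandardCharsets.UTF_8));
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }
}
```

For example, `BucketSketch.bucket("9c83f47c-9a07-4a6b-949c-3bedc31852fe", 10)` gives the bucket id of the row from the question; a filter such as `Expressions.equal(Expressions.bucket("event_id", 10), b)` can only match that row when `b` equals that id. As a sanity check, the spec's hash test vectors list 1210000089 for the string "iceberg".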

Insert data

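import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.PartitionKey;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.data.GenericAppenderFactory;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.io.OutputFileFactory;
import org.apache.iceberg.io.PartitionedFanoutWriter;

// `catalog` is assumed to be the (S3-backed) catalog the table was created in.
TableIdentifier name = TableIdentifier.of("default", "example_table");
Table table = catalog.loadTable(name);
Schema schema = table.schema();
GenericAppenderFactory appenderFactory = new GenericAppenderFactory(schema);

int partitionId = 1, taskId = 1;
OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
        .format(FileFormat.AVRO).build();
// One reused PartitionKey; partition(record) recomputes bucket(event_id) for each row.
final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
// Fanout writer: one open file per partition, 10 MiB target file size.
PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<>(
        table.spec(),
        FileFormat.AVRO, appenderFactory, outputFileFactory,
        table.io(), 10 * 1024 * 1024) {
    @Override
    protected PartitionKey partition(Record record) {
        partitionKey.partition(record);
        return partitionKey;
    }
};

// Write 10,000 random rows; event_id and command hold the same UUID.
// write(...) and dataFiles() declare IOException, so handle or declare it.
GenericRecord genericRecord = GenericRecord.create(table.schema());
List<String> levels = Arrays.asList("info", "debug", "error", "warn");
Random random = new Random();
for (int i = 0; i < 10000; i++) {
    GenericRecord record = genericRecord.copy();
    String eventId = UUID.randomUUID().toString();
    record.setField("event_id", eventId);
    record.setField("username", levels.get(random.nextInt(levels.size())));
    record.setField("userid", random.nextInt(10000000));
    record.setField("api_version", "1.0");
    record.setField("command", eventId);
    partitionedFanoutWriter.write(record);
}

// Commit the data files produced by the writer in a single append.
AppendFiles appendFiles = table.newAppend();
Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
Snapshot newSnapshot = appendFiles.apply();
appendFiles.commit();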
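The insert path relies on a fanout writer: each record is routed by its partition key, and a new output is opened the first time a key is seen, so all ten buckets can be written from one unsorted stream. The following is a simplified, hypothetical model of that routing in plain Java (`FanoutSketch` is not an Iceberg class); in-memory lists stand in for open data files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of fanout writing: one open "file" (a list) per partition key. */
final class FanoutSketch<R, K> {
    private final Function<R, K> partitioner;
    // Keys are stored as-is, so the partitioner must return keys that are never mutated later.
    private final Map<K, List<R>> openFiles = new LinkedHashMap<>();

    FanoutSketch(Function<R, K> partitioner) {
        this.partitioner = partitioner;
    }

    /** Routes a record to the output for its partition, opening one on first use. */
    void write(R record) {
        K key = partitioner.apply(record);
        openFiles.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
    }

    /** "Closes" every open output and returns one group of records per partition key. */
    Map<K, List<R>> complete() {
        Map<K, List<R>> written = new LinkedHashMap<>(openFiles);
        openFiles.clear();
        return written;
    }
}
```

In the question's code, the partitioner role is played by the overridden `partition(Record)` method, which fills the shared `partitionKey`. Because a map-based writer keeps the key it is handed, a writer fed by a single reused mutable key object must store a copy of the key rather than the object itself.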

Query

I'd like to filter data by the bucket partition, but no rows are returned. I have confirmed that the data exists, and I can retrieve it by filtering on other fields.

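import org.apache.iceberg.data.IcebergGenerics;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.io.CloseableIterable;

// Each CloseableIterable should be closed after use (e.g. with try-with-resources).

// Empty result: equality filter on the partition source column
CloseableIterable<Record> byEventId = IcebergGenerics.read(table)
        .where(Expressions.equal("event_id", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
        .build();

// Empty result: filter on the bucket transform itself
CloseableIterable<Record> byBucket = IcebergGenerics.read(table)
        .where(Expressions.equal(Expressions.bucket("event_id", 10), 1))
        .build();

// Returns Record(9c83f47c-9a07-4a6b-949c-3bedc31852fe, info, 2377306, 1.0, 9c83f47c-9a07-4a6b-949c-3bedc31852fe)
CloseableIterable<Record> byCommand = IcebergGenerics.read(table)
        .where(Expressions.equal("command", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
        .build();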
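How the two partition-related filters are expected to behave can be illustrated with a simplified model of a filtered scan: data files are first pruned by their partition value, then the surviving rows are checked against the row filter (the part of a filter that still has to be evaluated per row, which Iceberg calls the residual). This is a hypothetical sketch, not Iceberg's scan planner; `ScanSketch`, `FileModel`, and the bucket function passed in are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

/** Hypothetical model of a scan over a table partitioned by bucket(event_id). */
final class ScanSketch {
    /** A data file tagged with the bucket id its rows were written under. */
    static final class FileModel {
        final int bucket;
        final List<String> eventIds;

        FileModel(int bucket, List<String> eventIds) {
            this.bucket = bucket;
            this.eventIds = eventIds;
        }
    }

    /** Groups event ids into one file per bucket, as a partitioned write would. */
    static List<FileModel> write(List<String> eventIds, ToIntFunction<String> bucketFn) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        for (String id : eventIds) {
            byBucket.computeIfAbsent(bucketFn.applyAsInt(id), k -> new ArrayList<>()).add(id);
        }
        List<FileModel> files = new ArrayList<>();
        byBucket.forEach((b, ids) -> files.add(new FileModel(b, ids)));
        return files;
    }

    /** Prunes files with partitionFilter, then applies rowFilter to rows in surviving files. */
    static List<String> scan(List<FileModel> files, IntPredicate partitionFilter, Predicate<String> rowFilter) {
        List<String> out = new ArrayList<>();
        for (FileModel f : files) {
            if (!partitionFilter.test(f.bucket)) {
                continue; // pruned: no row in this file can match
            }
            for (String id : f.eventIds) {
                if (rowFilter.test(id)) {
                    out.add(id);
                }
            }
        }
        return out;
    }

    /** event_id = value: prune to the bucket of value, then check equality per row. */
    static List<String> whereEventIdEquals(List<FileModel> files, String value, ToIntFunction<String> bucketFn) {
        int target = bucketFn.applyAsInt(value);
        return scan(files, b -> b == target, id -> id.equals(value));
    }

    /** bucket(event_id) = bucket: every row whose event_id hashes to that bucket, and only those. */
    static List<String> whereBucketEquals(List<FileModel> files, int bucket, ToIntFunction<String> bucketFn) {
        return scan(files, b -> b == bucket, id -> bucketFn.applyAsInt(id) == bucket);
    }
}
```

Under this model, `event_id = x` only skips files whose bucket cannot contain `x`, so the matching row is still returned, while `bucket(event_id, 10) = 1` returns exactly the rows that hash to bucket 1, which includes the queried row only if its bucket happens to be 1.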
jiyis added the question label (Further information is requested) on Oct 16, 2024