Fix nullable bug of Arrow MapVector in Bridge.cpp #11214

Open
wants to merge 1 commit into
base: main

@whutjs whutjs commented Oct 10, 2024

Arrow defines a constraint in arrow/format/Schema.fbs that requires both the "entries" field of a Map and its "key" field to be non-nullable:

/// Neither the "entries" field nor the "key" field may be nullable.

There are also two not-nullable constraints in the Java implementation of MapVector:

  1. Map data should be a non-nullable struct type;
    https://github.com/apache/arrow/blob/d4516c5386f84619dfdf2a9f72fed6d7df89704c/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L98
  2. Map data key type should be non-nullable;
    https://github.com/apache/arrow/blob/d4516c5386f84619dfdf2a9f72fed6d7df89704c/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L105

However, when Velox converts a MapVector to an Arrow MapVector, it sets the MapVector's flag to nullable, which violates these constraints:

arrowSchema.flags = ARROW_FLAG_NULLABLE;
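
To make the two constraints concrete, here is a minimal sketch of a check over a schema tree. `SchemaNode` and `satisfiesMapConstraints` are hypothetical names used only for illustration; they are not part of Velox or Arrow. The only real value assumed is `ARROW_FLAG_NULLABLE == 2` from the Arrow C data interface.

```cpp
#include <cstdint>
#include <vector>

// Same value as ARROW_FLAG_NULLABLE in the Arrow C data interface.
constexpr int64_t kArrowFlagNullable = 2;

// Hypothetical, simplified stand-in for ArrowSchema (illustration only).
struct SchemaNode {
  int64_t flags = 0;
  std::vector<SchemaNode> children;
};

bool isNullable(const SchemaNode& node) {
  return (node.flags & kArrowFlagNullable) != 0;
}

// Returns true if a map node has the expected layout, map -> entries ->
// (key, value), and neither "entries" nor "key" is marked nullable.
bool satisfiesMapConstraints(const SchemaNode& map) {
  if (map.children.size() != 1) {
    return false;
  }
  const SchemaNode& entries = map.children[0];
  if (entries.children.size() != 2) {
    return false;
  }
  const SchemaNode& key = entries.children[0];
  return !isNullable(entries) && !isNullable(key);
}
```

With this check, an "entries" child exported with the nullable bit set is rejected, and clearing that bit makes the same tree pass.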

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 10, 2024

netlify bot commented Oct 10, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ae1f99a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6715c143f3675000087cb26b

Contributor

@Yuhta Yuhta left a comment

Shall we add a test for this as well?

Author

whutjs commented Oct 11, 2024

@Yuhta Thanks for your review. I added some assertions to ArrowBridgeSchemaTest to verify that the flag of the MapVector is not nullable.

@whutjs whutjs closed this Oct 11, 2024
@whutjs whutjs reopened this Oct 11, 2024
@whutjs whutjs changed the title from "Set the flag of Arrow MapVector to not-nullable in Bridge.cpp" to "Fix nullable bug of Arrow MapVector in Bridge.cpp" Oct 14, 2024
Author

whutjs commented Oct 14, 2024

@mbasmanova Would you please take a look at this? Thanks.

Contributor

@mbasmanova mbasmanova left a comment

@whutjs Thank you for the fix. Would you update the PR description to clarify which constraint was violated and, if possible, provide a reference to the Arrow documentation that describes that constraint?

@@ -1447,6 +1451,11 @@ void exportToArrow(
maps.getNullCount());
exportToArrow(rows, *child, options);
child->name = "entries";
// Map data should be a non-nullable struct type
child->flags = clearArrowNullableFlag(child->flags);
Contributor

Would it make sense to change clearArrowNullableFlag to take flags as an in/out parameter?

clearArrowNullableFlag(child->flags);

Would it also make sense to remove 'Arrow' from the function name? It seems to be implied.
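
For reference, a minimal sketch of the two shapes being compared. The function bodies are an assumption, since the helper's implementation is not shown in this hunk. `clearNullableFlagByValue` is a hypothetical name for the return-by-value form, and `clearNullableFlag` is the suggested name for the in/out form. `kArrowFlagNullable` mirrors `ARROW_FLAG_NULLABLE` (value 2) from the Arrow C data interface.

```cpp
#include <cstdint>

// Same value as ARROW_FLAG_NULLABLE in the Arrow C data interface.
constexpr int64_t kArrowFlagNullable = 2;

// Return-by-value form: the caller must assign the result back, as in
// `child->flags = clearArrowNullableFlag(child->flags);`.
int64_t clearNullableFlagByValue(int64_t flags) {
  return flags & ~kArrowFlagNullable;
}

// In/out form: clears the nullable bit in place and leaves other bits intact.
void clearNullableFlag(int64_t& flags) {
  flags &= ~kArrowFlagNullable;
}
```

The in/out form avoids repeating `child->flags` at the call site and cannot be misused by dropping the return value.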

@whutjs whutjs force-pushed the bug/arrow_map_vector_not_nullable branch from edd660c to 244462e Compare October 15, 2024 03:23
Author

whutjs commented Oct 15, 2024

@mbasmanova thanks for your review. I have modified this PR according to your suggestion.

Contributor

@mbasmanova mbasmanova left a comment

Thanks.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Oct 15, 2024
@facebook-github-bot
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

kgpai commented Oct 18, 2024

@whutjs Can you rebase against the latest main and update?

Verify MapVector schema is not null in ArrowBridgeSchemaTest.cpp
@whutjs whutjs force-pushed the bug/arrow_map_vector_not_nullable branch from 244462e to ae1f99a Compare October 21, 2024 02:49
Author

whutjs commented Oct 21, 2024

@kgpai DONE

5 participants