-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34698: [C++][Acero] Add node that emits explicit ordering after asserting order #44738
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
e66878f
to
937695d
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably not qualified to review this thoroughly but the CI failure is related:
[ FAILED ] TestScannerThreading/TestScanner.ScanOrderingFail/0Serial1d1b1024r, where GetParam() = serial-1d-1b-1024i (4 ms)
[ RUN ] TestScannerThreading/TestScanner.ScanOrderingFail/1Serial2d16b1024r
/arrow/cpp/src/arrow/dataset/scanner_test.cc:1040: Failure
Expected equality of these values:
next.status().message()
Which is: "Data is not ordered\n/arrow/cpp/src/arrow/acero/assert_order_node.cc:190 AssertInBatchOrder(*record_batch, ordering_, comparators_)\n/arrow/cpp/src/arrow/acero/source_node.cc:158 output_->InputReceived(this, std::move(batch))"
"Data is not ordered"
With diff:
@@ -1,3 @@
Data is not ordered
-/arrow/cpp/src/arrow/acero/assert_order_node.cc:190 AssertInBatchOrder(*record_batch, ordering_, comparators_)
-/arrow/cpp/src/arrow/acero/source_node.cc:158 output_->InputReceived(this, std::move(batch))
There seems to be some stray output on that specific job.
b767e88
to
7750094
Compare
…De]sc assertions robust
7750094
to
2e30308
Compare
@raulcd thanks, assertions adjusted |
Rationale for this change
An acero node that turns an implicit ordering into an explicit ordering (rows sorted by some columns) is useful to re-use order that already exists in the data.
What changes are included in this PR?
This PR adds the
AssertOrderNode
that implements this logic. TheScanner
employs that node to turn the implicit ordering of theScanNode
into an explicit order as defined by user code viaScanBuilder.Ordering
.Are these changes tested?
There are unit tests for the
AssertOrderNode
as well as for theScanNode
andScanBuilder
.Are there any user-facing changes?
The following options has been added:
ordering
option added toScanOptions
Ordering
option added toScannerBuilder
AssertNodeOptions
class