-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(expressions): speed up .describe()
and .info()
expression construction
#9684
base: main
Are you sure you want to change the base?
Conversation
.describe()
and .info()
methods.describe()
and .info()
methods
ad75a05
to
0e97036
Compare
.describe()
and .info()
methods.describe()
and .info()
expression construction
0e97036
to
353de24
Compare
|
Nope, I'm working on an improvement that will be both faster than the array optimization for expression construction and continue to support the backends that don't support arrays.
Much better, but I haven't quantified it yet, just ran it locally on the original case of 1.5m rows and 450 columns.
Seems like a solid follow up! |
2cdda1a
to
520913d
Compare
@jcrist Here are the benchmarks comparing
The Now to see about failing tests ... |
Ah, the slowdown is because I changed the implementation to always produce all aggregates! |
520913d
to
667e75d
Compare
667e75d
to
a6c0525
Compare
Found some more performance laying around in the form of not converting all memtables to pyarrow (which is noticeably expensive for wide tables). I'll put that in a separate PR. |
Latest benchmarks show improvements across the board for
|
Going to benchmark this with pyarrow to properly measure the speedup of these ops without the overhead of pandas memtable conversion |
e08b869
to
1004242
Compare
Added a benchmark for "end-to-end" which corresponds to construction + compilation + execution.
|
4f3eadb
to
9c3b11e
Compare
This is turning out to be kind of a mess. I'm having to move the translation down a layer in some cases which is getting annoying. I'll take the weekend to mull it over. |
I would like to get this into 9.3, but I'm not sure it'll happen. Just need to grind on it some more. |
9c3b11e
to
d4f84f7
Compare
Speed up
.info()
and.describe()
methods. This approach builds an array containing each aggregate, and then jointly unnests them in a single select. This safely bypasses the validation that happens in.agg
method, and only requires a single.agg
call at the end, to unnest the built arrays.Closes #9639.