avg calculation with Parquet file in vector runtime seems more "off" #5530

Open
philrz opened this issue Dec 12, 2024 · 0 comments

tl;dr

This is a follow-on from #5516. Repeating the avg calculation with the test data from that issue, and treating the DuckDB result of 1058.5234017218875 as the baseline, the different super formats and runtime options give the results in the table below:

| Format  | Runtime    | Result             | Delta           |
|---------|------------|--------------------|-----------------|
| BSUP    | sequential | 1058.523401720017  | 0.0000000018705 |
| CSUP    | sequential | 1058.523401720017  | 0.0000000018705 |
| Parquet | sequential | 1058.523401720017  | 0.0000000018705 |
| CSUP    | vector     | 1058.5234017218877 | 0.0000000000002 |
| Parquet | vector     | 1058.5297770606794 | 0.0063753387919 |
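
For anyone wanting to double-check the Delta column, here is a minimal Python sketch that recomputes the absolute differences from the DuckDB baseline. It uses `decimal.Decimal` on the printed strings so the subtraction itself adds no floating point error. The variable names (`BASELINE`, `RESULTS`, `deltas`) are mine for illustration and are not part of super or DuckDB.

```python
from decimal import Decimal

# DuckDB result used as the baseline, copied as printed in the table.
BASELINE = Decimal("1058.5234017218875")

# (format, runtime) -> avg result as printed by super (copied from the table).
RESULTS = {
    ("BSUP", "sequential"): Decimal("1058.523401720017"),
    ("CSUP", "sequential"): Decimal("1058.523401720017"),
    ("Parquet", "sequential"): Decimal("1058.523401720017"),
    ("CSUP", "vector"): Decimal("1058.5234017218877"),
    ("Parquet", "vector"): Decimal("1058.5297770606794"),
}

# Absolute difference of each result from the baseline.
deltas = {key: abs(value - BASELINE) for key, value in RESULTS.items()}

for (fmt, runtime), delta in deltas.items():
    # The "f" format avoids scientific notation for the tiny deltas.
    print(f"{fmt:8} {runtime:11} {delta:f}")
```

Comparing the printed decimal strings only checks the arithmetic in the table. It says nothing about how each runtime arrived at its value.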

Details

Repro is with super commit 883ffd2.

The runs that generated the results in the table above are in #5516 (comment), and the test data is linked from that issue.

Users are accustomed to seeing small differences in precision with floating point math, so I expect this might all be explained by something about parallel operations. Indeed, we've seen some non-deterministic floating point calculations with other database systems. However, with vector Parquet the delta shows up as early as the third decimal place, whereas for the others it only appears at the ninth decimal place or later, so I figured I'd surface this in case it's worthy of closer scrutiny.
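
To illustrate the kind of small, order-dependent differences I mean, here is a rough Python sketch that averages the same values two ways: one running sum, and per-chunk partial sums that are combined afterwards. This is only an illustration of floating point reordering, not a description of how super's sequential or vector runtimes work. The function names, the chunk size of 1024, and the synthetic data are all assumptions made up for the example.

```python
import math
import random


def avg_sequential(values):
    """Average using a single left-to-right running sum."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def avg_chunked(values, chunk_size):
    """Average using per-chunk partial sums that are then added together.

    This mimics, very loosely, what a batched or parallel engine might do.
    """
    total = 0.0
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        total += partial
    return total / len(values)


def avg_reference(values):
    """Average based on math.fsum, which tracks the rounding error."""
    return math.fsum(values) / len(values)


# Floating point addition is not associative, so grouping matters.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# Synthetic data (hypothetical, not the test data from #5516).
rng = random.Random(0)
values = [rng.uniform(0.0, 2000.0) for _ in range(100_000)]

reference = avg_reference(values)
for name, result in [
    ("sequential", avg_sequential(values)),
    ("chunked", avg_chunked(values, 1024)),
]:
    print(f"{name:10} {result!r} delta={abs(result - reference):.3e}")
```

On this synthetic input both variants stay within a relative tolerance of 1e-9 of the `math.fsum` reference, which is the scale of difference I'd normally attribute to summation order.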
