Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing batches of input values to a lateral subquery #5348

Open
philrz opened this issue Oct 18, 2024 · 1 comment
Open

Passing batches of input values to a lateral subquery #5348

philrz opened this issue Oct 18, 2024 · 1 comment

Comments

@philrz
Copy link
Contributor

philrz commented Oct 18, 2024

tl;dr

A proposed new batch operator:

batch 2 => (
  collect(this) |  // do something with 2-element array
)

Details

Repro is with super commit 0af34e0.

A community user recently asked about this in the same Slack thread that spawned this comment. In their original question they were attempting to operate on just two back-to-back input values, but then widened their inquiry to the more general case of needing to handle an indefinite number of back-to-back input vales of a similar nature.

Let's use the example of 6 incrementing numbers.

$ seq 6
1
2
3
4
5
6

Using what's in the language currently, much like the response to the user's original question, it seems this would require some initial aggregation of the input pairs into a complex value and then picking apart the complex values, potentially using over and a lateral subquery. For instance if we wanted to add each pairing of numbers together:

$ super -version
Version: v1.18.0-35-g0af34e0f

$ seq 6 | zq -z '
yield {i:(count()+1)/2,v:this}
| values:=collect(v) by i
| sort i
| over values => (
  sum(this)
)' -

3
7
11

However, @mccanne pointed out that for a use case like this a purpose-built operator like batch proposed above would provide a more direct and intuitive way to get the values into the lateral subquery.

@philrz
Copy link
Contributor Author

philrz commented Oct 23, 2024

For comparison, here's a SQL approach to get the same result. I confess I had ChatGPT come up with this, so maybe there's a better way.

$ duckdb --version
v1.1.2 f680b7d08f

$ seq 6 | duckdb -c "                          
CREATE TEMP TABLE NumberedRows AS
    SELECT 
        column0 AS value,
        ROW_NUMBER() OVER () AS rn
    FROM 
        read_csv('/dev/stdin');

SELECT 
    (n1.value + n2.value) AS sum_value
FROM 
    NumberedRows n1
JOIN 
    NumberedRows n2 ON n1.rn = n2.rn - 1
WHERE 
    n1.rn % 2 = 1;"
┌───────────┐
│ sum_value │
│   int64   │
├───────────┤
│         3 │
│         7 │
│        11 │
└───────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant