Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-input pipelines don't construct properly #121

Open
hobu opened this issue Jul 22, 2022 · 3 comments
Open

Multi-input pipelines don't construct properly #121

hobu opened this issue Jul 22, 2022 · 3 comments

Comments

@hobu
Copy link
Member

hobu commented Jul 22, 2022

@gsakkis Complex pipelines with branches in them don't seem to construct properly with the bindings. It could be my usage is incorrect or not being provided as expected, or the bindings don't support branched pipelines as currently written. Can you take a look?

Expected output


Before => reproject_before \ 
                             => merge
After => reproject_after   /

Produced output

But I end up with

Before => before_reprojection \
                                => after_reprojection => merge
After =>                      / 

Example code


before = pdal.Reader.las("before.las")
after = pdal.Reader.las("after.las")

reproject_before = pdal.Filter.reprojection(out_srs="EPSG:26915")
reproject_after = pdal.Filter.reprojection(out_srs="EPSG:26915")

merge = ((before |reproject_before) | (after | reproject_after)) | pdal.Filter.merge()

print (merge.pipeline)

Example output

{
  "pipeline":
  [
    {
      "filename": "before.las",
      "tag": "readers_las1",
      "type": "readers.las"
    },
    {
      "inputs":
      [
        "readers_las1"
      ],
      "out_srs": "EPSG:26915",
      "tag": "filters_reprojection1",
      "type": "filters.reprojection"
    },
    {
      "filename": "after.las",
      "tag": "readers_las2",
      "type": "readers.las"
    },
    {
      "inputs":
      [
        "filters_reprojection1",
        "readers_las2"
      ],
      "out_srs": "EPSG:26915",
      "tag": "filters_reprojection2",
      "type": "filters.reprojection"
    },
    {
      "inputs":
      [
        "filters_reprojection2"
      ],
      "tag": "filters_merge1",
      "type": "filters.merge"
    }
  ]
}

@gsakkis
Copy link
Collaborator

gsakkis commented Jul 25, 2022

@hobu I took a look; some notes:

  1. Currently the pdal-python pipe operator all it does is concatenate the stage(s) of the piped stages/pipelines. That is, (before | reproject_before) | (after | reproject_after) is equivalent to before | reproject_before | after | reproject_after. Do you expect these to have different semantics in general and if yes how?

  2. A side-effect of doing a simple stage concatenation and nothing else is that pdal-python does not set implicitly the inputs of any stage, it relies on pdal-core to determine the inputs according to the algorithm described here. This is why the inputs of reproject_after is determined as ["filters_reprojection1", "readers_las2"].

  3. Regardless of the previous points, a pipeline such as p1 | p2 | pdal.Filter.merge() (where p1 = before | reproject_before and p2 = after | reproject_after) does not (and cannot) infer that the inputs of merge are p1 and p2. The only reasonable semantics of this pipeline is that the input of p2 is p1 and the input of merge is p2.

    I can think of two ways to support multiple inputs:

    • Specify them explicitly:
      p1 | p2 | pdal.Filter.merge(inputs=["reproject_before", "reproject_after"])
      This should have been working today but it doesn't (due to (1) and/or (2)).
    • Introduce a new syntax to support multiple inputs, for example allow the left hand side to be a list/tuple:
      (p1, p2) | pdal.Filter.merge()
      This needs some more thought and could be addressed as a separate feature.

@hobu
Copy link
Member Author

hobu commented Jul 26, 2022

The second syntax looks easier to read, but can it actually be made to work as we expect?

I guess I would be happy with the first proposed syntax working even if it is verbose to write.

@gsakkis
Copy link
Collaborator

gsakkis commented Jul 27, 2022

Regardless of the syntax for multiple inputs, the first thing to be decided is the (1) above: should the pipe operator be (left) associative or not? In other words is

[a]    (reader1 | filter1) | (reader2 | filter2)

equivalent to

[b]    (((reader1 | filter1) | reader2) | filter2)

(where [b] can be simplified to reader1 | filter1 | reader2 | filter2)?

  • If yes, what's the implied inputs of filter2? According to the pdal core algorithm (which pdal-python inherits), it is [filter1, reader2].
  • If not, do we want the implied inputs of filter2 to be [reader2] in [a] and [filter1, reader2] in [b]?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants