Enable filters.python to specify which cloud dimensions will be returned in the pipeline arrays #175

Closed
erobeck opened this issue Aug 21, 2024 · 2 comments · Fixed by #184

@erobeck

erobeck commented Aug 21, 2024

Recommendation:

  • Provide an alternative method/function to access the arrays. The new method would include an optional iterable argument for the requested dimensions, defaulting to all dimensions.
  • Or provide an optional list element in the Python filter configuration to identify which dimensions are to be returned in the array. The new option would be valid whether the filter’s “function” option is applied or not. The filter would then be used to a) provide a callback function and/or b) identify which arrays to return.
  • Or create another Python filter solely for the purpose of identifying which dimensions to return in the arrays.
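For illustration only, the first option might behave like the sketch below. The helper name `select_dimensions` and its `dimensions` argument are hypothetical and not part of the PDAL Python bindings, and a plain NumPy structured array stands in for one entry of `Pipeline.arrays`. Because this helper runs after the full arrays already exist, it shows only the requested interface; the memory saving asked for here would require the bindings to apply the selection while the arrays are being built.

```python
import numpy as np


def select_dimensions(arrays, dimensions=None):
    """Hypothetical helper: restrict each structured array to `dimensions`.

    `dimensions` is an optional iterable of field names; None means
    "all dimensions", mirroring the default proposed in the issue.
    """
    if dimensions is None:
        return list(arrays)
    wanted = list(dimensions)
    result = []
    for arr in arrays:
        # Fail early on names the array does not carry.
        missing = [name for name in wanted if name not in arr.dtype.names]
        if missing:
            raise KeyError(f"unknown dimensions: {missing}")
        result.append(arr[wanted])
    return result


# Stand-in for one entry of Pipeline.arrays (assumed field names).
points = np.zeros(
    4, dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8"), ("Intensity", "u2")]
)
subset = select_dimensions([points], ["X", "Y", "Z"])
everything = select_dimensions([points])
```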

Justification:

  • Any reduction in memory usage makes it less likely that a containerized process (for example, one running in Docker) fails because it exceeds its allocated memory.
  • A streamable pipeline could help reduce the memory used for the returned arrays. However, when a Python filter (among other filters) is used, the PDAL pipeline is no longer streamable. As a result, the entire dataset is returned. That dataset can be extremely large when it contains hundreds of millions of points and dozens of dimensions.
@abellgithub
Collaborator

I don't understand. Only the dimensions that are referenced are copied.

@erobeck
Author

erobeck commented Aug 21, 2024

The requested enhancement is for the returned pipeline arrays, i.e., Pipeline.arrays. To the best of my knowledge, you cannot specify dimensions while accessing the arrays property/attribute; only afterwards, i.e., arrays[0][[list of dimensions]].
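A minimal sketch of that after-the-fact selection, using a plain NumPy structured array as a stand-in for `arrays[0]` (the field names and point count are assumptions). With NumPy 1.16 or later, indexing with a list of field names returns a view onto the original buffer, so it does not release any memory by itself; `numpy.lib.recfunctions.repack_fields` makes a compact copy, after which the full array can be dropped.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in for pipeline.arrays[0]; field names are assumed for illustration.
points = np.zeros(
    1000,
    dtype=[
        ("X", "f8"),
        ("Y", "f8"),
        ("Z", "f8"),
        ("Intensity", "u2"),
        ("Classification", "u1"),
    ],
)

# Selecting dimensions after the fact: a view that keeps the full record stride.
view = points[["X", "Y", "Z"]]

# Compact copy that holds only the selected fields.
packed = rfn.repack_fields(view)

print("full record bytes:  ", points.itemsize)
print("view record bytes:  ", view.itemsize)
print("packed record bytes:", packed.itemsize)
```

Even with the compact copy, the full array has already been allocated once, which is why the issue asks for the selection to happen when the arrays are produced.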


2 participants