Add request splitting feature #77

Open
matthewhanson opened this issue Jul 30, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@matthewhanson
Member

It's useful to be able to split a large request with many pages into smaller requests so they can be run in parallel or asynchronously. Datetime is a convenient way to split up requests.

The Cirrus stac-api feeder code is one example of doing this with sat-search.

The ItemSearch class should have one or more functions that return an array of Request objects. The simplest of these could take a num_requests parameter and split the datetime range into num_requests requests.
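
A minimal sketch of the datetime arithmetic behind that simplest case. `split_datetime_range` and `to_stac_datetime` are hypothetical names, not existing pystac-client functions, and the sketch assumes timezone-aware UTC datetimes and returns plain intervals rather than Request objects:

```python
from datetime import datetime, timezone


def split_datetime_range(start, end, num_requests):
    """Split [start, end] into num_requests contiguous, equal-length intervals."""
    if num_requests < 1:
        raise ValueError("num_requests must be at least 1")
    step = (end - start) / num_requests
    # Use `end` itself as the last edge so rounding never shortens the range.
    edges = [start + i * step for i in range(num_requests)] + [end]
    return list(zip(edges[:-1], edges[1:]))


def to_stac_datetime(interval):
    """Format a (start, end) pair of UTC datetimes as a 'start/end' string."""
    return "/".join(dt.strftime("%Y-%m-%dT%H:%M:%SZ") for dt in interval)


# Example: four one-day intervals covering 2021-01-01 through 2021-01-05.
intervals = split_datetime_range(
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    datetime(2021, 1, 5, tzinfo=timezone.utc),
    4,
)
datetime_params = [to_stac_datetime(interval) for interval in intervals]
```

Adjacent intervals share an endpoint here, so if the server treats both ends of an interval as inclusive, an Item that falls exactly on a boundary could come back twice and results may need de-duplicating by ID.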

A more complicated function would take in the approximate number of Items to return in each request. It could first make a series of requests to get the number of hits, then use those counts to divide the work into roughly equal requests (as in the Cirrus feeder code mentioned above).
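
One way this could work is recursive bisection: keep halving a datetime range until each piece reports few enough hits. This is only a sketch and not necessarily how the Cirrus feeder does it. `split_by_hits` is a hypothetical name, and `count_hits` is an assumed caller-supplied callable that returns the number of matching Items for a range (for example, from a search limited to a single result):

```python
from datetime import datetime, timedelta


def split_by_hits(start, end, count_hits, items_per_request,
                  min_span=timedelta(seconds=1)):
    """Bisect [start, end] until each interval has at most items_per_request hits.

    Returns a list of (start, end, hits) tuples in chronological order.
    min_span stops the recursion when many Items share nearly the same datetime.
    """
    hits = count_hits(start, end)
    if hits <= items_per_request or (end - start) <= min_span:
        return [(start, end, hits)]
    middle = start + (end - start) / 2
    return (
        split_by_hits(start, middle, count_hits, items_per_request, min_span)
        + split_by_hits(middle, end, count_hits, items_per_request, min_span)
    )
```

Each level of bisection costs one extra hit-count request per interval, so this trades a handful of cheap requests up front for more evenly sized requests afterwards.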

@matthewhanson matthewhanson added this to the 1.0.0 milestone Jul 30, 2021
@TomAugspurger
Collaborator

https://nbviewer.jupyter.org/gist/TomAugspurger/ceadc4b2f8b7e4263ff172ee1ea76dbb has an example where we want to query many (100,000) points over a relatively short date range. In this case, it's more important / necessary to parallelize by space rather than time.

https://nbviewer.jupyter.org/gist/TomAugspurger/ceadc4b2f8b7e4263ff172ee1ea76dbb#Option-2:-Parallelize...-carefully. lays out the approach of using a Hilbert curve to spatially partition the points before (manually) making multiple requests to the item search endpoint. I don't know whether that would be appropriate for pystac-client.
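
For illustration, a plain-Python sketch of the general idea (not the notebook's code): map each point to a grid cell, sort by position along a Hilbert curve, and cut the sorted list into contiguous chunks so that each chunk covers a compact area. `hilbert_index` and `partition_points` are hypothetical names, and points are assumed to be `(lon, lat)` tuples in degrees:

```python
def hilbert_index(x, y, order):
    """Distance of grid cell (x, y) along a Hilbert curve on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_points(points, num_partitions, order=16):
    """Sort (lon, lat) points along a Hilbert curve and cut them into chunks.

    Returns at most num_partitions lists of spatially nearby points.
    """
    if not points:
        return []
    cells = (1 << order) - 1

    def key(point):
        lon, lat = point
        x = int((lon + 180.0) / 360.0 * cells)
        y = int((lat + 90.0) / 180.0 * cells)
        return hilbert_index(x, y, order)

    ordered = sorted(points, key=key)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Each chunk could then become its own item search request, for example with a bounding box around the chunk's points, so nearby points end up sharing a request.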

It also hints that async requests could help with performance (#4). We spend ~90% of our time waiting on I/O, and the rest on parsing JSON, so we're mostly (but not entirely) I/O bound / non-blocking.
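
As a rough sketch of overlapping that I/O wait, here is a thread pool from the standard library, used instead of true async requests because it needs no async HTTP client. `run_requests` is a hypothetical helper, and `fetch` is an assumed caller-supplied callable that performs one search request and returns its parsed result:

```python
from concurrent.futures import ThreadPoolExecutor


def run_requests(requests, fetch, max_workers=8):
    """Call fetch(request) for every request using a pool of worker threads.

    Results come back in the same order as the input requests.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, requests))
```

Threads only overlap the time spent waiting on the network. The JSON parsing share would still run one thread at a time under the GIL, so the speedup would be bounded by that remaining fraction.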

@matthewhanson matthewhanson modified the milestones: 0.4.0, 0.5.0 Apr 18, 2022
@gadomski gadomski modified the milestones: 0.5.0, 0.6.0 Aug 30, 2022
@gadomski gadomski added the enhancement New feature or request label Nov 9, 2022
@gadomski gadomski modified the milestones: 0.6.0, 0.7.0 Jan 27, 2023
@gadomski gadomski removed this from the 0.7.0 milestone May 4, 2023