Job batching and limit for requests with many files #19

Open
mzur opened this issue Sep 26, 2023 · 2 comments

Comments

@mzur
Member

mzur commented Sep 26, 2023

We should limit the maximum number of files per storage request. On biigle.de, one user created several requests with tens of thousands of images each, some with more than 100k. Maybe we could set a 10k limit. Users who want to upload more would have to split their files into several storage requests.

One problem with too many files is excessively long run times for the queue jobs. Users could also, in theory, spam the service with millions of small files (as long as the total size stays within their quota, which is easy to do).
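A minimal sketch of the proposed per-request limit, written in Python purely for illustration. The names `MAX_FILES_PER_REQUEST`, `validate_file_count`, and `split_into_requests` are hypothetical and not part of the module; the 10k value is the one suggested above and is still open for discussion. The validator rejects an oversized request, and the helper shows how a client could split a large upload into several storage requests.

```python
from typing import List, Sequence

# Hypothetical limit; 10k is the value proposed in this issue, not a final decision.
MAX_FILES_PER_REQUEST = 10_000


def validate_file_count(files: Sequence[str], limit: int = MAX_FILES_PER_REQUEST) -> None:
    """Reject a storage request that contains more than `limit` files."""
    if len(files) > limit:
        raise ValueError(
            f"A storage request may contain at most {limit} files, got {len(files)}. "
            "Please split the files into several storage requests."
        )


def split_into_requests(files: Sequence[str], limit: int = MAX_FILES_PER_REQUEST) -> List[List[str]]:
    """Client-side helper: chunk a file list so that each chunk fits into one storage request."""
    return [list(files[i:i + limit]) for i in range(0, len(files), limit)]


files = [f"img_{i}.jpg" for i in range(23_000)]
requests = split_into_requests(files)
print([len(r) for r in requests])  # [10000, 10000, 3000]
for r in requests:
    validate_file_count(r)  # every chunk passes the check
```

Keeping the limit as a parameter makes it easy to raise it later (e.g. to 100k or 500k) without touching the check itself.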

@mzur mzur moved this to Medium Priority in BIIGLE Roadmap Sep 26, 2023
@mzur mzur changed the title Limit maximum number of files Limit maximum number of files per request Sep 26, 2023
@dlangenk
Member

Wouldn't it make more sense to split the storage request into multiple queue jobs instead of limiting it on the user side? In the end, both solutions would lead to the same outcome (if, in your case, the user submits multiple requests), but keeping related data in one storage request seems more manageable.

@mzur
Member Author

mzur commented Sep 26, 2023

That's also a good idea! However, I think we need some kind of limit in any case. Maybe a higher one, then (e.g. 100k or 500k)?

The ApproveStorageRequest job can be split up into several smaller jobs. The copying can be done as a batch of jobs (each copying 10k files); once the batch has finished, the user is notified and the pending directory is deleted.
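The proposed split of ApproveStorageRequest can be sketched as follows, in Python for illustration only. `Batch`, `chunk`, `approve_storage_request`, and the callbacks `copy_chunk`, `notify_user`, and `delete_pending` are hypothetical stand-ins, not the module's actual classes. A real queue would run the copy jobs in parallel on workers; this sketch runs them in order, but keeps the key property that notification and cleanup only happen after every copy job has finished.

```python
from dataclasses import dataclass, field
from typing import Callable, List

CHUNK_SIZE = 10_000  # files per copy job, as proposed in the issue


def chunk(files: List[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split the file list into consecutive chunks of at most `size` files."""
    return [files[i:i + size] for i in range(0, len(files), size)]


@dataclass
class Batch:
    """Hypothetical stand-in for a queue batch with follow-up callbacks."""
    jobs: List[Callable[[], None]]
    then: List[Callable[[], None]] = field(default_factory=list)

    def dispatch(self) -> None:
        # Run every job of the batch (sequentially here, in parallel on a real queue).
        for job in self.jobs:
            job()
        # Follow-up steps run only after all jobs in the batch have finished.
        for callback in self.then:
            callback()


def approve_storage_request(
    files: List[str],
    copy_chunk: Callable[[List[str]], None],
    notify_user: Callable[[], None],
    delete_pending: Callable[[], None],
    size: int = CHUNK_SIZE,
) -> Batch:
    # One copy job per chunk; the default argument binds each chunk when the lambda is created.
    jobs = [lambda c=c: copy_chunk(c) for c in chunk(files, size)]
    return Batch(jobs=jobs, then=[notify_user, delete_pending])


log = []
files = [f"img_{i}.jpg" for i in range(25_000)]
batch = approve_storage_request(
    files,
    copy_chunk=lambda c: log.append(("copy", len(c))),
    notify_user=lambda: log.append(("notify",)),
    delete_pending=lambda: log.append(("delete",)),
)
batch.dispatch()
print(log)
# [('copy', 10000), ('copy', 10000), ('copy', 5000), ('notify',), ('delete',)]
```

Ordering the follow-up callbacks after the whole batch ensures the pending directory is never deleted while a copy job might still be reading from it.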

@mzur mzur changed the title Limit maximum number of files per request Job batching and limit for requests with many files Sep 26, 2023
Labels
None yet
Projects
Status: Medium Priority
Development

No branches or pull requests

2 participants