Feat: decode request in threadpool #290
Conversation
Is this what was intended, or have I got something wrong?
Thank you for opening the PR @deependujha. Looks great so far 🚀
We should give users control over whether decoding runs in a threadpool. That could come in useful when `decode_request` is CPU-intensive or not thread-safe.
Possible API:
class LitServer:
    def __init__(..., concurrent_decode: bool = True)
cc: @lantiga
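A minimal sketch of how the flag could gate the pool (the constructor signature and the `_decode_batch` helper below are illustrative assumptions, not LitServe's actual internals):

```python
import concurrent.futures
import os


class LitServer:
    def __init__(self, api, concurrent_decode: bool = True):
        self.api = api
        # Assumed wiring: only create the pool when the user opts in, since
        # decode_request may be CPU-bound or not thread-safe.
        self._decode_pool = (
            concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count())
            if concurrent_decode
            else None
        )

    def _decode_batch(self, requests):
        if self._decode_pool is not None:
            # map preserves input order, so decoded items stay aligned
            # with their requests.
            return list(self._decode_pool.map(self.api.decode_request, requests))
        # Fall back to plain sequential decoding.
        return [self.api.decode_request(r) for r in requests]
```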
Codecov Report

@@           Coverage Diff           @@
##            main     #290    +/-   ##
=======================================
- Coverage      95%      95%    -0%
=======================================
  Files          19       19
  Lines        1244     1260    +16
=======================================
+ Hits         1182     1196    +14
- Misses         62       64     +2
Co-authored-by: Aniket Maurya <[email protected]>
for more information, see https://pre-commit.ci
@deependujha nice! love the "faster" claim haha. do you have a benchmark or something showing that it is indeed faster and by how much?
Hi @williamFalcon, I stated so on the basis of what was claimed in the original issue. Testing on a 4-GPU machine does not reflect the same: throughput stays in a similar range (300–380), averaging around 330 (benchmark code by Aniket). I'll try testing it with an audio model. If performance doesn't differ much, I'll close the PR.
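The linked benchmark isn't reproduced here, but a minimal throughput harness along these lines (the endpoint URL and payload are placeholder assumptions) would surface the kind of numbers quoted above:

```python
import concurrent.futures
import time

import requests

URL = "http://127.0.0.1:8000/predict"  # placeholder endpoint
PAYLOAD = {"input": 1.0}               # placeholder payload


def one_request(_):
    return requests.post(URL, json=PAYLOAD).status_code


def throughput(n_requests: int = 1000, n_clients: int = 32) -> float:
    # Hammer the server with n_clients concurrent callers and report req/s.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_clients) as pool:
        list(pool.map(one_request, range(n_requests)))
    return n_requests / (time.perf_counter() - start)


print(f"{throughput():.0f} req/s")
```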
):
    if concurrent_decode:
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count())
We can limit the number of threads to the batch size:

- executor = concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count())
+ executor = concurrent.futures.ThreadPoolExecutor(max_workers=min(max_batch_size, os.cpu_count()))
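A hedged sketch of the capped pool in use (the `decode_batch` helper and surrounding setup are assumptions inferred from the diff above, not the PR's exact code):

```python
import concurrent.futures
import os

max_batch_size = 8  # illustrative value

# Each batch submits at most max_batch_size decode tasks, so threads beyond
# that would sit idle; cap the pool at the batch size as suggested.
executor = concurrent.futures.ThreadPoolExecutor(
    max_workers=min(max_batch_size, os.cpu_count())
)


def decode_batch(decode_request, inputs):
    # Fan per-request decoding out across the pool; map keeps input order.
    return list(executor.map(decode_request, inputs))
```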
Tough luck: tested with the audio model, but there's no significant difference in performance. I'll go ahead and close the PR.
@deependujha how does your
Before submitting
Faster request decoding when the request contains binary data (images, audio, etc.).
GOOD:
faster decoding in the batched loop
BAD:
might add overhead for simple requests (see the sketch below)
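A quick timing comparison illustrates the BAD case (numbers are machine-dependent and the decode stub is a stand-in assumption): for a no-op decode, dispatching tasks to the pool costs more than the work itself, whereas decodes that release the GIL (e.g. image decoding in C extensions) can come out ahead.

```python
import concurrent.futures
import time


def trivial_decode(x):
    return x  # stands in for a cheap JSON field lookup


batch = list(range(64))

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    t0 = time.perf_counter()
    for _ in range(1_000):
        [trivial_decode(x) for x in batch]
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(1_000):
        list(pool.map(trivial_decode, batch))
    threaded = time.perf_counter() - t0

# Expect threaded > sequential here: submitting 64 no-op tasks per batch
# costs more than running them inline.
print(f"sequential: {sequential:.3f}s  threadpool: {threaded:.3f}s")
```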
What does this PR do?
Fixes #166
Speed up reference here
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃