Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: read and parse file contents after having send the response headers #537

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Jeroeny
Copy link

@Jeroeny Jeroeny commented Jan 5, 2022

When running composer update (or other commands that read the registry's package data), composer will fetch the p2/<organization>/<package>.json endpoint for every package. It will first read the response headers, to determine if the contents of the response should be downloaded. The contents of the response will only be downloaded if the local composer cache is outdated (the last-modified of the server is later then that of the client's files).
If the cache is up-to-date, the rest of the response will not be downloaded (making the update command faster).

However, server-side, repman always reads and parses the package data. Performance can possibly be improved by first determining the lastModified and ensuring the headers are output before the reading and parsing of the package data.

Maybe I misinterpreted the code and this was already happening, but it didn't seem like it to me. Also this PR is not yet tested, but I'd like to get the proposal confirmed first.

Docs reference:

https://getcomposer.org/doc/05-repositories.md

Caching is done via the use of If-Modified-Since header, so make sure you return Last-Modified headers and that they are accurate.

@akondas
Copy link
Member

akondas commented Jan 6, 2022

But lastModified is set for each response, I thought it was enough 🤔

Can you provide some speed test (before and after) so we can see how it affects performance?

Generally, I am yes for this PR, but before we start polishing it, I would like to know how much it is worth

@Jeroeny
Copy link
Author

Jeroeny commented Jan 6, 2022

@akondas It was a bit of a challenge to setup locally, but I've got it now. I've done various tests with both parallel batches of requests and sequential. Though not always consistent, It seems the responsetime is up to ~5% less. E.g. when doing 100 requests, the average response time was 113ms vs 119ms. There could be more gain on setups with worse disk performance. Not sure how significant the difference is. But if a composer update checks 100+ packages, it may add up.

@akondas
Copy link
Member

akondas commented Jan 9, 2022

in theory you are right, but I need a little more time to think about it, because the improvment is relatively small,
but good job anyway 😉

@Jeroeny
Copy link
Author

Jeroeny commented Apr 6, 2022

@akondas This might have more performance gains for setups with network-mounted disks. I think the change is worth to try. Could you reconsider the PR? I've updated the code to use \fwrite(), like the other StreamResponse implementations do too.

@akondas
Copy link
Member

akondas commented Apr 6, 2022

Looks ok, we can give it a try. Do you have any benchmarks that can show how much we gain with this PR?

@Jeroeny
Copy link
Author

Jeroeny commented Apr 6, 2022

Looks ok, we can give it a try. Do you have any benchmarks that can show how much we gain with this PR?

I currently do not have a setup to do extensive benchmarks, nor the time to set it up unfortunately.

@Jeroeny Jeroeny force-pushed the stream branch 2 times, most recently from ff73147 to 4a3f401 Compare September 6, 2022 10:28
@Jeroeny
Copy link
Author

Jeroeny commented Sep 6, 2022

@akondas I've discovered that Headers were not outputted before the response content yet. The entire response was still buffered. An additional response header and fwrite call fixed this.
In our setup where files are on a mounted remote network filesystem, this gives significant performance increase. Otherwise calls to Repman can take more than a second.

It could be even more performant if lastModified was retrieved from this->packageQuery->getAllNames (a db query), so that no files would need to be read to calculate the response headers. But that would be for a separate PR.

I hope you can (re)consider the PR.

@Jeroeny Jeroeny force-pushed the stream branch 14 times, most recently from 8a07cf6 to ea47f8a Compare September 7, 2022 08:42
@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

Merging #537 (ab441a3) into master (50fe808) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master     #537   +/-   ##
=========================================
  Coverage     99.16%   99.16%           
- Complexity     1910     1913    +3     
=========================================
  Files           301      301           
  Lines          6072     6092   +20     
=========================================
+ Hits           6021     6041   +20     
  Misses           51       51           
Impacted Files Coverage Δ
src/Controller/RepoController.php 100.00% <100.00%> (ø)
src/Service/Organization/PackageManager.php 93.54% <100.00%> (+0.56%) ⬆️
...curity/PackageScanner/SensioLabsPackageScanner.php 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants