Speed up kart diff #1018
Comments
Here is some rough profiling data from running a large diff, in a relatively recent Kart (I think v0.15.0)
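For anyone wanting to collect comparable numbers, here is a minimal sketch of how a profile like this could be gathered with the stdlib `cProfile` module. The `generate_diff_lines` and `consume` functions are hypothetical stand-ins for a diff generator, not Kart code, and the feature shape is invented for illustration.

```python
import cProfile
import io
import json
import pstats


def generate_diff_lines(n):
    """Hypothetical stand-in for a diff generator: yields one JSON line per feature."""
    for i in range(n):
        yield json.dumps({"type": "feature", "change": {"+": {"fid": i, "name": f"feature-{i}"}}})


def consume(n):
    """Drain the generator, returning the total number of characters produced."""
    return sum(len(line) for line in generate_diff_lines(n))


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


total_chars, report = profile_call(consume, 10_000)
print(report)
```

Sorting by cumulative time makes it easier to see whether the time goes to object access or to JSON encoding, which is the question the rest of this thread turns on.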
More threads are certainly worth trying. We could also try a different JSON library.
orjson has a good reputation for being the quickest.
Could garbage collection/deallocation be a factor if we build up a lot of dynamic objects?
refs #1018 When generating a large (2GB) diff as JSON-Lines, using orjson takes 20-30% less time than the stdlib json module. It may be possible to use it in other places, but note that orjson doesn't support streaming encoding (iterencode), which limits its usefulness where we're trying to stream JSON diffs of huge datasets. This change uses it only for individual features in JSONL diffs, where the lack of iterencode() isn't a concern. orjson is MIT licensed.
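As a rough illustration of the per-feature approach described in that commit message (not the actual Kart change), each feature can be encoded as one complete JSON document and written as a line, so streaming encoding is never needed. The feature dict shape and the `encode_feature_line` / `write_jsonl` names are assumptions made up for this sketch; it falls back to the stdlib `json` module when orjson isn't installed.

```python
import io
import json

try:
    import orjson  # third-party; optional for this sketch
except ImportError:
    orjson = None


def encode_feature_line(feature):
    """Encode one feature as a single newline-terminated JSON line (bytes)."""
    if orjson is not None:
        # orjson.dumps returns bytes; OPT_APPEND_NEWLINE adds the trailing "\n".
        return orjson.dumps(feature, option=orjson.OPT_APPEND_NEWLINE)
    # Stdlib fallback producing compact UTF-8 output.
    text = json.dumps(feature, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") + b"\n"


def write_jsonl(features, out):
    """Write features one line at a time to a binary stream; return the count written."""
    count = 0
    for feature in features:
        out.write(encode_feature_line(feature))
        count += 1
    return count


# Hypothetical feature records, generated lazily so nothing large is held in memory.
features = (
    {"type": "feature", "change": {"+": {"fid": i, "name": f"road-{i}"}}}
    for i in range(3)
)
buf = io.BytesIO()
written = write_jsonl(features, buf)
```

Because each line is a small, self-contained document, the whole diff is still streamed even though every individual encode call is all-at-once.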
More profiling data, from my local Kart after merging that orjson speedup: https://gist.github.com/craigds/2b8c9eac356aae8f61744a3cf8ff4c54
msgspec: https://jcristharif.com/msgspec/benchmarks.html#messagepack-serialization
40% faster? (0.427s cf. 0.799s) And 70% faster for encoding. It looks like msgspec does JSON too, so we could potentially ship just msgspec rather than both it and orjson?
Yes, it looks like it performs similarly to orjson, so we could use it for that too and avoid bundling both with Kart.
Generating a large diff using Kart is quite slow.
i.e. this full diff is 2.1 GB and takes Kart 230s to generate as JSONL, at 9MiB/s.
Describe the solution you'd like
Ideally these diffs could be made 5-10x as fast.
We should start with some profiling, although I suspect the limiting factor is git ODB access, which we've found difficult to speed up in the past. If that's the case then the most obvious speedup would be using multiple threads to fetch objects from the ODB.
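To make the multi-threaded idea concrete, here is a minimal sketch using the stdlib `ThreadPoolExecutor`. The `read_object` callable is a hypothetical placeholder for whatever reads one object from the ODB, and a dict stands in for the object store. Whether threads actually help depends on the real read call releasing the GIL, which is an assumption this sketch does not verify.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_objects(oids, read_object, max_workers=4):
    """Yield (oid, data) pairs in input order, reading objects on a thread pool.

    read_object is a hypothetical callable taking an object ID and returning its bytes.
    """
    oids = list(oids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in submission order, so diff output order is preserved.
        yield from zip(oids, pool.map(read_object, oids))


# A dict stands in for the object database in this example.
fake_odb = {"a1": b"blob-a", "b2": b"blob-b", "c3": b"blob-c"}
results = list(fetch_objects(["c3", "a1", "b2"], fake_odb.__getitem__, max_workers=2))
```

Note that `pool.map` submits every task up front, so for a huge dataset a real implementation would probably need to feed object IDs through in bounded batches to keep memory use flat.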