-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
local_working_copy: spawn snapshot job per directory, filter deleted files per job #5030
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main issue I'm concerned about is that I don't think we can rely on file_states
for correctness; see my comment on the last commit. I'm approving to unblock, you can re-request review later if you want.
comment (non-blocking): There's some very interesting implementation comments in the walk implementation for ignore
. At my last workplace, ignore
was faster than both my custom Rust walk implementation and our previous C++ walk implementation at traversing the filesystem. You might be interested to read through it sometime.
For example: There's this comment about DFS vs BFS on wide directory trees: https://github.com/BurntSushi/ripgrep/blob/79cbe89deb1151e703f4d91b19af9cdcc128b765/crates/ignore/src/walk.rs#L1405-L1408 (relevant if you need to accumulate lots of .gitignore
s).
For example: IIRC it uses a static work-stealing approach with a fixed buffer size (32 items?), unlike rayon
, which dynamically sizes the amount of work to steal (aiming for half the remaining work in the queue). (I don't know if it makes a significant performance difference.)
suggestion: For the future, I recommend you use 30 runs as a default count for benchmarking rather than 10. At my last workplace, I noticed that times didn't tend to converge unless I provided a higher amount. It aligns with the "conventional wisdom" (the stuff I vaguely remember from statistics class) where they suggested you needed 15–30 samples for the kinds of tests they covered.
question (non-blocking): In the commit message for ef6d9ef, is it also supposed to show "Benchmark 1"? Does that disappear because you provided --sort command
? Some of the other commit message benchmark results are missing jj-0
or jj-2
, so I had some trouble following the exact setup. (If it was intentionally or unintentionally removed, I don't think there's a need for you to regenerate those benchmarks.)
Philip brought this up on Discord and was wondering what the impact on Nixpkgs would be; results are as follows:
This seems reasonably consistent after a few runs, so it's 5% slower or somesuch. It's likely the depth and distribution of directories to files aren't quite as "fat" as some other repos; many directories in nixpkgs just have 1 file, even if they are deeply nested. Note that I use watchman for this repo anyway however, so this change would not meaningfully impact my usage of Nixpkgs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (non-blocking): In the commit message for ef6d9ef, is it also supposed to show "Benchmark 1"? Does that disappear because you provided
--sort command
?
I ran 3 benchmarks and split into two commit descriptions. That's why.
Yeah, the result in nixpkgs is bad. Appears that we need both per-directory |
Sounds good; a simple cutoff for file count may be enough (100% speculation). Note that like everything this is probably filesystem sensitive; I'll get around to testing on XFS sometime this weekend. The above results are ext4. |
ef6d9ef
to
6f9dc51
Compare
Uploaded this version. It's pretty okay on my Linux machine, ext4. The last patch is new, which I originally planned to send as a follow up PR. I'll probably split the first half instead. |
6f9dc51
to
ae0f8a8
Compare
Updated the benchmark. It now includes If needed, I'll extract the first 3 patches to separate PR. They just move codes around. The last patch wasn't included in the original PR. It can also be split to new PR. |
ae0f8a8
to
11ebf1a
Compare
… and state visit_directory() is big. Let's make it fit in one screen.
…files .try_for_each_with() wasn't needed because mpsc::Sender can be Sync from Rust 1.72.0.
It's annoying that rustfmt indents a large code block depending on the length of parameters and method calls needed to configure the parallel iterator.
…threshold This change basically means two things: a. a directory scan isn't split into too many small jobs, and b. a directory scan isn't blocked by recursive visit_directory() calls. Before, small jobs were created at each recursion depth, so there were silent time slice before these jobs started producing work. I don't know if this mitigates the issue martinvonz#4508, but it's slightly faster on my Linux machine. matcher.visit(dir) is moved to caller because it's silly to spawn an empty job. TreeState::snapshot() already checks that for the root path. Benchmark: 1. original 2. per-directory spawn (this patch) 3. per-directory deleted files (omitted) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 764.9 ms ± 16.7 ms [User: 3274.7 ms, System: 2183.3 ms] Range (min … max): 731.9 ms … 814.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Relative speed comparison 1.89 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/gecko-dev debug snapshot 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 268.2 ms ± 11.3 ms [User: 636.6 ms, System: 518.5 ms] Range (min … max): 247.5 ms … 295.2 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Relative speed comparison 1.40 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/linux debug snapshot 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 201.0 ms ± 8.5 ms [User: 929.3 ms, System: 917.6 ms] Range (min … max): 170.3 ms … 218.5 ms 30 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Relative speed comparison 1.24 ± 0.06 target/release-with-debug/jj-1 -R ~/mirrors/nixpkgs debug snapshot 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 1: target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.3 ms ± 1.1 ms [User: 40.5 ms, System: 39.4 ms] Range (min … max): 28.3 ms … 35.7 ms 50 runs Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Relative speed comparison 1.05 ± 0.05 target/release-with-debug/jj-1 -R ~/mirrors/git debug snapshot 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot ``` - CPU: 8-core AMD Ryzen 7 PRO 4750U with Radeon Graphics (-MT MCP-) - speed/min/max: 1600/1400/1700 MHz Kernel: 6.11.10-amd64 x86_64 - Filesystem: ext4
This greatly reduces the amount of paths to be sent over the channel and the strings to be hashed. Benchmark: 1. original (omitted) 2. per-directory spawn (previous patch) 3. per-directory deleted files (this patch) 4. shorter path comparison (omitted) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 710.7 ms ± 9.1 ms [User: 3070.7 ms, System: 2142.6 ms] Range (min … max): 695.9 ms … 740.1 ms 30 runs Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 480.1 ms ± 8.8 ms [User: 3190.5 ms, System: 2127.2 ms] Range (min … max): 471.2 ms … 509.8 ms 30 runs Relative speed comparison 1.76 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/gecko-dev debug snapshot 1.19 ± 0.03 target/release-with-debug/jj-3 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 242.3 ms ± 3.3 ms [User: 656.8 ms, System: 538.0 ms] Range (min … max): 236.9 ms … 252.3 ms 30 runs Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 204.2 ms ± 3.0 ms [User: 667.3 ms, System: 545.6 ms] Range (min … max): 197.1 ms … 209.2 ms 30 runs Relative speed comparison 1.27 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/linux debug snapshot 1.07 ± 0.02 target/release-with-debug/jj-3 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 190.7 ms ± 4.1 ms [User: 859.3 ms, System: 881.1 ms] Range (min … max): 184.6 ms … 202.4 ms 30 runs Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 173.3 ms ± 6.7 ms [User: 899.4 ms, System: 889.0 ms] Range (min … max): 166.5 ms … 197.9 ms 30 runs Relative speed comparison 1.18 ± 0.03 target/release-with-debug/jj-2 -R ~/mirrors/nixpkgs debug snapshot 1.07 ± 0.04 target/release-with-debug/jj-3 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 2: target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot Time (mean ± σ): 30.6 ms ± 1.1 ms [User: 33.8 ms, System: 39.0 ms] Range (min … max): 29.0 ms … 35.0 ms 50 runs Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/git debug snapshot Time (mean ± σ): 28.8 ms ± 1.0 ms [User: 33.0 ms, System: 37.6 ms] Range (min … max): 26.8 ms … 31.3 ms 50 runs Relative speed comparison 1.06 ± 0.05 target/release-with-debug/jj-2 -R ~/mirrors/git debug snapshot 1.00 target/release-with-debug/jj-3 -R ~/mirrors/git debug snapshot ```
Since all entries in filtered file states share the same directory prefix, we don't need to compare full file paths. The added functions take (path, name) instead of (path, sub_path) because the comparison can be slightly faster if the name is guaranteed to be a single path component. Benchmark: 1. original (omitted) 2. per-directory spawn (omitted) 3. per-directory deleted files (previous patch) 4. shorter path comparison (this patch) gecko-dev (~357k files, ~25k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 480.1 ms ± 8.8 ms [User: 3190.5 ms, System: 2127.2 ms] Range (min … max): 471.2 ms … 509.8 ms 30 runs Benchmark 4: target/release-with-debug/jj-4 -R ~/mirrors/gecko-dev debug snapshot Time (mean ± σ): 404.0 ms ± 4.4 ms [User: 1933.4 ms, System: 2148.8 ms] Range (min … max): 396.4 ms … 416.9 ms 30 runs Relative speed comparison 1.19 ± 0.03 target/release-with-debug/jj-3 -R ~/mirrors/gecko-dev debug snapshot 1.00 target/release-with-debug/jj-4 -R ~/mirrors/gecko-dev debug snapshot ``` linux (~87k files, ~6k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 204.2 ms ± 3.0 ms [User: 667.3 ms, System: 545.6 ms] Range (min … max): 197.1 ms … 209.2 ms 30 runs Benchmark 4: target/release-with-debug/jj-4 -R ~/mirrors/linux debug snapshot Time (mean ± σ): 191.3 ms ± 3.3 ms [User: 467.4 ms, System: 542.2 ms] Range (min … max): 186.1 ms … 200.6 ms 30 runs Relative speed comparison 1.07 ± 0.02 target/release-with-debug/jj-3 -R ~/mirrors/linux debug snapshot 1.00 target/release-with-debug/jj-4 -R ~/mirrors/linux debug snapshot ``` nixpkgs (~45k files, ~31k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 3 --runs 30 .. Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 173.3 ms ± 6.7 ms [User: 899.4 ms, System: 889.0 ms] Range (min … max): 166.5 ms … 197.9 ms 30 runs Benchmark 4: target/release-with-debug/jj-4 -R ~/mirrors/nixpkgs debug snapshot Time (mean ± σ): 161.7 ms ± 2.5 ms [User: 739.1 ms, System: 881.7 ms] Range (min … max): 156.5 ms … 166.4 ms 30 runs Relative speed comparison 1.07 ± 0.04 target/release-with-debug/jj-3 -R ~/mirrors/nixpkgs debug snapshot 1.00 target/release-with-debug/jj-4 -R ~/mirrors/nixpkgs debug snapshot ``` git (~4.5k files, 0.2k dirs) ``` % JJ_CONFIG=/dev/null hyperfine --sort command --warmup 30 --runs 50 .. Benchmark 3: target/release-with-debug/jj-3 -R ~/mirrors/git debug snapshot Time (mean ± σ): 28.8 ms ± 1.0 ms [User: 33.0 ms, System: 37.6 ms] Range (min … max): 26.8 ms … 31.3 ms 50 runs Benchmark 4: target/release-with-debug/jj-4 -R ~/mirrors/git debug snapshot Time (mean ± σ): 28.8 ms ± 1.9 ms [User: 30.3 ms, System: 36.5 ms] Range (min … max): 26.0 ms … 39.2 ms 50 runs Relative speed comparison 1.00 target/release-with-debug/jj-3 -R ~/mirrors/git debug snapshot 1.00 ± 0.08 target/release-with-debug/jj-4 -R ~/mirrors/git debug snapshot ```
11ebf1a
to
7401006
Compare
Checklist
If applicable:
CHANGELOG.md
Benchmark:
gecko-dev (~357k files, ~25k dirs)
linux (~87k files, ~6k dirs)
nixpkgs (~45k files, ~31k dirs)
git (~4.5k files, 0.2k dirs)