-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazify more work in builtins targets #122703
Conversation
These commits modify compiler targets. |
This comment has been minimized.
This comment has been minimized.
0.2ms is not a lot, even if you amortize it across hundreds of rustc invocations, and this does make the code more complicated. Is this really worth it? |
Actually, now that I remeasured it, the difference is big than I previously though. With stage1 the minimum wall-time measured is now the same, while before with nightly (so way more aggressive optimization) the minimum was never below 0.4ms; therefore this PR has the potential to nearly eliminate the cost of loading all the well known cfgs. So yes, I definitively think it's worth it. Note that the measurement were done on a laptop where the noise is non-negligible, that's why I have 50 rounds of warmup, so the wall-time has to be taken with some precaution; but what's clear even with the noise is the clear decrease in wall-time.
Regarding the code complexity, the vast majority of it comes from adapting (ie adding Adding new targets shouldn't be more difficult than before. |
I dislike this for the same reason as #122207, but at much larger scale. |
I understand why you don't like #122207, I also don't like it much, I mainly proposed that one for discussion and perf test. However I think this PR is quite different, it goes into the direction of static target definitions, which IMO is sufficient in it self, even if it doesn't bring any perf improvements. @petrochenkov Could you elaborate on why you don't like this PR? EDIT: To elaborate a bit more, I always found it weird that the target specifications were not immutable, for the most part they are, except for some Apple shenanigans; and if they were and const-eval was a bit more permissive we could have them in a simple EDIT2: Yes, I realized I'm going into the static discussions, even through I said I wouldn't; but I really think this goes into a good direction by permitting more data to be static. |
This comment was marked as resolved.
This comment was marked as resolved.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Lazify more work in builtins targets This PR make some fields in `Target` and `TargetOptions` lazifyable. This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields. In paticular this PR introduce a `MaybeLazy<T>` struct to support a borrowed, owned and lazy state. The fields that are changed are: - `LinkArgs` fields (access to env + allocate): ~54 023 236 ins[^1] -> ~53 478 072 ins - `CrtObjects` fields (allocate): ~53 478 072 ins -> ~53 127 832 ins - `llvm_name` field (access to env + allocate): ~53 127 832 ins -> ~53 057 110 ins This brings around -1 000 000 ins*tructions* and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉. *This PR is also a step further to completely `static`-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.* [^1]: This is the total number of instructions as measured with `cachegrind` for the entire `rustc` invocation on a cargo --lib hello world, variance was really low (<15 000 ins). In the scale of the check-cfg feature, last time I measured `CheckCfg::fill_well_known` was around ~2.8 millions ins. r? `@petrochenkov`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c3871b5): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 676.654s -> 674.986s (-0.25%) |
rust-lang/cargo#13571 has been merged, the PR has been rebased and the latest perf run shows some clear improvements across multiple benchmarks 🎉. @rustbot labels -S-blocked +S-waiting-on-review |
@Urgau Could you give a link to the PR with perf regressions that first enables cfg checking in a way observable to perf checking on merge? |
Lazify more work in builtins targets This PR make some fields in `Target` and `TargetOptions` lazy. This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields. In particular this PR introduce a `MaybeLazy<T>` struct to support a borrowed, owned and lazy state. ~~The fields that are changed are:~~ - ~~`LinkArgs` fields (access to env + allocate): 54 023 236 ins[^1] -> 53 478 072 ins~~ - ~~`CrtObjects` fields (allocate): 53 478 072 ins -> 53 127 832 ins~~ - ~~`llvm_name` field (access to env + allocate): 53 127 832 ins -> 53 057 110 ins~~ ~~This brings around -1 000 000 ins*tructions* and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.~~ See the latest perf run for the actual improvements, rust-lang#122703 (comment). *This PR is also a step further to completely `static`-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.* [^1]: This is the total number of instructions as measured with `cachegrind` for the entire `rustc` invocation on a cargo --lib hello world, variance was really low (<15 000 ins). In the scale of the check-cfg feature, last time I measured `CheckCfg::fill_well_known` was around ~2.8 millions ins. r? `@petrochenkov`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
Indeed. This has now been done for CRT objects and link args. (pushed a separate commits, for easier review and maintenance for me) The latest (and most accurate) perf run doesn't show as much perf improvement as the first one, I tried to improve more but it didn't lead anywhere. (it's worth noting that they are not directly comparable given that they are more than a month apart - maybe worth redoing a perf run without the latest changes?) @rustbot ready |
Let's maybe split these changes and try to benchmark and land them separately. |
by adding `TargetOptions::link_args_list` to handle multi link args
Handle Apple targets
Lazify `Target::llvm_target` field This PR lazify the `Target::llvm_target` field by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state). Split from rust-lang#122703 r? `@petrochenkov`
Lazify CRT objects fields initilization This PR lazify the CRT objects by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state). Split from rust-lang#122703 r? `@petrochenkov`
☔ The latest upstream changes (presumably #126907) made this pull request unmergeable. Please resolve the merge conflicts. |
Lazify `TargetOptions::*link_args` fields This PR lazify the link args by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state). Split from rust-lang#122703 r? `@petrochenkov`
#127992 is closed so closing this as well. |
This PR make some fields in
Target
andTargetOptions
lazy.This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields.
In particular this PR introduce a
MaybeLazy<T>
struct to support a borrowed, owned and lazy state.The fields that are changed are:LinkArgs
fields (access to env + allocate): 54 023 236 ins1 -> 53 478 072 insCrtObjects
fields (allocate): 53 478 072 ins -> 53 127 832 insllvm_name
field (access to env + allocate): 53 127 832 ins -> 53 057 110 insThis brings around -1 000 000 instructions and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.See the latest perf run for the actual improvements, #122703 (comment).
This PR is also a step further to completely
static
-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.r? @petrochenkov
Footnotes
This is the total number of instructions as measured with
cachegrind
for the entirerustc
invocation on a cargo --lib hello world, variance was really low (<15 000 ins).In the scale of the check-cfg feature, last time I measured
CheckCfg::fill_well_known
was around ~2.8 millions ins. ↩