Deleting newlines from input makes it build 3% faster end-to-end #137681
Measuring on Intel(R) Xeon(R) CPU E5645 @ 2.40GHz:
Also, this kinda goes without saying: I don't think this should take that long in the first place, at which point to hell with a 3% difference, but
#include <cstdint>
struct xyz { std::uint8_t x, y, z; };
static constexpr int GRID_SIZE = 100;
// 100.cpp contains the generated braced initializer list for the whole array.
static constexpr xyz GRID_GILBERT[GRID_SIZE * GRID_SIZE * GRID_SIZE] =
#include "100.cpp"
;
int main() {}

$ time c++ bugowcy.cpp -o bugowcy
real 0m19.012s
user 0m18.450s
sys 0m0.559s
$ time g++ bugowcy.cpp -o bugowcy
real 0m10.090s
user 0m8.900s
sys 0m0.697s

and rustc is not gonna get much better than clang anyway (hence
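(For anyone trying to reproduce the C++ comparison: it needs the generated 100.cpp initializer list, which isn't attached here. A hypothetical Rust helper that writes a file of the right shape could look like the following; the values are placeholders in plain nested-loop order, not the actual Gilbert-curve data.)

// Hypothetical generator for a ~1 MB "100.cpp" initializer list usable by the
// C++ repro above: one "{x, y, z}," entry per line, 100^3 entries in total.
use std::fmt::Write;

fn main() {
    let mut out = String::from("{\n");
    for x in 0..100u8 {
        for y in 0..100u8 {
            for z in 0..100u8 {
                // Placeholder values; the original data is a space-filling curve.
                writeln!(out, "{{{}, {}, {}}},", x, y, z).unwrap();
            }
        }
    }
    out.push('}');
    std::fs::write("100.cpp", out).unwrap();
}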
Did you run each compilation once, or many times and take the average? The difference between those cases is so small that it may well be noise. Modern CPUs with dynamic frequency scaling in particular can easily show performance differences of several percent if you run a benchmark several times in succession.
I ran this a few times on the i7 in vivo and noticed the differences for newline/space there, but ran the in-vitro test cases once (admittedly they do look pretty close to variance). See the comment for 10x runs of each on the Xeon, which I trust to produce usable perf data.
It could be that the extra time is spent searching for and storing the positions of newlines in the source map. The source map is used for debug info and for formatting error messages. Your code is a bit of a worst-case scenario for source-map overhead: for most code the vast majority of the time is spent checking and compiling all the functions, but if you have a single large array stored in a static there is no function-compilation work to do, and all the time is spent parsing the static and lowering it to an object file.
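(For intuition, here is a minimal sketch of that per-newline bookkeeping, assuming a much-simplified model; this is not rustc's actual SourceMap code, and newline_offsets is a made-up name.)

// Illustration only: record the byte offset of every newline so that byte
// positions can later be mapped to line/column for diagnostics and debug
// info. A megabyte-long literal on one line yields zero entries; the same
// data split across a million lines yields a million entries to find and store.
fn newline_offsets(src: &str) -> Vec<usize> {
    src.bytes()
        .enumerate()
        .filter(|&(_, b)| b == b'\n')
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let one_line = "[(0,0,0), (1,0,0), (1,1,0)]";
    let many_lines = "[(0,0,0),\n(1,0,0),\n(1,1,0)]";
    assert_eq!(newline_offsets(one_line).len(), 0);
    assert_eq!(newline_offsets(many_lines).len(), 2);
}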
In the live code I extracted this from, the build takes 39s (incl. the 32s for the array alone), so the difference is still noticeable in vivo and doesn't seem to get compensated away by having more other work to do. (I'm pretty sure most of the other 7s can be attributed to a
Even your Xeon timings look to have high variance, so here's a run from a quieter machine. I'm not personally seeing the 3% difference.

> hyperfine -w1 -r10 -N 'rustc bugowcy.rs' 'rustc bugowcys.rs' 'rustc bugowcyd.rs'
Benchmark 1: rustc bugowcy.rs
Time (mean ± σ): 20.345 s ± 0.084 s [User: 16.190 s, System: 4.214 s]
Range (min … max): 20.243 s … 20.459 s 10 runs
Benchmark 2: rustc bugowcys.rs
Time (mean ± σ): 20.353 s ± 0.079 s [User: 16.213 s, System: 4.204 s]
Range (min … max): 20.264 s … 20.543 s 10 runs
Benchmark 3: rustc bugowcyd.rs
Time (mean ± σ): 20.364 s ± 0.085 s [User: 16.164 s, System: 4.266 s]
Range (min … max): 20.206 s … 20.472 s 10 runs
Summary
rustc bugowcy.rs ran
1.00 ± 0.01 times faster than rustc bugowcys.rs
1.00 ± 0.01 times faster than rustc bugowcyd.rs
So either it's not real, or it's real but only on older microarchitectures, and in either case it only applies to deeply pathological input. Whichever way it goes, it's not meaningfully actionable. Thanks for testing.
(The actual data in my use case is indices along a 100³ Gilbert curve, but that isn't relevant here.)
(this is the code used in #137678)
Compare to
(this is the code used in #137680, it's the same size as the original)
And to
(1,000,002 bytes smaller; a hypothetical sketch of the three variants' shape is shown below, after the measurements)
100s vs 100: ~3% faster
100d vs 100s: within variance, I think
Measurements from Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
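(The attached Rust files from the linked issues aren't reproduced above; my reading of the thread is that the three variants differ only in whitespace inside one huge array literal. A hypothetical sketch of their shape, with placeholder names, element type, and values, and with my guesses at what the suffixes mean:)

// Hypothetical shapes only; names, element type, and values are placeholders,
// not the contents of the actual attached files, and the real arrays hold a
// million entries rather than four.

// "100" / bugowcy.rs style (my guess): one element per line, a million newlines.
static GRID_NEWLINES: [(u8, u8, u8); 4] = [
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 0),
];

// "100s" / bugowcys.rs style (my guess): newlines replaced by spaces, same byte size.
static GRID_SPACES: [(u8, u8, u8); 4] = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)];

// "100d" / bugowcyd.rs style (my guess): redundant whitespace dropped, smaller file.
static GRID_DENSE: [(u8, u8, u8); 4] = [(0,0,0),(1,0,0),(1,1,0),(0,1,0)];

fn main() {
    // Same data in all three; only the source-level whitespace differs.
    assert_eq!(GRID_NEWLINES, GRID_SPACES);
    assert_eq!(GRID_SPACES, GRID_DENSE);
}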
Meta