Skip to content

Deleting newlines from input makes it build 3% faster end-to-end #137681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nabijaczleweli opened this issue Feb 26, 2025 · 7 comments
Closed

Deleting newlines from input makes it build 3% faster end-to-end #137681

nabijaczleweli opened this issue Feb 26, 2025 · 7 comments
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times.

Comments

@nabijaczleweli
Copy link
Contributor

(The actual data in my use-case is indices in a 100³ gilbert curve but isn't relevant here.)

$ { echo '['; seq $(( 100 * 100 * 100 )) | sed 's/.*/(0,0,0),/'; echo ']'; } > 100.rs
$ tr    '\n' ' ' < 100.rs > 100s.rs
$ tr -d '\n'     < 100.rs > 100d.rs
const GRID_SIZE: usize = 100;
static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100.rs");
fn main() {}

(this is the code used in #137678)

$ time rustc bugowcy.rs
warning: constant `GRID_SIZE` is never used
 --> bugowcy.rs:1:7
  |
1 | const GRID_SIZE: usize = 100;
  |       ^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: static `GRID_GILBERT` is never used
 --> bugowcy.rs:2:8
  |
2 | static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100.rs");
  |        ^^^^^^^^^^^^

warning: 2 warnings emitted


real    0m32.474s
user    0m0.000s
sys     0m0.015s

Compare to

const GRID_SIZE: usize = 100;
static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100s.rs");
fn main() {}

(this is the code used in #137680, it's the same size as the original)

$ time rustc bugowcy.rs
warning: constant `GRID_SIZE` is never used
 --> bugowcy.rs:1:7
  |
1 | const GRID_SIZE: usize = 100;
  |       ^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: static `GRID_GILBERT` is never used
 --> bugowcy.rs:2:8
  |
2 | static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100s.rs");
  |        ^^^^^^^^^^^^

warning: 2 warnings emitted


real    0m31.846s
user    0m0.000s
sys     0m0.000s

And to

const GRID_SIZE: usize = 100;
static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100d.rs");
fn main() {}

(1`000`002 bytes smaller)

$ time rustc bugowcy.rs
warning: constant `GRID_SIZE` is never used
 --> bugowcy.rs:1:7
  |
1 | const GRID_SIZE: usize = 100;
  |       ^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: static `GRID_GILBERT` is never used
 --> bugowcy.rs:2:8
  |
2 | static GRID_GILBERT: [(u8,u8,u8); GRID_SIZE*GRID_SIZE*GRID_SIZE] = include!("100d.rs");
  |        ^^^^^^^^^^^^

warning: 2 warnings emitted


real    0m31.938s
user    0m0.000s
sys     0m0.031s

100s vs 100: ~3% faster
100d vs 100s: within variance I think

Measurements from Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

Meta

$ rustc --version --verbose
rustc 1.84.1 (e71f9a9a9 2025-01-27)
binary: rustc
commit-hash: e71f9a9a98b0faf423844bf0ba7438f29dc27d58
commit-date: 2025-01-27
host: x86_64-pc-windows-gnu
release: 1.84.1
LLVM version: 19.1.5
@nabijaczleweli nabijaczleweli added the C-bug Category: This is a bug. label Feb 26, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 26, 2025
@fmease fmease added the I-compiletime Issue: Problems and improvements with respect to compile times. label Feb 26, 2025
@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Feb 26, 2025

Measuring on Intel(R) Xeon(R) CPU E5645 @ 2.40GHz:

$ hyperfine -N 'rustc bugowcy.rs' 'rustc bugowcys.rs' 'rustc bugowcyd.rs'
Benchmark 1: rustc bugowcy.rs
  Time (mean ± σ):     26.237 s ±  1.904 s    [User: 22.001 s, System: 3.893 s]
  Range (min … max):   24.828 s … 31.148 s    10 runs

Benchmark 2: rustc bugowcys.rs
  Time (mean ± σ):     25.673 s ±  0.671 s    [User: 21.825 s, System: 3.861 s]
  Range (min … max):   24.954 s … 26.690 s    10 runs

Benchmark 3: rustc bugowcyd.rs
  Time (mean ± σ):     24.958 s ±  0.311 s    [User: 21.480 s, System: 3.492 s]
  Range (min … max):   24.548 s … 25.704 s    10 runs

Summary
  'rustc bugowcyd.rs' ran
    1.03 ± 0.03 times faster than 'rustc bugowcys.rs'
    1.05 ± 0.08 times faster than 'rustc bugowcy.rs'

Also, this kinda goes without saying, I don't think this should take that long, and to hell with a 3% difference then, but

{ echo '{'; seq $(( 100 * 100 * 100 )) | sed 's/.*/{0,0,0},/'; echo '}'; } > 100.cpp
#include <cstdint>
struct xyz { std::uint8_t x, y, z; };
static const constexpr GRID_SIZE = 100;
static const constexpr GRID_GILBERT[GRID_SIZE * GRID_SIZE * GRID_SIZE] =
#include "100.cpp"
;
int main() {}
$ time c++ bugowcy.cpp -o bugowcy

real    0m19.012s
user    0m18.450s
sys     0m0.559s
$ time g++ bugowcy.cpp -o bugowcy

real    0m10.090s
user    0m8.900s
sys     0m0.697s

and rustc is not gonna get much better than clang anyway (hence #embed &c.). I suppose I'll just dump this as bytes and include_typed!((u8, u8, u8), ...) it.

@bjorn3
Copy link
Member

bjorn3 commented Feb 26, 2025

Did you run each compilation once or many times taking the average? The difference between those cases is so small that it may well be noise. Especially modern cpu's with dynamic frequency scaling can easily have performance differ by several percent if you run a benchmark several times in succession.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Feb 26, 2025

I ran this a few times on the i7 in vivo and noticed the differences for newline/space there, but ran the in-vitro test-cases once (admittedly they do look pretty close to variance). See comment for 10x runs of each on the Xeon which I trust to produce useable perf data.

@bjorn3
Copy link
Member

bjorn3 commented Feb 26, 2025

It could be that the extra time is spent searching for and storing the positions of newlines in the source map. The source map is used for debug info and formatting error messages. Your code is a bit of a worst case scenario for overhead of the source map. For most code the vast majority of the is spent checking and compiling all functions, but if you have a single large array stored in a static there is no work being done on compiling functions and all time is spent parsing and lowering the static to an object file.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Feb 26, 2025

In the live code I extracted this from the build takes 39s (incl. the 32s for the array alone) so the difference is still noticeable in vivo and doesn't seem to get compensated away with more stuff to do.

(I'm pretty sure most of the other 7s can be attributed to a groups_accumulator: Box<[[[Vec<u32>; GRID_SIZE]; GRID_SIZE]; GRID_SIZE]>, member but I haven't bisected this.)

@lqd
Copy link
Member

lqd commented Feb 26, 2025

Even your Xeon timings look to have high variance, so here's a run from a quieter machine. I'm not personally seeing the 3% difference.

> hyperfine -w1 -r10 -N 'rustc bugowcy.rs' 'rustc bugowcys.rs' 'rustc bugowcyd.rs'

Benchmark 1: rustc bugowcy.rs
  Time (mean ± σ):     20.345 s ±  0.084 s    [User: 16.190 s, System: 4.214 s]
  Range (min … max):   20.243 s … 20.459 s    10 runs

Benchmark 2: rustc bugowcys.rs
  Time (mean ± σ):     20.353 s ±  0.079 s    [User: 16.213 s, System: 4.204 s]
  Range (min … max):   20.264 s … 20.543 s    10 runs

Benchmark 3: rustc bugowcyd.rs
  Time (mean ± σ):     20.364 s ±  0.085 s    [User: 16.164 s, System: 4.266 s]
  Range (min … max):   20.206 s … 20.472 s    10 runs

Summary
  rustc bugowcy.rs ran
    1.00 ± 0.01 times faster than rustc bugowcys.rs
    1.00 ± 0.01 times faster than rustc bugowcyd.rs

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Feb 26, 2025

So either it's not real or it's real but only on older microarchitectures, and only applies to deeply pathological input in either case. Whichever way it goes it's not meaningfully actionable. Thanks for testing.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times.
Projects
None yet
Development

No branches or pull requests

6 participants