Deleting newlines from input makes it build 3% faster end-to-end #137681
Measuring on Intel(R) Xeon(R) CPU E5645 @ 2.40GHz:
Also, this kinda goes without saying: I don't think this should take that long in the first place, at which point to hell with a 3% difference, but
#include <cstdint>
struct xyz { std::uint8_t x, y, z; };
static constexpr int GRID_SIZE = 100;
// 100.cpp contains the generated braced initializer list for the whole array.
static constexpr xyz GRID_GILBERT[GRID_SIZE * GRID_SIZE * GRID_SIZE] =
#include "100.cpp"
;
int main() {}

$ time c++ bugowcy.cpp -o bugowcy
real 0m19.012s
user 0m18.450s
sys 0m0.559s
$ time g++ bugowcy.cpp -o bugowcy
real 0m10.090s
user 0m8.900s
sys 0m0.697s

and rustc is not gonna get much better than clang anyway (hence
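(For anyone trying to reproduce the C++ comparison: it needs the generated 100.cpp initializer list, which isn't attached here. A hypothetical Rust helper that writes a file of the right shape could look like the following; the values are placeholders in plain nested-loop order, not the actual Gilbert-curve data.)

// Hypothetical generator for a ~1 MB "100.cpp" initializer list usable by the
// C++ repro above: one "{x, y, z}," entry per line, 100^3 entries in total.
use std::fmt::Write;

fn main() {
    let mut out = String::from("{\n");
    for x in 0..100u8 {
        for y in 0..100u8 {
            for z in 0..100u8 {
                // Placeholder values; the original data is a space-filling curve.
                writeln!(out, "{{{}, {}, {}}},", x, y, z).unwrap();
            }
        }
    }
    out.push('}');
    std::fs::write("100.cpp", out).unwrap();
}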
Did you run each compilation once, or many times and take the average? The difference between those cases is so small that it may well be noise. Modern CPUs with dynamic frequency scaling in particular can easily show performance differences of several percent if you run a benchmark several times in succession.
I ran this a few times on the i7 in vivo and noticed the differences for newline/space there, but ran the in-vitro test cases once (admittedly they do look pretty close to variance). See the comment for 10x runs of each on the Xeon, which I trust to produce usable perf data.
It could be that the extra time is spent searching for and storing the positions of newlines in the source map. The source map is used for debug info and for formatting error messages. Your code is a bit of a worst-case scenario for source-map overhead: for most code the vast majority of the time is spent checking and compiling all the functions, but if you have a single large array stored in a static there is no function-compilation work to do, and all the time is spent parsing the static and lowering it to an object file.
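(For intuition, here is a minimal sketch of that per-newline bookkeeping, assuming a much-simplified model; this is not rustc's actual SourceMap code, and newline_offsets is a made-up name.)

// Illustration only: record the byte offset of every newline so that byte
// positions can later be mapped to line/column for diagnostics and debug
// info. A megabyte-long literal on one line yields zero entries; the same
// data split across a million lines yields a million entries to find and store.
fn newline_offsets(src: &str) -> Vec<usize> {
    src.bytes()
        .enumerate()
        .filter(|&(_, b)| b == b'\n')
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let one_line = "[(0,0,0), (1,0,0), (1,1,0)]";
    let many_lines = "[(0,0,0),\n(1,0,0),\n(1,1,0)]";
    assert_eq!(newline_offsets(one_line).len(), 0);
    assert_eq!(newline_offsets(many_lines).len(), 2);
}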
In the live code I extracted this from, the build takes 39s (incl. the 32s for the array alone), so the difference is still noticeable in vivo and doesn't seem to get compensated away by having more other work to do. (I'm pretty sure most of the other 7s can be attributed to a
Even your Xeon timings look to have high variance, so here's a run from a quieter machine. I'm not personally seeing the 3% difference.

> hyperfine -w1 -r10 -N 'rustc bugowcy.rs' 'rustc bugowcys.rs' 'rustc bugowcyd.rs'
Benchmark 1: rustc bugowcy.rs
Time (mean ± σ): 20.345 s ± 0.084 s [User: 16.190 s, System: 4.214 s]
Range (min … max): 20.243 s … 20.459 s 10 runs
Benchmark 2: rustc bugowcys.rs
Time (mean ± σ): 20.353 s ± 0.079 s [User: 16.213 s, System: 4.204 s]
Range (min … max): 20.264 s … 20.543 s 10 runs
Benchmark 3: rustc bugowcyd.rs
Time (mean ± σ): 20.364 s ± 0.085 s [User: 16.164 s, System: 4.266 s]
Range (min … max): 20.206 s … 20.472 s 10 runs
Summary
rustc bugowcy.rs ran
1.00 ± 0.01 times faster than rustc bugowcys.rs
1.00 ± 0.01 times faster than rustc bugowcyd.rs
So either it's not real, or it's real but only on older microarchitectures, and in either case it only applies to deeply pathological input. Whichever way it goes, it's not meaningfully actionable. Thanks for testing.
(The actual data in my use case is indices along a 100³ Gilbert curve, but that isn't relevant here.)
(this is the code used in #137678)
Compare to
(this is the code used in #137680, it's the same size as the original)
And to
(1,000,002 bytes smaller; a hypothetical sketch of the three variants' shape is shown below, after the measurements)
100s vs 100: ~3% faster
100d vs 100s: within variance, I think
Measurements from Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
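(The attached Rust files from the linked issues aren't reproduced above; my reading of the thread is that the three variants differ only in whitespace inside one huge array literal. A hypothetical sketch of their shape, with placeholder names, element type, and values, and with my guesses at what the suffixes mean:)

// Hypothetical shapes only; names, element type, and values are placeholders,
// not the contents of the actual attached files, and the real arrays hold a
// million entries rather than four.

// "100" / bugowcy.rs style (my guess): one element per line, a million newlines.
static GRID_NEWLINES: [(u8, u8, u8); 4] = [
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 0),
];

// "100s" / bugowcys.rs style (my guess): newlines replaced by spaces, same byte size.
static GRID_SPACES: [(u8, u8, u8); 4] = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)];

// "100d" / bugowcyd.rs style (my guess): redundant whitespace dropped, smaller file.
static GRID_DENSE: [(u8, u8, u8); 4] = [(0,0,0),(1,0,0),(1,1,0),(0,1,0)];

fn main() {
    // Same data in all three; only the source-level whitespace differs.
    assert_eq!(GRID_NEWLINES, GRID_SPACES);
    assert_eq!(GRID_SPACES, GRID_DENSE);
}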
Meta