
cmd/compile: consider speeding up compile of large ast package in microsoft/typescript-go #73044


Open
thepudds opened this issue Mar 25, 2025 · 6 comments
Labels: compiler/runtime, Implementation, NeedsInvestigation, ToolSpeed

@thepudds
Contributor

Go version

go version go1.24.1 linux/amd64

What did you do?

Compile the ast package of the Go-based TypeScript compiler (https://github.com/microsoft/typescript-go).

$ git clone https://github.com/microsoft/typescript-go && cd typescript-go
$ go1.24.1 build -a -v -gcflags='-bench=base.bench' github.com/microsoft/typescript-go/internal/ast

What did you see happen?

If the changes in #72815 land, it looks like the large typescript-go/internal/ast package then becomes the long pole for the overall typescript-go compilation.

There might be some opportunities within cmd/compile or cmd/go to speed it up. There might also be some workarounds that could be used today, which might in turn hint towards possible changes within the Go toolchain.

To get some starting data points, I tried compiling the typescript-go ast package with Go 1.24.1 to see what might speed it up:

                     compile time   max RSS

baseline:               8.812 sec   1,499 MiB    # Go 1.24.1
add -c=16:              6.651 sec   1,582 MiB    # -gcflags='-c=16'
add disable-gc:         6.255 sec   3,834 MiB    # GOGC=off, -gcflags='-c=16'
add disable-inline:     4.312 sec   2,747 MiB    # GOGC=off, -gcflags='-c=16 -l'
add disable-other-opt:  3.808 sec   2,441 MiB    # GOGC=off, -gcflags='-c=16 -l -N'

(The adjustments here are cumulative. The reported seconds are just for compiling the ast package itself; they were measured via -gcflags='-bench=<file>.txt' and then summarized with benchstat. Peak RSS was measured via time.)
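
To make the cumulative effect easier to read, here is a small sketch that recomputes the relative changes from the rounded numbers in the table above. The type and function names (step, steps, percentChange) are made up for illustration, and the results can differ slightly from benchstat's own percentages, since benchstat works from the unrounded medians.

```go
package main

import "fmt"

// step holds one row of the cumulative results table above.
type step struct {
	name    string
	seconds float64 // compile time for the ast package, in seconds
	rssMiB  float64 // peak RSS, in MiB
}

// percentChange returns the relative change from base to next, in percent.
func percentChange(base, next float64) float64 {
	return (next - base) / base * 100
}

// steps returns the rows of the table, in the order they were applied.
func steps() []step {
	return []step{
		{"baseline", 8.812, 1499},
		{"add -c=16", 6.651, 1582},
		{"add disable-gc", 6.255, 3834},
		{"add disable-inline", 4.312, 2747},
		{"add disable-other-opt", 3.808, 2441},
	}
}

func main() {
	rows := steps()
	base := rows[0]
	for i, r := range rows {
		// Compare each row against the previous step and against the baseline.
		prev := base
		if i > 0 {
			prev = rows[i-1]
		}
		fmt.Printf("%-22s time vs prev %+7.2f%%  time vs base %+7.2f%%  RSS vs base %+7.2f%%\n",
			r.name,
			percentChange(prev.seconds, r.seconds),
			percentChange(base.seconds, r.seconds),
			percentChange(base.rssMiB, r.rssMiB))
	}
}
```

Reading the time and RSS columns side by side shows the tradeoff: the GOGC=off step buys a fairly small time improvement for a large jump in peak memory.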

Possible changes to the Go toolchain might include:

  1. Perhaps the default cmd/compile concurrency (-gcflags=-c=N) could be raised, or chosen via some heuristic or threshold, such as a trivial threshold like the number of input bytes in a package or function count or ____. (I think I recall older comments, maybe by Josh, saying that as HW evolves it might make sense to adjust the default of 4 upwards. Some related discussion in cmd/go,cmd/compile: let cmd/compile choose how much concurrency to use #48497).

  2. Perhaps cmd/compile could be more aggressive about adjusting GOGC. (Probably part of the reason it didn't help too much to disable the GC here is that the compiler is already somewhat aggressive about this. See for example https://go.dev/cl/436235 by David Chase, which introduces a fairly aggressive adjustment to GOGC, but perhaps that could be revisited for some modest benefit).

  3. Perhaps other changes based on examining the current performance.
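
As a purely hypothetical illustration of the first idea, a threshold-based heuristic might look something like the sketch below. None of this is cmd/compile or cmd/go code: pickConcurrency, the function-count threshold, and the cap of 16 are all assumptions chosen only to make the shape of the idea concrete (the cap mirrors the -c=16 experiment, and the default of 4 is the one mentioned above).

```go
package main

import (
	"fmt"
	"runtime"
)

const (
	defaultConcurrency = 4    // the current default mentioned above
	maxConcurrency     = 16   // hypothetical cap, mirroring the -c=16 experiment
	largeFuncCount     = 5000 // hypothetical threshold; not based on measurements
)

// pickConcurrency is a hypothetical heuristic, not the toolchain's logic:
// use the default for ordinary packages, and a higher value for packages
// with many functions, never exceeding the number of available CPUs.
func pickConcurrency(funcCount, numCPU int) int {
	c := defaultConcurrency
	if funcCount >= largeFuncCount {
		c = maxConcurrency
	}
	if c > numCPU {
		c = numCPU
	}
	if c < 1 {
		c = 1
	}
	return c
}

func main() {
	for _, n := range []int{100, 5000, 50000} {
		fmt.Printf("funcs=%d -> -c=%d\n", n, pickConcurrency(n, runtime.NumCPU()))
	}
}
```

A real heuristic would presumably also need to account for how many compile processes cmd/go is already running in parallel, which is part of what #48497 discusses.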

In terms of workarounds, some of the above flags might be useful today for the TypeScript team (for example, perhaps something like -gcflags='-c=16' is useful now if the TypeScript team is not already adjusting that).

Others, such as disabling inlining, might waste total wall clock time if for example the immediate next step is to then kick off multiple minutes of regression tests, but there is some chance it could make sense in other cases, for example if the next step is instead to run, say, ~500ms of tests via go test -short or similar. (Separately, maybe someone will find it interesting to look at cmd/compile performance with and without inlining enabled for this package -- maybe the resulting performance delta is "expected", or maybe it shows something more interesting).

What did you expect to see?

Ideally, it would be faster.

Additional Details

Here is a high-level breakdown of wall clock time spent overall (as reported by -gcflags=-bench=<file>.txt) comparing the starting point vs. the final experiment above (and then below that are individual results for each step).

commit: go1.24.1
goos: linux
goarch: amd64
                                    │ 1-base.bench │  5-add-disable-opt.bench │
                                    │  sec/op      │    sec/op      vs base   │
Compile:fe:init                      931.6µ ± 13%    917.3µ ± 16%        ~ (p=0.394 n=6)
Compile:fe:loadsys                   126.8µ ±  8%    123.6µ ±  5%        ~ (p=0.065 n=6)
Compile:fe:parse                      1.601 ±  1%     1.390 ±  1%  -13.18% (p=0.002 n=6)
Compile:fe:pgo-load-profile          910.0n ± 33%   1005.5n ± 14%        ~ (p=0.331 n=6)
Compile:fe:devirtualize-and-inline  1580.2m ±  1%    425.3m ±  2%  -73.09% (p=0.002 n=6)
Compile:fe:escapes                   291.0m ±  6%    119.4m ±  4%  -58.96% (p=0.002 n=6)
Compile:fe:subtotal                   3.486 ±  1%     1.935 ±  1%  -44.50% (p=0.002 n=6)
Compile:be:compilefuncs             4002.8m ±  1%    897.5m ±  4%  -77.58% (p=0.002 n=6)
Compile:be:dumpobj                  1316.8m ±  4%    974.2m ±  2%  -26.02% (p=0.002 n=6)
Compile:be:subtotal                   5.326 ±  1%     1.874 ±  2%  -64.81% (p=0.002 n=6)
Compile:total                         8.812 ±  1%     3.808 ±  1%  -56.79% (p=0.002 n=6)

$ benchstat 1-base.bench 2-add-c16.bench
commit: go1.24.1
goos: linux
goarch: amd64
                                   │ 1-base.bench │        2-add-c16.bench        │
                                   │       sec/op │    sec/op     vs base         │
Compile:fe:init                     931.6µ ± 13%   890.7µ ± 11%        ~ (p=0.394 n=6)
Compile:fe:loadsys                  126.8µ ±  8%   126.9µ ±  5%        ~ (p=0.513 n=6)
Compile:fe:parse                     1.601 ±  1%    1.572 ±  1%   -1.83% (p=0.002 n=6)
Compile:fe:pgo-load-profile         910.0n ± 33%   815.5n ± 26%        ~ (p=0.132 n=6)
Compile:fe:devirtualize-and-inline   1.580 ±  1%    1.569 ±  1%   -0.71% (p=0.026 n=6)
Compile:fe:escapes                  291.0m ±  6%   285.8m ±  2%        ~ (p=0.240 n=6)
Compile:fe:subtotal                  3.486 ±  1%    3.430 ±  0%   -1.61% (p=0.002 n=6)
Compile:be:compilefuncs              4.003 ±  1%    1.867 ±  1%  -53.36% (p=0.002 n=6)
Compile:be:dumpobj                   1.317 ±  4%    1.346 ±  3%        ~ (p=0.818 n=6)
Compile:be:subtotal                  5.326 ±  1%    3.218 ±  2%  -39.58% (p=0.002 n=6)
Compile:total                        8.812 ±  1%    6.651 ±  1%  -24.53% (p=0.002 n=6)

$ benchstat 2-add-c16.bench 3-add-disable-gc.bench
commit: go1.24.1
goos: linux
goarch: amd64
                               │ 2-add-c16.bench │    3-add-disable-gc.bench     │
                               │        sec/op   │    sec/op     vs base         │
Compile:fe:init                     890.7µ ± 11%   918.5µ ±  6%        ~ (p=0.394 n=6)
Compile:fe:loadsys                  126.9µ ±  5%   123.7µ ±  7%        ~ (p=0.699 n=6)
Compile:fe:parse                     1.572 ±  1%    1.402 ±  1%  -10.85% (p=0.002 n=6)
Compile:fe:pgo-load-profile         815.5n ± 26%   900.0n ± 11%        ~ (p=0.193 n=6)
Compile:fe:devirtualize-and-inline   1.569 ±  1%    1.376 ±  1%  -12.30% (p=0.002 n=6)
Compile:fe:escapes                  285.8m ±  2%   249.8m ±  2%  -12.57% (p=0.002 n=6)
Compile:fe:subtotal                  3.430 ±  0%    3.020 ±  1%  -11.93% (p=0.002 n=6)
Compile:be:compilefuncs              1.867 ±  1%    1.950 ±  2%   +4.46% (p=0.002 n=6)
Compile:be:dumpobj                   1.346 ±  3%    1.273 ±  2%   -5.41% (p=0.002 n=6)
Compile:be:subtotal                  3.218 ±  2%    3.222 ±  1%        ~ (p=0.310 n=6)
Compile:total                        6.651 ±  1%    6.255 ±  1%   -5.95% (p=0.002 n=6)

$ benchstat 3-add-disable-gc.bench 4-add-disable-inline.bench
commit: go1.24.1
goos: linux
goarch: amd64
                          │ 3-add-disable-gc.bench │   4-add-disable-inline.bench │
                          │            sec/op      │    sec/op      vs base       │
Compile:fe:init                      918.5µ ±  6%   914.2µ ±  22%        ~ (p=1.000 n=6)
Compile:fe:loadsys                   123.7µ ±  7%   126.1µ ± 113%        ~ (p=0.394 n=6)
Compile:fe:parse                      1.402 ±  1%    1.390 ±   1%        ~ (p=0.394 n=6)
Compile:fe:pgo-load-profile          900.0n ± 11%   990.0n ±  14%        ~ (p=0.221 n=6)
Compile:fe:devirtualize-and-inline  1376.1m ±  1%   459.9m ±   1%  -66.58% (p=0.002 n=6)
Compile:fe:escapes                   249.8m ±  2%   117.6m ±   3%  -52.92% (p=0.002 n=6)
Compile:fe:subtotal                   3.020 ±  1%    1.971 ±   0%  -34.75% (p=0.002 n=6)
Compile:be:compilefuncs               1.950 ±  2%    1.206 ±   3%  -38.19% (p=0.002 n=6)
Compile:be:dumpobj                    1.273 ±  2%    1.129 ±   1%  -11.30% (p=0.002 n=6)
Compile:be:subtotal                   3.222 ±  1%    2.337 ±   1%  -27.48% (p=0.002 n=6)
Compile:total                         6.255 ±  1%    4.312 ±   1%  -31.07% (p=0.002 n=6)

$ benchstat 4-add-disable-inline.bench 5-add-disable-opt.bench
commit: go1.24.1
goos: linux
goarch: amd64
                         │ 4-add-disable-inline.bench │    5-add-disable-opt.bench     │
                         │              sec/op        │    sec/op      vs base         │
Compile:fe:init                      914.2µ ±  22%    917.3µ ± 16%        ~ (p=0.818 n=6)
Compile:fe:loadsys                   126.1µ ± 113%    123.6µ ±  5%        ~ (p=0.310 n=6)
Compile:fe:parse                      1.390 ±   1%     1.390 ±  1%        ~ (p=0.937 n=6)
Compile:fe:pgo-load-profile          990.0n ±  14%   1005.5n ± 14%        ~ (p=0.686 n=6)
Compile:fe:devirtualize-and-inline   459.9m ±   1%    425.3m ±  2%   -7.53% (p=0.002 n=6)
Compile:fe:escapes                   117.6m ±   3%    119.4m ±  4%        ~ (p=0.818 n=6)
Compile:fe:subtotal                   1.971 ±   0%     1.935 ±  1%   -1.83% (p=0.002 n=6)
Compile:be:compilefuncs             1205.6m ±   3%    897.5m ±  4%  -25.55% (p=0.002 n=6)
Compile:be:dumpobj                  1129.1m ±   1%    974.2m ±  2%  -13.72% (p=0.002 n=6)
Compile:be:subtotal                   2.337 ±   1%     1.874 ±  2%  -19.79% (p=0.002 n=6)
Compile:total                         4.312 ±   1%     3.808 ±  1%  -11.69% (p=0.002 n=6)

CC @jakebailey, @RyanCavanaugh

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 25, 2025
@gabyhelp gabyhelp added the Implementation Issues describing a semantics-preserving change to the Go implementation. label Mar 25, 2025
@jakebailey
Contributor

jakebailey commented Mar 25, 2025

> Others, such as disabling inlining, might waste total wall clock time if for example the immediate next step is to then kick off multiple minutes of regression tests,

Just to note it, our full main suite of tests (78k in ./internal/testrunner) takes only about 9 seconds to run total; passing -gcflags='all=-N -l' brings that up to about 15s on my machine.

To run the entire repo's tests takes only 11 seconds on my machine (packages in parallel, most tests are in the testrunner package). Most of the bottleneck is just the Go compiler or #72992.

@jakebailey
Contributor

jakebailey commented Mar 25, 2025

With all builds cached (using gotestsum for brevity and test counting):

$ gotestsum --hide-summary=skipped --format-hide-empty-pkg -- -count=1 ./...
<snip>

DONE 105775 tests, 1051 skipped in 11.748s

But if I do go clean -cache and rerun:

$ gotestsum --hide-summary=skipped --format-hide-empty-pkg -- -count=1 ./...
<snip>

DONE 105775 tests, 1051 skipped in 117.253s

About 100s of that is just compile time, so disabling some optimizations could probably help us go faster.

Of course, improving things without turning off optimizations would be better 😄

@dmitshur dmitshur added ToolSpeed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 25, 2025
@jakebailey
Contributor

jakebailey commented Mar 25, 2025

Just to show it for the "total end to end" calculus, having cleaned my build cache, adding -l -N looks like:

$ gotestsum --hide-summary=skipped --format-hide-empty-pkg -- -gcflags='-l -N' -count=1 ./...
<snip>

DONE 105775 tests, 1051 skipped in 59.638s

And a rerun is only 16s or so. Despite turning off optimizations, our large number of tests still run pretty fast.
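
For the "total end to end" calculus, one rough way to compare the two configurations is to ask how many cached reruns it takes before the optimized build catches up. The sketch below uses the gotestsum timings from this thread, with the 16s rerun taken as an approximation ("16s or so"). The names run and breakEvenReruns are made up, and it assumes every rerun is fully cached, which will not hold once code changes force recompiles.

```go
package main

import "fmt"

// run describes one build configuration using the gotestsum timings above.
type run struct {
	name string
	cold float64 // seconds for a full run with an empty build cache
	warm float64 // seconds for a rerun with all builds cached
}

// breakEvenReruns returns the number of cached reruns after a cold run at
// which both configurations have spent the same total time. It assumes the
// unoptimized configuration is faster cold and slower warm.
func breakEvenReruns(optimized, unoptimized run) float64 {
	coldSaving := optimized.cold - unoptimized.cold // saved once per cold build
	warmCost := unoptimized.warm - optimized.warm   // paid on every cached rerun
	return coldSaving / warmCost
}

func main() {
	optimized := run{"default flags", 117.253, 11.748}
	unoptimized := run{"-gcflags='-l -N'", 59.638, 16} // 16s is approximate

	k := breakEvenReruns(optimized, unoptimized)
	fmt.Printf("%s vs %s: break-even after about %.1f cached reruns\n",
		optimized.name, unoptimized.name, k)
}
```

In practice most edits invalidate part of the build cache, so the compile-time saving is paid back more slowly than this simple model suggests.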

@thepudds
Contributor Author

Depending on the hardware, my perhaps optimistic hope is that if you ask for more cmd/compile concurrency (-gcflags=-c=N), it could be a pure win (and might be evidence for making some modest change to cmd/compile or cmd/go).

Whether some of the others I included above end up helping you in practice depends on how long the tests take, etc., though it seems plausible it could help in some cases.

> Just to note it, our full main suite of tests (78k in ./internal/testrunner) takes only about 9 seconds to run total; passing -gcflags='all=-N -l' [...]

I know that was just a quick test (and I think part of the point you were making is that you applied it broadly), but just in case it helps, including the all there means it will affect packages outside typescript-go as well, such as the Go stdlib, and executing code from other packages (like encoding/base64 or whatever) will likely be slower.

The goal is not to make it complicated for you, but you could scope the flags to specific packages based on what seems to give a useful tradeoff of build time vs. resulting execution time.

For example, to just specify that the ast package should be built with more concurrency and some of the optimizations disabled:

$ go build -a -v -x -gcflags='github.com/microsoft/typescript-go/internal/ast=-c=16 -l -N -bench=pkg-pattern.bench' .

Or to say all packages underneath github.com/microsoft/typescript-go (by using /... in the -gcflags package pattern):

$ go build -a -v -x -gcflags='github.com/microsoft/typescript-go/...=-c=16 -l -N -bench=pkg-pattern-wildcard.bench' .

You can also have different options for different packages (with the last match winning). There's a little more on that here (ctrl-f for gcflags):

One other quick tip: using -x (as in go build -x) to double-check which flags are actually passed can be useful, especially if you are doing some multi-package pattern with -gcflags.
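
To illustrate having different options for different packages, here is a small sketch that assembles a go build command line with one -gcflags argument per package pattern. It relies on the documented behavior that -gcflags may be repeated and that the latest match on the command line wins, so the more specific pattern goes last. The rule type and gcflagsArgs helper are made up for illustration, and the particular flag choices are just examples, not a recommendation.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a package pattern with the compiler flags for matching packages.
type rule struct {
	pattern string
	flags   string
}

// gcflagsArgs returns one -gcflags argument per rule, preserving order so
// that later rules win when a package matches more than one pattern.
func gcflagsArgs(rules []rule) []string {
	args := make([]string, 0, len(rules))
	for _, r := range rules {
		args = append(args, "-gcflags="+r.pattern+"="+r.flags)
	}
	return args
}

func main() {
	rules := []rule{
		// Broad pattern first: more compile concurrency for the whole module.
		{"github.com/microsoft/typescript-go/...", "-c=16"},
		// Specific pattern last: also disable inlining and other
		// optimizations for the ast package only.
		{"github.com/microsoft/typescript-go/internal/ast", "-c=16 -l -N"},
	}

	args := append([]string{"go", "build", "-x"}, gcflagsArgs(rules)...)
	args = append(args, ".")

	// Print one argument per line to sidestep shell quoting questions.
	// To actually run the build, args[1:] could be passed to exec.Command("go", ...).
	fmt.Println(strings.Join(args, "\n"))
}
```

Running the resulting command with -x then shows which flags each compile invocation actually received.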

@jakebailey
Contributor

Ah, I didn't even put together that all was a package specifier; I just remember reading that from delve's output. Doing github.com/microsoft/typescript-go/... is comparable in total compile time and test run time, so most of the benefit is coming from the handling of our packages anyway, it seems.
