Commit e0d27ea

cmd/compile: add initial backend concurrency support
This CL adds initial support for concurrent backend compilation.

CAVEATS

I suspect it's going to end up on Twitter.

If you're coming here from the internet:

* Don't believe the hype.
* If you're going to try it out, please also try with the race detector enabled and report an issue at golang.org/issue/new if you see a race report. Just run 'go install -race cmd/compile' and 'go build -a yourpackages'. And then run make.bash again to return to a reasonable compiler.

BACKGROUND

The compiler currently consists (very roughly) of a handful of phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8.

Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA.

Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions.

Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and hash routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile.

USAGE

Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c.

Furthermore, this CL has been intentionally written so that the c=1 path has no concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set -c=1 always, and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work.

In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. It requires additional sorting to preserve reproducible builds.
* gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting.
* obj.Link.Hashmu. ctxt.Hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is easily the most heavily contended mutex. I hope that the syncmap proposed in golang#18177 may provide some free speed-ups here.
* gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal, aside from some mild awkwardness to avoid deadlocks due to recursive calls.
* types.Pkg.Symsmu. Looking up symbols in a package happens a fair amount during backend compilation, including the construction of gcargs/gclocals symbols, and types, particularly via ngotype. It has low-to-moderate contention.
* types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex.
* types.Sym.Lsymmu. Syms keep a cache of their associated LSym, to reduce lookups in ctxt.Hash. This cache itself needs concurrency protection. This mutex adds to the size of a moderately important data structure, but the alloc benchmarks below show that this doesn't hurt much in practice. It is moderately contended, mostly because when lookups fail, the lock is held while vying for the contended ctxt.Hash mutex. The fact that it keeps load off the ctxt.Hash mutex, though, makes this mutex worth keeping.

TESTING

I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed golang#19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is golang#19961.

Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine tune.
* Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :)

All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 costs about 3% CPU and has almost no memory impact.

name       old time/op       new time/op       delta
Template   194ms ± 4%        195ms ± 4%        +0.91%  (p=0.002 n=49+47)
Unicode    82.9ms ± 4%       85.2ms ± 3%       +2.68%  (p=0.000 n=47+48)
GoTypes    518ms ± 3%        527ms ± 2%        +1.81%  (p=0.000 n=46+46)
SSA        5.59s ± 2%        5.77s ± 2%        +3.12%  (p=0.000 n=48+50)
Flate      120ms ± 3%        122ms ± 3%        +1.54%  (p=0.000 n=49+48)
GoParser   140ms ± 3%        143ms ± 4%        +2.24%  (p=0.000 n=47+49)
Reflect    333ms ± 3%        342ms ± 3%        +2.67%  (p=0.000 n=47+49)
Tar        102ms ± 6%        104ms ± 4%        +2.37%  (p=0.000 n=50+47)
XML        202ms ±13%        196ms ± 4%        -2.63%  (p=0.036 n=50+48)

name       old user-time/op  new user-time/op  delta
Template   236ms ± 9%        237ms ±10%          ~     (p=0.750 n=50+50)
Unicode    104ms ± 7%        107ms ± 4%        +2.09%  (p=0.000 n=49+47)
GoTypes    691ms ± 3%        701ms ± 3%        +1.40%  (p=0.000 n=50+49)
SSA        7.91s ± 3%        8.07s ± 4%        +1.98%  (p=0.000 n=48+49)
Flate      142ms ± 4%        145ms ± 5%        +2.12%  (p=0.000 n=47+48)
GoParser   172ms ± 6%        175ms ± 5%        +1.76%  (p=0.002 n=50+49)
Reflect    425ms ± 9%        436ms ± 8%        +2.55%  (p=0.001 n=50+50)
Tar        119ms ± 6%        121ms ± 5%        +2.52%  (p=0.000 n=49+49)
XML        242ms ± 8%        239ms ± 6%        -1.54%  (p=0.039 n=49+49)

name       old alloc/op      new alloc/op      delta
Template   38.8MB ± 0%       38.8MB ± 0%         ~     (p=0.247 n=10+10)
Unicode    29.8MB ± 0%       29.8MB ± 0%         ~     (p=0.631 n=10+10)
GoTypes    113MB ± 0%        113MB ± 0%          ~     (p=0.218 n=10+10)
SSA        1.25GB ± 0%       1.25GB ± 0%       +0.02%  (p=0.000 n=10+10)
Flate      25.3MB ± 0%       25.3MB ± 0%         ~     (p=0.315 n=9+10)
GoParser   31.7MB ± 0%       31.7MB ± 0%         ~     (p=0.089 n=10+10)
Reflect    78.2MB ± 0%       78.2MB ± 0%       +0.07%  (p=0.019 n=10+10)
Tar        26.5MB ± 0%       26.6MB ± 0%         ~     (p=0.165 n=10+10)
XML        42.4MB ± 0%       42.4MB ± 0%         ~     (p=0.497 n=9+10)

name       old allocs/op     new allocs/op     delta
Template   378k ± 1%         379k ± 1%           ~     (p=0.353 n=10+10)
Unicode    321k ± 0%         321k ± 1%           ~     (p=0.684 n=10+10)
GoTypes    1.14M ± 0%        1.14M ± 0%          ~     (p=0.247 n=10+10)
SSA        9.71M ± 0%        9.72M ± 0%        +0.08%  (p=0.000 n=10+10)
Flate      234k ± 1%         234k ± 1%           ~     (p=0.280 n=10+10)
GoParser   316k ± 0%         315k ± 1%         -0.26%  (p=0.040 n=9+9)
Reflect    980k ± 0%         981k ± 0%           ~     (p=0.052 n=10+10)
Tar        249k ± 1%         250k ± 1%           ~     (p=0.190 n=10+10)
XML        391k ± 1%         391k ± 1%           ~     (p=0.829 n=8+10)

Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches.

name       old time/op       new time/op       delta
Template   202ms ± 3%        149ms ± 3%        -26.15%  (p=0.000 n=49+49)
Unicode    87.4ms ± 4%       84.2ms ± 3%        -3.68%  (p=0.000 n=48+48)
GoTypes    560ms ± 2%        398ms ± 2%        -28.96%  (p=0.000 n=49+49)
Compiler   2.46s ± 3%        1.76s ± 2%        -28.61%  (p=0.000 n=48+46)
SSA        6.17s ± 2%        4.04s ± 1%        -34.52%  (p=0.000 n=49+49)
Flate      126ms ± 3%        92ms ± 2%         -26.81%  (p=0.000 n=49+48)
GoParser   148ms ± 4%        107ms ± 2%        -27.78%  (p=0.000 n=49+48)
Reflect    361ms ± 3%        281ms ± 3%        -22.10%  (p=0.000 n=49+49)
Tar        109ms ± 4%        86ms ± 3%         -20.81%  (p=0.000 n=49+47)
XML        204ms ± 3%        144ms ± 2%        -29.53%  (p=0.000 n=48+45)

name       old user-time/op  new user-time/op  delta
Template   246ms ± 9%        246ms ± 4%           ~     (p=0.401 n=50+48)
Unicode    109ms ± 4%        111ms ± 4%         +1.47%  (p=0.000 n=44+50)
GoTypes    728ms ± 3%        765ms ± 3%         +5.04%  (p=0.000 n=46+50)
Compiler   3.33s ± 3%        3.41s ± 2%         +2.31%  (p=0.000 n=49+48)
SSA        8.52s ± 2%        9.11s ± 2%         +6.93%  (p=0.000 n=49+47)
Flate      149ms ± 4%        161ms ± 3%         +8.13%  (p=0.000 n=50+47)
GoParser   181ms ± 5%        192ms ± 2%         +6.40%  (p=0.000 n=49+46)
Reflect    452ms ± 9%        474ms ± 2%         +4.99%  (p=0.000 n=50+48)
Tar        126ms ± 6%        136ms ± 4%         +7.95%  (p=0.000 n=50+49)
XML        247ms ± 5%        264ms ± 3%         +6.94%  (p=0.000 n=48+50)

name       old alloc/op      new alloc/op      delta
Template   38.8MB ± 0%       39.3MB ± 0%        +1.48%  (p=0.008 n=5+5)
Unicode    29.8MB ± 0%       30.2MB ± 0%        +1.19%  (p=0.008 n=5+5)
GoTypes    113MB ± 0%        114MB ± 0%         +0.69%  (p=0.008 n=5+5)
Compiler   443MB ± 0%        447MB ± 0%         +0.95%  (p=0.008 n=5+5)
SSA        1.25GB ± 0%       1.26GB ± 0%        +0.89%  (p=0.008 n=5+5)
Flate      25.3MB ± 0%       25.9MB ± 1%        +2.35%  (p=0.008 n=5+5)
GoParser   31.7MB ± 0%       32.2MB ± 0%        +1.59%  (p=0.008 n=5+5)
Reflect    78.2MB ± 0%       78.9MB ± 0%        +0.91%  (p=0.008 n=5+5)
Tar        26.6MB ± 0%       27.0MB ± 0%        +1.80%  (p=0.008 n=5+5)
XML        42.4MB ± 0%       43.4MB ± 0%        +2.35%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template   379k ± 0%         378k ± 0%            ~     (p=0.421 n=5+5)
Unicode    322k ± 0%         321k ± 0%            ~     (p=0.222 n=5+5)
GoTypes    1.14M ± 0%        1.14M ± 0%           ~     (p=0.548 n=5+5)
Compiler   4.12M ± 0%        4.11M ± 0%         -0.14%  (p=0.032 n=5+5)
SSA        9.72M ± 0%        9.72M ± 0%           ~     (p=0.421 n=5+5)
Flate      234k ± 1%         234k ± 0%            ~     (p=0.421 n=5+5)
GoParser   316k ± 1%         315k ± 0%            ~     (p=0.222 n=5+5)
Reflect    980k ± 0%         979k ± 0%            ~     (p=0.095 n=5+5)
Tar        249k ± 1%         249k ± 1%            ~     (p=0.841 n=5+5)
XML        392k ± 0%         391k ± 0%            ~     (p=0.095 n=5+5)

From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:

name       old time/op       new time/op       delta
Template   203ms ± 3%        131ms ± 5%        -35.45%  (p=0.000 n=50+50)
Unicode    87.2ms ± 4%       84.1ms ± 2%        -3.61%  (p=0.000 n=48+47)
GoTypes    560ms ± 4%        310ms ± 2%        -44.65%  (p=0.000 n=50+49)
Compiler   2.47s ± 3%        1.41s ± 2%        -43.10%  (p=0.000 n=50+46)
SSA        6.17s ± 2%        3.20s ± 2%        -48.06%  (p=0.000 n=49+49)
Flate      126ms ± 4%        74ms ± 2%         -41.06%  (p=0.000 n=49+48)
GoParser   148ms ± 4%        89ms ± 3%         -39.97%  (p=0.000 n=49+50)
Reflect    360ms ± 3%        242ms ± 3%        -32.81%  (p=0.000 n=49+49)
Tar        108ms ± 4%        73ms ± 4%         -32.48%  (p=0.000 n=50+49)
XML        203ms ± 3%        119ms ± 3%        -41.56%  (p=0.000 n=49+48)

name       old user-time/op  new user-time/op  delta
Template   246ms ± 9%        287ms ± 9%        +16.98%  (p=0.000 n=50+50)
Unicode    109ms ± 4%        118ms ± 5%         +7.56%  (p=0.000 n=46+50)
GoTypes    735ms ± 4%        806ms ± 2%         +9.62%  (p=0.000 n=50+50)
Compiler   3.34s ± 4%        3.56s ± 2%         +6.78%  (p=0.000 n=49+49)
SSA        8.54s ± 3%        10.04s ± 3%       +17.55%  (p=0.000 n=50+50)
Flate      149ms ± 6%        176ms ± 3%        +17.82%  (p=0.000 n=50+48)
GoParser   181ms ± 5%        213ms ± 3%        +17.47%  (p=0.000 n=50+50)
Reflect    453ms ± 6%        499ms ± 2%        +10.11%  (p=0.000 n=50+48)
Tar        126ms ± 5%        149ms ±11%        +18.76%  (p=0.000 n=50+50)
XML        246ms ± 5%        287ms ± 4%        +16.53%  (p=0.000 n=49+50)

name       old alloc/op      new alloc/op      delta
Template   38.8MB ± 0%       40.4MB ± 0%        +4.21%  (p=0.008 n=5+5)
Unicode    29.8MB ± 0%       30.9MB ± 0%        +3.68%  (p=0.008 n=5+5)
GoTypes    113MB ± 0%        116MB ± 0%         +2.71%  (p=0.008 n=5+5)
Compiler   443MB ± 0%        455MB ± 0%         +2.75%  (p=0.008 n=5+5)
SSA        1.25GB ± 0%       1.27GB ± 0%        +1.84%  (p=0.008 n=5+5)
Flate      25.3MB ± 0%       26.9MB ± 1%        +6.31%  (p=0.008 n=5+5)
GoParser   31.7MB ± 0%       33.2MB ± 0%        +4.61%  (p=0.008 n=5+5)
Reflect    78.2MB ± 0%       80.2MB ± 0%        +2.53%  (p=0.008 n=5+5)
Tar        26.6MB ± 0%       27.9MB ± 0%        +5.19%  (p=0.008 n=5+5)
XML        42.4MB ± 0%       44.6MB ± 0%        +5.20%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template   380k ± 0%         379k ± 0%          -0.39%  (p=0.032 n=5+5)
Unicode    321k ± 0%         321k ± 0%            ~     (p=0.841 n=5+5)
GoTypes    1.14M ± 0%        1.14M ± 0%           ~     (p=0.421 n=5+5)
Compiler   4.12M ± 0%        4.14M ± 0%         +0.52%  (p=0.008 n=5+5)
SSA        9.72M ± 0%        9.76M ± 0%         +0.37%  (p=0.008 n=5+5)
Flate      234k ± 1%         234k ± 1%            ~     (p=0.690 n=5+5)
GoParser   316k ± 0%         317k ± 1%            ~     (p=0.841 n=5+5)
Reflect    981k ± 0%         981k ± 0%            ~     (p=1.000 n=5+5)
Tar        250k ± 0%         249k ± 1%            ~     (p=0.151 n=5+5)
XML        393k ± 0%         392k ± 0%            ~     (p=0.056 n=5+5)

Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time.

The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput.

The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle.

Updates golang#15756

Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
1 parent 5b78ea5 commit e0d27ea

14 files changed, +170 -23 lines changed


src/cmd/compile/internal/gc/dcl.go (+4)

@@ -1069,7 +1069,9 @@ func funcsym(s *types.Sym) *types.Sym {
 	// symbols will be created explicitly with makefuncsym.
 	// See the makefuncsym comment for details.
 	if !Ctxt.Flag_dynlink && !existed {
+		funcsymsmu.Lock()
 		funcsyms = append(funcsyms, s)
+		funcsymsmu.Unlock()
 	}
 	return sf
 }
@@ -1096,7 +1098,9 @@ func makefuncsym(s *types.Sym) {
 		return
 	}
 	if _, existed := s.Pkg.LookupOK(funcsymname(s)); !existed {
+		funcsymsmu.Lock()
 		funcsyms = append(funcsyms, s)
+		funcsymsmu.Unlock()
 	}
 }

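The pattern in this file (append under a mutex, then sort later so the output does not depend on goroutine scheduling) can be sketched in isolation. The names below (`namesMu`, `names`, `record`, `collect`) are hypothetical stand-ins for `funcsymsmu` and `funcsyms`, not compiler code, and the sketch sorts plain strings rather than symbols.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hypothetical stand-ins for funcsymsmu and funcsyms.
var (
	namesMu sync.Mutex // protects names
	names   []string
)

// record appends name to the shared slice; it is safe for concurrent use.
func record(name string) {
	namesMu.Lock()
	names = append(names, name)
	namesMu.Unlock()
}

// collect records every input from its own goroutine, then returns a
// sorted copy so the result is independent of scheduling order.
func collect(inputs []string) []string {
	names = nil // no goroutines are running yet, so no lock is needed
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			record(s)
		}(in)
	}
	wg.Wait()

	namesMu.Lock()
	defer namesMu.Unlock()
	sort.Strings(names)
	return append([]string(nil), names...)
}

func main() {
	fmt.Println(collect([]string{"c", "a", "b"})) // prints [a b c]
}
```

The sort is what keeps the output reproducible: the append order varies from run to run, the sorted order does not.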
src/cmd/compile/internal/gc/go.go (+5 -1)

@@ -10,6 +10,7 @@ import (
 	"cmd/internal/bio"
 	"cmd/internal/obj"
 	"cmd/internal/src"
+	"sync"
 )

 const (
@@ -171,7 +172,10 @@ var exportlist []*Node

 var importlist []*Node // imported functions and methods with inlinable bodies

-var funcsyms []*types.Sym
+var (
+	funcsymsmu sync.Mutex // protects funcsyms
+	funcsyms   []*types.Sym
+)

 var dclcontext Class // PEXTERN/PAUTO

src/cmd/compile/internal/gc/gsubr.go (+3 -2)

@@ -54,10 +54,11 @@ type Progs struct {
 }

 // newProgs returns a new Progs for fn.
-func newProgs(fn *Node) *Progs {
+func newProgs(fn *Node, shard int) *Progs {
 	pp := new(Progs)
 	if Ctxt.CanReuseProgs() {
-		pp.progcache = sharedProgArray[:]
+		sz := len(sharedProgArray) / ncpu
+		pp.progcache = sharedProgArray[sz*shard : sz*(shard+1)]
 	}
 	pp.curfn = fn

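The sharding arithmetic in newProgs can be tried on a plain slice. This is a minimal sketch with a hypothetical helper `shardOf`; it is not compiler code. One deliberate difference from the diff: the sketch uses a three-index slice expression to cap each window, so that an append on one shard cannot grow into its neighbour. That cap is an assumption added here for illustration, whereas the diff uses a two-index slice.

```go
package main

import "fmt"

// shardOf returns the shard-th of n equal, non-overlapping windows into buf.
// The third index limits the window's capacity to its own length
// (an addition in this sketch; see the lead-in).
func shardOf(buf []int, shard, n int) []int {
	sz := len(buf) / n
	return buf[sz*shard : sz*(shard+1) : sz*(shard+1)]
}

func main() {
	buf := make([]int, 10)
	for s := 0; s < 3; s++ {
		w := shardOf(buf, s, 3)
		for i := range w {
			w[i] = s + 1 // each worker writes only inside its own window
		}
	}
	// 10/3 == 3, so the last element belongs to no shard and stays zero.
	fmt.Println(buf) // prints [1 1 1 2 2 2 3 3 3 0]
}
```

Because the windows never overlap, workers can write to them without any locking; the cost is that integer division may leave a few trailing elements unused.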
src/cmd/compile/internal/gc/main.go (+59 -2)

@@ -182,6 +182,7 @@ func Main(archInit func(*Arch)) {
 	obj.Flagcount("W", "debug parse tree after type checking", &Debug['W'])
 	flag.StringVar(&asmhdr, "asmhdr", "", "write assembly header to `file`")
 	flag.StringVar(&buildid, "buildid", "", "record `id` as the build id in the export metadata")
+	flag.IntVar(&ncpu, "c", 1, "number of concurrent backend compilations")
 	flag.BoolVar(&pure_go, "complete", false, "compiling complete package (no C or assembly)")
 	flag.StringVar(&debugstr, "d", "", "print debug information about items in `list`")
 	obj.Flagcount("e", "no limit on number of errors reported", &Debug['e'])
@@ -278,6 +279,13 @@ func Main(archInit func(*Arch)) {
 	if compiling_runtime && Debug['N'] != 0 {
 		log.Fatal("cannot disable optimizations while compiling runtime")
 	}
+	if ncpu < 1 {
+		log.Fatalf("-c must be at least 1, got %d", ncpu)
+	}
+	if ncpu > 1 && !concurrentBackendAllowed() {
+		log.Fatalf("cannot use concurrent backend compilation with provided flags")
+	}
+	compilenow = ncpu == 1

 	// parse -d argument
 	if debugstr != "" {
@@ -548,16 +556,39 @@ func Main(archInit func(*Arch)) {
 		}
 		timings.AddEvent(fcount, "funcs")

+		if ncpu > 1 {
+			for _, fn := range needscompile {
+				compilec <- fn
+			}
+			close(compilec)
+			needscompile = nil
+			compilewg.Wait()
+		}
+		// We autogenerate and compile some small functions
+		// such as method wrappers and equality/hash routines
+		// while exporting code.
+		// Disable concurrent compilation from here on,
+		// at least until this convoluted structure has been unwound.
+		ncpu = 1
+		compilenow = true
+
 		if nsavederrors+nerrors == 0 {
 			fninit(xtop)
 		}

 		if compiling_runtime {
 			checknowritebarrierrec()
 		}
-		for _, largePos := range largeStackFrames {
-			yyerrorl(largePos, "stack frame too large (>2GB)")
+		largeStackFramesMu.Lock()
+		if len(largeStackFrames) > 0 {
+			obj.SortSlice(largeStackFrames, func(i, j int) bool {
+				return largeStackFrames[i].Before(largeStackFrames[j])
+			})
+			for _, largePos := range largeStackFrames {
+				yyerrorl(largePos, "stack frame too large (>2GB)")
+			}
 		}
+		largeStackFramesMu.Unlock()
 	}

 	// Phase 9: Check external declarations.
@@ -1027,3 +1058,29 @@ func clearImports() {
 func IsAlias(sym *types.Sym) bool {
 	return sym.Def != nil && asNode(sym.Def).Sym != sym
 }
+
+// By default, assume any debug flags are incompatible with concurrent compilation.
+// A few are safe and potentially in common use for normal compiles, though; mark them as such here.
+var concurrentFlagOK = [256]bool{
+	'B': true, // disabled bounds checking
+	'C': true, // disable printing of columns in error messages
+	'I': true, // add `directory` to import search path
+	'N': true, // disable optimizations
+	'l': true, // disable inlining
+}
+
+func concurrentBackendAllowed() bool {
+	for i, x := range Debug {
+		if x != 0 && !concurrentFlagOK[i] {
+			return false
+		}
+	}
+	if Debug_asm || Debug_vlog || debugstr != "" || debuglive > 0 {
+		return false
+	}
+	// TODO: fix races and enable the following flags
+	if Ctxt.Flag_shared || Ctxt.Flag_dynlink || flag_race {
+		return false
+	}
+	return true
+}

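The allow-list check in concurrentBackendAllowed follows a deny-by-default shape: every single-letter flag is assumed unsafe unless its byte is marked in a 256-entry table. A minimal sketch of that shape, using hypothetical names (`allowed`, `compatible`) and a smaller table than the one in the diff:

```go
package main

import "fmt"

// allowed marks the single-letter flags assumed safe, indexed by flag byte.
// The entries are illustrative, not the compiler's full list.
var allowed = [256]bool{
	'B': true,
	'N': true,
	'l': true,
}

// compatible reports whether every flag with a nonzero count in set is on
// the allow-list. Any flag not listed in allowed is rejected by default.
func compatible(set *[256]int) bool {
	for i, x := range set {
		if x != 0 && !allowed[i] {
			return false
		}
	}
	return true
}

func main() {
	var set [256]int
	set['N'] = 1
	fmt.Println(compatible(&set)) // prints true
	set['W'] = 1
	fmt.Println(compatible(&set)) // prints false
}
```

The advantage of the table over a list of forbidden flags is that a flag added later is rejected until someone has checked it and listed it.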
src/cmd/compile/internal/gc/obj.go (+7)

@@ -219,6 +219,11 @@ func dumpglobls() {
 		ggloblnod(n)
 	}

+	funcsymsmu.Lock()
+	defer funcsymsmu.Unlock()
+	obj.SortSlice(funcsyms, func(i, j int) bool {
+		return linksymname(funcsyms[i]) < linksymname(funcsyms[j])
+	})
 	for _, s := range funcsyms {
 		sf := s.Pkg.Lookup(funcsymname(s))
 		dsymptr(sf, 0, s, 0)
@@ -264,6 +269,8 @@ func Linksym(s *types.Sym) *obj.LSym {
 	if s == nil {
 		return nil
 	}
+	s.Lsymmu.Lock()
+	defer s.Lsymmu.Unlock()
 	if s.Lsym == nil {
 		s.Lsym = Ctxt.Lookup(linksymname(s), 0)
 	}

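Linksym is a lazily filled per-symbol cache guarded by a per-symbol mutex, with a shared table behind it. The sketch below reproduces that shape with hypothetical types (`sym`, `lsym`) and a hypothetical `lookup` function standing in for `Ctxt.Lookup`. It also shows the nesting the commit message mentions: on a cache miss, the per-symbol lock is held while the shared table's lock is acquired.

```go
package main

import (
	"fmt"
	"sync"
)

// lsym and sym are hypothetical stand-ins for obj.LSym and types.Sym.
type lsym struct{ name string }

type sym struct {
	name string
	mu   sync.Mutex // protects lsym
	lsym *lsym
}

// table stands in for the shared symbol table behind Ctxt.Lookup.
var (
	tableMu sync.Mutex // protects table and lookups
	table   = map[string]*lsym{}
	lookups int // number of times the shared table was consulted
)

func lookup(name string) *lsym {
	tableMu.Lock()
	defer tableMu.Unlock()
	lookups++
	l, ok := table[name]
	if !ok {
		l = &lsym{name: name}
		table[name] = l
	}
	return l
}

// linksym returns the cached *lsym for s, consulting the shared table
// at most once per sym. On a miss, s.mu is held across the table lookup.
func (s *sym) linksym() *lsym {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.lsym == nil {
		s.lsym = lookup(s.name)
	}
	return s.lsym
}

func main() {
	s := &sym{name: "main.f"}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.linksym()
		}()
	}
	wg.Wait()
	fmt.Println(lookups) // prints 1
}
```

Eight concurrent callers produce a single shared-table lookup, which is the load reduction the cache exists for.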
src/cmd/compile/internal/gc/pgen.go (+23 -3)

@@ -13,10 +13,19 @@ import (
 	"cmd/internal/sys"
 	"fmt"
 	"sort"
+	"sync"
 )

 // "Portable" code generation.

+var (
+	ncpu         int            // the number of concurrent backend compiles, set by a compiler flag
+	needscompile []*Node        // slice of functions waiting to be compiled
+	compilenow   bool           // indicates whether to compile immediately or enqueue in needscompile
+	compilewg    sync.WaitGroup // wait for all backend compilers to complete
+	compilec     chan *Node     // channel of functions for backend compilers to drain
+)
+
 func emitptrargsmap() {
 	if Curfn.Func.Nname.Sym.Name == "_" {
 		return
@@ -210,9 +219,20 @@ func compile(fn *Node) {
 	// Set up the function's LSym early to avoid data races with the assemblers.
 	fn.Func.initLSym()

-	// Build an SSA backend function.
-	ssafn := buildssa(fn)
-	pp := newProgs(fn)
+	if compilenow {
+		compileSSA(fn, 0)
+	} else {
+		needscompile = append(needscompile, fn)
+	}
+}
+
+// compileSSA builds an SSA backend function,
+// uses it to generate a plist,
+// and flushes that plist to machine code.
+func compileSSA(fn *Node, shard int) {
+	cache := &ssaCaches[shard]
+	ssafn := buildssa(fn, cache)
+	pp := newProgs(fn, shard)
 	genssa(ssafn, pp)
 	fieldtrack(pp.Text.From.Sym, fn.Func.FieldTrack)
 	if pp.Text.To.Offset < 1<<31 {

src/cmd/compile/internal/gc/reflect.go (+18 -3)

@@ -13,6 +13,7 @@ import (
 	"os"
 	"sort"
 	"strings"
+	"sync"
 )

 type itabEntry struct {
@@ -35,9 +36,13 @@ type ptabEntry struct {
 }

 // runtime interface and reflection data structures
-var signatlist = make(map[*types.Type]bool)
-var itabs []itabEntry
-var ptabs []ptabEntry
+var (
+	signatlistmu sync.Mutex // protects signatlist
+	signatlist   = make(map[*types.Type]bool)
+
+	itabs []itabEntry
+	ptabs []ptabEntry
+)

 type Sig struct {
 	name string
@@ -930,7 +935,9 @@ func typenamesym(t *types.Type) *types.Sym {
 		Fatalf("typenamesym %v", t)
 	}
 	s := typesym(t)
+	signatlistmu.Lock()
 	addsignat(t)
+	signatlistmu.Unlock()
 	return s
 }

@@ -1420,14 +1427,17 @@ func addsignat(t *types.Type) {

 func dumptypestructs() {
 	// copy types from externdcl list to signatlist
+	signatlistmu.Lock()
 	for _, n := range externdcl {
 		if n.Op == OTYPE {
 			addsignat(n.Type)
 		}
 	}
+	signatlistmu.Unlock()

 	// Process signatlist. Use a loop, as dtypesym adds
 	// entries to signatlist while it is being processed.
+	signatlistmu.Lock()
 	signats := make([]typeAndStr, len(signatlist))
 	for len(signatlist) > 0 {
 		signats = signats[:0]
@@ -1436,6 +1446,9 @@ func dumptypestructs() {
 			signats = append(signats, typeAndStr{t: t, s: typesymname(t)})
 			delete(signatlist, t)
 		}
+		// Don't hold signatlistmu while processing signats,
+		// since signats can generate new entries for signatlist.
+		signatlistmu.Unlock()
 		sort.Sort(typesByString(signats))
 		for _, ts := range signats {
 			t := ts.t
@@ -1444,7 +1457,9 @@ func dumptypestructs() {
 				dtypesym(types.NewPtr(t))
 			}
 		}
+		signatlistmu.Lock()
 	}
+	signatlistmu.Unlock()

 	// process itabs
 	for _, i := range itabs {

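The locking in dumptypestructs drains the map into a batch under the mutex, releases the mutex while processing the batch (because processing can add new entries through a path that takes the same mutex), and repeats until the map stays empty. Below is a sketch of that idea with hypothetical names (`pending`, `add`, `drain`). It restructures the loop so each iteration locks and unlocks exactly once, which is a simplification and not the diff's exact control flow.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hypothetical stand-ins for signatlistmu and signatlist.
var (
	pendingMu sync.Mutex // protects pending
	pending   = map[string]bool{}
)

// add queues s for processing; it takes pendingMu itself.
func add(s string) {
	pendingMu.Lock()
	pending[s] = true
	pendingMu.Unlock()
}

// drain processes entries until pending stays empty and returns them in
// processing order. process may call add, so pendingMu must not be held
// while it runs, otherwise add would deadlock.
func drain(process func(string)) []string {
	var order []string
	for {
		pendingMu.Lock()
		batch := make([]string, 0, len(pending))
		for s := range pending {
			batch = append(batch, s)
			delete(pending, s)
		}
		pendingMu.Unlock()

		if len(batch) == 0 {
			return order
		}
		sort.Strings(batch) // map iteration order is random; sort for determinism
		for _, s := range batch {
			process(s)
			order = append(order, s)
		}
	}
}

func main() {
	add("T")
	order := drain(func(s string) {
		if len(s) < 3 {
			add("*" + s) // processing a type may queue its pointer type
		}
	})
	fmt.Println(order) // prints [T *T **T]
}
```

Go's `sync.Mutex` is not reentrant, so holding it across `process` would block forever on the first nested `add`; that is the "mild awkwardness" the commit message attributes to recursive calls.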
src/cmd/compile/internal/gc/ssa.go (+20 -5)

@@ -19,8 +19,10 @@ import (
 	"cmd/internal/sys"
 )

-var ssaConfig *ssa.Config
-var ssaCache *ssa.Cache
+var (
+	ssaConfig *ssa.Config
+	ssaCaches []ssa.Cache
+)

 func initssaconfig() {
 	types_ := ssa.Types{
@@ -66,7 +68,20 @@ func initssaconfig() {
 	if thearch.LinkArch.Name == "386" {
 		ssaConfig.Set387(thearch.Use387)
 	}
-	ssaCache = new(ssa.Cache)
+
+	ssaCaches = make([]ssa.Cache, ncpu)
+	if ncpu > 1 {
+		compilec = make(chan *Node)
+		for i := 0; i < ncpu; i++ {
+			compilewg.Add(1)
+			go func(shard int) {
+				for fn := range compilec {
+					compileSSA(fn, shard)
+				}
+				compilewg.Done()
+			}(i)
+		}
+	}

 	// Set up some runtime functions we'll need to call.
 	Newproc = Sysfunc("newproc")
@@ -88,7 +103,7 @@ func initssaconfig() {
 }

 // buildssa builds an SSA function.
-func buildssa(fn *Node) *ssa.Func {
+func buildssa(fn *Node, cache *ssa.Cache) *ssa.Func {
 	name := fn.Func.Nname.Sym.Name
 	printssa := name == os.Getenv("GOSSAFUNC")
 	if printssa {
@@ -116,7 +131,7 @@ func buildssa(fn *Node) *ssa.Func {
 	s.f = ssa.NewFunc(&fe)
 	s.config = ssaConfig
 	s.f.Config = ssaConfig
-	s.f.Cache = ssaCache
+	s.f.Cache = cache
 	s.f.Cache.Reset()
 	s.f.DebugTest = s.f.DebugHashMatch("GOSSAHASH", name)
 	s.f.Name = name

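Taken together, the pgen.go, main.go, and ssa.go changes form a fixed-size worker pool: functions are queued first, then fed over an unbuffered channel to ncpu goroutines, each of which owns one cache slot selected by its shard index. A self-contained sketch of that arrangement follows. The `cache` type and `compileAll` function are hypothetical, and the "compile" step is reduced to incrementing a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for ssa.Cache: scratch state that
// must not be shared between concurrent compiles.
type cache struct{ compiled int }

// compileAll hands fns to n workers over an unbuffered channel and
// returns how many functions each shard handled.
func compileAll(fns []string, n int) []int {
	caches := make([]cache, n)
	work := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			c := &caches[shard] // only this worker touches this slot, so no lock
			for range work {
				c.compiled++
			}
		}(i)
	}
	for _, fn := range fns {
		work <- fn
	}
	close(work) // lets each worker's range loop terminate
	wg.Wait()   // after this, reading caches from here is race-free

	counts := make([]int, n)
	for i := range caches {
		counts[i] = caches[i].compiled
	}
	return counts
}

func main() {
	total := 0
	for _, c := range compileAll([]string{"f", "g", "h", "i", "j"}, 2) {
		total += c
	}
	fmt.Println(total) // prints 5
}
```

How the work splits between shards depends on scheduling, so only the total is deterministic. Giving each worker its own cache is also the source of the extra allocation the benchmarks attribute to "allocating more ssa.Caches".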
src/cmd/compile/internal/gc/subr.go (+5 -1)

@@ -17,6 +17,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
+	"sync"
 	"unicode"
 	"unicode/utf8"
 )
@@ -28,7 +29,10 @@ type Error struct {

 var errors []Error

-var largeStackFrames []src.XPos // positions of functions whose stack frames are too large (rare)
+var (
+	largeStackFramesMu sync.Mutex
+	largeStackFrames   []src.XPos // positions of functions whose stack frames are too large (rare)
+)

 func errorexit() {
 	flusherrors()
