look into garble functions showing up in "perf report" #475

mvdan · 2022-01-22T15:15:28Z

I just captured a Linux perf CPU profile from the build in #472 with an empty cache:

$ rm -rf /tmp/gocache; GOCACHE=/tmp/gocache GOMAXPROCS=1 GOGARBLE=* perf record -g garble build testdata/bench/main.go
$ perf report

If we just look at the main.main funcs, we see:

+   69.97%     0.00%  compile  compile  [.] main.main                                                                                                                                              
+    8.17%     0.00%  garble   garble   [.] main.main                                                                                                                                              
+    0.59%     0.00%  asm      asm      [.] main.main                                                                                                                                              
     0.25%     0.00%  link     link     [.] main.main                                                                                                                                              
     0.23%     0.00%  go       go       [.] main.main                                                                                                                                              
     0.02%     0.00%  cgo      cgo      [.] main.main

The first column is cumulative CPU cost as a percentage. It seems like compile takes ~70% of the CPU, garble takes ~8%, ~1% goes to the other toolchain binaries, and presumably the other ~20% is other overhead like blocking I/O or syscalls.

This seems like a good start; garble itself is not being horribly wasteful with CPU.

Then, zooming into garble with d to skip the CPU cost from compile and link, we see:

+    8.75%     0.00%  garble   [.] runtime.goexit.abi0                                                                                                                                             
+    8.35%     0.00%  garble   [.] runtime.main                                                                                                                                                    
+    8.17%     0.00%  garble   [.] main.main                                                                                                                                                       
+    8.17%     0.00%  garble   [.] main.main1                                                                                                                                                      
+    8.17%     0.00%  garble   [.] main.mainErr                                                                                                                                                    
+    7.92%     0.00%  garble   [.] main.transformCompile                                                                                                                                           
+    2.58%     0.02%  garble   [.] main.(*transformer).typecheck                                                                                                                                   
+    2.56%     0.01%  garble   [.] main.printFile                                                                                                                                                  
+    2.46%     2.46%  garble   [.] go/printer.walkBinary                                                                                                                                           
+    2.41%     0.00%  garble   [.] go/types.(*Config).Check                                                                                                                                        
+    2.41%     0.00%  garble   [.] go/types.(*Checker).checkFiles                                                                                                                                  
+    2.28%     0.72%  garble   [.] runtime.mallocgc                                                                                                                                                
+    1.83%     0.01%  garble   [.] runtime.systemstack.abi0                                                                                                                                        
+    1.68%     0.01%  garble   [.] go/printer.(*Config).fprint                                                                                                                                     
+    1.53%     0.01%  garble   [.] go/printer.(*printer).printNode                                                                                                                                 
+    1.53%     0.00%  garble   [.] go/printer.(*printer).file                                                                                                                                      
+    1.52%     0.00%  garble   [.] go/printer.(*printer).declList                                                                                                                                  
+    1.51%     0.00%  garble   [.] go/printer.(*printer).decl                                                                                                                                      
+    1.46%     0.00%  garble   [.] go/types.(*Checker).rawExpr                                                                                                                                     
+    1.42%     0.06%  garble   [.] golang.org/x/tools/go/ast/astutil.(*application).apply                                                                                                          
+    1.42%     0.00%  garble   [.] main.(*transformer).transformGo                                                                                                                                 
+    1.42%     0.00%  garble   [.] golang.org/x/tools/go/ast/astutil.Apply                                                                                                                         
+    1.42%     0.02%  garble   [.] golang.org/x/tools/go/ast/astutil.(*application).applyList                                                                                                      
+    1.37%     0.00%  garble   [.] go/parser.ParseFile                                                                                                                                             
+    1.37%     0.03%  garble   [.] go/types.(*Checker).exprInternal                                                                                                                                
+    1.35%     0.00%  garble   [.] go/parser.(*parser).parseFile                                                                                                                                   
+    1.30%     0.72%  garble   [.] runtime.scanobject                                                                                                                                              
+    1.29%     0.01%  garble   [.] go/types.(*Checker).processDelayed                                                                                                                              
+    1.28%     0.00%  garble   [.] go/types.(*Checker).funcDecl.func1                                                                                                                              
+    1.28%     0.00%  garble   [.] go/types.(*Checker).funcBody                                                                                                                                    
+    1.26%     0.01%  garble   [.] go/types.(*Checker).stmtList    
[...]
+    0.91%     0.91%  garble   [.] crypto/sha256.block.abi0       
+    0.76%     0.76%  garble   [.] go/ast.(*BinaryExpr).Pos

Some interesting bits:

2.5% goes to typechecking alone. stop loading type information of obfuscated dependencies #456 should help.
We generate a significant amount of garbage, with mallocgc at 2.3% and scanobject at 1.3%. We should try to do better, though I imagine most of the allocs happen in go/parser and go/types.
printFile alone is responsible for 2.6%, most of which comes from go/printer.WalkBinary. That looks suspicious.
1% is to be blamed on sha256. Should we reuse some of the hashes, or perhaps look at more efficient cryptographic hashes?
0.8% also goes to BinaryExpr, this time for looking up its position. Do we have some massive binary expressions?

This issue is to post this information and act as a reminder to look into these potential optimizations.

The text was updated successfully, but these errors were encountered:

capnspacehook · 2022-01-22T15:54:49Z

A simple suggestion would be to cache hashes of names so we don't waste cpu cycles rehashing the same inputs, but that may or may not help. The extra allocs might hurt perf, will have to test

mvdan · 2022-01-22T16:09:04Z

There are multiple things we could try there:

use a different hash func, perhaps one that doesn't output as many bits
memoize hashes, either all of them or just the ones likely to get reused (e.g. short names)
avoid hashing names we don't need to (e.g. when building binaries, obfuscating local variables is unnecessary, if you keep in mind the potential name clashes with code inserted by literal obfuscation)

mvdan · 2022-01-22T16:10:19Z

Note to self: it would also be useful to obtain a flamegraph representation of where garble is spending its time, to then look at what parts of a build could be parallelized. I imagine a good candidate would be obfuscating the Go files in transformCompile.

pagran · 2022-01-22T16:35:52Z

Of the cryptographic hashes sha1 usually shows good speeds (although it is obsolete)
Bench: https://github.com/SimonWaldherr/golang-benchmarks#hash

But we can use non-cryptographic hashes, such as fnv (example implementation: https://github.com/segmentio/fasthash) 64 bits should be enough, right?

mvdan · 2022-01-22T17:12:46Z

I'd prefer to stick to decent cryptographic hashes; if the hashes are non-cryptographic, then presumably they are easier to bruteforce or break.

One candidate could be blake2, which overall seems slightly better and faster than sha2. I reckon that switch might be worthwhile even if we implement some of the other improvements; they could all help.

capnspacehook · 2022-01-22T23:31:20Z

I would suggest blake3, as it's much faster than blake2 from what I've seen, but there's not an official/reviewed implementation in Go AFAIK

mvdan · 2022-01-24T13:46:21Z

I've found why binary expressions were taking up so much CPU time. I initially thought that the go/printer or go/ast packages were too inefficient, but it turns out they are okay. They are simply dealing with, let's say, a horribly long binary expression:

https://github.com/golang/go/blob/0ef6dd74409506eb084bd8d2fe61e0e70ed9e5a4/src/crypto/elliptic/p256_asm_table.go

With some napkin math, we can see ~1400 lines with 8 + binary expressions per line. That's a binary expression with beyond 10k elements. Which is made worse by how binary expressions are essentially linked lists; walking the entire expression jumps through 10k+ pointers.

I'll be sending a patch upstream to remedy this somewhat. This is a regression from 1.17 via https://go-review.googlesource.com/c/go/+/339591, and I think this was an unintended effect.

mvdan · 2022-01-24T15:51:07Z

I would suggest blake3, as it's much faster than blake2 from what I've seen, but there's not an official/reviewed implementation in Go AFAIK

I slightly lean towards blake2, given that it's had more time to prove itself, and there's an official Go implementation that has likely been well reviewed and optimized. There are a few blake3 implementations out there, but it's hard to judge them.

mvdan · 2022-01-24T15:52:38Z

It's also a good idea to subscribe to golang/go#36632. My guess is that within a couple of years, one of those implementations will be promoted to x/crypto.

If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes burrowers#456. Updates burrowers#475.

If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes #456. Updates #475.

mvdan · 2022-02-08T08:41:39Z

The crypto/elliptic issue is now fixed with https://go-review.googlesource.com/c/go/+/380475.

mvdan · 2022-02-12T10:39:26Z

Some updated numbers:

main.(*transformer).typecheck remains at 2.8%; unlikely that we can bring it down significantly
main.printFile remains at 2.7%; we can probably reduce the amount of work there
mallocgc remains at 2.3%; we can probably reduce some amount of garbage
Nothing related to binary expressions shows up near or above 1% anymore, as expected
crypto/sha256.block.abi0 is now at 0.2%; unclear if the change is just noise.

So it seems to me like the only two places we need to pay attention to in the very short term is printFile and unnecessary garbage allocation. I'll do the former; any help with the latter is appreciated.

mvdan · 2022-06-15T22:22:19Z

go/printer shows up as 3.3% total CPU cost as of today, so I took a look and sent a few CLs upstream to make it ~15% faster. Reviews or thoughts welcome: https://go-review.googlesource.com/c/go/+/412557

Pretty sure I could squeeze another 10-20% out of it once these CLs are merged. I don't want to go any further for now, before reviews.

mvdan mentioned this issue Jan 27, 2022

stop loading obfuscated type information from deps #476

Merged

mvdan mentioned this issue Feb 3, 2022

crypto/elliptic: regression from 1.17 in the cost of parsing and compiling p256_asm_table.go golang/go#50995

Closed

lu4p added the enhancement New feature or request label Oct 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

look into garble functions showing up in "perf report" #475

look into garble functions showing up in "perf report" #475

mvdan commented Jan 22, 2022

capnspacehook commented Jan 22, 2022

mvdan commented Jan 22, 2022

mvdan commented Jan 22, 2022

pagran commented Jan 22, 2022

mvdan commented Jan 22, 2022

capnspacehook commented Jan 22, 2022

mvdan commented Jan 24, 2022

mvdan commented Jan 24, 2022

mvdan commented Jan 24, 2022

mvdan commented Feb 8, 2022

mvdan commented Feb 12, 2022

mvdan commented Jun 15, 2022

look into garble functions showing up in "perf report" #475

look into garble functions showing up in "perf report" #475

Comments

mvdan commented Jan 22, 2022

capnspacehook commented Jan 22, 2022

mvdan commented Jan 22, 2022

mvdan commented Jan 22, 2022

pagran commented Jan 22, 2022

mvdan commented Jan 22, 2022

capnspacehook commented Jan 22, 2022

mvdan commented Jan 24, 2022

mvdan commented Jan 24, 2022

mvdan commented Jan 24, 2022

mvdan commented Feb 8, 2022

mvdan commented Feb 12, 2022

mvdan commented Jun 15, 2022