Compiler-assisted Performance Analysis
Adam Nemet Apple anemet@apple.com
Diagram: the user finds a hotspot and its bottleneck, and asks whether the compiler could apply some optimizations (Compiler Optimization X, Y). On the compiler side, whether an optimization applies depends on legality and on the cost model. To find out what the compiler did, the user can disassemble the code, or use optimization diagnostics.
foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
    accumulate(arr[i], sum);
    ^
-Rpass remarks are useful for targeted low-level debugging, but:
Remarks for hot and cold code are intermixed.
Messages appear in no particular order.
Messages from successful and failed optimizations are mixed together.
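Because -Rpass output is plain diagnostic text, any script that wanted to filter or reorder it would first have to parse it back into fields. A minimal sketch, assuming only the one-line header format shown above; the names REMARK_RE, Remark, parse_remark and parse_output are hypothetical and not part of clang or LLVM:

```python
import re
from typing import NamedTuple, Optional

# Header line of a clang remark, e.g.
#   foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
# The echoed source line and the caret line that follow it do not match.
REMARK_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): remark: "
    r"(?P<msg>.*?)\s*\[(?P<flag>-R[^\]]+)\]$"
)


class Remark(NamedTuple):
    file: str
    line: int
    col: int
    message: str
    flag: str  # e.g. "-Rpass=inline" or "-Rpass-analysis=inline"


def parse_remark(text: str) -> Optional[Remark]:
    """Return a Remark for a header line, or None for any other line."""
    m = REMARK_RE.match(text)
    if m is None:
        return None
    return Remark(m["file"], int(m["line"]), int(m["col"]), m["msg"], m["flag"])


def parse_output(lines):
    """Keep only remark headers, in the order the compiler printed them."""
    return [r for r in map(parse_remark, lines) if r is not None]
```

Usage: feed the compiler's stderr, split into lines, to parse_output; the source echo and caret lines are dropped.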
$ clang -O3 -fsave-optimization-record -c foo.c
$ utils/opt-viewer/opt-viewer.py foo.opt.yaml html
$ open html/foo.c.html
Remarks appear inline, under the source line they refer to.
Each remark shows the name of the pass that emitted it.
Green means the optimization succeeded.
The remark gives further details about the optimization.
The remark is column-aligned with the expression it refers to.
HTML links facilitate further analysis.
Remarks in white are analysis remarks: optimizations can expose interesting analyses.
Red means the optimization failed.
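The inline layout described above can be approximated in plain text. A sketch, assuming each remark is a dict with hypothetical keys kind, pass, line, col and msg, and that the kinds are named Passed, Missed and Analysis (the names LLVM's YAML records use for successful, missed and analysis remarks); this is not opt-viewer's code:

```python
from collections import defaultdict

# Hypothetical colour per remark kind, following the legend above.
COLOUR = {"Passed": "green", "Missed": "red", "Analysis": "white"}


def annotate(source_lines, remarks):
    """Print each remark under the source line it refers to,
    indented so that it starts at the remark's (1-based) column."""
    by_line = defaultdict(list)
    for r in remarks:
        by_line[r["line"]].append(r)
    out = []
    for lineno, text in enumerate(source_lines, start=1):
        out.append(text)
        for r in sorted(by_line[lineno], key=lambda r: r["col"]):
            pad = " " * (r["col"] - 1)
            out.append(f"{pad}[{COLOUR[r['kind']]}] {r['pass']}: {r['msg']}")
    return "\n".join(out)
```

Column alignment falls out of padding with col - 1 spaces, the same convention clang uses when it places the caret under a diagnostic.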
ORE.emit(OptimizationRemarkAnalysis("inline", "CanBeInlined", Call)
         << NV("Callee", Callee) << " can be inlined into "
         << NV("Caller", Caller) << " with cost=" << NV("Cost", IC.getCost())
         << " (threshold=" << NV("Threshold", Threshold) << ")");
Passes in the pipeline, such as the Inliner and the LoopVectorizer, transform the IR and report what they did through the new OptimizationRemarkEmitter, which prints:
foo.c:8:5: remark: accumulate can be inlined into compute_sum with cost=-5 (threshold=487) [-Rpass-analysis=inline]
    accumulate(arr[i], sum);
    ^
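The C++ snippet streams plain strings and named values (NV) into one remark. A Python model of that idea, assuming a remark is a list whose parts are either plain strings or (key, value) pairs: joining the parts gives the human-readable message, while keeping the pairs gives structured arguments a tool can read back. The function render is hypothetical, and recording plain text under a String key is meant to mirror, as far as we know, how LLVM's YAML records store it:

```python
from typing import List, Tuple, Union

# A part is plain text, or a named value (key, value) like NV("Callee", Callee).
Part = Union[str, Tuple[str, object]]


def render(parts: List[Part]):
    """Return (human-readable message, structured argument list)."""
    message, args = [], []
    for part in parts:
        if isinstance(part, tuple):
            key, value = part
            message.append(str(value))
            args.append({key: value})      # value stays machine-readable
        else:
            message.append(part)
            args.append({"String": part})  # plain text kept as-is
    return "".join(message), args


msg, args = render([
    ("Callee", "accumulate"), " can be inlined into ",
    ("Caller", "compute_sum"), " with cost=", ("Cost", -5),
    " (threshold=", ("Threshold", 487), ")",
])
```

The same parts therefore serve both the -Rpass text output and a structured record, without the pass author writing the message twice.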
The OptimizationRemarkEmitter can also write the same remarks to a YAML file. Saving optimization records enables source line debug info (-gline-tables-only), so each record carries a source location:
Pass: inline
Name: CanBeInlined
DebugLoc: { File: s.cc, Line: 8, Column: 5 }
Function: compute_sum
Args:
  DebugLoc: { File: s.cc, Line: 1, Column: 0 }
  DebugLoc: { File: s.cc, Line: 5, Column: 0 }
  ...
utils/opt-viewer/opt-viewer.py reads the YAML and generates the HTML pages (index.html, foo.o.html).
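A report generator like this starts by grouping the records per source file, one page per file plus an index. A sketch, assuming each record has already been loaded into a dict with the Pass, Name, DebugLoc and Function fields shown above (loading the YAML itself is left out); group_by_file and index_entries are hypothetical helpers, not opt-viewer's API:

```python
from collections import defaultdict


def group_by_file(records):
    """Map each source file to its remarks, ordered by (line, column)."""
    pages = defaultdict(list)
    for r in records:
        pages[r["DebugLoc"]["File"]].append(r)
    for remarks in pages.values():
        remarks.sort(key=lambda r: (r["DebugLoc"]["Line"], r["DebugLoc"]["Column"]))
    return dict(pages)


def index_entries(pages):
    """One summary line per file, for an index page."""
    return [f"{f}: {len(rs)} remark(s)" for f, rs in sorted(pages.items())]
```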
Problem: the output is noisy, because most of this code is not hot. Solution: sort the remarks by hotness.
To do that, the OptimizationRemarkEmitter uses LazyBlockFrequencyInfo, which computes BlockFrequencyInfo on demand, and each YAML record gains a Hotness field:
Pass: inline
Name: CanBeInlined
DebugLoc: { File: s.cc, Line: 8, Column: 5 }
Function: compute_sum
Hotness: 3
Args:
  DebugLoc: { File: s.cc, Line: 1, Column: 0 }
  DebugLoc: { File: s.cc, Line: 5, Column: 0 }
  ...
Hotness is shown relative to the maximum hotness, NOT as a percentage of total time.
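A minimal sketch of that ranking, assuming records are dicts with a hypothetical Name key and the Hotness field shown above (records without Hotness are treated as 0). Because each percentage is relative to the hottest remark, the values do not add up to 100:

```python
def rank_by_hotness(records):
    """Sort remarks hottest first and attach hotness as a percentage of
    the hottest remark (not of total run time)."""
    # "or 1" avoids dividing by zero when no record has a positive hotness.
    hottest = max((r.get("Hotness", 0) for r in records), default=0) or 1
    ranked = sorted(records, key=lambda r: r.get("Hotness", 0), reverse=True)
    return [(r["Name"], 100.0 * r.get("Hotness", 0) / hottest) for r in ranked]
```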
Passes emitting remarks: Function Inliner, Loop Vectorizer, Loop Unroller, LoopDataPrefetch, LICM, GVN, Loop Idiom, Loop Deletion, SLP Vectorizer; more to follow.
Do the remarks give sufficient detail to reconstruct what happened?
Inlining Context
Not modified via LinP, maybe writes through other pointers
Not modified via LinD, maybe writes through other pointers
Reads and writes don’t alias
Loop versioning with array overlap checks?
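Loop versioning with an overlap check duplicates the loop and lets a runtime check pick the version: if the written range cannot overlap the loaded address, the load is loop-invariant and can be hoisted; otherwise the original loop runs. A conceptual sketch in Python, with a flat list standing in for memory and offsets standing in for pointers; ranges_overlap and add_invariant are hypothetical names, and this is not how LLVM's LoopVersioningLICM is implemented:

```python
def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if [a_start, a_start+a_len) and [b_start, b_start+b_len) share an element."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def add_invariant(mem, dst, src, n):
    """Model of `for (i = 0; i < n; i++) dst[i] += *src;` where dst and src
    are offsets into the same flat buffer `mem`."""
    if not ranges_overlap(dst, n, src, 1):
        # Fast version: no store in the loop can change *src, so the load
        # is loop-invariant and is hoisted out of the loop (as LICM would).
        v = mem[src]
        for i in range(n):
            mem[dst + i] += v
    else:
        # Original version: a store may overwrite *src, so reload it each time.
        for i in range(n):
            mem[dst + i] += mem[src]
```

When the ranges do overlap, hoisting would give a different result, which is why the check, rather than an assumption, guards the fast version.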
LICM-based LoopVersioning (-enable-loop-versioning-licm)
Performance opportunity if we can improve this pass.
Approximate the opportunity by manually modifying the source.
http://lab.llvm.org:8080/artifacts/opt-view_test-suite
Code author tool and compiler developer tool
Now: initial version on LLVM trunk
New tools using optimization records
same object consider using ‘restrict’
and suggest:
loop transformations, non-temporal stores)
SELECT benchmark, hotspot, hotness
FROM optimizations
WHERE pass = 'licm' AND type = 'missed'
  AND name = 'LoadWithLoopInvariantAddressInvalidated'
ORDER BY hotness
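The query can be tried against an in-memory SQLite database using Python's standard sqlite3 module. The table layout below is an assumption inferred from the column names in the query, and the rows are placeholders, not measured data. This sketch adds DESC so the hottest entries come first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per optimization remark.
conn.execute("""CREATE TABLE optimizations (
    benchmark TEXT, hotspot TEXT, hotness INTEGER,
    pass TEXT, type TEXT, name TEXT)""")
rows = [  # placeholder rows for illustration only
    ("bench_a", "f.c:10", 40, "licm", "missed", "LoadWithLoopInvariantAddressInvalidated"),
    ("bench_b", "g.c:22", 90, "licm", "missed", "LoadWithLoopInvariantAddressInvalidated"),
    ("bench_a", "f.c:31", 70, "licm", "passed", "Hoisted"),
]
conn.executemany("INSERT INTO optimizations VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """SELECT benchmark, hotspot, hotness FROM optimizations
WHERE pass = 'licm' AND type = 'missed'
  AND name = 'LoadWithLoopInvariantAddressInvalidated'
ORDER BY hotness DESC"""
result = conn.execute(query).fetchall()
```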
in the hottest code
Look at the loads
Look at the stores
Can ‘m’ and ‘n’ really alias?
Probably not!
exon_p_t m = mCol->e.exon[i];
We need to use ‘restrict’