Compiler-assisted Performance Analysis

Adam Nemet, Apple (anemet@apple.com)

[Diagram: User vs. Compiler. User side: Hotspot, Bottleneck, Optimization X, Y. Compiler side: Hotspot, Legality, Cost Model, Some Optimizations? Successive builds label the link between the two sides: Disassemble, -debug-only, Optimization Diagnostics.]

Optimization Diagnostics in LLVM
• Supported in LLVM
• Only a small number of passes emit them
• -Rpass options to enable them in the compiler output

    foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
        accumulate(arr[i], sum);
        ^

• For large programs, the output of -Rpass is noisy and unstructured

• Messages appear in no particular order
• Remarks for hot and cold code are intermixed
• Messages from successful and failed optimizations are dumped together

How can we make this information accessible and actionable?

Wish List
• All in one place: Optimizations Dashboard
• At a glance: See high-level interaction between optimizations for targeted low-level debugging
• Filtering: Noise level should be minimized by focusing on the hot code
• Integration: Display hot code and the optimizations side-by-side

opt-viewer

Approach
• Extend existing optimization remark infrastructure
• Add the new optimizations
• Add ability to output remarks to a data file
• Visualize data in HTML
• Targeting compiler developers initially

Example

Work Flow

    $ clang -O3 -fsave-optimization-record -c foo.c
    $ utils/opt-viewer/opt-viewer.py foo.opt.yaml html
    $ open html/foo.c.html

Successful Optimizations
• Remarks appear inline under the referenced line
• Green for successful optimization
• Name of the pass
• Further details about the optimization
• Column aligned with the expression
• HTML link to facilitate further analysis
• Optimizations can expose interesting analyses
• Remarks in white are Analysis remarks

Missed Optimizations
• Red means failed optimization

LLVM Changes

[Diagram, with a legend marking old and new components: pass pipeline IR -> Inliner -> LoopVectorizer -> IR, with OptimizationRemarkEmitter and YAML output.]

    ORE.emit(OptimizationRemarkAnalysis("inline", "CanBeInlined", Call)
             << NV("Callee", Callee) << " can be inlined into "
             << NV("Caller", Caller) << " with cost=" << NV("Cost", IC.getCost())
             << " threshold=" << NV("Threshold", Threshold));

With -Rpass-analysis=inline:

    foo.c:8:5: remark: accumulate can be inlined into compute_sum with cost=-5 (threshold=487) [-Rpass-analysis=inline]
        accumulate(arr[i], sum);
        ^

-fsave-optimization-record enables source line debug info (-gline-tables-only) and writes YAML:

    --- !Analysis
    Pass:            inline
    Name:            CanBeInlined
    DebugLoc:        { File: s.cc, Line: 8, Column: 5 }
    Function:        compute_sum
    Args:
      - Callee:          accumulate
        DebugLoc:        { File: s.cc, Line: 1, Column: 0 }
      - String:          ' can be inlined into '
      - Caller:          compute_sum
        DebugLoc:        { File: s.cc, Line: 5, Column: 0 }
      - String:          ' with cost='
      - Cost:            '-5'
      - String:          ' (threshold='
      - Threshold:       '487'
      - String:          ')'
    ...
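To make the relationship between the YAML record and the one-line remark concrete, here is a minimal sketch of how a viewer might rebuild the message text from the `Args` list. The field names follow the record shown above; the in-memory dict layout and the helper name `render_message` are assumptions made for illustration and are not taken from opt-viewer itself.

```python
# Hypothetical in-memory form of the remark record shown above. The field
# names come from the YAML; representing it as a plain dict is an assumption.
remark = {
    "Pass": "inline",
    "Name": "CanBeInlined",
    "DebugLoc": {"File": "s.cc", "Line": 8, "Column": 5},
    "Function": "compute_sum",
    "Args": [
        {"Callee": "accumulate",
         "DebugLoc": {"File": "s.cc", "Line": 1, "Column": 0}},
        {"String": " can be inlined into "},
        {"Caller": "compute_sum",
         "DebugLoc": {"File": "s.cc", "Line": 5, "Column": 0}},
        {"String": " with cost="},
        {"Cost": "-5"},
        {"String": " (threshold="},
        {"Threshold": "487"},
        {"String": ")"},
    ],
}


def render_message(remark):
    """Concatenate the argument values in order, skipping per-argument DebugLoc."""
    parts = []
    for arg in remark["Args"]:
        for key, value in arg.items():
            if key != "DebugLoc":
                parts.append(str(value))
    return "".join(parts)


print(render_message(remark))
```

Keeping each argument as a named value rather than a preformatted string is what lets a tool link `Callee` or `Caller` to its own source location.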

opt-viewer

[Diagram, with a legend marking old and new components: YAML -> utils/opt-viewer/opt-viewer.py -> index.html, foo.o.html]

Index
• Noisy: most of this code is not hot
• Sort by hotness

Use PGO for Hotness

[Diagram, with a legend marking old and new components: pass pipeline IR -> Inliner -> LoopVectorizer -> IR, with OptimizationRemarkEmitter, LazyBlockFrequencyInfo, BlockFrequencyInfo, and YAML output.]

    --- !Analysis
    Pass:            inline
    Name:            CanBeInlined
    DebugLoc:        { File: s.cc, Line: 8, Column: 5 }
    Function:        compute_sum
    Hotness:         3
    Args:
      - Callee:          accumulate
        DebugLoc:        { File: s.cc, Line: 1, Column: 0 }
      - String:          ' can be inlined into '
      - Caller:          compute_sum
        DebugLoc:        { File: s.cc, Line: 5, Column: 0 }
      - String:          ' with cost='
      - Cost:            '-5'
      - String:          ' (threshold='
      - Threshold:       '487'
      - String:          ')'
    ...

Hotness
• Relative to maximum hotness, NOT total time %
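As a rough illustration of "relative to maximum hotness", the sketch below ranks remarks by their `Hotness` field and expresses each one as a percentage of the hottest remark. The sample records, the function names, and the choice to treat a missing `Hotness` as 0 are assumptions for this example; opt-viewer's actual code may differ.

```python
# Illustrative remark records; only the Hotness field matters here.
# The function names and values are made up for the example.
remarks = [
    {"Pass": "inline", "Function": "compute_sum", "Hotness": 3},
    {"Pass": "loop-vectorize", "Function": "hot_loop", "Hotness": 12},
    {"Pass": "inline", "Function": "cold_path"},  # no profile data recorded
]


def relative_hotness(hotness, max_hotness):
    """Hotness as a percentage of the hottest remark, not of total run time."""
    if not max_hotness:
        return 0
    return round(100 * hotness / max_hotness)


def rank_by_hotness(remarks):
    """Return (percent, remark) pairs, hottest first."""
    max_hotness = max((r.get("Hotness", 0) for r in remarks), default=0)
    ranked = [(relative_hotness(r.get("Hotness", 0), max_hotness), r)
              for r in remarks]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


for percent, r in rank_by_hotness(remarks):
    print("{:>3}%  {:<15} {}".format(percent, r["Pass"], r["Function"]))
```

Because the scale is anchored to the hottest remark, the percentages do not add up to 100 and should not be read as a share of total execution time.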

Optimizations Recorded
• LICM
• Function Inliner
• GVN
• Loop Vectorizer
• Loop Idiom
• Loop Unroller
• Loop Deletion
• LoopDataPrefetch
• SLP Vectorizer
• … more to follow

Test Drive on LLVM test suite

Improve & Evaluate
1. Does the information presented in this high-level view contain sufficient detail to reconstruct what happened?
2. Can we discover the interactions between optimizations?
3. With the improved visibility, can we quickly find real performance opportunities?

DhryStone (SingleSource/Benchmark): Interaction of Optimizations

DhryStone: Inlining Context

DhryStone: Summary
• Without low-level debugging, quickly reconstructed what happened
• Even though it involved interaction between multiple optimizations
  • Inlining and Alias Analysis/GVN
• Missed optimizations: extra analysis needed to manage false positives
  1. Filter trivially false positives
  2. Expose enough information for quick detection by user

Freebench/distray (MultiSource/Benchmarks): Finding Performance Opportunity

Not modified via LinP, maybe writes through other pointers

Not modified via LinD, maybe writes through other pointers

Reads and writes don’t alias

Loop versioning with array overlap checks?

LICM-based LoopVersioning (-enable-loop-versioning-licm)
• Performance opportunity if we can improve this pass
• Approximate the opportunity by manually modifying the source
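The idea behind loop versioning with an overlap check can be sketched outside the compiler. The example below is a Python stand-in for the transformation: neither the distray source nor LLVM's pass. All names are hypothetical. One version of the loop reloads a value on every iteration because a write might change it; the versioned form checks at run time whether the read location overlaps the written range and, if it does not, hoists the load out of the loop.

```python
def scale_unversioned(buf, dst_start, n, factor_idx):
    """Baseline: reload buf[factor_idx] every iteration, since a write may alias it."""
    for i in range(n):
        buf[dst_start + i] = buf[dst_start + i] * buf[factor_idx]


def scale_versioned(buf, dst_start, n, factor_idx):
    """Versioned loop: a run-time overlap check selects the fast or the safe loop."""
    overlaps = dst_start <= factor_idx < dst_start + n
    if not overlaps:
        # Fast version: the read cannot be clobbered, so hoist it (the LICM step).
        factor = buf[factor_idx]
        for i in range(n):
            buf[dst_start + i] = buf[dst_start + i] * factor
    else:
        # Safe version: keep the original semantics when the ranges overlap.
        scale_unversioned(buf, dst_start, n, factor_idx)


# No overlap: both forms agree and the fast loop is taken.
a, b = [2, 1, 2, 3], [2, 1, 2, 3]
scale_unversioned(a, 1, 3, 0)
scale_versioned(b, 1, 3, 0)
print(a, b)

# Overlap: the check falls back to the original loop, so results still agree.
c, d = [1, 2, 3], [1, 2, 3]
scale_unversioned(c, 0, 3, 1)
scale_versioned(d, 0, 3, 1)
print(c, d)
```

Manually rewriting the source into the hoisted form, as the slide suggests, approximates what the fast version would gain if the pass could prove or check the absence of overlap.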

Dynamic Instruction Count
• Reduced by 11%
• Performance headroom: 11%

Freebench/distray: Summary
• Found optimization opportunity while staying in the high-level view
• Reconstructed the reason for missed optimization
• High-level view exposed that the gain may be substantial
• Got immediate feedback of the desired effect on the prototype
• Identified the pass for low-level debugging

Check Out More Examples
http://lab.llvm.org:8080/artifacts/opt-view_test-suite

Development Timeline
• Compiler Developer Tool: initial version on LLVM trunk (now)
• Code Author Tool
• New tools using Optimization Records

Compiler Developer Tool: Status
• Written in Python
• Hook up new passes
• Improve diagnostics quality for existing passes
• Perform extra analysis for insightful messages
• Improve UI

Request for Help
