Controlling Virtual Register Pressure in LLVM Middle-End
Ivan Baev

Outline
Motivation
Related work
Register pressure background
LLVM LICM
LLVM GVN
Performance results
Future work: LLVM Inliner
Summary

Motivation
When we compared LTO -Ofast with -Ofast performance, we saw a 10% degradation in the spec2000/crafty benchmark
Analysis revealed the impact of the following LLVM passes
  − Inliner, LICM, and GVN
Extra spill code and additional execution time in Evaluate() and Swap()
Increased register pressure
This scenario is typical for other compilers too
  − When enabling a new optimization, or increasing the optimization level

Related work
Register pressure has been a known problem for compiler/performance engineers
  − Mismatch between the infinite number of virtual registers and the fixed number of physical registers
A lot of work to handle register pressure (RP) at machine-level IR: register allocator, scheduler, and related passes
  − Rematerialization (Briggs, 1992)
  − Fighting register pressure in GCC (Makarov, 2004)
  − Prematerialization (Baev, Hank, Gross, 2006)
  − Region-based register allocation (Baev, 2009)
  − Register pressure-aware scheduling (Touati, 2001; Govindarajan, 2003)
  − LLVM register pressure tracking and RP-aware scheduling (Trick, 2012)

Related work (cont.)
Some work at higher-level IR: LICM, PRE, and loop transformations
  − Register pressure sensitive redundancy elimination (Gupta, Bodik, 1999)
  − Register pressure guided unroll-and-jam (Ma, Carr, 2008)
  − Model-based framework: an approach for profit-driven optimization (Zhao, Childers, Soffa, 2005)
  − Inlining: most papers acknowledge the problem of register pressure but do not address it directly (Zhao, Amaral, 2003; Chakrabarty, Liu, 2006)
Handling register pressure at a single place in the compiler (e.g. in the register allocator or scheduler) is usually not enough
This talk will focus on middle-end, target-independent optimizations

Virtual register pressure
At a given program point
  − the number of overlapping live ranges at that point
For a basic block (BB)/loop/function
  − the highest register pressure over all program points in the BB/loop/function
For a BB/loop (in this work)
  − the number of liveins for the BB/loop
Integer RP, floating-point RP, predicate RP, etc.
It is an approximation
  − Sources of approximation: register promotion of memory, calls, register pairs, etc.
  − Tradeoff between precision and compile-time, e.g. better precision requires data-flow analysis
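
The first two definitions above can be made concrete with a small sketch. It assumes live ranges are already known and are given as half-open intervals over instruction indices; the `LiveRange` type and both function names are hypothetical and are not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: a live range covers program points [start, end).
struct LiveRange {
  int start;
  int end;
};

// Register pressure at one program point: the number of live ranges
// that overlap that point.
int pressureAt(const std::vector<LiveRange> &ranges, int point) {
  int live = 0;
  for (const LiveRange &r : ranges)
    if (r.start <= point && point < r.end)
      ++live;
  return live;
}

// Register pressure of a region [begin, end): the highest pressure
// over all program points in the region.
int maxPressure(const std::vector<LiveRange> &ranges, int begin, int end) {
  assert(begin <= end && "region bounds must be ordered");
  int highest = 0;
  for (int p = begin; p < end; ++p)
    highest = std::max(highest, pressureAt(ranges, p));
  return highest;
}
```

The livein count used in this work skips the per-point scan, which is one way to trade precision for compile time.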

Our approach
Analyze a pass and its components w.r.t. register pressure
  − Study the code, collect statistics
Add a measure of register pressure to control the component(s) with a high impact
Allow a component/pass to be invoked multiple times
Include a comparison with the number of hardware registers of the corresponding type for the target processor

Register pressure analysis of LLVM LICM pass
Loop-level pass with three components
  − Sinking code
  − Hoisting code
  − Promotion of memory locations

Register pressure analysis of LLVM LICM pass
Loop-level pass with three components
  − Sinking code: not likely to substantially impact RP
  − Hoisting code: may impact RP
Instructions to be hoisted and the impact on RP for the loop
  %528 = load i64* @rank_mask.1
  %527 = load i64* getelementptr inbounds (%struct.CHESS_POSITION.86* @search, i32 0, i32 7)
    Both instructions increase RP (# liveins) by 1
  %tobool1418 = icmp ne i64 %and1417, 0    // assume %and1417 is only used in this instruction
    No change in RP
  %and1417 = and i64 %518, %517    // assume %518 and %517 are only used in this instruction
    Decrease RP by 1

Register pressure analysis of LLVM LICM pass
Loop-level pass with three components
  − Sinking code: not likely to impact RP (# liveins for the loop)
  − Hoisting code: may impact RP
  − Promotion of memory locations: not likely to substantially increase RP

Before promotion (livein: a)      After promotion (livein: y)
  y = ld [a]                        y = ld [a]
  y1 = y + Expr                     y1 = phi (y, y2)
  st [a] = y1                       y2 = y1 + Expr
                                    st [a] = y1
// removed a, but added y to liveins
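
The promotion example above can be mirrored at source level. This is only an illustrative sketch: the function names are hypothetical, `Expr` is modeled as one value per iteration, and it assumes nothing else in the loop may read or write the promoted location.

```cpp
#include <cassert>
#include <vector>

// Before promotion: the location *a is loaded and stored on every
// iteration, so the address a is live into the loop.
void accumulateInMemory(long *a, const std::vector<long> &exprs) {
  assert(a != nullptr);
  for (long e : exprs) {
    long y = *a;     // y = ld [a]
    long y1 = y + e; // y1 = y + Expr
    *a = y1;         // st [a] = y1
  }
}

// After promotion: the value is loaded once before the loop and kept
// in a scalar, so y replaces a as the loop livein.
void accumulatePromoted(long *a, const std::vector<long> &exprs) {
  assert(a != nullptr);
  long y = *a; // single load before the loop
  for (long e : exprs)
    y = y + e; // loop-carried scalar, the phi in the IR form
  *a = y;      // single store after the loop
}
```

Both versions leave the same value in memory; one livein is exchanged for another, which matches the slide's claim that promotion is unlikely to raise RP substantially.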

Implementation of LICM RP heuristic

int MaxLIs    // Max number of new liveins allowed for hoisting for the loop
int NumLIs    // Current number of new liveins for the loop

estimateRegisterPressure (Loop *L) {
  unsigned MaxLiveIns = TTI->getNumberOfRegisters(false)
  Set LiveIns
  Iterate over all BBs in L
    Iterate over all instructions in BB
      Iterate over source operands in Instruction
        if (Operand is of integer or pointer type)
          if ((OperandValue is defined outside L) || (OperandValue is argument or global variable))
            LiveIns.insert(OperandValue)
  NumLIs = 0
  if (LiveIns.cardinality >= MaxLiveIns)
    MaxLIs = 0
  else
    MaxLIs = MaxLiveIns - LiveIns.cardinality
}
// also, provision for user-defined MaxLiveIns (not shown)
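
The pseudocode above can be sketched as compilable C++ over a toy IR. All types here (`Value`, `Instruction`, `Block`, `Loop`, `HoistBudget`) are hypothetical stand-ins rather than LLVM classes, and the register count is passed in as a parameter instead of being queried from the target.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR, just enough to count loop liveins.
enum class Kind { Integer, Pointer, Float };

struct Value {
  Kind kind;
  int defBlock; // id of the defining block; -1 for arguments and globals
};

struct Instruction {
  std::vector<const Value *> operands;
};

struct Block {
  int id;
  std::vector<Instruction> insts;
};

struct Loop {
  std::vector<Block> blocks;
  bool contains(int blockId) const {
    for (const Block &b : blocks)
      if (b.id == blockId)
        return true;
    return false;
  }
};

struct HoistBudget {
  int maxNewLiveIns; // MaxLIs in the pseudocode
  int numNewLiveIns; // NumLIs in the pseudocode
};

// Counts distinct integer/pointer operands that come from outside the
// loop and derives how many new liveins hoisting may still add.
HoistBudget estimateRegisterPressure(const Loop &loop, unsigned maxLiveIns) {
  std::set<const Value *> liveIns;
  for (const Block &bb : loop.blocks)
    for (const Instruction &inst : bb.insts)
      for (const Value *op : inst.operands) {
        assert(op != nullptr);
        if (op->kind == Kind::Float)
          continue; // only integer and pointer operands are counted
        if (op->defBlock < 0 || !loop.contains(op->defBlock))
          liveIns.insert(op);
      }
  HoistBudget budget{0, 0};
  if (liveIns.size() < maxLiveIns)
    budget.maxNewLiveIns = static_cast<int>(maxLiveIns - liveIns.size());
  return budget;
}
```

Using a set means an outside value used by several instructions counts as one livein, as `LiveIns.insert` does in the pseudocode.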

Implementation of LICM RP heuristic (cont.)

bool doesReducePressure (Instruction &I, Loop *L, int &NumLiveInReduce) {
  NumLiveInReduce = -1    // start with -1 due to hoisting I's destination operand
  Iterate over all source operands of I
    if (Operand is of integer or pointer type)
      if ((OperandValue is defined outside L) || (OperandValue is argument or global variable))
        if (OperandValue has one use)
          NumLiveInReduce++
  if (NumLiveInReduce >= 1)
    return true
  else
    return false
}

hoist (Instruction &I) {
  bool ReducePressure = doesReducePressure(I, L, NumLiveInReduced)
  if ((NumLIs >= MaxLIs) && !ReducePressure)
    return    // skip hoisting
  . . .
  hoist I
  NumLIs -= NumLiveInReduced    // keep track of loop's liveins
}
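
A minimal sketch of the gating logic above, assuming each source operand has already been classified. The `Operand` flags, `HoistBudget`, and the function names are hypothetical; real LICM would derive this information from the IR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pre-classified view of one source operand.
struct Operand {
  bool intOrPtr;           // integer or pointer type
  bool definedOutsideLoop; // defined outside the loop, argument, or global
  int numUses;             // total number of uses of the value
};

struct HoistBudget {
  int maxNewLiveIns; // MaxLIs in the pseudocode
  int numNewLiveIns; // NumLIs in the pseudocode
};

// Net change in loop liveins if the instruction is hoisted: starts at -1
// because the hoisted result becomes a new livein, and gains 1 for every
// outside operand whose only use is this instruction.
int liveInReduction(const std::vector<Operand> &sources) {
  int reduce = -1;
  for (const Operand &op : sources)
    if (op.intOrPtr && op.definedOutsideLoop && op.numUses == 1)
      ++reduce;
  return reduce;
}

// Returns true if hoisting is allowed, and updates the livein count.
bool tryHoist(HoistBudget &budget, const std::vector<Operand> &sources) {
  assert(budget.maxNewLiveIns >= 0);
  int reduce = liveInReduction(sources);
  bool reducesPressure = reduce >= 1;
  if (budget.numNewLiveIns >= budget.maxNewLiveIns && !reducesPressure)
    return false; // skip hoisting
  budget.numNewLiveIns -= reduce;
  return true;
}
```

With these definitions, a load whose address operand has other uses gives a reduction of -1 (one more livein), a compare with one single-use operand gives 0, and an `and` with two single-use operands gives 1, matching the three hoisting cases shown earlier.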

Register pressure analysis of LLVM GVN pass
Function-level pass with two major components
GVN part
  − processBlock() -> processInstruction()
  − SimplifyInstruction
  − processLoad() -> processNonLocalLoad
  − processBranch
  − processSwitch
  − ProcessOtherInstruction
PRE part
  − Simple local PRE on diamond control-flow patterns

Implementation of GVN RP heuristic
GVN mostly operates on a BB basis
Estimate RP for the basic block enclosing the load
  − estimateRegisterPressure(BB)
Estimate RP for the loop enclosing the load
  − estimateRegisterPressure(Loop)    // similar to the version in LICM
Using loop-based RP performs better
  if (estimateRegisterPressure(Loop) >= TTI->getNumberOfRegisters(false))
    skip processing/promoting this load

Performance evaluation of RP heuristics

Benchmark   Speedup with LICM-RP   Speedup with GVN-RP   Speedup with both
            over LTO (%)           over LTO (%)          over LTO (%)
ammp         1.7                    0.5                   1.6
bzip2       -0.8                   -2.1                  -1.5
crafty       2.5                    1.4                   4.3
equake       0.4                    1.6                   1.9
mesa         9.1                    3.8                   5.7
twolf        3.1                    3.1                   3.0
vpr          2.2                    1.6                   2.4

With QC LLVM compiler, on Nexus4 Android devices, ARMv7, thumb mode
Good improvements also in ARM mode and for Hexagon processors, for both -Ofast and LTO optimization levels

Controlling RP in Inliner (future work)
Calculate maxRP for each function, traversing the call graph bottom-up
  − At each call site add the callee's RP to the current RP of the caller
Add RP at call site to the goodness factor (ranking) of the call site
  − In the denominator, as a cost
  − Add extra cost if RP exceeds the number of hardware registers for the target
Likely no need to update maxRP for a function after inlining any of the call sites within the function
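
Since the slide only outlines future work, the following is a speculative sketch of one way the idea could look, not the author's design. Every name and the exact cost formula are assumptions: functions are assumed to be listed callees-first (no recursion), a function's maxRP is taken as the larger of its own peak and any call-site RP plus the callee's maxRP, and the penalty for exceeding the register count is a made-up linear term.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical call-graph model.
struct CallSite {
  std::size_t callee; // index of the callee in the function list
  int pressureAtCall; // caller's RP at the call site
};

struct Function {
  int localMaxRP; // peak RP of the function body alone
  std::vector<CallSite> calls;
};

// Bottom-up pass: functions must be ordered so that every callee
// appears before its callers.
std::vector<int> computeMaxRP(const std::vector<Function> &fns) {
  std::vector<int> maxRP(fns.size(), 0);
  for (std::size_t i = 0; i < fns.size(); ++i) {
    int best = fns[i].localMaxRP;
    for (const CallSite &cs : fns[i].calls) {
      assert(cs.callee < i && "callees must precede callers");
      best = std::max(best, cs.pressureAtCall + maxRP[cs.callee]);
    }
    maxRP[i] = best;
  }
  return maxRP;
}

// Assumed ranking: benefit divided by a cost that includes the RP at
// the call site, plus an extra penalty once RP exceeds numRegs.
double goodness(double benefit, double sizeCost, int rpAtCallSite,
                int numRegs, double penaltyPerExtraReg) {
  double cost = sizeCost + rpAtCallSite;
  if (rpAtCallSite > numRegs)
    cost += penaltyPerExtraReg * (rpAtCallSite - numRegs);
  assert(cost > 0.0);
  return benefit / cost;
}
```

Here `rpAtCallSite` would be the caller's RP at the call plus the callee's maxRP, so call sites that push RP past the hardware register count rank lower.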

Summary
The register pressure problem is likely to stay
  − Newer generation processors feature more registers; however, compiler engineers quickly make the extra registers insufficient
We presented a general approach and specific heuristics for controlling register pressure in LLVM LICM, GVN, and Inliner passes
  − Will upstream RP patches if there is interest
  − Unroller is another candidate for RP tuning in the middle-end
Compiler optimizations should be designed to take into account register pressure
  − Simple heuristics are good in many cases
As a community, continue improving RP in machine-level passes
  − Compute and report the sum of weighted spills when profile information is available

Acknowledgements
Zhaoshi Zheng, Balaram Makam, Yin Ma, Taylor Simpson, QuIC
Any questions?
