Revisiting Graph Coloring Register Allocation: A Study of the Chaitin-Briggs and Callahan-Koblenz Algorithms
Keith Cooper, Anshuman Dasgupta, Jason Eckhardt
Presenter: Anshuman Dasgupta

Register Allocation
• Process of mapping values in the program to a limited set of physical registers on the target architecture
  • Program values contained in locations called virtual registers
  • Must handle arbitrarily large number of virtual registers
• Registers are the fastest members in the memory hierarchy
  • Proficient allocation extremely important for application performance
• Most programs contain segments where the number of values exceeds the number of physical registers
  • Allocator must insert loads and stores: spill code
Cooper, Dasgupta, Eckhardt LCPC 2005

Register Allocation
• Spills are memory accesses and therefore expensive
  • Register allocators attempt to minimize the number of spills
• Optimal register allocation is an NP-complete problem
  • Allocation algorithms use heuristics to approximate the optimal solution

Graph Coloring Register Allocation
• Effective approach: use graph coloring to model the allocation problem
• Build an interference graph
  • Construct live ranges by examining program values
  • Live ranges are nodes in the graph
  • Edges between nodes indicate that they cannot share a physical register: the nodes interfere
  • Each color represents a physical register
  • Neighbor nodes cannot share the same color
• The interference graph encodes safety constraints
  • The allocator respects these constraints to preserve program semantics
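
As a rough illustration of the model on this slide, the sketch below builds an interference graph and checks a register assignment against it. It assumes a simplified representation that is not from the talk: each live range is just a set of program points, and two live ranges interfere when those sets overlap. The names `build_interference_graph` and `is_valid_coloring` are hypothetical.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Map each live range to the set of live ranges it interferes with.

    live_ranges: dict of name -> set of program points where the value is
    live (a simplifying assumption made for this sketch).
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        if live_ranges[a] & live_ranges[b]:  # live at a common point
            graph[a].add(b)
            graph[b].add(a)
    return graph


def is_valid_coloring(graph, coloring):
    """A coloring is safe when no two neighbors share a color (register)."""
    return all(coloring[n] != coloring[m] for n in graph for m in graph[n])


live_ranges = {"a": {0, 1, 2}, "b": {1, 2, 3}, "c": {3, 4}, "d": {4, 5}}
graph = build_interference_graph(live_ranges)

# Two registers (colors 0 and 1) are enough for this chain of interferences.
good = {"a": 0, "b": 1, "c": 0, "d": 1}
bad = {"a": 0, "b": 0, "c": 1, "d": 0}
```

Here `a` and `b` overlap, so they are neighbors and `bad` violates the safety constraint, while `good` respects every edge.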

Graph Coloring Register Allocation
• We examine two graph coloring allocation algorithms
  • Popular Chaitin-Briggs algorithm
  • Callahan-Koblenz algorithm
• We shall use two major points of comparison
  • Amount of spill code inserted
  • Efficacy of copy removal
• Copy removal: tries to merge two live ranges connected by a register-to-register copy
  • Can decrease register pressure
  • Important for good allocation

The Chaitin-Briggs Register Allocator
[Figure: phase diagram with the stages build, coalesce, spill costs, simplify, select, spill code]
• 6 major phases
• Aggressive coalescing phase
  • Iterates until no more copies can be coalesced away
• Simple spill insertion strategy if coloring fails
  • Choose spill candidates using heuristics (spill costs)
  • Spill all occurrences of candidate live range: loads before every use, stores after every definition
  • Restart process after adding spills
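
The simplify and select phases named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the allocator the authors implemented: it uses Briggs-style optimistic coloring, picks spill candidates by cost divided by degree, and omits the build, coalesce, and spill-code phases as well as the restart after spilling. The function name and the dict-of-sets graph representation are hypothetical.

```python
def simplify_and_select(graph, k, spill_cost):
    """Color an interference graph with k colors; return (colors, spilled).

    graph: dict of node -> set of neighbors (symmetric, no self loops).
    spill_cost: dict of node -> estimated cost of spilling that node.
    """
    if k < 1:
        raise ValueError("need at least one register")
    work = {n: set(nbrs) for n, nbrs in graph.items()}
    stack = []
    # Simplify: remove nodes one at a time and push them on a stack.
    while work:
        low_degree = [n for n in work if len(work[n]) < k]
        if low_degree:
            node = min(low_degree)  # trivially colorable; min() for determinism
        else:
            # Every remaining node has degree >= k: pick the cheapest spill
            # candidate, but still push it and hope a color is left over.
            node = min(work, key=lambda n: (spill_cost[n] / len(work[n]), n))
        for m in work[node]:
            work[m].discard(node)
        del work[node]
        stack.append(node)
    # Select: pop nodes and give each the lowest color unused by neighbors.
    colors, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)
    return colors, spilled


# A 4-cycle: every node has degree 2, yet two colors suffice.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
cycle_colors, cycle_spills = simplify_and_select(cycle, 2, dict.fromkeys(cycle, 1.0))

# A triangle cannot be 2-colored, so the cheapest node is spilled.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
tri_colors, tri_spills = simplify_and_select(
    triangle, 2, {"a": 10.0, "b": 1.0, "c": 5.0}
)
```

In a full allocator, a non-empty spill list would trigger spill-code insertion and another pass through the phases.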

The Chaitin-Briggs Register Allocator
• Often cited shortcomings:
  • No topological program information preserved in interference graph
    • Approximated via spill costs
    • References in deeper loop nests given higher spill cost
  • Spill-everywhere approach
Different strategy suggested by Callahan and Koblenz…

The Callahan-Koblenz Register Allocator
• Callahan and Koblenz developed their allocator around the same time as Briggs
• Augments a Chaitin-style allocator:
  • Builds a hierarchical structure (tile tree) to represent program flow
    • A tile is a set of basic blocks
  • Tile boundaries are candidates for live-range splitting
  • Tries to schedule spill code in less frequently executed blocks
• Algorithm is more intricate than Chaitin-Briggs

Callahan-Koblenz: The Tile Tree
[Figure: control-flow graph with blocks start, A, B, C, D, shown next to the tile tree T0 → {T0.1 → {T0.1.2}, T0.2}]
Tile T0: {start, A, B, C, D}
Tile T0.1: {A, B, C}
Tile T0.1.2: {B}
Tile T0.2: {D}

The Callahan-Koblenz Register Allocator
[Figure: phase diagram. Labels: tile tree; postorder traversal of tile tree (build & prefs, color, summarize; subtile conflicts); preorder traversal of tile tree (resolve, rebuild, color, summarize; conflicts); spill code]
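
To make the tile tree and the two traversals concrete, here is a small sketch using the tiles listed on the tile tree slide. The `Tile` class and the helper names are hypothetical; the slides do not show how the authors represent tiles.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    name: str
    blocks: frozenset          # basic blocks covered by this tile
    subtiles: list = field(default_factory=list)


def postorder(tile):
    """Visit subtiles before their parent (bottom-up pass)."""
    for sub in tile.subtiles:
        yield from postorder(sub)
    yield tile


def preorder(tile):
    """Visit a tile before its subtiles (top-down pass)."""
    yield tile
    for sub in tile.subtiles:
        yield from preorder(sub)


def is_nested(tile):
    """Check that every subtile covers a subset of its parent's blocks."""
    return all(
        sub.blocks <= tile.blocks and is_nested(sub) for sub in tile.subtiles
    )


t012 = Tile("T0.1.2", frozenset({"B"}))
t01 = Tile("T0.1", frozenset({"A", "B", "C"}), [t012])
t02 = Tile("T0.2", frozenset({"D"}))
t0 = Tile("T0", frozenset({"start", "A", "B", "C", "D"}), [t01, t02])

bottom_up = [t.name for t in postorder(t0)]
top_down = [t.name for t in preorder(t0)]
```

The bottom-up order reaches the innermost tile first, which is where a postorder pass would begin, and the top-down order starts at the root.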

The Callahan-Koblenz Register Allocator
• Implemented at Cray, published in 1991, but no comparison with Chaitin-Briggs
• Key questions:
  • How does the Callahan-Koblenz approach affect:
    • The number of dynamic spill instructions executed
    • The removal of register-to-register copies
  • Callahan-Koblenz inserts some extra branches. How does this affect performance?

Spill Code Insertion

Chaitin-Briggs
• Simple strategy for spill code insertion
• Choose candidates based on spill heuristic
  • Prefer spilling nodes with lower values
Heuristic function for live range l:
$$H(l) = \frac{\mathrm{SpillCost}_l}{\mathrm{Degree}_l}$$
$$\mathrm{SpillCost}_l = \mathrm{LoadCost}_l + \mathrm{StoreCost}_l$$
$$\mathrm{LoadCost}_l = \sum_{i \in \mathrm{SpillLoads}(l)} 10^{\mathrm{loopdepth}(i)}$$
$$\mathrm{StoreCost}_l = \sum_{j \in \mathrm{SpillStores}(l)} 10^{\mathrm{loopdepth}(j)}$$
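
The heuristic on this slide (spill cost divided by degree, with each spill load or store weighted by 10 raised to its loop depth) can be worked through on made-up numbers. The live ranges `x` and `y`, their loop depths, and their degrees are invented for illustration, and the function names are hypothetical.

```python
def spill_cost(load_depths, store_depths):
    """SpillCost = sum of 10**loopdepth over spill loads and spill stores."""
    return sum(10 ** d for d in load_depths) + sum(10 ** d for d in store_depths)


def spill_metric(load_depths, store_depths, degree):
    """H(l) = SpillCost / Degree; lower values are better spill candidates.

    Assumes degree > 0: a node with no neighbors never needs to be spilled.
    """
    return spill_cost(load_depths, store_depths) / degree


# x: two uses inside a doubly nested loop, one definition outside any loop.
h_x = spill_metric(load_depths=[2, 2], store_depths=[0], degree=3)
# y: one use at top level, one use in a single loop, one top-level definition.
h_y = spill_metric(load_depths=[0, 1], store_depths=[0], degree=4)

candidate = min({"x": h_x, "y": h_y}.items(), key=lambda item: item[1])[0]
```

With these numbers `x` costs 201 over degree 3 and `y` costs 12 over degree 4, so `y` is the preferred spill candidate.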

Callahan-Koblenz Spill Costs
Higher values indicate better fit for a register
$$\mathrm{Weight}_t(v) = \sum_{s \in \mathrm{subtiles}(t)} \bigl(\mathrm{Reg}_s(v) - \mathrm{Mem}_s(v)\bigr) + \mathrm{LocalWeight}_t(v)$$
$$\mathrm{LocalWeight}_t(v) = \sum_{b \in \mathrm{blocks}(t)} P(b) \cdot \mathrm{Ref}_b(v)$$
$$\mathrm{Transfer}_t(v) = \sum_{e \in E(t)} P(e) \cdot \mathrm{Live}_e(v)$$
$$\mathrm{Reg}_t(v) = \begin{cases} 0, & \text{if } \lnot\mathrm{InReg}_t(v) \\ \min\bigl(\mathrm{Transfer}_t(v), \mathrm{Weight}_t(v)\bigr), & \text{if } \mathrm{InReg}_t(v) \end{cases}$$
$$\mathrm{Mem}_t(v) = \begin{cases} \mathrm{Transfer}_t(v), & \text{if } \lnot\mathrm{InReg}_t(v) \\ 0, & \text{if } \mathrm{InReg}_t(v) \end{cases}$$
Penalty costs for tile boundary spills and differing locations
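
The sketch below evaluates these quantities for one variable across a two-level tile tree. It follows one reading of the slide's formulas and rests on assumptions the slide does not spell out: P(b) and P(e) are taken to be execution frequencies of blocks and boundary edges, Ref_b(v) the number of references to v in block b, Live_e(v) an indicator that v is live across edge e, and the term between Reg and Mem a subtraction. All names and numbers are hypothetical.

```python
def local_weight(block_freq, refs):
    """LocalWeight_t(v): sum over blocks b of P(b) * Ref_b(v)."""
    return sum(freq * refs.get(b, 0) for b, freq in block_freq.items())


def transfer(edge_freq, live_edges):
    """Transfer_t(v): sum of P(e) over boundary edges where v is live."""
    return sum(freq for e, freq in edge_freq.items() if e in live_edges)


def reg_and_mem(in_reg, transfer_cost, weight):
    """Return (Reg_t(v), Mem_t(v)) for a tile, given its allocation choice."""
    if in_reg:
        return min(transfer_cost, weight), 0.0
    return 0.0, transfer_cost


def tile_weight(subtile_summaries, local):
    """Weight_t(v): sum of (Reg_s - Mem_s) over subtiles, plus LocalWeight."""
    return sum(reg - mem for reg, mem in subtile_summaries) + local


# Inner loop tile: block B runs 10 times per entry and references v twice.
inner_local = local_weight({"B": 10.0}, {"B": 2})
inner_weight = tile_weight([], inner_local)
inner_transfer = transfer(
    {("A", "B"): 1.0, ("B", "C"): 1.0}, {("A", "B"), ("B", "C")}
)

# Outer tile: blocks A and C run once; v is referenced once in A.
outer_local = local_weight({"A": 1.0, "C": 1.0}, {"A": 1})
weight_if_inner_in_reg = tile_weight(
    [reg_and_mem(True, inner_transfer, inner_weight)], outer_local
)
weight_if_inner_in_mem = tile_weight(
    [reg_and_mem(False, inner_transfer, inner_weight)], outer_local
)
```

Under these assumptions, keeping v in a register in the inner tile raises its weight in the parent, while leaving it in memory there lowers it, which matches the slide's note that higher values indicate a better fit for a register.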

Experimental Methodology
• Implemented both allocators on LLVM
  • LLVM from Univ. of Illinois is an SSA-based, language-independent intermediate representation and compiler framework
• We ran our experiments on:
  • Pentium 4, 3.2 GHz, 1 GB RAM, Red Hat Linux 9.0
  • 7 allocatable general purpose integer registers
  • 8 floating point registers
• Evaluated on SPEC CPU 2000 integer benchmarks and epic from the Mediabench suite

Dynamic Spill Code Comparison
• Mean spill-code reduction: 20.5%
• Callahan-Koblenz can insert copies on tile boundaries
  • Improvement with tile boundary copies: 19.1%
• On epic, Callahan-Koblenz does much worse…

Why Callahan-Koblenz Performs Better
[Figure: three side-by-side code fragments labeled Before allocation, Chaitin-Briggs, and Callahan-Koblenz. Each contains x = …, def t, uses of t, and a region with heavy use of x. The two allocated versions add spill code: store x / load x in one, store t / load t in the other.]

Execution Counts of Instructions Inserted by Callahan-Koblenz
• Note disproportionate number of dynamic memory spills on tile boundaries for epic
  • Occurs due to differing locations for global values at each level in triply nested loops
  • Can tweak spill heuristic to correct this anomaly

Removal of register-to-register copies

Inter-register Copy Removal
Before copy removal:
    r1 = x op y
    r2 = r1
    use r2
After copy removal:
    r1 = x op y
    use r1
• Helps allocation by decreasing register pressure

Different Strategies Used For Copy Removal
• Chaitin-Briggs uses coalescing and biased coloring
  • Coalesce if r1 and r2 are connected by a copy and do not interfere
  • Copies between a physical and a virtual register (instruction peculiarities, procedure calling conventions) are marked
  • Coloring phase attempts to assign the same color to the virtual register
• Callahan-Koblenz uses preferencing
  • On encountering a copy between r1 and r2, add each to the other’s preference list
  • Try to satisfy preference during coloring
• Chaitin-Briggs’ strategy is far more aggressive
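
The coalescing rule on this slide (merge two live ranges joined by a copy when they do not interfere) can be sketched on a dict-of-sets interference graph. This is an illustrative single pass under simplifying assumptions: a real allocator would rebuild the graph and repeat until no more copies can be removed, and biased coloring and preferencing are not modeled. The function name and data layout are hypothetical.

```python
def coalesce_copies(graph, copies):
    """Merge copy-related, non-interfering nodes.

    graph: dict of node -> set of neighbors (symmetric, no self loops).
    copies: list of (dst, src) pairs, one per register-to-register copy.
    Returns (new_graph, alias, kept): alias maps a merged node to the node
    it was folded into, and kept lists copies that could not be removed.
    """
    graph = {n: set(nbrs) for n, nbrs in graph.items()}
    alias = {}

    def find(n):
        # Follow alias links to the current representative of n.
        while n in alias:
            n = alias[n]
        return n

    kept = []
    for dst, src in copies:
        a, b = find(dst), find(src)
        if a == b:
            continue  # already the same live range; the copy is redundant
        if b in graph[a]:
            kept.append((dst, src))  # the two live ranges interfere
            continue
        # Fold b into a: a inherits all of b's interferences.
        for n in graph.pop(b):
            graph[n].discard(b)
            graph[n].add(a)
            graph[a].add(n)
        alias[b] = a
    return graph, alias, kept


interference = {
    "r1": {"x"},
    "r2": {"y"},
    "x": {"r1"},
    "y": {"r2", "z"},
    "z": {"y"},
}
merged, alias, kept = coalesce_copies(interference, [("r2", "r1"), ("z", "y")])
```

The copy `r2 = r1` disappears because the two live ranges do not interfere, while the copy between `z` and `y` stays because they do.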

Copy Coalescing: Experimental Evaluation
• Coalescing + biased coloring outperforms preferencing
  • 3.6% fewer static copies in code
  • 4.5% fewer copies executed
• We expected coalescing to win but were surprised at the competitive performance of preferencing

Callahan-Koblenz: Control-flow Overhead
• Tile tree construction may warrant an insertion of basic blocks
  • Most inserted blocks fall through to successor. No extra branches needed
  • Some do not.
• We measured the overhead of these branches
  • 5.8% more static branches
  • But only marginal increase in branches executed: 1.4%
• Branches at tile boundaries are infrequently executed

Execution Times of Allocated Code
• Callahan-Koblenz achieves a 6.1% improvement over Chaitin-Briggs on average
• We chose not to use this metric as our major criterion for comparison
  • Very architecture dependent
  • Might not reflect qualitative differences in allocation

Conclusions
• Considering program structure yields substantial reduction in dynamic spill code
  • Tile boundary based spilling outperforms spill-everywhere
• We were concerned about the performance of Callahan-Koblenz’s copy coalescing mechanism
  • Chaitin-Briggs is very aggressive in removing copies
  • Preferencing does reasonably well
  • Performs within 4.5% of Chaitin-Briggs
• Control-flow overhead incurred by Callahan-Koblenz is small
