Craig Chambers, CSE 501

Register Allocation

The problem:
assign machine resources (registers, stack locations) to hold run-time data

Constraint:
simultaneously live data allocated to different locations

Goal:
minimize overhead of stack loads & stores and register moves

Interference graph

Represent notion of “simultaneously live” using interference graph
• nodes are “units of allocation”
• n1 is linked by an edge to n2 if n1 and n2 are simultaneously live at some program point
• symmetric, not reflexive, not transitive

Two adjacent nodes must be allocated to distinct locations

Units of allocation

What are the units of allocation?
• variables?
• separate def/use chains (live ranges)?
• values?
  • i.e., variables, in SSA form after copy propagation

[Figure: control-flow graph containing the statements below; branch structure lost in extraction]
  x := 5
  y := x
  ... x ...
  x := y + 1
  x := 3
  ... x ...

A bigger example

[Figure: control-flow graph containing the statements below; branch structure lost in extraction]
  a := ...
  b := ...
  c := ...
  ... b ...
  ... a ...
  d := ...
  ... d ...
  ... c ...
  a := ...
  a := ...
  ... d ...
  c := ...
  ... d ...
  e := ...
  ... a ...
  ... e ...
  ... b ...
Computing interference graph

Construct as side-effect of live variables analysis
• backwards iterative dfa algorithm

Flow function: identify defs & last uses

LV for x := ...y... :

LV for if ... :

Allocating registers using interference graph

Allocating variables to k registers is equivalent to finding a k-coloring of the interference graph

k-coloring: color nodes of graph using up to k colors, adjacent nodes have different colors
• optimal graph coloring: NP-complete

Spilling

If can’t find k-coloring of interference graph, must spill some variables to stack, until the resulting interference graph is k-colorable

Which to spill?
• least frequently accessed variables
• most conflicting variables (nodes with highest degree)

Weighted interference graph:
weight(n) = sum over all references (uses and defs) r of n: execution frequency of r

Try to spill nodes with lowest weight and highest degree, if forced to spill

Static frequency estimates

Initial node: weight = 1
Nodes after branch: 1/2 weight of branch
Nodes in loop: 10x nodes outside loop

Dynamic profiles could give better frequency estimates

Just need heuristic ranking of variables
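The construction of the interference graph from liveness information can be sketched in a few lines. This is a minimal sketch, not the slides' own algorithm: it assumes a single basic block with no branches, so one backward pass replaces the iterative dataflow solution, and it treats whole variables as the units of allocation. The function name `interference_graph` and the `(target, uses)` instruction encoding are hypothetical.

```python
from itertools import combinations


def interference_graph(block, live_out=()):
    """Build an interference graph for one straight-line block.

    block: list of (target, uses) pairs in program order, where target is
           the variable defined (or None) and uses lists the variables read.
    live_out: variables assumed live after the last instruction.
    Returns a dict mapping each variable to the set of its neighbors.
    """
    graph = {}
    live = set(live_out)

    def record(live_set):
        # Every pair of variables live at the same point interferes.
        for v in live_set:
            graph.setdefault(v, set())
        for u, v in combinations(live_set, 2):
            graph[u].add(v)
            graph[v].add(u)

    record(live)
    for target, uses in reversed(block):
        if target is not None:
            graph.setdefault(target, set())
            live.discard(target)   # a def ends the live range going backwards
        live.update(uses)          # a use makes the variable live before it
        record(live)
    return graph


# Hypothetical block:  a := ...; b := ...; c := a + b; d := c; use b, d
BLOCK = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),
    ("d", ["c"]),
    (None, ["b", "d"]),
]
graph = interference_graph(BLOCK)
# a: {b}, b: {a, c, d}, c: {b}, d: {b}
```

Here b is live across the whole block and so conflicts with everything, while a, c, and d never overlap and could share one register.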
Simple greedy allocation algorithm

For all nodes, in decreasing order of weight:
• try to allocate node to a register, if possible
• if not, allocate to a stack location

Reserve 2-3 scratch registers to use when manipulating nodes allocated to stack locations

Example

[Figure: interference graph over nodes a1, a2, b, c, d, e, with a weight order for the nodes; edges and ordering lost in extraction]

Assume 3 registers available

Improvement #1: add simplification phase
[Chaitin 82]

Key idea:
nodes with < k neighbors can be allocated after all their neighbors, but still guaranteed a register

So remove them from the graph first
• reduces the degree of the remaining nodes

Must resort to spilling only when all remaining nodes have degree ≥ k

The algorithm

while interference graph not empty:
    while there exists a node with < k neighbors:
        remove it from the graph
        push it on a stack
    if all remaining nodes have ≥ k neighbors, then blocked:
        pick a node to spill (choose node with lowest (spill cost/degree))
        remove node from graph
        add to spill set

if any nodes in spill set:
    insert spill code for all spilled nodes (insert stores after defs, loads before uses)
    reconstruct interference graph, start over

while stack not empty:
    pop node from stack
    allocate to register
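The simplify/spill/select loop above can be sketched as follows. This is an illustrative sketch under stated assumptions: the graph is a dict of neighbor sets, k is at least 1, spill costs are supplied by the caller, and inserting spill code and rebuilding the graph are left to the caller (the function just reports the spill set). The name `chaitin_color` and both example graphs are hypothetical.

```python
def chaitin_color(graph, k, spill_cost):
    """One round of simplification-based coloring.

    graph: dict mapping node -> set of neighbors (symmetric).
    k: number of registers, assumed >= 1.
    spill_cost: dict mapping node -> estimated cost of spilling it.
    Returns (assignment, spilled). If spilled is non-empty, assignment is
    None: the caller should insert spill code, rebuild the graph, and retry.
    """
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy
    stack, spilled = [], set()

    def remove(node):
        for m in work.pop(node):
            work[m].discard(node)

    while work:
        easy = [n for n in work if len(work[n]) < k]
        if easy:
            node = min(easy)  # any easy node works; min() keeps it deterministic
            remove(node)
            stack.append(node)
        else:
            # Blocked: every remaining node has degree >= k.
            node = min(work, key=lambda n: (spill_cost[n] / len(work[n]), n))
            remove(node)
            spilled.add(node)

    if spilled:
        return None, spilled

    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[m] for m in graph[node] if m in assignment}
        # Fewer than k neighbors were still present at removal, so a register is free.
        assignment[node] = min(r for r in range(k) if r not in used)
    return assignment, spilled


# Hypothetical graph: b conflicts with a, c, and d.
GRAPH = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}}
assignment, spilled = chaitin_color(GRAPH, 2, {n: 1.0 for n in GRAPH})

# Hypothetical triangle: not 2-colorable, so the cheapest node per degree is spilled.
TRIANGLE = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
blocked = chaitin_color(TRIANGLE, 2, {"x": 10.0, "y": 1.0, "z": 5.0})
# blocked == (None, {"y"})
```

Note that b starts with three neighbors yet still gets a register with k = 2, because removing low-degree nodes first lowers its degree.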
Example

[Figure: interference graph over nodes a1, a2, b, c, d, e, with a weight order for the nodes; edges and ordering lost in extraction]

Assume 3 registers available

Example

[Figure: interference graph over nodes a1, a2, b, c, d, e, with a weight order for the nodes; edges and ordering lost in extraction]

Assume 2 registers available

“Subsumption”

Twist in Chaitin’s algorithm:
if see x := y, where x & y not simultaneously live, then merge live ranges & eliminate all such copies

+ avoids generating code for simple copies
− can introduce extra spilling

If allocate values instead of variables or live ranges, then subsumption happens implicitly

An annoying case

[Figure: interference graph over nodes A, B, C, D; edges lost in extraction]

If only 2 registers available ⇒ blocked immediately, must spill
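The subsumption step (merging the live ranges of x and y when a copy x := y connects two non-interfering nodes) can be sketched on the graph representation alone. This is a hedged illustration: the function name `coalesce` and the example graph are hypothetical, x and y are assumed to be distinct nodes, and deleting the copy instruction itself is not modeled.

```python
def coalesce(graph, x, y):
    """Merge node y into node x if they do not interfere.

    graph: dict mapping node -> set of neighbors (symmetric).
    Returns a new graph with y folded into x, or None when x and y are
    simultaneously live and the copy therefore has to stay.
    """
    if y in graph[x]:
        return None
    merged = {}
    for node, adj in graph.items():
        if node == y:
            continue
        # Neighbors of y now point at x instead.
        merged[node] = {x if m == y else m for m in adj}
    merged[x] = (graph[x] | graph[y]) - {x, y}
    return merged


# Hypothetical graph: x conflicts with a, y conflicts with b.
GRAPH = {"x": {"a"}, "y": {"b"}, "a": {"x"}, "b": {"y"}}
after = coalesce(GRAPH, "x", "y")
# after == {"x": {"a", "b"}, "a": {"x"}, "b": {"x"}}
```

The merged node inherits the neighbors of both originals, so its degree goes from 1 to 2 here; that growth in degree is how subsumption can introduce extra spilling.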
Improvement #2: blocked doesn’t mean spill
[Briggs et al. 89]

Key idea:
just because a node has k or more neighbors doesn’t mean it will need to be spilled (neighbors may get overlapping colors)

Algorithm:
Like Chaitin, except:
• when removing blocked node, just push onto stack (“optimistic spilling”)
• when done removing nodes:
  • pop nodes off stack and see if they can be allocated
  • really spill only if it can’t be allocated at this stage

Improvement #3: live range splitting
Priority-Based Coloring [Chow & Hennessy 84]

Key idea:
if a variable can’t be allocated to a register, try to split it into multiple subranges that can be allocated separately
• move instructions inserted at split points
• some live range pieces in registers, some in memory ⇒ selective spilling

Other miscellaneous enhancements

Example

[Figure: control-flow graph containing the statements below; branch structure lost in extraction]
  a := ...
  ... a ...
  b := ...
  c1 := ...
  ... c1 ...
  d2 := ...
  ... b ...
  ... c1 ...
  ... d2 ...
  d1 := ...
  ... d1 ...
  ... a ...
  c2 := ...
  ... c2 ...

Weight Order (as extracted): b, d2, a, c2, c1, d1

Assume 2 registers available

Improvement #4: rematerialization

Idea: instead of reloading value from memory, recompute it, if recomputation is cheaper than reloading

Simple strategy: choose rematerialization over spilling, if
• can recompute a value in a single instruction, and
• all operands will always be available

Examples:
• constants
• address of global var
• address of var in stack frame
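The optimistic-spilling variant of Improvement #2 can be sketched as below. Several things here are assumptions rather than the slides' content: the "annoying case" is modeled as a 4-cycle A-B-C-D-A (the extracted text gives only the node labels), a blocked node is chosen by highest remaining degree instead of spill cost/degree, and the name `optimistic_color` is hypothetical.

```python
def optimistic_color(graph, k):
    """Simplify as in Chaitin, but push blocked nodes instead of spilling.

    graph: dict mapping node -> set of neighbors (symmetric).
    Returns (assignment, spilled); a node is spilled only if no register
    is free when it is popped.
    """
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy
    stack = []
    while work:
        easy = [n for n in work if len(work[n]) < k]
        if easy:
            node = min(easy)
        else:
            # Blocked: push a candidate anyway (assumed heuristic: highest degree).
            node = max(work, key=lambda n: (len(work[n]), n))
        for m in work.pop(node):
            work[m].discard(node)
        stack.append(node)

    assignment, spilled = {}, set()
    while stack:
        node = stack.pop()
        used = {assignment[m] for m in graph[node] if m in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]  # neighbors happened to share colors
        else:
            spilled.add(node)           # really spill only at this stage
    return assignment, spilled


# Assumed shape of the annoying case: every node has exactly 2 neighbors.
CYCLE = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
assignment, spilled = optimistic_color(CYCLE, 2)

# A triangle really does need a spill with 2 registers.
TRIANGLE = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
_, triangle_spilled = optimistic_color(TRIANGLE, 2)
```

On the assumed 4-cycle every node is blocked at k = 2, yet the deferred node still finds a free register because its two neighbors end up with the same color.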
Performance results
[Briggs et al. 94]

E.g., for some procedure:
XXX spill instructions before
YYY spill instructions after
YYY is Z% smaller than XXX
• Z ranges between -2% and 48% for “optimistic spilling”
• Z ranges between -26% and 33% for rematerialization

Optimistic spilling a good heuristic

Mixed results for rematerialization

Register allocation and calls

Simple approach: calling conventions

More sophisticated: interprocedural register allocation

Calling conventions

Goals:
• fast calls
• pass k arguments in registers, result in register
• language-independent
• support debugger, profiler, etc.

Problematic language features:
• varargs
• passing/returning aggregates
• returning multiple values
• exceptions, setjmp/longjmp

Callee-save vs. caller-save registers

Need a convention at calls for which registers managed by caller (caller-save) and which managed by callee (callee-save)
• SPARC has hardware-save registers, too

Caller-save:
• caller must save/restore any caller-save registers live across calls
• callee is free to use these registers w/o any overhead

Callee-save:
• callee must save/restore any callee-save registers it uses
• caller is free to use these registers, even across calls

Hardware-save:
• caller and callee can use freely
A problem with callee-save registers

Run-time utilities (e.g. longjmp) and programming environment tools (e.g. debugger) need to be able to find contents of registers relative to a particular stack frame

Caller-save registers are on stack in stack frame at known place

Callee-save registers?

Impact on register allocator

How should register allocator deal w/ calling conventions?

Simple: calling-convention-oblivious register allocation
• spill all live caller-save registers before call, restore after call
• save all callee-save registers at entry, restore at return

Better: calling-convention-aware register allocation
• incorporate preferred registers for formals, actuals
• call kills caller-save registers
  • allocator knows to avoid these registers, save/restore code turns into normal spills
  • live-range splitting particularly useful to split var into before call/during call/after call segments
• entry is def of all callee-save registers, exit is use
  • allocator knows must spill these registers if used in proc

Exploiting calling convention

Calling-convention-aware register allocator can customize its usage to use “cheaper” registers
• leaf routines (try to) use only caller-save registers
• routines with calls use callee-save registers for variables live across calls

Poor man’s interprocedural register allocation

Rich man’s interprocedural register allocation

Allocate registers across calls to minimize overlap between caller and callee subgraph

Allocate global variables to registers over entire program

Could do compile-time interprocedural register allocation
+ gains most benefit
− might be expensive
− might require lots of recompilation after programming change

Or, could do link-time re-allocation
+ low compile-time cost
+ little impact on separate compilation
− cost at link time
− probably less effective
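The register-class preference from “Exploiting calling convention” can be illustrated with a small sketch. It rests on simplifying assumptions that are not in the slides: each live range is a single interval of instruction indices (first def, last use), call sites are given as instruction indices, and the function name `preferred_register_class` is hypothetical.

```python
def preferred_register_class(live_range, call_sites):
    """Pick the cheaper register class for one live range.

    live_range: (first_def, last_use) instruction indices.
    call_sites: instruction indices of the calls in the routine.
    A range that spans a call prefers a callee-save register, so the caller
    needs no save/restore around the call; otherwise a caller-save register
    is free to use.
    """
    first_def, last_use = live_range
    crosses_call = any(first_def < call < last_use for call in call_sites)
    return "callee-save" if crosses_call else "caller-save"


# Hypothetical routine: t is dead before the call at index 4, acc spans it.
RANGES = {"t": (0, 2), "acc": (1, 6)}
with_call = {v: preferred_register_class(r, [4]) for v, r in RANGES.items()}
# with_call == {"t": "caller-save", "acc": "callee-save"}

# The same ranges in a leaf routine (no calls) all prefer caller-save registers.
leaf = {v: preferred_register_class(r, []) for v, r in RANGES.items()}
```

A real allocator would treat this as a preference fed into coloring rather than a hard rule, since the preferred class may run out of registers.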