
Register Allocation



Register Allocation

The problem:
assign machine resources (registers, stack locations) to hold run-time data

Constraint:
simultaneously live data allocated to different locations

Goal:
minimize overhead of stack loads & stores and register moves

Craig Chambers, CSE 501

Interference graph

Represent notion of “simultaneously live” using interference graph
• nodes are “units of allocation”
• n1 is linked by an edge to n2 if n1 and n2 are simultaneously live at some program point
• symmetric, not reflexive, not transitive

Two adjacent nodes must be allocated to distinct locations

Units of allocation

What are the units of allocation?
• variables?
• separate def/use chains (live ranges)?
• values?
  • i.e., variables, in SSA form after copy propagation

[Example: code using x and y: x := 5, y := x, x := y + 1, x := 3, with uses of x]

A bigger example

[Figure: control-flow example defining and using variables a, b, c, d, e]
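
The properties of the interference graph can be made concrete with a small Python sketch. It assumes an adjacency-set representation; `InterferenceGraph` and `is_valid_assignment` are hypothetical names, not from the slides. Edges are recorded in both directions (symmetric), self-edges are ignored (not reflexive), and an assignment is accepted only if adjacent nodes get distinct locations.

```python
from collections import defaultdict


class InterferenceGraph:
    """Undirected interference graph stored as adjacency sets (hypothetical helper)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        if a == b:
            return  # not reflexive: a node never interferes with itself
        self.adj[a].add(b)  # symmetric: record the edge in both directions
        self.adj[b].add(a)

    def interferes(self, a, b):
        return b in self.adj.get(a, set())


def is_valid_assignment(graph, location):
    """True if every pair of adjacent nodes was given distinct locations."""
    return all(location[a] != location[b]
               for a in graph.adj for b in graph.adj[a])


g = InterferenceGraph()
g.add_edge("x", "y")
g.add_edge("y", "z")
# not transitive: x-y and y-z do not imply x-z, so x and z may share r1
print(is_valid_assignment(g, {"x": "r1", "y": "r2", "z": "r1"}))  # True
```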

Computing interference graph

Construct as side-effect of live variables analysis
• backwards iterative dfa algorithm

Flow function: identify defs & last uses
• x := ...y... : $LV_{in} = (LV_{out} - \{x\}) \cup \{y\}$; x interferes with every variable in $LV_{out} - \{x\}$
• if ... : $LV_{out} = LV_{in}(\text{then}) \cup LV_{in}(\text{else})$

Allocating registers using interference graph

Allocating variables to k registers is equivalent to finding a k-coloring of the interference graph
k-coloring: color nodes of graph using up to k colors, adjacent nodes have different colors
• optimal graph coloring: NP-complete

Spilling

If we can't find a k-coloring of the interference graph, we must spill some variables to the stack, until the resulting interference graph is k-colorable

Which to spill?
• least frequently accessed variables
• most conflicting variables (nodes with highest out-degree)

Weighted interference graph:
weight(n) = sum over all references (uses and defs) r of n: execution frequency of r

Try to spill nodes with lowest weight and highest out-degree, if forced to spill

Static frequency estimates

Just need heuristic ranking of variables
• Initial node: weight = 1
• Nodes after branch: 1/2 weight of branch
• Nodes in loop: 10x nodes outside loop

Dynamic profiles could give better frequency estimates
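
A minimal sketch of building interference edges as a side effect of backward liveness, restricted to one straight-line block so that a single backward pass suffices (over a full control-flow graph the dataflow equations would be iterated to a fixed point). The `(defs, uses)` instruction format and the function name are assumptions for illustration.

```python
def build_interference(block, live_out=()):
    """Backward liveness over one straight-line block, recording interferences.

    block: list of (defs, uses) pairs in program order (hypothetical IR format).
    live_out: variables live at the end of the block.
    Returns (edges, live_in), where each edge is a frozenset of two variables.
    """
    edges = set()
    live = set(live_out)
    for defs, uses in reversed(block):
        # each def interferes with everything live just after this instruction
        for d in defs:
            for v in live:
                if v != d:
                    edges.add(frozenset((d, v)))
        # flow function: LV_in = (LV_out - defs) | uses
        live = (live - set(defs)) | set(uses)
    return edges, live


# a := 1; b := a + 1; c := a * b; return c
block = [(["a"], []), (["b"], ["a"]), (["c"], ["a", "b"]), ([], ["c"])]
edges, live_in = build_interference(block)
# edges holds only the pair {a, b}: a's last use is at c := a * b,
# so a and c are never simultaneously live
```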

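The static frequency estimates and the weighted interference graph can be combined into a spill-cost ranking. This sketch assumes each reference is described by its loop and branch nesting depth (a hypothetical input format) and ranks blocked candidates by weight divided by degree, one way to express "lowest weight and highest out-degree".

```python
def static_frequency(loop_depth, branch_depth):
    """Heuristic frequency: start at 1, x10 per enclosing loop, x1/2 per enclosing branch."""
    return (10 ** loop_depth) * (0.5 ** branch_depth)


def spill_weight(refs):
    """weight(n): sum of estimated frequencies over all uses and defs of n.

    refs: one (loop_depth, branch_depth) pair per reference (hypothetical input).
    """
    return sum(static_frequency(loops, branches) for loops, branches in refs)


def pick_spill(weight, degree):
    """Choose the node with lowest weight per neighbor: rarely used, highly conflicting.

    Only called when blocked, so every degree is >= k >= 1 (no division by zero).
    """
    return min(weight, key=lambda n: weight[n] / degree[n])


weight = {
    "x": spill_weight([(0, 0), (0, 0)]),          # def and use outside any loop
    "i": spill_weight([(1, 0), (1, 0), (1, 0)]),  # three references inside a loop
    "t": spill_weight([(0, 1)]),                  # one reference after a branch
}
degree = {"x": 3, "i": 3, "t": 2}
print(weight)                      # {'x': 2.0, 'i': 30.0, 't': 0.5}
print(pick_spill(weight, degree))  # t
```
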
Simple greedy allocation algorithm

For all nodes, in decreasing order of weight:
• try to allocate node to a register, if possible
• if not, allocate to a stack location

Reserve 2-3 scratch registers to use when manipulating nodes allocated to stack locations

Example

[Figure: interference graph over nodes a1, a2, b, c, d, e]
Weight order: a2, b, a1, e, d, c
Assume 3 registers available

Improvement #1: add simplification phase

Key idea: nodes with < k neighbors can be allocated after all their neighbors, but still guaranteed a register
So remove them from the graph first
• reduces the degree of the remaining nodes
Must resort to spilling only when all remaining nodes have degree ≥ k

The algorithm [Chaitin 82]

while interference graph not empty:
    while there exists a node with < k neighbors:
        remove it from the graph
        push it on a stack
    if all remaining nodes have ≥ k neighbors, then blocked:
        pick a node to spill (choose node with lowest (spill cost/degree))
        remove node from graph
        add to spill set
if any nodes in spill set:
    insert spill code for all spilled nodes (insert stores after defs, loads before uses)
    reconstruct interference graph, start over
while stack not empty:
    pop node from stack
    allocate to register
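
A Python sketch of the simple greedy algorithm, assuming the graph is given as a dict of neighbor sets; the function name and the `("stack", slot)` encoding are hypothetical. It does not model the reserved scratch registers, and it gives every stack-allocated node its own slot.

```python
def greedy_allocate(adj, weight, registers):
    """Greedy allocation in decreasing order of weight.

    adj: node -> set of interfering nodes; registers: available register names.
    Returns node -> register name, or ("stack", slot) when no register is free.
    """
    location = {}
    next_slot = 0
    for n in sorted(adj, key=lambda n: weight[n], reverse=True):
        taken = {location[m] for m in adj[n] if m in location}
        free = [r for r in registers if r not in taken]
        if free:
            location[n] = free[0]
        else:
            location[n] = ("stack", next_slot)  # simplification: one fresh slot per node
            next_slot += 1
    return location


# four mutually interfering nodes, 3 registers: the lightest node goes to the stack
adj = {n: {"a", "b", "c", "d"} - {n} for n in "abcd"}
weight = {"a": 4, "b": 3, "c": 2, "d": 1}
print(greedy_allocate(adj, weight, ["r1", "r2", "r3"]))
# {'a': 'r1', 'b': 'r2', 'c': 'r3', 'd': ('stack', 0)}
```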

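The simplify/select structure of the Chaitin-style algorithm can be sketched as follows. This is an illustration under stated assumptions, not Chaitin's implementation: the graph is a dict of neighbor sets, `cost` is a hypothetical map of spill costs, colors are integers 0..k-1, and inserting spill code and rebuilding the graph are left to the caller (the function only returns the spill set).

```python
def chaitin_color(adj, k, cost):
    """Simplify/select graph coloring in the style of [Chaitin 82].

    adj: node -> set of neighbors; cost: node -> spill cost.
    Returns (coloring, spill_set). If spill_set is non-empty, coloring is None:
    the caller must insert spill code, rebuild the graph, and start over.
    """
    remaining = {n: set(neigh) for n, neigh in adj.items()}
    stack, spill_set = [], set()

    def remove(n):
        # delete n and its edges from the working copy of the graph
        for m in remaining.pop(n):
            remaining[m].discard(n)

    while remaining:
        low = [n for n in remaining if len(remaining[n]) < k]
        if low:
            # simplify: a node with < k neighbors is guaranteed a register later
            stack.append(low[0])
            remove(low[0])
        else:
            # blocked: spill the node with lowest spill cost / degree
            n = min(remaining, key=lambda m: cost[m] / len(remaining[m]))
            spill_set.add(n)
            remove(n)

    if spill_set:
        return None, spill_set

    coloring = {}
    while stack:
        n = stack.pop()
        # neighbors popped earlier already have colors; at most k - 1 of them
        used = {coloring[m] for m in adj[n] if m in coloring}
        coloring[n] = min(c for c in range(k) if c not in used)
    return coloring, spill_set


# a four-node cycle A-B-C-D-A with k = 2: every node has 2 neighbors,
# so simplification is blocked immediately and one node is spilled
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(chaitin_color(cycle, 2, {n: 1 for n in cycle}))  # (None, {'A'})
```
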
Example

[Figure: interference graph over nodes a1, a2, b, c, d, e]
Weight order: a2, b, a1, e, d, c
Assume 3 registers available

Example

[Figure: interference graph over nodes a1, a2, b, c, d, e]
Weight order: a2, b, a1, e, d, c
Assume 2 registers available

“Subsumption”

A twist in Chaitin's algorithm: if we see x := y, where x & y are not simultaneously live, then merge their live ranges & eliminate all such copies
+ avoids generating code for simple copies
− can introduce extra spilling

If we allocate values instead of variables or live ranges, then subsumption happens implicitly

An annoying case

[Figure: interference graph over nodes A, B, C, D]
If only 2 registers available ⇒ blocked immediately, must spill
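
Subsumption (copy coalescing) can be sketched as merging graph nodes. The sketch assumes copies are given as `(x, y)` pairs for `x := y` and uses a tiny union-find to track merged live ranges; `coalesce` and its return values are hypothetical names.

```python
def coalesce(adj, copies):
    """Merge the live ranges of x and y for each copy x := y that does not interfere.

    adj: node -> set of neighbors (copied, not mutated). Returns the merged graph,
    a map from each original node to its representative, and the copies that remain.
    """
    adj = {n: set(neigh) for n, neigh in adj.items()}
    rep = {n: n for n in adj}

    def find(n):
        while rep[n] != n:
            n = rep[n]
        return n

    kept = []
    for x, y in copies:
        rx, ry = find(x), find(y)
        if rx == ry:
            continue  # already merged: the copy is redundant
        if ry in adj[rx]:
            kept.append((x, y))  # simultaneously live: the move must stay
            continue
        # merge ry into rx: rx inherits all of ry's interferences
        for m in adj.pop(ry):
            adj[m].discard(ry)
            adj[m].add(rx)
            adj[rx].add(m)
        rep[ry] = rx
    return adj, {n: find(n) for n in rep}, kept


# x := y where x and y are not simultaneously live; z interferes with both
adj = {"x": {"z"}, "y": {"z"}, "z": {"x", "y"}}
merged, rep, kept = coalesce(adj, [("x", "y")])
print(merged)  # {'x': {'z'}, 'z': {'x'}}
print(kept)    # []
```

The merged node inherits the neighbors of both live ranges, so its degree can exceed k; that is how coalescing can introduce extra spilling.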

Improvement #2: blocked doesn't mean spill [Briggs et al. 89]

Key idea: just because a node has k neighbors doesn't mean it will need to be spilled (neighbors may get overlapping colors)

Algorithm: Like Chaitin, except:
• when removing blocked node, just push onto stack (“optimistic spilling”)
• when done removing nodes:
  • pop nodes off stack and see if they can be allocated
  • really spill only if it can't be allocated at this stage

Improvement #3: live range splitting
Priority-Based Coloring [Chow & Hennessy 84]

Key idea: if a variable can't be allocated to a register, try to split it into multiple subranges that can be allocated separately
• move instructions inserted at split points
• some live range pieces in registers, some in memory ⇒ selective spilling

Other miscellaneous enhancements

Example

[Figure: example code defining and using a, b, c1, c2, d1, d2, with their weight order]
Assume 2 registers available

Improvement #4: rematerialization

Idea: instead of reloading a value from memory, recompute it, if recomputation is cheaper than reloading

Simple strategy: choose rematerialization over spilling, if
• can recompute a value in a single instruction, and
• all operands will always be available

Examples:
• constants
• address of global var
• address of var in stack frame
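
A sketch of optimistic coloring in the style of [Briggs et al. 89], under an assumed graph representation (dict of neighbor sets, hypothetical `cost` map, colors 0..k-1). A blocked node is pushed rather than spilled, and it is spilled only if no color is free when it is popped; a full allocator would then insert spill code and repeat, which is not shown. On a four-node cycle with k = 2, every node starts with 2 neighbors, yet opposite nodes can share a color, so nothing is spilled.

```python
def briggs_color(adj, k, cost):
    """Optimistic coloring in the style of [Briggs et al. 89].

    adj: node -> set of neighbors; cost: node -> spill cost.
    Returns (coloring, spilled): colors are 0..k-1; spilled nodes get no color.
    """
    remaining = {n: set(neigh) for n, neigh in adj.items()}
    stack = []
    while remaining:
        low = [n for n in remaining if len(remaining[n]) < k]
        if low:
            n = low[0]
        else:
            # blocked: push anyway ("optimistic spilling"), lowest cost/degree first
            n = min(remaining, key=lambda m: cost[m] / len(remaining[m]))
        stack.append(n)
        for m in remaining.pop(n):
            remaining[m].discard(n)

    coloring, spilled = {}, set()
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in adj[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spilled.add(n)  # really spill only if no color is left at this stage
    return coloring, spilled


cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
coloring, spilled = briggs_color(cycle, 2, {n: 1 for n in cycle})
print(coloring, spilled)  # {'D': 0, 'C': 1, 'B': 0, 'A': 1} set()
```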

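A sketch of the simple rematerialization test over a hypothetical IR: `Def`, the op names, and treating a frame pointer `fp` as always available are assumptions, not details from the slides. Only constant, global-address, and frame-address definitions qualify, mirroring the slide's examples; a `load` does not, because memory may have changed and reading it is what a reload already costs.

```python
from dataclasses import dataclass

# Assumption: single-instruction, side-effect-free ops whose result can be recomputed
REMATERIALIZABLE_OPS = {"const", "global_addr", "frame_addr"}
# Assumption: registers holding the same value everywhere (e.g. a frame pointer)
ALWAYS_AVAILABLE = {"fp"}


@dataclass(frozen=True)
class Def:
    """The instruction defining a value, in a hypothetical IR."""
    op: str
    operands: tuple = ()


def can_rematerialize(d: Def) -> bool:
    """Recompute instead of reload: one instruction, all operands always available."""
    return d.op in REMATERIALIZABLE_OPS and all(r in ALWAYS_AVAILABLE for r in d.operands)


def spill_strategy(d: Def) -> str:
    return "rematerialize" if can_rematerialize(d) else "spill"


print(spill_strategy(Def("const")))                # rematerialize
print(spill_strategy(Def("frame_addr", ("fp",))))  # rematerialize
print(spill_strategy(Def("add", ("x", "y"))))      # spill: x, y may not be live
```
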
Performance results [Briggs et al. 94]

For some procedure: XXX spill instructions before, YYY spill instructions after
YYY is Z% smaller than XXX, i.e., $Z = 100 \cdot (XXX - YYY) / XXX$ (a negative Z means more spill instructions after)
• Z ranges between -2% and 48% for “optimistic spilling”
• Z ranges between -26% and 33% for rematerialization

Optimistic spilling is a good heuristic
Mixed results for rematerialization

Register allocation and calls

Simple approach: calling conventions
More sophisticated: interprocedural register allocation

Calling conventions

Goals:
• fast calls
  • pass k arguments in registers, result in register
• language-independent
• support debugger, profiler, etc.

Problematic language features:
• varargs
• passing/returning aggregates
• returning multiple values
• exceptions, setjmp / longjmp

Callee-save vs. caller-save registers

Need a convention at calls for which registers are managed by the caller (caller-save) and which are managed by the callee (callee-save)
• SPARC has hardware-save registers, too

Caller-save:
• caller must save/restore any caller-save registers live across calls
• callee is free to use these registers w/o any overhead

Callee-save:
• callee must save/restore any callee-save registers it uses
• caller is free to use these registers, even across calls

Hardware-save:
• caller and callee can use freely
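
Under a mixed convention, one procedure's save/restore obligations follow directly from the caller-save and callee-save rules. A minimal sketch; the register names and `save_restore_sets` are hypothetical, and hardware-save registers (as on SPARC) are not modeled.

```python
def save_restore_sets(caller_save, callee_save, live_across_calls, used_in_proc):
    """Registers a procedure must save/restore under a mixed convention.

    live_across_calls: registers holding values live across some call made here.
    used_in_proc: registers this procedure writes.
    Returns (saved around calls by this procedure as caller,
             saved at entry / restored at return by this procedure as callee).
    """
    around_calls = set(live_across_calls) & set(caller_save)
    at_entry_and_return = set(used_in_proc) & set(callee_save)
    return around_calls, at_entry_and_return


caller_save = {"r1", "r2", "r3"}  # hypothetical register names
callee_save = {"r4", "r5"}
around, entry = save_restore_sets(caller_save, callee_save,
                                  live_across_calls={"r2", "r4"},
                                  used_in_proc={"r1", "r2", "r4"})
print(sorted(around), sorted(entry))  # ['r2'] ['r4']
```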

A problem with callee-save registers

Run-time utilities (e.g. longjmp) and programming environment tools (e.g. debugger) need to be able to find contents of registers relative to a particular stack frame
Caller-save registers are on stack in stack frame at known place
Callee-save registers?

Impact on register allocator

How should register allocator deal w/ calling conventions?

Simple: calling-convention-oblivious register allocation
• spill all live caller-save registers before call, restore after call
• save all callee-save registers at entry, restore at return

Better: calling-convention-aware register allocation
• incorporate preferred registers for formals, actuals
• call kills caller-save registers
  • allocator knows to avoid these registers, save/restore code turns into normal spills
  • live-range splitting particularly useful to split var into before call/during call/after call segments
• entry is def of all callee-save registers, exit is use
  • allocator knows must spill these registers if used in proc

Exploiting calling convention

Calling-convention-aware register allocator can customize its usage to use “cheaper” registers
• leaf routines (try to) use only caller-save registers
• routines with calls use callee-save registers for variables live across calls

Poor man's interprocedural register allocation

Rich man's interprocedural register allocation

Allocate registers across calls to minimize overlap between caller and callee subgraph
Allocate global variables to registers over entire program

Could do compile-time interprocedural register allocation
+ gains most benefit
− might be expensive
− might require lots of recompilation after a program change

Or, could do link-time re-allocation
+ low compile-time cost
+ little impact on separate compilation
− cost at link time
− probably less effective
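
The “cheaper registers” heuristic for a calling-convention-aware allocator can be sketched as a per-variable preference order. This is an assumption-laden illustration (hypothetical names, register classes passed as lists), not a description of any particular compiler.

```python
def preferred_registers(is_leaf, live_across_call, caller_save, callee_save):
    """Order the candidate registers for one variable, cheapest first.

    A leaf routine makes no calls, so caller-save registers cost it nothing.
    In a routine with calls, a variable live across a call prefers callee-save
    registers: one save at entry and one restore at return, instead of a
    save/restore around every call. Other variables still prefer caller-save ones.
    """
    if is_leaf or not live_across_call:
        return list(caller_save) + list(callee_save)
    return list(callee_save) + list(caller_save)


caller_save = ["r1", "r2"]  # hypothetical register names
callee_save = ["r8", "r9"]
print(preferred_registers(True, False, caller_save, callee_save))  # ['r1', 'r2', 'r8', 'r9']
print(preferred_registers(False, True, caller_save, callee_save))  # ['r8', 'r9', 'r1', 'r2']
```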
