
1 Coalescing Logistics Computing the Interference Graph (in - PowerPoint PPT Presentation



  1. Register Allocation III
     CS553 Lecture Register Allocation III
     Last time
     – Register allocation across function calls
     Today
     – Register allocation options

     Interference Graph Allocators
     – Chaitin
     – Briggs

     Granularity of Allocation (Renumber step in Briggs)
     What is allocated to registers?
     – Variables/Temporaries
     – Live ranges/Webs (i.e., du-chains with common uses)
     – Values (i.e., definitions; same as variables with SSA)
     [Figure: flow graph with blocks b1 to b4 containing t1: x := 5, t2: y := x, t3: x := y+1, t4: ... x ..., t5: x := 3, t6: ... x ...]
     Variables: 2 (x & y)
     Live Ranges/Web: 3 (t1 → t2,t4; t2 → t3; t3,t5 → t6)
     Values: 4 (t1, t2, t3, t5, φ(t3,t5))
     What are the tradeoffs?
     Each allocation unit is given a symbolic register name (e.g., s1, s2, etc.)

     Coalescing
     Move instructions
     – Code generation can produce unnecessary move instructions
         mov t1, t2
     – If we can assign t1 and t2 to the same register, we can eliminate the move
     Idea
     – If t1 and t2 are not connected in the interference graph, coalesce them into a single variable
     Problem
     – Coalescing can increase the number of edges and make a graph uncolorable
     – Limit coalescing to avoid uncolorable graphs
     [Figure: nodes t1 and t2 coalesced into a single node t1t2]
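The coalescing idea and its drawback can be illustrated with a minimal Python sketch. The adjacency-set representation, the `coalesce` helper, and the node names `x` and `y` are assumptions made for illustration, not part of the lecture's compiler.

```python
def coalesce(graph, a, b):
    """Merge node b into node a if the two do not interfere.

    graph maps each node to the set of its neighbors (undirected).
    Returns a new graph; the input is left unchanged.
    """
    if b in graph[a]:
        raise ValueError(f"{a} and {b} interfere; cannot coalesce")
    # Copy every node except b, then redirect b's edges to a.
    merged = {n: set(adj) for n, adj in graph.items() if n != b}
    for n in graph[b]:
        merged[n].discard(b)
        merged[n].add(a)
        merged[a].add(n)
    return merged


# Hypothetical graph: t1 and t2 are related by "mov t1, t2" and do not interfere.
graph = {
    "t1": {"x"},
    "t2": {"y"},
    "x": {"t1"},
    "y": {"t2"},
}
merged = coalesce(graph, "t1", "t2")
print(len(graph["t1"]), len(merged["t1"]))
```

In this example the merged node inherits the neighbors of both originals, so its degree grows from 1 to 2. With k = 2 registers it is no longer a node with fewer than k neighbors, which is the reason for limiting coalescing.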

  2. Coalescing Logistics
     Rule
     – When building the interference graph, do NOT make virtual registers interfere due to copies.
     – If the virtual registers s1 and s2 do not interfere and there is a copy statement s1 = s2, then s1 and s2 can be coalesced.
     – Example

     Computing the Interference Graph (in MiniJava compiler)
     Use results of live variable analysis
     Computing interference graph to enable coalescing
         for each flow graph node n do
           for each def in def(n) do
             for each temp in liveout(n) do
               if (not stmt(n) isa MOVE or use != temp) then
                 E ← E ∪ (def, temp)

     Coalescing in MiniJava compiler
     Currently the InterferenceGraph only has one Temp.Temp associated with each node
     – represent each merged node with just one of the temps
     – keep a separate map of representatives mapped to sets of temps
     – also keep a map of temps mapped to their representative
     – when rewriting the code use the representative instead of the original temp

     Register Allocation: Spilling
     If we can’t find a k-coloring of the interference graph
     – Spill variables (nodes) until the graph is colorable
     Choosing variables to spill
     – Choose least frequently accessed variables
     – Break ties by choosing nodes with the most conflicts in the interference graph
     – Yes, these are heuristics!
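The interference-graph loop above can be sketched in Python as follows. The `Instr` record and the sample program are hypothetical stand-ins for the MiniJava compiler's flow-graph nodes and liveness results; only the edge rule (skip the source of a MOVE) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical flow-graph node carrying precomputed liveness."""
    defs: set
    uses: set
    live_out: set
    is_move: bool = False


def build_interference(instrs):
    """Add an edge (def, temp) for every temp live out of a defining
    statement, except between the destination and source of a MOVE."""
    edges = set()
    for ins in instrs:
        for d in ins.defs:
            for t in ins.live_out:
                if t == d:
                    continue  # a temp never interferes with itself
                if ins.is_move and t in ins.uses:
                    continue  # copy source does not interfere with copy dest
                edges.add(frozenset((d, t)))
    return edges


program = [
    Instr(defs={"d"}, uses=set(), live_out={"d"}),                          # d = 2
    Instr(defs={"a"}, uses=set(), live_out={"a", "d"}),                     # a = 1
    Instr(defs={"b"}, uses={"a"}, live_out={"a", "b", "d"}, is_move=True),  # b = a
    Instr(defs={"c"}, uses={"a", "b"}, live_out={"c", "d"}),                # c = a + b
    Instr(defs={"e"}, uses={"c", "d"}, live_out={"e"}),                     # e = c + d
]
edges = build_interference(program)
print(sorted(tuple(sorted(e)) for e in edges))
```

Because the copy `b = a` adds no edge between `a` and `b`, the two temps stay unconnected and remain candidates for coalescing, even though `a` is still live after the copy.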

  3. Weighted Interference Graph
     Goal
     – $\mathrm{Weight}(s) = \sum_{\text{references } r \text{ of } s} f(r)$, where f(r) is the execution frequency of r
     Static approximation
     – Use some reasonable scheme to rank variables
     – Some possibilities
       – Weight(s) = num of times s is used in program
       – Weight(s) = 10 × (# uses in loops) + (# uses in straightline code)
       – Weight(s) = 20 × (# uses in loops) + 2 × (# uses in straightline code) + (# uses in a branch statement)

     Improvement #1: Simplification Phase [Chaitin 81]
     Idea
     – Nodes with < k neighbors are guaranteed colorable
     – Improvement over simple greedy coloring algorithm
     Remove them from the graph first
     – Reduces the degree of the remaining nodes
     Must spill only when all remaining nodes have degree ≥ k
     Referred to as pessimistic spilling

     The Problem: Worst Case Assumptions
     Is the following graph 2-colorable?
     [Figure: graph on nodes s1, s2, s3, s4]
     Clearly 2-colorable
     – But Chaitin’s algorithm leads to an immediate block and spill
     – The algorithm assumes the worst case, namely, that all neighbors will be assigned a different color

     Improvement #2: Optimistic Spilling [Briggs 89]
     [Figure: graph on nodes s1, s2, s3, s4]
     Idea
     – Some neighbors might get the same color
     – Nodes with k neighbors might be colorable
     – Blocking does not imply that spilling is necessary
     – Push blocked nodes on stack (rather than place in spill set): defer decision
     – Check colorability upon popping the stack, when more information is available
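A small Python sketch of the static weight and of pessimistic simplification may help make the worst-case problem concrete. It assumes the four-node figure is a cycle s1, s2, s3, s4 (the edges are not recoverable from the transcript), and the function names and tie-breaking by node name are illustrative choices.

```python
def spill_weight(uses_in_loops, uses_straightline):
    """One of the static approximations listed above: 10 x loop uses + other uses."""
    return 10 * uses_in_loops + uses_straightline


def chaitin_simplify(graph, k, weights):
    """Pessimistic simplification: a blocked node goes straight to the spill set."""
    g = {n: set(adj) for n, adj in graph.items()}
    stack, spilled = [], []
    while g:
        # Prefer any node with fewer than k neighbors (guaranteed colorable).
        node = next((n for n in sorted(g) if len(g[n]) < k), None)
        if node is None:
            # Blocked: spill the lowest weight, breaking ties by highest degree.
            node = min(sorted(g), key=lambda n: (weights[n], -len(g[n])))
            spilled.append(node)
        else:
            stack.append(node)
        for m in g.pop(node):
            g[m].discard(node)
    return stack, spilled


# Assumed shape of the figure: a 4-cycle, which is 2-colorable.
cycle = {
    "s1": {"s2", "s4"},
    "s2": {"s1", "s3"},
    "s3": {"s2", "s4"},
    "s4": {"s1", "s3"},
}
weights = {n: spill_weight(0, 2) for n in cycle}
stack, spilled = chaitin_simplify(cycle, k=2, weights=weights)
print(stack, spilled)
```

Every node of the cycle has exactly k = 2 neighbors, so the first step is already blocked and one node is spilled even though alternating two colors around the cycle would work.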

  4. Algorithm [Briggs et al. 89]
         while interference graph not empty do
           while ∃ a node n with < k neighbors do      { simplify }
             Remove n from the graph
             Push n on a stack
           if any nodes remain in the graph then       { blocked with >= k edges }
             Pick a node n to spill                    { lowest spill-cost/highest degree }
             Push n on stack                           { defer decision }
             Remove n from the graph
         while stack not empty do
           Pop node n from stack
           if n is colorable then                      { make decision }
             Allocate n to a register
           else
             Insert spill code for n                   { Store after def; load before use }
             Reconstruct interference graph & start over

     Example
     Attempt to 2-color this graph
     [Figure: graph on nodes a, b, c, d, e, f; edges not recoverable]
     Increasing Weight: e, a, f, b, c, d
     Stack (top to bottom): d, c, b*, f*, a*, e*
     * blocked node

     Possible Register Allocation Design
     Overall algorithm: graph coloring with simplification
     Interference graph: two temps interfere if
     – one is defined in a stmt and the other is live out of the same stmt
     – exception is a MOVE statement where the temps are the source and dest
     Coalesce: Briggs strategy
     – coalesce if new node will have fewer than K neighbors of significant degree (>= K) and both nodes are involved in a move
     Spill heuristic:
     – spill the node with the lowest weight and break ties by spilling the node with the most total adjacent edges
     Simplification:
     – optimistic, push by increasing weight and feasibility, select blocked nodes using spill heuristic
     Select:
     – attempt selection on everything in the stack before generating spill code

     Example
         // missing stack setup
         // callee-save assigns to temps
         t2 = $s0
         // procedure body
         t3 = 1
         Loop:
         if t3 > 20 goto End
         t3 = t3 + 1
         jal bar // defs: $v0
         goto Loop
         End:
         t4 = t3 + $v0
         t5 = t4 + t3
         $v0 = t5
         // callee-save reads from temps
         $s0 = t2
         // sink stmt, uses callee-saves
         // and caller-saves
         // missing stack cleanup and ret

     After Spilling t2
         // missing stack setup
         // callee-save assigns to temps
         t2 = $s0
         M[$fp-12] = t2
         // procedure body
         t3 = 1
         Loop:
         if t3 > 20 goto End
         t3 = t3 + 1
         jal bar // defs: $v0
         goto Loop
         End:
         t4 = t3 + $v0
         t5 = t4 + t3
         $v0 = t5
         // callee-save reads from temps
         t2 = M[$fp-12]
         $s0 = t2
         // sink stmt, uses callee-saves
         // and caller-saves
         // missing stack cleanup and ret
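The optimistic algorithm above can be sketched in Python as follows. This is a simplified illustration, not the course's allocator: it follows the "Select" rule of trying everything on the stack and returns the uncolorable nodes instead of inserting spill code and rebuilding the graph. The 4-cycle input, the equal weights, and the name-based tie-breaking are assumptions.

```python
def briggs_color(graph, k, weights):
    """Optimistic coloring: blocked nodes are pushed, not spilled."""
    g = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while g:
        # Simplify: take any node with fewer than k neighbors.
        node = next((n for n in sorted(g) if len(g[n]) < k), None)
        if node is None:
            # Blocked: choose by the spill heuristic but defer the decision.
            node = min(sorted(g), key=lambda n: (weights[n], -len(g[n])))
        stack.append(node)
        for m in g.pop(node):
            g[m].discard(node)

    colors, spills = {}, []
    while stack:
        node = stack.pop()
        # Colors already taken by neighbors in the original graph.
        used = {colors[m] for m in graph[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spills.append(node)  # a real allocator would insert spill code here
    return colors, spills


# Assumed 4-cycle: every node has exactly k = 2 neighbors.
cycle = {
    "s1": {"s2", "s4"},
    "s2": {"s1", "s3"},
    "s3": {"s2", "s4"},
    "s4": {"s1", "s3"},
}
colors, spills = briggs_color(cycle, k=2, weights={n: 1 for n in cycle})
print(colors, spills)
```

On this input the blocked node s1 is pushed first and popped last; by then its two neighbors share a color, so a register is still free and nothing is spilled.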

  5. Example
     After coalesce
         // missing stack setup
         // callee-save assigns to temps
         M[$fp-12] = $s0
         // procedure body
         t3 = 1
         Loop:
         if t3 > 20 goto End
         t3 = t3 + 1
         jal bar // defs: $v0
         goto Loop
         End:
         t4 = t3 + $v0
         t5 = t4 + t3
         $v0 = t5
         // callee-save reads from temps
         $s0 = M[$fp-12]
         // sink stmt, uses callee-saves
         // and caller-saves
         // missing stack cleanup and ret

     After allocation
         // missing stack setup
         // callee-save assigns to temps
         M[$fp-12] = $s0
         // procedure body
         $s0 = 1
         Loop:
         if $s0 > 20 goto End
         $s0 = $s0 + 1
         jal bar // defs: $v0
         goto Loop
         End:
         $v0 = $s0 + $v0
         $s0 = $v0 + $s0
         $v0 = $s0
         // callee-save reads from temps
         $s0 = M[$fp-12]
         // sink stmt, uses callee-saves
         // and caller-saves
         // missing stack cleanup and ret

     Improvement #3: Live Range Splitting [Chow & Hennessy 84]
     Idea
     – Start with variables as our allocation unit
     – When a variable can’t be allocated, split it into multiple subranges for separate allocation
     – Selective spilling: put some subranges in registers, some in memory
     – Insert memory operations at boundaries
     Why is this a good idea?

     Improvement #4: Rematerialization [Chaitin 82]&[Briggs 84]
     Idea
     – Selectively re-compute values rather than loading from memory
     – “Reverse CSE”
     Easy case
     – Value can be computed in single instruction, and
     – All operands are available
     Examples
     – Constants
     – Addresses of global variables
     – Addresses of local variables (on stack)

     Next Time
     Lecture
     – Instruction scheduling
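The "easy case" of rematerialization can be sketched as a choice between recomputing a value and reloading it from its spill slot. The `(op, operand)` encoding of a definition, the op names, and the `restore_code` helper are all hypothetical; the emitted text only imitates the notation of the example listings.

```python
def restore_code(temp, definition, slot):
    """Return the statement that makes temp available again before a use.

    definition is a hypothetical (op, operand) pair describing the single
    instruction that defines temp; slot is the spill location text.
    """
    op, operand = definition
    if op == "const":
        return f"{temp} = {operand}"           # constants: recompute
    if op == "global_addr":
        return f"{temp} = &{operand}"          # address of a global variable
    if op == "frame_addr":
        return f"{temp} = $fp{operand:+d}"     # address of a local on the stack
    return f"{temp} = M[{slot}]"               # anything else: reload from memory


print(restore_code("t3", ("const", 1), "$fp-16"))
print(restore_code("t6", ("frame_addr", -12), "$fp-20"))
print(restore_code("t2", ("copy", "$s0"), "$fp-12"))
```

The three recomputed cases correspond to the examples listed above: each needs one instruction and no operand that could itself be unavailable, so no load from the spill slot is required.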
