263-2810: Advanced Compiler Design
6.0 Partial Redundancy Elimination
Thomas R. Gross, Computer Science Department, ETH Zurich, Switzerland
Outline
PRE with SSA in five (easy) steps
§ Assumption: Program in SSA form for scalar variables
- 1. Insert Φ functions for expressions
§ Introduce “temporary” to isolate expression
x = a1 + b1 ➞ E = a1 + b1; x = E
§ Identify places where different versions merge
- 2. Identify/set version numbers for expressions
§ Expressions on LHS: set version number
§ Operands of Φ functions: find correct version
(5 steps continued)
- 3. Identify places where it is legal to insert a copy of an expression
§ “legal” is defined later
- 4. Find places where the insertion of a copy is profitable
§ Only places that are legal can be considered
§ Expect removal of operations
§ Expect reduction in cycles
§ Metrics for profitability to be discussed
- 5. Exploit full redundancy of expressions
§ Transform program to reuse computed values
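Step 1 of the outline can be pictured with a tiny sketch. The Python below is illustrative only: the representation of statements as (lhs, rhs) string pairs and the temporary-naming scheme are assumptions for the example, not part of the lecture.

```python
# Hypothetical sketch of step 1: each assignment "x = a1 + b1" becomes
# "E = a1 + b1" followed by "x = E", with one temporary name per
# distinct expression text.  (Redundancy elimination happens later.)

def isolate_expressions(stmts):
    """stmts: list of (lhs, rhs) pairs; rhs is a string like 'a1 + b1'."""
    temps = {}   # expression text -> temporary name
    out = []
    for lhs, rhs in stmts:
        if "+" in rhs:                                   # compound expression
            t = temps.setdefault(rhs, "E%d" % (len(temps) + 1))
            out.append((t, rhs))                         # E = a1 + b1
            out.append((lhs, t))                         # x = E
        else:
            out.append((lhs, rhs))
    return out
```

Note that every occurrence is still computed; the temporary only isolates the expression so later steps can reason about its versions.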
6.3 Insertion of Φ functions
§ Given the CFG of a program.
§ SSA format for scalars needs dominator tree, dominance frontier.
§ Keep these for insertion of Φ functions
§ Consider an expression E = a + b in some block B.
§ Must insert a Φ function somewhere if
§ E is computed explicitly or
§ one of the operands of E (i.e., a or b) is changed in block B.
Part 1: E computed in B
§ Find dominance frontier DF(B)
§ Here {B1, B2}
§ Insert a Φ function (unless already present)
§ This step deals with all explicit computations of E in some block
[Figure: block B computes E = a1 + b1; Φ functions E = Φ( , ) are inserted in B1, B2 ∈ DF(B)]
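The placement rule above is the standard iterated-dominance-frontier worklist. A hedged Python sketch (the dominance-frontier map df is assumed to be given):

```python
def place_phis(df, def_blocks):
    """Insert a Φ for expression E in every block of the iterated
    dominance frontier of the blocks that compute E.  An inserted Φ is
    itself a new 'definition' of E, so its block goes back on the
    worklist."""
    phis, work = set(), list(def_blocks)
    while work:
        b = work.pop()
        for f in df.get(b, ()):
            if f not in phis:          # "unless already present"
                phis.add(f)
                work.append(f)
    return phis
```

For the figure above, place_phis({"B": {"B1", "B2"}}, {"B"}) yields {"B1", "B2"}.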
Part 2: Redefinition of operand(s)
§ Consider E = a + b and a (or b) is defined in block B’
§ There must be a φ function in nodes in DF(B’)
§ Consider B ∈ DF(B’)
[Figure: B’’ computes a1 = …, b1 = …, E = a1 + b1; B’ computes a2 = …; B contains a3 = φ(a1, a2) and … = a3 + b1]
Part 2: Redefinition of operand(s)
§ Insert into B (∈DF(B’)) a Φ function for all expressions E that contain a as an operand
[Figure: as above, with E = Φ( , ) inserted into B: B’’ computes a1 = …, b1 = …, E = a1 + b1; B’ computes a2 = …; B contains E = Φ( , ), a3 = φ(a1, a2), … = a3 + b1]
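Part 2 can be sketched in the same style. The data shapes are invented for illustration: defs_of maps each variable to the blocks that redefine it, and exprs maps each expression to its operands.

```python
def phis_for_operand_defs(df, exprs, defs_of):
    """For each expression E, request a Φ for E in every block of DF(B')
    for every block B' that redefines one of E's operands."""
    need = {}
    for e, operands in exprs.items():
        blocks = set()
        for v in operands:
            for bp in defs_of.get(v, ()):    # B' redefines operand v
                blocks |= df.get(bp, set())  # Φ needed in DF(B')
        need[e] = blocks
    return need
```

For the figure above (a redefined in B’, with B ∈ DF(B’)), the expression a + b gets a Φ in B.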
6.4 Version numbers for expressions
§ If E appears on the LHS: decide if a new version is needed or if the current version should be used
§ “current”: version computed in a dominating node, with no operand redefined since
§ For the operands of a Φ function:
§ Identify correct version
§ Recall: for functions (for scalars) we used a stack of versions
§ Array [variable] of stacks
§ Use stack for operands of expression E to figure out which version is used
§ Consider 𝔼, the set of all expressions of interest in the program
§ stack[E]: one stack for each expression
§ E ∈ 𝔼: stack[E] is a stack of versions (integers)
§ Given an expression E = a + b, after turning the program into SSA for scalars, the versions of a and b are known
§ E = ai + bj
§ If we have processed E = ai + bj before (with these versions of a, b) then we use the version k of E on top of stack[E]
§ Ek = ai + bj
§ ai, bj must be the current versions of a, b
§ Ek must be the current version of E
§ So use version k of E if
§ stack(a).top = i
§ stack(b).top = j
§ stack(E).top = k with Ek = ai + bj
§ If E = ai + bj has not been processed before, i.e.,
§ stack(a).top = i
§ stack(b).top = j
§ stack(E).top = k with Ek ≠ ai + bj (i.e., operand versions changed but we have not yet created a new version for E), then
§ Get a new version (say m)
§ stack(E).push(m)
§ Use version m of E: Em
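A minimal sketch of this lookup (the data structures are invented for illustration): table records which operand versions built each version of E, and counter hands out fresh version numbers.

```python
def version_of(E, i, j, stacks, table, counter):
    """Version of E = a_i + b_j: reuse the top of stack[E] when it was
    built from the same operand versions; otherwise push a fresh one."""
    top = stacks[E][-1] if stacks[E] else None
    if top is not None and table.get((E, top)) == (i, j):
        return top                        # E_k = a_i + b_j is still current
    counter[E] = counter.get(E, 0) + 1    # operand versions changed
    m = counter[E]
    stacks[E].append(m)                   # stack(E).push(m)
    table[(E, m)] = (i, j)
    return m
```

Processing a + b twice with the same operand versions reuses version 1; a changed operand version yields version 2.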
Example
[Figure: B’’ computes a1 = …, b1 = …, E2 = a1 + b1; B’ computes a2 = … and E = a2 + b1; another block computes E = a1 + b1]
Operands of Φ function
§ Visit nodes in depth-first order
§ Use dominator tree
§ Like when handling φ functions
§ After processing a block (node), record expression version for Φ function(s) in successor blocks that are not dominated
§ Make sure argument to Φ function reflects expression set along corresponding path
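This walk can be sketched as follows. The sketch is simplified (version_at is assumed to be precomputed per block, and the version-stack pushing/popping of the full algorithm is omitted); all names are illustrative.

```python
def fill_phi_operands(cfg_succ, dom_children, entry, version_at, phi_blocks):
    """Walk the dominator tree depth-first; after processing a block,
    record the expression version leaving it as a Φ operand in every
    CFG successor that holds a Φ (the edge carries that path's version)."""
    phi_args = {b: [] for b in phi_blocks}

    def dfs(b):
        for s in cfg_succ.get(b, ()):
            if s in phi_args:                 # successor has a Φ for E
                phi_args[s].append(version_at[b])
        for c in dom_children.get(b, ()):
            dfs(c)

    dfs(entry)
    return phi_args
```

With a diamond START → {A, B} → M and a Φ in M, the Φ collects the versions reaching M along the A edge and the B edge.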
Example
[Figure: loop example — START: a1 = …, b1 = …; loop header: a2 = φ(a4, a1), E = a2 + b1; loop body: a3 = …; merge: a4 = φ(a2, a3), E = a4 + b1; back edge to header; EXIT]
6.5 Are copies legal at point P?
§ Given a program in SSA format with φ and Φ functions
§ Versions of scalars and expressions have been determined
§ Given a basic block, let P be the point at the start of the block. A Φ function for E at point P with (one or more) ⊥ operands indicates that
§ the value of expression E is undefined if control reaches P along the path that corresponds to the ⊥ operand
§ expression E is defined if control reaches P along a path that corresponds to version Ek of E
§ A predecessor basic block that corresponds to a ⊥ operand is a candidate to place a copy of E
Example
[Figure: one predecessor computes a1 = …, b1 = …, E1 = a1 + b1; the other path is empty; at point P in block B: E2 = Φ(⊥, E1), then … = E2; EXIT]
§ A copy of E can be inserted into B
§ Or any of its predecessors
§ (As long as the operands of E are available)
§ Is inserting a copy (into B) legal?
§ “Gold standard”: insertion of a copy must not change the program (results)
Detour: “must not change the program”
§ Given program P, transformed program T(P) = P’
§ Transformation is legal if
§ P computes the same result as P’
§ Same output
§ Returns no new errors
§ Throws no new exceptions
§ Termination behavior unchanged
§ Allow that P’ encounters an error earlier or later (than P)
§ Allow that P’ throws an exception earlier or later
§ Note: values stored in memory by P may differ from values stored by P’
§ May accept that values after error/exception differ
Example
[Figure: one predecessor computes a1 = …, b1 = …, E1 = a1 + b1; at P in block B: E2 = Φ(⊥, E1); a successor computes a2 = …, E3 = a2 + b1 before EXIT]
Legal copies
§ Could we insert a copy of E into B?
§ We do not know anything about the effect of E.
§ Might throw an exception
§ Might raise an error (overflow, memory protection error, …)
§ A copy of E can be inserted into B if E is evaluated along all paths from B to EXIT
Example
[Figure: as before, with an additional block B’ on another path to EXIT that computes a2 = …, E3 = a2 + b1]
§ Our model of legality: insert into B only if E is evaluated on all paths from B to EXIT
§ Blocks with this property are called “downsafe”
§ Earlier papers use the phrase “E is anticipated in B”
§ Insertion is legal iff B is downsafe
Downsafety
§ Check if there is a path from B to EXIT that does not include E
§ If E occurs only as the operand of a Φ function in block B’: check that E is used in B’ or that B’ is downsafe
§ We say a Φ function in block B’ is downsafe if E appears on the RHS of a statement in B’ or B’ is downsafe
§ Can insert copies for Φ functions that are downsafe
Downsafety
§ Need to check that along all paths from B to EXIT there is a real occurrence of E or a downsafe Φ function
§ Simple algorithm:
§ Start at EXIT
§ Visit recursively all predecessor nodes, until all nodes have been visited
§ Mark a Φ function as downsafe iff a real occurrence or a downsafe Φ function appears on all paths to EXIT
§ After visiting all nodes: downsafe Φ functions are marked
§ Downsafety is a necessary condition
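The marking above can also be phrased as a backward, all-paths dataflow fixpoint. A hedged sketch (block names and the occurs map are illustrative; every non-EXIT block is assumed to have at least one successor):

```python
def downsafe(succ, occurs, exit_block):
    """ds[b] holds iff every path from b to EXIT meets a real occurrence
    of E first: occurs[b], or b != EXIT and all successors are downsafe.
    Greatest fixpoint: start optimistic, then lower until stable."""
    blocks = set(succ) | {s for ss in succ.values() for s in ss}
    ds = {b: b != exit_block for b in blocks}     # optimistic start
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new = occurs.get(b, False) or (
                b != exit_block and all(ds[s] for s in succ.get(b, ())))
            if new != ds[b]:
                ds[b], changed = new, True
    return ds
```

A block that reaches EXIT only through a real occurrence of E is marked downsafe; a block with an E-free path to EXIT is not.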
Example
[Figure: chain of Φ functions — … = E1; E2 = Φ(E1, ⊥); … = E2; E3 = Φ(E2, ⊥); … = E3; EXIT]
6.6 Profitable copy insertion
§ Goal: insert a copy of E into basic block if
1. Basic block is downsafe (for E)
2. Optimization reduces execution time
§ Problem: (2) depends on later phases of compiler
§ Loose coupling between IR and machine code
A copy may not be profitable because …
§ Expression E is evaluated only once, but the second evaluation happened “for free”
§ During a stall/memory fetch
§ Using ILP (Instruction Level Parallelism) that is otherwise unused
§ Insertion/removal of expressions may change the memory layout
§ Code (binary) may shrink or grow
§ Instruction cache may miss more (less likely with current-size I-caches)
§ Conflict misses may increase
§ Register pressure may suppress benefits
§ No need to re-evaluate E at some point P; use the value computed at P’
§ Must keep the value live from P’ to P
§ … but there is no free register, so the value is stored in memory
§ [sometimes] cost of spilling/restoring a register > cost of evaluation
§ … 2nd-order effects matter sometimes
§ Lowering the IR (to model more machine properties) is an option, but brings new problems
Possible solutions
§ Ignore phase ordering problem and insert a copy of E wherever legal (downsafe)
§ + : no extra analysis
§ − : 2nd-order effects may result in slowdown
Possible solutions
§ Estimate effect of insertion
§ Consider properties of the block that receives a copy of E
§ “Environment”
§ Number of instructions/operations
§ Frequency of execution of the block
§ Insert if target block pays “low” expected overhead
§ (See 7.0 Compiling with dynamic information)
6.7 Transformations enabled by PRE
§ Assume φ and Φ functions inserted, expression versions identified, downsafety information computed, and a block B with a Φ function with ⊥ argument(s)
§ Decided to insert copy of E into predecessors of B
§ Insert assignment tempE = E
§ Process basic blocks B’ s.t. some version Ei of E is used (read) in B’
§ Remove computation of E, replace with reading tempE
§ Bookkeeping similar to def-use (du) chains
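For straight-line code, the final step can be sketched as follows (a simplified sketch with invented names; real compilers drive this with def-use chains across the CFG rather than a linear scan):

```python
def reuse_temp(stmts, expr, temp):
    """Keep the first evaluation of expr, capture it in temp, and replace
    every later recomputation by a read of temp."""
    out, have = [], False
    for lhs, rhs in stmts:
        if rhs == expr:
            if have:
                out.append((lhs, temp))      # reuse the computed value
            else:
                out.append((temp, expr))     # tempE = E
                out.append((lhs, temp))
                have = True
        else:
            out.append((lhs, rhs))
    return out
```

Applied to a = b + j; c = a - j; d = b + j, the second b + j becomes a read of the temporary.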
6.8 Example
§ SSA with φ functions and version numbers for variables
§ Φ functions and versions for expressions
§ Downsafety
§ Insertion of copies
§ Replace expressions with temporaries
[Figure: CFG before PRE — START: a = …; one path computes … = a + b; another path computes a = …, then … = a + b; EXIT]
[Figure: transformed CFG — START: a1 = …, t1 = a1 + b0; merge: a2 = φ(a1, a4), E3 = Φ(⊥, E2), t3 = φ(t1, t4), … = [E3] t3; body: a3 = …, t2 = a3 + b0; merge: a4 = φ(a3, a2), E2 = Φ(⊥, E3), t4 = φ(t2, t3), … = [E2] t4; EXIT]
Comments
§ PRE provides (compiler) engineering benefit
§ One optimization to deal with local and global optimizations
§ One optimization that subsumes several distinct optimizations
§ Downsafety constraint (when inserting copies into basic blocks) guarantees that transformed program is computationally optimal
§ Number of operations cannot be reduced further by safe code motion
§ In reality, we would like to see programs that are lifetime optimal
§ Computationally optimal and minimal live ranges for all temporaries
Beyond PRE
§ Are there other opportunities for optimization (in the CFG context)?
§ Consider this program fragment:

j = m;
while (j < n) {
  a = b + j;
  c = a - j;
  d = b + j;
  j++;
}
§ Same fragment with PRE (the recomputation of b + j is replaced by reusing a):

j = m;
while (j < n) {
  a = b + j;
  c = a - j;
  d = a;
  j++;
}
§ Same fragment with algebraic simplification (re-association):
c = a - j ⟹ c = (b + j) - j ⟹ c = b

j = m;
while (j < n) {
  a = b + j;
  c = b;
  d = b + j;
  j++;
}
Now with PRE

j = m;
c = b;
while (j < n) {
  a = b + j;
  d = a;
  j++;
}
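One way to sanity-check the transformation is to run both fragments side by side. This is a direct Python transcription of the slide fragments (it assumes the loop body executes at least once, so a and d are defined):

```python
def original(b, m, n):
    # the untransformed fragment
    j = m
    while j < n:
        a = b + j
        c = a - j
        d = b + j
        j += 1
    return a, c, d, j

def transformed(b, m, n):
    # after re-association (c = b, hoisted out of the loop) and PRE (d = a)
    j = m
    c = b
    while j < n:
        a = b + j
        d = a
        j += 1
    return a, c, d, j
```

Both versions compute the same final values of a, c, d, and j for any inputs where the loop runs.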