263-2810: Advanced Compiler Design 6.0 Partial redundancy elimination

SLIDE 1

263-2810: Advanced Compiler Design 6.0 Partial redundancy elimination

Thomas R. Gross Computer Science Department ETH Zurich, Switzerland

SLIDE 2

Outline

PRE with SSA in five (easy) steps

§ Assumption: Program in SSA form for scalar variables

  • 1. Insert Φ functions for expressions

§ Introduce “temporary” to isolate expression

x = a1 + b1 ➞ E = a1 + b1; x = E

§ Identify places where different versions merge

  • 2. Identify/set version numbers for expressions

§ Expressions on LHS: set version number
§ Operands of Φ functions: find correct version
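The "temporary" rewrite in step 1 can be illustrated with a tiny sketch. The function name `isolate_expression` and the string representation of statements are assumptions made only for this illustration; they do not come from the slides.

```python
def isolate_expression(dest, lhs, op, rhs, temp="E"):
    """Split `dest = lhs op rhs` into `temp = lhs op rhs` and `dest = temp`.

    Hypothetical helper: statements are plain strings here, not real IR nodes.
    """
    return [f"{temp} = {lhs} {op} {rhs}", f"{dest} = {temp}"]


print(isolate_expression("x", "a1", "+", "b1"))  # ['E = a1 + b1', 'x = E']
```

After this rewrite every occurrence of the expression has the same left-hand side E, so the later steps can treat E like a variable that needs Φ functions and versions.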

SLIDE 3

(5 steps continued)

  • 3. Identify places where it is legal to insert a copy of an expression

§ “legal” is defined later

  • 4. Find places where the insertion of a copy is profitable

§ Only places that are legal can be considered
§ Expect removal of operations
§ Expect reduction in cycles
§ Metrics for profitability to be discussed

  • 5. Exploit full redundancy of expressions

§ Transform program to reuse computed values

SLIDE 4

6.3 Insertion of Φ functions

§ Given the CFG of a program.

§ SSA format for scalars needs dominator tree, dominance frontier.
§ Keep both for the insertion of Φ functions

§ Consider an expression E = a + b in some block B.
§ Must insert a Φ function somewhere if

§ E is computed explicitly or
§ One of the operands of E (i.e., a or b) is changed in block B.

SLIDE 6

Part 1: E computed in B

§ Find dominance frontier DF(B)

§ Here {B1, B2}

§ Insert a Φ function (unless already present)
§ This step deals with all explicit computations of E in some block

[Figure: block B contains E = a1 + b1. Each block in DF(B), here B1 and B2, receives E = Φ( , ).]
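Part 1 can be sketched as follows, assuming the CFG is given as a dictionary from block name to successor list and that every block is reachable from the entry. The function names and the example CFG (an extra block X that gives B1 and B2 a second predecessor) are illustrative assumptions. The sketch uses the textbook definition of the dominance frontier and only places Φ functions in DF(B) as on the slide; a full SSA construction would usually iterate, treating each new Φ as a further definition.

```python
def dominators(cfg, entry):
    """Return (dom, preds); dom[n] is the set of blocks that dominate n."""
    preds = {n: [p for p in cfg if n in cfg[p]] for n in cfg}
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            new = set(cfg)
            for p in preds[n]:
                new &= dom[p]          # intersection over all predecessors
            new.add(n)                 # every block dominates itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def dominance_frontier(cfg, entry):
    """DF(b): blocks y such that b dominates a predecessor of y
    but does not strictly dominate y."""
    dom, preds = dominators(cfg, entry)
    df = {n: set() for n in cfg}
    for y in cfg:
        for p in preds[y]:
            for b in dom[p]:
                if b == y or b not in dom[y]:
                    df[b].add(y)
    return df


def phi_sites_for_computations(cfg, entry, computing_blocks):
    """Blocks that receive a Φ for E because E is computed in computing_blocks."""
    df = dominance_frontier(cfg, entry)
    sites = set()
    for b in computing_blocks:
        sites |= df[b]
    return sites


# Hypothetical CFG: B computes E; B1 and B2 are also reachable via X.
cfg = {
    "START": ["B", "X"],
    "B": ["B1", "B2"],
    "X": ["B1", "B2"],
    "B1": ["EXIT"],
    "B2": ["EXIT"],
    "EXIT": [],
}
print(sorted(phi_sites_for_computations(cfg, "START", {"B"})))  # ['B1', 'B2']
```

The quadratic dominator computation is chosen for brevity; a production compiler would reuse the dominator tree already built for the scalar φ functions.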

SLIDE 8

Part 2: Redefinition of operand(s)

§ Consider E = a + b and a (or b) is defined in block B’

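§ There must be a φ function for a in the nodes of DF(B’)
§ Consider B ∈ DF(B’)

[Figure: CFG with blocks B’’, B’ and B. Statements shown: a1 = …, b1 = …, E = a1 + b1, a2 = … (the redefinition in B’), and in the merge block B: a3 = φ(a1,a2) followed by … = a3 + b1.]

SLIDE 9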

Part 2: Redefinition of operand(s)

§ Insert into B (∈DF(B’)) a Φ function for all expressions E that contain a as an operand

[Figure: CFG with blocks B’’, B’ and B. Statements shown: a1 = …, b1 = …, E = a1 + b1, a2 = …, and in block B: E = Φ( , ), a3 = φ(a1,a2), … = a3 + b1.]
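Part 2 amounts to a lookup: wherever the scalar SSA construction placed a φ for a variable, every expression with that variable as an operand gets a Φ in the same block. A minimal sketch, assuming the φ placement is already available as a dictionary; the names `scalar_phis`, `expressions` and the second expression F are made up for the example.

```python
def expression_phi_sites(scalar_phis, expressions):
    """Map each block to the expressions that need a Φ there.

    scalar_phis: block -> set of variables that have a φ in that block
    expressions: expression name -> tuple of operand variables
    """
    sites = {}
    for block, variables in scalar_phis.items():
        needed = {e for e, ops in expressions.items() if variables & set(ops)}
        if needed:
            sites[block] = needed
    return sites


scalar_phis = {"B": {"a"}}                         # a3 = φ(a1, a2) lives in B
expressions = {"E": ("a", "b"), "F": ("c", "b")}   # E = a + b, F = c + b
print(expression_phi_sites(scalar_phis, expressions))  # {'B': {'E'}}
```

F receives no Φ in B because neither of its operands has a φ there.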

SLIDE 10

6.4 Version numbers for expressions

§ If E appears on the LHS: get a new version
§ For the operands of a Φ function:

§ Identify correct version

§ Recall: for φ functions (for scalars) we used a stack of versions

§ Array [variable] of stacks

§ Use stack for operands of expression E to figure out which version is used

SLIDE 14

6.4 Version numbers for expressions

§ If E appears on the LHS: decide if a new version is needed or if the current version should be used

§ “current”: Version computed in a dominating node and no operand has been redefined

§ For the operands of a Φ function:

§ Identify correct version

SLIDE 15

§ Recall: for φ functions (for scalars) we used a stack of versions

§ Array [V] of stacks, with V the set of all variables

§ Let ℰ be the set of all expressions of interest in the program

§ stack[ℰ]: one stack for each expression
§ E ∈ ℰ: stack[E] is a stack of versions (integers)

§ Given an expression E = a + b, after turning program into SSA for scalars, versions of a and b are known

§ E = a_i + b_j

§ If we have processed E = a_i + b_j before (with these versions of a, b) then we use the version k of E on top of stack[E]

§ E_k = a_i + b_j
§ a_i, b_j must be current versions of a, b
§ E_k must be the current version of E

SLIDE 16

§ So use version k of E if

§ stack[a].top = i
§ stack[b].top = j
§ and stack[E].top = k with E_k = a_i + b_j

SLIDE 17

§ If E = a_i + b_j has not been processed before, i.e.,

§ stack[a].top = i
§ stack[b].top = j
§ and stack[E].top = k with E_k ≠ a_i + b_j (i.e., operand versions changed but we do not find a new version for E) then

§ Get new version (say m)
§ stack[E].push(m)
§ Use version m of E: E_m
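The reuse-or-push rule can be sketched for a single expression E = a + b. The class name and the `operands` side table (which remembers the operand versions behind each version of E) are assumptions of this sketch; popping the stack when the dominator-tree walk leaves a block is left out.

```python
class ExpressionVersions:
    """Version stack for one expression E = a + b (illustrative only)."""

    def __init__(self):
        self.stack = []         # versions of E, current version on top
        self.operands = {}      # version k -> (i, j) with E_k = a_i + b_j
        self.next_version = 1

    def version_for(self, i, j):
        """Version of E to use for an occurrence E = a_i + b_j."""
        if self.stack and self.operands[self.stack[-1]] == (i, j):
            return self.stack[-1]       # same operand versions: reuse E_k
        m = self.next_version           # operand versions changed: new E_m
        self.next_version += 1
        self.operands[m] = (i, j)
        self.stack.append(m)
        return m


versions = ExpressionVersions()
print(versions.version_for(1, 1))  # 1  (first occurrence of a1 + b1)
print(versions.version_for(1, 1))  # 1  (same operands, version reused)
print(versions.version_for(2, 1))  # 2  (a was redefined, new version)
```

Here i and j are assumed to be the tops of stack[a] and stack[b] at the occurrence, as they would be after SSA renaming of the scalars.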

SLIDE 18

Example

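[Figure: CFG with blocks including B’’ and B’. Statements shown: a1 = …, b1 = …, E2 = a1 + b1, a2 = …, E = a2 + b1, E = a1 + b1. The last two occurrences of E carry no version number yet.]

SLIDE 19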

Operands of Φ function

§ Visit nodes in depth-first order

§ Use dominator tree
§ Like when handling φ functions

§ After processing a block (node), record expression version for Φ function(s) in successor blocks that are not dominated

§ Make sure argument to Φ function reflects expression set along corresponding path
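The walk described above might look like the following sketch. It makes several simplifying assumptions that are not in the slides: a single expression E, blocks described by a list of events ("E" for a real occurrence, "kill" for a redefinition of an operand), a precomputed dominator tree `dom_children`, and `None` standing for ⊥. It records an operand for every successor that carries a Φ.

```python
BOTTOM = None  # stands for the ⊥ operand


def fill_phi_operands(cfg, dom_children, events, phi_blocks, entry):
    """Assign versions to Φ results and fill in the Φ operands per predecessor."""
    counter = 0
    stack = [BOTTOM]                    # current version of E, ⊥ if none
    phi_version = {}
    phi_args = {b: {} for b in phi_blocks}

    def visit(block):
        nonlocal counter
        pushed = 0
        if block in phi_blocks:         # a Φ on the LHS defines a new version
            counter += 1
            phi_version[block] = counter
            stack.append(counter)
            pushed += 1
        for event in events.get(block, []):
            if event == "kill":         # operand redefined: E not available
                stack.append(BOTTOM)
                pushed += 1
            elif event == "E" and stack[-1] is BOTTOM:
                counter += 1            # real occurrence gets a new version
                stack.append(counter)
                pushed += 1
        for succ in cfg[block]:         # record version along each CFG edge
            if succ in phi_blocks:
                phi_args[succ][block] = stack[-1]
        for child in dom_children.get(block, []):
            visit(child)
        for _ in range(pushed):         # restore the stack when leaving
            stack.pop()

    visit(entry)
    return phi_version, phi_args


# Hypothetical diamond: only the left branch L computes E; P holds the Φ.
cfg = {"START": ["L", "R"], "L": ["P"], "R": ["P"], "P": ["EXIT"], "EXIT": []}
dom_children = {"START": ["L", "R", "P"], "P": ["EXIT"]}
events = {"L": ["E"]}
print(fill_phi_operands(cfg, dom_children, events, {"P"}, "START"))
# ({'P': 2}, {'P': {'L': 1, 'R': None}})
```

The result corresponds to E2 = Φ(E1, ⊥), with the ⊥ operand arriving from the branch that never computes E.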

SLIDE 20

Example

[Figure: CFG from START to EXIT. Statements shown: a1 = …, b1 = …, E = a2 + b1, a3 = …, a4 = φ(a2,a3), E = a4 + b1, a2 = φ(a4,a1).]

SLIDE 21

6.5 Are copies legal at point P?

§ Given a program in SSA format with φ and Φ functions
§ Versions of scalars and expressions have been determined
§ Given a basic block, let P be the point at the start of the block. A Φ function for E at point P with (one or more) ⊥ operands indicates that

§ Value of expression E is undefined if control reaches P along a path that corresponds to a ⊥ operand
§ Expression E is defined if control reaches P along a path that corresponds to version Ek of E

§ The predecessor basic block that corresponds to a ⊥ operand is a candidate to place a copy of E

SLIDE 22

Example

[Figure: CFG ending in EXIT. Statements shown: a1 = …, b1 = …, E1 = a1 + b1, E2 = Φ(⊥,E1) at point P, followed by … = E2. Labels: P, B, EXIT.]

SLIDE 24

§ A copy of E can be inserted into B

§ Or any of its predecessors
§ (As long as operands of E are available)

§ Is inserting a copy (into B) legal?
§ “Gold standard”: insertion of a copy must not change the program (results)

SLIDE 25

Detour: “must not change the program”

§ Given program P, transformed program T(P) = P’
§ Transformation is legal if

§ P computes the same result as P’

§ Same output

§ Returns no new errors
§ Throws no new exceptions
§ Termination behavior unchanged

§ Allow that P encounters an error earlier or later
§ Allow that P throws an exception earlier or later
§ Note: values stored in memory by P may differ from values stored by P’

§ May accept that values after error/exception differ

SLIDE 27

Example

[Figure: CFG ending in EXIT. Statements shown: a1 = …, b1 = …, E1 = a1 + b1, E2 = Φ(⊥,E1) at point P, and a2 = … followed by E3 = a2 + b1. Labels: P, B, EXIT.]

SLIDE 29

Legal copies

§ Could we insert a copy of E into B?
§ We do not know anything about the effect of E.

§ Might throw an exception
§ Might raise an error (overflow, memory protection error, …)

§ A copy of E can be inserted into B if E is evaluated along all paths from B to EXIT

SLIDE 30

Example

[Figure: CFG ending in EXIT. Statements shown: a1 = …, b1 = …, E1 = a1 + b1, E2 = Φ(⊥,E1) at point P, and a2 = … followed by E3 = a2 + b1. Labels: P, B, B’, EXIT.]

SLIDE 31

§ Our model of legality: insert into B only if there is a copy of E on all paths from B to EXIT

§ Blocks with this property are called “downsafe”
§ Earlier papers use the phrase “E anticipated in B”

§ Insertion is legal iff B is downsafe

SLIDE 32

Downsafety

§ Check if there is a path from B to EXIT that does not include E
§ If E occurs only as the operand to a Φ function in block B’: check that E is used in B’ or B’ is downsafe.
§ We say a Φ function in block B’ is downsafe if E appears on the RHS of a statement in B’ or B’ is downsafe.

§ Can insert copies for Φ functions that are downsafe

SLIDE 33

Downsafety

§ Need to check that along all paths from B to EXIT there is a real occurrence of E or a downsafe Φ function
§ Simple algorithm:

§ Start at EXIT
§ Visit recursively all predecessor nodes, until all nodes have been visited
§ Mark a Φ function as downsafe iff a real occurrence or a downsafe Φ function appears on all paths to EXIT
§ After visiting all nodes: downsafe Φ functions are marked

§ Downsafety is a necessary condition
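The check can also be phrased as a standard backward data-flow problem (anticipability), which is what the sketch below does; this is a swapped-in formulation rather than the slide's recursive visit from EXIT. It works on whole blocks instead of individual Φ functions, and the sets `evaluates` (blocks that evaluate E before redefining an operand) and `redefines` (blocks that redefine an operand first) are assumed inputs.

```python
def downsafe_blocks(cfg, evaluates, redefines):
    """Blocks at whose entry E is evaluated on every path to the exit."""
    safe = {b: True for b in cfg}           # optimistic initial guess
    changed = True
    while changed:
        changed = False
        for b in cfg:
            succs = cfg[b]
            # E is anticipated at the end of b only if b has successors
            # and E is anticipated at the entry of all of them.
            out = bool(succs) and all(safe[s] for s in succs)
            new = b in evaluates or (b not in redefines and out)
            if new != safe[b]:
                safe[b] = new
                changed = True
    return {b for b in cfg if safe[b]}


# Hypothetical CFG: P evaluates E; B' redefines an operand before reaching EXIT.
cfg = {
    "START": ["A", "B"],
    "A": ["P"],
    "B": ["P", "B'"],
    "B'": ["EXIT"],
    "P": ["EXIT"],
    "EXIT": [],
}
print(sorted(downsafe_blocks(cfg, evaluates={"A", "P"}, redefines={"B'"})))
# ['A', 'P']
```

B is not downsafe here because the path B, B’, EXIT never evaluates the old version of E. If the edge from B to B’ is removed, B becomes downsafe.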

SLIDE 34

Example

[Figure: CFG ending in EXIT. Statements shown: … = E3, E3 = Φ(E2,⊥), … = E2, E2 = Φ(E1,⊥), … = E1.]

SLIDE 35

6.6 Profitable copy insertion

§ Goal: insert a copy of E into basic block if

1. Basic block is downsafe (for E)
2. Optimization reduces execution time

§ Problem: (2) depends on later phases of compiler

§ Loose coupling between IR and machine code

SLIDE 36

A copy may not be profitable because …

§ Expression E evaluated only once – but second evaluation happened “for free”

§ During a stall/memory fetch
§ Using ILP (Instruction Level Parallelism) that’s otherwise unused

§ Insertion/removal of expressions may change the memory layout

§ Code (binary) may shrink or grow
§ Instruction cache may miss more (less likely with current I-cache sizes)
§ Conflict misses may increase

SLIDE 37

§ Register pressure may suppress benefits

§ No need to re-evaluate E at some point P, use value computed at P’
§ Must keep the value live from P’ all the way to P
§ … but there is no register, so the value is stored in memory
§ [sometimes] cost of spilling/restoring register > cost of evaluation

§ … 2nd order effects matter sometimes

§ Lowering the IR (to model more machine properties) is an option, but it brings new problems

SLIDE 38

Possible solutions

§ Ignore phase ordering problem and insert a copy of E wherever legal (downsafe)

§ + : no extra analysis
§ - : 2nd order effects may result in slowdown

SLIDE 39

Possible solutions

§ Estimate effect of insertion
§ Consider properties of block that receives a copy of E

§ “Environment”
§ Number of instructions/operations
§ Frequency of execution of a block

§ Insert if target block pays “low” expected overhead
§ (See 7.0 Compiling with dynamic information)

SLIDE 40

6.7 Transformations enabled by PRE

§ Assume φ and Φ functions inserted, expression versions identified, downsafety information computed, block B with Φ function with ⊥ argument(s)

§ Decided to insert copy of E into predecessors of B
§ Insert assignment tempE = E

§ Process basic blocks B’ s.t. some version Ei of E is used (read) in B’

§ Remove computation of E, replace with reading tempE
§ Bookkeeping similar to du chains
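The rewrite can be sketched on a toy program. This sketch assumes statements are `(dest, rhs)` pairs with the right-hand side kept as a string, ignores SSA versions, and expects the caller to say where copies go (`insert_into`) and where E has become fully redundant (`reuse_in`); all of these names are illustrative.

```python
def apply_pre(blocks, expr, insert_into, reuse_in, temp="tempE"):
    """Insert `temp = expr` copies and replace redundant computations."""
    result = {}
    for name, stmts in blocks.items():
        new_stmts = []
        for dest, rhs in stmts:
            if rhs != expr:
                new_stmts.append((dest, rhs))     # unrelated statement
            elif name in reuse_in:
                new_stmts.append((dest, temp))    # read the saved value
            else:
                new_stmts.append((temp, expr))    # compute once and save
                new_stmts.append((dest, temp))
        if name in insert_into:
            new_stmts.append((temp, expr))        # inserted copy of E
        result[name] = new_stmts
    return result


# Hypothetical diamond: B1 computes a + b, B2 does not, B3 recomputes it.
blocks = {"B1": [("x", "a + b")], "B2": [], "B3": [("y", "a + b")]}
for name, stmts in apply_pre(blocks, "a + b", {"B2"}, {"B3"}).items():
    print(name, stmts)
# B1 [('tempE', 'a + b'), ('x', 'tempE')]
# B2 [('tempE', 'a + b')]
# B3 [('y', 'tempE')]
```

After the copy in B2, the value of a + b is available along both paths into B3, so the computation in B3 reduces to a read of tempE.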

SLIDE 41

6.8 Example

§ SSA with φ functions and version numbers for variables
§ Φ functions and versions for expressions
§ Downsafety
§ Insertion of copies
§ Replace expressions with temporaries

SLIDE 42

[Figure: example CFG from START to EXIT before any transformation. Statements shown: a = …, … = a + b, a = …, … = a + b.]

SLIDE 43

[Figure: the example CFG from START to EXIT with φ and Φ functions and temporaries. Statements shown: a1 = …, t1 = a1 + b0, …, … = [E3] t3, a3 = …, t2 = a3 + b0, a4 = φ(a3,a2), E2 = Φ(⊥,E3), t4 = φ(t2,t3), … = [E2] t4, a2 = φ(a1,a4), E3 = Φ(⊥,E2), t3 = φ(t1,t4), …]

SLIDE 44

Comments

§ PRE provides (compiler) engineering benefit

§ One optimization to deal with local and global optimizations
§ One optimization that subsumes several distinct optimizations

§ Downsafety constraint (when inserting copies into basic blocks) guarantees that transformed program is computationally optimal

§ Number of operations cannot be reduced further by safe code motion

§ In practice, we would like to see programs that are lifetime optimal

§ Computationally optimal and minimal live ranges for all temporaries

SLIDE 45

Beyond PRE

§ Are there other opportunities for optimization (in the CFG context)?
§ Consider this program fragment:

    j = m;
    while (j < n) {
      a = b + j;
      c = a - j;
      d = b + j;
      j++;
    }

SLIDE 46

§ Same fragment with PRE:

    j = m;
    while (j < n) {
      a = b + j;
      c = a - j;
      d = b + j;
      j++;
    }

SLIDE 47

§ Same fragment with algebraic simplification (re-association): c = a - j ⟹ c = (b + j) - j ⟹ c = b

    j = m;
    while (j < n) {
      a = b + j;
      c = b;
      d = b + j;
      j++;
    }

SLIDE 48

Now with PRE

    j = m;
    c = b;
    while (j < n) {
      a = b + j;
      d = a;
      j++;
    }
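A quick way to convince oneself that the final fragment agrees with the original is to run both and compare the values of a, c and d in every iteration. The sketch below does this in Python; the function names are illustrative, and it assumes exact integer arithmetic, since re-association is not generally valid for floating point or where overflow is an error.

```python
def original(b, m, n):
    """The fragment as first shown; records (a, c, d) per iteration."""
    trace = []
    j = m
    while j < n:
        a = b + j
        c = a - j
        d = b + j
        j += 1
        trace.append((a, c, d))
    return trace


def optimized(b, m, n):
    """After re-association and PRE: c hoisted out of the loop, d reuses a."""
    trace = []
    j = m
    c = b
    while j < n:
        a = b + j
        d = a
        j += 1
        trace.append((a, c, d))
    return trace


print(original(5, 0, 3))   # [(5, 5, 5), (6, 5, 6), (7, 5, 7)]
print(optimized(5, 0, 3))  # [(5, 5, 5), (6, 5, 6), (7, 5, 7)]
```

One caveat is visible in the code: the optimized version assigns c even when the loop body never runs (m ≥ n), whereas the original leaves c untouched in that case, so hoisting c = b is only safe if c is not read afterwards with its old value or the loop is known to execute at least once.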
