CS502: Compiler Design, Code Optimization. Manas Thakur, Fall 2020 (PowerPoint presentation)



SLIDE 1

CS502: Compiler Design Code Optimization Manas Thakur

Fall 2020

SLIDE 2

  • Fast. Faster. Fastest?

Character stream → Lexical Analyzer → Token stream → Syntax Analyzer → Syntax tree → Semantic Analyzer → Syntax tree → Intermediate Code Generator → Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation → Code Generator → Target machine code → Machine-Dependent Code Optimizer → Target machine code

(The Symbol Table is shared by all phases. Front end: lexical analysis through intermediate code generation; back end: optimization and code generation.)

SLIDE 3

Role of Code Optimizer

  • Make the program better

– time, memory, energy, ...

  • No guarantees in this land!

– Will a particular optimization for sure improve something?
– Will performing an optimization affect something else?
– In what order should I perform the optimizations?
– What “scope” to perform a certain optimization at?
– Is the optimizer fast enough?

  • Can an optimized program be optimized further?
SLIDE 4

Full employment theorem for compiler writers

  • Statement: There is no fully optimizing compiler.
  • Assume one exists:

– such that it transforms a program P to the smallest program Opt(P) that has the same behaviour as P.
– The halting problem comes to the rescue:

  • The smallest program that never halts is:

L1: goto L1

– Thus, a fully optimizing compiler could solve the halting problem: to decide whether a given program P ever halts, simply check whether Opt(P) is L1: goto L1!
– But HP is an undecidable problem.
– Hence, a fully optimizing compiler can’t exist!

  • Therefore we talk just about an optimizing compiler.

– and keep working without worrying about future prospects!

SLIDE 5

How to perform optimizations?

  • Analysis

– Go over the program
– Identify some properties

  • Potentially useful properties

  • Transformation

– Use the information computed by the analysis to transform the program

  • without affecting the semantics

  • An example that we have (not literally) seen:

– Compute liveness information
– Delete assignments to variables that are dead

SLIDE 6

Classifying optimizations

  • Based on scope:

– Local to basic blocks
– Intraprocedural
– Interprocedural

  • Based on positioning:

– High-level (transform source code or high-level IR)
– Low-level (transform mid/low-level IR)

  • Based on (in)dependence w.r.t. target machine:

– Machine independent (general enough)
– Machine dependent (specific to the architecture)

SLIDE 7

May versus Must information

  • Consider the program:

if (c) {
    a = ...
    b = ...
} else {
    a = ...
    c = ...
}

  • Which variables may be assigned?

– a, b, c

  • Which variables must be assigned?

– a

  • May analysis:

– the computed information may hold in at least one execution of the program.

  • Must analysis:

– the computed information must hold every time the program is executed.

SLIDE 8

Many many optimizations

  • Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead-code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, . . . etc.

  • How do they interact?

– Optimist: we get the sum of all improvements.
– Realist: many are in direct opposition.

  • Let us study some of them!
SLIDE 9

Constant propagation

  • Idea:

– If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.

n = 10; c = 2;                n = 10; c = 2;
for (i=0; i<n; ++i)     →     for (i=0; i<10; ++i)
    s = s + i * c;                s = s + i * 2;

– Usually a very helpful optimization
– e.g., Can we now unroll the loop?

  • Why is it good?
  • Why could it be bad?

– When can we eliminate n and c themselves?

  • Now you know how well different optimizations might interact!

SLIDE 10

Constant folding

  • Idea:

– If operands are known at compile-time, evaluate the expression at compile-time.

r = 3.141 * 10;     →     r = 31.41;

– What if the code was?

PI = 3.141;
r = PI * 10;

Constant propagation yields r = 3.141 * 10, and constant folding then yields r = 31.41.

– And what now?

PI = 3.141;
r = PI * 10;
d = 2 * r;

Constant propagation and constant folding, applied repeatedly, are together called partial evaluation.
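The interplay of propagation and folding can be sketched over straight-line three-address code. This is a minimal illustrative sketch: the tuple format (dest, op, arg1, arg2) and the restriction to `+` and `*` are assumptions made here, not a real compiler IR.

```python
# A minimal sketch of constant propagation + constant folding over
# straight-line three-address code. Instruction format (dest, op, a, b)
# is an illustrative assumption; '=' denotes a plain copy/assignment.

def propagate_and_fold(instrs):
    env = {}          # variable -> known constant value
    out = []
    for dest, op, a, b in instrs:
        # Constant propagation: replace known variables by their values.
        a = env.get(a, a)
        b = env.get(b, b)
        if op == '=':
            val = a
        elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # Constant folding: both operands known at "compile time".
            val = {'+': a + b, '*': a * b}[op]
        else:
            val = None                     # not computable statically
        if val is not None and not isinstance(val, str):
            env[dest] = val
            out.append((dest, '=', val, None))
        else:
            env.pop(dest, None)            # dest no longer a known constant
            out.append((dest, op, a, b))
    return out

# PI = 3.141; r = PI * 10; d = 2 * r  becomes fully evaluated:
prog = [('PI', '=', 3.141, None), ('r', '*', 'PI', 10), ('d', '*', 2, 'r')]
print(propagate_and_fold(prog))
```

Note how folding r enables folding d in the same pass: exactly the repeated propagate-then-fold interaction the slide calls partial evaluation.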

SLIDE 11

Common sub-expression elimination

  • Idea:

– If the program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.

a = b + c;          t = b + c;
c = b + c;    →     a = t;
d = b + c;          c = t;
                    d = b + c;
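The "reuse until operands are redefined" rule above can be sketched as local value numbering within a basic block. The (dest, op, a, b) tuples and the generated temporaries t0, t1, ... are illustrative assumptions, not a real IR.

```python
# A sketch of local common-subexpression elimination within a basic block.
# avail maps an expression (op, a, b) to the temporary already holding it;
# redefining an operand kills every expression that mentions it.

import itertools

def cse(block):
    avail = {}                                    # (op, a, b) -> temp
    fresh = (f"t{i}" for i in itertools.count())  # t0, t1, ...
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in avail:
            out.append((dest, '=', avail[key], None))   # reuse earlier value
        else:
            t = next(fresh)
            out.append((t, op, a, b))                   # compute once into temp
            out.append((dest, '=', t, None))
            avail[key] = t
        # Kill expressions whose operands (or holding temp) were redefined.
        avail = {k: v for k, v in avail.items()
                 if dest not in (k[1], k[2]) and v != dest}
    return out

# a = b + c; c = b + c; d = b + c -- the third b+c must be recomputed,
# because the second statement redefines c.
block = [('a', '+', 'b', 'c'), ('c', '+', 'b', 'c'), ('d', '+', 'b', 'c')]
for ins in cse(block):
    print(ins)
```

The output mirrors the slide's transformation: t0 = b + c is reused for a and c, but d gets a fresh computation because c was redefined in between.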

SLIDE 12

Copy propagation

  • Idea:

– After an assignment x = y, replace the uses of x with y.
– Can only apply up to another assignment to x, or ... another assignment to y!
– What if there was an assignment y = z earlier?

  • Apply transitively to all assignments.

x = y;                  x = y;
if (x > 1)       →      if (y > 1)
    s = x + f(x);           s = y + f(y);

SLIDE 13

Dead-code elimination

  • Idea:

– If the result of a computation is never used, remove the computation.
– Remove code that assigns to dead variables.

  • Liveness analysis done before would help!

– This may, in turn, create more dead code.

  • Dead-code elimination usually works transitively.

x = y + 1;          y = 1;
y = 1;        →     x = 2 * z;
x = 2 * z;
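Using liveness to drop dead assignments can be sketched as a single backward pass over a basic block. The (dest, uses) representation and the live-out set are illustrative assumptions.

```python
# A sketch of dead-code elimination inside a basic block: walk backwards,
# tracking live variables, and drop assignments whose target is dead.

def eliminate_dead(block, live_out):
    live = set(live_out)
    kept = []
    for dest, uses in reversed(block):
        if dest in live:
            kept.append((dest, uses))
            live.discard(dest)        # this def satisfies the later uses
            live.update(uses)         # its operands become live
        # else: dest is dead here -> drop the assignment entirely
    kept.reverse()
    return kept

# x = y + 1; y = 1; x = 2 * z   with x and y live at the block's end:
# the first assignment to x is dead (overwritten before any use).
block = [('x', ['y']), ('y', []), ('x', ['z'])]
print(eliminate_dead(block, live_out={'x', 'y'}))   # [('y', []), ('x', ['z'])]
```

Because the pass is backward, removing one dead assignment automatically stops its operands from being marked live, which is how elimination cascades transitively.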

SLIDE 14

Unreachable-code elimination

  • Idea:

– Eliminate code that can never be executed
– High-level: look for if (false) or while (false)

  • perhaps after constant folding!

– Low-level: more difficult

  • Code is just labels and gotos
  • Traverse the CFG, marking reachable blocks

#define DEBUG 0
if (DEBUG)
    print("Current value = ", v);

SLIDE 15

Next class

  • Next class:

– How to perform the optimizations that we have seen using a dataflow analysis?

  • Starting with:

– The back-end full form of CFG!

  • Approximately only 10 more classes left.

– Hope this course is being successful in making (y)our hectic days a bit more exciting :-)

SLIDE 16

CS502: Compiler Design Code Optimization (Cont.) Manas Thakur

Fall 2020

SLIDE 17

Recall A2

  • Is ‘a’ initialized in this program?

int a;
if (*) {
    a = 10;
} else {
    // something that doesn’t touch ‘a’
}
x = a;

– Reality during run-time: Depends
– What to tell at compile-time?

  • Is this a ‘must’ question or a ‘may’ question?
  • Correct answer: No

– How do we obtain such answers?

  • Need to model the control-flow

SLIDE 18

Control-Flow Graph (CFG)

  • Nodes represent instructions; edges represent flow of control

Source:
    a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

CFG: a = 0 -> b = a + 1 -> c = c + b -> a = b * 2 -> a < N -> (true: back to b = a + 1; false: return c)

SLIDE 19

Some CFG terminology

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a < N
6: return c

  • pred[n] gives predecessors of n

– pred[1]? pred[4]? pred[2]?

  • succ[n] gives successors of n

– succ[2]? succ[5]?

  • def(n) gives variables defined by n

– def(3) = {c}

  • use(n) gives variables used by n

– use(3) = {b, c}

SLIDE 20

Live ranges revisited

  • A variable is live if its current value may be used in the future.

– Insight:

  • work from future to past
  • backward over the CFG

  • Live ranges (on the numbered CFG: 1: a = 0, 2: b = a + 1, 3: c = c + b, 4: a = b * 2, 5: a < N, 6: return c):

– a: {1->2, 4->5->2}
– b: {2->3, 3->4}
– c: all edges except 1->2

SLIDE 21

Liveness

  • A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.
  • A variable is live-in at a node if it is live on any of the in-edges of that node.
  • A variable is live-out at a node if it is live on any of the out-edges of that node.

  • Verify (on the numbered CFG from before):

– a: {1->2, 4->5->2}
– b: {2->4}

SLIDE 22

Computation of liveness

  • Say live-in of n is in[n], and live-out of n is out[n].
  • We can compute in[n] and out[n] for any n as follows:

in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪_{s ∈ succ[n]} in[s]

These are called dataflow equations; the right-hand sides are called flow functions.

SLIDE 23

Liveness as an iterative dataflow analysis

for each n:
    in[n] = {}; out[n] = {}                        // Initialize
repeat:
    for each n:
        in’[n] = in[n]; out’[n] = out[n]           // Save previous values
        in[n] = use[n] ∪ (out[n] – def[n])         // Compute new values
        out[n] = ∪_{s ∈ succ[n]} in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n     // Repeat till fixed point
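The iterative algorithm above can be sketched directly in Python on the running six-node CFG (a = 0; b = a + 1; c = c + b; a = b * 2; a < N; return c). The succ/use/def tables are transcribed from the slides; everything else is a straightforward sketch of the equations.

```python
# A sketch of iterative liveness analysis on the running example CFG.
# Nodes: 1: a=0, 2: b=a+1, 3: c=c+b, 4: a=b*2, 5: a<N, 6: return c.

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'c', 'b'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

def liveness(nodes):
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:                         # repeat till fixed point
        changed = False
        for n in nodes:
            # out[n] = union of in[s] over successors s
            out_n = set().union(*(live_in[s] for s in succ[n]))
            # in[n] = use[n] ∪ (out[n] - def[n])
            in_n = use[n] | (out_n - defs[n])
            if in_n != live_in[n] or out_n != live_out[n]:
                live_in[n], live_out[n] = in_n, out_n
                changed = True
    return live_in, live_out

live_in, live_out = liveness([6, 5, 4, 3, 2, 1])   # backward order converges fast
print(sorted(live_in[2]))    # ['a', 'c']
```

Processing the nodes in the order 6, 5, ..., 1 makes each pass propagate information as far backward as possible, which is exactly the ordering point made on the next slide.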
SLIDE 24

Liveness analysis example

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a < N
6: return c

in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪_{s ∈ succ[n]} in[s]

(Iterate the equations over nodes 1..6 until the in/out sets stop changing: the fixed point.)

SLIDE 25

In backward order

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a < N
6: return c

  • Processing the nodes in backward order (6, 5, ..., 1) reaches the fixed point in only 3 iterations!
  • Thus, the order of processing statements is important for efficiency.

in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪_{s ∈ succ[n]} in[s]

SLIDE 26

Complexity of our liveness computation algorithm

  • For an input program of size N

– ≤ N nodes in CFG
  ⇒ ≤ N variables
  ⇒ ≤ N elements per in/out set
  ⇒ O(N) time per set union

– The for loop performs a constant number of set operations per node
  ⇒ O(N²) time for the for loop

– Each iteration of the for loop can only add to each set (monotonicity)
– Sizes of all in and out sets sum to at most 2N², bounding the number of iterations of the repeat loop
  ⇒ worst-case complexity of O(N⁴)

– Much less in practice (usually O(N) or O(N²)) if the nodes are ordered properly.

SLIDE 27

Least fixed points

  • There is often more than one solution for a given dataflow problem.

– Any solution to the dataflow equations is a conservative approximation.

  • Conservatively assuming a variable is live does not break the program:

– Just means more registers may be needed.

  • Assuming a variable is dead when it is really live will break things.
  • Many possible solutions; but we want the smallest: the least fixed point.

  • The iterative algorithm computes this least fixed point.
SLIDE 28

Confused!?

  • Is compilers a theoretical topic or a practical one?
  • Recall:

– “A sangam of theory and practice.”

  • Next class:

– We are not leaving a topic as important as IDFA so soon!

SLIDE 29

CS502: Compiler Design Code Optimization (Cont.) Manas Thakur

Fall 2020

SLIDE 30

Recall our IDFA algorithm

for each n:
    in[n] = ...; out[n] = ...                      // Initialize
repeat:
    for each n:
        in’[n] = in[n]; out’[n] = out[n]           // Save previous values
        in[n] = ...                                // Compute new values
        out[n] = ...
until in’[n] == in[n] and out’[n] == out[n] ∀n     // Repeat till fixed point

Do we need to process all the nodes in each iteration?

SLIDE 31

Worklist-based Implementation of IDFA

  • Initialize a worklist of statements
  • Forward analysis:

– Start with the entry node
– If OUT(n) changes, then add succ(n) to the worklist

  • Backward analysis:

– Start with the exit node
– If IN(n) changes, then add pred(n) to the worklist

  • In both the cases, iterate till fixed point.
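The backward case above can be sketched concretely for liveness on the running six-node CFG. The FIFO worklist (a deque) is one reasonable choice; seeding only the exit node suffices for this example, though a production analysis would typically seed all nodes.

```python
# A sketch of worklist-based backward liveness on the running example CFG:
# start from the exit node; whenever IN(n) changes, re-add pred(n).

from collections import deque

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
pred = {1: [], 2: [1, 5], 3: [2], 4: [3], 5: [4], 6: [5]}
use  = {1: set(), 2: {'a'}, 3: {'c', 'b'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

def worklist_liveness(exit_node):
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    work = deque([exit_node])
    while work:
        n = work.popleft()
        live_out[n] = set().union(*(live_in[s] for s in succ[n]))
        new_in = use[n] | (live_out[n] - defs[n])
        if new_in != live_in[n]:          # only affected nodes are reprocessed
            live_in[n] = new_in
            work.extend(pred[n])
    return live_in, live_out

live_in, live_out = worklist_liveness(6)
print(sorted(live_in[2]))    # ['a', 'c']
```

Compared with the full-sweep algorithm, only nodes whose inputs may have changed are revisited, which is exactly the efficiency gain the worklist formulation buys.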
SLIDE 32

Writing an IDFA (Cont.)

  • Initialization of the IN and OUT sets depends on the analysis:

– empty if the information grows
– the full (universal) set if the information shrinks

  • Requirement for termination:

– unidirectional growth/shrinkage, called monotonicity

  • Confluence/meet operation (at control-flow merges):

– Union, or
– Intersection

(Which one to use depends on the analysis.)

SLIDE 33

Live-variable analysis revisited

  • Direction:

– Backward

  • Initialization:

– Empty sets

  • Flow functions:

– out[n] = ∪_{s ∈ succ[n]} in[s]
– in[n] = use[n] ∪ (out[n] – def[n])

  • Confluence operation:

– Union

SLIDE 34

Common sub-expressions revisited

  • Idea:

– If a program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.
– Say given a node n, the expressions computed at n are denoted gen(n), and the ones killed (operands redefined) at n are denoted kill(n).

a = b + c;          t = b + c;
c = b + c;    →     a = t;
d = b + c;          c = t;
                    d = b + c;

SLIDE 35

Common subexpressions as an IDFA

  • Direction:

– Forward

  • Initialization:

– The universal set (all expressions) at every node except the entry, since the information shrinks under intersection

  • Flow functions:

– in[n] = ∩_{p ∈ pred[n]} out[p]
– out[n] = gen[n] ∪ (in[n] – kill[n])

  • Confluence operation:

– Intersection
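This must-analysis can be sketched in the same iterative style as liveness, with the direction and meet operation flipped. The diamond CFG, the gen/kill sets, and the single expression 'b+c' below are illustrative assumptions; gen/kill would come from a prior pass in a real compiler.

```python
# A sketch of available-expressions analysis: forward IDFA, intersection
# at merges, optimistic (universal-set) initialization for a must-analysis.

def available_exprs(nodes, pred, gen, kill, universe, entry):
    out = {n: set(universe) for n in nodes}      # must-analysis: start full
    out[entry] = gen[entry] - kill[entry]
    changed = True
    while changed:                               # iterate to fixed point
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # in[n] = intersection of out[p] over predecessors p
            in_n = set.intersection(*(out[p] for p in pred[n]))
            new_out = gen[n] | (in_n - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out

# Diamond 1 -> {2, 3} -> 4; node 2 redefines c, killing 'b+c' on that path.
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
gen  = {1: {'b+c'}, 2: set(), 3: set(), 4: set()}
kill = {1: set(), 2: {'b+c'}, 3: set(), 4: set()}
out = available_exprs([1, 2, 3, 4], pred, gen, kill, {'b+c'}, entry=1)
print(out[4])    # set(): 'b+c' is not available after the merge
```

Because the merge uses intersection, one path that kills b+c is enough to make it unavailable at node 4: the "must hold on every path" semantics of a must-analysis.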

SLIDE 36

Are we efficient enough?

  • When can IDFAs take a lot of time?
  • Which operations could be expensive?

– Confluence
– Equality checks (in the fixed-point test)

  • Compilers may have to perform several IDFAs.
  • How can we make an IDFA more efficient (perhaps with some loss of precision)?

SLIDE 37

Basic Blocks

    a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
    return c

Each instruction as a node: 6 CFG nodes (a = 0 | b = a + 1 | c = c + b | a = b * 2 | a < N | return c).
Using basic blocks: 3 CFG nodes ({a = 0} | {b = a + 1; c = c + b; a = b * 2; a < N} | {return c}).

SLIDE 38

Basic Blocks (Cont.)

  • Idea:

– Once execution enters a basic block, all statements are executed in sequence.
– Single-entry, single-exit region

  • Details:

– Starts with a label
– Ends with one or more branches
– Edges may be labeled with predicates

  • True/false
  • Exceptions

  • Key: Improve efficiency, with reasonable precision.
SLIDE 39

Have you got a compiler’s eyes yet?

  • What properties can you identify about this program?

S1: y = 1;
S2: y = 2;
S3: x = y;

  • What’s the advantage if it was rewritten as follows?

S1: y1 = 1;
S2: y2 = 2;
S3: x = y2;

  • Def-use becomes explicit.

SLIDE 40

Static Single Assignment (SSA)

  • A form of IR in which each use can be mapped to a single definition.

– Achieved using variable renaming and phi nodes.

  • Many compilers use SSA form in their IRs.

if (flag)           if (flag)
    x = -1;             x1 = -1;
else          →     else
    x = 1;              x2 = 1;
y = x * a;          x3 = Φ(x1, x2);
                    y = x3 * a;

SLIDE 41

SSA Classwork

  • Convert the following program to SSA form:

– (Hint: First convert to 3AC)

x = 0;
for (i=0; i<N; ++i) {
    x += i;
    i = i + 1;
    x--;
}
x = x + i;

Solution:

x1 = 0;
i1 = 0;
L1: i13 = Φ(i1, i3);
    x13 = Φ(x1, x3);
    if (i13 < N) {
        x2 = x13 + i13;
        i2 = i13 + 1;
        x3 = x2 - 1;
        i3 = i2 + 1;
        goto L1;
    }
x4 = Φ(x1, x3);
x5 = x4 + i13;

SLIDE 42

Effect of SSA on Register Allocation!?

  • What is the effect of SSA form on liveness?
  • What does SSA do?

– Breaks a single variable into multiple instances
– Instances represent distinct, non-overlapping uses

  • Effect:

– Breaks up live ranges (one long range for x becomes shorter ranges for x1 and x2); often improves register allocation

SLIDE 43

Featuring Next in Code Optimization

  • Heard of the 80-20 or 90-10 rule?

– X% of time is spent in executing y% of the code, where X >> y.

  • Which kinds of code portions tend to form the region ‘y’ in typical programs?

– Loops
– Methods

  • Tomorrow: Loop optimizations
SLIDE 44

CS502: Compiler Design Loop Optimizations Manas Thakur

Fall 2020

SLIDE 45

Why optimize loops?

  • Loops form a significant portion of the time spent in executing programs.

for (i=0; i<N; i++) {
    S1;
    S2;
}

– If N is just 10000 (not uncommon), we have too many instructions!

  • How many in the above loop?

– What if S1/S2 is/are function calls?

  • Loops involve costly instructions in each iteration:

– Comparisons
– Jumps

SLIDE 46

What is a loop?

  • A loop in a CFG is a set of nodes S such that:

– There is a designated header node h in S
– There is a path from each node in S to h
– There is a path from h to each node in S
– h is the only node in S with an incoming edge from outside S

SLIDE 47

Are all these loops?

SLIDE 48

What about these?

SLIDE 49

Identifying loops using dominators

  • A node d dominates a node n if every path from the entry to n goes through d.
  • Compute the dominators of each node:
SLIDE 50

Flow function for computing dominators

  • Assuming D[n] is the set of dominators of node n:

D[entry] = {entry}
D[n] = {n} ∪ (∩_{p ∈ pred[n]} D[p])
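The flow function above can be iterated to a fixed point just like the other dataflow analyses. The four-node CFG below is an illustrative example, not one from the slides.

```python
# A sketch of the dominator computation: D[n] = {n} ∪ ⋂ D[p] over
# predecessors, iterated to a fixed point.

def dominators(nodes, pred, entry):
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Example CFG: 1 -> 2 -> {3, 4}, 3 -> 4, and a loop edge 4 -> 2.
pred = {1: [], 2: [1, 4], 3: [2], 4: [2, 3]}
dom = dominators([1, 2, 3, 4], pred, entry=1)
print(sorted(dom[4]))    # [1, 2, 4]: node 3 can be bypassed, so it does not dominate 4
```

Starting from the full set (rather than the empty set) matters for the same reason as in available expressions: intersection-based analyses need an optimistic initial value.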

SLIDE 51

Identifying loops using dominators (Cont.)

  • First, identify a back edge:

– An edge from a node n to another node h, where h dominates n

  • Each back edge leads to a loop:

– The set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h
– h is the header

  • Verify:
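The two steps can be sketched together: find back edges using dominator sets, then collect each natural loop by walking backwards from n until h. The CFG and its dominator sets below are illustrative (the dominators could come from the flow-function computation on the previous slide).

```python
# A sketch of loop identification: n -> h is a back edge if h dominates n;
# the natural loop of that edge is h plus every node that reaches n
# without passing through h.

def natural_loop(pred, n, h):
    loop = {h, n}
    stack = [n]
    while stack:                      # walk backwards from n, stopping at h
        m = stack.pop()
        for p in pred[m]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

# CFG: 1 -> 2 -> {3, 4}, 3 -> 4, 4 -> 2, with precomputed dominator sets.
pred = {1: [], 2: [1, 4], 3: [2], 4: [2, 3]}
dom = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 2, 4}}
edges = [(2, 3), (2, 4), (3, 4), (4, 2)]
back = [(n, h) for n, h in edges if h in dom[n]]
print(back)                               # [(4, 2)]
print(sorted(natural_loop(pred, 4, 2)))   # [2, 3, 4]
```

Only 4 -> 2 qualifies as a back edge here, and its natural loop {2, 3, 4} has header 2, matching the definition on this slide.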
SLIDE 52

Loop-Invariant Code Motion (LICM)

  • Loop-invariant code:

– d: t = a OP b, such that:

  • a and b are constants; or
  • all the definitions of a and b that reach d are outside the loop; or
  • only one definition each of a and b reaches d, and that definition is loop-invariant.

  • Example (t = a * b is loop-invariant):

L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i < N goto L1
L2: x = t

SLIDE 53

LICM: Get ready for code hoisting

  • Can we always hoist loop-invariant code?
  • Criteria for hoisting d: t = a OP b:

– d dominates all loop exits at which t is live-out, and
– there is only one definition of t in the loop, and
– t is not live-out of the loop preheader

  • How can we hoist code in the following variants?

(1) Original:
L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i < N goto L1
L2: x = t

(2) Loop test before the body (t = a * b may not dominate the exit):
L0: t = 0
L1: if i >= N goto L2
    i = i + 1
    t = a * b
    M[i] = t
    goto L1
L2: x = t

(3) A use of t before its definition in the loop (M[j] = t reads the old value):
L0: t = 0
L1: M[j] = t
    i = i + 1
    t = a * b
    M[i] = t
    if i < N goto L1
L2: x = t

SLIDE 54

Induction-variable optimization

  • Induction variables:

– Variables whose value depends on the iteration variable

  • Optimization:

– Compute them efficiently, if possible

Before:
    s = 0
    i = 0
L1: if i >= N goto L2
    j = i * 4
    k = j + a
    x = M[k]
    s = s + x
    i = i + 1
    goto L1
L2:

After (k’ = i * 4 + a is maintained incrementally):
    s = 0
    k’ = a
    b = N * 4
    c = a + b
L1: if k’ >= c goto L2
    x = M[k’]
    s = s + x
    k’ = k’ + 4
    goto L1
L2:
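The strength reduction above (replacing the per-iteration multiply j = i * 4 with an additive update of k) can be checked with a small executable sketch. M as a flat "memory" list and the chosen base address and bound are illustrative assumptions.

```python
# A sketch verifying that the strength-reduced loop computes the same sum
# as the original. M models a flat memory; a is the array's base address.

def original(M, a, N):
    s, i = 0, 0
    while i < N:
        k = i * 4 + a      # multiply on every iteration
        s += M[k]
        i += 1
    return s

def strength_reduced(M, a, N):
    s = 0
    k = a                  # induction variable k' replaces i*4 + a
    c = a + N * 4          # loop bound computed once
    while k < c:
        s += M[k]
        k += 4             # additive update instead of multiply
    return s

M = list(range(100))       # toy word-addressed memory: M[addr] == addr
assert original(M, 8, 10) == strength_reduced(M, 8, 10)
print(original(M, 8, 10))  # 260: sum of M[8], M[12], ..., M[44]
```

Both loops touch exactly the addresses a, a+4, ..., a+4(N−1); only the cost of computing each address changes.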

SLIDE 55

Loop unrolling

  • Minimize the number of increments and condition-checks
  • Be careful about the increase in code size (I-cache misses!)

Original:
L1: x = M[i]
    s = s + x
    i = i + 4
    if i < N goto L1
L2:

Unrolled by a factor of 2 (only even numbers of iterations):
L1: x = M[i]
    s = s + x
    x = M[i+4]
    s = s + x
    i = i + 8
    if i < N goto L1
L2:

Any number of iterations (unrolled loop plus an epilogue):
    if i < N-8 goto L1
    goto L2
L1: x = M[i]
    s = s + x
    x = M[i+4]
    s = s + x
    i = i + 8
    if i < N-8 goto L1
L2: x = M[i]
    s = s + x
    i = i + 4
    if i < N goto L2
L3:

SLIDE 56

Loop interchange

  • A C/Java programmer starting with MATLAB:

for i=1:1000,
    for j=1:1000,
        a(i) = a(i) + b(i,j)*c(i)
    end
end

  • But MATLAB stores matrices in column-major order!
  • Implication?

– Cache misses (perhaps in each iteration)!

  • Solution (interchange the loops, so that the row index i varies fastest):

for j=1:1000,
    for i=1:1000,
        a(i) = a(i) + b(i,j)*c(i)
    end
end

SLIDE 57

Many more loop optimizations

  • Loop fusion
  • Loop fission
  • Loop inversion
  • Loop tiling
  • Loop unswitching
  • . . . (some other time!)
  • Vectorization
  • Parallelization (next class!)