CS502: Compiler Design
Code Optimization
Manas Thakur, Fall 2020
Manas Thakur CS502: Compiler Design 2
- Fast. Faster. Fastest?
[Compiler pipeline: Character stream → Lexical Analyzer → Token stream → Syntax Analyzer → Syntax tree → Semantic Analyzer → Syntax tree → Intermediate Code Generator → Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation → Code Generator → Target machine code → Machine-Dependent Code Optimizer → Target machine code. All phases consult the Symbol Table; the analysis phases form the front end, the optimizers and code generator the back end.]
Role of Code Optimizer
- Make the program better
– time, memory, energy, ...
- No guarantees in this land!
– Will a particular optimization for sure improve something?
– Will performing an optimization affect something else?
– In what order should I perform the optimizations?
– What “scope” to perform certain optimizations at?
– Is the optimizer fast enough?
- Can an optimized program be optimized further?
Full employment theorem for compiler writers
- Statement: There is no fully optimizing compiler.
- Assume it exists:
– such that it transforms a program P to the smallest program Opt(P) that has the same behaviour as P.
– Halting problem comes to the rescue:
- Smallest program that never halts:
L1: goto L1
– Thus, a fully optimizing compiler could solve the halting problem: to decide whether a program P never halts, check whether Opt(P) is L1: goto L1!
– But HP is an undecidable problem.
– Hence, a fully optimizing compiler can’t exist!
- Therefore we talk just about an optimizing compiler.
– and keep working without worrying about future prospects!
How to perform optimizations?
- Analysis
– Go over the program
– Identify some properties
- Potentially useful properties
- Transformation
– Use the information computed by the analysis to transform the program
- without affecting the semantics
- An example that we have (not literally) seen:
– Compute liveness information
– Delete assignments to variables that are dead
Classifying optimizations
- Based on scope:
– Local to basic blocks
– Intraprocedural
– Interprocedural
- Based on positioning:
– High-level (transform source code or high-level IR)
– Low-level (transform mid/low-level IR)
- Based on (in)dependence w.r.t. target machine:
– Machine independent (general enough)
– Machine dependent (specific to the architecture)
May versus Must information
- Consider the program:
- Which variables may be assigned?
– a, b, c
- Which variables must be assigned?
– a
- May analysis:
– the computed information may hold in at least one execution of the program.
- Must analysis:
– the computed information must hold every time the program is executed.
if (c) {
    a = ...
    b = ...
} else {
    a = ...
    c = ...
}
Many many optimizations
- Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead-code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, …
- How do they interact?
– Optimist: we get the sum of all improvements.
– Realist: many are in direct opposition.
- Let us study some of them!
Constant propagation
- Idea:
– If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.
– Usually a very helpful optimization
– e.g., Can we now unroll the loop?
- Why is it good?
- Why could it be bad?
– When can we eliminate n and c themselves?
- Now you know how well different optimizations might interact!
// Before:
n = 10; c = 2;
for (i=0; i<n; ++i)
    s = s + i * c;

// After constant propagation:
n = 10; c = 2;
for (i=0; i<10; ++i)
    s = s + i * 2;
Constant folding
- Idea:
– If operands are known at compile-time, evaluate the expression at compile-time.
r = 3.141 * 10;   // constant folding: r = 31.41;

– What if the code was?

PI = 3.141;
r = PI * 10;      // constant propagation: r = 3.141 * 10;
                  // then constant folding: r = 31.41;

– And what now?

PI = 3.141;
r = PI * 10;
d = 2 * r;        // propagation + folding: r = 31.41; d = 62.82;
Repeatedly applying constant propagation and folding to evaluate parts of a program at compile-time is called partial evaluation.
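As a toy illustration (not the course's framework), constant propagation plus folding over straight-line three-address code can be sketched as below; the (dest, op, a, b) tuple encoding is a made-up mini-IR, where op '=' means a plain copy.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def fold(code):
    """Propagate and fold constants in straight-line (dest, op, a, b) code."""
    env, out = {}, []                  # env: variable -> known constant value
    for dest, op, a, b in code:
        a = env.get(a, a)              # constant propagation on operands
        b = env.get(b, b)
        if op == '=':                  # plain copy: dest = a
            if isinstance(a, (int, float)):
                env[dest] = a          # dest now holds a known constant
            else:
                env.pop(dest, None)
            out.append((dest, '=', a, None))
        elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
            val = OPS[op](a, b)        # constant folding at "compile time"
            env[dest] = val
            out.append((dest, '=', val, None))
        else:
            env.pop(dest, None)        # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out

prog = [('n', '=', 10, None), ('c', '=', 2, None), ('x', '*', 'n', 'c')]
res = fold(prog)   # res[-1] == ('x', '=', 20, None)
```

Note how the transformation is only valid for straight-line code: at a control-flow merge, a variable is a known constant only if it has the same constant value along every incoming path.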
Common sub-expression elimination
- Idea:
– If the program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.
// Before:
a = b + c;
c = b + c;
d = b + c;

// After CSE (d must recompute, since c was redefined):
t = b + c;
a = t;
c = t;
d = b + c;
Copy propagation
- Idea:
– After an assignment x = y, replace the uses of x with y.
– Can only apply up to another assignment to x, or another assignment to y!
– What if there was an assignment y = z earlier?
- Apply transitively to all assignments.
// Before:
x = y;
if (x > 1)
    s = x + f(x);

// After copy propagation:
x = y;
if (y > 1)
    s = y + f(y);
Dead-code elimination
- Idea:
– If the result of a computation is never used, remove the computation.
– Remove code that assigns to dead variables.
- Liveness analysis done before would help!
– This may, in turn, create more dead code.
- Dead-code elimination usually works transitively.
// Before (x = y + 1 is dead):
x = y + 1;
y = 1;
x = 2 * z;

// After dead-code elimination:
y = 1;
x = 2 * z;
Unreachable-code elimination
- Idea:
– Eliminate code that can never be executed
– High-level: look for if (false) or while (false)
- perhaps after constant folding!
– Low-level: more difficult
- Code is just labels and gotos
- Traverse the CFG, marking reachable blocks
#define DEBUG 0
if (DEBUG)
    print("Current value = ", v);
Next class
- Next class:
– How to perform the optimizations that we have seen using a dataflow analysis?
- Starting with:
– The back-end full form of CFG!
- Approximately only 10 more classes left.
– Hope this course is being successful in making (y)our hectic days a bit more exciting :-)
CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur, Fall 2020
Recall A2
- Is ‘a’ initialized in this program?
– Reality during run-time: Depends
– What to tell at compile-time?
- Is this a ‘must’ question or a ‘may’ question?
- Correct answer: No
– How do we obtain such answers?
- Need to model the control-flow
int a;
if (*) {
    a = 10;
} else {
    // something that doesn’t touch ‘a’
}
x = a;
Control-Flow Graph (CFG)
- Nodes represent instructions; edges represent flow of control
a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
return c

[CFG: one node per instruction, 1..6; edges 1→2, 2→3, 3→4, 4→5, 5→2 (back edge), 5→6]
Some CFG terminology
- pred[n] gives predecessors of n
– pred[1]? pred[4]? pred[2]?
- succ[n] gives successors of n
– succ[2]? succ[5]?
- def(n) gives variables defined by n
– def(3) = {c}
- use(n) gives variables used by n
– use(3) = {b, c}
[The same six-node CFG as before, instructions numbered 1–6]
Live ranges revisited
- A variable is live if its current value may be used in the future.
– Insight:
- work from future to past
- backward over the CFG
- Live ranges:
– a: {1->2, 4->5->2}
– b: {2->3, 3->4}
– c: All edges except 1->2
[The same six-node CFG, nodes numbered 1–6]
Liveness
- A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.
- A variable is live-in at a node if it is live on any of the in-edges of that node.
- A variable is live-out at a node if it is live on any of the out-edges of that node.
- Verify:
– a: {1->2, 4->5->2}
– b: {2->3, 3->4}
[The same six-node CFG, nodes numbered 1–6]
Computation of liveness
- Say live-in of n is in[n], and live-out of n is out[n].
- We can compute in[n] and out[n] for any n as follows:
in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪s∈succ[n] in[s]

These are called dataflow equations; their right-hand sides are called flow functions.
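A runnable sketch of the liveness equations on the lecture's six-node example CFG (a = 0; b = a + 1; c = c + b; a = b * 2; a < N; return c), treating N as a constant; the dict-of-sets encoding is just an implementation choice:

```python
# use/def sets and successor edges for the six-node example CFG.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'b', 'c'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

def liveness(nodes, succ, use, defs):
    live_in  = {n: set() for n in nodes}   # initialize to empty sets
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:                         # repeat till fixed point
        changed = False
        for n in nodes:
            out_n = set().union(*(live_in[s] for s in succ[n]))
            in_n  = use[n] | (out_n - defs[n])
            if in_n != live_in[n] or out_n != live_out[n]:
                live_in[n], live_out[n] = in_n, out_n
                changed = True
    return live_in, live_out

live_in, live_out = liveness([6, 5, 4, 3, 2, 1], succ, use, defs)
# e.g. live_in[2] == {'a', 'c'} and live_in[1] == {'c'}
```

Processing the nodes in reverse order (6 down to 1) is deliberate: the information flows backwards, so this ordering converges in fewer sweeps, as the next slides discuss.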
Liveness as an iterative dataflow analysis
for each n
    in[n] = {}; out[n] = {}                 // Initialize
repeat
    for each n
        in’[n] = in[n]; out’[n] = out[n]    // Save previous values
        in[n] = use[n] ∪ (out[n] – def[n])  // Compute new values
        out[n] = ∪s∈succ[n] in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n   // Repeat till fixed point

This scheme is called an Iterative DataFlow Analysis (IDFA).
Liveness analysis example
[Worked example: the in/out sets of the six-node CFG, computed by repeatedly applying
in[n] = use[n] ∪ (out[n] – def[n]) and out[n] = ∪s∈succ[n] in[s]
until a fixed point is reached]
In backward order
[The same six-node CFG, now processed in backward order: 6, 5, 4, 3, 2, 1]
- Fixed point in only 3 iterations!
- Thus, the order of processing statements is important for efficiency.
in[n] = use[n] ∪ (out[n] – def[n])
out[n] = ∪s∈succ[n] in[s]
Complexity of our liveness computation algorithm
- For an input program of size N:
– ≤ N nodes in CFG ⇒ ≤ N variables ⇒ ≤ N elements per in/out set ⇒ O(N) time per set union
– The for loop performs a constant number of set operations per node ⇒ O(N²) time for the for loop
– Each iteration of the for loop can only add to each set (monotonicity)
– Sizes of all in and out sets sum to at most 2N², bounding the number of iterations of the repeat loop ⇒ worst-case complexity of O(N⁴)
– Much less in practice (usually O(N) or O(N²)) if the nodes are ordered properly.
repeat
    for each n
        in’[n] = in[n]; out’[n] = out[n]
        in[n] = use[n] ∪ (out[n] – def[n])
        out[n] = ∪s∈succ[n] in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n
Least fixed points
- There is often more than one solution for a given dataflow problem.
– Any solution to the dataflow equations is a conservative approximation.
- Conservatively assuming a variable is live does not break the program:
– Just means more registers may be needed.
- Assuming a variable is dead when it is really live will break things.
- Many possible solutions; but we want the smallest: the least fixed point.
- The iterative algorithm computes this least fixed point.
Confused!?
- Is compilers a theoretical topic or a practical one?
- Recall:
– “A sangam of theory and practice.”
- Next class:
– We are not leaving a topic as important as IDFA so soon!
CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur, Fall 2020
Recall our IDFA algorithm
for each n
    in[n] = ...; out[n] = ...               // Initialize
repeat
    for each n
        in’[n] = in[n]; out’[n] = out[n]    // Save previous values
        in[n] = ...                         // Compute new values
        out[n] = ...
until in’[n] == in[n] and out’[n] == out[n] for all n   // Repeat till fixed point

Do we need to process all the nodes in each iteration?
Worklist-based Implementation of IDFA
- Initialize a worklist of statements
- Forward analysis:
– Start with the entry node
– If OUT(n) changes, then add succ(n) to the worklist
- Backward analysis:
– Start with the exit node
– If IN(n) changes, then add pred(n) to the worklist
- In both the cases, iterate till fixed point.
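A sketch of the backward (liveness-style) worklist scheme, reusing the succ/use/def encoding from the earlier example; seeding the worklist with all nodes rather than just the exit node is a simplification:

```python
from collections import deque

def liveness_worklist(nodes, succ, pred, use, defs):
    live_in  = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    work = deque(nodes)                    # seed the worklist
    while work:
        n = work.popleft()
        live_out[n] = set().union(*(live_in[s] for s in succ[n]))
        new_in = use[n] | (live_out[n] - defs[n])
        if new_in != live_in[n]:           # IN(n) changed:
            live_in[n] = new_in
            work.extend(pred[n])           # re-process predecessors only
    return live_in, live_out

succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
pred = {1: [], 2: [1, 5], 3: [2], 4: [3], 5: [4], 6: [5]}
use  = {1: set(), 2: {'a'}, 3: {'b', 'c'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}
live_in, _ = liveness_worklist([6, 5, 4, 3, 2, 1], succ, pred, use, defs)
# live_in[1] == {'c'}
```

The payoff over the plain repeat loop is that a node is recomputed only when one of its successors actually changed, instead of in every full sweep over the CFG.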
Writing an IDFA (Cont.)
- Initialization of IN and OUT sets depends on the analysis:
– empty if the information grows
– all the nodes if the information shrinks
- Requirement for termination:
– unidirectional growth/shrinkage
– Called monotonicity
- Confluence/Meet operation (at control-flow merges):
– Union or Intersection: depends on the analysis
Live-variable analysis revisited
- Direction:
– Backward
- Initialization:
– Empty sets
- Flow functions:
– out[n] = ∪s∈succ[n] in[s]
– in[n] = use[n] ∪ (out[n] – def[n])
- Confluence operation:
– Union
Common sub-expressions revisited
- Idea:
– If a program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.
– For a given node n, the expressions computed at n are denoted gen(n), and the ones killed (operands redefined) at n are denoted kill(n).
// Before:
a = b + c;
c = b + c;
d = b + c;

// After CSE (d must recompute, since c was redefined):
t = b + c;
a = t;
c = t;
d = b + c;
Common subexpressions as an IDFA
- Direction:
– Forward
- Initialization:
– Universal set (all expressions) for non-entry nodes; empty at the entry. With intersection as the meet, starting everything empty would be sound but needlessly imprecise (e.g., across loop back edges).
- Flow functions:
– in[n] = ∩p∈pred[n] out[p]
– out[n] = gen[n] ∪ (in[n] – kill[n])
- Confluence operation:
– Intersection
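A sketch of this forward, intersection-based scheme, assuming each node carries gen (expressions it computes) and kill (expressions whose operands it redefines); non-entry OUT sets start at the full universe so the intersection confluence is not trivially empty:

```python
def available_exprs(nodes, pred, gen, kill):
    universe = set().union(*gen.values())
    out = {n: set(universe) for n in nodes}    # start "full" for a must analysis
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if pred[n]:                        # meet: intersection over predecessors
                in_n = set.intersection(*(out[p] for p in pred[n]))
            else:
                in_n = set()                   # nothing is available at entry
            new_out = gen[n] | (in_n - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out

# Diamond CFG: node 1 computes b+c, node 2 recomputes it, node 3 redefines c.
nodes = [1, 2, 3, 4]
pred  = {1: [], 2: [1], 3: [1], 4: [2, 3]}
gen   = {1: {'b+c'}, 2: {'b+c'}, 3: set(), 4: set()}
kill  = {1: set(), 2: set(), 3: {'b+c'}, 4: set()}
out = available_exprs(nodes, pred, gen, kill)
# 'b+c' is unavailable at the merge node 4, because the path through 3 kills it.
```

An expression in in[n] is available at n along every path, so a recomputation of it at n can be replaced by a reuse — exactly the must-information that CSE needs.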
Are we efficient enough?
- When can IDFAs take a lot of time?
- Which operations could be expensive?
– Confluence
– Equality
- Compilers may have to perform several IDFAs.
- How can we make an IDFA more efficient (perhaps with some loss of precision)?
repeat
    for each n
        in’[n] = in[n]; out’[n] = out[n]
        in[n] = use[n] ∪ (out[n] – def[n])
        out[n] = ∪s∈succ[n] in[s]
until in’[n] == in[n] and out’[n] == out[n] ∀n
Basic Blocks
a = 0
L1: b = a + 1
    c = c + b
    a = b * 2
    if a < N goto L1
return c

[Left: CFG with each instruction as a node (six nodes). Right: the same CFG using basic blocks: {a = 0}, {b = a + 1; c = c + b; a = b * 2; a < N}, {return c}]
Basic Blocks (Cont.)
- Idea:
– Once execution enters a basic block, all its statements are executed in sequence.
– Single-entry, single-exit region
- Details:
– Starts with a label – Ends with one or more branches – Edges may be labeled with predicates
- True/false
- Exceptions
- Key: Improve efficiency, with reasonable precision.
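A leader-based sketch of basic-block formation over a made-up (label, opcode, target) instruction list: the leaders are the first instruction, every branch target, and every instruction following a branch; each block runs from one leader to just before the next.

```python
def split_blocks(code):
    """Partition a linear (label, opcode, target) instruction list into basic blocks."""
    targets = {t for (_, op, t) in code if op in ('goto', 'if')}
    leaders = {0}                          # first instruction is a leader
    for i, (label, op, _) in enumerate(code):
        if label in targets:
            leaders.add(i)                 # branch target starts a block
        if op in ('goto', 'if') and i + 1 < len(code):
            leaders.add(i + 1)             # instruction after a branch
    order = sorted(leaders)
    return [code[a:b] for a, b in zip(order, order[1:] + [len(code)])]

loop = [
    (None, 'assign', None),   # a = 0
    ('L1', 'assign', None),   # b = a + 1
    (None, 'assign', None),   # c = c + b
    (None, 'assign', None),   # a = b * 2
    (None, 'if',     'L1'),   # if a < N goto L1
    (None, 'return', None),   # return c
]
blocks = split_blocks(loop)   # 3 blocks, matching the slide's CFG
```

Running a dataflow analysis over these three blocks instead of six instruction nodes shrinks the CFG the IDFA iterates over, which is exactly the efficiency win the slide is after.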
Have you got a compiler’s eyes yet?
- What properties can you identify about this program?
- What’s the advantage if it was rewritten as follows?
- Def-use becomes explicit.
// Before:
S1: y = 1;
S2: y = 2;
S3: x = y;

// After renaming:
S1: y1 = 1;
S2: y2 = 2;
S3: x = y2;
Static Single Assignment (SSA)
- A form of IR in which each use can be mapped to a single definition.
– Achieved using variable renaming and phi nodes.
- Many compilers use SSA form in their IRs.
// Before:
if (flag)
    x = -1;
else
    x = 1;
y = x * a;

// In SSA form:
if (flag)
    x1 = -1;
else
    x2 = 1;
x3 = Φ(x1, x2);
y = x3 * a;
SSA Classwork
- Convert the following program to SSA form:
– (Hint: First convert to 3AC)
// Original:
x = 0;
for (i=0; i<N; ++i) {
    x += i;
    i = i + 1;
    x--;
}
x = x + i;

// SSA form:
x1 = 0;
i1 = 0;
L1: i13 = Φ(i1, i3);
    if (i13 < N) {
        x13 = Φ(x1, x3);
        x2 = x13 + i13;
        i2 = i13 + 1;
        x3 = x2 - 1;
        i3 = i2 + 1;
        goto L1;
    }
x4 = Φ(x1, x3);
x5 = x4 + i13;
Effect of SSA on Register Allocation!?
- What is the effect of SSA form on liveness?
- What does SSA do?
– Breaks a single variable into multiple instances
– Instances represent distinct, non-overlapping uses
- Effect:
– Breaks up live ranges; often improves register allocation
[Figure: the single live range of x is split into shorter live ranges for x1 and x2]
Featuring Next in Code Optimization
- Heard of the 80-20 or 90-10 rule?
– X% of time is spent in executing y% of the code, where X >> y.
- Which kinds of code portions tend to form the region ‘y’ in typical programs?
– Loops
– Methods
- Tomorrow: Loop optimizations
CS502: Compiler Design Loop Optimizations Manas Thakur
Fall 2020
Why optimize loops?
- Form a significant portion of the time spent in executing programs.
– If N is just 10000 (not uncommon), we have too many instructions!
- How many in the above loop?
– What if S1/S2 is/are function calls?
- Involve costly instructions in each iteration:
– Comparisons
– Jumps
for (i=0; i<N; i++) {
    S1;
    S2;
}
What is a loop?
- A loop in a CFG is a set of nodes S such that:
– There is a designated header node h in S
– There is a path from each node in S to h
– There is a path from h to each node in S
– h is the only node in S with an incoming edge from outside S
Are all these loops?
What about these?
Identifying loops using dominators
- A node d dominates a node n if every path from entry to n goes through d.
- Compute dominators of each node:
Flow function for computing dominators
- Assuming D[i] is the set of dominators of node i:
D[entry] = {entry}
D[n] = {n} ∪ (∩p∈pred[n] D[p])
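The dominator equations can be iterated to a fixed point just like the other IDFAs; a sketch, assuming every non-entry node is reachable (so pred[n] is non-empty):

```python
def dominators(nodes, pred, entry):
    dom = {n: set(nodes) for n in nodes}   # start "full"; intersection shrinks
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Example: 1 -> 2 -> 3 -> 4, with a back edge 4 -> 2.
dom = dominators([1, 2, 3, 4], {1: [], 2: [1, 4], 3: [2], 4: [3]}, 1)
# dom[2] == {1, 2}: node 4's dominators don't remove 1 from the intersection.
```

Starting each D[n] at the full node set mirrors the must-analysis initialization seen earlier: intersection can only shrink the sets, so starting small would lose dominators across back edges.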
Identifying loops using dominators (Cont.)
- First, identify a back edge:
– An edge from a node n to another node h, where h dominates n
- Each back edge leads to a loop:
– The loop is the set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h
– h is the header
- Verify:
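The natural loop of a back edge n -> h can be collected by walking predecessors backwards from n, stopping at h; a standard worklist sketch (pred maps a node to its CFG predecessors):

```python
def natural_loop(n, h, pred):
    """Nodes of the natural loop of back edge n -> h (h must dominate n)."""
    loop = {h, n}
    work = [n] if n != h else []
    while work:
        m = work.pop()
        for p in pred[m]:              # everything that reaches n without
            if p not in loop:          # passing through h joins the loop
                loop.add(p)
                work.append(p)
    return loop

# For the graph 1 -> 2 -> 3 -> 4 with back edge 4 -> 2:
pred = {1: [], 2: [1, 4], 3: [2], 4: [3]}
body = natural_loop(4, 2, pred)   # {2, 3, 4}; node 1 stays outside
```

Note that h itself is never put on the worklist, so the walk never escapes past the header into the rest of the CFG.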
Loop-Invariant Code Motion (LICM)
- Loop-invariant code:
– d: t = a OP b, such that:
- a and b are constants; or
- all the definitions of a and b that reach d are outside the loop; or
- only one definition each of a and b reaches d, and that definition is loop-invariant.
- Example:
L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i<N goto L1
L2: x = t
LICM: Get ready for code hoisting
- Can we always hoist loop-invariant code?
- Criteria for hoisting d: t = a OP b:
– d dominates all loop exits at which t is live-out, and
– there is only one definition of t in the loop, and
– t is not live-out of the loop preheader
- How can we hoist code in the pink and the orange blocks?
// Original:
L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i<N goto L1
L2: x = t

// Variant 1 (loop-exit test at the top, so t = a * b may not dominate the exit):
L0: t = 0
L1: if i>=N goto L2
    i = i + 1
    t = a * b
    M[i] = t
    goto L1
L2: x = t

// Variant 2 (t is used before being redefined inside the loop):
L0: t = 0
L1: M[j] = t
    i = i + 1
    t = a * b
    M[i] = t
    if i<N goto L1
L2: x = t
Induction-variable optimization
- Induction variables:
– Variables whose value depends on iteration variable
- Optimization:
– Compute them efficiently, if possible
// Original:
s = 0
i = 0
L1: if i>=N goto L2
    j = i * 4
    k = j + a
    x = M[k]
    s = s + x
    i = i + 1
    goto L1
L2:

// After induction-variable strength reduction:
s = 0
k’ = a
b = N * 4
c = a + b
L1: if k’>=c goto L2
    x = M[k’]
    s = s + x
    k’ = k’ + 4
    goto L1
L2:
Loop unrolling
- Minimize the number of increments and condition-checks
- Be careful about the increase in code size (I-cache misses!)
// Original:
L1: x = M[i]
    s = s + x
    i = i + 4
    if i<N goto L1
L2:

// Unrolled by a factor of 2 (works only for an even no. of iterations):
L1: x = M[i]
    s = s + x
    x = M[i+4]
    s = s + x
    i = i + 8
    if i<N goto L1
L2:

// Unrolled by a factor of 2 (any no. of iterations; epilogue handles the rest):
    if i<N-8 goto L1
    goto L2
L1: x = M[i]
    s = s + x
    x = M[i+4]
    s = s + x
    i = i + 8
    if i<N-8 goto L1
L2: x = M[i]
    s = s + x
    i = i + 4
    if i<N goto L2
L3:
Loop interchange
- A C/Java programmer starting with MATLAB:
- But MATLAB stores matrices in column-major order!
- Implication?
– Cache misses (perhaps in each iteration)!
- Solution (interchange the loops!):
% Original (row-major traversal):
for i=1:1000,
  for j=1:1000,
    a(i) = a(i) + b(i,j)*c(i)
  end
end

% After loop interchange (column-major friendly):
for j=1:1000,
  for i=1:1000,
    a(i) = a(i) + b(i,j)*c(i)
  end
end
Many more loop optimizations
- Loop fusion
- Loop fission
- Loop inversion
- Loop tiling
- Loop unswitching
- . . .
- Vectorization
- Parallelization

Next class!