Static Analysis: Dataflow and Dataflow Frameworks
CO444H, Ben Livshits
Master’s Projects Available
- 1. Crashes to exploits
- 2. Pointer analysis for JavaScript
- 3. Private data management languages
- 4. Programming robots to assemble IKEA furniture
- 5. Project in software security
- 6. Security vulnerabilities in web browsers
- 7. Toward auditable financial software
- 8. User tracking in mobile browsers
We are in the Idealized World of CFGs
[Figure: example CFG whose blocks contain statements such as t = x+y; a = t; b = t; c = t]
Data Flow Equations
Dataflow Analysis
- Computes facts about values in the program
- Little or no interaction between facts
- Based on all paths through program
- Including, sometimes, infeasible paths
- Let’s consider some dataflow analyses…
Some Static Analysis Goals
- For example
- What values can integer x have?
- What locations can pointer p point to?
- Can double y be negative?
- Can it assume value 17?
- etc.
- This is static reasoning: we are approximating runtime execution here
Static vs. Runtime
i = 1; while(true){ i = i + 2; if(…) break; }
- How can we approximate the possible values of i?
- What can we conclude on the basis of this code?
i = 1; while(i < 1000){ i = i + 2; a = i*2; }
- How about now?
Examples of Dataflow Analysis
- We will cover three common types of analysis
- Reaching definitions
- Available expressions
- Live variables
Reaching Definitions
Reaching Definitions
- We will start this discussion with an analysis called Reaching Definitions…
- A definition d reaches a point p if there is a path from d to p along which d is not killed
- A basic block can generate a definition
- A basic block can either
- Kill a definition of x if it surely redefines x
- Transmit a definition if it may not redefine the same variable(s) as that definition
IN and OUT
The following sets are defined:
- IN(B)
= set of definitions reaching the beginning of block B
- OUT(B)
= set of definitions reaching the end of B
Equations
Two kinds of equations:
- Confluence equations: IN(B) in terms of OUTs of predecessors of B
- Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B
Confluence Equations
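$\mathrm{IN}(B) = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}(P)$, where pred(B) is the set of predecessors of B
[Figure: block B with predecessors P1 and P2, whose OUT sets are {d2, d3} and {d1, d2}; their union gives {d1, d2, d3}]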
Transfer Equations
- Generate a definition in the block if its variable is not definitely rewritten later in the basic block
- Kill a definition if its variable is definitely rewritten in the block
- An internal definition may be both killed and generated
Example: GEN and KILL
- For each basic block B1, B2, B3 we can compute GEN and KILL sets independently
- These will be part of the transfer function
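A minimal sketch of that per-block computation, assuming each definition is a (label, variable) pair and a block is the ordered list of its definitions (a hypothetical encoding; gen_kill is an illustrative name, not from the slides):

```python
def gen_kill(block, all_defs):
    """Compute GEN and KILL for one basic block (reaching definitions).

    block:    ordered list of (label, var) definitions inside the block
    all_defs: every (label, var) definition in the whole flow graph
    """
    last_def = {}                  # var -> label of its last definition in the block
    for label, var in block:
        last_def[var] = label      # a later definition of var supersedes an earlier one
    gen = set(last_def.values())
    # Every definition of a variable the block surely redefines is killed,
    # except the block's own surviving definition, which is in GEN.
    kill = {label for label, var in all_defs if var in last_def} - gen
    return gen, kill


all_defs = [("d1", "x"), ("d2", "x"), ("d3", "y"), ("d4", "y")]
print(gen_kill([("d3", "y"), ("d4", "y")], all_defs))  # ({'d4'}, {'d3'})
```

Here d3 is overwritten inside the block by d4, so it is killed rather than generated. Leaving the surviving definitions out of KILL is a design choice: OUT = (IN – KILL) ∪ GEN adds GEN back anyway, so formulations that keep them in KILL give the same OUT.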
Transfer Function for a Block
Connecting IN and OUT sets… For any block B:
OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)
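The transfer function is just a set difference followed by a union. A sketch as a Python closure, with definitions named by strings (make_transfer is a hypothetical helper name):

```python
from typing import Callable, FrozenSet

Facts = FrozenSet[str]


def make_transfer(gen: Facts, kill: Facts) -> Callable[[Facts], Facts]:
    """Return the gen-kill transfer function OUT = (IN - KILL) ∪ GEN for one block."""
    def transfer(in_facts: Facts) -> Facts:
        return (in_facts - kill) | gen
    return transfer


# A block that generates d2 and kills d1 (e.g. both define the same variable).
f_b = make_transfer(gen=frozenset({"d2"}), kill=frozenset({"d1"}))
print(sorted(f_b(frozenset({"d1", "d2", "d3"}))))  # ['d2', 'd3']
```

With empty GEN and KILL the same helper yields the identity function.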
Iterative Solution --- (2)
IN(entry) = ∅;
for each block B do OUT(B) = ∅;
while (changes occur) do
    for each block B do {
        IN(B) = ∪ OUT(P) over predecessors P of B;
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
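A minimal Python rendering of the round-robin loop above, assuming blocks are strings and preds maps each block to its predecessor list (hypothetical representations; the entry block is simply the one with no predecessors). The usage example is an assumed three-block loop: B1 holds d1 (a definition of x), B2 branches, and B3 holds d2 (another definition of x) and jumps back to B2.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Round-robin iterative solver for reaching definitions.

    blocks:    block names; the entry block has no predecessors
    preds:     dict block -> list of predecessor blocks
    gen, kill: dict block -> set of definition labels
    Returns (IN, OUT), each a dict block -> set of labels.
    """
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}     # start from the empty set everywhere
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Confluence: union of OUT over all predecessors (empty for the entry).
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = (IN[b] - kill[b]) | gen[b]   # transfer function
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}}
IN, OUT = reaching_definitions(blocks, preds, gen, kill)
print(sorted(IN["B2"]), sorted(OUT["B3"]))  # ['d1', 'd2'] ['d2']
```

Iteration stops once a full pass changes no OUT set; IN needs no separate check because it is recomputed from the OUTs.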
Iterative Solution to Equations
- For an n-block flow graph, there are 2*n equations and 2*n unknowns.
- Alas, the solution is not unique.
- Standard theory assumes a field of constants; sets are not a field.
- Use an iterative solution to get the least fixed point.
- It identifies any def that might reach a point
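To see why the solution is not unique, here is a sketch that checks candidate assignments against the equations. On an assumed loop (B1 defines d1, B2 branches, B3 defines d2 and jumps back to B2), adding a hypothetical definition d9 of some other variable to every set inside the loop still satisfies all equations, because the loop feeds d9 back to itself. The iterative algorithm, starting from ∅, never introduces it.

```python
def satisfies_rd_equations(blocks, preds, gen, kill, IN, OUT):
    """Return True if (IN, OUT) satisfies every confluence and transfer equation."""
    for b in blocks:
        if IN[b] != set().union(*(OUT[p] for p in preds[b])):
            return False                      # confluence equation violated
        if OUT[b] != (IN[b] - kill[b]) | gen[b]:
            return False                      # transfer equation violated
    return True


blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}}

# The least solution, which the iterative algorithm finds.
least_in = {"B1": set(), "B2": {"d1", "d2"}, "B3": {"d1", "d2"}}
least_out = {"B1": {"d1"}, "B2": {"d1", "d2"}, "B3": {"d2"}}

# A larger solution: the spurious d9 circulates around the loop B2 -> B3 -> B2.
big_in = {"B1": set(), "B2": {"d1", "d2", "d9"}, "B3": {"d1", "d2", "d9"}}
big_out = {"B1": {"d1"}, "B2": {"d1", "d2", "d9"}, "B3": {"d2", "d9"}}

print(satisfies_rd_equations(blocks, preds, gen, kill, least_in, least_out))  # True
print(satisfies_rd_equations(blocks, preds, gen, kill, big_in, big_out))      # True
```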
Reaching Definitions: Algorithm in Action
[Figure: flow graph with blocks B1, B2, B3 containing d1: x = 5, the test if x == 10, and d2: x = 15]
IN(B1) = {}          OUT(B1) = {d1}
IN(B2) = {d1, d2}    OUT(B2) = {d1, d2}
IN(B3) = {d1, d2}    OUT(B3) = {d2}
In practice, these sets are stored as bit vectors for greater computational efficiency
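The bit-vector idea can be sketched with Python integers standing in for machine words; numbering definition d_i as bit i is an assumption for illustration, and bits and transfer_bits are hypothetical helper names:

```python
def bits(*indices):
    """Bit vector (a Python int) with bit i set for each definition d_i."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask


def transfer_bits(in_bits, gen_bits, kill_bits):
    """OUT = (IN - KILL) | GEN, computed with AND-NOT and OR instead of set operations."""
    return (in_bits & ~kill_bits) | gen_bits


# Union (may analyses) is bitwise OR; intersection (must analyses) is bitwise AND.
out_b = transfer_bits(bits(1, 2), gen_bits=bits(2), kill_bits=bits(1))
print(bin(out_b))               # 0b100, i.e. only d2
print(bin(bits(1) | bits(2)))   # 0b110, i.e. {d1, d2}
```

Each set operation then costs a few instructions per machine word instead of a walk over a hash set.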
Aside: Notice the Conservatism
- We make the most conservative assumption about when a def is KILLed or GEN’d
- We also make the conservative assumption that any path in the flow graph can actually be taken
- Also, this is a may analysis, not a must analysis
Available Expressions
Another Data-Flow Problem: Available Expressions
- An expression x+y is available at a point if, no matter what path has been taken to that point from the entry, x+y has been evaluated, and neither x nor y has even possibly been redefined since that evaluation
- Useful for global common-subexpression elimination
Available expressions example
- Watch out for things that are possibly KILLed by an assignment
2010 Stephen Chong, Harvard University
Defining GEN(B) and KILL(B)
- An expression x+y is generated if it is computed in B, and afterwards there is no possibility that either x or y is redefined
- An expression x+y is killed if it is not generated in B and either x or y is possibly redefined
Equations for Available Expressions
- The equations for AE are essentially the same as for RD, with one exception
- Confluence of paths involves intersection of sets of expressions rather than union of sets of definitions
- Available expressions is a forward must analysis
- Forward means that data facts flow from IN to OUT
- Must means that at join points, we keep only facts that hold on all paths being joined
Example of GEN and KILL for Available Expressions
x = x+y    Kills x+y, w*x, etc.
z = a+b    Generates a+b; kills z-w, x+z, etc.
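A sketch of computing GEN and KILL for available expressions, assuming every statement has the form target = lhs op rhs and expressions are (lhs, op, rhs) tuples (a hypothetical encoding). Applied to the two-statement block above, with w*x, x+z and z-w standing in for the "etc.", it reproduces the GEN and KILL listed there:

```python
def ae_gen_kill(block, universe):
    """GEN and KILL for available expressions in one basic block.

    block:    ordered list of (target, expr) statements meaning target = expr
    universe: every expression in the program; an expression is (lhs, op, rhs)
    """
    gen, kill = set(), set()
    for target, expr in block:
        gen.add(expr)   # the right-hand side is evaluated first...
        # ...then target is overwritten, which kills every expression using it.
        uses_target = {e for e in universe if target in (e[0], e[2])}
        gen -= uses_target
        kill |= uses_target
    return gen, kill - gen   # KILL only counts expressions not generated in B


universe = [("x", "+", "y"), ("w", "*", "x"), ("x", "+", "z"),
            ("z", "-", "w"), ("a", "+", "b")]
block = [("x", ("x", "+", "y")), ("z", ("a", "+", "b"))]
gen, kill = ae_gen_kill(block, universe)
print(gen)           # {('a', '+', 'b')}
print(sorted(kill))  # [('w', '*', 'x'), ('x', '+', 'y'), ('x', '+', 'z'), ('z', '-', 'w')]
```

Note that x = x+y computes x+y but does not generate it, because x is redefined by the same statement.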
Transfer Equations
- Transfer equation is exactly the same as before:
OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)
- Which is good: we can use the same template for all GEN/KILL problems
Confluence Equations
- Confluence involves intersection, because an expression is available coming into a block if and only if it is available coming out of each predecessor
$\mathrm{IN}(B) = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}(P)$, where pred(B) is the set of predecessors of B
Iterative Solution
IN(entry) = ∅;
for each block B do OUT(B) = ALL;
while (changes occur) do
    for each block B do {
        IN(B) = ∩ OUT(P) over predecessors P of B;
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
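A sketch of this must-analysis loop, assuming blocks are strings, preds maps each block to its predecessors, and expressions are plain strings (all hypothetical representations). Compared with the reaching-definitions loop, the differences are the ALL initialisation and intersection at join points. The usage example is an assumed diamond in which one branch redefines a.

```python
def available_expressions(blocks, preds, gen, kill, universe):
    """Iterative must-analysis: meet is intersection and OUT starts at ALL."""
    ALL = set(universe)
    IN = {b: set() for b in blocks}
    OUT = {b: set(ALL) for b in blocks}          # optimistic start
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if preds[b]:
                IN[b] = set.intersection(*(OUT[p] for p in preds[b]))
            else:
                IN[b] = set()                    # entry: nothing evaluated yet
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Assumed diamond: B1 computes a+b and c*d, B2 redefines a, B3 does nothing, both reach B4.
blocks = ["B1", "B2", "B3", "B4"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"a+b", "c*d"}, "B2": set(), "B3": set(), "B4": set()}
kill = {"B1": set(), "B2": {"a+b"}, "B3": set(), "B4": set()}
IN, OUT = available_expressions(blocks, preds, gen, kill, ["a+b", "c*d"])
print(IN["B4"])  # {'c*d'}
```

a+b is killed on the B2 path, so intersection drops it at B4, while c*d survives on both paths.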
Why It Works
- An expression x+y is unavailable at point p iff there is a path from the entry to p that either:
1. Never evaluates x+y, or
2. Kills x+y after its last evaluation
- IN(entry) = ∅ takes care of #1 above
- OUT(B) = ALL, plus intersection during iteration, handles #2 above
Example of Why We Want Intersection
[Figure: several paths from Entry to point p, labelled "x+y never GEN’d", "x+y killed", and "x+y never GEN’d"]
Subtle Point
- It is conservative to assume an expression isn’t available, even if it is
- But we don’t have to be “insanely conservative”
- If, after considering all paths and assuming x+y is killed by any possibility of redefinition, we still can’t find a path explaining its unavailability, then x+y is available
- This is a delicate dance between soundness and precision
How Would the Algorithm Change for a Backward Analysis?
Live Variables
Live Variable Analysis
- Variable x is live at a point p if, on some path from p, x is used before it is redefined
- Useful in code generation: if x is not live on exit from a basic block, there is no need to copy x from a register to memory
- Captures whether there is a demand for a variable
Equations for Live Variables
- LV is essentially a “backwards” version of RD
- In place of GEN(B): Use(B) = set of variables x possibly used in B prior to any certain definition of x
- In place of KILL(B): Def(B) = set of variables x certainly defined before any possible use of x
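A sketch of computing Use(B) and Def(B) by scanning the block in order, assuming each statement is encoded as (target, list of used variables) (a hypothetical encoding; use_def is an illustrative name). The right-hand side of each statement is processed before its assignment, matching evaluation order:

```python
def use_def(block):
    """Use(B) and Def(B) for live-variable analysis.

    block: ordered list of (target, used_vars) statements, i.e. target = f(used_vars)
    """
    use, defs = set(), set()
    for target, used in block:
        # A use counts only if no earlier statement in B has already defined the variable.
        use |= {v for v in used if v not in defs}
        # A definition counts only if no earlier statement in B has used the variable.
        if target not in use:
            defs.add(target)
    return use, defs


block = [("a", ["b", "c"]),   # a = b + c
         ("b", ["a"]),        # b = a * 2
         ("d", ["d", "a"])]   # d = d + a
use, defs = use_def(block)
print(sorted(use), sorted(defs))  # ['b', 'c', 'd'] ['a']
```

b and d are redefined, but only after a possible use, so they appear in Use(B) and not in Def(B).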
Transfer Equations
- Transfer equations give IN’s in terms of OUT’s:
IN(B) = (OUT(B) – Def(B)) ∪ Use(B)
- This is a little different – the direction is reversed
Confluence Equations
- Confluence involves union over successors, so a variable is in OUT(B) if it is live on entry to any of B’s successors.
$\mathrm{OUT}(B) = \bigcup_{S \in \mathrm{succ}(B)} \mathrm{IN}(S)$, where succ(B) is the set of successors of B
Iterative Solution for Live Variables
OUT(exit) = ∅;
for each block B do IN(B) = ∅;
while (changes occur) do
    for each block B do {
        OUT(B) = ∪ IN(S) over successors S of B;
        IN(B) = (OUT(B) – Def(B)) ∪ Use(B);
    }
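The backward loop can be sketched the same way, assuming succs maps each block to its successors (a hypothetical representation) and that the block with no successors plays the role of exit. The usage example is an assumed loop in which a stays live around B2 while b never escapes it:

```python
def live_variables(blocks, succs, use, defs):
    """Iterative backward solver: OUT is a union over successors, IN = (OUT - Def) ∪ Use."""
    IN = {b: set() for b in blocks}
    OUT = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        # Reverse order usually converges faster for a backward problem;
        # any visiting order reaches the same fixed point.
        for b in reversed(blocks):
            OUT[b] = set().union(*(IN[s] for s in succs[b]))  # empty for the exit
            new_in = (OUT[b] - defs[b]) | use[b]
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT


# Assumed graph: B1 (a = 0) -> B2 (b = a + 1; a = b * 2) -> B2 (loop) and B3 (return a).
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"a"}, "B3": {"a"}}
defs = {"B1": {"a"}, "B2": {"b"}, "B3": set()}
IN, OUT = live_variables(blocks, succs, use, defs)
print(sorted(OUT["B1"]), sorted(IN["B1"]))  # ['a'] []
```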
Data-Flow Frameworks
- Lattice-Theoretic Formulation
- Meet-Over-Paths Solution
- Monotonicity/Distributivity
Data-Flow Analysis Frameworks
- Generalizes and unifies each of the DFA examples from the previous lecture.
- Important ingredients:
- Direction D: forward or backward
- Domain V: possible values for IN, OUT
- Meet operator ∧: effect of path confluence
- Transfer functions F: effect of passing through a basic block
Good News!
- All three analyses above fit the model
- RDs: forward, meet = union, transfer functions based on GEN and KILL
- AEs: forward, meet = intersection, transfer functions based on GEN and KILL
- LVs: backward, meet = union, transfer functions based on USE and DEF
May vs. Must Analysis

|          | May                  | Must                  |
|----------|----------------------|-----------------------|
| Forward  | Reaching definitions | Available expressions |
| Backward | Live variables       | Very busy expressions |
Semilattices
We say that a set V and an operation meet (denoted ∧) form a semilattice if for all x, y, and z in V:
1. x ∧ x = x (idempotence)
2. x ∧ y = y ∧ x (commutativity)
3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity)
4. There is a top element ⊤ such that for all x, ⊤ ∧ x = x
5. Optionally, there is a bottom element ⊥ such that for all x, ⊥ ∧ x = ⊥
Available expressions (semi)lattice
In this example we have a+b, a+1, a*b as possible computations in this program
Example: Semilattice
- V = power set of some set (like the previous example)
- ∧ = union
- Union is idempotent, commutative, and associative
- What are the top and bottom elements?
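Because V is finite here, the laws and the top/bottom question can be settled by brute force. A sketch, taking the three expressions a+b, a+1, a*b as the underlying set (an assumption; any finite set works) and power_set as a hypothetical helper:

```python
from itertools import chain, combinations

EXPRS = ("a+b", "a+1", "a*b")   # the three computations from the example above


def power_set(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


V = power_set(EXPRS)       # 8 elements
meet = frozenset.union     # ∧ = union, as on this slide

# The three semilattice laws, checked exhaustively over V.
idempotent = all(meet(x, x) == x for x in V)
commutative = all(meet(x, y) == meet(y, x) for x in V for y in V)
associative = all(meet(x, meet(y, z)) == meet(meet(x, y), z)
                  for x in V for y in V for z in V)

# Top: t with t ∧ x == x for all x.  Bottom: b with b ∧ x == b for all x.
top = [t for t in V if all(meet(t, x) == x for x in V)]
bottom = [b for b in V if all(meet(b, x) == b for x in V)]

print(idempotent, commutative, associative)                # True True True
print(top == [frozenset()], bottom == [frozenset(EXPRS)])  # True True
```

With union as ∧, the search finds ⊤ = ∅ and ⊥ = the full set.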
Partial Order for a Semilattice
- Say x ≤ y iff x ∧ y = x
- Also, x < y iff x ≤ y and x ≠ y
- ≤ is really a partial order:
1. x ≤ x, since x ∧ x = x (idempotence)
2. x ≤ y and y ≤ z imply x ≤ z (proof in the Dragon book)
3. x ≤ y and y ≤ x iff x = y.
Proof:
- x ∧ y = x and y ∧ x = y.
- Thus, x = x ∧ y = y ∧ x = y
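The partial-order properties can also be checked by brute force on a small power set. A sketch with union as ∧ over three hypothetical definitions d1, d2, d3 (leq is an illustrative name):

```python
from itertools import chain, combinations


def power_set(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


V = power_set(("d1", "d2", "d3"))


def meet(x, y):
    return x | y               # union, the meet used for reaching definitions


def leq(x, y):
    """x ≤ y iff x ∧ y = x: the order is derived from the meet, not assumed."""
    return meet(x, y) == x


reflexive = all(leq(x, x) for x in V)
transitive = all(leq(x, z) for x in V for y in V for z in V
                 if leq(x, y) and leq(y, z))
antisymmetric = all(x == y for x in V for y in V if leq(x, y) and leq(y, x))
# With union as the meet, x ≤ y holds exactly when x is a superset of y.
superset = all(leq(x, y) == (x >= y) for x in V for y in V)
print(reflexive, transitive, antisymmetric, superset)  # True True True True
```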
Axioms for Transfer Functions
1. The set of transfer functions F includes the identity function.
- Why needed? Constructions often require introduction of an empty block.
2. F is closed under composition.
- Why needed?
- The concatenation of two blocks is a block.
- The transfer function for a block can be constructed from those of its individual statements.
Example: Reaching Definitions
- Direction D = forward.
- Domain V = set of all sets of definitions in the flow graph.
- ∧ = union.
- Functions F = all “gen-kill” functions of the form f(x) = (x – K) ∪ G, where K (KILL) and G (GEN) are sets of definitions (members of V).
Example: Satisfies Axioms
- Union on a power set forms a semilattice (idempotent, commutative, associative).
- Identity function: let K = G = ∅.
- Composition: A little algebra.
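The “little algebra” for composition, for a block with f_1(x) = (x – K_1) ∪ G_1 followed by a block with f_2(x) = (x – K_2) ∪ G_2:

```latex
\begin{aligned}
f_2(f_1(x)) &= \bigl(((x - K_1) \cup G_1) - K_2\bigr) \cup G_2 \\
            &= \bigl((x - K_1 - K_2) \cup (G_1 - K_2)\bigr) \cup G_2 \\
            &= \bigl(x - (K_1 \cup K_2)\bigr) \cup \bigl((G_1 - K_2) \cup G_2\bigr)
\end{aligned}
```

So f_2 ∘ f_1 is again a gen-kill function, with K = K_1 ∪ K_2 and G = (G_1 – K_2) ∪ G_2. The second step distributes the set difference over the union; the third uses (x – K_1) – K_2 = x – (K_1 ∪ K_2) and associativity of union.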
Example: Partial Order
- For RDs, S ≤ T means S ∪ T = S.
- Equivalently, S ⊇ T.
- Seems “backward,” but that’s what the definitions give you
- Intuition: ≤ measures “ignorance.”
- The more definitions we know about, the less ignorance we have.
- ⊤ = “total ignorance.”
DFA Frameworks
- (D, V, ∧, F)
- A flow graph, with an associated function f_B in F for each block B
- A boundary value v_ENTRY or v_EXIT if D = forward or backward, respectively.
Iterative Algorithm (Forward)
OUT(entry) = v_ENTRY;
for (each block B other than entry) OUT(B) = ⊤;
while (changes to any OUT)
    for (each block B other than entry) {
        IN(B) = ∧ OUT(P) over predecessors P of B;
        OUT(B) = f_B(IN(B));
    }
Iterative Algorithm (Backward)
Almost the same thing – just make a few changes:
- 1. Swap IN and OUT everywhere
- 2. Replace ENTRY by EXIT
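A generic solver parameterised by (D, V, ∧, F) and a boundary value ties these pieces together. This is a sketch under several assumptions: V is encoded as frozensets, ENTRY and EXIT are empty pseudo-blocks with identity transfer functions, and Framework, solve and make_gen_kill are hypothetical names. The backward case follows the recipe above by swapping predecessors/successors and ENTRY/EXIT.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, FrozenSet, List

Facts = FrozenSet[str]
Transfer = Callable[[Facts], Facts]


@dataclass
class Framework:
    """(D, V, ∧, F) with V encoded as frozensets; each block's f_B is passed to solve()."""
    forward: bool                           # D: True = forward, False = backward
    meet: Callable[[Facts, Facts], Facts]   # ∧
    top: Facts                              # ⊤, the initial guess for every block
    boundary: Facts                         # v_ENTRY (forward) or v_EXIT (backward)


def solve(fw: Framework, blocks: List[str], preds: Dict[str, List[str]],
          succs: Dict[str, List[str]], f: Dict[str, Transfer],
          entry: str, exit_: str):
    """Iterative algorithm. Returns (IN, OUT) for a forward problem and
    (OUT, IN) for a backward one."""
    start, inputs = (entry, preds) if fw.forward else (exit_, succs)
    before = {b: fw.top for b in blocks}     # value flowing into f_B
    after = {b: fw.top for b in blocks}      # value produced by f_B
    after[start] = fw.boundary
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == start:
                continue                     # the boundary value stays fixed
            # Meet over all inputs; ⊤ is the identity of ∧, so it is a safe seed.
            before[b] = reduce(fw.meet, (after[p] for p in inputs[b]), fw.top)
            new = f[b](before[b])
            if new != after[b]:
                after[b] = new
                changed = True
    return before, after


def make_gen_kill(gen, kill):
    """Hypothetical helper building f(x) = (x - K) ∪ G."""
    return lambda x: (x - frozenset(kill)) | frozenset(gen)


def identity(x):
    return x


# Reaching definitions on an assumed loop: ENTRY -> B1 -> B2 <-> B3, B2 -> EXIT.
blocks = ["ENTRY", "B1", "B2", "B3", "EXIT"]
preds = {"ENTRY": [], "B1": ["ENTRY"], "B2": ["B1", "B3"], "B3": ["B2"], "EXIT": ["B2"]}
succs = {"ENTRY": ["B1"], "B1": ["B2"], "B2": ["B3", "EXIT"], "B3": ["B2"], "EXIT": []}
f = {"ENTRY": identity, "B1": make_gen_kill({"d1"}, {"d2"}), "B2": identity,
     "B3": make_gen_kill({"d2"}, {"d1"}), "EXIT": identity}
rd = Framework(forward=True, meet=frozenset.union, top=frozenset(), boundary=frozenset())
IN, OUT = solve(rd, blocks, preds, succs, f, "ENTRY", "EXIT")
print(sorted(IN["B2"]), sorted(OUT["B3"]))  # ['d1', 'd2'] ['d2']
```

Available expressions would use intersection as the meet with ⊤ = the set of all expressions, and live variables would set forward=False.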
GCC Optimizations
- Why does gcc generate 15-20% faster code if I optimize for size instead of speed?
- http://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed
Multiple Processors
- By default, compilers optimize for the "average" processor. Since different processors favor different instruction sequences, compiler optimizations enabled by -O2 might benefit the average processor but decrease performance on your particular processor (and the same applies to -Os).
- If you try the same example on different processors, you will find that some of them benefit from -O2 while others are more favorable to -Os optimizations.