CO444H Dataflow Dataflow frameworks Ben Livshits Masters Projects - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Static analysis Dataflow Dataflow frameworks

CO444H

Ben Livshits

slide-2
SLIDE 2

Master’s Projects Available

  • 1. Crashes to exploits
  • 2. Pointer analysis for JavaScript
  • 3. Private data management languages
  • 4. Programming robots to assemble IKEA furniture
  • 5. Project in software security
  • 6. Security vulnerabilities in web browsers
  • 7. Toward auditable financial software
  • 8. User tracking in mobile browsers

slide-3
SLIDE 3

We are in the Idealized World of CFGs

[Figure: control-flow graph whose blocks repeat the statements t = x+y, a = t, b = t, and c = t]

slide-4
SLIDE 4

Data Flow Equations

slide-5
SLIDE 5

Dataflow Analysis

  • Computes facts about values in the program
  • Little or no interaction between facts
  • Based on all paths through program
  • Including, sometimes, infeasible paths
  • Let’s consider some dataflow analyses…

slide-6
SLIDE 6

Some Static Analysis Goals

  • For example
  • What values can integer x have?
  • What locations can pointer p point to?
  • Can double y be negative?
  • Can it assume value 17?
  • etc.
  • This is static reasoning – we are approximating runtime execution here

slide-7
SLIDE 7

Static vs. Runtime

i = 1;
while (true) {
    i = i + 2;
    if (…) break;
}

  • How can we approximate the possible values of i?
  • What can we conclude on the basis of this code?

i = 1;
while (i < 1000) {
    i = i + 2;
    a = i*2;
}

  • How about now?
slide-8
SLIDE 8

Examples of Dataflow Analysis

  • We will cover three common types of analysis
  • Reaching definitions
  • Available expressions
  • Live variables

slide-9
SLIDE 9

Reaching Definitions

slide-10
SLIDE 10

Reaching Definitions

  • We will start this discussion by talking about an analysis called Reaching Definitions…
  • A basic block can generate a definition
  • A basic block can either
      • Kill a definition of x if it surely redefines x
      • Transmit a definition if it may not redefine the same variable(s) as that definition

slide-11
SLIDE 11

IN and OUT

The following sets are defined:

  • IN(B) = set of definitions reaching the beginning of block B
  • OUT(B) = set of definitions reaching the end of B

slide-12
SLIDE 12

Equations

Two kinds of equations:

  • Confluence equations: IN(B) in terms of OUTs of predecessors of B
  • Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B

slide-13
SLIDE 13

Confluence Equations

IN(B) = ∪_{predecessors P of B} OUT(P)

[Figure: block B with predecessors P1 and P2; the predecessor OUT sets {d1, d2} and {d2, d3} combine into IN(B) = {d1, d2, d3}]

slide-14
SLIDE 14

Transfer Equations

  • Generate a definition in the block if its variable is not definitely rewritten later in the basic block
  • Kill a definition if its variable is definitely rewritten in the block
  • An internal definition may be both killed and generated

slide-15
SLIDE 15

Example: GEN and KILL

  • For each basic block B1, B2, B3 we can compute GEN and KILL sets independently
  • These will be part of the transfer function
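
As a rough illustration of computing GEN and KILL block by block, here is a minimal Python sketch. The representation of a definition as a (label, variable) pair, the function name `gen_kill`, and the three-definition program are all assumptions made for this example, not part of the slides.

```python
from typing import Dict, List, Set, Tuple

Definition = Tuple[str, str]  # (label, variable), e.g. ("d1", "x")


def gen_kill(block: List[Definition],
             all_defs: List[Definition]) -> Tuple[Set[str], Set[str]]:
    """Compute (GEN, KILL) for one basic block, as sets of definition labels."""
    last_def: Dict[str, str] = {}  # variable -> label of its last definition in the block
    kill: Set[str] = set()
    for label, var in block:
        # A definition of var kills every other definition of var in the program.
        kill |= {other for other, v in all_defs if v == var and other != label}
        # Only the last definition of var in the block survives to the block's end.
        last_def[var] = label
    return set(last_def.values()), kill


# Hypothetical program: d1 and d3 sit in block B1, d2 in block B2.
ALL_DEFS = [("d1", "x"), ("d2", "x"), ("d3", "y")]
B1 = [("d1", "x"), ("d3", "y")]
B2 = [("d2", "x")]

gen_b1, kill_b1 = gen_kill(B1, ALL_DEFS)  # GEN = {d1, d3}, KILL = {d2}
gen_b2, kill_b2 = gen_kill(B2, ALL_DEFS)  # GEN = {d2},     KILL = {d1}
```

Each call looks only at its own block plus the global list of definitions, which is what lets the sets be computed independently per block.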

slide-16
SLIDE 16

Transfer Function for a Block

Connecting IN and OUT sets… For any block B:

OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)

slide-17
SLIDE 17

Iterative Solution --- (2)

IN(entry) = ∅;
for each block B do OUT(B) = ∅;
while (changes occur) do
    for each block B do {
        IN(B) = ∪_{predecessors P of B} OUT(P);
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
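
The loop above can be sketched in Python roughly as follows. The dictionary-based graph encoding, the function name `reaching_definitions`, and the three-block example graph (B1 branching to B2 and B3, with B2 falling through to B3) are assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def reaching_definitions(
    preds: Dict[str, List[str]],
    gen: Dict[str, Set[str]],
    kill: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Iterate the confluence and transfer equations until nothing changes."""
    blocks = list(gen)
    IN: Dict[str, Set[str]] = {b: set() for b in blocks}
    OUT: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Confluence: union of the OUT sets of all predecessors.
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            # Transfer: OUT(B) = (IN(B) - Kill(B)) ∪ Gen(B).
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Assumed flow graph: B1 -> B2, B1 -> B3, B2 -> B3.
# B1 holds d1: x = 5, B2 holds d2: x = 15, B3 defines nothing.
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}

IN, OUT = reaching_definitions(preds, gen, kill)
```

On this assumed graph both d1 and d2 reach the start of B3, because B3 can be entered either directly from B1 or through B2.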

slide-18
SLIDE 18

Iterative Solution to Equations

  • For an n-block flow graph, there are 2*n equations and 2*n unknowns.
  • Alas, the solution is not unique.
  • Standard theory assumes a field of constants; sets are not a field.
  • Use an iterative solution to get the least fixed point.
  • Identifies any def that might reach a point
slide-19
SLIDE 19

Reaching Definitions: Algorithm in Action

[Figure: flow graph with blocks B1, B2, B3 containing d1: x = 5, the test if x == 10, and d2: x = 15, annotated with the computed IN and OUT sets, e.g. IN(B1) = {} and IN(B3) = {d1, d2}]

slide-20
SLIDE 20

A bit-vector representation for greater computational efficiency
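
One way to picture the bit-vector idea is to give every definition a bit position, so that set union, intersection, and difference become single bitwise operations. The sketch below uses Python integers as bit vectors; the bit assignment for d1, d2, d3 and the helper names are assumptions for illustration.

```python
# Hypothetical numbering: definition d1 is bit 0, d2 is bit 1, d3 is bit 2.
D1, D2, D3 = 1 << 0, 1 << 1, 1 << 2


def transfer(in_bits: int, gen_bits: int, kill_bits: int) -> int:
    """OUT = (IN - KILL) ∪ GEN, with sets encoded as integer bit masks."""
    return (in_bits & ~kill_bits) | gen_bits


def confluence_union(outs) -> int:
    """IN = union of predecessor OUT sets (bitwise OR)."""
    result = 0
    for bits in outs:
        result |= bits
    return result


in_b = confluence_union([D1 | D2, D2 | D3])        # {d1, d2} ∪ {d2, d3} = {d1, d2, d3}
out_b = transfer(in_b, gen_bits=D2, kill_bits=D1)  # drop d1, add d2 -> {d2, d3}
```

For a must analysis such as available expressions, the confluence step would use bitwise AND instead of OR.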

slide-21
SLIDE 21

Aside: Notice the Conservatism

  • Not only the most conservative assumption about when a def is KILLed or GEN’d
  • Also the conservative assumption that any path in the flow graph can actually be taken
  • Also, this is a may analysis, not a must analysis

slide-22
SLIDE 22

Available Expressions

slide-23
SLIDE 23

Another Data-Flow Problem: Available Expressions

  • An expression x+y is available at a point if, no matter what path has been taken to that point from the entry, x+y has been evaluated, and neither x nor y has even possibly been redefined since
  • Useful for global common-subexpression elimination

slide-24
SLIDE 24

Available expressions example

  • Watch out for things that are possibly KILLed by an assignment

2010 Stephen Chong, Harvard University

slide-25
SLIDE 25

Defining GEN(B) and KILL(B)

  • An expression x+y is generated if it is computed in B, and afterwards there is no possibility that either x or y is redefined
  • An expression x+y is killed if it is not generated in B and either x or y is possibly redefined

slide-26
SLIDE 26

Equations for Available Expressions

  • The equations for AE are essentially the same as for RD, with one exception
  • Confluence of paths involves intersection of sets of expressions rather than union of sets of definitions
  • Available expressions is a forward must analysis
      • Forward means that data facts flow from IN to OUT
      • Must means that, at join points, we only keep facts that hold on all paths that are joined
slide-27
SLIDE 27

Example of GEN and KILL for Available Expressions

x = x+y   kills x+y, w*x, etc.
z = a+b   generates a+b; kills z-w, x+z, etc.

slide-28
SLIDE 28

Transfer Equations

  • Transfer equation is exactly the same as before:

OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)

  • Which is good – we can use the same template for all GEN/KILL problems

slide-29
SLIDE 29

Confluence Equations

  • Confluence involves intersection, because an expression is available coming into a block if and only if it is available coming out of each predecessor

IN(B) = ∩_{predecessors P of B} OUT(P)

slide-30
SLIDE 30

Iterative Solution

IN(entry) = ∅;
for each block B do OUT(B) = ALL;
while (changes occur) do
    for each block B do {
        if B ≠ entry then IN(B) = ∩_{predecessors P of B} OUT(P);
        OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
    }
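
A possible Python rendering of this loop is sketched below. The function name `available_expressions`, the string encoding of expressions, and the diamond-shaped example graph are assumptions for illustration. IN of the entry block is pinned to the empty set, and every OUT starts at ALL so that intersection can only remove expressions.

```python
from typing import Dict, List, Set, Tuple


def available_expressions(
    preds: Dict[str, List[str]],
    gen: Dict[str, Set[str]],
    kill: Dict[str, Set[str]],
    entry: str,
    all_exprs: Set[str],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Forward must analysis: confluence is intersection over predecessors."""
    blocks = list(gen)
    IN: Dict[str, Set[str]] = {b: set() for b in blocks}
    OUT: Dict[str, Set[str]] = {b: set(all_exprs) for b in blocks}  # optimistic start
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b != entry:
                # Keep only expressions available on every incoming edge.
                IN[b] = set(all_exprs).intersection(*(OUT[p] for p in preds[b]))
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# Assumed diamond: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4.
# B1 computes x+y and a+b; B2 redefines x, which kills x+y.
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"x+y", "a+b"}, "B2": set(), "B3": set(), "B4": set()}
kill = {"B1": set(), "B2": {"x+y"}, "B3": set(), "B4": set()}

IN, OUT = available_expressions(preds, gen, kill, "B1", {"x+y", "a+b"})
```

At the join block B4 only a+b survives: x+y is killed on the path through B2, so the intersection drops it even though it is still available along the path through B3.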

slide-31
SLIDE 31

Why It Works

  • An expression x+y is unavailable at point p iff there is a path from the entry to p that either:
      1. Never evaluates x+y, or
      2. Kills x+y after its last evaluation
  • IN(entry) = ∅ takes care of #1 above
  • OUT(B) = ALL, plus intersection during iteration handles #2 above

slide-32
SLIDE 32

Example of Why We Want Intersection

[Figure: paths from Entry to point p, labeled “x+y never GEN’d” and “x+y killed”]

slide-33
SLIDE 33

Subtle Point

  • It is conservative to assume an expression isn’t available, even if it is
  • But we don’t have to be “insanely conservative”
      • If after considering all paths, and assuming x+y killed by any possibility of redefinition, we still can’t find a path explaining its unavailability, then x+y is available
  • This is a delicate dance between soundness and precision

slide-34
SLIDE 34

How Would the Algorithm Change for A Backwards Analysis?

slide-35
SLIDE 35

Live Variables

slide-36
SLIDE 36

Live Variable Analysis

  • Variable x is live at a point p if on some path from p, x is used before it is redefined
  • Useful in code generation: if x is not live on exit from a basic block, there is no need to copy x from a register to memory
  • Captures whether there is a demand for a variable
slide-37
SLIDE 37

Equations for Live Variables

  • LV is essentially a “backwards” version of RD
  • In place of GEN(B): Use(B) = set of variables x possibly used in B prior to any certain definition of x
  • In place of KILL(B): Def(B) = set of variables x certainly defined before any possible use of x

slide-38
SLIDE 38

Transfer Equations

  • Transfer equations give IN’s in terms of OUT’s:

IN(B) = (OUT(B) – Def(B)) ∪ Use(B)

  • This is a little different – the direction is reversed
slide-39
SLIDE 39

Confluence Equations

  • Confluence involves union over successors, so a variable is in OUT(B) if it is live on entry to any of B’s successors.

OUT(B) = ∪_{successors S of B} IN(S)

slide-40
SLIDE 40

Iterative Solution for Live Variables

OUT(exit) = ∅;
for each block B do IN(B) = ∅;
while (changes occur) do
    for each block B do {
        OUT(B) = ∪_{successors S of B} IN(S);
        IN(B) = (OUT(B) – Def(B)) ∪ Use(B);
    }
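
The backward iteration can be sketched in Python as follows. The function name `live_variables` and the split of the earlier `while (i < 1000)` loop into four blocks are assumptions for illustration. Blocks are visited in reverse order only because that tends to converge faster for a backward problem; any order reaches the same fixed point.

```python
from typing import Dict, List, Set, Tuple


def live_variables(
    succs: Dict[str, List[str]],
    use: Dict[str, Set[str]],
    defs: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Backward may analysis: facts flow from successors into OUT, then to IN."""
    blocks = list(use)
    IN: Dict[str, Set[str]] = {b: set() for b in blocks}
    OUT: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):
            # Confluence: union of the IN sets of all successors.
            OUT[b] = set().union(*(IN[s] for s in succs[b]))
            # Transfer: IN(B) = (OUT(B) - Def(B)) ∪ Use(B).
            new_in = (OUT[b] - defs[b]) | use[b]
            if new_in != IN[b]:
                IN[b] = new_in
                changed = True
    return IN, OUT


# Assumed block split of:  i = 1; while (i < 1000) { i = i + 2; a = i*2; }
#   B1: i = 1        B2: test i < 1000
#   B3: i = i + 2; a = i*2        B4: exit
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
use = {"B1": set(), "B2": {"i"}, "B3": {"i"}, "B4": set()}
defs = {"B1": {"i"}, "B2": set(), "B3": {"a"}, "B4": set()}

IN, OUT = live_variables(succs, use, defs)
```

In this example i is live around the loop, while a never appears in any IN or OUT set, so the assignment a = i*2 has no consumer.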

slide-41
SLIDE 41

Data-Flow Frameworks

  • Lattice-Theoretic Formulation
  • Meet-Over-Paths Solution
  • Monotonicity/Distributivity

slide-42
SLIDE 42

Data-Flow Analysis Frameworks

  • Generalizes and unifies each of the DFA examples from the previous lecture.
  • Important ingredients:

Element             Symbol  Explanation
Direction           D       forward or backward
Domain              V       possible values for IN, OUT
Meet operator       ∧       effect of path confluence
Transfer functions  F       effect of passing through a basic block

slide-43
SLIDE 43

Good News!

  • All three analyses above fit the model
  • RD’s: Forward, meet = union, transfer functions based on GEN and KILL
  • AE’s: Forward, meet = intersection, transfer functions based on GEN and KILL
  • LV’s: Backward, meet = union, transfer functions based on USE and DEF

slide-44
SLIDE 44

May vs. Must Analysis

          May                   Must
Forward   Reaching definitions  Available expressions
Backward  Live variables        Very busy expressions

slide-45
SLIDE 45

Semilattices

We say that a set V and operation meet (denoted ∧) form a semilattice if for all x, y, and z in V:

1. x ∧ x = x (idempotence)
2. x ∧ y = y ∧ x (commutativity)
3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity)
4. There is a top element ⊤ such that for all x, ⊤ ∧ x = x.
5. (Optional) There is a bottom element ⊥ such that for all x, ⊥ ∧ x = ⊥

slide-46
SLIDE 46

Available expressions (semi)lattice

In this example we have a+b, a+1, a*b as possible computations in this program
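
For these three expressions the semilattice is small enough to check by brute force. The sketch below enumerates all eight subsets and tests the semilattice axioms with meet = intersection, which is the meet used by available expressions; the names `V`, `meet`, `TOP`, `BOTTOM`, and `is_semilattice` are assumptions for illustration.

```python
from itertools import combinations

EXPRS = ("a+b", "a+1", "a*b")

# V is the power set of EXPRS: every possible set of available expressions.
V = [frozenset(c) for r in range(len(EXPRS) + 1) for c in combinations(EXPRS, r)]


def meet(x: frozenset, y: frozenset) -> frozenset:
    """Meet for available expressions: set intersection."""
    return x & y


TOP = frozenset(EXPRS)  # all expressions available
BOTTOM = frozenset()    # no expression available


def is_semilattice(values, meet_op, top) -> bool:
    """Brute-force check of idempotence, commutativity, associativity, and top."""
    idempotent = all(meet_op(x, x) == x for x in values)
    commutative = all(meet_op(x, y) == meet_op(y, x) for x in values for y in values)
    associative = all(
        meet_op(x, meet_op(y, z)) == meet_op(meet_op(x, y), z)
        for x in values for y in values for z in values
    )
    has_top = all(meet_op(top, x) == x for x in values)
    return idempotent and commutative and associative and has_top


ok = is_semilattice(V, meet, TOP)
```

With intersection as the meet, the full set acts as ⊤ and the empty set as ⊥.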

slide-47
SLIDE 47

Example: Semilattice

  • V = power set of some set (like previous example)
  • ∧ = union
  • Union is idempotent, commutative, and associative
  • What are the top and bottom elements?
slide-48
SLIDE 48

Partial Order for a Semilattice

  • Say x ≤ y iff x ∧ y = x
  • Also, x < y iff x ≤ y and x ≠ y
  • ≤ is really a partial order:

1. x ≤ y and y ≤ z imply x ≤ z (proof in the Dragon book)
2. x ≤ y and y ≤ x iff x = y.

Proof of 2:

  • x ∧ y = x and y ∧ x = y.
  • Thus, x = x ∧ y = y ∧ x = y
slide-49
SLIDE 49

Axioms for Transfer Functions

  • 1. The set of transfer functions F includes the identity function.
      • Why needed? Constructions often require introduction of an empty block.
  • 2. F is closed under composition.
      • Why needed?
      • The concatenation of two blocks is a block.
      • Transfer function for a block can be constructed from individual statements.

slide-50
SLIDE 50

Example: Reaching Definitions

  • Direction D = forward.
  • Domain V = set of all sets of definitions in the flow graph.
  • ∧ = union.
  • Functions F = all “gen-kill” functions of the form f(x) = (x - K) ∪ G, where K and G are sets of definitions (members of V).

slide-51
SLIDE 51

Example: Satisfies Axioms

  • Union on a power set forms a semilattice (idempotent, commutative, associative).
  • Identity function: let K = G = ∅.
  • Composition: A little algebra.
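
The "little algebra" for composition can be spelled out as follows. The subscripted names K_1, G_1, K_2, G_2 are introduced here for two arbitrary gen-kill functions; they do not appear in the slides.

```latex
\begin{align*}
f_1(x) &= (x - K_1) \cup G_1, \qquad f_2(x) = (x - K_2) \cup G_2 \\
f_2(f_1(x)) &= \bigl(((x - K_1) \cup G_1) - K_2\bigr) \cup G_2 \\
            &= \bigl((x - K_1 - K_2) \cup (G_1 - K_2)\bigr) \cup G_2 \\
            &= \bigl(x - (K_1 \cup K_2)\bigr) \cup \bigl((G_1 - K_2) \cup G_2\bigr)
\end{align*}
```

So the composition is again a gen-kill function, with K = K_1 ∪ K_2 and G = (G_1 - K_2) ∪ G_2.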
slide-52
SLIDE 52

Example: Partial Order

  • For RD’s, S ≤ T means S ∪ T = S.
  • Equivalently S ⊇ T.
  • Seems “backward,” but that’s what the definitions give you
  • Intuition: ≤ measures “ignorance.”
  • The more definitions we know about, the less ignorance we have.
  • ⊤ = “total ignorance.”
slide-53
SLIDE 53

DFA Frameworks

  • (D, V, ∧, F)
  • A flow graph, with an associated function f_B in F for each block B
  • A boundary value v_ENTRY or v_EXIT if D = forward or backward, respectively.

slide-54
SLIDE 54

Iterative Algorithm (Forward)

OUT(entry) = v_ENTRY;
for (other blocks B) OUT(B) = ⊤;
while (changes to any OUT)
    for (each block B other than entry) {
        IN(B) = ∧_{predecessors P of B} OUT(P);
        OUT(B) = f_B(IN(B));
    }
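
To show how one solver can serve every forward analysis, here is a minimal Python sketch parameterised by the meet operator, the top element, the boundary value, and the per-block transfer functions. The names `solve_forward` and `gen_kill_fn`, the explicit ENTRY node, and the example graph are assumptions for illustration.

```python
from functools import reduce
from typing import Callable, Dict, FrozenSet, List, Tuple

Value = FrozenSet[str]


def solve_forward(
    blocks: List[str],
    preds: Dict[str, List[str]],
    entry: str,
    v_entry: Value,
    top: Value,
    meet: Callable[[Value, Value], Value],
    transfer: Dict[str, Callable[[Value], Value]],
) -> Tuple[Dict[str, Value], Dict[str, Value]]:
    """Generic forward iterative algorithm over a semilattice (V, meet)."""
    OUT: Dict[str, Value] = {b: top for b in blocks}
    OUT[entry] = v_entry
    IN: Dict[str, Value] = {}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue  # the boundary value of the entry node is fixed
            # Meet over all predecessors; starting from top is safe since top ∧ x = x.
            IN[b] = reduce(meet, (OUT[p] for p in preds[b]), top)
            new_out = transfer[b](IN[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


def gen_kill_fn(gen: Value, kill: Value) -> Callable[[Value], Value]:
    """Build the transfer function f(x) = (x - K) ∪ G."""
    return lambda x: (x - kill) | gen


# Reaching definitions as an instance: meet = union, top = empty set.
# Assumed graph: ENTRY -> B1, B1 -> B2, B1 -> B3, B2 -> B3.
blocks = ["ENTRY", "B1", "B2", "B3"]
preds = {"ENTRY": [], "B1": ["ENTRY"], "B2": ["B1"], "B3": ["B1", "B2"]}
transfer = {
    "B1": gen_kill_fn(frozenset({"d1"}), frozenset({"d2"})),
    "B2": gen_kill_fn(frozenset({"d2"}), frozenset({"d1"})),
    "B3": gen_kill_fn(frozenset(), frozenset()),  # identity function
}

IN, OUT = solve_forward(
    blocks, preds, "ENTRY", frozenset(), frozenset(),
    lambda x, y: x | y, transfer,
)
```

Swapping in intersection as the meet and the set of all expressions as top would turn the same solver into available expressions.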

slide-55
SLIDE 55

Iterative Algorithm (Backward)

Almost the same thing – just make a few changes:

  • 1. Swap IN and OUT everywhere
  • 2. Replace ENTRY by EXIT

slide-56
SLIDE 56

GCC optimizations

  • Why does gcc generate 15-20% faster code if I optimize for size instead of speed?
  • http://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed

slide-57
SLIDE 57

Multiple Processors

  • By default compilers optimize for an "average" processor. Since different processors favor different instruction sequences, compiler optimizations enabled by -O2 might benefit the average processor, but decrease performance on your particular processor (and the same applies to -Os).
  • If you try the same example on different processors, you will find that some of them benefit from -O2 while others are more favorable to -Os optimizations