


SLIDE 1

Special topics on binary‐level program analysis: More on Static Analysis

Gang Tan

CSE 597 Spring 2019 Penn State University

SLIDE 2

ITERATION ALGORITHMS

SLIDE 3

Chaotic Iteration

  • Suppose there are n equations in total

– RDj = Fj (RD1, …, RDn), 1 ≤ j ≤ n

For all j, RDj := ∅;
while RDj ≠ Fj(RD1, …, RDn) for some j do
    RDj := Fj(RD1, …, RDn)

SLIDE 4

Example

  • [x:=1]1; (while [y>0]2 do [x:=x-1]3); [x:=2]4
  • Equations

– RDentry(1) = {(x,?), (y,?)}
– RDentry(2) = RDexit(1) ∪ RDexit(3)
– RDentry(3) = RDexit(2)
– RDentry(4) = RDexit(2)
– RDexit(1) = (RDentry(1) \ {(x,l)}) ∪ {(x,1)}
– RDexit(2) = RDentry(2)
– RDexit(3) = (RDentry(3) \ {(x,l)}) ∪ {(x,3)}
– RDexit(4) = (RDentry(4) \ {(x,l)}) ∪ {(x,4)}
– Here {(x,l)} stands for every definition of x, for any label l (including ?)
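The eight equations can be solved by chaotic iteration. The following is a minimal Python sketch, not the course's implementation: the names `EQUATIONS`, `kill_gen`, and `chaotic_iteration` are made up for illustration, and the loop visits the equations round-robin, which is just one of the many orders that chaotic iteration allows.

```python
def kill_gen(rd, var, label):
    """Remove every definition of `var`, then add the new definition (var, label)."""
    return {(v, l) for (v, l) in rd if v != var} | {(var, label)}

# One function per equation; "?" marks an uninitialized variable.
EQUATIONS = {
    "entry1": lambda rd: {("x", "?"), ("y", "?")},
    "entry2": lambda rd: rd["exit1"] | rd["exit3"],
    "entry3": lambda rd: set(rd["exit2"]),
    "entry4": lambda rd: set(rd["exit2"]),
    "exit1": lambda rd: kill_gen(rd["entry1"], "x", 1),
    "exit2": lambda rd: set(rd["entry2"]),
    "exit3": lambda rd: kill_gen(rd["entry3"], "x", 3),
    "exit4": lambda rd: kill_gen(rd["entry4"], "x", 4),
}

def chaotic_iteration(equations):
    """Start from empty sets and re-apply equations until none of them changes."""
    rd = {name: set() for name in equations}
    changed = True
    while changed:
        changed = False
        for name, f in equations.items():
            new_value = f(rd)
            if new_value != rd[name]:
                rd[name] = new_value
                changed = True
    return rd

solution = chaotic_iteration(EQUATIONS)
for name in sorted(solution):
    print(name, sorted(solution[name], key=str))
```

Because every equation is monotone and the sets only grow, the loop stops at the least solution regardless of the visiting order.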

SLIDE 5

Work‐list Algorithm for Reaching Definitions

  • dep(j) = {k | RDk depends on RDj}

– That is, if RDj changes, then RDk may change too: dep(j) collects the things that depend on RDj

W := {1, 2, …, n};
For all j, RDj := ∅;
while W ≠ ∅ do {
    Remove a number j from W;
    if RDj ≠ Fj(RD1, …, RDn) {
        RDj := Fj(RD1, …, RDn);
        W := W ∪ dep(j)
    }
}

SLIDE 6

Example

  • [x:=1]1; (while [y>0]2 do [x:=x-1]3); [x:=2]4
  • Dependencies (jn stands for RDentry(j), jx for RDexit(j))

– dep(1n) = {1x}; dep(2n) = {2x}; dep(3n) = {3x}; dep(4n) = {4x}
– dep(1x) = {2n}; dep(2x) = {3n, 4n}; dep(3x) = {2n}; dep(4x) = { }
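With these dependencies, the work-list algorithm can be run on the example. The sketch below is illustrative only: `EQUATIONS`, `DEP`, `kill_gen`, and `worklist` are hypothetical names, and a FIFO queue is an assumed choice for "remove a number j from W" (any removal order is allowed).

```python
from collections import deque

def kill_gen(rd, var, label):
    """Remove every definition of `var`, then add the new definition (var, label)."""
    return {(v, l) for (v, l) in rd if v != var} | {(var, label)}

# Keys follow the slide's shorthand: "2n" is RDentry(2), "2x" is RDexit(2).
EQUATIONS = {
    "1n": lambda rd: {("x", "?"), ("y", "?")},
    "2n": lambda rd: rd["1x"] | rd["3x"],
    "3n": lambda rd: set(rd["2x"]),
    "4n": lambda rd: set(rd["2x"]),
    "1x": lambda rd: kill_gen(rd["1n"], "x", 1),
    "2x": lambda rd: set(rd["2n"]),
    "3x": lambda rd: kill_gen(rd["3n"], "x", 3),
    "4x": lambda rd: kill_gen(rd["4n"], "x", 4),
}

DEP = {
    "1n": ["1x"], "2n": ["2x"], "3n": ["3x"], "4n": ["4x"],
    "1x": ["2n"], "2x": ["3n", "4n"], "3x": ["2n"], "4x": [],
}

def worklist(equations, dep):
    """Recompute an equation only when something it depends on has changed."""
    rd = {name: set() for name in equations}
    work = deque(equations)          # W := {1, ..., n}
    while work:                      # while W is not empty
        j = work.popleft()
        new_value = equations[j](rd)
        if new_value != rd[j]:
            rd[j] = new_value
            for k in dep[j]:         # W := W ∪ dep(j)
                if k not in work:
                    work.append(k)
    return rd

solution = worklist(EQUATIONS, DEP)
for name in sorted(solution):
    print(name, sorted(solution[name], key=str))
```

Compared with re-checking every equation in each round, the work list only revisits equations whose inputs changed.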

SLIDE 7

Example

  • [x:=1]1; (while [y>0]2 do [x:=x-1]3); [x:=2]4
  • Solution

– RDentry(1) = {(x,?), (y,?)}
– RDentry(2) = {(x,1), (x,3), (y,?)}
– RDentry(3) = {(x,1), (x,3), (y,?)}
– RDentry(4) = {(x,1), (x,3), (y,?)}
– RDexit(1) = {(x,1), (y,?)}
– RDexit(2) = {(x,1), (x,3), (y,?)}
– RDexit(3) = {(x,3), (y,?)}
– RDexit(4) = {(x,4), (y,?)}

SLIDE 8

COMPLETE LATTICE

SLIDE 9

Foundation of Static Analysis: Fixed Point Theory of Complete Lattice

  • A partial order is a mathematical structure: L = (S, ⊑)

– S is a set; ⊑ is a binary relation on S
– Reflexive: ∀x ∈ S. x ⊑ x
– Transitive: ∀x,y,z ∈ S. x ⊑ y ∧ y ⊑ z → x ⊑ z
– Anti‐symmetric: ∀x,y ∈ S. x ⊑ y ∧ y ⊑ x → x = y
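For a finite set, the three laws can be checked by brute force. The sketch below is only an illustration; `is_partial_order` and the sample relations are made-up names, and the relation is passed in as a Python function `leq(x, y)`.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check reflexivity, transitivity, and anti-symmetry on a finite set."""
    elements = list(elements)
    reflexive = all(leq(x, x) for x in elements)
    transitive = all(
        leq(x, z)
        for x, y, z in product(elements, repeat=3)
        if leq(x, y) and leq(y, z)
    )
    antisymmetric = all(
        x == y
        for x, y in product(elements, repeat=2)
        if leq(x, y) and leq(y, x)
    )
    return reflexive and transitive and antisymmetric

numbers = range(6)
subsets = [frozenset(s) for s in ([], ["a"], ["b"], ["a", "b"])]

print(is_partial_order(numbers, lambda x, y: x <= y))          # usual order on numbers
print(is_partial_order(subsets, lambda x, y: x <= y))          # subset inclusion
print(is_partial_order(numbers, lambda x, y: x < y))           # fails: not reflexive
print(is_partial_order(numbers, lambda x, y: x % 2 == y % 2))  # fails: not anti-symmetric
```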

SLIDE 10

Partial Order

  • Examples

– (N, ≤)
– (N, ≥)
– (P(A), ⊆)
– (P(A), ⊇)

  • Partial order diagrams

SLIDE 11

Upper bound and lower bound

  • y is an upper bound for X if ∀x ∈ X: x ⊑ y
  • y is a lower bound for X if ∀x ∈ X: y ⊑ x
  • ⊔X is the least upper bound of X

– ⊔ is called the join operator

  • ⊓X is the greatest lower bound of X

– ⊓ is called the meet operator

  • L = (S, ⊑) is a complete lattice if

– It is a partial order, and
– ⊔X and ⊓X exist for every X ⊆ S

  • ⊤ stands for the greatest element
  • ⊥ stands for the least element
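On a finite partial order, joins and meets can be computed directly from these definitions. The following sketch uses hypothetical helper names (`powerset`, `least_upper_bound`, `greatest_lower_bound`) and takes the powerset of {1, 2, 3} ordered by inclusion as the example lattice; it returns `None` when the bound does not exist.

```python
from itertools import combinations

def powerset(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def least_upper_bound(X, S, leq):
    """The upper bound of X that is below every other upper bound, or None."""
    uppers = [y for y in S if all(leq(x, y) for x in X)]
    least = [y for y in uppers if all(leq(y, u) for u in uppers)]
    return least[0] if least else None

def greatest_lower_bound(X, S, leq):
    """The lower bound of X that is above every other lower bound, or None."""
    lowers = [y for y in S if all(leq(y, x) for x in X)]
    greatest = [y for y in lowers if all(leq(l, y) for l in lowers)]
    return greatest[0] if greatest else None

S = powerset({1, 2, 3})
subset = lambda a, b: a <= b
X = [frozenset({1}), frozenset({2})]

print(least_upper_bound(X, S, subset))     # join of {1} and {2}: their union
print(greatest_lower_bound(X, S, subset))  # meet of {1} and {2}: their intersection
print(least_upper_bound([], S, subset))    # join of the empty set: the least element
print(greatest_lower_bound([], S, subset)) # meet of the empty set: the greatest element
```

In the powerset lattice, join is set union and meet is set intersection. A two-element set ordered only by equality is a partial order but not a lattice, since two distinct elements have no upper bound there.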

SLIDE 12

INTERPROCEDURAL ANALYSIS

SLIDE 13

Interprocedural CFGs

void main() {
  x := 7;
  r := p(x);
  x := r;
  z := p(x + 10);
}

int p(int a) {
  y := a+2;
  return y;
}

[Figure: interprocedural CFG. Nodes of main: x:=7, call p(x), r := ret p(x), x:=r, call p(x+10), z := ret p(x+10). Nodes of p: y:=a+2, ret y.]

SLIDE 14

One Idea for Interprocedural Analysis

  • Ignore the differences between inter‐procedural and intra‐procedural edges

– Conflate them into one kind of edge
– This gives a context‐insensitive interprocedural analysis

  • Introduces a lot of imprecision

– Because of many invalid paths

SLIDE 15

Conflating Intra and Inter Edges

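void main() {
  x := 7;
  r := p(x);
  x := r;
  z := p(x + 10);
}

int p(int a) {
  y := a+2;
  return y;
}

[Figure: the same interprocedural CFG, annotated with constant propagation results (T means "not a constant"): {x:7} after x:=7; {r:T} after r := ret p(x); {x:T} after x:=r; {z:T} after z := ret p(x+10); {a:T} at the entry of p; {y:T} after y:=a+2.]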
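The loss of precision can be reproduced with a small constant propagation sketch. This is a hand-written model of the example program only, not a general analyzer: `TOP`, `BOT`, `join`, `plus`, and `analyze_context_insensitive` are illustrative names, and the single variable `a` stands for p's parameter merged over both call sites.

```python
TOP, BOT = "T", "bot"  # TOP: not a constant; BOT: no information yet

def join(a, b):
    """Least upper bound in the flat constant lattice."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def plus(value, constant):
    """Abstract addition of a known constant; TOP and BOT are propagated."""
    return value + constant if isinstance(value, int) else value

def analyze_context_insensitive():
    a = BOT  # one abstract value for parameter a, shared by all call sites
    while True:
        y = plus(a, 2)       # y := a+2; return y
        x_first = 7          # x := 7
        r = y                # r := p(x): every call sees the same return value
        x_second = r         # x := r
        z = y                # z := p(x + 10)
        new_a = join(join(a, x_first), plus(x_second, 10))
        if new_a == a:       # fixed point reached
            return {"a": a, "y": y, "r": r, "x": x_second, "z": z}
        a = new_a

print(analyze_context_insensitive())
```

The arguments 7 and 19 are joined into T at the entry of p, so every result that flows back from p is T as well.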

SLIDE 16

Invalid Paths

  • Information about all call sites is merged

– Loss of precision
– Put another way, the analysis considers “the worst case”, in which calls and returns do not match

  • That is, when returns go back to non‐matching call sites

  • One easy fix: inlining function calls

– Essentially, use a new copy of the function whenever it’s called
– So that different calls don’t mix information together

SLIDE 17

Inlining for the Example

void main() {
  x := 7;
  r := p1(x);
  x := r;
  z := p2(x + 10);
}

int p1(int a) {
  y := a+2;
  return y;
}

int p2(int a) {
  y := a+2;
  return y;
}

Analysis results: {a:7}, {y:9} in p1; {a:19}, {y:21} in p2

SLIDE 18

Problem with Inlining?

  • Code/CFG blow‐up

– Can be exponential in the worst case

void p1() { p2(); p2(); }
void p2() { p3(); p3(); }
void p3() { p4(); p4(); }

  • Cannot deal with recursion

void p1() { … p1() … }
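Both problems show up when counting how many copies of each function body full inlining would create. The sketch below is illustrative; `inlined_copies` and the call-graph encoding (caller mapped to the list of its callees, one entry per call) are assumptions made for this example.

```python
def inlined_copies(calls, root):
    """Count the function-body copies created by fully inlining from `root`."""
    counts = {}

    def visit(func, active):
        if func in active:
            # Inlining a recursive call would never terminate.
            raise ValueError(f"recursive call to {func} cannot be inlined")
        counts[func] = counts.get(func, 0) + 1
        for callee in calls.get(func, []):
            visit(callee, active | {func})

    visit(root, frozenset())
    return counts

chain = {"p1": ["p2", "p2"], "p2": ["p3", "p3"], "p3": ["p4", "p4"]}
print(inlined_copies(chain, "p1"))

try:
    inlined_copies({"p1": ["p1"]}, "p1")
except ValueError as error:
    print(error)
```

Each level of the call chain doubles the number of copies, so a chain of n such functions needs 2^(n-1) copies of the last one.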

SLIDE 19

Context Sensitivity

  • Group calls into a finite number of contexts

– Label information using contexts so that information related to different contexts does not mix
– For a context, analyze the callee function w.r.t. that context

  • Common contexts

– Call‐site stack of a finite size k

  • also called the call‐string context

– Let k=1; then interprocedural constant propagation computes information like this:

  • (1, {x:2, y:T}), (2, {x:T, y:3})

SLIDE 20

Size‐one Call‐String Contexts

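void main() {
  x := 7;
  r := p(x);
  x := r;
  z := p(x + 10);
}

int p(int a) {
  y := a+2;
  return y;
}

[Figure: the interprocedural CFG with the call edge of p(x) labeled 1 and the call edge of p(x+10) labeled 2. Annotations have the form (context, facts): (‐,{x:7}) after x:=7; (‐,{r:9}) after r := ret p(x); (‐,{x:9}) after x:=r; (‐,{z:21}) after z := ret p(x+10); (1,{a:7}), (2,{a:19}) at the entry of p; (1,{y:9}), (2,{y:21}) after y:=a+2.]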
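A size-one call-string analysis of this program can be sketched by keeping one abstract value per call site instead of a single merged one. As before, this is a hand-written model of the example rather than a general analyzer; `analyze_with_call_strings`, `join`, `plus`, `TOP`, and `BOT` are illustrative names.

```python
TOP, BOT = "T", "bot"  # TOP: not a constant; BOT: no information yet

def join(a, b):
    """Least upper bound in the flat constant lattice."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def plus(value, constant):
    """Abstract addition of a known constant; TOP and BOT are propagated."""
    return value + constant if isinstance(value, int) else value

def analyze_with_call_strings():
    a = {1: BOT, 2: BOT}  # parameter a of p, kept separately per call site
    while True:
        y = {site: plus(a[site], 2) for site in a}  # y := a+2, per context
        x_first = 7        # x := 7
        r = y[1]           # call site 1 returns the result of context 1
        x_second = r       # x := r
        z = y[2]           # call site 2 returns the result of context 2
        new_a = {
            1: join(a[1], x_first),
            2: join(a[2], plus(x_second, 10)),
        }
        if new_a == a:     # fixed point reached
            return {"a": a, "y": y, "r": r, "x": x_second, "z": z}
        a = new_a

print(analyze_with_call_strings())
```

Because the two calls no longer share facts, the analysis keeps a:7 and a:19 apart and recovers the constants 9 and 21 for r and z.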

SLIDE 21

Call‐String Contexts of Various Sizes

void main() {
  1: fib(7);
}

int fib(int n) {
  if n <= 1
    x := 0
  else {
    2: y := fib(n-1);
    3: z := fib(n-2);
    x := y+z;
  }
  return x;
}

Size‐one call strings: ‐; 1; 2; 3
Size‐two call strings: ‐; 1::‐; 2::1; 3::1; 2::2; 3::2; 2::3; 3::3
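The two lists can be reproduced by exploring the call graph and truncating each call string to its k most recent call sites. In this sketch, `reachable_call_strings` and `FIB_CALL_SITES` are made-up names, call strings are tuples with the most recent site first, and the empty tuple plays the role of "‐" (so the slide's 1::‐ appears as `(1,)`).

```python
from collections import deque

def reachable_call_strings(call_sites, entry, k):
    """Enumerate the call strings of length at most k reachable from `entry`.

    call_sites maps a caller to a list of (site_label, callee) pairs.
    """
    start = (entry, ())
    seen = {start}
    work = deque([start])
    while work:
        func, context = work.popleft()
        for site, callee in call_sites.get(func, []):
            new_context = ((site,) + context)[:k]  # push the site, keep the newest k
            state = (callee, new_context)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return {context for _, context in seen}

FIB_CALL_SITES = {
    "main": [(1, "fib")],
    "fib": [(2, "fib"), (3, "fib")],
}

print(sorted(reachable_call_strings(FIB_CALL_SITES, "main", 1)))
print(sorted(reachable_call_strings(FIB_CALL_SITES, "main", 2)))
```

Truncation is what keeps the set of contexts finite even though fib is recursive.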

SLIDE 22

Other Kinds of Contexts

  • Assumption sets

– What are the states at the call site?
– Example paper: “ESP: path‐sensitive program verification in polynomial time”

  • Caller stack

– The stack of caller functions
– Less precise than call‐site stack (2::3 versus fib::fib)

  • OO programs

– Object sensitivity

SLIDE 23

MISC.

SLIDE 24

Flow Sensitivity

  • Dataflow analysis is flow sensitive

– It takes into account the order of statements
– E.g., “x:=1; y:=x” gets a different liveness analysis result from “y:=x; x:=1”

  • A flow‐insensitive analysis

– Does not consider the order of statements
– E.g., a simple analysis that collects all the constants used in the program is flow‐insensitive

  • “x:=1; y:=x” would produce {1}, and so would “y:=x; x:=1”
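The contrast can be checked on the two statement orders. The sketch below assumes straight-line code only and an empty set of live variables at the end of the program; the statement encoding `(defined_var, used_vars, constants)` and the function names are invented for this example.

```python
def live_in_sets(stmts, live_out=frozenset()):
    """Backward liveness for straight-line code: live variables before each statement."""
    live = set(live_out)
    result = []
    for target, uses, _ in reversed(stmts):
        live = (live - {target}) | set(uses)  # kill the definition, add the uses
        result.append(set(live))
    return list(reversed(result))

def constants_used(stmts):
    """Flow-insensitive: collect every constant, ignoring statement order."""
    return {c for _, _, constants in stmts for c in constants}

x_assign = ("x", [], [1])     # x := 1
y_assign = ("y", ["x"], [])   # y := x

program_a = [x_assign, y_assign]  # x:=1; y:=x
program_b = [y_assign, x_assign]  # y:=x; x:=1

print(live_in_sets(program_a), live_in_sets(program_b))
print(constants_used(program_a), constants_used(program_b))
```

In the first order x is not live at the start of the program, while in the second it is; the collected constants are {1} in both cases.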

SLIDE 25

Path Sensitivity

  • Dataflow analysis is path insensitive

[Figure: control‐flow graph with AV annotations at three program points]

SLIDE 26

Path Sensitive Analysis

  • Example

if (x>1) {t1 = x+y} else {t2 = x-y};
if (x>1) {u = (x+y) - z}

Question for the body of the second if: is “x+y” available here?

  • By conventional available expression analysis, “x+y” is not available

  • Path sensitive analysis

– Associating information with edges
– At the end of the first “if (x>1) …”

  • {(x>1, x+y), (x<=1, x-y)}

– Then “x+y” is available inside the branch of the second if, where x>1 holds
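The difference between the two merges can be sketched as follows. This is a simplification under stated assumptions: guards are compared as plain strings, x and y are not modified between the two if statements, and the function names are hypothetical.

```python
def merge_insensitive(then_avail, else_avail):
    """Conventional available expressions: keep what holds on both branches."""
    return then_avail & else_avail

def merge_sensitive(cond, neg_cond, then_avail, else_avail):
    """Path-sensitive variant: remember under which guard each expression is available."""
    return {(cond, e) for e in then_avail} | {(neg_cond, e) for e in else_avail}

def available_under(facts, cond):
    """Expressions available on paths where `cond` is known to hold."""
    return {e for guard, e in facts if guard == cond}

then_avail = {"x+y"}   # after t1 = x+y
else_avail = {"x-y"}   # after t2 = x-y

insensitive = merge_insensitive(then_avail, else_avail)
sensitive = merge_sensitive("x>1", "x<=1", then_avail, else_avail)

print(insensitive)                        # nothing survives the merge
print(available_under(sensitive, "x>1"))  # facts usable inside the second if
```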

SLIDE 27

Analyzing the Heap

  • The heap poses a major challenge for static analysis

– Many static analyses disregard the heap completely
– This is a source of false positives and false negatives

SLIDE 28

Pointer Analysis, Points‐to Analysis, Alias Analysis

  • Example:

int x = 3, y = 4;
int *p = &x;
int t = x + y;
*p = 5;
if (x+y > 10) {…}

Is “x+y” available at the final if? No! x was modified through its alias *p
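A small available-expressions sketch shows why points-to information is needed here. It models only the three relevant statements; `analyze`, `kill`, `variables_of`, and the tuple encoding of expressions are assumptions made for this illustration, and the points-to map is given as input rather than computed.

```python
def variables_of(expr):
    """Variables read by a binary expression encoded as (op, left, right)."""
    _, left, right = expr
    return {left, right}

def kill(avail, modified):
    """Drop every available expression that reads a modified variable."""
    return {e for e in avail if not (variables_of(e) & modified)}

def analyze(points_to):
    """Available expressions just before `if (x+y > 10)`."""
    avail = set()
    # int t = x + y;  defines t and makes x+y available
    avail = kill(avail, {"t"}) | {("+", "x", "y")}
    # *p = 5;  may modify every variable that p can point to
    avail = kill(avail, points_to.get("p", set()))
    return avail

print(analyze({"p": {"x"}}))  # with points-to facts: x+y has been killed
print(analyze({}))            # ignoring aliases: x+y wrongly stays available
```

Without the fact that p points to x, the analysis would reuse the old value of x+y, which is unsound.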

SLIDE 29

Shape Analysis

  • Dataflow analysis

– Good at analyzing atomic values: labels, constants, variable names
– Cannot easily extend to data structures in the heap: arrays, trees, lists, …

  • Shape analysis can analyze the shapes of data structures

– A very active research area
