
Special topics on binary-level program analysis: More on Static Analysis



  1. Special topics on binary-level program analysis: More on Static Analysis. Gang Tan, CSE 597, Spring 2019, Penn State University

  2. ITERATION ALGORITHMS

  3. Chaotic Iteration
     • Suppose there are n equations in total
       – RD_j = F_j(RD_1, …, RD_n), 1 ≤ j ≤ n

         For all j, RD_j := ∅
         while RD_j ≠ F_j(RD_1, …, RD_n) for some j do
           RD_j := F_j(RD_1, …, RD_n)
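The loop above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the function name `chaotic_iteration` and the two-equation system are hypothetical, indices are 0-based, and termination relies on the assumption that every F_j is monotone over a finite set of facts.

```python
def chaotic_iteration(equations):
    """Re-evaluate any unsatisfied equation RD_j = F_j(RD) until all hold."""
    rd = [frozenset() for _ in equations]          # For all j, RD_j := empty set
    while True:
        # Collect every j whose equation does not hold yet.
        stale = [j for j, f in enumerate(equations) if f(rd) != rd[j]]
        if not stale:
            return rd                              # all equations are satisfied
        j = stale[0]                               # any unsatisfied j may be chosen
        rd[j] = equations[j](rd)

# Hypothetical system: RD_0 = {a} ∪ RD_1 and RD_1 = {b} ∪ RD_0.
equations = [
    lambda rd: frozenset({"a"}) | rd[1],
    lambda rd: frozenset({"b"}) | rd[0],
]
solution = chaotic_iteration(equations)
```

Picking `stale[0]` is only one possible strategy; the point of the scheme is that the order of updates is left open.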

  4. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Equations (here {(x,l)} abbreviates the definitions of x at every label l, including ?)
       – RD_entry(1) = {(x,?), (y,?)}
       – RD_entry(2) = RD_exit(1) ∪ RD_exit(3)
       – RD_entry(3) = RD_exit(2)
       – RD_entry(4) = RD_exit(2)
       – RD_exit(1) = (RD_entry(1) \ {(x,l)}) ∪ {(x,1)}
       – RD_exit(2) = RD_entry(2)
       – RD_exit(3) = (RD_entry(3) \ {(x,l)}) ∪ {(x,3)}
       – RD_exit(4) = (RD_entry(4) \ {(x,l)}) ∪ {(x,4)}

  5. Work-list Algorithm for Reaching Definitions
     • dep(j) = {k | RD_k depends on RD_j}
       – That is, if RD_j changes, then RD_k may change too; dep(j) collects the things that depend on RD_j

         W ← {1, 2, …, n};
         For all j, RD_j := ∅;
         while W ≠ ∅ do {
           Remove a number j from W;
           If RD_j ≠ F_j(RD_1, …, RD_n) {
             RD_j := F_j(RD_1, …, RD_n);
             W := W ∪ dep(j)
           }
         }
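A minimal Python sketch of the work-list scheme follows. The name `worklist_solve`, the 0-based indexing, and the three-equation system are assumptions made for illustration; `dep` must be supplied by the caller and has to list every equation that reads RD_j.

```python
def worklist_solve(equations, dep):
    """Solve RD_j = F_j(RD) by revisiting only equations whose inputs changed."""
    rd = [frozenset() for _ in equations]     # For all j, RD_j := empty set
    work = set(range(len(equations)))         # W starts with every index
    while work:
        j = work.pop()                        # remove some j from W
        new = equations[j](rd)
        if new != rd[j]:
            rd[j] = new
            work |= dep[j]                    # re-examine everything that reads RD_j
    return rd

# Hypothetical system with a cycle between RD_1 and RD_2.
equations = [
    lambda rd: frozenset({"d0"}),             # RD_0 = {d0}
    lambda rd: rd[0] | rd[2],                 # RD_1 = RD_0 ∪ RD_2
    lambda rd: rd[1] | frozenset({"d2"}),     # RD_2 = RD_1 ∪ {d2}
]
dep = [{1}, {2}, {1}]                         # dep[j]: equations that depend on RD_j
solution = worklist_solve(equations, dep)
```

Compared with re-checking every equation on each round, the work-list only re-evaluates an equation when one of its inputs has actually changed.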

  6. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Dependencies (ln stands for RD_entry(l), lx for RD_exit(l))
       – dep(1n) = {1x}; dep(2n) = {2x}; dep(3n) = {3x}; dep(4n) = {4x};
       – dep(1x) = {2n}; dep(2x) = {3n, 4n}; dep(3x) = {2n}; dep(4x) = { };

  7. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Solution
       – RD_entry(1) = {(x,?), (y,?)}
       – RD_entry(2) = {(x,1), (x,3), (y,?)}
       – RD_entry(3) = {(x,1), (x,3), (y,?)}
       – RD_entry(4) = {(x,1), (x,3), (y,?)}
       – RD_exit(1) = {(x,1), (y,?)}
       – RD_exit(2) = {(x,1), (x,3), (y,?)}
       – RD_exit(3) = {(x,3), (y,?)}
       – RD_exit(4) = {(x,4), (y,?)}
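The solution above can be reproduced with a short Python sketch that encodes the eight reaching-definitions equations of the example and iterates them round-robin until nothing changes. The dictionary keys, the helper `assign_x`, and the round-robin order are illustrative choices, not taken from the slides.

```python
X_UNINIT, Y_UNINIT = ("x", "?"), ("y", "?")

def assign_x(entry, label):
    """Transfer for an assignment to x at `label`: kill all defs of x, add (x, label)."""
    return frozenset(d for d in entry if d[0] != "x") | {("x", label)}

# Keys ("entry", l) and ("exit", l) name the entry and exit sets of label l.
equations = {
    ("entry", 1): lambda rd: frozenset({X_UNINIT, Y_UNINIT}),
    ("entry", 2): lambda rd: rd[("exit", 1)] | rd[("exit", 3)],
    ("entry", 3): lambda rd: rd[("exit", 2)],
    ("entry", 4): lambda rd: rd[("exit", 2)],
    ("exit", 1): lambda rd: assign_x(rd[("entry", 1)], 1),
    ("exit", 2): lambda rd: rd[("entry", 2)],       # the test y>0 defines nothing
    ("exit", 3): lambda rd: assign_x(rd[("entry", 3)], 3),
    ("exit", 4): lambda rd: assign_x(rd[("entry", 4)], 4),
}

rd = {key: frozenset() for key in equations}        # start every set at empty
changed = True
while changed:                                      # sweep until a fixed point
    changed = False
    for key, f in equations.items():
        new = f(rd)
        if new != rd[key]:
            rd[key], changed = new, True
```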

  8. COMPLETE LATTICE

  9. Foundation of Static Analysis: Fixed Point Theory of Complete Lattice
     • A partial order is a mathematical structure: L = (S, ⊑)
       – S is a set; ⊑ is a binary relation on S
       – Reflexive: ∀ x ∈ S. x ⊑ x
       – Transitive: ∀ x,y,z ∈ S. x ⊑ y ∧ y ⊑ z → x ⊑ z
       – Anti-symmetric: ∀ x,y ∈ S. x ⊑ y ∧ y ⊑ x → x = y

  10. Partial Order
      • Examples
        – (N, ≤)
        – (N, ≥)
        – (P(A), ⊆)
        – (P(A), ⊇)
      • Partial order diagrams

  11. Upper bound and lower bound
      • y is an upper bound for X, if ∀ x ∈ X: x ⊑ y
      • ⊔X is the least upper bound of X
        – Called the join operator
      • ⊓X is the greatest lower bound of X
        – Called the meet operator
      • L = (S, ⊑) is a complete lattice if
        – It is a partial order, and
        – ⊔X and ⊓X exist for every X ⊆ S
      • ⊤ stands for the greatest element
      • ⊥ stands for the least element
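As a concrete instance, the powerset ordered by inclusion, (P(A), ⊆), is a complete lattice: join is union and meet is intersection. The sketch below checks the three partial-order laws by brute force for an assumed three-element set A; the helper names `leq`, `join`, and `meet` are illustrative.

```python
from itertools import combinations

A = {1, 2, 3}
# Carrier set S = P(A): every subset of A.
S = [frozenset(c) for r in range(len(A) + 1) for c in combinations(sorted(A), r)]

def leq(x, y):
    return x <= y                          # the order is subset inclusion

def join(xs):
    return frozenset().union(*xs)          # least upper bound: union

def meet(xs):
    result = frozenset(A)                  # the meet of no sets is the top element
    for x in xs:
        result &= x                        # greatest lower bound: intersection
    return result

reflexive = all(leq(x, x) for x in S)
transitive = all(leq(x, z) for x in S for y in S for z in S
                 if leq(x, y) and leq(y, z))
antisymmetric = all(x == y for x in S for y in S if leq(x, y) and leq(y, x))

top, bottom = join(S), meet(S)             # greatest and least elements
```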

  12. INTERPROCEDURAL ANALYSIS

  13. Interprocedural CFGs

        void main() {
          x := 7;
          r := p(x);
          x := r;
          z := p(x + 10);
        }

        int p(int a) {
          y := a+2;
          return y;
        }

      [Figure: interprocedural CFG. Nodes of main: x:=7; call p(x); r := ret p(x); x:=r; call p(x+10); z := ret p(x+10). Nodes of p: y:=a+2; ret y.]

  14. One Idea for Interprocedural Analysis
      • Ignore the differences between inter- and intra-procedural edges
        – Conflate them into one kind of edge
        – Context-insensitive interprocedural analysis
      • Introduces a lot of imprecision
        – Because of many invalid paths

  15. Conflating Intra and Inter Edges

        void main() {
          x := 7;
          r := p(x);
          x := r;
          z := p(x + 10);
        }

        int p(int a) {
          y := a+2;
          return y;
        }

      [Figure: the same CFG annotated with constant-propagation facts. After x:=7: {x:7}. Entry of p: {a:T}. After ret y: {y:T}. After r := ret p(x): {r:T}. After x:=r: {x:T}. After z := ret p(x+10): {z:T}.]

  16. Invalid Paths
      • Information about all call sites is merged
        – Loss of precision
        – Put another way, it considers "the worst case" in which calls and returns do not match
          • When returns return to nonmatching call sites
      • One Easy Fix: Inlining function calls
        – Essentially use a new copy of the function whenever it's called
        – So that different calls don't mix information together
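The loss of precision can be reproduced with a small constant-propagation sketch for the running example (main calls p(x) with x = 7, then p(x + 10)). Everything here is an illustrative assumption: `None` encodes "no information yet", the string "T" encodes "not a constant", and the fact names are made up. Both call sites feed the single entry of p, and the single return of p feeds both return sites.

```python
TOP = "T"                                   # "not a constant"

def join(a, b):
    """Join two abstract values; None means no information yet."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else TOP

def add(v, c):
    """Abstract addition of a constant c."""
    if v is None or v == TOP:
        return v
    return v + c

def analyze_context_insensitive():
    facts = {"a": None, "y": None, "r": None, "x_after": None, "z": None}
    while True:
        old = dict(facts)
        arg_site1 = 7                                  # p(x) with x = 7
        arg_site2 = add(facts["x_after"], 10)          # p(x + 10)
        facts["a"] = join(arg_site1, arg_site2)        # both calls merged at entry of p
        facts["y"] = add(facts["a"], 2)                # y := a + 2
        facts["r"] = facts["y"]                        # return edge to the first site
        facts["z"] = facts["y"]                        # return edge to the second site
        facts["x_after"] = facts["r"]                  # x := r
        if facts == old:
            return facts

facts = analyze_context_insensitive()
```

On the second sweep the argument 19 from the second call reaches the entry of p, the join of 7 and 19 is T, and that T then flows back to both return sites.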

  17. Inlining for the Example

        void main() {
          x := 7;
          r := p1(x);
          x := r;
          z := p2(x + 10);
        }

        int p1(int a) {     {a:7}
          y := a+2;         {y:9}
          return y;
        }

        int p2(int a) {     {a:19}
          y := a+2;         {y:21}
          return y;
        }

  18. Problem with Inlining?
      • Code/CFG blow-up
        – Can be exponential in the worst case

            void p1() { p2(); p2(); }
            void p2() { p3(); p3(); }
            void p3() { p4(); p4(); }

      • Cannot deal with recursion

            void p1() { … p1() … }
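The blow-up can be made concrete by counting function bodies after full inlining. The sketch below assumes that p4, whose body is not shown, makes no further calls; the `calls` table and the function `inlined_size` are illustrative names.

```python
# Call graph of the example: each function lists its call sites in order.
# Assumption: p4 is a leaf.
calls = {
    "p1": ["p2", "p2"],
    "p2": ["p3", "p3"],
    "p3": ["p4", "p4"],
    "p4": [],
}

def inlined_size(func):
    """Number of function bodies present once every call in func is inlined."""
    return 1 + sum(inlined_size(callee) for callee in calls[func])

sizes = {f: inlined_size(f) for f in calls}
```

With two calls per level the count roughly doubles at each level of the chain, and the recursion in `inlined_size` would not terminate on a recursive call graph, which mirrors the second problem listed above.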

  19. Context Sensitivity
      • Group calls into a finite number of contexts
        – Label information using contexts so that information related to different contexts does not mix
        – For a context, analyze the callee function w.r.t. that context
      • Common contexts
        – Call-site stack of a finite size k
          • also called the call-string context
        – Let k=1, then interprocedural constant propagation computes information like this:
          • (1, {x:2, y:T}), (2, {x:T, y:3})

  20. Size-one Call-String Contexts

        void main() {
          x := 7;
          1: r := p(x);
          x := r;
          2: z := p(x + 10);
        }

        int p(int a) {
          y := a+2;
          return y;
        }

      [Figure: CFG annotated with (context, facts) pairs. After x:=7: (-, {x:7}). Entry of p: (1, {a:7}), (2, {a:19}). After ret y: (1, {y:9}), (2, {y:21}). After r := ret p(x): (-, {r:9}). After x:=r: (-, {x:9}). After z := ret p(x+10): (-, {z:21}).]
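A minimal sketch of the k=1 idea for this example: facts about p are stored per call-site label, so the two calls never meet. Because every value involved is a known constant, the abstract transfer for y := a+2 coincides with ordinary arithmetic here; that shortcut, and the names `analyze_p` and `p_facts`, are assumptions of the sketch.

```python
def analyze_p(a):
    """Facts for the body of p under one context: y := a + 2; return y."""
    return {"a": a, "y": a + 2}

p_facts = {}                       # call-site label -> facts inside p

x = 7                              # x := 7
p_facts[1] = analyze_p(x)          # call site 1: p(x)
r = p_facts[1]["y"]                # the return goes back to site 1 only
x = r                              # x := r
p_facts[2] = analyze_p(x + 10)     # call site 2: p(x + 10)
z = p_facts[2]["y"]                # the return goes back to site 2 only
```

Keeping the two entries of `p_facts` apart is what preserves the constants that a context-insensitive analysis would merge into T.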

  21. Call-String Contexts of Various Sizes

        void main() {
          1: fib(7);
        }

        int fib(int n) {
          if n <= 1
            x := 0
          else {
            2: y := fib(n-1);
            3: z := fib(n-2);
            x := y+z;
          }
          return x;
        }

      • Size-one call strings: -; 1; 2; 3
      • Size-two call strings: -; 1::-; 2::1; 3::1; 2::2; 3::2; 2::3; 3::3
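The sets of call strings listed above can be enumerated mechanically. The sketch below walks the call graph of the example and keeps only the k most recent call sites, newest first. The table `CALL_SITES`, the tuple encoding (the empty tuple plays the role of "-", and `(1,)` corresponds to "1::-"), and the function name are illustrative assumptions.

```python
# Call sites per function: (label, callee). Labels follow the example.
CALL_SITES = {
    "main": [(1, "fib")],
    "fib": [(2, "fib"), (3, "fib")],
}

def call_strings(k):
    """All k-limited call strings reachable from main, newest call site first."""
    start = ("main", ())                     # main runs in the empty context
    seen = {start}
    work = [start]
    while work:
        func, ctx = work.pop()
        for site, callee in CALL_SITES[func]:
            new_ctx = ((site,) + ctx)[:k]    # push the site, keep the newest k
            state = (callee, new_ctx)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return {ctx for _, ctx in seen}

size_one = call_strings(1)
size_two = call_strings(2)
```

Truncating to k entries is what keeps the set of contexts finite even though fib is recursive.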

  22. Other Kinds of Contexts
      • Assumption sets
        – Which states can hold at the call site?
        – Example paper: "ESP: path-sensitive program verification in polynomial time"
      • Caller stack
        – The stack of caller functions
        – Less precise than call-site stack (2::3 versus fib::fib)
      • OO programs
        – Object sensitivity

  23. MISC.

  24. Flow Sensitivity
      • Dataflow analysis is flow sensitive
        – Takes into account the order of statements
        – E.g., "x:=1; y:=x" would get a different liveness analysis result from "y:=x; x:=1"
      • A flow-insensitive analysis
        – Does not consider the order of statements
        – E.g., a simple analysis that collects all the constants used in the program is flow-insensitive
          • "x:=1; y:=x" would produce {1}, and so would "y:=x; x:=1"
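The contrast can be checked on the two statement orders from the example. In this sketch a program is a list of (target, source) assignments, where an integer source is a constant and a string source is a variable; that encoding and the function names are assumptions made for illustration.

```python
def constants(program):
    """Flow-insensitive: collect every constant, ignoring statement order."""
    return {rhs for _, rhs in program if isinstance(rhs, int)}

def live_at_entry(program):
    """Flow-sensitive: backward liveness over straight-line code."""
    live = set()                       # nothing is assumed live at the end
    for lhs, rhs in reversed(program):
        live.discard(lhs)              # an assignment kills its target
        if isinstance(rhs, str):
            live.add(rhs)              # and uses its source variable
    return live

prog_a = [("x", 1), ("y", "x")]        # x:=1; y:=x
prog_b = [("y", "x"), ("x", 1)]        # y:=x; x:=1

same_constants = constants(prog_a) == constants(prog_b)
live_a, live_b = live_at_entry(prog_a), live_at_entry(prog_b)
```

In the second order, x is read before it is assigned, so it is live at program entry; in the first order it is not.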

  25. Path Sensitivity
      • Dataflow analysis is path insensitive
      [Figure: control-flow diagram whose nodes carry AV annotations]

  26. Path Sensitive Analysis
      • Example

          if (x>1) {t1 = x+y} else {t2 = x-y};
          if (x>1) {u = (x+y) - z}      ← Is "x+y" available here?

      • By conventional available expression analysis, "x+y" is not available
      • Path sensitive analysis
        – Associating information with edges
        – At the end of "if (x>1) …"
          • {(x>1, x+y), (x<=1, x-y)}
        – Then "x+y" is available inside the second branch
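A very small sketch of the difference, under the assumption that neither x nor y is modified between the two conditionals: the conventional analysis intersects the facts of the two branches, while the path-sensitive variant keeps them keyed by the branch condition. The string encoding of conditions and expressions is purely illustrative.

```python
then_avail = {"x+y"}                   # computed in the branch where x>1
else_avail = {"x-y"}                   # computed in the branch where x<=1

# Conventional available expressions: intersect at the merge point.
merged = then_avail & else_avail

# Path-sensitive: keep one fact set per guarding condition.
guarded = {"x>1": then_avail, "x<=1": else_avail}

# Inside the second "if (x>1)", only the facts guarded by x>1 apply.
avail_in_second_branch = guarded["x>1"]
```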

  27. Analyzing the Heap
      • The heap poses a major challenge for static analysis
        – Many static analyses disregard the heap completely
        – Source of false positives and false negatives

  28. Pointer Analysis, Points-to Analysis, Alias Analysis
      • Example:

          int x = 3, y = 4;
          int *p = &x;
          int t = x + y;
          *p = 5;
          if (x+y > 10) {…}      ← Is "x+y" available here?

      • No! x was modified through its alias *p
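How points-to information feeds another analysis can be sketched as follows: a store through a pointer must kill every available expression that mentions a variable the pointer may point to. The tuple encoding of expressions, the `points_to` map, and both function names are assumptions for illustration; the points-to set itself is taken as given rather than computed.

```python
points_to = {"p": {"x"}}                   # p = &x, so *p may alias x

def kill_on_store(avail, pointer):
    """Available expressions after a store through pointer, using points-to facts."""
    targets = points_to.get(pointer, set())
    # Drop every expression that reads a variable the store may overwrite.
    return {e for e in avail if not ({e[0], e[2]} & targets)}

def kill_on_store_ignoring_aliases(avail, pointer):
    """Unsound variant: treats the store as if it touched no named variable."""
    return set(avail)

avail = {("x", "+", "y")}                  # after: int t = x + y;
sound = kill_on_store(avail, "p")          # after: *p = 5;
unsound = kill_on_store_ignoring_aliases(avail, "p")
```

The unsound variant would still report x+y as available at the final test, which is exactly the mistake the example warns about.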

  29. Shape Analysis
      • Dataflow analysis
        – Good at analyzing atomic values: labels, constants, variable names
        – Cannot easily extend to data structures in the heap: arrays, trees, lists, …
      • Shape analysis can analyze the shapes of data structures
        – A very active research area
