
Fundamentals of Program Analysis + Generation of Linear Program Invariants
Markus Müller-Olm, Westfälische Wilhelms-Universität Münster, Germany
2nd Tutorial of SPP RS3: Reliably Secure Software Systems, Schloss Buchenau, September 3-6, 2012


  1. A Lattice for Constant Propagation
An order ⊑ on ℤ ∪ {⊤}, with ⊤ = "unknown value": ⊤ sits above all of ... -2, -1, 0, 1, 2, ...; distinct integers are incomparable.
L_CP = {ρ | ρ : Var → ℤ ∪ {⊤}} ∪ {⊥}
ρ ⊑_CP ρ' :⇔ ρ = ⊥ ∨ (ρ ≠ ⊥ ∧ ρ' ≠ ⊥ ∧ ∀x ∈ Var: ρ(x) ⊑ ρ'(x))
Remark: (L_CP, ⊑_CP) is a complete lattice.
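In code, this flat order looks as follows (a minimal Python sketch; the names TOP, leq_val and join_val are ours, not the slides'):

# Flat constant-propagation lattice on Z ∪ {⊤}: TOP ("unknown value")
# sits above all integers; distinct integers are incomparable.
TOP = "T"

def leq_val(a, b):
    """a ⊑ b on Z ∪ {⊤}: equal values, or anything below TOP."""
    return a == b or b == TOP

def join_val(a, b):
    """Least upper bound on Z ∪ {⊤}."""
    return a if a == b else TOP

assert join_val(2, 2) == 2      # same constant stays constant
assert join_val(2, 3) == TOP    # conflicting constants become "unknown"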

  2. Constant Propagation
Example: track (ρ(x), ρ(y), ρ(z)) along a flow graph that branches at point 1, starting from (⊤, ⊤, ⊤):
left branch: x:=2 into point 2 gives (2,⊤,⊤); y:=3 into 3 gives (2,3,⊤); z:=x+y into 4 gives (2,3,5)
right branch: x:=3 into point 5 gives (3,⊤,⊤); y:=2 into 6 gives (3,2,⊤); z:=x+y into 7 gives (3,2,5)
at the join point 8: (⊤, ⊤, 5)

  3. Specifying Constant Propagation by a Constraint System
Let G = (N, E, st, te) be a flow graph over BA_std. Compute the (smallest) solution over (L, ⊑) = (L_CP, ⊑_CP) of:
V[st] ⊒ init, for st, the start node
V[v] ⊒ f_e(V[u]), for each edge e = (u, s, v)
where init = ⊤_CP ∈ L_CP is the mapping with ⊤_CP(x) = ⊤, and f_e : L_CP → L_CP is defined by
f_e(ρ) = ρ{⟦t⟧_CP(ρ)/x}, if e = (u, x:=t, v) and ρ ≠ ⊥
f_e(ρ) = ρ, otherwise
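A sketch of the transfer function f_e in Python, continuing the lattice sketch above (the tiny expression format and the helper names eval_cp, f_assign are ours; only + is interpreted):

TOP = "T"   # as in the previous sketch

def eval_cp(t, rho):
    """⟦t⟧_CP(ρ) for a tiny expression language: t is an int literal,
    a variable name, or a tuple ('+', t1, t2); TOP is absorbing."""
    if isinstance(t, int):
        return t
    if isinstance(t, str):
        return rho[t]
    op, t1, t2 = t
    a, b = eval_cp(t1, rho), eval_cp(t2, rho)
    return TOP if TOP in (a, b) else a + b  # only '+' in this sketch

def f_assign(x, t, rho):
    """f_e(ρ) = ρ{⟦t⟧_CP(ρ)/x} if ρ ≠ ⊥ (None stands in for ⊥)."""
    if rho is None:            # ρ = ⊥: unreachable stays unreachable
        return None
    new = dict(rho)
    new[x] = eval_cp(t, rho)
    return new

rho = {"x": 2, "y": 3, "z": TOP}
assert f_assign("z", ("+", "x", "y"), rho)["z"] == 5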

  4. Specifying Constant Propagation by a Constraint System
Remarks:
1. Again, every solution is "correct" (whatever this means).
2. Again, the smallest solution is called the MFP-solution; it comprises a value MFP[u] ∈ L for each program point u.
3. The MFP-solution is the most precise one.

  5. Backwards vs. Forward Analyses
Live Variables Analysis is a backwards analysis, i.e.:
- analysis info flows from the target node to the source node of an edge
- the initial inequality is for the termination node of the flow graph:
A[te] ⊒ init, for te, the termination point
A[u] ⊒ f_e(A[v]), for each edge e = (u, s, v) ∈ E
Dually, constant propagation is a forward analysis, i.e.:
- analysis info flows from the source node to the target node of an edge
- the initial inequality is for the start node of the flow graph:
A[st] ⊒ init, for st, the start node
A[v] ⊒ f_e(A[u]), for each edge e = (u, s, v) ∈ E
Other examples: reaching definitions, available expressions, ...

  6. Monotone Data-Flow Problems
Goal: a generic notion that captures what is common to different analyses.
Advantages:
- study general properties of data-flow problems independently of concrete analysis questions
- build efficient, generic implementations

  7. Monotone Data-Flow Problems
Definition: A monotone data-flow problem is a tuple P = ((L, ⊑), F, (N, E), st, init) consisting of:
- a complete lattice (L, ⊑); the elements of L are called data-flow facts
- a set F of transfer functions f : L → L, such that:
  - each f ∈ F is monotone: ∀x, y ∈ L: x ⊑ y ⇒ f(x) ⊑ f(y)
  - id ∈ F
  - F is closed under composition: ∀f, g ∈ F: f ∘ g ∈ F
- a graph (N, E) with a finite set of nodes N; each edge of the graph is annotated with a transfer function f ∈ F: E ⊆ N × F × N
- st ∈ N, a designated initial node
- init ∈ L, a designated initial information

  8. Constraint System for a Data-Flow Problem
Let P = ((L, ⊑), F, (N, E), st, init) be a data-flow problem. Compute the (smallest) solution over (L, ⊑) of the following constraint system:
A[st] ⊒ init, for st, the start node
A[v] ⊒ f(A[u]), for each edge e = (u, f, v) ∈ E
Note: here, information flows from nodes to their successor nodes only. Hence, for a backwards analysis the direction of the edges must be reversed when mapping it to the corresponding data-flow problem.

  9. Constraint System for a Data-Flow Problem
Remarks:
1. Again, every solution is "correct" (whatever this means).
2. Again, the smallest solution is called the MFP-solution; it comprises a value MFP[u] ∈ L for each program point u.
3. The MFP-solution is the most precise one.

  10. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?

  11. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?

  12. Knaster-Tarski Fixpoint Theorem
Definitions: Let (L, ⊑) be a partial order.
- f : L → L is monotonic iff ∀x, y ∈ L: x ⊑ y ⇒ f(x) ⊑ f(y).
- x ∈ L is a fixpoint of f iff f(x) = x.
Fixpoint Theorem of Knaster-Tarski: Every monotonic function f on a complete lattice L has a least fixpoint lfp(f) and a greatest fixpoint gfp(f). More precisely,
lfp(f) = ⊓ {x ∈ L | f(x) ⊑ x} (the least pre-fixpoint)
gfp(f) = ⊔ {x ∈ L | x ⊑ f(x)} (the greatest post-fixpoint)
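Knaster-Tarski only guarantees that lfp(f) exists; when L has no infinite ascending chains, iterating f from ⊥ actually reaches it (Kleene iteration). A minimal Python sketch, with the lattice and f chosen by us purely for illustration:

def lfp(f, bottom):
    """Iterate x, f(x), f(f(x)), ... from ⊥ until a fixpoint is hit.
    Terminates when the lattice has no infinite ascending chains."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

# Example: subsets of {1,2,3} ordered by ⊆; f is monotone.
f = lambda s: s | {1} | ({2} if 1 in s else set())
print(lfp(f, frozenset()))  # iterates ∅, {1}, {1,2} -> frozenset({1, 2})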

  13. Knaster-Tarski Fixpoint Theorem
[Figure: the lattice L drawn from ⊥ up to ⊤; reading top-down: the pre-fixpoints of f, gfp(f), the fixpoints of f, lfp(f), the post-fixpoints of f. Picture from: Nielson/Nielson/Hankin, Principles of Program Analysis]

  14. Smallest Solutions Always Exist
- Define a functional F : L^n → L^n from the right-hand sides of the constraints such that: σ is a solution of the constraint system iff σ is a pre-fixpoint of F.
- The functional F is monotonic.
- By the Knaster-Tarski Fixpoint Theorem: F has a least fixpoint, which equals its least pre-fixpoint. ☺

  15. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?

  16. Workset-Algorithm
W = ∅;
forall (v program point) { A[v] = ⊥; W = W ∪ {v}; }
A[st] = init;
while W ≠ ∅ {
  u = Extract(W);
  forall (s, v with e = (u, s, v) edge) {
    t = f_e(A[u]);
    if ¬(t ⊑ A[v]) {
      A[v] = A[v] ⊔ t;
      W = W ∪ {v};
    }
  }
}
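A runnable Python rendering of this algorithm (a sketch; the encoding of edges as (u, f_e, v) triples and the parameter names are ours):

def workset_solve(nodes, edges, st, init, bottom, join, leq):
    A = {v: bottom for v in nodes}
    A[st] = init
    W = set(nodes)
    while W:
        u = W.pop()                # u = Extract(W)
        for (src, f_e, v) in edges:
            if src != u:
                continue
            t = f_e(A[u])
            if not leq(t, A[v]):   # if ¬(t ⊑ A[v])
                A[v] = join(A[v], t)
                W.add(v)
    return A

# Example: reachability ("can this point be reached?") over the lattice
# False ⊑ True, where every edge just propagates its source value.
nodes = [0, 1, 2]
edges = [(0, lambda x: x, 1), (1, lambda x: x, 2), (2, lambda x: x, 1)]
A = workset_solve(nodes, edges, st=0, init=True, bottom=False,
                  join=lambda a, b: a or b, leq=lambda a, b: (not a) or b)
print(A)  # {0: True, 1: True, 2: True}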

  17. Invariants of the Main Loop
a) A[u] ⊑ MFP[u] f.a. prg. points u
b1) A[st] ⊒ init
b2) u ∉ W ⇒ A[v] ⊒ f_e(A[u]) f.a. edges e = (u, s, v)
If and when the workset algorithm terminates: A is a solution of the constraint system by b1) & b2) ⇒ A[u] ⊒ MFP[u] f.a. u.
Hence, with a): A[u] = MFP[u] f.a. u ☺

  18. How to Guarantee Termination
- Lattice (L, ⊑) has finite height ⇒ the algorithm terminates after at most #prg. points · (height(L) + 1) iterations of the main loop.
- Lattice (L, ⊑) has no infinite ascending chains ⇒ the algorithm terminates.
- Lattice (L, ⊑) has infinite ascending chains ⇒ the algorithm may not terminate; use widening operators in order to enforce termination.

  19. [Cousot/Cousot] Widening Operator
▽ : L × L → L is called a widening operator iff
1) ∀x, y ∈ L: x ⊔ y ⊑ x ▽ y
2) for all sequences (l_n)_n, the (ascending) chain (w_n)_n with w_0 = l_0 and w_{i+1} = w_i ▽ l_{i+1} for i ≥ 0 stabilizes eventually.

  20. Workset-Algorithm with Widening
W = ∅;
forall (v program point) { A[v] = ⊥; W = W ∪ {v}; }
A[st] = init;
while W ≠ ∅ {
  u = Extract(W);
  forall (s, v with e = (u, s, v) edge) {
    t = f_e(A[u]);
    if ¬(t ⊑ A[v]) {
      A[v] = A[v] ▽ t;
      W = W ∪ {v};
    }
  }
}

  21. Invariants of the Main Loop
a) A[u] ⊑ MFP[u] f.a. prg. points u
b1) A[st] ⊒ init
b2) u ∉ W ⇒ A[v] ⊒ f_e(A[u]) f.a. edges e = (u, s, v)
With a widening operator we enforce termination, but we lose invariant a).
Upon termination we have: A is a solution of the constraint system by b1) & b2) ⇒ A[u] ⊒ MFP[u] f.a. u.
⇒ We compute a sound upper approximation (only)!

  22. Example of a Widening Operator: Interval Analysis
The goal: find safe intervals for the values of program variables, e.g. of i in:
for (i=0; i<42; i++)
  if (0<=i and i<42) { A1 = A+i; M[A1] = i; }
..., e.g., in order to remove the redundant array range check. ☺

  23. Example of a Widening Operator: Interval Analysis
The lattice
(L, ⊑) = ({[l, u] | l ∈ ℤ ∪ {-∞}, u ∈ ℤ ∪ {+∞}, l ≤ u} ∪ {∅}, ⊆)
has infinite ascending chains, e.g.: [0,0] ⊂ [0,1] ⊂ [0,2] ⊂ ...
A widening operator:
[l0, u0] ▽ [l1, u1] = [l2, u2], where
l2 = l0 if l0 ≤ l1, and l2 = -∞ otherwise
u2 = u0 if u0 ≥ u1, and u2 = +∞ otherwise
A chain of maximal length arising with this widening operator:
∅ ⊂ [3,7] ⊂ [3,+∞] ⊂ [-∞,+∞]
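A sketch of this widening operator in Python (representing -∞/+∞ by floats and ∅ by None is our encoding choice; the ∅ cases are our extension of the definition above):

def widen(i0, i1):
    """[l0,u0] ▽ [l1,u1]: keep stable bounds, jump unstable ones to ∞."""
    if i0 is None:
        return i1
    if i1 is None:
        return i0
    (l0, u0), (l1, u1) = i0, i1
    l2 = l0 if l0 <= l1 else float('-inf')
    u2 = u0 if u0 >= u1 else float('inf')
    return (l2, u2)

# The maximal chain from the slide: ∅ ⊂ [3,7] ⊂ [3,+∞] ⊂ [-∞,+∞]
w = widen(None, (3, 7))        # (3, 7)
w = widen(w, (3, 9))           # upper bound unstable -> (3, inf)
w = widen(w, (2, 5))           # lower bound unstable -> (-inf, inf)
print(w)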

  24. Analyzing the Program with the Widening Operator
Flow graph: edge i:=0 from 0 to 1; i<42 from 1 to 2 and ¬(i<42) from 1 to 8; 0≤i<42 from 2 to 3 and ¬(0≤i<42) from 2 to 7; A1:=A+i from 3 to 4; M[A1]:=i from 4 to 5; i:=i+1 from 5 to 6; and back from 6 to 1.
⇒ The result is far too imprecise! ☹
Example taken from: H. Seidl, lecture "Programmoptimierung"

  25. Remedy 1: Loop Separators
- Apply the widening operator only at a "loop separator" (a set of program points that cuts each loop).
- We use the loop separator {1} here (the flow graph is as before).
⇒ Identify the condition at the edge from 2 to 3 as redundant! ☺
⇒ Find out that prg. point 7 is unreachable!

  26. Remedy 2: Narrowing
- Iterate again from the result obtained by widening (iteration from a pre-fixpoint stays above the least fixpoint!). The flow graph is as before.
⇒ We get the exact result in this example (but this is not guaranteed)! ☺
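A sketch of this narrowing phase in Python (here plain descending re-iteration, without a narrowing operator): F stands for the whole constraint functional, and any number of rounds is sound because every iterate of a pre-fixpoint stays above the least fixpoint:

def narrow(F, A, rounds=3):
    """Re-apply the ordinary (un-widened) functional F to the
    assignment A obtained by widening; stop after a bounded number of
    rounds or at a fixpoint. Stopping anywhere is sound."""
    for _ in range(rounds):
        A2 = F(A)
        if A2 == A:
            break
        A = A2
    return A

# Stand-in usage: F re-applies the interval transfer functions; in the
# loop example this refines the widened [0,+∞] at point 1 to a finite
# bound again (the concrete F here is a hypothetical placeholder).
A = {"pt1": (0, float('inf'))}
F = lambda a: {"pt1": (0, 42)}
print(narrow(F, A))  # {'pt1': (0, 42)}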

  27. Remarks
- Can use a work-list instead of a work-set
- Special iteration strategies in special situations
- Semi-naive iteration (later!)
- Narrowing operators

  28. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?
  - MOP vs MFP-solution
  - Abstract interpretation

  29. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?
  - MOP vs MFP-solution
  - Abstract interpretation

  30. Assessing Data Flow Frameworks
[Diagram: Execution Semantics ← abstraction ← MOP-solution ← MFP-solution, with the questions at each step: sound? how precise?]

  31. Live Variables
[Example flow graph with a loop, built from the assignments x := 17, y := 17, x := 42, y := x+y, x := 10, x := y+1, x := x+1, y := 11, x := y+1; infinitely many paths reach the program point v of interest.]
MOP[v] = ∅ ∪ {y} ∪ ... = {y}

  32. Meet-Over-All-Paths Solution (MOP)
Definition: The transfer function f_π : L → L of a path π = v_0 f_0 ... f_{k-1} v_k, k ≥ 0, is:
f_π = f_{k-1} ∘ ... ∘ f_0
The MOP-solution is:
MOP[v] = ⊔ { f_π(init) | π ∈ Paths[st, v] } for all v ∈ N.
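For an acyclic graph, Paths[st, v] is finite and the MOP-solution can be computed literally by enumerating paths; a brute-force Python sketch (the edge encoding as (u, f, v) triples is ours):

from functools import reduce

def mop(edges, st, v, init, join):
    """Join f_π(init) over all paths π from st to v (acyclic graphs
    only: there is no cycle check). Functions are applied front to
    back while walking, which realizes f_{k-1} ∘ ... ∘ f_0."""
    results = []
    def walk(u, acc):
        if u == v:
            results.append(acc)
        for (src, f, dst) in edges:
            if src == u:
                walk(dst, f(acc))   # extend the path by one edge
    walk(st, init)
    return reduce(join, results)

# Two parallel edges: one generates fact "a", one removes it.
edges = [(0, lambda s: s | {"a"}, 1), (0, lambda s: s - {"a"}, 1)]
print(mop(edges, 0, 1, frozenset(), lambda x, y: x | y))  # {'a'}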

  33. Coincidence Theorem
Definition: A data-flow problem is positively-distributive if f(⊔X) = ⊔{f(x) | x ∈ X} for all sets ∅ ≠ X ⊆ L and transfer functions f ∈ F.
Theorem: For any instance of a positively-distributive data-flow problem: MOP[u] = MFP[u] for all program points u (if all program points are reachable).
Remark: A data-flow problem is positively-distributive if (a) and (b) hold:
(a) it is distributive: f(x ⊔ y) = f(x) ⊔ f(y) f.a. f ∈ F, x, y ∈ L
(b) it is effective: the lattice L does not have infinite ascending chains
Remark: All bitvector frameworks are distributive and effective.

  34. Recall: Lattice for Constant Propagation
⊤ = "unknown value", sitting above all of ... -2, -1, 0, 1, 2, ...; ⊥ below everything.
lattice: L = {ρ | ρ : Var → ℤ ∪ {⊤}} ∪ {⊥}
ρ ⊑ ρ' :⇔ ρ = ⊥ ∨ (ρ, ρ' ≠ ⊥ ∧ ∀x: ρ(x) ⊑ ρ'(x))

  35. Tracking (ρ(x), ρ(y), ρ(z)) in an example: after x := 17 the program branches into x := 2; y := 3 and x := 3; y := 2, then joins and executes z := x+y. Along the two paths one obtains (2,3,5) and (3,2,5), hence
MOP[v] = (⊤, ⊤, 5)

  36. On the same graph, the constraint system joins (2,3,⊤) and (3,2,⊤) already before the edge z := x+y, giving (⊤,⊤,⊤) there; hence
MFP[v] = (⊤, ⊤, ⊤), while MOP[v] = (⊤, ⊤, 5)

  37. Correctness Theorem
Recall: We assume the transfer functions in a data-flow problem to be monotone, i.e.: x ⊑ y ⇒ f(x) ⊑ f(y) for all f ∈ F, x, y ∈ L. ☺
Theorem: For any data-flow problem: MOP[u] ⊑ MFP[u] for all program points u.

  38. Assessing Data Flow Frameworks
[Diagram: Execution Semantics ← abstraction ← MOP-solution ← MFP-solution; the MOP-solution is sound, and the MFP-solution is sound, and precise if the problem is distributive.]

  39. Where Flow Analysis Loses Precision
Execution semantics → MOP → MFP → Widening: potential loss of precision at each step.

  40. Three Questions
- Do (smallest) solutions always exist?
- How to compute the (smallest) solution?
- How to justify that a solution is what we want?
  - MOP vs MFP-solution
  - Abstract interpretation

  41. Abstract Interpretation
Replace the constraint system for the Reference Semantics (concrete operators o on the concrete lattice (D, ⊑), smallest solution MFP) by a constraint system for the Analysis (abstract operators o# on the abstract lattice (D#, ⊑#), smallest solution MFP#).
Often used as reference semantics:
- sets of reaching runs: (D, ⊑) = (P(Edges*), ⊆) or (D, ⊑) = (P(Stmt*), ⊆)
- sets of reaching states ("Collecting Semantics"): (D, ⊑) = (P(Σ), ⊆) with Σ = Var → Val

  42. Abstract Interpretation
[Diagram as on the previous slide: concrete constraint system over (D, ⊑) vs. abstract constraint system over (D#, ⊑#), with solutions MFP and MFP#.]
Assume a universally-disjunctive abstraction function α : D → D#.
Correct abstract interpretation:
Show α(o(x_1,...,x_k)) ⊑# o#(α(x_1),...,α(x_k)) f.a. x_1,...,x_k ∈ D and operators o. Then α(MFP[u]) ⊑# MFP#[u] f.a. u.
Correct and precise abstract interpretation:
Show α(o(x_1,...,x_k)) = o#(α(x_1),...,α(x_k)) f.a. x_1,...,x_k ∈ D and operators o. Then α(MFP[u]) = MFP#[u] f.a. u.
Use this as a guideline for designing correct (and precise) analyses!
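The local condition can be spot-checked mechanically; a Python sketch for constant propagation's addition, where α maps a set of integers to the flat lattice (the encoding and names are ours, TOP as before):

TOP = "T"

def alpha(xs):
    """α: a singleton set maps to its constant, anything larger to ⊤
    (the empty set, i.e. ⊥, is ignored in this sketch)."""
    xs = set(xs)
    return next(iter(xs)) if len(xs) == 1 else TOP

def plus_conc(X, Y):            # concrete o: elementwise sums
    return {x + y for x in X for y in Y}

def plus_abs(a, b):             # abstract o#: strict in TOP
    return TOP if TOP in (a, b) else a + b

def leq(a, b):                  # ⊑ on the flat lattice
    return a == b or b == TOP

# α(o(X,Y)) ⊑ o#(α(X), α(Y)) on a few sample inputs:
for X, Y in [({2}, {3}), ({2, 4}, {3}), ({1}, {1, 2})]:
    assert leq(alpha(plus_conc(X, Y)), plus_abs(alpha(X), alpha(Y)))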

  43. Abstract Interpretation
Constraint system for reaching runs:
R[st] ⊇ {ε}, for st, the start node
R[v] ⊇ R[u] · {e}, for each edge e = (u, s, v)
Operational justification: Let R[u] be the components of the smallest solution over P(Edges*). Then
R[u] = R_op[u] =_def { r ∈ Edges* | st →^r u } for all u
Prove:
a) R_op[u] satisfies all constraints (direct) ⇒ R[u] ⊆ R_op[u] f.a. u
b) w ∈ R_op[u] ⇒ w ∈ R[u] (by induction on |w|) ⇒ R_op[u] ⊆ R[u] f.a. u

  44. Abstract Interpretation
Constraint system for reaching runs:
R[st] ⊇ {ε}, for st, the start node
R[v] ⊇ R[u] · {e}, for each edge e = (u, s, v)
Derive the analysis: replace {ε} by init and (· {e}) by f_e. Obtain the abstracted constraint system:
R#[st] ⊒ init, for st, the start node
R#[v] ⊒ f_e(R#[u]), for each edge e = (u, s, v)

  45. Abstract Interpretation
MOP-Abstraction: Define α_MOP : P(Edges*) → L by
α_MOP(R) = ⊔ { f_r(init) | r ∈ R }, where f_ε = Id and f_{s·e} = f_e ∘ f_s
Remark: If all transfer functions f_e are monotone, the abstraction is correct, hence:
α_MOP(R[u]) ⊑ R#[u] f.a. prg. points u
If all transfer functions f_e are universally-distributive, i.e., f(⊔X) = ⊔{f(x) | x ∈ X} for all sets X ⊆ L, the abstraction is correct and precise, hence:
α_MOP(R[u]) = R#[u] f.a. prg. points u ☺
Justifies the MOP vs. MFP theorems (cum grano salis).

  46. Overview
- Introduction
- Fundamentals of Program Analysis
- Excursion 1
- Interprocedural Analysis
- Excursion 2
- Analysis of Parallel Programs
- Excursion 3
- Conclusion

  47. Challenges for Automatic Analysis
- Data aspects:
  - infinite number domains
  - dynamic data structures (e.g. lists of unbounded length)
  - pointers
  - ...
- Control aspects:
  - recursion
  - concurrency
  - creation of processes / threads
  - synchronization primitives (locks, monitors, communication stmts, ...)
  - ...
⇒ infinite/unbounded state spaces

  48. Classifying Analysis Approaches
[Diagram with three axes: control aspects, data aspects, analysis techniques.]

  49. (My) Main Interests of Recent Years
- Data aspects:
  - algebraic invariants over ℚ, ℤ, ℤ_m (m = 2^n) in sequential programs, partly with recursive procedures
  - invariant generation relative to Herbrand interpretation
- Control aspects:
  - recursion
  - concurrency with process creation / threads
  - synchronization primitives, in particular locks/monitors
- Techniques:
  - fixpoint-based
  - automata-based
  - (linear) algebra
  - syntactic substitution-based techniques
  - ...

  50. Overview
- Introduction
- Fundamentals of Program Analysis
- Excursion 1
- Interprocedural Analysis
- Excursion 2
- Analysis of Parallel Programs
- Excursion 3
- Conclusion

  51. A Note on Karr's Algorithm
Markus Müller-Olm
Joint work with Helmut Seidl (TU München)
ICALP 2004, Turku, July 12-16, 2004

  52. What this Excursion is About…
Program: edge (0, x1:=1; x2:=1; x3:=1, 1), loop body edge (1, x1:=x1+1; x2:=2x2-2x1+5; x3:=x3+x2, 2), and back to 1.
At program point 2 the invariants x3 = x1² and x2 = 2x1 - 1 hold.

  53. Affine Programs
- Basic statements:
  - affine assignments: x1 := x1 - 2x3 + 7
  - unknown assignments: xi := ? (→ abstract too complex statements)
- Affine programs:
  - control flow graph G = (N, E, st), where
    - N: finite set of program points
    - E ⊆ N × Stmt × N: set of edges
    - st ∈ N: start node
  - Note: non-deterministic instead of guarded branching

  54. The Goal: Precise Analysis
Given an affine program, determine for each program point
- all valid affine relations: a0 + Σ ai·xi = 0 with ai ∈ ℚ, e.g., 5x1 + 7x2 - 42 = 0
- more ambitious goal: determine all valid polynomial relations (of degree ≤ d): p(x1,…,x_k) = 0 with p ∈ ℚ[x1,…,x_n], e.g., 5x1x2² + 7x3³ = 0

  55. Applications of Affine (and Polynomial) Relations
- Data-flow analysis:
  - definite equalities: x = y
  - constant detection: x = 42
  - discovery of symbolic constants: x = 5yz+17
  - complex common subexpressions: xy+42 = y²+5
  - loop induction variables
- Program verification:
  - strongest valid affine (or polynomial) assertions (cf. Petri Net invariants)
- RS3: improve precision of PDG-based IFC analysis (with Gregor Snelting (KIT, Karlsruhe) and his group)

  56. [Karr, 1976] Karr's Algorithm
- Determines valid affine relations in programs.
- Idea: perform a data-flow analysis maintaining for each program point a set of affine relations, i.e., a linear equation system.
- Fact: the set of valid affine relations forms a vector space of dimension at most k+1, where k = #program variables.
  ⇒ can be represented by a basis
  ⇒ forms a complete lattice of height k+1

  57. Deficiencies of Karr's Algorithm
- Basic operations are complex:
  - "non-invertible" assignments
  - union of affine spaces
- O(n·k⁴) arithmetic operations, where n is the size of the program and k the number of variables
- Numbers may grow to exponential size

  58. Our Contribution
- Reformulation of Karr's algorithm:
  - basic operations are simple
  - O(n·k³) arithmetic operations
  - numbers stay of polynomial length: O(n·k²) bits
- Moreover:
  - generalization to polynomial relations of bounded degree
  - show that the algorithm finds all affine relations in "affine programs"
- Ideas:
  - represent affine spaces by affine bases instead of linear equation systems
  - use semi-naive fixpoint iteration
  - keep a reduced affine basis for each program point during fixpoint iteration

  59. Affine Basis

  60. Concrete Collecting Semantics
Smallest solution over subsets of ℚ^k of:
V[st] ⊇ ℚ^k
V[v] ⊇ f_s(V[u]), for each edge (u, s, v)
where
f_{xi := t}(X) = { x[xi ↦ t(x)] | x ∈ X }
f_{xi := ?}(X) = { x[xi ↦ c] | x ∈ X, c ∈ ℚ }
First goal: compute the affine hull of V[u] for each u.

  61. Abstraction
Affine hull: aff(X) = { Σ λi·xi | xi ∈ X, λi ∈ ℚ, Σ λi = 1 }
The affine hull operator is a closure operator:
aff(X) ⊇ X, aff(aff(X)) = aff(X), X ⊆ Y ⇒ aff(X) ⊆ aff(Y)
⇒ The affine subspaces of ℚ^k ordered by set inclusion form a complete lattice:
(D, ⊑) = ({ X ⊆ ℚ^k | aff(X) = X }, ⊆)
The affine hull is even a precise abstraction:
Lemma: f_s(aff(X)) = aff(f_s(X)).

  62. Abstract Semantics
Smallest solution over (D, ⊑) of:
V#[st] ⊒ ℚ^k
V#[v] ⊒ f_s(V#[u]), for each edge (u, s, v)
Lemma: V#[u] = aff(V[u]) for all program points u.

  63. Basic Semi-naive Fixpoint Algorithm
forall (v ∈ N) G[v] = ∅;
G[st] = {0, e1, ..., e_k};
W = {(st, 0), (st, e1), ..., (st, e_k)};
while W ≠ ∅ {
  (u, x) = Extract(W);
  forall (s, v with (u, s, v) ∈ E) {
    t = ⟦s⟧x;
    if (t ∉ aff(G[v])) {
      G[v] = G[v] ∪ {t};
      W = W ∪ {(v, t)};
    }
  }
}
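The test t ∉ aff(G[v]) is the heart of the algorithm; a Python sketch using exact rationals, where the row-echelon rows kept by reduce_against play the role of the reduced basis (the function names are ours):

from fractions import Fraction

def reduce_against(basis, v):
    """Eliminate v against rows with pairwise distinct pivot columns;
    returns the residue of v modulo their linear span."""
    v = list(v)
    for b in basis:
        p = next(i for i, c in enumerate(b) if c != 0)  # pivot column
        if v[p] != 0:
            f = Fraction(v[p], b[p])
            v = [vi - f * bi for vi, bi in zip(v, b)]
    return v

def in_affine_hull(G, t):
    """t ∈ aff(G)?  Checks t - g0 ∈ span{g - g0 | g ∈ G}."""
    g0 = G[0]
    basis = []
    for g in G[1:]:
        r = reduce_against(basis, [a - b for a, b in zip(g, g0)])
        if any(c != 0 for c in r):
            basis.append(r)
    residue = reduce_against(basis, [a - b for a, b in zip(t, g0)])
    return all(c == 0 for c in residue)

G = [(1, 1, 1), (2, 3, 4), (3, 5, 9)]   # vectors reaching point 1
print(in_affine_hull(G, (4, 7, 16)))    # True: not added to G[1]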

  64. Example
Program as on slide 52.
At program point 0: G[0] = {(0,0,0), (1,0,0), (0,1,0), (0,0,1)}.
After x1:=1; x2:=1; x3:=1, the vectors reaching point 1 are (1,1,1), (2,3,4), (3,5,9), (4,7,16), ...; since (4,7,16) ∈ aff{(1,1,1), (2,3,4), (3,5,9)}, only the first three are kept in G[1].
After the loop body x1:=x1+1; x2:=2x2-2x1+5; x3:=x3+x2, point 2 collects G[2] = {(2,3,4), (3,5,9), (4,7,16)}.

  65. Correctness
Theorem:
a) The algorithm terminates after at most nk + n iterations of the loop, where n = |N| and k is the number of variables.
b) For all v ∈ N, we have aff(G_fin[v]) = V#[v].
Invariants for b):
I1: ∀v ∈ N: G[v] ⊆ V[v] and ∀(u, x) ∈ W: x ∈ V[u]
I2: ∀(u, s, v) ∈ E: aff(G[v] ∪ {⟦s⟧x | (u, x) ∈ W}) ⊒ f_s(aff(G[u]))

  66. Complexity
Theorem:
a) The affine hulls V#[u] = aff(V[u]) can be computed in time O(n·k³), where n = |N| + |E|.
b) In this computation only arithmetic operations on numbers with O(n·k²) bits are used.
Store a diagonal basis for the membership tests. Propagate the original vectors.

  67. Point + Linear Basis

  68. Example
The same run with the point + linear basis representation:
At point 0: basis {0, e1, e2, e3}.
At point 1: point (1,1,1) with linear basis {(1,2,3), (2,4,8)} (the differences of the reaching vectors); reduced diagonal basis {(1,2,0), (0,0,2)} for the membership tests.
At point 2: point (2,3,4) with linear basis {(1,2,5), (2,4,12)}; reduced diagonal basis {(1,2,0), (0,0,2)}.

  69. Determining Affine Relations
Lemma: a is valid for X ⇔ a is valid for aff(X).
⇒ It suffices to determine the affine relations valid for affine bases; this can be done with a linear equation system!
Theorem:
a) The vector spaces of all affine relations valid at the program points of an affine program can be computed in time O(n·k³).
b) This computation performs arithmetic operations on integers with O(n·k²) bits only.

  70. Example
a0 + a1·x1 + a2·x2 + a3·x3 = 0 is valid at 2 (point (2,3,4), reduced basis (1,2,0), (0,0,2))
⇔ a0 + 2a1 + 3a2 + 4a3 = 0
   a1 + 2a2 = 0
   2a3 = 0
⇔ a0 = a2, a1 = -2a2, a3 = 0
⇒ 2x1 - x2 - 1 = 0 is valid at 2
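A quick mechanical check of this computation (a sketch; the helper names are ours): the relation (a0, a1, a2, a3) = (-1, 2, -1, 0), i.e. 2x1 - x2 - 1 = 0, must vanish on the point and on the reduced basis vectors:

a = (-1, 2, -1, 0)

def on_point(a, p):
    """a0 + a1*p1 + a2*p2 + a3*p3 evaluated at a point p."""
    return a[0] + sum(ai * xi for ai, xi in zip(a[1:], p))

def on_vector(a, v):
    """Only the linear part a1..a3 applies to basis vectors."""
    return sum(ai * xi for ai, xi in zip(a[1:], v))

assert on_point(a, (2, 3, 4)) == 0        # a0 + 2a1 + 3a2 + 4a3 = 0
assert on_vector(a, (1, 2, 0)) == 0       # a1 + 2a2 = 0
assert on_vector(a, (0, 0, 2)) == 0       # 2a3 = 0
# and indeed the relation holds on all vectors reaching point 2:
for x in [(2, 3, 4), (3, 5, 9), (4, 7, 16)]:
    assert 2 * x[0] - x[1] - 1 == 0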

  71. Also in the Paper
- Non-deterministic assignments
- Bit length estimation
- Polynomial relations
- Affine programs + affine equality guards: validity of affine relations undecidable

  72. End of Excursion 1

  73. Overview
- Introduction
- Fundamentals of Program Analysis
- Excursion 1
- Interprocedural Analysis
- Excursion 2
- Analysis of Parallel Programs
- Excursion 3
- Conclusion

  74. Interprocedural Analysis
[Diagram: procedures Main, P, Q, R with bodies built from statements such as c:=a+b and a:=7 and from the calls P(), Q(), R(); call edges connect call sites to procedure entries, and the calls between P and R form a recursion.]

  75. Running Example: (Definite) Availability of the single expression a+b
The lattice: the two-point lattice with true ⊑ false, where false = "a+b not available" and true = "a+b available". Initial value: false.
[Example flow graph with edges c:=a+b, a:=7, c:=c+3, c:=a+b, a:=42, annotated with the false/true values computed by the analysis.]

  76. Intra-Procedural-Like Analysis
Conservative assumption: a procedure call destroys all information (transfer function λx.false); information flows from the call node to the entry point of the procedure.
[Flow graph of Main (st_M, u1, u2, u3, r_M) with edges c:=a+b, P(), a:=7, P(), and procedure P (st_P, r_P) with edge c:=a+b; both call edges are abstracted by λx.false.]
Result: false at r_M. ☹

  77. Context-Insensitive Analysis
Conservative assumption: information flows from each call node to the entry of the procedure and from the exit of the procedure back to the return point.
[Same flow graph: the value true computed at P's exit r_P now flows back to the return points.]
Result: true at r_M. ☺

  78. Context-Insensitive Analysis
Conservative assumption: information flows from each call node to the entry of the procedure and from the exit of the procedure back to the return point.
[Variant of the example: merging the information of the two call contexts at P's entry lets false propagate through P to the return points.]
Result: false at r_M. ☹

  79. Recall: Abstract Interpretation Recipe
[Diagram as before: concrete constraint system over (D, ⊑) vs. abstract constraint system over (D#, ⊑#), with solutions MFP and MFP#.]
Assume a universally-disjunctive abstraction function α : D → D#.
Correct abstract interpretation:
Show α(o(x_1,...,x_k)) ⊑# o#(α(x_1),...,α(x_k)) f.a. x_1,...,x_k ∈ D and operators o. Then α(MFP[u]) ⊑# MFP#[u] f.a. u.
Correct and precise abstract interpretation:
Show α(o(x_1,...,x_k)) = o#(α(x_1),...,α(x_k)) f.a. x_1,...,x_k ∈ D and operators o. Then α(MFP[u]) = MFP#[u] f.a. u.
Use this as a guideline for designing correct (and precise) analyses!

  80. Example Flow Graph
Main: edges e0 = (st_M, c:=a+b, u1), e1 = (u1, P(), u2), e2 = (u2, a:=7, u3), e3 = (u3, P(), r_M)
P: edge e4 = (st_P, c:=a+b, r_P)

  81. Let's Apply Our Abstract Interpretation Recipe: Constraint System for Feasible Paths
Operational justification:
S(u) = { r ∈ Edges* | st_p →^r u }, for all u in procedure p
S(p) = { r ∈ Edges* | st_p →^r ε }, for all procedures p
R(u) = { r ∈ Edges* | ∃w ∈ Nodes*: st_Main →^r u w }, for all u
Same-level runs:
S(p) ⊇ S(r_p), r_p return point of p
S(st_p) ⊇ {ε}, st_p entry point of p
S(v) ⊇ S(u) · {e}, e = (u, s, v) base edge
S(v) ⊇ S(u) · S(p), e = (u, p, v) call edge
Reaching runs:
R(st_Main) ⊇ {ε}, st_Main entry point of Main
R(v) ⊇ R(u) · {e}, e = (u, s, v) basic edge
R(v) ⊇ R(u) · S(p), e = (u, p, v) call edge
R(st_p) ⊇ R(u), e = (u, p, v) call edge, st_p entry point of p

  82. Context-Sensitive Analysis
Summary-based approaches:
Phase 1: compute summary information for each procedure, as an abstraction of same-level runs.
Phase 2: use the summary information as transfer functions for procedure calls, in an abstraction of reaching runs.
Classic types of summary information:
- Functional approach [Sharir/Pnueli 81; Knoop/Steffen: CC'92]: use (monotonic) functions on data flow informations!
- Relational approach [Cousot/Cousot: POPL'77]: use relations (of a representable class) on data flow informations!
Call-string-based approaches, e.g. [Sharir/Pnueli 81], [Khedker/Karkare: CC'08]:
- analysis relative to a finite portion of the call stack
- applicable to arbitrary lattices
- sometimes less precise than summary-based approaches

  83. Formalization of Functional Approach
Abstractions:
Abstract same-level runs with α_Funct : P(Edges*) → (L → L): α_Funct(R) = ⊔ { f_r | r ∈ R } for R ⊆ Edges*
Abstract reaching runs with α_MOP : P(Edges*) → L: α_MOP(R) = ⊔ { f_r(init) | r ∈ R } for R ⊆ Edges*
1st phase: compute summary informations, i.e., functions:
S#(p) ⊒ S#(r_p), r_p return point of p
S#(st_p) ⊒ id, st_p entry point of p
S#(v) ⊒ f_e ∘ S#(u), e = (u, s, v) base edge
S#(v) ⊒ S#(p) ∘ S#(u), e = (u, p, v) call edge
2nd phase: use the summary informations; compute on data flow informations:
R#(st_Main) ⊒ init, st_Main entry point of Main
R#(v) ⊒ f_e(R#(u)), e = (u, s, v) basic edge
R#(v) ⊒ S#(p)(R#(u)), e = (u, p, v) call edge
R#(st_p) ⊒ R#(u), e = (u, p, v) call edge, st_p entry point of p
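For the two-point availability lattice these summary functions are finitely representable; a Python sketch (the pair encoding and the small recursive procedure are our own illustration, not the slides' running example):

# Definite availability of a+b: true = "available", and, following the
# slides, true ⊑ false, so the merge of paths is logical AND. A monotone
# summary f : L -> L is encoded as the pair (f(False), f(True)).
GEN  = (True, True)      # c := a+b : available afterwards, always
KILL = (False, False)    # a := 7   : destroyed, always
ID   = (False, True)     # skip     : availability passes through

def compose(g, f):
    """(g ∘ f) on the pair encoding (False/True index as 0/1)."""
    return (g[f[0]], g[f[1]])

def join(f, g):
    """Path merge for a must-analysis: pointwise AND."""
    return (f[0] and g[0], f[1] and g[1])

# Phase 1: summary of a hypothetical recursive P with body
#   c := a+b ; ( return  |  P() ; a := 7 ; return )
# as the least fixpoint of  S = join(GEN, KILL ∘ S ∘ GEN),
# iterated from the bottom summary λx.true = (True, True).
S = (True, True)
while True:
    S2 = join(GEN, compose(KILL, compose(S, GEN)))
    if S2 == S:
        break
    S = S2
print(S)  # (False, False): a+b is never definitely available after P

# Phase 2 would now apply S like an ordinary edge transfer function
# at every call site of P.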
