What is points-to analysis?




  1. 4/5/2010

     Points-To Analysis
     Meeting 21, CSCI 5535, Spring 2010
     Guest Lecture: Manu Sridharan

     What is points-to analysis?
     • Informally: analysis determining what locations (objects) pointers can point to
     Program:
         main() { x = &a; y = &b; z = x; }
     Result:
         pt(x) = {a}, pt(y) = {b}, pt(z) = {a}

     Importance of points-to analysis
     Verification
       Is { *x = 10 } *y = 3 { *x = 10 } valid?
       If x and y cannot point to same location, then yes
     Optimization
       Can a = *x; *y = 4; b = *x be optimized to a = *x; *y = 4; b = a?
       If x and y cannot point to same location, then yes
     Control Flow
       Can l.add() invoke ArrayList.add()?
       If l can point to an ArrayList object, then yes

     Lecture overview
     • What we’ll cover
       • Definition / complexity of several points-to analysis variants
       • Andersen’s analysis via CFL-reachability
       • Some issues with handling method calls
       • Refinement-based analysis
     • What we’ll skip (lack of time)
       • Shape analysis (Evan will cover later in semester)
       • Many other optimizations, control-flow analysis of functional languages, …

     Formal definition (for C)
     Points-to analysis: Given a program and two variables p and q, points-to analysis checks if p can point to q in some program execution [Chakaravarthy03]
     • malloc creates a fresh, unnamed variable
     Alias analysis: check if p1 and p2 can point to q simultaneously in some execution
     • We’ll focus on points-to
     For now, assume no procedure calls

     Soundness and precision
     • If p can point to q in some execution, a sound analysis will always report it
       • Analysis may be over-approximate, reporting p can point to q even if it cannot
     • A precise analysis is sound but not over-approximate
       • Yields exact answer given program semantics
     • Precise analysis for C is undecidable
       • Or, for any Turing-complete language
       • foo(TuringMachine M, TMInput i) { run M on i; p = &q; }
     Bottom line: to obtain decidability and efficiency, must approximate program semantics
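The pt(...) result on the "What is points-to analysis?" slide, and the "can x and y point to the same location?" question behind the verification and optimization examples, can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the tuple encoding of statements and the names `points_to` and `may_alias` are assumptions made here, only `p = &q` and `p = q` are handled, and statement order is ignored (harmless for this straight-line program).

```python
def points_to(statements):
    """Compute points-to sets for ("addr", p, q) meaning p = &q
    and ("copy", p, q) meaning p = q, iterating to a fixed point."""
    pt = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            target = pt.setdefault(lhs, set())
            # p = &q contributes {q}; p = q contributes everything in pt(q)
            new = {rhs} if kind == "addr" else pt.get(rhs, set())
            if not new <= target:
                target |= new
                changed = True
    return pt


def may_alias(pt, p, q):
    """p and q may alias if their points-to sets overlap."""
    return bool(pt.get(p, set()) & pt.get(q, set()))


# main() { x = &a; y = &b; z = x; }
program = [("addr", "x", "a"), ("addr", "y", "b"), ("copy", "z", "x")]
pt = points_to(program)
print(pt)                       # points-to sets for x, y, z
print(may_alias(pt, "x", "y"))  # the question asked on the "Importance" slide
```

Because pt(x) and pt(y) do not overlap here, a client could conclude that `*y = 3` leaves `*x` unchanged, which is the reasoning used in the verification and optimization examples.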

  2. Approximation 1: Path Insensitivity
     • Treat all branches as non-deterministic
       • Given if (c) then p; else q;, always assume either p or q can execute
       • Must still respect execution order (flow sensitive)
     • Complexity
       • With dynamic memory (malloc), undecidable
         • See [Ramalingam94,Chakaravarthy03]
       • Without dynamic memory, PSPACE-complete [MD00]
         • Even with just one procedure!
     • Bottom line: need to approximate more

     Approximation 2: Flow Insensitivity
     • Assume statements can execute in any order
       • With possible repetition
       • Assume control-flow graph is complete
     • Complexity
       • With dynamic memory (malloc), decidability unknown (!)
       • Without dynamic memory, NP-Hard [Horwitz97]
     • Bottom line: need even more approximation

     Simultaneity
     Semantics of pointer accesses
       Pointer Read: x = *y
       Pointer Write: *x = y
     [Figure: points-to graphs for the pointer read and pointer write, over nodes x, y, z, w]
     • Note: black arrows must occur simultaneously
     Issue: Some relations cannot arise simultaneously
     Statement set (flow insensitive):
         {a=&c;b=a;c=&b;b=&a;*b=c}
     b points to c: a=&c;b=a
     a points to b: c=&b;b=&a;*b=c
     But not both!

     Approximation 3: Andersen’s
     • Assumes discovered points-to relations can all occur simultaneously
       • Hence, less precise handling of pointer accesses
     • Challenge: express as approximate semantics?
       • Breaks up multi-level derefs
         • **x = y becomes temp = *x, *temp = y
         • Again, imprecision due to simultaneity reasoning (**x does two derefs atomically)
       • Heap abstraction? Other? (I don’t know)
     • Complexity: O(N^3); much better!

     Andersen’s for Java: The Basics
     • Four statement types
       • new: x = new Obj()
       • assign: x = y
       • getfield: x = y.f
       • putfield: x.f = y
     • Single abstract location for each new
       • Represents objects allocated by all executions
       • For more precise treatment, shape analysis

     ANDERSEN’S ANALYSIS IN CFL-REACHABILITY
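The effect of Andersen's simultaneity assumption on the statement set {a=&c;b=a;c=&b;b=&a;*b=c} can be checked with a small inclusion-based solver. The sketch below is an assumption-laden illustration, not the lecture's algorithm: it uses a naive fixed-point loop rather than an O(N^3) worklist, and the statement encoding and the name `andersen` are invented here.

```python
from collections import defaultdict


def andersen(statements):
    """Naive Andersen-style fixed point over C-like statements:
    ("addr", p, q): p = &q    ("copy", p, q): p = q
    ("load", p, q): p = *q    ("store", p, q): *p = q
    """
    pt = defaultdict(set)
    changed = True

    def add(dst, values):
        nonlocal changed
        before = len(pt[dst])
        pt[dst] |= values
        if len(pt[dst]) != before:
            changed = True

    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                add(lhs, {rhs})
            elif kind == "copy":
                add(lhs, set(pt[rhs]))
            elif kind == "load":
                for o in list(pt[rhs]):
                    add(lhs, set(pt[o]))
            elif kind == "store":
                # every location lhs may point to receives pt(rhs)
                for o in list(pt[lhs]):
                    add(o, set(pt[rhs]))
    return dict(pt)


# {a=&c; b=a; c=&b; b=&a; *b=c}
stmts = [("addr", "a", "c"), ("copy", "b", "a"), ("addr", "c", "b"),
         ("addr", "b", "a"), ("store", "b", "c")]
result = andersen(stmts)
print(result)
```

In this run the solver reports both "b points to c" and "a points to b", the pair the Simultaneity slide says cannot hold together in any single execution. That over-approximation is the price paid for the polynomial bound.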

  3. CFL-Reachability
     [Grammar for S (ending in an ε production) and example graph garbled in extraction]
     Points-to analysis graph:
     • Nodes represent variables / abstract locations
     • Edges represent statements
     Points-to analysis paths:
     • flowsTo-path from o to x: o ∈ pt(x)
     • alias-path from x to y: pt(x) ∩ pt(y) ≠ ∅

     More on CFL-Reachability
     • Several variants
       • All-pairs: find all pairs of nodes connected by valid paths
       • Single-source: find all nodes to which source is connected by valid path
     • General algorithm O(N^3)
       • N is number of nodes
       • Faster algorithms for special cases (see [RHS95])
       • Specialized algorithm needed to scale pointer analysis
     • For more details, see [Reps98]

     Andersen’s Analysis in CFL-Reachability
         x = new Obj(); // o1
         z = new Obj(); // o2
         w = x;
         y = x;
         y.f = z;
         v = w.f;
     [Figure: graph for the program above, with a legend mapping edge types to statements and with flowsTo and alias paths marked; edge labels garbled in extraction]
     flowsTo => new (pf[f] alias gf[f] | assign)*
     flowsTo => new (assign)*
     balanced parens

     What about alias?
     • Want: x alias y ⇔ ∃ o . o flowsTo x ∧ o flowsTo y
     • Problem: need all edges in same direction
     • Solution: alias => flowsTo-bar flowsTo
       • flowsTo-bar is inverse of flowsTo
       • Must add inverse edges to graph (e.g., assign-bar)
     • See [SB06] for full grammar

     METHOD CALLS

     Importance of Handling Method Calls
     • Used pervasively, esp. in Java-like languages
     • Often deeply related to objects and pointers
         class ArrayList {
           Object[] elems; int i;
           public ArrayList() {
             this.elems = new Object[10];   // allocation
           }
           public void add(Object o) {
             this.elems[i++] = o;           // pointer write
           }
           public Object get(int i) {
             return this.elems[i];          // pointer read
           }
         }
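The production flowsTo => new (pf[f] alias gf[f] | assign)* can be read as a set of derivation rules and run directly on the six-statement example. The following Python sketch does that with a naive fixed point; it is not the specialized CFL-reachability algorithm the slides refer to, and the tuple encodings and the name `flows_to` are assumptions made for illustration.

```python
def flows_to(news, assigns, putfields, getfields):
    """Derive (o, x) facts meaning "o flowsTo x".
    news:      (o, x)         for x = new Obj()  // o
    assigns:   (src, dst)     for dst = src
    putfields: (src, base, f) for base.f = src   (pf[f] edge)
    getfields: (base, f, dst) for dst = base.f   (gf[f] edge)
    """
    facts = set(news)

    def alias(p, q):
        # p alias q iff some object flows to both
        objs_p = {o for (o, v) in facts if v == p}
        objs_q = {o for (o, v) in facts if v == q}
        return bool(objs_p & objs_q)

    while True:
        new = set()
        for (o, x) in facts:
            # extend a flowsTo path by an assign edge
            for (src, dst) in assigns:
                if src == x:
                    new.add((o, dst))
            # extend by pf[f] alias gf[f]
            for (src, base, f) in putfields:
                if src != x:
                    continue
                for (base2, g, dst) in getfields:
                    if g == f and alias(base, base2):
                        new.add((o, dst))
        if new <= facts:
            return facts
        facts |= new


facts = flows_to(
    news=[("o1", "x"), ("o2", "z")],
    assigns=[("x", "w"), ("x", "y")],   # w = x; y = x
    putfields=[("z", "y", "f")],        # y.f = z
    getfields=[("w", "f", "v")],        # v = w.f
)
print(sorted(facts))
```

Here o2 reaches v only because y and w alias (both are reached by o1), which is the pf[f] alias gf[f] step of the grammar.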

  4. Precise Handling of Method Calls
     • Idea: analyze as if all method calls inlined
       • Yields separate copies of local variables / new expressions for each possible call
       • Known as a context-sensitive analysis
     • Problem: how to handle recursion
       • Full inlining yields an infinite program
     • But, analysis definitions still work fine!
       • Require variables p and q up front; forces choice of inlined copy
       • Flow-insensitive: find finite sequence from infinite statement set

     Decidability with Context Sensitivity
     • Precise path-insensitive + dynamic memory still undecidable
       • Already undecidable with just one method
     • Flow-insensitive + dynamic memory + precise calls: undecidable
       • Recall that with one method, decidability unknown
       • Via small modification of [Reps00] proof
       • Even Andersen-style analysis + precise calls is undecidable (details coming up)
     • No dynamic memory: not well-studied
       • Note that stack frames are a form of dynamic memory

     Andersen’s and Calls, Simplified
     • Four statement types (ignore fields for now)
       • new: x = new Obj()
       • assign: x = y
       • call: x = m(p1, p2, …)
       • return: return x
     • Idea: use balanced parentheses to match calls and returns
       • Parens labeled by call site
       • Grammar filters out unrealizable paths (method call returning to wrong site)
     • Classic use of CFL-reachability [RHS95]

     Matching Calls and Returns: Example
         id(p) { return p; }
         x = new Obj(); // o1
         y = new Obj(); // o2
         a = id(x);
         b = id(y);
     [Figure: graph for the program above with call and return edges labeled by call site; the productions for S (ending in ε) and for N are garbled in extraction]

     Andersen’s and Calls: The Details
     • Must allow for partially balanced call parens
       • E.g., to handle makeObj() { return new Obj(); }
     • Handle fields and calls simultaneously via intersected languages
       • Enhance N production (previous slide) to include all field accesses
       • Points-to analysis must find paths that are both S paths (for calls) and flowsTo paths (for fields)
       • Also need barred edges, etc.; details in [SB06]

     Andersen’s and Calls: Decidability
     • Analysis requires solving reachability over intersection of two CFLs (S and flowsTo)
     • But, CFLs are not closed under intersection
       • In our case, problem is undecidable
       • Proof via reduction from PCP [Reps00]
     • Standard approach for decidability: approximate recursion
       • Collapse SCCs in call graph (change (_i into assign)
       • Yields imprecise handling of recursive calls / returns
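The id(p) example can be used to see what matching call and return parentheses buys. The sketch below is a simplified illustration under stated assumptions, not the grammar-based algorithm from the slides: it explores paths from an object while tracking a stack of open call sites, allows a return on an empty stack (partially balanced paths), and assumes a non-recursive program so the stack stays bounded. The edge encoding, the node name `id.ret`, and the function name `reachable_vars` are invented here.

```python
def reachable_vars(edges, source, match_calls):
    """Nodes reachable from `source` along edges (src, label, dst).
    Labels: "new", "assign", ("call", i) for "(i", ("ret", i) for ")i".
    With match_calls=False, call/return labels are ignored
    (context-insensitive). Assumes no recursion."""
    seen, result = set(), set()
    work = [(source, ())]
    while work:
        node, stack = work.pop()
        if (node, stack) in seen:
            continue
        seen.add((node, stack))
        result.add(node)
        for (src, label, dst) in edges:
            if src != node:
                continue
            if not match_calls or label in ("new", "assign"):
                work.append((dst, stack if match_calls else ()))
            elif label[0] == "call":
                work.append((dst, stack + (label[1],)))   # push call site
            elif not stack:
                work.append((dst, ()))                    # partially balanced
            elif stack[-1] == label[1]:
                work.append((dst, stack[:-1]))            # matching return
    result.discard(source)
    return result


# id(p) { return p; }  a = id(x) is call site 1, b = id(y) is call site 2
edges = [
    ("o1", "new", "x"), ("o2", "new", "y"),
    ("x", ("call", 1), "p"), ("y", ("call", 2), "p"),
    ("p", "assign", "id.ret"),
    ("id.ret", ("ret", 1), "a"), ("id.ret", ("ret", 2), "b"),
]
insensitive = reachable_vars(edges, "o1", match_calls=False)
sensitive = reachable_vars(edges, "o1", match_calls=True)
print(sorted(insensitive))
print(sorted(sensitive))
```

Without matching, o1 appears to flow to b through the unrealizable path that enters id at call site 1 and returns to call site 2; with matching, that path is filtered out. On a recursive program this exploration would not terminate as written, which is the situation the "collapse SCCs in call graph" approximation addresses.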

