SLIDE 1

Refinement-Based Context-Sensitive Points-To Analysis for Java

Manu Sridharan, Rastislav Bodík UC Berkeley PLDI 2006

SLIDE 2

What Does Refinement Buy You?

Increased scalability: enable new clients

Memory: orders of magnitude savings
Time: answer for a variable comes back in 1 second
⇒ Suitable for IDE

Precision:

Cast Safety Client

SLIDE 3

Approach: Focus on the Client

Demand-driven: only do requested work

Client-driven refinement: stop when client satisfied

Example:

client asks: “can x point to o?”
we refine until we answer NO (the good answer) or we time out
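The client-driven loop in the example can be sketched roughly as follows. This is only an illustration under stated assumptions: the class RefinementLoop, the Answer enum, and the idea of numbering refinement steps as integer "levels" are hypothetical, and the budget is counted in refinement steps rather than wall-clock time.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of client-driven refinement; not the authors' implementation.
class RefinementLoop {
    enum Answer { NO, MAYBE }

    // mayPointTo.test(level) stands in for "at this refinement level,
    // the analysis still reports that x may point to o".
    static Answer query(IntPredicate mayPointTo, int maxLevels) {
        for (int level = 0; level <= maxLevels; level++) {
            if (!mayPointTo.test(level)) {
                return Answer.NO;   // the good answer: the client is satisfied, stop refining
            }
        }
        return Answer.MAYBE;        // budget exhausted: give up, which plays the role of the timeout
    }
}
```

For instance, a query that is only refuted at the third level would return NO as long as maxLevels is at least 2, and MAYBE otherwise.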

SLIDE 4

Context-Sensitive Analysis Costly

Context-sensitive analysis (def):

Compute result as if all calls inlined
But, collapse recursive methods

Exponential blowup (code growth)

SLIDE 5

Why Not Existing Technique?

Most analyses approximate same way in all code

E.g., k-CFA
Precision lost, esp. for data structures

Our analysis focuses precision where it matters

Fully precise in the limit
Only small amount of code analyzed precisely
First refinement algorithm for Java

SLIDE 6

Points-To Analysis Overview

Compute objects each variable can point to

For each var x, points-to set pt(x)

Model objects with abstract locations

1: x = new Foo() yields pt(x) = { o1 }

Flow-insensitive: statements in any order
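A minimal sketch of the idea, assuming a toy program model with only allocations and copy assignments (no calls, no fields). The class PointsToSketch and its method names are hypothetical and chosen for illustration only.

```java
import java.util.*;

// Hypothetical toy model: allocations "x = new Foo()" and copies "z = x" only.
class PointsToSketch {
    // pt(x) for each variable x, as a set of abstract location names
    final Map<String, Set<String>> pt = new HashMap<>();
    // copy assignments, each stored as {dst, src}
    final List<String[]> copies = new ArrayList<>();

    // Statement "var = new ..." at allocation site `site`
    void alloc(String var, String site) {
        pt.computeIfAbsent(var, k -> new TreeSet<>()).add(site);
    }

    // Statement "dst = src"
    void copy(String dst, String src) {
        copies.add(new String[] {dst, src});
    }

    // Propagate until nothing changes. Because this is a fixed point over
    // all statements, the order in which they were recorded is irrelevant.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                Set<String> src = pt.getOrDefault(c[1], Collections.emptySet());
                Set<String> dst = pt.computeIfAbsent(c[0], k -> new TreeSet<>());
                if (dst.addAll(src)) {
                    changed = true;
                }
            }
        }
    }

    Set<String> pointsTo(String var) {
        return pt.getOrDefault(var, Collections.emptySet());
    }
}
```

Recording copy("z", "x") before alloc("x", "o1") still gives pt(z) = { o1 } after solve(), which is what flow-insensitivity means here.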

SLIDE 7

Points-To Analysis as CFL-Reachability

1) Assignments

x = new Obj(); // o1
y = new Obj(); // o2
z = x;

2) Method calls

id(p) { return p; }
a = id(x);
b = id(y);

3) Heap accesses

c.f = x;
c.g = y;
d = c.f;

[Figure: flow graph over nodes x, y, z, a, b, c, d, p_id, ret_id, with call edges labeled (1 )1 (2 )2 and field edges labeled [f [g ]f]

pt(x) = { o | o flowsTo x }

flowsTo, tightened in three steps:

flowsTo: path exists
flowsTo: balanced call parens
flowsTo: balanced call and field parens
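The effect of the call-paren filter on the id(p) example can be sketched as below. This is an illustration under assumptions, not the authors' algorithm: the class CflSketch is hypothetical, field edges are left out, recursion is assumed absent, and a return edge seen with an empty call stack is allowed.

```java
import java.util.*;

// Hypothetical sketch: reachability filtered by balanced call parens only.
class CflSketch {
    static final class Edge {
        final String to, label;
        Edge(String to, String label) { this.to = to; this.label = label; }
    }

    final Map<String, List<Edge>> graph = new HashMap<>();

    // label "" = plain flow, "(i" = entering a call at site i, ")i" = returning to site i
    void edge(String from, String to, String label) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, label));
    }

    // Variables that obj flows to. With checkCalls == false, any path counts.
    Set<String> flowsTo(String obj, boolean checkCalls) {
        Set<String> reached = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String[]> work = new ArrayDeque<>();
        work.push(new String[] {obj, ""});            // state = {node, call stack}
        while (!work.isEmpty()) {
            String[] state = work.pop();
            String node = state[0], stack = state[1];
            if (!seen.add(node + "|" + stack)) continue;
            if (!node.equals(obj)) reached.add(node);
            for (Edge e : graph.getOrDefault(node, Collections.emptyList())) {
                String next = stack;
                if (checkCalls && e.label.startsWith("(")) {
                    next = stack + "," + e.label.substring(1);   // push call site
                } else if (checkCalls && e.label.startsWith(")") && !stack.isEmpty()) {
                    String top = "," + e.label.substring(1);
                    if (!stack.endsWith(top)) continue;          // mismatched return: prune
                    next = stack.substring(0, stack.length() - top.length());
                }
                work.push(new String[] {e.to, next});
            }
        }
        return reached;
    }

    // Assignments and calls from the slide; "p" and "ret" stand for p_id and ret_id.
    static CflSketch slideExample() {
        CflSketch g = new CflSketch();
        g.edge("o1", "x", "");
        g.edge("o2", "y", "");
        g.edge("x", "z", "");
        g.edge("x", "p", "(1");
        g.edge("y", "p", "(2");
        g.edge("p", "ret", "");
        g.edge("ret", "a", ")1");
        g.edge("ret", "b", ")2");
        return g;
    }
}
```

With plain reachability, o1 reaches both a and b through id. With the call-paren filter, the path o1, x, (1, p, ret, )2, b is rejected, so o1 no longer flows to b.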

SLIDE 8

Summary of Formulation

Graph represents program

Compute reachability with two filters

Language of balanced call parens
Language of balanced field parens

SLIDE 9

Single path problem

Problem: show path is unbalanced

Goal: reduce number of visited edges

Insight: enough to find one unbalanced paren

[Figure: example path over nodes t0 … t12 with edge labels including [f, (1, )1, [h, )5, (7, ]j, [p, )8, ]g, ]k]
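The insight can be illustrated with a small check that stops at the first mismatched paren instead of validating the whole path. The sketch below is hypothetical: the class UnbalancedParen, the string encoding of labels, and the example paths are assumptions. It also assumes that a field close with no pending open counts as unbalanced, while a call close with no pending open is tolerated.

```java
import java.util.*;

// Hypothetical sketch: find one unbalanced paren on a single path.
class UnbalancedParen {
    // Labels look like "[f", "]f" (field parens) or "(1", ")1" (call parens).
    // Returns the index of the first close paren that does not match the most
    // recent pending open of its kind, or -1 if no such paren is found.
    static int firstMismatch(List<String> labels) {
        Deque<String> fields = new ArrayDeque<>();
        Deque<String> calls = new ArrayDeque<>();
        for (int i = 0; i < labels.size(); i++) {
            String label = labels.get(i);
            char kind = label.charAt(0);
            String name = label.substring(1);
            if (kind == '[') {
                fields.push(name);
            } else if (kind == '(') {
                calls.push(name);
            } else if (kind == ']') {
                // Assumption: a field close needs a matching pending open.
                if (fields.isEmpty() || !fields.pop().equals(name)) return i;
            } else if (kind == ')') {
                // Assumption: a call close with no pending open is tolerated.
                if (!calls.isEmpty() && !calls.pop().equals(name)) return i;
            }
        }
        return -1;   // no mismatch seen; pending opens are not checked here
    }
}
```

On the labels [f (1 )1 [h ]j the check stops at ]j, because the pending field open is [h. Edges after that point never need to be visited.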

SLIDE 10

Approximation via Match Edges

Match edges connect matched field parens

From source of open to sink of close
Initially, all pairs connected

Use match edges to skip subpaths

[Figure: path over nodes t0, t1, t2, t4, x with edge labels [f [g [h ]h ]j ]f]
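A rough sketch of how the initial set of match edges could be built, assuming heap accesses are given as stores "base.field = src" (an open paren from src) and loads "dst = base.field" (a close paren into dst). The class MatchEdges and the Access record-like type are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: initial match edges, one per store/load pair on the same field.
class MatchEdges {
    // A store "base.field = var" or a load "var = base.field".
    static final class Access {
        final String var, base, field;
        Access(String var, String base, String field) {
            this.var = var;
            this.base = base;
            this.field = field;
        }
    }

    // Initially all pairs are connected: the base objects are ignored and only
    // the field name has to agree. Each edge runs from the source of the open
    // paren (stored variable) to the sink of the close paren (loaded variable).
    static Set<String> initialMatchEdges(List<Access> stores, List<Access> loads) {
        Set<String> edges = new TreeSet<>();
        for (Access store : stores) {
            for (Access load : loads) {
                if (store.field.equals(load.field)) {
                    edges.add(store.var + " -> " + load.var);
                }
            }
        }
        return edges;
    }
}
```

For the earlier heap-access statements c.f = x; c.g = y; d = c.f; this yields the single match edge x -> d. There is no edge out of y because nothing loads field g.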

SLIDE 11

Refining the Approximation

Refine by removing some match edges

Exposes more of original path for checking

Soundness: traverse match edge ⇒ assume field parens balanced on skipped path

Remove where unbalanced parens expected

Explore deeper levels of pointer indirection

[Figure: path over nodes t0, t1, t3, t4, x with edge labels [f [g [h ]h ]j ]f]

SLIDE 12

Refinement With Both Languages

[Figure: path over nodes t0, t1, t2, t3, t4, t6, x with call edges (1 )1 (2 )3 and field edges [f [g ]g ]f]

Match edges enable approximation of calls

Can only check calls on match-free subpaths

Match edge removal ⇒ more call checking

Key point: refine heap and calls together

Calls: (1 )1 (2 )3
Fields: [f [g ]g ]f

SLIDE 13

Evaluation

SLIDE 14

Experimental Configuration

Implemented in Soot framework

Tested on large benchmarks × 2 clients

SPECjvm98, DaCapo suite
Downcast checking, factory method properties

Refine context-insensitive result

Timeout for long-running queries

SLIDE 15

Precision: Cast Checking

SLIDE 16

Scalability: Time and Memory

Average query time less than 1 second

Interactive performance (for IDE)
At most 13 minutes for casts, 4 minutes for factory client

Very low memory usage: at most 35MB

Of this, 30MB for context-insensitive result
Compare with >2GB for 1-ObjSens analysis

SLIDE 17

Demand-Driven vs. Exhaustive

Demand advantage: no caching required

Hence, low memory overhead
No engineering of efficient sets
Good for changing code; just re-compute

Demand advantage: faster for many clients

Often only care about some variables

Demand disadvantage: slower querying all vars

At most 90 minutes for all app. vars
But, still good precision, memory

SLIDE 18

Conclusions

Novel refinement-based analysis

More precise for tested clients
Interactive performance for queries
Low memory: could scale even more
Relatively easy to implement

Insight: refine heap and calls together

Useful for other balanced-paren analyses?