Outline 1 Introduction 2 Andersens Analysis The Algorithm - - PowerPoint PPT Presentation



SLIDE 1

Introduction Andersen’s Analysis Steensgaard’s Analysis Comparison Hybrids

Outline

1 Introduction
2 Andersen’s Analysis
    The Algorithm
    Constraints
    Complexity
3 Steensgaard’s Analysis
    The Algorithm
    Making it Work
    Complexity
4 Comparison
5 Hybrids

Maks Orlovich On Flow-Insensitive Points-To Analyses

SLIDE 2

Introduction and Rationale

Introduction
  Last time we saw flow-sensitive points-to analysis
    Computes information at every point of a program
    Precise
    The information is a (large) graph: expensive!
  Flow-insensitive analysis
    Computes just one graph for the entire program
    Considers all statements regardless of control flow
    SSA or similar forms can recover some precision

SLIDE 4

A little comparison

Code
  int **x; int *y, *z;
  x = &y;
  x = &z;

Flow-Sensitive
  after x = &y: x → y
  after x = &z: x → z

Flow-Insensitive
  x → y, x → z

SLIDE 6

Andersen’s algorithm

Essentially the direct adaptation of the usual dataflow points-to algorithm to the flow-insensitive setting
Since we do not know the order of statements, we can say less:
  x = &y: can only know that y ∈ pt(x)
  x = y: can only know that pt(y) ⊆ pt(x)
When analyzing, collect such constraints
A fixed-point computation over the constraints then yields the actual points-to sets
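A minimal sketch of that fixed-point computation, restricted to the two constraint forms above. The function name `solve` and the pair encoding of statements are assumptions for illustration; a real implementation would use a worklist rather than rescanning all constraints.

```python
def solve(address_of, copies):
    """address_of: pairs (x, y) for x = &y; copies: pairs (x, y) for x = y."""
    pt = {}
    for x, y in address_of:
        pt.setdefault(x, set()).add(y)  # y ∈ pt(x)
    changed = True
    while changed:  # repeat until no constraint is violated
        changed = False
        for x, y in copies:
            src = pt.get(y, set())
            dst = pt.setdefault(x, set())
            if not src <= dst:  # pt(y) ⊆ pt(x) does not hold yet
                dst |= src
                changed = True
    return pt

# p = &a; q = p; r = q, with the copies listed out of order on purpose.
result = solve([("p", "a")], [("r", "q"), ("q", "p")])
print(result)
```

Because statement order is ignored, the copy `r = q` sees an empty pt(q) on the first pass and only picks up `a` on the second, which is why iteration to a fixed point is needed.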

SLIDE 7

Constraints for C

1 x = &y    y ∈ pt(x)
2 x = y     pt(y) ⊆ pt(x)

SLIDE 8

Constraints for C II

x = *y

[Figure: points-to graph over the nodes y, w, z, a, b, c, x]

∀a ∈ pt(y). pt(a) ⊆ pt(x)

SLIDE 9

Constraints for C III

*x = y

[Figure: points-to graph over the nodes x, w, b, y, z, a]

∀w ∈ pt(x). pt(y) ⊆ pt(w)

SLIDE 10

Constraints for C — Summary

1 x = &y    y ∈ pt(x)
2 x = y     pt(y) ⊆ pt(x)
3 x = *y    ∀a ∈ pt(y). pt(a) ⊆ pt(x)
4 *x = y    ∀w ∈ pt(x). pt(y) ⊆ pt(w)
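The four rules can be combined into a small worklist solver. The sketch below is one possible formulation, not the talk's implementation: the tuple encoding `(kind, x, y)` and the names `andersen`, `succ`, `loads` and `stores` are assumptions. Subset constraints are kept as edges of a constraint graph, and the two dereferencing rules add new edges as points-to sets grow.

```python
from collections import defaultdict

def andersen(stmts):
    """stmts: tuples (kind, x, y) with kind one of
    'addr' (x = &y), 'copy' (x = y), 'load' (x = *y), 'store' (*x = y)."""
    pt = defaultdict(set)       # pt[v]: variables v may point to
    succ = defaultdict(set)     # edge a -> b encodes pt(a) ⊆ pt(b)
    loads = defaultdict(list)   # loads[y]: every x with x = *y
    stores = defaultdict(list)  # stores[x]: every y with *x = y
    work = []

    def add_edge(a, b):
        if b not in succ[a]:
            succ[a].add(b)
            work.append(a)      # pt(a) must flow along the new edge

    for kind, x, y in stmts:
        if kind == "addr":
            pt[x].add(y)
            work.append(x)
        elif kind == "copy":
            add_edge(y, x)
        elif kind == "load":
            loads[y].append(x)
        elif kind == "store":
            stores[x].append(y)
        else:
            raise ValueError(kind)

    while work:
        v = work.pop()
        for a in list(pt[v]):
            for x in loads[v]:   # x = *v: pt(a) ⊆ pt(x)
                add_edge(a, x)
            for y in stores[v]:  # *v = y: pt(y) ⊆ pt(a)
                add_edge(y, a)
        for w in list(succ[v]):  # propagate along subset edges
            if not pt[v] <= pt[w]:
                pt[w] |= pt[v]
                work.append(w)
    return {v: s for v, s in pt.items() if s}

# p = &a; q = &b; *p = q; r = *p;
result = andersen([("addr", "p", "a"), ("addr", "q", "b"),
                   ("store", "p", "q"), ("load", "r", "p")])
print(result)
```

In the example, the store through p makes a point to b, and the load through p then makes r point to b as well.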

SLIDE 11

Constraints for Java

1 Stack variables cannot be pointed to, only heap objects can be
2 Can take advantage of type safety
3 The following is one memory abstraction:
    Name objects by allocation site
    Variables point to objects
    Fields of objects point to objects

SLIDE 12

Constraints for Java II

1 x = y      pt(y) ⊆ pt(x)
2 y.f = x    ∀o ∈ pt(y). pt(x) ⊆ pt(o.f)
3 x = y.f    ∀o ∈ pt(y). pt(o.f) ⊆ pt(x)
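A sketch of these rules under the allocation-site abstraction. The statement encoding, the `'new'` statement form, the site labels such as `"A1"`, and the name `solve_java` are all assumptions made for illustration; the solver simply re-applies every rule until nothing changes.

```python
from collections import defaultdict

def solve_java(stmts):
    """stmts: ('new', x, site), ('copy', x, y) for x = y,
    ('store', y, f, x) for y.f = x, ('load', x, y, f) for x = y.f."""
    var_pt = defaultdict(set)    # variable -> allocation sites
    field_pt = defaultdict(set)  # (site, field) -> allocation sites
    changed = True

    def include(dst, src):
        """Enforce src ⊆ dst, recording whether dst grew."""
        nonlocal changed
        if not src <= dst:
            dst |= src
            changed = True

    while changed:
        changed = False
        for s in stmts:
            if s[0] == "new":
                include(var_pt[s[1]], {s[2]})
            elif s[0] == "copy":
                include(var_pt[s[1]], var_pt[s[2]])
            elif s[0] == "store":
                _, y, f, x = s
                for o in list(var_pt[y]):
                    include(field_pt[(o, f)], var_pt[x])
            elif s[0] == "load":
                _, x, y, f = s
                for o in list(var_pt[y]):
                    include(var_pt[x], field_pt[(o, f)])
    return dict(var_pt), dict(field_pt)

# a = new A(); b = new B(); a.f = b; c = a; d = c.f;
variables, fields = solve_java([
    ("new", "a", "A1"), ("new", "b", "B1"),
    ("store", "a", "f", "b"), ("copy", "c", "a"), ("load", "d", "c", "f"),
])
print(variables, fields)
```

Only heap objects (sites) ever appear as points-to targets here, which reflects the first point of the previous slide: stack variables are never pointed to.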

SLIDE 13

Cost of the algorithm

Asymptotically
  Implicitly have a constraint graph with O(n) nodes and O(n^2) edges
  The fixed-point computation essentially computes a transitive closure, which is an O(n^3) computation
In practice
  Usually, nowhere near that bad...
  ... but can be bad enough to be unusable

SLIDE 15

Actual Performance (from [ShHo97])

Name       Size (LoC)   Time (sec)
triangle         1986          2.9
gzip             4584          1.7
li               6054        738.5
bc               6745          5.5
less            12152          1.9
make            15564        260.8
tar             18585         23.2
espresso        22050       1373.6
screen          24300        514.5

Measured on a 75 MHz SuperSPARC with 256 MB RAM

SLIDE 16

Reducing the cost

All nodes on a cycle in the constraint graph must have the same points-to set, so each cycle can be collapsed to a single node [FäFoSuAi98]
  In some cases runs as much as 50x faster
  li is done in 30.25 seconds, espresso in 27 seconds, on 167-400 MHz UltraSPARC machines
If two variables have the same points-to sets, they can be collapsed [RoCh00]
  Around 2x improvement in run time, 3x lower memory usage
BDDs (Reduced Ordered Binary Decision Diagrams) have been used to represent the graph more compactly [BeLhQiHeUm03]
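To illustrate the cycle-collapsing idea, the sketch below finds strongly connected components of a subset-constraint graph with Tarjan's algorithm and maps every node to one representative per component. This is an offline pass chosen for brevity; it is not the online cycle elimination of [FäFoSuAi98], and the name `collapse_cycles` and the dictionary encoding of the graph are assumptions.

```python
def collapse_cycles(edges):
    """edges: dict node -> set of successors (a -> b means pt(a) ⊆ pt(b)).
    Returns a dict mapping each node to the representative of its cycle."""
    index, low, on_stack, stack, rep = {}, {}, set(), [], {}

    def visit(v):  # recursive Tarjan; fine for small graphs only
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            while True:
                w = stack.pop()
                on_stack.discard(w)
                rep[w] = v
                if w == v:
                    break

    nodes = set(edges) | {w for ws in edges.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return rep

# a ⊆ b ⊆ c ⊆ a forms a cycle; d only receives from c.
rep = collapse_cycles({"a": {"b"}, "b": {"c"}, "c": {"a", "d"}})
print(rep)
```

After this pass a solver can keep a single points-to set per representative, so a, b and c share one set while d keeps its own.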

SLIDE 18

Overview

Can view the problem as assigning a synthetic type to each reference, so that it points to objects of that type
A type is defined recursively as pointing to another type
Hence, the analysis proceeds as a type inference algorithm, doing unification
  x = y: τ(x) = τ(y), so take pt(x) = pt(y)
Each type points to one other type, so the points-to graph has at most one out-edge per node (but each node can stand for many variables)
  Graph is of linear size: fast!
  Limits precision
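A minimal sketch of the unification view using a union-find structure. It is an illustration under assumptions: the class name `Steensgaard`, its method names, and the placeholder `("tmp", n)` locations are invented here, and function pointers and the practical refinements are left out. Every equivalence class has at most one pointee, and joining two classes recursively joins their pointees.

```python
import itertools

class Steensgaard:
    def __init__(self):
        self.parent = {}  # union-find forest over nodes
        self.target = {}  # class representative -> the one node it points to
        self._fresh = itertools.count()

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def pointee(self, n):
        """Class that n's class points to, created on demand."""
        r = self.find(n)
        if r not in self.target:
            self.target[r] = ("tmp", next(self._fresh))  # placeholder location
        return self.find(self.target[r])

    def join(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.target.pop(a, None), self.target.pop(b, None)
        self.parent[a] = b
        if ta is not None and tb is not None:
            self.target[b] = tb
            self.join(ta, tb)  # keep out-degree at most 1
        elif ta is not None or tb is not None:
            self.target[b] = ta if ta is not None else tb

    def addr(self, x, y):   # x = &y
        self.join(self.pointee(x), y)

    def copy(self, x, y):   # x = y
        self.join(self.pointee(x), self.pointee(y))

    def load(self, x, y):   # x = *y
        self.join(self.pointee(x), self.pointee(self.pointee(y)))

    def store(self, x, y):  # *x = y
        self.join(self.pointee(self.pointee(x)), self.pointee(y))

s = Steensgaard()
s.addr("p", "a")  # p = &a
s.addr("q", "b")  # q = &b
s.copy("p", "q")  # p = q
print(s.find("a") == s.find("b"))
```

In the example, the copy `p = q` merges a and b into one node, so q appears to point to a as well. A subset-based analysis would keep pt(q) = {b}, which is the precision lost in exchange for the linear-size graph.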

SLIDE 19

Processing of assignments

x = *y

[Figure: four animation frames (slides 19-22) of the points-to graph over the nodes x, y, w, a, b, z, v as the assignment x = *y is processed]

SLIDE 23

Making it work

There are a couple of problems that arise in practice
Building a call graph
  Make the type a pair, including a function pointer portion
  Compute the set of functions a pointer may point to using unification as well
Integer assignments to pointers / lack of type safety
  int *a = 0, *x = a, *y = a;
  Plain unification will collapse them into a single node
  Should only do unification if the RHS is known to be a pointer
    Don’t unify if we don’t see the RHS pointing to anything, just record an edge
    Perform the unification if the RHS later gets to point to something
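The deferred-unification idea can be sketched as follows. This is a deliberately simplified, single-level model (only `x = &y` and `x = y`, no recursive unification of pointees), and the class and attribute names are assumptions for illustration rather than the algorithm as published.

```python
class ConditionalUnifier:
    def __init__(self):
        self.parent = {}   # union-find over pointed-to locations
        self.ref = {}      # variable -> a location it points to (absent: none yet)
        self.pending = {}  # variable y -> variables x with a deferred x = y

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            n = self.parent[n]
        return n

    def _point(self, x, loc):
        """Make x point to loc, unifying with any existing target of x."""
        if x in self.ref:
            a, b = self.find(self.ref[x]), self.find(loc)
            self.parent[a] = b  # unify the two targets
        else:
            self.ref[x] = loc
            for waiter in self.pending.pop(x, []):  # x is now a real pointer
                self._point(waiter, loc)

    def addr(self, x, y):  # x = &y
        self._point(x, y)

    def copy(self, x, y):  # x = y
        if y in self.ref:
            self._point(x, self.ref[y])
        else:
            self.pending.setdefault(y, []).append(x)  # just record an edge

u = ConditionalUnifier()
u.copy("x", "a")  # x = a, where a points to nothing so far
u.copy("y", "a")  # y = a
u.addr("x", "p")  # x = &p
u.addr("y", "q")  # y = &q
separate_before = u.find("p") != u.find("q")
u.addr("a", "r")  # a = &r: the deferred unifications now fire
print(separate_before, u.find("p") == u.find("q"))
```

While a points to nothing, the targets of x and y stay separate; once a becomes a pointer, the recorded edges are replayed and p, q and r end up in one class.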

SLIDE 24

Complexity

It’s fast!
Asymptotically, O(N α(N, N)), where α is the (very slowly growing) inverse Ackermann function
Has been shown to analyze programs with millions of lines of code in under a minute

SLIDE 26

Some numbers — time (from [ShHo97])

Name       Size (LoC)   Andersen (sec)   Steensgaard (sec)
triangle         1986              2.9                 0.8
gzip             4584              1.7                 1.1
li               6054            738.5                 4.7
bc               6745              5.5                 1.6
less            12152              1.9                 1.5
make            15564            260.8                 6.1
tar             18585             23.2                 3.6
espresso        22050           1373.6                10.2
screen          24300            514.5                10.1

Measured on a 75 MHz SuperSPARC with 256 MB RAM

SLIDE 27

Some numbers — Average alias set size (from [ShHo97])

Name       Size (LoC)   Andersen   Steensgaard
triangle         1986       4.01         21.93
gzip             4584       2.96         25.17
li               6054     171.14        457.89
bc               6745      18.57         83.55
less            12152       7.11         63.75
make            15564      74.70        414.03
tar             18585      17.41          53.7
espresso        22050     109.53         143.4
screen          24300     106.89         652.8
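To read the time and alias-set figures together, the short script below computes, per benchmark, how much faster Steensgaard's analysis ran and how much larger its average alias sets were. The numbers are copied from the [ShHo97] figures quoted in this talk; the variable names are arbitrary.

```python
# (name, Andersen sec, Steensgaard sec, Andersen alias size, Steensgaard alias size)
rows = [
    ("triangle", 2.9, 0.8, 4.01, 21.93),
    ("gzip", 1.7, 1.1, 2.96, 25.17),
    ("li", 738.5, 4.7, 171.14, 457.89),
    ("bc", 5.5, 1.6, 18.57, 83.55),
    ("less", 1.9, 1.5, 7.11, 63.75),
    ("make", 260.8, 6.1, 74.70, 414.03),
    ("tar", 23.2, 3.6, 17.41, 53.7),
    ("espresso", 1373.6, 10.2, 109.53, 143.4),
    ("screen", 514.5, 10.1, 106.89, 652.8),
]

# Time ratio: how many times faster Steensgaard ran than Andersen.
speedup = {name: ta / ts for name, ta, ts, _, _ in rows}
# Size ratio: how many times larger Steensgaard's average alias sets were.
growth = {name: ss / sa for name, _, _, sa, ss in rows}

for name in speedup:
    print(f"{name:10s} speedup {speedup[name]:7.1f}x   alias-set growth {growth[name]:5.2f}x")
```

On these figures, Steensgaard's analysis is faster on every benchmark and less precise on every benchmark, with the size of both effects varying widely between programs.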

SLIDE 28

Relation between algorithms

Andersen’s algorithm can be viewed as type inference, too
  But with subtyping: x = y gives τ(y) <: τ(x), so pt(y) ⊆ pt(x)
Steensgaard’s algorithm can be thought of as restricting the out-degree of the graph produced by Andersen’s algorithm to 1, by merging nodes when that is exceeded

SLIDE 30

k-limiting

[ShHo97] provides a k-limiting algorithm, which with k = 1 behaves as Steensgaard, and with k = N as Andersen
  Assign variables k colors
  Have a separate points-to slot for each color
  Do a few runs with different assignments, and intersect the results (k^2 log_k N factor slowdown)
Average alias set size shrank by about 1.78
About 2x faster than Andersen when that runs slowly, but often slower than it: very high constant factors

SLIDE 31

One level flow

[Das2000] introduced another heuristic.
Algorithm
  Observation: C programs mostly use pointers to pass in parameters, which are basically assignments
  Solution: Accurately model the simple cases by using containment constraints to refer to points-to sets of symbols in the assignment, but unify everything further out
  Can get some context sensitivity on top of it, by labeling edges and doing CFL reachability (makes it O(n^3))

SLIDE 32

Accuracy
  Produces nearly the same sets as Andersen for most test programs (except one that used pointers to pointers)
Performance
  Asymptotically: linear memory use, quadratic time (in the constraint-solving phase)
  About 2x slower than Steensgaard’s algorithm in practice
  Analyzes 1.4 million lines of code (Word97) in about 2 minutes on a 450 MHz Xeon

SLIDE 33

Discussion...
