SLIDE 1

Dynamic Purity Analysis for Java Programs

Haiying Xu, Chris Pickett, Clark Verbrugge
{hxu31,cpicke,clump}@sable.mcgill.ca

Sable Research Group, McGill University
Montréal, Québec, Canada H3A 2A7

PASTE 2007

June 14, 2007

PASTE 2007 Haiying Xu, Chris Pickett, Clark Verbrugge 1/30

SLIDE 2

Outline

1. Introduction and Motivation
2. Static Purity Analysis
3. Dynamic Purity Analysis
4. Experimental Results
5. Memoization
6. Conclusion and Future Work

SLIDE 3

What is Method Purity?

Roughly, a pure method has no externally visible side effects.

Different variations on purity are possible:
- Sălcianu and Rinard: can create, modify and return new objects
- Rountev: similar, but cannot return a new object

SLIDE 4

Why is Method Purity Important?

Artzi, Kiezun, Glasser, Ernst:
- program comprehension
- modelling
- formal verification
- compiler optimization
- memoization
- thread level speculation
- stack allocation
- refactoring
- test input generation
- regression oracle creation
- invariant detection
- specification mining
- program slicing

SLIDE 6

Contributions

In this work, we:
- Design and implement dynamic purity analysis.
- Investigate several different purity definitions.
- Introduce three different dynamic purity metrics.
- Implement memoization as a purity consumer.

SLIDE 8

Static Purity Analysis

Consider the classic functional form of purity. A method is strongly pure iff it:
- Does not r/w the heap or static data
- Does not perform any synchronization
- Does not invoke any native method
- Does not invoke any impure method
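As a rough illustration of these four conditions, the sketch below sets one method that would qualify as strongly pure against methods that each break one condition. The class and method names are hypothetical and do not appear in the talk.

```java
// Hypothetical examples; none of these names come from the original slides.
final class StrongPurityExamples {
    static int counter; // static data

    // Strongly pure: touches only its parameter and locals.
    static int square(int n) {
        return n * n;
    }

    // Not strongly pure: reads and writes static data.
    static int next() {
        return ++counter;
    }

    // Not strongly pure: System.nanoTime() is a native method.
    static long stamp() {
        return System.nanoTime();
    }

    // Not strongly pure: invokes an impure method, so the impurity propagates.
    static int squareOfNext() {
        return square(next());
    }

    // Not strongly pure: a synchronized method acquires a monitor.
    static synchronized int lockedSquare(int n) {
        return square(n);
    }
}
```

Only `square` satisfies all four conditions; `squareOfNext` shows why the last condition makes the definition transitive over the call graph.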

SLIDE 9

Static Purity Analysis Framework

SLIDE 10

Static Analysis Results

metric                      comp   db  jack  javac  jess  mpeg   rt
static method purity         14%  13%   13%    12%   13%   13%  13%
dynamic method purity         6%   6%    6%     5%    5%    6%   5%
dynamic invocation purity    ≈0%   2%   10%    10%    6%   16%   3%
dynamic bytecode purity      ≈0%   2%    1%    ≈0%   ≈0%    2%  ≈0%

- Static/dynamic method purity: % of reachable/reached methods that are pure
- Dynamic invocation purity: % of all invocations that are pure
- Dynamic bytecode purity: % of bytecode instruction stream contained in a pure method
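To make the three dynamic metrics concrete, here is a minimal sketch that computes them from per-method profile data. The `MethodProfile` fields and the sample numbers are invented for illustration, and the sketch assumes purity is a single verdict per method and that each bytecode is attributed to the method executing it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical profile format; field names and sample data are not from the talk.
final class PurityMetrics {
    static final class MethodProfile {
        final String name;
        final boolean pure;      // purity verdict for the whole method
        final long invocations;  // number of times the method was called
        final long bytecodes;    // bytecodes executed inside the method

        MethodProfile(String name, boolean pure, long invocations, long bytecodes) {
            this.name = name;
            this.pure = pure;
            this.invocations = invocations;
            this.bytecodes = bytecodes;
        }
    }

    static double percent(long part, long whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    // % of reached methods that are pure
    static double methodPurity(List<MethodProfile> reached) {
        long pure = 0;
        for (MethodProfile m : reached) if (m.pure) pure++;
        return percent(pure, reached.size());
    }

    // % of all invocations that are invocations of pure methods
    static double invocationPurity(List<MethodProfile> reached) {
        long pure = 0, all = 0;
        for (MethodProfile m : reached) {
            all += m.invocations;
            if (m.pure) pure += m.invocations;
        }
        return percent(pure, all);
    }

    // % of the executed bytecode stream contained in pure methods
    static double bytecodePurity(List<MethodProfile> reached) {
        long pure = 0, all = 0;
        for (MethodProfile m : reached) {
            all += m.bytecodes;
            if (m.pure) pure += m.bytecodes;
        }
        return percent(pure, all);
    }

    // Invented sample: a short, hot pure method and a long, rarely called impure one.
    static List<MethodProfile> sample() {
        return Arrays.asList(
            new MethodProfile("getter", true, 8, 40),
            new MethodProfile("update", false, 2, 160));
    }
}
```

On the invented sample, method purity is 50%, invocation purity is 80%, and bytecode purity is 20%, which shows how the same run can look very different depending on the metric chosen.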

SLIDE 12

Motivation for Dynamic Purity Analysis

Static purity analysis is hard:
- Implementation is complex
- Whole-program analysis is expensive

Dynamic evaluation tells a different story:
- Static vs. dynamic call graph
- Choice of metrics
- Input dependence

SLIDE 13

Motivation for Dynamic Purity Analysis

Purity can also depend on method input:

    int x;

    void foo(boolean b) {
        if (b)
            x = 10;
    }

If we only ever execute foo(false), foo is pure!
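One way to picture how a dynamic analysis reaches this verdict is the sketch below, which marks a method impure only once a field write is actually observed. The `observedImpure` set and the `dynamicallyPure` helper are hypothetical stand-ins for VM-level instrumentation, not the mechanism used in the talk.

```java
import java.util.HashSet;
import java.util.Set;

final class InputDependentPurity {
    // Hypothetical stand-in for VM instrumentation: methods seen writing to the heap.
    static final Set<String> observedImpure = new HashSet<>();

    int x;

    void foo(boolean b) {
        if (b) {
            // A real analysis would hook the PUTFIELD bytecode instead of this manual call.
            observedImpure.add("foo");
            x = 10;
        }
    }

    // A method counts as dynamically pure until a write has been observed in it.
    static boolean dynamicallyPure(String method) {
        return !observedImpure.contains(method);
    }
}
```

After any number of `foo(false)` calls, `dynamicallyPure("foo")` still holds; a single `foo(true)` call changes the verdict for that run.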

SLIDE 14

Different Kinds of Dynamic Purity

Four different kinds of dynamic purity:
- Strong: the same as strong static purity
  - no heap or static r/w
  - no calls to impure methods
- Moderate:
  - allow object allocation, if the object does not escape
  - allow heap r/w to non-escaping objects
  - allow calls to certain impure methods
- Weak: moderate, but no limitations on heap reads
- Once-Impure: weak, but no restrictions on the first invocation
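The four kinds form a ladder from strictest to most permissive, which the following sketch tries to capture by classifying a method from per-invocation observations. It is a simplification under stated assumptions: the `Invocation` flags are hypothetical, and synchronization, native calls, and callee purity (including the "certain impure methods" that moderate purity tolerates) are ignored.

```java
import java.util.List;

// Simplified sketch; the Invocation flags are assumptions, not the talk's data model.
final class PurityClassifier {
    enum Purity { STRONG, MODERATE, WEAK, ONCE_IMPURE, IMPURE }

    // What one invocation of the method was observed to do.
    static final class Invocation {
        final boolean touchesLocalObjects; // allocates or r/w objects that do not escape
        final boolean readsOtherHeap;      // reads any other heap or static data
        final boolean writesOtherHeap;     // writes any other heap or static data

        Invocation(boolean touchesLocalObjects, boolean readsOtherHeap, boolean writesOtherHeap) {
            this.touchesLocalObjects = touchesLocalObjects;
            this.readsOtherHeap = readsOtherHeap;
            this.writesOtherHeap = writesOtherHeap;
        }
    }

    // Returns the strictest kind of purity that every observed invocation satisfies.
    static Purity classify(List<Invocation> invocations) {
        boolean strong = true, moderate = true, weak = true, onceImpure = true;
        for (int i = 0; i < invocations.size(); i++) {
            Invocation inv = invocations.get(i);
            if (inv.touchesLocalObjects || inv.readsOtherHeap || inv.writesOtherHeap) strong = false;
            if (inv.readsOtherHeap || inv.writesOtherHeap) moderate = false;
            if (inv.writesOtherHeap) {
                weak = false;
                if (i > 0) onceImpure = false; // only the first invocation is unrestricted
            }
        }
        if (strong) return Purity.STRONG;
        if (moderate) return Purity.MODERATE;
        if (weak) return Purity.WEAK;
        if (onceImpure) return Purity.ONCE_IMPURE;
        return Purity.IMPURE;
    }
}
```

Under these assumptions, a method whose first call writes a static field and whose later calls do not is classified as once-impure, while the same write on any later call makes it impure.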

SLIDE 15

Moderate Purity

    class Obj {
        int f;

        public Obj() { f = 10; }

        Obj bar() {
            Obj o = new Obj();
            return o;
        }

        int foo() { // moderately pure
            Obj o = bar();
            return o.f;
        }
        ...

SLIDE 16

Weak and Once-Impure Purity

        ...
        static int x;

        int baz(Obj o) { // weakly pure
            return o.f;
        }

        int baf(boolean b) { // once-impure for TF+
            if (b) {
                Obj.x = 9 * 6; // write to static field
            }
            return 42;
        }
    }

SLIDE 18

Method Purity

[Bar chart: method purity (%) per benchmark (comp, db, jack, javac, jess, mpeg, rt) for strong_s, strong_d, moderate, weak, and once_impure.]

Fairly uniform across benchmarks. Moderate purity does not improve much, because a moderately pure method cannot dereference its input.

SLIDE 19

Invocation Purity

[Bar chart: invocation purity (%) per benchmark (comp, db, jack, javac, jess, mpeg, rt) for strong_s, strong_d, moderate, weak, and once_impure.]

Unpredictable from method purity. Two different groups appear.

SLIDE 20

Bytecode Purity

[Bar chart: bytecode purity (%) per benchmark (comp, db, jack, javac, jess, mpeg, rt) for strong_s, strong_d, moderate, weak, and once_impure.]

Somewhat predictable from invocation purity. Three different groups appear.

SLIDE 21

Sources of Impurity

Method impurity:

source      comp   db  jack  javac  jess  mpeg   rt
PUTFIELD     27%  29%   21%    21%   23%   24%  28%
PUTFIELD+    52%  52%   58%    66%   61%   60%  53%

Invocation impurity:

source      comp   db  jack  javac  jess  mpeg   rt
PUTFIELD     81%  82%   45%    25%   24%   40%  71%
PUTFIELD+    19%  17%   37%    58%   19%   60%  28%

Bytecode impurity:

source      comp   db  jack  javac  jess  mpeg   rt
PUTFIELD     21%  85%   38%    25%    8%   11%  33%
PUTFIELD+    79%  13%   48%    66%   45%   89%  66%

PUTFIELD is the main reason for impurity.

SLIDE 23

Using Purity for Memoization

Overview of memoization:
- Maps method input to output
- Allows repeat invocations of pure methods to be skipped
- Once-impure purity is a natural fit

How can we use memoization?
- Candidate for optimization
- Good functional sanity test
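The input-to-output mapping can be sketched at source level as a small wrapper around a method. This is only an illustration: the `Memoizer` class is hypothetical, it keys the table on an explicit argument, and it does not model heap state reachable from the input, which a VM-level implementation would also have to treat as part of the input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical source-level sketch; the talk's framework works inside the VM.
final class Memoizer<I, O> {
    private final Function<I, O> method;
    private final Map<I, O> table = new HashMap<>();
    private int hits;
    private int misses;

    Memoizer(Function<I, O> method) {
        this.method = method;
    }

    O invoke(I input) {
        if (table.containsKey(input)) {
            hits++;                      // repeat invocation: skip the method body
            return table.get(input);
        }
        misses++;                        // first invocation for this input: execute and record
        O output = method.apply(input);
        table.put(input, output);
        return output;
    }

    int hits() { return hits; }

    int misses() { return misses; }
}
```

Because the first invocation for each input always executes the method body, any side effects confined to that first call still happen, which is one way to read why once-impure purity fits memoization naturally.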

SLIDE 24

Memoization Framework

SLIDE 25

Applying Memoization

Factors influencing memoization decisions:
- Method size (50 instructions)
- Input size (100 KB; otherwise potentially the whole heap!)
- Hashtable warm-up period (1000 cold-start misses)
- Hit ratio (better than 1 in 10)
- Global memory consumption (1 GB)

These are fairly generous limits...
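A hedged sketch of how such limits might combine into a single decision follows. The numeric limits come from the slide; how they interact is an assumption, as is reading KB and GB as binary units. The method-size factor is left out because the slide does not say whether 50 instructions is a lower or an upper bound.

```java
// Hypothetical policy; the way the limits are combined is an assumption.
final class MemoPolicy {
    static final long MAX_INPUT_BYTES = 100L * 1024; // 100 KB, assumed binary
    static final long MAX_GLOBAL_BYTES = 1L << 30;   // 1 GB, assumed binary
    static final int WARM_UP_MISSES = 1000;          // cold-start misses tolerated
    static final double MIN_HIT_RATIO = 0.1;         // "better than 1 in 10"

    private long hits;
    private long misses;

    void recordHit() { hits++; }

    void recordMiss() { misses++; }

    boolean shouldMemoize(long inputBytes, long globalBytesInUse) {
        if (inputBytes > MAX_INPUT_BYTES) return false;        // input too large to hash
        if (globalBytesInUse > MAX_GLOBAL_BYTES) return false; // tables use too much memory
        if (misses < WARM_UP_MISSES) return true;              // still warming up
        return (double) hits / (hits + misses) > MIN_HIT_RATIO;
    }
}
```

In this reading, a method keeps being memoized through its first 1000 misses and is then kept only while more than one lookup in ten is a hit.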

SLIDE 26

Execution Times

[Bar chart: execution time (s) per benchmark (comp, db, jack, javac, jess, mpeg, rt) for vanilla, online, online+memo, and offline+memo.]

SLIDE 27

Memoization Improvements

Why doesn't memoization achieve speedup?
- Small number of memoized methods
- Most memoized methods are short
- Usually, less than 1% of bytecode is skipped (best case 9%)
- Implementation limitations

Potential improvements:
- Consider purity on a per-input basis
- Track only those fields read by the method
- Adaptively turn off memoization if no benefit
- Allow for cycles in input data structures

SLIDE 29

Conclusions

- Static results correlate weakly with dynamic behaviour
- We considered three different metrics:
  - method purity varies only slightly
  - invocation purity separates benchmarks into two groups
  - bytecode purity separates benchmarks into three groups
- Consumer applications can impose strong constraints

SLIDE 30

Future Work

Future work:
- Consider purity at different granularities
- Visualize purity evolution over time
- Support arbitrary kinds of dynamic purity
- Memoization improvements
- Other applications besides memoization (lots!)
  - e.g., speculate past nearly pure methods
