Heap Cloning: Enabling Dynamic Symbolic Execution of Java Programs



SLIDE 1

11/27/2011 1

Heap Cloning: Enabling Dynamic Symbolic Execution of Java Programs

Saswat Anand Mary Jean Harrold

Supported by NSF (CCF-0725202, CCF-0541048) and IBM (Software Quality Innovation Award)

Symbolic Execution

[Chart: estimated number of publications related to symbolic execution per year, 2005 to 2011, by area: Databases, Systems, Verification, Security, Programming Languages, Software Engineering.]

The list of papers is available at: http://sites.google.com/site/symexbib

SLIDE 2

Symbolic Execution

The goal of this work is to enable correct symbolic execution of real-world Java programs with minimal manual effort.

Outline

  • Problem
  • Heap Cloning
  • Empirical evaluation
  • Conclusion
SLIDE 3

Symbolic Execution of Real-World Programs

P’ not symbolically executed because:

  • 1. P’ uses difficult-to-handle Java features such as native methods.
  • 2. Symbolic execution of P’ is mostly unnecessary.

[Diagram: the program is divided into P (to be symbolically executed) and P’ (not to be symbolically executed).]

SLIDE 4

Symbolic Execution with User-specified Models

[Diagram: the program consists of P (to be symbolically executed) and the user-specified models PM.]

Replace P’ with user-specified models PM.

Impractical! Requires significant manual effort.

Dynamic Symbolic Execution

[Diagram: the program consists of P and P’, connected by call/return; P runs under symbolic execution and P’ under concrete execution.]

P: to be symbolically executed
P’: not to be symbolically executed

No manual effort to write models

SLIDE 5

Example

    class A {
        int f;
        A(int x) { this.f = x; }
        native void negate();
        void main() {
            1  x = read();
            2  m = new A(x);
            3  if (m.f < 0)
            4      m.negate();
            5  if (m.f < 0)
            6      error();
            7  exit();
        }
    }

Java version of native method negate:

    void negate() { this.f = -this.f; }

Task: Perform dynamic symbolic execution along the path (1-2-3-4-5-7) that the program takes for concrete input -10.

Requirement: Native method negate() is not to be symbolically executed.
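As a rough, runnable sketch of this example, the Java version of negate can stand in for the native method. The names ExampleA and runPath are hypothetical; read() is replaced by a parameter, and error() and exit() are replaced by recording the numbers of the executed statements.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the slide's class A; the native negate() is replaced by its Java version.
class ExampleA {
    int f;

    ExampleA(int x) { this.f = x; }

    // Java version of the native method negate, as given on the slide.
    void negate() { this.f = -this.f; }

    // Runs the seven numbered statements and returns the statement numbers executed.
    // The parameter stands in for read(); this helper is an assumption of the sketch.
    static List<Integer> runPath(int input) {
        List<Integer> path = new ArrayList<>();
        path.add(1); int x = input;
        path.add(2); ExampleA m = new ExampleA(x);
        path.add(3);
        if (m.f < 0) {
            path.add(4); m.negate();
        }
        path.add(5);
        if (m.f < 0) {
            path.add(6); // error() would be called here
        }
        path.add(7); // exit()
        return path;
    }

    public static void main(String[] args) {
        System.out.println(runPath(-10)); // prints [1, 2, 3, 4, 5, 7]
    }
}
```

For the concrete input -10 the recorded path matches the path named in the task.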

SLIDE 6

Dynamic Symbolic Execution

Each variable is shown as its concrete value followed by its symbolic value.

State before: m = new A(x)

    x:    -10, X0
    PC:   true

Path Constraint (PC) of a path is a constraint on input values. If PC is unsatisfiable, the path is infeasible. If PC is satisfiable, any solution of PC is a program input that takes the corresponding program path.
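To make the definition concrete, here is a minimal sketch that treats a path constraint over a single integer input as a predicate and searches a small range for a solution. The class PathConstraintDemo, the method solve, and the brute-force search are assumptions for illustration only; a real tool would call a constraint solver instead.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

// Toy stand-in for a constraint solver: tries every input in a small range.
class PathConstraintDemo {
    // Returns some input in [lo, hi] that satisfies the path constraint, or empty if none does.
    static OptionalInt solve(IntPredicate pc, int lo, int hi) {
        for (int x0 = lo; x0 <= hi; x0++) {
            if (pc.test(x0)) {
                return OptionalInt.of(x0);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // PC "true": every input takes the path, so a solution is found immediately.
        System.out.println(solve(x0 -> true, -100, 100).isPresent());
        // A contradictory PC has no solution in the range, so the path is reported infeasible.
        System.out.println(solve(x0 -> x0 < 0 && x0 >= 0, -100, 100).isPresent());
    }
}
```

Any value returned by solve is an input that drives the program down the path the constraint describes, within the searched range.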

SLIDE 7

Dynamic Symbolic Execution

Differences between two successive states are shown in red.

State before: m = new A(x)

    x:    -10, X0
    PC:   true

State before: if(m.f < 0) at line 3

    x:    -10, X0
    m.f:  -10, X0
    PC:   true

State before: m.negate()

    x:    -10, X0
    m.f:  -10, X0
    PC:   X0 < 0

SLIDE 8

Dynamic Symbolic Execution

Without model of negate native method

State before: if(m.f < 0) at line 5

    x:    -10, X0
    m.f:  10, X0
    PC:   X0 < 0

SLIDE 9

Dynamic Symbolic Execution

Without model of negate native method

Concrete value of the field changed from -10 to 10. But the symbolic value X0 did not change.

Path constraint after the branch at line 5:

    PC:   X0 < 0 & X0 >= 0

Solve: UNSAT

SLIDE 10

Dynamic Symbolic Execution

Without model of negate native method

Solving PC: X0 < 0 & X0 >= 0 yields UNSAT.

Meaning: there is no input that takes the path 1-2-3-4-5-7. Incorrect! The concrete input -10 takes exactly this path.

Dynamic Symbolic Execution with Models

User-specified model of negate:

    void negate() { this.f = -this.f; }

State before: m.negate()

    x:    -10, X0
    m.f:  -10, X0
    PC:   X0 < 0

SLIDE 11

Dynamic Symbolic Execution with Models

With model of negate native method

State before: if(m.f < 0) at line 5

    x:    -10, X0
    m.f:  10, -X0
    PC:   X0 < 0

Path constraint after the branch at line 5:

    PC:   X0 < 0 & -X0 >= 0

Solve: X0 = -1
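The difference between the two runs can be sketched by tracking each value as a pair of concrete value and symbolic expression. The classes SymInt and ConcolicSketch, and the use of strings for symbolic expressions, are assumptions made for this illustration and are not taken from the tool described in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// A value tracked during dynamic symbolic execution: concrete value plus symbolic expression.
class SymInt {
    final int concrete;
    final String symbolic; // for example "X0" or "-X0"

    SymInt(int concrete, String symbolic) {
        this.concrete = concrete;
        this.symbolic = symbolic;
    }
}

class ConcolicSketch {
    // Follows the example for one concrete input and returns the conjuncts of the path constraint.
    // withModel selects whether negate() also updates the symbolic value of m.f.
    static List<String> pathConstraint(int input, boolean withModel) {
        List<String> pc = new ArrayList<>();
        SymInt f = new SymInt(input, "X0");           // statements 1-2: m = new A(read())
        if (f.concrete < 0) {                         // statement 3: branch on m.f
            pc.add(f.symbolic + " < 0");
            // Statement 4: negate always changes the concrete value;
            // only a model changes the symbolic value as well.
            String sym = withModel ? "-" + f.symbolic : f.symbolic;
            f = new SymInt(-f.concrete, sym);
        } else {
            pc.add(f.symbolic + " >= 0");
        }
        // Statement 5: the branch is recorded with whatever symbolic value m.f carries.
        pc.add(f.symbolic + (f.concrete < 0 ? " < 0" : " >= 0"));
        return pc;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" & ", pathConstraint(-10, false))); // X0 < 0 & X0 >= 0
        System.out.println(String.join(" & ", pathConstraint(-10, true)));  // X0 < 0 & -X0 >= 0
    }
}
```

Without the model the stale symbolic value X0 produces a contradictory constraint; with the model the constraint is satisfiable, for instance by X0 = -1.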

SLIDE 12

But, Models Not Always Needed for Soundness!

    void negate() { this.f = -this.f; }

Original example:

    1  x = read();
    2  m = new A(x);
    3  if (m.f < 0)
    4      m.negate();
    5  if (m.f < 0)
    6      error();
    7  exit();

Stmt at line 5 uses the value defined inside negate method. Thus, negate introduces imprecision. Thus, model for negate is needed.

Modified example:

    1  x = read();
    2  m = new A(x);
    3  if (m.f < 0)
    4      m.negate();
    5  if (x < 0)
    6      error();
    7  exit();

Stmt at line 5 does not use the value defined inside negate method. Thus, negate does not introduce imprecision. Thus, no need for model for negate.

SLIDE 13

Goal and Approach

Goal:

Identify where imprecision is introduced and report it to the user. The user then provides models for only a subset of the methods in P’ to eliminate imprecision.

Approach:

  • 1. Transform the program using Heap Cloning.
  • 2. Perform dynamic symbolic execution of the transformed program without any models.
  • 3. Detect and report where imprecision might be introduced.

SLIDE 14

Heap Cloning

[Diagram: the original program P is transformed into the transformed program PT.]

  • 1. Symbolic execution of PT generates the same results as symbolic execution of P.
  • 2. However, if imprecision is introduced during symbolic execution of PT, it can be detected and reported to the user.

SLIDE 15

Heap Cloning

Transformation:

    class HC.java.lang.Object { … }

    class HC.A extends HC.java.lang.Object {
        int f;
        …
        void main() {
            HC.A m = new HC.A();
            …
        }
    }

  • For every class A, Heap Cloning generates HC.A.
  • For each field (method) of A, HC.A has a corresponding field (method).
  • If class A extends class B, HC.A extends HC.B.
  • Almost all references to A are changed to HC.A.

Heap Cloning

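Effects of the statement m = new A on program state

Effect in the original program P:

[Diagram: m refers to an object of class A.]

Effect in the transformed program PT:

[Diagram: m refers to an object of class HC.A, which is linked by a shadow reference to an object of class A, the shadow object.]

For every object of class A in P, PT allocates two objects: one of class A, the other of class HC.A.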
SLIDE 16

Heap Cloning

Effects of the statement m.f = -10 in PT:

[Diagram: m refers to an HC.A object linked by shadow to an A object; the field f of each object holds -10.]

Effects of the statement m.f = v (v is a reference type variable) in PT:

[Diagram: m refers to an HC.A object linked by shadow to an A object; v refers to an HC.B object linked by shadow to a B object; the field f of the HC.A object refers to the HC.B object, and the field f of the A object refers to the B object.]

Every heap location is cloned. Values of both locations remain “consistent” if P’ does not modify the value of the shadow location.
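The transformation and its effect on allocation and field writes can be sketched in plain Java as follows. This is only an illustration under stated assumptions: HC_Object and HC_A are hypothetical single-file stand-ins for HC.java.lang.Object and HC.A, the shadow field follows the direction suggested by the diagrams, and setF is a hypothetical helper standing in for the rewritten statement m.f = value.

```java
// Original class from program P.
class A {
    int f;

    A(int x) { this.f = x; }
}

// Hypothetical stand-in for HC.java.lang.Object: root of the cloned class hierarchy.
class HC_Object {
    Object shadow; // the original-class object that this clone mirrors (assumed layout)
}

// Hypothetical stand-in for HC.A: it has a field corresponding to each field of A.
class HC_A extends HC_Object {
    int f;

    HC_A(int x) {
        this.f = x;
        this.shadow = new A(x); // the object of the original class is allocated as well
    }

    // Stand-in for the rewritten field write: both heap locations are updated,
    // so the two copies stay consistent.
    void setF(int value) {
        this.f = value;
        ((A) shadow).f = value;
    }

    public static void main(String[] args) {
        HC_A m = new HC_A(0);
        m.setF(-10);
        System.out.println(m.f + " " + ((A) m.shadow).f); // prints -10 -10
    }
}
```

As long as only transformed code writes the field, the two copies agree; code outside the transformed program that writes the A object directly would break that agreement.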

SLIDE 18

Heap Cloning

Inconsistency implies imprecision

State before m.negate():

[Diagram: x holds -10, X0; m refers to an HC.A object whose field f holds -10, X0; its shadow A object has field f holding -10.]

State after m.negate():

[Diagram: x holds -10, X0; m refers to an HC.A object whose field f holds -10, X0; its shadow A object has field f holding 10.]

Two conditions are satisfied and indicate potential imprecision.

  • 1. Concrete values are different.
  • 2. m.f has a symbolic value.
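The two conditions can be written down as a small check. The sketch below is an assumption-laden illustration: the classes ShadowA, ClonedA, and ImprecisionCheck are hypothetical, the symbolic value is kept as a string, and the effect of the unmodeled native negate is simulated by writing to the shadow object directly.

```java
// Original-class object; a native method in P' may modify it directly.
class ShadowA {
    int f;
}

// Cloned object used by the symbolically executed code (hypothetical layout).
class ClonedA {
    int f;                 // concrete value seen by the symbolically executed code
    String fSymbolic;      // symbolic value of f, or null if f is not symbolic
    final ShadowA shadow = new ShadowA();

    ClonedA(int concrete, String symbolic) {
        this.f = concrete;
        this.fSymbolic = symbolic;
        this.shadow.f = concrete;
    }
}

class ImprecisionCheck {
    // Both conditions from the slide: the concrete values differ,
    // and the field carries a symbolic value.
    static boolean potentialImprecision(ClonedA m) {
        boolean concreteValuesDiffer = m.f != m.shadow.f;
        boolean hasSymbolicValue = m.fSymbolic != null;
        return concreteValuesDiffer && hasSymbolicValue;
    }

    public static void main(String[] args) {
        ClonedA m = new ClonedA(-10, "X0");
        System.out.println(potentialImprecision(m)); // false: state before m.negate()
        m.shadow.f = -m.shadow.f;                    // simulated effect of the native negate
        System.out.println(potentialImprecision(m)); // true: state after m.negate()
    }
}
```

A field without a symbolic value is not reported even if the two copies differ, since it cannot contribute a wrong conjunct to the path constraint.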

SLIDE 19

Empirical Study

Implementation: Cinger

[Diagram: the Heap Cloning transformation produces the transformed program PT.]

SLIDE 20

Empirical Study

Subjects

    Subject         Covered lines of code       Avg. no. of conjuncts
                    Application    Library      in path constraints
    NanoXML         1,230          14,604       19,582
    JLex            6,566          13,702       65,068
    Sat4J-Dimacs    3,908          17,195       60,351
    Sat4J-CSP       4,125          39,617       629,078
    BCEL            2,321          12,659       34,161
    Lucene          20,821         56,622       47,248

Empirical Study

Goal:

Evaluate the reduction in the number of methods for which the user must write models, when P’ is the set of all native methods that the program uses.

Method:

  • 1. Estimate the number of native methods that may cause side effects.
  • 2. Count the number of native methods that introduce imprecision when symbolic execution is performed on the Heap-Cloning transformed program.
  • 3. Compare the two.

SLIDE 21

Empirical Study

Results

    Subject         Number of unique native methods that were executed
                    and that took reference-type parameters
    NanoXML         4
    JLex            6
    Sat4J-Dimacs
    Sat4J-CSP       9
    BCEL            4
    Lucene          3

Over all subjects, only one native method, System.arraycopy, introduced imprecision and thus needed a model.

Summary and Future Work

  • 1. Heap Cloning worked well for the selected subjects. In future, use a larger and more diverse set of subjects.
  • 2. Heap Cloning reduced manual effort when P’ consisted of all native methods. In future, treat framework (e.g., Google’s Android framework) code as part of P’.
  • 3. Heap Cloning could identify where models are necessary. In future, check for correctness of user-specified models.

SLIDE 22

Contributions

  • 1. Heap Cloning technique that enables correct dynamic symbolic execution, but requires minimal manual effort to specify models.
  • 2. Implementation of the Cinger system, which is capable of correct dynamic symbolic execution of real-world Java programs.
  • 3. Empirical studies that show Heap Cloning identifies a small number of methods that cause imprecision and need to be manually modeled.