

slide-1
SLIDE 1

1

Large-Scale API Protocol Mining for Automated Bug Detection

Michael Pradel Department of Computer Science ETH Zurich

slide-3
SLIDE 3

2

Motivation

LinkedList pinConnections = ...;
Iterator i = pinConnections.iterator();
while (i.hasNext()) {
    PinLink curr = (PinLink) i.next();
    if (...) {
        pinConnections.remove(curr);
    }
}

(from DaCapo benchmarks)

Don’t modify a collection while iterating over it!
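A minimal sketch of why this rule matters. The class name and the string elements are hypothetical stand-ins for the benchmark's PinLink objects. Removing through the collection while an iterator is active usually makes the next call to next() throw ConcurrentModificationException (the JDK documents this fail-fast check as best effort); removing through the iterator itself is the supported alternative.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; the strings stand in for PinLink objects.
final class IterationDemo {

    // Mirrors the slide: removes through the list while an iterator is active.
    // Returns true if the iterator detected the modification.
    static boolean removeViaCollectionFails(List<String> links) {
        try {
            Iterator<String> i = links.iterator();
            while (i.hasNext()) {
                String curr = i.next();
                if (curr.startsWith("dead")) {
                    links.remove(curr); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Supported alternative: remove the current element through the iterator.
    static List<String> removeViaIterator(List<String> links) {
        Iterator<String> i = links.iterator();
        while (i.hasNext()) {
            if (i.next().startsWith("dead")) {
                i.remove();
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("dead-1", "ok-1", "ok-2");
        System.out.println(removeViaCollectionFails(new LinkedList<>(input)));
        System.out.println(removeViaIterator(new LinkedList<>(input)));
    }
}
```

Note that the exception is not guaranteed: if the removed element is the second-to-last one, hasNext() returns false and the loop silently skips the last element instead.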

slide-5
SLIDE 5

3

API Usage Constraints

call x before y
eventually call x
don't call x while calling y and z

Protocols!

[Diagram: constraints between a program and an API, expressed as a protocol over the methods x, y, z]

slide-8
SLIDE 8

4

Protocol Mining

Problem: No protocols specified

[Diagram: training programs that use the API → mining → protocol over x, y, z → bug finding in a target program]

slide-14
SLIDE 14

5

Big Picture

Protocol mining → bug finding, other applications

static vs. dynamic
API-based vs. client-based
single- vs. multi-object

[Diagram labels: static, dynamic; verification, testing]

slide-15
SLIDE 15

6

Part 1: Protocol Mining

slide-17
SLIDE 17

7

Protocol Mining - Overview

Program & Input → Execution trace → Subtraces → Protocols

slide-20
SLIDE 20

8

Execution Traces

List l = new LinkedList();
l.add(new Foo());
Iterator i = l.iterator();
OutputStream s = new FileOutputStream("f");
while (i.hasNext()) {
    Foo f = (Foo) i.next();
    if (f.isOK())
        s.write(f.getData());
}
s.close();

AspectJ instrumentation records the execution trace (each object gets a numeric ID):

new LinkedList → 1
1.add(2) → 3
1.iterator → 4
new FileOS(6) → 5
4.hasNext
4.next
5.write(7)
4.hasNext
5.close
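The trace format can be illustrated with a small hand-written recorder. This is only a sketch: the talk uses AspectJ instrumentation, whereas the hypothetical TraceRecorder below must be called explicitly. It numbers objects by identity in order of first appearance and writes "->" in place of the arrow used on the slide.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical recorder; a real setup would emit these events from instrumentation.
final class TraceRecorder {
    private final Map<Object, Integer> ids = new IdentityHashMap<>();
    private final List<String> events = new ArrayList<>();

    // Objects are numbered in order of first appearance, compared by identity.
    private int idOf(Object o) {
        Integer id = ids.get(o);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(o, id);
        }
        return id;
    }

    // Records a constructor call as "new Type -> id".
    void recordNew(String type, Object created) {
        events.add("new " + type + " -> " + idOf(created));
    }

    // Records "receiver.method", optionally with one argument and a returned object.
    void recordCall(Object receiver, String method, Object arg, Object result) {
        StringBuilder sb = new StringBuilder(idOf(receiver) + "." + method);
        if (arg != null) sb.append("(").append(idOf(arg)).append(")");
        if (result != null) sb.append(" -> ").append(idOf(result));
        events.add(sb.toString());
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        TraceRecorder t = new TraceRecorder();
        List<Object> l = new LinkedList<>();
        t.recordNew("LinkedList", l);
        Object foo = new Object();
        t.recordCall(l, "add", foo, Boolean.valueOf(l.add(foo)));
        Iterator<Object> i = l.iterator();
        t.recordCall(l, "iterator", null, i);
        t.recordCall(i, "hasNext", null, null);
        for (String e : t.events()) System.out.println(e);
    }
}
```

An identity-based map is used so that two distinct but equal objects (for example two empty lists) still receive different IDs.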

slide-24
SLIDE 24

10

Subtraces

Different API usages are intermingled in a single execution trace!

slide-25
SLIDE 25

10

Subtraces

Core object x:

Calls to x
Calls to parameters passed to x
Calls to objects returned by x

slide-26
SLIDE 26

10

Subtraces

Subtrace for core object 1 (LinkedList):
new LinkedList → 1, 1.add(2) → 3, 1.iterator → 4, 4.hasNext, 4.next, 4.hasNext

Subtrace for core object 4 (Iterator):
4.hasNext, 4.next, 4.hasNext

Subtrace for core object 5 (FileOS):
new FileOS(6) → 5, 5.write(7), 5.close
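The splitting rule can be sketched in two passes over the trace. The Event and SubtraceExtractor classes below are hypothetical, and the event model is simplified (at most one object argument per call, object ID 0 meaning "none"); it is meant to illustrate the core-object rule, not to reproduce the actual miner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event model: object IDs are ints, 0 means "no object".
final class Event {
    final int receiver;  // 0 for constructor calls
    final String method;
    final int arg;       // 0 if the call has no object argument
    final int result;    // 0 if nothing is returned

    Event(int receiver, String method, int arg, int result) {
        this.receiver = receiver;
        this.method = method;
        this.arg = arg;
        this.result = result;
    }
}

final class SubtraceExtractor {

    // Pass 1: the core object plus every object passed to it or returned by it.
    static Set<Integer> associatedObjects(List<Event> trace, int core) {
        Set<Integer> objs = new HashSet<>();
        objs.add(core);
        for (Event e : trace) {
            if (e.receiver == core) {
                if (e.arg != 0) objs.add(e.arg);
                if (e.result != 0) objs.add(e.result);
            }
        }
        return objs;
    }

    // Pass 2: keep the constructor of the core object and all calls on associated objects.
    static List<String> subtrace(List<Event> trace, int core) {
        Set<Integer> objs = associatedObjects(trace, core);
        List<String> calls = new ArrayList<>();
        for (Event e : trace) {
            boolean createsCore = e.receiver == 0 && e.result == core;
            if (createsCore || objs.contains(e.receiver)) {
                calls.add(e.receiver == 0 ? e.method : e.receiver + "." + e.method);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // The trace from the slide, in the simplified event model.
        List<Event> trace = Arrays.asList(
            new Event(0, "new LinkedList", 0, 1),
            new Event(1, "add", 2, 3),
            new Event(1, "iterator", 0, 4),
            new Event(0, "new FileOS", 6, 5),
            new Event(4, "hasNext", 0, 0),
            new Event(4, "next", 0, 0),
            new Event(5, "write", 7, 0),
            new Event(4, "hasNext", 0, 0),
            new Event(5, "close", 0, 0));
        System.out.println(subtrace(trace, 1));
        System.out.println(subtrace(trace, 4));
        System.out.println(subtrace(trace, 5));
    }
}
```

For the slide's trace, core object 1 yields the LinkedList and Iterator calls, core object 4 only the Iterator calls, and core object 5 only the FileOS calls.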

slide-29
SLIDE 29

12

Group Subtraces

Group by set of involved types:

{LinkedList, Iterator}
{Iterator}
{FileOS}

slide-31
SLIDE 31

13

Generate Protocols

Finite state machine

Method → state
Consecutive call → transition

[Protocol diagram; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next, and a further l.add after the iterator calls]

slide-32
SLIDE 32

13

Generate Protocols

Finite state machine

Method → state
Consecutive call → transition

[Protocol diagram; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next]
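The construction rule (each method becomes a state, each pair of consecutive calls becomes a transition) can be sketched as follows. ProtocolMiner and the "<start>" state are hypothetical names, and the sketch ignores final states and the binding of objects to protocol parameters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical miner: builds a finite state machine from subtraces of method names.
final class ProtocolMiner {
    static final String START = "<start>";

    // Maps each state (a method name) to the set of methods that may follow it.
    static Map<String, Set<String>> mine(List<List<String>> subtraces) {
        Map<String, Set<String>> fsm = new TreeMap<>();
        for (List<String> calls : subtraces) {
            String prev = START;
            for (String call : calls) {
                fsm.computeIfAbsent(prev, k -> new TreeSet<>()).add(call);
                prev = call;
            }
        }
        return fsm;
    }

    // Checks whether a call sequence only uses transitions present in the mined FSM.
    static boolean accepts(Map<String, Set<String>> fsm, List<String> calls) {
        String prev = START;
        for (String call : calls) {
            Set<String> next = fsm.get(prev);
            if (next == null || !next.contains(call)) return false;
            prev = call;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fsm = mine(Arrays.asList(
            Arrays.asList("new LinkedList", "add", "add", "iterator", "hasNext", "next", "hasNext"),
            Arrays.asList("new LinkedList", "add", "iterator", "hasNext")));
        System.out.println(fsm);
        System.out.println(accepts(fsm, Arrays.asList("new LinkedList", "add", "iterator", "next")));
    }
}
```

Merging all subtraces of one group into a single machine generalizes beyond the observed sequences: any path through the merged transitions is accepted, even if no single training run took it.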

slide-36
SLIDE 36

14

Scalability

Bottleneck: Large execution traces

Pass 1: Find core objects and associated objects
Pass 2: Extract calls for each subtrace

Mine millions of events in a few minutes

slide-37
SLIDE 37

15

Examples

ZipFile and Enumeration

[Protocol diagram; transition labels: new ZipFile → f, f.entries → e, e.hasMoreElements, e.nextElement, f.close]

slide-38
SLIDE 38

16

Examples

URL and InputStream

[Protocol diagram; transition labels: new URL → u, u.openStream → s, s.close]

slide-40
SLIDE 40

17

Evaluation

Are examples convincing enough?
Neither reproducible nor comparable!

[Diagram labels: 20+ mining approaches; OK / Not OK; use for some task]

slide-42
SLIDE 42

18

Evaluation Framework

Precision and recall
Method constraint groups

[Diagram labels: ?, OK]

slide-44
SLIDE 44

19

Example

[Diagram: mined protocol M and reference protocol for Formatter; transition labels include format, close, flush, println, locale, out, ioException]

Precision: How much of M is correct? → 23%

slide-45
SLIDE 45

19

Example

Recall: How complete is M? → 9%
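The slides do not spell out how the two percentages are computed. As an illustration only, the sketch below uses a simplified set-based reading, which is an assumption and not necessarily the talk's metric: each protocol is reduced to a set of constraints written as strings, precision is the share of mined constraints that also occur in the reference protocol, and recall is the share of reference constraints that the mined protocol recovers. ProtocolMetrics and the constraint strings are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified, set-based reading of precision and recall (an assumption for illustration).
final class ProtocolMetrics {

    private static int overlap(Set<String> mined, Set<String> reference) {
        Set<String> common = new HashSet<>(mined);
        common.retainAll(reference);
        return common.size();
    }

    // Share of mined constraints that also appear in the reference protocol.
    static double precision(Set<String> mined, Set<String> reference) {
        return mined.isEmpty() ? 0.0 : (double) overlap(mined, reference) / mined.size();
    }

    // Share of reference constraints that the mined protocol recovers.
    static double recall(Set<String> mined, Set<String> reference) {
        return reference.isEmpty() ? 0.0 : (double) overlap(mined, reference) / reference.size();
    }

    public static void main(String[] args) {
        // Hypothetical constraint sets, written as "caller state > callee".
        Set<String> mined = new HashSet<>(Arrays.asList(
            "Formatter>format", "format>close", "format>println", "close>flush"));
        Set<String> reference = new HashSet<>(Arrays.asList(
            "Formatter>format", "format>close", "format>flush", "flush>close", "format>format"));
        System.out.println(precision(mined, reference));
        System.out.println(recall(mined, reference));
    }
}
```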

slide-46
SLIDE 46

20

Results

12 training programs
32 reference protocols

slide-47
SLIDE 47

21

Learn More from Many Teachers?

Empirical study: How does mining more programs influence the results?

[Diagram: ... 12 programs ...]

slide-49
SLIDE 49

22

API Coverage

Types and methods covered in mined protocols

More programs → higher coverage

slide-51
SLIDE 51

23

Recall

Recall of mined protocols

More programs → higher recall

slide-52
SLIDE 52

24

Part 2: Bug Finding

slide-54
SLIDE 54

25

Bug Finding

[Diagram: training programs that use the API → mining → protocol over x, y, z → bug finding in a target program]

slide-58
SLIDE 58

26

Overview

Analysis:
What is a protocol violation?
Runtime verification
Static checking

slide-60
SLIDE 60

27

Incomplete Specifications

Protocols = Incomplete specifications

[Protocol diagram; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next; an additional i.hasNext transition is marked with "?"]

slide-61
SLIDE 61

28

What is a Protocol Violation?

[Protocol diagram; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next]

Setup phase: bind parameters
Liable phase: all parameters bound

Violation:
take non-existing transition
end in non-final state

slide-65
SLIDE 65

29

State Partitioning

Protocol transformation: Setup states vs. liable states

Ambiguous states → split

[Protocol diagram before the transformation; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next]

slide-67
SLIDE 67

29

State Partitioning

Protocol transformation: Setup states vs. liable states

[Protocol diagram after the transformation, partitioned into setup and liable states; transition labels: new LinkedList → l, l.add, l.iterator → i, i.hasNext, i.next]

slide-72
SLIDE 72

31

Dynamic Protocol Checking

Runtime verification (JavaMOP)

Challenge 1: Check many different execution paths
Challenge 2: Monitoring mined protocols

[Diagram label: Input]

slide-76
SLIDE 76

32

Randomly Generated Input

Challenge 1: Check many different execution paths

Random test generation
Call sequences that trigger an exception
Non-exceptional sequences

[Diagram label: Input]

slide-80
SLIDE 80

33

Protocol Monitoring

Challenge 2: Monitoring mined protocols

[Protocol diagram; transition labels: new LinkedList → l, l.iterator → i, i.hasNext, i.next]

l = new LinkedList()

Bound objects: l
Violation!?

slide-85
SLIDE 85

33

Protocol Monitoring

Challenge 2: Monitoring mined protocols

[Protocol diagram; transition labels: new LinkedList → l, l.iterator → i, i.hasNext, i.next]

l = new LinkedList()
i1 = l.iterator()
i2.next()

Bound objects: (l, i1); i2
Violation!?

Naive approach gives too many violations

slide-87
SLIDE 87

34

Explicit Fail Transitions

Fail only in liable states

Violation:
Reach fail state
End in non-final, liable state

[Protocol diagram with fail state F; transition labels: new LinkedList → l, l.iterator → i, i.hasNext, i.next; labels near F: i.next, i.hasNext]
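The violation rule can be sketched as a small monitor for one protocol instance. Everything below is a hypothetical illustration: the state names follow the "method → state" convention, the set of liable states and the choice of "hasNext" as the only final state are assumptions, and the binding of runtime objects to the parameters l and i (which the real monitoring has to handle) is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical monitor for a single LinkedList/Iterator protocol instance.
final class ProtocolMonitor {
    private static final String FAIL = "F";

    // Allowed transitions, encoded as "state>event"; the event name becomes the next state.
    private final Set<String> allowed = new HashSet<>(Arrays.asList(
        "start>new LinkedList", "new LinkedList>iterator", "iterator>hasNext",
        "hasNext>hasNext", "hasNext>next", "next>hasNext"));
    // Assumption: states are liable once both l and i are bound.
    private final Set<String> liable = new HashSet<>(Arrays.asList("iterator", "hasNext", "next"));
    // Assumption: a usage is complete when it ends with hasNext.
    private final Set<String> finals = new HashSet<>(Arrays.asList("hasNext"));
    private String state = "start";

    void onEvent(String event) {
        if (state.equals(FAIL)) return;
        if (allowed.contains(state + ">" + event)) {
            state = event;
        } else if (liable.contains(state)) {
            state = FAIL; // explicit fail transition, taken only from liable states
        }
        // In setup states, unexpected events are ignored.
    }

    // Violation: fail state reached, or execution ended in a non-final liable state.
    boolean violationAtEnd() {
        return state.equals(FAIL) || (liable.contains(state) && !finals.contains(state));
    }

    static boolean violates(List<String> events) {
        ProtocolMonitor m = new ProtocolMonitor();
        for (String e : events) m.onEvent(e);
        return m.violationAtEnd();
    }

    public static void main(String[] args) {
        System.out.println(violates(Arrays.asList("new LinkedList")));
        System.out.println(violates(Arrays.asList("new LinkedList", "iterator", "next")));
    }
}
```

With this rule, a program that only creates a list never leaves the setup states and is not reported, whereas calling next() directly after iterator() reaches the fail state.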

slide-88
SLIDE 88

35

Evaluation

Questions:
Find relevant issues by monitoring mined protocols?
How useful is generated input?

Setup: DaCapo benchmarks, 1.6 MLOC Java

slide-90
SLIDE 90

36

Results

Program      Test cases   Protocol violations
                          Total   Relevant
avrora           15,753       5          4
batik             3,477
daytrader        32,446
eclipse             816
fop               6,536      52         50
h2                7,584      14          7
lucene            1,985
pmd               1,286
sunflow           4,300       1
tomcat           14,627       1          1
xalan            21,083       1          1
Sum             160,857      74         63

Test cases: randomly generated
Relevant: bug (exception, unexpected behavior) or code smell (performance/maintainability problem)

85% true positives

slide-92
SLIDE 92

37

Examples

try {
    is = u.openStream();
    r = new InputStreamReader(is, "UTF-8");
    br = new BufferedReader(r);
} finally {
    if (is != null) {
        try { is.close(); } catch (IOException ignored) {}
        is = null;
    }
    if (r != null) {
        try { r.close(); } catch (IOException ignored) {}
        r = null;
    }
    if (br == null) {
        try { br.close(); } catch (IOException ignored) {}
        br = null;
    }
}

Reader never closed (the last check tests br == null instead of br != null)
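One way to avoid this class of mistake is try-with-resources, which removes the hand-written null checks. The sketch below is not the benchmark's code: ReaderCleanup and firstLine are hypothetical, and the stream is passed in as a parameter (instead of coming from u.openStream()) so that the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; reads the first line and always closes the readers.
final class ReaderCleanup {

    // try-with-resources closes br, which closes the wrapped reader and stream.
    static String firstLine(InputStream is) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream(
            "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(is)); // prints "hello"
    }
}
```

Closing the outermost BufferedReader is enough here because each wrapper closes the object it wraps.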

slide-94
SLIDE 94

38

Examples

Iterator i = pinConnections.iterator();
PinLink currLink = (PinConnect.PinLink) i.next();
currLink.propagateSignals();
while (i.hasNext()) {
    currLink = (PinConnect.PinLink) i.next();
    currLink.propagateSignals();
}

Incorrect iterator usage: the first next() is called without checking hasNext()
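A protocol-conforming variant guards every next() with hasNext(), so an empty collection no longer raises NoSuchElementException. The sketch assumes that an empty collection is possible at this point; SignalPropagation and the Link interface are hypothetical stand-ins for the benchmark's classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SignalPropagation {
    // Hypothetical stand-in for PinConnect.PinLink.
    interface Link {
        void propagateSignals();
    }

    // Every next() is preceded by hasNext(); returns the number of links visited.
    static int propagateAll(List<Link> pinConnections) {
        int count = 0;
        Iterator<Link> i = pinConnections.iterator();
        while (i.hasNext()) {
            i.next().propagateSignals();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        System.out.println(propagateAll(links)); // empty list: no exception
        links.add(() -> System.out.println("propagating"));
        System.out.println(propagateAll(links));
    }
}
```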

slide-95
SLIDE 95

39

Normal vs. Generated Input

slide-100
SLIDE 100

41

State of the Art

Typestate checking + specification
+ Precise
- Needs specification

Anomaly detection
+ Automatic
- Imprecise

Combine both! Precise checker for mined multi-object protocols

slide-101
SLIDE 101

42

Static Protocol Checking

Joint work with Ciera Jaspan and Jonathan Aldrich (ISR, CMU)

[Diagram labels: relationship constraints (@Constraint( requires="..." )), Fusion analysis, pruning; results: f() ✓, g() ✗, h() ✗]

slide-102
SLIDE 102

43

Fusion

C. Jaspan and J. Aldrich: Checking Framework Interactions with Relationships. ECOOP '09

Static analysis to check API usages

Good match with mined protocols:
Reasons about interacting objects
Distinguishes setup from checking

slide-104
SLIDE 104

44

Relationship-based Analysis

Relationships = Tuples of objects

void m() { .. }

Effects: Add/remove objects
Requirements: Check before call

Keep track of protocol execution (e.g., current state)
Check protocol constraints if in liable state

slide-108
SLIDE 108

45

Example

LinkedList l = new LinkedList();
Iterator i = l.iterator();
i.next();

[Protocol diagram with states 1 to 4; transition labels: new LinkedList → l, l.iterator → i, i.hasNext, i.next]

After new LinkedList(): l ∈ r_state2, l ∈ r_iterator
After l.iterator(): l ∈ r_state3, i ∈ r_state3, i ∈ r_hasNext, (l, i) ∈ r_protocol
At i.next(): i ∉ r_next

slide-110
SLIDE 110

46

Results

Program      Warnings
             Total   Bugs   Code smells   True pos.
avrora          13      9                       69%
batik            1                               0%
daytrader        -
eclipse         15      2             1         20%
fop             13      8             1         69%
h2               1                               0%
jython           7      2             1         43%
lucene          13      3             3         46%
pmd             15      2             8         67%
sunflow          -
tomcat           2                               0%
xalan            1                    1        100%
Total           81     26            15         51%

Bugs: exception or unexpected behavior
Code smells: maintainability or performance issue

Few false positives!

slide-112
SLIDE 112

47

Examples

LinkedList pinConnections = ...;
Iterator i = pinConnections.iterator();
while (i.hasNext()) {
    PinLink curr = (PinLink) i.next();
    if (...) {
        pinConnections.remove(curr);
    }
}

Concurrent modification

slide-114
SLIDE 114

48

Examples

BufferedReader in = null;
try {
    in = new BufferedReader(...);
    ...
    in.close();
} finally {
    if (in != null) {
        try {
            in.close();
        } catch (IOException e) { ... }
    }
}

Duplicate close

slide-115
SLIDE 115

49

Summary: Bug Finding

Clear definition of protocol violation (setup vs. liable states)

Static and dynamic checking:
Both are practical
Complement each other

slide-116
SLIDE 116

50

Conclusion

An attractive tool to help programmers:
understand large systems
pinpoint problem areas

Supplements other approaches:
Reveals multi-object bugs
Easy to use

slide-117
SLIDE 117

51

Thank you!

michael@binaervarianz.de

slide-120
SLIDE 120

53

What is a Violation?

Positive protocols vs. negative protocols

[Protocol diagrams; transition labels: new Writer, write, close; hasNext, next]

Violation of a positive protocol: taking non-existing transition, or not reaching final state
Violation of a negative protocol: reaching final state

slide-121
SLIDE 121

54

Related Work (Mining)

Gabel & Su, FSE 2008
Language learning algorithm with pre-defined micro-patterns
Don't consider dataflow

Lee, Chen & Rosu, ICSE 2011
First, learn related events; then, mine sliced trace with PFSA learner
Require unit tests for first step
