Is This Class Thread-Safe? Inferring Documentation using Graph-Based Learning



slide-1
SLIDE 1

1

Andrew Habib, Michael Pradel TU Darmstadt, Germany

software-lab.org

Is This Class Thread-Safe? Inferring Documentation using Graph-Based Learning

slide-2
SLIDE 2

2

Thread-Safe Classes

slide-3
SLIDE 3

3

Thread-Safe Classes

Created by Freepik

slide-8
SLIDE 8

3

Thread-Safe Classes

Is this class thread-safe?

Assume not thread-safe
Assume thread-safe
Inspect manually

Created by Freepik

slide-10
SLIDE 10

4

Documentation of Thread Safety

Case study: The Qualitas Corpus

112 Java projects
179,239 classes
Search: concu, thread, sync, parallel
8,655 search hits
Randomly sample 120 hits
Manually inspect random sample

slide-13
SLIDE 13

4

Documentation of Thread Safety

Case study: The Qualitas Corpus

Search: concu, thread, sync, parallel
8,655 search hits (from 179,239 classes)
Manually inspect random sample of 120 hits

Documented as                Count   %
Thread-safe                  11      9.2%
Not thread-safe              12      10.0%
Conditionally thread-safe    2       1.7%
No documentation             95      79.2%
Total inspected classes      120     100.0%

About 21% of the sampled hits (25 of 120) carry some thread-safety documentation.
By extrapolation (about 21% of the 8,655 hits, relative to all 179,239 classes): % of documented classes = 1.004%
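The extrapolation can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up, and it assumes that classes without a search hit are undocumented. With the unrounded sample fraction (25/120) the result comes out slightly above the 1.004% on the slide, which appears consistent with rounding the fraction to 20.8% first.

```java
// Illustrative arithmetic only; names are hypothetical, numbers are from the slide.
class DocumentationEstimate {
    /** Fraction of sampled hits that carry any thread-safety documentation. */
    static double documentedFraction(int threadSafe, int notThreadSafe,
                                     int conditional, int sampleSize) {
        return (threadSafe + notThreadSafe + conditional) / (double) sampleSize;
    }

    /**
     * Estimated share of all classes that are documented, assuming that
     * classes without a search hit carry no documentation.
     */
    static double documentedShare(double fractionOfHits, int searchHits, int totalClasses) {
        return fractionOfHits * searchHits / totalClasses;
    }

    public static void main(String[] args) {
        double fraction = documentedFraction(11, 12, 2, 120); // 25 of 120 hits
        double share = documentedShare(fraction, 8_655, 179_239);
        System.out.printf("documented hits: %.1f%%%n", 100 * fraction);
        System.out.printf("documented classes: %.3f%%%n", 100 * share);
    }
}
```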

slide-15
SLIDE 15

5

Is This Class Thread-Safe?

Given an object-oriented class with unknown multi-threading behaviour, infer whether it is supposed to be thread-safe or not

This talk: TSFinder
Machine learning approach to infer thread-safety documentation

slide-17
SLIDE 17

6

Overview of TSFinder

Training: Labeled training classes -> Extracted graphs -> Graph kernel matrix -> SVM model

Classification: New class -> Extracted graphs -> Feature vector -> Thread-safe / Thread-unsafe

slide-18
SLIDE 18

7

Field-Focused Graphs

public class Sequence {
    private volatile int seq;
    private int MAX;

    public Sequence(int m) {
        MAX = m;
        reset();
    }

    synchronized public int next() {
        if (!isMax())
            return seq++;
        return -1;
    }

    boolean isMax() {
        return seq > MAX;
    }

    void reset() {
        seq = 0;
    }
}

slide-19
SLIDE 19

7

Field-Focused Graphs

[Figure: graph for field seq of class Sequence, starting from the single node seq (label f).]

slide-20
SLIDE 20

7

Field-Focused Graphs

[Figure: node seq (label f) connected by Mod edges to private and volatile.]

Mod: Modifier

slide-21
SLIDE 21

7

Field-Focused Graphs

[Figure: node reset() (label m) connected by a Writes edge to node seq (label f).]

slide-22
SLIDE 22

7

Field-Focused Graphs

[Figure: node isMax() (label m) connected by a Reads edge to node seq (label f).]

slide-23
SLIDE 23

7

Field-Focused Graphs

[Figure: node next() (label m) connected to node seq (label f) by a Reads edge and a Writes edge, to public by a Mod edge, and to this by a Sync edge.]

slide-24
SLIDE 24

7

Field-Focused Graphs

[Figure: indirect accesses to seq (label f). Sequence(int) (label init) Calls reset() (label m), which Writes seq. next() (label m) Calls isMax() (label m), which Reads seq.]

slide-25
SLIDE 25

7

Field-Focused Graphs

[Figure: complete graph for field seq (label f) of class Sequence.]

Node labels: init for Sequence(int); m for next(), reset(), isMax(); f for seq
Mod edges: seq and private; seq and volatile; next() and public; Sequence(int) and public
Sync edge: next() and this
Reads edges: next() and seq; isMax() and seq
Writes edges: next() and seq; reset() and seq
Calls edges: Sequence(int) and reset(); next() and isMax()

Mod: Modifier
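One way to picture such a graph in code is as a list of labeled edges. The sketch below builds the graph for field seq by hand, using the edge kinds named on the slide (Reads, Writes, Calls, Mod, Sync). The FieldGraph and Edge classes and the edge directions are assumptions made for illustration; the slides do not show how TSFinder represents or extracts its graphs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical representation; not TSFinder's actual data structure.
class FieldGraph {
    /** A labeled edge between two named nodes (direction is an assumption). */
    static final class Edge {
        final String from, label, to;
        Edge(String from, String label, String to) {
            this.from = from; this.label = label; this.to = to;
        }
        @Override public String toString() { return from + " -" + label + "-> " + to; }
    }

    final String field;
    final List<Edge> edges = new ArrayList<>();

    FieldGraph(String field) { this.field = field; }

    /** Adds one edge and returns this graph so calls can be chained. */
    FieldGraph edge(String from, String label, String to) {
        edges.add(new Edge(from, label, to));
        return this;
    }

    /** Number of edges carrying the given label. */
    long count(String label) {
        return edges.stream().filter(e -> e.label.equals(label)).count();
    }

    /** Hand-built graph for field seq of the Sequence class. */
    static FieldGraph forSeq() {
        return new FieldGraph("seq")
            .edge("seq", "Mod", "private")
            .edge("seq", "Mod", "volatile")
            .edge("next()", "Mod", "public")
            .edge("Sequence(int)", "Mod", "public")
            .edge("next()", "Sync", "this")
            .edge("next()", "Reads", "seq")
            .edge("next()", "Writes", "seq")
            .edge("isMax()", "Reads", "seq")
            .edge("reset()", "Writes", "seq")
            .edge("Sequence(int)", "Calls", "reset()")
            .edge("next()", "Calls", "isMax()");
    }

    public static void main(String[] args) {
        forSeq().edges.forEach(System.out::println);
    }
}
```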

slide-29
SLIDE 29

8

Field-Focused Graphs (2)

[Figure: graph for field MAX of class Sequence. Node isMax() (label m) is connected by Reads edges to node MAX (label f) and to node seq (label f).]

Build the rest of the graph as before

slide-33
SLIDE 33

9

Class to Vector

Known classes, new class C

Graph kernel*: similarity score K(G1, G2) = k ∈ [0, 1], where G1 and G2 are two graphs (drawn as pictures on the slide)

Summary of similarity of C to known classes = vector representation of class C

* We use the Weisfeiler-Lehman Graph Kernels [Shervashidze et al., 2011]
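To make the idea concrete, here is a simplified Weisfeiler-Lehman style kernel in Java. It is a sketch under several assumptions, not the implementation used by TSFinder: graphs are treated as undirected, edge labels are folded into the neighbour signature, graphs are assumed to be non-empty, the score is normalized so that it falls in [0, 1], and each class is assumed to contribute one graph and one vector entry per known class. How the slides' "summary of similarity" is actually computed is not stated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch; all names here are hypothetical.
class WlKernelSketch {
    /** One end of a labeled edge: the edge label and the neighbour's index. */
    static final class Half {
        final String label; final int to;
        Half(String label, int to) { this.label = label; this.to = to; }
    }

    /** Small undirected graph with node labels and labeled edges. */
    static final class Graph {
        final List<String> labels = new ArrayList<>();
        final List<List<Half>> adj = new ArrayList<>();

        int node(String label) {
            labels.add(label);
            adj.add(new ArrayList<>());
            return labels.size() - 1;
        }

        void edge(int a, String label, int b) {
            adj.get(a).add(new Half(label, b));
            adj.get(b).add(new Half(label, a));
        }
    }

    /** Counts node labels over the initial labeling plus `rounds` refinements. */
    static Map<String, Integer> features(Graph g, int rounds) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> current = new ArrayList<>(g.labels);
        for (int r = 0; r <= rounds; r++) {
            for (String l : current) counts.merge(r + "|" + l, 1, Integer::sum);
            // Refine: new label = old label + sorted neighbour signatures.
            List<String> next = new ArrayList<>();
            for (int v = 0; v < current.size(); v++) {
                List<String> around = new ArrayList<>();
                for (Half h : g.adj.get(v)) around.add(h.label + ":" + current.get(h.to));
                Collections.sort(around);
                next.add(current.get(v) + around);
            }
            current = next;
        }
        return counts;
    }

    static double dot(Map<String, Integer> a, Map<String, Integer> b) {
        double sum = 0;
        for (Map.Entry<String, Integer> e : a.entrySet())
            sum += e.getValue() * (double) b.getOrDefault(e.getKey(), 0);
        return sum;
    }

    /** Normalized kernel value in [0, 1] for non-empty graphs. */
    static double similarity(Graph a, Graph b, int rounds) {
        Map<String, Integer> fa = features(a, rounds), fb = features(b, rounds);
        return dot(fa, fb) / Math.sqrt(dot(fa, fa) * dot(fb, fb));
    }

    /** One similarity score per known graph: a vector describing graph c. */
    static double[] toVector(Graph c, List<Graph> known, int rounds) {
        double[] v = new double[known.size()];
        for (int i = 0; i < v.length; i++) v[i] = similarity(c, known.get(i), rounds);
        return v;
    }

    /** Tiny example graph: a method node and a field node joined by one edge. */
    static Graph pair(String edgeLabel) {
        Graph g = new Graph();
        g.edge(g.node("m"), edgeLabel, g.node("f"));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(similarity(pair("Writes"), pair("Reads"), 1));
    }
}
```

With one refinement round, the two tiny graphs share their initial labels (m and f) but not their refined labels, so the similarity lies strictly between 0 and 1, while a graph compared with itself scores 1.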

slide-34
SLIDE 34

10

Evaluation: Setup

                    Fields               Methods
Classes   Count     Min   Max   Avg      Min   Max   Avg
TS        115       1     64    8.7      2     163   34.7
not TS    115             55    4.3      1     103   23.8
All       230             64    6.4      1     163   29.2

230 Java classes from the JDK

Explicit thread safety documentation

slide-35
SLIDE 35

10

Evaluation: Setup

230 Java classes from the JDK

Explicit thread safety documentation

                    LoC
Classes   Count     Min   Max     Avg      Graphs
TS        115       13    4,264   430.2    1,989
not TS    115       7     1,931   219.7    2,871
All       230       7     4,264   323.1    4,860

slide-39
SLIDE 39

11

Effectiveness of TSFinder

Two-class SVM with SGD*
10-fold cross-validation
230 labeled JDK classes

            Thread-Safe        Not Thread-Safe
Accuracy    Prec.    Rec.      Prec.    Rec.
94.5%       94.9%    94.0%     94.2%    95.0%

Most predictions are correct!

* Stochastic Gradient Descent
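For readers unfamiliar with 10-fold cross-validation, the sketch below shows how 230 items can be split into 10 folds so that each fold serves once as the test set while the remaining items are used for training. The slides do not say how the folds were formed (for example, whether they were stratified by label); the plain shuffled split, the class name, and the seed are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative split only; not taken from TSFinder.
class TenFoldSplit {
    /** Shuffles indices 0..n-1 and deals them round-robin into k folds. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        int n = 230; // labeled JDK classes on the slide
        List<List<Integer>> folds = folds(n, 10, 42L);
        for (int f = 0; f < folds.size(); f++) {
            int test = folds.get(f).size();
            // Each fold is the test set once; all other items form the training set.
            System.out.println("fold " + f + ": test=" + test + ", train=" + (n - test));
        }
    }
}
```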

slide-42
SLIDE 42

12

Comparison with Baseline

Naive classifier using simple class features, e.g.:

% of volatile fields
% of synchronized methods

                                   Accuracy
Classifier                         TSFinder   Naive
SVM (SGD* with hinge loss)         94.5%      75.0%
Random forest                      94.1%      79.3%
SVM (SMO**)                        92.5%      70.6%
SVM (SGD with log loss)            92.0%      74.3%
Additive logistic regression       92.8%      74.5%

* Stochastic Gradient Descent
** Sequential Minimal Optimization
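The two example features of the naive baseline are easy to compute for a loaded class. The sketch below uses Java reflection on the Sequence class from the earlier slides; it is one plausible way to obtain such features, not the feature extractor used in the evaluation, and the NaiveFeatures class and its method names are made up. Compiler-generated (synthetic) members are skipped.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical feature extractor for illustration.
class NaiveFeatures {
    /** The example class from the Field-Focused Graphs slides. */
    static class Sequence {
        private volatile int seq;
        private int MAX;
        public Sequence(int m) { MAX = m; reset(); }
        synchronized public int next() { if (!isMax()) return seq++; return -1; }
        boolean isMax() { return seq > MAX; }
        void reset() { seq = 0; }
    }

    /** Percentage of declared (non-synthetic) fields that are volatile. */
    static double volatileFieldPercent(Class<?> c) {
        int total = 0, matching = 0;
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue;
            total++;
            if (Modifier.isVolatile(f.getModifiers())) matching++;
        }
        return total == 0 ? 0.0 : 100.0 * matching / total;
    }

    /** Percentage of declared (non-synthetic) methods that are synchronized. */
    static double synchronizedMethodPercent(Class<?> c) {
        int total = 0, matching = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            total++;
            if (Modifier.isSynchronized(m.getModifiers())) matching++;
        }
        return total == 0 ? 0.0 : 100.0 * matching / total;
    }

    public static void main(String[] args) {
        System.out.println("volatile fields %: " + volatileFieldPercent(Sequence.class));
        System.out.println("synchronized methods %: " + synchronizedMethodPercent(Sequence.class));
    }
}
```

For Sequence, one of two fields is volatile and one of three methods is synchronized; constructors are not counted as methods by getDeclaredMethods().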

slide-43
SLIDE 43

13

Efficiency of TSFinder

Training
    One-time effort
    All steps: 11.7 minutes
    Model graphs (230 classes): 0.6 MB

Classifying new class
    On average over 230 classes: 3 seconds
    Graph extraction dominates classification time

slide-44
SLIDE 44

14

Conclusion

The state of the art of thread-safety documentation is poor

TSFinder uses machine learning to infer documentation

TSFinder infers thread-safety documentation with an accuracy of 94.5%