How Many of All Bugs Do We Find? A Study of Static Bug Detectors



slide-1
SLIDE 1

1

Andrew Habib, Michael Pradel TU Darmstadt, Germany

software-lab.org

How Many of All Bugs Do We Find? A Study of Static Bug Detectors

slide-3
SLIDE 3

2

Static Bug Detection

General framework
Scalable static analysis
Set of checkers for specific bug patterns

Error Prone

slide-6
SLIDE 6

3

How Many Bugs Do They Find?

Given a representative set of real-world bugs, how many of them do static bug detectors find?

This talk: Empirical study with 594 real-world Java bugs and 3 popular static checkers

slide-7
SLIDE 7

4

Real-World Bugs

594 bugs from 15 popular Java projects
Extended version of Defects4J data set

Why this set?
  Gathered independently
  Used in other bug-related studies *
  Contains real fixes by developers

* Just et al., 2014 (mutation testing); Shamshiri et al., 2015 (test generation); Pearson et al., 2017 (fault localization); Martinez et al., 2017 (program repair)

slide-8
SLIDE 8

5

Defects4J: Files Involved in Bug

Number of buggy files:   1     2    3    4    5    6    7    11
Number of bugs:          501   64   12   10   4    1    1    1

slide-9
SLIDE 9

6

Defects4J: Size of Bug Fix

Diff size between buggy and fixed versions (LoC)    Number of bugs
1-4                                                 296
5-9                                                 128
10-14                                               54
15-19                                               29
20-24                                               29
25-49                                               44
50-74                                               6
75-99                                               6
100-199                                             1
200-1,999                                           1

slide-10
SLIDE 10

7

Previous Approach

How to determine which bugs are found?

[Thung et al., 2012]

Get diff between buggy and fixed code
Run tool on code with buggy lines
If warning on buggy line: Bug found

Result: 50% – 95% of all bugs found

Limitation: No check that warning points to bug
One tool flags up to 57% of all lines

slide-15
SLIDE 15

8

Methodology: Overview

Bug detectors, bugs + fixes
-> Automated filtering of warnings (diff-based, fixed warnings-based, combined)
-> Candidates for detected bugs
-> Manual inspection of candidates
-> Detected bugs

slide-16
SLIDE 16

9

Methodology: Diff-based

Bug detectors, bugs + fixes
-> Automated filtering of warnings (diff-based)
-> Candidates for detected bugs
-> Manual inspection of candidates
-> Detected bugs

slide-21
SLIDE 21

9

Methodology: Diff-based

1) Identify lines changed to fix bug
2) Intersect with lines with warning

[Figure: buggy file next to fixed file, with lines marked as modified, removed, or newly inserted]

slide-23
SLIDE 23

9

Methodology: Diff-based

1) Identify lines changed to fix bug
2) Intersect with lines with warning

[Figure: buggy file with warnings by bug detector, next to fixed file; a warning on a changed line is a candidate for detected bug]
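
The two steps above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the tooling used in the study: the `Warning` class, the `candidates` method, the line numbers, and the "Unused variable" message are all made up for this example, and changed lines are assumed to be given as line numbers in the buggy file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DiffBasedFilter {

    /** Hypothetical warning: a line number in the buggy file plus the checker's message. */
    static final class Warning {
        final int line;
        final String message;

        Warning(int line, String message) {
            this.line = line;
            this.message = message;
        }

        @Override
        public String toString() {
            return "line " + line + ": " + message;
        }
    }

    /**
     * Step 2 of the diff-based approach: keep only the warnings reported
     * on a line that the bug fix changed (step 1 supplies those lines).
     */
    static List<Warning> candidates(Set<Integer> changedLines, List<Warning> warnings) {
        List<Warning> result = new ArrayList<>();
        for (Warning w : warnings) {
            if (changedLines.contains(w.line)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: the fix touched lines 12 and 13 of the buggy file.
        Set<Integer> changedLines = new HashSet<>(Arrays.asList(12, 13));
        // Assumed warnings; the second one is invented for illustration.
        List<Warning> warnings = Arrays.asList(
                new Warning(12, "Missing @Override"),
                new Warning(40, "Unused variable"));

        // Only the warning on line 12 overlaps with the fix.
        System.out.println(candidates(changedLines, warnings));
    }
}
```

A candidate produced this way only shows that a warning and a fix touch the same line; whether the warning actually describes the bug is left to the manual inspection step.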

slide-26
SLIDE 26

10

Example:

Buggy code:

    public Dfp multiply(final int x) {
        return multiplyFast(x);
    }

Bug fix:

    public Dfp multiply(final int x) {
        if (x >= 0 && x < RADIX) {
            return multiplyFast(x);
        } else {
            return multiply(newInstance(x));
        }
    }

Warning: Missing @Override
Candidate for detected bug

slide-27
SLIDE 27

11

Method.: Fixed Warnings-based

Bug detectors, bugs + fixes
-> Automated filtering of warnings (fixed warnings-based)
-> Candidates for detected bugs
-> Manual inspection of candidates
-> Detected bugs

slide-31
SLIDE 31

11

Method.: Fixed Warnings-based

1) Compare warnings before and after fix
2) Warning that disappears was for bug

[Figure: warnings by bug detector on the buggy file and on the fixed file; a warning that is gone after the fix is a candidate for detected bug]
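
A minimal Java sketch of this comparison is shown below. It assumes each warning can be reduced to a string key that stays stable across the two versions; that key format, the `Week.java` file name, the `disappeared` method, and the sample "Missing @Override" warning are assumptions of this example, not details taken from the study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FixedWarningsFilter {

    /**
     * Returns the warnings reported for the buggy version that are no
     * longer reported for the fixed version. Warnings are compared by
     * their string key (an assumption of this sketch).
     */
    static List<String> disappeared(List<String> before, List<String> after) {
        Set<String> stillReported = new HashSet<>(after);
        List<String> result = new ArrayList<>();
        for (String warning : before) {
            if (!stillReported.contains(warning)) {
                result.add(warning);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed warnings on the buggy version.
        List<String> before = Arrays.asList(
                "Week.java: Chaining constructor ignores argument",
                "Week.java: Missing @Override");
        // Assumed warnings on the fixed version.
        List<String> after = Arrays.asList(
                "Week.java: Missing @Override");

        // The warning that vanished with the fix becomes a candidate.
        System.out.println(disappeared(before, after));
    }
}
```

Comparing by key rather than by line number is a design choice of this sketch: a fix can shift line numbers, so a purely line-based comparison would report unrelated warnings as disappeared.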

slide-33
SLIDE 33

12

Example

Buggy code:

    public Week(Date time, TimeZone zone) {
        this(time, RegularTimePeriod.DEFAULT_TIME_ZONE, Locale.getDefault());
    }

Bug fix:

    public Week(Date time, TimeZone zone) {
        this(time, zone, Locale.getDefault());
    }

Warning: Chaining constructor ignores argument
Candidate for detected bug

slide-34
SLIDE 34

13

Methodology: Combined

Bug detectors, bugs + fixes
-> Automated filtering of warnings (combined = diff-based + fixed warnings-based)
-> Candidates for detected bugs
-> Manual inspection of candidates
-> Detected bugs
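
One plausible reading of "diff-based + fixed warnings-based" is a set union of the two candidate lists, so that a warning found by both approaches is inspected only once. The sketch below illustrates that reading; the `combine` method and the string keys are hypothetical, and treating "+" as a union is an assumption of this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CombinedFilter {

    /**
     * Merges the candidates of both approaches, dropping duplicates
     * while keeping first-seen order.
     */
    static Set<String> combine(List<String> diffBased, List<String> fixedWarningsBased) {
        Set<String> combined = new LinkedHashSet<>(diffBased);
        combined.addAll(fixedWarningsBased);
        return combined;
    }

    public static void main(String[] args) {
        // Assumed candidate keys; "w2" is found by both approaches.
        List<String> diffBased = Arrays.asList("w1", "w2");
        List<String> fixedWarningsBased = Arrays.asList("w2", "w3");

        System.out.println(combine(diffBased, fixedWarningsBased));
    }
}
```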

slide-35
SLIDE 35

14

Results

slide-38
SLIDE 38

15

Warnings to Inspect

               All warnings                      Candidates
               Per bug                 Total     only
Tool           Min    Max    Avg
Error Prone           148    7.58      4,402     53
Infer                 36     0.33      198       32
SpotBugs              47     1.1       647       68
Total                                  5,247     153

97% of all warnings are removed by the automated filtering step
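
The 97% figure follows from the two totals in the table: 153 of 5,247 warnings remain as candidates. A short Java check of that arithmetic, with the class and method names chosen only for this example:

```java
import java.util.Locale;

final class FilteringRate {

    /** Percentage of warnings that the automated filtering removes. */
    static double removedPercent(int allWarnings, int candidates) {
        return 100.0 * (allWarnings - candidates) / allWarnings;
    }

    public static void main(String[] args) {
        // Per-tool totals from the table: Error Prone, Infer, SpotBugs.
        int allWarnings = 4402 + 198 + 647; // 5,247
        int candidates = 53 + 32 + 68;      // 153

        System.out.println(String.format(Locale.ROOT,
                "%.1f%% of warnings removed", removedPercent(allWarnings, candidates)));
    }
}
```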

slide-39
SLIDE 39

16

Manual Inspection

Candidate = (bug, warning)
Distinguish coincidental matches from actually detected bugs

Full match
Partial match
Mismatch


slide-41
SLIDE 41

17

Manual Inspection: Example

Buggy code:

    public Dfp multiply(final int x) {
        return multiplyFast(x);
    }

Bug fix:

    public Dfp multiply(final int x) {
        if (x >= 0 && x < RADIX) {
            return multiplyFast(x);
        } else {
            return multiply(newInstance(x));
        }
    }

Warning: Missing @Override
Mismatch

slide-43
SLIDE 43

18

Manual Inspection: Example (2)

Buggy code:

    public Week(Date time, TimeZone zone) {
        this(time, RegularTimePeriod.DEFAULT_TIME_ZONE, Locale.getDefault());
    }

Bug fix:

    public Week(Date time, TimeZone zone) {
        this(time, zone, Locale.getDefault());
    }

Warning: Chaining constructor ignores argument
Full match

slide-44
SLIDE 44

19

Most Bugs are Missed

Three tools together: Detect 27 of 594 bugs (less than 5%)

[Venn diagram of bugs detected by SpotBugs, Error Prone, and Infer; region counts: 14, 6, 3, 0, 0, 2, 2]
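
The headline numbers can be checked with a few lines of Java: the seven region counts add up to 27, and 27 of 594 is about 4.5%. The class and method names below are chosen only for this example.

```java
import java.util.Locale;

final class DetectionRate {

    /** Adds up the per-region counts of the Venn diagram. */
    static int sum(int... counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    /** Share of detected bugs, in percent. */
    static double percent(int detected, int total) {
        return 100.0 * detected / total;
    }

    public static void main(String[] args) {
        int detected = sum(14, 6, 3, 0, 0, 2, 2);
        System.out.println(String.format(Locale.ROOT,
                "%d of 594 bugs detected (%.1f%%)", detected, percent(detected, 594)));
    }
}
```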

slide-45
SLIDE 45

20

Why are Most Bugs Missed?

Manual inspection of random sample of 20 missed bugs: 14 are domain-specific

Unrelated to any of the supported bug patterns
Application-specific algorithms
Forgot to handle special case
Difficult to decide whether behavior is intended

slide-47
SLIDE 47

21

Why are Most Bugs Missed? (2)

Manual inspection of random sample of 20 missed bugs: 6 are near misses

Root cause is targeted by bug detector, but current implementation misses the bug
Detector targets similar, but not the same, problem

slide-48
SLIDE 48

22

Conclusion

Novel methodology to measure how many of a set of bugs are detected
Popular static bug detectors miss most bugs
Main reason: Domain-specific bugs vs. generic bug patterns
Huge potential for future work on bug detection

slide-49
SLIDE 49

23

Implications for Future Work

Huge potential for:

Bug detectors that catch domain-specific bugs
More sophisticated yet precise static analyses
Generalizations of existing bug checkers
Bug finding techniques other than static analysis, e.g., test generation