SLIDE 1

Scent Intensification for Testing & Debugging

Rui Abreu

SLIDE 2

Economic Relevance

  • [Embedded] software
  • Exponential increase in LOC
  • Despite thorough design and testing, fault density remains constant
  • Typically 5-15 bugs / KLOC, 75 min / bug ➤ $4K / KLOC
  • Development cost $15-30K / KLOC ➤ 15-25% diagnostic cost
  • Residual defects cost the US $60B/year [NIST 2002]
  • An estimated 20% is due to fault diagnosis (downtime, labor)

SLIDE 3

The birth of debugging: your guess?

SLIDE 4

Software errors mentioned in Ada Byron’s notes on Charles Babbage’s Analytical Engine

[Timeline, 1840 to 2015: 1840]

SLIDE 5

First actual bug and actual debugging: Admiral Grace Hopper’s associates, working on the Mark II computer at Harvard University

[Timeline, 1840 to 2015: 1840, 1947]

SLIDE 6

UNIVAC 1100’s FLIT: Fault Localization by Interpretive Testing

[Timeline, 1840 to 2015: 1840, 1947, 1962]

SLIDE 7

Weiser’s breakthrough paper (program slicing). Input: source code and program point

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981]

SLIDE 8

Stallman’s GDB. Input: faulty program and 1 failed test case

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986]

SLIDE 9

Korel and Laski’s dynamic slicing; Agrawal. Input: source code and failed test case

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993]

SLIDE 10

DDD. Input: faulty program and failed test case

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993]

SLIDE 11

Delta Debugging. Input: faulty program, 1 failed and 1 passed test case

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996]

SLIDE 12

Statistical Debugging. Input: faulty program, test suite

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996, 2002]

SLIDE 13

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996, 2002, 2007]

EZUNIT

SLIDE 14

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996, 2002, 2007, 2009]

VIDA

SLIDE 15

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996, 2002, 2007, 2009, 2011/12]

SLIDE 16

[Timeline, 1840 to 2015: 1840, 1947, 1962, 1981, 1986, 1988, 1993, 1996, 2002, 2007, 2009, 2011/12]

Also, a survey paper citing more than 300 works is under review at TSE.

SLIDE 17

Focus of this talk

  • Techniques that take program spectra into account
  • Spectra, aka abstractions of program traces
  • Spectrum-based Fault Localization (SFL)
  • Statistical vs. reasoning-based approaches
  • Lightweight, scalable

SLIDE 18

SFL: Principle (1)

[Figure: components 1 to 12, all hit counts still 0; legend: Not touched, Touched (pass), Touched (fail)]

Test suite t1 t2 t3 t4 t5

Integrates well with testing

SLIDE 19

SFL: Principle (2)

Test suite: t2 t3 t4 t5

[Figure: components 1 to 12 with hit counts updated after running t1; legend: Not touched, Touched (pass), Touched (fail)]

Status t1 !

Integrates well with testing

SLIDE 20

Test suite: t3 t4 t5

[Figure: components 1 to 12 with hit counts updated after running t1 and t2; legend: Not touched, Touched (pass), Touched (fail)]

Status t1 ! t2 !

SFL: Principle (3)

Integrates well with testing

SLIDE 21

Test suite: t4 t5

[Figure: components 1 to 12 with hit counts updated after running t1 to t3; legend: Not touched, Touched (pass), Touched (fail)]

Status t1 ! t2 ! t3 "

SFL: Principle (4)

Integrates well with testing

SLIDE 22

Test suite: t5

[Figure: components 1 to 12 with hit counts updated after running t1 to t4; legend: Not touched, Touched (pass), Touched (fail)]

Status t1 ! t2 ! t3 " t4 !

SFL: Principle (5)

Integrates well with testing

SLIDE 23

Test suite: (all tests executed)

[Figure: components 1 to 12 with their final hit counts after running t1 to t5; legend: Not touched, Touched (pass), Touched (fail)]

Status t1 ! t2 ! t3 " t4 ! t5 "

SFL: Principle (6)

Integrates well with testing

SLIDE 24

[Figure: components 1 to 12 with their final hit counts; legend: Not touched, Touched (fail)]

Status t1 ! t2 ! t3 " t4 ! t5 "

SFL: Principle (7)

Components are ranked according to the likelihood of causing detected errors

Integrates well with testing
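The bookkeeping animated on the SFL: Principle slides can be sketched in Java, the language of the Triangle example below. This is a minimal sketch under stated assumptions: the class name `SpectrumCounter` and the coverage vectors are hypothetical, and real tools collect coverage by instrumenting the program rather than receiving it as arrays.

```java
import java.util.Arrays;

// Hypothetical helper: per-component counts of passing and failing tests that touched it.
class SpectrumCounter {
    final int[] touchedPass;   // per component: passing tests that executed it
    final int[] touchedFail;   // per component: failing tests that executed it
    int passed = 0;
    int failed = 0;

    SpectrumCounter(int components) {
        touchedPass = new int[components];
        touchedFail = new int[components];
    }

    // Records one test run: which components it touched and whether it failed.
    void record(boolean[] touched, boolean testFailed) {
        if (touched.length != touchedPass.length) {
            throw new IllegalArgumentException("coverage vector has the wrong length");
        }
        for (int j = 0; j < touched.length; j++) {
            if (!touched[j]) continue;            // "not touched": nothing to count
            if (testFailed) touchedFail[j]++; else touchedPass[j]++;
        }
        if (testFailed) failed++; else passed++;
    }

    public static void main(String[] args) {
        SpectrumCounter sfl = new SpectrumCounter(4);                // 4 hypothetical components
        sfl.record(new boolean[] {true, true, false, false}, true);  // t1 fails
        sfl.record(new boolean[] {true, false, true, false}, false); // t2 passes
        sfl.record(new boolean[] {false, true, true, true}, true);   // t3 fails
        System.out.println("touched by failing tests: " + Arrays.toString(sfl.touchedFail));
        System.out.println("touched by passing tests: " + Arrays.toString(sfl.touchedPass));
    }
}
```

In this toy run the second component is touched by both failing tests and by no passing test, which is the pattern a similarity coefficient turns into a high suspiciousness score.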

SLIDE 25

[Figure labels: Fault, Program, Spectra, Test Suite; coverage columns for test cases t1 to t6]

Line  Statement                                                Suspiciousness
      class Triangle {… static int type(int a, int b, int c) {
1       int type = SCALENE;                                    0.09998
2       if ( (a == b) && (b == c) )                            0.09998
3         type = EQUILATERAL;                                  0.10001
4       else if ( (a*a) == ((b*b) + (c*c)) )                   0.09999
5         type = RIGHT;                                        0.10001
6       else if ( (a == b) || (b == a) ) /* FAULT */           0.10000
7         type = ISOSCELES;                                    0.10001
8       return type; }                                         0.09998
      static double area(int a, int b, int c) {
9       double s = (a+b+c)/2.0;                                0.10000
10      return Math.sqrt(s*(s-a)*(s-b)*(s-c)); } ... }         0.10000

SLIDE 26
SLIDE 26

Suspiciousness score

  • Each component (row) is ranked according to its similarity to the error vector
  • Many similarity coefficients exist
  • Ochiai similarity is equivalent to the cosine of the angle between two vectors (the component's hit vector and the error vector) in an n-dimensional space

Abreu, R., Zoeteweij, P., Golsteijn, R., & Van Gemund, A. J. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780-1792.
Lucia, L., Lo, D., Jiang, L., Thung, F., & Budi, A. (2014). Extended comprehensive study of association measures for fault localization. Journal of Software: Evolution and Process, 26(2).
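For a component j, let n11 be the number of failing tests that touch j, n10 the number of passing tests that touch j, and n01 the number of failing tests that do not touch j. Ochiai is then n11 / sqrt((n11 + n01) · (n11 + n10)), which is exactly the cosine between j's binary hit vector and the error vector. The sketch below computes it both ways; the names `Ochiai`, `score`, and `cosine` are illustrative, and the spectra are made up.

```java
// Illustrative sketch of the Ochiai coefficient and its cosine interpretation.
class Ochiai {
    // n11: failing tests touching j, n10: passing tests touching j, n01: failing tests not touching j
    static double score(int n11, int n10, int n01) {
        double denom = Math.sqrt((double) (n11 + n01) * (n11 + n10));
        return denom == 0.0 ? 0.0 : n11 / denom;  // convention: 0 if never touched or no failures
    }

    // Same value computed as the cosine between a component's hit vector and the error vector.
    static double cosine(int[] column, int[] errors) {
        double dot = 0, a = 0, b = 0;
        for (int i = 0; i < column.length; i++) {
            dot += column[i] * errors[i];
            a += column[i] * column[i];
            b += errors[i] * errors[i];
        }
        return (a == 0 || b == 0) ? 0.0 : dot / (Math.sqrt(a) * Math.sqrt(b));
    }

    public static void main(String[] args) {
        int[] errors = {1, 1, 0, 1, 0};  // hypothetical outcomes, 1 = failed test
        int[] column = {1, 0, 1, 1, 0};  // hypothetical coverage of one component
        // n11 = 2 (tests 1 and 4), n10 = 1 (test 3), n01 = 1 (test 2)
        System.out.println(score(2, 1, 1));
        System.out.println(cosine(column, errors));
    }
}
```

Both calls print approximately 0.667 (2 / 3). Returning 0 when the denominator vanishes is a convention chosen here, not something the slides prescribe.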

SLIDE 27

Diagnostic Performance

Rank  Suspicious statement                               Line  Suspiciousness
1º    type = EQUILATERAL;                                3     0.10001
2º    type = RIGHT;                                      5     0.10001
3º    type = ISOSCELES;                                  7     0.10001
4º    else if ( (a == b) || (b == a) ) /* FAULT */       6     0.10000
5º    double s = (a+b+c)/2.0;                            9     0.10000
6º    return Math.sqrt(s*(s-a)*(s-b)*(s-c));             10    0.10000
7º    else if ( (a*a) == ((b*b) + (c*c)) )               4     0.09999
8º    int type = SCALENE;                                1     0.09998
9º    if ( (a == b) && (b == c) )                        2     0.09998
10º   return type; }                                     8     0.09998

Cd = 4

  • R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09
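The ranking above can be reproduced from the suspiciousness values alone. This sketch assumes ties are broken by ascending line number (which yields the order in the table) and takes the diagnostic cost simply as the 1-based position of the faulty line; the names `Ranking` and `diagnosticCost` are hypothetical, and published definitions of Cd may treat ties differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: rank statements by suspiciousness and locate the fault.
class Ranking {
    // Line numbers (1-based) sorted by descending suspiciousness; ties go to the lower line number.
    static List<Integer> rank(double[] suspiciousness) {
        List<Integer> lines = new ArrayList<>();
        for (int i = 0; i < suspiciousness.length; i++) lines.add(i + 1);
        lines.sort(Comparator.comparingDouble((Integer l) -> -suspiciousness[l - 1])
                .thenComparingInt(l -> l));
        return lines;
    }

    // Position (1-based) of the faulty line in the ranking.
    static int diagnosticCost(double[] suspiciousness, int faultyLine) {
        return rank(suspiciousness).indexOf(faultyLine) + 1;
    }

    public static void main(String[] args) {
        // Suspiciousness of lines 1..10 of the Triangle example, copied from the table.
        double[] s = {0.09998, 0.09998, 0.10001, 0.09999, 0.10001,
                      0.10000, 0.10001, 0.09998, 0.10000, 0.10000};
        System.out.println(rank(s));              // [3, 5, 7, 6, 9, 10, 4, 1, 2, 8]
        System.out.println(diagnosticCost(s, 6)); // 4
    }
}
```

With the slide's scores, the faulty line 6 lands in position 4, matching Cd = 4.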
SLIDE 28

Can we do better?

  • Statistics-based SFL does not reason in terms of multiple faults

c1 c2 c3 P/F 1 1 (F) 1 1 (F) 1 1 1 (F) 1 1 1 (F) 1 1 0 (P)

Diagnostic report = < c3, c1, c2 >

SLIDE 29

Reasoning-based Approach

  • Barinel is a reasoning-based approach
  • Integrates the best of model-based diagnosis with spectra

c1 c2 c3 P/F 1 1 (F) 1 1 (F) 1 1 1 (F) 1 1 1 (F) 1 1 0 (P)

c1 must be faulty; c2 cannot be a single fault; c3 cannot be a single fault; {c2, c3} cannot be a double fault

SLIDE 30

Reasoning-based Approach

  • Barinel is a reasoning-based approach
  • Integrates the best of model-based diagnosis with spectra

c1 c2 c3 P/F 1 1 (F) 1 1 (F) 1 1 1 (F) 1 1 1 (F) 1 1 0 (P)

c2 must be faulty; c1 cannot be a single fault; c3 cannot be a single fault; {c1, c3} cannot be a double fault

SLIDE 31

Reasoning-based Approach

  • Barinel is a spectrum-based reasoning approach
  • Integrates the best of model-based diagnosis with spectra

c1 c2 c3 P/F 1 1 (F) 1 1 (F) 1 1 1 (F) 1 1 1 (F) 1 1 0 (P)

Summary: c1 and c2 are faulty, but neither is a single fault. {c1, c2} can be a double fault, while neither {c1, c3} nor {c2, c3} can. So {c1, c2} is the only possible diagnosis (it subsumes the triple fault {c1, c2, c3}).
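The reasoning on these slides amounts to computing the minimal hitting sets of the components touched by each failing test. Below is a brute-force sketch in Java. It is exponential in the number of components and therefore only suited to toy inputs; it is not the Staccato or MHS2 algorithm. The failing-test spectra {c1}, {c2}, {c1, c3}, {c2, c3} are a hypothetical example consistent with the conclusions above, and `HittingSets` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force minimal hitting sets; bit j of a mask stands for component c(j+1).
class HittingSets {
    static List<Integer> minimal(int[] conflicts, int components) {
        List<Integer> result = new ArrayList<>();
        // Visit candidate sets by increasing size so that minimal ones are found first.
        for (int size = 1; size <= components; size++) {
            for (int cand = 1; cand < (1 << components); cand++) {
                if (Integer.bitCount(cand) != size) continue;
                boolean hitsAll = true;
                for (int c : conflicts) {
                    if ((cand & c) == 0) { hitsAll = false; break; }  // misses a failing test
                }
                boolean subsumed = false;
                for (int d : result) {
                    if ((cand & d) == d) { subsumed = true; break; }  // superset of a known diagnosis
                }
                if (hitsAll && !subsumed) result.add(cand);
            }
        }
        return result;
    }

    static String show(int set, int components) {
        StringBuilder sb = new StringBuilder("{");
        for (int j = 0; j < components; j++) {
            if ((set & (1 << j)) != 0) sb.append(sb.length() > 1 ? "," : "").append("c").append(j + 1);
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // Hypothetical failing-test spectra: {c1}, {c2}, {c1,c3}, {c2,c3}.
        int[] conflicts = {0b001, 0b010, 0b101, 0b110};
        for (int d : minimal(conflicts, 3)) System.out.println(show(d, 3)); // {c1,c2}
    }
}
```

The only diagnosis printed is {c1,c2}; {c1,c2,c3} also hits every failing test but is skipped as a superset, mirroring the "subsuming" remark in the summary.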

SLIDE 32

Spectrum-based reasoning

  • 1. Generate sets of components that explain the observed erroneous behavior
  • Equivalent to computing minimal hitting sets (Staccato/MHS2**)
  • Given failed executions
  • 2. Rank candidates according to their probability of being the true fault explanation ➤ Bayes’ rule
  • Given both passed and failed executions
  • R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09

**https://github.com/npcardoso/MHS2 (citable via https://zenodo.org/record/10037) ➤ contribute to the project; send pull requests; email us!
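Step 2 can be illustrated with a deliberately simplified Bayesian ranking in the spirit of Barinel. Assumptions, all hypothetical: a prior of p = 0.01 per component, a single shared goodness g (the probability that a faulty component still behaves correctly) chosen by a coarse grid search, and a three-test example whose minimal hitting sets are {c1} and {c2, c3}. Barinel itself estimates per-component health parameters, so treat this as a sketch of the idea rather than of the tool; `BayesRanking` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified Bayes' rule ranking of diagnosis candidates (bit j = component c(j+1)).
class BayesRanking {
    // Pr(observations | candidate) for goodness g: a test passes with probability g^k,
    // where k is the number of candidate components it touches.
    static double likelihood(int cand, int[] rows, boolean[] failed, double g) {
        double l = 1.0;
        for (int i = 0; i < rows.length; i++) {
            int k = Integer.bitCount(rows[i] & cand);  // candidate components touched by test i
            double pass = Math.pow(g, k);               // all k of them behave correctly
            l *= failed[i] ? 1.0 - pass : pass;
        }
        return l;
    }

    static Map<Integer, Double> posterior(int[] cands, int[] rows, boolean[] failed, int m, double p) {
        Map<Integer, Double> post = new LinkedHashMap<>();
        double total = 0.0;
        for (int cand : cands) {
            int size = Integer.bitCount(cand);
            double prior = Math.pow(p, size) * Math.pow(1 - p, m - size);  // Pr(d)
            double best = 0.0;
            for (int step = 0; step <= 100; step++) {  // coarse grid over g in [0, 1]
                best = Math.max(best, likelihood(cand, rows, failed, step / 100.0));
            }
            post.put(cand, prior * best);
            total += prior * best;
        }
        if (total > 0) {
            for (Map.Entry<Integer, Double> e : post.entrySet()) {
                e.setValue(e.getValue() / total);      // divide by Pr(obs), the normalizer
            }
        }
        return post;
    }

    public static void main(String[] args) {
        // Hypothetical spectra: t1 {c1,c2} fails, t2 {c1,c3} fails, t3 {c1} passes.
        int[] rows = {0b011, 0b101, 0b001};
        boolean[] failed = {true, true, false};
        int[] candidates = {0b001, 0b110};             // minimal hitting sets {c1}, {c2,c3}
        System.out.println(posterior(candidates, rows, failed, 3, 0.01));
    }
}
```

On this input {c1} receives most of the posterior mass (about 0.94): its larger prior as a single fault outweighs the passing test t3, which lowers its likelihood but does not exonerate it.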

SLIDE 33

Diagnostic Performance

[Figure: % of faulty versions (25 to 100) against effort, the % of the program to be examined to find the fault (20 to 100); reference curves for the worst technique and the ideal technique]

SLIDE 34
SLIDE 34

[Figure: % of faulty versions against effort (% of the program examined, 10 to 100) for Intersection, Union, NN, DD, Tarantula, Ochiai, Sober, CrossTab, PPDG, and Barinel]

SLIDE 35

No similarity coefficient is statistically significantly better!

SLIDE 36

How good are we?

  • The best-performing techniques still require inspecting 10% of the code…
  • 100 LOC ➤ 10 LOC
  • 10,000 LOC ➤ 1,000 LOC
  • 1,000,000 LOC ➤ 100,000 LOC

SLIDE 37
SLIDE 37

Case Studies (NXP)

Case               To inspect             Out of / previous
Load Problem       2 logical threads      315
Teletext Lock-Up   2 blocks               60K
NVM corrupt        96 blocks, 10 files    150K, 1.8K
Scrolling Bug      5 blocks               150K
Invisible Pages    12 blocks              150K
Tuner Problem      2 files                1.8K
Zapping Crash      1 run (15 mins)        1 day (develop)
Wrong Audio        1 run (15 mins)        ½ day (expert)

SLIDE 38
Hmm…

  • Are we properly quantifying diagnostic accuracy?
  • Comparing techniques based on the rankings
  • Assuming perfect bug understanding
  • Are we providing an ecosystem that offers these techniques?

SLIDE 39

Human Studies

Parnin and Orso observed that there is a lack of human studies! (ISSTA’11)

SLIDE 40

Crowbar

http://www.crowbar.io

Previously known as GZoltar

SLIDE 41

SLIDE 42

SLIDE 43

Visualizations

SLIDE 44
User Study: Setup

  • 40 participants
  • Goal: compare GZoltar against the IDE’s built-in features
  • Program: Xtream
  • 17,389 LOC
  • 306 classes and 22 packages
  • 1,418 unit test cases
  • 1 injected logical fault

Gouveia, C., Campos, J., & Abreu, R. Using HTML5 visualizations in software fault localization. VISSOFT’13

SLIDE 45

User Study: Results

RQ1: Do the proposed visualizations efficiently help users find a fault quickly?

SLIDE 46

User Study: Results

RQ2: Is Crowbar a usable toolset?

[Figure: questionnaire ratings on a 0 to 5 scale for: font size/shape; intuitive icons/buttons; information clearly organized; tasks quickly/easily executed; usefulness of warnings; GZoltar response speed; no user experience needed; IDE integration relevance; importance of visual debugging; GZoltar global experience]

SLIDE 47

Importance of Testing

[Figure: the Triangle program with its test coverage (t1 to t6) and suspiciousness scores, as on slide 25]

“A confounding factor for the usefulness of SFL is the dependency on the quality of the existing test suite”

SLIDE 48

Diagnostic Performance

[Table: suspiciousness ranking of the Triangle statements, as on slide 27; the faulty statement (line 6) is ranked 4º]

Cd = 4

  • R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09
SLIDE 49

$$H(D) = -\sum_{d_k \in D} \Pr(d_k) \cdot \log_2\big(\Pr(d_k)\big), \qquad 0 \le H \le \log_2(M)$$

  • A. Gonzalez-Sanchez, R. Abreu, H.-G. Gross, and A. J. van Gemund, “Spectrum-Based Sequential Diagnosis”, AAAI ’11
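A sketch of the entropy formula in Java, assuming for illustration that the suspiciousness scores are normalized to sum to 1 and read as Pr(d_k). With the ten nearly identical scores of the Triangle example the report is almost uniform, so H approaches its upper bound log2(10). The class name `ReportEntropy` is hypothetical.

```java
// Illustrative entropy of a diagnostic report, treating normalized scores as probabilities.
class ReportEntropy {
    static double entropy(double[] scores) {
        double sum = 0.0;
        for (double s : scores) sum += s;
        double h = 0.0;
        for (double s : scores) {
            if (s <= 0.0) continue;                    // 0 * log 0 is taken as 0
            double pr = s / sum;                       // normalize so probabilities sum to 1
            h -= pr * Math.log(pr) / Math.log(2);      // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        double[] triangle = {0.09998, 0.09998, 0.10001, 0.09999, 0.10001,
                             0.10000, 0.10001, 0.09998, 0.10000, 0.10000};
        System.out.printf("H = %.3f (upper bound log2(10) = %.3f)%n",
                entropy(triangle), Math.log(10) / Math.log(2));
    }
}
```

Both numbers print as 3.322, the same value the deck reports for the Triangle ranking: nearly equal scores mean the report carries almost no information about where the fault is.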
SLIDE 50

Measuring Entropy

[Table: suspiciousness ranking of the Triangle statements, as on slide 27]

H(D) = 3.322

SLIDE 51

The variety of the test cases is the major factor determining the uncertainty in the ranking

SLIDE 52

Density of a Test Suite

[Figure: the Triangle program with its test coverage (t1 to t6) and suspiciousness scores, as on slide 25]

ρ̄ = 0.4

  • A. Gonzalez-Sanchez, R. Abreu, H.-G. Gross, and A. J. van Gemund, “Prioritizing tests for fault localization through ambiguity group reduction”, ASE ’11
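Density can be computed directly from the hit matrix: ρ̄ is the fraction of 1s in the N × M matrix (N tests, M components), that is, the average share of components each test touches. A sketch with a hypothetical 2 × 5 matrix; `Density` is an illustrative name.

```java
// Illustrative density of a test suite: ones in the hit matrix divided by N * M.
class Density {
    static double density(boolean[][] hits) {
        int ones = 0, cells = 0;
        for (boolean[] row : hits) {         // one row per test
            for (boolean touched : row) {    // one column per component
                if (touched) ones++;
                cells++;
            }
        }
        return cells == 0 ? 0.0 : (double) ones / cells;
    }

    public static void main(String[] args) {
        boolean[][] hits = {                 // hypothetical: 2 tests x 5 components
            {true, true, false, false, false},
            {true, false, true, false, false},
        };
        System.out.println(density(hits));   // 4 / 10 = 0.4
    }
}
```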
SLIDE 53

$$IG(\bar{\rho}) = -\bar{\rho} \cdot \log_2(\bar{\rho}) - (1 - \bar{\rho}) \cdot \log_2(1 - \bar{\rho})$$

[Plot: IG(ρ̄) for ρ̄ from 0.0 to 1.0]

  • R. A. Johnson, “An information theory approach to diagnosis”, IRE Transactions on Reliability and Quality Control, no. 1, pp. 35–35, 1960
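The shape of the IG(ρ̄) curve can be checked numerically. The sketch below (hypothetical class name `InformationGain`) evaluates the formula at a few densities; it peaks at exactly 1 bit for ρ̄ = 0.5 and is symmetric around that point, which is why a density of 0.5 is the target of the entropy-based fitness function.

```java
// Illustrative evaluation of IG(rho) = -rho*log2(rho) - (1 - rho)*log2(1 - rho).
class InformationGain {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Binary entropy in bits; defined as 0 at rho = 0 and rho = 1.
    static double ig(double rho) {
        if (rho <= 0.0 || rho >= 1.0) return 0.0;
        return -rho * log2(rho) - (1 - rho) * log2(1 - rho);
    }

    public static void main(String[] args) {
        for (double rho : new double[] {0.1, 0.4, 0.5, 0.9}) {
            System.out.printf("IG(%.1f) = %.3f%n", rho, ig(rho));
        }
    }
}
```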
SLIDE 54

A fitness function based on entropy to guide search-based test generation and to optimize the quality of ranking reports

SLIDE 55

ENTBUG

[Diagram: EvoSuite generates a new test (t); the fitness |0.5 − ρ̄(T ∪ {t})| drives a yes/no decision on whether to add t to the test suite]

Campos, J., Abreu, R., Fraser, G., & d'Amorim, M. Entropy-based test generation for improved fault localization. ASE’13.
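A toy version of the accept/reject step in the diagram, assuming a candidate test is kept only when it strictly lowers |0.5 − ρ̄(T ∪ {t})|. The candidate coverage rows are hypothetical stand-ins for tests that EvoSuite would generate; the class and method names are illustrative, and ENTBUG's actual integration with EvoSuite's search is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy entropy-guided test selection: keep t only if rho(T ∪ {t}) moves closer to 0.5.
class EntropyGuidedSelection {
    static double density(List<boolean[]> suite) {
        int ones = 0, cells = 0;
        for (boolean[] row : suite) for (boolean b : row) { if (b) ones++; cells++; }
        return cells == 0 ? 0.0 : (double) ones / cells;
    }

    static double fitness(List<boolean[]> suite) { return Math.abs(0.5 - density(suite)); }

    static List<boolean[]> select(List<boolean[]> suite, List<boolean[]> candidates) {
        List<boolean[]> current = new ArrayList<>(suite);
        for (boolean[] t : candidates) {
            List<boolean[]> extended = new ArrayList<>(current);
            extended.add(t);
            if (fitness(extended) < fitness(current)) current = extended;  // "yes": add t
            // otherwise "no": discard t and try the next candidate
        }
        return current;
    }

    public static void main(String[] args) {
        List<boolean[]> suite = new ArrayList<>();
        suite.add(new boolean[] {true, false, false, false});   // rho(T) = 0.25
        List<boolean[]> candidates = List.of(
            new boolean[] {false, false, false, true},          // rho stays 0.25: rejected
            new boolean[] {true, true, true, false});           // rho becomes 0.5: accepted
        List<boolean[]> result = select(suite, candidates);
        System.out.printf("%d tests, rho = %.3f%n", result.size(), density(result));
    }
}
```

The greedy, one-candidate-at-a-time loop keeps the sketch short; a search-based generator would instead use the same fitness to steer which tests it creates.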

SLIDE 56

Each cell gives rank position / suspiciousness for the suite T and for T extended with tests t7, t8, and t9 (- means no value is shown).

Statement (line)                                    T             T + {t7}      T + {t7, t8}   T + {t7, t8, t9}
int type = SCALENE; (1)                             8 / 0.09998   6 / 0.03629   6 / 0.02354    5 / 0.04347
if ( (a == b) && (b == c) ) (2)                     9 / 0.09998   7 / 0.03629   7 / 0.02354    6 / 0.04347
type = EQUILATERAL; (3)                             1 / 0.10001   -             -              -
else if ( (a*a) == ((b*b) + (c*c)) ) (4)            7 / 0.09999   5 / 0.08466   3 / 0.10983    2 / 0.17391
type = RIGHT; (5)                                   2 / 0.10001   1 / 0.29033   1 / 0.37666    -
else if ( (a == b) || (b == a) ) /* FAULT */ (6)    4 / 0.10000   2 / 0.17204   2 / 0.22320    1 / 0.34782
type = ISOSCELES; (7)                               3 / 0.10001   -             -              -
return type; } (8)                                  10 / 0.09998  8 / 0.03629   8 / 0.02354    7 / 0.04347
double s = (a+b+c)/2.0; (9)                         5 / 0.10000   3 / 0.17204   4 / 0.10983    3 / 0.17391
return Math.sqrt(s*(s-a)*(s-b)*(s-c)); (10)         6 / 0.10000   4 / 0.17204   5 / 0.10983    4 / 0.17391

ρ̄                                                   0.400         0.457         0.475          0.500
H                                                   3.322         2.651         2.445          2.437 (−27%)
Cd                                                  4.000         2.000         1.000          0.000

Final suite: Cd = 0.0, ρ̄ = 0.500

SLIDE 57
  • Available as an Eclipse plug-in
  • A Visual Studio plug-in will be released soon
  • Also available as a library
  • Instrumentation and diagnosis
  • Testing features are yet to be deployed
  • Only test suite minimization is available

http://www.gzoltar.com

SLIDE 58
Let’s use it

  • Open Eclipse
  • Install Crowbar
  • Help ➤ Install New Software
  • http://crowbar.io/plugin/tarot/
  • Window ➤ Other… ➤ Crowbar Views ➤ Diagnostic Reports
  • Import the buggy Joda-Time (as a Maven project)
  • http://crowbar.io/plugin/tarot/buggy_yodatime.zip
  • Find the bug!

SLIDE 59
Opportunities and Challenges

  • Integration with software repository mining
  • Use the fitness function in test suite prioritization and minimization
  • Generation: how to solve the oracle problem?
  • Human in the loop
  • AutoSeer project: leverage program invariants
  • Explore idiosyncrasies of mobile devices

SLIDE 60

Questions?