SLIDE 1

A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction

Raimund Moser, Witold Pedrycz, Giancarlo Succi ICSE 08 Andres Bühlmann, April 2009

SLIDE 2

What is it about?

[Diagram: three prediction pipelines, each ending in a machine learner that classifies every file as defective or not defective:

  • Change metrics → machine learner → file is defective / not defective
  • Static code attributes → machine learner → file is defective / not defective
  • Change metrics + static code attributes → machine learner → file is defective / not defective]

SLIDE 3

Change Metrics

  • Describe how a file changed in the past
  • Examples: REVISIONS, REFACTORINGS, BUGFIXES, LOC_ADDED, ...

Static Code Attributes

  • Describe the current state of a file
  • Examples:
      Methods: number of method calls, nested block depth, ...
      Classes: number of methods, ...
      Files: number of interfaces, total lines of code, ...

SLIDE 4

The cost of making wrong decisions

Cost matrix (rows: true class, columns: predicted class):

                     Predicted: defect free    Predicted: defective
True: defect free    0 (true negative)         1 (false positive)
True: defective      c (false negative)        0 (true positive)

with c > 1: missing a defective file costs more than a false alarm.

SLIDE 5

Assessing classification accuracy

Confusion matrix entries: True Negative (TN), False Positive (FP), False Negative (FN), True Positive (TP).

Percentage of correctly predicted files: PC = (TP + TN) / (TP + TN + FP + FN)

True positive rate (recall): TP / (TP + FN)

False positive rate: FP / (FP + TN)
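These three rates drive all of the result charts below. A minimal sketch in Python (not from the paper; the argument names follow the confusion matrix above):

```python
def classification_rates(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy measures computed from a confusion matrix."""
    return {
        # PC: percentage of correctly predicted files
        "PC": (tp + tn) / (tp + tn + fp + fn),
        # True positive rate (recall): share of defective files caught
        "recall": tp / (tp + fn),
        # False positive rate: share of defect-free files wrongly flagged
        "fp_rate": fp / (fp + tn),
    }
```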

SLIDE 6

Experimental set-up

Three metric sets:

  • 18 change metrics
  • 31 static code attributes
  • 31 static code attributes + 18 change metrics

Release   #Files   Defective (post-release)
2.0       3851     31%
2.1       5341     23%
3.0       5347     32%

H0: Code metrics have the same prediction accuracy as change metrics for (cost-sensitive) defect prediction.

Machine learners:

  • Naive Bayes
  • Logistic Regression
  • Decision Tree (J48)
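A hedged sketch of this set-up in Python with scikit-learn. The paper does not use scikit-learn, the data below is random stand-in data, and DecisionTreeClassifier (CART) only approximates J48 (C4.5); the sketch just shows the shape of the experiment: one feature matrix per metric set, three learners, 10-fold cross-validation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: one row per file (metric values), one label per file
# (1 = defective post-release, 0 = defect free).
rng = np.random.default_rng(0)
X = rng.random((500, 18))           # e.g. the 18 change metrics
y = rng.integers(0, 2, size=500)    # stand-in defect labels

learners = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree (J48-like)": DecisionTreeClassifier(),
}

for name, clf in learners.items():
    # 10-fold cross-validation, as mentioned on Slide 11
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```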

SLIDE 7

The cost of making wrong decisions

(The cost matrix from Slide 4 again: a false positive costs 1, a false negative costs c, with c > 1.)

SLIDE 8

Results (cost insensitive)

[Bar chart, Release 3.0: PC, TP rate, and FP rate (in %) for the J48 learner, trained on change metrics, code metrics, and change + code metrics.]

SLIDE 9

Results (cost sensitive)

[Bar chart, Release 3.0 with c = 5: PC, TP rate, and FP rate (in %) for the J48 learner, trained on change metrics, code metrics, and change + code metrics.]

SLIDE 10

Their results and conclusions

H0: Code metrics have the same prediction accuracy as change metrics for (cost-sensitive) defect prediction.

Findings:

  • Change metrics have better prediction accuracy than code metrics (H0 rejected)
  • Cost-sensitive classification improves recall

Caveats:

  • The results may not generalize
  • It is unclear whether the right metrics were used
  • The approach does not perform well in an iterative setting
  • There may be errors in the extracted code and change metrics
SLIDE 11

Discussion

What is nice

  • Different machine learners
  • Three releases
  • 10-fold cross-validation
  • Significance analysis
  • Cost-sensitive analysis

What I didn't like

  • No results for iterative defect prediction
  • Not clear how change metrics were extracted
  • Change metrics not available
SLIDE 12

Further Thoughts

The machine learners tell us which metrics are good indicators for defects. Example: files that were involved in a lot of bug-fixing activity are the most likely to be defective. Can we conclude something from this statement? At least we can ask further questions:

  • Is this file new?
  • Do these files contain a lot of complex code?
  • Are bugs fixed by the initial author?
  • Are our documentation and/or comments insufficient?

→ We should investigate how statistical analysis of detected bugs could be used to improve our process and design decisions.

SLIDE 13
SLIDE 14

How to calculate the metrics?

[Diagram: the CVS repository and the bug database are mined to produce the change metrics, the static code attributes, and the list of defective files.]

SLIDE 15

The cost of making wrong decisions

Cost matrix: a false positive costs 1, a false negative costs c, with c > 1.

Expected loss of each possible prediction for a file, given the class probabilities:

L(predict defective) = P(true class = defect free) * 1 + P(true class = defective) * 0

L(predict defect free) = P(true class = defect free) * 0 + P(true class = defective) * c

The classifier picks the prediction with the lower expected loss, so it outputs "defective" whenever c * P(defective) > P(defect free).
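A minimal Python sketch of the resulting decision rule, assuming a classifier that outputs P(defective) for a file (the paper instead feeds the cost matrix to cost-sensitive learners, so this illustrates the principle, not their implementation):

```python
def predict_defective(p_defective: float, c: float = 5.0) -> bool:
    """Cost-sensitive decision: predict 'defective' when that choice
    has the lower expected loss.

    L(predict defective)   = (1 - p) * 1 + p * 0 = 1 - p
    L(predict defect free) = (1 - p) * 0 + p * c = c * p
    """
    return c * p_defective > 1 - p_defective

# With c = 5 a file is flagged once P(defective) exceeds 1/6 ≈ 0.17,
# instead of the usual 0.5 threshold: this is why recall goes up.
```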

SLIDE 16

How does the cost affect the prediction?

SLIDE 17

Which metrics have the most predictive power?

Powerful defect indicators:

  • High number of revisions
  • High number of bug-fixing activities
  • Small-sized CVS commits
  • Small number of refactorings
SLIDE 18

Static Code Attributes

Methods:

Metric name   Definition
FOUT          Number of method calls (fan out)
MLOC          Method lines of code
NBD           Nested block depth
PAR           Number of parameters
VG            McCabe cyclomatic complexity

Classes:

Metric name   Definition
NOF           Number of fields
NOM           Number of methods
NSF           Number of static fields
NSM           Number of static methods

Files:

Metric name   Definition
ACD           Number of anonymous type declarations
NOI           Number of interfaces
NOT           Number of classes
TLOC          Total lines of code

See Zimmermann p. 2
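For illustration only, a small Python sketch computing rough analogues of two of these attributes (NOM, TLOC) for Python source; the paper's metrics are for Java and were extracted with different tooling:

```python
import ast

def static_attributes(source: str) -> dict:
    """Rough analogues of NOM (number of methods) and TLOC
    (total lines of code) for a Python source file."""
    tree = ast.parse(source)
    # Count every (async) function definition as a "method"
    nom = sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
              for node in ast.walk(tree))
    # Count non-blank lines as total lines of code
    tloc = sum(1 for line in source.splitlines() if line.strip())
    return {"NOM": nom, "TLOC": tloc}

# Example: static_attributes(open("some_file.py").read())
```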

SLIDE 19

Change Metrics

Metric name     Definition
REVISIONS       Number of revisions of a file
REFACTORINGS    Number of times a file has been refactored
BUGFIXES        Number of times a file has been involved in bug-fixing
AUTHORS         Number of distinct authors
LOC_ADDED       Sum over all revisions of the lines of code added to a file
LOC_DELETED     Sum over all revisions of the lines of code deleted from a file
CODECHURN       Sum of added minus deleted lines of code, over all revisions
MAX_CHANGESET   Maximum number of files committed together
AGE             Age of a file in weeks, counted backwards from the release

See p. 185
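A hypothetical Python sketch of how a few of these metrics could be derived from a file's CVS history. The `Revision` record and its fields are assumptions; as noted on Slide 11, the paper does not make its extraction procedure clear.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    author: str
    lines_added: int
    lines_deleted: int
    is_bugfix: bool  # e.g. the commit message references a bug report

def change_metrics(history: list[Revision]) -> dict:
    """Derive a few of the table's change metrics for one file."""
    return {
        "REVISIONS": len(history),
        "BUGFIXES": sum(r.is_bugfix for r in history),
        "AUTHORS": len({r.author for r in history}),
        "LOC_ADDED": sum(r.lines_added for r in history),
        "LOC_DELETED": sum(r.lines_deleted for r in history),
        # CODECHURN: added minus deleted lines, summed over all revisions
        "CODECHURN": sum(r.lines_added - r.lines_deleted for r in history),
    }
```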

SLIDE 20

[Bar charts, one panel per release (2.0, 2.1, 3.0), cost insensitive: PC, TP rate, and FP rate (in %) for the NB, LR, and J48 learners, each trained on change metrics, code metrics, and change + code metrics.]

See p. 186

SLIDE 21

[Bar charts, one panel per release (2.0, 2.1, 3.0), cost sensitive with c = 5: PC, TP rate, and FP rate (in %) for the J48 learner, trained on change metrics, code metrics, and change + code metrics.]

See p. 188

SLIDE 22

Results (cost insensitive)

[Bar chart, Release 3.0: PC, TP rate, and FP rate (in %) for J48 on change metrics, code metrics, and change + code metrics.]

Absolute numbers for J48 using change metrics:

  • Files: 5347, correctly classified: 4277
  • Defective files: 1725, correctly classified as defective: 1121
  • Defect-free files: 3622, incorrectly classified as defective: 471
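Plugging these into the Slide 5 formulas: PC = 4277 / 5347 ≈ 80%, recall = 1121 / 1725 ≈ 65%, false positive rate = 471 / 3622 ≈ 13%.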

SLIDE 23

Results (cost sensitive)

[Bar chart, Release 3.0 with c = 5: PC, TP rate, and FP rate (in %) for J48 on change metrics, code metrics, and change + code metrics.]

Absolute numbers for J48 using change metrics:

  • Files: 5347, correctly classified: 4010
  • Defective files: 1725, correctly classified as defective: 1432
  • Defect-free files: 3622, incorrectly classified as defective: 1050

Absolute numbers for J48 using change metrics: Files: 5347 Correctly classified: 4010 Defective: 1725 Correctly classified defective: 1432 Defect free: 3622 Incorrectly classified defective: 1050 Change metrics Code metrics Change + code metrics