  1. A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction Raimund Moser, Witold Pedrycz, Giancarlo Succi ICSE 08 Andres Bühlmann, April 2009

  2. What is it about?
     [Diagram: three prediction pipelines. A machine learner is fed either (a) change metrics, (b) static code attributes, or (c) change metrics + static code attributes for a file, and predicts whether the file is defective or not defective.]

  3. Static Code Attributes
     ● Describe the current state of a file
       Example: Methods: number of method calls, nested block depth, ...
                Classes: number of methods, ...
                Files: number of interfaces, total lines of code, ...
     Change Metrics
     ● Describe how a file changed in the past
       Example: REVISIONS, REFACTORINGS, BUGFIXES, LOC_ADDED, ...
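
To make the distinction concrete, this is what the two kinds of feature vectors for a single file might look like in Python; the metric names are taken from slides 17 and 18, while all values are invented:

    # Static code attributes: a snapshot of the file as it is now (invented values).
    static_code_attributes = {
        "FOUT": 12,    # number of method calls (fan out)
        "NBD": 3,      # nested block depth
        "NOM": 7,      # number of methods
        "NOI": 1,      # number of interfaces
        "TLOC": 240,   # total lines of code
    }

    # Change metrics: how the file evolved in the version history (invented values).
    change_metrics = {
        "REVISIONS": 23,     # number of revisions of the file
        "REFACTORINGS": 2,   # number of times the file was refactored
        "BUGFIXES": 9,       # number of bug-fixing commits touching the file
        "LOC_ADDED": 510,    # lines of code added over all revisions
        "AUTHORS": 4,        # number of distinct authors
    }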

  4. The cost of making wrong decisions
     Each prediction is a true positive, true negative, false positive or false negative, depending on the predicted class versus the true class.
     Cost matrix (rows: true class, columns: predicted class):
                              predicted not defective    predicted defective
       true not defective               0                         1
       true defective                 c > 1                       0
     Correct predictions cost nothing, a false positive costs 1, and a false negative costs c > 1.

  5. Assessing classification accuracy
     Confusion matrix (rows: true class, columns: predicted class):
                              predicted not defective    predicted defective
       true not defective       true negative (TN)        false positive (FP)
       true defective           false negative (FN)       true positive (TP)
     True positive rate (recall) = TP / (TP + FN)
     False positive rate = FP / (FP + TN)
     Percentage of correctly predicted files = (TP + TN) / (TP + TN + FP + FN)
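
The three measures written out as small Python helpers (a direct transcription of the formulas above):

    def true_positive_rate(tp, fn):
        """Recall: share of defective files that were predicted as defective."""
        return tp / (tp + fn)

    def false_positive_rate(fp, tn):
        """Share of defect-free files that were wrongly predicted as defective."""
        return fp / (fp + tn)

    def percentage_correct(tp, tn, fp, fn):
        """PC: share of all files that were classified correctly."""
        return (tp + tn) / (tp + tn + fp + fn)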

  6. Experimental set-up
     Metric sets: 18 change metrics, 31 static code attributes, and their combination (31 static code attributes + 18 change metrics).
     Machine learners: Naive Bayes, Logistic Regression, Decision Tree (J48).
     Releases (defect rates are post-release):
       Release   #Files   Defective
       2.0       3851     31%
       2.1       5341     23%
       3.0       5347     32%
     H0: Code metrics have the same prediction accuracy as change metrics for (cost-sensitive) defect prediction.
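
A minimal sketch of a comparable set-up in Python with scikit-learn (not the authors' tooling): the metric matrix `X` and labels `y` are random placeholders, and `DecisionTreeClassifier` is only a rough stand-in for Weka's J48; accuracy is averaged over 10-fold cross validation.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholder data: one row of metric values per file, one 0/1 defect label per file.
    rng = np.random.default_rng(0)
    X = rng.random((500, 18))             # e.g. the 18 change metrics
    y = rng.integers(0, 2, size=500)      # 1 = defective, 0 = defect free

    learners = {
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree (J48-like)": DecisionTreeClassifier(random_state=0),
    }

    for name, clf in learners.items():
        scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.2f}")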

  7. The cost of making wrong decisions
     Cost matrix (repeated from slide 4): correct predictions cost 0, a false positive costs 1, a false negative costs c > 1.

  8. Results (cost insensitive)
     [Bar chart, Release 3.0: PC, TP rate and FP rate in % for J48 trained on change metrics, change + code metrics, and code metrics.]

  9. Results (cost sensitive)
     [Bar chart, Release 3.0 with c = 5: PC, TP rate and FP rate in % for J48 trained on change metrics, change + code metrics, and code metrics.]

  10. Their results and conclusions
      H0: Code metrics have the same prediction accuracy as change metrics for (cost-sensitive) defect prediction.
      - Change metrics have better prediction accuracy than code metrics
      - Cost-sensitive classification can be used to improve recall
      - The results may not generalize
      - It is not certain that the right metrics were used
      - The approach does not perform well in an iterative setting
      - There are potential errors in the code and change metrics

  11. Discussion
      What is nice:
      - Different machine learners
      - 3 releases
      - 10-fold cross validation
      - Significance analysis
      - Cost-sensitive analysis
      What I didn't like:
      - No results for iterative defect prediction
      - Not clear how the change metrics were extracted
      - Change metrics not available

  12. Further Thoughts
      The machine learners tell us which metrics are good indicators for defects.
      Example: files involved in a lot of bug-fixing activity are likely to be defective.
      Can we conclude something from this statement? At least we can ask further questions:
      - Are these files new?
      - Do these files contain a lot of complex code?
      - Are bugs fixed by the initial author?
      - Are our documentation and/or comments insufficient?
      -> We should investigate how statistical analysis of detected bugs could be used to improve our process and design decisions.

  13. How to calculate the metrics?
      [Diagram: the bug database and the CVS repository are mined to obtain the list of defective files, the static code attributes, and the change metrics.]
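
A minimal sketch of the "list of defective files" step, assuming bug-fixing commits can be recognized by keywords or bug identifiers in their commit messages; the commit-record format and the pattern are hypothetical, not the paper's actual procedure:

    import re

    # Hypothetical commit records extracted from the version-control log.
    commits = [
        {"files": ["Parser.java", "Lexer.java"], "message": "fix NPE, bug 12345"},
        {"files": ["Parser.java"], "message": "rename local variables"},
    ]

    # Heuristic: a commit is a bug fix if its message mentions a bug id or a fix keyword.
    BUG_PATTERN = re.compile(r"\b(bug|fix(e[sd])?)\b|#\d+", re.IGNORECASE)

    defective_files = set()
    for commit in commits:
        if BUG_PATTERN.search(commit["message"]):
            defective_files.update(commit["files"])

    print(sorted(defective_files))   # ['Lexer.java', 'Parser.java']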

  14. The cost of making wrong decisions
      Cost matrix as on slide 4: correct predictions cost 0, a false positive costs 1, a false negative costs c > 1.
      Expected loss of the two possible predictions for a file:
        L(predict defective)     = Prob(not defective) * 1 + Prob(defective) * 0
        L(predict not defective) = Prob(not defective) * 0 + Prob(defective) * c
      The classifier chooses the prediction with the lower expected loss.
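
A minimal sketch of this decision rule in Python, assuming the learner already supplies a probability that a file is defective (the function and parameter names are hypothetical):

    def predict_with_cost(p_defective, c):
        """Choose the class with the lower expected loss.

        p_defective: estimated probability that the file is defective.
        c: cost of a false negative, relative to a false positive cost of 1.
        """
        loss_predict_defective = (1 - p_defective) * 1 + p_defective * 0
        loss_predict_not_defective = (1 - p_defective) * 0 + p_defective * c
        if loss_predict_defective < loss_predict_not_defective:
            return "defective"
        return "not defective"

    # With c = 5 a file is already flagged as defective once p_defective > 1 / 6:
    print(predict_with_cost(p_defective=0.2, c=5))   # defective
    print(predict_with_cost(p_defective=0.2, c=1))   # not defective

Raising c lowers the probability threshold at which a file is flagged as defective, which is why the cost-sensitive results trade a higher false positive rate for higher recall.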

  15. How does the cost affect the prediction?

  16. Which metrics have the most predictive power?
      Powerful defect indicators:
      - High number of revisions
      - High number of bug-fixing activities
      - Small-sized CVS commits
      - Small number of refactorings
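
One way to produce such a ranking with the scikit-learn sketch from slide 6 (again on placeholder data, and not the procedure used in the paper) is to inspect the fitted decision tree's feature importances:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data: four change metrics per file and a 0/1 defect label.
    metric_names = ["REVISIONS", "BUGFIXES", "REFACTORINGS", "MAX_CHANGESET"]
    rng = np.random.default_rng(1)
    X = rng.random((500, len(metric_names)))
    y = rng.integers(0, 2, size=500)

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    # Higher importance = the metric contributes more to the tree's decisions.
    for importance, name in sorted(zip(tree.feature_importances_, metric_names), reverse=True):
        print(f"{name}: {importance:.2f}")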

  17. Static Code Attributes (see Zimmermann, p. 2)
      Methods:
        FOUT   Number of method calls (fan out)
        MLOC   Method lines of code
        NBD    Nested block depth
        PAR    Number of parameters
        VG     McCabe cyclomatic complexity
      Classes:
        NOF    Number of fields
        NOM    Number of methods
        NSF    Number of static fields
        NSM    Number of static methods
      Files:
        ACD    Number of anonymous type declarations
        NOI    Number of interfaces
        NOT    Number of classes
        TLOC   Total lines of code

  18. Change Metrics (see p. 185)
      REVISIONS       Number of revisions of a file
      REFACTORINGS    Number of times a file has been refactored
      BUGFIXES        Number of times a file has been involved in bug-fixing
      AUTHORS         Number of distinct authors
      LOC_ADDED       Sum over all revisions of the lines of code added to a file
      LOC_DELETED     Sum over all revisions of the lines of code deleted from a file
      CODECHURN       Sum of added minus deleted lines of code over all revisions
      MAX_CHANGESET   Maximum number of files committed together
      AGE             Age of a file in weeks, counted backwards from the release
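
A minimal sketch of how several of these change metrics could be aggregated per file from commit records; the commit-record format and values are hypothetical illustrations, not the authors' extraction tooling:

    from collections import defaultdict

    # Hypothetical commit records for one release period.
    commits = [
        {"file": "Parser.java", "author": "anna", "added": 40, "deleted": 5,
         "bugfix": True, "changeset_size": 3},
        {"file": "Parser.java", "author": "ben", "added": 12, "deleted": 30,
         "bugfix": False, "changeset_size": 8},
    ]

    def new_record():
        return {"REVISIONS": 0, "BUGFIXES": 0, "AUTHORS": set(), "LOC_ADDED": 0,
                "LOC_DELETED": 0, "CODECHURN": 0, "MAX_CHANGESET": 0}

    metrics = defaultdict(new_record)
    for c in commits:
        m = metrics[c["file"]]
        m["REVISIONS"] += 1                        # number of revisions
        m["BUGFIXES"] += int(c["bugfix"])          # bug-fixing commits touching the file
        m["AUTHORS"].add(c["author"])              # distinct authors (count taken below)
        m["LOC_ADDED"] += c["added"]
        m["LOC_DELETED"] += c["deleted"]
        m["CODECHURN"] += c["added"] - c["deleted"]
        m["MAX_CHANGESET"] = max(m["MAX_CHANGESET"], c["changeset_size"])

    for path, m in metrics.items():
        m["AUTHORS"] = len(m["AUTHORS"])           # AUTHORS = number of distinct authors
        print(path, m)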

  19. (See p. 186)
      [Bar charts: PC, TP rate and FP rate in % for Naive Bayes (NB), Logistic Regression (LR) and J48 on releases 2.0, 2.1 and 3.0, each trained on change metrics, change + code metrics, and code metrics.]

  20. (See p. 188)
      [Bar charts, cost sensitive with c = 5: PC, TP rate and FP rate in % for J48 on releases 2.0, 2.1 and 3.0, each trained on change metrics, change + code metrics, and code metrics.]

  21. Results (cost insensitive)
      [Bar chart, Release 3.0: PC, TP rate and FP rate in % for J48 on change metrics, change + code metrics, and code metrics.]
      Absolute numbers for J48 using change metrics:
      - Files: 5347, correctly classified: 4277
      - Defective files: 1725, correctly classified as defective: 1121
      - Defect-free files: 3622, incorrectly classified as defective: 471
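
From these absolute numbers the chart values can be recovered approximately: PC = 4277 / 5347 ≈ 80%, TP rate (recall) = 1121 / 1725 ≈ 65%, FP rate = 471 / 3622 ≈ 13%.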

  22. Results (cost sensitive)
      [Bar chart, Release 3.0 with c = 5: PC, TP rate and FP rate in % for J48 on change metrics, change + code metrics, and code metrics.]
      Absolute numbers for J48 using change metrics:
      - Files: 5347, correctly classified: 4010
      - Defective files: 1725, correctly classified as defective: 1432
      - Defect-free files: 3622, incorrectly classified as defective: 1050
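
Recovering the rates in the same way: PC = 4010 / 5347 ≈ 75%, TP rate = 1432 / 1725 ≈ 83%, FP rate = 1050 / 3622 ≈ 29%. Compared with the cost-insensitive run, recall improves considerably at the price of a higher false positive rate.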
