
Open Source Bug Fixes: Characterization and Prediction

Vikash Balasubramanian Kashif Khan November 27, 2019

University of Waterloo

Overview

  • Introduction
    • Motivation
    • Research Questions
  • Dataset
    • Data Collection
    • Bug Report Features
  • Factors Affecting Bug Fix Likelihood
    • Reputations
    • Edits and Editors
    • Reopenings
    • Comments and Commenters
  • Statistical Models
    • Descriptive Models
    • Predictive Models
  • Discussion


Introduction

  • Motivation
    • Software validation is costly.
    • Large software projects deal with large numbers of bug reports, most of which never get fixed.
    • Improvements to the bug triaging process can lead to cost savings.
  • Prior work: Characterizing and Predicting Which Bugs Get Fixed: An Empirical Study of Microsoft Windows, Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, Brendan Murphy, Proc. of the 32nd ACM/IEEE Intl. Conf. on Software Engineering (ICSE 2010).
  • Research Questions
    • What are the most important factors that determine whether a bug will get fixed in Open Source Software systems?
    • Are there differences in the impact and importance of these factors between Open Source and Closed Source projects? Do the previous findings for closed-source systems such as Windows still hold?

Dataset


Data Collection

  • Used bugs from the Linux Kernel and KDE project bug databases.
  • Both projects use Bugzilla.
  • Collected bug reports using the REST API provided by Bugzilla.
  • For each bug report, collected all the associated comments and the edit history.

Project        Total Bugs    Fixed     Not Fixed
Linux Kernel       31,694   10,107        15,322
KDE               403,438  153,471       209,715

Table 1: Bug counts for each project
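The collection step above can be sketched with the standard library alone. The endpoint paths (`/bug`, `/bug/{id}/comment`, `/bug/{id}/history`) follow the Bugzilla REST API; the base URLs and helper names below are illustrative assumptions, not the authors' actual scripts.

```python
# Sketch of collecting bug reports, comments, and edit history via the
# Bugzilla REST API. Base URLs and function names are assumptions.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

LINUX_BASE = "https://bugzilla.kernel.org/rest"
KDE_BASE = "https://bugs.kde.org/rest"

def endpoint(base, *parts):
    """Join a REST base URL with path components."""
    return "/".join([base.rstrip("/")] + [str(p) for p in parts])

def fetch_json(url, params=None):
    """GET a REST resource and decode its JSON payload."""
    if params:
        url += "?" + urlencode(params)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

def collect_bug(base, bug_id):
    """Fetch one bug report together with its comments and edit history."""
    bug = fetch_json(endpoint(base, "bug", bug_id))["bugs"][0]
    comments = fetch_json(endpoint(base, "bug", bug_id, "comment"))
    history = fetch_json(endpoint(base, "bug", bug_id, "history"))
    return bug, comments, history
```

Paging over all bugs would repeat the `/bug` call with `limit`/`offset` parameters until the result list comes back empty.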


Bug Report Features

  • Bug ID
  • Status
  • Resolution
  • Bug Opener Reputation

bug opener reputation = |Opened ∩ Fixed| / (|Opened| + 1)

  • Initial Assignee Reputation
  • Joint Bug Opener and Initial Assignee Reputation
  • Initial Severity
  • Severity Upgraded
  • Component Change Count
  • Unique Editor Count
  • Unique Commenter Count
  • Total Comment Count
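The opener reputation above, |Opened ∩ Fixed| / (|Opened| + 1), is the fraction of a reporter's previously opened bugs that were fixed, with a +1 in the denominator so reporters with little history score lower. A minimal sketch (function and argument names are ours):

```python
def opener_reputation(opened_ids, fixed_ids):
    """|Opened ∩ Fixed| / (|Opened| + 1): fraction of a reporter's opened
    bugs that were fixed, damped by +1 for reporters with short histories."""
    opened, fixed = set(opened_ids), set(fixed_ids)
    return len(opened & fixed) / (len(opened) + 1)
```

For example, a reporter who opened bugs {1, 2, 3, 4} of which {2, 3} were fixed gets 2 / (4 + 1) = 0.4; a brand-new reporter gets 0.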


Factors Affecting Bug Fix Likelihood

Reputations - Bug Opener

Figure 1: Bug Opener Reputation. (a) Linux, (b) KDE.


Reputations - Joint Reputation

Figure 2: Joint Bug Opener + Initial Assignee Reputation. (a) Linux, (b) KDE.


Edits and Editors

Figure 3: # of Unique Editors. (a) Linux, (b) KDE.


Reopenings

Figure 4: # of Reopenings. (a) Linux, (b) KDE.


Comments and Commenters

Figure 5: # of Comments. (a) Linux, (b) KDE.


Statistical Models

Base Model

  • Gradient-boosted decision trees
  • Strong predictive performance
  • Interpretable
  • Easily handles categorical and numerical variables
  • Relaxed independence assumptions compared to standard regression models
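The slides do not name a specific implementation, so as an illustration of the idea, here is a minimal pure-Python gradient-boosting sketch that uses one-level decision trees (stumps) as weak learners, fit to squared error; all names (`stump_fit`, `gbdt_fit`, `gbdt_predict`) and hyperparameters are ours, not the authors' setup.

```python
# Illustrative sketch only: gradient boosting with decision stumps,
# fit to squared error; real experiments would use a library implementation.

def stump_fit(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def gbdt_fit(X, y, rounds=25, lr=0.3):
    """Repeatedly fit stumps to the residuals of the current ensemble."""
    base = sum(y) / len(y)            # start from the mean label
    trees, preds = [], [base] * len(y)
    for _ in range(rounds):
        resid = [yi - p for yi, p in zip(y, preds)]
        j, thr, lval, rval = stump_fit(X, resid)
        trees.append((j, thr, lval, rval))
        preds = [p + lr * (lval if row[j] <= thr else rval)
                 for row, p in zip(X, preds)]
    return base, lr, trees

def gbdt_predict(model, row):
    """Sum the base score and shrunken stump outputs; threshold at 0.5."""
    base, lr, trees = model
    score = base + sum(lr * (lval if row[j] <= thr else rval)
                       for j, thr, lval, rval in trees)
    return int(score >= 0.5)
```

Because each stump conditions only on one feature at a time and later stumps correct earlier ones, the ensemble can mix categorical (encoded) and numerical features without the independence assumptions of a standard regression model.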


Descriptive Model

Project    Accuracy   Precision   Recall   F-Score
Kernel-1      63.99       62.49    57.59     59.94
Kernel-2      64.62       63.25    58.70     60.89
KDE-1         72.96       73.55    70.36     71.92
KDE-2         73.86       74.28    71.49     72.86

Table 2: Performance of the descriptive model. Variant 1 includes only the features suggested by Guo et al.; variant 2 also includes the additional features we suggest.


Descriptive Model Interpretation

                             Information Gain (%)
Feature                        Linux      KDE
Opener Reputation              42.10    21.96
Comment Count                  15.45     7.71
Opener Assignee Reputation     12.85     8.33
First Assignee Reputation       8.37     7.32
Initial Severity                5.73     8.67
Editors Count                   5.34    40.63
Component Path Changes          3.52     2.22
Commenter Count                 2.51     0.75
Reopened Count                  2.17     1.63
Severity Upgraded               1.78     0.77

Table 3: Information gain for the descriptive model
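The percentages in Table 3 summarize how much each feature contributes across the boosted trees' splits. The underlying quantity for a single categorical feature is the classic entropy-based information gain, sketched below; this is an illustration of the measure itself, not the authors' exact attribution procedure.

```python
# Illustrative sketch: entropy-based information gain of one feature.
import math

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(feature_values, labels):
    """IG = H(labels) - sum over values v of p(v) * H(labels | feature = v)."""
    gain, n = entropy(labels), len(labels)
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

A feature that perfectly separates fixed from unfixed bugs yields the full label entropy as gain; a feature independent of the outcome yields roughly zero.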


Predictive Model

Project    Accuracy   Precision   Recall   F-Score
Kernel-1      63.84       62.48    57.25     59.75
Kernel-2      63.95       62.79    57.22     59.88
KDE-1         69.10       68.76    66.62     67.67
KDE-2         69.42       69.12    66.96     68.02

Table 4: Performance of the predictive model. Variant 1 includes only the features suggested by Guo et al.; variant 2 also includes the additional features we suggest.


Predictive Model Interpretation

                             Information Gain (%)
Feature                        Linux      KDE
Opener Reputation              61.10    47.80
Opener Assignee Reputation     11.04    17.52
First Assignee Reputation      20.15    15.76
Initial Severity                7.59    18.93

Table 5: Information gain for all the features used in the predictive model. Higher numbers indicate more important features.


Discussion

Thank you for listening. Any questions?
