
Open Source Bug Fixes: Characterization and Prediction

Vikash Balasubramanian Kashif Khan November 27, 2019

University of Waterloo

Overview

  • Introduction
    • Motivation
    • Research Questions
  • Dataset
    • Data Collection
    • Bug Report Features
  • Factors Affecting Bug Fix Likelihood
    • Reputations
    • Edits and Editors
    • Reopenings
    • Comments and Commenters
  • Statistical Models
    • Descriptive Models
    • Predictive Models
  • Discussion


Introduction

  • Motivation
    • Software validation is costly.
    • Large software projects deal with large numbers of bug reports, most of which never get fixed.
    • Improvements to the bug triaging process can lead to cost savings.
  • Prior work: Characterizing and Predicting Which Bugs Get Fixed: An Empirical Study of Microsoft Windows, Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, Brendan Murphy, Proc. of the 32nd ACM/IEEE Intl. Conf. on Software Engineering (ICSE 2010).
  • Research Questions
    • What are the most important factors that determine whether a bug will get fixed in Open Source Software systems?
    • Are there differences in the impact and importance of these factors between Open Source and Closed Source projects? Do the previous findings for closed-source systems such as Windows still hold?

Dataset


Data Collection

  • Used bugs from the Linux Kernel and KDE project bug databases.
  • Both projects use Bugzilla.
  • Collected bug reports using the REST API provided by Bugzilla.
  • For each bug report, collected all the associated comments and the edit history.

Project        Total Bugs    Fixed     Not Fixed
Linux Kernel       31,694   10,107        15,322
KDE               403,438  153,471       209,715

Table 1: Bug counts for each project
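The collection step above can be sketched with the standard library alone. The endpoint paths (`/bug`, `/bug/{id}/comment`, `/bug/{id}/history`) follow the Bugzilla REST API; the base URLs and helper names below are illustrative assumptions, not the authors' actual scripts.

```python
# Sketch of collecting bug reports, comments, and edit history via the
# Bugzilla REST API. Base URLs and function names are assumptions.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

LINUX_BASE = "https://bugzilla.kernel.org/rest"
KDE_BASE = "https://bugs.kde.org/rest"

def endpoint(base, *parts):
    """Join a REST base URL with path components."""
    return "/".join([base.rstrip("/")] + [str(p) for p in parts])

def fetch_json(url, params=None):
    """GET a REST resource and decode its JSON payload."""
    if params:
        url += "?" + urlencode(params)
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

def collect_bug(base, bug_id):
    """Fetch one bug report together with its comments and edit history."""
    bug = fetch_json(endpoint(base, "bug", bug_id))["bugs"][0]
    comments = fetch_json(endpoint(base, "bug", bug_id, "comment"))
    history = fetch_json(endpoint(base, "bug", bug_id, "history"))
    return bug, comments, history
```

Paging over all bugs would repeat the `/bug` call with `limit`/`offset` parameters until the result list comes back empty.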


Bug Report Features

  • Bug ID
  • Status
  • Resolution
  • Bug Opener Reputation

bug opener reputation = |Opened ∩ Fixed| / (|Opened| + 1)

  • Initial Assignee Reputation
  • Joint Bug Opener and Initial Assignee Reputation
  • Initial Severity
  • Severity Upgraded
  • Component Change Count
  • Unique Editor Count
  • Unique Commenter Count
  • Total Comment Count
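The opener reputation above, |Opened ∩ Fixed| / (|Opened| + 1), is the fraction of a reporter's previously opened bugs that were fixed, with a +1 in the denominator so reporters with little history score lower. A minimal sketch (function and argument names are ours):

```python
def opener_reputation(opened_ids, fixed_ids):
    """|Opened ∩ Fixed| / (|Opened| + 1): fraction of a reporter's opened
    bugs that were fixed, damped by +1 for reporters with short histories."""
    opened, fixed = set(opened_ids), set(fixed_ids)
    return len(opened & fixed) / (len(opened) + 1)
```

For example, a reporter who opened bugs {1, 2, 3, 4} of which {2, 3} were fixed gets 2 / (4 + 1) = 0.4; a brand-new reporter gets 0.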


Factors Affecting Bug Fix Likelihood

Reputations - Bug Opener

Figure 1: Bug Opener Reputation. (a) Linux, (b) KDE.


Reputations - Joint Reputation

Figure 2: Joint Bug Opener + Initial Assignee Reputation. (a) Linux, (b) KDE.


Edits and Editors

Figure 3: # of Unique Editors. (a) Linux, (b) KDE.


Reopenings

Figure 4: # of Reopenings. (a) Linux, (b) KDE.


Comments and Commenters

Figure 5: # of Comments. (a) Linux, (b) KDE.


Statistical Models

Base Model

  • Gradient-boosted decision trees
  • Strong predictive performance
  • Interpretable
  • Easily handles categorical and numerical variables
  • Relaxed independence assumptions compared to standard regression models
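The slides do not name a specific implementation, so as an illustration of the idea, here is a minimal pure-Python gradient-boosting sketch that uses one-level decision trees (stumps) as weak learners, fit to squared error; all names (`stump_fit`, `gbdt_fit`, `gbdt_predict`) and hyperparameters are ours, not the authors' setup.

```python
# Illustrative sketch only: gradient boosting with decision stumps,
# fit to squared error; real experiments would use a library implementation.

def stump_fit(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def gbdt_fit(X, y, rounds=25, lr=0.3):
    """Repeatedly fit stumps to the residuals of the current ensemble."""
    base = sum(y) / len(y)            # start from the mean label
    trees, preds = [], [base] * len(y)
    for _ in range(rounds):
        resid = [yi - p for yi, p in zip(y, preds)]
        j, thr, lval, rval = stump_fit(X, resid)
        trees.append((j, thr, lval, rval))
        preds = [p + lr * (lval if row[j] <= thr else rval)
                 for row, p in zip(X, preds)]
    return base, lr, trees

def gbdt_predict(model, row):
    """Sum the base score and shrunken stump outputs; threshold at 0.5."""
    base, lr, trees = model
    score = base + sum(lr * (lval if row[j] <= thr else rval)
                       for j, thr, lval, rval in trees)
    return int(score >= 0.5)
```

Because each stump conditions only on one feature at a time and later stumps correct earlier ones, the ensemble can mix categorical (encoded) and numerical features without the independence assumptions of a standard regression model.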


Descriptive Model

Project    Accuracy   Precision   Recall   F-Score
Kernel-1      63.99       62.49    57.59     59.94
Kernel-2      64.62       63.25    58.70     60.89
KDE-1         72.96       73.55    70.36     71.92
KDE-2         73.86       74.28    71.49     72.86

Table 2: Performance of the descriptive model. Variant 1 includes only the features suggested by Guo et al.; variant 2 also includes the additional features we suggest.


Descriptive Model Interpretation

                             Information Gain (%)
Feature                        Linux      KDE
Opener Reputation              42.10    21.96
Comment Count                  15.45     7.71
Opener Assignee Reputation     12.85     8.33
First Assignee Reputation       8.37     7.32
Initial Severity                5.73     8.67
Editors Count                   5.34    40.63
Component Path Changes          3.52     2.22
Commenter Count                 2.51     0.75
Reopened Count                  2.17     1.63
Severity Upgraded               1.78     0.77

Table 3: Information gain for the descriptive model
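The percentages in Table 3 summarize how much each feature contributes across the boosted trees' splits. The underlying quantity for a single categorical feature is the classic entropy-based information gain, sketched below; this is an illustration of the measure itself, not the authors' exact attribution procedure.

```python
# Illustrative sketch: entropy-based information gain of one feature.
import math

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(feature_values, labels):
    """IG = H(labels) - sum over values v of p(v) * H(labels | feature = v)."""
    gain, n = entropy(labels), len(labels)
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

A feature that perfectly separates fixed from unfixed bugs yields the full label entropy as gain; a feature independent of the outcome yields roughly zero.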


Predictive Model

Project    Accuracy   Precision   Recall   F-Score
Kernel-1      63.84       62.48    57.25     59.75
Kernel-2      63.95       62.79    57.22     59.88
KDE-1         69.10       68.76    66.62     67.67
KDE-2         69.42       69.12    66.96     68.02

Table 4: Performance of the predictive model. Variant 1 includes only the features suggested by Guo et al.; variant 2 also includes the additional features we suggest.


Predictive Model Interpretation

                             Information Gain (%)
Feature                        Linux      KDE
Opener Reputation              61.10    47.80
Opener Assignee Reputation     11.04    17.52
First Assignee Reputation      20.15    15.76
Initial Severity                7.59    18.93

Table 5: Information gain for all the features used in the predictive model. Higher numbers indicate more important features.


Discussion

Thank you for listening. Any questions?
