Predicting the Number of Reported Bugs in a Software Repository
Hadi Jahanshahi, Mucahit Cevik, Ayşe Başar
May 2020, 33rd Canadian Conference on Artificial Intelligence
Data Science Laboratory

Outline
• Introduction
• Contribution and Research Questions
• Methodology
• Result
• Conclusion
Predicting the Number of Reported Bugs in a Software Repository May 20 2/20 Jahanshahi et al.

Reported bugs’ pattern
• Why is predicting the number of bugs reported to a system important?
• Bug prediction: binary classification
• Predicting the number of bugs: regression task
• Predicting the number of reported bugs: time series prediction

General Idea
• In this paper, the number of bugs reported to the Mozilla bug repository during the last decade is extracted.
• The release times of Mozilla updates are used as an exogenous variable.
• Different time series prediction methods are used to investigate the performance of each model under different circumstances.

Previous studies and our contributions [I]
• Previous studies use generic time series models without a rational baseline against which to compare their models [1, 2, 3, 4].
• Another study, by Destefanis et al. [4], used time series analysis to determine the seasonality and trends of affective metrics for software development.
• That study considers the evolution of human aspects in software engineering, whereas our study focuses on the number of reported bugs as a metric that helps developers maintain software quality.

Previous studies and our contributions [II]
• Wang and Zhang [5] design Defect State Transition models and apply the Markovian method to predict the number of defects in each state in the future.
• There are also studies that consider software defect number prediction at the method level and file level [6, 7, 8].

Research Questions
RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
RQ2: How feasible is long-term bug number prediction?

Data preparation
• For time series prediction, we first check whether the given data is stationary. The p-value of the stationarity test is 0.012, small enough to treat the series as stationary, so no supplementary preprocessing is needed.
• We also check the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
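The ACF check mentioned above can be illustrated with a minimal sketch. It computes the standard sample autocorrelation in plain Python; the function name `sample_acf` and the weekly counts are hypothetical and are not taken from the slides or from the Mozilla data.

```python
from typing import List, Sequence


def sample_acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Return the sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    if n < 2 or max_lag >= n:
        raise ValueError("need at least two points and max_lag < len(series)")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for lag in range(max_lag + 1):
        # Covariance between the series and its copy shifted by `lag` steps.
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf


# Hypothetical bug counts, invented for illustration only.
weekly_bugs = [120, 135, 128, 150, 142, 160, 155, 170, 149, 158, 166, 171]
acf_values = sample_acf(weekly_bugs, max_lag=4)
```

Lags whose autocorrelation stays large are natural candidates for the lagged inputs of a forecasting model.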

Methodology
• A rolling method is used for training on the time series dataset.
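One common reading of a rolling method is a fixed-size training window that slides forward one step at a time. The sketch below assumes that reading; the slides do not state the window size or whether the window is fixed or expanding, and the names `rolling_one_step` and `window_mean` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_one_step(
    series: Sequence[float],
    window: int,
    forecaster: Callable[[Sequence[float]], float],
) -> List[Tuple[float, float]]:
    """Slide a fixed-size training window over the series.

    At each step the forecaster sees only the `window` most recent
    observations and predicts the next one.
    Returns a list of (actual, predicted) pairs.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    pairs = []
    for end in range(window, len(series)):
        train = series[end - window:end]
        pairs.append((series[end], forecaster(train)))
    return pairs


def mean_absolute_error(pairs: List[Tuple[float, float]]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(actual - pred) for actual, pred in pairs) / len(pairs)


def window_mean(train: Sequence[float]) -> float:
    """Stand-in forecaster: the mean of the training window."""
    return sum(train) / len(train)


# Hypothetical counts, invented for illustration only.
counts = [10, 12, 11, 13, 15, 14, 16]
pairs = rolling_one_step(counts, window=3, forecaster=window_mean)
mae = mean_absolute_error(pairs)
```

Any model that maps a training window to a one-step forecast can be passed in place of `window_mean`, so every model is scored on the same sequence of windows.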

Data
• We have extracted the number of reported bugs from the Mozilla bug repository*.
* Mozilla Bug Tracking System. https://bugzilla.mozilla.org/

Forecasting Models (I)
• Naive Baseline: It assumes the number of bugs at time t is equal to that at time t−1.
• EXP: Exponential smoothing considers two factors in its prediction: the forecast value at the previous timestamp and the actual value at that timestamp.
• WMA: Weighted Moving Average simply forecasts based on a weighted average of the previous steps.
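The three simple forecasters above can be sketched in a few lines. The exponential smoothing update used here is the standard textbook form, F_t = α·A_{t−1} + (1 − α)·F_{t−1}, which is assumed rather than copied from the slides; the smoothing factor `alpha`, the initialisation, the weights, and the sample history are all hypothetical.

```python
from typing import Sequence


def naive_forecast(history: Sequence[float]) -> float:
    """Predict the next value as the last observed value."""
    return history[-1]


def exp_smoothing_forecast(history: Sequence[float], alpha: float) -> float:
    """Simple exponential smoothing, one step ahead.

    Each update blends the latest actual value with the previous forecast.
    The first forecast is initialised to the first observation (an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    forecast = history[0]
    for actual in history:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast


def wma_forecast(history: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted moving average of the last len(weights) observations.

    weights[-1] applies to the most recent observation.
    """
    k = len(weights)
    if k == 0 or k > len(history):
        raise ValueError("need 1 <= len(weights) <= len(history)")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    recent = history[-k:]
    return sum(w * x for w, x in zip(weights, recent)) / total


# Hypothetical history, invented for illustration only.
history = [10, 20, 30, 40]
naive = naive_forecast(history)
smoothed = exp_smoothing_forecast(history, alpha=0.5)
weighted = wma_forecast(history, weights=[1, 2, 3])
```

On this steadily rising history both averaging methods lag behind the naive forecast, which is one reason a naive baseline can be hard to beat at a one-step horizon.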

Forecasting Models (II)
• ARIMA: The general ARIMA(p, d, q) model, where p is the autoregressive order, d is the degree of differencing, and q is the moving-average order.
• RF: We applied the Random Forest (RF) Regressor as a new method that has not been used in this domain.
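For reference, the textbook form of an ARIMA(p, d, q) model is given below. It is not recovered from the slides, and the symbols are assumptions: L is the lag operator (L y_t = y_{t-1}), φ_i are the autoregressive coefficients, θ_j the moving-average coefficients, c a constant, and ε_t a white-noise error term.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right) (1 - L)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right) \varepsilon_t
```

With d = 0 this reduces to an ARMA(p, q) model on the undifferenced series, which is the relevant case when the series is already stationary.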

Forecasting Models (III)
• LSTM: We use the LSTM cell architecture defined by Hochreiter and Schmidhuber [9].
• All models’ parameters are shown in Table 1.

Results (I)

Results (II)

Answer to the Research Questions
• RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
  • Surprisingly, one-step prediction performance does not differ significantly across the models. Furthermore, the baseline appears to be as good as the others, a finding that was not considered in previous studies.
• RQ2: How feasible is long-term bug number prediction?
  • For the Mozilla project, LSTM shows a significant improvement over traditional time series models.

Conclusions
• What we expect from our time series analyses:
  • to forecast the number of future defects
  • to identify trends and abnormalities in the system
• Our observations:
  • The number of bugs introduced to the system is stationary.
  • Across eight different methods and five different performance metrics, Random Forest with exogenous variables outperforms the other methods.
  • Deep learning, specifically LSTM in our case, significantly enhances long-term prediction.

Main references (I)
[1] Kenmei, B., Antoniol, G., di Penta, M.: Trend analysis and issue prediction in large-scale open source systems. In: 2008 12th European Conference on Software Maintenance and Reengineering, pp. 73–82, April 2008
[2] Krishna, R., Agrawal, A., Rahman, A., Sobran, A., Menzies, T.: What is the connection between issues, bugs, and enhancements? Lessons learned from 800+ software projects. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2018, pp. 306–315. Association for Computing Machinery, New York (2018)
[3] Wu, W., Zhang, W., Yang, Y., Wang, Q.: Time series analysis for bug number prediction. In: The 2nd International Conference on Software Engineering and Data Mining, pp. 589–596, June 2010

Main references (II)
[4] Yazdi, H.S., Angelis, L., Kehrer, T., Kelter, U.: A framework for capturing, statistically modeling and analyzing the evolution of software models. J. Syst. Softw. 118, 176–207 (2016)
[4] Destefanis, G., Ortu, M., Counsell, S., Swift, S., Tonelli, R., Marchesi, M.: On the randomness and seasonality of affective metrics for software development. In: Proceedings of the Symposium on Applied Computing, SAC 2017, pp. 1266–1271. Association for Computing Machinery, New York (2017)
[5] Wang, J., Zhang, H.: Predicting defect numbers based on defect state transition models. In: Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 191–200, September 2012

Main references (III)
[6] Chen, X., Zhang, D., Zhao, Y., Cui, Z., Ni, C.: Software defect number prediction: unsupervised vs supervised methods. Inf. Softw. Technol. 106, 161–181 (2019)
[7] Gao, K., Khoshgoftaar, T.M.: A comprehensive empirical study of count models for software fault prediction. IEEE Trans. Reliab. 56(2), 223–236 (2007)
[8] Graves, T.L., Karr, A.F., Marron, J.S., Siy, H.: Predicting fault incidence using software change history. IEEE Trans. Softw. Eng. 26(7), 653–661 (2000)
[9] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
