
Mining Software Engineering Data

Ahmed E. Hassan
Queen’s University
www.cs.queensu.ca/~ahmed
ahmed@cs.queensu.ca

Tao Xie
North Carolina State University
www.csc.ncsu.edu/faculty/xie
xie@csc.ncsu.edu

Some slides are adapted from tutorial slides co-prepared by Jian Pei from Simon Fraser University, Canada.
An up-to-date version of this tutorial is available at http://ase.csc.ncsu.edu/dmse/

A. E. Hassan and T. Xie: Mining Software Engineering Data

Ahmed E. Hassan
• NSERC/RIM Software Engineering Research Chair, Queen’s University, Canada
• Leads the SAIL research group at Queen’s
• Co-chair for Workshop on Mining Software Repositories (MSR) from 2004-2006
• Chair of the steering committee for MSR

Tao Xie
• Assistant Professor at North Carolina State University, USA
• Leads the ASE research group at NCSU
• PC Co-Chair of ICSM 2009, MSR 2011
• Co-organizer of 2007 Dagstuhl Seminar on Mining Programs and Processes

Acknowledgments
• Jian Pei, SFU
• Thomas Zimmermann, Microsoft Research
• Peter Rigby, U. of Victoria
• Sunghun Kim, HKUST
• John Anvik, U. of Victoria

Tutorial Goals
• Learn about:
– Recent and notable research and researchers in mining SE data
– Data mining and data processing techniques and how to apply them to SE data
– Risks in using SE data due to e.g., noise, project culture
• By end of tutorial, you should be able:
– Retrieve SE data
– Prepare SE data for mining
– Mine interesting information from SE data

Mining SE Data
• MAIN GOAL
– Transform static record-keeping SE data to active data
– Make SE data actionable by uncovering hidden patterns and trends
[Figure: SE data sources: Bugzilla, Mailings, Code repository, Execution traces, CVS]

Mining SE Data
• SE data can be used to:
– Gain empirically-based understanding of software development
– Predict, plan, and understand various aspects of a project
– Support future development and project management activities

Overview of Mining SE Data
• Software engineering tasks helped by data mining: programming, defect detection, testing, debugging, maintenance, …
• Data mining techniques: association/patterns, classification, clustering, …
• Software engineering data: code bases, change history, program states, structural entities, bug reports, …
[Figures: the same overview diagram repeated with related publications marked by venue and year (ASE, FSE, ICSE, ISSTA, KDD, OOPSLA, OSDI, PLDI, POPL, SOSP; 1999 to 2008)]

Tutorial Outline
• Part I: What can you learn from SE data?
– A sample of notable recent findings for different SE data types
• Part II: How can you mine SE data?
– Overview of data mining techniques
– Overview of SE data processing tools and techniques

Types of SE Data
• Historical data
– Version or source control: cvs, subversion, perforce
– Bug systems: bugzilla, GNATS, JIRA
– Mailing lists: mbox
• Multi-run and multi-site data
– Execution traces
– Deployment logs
• Source code data
– Source code repositories: sourceforge.net, google code

Historical Data
“History is a guide to navigation in perilous times. History is who we are and why we are the way we are.”
- David C. McCullough

Historical Data
• Track the evolution of a software project:
– source control systems store changes to the code
– defect tracking systems follow the resolution of defects
– archived project communications record rationale for decisions throughout the life of a project
• Used primarily for record-keeping activities:
– checking the status of a bug
– retrieving old code

Percentage of Project Costs Devoted to Maintenance
[Figure: percentage of project costs devoted to maintenance (y-axis, 60 to 100) plotted over the years 1975 to 2005. Labeled studies: Zelkowitz 79, Lientz & Swanson 81, McKee 1984, Huff 90, Moad 90, Eastwood 93, Port 98, Erlikh 00]

Survey of Software Maintenance Activities
• Perfective: add new functionality
• Corrective: fix faults
• Adaptive: new file formats, refactoring
[Charts: MIS Survey (Lientz, Swanson, Tompkins [1978]; Nosek, Palvia [1990]) and Mining ChangeLogs (Linux, GCC, RTP) (Schach, Jin, Yu, Heller, Offutt [2003]). Values shown: 2.2, 18.2, 39.0, 17.4, 56.7, 60.3]

Source Control Repositories
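
As a rough illustration of what mining change logs for maintenance activities could look like, the sketch below labels change messages as corrective, adaptive, or perfective and reports the share of each. It is not the method used in the surveys cited above: the keyword lists, the prefix-matching rule, and the function names `classify_change` and `category_shares` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical keyword prefixes per maintenance category. The slides only
# define the categories; these word lists are assumptions for illustration.
CATEGORY_KEYWORDS = {
    "corrective": ("fix", "bug", "fault", "crash", "error"),
    "adaptive": ("refactor", "port", "format", "migrate", "rename"),
    "perfective": ("add", "new", "feature", "support", "implement"),
}


def classify_change(message):
    """Return the first category whose keyword prefixes match a word."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    # Categories are checked in dictionary order, so a message that mentions
    # both a fix and a new feature is counted as corrective.
    for category, prefixes in CATEGORY_KEYWORDS.items():
        if any(word.startswith(p) for word in words for p in prefixes):
            return category
    return "unclassified"


def category_shares(messages):
    """Return the percentage of change messages in each category."""
    messages = list(messages)
    if not messages:
        return {}
    counts = Counter(classify_change(m) for m in messages)
    return {c: 100.0 * n / len(messages) for c, n in counts.items()}


if __name__ == "__main__":
    # Made-up change messages, not taken from any real repository.
    sample = [
        "Fix null pointer crash in parser",
        "Add support for gzip archives",
        "Refactor logging module",
        "Update copyright year",
    ]
    for category, share in category_shares(sample).items():
        print(f"{category}: {share:.1f}%")
```

Prefix matching is crude (for example, "address" would match the "add" prefix), which is one instance of the noise risk the tutorial goals mention; a real study would need a validated classification scheme.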
