
Four “interesting” ways in which history can teach us about software

Michael W. Godfrey*, Xinyi Dong, Cory Kapser, Lijie Zou
Software Architecture Group (SWAG)
University of Waterloo
*Currently on sabbatical at Sun Microsystems

1. Longitudinal case studies of growth and evolution

• Studied several OSSs, esp. Linux kernel:
  – Looked for “evolutionary narratives” to explain observable historical phenomena
• Methodology:
  – Analyze individual tarball versions
  – Build hierarchical metrics data model
  – Generate graphs, look for interesting lumps under the carpet, try to answer why

[Figure: number of source code files (*.[ch]) over time, Jan 1993 to Apr 2001; development releases (1.1, 1.3, 2.1, 2.3) vs. stable releases (1.0, 1.2, 2.0, 2.2)]
[Figure: uncommented LOC of .h files over time, Jan 1993 to Apr 2001; average and median .h file size for development and stable releases]
[Diagram: Extraction / analysis: source code, analysis scripts, metrics data. Exploration: MS Excel]

2. Case studies of origin analysis

• Reasoning about structural change
  – (moving, renaming, merging, splitting, etc.)
  – Try to reconstruct what happened
  – Formalized several “change patterns”
    • e.g., service consolidation
• Methodology:
  – Consider consecutive pairs of versions:
    • Entity analysis – metrics-based clone detection
    • Relationship analysis – compare relational images (calls, called-by, uses, extends, etc.)
  – Create evolutionary record of what happened
    • what evolved from what, and how/why

[Figure: V old (entities x, y, z, g) and V new (entities x, y, z, f), with the link between them marked “???”]
[Diagram: Extraction / analysis: source code, cppx / Understand, ER model, metrics data, Beagle. Exploration: Beagle]

3. Case studies of code cloning

• Motivation:
  – Lots of research in clone detection, but more on algorithms and tools than on case studies and comprehension
    • What kinds of cloning are there? Why does cloning happen? What kinds are the most/least harmful? Do different clone kinds have different precision / recall numbers? Different algorithms?
  – Future work: track clone evolution
    • Do related bugs get fixed? Does cloned code have more bugs?
• Methodology:
  1. Use CCFinder on source to find initial clone pairs.
  2. Use ctags to map out source files into “entity regions”
    – Consecutive typedefs, fcn prototypes, var defs
    – Individual macros, structs, unions, enums, fcn defs
  3. Map (abstract up) clone pairs to the source code regions
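Steps 2 and 3 of the cloning methodology (entity regions, then abstracting clone pairs up to them) can be illustrated with a minimal Python sketch. It does not run CCFinder or ctags and does not reproduce the authors' tooling. It assumes their output has already been parsed into the hypothetical `Fragment` and `Region` records below, and it treats a clone fragment as belonging to every region whose line range it overlaps, which is an assumption rather than something the slides state.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical entity region, e.g. derived from ctags output."""
    start: int  # first line of the region (1-based, inclusive)
    end: int    # last line of the region (inclusive)
    kind: str   # e.g. "function", "struct", "macro"
    name: str


class Fragment(NamedTuple):
    """Hypothetical line-level clone fragment, e.g. one side of a clone pair."""
    path: str
    start: int
    end: int


def regions_for(fragment, regions_by_file):
    """Return every region in the fragment's file whose lines overlap the fragment."""
    regions = regions_by_file.get(fragment.path, [])
    return [r for r in regions
            if r.start <= fragment.end and fragment.start <= r.end]


def abstract_up(clone_pairs, regions_by_file):
    """Lift line-level clone pairs to a set of region-level pairs."""
    lifted = set()
    for left, right in clone_pairs:
        for a in regions_for(left, regions_by_file):
            for b in regions_for(right, regions_by_file):
                lifted.add(((left.path, a), (right.path, b)))
    return lifted
```

Lifting to regions gives each clone pair an entity kind and a location, which is what the later filtering and sorting steps operate on.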

3. Case studies of code cloning

• Methodology (continued):
  4. Filter different region kinds according to observed heuristics
    – C structs often look alike; without these filters, parameterized string matching returns many more false positives between structs than, say, between functions.
  5. Sort clones by location:
    – Same region, same file, same directory, or different directory
  6. … and entity kind:
    – Fcn to fcn
    – structures (enum, union, struct)
    – macro
    – heterogeneous (different region kinds)
    – misc. clones
  7. … and even more detailed criteria:
    – Function initialization / finalization clones, …
  8. Navigate and investigate using CICS gui, look for patterns
    – Cross subsystem clones seem to vary more over time
    – Intra subsystem clones are usually function clones

[Diagram: Extraction / analysis: source code, CCFinder, ctags, custom filters and sorter, taxonomized clone pairs. Exploration: CICS gui]

4. Longitudinal case studies of software manufacturing-related artifacts

Q: How much maintenance effort is put into SM artifacts, relative to the system as a whole?

• Studying six OSSs:
  – GCC, PostgreSQL, kepler, ant, mycore, midworld
• All used CVS; we examined their logs
• We looked for SM artifacts (Makefile, build.xml, SConscript) and compared them to non-SM artifacts
• Some results:
  – Between 58 and 81% of the core developers contributed changes to SM artifacts
  – SM artifacts were responsible for
    • 3-10% of the number of changes made
    • Up to 20% of the total LOC changed (GCC)
• Open questions:
  – How difficult is it to maintain these artifacts?
  – Do different SM tools require different amounts of effort?

[Diagram: Extraction / analysis: CVS repos, analysis scripts, metrics data. Exploration: MS Excel]

Dimensions of studies

• Single version vs. consecutive version pairs vs. longitudinal study
• Coarsely vs. finely grained detail
• Intermediate representation of artifacts:
  – Raw code vs. metrics vs. ER-like semantic model
  – Navigable representation of system architecture; auto-abstraction of info at arbitrary levels
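Returning to step 5 of the cloning methodology, sorting clone pairs by location comes down to comparing the file paths and region identities of the two sides. The sketch below is a hedged illustration, not the authors' sorter. The function names and the `(path, region_id)` tuple shape are hypothetical, and "same directory" is taken to mean the same immediate parent directory, which is an assumption.

```python
import posixpath
from collections import Counter


def location_class(path_a, region_a, path_b, region_b):
    """Classify one region-level clone pair by where its two sides live."""
    if path_a == path_b:
        # Both sides are in one file: either the very same region or two regions.
        return "same region" if region_a == region_b else "same file"
    if posixpath.dirname(path_a) == posixpath.dirname(path_b):
        # Assumption: "same directory" means the same immediate parent directory.
        return "same directory"
    return "different directory"


def tally_by_location(region_pairs):
    """Count clone pairs per location class.

    Each pair is ((path_a, region_a), (path_b, region_b)); the region values
    only need to support equality comparison.
    """
    return Counter(
        location_class(pa, ra, pb, rb) for (pa, ra), (pb, rb) in region_pairs
    )
```

A tally like this, computed per version, is one way to compare cross-subsystem and intra-subsystem clone counts over time.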

Challenges in this field

1. Dealing with scale
  • “Big system analysis” times “many versions”
  • Research tools often live at the bleeding edge, are slow, and produce voluminous detail
2. Automation
  • Research tools are often buggy and require handholding
  • It is often hard to automate multiple analyses
3. Artifact linkage and analysis granularity
  • Repositories (CVS, Unix fs) often store only source code, with no special understanding of, say, where a particular method resides.
  • (How) should we make them smarter?
    – e.g., ctags and CCFinder
4. [Your thoughts?]
