Do Code Clones Matter?
Software Engineering Seminar, Spring Semester 2010
Dan Tecu
09th of March
Paper presented today: Do Code Clones Matter?
Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada
What is code cloning?
- Code cloning is a primitive form of code reuse, usually by
copying (copy-paste)
- Code cloning may happen intentionally or unintentionally
Clones and possible consequences
- Clones introduce redundancy and might introduce
dependencies
- Inconsistent changes to cloned code might create faults
and might lead to incorrect behavior
- Inconsistent bug fixing: when a bug found in cloned code is
fixed in only one clone
Previous work
- Most previous work agrees that cloning poses a problem
for software maintenance [Lague et al. 1997]
- However, there is little information available concerning the
impact of code cloning on software quality
- Some researchers even started to doubt that code cloning
is harmful [Krinke, 2007]
Research problem
- The purpose of this paper is to shed some light on this question
- It presents the results of a large case study that was
undertaken to find out:
- 1. whether the clones are changed inconsistently
- 2. whether the inconsistencies are introduced intentionally
- 3. whether the unintentional inconsistencies can represent faults
Contribution
- A novel suffix-tree-based algorithm for the detection of
inconsistent clones
- An empirical study showing whether the cloned code is
harmful or not
Detection of inconsistent clones
- Works on token level
- Identifier names are irrelevant (due to the normalizer)
- Clone groups whose clones overlap with each other are
filtered out
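Why identifier names become irrelevant can be illustrated with a small sketch. It assumes a simplistic regex tokenizer and a tiny, hypothetical keyword set (`KEYWORDS`); the normalizer actually used in the study is not described on these slides and is likely more elaborate.

```python
import re

# Hypothetical keyword set for illustration only; a real normalizer would use
# the full keyword list of the analyzed language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(code):
    """Tokenize `code` and replace every identifier by the placeholder "ID"."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok not in KEYWORDS and IDENT_RE.fullmatch(tok):
            tokens.append("ID")  # the identifier name is discarded
        else:
            tokens.append(tok)   # keywords, literals, punctuation are kept
    return tokens


a = normalize("if (count > 10) return total;")
b = normalize("if (n > 10) return sum;")
print(a == b)  # True: the fragments differ only in identifier names
```

After normalization, two fragments that differ only in variable names yield the same token sequence, so a token-level detector treats them as clones.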
Detection in detail: suffix trees
- A suffix tree for the word: “wood”
- A path from root to a leaf describes (uniquely) a suffix; the tree describes all suffixes
- The circles represent nodes, the rectangles represent leaves
- Each edge to a node is labeled with a substring of the initial word
- Each edge to a leaf is labeled with the empty string
[Figure: suffix tree for “wood”]
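The idea that every root-to-leaf path spells one suffix can be tried out with a small sketch. Note the assumptions: this builds an uncompressed suffix trie (one character per edge, quadratic construction) using nested dictionaries and a `$` terminator, not the compressed suffix tree from the slide; the function names are illustrative.

```python
def build_suffix_trie(word):
    """Insert every suffix of `word` into a nested-dict trie."""
    end = "$"  # terminator marking the end of a suffix; assumed absent from `word`
    root = {}
    for start in range(len(word)):
        node = root
        for ch in word[start:] + end:
            node = node.setdefault(ch, {})
    return root


def suffixes(node, prefix=""):
    """Enumerate all suffixes by walking every root-to-leaf path."""
    for ch, child in node.items():
        if ch == "$":
            yield prefix
        else:
            yield from suffixes(child, prefix + ch)


def contains(root, pattern):
    """A pattern is a substring of the word iff it labels a path from the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("wood")
print(sorted(suffixes(trie)))                      # ['d', 'od', 'ood', 'wood']
print(contains(trie, "oo"), contains(trie, "do"))  # True False
```

The substring lookup is what makes such a structure useful for clone detection: a repeated token sequence shows up as a path shared by several suffixes.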
Detection in detail: edit distance
- The edit distance between 2 strings is the minimum number of operations required to change one string into the other
- The allowed operations are:
- 1. Deletion of a character
- 2. Insertion of a character
- 3. Changing a character into another
- The edit distance between “wood” and “floor” is 3
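The definition above corresponds to the classic dynamic-programming computation of the Levenshtein distance. The following is a minimal sketch of that standard technique (the function name is illustrative); it works on any sequences, so it applies to token lists as well as strings.

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,                 # delete item_a
                curr[j - 1] + 1,             # insert item_b
                prev[j - 1] + substitution,  # keep or change the item
            ))
        prev = curr
    return prev[-1]


print(edit_distance("wood", "floor"))  # 3
```

One way to reach 3 for this pair: insert "f", change "w" into "l", change "d" into "r".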
Detection in detail: the algorithm
- Input parameters: the sequence (composed of n tokens),
maximum edit distance and minimal clone length
- The suffix tree over the input sequence is constructed
- For each suffix an approximate search based on the edit
distance is performed in the tree
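To make the roles of the three input parameters concrete, here is a deliberately naive stand-in. It is not the suffix-tree algorithm from the paper: it compares fixed-length, non-overlapping token windows pairwise by edit distance, which is far slower and less general. All names are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find_inconsistent_clones(tokens, min_len, max_edits):
    """Brute-force search: return (i, j, distance) for every pair of windows
    of length `min_len` that are similar but not identical."""
    hits = []
    last_start = len(tokens) - min_len
    for i in range(last_start + 1):
        # j starts at i + min_len so the two windows never overlap
        for j in range(i + min_len, last_start + 1):
            d = edit_distance(tokens[i:i + min_len], tokens[j:j + min_len])
            if 0 < d <= max_edits:  # d == 0 would be an exact clone
                hits.append((i, j, d))
    return hits


tokens = "a b c d e x a b q d e y".split()
print(find_inconsistent_clones(tokens, min_len=5, max_edits=1))  # [(0, 6, 1)]
```

The window at position 0 (`a b c d e`) and the window at position 6 (`a b q d e`) differ in a single token, so the pair is reported as an inconsistent clone candidate.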
Performance of algorithm
- Worst case complexity is hard to analyze
- Results on Intel Core 2 Duo 2.4 GHz, 3.5 GB RAM, running
Java in a single thread are shown below:
Study description: study objects
- The 5 projects described below were analyzed:
System    Organization  Language  Age (years)  Size (kLOC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281
Study description: research questions
- RQ1: Are clones changed inconsistently?
- RQ2: Are inconsistent clones created unintentionally?
- RQ3: Can inconsistent clones be indicators for faults in real
systems?
Study measurements: parameters tuning
- Minimal clone length: 10 statements (for Cobol: 20)
- Maximum edit distance: 5 (for Cobol: 10)
- Maximal inconsistency ratio (the ratio of edit distance to
clone length): 0.2
- Additional constraint: the first 2 statements of two clones
need to be equal
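The thresholds above can be read as a filter applied to each candidate clone pair. The sketch below is one possible encoding under stated assumptions: clones are lists of statements, the clone length is taken to be the length of the shorter clone, and the names `Thresholds` and `accept` are illustrative, not taken from the study's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_length: int = 10                  # statements
    max_edit_distance: int = 5
    max_inconsistency_ratio: float = 0.2  # edit distance / clone length
    equal_prefix: int = 2                 # leading statements that must match


COBOL = Thresholds(min_length=20, max_edit_distance=10)


def accept(clone_a, clone_b, distance, t=Thresholds()):
    """Return True if a candidate pair passes all four constraints."""
    length = min(len(clone_a), len(clone_b))  # assumption: shorter clone counts
    if length < t.min_length:
        return False
    if distance > t.max_edit_distance:
        return False
    if distance / length > t.max_inconsistency_ratio:
        return False
    return clone_a[:t.equal_prefix] == clone_b[:t.equal_prefix]


a = [f"s{i}" for i in range(12)]
b = list(a)
b[5], b[8] = "x", "y"       # two changed statements, edit distance 2
print(accept(a, b, 2))      # True: 2 / 12 is below the 0.2 ratio
print(accept(a, b, 2, COBOL))  # False: 12 statements are too short for Cobol
```

The ratio check matters for short clones: with 12 statements, an edit distance of 3 already gives 0.25 and is rejected even though it is below the absolute limit of 5.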
Study measurements: absolute numbers
Study measurements: relative numbers
- RQ1 mean value: 0.52
- RQ2 mean value: 0.28
- RQ3 mean value: 0.15
Fault density
- To answer RQ3, we have to compare the fault density in
inconsistencies against the average fault density
- Fault density in inconsistencies was evaluated in faults/
kLOC
- However, the average fault density in the analyzed systems was
not known
- Typical range for fault density is: 0.1 – 50 faults/kLOC
[Endres and Rombach, 2003]
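The unit faults/kLOC is simply the number of faults per 1000 lines of code. The sketch below shows the arithmetic and the comparison against the typical range quoted above; the fault and line counts in the example are hypothetical and are not data from the study.

```python
TYPICAL_RANGE = (0.1, 50.0)  # faults/kLOC, the range quoted on the slide


def fault_density(faults, lines_of_code):
    """Faults per 1000 lines of code."""
    return faults * 1000 / lines_of_code


def within_typical_range(density, bounds=TYPICAL_RANGE):
    """Check whether a density lies inside the quoted typical range."""
    low, high = bounds
    return low <= density <= high


# Hypothetical counts, for illustration only.
d = fault_density(faults=12, lines_of_code=400)
print(d, within_typical_range(d))  # 30.0 True
```

Because the actual average fault density of the systems was unknown, the typical range serves as the reference point for the comparison.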
RQ3 answered
- Average fault density in inconsistencies: 48.1 faults/kLOC
Threats to validity
- The development repositories of the systems were not
analyzed (to trace the evolution of inconsistencies)
- Comparison with the actual fault density would have been
better
- The analyzed projects were not sampled randomly
- The majority of the systems are written in C#
- Only 5 systems were analyzed
Conclusion
- The answer to RQ1 is positive: clones are changed
inconsistently
- The answer to RQ2 is positive: inconsistent clones are
created unintentionally
- The answer to RQ3 is also positive: the average fault density in inconsistent clones is very close to the upper bound of the typical fault density range
- Inconsistent clones can be indicators for faults in real
systems
References
- A. Endres and D. Rombach. A Handbook of Software and
Systems Engineering. Pearson 2003
- J. Krinke. A study of consistent and inconsistent changes
to code clones. In Proc. WCRE'07. IEEE, 2007
- B. Lague, D. Proulx, J. Mayrand, E. M. Merlo and J.
Hudepohl. Assessing the benefits of incorporating function
clone detection in a development process. In Proc. ICSM'97. IEEE, 1997
Detection in detail: detect procedure