Do Code Clones Matter? Software Engineering Seminar Spring Semester

Do Code Clones Matter? Software Engineering Seminar, Spring Semester 2010, Dan Tecu. Paper presented today: Do Code Clones Matter? Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada. 09th of March


SLIDE 1

Do Code Clones Matter?

Software Engineering Seminar Spring Semester 2010 Dan Tecu

SLIDE 2

09th of March 2 Do Code Clones Matter?

Paper presented today:

Do Code Clones Matter?

Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada

SLIDE 3

What is code cloning?

  • Code cloning is a primitive form of code reuse, usually by copying (copy-paste)
  • Code cloning may happen intentionally or unintentionally
SLIDE 4

Clones and possible consequences

  • Clones introduce redundancy and might introduce dependencies
  • Inconsistent changes to cloned code might create faults and lead to incorrect behavior
  • Inconsistent bug fixing: a bug found in cloned code is fixed in only one of the clones

SLIDE 5

Previous work

  • Most previous work agrees that cloning poses a problem for software maintenance [Lague et al. 1997]
  • However, little information is available on the impact of code cloning on software quality
  • Some researchers have even started to doubt that code cloning is harmful [Krinke, 2007]

SLIDE 6

Research problem

  • The purpose of this paper is to shed some light on this question
  • It presents the results of a large case study that was undertaken to find out:
    1. whether clones are changed inconsistently
    2. whether the inconsistencies are introduced intentionally
    3. whether the unintentional inconsistencies can represent faults
SLIDE 7

Contribution

  • A novel suffix-tree-based algorithm for the detection of inconsistent clones
  • An empirical study showing whether cloned code is harmful or not

SLIDE 8

Detection of inconsistent clones

  • Works at the token level
  • Identifier names are irrelevant (due to the normalizer)
  • Clone groups whose clones overlap with each other are filtered out
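
The effect of the normalizer can be illustrated with a minimal sketch. The tokenizer regex, the keyword set and the `ID` placeholder below are assumptions made for illustration; they are not taken from the paper's tool.

```python
import re

# Hypothetical keyword set for a small C-like language (an assumption).
KEYWORDS = {"if", "else", "for", "while", "return", "int"}

# Identifiers/keywords, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Split source text into tokens and replace every identifier with 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        is_word = tok[0].isalpha() or tok[0] == "_"
        if is_word and tok not in KEYWORDS:
            tokens.append("ID")  # the identifier name no longer matters
        else:
            tokens.append(tok)   # keywords, literals and punctuation are kept
    return tokens


# Two statements that differ only in identifier names yield the same tokens.
print(normalize("total = total + price;"))
print(normalize("sum = sum + cost;"))
```

After normalization, a detector that compares token sequences treats both statements as identical.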

SLIDE 9

Detection in detail: suffix trees

  • A suffix tree for the word “wood”
  • A path from the root to a leaf describes (uniquely) a suffix; the tree describes all suffixes
  • The circles represent nodes, the rectangles represent leaves
  • Each edge to a node is labeled with a substring of the initial word
  • Each edge to a leaf is labeled with the empty string

[Figure: suffix tree for “wood”]
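
As a rough illustration of the idea, the sketch below builds an uncompressed suffix trie for “wood” using nested dictionaries. This is a simplification: a real suffix tree merges chains of single-child nodes into one edge labeled with a substring. The `$` end marker and the function names are assumptions.

```python
def suffixes(word):
    """All non-empty suffixes of word, longest first."""
    return [word[i:] for i in range(len(word))]


def build_suffix_trie(word):
    """Insert every suffix into a trie of nested dicts."""
    root = {}
    for suffix in suffixes(word):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node["$"] = {}  # assumed end marker: a suffix ends at this node
    return root


def contains(trie, pattern):
    """True if pattern is a substring of the indexed word."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("wood")
print(suffixes("wood"))       # ['wood', 'ood', 'od', 'd']
print(contains(trie, "oo"))   # True
print(contains(trie, "do"))   # False
```

Every substring of the word is a prefix of some suffix, which is why a walk from the root answers substring queries.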
SLIDE 10

Detection in detail: edit distance

  • The edit distance between two strings is the minimum number of operations required to change one string into the other
  • The allowed operations are:
    1. Deletion of a character
    2. Insertion of a character
    3. Changing a character into another
  • The edit distance between “wood” and “floor” is 3 (change w to f, insert l, change d to r)
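
The definition above corresponds to the standard dynamic-programming computation of the Levenshtein distance. The sketch below is a textbook version, not code from the paper; the function name is an assumption.

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # keep or change ca
        prev = curr
    return prev[-1]


print(edit_distance("wood", "floor"))  # 3
```

The same function works on lists of tokens, which is the form needed for clone detection.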
SLIDE 11

Detection in detail: the algorithm

  • Input parameters: the sequence (composed of n tokens), the maximum edit distance and the minimal clone length
  • The suffix tree over the input sequence is constructed
  • For each suffix, an approximate search based on the edit distance is performed in the tree
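
To make the three input parameters concrete, here is a brute-force stand-in. It is not the suffix-tree algorithm from the paper: it simply compares every pair of non-overlapping windows of the minimal clone length and keeps those within the maximum edit distance. All names are assumptions, and the quadratic number of comparisons makes it unsuitable for real systems.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def find_gapped_clones(tokens, max_edit_distance, min_clone_length):
    """Return (start_a, start_b, distance) for similar fixed-length windows."""
    n = len(tokens)
    hits = []
    for i in range(n - min_clone_length + 1):
        left = tokens[i:i + min_clone_length]
        # Start the second window after the first so the two do not overlap.
        for j in range(i + min_clone_length, n - min_clone_length + 1):
            right = tokens[j:j + min_clone_length]
            d = edit_distance(left, right)
            if d <= max_edit_distance:
                hits.append((i, j, d))
    return hits


# Made-up token sequence: the second copy of "a b c d e" has c changed to q.
sequence = ["a", "b", "c", "d", "e", "x", "a", "b", "q", "d", "e"]
print(find_gapped_clones(sequence, max_edit_distance=1, min_clone_length=5))
```

The suffix tree replaces the inner loop: instead of testing every second position, the search follows only those tree paths that stay within the edit-distance budget.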

SLIDE 12

Performance of algorithm

  • Worst-case complexity is hard to analyze
  • Results on an Intel Core 2 Duo 2.4 GHz with 3.5 GB RAM, running Java in a single thread, are shown below:

SLIDE 13

Study description: study objects

  • Five projects, described below, were analyzed:

System    Organization  Language  Age (years)  Size (kLOC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281

SLIDE 14

Study description: research questions

  • RQ1: Are clones changed inconsistently?
  • RQ2: Are inconsistent clones created unintentionally?
  • RQ3: Can inconsistent clones be indicators for faults in real systems?

SLIDE 15

Study measurements: parameters tuning

  • Minimal clone length: 10 statements (for Cobol: 20)
  • Maximum edit distance: 5 (for Cobol: 10)
  • Maximal inconsistency ratio (the ratio of edit distance to clone length): 0.2
  • Additional constraint: the first 2 statements of two clones need to be equal
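
The four settings above can be read as a filter on candidate clone pairs. The sketch below shows one possible reading with the non-Cobol defaults. The function name is an assumption, and so is taking the length of the shorter clone as the clone length in the ratio, since the slide does not say which length is used.

```python
def passes_filters(clone_a, clone_b, distance,
                   min_length=10, max_distance=5, max_ratio=0.2):
    """Check a candidate pair (lists of statements) against the study parameters."""
    length = min(len(clone_a), len(clone_b))  # assumption: use the shorter clone
    if length < min_length:
        return False
    if distance > max_distance:
        return False
    if distance / length > max_ratio:
        return False
    # Additional constraint: the first two statements must be equal.
    return clone_a[:2] == clone_b[:2]


# Made-up example: 10 statements each, one statement changed in the middle.
a = ["s%d" % i for i in range(10)]
b = list(a)
b[5] = "changed"
print(passes_filters(a, b, distance=1))   # True
print(passes_filters(a, b, distance=3))   # False: 3 / 10 exceeds 0.2
```

For Cobol, the same function would be called with `min_length=20` and `max_distance=10`.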

SLIDE 16

Study measurements: absolute numbers

SLIDE 17

Study measurements: relative numbers

  • RQ1 mean value: 0.52
  • RQ2 mean value: 0.28
  • RQ3 mean value: 0.15
SLIDE 18

Fault density

  • To answer RQ3, we have to compare the fault density in inconsistencies against the average fault density
  • Fault density in inconsistencies was measured in faults/kLOC
  • However, the average fault density in the analyzed systems was not known
  • The typical range for fault density is 0.1 to 50 faults/kLOC [Endres and Rombach, 2003]
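
The unit faults/kLOC is simply the number of faults divided by the code size in thousands of lines. The sketch below shows the arithmetic and the comparison against the typical range quoted above. The input numbers are made up for illustration and are not data from the study.

```python
TYPICAL_RANGE = (0.1, 50.0)  # faults/kLOC, range quoted from Endres and Rombach


def fault_density(faults, lines_of_code):
    """Faults per thousand lines of code."""
    return faults / (lines_of_code / 1000.0)


def within_typical_range(density, bounds=TYPICAL_RANGE):
    low, high = bounds
    return low <= density <= high


# Hypothetical input: 12 faults found in 250 lines of inconsistently cloned code.
density = fault_density(12, 250)
print(density)                        # 48.0
print(within_typical_range(density))  # True
```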

SLIDE 19

RQ3 answered

  • Average fault density in inconsistencies: 48.1 faults/kLOC
SLIDE 20

Threats to validity

  • The development repositories of the systems were not analyzed (to trace the evolution of inconsistencies)
  • A comparison with the actual fault density of the systems would have been better
  • The analyzed projects were not sampled randomly
  • The majority of the systems are written in C#
  • Only 5 systems were analyzed
SLIDE 21

Conclusion

  • The answer to RQ1 is positive: clones are changed inconsistently
  • The answer to RQ2 is positive: inconsistent clones are created unintentionally
  • The answer to RQ3 is also positive: the average fault density in inconsistent clones is very close to the upper bound of the reported typical range of fault density
  • Inconsistent clones can be indicators for faults in real systems

SLIDE 22

References

  • A. Endres and D. Rombach. A Handbook of Software and Systems Engineering. Pearson, 2003
  • J. Krinke. A study of consistent and inconsistent changes to code clones. In Proc. WCRE '07. IEEE, 2007
  • B. Lague, D. Proulx, J. Mayrand, E. M. Merlo and J. Hudepohl. Assessing the benefits of incorporating function clone detection in a development process. In Proc. ICSM '97. IEEE, 1997

SLIDE 23

Detection in detail: detect procedure

SLIDE 24

Detection in detail: search procedure