Data-Driven Analysis of Technical Debt Based on Open-Source Software Projects

SLIDE 1

8/15/2018 | 1

Georgios Digkas (1, 2)

(1) University of Groningen, Netherlands; (2) University of Macedonia, Greece

Data-Driven Analysis of Technical Debt Based on Open-Source Software Projects

SLIDE 2

Self Introduction

SLIDE 3

Technical Debt

Source: https://twitter.com/carnage4life

SLIDE 4

Technical Debt Types

Source: https://conference.eurostarsoftwaretesting.com

Symptoms of Technical Debt

SLIDE 5

TD Tools

Li, Zengyang et al.

SLIDE 6

SonarQube TD Evaluation

Rules

Technical Debt
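In SonarQube, each rule carries a remediation effort, every violation of a rule is reported as an issue, and the TD figure (the `sqale_index` metric, stored in minutes) is the total effort needed to fix the detected code smells. The sketch below only illustrates that "rules → issues → summed effort" aggregation; the `Issue` record, the rule keys, and the per-issue efforts are hypothetical and are not taken from SonarQube's rule catalogue or web API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One rule violation reported by the analyzer (hypothetical record layout)."""
    rule: str        # hypothetical rule key
    file: str
    effort_min: int  # remediation effort assigned to this issue, in minutes


def technical_debt(issues):
    """Total TD: the sum of the remediation efforts of all detected issues."""
    return sum(issue.effort_min for issue in issues)


def debt_by_rule(issues):
    """TD broken down per rule, to see which rules contribute the most effort."""
    totals = defaultdict(int)
    for issue in issues:
        totals[issue.rule] += issue.effort_min
    return dict(totals)


issues = [
    Issue("duplicated-string-literal", "A.java", 10),
    Issue("duplicated-string-literal", "B.java", 10),
    Issue("generic-exception-thrown", "A.java", 20),
    Issue("deeply-nested-control-flow", "C.java", 10),
]
print(technical_debt(issues))  # 50
print(debt_by_rule(issues))
```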

SLIDE 7

SonarQube Rules / Issues

Source: docs.sonarqube.org

SLIDE 8

ECSA 2017

The evolution of Technical Debt in the Apache Ecosystem

SLIDE 9

› What are the most frequent types of TD?
› What are the most costly types of TD to fix?
› How does TD evolve over time?

SLIDE 10

Focus on Apache Ecosystem

SLIDE 11

Evolution of TD

SLIDE 12

Evolution of Normalized TD

Normalized TD = TD / NCLOC
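Dividing TD by NCLOC (SonarQube's count of non-comment lines of code) makes snapshots of code bases of different sizes comparable, since a growing project accumulates absolute TD simply by growing. A minimal sketch of the ratio, with made-up snapshot values rather than data from the study:

```python
def normalized_td(td_minutes, ncloc):
    """Normalized TD = TD / NCLOC: remediation effort per non-comment line of code."""
    if ncloc <= 0:
        raise ValueError("NCLOC must be positive")
    return td_minutes / ncloc


# Hypothetical yearly snapshots of one project: (label, TD in minutes, NCLOC)
snapshots = [
    ("year 1", 12_000, 40_000),
    ("year 2", 15_000, 60_000),
    ("year 3", 18_000, 90_000),
]
for label, td, ncloc in snapshots:
    print(f"{label}: TD={td} min, normalized TD={normalized_td(td, ncloc):.2f} min/line")
```

In this toy history absolute TD grows every year while normalized TD falls (0.30, 0.25, 0.20), because the code base grows faster than its debt.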

SLIDE 13

The most frequent types of TD

# | Issue | % of all issues
1 | String literals should not be duplicated | 7.0
2 | The members of an interface declaration or class should appear in a pre-defined order | 5.6
3 | Exception handlers should preserve the original exceptions | 4.8
4 | The diamond operator ("<>") should be used | 4.4
5 | Generic exceptions should never be thrown | 4.2
6 | Statements should be on separate lines | 3.7
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 3.5
8 | Sections of code should not be "commented out" | 3.2
9 | Source files should not have any duplicated blocks | 2.4
10 | "@Override" should be used on overriding and implementing methods | 2.4
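The percentages above can be read as each rule's share of all detected issue occurrences. A sketch of that computation with `collections.Counter`, assuming one rule key per detected issue; the keys and counts are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
from collections import Counter


def frequency_shares(rule_keys):
    """Each rule's share (%) of all detected issues, most frequent first."""
    counts = Counter(rule_keys)
    total = sum(counts.values())
    return [(rule, 100.0 * n / total) for rule, n in counts.most_common()]


# Hypothetical rule key of every detected issue (one entry per issue)
detected = (
    ["duplicated-string-literal"] * 7
    + ["member-order"] * 5
    + ["diamond-operator"] * 4
    + ["commented-out-code"] * 4
)
for rule, pct in frequency_shares(detected):
    print(f"{rule}: {pct:.1f}%")
```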

SLIDE 14

The most costly to fix types of TD

# | Issue | % of total TD
1 | Source files should not have any duplicated blocks | 13.8
2 | String literals should not be duplicated | 9.2
3 | Generic exceptions should never be thrown | 8.4
4 | Cognitive Complexity of methods should not be too high | 5.0
5 | Exception handlers should preserve the original exceptions | 4.8
6 | Methods should not be too complex | 3.7
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 3.5
8 | The members of an interface declaration or class should appear in a pre-defined order | 2.8
9 | Dead stores should be removed | 2.4
10 | Standard outputs should not be used directly to log anything | 2.2

SLIDE 15

The most costly to fix types of TD

# | Issue | Estimated TD
1 | Source files should not have any duplicated blocks | 672
2 | String literals should not be duplicated | 446
3 | Generic exceptions should never be thrown | 408
4 | Cognitive Complexity of methods should not be too high | 246
5 | Exception handlers should preserve the original exceptions | 232
6 | Methods should not be too complex | 179
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 170
8 | The members of an interface declaration or class should appear in a pre-defined order | 135
9 | Dead stores should be removed | 115
10 | Standard outputs should not be used directly to log anything | 107

SLIDE 16

Takeaways

› Technical Debt
› Normalized Technical Debt
› Most frequent: low-level coding problems
› Most expensive: higher-level problems
  • duplicated code
  • ad-hoc exception handling
› A minority of problem types is responsible for the majority of estimated TD
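The "minority of problem types, majority of TD" observation is a Pareto-style statement: sort rule types by their TD and count how many are needed to reach half of the total. A minimal sketch, with hypothetical per-rule TD values that are not the study's data:

```python
def rules_covering(debt_by_rule, share=0.5):
    """Fewest rules whose combined TD reaches `share` of the total TD.

    Taking rules in descending order of TD yields the smallest such set.
    Returns (chosen_rules, fraction_of_total_TD_they_cover).
    """
    total = sum(debt_by_rule.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for rule, debt in sorted(debt_by_rule.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.append(rule)
        covered += debt
    return chosen, covered / total


# Hypothetical TD per rule type (minutes)
td_per_rule = {
    "duplicated-blocks": 40,
    "duplicated-strings": 25,
    "generic-exceptions": 15,
    "exception-handlers": 8,
    "deep-nesting": 7,
    "commented-out-code": 3,
    "dead-stores": 2,
}
rules, covered = rules_covering(td_per_rule)
print(f"{len(rules)} of {len(td_per_rule)} rule types cover {covered:.0%} of TD")
```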

SLIDE 17

SANER 2018

How Do Developers Fix Issues and Pay Back Technical Debt in the Apache Ecosystem?

SLIDE 18

› Is TD paid back?
› Which TD types are paid back more often?
› What is the survivability of those issues?

SLIDE 19

Open and closed issues per project

SLIDE 20

Fixed issues per issue type

SLIDE 21

Issues with the highest fixing rate

Issue | F (fixing rate)
Conditionally executed blocks should be reachable | 59
* Replace Map.get/test with single method call | 58
* Deprecated elements should have both the annotation and the Javadoc tag | 57
Unused "private" fields should be removed | 56
Boolean expressions should not be gratuitous | 55
* Synchronized classes … should not be used | 53
* Constructors should not be used to instantiate "String" and primitive-wrapper classes | 52
Dead stores should be removed | 50
* @Override should be used on overriding [...] | 48
Unused "private" methods should be removed | 47
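Assuming F is the percentage of issues of a rule that were fixed over the analyzed history (the slide does not define it further), the rate can be computed per rule as sketched below. The `(rule, was_fixed)` input format and the sample data are hypothetical; note also that an issue can disappear for reasons other than a deliberate fix (for example, a deleted file), which a real analysis would need to handle.

```python
from collections import defaultdict


def fixing_rates(issues):
    """Percentage of issues of each rule that were fixed during the analyzed history.

    `issues` is an iterable of (rule_key, was_fixed) pairs.
    """
    seen = defaultdict(int)
    fixed = defaultdict(int)
    for rule, was_fixed in issues:
        seen[rule] += 1
        if was_fixed:
            fixed[rule] += 1
    return {rule: 100.0 * fixed[rule] / seen[rule] for rule in seen}


# Hypothetical issue histories: (rule key, fixed before the end of the study?)
history = [
    ("dead-store", True), ("dead-store", False),
    ("unused-private-field", True), ("unused-private-field", True),
    ("unused-private-field", True), ("unused-private-field", False),
]
for rule, rate in sorted(fixing_rates(history).items(), key=lambda kv: -kv[1]):
    print(f"{rule}: {rate:.0f}%")
```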

SLIDE 22

Issues whose resolution has yielded the highest benefit

# | Issue | CiR
1 | Source files should not have any duplicated blocks | 8
2 | Cognitive Complexity of methods should not be too high | 4
3 | Generic exceptions should never be thrown | 4
4 | String literals should not be duplicated | 1
5 | Exception handlers should preserve the original exceptions | 1
6 | Control flow statements should not be nested too deeply | 1
7 | Synchronized classes … should not be used | 3
8 | Methods should not be too complex | 3
9 | Standard outputs should not be used directly to log anything | 1
10 | Sections of code should not be "commented out" | 8

SLIDE 23

Research Question 5

SLIDE 24

Takeaways

› Variation in the fixing rate
› Variation in the survivability
  • 10% fixed within the first month
  • 50% fixed within the first year
› Some issues can take up to 10 years to be fixed
› Issues related to duplication and exception handling are frequently encountered and rarely fixed by developers
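Survivability figures such as "10% fixed within the first month" describe how long issues stay open. When some issues are still open at the end of the observation window, a standard estimator is Kaplan-Meier, which treats those open issues as right-censored instead of dropping them. The slides do not say which estimator the study used; the sketch below is a generic implementation run on made-up issue lifetimes.

```python
def kaplan_meier(durations, fixed):
    """Kaplan-Meier estimate of S(t), the probability that an issue is still open after t days.

    durations: days each issue was observed (until it was fixed, or until the end
               of the observation window if it was never fixed).
    fixed:     True if the issue was fixed (an event), False if still open (right-censored).
    Returns [(t, S(t))] at every distinct time at which at least one issue was fixed.
    """
    data = sorted(zip(durations, fixed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group every issue observed for exactly t days
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Issues fixed or censored at t leave the risk set afterwards
        at_risk -= events + censored
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before t (1.0 before any fix)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical issue lifetimes in days; False marks issues still open at the end of the study
durations = [10, 30, 30, 200, 365, 400, 1000, 1200]
fixed = [True, True, False, True, True, False, True, False]
curve = kaplan_meier(durations, fixed)
print(f"fixed within 30 days: {1 - survival_at(curve, 30):.0%}")
print(f"fixed within one year: {1 - survival_at(curve, 365):.0%}")
```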

SLIDE 25

Limitations of my Studies

› OSS projects by ASF
› Java
› SonarQube
› Weekly analysis of the commits
› Architectural decisions not known/accessible
› Commit policy

SLIDE 26

Work In Progress

SLIDE 27

New Source Code TD (Observation)

› The analysis of several open-source projects by ASF revealed that the quality of some projects degrades over time (conforming to the software ageing phenomenon).
› However, for the majority of Apache projects, normalized Technical Debt (TD/NCLOC) tends to decrease over time.

SLIDE 28

New Source Code TD (OQ)

When normalized TD is decreasing in a project, is it because TD is repaid, because the new code is clean, or both? To what extent is each factor responsible?

SLIDE 29

New Source Code TD (Claim)

› There might be limited value in trying to get rid of existing TD.
› Developers should rather aim at writing clean, TD-free code.
› If (as Google reports) the existing code base is renewed annually at a rate of, say, 20%, and if the new code is clean, then after 6 to 8 years all code will be essentially TD-free.

SLIDE 30

New Source Code TD (Objective)

Analyze newly added source code (per commit) for the purpose of evaluation with respect to the amount of technical debt that is introduced, from the point of view of software developers in the context of OSS (industrial) development.

SLIDE 31

New Source Code TD (Potential RQs)

› RQ1: For TD violations that are removed, in which exact ways has the removal occurred?
  • Was the sloppy code removed or refactored?
  • Did the removal happen intentionally, or was it a side effect?
› RQ2: For new code that might be 'clean', exactly how clean is it?
› RQ3: If the new code is not totally clean, what types of TD are newly introduced?
› RQ4: Is the normalized TD of NEW code higher or lower than the normalized TD of existing code?
› RQ5: Is the normalized TD of NEW code lower in projects that improve over their evolution than in those that deteriorate?
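RQ4 boils down to comparing two ratios: TD per NCLOC of the code added by commits versus TD per NCLOC of the existing code base. A minimal sketch, assuming that each issue can be attributed to the lines a commit added (for example, by matching issue locations to diff hunks); the `CommitDelta` record and all numbers are hypothetical. Summing TD and lines over all commits before dividing, instead of averaging per-commit ratios, lets large commits weigh proportionally more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitDelta:
    """Hypothetical per-commit measurement of the code a commit adds."""
    added_ncloc: int    # non-comment lines of code added by the commit
    introduced_td: int  # TD (minutes) of issues located in those added lines


def normalized_td_of_new_code(commits):
    """TD per NCLOC aggregated over all lines added by the given commits."""
    added = sum(c.added_ncloc for c in commits)
    if added == 0:
        raise ValueError("the commits add no code")
    return sum(c.introduced_td for c in commits) / added


commits = [CommitDelta(200, 30), CommitDelta(500, 50), CommitDelta(300, 20)]
new_code_ntd = normalized_td_of_new_code(commits)  # 100 / 1000 = 0.1
existing_code_ntd = 18_000 / 90_000  # hypothetical snapshot of the existing code base: 0.2
if new_code_ntd < existing_code_ntd:
    print("new code carries less TD per line than existing code")
else:
    print("new code carries at least as much TD per line as existing code")
```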

SLIDE 32

SLR on Architectural Smells

SLIDE 33

SLR (Objective)

Analyze the research state of the art on software smells for the purpose of understanding with respect to: (a) their applicability at the architecture level, (b) the research intensity on them, (c) their detection by tools, and (d) their classification based on their elements, level of granularity, and relevance to software evolution, from the point of view of researchers and practitioners in the context of software development.

SLIDE 34

SLR (RQs)

  • 1. Which smells can be defined (identified) at the architecture level?
      • 1.1. Which are the pure architecture smells?
      • 1.2. Which code or design flaws (i.e., smells, violations, or antipatterns) could also be applied at the architecture level?
      • 1.3. Are there similarities among architectural smells that have been defined with different names in the literature?

SLIDE 35

SLR (RQs)

  • 2. Which architectural smells have attracted the most research attention?
  • 3. Which architectural smells are detectable by tools?
  • 4. How can we classify architectural smells with respect to: (a) the affected architectural element (interface, component, and connector), (b) the portion of the system that is involved (or whether the whole system is), and (c) the development history of the project (i.e., whether the history is considered when identifying the architectural smell)?

SLIDE 36

Thank you for your attention