Data-Driven Analysis of Technical Debt Based on Open-Source Software Projects
Georgios Digkas 1,2
1 University of Groningen, Netherlands; 2 University of Macedonia, Greece
8/15/2018
Self Introduction
Technical Debt
Source: https://twitter.com/carnage4life
Source: https://conference.eurostarsoftwaretesting.com
Symptoms of Technical Debt
Li, Zengyang et al.
Source: docs.sonarqube.org
The evolution of Technical Debt in the Apache Ecosystem
› What are the most frequent types of TD?
› What are the most costly to fix types of TD?
› How does TD evolve over time?
Normalized TD = TD / NCLOC (NCLOC: non-comment lines of code)
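The following is a minimal sketch of the metric. It assumes TD is expressed as remediation effort in minutes, which is the unit SonarQube uses for its technical debt measure. The snapshot values are hypothetical. They show how absolute TD can grow while normalized TD falls, because the code base grows faster than the debt:

```python
def normalized_td(td_minutes: float, ncloc: int) -> float:
    """Return technical debt per non-comment line of code (TD / NCLOC).

    Assumption: td_minutes is the remediation effort in minutes.
    """
    if ncloc <= 0:
        raise ValueError("NCLOC must be positive")
    return td_minutes / ncloc

# Hypothetical snapshots of one project: (version, TD in minutes, NCLOC).
snapshots = [("v1", 12_000, 40_000), ("v2", 15_000, 60_000)]
for version, td, ncloc in snapshots:
    # TD grows from 12,000 to 15,000 minutes, yet TD/NCLOC drops.
    print(version, round(normalized_td(td, ncloc), 3))
```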
The most frequent types of TD
# | Issue | %
1 | String literals should not be duplicated | 7.0
2 | The members of an interface declaration or class should appear in a pre-defined order | 5.6
3 | Exception handlers should preserve the original exceptions | 4.8
4 | The diamond operator ("<>") should be used | 4.4
5 | Generic exceptions should never be thrown | 4.2
6 | Statements should be on separate lines | 3.7
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 3.5
8 | Sections of code should not be "commented out" | 3.2
9 | Source files should not have any duplicated blocks | 2.4
10 | "@Override" should be used on overriding and implementing methods | 2.4
The most costly to fix types of TD
# | Issue | %
1 | Source files should not have any duplicated blocks | 13.8
2 | String literals should not be duplicated | 9.2
3 | Generic exceptions should never be thrown | 8.4
4 | Cognitive Complexity of methods should not be too high | 5.0
5 | Exception handlers should preserve the original exceptions | 4.8
6 | Methods should not be too complex | 3.7
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 3.5
8 | The members of an interface declaration or class should appear in a pre-defined order | 2.8
9 | Dead stores should be removed | 2.4
10 | Standard outputs should not be used directly to log anything | 2.2
The most costly to fix types of TD
# | Issue | TD
1 | Source files should not have any duplicated blocks | 672
2 | String literals should not be duplicated | 446
3 | Generic exceptions should never be thrown | 408
4 | Cognitive Complexity of methods should not be too high | 246
5 | Exception handlers should preserve the original exceptions | 232
6 | Methods should not be too complex | 179
7 | Control flow statements "if", "for", "while", "switch" and "try" should not be nested too deeply | 170
8 | The members of an interface declaration or class should appear in a pre-defined order | 135
9 | Dead stores should be removed | 115
10 | Standard outputs should not be used directly to log anything | 107
› Technical Debt
› Normalized Technical Debt
› Most frequent: low-level coding problems
› Most expensive types of TD are higher level
› A minority of problem types is responsible for the majority of estimated TD
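The Pareto-style claim can be checked against the "most costly to fix" table. The sketch below assumes that the table's % column is each type's share of total estimated TD. It ranks the ten listed types by share and accumulates the shares with a plain cumulative sum, which is not necessarily how the study computed it:

```python
# Share of total estimated TD per issue type (percent), copied from the
# "most costly to fix" table; only the ten listed types are included.
costly_share = {
    "Source files should not have any duplicated blocks": 13.8,
    "String literals should not be duplicated": 9.2,
    "Generic exceptions should never be thrown": 8.4,
    "Cognitive Complexity of methods should not be too high": 5.0,
    "Exception handlers should preserve the original exceptions": 4.8,
    "Methods should not be too complex": 3.7,
    'Control flow statements "if", "for", "while", "switch" and "try" '
    "should not be nested too deeply": 3.5,
    "The members of an interface declaration or class should appear "
    "in a pre-defined order": 2.8,
    "Dead stores should be removed": 2.4,
    "Standard outputs should not be used directly to log anything": 2.2,
}

def cumulative_shares(shares):
    """Yield (rank, running total in percent), largest share first."""
    total = 0.0
    for rank, value in enumerate(sorted(shares.values(), reverse=True), start=1):
        total += value
        yield rank, round(total, 1)

for rank, running in cumulative_shares(costly_share):
    print(f"top {rank:2d}: {running}% of estimated TD")
```

With the table's values, the three most expensive types account for 31.4% of estimated TD, and the ten listed types account for 55.8%.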
How Do Developers Fix Issues and Pay Back Technical Debt in the Apache Ecosystem?
› Is TD paid back?
› Which TD types are paid back more often?
› What is the survivability of those issues?
Open and closed issues per project
Issue | F
Conditionally executed blocks should be reachable | 59
* Replace Map.get/test with single method call | 58
* Deprecated elements should have both the annotation and the Javadoc tag | 57
Unused "private" fields should be removed | 56
Boolean expressions should not be gratuitous | 55
* Synchronized classes … should not be used | 53
* Constructors should not be used to instantiate "String" and primitive-wrapper classes | 52
Dead stores should be removed | 50
* @Override should be used on overriding [...] | 48
Unused "private" methods should be removed | 47
# | Issue | CiR
1 | Source files should not have any duplicated blocks | 8
2 | Cognitive Complexity of methods should not be too high | 4
3 | Generic exceptions should never be thrown | 4
4 | String literals should not be duplicated | 1
5 | Exception handlers should preserve the original exceptions | 1
6 | Control flow statements should not be nested too deeply | 1
7 | Synchronized classes … should not be used | 3
8 | Methods should not be too complex | 3
9 | Standard outputs should not be used directly to log anything | 1
10 | Sections of code should not be "commented out" | 8
› Variation in the fixing rate
› Variation in the survivability
› Some issues can take up to 10 years to be fixed
› Issues related to duplication and exception handling are frequently encountered and rarely fixed by developers
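The slides do not state how survivability was estimated. One standard option is the Kaplan-Meier estimator. It fits cases where some issues are still open at the end of the observation period, because it treats those issues as right-censored. Below is a minimal sketch with hypothetical issue lifetimes in days; the function name and input layout are illustrative:

```python
from collections import Counter

def kaplan_meier(durations, fixed):
    """Kaplan-Meier estimate of the probability that an issue is still unfixed.

    durations: days from an issue's introduction to its fix, or to the end of
               the observation window if it is still open (right-censored).
    fixed:     parallel list; True if the issue was fixed, False if still open.
    Returns (day, survival probability) pairs for each day on which a fix occurred.
    """
    fixes_at = Counter(t for t, f in zip(durations, fixed) if f)
    leaving_at = Counter(durations)  # fixed and censored issues both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        if fixes_at[t]:
            # Multiply by the fraction of at-risk issues that were NOT fixed at t.
            survival *= 1 - fixes_at[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve

# Hypothetical issue lifetimes in days; 3650 days is roughly ten years.
lifetimes = [30, 90, 90, 365, 400, 3650]
was_fixed = [True, True, False, True, False, False]
for day, share in kaplan_meier(lifetimes, was_fixed):
    print(f"day {day}: {share:.3f} of issues still unfixed")
```

Dropping the open issues, or counting them as fixed on the last observed day, would bias the estimate. Long-lived issues are exactly the ones most likely to still be open, which is why censoring matters here.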
› OSS projects by ASF
› Java
› SonarQube
› Weekly analysis of the commits
› Architectural decisions not known/accessible
› Commit policy
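Weekly analysis of the commits means that one snapshot per week is measured rather than every commit. The slides do not say which commit represents a week. The sketch below assumes one plausible rule, taking the last commit of each ISO calendar week. Both that rule and the (commit_id, timestamp) input format are hypothetical:

```python
from datetime import datetime

def weekly_snapshots(commits):
    """Select the last commit of each ISO calendar week.

    commits: iterable of (commit_id, timestamp) pairs (hypothetical format).
    Returns one commit id per (ISO year, ISO week), in chronological order.
    """
    last_in_week = {}
    for commit_id, when in sorted(commits, key=lambda c: c[1]):
        iso_year, iso_week, _ = when.isocalendar()
        # Later commits of the same week overwrite earlier ones.
        last_in_week[(iso_year, iso_week)] = commit_id
    return [last_in_week[week] for week in sorted(last_in_week)]

history = [
    ("a1", datetime(2018, 1, 1, 9, 0)),    # Monday, ISO week 1 of 2018
    ("b2", datetime(2018, 1, 3, 17, 30)),  # same week, later
    ("c3", datetime(2018, 1, 9, 12, 0)),   # ISO week 2 of 2018
]
print(weekly_snapshots(history))  # prints ['b2', 'c3']
```

Weekly sampling keeps the analysis cost manageable, but TD introduced and removed within the same week is never observed.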
› The analysis of several open-source projects by ASF revealed that the quality of some projects degrades (… phenomenon).
› However, for the majority of Apache projects, normalized Technical Debt (TD/NCLOC) tends to decrease over time.
When normalized TD is decreasing in a project, is it because TD is repaid or because the new code is clean?
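A short derivation shows why the question cannot be answered from TD/NCLOC alone. For simplicity, assume that code is only added and never deleted. Let TD_old and NCLOC_old describe the existing code, TD_new and NCLOC_new the newly added code (both NCLOC values positive), and R ≥ 0 the TD repaid. With no repayment (R = 0), normalized TD still decreases exactly when the new code is cleaner than the existing code:

```latex
\begin{align*}
\text{Normalized TD after the change} &= \frac{\mathrm{TD}_{old} - R + \mathrm{TD}_{new}}{\mathrm{NCLOC}_{old} + \mathrm{NCLOC}_{new}} \\[4pt]
\text{with } R = 0:\quad
\frac{\mathrm{TD}_{old} + \mathrm{TD}_{new}}{\mathrm{NCLOC}_{old} + \mathrm{NCLOC}_{new}} < \frac{\mathrm{TD}_{old}}{\mathrm{NCLOC}_{old}}
&\iff \mathrm{NCLOC}_{old}\,\mathrm{TD}_{new} < \mathrm{TD}_{old}\,\mathrm{NCLOC}_{new} \\
&\iff \frac{\mathrm{TD}_{new}}{\mathrm{NCLOC}_{new}} < \frac{\mathrm{TD}_{old}}{\mathrm{NCLOC}_{old}}
\end{align*}
```

A falling TD/NCLOC is therefore consistent with zero repayment. Separating the two explanations requires measuring R and the TD density of new code directly.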
› There might be limited value in trying to get rid of existing TD.
› One should aim at writing clean, TD-free code.
› If (as Google reports) the existing code base is renewed annually at a rate of, say, 20%, and if the new code is clean, then after 6 to 8 years all code will be essentially TD-free.
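The renewal arithmetic depends on how renewal is modeled, which the slide leaves open. The sketch below compares two hypothetical models at the slide's 20% rate:

- Linear model: each year rewrites a fresh 20% of the original code.
- Geometric model: each year rewrites 20% of the current code, chosen at random, so some already-rewritten code is rewritten again.

```python
def legacy_fraction_linear(rate: float, years: int) -> float:
    """Share of original code left if each year `rate` of the ORIGINAL code
    is rewritten and rewritten code is never touched again (assumption)."""
    return max(0.0, 1 - rate * years)

def legacy_fraction_geometric(rate: float, years: int) -> float:
    """Share of original code left if each year `rate` of the CURRENT code
    is rewritten, chosen uniformly at random (assumption)."""
    return (1 - rate) ** years

for years in (5, 6, 7, 8):
    print(years,
          round(legacy_fraction_linear(0.2, years), 2),
          round(legacy_fraction_geometric(0.2, years), 2))
```

Under the linear model, no original code survives after five years. Under the geometric model, roughly 26%, 21% and 17% of it remains after six, seven and eight years. How quickly the code base becomes essentially TD-free therefore depends on which renewal pattern a project actually follows.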
Analyze newly added source code (per commit) for the purpose of evaluation with respect to the technical debt amount that is introduced from the point of view of software developers in the context of OSS (industrial) development.
› RQ1: For TD violations that are removed, in which exact ways has the removal occurred?
› RQ2: For new code that might be 'clean', exactly how clean is it?
› RQ3: If the new code is not totally clean, what types of TD are newly introduced?
› RQ4: Is the normalized TD of NEW code higher or lower than the normalized TD of existing code?
› RQ5: Is the normalized TD of NEW code lower in projects that improve as they evolve, compared to those that deteriorate?
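RQ4 compares two ratios. The sketch below assumes that per-commit measurements of the TD introduced and the NCLOC added are available. It pools the commits and compares the resulting TD density with that of the existing code. The `CommitDelta` record, its field names and all numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CommitDelta:
    """Hypothetical per-commit measurements (field names are illustrative)."""
    td_added: float   # TD in minutes attributed to lines added by the commit
    ncloc_added: int  # non-comment lines of code added by the commit

def new_code_density(commits):
    """Normalized TD of new code: total TD introduced / total NCLOC added."""
    td = sum(c.td_added for c in commits)
    ncloc = sum(c.ncloc_added for c in commits)
    if ncloc == 0:
        raise ValueError("no new code in the given commits")
    return td / ncloc

def compare_with_existing(commits, existing_td, existing_ncloc):
    """Return (new-code density, existing-code density) for an RQ4-style comparison."""
    return new_code_density(commits), existing_td / existing_ncloc

commits = [CommitDelta(30, 200), CommitDelta(0, 150), CommitDelta(45, 250)]
new, old = compare_with_existing(commits, existing_td=12_000, existing_ncloc=40_000)
print(f"new code: {new:.3f} min/NCLOC, existing code: {old:.3f} min/NCLOC")
```

Summing before dividing weights large commits more heavily than averaging per-commit ratios would. The slides do not settle which weighting suits RQ4.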
Analyze the research state of the art on software smells for the purpose of understanding with respect to: (a) their applicability on the architecture level, (b) the research intensity on them, (c) their detection from tools, and (d) their classification based on their elements, level of granularity, relevance to software evolution, from the point of view of researchers and practitioners in the context of software development.
› … architecture level?
› … architecture level?
› … that have been defined with different names in the literature?
› … research attention?
› … to: (a) the affected architectural element (interface, component, or connector), (b) the portion of the system that is involved (or whether the whole system is involved), and (c) the development history of the project (i.e., whether it is taken into account when identifying the architectural smell).
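The three classification criteria map naturally onto a small record type. The sketch below is one hypothetical encoding: all type and field names are illustrative, and the smell names are placeholders rather than entries from the study:

```python
from dataclasses import dataclass
from enum import Enum

class ArchElement(Enum):
    """Criterion (a): the affected architectural element."""
    INTERFACE = "interface"
    COMPONENT = "component"
    CONNECTOR = "connector"

class Scope(Enum):
    """Criterion (b): the portion of the system involved."""
    PART_OF_SYSTEM = "part of the system"
    WHOLE_SYSTEM = "whole system"

@dataclass(frozen=True)
class SmellClassification:
    """One classified architectural smell (hypothetical field names)."""
    name: str
    element: ArchElement
    scope: Scope
    uses_history: bool  # criterion (c): is the development history used for detection?

def by_element(smells, element):
    """Return the names of smells that affect the given architectural element."""
    return [s.name for s in smells if s.element is element]

# Placeholder entries, not data from the study.
catalog = [
    SmellClassification("Smell A", ArchElement.COMPONENT, Scope.PART_OF_SYSTEM, False),
    SmellClassification("Smell B", ArchElement.CONNECTOR, Scope.WHOLE_SYSTEM, True),
]
print(by_element(catalog, ArchElement.COMPONENT))
```

Using enumerations rather than free-text labels keeps each criterion restricted to the values the classification defines.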