Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods
Davy Landman, Alexander Serebrenik, Jurgen Vinju
Metrics
- Lines of Code (SLOC)
- Cyclomatic Complexity (CC)
- Popular in practice and research
    public double sqrt(int n) {
        // Newton-Raphson method
        double r = n / 2.0;
        while (Math.abs(r - (n / r)) > 0.00001) {
            r = 0.5 * (r + (n / r));
        }
        return r;
    }

SLOC = 7, CC = 2
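The two numbers for the sqrt example can be reproduced with a rough text-based counter. This is only a sketch under stated assumptions: the class `MetricSketch`, its methods, and the set of decision points (if, while, for, case, catch, &&, ||, ?) are choices made for this illustration, not the tooling used in the study. It ignores block comments and string literals, which a real parser would handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetricSketch {
    // Decision points counted by this sketch (an assumption; tools differ on the exact set).
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|while|for|case|catch)\\b|&&|\\|\\||\\?");

    /** SLOC: lines that are neither blank nor pure line comments. */
    static int sloc(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) {
                count++;
            }
        }
        return count;
    }

    /** CC approximated as 1 + the number of decision points found in the text. */
    static int cc(String source) {
        // Drop line comments so keywords inside them are not counted.
        String code = source.replaceAll("//.*", "");
        Matcher matcher = DECISION.matcher(code);
        int decisions = 0;
        while (matcher.find()) {
            decisions++;
        }
        return 1 + decisions;
    }

    // The sqrt method from the slide, one string per source line.
    static final String SQRT_METHOD = String.join("\n",
        "public double sqrt(int n) {",
        "    // Newton-Raphson method",
        "    double r = n / 2.0;",
        "    while (Math.abs(r - (n / r)) > 0.00001) {",
        "        r = 0.5 * (r + (n / r));",
        "    }",
        "    return r;",
        "}");

    public static void main(String[] args) {
        // Prints "SLOC = 7, CC = 2": the comment line is skipped, and the
        // single while loop is the only decision point.
        System.out.println("SLOC = " + sloc(SQRT_METHOD) + ", CC = " + cc(SQRT_METHOD));
    }
}
```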
- M. Shepperd. "A critique of cyclomatic complexity as a software metric." Software Engineering Journal 3.2 (1988)
Citations: 218 total, 90 in the last 5 years
CC redundant?
- Shepperd's critique was based on 8 papers (1979-1987)
- 7 papers followed (1991-2013)
- Fortran, PL/1, Pascal, COBOL, C, C++, and Java
- SLOC & CC correlate linearly
R2 = 0.65 - 0.95
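For a linear fit with one predictor, R2 equals the squared Pearson correlation between the two variables, so the reported values can be computed directly from paired (SLOC, CC) measurements. The sketch below assumes a hypothetical class `R2Sketch` and made-up sample data; it is not taken from any of the cited papers.

```java
class R2Sketch {
    /**
     * R2 of the least-squares line cc = a + b * sloc.
     * For a single predictor this is the squared Pearson correlation.
     * Assumes both arrays have the same length and non-zero variance.
     */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    public static void main(String[] args) {
        // Hypothetical per-method measurements, for illustration only.
        double[] sloc = {3, 5, 8, 12, 20, 35};
        double[] cc = {1, 2, 2, 5, 4, 9};
        System.out.println("R2 = " + rSquared(sloc, cc));
    }
}
```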
Our research
- Identify differences in 15 papers
- Get data
- Reproduce!
We do not conclude that CC is redundant with SLOC
- Our result: R2 = 0.43
- Differences from related work:
  - Aggregation
  - Power transform
- Larger methods correlate even less
- Differing variance
Corpus
[Figure: frequency of SLOC per method and of CC per method, log-log axes]
- 13K Open Source Java Projects (14GB of Java)
- 17M methods in 362M SLOC
- E. Linstead, S. K. Bajracharya, T. C. Ngo, P. Rigor, C. V. Lopes, and P. Baldi. "Sourcerer: mining and searching internet-scale software repositories." Data Mining and Knowledge Discovery 18.2 (2009)
First result
- Correlation (R2) : 0.43
- Lower than other papers: 0.65 - 0.95
- Why?
Other explanations
|                  | Yes | No |
|------------------|-----|----|
| Power transform  | 4   | 12 |
| File level (sum) | 9   | 6  |
Power transform
[Figure: frequency of SLOC per method, plotted on linear axes and on log-log axes]
R2 = 0.43 without the power transform, R2 = 0.70 with it
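One way to picture the power transform is to take log10 of both SLOC and CC before fitting the line. The sketch below assumes a hypothetical class `LogTransformSketch` and artificial data in which CC is exactly the square root of SLOC, chosen only so that the effect of the transform is easy to see. Real data is far noisier, and the transform does not raise R2 for every subset of methods.

```java
class LogTransformSketch {
    /** Squared Pearson correlation, i.e. R2 of a one-predictor linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    /** Element-wise log10; valid here because SLOC and CC are both at least 1. */
    static double[] log10(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.log10(values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Artificial power-law data: cc = sqrt(sloc). Not measured values.
        double[] sloc = {1, 10, 100, 1000};
        double[] cc = new double[sloc.length];
        for (int i = 0; i < sloc.length; i++) {
            cc[i] = Math.sqrt(sloc[i]);
        }
        System.out.println("R2 on raw values:   " + rSquared(sloc, cc));
        System.out.println("R2 on log10 values: " + rSquared(log10(sloc), log10(cc)));
    }
}
```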
Method level
File level
- Example: 1 file, 30 "small" methods.
- File SLOC = 30 * avg(SLOCm) = 30 * 2.5
- File CC = 30 * avg(CCm) = 30 * 2
- Volume factor causes high correlation [1]
[1] K. El Emam, S. Benlarbi, N. Goel, S.N. Rai. "The confounding effect of class size on the validity of object-oriented metrics." IEEE Transactions on Software Engineering 27.7 (2001)
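The volume factor in the example above can be made concrete with a small simulation. The sketch assumes a hypothetical class `AggregationSketch` and a deliberately extreme synthetic corpus in which SLOC and CC of each method are drawn independently, so any file-level correlation comes purely from the number of methods per file. The distributions and the seed are arbitrary choices for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class AggregationSketch {
    /** Squared Pearson correlation, i.e. R2 of a one-predictor linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    static double[] toArray(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Returns {method-level R2, file-level R2} for a synthetic corpus. */
    static double[] simulate(long seed, int files) {
        Random rnd = new Random(seed);
        List<Double> methodSloc = new ArrayList<>();
        List<Double> methodCc = new ArrayList<>();
        List<Double> fileSloc = new ArrayList<>();
        List<Double> fileCc = new ArrayList<>();
        for (int f = 0; f < files; f++) {
            int methods = 1 + rnd.nextInt(100); // 1..100 methods per file
            double slocSum = 0, ccSum = 0;
            for (int m = 0; m < methods; m++) {
                double sloc = 1 + rnd.nextInt(10); // 1..10, independent of cc
                double cc = 1 + rnd.nextInt(5);    // 1..5, independent of sloc
                methodSloc.add(sloc);
                methodCc.add(cc);
                slocSum += sloc;
                ccSum += cc;
            }
            // File-level metrics are sums over the file's methods.
            fileSloc.add(slocSum);
            fileCc.add(ccSum);
        }
        return new double[] {
            rSquared(toArray(methodSloc), toArray(methodCc)),
            rSquared(toArray(fileSloc), toArray(fileCc))
        };
    }

    public static void main(String[] args) {
        double[] r2 = simulate(42L, 200);
        System.out.println("method-level R2: " + r2[0]);
        System.out.println("file-level R2:   " + r2[1]);
    }
}
```

Because both file sums grow with the number of methods, the file-level R2 comes out high even though the method-level values are unrelated by construction.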
R2 = 0.87 R2 = 0.65
Aggregation causing it?
File level
[Figure: frequency of SLOC per method (log-log axes), with the tail marked at the largest 50%, 25%, 10%, 1%, and 0.1% of methods]
- Israel Herraiz and Ahmed E. Hassan. "Beyond lines of code: Do we need more complexity metrics?" Making Software: What Really Works, and Why We Believe It (2010)
Tail
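| Tail | min. SLOC | # Methods | R2   | "power" R2 |
|------|-----------|-----------|------|------------|
| 100% | 1         | 17.8M     | 0.43 | 0.70       |
| 50%  | 3         | 8.9M      | 0.45 | 0.62       |
| 25%  | 9         | 4.5M      | 0.42 | 0.44       |
| 10%  | 20        | 1.8M      | 0.38 | 0.27       |
| 1%   | 77        | 179K      | 0.29 | 0.05       |
| 0.1% | 230       | 18K       | 0.21 | 0.00       |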
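Each row above restricts the corpus to methods of at least a minimum SLOC and recomputes R2 on what remains. A minimal sketch of that filtering step, assuming a hypothetical class `TailSketch` and a handful of invented data points, could look like this:

```java
import java.util.stream.IntStream;

class TailSketch {
    /** Squared Pearson correlation, i.e. R2 of a one-predictor linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    /** R2 computed only over methods with at least minSloc lines. */
    static double rSquaredTail(double[] sloc, double[] cc, double minSloc) {
        int[] keep = IntStream.range(0, sloc.length)
                              .filter(i -> sloc[i] >= minSloc)
                              .toArray();
        double[] x = new double[keep.length];
        double[] y = new double[keep.length];
        for (int k = 0; k < keep.length; k++) {
            x[k] = sloc[keep[k]];
            y[k] = cc[keep[k]];
        }
        return rSquared(x, y);
    }

    public static void main(String[] args) {
        // Invented data points, for illustration only.
        double[] sloc = {1, 2, 3, 10, 20, 30};
        double[] cc = {1, 2, 3, 5, 1, 9};
        System.out.println("all methods: R2 = " + rSquaredTail(sloc, cc, 1));
        // Prints 0.25 for the tail: only (10,5), (20,1), (30,9) remain.
        System.out.println("SLOC >= 10:  R2 = " + rSquaredTail(sloc, cc, 10));
    }
}
```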
Statistics
Large Methods
Variance
- R2 = 0.43 means 57% of the variance is not explained
- Unexplained part (residual) = actual CC - predicted CC
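The difference between actual and predicted CC can be computed per method once a line has been fitted, and comparing its spread for small and large methods is one simple way to see differing variance. The sketch below assumes a hypothetical class `ResidualSketch` and invented data in which large methods scatter more widely; it is an illustration of the idea, not the analysis from the study.

```java
class ResidualSketch {
    /** Least-squares fit of y = a + b * x; returns {a, b}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double b = sxy / sxx;
        return new double[] {meanY - b * meanX, b};
    }

    /** Residual i = actual y[i] - predicted (a + b * x[i]). */
    static double[] residuals(double[] x, double[] y) {
        double[] ab = fit(x, y);
        double[] res = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            res[i] = y[i] - (ab[0] + ab[1] * x[i]);
        }
        return res;
    }

    /** Mean squared residual over the points with lo <= x < hi. */
    static double meanSquared(double[] x, double[] res, double lo, double hi) {
        double sum = 0;
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] >= lo && x[i] < hi) {
                sum += res[i] * res[i];
                count++;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Invented data: CC of large methods scatters more than CC of small ones.
        double[] sloc = {1, 2, 3, 4, 50, 60, 70, 80};
        double[] cc = {1, 1, 2, 2, 3, 20, 5, 30};
        double[] res = residuals(sloc, cc);
        System.out.println("spread for SLOC < 10:  " + meanSquared(sloc, res, 0, 10));
        System.out.println("spread for SLOC >= 10: " + meanSquared(sloc, res, 10, 1000));
    }
}
```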
[Figure panels: Method level, log10(Method level), File level, log10(File level)]
Differing variance complicates the interpretation of linear models
We do not conclude that CC is redundant with SLOC
- Our result: R2 = 0.43
- Differences from related work:
  - Aggregation
  - Power transform
- Larger methods correlate even less
- Differing variance