SLIDE 1
Riaan Cornelius
Using forensic techniques for targeted refactoring
Crafting Code
SLIDE 2
Who am I
> More than a decade of software dev experience
> Mobile app developer by day
> Purveyor of strange topics by night
> I’ve dabbled in AI, computer vision, robotics and even cooking
> Please remember to rate my talk: http://www.devconf.co.za/rate
SLIDE 3
Why do we refactor?
> As a developer, what is your job?
SLIDE 4
Why do we refactor?
SLIDE 5
Why do we refactor?
SLIDE 6
Why do we refactor?
SLIDE 7
Why do we refactor?
SLIDE 8
Why do we refactor?
> Maintenance is expensive
SLIDE 9
The enemy of change
> Complexity
> If our job is to understand code, how do we make that job easier?
SLIDE 10
Some (potentially) useful tools
> Static analysis
> Complexity metrics
> Code reviews
> Tests
SLIDE 11
Tools I used
> Git (specifically git log)
> Code Maat
> Python
> D3.js (JavaScript library)
SLIDE 12
Forget the tools
> It’s not about the tools, but rather the techniques
> These tools simplify some parsing, processing or visualisation
> You can write your own scripts for any of these functions
SLIDE 13
Problems of scale
> In large systems, how do you prioritise improvements?
SLIDE 14
The problem with complexity metrics
> Complexity is only a problem if you need to deal with it
SLIDE 15
Offender profiling
> You probably know something about offender profiling.
> Hollywood loves it:
- The Silence of the Lambs
- Numb3rs
- Criminal Minds
- NCIS
- Many more…
SLIDE 16
Offender profiling
> There is one serious limitation: it only works in Hollywood
SLIDE 17
Geographic profiling
> Based on statistics and psychology.
> Same principle as a police officer sticking pins in a map
SLIDE 18
Geographic profiling
SLIDE 19
Applying geographical profiling to code
> What if a hotspot analysis could narrow down areas of bad code?
SLIDE 20
Exploring the geography of code
SLIDE 21
Add a spatial component
> Hopefully you all use a VCS.
> We need to focus on areas with high developer activity
SLIDE 22
Add a spatial component
> git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat
> maat.bat -l git.log -c git -a revisions > metric_data.csv
SLIDE 23
Add a spatial component
SLIDE 24
Combine complexity and effort
SLIDE 25
Profiling your codebase
> Choose a timespan for your analysis
> Get frequency data
> Add complexity data
> Merge complexity and effort
> Visualise this data
SLIDE 26
Profiling your codebase
> We’ll look at the Hibernate ORM
> git clone https://github.com/hibernate/hibernate-orm.git
SLIDE 27
Profiling your codebase
> Choosing a timeframe
> Don’t look at the whole life of the project
> The timeframe you use depends on your development methodology
- Between releases
- Over iterations
- Around significant events (reorganisation of code or teams)
SLIDE 28
Profiling your codebase
> Generate a log:
> git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat --before=2013-09-05 --after=2012-01-01 > hib_evo.log
SLIDE 29
Profiling your codebase
> A summary of the changes shows some interesting things:
prompt> maat -l hib_evo.log -c git -a summary
statistic,value
number-of-commits,1346
number-of-entities,10193
number-of-entities-changed,18258
number-of-authors,89
SLIDE 30
Profiling your codebase
> Analyzing change frequencies:
> maat -l hib_evo.log -c git -a revisions > hib_freqs.csv
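A change-frequency count can also be computed with a short script. The sketch below assumes the log was produced with the `--numstat` format used on these slides, where each changed file appears as an `added<TAB>deleted<TAB>path` row. The function name `count_revisions` is hypothetical, this is not Code Maat's implementation, and renamed files are not followed.

```python
from collections import Counter


def count_revisions(log_text):
    """Count how many commits touched each file in a `git log --numstat` dump.

    Assumption: commit header lines start with "[" (from the '[%h] ...' format)
    and numstat rows have exactly three tab-separated fields.
    """
    revisions = Counter()
    for line in log_text.splitlines():
        if line.startswith("["):
            continue  # commit header, not a file row
        parts = line.split("\t")
        if len(parts) == 3:
            # A file is listed once per commit, so one row equals one revision.
            revisions[parts[2]] += 1
    # Most frequently changed files first.
    return revisions.most_common()


if __name__ == "__main__":
    with open("hib_evo.log", encoding="utf-8", errors="replace") as handle:
        for path, revs in count_revisions(handle.read())[:10]:
            print(f"{path},{revs}")
```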
SLIDE 31
Profiling your codebase
> Calculate complexity
> Complexity by lines of code?
> Bad metric, but no worse than others…
> cloc ./ --by-file --csv --quiet --report-file=hib_lines.csv
SLIDE 32
Profiling your codebase
> Combine complexity and effort:
> python scripts/merge_comp_freqs.py hib_freqs.csv hib_lines.csv
module,revisions,code
build.gradle,79,402
hibernate-core/.../persister/entity/AbstractEntityPersister.java,44,3983
hibernate-core/.../cfg/Configuration.java,40,2673
hibernate-core/.../internal/SessionImpl.java,39,2097
hibernate-core/.../internal/SessionFactoryImpl.java,34,1384
…
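The merge step is a join on file path. The sketch below is not the actual merge_comp_freqs.py. It assumes the frequency CSV has `entity` and `n-revs` columns and the cloc CSV has `filename` and `code` columns, and that cloc prefixes paths with `./`. The function name `merge_effort_and_size` is hypothetical.

```python
import csv
import io


def _normalise(path):
    """Strip a leading './' so cloc paths line up with git paths (assumption)."""
    return path[2:] if path.startswith("./") else path


def merge_effort_and_size(freqs_file, lines_file):
    """Join revisions per file with lines of code per file.

    Both arguments are open text files (or any iterable of CSV lines).
    Returns (module, revisions, code) tuples, most revised first.
    """
    loc_by_file = {}
    for row in csv.DictReader(lines_file):
        name = row.get("filename") or ""
        if name:  # skips a summary row that has no filename
            loc_by_file[_normalise(name)] = int(row["code"])

    merged = []
    for row in csv.DictReader(freqs_file):
        path = _normalise(row["entity"])
        # Files deleted since the log was taken have no size; leave them out.
        if path in loc_by_file:
            merged.append((path, int(row["n-revs"]), loc_by_file[path]))

    merged.sort(key=lambda item: (-item[1], -item[2]))
    return merged


if __name__ == "__main__":
    with open("hib_freqs.csv", newline="") as freqs, open("hib_lines.csv", newline="") as lines:
        print("module,revisions,code")
        for module, revisions, code in merge_effort_and_size(freqs, lines):
            print(f"{module},{revisions},{code}")
```

Files that appear in the log but no longer exist on disk are dropped, since a hotspot that has already been deleted needs no refactoring.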
SLIDE 33
Profiling your codebase
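> Now we can finally get to the fun part: Visualisation
> I’m using a sample D3.js circle-packing algorithm
> Due to security restrictions in modern browsers, the page must be served by a local web server rather than opened from disk:
> python -m SimpleHTTPServer 8888 (Python 2; on Python 3: python -m http.server 8888)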
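A circle-packing layout needs nested data, so the flat module list has to be turned into a tree first. The sketch below shows one way to do that. The field names `name`, `children`, `size` and `weight` are assumptions about what the visualisation script reads, and `to_hierarchy` is a hypothetical name.

```python
import json


def to_hierarchy(rows):
    """Turn (module_path, revisions, lines_of_code) rows into a nested dict.

    Directories become nodes with "children"; files become leaves with
    "size" (lines of code) and "weight" (revisions scaled to the range 0..1).
    """
    rows = list(rows)
    max_revs = max((revs for _, revs, _ in rows), default=1) or 1
    root = {"name": "root", "children": []}
    for path, revs, loc in rows:
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            # Reuse the directory node if an earlier row already created it.
            child = next(
                (c for c in node["children"] if c["name"] == part and "children" in c),
                None,
            )
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["children"].append(
            {"name": parts[-1], "size": loc, "weight": revs / max_revs}
        )
    return root


if __name__ == "__main__":
    demo = [("build.gradle", 79, 402), ("core/cfg/Configuration.java", 40, 2673)]
    print(json.dumps(to_hierarchy(demo), indent=2))
```

With this shape, circle area can follow `size` and colour intensity can follow `weight`, so large and frequently changed files stand out.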
SLIDE 34
Profiling your codebase
SLIDE 35
Profiling your codebase
SLIDE 36
Measuring complexity
> Is there a simple option that is better than lines of code?
SLIDE 37
Measuring complexity
SLIDE 38
Measuring complexity
> python scripts/complexity_analysis.py hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java
n, total, mean, sd, max
3335, 8072, 2.42, 1.63, 14
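Indentation can stand in for complexity because deeper nesting usually means more branching. The sketch below computes the same five columns under stated assumptions: one tab or four spaces counts as one indentation level, blank lines are ignored, and `sd` is the population standard deviation. It is not the actual complexity_analysis.py, and `indentation_stats` is a hypothetical name.

```python
import re
import statistics


def indentation_stats(source, spaces_per_level=4):
    """Summarise the indentation depth of every non-blank line in `source`."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        leading = re.match(r"[ \t]*", line).group(0)
        # Assumption: a tab is one level, and so are `spaces_per_level` spaces.
        levels.append(leading.count("\t") + leading.count(" ") // spaces_per_level)

    if not levels:
        return {"n": 0, "total": 0, "mean": 0.0, "sd": 0.0, "max": 0}
    return {
        "n": len(levels),
        "total": sum(levels),
        "mean": round(statistics.mean(levels), 2),
        "sd": round(statistics.pstdev(levels), 2),
        "max": max(levels),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        stats = indentation_stats(handle.read())
    print("n, total, mean, sd, max")
    print(", ".join(str(stats[key]) for key in ("n", "total", "mean", "sd", "max")))
```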
SLIDE 39
Measuring complexity
> You’ve already seen how to analyze a single revision. Now we want to:
- 1. Take a range of revisions for a specific module.
- 2. Calculate the indentation complexity of the module as it occurred in each revision.
- 3. Output the results revision by revision for further analysis.
SLIDE 40
Measuring complexity
> python scripts/git_complexity_trend.py --start ccc087b --end 46c962e --file hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java
rev, n, total, mean, sd
e75b8a7, 3080, 7610, 2.47, 1.76
23a6280, 3092, 7649, 2.47, 1.76
8991100, 3100, 7658, 2.47, 1.76
8373871, 3101, 7658, 2.47, 1.76
…
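A trend like this can be sketched by reading the file as it existed at each revision and measuring it. The code below is an illustration under assumptions, not the actual git_complexity_trend.py: it uses `git log` and `git show` through `subprocess`, counts one tab or four spaces as one level, and uses the population standard deviation. All function names are hypothetical.

```python
import statistics
import subprocess


def list_revisions(start, end, path):
    """Commits after `start` up to `end` that touched `path`, oldest first."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--pretty=format:%h", f"{start}..{end}", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def read_file_at(rev, path):
    """Return the content of `path` as it was in revision `rev`."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def indentation_levels(source):
    """Indentation depth of each non-blank line (tab or 4 spaces = 1 level)."""
    levels = []
    for line in source.splitlines():
        if line.strip():
            leading = line[: len(line) - len(line.lstrip(" \t"))]
            levels.append(leading.count("\t") + leading.count(" ") // 4)
    return levels


def complexity_trend(revisions, path, reader=read_file_at):
    """Return (rev, n, total, mean, sd) for each revision, in the order given."""
    rows = []
    for rev in revisions:
        levels = indentation_levels(reader(rev, path))
        if not levels:
            rows.append((rev, 0, 0, 0.0, 0.0))
            continue
        rows.append((
            rev,
            len(levels),
            sum(levels),
            round(statistics.mean(levels), 2),
            round(statistics.pstdev(levels), 2),
        ))
    return rows


if __name__ == "__main__":
    target = "hibernate-core/src/main/java/org/hibernate/cfg/Configuration.java"
    print("rev, n, total, mean, sd")
    for row in complexity_trend(list_revisions("ccc087b", "46c962e", target), target):
        print(", ".join(str(value) for value in row))
```

Passing the reader in as a parameter keeps the measurement separate from git, so the trend logic can be checked against in-memory text without a repository.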
SLIDE 41
Visualising complexity trends
SLIDE 42
Visualising complexity trends
SLIDE 43
Visualising complexity trends
SLIDE 44
Going further
SLIDE 45
Resources
> http://riaan.me/dc16
> Twitter: @riaancornelius
> Please remember to rate my talk: http://www.devconf.co.za/rate
SLIDE 46
/* THANK YOU */
Riaan Cornelius
Entelect Software
Riaan.Cornelius@Entelect.co.za
084 755 1866
http://www.devconf.co.za/