Code Analysis via Version Control History Justin Mclean Class - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Code Analysis via Version Control History

Justin Mclean Class Software Email: justin@classsoftware.com Twitter: @justinmclean Blog: http://blog.classsoftware.com

slide-2
SLIDE 2

Who am I?

  • Programming for 25 years
  • Developing and creating web applications for 15 years
  • Apache Flex PMC, Incubator PMC, Apache member
  • Release manager for Apache Flex, FlexUnit, Tour De Flex, Squiggly
  • Run IoT meetup in Sydney Australia
slide-3
SLIDE 3

In the last 40 years we have written billions of lines of code that will keep programmers employed for trillions of man hours in the next few thousand years to clean up this mess we’ve made.

Joe Armstrong, The Mess We’re In

slide-4
SLIDE 4

Your Code as a Crime Scene

slide-5
SLIDE 5

You Write Code?

  • 40-80% of all code is maintenance
  • This is difficult and expensive
  • More so with agile methodologies and on successful systems

  • How to make this effective?
  • Primary goal is to understand existing code
  • All code is legacy code
slide-6
SLIDE 6

Detecting Issues With Code

  • Code reviews
  • Pair programming
  • Unit Tests
  • Continuous Integration
  • Static code analysis
  • Complexity metrics
slide-7
SLIDE 7

Scalability

  • What about large code bases?
  • How do you decide what to work on?
  • How to work out where the bugs are?
  • What bugs are important?
slide-8
SLIDE 8

Code Visualisation

  • Can you visually represent code to get a better understanding of what’s going on?
  • Code City
    http://www.inf.usi.ch/phd/wettel/codecity-wof.html
  • Make a city of each class arranged by packages in city blocks
  • Height is number of methods, colour coded by number of lines and area by number of terms

slide-9
SLIDE 9

Code City

slide-10
SLIDE 10

Code City Limitations

  • Supports only a few common languages
  • Shows hotspots (large buildings) but no real indication of where to possibly spend effort

  • Existing hotspots may be stable
  • Missing an important dimension
slide-11
SLIDE 11

Version Control History

  • VCS contains a lot of useful information - that we mostly ignore. Information like:
  • Who changed what lines when
  • How much and how often things change
  • Can we aggregate this information to tell us something useful?

slide-12
SLIDE 12

Why Code Changes

  • Fixing bugs
  • Refactoring poor design
  • Poor understanding of the problem
  • Change frequency == proxy for effort
  • Code that changed in the past is likely to change again in the future

slide-13
SLIDE 13

Effort + Complexity

  • Change frequency / effort is not the whole story
  • Config files change frequently
  • Overlaps in effort and complexity give possible hotspots
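
One way to look for that overlap is to multiply a normalised change count by a normalised size. The sketch below is only an illustration: the scoring rule, the function name `rank_hotspots` and the `min_revisions` parameter are assumptions, not something the talk prescribes. The sample rows use the `module,revisions,code` layout and a few values from the Apache Flex SDK revisions slide.

```python
import csv
import io


def rank_hotspots(csv_text, min_revisions=1):
    """Rank modules by (revisions / max revisions) * (LOC / max LOC).

    The scoring rule is an illustrative assumption, not the talk's method.
    """
    rows = [
        (row["module"], int(row["revisions"]), int(row["code"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    rows = [r for r in rows if r[1] >= min_revisions]
    if not rows:
        return []
    # "or 1" avoids division by zero when every value is 0.
    max_revs = max(r[1] for r in rows) or 1
    max_loc = max(r[2] for r in rows) or 1
    scored = [
        (module, round((revs / max_revs) * (loc / max_loc), 3))
        for module, revs, loc in rows
    ]
    # Highest score first: changed often AND large.
    return sorted(scored, key=lambda item: item[1], reverse=True)


sample = """module,revisions,code
build.xml,139,1481
mustella/jenkins.sh,48,167
mustella/build.xml,44,1988
"""
hotspots = rank_hotspots(sample)
for module, score in hotspots:
    print(f"{module}: {score}")
```

A small script that changes often (such as `mustella/jenkins.sh`) drops to the bottom, while a file that is both large and frequently changed rises to the top.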

slide-14
SLIDE 14

Code Hotspots

  • Hotspots are code where changes will give the most benefit
  • Frequent changes to complex code indicate poor quality code
  • Lots of study in this area and surprisingly simple measures (change frequency) perform just as well as more complex measures

slide-15
SLIDE 15

What use are Hotspots?

  • Take cognitive biases out of the equation
  • Where your bugs are likely to be
  • Prime areas for code reviews
  • Prime areas for refactoring
  • Targets for extra testing
slide-16
SLIDE 16

Code Maat

  • Performs various analyses on version control history

  • Produces simple csv text files
  • Supports parsing git, svn, hg VCS
  • Open source (GPL)
  • https://github.com/adamtornhill/code-maat
slide-17
SLIDE 17

Producing HotSpot Data

  • Clone git repo
  • Work out time period
  • Generate git log summary data
  • Look at summary
  • Generate change frequencies
  • Generate code complexity metrics
  • Combine change frequency + complexity
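
The "generate change frequencies" step can be sketched without any special tooling. The example below assumes the log was produced with something along the lines of `git log --numstat --date=short --pretty=format:'[%h] %an %ad %s'` (optionally limited with `--after=<date>` for the chosen time period) and simply counts how many commits touched each path. It is not Code Maat's implementation; the function name and the sample log, including its hashes and author names, are made up for illustration.

```python
from collections import Counter


def change_frequencies(numstat_log):
    """Count how many commits touched each file in `git log --numstat` output."""
    revisions = Counter()
    for line in numstat_log.splitlines():
        parts = line.split("\t")
        # numstat lines look like "<added>\t<deleted>\t<path>";
        # commit header lines and blank lines do not match this shape.
        if len(parts) == 3 and parts[2]:
            revisions[parts[2]] += 1
    # Most frequently changed files first.
    return revisions.most_common()


# Hypothetical log excerpt (hashes, authors and messages are invented).
sample_log = (
    "[abc1234] Alice 2015-03-01 Fix grid\n"
    "10\t2\tsrc/DataGrid.as\n"
    "1\t1\tbuild.xml\n"
    "\n"
    "[def5678] Bob 2015-02-27 Update build\n"
    "3\t0\tbuild.xml\n"
)
frequencies = change_frequencies(sample_log)
for path, count in frequencies:
    print(f"{path},{count}")
```

Renamed files appear in numstat output under a combined "old => new" path, so a real analysis would probably need to handle those separately.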
slide-18
SLIDE 18

Apache Flex Project

  • Large code base
    Number of files = 25,000, LOC = 5 million, or about 20 million including tests
  • Mix of many file types and languages: MXML, ActionScript, Java, XML files
  • Two distinct phases - before and after donation

slide-19
SLIDE 19

statistic,value
number-of-commits,30090
number-of-entities,4672
number-of-entities-changed,34916
number-of-authors,18

Adobe Flex SDK Summary

slide-20
SLIDE 20

statistic,value
number-of-commits,2911
number-of-entities,51505
number-of-entities-changed,81012
number-of-authors,55

Apache Flex SDK Summary

slide-21
SLIDE 21

Complexity

  • LOC is a terrible complexity measure, but turns out it’s just as bad as most others
  • Fast and simple
  • Language agnostic
  • CLOC
  • https://github.com/AlDanial/cloc
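
To show how little is needed for a rough LOC figure, here is a minimal sketch that counts non-blank lines. It is a simplification: CLOC itself separates blank, comment and code lines per language, which this sketch does not attempt. The function name is hypothetical.

```python
def count_code_lines(source_text):
    """Return the number of non-blank lines; comments are NOT excluded."""
    return sum(1 for line in source_text.splitlines() if line.strip())


sample_source = (
    "package mx.controls\n"
    "\n"
    "// a comment still counts in this sketch\n"
    "public class DataGrid {}\n"
    "   \n"
)
loc = count_code_lines(sample_source)
print(loc)
```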
slide-22
SLIDE 22

module,revisions,code
build.xml,139,1481
frameworks/build.xml,57,423
mustella/jenkins.sh,48,167
mustella/build.xml,44,1988
ide/checkAllPlayerGlobals.sh,35,85
frameworks/projects/mobiletheme/defaults.css,34,1568
modules/downloads.xml,34,384
installer.xml,29,821
frameworks/projects/spark/build.xml,28,231
frameworks/projects/textLayout/build.xml,28,218
frameworks/projects/framework/src/mx/collections/ListCollectionView.as,28,1442

Apache Flex SDK Revisions

slide-23
SLIDE 23

Hot Spot Confirmation

  • Build files
  • Mobile theme
  • ListCollections
  • DataGrid and AdvancedDataGrid
  • DateField
slide-24
SLIDE 24

Hot Spot Limitations

  • Just numbers - may need to normalise
  • Time period may be hard to get right (hotspots move)

  • Impacted by individual commit styles
  • May have false positives
  • Just a guide - but still a very useful one
slide-25
SLIDE 25

Visualise Hot Spots

  • Hard to understand a large amount of information
  • Classes are nested in packages and we have complexity and change frequency / effort
  • Circle packing works well. Circle size is LOC, colour by change frequency.

  • D3.js easy to use / can display easily
  • Need CSV -> JSON conversion
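
The CSV -> JSON step might look like the sketch below. It assumes the `module,revisions,code` layout from the revisions slide and produces a nested `name` / `children` tree, the shape D3's hierarchy layouts read by default. The leaf field names `size` and `revisions` and the function name are assumptions for illustration, not the format used in the talk.

```python
import csv
import io
import json


def csv_to_hierarchy(csv_text, root_name="root"):
    """Turn flat module paths into a nested tree for a circle-packing chart."""
    root = {"name": root_name, "children": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = root
        parts = row["module"].split("/")
        # Walk (or create) one directory node per path segment.
        for directory in parts[:-1]:
            child = next(
                (c for c in node["children"]
                 if c["name"] == directory and "children" in c),
                None,
            )
            if child is None:
                child = {"name": directory, "children": []}
                node["children"].append(child)
            node = child
        # The file becomes a leaf: size drives circle area, revisions the colour.
        node["children"].append({
            "name": parts[-1],
            "size": int(row["code"]),
            "revisions": int(row["revisions"]),
        })
    return root


sample = """module,revisions,code
build.xml,139,1481
frameworks/build.xml,57,423
mustella/jenkins.sh,48,167
mustella/build.xml,44,1988
"""
tree = csv_to_hierarchy(sample)
print(json.dumps(tree, indent=2))
```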
slide-26
SLIDE 26

Apache Flex SDK

slide-27
SLIDE 27

Apache Flex SDK

slide-28
SLIDE 28

Apache Flex SDK

slide-29
SLIDE 29

Apache Flex SDK

slide-30
SLIDE 30

Hotspot Analysis

  • Hotspots are a small proportion of all code
  • Configuration files vs complex application logic
  • Can have false positives - need to confirm
slide-31
SLIDE 31

Hotspot Analysis

  • Experimental area
  • 3rd party modules
  • Compiler
  • Data grids
  • Support classes
slide-32
SLIDE 32

Complexity (again)

  • LOC OK but is there something better?
  • Whitespace indentation!
  • Easy to calculate / language independent
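
A minimal sketch of the whitespace measure follows. It assumes that one level of indentation is four spaces or one tab, that blank lines are ignored, and that `sd` is the population standard deviation; none of these choices are stated in the slides, and the function name is hypothetical. It reports the `n,total,mean,sd,max` columns.

```python
import statistics


def indentation_stats(source_text, spaces_per_level=4):
    """Summarise logical indentation depth over the non-blank lines."""
    levels = []
    for line in source_text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        leading = line[:len(line) - len(line.lstrip())]
        # Assumption: a tab counts as one full indentation level.
        width = leading.count("\t") * spaces_per_level + leading.count(" ")
        levels.append(width // spaces_per_level)
    if not levels:
        return {"n": 0, "total": 0, "mean": 0.0, "sd": 0.0, "max": 0}
    return {
        "n": len(levels),
        "total": sum(levels),
        "mean": round(statistics.mean(levels), 2),
        "sd": round(statistics.pstdev(levels), 2),
        "max": max(levels),
    }


sample_source = (
    "def f(x):\n"
    "    if x:\n"
    "        return 1\n"
    "\n"
    "    return 0\n"
)
stats = indentation_stats(sample_source)
print(stats)
```

The same function works on any language that is indented consistently, which is what makes the measure language independent.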
slide-33
SLIDE 33

DataGrid.as complexity

  • CLOC shows 50/50 split between code and comments with 2800 lines of actual code.
  • May be a good idea to remove comments?
  • DataGrid.as whitespace

n,total,mean,sd,max
5860,9244,1.58,1.21,13

  • Mean is low, sd is low, but max is way too high

  • Real hotspot
slide-34
SLIDE 34

rev,n,total,mean,sd
f52eb16,5608,8816,1.57,1.23
0c4290c,5609,8816,1.57,1.23
63580a8,5644,8856,1.57,1.23
fa2108b,5644,8856,1.57,1.23
abc381b,5644,8856,1.57,1.23
774cdd7,5791,9061,1.56,1.22
5e6e5c3,5791,9061,1.56,1.22
4388da8,5791,9061,1.56,1.22
ec1ac28,5810,9090,1.56,1.22
22b68de,5839,9124,1.56,1.22
b1d0359,5855,9164,1.57,1.22
1bef097,5851,9158,1.57,1.22
3e752d9,5854,9165,1.57,1.22
6c53962,5857,9172,1.57,1.22
c47f9f9,5855,9169,1.57,1.22
bb600fd,5855,9169,1.57,1.22
71f8757,5853,9230,1.58,1.21
3a1769b,5860,9247,1.58,1.21
8767c20,5860,9244,1.58,1.21

Complexity Trend

slide-35
SLIDE 35

Temporal Coupling

  • Files that need to change at the same time
  • Causes:
  • Copy paste duplicated code
  • Inadequate encapsulation
  • Anti-pattern sometimes referred to as shotgun surgery
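
Files that change together can be found by counting how often each pair appears in the same commit. The sketch below assumes one common definition of coupling degree, shared commits divided by the average number of revisions of the two files, as a percentage. That formula, the function name and the `min_shared` filter are assumptions for illustration and are not confirmed by the slides. The commit list is invented.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits, min_shared=1):
    """Return (file_a, file_b, degree_percent, average_revs), strongest first."""
    revisions = Counter()
    shared = Counter()
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)                # one revision per file per commit
        shared.update(combinations(unique, 2))  # every pair changed together
    results = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue  # filter out noisy one-off pairings
        average_revs = (revisions[a] + revisions[b]) / 2
        degree = round(100 * count / average_revs)
        results.append((a, b, degree, round(average_revs, 1)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical commits, each given as the list of files it touched.
commits = [
    ["AddItems.as", "Group.as"],
    ["AddItems.as", "Group.as", "Container.as"],
    ["Container.as"],
    ["AddItems.as", "Group.as"],
]
coupled = temporal_coupling(commits)
for row in coupled:
    print(",".join(str(value) for value in row))
```

Raising `min_shared` is one simple way to filter out pairs that only happened to change together once.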

slide-36
SLIDE 36

Detect Temporal Coupling

  • Results are a bit noisy - may need to filter

frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/spark/src/spark/components/Group.as,92,7
frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/mx/src/mx/core/Container.as,83,6
frameworks/projects/mx/src/mx/core/Container.as,frameworks/projects/spark/src/spark/components/SkinnableContainer.as,83,6
frameworks/projects/framework/src/mx/states/AddItems.as,frameworks/projects/spark/src/spark/components/SkinnableContainer.as,83,6
frameworks/projects/mx/src/mx/core/Container.as,frameworks/projects/spark/src/spark/components/Group.as,76,7

slide-37
SLIDE 37

Just Words?

slide-38
SLIDE 38

Knowledge Map

  • Generate ownership of files
  • Multiple owners per file imply more potential bugs
  • Knowledge maps - who knows most about which files
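
Ownership can be approximated from the added-line counts in the version control log. The sketch below assumes the "owner" of a file is the author who added the most lines to it, which is one possible definition rather than the one used in the talk. The function name, the input tuple shape and the author names are hypothetical.

```python
from collections import defaultdict


def main_authors(changes):
    """Map each path to (main author, share of added lines, number of authors).

    `changes` is an iterable of (author, path, lines_added) tuples.
    """
    added = defaultdict(lambda: defaultdict(int))
    for author, path, lines_added in changes:
        added[path][author] += lines_added
    owners = {}
    for path, by_author in added.items():
        total = sum(by_author.values())
        author, lines = max(by_author.items(), key=lambda item: item[1])
        share = round(lines / total, 2) if total else 0.0
        owners[path] = (author, share, len(by_author))
    return owners


# Hypothetical contributions.
changes = [
    ("alice", "DataGrid.as", 120),
    ("bob", "DataGrid.as", 30),
    ("alice", "build.xml", 5),
    ("bob", "build.xml", 15),
    ("carol", "build.xml", 5),
]
owners = main_authors(changes)
for path, (author, share, authors) in owners.items():
    print(f"{path},{author},{share},{authors}")
```

The third value, the number of distinct authors per file, is the figure the "multiple owners" bullet is concerned with.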

slide-39
SLIDE 39

Apache Flex SDK

slide-40
SLIDE 40

Apache Flex SDK

slide-41
SLIDE 41

What we have learnt

  • Lots of useful information in your version control history waiting to be found
  • Technique scales easily to (very) large code bases
  • Keep data formats simple
  • Simple measures of effort and complexity are often as good as complex ones
  • Can find areas in need of attention in your code base

slide-42
SLIDE 42

Links

  • Code as a crime scene
    https://pragprog.com/book/atcrime/your-code-as-a-crime-scene
  • Code City
    http://www.inf.usi.ch/phd/wettel/codecity.html
  • CLOC
    https://github.com/AlDanial/cloc
  • Code Maat
    https://github.com/adamtornhill/code-maat

slide-43
SLIDE 43

Ask now, see me after the session, follow me on Twitter @justinmclean or email me at justin@classsoftware.com.

Slides can be found at the conference site.

Questions?