We need more coverage, stat! Classroom experience with the Software ICU


We need more coverage, stat! Classroom experience with the Software ICU. Philip Johnson, Shaoxuan Zhang, University of Hawaii. Presentation by Sandro Heinzelmann, Software Engineering Seminar 2010, April 27, 2010.


SLIDE 1

We need more coverage, stat! Classroom experience with the Software ICU

Philip Johnson, Shaoxuan Zhang University of Hawaii Presentation by Sandro Heinzelmann Software Engineering Seminar 2010 April 27, 2010

SLIDE 2

Teaching software measurement - Motivation

  • Not easy
  • Tradeoff between too much work and too little insight
  • Personal Software Process (PSP) / Team Software Process (TSP) versus simple literature review

  • Find a balance by using automation tools

Hackystat, Hudson, Software ICU

SLIDE 3

Hackystat

  • Open-source project initiated by Philip Johnson
  • Collection of services
  • Enables subtle, unobtrusive data collection in various development tools (Eclipse, Ant, ...)
  • Notion of sensors integrated in applications

– Keep track of work, send data to Hackystat SensorBase

  • Layer of analysis modules
  • Web interface to display data

code.google.com/p/hackystat/

SLIDE 4

Hackystat in the past

  • Continuously improved over time
  • Used in case studies in 2003 and 2006 with varying success
  • Hard to install, confusion about various measurements and interpretation

images.clipartof.com

  • New approach with a medical metaphor: Software Intensive Care Unit (ICU)

SLIDE 5

Software health metaphor

  • Terminology of “health”
  • Not new: “runtime health” of life-critical hardware-software systems (NASA)
  • Here the focus is on health during development
  • Notion of vital signs and their normal ranges

– Normal or improving means healthy
– Interpreted as a whole

ukb.uni-bonn.de

SLIDE 6

Software health metaphor

  • High-level characteristics of a healthy project

– High efficiency, high effectiveness, high quality

  • “Healthy programmer behavior”

– Work consistently, contribute equally, consistent committing, no last minute rushes, ...

Illustration by Aaron Bacall

SLIDE 7

Vital signs

  • Coverage
  • Complexity
  • Coupling
  • Churn
  • Builds

Research hypotheses

  • Commits
  • Unit tests
  • Size
  • Dev time

SLIDE 8

Vital sign interpretation

  • Normal ranges and coloring defined by current value as well as trends

  • Thresholds and methods can be configured

– Dev time: ≥ 50% of the members commit, commits on ≥ 50% of the days in the project interval
– Coverage: high or increasing
– Size: no interpretation (color white)
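
The three example rules on this slide can be sketched in Python as follows. This is a minimal illustration, not the Software ICU's actual implementation: the function names, the "green"/"red" color names, the 90% "high coverage" threshold, and the first-versus-last trend test are all assumptions made for the example. Only the 50% thresholds, the "high or increasing" rule, and the white color for "no interpretation" come from the slide.

```python
# Hypothetical sketch of vital sign coloring; names and thresholds are assumptions.
GREEN, RED, WHITE = "green", "red", "white"  # only "white" is named on the slide


def trend(values, tolerance=0.0):
    """Classify a series as 'rising', 'falling' or 'stable' (first vs. last value)."""
    if len(values) < 2:
        return "stable"
    delta = values[-1] - values[0]
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "stable"


def color_coverage(history, high=90.0):
    """Healthy when the current value is high OR the trend is increasing."""
    if not history:
        return WHITE  # no data, so no interpretation
    if history[-1] >= high or trend(history) == "rising":
        return GREEN
    return RED


def color_participation(commit_days_by_member, project_days):
    """Healthy when >= 50% of members commit and commits occur on >= 50% of days.

    commit_days_by_member maps each member to the set of day indices
    on which that member committed at least once.
    """
    members = len(commit_days_by_member)
    if members == 0 or project_days == 0:
        return WHITE
    active = sum(1 for days in commit_days_by_member.values() if days)
    days_with_commits = set().union(*commit_days_by_member.values())
    members_ok = active / members >= 0.5
    days_ok = len(days_with_commits) / project_days >= 0.5
    return GREEN if members_ok and days_ok else RED


def color_size(history):
    """Size is displayed but never interpreted."""
    return WHITE
```

Because thresholds and methods are configurable, values such as `high` are parameters here rather than constants.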

SLIDE 9

ICU display

  • Current value as well as trend lines

code.google.com/p/hackystat/

SLIDE 10

Drill-downs

  • Detailed, per-member view of vital signs

SLIDE 11

Research questions

  • What are the strengths and weaknesses of the medical ICU metaphor for teaching software measurement in a classroom setting?
  • How appropriate were the choices of “vital signs”?
  • How effective were the algorithms for coloring the vital signs?
  • How does this approach compare to previous uses of Hackystat to teach software metrics in a classroom setting?

SLIDE 12

Study setting

  • 18 students in a senior-level undergraduate software engineering course

  • Course about open source development in Java
  • ICU introduced in the final 4 weeks
  • Hackystat log data
  • Online survey during the last week, 17 questions

– Installation overhead
– Overhead of sensor use
– Problems encountered during use
– Frequency of use
– Privacy
– Useful vital signs
– Usefulness in an industrial setting

SLIDE 13

Results - misc

  • Privacy: mixed, but generally positive feelings (from no problem to “hacky-stalk”)
  • Overhead: easier than in earlier versions, though varying from tool to tool. Sensor sending sometimes slow.

  • Frequency of use

SLIDE 14

Results – vital signs

[Bar chart: vital sign usefulness. Scale 2 to 20; bars for Coverage, Complexity, Coupling, Churn, Size, Dev time, Commit, Build, Test]

  • Coloring generally seen as accurate, with some general drawbacks
  • ICU and drill-downs in particular useful to react to poor health and manage team

SLIDE 15

Results – industrial possibilities

  • Generally considered a good idea
  • But

– Does not include non-IDE work (like reading a technical book)
– Algorithms can never fully judge the health of a program in all contexts

SLIDE 16

Discussion and conclusions

  • Significantly better results than previous Hackystat studies
  • ICU metaphor is useful to interpret and understand measurements

– No more “pretty squiggly lines”
– Coloring encourages thoughts about validity

  • ICU provides a layer of abstraction

– Normal ranges must be chosen carefully!
– Too lenient interpretation leads to oversight
– Too strict interpretation leads to “boy who cried wolf” syndrome

  • Vital sign ranges need to be tweaked further
  • Dangerous weakness: measurement dysfunction

SLIDE 17

Measurement dysfunction

  • Individual measurements did not contribute to the grade
  • Data was only visible to the assistant; the professor had only anonymized data and saw the survey only after the semester
  • And yet: at least one group had major problems

“I need more dev time because I need an A”
“oh if he ups his stats more than mine, tomorrow I'm gonna hack all day”

  • Compromised work as a team

Using measures competitively as a means to do well in a performance evaluation

SLIDE 18

Threats to validity

In the paper

  • Small sample size ( )
  • Short duration, small project size
  • Subjects with very similar background (senior computer science students)
  • Wrong demographic for “industry” questions

Personal

  • Relatively short survey
  • Students unfamiliar with software measurement

SLIDE 19

Future directions

  • Refine vital signs and ranges

– More research
– “crowd-sourcing”

  • Use in more environments

– Industry
– Different project types/languages/IDEs

  • Game-based approach

– “Devcathlon”

Personal

  • Comparative studies versus other measurement techniques (PSP/TSP)

SLIDE 20

Questions?

stormgrounds.com

SLIDE 21

Appendix - Hudson

  • Continuous integration tool developed by Kohsuke Kawaguchi
  • Builds and tests projects after every commit
  • Used in the following application for measurements of coverage, coupling, and complexity

SLIDE 22

Appendix: PSP

  • “Disciplined, data-driven procedure”
  • Level-based approach: PSP0 to PSP2.1
  • Use “historical” data/experience from previous level to detect repeated defects
  • Requires programmers to log their activities (a lot of manual data collection required, even with tool support)
  • Many measures collected and derived: estimation accuracy (size/time), prediction intervals (size/time), time-in-phase distribution, defect injection distribution, defect removal distribution, productivity, reuse percentage, cost performance index, planned value, earned value, etc.

SLIDE 23

Appendix: Complexity

  • Authors hint at 2 methods: Halstead complexity measures and McCabe's cyclomatic complexity
  • Judging from the configuration site, ICU uses JavaNCSS, which uses the cyclomatic complexity:

– Uses flow graph of program
– Counts number of independent paths through program (Basis Path Testing)
– M = E − N + 2P, where
M = cyclomatic complexity
E = number of edges of the graph
N = number of nodes of the graph
P = number of connected components
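
As a worked illustration of the formula M = E − N + 2P, the following Python sketch computes the cyclomatic complexity of a control-flow graph given as a list of edges. It is not how JavaNCSS works internally; the function name `cyclomatic_complexity` and the edge-list representation are assumptions chosen for the example.

```python
def cyclomatic_complexity(edges, nodes=()):
    """Return M = E - N + 2P for a control-flow graph.

    edges: iterable of (source, target) pairs.
    nodes: optional extra nodes that have no edges.
    """
    edges = list(edges)
    node_set = set(nodes)
    for src, dst in edges:
        node_set.update((src, dst))

    # Count connected components P with union-find,
    # treating the edges as undirected.
    parent = {n: n for n in node_set}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for src, dst in edges:
        parent[find(src)] = find(dst)

    components = len({find(n) for n in node_set})
    return len(edges) - len(node_set) + 2 * components


# An if/else statement: entry branches to "then" or "else", both reach exit.
if_else = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(if_else))  # E = 4, N = 4, P = 1, so M = 2
```

A straight-line method with no branches yields M = 1, and each additional decision point adds one independent path.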