

  1. We need more coverage, stat! Classroom experience with the Software ICU. Philip Johnson, Shaoxuan Zhang, University of Hawaii. Presentation by Sandro Heinzelmann, Software Engineering Seminar 2010, April 27, 2010

  2. Teaching software measurement – Motivation • Not easy • Trade-off between too much work and too little insight • Personal Software Process (PSP) / Team Software Process (TSP) versus a simple literature review • → Find a balance by using automation tools: Software ICU, Hackystat, Hudson

  3. Hackystat • Open-source project initiated by Philip Johnson • Collection of services • Enables subtle, unobtrusive data collection in various development tools (Eclipse, Ant, ...) • Notion of sensors integrated in applications – keep track of work, send data to the Hackystat SensorBase • Layer of analysis modules • Web interface to display data • code.google.com/p/hackystat/
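
To make the sensor idea concrete, here is a minimal sketch of what such a sensor might do in principle: buffer development events locally, then ship them to a SensorBase-style service in batches. All class and method names here are hypothetical illustrations, not the real Hackystat API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Hackystat-style sensor (names are illustrative,
// not the real Hackystat API): development events are buffered locally and
// later flushed to a SensorBase-like service.
public class SensorSketch {
    // One observed event, e.g. a "DevTime" event from an Eclipse sensor.
    record SensorData(String type, String tool, long timestamp, String resource) {}

    private final List<SensorData> buffer = new ArrayList<>();

    // Called by the tool integration whenever something noteworthy happens.
    public void record(String type, String tool, String resource) {
        buffer.add(new SensorData(type, tool, System.currentTimeMillis(), resource));
    }

    // In the real system this would POST the batch to the SensorBase REST
    // service; here we just drain the buffer and report how many events
    // would have been sent.
    public int flush() {
        int sent = buffer.size();
        buffer.clear();
        return sent;
    }
}
```

The point of the design is the "unobtrusive" part: the developer never calls `record` by hand; the tool plug-in does it in the background.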

  4. Hackystat in the past • Continuously improved over time • Used in case studies in 2003 and 2006 with varying success • Hard to install, confusion about various measurements and their interpretation • New approach with a medical metaphor: the Software Intensive Care Unit (ICU)

  5. Software health metaphor • Terminology of "health" • Not new – "runtime health" of life-critical hardware-software systems (NASA) • Here the focus is on health during development • Notion of vital signs and their normal ranges – normal or improving → healthy – interpreted as a whole

  6. Software health metaphor • High-level characteristics of a healthy project – high efficiency, high effectiveness, high quality • "Healthy programmer behavior" – work consistently, contribute equally, commit regularly, no last-minute rushes, ...

  7. Vital signs • Coverage • Commits • Complexity • Unit tests • Coupling • Size • Churn • Dev time • Builds → Research hypotheses

  8. Vital sign interpretation • Normal ranges and coloring defined by the current value as well as trends • Thresholds and methods can be configured • Examples: – Coverage: healthy if high or increasing – Dev time: healthy if ≥ 50% of the members commit and there are commits on ≥ 50% of the days in the project interval – Size: no interpretation (colored white)
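
A coloring rule for a trend-sensitive vital sign like coverage can be sketched as follows. The concrete threshold and the simple green/red/white scheme are illustrative assumptions; the slides only state that thresholds and methods are configurable.

```java
// Sketch of an ICU-style coloring rule for one vital sign (e.g. coverage):
// healthy (green) when the current value is high or the trend is improving,
// unhealthy (red) otherwise, and white when no interpretation is defined.
// The threshold value passed by the caller is an illustrative assumption.
public class VitalSignColoring {
    public enum Color { GREEN, RED, WHITE }

    // interpretable = false models vital signs like Size, which get no color.
    public static Color colorOf(double current, double previous,
                                double healthyThreshold, boolean interpretable) {
        if (!interpretable) return Color.WHITE;
        boolean high = current >= healthyThreshold;
        boolean increasing = current > previous;
        return (high || increasing) ? Color.GREEN : Color.RED;
    }
}
```

Note how the trend clause keeps a low-but-improving project out of the red, which matches the "normal or improving → healthy" idea from slide 5.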

  9. ICU display • Current value as well as trend lines (screenshot from code.google.com/p/hackystat/)

  10. Drill-downs • Detailed, per-member view of vital signs

  11. Research questions • What are the strengths and weaknesses of the medical ICU metaphor for teaching software measurement in a classroom setting? • How appropriate were the choices of "vital signs"? • How effective were the algorithms for coloring the vital signs? • How does this approach compare to previous uses of Hackystat to teach software metrics in a classroom setting?

  12. Study setting • 18 students in a senior-level undergraduate software engineering course • Course about open source development in Java • ICU introduced in the final 4 weeks • Hackystat log data • Online survey during the last week, 17 questions – Installation overhead – Overhead of sensor use – Problems encountered during use – Frequency of use – Privacy – Useful vital signs – Usefulness in an industrial setting

  13. Results – misc • Privacy: mixed but generally positive feelings (from "no problem" to "hacky-stalk") • Overhead: easier than in earlier versions, though varying from tool to tool; sensor sending sometimes slow • Frequency of use [chart in the original slides]

  14. Results – vital signs • [Bar chart "Vital sign usefulness": ratings per vital sign on a 0–20 scale] • Coloring generally seen as accurate, with some general drawbacks • ICU and drill-downs in particular useful to react to poor health and manage the team

  15. Results – industrial possibilities • Generally considered a good idea • But: – does not include non-IDE work (like reading a technical book) – algorithms can never fully judge the health of a program in all contexts

  16. Discussion and conclusions • Significantly better results than previous Hackystat studies • ICU metaphor is useful for interpreting and understanding measurements – no more "pretty squiggly lines" – coloring encourages thoughts about validity • ICU provides a layer of abstraction – normal ranges must be chosen carefully! – too lenient an interpretation leads to oversight – too strict an interpretation leads to "boy who cried wolf" syndrome • Vital sign ranges need to be tweaked further • Dangerous weakness: measurement dysfunction

  17. Measurement dysfunction • Using measures competitively as a means to look good in a performance evaluation • Individual measurements did not contribute to the grade • Data was only visible to the assistant; the professor only had anonymized data and got to see the survey only after the semester • And yet: at least one group had major problems – "I need more dev time because I need an A" – "oh, if he ups his stats more than mine, tomorrow I'm gonna hack all day" • → compromised work as a team

  18. Threats to validity • In the paper: – small sample size – short duration, small project size – subjects with very similar backgrounds (senior computer science students) – wrong demographic for "industry" questions • Personal: – relatively short survey – students unfamiliar with software measurement

  19. Future directions • Refine vital signs and ranges – more research – "crowd-sourcing" • Use in more environments – industry – different project types/languages/IDEs • Game-based approach – "Devcathlon" • Personal: comparative studies versus other measurement techniques (PSP/TSP)

  20. Questions?

  21. Appendix: Hudson • Continuous integration tool developed by Kohsuke Kawaguchi • Builds and tests projects after every commit • Used here for the measurements of coverage, coupling, and complexity

  22. Appendix: PSP • "Disciplined, data-driven procedure" • Level-based approach: PSP0 to PSP2.1 • Use "historical" data/experience from the previous level to detect repeated defects • Requires programmers to log their activities (a lot of manual data collection required, even with tool support) • Many measures collected and derived: estimation accuracy (size/time), prediction intervals (size/time), time-in-phase distribution, defect injection distribution, defect removal distribution, productivity, reuse percentage, cost performance index, planned value, earned value, etc.
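
Two of the measures listed above are simple to compute once the logged data exists. The formulas below follow the common PSP definitions (percentage estimation error, and earned value as the completed share of total planned value); the method names are illustrative.

```java
// Two PSP-derived measures, computed the textbook way. Method names are
// illustrative; the formulas follow the common PSP definitions.
public class PspMeasures {
    // Size/time estimation error in percent: (actual - estimated) / estimated * 100.
    // Positive means underestimation, negative means overestimation.
    public static double estimationErrorPercent(double estimated, double actual) {
        return 100.0 * (actual - estimated) / estimated;
    }

    // Earned value: the planned value of completed tasks as a fraction of
    // the total planned value of all tasks.
    public static double earnedValue(double[] plannedValue, boolean[] completed) {
        double total = 0.0, earned = 0.0;
        for (int i = 0; i < plannedValue.length; i++) {
            total += plannedValue[i];
            if (completed[i]) earned += plannedValue[i];
        }
        return total == 0.0 ? 0.0 : earned / total;
    }
}
```

This also illustrates the slide's point about overhead: the arithmetic is trivial, but every input number has to be logged by hand.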

  23. Appendix: Complexity • Authors hint at two methods: Halstead complexity measures and McCabe's cyclomatic complexity • Judging from the configuration site, the ICU uses JavaNCSS, which uses cyclomatic complexity: – uses the flow graph of the program – counts the number of linearly independent paths through the program (basis path testing) – M = E − N + 2P, where M = cyclomatic complexity, E = number of edges of the graph, N = number of nodes of the graph, P = number of connected components
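
The formula is easy to check on a small flow graph. For a method containing a single if/else, the graph has 5 nodes (entry, decision, two branch bodies, exit) and 5 edges in one connected component, giving M = 5 − 5 + 2·1 = 2, i.e. two independent paths:

```java
// McCabe's cyclomatic complexity M = E - N + 2P for a control-flow graph
// with E edges, N nodes, and P connected components.
public class Cyclomatic {
    public static int complexity(int edges, int nodes, int components) {
        return edges - nodes + 2 * components;
    }
}
```

Straight-line code always has E = N − 1, so M = 1: exactly one path, the minimum.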
