Failure is a four-letter word

Andreas Zeller, Thomas Zimmermann, Christian Bird
PROMISE 2011, Banff, Canada


slide-1
SLIDE 1

Failure is a four-letter word

Andreas Zeller • Thomas Zimmermann • Christian Bird PROMISE 2011, Banff, Canada
slide-2
SLIDE 2

Software failures

2
slide-3
SLIDE 3

Defect distributions

3
slide-4
SLIDE 4

Failure causes

4
slide-5
SLIDE 5

Failure causes

5
slide-6
SLIDE 6

Failure causes

6
slide-7
SLIDE 7

Failure causes

7
slide-8
SLIDE 8

Failure causes

7
slide-9
SLIDE 9

Cost of consequence

8
slide-10
SLIDE 10

Back to basics

9
slide-11
SLIDE 11

Back to basics

9

A B C

slide-12
SLIDE 12

Basic actions

10
slide-13
SLIDE 13

Basic actions

10

    public class ImageViewerPlugin extends AbstractUIPlugin {
        //The shared instance.
        private static ImageViewerPlugin plugin;

        /**
         * The constructor.
         */
        public ImageViewerPlugin() {
            plugin = this;
        }

        /**
         * This method is called upon plug-in activation
         */
        public void start(BundleContext context) throws Exception {
            super.start(context);
        }
    }
slide-15
SLIDE 15

12

////// ******** aaaaaaaaaa cccccccccc d eeeeeeeeeeeeeeee ggggggg hh iiiiiiiiiiiii lllllllllllll mmmm nnnnnnnnnnnnnnnn pppppppppp rrrrrrrrrr ssssssssssss ttttttttttttttttttttt uuuuuuuuuuuu v wwww
A B C E IIII PPPP TTT U VVV {{{ }}}
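
A tally of this kind, one run per character, can be sketched in a few lines. This is an illustration only, not the authors' tooling: `char_runs` is a hypothetical helper name, skipping whitespace is an assumption, and the runs come out in code-point order, which differs from the layout on the slide.

```python
from collections import Counter


def char_runs(source):
    """Return one run per distinct non-whitespace character, sorted by code point."""
    counts = Counter(ch for ch in source if not ch.isspace())
    return [ch * counts[ch] for ch in sorted(counts)]


# A one-line excerpt of the plug-in source shown on the previous slide.
snippet = "public ImageViewerPlugin() { plugin = this; }"
print(" ".join(char_runs(snippet)))
```

Each run's length is the number of times that character was typed, which is the only "programmer action" the approach looks at.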
slide-19
SLIDE 19

Hypotheses

13
  1. We can predict defects from programmer actions.
  2. We can isolate defect-prone programmer actions.
  3. We can prevent defects by restricting programmer actions.

slide-21
SLIDE 21

Eclipse bug data

[PROMISE 2007]

15

Table 1: Features of the Eclipse datasets.

Release        Total chars    Total files    Files with defects
Eclipse 2.0    44,914,520      6,728           975 (14%)
Eclipse 2.1    56,068,650      7,887           854 (11%)
Eclipse 3.0    76,193,482     10,593         1,568 (15%)
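
The percentages in Table 1 follow directly from the file counts, and the character totals add up to the "≥177,000,000 characters" quoted later under FAQs and threats. A small check, using only the figures from the table (the names `TABLE_1` and `defect_rate` are made up for this sketch):

```python
# Figures copied from Table 1: release -> (total characters, total files, files with defects)
TABLE_1 = {
    "Eclipse 2.0": (44_914_520, 6_728, 975),
    "Eclipse 2.1": (56_068_650, 7_887, 854),
    "Eclipse 3.0": (76_193_482, 10_593, 1_568),
}


def defect_rate(files, defective):
    """Fraction of files that have at least one defect."""
    return defective / files


total_chars = sum(chars for chars, _, _ in TABLE_1.values())

for release, (chars, files, defective) in TABLE_1.items():
    print(f"{release}: {defect_rate(files, defective):.0%} of files have defects")
print(f"Total characters: {total_chars:,}")
```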
slide-22
SLIDE 22

Eclipse characters

16
slide-23
SLIDE 23

Precision

17

Table 2: Precision for various training/testing combinations.

Training set    Eclipse 2.0    Eclipse 2.1    Eclipse 3.0    Average
Eclipse 2.0     0.74           0.39           0.49           0.54
Eclipse 2.1     0.55           0.64           0.56           0.58
Eclipse 3.0     0.57           0.40           0.64           0.54
Average         0.62           0.47           0.56           0.55
slide-26
SLIDE 26

Recall

19

Table 3: Recall for various training/testing combinations.

Training set    Eclipse 2.0    Eclipse 2.1    Eclipse 3.0    Average
Eclipse 2.0     0.32           0.27           0.27           0.28
Eclipse 2.1     0.03           0.18           0.14           0.11
Eclipse 3.0     0.19           0.16           0.20           0.18
Average         0.18           0.20           0.20           0.19
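
For readers who want the two measures spelled out: precision asks how many flagged files really have defects, recall asks how many defective files were flagged. The sketch below uses the standard definitions; the counts are hypothetical and chosen only to mimic a predictor with high precision and low recall, they are not taken from the study.

```python
def precision(true_pos, false_pos):
    """Share of files flagged as defect-prone that really have defects."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


def recall(true_pos, false_neg):
    """Share of files with defects that the predictor flagged."""
    defective = true_pos + false_neg
    return true_pos / defective if defective else 0.0


# Hypothetical confusion counts (not from the Eclipse data).
tp, fp, fn = 30, 10, 70
print(f"precision = {precision(tp, fp):.2f}, recall = {recall(tp, fn):.2f}")
```

Read together, Tables 2 and 3 say the same thing about the real data: on average about half of the flagged files are defective (0.55), while only about a fifth of the defective files are found (0.19).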
slide-27
SLIDE 27

Hypotheses

20
  1. We can predict defects from programmer actions.
  2. We can isolate defect-prone programmer actions.
  3. We can prevent defects by restricting programmer actions.

slide-30
SLIDE 30

Defect correlations

22
slide-31
SLIDE 31

Defect correlations

23
slide-32
SLIDE 32

Defect correlations

23
slide-33
SLIDE 33

Defect correlations

24
slide-34
SLIDE 34

Defect correlations

24

IROP

slide-36
SLIDE 36

Hypotheses

25
  1. We can predict defects from programmer actions. ✔
  2. We can isolate defect-prone programmer actions. ✔
  3. We can prevent defects by restricting programmer actions.

slide-38
SLIDE 38

Explicit causes

27
slide-39
SLIDE 39

Explicit causes

27
slide-40
SLIDE 40

IROP keyboard

28
slide-41
SLIDE 41

Coding standards

29
slide-43
SLIDE 43

Coding standards

29

    if (p != null) {
        int i;
        while (p[i] < 0) i++;
        return i;
    }

    when (q != null) {
        num n;
        as (q[n] < 0) n++;
        handback n;
    }

slide-45
SLIDE 45

Coding standards

30

    when (q != null) {
        num n;
        as (q[n] < 0) n++;
        handback n;
    }

100% semantics preserving

slide-46
SLIDE 46

New habits

31
slide-47
SLIDE 47

New habits

31

We can sun tete set majusculet, and t text says jus as swelm as antecedently. Let us jus ban tem!

slide-49
SLIDE 49

Hypotheses

32
  1. We can predict defects from programmer actions. ✔
  2. We can isolate defect-prone programmer actions. ✔
  3. We can prevent defects by restricting programmer actions. ✔

slide-50
SLIDE 50

FAQs and threats

33
slide-53
SLIDE 53

FAQs and threats

33
  1. How about external validity?
     (findings based on ≥177,000,000 characters + 1,000s of defects; one of the largest studies ever)
  2. Are the correlations significant?
     (yes – all of them)
  3. Are the measures appropriate?
     (all code originally input as characters; no abstraction taken that could interfere)
slide-54
SLIDE 54

Future work

34
slide-57
SLIDE 57

Future work

34
  • Automatic renamings
    (PROMISE → ENGAGEMENT, Eclipse → Eclse)
  • Abstraction
    (Failure / mistake / error / problem / bug report vs. success / fame)
  • Generalization
    (ИРОП principle)
slide-58
SLIDE 58

Failure is a four-letter word

slide-59
SLIDE 59

Failure is a four-letter word

slide-60
SLIDE 60
slide-61
SLIDE 61

Why all this is wrong

slide-62
SLIDE 62

Correlation vs. Causation

slide-63
SLIDE 63

Machine Learning works

slide-64
SLIDE 64

Cherry Picking

slide-65
SLIDE 65

Fix Causes, not Symptoms

slide-66
SLIDE 66

Actionable Findings

slide-67
SLIDE 67

Our Inspiration

http://xkcd.com/882/
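
The slides give only titles for this part. Assuming the "Cherry Picking" point is the multiple-comparisons problem that the linked comic illustrates (twenty jelly-bean colours, each tested at p < 0.05), the arithmetic behind it is short. The function name is hypothetical, and treating the tests as independent is a simplifying assumption.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


for n in (1, 20, 100):
    print(f"{n:>3} tests at alpha=0.05: {familywise_error_rate(n):.2f}")
```

Testing every character of the alphabet against defects is the same situation: with enough candidates, some of them will look significant by chance alone.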
slide-68
SLIDE 68
slide-69
SLIDE 69
slide-70
SLIDE 70
slide-71
SLIDE 71
slide-72
SLIDE 72
slide-73
SLIDE 73

Use Book in Class

slide-74
SLIDE 74

Use Paper in Class

Failure is a Four-Letter Word
– A Parody in Empirical Research –

Andreas Zeller*, Saarland University, Saarbrücken, Germany, zeller@cs.uni-saarland.de
Thomas Zimmermann, Microsoft Research, Washington, USA, tzimmer@microsoft.com
Christian Bird, Microsoft Research, Washington, USA, cbird@microsoft.com

ABSTRACT

Background: The past years have seen a surge of techniques predicting failure-prone locations based on more or less complex metrics. Few of these metrics are actionable, though.

Aims: This paper explores a simple, easy-to-implement method to predict and avoid failures in software systems. The IROP method links elementary source code features to known software failures in a lightweight, easy-to-implement fashion.

Method: We sampled the Eclipse data set mapping defects to files in three Eclipse releases. We used logistic regression to associate programmer actions with defects, tested the predictive power of the resulting classifier in terms of precision and recall, and isolated the most defect-prone actions. We also collected initial feedback on possible remedies.

Results: In our sample set, IROP correctly predicted up to 74% of the failure-prone modules, which is on par with the most elaborate predictors available. We isolated a set of four easy-to-remember recommendations, telling programmers precisely what to do to avoid errors. Initial feedback from developers suggests that these recommendations are straightforward to follow in practice.

Conclusions: With the abundance of software development data, even the simplest methods can produce “actionable” results.

[…] number of developers associated with a file. As elaborate as these approaches may be, they all share the same problem which we call the cost of consequence: If I know that a module is failure-prone because it frequently changes, should I stop changing it? If I know failures are related to complexity, should I rewrite it from scratch? Any of these measures induces a new risk – a risk which may be greater than the one originally addressed.

In this paper, we take a different approach. We predict failures from the most basic actions programmers undertake, focusing on the actions that introduce defects as they are being made – literally at the moment the source code is typed in. Our recommendations are immediately actionable: A simple visual representation associates actions with the likelihood of introducing defects – warning programmers before they might hit the wrong key.
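
The Method paragraph of the abstract (logistic regression associating typed characters with defects) can be pictured with a toy sketch. Everything here is an assumption made for illustration: the function names, the plain gradient-descent training, and above all the four-file corpus, which is invented and merely echoes the method's name rather than any reported result.

```python
import math
from collections import Counter


def char_features(source, alphabet):
    """Relative frequency of each character of `alphabet` in `source`."""
    counts = Counter(source)
    total = max(len(source), 1)
    return [counts[ch] / total for ch in alphabet]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, rate=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias


def predict(source, alphabet, weights, bias):
    """Estimated probability that `source` belongs to the defective class."""
    x = char_features(source, alphabet)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Invented toy corpus: (file contents, 1 = has defects, 0 = no defects).
ALPHABET = "iropxyz"
files = [("iirrooppi", 1), ("rioporip", 1), ("xyzzyxxz", 0), ("zzyyxxzy", 0)]
rows = [char_features(src, ALPHABET) for src, _ in files]
labels = [label for _, label in files]

weights, bias = train_logistic(rows, labels)
for ch, w in zip(ALPHABET, weights):
    print(f"{ch}: {w:+.2f}")
```

A positive weight means the character co-occurs with defective files in the training data. As the closing slides stress, that is a correlation, not a cause.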
Our approach is both effective and efficient: In a case study on the Eclipse failure set, it correctly identified up to 74% of the failure-prone modules, which is on par with the most elaborate predictors available. Specifically, our contributions include:

  1) A novel mechanism to associate programmer actions with software defects;
  2) A predictor that is purely text-oriented, thus lightweight, real-time, easy to implement, and language-agnostic;
  3) A set of easy-to-remember recommendations, validated on
slide-75
SLIDE 75
slide-76
SLIDE 76

http://www.st.cs.uni-saarland.de/softevo/irop/