How Failures Come to Be, by Andreas Zeller



SLIDE 1

Andreas Zeller

How Failures Come to Be

An F-16 (northern hemisphere)

An F-16 (southern hemisphere)

An F-16 fighter plane in the northern hemisphere. Why the northern hemisphere, you ask?

Because this is what an F-16 in the southern hemisphere would look like. (BTW, interesting effect if you drop a bomb :-) From risks.digest, volume 3, issue 44:

  • Since the F-16 is a

SLIDE 2

F-16 Landing Gear

The First Bug

September 9, 1947

More Bugs

From risks.digest, volume 3, issue 44:

  • One of the first things the Air Force test pilots tried on an early F-16 was to tell the computer to raise the landing gear while standing still on the runway. Guess what happened? Scratch one F-16. (my friend

The first bug was retrieved by a technician from the Harvard Mark II machine on September 9, 1947. It is now on display at the Smithsonian, Washington.

SLIDE 3

More bugs

Facts on Debugging

  • Software bugs cost about US$60 billion per year in the US
  • Improvements could reduce this cost by 30%
  • Validation (including debugging) can easily take up 50-75% of the development time
  • When debugging, some people are three times as efficient as others

A Sample Program

    $ sample 9 8 7
    Output: 7 8 9
    $ sample 11 14
    Output: 0 11

SLIDE 4

How to Debug (Sommerville 2004)

  1. Locate error
  2. Design error repair
  3. Repair error
  4. Re-test program

The TRAFFIC Principle

  • T: Track the problem
  • R: Reproduce
  • A: Automate
  • F: Find Origins
  • F: Focus
  • I: Isolate
  • C: Correct


SLIDE 5

From Defect to Failure

  1. The programmer creates a defect: an error in the code.
  2. When executed, the defect creates an infection: an error in the state.
  3. The infection propagates.
  4. The infection causes a failure.

This infection chain must be traced back and broken.

[Figure: variables over time t, with infected values marked ✘]

The Curse of Testing

  • Not every defect causes a failure!
  • Testing can only show the presence of errors, not their absence. (Dijkstra 1972)

[Figure: variables over time, with infected values marked ✘]
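Why a passing test proves so little can be sketched in a few lines of C. This is only an illustration under stated assumptions: `sort_ints`, `defective_sort`, and `prefix_equals` are hypothetical names, and the extra array slot is an explicit stand-in for whatever stray value happens to sit behind the input. The sample program in this talk reads past its allocation instead, which is undefined behavior and is deliberately not reproduced here.

```c
#include <stddef.h>

/* Plain insertion sort over the first n elements of a. */
void sort_ints(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

/* Hypothetical defective caller: sorts count + 1 elements instead of count.
   buf must have room for count + 1 ints; buf[count] models the stray value. */
void defective_sort(int buf[], size_t count)
{
    sort_ints(buf, count + 1);   /* defect: should be count */
}

/* Returns 1 if the first count elements of buf equal expected, else 0. */
int prefix_equals(const int buf[], const int expected[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (buf[i] != expected[i])
            return 0;
    return 1;
}
```

With the inputs 9 8 7 followed by a large stray value such as 100, the first three elements come out as 7 8 9, so the test passes although the defect is present. With 11 14 followed by a stray 0, the first two elements come out as 0 11, and the same defect now causes a failure.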

Debugging

  • Every failure can be traced back to some infection, and every infection is caused by some defect.
  • Debugging means relating a given failure to the defect, and removing the defect.

[Figure: variables over time, with infected values marked ✘]

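SLIDE 6

Search in Space + Time

[Figure: program state, with variables plotted over time t]

The defect must be searched for in _space_ and _time_.

The Defect

[Figure: variables over time t; the location of the defect is marked ?]

Search in Time

[Figure: variables over time t; infected values marked ✘]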

SLIDE 7

Search in Time

[Figure: variables over time t; successive steps of the search in time]

Search in Space

[Figure: variables over time t; sane values marked ✔]

SLIDE 8

Search in Space

[Figure: variables over time t; sane values marked ✔, infected values marked ✘]

A Program State

State of the GNU compiler (GCC): 42991 vertices, 44290 edges, and 1 is wrong :-) An actual GCC execution has millions of these states.

SLIDE 9

A Sample Program

    $ sample 9 8 7
    Output: 7 8 9
    $ sample 11 14

    int main(int argc, char *argv[])
    {
        int *a;
        int i;

        a = (int *)malloc((argc - 1) * sizeof(int));
        for (i = 0; i < argc - 1; i++)
            a[i] = atoi(argv[i + 1]);

        shell_sort(a, argc);

        printf("Output: ");
        for (i = 0; i < argc - 1; i++)
            printf("%d ", a[i]);
        printf("\n");

        free(a);
        return 0;
    }

Find Origins

  • The 0 printed is the value of a[0]. Where does it come from?
  • Basic idea: Track or deduce value origins
  • Separates relevant from irrelevant values
  • We can trace back a[0] to shell_sort

[Figure: variables over time, with the origins of a[0] marked !]

SLIDE 10

    static void shell_sort(int a[], int size)
    {
        int i, j;
        int h = 1;

        do {
            h = h * 3 + 1;
        } while (h <= size);
        do {
            h /= 3;
            for (i = h; i < size; i++) {
                int v = a[i];
                for (j = i; j >= h && a[j - h] > v; j -= h)
                    a[j] = a[j - h];
                if (i != j)
                    a[j] = v;
            }
        } while (h != 1);
    }

Observing a Run

[Figure: program state over time. Statements executed: a = malloc(...); i = 0; a[i] = atoi(argv[i + 1]); i++; a[i] = atoi(argv[i + 1]); i++; shell_sort(a, argc); return 0. Variables shown: argc, argv[0], argv[1], a[0], a[1], a[2], i, size, h.]

Specific Observation

    static void shell_sort(int a[], int size)
    {
        int i, j;
        int h = 1;

        fprintf(stderr, "At shell_sort");
        for (i = 0; i < size; i++)
            fprintf(stderr, "a[%d] = %d\n", i, a[i]);
        fprintf(stderr, "size = %d\n", size);
        ...
    }

    $ sample 11 14
    a[0] = 11
    a[1] = 14
    a[2] = 0
    size = 3
    Output: 0 11

The state is infected at the call of shell_sort!

FIXME: argv[0] should be "sample", not "11"
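The logging statements above can be collected into a small reusable helper. The following is a minimal sketch, assuming a hypothetical function `format_state` that writes the same `a[i] = ...` and `size = ...` lines into a caller-supplied buffer instead of directly to stderr. The name, the signature, and the buffer-based design are assumptions for illustration, not part of the original program.

```c
#include <stdio.h>
#include <string.h>   /* only needed for strcmp in the usage below */

/* Hypothetical helper: formats the observed state as "a[i] = v" lines
   followed by "size = n". Returns the number of characters written
   (excluding the terminating NUL), or -1 if buf is too small. */
int format_state(char *buf, size_t cap, const int a[], int size)
{
    size_t used = 0;

    for (int i = 0; i < size; i++) {
        int n = snprintf(buf + used, cap - used, "a[%d] = %d\n", i, a[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    int n = snprintf(buf + used, cap - used, "size = %d\n", size);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;

    return (int)used;
}
```

A caller could then write `if (format_state(buf, sizeof buf, a, size) >= 0) fputs(buf, stderr);` at the start of the function under observation, or compare the buffer against an expected state with `strcmp` in a test.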

SLIDE 11

Fixing the Program

    int main(int argc, char *argv[])
    {
        int *a;
        int i;

        a = (int *)malloc((argc - 1) * sizeof(int));
        for (i = 0; i < argc - 1; i++)
            a[i] = atoi(argv[i + 1]);

        shell_sort(a, argc - 1);   /* was: shell_sort(a, argc); */
        ...
    }

    $ sample 11 14
    Output: 11 14
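The fix can be checked in isolation by moving the parsing and sorting into a function of their own. This is a sketch rather than the author's code: `shell_sort` is copied from the slides, while `parse_and_sort` is a hypothetical helper introduced here so that the corrected call can be exercised without running the whole program.

```c
#include <stdlib.h>

/* Shell sort as shown on the slides (made non-static for reuse). */
void shell_sort(int a[], int size)
{
    int i, j;
    int h = 1;

    do {
        h = h * 3 + 1;
    } while (h <= size);
    do {
        h /= 3;
        for (i = h; i < size; i++) {
            int v = a[i];
            for (j = i; j >= h && a[j - h] > v; j -= h)
                a[j] = a[j - h];
            if (i != j)
                a[j] = v;
        }
    } while (h != 1);
}

/* Hypothetical helper: converts argv[1] .. argv[argc - 1] to ints and
   sorts them. Returns a malloc'ed array of argc - 1 elements that the
   caller must free, or NULL if there are no arguments or malloc fails. */
int *parse_and_sort(int argc, char *argv[])
{
    int count = argc - 1;
    if (count <= 0)
        return NULL;

    int *a = malloc((size_t)count * sizeof *a);
    if (a == NULL)
        return NULL;

    for (int i = 0; i < count; i++)
        a[i] = atoi(argv[i + 1]);

    shell_sort(a, count);   /* fixed: number of elements, not argc */
    return a;
}
```

Called with the arguments of the failing run (`sample 11 14`), the returned array holds 11 and 14, matching the output of the fixed program.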

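Finding Causes

[Figure: a sane state next to an infected state]

The difference causes the failure

Search in Space

[Figure: a sane state (✔) and an infected state (✘) are combined into a mixed state, which is then tested (?); the highlighted difference is argc = 3]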
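The idea of testing mixed states can be sketched as a binary search over the variables in which the two states differ. This is a deliberately simplified illustration, not the delta debugging algorithm itself: it assumes that exactly one differing variable is responsible for the failure, and it models a program state as a flat array of ints. `fails_fn`, `isolate_difference`, `example_fails`, `MAX_VARS`, and the state layout are all hypothetical.

```c
#include <stddef.h>

#define MAX_VARS 16

/* Test predicate: returns 1 if a run from this state fails, 0 if it passes. */
typedef int (*fails_fn)(const int state[], size_t n);

/* Mixed state: sane everywhere, except indices [lo, hi) taken from infected. */
static void mix(int out[], const int sane[], const int infected[],
                size_t n, size_t lo, size_t hi)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (i >= lo && i < hi) ? infected[i] : sane[i];
}

/* Returns the index of the single variable whose infected value makes the
   test fail, or -1 if the inputs do not fit the single-cause assumption. */
int isolate_difference(const int sane[], const int infected[], size_t n,
                       fails_fn fails)
{
    int mixed[MAX_VARS];

    if (n == 0 || n > MAX_VARS)
        return -1;
    if (fails(sane, n) || !fails(infected, n))
        return -1;

    size_t lo = 0, hi = n;              /* the cause lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        mix(mixed, sane, infected, n, lo, mid);
        if (fails(mixed, n))
            hi = mid;                   /* cause is in the first half */
        else
            lo = mid;                   /* cause is in the second half */
    }

    /* Confirm that this one difference alone reproduces the failure. */
    mix(mixed, sane, infected, n, lo, lo + 1);
    return fails(mixed, n) ? (int)lo : -1;
}

/* Example predicate modeled on the sample run "sample 11 14".
   Assumed layout: state[0..2] = a[0..2], state[3] = size, rest unused. */
int example_fails(const int state[], size_t n)
{
    if (n < 4)
        return 0;

    int a[3] = { state[0], state[1], state[2] };
    int size = state[3];
    if (size < 0) size = 0;
    if (size > 3) size = 3;

    for (int i = 1; i < size; i++) {    /* insertion sort on a[0..size) */
        int v = a[i], j = i;
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
    return !(a[0] == 11 && a[1] == 14); /* expected output: 11 14 */
}
```

Given a sane state `{11, 14, 0, 2, 2, 1}` and an infected state `{11, 14, 0, 3, 3, 1}`, the search tests two mixed states and reports index 3, the `size` variable, as the failure-causing difference. The other differing variable (index 4) is found to be irrelevant.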

SLIDE 12

Search in Time

[Figure: a passing run and a failing run compared over time t, annotated with argc = 3, argc = 3, and a[2] = 0]

Transition from argc to a[2]

    int main(int argc, char *argv[])
    {
        int *a;

        // Input array
        a = (int *)malloc((argc - 1) * sizeof(int));
        for (int i = 0; i < argc - 1; i++)
            a[i] = atoi(argv[i + 1]);

        // Sort array
        shell_sort(a, argc);   // Should be argc - 1

        // Output array
        printf("Output: ");
        for (int i = 0; i < argc - 1; i++)
            printf("%d ", a[i]);
        printf("\n");

        free(a);
        return 0;
    }

SLIDE 13

Concepts

A failure comes to be in three stages:

  1. The programmer creates a defect
  2. The defect causes an infection
  3. The infection causes a failure, an externally visible error.

Not every defect results in an infection, and not every infection results in a failure.

Concepts (2)

To debug a program, proceed in 7 steps:

  • T: Track the problem
  • R: Reproduce
  • A: Automate
  • F: Find Origins
  • F: Focus
  • I: Isolate
  • C: Correct

Concepts (3)

A variety of tools and techniques is available to automate debugging:

  • Program Slicing
  • Observing & Watching State
  • Asserting Invariants
  • Detecting Anomalies
  • Isolating Cause-Effect Chains
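Of the techniques listed above, asserting invariants is the easiest to show in a few lines. The following is a minimal sketch, assuming a hypothetical `checked_sort` that takes the buffer capacity as an extra parameter. The names `is_sorted` and `checked_sort`, the extra parameter, and the use of insertion sort are assumptions for illustration. With such a precondition in place, a call that passes one element too many would abort at the call site instead of silently infecting the state.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a[0..n) is in non-decreasing order, else 0. */
int is_sorted(const int a[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

/* Sorts the first size elements of a buffer that has room for
   capacity ints, checking a precondition and a postcondition. */
void checked_sort(int a[], size_t size, size_t capacity)
{
    assert(size <= capacity);       /* precondition: stay inside the buffer */

    for (size_t i = 1; i < size; i++) {   /* insertion sort */
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }

    assert(is_sorted(a, size));     /* postcondition: result is sorted */
}
```

Because `assert` is compiled out when `NDEBUG` is defined, checks like these cost nothing in a release build while still catching infections early during testing.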


SLIDE 14

This work is licensed under the Creative Commons Attribution License. To view a copy of this license, visit http://creativecommons.org/licenses/by/1.0 or send a letter to Creative Commons, 559 Abbott Way, Stanford, California 94305, USA.