Prevalence of Confusing Code in Software Projects: Atoms of Confusion in the Wild


slide-1
SLIDE 1

Prevalence of Confusing Code in Software Projects

Atoms of Confusion in the Wild

Dan Gopstein NYU

Hongwei Henry Zhou, Phyllis Frankl, Justin Cappos AtomsOfConfusion.com

1

slide-2
SLIDE 2

Atoms of Confusion in the Wild

2

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;

slide-3
SLIDE 3

Atoms of Confusion in the Wild

3

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;

Apple’s Goto Fail bug

slide-4
SLIDE 4

Atoms of Confusion in the Wild

4

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;

Apple’s Goto Fail bug

Two Atoms of Confusion:

  • Assignment as Value
  • Omitted Curly Brace
slide-5
SLIDE 5

Atoms of Confusion in the Wild

5

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
    goto fail;
}
    goto fail;

Apple’s Goto Fail bug

Two Atoms of Confusion:

  • Assignment as Value
  • Omitted Curly Brace
slide-6
SLIDE 6

Outline

6

Atoms of Confusion are ...

  • Confusing - Both in the lab and in the wild
  • Prevalent - Occurring frequently in practice
  • Buggy - Causing or correlated with faults
slide-8
SLIDE 8

Atoms of Confusion

8

Understanding Misunderstandings in Source Code

  • D. Gopstein, J. Iannacone, Y. Yan, L. DeLong, Y. Zhuang, M. Yeh, J. Cappos

ESEC/FSE 2017

slide-9
SLIDE 9

Confusion

printf("%d",013)

A person may read 13; the machine prints 11.

When a person and a machine read the same piece of code, yet come to different conclusions about its output.

9

slide-10
SLIDE 10

Measurable

10

printf("%d",013)
printf("%d",11)

slide-11
SLIDE 11

Measurable

11

printf("%d",013)
printf("%d",11)

slide-13
SLIDE 13

Precise

The smallest piece of code that can cause confusion

[Diagram: a block of code divided into regions labeled "Other Stuff", "Fluff", and "Confusing Code"]

13

slide-14
SLIDE 14

Precise

The smallest piece of code that can cause confusion

[Diagram: a block of code divided into regions labeled "Other Stuff", "Fluff", and "Confusing Code"]

Atom of Confusion

14

slide-15
SLIDE 15

Identified Atoms

15

[Chart: identified atoms and their effect size φ]

slide-16
SLIDE 16

Atoms of Confusion

16

Understanding Misunderstandings in Source Code

  • D. Gopstein, J. Iannacone, Y. Yan, L. DeLong, Y. Zhuang, M. Yeh, J. Cappos

ESEC/FSE 2017

V1 && F2()          Logic as Control Flow
V1 = ++V2;          Pre-Increment
printf("%d",013)    Literal Encoding
0 && 1 || 2         Operator Precedence

φ = .63   φ = .48   φ = .28   φ = .33

slide-17
SLIDE 17

Outline

17

Atoms of Confusion are ...

  • Confusing - Both in the lab and in the wild
  • Prevalent - Occurring frequently in practice
  • Buggy - Causing or correlated with faults
slide-18
SLIDE 18

Classifier

18

if (x = 2) foo();

[Diagram: syntax tree of the statement, with nodes if, =, x, 2, foo, ()]

slide-19
SLIDE 19

Classifier

19

if (x = 2) foo();

[Diagram: syntax tree of the statement, with nodes if, =, x, 2, foo, (), passed to the Classifier]

slide-20
SLIDE 20

Classifier

20

if (x = 2) foo();

[Diagram: syntax tree of the statement, with nodes if, =, x, 2, foo, (), passed to the Classifier]

Two Atoms of Confusion:

  • Assignment as Value
  • Omitted Curly Brace

slide-21
SLIDE 21

Corpus

21

slide-22
SLIDE 22

How Often do Atoms Occur?

1 atom every ~12 lines
1 atom every ~44 lines

22

slide-23
SLIDE 23

Which Atoms Occur Most Frequently?

1 every ~51 lines
1 every ~1.6 million lines

23

slide-24
SLIDE 24

Are Confusing Patterns Less Common?

24

[Chart: how often each atom occurs, compared with its effect size φ]

slide-25
SLIDE 25

Prevalent

ulpmc->cmd = htobe32(V_ULPTX_CMD(ULP_TX_MEM_WRITE) | is_t4(sc) ? F_ULP_MEMIO_ORDER : F_T5_ULP_MEMIO_IMM);

25

https://github.com/freebsd/freebsd/blob/3c60e22da7d4460db7adb2b916f55e22b7d60e26/sys/dev/cxgbe/tom/t4_ddp.c#L766

slide-26
SLIDE 26

Prevalent

ulpmc->cmd = htobe32(V_ULPTX_CMD(ULP_TX_MEM_WRITE) | is_t4(sc) ? F_ULP_MEMIO_ORDER : F_T5_ULP_MEMIO_IMM);

Contains:

  • Operator Precedence
  • Conditional Operator
  • Implicit Predicate

26

https://github.com/freebsd/freebsd/blob/3c60e22da7d4460db7adb2b916f55e22b7d60e26/sys/dev/cxgbe/tom/t4_ddp.c#L766

slide-28
SLIDE 28

Outline

28

Atoms of Confusion are ...

  • Confusing - Both in the lab and in the wild
  • Prevalent - Occurring frequently in practice
  • Buggy - Causing or correlated with faults
slide-29
SLIDE 29

Are Atoms Removed More In Bug Fix Commits?

29

slide-30
SLIDE 30

Are Atoms Commented More Often?

30

slide-32
SLIDE 32

Buggy

32

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))

slide-33
SLIDE 33

Buggy

33

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => ???

slide-34
SLIDE 34

Buggy

34

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1

slide-35
SLIDE 35

Buggy

35

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1
ABS(-2) => ???

slide-36
SLIDE 36

Buggy

36

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1
ABS(-2) => 2

slide-37
SLIDE 37

Buggy

37

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1
ABS(-2) => 2
ABS(1-2) => ???

slide-38
SLIDE 38

Buggy

38

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1
ABS(-2) => 2
ABS(1-2) => 1

slide-39
SLIDE 39

Buggy

39

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1) => 1
ABS(-2) => 2
ABS(1-2) => -3 (not 1)

slide-40
SLIDE 40

Buggy

40

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1-2)

slide-41
SLIDE 41

Buggy

41

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1-2)
((x) < 0 ? (-x) : (x))

slide-43
SLIDE 43

Buggy

43

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1-2)
((1-2) < 0 ? (-1-2) : (1-2))

slide-45
SLIDE 45

Buggy

45

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))
ABS(1-2)
((1-2) < 0 ? (-1-2) : (1-2))
=> -3
slide-47
SLIDE 47

Buggy

47

https://github.com/torvalds/linux/commit/7aa92c4229fefff0cab6930cf977f4a0e3e606d8

#define ABS(x) ((x) < 0 ? (-x) : (x))

Macro Operator Precedence

slide-48
SLIDE 48

Buggy

48

slide-49
SLIDE 49

Summary

49

Atoms of Confusion are ...

  • Confusing
      ○ Atoms are statistically more confusing than other code in the lab
      ○ Atoms are 13% more likely to be commented than other code
  • Prevalent
      ○ We found millions of examples in our corpus
      ○ 1 in ~23 lines of code has an atom
  • Buggy
      ○ Bug-fix commits are 25% more likely to remove atoms
      ○ We found and fixed a handful of bugs in Linux

slide-50
SLIDE 50

Prevalence of Confusing Code in Software Projects

Atoms of Confusion in the Wild

Dan Gopstein NYU

Hongwei Henry Zhou, Phyllis Frankl, Justin Cappos AtomsOfConfusion.com

Thank You

50