Lies, Damned Lies and Statistics EuroPython 2018 Edinburgh, UK



slide-1
SLIDE 1

Lies, Damned Lies and Statistics

@MarcoBonzanini

EuroPython 2018 Edinburgh, UK July 2018

slide-2
SLIDE 2

In the Vatican City there are 5.88 popes per square mile

2
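The headline figure is simply one pope divided by the area of Vatican City. A minimal sketch of that arithmetic, assuming an area of roughly 0.17 square miles (the area is not stated on the slide and is an assumption here):

```python
# Assumption: Vatican City covers roughly 0.17 square miles
# (this value does not appear on the slide).
VATICAN_AREA_SQ_MILES = 0.17
POPES = 1

popes_per_sq_mile = POPES / VATICAN_AREA_SQ_MILES
print(f"{popes_per_sq_mile:.2f} popes per square mile")
```

The result is arithmetically fine and still tells us nothing useful, which is the point of the slide.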

slide-3
SLIDE 3

This talk is about:

  • The misuse of statistics in everyday life
  • How (not) to lie with statistics

This talk is not about:

  • Python
  • Advanced Statistical Models

The audience (you!):

  • Good citizens
  • An interest in statistical literacy
    (without an advanced Math degree?)

3

slide-4
SLIDE 4

LIES, DAMNED LIES AND CORRELATION

slide-5
SLIDE 5

Correlation

5

slide-6
SLIDE 6

Correlation

  • Informal: a connection between two things
  • A measure of the strength of the association between two variables

6

slide-7
SLIDE 7

Linear Correlation

7

slide-8
SLIDE 8

Linear Correlation

8

[Figure: two scatter plots of y against x, one showing positive and one showing negative linear correlation]
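Linear correlation is usually quantified with the Pearson coefficient r, which ranges from -1 (perfect negative) to +1 (perfect positive). A small sketch, using made-up data points that are not from the talk:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative data, not taken from the talk.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # tends to rise with x
y_down = [9, 7, 6, 4, 1]  # tends to fall with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"positive example: r = {r_up:.2f}")
print(f"negative example: r = {r_down:.2f}")
```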

slide-9
SLIDE 9

Correlation Example

9

slide-10
SLIDE 10

Correlation Example

10

[Figure: ice cream sales ($$$) plotted against temperature]

slide-11
SLIDE 11

“Correlation does not imply causation”

11

slide-12
SLIDE 12

12

[Figure: deaths by drowning vs. ice cream sales ($$$)]

slide-13
SLIDE 13

13

Lurking Variable

slide-14
SLIDE 14

14

Lurking Variable: Temperature

  • Temperature and Ice Cream Sales ($$$)
  • Temperature and Deaths by drowning
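The effect of a lurking variable can be reproduced with a small simulation. The sketch below assumes a toy model in which temperature drives both ice cream sales and drownings, which never influence each other; every coefficient is invented for illustration and the units are arbitrary.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical model: temperature drives both quantities.
# All coefficients and noise levels are invented.
temperature = [random.uniform(0, 30) for _ in range(1000)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 3) for t in temperature]

r_sales_drown = pearson_r(ice_cream_sales, drownings)
r_sales_temp = pearson_r(ice_cream_sales, temperature)
r_drown_temp = pearson_r(drownings, temperature)

# Partial correlation: the association left once temperature is held fixed.
partial = (r_sales_drown - r_sales_temp * r_drown_temp) / sqrt(
    (1 - r_sales_temp ** 2) * (1 - r_drown_temp ** 2)
)

print(f"sales vs drownings:            r = {r_sales_drown:.2f}")
print(f"same, controlling temperature: r = {partial:.2f}")
```

In this toy model the raw correlation is strong, while the partial correlation is close to zero once temperature is accounted for.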

slide-15
SLIDE 15

More Lurking Variables

15

slide-16
SLIDE 16

More Lurking Variables

16

[Figure: damage caused by fire vs. firefighters deployed]

slide-17
SLIDE 17

More Lurking Variables

17

[Figure: damage caused by fire vs. firefighters deployed]

Fire severity?

slide-18
SLIDE 18

Correlation and causation

18

slide-19
SLIDE 19

Correlation and causation

  • A causes B, or B causes A
  • A and B both cause C
  • C causes A and B
  • A causes C, and C causes B
  • No connection between A and B

19

slide-20
SLIDE 20

20

http://www.tylervigen.com/spurious-correlations

slide-21
SLIDE 21

21

http://www.tylervigen.com/spurious-correlations

slide-22
SLIDE 22

22

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

slide-23
SLIDE 23

23

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

slide-24
SLIDE 24

24

http://www.nejm.org/doi/full/10.1056/NEJMon1211064

slide-25
SLIDE 25

LIES, DAMNED LIES, SLICING AND DICING YOUR DATA

slide-26
SLIDE 26

Simpson’s Paradox

26

slide-27
SLIDE 27

27

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-28
SLIDE 28

28

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

Gender bias?

slide-29
SLIDE 29

29

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-30
SLIDE 30

30

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-31
SLIDE 31

31

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-32
SLIDE 32

32

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox
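Simpson's paradox is easiest to see with numbers. The sketch below uses invented counts for two hypothetical departments (these are NOT the 1973 Berkeley figures): women have the higher admission rate in each department, yet the lower rate overall, because they mostly apply to the more selective department.

```python
# Invented counts for illustration only; NOT the 1973 Berkeley figures.
# Each entry maps (department, group) to (applicants, admitted).
applications = {
    ("A", "men"): (80, 48),
    ("A", "women"): (20, 14),
    ("B", "men"): (20, 2),
    ("B", "women"): (80, 16),
}

def admission_rate(group, department=None):
    """Admission rate for a group, overall or within one department."""
    applied = admitted = 0
    for (dept, grp), (n_applied, n_admitted) in applications.items():
        if grp == group and department in (None, dept):
            applied += n_applied
            admitted += n_admitted
    return admitted / applied

for dept in ("A", "B"):
    print(f"department {dept}: men {admission_rate('men', dept):.0%}, "
          f"women {admission_rate('women', dept):.0%}")
print(f"overall:      men {admission_rate('men'):.0%}, "
      f"women {admission_rate('women'):.0%}")
```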

slide-33
SLIDE 33

LIES, DAMNED LIES AND SAMPLING BIAS

slide-34
SLIDE 34

Sampling

34

slide-35
SLIDE 35

Sampling

35

  • A selection of a subset of individuals
  • Purpose: estimate something about the whole population
  • Hello Big Data!
slide-36
SLIDE 36

Bias

36

slide-37
SLIDE 37

Bias

37

  • Prejudice? Intuition?
  • Cultural context?
  • In science: a systematic error
slide-38
SLIDE 38

“Dewey defeats Truman”

38

slide-39
SLIDE 39

“Dewey defeats Truman”

39

https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

slide-40
SLIDE 40

“Dewey defeats Truman”

40

https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

  • The Chicago Tribune printed the wrong headline on election night
  • The editor trusted the results of the phone survey
  • … in 1948, a sample of phone users was not representative of the general population
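How an unrepresentative sample skews an estimate can be sketched with a hypothetical electorate. All numbers below are invented for illustration and are not historical data; "candidate X" is a placeholder.

```python
# Hypothetical electorate; every number here is invented for illustration.
# Each entry maps a group to (voters, voters preferring candidate X).
electorate = {
    "has_phone": (300, 180),
    "no_phone": (700, 280),
}

total_voters = sum(n for n, _ in electorate.values())
total_for_x = sum(k for _, k in electorate.values())
true_share = total_for_x / total_voters

# A phone survey can only ever reach the "has_phone" group.
phone_voters, phone_for_x = electorate["has_phone"]
phone_survey_share = phone_for_x / phone_voters

print(f"true share for X:      {true_share:.0%}")
print(f"phone survey estimate: {phone_survey_share:.0%}")
```

Surveying more phone owners would not help here: the error is systematic, so a larger sample only makes the wrong answer look more precise.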

slide-41
SLIDE 41

Survivorship Bias

41

slide-42
SLIDE 42

Survivorship Bias

  • Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs
  • … should you quit studying?

42

slide-43
SLIDE 43

LIES, DAMNED LIES AND DATAVIZ

slide-44
SLIDE 44

“A picture is worth a thousand words”

44

slide-45
SLIDE 45

45

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
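Anscombe's quartet consists of four small data sets with nearly identical summary statistics that look completely different when plotted. The sketch below computes the means and the correlation for each set; the values are the quartet as commonly published and are worth checking against the linked page.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Anscombe's quartet as commonly published (verify against the source).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    print(f"{name:>3}: mean x = {mean(xs):.2f}, mean y = {mean(ys):.2f}, "
          f"r = {pearson_r(xs, ys):.3f}")
```

The summary numbers agree to about two decimal places across the four sets; only a plot reveals how different they are.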

slide-46
SLIDE 46

46

https://venngage.com/blog/misleading-graphs/

slide-47
SLIDE 47

47

https://venngage.com/blog/misleading-graphs/

slide-48
SLIDE 48

48

https://venngage.com/blog/misleading-graphs/

slide-49
SLIDE 49

49

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-50
SLIDE 50

50

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-51
SLIDE 51

51

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-52
SLIDE 52

52

https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

slide-53
SLIDE 53

53

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-54
SLIDE 54

54

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-55
SLIDE 55

55

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-56
SLIDE 56

56

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-57
SLIDE 57

57

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-58
SLIDE 58

58

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-59
SLIDE 59

LIES, DAMNED LIES AND SIGNIFICANCE

slide-60
SLIDE 60

Significant = Important?

60

slide-61
SLIDE 61

Statistically Significant Results

61

slide-62
SLIDE 62

Statistically Significant Results

62

  • We are quite sure they are reliable (not due to chance)
  • Maybe they’re not “big”
  • Maybe they’re not important
  • Maybe they’re not useful for decision making
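
The gap between "significant" and "big" can be illustrated with a hypothetical comparison of two very large groups. The sketch assumes a two-sample z-test with known, equal standard deviations; all numbers are invented.

```python
from math import erfc, sqrt

# Hypothetical A/B comparison; all numbers are invented for illustration.
n_per_group = 1_000_000   # observations in each group
std_dev = 1.0             # assumed known and equal in both groups
mean_difference = 0.01    # observed difference between the group means

# Two-sample z-test for a difference in means with known variance.
standard_error = std_dev * sqrt(2 / n_per_group)
z = mean_difference / standard_error
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal model

# Standardised effect size (Cohen's d): difference in units of std_dev.
effect_size = mean_difference / std_dev

print(f"z = {z:.2f}, p = {p_value:.1e}, effect size d = {effect_size}")
```

With a million observations per group, a difference of one hundredth of a standard deviation is overwhelmingly "significant", yet it may be far too small to matter for any decision.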
slide-63
SLIDE 63

p-values

63

slide-64
SLIDE 64

64

https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

slide-65
SLIDE 65

p-values

  • Probability of observing our results (or more extreme) when the null hypothesis is true
  • Probability, not certainty
  • Often p < 0.05 (arbitrary)
  • Can we afford to be fooled by randomness 1 time out of 20?

65
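The definition of a p-value can be made concrete with a coin. A minimal sketch, assuming the null hypothesis "the coin is fair" and an exact two-sided binomial test; the helper name `two_sided_p_value` is made up for this example.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact two-sided binomial p-value under the null 'the coin is fair'.

    Adds up the probability of every outcome at least as far from
    flips / 2 as the observed number of heads.
    """
    observed_distance = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    return extreme_outcomes / 2 ** flips

p = two_sided_p_value(60, 100)
print(f"p-value for 60 heads in 100 flips: {p:.3f}")
```

In this setup 60 heads out of 100 falls just above the conventional 0.05 threshold and 61 heads falls below it, which shows how arbitrary the cut-off is.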

slide-66
SLIDE 66

Data dredging

66

slide-67
SLIDE 67

67

slide-68
SLIDE 68

Data dredging

  • a.k.a. data fishing or p-hacking
  • Convention: formulate hypothesis, collect data, prove/disprove hypothesis
  • Data dredging: look for patterns until something statistically significant comes up
  • Looking for patterns is ok
    Testing the hypothesis on the same data set is not

68
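Why dredging "works" can be sketched by testing many hypotheses on pure noise. The simulation below assumes every coin is fair, so each "significant" result is a false positive; the number of hypotheses and the seed are arbitrary choices for this example.

```python
import random
from math import comb

FLIPS = 100
HYPOTHESES = 1000  # number of unrelated "effects" we go fishing for

def two_sided_p_value(heads, flips=FLIPS):
    """Exact two-sided binomial p-value under the null 'the coin is fair'."""
    distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips

# Precompute the p-value for every possible head count.
p_table = [two_sided_p_value(k) for k in range(FLIPS + 1)]

random.seed(0)  # fixed seed so the sketch is repeatable
false_positives = 0
for _ in range(HYPOTHESES):
    # Pure noise: every coin is fair, so the null hypothesis is always true.
    heads = sum(random.random() < 0.5 for _ in range(FLIPS))
    if p_table[heads] < 0.05:
        false_positives += 1

print(f"{false_positives} of {HYPOTHESES} null effects look 'significant'")
```

Reporting only the hits from such a search, without mentioning how many tests were run, is what turns exploration into data dredging.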

slide-69
SLIDE 69

SUMMARY

slide-70
SLIDE 70

— Dr. House

“Everybody lies”

70

slide-71
SLIDE 71

71

  • Good Science™ vs. big headlines
  • Nobody is immune
  • Ask questions: What is the context? Who’s paying? What’s missing?
  • … “so what?”
slide-72
SLIDE 72

THANK YOU

@MarcoBonzanini

speakerdeck.com/marcobonzanini
GitHub.com/bonzanini
marcobonzanini.com

slide-73
SLIDE 73