Lies, Damned Lies and Statistics, PyCon UK 2019, @MarcoBonzanini



slide-1
SLIDE 1

Lies, Damned Lies
 and Statistics

@MarcoBonzanini

PyCon UK 2019

slide-2
SLIDE 2

In the Vatican City
 there are 5.88 popes
 per square mile
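
The figure on this slide is a plain division. A minimal sketch, assuming one pope and an area of about 0.17 square miles (roughly 0.44 km²) for the Vatican City; both inputs are assumptions added here, not values stated on the slide.

```python
# Assumed inputs (not stated on the slide)
POPES = 1
VATICAN_AREA_SQ_MILES = 0.17  # roughly 0.44 km^2

# A density computed over an area smaller than one square mile
# gives a technically correct but absurd headline number.
popes_per_sq_mile = POPES / VATICAN_AREA_SQ_MILES
print(f"{popes_per_sq_mile:.2f} popes per square mile")
```

The arithmetic is correct; the choice of unit is what makes the statistic misleading.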

slide-3
SLIDE 3

This talk is about: the misuse of stats in everyday life

This talk is NOT about: Python

The audience (you!): good citizens, with an interest in statistical literacy (without an advanced Math degree?)

slide-4
SLIDE 4

LIES, DAMNED LIES
 AND CORRELATION

slide-5
SLIDE 5

Correlation

slide-6
SLIDE 6

Correlation

  • Informal: a connection between two things
  • Measure the strength of the association between two variables
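
One common way to measure the strength of a linear association is Pearson's correlation coefficient. The sketch below is illustrative only: the `pearson` helper and the temperature and sales numbers are hypothetical, not taken from the talk.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r: co-variation of xs and ys, scaled to the range [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily temperatures (°C) and ice cream sales ($)
temperature = [14, 16, 20, 23, 26, 30]
sales = [215, 325, 410, 445, 520, 610]

r_positive = pearson(temperature, sales)                # close to +1
r_negative = pearson(temperature, [-s for s in sales])  # same strength, opposite sign
print(f"positive: {r_positive:.2f}, negative: {r_negative:.2f}")
```

A value near +1 or -1 indicates a strong linear association, while a value near 0 indicates little or no linear association.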

slide-7
SLIDE 7

Linear Correlation

slide-8
SLIDE 8

Linear Correlation

[Figure: two plots of y against x, one labelled "Positive" and one labelled "Negative"]

slide-9
SLIDE 9

Correlation Example

slide-10
SLIDE 10

Correlation Example

[Figure: ice cream sales ($$$) plotted against temperature]

slide-11
SLIDE 11

“Correlation 
 does not imply
 causation”

slide-12
SLIDE 12

[Figure: deaths by drowning and ice cream sales ($$$) plotted against each other]

slide-13
SLIDE 13

Lurking Variable

slide-14
SLIDE 14

Lurking Variable

[Figure: two plots, ice cream sales ($$$) against temperature and deaths by drowning against temperature]
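
The effect of a lurking variable can be reproduced with a small simulation. This is a sketch under made-up assumptions: temperature drives both ice cream sales and drownings, the two never influence each other, and all coefficients and noise levels are invented for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the only driver of both quantities
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream = [10 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

# Raw correlation between two variables that do not affect each other
r_raw = pearson(ice_cream, drownings)

# Partial correlation: the association left after accounting for temperature
r_it = pearson(ice_cream, temperature)
r_dt = pearson(drownings, temperature)
r_partial = (r_raw - r_it * r_dt) / sqrt((1 - r_it**2) * (1 - r_dt**2))

print(f"raw: {r_raw:.2f}, controlling for temperature: {r_partial:.2f}")
```

The raw correlation is strong, but it largely disappears once temperature is taken into account.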

slide-15
SLIDE 15

More Lurking Variables

slide-16
SLIDE 16

More Lurking Variables

[Figure: damage caused by fire vs. firefighters deployed]

slide-17
SLIDE 17

More Lurking Variables

[Figure: damage caused by fire vs. firefighters deployed. Lurking variable: fire severity?]

slide-18
SLIDE 18

Correlation and causation

slide-19
SLIDE 19

Correlation and causation

[Figure: diagrams of possible relationships between variables A, B and C]

slide-20
SLIDE 20

http://www.tylervigen.com/spurious-correlations

slide-21
SLIDE 21

http://www.tylervigen.com/spurious-correlations

slide-22
SLIDE 22

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

slide-23
SLIDE 23

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

slide-24
SLIDE 24

http://www.nejm.org/doi/full/10.1056/NEJMon1211064

slide-25
SLIDE 25

LIES, DAMNED LIES,
 SLICING AND DICING
 YOUR DATA

slide-26
SLIDE 26

Simpson’s Paradox

slide-27
SLIDE 27

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-28
SLIDE 28

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

Gender bias?

slide-29
SLIDE 29

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-30
SLIDE 30

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-31
SLIDE 31

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox

slide-32
SLIDE 32

University of California, Berkeley Graduate school admissions in 1973

https://en.wikipedia.org/wiki/Simpson%27s_paradox
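
Simpson's paradox can be shown with a few lines of arithmetic. The counts below are hypothetical and deliberately simple; they are NOT the Berkeley figures, which are available at the Wikipedia link above.

```python
# Hypothetical (applicants, admitted) counts per department and group
admissions = {
    "Dept X": {"men": (100, 80), "women": (20, 18)},
    "Dept Y": {"men": (20, 4), "women": (100, 30)},
}

def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants

def overall_rate(group):
    """Admission rate for a group after pooling all departments."""
    applicants = sum(dept[group][0] for dept in admissions.values())
    admitted = sum(dept[group][1] for dept in admissions.values())
    return rate(applicants, admitted)

for name, dept in admissions.items():
    print(name, {group: f"{rate(*counts):.0%}" for group, counts in dept.items()})
print("Overall", {group: f"{overall_rate(group):.0%}" for group in ("men", "women")})
```

In this made-up example women have the higher admission rate in each department, yet the lower rate overall, because most women applied to the department that rejects most applicants.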

slide-33
SLIDE 33

LIES, DAMNED LIES
 AND SAMPLING BIAS

slide-34
SLIDE 34

Sampling

slide-35
SLIDE 35

Sampling

  • A selection of a subset of individuals
  • Purpose: estimate something about the whole population
  • Hello Big Data!
slide-36
SLIDE 36

Bias

slide-37
SLIDE 37

Bias

  • Prejudice? Intuition?
  • Cultural context?
  • In science: a systematic error
slide-38
SLIDE 38

“Dewey defeats Truman”

slide-39
SLIDE 39

https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

“Dewey defeats Truman”

slide-40
SLIDE 40

“Dewey defeats Truman”

  • The Chicago Tribune printed the wrong headline on election night
  • The editor trusted the results of the phone survey
  • … in 1948, a sample of phone users was not representative of the general population

https://en.wikipedia.org/wiki/Dewey_Defeats_Truman
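
The mechanism behind this kind of sampling bias can be sketched with invented numbers. The group sizes and vote shares below are hypothetical, chosen only to show how polling an unrepresentative subgroup can flip the predicted winner; they are not historical data.

```python
# Hypothetical electorate: (number of voters, share voting for Dewey)
phone_owners = (300, 0.60)
no_phone = (700, 0.40)

def dewey_share(*groups):
    """Share of Dewey voters across the given groups."""
    voters = sum(n for n, _ in groups)
    dewey_votes = sum(n * share for n, share in groups)
    return dewey_votes / voters

survey_estimate = dewey_share(phone_owners)       # only phone owners can be polled
true_share = dewey_share(phone_owners, no_phone)  # everybody votes

print(f"phone survey: {survey_estimate:.0%} Dewey, whole population: {true_share:.0%} Dewey")
```

Polling more phone owners would not help: the error is systematic, so a larger sample only gives a more precise estimate of the wrong quantity.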

slide-41
SLIDE 41

Survivorship Bias

slide-42
SLIDE 42

Survivorship Bias

  • Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs
  • … should you quit studying?
slide-43
SLIDE 43

LIES, DAMNED LIES
 AND DATAVIZ

slide-44
SLIDE 44

“A picture is worth 
 a thousand words”

slide-45
SLIDE 45

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
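
Anscombe's quartet is four small data sets with nearly identical summary statistics that look completely different when plotted. The sketch below uses the published quartet values and a hand-written `pearson` helper (an illustrative name, not from the talk) to show how little the summary numbers reveal.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by data sets I, II and III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of x, mean of y and correlation for each data set
summary = {
    name: (sum(xs) / len(xs), sum(ys) / len(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}
for name, (mean_x, mean_y, r) in summary.items():
    print(f"{name:>3}: mean x = {mean_x:.2f}, mean y = {mean_y:.2f}, r = {r:.3f}")
```

The four rows come out almost the same, which is why plotting the data before trusting the summary is worthwhile.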

slide-46
SLIDE 46

https://venngage.com/blog/misleading-graphs/

slide-47
SLIDE 47

https://venngage.com/blog/misleading-graphs/

slide-48
SLIDE 48

https://venngage.com/blog/misleading-graphs/

slide-49
SLIDE 49

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-50
SLIDE 50

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-51
SLIDE 51

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

slide-52
SLIDE 52

https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

slide-53
SLIDE 53

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-54
SLIDE 54

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-55
SLIDE 55

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-56
SLIDE 56

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-57
SLIDE 57

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-58
SLIDE 58

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

slide-59
SLIDE 59

LIES, DAMNED LIES
 AND SIGNIFICANCE

slide-60
SLIDE 60

Significant = Important

?

slide-61
SLIDE 61

Statistically Significant Results

slide-62
SLIDE 62

Statistically Significant Results

  • We are fairly confident they are not due to chance alone
  • Maybe they’re not “big”
  • Maybe they’re not important
  • Maybe they’re not useful for decision making
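
The gap between "significant" and "important" can be illustrated with a two-sample z-test. This is a sketch under simplifying assumptions (known standard deviation, equal group sizes); the effect size and sample sizes are hypothetical.

```python
from math import erfc, sqrt

def two_sided_p(effect, n_per_group, sd=1.0):
    """Two-sided p-value of a z-test for a difference in means (known sd, equal groups)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = effect / standard_error
    return erfc(abs(z) / sqrt(2))

tiny_effect = 0.01  # hypothetical difference: 1% of a standard deviation

p_small_sample = two_sided_p(tiny_effect, n_per_group=100)
p_huge_sample = two_sided_p(tiny_effect, n_per_group=1_000_000)

print(f"n = 100 per group:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000 per group: p = {p_huge_sample:.2e}")
```

The effect is equally tiny in both cases. Only the sample size changed, and with enough data even a negligible difference becomes statistically significant.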

slide-63
SLIDE 63

p-values

slide-64
SLIDE 64

https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

slide-65
SLIDE 65

p-values

  • Probability of observing our results (or more extreme) when the null hypothesis is true
  • Probability, not certainty
  • Often p < 0.05 (arbitrary)
  • Can we afford to be fooled by randomness 1 time out of 20?
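
The definition above can be made concrete with a coin-flip experiment, where the null hypothesis is "the coin is fair". A minimal sketch using an exact binomial calculation; the function name and the chosen head counts are illustrative.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Probability, under a fair coin, of a head count at least as far from flips/2 as observed."""
    distance = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - flips / 2) >= distance
    )
    return extreme_outcomes / 2**flips

p_60 = two_sided_p_value(60, 100)
p_61 = two_sided_p_value(61, 100)
print(f"60 heads out of 100: p = {p_60:.3f}")
print(f"61 heads out of 100: p = {p_61:.3f}")
```

One extra head moves the result from one side of the 0.05 threshold to the other, which shows how arbitrary the cut-off is.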

slide-66
SLIDE 66

Data dredging

slide-67
SLIDE 67
slide-68
SLIDE 68

Data dredging

  • a.k.a. Data fishing or p-hacking
  • Convention: formulate hypothesis, collect data, prove/disprove hypothesis
  • Data dredging: look for patterns until something statistically significant comes up
  • Looking for patterns is ok. Testing the hypothesis on the same data set is not
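
Why dredging "works" can be seen from a short calculation. Assuming independent tests in which every null hypothesis is true (a simplification), the chance of at least one false alarm grows quickly with the number of patterns tested.

```python
ALPHA = 0.05  # conventional significance threshold

def chance_of_false_alarm(num_tests, alpha=ALPHA):
    """Probability that at least one of num_tests independent true-null tests gives p < alpha."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20, 100):
    print(f"{num_tests:>3} tests: {chance_of_false_alarm(num_tests):.1%} chance of a spurious 'discovery'")
```

With 20 unrelated patterns tested, a "significant" result is more likely than not, even when there is nothing to find.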

slide-69
SLIDE 69

LIES, DAMNED LIES
 AND CELEBRITIES ON TWITTER

slide-70
SLIDE 70

https://twitter.com/billgates/status/1118196606975787008

slide-71
SLIDE 71

P(mosquito | death) ≠ P(death | mosquito)
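
The two conditional probabilities on this slide share a numerator but have very different denominators. A sketch with hypothetical counts, invented for illustration and not taken from the linked tweet:

```python
# Hypothetical counts for one year in an imaginary region
people_bitten = 500_000      # people bitten by a mosquito at least once
deaths_by_animals = 1_000    # all deaths caused by animals
deaths_by_mosquito = 700     # of those, deaths traced back to a mosquito bite

# P(mosquito | death): among animal-caused deaths, how many involve a mosquito?
p_mosquito_given_death = deaths_by_mosquito / deaths_by_animals

# P(death | mosquito): among people bitten, how many die as a result?
p_death_given_mosquito = deaths_by_mosquito / people_bitten

print(f"P(mosquito | death) = {p_mosquito_given_death:.1%}")
print(f"P(death | mosquito) = {p_death_given_mosquito:.2%}")
```

In this made-up example the mosquito is behind most animal-caused deaths, yet a single bite is very unlikely to be fatal.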

slide-72
SLIDE 72

SUMMARY

slide-73
SLIDE 73

“Everybody lies” (Dr. House)

slide-74
SLIDE 74

  • Good Science™ vs. Big headlines
  • Nobody is immune
  • Ask questions:
      What is the context?
      Who’s paying?
      What’s missing?
  • … “so what?”

slide-75
SLIDE 75

THANK YOU

@MarcoBonzanini @PyDataLondon