Lies, Damned Lies and Statistics, EuroPython 2018, Edinburgh, UK

  1. Lies, Damned Lies and Statistics
 EuroPython 2018, Edinburgh, UK, July 2018
 @MarcoBonzanini

  2. In the Vatican City there are 5.88 popes per square mile
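
The figure on this slide comes from a single division. A minimal sketch, assuming one pope and an area of roughly 0.17 square miles (about 0.44 square kilometres) for the Vatican City; both inputs are assumptions added here, not values stated on the slide:

```python
# Assumed inputs (not stated on the slide): one pope, and an area of
# roughly 0.17 square miles (about 0.44 square kilometres).
popes = 1
area_sq_miles = 0.17

popes_per_sq_mile = popes / area_sq_miles
print(f"{popes_per_sq_mile:.2f} popes per square mile")
```

The arithmetic is sound, yet the rate describes an area roughly six times larger than the country itself, which is why the number misleads.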

  3. This talk is about:
 • The misuse of statistics in everyday life
 • How (not) to lie with statistics
 This talk is not about:
 • Python
 • Advanced Statistical Models
 The audience (you!):
 • Good citizens
 • An interest in statistical literacy (without an advanced Math degree?)

  4. LIES, DAMNED LIES 
 AND CORRELATION

  5. Correlation

  6. Correlation
 • Informal: a connection between two things
 • Measure the strength of the association between two variables

  7. Linear Correlation

  8. Linear Correlation [Figure: two scatter plots of y against x, labeled Positive and Negative]
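
One common way to measure the strength of a linear association is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The slides do not name a specific coefficient, so Pearson's r is an assumption here. A minimal sketch with made-up data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative data, not taken from the talk.
x = [1, 2, 3, 4, 5, 6]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]     # rises with x
y_down = [11.9, 10.1, 8.2, 5.8, 4.1, 2.0]  # falls as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"positive: r = {r_up:.3f}")
print(f"negative: r = {r_down:.3f}")
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same quantity.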

  9. Correlation Example

  10. Correlation Example [Figure: Ice Cream Sales ($$$) plotted against Temperature]

  11. “Correlation does not imply causation”

  12. [Figure: Deaths by drowning plotted against Ice Cream Sales ($$$)]

  13. Lurking Variable

  14. Lurking Variable [Figure: two plots, Deaths by drowning against Temperature and Ice Cream Sales ($$$) against Temperature]
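
The lurking-variable effect can be reproduced with simulated data. In this sketch temperature drives both ice cream sales and drownings, and neither quantity influences the other. All coefficients and noise levels are invented for illustration. The partial correlation formula used to control for temperature is a standard technique, not something shown on the slides:

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature drives both quantities.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [20 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

r_xy = pearson_r(ice_cream_sales, drownings)
r_xz = pearson_r(ice_cream_sales, temperature)
r_yz = pearson_r(drownings, temperature)

# Partial correlation of sales and drownings, controlling for temperature.
partial = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"sales vs drownings:          r = {r_xy:.2f}")
print(f"controlling for temperature: r = {partial:.2f}")
```

The raw correlation is strong, but it should largely vanish once temperature is accounted for, since the simulated link between the two runs only through temperature.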

  15. More Lurking Variables

  16. More Lurking Variables [Figure: Damage caused by fire plotted against Firefighters deployed]

  17. More Lurking Variables [Figure: Damage caused by fire plotted against Firefighters deployed, annotated “Fire severity?”]

  18. Correlation and causation

  19. Correlation and causation
 • A causes B, or B causes A
 • A and B both cause C
 • C causes A and B
 • A causes C, and C causes B
 • No connection between A and B

  20-21. http://www.tylervigen.com/spurious-correlations

  22-23. https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

  24. http://www.nejm.org/doi/full/10.1056/NEJMon1211064

  25. LIES, DAMNED LIES, 
 SLICING AND DICING 
 YOUR DATA

  26. Simpson’s Paradox

  27. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox

  28. University of California, Berkeley: graduate school admissions in 1973. Gender bias? https://en.wikipedia.org/wiki/Simpson%27s_paradox

  29-32. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox
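
The admission figures on these slides were images and did not survive in this transcript. The reversal behind Simpson's paradox can still be sketched with made-up numbers. The departments, applicant counts, and admission counts below are hypothetical and are NOT the Berkeley data:

```python
# Hypothetical numbers chosen to show the effect; NOT the Berkeley figures.
# department -> group -> (applicants, admitted)
admissions = {
    "A": {"men": (800, 480), "women": (100, 70)},
    "B": {"men": (200, 40), "women": (900, 225)},
}

def rate(applicants, admitted):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applicants

# Per-department rates: women do better in each department.
for dept, groups in admissions.items():
    for group, (applicants, admitted) in groups.items():
        print(f"Dept {dept}, {group}: {rate(applicants, admitted):.1%}")

# Aggregate rates: the direction flips.
overall = {}
for group in ("men", "women"):
    applicants = sum(admissions[d][group][0] for d in admissions)
    admitted = sum(admissions[d][group][1] for d in admissions)
    overall[group] = rate(applicants, admitted)
    print(f"Overall, {group}: {overall[group]:.1%}")
```

In this toy data women have the higher admission rate in both departments, yet the lower rate overall, because most women applied to the department that rejects most applicants.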

  33. LIES, DAMNED LIES 
 AND SAMPLING BIAS

  34. Sampling

  35. Sampling
 • A selection of a subset of individuals
 • Purpose: estimate something about the whole population
 • Hello Big Data!

  36. Bias

  37. Bias
 • Prejudice? Intuition?
 • Cultural context?
 • In science: a systematic error

  38. “Dewey defeats Truman”

  39. “Dewey defeats Truman” https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

  40. “Dewey defeats Truman”
 • The Chicago Tribune printed the wrong headline on election night
 • The editor trusted the results of the phone survey
 • … in 1948, a sample of phone users was not representative of the general population
 https://en.wikipedia.org/wiki/Dewey_Defeats_Truman
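
A small simulation shows how a poll restricted to phone owners can call the wrong winner. The electorate below is invented for illustration (the sizes and preferences are not historical figures). It only assumes, as the slide suggests, that phone owners leaned differently from everyone else:

```python
import random

rng = random.Random(1948)  # fixed seed so the run is repeatable

# Hypothetical electorate (made-up numbers, not historical data):
# 3,000 phone owners, 60% for Dewey; 7,000 without a phone, 40% for Dewey.
phone_owners = ["Dewey"] * 1800 + ["Truman"] * 1200
no_phone = ["Dewey"] * 2800 + ["Truman"] * 4200
population = phone_owners + no_phone

def dewey_share(voters):
    """Fraction of the given voters who support Dewey."""
    return voters.count("Dewey") / len(voters)

true_share = dewey_share(population)
phone_poll = dewey_share(rng.sample(phone_owners, 2000))   # biased sample
random_poll = dewey_share(rng.sample(population, 2000))    # unbiased sample

print(f"true Dewey share:   {true_share:.1%}")
print(f"phone-only poll:    {phone_poll:.1%}")
print(f"random-sample poll: {random_poll:.1%}")
```

Both polls use the same sample size. The phone-only poll is wrong because of who was asked, not how many: a larger biased sample would only be more confidently wrong.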

  41. Survivorship Bias

  42. Survivorship Bias
 • Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs
 • … should you quit studying?

  43. LIES, DAMNED LIES 
 AND DATAVIZ

  44. “A picture is worth a thousand words”

  45. https://en.wikipedia.org/wiki/Anscombe%27s_quartet
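
Anscombe's quartet makes the case for plotting: its four data sets share nearly identical summary statistics yet look completely different when drawn. A minimal sketch using the first two of the four sets, with values as commonly published (worth checking against the linked page before relying on them):

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Sets I and II of Anscombe's quartet; they share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, y in (("set I", y1), ("set II", y2)):
    print(f"{name}: mean y = {mean(y):.2f}, r = {pearson_r(x, y):.3f}")
```

The summaries agree to two decimal places, although set I is a noisy straight line and set II is a smooth curve. Only a plot reveals the difference.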

  46-48. https://venngage.com/blog/misleading-graphs/

  49-51. http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

  52. https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

  53-58. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  59. LIES, DAMNED LIES 
 AND SIGNIFICANCE

  60. Significant = Important?

  61. Statistically Significant Results

  62. Statistically Significant Results
 • We are quite sure they are reliable (not due to chance)
 • Maybe they’re not “big”
 • Maybe they’re not important
 • Maybe they’re not useful for decision making
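
The gap between "significant" and "big" shows up as soon as the sample gets large. This sketch applies a two-sample z-test to a deliberately negligible effect of 0.01 standard deviations. The test choice, the effect size, and the sample sizes are all assumptions made for illustration:

```python
from math import erfc, sqrt

def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means,
    assuming equal group sizes and a known common standard deviation."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Hypothetical effect: 0.01 standard deviations, practically negligible.
for n in (100, 1_000_000):
    z, p = two_sample_z(mean_diff=0.01, sd=1.0, n_per_group=n)
    print(f"n = {n:>9,}: z = {z:.2f}, p = {p:.3g}")
```

The effect is identical in both runs. Only the sample size changes, and with a million observations per group the same tiny difference becomes overwhelmingly "significant".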

  63. p-values

  64. https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

  65. p-values
 • Probability of observing our results (or more extreme) when the null hypothesis is true
 • Probability, not certainty
 • Often p < 0.05 (arbitrary)
 • Can we afford to be fooled by randomness 1 time out of every 20?
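
The definition of a p-value can be made concrete with a coin. In this hypothetical experiment a coin lands heads 60 times in 100 tosses, and the null hypothesis is that the coin is fair. The one-sided p-value is the probability of 60 or more heads under that hypothesis:

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical experiment: 60 heads in 100 tosses.
# Null hypothesis: the coin is fair (p = 0.5).
p_value = binomial_tail(n=100, k=60)
print(f"one-sided p-value: {p_value:.4f}")
```

The result falls below the conventional 0.05 threshold, yet it is still a probability rather than a verdict: a perfectly fair coin produces an outcome this extreme a few times in every hundred experiments.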

  66. Data dredging

  68. Data dredging
 • a.k.a. data fishing or p-hacking
 • Convention: formulate a hypothesis, collect data, prove/disprove the hypothesis
 • Data dredging: look for patterns until something statistically significant comes up
 • Looking for patterns is OK; testing the hypothesis on the same data set is not
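
Why dredging works so reliably follows from the 0.05 threshold itself. If every null hypothesis is true and the tests are independent (a simplifying assumption), the chance that at least one of k tests comes out "significant" is 1 - (1 - 0.05)^k:

```python
alpha = 0.05  # conventional significance threshold

def chance_of_false_positive(num_tests, alpha=alpha):
    """Probability of at least one p < alpha across independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {chance_of_false_positive(k):.1%}")
```

With 20 unrelated patterns tried on pure noise, the odds of finding at least one publishable "effect" are already better than even.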

  69. SUMMARY

  70. “Everybody lies” (Dr. House)

  71. • Good Science™ vs. big headlines
 • Nobody is immune
 • Ask questions: What is the context? Who’s paying? What’s missing?
 • … “so what?”

  72. THANK YOU @MarcoBonzanini speakerdeck.com/marcobonzanini GitHub.com/bonzanini marcobonzanini.com
