How to Lie with Statistics

March 3, 2020, Data Science CSCI 1951A

  1. How to Lie with Statistics
     March 3, 2020
     Data Science CSCI 1951A, Brown University
     Instructor: Ellie Pavlick
     HTAs: Josh Levin, Diane Mutako, Sol Zitter

  2. Announcements

  3. Today
     • Linear Regression Recap/Follow up
     • P-Hacking, Researcher Degrees of Freedom

  4. Today
     • Linear Regression Recap/Follow up
     • P-Hacking, Researcher Degrees of Freedom

  5. Dummy Variables

         X = [ 20  31  0  1  1 ]
             [ 20   5  0  1  1 ]
             [ 20  40  0  1  1 ]
             [ 25  18  1  0  1 ]

     Labels: cholesterol, eucalyptus, meds, yes breakfast, no breakfast, constant
     • Why do we have to do this?
     • What about pseudo-inverse?
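One way to look at the two questions on the Dummy Variables slide is to build the design matrix both ways and compare. The sketch below is only an illustration under assumptions: the data are synthetic, the column order (eucalyptus, meds, no breakfast, yes breakfast, constant) is a guess at the slide's layout, and the coefficients used to generate `chol` are made up. It shows that keeping both breakfast dummies together with a constant makes the columns linearly dependent, and that the pseudo-inverse still returns a fit in that case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic stand-ins for the slide's variables (assumed, not course data).
eucalyptus = rng.uniform(0, 40, n)
meds = rng.uniform(0, 40, n)
yes_breakfast = rng.integers(0, 2, n).astype(float)  # 1 = ate breakfast
no_breakfast = 1.0 - yes_breakfast                   # complementary dummy
constant = np.ones(n)

# Made-up relationship, only so there is something to fit.
chol = 180 + 0.5 * eucalyptus - 1.0 * meds + 10 * yes_breakfast + rng.normal(0, 5, n)

# Both dummies plus a constant: no_breakfast + yes_breakfast == constant.
X_full = np.column_stack([eucalyptus, meds, no_breakfast, yes_breakfast, constant])
# One dummy dropped: the remaining columns are linearly independent.
X_drop = np.column_stack([eucalyptus, meds, yes_breakfast, constant])

rank_full = np.linalg.matrix_rank(X_full)  # fewer than the 5 columns
rank_drop = np.linalg.matrix_rank(X_drop)

# The pseudo-inverse picks the minimum-norm solution for the rank-deficient X.
beta_pinv = np.linalg.pinv(X_full, rcond=1e-10) @ chol
beta_drop, *_ = np.linalg.lstsq(X_drop, chol, rcond=None)

# Both designs span the same column space, so the fitted values agree.
same_fit = np.allclose(X_full @ beta_pinv, X_drop @ beta_drop)

print("rank with both dummies:", rank_full, "of", X_full.shape[1], "columns")
print("rank with one dummy:   ", rank_drop, "of", X_drop.shape[1], "columns")
print("same fitted values:    ", same_fit)
```

The predictions match, but with both dummies the individual coefficients are not uniquely determined, so they are harder to read as effect sizes.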

  6. statsmodels

         import statsmodels.api as sm

         y, X = read_data()        # read_data() is the slide's placeholder loader
         X = sm.add_constant(X)    # add the intercept column to X
         model = sm.OLS(y, X)
         results = model.fit()
         print(results.summary())

     https://www.statsmodels.org/dev/examples/notebooks/generated/ols.html
     https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html

  7. statsmodels

         import statsmodels.formula.api as smf

         # M has column headers w/ names
         M = read_data()
         # no sm.add_constant() here: the formula API adds the intercept itself
         eq = "chol ~ eucalyptus + meds + breakfast"
         model = smf.ols(formula=eq, data=M)
         results = model.fit()
         print(results.summary())

  8. statsmodels
     interaction term

         import statsmodels.formula.api as smf

         # M has column headers w/ names
         M = read_data()
         # eucalyptus:meds adds the product of the two columns as a predictor
         eq = "chol ~ eucalyptus + meds + breakfast + eucalyptus:meds"
         model = smf.ols(formula=eq, data=M)
         results = model.fit()
         print(results.summary())

  9. statsmodels
     squared terms

         import statsmodels.formula.api as smf

         # M has column headers w/ names
         M = read_data()
         # wrap the power in I() so it is computed arithmetically;
         # a bare ^ does not square a term inside a formula
         eq = "chol ~ eucalyptus + meds + breakfast + I(eucalyptus**2)"
         model = smf.ols(formula=eq, data=M)
         results = model.fit()
         print(results.summary())
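The interaction and squared terms on the preceding slides amount to extra columns in the design matrix. The sketch below builds those columns by hand with NumPy and fits by least squares. It is an illustration only: the data are synthetic, and the "true" coefficients (for example 0.02 for the interaction and 0.01 for the squared term) are invented so the fit can be checked against them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic predictors named after the slide's columns (assumed data).
eucalyptus = rng.uniform(0, 40, n)
meds = rng.uniform(0, 40, n)
breakfast = rng.integers(0, 2, n).astype(float)

# Made-up ground truth that includes an interaction and a squared term.
chol = (150 + 0.8 * eucalyptus - 0.5 * meds + 5 * breakfast
        + 0.02 * eucalyptus * meds
        + 0.01 * eucalyptus ** 2
        + rng.normal(0, 1, n))

# Design matrix: each formula term becomes one column.
X = np.column_stack([
    np.ones(n),          # intercept
    eucalyptus,          # eucalyptus
    meds,                # meds
    breakfast,           # breakfast
    eucalyptus * meds,   # interaction term: product of the two columns
    eucalyptus ** 2,     # squared term: elementwise square
])

beta, *_ = np.linalg.lstsq(X, chol, rcond=None)

names = ["const", "eucalyptus", "meds", "breakfast", "eucalyptus*meds", "eucalyptus^2"]
for name, b in zip(names, beta):
    print(f"{name:>16s}: {b: .4f}")
```

The model stays linear in its coefficients even though the columns are nonlinear functions of the inputs, which is why ordinary least squares still applies.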

  10. statsmodels

  11. statsmodels
      overall fit of model (SSE)

  12. statsmodels
      coefficients (i.e. effect sizes)

  13. statsmodels
      p-values

  14. statsmodels
      p-values

  15. Clicker Question!

  16. Today
      • Linear Regression Recap/Follow up
      • P-Hacking, Researcher Degrees of Freedom

  17. You can find almost anything if you look hard enough.
      [Figure: per capita cheese consumption (lbs) correlates with the number of people who died by becoming tangled in their bedsheets (deaths), 2000 to 2009; ρ = 0.95. Source: tylervigen.com]
      https://en.wikipedia.org/wiki/Data_dredging
      http://www.tylervigen.com/spurious-correlations
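The point of the cheese and bedsheets slide can be reproduced with pure noise. The sketch below is an illustration under assumptions: it draws one random "target" series of 10 yearly values (matching the 2000 to 2009 span on the slide) and 2000 unrelated random candidate series, then reports the candidate with the largest absolute Pearson correlation. The series count, seed, and variable names are arbitrary choices, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 10        # ten yearly observations, as in the slide's chart
n_candidates = 2000  # number of unrelated series we "look through"

target = rng.normal(size=n_points)
candidates = rng.normal(size=(n_candidates, n_points))

# Standardize each series (population std) so that the mean of the
# elementwise product is the Pearson correlation coefficient.
t = (target - target.mean()) / target.std()
c = candidates - candidates.mean(axis=1, keepdims=True)
c = c / candidates.std(axis=1, keepdims=True)
r = c @ t / n_points

best_idx = int(np.argmax(np.abs(r)))
best_r = float(r[best_idx])

print("strongest correlation among random series:", round(best_r, 3))
print("series with |r| > 0.6:", int(np.sum(np.abs(r) > 0.6)))
```

None of the candidate series has any relationship to the target, yet searching over enough of them turns up a strong correlation. This is the data-dredging effect the slide warns about.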
