
Statistical inference via data science: A "tidy" approach



  1. Statistical inference via data science: A "tidy" approach
     Albert Y. Kim
     Joint Math Meetings, Denver, CO, USA
     January 18, 2020
     Slides available at twitter.com/rudeboybert

  2. Statistical inference via data science…

  3. What is the tidyverse?
     From: tidyverse.org

  4. Why use the tidyverse?
     1. It encourages students to “play the whole game”
     2. It’s transferable
     3. It bridges the gap between tools for learning statistics & tools for doing statistics

  5. 1. It encourages students to “play the whole game”
     • Emphasize exploratory data analysis (EDA)
     • “To (data) wrangle or not to wrangle? That is the question”
     • In my opinion, doing no data wrangling at all misrepresents the true nature of the work
     From: YouTube, r4ds (2017), Perkins (2009)

  6. 2.a) It transfers: Data visualization
     From: Wilkinson (2005), ggplot2 package, TechCrunch

  7. 2.b) It transfers: Data wrangling
     Normal forms & database normalization
     From: Codd (1970)

  8. 3. It bridges the gap between tools for learning statistics & tools for doing statistics
     tidyverse design principle #4: Design for humans
     From: McNamara (2015), Robinson blog post, tidy tools manifesto

  9. Using the tidyverse in intro stats, assuming no prior algebra or coding:
     1. Statistical modeling
     2. Statistical inference

  10. EDA to Motivate Statistical Modeling
      Question: Are there demographic differences in teaching evaluations?
      From: Chance Magazine

  11. EDA to Motivate Model Selection

  12. EDA to Motivate Statistical Inference
      A “you don’t need no PhD in Statistics” moment:
      Question: Is there a difference in response? (As opposed to just saying: “The p-value is 0!”)

  13. “There is only one test”
      From: Downey blog post
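Downey's post frames every hypothesis test as the same recipe: compute a test statistic from the observed data, simulate data under a model of the null hypothesis, and see how often the simulated statistic is at least as extreme as the observed one. The base-R sketch below is one instance of that recipe, assuming a difference in means between two groups and a permutation (label-shuffling) null model. The helper name `perm_test_diff_means` and the data are hypothetical, made up for illustration; the infer package expresses the same steps through `specify()`, `hypothesize()`, `generate()`, and `calculate()`.

```r
# Hypothetical helper: "only one test" recipe as a two-sided permutation test.
# y: numeric response; group: vector with exactly two distinct labels.
perm_test_diff_means <- function(y, group, reps = 1000, seed = 2020) {
  set.seed(seed)
  levels_g <- unique(group)
  stopifnot(length(levels_g) == 2)

  # Step 1: test statistic = difference in group means
  diff_means <- function(yy, gg) {
    mean(yy[gg == levels_g[1]]) - mean(yy[gg == levels_g[2]])
  }
  observed <- diff_means(y, group)

  # Step 2: null model of "no difference": group labels are exchangeable,
  # so shuffle them and recompute the statistic for each simulated dataset
  null_stats <- replicate(reps, diff_means(y, sample(group)))

  # Step 3: p-value = share of null statistics at least as extreme as observed
  p_value <- mean(abs(null_stats) >= abs(observed))

  list(observed = observed, null_stats = null_stats, p_value = p_value)
}

# Usage with made-up data
y <- c(4.1, 3.9, 4.5, 4.8, 3.2, 3.0, 3.6, 3.4)
group <- rep(c("A", "B"), each = 4)
result <- perm_test_diff_means(y, group)
result$observed
result$p_value
```

Only the test statistic and the null model change from one problem to the next; the simulate-and-compare loop stays the same, which is the point of the “one test” framing.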

  14. infer package for “tidy” statistical inference
      From: Bray, Ismay, Chasnovski, Baumer, and Cetinkaya-Rundel

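  15. What is the mean year of minting of all pennies? Using bootstrap resampling with replacement:

      library(tidyverse)
      library(infer)
      library(moderndive)  # provides the pennies_sample data frame

      # Bootstrap distribution of the sample mean: 1000 resamples drawn
      # with replacement, each summarized by its mean year of minting
      pennies_sample %>%
        specify(response = year) %>%
        generate(reps = 1000, type = "bootstrap") %>%
        calculate(stat = "mean")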

  16. How to make room for the tidyverse
      In my opinion:
      • Drop (combinatorics-based) probability theory
      • De-emphasize χ² tests & ANOVA as much as feasible, given upstream consequences
      • Lean on the “There is only one test” framework
      • Drop asymptotic theory in favor of simulation-based inference: bootstrap & permutation tests
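Simulation-based inference replaces a formula-based standard error with resampling. The base-R sketch below computes a bootstrap percentile confidence interval for a mean: resample the data with replacement many times, record each resample's mean, and take the middle 95% of those means. The helper name `bootstrap_ci_mean` and the minting years are hypothetical, not the pennies data from the talk; in infer, `get_confidence_interval()` plays the role of the final step.

```r
# Hypothetical helper: bootstrap percentile confidence interval for a mean.
bootstrap_ci_mean <- function(x, reps = 1000, level = 0.95, seed = 2020) {
  set.seed(seed)
  # Resample x with replacement (same size as x) and record each resample's mean
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  # Percentile method: cut off (1 - level) / 2 in each tail of the bootstrap distribution
  alpha <- 1 - level
  ci <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(boot_means = boot_means, lower = ci[1], upper = ci[2])
}

# Usage with made-up minting years
years <- c(1976, 1983, 1988, 1990, 1995, 1999, 2000, 2003, 2005, 2010, 2014, 2018)
ci <- bootstrap_ci_mean(years)
c(mean(years), ci$lower, ci$upper)
```

No normal approximation or t-table is involved: the spread of the resampled means stands in for the sampling distribution, which is what lets asymptotic theory be set aside in an intro course.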

  17. Guiding paper: “Mere Renovation is Too Little Too Late: We Need to Rethink Our Undergraduate Curriculum from the Ground Up” by Cobb (2015)
      • Make fundamental concepts accessible
      • Minimize prerequisites to research
      • Replace “mathematics” with “computation” as the engine of statistics

  18. For more info, check out:
      • Available free online at moderndive.com
      • Print copies now on sale at the Taylor & Francis booth and on the CRC Press website: use discount code ASA18
      • Slides available at twitter.com/rudeboybert

  19. EDA to Motivate Model Selection: 2017 Massachusetts Public High School Data
