The General Social Survey

The General Social Survey. INFERENCE FOR CATEGORICAL DATA IN R. Andrew Bray, Assistant Professor of Statistics at Reed College.


  1. The General Social Survey. INFERENCE FOR CATEGORICAL DATA IN R. Andrew Bray, Assistant Professor of Statistics at Reed College

  7. Exploring GSS

         library(dplyr)
         glimpse(gss)

         Observations: 3,300
         Variables: 25
         $ id     <dbl> 518, 1092, 2094, 229, 979, 554, 491, 319, 3143, 1...
         $ year   <dbl> 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1...
         $ age    <fct> 49, 22, 26, 75, 71, 33, 56, 33, 69, 40, 44, 42, 5...
         $ class  <fct> WORKING CLASS, WORKING CLASS, WORKING CLASS, LOWE...
         $ degree <fct> HIGH SCHOOL, HIGH SCHOOL, HIGH SCHOOL, LT HIGH SC...
         $ sex    <fct> MALE, MALE, MALE, MALE, FEMALE, FEMALE, MALE, FEM...
         $ happy  <fct> HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, ...

  8. Exploring GSS

         library(ggplot2)

         # Keep only the 2016 respondents, then plot the count of each response.
         gss2016 <- filter(gss, year == 2016)
         ggplot(gss2016, aes(x = happy)) +
           geom_bar()

  10. Exploring GSS

          p_hat <- gss2016 %>%
            summarize(prop_happy = mean(happy == "HAPPY")) %>%
            pull()
          p_hat

          0.7733333

  11. General 95% confidence interval: $(\hat{p} - 2 \times SE,\ \hat{p} + 2 \times SE)$. Sample proportion plus or minus two standard errors.

  12. Bootstrap

  26. Bootstrap Confidence Interval

          library(infer)

          # Resample the 2016 data 500 times and compute the proportion of
          # "HAPPY" responses in each bootstrap replicate.
          boot <- gss2016 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop")
          boot

          Response: happy (factor)
          # A tibble: 500 x 2
            replicate  stat
                <int> <dbl>
          1         1 0.827
          2         2 0.740
          3         3 0.780
          4         4 0.773
          5         5 0.747
          6         6 0.753

  27. Bootstrap Confidence Interval

          ggplot(boot, aes(x = stat)) +
            geom_density()

  28. Bootstrap Confidence Interval

          SE <- boot %>%
            summarize(sd(stat)) %>%
            pull()
          SE

          0.03482251

      $(\hat{p} - 2 \times SE,\ \hat{p} + 2 \times SE)$

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7051883 0.8412784

  29. Let's practice!
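
The bootstrap interval from slides 26 to 28 can also be sketched in base R, which makes the resampling step explicit. This is a minimal illustration, not the course's code: the `gss` data are not reproduced here, so the vector `happy` is a stand-in built from assumed counts (116 "HAPPY" out of 150, which matches the sample proportion 0.7733333 on slide 10), the level name "UNHAPPY" is hypothetical, and the seed is arbitrary.

```r
# Stand-in for gss2016$happy (an assumption): 150 responses, 116 of them "HAPPY".
# "UNHAPPY" is a hypothetical label for every other response.
set.seed(2016)  # arbitrary seed so the resampling is reproducible
happy <- rep(c("HAPPY", "UNHAPPY"), times = c(116, 34))
p_hat <- mean(happy == "HAPPY")

# One bootstrap replicate: draw n responses with replacement from the sample
# and recompute the proportion of "HAPPY".
boot_prop <- function(x) {
  resample <- sample(x, size = length(x), replace = TRUE)
  mean(resample == "HAPPY")
}

# 500 replicates, mirroring reps = 500 in the infer pipeline.
boot_stats <- replicate(500, boot_prop(happy))

# The bootstrap standard error is the standard deviation of the replicates.
SE <- sd(boot_stats)

# Sample proportion plus or minus two standard errors.
ci <- c(lower = p_hat - 2 * SE, upper = p_hat + 2 * SE)
print(round(ci, 3))
```

Because the resampling is random, the standard error and the interval endpoints shift slightly from run to run, which is also why the intervals printed on different slides do not agree to every digit.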

  30. Interpreting a Confidence Interval

  31. Confidence intervals. Conclusion: the true proportion of Americans that are happy is between 0.705 and 0.841. What do we mean by confident?

  32. Dataset 1

          ds1 <- filter(gss, year == 2016)

          p_hat <- ds1 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds1 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7073114 0.8393553

  43. Dataset 2

          ds2 <- filter(gss, year == 2014)

          # Both the point estimate and the standard error use ds2.
          p_hat <- ds2 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds2 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.8348831 0.9384503

  44. Dataset 3

          ds3 <- filter(gss, year == 2012)

          # Both the point estimate and the standard error use ds3.
          p_hat <- ds3 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds3 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7626359 0.8906974

  50. Confidence Intervals

      Interpretation: "We're 95% confident that the true proportion of Americans that are happy is between 0.705 and 0.841."

      Width of the interval is affected by
      - n
      - confidence level
      - p

  51. Let's practice!
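
One way to make "95% confident" concrete is to repeat the Dataset 1, 2, 3 exercise many times against a population whose true proportion is known, and count how often the interval captures it. The sketch below is a simulation under stated assumptions, not part of the original slides: `true_p`, `n`, `n_datasets`, and the seed are all chosen only for illustration. It uses the fact that resampling n responses with replacement from a sample with proportion `p_hat` produces Binomial(n, `p_hat`) counts of successes, so the bootstrap replicates can be drawn with `rbinom()`.

```r
set.seed(1)          # arbitrary seed for reproducibility
true_p <- 0.77       # assumed population proportion (illustrative only)
n <- 150             # assumed number of respondents per dataset
n_datasets <- 1000   # how many independent datasets to simulate

covers <- replicate(n_datasets, {
  # Draw one dataset and compute its sample proportion.
  successes <- rbinom(1, size = n, prob = true_p)
  p_hat <- successes / n

  # 500 bootstrap proportions: each resample of size n from this dataset
  # has a Binomial(n, p_hat) number of successes.
  boot_stats <- rbinom(500, size = n, prob = p_hat) / n
  SE <- sd(boot_stats)

  # Does this dataset's interval contain the true proportion?
  (p_hat - 2 * SE <= true_p) && (true_p <= p_hat + 2 * SE)
})

# Share of intervals that captured true_p; this should be close to 0.95.
coverage <- mean(covers)
print(coverage)
```

The confidence level describes the procedure: across many datasets, roughly 95% of the intervals built this way contain the true proportion, even though any single interval either does or does not.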

  52. The approximation shortcut

  53. Confidence Intervals

      Standard errors increase when
      - n is small
      - p is close to 0.5

          SE
          0.009998905

          SE_small_n
          0.03809731

          SE_low_p
          0.00547912

  56. The normal distribution, a.k.a. the "bell curve". If observations are independent and n is large, then $\hat{p}$ follows a normal distribution.

  57. Standard deviation: $\sqrt{\dfrac{\hat{p} \times (1 - \hat{p})}{n}}$
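
The standard deviation formula on slide 57 also explains the two claims on slide 53, that standard errors grow when n is small and when p is close to 0.5. The helper below is a small sketch; the name `se_approx` and the example values of n and p are assumptions made for illustration, not values from the course data.

```r
# Approximate standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n).
se_approx <- function(p_hat, n) {
  sqrt(p_hat * (1 - p_hat) / n)
}

# Effect of n: same proportion, smaller sample, larger standard error.
se_large_n <- se_approx(p_hat = 0.5, n = 1000)
se_small_n <- se_approx(p_hat = 0.5, n = 100)

# Effect of p: same sample size, proportion far from 0.5, smaller standard error.
se_low_p <- se_approx(p_hat = 0.1, n = 100)

print(c(large_n = se_large_n, small_n = se_small_n, low_p = se_low_p))
```

The product p(1 - p) peaks at p = 0.5, so for a fixed n the standard error is largest there, and dividing by n means that quadrupling the sample size halves the standard error.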

  58. Assessing model assumptions

      How do I check "observations are independent"? This depends upon the data collection method.

      What does "n is large" mean?
      - $n \times \hat{p} > 10$
      - $n \times (1 - \hat{p}) > 10$

  59. Calculating standard error: approximation

          p_hat <- gss2016 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()
          n <- nrow(gss2016)

          # Expected counts of successes and failures; both should exceed 10.
          c(n * p_hat, n * (1 - p_hat))

          116 34

          SE_approx <- sqrt(p_hat * (1 - p_hat) / n)
          SE_approx

          0.03418468
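
Putting slides 56 to 59 together, the approximation shortcut can be wrapped in one function that checks the "n is large" rule of thumb and then returns the interval. This is a sketch under assumptions: the function name `approx_ci` and its arguments are hypothetical, and the counts 116 and 150 are inferred from the sample proportion and the expected counts shown on the slides rather than read from the `gss` data.

```r
# Normal-approximation interval for a proportion:
# p_hat plus or minus `multiplier` standard errors.
approx_ci <- function(successes, n, multiplier = 2) {
  p_hat <- successes / n

  # Rule of thumb from slide 58: at least 10 expected successes and failures.
  if (n * p_hat <= 10 || n * (1 - p_hat) <= 10) {
    warning("n * p_hat and n * (1 - p_hat) should both exceed 10")
  }

  SE <- sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - multiplier * SE, upper = p_hat + multiplier * SE)
}

# Assumed counts: 116 "HAPPY" responses out of 150 respondents.
ci <- approx_ci(successes = 116, n = 150)
print(round(ci, 3))
```

With these counts the interval lands close to the bootstrap intervals reported earlier, which is the point of the shortcut: when the conditions hold, the formula gives nearly the same standard error without any resampling.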
