DataCamp Supervised Learning in R: Case Studies
Predicting voter turnout from survey data
SUPERVISED LEARNING IN R: CASE STUDIES
Julia Silge
Data Scientist at Stack Overflow
Response             Code
Strongly agree       1
Agree                2
Disagree             3
Strongly disagree    4
> voters %>%
+     count(turnout16_2016)
# A tibble: 2 x 2
  turnout16_2016     n
  <fct>          <int>
1 Did not vote     264
2 Voted           6428
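The counts above imply a heavily imbalanced outcome. As a minimal sketch that uses only the two counts shown (the object names `turnout_counts`, `turnout_share`, and `baseline_accuracy` are ours, not from the course), the class shares can be computed in base R:

```r
# Counts copied from the count() output above
turnout_counts <- c("Did not vote" = 264, "Voted" = 6428)

# Share of each class among all respondents
turnout_share <- turnout_counts / sum(turnout_counts)
round(turnout_share, 4)

# A model that always predicts "Voted" would already be right this often
baseline_accuracy <- max(turnout_share)
baseline_accuracy
```

Roughly 4% of respondents did not vote, so accuracy on its own says little about how well a classifier finds non-voters.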
               Elections don't matter   Gay rights are very important   Crime is very important
Did not vote   55.3%                    17.0%                           66.3%
Voted          34.1%                    25.3%                           57.6%
> library(broom)
>
> simple_glm %>%
+     tidy() %>%
+     filter(p.value < 0.05) %>%
+     arrange(desc(estimate))
                   term    estimate  std.error statistic      p.value
1           (Intercept)  2.45703562 0.73272138  3.353301 7.985370e-04
2          imiss_a_2016  0.39712084 0.13898678  2.857256 4.273207e-03
3          imiss_l_2016  0.27468893 0.10678119  2.572447 1.009825e-02
4          imiss_q_2016  0.24456695 0.11909335  2.053573 4.001699e-02
5            track_2016  0.24107452 0.12146679  1.984695 4.717843e-02
6  RIGGED_SYSTEM_1_2016  0.23628350 0.08508091  2.777162 5.483579e-03
7      futuretrend_2016  0.21056782 0.07120079  2.957380 3.102651e-03
8  RIGGED_SYSTEM_5_2016  0.19025188 0.09645384  1.972466 4.855648e-02
9           wealth_2016 -0.06940523 0.02634395 -2.634580 8.424157e-03
10         imiss_k_2016 -0.18103020 0.08272555 -2.188323 2.864611e-02
11       econtrend_2016 -0.29536980 0.08722417 -3.386330 7.083422e-04
12         imiss_f_2016 -0.32328040 0.10543220 -3.066240 2.167694e-03
13         imiss_g_2016 -0.33203385 0.07867346 -4.220405 2.438640e-05
14         imiss_n_2016 -0.44161183 0.09003981 -4.904628 9.360434e-07
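Logistic regression estimates are on the log-odds scale, so exponentiating them gives odds ratios that are often easier to read. The sketch below uses three estimates copied from the output above. The data frame `coefs` is typed in by hand for illustration, and the direction of each effect depends on which outcome level `simple_glm` treats as the event, which the output does not show.

```r
# Three estimates copied from the tidy() output above (hand-built for illustration)
coefs <- data.frame(
  term     = c("imiss_a_2016", "wealth_2016", "imiss_n_2016"),
  estimate = c(0.39712084, -0.06940523, -0.44161183)
)

# exp() converts a log-odds coefficient into an odds ratio
coefs$odds_ratio <- exp(coefs$estimate)
coefs
```

An odds ratio above 1 means a one-unit increase in the predictor is associated with higher odds of the modeled outcome, holding the other predictors fixed. A value below 1 means lower odds.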
method = "cv"
method = "repeatedcv"
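These two values are resampling options for caret's `trainControl()`: `"cv"` runs k-fold cross-validation once, while `"repeatedcv"` repeats it with fresh random fold assignments. The base R sketch below shows only the fold bookkeeping, not caret's internals. The helper `make_folds` and the values of `n`, `k`, and `repeats` are illustrative assumptions.

```r
# Assign each of n rows to one of k folds at random (hypothetical helper)
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

set.seed(1234)
n <- 20   # illustrative number of rows
k <- 5    # illustrative number of folds

# Analogue of method = "cv": one random partition into k folds
cv_folds <- make_folds(n, k)

# Analogue of method = "repeatedcv": several independent partitions,
# stored as one column per repeat
repeats <- 3
repeated_folds <- replicate(repeats, make_folds(n, k))

# Within a partition, each fold serves once as the held-out set
held_out_rows <- which(cv_folds == 1)
```

Repeating the partitioning costs more compute but averages performance over several random splits, which may give a steadier estimate when one class is rare.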
> confusionMatrix(predict(fit_glm, training),
+                 training$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote          149  1633
  Voted                  63  3510

               Accuracy : 0.6833
                 95% CI : (0.6706, 0.6957)
    No Information Rate : 0.9604
    P-Value [Acc > NIR] : 1

                  Kappa : 0.0847
 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.70283
            Specificity : 0.68248
         Pos Pred Value : 0.08361
         Neg Pred Value : 0.98237
             Prevalence : 0.03959
         Detection Rate : 0.02782
   Detection Prevalence : 0.33277
      Balanced Accuracy : 0.69266

       'Positive' Class : Did not vote
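The summary statistics follow directly from the four cells of the table. As a sketch, the matrix below is typed in by hand from the logistic regression training output above (rows are predictions, columns are the reference), with "Did not vote" as the positive class. The object names are ours.

```r
# Cells copied from the confusion matrix above; matrix() fills column by column
cm <- matrix(
  c(149, 63, 1633, 3510),
  nrow = 2,
  dimnames = list(
    Prediction = c("Did not vote", "Voted"),
    Reference  = c("Did not vote", "Voted")
  )
)

tp <- cm["Did not vote", "Did not vote"]  # non-voters correctly flagged
fn <- cm["Voted", "Did not vote"]         # non-voters predicted as voters
fp <- cm["Did not vote", "Voted"]         # voters predicted as non-voters
tn <- cm["Voted", "Voted"]                # voters correctly identified

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
accuracy    <- (tp + tn) / sum(cm)
ppv         <- tp / (tp + fp)  # positive predictive value

round(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, ppv = ppv), 4)
```

The low positive predictive value reflects the class imbalance: most respondents flagged as "Did not vote" are in fact voters.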
> confusionMatrix(predict(fit_rf, training),
+                 training$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote          212     5
  Voted                   0  5138

               Accuracy : 0.9991
                 95% CI : (0.9978, 0.9997)
    No Information Rate : 0.9604
    P-Value [Acc > NIR] : < 2e-16

                  Kappa : 0.9879
 Mcnemar's Test P-Value : 0.07364

            Sensitivity : 1.00000
            Specificity : 0.99903
         Pos Pred Value : 0.97696
         Neg Pred Value : 1.00000
             Prevalence : 0.03959
         Detection Rate : 0.03959
   Detection Prevalence : 0.04052
      Balanced Accuracy : 0.99951

       'Positive' Class : Did not vote
> confusionMatrix(predict(fit_glm, testing),
+                 testing$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote           37   428
  Voted                  15   857

               Accuracy : 0.6687
                 95% CI : (0.6427, 0.6939)
    No Information Rate : 0.9611
    P-Value [Acc > NIR] : 1

                  Kappa : 0.0787
 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.71154
            Specificity : 0.66693
         Pos Pred Value : 0.07957
         Neg Pred Value : 0.98280
             Prevalence : 0.03889
         Detection Rate : 0.02767
   Detection Prevalence : 0.34779
      Balanced Accuracy : 0.68923

       'Positive' Class : Did not vote
> confusionMatrix(predict(fit_rf, testing),
+                 testing$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote            0    14
  Voted                  52  1271

               Accuracy : 0.9506
                 95% CI : (0.9376, 0.9616)
    No Information Rate : 0.9611
    P-Value [Acc > NIR] : 0.9767

                  Kappa : -0.0168
 Mcnemar's Test P-Value : 5.254e-06

            Sensitivity : 0.00000
            Specificity : 0.98911
         Pos Pred Value : 0.00000
         Neg Pred Value : 0.96070
             Prevalence : 0.03889
         Detection Rate : 0.00000
   Detection Prevalence : 0.01047
      Balanced Accuracy : 0.49455

       'Positive' Class : Did not vote
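Putting the random forest's training and testing results side by side makes the overfitting visible. The cell counts below are copied from the two random forest confusion matrices above. The helper `sens_spec` is a hypothetical name introduced for this sketch.

```r
# Sensitivity and specificity from the four cells, positive class = "Did not vote"
sens_spec <- function(tp, fn, fp, tn) {
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

rf_training <- sens_spec(tp = 212, fn = 0,  fp = 5,  tn = 5138)
rf_testing  <- sens_spec(tp = 0,   fn = 52, fp = 14, tn = 1271)

comparison <- rbind(training = rf_training, testing = rf_testing)
round(comparison, 5)

# Share of voters in the testing set: the accuracy of always predicting "Voted"
no_information_rate <- (14 + 1271) / (0 + 14 + 52 + 1271)
no_information_rate
```

The forest identifies every non-voter in the training data but none in the testing data, and its testing accuracy of 0.9506 sits below the no-information rate of 0.9611.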
> library(yardstick)
>
> sens(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
[1] 0.7115385
>
> spec(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
[1] 0.6669261
>
> sens(testing_results, truth = turnout16_2016, estimate = `Random forest`)
[1] 0
>
> spec(testing_results, truth = turnout16_2016, estimate = `Random forest`)
[1] 0.9891051