SLIDE 1

DataCamp Supervised Learning in R: Case Studies

Predicting voter turnout from survey data

SUPERVISED LEARNING IN R: CASE STUDIES

Julia Silge

Data Scientist at Stack Overflow

SLIDE 2

Views of the Electorate Research Survey (VOTER)

Democracy Fund Voter Study Group
Politically diverse group of analysts and scholars in the United States
Data is freely available

SLIDE 3

Views of the Electorate Research Survey (VOTER)

Life in America today for people like you compared to fifty years ago is better? about the same? worse?
Was your vote primarily a vote in favor of your choice or was it mostly a vote against his/her opponent?
How important are the following issues to you?
  Crime
  Immigration
  The environment
  Gay rights

SLIDE 4

Views of the Electorate Research Survey (VOTER)

SLIDE 5

Interpreting integer survey responses

AMERICA IS A FAIR SOCIETY WHERE EVERYONE HAS THE OPPORTUNITY TO GET AHEAD

Response            Code
Strongly agree      1
Agree               2
Disagree            3
Strongly disagree   4

Learn more about the data yourself
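The response/code table above can be applied in R by turning the integer codes into a labeled factor. The sketch below is only an illustration: the vector `fair_society` and its values are hypothetical, and only the four labels and their codes come from the slide.

```r
# Hypothetical vector of integer-coded answers to the "fair society" question
fair_society <- c(1L, 4L, 2L, 2L, 3L, 1L)

# Labels follow the response/code table on the slide
fair_labels <- c("Strongly agree", "Agree", "Disagree", "Strongly disagree")

# Map each integer code to its label; ordered = TRUE keeps the agreement scale
fair_factor <- factor(fair_society, levels = 1:4, labels = fair_labels,
                      ordered = TRUE)
print(table(fair_factor))

# Lower codes mean stronger agreement, so "agrees" corresponds to code <= 2
share_agree <- mean(fair_society <= 2)
print(share_agree)
```

Keeping the raw integers for modeling and using the factor only for display is one reasonable choice; whether to treat such codes as numeric or categorical predictors is a modeling decision the slide does not settle.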

SLIDE 6

Predicting voter turnout

> voters %>%
+     count(turnout16_2016)
# A tibble: 2 x 2
  turnout16_2016     n
  <fct>          <int>
1 Did not vote     264
2 Voted           6428
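The counts above show a strong class imbalance. As a rough check, the following sketch uses only the two counts from the output to compute each class's share and the accuracy a trivial "everyone voted" prediction would reach; the variable names are illustrative.

```r
# Counts copied from the count(turnout16_2016) output above
turnout_counts <- c("Did not vote" = 264, "Voted" = 6428)

# Share of each class among all respondents
turnout_share <- turnout_counts / sum(turnout_counts)
print(round(turnout_share, 4))

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy <- max(turnout_share)
print(round(baseline_accuracy, 4))
```

Any model for this data has to be judged against that baseline rather than against 50%.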

SLIDE 7

Let's get started!

SLIDE 8

VOTE 2016

SLIDE 9

Exploratory data analysis

               Elections don't matter   Gay rights are very important   Crime is very important
Did not vote   55.3%                    17.0%                           66.3%
Voted          34.1%                    25.3%                           57.6%
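A table like the one above is a set of group-wise proportions. The sketch below shows one way to compute such a proportion in base R. The data frame `toy_voters`, its column names, and the coding "1 = very important" are all assumptions made for illustration, not the survey's actual variables.

```r
# Hypothetical toy data; the real survey has thousands of rows and many columns
toy_voters <- data.frame(
  turnout = c("Did not vote", "Did not vote", "Did not vote",
              "Voted", "Voted", "Voted", "Voted"),
  crime_importance = c(1, 1, 2, 1, 2, 3, 2)  # assumed coding: 1 = very important
)

# Proportion in each turnout group that rates the issue "very important"
crime_very_important <- tapply(
  toy_voters$crime_importance == 1,
  toy_voters$turnout,
  mean
)
print(round(crime_very_important, 3))
```

The mean of a logical vector is the share of `TRUE` values, which is why `mean` works here as a proportion.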

SLIDE 10

Exploratory data analysis

SLIDE 11

Fitting a simple model

SLIDE 12

Fitting a simple model

> library(broom)
>
> simple_glm %>%
+     tidy() %>%
+     filter(p.value < 0.05) %>%
+     arrange(desc(estimate))
                   term    estimate  std.error statistic      p.value
 1          (Intercept)  2.45703562 0.73272138  3.353301 7.985370e-04
 2         imiss_a_2016  0.39712084 0.13898678  2.857256 4.273207e-03
 3         imiss_l_2016  0.27468893 0.10678119  2.572447 1.009825e-02
 4         imiss_q_2016  0.24456695 0.11909335  2.053573 4.001699e-02
 5           track_2016  0.24107452 0.12146679  1.984695 4.717843e-02
 6 RIGGED_SYSTEM_1_2016  0.23628350 0.08508091  2.777162 5.483579e-03
 7     futuretrend_2016  0.21056782 0.07120079  2.957380 3.102651e-03
 8 RIGGED_SYSTEM_5_2016  0.19025188 0.09645384  1.972466 4.855648e-02
 9          wealth_2016 -0.06940523 0.02634395 -2.634580 8.424157e-03
10         imiss_k_2016 -0.18103020 0.08272555 -2.188323 2.864611e-02
11       econtrend_2016 -0.29536980 0.08722417 -3.386330 7.083422e-04
12         imiss_f_2016 -0.32328040 0.10543220 -3.066240 2.167694e-03
13         imiss_g_2016 -0.33203385 0.07867346 -4.220405 2.438640e-05
14         imiss_n_2016 -0.44161183 0.09003981 -4.904628 9.360434e-07
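The slides do not show how `simple_glm` was fit. The following is a minimal sketch of the same workflow on simulated data, assuming a plain logistic regression via `glm()`. The data frame `toy`, its columns, and the simulated effect sizes are hypothetical, and the coefficient table is filtered in base R as a stand-in for the `tidy()`, `filter()`, `arrange()` pipeline.

```r
set.seed(2016)

# Hypothetical toy data: two integer-coded survey answers and a turnout outcome
n <- 500
toy <- data.frame(
  issue_a = sample(1:4, n, replace = TRUE),
  issue_b = sample(1:4, n, replace = TRUE)
)

# Simulate turnout so that issue_a matters and issue_b does not (toy assumption)
p_vote <- plogis(2 - 0.8 * toy$issue_a)
toy$turnout <- factor(ifelse(runif(n) < p_vote, "Voted", "Did not vote"),
                      levels = c("Did not vote", "Voted"))

# Logistic regression on all predictors; with a factor response, glm treats
# the first level as failure, so coefficients describe the odds of "Voted"
toy_glm <- glm(turnout ~ ., family = "binomial", data = toy)

# Keep coefficients with p < 0.05, sorted by decreasing estimate
coefs <- as.data.frame(summary(toy_glm)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
coefs$term <- rownames(coefs)
significant <- coefs[coefs$p.value < 0.05, ]
significant <- significant[order(-significant$estimate), ]
print(significant)
```

As in the output above, a positive estimate means higher values of that coded answer go with higher odds of voting, and a negative estimate with lower odds.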

SLIDE 13

Let's build some models!

SLIDE 14

Cross-validation

SLIDE 15

Cross-validation

Partitioning your data into subsets and using one subset for validation

SLIDE 16

Cross-validation

Partitioning your data into subsets and using one subset for validation

method = "cv"
method = "repeatedcv"
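The two `method` values appear to be the resampling options accepted by caret's `trainControl()`. To make the difference concrete, here is a hand-rolled base R sketch on simulated data: it is not caret's implementation, and the helper `cv_accuracy()`, the data frame `toy`, and the choice of 10 folds and 5 repeats are assumptions for illustration.

```r
set.seed(42)

# Hypothetical toy data with a binary outcome
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x))

# Accuracy of a logistic regression under k-fold cross-validation
cv_accuracy <- function(data, k) {
  # Randomly assign every row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_acc <- sapply(seq_len(k), function(i) {
    # Fit on all folds except i, then validate on fold i
    fit <- glm(y ~ x, family = "binomial", data = data[fold != i, ])
    prob <- predict(fit, newdata = data[fold == i, ], type = "response")
    mean((prob > 0.5) == (data$y[fold == i] == 1))
  })
  mean(fold_acc)
}

# Analogue of method = "cv": one pass over 10 folds
single_cv <- cv_accuracy(toy, k = 10)

# Analogue of method = "repeatedcv": redo the 10-fold split 5 times and average
repeated_cv <- mean(replicate(5, cv_accuracy(toy, k = 10)))

print(c(cv = single_cv, repeatedcv = repeated_cv))
```

Repeating the split reduces the noise that comes from any single random partition, at the cost of fitting five times as many models.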

SLIDE 21

Cross-validation

Repeated cross-validation can take a long time
Parallel processing can be worth it
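Because each repeat is independent of the others, repeats can run on separate cores. The sketch below uses the `parallel` package that ships with R; `one_repeat()` is a hypothetical stand-in for one expensive resampling run, and the choice of two workers is arbitrary. With caret, the usual route is to register a foreach backend such as doParallel before calling `train()`, which this sketch does not show.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for one expensive cross-validation repeat
one_repeat <- function(seed) {
  set.seed(seed)
  x <- rnorm(2000)
  y <- rbinom(2000, size = 1, prob = plogis(x))
  fit <- glm(y ~ x, family = "binomial")
  unname(coef(fit)["x"])
}

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Run the five repeats across the workers; each repeat is independent
slopes <- unlist(mclapply(1:5, one_repeat, mc.cores = n_cores))
print(round(slopes, 3))
```

Whether this pays off depends on how long one repeat takes compared with the overhead of starting workers.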

SLIDE 22

Let's practice!

SLIDE 23

Comparing model performance

SLIDE 24

Confusion matrix

> confusionMatrix(predict(fit_glm, training),
+                 training$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote          149  1633
  Voted                  63  3510

               Accuracy : 0.6833
                 95% CI : (0.6706, 0.6957)
    No Information Rate : 0.9604
    P-Value [Acc > NIR] : 1

                  Kappa : 0.0847
 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.70283
            Specificity : 0.68248
         Pos Pred Value : 0.08361
         Neg Pred Value : 0.98237
             Prevalence : 0.03959
         Detection Rate : 0.02782
   Detection Prevalence : 0.33277
      Balanced Accuracy : 0.69266

       'Positive' Class : Did not vote
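The accuracy, no information rate, and kappa reported above can be recomputed from the four cell counts. The sketch below does this in base R using the standard definitions; the variable names are illustrative and the counts are copied from the output.

```r
# Counts copied from the logistic regression training output above
# (rows = prediction, columns = reference; matrix() fills column by column)
classes <- c("Did not vote", "Voted")
cm <- matrix(c(149, 63, 1633, 3510), nrow = 2,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n

# No information rate: accuracy from always predicting the largest reference class
no_info_rate <- max(colSums(cm)) / n

# Cohen's kappa: agreement beyond what the row and column totals give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, no_info_rate = no_info_rate,
              kappa = cohen_kappa), 4))
```

The accuracy sits well below the no information rate, which is why the reported P-Value [Acc > NIR] is 1: on accuracy alone, this model does worse than always predicting "Voted".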

SLIDE 25

Confusion matrix

> confusionMatrix(predict(fit_rf, training),
+                 training$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote          212     5
  Voted                   0  5138

               Accuracy : 0.9991
                 95% CI : (0.9978, 0.9997)
    No Information Rate : 0.9604
    P-Value [Acc > NIR] : < 2e-16

                  Kappa : 0.9879
 Mcnemar's Test P-Value : 0.07364

            Sensitivity : 1.00000
            Specificity : 0.99903
         Pos Pred Value : 0.97696
         Neg Pred Value : 1.00000
             Prevalence : 0.03959
         Detection Rate : 0.03959
   Detection Prevalence : 0.04052
      Balanced Accuracy : 0.99951

       'Positive' Class : Did not vote

SLIDE 26

Confusion matrix for the testing data

> confusionMatrix(predict(fit_glm, testing),
+                 testing$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote           37   428
  Voted                  15   857

               Accuracy : 0.6687
                 95% CI : (0.6427, 0.6939)
    No Information Rate : 0.9611
    P-Value [Acc > NIR] : 1

                  Kappa : 0.0787
 Mcnemar's Test P-Value : <2e-16

            Sensitivity : 0.71154
            Specificity : 0.66693
         Pos Pred Value : 0.07957
         Neg Pred Value : 0.98280
             Prevalence : 0.03889
         Detection Rate : 0.02767
   Detection Prevalence : 0.34779
      Balanced Accuracy : 0.68923

       'Positive' Class : Did not vote

SLIDE 27

Confusion matrix for the testing data

> confusionMatrix(predict(fit_rf, testing),
+                 testing$turnout16_2016)
Confusion Matrix and Statistics

              Reference
Prediction     Did not vote Voted
  Did not vote            0    14
  Voted                  52  1271

               Accuracy : 0.9506
                 95% CI : (0.9376, 0.9616)
    No Information Rate : 0.9611
    P-Value [Acc > NIR] : 0.9767

                  Kappa : -0.0168
 Mcnemar's Test P-Value : 5.254e-06

            Sensitivity : 0.00000
            Specificity : 0.98911
         Pos Pred Value : 0.00000
         Neg Pred Value : 0.96070
             Prevalence : 0.03889
         Detection Rate : 0.00000
   Detection Prevalence : 0.01047
      Balanced Accuracy : 0.49455

       'Positive' Class : Did not vote
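To see the two testing results side by side, the per-class metrics can be recomputed from the cell counts. This is a small base R sketch using the standard definitions, with "Did not vote" as the positive class as in the outputs; the helper `class_metrics()` is a hypothetical name, and the counts are copied from the two testing matrices.

```r
# Sensitivity, specificity, balanced accuracy, and accuracy from a 2x2
# confusion matrix (rows = prediction, columns = reference, first class positive)
class_metrics <- function(cm) {
  sens <- cm[1, 1] / sum(cm[, 1])  # true positives / all reference positives
  spec <- cm[2, 2] / sum(cm[, 2])  # true negatives / all reference negatives
  c(sensitivity = sens, specificity = spec,
    balanced_accuracy = (sens + spec) / 2,
    accuracy = sum(diag(cm)) / sum(cm))
}

classes <- c("Did not vote", "Voted")
# Counts copied from the two testing outputs (matrix() fills column by column)
cm_glm <- matrix(c(37, 15, 428, 857), nrow = 2, dimnames = list(classes, classes))
cm_rf  <- matrix(c(0, 52, 14, 1271), nrow = 2, dimnames = list(classes, classes))

comparison <- round(rbind(`Logistic regression` = class_metrics(cm_glm),
                          `Random forest` = class_metrics(cm_rf)), 4)
print(comparison)
```

The random forest has the higher accuracy on the testing data but identifies none of the 52 non-voters, while the logistic regression finds 37 of them. Its near-perfect training results did not carry over, which is consistent with overfitting.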

SLIDE 28

Comparing model performance

> library(yardstick)
>
> sens(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
[1] 0.7115385
>
> spec(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
[1] 0.6669261
>
> sens(testing_results, truth = turnout16_2016, estimate = `Random forest`)
[1] 0
>
> spec(testing_results, truth = turnout16_2016, estimate = `Random forest`)
[1] 0.9891051

SLIDE 29

Let's finish this case study!