Predicting voter turnout from survey data

DataCamp, Supervised Learning in R: Case Studies
Julia Silge, Data Scientist at Stack Overflow

Views of the Electorate Research Survey (VOTER)
    Democracy Fund Voter Study Group
    Politically diverse group of analysts and scholars in the United States
    Data is freely available

Views of the Electorate Research Survey (VOTER)
    Life in America today for people like you compared to fifty years ago is: better? about the same? worse?
    Was your vote primarily a vote in favor of your choice, or was it mostly a vote against his/her opponent?
    How important are the following issues to you?
        Crime
        Immigration
        The environment
        Gay rights


Interpreting integer survey responses
Survey item: "America is a fair society where everyone has the opportunity to get ahead"

    Response             Code
    Strongly agree       1
    Agree                2
    Disagree             3
    Strongly disagree    4

Learn more about the data yourself!
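The coding scheme above can be made explicit in R. The following is a minimal sketch, assuming a made-up vector of answers (`fair_society` is a hypothetical name, not a column from the survey); it attaches the four labels to the integer codes with base R's `factor()`.

```r
# Hypothetical integer-coded answers to the "fair society" item (made-up values)
fair_society <- c(1L, 4L, 2L, 2L, 3L, 1L)

# Attach the labels from the table to codes 1-4, keeping their order
fair_society_labeled <- factor(
  fair_society,
  levels  = 1:4,
  labels  = c("Strongly agree", "Agree", "Disagree", "Strongly disagree"),
  ordered = TRUE
)

# Count how often each labeled response occurs
table(fair_society_labeled)
```

Note that a lower code means stronger agreement here, so the direction of any coefficient fitted on the raw integers has to be read with the coding in mind.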

Predicting voter turnout

    > voters %>%
    +     count(turnout16_2016)
    # A tibble: 2 x 2
      turnout16_2016     n
      <fct>          <int>
    1 Did not vote     264
    2 Voted           6428
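The two counts show a strong class imbalance, which matters for every accuracy figure later in the deck. As a small worked calculation in base R (the variable names are illustrative only), the counts can be turned into class shares and a majority-class baseline:

```r
# Counts copied from the count() output above
turnout_counts <- c("Did not vote" = 264, "Voted" = 6428)

# Share of each class among all respondents
turnout_share <- turnout_counts / sum(turnout_counts)
turnout_share

# Accuracy of a baseline that always predicts the majority class ("Voted")
baseline_accuracy <- max(turnout_share)
baseline_accuracy
```

Roughly 4% of respondents did not vote, so a model that always predicts "Voted" is already about 96% accurate without learning anything.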

Let's get started!

VOTE 2016

Exploratory data analysis

                    Elections don't matter    Gay rights are very important    Crime is very important
    Did not vote    55.3%                     17.0%                            66.3%
    Voted           34.1%                     25.3%                            57.6%
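The slide does not show how these percentages were computed. One plausible approach, sketched here on made-up data with hypothetical column names (`turnout`, `crime_importance`) and an assumed coding of 1 = "very important", is to take the mean of a logical condition within each turnout group:

```r
# Toy data: hypothetical column names and made-up values, not the survey itself
toy <- data.frame(
  turnout = factor(c("Did not vote", "Did not vote",
                     "Voted", "Voted", "Voted", "Voted")),
  crime_importance = c(1L, 1L, 1L, 2L, 1L, 3L)  # assumed: 1 = "very important"
)

# The mean of a logical vector is the share of TRUEs, computed per turnout group
share_very_important <- tapply(toy$crime_importance == 1L, toy$turnout, mean)
share_very_important
```

With dplyr, the same idea would be a `group_by()` followed by `summarise()` with `mean()` of the condition.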



Fitting a simple model

    > library(broom)
    >
    > simple_glm %>%
    +     tidy() %>%
    +     filter(p.value < 0.05) %>%
    +     arrange(desc(estimate))
                       term    estimate  std.error statistic      p.value
    1           (Intercept)  2.45703562 0.73272138  3.353301 7.985370e-04
    2          imiss_a_2016  0.39712084 0.13898678  2.857256 4.273207e-03
    3          imiss_l_2016  0.27468893 0.10678119  2.572447 1.009825e-02
    4          imiss_q_2016  0.24456695 0.11909335  2.053573 4.001699e-02
    5            track_2016  0.24107452 0.12146679  1.984695 4.717843e-02
    6  RIGGED_SYSTEM_1_2016  0.23628350 0.08508091  2.777162 5.483579e-03
    7      futuretrend_2016  0.21056782 0.07120079  2.957380 3.102651e-03
    8  RIGGED_SYSTEM_5_2016  0.19025188 0.09645384  1.972466 4.855648e-02
    9           wealth_2016 -0.06940523 0.02634395 -2.634580 8.424157e-03
    10         imiss_k_2016 -0.18103020 0.08272555 -2.188323 2.864611e-02
    11       econtrend_2016 -0.29536980 0.08722417 -3.386330 7.083422e-04
    12         imiss_f_2016 -0.32328040 0.10543220 -3.066240 2.167694e-03
    13         imiss_g_2016 -0.33203385 0.07867346 -4.220405 2.438640e-05
    14         imiss_n_2016 -0.44161183 0.09003981 -4.904628 9.360434e-07
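The slides do not show how `simple_glm` was fit. The sketch below assumes it is a logistic regression and reproduces the same workflow on simulated data; `toy_voters`, `interest`, `wealth`, and `toy_glm` are made-up names. It uses base R's `summary()` in place of `broom::tidy()` so that it runs without extra packages.

```r
set.seed(2016)

# Simulated stand-in for the survey: two integer-coded predictors (made-up)
n <- 500
toy_voters <- data.frame(
  interest = sample(1:4, n, replace = TRUE),
  wealth   = sample(1:8, n, replace = TRUE)
)

# Simulate turnout so that higher `interest` codes lower the odds of voting
p_vote <- plogis(3 - 0.8 * toy_voters$interest)
toy_voters$turnout <- factor(
  ifelse(runif(n) < p_vote, "Voted", "Did not vote"),
  levels = c("Did not vote", "Voted")
)

# With a factor response, glm() treats the first level as failure,
# so the coefficients describe the log-odds of "Voted"
toy_glm <- glm(turnout ~ ., family = "binomial", data = toy_voters)

# Coefficient table as a data frame, filtered to p < 0.05, sorted by estimate
coefs <- as.data.frame(summary(toy_glm)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
significant <- coefs[coefs$p.value < 0.05, ]
significant[order(-significant$estimate), ]
```

Because the response is a two-level factor, a positive estimate means higher values of that predictor are associated with higher odds of the second level.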

Let's build some models!

Cross-validation


Cross-validation
Partitioning your data into subsets and using one subset for validation
    method = "cv"
    method = "repeatedcv"
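The two `method` values are presumably arguments to caret's `trainControl()`, which accepts `method = "cv"` with `number` folds and `method = "repeatedcv"` with an additional `repeats`. To show what the partitioning does, here is a hand-rolled sketch in base R on simulated data; `make_folds` and `cv_accuracy` are illustrative helpers, not caret functions.

```r
set.seed(42)

# Assign each of n rows to one of k folds at random (roughly equal sizes)
make_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

# One round of k-fold CV: fit on k-1 folds, score accuracy on the held-out fold
cv_accuracy <- function(data, k) {
  folds <- make_folds(nrow(data), k)
  sapply(seq_len(k), function(i) {
    fit  <- glm(y ~ x, family = "binomial", data = data[folds != i, ])
    prob <- predict(fit, newdata = data[folds == i, ], type = "response")
    mean((prob > 0.5) == (data$y[folds == i] == 1))
  })
}

# Simulated binary outcome with one predictor (made-up data)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(1.5 * toy$x))

single   <- cv_accuracy(toy, k = 10)                      # analogous to "cv"
repeated <- replicate(5, mean(cv_accuracy(toy, k = 10)))  # analogous to "repeatedcv"

mean(single)
mean(repeated)
```

Repeating the procedure with fresh random folds averages out the luck of any single partition, at the cost of fitting the model many more times.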


Cross-validation
    Repeated cross-validation can take a long time
    Parallel processing can be worth it
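The slides do not say which parallel backend was used. With caret, a common route is to register a foreach backend such as doParallel before training. The sketch below shows only the general idea with the `parallel` package that ships with R: independent repeats are spread across worker processes. `one_repeat` is a placeholder workload, not a real resampling job.

```r
library(parallel)  # ships with base R

# One independent repeat; the body is a stand-in for fitting and scoring a model
one_repeat <- function(seed) {
  set.seed(seed)
  mean(rnorm(1e5))
}

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Run five repeats, distributed over the worker processes
results <- mclapply(1:5, one_repeat, mc.cores = n_cores)
unlist(results)
```

Repeats are a natural unit to parallelize because they do not depend on one another.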

Let's practice!

Comparing model performance

Confusion matrix

    > confusionMatrix(predict(fit_glm, training),
    +                 training$turnout16_2016)
    Confusion Matrix and Statistics

                  Reference
    Prediction     Did not vote Voted
      Did not vote          149  1633
      Voted                  63  3510

                   Accuracy : 0.6833
                     95% CI : (0.6706, 0.6957)
        No Information Rate : 0.9604
        P-Value [Acc > NIR] : 1

                      Kappa : 0.0847
     Mcnemar's Test P-Value : <2e-16

                Sensitivity : 0.70283
                Specificity : 0.68248
             Pos Pred Value : 0.08361
             Neg Pred Value : 0.98237
                 Prevalence : 0.03959
             Detection Rate : 0.02782
       Detection Prevalence : 0.33277
          Balanced Accuracy : 0.69266

           'Positive' Class : Did not vote
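The headline statistics follow directly from the four cell counts. As a worked check in base R (variable names are illustrative), the counts from the `fit_glm` training matrix reproduce the reported accuracy, sensitivity, specificity, positive predictive value, and no-information rate, with "Did not vote" as the positive class:

```r
# Counts copied from the fit_glm training matrix (rows = prediction, cols = reference)
cm <- matrix(
  c(149, 63, 1633, 3510), nrow = 2,
  dimnames = list(
    Prediction = c("Did not vote", "Voted"),
    Reference  = c("Did not vote", "Voted")
  )
)

# Cells relative to the positive class "Did not vote"
tp <- cm["Did not vote", "Did not vote"]  # non-voters predicted as non-voters
fn <- cm["Voted", "Did not vote"]         # non-voters predicted as voters
fp <- cm["Did not vote", "Voted"]         # voters predicted as non-voters
tn <- cm["Voted", "Voted"]                # voters predicted as voters

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
nir         <- max(colSums(cm)) / sum(cm)  # share of the largest reference class

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, nir = nir), 4)
```

The accuracy of about 0.68 is well below the no-information rate of about 0.96, which is why the reported p-value for accuracy exceeding that rate is 1.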

Confusion matrix

    > confusionMatrix(predict(fit_rf, training),
    +                 training$turnout16_2016)
    Confusion Matrix and Statistics

                  Reference
    Prediction     Did not vote Voted
      Did not vote          212     5
      Voted                   0  5138

                   Accuracy : 0.9991
                     95% CI : (0.9978, 0.9997)
        No Information Rate : 0.9604
        P-Value [Acc > NIR] : < 2e-16

                      Kappa : 0.9879
     Mcnemar's Test P-Value : 0.07364

                Sensitivity : 1.00000
                Specificity : 0.99903
             Pos Pred Value : 0.97696
             Neg Pred Value : 1.00000
                 Prevalence : 0.03959
             Detection Rate : 0.03959
       Detection Prevalence : 0.04052
          Balanced Accuracy : 0.99951

           'Positive' Class : Did not vote

Confusion matrix for the testing data

    > confusionMatrix(predict(fit_glm, testing),
    +                 testing$turnout16_2016)
    Confusion Matrix and Statistics

                  Reference
    Prediction     Did not vote Voted
      Did not vote           37   428
      Voted                  15   857

                   Accuracy : 0.6687
                     95% CI : (0.6427, 0.6939)
        No Information Rate : 0.9611
        P-Value [Acc > NIR] : 1

                      Kappa : 0.0787
     Mcnemar's Test P-Value : <2e-16

                Sensitivity : 0.71154
                Specificity : 0.66693
             Pos Pred Value : 0.07957
             Neg Pred Value : 0.98280
                 Prevalence : 0.03889
             Detection Rate : 0.02767
       Detection Prevalence : 0.34779
          Balanced Accuracy : 0.68923

           'Positive' Class : Did not vote

Confusion matrix for the testing data

    > confusionMatrix(predict(fit_rf, testing),
    +                 testing$turnout16_2016)
    Confusion Matrix and Statistics

                  Reference
    Prediction     Did not vote Voted
      Did not vote            0    14
      Voted                  52  1271

                   Accuracy : 0.9506
                     95% CI : (0.9376, 0.9616)
        No Information Rate : 0.9611
        P-Value [Acc > NIR] : 0.9767

                      Kappa : -0.0168
     Mcnemar's Test P-Value : 5.254e-06

                Sensitivity : 0.00000
                Specificity : 0.98911
             Pos Pred Value : 0.00000
             Neg Pred Value : 0.96070
                 Prevalence : 0.03889
             Detection Rate : 0.00000
       Detection Prevalence : 0.01047
          Balanced Accuracy : 0.49455

           'Positive' Class : Did not vote
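Putting the two testing-data matrices side by side makes the contrast concrete. The helper below (`summarise_cm` is an illustrative name, not a caret or yardstick function) recomputes accuracy and balanced accuracy from the cell counts, again with "Did not vote" as the positive class:

```r
# Summarise a 2x2 confusion matrix from its four counts
summarise_cm <- function(tp, fn, fp, tn) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(accuracy          = (tp + tn) / (tp + fn + fp + tn),
    sensitivity       = sens,
    specificity       = spec,
    balanced_accuracy = (sens + spec) / 2)
}

# Testing-data counts copied from the fit_glm and fit_rf matrices
glm_test <- summarise_cm(tp = 37, fn = 15, fp = 428, tn = 857)
rf_test  <- summarise_cm(tp = 0,  fn = 52, fp = 14,  tn = 1271)

round(rbind(glm_test, rf_test), 4)
```

The random forest has the higher raw accuracy but identifies none of the 52 non-voters in the testing set, so its balanced accuracy sits just below 0.5, while the logistic regression reaches about 0.69.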

Comparing model performance

    > library(yardstick)
    >
    > sens(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
    [1] 0.7115385
    >
    > spec(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
    [1] 0.6669261
    >
    > sens(testing_results, truth = turnout16_2016, estimate = `Random forest`)
    [1] 0
    >
    > spec(testing_results, truth = turnout16_2016, estimate = `Random forest`)
    [1] 0.9891051

Let's finish this case study!
