The House of Representatives in 2018, G. Elliott Morris

SLIDE 1

DataCamp Analyzing Election and Polling Data in R

The House of Representatives in 2018

ANALYZING ELECTION AND POLLING DATA IN R

  • G. Elliott Morris

Data Journalist

SLIDE 2

Political prediction as a case study: why?

Why use political prediction as a case study?

In data science:
  • It lets you practice data cleaning, wrangling, modeling, and visualization skills all at once.
  • It helps you understand the limits of (basic) predictive modeling.

In politics:
  • It lets you form fine-tuned expectations about upcoming elections, so you can better anticipate outcomes.

SLIDE 3

The US House

The House of Representatives:
  • Made up of 435 voting members from all 50 states
  • All members are up for election every two years

SLIDE 4

Tools you will need

Exercise 1: What might happen in 2018?

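    library(dplyr)

    # filter(): keep polls fielded after June 1, 2018
    polls_2018 %>% filter(date > "2018-06-01")

    # mutate(): add the Democratic margin (Dem share minus Rep share)
    polls_2018 %>% mutate(Dem.Margin = Dem - Rep)

    # pull(): extract the Dem.Margin column as a vector
    polls_2018 %>% pull(Dem.Margin)

    # mean(): average Democratic margin
    mean(polls_2018$Dem.Margin)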
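To see the four steps working together, here is a minimal end-to-end sketch in base R (so it runs without dplyr). The polls_toy data frame and its numbers are made up purely for illustration; they are not real 2018 polls.

```r
# Hypothetical toy generic-ballot polls (made-up numbers, for illustration only)
polls_toy <- data.frame(
  date = as.Date(c("2018-05-15", "2018-06-10", "2018-07-02", "2018-08-20")),
  Dem  = c(44, 46, 47, 49),
  Rep  = c(40, 39, 41, 41)
)

# mutate() equivalent: Democratic margin = Dem share minus Rep share
polls_toy$Dem.Margin <- polls_toy$Dem - polls_toy$Rep

# filter() equivalent: keep polls fielded after June 1, 2018
recent <- polls_toy[polls_toy$date > as.Date("2018-06-01"), ]

# pull() + mean() equivalent: average Democratic margin in the recent polls
avg_margin <- mean(recent$Dem.Margin)
print(avg_margin)  # 7
```

The May poll is dropped by the date filter, so the average is taken over the three remaining margins (7, 6, and 8).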

SLIDE 5

Tools you will need

Exercise 2: Historical polling averages from August and September since 1980

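    library(dplyr)
    library(lubridate)  # provides month()

    # filter(): keep polls fielded in August (8) or September (9)
    polls %>% filter(month(date) %in% c(8, 9))

    # group_by(): group the polls by election year
    polls %>% group_by(year)

    # summarise(): average Democratic margin within each year
    polls %>% group_by(year) %>% summarise(avg = mean(Dem.Margin))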
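The same filter, group, and summarise pattern can be sketched in base R, with format() standing in for lubridate's month() and aggregate() standing in for group_by() plus summarise(). The polls_toy data below is hypothetical, made up only to show the mechanics.

```r
# Hypothetical toy polls spanning two election years (made-up numbers)
polls_toy <- data.frame(
  date       = as.Date(c("2014-07-20", "2014-08-15", "2014-09-10",
                         "2016-08-05", "2016-09-25", "2016-10-30")),
  Dem.Margin = c(1, -2, 0, 3, 5, 9)
)
polls_toy$year <- as.integer(format(polls_toy$date, "%Y"))

# Keep August (8) and September (9) polls; format() stands in for lubridate::month()
month_num <- as.integer(format(polls_toy$date, "%m"))
aug_sep <- polls_toy[month_num %in% c(8, 9), ]

# Average margin per year, like group_by(year) %>% summarise(avg = mean(Dem.Margin))
by_year <- aggregate(Dem.Margin ~ year, data = aug_sep, FUN = mean)
names(by_year)[2] <- "avg"
print(by_year)
```

The July and October polls fall outside the window, so 2014 averages -2 and 0, and 2016 averages 3 and 5.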

SLIDE 6

Let's practice!

SLIDE 7

Training a Model to Predict the Future with Polls

SLIDE 8

From tidying to modeling

(Figure from R for Data Science: http://r4ds.had.co.nz/model-intro.html)

SLIDE 9

The original model

    lm(Dem.Vote.Margin ~ Dem.Poll.Margin, data = polls_predict)
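A minimal sketch of what this model does, fitted on simulated data. The sim data frame, the assumed true slope of 0.9, and the noise level are all hypothetical choices for illustration, not the course's historical data.

```r
# Simulated stand-in for the historical data (hypothetical values, NOT real elections)
set.seed(1)
n <- 19
sim <- data.frame(Dem.Poll.Margin = rnorm(n, mean = 0, sd = 5))
# Assumed true relationship: vote margin = 0.9 * poll margin + noise
sim$Dem.Vote.Margin <- 0.9 * sim$Dem.Poll.Margin + rnorm(n, sd = 3)

# Fit the one-variable model: vote margin as a linear function of poll margin
fit <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, data = sim)
print(coef(fit))  # intercept and slope; the slope should land near the assumed 0.9
```

The slope is the model's estimate of how many points of actual vote margin each point of poll margin translates into.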

SLIDE 10

A multivariate model

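    library(ggplot2)

    # Poll margin vs. vote margin, labeled by election year.
    # factor() gives each party_in_power value its own color and its own
    # regression line (assumes party_in_power is coded numerically, e.g. -1/1)
    ggplot(generic_ballot, aes(x = Dem.Poll.Margin, y = Dem.Vote.Margin,
                               col = factor(party_in_power))) +
      geom_text(aes(label = ElecYear)) +
      geom_smooth(method = 'lm')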

SLIDE 11

A multivariate model

    model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power, data = polls_predict)
    summary(model)

    Call:
    lm(formula = Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power, data = polls_predict)

    Residuals:
        Min      1Q  Median      3Q     Max
    -4.3893 -2.4283 -0.2004  2.4982  4.6166

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)      -2.1168     1.1244  -1.883 0.078079 .
    Dem.Poll.Margin   0.8856     0.2070   4.278 0.000577 ***
    party_in_power   -2.1348     0.8809  -2.423 0.027601 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.238 on 16 degrees of freedom
    Multiple R-squared:  0.7498,  Adjusted R-squared:  0.7185
    F-statistic: 23.98 on 2 and 16 DF,  p-value: 1.535e-05

SLIDE 12

Predictions on new data

    predict(model, data.frame(Dem.Poll.Margin = 8, party_in_power = -1))
           1
    7.102972
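predict() simply plugs the new values into the fitted equation. As a check, multiplying the rounded coefficients from the summary(model) output by the same inputs reproduces the prediction up to rounding:

```r
# Rounded coefficient estimates from the summary(model) output
b0 <- -2.1168  # (Intercept)
b1 <-  0.8856  # Dem.Poll.Margin
b2 <- -2.1348  # party_in_power

# Same inputs as the predict() call: an 8-point Democratic poll lead, party_in_power = -1
# -2.1168 + 0.8856 * 8 + (-2.1348) * (-1)
manual_pred <- b0 + b1 * 8 + b2 * (-1)
print(manual_pred)  # 7.1028, matching predict()'s 7.102972 up to coefficient rounding
```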

SLIDE 13

Margins of error

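Margin of error = model error × 1.96

The factor 1.96 comes from the normal distribution: about 95% of values fall within 1.96 standard deviations of the mean, so this gives a 95% margin of error.

From: http://www.icse.xyz/msor/ssim/SDandCI.html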
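A quick way to see where 1.96 comes from, using base R's normal-distribution functions qnorm() and pnorm():

```r
# For a two-sided 95% interval, leave 2.5% in each tail of the standard normal
z <- qnorm(0.975)
print(round(z, 2))  # 1.96

# Share of a normal distribution within +/- z standard deviations of the mean
coverage <- pnorm(z) - pnorm(-z)
print(coverage)  # 0.95
```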

SLIDE 14

Calculating a margin of error

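Generic root-mean-square error (RMSE) formula, scaled by 1.96 to give a 95% margin of error:

$$\text{MoE} = 1.96 \times \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

    # Generic form: fitted values vs. actual results
    sqrt(mean((model$fitted.values - data$actual_results)^2)) * 1.96

With our polling data:

    sqrt(mean((model$fitted.values - polls_predict$Dem.Vote.Margin)^2)) * 1.96
    [1] 5.823251

The in-sample MoE (error on the elections used to fit the model) is typically smaller than the out-of-sample MoE (error on elections the model has not seen). The out-of-sample figure should be used when it is available.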
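One common way to estimate out-of-sample error is leave-one-out cross-validation: refit the model without each election in turn, then predict the held-out election. That is our choice of technique here, not necessarily the course's. The sketch runs on simulated data (sim is a hypothetical stand-in for polls_predict, not real results). For least-squares regression, each leave-one-out error is at least as large as the corresponding in-sample residual, so the out-of-sample MoE comes out larger.

```r
# Simulated stand-in for polls_predict (19 hypothetical elections; NOT real data)
set.seed(2018)
n <- 19
sim <- data.frame(Dem.Poll.Margin = rnorm(n, mean = 0, sd = 5))
sim$Dem.Vote.Margin <- 0.9 * sim$Dem.Poll.Margin + rnorm(n, sd = 3)

# In-sample MoE: errors on the same elections used to fit the model
fit <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, data = sim)
in_sample_moe <- sqrt(mean((fitted(fit) - sim$Dem.Vote.Margin)^2)) * 1.96

# Out-of-sample MoE via leave-one-out: refit without election i, then predict it
loo_pred <- sapply(seq_len(n), function(i) {
  fit_i <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, data = sim[-i, ])
  predict(fit_i, newdata = sim[i, , drop = FALSE])
})
out_sample_moe <- sqrt(mean((loo_pred - sim$Dem.Vote.Margin)^2)) * 1.96

print(c(in_sample = in_sample_moe, out_of_sample = out_sample_moe))
```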

SLIDE 15

Your turn!

SLIDE 16

The Presidency in 2020

SLIDE 17

A final applied example

US presidential elections:
  • The winner is decided by the Electoral College, which gives some states (especially small ones) influence out of proportion to their population.
  • The count of everyone's ballots nationwide is called the popular vote.
  • A candidate can win the presidency without winning the popular vote, as happened in 2016, 2000, 1888, 1876, and 1824.
  • We're going to predict the popular vote, but keep in mind that this may not predict the Electoral College outcome very well.

SLIDE 18

Who wins the popular vote?

The "Time for Change" model, created by Professor Alan Abramowitz, predicts the popular vote. It holds that presidential elections can be predicted with:
  • Presidential approval ratings
  • Economic growth
  • How long the president's party has controlled the White House

SLIDE 19

Training the model

To predict:

vote_share: Vote share for the president's party

Three input variables:

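  • pres_approve: presidential approval
  • q2_gdp: annualized GDP growth in the second quarter of the election year
  • two_plus_terms: whether the president's party has held the White House for two or more consecutive terms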

    lm(vote_share ~ pres_approve + q2_gdp + two_plus_terms, data = pres_elecs)
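To see what the fitted model gives you, here is a sketch on simulated data. sim_elecs, the assumed coefficients used to generate it, and the new_year inputs are all hypothetical, chosen only to exercise the same formula; they are not Abramowitz's estimates or real election data.

```r
# Simulated stand-in for pres_elecs (hypothetical values, NOT real election data)
set.seed(2020)
n <- 18
sim_elecs <- data.frame(
  pres_approve   = rnorm(n, mean = 0, sd = 15),   # approval (hypothetical scale)
  q2_gdp         = rnorm(n, mean = 2, sd = 3),    # Q2 GDP growth in percent (hypothetical)
  two_plus_terms = rep(c(0, 1), length.out = n)   # 1 = party has held the White House 2+ terms
)
# Assumed data-generating process, used only to create a toy outcome
sim_elecs$vote_share <- 51 + 0.1 * sim_elecs$pres_approve +
  0.5 * sim_elecs$q2_gdp - 4 * sim_elecs$two_plus_terms + rnorm(n, sd = 1.5)

# Fit the three-variable model with the same formula as the slide
fit <- lm(vote_share ~ pres_approve + q2_gdp + two_plus_terms, data = sim_elecs)

# Predict for a hypothetical year: approval 0, 2% Q2 growth, party in power 2+ terms
new_year <- data.frame(pres_approve = 0, q2_gdp = 2, two_plus_terms = 1)
pred <- predict(fit, newdata = new_year)
print(round(pred, 1))
```

Because predict() evaluates the fitted equation, the prediction equals the intercept plus each coefficient times its input (here 1, 0, 2, and 1).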

SLIDE 20

Performance of the model

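    library(ggplot2)

    # Store the model's in-sample predictions next to the actual results
    pres_elecs$predict <- predict(lm(vote_share ~ pres_approve + q2_gdp + two_plus_terms,
                                     data = pres_elecs))

    # Predicted vs. actual vote share; labels on the 45-degree line are perfect predictions
    ggplot(pres_elecs, aes(x = predict, y = vote_share, label = Year)) +
      geom_abline() +
      geom_text()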

SLIDE 21

Performance of the model

Calculate the model's margin of error:

    # Root-mean-square error, scaled by 1.96 to give a 95% margin of error
    sqrt(mean((pres_elecs$predict - pres_elecs$vote_share)^2)) * 1.96
    [1] 3.273301

SLIDE 22

The states...

SLIDE 23

Your turn!

SLIDE 24

Congratulations!

SLIDE 25

What you learned

Chapter 1: approval polls
  • dplyr: select, filter, mutate, group_by, summarise
  • zoo: rollmean
  • ggplot2: ggplot, geom_line, geom_point

Chapter 2: US House elections polls
  • lm
  • ggplot: geom_smooth()

SLIDE 26

What you learned

Chapter 3: election results and Brexit
  • choroplethr
  • regression for analyzing relationships between data

Chapter 4: prediction and applied examples
  • multivariate regression
  • ggplot for showing the relationship between three variables
  • making predictions on new data

SLIDE 27

What's next?

DataCamp:
  • Learn more about the tidyverse
  • Learn more about ggplot2

Reading:
  • R for Data Science by Garrett Grolemund and Hadley Wickham

Work!
  • Connect with the #rstats community online
  • Learn by doing!

SLIDE 28

Congratulations, and thanks!
