DataCamp Analyzing Election and Polling Data in R
Elections and Political Parties
ANALYZING ELECTION AND POLLING DATA IN R
- G. Elliott Morris, Data Journalist
Measuring party support: the
> head(generic_ballot)
        Date Democrats Republicans ElecYear   ElecDay
    7/4/1945        44          31     1946 11/5/1946
   7/19/1945        38          31     1946 11/5/1946
  10/23/1945        36          51     1946 11/5/1946
  11/28/1945        40          34     1946 11/5/1946
   1/10/1946        40          34     1946 11/5/1946
  DaysTilED DemVote RepVote
        489      45      53
        474      45      53
        378      45      53
        342      45      53
        299      45      53
> nrow(generic_ballot)   # the number of observations
[1] 2559
> length(generic_ballot) # the number of variables
[1] 8
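As a small check on how the columns relate, the sketch below recomputes DaysTilED for the first three rows shown above as the number of days between the poll date and election day. It uses base R's as.Date() so that it runs without extra packages; the variable names poll_date, elec_day and days_til_ed are illustrative and not part of the course data.

```r
# Dates copied from the first three rows of head(generic_ballot)
poll_date <- as.Date(c("7/4/1945", "7/19/1945", "10/23/1945"),
                     format = "%m/%d/%Y")
elec_day  <- as.Date("11/5/1946", format = "%m/%d/%Y")

# Number of days between each poll and election day
days_til_ed <- as.numeric(elec_day - poll_date)
days_til_ed
# [1] 489 474 378
```

These values match the DaysTilED column in the printed rows, which suggests the column is simply the election date minus the poll date.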
library(lubridate)
library(tidyverse)

ggplot(generic_ballot, aes(x = mdy(Date), y = Democrats)) +
  geom_point()
select()    # selects columns
filter()    # filters dataset to value(s) of variable(s)
group_by()  # groups a dataset by unique observations of variable(s)
summarise() # summarises a variable from a grouped dataset with an
            # aggregation function, like mean()
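The four verbs can be chained with the pipe. The following is a minimal sketch on a made-up data frame; the polls object, its columns and its values are hypothetical and are not drawn from generic_ballot.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
polls <- data.frame(
  year     = c(2016, 2016, 2016, 2018, 2018),
  month    = c(1, 1, 2, 1, 1),
  support  = c(44, 46, 45, 50, 48),
  pollster = c("A", "B", "A", "A", "B")
)

monthly <- polls %>%
  select(year, month, support) %>%   # keep only the columns we need
  filter(year == 2016) %>%           # keep only rows from 2016
  group_by(year, month) %>%          # one group per year-month pair
  summarise(support = mean(support), # average support within each group
            .groups = "drop")

monthly
```

Here the two January 2016 polls (44 and 46) collapse into a single row with a mean of 45, and the lone February poll keeps its value of 45.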
data %>%
  group_by(year, month) %>%
  summarise(support = mean(support))
# with ggplot()!
ggplot(data, aes(x = month, y = support)) +
  geom_point()
ggplot(data, aes(x = month, y = support)) +
  geom_point() +
  geom_smooth(span = 0.2)
# poll margin and vote margin for each poll
poll_error <- generic_ballot %>%
  mutate(Dem.Poll.Margin = Democrats - Republicans,
         Dem.Vote.Margin = DemVote - RepVote)

# average the margins within each election year
poll_error <- poll_error %>%
  group_by(ElecYear) %>%
  summarise(Dem.Poll.Margin = mean(Dem.Poll.Margin),
            Dem.Vote.Margin = mean(Dem.Vote.Margin))

# polling error: poll margin minus actual vote margin
poll_error <- poll_error %>%
  mutate(error = Dem.Poll.Margin - Dem.Vote.Margin)
rmse <- sqrt(mean(poll_error$error^2))

# multiply it by 1.96 for our 95% CI
CI <- rmse * 1.96

by_year <- poll_error %>%
  mutate(upper = Dem.Poll.Margin + CI,
         lower = Dem.Poll.Margin - CI)
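To make the arithmetic concrete, here is a small worked sketch of the same calculation on four invented error values. The numbers are hypothetical and are not the course's polling errors. The 1.96 multiplier treats the errors as roughly normal and centred on zero, which is an assumption rather than something the data guarantee.

```r
# Hypothetical polling errors (poll margin minus vote margin), in points
toy_error <- c(3, -4, 0, 5)

# Root-mean-square error: square, average, then take the square root
toy_rmse <- sqrt(mean(toy_error^2))   # sqrt((9 + 16 + 0 + 25) / 4) = sqrt(12.5)

# Half-width of an approximate 95% interval around a poll margin
toy_ci <- toy_rmse * 1.96

round(toy_rmse, 3)
# [1] 3.536
round(toy_ci, 2)
# [1] 6.93
```

With these toy numbers, a poll margin of +2 would get an interval running from roughly -4.9 to +8.9.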
# make the ggplot object
ggplot(by_year) +
  # add the polling geom_point() layer
  geom_point(aes(x = ElecYear, y = Dem.Poll.Margin, col = "Poll")) +
  # add the results geom_point() layer
  geom_point(aes(x = ElecYear, y = Dem.Vote.Margin, col = "Vote")) +
  # add the error geom_errorbar() layer
  geom_errorbar(aes(x = ElecYear, ymin = lower, ymax = upper))
# specify and evaluate the model
model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, by_year)
summary(model)

Call:
lm(formula = Dem.Vote.Margin ~ Dem.Poll.Margin, data = by_year)

Residuals:
 Min 1Q Median 3Q Max

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      -2.8109     1.0886  -2.582 0.024000 *
Dem.Poll.Margin   0.8031     0.1607   4.998 0.000311 ***

Residual standard error: 3.389 on 12 degrees of freedom
Multiple R-squared: 0.6755, Adjusted R-squared: 0.6484
F-statistic: 24.98 on 1 and 12 DF, p-value: 0.0003105
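The fitted coefficients can be read as a simple prediction rule: predicted vote margin = intercept + slope × poll margin. The sketch below hard-codes the two estimates printed in the summary above; the function name predict_vote_margin and the example poll margin of 5 are illustrative assumptions.

```r
# Coefficient estimates copied from the summary(model) output
intercept <- -2.8109
slope     <-  0.8031

# Hypothetical helper: predicted Democratic vote margin for a given poll margin
predict_vote_margin <- function(poll_margin) {
  intercept + slope * poll_margin
}

predict_vote_margin(5)
# [1] 1.2046

# With the fitted model object available, the equivalent call would be:
# predict(model, newdata = data.frame(Dem.Poll.Margin = 5))
```

Under this fit, a 5-point Democratic lead in the polls corresponds to a predicted vote margin of about 1.2 points, because the slope is below 1 and the intercept is negative.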