Elections and Political Parties, by G. Elliott Morris, Data Journalist



SLIDE 1

DataCamp Analyzing Election and Polling Data in R

Elections and Political Parties

ANALYZING ELECTION AND POLLING DATA IN R

  • G. Elliott Morris

Data Journalist

SLIDE 2

Measuring party support: the "generic ballot"

"If the election for the U.S. House of Representatives were held today, would you vote for the Democratic candidate or Republican candidate in your district?"

Source: RealClearPolitics.com

SLIDE 3

House polling: Exploratory data analysis (EDA)

> head(generic_ballot)
        Date Democrats Republicans ElecYear   ElecDay
    7/4/1945        44          31     1946 11/5/1946
   7/19/1945        38          31     1946 11/5/1946
  10/23/1945        36          51     1946 11/5/1946
  11/28/1945        40          34     1946 11/5/1946
   1/10/1946        40          34     1946 11/5/1946
  DaysTilED DemVote RepVote
        489      45      53
        474      45      53
        378      45      53
        342      45      53
        299      45      53
> nrow(generic_ballot) # the number of observations
[1] 2559
> length(generic_ballot) # the number of variables
[1] 8

SLIDE 4

Generic ballot polling: EDA

library(lubridate)
library(tidyverse)

ggplot(generic_ballot, aes(x = mdy(Date), y = Democrats)) +
  geom_point()

SLIDE 5

How to learn from this data?

After initial data wrangling, you're going to use the generic ballot polling dataset to:

  • Graph trends over time
  • Compare polls (predictions) with election results (observations)
  • Create statistical models that can explain outcomes

SLIDE 6

Tidyverse refresher

Tidyverse functions from chapter 1:

select()     # selects columns
filter()     # filters dataset to value(s) of variable(s)
group_by()   # groups a dataset by unique observations of variable(s)
summarise()  # summarises a variable from a grouped dataset with an
             # aggregation function, like mean()

SLIDE 7

Step by step

  • 1. Look at the data with head()
  • 2. filter() the polls to those in 2016
  • 3. select() only the relevant variables
  • 4. mutate() a variable to represent the Democratic margin over Republicans
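
The four steps above can be sketched on a small example. The data frame toy_ballot and its values are hypothetical, invented only to mirror the column names of generic_ballot; the new column name Dem.Margin is also an assumption.

```r
library(dplyr)

# Hypothetical rows shaped like generic_ballot; the values are invented
toy_ballot <- data.frame(
  Date        = c("10/1/2014", "9/15/2016", "10/20/2016"),
  Democrats   = c(43, 46, 47),
  Republicans = c(45, 44, 42),
  ElecYear    = c(2014, 2016, 2016)
)

head(toy_ballot)                                # 1. look at the data

polls_2016 <- toy_ballot %>%
  filter(ElecYear == 2016) %>%                  # 2. keep the 2016 polls
  select(Date, Democrats, Republicans) %>%      # 3. keep the relevant columns
  mutate(Dem.Margin = Democrats - Republicans)  # 4. Democratic margin

polls_2016
```

A positive Dem.Margin means Democrats lead in that poll; a negative value means Republicans lead.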

SLIDE 8

Time to practice!

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 9

73 Years of "Generic Ballot" Polls

ANALYZING ELECTION AND POLLING DATA IN R

  • G. Elliott Morris

Data Journalist

SLIDE 10

The generic ballot over time

In the last lesson, you:

  • Explored the data to make a plan for your analysis
  • Mutated a column that better represents the nature of elections (margin of victory can be more helpful than the share of votes cast)

Now:

  • Analyze long-term trends in the generic ballot
  • Visualize the generic ballot over time

SLIDE 11

The generic ballot over time

From the last slides

SLIDE 12

Time series analysis for the generic ballot

Steps for monthly time series analysis:

  • Group polls by the month and year in which they were taken
  • Create an average reading of the Democratic margin in that month
  • Analyze and visualize

data %>%
  group_by(year, month)

data %>%
  group_by(year, month) %>%
  summarise(support = mean(support))

# with ggplot()!
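
As a minimal sketch of the grouping and averaging steps, the example below derives year and month columns from a date string with lubridate and then averages the Democratic margin per month. The data frame toy_polls and its values are hypothetical; support is assumed here to mean the Democratic margin.

```r
library(dplyr)
library(lubridate)

# Hypothetical polls; dates are month/day/year strings, as in generic_ballot
toy_polls <- data.frame(
  Date        = c("1/5/2018", "1/20/2018", "2/10/2018"),
  Democrats   = c(46, 48, 45),
  Republicans = c(40, 40, 42)
)

monthly <- toy_polls %>%
  mutate(Date    = mdy(Date),                 # parse the date string
         year    = year(Date),                # year the poll was taken
         month   = month(Date),               # month the poll was taken
         support = Democrats - Republicans) %>%
  group_by(year, month) %>%
  summarise(support = mean(support), .groups = "drop")

monthly
```

The two January polls (margins of 6 and 8) collapse into a single January reading of 7, and February keeps its single reading of 3.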

SLIDE 13

Making a generic ballot ggplot

  • 1. Make the ggplot object
  • 2. Add a geometric layer (point, line, etc)

ggplot(data, aes(x = month, y = support))

ggplot(data, aes(x = month, y = support)) +
  geom_point()

SLIDE 14

Making a generic ballot ggplot

SLIDE 15

Adding a trend line with geom_smooth()

ggplot(data, aes(x = month, y = support)) +
  geom_point() +
  geom_smooth(span = 0.2)
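
The span argument controls how much the loess smoother behind geom_smooth() smooths: smaller values give a wigglier line, larger values a flatter one. The sketch below builds both variants for comparison. The series toy_monthly is generated, not real polling data, and method = "loess" is spelled out here as an assumption about which smoother is intended.

```r
library(ggplot2)

# Hypothetical monthly series; the values are generated, not real polls
toy_monthly <- data.frame(
  month   = seq(as.Date("2016-01-01"), by = "month", length.out = 36),
  support = 5 + 3 * sin(seq_len(36) / 4)
)

# Small span: the trend line follows short-term movement closely
wiggly <- ggplot(toy_monthly, aes(x = month, y = support)) +
  geom_point() +
  geom_smooth(method = "loess", span = 0.2)

# Large span: the trend line shows only the long-run drift
smooth <- ggplot(toy_monthly, aes(x = month, y = support)) +
  geom_point() +
  geom_smooth(method = "loess", span = 0.9)
```

Printing wiggly and smooth one after the other shows the same points with two different trend lines.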

SLIDE 16

Your turn!

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 17

Calculating and Visualizing Error in Polls

ANALYZING ELECTION AND POLLING DATA IN R

  • G. Elliott Morris

Data Journalist

SLIDE 18

Polls have error!

Important thing #1: Polls have error! Why?

  • It is impractical to ask everyone in the country who they're going to vote for.
  • Some people answer phone calls for polls and some do not.
  • Some people don't even get called.
  • It's hard to tell what a population should look like.

Important: Sometimes, many pollsters all make the same mistake, and results can be biased against one party or individual.

SLIDE 19

Why care about error?

Not controlling for the right variables, or not treating your data with the proper amount of uncertainty, can lead to overconfident results.

Source: http://cdc.gov/cancer/breast/statistics

SLIDE 20

Analyzing error in polls

Steps for calculating error in polls:

  • 1. Wrangle the data: create polling averages for every year
  • 2. Calculate polling error for each year: subtract the result from the average poll's prediction
  • 3. Visualize the results and the margin of error

SLIDE 21

Grouping generic ballot data by year

  • Mutate a variable for the Democrats' margins
  • Average that variable by year
  • Compare the polling average to the results

# mutate the Democratic margin in the polls and in the vote
poll_error <- generic_ballot %>%
  mutate(Dem.Poll.Margin = Democrats - Republicans,
         Dem.Vote.Margin = DemVote - RepVote)

# average both margins by election year
poll_error <- poll_error %>%
  group_by(ElecYear) %>%
  summarise(Dem.Poll.Margin = mean(Dem.Poll.Margin),
            Dem.Vote.Margin = mean(Dem.Vote.Margin))

# compare the polling average to the results
poll_error <- poll_error %>%
  mutate(error = Dem.Poll.Margin - Dem.Vote.Margin)

SLIDE 22

Calculating generic ballot error

  • Calculate the root-mean-square error
  • Compute a margin of error
  • Add an upper and lower bound variable to our dataset

rmse <- sqrt(mean(poll_error$error^2))

# multiply it by 1.96 for our 95% CI
CI <- rmse * 1.96

by_year <- poll_error %>%
  mutate(upper = Dem.Poll.Margin + CI,
         lower = Dem.Poll.Margin - CI)
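
To make the arithmetic concrete, here is a small worked sketch with invented numbers. The data frame toy_error and its margins are hypothetical, chosen so that every yearly error is 3 points in size. The 1.96 multiplier assumes polling errors are roughly normally distributed.

```r
library(dplyr)

# Hypothetical yearly averages; the margins are invented for illustration
toy_error <- data.frame(
  ElecYear        = c(2010, 2012, 2014, 2016),
  Dem.Poll.Margin = c(5, -2, 8, 1),
  Dem.Vote.Margin = c(2,  1, 5, 4)
) %>%
  mutate(error = Dem.Poll.Margin - Dem.Vote.Margin)  # 3, -3, 3, -3

# Root-mean-square error: square, average, then take the square root
rmse <- sqrt(mean(toy_error$error^2))  # sqrt(9) = 3

# Margin of error for an approximate 95% interval
CI <- rmse * 1.96                      # 5.88

toy_by_year <- toy_error %>%
  mutate(upper = Dem.Poll.Margin + CI,
         lower = Dem.Poll.Margin - CI)

toy_by_year
```

Squaring before averaging keeps positive and negative misses from cancelling out: the plain mean of these errors is 0, but the RMSE correctly reports a typical miss of 3 points.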

SLIDE 23

Visualizing error

Visualizing error with points and error bars:

# make the ggplot object
ggplot(by_year) +
  # add the polling geom_point() layer
  geom_point(aes(x = ElecYear, y = Dem.Poll.Margin, col = "Poll")) +
  # add the results geom_point() layer
  geom_point(aes(x = ElecYear, y = Dem.Vote.Margin, col = "Vote")) +
  # add the error geom_errorbar() layer
  geom_errorbar(aes(x = ElecYear, ymin = lower, ymax = upper))

SLIDE 24

Visualizing error

SLIDE 25

Practice

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 26

Predicting Winners with Linear Regression

ANALYZING ELECTION AND POLLING DATA IN R

  • G. Elliott Morris

Data Journalist

SLIDE 27

Use polls to predict votes

Steps:

  • 1. Specify a regression model that predicts the vote margin from the polling margin
  • 2. Use that model to predict a hypothetical election with a 5-point margin for Democrats in the polls

SLIDE 28

What is linear regression?

  • Analyzes the relationship between two (or more) variables
  • Does so by fitting a "line of best fit" through the data
  • In other words, what equation best predicts y with x (and x2, x3, etc.)?
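
A minimal sketch of the idea, using base R's lm() on invented data. The vectors x and y are hypothetical and constructed so that y is exactly 1 + 2x, which makes the fitted equation easy to check by eye; real data would not fall on a perfect line.

```r
# Hypothetical data lying exactly on the line y = 1 + 2 * x
toy <- data.frame(x = 1:5)
toy$y <- 1 + 2 * toy$x

# Fit the line of best fit: y is predicted from x
fit <- lm(y ~ x, data = toy)

# The coefficients are the intercept and slope of the fitted equation
coef(fit)
```

The intercept is the predicted y when x is 0, and the slope is the change in predicted y for each one-unit increase in x.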

SLIDE 29

What is linear regression?

Linear regression made easy: draw the line through the points that best fits the data:

SLIDE 30

Use polls to predict votes

# specify and evaluate the model
model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, by_year)
summary(model)

Call:
lm(formula = Dem.Vote.Margin ~ Dem.Poll.Margin, data = by_year)

Residuals:
   Min     1Q Median     3Q    Max
-4.735 -1.788 -1.112  1.965  5.793

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      -2.8109     1.0886  -2.582 0.024000 *
Dem.Poll.Margin   0.8031     0.1607   4.998 0.000311 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.389 on 12 degrees of freedom
Multiple R-squared:  0.6755,  Adjusted R-squared:  0.6484
F-statistic: 24.98 on 1 and 12 DF,  p-value: 0.0003105
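
The hypothetical election with a 5-point Democratic polling margin can be worked through by hand from the two coefficient estimates in this output. The sketch below only does that arithmetic; the variable names are illustrative.

```r
# Coefficient estimates copied from the summary(model) output on this slide
intercept <- -2.8109
slope     <-  0.8031

# Hypothetical election: Democrats lead the polls by 5 points
poll_margin <- 5

# Predicted vote margin = intercept + slope * poll margin
predicted_vote_margin <- intercept + slope * poll_margin
predicted_vote_margin  # -2.8109 + 4.0155 = 1.2046

# With the fitted model object in the session, predict() does the same thing:
# predict(model, newdata = data.frame(Dem.Poll.Margin = 5))
```

Under this fit, a 5-point lead in the polls translates to a predicted win of only about 1.2 points, because the slope is below 1 and the intercept is negative.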

SLIDE 31

Practice

ANALYZING ELECTION AND POLLING DATA IN R