The 2016 US Presidential Election (G. Elliott Morris)



SLIDE 1

DataCamp Analyzing Election and Polling Data in R

The 2016 US Presidential Election

ANALYZING ELECTION AND POLLING DATA IN R

G. Elliott Morris

Data Journalist

SLIDE 2

Understanding presidential elections

United States Presidential Elections:

  • Voters cast ballots for the two major parties, Democrats and Republicans, and for minor parties
  • Results are recorded by county election officials and published by state secretaries of state
  • Results can be combined with county-level demographic data from the U.S. Census Bureau

SLIDE 3

What can we learn from elections?

THE BIG QUESTION:

Are white counties more Republican?

SLIDE 4

Are areas that vote more Republican in presidential elections...

SLIDE 5

...also whiter areas of the country?

SLIDE 6

The data

The datasets we're going to use for this lesson are a combination of two official sources:

  1. County-level election returns from the 2016 presidential election
  2. County-level demographic data from the US Census Bureau, accessed via the choroplethr package

SLIDE 7

The data

left_join(df_county_demographics, uspres_county, by = "county.fips")

  county.fips total_population percent_white percent_black percent_asian
1        1001            54907            76            18             1
2        1003           187114            83             9             1
3        1005            27321            46            46             0
4        1007            22754            75            22             0
5        1009            57623            88             1             0
6        1011            10746            22            71             0
  percent_hispanic per_capita_income median_rent median_age county.name
1                2             24571         668       37.5     autauga
2                4             26766         693       41.5     baldwin
3                5             16829         382       38.3     barbour
4                2             17427         351       39.4        bibb
5                8             20730         403       39.6      blount
6                6             18628         276       39.6     bullock
  state.name county.total.count     D    O     R    Dem.pct
1    alabama              24973  5936  865 18172 0.23769671
2    alabama              95215 18458 3874 72883 0.19385601
3    alabama              10469  4871  144  5454 0.46527844
4    alabama               8819  1874  207  6738 0.21249575
5    alabama              25588  2156  573 22859 0.08425825
6    alabama               4710  3530   40  1140 0.74946921
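To make the join concrete, here is a minimal sketch that uses base R's merge() with all.x = TRUE as a dependency-free stand-in for left_join(). The data frame names demographics and returns are hypothetical, the values are copied from the output above, and county 1005 is deliberately left out of the returns to show how a left join handles a missing match. It also checks that Dem.pct appears to be D divided by county.total.count, which is an inference from the numbers shown rather than something the slides state.

```r
# Demographic rows copied from the slide output (first three Alabama counties).
demographics <- data.frame(
  county.fips   = c(1001, 1003, 1005),
  percent_white = c(76, 83, 46)
)

# Election returns for two of those counties, also copied from the slide.
# County 1005 is omitted on purpose to show what happens to an unmatched row.
returns <- data.frame(
  county.fips        = c(1001, 1003),
  county.total.count = c(24973, 95215),
  D                  = c(5936, 18458),
  R                  = c(18172, 72883)
)

# all.x = TRUE keeps every row of the left table, like a left join.
merged <- merge(demographics, returns, by = "county.fips", all.x = TRUE)

# Assumption: Dem.pct is the Democratic vote count over the county total.
merged$Dem.pct <- merged$D / merged$county.total.count

print(merged)
```

The unmatched county keeps its demographic columns and gets NA for the election columns, which is one way missing values can end up in a merged table.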

SLIDE 8

Exploring relationships between data

Load the ggplot2 package:

library(ggplot2)

Visualize the relationship between percent_white and Dem.pct:

ggplot(county_merged, aes(x = percent_white, y = Dem.pct)) +
  geom_point()

SLIDE 9

SLIDE 10

Exploring relationships between data

Load the ggplot2 package:

library(ggplot2)

Visualize the relationship between percent_white and Dem.pct:

ggplot(county_merged, aes(x = percent_white, y = Dem.pct)) +
  geom_point()

Add a trend line:

ggplot(county_merged, aes(x = percent_white, y = Dem.pct)) +
  geom_point() +
  geom_smooth(method = "lm")

SLIDE 11

SLIDE 12

Now it's your turn!

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 13

Mapping The 2016 US Presidential Election

ANALYZING ELECTION AND POLLING DATA IN R

G. Elliott Morris

Data Journalist

SLIDE 14

Why maps? Why not?

Mapping can:

  • Display continuous or discrete data in a familiar way (most people can find where they live on a map)
  • Help analysts identify meaningful patterns in the data: is the South more Republican than the North?
  • Put other types of graphics, like scatterplots, into context

Mapping cannot:

  • Conduct statistical analysis!
  • Easily show relationships between two or more variables

SLIDE 15

Mapping data in R

SLIDE 16

Mapping data in R

Choices for mapping:

  • choroplethr: fast visualization, low customizability, comes with data
  • ggplot2 + geom_sf(): fast, customizable, requires you to supply the map data
  • leaflet: interactive, customizable, steep learning curve, requires you to supply the map data

SLIDE 17

Choroplethr

SLIDE 18

ggplot2 + geom_sf()

SLIDE 19

Leaflet

From https://rstudio.github.io/leaflet/choropleths.html

SLIDE 20

Mapping the 2016 election

... using choroplethr:

Load the packages:

library(choroplethr)
library(dplyr)  # provides %>% and rename()

Give the dataset its proper names:

county_map <- county_merged %>%
  dplyr::rename("region" = county.fips, "value" = Dem.pct)

Map!

county_choropleth(county_map)
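The renaming step matters because the mapping function reads its input from columns named region and value. The sketch below prepares such a table in base R and runs two sanity checks before mapping. The name county_merged_sample is hypothetical and its three rows are copied from the merged table shown earlier; the checks for duplicate regions and missing values are a suggested precaution, not something the slides prescribe.

```r
# Three rows copied from the slide's merged table (FIPS code and Dem.pct).
county_merged_sample <- data.frame(
  county.fips = c(1001, 1003, 1005),
  Dem.pct     = c(0.23769671, 0.19385601, 0.46527844)
)

# Keep only the two columns the map needs and give them the expected names.
county_map <- data.frame(
  region = county_merged_sample$county.fips,
  value  = county_merged_sample$Dem.pct
)

# Sanity checks before mapping: one row per county, no missing values.
stopifnot(anyDuplicated(county_map$region) == 0)
county_map <- county_map[!is.na(county_map$value), ]

str(county_map)
# With choroplethr installed, county_choropleth(county_map) would draw the map.
```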

SLIDE 21

A map of the 2016 presidential election

SLIDE 22

Your turn!

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 23

Linear Regression and Political Data

ANALYZING ELECTION AND POLLING DATA IN R

G. Elliott Morris

Data Journalist

SLIDE 24

Regression recap

  • Analyzes the relationship between two (or more) variables
  • Does so by fitting a "line of best fit" through the data
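As a rough illustration of what "line of best fit" means, the sketch below computes the least-squares slope and intercept by hand and compares them with lm(). The six (percent_white, Dem.pct) pairs are copied from the merged table earlier in the deck; six Alabama counties are far too few to say anything about the country, so this only shows the mechanics.

```r
# Six (percent_white, Dem.pct) pairs copied from the slide's merged table.
x <- c(76, 83, 46, 75, 88, 22)
y <- c(0.23769671, 0.19385601, 0.46527844,
       0.21249575, 0.08425825, 0.74946921)

# Ordinary least squares for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() minimizes the same sum of squared residuals, so it should agree.
fit <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```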

SLIDE 25

Election results and linear regression

Linear regression made easy: draw the line through the points that best fits the data:

SLIDE 26

Analyzing results with linear regression

fit <- lm(Dem.pct ~ percent_white, data = county_merged)
summary(fit)

Call:
lm(formula = Dem.pct ~ percent_white, data = county_merged)

Residuals:
     Min       1Q   Median       3Q      Max
-0.39987 -0.08303 -0.00903  0.07281  0.47761

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.6719046  0.0090408   74.32   <2e-16 ***
percent_white -0.0045684  0.0001123  -40.68   <2e-16 ***

SLIDE 27

Interpreting linear regression results

summary(fit)

Call:
lm(formula = Dem.pct ~ percent_white, data = county_merged)

Residuals:
     Min       1Q   Median       3Q      Max
-0.39987 -0.08303 -0.00903  0.07281  0.47761

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.6719046  0.0090408   74.32   <2e-16 ***
percent_white -0.0045684  0.0001123  -40.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1227 on 3097 degrees of freedom
  (44 observations deleted due to missingness)
Multiple R-squared:  0.3482,  Adjusted R-squared:  0.348
F-statistic:  1655 on 1 and 3097 DF,  p-value: < 2.2e-16
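One way to read these numbers: the intercept is the fitted Democratic share for a county that is 0 percent white, and the slope is the change in that share for each additional percentage point of white population. The sketch below plugs the reported coefficients into the fitted line; the helper predict_dem() is a hypothetical name introduced for illustration, and the coefficients are copied from the output above.

```r
# Coefficients copied from the summary(fit) output on the slide.
intercept <- 0.6719046
slope     <- -0.0045684

# Hypothetical helper: fitted Democratic share for a given percent_white.
predict_dem <- function(percent_white) intercept + slope * percent_white

# Fitted share for counties that are 50% and 90% white.
print(predict_dem(c(50, 90)))   # 0.4434846 0.2607486

# Change in fitted share for a 10-point increase in percent_white.
print(slope * 10)               # -0.045684
```

So a county that is ten points whiter is fitted to be about 4.6 points less Democratic, and the R-squared of 0.3482 says this single variable accounts for roughly 35 percent of the county-level variation in Dem.pct.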

SLIDE 28

Your Turn!

ANALYZING ELECTION AND POLLING DATA IN R

SLIDE 29

The 2016 UK Referendum to Leave the EU (AKA: Brexit)

ANALYZING ELECTION AND POLLING DATA IN R

G. Elliott Morris

Data Journalist

SLIDE 30

What was Brexit?

From https://www.bbc.com/news/uk-politics-32810887

SLIDE 31

The puzzle of Brexit

  1. Who was leading the polls? "Remain" versus "Leave"
  2. What happened? "Leave" won
  3. Why? Non-college educated UKIP voters vs Labour and establishment Tories

SLIDE 32

Brexit Polling

Polls showed Remain with a slight lead in the final days of the campaign

head(brexit_polls, 10)

      Date Remain Leave RemainLead
1  6/23/16     52    48          4
2  6/22/16     55    45         10
3  6/22/16     51    49          2
4  6/22/16     49    46          3
5  6/22/16     44    45         -1
6  6/22/16     54    46          8
7  6/22/16     48    42          6
8  6/22/16     41    43         -2
9  6/20/16     45    44          1
10 6/19/16     42    44         -2

SLIDE 33

Brexit Polling: Analysis

Option 1: Average all the polls in the last week of the campaign.

  • Conclusion: Remain will win

Option 2: LOESS smoothers (LOcally wEighted Scatter-plot Smoother) use local regression to estimate how a variable changes over time.

  • Conclusion: Remain's lead surged in the final week, but uncertainty around the outcome remained high
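Option 1 can be sketched with the ten polls shown in the deck, all of which fall in the final week before the 23 June 2016 vote. The name brexit_sample is hypothetical and the values are copied from the poll table; the full brexit_polls dataset is not reproduced here, so the result only describes these ten rows.

```r
# The ten polls shown on the slide (Date, Remain, Leave).
brexit_sample <- data.frame(
  Date   = c("6/23/16", "6/22/16", "6/22/16", "6/22/16", "6/22/16",
             "6/22/16", "6/22/16", "6/22/16", "6/20/16", "6/19/16"),
  Remain = c(52, 55, 51, 49, 44, 54, 48, 41, 45, 42),
  Leave  = c(48, 45, 49, 46, 45, 46, 42, 43, 44, 44)
)
brexit_sample$RemainLead <- brexit_sample$Remain - brexit_sample$Leave

# Parse the m/d/yy dates with base R and keep the final week before the vote.
brexit_sample$Date <- as.Date(brexit_sample$Date, format = "%m/%d/%y")
vote_day  <- as.Date("2016-06-23")
last_week <- brexit_sample[brexit_sample$Date > vote_day - 7, ]

avg_lead    <- mean(last_week$RemainLead)     # 2.9 for these ten polls
sd_lead     <- sd(last_week$RemainLead)       # spread across individual polls
leave_ahead <- sum(last_week$RemainLead < 0)  # number of polls with Leave ahead

print(c(avg_lead = avg_lead, sd_lead = sd_lead, leave_ahead = leave_ahead))
```

The average favors Remain, but three of the ten polls put Leave ahead and the spread between polls is larger than the average lead, which is why the simple average alone overstates how settled the race was.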

SLIDE 34

Brexit Polling: Visualization

library(ggplot2)
library(lubridate)  # provides mdy() for parsing dates like "6/23/16"

ggplot(brexit_polls, aes(x = mdy(Date), y = Remain - Leave)) +
  geom_point() +
  geom_smooth(method = "loess")
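To see what a LOESS smoother and its uncertainty band amount to numerically, here is a minimal base R sketch using stats::loess(). The poll leads below are simulated purely for illustration (they are not the brexit_polls data), and the band is an approximate 95 percent interval built from the standard errors that predict() returns, which is an assumption about how to summarize the uncertainty rather than a reproduction of the plot above.

```r
set.seed(42)

# Simulated data: 60 days of hypothetical poll leads with a slight upward drift.
day  <- 1:60
lead <- 2 + 0.05 * day + rnorm(60, sd = 4)

# Local regression of lead on day (default span and degree).
fit <- loess(lead ~ day)

# Smoothed trend plus standard errors at each day.
pred <- predict(fit, newdata = data.frame(day = day), se = TRUE)

# Approximate 95% band around the smoothed trend.
band <- data.frame(
  day   = day,
  trend = pred$fit,
  lower = pred$fit - 1.96 * pred$se.fit,
  upper = pred$fit + 1.96 * pred$se.fit
)

print(tail(band, 3))
```

If the lower edge of the band sits at or below zero on the final days, the smoothed lead is not distinguishable from a tie, which is the kind of uncertainty the polling average hides.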

SLIDE 35

Brexit Polling: Conclusion

  • Remain's lead was large and significant
  • However, it was not large enough to rule out a Leave victory, as many analysts did
  • This improper reading of uncertainty in data can lead to misguided understandings of the probability of different events

SLIDE 36

Why did Leave win?

Your job as the BBC's new chief election analyst on Brexit:

Re-use your skills in geom_smooth() and lm() to:

  • (1) explain the relationship between being a UKIP supporter and voting to Leave
  • (2) explore the relationship between college education and voting to Leave

SLIDE 37

Your turn!

ANALYZING ELECTION AND POLLING DATA IN R