Case study: election fraud (Inference for Categorical Data in R)

SLIDE 1

Case study: election fraud

INFERENCE FOR CATEGORICAL DATA IN R

Andrew Bray

Assistant Professor of Statistics at Reed College

SLIDE 2

Election fraud

Vote buying
Voting twice
Altering vote totals

The phrase election fraud can mean many things, including vote buying, casting two ballots in different locations, and stuffing ballot boxes with fake ballots. We're going to focus on a version of the third, when the vote ...

SLIDE 4

Benford’s Law A.K.A. "the first digit law"

library(dplyr)      # provides %>%, filter(), select()
library(gapminder)

gapminder %>%
  filter(year == 2007) %>%
  select(country, pop)

# A tibble: 142 x 2
   country           pop
   <fct>           <int>
 1 Afghanistan  31889923
 2 Albania       3600523
 3 Algeria      33333216
 4 Angola       12420476
 5 Argentina    40301927
 6 Australia    20434176
 7 Austria       8199783
 8 Bahrain        708573
 9 Bangladesh  150448339
10 Belgium      10392226
# … with 132 more rows
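
Benford's Law gives a leading digit d (1 through 9) the probability log10(1 + 1/d), so small leading digits are far more common than large ones. The base-R sketch below computes those probabilities and tabulates the leading digits of the ten populations printed above. `first_digit()` is a hypothetical helper written for this illustration; it does not appear in the slides.

```r
# Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1, ..., 9
p_benford <- log10(1 + 1 / (1:9))
names(p_benford) <- 1:9

# Hypothetical helper (not from the slides): leading digit of a positive
# whole number, read off its decimal representation
first_digit <- function(x) {
  as.integer(substr(sprintf("%.0f", x), 1, 1))
}

# Populations of the ten countries printed in the tibble above
pop <- c(31889923, 3600523, 33333216, 12420476, 40301927,
         20434176, 8199783, 708573, 150448339, 10392226)

# Benford probabilities, rounded for display
round(p_benford, 3)

# Observed leading-digit counts, keeping all nine digits as categories
table(factor(first_digit(pop), levels = 1:9))
```

Ten rows are far too few to judge a fit; the same two steps applied to all 142 countries would give a more meaningful comparison.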

SLIDE 5

Benford’s Law A.K.A. "the first digit law"

If the election was fair, then vote counts should follow Benford’s Law.
If the election was fraudulent, then vote counts should not follow Benford’s Law.

SLIDE 6

Iran election 2009

iran %>%
  select(city, ahmadinejad, mousavi, total_votes_cast)

# A tibble: 366 x 4
   city          ahmadinejad mousavi total_votes_cast
   <chr>               <dbl>   <dbl>            <dbl>
 1 Azar Shahr          37203   18312            56712
 2 Asko                32510   18799            52643
 3 Ahar                47938   26220            75500
 4 Bostan Abad         38610   12603            51911
 5 Bonab               36395   33695            71389
 6 Tabriz             435728  419983           876919
 7 Jalfa               20520   14340            35295
 8 Chahar o Imaq       12197    3975            16375
 9 Sarab               53196   17669            72152
10 Shabestar           37099   39182            77459
# … with 356 more rows

SLIDE 7

Let's practice!

SLIDE 8

Goodness of fit

SLIDE 9

First Digit Distribution

SLIDE 14

Chi-squared distance
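
The slide figures are not reproduced in this text. For reference, the standard chi-squared distance between observed and expected counts is written out below. The symbols are notation assumed here, not taken from the slides: O_i is the observed count in category i, E_i the count expected under the null, k the number of categories, n the total count, and p_i the null probability of category i.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n \, p_i
```

A value near zero means the observed counts sit close to what the null predicts; larger values mean a worse fit.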

SLIDE 24

First Digit Distribution

SLIDE 28

Example: uniformity of party

ggplot(gss2016, aes(x = party)) +
  geom_bar() +
  geom_hline(yintercept = 149/3, color = "goldenrod", size = 2)

tab <- gss2016 %>%
  select(party) %>%
  table()
tab

Dem Ind Rep
 43  72  34

p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)
chisq.test(tab, p = p_uniform)$stat

X-squared
 15.87919
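
The statistic reported above can be checked by hand from the three printed counts. The sketch below re-enters the counts as a plain named vector (an assumption made so the block runs without `gss2016`), computes the expected count of 149/3 per party, and sums the squared gaps scaled by the expected counts.

```r
# Counts copied from the printed table; gss2016 itself is not needed here
tab <- c(Dem = 43, Ind = 72, Rep = 34)
p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)

# Expected counts under the uniform null: 149 / 3 for each party
expected <- sum(tab) * p_uniform

# Chi-squared distance: squared gaps scaled by the expected counts
chi2 <- sum((tab - expected)^2 / expected)
chi2

# The same number from the built-in test, for comparison
unname(chisq.test(tab, p = p_uniform)$statistic)
```

Both lines should agree with the X-squared value printed on the slide, up to rounding.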

SLIDE 29

Simulating the null

gss2016 %>%
  specify(response = party) %>%
  hypothesize(null = "point", p = p_uniform) %>%
  generate(reps = 1, type = "simulate")

# A tibble: 149 x 2
# Groups:   replicate [1]
   party replicate
   <fct> <fct>
 1 I     1
 2 D     1
 3 I     1
 4 I     1
 5 D     1
 6 R     1
 7 I     1
 8 R     1
 9 D     1
10 I     1
# ... with 139 more rows

SLIDE 30

Simulating the null

sim_1 <- gss2016 %>%
  specify(response = party) %>%
  hypothesize(null = "point", p = p_uniform) %>%
  generate(reps = 1, type = "simulate")

ggplot(sim_1, aes(x = party)) +
  geom_bar()
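
A single simulated data set shows one draw from the null. Repeating the draw many times and recording the chi-squared statistic each time gives a null distribution to compare the observed statistic against. The sketch below does this in base R with `rmultinom()` rather than the infer pipeline on the slide, because `gss2016` is not reproduced here. The seed, the 2000 replicates, and the `chisq_stat()` helper are choices made for this illustration, not details from the slides.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Observed party counts copied from the earlier slide
observed <- c(Dem = 43, Ind = 72, Rep = 34)
p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)

# Hypothetical helper: chi-squared distance of counts from null proportions
chisq_stat <- function(counts, p) {
  expected <- sum(counts) * p
  sum((counts - expected)^2 / expected)
}

# Each column is one simulated sample of 149 respondents under the null
sims <- rmultinom(2000, size = sum(observed), prob = p_uniform)

# Chi-squared statistic for every simulated sample
null_stats <- apply(sims, 2, chisq_stat, p = p_uniform)

# Share of simulated statistics at least as large as the observed one
observed_stat <- chisq_stat(observed, p_uniform)
p_value <- mean(null_stats >= observed_stat)
p_value
```

A small share indicates that counts as uneven as the observed ones rarely arise when the three parties are equally likely.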

SLIDE 31

Let's practice!

SLIDE 32

And now to US

SLIDE 33

Iran election fraud

SLIDE 37

U.S.A. 2016 election

H_0: the election was fair (Benford’s Law holds)

H_A: the election was fraudulent (Benford’s Law does not hold)

SLIDE 38

Iowa vote totals

By TUBS [CC BY SA 3.0], from Wikimedia Commons

SLIDE 39

Iowa vote totals

iowa

# A tibble: 1,386 x 5
   office              candidate                         party              county votes
   <chr>               <chr>                             <chr>              <chr>  <dbl>
 1 President/Vice Pre… Evan McMullin / Nathan Johnson    Nominated by Peti… Adair     10
 2 President/Vice Pre… Under Votes                       NA                 Adair     32
 3 President/Vice Pre… Gary Johnson / Bill Weld          Libertarian        Adair    127
 4 President/Vice Pre… Over Votes                        NA                 Adair      5
 5 President/Vice Pre… Gloria La Riva / Dennis J. Banks  Socialism and Lib… Adair      0
 6 President/Vice Pre… Darrell L. Castle / Scott N. Bra… Constitution       Adair     10
 7 President/Vice Pre… Hillary Clinton / Tim Kaine       Democratic         Adair   1133
 8 President/Vice Pre… Jill Stein / Ajamu Baraka         Green              Adair     14
 9 President/Vice Pre… Rocky Roque De La Fuente / Micha… Nominated by Peti… Adair      3
10 President/Vice Pre… Donald Trump / Mike Pence         Republican         Adair   2461
# … with 1,376 more rows
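
To show the mechanics of testing vote counts against Benford's Law, the sketch below runs a goodness of fit test on the leading digits of the ten `votes` values printed above. It is an illustration under loud assumptions: only those ten rows are used, the zero count is dropped because it has no leading digit from 1 to 9, and the p-value is simulated with an arbitrary seed and 2000 replicates. Nothing about the full Iowa data should be concluded from it.

```r
# Vote counts copied from the ten rows printed above (illustration only)
votes <- c(10, 32, 127, 5, 0, 10, 1133, 14, 3, 2461)

# Assumption: drop zero counts, which have no leading digit in 1..9
votes <- votes[votes > 0]

# Leading digit of each remaining count
first_digit <- as.integer(substr(sprintf("%.0f", votes), 1, 1))
digit_counts <- table(factor(first_digit, levels = 1:9))

# Null proportions from Benford's Law
p_benford <- log10(1 + 1 / (1:9))

# Goodness of fit test; the p-value is simulated because the expected
# counts are far too small for the chi-squared approximation
set.seed(2016)  # arbitrary seed
gof <- chisq.test(digit_counts, p = p_benford,
                  simulate.p.value = TRUE, B = 2000)

gof$statistic
gof$p.value
```

With the full data, the same steps would apply to whichever vote totals are chosen for the analysis.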

SLIDE 40

Let's practice!

SLIDE 41

Election fraud in Iran and Iowa: debrief

SLIDE 42

Iowa election fraud

SLIDE 50

Take-home lesson

The statistical tool must be appropriate for the task.

By TUBS [CC BY SA 3.0], from Wikimedia Commons
By P30Carl [GFDL] or [CC BY SA 3.0], from Wikimedia Commons

SLIDE 51

Methods for categorical data

Confidence intervals
  One proportion
  Difference in proportions

Hypothesis tests
  One proportion
  Difference in proportions
  Test of independence
  Goodness of fit

SLIDE 58

Let's practice!