SLIDE 1

Gov 51: Missing Data

Matthew Blackwell

Harvard University

1 / 7

SLIDE 2

Civilian attitudes and war against insurgency

  • War in Afghanistan: counter-insurgency war
  • Military against insurgents
  • Key to victory: winning hearts and minds of civilians
  • Aid provision, information campaign, minimizing civilian casualties
  • How does exposure to violence affect support for Taliban, coalition?

2 / 7

SLIDE 7

Afghan study

afghan <- read.csv("data/afghan.csv")
head(afghan[, 1:8])
##   province     district village.id age educ.years
## 1    Logar Baraki Barak         80  26         10
## 2    Logar Baraki Barak         80  49          3
## 3    Logar Baraki Barak         80  60          0
## 4    Logar Baraki Barak         80  34         14
## 5    Logar Baraki Barak         80  21         12
## 6    Logar Baraki Barak         80  18         10
##   employed       income violent.exp.ISAF
## 1        0 2,001-10,000                0
## 2        1 2,001-10,000                0
## 3        1 2,001-10,000                1
## 4        1 2,001-10,000                0
## 5        1 2,001-10,000                0
## 6        1         <NA>                0

3 / 7

SLIDE 8

Missing data

  • Nonresponse: respondent can’t or won’t answer question.
  • Sensitive questions ⇝ social desirability bias
  • Some countries lack official statistics like unemployment.
  • Leads to missing data.
  • Missing data in R: a special value NA
  • Causes problems with calculating statistics:

## prop. of those who got hurt by ISAF
mean(afghan$violent.exp.ISAF)
## [1] NA
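
The same behavior can be reproduced without the survey file. A minimal sketch, assuming a made-up 0/1 vector `hurt` that is not part of the Afghan data:

```r
# Hypothetical toy responses: 1 = harmed, 0 = not harmed, NA = no answer
hurt <- c(0, 1, NA, 0, 1, NA, 0, 0)

# Arithmetic that involves an NA returns NA, so the mean is NA
m <- mean(hurt)
m

# is.na() flags the missing entries; summing the flags counts them
n_missing <- sum(is.na(hurt))
n_missing
```

Here `m` is NA and `n_missing` is 2, because two of the eight toy answers are missing.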

4 / 7

SLIDE 16

Handling missing data in R

  • Adding na.rm = TRUE to some functions drops missing values before computing.

mean(afghan$violent.exp.ISAF, na.rm = TRUE)
## [1] 0.375

  • Or, you can explicitly remove missing values using the na.omit() function:

mean(na.omit(afghan$violent.exp.ISAF))
## [1] 0.375

  • Add NA to table() with exclude = NULL:

table(ISAF = afghan$violent.exp.ISAF, exclude = NULL)
## ISAF
##    0    1 <NA>
## 1706 1023   25
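
The three tools can be compared side by side on a small example. This is a sketch on a made-up vector `hurt` (an assumption, not the survey data); `useNA = "ifany"` is a standard `table()` argument shown here as an alternative to `exclude = NULL`:

```r
# Hypothetical toy responses: 1 = harmed, 0 = not harmed, NA = no answer
hurt <- c(0, 1, NA, 0, 1, NA, 0, 0)

# Both approaches drop the two NAs and average the six observed values
m1 <- mean(hurt, na.rm = TRUE)
m2 <- mean(na.omit(hurt))
all.equal(m1, m2)

# Ask table() to report NA as its own category
tab <- table(hurt, useNA = "ifany")
tab
```

Both means equal 2/6, and the table counts four 0s, two 1s, and two NAs.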

5 / 7

SLIDE 25

Available-case vs complete-case analysis

  • Available-case analysis: use the data you have for that variable:

sum(!is.na(afghan$violent.exp.ISAF))
## [1] 2729
mean(afghan$violent.exp.ISAF, na.rm = TRUE)
## [1] 0.375

  • Complete-case analysis: only use units that have data on all variables
  • Also called listwise deletion

dim(na.omit(afghan))
## [1] 2554   11
mean(na.omit(afghan)$violent.exp.ISAF)
## [1] 0.372
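
The difference between the two approaches is easier to see on a few rows. A minimal sketch, assuming a made-up data frame `toy` whose column names are illustrative only:

```r
# Hypothetical toy survey: each column has one missing answer, in different rows
toy <- data.frame(
  age      = c(25, 40, NA, 33, 51),
  employed = c(1, NA, 0, 1, 1),
  hurt     = c(0, 1, 1, NA, 0)
)

# Available-case: each variable uses every row where that variable is observed
n_available    <- sum(!is.na(toy$hurt))
mean_available <- mean(toy$hurt, na.rm = TRUE)

# Complete-case (listwise deletion): keep only rows with no NA in any column
complete      <- toy[complete.cases(toy), ]
n_complete    <- nrow(complete)
mean_complete <- mean(complete$hurt)
```

In this toy case, four rows are available for `hurt` (mean 0.5), but only two rows are complete (mean 0), so listwise deletion discards usable answers and can shift the estimate.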

6 / 7

SLIDE 36

Non-response and other biases

  • Nonresponse can create bias.
  • More violent areas ⇝ more non-response:

tapply(is.na(afghan$violent.exp.taliban), afghan$province, mean)
## Helmand   Khost   Kunar   Logar Uruzgan
## 0.03041 0.00635 0.00000 0.00000 0.06202
tapply(is.na(afghan$violent.exp.ISAF), afghan$province, mean)
## Helmand   Khost   Kunar   Logar Uruzgan
## 0.01637 0.00476 0.00000 0.00000 0.02067

  • ⇝ oversampling citizens with less exposure to violence.
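
To illustrate how this kind of nonresponse can bias an estimate, here is a sketch on a made-up population of 20 people in two hypothetical provinces, "A" and "B". The nonresponse pattern is an assumption chosen for illustration, not something estimated from the Afghan data:

```r
# Hypothetical toy population: true exposure is known for everyone
province <- rep(c("A", "B"), each = 10)
true_exp <- c(rep(1, 6), rep(0, 4),   # province A: 60% exposed
              rep(1, 2), rep(0, 8))   # province B: 20% exposed

# Assumed nonresponse: three exposed respondents in A decline to answer
observed <- true_exp
observed[1:3] <- NA

# Share of missing answers by province, same tapply() pattern as above
miss_rate <- tapply(is.na(observed), province, mean)

# Compare the true exposure rate with the available-case estimate
true_mean <- mean(true_exp)
obs_mean  <- mean(observed, na.rm = TRUE)
```

Here the missing-answer rate is 0.3 in A and 0 in B, and the available-case mean (5/17) falls below the true rate (8/20) because the people who did not answer were all exposed.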

7 / 7
