Estimating and Using Propensity Score in Presence of Missing - - PowerPoint PPT Presentation


Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing. Alessandra Mattei, Dipartimento di Statistica “G. Parenti”, Università degli Studi di Firenze


SLIDE 1

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Alessandra Mattei

Dipartimento di Statistica “G. Parenti”, Università degli Studi di Firenze
mattei@ds.unifi.it

SLIDE 2

Outline

  • 1. Motivation of the study
  • 2. Estimating causal effects through a quasi-experimental approach
  • 3. Estimating propensity scores with incomplete data
  • 4. Estimating the causal effects of childbearing on economic wellbeing in Indonesia using the Indonesia Family Life Survey (IFLS)
  • 5. Concluding remarks
SLIDE 3

Motivation of the Study

  • We compare three different approaches to handling missing background data in the estimation and use of propensity scores:
  • 1. A complete-case analysis
  • 2. A pattern-mixture model based approach developed by Rosenbaum and Rubin (1984)
  • 3. A multiple imputation approach
  • We make explicit the assumptions underlying each approach by illustrating the interaction between the treatment assignment mechanism and the missing data mechanism
  • We apply these methods to assess the impact of childbearing events on individuals’ wellbeing in Indonesia, using a sample of women from the Indonesia Family Life Survey

SLIDE 4

The Quasi-Experimental Approach

  • We use appropriate econometric techniques based on longitudinal micro data in order to identify the causal effects of childbearing events on poverty
  • We consider the endogenous variable of interest, here change in fertility, as the treatment variable Z, which divides individuals into two groups:
    – those who experienced a childbirth (the treatment group, indicated by Z = T), and
    – those who did not (the control group, indicated by Z = C)
  • The outcome variable, say Y, is a measure of wellbeing
  • Strong Ignorability Assumption (Rosenbaum and Rubin, 1983)
    (i) Z is independent of the potential outcomes (Y(C), Y(T)) conditional on X = x (Unconfoundedness Assumption)
    (ii) η < Pr(Z = T | X = x) < 1 − η, for some η > 0

SLIDE 5

The Unconfoundedness Assumption

The unconfoundedness assumption requires that all variables that affect both the outcome and the likelihood of receiving the treatment are observed, or that all the others are perfectly collinear with the observed ones

  • This assumption is not testable; it is a very strong assumption, and one that need not generally be applicable
  • Selection may also take place on the basis of unobservable characteristics

We view it as a useful starting point for two reasons:

  • 1. In our study, we have carefully investigated which variables are most likely to confound any comparison between treated and control units
  • 2. Any alternative assumption that does not rely on unconfoundedness, while allowing for consistent estimation of the causal effects of interest, must make alternative untestable assumptions

SLIDE 6

The Propensity Score

  • The propensity score is the conditional probability of receiving a particular treatment (Z = T) versus control (Z = C) given a vector of observed covariates X:

    e = e(X) = Pr(Z = T | X)

  • Balancing of pre-treatment variables given the propensity score
    If e(X) is the propensity score, then Z ⊥ X | e(X)
  • Unconfoundedness given the propensity score
    Z ⊥ (Y(C), Y(T)) | X  ⇒  Z ⊥ (Y(C), Y(T)) | e(X)
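In practice, e(X) can be estimated with a logistic regression of the treatment indicator on the covariates; a minimal sketch on simulated data (illustrative variable names, not the IFLS sample):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated covariates X and a binary treatment Z whose probability
# depends on X (illustrative data only)
n = 500
X = rng.normal(size=(n, 3))
true_logit = 0.5 * X[:, 0] - 0.8 * X[:, 1]
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# e(X) = Pr(Z = T | X), estimated by logistic regression
model = LogisticRegression().fit(X, Z)
e_hat = model.predict_proba(X)[:, 1]  # one score per unit, in (0, 1)
```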

SLIDE 7

Notation

Let the response indicator be

    Rik = 1, if the value of the kth covariate for the ith subject is observed
    Rik = 0, if the value of the kth covariate for the ith subject is missing

for i = 1, . . . , N and k = 1, . . . , K. Let X = (Xobs, Xmis), where Xobs = {Xik : Rik = 1} and Xmis = {Xik : Rik = 0}
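In code, the response indicator follows directly from the pattern of missing entries; a small sketch using NaN to mark the values in Xmis:

```python
import numpy as np

# Toy covariate matrix; np.nan marks the missing entries (Xmis)
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Rik = 1 if Xik is observed, 0 if it is missing
R = (~np.isnan(X)).astype(int)
```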

SLIDE 8

Estimating Propensity Score with Incomplete Data

  • It is not clear how the propensity score should be estimated when some covariate values are missing
  • The missingness itself may be predictive of which treatment is received
  • Any technique for estimating the propensity score in the presence of missing covariate data must either make a stronger assumption regarding ignorability of the assignment mechanism or make an assumption about the missing data mechanism
  • In order to have ignorability of the assignment mechanism, for all of the techniques described here, we maintain the following assumption:

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X, R)

SLIDE 9

Complete-Data Analysis

  • A complete-data analysis uses only the observations for which all variables are observed
  • To make valid causal inferences with this approach we require that the data are Missing Completely At Random (MCAR; Little and Rubin, 1987):

    Pr(R | X, Z) = Pr(R)

    – This means that the units removed from the data set, those with missing data, are just a simple random sample of the others
  • Note that

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X, R)  and  Pr(R | X, Z) = Pr(R)
        ⇓
    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X)
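Operationally, a complete-data analysis just drops every unit with any missing covariate before the propensity score is estimated; a sketch on toy data:

```python
import numpy as np

# Toy covariates with missing entries and a treatment indicator
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
Z = np.array([1, 0, 1, 0])

# Keep only units with every covariate observed (listwise deletion)
complete = ~np.isnan(X).any(axis=1)
X_cc, Z_cc = X[complete], Z[complete]
```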

SLIDE 10

Rosenbaum - Rubin Approach

The Propensity Scores with Incomplete Data

  • The generalized propensity score, which conditions on all of the observed covariate information, is

    e∗ = e∗(Xobs, R) = Pr(Z = T | Xobs, R)

  • Balancing of pre-treatment variables given the generalized propensity score

    Z ⊥ (Xobs, R) | e∗(Xobs, R)

  • Unconfoundedness given the generalized propensity score

    Z ⊥ (Y(C), Y(T)) | (Xobs, R)  ⇒  Z ⊥ (Y(C), Y(T)) | e∗(Xobs, R)
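One common way to estimate a score that conditions on (Xobs, R) is to zero-fill the missing entries and add the response indicators as regressors in the logistic model; this encoding is an assumption of the sketch, not necessarily the exact parametrization behind e∗ in the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))

# Make the second covariate missing for roughly 20% of units
miss = rng.random(n) < 0.2
X[miss, 1] = np.nan

# e* = Pr(Z = T | Xobs, R): zero-fill Xmis and include R as regressors
R = (~np.isnan(X)).astype(float)
design = np.hstack([np.nan_to_num(X, nan=0.0), R])
e_star = LogisticRegression().fit(design, Z).predict_proba(design)[:, 1]
```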

SLIDE 11

Rosenbaum - Rubin Approach

Assumptions

The Rosenbaum-Rubin method relies on either one of the following assumptions:

    Pr(Z | X, R) ≡ Pr(Z | Xobs, Xmis, R) = Pr(Z | Xobs, R)

or

    Pr(Y(C), Y(T) | X, R) ≡ Pr(Y(C), Y(T) | Xobs, Xmis, R) = Pr(Y(C), Y(T) | Xobs, R)

  • The Rosenbaum-Rubin method does not make any assumption about the missing data mechanism

SLIDE 12

Rosenbaum - Rubin Approach

Drawbacks

  • The Rosenbaum-Rubin method does assume that either
    – all missing covariate values are independent of the assignment mechanism conditional on the missing data patterns, or
    – they are independent of the potential outcomes conditional on the observed covariate values and the missing data patterns
  • Since the Rosenbaum-Rubin method specifies one model for both handling missing data and estimating propensity scores, it is not possible to incorporate the outcome variable Y into this model, even though it might provide useful information about missing values

SLIDE 13

Multiple Imputation and Propensity Score Methods The latent ignorability of the assignment mechanism

Using Multiple Imputation (MI) to handle incomplete covariate data, we essentially assume latent ignorability of the assignment mechanism:

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X)

  • In our case, the assignment mechanism is ignorable only conditional on complete covariate data (which includes, of course, values that in practice are missing)
  • Computationally, this is achieved by filling in the missing covariate values using MI

SLIDE 14

Multiple Imputation and Propensity Score Methods Assumptions on the assignment mechanism

  • Imputations may in principle be created under any kind of model for the missing data mechanism, and the resulting inferences will be valid under that mechanism (Rubin, 1987)
  • In our application, MI was performed assuming that the missing observations are Missing At Random (MAR), that is,

    Pr(R | X, Z, Y(C), Y(T)) = Pr(R | Xobs, Z, Yobs)

    where Yobs = (Yobs,1, . . . , Yobs,n) and Yobs,i = I{Zi = T} Yi(T) + I{Zi = C} Yi(C)
    – This MAR assumption involves all the observed variables
    – In our application, we perform MI in two ways:
      ∗ including Y in the imputation model, and
      ∗ not including Y in the imputation model
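As a minimal stand-in for the MI step, stochastic regression imputation repeated m times: fit the incomplete covariate on the observed one using the complete cases, then draw each imputation with fresh residual noise. This only sketches the chained-equations idea; it is not the mvis module:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 300, 5

# Fully observed covariate x1 and a covariate x2 with ~25% missing
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 0.25
x2_obs = x2.copy()
x2_obs[miss] = np.nan

# Fit x2 ~ x1 on the complete cases only
cc = ~miss
beta = np.polyfit(x1[cc], x2_obs[cc], 1)
resid_sd = np.std(x2_obs[cc] - np.polyval(beta, x1[cc]))

# Draw m completed datasets, each with fresh residual noise
imputations = []
for _ in range(m):
    filled = x2_obs.copy()
    filled[miss] = (np.polyval(beta, x1[miss])
                    + rng.normal(scale=resid_sd, size=miss.sum()))
    imputations.append(filled)
```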

SLIDE 15

Multiple Imputation and Propensity Score Methods Estimators

Let ATTl and se2l denote the point estimate and the variance, respectively, from the lth (l = 1, . . . , m) completed dataset. Then

    ATT = (1/m) Σl ATTl

    Var(ATT) = se2W + (1 + 1/m) se2B

where

    se2W = (1/m) Σl se2l                  (within-imputation variance)
    se2B = (1/(m−1)) Σl (ATTl − ATT)²     (between-imputation variance)

  • In our application, MI was performed using the mvis module in Stata (Patrick Royston, 2004), which is based on the MICE method of multiple multivariate imputation (van Buuren et al., 1999)
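These combining rules are easy to check on toy numbers; the per-imputation ATT estimates and squared standard errors below are made up for illustration:

```python
import numpy as np

# Hypothetical per-imputation ATT estimates and squared standard errors
att = np.array([-24.8, -26.1, -25.0, -27.3, -25.9])
se2 = np.array([390.0, 402.0, 385.0, 410.0, 398.0])
m = len(att)

att_bar = att.mean()                              # pooled ATT point estimate
se2_W = se2.mean()                                # within-imputation variance
se2_B = ((att - att_bar) ** 2).sum() / (m - 1)    # between-imputation variance
var_att = se2_W + (1 + 1 / m) * se2_B             # Rubin's total variance
```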

SLIDE 16

Matching Estimators of the ATT Effect based on the Propensity Score

  • The Nearest Neighbor Matching Estimator
  • The Kernel Matching Estimator
  • The Stratification Matching Estimator

Irrespective of the method of handling missing data, the propensity score analysis is implemented using the pscore module in Stata written by Becker and Ichino (2000)
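A nearest-neighbor matching estimator of the ATT can be sketched in a few lines: each treated unit is matched, with replacement, to the control whose estimated propensity score is closest, and the ATT is the mean outcome gap over the matched pairs (toy scores and outcomes below):

```python
import numpy as np

# Toy estimated propensity scores and outcomes
e_t = np.array([0.62, 0.71, 0.55])            # treated units
y_t = np.array([150.0, 140.0, 160.0])
e_c = np.array([0.30, 0.58, 0.69, 0.74])      # control units
y_c = np.array([200.0, 170.0, 155.0, 145.0])

# For each treated unit, index of the closest control on the score
matches = np.abs(e_t[:, None] - e_c[None, :]).argmin(axis=1)

# ATT: average outcome difference between treated units and their matches
att = (y_t - y_c[matches]).mean()
```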

SLIDE 17

The Indonesia Family Life Survey Data

  • The IFLS consists of three waves (1993, 1997, 2000) plus a special wave (1998), which we will not use in our study
  • We use a subsample of panel ever-married women aged 15-49
  • In our study the outcome variable is a measure of monetary wellbeing, given by the annual value of total household consumption expenditures, adjusted for price variability across space and time and for household heterogeneity
    – Adjustment for price variability
      ∗ We divided the nominal consumption expenditures by the national consumption price index (IFS, 2002)
    – Adjustment for household heterogeneity
      ∗ We adjust our income-based measure of wellbeing for household heterogeneity by applying the following equivalence scale: the total number of persons in the household
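The two adjustments amount to two divisions; with made-up numbers (thousands of Rupiah):

```python
# Toy numbers: deflate nominal expenditure by the price index, then
# equivalise by the number of household members (values are illustrative)
nominal = 1200.0       # annual nominal consumption expenditure
price_index = 1.25     # national consumption price index (base = 1)
household_size = 4     # equivalence scale: persons in the household

real = nominal / price_index          # price-adjusted expenditure
equivalised = real / household_size   # per-person (equivalised) measure
```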
SLIDE 18

The Outcome Variable

Descriptive statistics of total net equivalised household consumption expenditures in 2000 (Rupiah∗ in thousands), by number of live births

Live births               Obs    Mean     S.d.     Median
0                         3024   194.084  211.816  136.539
1                         948    163.026  168.507  119.842
2                         128    151.812  195.366  118.244
3                         7      199.538  129.990  127.870
At least one live birth   1083   161.936  171.604  119.827

∗ 9,064.54 Rupiah = 1 US $

  • Note that 161.936 − 194.084 = −32.148
SLIDE 19

Self-Selection of the Treated Units

  • We observe that women who experience a childbearing event and women who do not are very different in almost all their characteristics (details are omitted)
  • Systematic differences between the treatment group and the control group can also occur in the distribution of the missing covariate data
    – 10.7% of the units in the sample present at least one missing covariate value

SLIDE 20

Self-Selection of the Treated Units

Missing-value indicators (proportion observed)

Covariate                          Z = C   Z = T   |Difference| (%)
Deprivation Index                  0.930   0.919   1.1
Education level of HH head         0.999   1.000   0.1
Yrs of schooling of the HH head    0.995   0.994   0.1
Education level                    0.999   1.000   0.1
Yrs of schooling                   0.997   0.995   0.2
Activity last week                 0.998   1.000   0.2
Age at first marriage              0.985   0.993   0.7
Islam                              0.996   0.997   0.1
Parents in HH                      0.998   1.000   0.2
Years since the last live birth    0.987   0.987   0.0
Pregnant                           1.000   0.999   0.1
Ever used contraceptives           0.999   0.999   0.0
Use of contraceptives              0.998   0.997   0.1
Total                              0.104   0.113   0.8

SLIDE 21

Propensity Score Models for IFLS Data

Standardized differences (in %) and percent reduction in bias for propensity scores, before and after matching, using each approach to the missing covariates problem in combination with Nearest Neighbor, Gaussian Kernel, and Stratification propensity score matching

                                  Results after matching
                 Initial   Nearest Neighbor     Kernel               Stratification
Missing Data     Stand.    Stand.   Red. in     Stand.   Red. in     Stand.   Red. in
Approaches       Diff.(%)  Diff.(%) Bias (%)    Diff.(%) Bias (%)    Diff.(%) Bias (%)
Complete-Data    140.4     0.1      99.9        7.7      94.5        18.8     86.6
Rosenbaum-Rubin  143.2     0.1      99.9        8.0      94.4        21.9     84.7
MI (without Y)   143.1     −0.1     100.1       7.2      95.0        21.8     84.7
MI (with Y)      143.5     −0.1     100.1       7.3      94.9        20.5     85.5

SLIDE 22

Treatment Effects Estimation

Complete-Data Analysis

Matching Method     NT    NC     ATT       S.E.    t-value
Nearest Neighbor    961   532    −49.773   17.338  −2.871
Kernel              961   2387   −37.670   15.126  −2.490
Stratification      961   2387   −29.990   13.615  −2.203

  • The complete-case analysis gives quite high average treatment effects and quite high standard errors
  • It appears to be very sensitive to the choice of the matching method
  • In our application, the MCAR assumption does not appear plausible; it is more reasonable to believe that the missing data mechanism is either Missing At Random (MAR) or nonignorable

SLIDE 23

Treatment Effects Estimation

Rosenbaum-Rubin Model

Matching Method     NT     NC     ATT       S.E.    t-value
Nearest Neighbor    1082   580    −20.583   14.211  −1.448
Kernel              1082   2670   −28.827   14.005  −2.058
Stratification      1082   2670   −28.563   13.527  −2.112

  • With respect to the complete-data analysis, the Rosenbaum-Rubin model appears to be more robust to the choice of the matching method
  • It yields lower average treatment effects and lower standard errors
  • It does not produce an excellent balance in the distribution of the estimated propensity score

SLIDE 24

Treatment Effects Estimation

Multiple Imputation (without Y)

Matching Method     NT     NC       ATT       S.E.    t-value
Nearest Neighbor    1083   565.1    −24.781   19.830  −1.250
Kernel              1083   2638.4   −31.948   13.896  −2.299
Stratification      1083   2638.4   −26.940   12.942  −2.082

Multiple Imputation (with Y)

Matching Method     NT     NC       ATT       S.E.    t-value
Nearest Neighbor    1083   569.1    −25.655   18.896  −1.358
Kernel              1083   2636.5   −31.840   14.741  −2.160
Stratification      1083   2636.5   −26.213   12.906  −2.031
SLIDE 25

Advantages of the MI Techniques

  • The two imputation models outperform both of the other approaches in terms of robustness of the estimates to the choice of the matching method
  • By using different models for imputation and for the propensity score, the MI approach allows one to incorporate features in one model that might be inappropriate for the other
  • MI makes the choice of the propensity score model easier
  • The MI approach allows for final analyses of the outcomes (such as covariance adjustment) which include covariates that are not fully observed
SLIDE 26

Concluding Remarks

  • We compared missing-completely-at-random based estimates of the propensity scores and of the causal effect of interest with estimators based on alternative models for the missing data process:
    – A pattern-mixture model based approach developed by Rosenbaum and Rubin (1984)
    – A combination of propensity score matching with MI
  • We judged the plausibility of these alternative approaches by the balance that the resulting propensity score models produced and the estimates they yielded
  • In our application, the MI models appear to outperform both the complete-data analysis and the Rosenbaum-Rubin method
  • The combination of propensity score matching with MI we chose shows evidence that childbearing events reduce consumption levels