Estimating and Using Propensity Score in Presence of Missing - - PowerPoint PPT Presentation


Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing. Alessandra Mattei, Dipartimento di Statistica “G. Parenti”, Università degli Studi di Firenze


SLIDE 1

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Alessandra Mattei

Dipartimento di Statistica “G. Parenti”, Università degli Studi di Firenze
mattei@ds.unifi.it

SLIDE 2

Outline

  • 1. Motivation of the study
  • 2. Estimating causal effects through a quasi-experimental approach
  • 3. Estimating propensity scores with incomplete data
  • 4. Estimating the causal effects of childbearing on economic wellbeing in Indonesia using the Indonesia Family Life Survey (IFLS)
  • 5. Concluding remarks
SLIDE 3

Motivation of the Study

  • We compare three different approaches to handling missing background data in the estimation and use of propensity scores:
  • 1. A complete-case analysis
  • 2. A pattern-mixture model based approach developed by Rosenbaum and Rubin (1984)
  • 3. A multiple imputation approach
  • We make explicit the assumptions underlying each approach by illustrating the interaction between the treatment assignment mechanism and the missing data mechanism
  • We apply these methods to assess the impact of childbearing events on individuals’ wellbeing in Indonesia, using a sample of women from the Indonesia Family Life Survey

SLIDE 4

The Quasi-Experimental Approach

  • We use appropriate econometric techniques based on longitudinal micro data in order to identify the causal effects of childbearing events on poverty
  • We consider the endogenous variable of interest, here change in fertility, as the treatment variable Z, which divides individuals into two groups:
    – those who experienced a childbirth (the treatment group, indicated by Z = T), and
    – those who did not (the control group, indicated by Z = C)
  • The outcome variable, say Y, is a measure of wellbeing
  • Strong Ignorability Assumption (Rosenbaum and Rubin, 1983)
    (i) Z is independent of the potential outcomes (Y(C), Y(T)) conditional on X = x (Unconfoundedness Assumption)
    (ii) η < Pr(Z = T | X = x) < 1 − η, for some η > 0

SLIDE 5

The Unconfoundedness Assumption

The unconfoundedness assumption requires that all variables that affect both the outcome and the likelihood of receiving the treatment are observed, or that all the others are perfectly collinear with the observed ones

  • This assumption is not testable; it is a very strong assumption, and one that need not generally be applicable
  • Selection may also take place on the basis of unobservable characteristics

We view it as a useful starting point for two reasons:

  • 1. In our study, we have carefully investigated which variables are most likely to confound any comparison between treated and control units
  • 2. Any alternative assumption that does not rely on unconfoundedness, while allowing for consistent estimation of the causal effects of interest, must make alternative untestable assumptions

SLIDE 6

The Propensity Score

  • The propensity score is the conditional probability of receiving a particular treatment (Z = T) versus control (Z = C) given a vector of observed covariates X:

    e = e(X) = Pr(Z = T | X)

  • Balancing of pre-treatment variables given the propensity score
    If e(X) is the propensity score, then Z ⊥ X | e(X)
  • Unconfoundedness given the propensity score
    Z ⊥ (Y(C), Y(T)) | X  ⇒  Z ⊥ (Y(C), Y(T)) | e(X)
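In practice, e(X) can be estimated with a logistic regression of the treatment indicator on the covariates; a minimal sketch on simulated data (illustrative variable names, not the IFLS sample):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated covariates X and a binary treatment Z whose probability
# depends on X (illustrative data only)
n = 500
X = rng.normal(size=(n, 3))
true_logit = 0.5 * X[:, 0] - 0.8 * X[:, 1]
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# e(X) = Pr(Z = T | X), estimated by logistic regression
model = LogisticRegression().fit(X, Z)
e_hat = model.predict_proba(X)[:, 1]  # one score per unit, in (0, 1)
```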

SLIDE 7

Notation

Let the response indicator be

    Rik = 1, if the value of the kth covariate for the ith subject is observed
    Rik = 0, if the value of the kth covariate for the ith subject is missing

for i = 1, . . . , N and k = 1, . . . , K. Let X = (Xobs, Xmis), where Xobs = {Xik : Rik = 1} and Xmis = {Xik : Rik = 0}
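In code, the response indicator follows directly from the pattern of missing entries; a small sketch using NaN to mark the values in Xmis:

```python
import numpy as np

# Toy covariate matrix; np.nan marks the missing entries (Xmis)
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Rik = 1 if Xik is observed, 0 if it is missing
R = (~np.isnan(X)).astype(int)
```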

SLIDE 8

Estimating Propensity Score with Incomplete Data

  • It is not clear how the propensity score should be estimated when some covariate values are missing
  • The missingness itself may be predictive of which treatment is received
  • Any technique for estimating the propensity score in the presence of missing covariate data must either make a stronger assumption regarding ignorability of the assignment mechanism or make an assumption about the missing data mechanism
  • In order to have ignorability of the assignment mechanism, for all of the techniques described here, we maintain the following assumption:

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X, R)

SLIDE 9

Complete-Data Analysis

  • A complete-data analysis uses only the observations for which all variables are observed
  • To make valid causal inferences with this approach we require that the data are Missing Completely At Random (MCAR; Little and Rubin, 1987):

    Pr(R | X, Z) = Pr(R)

    – This means that the units removed from the data set, those with missing data, are just a simple random sample of the others
  • Note that

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X, R)  and  Pr(R | X, Z) = Pr(R)
        ⇓
    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X)
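Operationally, a complete-data analysis just drops every unit with any missing covariate before the propensity score is estimated; a sketch on toy data:

```python
import numpy as np

# Toy covariates with missing entries and a treatment indicator
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
Z = np.array([1, 0, 1, 0])

# Keep only units with every covariate observed (listwise deletion)
complete = ~np.isnan(X).any(axis=1)
X_cc, Z_cc = X[complete], Z[complete]
```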

SLIDE 10

Rosenbaum - Rubin Approach

The Propensity Scores with Incomplete Data

  • The generalized propensity score, which conditions on all of the observed covariate information, is

    e∗ = e∗(Xobs, R) = Pr(Z = T | Xobs, R)

  • Balancing of pre-treatment variables given the generalized propensity score

    Z ⊥ (Xobs, R) | e∗(Xobs, R)

  • Unconfoundedness given the generalized propensity score

    Z ⊥ (Y(C), Y(T)) | (Xobs, R)  ⇒  Z ⊥ (Y(C), Y(T)) | e∗(Xobs, R)
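One common way to estimate a score that conditions on (Xobs, R) is to zero-fill the missing entries and add the response indicators as regressors in the logistic model; this encoding is an assumption of the sketch, not necessarily the exact parametrization behind e∗ in the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))

# Make the second covariate missing for roughly 20% of units
miss = rng.random(n) < 0.2
X[miss, 1] = np.nan

# e* = Pr(Z = T | Xobs, R): zero-fill Xmis and include R as regressors
R = (~np.isnan(X)).astype(float)
design = np.hstack([np.nan_to_num(X, nan=0.0), R])
e_star = LogisticRegression().fit(design, Z).predict_proba(design)[:, 1]
```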

SLIDE 11

Rosenbaum - Rubin Approach

Assumptions

The Rosenbaum-Rubin method relies on either one of the following assumptions:

    Pr(Z | X, R) ≡ Pr(Z | Xobs, Xmis, R) = Pr(Z | Xobs, R)

or

    Pr(Y(C), Y(T) | X, R) ≡ Pr(Y(C), Y(T) | Xobs, Xmis, R) = Pr(Y(C), Y(T) | Xobs, R)

  • The Rosenbaum-Rubin method does not make any assumption about the missing data mechanism

SLIDE 12

Rosenbaum - Rubin Approach

Drawbacks

  • The Rosenbaum-Rubin method does assume that either
    – all missing covariate values are independent of the assignment mechanism conditional on the missing data patterns, or
    – they are independent of the potential outcomes conditional on the observed covariate values and the missing data patterns
  • Since the Rosenbaum-Rubin method specifies one model for both handling missing data and estimating propensity scores, it is not possible to incorporate the outcome variable Y into this model, even though it might provide useful information about missing values

SLIDE 13

Multiple Imputation and Propensity Score Methods The latent ignorability of the assignment mechanism

Using Multiple Imputation (MI) to handle incomplete covariate data, we essentially assume latent ignorability of the assignment mechanism:

    Pr(Z | X, R, Y(C), Y(T)) = Pr(Z | X)

  • In our case, the assignment mechanism is ignorable only conditional on complete covariate data (which includes, of course, values that in practice are missing)
  • Computationally, this is achieved by filling in the missing covariate values using MI

SLIDE 14

Multiple Imputation and Propensity Score Methods Assumptions on the assignment mechanism

  • Imputations may in principle be created under any kind of model for the missing data mechanism, and the resulting inferences will be valid under that mechanism (Rubin, 1987)
  • In our application, MI was performed assuming that the missing observations are Missing At Random (MAR), that is,

    Pr(R | X, Z, Y(C), Y(T)) = Pr(R | Xobs, Z, Yobs)

    where Yobs = (Yobs,1, . . . , Yobs,n) and Yobs,i = I{Zi = T} Yi(T) + I{Zi = C} Yi(C)
    – This MAR assumption involves all the observed variables
    – In our application, we perform MI in two ways:
      ∗ including Y in the imputation model, and
      ∗ not including Y in the imputation model
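As a minimal stand-in for the MI step, stochastic regression imputation repeated m times: fit the incomplete covariate on the observed one using the complete cases, then draw each imputation with fresh residual noise. This only sketches the chained-equations idea; it is not the mvis module:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 300, 5

# Fully observed covariate x1 and a covariate x2 with ~25% missing
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 0.25
x2_obs = x2.copy()
x2_obs[miss] = np.nan

# Fit x2 ~ x1 on the complete cases only
cc = ~miss
beta = np.polyfit(x1[cc], x2_obs[cc], 1)
resid_sd = np.std(x2_obs[cc] - np.polyval(beta, x1[cc]))

# Draw m completed datasets, each with fresh residual noise
imputations = []
for _ in range(m):
    filled = x2_obs.copy()
    filled[miss] = (np.polyval(beta, x1[miss])
                    + rng.normal(scale=resid_sd, size=miss.sum()))
    imputations.append(filled)
```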

SLIDE 15

Multiple Imputation and Propensity Score Methods Estimators

Let ATTl and se2l denote the point estimate and the variance, respectively, from the lth (l = 1, . . . , m) completed dataset. Then

    ATT = (1/m) Σl ATTl

    Var(ATT) = se2W + (1 + 1/m) se2B

where

    se2W = (1/m) Σl se2l                  (within-imputation variance)
    se2B = (1/(m−1)) Σl (ATTl − ATT)²     (between-imputation variance)

  • In our application, MI was performed using the mvis module in Stata (Patrick Royston, 2004), which is based on the MICE method of multiple multivariate imputation (van Buuren et al., 1999)
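These combining rules are easy to check on toy numbers; the per-imputation ATT estimates and squared standard errors below are made up for illustration:

```python
import numpy as np

# Hypothetical per-imputation ATT estimates and squared standard errors
att = np.array([-24.8, -26.1, -25.0, -27.3, -25.9])
se2 = np.array([390.0, 402.0, 385.0, 410.0, 398.0])
m = len(att)

att_bar = att.mean()                              # pooled ATT point estimate
se2_W = se2.mean()                                # within-imputation variance
se2_B = ((att - att_bar) ** 2).sum() / (m - 1)    # between-imputation variance
var_att = se2_W + (1 + 1 / m) * se2_B             # Rubin's total variance
```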

SLIDE 16

Matching Estimators of the ATT Effect based on the Propensity Score

  • The Nearest Neighbor Matching Estimator
  • The Kernel Matching Estimator
  • The Stratification Matching Estimator

Irrespective of the method of handling missing data, the propensity score analysis is implemented using the pscore module in Stata written by Becker and Ichino (2000)
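A nearest-neighbor matching estimator of the ATT can be sketched in a few lines: each treated unit is matched, with replacement, to the control whose estimated propensity score is closest, and the ATT is the mean outcome gap over the matched pairs (toy scores and outcomes below):

```python
import numpy as np

# Toy estimated propensity scores and outcomes
e_t = np.array([0.62, 0.71, 0.55])            # treated units
y_t = np.array([150.0, 140.0, 160.0])
e_c = np.array([0.30, 0.58, 0.69, 0.74])      # control units
y_c = np.array([200.0, 170.0, 155.0, 145.0])

# For each treated unit, index of the closest control on the score
matches = np.abs(e_t[:, None] - e_c[None, :]).argmin(axis=1)

# ATT: average outcome difference between treated units and their matches
att = (y_t - y_c[matches]).mean()
```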

SLIDE 17

The Indonesia Family Life Survey Data

  • The IFLS consists of three waves (1993, 1997, 2000) plus a special wave (1998), which we will not use in our study
  • We use a subsample of panel ever-married women aged 15-49
  • In our study the outcome variable is a measure of monetary wellbeing, given by the annual value of total household consumption expenditures, adjusted for price variability across space and time and for household heterogeneity
    – Adjustment for price variability
      ∗ We divided the nominal consumption expenditures by the national consumption price index (IFS, 2002)
    – Adjustment for household heterogeneity
      ∗ We adjust our income-based measure of wellbeing for household heterogeneity by applying the following equivalence scale: the total number of persons in the household
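The two adjustments amount to two divisions; with made-up numbers (thousands of Rupiah):

```python
# Toy numbers: deflate nominal expenditure by the price index, then
# equivalise by the number of household members (values are illustrative)
nominal = 1200.0       # annual nominal consumption expenditure
price_index = 1.25     # national consumption price index (base = 1)
household_size = 4     # equivalence scale: persons in the household

real = nominal / price_index          # price-adjusted expenditure
equivalised = real / household_size   # per-person (equivalised) measure
```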
SLIDE 18

The Outcome Variable

Descriptive statistics of total net equivalised household consumption expenditures in 2000 (Rupiah∗ in thousands), by number of live births

Live births               Obs    Mean     S.d.     Median
0                         3024   194.084  211.816  136.539
1                         948    163.026  168.507  119.842
2                         128    151.812  195.366  118.244
3                         7      199.538  129.990  127.870
At least one live birth   1083   161.936  171.604  119.827

∗ 9,064.54 Rupiah = 1 US $

  • Note that 161.936 − 194.084 = −32.148
SLIDE 19

Self-Selection of the Treated Units

  • We observe that women who experience a childbearing event and women who do not are very different in almost all their characteristics (details are omitted)
  • Systematic differences between the treatment group and the control group can also occur in the distribution of the missing covariate data
    – 10.7% of the units in the sample present at least one missing covariate value

SLIDE 20

Self-Selection of the Treated Units

Missing-value indicators (proportion observed)

Covariate                          Z = C   Z = T   |Difference| (%)
Deprivation Index                  0.930   0.919   1.1
Education level of HH head         0.999   1.000   0.1
Yrs of schooling of the HH head    0.995   0.994   0.1
Education level                    0.999   1.000   0.1
Yrs of schooling                   0.997   0.995   0.2
Activity last week                 0.998   1.000   0.2
Age at first marriage              0.985   0.993   0.7
Islam                              0.996   0.997   0.1
Parents in HH                      0.998   1.000   0.2
Years since the last live birth    0.987   0.987   0.0
Pregnant                           1.000   0.999   0.1
Ever used contraceptives           0.999   0.999   0.0
Use of contraceptives              0.998   0.997   0.1
Total                              0.104   0.113   0.8

SLIDE 21

Propensity Score Models for IFLS Data

Standardized differences (in %) and percent reduction in bias for propensity scores, before and after matching, using each approach to the missing covariates problem in combination with Nearest Neighbor, Gaussian Kernel, and Stratification propensity score matching

                                  Results after matching
                 Initial   Nearest Neighbor     Kernel               Stratification
Missing Data     Stand.    Stand.   Red. in     Stand.   Red. in     Stand.   Red. in
Approaches       Diff.(%)  Diff.(%) Bias (%)    Diff.(%) Bias (%)    Diff.(%) Bias (%)
Complete-Data    140.4     0.1      99.9        7.7      94.5        18.8     86.6
Rosenbaum-Rubin  143.2     0.1      99.9        8.0      94.4        21.9     84.7
MI (without Y)   143.1     −0.1     100.1       7.2      95.0        21.8     84.7
MI (with Y)      143.5     −0.1     100.1       7.3      94.9        20.5     85.5

SLIDE 22

Treatment Effects Estimation

Complete-Data Analysis

Matching Method     NT    NC     ATT       S.E.    t-value
Nearest Neighbor    961   532    −49.773   17.338  −2.871
Kernel              961   2387   −37.670   15.126  −2.490
Stratification      961   2387   −29.990   13.615  −2.203

  • The complete-case analysis gives quite high average treatment effects and quite high standard errors
  • It appears to be very sensitive to the choice of the matching method
  • In our application, the MCAR assumption does not appear plausible; it is more reasonable to believe that the missing data mechanism is either Missing At Random (MAR) or nonignorable

SLIDE 23

Treatment Effects Estimation

Rosenbaum-Rubin Model

Matching Method     NT     NC     ATT       S.E.    t-value
Nearest Neighbor    1082   580    −20.583   14.211  −1.448
Kernel              1082   2670   −28.827   14.005  −2.058
Stratification      1082   2670   −28.563   13.527  −2.112

  • With respect to the complete-data analysis, the Rosenbaum-Rubin model appears to be more robust to the choice of the matching method
  • It yields lower average treatment effects and lower standard errors
  • It does not produce an excellent balance in the distribution of the estimated propensity score

SLIDE 24

Treatment Effects Estimation

Multiple Imputation (without Y)

Matching Method     NT     NC       ATT       S.E.    t-value
Nearest Neighbor    1083   565.1    −24.781   19.830  −1.250
Kernel              1083   2638.4   −31.948   13.896  −2.299
Stratification      1083   2638.4   −26.940   12.942  −2.082

Multiple Imputation (with Y)

Matching Method     NT     NC       ATT       S.E.    t-value
Nearest Neighbor    1083   569.1    −25.655   18.896  −1.358
Kernel              1083   2636.5   −31.840   14.741  −2.160
Stratification      1083   2636.5   −26.213   12.906  −2.031
SLIDE 25

Advantages of the MI Techniques

  • The two imputation models outperform both of the other approaches in terms of robustness of the estimates to the choice of the matching method
  • By using different models for imputation and for the propensity score, the MI approach allows one to incorporate features in one model that might be inappropriate for the other
  • MI makes the choice of the propensity score model easier
  • The MI approach allows for final analyses of the outcomes (such as covariance adjustment) which include covariates that are not fully observed
SLIDE 26

Concluding Remarks

  • We compared missing-completely-at-random based estimates of the propensity scores and of the causal effect of interest with estimators based on alternative models for the missing data process:
    – A pattern-mixture model based approach developed by Rosenbaum and Rubin (1984)
    – A combination of propensity score matching with MI
  • We judged the plausibility of these alternative approaches by the balance that the resulting propensity score models produced and the estimates they yielded
  • In our application, the MI models appear to outperform both the complete-data analysis and the Rosenbaum-Rubin method
  • The combination of propensity score matching with MI we chose shows evidence that childbearing events reduce consumption levels