Application of the Generalized Propensity Score. Evaluation of Public Contributions to Piedmont Enterprises
Michela Bia, Alessandra Mattei
Dipartimento di Statistica "G. Parenti", Università di Firenze
Project’s goals
Techniques based on the propensity score have long been used for causal inference in observational studies, to reduce the bias caused by non-random treatment assignment (Rosenbaum and Rubin, 1983b). The propensity score method is usually confined to binary treatments. In many cases of interest, however, the treatment takes on more than two values (e.g. a drug applied in different doses, or a treatment applied over different time periods).
We implement an extension of the propensity score method to a setting with a continuous treatment. The methodology is applied to the public contributions granted to Piedmont enterprises during the years 2001-2003. Because of the variety of funds set up by public policies, the treatment turns out to be a continuous variable. We are interested in the effect of different amounts of contribution on the employment level.
Our empirical study: economic supports to Piedmont Industry
This study covers all measures of financial support, basically grants and loans at special rates, in favour of enterprises in Piedmont between 2001 and 2003 (regional, given to regions, national and EU co-financed):
- Economic support to productive activities in depressed areas and economic support to investments (488/92 Industry, 266/97, L. 140/97, 341/95, 1329/65, 662/96)
- Economic support to investments for enterprises (DOCUP 2000-2006, Ob. 2 areas)
- Economic support to environmental protection (R.L. 598/94)
- Research and development, applied research (L. 297/99, D.M. 593/00)
The Equivalent Gross Subsidy computation
All data concerning loans at special rates are converted into an Equivalent Gross Subsidy (EGS) through the following computation:

$$X = \sum_{t=1}^{p} \frac{(mra - sra)\, fin}{(1 + mra)^{t}} + \sum_{t=p+1}^{N} \frac{mrent - srent}{(1 + mra)^{t}}$$

where:
- X is the estimated EGS of the financing (the net benefit to the enterprise);
- mra is the market rate;
- sra is the subsidized rate;
- p is the pre-amortization period;
- fin is the total financed amount;
- N is the financing term;
- mrent is the repayment instalment at the market rate;
- srent is the repayment instalment at the subsidized rate.
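As a rough illustration of the EGS computation, the sketch below evaluates the two discounted sums in Python. It assumes the reading of the formula in which the interest differential (mra - sra) * fin accrues during the first p periods and the instalment differential (mrent - srent) accrues from period p+1 to N. The function name and the loan figures are hypothetical and are not taken from the study.

```python
def equivalent_gross_subsidy(fin, mra, sra, p, n, mrent, srent):
    """Discounted benefit of a subsidized loan (illustrative sketch).

    fin: financed amount; mra / sra: market and subsidized rates per period;
    p: number of pre-amortization periods; n: financing term in periods;
    mrent / srent: per-period instalments at the market and subsidized rates.
    """
    # Periods 1..p: the benefit is the interest saved on the financed amount.
    grace = sum(fin * (mra - sra) / (1 + mra) ** t for t in range(1, p + 1))
    # Periods p+1..n: the benefit is the difference between the two instalments.
    repayment = sum((mrent - srent) / (1 + mra) ** t for t in range(p + 1, n + 1))
    return grace + repayment


# Hypothetical loan: 100000 euro, 5% market rate, 2% subsidized rate,
# 1 pre-amortization year followed by 4 years of repayment.
x = equivalent_gross_subsidy(100_000, 0.05, 0.02, 1, 5, 28_201.18, 26_262.38)
print(round(x, 2))
```

Both sums are discounted at the market rate, so X expresses the benefit in present-value terms.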
Basic framework
We consider a sample of units i = 1, 2, ..., N and, for each unit, a set of potential unit-level outcomes Yi(t) for t ∈ τ [SUTVA (Rubin, 1980a)].
- In the binary treatment case, τ = {0, 1}.
- In the continuous case, τ ⊂ [t0, t1].
We are interested in estimating average treatment effects, for example:
µ(t) - µ(t+∆t) = E[Yi(t)] - E[Yi(t+∆t)]
Weak unconfoundedness assumption (Hirano and Imbens, 2004):
Y(t) ⊥ T | X for all t ∈ τ
Generalized Propensity Score. Let r(t, x) be the conditional density function of the treatment given the covariates:
r(t, x) = fT|X(t | x)
The generalized propensity score is then R = r(T, X).
Balancing property
Balancing of pre-treatment variables given the generalized propensity score: within strata with the same value of r(t, X), the probability that T = t does not depend on the value of X:
X ⊥ 1{T = t} | r(t, X)
This property does not require unconfoundedness.
Weak unconfoundedness given the generalized propensity score (Hirano and Imbens, 2004): if the treatment assignment is weakly unconfounded given X, then
Y(t) ⊥ T | r(t, X) for all t ∈ τ
Van Dyk, Lu and Imbens: three approaches to implementing the GPS
Van Dyk and Imai (2003), Hirano and Imbens (2004) and Lu et al. (2001) develop methods that implement the generalized propensity score.
Van Dyk and Imai apply analysis techniques mostly based on sub-classification. They introduce the generalized propensity score through a propensity function:

$$r(T, X) = f_{\psi}(T \mid X)$$

where $\psi$ parameterizes this distribution. Van Dyk and Imai assume that $r(\cdot, X)$ depends on X only through a specific function $\theta_{\psi}(X)$, so that $\theta$ is sufficient for T. They compute $\hat{r}(t, X)$, based on the estimate $\hat{\psi}$, for each observation and sub-classify observations with the same or similar values of the gps into a number of sub-classes of equal size.
The average causal effect is a weighted average of the within-sub-class effects, with weights $W_s$ equal to the relative size of the sub-classes:

$$E[Y(t)] \approx \sum_{s=1}^{S} E[Y(t) \mid T = t, \hat{r}_{s}]\, W_{s}$$

In contrast to the sub-classification method, Lu et al. (2001) suggest matching pairs of units on $\hat{r}$. They propose a distance measure that decreases when the propensity scores become similar and the received treatments become dissimilar. The treatment effect can then be evaluated by examining the difference in response between the "high" and "low" treatment units. In the continuous case, matching procedures are more difficult than in the binary case, because the matched pairs should not only have similar $\hat{r}$ but also different treatment levels.
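The sub-classification estimator can be sketched as follows. This is only an illustration under stated assumptions: the within-sub-class effect is taken to be the slope of a linear regression of the outcome on the treatment, the sub-classes are formed by sorting on the estimated score, and the function name and simulated data are hypothetical.

```python
import numpy as np


def subclassified_effect(y, t, score, n_classes=5):
    """Average of within-sub-class slopes of y on t, weighted by sub-class size."""
    y, t, score = (np.asarray(a, dtype=float) for a in (y, t, score))
    order = np.argsort(score)
    # array_split yields sub-classes of (nearly) equal size along the sorted score.
    classes = np.array_split(order, n_classes)
    # Assumed within-sub-class effect: slope of a straight-line fit of y on t.
    slopes = np.array([np.polyfit(t[idx], y[idx], 1)[0] for idx in classes])
    # Weights W_s: relative size of each sub-class.
    weights = np.array([len(idx) for idx in classes]) / len(y)
    return float(np.sum(weights * slopes))


# Simulated example: x confounds both the treatment and the outcome,
# and is used directly as the score on which units are sub-classified.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
t = x + rng.normal(size=500)
y = 2.0 * t + x + rng.normal(size=500)
print(subclassified_effect(y, t, score=x, n_classes=5))
```

Using more sub-classes reduces the residual confounding within each class, at the cost of noisier within-class estimates.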
Hirano and Imbens' (2004) procedure for dose-response estimation is mostly based on regression on the propensity score. We apply it in our empirical study.
First, the GPS is estimated from the conditional distribution of the treatment variable given the covariates,

$$T_i \mid X_i \sim N\big(h(\beta, X_i), \sigma^{2}\big)$$

with the estimated GPS equal to

$$\widehat{gps}_i = \phi(T_i; X_i)$$

that is, the normal density with mean $h(\hat{\beta}, X_i)$ and variance $\hat{\sigma}^{2}$, evaluated at $T_i$. To check whether the specification is adequate, one can verify whether it balances the covariates.
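A minimal sketch of this estimation step is given below. It assumes a linear mean function h(β, X_i) = β_0 + X_i'β_1, for which ordinary least squares coincides with the maximum likelihood estimate of β. The function name and the simulated data are hypothetical, and this is not the code of gpscore.ado.

```python
import numpy as np


def estimate_gps(t, X):
    """Normal-model GPS with a linear mean (illustrative sketch)."""
    t = np.asarray(t, dtype=float)
    # Design matrix with an intercept; X may be one covariate or a matrix.
    Z = np.column_stack([np.ones(len(t)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(Z, t, rcond=None)  # OLS = ML estimate of beta
    resid = t - Z @ beta
    sigma2 = np.mean(resid**2)  # ML estimate of sigma^2 (divides by n)
    # Normal density with mean Z @ beta and variance sigma2, evaluated at t.
    gps = np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return beta, sigma2, gps


# Simulated example with two covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
t = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=300)
beta_hat, sigma2_hat, gps_hat = estimate_gps(t, X)
print(beta_hat, sigma2_hat, gps_hat[:3])
```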
Once an adequate specification is obtained, the conditional expectation of Y given T and the GPS is estimated:
i) β(t, r) = E[Y(t) | r(t, X) = r] = E[Y | T = t, R = r]
The dose-response function is then obtained by averaging this conditional expectation over the score r(t, X), evaluated at the treatment level t of interest:
ii) µ(t) = E[β(t, r(t, X))] = E[E[Y(t) | r(t, X)]] = E[Y(t)]
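Steps i) and ii) can be sketched as follows. The quadratic approximation of β(t, r) in t and r is an assumption of this sketch, as are the function names. The fitted means h(β̂, X_i) and the standard deviation σ̂ are taken as inputs, and in the simulated example they are simply set to their true values.

```python
import numpy as np


def normal_pdf(t, mean, sigma):
    """Normal density with the given mean and standard deviation, evaluated at t."""
    return np.exp(-0.5 * ((t - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def design(t, r):
    """Columns 1, t, t^2, r, r^2, t*r of the assumed quadratic approximation."""
    t, r = np.broadcast_arrays(np.asarray(t, dtype=float), np.asarray(r, dtype=float))
    return np.column_stack([np.ones(t.shape), t, t**2, r, r**2, t * r])


def dose_response(y, t, fitted_mean, sigma, grid):
    """Two-step estimate of mu(t) at each treatment level in grid."""
    y, t, fitted_mean = (np.asarray(a, dtype=float) for a in (y, t, fitted_mean))
    # Step i: regress Y on T and the GPS evaluated at the observed treatment.
    r_obs = normal_pdf(t, fitted_mean, sigma)
    coef, *_ = np.linalg.lstsq(design(t, r_obs), y, rcond=None)
    # Step ii: for each level, evaluate r(level, X_i) for every unit and average.
    mu = [
        np.mean(design(level, normal_pdf(level, fitted_mean, sigma)) @ coef)
        for level in grid
    ]
    return np.array(mu)


# Simulated example in which x confounds the treatment and the outcome.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
t = 1.0 + x + rng.normal(size=2000)
y = 1.0 + 2.0 * t + x + rng.normal(size=2000)
print(dose_response(y, t, fitted_mean=1.0 + x, sigma=1.0, grid=[0.0, 1.0, 2.0]))
```

Differences such as µ(t + ∆t) - µ(t) follow by evaluating the function at two levels and subtracting.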
The Program
The program gpscore.ado estimates the generalized propensity score and tests the balancing hypothesis according to the following algorithm:
1. Assume a normal distribution for the treatment given the covariates:

$$T_i \mid X_i \sim N\big(h(\beta, X_i), \sigma^{2}\big)$$

where β is the parameter vector, σ² is the variance, and $h(\beta, X_i)$ is a known function of the covariates that depends on β.
2. Estimate β and σ² by maximum likelihood.
3. Estimate the gps by applying the normal probability density function, evaluated at all observed values of T and X:

$$\widehat{gps}_i = \phi(T_i; X_i)$$
4. Test the balancing property:
1) Split the range of the treatment into k equally spaced intervals, where k is chosen by the user.
2) Within each treatment interval, calculate the mean or a percentile of the treatment and evaluate the gps at that level of T. Let tk,p be the chosen value of the treatment.
3) Split the range of the estimated gps into j equally spaced intervals, where j can be chosen arbitrarily.
4) Within each gps interval, compute for each covariate the difference between the mean for units with ti > tk,p and the mean for units with ti <= tk,p.
5) Combine the differences in means calculated in the previous step, weighted by the number of observations in each gps interval and then in each treatment interval.
6) If the test fails, the balancing property is not satisfied and one or more of the following alternatives can be tried:
a) specify a different propensity score model;
b) specify a different partition of the range of the estimated gps;
c) specify a different sub-classification of the treatment.
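Steps 3) to 5) can be sketched for a single treatment value tk,p and a single covariate as follows. The function name, the handling of gps intervals that contain only one of the two groups (they are skipped) and the simulated data are assumptions of this sketch. It does not reproduce the test statistics reported by gpscore.

```python
import numpy as np


def weighted_mean_difference(x, t, gps_at_tk, t_cut, n_bins=5):
    """Covariate mean difference (t > t_cut versus t <= t_cut) within equally
    spaced gps intervals, weighted by the number of units in each interval."""
    x, t, gps = (np.asarray(a, dtype=float) for a in (x, t, gps_at_tk))
    edges = np.linspace(gps.min(), gps.max(), n_bins + 1)
    # Interior edges give interval labels 0 .. n_bins - 1.
    bins = np.digitize(gps, edges[1:-1])
    high = t > t_cut
    total, used = 0.0, 0
    for b in range(n_bins):
        in_bin = bins == b
        hi, lo = in_bin & high, in_bin & ~high
        # Assumption: intervals without units in both groups are skipped.
        if hi.any() and lo.any():
            n_b = int(in_bin.sum())
            total += n_b * (x[hi].mean() - x[lo].mean())
            used += n_b
    return total / used if used else float("nan")


# Simulated example: the gps is evaluated at t_cut for every unit.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
t = 1.0 + x + rng.normal(size=1000)
t_cut = 1.0
gps_at_cut = np.exp(-0.5 * (t_cut - (1.0 + x)) ** 2) / np.sqrt(2 * np.pi)
print(weighted_mean_difference(x, t, gps_at_cut, t_cut, n_bins=5))
```

A weighted difference close to zero, relative to its standard error, is what the balancing property requires for that covariate.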
Syntax
gpscore is a regression-like command:

gpscore varlist [if exp] [in range] [fweight iweight pweight], gpscore(string) predict(string) sd(string) Cutpoints(varname numeric) index(string) nq_gps(numlist) [regression_type(string) Detail level(real 0.01)]
The economic supports to Piedmont Enterprises
The administrative data come from the ASIA archive (1996-2003). The data on the different types of funds assigned to enterprises are supplied by Finpiemonte, Mediocredito Bank. The final database is obtained by merging the contribution archives for each type of measure with the ASIA administrative data (2000-2003) and the Industry Census (2001) data, so as to have:
- business name;
- municipality and corporate address;
- industrial activity field (Ateco 2002);
- juridical classification;
- employees (mean by year, permanent and temporary, 2001-2003);
- grant concession and payment date (according to each law);
- subsidized financing (based on E.G.S computation for loans);
- company type (according to the number of employees and local unit localization)
- craft or non-craft enterprise.
Distribution assumption
According to the residual analysis of the model specification, the assumption of a normal distribution for the logarithm of the intervention given the covariates is suitable for small, medium and big companies:

$$\ln(t_i) \mid X_i \sim N(\beta_0 + \beta_1' X_i, \sigma^{2})$$

where the covariates Xi are:
- PROV = 8 binary variables (7 included in the model) denoting the province of the Piedmont enterprises in the analysis.
- NON_ART = binary variable denoting a non-craft enterprise (NON_ART = 1) or other (NON_ART = 0).
- UNILOC = binary variable denoting whether the corporate domicile of the enterprise is inside or outside the region.
- SETT = 8 binary variables denoting the type of manufacturing activity of the enterprise, according to the Ateco 2002 classification for ASIA_ISTAT data.
- APRE = (control) binary variable denoting whether the enterprise began its activity in any year after 2000.
- CHIUDE = (control) binary variable denoting whether the enterprise closed in any year after 2000.
[Figure: residuals versus fitted values for the logarithm of the contributions given the covariates (small enterprises)]
Small enterprises
******************************************************
End of the algorithm to estimate the generalized pscore
******************************************************

                   Mean   Standard
             Difference  Deviation    t-value    p-value
prov1            .00738     .02229     .33129     .74044
prov2           -.00239     .00863    -.27682     .78193
prov3           -.00296     .01256    -.23544     .81388
prov4           -.00134     .01389    -.09652     .92312
prov5           -.00081     .00963    -.08422     .93289
prov6            .00256     .01294     .19803     .84303
prov7           -.00053     .01033    -.05172     .95875
non_art2         .02287     .01907     1.1992     .23052
uniloc2         -.00727     .01652    -.43967      .6602
sett1           -.00098     .00576    -.17079     .86439
sett2           -.00134     .01081    -.12368     .90158
sett3            .00083     .01686     .04938     .96062
sett4           -.00061     .01563    -.03914     .96878
sett5            -.0063     .02094    -.30079     .76359
sett6            .00039     .01931     .02005     .98401
sett7            .00741     .01158     .63966     .52243
tot_add2000      .69131     .41585     1.6624     .09651
chiude         -4.1e-05     .00831    -.00493     .99606
apre                  0          0          .          .

The balancing property is satisfied
end

. sum pscore

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      pscore |  3943    .2821614    .1118859   .0003668   .3989423
The model specification for causal effect estimation
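$$\beta(t, r) = E[\Delta add\_03\_00 \mid t, r] = b_0 + b_1 t + b_2 \log(pscore) + b_3 t^{2} + b_4 (\log(pscore))^{2} + b_5 \log(pscore)\, t$$

where ∆add_03_00 is the change in the number of employees between 2000 and 2003.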
Estimation of µ(t): effect of contributions on ∆employment 2003-2000 for small enterprises
[Figures: dose-response derivatives µ(t+∆t) - µ(t) with 95% confidence bands; dose-response differences µ(t+50000) - µ(t) with 95% confidence bands]
For instance, if the treatment increased from 1000 euro to 51000 euro (1000 + 50000), the number of employees would increase by about 1.7; if the treatment increased from 30000 euro to 80000 euro (30000 + 50000), the number of employees would increase by about 0.66.
Medium enterprises
******************************************************
End of the algorithm to estimate the generalized pscore
******************************************************

                   Mean   Standard
             Difference  Deviation    t-value    p-value
prov1           -.10337     .04604    -2.2454     .02508
prov2           -.02224     .02311    -.96235     .33623
prov3            .05924     .03414     1.7352     .08317
prov4            .01056      .0341     .30972     .75687
prov5           -.00191     .01755    -.10887     .91334
prov6            .01471     .02474     .59464     .55229
prov7            .01867     .03426     .54504     .58591
non_art2        -.00162     .01155    -.13997     .88873
uniloc2         -.03366     .05455    -.61699     .53745
sett1           -.00329     .01194    -.27601     .78262
sett2            .00395     .02215     .17821     .85861
sett3            .01952     .04586     .42569     .67047
sett4           -.01082     .03533    -.30636     .75943
sett5            .01555     .04601     .33799     .73548
sett6           -.00513     .04987    -.10287     .91809
sett7           -.02252     .03583    -.62847     .52992
tot_add2000      2.7048     4.6869     .57709     .56408
chiude           .00358     .02069     .17323     .86253
apre                  0          0          .          .

The balancing property is satisfied
end

. sum pscore

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      pscore |   676    .2766309    .1058829    .000042   .3989407
The model specification for causal effect estimation
$$\beta(t, r) = E[\Delta add\_03\_00 \mid t, r] = b_0 + b_1 t + b_2 \log(pscore) + b_3 t^{2} + b_4 (\log(pscore))^{2} + b_5 \log(pscore)\, t$$
Estimation of µ(t): effect of contributions on ∆employment 2003-2000 for medium enterprises
[Figures: dose-response derivatives µ(t+∆t) - µ(t) with 95% confidence bands; dose-response differences µ(t+50000) - µ(t) with 95% confidence bands]
For instance, if the treatment increased from 50000 euro to 100000 euro (50000 + 50000), the number of employees would increase by about 3.7; if the treatment increased from 100000 euro to 150000 euro (100000 + 50000), the number of employees would increase by about 3.8.
Big enterprises
******************************************************
End of the algorithm to estimate the generalized pscore
******************************************************

[Table: mean difference, standard deviation, t-value and p-value of the balancing test for prov1-prov7, uniloc2, sett2-sett6, tot_add2000, chiude and apre; the numeric values are not legible in the source]

The balancing property is satisfied
end

. sum pscore

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      pscore |    63    .3011026    .1109757   .0004503   .3989423
The model specification for causal effect estimation
$$\beta(t, r) = E[\Delta add\_03\_00 \mid t, r] = b_0 + b_1 t + b_2 \log(pscore) + b_3 t^{2} + b_4 (\log(pscore))^{2} + b_5 \log(pscore)\, t$$
Estimation of µ(t): effect of contributions on ∆employment 2003-2000 for big enterprises
[Figures: dose-response derivatives µ(t+∆t) - µ(t) with 95% confidence bands; dose-response differences µ(t+50000) - µ(t) with 95% confidence bands]
According to the confidence bands of the derivatives, the marginal effects of the dose-response function are not significant; this result is confirmed by the dose-response differences.
Estimation of µ(t): effect of grant contributions only on ∆employment 2003-2000 for small enterprises
[Figures: dose-response derivatives µ(t+∆t) - µ(t) with 95% confidence bands; dose-response differences µ(t+50000) - µ(t) with 95% confidence bands]
For instance, if the treatment increased from 2000 euro to 52000 euro (2000 + 50000), the number of employees would increase by about 2 units; if the treatment increased from 50000 euro to 100000 euro (50000 + 50000), the number of employees would increase by about 0.8 units.
Conclusion and further research
- The role of the policy maker in managing economic interventions for industry has grown over the past few years. Empirical evidence is needed in order to evaluate interventions correctly and to design efficient programmes to support companies.
- Different specifications of the model for causal effect estimation will be tried in order to check the robustness of the ATE evaluation.
- Sensitivity analysis will be applied in estimating the causal effects of the interventions, also verifying the robustness of the results when the starting assumptions are removed.
- A multilogit normal model to elaborate (in the gpscore