Tobit and Selection Models Manuel Arellano CEMFI January 2014

Censored Regression

Illustration 1: Top-coding in wages

- Suppose $Y$ (log wages) is subject to "top coding" (as with social security records):
  $$Y = \begin{cases} Y^* & \text{if } Y^* \le c \\ c & \text{if } Y^* > c \end{cases}$$
- Suppose we are interested in $E(Y^*)$. Effectively it is not identified, but if we assume $Y^* \sim N(\mu, \sigma^2)$, then $\mu$ can be determined from the distribution of $Y$.
- The density of $Y$ is of the form
  $$f(r) = \begin{cases} \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{r-\mu}{\sigma}\right) & \text{if } r < c \\[2ex] \Pr(Y^* \ge c) = 1 - \Phi\!\left(\dfrac{c-\mu}{\sigma}\right) & \text{if } r \ge c \end{cases}$$
- The likelihood function of the sample $\{y_1, \ldots, y_N\}$ is
  $$L(\mu, \sigma^2) = \prod_{y_i < c} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\mu}{\sigma}\right) \prod_{y_i = c} \left[1 - \Phi\!\left(\frac{c-\mu}{\sigma}\right)\right].$$
- Usually, we shall be interested in a regression version of this model:
  $$Y^* \mid X = x \sim N(x'\beta, \sigma^2),$$
  in which case the likelihood takes the form
  $$L(\beta, \sigma^2) = \prod_{y_i < c} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \prod_{y_i = c} \left[1 - \Phi\!\left(\frac{c - x_i'\beta}{\sigma}\right)\right].$$
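The top-coded likelihood above can be evaluated directly. The following is a minimal sketch in log form, using only the Python standard library; the function names (`censored_loglik`, `censored_regression_loglik`) are illustrative and not part of the slides, and observations equal to `c` are treated as censored.

```python
import math


def norm_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def censored_loglik(y, c, mu, sigma):
    """Log of the top-coded likelihood: density terms for y_i < c,
    and ln[1 - Phi((c - mu) / sigma)] for each censored observation."""
    total = 0.0
    for yi in y:
        if yi < c:
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
        else:
            total += math.log(1.0 - norm_cdf((c - mu) / sigma))
    return total


def censored_regression_loglik(y, x, c, beta, sigma):
    """Regression version: mu is replaced by the index x_i' beta."""
    total = 0.0
    for yi, xi in zip(y, x):
        index = sum(b * v for b, v in zip(beta, xi))
        total += censored_loglik([yi], c, index, sigma)
    return total
```

Maximizing either function over its parameters (with any numerical optimizer) would give the maximum likelihood estimates; the optimizer is left out to keep the sketch short.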

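Means of censored normal variables

- Consider the following right-censored variable:
  $$Y = \begin{cases} Y^* & \text{if } Y^* \le c \\ c & \text{if } Y^* > c \end{cases}$$
  with $Y^* \sim N(\mu, \sigma^2)$. Therefore,
  $$E(Y) = E(Y^* \mid Y^* \le c)\Pr(Y^* \le c) + c\,\Pr(Y^* > c).$$
- Letting $Y^* = \mu + \sigma\varepsilon$ with $\varepsilon \sim N(0, 1)$,
  $$\Pr(Y^* \le c) = \Phi\!\left(\frac{c-\mu}{\sigma}\right)$$
  $$E(Y^* \mid Y^* \le c) = \mu + \sigma E\!\left(\varepsilon \,\Big|\, \varepsilon \le \frac{c-\mu}{\sigma}\right) = \mu - \sigma\lambda\!\left(\frac{c-\mu}{\sigma}\right),$$
  where $\lambda(r) = \phi(r)/\Phi(r)$.
- Note that, since $\phi'(e) = -e\,\phi(e)$,
  $$E(\varepsilon \mid \varepsilon \le r) = \int_{-\infty}^{r} e\,\frac{\phi(e)}{\Phi(r)}\,de = -\frac{1}{\Phi(r)}\int_{-\infty}^{r} \phi'(e)\,de = -\frac{\phi(r)}{\Phi(r)} = -\lambda(r)$$
  and
  $$E(\varepsilon \mid \varepsilon > r) = \int_{r}^{\infty} e\,\frac{\phi(e)}{\Phi(-r)}\,de = -\frac{1}{\Phi(-r)}\int_{r}^{\infty} \phi'(e)\,de = \frac{\phi(r)}{\Phi(-r)} = \lambda(-r).$$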
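As a quick numerical companion to the formulas above, here is a small sketch that computes $\lambda(r) = \phi(r)/\Phi(r)$ and the mean of the right-censored variable. The function names are illustrative, and the ratio is computed naively, so it is only meant for moderate values of `r` (for very negative `r` the denominator underflows).

```python
import math


def norm_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def inverse_mills(r):
    """lambda(r) = phi(r) / Phi(r), so that E(eps | eps <= r) = -lambda(r)."""
    return norm_pdf(r) / norm_cdf(r)


def mean_right_censored(mu, sigma, c):
    """E(Y) for Y = min(Y*, c) with Y* ~ N(mu, sigma^2)."""
    a = (c - mu) / sigma
    mean_below = mu - sigma * inverse_mills(a)  # E(Y* | Y* <= c)
    return mean_below * norm_cdf(a) + c * (1.0 - norm_cdf(a))
```

For example, with `mu = 0`, `sigma = 1` and `c = 0` the function returns $-\phi(0) \approx -0.399$, and as `c` grows the result approaches `mu`, since censoring then almost never binds.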

Illustration 2: Censoring at zero (Tobit model)

- Tobin (1958) considered the following model for expenditure on durables:
  $$Y = \max(X'\beta + U,\ 0)$$
  $$U \mid X \sim N(0, \sigma^2).$$
- This is similar to the first example, but now we have left-censoring at zero.
- However, the nature of the application is very different because there is no physical censoring (the variable $Y^* = X'\beta + U$ is just a model's construct).
- We are interested in the model as a way of capturing a particular form of nonlinearity in the relationship between $X$ and $Y$.
- In a utility-based model, the variable $Y^*$ might be interpreted as a notional demand before non-negativity is imposed.
- With censoring at zero we have
  $$Y = \begin{cases} Y^* & \text{if } Y^* > 0 \\ 0 & \text{if } Y^* \le 0 \end{cases}$$
  $$E(Y) = E(Y^* \mid Y^* > 0)\Pr(Y^* > 0)$$
  and, writing $Y^* = \mu + \sigma\varepsilon$ as before (here $\mu = X'\beta$),
  $$\Pr(Y^* > 0) = \Pr\!\left(\varepsilon > -\frac{\mu}{\sigma}\right) = \Phi\!\left(\frac{\mu}{\sigma}\right)$$
  $$E(Y^* \mid Y^* > 0) = \mu + \sigma E\!\left(\varepsilon \,\Big|\, \varepsilon > -\frac{\mu}{\sigma}\right) = \mu + \sigma\lambda\!\left(\frac{\mu}{\sigma}\right).$$
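To make the "particular form of nonlinearity" concrete, the sketch below evaluates $E(Y) = \Phi(\mu/\sigma)\,[\mu + \sigma\lambda(\mu/\sigma)]$ and approximates its slope in $\mu$ by a central difference. The slope comes out as $\Phi(\mu/\sigma)$, a standard property of the Tobit mean that is not stated on the slide; function names and the step size are illustrative choices.

```python
import math


def norm_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def tobit_mean(mu, sigma):
    """E(Y) for Y = max(Y*, 0) with Y* ~ N(mu, sigma^2)."""
    a = mu / sigma
    lam = norm_pdf(a) / norm_cdf(a)  # lambda(mu / sigma)
    return norm_cdf(a) * (mu + sigma * lam)


def tobit_mean_slope(mu, sigma, step=1e-5):
    """Central-difference approximation to dE(Y)/dmu."""
    upper = tobit_mean(mu + step, sigma)
    lower = tobit_mean(mu - step, sigma)
    return (upper - lower) / (2.0 * step)
```

With `mu = X'beta`, the slope shows that the effect of the index on the expected outcome is scaled by the probability of a positive outcome, so it is small where censoring is frequent and close to one where it is rare.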

Heckman's generalized selection model

- Consider the model
  $$y^* = x'\beta + \sigma u$$
  $$d = 1(z'\gamma + v \ge 0)$$
  $$\begin{pmatrix} u \\ v \end{pmatrix} \Big|\, z \sim N\!\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$
  so that
  $$v \mid z, u \sim N(\rho u,\ 1 - \rho^2) \quad \text{or} \quad \Pr(v \le r \mid z, u) = \Phi\!\left(\frac{r - \rho u}{\sqrt{1-\rho^2}}\right).$$
- In Heckman's original model, $y^*$ denotes female log market wage and $d$ is an indicator of participation in the labor force.
- The index $\{z'\gamma + v\}$ is a reduced form of the difference between market wage and reservation wage.

Joint likelihood function

- The joint log likelihood is
  $$L = \sum_{d=1} \ln\{p(d=1, y^* \mid z)\} + \sum_{d=0} \ln \Pr(d=0 \mid z),$$
  where
  $$p(d=1, y^* \mid z) = \Pr(d=1 \mid z, y^*)\, f(y^* \mid z)$$
  $$f(y^* \mid z) = \frac{1}{\sigma}\,\phi\!\left(\frac{y^* - x'\beta}{\sigma}\right)$$
  $$\Pr(d=1 \mid z, y^*) = 1 - \Pr(v \le -z'\gamma \mid z, u) = 1 - \Phi\!\left(\frac{-z'\gamma - \rho u}{\sqrt{1-\rho^2}}\right) = \Phi\!\left(\frac{z'\gamma + \rho u}{\sqrt{1-\rho^2}}\right).$$
- Thus
  $$L(\gamma, \beta, \sigma, \rho) = \sum_{d=1}\left\{\ln\!\left[\frac{1}{\sigma}\,\phi(u)\right] + \ln\Phi\!\left(\frac{z'\gamma + \rho u}{\sqrt{1-\rho^2}}\right)\right\} + \sum_{d=0} \ln\left[1 - \Phi(z'\gamma)\right],$$
  where $u = (y^* - x'\beta)/\sigma$.
- Note that if $\rho = 0$ this log likelihood boils down to the sum of a Gaussian linear regression log likelihood and a probit log likelihood.
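The log likelihood above translates almost line by line into code. This is a minimal sketch under an assumed data layout (a list of tuples `(d, y, x, z)`, with `y` ignored when `d == 0`); the layout and the function name `heckman_loglik` are illustrative, not from the slides.

```python
import math


def norm_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def heckman_loglik(data, beta, gamma, sigma, rho):
    """Joint log likelihood of the selection model.

    data: list of (d, y, x, z); x and z are lists of regressors and
    y is only used for observations with d == 1.
    """
    total = 0.0
    for d, y, x, z in data:
        zg = sum(g * v for g, v in zip(gamma, z))
        if d == 1:
            xb = sum(b * v for b, v in zip(beta, x))
            u = (y - xb) / sigma
            # ln[(1 / sigma) * phi(u)]
            total += math.log(norm_pdf(u) / sigma)
            # ln Phi((z'gamma + rho * u) / sqrt(1 - rho^2))
            total += math.log(norm_cdf((zg + rho * u) / math.sqrt(1.0 - rho * rho)))
        else:
            # ln[1 - Phi(z'gamma)]
            total += math.log(1.0 - norm_cdf(zg))
    return total
```

Setting `rho = 0` makes the selection term collapse to `ln Phi(z'gamma)`, so the function returns the regression log likelihood on the selected sample plus the probit log likelihood, as the last bullet notes.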

Density of $y^*$ conditioned on $d = 1$

- From the previous result we know that
  $$p(d=1, y^* \mid z) = \frac{1}{\sigma}\,\phi\!\left(\frac{y^* - x'\beta}{\sigma}\right)\Phi\!\left(\frac{z'\gamma + \rho u}{\sqrt{1-\rho^2}}\right).$$
- Alternatively, we could factorize it as follows:
  $$p(d=1, y^* \mid z) = \Pr(d=1 \mid z)\, f(y^* \mid z, d=1) = \Phi(z'\gamma)\, f(y^* \mid z, d=1).$$
- Combining the two expressions,
  $$f(y^* \mid z, d=1) = \frac{p(d=1, y^* \mid z)}{\Phi(z'\gamma)} = \frac{1}{\Phi(z'\gamma)}\,\Phi\!\left(\frac{z'\gamma + \rho u}{\sqrt{1-\rho^2}}\right)\frac{1}{\sigma}\,\phi(u).$$
- Note that if $\rho = 0$ we have $f(y^* \mid z, d=1) = f(y^* \mid z) = \sigma^{-1}\phi(u)$.
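One way to sanity-check the conditional density is to integrate it numerically. The sketch below uses a plain trapezoid rule over a wide interval; the function names, the integration range and the number of steps are illustrative choices, not part of the slides.

```python
import math


def norm_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def selected_density(y, xb, zg, sigma, rho):
    """f(y* | z, d = 1), with xb = x'beta and zg = z'gamma."""
    u = (y - xb) / sigma
    selection = norm_cdf((zg + rho * u) / math.sqrt(1.0 - rho * rho))
    joint = norm_pdf(u) / sigma * selection  # p(d = 1, y* | z)
    return joint / norm_cdf(zg)


def integrate(f, lo, hi, steps=4000):
    """Composite trapezoid rule for f on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h
```

Integrating `selected_density` over, say, `xb` plus or minus ten standard deviations should give a value very close to one, and integrating `y * selected_density(...)` should reproduce $x'\beta + \sigma\rho\lambda(z'\gamma)$.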

Two-step method

- The mean of $f(y^* \mid z, d=1)$ is given by
  $$E(y^* \mid z, d=1) = x'\beta + \sigma E(u \mid z'\gamma + v \ge 0) = x'\beta + \sigma\rho\, E(v \mid v \ge -z'\gamma) = x'\beta + \sigma\rho\,\lambda(z'\gamma).$$
- Form $\hat{w}_i = (x_i', \hat{\lambda}_i)'$, where $\hat{\lambda}_i = \lambda(z_i'\hat{\gamma})$ and $\hat{\gamma}$ is the probit estimate.
- Then do the OLS regression of $y$ on $x$ and $\hat{\lambda}$ in the subsample with $d = 1$ to get consistent estimates of $\beta$ and $\sigma_{uv}$ $(= \sigma\rho)$:
  $$\begin{pmatrix} \hat{\beta} \\ \hat{\sigma}_{uv} \end{pmatrix} = \left(\sum_{d_i = 1} \hat{w}_i \hat{w}_i'\right)^{-1} \sum_{d_i = 1} \hat{w}_i y_i.$$
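The second step can be illustrated on simulated data. The sketch below is deliberately simplified: it skips the first-step probit and plugs in the true $\gamma$ when forming $\lambda(z'\gamma)$, which is an assumption made only to keep the example short and free of external libraries. All parameter values, the design (one regressor plus one excluded variable) and the function names are hypothetical.

```python
import math
import random


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def inverse_mills(r):
    """lambda(r) = phi(r) / Phi(r)."""
    return norm_pdf(r) / norm_cdf(r)


def solve_linear(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def second_step_ols(rows):
    """rows: (y_i, w_i) pairs for d_i = 1; returns (sum w w')^{-1} sum w y."""
    k = len(rows[0][1])
    ww = [[0.0] * k for _ in range(k)]
    wy = [0.0] * k
    for y, w in rows:
        for i in range(k):
            wy[i] += w[i] * y
            for j in range(k):
                ww[i][j] += w[i] * w[j]
    return solve_linear(ww, wy)


def simulate_and_estimate(n=20000, seed=0):
    """Simulate the selection model and run the second-step regression."""
    rng = random.Random(seed)
    # Hypothetical parameter values chosen for illustration.
    beta, gamma, sigma, rho = [1.0, 2.0], [0.0, 0.5, 1.0], 1.5, 0.6
    rows = []
    for _ in range(n):
        x1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # z2 is excluded from x
        v = rng.gauss(0, 1)
        u = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0, 1)
        zg = gamma[0] + gamma[1] * x1 + gamma[2] * z2
        if zg + v >= 0:  # d = 1: the outcome is observed
            y = beta[0] + beta[1] * x1 + sigma * u
            # Assumption: true gamma used in place of the probit estimate.
            rows.append((y, [1.0, x1, inverse_mills(zg)]))
    return second_step_ols(rows)
```

Calling `simulate_and_estimate()` returns three coefficients: the intercept, the slope on `x1`, and the coefficient on the selection term, which estimates $\sigma_{uv} = \sigma\rho$ (0.9 under the values assumed here). In practice $\hat{\gamma}$ would come from a probit of $d$ on $z$, and the second-step standard errors would need to account for the estimated $\hat{\lambda}_i$.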

Nonparametric identification: The fundamental role of exclusion restrictions

- The role of exclusion restrictions for identification in a selection model is paramount.
- In applications there is a marked contrast in credibility between estimates that rely exclusively on the nonlinearity and those that use exclusion restrictions.
- The model of interest is
  $$Y = g_0(X) + U$$
  $$D = 1(p(X, Z) - V > 0)$$
  where $(U, V)$ are independent of $(X, Z)$ and $V$ is uniform in the $(0, 1)$ interval.
- Thus,
  $$E(U \mid X, Z, D = 1) = E[U \mid V < p(X, Z)] = \lambda_0[p(X, Z)]$$
  $$E(Y \mid X, Z) = g_0(X)$$
  (i.e. enforcing the exclusion restriction), but we observe
  $$E(Y \mid X, Z, D = 1) = \mu(X, Z) = g_0(X) + \lambda_0[p(X, Z)]$$
  $$E(D \mid X, Z) = p(X, Z).$$
- The question is whether $g_0(\cdot)$ and $\lambda_0(\cdot)$ can be identified from knowledge of $\mu(X, Z)$ and $p(X, Z)$.

- Let us consider first the case where $X$ and $Z$ are continuous. Suppose there is an alternative solution $(g^*, \lambda^*)$. Then
  $$g_0(X) - g^*(X) + \lambda_0(p) - \lambda^*(p) = 0.$$
  Differentiating with respect to $Z$ and $X$,
  $$\frac{\partial(\lambda_0 - \lambda^*)}{\partial p}\frac{\partial p}{\partial Z} = 0$$
  $$\frac{\partial(g_0 - g^*)}{\partial X} + \frac{\partial(\lambda_0 - \lambda^*)}{\partial p}\frac{\partial p}{\partial X} = 0.$$
- Under the assumption that $\partial p/\partial Z \ne 0$ (instrument relevance), we have
  $$\frac{\partial(\lambda_0 - \lambda^*)}{\partial p} = 0, \qquad \frac{\partial(g_0 - g^*)}{\partial X} = 0$$
  so that $\lambda_0 - \lambda^*$ and $g_0 - g^*$ are constant (i.e. $g_0(X)$ is identified up to an unknown constant).
- This is the identification result in Das, Newey, and Vella (2003).
- $E(Y \mid X)$ is identified up to a constant, provided we have a continuous instrument.
- Identification of the constant requires units for which the probability of selection is arbitrarily close to one ("identification at infinity").
- Unfortunately, the constants are important for identifying average treatment effects.
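The "up to a constant" conclusion can be seen in a toy example: shifting $g_0$ up by $k$ and $\lambda_0$ down by $k$ leaves $\mu(X, Z)$ unchanged. The functional forms below (`g0`, `lam0`, `p`) and the shift `K` are entirely hypothetical and chosen only for illustration.

```python
import math

K = 2.0  # arbitrary shift between the two candidate solutions


def g0(x):
    """Hypothetical outcome function."""
    return 1.0 + 0.5 * x


def lam0(prob):
    """Hypothetical selection-correction function, with lam0(1) = 0."""
    return -math.log(prob)


def p(x, z):
    """Hypothetical selection probability (logit form)."""
    return 1.0 / (1.0 + math.exp(-(0.3 * x + z)))


def g_alt(x):
    return g0(x) + K


def lam_alt(prob):
    return lam0(prob) - K


def mu(x, z, g, lam):
    """Observable conditional mean E(Y | X, Z, D = 1)."""
    return g(x) + lam(p(x, z))
```

Both pairs generate the same `mu(x, z, ...)` for every `(x, z)`, so the data cannot tell them apart. In this toy example the two candidates differ only in the value of the correction term as the selection probability approaches one, which is where information on the constant would have to come from.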

Z discrete

- With binary $Z$, functional form assumptions play a more fundamental role in securing identification than in the case of an exclusion restriction of a continuous variable.
- Suppose $X$ is continuous but $Z$ is a dummy variable. In general $g_0(X)$ is not identified. To see this, consider
  $$\mu(X, 1) = g_0(X) + \lambda_0[p(X, 1)]$$
  $$\mu(X, 0) = g_0(X) + \lambda_0[p(X, 0)],$$
  so that we identify the difference
  $$\nu(X) = \lambda_0[p(X, 1)] - \lambda_0[p(X, 0)],$$
  but this does not suffice to determine $\lambda_0$ up to a constant.
- Take as an example the case where $p(X, Z)$ is a simple logit or probit model:
  $$p(X, Z) = F(\beta X + \gamma Z),$$
  then letting $h_0(\cdot) = \lambda_0[F(\cdot)]$,
  $$\nu(X) = h_0(\beta X + \gamma) - h_0(\beta X).$$
- Suppose the existence of another solution $h^*$. We should have
  $$h_0(\beta X + \gamma) - h^*(\beta X + \gamma) = h_0(\beta X) - h^*(\beta X),$$
  which is satisfied by a multiplicity of periodic functions.
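The periodic-function argument can be checked numerically: adding any function with period $\gamma$ to $h_0$ leaves $\nu$ unchanged. In the sketch below, `h0`, the sine perturbation and the value of `GAMMA` are hypothetical choices made only to illustrate the point.

```python
import math

GAMMA = 0.8  # hypothetical coefficient on the binary instrument


def h0(t):
    """Hypothetical true function h0(t) = lambda0[F(t)]."""
    return math.exp(-t)


def h_alt(t):
    """Alternative solution: h0 plus a function with period GAMMA."""
    return h0(t) + 0.3 * math.sin(2.0 * math.pi * t / GAMMA)


def nu(h, index):
    """Identified difference h(beta X + gamma) - h(beta X), with index = beta X."""
    return h(index + GAMMA) - h(index)
```

`nu(h0, t)` and `nu(h_alt, t)` agree at every index value, yet `h_alt - h0` is not constant, so a binary instrument alone does not pin down the selection-correction function even up to a constant.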

X and Z discrete

- If $X$ is also discrete, there is clearly lack of identification.
- For example, suppose $X$ and $Z$ are dummy variables:
  $$\mu(0, 0) = g_0(0) + \lambda_0[p(0, 0)]$$
  $$\mu(0, 1) = g_0(0) + \lambda_0[p(0, 1)]$$
  $$\mu(1, 0) = g_0(1) + \lambda_0[p(1, 0)]$$
  $$\mu(1, 1) = g_0(1) + \lambda_0[p(1, 1)].$$
- Since $\lambda_0(\cdot)$ is unknown, $g_0(1) - g_0(0)$ is not identified.
- Only $\lambda_0[p(1, 1)] - \lambda_0[p(1, 0)]$ and $\lambda_0[p(0, 1)] - \lambda_0[p(0, 0)]$ are identified.
