
Dealing With and Understanding Endogeneity (Enrique Pinzón, StataCorp)



  1. Dealing With and Understanding Endogeneity

     Enrique Pinzón, StataCorp LP. October 20, 2016, Barcelona.

  2. Importance of Endogeneity

     Endogeneity occurs when a variable, observed or unobserved, that is not included in our model is related to a variable that we did include in our model.

     In model building, endogeneity contradicts:
     - Unobservables have no effect or explanatory power
     - The covariates cause the outcome of interest

     Endogeneity prevents us from making causal claims.

     Endogeneity is a fundamental concern of social scientists (first to the party).


  4. Outline

     1. Defining concepts and building our intuition
     2. Stata built-in tools to solve endogeneity problems
     3. Stata commands to address endogeneity in non-built-in situations

  5. Defining concepts and building our intuition

  6. Building our Intuition: A Regression Model

     The regression model is given by:

     $$y_i = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_k x_{ki} + \varepsilon_i$$
     $$E(\varepsilon_i \mid x_{1i}, \ldots, x_{ki}) = 0$$

     Once we have the information in our regressors, what we did not include in our model is, on average, of no importance:

     $$E(y_i \mid x_{1i}, \ldots, x_{ki}) = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_k x_{ki}$$


  8. Graphically [figure not included in the transcript]

  9. Examples of Endogeneity

     - We want to explain wages and use years of schooling as a covariate. Years of schooling is correlated with unobserved ability and work ethic.
     - We want to explain the probability of divorce and use employment status as a covariate. Employment status might be correlated with unobserved economic shocks.
     - We want to explain graduation rates for different school districts and use the fraction of the budget spent on education as a covariate. Budget decisions are correlated with unobservable political factors.
     - We want to estimate the demand for a good using prices. Demand and prices are determined simultaneously.

  10. A General Framework

      If the unobservables (what we did not include in our model) are correlated with our covariates, then:

      $$E(\varepsilon \mid X) \neq 0$$

      Cases that fit this framework:
      - Omitted variable "bias"
      - Simultaneity
      - Functional form misspecification
      - Selection "bias"

      A useful implication of the above condition:

      $$E(X'\varepsilon) \neq 0$$
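The link between the conditional-mean condition and the moment $E(X'\varepsilon)$ comes from the law of iterated expectations. A short sketch, written in the direction that holds without further assumptions:

$$E(\varepsilon \mid X) = 0 \;\Longrightarrow\; E(X'\varepsilon) = E\big[E(X'\varepsilon \mid X)\big] = E\big[X'\,E(\varepsilon \mid X)\big] = 0$$

Read in contrapositive form, finding $E(X'\varepsilon) \neq 0$ is enough to conclude that $E(\varepsilon \mid X) \neq 0$. The converse is not guaranteed in general, so the moment condition is best treated as a convenient check rather than an equivalent statement.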


  13. Example 1: Omitted Variable "Bias"

      The true model is given by

      $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
      $$E(\varepsilon \mid x_1, x_2) = 0$$

      The researcher does not incorporate $x_2$, i.e. they think

      $$y = \beta_0 + \beta_1 x_1 + \nu$$

      The objective is to estimate $\beta_1$. In our framework we get a consistent estimate if

      $$E(\nu \mid x_1) = 0$$


  15. Example 1: Endogeneity

      Using the definition of the true model

      $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
      $$E(\varepsilon \mid x_1, x_2) = 0$$

      we know that

      $$\nu = \beta_2 x_2 + \varepsilon$$

      and

      $$E(\nu \mid x_1) = \beta_2 E(x_2 \mid x_1)$$

      $E(\nu \mid x_1) = 0$ only if $\beta_2 = 0$ or $x_2$ and $x_1$ are uncorrelated.
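The size of the problem can be made explicit with the standard omitted-variable formula. The sketch below introduces a linear projection of $x_2$ on $x_1$; the symbols $\delta_0$, $\delta_1$, and $r$ are notation assumed here, not taken from the slides.

$$\begin{aligned}
x_2 &= \delta_0 + \delta_1 x_1 + r, \qquad E(r) = 0,\; \mathrm{Cov}(x_1, r) = 0,\; \delta_1 = \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} \\
y &= (\beta_0 + \beta_2\delta_0) + (\beta_1 + \beta_2\delta_1)\,x_1 + (\beta_2 r + \varepsilon)
\end{aligned}$$

Because the composite error $\beta_2 r + \varepsilon$ is uncorrelated with $x_1$, the regression of $y$ on $x_1$ alone converges to $\beta_1 + \beta_2\delta_1$ rather than $\beta_1$. The extra term vanishes exactly in the two cases named on the slide: $\beta_2 = 0$ or $\mathrm{Cov}(x_1, x_2) = 0$.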


  17. Example 1: Simulating Data

          . clear
          . set obs 10000
          number of observations (_N) was 0, now 10,000
          . set seed 111
          . // Generating a common component for x1 and x2
          . generate a = rchi2(1)
          . // Generating x1 and x2
          . generate x1 = rnormal() + a
          . generate x2 = rchi2(2)-3 + a
          . generate e = rchi2(1) - 1
          . // Generating the outcome
          . generate y = 1 - x1 + x2 + e

  18. Example 1: Estimation

          . // estimating true model
          . quietly regress y x1 x2
          . estimates store real
          . // estimating model with omitted variable
          . quietly regress y x1
          . estimates store omitted
          . estimates table real omitted, se

          ----------------------------------------
              Variable |    real       omitted
          -------------+--------------------------
                    x1 | -.98710456   -.31950213
                       |  .00915198    .01482454
                    x2 |  .99993928
                       |  .00648263
                 _cons |   .9920283    .32968254
                       |  .01678995    .02983985
          ----------------------------------------
                                      legend: b/se
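The gap between the two `x1` coefficients is what the omitted-variable formula predicts. In this design, Var(a) = 2, so Var(x1) = 3 and Cov(x1, x2) = 2. The slope of `x2` on `x1` is therefore about 2/3, and the short regression should converge to -1 + 1 × 2/3 = -1/3. The following is a minimal sketch that checks this numerically. It repeats the data-generating step from the slides so that it runs on its own. The scalar names `delta1` and `b_short` are illustrative, and the draws are not guaranteed to reproduce the slide's numbers on every Stata version.

```stata
// Same data-generating process as in the slides
clear
set obs 10000
set seed 111
generate a  = rchi2(1)
generate x1 = rnormal() + a
generate x2 = rchi2(2) - 3 + a
generate e  = rchi2(1) - 1
generate y  = 1 - x1 + x2 + e

// Auxiliary regression: the slope estimates Cov(x1, x2)/Var(x1)
quietly regress x2 x1
scalar delta1 = _b[x1]

// Short regression that omits x2
quietly regress y x1
scalar b_short = _b[x1]

// Long regression with both covariates
quietly regress y x1 x2

// In-sample, the short slope equals b1 + b2*delta1 exactly
display "short-regression slope:         " b_short
display "long slope + b2*delta1:         " _b[x1] + _b[x2]*delta1
display "population value (-1 + 2/3):    " -1 + 2/3
```

The first two displayed numbers agree up to rounding because the decomposition is an algebraic property of least squares. Both should be close to -1/3 in a sample of this size.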

  19. Example 2: Simultaneity in a market equilibrium

      The demand and supply equations for the market are given by

      $$Q_d = \beta P_d + \varepsilon_d$$
      $$Q_s = \theta P_s + \varepsilon_s$$

      If a researcher wants to estimate the demand equation for $Q_d$ and ignores that $P_d$ is simultaneously determined, we have an endogeneity problem that fits in our framework.

  20. Example 2: Assumptions and Equilibrium

      We assume:
      - All quantities are scalars
      - $\beta < 0$ and $\theta > 0$
      - $E(\varepsilon_d) = E(\varepsilon_s) = E(\varepsilon_d \varepsilon_s) = 0$
      - $E(\varepsilon_d^2) \equiv \sigma_d^2$

      The equilibrium prices and quantities are given by:

      $$P = \frac{\varepsilon_s - \varepsilon_d}{\beta - \theta}$$
      $$Q = \frac{\beta\varepsilon_s - \theta\varepsilon_d}{\beta - \theta}$$
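The equilibrium expressions follow from market clearing. The derivation below assumes that equilibrium means $Q_d = Q_s = Q$ and $P_d = P_s = P$, which the slide leaves implicit.

$$\begin{aligned}
\beta P + \varepsilon_d &= \theta P + \varepsilon_s \\
(\beta - \theta)\,P &= \varepsilon_s - \varepsilon_d \\
P &= \frac{\varepsilon_s - \varepsilon_d}{\beta - \theta} \\
Q &= \beta P + \varepsilon_d = \frac{\beta(\varepsilon_s - \varepsilon_d) + (\beta - \theta)\,\varepsilon_d}{\beta - \theta} = \frac{\beta\varepsilon_s - \theta\varepsilon_d}{\beta - \theta}
\end{aligned}$$

The division is valid because $\beta < 0 < \theta$ implies $\beta - \theta \neq 0$. Both $P$ and $Q$ depend on the demand shock $\varepsilon_d$, which is the source of the endogeneity.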

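  21. Example 2: Endogeneity

      This is a simple linear model, so we can verify whether $E(P_d \varepsilon_d) = 0$. Using our equilibrium conditions and the fact that $\varepsilon_s$ and $\varepsilon_d$ are uncorrelated, we get

      $$\begin{aligned}
      E(P_d \varepsilon_d) &= E\left(\frac{\varepsilon_s - \varepsilon_d}{\beta - \theta}\,\varepsilon_d\right) \\
      &= \frac{E(\varepsilon_s \varepsilon_d)}{\beta - \theta} - \frac{E(\varepsilon_d^2)}{\beta - \theta} \\
      &= -\frac{E(\varepsilon_d^2)}{\beta - \theta} \\
      &= -\frac{\sigma_d^2}{\beta - \theta}
      \end{aligned}$$

      which is nonzero (in fact positive, because $\beta - \theta < 0$) whenever $\sigma_d^2 > 0$.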
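A small simulation makes the consequence concrete. The sketch below is not from the slides: the parameter values (β = -2, θ = 1), the unit-variance normal shocks, the seed, and all variable names are assumptions chosen for illustration. Under these values $E(P\varepsilon_d) = 1/3$, and the least-squares slope of quantity on price converges to $(\beta\sigma_s^2 + \theta\sigma_d^2)/(\sigma_s^2 + \sigma_d^2) = -0.5$ rather than to β = -2.

```stata
clear
set obs 10000
set seed 12345

// Assumed parameter values (not from the slides)
scalar beta  = -2
scalar theta =  1

// Independent, mean-zero demand and supply shocks with unit variance
generate ed = rnormal()
generate es = rnormal()

// Equilibrium price and quantity from the formulas on the slide
generate p = (es - ed) / (scalar(beta) - scalar(theta))
generate q = (scalar(beta)*es - scalar(theta)*ed) / (scalar(beta) - scalar(theta))

// Sample analogue of E(P*ed); the population value here is 1/3
generate p_ed = p*ed
summarize p_ed

// Least squares of quantity on price: the slope does not recover beta
regress q p
display "probability limit under these assumptions: " (scalar(beta) + scalar(theta))/2
```

The estimated slope mixes the demand and supply parameters, which is why ignoring simultaneity does not recover the demand equation.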


  23. Example 2: Graphically [figure not included in the transcript]

  24. Example 3: Functional Form Misspecification

      Suppose the true model is given by:

      $$y = \sin(x) + \varepsilon$$
      $$E(\varepsilon \mid x) = 0$$

      But the researcher thinks that:

      $$y = x\beta + \nu$$


  26. Example 3: Real vs. Estimated Predicted values [figure not included in the transcript]

  27. Example 3: Endogeneity

      Adding zero we have

      $$y = x\beta - x\beta + \sin(x) + \varepsilon$$
      $$y = x\beta + \nu$$
      $$\nu \equiv \sin(x) - x\beta + \varepsilon$$

      For our estimates to be consistent we need $E(\nu \mid x) = 0$, but

      $$E(\nu \mid x) = \sin(x) - x\beta + E(\varepsilon \mid x) = \sin(x) - x\beta \neq 0$$
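The nonzero conditional mean of ν can be seen in simulated data. The sketch below is an illustration under assumed settings that do not come from the slides: `x` uniform on [-3, 3), a standard normal error, an arbitrary seed, and hypothetical variable names `nu_hat` and `bin`. It fits the misspecified linear model and then averages the residuals within unit-width bins of `x`.

```stata
clear
set obs 10000
set seed 2016

// Assumed design: x uniform on [-3, 3), standard normal error
generate x = -3 + 6*runiform()
generate e = rnormal()
generate y = sin(x) + e

// Misspecified linear fit and its residuals
regress y x
predict double nu_hat, residuals

// Mean residual within unit-width bins of x; with a correct
// specification these means would all be close to zero
generate bin = floor(x)
tabstat nu_hat, by(bin) statistics(mean)

// Optional plot: true mean function against the fitted line
twoway (function y = sin(x), range(-3 3)) (lfit y x), legend(order(1 "sin(x)" 2 "linear fit"))
```

The residuals average to zero over the whole sample by construction, but the bin means should move systematically away from zero and change sign across the range of `x`. That pattern is the sample counterpart of $E(\nu \mid x) = \sin(x) - x\beta \neq 0$.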

