Gov 2002 - Causal Inference II: Instrumental Variables
Matthew Blackwell, Arthur Spirling
October 2nd, 2014

Instrumental Variables
◮ Last week we talked about how to make progress when you have randomization or selection on the observables.


IV estimator with constant effects

  Y_i = α + τ A_i + γ U_i + η_i

◮ With this in hand, we can formulate an expression for the average treatment effect here (a short derivation is sketched below):
  τ = Cov(Y_i, Z_i) / Cov(A_i, Z_i) = [Cov(Y_i, Z_i) / V[Z_i]] / [Cov(A_i, Z_i) / V[Z_i]]
◮ Reduced-form coefficient: Cov(Y_i, Z_i) / V[Z_i]
◮ First-stage coefficient: Cov(A_i, Z_i) / V[Z_i]
◮ What happens with a weak first stage?
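
To see where this ratio comes from, here is a minimal derivation, assuming (as in the constant-effects IV setup) that the instrument is uncorrelated with both the omitted variable and the disturbance, Cov(Z_i, U_i) = Cov(Z_i, η_i) = 0:

```latex
% Constant-effects model: Y_i = \alpha + \tau A_i + \gamma U_i + \eta_i
% Assumed instrument exogeneity/exclusion: Cov(Z_i, U_i) = Cov(Z_i, \eta_i) = 0
\begin{align*}
\mathrm{Cov}(Y_i, Z_i)
  &= \mathrm{Cov}(\alpha + \tau A_i + \gamma U_i + \eta_i,\; Z_i) \\
  &= \tau\,\mathrm{Cov}(A_i, Z_i) + \gamma\,\mathrm{Cov}(U_i, Z_i) + \mathrm{Cov}(\eta_i, Z_i) \\
  &= \tau\,\mathrm{Cov}(A_i, Z_i).
\end{align*}
% Dividing by Cov(A_i, Z_i), which is nonzero by the first-stage (relevance) condition:
\[
\tau \;=\; \frac{\mathrm{Cov}(Y_i, Z_i)}{\mathrm{Cov}(A_i, Z_i)}
     \;=\; \frac{\mathrm{Cov}(Y_i, Z_i)/\mathrm{V}[Z_i]}{\mathrm{Cov}(A_i, Z_i)/\mathrm{V}[Z_i]}.
\]
```

A weak first stage means the denominator Cov(A_i, Z_i) is close to zero, so sampling noise or small violations in the numerator get blown up.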

Wald Estimator
◮ With a binary instrument, there is a simple estimator based on this formulation called the Wald estimator. It is easy to show that:
  τ = Cov(Y_i, Z_i) / Cov(A_i, Z_i) = (E[Y_i | Z_i = 1] − E[Y_i | Z_i = 0]) / (E[A_i | Z_i = 1] − E[A_i | Z_i = 0])
◮ Intuitively, the effect of Z_i on Y_i divided by the effect of Z_i on A_i (see the numerical sketch below).
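
As an illustration, here is a minimal Python/numpy simulation sketch (the data-generating process, with a true effect of τ = 2, is hypothetical and not from the slides). It checks that the covariance-ratio and difference-in-means (Wald) forms agree, while the naive regression slope of Y_i on A_i is biased by the confounder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
tau = 2.0                       # true constant treatment effect (assumed for this sketch)

u = rng.normal(size=n)          # unobserved confounder
z = rng.binomial(1, 0.5, n)     # binary instrument, independent of u
a = 1.0 * z + 0.8 * u + rng.normal(size=n)        # treatment: affected by z and u
y = 0.5 + tau * a + 1.5 * u + rng.normal(size=n)  # outcome: depends on z only through a

# Covariance-ratio form: Cov(Y, Z) / Cov(A, Z)
tau_cov = np.cov(y, z)[0, 1] / np.cov(a, z)[0, 1]

# Wald form: difference in mean outcomes over difference in mean treatments
tau_wald = (y[z == 1].mean() - y[z == 0].mean()) / (a[z == 1].mean() - a[z == 0].mean())

# Naive OLS slope of Y on A is biased by the omitted confounder u
tau_naive = np.cov(y, a)[0, 1] / np.var(a, ddof=1)

print(f"cov ratio: {tau_cov:.3f}  Wald: {tau_wald:.3f}  naive OLS: {tau_naive:.3f}")
```

The first two numbers are identical (for a binary instrument the two forms are algebraically the same) and should land near 2, while the naive slope is pulled away from 2 by the confounder.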

What about covariates?
◮ No covariates up until now. What if we have a set of covariates X_i that we are also conditioning on?
◮ Let's start with linear models for both the outcome and the treatment:
  Y_i = X_i′β + τ A_i + ε_i
  A_i = X_i′α + γ Z_i + ν_i
◮ Now, we assume that X_i are exogenous along with Z_i:
  E[Z_i ν_i] = 0   E[Z_i ε_i] = 0   E[X_i ν_i] = 0   E[X_i ε_i] = 0
◮ . . . but A_i is endogenous: E[A_i ε_i] ≠ 0

Getting the reduced form
◮ We can plug the treatment equation into the outcome equation:
  Y_i = X_i′β + τ[X_i′α + γ Z_i + ν_i] + ε_i
      = X_i′β + τ[X_i′α + γ Z_i] + [τ ν_i + ε_i]
      = X_i′β + τ[X_i′α + γ Z_i] + ε*_i
◮ The term in brackets, X_i′α + γ Z_i, is the population fitted value of the treatment, E[A_i | X_i, Z_i]
◮ Because Z_i and X_i are uncorrelated with ν_i and ε_i, this fitted value is also uncorrelated with the composite error ε*_i = τ ν_i + ε_i.
◮ Thus, the population coefficient on [X_i′α + γ Z_i] in a regression of Y_i on X_i and [X_i′α + γ Z_i] is the average treatment effect, τ.

Two-stage least squares
◮ In practice, we estimate the first stage from a sample and calculate OLS fitted values: Â_i = X_i′α̂ + γ̂ Z_i.
◮ Here, α̂ and γ̂ are estimates from OLS. Then, we estimate a regression of Y_i on X_i and Â_i. We plug this into our equation for Y_i and note that the error term now picks up the first-stage residual:
  Y_i = X_i′β + τ Â_i + [ε_i + τ(A_i − Â_i)]
◮ Key question: is Â_i uncorrelated with the error?
◮ Â_i is just a function of X_i and Z_i, so it is uncorrelated with ε_i.
◮ We also know that Â_i is uncorrelated with (A_i − Â_i), since OLS fitted values are orthogonal to the corresponding residuals (see the projection argument below).
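
For completeness, here is a minimal sketch of that orthogonality fact using the standard projection-matrix argument (the matrix W and the notation P_W are introduced here for the sketch; they do not appear on the slides):

```latex
% Let W = [X, Z] collect the first-stage regressors, and let
% P_W = W(W'W)^{-1}W' be the projection matrix onto its columns,
% which is symmetric (P_W' = P_W) and idempotent (P_W P_W = P_W).
% The first-stage fitted values are \hat{A} = P_W A, so
\[
\hat{A}'(A - \hat{A})
  \;=\; A' P_W' (I - P_W) A
  \;=\; A' (P_W - P_W P_W) A
  \;=\; 0 ,
\]
% i.e., the fitted values are exactly orthogonal to the first-stage residuals.
```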

Two-stage least squares
◮ Heuristic procedure:
  1. Run a regression of the treatment on the covariates and the instrument
  2. Construct the fitted values of the treatment
  3. Run a regression of the outcome on the covariates and the fitted values
◮ Note that this isn't how we actually estimate 2SLS, because the standard errors are all wrong.
◮ The software would calculate the standard errors based on ε*_i, but what we really want is standard errors based on ε_i (the sketch below illustrates this).
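
Here is a minimal numpy sketch of the heuristic procedure and of the standard-error problem (the simulated data-generating process is hypothetical, continuing the style of the earlier sketches). The point is that the residual the variance calculation should use is Y_i − X_i′β̂ − τ̂ A_i, built with the actual treatment, not the naive second-stage residual built with Â_i:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
u = rng.normal(size=n)                                   # unobserved confounder
x = rng.normal(size=n)                                   # exogenous covariate
z = rng.normal(size=n)                                   # instrument
a = 0.5 * x + 1.0 * z + 0.8 * u + rng.normal(size=n)     # endogenous treatment
y = 1.0 + 0.3 * x + 2.0 * a + 1.2 * u + rng.normal(size=n)

X_exog = np.column_stack([np.ones(n), x])   # intercept + exogenous covariates

# Steps 1-2: regress treatment on covariates and instrument, keep fitted values
W = np.column_stack([X_exog, z])
first_stage = np.linalg.lstsq(W, a, rcond=None)[0]
a_hat = W @ first_stage

# Step 3: regress outcome on covariates and fitted treatment
X2 = np.column_stack([X_exog, a_hat])
beta = np.linalg.lstsq(X2, y, rcond=None)[0]
print("2SLS estimate of tau:", round(beta[-1], 3))       # should be near 2

# Residuals the second-stage OLS would use vs. the ones we actually want
resid_naive = y - X2 @ beta                              # uses the fitted treatment
resid_right = y - np.column_stack([X_exog, a]) @ beta    # uses the actual treatment
print("naive residual SD:", round(resid_naive.std(), 3),
      "| correct residual SD:", round(resid_right.std(), 3))
```

The point estimate from this manual procedure matches what a canned 2SLS routine would report; only the residuals, and hence the standard errors, need the correction, which is why dedicated 2SLS commands handle it automatically.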

Nunn & Wantchekon IV example

General 2SLS
◮ To save on notation, we'll roll all the variables in the structural model into one vector, X_i, of size k, some of which may be endogenous.
◮ The structural model, then, is:
  Y_i = X_i′β + ε_i
◮ Z_i will be a vector of l exogenous variables that includes any exogenous variables in X_i plus any instruments. Key assumption: E[Z_i ε_i] = 0

Nasty Matrix Algebra
◮ Useful quantities:
  Π = (E[Z_i Z_i′])⁻¹ E[Z_i X_i′]   (projection matrix)
  V_i = Π′ Z_i                      (fitted values)
◮ To derive the 2SLS estimator, take the fitted values, Π′Z_i, and multiply both sides of the outcome equation by them:
  Y_i = X_i′β + ε_i
  Π′Z_i Y_i = Π′Z_i X_i′β + Π′Z_i ε_i
  Π′E[Z_i Y_i] = Π′E[Z_i X_i′]β + Π′E[Z_i ε_i]
  Π′E[Z_i Y_i] = Π′E[Z_i X_i′]β
  β = (Π′E[Z_i X_i′])⁻¹ Π′E[Z_i Y_i]
    = (E[X_i Z_i′](E[Z_i Z_i′])⁻¹E[Z_i X_i′])⁻¹ E[X_i Z_i′](E[Z_i Z_i′])⁻¹E[Z_i Y_i]

How to estimate the parameters
◮ Collect X_i into an n × k matrix X = (X_1′, . . . , X_n′)
◮ Collect Z_i into an n × l matrix Z = (Z_1′, . . . , Z_n′)
◮ Matrix party trick: X′Z/n = (1/n) Σ_i X_i Z_i′ →p E[X_i Z_i′].
◮ Take the population formula for the parameters:
  β = (E[X_i Z_i′](E[Z_i Z_i′])⁻¹E[Z_i X_i′])⁻¹ E[X_i Z_i′](E[Z_i Z_i′])⁻¹E[Z_i Y_i]
◮ And plug in the sample values (the n's cancel out):
  β̂ = [(X′Z)(Z′Z)⁻¹(Z′X)]⁻¹ (X′Z)(Z′Z)⁻¹(Z′Y)
◮ This is how R/Stata estimates the 2SLS parameters (a numpy sketch of the same formula follows below)
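
As a concrete check, here is a minimal numpy sketch of that sample formula. The simulated design is hypothetical (same style as the earlier sketches); X stacks the intercept, an exogenous covariate, and the endogenous treatment, and Z stacks the intercept, the covariate, and the instrument:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
u = rng.normal(size=n)
x = rng.normal(size=n)
z = rng.normal(size=n)
a = 0.5 * x + 1.0 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 0.3 * x + 2.0 * a + 1.2 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, a])   # structural regressors (a is endogenous)
Z = np.column_stack([np.ones(n), x, z])   # exogenous variables + instrument

# beta_hat = [(X'Z)(Z'Z)^{-1}(Z'X)]^{-1} (X'Z)(Z'Z)^{-1}(Z'Y)
XtZ = X.T @ Z
ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = np.linalg.solve(XtZ @ ZtZ_inv @ (Z.T @ X), XtZ @ ZtZ_inv @ (Z.T @ y))
print("2SLS beta_hat (intercept, x, a):", np.round(beta_hat, 3))   # roughly (1.0, 0.3, 2.0)
```

When l = k (just identified), the formula collapses to the simpler IV estimator (Z′X)⁻¹Z′Y.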

Asymptotics for 2SLS
◮ Let V = Z(Z′Z)⁻¹Z′X be the matrix of fitted values for X; then we have
  β̂ = (V′V)⁻¹V′Y
◮ We can insert the true model for Y:
  β̂ = (V′V)⁻¹V′(Xβ + ε)
◮ Using the matrix party trick and the fact that V′X = V′V, we have
  β̂ = (V′V)⁻¹V′Xβ + (V′V)⁻¹V′ε
     = β + ((1/n) Σ_i V_i V_i′)⁻¹ ((1/n) Σ_i V_i ε_i)
◮ Consistent because (1/n) Σ_i V_i ε_i →p E[V_i ε_i] = 0.

Asymptotic variance for 2SLS

  √n(β̂ − β) = ((1/n) Σ_i V_i V_i′)⁻¹ ((1/√n) Σ_i V_i ε_i)

◮ By the CLT, (1/√n) Σ_i V_i ε_i converges in distribution to N(0, B), where B = E[ε_i² V_i V_i′].
◮ By the LLN, (1/n) Σ_i V_i V_i′ →p E[V_i V_i′].
◮ Thus, we have that √n(β̂ − β) has asymptotic variance:
  (E[V_i V_i′])⁻¹ E[ε_i² V_i V_i′] (E[V_i V_i′])⁻¹
◮ Replace with the sample quantities to get estimates (computed in the sketch below):
  var(β̂) = (V′V)⁻¹ (Σ_i û_i² V_i V_i′) (V′V)⁻¹,  where û_i = Y_i − X_i′β̂
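
Here is a minimal numpy sketch of that plug-in (sandwich) variance estimator under the same hypothetical simulated design as before. The detail worth noticing is that the residuals û_i use the original X, with the actual treatment, while the "bread" uses the fitted values V:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
u = rng.normal(size=n)
x = rng.normal(size=n)
z = rng.normal(size=n)
a = 0.5 * x + 1.0 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 0.3 * x + 2.0 * a + 1.2 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, a])
Z = np.column_stack([np.ones(n), x, z])

# V = Z (Z'Z)^{-1} Z'X  and  beta_hat = (V'V)^{-1} V'Y
V = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_hat = np.linalg.solve(V.T @ V, V.T @ y)

# Residuals use the original regressors X, not the fitted values V
u_hat = y - X @ beta_hat

# Sandwich: (V'V)^{-1} (sum_i u_i^2 V_i V_i') (V'V)^{-1}
bread = np.linalg.inv(V.T @ V)
Vu = V * u_hat[:, None]          # each row is u_i * V_i'
meat = Vu.T @ Vu                 # = sum_i u_i^2 V_i V_i'
vcov = bread @ meat @ bread
print("beta_hat:   ", np.round(beta_hat, 3))
print("robust SEs: ", np.round(np.sqrt(np.diag(vcov)), 4))
```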

Overidentification
◮ What if we have more instruments than endogenous variables?
◮ When there are more instruments than causal parameters (l > k), the model is overidentified.
◮ When there are as many instruments as causal parameters (l = k), the model is just identified.
◮ With more than one instrument and constant effects, we can test for the plausibility of the exclusion restriction(s) using an overidentification test.
◮ Is it plausible to find more than one instrument?

Overidentification tests
◮ Sargan test, Hansen test, J-test, etc.
◮ Basic idea: under the null that all instruments are valid, estimates based on different subsets of the instruments should differ only because of sampling noise.
◮ Identify the distribution of that noise under the null to develop a test.
◮ If we reject the null hypothesis in these overidentification tests, then the exclusion restrictions for our instruments are probably incorrect. Note that the test won't tell us which of them are incorrect, just that at least one is.
◮ These overidentification tests depend heavily on the constant effects assumption.
◮ Once we move away from constant effects, we generally can no longer pool multiple instruments together in this way (a Sargan-style sketch follows below).
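
For concreteness, here is a minimal numpy sketch of a Sargan-style statistic under the hypothetical simulated design used earlier, now with two instruments for one endogenous treatment (so l − k = 1): estimate 2SLS with all instruments, regress the 2SLS residuals on the full instrument matrix, and compare n·R² to a χ² distribution with l − k degrees of freedom. This is the textbook auxiliary-regression construction, not code from the course:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
u = rng.normal(size=n)
x = rng.normal(size=n)
z1 = rng.normal(size=n)    # instrument 1, valid by construction
z2 = rng.normal(size=n)    # instrument 2, valid by construction
a = 0.5 * x + 1.0 * z1 + 0.7 * z2 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 0.3 * x + 2.0 * a + 1.2 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, a])        # k = 3 (a is endogenous)
Z = np.column_stack([np.ones(n), x, z1, z2])   # l = 4

# 2SLS with all instruments
V = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_hat = np.linalg.solve(V.T @ V, V.T @ y)
u_hat = y - X @ beta_hat

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on all instruments
gamma = np.linalg.lstsq(Z, u_hat, rcond=None)[0]
resid = u_hat - Z @ gamma
r2 = 1.0 - resid.var() / u_hat.var()
sargan = n * r2
print(f"Sargan statistic: {sargan:.2f}  (chi-squared(1) 95% critical value is about 3.84)")
```

Because both instruments are valid in this simulation, the statistic should typically fall well below the critical value; letting z2 enter the outcome directly would tend to push it far above.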

Reading

Instrumental Variables and Potential Outcomes
◮ The basic idea behind instrumental variable approaches is that we do not have ignorability for A_i, but we do have a variable, Z_i, that affects A_i but only affects the outcome through A_i.
◮ Note that we allow the instrument Z_i to have an effect on A_i, so the treatment must have potential outcomes, A_i(1) and A_i(0), with the usual consistency assumption:
  A_i = Z_i A_i(1) + (1 − Z_i) A_i(0)
◮ The outcome can depend on both the treatment and the instrument: Y_i(a, z) is the outcome if unit i had received treatment A_i = a and instrument value Z_i = z.
◮ The effect of the treatment given the value of the instrument is Y_i(1, Z_i) − Y_i(0, Z_i) (a small simulated illustration of this notation follows below).
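
As a small illustration of this notation, here is a hypothetical simulation sketch (not an example from the slides): it draws potential treatments A_i(1) and A_i(0), builds the observed treatment via the consistency assumption, and encodes the exclusion restriction by letting the potential-outcome function ignore its z argument:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10

z = rng.binomial(1, 0.5, n)                       # randomized binary instrument
a0 = rng.binomial(1, 0.2, n)                      # potential treatment if Z_i = 0
a1 = np.maximum(a0, rng.binomial(1, 0.7, n))      # potential treatment if Z_i = 1

def y_pot(a, z):
    # Y_i(a, z): under the exclusion restriction it does not depend on z at all
    return 1.0 + 2.0 * a

a_obs = z * a1 + (1 - z) * a0                     # consistency: A_i = Z_i A_i(1) + (1 - Z_i) A_i(0)
y_obs = y_pot(a_obs, z)                           # observed outcome

print("Z:         ", z)
print("A(0):      ", a0)
print("A(1):      ", a1)
print("observed A:", a_obs)
print("Y_i(1, Z_i) - Y_i(0, Z_i) =", y_pot(1, z) - y_pot(0, z), "(constant in this toy example)")
```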

Key assumptions
  1. Randomization
  2. Exclusion Restriction
