
Slide Set 12 Model Specification and Identification

Pietro Coretto pcoretto@unisa.it

Econometrics

Master in Economics and Finance (MEF) Università degli Studi di Napoli “Federico II”

Version: Saturday 28th December, 2019 (h16:07)

  • P. Coretto • MEF

Model Specification and Identification 1 / 33

Summary

• Specification of the model
• Consequences of model misspecification in predictive and explanatory modeling
• Identification of stochastic models
• Endogeneity
• Instrumental variables


Part I Specification


Specification

Specification is the art of finding: (i) the set of regressors that explain/predict Y appropriately; and (ii) appropriate settings for the error term.

• Variable selection: the choice of the set of regressors. It is the central issue of model specification, although choosing a model involves more than this.
• Predictive paradigm: specification is easier, because the main goal (prediction error) can be assessed directly. However, even in this case we can easily fall into the overfitting trap.
• Explanatory/causal paradigm: specification is much harder! It relies on a combination of theoretical insights and empirical experience. Nobody really knows when the model is well specified! In general we try to exclude causes of bad specification.


Variable Selection

Consider two competing models:

Model 1: y = Xβ + ε
Model 2: y = X0θ0 + Xθ + ν

Questions:

1. Predictive paradigm: which model better predicts Y?
2. Explanatory/causal paradigm: which model accurately explains the relationships between Y and the Xs? Which model is "true"?


Overfitting trap

Consider the OLS fits of the previous models:

y = Xb + e    (12.1)

and

y = X0t0 + Xt + v    (12.2)

From which

Xb + e = X0t0 + Xt + v    (12.3)

Since these are all OLS fits, e′X = v′X0 = v′X = 0.


Multiply both sides of (12.3) by e′ and v′, and obtain

e′e = e′X0t0 + e′v
v′e = v′v

which together imply

e′X0t0 = e′e − v′v    (12.4)

Now we show that going from the first OLS fit to the second, the sum of squared residuals cannot increase, that is: v′v ≤ e′e.


v′v = (y − X0t0 − Xt)′(y − X0t0 − Xt)
    = (Xb + e − X0t0 − Xt)′(Xb + e − X0t0 − Xt)
    = (X(b − t) − X0t0 + e)′(X(b − t) − X0t0 + e)    [set a = X(b − t) − X0t0]
    = a′a + 2e′a + e′e
    = a′a − 2e′X0t0 + e′e    [since e′X = 0, so e′a = −e′X0t0]
    = a′a − 2(e′e − v′v) + e′e    [by (12.4)]
    = a′a − e′e + 2v′v

Rearranging, v′v = e′e − a′a, which implies that v′v ≤ e′e. Therefore adding a new set of regressors can never increase the RSS, and can never decrease the R².
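The algebra above can be verified numerically. A minimal sketch, assuming NumPy is available (the design matrices and coefficient values are illustrative, not from the slides): we fit a model by OLS, then refit after appending extra regressors X0, and check that the residual sum of squares does not increase.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # Model 1 regressors
X0 = rng.normal(size=(n, 2))                            # extra regressors for Model 2
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)       # data generated by Model 1

def ols_rss(design, y):
    """Return the OLS residual sum of squares for a given design matrix."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ b
    return e @ e

rss1 = ols_rss(X, y)                    # RSS of the smaller fit (12.1)
rss2 = ols_rss(np.hstack([X, X0]), y)   # RSS of the enlarged fit (12.2)
# rss2 <= rss1 always holds, even though X0 is pure noise
```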


Predictive paradigm

Main goal: predict Y as well as possible. A low prediction error (RSS) is targeted.

Overfitting: whenever the model becomes richer/more flexible (adding regressors), it fits the observed data no worse. Therefore, any measure of fit based on in-sample information will fall into the overfitting trap.

Two approaches to solve this problem:
• Penalized (in-sample) measures of fit: an in-sample statistic of fit that penalizes the addition of regressors.
• Out-of-sample prediction error: estimate the expected prediction error on samples not used to fit the model.


Penalized (in-sample) measures of fit

• Adjusted R2:

  R̄2 = 1 − [RSS/(n − K)] / [TSS/(n − 1)] = 1 − [(n − 1)/(n − K)] (1 − R2),  where (n − 1)/(n − K) > 1

  You want to maximize it. It is not based on distributional assumptions for the error term.

• Akaike's information criterion:

  AIC = log(RSS/n) + 2K/n

  You want to minimize it. Implicitly assumes that the error term is at least approximately normal.

• Bayes information criterion:

  BIC = log(RSS/n) + K log(n)/n

  You want to minimize it. Implicitly assumes that the error term is at least approximately normal.
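The three criteria can be computed directly from an OLS fit. A hedged sketch (NumPy assumed; the simulated design and the number of junk regressors are illustrative): adding irrelevant regressors always raises in-sample R2, but the BIC penalty makes the smaller model preferred.

```python
import numpy as np

def fit_criteria(X, y):
    """OLS fit statistics: adjusted R^2 (maximize), AIC and BIC (minimize)."""
    n, K = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ b) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (rss / (n - K)) / (tss / (n - 1))
    aic = np.log(rss / n) + 2 * K / n
    bic = np.log(rss / n) + K * np.log(n) / n
    return adj_r2, aic, bic

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, rng.normal(size=(n, 5))])  # 5 junk regressors

crit_small = fit_criteria(X_small, y)
crit_big = fit_criteria(X_big, y)
# BIC of the big model exceeds BIC of the small one: the penalty dominates
```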


Out-of-sample prediction error via cross-validation

Step 1: Fix H < n and divide the n units into two subsets:
• Train set: randomly(*) select n − H units.
• Test set: select the remaining H units. Let y^o_1, y^o_2, …, y^o_H be the Y-values in this set.

Step 2: In-sample estimation. Apply OLS to estimate b using the observations in the Train set.

Step 3: Out-of-sample prediction. Predict the Y-values on the Test set using b. Let ŷ^o_1, ŷ^o_2, …, ŷ^o_H be the predicted values.

(*) Remark: the appropriate random split depends on the sampling design. In case of random sampling we may sample n − H observations uniformly without replacement. In case of time series data we take n − H consecutive observations (blocks).


Step 4: Measure the expected prediction error. The most popular prediction accuracy measure is the root mean square prediction error

RMSPE = sqrt( (1/H) Σ_{h=1}^{H} (y^o_h − ŷ^o_h)² )

Another popular measure is the mean absolute prediction error

MAE = (1/H) Σ_{h=1}^{H} |y^o_h − ŷ^o_h|

Since the splitting introduces additional random variation, steps 1–4 are repeated a number of times and the resulting RMSPE or MAE values are averaged across splittings.
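Steps 1–4 can be sketched as follows (NumPy assumed; the split size H, the number of repetitions, and the simulated data are illustrative choices for a random-sampling design; MAE would be computed analogously):

```python
import numpy as np

def cv_rmspe(X, y, H, n_splits=50, seed=0):
    """Average out-of-sample RMSPE over repeated random train/test splits
    (random-sampling design: test units drawn uniformly without replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_splits):
        test = rng.choice(n, size=H, replace=False)       # Step 1: split
        train = np.setdiff1d(np.arange(n), test)
        b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # Step 2: fit
        pred = X[test] @ b                                 # Step 3: predict
        errors.append(np.sqrt(np.mean((y[test] - pred) ** 2)))   # Step 4: RMSPE
    return np.mean(errors)                                 # average over splits

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([X1, rng.normal(size=(n, 10))])  # 10 junk regressors overfit

rmspe1 = cv_rmspe(X1, y, H=60)
rmspe2 = cv_rmspe(X2, y, H=60)
# out-of-sample, the overfitted model predicts (slightly) worse on average
```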


Explanatory/causal paradigm

Main goal: assess the structural relationships between X and Y; produce "good estimates" and valid inference about partial effects.

Example: given a model

log(wage)_i = β1 + β2 SchoolYears_i + … + ε_i

• we want a "good estimate" of β2, that is the expected variation of log(wage) produced by a unit change of SchoolYears (ceteris paribus)
• we want Se(b2) to be accurate
• we want hypothesis tests to be optimal

Relevant aspects: here we are interested in how a model misspecification impacts the quality of estimates and the resulting inference.


Effects of misspecification on estimation/inference

Reconsider the two alternative models:

Model (m = 1): y = X1β1 + ε
Model (m = 2): y = X1β1 + X2β2 + u

Two cases:
• Inclusion of irrelevant variables: the truth = Model 1, but we estimate Model 2.
• Omission of relevant variables: the truth = Model 2, but we estimate Model 1.

Notation: b^(m)_h is the OLS estimate of β_h based on Model (m). Let b^(m) be the overall OLS estimator under Model (m).


Inclusion of irrelevant variables

Assume Model 1 generates the data (true model), but we estimate Model 2. We are failing to impose the restriction β2 = 0.

Consequences:
• the OLS is still unbiased and consistent for β1:

  E[b^(2)_1 | X1, X2] = β1,   E[b^(2)_2 | X1, X2] = 0,   and   (b^(2)_1, b^(2)_2) →p (β1, 0)

• s2 is unbiased for σ2
• the estimator of AVar[b^(2)] is consistent
• however, Var[b^(2)_1 | X1, X2] ≥ Var[b^(1)_1 | X1] ⟹ eventually some efficiency is lost. The loss of efficiency increases with the correlation between X1 and X2
• test power is reduced because the probability of false negatives increases (not rejecting H0 when H1 is true). E.g. the default test tends to exclude relevant variables


Omission of relevant variables

Assume Model 2 is the truth, i.e. it generates the data, but we estimate Model 1. Assume β2 ≠ 0, that is X2 correlates with y.

Observe that:
• from the true Model 2: y − X1β1 = X2β2 + u
• from the wrong Model 1: ε = y − X1β1

Therefore ε = X2β2 + u.

X2 is called "unobserved heterogeneity": variations across individual units not accounted for by the specified model, which end up in the error term of the misspecified model.

Endogeneity: if the unobserved heterogeneity X2 correlates with X1, then in the misspecified model the error correlates with the regressors, violating the orthogonality assumptions. Failure of orthogonality leads to non-identifiable models (see later). The effects of endogeneity are devastating.


For example, it is easy to see that OLS is not even unbiased. To see this,

b^(1)_1 = (X1′X1)⁻¹X1′y

can be rewritten as

b^(1)_1 = (X1′X1)⁻¹X1′(X1β1 + X2β2 + u)

that is

b^(1)_1 = β1 + (X1′X1)⁻¹(X1′X2)β2 + (X1′X1)⁻¹X1′u

Assuming that strict exogeneity holds for the true model, i.e. E[u | X1, X2] = 0, then

E[b^(1)_1 | X1, X2] = β1 + (X1′X1)⁻¹(X1′X2)β2

Therefore b^(1)_1 is biased unless X1′X2 = 0 (note: by assumption β2 ≠ 0).
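The bias formula can be checked by simulation. A hedged sketch (NumPy assumed; the coefficients 0.8 and 3.0 are illustrative choices): with Cov(x1, x2) = 0.8, Var(x1) = 1 and β2 = 3, the omitted-variable bias of the short regression should be close to 3 × 0.8 = 2.4.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 500
beta1, beta2 = 1.0, 3.0
b1_short = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)       # relevant regressor, correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # short regression omits x2, so beta2*x2 is pushed into the error term
    b1_short.append((x1 @ y) / (x1 @ x1))

bias = np.mean(b1_short) - beta1
# bias ~ beta2 * Cov(x1, x2) / Var(x1) = 3 * 0.8 = 2.4
```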


Consequences of omission of relevant variables. If the omitted variables correlate with the included regressors we observe several nasty effects:
• biased and inconsistent OLS estimate of β1. This is caused by the lack of identifiability (see later)
• s2 is biased: it under-estimates the error variance
• the estimator of AVar[b^(1)_1] is not consistent, and standard errors are under-estimated, so the estimation error apparently improves
• no guarantees about hypothesis tests and confidence intervals


Part II Identification


Identification

The distribution function F(y; θ) is your probability model for an observable quantity of interest y. F is known if its parameter θ is known.

You observe a sample {y1, y2, …, yn}, and you want to recover the θ that produced it. Now suppose that there are θ1 ≠ θ2 such that F(y; θ1) = F(y; θ2). This means that the same observed sample is consistent with two different parameters!

Estimation makes sense if different parameters produce different data distributions. This requirement is called identifiability of the model.

Motto: NINE = No Identification, No Estimation.


Definition (global identifiability). The model F(·; θ) is identifiable if for any pair θ′, θ′′ taken in the parameter space:

θ′ ≠ θ′′ ⟹ F(·; θ′) ≠ F(·; θ′′)

Remark: consistency doesn't make sense for non-identifiable models:
• θ̂ is consistent for θ′: it means that for large n it is "rather likely" that θ̂ belongs to a small neighbourhood of θ′.
• Suppose that θ′ and θ′′ produce the same data distribution. Then as n → +∞ it is "rather likely" that θ̂ keeps switching between two small neighbourhoods of θ′ and θ′′… so it can't converge!

There are weaker versions of the identifiability concept. Not important here. Proving identifiability is an "art".


Consider a simple linear model: Y = β0 + β1X + ε, where (Y, X, ε) are random variables. Let's assume that we know the joint distribution of the observables (Y, X). Are (β0, β1) identifiable?

Note that

Cov[Y, X] = Cov[β0 + β1X + ε, X] = Cov[β0, X] + β1 Cov[X, X] + Cov[X, ε] = β1 Var[X] + Cov[X, ε]

A sufficient condition to identify β1 is that Cov[X, ε] = 0 and Var[X] > 0, so that

β1 = Cov[Y, X] / Var[X]

Now β1 is identified, because it is uniquely defined in terms of moments obtained from the joint distribution of (Y, X).

E[Y] = β0 + β1 E[X] + E[ε]

A sufficient condition to identify β0: E[ε] = 0. In fact

β0 = E[Y] − β1 E[X] = E[Y] − (Cov[Y, X] / Var[X]) E[X]

β0 is uniquely defined in terms of moments obtained from the joint distribution of (Y, X).

Remarks:
• the trivial consequence is that any set of estimators consistent for E[X], E[Y], Var[X], Cov[X, Y] will asymptotically uniquely recover β0 and β1
• this type of assumption is common to all regression models. The reason is that we need to restrict the DGP in order to achieve identifiability
• you now realize why in linear models orthogonality/exogeneity conditions are crucial.


Exogeneity

A → B stands for: "A causes B in a linear way". The simple linear model Y = βZ + ε is represented as

Z → Y
   ↗
  ε

The regressor Z is exogenous = orthogonal to the error. The error should incorporate exogenous factors that produce variations of Y that can be distinguished from the variations caused by Z.

The ultimate goal is to use β to measure the impact of Z on Y all else equal. This is only achieved if there is no "linear" link between Z and ε.

Endogeneity mechanism

If Z is endogenous:

Z → Y
↑  ↗
ε

Now ε correlates with Z. Therefore, when ε moves, Z also moves for the same reason! This implies that β is not able to isolate the impact of Z on Y. In other words, the role of β is not identified.


Instrumental Variables “filtering” mechanism

Now suppose that there exists a third variable X:

X → Z → Y
    ↑  ↗
    ε

X is called an Instrumental Variable (IV) if it only causes Z, with an indirect effect on Y. The causation mechanism is as follows:
• when ε moves, this still produces variation in Z, but these variations are recognized, because we know that Z moves on its own if X moves
• if X is a fairly good linear predictor of Z (e.g. Cov[X, Z] ≠ 0), observing X we know whether Z is varying for reasons not connected to ε. Therefore, we know how Z impacts Y ⟹ β is identified.


Identification with IV

Y = βZ + ε

• assume Z is endogenous, that is Cov[Z, ε] ≠ 0
• suppose we observe X with Cov[X, ε] = 0 and Cov[X, Z] ≠ 0

Knowing the joint distribution of the observables (Y, Z, X), we can compute

Cov[X, Y] = Cov[X, βZ + ε] = β Cov[X, Z] + Cov[X, ε]

so that

β = Cov[X, Y] / Cov[X, Z]

β is now identified.
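The moment argument translates directly into a method-of-moments estimate. A sketch on assumed simulated data (NumPy; the data-generating numbers are illustrative): OLS is inconsistent because Cov[Z, ε] ≠ 0, while the ratio Cov[X, Y] / Cov[X, Z] recovers β.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
beta = 2.0
x = rng.normal(size=n)                          # instrument: exogenous, relevant
eps = rng.normal(size=n)                        # structural error
z = 0.7 * x + 0.5 * eps + rng.normal(size=n)    # endogenous: Cov[z, eps] != 0
y = beta * z + eps

b_ols = np.cov(y, z)[0, 1] / np.var(z, ddof=1)      # inconsistent under endogeneity
b_iv = np.cov(x, y)[0, 1] / np.cov(x, z)[0, 1]      # identified via the instrument
# b_iv is close to 2, while b_ols is pushed upward by Cov[z, eps]/Var[z]
```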


General conditions for IVs

X is called an instrument for an endogenous regressor Z if it fulfills:

1. validity: the IVs are orthogonal to the error (IVs are exogenous)
2. relevance: the instruments are non-orthogonal to the regressors.

Scalar regression: Y = βZ + ε, with Cov(X, ε) = 0 and Cov(X, Z) ≠ 0: both validity and relevance are fulfilled.
• Validity rules out the possibility that X causes Y directly, which would be the case if X were a regressor of Y. In fact, if Y = βZ + δX + u, then ε = δX + u ⟹ Cov(ε, X) ≠ 0, violating the validity assumption. In this case validity is fulfilled if δ = 0.
• The relevance assumption implies that Z = γ1 + γ2X + η with γ2 ≠ 0.


Suppose that the regressor z is a vector, and suppose it is partitioned into z = (z1, z2):
• elements of z1 are endogenous regressors
• x1 is a valid and relevant IV for z1

Then x = (x1, z2) is valid and relevant for z. If a variable is exogenous it can be instrumented by itself. We will see that we need to find at least as many instruments as there are endogenous variables.

Finding IVs is an art!
• Validity: this can't be checked because it involves unobservable random variables. What to do? Use economic intuition to make sure that the IVs are exogenous.
• Relevance: for vectors this is more difficult to formalize and interpret. There exist methods to test it. However, the methodology is not general and depends on the context.


Example: returns to schooling

Consider the following linear model:

log(wage_i) = β1 + β2 education_i + β3 ability_i + ε_i    (Model 1)

with both regressors being exogenous. ability is not observable, but we are interested in estimating the marginal impact of education on wage. We consider

log(wage_i) = β1 + β2 education_i + u_i    (Model 2)

We are omitting a relevant variable. Now ability_i goes into the error. In fact:

u_i = β3 ability_i + ε_i

but Cor(education_i, ability_i) ≠ 0 ⟹ Cor(education_i, u_i) ≠ 0, which means that in (Model 2) education becomes endogenous. An estimator of β2 won't be consistent, because β2 is not identified.


We want to use our data to measure the impact of education on wages. If we find a good IV for education, we can still use (Model 2) to identify β2.
• X = "number of days from 1/1/0 to birth date". Certainly Cor[X, ability] = 0, so Cor[X, u] = 0; however Cor[X, education] = 0… X doesn't work!
• X = "mother's education". Certainly Cor[X, education] > 0. However, Cor[X, ability] > 0, implying Cor[X, u] > 0 too… X doesn't work!
• X = "number of siblings (total of sisters + brothers)". It turns out that Cor[X, education] < 0. And Cor[X, ability] = 0, implying Cor[X, u] = 0… well done!

We can use X = "number of siblings" as an instrument for education in (Model 2), and get β2 identified. Now it makes sense to estimate β2 from (Model 2).


Proxy vs IV

A proxy variable is a regressor that is almost perfectly correlated with a variable that can't be sampled. For instance IQ = γ·ability:

log(wage_i) = β1 + β2 education_i + β3 ability_i + ε_i
            = β1 + β2 education_i + (β3/γ) IQ_i + ε_i    (Model 3)

Let b be the least squares estimator for (Model 3). Now b3 is consistent for (β3/γ), and b2 is consistent for β2. We are not able to identify β3 and γ separately… but who cares! We are interested in β2 because this is what matters for educational policies!
• X is a proxy for Z: we require that X is a linear (or linearized) function of Z. Then, possibly after a rescaling, X may well be a regressor of Y.
• X is an IV for Z: we require that X does not cause Y; therefore X can't be a regressor of Y (see previous point).


Weak IV

The relevance condition is satisfied when the IVs are non-orthogonal to the regressors. But X and Z are non-orthogonal even if Cor[X, Z] is modest.

In order to work, an instrument should be a fairly strong predictor of the endogenous regressors. Remember: the role of the instrument is to let us see when the endogenous regressor is moving on its own. IVs are strong predictors of the endogenous regressors when they explain much of their variation. If this does not happen, we say that X is a weak instrument. Weak IVs can cause a lot of trouble!
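The trouble can be seen in a Monte Carlo sketch (NumPy assumed; γ controls instrument strength and all numbers are illustrative): when Cov[x, z] is close to zero, the denominator of the IV ratio is estimated near zero, and the sampling distribution of the estimator becomes extremely dispersed.

```python
import numpy as np

def iv_draws(gamma, reps=500, n=200, seed=6):
    """Sampling distribution of the IV estimator for instrument strength gamma."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(reps):
        x = rng.normal(size=n)
        eps = rng.normal(size=n)
        z = gamma * x + 0.5 * eps + rng.normal(size=n)   # endogenous regressor
        y = 2.0 * z + eps                                # true beta = 2
        est.append(np.cov(x, y)[0, 1] / np.cov(x, z)[0, 1])
    return np.array(est)

strong = iv_draws(gamma=1.0)    # strong instrument
weak = iv_draws(gamma=0.05)     # weak: Cov[x, z] barely nonzero

# interquartile range as a tail-robust measure of dispersion
spread_strong = np.percentile(strong, 75) - np.percentile(strong, 25)
spread_weak = np.percentile(weak, 75) - np.percentile(weak, 25)
# the weak-IV estimator is centered roughly right but wildly dispersed
```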
