
Econometrics 2 — Spring 2005

Generalized Method of Moments (GMM) Estimation

Heino Bohn Nielsen


Outline of the Lecture

(1) Introduction.
(2) Moment conditions and method of moments (MM) estimation.

  • Ordinary least squares (OLS) estimation.
  • Instrumental variables (IV) estimation.

(3) GMM defined in the general case.
(4) Specification test.
(5) Linear GMM.

  • Generalized instrumental variables (GIVE or 2SLS) estimation.


Idea of GMM

Estimation under weak assumptions, based on so-called moment conditions. Moment conditions are statements involving the data and the parameters. They arise naturally in many contexts. For example:

(A) In a regression model, $y_t = x_t'\beta + \varepsilon_t$, we might think that $E[y_t \mid x_t] = x_t'\beta$. This implies the moment condition

$$E[x_t \varepsilon_t] = E[x_t (y_t - x_t'\beta)] = 0.$$

(B) Consider the economic relation

$$y_t = \beta \cdot E[x_{t+1} \mid I_t] + \varepsilon_t = \beta \cdot x_{t+1} + \underbrace{\left(\beta \cdot (E[x_{t+1} \mid I_t] - x_{t+1}) + \varepsilon_t\right)}_{u_t}.$$

Under rational expectations, the expectation error, $E[x_{t+1} \mid I_t] - x_{t+1}$, should be orthogonal to the information set, $I_t$, and for $z_t \in I_t$ we have the moment condition

$$E[z_t u_t] = 0.$$

Properties of GMM

GMM is a large-sample estimator with desirable properties as $T \to \infty$.

  • Consistent under weak assumptions. No distributional assumptions are needed, unlike in maximum likelihood (ML) estimation.
  • Asymptotically efficient in the class of models that use the same amount of information.
  • Many estimators are special cases of GMM, so GMM is a unifying framework for comparing estimators.
  • GMM is a nonlinear procedure. We do not need a regression setup $E[y_t] = h(x_t; \beta)$; we can have $E[f(y_t, x_t; \beta)] = 0$.


Moment Conditions and MM Estimation

  • Consider a variable $y_t$ with some (possibly unknown) distribution. Assume that the mean $\mu = E[y_t]$ exists. We want to estimate $\mu$.
  • We could state the population moment condition

$$E[y_t - \mu] = 0, \quad \text{or} \quad E[f(y_t, \mu)] = 0, \quad \text{where} \quad f(y_t, \mu) = y_t - \mu.$$

  • The parameter $\mu$ is identified by the condition if there is a unique solution, in the sense that

$$E[f(y_t, \mu)] = 0 \quad \text{only if} \quad \mu = \mu_0.$$

  • We cannot calculate $E[f(y_t, \mu)]$ from an observed sample, $y_1, y_2, ..., y_t, ..., y_T$. Define the sample moment condition as

$$g_T(\mu) = \frac{1}{T} \sum_{t=1}^{T} f(y_t, \mu) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \mu) = 0. \quad (*)$$

  • By the law of large numbers, sample moments converge to population moments,

$$g_T(\mu) \to E[f(y_t, \mu)] \quad \text{for} \quad T \to \infty. \quad (**)$$

The method of moments estimator, $\hat\mu_{MM}$, is the solution to $(*)$, i.e.

$$\hat\mu_{MM} = \frac{1}{T} \sum_{t=1}^{T} y_t.$$

The sample average can be seen as a MM estimator.

  • The MM estimator is consistent: under weak regularity conditions, $(**)$ implies $\hat\mu_{MM} \to \mu_0$.
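As a small numerical sketch (with simulated data of my own, not from the lecture), the sample average can be checked to solve the sample moment condition:

```python
import numpy as np

# Simulated sample with true mean mu_0 = 5 (illustrative data)
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=10_000)

def g_T(mu, y):
    """Sample moment condition g_T(mu) = (1/T) * sum_t (y_t - mu)."""
    return np.mean(y - mu)

# The MM estimator solves g_T(mu) = 0; here the solution is the sample average
mu_mm = np.mean(y)
```

By construction, `g_T(mu_mm, y)` is numerically zero, and `mu_mm` is close to the true mean 5 for a large sample.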


OLS as a MM Estimator

  • Consider the regression model with $K$ explanatory variables

$$y_t = x_t'\beta + \varepsilon_t.$$

Assume no contemporaneous correlation (the minimal condition for consistency of OLS):

$$E[x_t \varepsilon_t] = E[x_t (y_t - x_t'\beta)] = 0 \quad (K \times 1).$$

This gives $K$ moment conditions for the $K$ parameters in $\beta$.

  • Define the sample moment conditions

$$g_T(\beta) = \frac{1}{T} \sum_{t=1}^{T} x_t (y_t - x_t'\beta) = \frac{1}{T} X'(Y - X\beta) = 0 \quad (K \times 1).$$

The MM estimator is given by the solution

$$\hat\beta_{MM} = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t = (X'X)^{-1} X'Y = \hat\beta_{OLS}.$$
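The equivalence can be illustrated numerically (a sketch on simulated data; the design matrix and coefficients are my own choices):

```python
import numpy as np

# Simulated regression y_t = x_t' beta + e_t with K = 2 (constant and one regressor)
rng = np.random.default_rng(0)
T = 5_000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta0 = np.array([1.0, 2.0])
y = X @ beta0 + rng.normal(size=T)

# Solve the K sample moment conditions (1/T) X'(Y - X beta) = 0
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)

# Compare with OLS computed by least squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

`beta_mm` and `beta_ols` coincide, since both solve the same normal equations.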

Instrumental Variables as a MM Estimator

  • Consider the regression model

$$y_t = x_{1t}'\beta_1 + x_{2t}\beta_2 + \varepsilon_t,$$

where the $K-1$ variables in $x_{1t}$ are predetermined and $x_{2t}$ is endogenous:

$$E[x_{1t} \varepsilon_t] = 0 \quad ((K-1) \times 1), \qquad E[x_{2t} \varepsilon_t] \neq 0. \quad (\#)$$

OLS is inconsistent!

  • Assume there exists a variable, $z_{2t}$, such that

$$\text{corr}(x_{2t}, z_{2t}) \neq 0, \qquad E[z_{2t} \varepsilon_t] = 0 \quad (1 \times 1). \quad (\#\#)$$

The new moment condition $(\#\#)$ can replace $(\#)$.

  • Define

$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad (K \times 1) \qquad \text{and} \qquad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix} \quad (K \times 1).$$

The variables in $z_t$ are called instruments. $z_{2t}$ is the new instrument; the predetermined variables, $x_{1t}$, are instruments for themselves.

  • The $K$ population moment conditions are

$$E[z_t \varepsilon_t] = E[z_t (y_t - x_t'\beta)] = 0 \quad (K \times 1).$$

The $K$ corresponding sample moment conditions are

$$g_T(\beta) = \frac{1}{T} \sum_{t=1}^{T} z_t (y_t - x_t'\beta) = \frac{1}{T} Z'(Y - X\beta) = 0 \quad (K \times 1).$$

The MM estimator is given by the unique solution

$$\hat\beta_{MM} = \left( \sum_{t=1}^{T} z_t x_t' \right)^{-1} \sum_{t=1}^{T} z_t y_t = (Z'X)^{-1} Z'Y = \hat\beta_{IV},$$

where $Z'X$ is $(K \times K)$.
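A numerical sketch of the IV estimator on simulated data (the data-generating process, with one endogenous regressor and one instrument, is my own illustration):

```python
import numpy as np

# Simulated setup: x2 is endogenous (correlated with the error); z2 is a valid instrument
rng = np.random.default_rng(0)
T = 20_000
x1 = np.ones(T)                                       # predetermined regressor (a constant)
z2 = rng.normal(size=T)                               # instrument
e = rng.normal(size=T)
x2 = 0.8 * z2 + 0.5 * e + 0.3 * rng.normal(size=T)    # endogenous: E[x2 * e] != 0
beta0 = np.array([1.0, 2.0])
X = np.column_stack([x1, x2])
Z = np.column_stack([x1, z2])
y = X @ beta0 + e

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # solves (1/T) Z'(Y - X beta) = 0
```

OLS is biased away from the true slope of 2, while the IV estimate is close to it.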

Where do Instruments Come From?

  • Consider the two simple equations

$$c_t = \beta_{10} + \beta_{11} y_t + \beta_{12} w_t + \varepsilon_{1t}$$
$$y_t = \beta_{20} + \beta_{21} c_t + \beta_{22} w_t + \beta_{23} r_t + \beta_{24} \tau_t + \varepsilon_{2t}$$

Say that we are only interested in the first equation.

  • Assume that $w_t$ is predetermined. If $\beta_{21} \neq 0$, then $y_t$ is endogenous and $E[y_t \varepsilon_{1t}] \neq 0$.
  • In this setup $r_t$ and $\tau_t$ are possible instruments for $y_t$. We need $\beta_{23}$ and $\beta_{24}$ different from zero and $E[(r_t, \tau_t)' \varepsilon_{1t}] = 0$.
  • In dynamic models we can often use lagged values as instruments. Unfortunately, they are often poor instruments.
  • Note that in this case we have more potential instruments than endogenous variables. This is addressed in GMM/GIVE.


The GMM Problem Defined

  • Let $w_t = (y_t, x_t')'$ be a vector of model variables and let $z_t$ be a vector of instruments. Consider the $R$ moment conditions

$$E[f(w_t, z_t, \theta)] = 0.$$

Here $\theta$ is a $K \times 1$ parameter vector and $f(\cdot)$ is an $R$-dimensional vector function.

  • Consider the corresponding sample moment conditions

$$g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta) = 0.$$

  • When can the $R$ sample moments be used to estimate the $K$ parameters in $\theta$?

Order Condition

$R < K$: No unique solution to $g_T(\theta) = 0$. The parameters are not identified.

$R = K$: Unique solution to $g_T(\theta) = 0$. Exact identification. This is the MM estimator (OLS, IV). Note that $g_T(\theta) = 0$ is potentially a nonlinear problem, requiring a numerical solution.

$R > K$: More equations than parameters. The over-identified case. In general there is no solution ($Z'X$ is an $R \times K$ matrix).

  • It is not optimal to drop moments!
  • Instead, choose $\theta$ to make $g_T(\theta)$ as close as possible to zero.


GMM Estimation (R > K)

  • We want to make the $R$ moments $g_T(\theta)$ as close to zero as possible. How?
  • Assume we have an $R \times R$ symmetric and positive definite weight matrix $W_T$. Then we can define the quadratic form

$$Q_T(\theta) = g_T(\theta)' W_T g_T(\theta) \quad (1 \times 1).$$

The GMM estimator is defined as the vector that minimizes $Q_T(\theta)$, i.e.

$$\hat\theta_{GMM}(W_T) = \arg\min_{\theta} \; g_T(\theta)' W_T g_T(\theta).$$

  • The matrix $W_T$ tells how much weight to put on each moment condition. Different $W_T$ give different estimators, $\hat\theta_{GMM}(W_T)$. GMM is consistent for any weight matrix, $W_T$. What is the optimal choice of $W_T$?
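A minimal sketch of the quadratic form on a toy overidentified problem of my own (two moment conditions for the mean of a normal variable whose variance is assumed known to be 1):

```python
import numpy as np

def Q_T(theta, moment_fn, data, W):
    """Quadratic form Q_T(theta) = g_T(theta)' W g_T(theta), g_T = average of f_t."""
    f = moment_fn(theta, data)      # T x R matrix of f(w_t, z_t, theta)
    g = f.mean(axis=0)              # R-vector of sample moments g_T(theta)
    return g @ W @ g

# Toy moments: E[y - mu] = 0 and E[y^2 - mu^2 - 1] = 0 (variance fixed at 1)
def moments(theta, y):
    mu = theta[0]
    return np.column_stack([y - mu, y**2 - mu**2 - 1.0])

rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, 2_000)          # simulated data, true mu_0 = 3
W = np.eye(2)                            # identity weights: each moment counts equally
q = Q_T(np.array([3.0]), moments, y, W)  # small near the true value
```

Evaluating `Q_T` away from the true parameter (e.g. at mu = 4) gives a much larger value, which is what the minimization exploits.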

Optimal GMM Estimation

  • The $R$ sample moments $g_T(\theta)$ are estimators of $E[f(\cdot)]$, and hence random variables. The law of large numbers implies

$$g_T(\theta) \to E[f(\cdot)] \quad \text{for} \quad T \to \infty.$$

A central limit theorem implies

$$\sqrt{T} \cdot g_T(\theta) \to N(0, S),$$

where $S$ is the asymptotic variance of the scaled moments, $\sqrt{T} \cdot g_T(\theta)$.

  • Intuitively, moments with little variance should have large weights. The optimal weight matrix for GMM is a matrix $W_T^{opt}$ such that

$$\text{plim}_{T \to \infty} W_T^{opt} = W^{opt} = S^{-1}.$$

  • Without autocorrelation, a natural estimator $\hat{S}$ of $S$ is

$$\hat{S} = V\left[\sqrt{T} \cdot g_T(\theta)\right] = T \cdot V[g_T(\theta)] = T \cdot V\left[\frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta)\right] = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta) f(w_t, z_t, \theta)'.$$

This implies that

$$W_T^{opt} = \hat{S}^{-1} = \left( \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta) f(w_t, z_t, \theta)' \right)^{-1}.$$

  • Note that $W_T^{opt}$ depends on $\theta$ in general.

Estimation in Practice

  • Two-step (efficient) GMM.

(1) Choose some initial weight matrix, e.g. $W_{[1]} = I$ or $W_{[1]} = (Z'Z)^{-1}$, and find a (consistent) estimator

$$\hat\theta_{[1]} = \arg\min_{\theta} \; g_T(\theta)' W_{[1]} g_T(\theta).$$

Use $\hat\theta_{[1]}$ to estimate the optimal weights, $W_T^{opt}$.

(2) Find the optimal GMM estimate

$$\hat\theta_{GMM} = \arg\min_{\theta} \; g_T(\theta)' W_T^{opt} g_T(\theta).$$

  • Iterated GMM. Start with some initial weight matrix $W_{[1]}$.

(1) Find an estimate $\hat\theta_{[1]}$.
(2) Find a new weight matrix, $W_{[2]}^{opt}$.

Iterate between $\hat\theta_{[\cdot]}$ and $W_{[\cdot]}^{opt}$ until convergence.
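The two-step recipe above can be sketched on the toy overidentified mean problem (simulated data and moment conditions of my own; R = 2 moments, K = 1 parameter):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# f_t(mu) = (y_t - mu, y_t^2 - mu^2 - 1)', assuming the variance is known to be 1
rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, 5_000)          # true mu_0 = 3

def f(mu):
    return np.column_stack([y - mu, y**2 - mu**2 - 1.0])   # T x R stacked moments

def Q(mu, W):
    g = f(mu).mean(axis=0)
    return g @ W @ g

# Step 1: consistent estimate with identity weights
mu1 = minimize_scalar(lambda m: Q(m, np.eye(2)), bounds=(0, 10), method="bounded").x

# Estimate the optimal weight matrix W = S^{-1}, with S = (1/T) sum_t f_t f_t'
F = f(mu1)
S = F.T @ F / len(y)
W_opt = np.linalg.inv(S)

# Step 2: efficient GMM with the estimated optimal weights
mu_gmm = minimize_scalar(lambda m: Q(m, W_opt), bounds=(0, 10), method="bounded").x
```

Both steps recover a value close to the true mean; the second step re-weights the two moments by their estimated precision.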


Properties of Optimal GMM

  • The GMM estimator, $\hat\theta_{GMM}(W_T^{opt})$, is asymptotically efficient: it has the lowest variance in the class of models that use the same information.
  • The GMM estimator is asymptotically normal, i.e.

$$\sqrt{T} \cdot \left( \hat\theta_{GMM} - \theta \right) \to N(0, V),$$

where

$$V = \left( D' W^{opt} D \right)^{-1} = \left( D' S^{-1} D \right)^{-1}, \qquad D = \text{plim} \frac{\partial g_T(\theta)}{\partial \theta'} \quad (R \times K).$$

  • $S$ measures the variance of the moments. The larger $S$, the larger $V$.
  • $D$ measures the sensitivity of the moments with respect to changes in $\theta$. If this is large, the parameters can be estimated precisely.
  • Little is known about the finite-sample properties.

Specification Test

  • If $R > K$, we have more moments than parameters. All moments have expectation zero. In a sense, $K$ moments are set to zero by estimating the parameters; we can test whether the additional $R - K$ moments are close to zero. If not, some orthogonality condition is violated.
  • Remember that

$$\sqrt{T} \cdot g_T(\theta) \to N(0, S).$$

This implies that if the weights are optimal, $W_T^{opt} \to S^{-1}$, then

$$\xi = g_T(\hat\theta_{GMM})' \left( \frac{1}{T} S \right)^{-1} g_T(\hat\theta_{GMM}) = T \cdot g_T(\hat\theta_{GMM})' W_T^{opt} g_T(\hat\theta_{GMM}) \to \chi^2(R - K).$$

This is the Hansen test for overidentifying restrictions (also called the J-test or Sargan test): a test of the $R - K$ overidentifying conditions.
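A sketch of the test statistic on the toy overidentified mean problem (my own simulated example; for the exact asymptotic distribution the statistic should be evaluated at the efficient GMM estimate, here the sample mean is used as a consistent stand-in):

```python
import numpy as np
from scipy.stats import chi2

# Overidentified mean example: R = 2 moments, K = 1 parameter, so chi2(1)
rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, 5_000)

def f(mu):                                  # T x R stacked moments
    return np.column_stack([y - mu, y**2 - mu**2 - 1.0])

mu_hat = y.mean()                           # consistent (not fully efficient) estimate
F = f(mu_hat)
g = F.mean(axis=0)
S = F.T @ F / len(y)                        # optimal-weight estimate of S
xi = len(y) * g @ np.linalg.inv(S) @ g      # T * g' S^{-1} g  ~  chi2(R - K)
p_value = chi2.sf(xi, df=1)                 # large p: moments consistent with the model
```

A small p-value would indicate that some orthogonality condition is violated.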


Famous Example: Hansen and Singleton (1982)

  • Consider an optimizing agent with a power utility function over consumption, $U(C_t) = \frac{C_t^{1-\gamma}}{1-\gamma}$. The first order condition for maximizing the discounted utility of future consumption is given by

$$E\left[ \delta \left( \frac{C_{t+1}}{C_t} \right)^{-\gamma} (1 + r_{t+1}) - 1 \;\middle|\; I_t \right] = 0,$$

where $I_t$ is the conditioning information set at time $t$.

  • Assume rational expectations. Now if $z_t \in I_t$, then $z_t$ must be orthogonal to the expectation error, i.e.

$$E\left[ \left( \delta \left( \frac{C_{t+1}}{C_t} \right)^{-\gamma} (1 + r_{t+1}) - 1 \right) z_t \right] = 0.$$

This is a moment condition, $f(C_{t+1}, C_t, r_{t+1}; z_t; \delta, \gamma) = 0$. We need at least $R = 2$ instruments in $z_t$.

  • Note: the specification is theory driven, nonlinear, and not in regression format.
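The Euler-equation moments translate directly into a sample moment function. A minimal sketch (the data here are simulated placeholders of my own, not the Hansen–Singleton dataset, and the instrument choice is only illustrative):

```python
import numpy as np

# Hypothetical simulated series: gross consumption growth C_{t+1}/C_t and return r_{t+1}
rng = np.random.default_rng(1)
T = 500
c_growth = np.exp(rng.normal(0.02, 0.01, T))            # C_{t+1}/C_t
r = rng.normal(0.03, 0.02, T)                           # r_{t+1}
# Instruments z_t in I_t: a constant and lagged consumption growth
# (np.roll wraps the first observation; acceptable for this sketch)
Z = np.column_stack([np.ones(T), np.roll(c_growth, 1)])

def euler_moments(theta, c_growth, r, Z):
    """Sample moments g_T = (1/T) sum_t (delta*(C_{t+1}/C_t)^(-gamma)*(1+r_{t+1}) - 1) z_t."""
    delta, gamma = theta
    err = delta * c_growth ** (-gamma) * (1.0 + r) - 1.0   # expectation error
    return Z.T @ err / len(err)                            # R-vector of sample moments

g = euler_moments(np.array([0.97, 2.0]), c_growth, r, Z)
```

With R = 2 instruments and K = 2 parameters the model is exactly identified; adding instruments would allow the overidentification test above.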

Linear GMM and GIVE

  • Consider the linear regression model with $K$ explanatory variables

$$y_t = x_{1t}'\beta_1 + x_{2t}'\beta_2 + \varepsilon_t = x_t'\beta + \varepsilon_t,$$

where $E[x_{1t} \varepsilon_t] = 0$, but the variables in $x_{2t}$ are endogenous, $E[x_{2t} \varepsilon_t] \neq 0$.

  • Assume there exist $R > K$ instruments $z_t = (x_{1t}', z_{2t}')'$ such that

$$E[z_t \varepsilon_t] = E[z_t (y_t - x_t'\beta)] = 0 \quad (R \times 1).$$

Identification requires a non-zero correlation between $z_{2t}$ and $x_{2t}$ (the rank condition).

  • The sample moments are

$$g_T(\beta) = \frac{1}{T} \sum_{t=1}^{T} z_t (y_t - x_t'\beta) = \frac{1}{T} Z'(Y - X\beta).$$

Note that we cannot solve $g_T(\beta) = 0$ directly: $Z'X$ is an $R \times K$ matrix (of rank $K$) and cannot be inverted.

  • Instead we want to minimize the quadratic form

$$Q_T(\beta) = g_T(\beta)' W_T g_T(\beta) = \left( \frac{1}{T} Z'(Y - X\beta) \right)' W_T \left( \frac{1}{T} Z'(Y - X\beta) \right)$$
$$= \frac{1}{T^2} \left( Y'Z - \beta'X'Z \right) W_T \left( Z'Y - Z'X\beta \right) = \frac{1}{T^2} \left( Y'Z W_T Z'Y - 2\beta'X'Z W_T Z'Y + \beta'X'Z W_T Z'X\beta \right).$$

  • To minimize $Q_T(\beta)$ we take the derivative and solve the $K$ equations

$$\frac{\partial Q_T(\beta)}{\partial \beta} = 0 \quad (K \times 1).$$

  • The GMM estimator solves the $K$ equations

$$\frac{\partial Q_T(\beta)}{\partial \beta} = -2T^{-2} X'Z W_T Z'Y + 2T^{-2} X'Z W_T Z'X\beta = 0,$$

i.e.

$$\hat\beta_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y.$$

  • The optimal weight matrix is the inverse variance of the moments, i.e. $W_T^{opt} = S^{-1}$, where

$$S = V\left[ \sqrt{T} \cdot g_T(\beta) \right] = \frac{1}{T} V[Z'\varepsilon] = \frac{1}{T} E[Z'\varepsilon\varepsilon'Z] = \frac{1}{T} Z'\Omega Z,$$

where we define $E[\varepsilon\varepsilon'] = \Omega$.


Case 1: Homoscedastic Errors

  • If $E[\varepsilon\varepsilon'] = \Omega = \sigma^2 I$, the natural estimator of $S$ is

$$\hat{S} = \frac{1}{T} Z'\hat\Omega Z = \frac{1}{T} \hat\sigma^2 Z'Z,$$

where $\hat\sigma^2$ is a consistent estimator of $\sigma^2$.

  • Then the GMM estimator becomes

$$\hat\beta_{GMM} = \left( X'Z \hat{S}^{-1} Z'X \right)^{-1} X'Z \hat{S}^{-1} Z'Y = \left( X'Z \left( \tfrac{1}{T}\hat\sigma^2 Z'Z \right)^{-1} Z'X \right)^{-1} X'Z \left( \tfrac{1}{T}\hat\sigma^2 Z'Z \right)^{-1} Z'Y$$
$$= \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'Y = \hat\beta_{GIVE} = \hat\beta_{2SLS}.$$

Under homoscedasticity the optimal GMM estimator is GIVE (and 2SLS).
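The closed-form linear GMM estimator and its GIVE/2SLS special case can be verified numerically (simulated overidentified design of my own: R = 3 instruments, K = 2 parameters):

```python
import numpy as np

# Linear GMM with R > K instruments; with W = (Z'Z)^{-1} this is GIVE / 2SLS
rng = np.random.default_rng(0)
T = 10_000
z = rng.normal(size=(T, 2))                          # two instruments for one endogenous regressor
e = rng.normal(size=T)
x2 = z @ np.array([0.6, 0.4]) + 0.5 * e + 0.3 * rng.normal(size=T)
X = np.column_stack([np.ones(T), x2])                # K = 2
Z = np.column_stack([np.ones(T), z])                 # R = 3 > K
y = X @ np.array([1.0, 2.0]) + e

def gmm_linear(W):
    """beta(W) = (X'Z W Z'X)^{-1} X'Z W Z'Y."""
    A = X.T @ Z @ W
    return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

W_give = np.linalg.inv(Z.T @ Z)                      # optimal under homoscedasticity
beta_give = gmm_linear(W_give)

# Check against the textbook 2SLS formula using the projection on the instrument space
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
beta_2sls = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
```

The two computations agree, since $(Z'Z)^{-1}$ as weight matrix reproduces the projection-based 2SLS formula exactly.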

  • Recall that

$$\hat\beta_{GMM} \to N\left( \beta, \tfrac{1}{T} \left( D' W^{opt} D \right)^{-1} \right).$$

The derivative is given by

$$D_T = \frac{\partial g_T(\beta)}{\partial \beta'} = \frac{\partial \left( \frac{1}{T} Z'(Y - X\beta) \right)}{\partial \beta'} = -\frac{1}{T} Z'X = -\frac{1}{T} \sum_{t=1}^{T} z_t x_t' \quad (R \times K).$$

  • The variance can be estimated by

$$V\left[ \hat\beta_{GMM} \right] = \frac{1}{T} \left( D_T' W^{opt} D_T \right)^{-1} = \frac{1}{T} \left( \left( -\frac{1}{T} Z'X \right)' \left( \frac{1}{T} \hat\sigma^2 Z'Z \right)^{-1} \left( -\frac{1}{T} Z'X \right) \right)^{-1} = \hat\sigma^2 \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1},$$

which is known from 2SLS.

  • The specification test is given by

$$\xi = T \cdot g_T(\hat\beta_{GMM})' \hat{S}^{-1} g_T(\hat\beta_{GMM}) = \left( \sum_{t=1}^{T} \hat\varepsilon_t z_t \right)' \left( \hat\sigma^2 \sum_{t=1}^{T} z_t z_t' \right)^{-1} \left( \sum_{t=1}^{T} \hat\varepsilon_t z_t \right) \to \chi^2(R - K).$$

In the linear case it is denoted the Sargan test.

  • A simple way to calculate $\xi$ is to consider the regression

$$\hat\varepsilon_t = z_t'\gamma + \text{residual},$$

and calculate the test statistic as $\xi = T \cdot R^2$.
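The $T \cdot R^2$ shortcut can be sketched on simulated data (my own design; the $R^2$ below is computed from the variance ratio, which is a close approximation since the 2SLS residuals have essentially zero mean when a constant is among the instruments):

```python
import numpy as np

# Sargan test via the auxiliary regression  e_hat_t = z_t' gamma + residual,  xi = T * R^2
rng = np.random.default_rng(0)
T = 10_000
z = rng.normal(size=(T, 2))
e = rng.normal(size=T)
x2 = z @ np.array([0.6, 0.4]) + 0.5 * e
X = np.column_stack([np.ones(T), x2])
Z = np.column_stack([np.ones(T), z])         # R = 3, K = 2 -> 1 overidentifying restriction
y = X @ np.array([1.0, 2.0]) + e

# 2SLS residuals
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
beta = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
e_hat = y - X @ beta

# Auxiliary regression of the residuals on the instruments
gamma = np.linalg.solve(Z.T @ Z, Z.T @ e_hat)
fitted = Z @ gamma
r2 = fitted.var() / e_hat.var()              # R^2 of the auxiliary regression
xi = T * r2                                  # compare with chi2(R - K) = chi2(1)
```

Under valid instruments the residuals are nearly unpredictable from $z_t$, so $R^2$ is tiny and $\xi$ is small.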

Case 2: Heteroscedasticity, No Autocorrelation

  • In the case of heteroscedasticity but no autocorrelation,

$$E[\varepsilon\varepsilon'] = \Omega = \begin{pmatrix} \sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_T^2 \end{pmatrix},$$

we can use the estimator

$$\hat{S} = \frac{1}{T} Z'\hat\Omega Z = \frac{1}{T} \sum_{t=1}^{T} \hat\varepsilon_t^2 z_t z_t'.$$

We only need an estimate of the $R \times R$ matrix $Z'\Omega Z$, not of the $T \times T$ matrix $\Omega$. We get

$$\hat\beta_{GMM}(\hat{S}^{-1}) = \left( X'Z \hat{S}^{-1} Z'X \right)^{-1} X'Z \hat{S}^{-1} Z'Y.$$

Note that a constant scaling of the weight matrix does not affect the estimator.

  • The variance of the estimator becomes

$$V\left[ \hat\beta_{GMM} \right] = \frac{1}{T} \left( D_T' W^{opt} D_T \right)^{-1} = \frac{1}{T} \left( \left( -\frac{1}{T} \sum_{t=1}^{T} x_t z_t' \right) \left( \frac{1}{T} \sum_{t=1}^{T} \hat\varepsilon_t^2 z_t z_t' \right)^{-1} \left( -\frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right) \right)^{-1}$$
$$= \left( \left( \sum_{t=1}^{T} x_t z_t' \right) \left( \sum_{t=1}^{T} \hat\varepsilon_t^2 z_t z_t' \right)^{-1} \left( \sum_{t=1}^{T} z_t x_t' \right) \right)^{-1},$$

which is the heteroscedasticity-consistent variance estimator of White.
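A sketch of two-step efficient GMM under heteroscedasticity with the White sandwich variance (simulated design of my own, with error variance depending on an instrument; note the scale of the weight matrix cancels in the estimator, so the unnormalized sum is used directly):

```python
import numpy as np

# Heteroscedasticity-consistent (White) variance for the linear GMM/IV estimator
rng = np.random.default_rng(0)
T = 10_000
z = rng.normal(size=(T, 2))
het = np.exp(0.5 * z[:, 0])                 # error std depends on the first instrument
e = rng.normal(size=T) * het
x2 = z @ np.array([0.6, 0.4]) + 0.5 * rng.normal(size=T)
X = np.column_stack([np.ones(T), x2])
Z = np.column_stack([np.ones(T), z])
y = X @ np.array([1.0, 2.0]) + e

# Step 1: 2SLS to get consistent residuals
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
beta1 = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
u = y - X @ beta1

# Step 2: efficient GMM with weights proportional to (sum_t u_t^2 z_t z_t')^{-1}
B = (Z * (u**2)[:, None]).T @ Z             # sum_t u_t^2 z_t z_t'
W = np.linalg.inv(B)
A = X.T @ Z @ W
beta_gmm = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

# White sandwich variance: (X'Z B^{-1} Z'X)^{-1}
V_hat = np.linalg.inv(X.T @ Z @ W @ Z.T @ X)
se = np.sqrt(np.diag(V_hat))
```

The standard errors from `V_hat` remain valid even though the error variance varies across observations.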

Case 3: Autocorrelation

  • The optimal weight matrix is $W_T^{opt} = \hat{S}^{-1}$, where

$$\hat{S} = V\left[ \sqrt{T} \cdot g_T(\theta) \right] = T^{-1} V\left[ \sum_{t=1}^{T} z_t \varepsilon_t \right].$$

With autocorrelation, we need to take the covariances into account.

  • This is done by the heteroscedasticity and autocorrelation consistent (HAC) estimator. Let

$$\Gamma_j = \text{cov}(z_t \varepsilon_t, z_{t-j} \varepsilon_{t-j}) = E\left[ (z_t \varepsilon_t)(z_{t-j} \varepsilon_{t-j})' \right] \quad (R \times R)$$

be the covariance matrix at lag $j$. Then

$$S = T^{-1} V\left[ \sum_{t=1}^{T} z_t \varepsilon_t \right] = V(z_t \varepsilon_t) + \text{Cov}(z_t \varepsilon_t, z_{t-1} \varepsilon_{t-1}) + \text{Cov}(z_t \varepsilon_t, z_{t-2} \varepsilon_{t-2}) + ... + \text{Cov}(z_t \varepsilon_t, z_{t+1} \varepsilon_{t+1}) + \text{Cov}(z_t \varepsilon_t, z_{t+2} \varepsilon_{t+2}) + ... = \sum_{j=-\infty}^{\infty} \Gamma_j.$$

  • If we can argue that $\Gamma_j = 0$ for $j$ larger than some lag, $q$, we can use the estimator

$$\hat{S} = \sum_{j=-q}^{q} \hat\Gamma_j,$$

where we estimate the covariances by

$$\hat\Gamma_j = \frac{1}{T} \sum_{t=j+1}^{T} (z_t \hat\varepsilon_t)(z_{t-j} \hat\varepsilon_{t-j})'.$$

  • The $\hat{S}$ obtained this way is not necessarily positive definite. Instead, the covariances can be given decreasing weights, which yields the Newey-West estimator. The finite-sample properties are unknown.
  • The HAC covariance estimator can also be used for OLS.
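The truncated HAC estimator with the Newey-West (Bartlett) weights can be sketched as follows (the helper name and the white-noise check are my own; `M` stacks the moment observations $z_t \hat\varepsilon_t$ as rows):

```python
import numpy as np

def newey_west(M, q):
    """Newey-West HAC estimate of S from a T x R matrix of moment observations.
    Bartlett weights w_j = 1 - j/(q+1) keep the estimate positive semi-definite."""
    T = M.shape[0]
    M = M - M.mean(axis=0)                     # center the moments
    S = M.T @ M / T                            # Gamma_0
    for j in range(1, q + 1):
        Gamma_j = M[j:].T @ M[:-j] / T         # (1/T) sum_t m_t m_{t-j}'
        w = 1.0 - j / (q + 1.0)                # down-weight longer lags
        S += w * (Gamma_j + Gamma_j.T)         # add Gamma_j and Gamma_{-j} = Gamma_j'
    return S

# Sanity check: on white-noise moments the HAC estimate stays close to Gamma_0
rng = np.random.default_rng(0)
M = rng.normal(size=(2_000, 2))                # simulated iid moments, variance I
S_hac = newey_west(M, q=4)
```

With serially correlated moments the off-lag terms contribute, and the Bartlett weighting guarantees the resulting $\hat{S}$ can be safely inverted for the weight matrix.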