
Econometrics 2

Generalized Method of Moments (GMM) Estimation

Heino Bohn Nielsen


Outline

(1) Introduction and motivation
(2) Moment Conditions and Identification
(3) A Model Class: Instrumental Variables (IV) Estimation
(4) Method of Moments (MM) Estimation

Examples: Mean, OLS and Linear IV

(5) Generalized Method of Moments (GMM) Estimation

Properties: Consistency and Asymptotic Distribution

(6) Efficient GMM

Examples: Two-Stage Least Squares

(7) Comparison with Maximum Likelihood

Pseudo-ML Estimation

(8) Empirical Example: C-CAPM Model


Introduction

Generalized method of moments (GMM) is a general estimation principle. Estimators are derived from so-called moment conditions. Three main motivations:

(1) Many estimators can be seen as special cases of GMM.

Unifying framework for comparison.

(2) Maximum likelihood estimators have the smallest variance in the class of consistent and asymptotically normal estimators. But: we need a full description of the DGP and a correct specification. GMM is an alternative based on minimal assumptions.

(3) GMM estimation is often possible where a likelihood analysis is extremely difficult.

We only need a partial specification of the model. Models for rational expectations.


Moment Conditions and Identification

  • A moment condition is a statement involving the data and the parameters:

g(θ0) = E[f(wt, zt, θ0)] = 0,     (∗)

where θ is a K × 1 vector of parameters with true value θ0; f(·) is an R × 1 vector of (possibly non-linear) functions; wt contains model variables; and zt contains instruments.
  • If we knew the expectation then we could solve the equations in (∗) to find θ0.
  • If we knew the expectation then we could solve the equations in (∗) to find θ0.
  • If there is a unique solution, so that

E[f(wt, zt, θ)] = 0   if and only if   θ = θ0,

then we say that the system is identified.

  • Identification is essential for doing econometrics. Two ideas:

(1) Is the model constructed so that θ0 is unique (identification)?
(2) Are the data informative enough to determine θ0 (empirical identification)?


Instrumental Variables Estimation

  • In many applications, the moment condition has the specific form:

f(wt, zt, θ) = u(wt, θ) · zt,

where the (1 × 1) disturbance term, u(wt, θ), is multiplied by the (R × 1) vector of instruments, zt.

  • You can think of u(wt, θ) as the equivalent of an error term.

The moment condition becomes

g(θ0) = E[u(wt, θ0) · zt] = 0,

stating that the instruments are uncorrelated with the error term of the model.

  • This class of estimators is referred to as instrumental variables estimators.

The function u(wt, θ) may be linear or non-linear in θ.


Example: Moment Condition From RE

  • Consider a monetary policy rule, where the interest rate depends on expected future inflation:

rt = β · E[πt+1 | It] + εt.

Noting that any variable can be decomposed as

xt+1 = E[xt+1 | It] + vt,

where vt is the expectation error, we can write the model (with xt+1 = πt+1) as

rt = β · E[πt+1 | It] + εt = β · xt+1 + (εt − β · vt) = β · xt+1 + ut.

Note that xt+1 and ut are correlated, so OLS is inconsistent.

  • Under rational expectations, the expectation error, vt, should be orthogonal to the information set, It, and for zt ∈ It we have the moment condition

E[ut · zt] = E[(rt − β · xt+1) · zt] = 0.

This is enough to identify β.


Method of Moments (MM) Estimator

  • For a given sample, wt and zt (t = 1, 2, ..., T), we cannot calculate the expectation. We replace it with sample averages to obtain the analogous sample moments:

gT(θ) = (1/T) Σ_{t=1}^T f(wt, zt, θ).

We can derive an estimator, θ̂MM, as the solution to gT(θ̂MM) = 0.

  • To find an estimator, we need at least as many equations as we have parameters. The order condition for identification is R ≥ K.

— R = K is called exact identification. The estimator is denoted the method of moments (MM) estimator, θ̂MM.
— R > K is called over-identification. The estimator is denoted the generalized method of moments (GMM) estimator, θ̂GMM.


Example: MM Estimator of the Mean

  • Assume that yt is a random variable drawn from a population with expectation µ0.

We have a single moment condition:

g(µ0) = E[f(yt, µ0)] = E[yt − µ0] = 0,

where f(yt, µ0) = yt − µ0.

  • For a sample, y1, y2, ..., yT, we state the corresponding sample moment condition:

gT(µ̂) = (1/T) Σ_{t=1}^T (yt − µ̂) = 0.

The MM estimator of the mean µ0 is the solution, i.e.

µ̂MM = (1/T) Σ_{t=1}^T yt,

which is the sample average.
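As a quick numerical illustration (not part of the original slides, and using simulated data), the sample moment condition can be solved with a generic root finder; the solution coincides with the sample average:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)  # simulated sample with mu_0 = 2

def g_T(mu):
    """Sample moment condition g_T(mu) = (1/T) * sum(y_t - mu)."""
    return np.mean(y - mu)

# Solve g_T(mu) = 0 numerically; the solution equals the sample average.
mu_mm = brentq(g_T, y.min(), y.max())
assert np.isclose(mu_mm, y.mean())
```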


Example: OLS as a MM Estimator

  • Consider the linear regression model of yt on xt (K × 1):

yt = xt′β0 + εt.     (∗∗)

Assume that (∗∗) represents the conditional expectation:

E[yt | xt] = xt′β0   so that   E[εt | xt] = 0.

  • That implies the K unconditional moment conditions

g(β0) = E[xt · εt] = E[xt (yt − xt′β0)] = 0,

which we recognize as the minimal assumption for consistency of the OLS estimator.


  • We define the corresponding sample moment conditions as

gT(β̂) = (1/T) Σ_{t=1}^T xt (yt − xt′β̂) = (1/T) Σ_{t=1}^T xt yt − ((1/T) Σ_{t=1}^T xt xt′) β̂ = 0.

And the MM estimator is derived as the unique solution:

β̂MM = (Σ_{t=1}^T xt xt′)⁻¹ Σ_{t=1}^T xt yt,

provided that Σ_{t=1}^T xt xt′ is non-singular.

  • Method of moments is one way to motivate the OLS estimator.

Highlights the minimal (or identifying) assumptions for OLS.
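A small simulated check (hypothetical data, not from the slides) that solving the K sample moment conditions reproduces OLS:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 400, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta0 = np.array([1.0, 0.5, -2.0])
y = X @ beta0 + rng.normal(size=T)

# Solve the sample moment conditions sum_t x_t (y_t - x_t' beta) = 0:
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)

# Identical to OLS computed by least squares:
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_mm, beta_ols)
```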


Example: Under-Identification

  • Consider again a regression model

yt = xt′β0 + εt = x1t′γ0 + x2t′δ0 + εt.

  • Assume that the K1 variables in x1t are predetermined, while the K2 = K − K1 variables in x2t are endogenous. That implies

E[x1t · εt] = 0   (K1 × 1)     (†)
E[x2t · εt] ≠ 0   (K2 × 1).    (††)

  • We have K parameters in β0 = (γ0′, δ0′)′, but only K1 < K moment conditions (i.e. K1 equations to determine K unknowns). The parameters are not identified and cannot be estimated consistently.


Example: Simple IV Estimator

  • Assume K2 new variables, z2t, that are correlated with x2t but uncorrelated with εt:

E[z2t · εt] = 0.     (†††)

The K2 moment conditions in (†††) can replace (††). To simplify notation, we define the (K × 1) vectors

xt = (x1t′, x2t′)′   and   zt = (x1t′, z2t′)′.

xt are model variables, z2t are new instruments, and zt are instruments. We say that x1t are instruments for themselves.

  • Using (†) and (†††) we have K moment conditions:

g(β0) = [E[x1t · εt]; E[z2t · εt]] = E[zt · εt] = E[zt (yt − xt′β0)] = 0,

which are sufficient to identify the K parameters in β.

  • The corresponding sample moment conditions are given by

gT(β̂) = (1/T) Σ_{t=1}^T zt (yt − xt′β̂) = 0.

  • The method of moments estimator is the unique solution:

β̂MM = (Σ_{t=1}^T zt xt′)⁻¹ Σ_{t=1}^T zt yt,

provided that Σ_{t=1}^T zt xt′ is non-singular.

  • Note the following:

(1) We need the instruments to identify the parameters.
(2) The MM estimator coincides with the simple IV estimator (see the sketch below).
(3) The procedure only works with K2 new instruments (i.e. R = K).
(4) Non-singularity of Σ_{t=1}^T zt xt′ requires relevant instruments.
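A minimal simulated sketch of the simple IV estimator, assuming one endogenous regressor and one new instrument (all data and names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
z2 = rng.normal(size=T)                        # new instrument
e = rng.normal(size=T)                         # structural error
x2 = 0.8 * z2 + 0.5 * e + rng.normal(size=T)   # endogenous: correlated with e
x1 = np.ones(T)                                # predetermined (constant)
y = 1.0 + 2.0 * x2 + e                         # beta0 = (1, 2)'

X = np.column_stack([x1, x2])                  # model variables
Z = np.column_stack([x1, z2])                  # instruments (x1 instruments itself)

# MM / simple IV estimator: (sum z_t x_t')^{-1} sum z_t y_t
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent, for comparison
print(beta_iv, beta_ols)                       # IV near (1, 2); OLS slope biased
```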


Generalized Method of Moments Estimation

  • The case R > K is called over-identification.

More equations than parameters and no solution to gT(θ) = 0 in general.

  • Instead we minimize the distance from gT(θ) to zero.

The distance is measured by the quadratic form

QT(θ) = gT(θ)′ WT gT(θ),

where WT is an R × R symmetric and positive definite weight matrix.

  • The GMM estimator depends on the weight matrix:

θ̂GMM(WT) = argmin_θ {gT(θ)′ WT gT(θ)}.
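In code, the criterion and its numerical minimization might be sketched as follows (a generic sketch, not from the slides; `g_T` is any user-supplied sample-moment function):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, g_T, W_T):
    """Quadratic-form criterion Q_T(theta) = g_T(theta)' W_T g_T(theta)."""
    g = g_T(theta)
    return g @ W_T @ g

def gmm_estimate(g_T, W_T, theta_init):
    """Minimize Q_T numerically; Nelder-Mead avoids the need for derivatives."""
    res = minimize(gmm_objective, theta_init, args=(g_T, W_T), method="Nelder-Mead")
    return res.x

# Usage with the mean example: one moment, g_T(mu) = mean(y) - mu.
y = np.random.default_rng(0).normal(2.0, 1.0, size=200)
g_T = lambda th: np.array([y.mean() - th[0]])
print(gmm_estimate(g_T, np.eye(1), np.zeros(1)))  # ~ sample average of y
```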


Distances and Weight Matrices

  • Consider a simple example with 2 moment conditions,

gT(θ) = (ga, gb)′,

where the dependence on T and θ is suppressed.

  • First consider a simple weight matrix, WT = I2:

QT(θ) = gT(θ)′ WT gT(θ) = ga² + gb²,

which is the square of the simple distance from gT(θ) to zero. Here the two coordinates are equally important.

  • Alternatively, look at a different weight matrix, WT = diag(2, 1):

QT(θ) = gT(θ)′ WT gT(θ) = 2·ga² + gb²,

which attaches more weight to the first coordinate in the distance.


Consistency: Why Does it Work?

  • Assume that a law of large numbers (LLN) applies to f(wt, zt, θ), i.e.

(1/T) Σ_{t=1}^T f(wt, zt, θ) → E[f(wt, zt, θ)]   for   T → ∞.

That requires IID or stationarity and weak dependence.

  • If the moment conditions are correct, g(θ0) = 0, then GMM is consistent,

θ̂GMM(WT) → θ0   as   T → ∞,

for any positive definite weight matrix WT.

  • Intuition: if a LLN applies, then gT(θ) converges to g(θ). Since θ̂GMM(WT) minimizes the distance from gT(θ) to zero, it will be a consistent estimator of θ0, the solution to g(θ) = 0.

  • The weight matrix, WT, has to be positive definite, so that we put a positive and non-zero weight on all moment conditions.


Asymptotic Distribution

  • Assume that a central limit theorem applies to f(wt, zt, θ), i.e.

√T · gT(θ0) = (1/√T) Σ_{t=1}^T f(wt, zt, θ0) → N(0, S),

where S is the asymptotic variance.

  • Then, for any positive definite weight matrix, W, the asymptotic distribution of the GMM estimator is given by

√T (θ̂GMM − θ0) → N(0, V).

The asymptotic variance is given by

V = (D′WD)⁻¹ D′WSWD (D′WD)⁻¹,

where

D = E[∂f(wt, zt, θ)/∂θ′]

is the expected value of the R × K matrix of first derivatives of the moments.
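A small helper for the sandwich formula above (a sketch; D, W and S are assumed to be already-estimated arrays):

```python
import numpy as np

def gmm_asymptotic_variance(D, W, S):
    """V = (D'WD)^{-1} D'W S W D (D'WD)^{-1} for an (R, K) Jacobian D,
    an (R, R) weight matrix W, and the (R, R) moment variance S."""
    bread = np.linalg.inv(D.T @ W @ D)   # (K, K) outer factor
    meat = D.T @ W @ S @ W @ D           # (K, K) inner term
    return bread @ meat @ bread
```

With W = S⁻¹ this collapses to (D′S⁻¹D)⁻¹, the efficient-GMM variance discussed next.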


Efficient GMM Estimation

  • The variance of θ̂GMM depends on the weight matrix, WT. The efficient GMM estimator has the smallest possible (asymptotic) variance.

  • Intuition: a moment with small variance is informative and should have a large weight. It can be shown that the optimal weight matrix, WT^opt, has the property that

plim WT^opt = S⁻¹.

With the optimal weight matrix, W = S⁻¹, the asymptotic variance simplifies to

V = (D′S⁻¹D)⁻¹ D′S⁻¹SS⁻¹D (D′S⁻¹D)⁻¹ = (D′S⁻¹D)⁻¹.

  • The best moment conditions have small S and large D.

— A small S means that the sample variation of the moment (the noise) is small.
— A large D means that the moment condition is strongly violated if θ ≠ θ0, so the moment is very informative on the true value, θ0. This is related to the curvature of the criterion function, as in ML.

  • Hypothesis testing can be based on the asymptotic distribution:

θ̂GMM ∼ᵃ N(θ0, T⁻¹V̂).

  • An estimator of the asymptotic variance is given by

V̂ = (DT′ ST⁻¹ DT)⁻¹,

where the (R × K) matrix

DT = ∂gT(θ)/∂θ′ = (1/T) Σ_{t=1}^T ∂f(wt, zt, θ)/∂θ′

is the sample average of the first derivatives, and ST is an estimator of S = T · V[gT(θ)]. If the observations are independent, a consistent estimator is

ST = (1/T) Σ_{t=1}^T f(wt, zt, θ) f(wt, zt, θ)′.

Estimation of the weight matrix is typically the trickiest part of GMM.


Computational Issues

  • The estimator is defined by minimizing QT(θ). Minimization can be done by solving the K first order conditions

∂QT(θ)/∂θ = ∂(gT(θ)′ WT gT(θ))/∂θ = 0   (K × 1),

sometimes analytically but often by numerical optimization.

  • We need an optimal weight matrix, WT^opt, but that depends on the parameters! Two-step efficient GMM (a code sketch follows the list):

(1) Choose an initial weight matrix, e.g. W[1] = IR, and find a consistent but inefficient first-step GMM estimator

θ̂[1] = argmin_θ gT(θ)′ W[1] gT(θ).

(2) Find the optimal weight matrix, W[2]^opt, based on θ̂[1], and find the efficient estimator

θ̂[2] = argmin_θ gT(θ)′ W[2]^opt gT(θ).

The estimator is not unique as it depends on the initial weight matrix W[1].
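A compact sketch of the two-step procedure for a generic moment function f returning the T × R matrix of moment contributions (a hypothetical helper; step 2 assumes independent observations when estimating S):

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(f, theta_init):
    """Two-step efficient GMM for a moment function f: theta -> (T, R) array.

    Step 1: W = I_R gives a consistent first-step estimate.
    Step 2: W = S_T^{-1} with S_T = (1/T) sum_t f_t f_t' at the step-1 estimate.
    """
    def Q(theta, W):
        g = f(theta).mean(axis=0)            # g_T(theta), shape (R,)
        return g @ W @ g

    R = f(np.asarray(theta_init)).shape[1]
    theta1 = minimize(Q, theta_init, args=(np.eye(R),), method="Nelder-Mead").x

    F = f(theta1)                            # (T, R) moment contributions
    W_opt = np.linalg.inv(F.T @ F / F.shape[0])
    theta2 = minimize(Q, theta_init, args=(W_opt,), method="Nelder-Mead").x
    return theta2, W_opt
```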


Iterated GMM estimator:

  • From the estimator θ̂[2] it is natural to update the weights to W[3]^opt and update θ̂[3]. We can switch between estimating W[·]^opt and θ̂[·] until convergence. Iterated GMM does not depend on the initial weight matrix. The two approaches are asymptotically equivalent.

Continuously updated GMM estimator:

  • A third approach is to recognize from the outset that the weight matrix depends on the parameters, and minimize

QT(θ) = gT(θ)′ WT(θ) gT(θ).

This is typically not possible to solve analytically.


Test of Overidentifying Moment Conditions

  • Recall that K moment conditions are sufficient to estimate the K parameters in θ.

If R > K, we can test the validity of the R−K overidentifying moment conditions.

  • By MM estimation we can set K moment conditions exactly equal to zero. If all R conditions are valid, then the remaining R − K moments should also be close to zero.

  • From the CLT we have

gT(θ0) ∼ᵃ N(0, T⁻¹S).

If we use the optimal weights, WT^opt → S⁻¹, then

ξJ = T · gT(θ̂GMM)′ WT^opt gT(θ̂GMM) = T · QT(θ̂GMM) → χ²(R − K).

  • This is the J-test or Hansen test for overidentifying restrictions. In linear models it is often referred to as the Sargan test.

ξJ is not a test of the validity of the model or the underlying economic theory. ξJ considers whether the R − K overidentifying moments are in line with the K identifying moments.
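A sketch of the computation, assuming the (T × R) matrix of moment contributions evaluated at the efficient GMM estimate is available:

```python
import numpy as np
from scipy.stats import chi2

def j_test(F_hat, W_opt, K):
    """Hansen J-test of the R - K overidentifying conditions.

    F_hat: (T, R) moment contributions at the efficient GMM estimate;
    W_opt: optimal weight matrix (consistent for S^{-1}); K: #parameters.
    """
    T, R = F_hat.shape
    g = F_hat.mean(axis=0)
    xi_J = T * (g @ W_opt @ g)             # xi_J = T * Q_T(theta_hat)
    return xi_J, chi2.sf(xi_J, df=R - K)   # statistic and p-value
```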


Example: The C-CAPM Model

  • Consider the consumption based capital asset pricing (C-CAPM) model of Hansen and Singleton (1982).

  • A representative agent maximizes the discounted value of lifetime utility subject to a budget constraint:

max Σ_{s=1}^∞ E[δ^s · u(ct+s) | It]   subject to   At+1 = (1 + rt+1) At + yt+1 − ct+1,

where At is financial wealth, yt is income, 0 ≤ δ ≤ 1 is a discount factor, and It is the information set at time t.

  • The first order condition is given by the Euler equation:

u′(ct) = E[δ · u′(ct+1) · Rt+1 | It],

where u′(·) is the derivative, and Rt+1 = 1 + rt+1 is the return factor.


  • Now assume a constant relative risk aversion (CRRA) utility function:

u(ct) = ct^(1−γ) / (1 − γ),   γ < 1,

so that u′(ct) = ct^(−γ). That gives the explicit Euler equation:

ct^(−γ) − E[δ · ct+1^(−γ) · Rt+1 | It] = 0.

  • To ensure stationarity, we reformulate:

E[δ · (ct+1/ct)^(−γ) · Rt+1 − 1 | It] = 0,

which is a conditional moment condition.

  • That implies the unconditional moment conditions

E[f(ct+1, ct, Rt+1; zt; δ, γ)] = E[(δ · (ct+1/ct)^(−γ) · Rt+1 − 1) · zt] = 0,

for all variables zt ∈ It included in the information set.

  • To estimate the parameters, θ = (δ, γ)′, we need at least R = 2 instruments in zt. We try with R = 3 instruments:

zt = (1, ct/ct−1, Rt)′.

  • That produces the moment conditions

E[δ · (ct+1/ct)^(−γ) · Rt+1 − 1] = 0
E[(δ · (ct+1/ct)^(−γ) · Rt+1 − 1) · (ct/ct−1)] = 0
E[(δ · (ct+1/ct)^(−γ) · Rt+1 − 1) · Rt] = 0,

for t = 1, 2, ..., T.

  • The model is formally identified, but γ is poorly determined. Weak instruments, little variation in the data, or a wrong model!
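A sketch of the corresponding moment-contribution function in code (hypothetical array names c and R_gross for consumption levels and gross returns; the dating follows the conditions above):

```python
import numpy as np

def ccapm_moments(theta, c, R_gross):
    """Moment contributions for the C-CAPM conditions above.

    theta = (delta, gamma); c[t] is consumption at t; R_gross[t] is R_t.
    For each usable t: u = delta * (c_{t+1}/c_t)^(-gamma) * R_{t+1} - 1,
    interacted with z_t = (1, c_t/c_{t-1}, R_t)'.
    """
    delta, gamma = theta
    growth = c[1:] / c[:-1]                       # c_{t+1}/c_t
    u = delta * growth[1:] ** (-gamma) * R_gross[2:] - 1.0
    Z = np.column_stack([np.ones_like(u),         # constant instrument
                         growth[:-1],             # lagged consumption growth
                         R_gross[1:-1]])          # lagged return
    return u[:, None] * Z                         # (T - 2, 3) array of f_t
```

This can be plugged into the two-step sketch above, e.g. two_step_gmm(lambda th: ccapm_moments(th, c, R_gross), np.array([0.99, 1.0])).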


Results for US data, 1959:3 − 1978:12 (standard errors in parentheses):

Method    Weights  Lags  δ (s.e.)          γ (s.e.)           T    ξJ     DF  p-val
2-Step    HC       1     0.9987 (0.0086)   0.8770 (3.6792)    237  0.434  1   0.510
Iterated  HC       1     0.9982 (0.0044)   1.0249 (1.8614)    237  1.068  1   0.301
CU        HC       1     0.9981 (0.0044)   0.9549 (1.8629)    237  1.067  1   0.302
2-Step    HAC      1     0.9987 (0.0092)   0.8876 (4.0228)    237  0.429  1   0.513
Iterated  HAC      1     0.9980 (0.0045)   0.8472 (1.8757)    237  1.091  1   0.296
CU        HAC      1     0.9977 (0.0045)   0.7093 (1.8815)    237  1.086  1   0.297
2-Step    HC       2     0.9975 (0.0066)   0.0149 (2.6415)    236  1.597  3   0.660
Iterated  HC       2     0.9968 (0.0045)   −0.0210 (1.7925)   236  3.579  3   0.311
CU        HC       2     0.9958 (0.0046)   −0.5526 (1.8267)   236  3.501  3   0.321
2-Step    HAC      2     0.9970 (0.0068)   −0.1872 (2.7476)   236  1.672  3   0.643
Iterated  HAC      2     0.9965 (0.0047)   −0.2443 (1.8571)   236  3.685  3   0.298
CU        HAC      2     0.9952 (0.0048)   −0.9094 (1.9108)   236  3.591  3   0.309


Weight Matrix Estimation (Univariate Case)

  • The optimal weight matrix is ST⁻¹, where ST is a consistent estimator of

S = V[√T · gT(θ)] = V[(1/√T) Σ_{t=1}^T ft] = (1/T) · V[Σ_{t=1}^T ft].

  • If ft and fs are independent, then the variance of the sum is the sum of the variances:

S = (1/T) · V[Σ_{t=1}^T ft] = (1/T) Σ_{t=1}^T V[ft] = (1/T) Σ_{t=1}^T E[ft²].

A natural estimator is

ST = (1/T) Σ_{t=1}^T ft².

  • This is robust to heteroskedasticity by construction and is often referred to as the heteroskedasticity consistent (HC) covariance estimator.


  • If ft and fs are correlated, the variance includes the covariances:

S = (1/T) · V[Σ_{t=1}^T ft] = V(ft) + 2 · Cov(ft, ft−1) + 2 · Cov(ft, ft−2) + ... .

  • The heteroskedasticity and autocorrelation consistent (HAC) variance estimator is

ST = V̂(ft) + Σ_{j=1}^{T−1} 2 · Ĉov(ft, ft−j),

where

Ĉov(ft, ft−j) = (1/T) Σ_{t=j+1}^T ft ft−j.

  • Problems:

(1) We cannot estimate as many covariances as observations.
(2) The simple HAC estimator is not necessarily positive definite.

  • We use a weight wj on covariance j, and let wj go to zero as j increases. This class of so-called kernel estimators can be written as

ST = V̂(ft) + Σ_{j=1}^{T−1} wj · 2 · Ĉov(ft, ft−j),   where   wj = k(j/B).

Here k(·) is a kernel function and B is the bandwidth parameter.

  • Example: the Bartlett kernel (the Newey-West estimator).

[Figure: weights in the Bartlett kernel for B = 6; the weights decline linearly towards zero as the lag length j increases.]
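A sketch of the Bartlett-kernel (Newey-West) estimator of S; note that the code uses the common Newey-West weight wj = 1 − j/(B + 1), a slight variation on the generic k(j/B) above:

```python
import numpy as np

def newey_west_S(F, B):
    """HAC estimate of S with Bartlett weights w_j = 1 - j/(B+1) (Newey-West).

    F: (T, R) array of moment contributions f_t, evaluated at theta_hat
       (so that their sample mean is approximately zero); B: bandwidth.
    """
    T = F.shape[0]
    S = F.T @ F / T                          # j = 0 term: (1/T) sum f_t f_t'
    for j in range(1, B + 1):
        w = 1.0 - j / (B + 1.0)              # Bartlett kernel weight
        Gamma_j = F[j:].T @ F[:-j] / T       # (1/T) sum_{t=j+1}^T f_t f_{t-j}'
        S += w * (Gamma_j + Gamma_j.T)       # covariance plus its transpose
    return S
```

This weighting keeps the estimate positive semi-definite, which the unweighted HAC sum does not guarantee.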


Example: 2SLS

  • Consider again a regression model

yt = xt′β0 + εt = x1t′γ0 + x2t′δ0 + εt,

where E[x1t · εt] = 0 and E[x2t · εt] ≠ 0. Assume that you have R > K valid instruments in zt, so that

g(β0) = E[zt · εt] = E[zt (yt − xt′β0)] = 0.

  • The corresponding (R × 1) sample moments are given by

gT(β) = (1/T) Σ_{t=1}^T zt (yt − xt′β) = (1/T) Z′(Y − Xβ),

where Y (T × 1), X (T × K), and Z (T × R) are the stacked data matrices.

  • In this case we cannot solve gT(β) = 0 directly; Z′X is R × K and not invertible.

  • Instead, we derive the GMM estimator by minimizing the criterion function

QT(β) = gT(β)′ WT gT(β)
      = (T⁻¹ Z′(Y − Xβ))′ WT (T⁻¹ Z′(Y − Xβ))
      = T⁻² (Y′Z WT Z′Y − 2β′X′Z WT Z′Y + β′X′Z WT Z′Xβ).

  • We take the first derivative, and the GMM estimator is the solution to

∂QT(β)/∂β = −2T⁻² X′Z WT Z′Y + 2T⁻² X′Z WT Z′Xβ = 0.

We find β̂GMM(WT) = (X′Z WT Z′X)⁻¹ X′Z WT Z′Y, depending on WT.

  • To estimate the optimal weight matrix, WT^opt = ST⁻¹, we use the estimator

ST = (1/T) Σ_{t=1}^T f(wt, zt, θ) f(wt, zt, θ)′ = (1/T) Σ_{t=1}^T ε̂t² zt zt′,

which allows for general heteroskedasticity of the disturbance term.


  • For the asymptotic distribution, we recall that

β̂GMM ∼ᵃ N(β0, T⁻¹ (D′S⁻¹D)⁻¹).

The derivative is given by the (R × K) matrix

DT = ∂gT(β)/∂β′ = ∂[(1/T) Σ_{t=1}^T zt (yt − xt′β)]/∂β′ = −(1/T) Σ_{t=1}^T zt xt′,

so the variance of the estimator becomes

V[β̂GMM] = T⁻¹ (DT′ WT^opt DT)⁻¹
         = T⁻¹ [(−(1/T) Σ_{t=1}^T xt zt′) ((1/T) Σ_{t=1}^T ε̂t² zt zt′)⁻¹ (−(1/T) Σ_{t=1}^T zt xt′)]⁻¹
         = (Σ_{t=1}^T xt zt′)⁻¹ (Σ_{t=1}^T ε̂t² zt zt′) (Σ_{t=1}^T zt xt′)⁻¹.

  • Note that this is the heteroskedasticity consistent (HC) variance estimator of White. GMM with allowance for heteroskedastic errors automatically produces heteroskedasticity consistent standard errors!

  • If we assume that the error terms are IID, the optimal weight matrix simplifies to

ST = (σ̂²/T) Σ_{t=1}^T zt zt′ = T⁻¹ σ̂² Z′Z,

where σ̂² is a consistent estimator of σ².

  • In this case the efficient GMM estimator becomes

β̂GMM = (X′Z ST⁻¹ Z′X)⁻¹ X′Z ST⁻¹ Z′Y
      = (X′Z (T⁻¹σ̂² Z′Z)⁻¹ Z′X)⁻¹ X′Z (T⁻¹σ̂² Z′Z)⁻¹ Z′Y
      = (X′Z (Z′Z)⁻¹ Z′X)⁻¹ X′Z (Z′Z)⁻¹ Z′Y,

which is identical to the two stage least squares (2SLS) estimator.
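A simulated check (hypothetical data, not from the slides) that GMM with WT ∝ (Z′Z)⁻¹ reproduces the 2SLS projection formula:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # R = 3 instruments
e = rng.normal(size=T)
x2 = Z[:, 1] + 0.5 * Z[:, 2] + 0.5 * e + rng.normal(size=T)  # endogenous regressor
X = np.column_stack([np.ones(T), x2])                        # K = 2 < R
y = X @ np.array([1.0, 2.0]) + e

# Efficient GMM under IID errors: W proportional to (Z'Z)^{-1}.
W = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ W
beta_gmm = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

# Classical 2SLS: regress y on the fitted values from regressing X on Z.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)                # first-stage fit
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
assert np.allclose(beta_gmm, beta_2sls)
```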

  • The variance of the estimator is

V[β̂GMM] = T⁻¹ (DT′ ST⁻¹ DT)⁻¹ = σ̂² (X′Z (Z′Z)⁻¹ Z′X)⁻¹,

which again coincides with the 2SLS variance.


Pseudo-ML (PML) Estimation

  • The first order conditions for ML estimation can be seen as a sample counterpart to a moment condition:

(1/T) s(θ) = (1/T) Σ_{t=1}^T st(θ) = 0   corresponds to   E[st(θ)] = 0,

and ML becomes a special case of GMM.

  • θ̂ML is consistent under weaker assumptions than those maintained by ML. The FOC for a normal regression model corresponds to

E[xt (yt − xt′β)] = 0,

which is weaker than the assumption that the entire distribution is correctly specified. OLS is consistent even if εt is not normal.

  • An ML estimation that maximizes a likelihood function different from the true model's likelihood is referred to as a pseudo-ML or a quasi-ML estimator. Note that the variance matrix is then no longer the inverse information.


(My Unfair) Comparison of ML and GMM

                     Maximum Likelihood                      Generalized Method of Moments
Assumptions:         Full specification.                     Partial specification/weak assumptions.
                     Know density(θ0) apart from θ0.         Moment conditions: E[f(data; θ0)] = 0.
                                                             Strong economic assumptions.
Efficiency:          Cramér-Rao lower bound                  Efficient based on moment conditions.
                     (smallest possible variance).           Larger than Cramér-Rao.
Typical approach:    Statistical description of the data.    Estimate deep parameters of economic model.
                     Misspecification testing.               Restrictions recover economics.
Robustness:          First order conditions should hold!     Moment conditions should hold!
                     PML is a GMM interpretation of ML.      Weights and variances can be made robust.
                     Use larger PML variance.
