SLIDE 1

ST 762 Nonlinear Statistical Models for Univariate and Multivariate Response

The “Folklore” Theorem

Return to the general mean-variance specification
$$E(Y \mid x) = f(x, \beta), \qquad \operatorname{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2.$$

- Estimation of $\beta$ via solution of linear estimating equations.
- Asymptotic properties of $\hat\beta$ under fixed weights vs. estimated weights.
- Throughout, we assume the model for the mean function $f$ is correct. We then consider two scenarios for the variance function $g(\cdot)$:
  - the working model for $g(\cdot)$ is correctly specified;
  - the working model for $g(\cdot)$ is incorrectly specified.

1 / 32 The “Folklore” Theorem

SLIDE 2

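Iterative GLS scheme

- Preliminary estimators $\hat\beta^*$ and $\hat\theta$.
- Update $\hat\sigma$ and $\hat\theta$ by solving
$$\sum_{j=1}^n \left[\frac{\{Y_j - f(x_j, \hat\beta^*)\}^2}{\hat\sigma^2 g(\hat\beta^*, \hat\theta, x_j)^2} - 1\right] \tau_\theta(\hat\beta^*, \hat\theta, x_j) = 0,$$
where
$$\tau_\theta(\beta, \theta, x) = \begin{pmatrix} 1 \\ \nu_\theta(\beta, \theta, x) \end{pmatrix}, \qquad \nu_\theta(\beta, \theta, x) = \frac{\partial \log g(\beta, \theta, x)}{\partial \theta}.$$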

SLIDE 3

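Then update $\hat\beta$ by solving
$$\sum_{j=1}^n \frac{Y_j - f(x_j, \hat\beta)}{g(\hat\beta^*, \hat\theta, x_j)^2}\, f_\beta(x_j, \hat\beta) = 0.$$

Note that the $\beta$ argument of $g(\cdot)$ is held at $\hat\beta^*$ in this update.

Iterate the last two steps $C$ times, replacing $\hat\beta^*$ by the current $\hat\beta$ at each pass, possibly to convergence ($C = \infty$).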
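
The two update steps can be sketched in Python with NumPy. This is a minimal illustration under assumptions that are not part of the slides: a hypothetical mean model $f(x,\beta) = \beta_1 e^{\beta_2 x}$ (coded as `beta[0]`, `beta[1]`), the power-of-the-mean variance function $g(\beta,\theta,x) = f(x,\beta)^\theta$ (so that $\nu_\theta = \log f$), Gauss-Newton iterations for the $\beta$-step and bisection for the $(\sigma,\theta)$-step. All function names are illustrative.

```python
import numpy as np


def mean_fn(x, beta):
    """Hypothetical mean model f(x, beta) = beta[0] * exp(beta[1] * x)."""
    return beta[0] * np.exp(beta[1] * x)


def mean_grad(x, beta):
    """n x p matrix whose j-th row is f_beta(x_j, beta)^T."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])


def update_beta(x, y, beta, w, max_iter=100, tol=1e-10):
    """Gauss-Newton solution of sum_j w_j {y_j - f(x_j, b)} f_beta(x_j, b) = 0,
    with the weights w_j = 1 / g(beta*, theta, x_j)^2 held fixed."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(max_iter):
        X = mean_grad(x, beta)
        r = y - mean_fn(x, beta)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * r))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def update_sigma_theta(x, y, beta_star, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve the (sigma, theta) equations for the assumed g = f^theta.

    The first component gives sigma^2(theta) = mean(r_j^2 / f_j^(2 theta)).
    Substituting it into the second component (nu_theta = log f) leaves
    psi(theta) below, which is decreasing in theta, so bisection finds its root.
    """
    logf = np.log(mean_fn(x, beta_star))
    logr2 = np.log((y - mean_fn(x, beta_star)) ** 2)

    def psi(theta):
        loga = logr2 - 2.0 * theta * logf
        a = np.exp(loga - loga.max())  # rescaled weights, avoids overflow
        return np.sum(a * logf) / np.sum(a) - np.mean(logf)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    sigma2 = np.mean(np.exp(logr2 - 2.0 * theta * logf))
    return np.sqrt(sigma2), theta


def iterative_gls(x, y, beta_start, C=5):
    """C passes of the two update steps, starting from an unweighted fit."""
    beta_star = update_beta(x, y, beta_start, np.ones_like(y))
    for _ in range(C):
        sigma, theta = update_sigma_theta(x, y, beta_star)
        w = mean_fn(x, beta_star) ** (-2.0 * theta)  # 1 / g(beta*, theta, x_j)^2
        beta_star = update_beta(x, y, beta_star, w)
    return beta_star, sigma, theta
```

Each pass recomputes the weights from the current $\hat\beta^*$ and then holds them fixed inside the Gauss-Newton loop, mirroring the way the $\beta$ argument of $g(\cdot)$ is held at $\hat\beta^*$. The bisection relies on the profiled equation being monotone in $\theta$, which holds for this particular $g$; other variance functions would need a general root finder.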

SLIDE 4

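The theorem

Suppose that
$$\sqrt{n}\,(\hat\beta^* - \beta_0) = O_p(1) \quad\text{and}\quad \sqrt{n}\,(\hat\theta - \theta_0) = O_p(1)$$
(i.e., $\hat\beta^*$ and $\hat\theta$ are $\sqrt{n}$-consistent). Then, under suitable regularity conditions,
$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\,\Sigma_{WLS}\right).$$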

SLIDE 5

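Here
$$\Sigma_{WLS} = \left(\lim_{n\to\infty} n^{-1} X^T W X\right)^{-1},$$
where
$$X = X(\beta_0) = \begin{pmatrix} f_\beta(x_1, \beta_0)^T \\ f_\beta(x_2, \beta_0)^T \\ \vdots \\ f_\beta(x_n, \beta_0)^T \end{pmatrix}, \qquad W = \operatorname{diag}(w_1, w_2, \dots, w_n), \qquad w_j = \frac{1}{g(\beta_0, \theta_0, x_j)^2}.$$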
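
To make the limit in $\Sigma_{WLS}$ concrete, the sketch below computes the finite-$n$ matrix $(n^{-1}X^TWX)^{-1}$ for a hypothetical design: $x_j$ equally spaced on $[0, 4]$, $f(x,\beta) = \beta_1 e^{\beta_2 x}$ and $g = f^\theta$, all evaluated at assumed true values. These choices are for illustration only.

```python
import numpy as np


def sigma_wls_approx(x, beta, theta):
    """Finite-n approximation (n^{-1} X^T W X)^{-1} to Sigma_WLS.

    Assumed model: f(x, beta) = beta[0] * exp(beta[1] * x) and
    g(beta, theta, x) = f(x, beta)^theta, evaluated at the true parameters.
    """
    e = np.exp(beta[1] * x)
    f = beta[0] * e
    X = np.column_stack([e, beta[0] * x * e])  # rows f_beta(x_j, beta0)^T
    w = f ** (-2.0 * theta)                    # w_j = 1 / g(beta0, theta0, x_j)^2
    M = X.T @ (w[:, None] * X) / x.size        # n^{-1} X^T W X
    return np.linalg.inv(M)
```

For $\theta_0 = 1$ the products $w_j f_\beta f_\beta^T$ simplify to $\begin{pmatrix} 1/\beta_{1}^2 & x_j/\beta_{1} \\ x_j/\beta_{1} & x_j^2 \end{pmatrix}$, so with $\beta_1 = 10$ and this design $\Sigma_{WLS} = \begin{pmatrix} 400 & -15 \\ -15 & 0.75 \end{pmatrix}$ whatever the value of $\beta_2$; calling `sigma_wls_approx(np.linspace(0, 4, 100_001), np.array([10.0, -0.5]), 1.0)` should reproduce it to about three significant figures.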

SLIDE 6

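Derivation

A first-order Taylor expansion of the estimating equation for $\hat\beta$ in $(\hat\beta, \hat\beta^*, \hat\theta)$ about $(\beta_0, \beta_0, \theta_0)$ gives

$$
\begin{aligned}
0 &= n^{-1/2}\sum_{j=1}^n g(\hat\beta^*, \hat\theta, x_j)^{-2}\{Y_j - f(x_j, \hat\beta)\}\,f_\beta(x_j, \hat\beta) \\
&\approx n^{-1/2}\sum_{j=1}^n g(\beta_0, \theta_0, x_j)^{-2}\{Y_j - f(x_j, \beta_0)\}\,f_\beta(x_j, \beta_0) \\
&\quad + \Big[n^{-1}\sum_{j=1}^n g(\beta_0, \theta_0, x_j)^{-2}\{Y_j - f(x_j, \beta_0)\}\,f_{\beta\beta}(x_j, \beta_0) - n^{-1}\sum_{j=1}^n g(\beta_0, \theta_0, x_j)^{-2} f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T\Big]\, n^{1/2}(\hat\beta - \beta_0) \\
&\quad + \Big[-2n^{-1}\sum_{j=1}^n g(\beta_0, \theta_0, x_j)^{-3}\{Y_j - f(x_j, \beta_0)\}\,f_\beta(x_j, \beta_0)\, g_\beta(\beta_0, \theta_0, x_j)^T\Big]\, n^{1/2}(\hat\beta^* - \beta_0) \\
&\quad + \Big[-2n^{-1}\sum_{j=1}^n g(\beta_0, \theta_0, x_j)^{-3}\{Y_j - f(x_j, \beta_0)\}\,f_\beta(x_j, \beta_0)\, g_\theta(\beta_0, \theta_0, x_j)^T\Big]\, n^{1/2}(\hat\theta - \theta_0).
\end{aligned}
$$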

SLIDE 7

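Rewrite as
$$0 \approx C_n + (A_{n1} + A_{n2})\, n^{1/2}(\hat\beta - \beta_0) + D_n\, n^{1/2}(\hat\beta^* - \beta_0) + E_n\, n^{1/2}(\hat\theta - \theta_0),$$
where
$$
\begin{aligned}
C_n &= \sigma_0 n^{-1/2}\sum_{j=1}^n w_j^{1/2} f_\beta(x_j, \beta_0)\,\epsilon_j, \\
A_{n1} &= \sigma_0 n^{-1}\sum_{j=1}^n w_j^{1/2} f_{\beta\beta}(x_j, \beta_0)\,\epsilon_j, \\
A_{n2} &= -n^{-1}\sum_{j=1}^n w_j f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T, \\
D_n &= -2\sigma_0 n^{-1}\sum_{j=1}^n w_j^{1/2} f_\beta(x_j, \beta_0)\,\nu_\beta(\beta_0, \theta_0, x_j)^T\,\epsilon_j, \\
E_n &= -2\sigma_0 n^{-1}\sum_{j=1}^n w_j^{1/2} f_\beta(x_j, \beta_0)\,\nu_\theta(\beta_0, \theta_0, x_j)^T\,\epsilon_j,
\end{aligned}
$$
with $\nu_\beta = \partial \log g/\partial\beta$ and $\nu_\theta = \partial \log g/\partial\theta$. Since $Y_j - f(x_j, \beta_0) = \sigma_0 g(\beta_0, \theta_0, x_j)\epsilon_j$, the factor $g^{-3}\{Y_j - f(x_j, \beta_0)\}\,g_\beta$ in the expansion equals $\sigma_0 w_j^{1/2}\nu_\beta\,\epsilon_j$, and similarly for $g_\theta$.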

SLIDE 8

Also
$$\epsilon_j = \frac{Y_j - f(x_j, \beta_0)}{\sigma_0\, g(\beta_0, \theta_0, x_j)},$$
so that $E(\epsilon_j \mid x_j) = 0$ and $\operatorname{var}(\epsilon_j \mid x_j) = 1$.

SLIDE 9

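We have that
$$A_{n1} \xrightarrow{p} 0, \qquad A_{n2} \xrightarrow{p} -\Sigma_{WLS}^{-1}, \qquad C_n \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\,\Sigma_{WLS}^{-1}\right), \qquad D_n \xrightarrow{p} 0, \qquad E_n \xrightarrow{p} 0,$$
so
$$A_n = A_{n1} + A_{n2} \xrightarrow{p} -\Sigma_{WLS}^{-1}.$$

The limits for $A_{n1}$, $D_n$ and $E_n$ follow from the weak law of large numbers, because each summand is a function of $x_j$ times $\epsilon_j$ and so has conditional mean zero; the limit for $A_{n2}$ is the definition of $\Sigma_{WLS}$; and the limit for $C_n$ follows from the central limit theorem, since $\operatorname{var}(C_n) = \sigma_0^2\, n^{-1}X^TWX$.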

SLIDE 10

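Because $D_n$ and $E_n$ are $o_p(1)$ while $n^{1/2}(\hat\beta^* - \beta_0)$ and $n^{1/2}(\hat\theta - \theta_0)$ are $O_p(1)$, the last two terms of the expansion are $o_p(1)$. Then, by Slutsky's theorem,
$$\sqrt{n}\,(\hat\beta - \beta_0) \approx -A_n^{-1} C_n \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\,\Sigma_{WLS}\right),$$
as claimed, because $\Sigma_{WLS}\,(\sigma_0^2\,\Sigma_{WLS}^{-1})\,\Sigma_{WLS} = \sigma_0^2\,\Sigma_{WLS}$.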

SLIDE 11

Remarks

- This is the same large-sample distribution as for the WLS estimator with known weights. That is, to this order of approximation, using estimated weights gives the same sampling distribution as if the weights were known.
- The terms $D_n$ and $E_n$, corresponding to the “effects” of $\hat\beta^*$ and $\hat\theta$, respectively, are $o_p(1)$. Hence the estimators we substitute for $\beta$ and $\theta$ in the weights play no role in determining the large-sample properties of the resulting estimator $\hat\beta$.
- In particular, because $E_n = o_p(1)$, the effect of $\hat\theta$ is negligible: how one estimates $\theta$ does not matter for the properties of the resulting GLS estimator, provided $\hat\theta$ is $\sqrt{n}$-consistent.

SLIDE 12

The main “folklore” message

- The large-sample precision of $\hat\beta$ is unaffected not only by the need to estimate the parameters in the weights, but also by how these parameters are estimated, as long as they are estimated “sensibly” (i.e., $\sqrt{n}$-consistently).
- The result holds for any $C$: $C = 1$ gives a one-step estimator, and $C = \infty$ a converged iterated estimator.
- Here we have assumed that the variance function $g(\cdot)$ is correctly specified.
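
The folklore message can be checked with a small Monte Carlo experiment. The sketch assumes a linear mean $f(x,\beta) = \beta_1 + \beta_2 x$ (a special case of the general $f$), a variance function $g(\theta, x) = e^{\theta x}$ and a design on $[0, 2]$. $\theta$ is estimated by regressing $\log r_j^2$ on $x_j$ using OLS residuals, a simple “sensible” estimator swapped in for the pseudo-likelihood equation; all settings and names are hypothetical. It compares the Monte Carlo variance of the slope estimator for OLS, WLS with the true weights, and one-step ($C = 1$) GLS with estimated weights.

```python
import numpy as np


def wls(X, y, w):
    """Closed-form WLS for a linear mean: solves X^T W (y - X b) = 0."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


def simulate(n=200, theta=1.0, sigma=0.5, reps=500, seed=1):
    """Monte Carlo variances of the slope estimator under three weighting schemes."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0, n)
    X = np.column_stack([np.ones(n), x])
    beta0 = np.array([1.0, 2.0])
    w_true = np.exp(-2.0 * theta * x)          # 1 / g^2 with g = exp(theta x)
    slopes = {"ols": [], "known": [], "estimated": []}
    for _ in range(reps):
        y = X @ beta0 + sigma * np.exp(theta * x) * rng.standard_normal(n)
        b_ols = wls(X, y, np.ones(n))
        # log r^2 = const + 2 theta x + noise, so the fitted slope estimates 2 theta
        r2 = (y - X @ b_ols) ** 2
        two_theta_hat = np.polyfit(x, np.log(r2), 1)[0]
        w_hat = np.exp(-two_theta_hat * x)
        slopes["ols"].append(b_ols[1])
        slopes["known"].append(wls(X, y, w_true)[1])
        slopes["estimated"].append(wls(X, y, w_hat)[1])
    return {k: float(np.var(v)) for k, v in slopes.items()}
```

With these settings one would expect the estimated-weights variance to be close to the known-weights variance, and both to be well below the OLS variance; the exact numbers depend on the seed and on $n$.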

SLIDE 13

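The folklore result implies that
$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma^2\left\{X(\beta_0)^T W(\beta_0, \theta_0)\, X(\beta_0)\right\}^{-1}\right).$$
We use this by plugging in $\hat\beta$, $\hat\theta$, and
$$\hat\sigma^2 = \frac{1}{n - p}\sum_{j=1}^n \frac{\{Y_j - f(x_j, \hat\beta)\}^2}{g(\hat\beta, \hat\theta, x_j)^2}.$$
This is the approximation used by SAS’s proc nlin and the R function nls.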
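
The plug-in recipe can be written as a short NumPy function. It is a sketch, not the code used by proc nlin or nls; it assumes the caller has already evaluated the gradient matrix $X(\hat\beta)$, the residuals and $g(\hat\beta, \hat\theta, x_j)$. Function and argument names are hypothetical.

```python
import numpy as np


def model_based_cov(X, resid, g):
    """Plug-in covariance sigma_hat^2 (X^T W X)^{-1} with W = diag(1 / g_j^2).

    X     : n x p matrix with rows f_beta(x_j, beta_hat)^T
    resid : Y_j - f(x_j, beta_hat)
    g     : g(beta_hat, theta_hat, x_j)
    """
    n, p = X.shape
    w = 1.0 / g**2
    sigma2_hat = np.sum(w * resid**2) / (n - p)   # divisor n - p, as above
    cov = sigma2_hat * np.linalg.inv(X.T @ (w[:, None] * X))
    return cov, sigma2_hat
```

Rescaling `g` by a constant leaves `cov` unchanged, because $\hat\sigma^2$ absorbs the scale; only the relative weights matter.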

SLIDE 14

Working Variances

Recall the general mean-variance specification
$$E(Y \mid x) = f(x, \beta), \qquad \operatorname{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2.$$
Suppose we use the GLS scheme with the correct mean specification, but with a working variance specification
$$\operatorname{var}(Y \mid x) = \tau^2 h(\beta, \gamma, x)^2.$$
What do $\hat\tau^2$ and $\hat\gamma$ estimate, and what is the consequence for $\hat\beta$?

SLIDE 15

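We solve
$$\sum_{j=1}^n \frac{\{Y_j - f(x_j, \hat\beta^*)\}^2 - \hat\tau^2 h(\hat\beta^*, \hat\gamma, x_j)^2}{h(\hat\beta^*, \hat\gamma, x_j)^2}\, \xi_\gamma(\hat\beta^*, \hat\gamma, x_j) = 0,$$
where
$$\xi_\gamma(\beta, \gamma, x) = \begin{pmatrix} 1 \\ \partial \log h(\beta, \gamma, x)/\partial \gamma \end{pmatrix}.$$

Suppose that there exist $\gamma_n$ and $\tau_n^2$ such that
$$E\left[\sum_{j=1}^n \frac{\{Y_j - f(x_j, \beta)\}^2 - \tau_n^2 h(\beta, \gamma_n, x_j)^2}{h(\beta, \gamma_n, x_j)^2}\, \xi_\gamma(\beta, \gamma_n, x_j)\right] = 0.$$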

SLIDE 16

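Then we can view $\hat\gamma$ and $\hat\tau^2$ as estimators of $\gamma_n$ and $\tau_n^2$.

Also, if $\gamma_n \to \gamma^*$ and $\tau_n^2 \to \tau^{*2}$, then (consistency)
$$\hat\gamma \xrightarrow{p} \gamma^* \quad\text{and}\quad \hat\tau^2 \xrightarrow{p} \tau^{*2}.$$
More strongly ($\sqrt{n}$-consistency):
$$\sqrt{n}\,(\hat\gamma - \gamma^*) = O_p(1) \quad\text{and}\quad \sqrt{n}\,(\hat\tau^2 - \tau^{*2}) = O_p(1).$$

Note: In general, if we fit a parametric model $f(x, \theta)$ to data from another density $f_0(x)$, the MLE estimates the parameter value $\theta^*$ that minimizes the Kullback-Leibler divergence from $f_0(\cdot)$ to $f(\cdot, \theta)$.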
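
As a concrete case, take the hypothetical working model $h \equiv 1$ (no $\gamma$). Then $\xi_\gamma \equiv 1$, the estimating equation gives $\hat\tau^2 = n^{-1}\sum_j r_j^2$, and the defining equation gives $\tau_n^2 = n^{-1}\sum_j \sigma^2 g(\beta, \theta, x_j)^2$, the average true variance. The simulation sketch below assumes a linear mean, $x_j$ uniform on $[0, 2]$ and true standard deviation $\sigma g(x) = 0.5e^{x}$, and compares the two.

```python
import numpy as np


def tau2_hat_constant_working(resid):
    """Root of sum_j {r_j^2 - tau^2} = 0, i.e. the estimating equation with h = 1."""
    return float(np.mean(resid ** 2))


rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * np.exp(x)                      # assumed true sigma * g(x_j)
y = X @ np.array([1.0, 2.0]) + sd * rng.standard_normal(n)

beta_star = np.linalg.lstsq(X, y, rcond=None)[0]  # unweighted preliminary fit
tau2_hat = tau2_hat_constant_working(y - X @ beta_star)
tau2_n = float(np.mean(sd ** 2))          # n^{-1} sum_j sigma^2 g_j^2
```

Here $\tau_n^2 \to \tau^{*2} = E(0.25\,e^{2x}) = (e^4 - 1)/16 \approx 3.35$, so $\hat\tau^2$ estimates an average of the true variances rather than any single one.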

SLIDE 17

Asymptotic distribution of $\hat\beta$

Using methods similar to those used earlier, we can show that
$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma^2 (X^TUX)^{-1} X^TUW^{-1}UX\, (X^TUX)^{-1}\right).$$

SLIDE 18

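Here, as before,
$$X = X(\beta_0) = \begin{pmatrix} f_\beta(x_1, \beta_0)^T \\ f_\beta(x_2, \beta_0)^T \\ \vdots \\ f_\beta(x_n, \beta_0)^T \end{pmatrix}, \qquad W = \operatorname{diag}(w_1, w_2, \dots, w_n), \qquad w_j = \frac{1}{g(\beta_0, \theta_0, x_j)^2},$$
and
$$U = \operatorname{diag}(u_1, u_2, \dots, u_n), \qquad u_j = \frac{1}{h(\beta_0, \gamma^*, x_j)^2}.$$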

SLIDE 19

Notes

- This is the same asymptotic distribution as in the fixed working weights case.
- The original “folklore” theorem is the special case in which the working variance function $h(\cdot)$ is the same as the true variance function $g(\cdot)$.
- The efficiency discussion carries over immediately from the fixed weights case.

SLIDE 20

Corrected Standard Errors

If we know, or suspect, that the working variance specification is not the truth, we need to estimate the asymptotic variance matrix
$$\sigma^2 (X^TUX)^{-1} X^TUW^{-1}UX\, (X^TUX)^{-1}$$
rather than $\sigma^2 (X^TUX)^{-1}$. $X$ and $U$ can be estimated by plugging in the sample estimates $\hat\beta$ and $\hat\gamma$. But what about $W$?

SLIDE 21

Note that
$$\frac{\sigma_0^2}{w_j} = E\left[\{Y_j - f(x_j, \beta_0)\}^2 \mid x_j\right],$$
so we can estimate $\sigma_0^2 W^{-1}$ by
$$R = \operatorname{diag}\left(r_1^2, r_2^2, \dots, r_n^2\right),$$
where $r_j$ is the unweighted residual $r_j = Y_j - f(x_j, \hat\beta)$.

SLIDE 22

$X^TURUX$ is a good large-sample estimator of $\sigma_0^2 X^TUW^{-1}UX$, as
$$X^TURUX = \sigma_0^2 X^TUW^{-1}UX + o_p(n).$$
We are led to the sandwich variance estimator
$$(X^TUX)^{-1} X^TURUX\, (X^TUX)^{-1}.$$
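
The sandwich estimator translates directly into NumPy. This is a minimal sketch; it assumes $X$ and the working weights $u_j$ have already been evaluated at $\hat\beta$ and $\hat\gamma$, and the names are illustrative. Only the diagonal products $u_j^2 r_j^2$ are needed, so $R$ and $U$ are never formed as $n \times n$ matrices.

```python
import numpy as np


def sandwich_cov(X, resid, u):
    """Sandwich estimator (X^T U X)^{-1} X^T U R U X (X^T U X)^{-1}.

    X     : n x p matrix with rows f_beta(x_j, beta_hat)^T
    resid : unweighted residuals r_j = Y_j - f(x_j, beta_hat)
    u     : working weights u_j = 1 / h(beta_hat, gamma_hat, x_j)^2
    """
    bread = np.linalg.inv(X.T @ (u[:, None] * X))
    meat = X.T @ ((u**2 * resid**2)[:, None] * X)  # X^T U R U X
    return bread @ meat @ bread
```

With $u_j \equiv 1$ (an OLS fit under a constant working variance) this reduces to the familiar heteroscedasticity-robust covariance for least squares.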

SLIDE 23

Wald Inference

The asymptotic distribution $\hat\beta \overset{\cdot}{\sim} N(\beta_0, \Sigma)$ may be used to construct:

- confidence intervals for individual parameters or linear combinations of parameters;
- hypothesis tests about individual parameters or groups of parameters.

SLIDE 24

Inference is based on the asymptotic normal distribution of an individual parameter estimator or, to test a hypothesis such as $L\beta = L\beta_0$, on the corresponding asymptotic $\chi^2$ distribution (with degrees of freedom equal to the rank of $L$) of a quadratic form such as
$$(\hat\beta - \beta_0)^T L^T \left(L\hat\Sigma L^T\right)^{-1} L\, (\hat\beta - \beta_0).$$

To allow for estimation of $\Sigma$, the normal distribution is often replaced by the $t$-distribution, and the $\chi^2$ distribution by the (scaled) $F$-distribution. This replacement also gives the usual exact statistics in the (very!) special case of a normal linear model with known weights, i.e., variances known up to the common scale factor $\sigma^2$.
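
A sketch of the Wald quadratic form, assuming an estimate $\hat\Sigma$ (model-based or sandwich) is already available; the function name is hypothetical. It returns the statistic to be compared with a $\chi^2_q$ distribution, $q = \operatorname{rank}(L)$, together with the statistic divided by $q$, which is the version usually referred to an $F$ distribution.

```python
import numpy as np


def wald_statistic(beta_hat, beta_null, Sigma_hat, L):
    """Quadratic form (b - b0)^T L^T (L Sigma L^T)^{-1} L (b - b0) and its F-scaled version."""
    d = L @ (beta_hat - beta_null)
    stat = float(d @ np.linalg.solve(L @ Sigma_hat @ L.T, d))
    q = int(np.linalg.matrix_rank(L))
    return stat, stat / q
```

When $L$ is a single unit row vector picking out $\beta_k$, the statistic is the square of the usual ratio $(\hat\beta_k - \beta_{0k})/\widehat{\mathrm{se}}(\hat\beta_k)$.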

SLIDE 25

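If the inference is about a nonlinear function of $\beta$, say $a(\beta)$, then, since
$$a(\hat\beta) \approx a(\beta_0) + a_\beta(\beta_0)^T (\hat\beta - \beta_0),$$
we have, by the delta method,
$$a(\hat\beta) \overset{\cdot}{\sim} N\left(a(\beta_0),\ a_\beta(\hat\beta)^T\, \Sigma\, a_\beta(\hat\beta)\right).$$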
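
The delta-method variance can be sketched with a finite-difference gradient. The central-difference approximation of $a_\beta$ and the step size `eps` are assumptions of this sketch; an analytic gradient is preferable when available. The function name is hypothetical.

```python
import numpy as np


def delta_method_var(a, beta_hat, Sigma_hat, eps=1e-6):
    """Approximate var{a(beta_hat)} by a_beta^T Sigma a_beta, with the gradient
    a_beta(beta_hat) computed by central finite differences."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    grad = np.empty_like(beta_hat)
    for k in range(beta_hat.size):
        step = np.zeros_like(beta_hat)
        step[k] = eps
        grad[k] = (a(beta_hat + step) - a(beta_hat - step)) / (2.0 * eps)
    return float(grad @ Sigma_hat @ grad)
```

For instance, with the hypothetical model $f(x, \beta) = \beta_1 e^{\beta_2 x}$ ($\beta_2 < 0$), the half-life $a(\beta) = -\log 2/\beta_2$ could be passed as `lambda b: -np.log(2.0) / b[1]`.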

SLIDE 26

Advantages of Wald inference

- $\Sigma$ may be estimated either by assuming that the working variances are correct or by using the sandwich estimator.
- Extension of familiar methods.

Disadvantages

- The large-sample distribution may give a poor approximation in small samples.
- Not invariant to reparametrization of the mean model (different parametrizations may lead to different conclusions).

SLIDE 27

Likelihood Inference

Likelihood methods circumvent some of the disadvantages of Wald inference, at the expense of increased complexity.

- Start by assuming a normal distribution for $Y_j$; then use likelihood ratio methods for tests, and profile likelihood methods for confidence intervals.
- The large-sample properties of the resulting estimators may be sensitive to the normality assumption for $Y_j$.

SLIDE 28

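Alternative approach (less sensitive to assumptions?):
$$L(\beta) = -\frac{1}{2}\sum_{j=1}^n \hat w_j \{Y_j - f(x_j, \beta)\}^2$$
is, up to the factor $1/\sigma^2$ and constant terms, the log-likelihood in the linear model. For instance, to test the hypothesis $\beta_2 = 0$, where $\beta = (\beta_1^T, \beta_2^T)^T$, $\beta_1 \in \mathbb{R}^r$, $\beta_2 \in \mathbb{R}^{p-r}$, we can show that
$$\frac{-2\{L(\hat\beta_0) - L(\hat\beta)\}}{\sigma^2} \overset{\cdot}{\sim} \chi^2_{p-r},$$
where $\hat\beta_0$ denotes the estimator computed under the hypothesis $\beta_2 = 0$.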

SLIDE 29

Further improvement in small samples:
$$\frac{\{L(\hat\beta_0) - L(\hat\beta)\}/(p - r)}{L(\hat\beta)/(n - p)} \overset{\cdot}{\sim} F_{p-r,\, n-p}.$$
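
For a linear mean with fixed weights $\hat w_j$ (an assumption of this sketch; a nonlinear $f$ would need a nonlinear fit under each hypothesis), the statistic above reduces to weighted residual sums of squares, because $L(\beta) = -\mathrm{RSS}_w(\beta)/2$ and the factors $-1/2$ cancel in the ratio. Function names are hypothetical.

```python
import numpy as np


def weighted_rss(X, y, w):
    """Minimize sum_j w_j (y_j - x_j^T b)^2; return (b_hat, RSS), so L(b_hat) = -RSS/2."""
    Xw = X * w[:, None]
    b = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    r = y - X @ b
    return b, float(np.sum(w * r**2))


def lr_f_statistic(X, y, w, r):
    """F statistic for H0: beta_2 = 0, where beta_2 is the last p - r coefficients."""
    n, p = X.shape
    _, rss_full = weighted_rss(X, y, w)
    _, rss_null = weighted_rss(X[:, :r], y, w)   # fit under beta_2 = 0
    # {L(b0) - L(b)}/(p - r) over L(b)/(n - p); the -1/2 factors cancel
    return ((rss_null - rss_full) / (p - r)) / (rss_full / (n - p))
```

When $p - r = 1$ the statistic equals the square of the Wald $t$ ratio computed from $\hat\sigma^2 (X^T\hat W X)^{-1}$, an identity that holds exactly in the linear model.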

SLIDE 30

Optimality of GLS

Goal: to show that the GLS estimator is asymptotically optimal in the class of linear estimating equations.

The GLS estimator $\hat\beta$ satisfies
$$X(\beta)^T W(\beta, \theta_0)\{Y - f(\beta)\} = 0.$$
Here we hold $\theta$ at its true value $\theta_0$; by the folklore theorem, replacing it by its estimator does not change the asymptotic distribution. We assume correct variance specification, so that
$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma^2 (X^TWX)^{-1}\right).$$

SLIDE 31

Consider $\tilde\beta$ satisfying the more general linear estimating equation
$$A(\beta)^T\{Y - f(\beta)\} = 0.$$
Similar arguments (first-order Taylor approximation, WLLN, CLT, Slutsky) show that
$$\tilde\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma^2 (A^TX)^{-1} A^TW^{-1}A\, (X^TA)^{-1}\right).$$
We can then show that
$$(A^TX)^{-1} A^TW^{-1}A\, (X^TA)^{-1} - (X^TWX)^{-1}$$
is nonnegative definite.
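
The nonnegative-definiteness claim can be checked numerically on random instances (a sanity check, not a proof). The sketch assumes arbitrary full-rank $X$ and $A$ and positive weights, all randomly generated; the function name is hypothetical. Choosing $A = WX$ reproduces $(X^TWX)^{-1}$, which is the GLS case.

```python
import numpy as np


def general_cov(X, A, winv):
    """(A^T X)^{-1} A^T W^{-1} A (X^T A)^{-1} for an n x p matrix A; winv holds 1 / w_j."""
    AX_inv = np.linalg.inv(A.T @ X)
    # (X^T A)^{-1} is the transpose of (A^T X)^{-1}
    return AX_inv @ (A.T @ (winv[:, None] * A)) @ AX_inv.T
```

Applying it to many random choices of $A$ and inspecting the smallest eigenvalue of the difference from $(X^TWX)^{-1}$ should give values that are nonnegative up to rounding error.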

SLIDE 32

Conclusion

The GLS estimator is asymptotically optimal in the class of linear estimating equations.
