
SLIDE 1

ST 762 Nonlinear Statistical Models for Univariate and Multivariate Response

Sampling Distributions

Recall the general mean-variance specification $E(Y \mid x) = f(x, \beta)$, $\mathrm{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2$.
Closed form estimators with exact known sampling distributions exist only in special cases, principally the linear model $f(x, \beta) = x^T\beta$ with Gaussian errors and known variances: $\hat\beta \sim N\left(\beta, \sigma^2 (X^TX)^{-1}\right)$ for any fixed $n$.
Otherwise, we must use large sample approximations by letting $n \to \infty$.

1 / 50 Sampling Distributions

SLIDE 2

Issues we aim to address

Analogs of the “unbiasedness” and “minimum variance” properties.
Large sample approximations that do not depend on specific distributions for the errors.
Consequences of mis-specification of the variances.
Tradeoffs between linear and quadratic estimating equations for $\beta$, the effect of knowing $\theta$ versus the need to estimate it.

SLIDE 3

Fundamental concepts

Asymptotic distribution.
Asymptotic relative efficiency.
Disclaimer: a casual treatment.

SLIDE 4

Review of Large Sample Tools

Definition: almost sure convergence: $Y_n \xrightarrow{a.s.} Y$ iff $P\left(\lim_{n\to\infty} Y_n = Y\right) = 1$.
Definition: convergence in probability: $Y_n \xrightarrow{p} Y$ iff $\forall \delta > 0$, $\lim_{n\to\infty} P(|Y_n - Y| < \delta) = 1$.
$Y_n \xrightarrow{a.s.} Y \Rightarrow Y_n \xrightarrow{p} Y$, but not conversely.

SLIDE 5

If $h(\cdot)$ is continuous, then
$Y_n \xrightarrow{a.s.} Y \Rightarrow h(Y_n) \xrightarrow{a.s.} h(Y)$;
$Y_n \xrightarrow{p} Y \Rightarrow h(Y_n) \xrightarrow{p} h(Y)$.

SLIDE 6

Terminology

If $\hat\eta_n$ is an estimator from a sample of size $n$ and $\eta_0$ is the true value of the parameter, then we have two definitions of consistency:
Strong consistency: $\hat\eta_n \xrightarrow{a.s.} \eta_0$;
Weak consistency: $\hat\eta_n \xrightarrow{p} \eta_0$.

Interpretation

Weak consistency: if the sample size $n$ is sufficiently large, the probability is small that $\hat\eta_n$ assumes a value outside an arbitrarily small neighborhood of $\eta_0$; i.e., for $n$ large enough, the probability that $\hat\eta_n$ will wander away is small.
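As a rough numerical illustration of weak consistency, the sketch below estimates $P(|\bar Y_n - \mu| < \delta)$ by simulation for the sample mean of Exponential(1) data. The distribution, the tolerance `delta = 0.1`, the replication count, and the function name `coverage_probability` are assumptions chosen for illustration; none of them come from the slides.

```python
import random

def coverage_probability(n, delta, reps=1000, seed=0):
    """Monte Carlo estimate of P(|Ybar_n - mu| < delta) for Exponential(1) data."""
    rng = random.Random(seed)
    mu = 1.0  # true mean of an Exponential(1) variable
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(ybar - mu) < delta:
            hits += 1
    return hits / reps

# The estimated probability should climb toward 1 as n grows.
probs = {n: coverage_probability(n, delta=0.1) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:5d}: estimated P(|Ybar - mu| < 0.1) = {p:.3f}")
```

Shrinking `delta` only delays the approach to 1; it does not prevent it, which is the content of the definition.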

SLIDE 7

Order in probability: $O_p$

$Y_n$ is of order $n^k$ in probability if, for every $\epsilon > 0$, there exist $M_\epsilon$ and $n_\epsilon$ such that
$P\left(n^{-k}\|Y_n\| < M_\epsilon\right) > 1 - \epsilon$ for all $n > n_\epsilon$.
Here $\|\cdot\|$ denotes some appropriate norm to measure the magnitude of $Y_n$.
Notation: $Y_n = O_p(n^k)$; “Big” $O_p$.

SLIDE 8

Remarks on $O_p$

If $k = 0$, $Y_n = O_p(1)$. Practically this says that, as $n$ gets large, $Y_n$ does not become negligible, nor does it “blow up.” Instead, it is “nicely behaved.”
If $k = -1/2$, $Y_n = O_p(n^{-1/2})$. As $n \to \infty$, $n^{-1/2} \to 0$, so $Y_n$ itself “approaches” (converges in probability to) zero at the same “rate” as $n^{-1/2}$.

SLIDE 9

Order in probability: $o_p$

$n^{-k} Y_n \xrightarrow{p} 0$ as $n \to \infty$.
Notation: $Y_n = o_p(n^k)$; “Little” $o_p$.

SLIDE 10

Remarks on $o_p$

If $k = -1/2$, $Y_n = o_p(n^{-1/2})$. As $n \to \infty$, $n^{-1/2} \to 0$, so $Y_n$ itself “approaches” (converges in probability to) zero at a faster “rate” than $n^{-1/2}$.
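The two orders can be contrasted numerically. In the sketch below, $D_n = \bar Y_n - \mu$ is $O_p(n^{-1/2})$, so $\sqrt{n}\,D_n$ should keep a stable spread, while $D_n^2$ is $o_p(n^{-1/2})$, so $\sqrt{n}\,D_n^2$ should shrink. The Uniform(-1, 1) data, the sample sizes, and the helper name `scaled_summaries` are illustrative assumptions.

```python
import random
import statistics

def scaled_summaries(n, reps=1000, seed=1):
    """Return (sd of sqrt(n)*D_n, mean of sqrt(n)*D_n**2), where D_n = Ybar_n - mu."""
    rng = random.Random(seed)
    # Uniform(-1, 1) data, so mu = 0 and var(Y) = 1/3 (an illustrative choice).
    devs = [sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n for _ in range(reps)]
    root_n = n ** 0.5
    big_op = statistics.pstdev([root_n * d for d in devs])
    little_op = sum(root_n * d * d for d in devs) / reps
    return big_op, little_op

summaries = {n: scaled_summaries(n) for n in (25, 400)}
for n, (big_op, little_op) in summaries.items():
    print(f"n = {n:3d}: sd of sqrt(n) D_n = {big_op:.3f}, mean of sqrt(n) D_n^2 = {little_op:.4f}")
```

The first column should stay near $\sqrt{1/3}$ for both sample sizes, while the second should fall roughly like $n^{-1/2}$.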

SLIDE 11

Properties

If $X_n = o_p(n^{k_1})$ and $Y_n = o_p(n^{k_2})$, then $X_nY_n = o_p(n^{k_1+k_2})$.
If $X_n = o_p(n^{k_1})$ and $Y_n = O_p(n^{k_2})$, then $X_nY_n = o_p(n^{k_1+k_2})$.
Note that $Y_n = o_p(n^k) \Rightarrow Y_n = O_p(n^k)$, so the second property implies the first.

SLIDE 12

Convergence in distribution

Definition: convergence in distribution (or law): $Y_n \xrightarrow{L} Y$ iff for each continuity point $y$ of $F_Y(\cdot)$, $\lim_{n\to\infty} F_{Y_n}(y) = F_Y(y)$.
Note: $Y_n \xrightarrow{p} Y \Rightarrow Y_n \xrightarrow{L} Y$ but, in general, not conversely.
Special (and trivial) case: if $Y_n \xrightarrow{L} y$ where $y$ is a constant, then $Y_n \xrightarrow{p} y$.

Practical interpretation

If we are interested in probability and distributional statements about $Y_n$, we may approximate these with statements about $Y$.

SLIDE 13

Asymptotic normality

If we can find sequences $\{a_n\}$ and $\{c_n > 0\}$ such that $c_n (Y_n - a_n) \xrightarrow{L} N(0, 1)$, we say that $Y_n$ is asymptotically normal with asymptotic mean $a_n$ and asymptotic variance $c_n^{-2}$.
We write $Y_n \overset{\cdot}{\sim} N\left(a_n, \frac{1}{c_n^2}\right)$.

SLIDE 14

Central Limit Theorem

$Z_j$ are independent with $E(Z_j) = \mu_j$, $\mathrm{var}(Z_j) = \Sigma_j$.
The variance matrices satisfy $\lim_{n\to\infty} \frac{1}{n}(\Sigma_1 + \Sigma_2 + \cdots + \Sigma_n) = \Sigma$.
The tails of the distributions of $Z_j$ satisfy the Lindeberg condition: $\forall \epsilon > 0$,
$\frac{1}{n} \sum_{j=1}^n \int_{\|z - \mu_j\| \ge \epsilon\sqrt{n}} \|z - \mu_j\|^2 \, dF_j(z) \to 0$ as $n \to \infty$.

SLIDE 15

Then $\frac{1}{\sqrt{n}} \sum_{j=1}^n (Z_j - \mu_j) \xrightarrow{L} N(0, \Sigma)$.
In the language of asymptotic normality: if $\bar Z_n = \frac{1}{n}\sum_{j=1}^n Z_j$ and $\bar\mu_n = \frac{1}{n}\sum_{j=1}^n \mu_j$, then $\bar Z_n \overset{\cdot}{\sim} N\left(\bar\mu_n, \frac{1}{n}\Sigma\right)$.
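A small simulation can make the CLT for independent but non-identically distributed variables concrete. The sketch below uses scalar $Z_j \sim \mathrm{Uniform}(0, b_j)$ with widths $b_j$ cycling through 1, 2, 3; this design, the sample size, and the name `standardized_sum` are assumptions made for illustration only. If the normal approximation is good, about 95% of the standardized sums should fall inside $\pm 1.96$.

```python
import random

def standardized_sum(n, rng):
    """(sum Z_j - sum mu_j) / sqrt(sum var_j) for independent Uniform(0, b_j) draws."""
    total, mean_total, var_total = 0.0, 0.0, 0.0
    for j in range(n):
        b = 1.0 + (j % 3)            # widths cycle through 1, 2, 3
        total += rng.uniform(0.0, b)
        mean_total += b / 2.0        # E(Z_j) = b / 2
        var_total += b * b / 12.0    # var(Z_j) = b^2 / 12
    return (total - mean_total) / var_total ** 0.5

rng = random.Random(2)
draws = [standardized_sum(200, rng) for _ in range(4000)]
coverage = sum(abs(t) < 1.96 for t in draws) / len(draws)
print(f"fraction of standardized sums within +/-1.96: {coverage:.3f}")
```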

SLIDE 16

More general CLT

If we also write $\bar\Sigma_n = \frac{1}{n}\sum_{j=1}^n \Sigma_j$, the variance matrix condition can be written $\lim_{n\to\infty} \bar\Sigma_n = \Sigma$.
A more general CLT does not require this convergence: $\bar Z_n \overset{\cdot}{\sim} N\left(\bar\mu_n, \frac{1}{n}\bar\Sigma_n\right)$.

SLIDE 17

The Lindeberg condition becomes: $\forall \epsilon > 0$,
$\frac{1}{n\lambda_n} \sum_{j=1}^n E\left[ 1_{\{\|Z_j - \mu_j\| \ge \epsilon\sqrt{n\lambda_n}\}} \|Z_j - \mu_j\|^2 \right] \to 0$ as $n \to \infty$,
where $\lambda_n$ is the smallest eigenvalue of $\bar\Sigma_n$.
Under only this modified Lindeberg condition, $\bar Z_n \overset{\cdot}{\sim} N\left(\bar\mu_n, \frac{1}{n}\bar\Sigma_n\right)$.

SLIDE 18

In terms of convergence in distribution: $C_n \left(\bar Z_n - \bar\mu_n\right) \xrightarrow{L} N(0, I)$, where $C_n$ is any inverse square root of $\frac{1}{n}\bar\Sigma_n$: $C_n \left(\frac{1}{n}\bar\Sigma_n\right) C_n^T = I$.

SLIDE 19

Slutsky’s Theorem

If $Y_n \xrightarrow{L} Y$ and $V_n \xrightarrow{p} c$, a constant, then:
$Y_n + V_n \xrightarrow{L} Y + c$;
$Y_n V_n \xrightarrow{L} cY$;
and, if $c \ne 0$, $Y_n / V_n \xrightarrow{L} Y / c$.
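A familiar use of Slutsky’s theorem is the studentized mean: $\sqrt{n}(\bar Y - \mu)/\sigma \xrightarrow{L} N(0, 1)$ by the CLT and $S/\sigma \xrightarrow{p} 1$, so their ratio $\sqrt{n}(\bar Y - \mu)/S$ is also asymptotically $N(0, 1)$. The sketch below checks this by simulation; the Exponential data, the sample size of 200, and the name `t_statistic` are illustrative assumptions rather than anything specified in the slides.

```python
import random

def t_statistic(n, rng, mu=1.0):
    """sqrt(n) (Ybar - mu) / S for one sample of Exponential data with mean mu."""
    y = [rng.expovariate(1.0 / mu) for _ in range(n)]
    ybar = sum(y) / n
    s = (sum((v - ybar) ** 2 for v in y) / (n - 1)) ** 0.5
    return n ** 0.5 * (ybar - mu) / s

rng = random.Random(5)
stats = [t_statistic(200, rng) for _ in range(2000)]
coverage = sum(abs(t) < 1.96 for t in stats) / len(stats)
print(f"fraction with |T| < 1.96: {coverage:.3f}")
```

The fraction should be close to 0.95 even though the data are markedly skewed.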

SLIDE 20

Multivariate version

If $Y_n \xrightarrow{L} Y$ and $V_n \xrightarrow{p} C$, a constant matrix, then:
$Y_n + V_n \xrightarrow{L} Y + C$;
$V_n Y_n \xrightarrow{L} CY$;
and, if $C$ is nonsingular, $V_n^{-1} Y_n \xrightarrow{L} C^{-1}Y$.

SLIDE 21

Weak Law of Large Numbers

$\{Z_j\}$ are uncorrelated and $\{a_j\}$ are constants. If
$\mathrm{var}\left(\frac{1}{n}\sum_{j=1}^n a_j Z_j\right) = \frac{1}{n^2}\sum_{j=1}^n a_j^2 \,\mathrm{var}(Z_j) \to 0$ as $n \to \infty$,
then $\frac{1}{n}\sum_{j=1}^n a_j Z_j - \frac{1}{n}\sum_{j=1}^n a_j E(Z_j) \xrightarrow{p} 0$.
Furthermore, if $n^{-1}\sum_{j=1}^n a_j E(Z_j) \to c$, then $n^{-1}\sum_{j=1}^n a_j Z_j \xrightarrow{p} c$.
The condition is satisfied if $n^{-1}\sum_{j=1}^n a_j^2 \,\mathrm{var}(Z_j)$ converges to a finite limit, which is a similar requirement to the CLT.

SLIDE 22

How do we use all these results?

Suppose that some estimator $\hat\eta$ satisfies $\sqrt{n}(\hat\eta - \eta_0) = A_n^{-1} C_n + o_p(1)$, where:
$A_n$ satisfies the WLLN, and $A_n \xrightarrow{p} C$, a constant matrix;
$C_n$ satisfies the CLT, and is asymptotically normal with zero mean.
Then $\hat\eta \overset{\cdot}{\sim} N(\eta_0, \Sigma_n)$ for some asymptotic variance matrix $\Sigma_n$, typically of the form $n^{-1}\Sigma$.

SLIDE 23

Comparing estimators

Suppose that $\hat\eta^{(1)}$ and $\hat\eta^{(2)}$ are both asymptotically normal with asymptotic mean $\eta_0$, but with different asymptotic variance matrices $n^{-1}\Sigma^{(1)}$ and $n^{-1}\Sigma^{(2)}$.
Which should we prefer?
In the univariate case, the one with the smaller asymptotic variance.

SLIDE 24

In the multivariate case, consider estimating the linear combination $\lambda^T\eta$:
$\mathrm{var}\left(\lambda^T\hat\eta^{(1)}\right) = \frac{1}{n}\lambda^T\Sigma^{(1)}\lambda$, $\mathrm{var}\left(\lambda^T\hat\eta^{(2)}\right) = \frac{1}{n}\lambda^T\Sigma^{(2)}\lambda$.
So if $\lambda^T\Sigma^{(1)}\lambda \le \lambda^T\Sigma^{(2)}\lambda$ for every $\lambda$, we prefer $\hat\eta^{(1)}$.
That is, if $\Sigma^{(2)} - \Sigma^{(1)}$ is non-negative definite.

SLIDE 25

Asymptotic relative efficiency

To measure how much better $\hat\eta^{(1)}$ is than $\hat\eta^{(2)}$, use the generalized asymptotic relative efficiency
$\left( \frac{|\Sigma^{(1)}|}{|\Sigma^{(2)}|} \right)^{1/k}$,
where $|\cdot|$ is the determinant and $k$ is the dimension of $\eta$.
Note: for a given $\lambda$, the ARE is $\frac{\lambda^T\Sigma^{(1)}\lambda}{\lambda^T\Sigma^{(2)}\lambda}$.
As $\lambda$ varies, this ratio varies between the smallest and largest eigenvalues of $\Sigma^{(1)}\left(\Sigma^{(2)}\right)^{-1}$.

SLIDE 26

The eigenvalues of $\Sigma^{(1)}\left(\Sigma^{(2)}\right)^{-1}$ could be called the canonical AREs.
The product of the canonical AREs is
$\left|\Sigma^{(1)}\left(\Sigma^{(2)}\right)^{-1}\right| = \frac{|\Sigma^{(1)}|}{|\Sigma^{(2)}|} = (\text{generalized ARE})^k$,
so the generalized ARE is the geometric mean of the canonical AREs.
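The relationship between the canonical and generalized AREs can be checked on a small example. The sketch below takes $k = 2$ and two made-up variance matrices (`sigma1` and `sigma2` are hypothetical values, not from the slides), computes the eigenvalues of $\Sigma^{(1)}(\Sigma^{(2)})^{-1}$ in closed form from the trace and determinant, and compares their geometric mean with the determinant-based definition. The helper names are likewise assumptions.

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def canonical_ares(sigma1, sigma2):
    """Eigenvalues of sigma1 sigma2^{-1}, from the trace and determinant."""
    m = matmul2(sigma1, inv2(sigma2))
    tr, det = m[0][0] + m[1][1], det2(m)
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - root) / 2.0, (tr + root) / 2.0

def are_for(lam, sigma1, sigma2):
    """ARE for estimating the linear combination lam' eta."""
    def quad(s):
        return sum(lam[i] * s[i][j] * lam[j] for i in range(2) for j in range(2))
    return quad(sigma1) / quad(sigma2)

sigma1 = [[1.0, 0.2], [0.2, 0.5]]   # hypothetical Sigma^(1)
sigma2 = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical Sigma^(2)
lo, hi = canonical_ares(sigma1, sigma2)
generalized = (det2(sigma1) / det2(sigma2)) ** 0.5   # exponent 1/k with k = 2

print("canonical AREs:", lo, hi)
print("geometric mean:", math.sqrt(lo * hi), "generalized ARE:", generalized)
print("ARE for lambda = (1, -1):", are_for([1.0, -1.0], sigma1, sigma2))
```

For any `lam`, the value returned by `are_for` should lie between `lo` and `hi`.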

SLIDE 27

M-estimators

$\{Z_j\}$ are independent, each with a distribution depending on a parameter $\eta$.
An M-estimator $\hat\eta$ satisfies either:
$\hat\eta$ minimizes a scalar criterion $\sum_{j=1}^n \rho_j(Z_j, \eta)$; or
$\hat\eta$ is the solution of estimating equations $\sum_{j=1}^n \Psi_j(Z_j, \eta) = 0$.

SLIDE 28

If the $\rho_j(\cdot)$ are differentiable, the minimum of the scalar criterion can be found by solving the corresponding gradient equation.
Not all estimating equations can be interpreted as gradient equations, so the estimating equation approach is somewhat more general.
The MLE minimizes a scalar criterion of this form, with $\rho_j(\cdot) = -2 \times$ log likelihood of $Z_j$; M-estimators may be thought of as generalized MLEs.

SLIDE 29

Consistency

Generally, $\hat\eta$ is a consistent estimator of the true parameter value $\eta_0$ if
$E_{\eta_0}\{\Psi_j(Z_j, \eta_0)\} = 0$
and
$E_{\eta_0}\{\Psi_j(Z_j, \eta^*)\} \ne 0$ for any $\eta^* \ne \eta_0$ (unique $\eta_0$).

SLIDE 30

Asymptotic normality

Assuming $\hat\eta \xrightarrow{p} \eta_0$, we expand $\sum_{j=1}^n \Psi_j(Z_j, \hat\eta) = 0$ around $\eta_0$:
$0 = \frac{1}{\sqrt{n}} \sum_{j=1}^n \Psi_j(Z_j, \hat\eta) \approx \frac{1}{\sqrt{n}} \sum_{j=1}^n \Psi_j(Z_j, \eta_0) + \left\{ \frac{1}{n} \sum_{j=1}^n \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right\} \sqrt{n}(\hat\eta - \eta_0)$.

SLIDE 31

Rearrange as: $\sqrt{n}(\hat\eta - \eta_0) = -A_n^{-1} C_n + o_p(1)$,
where $A_n = \frac{1}{n} \sum_{j=1}^n \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta}$ and $C_n = \frac{1}{\sqrt{n}} \sum_{j=1}^n \Psi_j(Z_j, \eta_0)$.

SLIDE 32

First we can show that $A_n$ satisfies the WLLN:
$A_n - \frac{1}{n} \sum_{j=1}^n E\left\{ \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right\} \xrightarrow{p} 0$.
If in addition $\frac{1}{n} \sum_{j=1}^n E\left\{ \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right\} \to A$ for some $A$, assumed to be nonsingular, then $A_n \xrightarrow{p} A$.

SLIDE 33

Next we show that $C_n$ satisfies the CLT: $C_n \overset{\cdot}{\sim} N(0, B)$, where $B = \lim_{n\to\infty} B_n$ (assumed to exist), and
$B_n = \frac{1}{n} \sum_{j=1}^n E\left\{ \Psi_j(Z_j, \eta_0) \Psi_j(Z_j, \eta_0)^T \right\}$.
Finally Slutsky’s theorem implies that $\sqrt{n}(\hat\eta - \eta_0) \xrightarrow{L} N\left(0, A^{-1} B \left(A^{-1}\right)^T\right)$.

SLIDE 34

Equivalently, $\hat\eta \overset{\cdot}{\sim} N\left(\eta_0, \frac{1}{n} A^{-1} B \left(A^{-1}\right)^T\right)$.
We can use the alternative version of the CLT to show that $\hat\eta \overset{\cdot}{\sim} N\left(\eta_0, \frac{1}{n} A_n^{-1} B_n \left(A_n^{-1}\right)^T\right)$ without requiring the existence of either $\lim_{n\to\infty} A_n$ or $\lim_{n\to\infty} B_n$.
To use this asymptotic distribution as a small sample approximation, we need an approximation to $\frac{1}{n} A_n^{-1} B_n \left(A_n^{-1}\right)^T$.

SLIDE 35

For $A_n$, plug in $\hat\eta$ for $\eta_0$: $\hat A_n = \frac{1}{n} \sum_{j=1}^n \frac{\partial \Psi_j(Z_j, \hat\eta)}{\partial \eta}$.
For $B_n$, two strategies:
Plug in $\hat\eta$ for $\eta_0$, in $\Psi_j(\cdot)$ and in the expectation; a model-based variance estimator.
Replace the expectation with its sample analog: $\hat B_n = \frac{1}{n} \sum_{j=1}^n \Psi_j(Z_j, \hat\eta) \Psi_j(Z_j, \hat\eta)^T$; a sandwich variance estimator.
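To make the two strategies concrete, here is a minimal sketch for a scalar parameter, assuming the estimating function $\Psi_j(Z_j, \eta) = x_j (y_j - x_j \eta)$ (least squares through the origin). The simulated data, with error standard deviation proportional to $x_j$, and the function names are illustrative assumptions. The model-based estimator plugs in a constant-variance expectation for $B_n$, while the sandwich estimator uses the sample average of $\Psi_j^2$.

```python
import random

def fit_through_origin(x, y):
    """Solve sum_j x_j (y_j - x_j * eta) = 0 for eta."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def variance_estimates(x, y):
    """Return (eta_hat, model-based variance, sandwich variance)."""
    n = len(x)
    eta_hat = fit_through_origin(x, y)
    resid = [b - a * eta_hat for a, b in zip(x, y)]
    a_hat = -sum(a * a for a in x) / n                        # average of dPsi_j / d eta
    b_sandwich = sum((a * r) ** 2 for a, r in zip(x, resid)) / n
    sigma2_hat = sum(r * r for r in resid) / (n - 1)
    b_model = sigma2_hat * sum(a * a for a in x) / n          # assumes constant variance
    return eta_hat, b_model / a_hat ** 2 / n, b_sandwich / a_hat ** 2 / n

rng = random.Random(3)
x = [rng.uniform(0.5, 3.0) for _ in range(2000)]
y = [2.0 * a + a * rng.gauss(0.0, 1.0) for a in x]            # sd grows with x
eta_hat, model_based, sandwich = variance_estimates(x, y)
print(f"eta_hat = {eta_hat:.3f}, model-based = {model_based:.2e}, sandwich = {sandwich:.2e}")
```

In this deliberately heteroscedastic setup the constant-variance assumption is wrong, so the model-based value should understate the variance relative to the sandwich value.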

SLIDE 36

WLS as an M-estimator

The usual nonlinear mean model $E(Y_j \mid x_j) = f(x_j, \beta)$, but with $x_1, x_2, \ldots, x_n$ treated as fixed.
Simple variance structure $\mathrm{var}(Y_j \mid x_j) = \sigma^2 / w_j$.

SLIDE 37

Suppose we fit using working variances $\mathrm{var}(Y_j \mid x_j) = \sigma^2 / u_j$.
That is, we estimate $\beta$ by solving
$\sum_{j=1}^n u_j \{Y_j - f(x_j, \beta)\} f_\beta(x_j, \beta) = 0$
using working weights $u_j$.

SLIDE 38

This is in M-estimator form, with $Z_j = Y_j$, $\eta = \beta$, and $\Psi_j(Y_j, \beta) = u_j \{Y_j - f(x_j, \beta)\} f_\beta(x_j, \beta)$.
Note that $E_\beta\{\Psi_j(Y_j, \beta) \mid x_j\} = 0$ because the mean is specified correctly, regardless of the mis-specification of the variances.
So $\hat\beta_u$ is still consistent.
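The weighted estimating equation $\sum_j u_j \{Y_j - f(x_j, \beta)\} f_\beta(x_j, \beta) = 0$ can be solved by Gauss-Newton iteration. The sketch below does this for a scalar $\beta$ with the assumed mean function $f(x, \beta) = \exp(\beta x)$; the mean function, the design points, and the solver name `solve_wls` are hypothetical choices. The responses are generated without noise, so the equation is satisfied exactly at the true value whatever the working weights, a finite-sample counterpart of the unbiasedness of $\Psi_j$.

```python
import math

def f(x, beta):
    return math.exp(beta * x)

def f_beta(x, beta):
    """Derivative of f with respect to beta."""
    return x * math.exp(beta * x)

def solve_wls(x, y, u, beta=0.0, tol=1e-12, max_iter=100):
    """Gauss-Newton iteration for sum_j u_j {y_j - f(x_j, beta)} f_beta(x_j, beta) = 0."""
    for _ in range(max_iter):
        score = sum(uj * (yj - f(xj, beta)) * f_beta(xj, beta)
                    for xj, yj, uj in zip(x, y, u))
        info = sum(uj * f_beta(xj, beta) ** 2 for xj, uj in zip(x, u))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

x = [0.5, 1.0, 1.5, 2.0, 2.5]
beta0 = 0.4
y = [f(xj, beta0) for xj in x]                 # noise-free responses (illustrative)

beta_equal = solve_wls(x, y, [1.0] * len(x))                   # u_j = 1
beta_inverse = solve_wls(x, y, [1.0 / xj ** 2 for xj in x])    # u_j = 1 / x_j^2
print(beta_equal, beta_inverse)
```

With noisy data the two fits would differ, but both would still be centered on the true value; only their variances depend on the weights.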

SLIDE 39

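Following the same argument as before, $n^{1/2}\left(\hat\beta - \beta_0\right) \approx -A_n^{-1} C_n$, where, writing $\epsilon_j = w_j^{1/2}\{Y_j - f(x_j, \beta_0)\}/\sigma_0$ for the standardized errors,
$C_n = \sigma_0 n^{-1/2} \sum_{j=1}^n u_j w_j^{-1/2} \epsilon_j f_\beta(x_j, \beta_0)$,
$A_n = A_{n1} + A_{n2}$,
$A_{n1} = \sigma_0 n^{-1} \sum_{j=1}^n u_j w_j^{-1/2} \epsilon_j f_{\beta\beta}(x_j, \beta_0)$,
$A_{n2} = -n^{-1} \sum_{j=1}^n u_j f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T$.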

SLIDE 40

Then $\hat\beta_u$ is asymptotically normal:
$\hat\beta_u \overset{\cdot}{\sim} N\left(\beta_0, \sigma^2 \left(X^TUX\right)^{-1} X^TUW^{-1}UX \left(X^TUX\right)^{-1}\right)$,
where $X$ is the gradient matrix
$X = X(\beta) = \begin{pmatrix} f_\beta(x_1, \beta)^T \\ f_\beta(x_2, \beta)^T \\ \vdots \\ f_\beta(x_n, \beta)^T \end{pmatrix}$
and $U = \mathrm{diag}(u_1, u_2, \ldots, u_n)$, $W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$.
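The sandwich form $\sigma^2 (X^TUX)^{-1} X^TUW^{-1}UX (X^TUX)^{-1}$ can be compared with a Monte Carlo variance in a simple case. The sketch below assumes a linear mean $f(x, \beta) = \beta x$ with scalar $\beta$, so that $X$ is the single column of $x_j$ values and the matrix products reduce to sums; the design, the true weights $w_j = 1/x_j^2$, and the function names are illustrative assumptions. For this linear model the formula is exact rather than only asymptotic.

```python
import random

def wls_slope(x, y, u):
    """Weighted least squares slope for the no-intercept model f(x, beta) = beta * x."""
    num = sum(ui * xi * yi for xi, yi, ui in zip(x, y, u))
    return num / sum(ui * xi * xi for xi, ui in zip(x, u))

def sandwich_variance(x, u, w, sigma2=1.0):
    """sigma^2 (X'UX)^{-1} X'U W^{-1} U X (X'UX)^{-1} when X is the single column x."""
    bread = sum(ui * xi * xi for xi, ui in zip(x, u))
    meat = sum(ui * ui * xi * xi / wi for xi, ui, wi in zip(x, u, w))
    return sigma2 * meat / bread ** 2

rng = random.Random(4)
n = 50
x = [1.0 + 0.1 * j for j in range(n)]
w = [1.0 / xi ** 2 for xi in x]      # true variances sigma^2 / w_j grow like x_j^2
u = [1.0] * n                        # working weights: ordinary least squares
beta0 = 2.0

estimates = []
for _ in range(4000):
    y = [beta0 * xi + rng.gauss(0.0, 1.0) / wi ** 0.5 for xi, wi in zip(x, w)]
    estimates.append(wls_slope(x, y, u))

mean_est = sum(estimates) / len(estimates)
mc_var = sum((b - mean_est) ** 2 for b in estimates) / (len(estimates) - 1)
formula_var = sandwich_variance(x, u, w)
optimal_var = sandwich_variance(x, w, w)   # reduces to sigma^2 / sum(w_j x_j^2)
print(f"Monte Carlo: {mc_var:.5f}, formula: {formula_var:.5f}, optimal weights: {optimal_var:.5f}")
```

Passing `u = w` collapses the sandwich to $\sigma^2 (X^TWX)^{-1}$, which is smaller than the value for the unit working weights.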

SLIDE 41

Special cases

If the working variances are equal to the true variances, the result simplifies to $\hat\beta_w \overset{\cdot}{\sim} N\left(\beta_0, \sigma^2 \left(X^TWX\right)^{-1}\right)$.
If the true variances are constant, i.e., $W = I$, the result further simplifies to the familiar OLS form $\hat\beta_w \overset{\cdot}{\sim} N\left(\beta_0, \sigma^2 \left(X^TX\right)^{-1}\right)$.

SLIDE 42

Comparisons via asymptotic relative efficiency

Fit OLS, i.e., $U = I$, when the true weight matrix is $W$.
The asymptotic variance matrix of $\hat\beta_{OLS}$ is $\sigma^2 \left(X^TX\right)^{-1} X^TW^{-1}X \left(X^TX\right)^{-1}$.
The asymptotic variance matrix of $\hat\beta_w$ is $\sigma^2 \left(X^TWX\right)^{-1}$.

SLIDE 43

We can write
$\left(X^TX\right)^{-1} X^TW^{-1}X \left(X^TX\right)^{-1} - \left(X^TWX\right)^{-1} = Q^T(I - P)Q$,
where $P = W^{1/2}X \left(X^TWX\right)^{-1} X^TW^{1/2}$ and $Q = W^{-1/2}X \left(X^TX\right)^{-1}$.
But $P$ and $I - P$ are symmetric and idempotent, hence nonnegative definite.
So the variance of $\hat\beta_{OLS}$ exceeds that of $\hat\beta_w$ by a nonnegative definite matrix.
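The decomposition can be verified numerically in the simplest case of a single column $X = x$, where $P = aa^T/(a^Ta)$ with $a = W^{1/2}x$ and the quadratic form becomes $Q^T(I - P)Q = q^Tq - (a^Tq)^2/(a^Ta)$. The values of `x` and `w` below are arbitrary illustrative numbers, not data from the slides.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x = [1.0, 2.0, 3.0, 4.0]       # single column of the gradient matrix X
w = [4.0, 1.0, 0.5, 0.25]      # diagonal of the true weight matrix W

xtx = dot(x, x)
# (X'X)^{-1} X'W^{-1}X (X'X)^{-1} and (X'WX)^{-1}, both scalars here.
var_ols = sum(xi * xi / wi for xi, wi in zip(x, w)) / xtx ** 2
var_w = 1.0 / sum(wi * xi * xi for xi, wi in zip(x, w))

a = [wi ** 0.5 * xi for xi, wi in zip(x, w)]           # W^{1/2} X
q = [xi / wi ** 0.5 / xtx for xi, wi in zip(x, w)]     # Q = W^{-1/2} X (X'X)^{-1}
# With one column, P = a a' / (a'a), so Q'(I - P)Q = q'q - (a'q)^2 / (a'a).
rhs = dot(q, q) - dot(a, q) ** 2 / dot(a, a)
lhs = var_ols - var_w
print(lhs, rhs)
```

The two printed values should agree up to rounding, and both are nonnegative by the Cauchy-Schwarz inequality.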

SLIDE 44

More generally, suppose we fit WLS with a working weight matrix $U$, when the true weight matrix is $W$.
The asymptotic variance matrix of $\hat\beta_u$ is $\sigma^2 \left(X^TUX\right)^{-1} X^TUW^{-1}UX \left(X^TUX\right)^{-1}$.
The asymptotic variance matrix of $\hat\beta_w$ is $\sigma^2 \left(X^TWX\right)^{-1}$.

SLIDE 45

For general $U$, a similar argument shows that the variance of $\hat\beta_u$ exceeds that of $\hat\beta_w$ by a nonnegative definite matrix.
So the correctly weighted $\hat\beta_w$ is asymptotically optimal.
All canonical asymptotic relative efficiencies of $\hat\beta_{OLS}$ or $\hat\beta_u$ with respect to $\hat\beta_w$ are at most 1.

SLIDE 46

Best case: $PQ = Q \Rightarrow$ OLS is fully efficient. This occurs when each column of $WX$ is in the column space of $X$; e.g., each column of $X$ is an eigenvector of $W$.
Worst case: generalized ARE is
$\left\{ \frac{4w_1w_n}{(w_1 + w_n)^2} \times \frac{4w_2w_{n-1}}{(w_2 + w_{n-1})^2} \times \cdots \times \frac{4w_kw_{n-k+1}}{(w_k + w_{n-k+1})^2} \right\}^{1/k}$
where $w_1 \ge w_2 \ge \cdots \ge w_n$ are the ordered weights (and $n \ge 2k$). This occurs, e.g., when the $j$th column of $X$ is the sum of the $j$th and $(n - j + 1)$th eigenvectors of $W$.
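The worst-case configuration can be reproduced on a toy example. The sketch below assumes a diagonal $W$ with $n = 6$ made-up ordered weights and $k = 2$, builds $X$ with $j$th column $e_j + e_{n-j+1}$ (sums of eigenvectors of $W$), and compares the eigenvalues of $\mathrm{var}(\hat\beta_w)\,\mathrm{var}(\hat\beta_{OLS})^{-1}$ with the factors $4w_jw_{n-j+1}/(w_j + w_{n-j+1})^2$. The weights and helper names are hypothetical, and $\sigma^2$ is set to 1 since it cancels in the ratio.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def diag(values):
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]

w = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]            # ordered weights (hypothetical), n = 6
n, k = len(w), 2
# Column j of X is e_j + e_{n-j+1} (0-based rows j and n-1-j).
X = [[1.0 if i in (j, n - 1 - j) else 0.0 for j in range(k)] for i in range(n)]
Xt = transpose(X)
W, W_inv = diag(w), diag([1.0 / v for v in w])

xtx_inv = inv2(matmul(Xt, X))
var_ols = matmul(matmul(xtx_inv, matmul(matmul(Xt, W_inv), X)), xtx_inv)
var_w = inv2(matmul(matmul(Xt, W), X))

# Canonical AREs: eigenvalues of var_w var_ols^{-1} (2 x 2, via trace and determinant).
m = matmul(var_w, inv2(var_ols))
tr = m[0][0] + m[1][1]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
canonical = sorted([(tr - root) / 2.0, (tr + root) / 2.0])

formula = sorted(4.0 * w[j] * w[n - 1 - j] / (w[j] + w[n - 1 - j]) ** 2 for j in range(k))
generalized_are = (canonical[0] * canonical[1]) ** (1.0 / k)
print("canonical AREs:", canonical)
print("formula factors:", formula)
print("generalized ARE:", generalized_are)
```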

SLIDE 47

If $w_1 < 2w_n$ then $\frac{4w_1w_n}{(w_1 + w_n)^2} > \frac{8}{9} \approx 0.89$.
Also $w_2 \le w_1 < 2w_n \le 2w_{n-1}$, so $\frac{4w_2w_{n-1}}{(w_2 + w_{n-1})^2} > \frac{8}{9}$, and so on.
So generalized ARE $> \frac{8}{9}$.
Conclusion: if optimal weights vary by less than 2 : 1, OLS is at least 89% efficient.
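The 8/9 bound depends only on the ratio of the larger to the smaller weight, which a few lines of arithmetic can confirm. The function name `efficiency_factor` and the grid of ratios are assumptions made for this sketch.

```python
def efficiency_factor(w_hi, w_lo):
    """The factor 4 w_hi w_lo / (w_hi + w_lo)^2 appearing in the worst-case ARE."""
    return 4.0 * w_hi * w_lo / (w_hi + w_lo) ** 2

# Scan weight ratios w_hi / w_lo on a grid inside [1, 2).
ratios = [1.0 + 0.01 * i for i in range(100)]
worst = min(efficiency_factor(r, 1.0) for r in ratios)
at_two = efficiency_factor(2.0, 1.0)     # boundary case w_hi = 2 w_lo
print(f"smallest factor on the grid: {worst:.4f}, value at ratio 2: {at_two:.4f}")
```

The factor equals 1 at ratio 1 and decreases as the ratio grows, so every ratio below 2 gives a value above 8/9.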

SLIDE 48

In the worst case, the canonical AREs are the factors $\frac{4w_jw_{n-j+1}}{(w_j + w_{n-j+1})^2}$, $j = 1, 2, \ldots, k$.
So if we are interested in estimating the linear combination $\lambda^T\beta$,
$\frac{4w_1w_n}{(w_1 + w_n)^2} \le \frac{\mathrm{var}\left(\lambda^T\hat\beta_w\right)}{\mathrm{var}\left(\lambda^T\hat\beta_{OLS}\right)} \le \frac{4w_kw_{n-k+1}}{(w_k + w_{n-k+1})^2}$.

SLIDE 49

So again, if $w_1 < 2w_n$, then $\frac{\mathrm{var}\left(\lambda^T\hat\beta_w\right)}{\mathrm{var}\left(\lambda^T\hat\beta_{OLS}\right)} > \frac{8}{9}$ and, for all $\lambda$, the ARE of $\hat\beta_{OLS}$ for estimating $\lambda^T\beta$ is at least 0.89.
That is, there are no linear combinations $\lambda^T\beta$ for which $\hat\beta_{OLS}$ performs especially badly.

SLIDE 50

Note: the above inequalities are true for general, non-diagonal, $W$ if “optimal weight” is replaced by “eigenvalue of $W$”.
They also generalize to arbitrary $W$ and $U$ if $W$ is replaced by $U^{-1}W$.