Hypothesis Testing: Large Sample Asymptotic Theory, Part IV



SLIDE 1

Hypothesis Testing: Large Sample

Asymptotic Theory — Part IV

James J. Heckman, University of Chicago, Econ 312. This draft: April 18, 2006.

SLIDE 2

This lecture consists of three sections:

  • Three commonly used tests
  • Composite hypothesis
  • Application to linear regression model

Throughout, the log likelihood is

$$\ln \mathcal{L}(\theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta), \quad \text{where the } x_i \text{ are i.i.d.}$$

SLIDE 3

1 Three commonly used tests

In this section we examine three commonly used (and asymptotically equivalent) tests:

  • 1. Wald Test
  • 2. Rao, Score or Lagrange Multiplier (LM) Test
  • 3. LR Test

We motivate the derivation of the test statistic in each case and show the asymptotic equivalence of the three tests.

SLIDE 4

1.1 Wald Test

Consider a simple null hypothesis $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$. A natural test uses the fact¹ that

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N\!\left(0,\ I(\theta_0)^{-1}\right),$$

so that

$$n(\hat{\theta} - \theta_0)'\, I(\theta_0)\, (\hat{\theta} - \theta_0) \xrightarrow{d} \chi^2(k) \quad (k = \text{number of parameters}),$$

where

$$I(\theta_0) = -E\left[\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right] = -E\left[\frac{\partial^2 \ln f(x\mid\theta)}{\partial\theta\,\partial\theta'}\right] \text{ for i.i.d. data.}$$

¹See Lecture III on asymptotic theory.

SLIDE 5

This is the Wald test, and it is similar to the test used in OLS hypothesis testing. We use the uniform convergence of $\hat{I}$ to $I(\theta_0)$ to obtain $\operatorname{plim} \hat{I} = I(\theta_0)$, so that the Wald test statistic is

$$W = n(\hat{\theta} - \theta_0)'\, \hat{I}\, (\hat{\theta} - \theta_0) \xrightarrow{d} \chi^2(k)$$

in large samples. Note that the Wald test is based on the unrestricted MLE model.
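As a minimal numerical sketch (not part of the lecture), take a Poisson($\lambda$) model, for which the MLE is $\hat\lambda = \bar{x}$ and the per-observation information is $I(\lambda) = 1/\lambda$; the sample size, seed, and null value below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson(lam) example: ln f(x|lam) = x*ln(lam) - lam - ln(x!)
# MLE: lam_hat = sample mean; per-observation information: I(lam) = 1/lam.
n = 500
lam_true = 2.0
x = rng.poisson(lam_true, size=n)

lam_hat = x.mean()                      # unrestricted MLE
lam0 = 2.0                              # null value H0: lam = lam0

# Wald statistic: n*(lam_hat - lam0)^2 * I_hat, with I_hat = 1/lam_hat,
# built entirely from the unrestricted fit.
W = n * (lam_hat - lam0) ** 2 / lam_hat

crit = 3.841  # chi-square(1) 95% critical value
reject = W > crit
```

Because the null value here equals the data-generating value, `W` will typically fall below the critical value.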

SLIDE 6

Recall that at the true parameter vector $\theta = \theta_0$,

$$E\left[\frac{\partial \ln f(x\mid\theta)}{\partial \theta}\right] = \int \frac{\partial \ln f(x\mid\theta)}{\partial \theta}\bigg|_{\theta=\theta_0} f(x\mid\theta_0)\,dx = 0,$$

because

$$\int f(x\mid\theta_0)\,dx = 1 \;\Rightarrow\; \frac{\partial}{\partial\theta}\int f(x\mid\theta)\,dx = 0,$$

but $\dfrac{\partial f}{\partial\theta} = \dfrac{\partial \ln f}{\partial\theta}\, f$, so

$$\int \frac{\partial \ln f(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} f(x\mid\theta_0)\,dx = 0.$$

SLIDE 7

Differentiating again,

$$\int \frac{\partial^2 \ln f(x\mid\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta_0} f(x\mid\theta_0)\,dx + \int \frac{\partial \ln f(x\mid\theta)}{\partial\theta}\bigg|_{\theta=\theta_0} \frac{\partial \ln f(x\mid\theta)}{\partial\theta'}\bigg|_{\theta=\theta_0} f(x\mid\theta_0)\,dx = 0.$$

Therefore,

$$I(\theta_0) = E\left[\frac{1}{n}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta=\theta_0}\right)\left(\frac{\partial \ln \mathcal{L}}{\partial\theta'}\bigg|_{\theta=\theta_0}\right)\right].$$
SLIDE 8

Cramer-Rao Lower Bound (Scalar Case). Consider an estimator $t(x)$ of $\theta$:

$$E(t(x)) = \int t(x)\, f(x;\theta)\,dx.$$

Under regularity,

$$\frac{\partial E(t(x))}{\partial\theta} = \int t(x)\,\frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = E\left(t(x)\,\frac{\partial \ln f}{\partial\theta}\right).$$

SLIDE 9

$$\frac{\partial E(t(x))}{\partial\theta} = \int t(x)\,\frac{\partial f}{\partial\theta}\,dx = \int t(x)\,\frac{\partial \ln f}{\partial\theta}\, f\,dx = \operatorname{Cov}\left(t(x),\ \frac{\partial \ln f}{\partial\theta}\right),$$

because $E\left(\dfrac{\partial \ln f}{\partial\theta}\right) = 0$.

SLIDE 10

From the Cauchy-Schwarz inequality,

$$\left(\frac{\partial E(t(x))}{\partial\theta}\right)^2 \le \operatorname{Var}(t(x)) \cdot \underbrace{E\left[\left(\frac{\partial \ln f}{\partial\theta}\right)^2\right]}_{I(\theta)}.$$

For a full-rank information matrix $I(\theta_0)$,

$$\operatorname{Var}(t(x)) \ge \frac{\left(\partial E(t(x))/\partial\theta\right)^2}{I(\theta)}.$$

SLIDE 11

If $t(x)$ is unbiased, $\dfrac{\partial E(t(x))}{\partial\theta} = 1$, so

$$\operatorname{Var}(t(x)) \ge \frac{1}{I(\theta)}.$$

There is a vector version: $\operatorname{Var}(t(x)) \ge I(\theta)^{-1}$. One cannot do better than the MLE in terms of asymptotic efficiency.
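The bound can be illustrated by Monte Carlo. A sketch of my own construction, not from the lecture, using the normal-mean case with known variance, where the sample mean is unbiased and attains the bound exactly ($I(\mu) = 1/\sigma^2$ per observation, so the bound for $n$ observations is $\sigma^2/n$); all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal(mu, sigma^2) with sigma known: per-observation information
# I(mu) = 1/sigma^2, so the Cramer-Rao bound for n observations is sigma^2 / n.
mu, sigma, n, reps = 0.0, 2.0, 50, 20000

crlb = sigma ** 2 / n

# The sample mean is unbiased; its Monte Carlo variance should sit at the bound.
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
mc_var = means.var()
```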

SLIDE 12

1.2 Rao Test (also LM or Score Test)

The second test, the Rao (LM) test, is based on the restricted model. It observes that in a large enough sample, $\theta_0$ (the true parameter value) should be a root of the likelihood equation:

$$\frac{1}{n}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \ln f(x_i\mid\theta)}{\partial\theta}\bigg|_{\theta_0} = 0,$$

i.e., it imposes the null on the model for all sample sizes. In contrast, the Wald test gets its statistic from estimates of the unrestricted model (i.e., a model where the null is not imposed on the estimates). The likelihood ratio test compares the restricted likelihood with the unrestricted likelihood.

SLIDE 13

At the unrestricted MLE,

$$\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat{\theta}_{ML}} = 0 \quad \text{and} \quad E\left[\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\bigg|_{\theta_0}\right] = 0,$$

and since

$$\frac{1}{n}\frac{\partial \ln \mathcal{L}}{\partial\theta} \xrightarrow{p} E\left[\frac{\partial \ln f(x\mid\theta)}{\partial\theta}\right] = 0,$$

we have

$$\frac{1}{n}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{p} 0.$$

SLIDE 14

Now, by the central limit theorem (Lindeberg-Levy, for i.i.d. data),

$$\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial \ln f(x_i\mid\theta)}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{d} N(0,\ I(\theta_0)),$$

which implies that the hypothesis can be tested by testing whether the score $\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} = 0$ at the restricted parameter values.²

²For the distributional results, refer to earlier lectures on asymptotic theory.

SLIDE 15

Thus this test uses the statistic:

$$LM = \left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' \hat{I}^{-1} \left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) \xrightarrow{d} \chi^2(k).$$

That $LM \xrightarrow{d} \chi^2(k)$ in large samples can be shown using $\operatorname{plim} \hat{I} = I(\theta_0)$.
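A minimal sketch, not from the lecture, assuming a Poisson($\lambda$) model where the per-observation score is $x/\lambda - 1$ and $I(\lambda) = 1/\lambda$: everything is evaluated at the null, so the unrestricted MLE is never computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score (LM) test for H0: lam = lam0 in a Poisson(lam) model.
# Per-observation score: d ln f / d lam = x/lam - 1; information: I(lam) = 1/lam.
n = 500
x = rng.poisson(2.0, size=n)
lam0 = 2.0

score = (x / lam0 - 1.0).sum()        # d ln L / d lam at the restricted value
info0 = 1.0 / lam0                    # information per observation at lam0

# LM = (n^{-1/2} score)' * I^{-1} * (n^{-1/2} score),
# which simplifies to score^2 / (n * info0) in the scalar case.
LM = score ** 2 / (n * info0)

reject = LM > 3.841                   # chi-square(1), 5% level
```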

SLIDE 16

The Rao test is also called the Score Test or the Lagrange Multiplier test. This is because the test can be motivated from the solution to a constrained maximization of the log likelihood subject to the constraint $\theta = \theta_0$. We get:

$$\text{Lagrangian: } \ln \mathcal{L} - \lambda'(\theta - \theta_0)$$

$$\text{FOC: } \frac{\partial \ln \mathcal{L}}{\partial\theta} - \lambda = 0$$

At the null: $\lambda = \dfrac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0} \approx 0$.

Thus one can test on $\lambda$ (the Lagrange multiplier) or on the score $\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta}\right)$.

SLIDE 17

Asymptotic Equivalence of Wald and LM tests. To establish the asymptotic relationship between the two tests, we use Taylor's theorem to write:

$$\underbrace{\frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\hat{\theta}}}_{=0 \text{ by construction}} = \frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0} + \frac{1}{n}\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{n}(\hat{\theta} - \theta_0),$$

where $\theta^*$ is an intermediate value with $\|\theta_0\| \le \|\theta^*\| \le \|\hat{\theta}\|$.

SLIDE 18

From the above, we get the duality relationship between the score and parameter vectors:

$$(\hat{\theta} - \theta_0) = -\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0},$$

$$\sqrt{n}(\hat{\theta} - \theta_0) = -\left(\frac{1}{n}\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0}.$$

Noting that $\operatorname{plim}\,\theta^* = \theta_0$ and substituting into the Wald statistic, we get:

$$\operatorname{plim} W = \operatorname{plim}\left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right)' I_0^{-1}\, I_0\, I_0^{-1} \left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\theta_0}\right) = \operatorname{plim} LM.$$

Thus, asymptotically, the Wald and Rao tests are equivalent.

SLIDE 19

1.3 Likelihood Ratio Test

The third commonly used test is the likelihood ratio test; it uses both the restricted and the unrestricted models. Taylor expanding the log likelihood around the point $\hat{\theta}$, we get:

$$\ln \mathcal{L}(\theta_0) = \ln \mathcal{L}(\hat{\theta}) + \underbrace{\frac{\partial \ln \mathcal{L}}{\partial\theta}\bigg|_{\hat{\theta}}(\theta_0 - \hat{\theta})}_{=0 \text{ by construction}} + \frac{1}{2}(\theta_0 - \hat{\theta})'\left(\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)(\theta_0 - \hat{\theta}),$$

where $\theta^*$ is an intermediate value with $\|\theta_0\| \le \|\theta^*\| \le \|\hat{\theta}\|$.

SLIDE 20

Rearranging the expansion (and writing $\ln \mathcal{L}_0 \equiv \ln \mathcal{L}(\theta_0)$, $\ln \hat{\mathcal{L}} \equiv \ln \mathcal{L}(\hat{\theta})$):

$$2\left(\ln \hat{\mathcal{L}} - \ln \mathcal{L}_0\right) = -(\hat{\theta} - \theta_0)'\,\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\,(\hat{\theta} - \theta_0).$$

SLIDE 21

$$2\left(\ln \hat{\mathcal{L}} - \ln \mathcal{L}_0\right) = \sqrt{n}(\hat{\theta} - \theta_0)'\left(-\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{n}(\hat{\theta} - \theta_0),$$

where

$$-\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*} \xrightarrow{p} I(\theta_0) \quad \text{and} \quad \sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N\!\left(0,\ I(\theta_0)^{-1}\right).$$

SLIDE 22

$$2\left[\ln \mathcal{L}(\hat{\theta}) - \ln \mathcal{L}(\theta_0)\right] = \sqrt{n}(\hat{\theta} - \theta_0)'\left(-\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{n}(\hat{\theta} - \theta_0)$$

$$\Rightarrow\; 2\ln\left[\frac{\mathcal{L}(\hat{\theta})}{\mathcal{L}(\theta_0)}\right] = \sqrt{n}(\hat{\theta} - \theta_0)'\, I_0\, \sqrt{n}(\hat{\theta} - \theta_0) + o_p(1).$$

We may also write

$$2\ln\left[\frac{\mathcal{L}(\hat{\theta})}{\mathcal{L}(\theta_0)}\right] = Z'I_0Z \xrightarrow{d} \chi^2(k), \quad \text{with } Z \sim N(0,\ I_0^{-1}) \text{ and } Z = \sqrt{n}(\hat{\theta} - \theta_0).$$

SLIDE 23

Based on the above derivation, we get the likelihood ratio test statistic:

$$LR = -2\ln\left(\frac{\hat{\mathcal{L}}_R}{\hat{\mathcal{L}}_U}\right),$$

where $\hat{\mathcal{L}}_R$ is the restricted maximized likelihood function and $\hat{\mathcal{L}}_U$ is the unrestricted maximized likelihood function, and, as shown above, $LR \xrightarrow{d} \chi^2(k)$.

Note that $\hat{\mathcal{L}}_U \ge \hat{\mathcal{L}}_R$ (an unrestricted maximized function must be greater than or equal to the restricted maximized function), so that always $LR \ge 0$.
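A minimal sketch, not from the lecture, again assuming a Poisson($\lambda$) model so the restricted and unrestricted maximized log likelihoods have closed forms (the restricted "model" is simply $\lambda = \lambda_0$):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)

# LR test for H0: lam = lam0 in a Poisson model.
n = 500
x = rng.poisson(2.0, size=n)
lam0 = 2.0

def loglik(lam, x):
    # Poisson log likelihood: sum_i [x_i*ln(lam) - lam - ln(x_i!)]
    return np.sum(x * np.log(lam) - lam
                  - np.array([lgamma(xi + 1.0) for xi in x]))

lam_hat = x.mean()                       # unrestricted MLE
LR = 2.0 * (loglik(lam_hat, x) - loglik(lam0, x))

reject = LR > 3.841                      # chi-square(1), 5% level
```

The `ln(x!)` terms cancel in the difference, but keeping them makes each `loglik` value a genuine log likelihood.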

SLIDE 24

Asymptotic Equivalence of LR and Wald tests. From the derivation above, it is clear that asymptotically the LR test statistic converges to the Wald test statistic:

$$\operatorname{plim} LR = \underbrace{\sqrt{n}(\hat{\theta} - \theta_0)'}_{\xrightarrow{d}\ Z'}\ I_0\ \underbrace{\sqrt{n}(\hat{\theta} - \theta_0)}_{\xrightarrow{d}\ Z} = \operatorname{plim} W,$$

where $Z \sim N(0,\ I_0^{-1})$. We saw earlier that the LM and Wald tests were asymptotically equivalent. Thus, along with the above asymptotic equivalence of LR and Wald, we get that all three commonly used tests are equivalent, i.e., asymptotically:

$$W \approx LR \approx LM.$$
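The equivalence can be seen on one large simulated sample. A sketch of my own construction, using a Poisson($\lambda$) model where all three statistics have closed forms; the null and data-generating values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# All three statistics for H0: lam = lam0 in a Poisson model, one large sample.
n = 20000
lam0 = 2.0
x = rng.poisson(2.1, size=n)          # true value slightly off the null
lam_hat = x.mean()

W  = n * (lam_hat - lam0) ** 2 / lam_hat   # unrestricted information 1/lam_hat
LM = n * (lam_hat - lam0) ** 2 / lam0      # restricted information 1/lam0
LR = 2.0 * np.sum(x * np.log(lam_hat / lam0) - (lam_hat - lam0))

# Asymptotic equivalence: the three agree to first order.
spread = max(W, LM, LR) - min(W, LM, LR)
```

All three statistics differ only through which consistent information estimate they use, so the spread between them is small relative to their common value.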

SLIDE 25

2 Composite Hypothesis

Consider a parameter vector $\theta = (\theta_1, \theta_2)$ (where $\theta_1$ and $\theta_2$ are possibly vectors). In many situations, we may want to consider estimating only a subset of the parameters, $\theta_2$. This could be because we omit $\theta_1$ or because we hypothesize that $\theta_1 = 0$. In this context, two questions are relevant:

  • When do we get an unbiased estimate of $\theta_2$ if we do the MLE estimation omitting $\theta_1$?
  • How do the results obtained in the previous section apply to testing the composite hypothesis $H_0: \theta_1 = 0$ with $\theta_2$ unrestricted, against $H_1: \theta_1$ and $\theta_2$ both unrestricted?

We explore these two questions in the next few subsections.

SLIDE 26

2.1 Definitions

We use the following defined expressions in the rest of the section.

  • Define the true parameter vector $\theta_0 \equiv (\bar{\theta}_1, \bar{\theta}_2)$;
  • Define the estimated parameter vector (unconstrained) as $\hat{\theta} \equiv (\hat{\theta}_1, \hat{\theta}_2)$;
  • Define the estimated parameter vector (constrained, i.e. under the null hypothesis setting $\theta_1 = 0$) as $\tilde{\theta} \equiv (0, \tilde{\theta}_2)$;
  • In Taylor expansions hereafter, we use $\theta_0$ and the intermediate value $\theta^*$ interchangeably (neglecting $o_p(1)$ terms).

SLIDE 27

  • Define the various information-matrix-related terms as follows:

$$-\frac{1}{n}\,E\!\left(\frac{\partial^2 \ln \mathcal{L}}{\partial\theta\,\partial\theta'}\right)\bigg|_{\theta_0} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}.$$

SLIDE 28

2.2 When is $\hat{\theta}_2$ unbiased if $\theta_1$ is omitted?

In this subsection, we show that the condition for $\hat{\theta}_2$ to be an unbiased estimator of $\bar{\theta}_2$, when the model is misspecified and estimated omitting $\theta_1$, is that the score vectors $\partial \ln \mathcal{L}/\partial\theta_1$ and $\partial \ln \mathcal{L}/\partial\theta_2$ be orthogonal to each other.

SLIDE 29

First, Taylor expanding the score vectors with respect to $\theta_1$ and $\theta_2$ around $\theta_0$, and using the fact that both scores are zero at the unrestricted MLE $\hat{\theta}$, we get:

$$(1)\quad 0 = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)\bigg|_{\theta_0} - I_{11}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) - I_{12}\sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2) + o_p(1)$$

$$(2)\quad 0 = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)\bigg|_{\theta_0} - I_{22}\sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2) - I_{21}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) + o_p(1)$$

SLIDE 30

Collecting terms, we get:

$$\begin{pmatrix} \sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) \\ \sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2) \end{pmatrix} = [I]^{-1}\begin{pmatrix} \frac{1}{\sqrt{n}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)\big|_{\theta_0} \\[4pt] \frac{1}{\sqrt{n}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)\big|_{\theta_0} \end{pmatrix}.$$

Next consider $\tilde{\theta}_2$ (the MLE of $\theta_2$ given $\theta_1 = \bar{\theta}_1 = 0$). Expanding around the root of the likelihood equation with $\theta_1$ fixed, we get:

$$(3)\quad 0 = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)\bigg|_{\theta_0} - I_{22}\sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2) + o_p(1).$$

The first term on the right-hand side is common with (2) above. Therefore:

$$I_{22}\sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2) = I_{22}\sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2) + I_{21}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) + o_p(1).$$

SLIDE 31

Or:

$$(4)\quad \sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2) = \sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2) + I_{22}^{-1}I_{21}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) + o_p(1).$$

Thus, for $\sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2) = \sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2)$ we need:

$$I_{21} = E\left[\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)'\right] = 0,$$

i.e., for $\tilde{\theta}$ (the constrained estimator) to have the same properties as $\hat{\theta}$ (the unconstrained estimator), we need the score vectors $\partial \ln \mathcal{L}/\partial\theta_1$ and $\partial \ln \mathcal{L}/\partial\theta_2$ to be uncorrelated.

SLIDE 32

Similarity with OLS result: Note that this result is analogous to a result in the OLS framework. In the OLS model we have:

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$

If we run the regression omitting $X_1$, we have:

$$y = X_2\beta_2 + \{\varepsilon + X_1\beta_1\}.$$

Then, under standard OLS assumptions:

$$\operatorname{plim}\hat{\beta}_2 = \beta_2 + \operatorname{plim}\left(\frac{X_2'X_2}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_2'X_1}{n}\right)\beta_1,$$

so that we get a consistent estimate $\hat{\beta}_2$ if $X_2$ and $X_1$ are orthogonal to each other.³

³Note $\operatorname{plim}\, n^{-1}X_2'X_1 = E[X_2'X_1]$ by an appropriate law of large numbers.

SLIDE 33

Note that the score vectors in MLE are analogous to the data vectors in OLS; we shall revisit this analogy in Section 3 below.
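The OLS analogue can be simulated directly. A sketch of my own construction (the coefficient values and the correlation between regressors are arbitrary): omitting a correlated regressor biases the short-regression slope, while omitting an orthogonal one does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100000

x1 = rng.normal(size=n)
x2c = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1, var = 1
x2o = rng.normal(size=n)                     # orthogonal to x1
eps = rng.normal(size=n)
b1, b2 = 1.0, 2.0

def short_reg_slope(x2, x1):
    # Slope from regressing y = b1*x1 + b2*x2 + eps on x2 alone (x1 omitted).
    y = b1 * x1 + b2 * x2 + eps
    return (x2 @ y) / (x2 @ x2)

biased   = short_reg_slope(x2c, x1)   # picks up b1 * cov(x1,x2)/var(x2) = 0.8
unbiased = short_reg_slope(x2o, x1)   # orthogonal case: consistent for b2
```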

SLIDE 34

2.3 Hypothesis testing results for the composite hypothesis

In this subsection, we show the following results:

  • Asymptotic equivalence of the LR test and the Wald test;
  • Asymptotic equivalence of the Rao test (Score/LM test) and the LR test.

SLIDE 35

2.3.1 Equivalence between Likelihood Ratio and Wald Tests

From the definition of the LR test statistic in Section 1.3, the analogous definition for the composite hypothesis case is:

$$LR = 2\ln\frac{\mathcal{L}(\hat{\theta}_1, \hat{\theta}_2)}{\mathcal{L}(\theta_1 = 0, \tilde{\theta}_2)} = -2\ln\frac{\mathcal{L}(\theta_1 = 0, \tilde{\theta}_2)}{\mathcal{L}(\hat{\theta}_1, \hat{\theta}_2)},$$

or:

$$LR = -2\left[\ln \mathcal{L}(\theta_1 = 0, \tilde{\theta}_2) - \ln \mathcal{L}(\bar{\theta}_1, \bar{\theta}_2)\right] + 2\left[\ln \mathcal{L}(\hat{\theta}_1, \hat{\theta}_2) - \ln \mathcal{L}(\bar{\theta}_1, \bar{\theta}_2)\right].$$

SLIDE 36

Following the same steps as the derivation in Section 1.3, we get:

$$LR = \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix}' I_0\, \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix} - \sqrt{n}\begin{pmatrix}0 \\ \tilde{\theta}_2 - \bar{\theta}_2\end{pmatrix}' I_0\, \sqrt{n}\begin{pmatrix}0 \\ \tilde{\theta}_2 - \bar{\theta}_2\end{pmatrix} + o_p(1),$$

or:

$$LR = \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix}' I_0\, \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix} - \sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2)'\, I_{22}\, \sqrt{n}(\tilde{\theta}_2 - \bar{\theta}_2) + o_p(1).$$

SLIDE 37

Substituting for $(\tilde{\theta}_2 - \bar{\theta}_2)$ from equation (4) in Section 2.2, we get:

$$LR = \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix}' I_0\, \sqrt{n}\begin{pmatrix}\hat{\theta}_1 - \bar{\theta}_1 \\ \hat{\theta}_2 - \bar{\theta}_2\end{pmatrix} - n\left[(\hat{\theta}_2 - \bar{\theta}_2) + I_{22}^{-1}I_{21}(\hat{\theta}_1 - \bar{\theta}_1)\right]' I_{22} \left[(\hat{\theta}_2 - \bar{\theta}_2) + I_{22}^{-1}I_{21}(\hat{\theta}_1 - \bar{\theta}_1)\right] + o_p(1).$$

Call $u \equiv \sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1)$ and $v \equiv \sqrt{n}(\hat{\theta}_2 - \bar{\theta}_2)$.

SLIDE 38

Then we have:

$$LR = u'I_{11}u + v'I_{21}u + u'I_{12}v + v'I_{22}v - u'I_{12}I_{22}^{-1}I_{22}I_{22}^{-1}I_{21}u - u'I_{12}I_{22}^{-1}I_{22}v - v'I_{22}v - v'I_{22}I_{22}^{-1}I_{21}u + o_p(1)$$

$$= u'\left[I_{11} - I_{12}I_{22}^{-1}I_{21}\right]u + o_p(1)$$

$$(5)\quad = u'(I^{11})^{-1}u + o_p(1),$$

where $I^{11}$ is the upper-left diagonal block of:

$$I_0^{-1} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}^{-1}.$$

SLIDE 39

(See the partitioned inverse result established in Section 3.) In the composite hypothesis case, the Wald test exploits the fact that:

$$\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) \xrightarrow{d} N(0,\ I^{11}),$$

so that the Wald test statistic here is:

$$(6)\quad W = \sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1)'(I^{11})^{-1}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) + o_p(1).$$

From (5) and (6) above, we get directly the asymptotic equivalence of the Wald and LR tests, i.e., $W \approx LR$ in large samples.

SLIDE 40

2.3.2 Equivalence between the Rao (Score/LM) test and the other tests

To derive the statistic for the Rao (Score/LM) test, first compute the derivative with respect to $\theta_1$ when the constraint on $\theta_1$ is imposed:

$$(7)\quad \frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde{\theta}} = \frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}\sqrt{n}\left(\tilde{\theta}_2 - \bar{\theta}_2\right) + o_p(1).$$

SLIDE 41

From equation (3) in Section 2.2 we have:

$$\sqrt{n}\left(\tilde{\theta}_2 - \bar{\theta}_2\right) = I_{22}^{-1}\,\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1).$$

Substituting this result into (7), we get:

$$\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde{\theta}} = \frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}I_{22}^{-1}\,\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1) = S_1 - I_{12}I_{22}^{-1}S_2 + o_p(1),$$

defining $S_1 \equiv \dfrac{1}{\sqrt{n}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)\bigg|_{\theta_0}$ and $S_2 \equiv \dfrac{1}{\sqrt{n}}\left(\dfrac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)\bigg|_{\theta_0}$.

SLIDE 42

Then we obtain the variance of the key score as:

$$\operatorname{Var}\left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde{\theta}}\right) = E\left[\left(S_1 - I_{12}I_{22}^{-1}S_2\right)\left(S_1 - I_{12}I_{22}^{-1}S_2\right)'\right]$$

$$= I_{11} + I_{12}I_{22}^{-1}I_{22}I_{22}^{-1}I_{21} - I_{12}I_{22}^{-1}I_{21} - I_{12}I_{22}^{-1}I_{21} = I_{11} - I_{12}I_{22}^{-1}I_{21} = (I^{11})^{-1}.$$

SLIDE 43

Thus we have the Rao test statistic:

$$LM = \left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde{\theta}}\right)' I^{11} \left(\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\tilde{\theta}}\right) = \left[S_1 - I_{12}I_{22}^{-1}S_2\right]' I^{11} \left[S_1 - I_{12}I_{22}^{-1}S_2\right].$$

From Section 2.2 and the results for partitioned inverses (see Section 2.3.1), we have:

$$\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) = I^{11}\,\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} + I^{12}\,\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1)$$

$$= I^{11}\left[\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}I_{22}^{-1}\,\frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\bigg|_{\theta_0}\right] + o_p(1) = I^{11}\left[S_1 - I_{12}I_{22}^{-1}S_2\right] + o_p(1).$$

SLIDE 44

Substituting this into the expression for the Wald statistic, we get:

$$W = \sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1)'(I^{11})^{-1}\sqrt{n}(\hat{\theta}_1 - \bar{\theta}_1) = \left[S_1 - I_{12}I_{22}^{-1}S_2\right]' I^{11}\left[S_1 - I_{12}I_{22}^{-1}S_2\right] + o_p(1) = LM \ \text{(Rao/Score)}.$$

Thus, along with the result in Section 2.3.1, even for the composite hypothesis case we have, asymptotically: $W \approx LR \approx LM$ (Rao).

SLIDE 45

3 Application to linear regression model

In this section, we look at some analogies between the standard MLE results derived in the earlier sections and OLS regression results. (Recall that we already looked at some analogies between OLS and MLE in Section 2.2 above.) First, we show the analogy between OLS and MLE for estimation of parameter sub-vectors. Then we derive expressions for the three common test statistics and establish certain results for them.

SLIDE 46

3.1 Estimation of parameter subsets

A key result in regression is the Theorem of the Partitioned Inverse:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

assumed nonsingular. Assume $A_{11}$ is nonsingular, and let $D \equiv A_{22} - A_{21}A_{11}^{-1}A_{12}$. Then

$$A^{-1} = \begin{bmatrix} A_{11}^{-1}\left(I + A_{12}D^{-1}A_{21}A_{11}^{-1}\right) & -A_{11}^{-1}A_{12}D^{-1} \\ -D^{-1}A_{21}A_{11}^{-1} & D^{-1} \end{bmatrix}.$$

Proof: multiply it out.
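Instead of multiplying it out by hand, the formula can be verified numerically. A sketch using an arbitrary random positive definite matrix partitioned into 2+3 blocks:

```python
import numpy as np

rng = np.random.default_rng(4)

# Build a random symmetric positive definite 5x5 matrix and partition it 2+3.
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

# Schur complement of A11 and the partitioned-inverse blocks.
A11inv = np.linalg.inv(A11)
D = A22 - A21 @ A11inv @ A12
Dinv = np.linalg.inv(D)

top_left     = A11inv @ (np.eye(2) + A12 @ Dinv @ A21 @ A11inv)
top_right    = -A11inv @ A12 @ Dinv
bottom_left  = -Dinv @ A21 @ A11inv
bottom_right = Dinv

block_inv = np.block([[top_left, top_right], [bottom_left, bottom_right]])
direct_inv = np.linalg.inv(A)
max_err = np.abs(block_inv - direct_inv).max()
```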

SLIDE 47

In the OLS case, the theorem produces a useful result (Frisch-Waugh-Goldberger). Partition the matrix of independent variables as $X = (X_1\ X_2)$, so that

$$X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}.$$

SLIDE 48

We define:

$$M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1',$$

$$D = X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 = X_2'M_1X_2.$$

Now we have the result for the OLS regression:

$$(8)\quad \hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = (X'X)^{-1}(X'y) = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1}\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.$$

SLIDE 49

Then, using the results from the theorem of the partitioned inverse above and simplifying, we can write:

$$\hat{\beta}_1 = (X_1'X_1)^{-1}(X_1'y) + (X_1'X_1)^{-1}X_1'X_2\,D^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2\,D^{-1}X_2'y$$

$$= (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}(X_1'X_2)\,D^{-1}X_2'M_1y,$$

$$(9)\quad \hat{\beta}_2 = -D^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'y + D^{-1}X_2'y = D^{-1}X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)y = D^{-1}X_2'M_1y.$$

This leads to a Double Residual Regression result for $\hat{\beta}_2$: regress $y$ on $X_1$, and $X_2$ on $X_1$; form residuals; regress one set of residuals on the other.

SLIDE 50

Result: $\hat{\beta}_2$ is the regression of "cleaned out $y$" on "cleaned out $X_2$". This result follows directly from results (8) and (9) above. Define:

$$\text{"Cleaned out } y\text{"} \equiv \left[I - X_1(X_1'X_1)^{-1}X_1'\right]y = M_1y \quad \text{(residual from the regression of } y \text{ on } X_1\text{)},$$

$$\text{"Cleaned out } X_2\text{"} \equiv X_2^* = \left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2 = M_1X_2 \quad \text{(residual from the regression of } X_2 \text{ on } X_1\text{)}.$$

SLIDE 51

Then from (9) above we have:

$$\hat{\beta}_2 = D^{-1}X_2'M_1y = \left[X_2'M_1X_2\right]^{-1}(X_2'M_1y) = \left[X_2^{*\prime}X_2^*\right]^{-1}(X_2^{*\prime}y).$$

Note that we really don't have to clean out $y$, just $X_2$, since $M_1$ is idempotent. Further, if $X_1$ and $X_2$ are orthogonal (uncorrelated), then $X_2^* = X_2$, and hence an unbiased/consistent estimate $\hat{\beta}_2$ can be obtained by directly regressing $y$ on $X_2$. (Recall that we derived the same result for MLE in Section 2.2.)
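The double residual regression can be checked numerically; a sketch with an arbitrary simulated design (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

# y = X1*b1 + X2*b2 + eps with correlated X1 and X2.
X1 = rng.normal(size=(n, 2))
X2 = X1 @ np.array([[0.5], [0.3]]) + rng.normal(size=(n, 1))
X = np.hstack([X1, X2])
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

# Full regression.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Double residual (FWL): residualize y and X2 on X1, then regress.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_star = M1 @ y
X2_star = M1 @ X2
beta2_fwl = np.linalg.lstsq(X2_star, y_star, rcond=None)[0]

# The coefficient on X2 from the full regression equals the
# residual-on-residual coefficient up to floating point error.
gap = abs(beta_full[2] - beta2_fwl[0])
```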

SLIDE 52

Also observe (derived directly from $\hat{y} = X_1\hat{\beta}_1 + X_2\hat{\beta}_2$) that:

$$\hat{y}'\hat{y} = \underbrace{y'X_1(X_1'X_1)^{-1}X_1'y}_{\text{part due to } X_1} + \underbrace{y'M_1X_2\left[X_2'M_1X_2\right]^{-1}X_2'M_1y}_{\text{part due to orthogonalized } X_2}.$$

Thus, unless the regressors are orthogonal, there are no unique contributions.

Proof.

  • 1. Observe that:

$$M_1X_1 = 0,\quad M_2X_2 = 0,\quad M_1y = y_1^*,\quad M_1X_2 = X_2^*,\quad y = \hat{y} + e = \left(X_1\hat{\beta}_1 + X_2\hat{\beta}_2\right) + e.$$

  • 2. Observe that:

$$X_2'M_1e = X_2'e - X_2'X_1(X_1'X_1)^{-1}X_1'e = 0,$$

since $X_1'e = 0$ and $X_2'e = 0$.

  • 3. Premultiplying $y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + e$ by $X_2'M_1$:

$$X_2'M_1y = X_2'M_1X_1\hat{\beta}_1 + X_2'M_1X_2\hat{\beta}_2 + X_2'M_1e,$$

where $X_2'M_1X_1 = 0$ and $X_2'M_1e = 0$. Thus,

$$\hat{\beta}_2 = \left(X_2'M_1X_2\right)^{-1}\left(X_2'M_1y\right).$$

SLIDE 54

  • 4. Observe that $\hat{\beta}_2$ is the result of the regression $y_1^* = X_2^*\hat{\beta}_2 + \text{error}_2$, but

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon \;\Rightarrow\; M_1y = M_1X_1\beta_1 + M_1X_2\beta_2 + M_1\varepsilon \;\Rightarrow\; y_1^* = X_2^*\beta_2 + M_1\varepsilon \quad (M_1X_1 = 0),$$

so $\text{error}_2 = M_1\varepsilon$.

(From Davidson & MacKinnon.)

SLIDE 55

Analogously, for MLE in the neighborhood of the optimum we have:

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) = \left[\frac{1}{n}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right)\left(\frac{\partial \ln \mathcal{L}}{\partial\theta}\right)'\right]^{-1} \frac{1}{\sqrt{n}}\,\frac{\partial \ln \mathcal{L}}{\partial\theta}\,\iota + o_p(1).$$

SLIDE 56

In partitioned form,

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) = \left[\frac{1}{n}\begin{pmatrix} \frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)' & \frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)' \\[4pt] \frac{\partial \ln \mathcal{L}}{\partial\theta_2}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)' & \frac{\partial \ln \mathcal{L}}{\partial\theta_2}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)' \end{pmatrix}\right]^{-1} \times \frac{1}{\sqrt{n}}\begin{pmatrix} \frac{\partial \ln \mathcal{L}}{\partial\theta_1}\,\iota \\[4pt] \frac{\partial \ln \mathcal{L}}{\partial\theta_2}\,\iota \end{pmatrix} + o_p(1),$$

where $\iota$ is a $1 \times n$ vector of ones, i.e. $\iota = (1, 1, \ldots, 1)$, and $\partial \ln \mathcal{L}/\partial\theta_j$ stacks the observation-level scores.

SLIDE 57

Comparing the above equation to the result for the standard OLS regression (see eqn (8) above), we note that the MLE result is analogous to regressing $\iota$ (a vector of ones) on the score vectors; i.e., we have the correspondence:

$$[X_1\ \ X_2] \;\leftrightarrow\; \left[\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\ \ \frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right] \quad \text{and} \quad y \;\leftrightarrow\; \iota.$$

SLIDE 58

Further, we also get:

$$\sqrt{n}\left(\hat{\theta}_2 - \bar{\theta}_2\right) = \left[\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)'\left[I - \frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}'\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)^{-1}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}'\right]\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right]^{-1}$$

$$\times \left(\frac{\partial \ln \mathcal{L}}{\partial\theta_2}\right)'\left[I - \frac{\partial \ln \mathcal{L}}{\partial\theta_1}\left(\frac{\partial \ln \mathcal{L}}{\partial\theta_1}'\frac{\partial \ln \mathcal{L}}{\partial\theta_1}\right)^{-1}\frac{\partial \ln \mathcal{L}}{\partial\theta_1}'\right]\iota + o_p(1),$$

which is analogous to regression result (9) above.

SLIDE 59

3.2 Hypothesis testing results for the linear regression model

Consider the classical linear regression model:

$$y_i = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \text{ i.i.d.},$$

i.e.,

$$f(\varepsilon_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\,\frac{\varepsilon_i^2}{\sigma^2}\right)$$

$$\Rightarrow\; f(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}\left(y_i - x_i'\beta\right)^2\right).$$

SLIDE 60

With i.i.d. sampling, we have the log likelihood function:

$$\ln \mathcal{L} = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i'\beta\right)^2.$$

SLIDE 61

3.2.1 MLE Solution

$$\frac{\partial \ln \mathcal{L}}{\partial\beta} = \frac{1}{\sigma^2}\sum_i x_i\left(y_i - x_i'\beta\right) = 0 \text{ at the optimum yields:}$$

$$\hat{\beta} = \left(\sum_i x_ix_i'\right)^{-1}\sum_i x_iy_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i'\hat{\beta}\right)^2.$$

SLIDE 62

$$-E\left[\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial\beta'}\right] = \frac{E(x_ix_i')}{\sigma^2},$$

$$-E\left[\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial\beta\,\partial(\sigma^2)}\right] = E\left[\frac{1}{n}\,\frac{1}{(\sigma^2)^2}\sum_i x_i\left(y_i - x_i'\beta\right)\right] = 0.$$

(This holds unless the model is such that there is a functional relationship between $\sigma^2$ and $\beta$, which we exclude by assumption.)

SLIDE 63

$$-E\left[\frac{1}{n}\frac{\partial^2 \ln \mathcal{L}}{\partial(\sigma^2)^2}\right] = \frac{1}{2\sigma^4}.$$

This yields the information matrix:

$$I(\theta_0) = \begin{pmatrix} \dfrac{E(x_ix_i')}{\sigma^2} & 0 \\[6pt] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix},$$

which is block diagonal. (By the previous result, we may ignore parameter estimation error between $\hat{\beta}$ and $\hat{\sigma}^2$.)
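The MLE formulas above can be sketched on simulated data (coefficients and sample size are arbitrary); note $\hat{\beta}$ coincides with the OLS estimator and $\hat{\sigma}^2$ is the mean squared residual:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000

# Classical linear regression: y = X b + eps, eps ~ N(0, sigma^2) i.i.d.
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 1.0, -2.0])
sigma = 1.5
y = X @ beta_true + rng.normal(0.0, sigma, size=n)

# MLE: beta_hat = (X'X)^{-1} X'y; sigma2_hat = (1/n) * sum of squared residuals.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n
```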

SLIDE 64

3.2.2 Expressions for the various test statistics

Let $\beta = (\beta_1, \beta_2)$ and $H_0: \beta_2 = 0$. Then the expression for the likelihood ratio test statistic is:

$$LR = 2\ln\left[\frac{\mathcal{L}(\hat{\beta})}{\mathcal{L}(\tilde{\beta})}\right],$$

where $\mathcal{L}(\hat{\beta})$ is the unrestricted maximized likelihood function and $\mathcal{L}(\tilde{\beta})$ is the restricted maximized likelihood function.

SLIDE 65

We have:

$$\ln \mathcal{L}(\tilde{\beta}) = -\frac{n}{2}\ln \tilde{\sigma}^2 - \frac{1}{2\tilde{\sigma}^2}\sum_{i=1}^{n}\left(y_i - x_{1i}'\tilde{\beta}_1\right)^2 + \text{const} = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_i \left(y_i - x_{1i}'\tilde{\beta}_1\right)^2\right] + \text{const},$$

using the fact that $\tilde{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - x_{1i}'\tilde{\beta}_1\right)^2$, so the second term is the constant $-n/2$.

SLIDE 66

Similarly,

$$\ln \mathcal{L}(\hat{\beta}) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_i \left(y_i - x_{1i}'\hat{\beta}_1 - x_{2i}'\hat{\beta}_2\right)^2\right] + \text{const}$$

$$(10)\quad \Rightarrow\; LR = 2\ln\left[\frac{\mathcal{L}(\hat{\beta})}{\mathcal{L}(\tilde{\beta})}\right] = n\ln\left(\frac{y'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]y}{y'\left[I - X(X'X)^{-1}X'\right]y}\right).$$

SLIDE 67

The expression for the Wald test statistic is obtained using the partitioned inverse theorem on the information matrix above and the standard asymptotic normality result for the MLE:

$$\sqrt{n}\left(\hat{\beta}_2 - \bar{\beta}_2\right) \xrightarrow{d} N\!\left(0,\ \sigma^2\,\operatorname{plim}\left[\tfrac{1}{n}X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2\right]^{-1}\right)$$

$$\Rightarrow\; W = \frac{1}{\hat{\sigma}^2}\left(\hat{\beta}_2 - \bar{\beta}_2\right)'\left[X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2\right]\left(\hat{\beta}_2 - \bar{\beta}_2\right),$$

where $\bar{\beta}_2$ is the hypothesized value of $\beta_2$.

SLIDE 68

The test statistic for the Rao (LM/Score) test is obtained by computing the derivative of the log likelihood function at the point where $\beta_2 = 0$ is imposed. We get:

$$\ln \mathcal{L}(\beta) = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{\sum_i \left(y_i - x_{1i}'\beta_1 - x_{2i}'\beta_2\right)^2}{2\sigma^2}$$

$$\Rightarrow\; \frac{1}{\sqrt{n}}\frac{\partial \ln \mathcal{L}}{\partial\beta_2}\bigg|_{\beta_2 = 0} = \frac{1}{\sqrt{n}}\,\frac{\sum_i x_{2i}\left(y_i - x_{1i}'\tilde{\beta}_1\right)}{\tilde{\sigma}^2} = \frac{X_2'\tilde{\varepsilon}}{\sqrt{n}\,\tilde{\sigma}^2},$$

where $\tilde{\varepsilon} = y - X_1\tilde{\beta}_1$ is the restricted residual.
SLIDE 69

As discussed in earlier sections, the Rao test statistic can now be formed by examining the asymptotic distribution of the score vector. In this linear regression context, the Rao test can also be motivated in another intuitive way. Basically, the Rao test considers the restrictions to be valid if the score is approximately zero (i.e., the log likelihood function is maximized) when the restrictions are imposed. Looking at the right-hand side of the expression above, in the linear regression case this is equivalent to checking whether the residual from the restricted MLE, $\tilde{\varepsilon} = y - X_1\tilde{\beta}_1$, and $X_2$ are orthogonal.

SLIDE 70

Accordingly, we have the Rao test statistic:

$$LM = \left(\frac{X_2'\tilde{\varepsilon}}{\sqrt{n}\,\tilde{\sigma}^2}\right)'\,\tilde{\sigma}^2\left[\frac{1}{n}X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2\right]^{-1}\left(\frac{X_2'\tilde{\varepsilon}}{\sqrt{n}\,\tilde{\sigma}^2}\right)$$

$$= \frac{\tilde{\varepsilon}'X_2\left[X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2\right]^{-1}X_2'\tilde{\varepsilon}}{\tilde{\sigma}^2}.$$

SLIDE 71

Observe a feature of the Rao test: is it equivalent to regressing $\tilde{\varepsilon} = y - X_1\tilde{\beta}_1$ on $X_2$ and testing for statistical significance? The answer is: not in general, as shown below. Regressing $\tilde{\varepsilon}$ on $X_2$ gives:

$$\tilde{\varepsilon} = X_2\gamma + u, \qquad \hat{\gamma} = (X_2'X_2)^{-1}X_2'M_1y.$$

SLIDE 72

The OLS test of $\gamma = 0$ yields a statistic whose quadratic form is:

$$\hat{\gamma}'(X_2'X_2)\hat{\gamma} = y'M_1X_2(X_2'X_2)^{-1}(X_2'X_2)(X_2'X_2)^{-1}X_2'M_1y = y'M_1X_2(X_2'X_2)^{-1}X_2'M_1y,$$

scaled by the residual variance from the regression of $\tilde{\varepsilon} = M_1y$ on $X_2$.

SLIDE 73

Whereas the Rao statistic, as derived earlier, has the quadratic form:

$$LM \propto y'M_1X_2\left(X_2'M_1X_2\right)^{-1}X_2'M_1y.$$

SLIDE 74

So only if $M_1X_2 = X_2$ (i.e., $X_1 \perp X_2$) are these equal; not in general. Note that if we regress $\tilde{\varepsilon}$ on both $X_1$ and $X_2$, then testing whether $X_2$ is significant in this regression is equivalent to the Rao test. That is, if we set up the model:

$$\tilde{\varepsilon} = M_1y = X_1c_1 + X_2c_2 + u, \qquad \hat{c}_2 = \left(X_2'M_1X_2\right)^{-1}X_2'M_1y.$$

Then the OLS test of $c_2 = 0$ is equivalent to the Rao test (left as an exercise for the reader).

SLIDE 75

3.2.3 Relationship between the various tests

While we demonstrated in earlier sections that, asymptotically, the three tests are equivalent, in this special linear regression case we can show that

$$W \ge LR \ge LM,$$

so that in small/finite samples the Wald test rejects the null most often and the Rao test rejects it least often. Define:

$$ESS_U \equiv \text{Sum of Squared Residuals (Unrestricted)} = \left(y - X_1\hat{\beta}_1 - X_2\hat{\beta}_2\right)'\left(y - X_1\hat{\beta}_1 - X_2\hat{\beta}_2\right),$$

$$ESS_R \equiv \text{Sum of Squared Residuals (Restricted)} = \left(y - X_1\tilde{\beta}_1\right)'\left(y - X_1\tilde{\beta}_1\right).$$

SLIDE 76

Using the results in the earlier subsections of Section 3, and from the expressions for the Wald, LR and LM test statistics derived above, it can be shown (left as an exercise for the reader) that:

$$W = n\left[\frac{ESS_R - ESS_U}{ESS_U}\right], \qquad LR = n\ln\left[\frac{ESS_R}{ESS_U}\right], \qquad LM = n\left[\frac{ESS_R - ESS_U}{ESS_R}\right].$$

SLIDE 77

  • 1. Proof that $W \ge LR$: Define $\lambda \equiv \dfrac{ESS_R}{ESS_U} \ge 1$. Then $W = n(\lambda - 1)$ and $LR = n\ln\lambda$. Now $\lambda - 1 \ge \ln\lambda$ for $\lambda \ge 1$, so $W \ge LR$.

  • 2. Proof that $LR \ge LM$: With $\lambda \equiv \dfrac{ESS_R}{ESS_U} \ge 1$, we have $LM = n\left[1 - \dfrac{1}{\lambda}\right]$ and $LR = n\ln\lambda$. Now $\ln\lambda \ge 1 - \dfrac{1}{\lambda}$ (since $\lambda \ge 1$ always), so $LR \ge LM$.
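The exact finite-sample ordering can be checked on simulated data; a sketch of my own construction (the model and coefficient values are arbitrary), computing all three statistics from the restricted and unrestricted sums of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Linear model y = X1*b1 + X2*b2 + eps; test H0: b2 = 0.
X1 = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
X2 = rng.normal(size=(n, 1))
y = X1 @ np.array([1.0, 0.5]) + X2[:, 0] * 0.3 + rng.normal(size=n)

def ssr(Xmat, y):
    # Sum of squared residuals from an OLS fit of y on Xmat.
    b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    r = y - Xmat @ b
    return r @ r

ess_u = ssr(np.hstack([X1, X2]), y)   # unrestricted
ess_r = ssr(X1, y)                    # restricted (X2 dropped)

# The three statistics as functions of the two SSRs; W >= LR >= LM holds exactly.
W  = n * (ess_r - ess_u) / ess_u
LR = n * np.log(ess_r / ess_u)
LM = n * (ess_r - ess_u) / ess_r
```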