Hypothesis Testing: Large Sample Asymptotic Theory, Part IV
James J. Heckman, University of Chicago, Econ 312
This draft, April 18, 2006
This lecture consists of three sections:
- Three commonly used tests
- Composite hypothesis
- Application to linear regression model
Throughout, the log likelihood is
$$\ln L = \sum_{i=1}^{n} \ln f(x_i \mid \theta),$$
where the $x_i$ are i.i.d.
1 Three commonly used tests
In this section we examine three commonly used (and asymptotically equivalent) tests:
- 1. Wald Test
- 2. Rao, Score or Lagrange Multiplier (LM) Test
- 3. Likelihood Ratio (LR) Test
We motivate the derivation of the test statistic in each case and show the asymptotic equivalence of the three tests.
1.1 Wald Test
Consider a simple null hypothesis, $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$. A natural test uses the fact¹ that
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0,\ I_0^{-1}),$$
so that
$$n\,(\hat\theta - \theta_0)'\, I_0\, (\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k) \quad (k = \text{number of parameters}),$$
where
$$I_0 = -E\left[\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right] = -E\left[\frac{\partial^2 \ln f(x \mid \theta)}{\partial\theta\,\partial\theta'}\right] \text{ for i.i.d. data.}$$

¹See Lecture III on Asymptotic Theory.
This is the Wald test, and it is similar to the test used in OLS hypothesis testing. We use the uniform convergence of $\hat\theta$ to $\theta_0$ to obtain $\operatorname{plim} \hat{I} = I_0$, so that the test statistic for the Wald test is
$$W = n\,(\hat\theta - \theta_0)'\, \hat{I}\, (\hat\theta - \theta_0) \xrightarrow{d} \chi^2(k)$$
in large samples. Note that the Wald test is based on the unrestricted MLE.
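As an illustration (not from the original slides), here is a minimal sketch of the Wald statistic in a simple example; the Poisson model, the sample size, and the null value $\theta_0 = 1$ are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: x_i ~ Poisson(theta), H0: theta = theta0.
# The MLE is the sample mean; the per-observation information is 1/theta.
x = rng.poisson(lam=1.3, size=500)   # data generated away from the null
n = len(x)
theta0 = 1.0

theta_hat = x.mean()                 # unrestricted MLE
I_hat = 1.0 / theta_hat              # information evaluated at the MLE

# Wald statistic: n * (theta_hat - theta0)' I_hat (theta_hat - theta0)
W = n * (theta_hat - theta0) ** 2 * I_hat
print(f"Wald statistic: {W:.2f}")    # compare with the chi2(1) critical value 3.84
```

Note the Wald statistic uses only unrestricted quantities, evaluated at $\hat\theta$.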
Recall that at the true parameter vector $\theta = \theta_0$,
$$E\left[\frac{\partial \ln f(x \mid \theta)}{\partial\theta}\right] = \int \frac{\partial \ln f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0} f(x \mid \theta_0)\,dx = 0,$$
because $\int f(x \mid \theta)\,dx = 1$ implies $\int \frac{\partial f(x \mid \theta)}{\partial\theta}\,dx = 0$; but $\frac{\partial f}{\partial\theta} = \frac{\partial \ln f}{\partial\theta}\,f$, so
$$\int \frac{\partial \ln f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0} f(x \mid \theta_0)\,dx = 0.$$
Differentiating again,
$$\int \frac{\partial^2 \ln f(x \mid \theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta_0} f(x \mid \theta_0)\,dx + \int \frac{\partial \ln f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0} \frac{\partial \ln f(x \mid \theta)}{\partial\theta'}\bigg|_{\theta_0} f(x \mid \theta_0)\,dx = 0.$$
Therefore
$$I_0 = E\left[\frac{1}{n}\left(\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right)\left(\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right)'\right].$$
Cramér-Rao Lower Bound (Scalar Case). Consider an estimator $t(x)$ of $\theta$:
$$E(t(x)) = \int t(x)\, f(x;\theta)\,dx.$$
Under regularity,
$$\frac{\partial E(t(x))}{\partial\theta} = \int t(x)\,\frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = E\left(t(x)\,\frac{\partial \ln f}{\partial\theta}\right) = \operatorname{Cov}\left(t(x),\ \frac{\partial \ln f}{\partial\theta}\right),$$
because $E\left(\frac{\partial \ln f}{\partial\theta}\right) = 0$.
From the Cauchy-Schwarz inequality,
$$\left(\frac{\partial E(t(x))}{\partial\theta}\right)^2 \leq \operatorname{Var}(t(x)) \cdot \underbrace{E\left[\left(\frac{\partial \ln f}{\partial\theta}\right)^2\right]}_{I_0}.$$
For a full-rank information matrix $I_0$,
$$\operatorname{Var}(t(x)) \geq \frac{\left(\partial E(t(x))/\partial\theta\right)^2}{I_0}.$$
If $t(x)$ is unbiased, $E(t(x)) = \theta$, so $\frac{\partial E(t(x))}{\partial\theta} = 1$ and
$$\operatorname{Var}(t(x)) \geq I_0^{-1}.$$
There is a vector version: $\operatorname{Var}(t(x)) \geq I_0^{-1}$ (in the positive semidefinite sense). One cannot do better than the MLE in terms of asymptotic efficiency.
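A quick numerical check of the bound (an illustration, not part of the original slides): for $x_i \sim N(\theta, 1)$ the information per observation is $I = 1$, so the Cramér-Rao bound for an unbiased estimator based on $n$ observations is $1/n$, and the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical check: x_i ~ N(theta, 1), so I = 1 and the CR bound is 1/n.
n, reps, theta = 50, 20000, 2.0
means = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

print(f"Var(sample mean) = {means.var():.5f}")  # ~ 0.02
print(f"Cramer-Rao bound = {1.0 / n:.5f}")      # = 0.02
```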
1.2 Rao Test (also LM or Score Test)
The second test, the Rao (LM) test, is based on the restricted model. It observes that in a large enough sample, $\theta_0$ (the true parameter value) should be approximately a root of the likelihood equation:
$$\frac{1}{n}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \ln f(x_i \mid \theta)}{\partial\theta}\bigg|_{\theta_0} \approx 0;$$
i.e., it imposes the null on the model for all sample sizes. In contrast, the Wald test gets its statistic from estimates of the unrestricted model (i.e., a model where the null is not imposed on the estimates). The Likelihood Ratio test compares the restricted likelihood with the unrestricted likelihood.
We have
$$\frac{\partial \ln L}{\partial\theta}\bigg|_{\hat\theta_{ML}} = 0 \quad \text{and} \quad E\left(\frac{\partial \ln f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0}\right) = 0,$$
and
$$\frac{1}{n}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{P} E\left(\frac{\partial \ln f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0}\right) = 0 \implies \frac{1}{n}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{P} 0.$$
Now, by the C.L.T. (Lindeberg-Lévy for i.i.d. data),
$$\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial \ln f(x_i \mid \theta)}{\partial\theta}\bigg|_{\theta_0} \xrightarrow{d} N(0,\ I_0),$$
which implies that the hypothesis can be tested by testing whether the score $\frac{\partial \ln L}{\partial\theta}\big|_{\theta_0} = 0$ at the restricted parameter values.²

²For the distributional results, refer to earlier lectures on Asymptotic Theory.
Thus this test uses the statistic
$$LM = \left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right)'\, \hat{I}^{-1}\, \left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right) \xrightarrow{d} \chi^2(k)$$
in large samples, which can be shown using $\operatorname{plim}\hat{I} = I_0$.
The Rao test is also called the Score test or the Lagrange Multiplier test, because it can be motivated from the solution to a constrained maximization of the log likelihood subject to the constraint $\theta = \theta_0$. We get:
Lagrangian: $\ln L - \lambda'(\theta - \theta_0)$;
FOC: $\frac{\partial \ln L}{\partial\theta} - \lambda = 0$;
At the null: $\lambda = \frac{\partial \ln L}{\partial\theta}\Big|_{\theta_0}$.
Thus one can test on $\lambda$ (the Lagrange multiplier) or on the score $\left(\frac{\partial \ln L}{\partial\theta}\right)$.
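Continuing the hypothetical Poisson example used for the Wald sketch above (again an illustration, not from the slides), the LM statistic needs only quantities evaluated at the restricted value $\theta_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: x_i ~ Poisson(theta), H0: theta = theta0 = 1.
x = rng.poisson(lam=1.3, size=500)
n, theta0 = len(x), 1.0

# Score of the log likelihood at the restricted value theta0:
# d ln L / d theta = sum(x_i / theta - 1)
score = (x.sum() - n * theta0) / theta0
I0 = 1.0 / theta0                    # information per observation at theta0

# LM = (score / sqrt(n))' * I0^{-1} * (score / sqrt(n))
LM = score ** 2 / (n * I0)
print(f"LM statistic: {LM:.2f}")     # compare with the chi2(1) critical value 3.84
```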
Asymptotic Equivalence of Wald and LM Tests. To establish the asymptotic relationship between the two tests, we use Taylor's theorem to write
$$\underbrace{\frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\hat\theta}}_{=0 \text{ by construction}} = \frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0} + \frac{1}{n}\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{n}\,(\hat\theta - \theta_0),$$
where $\theta^*$ is an intermediate value with $\|\theta_0\| \leq \|\theta^*\| \leq \|\hat\theta\|$.
From the above we get the duality relationship between score and parameter vectors:
$$(\hat\theta - \theta_0) = -\left(\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0},$$
$$\sqrt{n}\,(\hat\theta - \theta_0) = \left(-\frac{1}{n}\sum_i \frac{\partial^2 \ln f_i}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)^{-1} \frac{1}{\sqrt{n}}\sum_i \frac{\partial \ln f_i}{\partial\theta}\bigg|_{\theta_0}.$$
Noting that $\operatorname{plim}\,\theta^* = \theta_0$ and substituting into the Wald statistic, we get
$$\operatorname{plim} W = \operatorname{plim}\left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right)' I_0^{-1}\, I_0\, I_0^{-1} \left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta}\bigg|_{\theta_0}\right) = \operatorname{plim} LM.$$
Thus, asymptotically the Wald and Rao tests are equivalent.
1.3 Likelihood Ratio Test
The third commonly used test is the Likelihood Ratio test, which uses both the restricted and the unrestricted models. Taylor expanding the likelihood function around the point $\hat\theta$, we get (starting from the restricted value $\theta_0$):
$$\ln L(\theta_0) = \ln L(\hat\theta) + \underbrace{\frac{\partial \ln L}{\partial\theta}\bigg|_{\hat\theta}'\,(\theta_0 - \hat\theta)}_{=0 \text{ by construction}} + \frac{1}{2}\,(\theta_0 - \hat\theta)'\left(\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)(\theta_0 - \hat\theta),$$
where $\theta^*$ is an intermediate value with $\|\theta_0\| \leq \|\theta^*\| \leq \|\hat\theta\|$.
Rearranging,
$$2\,[\ln L(\hat\theta) - \ln L(\theta_0)] = \sqrt{n}\,(\hat\theta - \theta_0)'\left(-\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}\right)\sqrt{n}\,(\hat\theta - \theta_0),$$
where
$$-\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*} \xrightarrow{p} I_0 \quad \text{and} \quad \sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0,\ I_0^{-1}).$$
Hence
$$2\ln\left[\frac{L(\hat\theta)}{L(\theta_0)}\right] \approx \sqrt{n}\,(\hat\theta - \theta_0)'\, I_0\, \sqrt{n}\,(\hat\theta - \theta_0).$$
We may also write
$$2\ln\left[\frac{L(\hat\theta)}{L(\theta_0)}\right] \approx Z'\, I_0\, Z \xrightarrow{d} \chi^2(k), \quad \text{where } Z \sim N(0,\ I_0^{-1}) \text{ is the limit of } \sqrt{n}\,(\hat\theta - \theta_0).$$
Based on the above derivation, we get the statistic for the likelihood ratio test:
$$LR = 2\ln\left(\frac{\hat{L}_U}{\hat{L}_R}\right),$$
where $\hat{L}_R$ is the restricted maximized likelihood and $\hat{L}_U$ is the unrestricted maximized likelihood, and, as shown above,
$$LR \xrightarrow{d} \chi^2(k).$$
Note that $\hat{L}_U \geq \hat{L}_R$ (an unrestricted maximized function must be greater than or equal to the restricted maximized function), so that always $LR \geq 0$.
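For the same hypothetical Poisson example used above (an illustration, not from the slides), the LR statistic compares the maximized log likelihoods; the $\ln x_i!$ terms cancel in the difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: x_i ~ Poisson(theta), H0: theta = theta0 = 1.
x = rng.poisson(lam=1.3, size=500)
n, theta0 = len(x), 1.0
theta_hat = x.mean()                 # unrestricted MLE

def loglik(theta):
    # Poisson log likelihood up to the additive sum(ln x_i!) constant,
    # which cancels when we take the difference below.
    return -n * theta + x.sum() * np.log(theta)

LR = 2.0 * (loglik(theta_hat) - loglik(theta0))
print(f"LR statistic: {LR:.2f}")     # compare with the chi2(1) critical value 3.84
```

On the same draw, the Wald, LM, and LR sketches produce nearby values, as the asymptotic equivalence below suggests they should.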
Asymptotic Equivalence of LR and Wald Tests. From the derivation above it is clear that, asymptotically, the LR test statistic converges to the Wald test statistic, i.e.,
$$\operatorname{plim} LR = \underbrace{\sqrt{n}\,(\hat\theta - \theta_0)'}_{\xrightarrow{d}\ Z'}\ I_0\ \underbrace{\sqrt{n}\,(\hat\theta - \theta_0)}_{\xrightarrow{d}\ Z} = \operatorname{plim} W,$$
where $Z \sim N(0,\ I_0^{-1})$. We saw earlier that the LM and Wald tests were asymptotically equivalent. Thus, along with the above asymptotic equivalence of LR and Wald, we get that all three commonly used tests are equivalent, i.e., asymptotically $W \approx LR \approx LM$.
2 Composite Hypothesis
Consider a parameter vector $\theta = (\theta_1, \theta_2)$ (where $\theta_1$ and $\theta_2$ are possibly vectors). In many situations we may want to estimate only a subset of the parameters, say $\theta_2$. This could be because we omit $\theta_1$ or because we hypothesize that $\theta_1 = 0$. In this context, two questions are relevant:
- When do we get an unbiased estimate of $\theta_2$ if we do the MLE estimation omitting $\theta_1$?
- How do the results obtained in the previous section apply to testing the composite hypothesis
  $H_0: \theta_1 = 0$ and $\theta_2$ unrestricted, vs. $H_1: \theta_1$ and $\theta_2$ both unrestricted?
We explore these two questions in the next few subsections.
2.1 Definitions
We use the following definitions in the rest of the section.
- The true parameter vector is $\theta_0 \equiv (\bar\theta_1, \bar\theta_2)$;
- The estimated parameter vector (unconstrained) is $\hat\theta \equiv (\hat\theta_1, \hat\theta_2)$;
- The estimated parameter vector (constrained, i.e., under the null hypothesis setting $\theta_1 = 0$) is $\tilde\theta \equiv (0, \tilde\theta_2)$;
- In Taylor expansions hereafter, we use $\theta_0$ and $\theta^*$ interchangeably (neglecting $o_p(1)$ terms);
- The information-matrix terms are defined as
$$-E\left[\frac{1}{n}\left(\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right)\right] = I_0 = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}.$$
2.2 When is $\hat\theta_2$ unbiased if $\theta_1$ is omitted?
In this subsection, we show that the condition for $\hat\theta_2$ to be an unbiased estimator of $\bar\theta_2$, when the model is misspecified and estimated omitting $\theta_1$, is that the score vectors $\partial \ln L/\partial\theta_1$ and $\partial \ln L/\partial\theta_2$ be orthogonal to each other.
First, Taylor expanding the score vectors with respect to $\theta_1$ and $\theta_2$ around $\theta_0$, we get
$$\frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_1}\right)_{\hat\theta} = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_1}\right)_{\theta_0} - I_{11}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) - I_{12}\,\sqrt{n}\,(\hat\theta_2 - \bar\theta_2) + o_p(1),$$
$$\frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\hat\theta} = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\theta_0} - I_{22}\,\sqrt{n}\,(\hat\theta_2 - \bar\theta_2) - I_{21}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) + o_p(1).$$
Since the left-hand sides vanish at the unrestricted MLE,
$$(1)\quad 0 = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_1}\right)_{\theta_0} - I_{11}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) - I_{12}\,\sqrt{n}\,(\hat\theta_2 - \bar\theta_2) + o_p(1),$$
$$(2)\quad 0 = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\theta_0} - I_{22}\,\sqrt{n}\,(\hat\theta_2 - \bar\theta_2) - I_{21}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) + o_p(1).$$
Collecting terms, we get
$$\begin{pmatrix} \sqrt{n}\,(\hat\theta_1 - \bar\theta_1) \\ \sqrt{n}\,(\hat\theta_2 - \bar\theta_2) \end{pmatrix} = [I_0]^{-1}\begin{pmatrix} \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_1}\right)_{\theta_0} \\ \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\theta_0} \end{pmatrix}.$$
Next consider $\tilde\theta_2$ (the MLE of $\theta_2$ given $\theta_1 = \bar\theta_1 = 0$). Expanding around the root of the likelihood with $\bar\theta_1 = 0$ imposed, we get
$$(3)\quad \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\tilde\theta} = \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\theta_0} - I_{22}\,\sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) + o_p(1).$$
Now the first term on the right-hand side is in common with the corresponding term of (2) above. Therefore we have
$$I_{22}\,\sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) = I_{22}\,\sqrt{n}\,(\hat\theta_2 - \bar\theta_2) + I_{21}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) + o_p(1),$$
or
$$(4)\quad \sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) = \sqrt{n}\,(\hat\theta_2 - \bar\theta_2) + I_{22}^{-1}I_{21}\,\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) + o_p(1).$$
Thus we get that for $\sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) = \sqrt{n}\,(\hat\theta_2 - \bar\theta_2)$ we need
$$I_{21} = E\left[\frac{\partial \ln L}{\partial\theta_2}\left(\frac{\partial \ln L}{\partial\theta_1}\right)'\right] = 0;$$
i.e., for $\tilde\theta$ (the constrained estimator) to have the same properties as $\hat\theta$ (the unconstrained estimator), we need the score vectors $\partial \ln L/\partial\theta_1$ and $\partial \ln L/\partial\theta_2$ to be uncorrelated.
Similarity with an OLS result: this result is analogous to the omitted-variables result in the OLS framework. In the OLS model we have
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$
If we run the regression omitting $X_1$, we have
$$y = X_2\beta_2 + \{\varepsilon + X_1\beta_1\}.$$
Then we get, under standard OLS assumptions,
$$\operatorname{plim}\hat\beta_2 = \beta_2 + \operatorname{plim}\left(\frac{X_2'X_2}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_2'X_1}{n}\right)\beta_1,$$
so that we get a consistent estimate $\hat\beta_2$ if $X_2$ and $X_1$ are orthogonal to each other.³

³Note $\operatorname{plim}\,X_2'X_1/n = E[X_2'X_1]$ by an appropriate LLN.
Note that the score vectors in MLE are analogous to the data vectors in OLS; we shall revisit this analogy again in Section 3 below.
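A small simulation (an illustration with made-up coefficients, not from the slides) shows both sides of the OLS result: omitting $X_1$ biases $\hat\beta_2$ when the regressors are correlated, and leaves it consistent when they are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 100_000, 1.0, 2.0

for rho in (0.0, 0.6):  # correlation between the two regressors
    # Draw (x1, x2) jointly normal with correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    x1, x2 = rng.multivariate_normal([0, 0], cov, size=n).T
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    # Short regression: y on x2 only, omitting x1.
    b2_short = (x2 @ y) / (x2 @ x2)
    print(f"rho={rho}: beta2_hat={b2_short:.3f} "
          f"(plim = beta2 + rho*beta1 = {beta2 + rho * beta1:.1f})")
```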
2.3 Hypothesis testing results for the composite hypothesis
In this subsection, we show the following results:
- Asymptotic equivalence of LR test and Wald test;
- Asymptotic equivalence of the Rao test (Score/LM test) and the LR test.
2.3.1 Equivalence between Likelihood Ratio and Wald Tests
From the definition of the LR test statistic in Section 1.3, we have the analogous definition for the composite hypothesis case:
$$LR = 2\ln\frac{L(\hat\theta_1, \hat\theta_2)}{L(\theta_1 = 0,\ \tilde\theta_2)},$$
or
$$LR = -2\,[\ln L(\theta_1 = 0,\ \tilde\theta_2) - \ln L(\bar\theta_1, \bar\theta_2)] + 2\,[\ln L(\hat\theta_1, \hat\theta_2) - \ln L(\bar\theta_1, \bar\theta_2)].$$
Then, following the same steps as in the derivation in Section 1.3, we get
$$LR = n\,(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, (\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - n\,(0,\ \tilde\theta_2 - \bar\theta_2)'\, I_0\, (0,\ \tilde\theta_2 - \bar\theta_2) + o_p(1),$$
or
$$LR = n\,(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, (\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - n\,(\tilde\theta_2 - \bar\theta_2)'\, I_{22}\, (\tilde\theta_2 - \bar\theta_2) + o_p(1).$$
Substituting for $(\tilde\theta_2 - \bar\theta_2)$ from equation (4) in Section 2.2, we get
$$LR = n\,(\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2)'\, I_0\, (\hat\theta_1 - \bar\theta_1,\ \hat\theta_2 - \bar\theta_2) - n\left[(\hat\theta_2 - \bar\theta_2) + I_{22}^{-1}I_{21}\,(\hat\theta_1 - \bar\theta_1)\right]' I_{22}\left[(\hat\theta_2 - \bar\theta_2) + I_{22}^{-1}I_{21}\,(\hat\theta_1 - \bar\theta_1)\right] + o_p(1).$$
Call $u \equiv \hat\theta_1 - \bar\theta_1$ and $v \equiv \hat\theta_2 - \bar\theta_2$. Then we have
$$LR = n\left[u'I_{11}u + v'I_{21}u + u'I_{12}v + v'I_{22}v\right] - n\left[v'I_{22}v + v'I_{21}u + u'I_{12}v + u'I_{12}I_{22}^{-1}I_{21}u\right] + o_p(1)$$
$$= n\,u'\left[I_{11} - I_{12}I_{22}^{-1}I_{21}\right]u + o_p(1)$$
$$(5)\quad = n\,u'\,(I^{11})^{-1}\,u + o_p(1),$$
where $I^{11}$ is the upper-left diagonal block of
$$I_0^{-1} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}^{-1}.$$
(See the partitioned inverse result established in Section 3.) In the composite hypothesis case, the Wald test exploits the fact that
$$\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) \xrightarrow{d} N(0,\ I^{11}),$$
so that the Wald test statistic here is
$$(6)\quad W = n\,(\hat\theta_1 - \bar\theta_1)'\,(I^{11})^{-1}\,(\hat\theta_1 - \bar\theta_1) + o_p(1).$$
From (5) and (6) above, we get directly the asymptotic equivalence of the Wald and LR tests, i.e., $W \approx LR$ in large samples.
2.3.2 Equivalence between Rao (Score/LM) test and the other tests
To derive the statistic for the Rao (Score/LM) test, first compute the derivative with respect to $\theta_1$ when the constraint $\theta_1 = 0$ is imposed:
$$(7)\quad \frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}\,\sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) + o_p(1).$$
From equation (3) in Section 2.2 we have
$$\sqrt{n}\,(\tilde\theta_2 - \bar\theta_2) = I_{22}^{-1}\,\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1).$$
Substituting this result into (7), we get
$$\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\tilde\theta} = \frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}I_{22}^{-1}\,\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1) = S_1 - I_{12}I_{22}^{-1}S_2 + o_p(1),$$
defining $S_1 \equiv \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_1}\right)_{\theta_0}$ and $S_2 \equiv \frac{1}{\sqrt{n}}\left(\frac{\partial \ln L}{\partial\theta_2}\right)_{\theta_0}$.
Then we obtain the variance of the key score term:
$$\operatorname{Var}\left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = E\left[(S_1 - I_{12}I_{22}^{-1}S_2)(S_1 - I_{12}I_{22}^{-1}S_2)'\right]$$
$$= I_{11} + I_{12}I_{22}^{-1}I_{22}I_{22}^{-1}I_{21} - 2\,I_{12}I_{22}^{-1}I_{21} = I_{11} - I_{12}I_{22}^{-1}I_{21} = (I^{11})^{-1}.$$
We have the Rao test statistic
$$LM = \left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\tilde\theta}\right)'\, I^{11}\, \left(\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\tilde\theta}\right) = \left[S_1 - I_{12}I_{22}^{-1}S_2\right]'\, I^{11}\, \left[S_1 - I_{12}I_{22}^{-1}S_2\right].$$
From Section 2.2 and the results for partitioned inverses (refer to Section 2.3.1), we have
$$\sqrt{n}\,(\hat\theta_1 - \bar\theta_1) = I^{11}\,\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\theta_0} + I^{12}\,\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_2}\bigg|_{\theta_0} + o_p(1)$$
$$= I^{11}\left[\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_1}\bigg|_{\theta_0} - I_{12}I_{22}^{-1}\,\frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\theta_2}\bigg|_{\theta_0}\right] + o_p(1) = I^{11}\left[S_1 - I_{12}I_{22}^{-1}S_2\right] + o_p(1),$$
using the partitioned inverse identity $I^{12} = -I^{11}I_{12}I_{22}^{-1}$.
Substituting this into the expression for the Wald statistic, we get
$$W = n\,(\hat\theta_1 - \bar\theta_1)'\,(I^{11})^{-1}\,(\hat\theta_1 - \bar\theta_1) = \left[S_1 - I_{12}I_{22}^{-1}S_2\right]'\, I^{11}\, \left[S_1 - I_{12}I_{22}^{-1}S_2\right] + o_p(1) = LM \text{ (Rao/Score)}.$$
Thus, along with the result in Section 2.3.1, we have, even for the composite hypothesis case, asymptotically $W \approx LR \approx LM$ (Rao).
3 Application to linear regression model
In this section, we look at some analogies between the standard MLE results derived in the earlier sections and OLS regression results. (Recall that we already looked at some analogies between OLS and MLE in Section 2.2 above.) First, we show the analogy between OLS and MLE for estimation of parameter sub-vectors. Then we derive expressions for the three common test statistics and establish certain results for them.
3.1 Estimation of parameter subsets
A key result in regression is the Theorem of the Partitioned Inverse. Let
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
be nonsingular, assume $A_{11}$ is nonsingular, and define $D \equiv A_{22} - A_{21}A_{11}^{-1}A_{12}$. Then
$$A^{-1} = \begin{bmatrix} A_{11}^{-1}\left(I + A_{12}D^{-1}A_{21}A_{11}^{-1}\right) & -A_{11}^{-1}A_{12}D^{-1} \\ -D^{-1}A_{21}A_{11}^{-1} & D^{-1} \end{bmatrix}.$$
Proof: multiply it out.
In the OLS case, this produces a useful result (Frisch-Waugh-Goldberger). We partition the matrix of independent variables as $X = (X_1\ X_2)$, so that
$$X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}.$$
We define
$$M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1'$$
and
$$D \equiv X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 = X_2'M_1X_2.$$
Now we have the result for the OLS regression:
$$(8)\quad \hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}(X'y) = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1}\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.$$
Then, using the theorem of the partitioned inverse above and simplifying, we can write
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y + (X_1'X_1)^{-1}X_1'X_2\,D^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2\,D^{-1}X_2'y = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2\,\hat\beta_2,$$
$$(9)\quad \hat\beta_2 = -D^{-1}X_2'X_1(X_1'X_1)^{-1}X_1'y + D^{-1}X_2'y = D^{-1}X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)y = D^{-1}X_2'M_1y.$$
This leads to a double residual regression result for $\hat\beta_2$: regress $y$ on $X_1$, and $X_2$ on $X_1$; form residuals; then regress one set of residuals on the other.
Result: $\hat\beta_2$ is the coefficient from the regression of "cleaned out $y$" on "cleaned out $X_2$". This result follows directly from results (8) and (9) above. Define:
"Cleaned out $y$": $\tilde{y} \equiv \left[I - X_1(X_1'X_1)^{-1}X_1'\right]y = M_1y$ (the residual from the regression of $y$ on $X_1$);
"Cleaned out $X_2$": $\tilde{X}_2 \equiv \left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2 = M_1X_2$ (the residual from the regression of $X_2$ on $X_1$).
Then from (9) above we have
$$\hat\beta_2 = D^{-1}X_2'M_1y = \left[X_2'M_1X_2\right]^{-1}(X_2'M_1y) = \left[\tilde{X}_2'\tilde{X}_2\right]^{-1}(\tilde{X}_2'y).$$
Note that we really don't have to clean out $y$, just $X_2$, since $M_1$ is idempotent. Further, if $X_1$ and $X_2$ are orthogonal (uncorrelated), then $\tilde{X}_2 = X_2$, and hence an unbiased/consistent estimate of $\beta_2$ can be obtained by directly regressing $y$ on $X_2$. (Recall that we derived the same result for MLE in Section 2.2.)
Also observe (derived directly from $\hat{y} = X_1\hat\beta_1 + X_2\hat\beta_2$) that
$$\hat{y}'\hat{y} = \underbrace{y'X_1(X_1'X_1)^{-1}X_1'y}_{\text{part due to } X_1} + \underbrace{y'M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1y}_{\text{part due to orthogonalized } X_2}.$$
Thus, unless the regressors are orthogonal, there are no unique contributions.
Proof.
- 1. Observe that $M_1X_1 = 0$, $M_1y = \tilde{y}$, $M_1X_2 = \tilde{X}_2$, and $y = \hat{y} + e = \left(X_1\hat\beta_1 + X_2\hat\beta_2\right) + e$.
- 2. Observe that $X_1'\tilde{X}_2 = X_1'(M_1X_2) = 0$, since $\tilde{X}_2$ is the residual from a regression on $X_1$; hence $X_1 \perp \tilde{X}_2$.
- 3. $$X_2'M_1y = X_2'M_1X_1\hat\beta_1 + X_2'M_1X_2\,\hat\beta_2 + X_2'M_1e,$$ where $X_2'M_1X_1 = 0$ and $X_2'M_1e = 0$ (the OLS residual $e$ is orthogonal to both $X_1$ and $X_2$). Thus $\hat\beta_2 = (X_2'M_1X_2)^{-1}(X_2'M_1y)$.
- 4. Observe that $\hat\beta_2$ is the result of the regression $M_1y = M_1X_2\,\hat\beta_2 + \text{error}_2$; but $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ implies $M_1y = M_1X_1\beta_1 + M_1X_2\beta_2 + M_1\varepsilon = M_1X_2\beta_2 + M_1\varepsilon$ (since $M_1X_1 = 0$), so $\text{error}_2 = M_1\varepsilon$.
(From Davidson & MacKinnon.)
Analogously, for MLE in the neighborhood of the optimum, stacking the individual scores into the $n \times k$ matrix $G$ whose $i$-th row is $\partial \ln f(x_i \mid \theta)/\partial\theta'$, we have
$$\sqrt{n}\,(\hat\theta - \theta_0) = \left(\frac{1}{n}\,G'G\right)^{-1}\frac{1}{\sqrt{n}}\,G'\iota + o_p(1),$$
and in partitioned form, with $G = [G_1\ G_2]$ corresponding to $(\theta_1, \theta_2)$,
$$\sqrt{n}\,(\hat\theta - \theta_0) = \left[\frac{1}{n}\begin{pmatrix} G_1'G_1 & G_1'G_2 \\ G_2'G_1 & G_2'G_2 \end{pmatrix}\right]^{-1}\frac{1}{\sqrt{n}}\begin{pmatrix} G_1'\iota \\ G_2'\iota \end{pmatrix} + o_p(1),$$
where $\iota$ is an $n \times 1$ vector of ones, i.e., $\iota = (1, 1, \ldots, 1)'$.
Comparing the above equation to the result for the standard OLS regression (see eqn (8) above), we note that the MLE result is analogous to regressing $\iota$ (a vector of ones) on the score matrix; i.e., we have the correspondences
$$[X_1\ X_2] \longleftrightarrow [G_1\ G_2] = \left[\frac{\partial \ln f_i}{\partial\theta_1'}\ \ \frac{\partial \ln f_i}{\partial\theta_2'}\right] \quad \text{and} \quad y \longleftrightarrow \iota.$$
Further, we also get
$$\hat\theta_2 - \bar\theta_2 = \left[G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)G_2\right]^{-1} G_2'\left(I - G_1(G_1'G_1)^{-1}G_1'\right)\iota + o_p(n^{-1/2}),$$
which is analogous to regression result (9) above.
3.2 Hypothesis testing results for the linear regression model
Consider the classical linear regression model:
$$y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ i.i.d.},$$
i.e.,
$$f(\varepsilon_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\,\frac{\varepsilon_i^2}{\sigma^2}\right),$$
so that
$$f(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}\,(y_i - x_i'\beta)^2\right).$$
With i.i.d. sampling, we have the log likelihood function
$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2.$$
3.2.1 MLE Solution
$$\frac{\partial \ln L}{\partial\beta} = \frac{1}{\sigma^2}\sum_i x_i\,(y_i - x_i'\beta) = 0 \text{ at the optimum}$$
yields
$$\hat\beta = \left(\sum_i x_i x_i'\right)^{-1}\sum_i x_i y_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i'\hat\beta)^2.$$
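A minimal sketch of these closed-form MLE expressions (made-up data, not from the slides); note that $\hat\sigma^2$ divides by $n$, not $n - k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# MLE of beta: (sum x_i x_i')^{-1} sum x_i y_i  =  (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# MLE of sigma^2: average squared residual (divides by n, not n - k)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n

print(beta_hat, sigma2_hat)
```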
The elements of the Hessian are
$$-\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} = \frac{1}{\sigma^2}\,\frac{X'X}{n},$$
and the cross term
$$-\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} = \frac{1}{n\,\sigma^4}\sum_i x_i\,(y_i - x_i'\beta),$$
which, evaluated at the MLE, equals
$$\frac{1}{n\,\hat\sigma^4}\sum_i x_i\,(y_i - x_i'\hat\beta) = 0.$$
(This holds unless the model is such that there is a functional relationship between $\sigma^2$ and $\beta$ — which we exclude by assumption.)
Finally,
$$-\frac{1}{n}\,\frac{\partial^2 \ln L}{\partial(\sigma^2)^2}\bigg|_{\hat\sigma^2} = \frac{1}{2\,(\hat\sigma^2)^2}.$$
This yields the information matrix
$$I_0 = \begin{bmatrix} \dfrac{1}{\sigma^2}\operatorname{plim}\dfrac{X'X}{n} & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{bmatrix},$$
which is block diagonal. (By the previous result we may ignore parameter estimation error between $\hat\beta$ and $\hat\sigma^2$.)
3.2.2 Expressions for the various test statistics
Let $\beta = (\beta_1, \beta_2)$ and $H_0: \beta_2 = 0$. Then the expression for the Likelihood Ratio test statistic is
$$LR = 2\ln\left[\frac{L(\hat\beta)}{L(\tilde\beta)}\right],$$
where $L(\hat\beta)$ is the unrestricted maximized likelihood and $L(\tilde\beta)$ is the restricted maximized likelihood.
We have
$$\ln L(\tilde\beta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\tilde\sigma^2 - \frac{1}{2\tilde\sigma^2}\sum_{i=1}^{n}\left(y_i - x_{1i}'\tilde\beta_1\right)^2 = -\frac{n}{2}\ln 2\pi - \frac{n}{2} - \frac{n}{2}\ln\left[\frac{1}{n}\sum_i\left(y_i - x_{1i}'\tilde\beta_1\right)^2\right],$$
using the fact that $\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_{1i}'\tilde\beta_1\right)^2$.
Similarly,
$$\ln L(\hat\beta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2} - \frac{n}{2}\ln\left[\frac{1}{n}\sum_i\left(y_i - x_{1i}'\hat\beta_1 - x_{2i}'\hat\beta_2\right)^2\right],$$
so that
$$(10)\quad LR = 2\ln\left[\frac{L(\hat\beta)}{L(\tilde\beta)}\right] = n\ln\left(\frac{y'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]y}{y'\left[I - X(X'X)^{-1}X'\right]y}\right).$$
The expression for the Wald test statistic is obtained using the partitioned inverse theorem on the information matrix above and the standard asymptotic normality result for the MLE:
$$\sqrt{n}\,(\hat\beta_2 - \bar\beta_2) \xrightarrow{d} N\left(0,\ \sigma^2\left(\operatorname{plim}\tfrac{1}{n}\,X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\right)^{-1}\right),$$
so
$$W = \left(\hat\beta_2 - \beta_2^0\right)'\left[\frac{X_2'\left(I - X_1(X_1'X_1)^{-1}X_1'\right)X_2}{\hat\sigma^2}\right]\left(\hat\beta_2 - \beta_2^0\right),$$
where $\beta_2^0$ is the hypothesized value of $\beta_2$ (here $\beta_2^0 = 0$).
The test statistic for the Rao (LM/Score) test is obtained by computing the derivative of the log likelihood function at the point where $\beta_2 = 0$ is imposed. We get
$$\ln L = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{\sum_i\left(y_i - x_{1i}'\beta_1 - x_{2i}'\beta_2\right)^2}{2\sigma^2}$$
$$\Rightarrow \frac{1}{\sqrt{n}}\,\frac{\partial \ln L}{\partial\beta_2}\bigg|_{\beta_2 = 0} = \frac{1}{\sqrt{n}}\,\frac{\sum_i\left(y_i - x_{1i}'\tilde\beta_1\right)x_{2i}}{\tilde\sigma^2} = \frac{X_2'\tilde\varepsilon}{\sqrt{n}\,\tilde\sigma^2}.$$
As discussed in earlier sections, the Rao test statistic can now be formed by examining the asymptotic distribution of the score vector. In this linear regression context, the Rao test can also be motivated in another intuitive way: it considers the restrictions valid if the score is approximately zero (i.e., the log likelihood function is approximately maximized) when the restrictions are imposed. Looking at the right-hand side of the expression above, in the linear regression case this is equivalent to checking whether the residual from the restricted MLE, $\tilde\varepsilon = y - X_1\tilde\beta_1$, is orthogonal to $X_2$!
Accordingly, we have the Rao test statistic:
$$LM = \left(\frac{1}{\sqrt{n}}\,\frac{X_2'\tilde\varepsilon}{\tilde\sigma^2}\right)'\left[\operatorname{Var}\left(\frac{1}{\sqrt{n}}\,\frac{X_2'\tilde\varepsilon}{\tilde\sigma^2}\right)\right]^{-1}\left(\frac{1}{\sqrt{n}}\,\frac{X_2'\tilde\varepsilon}{\tilde\sigma^2}\right)$$
$$= \left(\frac{1}{\sqrt{n}}\,\frac{X_2'\tilde\varepsilon}{\tilde\sigma^2}\right)'\ \tilde\sigma^2\left(\frac{1}{n}\,X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2\right)^{-1}\left(\frac{1}{\sqrt{n}}\,\frac{X_2'\tilde\varepsilon}{\tilde\sigma^2}\right).$$
Observe a feature of the Rao test: is it equivalent to regressing $M_1y$ on $X_2$ and testing for statistical significance? The answer is: not in general, as shown below. Regressing $M_1y$ on $X_2$ gives
$$M_1y = X_2\gamma + v, \qquad \hat\gamma = (X_2'X_2)^{-1}X_2'M_1y.$$
The OLS test of $\gamma = 0$ yields the statistic
$$\frac{\hat\gamma'(X_2'X_2)\hat\gamma}{\hat\sigma_v^2} = \frac{y'M_1X_2(X_2'X_2)^{-1}(X_2'X_2)(X_2'X_2)^{-1}X_2'M_1y}{\frac{1}{n}\left(M_1y - X_2(X_2'X_2)^{-1}X_2'M_1y\right)'\left(M_1y - X_2(X_2'X_2)^{-1}X_2'M_1y\right)}$$
$$= \frac{y'M_1X_2(X_2'X_2)^{-1}X_2'M_1y}{\frac{1}{n}\,y'\left[M_1 - M_1X_2(X_2'X_2)^{-1}X_2'M_1\right]'\left[M_1 - M_1X_2(X_2'X_2)^{-1}X_2'M_1\right]y},$$
where $\hat\sigma_v^2$ is the residual variance from this auxiliary regression.
Whereas the Rao statistic as derived earlier is
$$LM = \frac{y'M_1X_2(X_2'M_1X_2)^{-1}X_2'M_1y}{\frac{1}{n}\,y'M_1y}.$$
So these coincide only if $M_1X_2 = X_2$ (i.e., $X_1 \perp X_2$); not in general. Note that if we regress $M_1y$ on $X_1$ and $X_2$, then testing whether the coefficient on $X_2$ in this regression is significant is equivalent to the Rao test. That is, if we set up the model as
$$M_1y = X_1b_1 + X_2b_2 + \text{residual} \implies \hat{b}_2 = (X_2'M_1X_2)^{-1}X_2'M_1y,$$
then the OLS test of $b_2 = 0$ is equivalent to the Rao test (left as an exercise for the reader).
3.2.3 Relationship between the various tests
While we demonstrated in earlier sections that, asymptotically, the three tests are equivalent, in this special linear regression case we can show that
$$W \geq LR \geq LM,$$
so that in small/finite samples the Wald test is the most likely to reject the null and the Rao test is the least likely to reject the null. Define:
$$SSR_U \equiv \text{Sum of Squared Residuals (Unrestricted)} = (y - X_1\hat\beta_1 - X_2\hat\beta_2)'(y - X_1\hat\beta_1 - X_2\hat\beta_2),$$
$$SSR_R \equiv \text{Sum of Squared Residuals (Restricted)} = (y - X_1\tilde\beta_1)'(y - X_1\tilde\beta_1).$$
Using the results in the earlier subsections of Section 3, and from the expressions for the Wald, LR, and LM test statistics derived above, it can be shown (left as an exercise for the reader) that
$$W = n\left[\frac{SSR_R - SSR_U}{SSR_U}\right], \qquad LR = n\ln\left[\frac{SSR_R}{SSR_U}\right], \qquad LM\ (\text{Rao}) = n\left[\frac{SSR_R - SSR_U}{SSR_R}\right].$$
- 1. Proof that $W \geq LR$:
Define $x \equiv SSR_R/SSR_U \geq 1$, so $W = n(x - 1)$ and $LR = n\ln x$. Now $x - 1 \geq \ln x$ for $x \geq 1$, so $W \geq LR$.
- 2. Proof that $LR \geq LM$ (Rao):
Define $z \equiv SSR_U/SSR_R \leq 1$, so $LM = n(1 - z)$ and $LR = -n\ln z$. Now $-\ln z \geq 1 - z$ (since $\ln z \leq z - 1$ always), so $LR \geq LM$ (Rao).
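A numerical check of the ordering, using the SSR-based formulas above (an illustration with made-up data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))
y = X1 @ [1.0, 0.5] + X2 @ [0.3, -0.2] + rng.normal(size=n)

def ssr(X, y):
    # Sum of squared residuals from the OLS regression of y on X.
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ssr_u = ssr(np.hstack([X1, X2]), y)  # unrestricted
ssr_r = ssr(X1, y)                   # restricted (beta2 = 0 imposed)

W  = n * (ssr_r - ssr_u) / ssr_u
LR = n * np.log(ssr_r / ssr_u)
LM = n * (ssr_r - ssr_u) / ssr_r
print(f"W={W:.3f} >= LR={LR:.3f} >= LM={LM:.3f}")
```

Since $SSR_R \geq SSR_U$ by construction, the printed values always satisfy the stated ordering.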