Hypothesis Testing Appendix
James J. Heckman
University of Chicago
Econ 312, Spring 2019
A Further Results on Bayesian vs. Classical Testing

(The phenomenon shows up even when we don't have [...].)

Point mass $\pi_0$ is placed at $\theta_0$. For the rest of the parameter space we have mass $1-\pi_0$; $g_1$ is the density of $\theta$ on the alternative, with $\int g_1(\theta)\,d\theta = 1$; $f(x\mid\theta)$ is the model.

Marginal density of $x$:
$$ m(x) = \int f(x\mid\theta)\, d\pi(\theta) = f(x\mid\theta_0)\,\pi_0 + (1-\pi_0)\, m_1(x), \qquad m_1(x) = \int f(x\mid\theta)\, g_1(\theta)\, d\theta. $$

Posterior probability that $\theta = \theta_0$:
$$ P(\theta_0\mid x) = \frac{f(x\mid\theta_0)\,\pi_0}{m(x)} = \frac{f(x\mid\theta_0)\,\pi_0}{f(x\mid\theta_0)\,\pi_0 + (1-\pi_0)\,m_1(x)} = \left[\,1 + \frac{1-\pi_0}{\pi_0}\cdot\frac{m_1(x)}{f(x\mid\theta_0)}\right]^{-1}. $$
The posterior probability that $\theta \neq \theta_0$ is $1 - P(\theta_0\mid x)$.

Posterior odds ratio:
$$ \frac{P(\theta_0\mid x)}{1 - P(\theta_0\mid x)} = \left(\frac{\pi_0}{1-\pi_0}\right)\left(\frac{f(x\mid\theta_0)}{m_1(x)}\right). $$

Bayes factor:
$$ B = \frac{f(x\mid\theta_0)}{m_1(x)}. $$
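As a quick numerical companion to these identities, a sketch (the Bayes factor value used below is an illustrative assumption, not from the text):

```python
def posterior_prob_h0(B, pi0=0.5):
    """P(theta0 | x) = [1 + ((1 - pi0)/pi0) * (1/B)]**(-1),
    where B = f(x|theta0)/m1(x) is the Bayes factor."""
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) / B)

# With even prior odds (pi0 = 1/2), the posterior probability is B/(1 + B):
print(posterior_prob_h0(3.0))           # 0.75
print(posterior_prob_h0(3.0, pi0=0.9))  # prior weight on H0 pushes it higher
```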
Example: Let $\bar X \sim N(\theta,\, \sigma^{2}/n)$, $\sigma^{2}$ known:
$$ L\!\left(\bar x \mid \theta, \sigma^{2}\right) = \frac{1}{\sqrt{2\pi\sigma^{2}/n}}\, \exp\!\left[-\frac{n\,(\bar x - \theta)^{2}}{2\sigma^{2}}\right]. $$
Let $g_1 = N(\mu, \tau^{2})$ be the prior on the alternative; then $m_1 = N(\mu,\, \tau^{2} + \sigma^{2}/n)$ and
$$ P(\theta_0 \mid \bar x) = \left[\,1 + \frac{1-\pi_0}{\pi_0}\cdot \frac{\left[2\pi\left(\tau^{2} + \sigma^{2}/n\right)\right]^{-1/2} \exp\!\left[-\dfrac{(\bar x - \mu)^{2}}{2\left(\tau^{2} + \sigma^{2}/n\right)}\right]} {\left(2\pi\sigma^{2}/n\right)^{-1/2} \exp\!\left[-\dfrac{n\,(\bar x - \theta_0)^{2}}{2\sigma^{2}}\right]} \right]^{-1}. $$
Center the prior at $\mu = \theta_0$. (This is a judgment about priors.)
With $\mu = \theta_0$,
$$ P(\theta_0 \mid \bar x) = \left[\,1 + \frac{1-\pi_0}{\pi_0}\cdot \frac{\exp\!\left[\dfrac{z^{2}}{2\left[1 + \sigma^{2}/(n\tau^{2})\right]}\right]} {\left(1 + n\tau^{2}/\sigma^{2}\right)^{1/2}} \right]^{-1}. $$

As $n \to \infty$ we get the "Lindley Paradox": at any fixed $z$, $P(\theta_0\mid\bar x) \to 1$, where
$$ z = \left|\frac{\sqrt{n}\,(\bar x - \theta_0)}{\sigma}\right| $$
is the usual normal statistic (2-tail test).
Look at the following table of values of $P(\theta_0\mid\bar x)$, with $\mu = \theta_0$, $\pi_0 = 1/2$, $\tau = \sigma$ ($n$ = sample size):

  z score   p-value   n=1    n=5    n=10   n=20   n=50   n=100   n=1000
  1.645     .1        .42    .44    .49    .56    .65    .72     .89
  1.960     .05       .35    .33    .37    .42    .52    .60     .80
  2.576     .01       .21    .13    .14    .16    .22    .27     .53
  3.291     .001      .086   .026   .024   .026   .034   .045    .124

Observe: not monotone in $n$.
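The entries above can be reproduced from the formula for $P(\theta_0\mid\bar x)$; a sketch (with $\pi_0 = 1/2$ and $\tau = \sigma$, as in the table):

```python
import math

def post_prob_null(z, n, pi0=0.5):
    """P(theta0 | xbar) for the normal example with mu = theta0, tau = sigma.
    z is the usual normal statistic, n the sample size."""
    ratio = (1.0 + n) ** -0.5 * math.exp(z * z / (2.0 * (1.0 + 1.0 / n)))
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) * ratio)

for n in (1, 10, 100, 1000, 10**6):
    print(n, round(post_prob_null(1.96, n), 3))
# At fixed z = 1.96 (p-value .05), P(theta0|xbar) climbs toward 1 as n
# grows: the Lindley paradox. The table's entries come from this function.
```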
Now consider the minimum of $P(\theta_0\mid x)$ over all $g_1$, with $m_1$ the marginal on the alternative.

Thm (Dickey and Savage):
$$ P(\theta_0\mid x) \;\ge\; \left[\,1 + \frac{1-\pi_0}{\pi_0}\cdot\frac{r(x)}{f(x\mid\theta_0)}\right]^{-1}, \qquad r(x) = \sup_{\theta\neq\theta_0} f(x\mid\theta), $$
usually obtained by substituting in the maximum likelihood estimate.
The corresponding bound on the Bayes factor is
$$ B = \frac{f(x\mid\theta_0)}{m_1(x)} \;\ge\; \frac{f(x\mid\theta_0)}{r(x)}. $$
The proof of this is really trivial: $m_1(x) = \int f(x\mid\theta)\,g_1(\theta)\,d\theta \le r(x)$ for any $g_1$.
Now look at the example (MLE):
$$ r(\bar x) = f\!\left(\bar x \mid \hat\theta = \bar x\right) = \left(2\pi\sigma^{2}/n\right)^{-1/2}, $$
so
$$ P(\theta_0\mid\bar x) \;\ge\; \left[\,1 + \frac{1-\pi_0}{\pi_0}\cdot \frac{\left(2\pi\sigma^{2}/n\right)^{-1/2}} {\left(2\pi\sigma^{2}/n\right)^{-1/2} \exp\!\left[-n(\bar x - \theta_0)^{2}/(2\sigma^{2})\right]} \right]^{-1} = \left[\,1 + \frac{1-\pi_0}{\pi_0}\, \exp\!\left(\tfrac{1}{2}z^{2}\right)\right]^{-1}. $$
For $\pi_0 = \tfrac{1}{2}$ we get that, for $z \ge 1.68$ (Berger and Sellke, JASA, 1987):
$$ P(\theta_0\mid\bar x) \;\ge\; (p\text{-value}) \times \sqrt{\pi/2}\;z \;\approx\; (p\text{-value}) \times 1.25\,z. $$

  z        p-value   Lower bound on P(theta0|xbar)   Bound on Bayes factor
  1.645    .1        .205                            1/(3.87)
  1.960    .05       .127                            1/(6.83)
  2.576    .01       .035                            1/(27.60)
  3.291    .001      .0044                           1/(224.83)
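The bound rows follow from $\left[1 + \exp(z^{2}/2)\right]^{-1}$ at $\pi_0 = 1/2$; a quick check:

```python
import math

def lower_bound(z, pi0=0.5):
    """Lower bound on P(theta0|x) over all g1 (MLE in the denominator)."""
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) * math.exp(z * z / 2.0))

def bayes_factor_bound(z):
    """Lower bound on the Bayes factor: exp(-z**2 / 2)."""
    return math.exp(-z * z / 2.0)

for z in (1.645, 1.960, 2.576, 3.291):
    print(z, round(lower_bound(z), 4), round(1.0 / bayes_factor_bound(z), 2))
```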
Thm (Berger and Sellke). Restrict the alternative prior to
$$ G_S = \left\{\, g_1 : g_1 \text{ is symmetric about } \theta_0 \text{ and non-increasing in } |\theta - \theta_0| \,\right\}. $$
Then:

  z        p-value   Lower bound on P(theta0|xbar)   Bound on Bayes factor
  1.645    .1        .390                            1/(1.56)
  1.960    .05       .290                            1/(2.45)
  2.576    .01       .109                            1/(8.17)
  3.291    .001      .018                            1/(54.55)
B Bayesian Credibility Sets

Def. A $100(1-\alpha)\%$ credible set for $\theta$ is a subset $C$ of $\Theta$ such that
$$ 1 - \alpha \;\le\; P(C \mid x) \;=\; \int_C dF(\theta \mid x) \;=\; \int_C f(\theta \mid x)\, d\theta. $$
What is the “best” region? A $100(1-\alpha)\%$ highest posterior density (HPD) credible set is a subset $C$ of $\Theta$ such that
$$ C = \left\{\theta : f(\theta\mid x) \ge k(\alpha)\right\}, $$
where $k(\alpha)$ is the largest constant such that $P(C \mid x) \ge 1 - \alpha$.
Example. Let $f(\theta\mid x) = N\!\left(\mu(x), \sigma^{2}\right)$ be the posterior. A $100(1-\alpha)\%$ HPD credible set is given by
$$ \left( \mu(x) + \sigma\,\Phi^{-1}\!\left(\tfrac{\alpha}{2}\right),\;\; \mu(x) - \sigma\,\Phi^{-1}\!\left(\tfrac{\alpha}{2}\right) \right), $$
identical to the classical case for normal posteriors. Consider, however, what would happen in the Dickey-Leamer case: a possibility of disconnected credible sets (multimodal posterior). This can happen in the classical case as well.
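For a normal posterior the HPD set is the symmetric equal-tailed interval, matching the classical one; a sketch using the standard library:

```python
from statistics import NormalDist

def hpd_normal(mu_x, sigma, alpha=0.05):
    """100(1 - alpha)% HPD credible set for a N(mu_x, sigma^2) posterior:
    symmetric about mu_x because the density is unimodal and symmetric."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (mu_x - z * sigma, mu_x + z * sigma)

lo, hi = hpd_normal(0.0, 1.0, alpha=0.05)
print(lo, hi)  # about (-1.96, 1.96): the classical 95% interval
```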
Problems. The forms of credible sets are not invariant under transformations of the parameter.

Example. Suppose the posterior is
$$ f(\theta\mid x) = c\,(1 + \theta^{2}), \qquad 0 < \theta < 1, $$
$$ \ln f(\theta\mid x) = \ln c + \ln\!\left(1 + \theta^{2}\right), \qquad \frac{\partial \ln f(\theta\mid x)}{\partial\theta} = \frac{2\theta}{1 + \theta^{2}}, $$
$$ \frac{\partial^{2} \ln f(\theta\mid x)}{\partial\theta^{2}} = \frac{2}{1+\theta^{2}} - \frac{(2\theta)(2\theta)}{\left(1+\theta^{2}\right)^{2}} = \frac{2}{1+\theta^{2}}\left[1 - \frac{2\theta^{2}}{1+\theta^{2}}\right] = \frac{2}{1+\theta^{2}}\cdot\frac{1-\theta^{2}}{1+\theta^{2}} \;\ge\; 0, $$
where $\theta \le 1$. So $f$ is increasing in $\theta$, and the HPD credible set is an upper tail $(k(\alpha),\, 1)$.
Use a transformation: let $\eta = \exp(\theta)$, so
$$ g(\eta\mid x) = f(\log\eta \mid x)\,\frac{1}{\eta} = c\,\frac{1 + (\log\eta)^{2}}{\eta}, $$
which is decreasing on $(1, \exp(1))$. So we have a credible set given by $(1,\, d(\alpha))$, and in terms of the original coordinate system, by $(0,\, \ln d(\alpha))$ (an upper tail in the original parametrization becomes a lower tail in the new parametrization).
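A numerical check that HPD sets are not transformation invariant. The normalized density $f(\theta) = \tfrac{3}{4}(1+\theta^{2})$ on $(0,1)$ is an assumed reconstruction of the example here, and the bisection solver is a sketch:

```python
def F(t):
    """CDF of the assumed posterior f(u) = (3/4)(1 + u^2) on (0, 1)."""
    return 0.75 * (t + t ** 3 / 3.0)

def solve(target):
    """Bisection: find t in (0, 1) with F(t) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

alpha = 0.10
# f is increasing on (0, 1), so the theta-space HPD set is an upper tail:
k = solve(alpha)               # 90% HPD set: (k, 1)
# Under eta = exp(theta) the density is decreasing on (1, e), so the
# eta-space HPD set is a lower tail (1, d); mapped back it is (0, ln d):
t = solve(1.0 - alpha)         # mapped-back set: (0, t)
print((round(k, 3), 1.0), (0.0, round(t, 3)))  # two different 90% sets
```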
C Model Selection in the Bayesian Way

The common classical approach is to use a pretest procedure. Instead we seek $P(M_j\mid y)$ for models
$$ M_j:\quad y = X_j\beta_j + U, \qquad U \sim N\!\left(0, \sigma^{2} I\right). $$
Predictive density under the hypothesis:
$$ f(y\mid M_j) = \int\!\!\int f(y\mid\beta_j, \sigma, M_j)\,\pi(\beta_j, \sigma)\, d\beta_j\, d\sigma, \qquad P(M_j\mid y) = \frac{f(y\mid M_j)\,\pi(M_j)}{f(y)}. $$
Prior $\pi(\beta)$: $N\!\left(\bar\beta,\; \sigma^{2}(\nu N)^{-1}\right)$, so that marginally $y \sim N\!\left(X\bar\beta,\; \sigma^{2}\left[I + X(\nu N)^{-1}X'\right]\right)$, where for now we assume that $\sigma$ is known.
$$ \Sigma(\nu) = I + X(\nu N)^{-1}X', \qquad f(y) = \left(2\pi\sigma^{2}\right)^{-T/2}\left|\Sigma(\nu)\right|^{-1/2} \exp\!\left(-\frac{1}{2\sigma^{2}}\,Q\right), $$
$$ Q = (y - X\bar\beta)'\,\Sigma(\nu)^{-1}\,(y - X\bar\beta). $$
Let $\nu = 1$; in general,
$$ \Sigma^{-1}(\nu) = \left[X(\nu N)^{-1}X' + I\right]^{-1} = I - X(\nu N + X'X)^{-1}X', $$
$$ \left|\Sigma(\nu)\right|^{-1} = |\nu N|\;|\nu N + X'X|^{-1}, \qquad \hat\beta = (X'X)^{-1}X'y. $$
Exercise. Prove that
$$ Q = (y - X\hat\beta)'(y - X\hat\beta) + (\hat\beta - \bar\beta)'\left[(\nu N)^{-1} + (X'X)^{-1}\right]^{-1}(\hat\beta - \bar\beta) $$
$$ \phantom{Q} = (y - X\bar\beta)'(y - X\bar\beta) - (\hat\beta - \bar\beta)'\,X'X\,(\nu N + X'X)^{-1}X'X\,(\hat\beta - \bar\beta). $$
For the case when $\sigma^{2}$ is unknown (the gamma-normal case), take, with $h = \sigma^{-2}$,
$$ \pi(\beta, h) = f_N\!\left(\beta \mid \bar\beta,\; h^{-1}(\nu N)^{-1}\right)\, f_\gamma\!\left(h \mid \bar s^{2},\, \bar\nu\right). $$
Then, as shown, we have the predictive density
$$ f(y\mid M) = \int\!\cdots\!\int f(y\mid\beta, h)\,\pi(\beta, h)\, d\beta\, dh = c(T, \bar\nu)\,\left|\bar s^{2}\,\Sigma\right|^{-1/2} \left(1 + \frac{Q}{\bar\nu\,\bar s^{2}}\right)^{-\frac{T + \bar\nu}{2}}, $$
where
$$ \Sigma^{-1} = I - X(\nu N + X'X)^{-1}X', \qquad |\Sigma| = |\nu N|^{-1}\,|\nu N + X'X|, $$
$$ c(T, \bar\nu) = (\bar\nu\,\pi)^{-T/2}\, \frac{\left(\frac{T + \bar\nu}{2} - 1\right)!}{\left(\frac{\bar\nu}{2} - 1\right)!}. $$
The Bayes factor for $M_1$ relative to $M_2$ is
$$ B_{12} = \frac{f(y\mid M_1)}{f(y\mid M_2)} = \frac{c(T, \bar\nu_1)}{c(T, \bar\nu_2)}\, \left(\frac{\bar s_2^{\,2}}{\bar s_1^{\,2}}\right)^{T/2} \frac{|\nu_1 N_1|^{1/2}\;|\nu_2 N_2 + X_2'X_2|^{1/2}} {|\nu_2 N_2|^{1/2}\;|\nu_1 N_1 + X_1'X_1|^{1/2}}\; \frac{\left(1 + \dfrac{Q_1}{\bar\nu_1\,\bar s_1^{\,2}}\right)^{-\frac{T + \bar\nu_1}{2}}} {\left(1 + \dfrac{Q_2}{\bar\nu_2\,\bar s_2^{\,2}}\right)^{-\frac{T + \bar\nu_2}{2}}}. $$
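The Bayes factor above can be evaluated in closed form; the sketch below assumes a particular conjugate setup ($\bar\beta = 0$, $\nu N = g\,X'X$ with $g = 0.1$, $\bar\nu = 1$, $\bar s^{2} = 1$, and fabricated data), so it illustrates the mechanics rather than any result in the text:

```python
import math

def log_marglik(yy, fit, k, T, g=0.1, nu_bar=1.0, s2_bar=1.0):
    """Log predictive density f(y|M) under an assumed g-prior:
    beta_bar = 0 and nu*N = g*X'X, so that Q = y'y - fit/(1+g) and
    |Sigma| = ((1+g)/g)**k, where fit = yhat'yhat (fitted sum of squares)."""
    Q = yy - fit / (1.0 + g)
    log_det = T * math.log(s2_bar) + k * math.log((1.0 + g) / g)
    return (math.lgamma((T + nu_bar) / 2.0) - math.lgamma(nu_bar / 2.0)
            - (T / 2.0) * math.log(nu_bar * math.pi)
            - 0.5 * log_det
            - ((T + nu_bar) / 2.0) * math.log(1.0 + Q / (nu_bar * s2_bar)))

# Illustrative fake data: y_t = 1 + 0.5 t + alternating wiggle.
T = 20
ts = list(range(1, T + 1))
y = [1.0 + 0.5 * ti + (0.1 if ti % 2 else -0.1) for ti in ts]

yy = sum(v * v for v in y)
ybar = sum(y) / T
fit1 = T * ybar ** 2                      # M1: intercept only, k = 1

tbar = sum(ts) / T                        # M2: intercept + trend, k = 2
sxx = sum((ti - tbar) ** 2 for ti in ts)
sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(ts, y))
b = sxy / sxx
rss2 = sum((yi - ybar - b * (ti - tbar)) ** 2 for ti, yi in zip(ts, y))
fit2 = yy - rss2

log_bf_21 = log_marglik(yy, fit2, 2, T) - log_marglik(yy, fit1, 1, T)
print(log_bf_21)  # clearly positive: the data favor the trend model
```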
D Diffuse Prior Approach

Let $|\nu N| \to 0$: make $\nu$ small and set $\bar\nu_1 = 0$ (in which case we don't get [...]). What types of diffuse priors? (Many exist, and they do not all lead to the same inference; this is a problem.) Take, for example, prior precisions shrinking at a common rate $r$:
$$ \nu_1 N_1 = \frac{r}{T^{2}}\, I_{k_1}, \qquad \nu_2 N_2 = \frac{r}{T^{2}}\, I_{k_2}, $$
and set $r \to 0$. This is like saying we don't have many a priori observations!
Then
$$ \frac{f(y\mid M_1)}{f(y\mid M_2)} \;\to\; \left(\frac{r}{T^{2}}\right)^{\frac{k_1 - k_2}{2}} \frac{|X_2'X_2|^{1/2}}{|X_1'X_1|^{1/2}} \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-T/2}, $$
where $\mathrm{ESS}_j$ is the error (residual) sum of squares of model $j$.
As $r \to 0$ this strongly favors the small-parameter model: for $k_1 < k_2$,
$$ B_{12} = \left(\frac{r}{T^{2}}\right)^{\frac{k_1 - k_2}{2}} \frac{|X_2'X_2|^{1/2}}{|X_1'X_1|^{1/2}} \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-T/2} \;\longrightarrow\; \infty. $$
There is no unique diffuse prior.
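The effect of letting $r \to 0$ can be seen directly from the formula; a sketch (the determinant ratio and ESS values are arbitrary illustrative placeholders, not from the text):

```python
# Sketch: with k1 < k2, the Bayes factor for the smaller model blows up
# as the diffuseness parameter r -> 0.

def bayes_factor_12(r, k1, k2, det_ratio, ess1, ess2, T):
    """B12 = r**((k1 - k2)/2) * det_ratio * (ess1/ess2)**(-T/2)."""
    return r ** ((k1 - k2) / 2.0) * det_ratio * (ess1 / ess2) ** (-T / 2.0)

for r in (1.0, 0.1, 0.01, 0.001):
    print(r, bayes_factor_12(r, 2, 3, 1.0, 1.05, 1.0, 50))
# Model 2 fits better (ESS2 < ESS1), yet B12 grows without bound as r -> 0.
```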
E Dominated Priors

As $T$, $X_1'X_1$, and $X_2'X_2$ grow, the data dominate the prior and
$$ \frac{f(y\mid M_1)}{f(y\mid M_2)} \;\approx\; T^{\frac{k_2 - k_1}{2}}\, \frac{|X_2'X_2/T|^{1/2}}{|X_1'X_1/T|^{1/2}} \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-T/2}. $$
To derive this, assume $X_j'X_j/T$ converges to a positive definite matrix.
You acquire the term $T^{\frac{k_2 - k_1}{2}}$ from the determinant ratio
$$ \frac{\left|\nu_2 N_2 + X_2'X_2\right|^{1/2}}{\left|\nu_1 N_1 + X_1'X_1\right|^{1/2}} = T^{\frac{k_2 - k_1}{2}}\; \frac{\left|\nu_2 N_2/T + X_2'X_2/T\right|^{1/2}}{\left|\nu_1 N_1/T + X_1'X_1/T\right|^{1/2}}, $$
and the term $\left(\mathrm{ESS}_1/\mathrm{ESS}_2\right)^{-T/2}$ from
$$ \frac{\left(1 + \dfrac{Q_1}{\bar\nu_1 \bar s_1^{\,2}}\right)^{-\frac{T + \bar\nu_1}{2}}} {\left(1 + \dfrac{Q_2}{\bar\nu_2 \bar s_2^{\,2}}\right)^{-\frac{T + \bar\nu_2}{2}}} = \frac{\left(\bar\nu_1 \bar s_1^{\,2}\right)^{\frac{T + \bar\nu_1}{2}} \left(\bar\nu_1 \bar s_1^{\,2} + Q_1\right)^{-\frac{T + \bar\nu_1}{2}}} {\left(\bar\nu_2 \bar s_2^{\,2}\right)^{\frac{T + \bar\nu_2}{2}} \left(\bar\nu_2 \bar s_2^{\,2} + Q_2\right)^{-\frac{T + \bar\nu_2}{2}}} \;\approx\; \left(\frac{Q_1}{Q_2}\right)^{-T/2} = \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-T/2}, $$
since $Q_j \to \mathrm{ESS}_j$ as the prior is dominated by the data.
Now observe: for nested models, the Bayes factor behaves like
$$ T^{q/2}\left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-T/2}, $$
where $q = k_2 - k_1$ is the number of restrictions. These methods easily handle non-nested setups (this is lacking in classical approaches), among other merits.

Measures of location:
$$ E(\beta\mid y) = \sum_j E(\beta\mid y, M_j)\, P(M_j\mid y). $$
This allows for the other models to generate information about parameters. It also gives a measure of variability.
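The $T^{q/2}(\mathrm{ESS}_1/\mathrm{ESS}_2)^{-T/2}$ behavior acts as a dimension penalty (it is the Schwarz/BIC comparison in disguise); a sketch with made-up ESS numbers:

```python
import math

def log_approx_bayes_factor(ess1, ess2, T, q):
    """log of T**(q/2) * (ess1/ess2)**(-T/2), with q = k2 - k1 restrictions.
    Positive values favor the restricted (smaller) model M1."""
    return (q / 2.0) * math.log(T) - (T / 2.0) * math.log(ess1 / ess2)

# A 1% fit improvement from one extra parameter is not worth it at T = 100:
print(log_approx_bayes_factor(101.0, 100.0, 100, 1))   # positive: favors M1
# A 20% improvement is:
print(log_approx_bayes_factor(120.0, 100.0, 100, 1))   # negative: favors M2
```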
Issues:
1. Bayesian confidence intervals.
2. Bayesian p-values?
3. Multiple hypothesis testing.
Classical model selection, etc.