Hypothesis Testing Appendix. James J. Heckman, University of Chicago. PowerPoint PPT Presentation.


SLIDE 1

Hypothesis Testing

Appendix

James J. Heckman, University of Chicago, Econ 312, Spring 2019

SLIDE 2

A. Further Results on Bayesian vs. Classical Testing

(The phenomenon shows up even when we don't have ….)

A point mass $\pi_0$ is placed at $\theta = \theta_0$. For the rest of the parameter space we have mass $1 - \pi_0$, with $g_1$ the prior density of $\theta$ on the alternative; $f(x \mid \theta)$ is the model.

SLIDE 3

Marginal density of $x$:

$$m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta = f(x \mid \theta_0)\,\pi_0 + (1 - \pi_0)\,m_1(x), \qquad m_1(x) = \int f(x \mid \theta)\,g_1(\theta)\,d\theta.$$

SLIDE 4

Posterior probability that $\theta = \theta_0$:

$$\pi(\theta_0 \mid x) = \frac{f(x \mid \theta_0)\,\pi_0}{m(x)} = \frac{f(x \mid \theta_0)\,\pi_0}{f(x \mid \theta_0)\,\pi_0 + (1 - \pi_0)\,m_1(x)} = \left[1 + \frac{1 - \pi_0}{\pi_0}\cdot\frac{m_1(x)}{f(x \mid \theta_0)}\right]^{-1}.$$

SLIDE 5

The posterior odds ratio in favor of $\theta_0$ is

$$\frac{\pi(\theta_0 \mid x)}{1 - \pi(\theta_0 \mid x)} = \left(\frac{\pi_0}{1 - \pi_0}\right)\left(\frac{f(x \mid \theta_0)}{m_1(x)}\right),$$

with Bayes factor

$$B = \frac{f(x \mid \theta_0)}{m_1(x)}.$$
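The identity linking the Bayes factor to the posterior probability can be checked numerically; a minimal sketch (the function name is ours, not from the slides):

```python
def posterior_prob_null(bayes_factor, pi0=0.5):
    """Posterior probability that theta = theta0, from the identity
    pi(theta0 | x) = [1 + ((1 - pi0) / pi0) / B]^{-1},
    where B = f(x | theta0) / m1(x) is the Bayes factor."""
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) / bayes_factor)

# Even prior odds: B = 1 leaves the posterior probability at 1/2,
# and B = 3 turns 1:1 prior odds into 3:1 posterior odds.
print(posterior_prob_null(1.0))  # 0.5
print(posterior_prob_null(3.0))  # close to 0.75
```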

SLIDE 6

Example: Let $X_i \sim N(\theta, \sigma^2)$, $\sigma^2$ known, so

$$L\left(\bar{x} \mid \theta, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2/n}}\,\exp\left[-\frac{n\,(\bar{x} - \theta)^2}{2\sigma^2}\right].$$

Let $g_1 = N(\mu, \tau^2)$ be the prior on the alternative; then $m_1 = N\left(\mu, \tau^2 + \sigma^2/n\right)$ and

$$\pi(\theta_0 \mid \bar{x}) = \left[1 + \frac{1 - \pi_0}{\pi_0}\cdot\frac{\left[2\pi\left(\tau^2 + \sigma^2/n\right)\right]^{-1/2}\exp\left[-\dfrac{(\bar{x} - \mu)^2}{2\left(\tau^2 + \sigma^2/n\right)}\right]}{\left[2\pi\sigma^2/n\right]^{-1/2}\exp\left[-\dfrac{n\,(\bar{x} - \theta_0)^2}{2\sigma^2}\right]}\right]^{-1}.$$

Typically we center the prior at $\mu = \theta_0$. (This is a judgment about priors.)

SLIDE 7

With $\mu = \theta_0$ and $z = \sqrt{n}\,\left|\bar{x} - \theta_0\right|/\sigma$, the usual normal statistic (2-tail test),

$$\pi(\theta_0 \mid \bar{x}) = \left[1 + \frac{1 - \pi_0}{\pi_0}\cdot\frac{\exp\left(\dfrac{z^2}{2\left[1 + \sigma^2/(n\tau^2)\right]}\right)}{\left(1 + n\tau^2/\sigma^2\right)^{1/2}}\right]^{-1}.$$

As $n \to \infty$ with $z$ held fixed, $\pi(\theta_0 \mid \bar{x}) \to 1$: the "Lindley Paradox."

SLIDE 8

Look at the following table of values of $\pi(\theta_0 \mid \bar{x})$, with $\mu = \theta_0$, $\pi_0 = 1/2$, $\tau = \sigma$, and $n$ the sample size:

z score  p-value  n=1   n=5   n=10  n=20  n=50  n=100  n=1000
1.645    .1       .42   .44   .49   .56   .65   .72    .89
1.960    .05      .35   .33   .37   .42   .52   .60    .80
2.576    .01      .21   .13   .14   .16   .22   .27    .53
3.291    .001     .086  .026  .024  .026  .034  .045   .124

Observe: $\pi(\theta_0 \mid \bar{x})$ is not monotone in $n$.
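The table can be regenerated from the formula on the previous slide; a sketch (the helper name is ours):

```python
import math

def post_prob_null(z, n, pi0=0.5, tau2_over_sigma2=1.0):
    """pi(theta0 | xbar) for the normal example with the prior centered
    at theta0:
    [1 + ((1 - pi0)/pi0) * exp(z^2 / (2 (1 + sigma^2/(n tau^2))))
         / sqrt(1 + n tau^2 / sigma^2)]^{-1}."""
    r = tau2_over_sigma2                      # tau^2 / sigma^2
    num = math.exp(z * z / (2.0 * (1.0 + 1.0 / (n * r))))
    den = math.sqrt(1.0 + n * r)
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) * num / den)

# z = 1.645 row of the table: .42 .44 .49 .56 .65 .72 .89
print([round(post_prob_null(1.645, n), 2) for n in (1, 5, 10, 20, 50, 100, 1000)])
# The z = 1.960 row is not monotone in n: it dips from .35 to .33 before rising.
print([round(post_prob_null(1.960, n), 2) for n in (1, 5, 10, 20, 50)])
```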

SLIDE 9

Now consider the minimum of $\pi(\theta_0 \mid x)$ over all $g_1$, where $m_1$ is the marginal on the alternative.

Thm (Dickey and Savage):

$$\pi(\theta_0 \mid x) \geq \left[1 + \frac{1 - \pi_0}{\pi_0}\cdot\frac{r(x)}{f(x \mid \theta_0)}\right]^{-1}, \qquad r(x) = \sup_{\theta \neq \theta_0} f(x \mid \theta),$$

usually obtained by substituting in the maximum likelihood estimate.

SLIDE 10

The bound on the Bayes factor is

$$B = \frac{f(x \mid \theta_0)}{m_1(x)} \geq \frac{f(x \mid \theta_0)}{r(x)}.$$

The proof of this is really trivial.

SLIDE 11

Now look at the example, with $r(x)$ evaluated at the MLE: $r(x) = f\left(\bar{x} \mid \bar{x}\right)$, so

$$\pi(\theta_0 \mid \bar{x}) \geq \left[1 + \frac{1 - \pi_0}{\pi_0}\cdot\frac{\left(2\pi\sigma^2/n\right)^{-1/2}}{\left(2\pi\sigma^2/n\right)^{-1/2}\exp\left[-\dfrac{n\,(\bar{x} - \theta_0)^2}{2\sigma^2}\right]}\right]^{-1} = \left[1 + \frac{1 - \pi_0}{\pi_0}\,\exp\left(\frac{z^2}{2}\right)\right]^{-1}.$$

For $\pi_0 = \frac{1}{2}$ we get the bounds tabulated on the next slide (Berger and Sellke, JASA, 1987).
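A sketch reproducing the tabulated bounds on the next slide (the function names are ours):

```python
import math

def lower_bound_post(z, pi0=0.5):
    """Lower bound on pi(theta0 | x) over all alternative densities g1:
    substitute r(x) = sup_theta f(x | theta) (the MLE plug-in) to get
    [1 + ((1 - pi0) / pi0) exp(z^2 / 2)]^{-1}."""
    return 1.0 / (1.0 + ((1.0 - pi0) / pi0) * math.exp(z * z / 2.0))

def bayes_factor_bound(z):
    """Matching lower bound on the Bayes factor: exp(-z^2 / 2)."""
    return math.exp(-z * z / 2.0)

for z in (1.645, 1.960, 2.576, 3.291):
    print(z, lower_bound_post(z), 1.0 / bayes_factor_bound(z))
```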

SLIDE 12

Compare the lower bounds on $\pi(\theta_0 \mid x)$ with the corresponding $p$-values:

z score  p-value  Lower bound on π(θ₀ | x)  Bound on Bayes factor
1.645    .1       .205                      1/(3.87)
1.960    .05      .127                      1/(6.83)
2.576    .01      .035                      1/(27.60)
3.291    .001     .0044                     1/(224.83)

SLIDE 13

Thm (Berger and Sellke). Restrict the alternative priors to the class

$$G_S = \left\{\, g_1 : g_1 \text{ is symmetric about } \theta_0 \text{ and non-increasing in } |\theta - \theta_0| \,\right\}.$$

Then the bounds become:

z score  p-value  Lower bound on π(θ₀ | x)  Bound on Bayes factor
1.645    .1       .390                      1/(1.56)
1.960    .05      .290                      1/(2.45)
2.576    .01      .109                      1/(8.17)
3.291    .001     .018                      1/(54.55)

SLIDE 14

B. Bayesian Credibility Sets

Def. A $100(1 - \alpha)\%$ credible set for $\theta$ is a subset $C$ of $\Theta$ such that

$$1 - \alpha \leq P(C \mid x) = \int_C dF(\theta \mid x) = \int_C \pi(\theta \mid x)\,d\theta.$$

SLIDE 15

What is the "best" such set? A $100(1 - \alpha)\%$ highest posterior density (HPD) credible set is the subset $C$ of $\Theta$ given by

$$C = \left\{\theta \in \Theta : \pi(\theta \mid x) \geq k(\alpha)\right\},$$

where $k(\alpha)$ is the largest constant such that $P(C \mid x) \geq 1 - \alpha$.

SLIDE 16

Example. Let $\pi(\theta \mid x) = N\left(\mu(x), \sigma^2\right)$ be the posterior. A $100(1 - \alpha)\%$ HPD credible set is given by

$$\left(\mu(x) - \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)\sigma,\;\; \mu(x) + \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)\sigma\right),$$

identical to the classical case for normal posteriors. Consider, however, what would happen in the Dickey-Leamer case: a possibility of disconnected credible sets (multimodal posteriors). This can happen in the classical case.
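The "identical to the classical case" claim can be checked numerically; a stdlib-only sketch (the function name is ours; bisection on the normal CDF stands in for a quantile routine):

```python
import math

def normal_hpd(mu, sigma, alpha=0.05):
    """HPD credible set for a N(mu, sigma^2) posterior. Because the density
    is symmetric and unimodal, {theta : pi(theta | x) >= k(alpha)} is the
    central interval mu +/- z_{1 - alpha/2} sigma; the quantile is found by
    bisection on Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return mu - z * sigma, mu + z * sigma

low, high = normal_hpd(0.0, 1.0)
print(low, high)  # roughly -1.96 and 1.96, the classical 95% interval
```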

SLIDE 17

Problems. The forms of credible sets are sensitive to the parametrization chosen.

Example. Suppose the posterior is

$$\pi(\theta \mid x) = \frac{2}{\pi}\cdot\frac{1}{1 + \theta^2}, \qquad \theta \geq 0,$$

so that

$$\ln \pi(\theta \mid x) = \ln\frac{2}{\pi} - \ln\left(1 + \theta^2\right), \qquad \frac{\partial \ln \pi(\theta \mid x)}{\partial \theta} = -\frac{2\theta}{1 + \theta^2}.$$

SLIDE 18

$$\frac{\partial^2 \ln \pi(\theta \mid x)}{\partial \theta^2} = -\frac{2}{1 + \theta^2} + \frac{(2\theta)(2\theta)}{\left(1 + \theta^2\right)^2} = \frac{2}{1 + \theta^2}\left[\frac{2\theta^2}{1 + \theta^2} - 1\right] = \frac{2}{1 + \theta^2}\cdot\frac{\theta^2 - 1}{1 + \theta^2} \geq 0 \quad \text{for } \theta \geq 1.$$

The density itself is decreasing in $\theta$, so the HPD credible set is $(0, c(\alpha))$.

SLIDE 19

Use a transformation: $\eta = \exp(-\theta)$, so that

$$\pi(\eta \mid x) = \frac{2}{\pi}\cdot\frac{1}{\eta}\left(1 + (\log \eta)^2\right)^{-1},$$

which is decreasing in $\eta$ on $(0, 1)$. So we have a credible set given by $(0, d(\alpha))$ in $\eta$ and, in terms of the original coordinate system, by $(-\ln d(\alpha), \infty)$ (upper tail in the original parametrization, lower tail in the new parametrization).
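The non-invariance can be illustrated in closed form, assuming the transformation is $\eta = \exp(-\theta)$ applied to the half-Cauchy posterior of the previous slides (the function names are ours):

```python
import math

def hpd_theta(alpha=0.05):
    """HPD set for pi(theta | x) = (2/pi) / (1 + theta^2), theta >= 0:
    the density is decreasing, so the set is (0, c) with
    (2/pi) arctan(c) = 1 - alpha."""
    return 0.0, math.tan((1.0 - alpha) * math.pi / 2.0)

def hpd_theta_via_eta(alpha=0.05):
    """Transform eta = exp(-theta). The transformed density
    (2/pi) / (eta (1 + (log eta)^2)) is decreasing on (0, 1), so its HPD
    set is (0, d); mapped back to theta it is the UPPER tail
    (-log d, infinity)."""
    d = math.exp(-math.tan(alpha * math.pi / 2.0))
    return -math.log(d), math.inf

print(hpd_theta())          # lower tail: (0, 12.7...)
print(hpd_theta_via_eta())  # upper tail: (0.078..., inf)
# Same 95% coverage, different sets: HPD regions are not invariant
# to reparametrization.
```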

SLIDE 20

C. Model Selection in a Bayesian Way

The common classical approach is to use a pretest procedure. Instead we seek $p(y \mid M_i)$ for each model $M_i$: $y = X_i \beta_i + u_i$, $u_i \sim N\left(0, h^{-1} I\right)$, $h = \sigma^{-2}$. The predictive density under the hypothesis $M_i$ is

$$p(y \mid M_i) = \int\!\!\int f(y \mid \beta_i, h, M_i)\,\pi(\beta_i, h \mid M_i)\,d\beta_i\,dh, \qquad p(M_i \mid y) = \frac{p(y \mid M_i)\,\pi(M_i)}{p(y)}.$$

Prior $\pi(\beta)$: $N\left(\bar{\beta}, (hA)^{-1}\right)$, which makes the predictive distribution of $y$ normal with covariance proportional to $I + X (hA)^{-1} X'$, where we assume that $h$ is known.

SLIDE 21

$$\Sigma(A) = I + X (hA)^{-1} X', \qquad m(y) = (2\pi)^{-n/2}\,\left|\Sigma(A)\right|^{-1/2}\exp\left(-\frac{1}{2}\,Q\right), \qquad Q = \left(y - X\bar{\beta}\right)'\,\Sigma(A)^{-1}\left(y - X\bar{\beta}\right).$$

SLIDE 22

Let $h = 1$. Then

$$\Sigma(A)^{-1} = \left[X A^{-1} X' + I\right]^{-1} = I - X\left(X'X + A\right)^{-1} X', \qquad \left|\Sigma(A)\right|^{-1} = |A|\,\left|A + X'X\right|^{-1},$$

and the least-squares estimator is $\hat{\beta} = (X'X)^{-1} X' y$.

SLIDE 23

Exercise. Prove that

$$Q = \left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right) + \left(\hat{\beta} - \bar{\beta}\right)'\left[A^{-1} + (X'X)^{-1}\right]^{-1}\left(\hat{\beta} - \bar{\beta}\right) = \left(y - X\bar{\beta}\right)'\left(y - X\bar{\beta}\right) - \left(\hat{\beta} - \bar{\beta}\right)' X'X \left(A + X'X\right)^{-1} X'X \left(\hat{\beta} - \bar{\beta}\right).$$

For the case when $h = \sigma^{-2}$ is unknown (the gamma-normal case), take

$$\pi(\beta, h) = \pi(\beta \mid h)\,\pi(h), \qquad \beta \mid h \sim N\left(\bar{\beta}, (hA)^{-1}\right), \qquad h \sim \gamma\left(h \mid \bar{s}^2, \bar{\nu}\right).$$
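The exercise can be spot-checked numerically in the one-regressor case, where every matrix is a scalar; a pure-Python sketch with made-up data (all numbers ours):

```python
# One regressor (k = 1): X is a column x, prior precision A is a scalar a.
# Then Q = (y - x*bbar)' Sigma^{-1} (y - x*bbar) with
# Sigma^{-1} = I - x x' / (a + x'x), and the claimed decomposition is
# Q = e'e + (bhat - bbar)^2 / (1/a + 1/x'x), where e = y - x*bhat.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
a, bbar = 2.0, 0.5

xx = sum(v * v for v in x)                        # x'x
xy = sum(u * v for u, v in zip(x, y))             # x'y
bhat = xy / xx                                    # least-squares estimate

r = [v - u * bbar for u, v in zip(x, y)]          # y - x*bbar
xr = sum(u * v for u, v in zip(x, r))             # x'(y - x*bbar)
Q = sum(v * v for v in r) - xr * xr / (a + xx)    # quadratic form directly

e = [v - u * bhat for u, v in zip(x, y)]          # residuals
Q_decomposed = sum(v * v for v in e) + (bhat - bbar) ** 2 / (1.0 / a + 1.0 / xx)

print(abs(Q - Q_decomposed) < 1e-12)  # True: the two expressions agree
```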

SLIDE 24

Then, as shown, we have the predictive density

$$m(y) = \int\cdots\int f(y \mid \beta, h)\,\pi(\beta, h)\,d\beta\,dh = c\left(\bar{\nu}, n\right)\left|\Sigma(A)\right|^{-1/2}\left(1 + \frac{Q}{\bar{\nu}\,\bar{s}^2}\right)^{-\frac{n + \bar{\nu}}{2}},$$

with

$$\Sigma(A)^{-1} = I - X\left(A + X'X\right)^{-1} X', \qquad \left|\Sigma(A)\right| = |A|^{-1}\left|A + X'X\right|, \qquad c\left(\bar{\nu}, n\right) = \left(\pi\bar{\nu}\bar{s}^2\right)^{-n/2}\,\frac{\left(\frac{1}{2}\bar{\nu} + \frac{1}{2}n - 1\right)!}{\left(\frac{1}{2}\bar{\nu} - 1\right)!}.$$

SLIDE 25

The Bayes factor for $M_1$ relative to $M_2$ is

$$B_{12} = \frac{c\left(\bar{\nu}_1, n\right)}{c\left(\bar{\nu}_2, n\right)}\cdot\frac{\left|A_1\right|^{1/2}\left|A_1 + X_1'X_1\right|^{-1/2}}{\left|A_2\right|^{1/2}\left|A_2 + X_2'X_2\right|^{-1/2}}\cdot\frac{\left(1 + \dfrac{Q_1}{\bar{\nu}_1\bar{s}_1^2}\right)^{-\frac{n + \bar{\nu}_1}{2}}}{\left(1 + \dfrac{Q_2}{\bar{\nu}_2\bar{s}_2^2}\right)^{-\frac{n + \bar{\nu}_2}{2}}}.$$

SLIDE 26

D. Diffuse Prior Approach

Let the prior precision $A$ get small and set $\bar{\nu} = 0$ (in which case we don't get $\bar{s}^2$). What types of diffuse priors? Many exist, and they do not all lead to the same inference; that is a problem. For example, take $A = g\,I_k$ and let $g \to 0$: this is like saying we don't have many a priori observations!

SLIDE 27

$$\frac{p(y \mid M_1)}{p(y \mid M_2)} \to \frac{g_1^{k_1/2}}{g_2^{k_2/2}}\cdot\frac{\left|X_1'X_1\right|^{-1/2}}{\left|X_2'X_2\right|^{-1/2}}\cdot\left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2},$$

where $\mathrm{ESS}_i$ is the error (residual) sum of squares of model $i$.

SLIDE 28

This strongly favors a small-parameter model:

$$B_{12} = \frac{\left|X_1'X_1\right|^{-1/2}}{\left|X_2'X_2\right|^{-1/2}}\left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2}, \qquad \text{or} \qquad B_{12} = \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2}$$

under another choice of diffuse prior. There is no unique diffuse prior.
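A quick illustration, with made-up ESS figures (not from the slides), of how sharply the limiting factor discriminates as $n$ grows: even a modest fit advantage for one model becomes decisive.

```python
def ess_ratio_bayes_factor(ess1, ess2, n):
    """Limiting Bayes factor (ESS1 / ESS2)^(-n/2) for M1 vs M2 under the
    diffuse-prior approach; the ESS values here are illustrative only."""
    return (ess1 / ess2) ** (-n / 2.0)

# Suppose M2 fits a bit better (5% smaller error sum of squares):
for n in (10, 50, 200):
    print(n, ess_ratio_bayes_factor(10.5, 10.0, n))
# The factor favors M2 more and more strongly as n grows.
```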

SLIDE 29

E. Dominated Priors

As $n$, $X'X$, and the sample information grow, the prior is dominated and

$$\frac{p(y \mid M_1)}{p(y \mid M_2)} \to \frac{\left|X_1'X_1\right|^{-1/2}}{\left|X_2'X_2\right|^{-1/2}}\left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2}.$$

SLIDE 30

You acquire the $\mathrm{ESS}$ term from the determinant and the kernel:

$$\frac{\left|A_1 + X_1'X_1\right|^{-1/2}}{\left|A_2 + X_2'X_2\right|^{-1/2}}\times\frac{\left(1 + \dfrac{Q_1}{\bar{\nu}_1\bar{s}_1^2}\right)^{-\frac{n + \bar{\nu}_1}{2}}}{\left(1 + \dfrac{Q_2}{\bar{\nu}_2\bar{s}_2^2}\right)^{-\frac{n + \bar{\nu}_2}{2}}},$$

and for large $Q_i$ the kernel ratio behaves like

$$\frac{\left(\bar{\nu}_1\bar{s}_1^2\right)^{\frac{n + \bar{\nu}_1}{2}}}{\left(\bar{\nu}_2\bar{s}_2^2\right)^{\frac{n + \bar{\nu}_2}{2}}}\cdot\frac{Q_1^{-\frac{n + \bar{\nu}_1}{2}}}{Q_2^{-\frac{n + \bar{\nu}_2}{2}}}.$$

SLIDE 31

Since $Q_i \to \mathrm{ESS}_i$ as the prior is dominated, the kernel ratio reduces, up to constant terms, to

$$\frac{\mathrm{ESS}_1^{-\frac{n + \bar{\nu}_1}{2}}}{\mathrm{ESS}_2^{-\frac{n + \bar{\nu}_2}{2}}} \approx \mathrm{const}\times\left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2}.$$

SLIDE 32

Now observe: for nested models ($M_1$ restricted, $M_2$ unrestricted) the Bayes factor behaves like

$$B_{12} \approx \left(\frac{\mathrm{ESS}_1}{\mathrm{ESS}_2}\right)^{-n/2} = \left(1 + \frac{q\,F}{n - k_2}\right)^{-n/2},$$

where $q = k_2 - k_1$ is the number of restrictions and $F$ is the classical $F$ statistic. These methods easily handle non-nested setups (this is lacking in classical approaches), among other merits. Measures of location:

$$E(\beta \mid y) = \sum_i E\left(\beta \mid y, M_i\right) P\left(M_i \mid y\right).$$

This allows other models to generate information about the parameters. It also gives a measure of variability.
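The model-averaging formula can be sketched directly; the posterior model probabilities and moments below are illustrative stand-ins, not from the slides:

```python
# Bayesian model averaging: E(beta | y) = sum_i E(beta | y, M_i) P(M_i | y),
# and the total variance adds a between-model spread term to the
# within-model variances (the "measure of variability" on the slide).
post_model = {"M1": 0.7, "M2": 0.3}    # P(M_i | y), illustrative
post_mean = {"M1": 1.2, "M2": 0.8}     # E(beta | y, M_i)
post_var = {"M1": 0.04, "M2": 0.09}    # Var(beta | y, M_i)

mean = sum(post_model[m] * post_mean[m] for m in post_model)
var = sum(post_model[m] * (post_var[m] + (post_mean[m] - mean) ** 2)
          for m in post_model)
print(mean)  # close to 1.08
print(var)   # within-model variance plus between-model disagreement
```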

SLIDE 33

Issues:

  • 1. Bayesian confidence intervals.
  • 2. Bayesian p-values?
  • 3. Multiple hypothesis testing.

Classical model selection, etc.