Bayesian hypothesis testing, Dr. Jarad Niemi, STAT 544 - Iowa State University

SLIDE 1

Bayesian hypothesis testing

  • Dr. Jarad Niemi

STAT 544 - Iowa State University

March 7, 2019

Jarad Niemi (STAT544@ISU) Bayesian hypothesis testing March 7, 2019 1 / 25

SLIDE 2

Outline

  • Scientific method
  • Statistical hypothesis testing
      • Simple vs composite hypotheses
  • Simple Bayesian hypothesis testing
      • All simple hypotheses
      • All composite hypotheses
  • Propriety
      • Posterior
      • Prior predictive distribution
  • Bayesian hypothesis testing with mixed hypotheses (models)
      • Prior model probability
      • Prior for parameters in composite hypotheses
          • WARNING: do not use non-informative priors
      • Posterior model probability

SLIDE 3

Scientific method

http://www.wired.com/wiredscience/2013/04/whats-wrong-with-the-scientific-method/

SLIDE 4

Statistical hypothesis testing

Definition: A simple hypothesis specifies the value of every parameter of interest, while a composite hypothesis does not.

Let $Y_i \stackrel{ind}{\sim} \mathrm{Ber}(\theta)$ and

$$H_0: \theta = 0.5 \text{ (simple)} \qquad H_1: \theta \ne 0.5 \text{ (composite)}$$

SLIDE 5

Prior probabilities on simple hypotheses

What is your prior probability for the following hypotheses:

  • a coin flip has exactly 0.5 probability of landing heads
  • a fertilizer treatment has zero effect on plant growth
  • inactivation of a mouse growth gene has zero effect on mouse hair color
  • a butterfly flapping its wings in Australia has no effect on temperature in Ames
  • guessing the color of a card drawn from a deck has probability 0.5

Many null hypotheses have zero probability a priori, so why bother performing the hypothesis test?

SLIDE 6

Bayesian hypothesis testing with all simple hypotheses

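Let $Y \sim p(y|\theta)$ and $H_j: \theta = d_j$ for $j = 1, \ldots, J$. Treat this as a discrete prior on the $d_j$, i.e. $P(\theta = d_j) = p_j$. The posterior is then

$$P(\theta = d_j|y) = \frac{p_j\, p(y|d_j)}{\sum_{k=1}^J p_k\, p(y|d_k)} \propto p_j\, p(y|d_j).$$

For example, suppose $Y_i \stackrel{ind}{\sim} \mathrm{Ber}(\theta)$ and $P(\theta = d_j) = 1/11$ where $d_j = j/10$ for $j = 0, \ldots, 10$. The posterior is

$$P(\theta = d_j|y) \propto \frac{1}{11} \prod_{i=1}^n (d_j)^{y_i}(1 - d_j)^{1-y_i} \propto (d_j)^{n\bar{y}}(1 - d_j)^{n(1-\bar{y})}$$

where $\bar{y}$ is the sample mean. If $j = 0$ ($j = 10$), any $y_i = 1$ ($y_i = 0$) will make the posterior probability of $H_0$ ($H_{10}$) zero.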

SLIDE 7

Discrete prior example

n = 13; y = rbinom(n,1,.45); sum(y)
[1] 7

[Figure: prior and posterior probabilities (value) versus theta, for theta from 0 to 1.]
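The discrete-prior update can be sketched in a few lines. This is a minimal illustration in Python rather than the R used on the slide, and it assumes the data summary shown above (n = 13 trials, 7 successes); the variable names are illustrative only.

```python
# Assumed data summary from the slide: n = 13 Bernoulli trials, 7 successes.
n, successes = 13, 7

# Simple hypotheses d_j = j/10 for j = 0, ..., 10, each with prior mass 1/11.
grid = [j / 10 for j in range(11)]
prior = [1 / 11] * 11

# Unnormalized posterior: prior times likelihood d^s * (1 - d)^(n - s).
unnormalized = [p * d**successes * (1 - d) ** (n - successes)
                for p, d in zip(prior, grid)]

# Normalize so the posterior probabilities sum to one.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

for d, post in zip(grid, posterior):
    print(f"theta = {d:.1f}  prior = {1 / 11:.3f}  posterior = {post:.3f}")
```

Because the observed data contain both successes and failures, the endpoints theta = 0 and theta = 1 receive zero posterior probability.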

SLIDE 8

Bayesian hypothesis testing with all composite hypotheses

Let $Y \sim p(y|\theta)$ and $H_j: \theta \in (E_{j-1}, E_j]$ for $j = 1, \ldots, J$. Just calculate the area under the curve, i.e. prior probabilities are

$$P(H_j) = P(E_{j-1} < \theta < E_j) = \int_{E_{j-1}}^{E_j} p(\theta)\, d\theta$$

and posterior probabilities are

$$P(H_j|y) = P(E_{j-1} < \theta < E_j|y) = \int_{E_{j-1}}^{E_j} p(\theta|y)\, d\theta.$$

For example, suppose $Y_i \stackrel{ind}{\sim} \mathrm{Ber}(\theta)$ and $E_j = j/10$ for $j = 0, \ldots, 10$. Now, assume $\theta \sim \mathrm{Be}(1, 1)$ and thus $\theta|y \sim \mathrm{Be}(1 + n\bar{y}, 1 + n[1 - \bar{y}])$.

SLIDE 9

Beta example

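The posterior probabilities are 0.03, 0.12, 0.25, 0.3, 0.21, 0.08, 0.01 (the values printed on the figure).

[Figure: posterior density (y) versus theta (x, from 0 to 1), annotated with the interval probabilities above.]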
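A minimal sketch of the "area under the curve" calculation follows. It assumes the same data summary as the discrete example (n = 13, 7 successes), which the slide does not state explicitly, and it uses a simple midpoint rule as a stand-in for a library Beta CDF; `beta_pdf` and `interval_prob` are hypothetical helper names.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Be(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def interval_prob(lo, hi, a, b, steps=2000):
    """Approximate P(lo < theta < hi) under Be(a, b) with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(beta_pdf(lo + (k + 0.5) * width, a, b) for k in range(steps))

# Assumed data summary: n = 13 trials, 7 successes, with a Be(1, 1) prior.
n, successes = 13, 7
a_post, b_post = 1 + successes, 1 + n - successes  # posterior is Be(8, 7)

# Interval edges E_j = j/10 and posterior probability of each H_j.
edges = [j / 10 for j in range(11)]
probs = [interval_prob(edges[j - 1], edges[j], a_post, b_post) for j in range(1, 11)]

for j, p in enumerate(probs, start=1):
    print(f"H_{j}: ({edges[j - 1]:.1f}, {edges[j]:.1f}]  P(H_j|y) = {p:.2f}")
```

Under these assumptions the interval (0.5, 0.6] receives the largest posterior probability, and the probabilities sum to one up to numerical error.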

SLIDE 10

Tonelli’s Theorem (successor to Fubini’s Theorem)

Theorem: Tonelli's Theorem states that if $X$ and $Y$ are σ-finite measure spaces and $f$ is non-negative and measurable, then

$$\int_X \int_Y f(x, y)\, dy\, dx = \int_Y \int_X f(x, y)\, dx\, dy$$

i.e. you can interchange the integrals (or sums). On the following slides, the use of this theorem will be indicated by TT.

SLIDE 11

Proper priors with discrete data

Theorem: If the prior is proper and the data are discrete, then the posterior is always proper.

Proof. Let $p(\theta)$ be the prior and $p(y|\theta)$ be the statistical model. Thus, we need to show that

$$p(y) = \int_\Theta p(y|\theta)\, p(\theta)\, d\theta < \infty \quad \forall y.$$

For discrete $y$, we have

$$p(y) \le \sum_{z \in Y} p(z) = \sum_{z \in Y} \int_\Theta p(z|\theta)\, p(\theta)\, d\theta \stackrel{TT}{=} \int_\Theta \sum_{z \in Y} p(z|\theta)\, p(\theta)\, d\theta = \int_\Theta p(\theta)\, d\theta = 1.$$

Thus the posterior is always proper if $y$ is discrete and the prior is proper.

SLIDE 12

Proper priors with continuous data

Theorem: If the prior is proper and the data are continuous, then the posterior is almost always proper.

Proof. Let $p(\theta)$ be the prior and $p(y|\theta)$ be the statistical model. Thus, we need to show that

$$p(y) = \int_\Theta p(y|\theta)\, p(\theta)\, d\theta < \infty \quad \text{for almost all } y.$$

For continuous $y$, we have

$$\int_Y p(z)\, dz = \int_Y \int_\Theta p(z|\theta)\, p(\theta)\, d\theta\, dz \stackrel{TT}{=} \int_\Theta \int_Y p(z|\theta)\, dz\, p(\theta)\, d\theta = \int_\Theta p(\theta)\, d\theta = 1$$

thus $p(y)$ is finite except on a set of measure zero, i.e. $p(\theta|y)$ is almost always proper.

SLIDE 13

Proper prior predictive distributions

In the previous derivations when the prior is proper, we showed that

$$\sum_{z \in Y} p(z) = 1 \quad \text{and} \quad \int_Y p(z)\, dz = 1$$

for discrete and continuous data, respectively.

Corollary: When the prior is proper, the prior predictive distribution is also proper.
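As a numerical check of the corollary, the sketch below sums a prior predictive distribution over all possible outcomes. It assumes the data are summarized by the number of successes z in n Bernoulli trials with a proper Be(a, b) prior, so the prior predictive is beta-binomial; the values of n, a, and b are illustrative, and `prior_predictive` is a hypothetical helper name.

```python
import math

def log_beta(a, b):
    """Log of the beta function Beta(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def prior_predictive(z, n, a, b):
    """Beta-binomial p(z) = C(n, z) * Beta(a + z, b + n - z) / Beta(a, b)."""
    return math.comb(n, z) * math.exp(log_beta(a + z, b + n - z) - log_beta(a, b))

# Illustrative values (assumptions, not from the slides).
n, a, b = 13, 2.0, 3.0

# Summing over every possible outcome should give one for a proper prior.
total = sum(prior_predictive(z, n, a, b) for z in range(n + 1))
print(f"sum of prior predictive probabilities = {total:.12f}")
```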

SLIDE 14

Improper prior predictive distributions

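Theorem: If $p(\theta)$ is improper, then $p(y) = \int p(y|\theta)\, p(\theta)\, d\theta$ is improper.

Proof.

$$\int p(y)\, dy = \int \int p(y|\theta)\, p(\theta)\, d\theta\, dy \stackrel{TT}{=} \int p(\theta) \int p(y|\theta)\, dy\, d\theta = \int p(\theta)\, d\theta$$

Since $p(\theta)$ is improper, so is $p(y)$. A similar result holds for discrete $y$, replacing the integral with a sum.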
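A one-line worked example may make the theorem concrete. The model and flat prior below are an assumed illustration, not taken from the slides: $Y|\theta \sim N(\theta, 1)$ with $p(\theta) \propto 1$ on the real line.

```latex
% Assumed illustration: Y | \theta \sim N(\theta, 1), flat prior p(\theta) \propto 1.
p(y) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(y-\theta)^2/2} \, d\theta = 1
\quad \text{for every } y,
\qquad \text{so} \qquad
\int_{-\infty}^{\infty} p(y) \, dy = \int_{-\infty}^{\infty} 1 \, dy = \infty .
```

Here the prior predictive is constant in $y$, so it cannot be normalized, even though the posterior $\theta|y \sim N(y, 1)$ is proper.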

SLIDE 15

Bayesian hypothesis testing

To evaluate the relative plausibility of a hypothesis (model), we use the posterior model probability:

$$p(H_j|y) = \frac{p(y|H_j)\, p(H_j)}{p(y)} = \frac{p(y|H_j)\, p(H_j)}{\sum_{k=1}^J p(y|H_k)\, p(H_k)} \propto p(y|H_j)\, p(H_j)$$

where $p(H_j)$ is the prior model probability,

$$p(y|H_j) = \int p(y|\theta)\, p(\theta|H_j)\, d\theta$$

is the marginal likelihood under model $H_j$, and $p(\theta|H_j)$ is the prior for parameters $\theta$ when model $H_j$ is true.

SLIDE 16

Marginal likelihood

The marginal likelihood calculation differs for simple vs composite hypotheses:

  • Simple hypotheses can be considered to have a Dirac delta function for a prior, e.g. if $H_0: \theta = \theta_0$ then $\theta|H_0 \sim \delta_{\theta_0}$. Then the marginal likelihood is

    $$p(y|H_0) = \int p(y|\theta)\, p(\theta|H_0)\, d\theta = p(y|\theta_0).$$

  • Composite hypotheses have a continuous prior and thus

    $$p(y|H_j) = \int p(y|\theta)\, p(\theta|H_j)\, d\theta.$$

SLIDE 17

Two models

If we only have two models, $H_0$ and $H_1$, then

$$p(H_0|y) = \frac{p(y|H_0)\, p(H_0)}{p(y|H_0)\, p(H_0) + p(y|H_1)\, p(H_1)} = \frac{1}{1 + \frac{p(y|H_1)}{p(y|H_0)} \frac{p(H_1)}{p(H_0)}}$$

where

$$\frac{p(H_1)}{p(H_0)} = \frac{p(H_1)}{1 - p(H_1)}$$

is the prior odds in favor of $H_1$ and

$$BF(H_1 : H_0) = \frac{p(y|H_1)}{p(y|H_0)} = \frac{1}{BF(H_0 : H_1)}$$

is the Bayes factor for model $H_1$ relative to $H_0$.
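The two routes to the posterior model probability, normalizing over all models or combining a Bayes factor with prior odds, can be sketched as follows. The function names and the marginal likelihood values are hypothetical and chosen only for illustration.

```python
def posterior_model_probs(marginal_likelihoods, prior_probs):
    """p(H_j|y) proportional to p(y|H_j) p(H_j), normalized over all J models."""
    weights = [m * p for m, p in zip(marginal_likelihoods, prior_probs)]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_h0_from_bf(bf_10, prior_h1=0.5):
    """p(H0|y) = 1 / (1 + BF(H1:H0) * prior odds in favor of H1)."""
    prior_odds_h1 = prior_h1 / (1 - prior_h1)
    return 1 / (1 + bf_10 * prior_odds_h1)

# Hypothetical marginal likelihoods p(y|H0) and p(y|H1).
marginals = [0.02, 0.06]
probs = posterior_model_probs(marginals, [0.5, 0.5])

# The same answer via the Bayes factor BF(H1:H0) = p(y|H1) / p(y|H0).
via_bf = posterior_h0_from_bf(marginals[1] / marginals[0])
print(probs[0], via_bf)
```

With only two models the two computations agree, which is a quick sanity check on any implementation.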

SLIDE 18

Binomial model

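Consider a coin flipping experiment so that $Y_i \stackrel{ind}{\sim} \mathrm{Ber}(\theta)$ and the null hypothesis $H_0: \theta = 0.5$ versus the alternative $H_1: \theta \ne 0.5$ and $\theta|H_1 \sim \mathrm{Be}(a, b)$.

$$\begin{aligned}
BF(H_0 : H_1) &= \frac{0.5^n}{\int_0^1 \theta^{n\bar{y}}(1-\theta)^{n(1-\bar{y})}\, \frac{\theta^{a-1}(1-\theta)^{b-1}}{\mathrm{Beta}(a,b)}\, d\theta} \\
&= \frac{0.5^n}{\frac{1}{\mathrm{Beta}(a,b)} \int_0^1 \theta^{a+n\bar{y}-1}(1-\theta)^{b+n-n\bar{y}-1}\, d\theta} \\
&= \frac{0.5^n}{\frac{\mathrm{Beta}(a+n\bar{y},\, b+n-n\bar{y})}{\mathrm{Beta}(a,b)}} \\
&= \frac{0.5^n\, \mathrm{Beta}(a,b)}{\mathrm{Beta}(a+n\bar{y},\, b+n-n\bar{y})}
\end{aligned}$$

and with $p(H_0) = p(H_1)$ the posterior model probability is

$$P(H_0|y) = \frac{1}{1 + \frac{1}{BF(H_0 : H_1)}}.$$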
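The closed-form Bayes factor above is easy to evaluate on the log scale. The sketch below is one possible implementation, with hypothetical function names, evaluated at an assumed data summary of n = 13 trials and 7 successes with a Be(1, 1) prior under $H_1$.

```python
import math

def log_beta(a, b):
    """Log of the beta function Beta(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bf_h0_h1(n, successes, a=1.0, b=1.0):
    """BF(H0:H1) = 0.5^n Beta(a, b) / Beta(a + n*ybar, b + n - n*ybar)."""
    log_bf = (n * math.log(0.5) + log_beta(a, b)
              - log_beta(a + successes, b + n - successes))
    return math.exp(log_bf)

def prob_h0(n, successes, a=1.0, b=1.0, prior_h0=0.5):
    """Posterior probability of H0 from the Bayes factor and prior odds."""
    prior_odds_h1 = (1 - prior_h0) / prior_h0
    return 1 / (1 + prior_odds_h1 / bf_h0_h1(n, successes, a, b))

# Assumed data summary: n = 13 trials with 7 successes.
bf = bf_h0_h1(13, 7)
p0 = prob_h0(13, 7)
print(f"BF(H0:H1) = {bf:.3f}, P(H0|y) = {p0:.3f}")
```

Working with `lgamma` avoids overflow in the beta functions when n is large.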

SLIDE 19

Sample size and sample average

P(H0) = P(H1) = 0.5 and θ|H1 ∼ Be(1, 1):

[Figure: p(H0|y) (vertical axis, 0 to 1) versus ybar (horizontal axis, 0 to 1), one curve for each n = 10, 20, 30.]

SLIDE 20

“Non-informative” prior

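Recall that $\theta \sim \mathrm{Be}(a, b)$ corresponds to $a$ prior successes and $b$ prior failures. Thus, in some sense $a, b \to 0$ puts minimal prior data into the analysis. If $\theta|H_1 \sim \mathrm{Be}(e, e)$, then

$$BF(H_0 : H_1) = \frac{0.5^n\, \mathrm{Beta}(e, e)}{\mathrm{Beta}(e + n\bar{y},\, e + n - n\bar{y})} \xrightarrow{e \to 0} \infty$$

for any $\bar{y} \in (0, 1)$ since $\mathrm{Beta}(e, e) \xrightarrow{e \to 0} \infty$.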
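The divergence can be seen numerically. This sketch assumes equal prior model probabilities and an illustrative data summary of n = 13 trials with 7 successes; it evaluates P(H0|y) for a shrinking sequence of e values, and `prob_h0` is a hypothetical helper name.

```python
import math

def log_beta(a, b):
    """Log of the beta function Beta(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def prob_h0(n, successes, e):
    """P(H0|y) under theta|H1 ~ Be(e, e) with P(H0) = P(H1) = 0.5."""
    log_bf = (n * math.log(0.5) + log_beta(e, e)
              - log_beta(e + successes, e + n - successes))
    # P(H0|y) = 1 / (1 + 1/BF), written with the log Bayes factor.
    return 1 / (1 + math.exp(-log_bf))

# Illustrative data (an assumption): n = 13 trials, 7 successes.
epsilons = [1, 0.1, 0.01, 0.001, 1e-4]
probs = [prob_h0(13, 7, e) for e in epsilons]

for e, p in zip(epsilons, probs):
    print(f"e = {e:g}  P(H0|y) = {p:.5f}")
```

For these data the posterior probability of the null climbs toward one as e shrinks, even though the data themselves never change.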

SLIDE 21

Limit of proper prior

P(H0) = P(H1) = 0.5 and θ|H1 ∼ Be(e, e):

[Figure: p(H0|y) (vertical axis, 0 to 1) versus ybar (horizontal axis, 0 to 1), one curve for each e = 1e-04, 0.001, 0.01, 0.1, 1.]

SLIDE 22

Normal example

Consider the model $Y \sim N(\theta, 1)$ and the hypothesis test $H_0: \theta = 0$ versus $H_1: \theta \ne 0$ with prior $\theta|H_1 \sim N(0, C)$. The predictive distribution under $H_1$ is

$$p(y|H_1) = \int p(y|\theta)\, p(\theta|H_1)\, d\theta = N(y; 0, 1 + C)$$

and the Bayes factor is

$$BF(H_0 : H_1) = \frac{N(y; 0, 1)}{N(y; 0, 1 + C)}.$$

The Bayes factor tends to infinity as $C \to \infty$ for any fixed $y$ (it is increasing in $C$ once $C > y^2 - 1$), and this only gets worse if you use an improper prior.
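A minimal sketch of this calculation follows, assuming equal prior model probabilities and an illustrative observation y = 2; the function names are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (var is a variance, not a std. dev.)."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bf_h0_h1(y, c):
    """BF(H0:H1) = N(y; 0, 1) / N(y; 0, 1 + C)."""
    return normal_pdf(y, 0.0, 1.0) / normal_pdf(y, 0.0, 1.0 + c)

def prob_h0(y, c, prior_h0=0.5):
    """Posterior probability of H0 from the Bayes factor and prior odds."""
    posterior_odds_h0 = bf_h0_h1(y, c) * prior_h0 / (1 - prior_h0)
    return posterior_odds_h0 / (1 + posterior_odds_h0)

# Illustrative observation (an assumption): y = 2, two standard errors from zero.
for c in (1, 10, 100, 10_000, 1_000_000):
    print(f"C = {c:>9}  BF(H0:H1) = {bf_h0_h1(2.0, c):10.3f}  "
          f"P(H0|y) = {prob_h0(2.0, c):.3f}")
```

As the prior variance C grows, the same observation is read as increasingly strong support for the null, which is the behavior the warning about non-informative priors refers to.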

SLIDE 23

Normal example

[Figure: p(H0|y) (vertical axis, 0 to 1) versus C (horizontal axis, up to 100), one curve for each y = 1, 2, 3, 4, 5.]

SLIDE 24

Summary

  • Treat hypothesis testing as parameter estimation
      • All simple hypotheses: discrete prior
      • All composite hypotheses: continuous prior
  • Formal Bayesian hypothesis testing (simple and composite hypotheses)
      • Specify prior model probabilities
      • Specify parameter priors for composite hypotheses
          • WARNING: Do not use non-informative priors!
      • Calculate Bayes Factors or posterior model probabilities

SLIDE 25

Scientific method updated

"All models are wrong, but some are useful." (George Box, 1987)
