Bayesian hypothesis testing
Dr. Jarad Niemi
STAT 544 - Iowa State University
March 7, 2019

Outline
- Scientific method
- Statistical hypothesis testing
  - Simple vs composite hypotheses
- Simple Bayesian hypothesis testing
  - All simple hypotheses
  - All composite hypotheses
- Propriety
  - Posterior
  - Prior predictive distribution
- Bayesian hypothesis testing with mixed hypotheses (models)
  - Prior model probability
  - Prior for parameters in composite hypotheses
  - WARNING: do not use non-informative priors
  - Posterior model probability

Scientific method
http://www.wired.com/wiredscience/2013/04/whats-wrong-with-the-scientific-method/

Statistical hypothesis testing
Definition: A simple hypothesis specifies the value for all parameters of interest, while a composite hypothesis does not.
Let $Y_i \stackrel{ind}{\sim} Ber(\theta)$ and
$H_0: \theta = 0.5$ (simple)
$H_1: \theta \ne 0.5$ (composite)

Prior probabilities on simple hypotheses
What is your prior probability for the following hypotheses:
- a coin flip has exactly 0.5 probability of landing heads
- a fertilizer treatment has zero effect on plant growth
- inactivation of a mouse growth gene has zero effect on mouse hair color
- a butterfly flapping its wings in Australia has no effect on temperature in Ames
- guessing the color of a card drawn from a deck has probability 0.5
Many null hypotheses have zero probability a priori, so why bother performing the hypothesis test?

Bayesian hypothesis testing with all simple hypotheses
Let $Y \sim p(y|\theta)$ and $H_j: \theta = d_j$ for $j = 1, \ldots, J$. Treat this as a discrete prior on the $d_j$, i.e. $P(\theta = d_j) = p_j$. The posterior is then
$$P(\theta = d_j | y) = \frac{p_j \, p(y|d_j)}{\sum_{k=1}^J p_k \, p(y|d_k)} \propto p_j \, p(y|d_j).$$
For example, suppose $Y_i \stackrel{ind}{\sim} Ber(\theta)$ and $P(\theta = d_j) = 1/11$ where $d_j = j/10$ for $j = 0, \ldots, 10$. The posterior is
$$P(\theta = d_j | y) \propto \frac{1}{11} \prod_{i=1}^n (d_j)^{y_i} (1-d_j)^{1-y_i} \propto (d_j)^{n\bar{y}} (1-d_j)^{n(1-\bar{y})}.$$
If $j = 0$ ($j = 10$), any $y_i = 1$ ($y_i = 0$) will make the posterior probability of $H_0$ ($H_{10}$) zero.

Discrete prior example
n = 13; y = rbinom(n,1,.45); sum(y)
[1] 7
[Figure: prior and posterior probability (vertical axis: value) at each grid point of theta (horizontal axis, 0 to 1).]
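The discrete-prior update in this example can be sketched in a few lines. This is a minimal illustration and not the code behind the slide's figure: the function name `discrete_posterior` is hypothetical, and the data (n = 13 trials with 7 successes) are taken from the R output above.

```python
# Discrete-prior posterior for Bernoulli data (illustrative sketch).
# Assumed data: n = 13 trials with 7 successes, as in the R output above.

def discrete_posterior(grid, prior, n, successes):
    """Return P(theta = d_j | y) for each d_j in grid."""
    # Unnormalized posterior: prior times the likelihood d^s (1 - d)^(n - s).
    unnorm = [p * d**successes * (1 - d)**(n - successes)
              for d, p in zip(grid, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

grid = [j / 10 for j in range(11)]   # d_j = j/10 for j = 0, ..., 10
prior = [1 / 11] * 11                # uniform discrete prior
posterior = discrete_posterior(grid, prior, n=13, successes=7)

for d, post in zip(grid, posterior):
    print(f"theta = {d:.1f}  posterior = {post:.3f}")
```

Because the data contain both successes and failures, the endpoints theta = 0 and theta = 1 receive posterior probability zero, as noted on the previous slide.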

Bayesian hypothesis testing with all composite hypotheses
Let $Y \sim p(y|\theta)$ and $H_j: \theta \in (E_{j-1}, E_j]$ for $j = 1, \ldots, J$. Just calculate the area under the curve, i.e. prior probabilities are
$$P(H_j) = P(E_{j-1} < \theta < E_j) = \int_{E_{j-1}}^{E_j} p(\theta) \, d\theta$$
and posterior probabilities are
$$P(H_j | y) = P(E_{j-1} < \theta < E_j | y) = \int_{E_{j-1}}^{E_j} p(\theta | y) \, d\theta.$$
For example, suppose $Y_i \stackrel{ind}{\sim} Ber(\theta)$ and $E_j = j/10$ for $j = 0, \ldots, 10$. Now, assume $\theta \sim Be(1, 1)$ and thus $\theta | y \sim Be(1 + n\bar{y}, 1 + n[1 - \bar{y}])$.

Beta example
The posterior probabilities are
[Figure: posterior density (vertical axis y, 0 to 3) against theta (horizontal axis x, 0 to 1), with the posterior probability of each interval $(E_{j-1}, E_j]$ annotated, in order for $j = 1, \ldots, 10$: 0, 0.03, 0.12, 0.25, 0.30, 0.21, 0.08, 0.01, 0, 0.]
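The interval probabilities can be computed from the Beta posterior CDF. The sketch below assumes, for illustration only, n = 13 trials with 7 successes; the figure on the slide may be based on a different simulated data set, so its annotated values need not match. The helper `beta_cdf_int` is a hypothetical name. It relies on the standard identity that, for positive integers a and b, the Be(a, b) CDF at x equals P(Bin(a + b - 1, x) >= a), which applies here because a Be(1, 1) prior with Bernoulli data always gives integer posterior parameters.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, i) * x**i * (1 - x)**(m - i) for i in range(a, m + 1))

# Assumed data for illustration: n = 13 trials, 7 successes.
n, successes = 13, 7
a_post, b_post = 1 + successes, 1 + n - successes   # Be(1, 1) prior -> Be(8, 7)

edges = [j / 10 for j in range(11)]                 # E_j = j/10
cdf = [beta_cdf_int(e, a_post, b_post) for e in edges]
interval_probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

for j, p in enumerate(interval_probs, start=1):
    print(f"P({edges[j - 1]:.1f} < theta <= {edges[j]:.1f} | y) = {p:.3f}")
```

For non-integer posterior parameters, a numerical Beta CDF routine would be needed instead of this identity.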

Tonelli's Theorem (successor to Fubini's Theorem)
Theorem: Tonelli's Theorem states that if $X$ and $Y$ are $\sigma$-finite measure spaces and $f$ is non-negative and measurable, then
$$\int_X \int_Y f(x, y) \, dy \, dx = \int_Y \int_X f(x, y) \, dx \, dy,$$
i.e. you can interchange the integrals (or sums). On the following slides, the use of this theorem will be indicated by TT.

Proper priors with discrete data
Theorem: If the prior is proper and the data are discrete, then the posterior is always proper.
Proof. Let $p(\theta)$ be the prior and $p(y|\theta)$ be the statistical model. Thus, we need to show that
$$p(y) = \int_\Theta p(y|\theta) p(\theta) \, d\theta < \infty \quad \forall y.$$
For discrete $y$, we have
$$p(y) \le \sum_{z \in \mathcal{Y}} p(z) = \sum_{z \in \mathcal{Y}} \int_\Theta p(z|\theta) p(\theta) \, d\theta \stackrel{TT}{=} \int_\Theta \sum_{z \in \mathcal{Y}} p(z|\theta) p(\theta) \, d\theta = \int_\Theta p(\theta) \, d\theta = 1.$$
Thus the posterior is always proper if $y$ is discrete and the prior is proper.

Proper priors with continuous data
Theorem: If the prior is proper and the data are continuous, then the posterior is almost always proper.
Proof. Let $p(\theta)$ be the prior and $p(y|\theta)$ be the statistical model. Thus, we need to show that
$$p(y) = \int_\Theta p(y|\theta) p(\theta) \, d\theta < \infty \quad \text{for almost all } y.$$
For continuous $y$, we have
$$\int_{\mathcal{Y}} p(z) \, dz = \int_{\mathcal{Y}} \int_\Theta p(z|\theta) p(\theta) \, d\theta \, dz \stackrel{TT}{=} \int_\Theta \left[ \int_{\mathcal{Y}} p(z|\theta) \, dz \right] p(\theta) \, d\theta = \int_\Theta p(\theta) \, d\theta = 1,$$
thus $p(y)$ is finite except on a set of measure zero, i.e. $p(\theta|y)$ is almost always proper.

Proper prior predictive distributions
In the previous derivations when the prior is proper, we showed that
$$\sum_{z \in \mathcal{Y}} p(z) = 1 \quad \text{and} \quad \int_{\mathcal{Y}} p(z) \, dz = 1$$
for discrete and continuous data, respectively.
Corollary: When the prior is proper, the prior predictive distribution is also proper.
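The corollary can be checked numerically in a conjugate case. The sketch below assumes a count of successes Y | theta ~ Bin(n, theta) with theta ~ Be(a, b), for which the prior predictive is p(y) = C(n, y) Beta(a + y, b + n - y) / Beta(a, b). The values n = 13, a = 2, b = 3 and the function names are illustrative assumptions, not taken from the slides.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prior_predictive(y, n, a, b):
    """p(y) for y successes in n trials when theta ~ Be(a, b)."""
    return comb(n, y) * exp(log_beta(a + y, b + n - y) - log_beta(a, b))

n, a, b = 13, 2.0, 3.0   # illustrative values, not from the slides
probs = [prior_predictive(y, n, a, b) for y in range(n + 1)]

# The prior Be(a, b) is proper, so the prior predictive should sum to 1
# (up to floating-point error).
print(sum(probs))
```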

Improper prior predictive distributions
Theorem: If $p(\theta)$ is improper, then $p(y) = \int p(y|\theta) p(\theta) \, d\theta$ is improper.
Proof.
$$\int p(y) \, dy = \int \int p(y|\theta) p(\theta) \, d\theta \, dy \stackrel{TT}{=} \int p(\theta) \int p(y|\theta) \, dy \, d\theta = \int p(\theta) \, d\theta.$$
Since $p(\theta)$ is improper, so is $p(y)$. A similar result holds for discrete $y$, replacing the integral with a sum.
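A short worked case may make the theorem concrete. The normal model and flat prior below are assumptions chosen for illustration and do not appear on the slides: take $Y | \theta \sim N(\theta, 1)$ with $p(\theta) \propto 1$ on the real line.

```latex
\begin{align*}
p(y) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}
        e^{-(y-\theta)^2/2} \cdot 1 \, d\theta = 1
        \quad \text{for every } y, \\
\int_{-\infty}^{\infty} p(y) \, dy &= \int_{-\infty}^{\infty} 1 \, dy = \infty .
\end{align*}
```

So $p(y)$ is finite at every $y$ but does not integrate to a finite value, i.e. it is not a proper distribution over $y$.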

Bayesian hypothesis testing
To evaluate the relative plausibility of a hypothesis (model), we use the posterior model probability:
$$p(H_j | y) = \frac{p(y|H_j) p(H_j)}{p(y)} = \frac{p(y|H_j) p(H_j)}{\sum_{k=1}^J p(y|H_k) p(H_k)} \propto p(y|H_j) p(H_j),$$
where $p(H_j)$ is the prior model probability,
$$p(y|H_j) = \int p(y|\theta) p(\theta|H_j) \, d\theta$$
is the marginal likelihood under model $H_j$, and $p(\theta|H_j)$ is the prior for parameters $\theta$ when model $H_j$ is true.
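Once the marginal likelihoods are available, the normalization is mechanical. The sketch below is one possible way to do it, working on the log scale because marginal likelihoods are often very small. The function name and the three log marginal likelihood values are hypothetical, and all prior model probabilities are assumed to be strictly positive.

```python
from math import exp, log

def posterior_model_probs(log_marglik, prior_probs):
    """Normalize p(y | H_j) p(H_j) over models, working on the log scale.

    Assumes every prior model probability is strictly positive.
    """
    log_terms = [lm + log(p) for lm, p in zip(log_marglik, prior_probs)]
    shift = max(log_terms)                    # subtract the max for stability
    weights = [exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods for three models, equal prior weight.
log_marglik = [-9.0, -8.5, -12.0]
prior_probs = [1 / 3, 1 / 3, 1 / 3]
post_probs = posterior_model_probs(log_marglik, prior_probs)
print(post_probs)
```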

Marginal likelihood
The marginal likelihood calculation differs for simple vs composite hypotheses:
- Simple hypotheses can be considered to have a Dirac delta function for a prior, e.g. if $H_0: \theta = \theta_0$ then $\theta | H_0 \sim \delta_{\theta_0}$. Then the marginal likelihood is
  $$p(y|H_0) = \int p(y|\theta) p(\theta|H_0) \, d\theta = p(y|\theta_0).$$
- Composite hypotheses have a continuous prior and thus
  $$p(y|H_j) = \int p(y|\theta) p(\theta|H_j) \, d\theta.$$

Two models
If we only have two models, $H_0$ and $H_1$, then
$$p(H_0 | y) = \frac{p(y|H_0) p(H_0)}{p(y|H_0) p(H_0) + p(y|H_1) p(H_1)} = \frac{1}{1 + \frac{p(y|H_1)}{p(y|H_0)} \frac{p(H_1)}{p(H_0)}},$$
where
$$\frac{p(H_1)}{p(H_0)} = \frac{p(H_1)}{1 - p(H_1)}$$
is the prior odds in favor of $H_1$ and
$$BF(H_1 : H_0) = \frac{p(y|H_1)}{p(y|H_0)} = \frac{1}{BF(H_0 : H_1)}$$
is the Bayes Factor for model $H_1$ relative to $H_0$.
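As a small arithmetic illustration of how the prior odds enter, suppose (hypothetically, these numbers are not from the slides) that $BF(H_1 : H_0) = 3$. The posterior probability of $H_0$ under two different priors is then:

```latex
\begin{align*}
p(H_1) = 0.5: \quad & \frac{p(H_1)}{p(H_0)} = 1, &
  p(H_0 | y) &= \frac{1}{1 + 3 \cdot 1} = 0.25, \\
p(H_1) = 0.2: \quad & \frac{p(H_1)}{p(H_0)} = \frac{0.2}{0.8} = 0.25, &
  p(H_0 | y) &= \frac{1}{1 + 3 \cdot 0.25} \approx 0.571 .
\end{align*}
```

The same Bayes factor leads to quite different posterior model probabilities depending on the prior odds.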

Binomial model
Consider a coin flipping experiment so that $Y_i \stackrel{ind}{\sim} Ber(\theta)$ and the null hypothesis $H_0: \theta = 0.5$ versus the alternative $H_1: \theta \ne 0.5$ and $\theta | H_1 \sim Be(a, b)$.
$$\begin{aligned}
BF(H_0 : H_1) &= \frac{0.5^n}{\int_0^1 \theta^{n\bar{y}} (1-\theta)^{n(1-\bar{y})} \frac{\theta^{a-1} (1-\theta)^{b-1}}{Beta(a, b)} \, d\theta} \\
&= \frac{0.5^n}{\frac{1}{Beta(a, b)} \int_0^1 \theta^{a + n\bar{y} - 1} (1-\theta)^{b + n - n\bar{y} - 1} \, d\theta} \\
&= \frac{0.5^n}{\frac{Beta(a + n\bar{y}, \, b + n - n\bar{y})}{Beta(a, b)}} \\
&= \frac{0.5^n \, Beta(a, b)}{Beta(a + n\bar{y}, \, b + n - n\bar{y})}
\end{aligned}$$
and with $p(H_0) = p(H_1)$ the posterior model probability is
$$P(H_0 | y) = \frac{1}{1 + \frac{1}{BF(H_0 : H_1)}}.$$
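The closed-form Bayes factor is easy to evaluate with log-gamma functions. The following is a minimal sketch rather than code from the course: the function names are hypothetical, and the data (n = 13 trials, 7 successes) are an illustrative assumption.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bf01_binomial(n, successes, a=1.0, b=1.0):
    """BF(H0:H1) for H0: theta = 0.5 versus H1 with theta | H1 ~ Be(a, b)."""
    log_m0 = n * log(0.5)                       # log p(y | H0)
    log_m1 = (log_beta(a + successes, b + n - successes)
              - log_beta(a, b))                 # log p(y | H1)
    return exp(log_m0 - log_m1)

def prob_h0(bf01):
    """P(H0 | y) when p(H0) = p(H1)."""
    return 1.0 / (1.0 + 1.0 / bf01)

bf = bf01_binomial(n=13, successes=7)           # illustrative data
print(bf, prob_h0(bf))
```

Computing on the log scale avoids underflow in `0.5**n` and in the beta functions when n is large.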

Sample size and sample average
$P(H_0) = P(H_1) = 0.5$ and $\theta | H_1 \sim Be(1, 1)$:
[Figure: $p(H_0 | y)$ (vertical axis, 0 to 1) against ybar (horizontal axis, 0 to 1), with one curve for each of n = 10, 20, 30.]
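The quantities in this plot can be tabulated directly. The sketch below is not the code that produced the figure. It uses the fact that with a = b = 1, 1 / Beta(s + 1, n - s + 1) = (n + 1) C(n, s), so the Bayes factor reduces to (n + 1) C(n, s) / 2^n for s successes. The function name is hypothetical, and the grid of sample averages (multiples of 0.1) is an assumption.

```python
from math import comb

def prob_h0_uniform(n, successes):
    """P(H0 | y) for H0: theta = 0.5 vs theta | H1 ~ Be(1, 1), p(H0) = p(H1).

    With a = b = 1, BF(H0:H1) = (n + 1) * C(n, s) / 2**n.
    """
    bf01 = (n + 1) * comb(n, successes) / 2**n
    return bf01 / (1.0 + bf01)

# One row per sample size; columns are ybar = 0.0, 0.1, ..., 1.0.
for n in (10, 20, 30):
    row = [prob_h0_uniform(n, s) for s in range(0, n + 1, n // 10)]
    print(n, " ".join(f"{p:.2f}" for p in row))
```

At ybar = 0.5 the posterior probability of the null grows with n, while for sample averages far from 0.5 it shrinks toward zero.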
