

SLIDE 1

Bayesian hypothesis testing (cont.)

Dr. Jarad Niemi

STAT 544 - Iowa State University

March 7, 2019


SLIDE 2

Outline

  • Review of formal Bayesian hypothesis testing
  • Likelihood ratio tests
  • Jeffreys-Lindley paradox
  • p-value interpretation


SLIDE 3

Bayes tests = evaluate predictive models

Consider a standard hypothesis test scenario: H0 : θ = θ0 versus H1 : θ ≠ θ0. A Bayesian measure of the support for the null hypothesis is the Bayes factor:

BF(H0 : H1) = p(y|H0) / p(y|H1) = p(y|θ0) / ∫ p(y|θ) p(θ|H1) dθ

where p(θ|H1) is the prior distribution for θ under the alternative hypothesis. Thus the Bayes factor measures the predictive ability of the two Bayesian models. Both models say p(y|θ) is the data model if we know θ, but

  • 1. Model 0 says θ = θ0, and thus p(y|θ0) is our predictive distribution for y under model H0, while
  • 2. Model 1 says p(θ|H1) is our uncertainty about θ, and thus

p(y|H1) = ∫ p(y|θ) p(θ|H1) dθ

is our predictive distribution for y under model H1.
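
A minimal R sketch of this calculation, computing p(y|H1) by integrating the sampling density against the prior. It assumes the normal model from the next slide (y ~ N(θ, 1), H0 : θ = 0, θ|H1 ~ N(0, C)); the values of y and C are illustrative:

  # Bayes factor as a ratio of predictive densities (assumed normal setup)
  y <- 1.5; theta0 <- 0; C <- 10
  py_H0 <- dnorm(y, mean = theta0, sd = 1)   # p(y | H0) = p(y | theta0)
  py_H1 <- integrate(function(theta) dnorm(y, theta, 1) * dnorm(theta, 0, sqrt(C)),
                     -Inf, Inf)$value        # p(y | H1): prior predictive at y
  BF <- py_H0 / py_H1                        # BF(H0 : H1)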


SLIDE 4

Normal example

Consider y ∼ N(θ, 1) with H0 : θ = 0 and H1 : θ ≠ 0, and assume θ|H1 ∼ N(0, C). Thus,

BF(H0 : H1) = p(y|H0) / p(y|H1) = p(y|θ0) / ∫ p(y|θ) p(θ|H1) dθ = N(y; 0, 1) / N(y; 0, 1 + C).

Now, as C → ∞, our predictions about y under H1 become less sharp.

[Figure: predictive distributions p(y) versus y under H0 and H1]
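
A short R sketch of the closed-form Bayes factor above; the observed y and the grid of C values are illustrative:

  # BF = N(y; 0, 1) / N(y; 0, 1 + C) for the normal example
  bf_normal <- function(y, C) dnorm(y, 0, 1) / dnorm(y, 0, sqrt(1 + C))
  sapply(c(1, 10, 100, 1e4), function(C) bf_normal(y = 1.5, C))
  # for fixed y, the BF grows without bound as C -> infinity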


SLIDE 5

Likelihood Ratio Tests

Consider a likelihood L(θ) = p(y|θ). The likelihood ratio test statistic for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0^c, with Θ = Θ0 ∪ Θ0^c, is

λ(y) = sup_{Θ0} L(θ) / sup_{Θ} L(θ) = L(θ̂_{0,MLE}) / L(θ̂_MLE)

where θ̂_MLE and θ̂_{0,MLE} are the unrestricted and restricted MLEs, respectively. The likelihood ratio test (LRT) is any test that has a rejection region of the form {y : λ(y) ≤ c} (Casella & Berger, Def. 8.2.1). Under certain conditions (see Casella & Berger 10.3.3), as n → ∞,

−2 log λ(y) → χ²_ν

where ν is the difference between the number of free parameters specified by θ ∈ Θ and the number of free parameters specified by θ ∈ Θ0.


SLIDE 6

Binomial example

Consider a coin-flipping experiment so that the Yi are iid Ber(θ), and test the null hypothesis H0 : θ = 0.5 versus the alternative H1 : θ ≠ 0.5. Then

λ(y) = sup_{Θ0} L(θ) / sup_{Θ} L(θ) = 0.5^n / [θ̂_MLE^{nȳ} (1 − θ̂_MLE)^{n−nȳ}] = 0.5^n / [ȳ^{nȳ} (1 − ȳ)^{n−nȳ}]

and −2 log λ(y) → χ²_1 as n → ∞, so

p-value ≈ P(χ²_1 > −2 log λ(y)).

If p-value < α, then we reject H0 at significance level α. Typically α = 0.05.
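
A minimal R sketch of this p-value calculation; n and y are illustrative values:

  # binomial LRT p-value via the chi-squared approximation
  n <- 30; y <- 20                    # flips and heads (illustrative)
  ybar <- y / n                       # the MLE of theta
  log_lambda <- n * log(0.5) - (y * log(ybar) + (n - y) * log(1 - ybar))
  pchisq(-2 * log_lambda, df = 1, lower.tail = FALSE)   # approximate p-value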


SLIDE 7

Binomial example

Y ∼ Bin(n, θ) and, for the Bayesian analysis, θ|H1 ∼ Be(1, 1) and p(H0) = p(H1) = 0.5:
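
A minimal sketch of the Bayesian calculation behind the figure below; the Be(1,1) prior predictive is a beta-binomial, and n and y are illustrative:

  # posterior probability of H0 under equal prior model probabilities
  n <- 30; y <- 20
  py_H0 <- dbinom(y, n, 0.5)                     # p(y | H0)
  py_H1 <- choose(n, y) * beta(y + 1, n - y + 1) # Be(1,1) prior predictive
  BF <- py_H0 / py_H1                            # Bayes factor BF(H0 : H1)
  BF / (1 + BF)                                  # p(H0 | y) since p(H0) = p(H1)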

[Figure: likelihood ratio test p-value and posterior probability of H0 as functions of ȳ, for n = 10, 20, 30]


SLIDE 8


Do p-values and posterior probabilities agree?

Suppose n = 10,000 and y = 4,900. Then the p-value is

p-value ≈ P(χ²_1 > −2 log(0.135)) = 0.045,

so we would reject H0 at the 0.05 level. The posterior probability of H0 is

p(H0|y) ≈ 1 / (1 + 1/10.8) = 0.92,

so the probability of H0 being true is 92%. It appears the Bayesian posterior probability and the LRT p-value completely disagree!


SLIDE 9


Binomial ȳ = 0.49 as n → ∞

[Figure: p-value and posterior probability of H0 versus log10(n), with ȳ fixed at 0.49]
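
A sketch reproducing this comparison; the grid of n values is illustrative:

  post_and_p <- function(n, ybar = 0.49) {
    y <- round(n * ybar)
    log_lambda <- n * log(0.5) - (y * log(y / n) + (n - y) * log(1 - y / n))
    BF <- dbinom(y, n, 0.5) * (n + 1)   # Be(1,1) prior predictive is 1/(n+1)
    c(pvalue = pchisq(-2 * log_lambda, 1, lower.tail = FALSE),
      post_prob = BF / (1 + BF))
  }
  round(sapply(10^(2:5), post_and_p), 3)
  # near n = 10^4 the p-value is below 0.05 while p(H0 | y) is above 0.9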


SLIDE 10

Jeffreys-Lindley Paradox

Definition: The Jeffreys-Lindley paradox concerns a situation in which, comparing two hypotheses H0 and H1 given data y, we find that

  • a frequentist test result is significant, leading to rejection of H0, but
  • our posterior belief in H0 being true is high.

This can happen when

  • the effect size is small,
  • n is large,
  • H0 is relatively precise,
  • H1 is relatively diffuse, and
  • the prior model odds are ≈ 1.


SLIDE 11


Comparison

The test statistic with point null hypotheses:

λ(y) = p(y|θ0) / p(y|θ̂_MLE)

BF(H0 : H1) = p(y|θ0) / ∫ p(y|θ) p(θ|H1) dθ = p(y|H0) / p(y|H1)

A few comments:

  • The LRT chooses the best possible alternative value.
  • The Bayesian test penalizes for vagueness in the prior.
  • The LRT can be interpreted as a Bayesian test with a point-mass prior exactly at the MLE.
  • Generally, p-values provide a measure of lack of fit of the data to the null model.
  • Bayesian tests compare the predictive performance of two Bayesian models (model + prior).
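
A small R sketch of the first two comments in the normal example: the LRT statistic ignores the prior entirely, while the Bayes factor penalizes diffuse priors (y and the C values are illustrative):

  y <- 1.5
  lambda <- dnorm(y, 0, 1) / dnorm(y, y, 1)   # LRT: the MLE under H1 is theta = y
  bf <- sapply(c(0.1, 1, 10, 100),
               function(C) dnorm(y, 0, 1) / dnorm(y, 0, sqrt(1 + C)))
  lambda   # about 0.32, no matter what prior we might have used
  bf       # eventually grows without bound as the prior variance C increases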


SLIDE 12


Normal mean testing

Let y ∼ N(θ, 1) and suppose we are testing H0 : θ = 0 versus H1 : θ ≠ 0. We can compute a two-sided p-value via p-value = 2Φ(−|y|), where Φ(·) is the cumulative distribution function of a standard normal. Typically, we set our Type I error rate at level α, i.e. P(reject H0 | H0 true) = α. But if we reject H0, i.e. the p-value < α, we should be interested in P(H0 true | reject H0), which is the false discovery rate (FDR).
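
For example, in R (the observed y is illustrative):

  # two-sided p-value for y ~ N(theta, 1) when testing theta = 0
  y <- 1.96
  2 * pnorm(-abs(y))   # approximately 0.05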


SLIDE 13

p-value interpretation

Let y ∼ N(θ, 1) and suppose we are testing H0 : θ = 0 versus H1 : θ ≠ 0. For the following activity, you need to tell me

  • 1. the observed p-value,
  • 2. the relative frequencies of null and alternative hypotheses, and
  • 3. the distribution for θ under the alternative.

Then this p-value app below will calculate (via simulation) the probability the null hypothesis is true.

shiny::runGitHub('jarad/pvalue')

SLIDE 14

p-value app approach

The idea is that a scientist performs a series of experiments. For each experiment, whether H0 or H1 is true is randomly determined, θ is sampled according to whichever hypothesis is true, and the p-value is calculated. This process is repeated until a p-value of the desired value is achieved, e.g. p-value = 0.05, and the true hypothesis is recorded. Thus,

P(H0 true | p-value = 0.05) ≈ (1/K) Σ_{k=1}^{K} I(H0 true in experiment k with p-value ≈ 0.05).

Thus, there is nothing Bayesian happening here except that the probability being calculated has the unknown quantity on the left and the known quantity on the right.
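
A sketch of this simulation in R; the prior probability p(H0) = 0.5, the alternative θ|H1 ∼ N(0, 1), and the window around 0.05 are illustrative assumptions, not necessarily the app's defaults:

  set.seed(20190307)
  sim_one <- function() {
    H0 <- runif(1) < 0.5              # randomly determine the true hypothesis
    theta <- if (H0) 0 else rnorm(1)  # theta = 0 under H0, drawn from prior under H1
    y <- rnorm(1, theta, 1)
    c(H0 = H0, pvalue = 2 * pnorm(-abs(y)))
  }
  sims <- t(replicate(1e5, sim_one()))
  keep <- abs(sims[, "pvalue"] - 0.05) < 0.005   # experiments with p-value near 0.05
  mean(sims[keep, "H0"])   # P(H0 true | p-value ~ 0.05), well above 0.05 here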


SLIDE 15

Prosecutor’s Fallacy

It is common for those using statistics to equate the following:

p-value ≈ P(data | H0 true) =? P(H0 true | data),

but we can use Bayes' rule to show that these probabilities cannot be equated:

p(H0|y) = p(y|H0) p(H0) / p(y) = p(y|H0) p(H0) / [p(y|H0) p(H0) + p(y|H1) p(H1)].

This situation is common enough that it is called the Prosecutor’s Fallacy.
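
A toy numeric illustration with assumed values, showing that p(y|H0) = 0.05 need not imply p(H0|y) = 0.05:

  p_y_H0 <- 0.05; p_y_H1 <- 0.10   # assumed likelihoods of the observed data
  p_H0 <- 0.5; p_H1 <- 0.5         # equal prior model probabilities
  p_y_H0 * p_H0 / (p_y_H0 * p_H0 + p_y_H1 * p_H1)   # p(H0 | y) = 1/3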


SLIDE 16

ASA Statement on p-values

https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108

Principles:

  • 1. P-values can indicate how incompatible the data are with a specified statistical model[, the model associated with the null hypothesis].
  • 2. P-values do not measure the probability the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  • 3. Scientific conclusions and business or policy decisions should not be based solely on whether a p-value passes a specific threshold.
  • 4. Proper inference requires full reporting and transparency.
  • 5. A p-value, or statistical significance, does not measure the size of an effect or the importance of the result.
  • 6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
