
Frequentist Properties of Bayesian Methods
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics, Loyola University Chicago
November 21, 2017
Last edited November 20, 2017 by <ebalderama@luc.edu>

Properties of Bayesian and Frequentist methods
So far we have discussed Bayesian methods as being separate from the frequentist approach. There are strengths and weaknesses to both approaches:
- Sometimes, Bayesian methods do not provide exact frequentist coverage.
- Frequentist inference violates the likelihood principle.
- There is no prescriptive Bayesian approach to model and prior specification.
- There is no prescriptive frequentist approach to small-sample problems.

Properties of Bayesian and Frequentist methods
However, in many cases, methods with frequentist properties are desirable. For example, we may want a method with Type I error control or 80% power.
Bayesian methods are great when the likelihood and prior are fully and correctly specified, but frequentist properties offer protection against misspecification (“all models are wrong, but bad models lead to bad answers”).
We can design Bayesian methods to achieve these frequentist properties. In this view, Bayesian methods generate procedures/algorithms for further study. Bayesian methods are often very competitive with frequentist methods when judged by frequentist criteria.

Calibrated Bayes
Little, in Little (2011), Statistical Science:
“To summarize, Bayesian statistics is strong for inference under an assumed model, but relatively weak for the development and assessment of models. Frequentist statistics provides useful tools for model development and assessment, but has weaknesses for inference under an assumed model. If this summary is accepted, then a natural compromise is to use frequentist methods for model development and assessment, and Bayesian methods for inference under a model. This capitalizes on the strengths of both paradigms, and is the essence of the approach known as Calibrated Bayes.”

Calibrated Bayes
Box, in Little (2011), Statistical Science:
“I believe that... sampling theory is needed for exploration and ultimate criticism of the entertained model in the light of the current data, while Bayes’ theory is needed for estimation of parameters conditional on adequacy of the model.”

Calibrated Bayes
Rubin, in Little (2011), Statistical Science:
“The applied statistician should be Bayesian in principle and calibrated to the real world in practice -- appropriate frequency calculations help to define such a tie... Frequency calculations are useful for making Bayesian statements scientific, scientific in the sense of capable of being shown wrong by empirical test; here the technique is the calibration of Bayesian probabilities to the frequencies of actual events.”

Bayes as a procedure generator
A Bayesian analysis produces a posterior distribution that summarizes our uncertainty after observing the data. However, if you have to give a one-number summary as an estimate, you might pick the posterior mean
$\hat{\theta}_B = E(\theta \mid Y)$.
This estimator, $\hat{\theta}_B$, can be evaluated alongside the MLE or method-of-moments (MOM) estimators:
- Is it biased?
- Is it consistent?
- How does its MSE compare with that of the MLE?
These are all frequentist properties of the Bayesian estimator.

Bayes as a procedure generator
Similarly, if we have to give an interval estimate, we might use the 95% posterior credible set.
Note: In practice, this interval is motivated by the one data set we observed. But we could view it as a procedure for constructing an interval and inspect its frequentist properties. If we analyzed many data sets, each time computing a 95% posterior interval, what proportion of the intervals would contain the true value?
A Bayes test is to reject $H_0$ if $P(H_0 \mid Y) < c$.
- What are the Type I and Type II error rates of this test?
- Can we pick the threshold $c$ to control the Type I error rate?
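The coverage question above can be explored by simulation. The following is a minimal sketch, not part of the original slides, assuming a normal model with known $\sigma$ and the conjugate prior $\mu \sim \text{Normal}(0, \sigma^2/m)$; the function names and the settings n = 20, m = 5 are illustrative choices.

```python
import math
import random
from statistics import NormalDist


def credible_interval(ybar, n, sigma, m, level=0.95):
    """Equal-tailed posterior interval for mu, assuming known sigma and
    the prior mu ~ Normal(0, sigma^2 / m) (an assumption of this sketch)."""
    post_mean = n / (n + m) * ybar          # shrinks ybar toward the prior mean 0
    post_sd = sigma / math.sqrt(n + m)      # posterior standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return post_mean - z * post_sd, post_mean + z * post_sd


def coverage(mu_true, n=20, sigma=1.0, m=5, reps=20000, seed=1):
    """Fraction of simulated data sets whose 95% interval contains mu_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # The sample mean of n Normal(mu, sigma^2) draws is Normal(mu, sigma^2 / n),
        # so it is sampled directly instead of simulating each observation.
        ybar = rng.gauss(mu_true, sigma / math.sqrt(n))
        lo, hi = credible_interval(ybar, n, sigma, m)
        hits += lo <= mu_true <= hi
    return hits / reps


if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0):
        print(f"true mu = {mu}: estimated coverage = {coverage(mu):.3f}")
```

Under these assumed settings, the estimated coverage exceeds 95% when the true mean sits at the prior mean and falls below 95% when the true mean is far from it, which is the sense in which a credible interval need not have exact frequentist coverage.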

Calibrated Bayes
Quote from FDA:
“Because of the inherent flexibility in the design of a Bayesian clinical trial, a thorough evaluation of the operating characteristics should be part of the trial planning. This includes evaluation of:
- probability of erroneously approving an ineffective or unsafe device (type I error rate),
- probability of erroneously disapproving a safe and effective device (type II error rate),
- power (the converse of type II error rate: the probability of appropriately approving a safe and effective device),
- sample size distribution (and expected sample size),
- prior probability of claims for the device, and
- if applicable, probability of stopping at each interim look.”

Calibrated Bayes
Quote from FDA:
““Pure” Bayesian approaches to statistics do not necessarily place the same emphasis on the notion of control of type I error as traditional frequentist approaches. There have, however, been some proposals in the literature that Bayesian methods should be “calibrated” to have good frequentist properties (e.g. Rubin, 1984; Box, 1980). In this spirit, as well as in adherence to regulatory practice, FDA recommends you provide the type I and II error rates of your proposed Bayesian analysis plan.”

Bayesian decision theory
Before studying the frequentist properties of Bayesian estimators and hypothesis tests, we should first determine the “best” Bayesian method. For example, should we take the estimator to be the posterior mean, median, or mode?
Defining “best” requires a scoring system; we call this the loss function, $L(\hat{\theta}, \theta)$.
Examples:
- Squared error loss is $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$.
- Absolute loss is $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$.

Bayesian decision theory
The summary of the posterior that minimizes the expected (posterior) loss is the Bayes rule or Bayes decision rule (not to be confused with Bayes’ Rule).
Examples:
- Squared error loss, $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, implies we should use the posterior mean for $\hat{\theta}$.
- Absolute loss, $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$, implies we should use the posterior median for $\hat{\theta}$.
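The two Bayes rules above can be checked numerically. This is a minimal sketch, not from the original slides: it assumes a hypothetical skewed posterior, represented by Gamma(2, 1) draws standing in for MCMC output, and searches a grid of candidate estimates for the one with the smallest average loss.

```python
import random
import statistics


def expected_loss(estimate, draws, loss):
    """Monte Carlo estimate of the posterior expected loss of `estimate`."""
    return sum(loss(estimate, theta) for theta in draws) / len(draws)


def squared(est, theta):
    return (est - theta) ** 2


def absolute(est, theta):
    return abs(est - theta)


rng = random.Random(0)
# Hypothetical posterior sample (an assumption of this sketch): skewed, so
# the posterior mean and median differ noticeably.
draws = [rng.gammavariate(2.0, 1.0) for _ in range(2000)]

post_mean = statistics.fmean(draws)
post_median = statistics.median(draws)

# Candidate estimates on a grid from 0 to 5 in steps of 0.01.
grid = [i / 100 for i in range(501)]
best_sq = min(grid, key=lambda e: expected_loss(e, draws, squared))
best_abs = min(grid, key=lambda e: expected_loss(e, draws, absolute))

print(f"posterior mean   = {post_mean:.3f}, squared-loss minimizer  = {best_sq:.2f}")
print(f"posterior median = {post_median:.3f}, absolute-loss minimizer = {best_abs:.2f}")
```

Up to the grid spacing, the squared-error minimizer lands on the sample mean of the draws and the absolute-loss minimizer lands on their sample median.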

Bias/variance trade-off
Assume $Y_1, \ldots, Y_n \sim \text{Normal}(\mu, \sigma^2)$.
Estimator 1: $\hat{\mu}_1 = \bar{Y}$
Estimator 2: $\hat{\mu}_2 = c\bar{Y}$, where $c = \frac{n}{n+m}$
Note: Recall $\hat{\mu}_2$ is the posterior mean under prior $\mu \sim \text{Normal}\left(0, \frac{\sigma^2}{m}\right)$.
1. Compute the bias and variance of each estimator.
2. Compute the mean squared error, $\text{MSE} = \text{bias}^2 + \text{variance}$.
3. Which estimator is preferred?
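One way to work the exercise, sketched here rather than taken from the slides, assumes the observations are independent, so that $E(\bar{Y}) = \mu$ and $\text{Var}(\bar{Y}) = \sigma^2/n$:

```latex
\begin{align*}
\text{Bias}(\hat{\mu}_1) &= E(\bar{Y}) - \mu = 0, &
\text{Var}(\hat{\mu}_1) &= \frac{\sigma^2}{n}, &
\text{MSE}(\hat{\mu}_1) &= \frac{\sigma^2}{n}, \\[4pt]
\text{Bias}(\hat{\mu}_2) &= (c - 1)\mu = -\frac{m}{n+m}\,\mu, &
\text{Var}(\hat{\mu}_2) &= c^2\,\frac{\sigma^2}{n} = \frac{n\sigma^2}{(n+m)^2}, &
\text{MSE}(\hat{\mu}_2) &= \frac{m^2\mu^2 + n\sigma^2}{(n+m)^2}.
\end{align*}
% Comparing the two mean squared errors:
\[
\text{MSE}(\hat{\mu}_2) < \text{MSE}(\hat{\mu}_1)
\iff n m^2 \mu^2 < \sigma^2 (2nm + m^2)
\iff \mu^2 < \sigma^2\left(\frac{1}{n} + \frac{2}{m}\right).
\]
```

So the shrinkage estimator is preferred when the true mean is close enough to the prior mean of 0, and the sample mean is preferred otherwise. For example, with $n = 20$, $m = 5$, and $\sigma = 1$, the threshold is $\mu^2 < 0.45$.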

Properties of Bayesian estimators
Broadly speaking, the following comparisons between Bayes and MLE hold:
- Bayesian estimators have smaller standard errors because the prior adds information.
- Bayesian estimators are biased if the prior is not centered on the truth.
- Depending on this bias/variance trade-off, Bayesian estimators may have smaller MSE than the MLE.
- If the prior is weak, the methods are similar.
- For any prior that does not depend on the sample size, as n increases the prior is overwhelmed by the likelihood and the posterior approaches the MLE’s sampling distribution.

Bayesian central limit theorem
Assumptions:
- the usual MLE conditions on the likelihood hold;
- the prior does not depend on n and puts non-zero probability on the true value $\theta_0$.
Bayesian CLT: For large datasets, the posterior is approximately normal:
$p(\theta \mid Y) \to \text{Normal}\left(\theta_0, I(\theta_0)^{-1}\right)$,
where $I$ is the information matrix.
Bayes methods are asymptotically unbiased. Bayes and MLE will be equivalent for large samples!
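The approximation can be checked in a case where the exact posterior is known. The sketch below is not from the slides: it assumes a binomial likelihood with a Beta(1, 1) prior, and it uses the plug-in form of the normal approximation (centered at the MLE, with variance equal to the inverse Fisher information evaluated at the MLE) because $\theta_0$ is unknown in practice.

```python
import math


def beta_posterior_moments(y, n, a=1.0, b=1.0):
    """Exact posterior mean and sd of theta for y successes in n trials
    under a Beta(a, b) prior (the conjugate model assumed in this sketch)."""
    a_post, b_post = a + y, b + n - y
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)


def normal_approx_moments(y, n):
    """Plug-in normal approximation: mean = MLE, variance = inverse
    Fisher information at the MLE, i.e. mle * (1 - mle) / n."""
    mle = y / n
    return mle, math.sqrt(mle * (1 - mle) / n)


def approximation_gap(n, rate=0.3):
    """Absolute gap in means and relative gap in sds, exact vs approximate."""
    y = round(rate * n)  # hypothetical data with a 30% success rate
    exact_mean, exact_sd = beta_posterior_moments(y, n)
    approx_mean, approx_sd = normal_approx_moments(y, n)
    return abs(exact_mean - approx_mean), abs(exact_sd / approx_sd - 1)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        mean_gap, sd_gap = approximation_gap(n)
        print(f"n = {n:>6}: mean gap = {mean_gap:.5f}, relative sd gap = {sd_gap:.5f}")
```

Both gaps shrink as n grows, consistent with the prior being overwhelmed by the likelihood. The sketch compares only the first two moments, not the full shape of the posterior.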

Bayesian central limit theorem
However, the interpretation is different. We can use the Bayesian interpretation, e.g., $P(H_0 \mid Y)$ or $P(3.4 < \theta < 5.6)$.
How does the CLT help?
- The Bayesian CLT gives a way to approximate the posterior without MCMC (valid as $n \to \infty$).
- Most still use MCMC with the hope that it better approximates the exact posterior (valid as the number of MCMC samples $S \to \infty$).
- The CLT is useful for initial values and tuning.

Methods for studying frequentist properties
Theoretical studies of Bayesian estimators use the same basic approaches as frequentist methods.
- Theorems and proofs (of consistency, etc.) are ideal.
- When the math is intractable, simulation studies are used.
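A simulation study of this kind might look like the following minimal sketch. It is not from the slides: it reuses the two estimators from the bias/variance slide (the sample mean and the shrinkage estimator $c\bar{Y}$ with $c = n/(n+m)$), and the function name and the settings n = 20, m = 5, $\sigma$ = 1 are illustrative assumptions.

```python
import random


def simulate_mse(mu_true, n=20, sigma=1.0, m=5, reps=5000, seed=42):
    """Monte Carlo MSE of the sample mean and of the shrinkage estimator
    c * ybar with c = n / (n + m), over `reps` simulated data sets."""
    rng = random.Random(seed)
    c = n / (n + m)
    sq_err_mle = 0.0
    sq_err_bayes = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu_true, sigma) for _ in range(n)]
        ybar = sum(y) / n
        sq_err_mle += (ybar - mu_true) ** 2
        sq_err_bayes += (c * ybar - mu_true) ** 2
    return sq_err_mle / reps, sq_err_bayes / reps


if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0):
        mse_mle, mse_bayes = simulate_mse(mu)
        print(f"true mu = {mu}: MSE(sample mean) = {mse_mle:.4f}, "
              f"MSE(posterior mean) = {mse_bayes:.4f}")
```

In this tractable case the simulated values can be checked against the closed-form MSEs, which is a useful sanity test before applying the same recipe to models where no formula is available.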
