The Calibrated Bayes Factor for Model Comparison
Steve MacEachern, The Ohio State University
Joint work with Xinyi Xu, Pingbo Lu and Ruoxi Xu
Outline
- The Bayes factor – when it works, and when it doesn’t
- The calibrated Bayes factor
- Ohio Family Health Survey (OFHS) analysis
- Wrap-up
Bayes factors
- The Bayes factor is one of the most important and most widely used tools for Bayesian hypothesis testing and model comparison.
- Given two models M1 and M2, the Bayes factor is the ratio of marginal likelihoods,
  BF = m(y; M1) / m(y; M2).
- Some rules of thumb for using Bayes factors (Jeffreys 1961)
- 1 < Bayes factor ≤ 3: weak evidence for M1
- 3 < Bayes factor ≤ 10: substantial evidence for M1
- 10 < Bayes factor ≤ 100: strong evidence for M1
- 100 < Bayes factor: decisive evidence for M1
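The cutoffs above can be encoded in a small helper function. This is just a sketch that mirrors the scale as quoted on this slide, read as evidence for M1 when BF > 1:

```python
def jeffreys_evidence(bf):
    """Map a Bayes factor BF = m(y; M1) / m(y; M2) to Jeffreys' evidence scale."""
    if bf <= 1:
        return "no evidence for M1"
    elif bf <= 3:
        return "weak evidence for M1"
    elif bf <= 10:
        return "substantial evidence for M1"
    elif bf <= 100:
        return "strong evidence for M1"
    else:
        return "decisive evidence for M1"
```

For example, `jeffreys_evidence(137)` returns "decisive evidence for M1".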
Monotonicity and the Bayes factor
- The Bayes factor is best examined through its logarithm,
  log(BF) = log m(y; M1) − log m(y; M2) = Σ_{i=0}^{n−1} log [ m(Y_{i+1} | Y_{0:i}, M1) / m(Y_{i+1} | Y_{0:i}, M2) ].
- The expectation of each term under M1 is non-negative, and is positive whenever M1 and M2 differ.
- Consider examining the data set one observation at a time.
- If M1 is right, each observation makes a positive contribution in expectation.
- The "trace" of the cumulative log Bayes factor is similar to Brownian motion with non-linear drift.
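The telescoping identity above is easy to check numerically. The sketch below uses two fully specified (point-hypothesis) models, N(0,1) and N(0.5,1), so that each one-step-ahead predictive density is just the sampling density; with free parameters one would use posterior predictive densities instead.

```python
import math
import random

def norm_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)

random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(50)]   # data generated under M1

# One-step contributions log m(Y_{i+1} | Y_{0:i}, M1) - log m(Y_{i+1} | Y_{0:i}, M2);
# for point hypotheses the predictive density does not depend on earlier data.
increments = [norm_logpdf(yi, 0.0, 1.0) - norm_logpdf(yi, 0.5, 1.0) for yi in y]

# The running total gives the "trace" of log(BF) against sample size.
trace, total = [], 0.0
for inc in increments:
    total += inc
    trace.append(total)

# The final value equals log m(y; M1) - log m(y; M2) computed directly.
direct = sum(norm_logpdf(yi, 0.0, 1.0) - norm_logpdf(yi, 0.5, 1.0) for yi in y)
```

Each increment has expectation KL(M1 || M2) = 0.125 under M1, so the trace drifts upward on average, as described above.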
Bayes factors (Cont.)
- [Figure: log(Bayes factor) versus sample size (n = 10 to 50); the trace wanders like Brownian motion with drift.]
Example 1. Suppose that we have n = 176 i.i.d. observations from a skew-normal(location=0, scale=1.5, shape=2.5). Compare a Gaussian parametric model vs. a Mixture of Dirichlet processes (MDP) nonparametric model.
- Gaussian parametric model:
  y_i | θ, σ² ~ iid N(θ, σ²), i = 1, ..., n;  θ ~ N(µ, τ²)
- DP nonparametric model:
  y_i | θ_i, σ² ~ N(θ_i, σ²), i = 1, ..., n;  θ_i | G ~ iid G;  G ~ DP(M = 2, N(µ, τ²))
- Common priors on hyper-parameters:
  µ ~ N(0, 500), σ² ~ IG(7, 0.3), τ² ~ IG(11, 9.5)
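As a concrete illustration of the nonparametric side, here is a minimal sketch of simulating data from the DP mixture model via a truncated stick-breaking construction. The hyper-parameter values (µ, τ, σ) are fixed illustrative numbers rather than draws from the stated priors.

```python
import random

random.seed(7)
M = 2.0                           # DP mass parameter, as in DP(M = 2, N(mu, tau^2))
mu, tau, sigma = 0.0, 1.0, 1.0    # illustrative hyper-parameter values

# Truncated stick-breaking construction of G ~ DP(M, N(mu, tau^2)).
weights, atoms, remaining = [], [], 1.0
for _ in range(200):
    v = random.betavariate(1.0, M)       # stick-breaking proportion
    weights.append(remaining * v)        # weight attached to this atom
    atoms.append(random.gauss(mu, tau))  # atom location drawn from the base measure
    remaining *= 1.0 - v

# theta_i | G ~ G, then y_i | theta_i ~ N(theta_i, sigma^2), i = 1, ..., n.
thetas = random.choices(atoms, weights=weights, k=176)
y = [random.gauss(t, sigma) for t in thetas]
```

Because M = 2 is small, draws of G concentrate most of their mass on a few atoms, so the simulated data cluster around a handful of means.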
Model comparison results
- Using the Bayes factor:
B_{P,NP} = e^{4.92} ≈ 137 ⇒
Decisive evidence for the parametric model!
- Using posterior predictive performance:
E[log m(Y_n | Y_{1:(n−1)}; P)] = −1.4267,  E[log m(Y_n | Y_{1:(n−1)}; NP)] = −1.3977 ⇒
The nonparametric model is better!
- Given the same sample size, why do the Bayes factor and the posterior marginal likelihoods provide such different results?
A motivating example (Cont.)
Here is the whole story: we randomly select subsamples of smaller sizes, compute the log Bayes factor and the log posterior predictive density for each subsample, and then average to smooth the plots.
- [Figure: E[log(Bayes factor)] versus sample size, n = 1 to 176; the expected log Bayes factor grows, favoring the parametric model.]
A motivating example (Cont.)
- [Figure: E[log(posterior predictive density)] versus sample size, n = 1 to 175, for the parametric and nonparametric models; the nonparametric model overtakes at larger samples, with final values −1.4265 (parametric) and −1.3977 (nonparametric).]
Bayes factors and predictive distributions
- The small model accumulates an enormous lead when the sample size is small. When the large model starts to show better predictive performance at larger sample sizes, the Bayes factor is slow to reflect the change!
- The Bayes factor follows from Bayes' Theorem. It is, of course, exactly right, provided the inputs are right:
- right likelihood
- right prior
- right loss
Bayes factors – do they work?
- The Bayes factor works well for the subjective Bayesian
- The within-model prior distributions are meaningful
- The calculation follows from Bayes theorem
- Estimation, testing, and all other inference fit as part of a
comprehensive analysis
- The Bayes factor breaks down for rule-based priors
- “Objective” priors (noninformative priors)
- High-dimensional settings (too much to specify)
- Infinite dimensional models (nonparametric Bayes)
- Many, many variants on the Bayes factor
- Most change the prior specifically for model comparison/choice
- One class of modified Bayes factors stands out
- within-model priors specified by some rule
- partial update is performed
- then the Bayes factor calculation commences
Bayes factors and partial updates
- Several different partial update methods (e.g., Lempers
(1970), Geisser and Eddy (1979, PRESS), Berger and Pericchi (1995, 1996, IBF), O’Hagan (1995, FBF))
- Training data / test data split
- Rotation through different splits to stabilize the calculation
- Prior before updating is generally “noninformative”
- Minimal training sets have been advocated
- Questions!
- Why a minimal update?
- What if there is no noninformative prior?
- Do the methods work in more complex settings?
- In search of a principled solution: the question is not Bayesian model selection per se, but Bayesian model selection that is consistent with the rest of the analysis.
Toward a solution - the calibrated Bayes factor
- Subjective Bayesian analyses work well in high-information
settings, much less well in low-information settings (try elicitation for yourself)
- We begin with the situation where all (Bayesians) agree on
the solution and use this to drive the technique
- We propose that one start the Bayes factor calculation after
the partial posterior resembles a high-info subjective prior
- Elements of the problem
- Measure the information content of the partial posterior
- Benchmark prior to describe adequate prior information
- Criterion for whether partial posterior matches benchmark
- We recommend calibration of the Bayes factor in any
low-information or rule-based prior setting
- In these settings, elicited priors are unstable (Psychology)
Measurement of prior information
- Measure the proximity of f_{θ1} and f_{θ2} through the Symmetric Kullback-Leibler (SKL) divergence
  SKL(f_{θ1}, f_{θ2}) = (1/2) [ E_{θ1} log(f_{θ1}/f_{θ2}) + E_{θ2} log(f_{θ2}/f_{θ1}) ].
- SKL driven by likelihood, appropriate for Bayesians
- The distribution on (θ1, θ2) induces a distribution on SKL(f_{θ1}, f_{θ2})
- This works for infinite-dimensional models too, unlike
alternatives such as Fisher information
- Criterion: Evaluate the information contained in π using the
percentiles of the distribution of SKL divergence
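For Gaussian sampling models the SKL divergence has a closed form, which makes the induced distribution easy to simulate. A minimal sketch, where the prior and the sampling scale are illustrative values rather than quantities from the analysis:

```python
import math
import random

def kl_normal(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ), in closed form."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def skl_normal(m1, s1, m2, s2):
    """Symmetric KL: the average of the two directed divergences."""
    return 0.5 * (kl_normal(m1, s1, m2, s2) + kl_normal(m2, s2, m1, s1))

# The prior on theta induces a distribution on SKL(f_theta1, f_theta2):
random.seed(3)
sigma = 1.5                                    # illustrative sampling s.d.
draw_theta = lambda: random.gauss(0.0, 2.0)    # illustrative prior, N(0, 4)
skls = sorted(skl_normal(draw_theta(), sigma, draw_theta(), sigma)
              for _ in range(10_000))
median_skl = skls[len(skls) // 2]              # one percentile summary of information
```

A diffuse prior spreads the θ draws apart and pushes the SKL distribution toward large values; a concentrated, high-information prior pulls it toward zero. Percentiles of this distribution are the information measure used below.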
A benchmark prior
- To calibrate the Bayes factor and select a training sample size,
we choose a benchmark prior and then require the updated priors to contain at least as much information as this benchmark prior.
- In order to perform a reasonable analysis where subjective input has little impact on the final conclusion, we set the benchmark to be a "minimally informative" prior: the unit information prior (Kass and Wasserman 1995), which contains the amount of information in a single observation
- Under the Gaussian model Y ~ N(θ, σ²), a unit information prior on θ is N(µ, σ²), inducing a χ²₁ distribution on SKL(f_{θ1}, f_{θ2}).
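The χ²₁ claim can be verified by simulation: with θ1, θ2 drawn independently from N(µ, σ²) and equal sampling variances, SKL(f_{θ1}, f_{θ2}) = (θ1 − θ2)²/(2σ²), and θ1 − θ2 ~ N(0, 2σ²), so the ratio is the square of a standard normal.

```python
import random

random.seed(11)
mu, sigma = 0.0, 2.0    # any values work; the induced law is always chi^2 with 1 df

def skl(t1, t2):
    # SKL between N(t1, sigma^2) and N(t2, sigma^2) reduces to this quadratic.
    return (t1 - t2) ** 2 / (2 * sigma ** 2)

draws = [skl(random.gauss(mu, sigma), random.gauss(mu, sigma))
         for _ in range(100_000)]
mean_skl = sum(draws) / len(draws)                        # chi^2_1 has mean 1
frac_below = sum(d <= 3.841 for d in draws) / len(draws)  # 95th pct of chi^2_1
```

Both summaries land close to their χ²₁ values (mean 1, and 95% of the mass below 3.841).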
Calibrating the priors
The overall strategy:
- Step 1: For a single model, randomly draw a training sample of a pre-specified size from the data
- Step 2: Update the prior based on this training sample. Draw M pairs (θ^j_1, θ^j_2), j = 1, ..., M, from the updated prior, and compute SKL(f_{θ^j_1}, f_{θ^j_2}) for each pair
- Step 3: Repeat Steps 1 and 2 N times. Pool all MN values of
the SKLs to evaluate the information in the posterior
- Step 4: Compare the amount of information in the posterior to
that in the benchmark distribution. If the amount of information is comparable, terminate the search and report the current sample size as the calibration sample size. Otherwise reset the sample size and repeat Steps 1 to 4.
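For a conjugate Gaussian model the whole search can be sketched in a few lines. Everything here is illustrative: a known sampling s.d., a diffuse N(0, 10²) initial prior, and the χ²₁ median (≈ 0.455) as the single benchmark summary; a real implementation would pool MN SKL draws for each model and compare more than one percentile.

```python
import math
import random

random.seed(5)
sigma = 1.0
data = [random.gauss(0.3, sigma) for _ in range(176)]   # stand-in data set

def posterior_params(sample, mu0=0.0, tau0=10.0):
    """Conjugate update for theta ~ N(mu0, tau0^2) under N(theta, sigma^2) data."""
    prec = 1 / tau0**2 + len(sample) / sigma**2
    mean = (mu0 / tau0**2 + sum(sample) / sigma**2) / prec
    return mean, math.sqrt(1 / prec)

def median_posterior_skl(train_size, n_rep=200, m_pairs=50):
    skls = []
    for _ in range(n_rep):                       # Steps 1 and 3: repeat training draws
        m, s = posterior_params(random.sample(data, train_size))  # Step 2: update
        for _ in range(m_pairs):
            t1, t2 = random.gauss(m, s), random.gauss(m, s)
            skls.append((t1 - t2)**2 / (2 * sigma**2))
    skls.sort()
    return skls[len(skls) // 2]

benchmark = 0.455   # median of chi^2_1, the unit information benchmark
calibration_size = None
for size in range(1, 21):                        # Step 4: grow until comparable
    if median_posterior_skl(size) <= benchmark:
        calibration_size = size
        break
```

Because the initial prior is diffuse, a couple of observations already concentrate the posterior to roughly unit information, consistent with the later remark that the Gaussian model typically calibrates after two or three observations.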
Calibrated Bayes factors
- Let s1 and s2 represent the calibration sample sizes for models
M1 and M2. Take s = max(s1,s2).
- Based on a training sample y(s), the updated Bayes factor satisfies
  log B*_{12}(y | y(s)) = log B_{12}(y) − log B_{12}(y(s)).
- Let {y^1(s), y^2(s), ..., y^H(s)} denote all possible subsets of y of size s. Then the calibrated Bayes factor is defined by
  log CB_{12}(y) = log B_{12}(y) − (1/H) Σ_{h=1}^{H} log B_{12}(y^h(s)).
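When H is astronomically large, the average over all size-s subsets can be approximated by an average over random subsets. The sketch below again uses two illustrative point hypotheses so that log B12 is available in closed form, and assumes a calibration sample size of s = 50; in practice each term would be a log marginal likelihood under the two models.

```python
import math
import random

def norm_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)

def log_b12(sample):
    # Illustrative point hypotheses M1: N(0,1) vs M2: N(0.5,1); in general this
    # would be log m(sample; M1) - log m(sample; M2).
    return sum(norm_logpdf(y, 0.0, 1.0) - norm_logpdf(y, 0.5, 1.0) for y in sample)

random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(176)]
s = 50                       # assumed calibration sample size

# Monte Carlo version of (1/H) * sum_h log B12(y^h(s)).
n_subsets = 500
correction = sum(log_b12(random.sample(data, s))
                 for _ in range(n_subsets)) / n_subsets

log_cb12 = log_b12(data) - correction
```

Since a random size-s subset contributes s/n of the total in expectation, the correction strips out the early-sample advantage while leaving the asymptotic behavior of log B12 unchanged.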
Asymptotic properties
- The calibrated Bayes factor shares the qualitative asymptotic
properties of the Bayes factor.
- The main condition is that the calibration sample size (under
both models) be finite with probability one. This removes an expected log(Bayes factor) based on the calibration sample size.
- If log(BF) tends to some infinite limit, then so does log(CBF).
- If log(BF) tends to some finite number (odd, but examples exist), then log(CBF) will tend to a number differing by the offset.
Perils avoided
- We have avoided two huge pitfalls:
  - Change the form of the prior distribution?
    - Destroys the cohesiveness of the analysis, including both selection and estimation
    - Hampers use of low-information priors
    - Adding an extra model may change the priors!
  - Adjust the hyper-parameters in the prior?
    - Where should this more concentrated prior be centered?
    - We use the data to drive the centering via the training sample
- Instead, we use training samples to update the priors
  - What within-model prior do you want to use for estimation?
  - Training sample size is chosen to mimic the Bayes factor in settings where it works well
  - Driven by actual data, without double use of the data
The motivating example revisited
- In the skew-normal example, our search leads to a calibration
sample size of 50, based on the MDP model.
- The calibrated Bayes factor is e^{4.92−4.54} ≈ 1.46, which is not worth more than a bare mention under Jeffreys' criterion. This result is consistent with the posterior predictive performances.
- Under the calibrated Bayes factor, the small parametric model
never accumulates a significant lead!
The simulation setup
- To investigate the patterns of log Bayes factors and to
illustrate the effect of calibration, we compare the Gaussian parametric model to the MDP model under the following distributions with various shapes:
- Skew-normal with varying shape parameter α (skewness)
- Student-t with varying degrees of freedom ν (heavy tails)
- Symmetric mixture of normals with varying component means ±δ (bimodality)
- In all cases, the distributions have been centered and scaled
to have mean 0 and standard deviation 1.
- By specifying α, ν and δ, we tune the KL distances from the
true distributions to the best fitting Gaussian distributions.
Simulation results
- Small divergences from the Gaussian
- [Figure: true standardized densities (t, skew normal, mixture of normals) and cumulative log-BF versus sample size, n = 1 to 301.]
Simulation results (Cont.)
- Moderate divergences from the Gaussian
- [Figure: true standardized densities (t, skew normal, mixture of normals) and cumulative log-BF versus sample size, n = 1 to 301.]
Simulation results (Cont.)
- Large divergences from the Gaussian
- [Figure: true standardized densities (skew normal, mixture of normals) and cumulative log-BF versus sample size, n = 1 to 301.]
Simulation results (Cont.)
- Very large divergences from the Gaussian
- [Figure: true standardized density (mixture of normals) and cumulative log-BF versus sample size, n = 1 to 301.]
Simulation results summary
- In all cases, the calibration is driven by the MDP model rather than the Gaussian model (which is typically calibrated after two or three observations).
- In all cases, the peaks of the log calibrated Bayes factors
remain below two, leading to better agreement between the Bayes factor and the models’ predictive performances.
- In the same scenario (the same KL divergence from the true
distribution to the best fitting Gaussian distribution), the calibration sample size varies little
- Across different scenarios, the further the underlying true
distribution is from normality, the larger the calibration sample size will be.
Model comparisons in OFHS analysis
- The OFHS (Ohio Family Health Survey) was conducted
between August 2008 and January 2009 to study health insurance coverage of Ohioans. Data is largely self-reported.
- An important health measurement in this survey is BMI (Body
Mass Index).
- We focus on the subpopulation consisting of male adults aged
from 18 to 24. There are 895 non-missing BMI values in this group.
Model comparison in the OFHS analysis (Cont.)
- The log transformed data are close to a skew-normal distribution with estimated skewness parameter α̂_MLE = 2.41.
- [Figure: density estimate for log(BMI) of male adults aged 18-24.]
Model Comparison in the OFHS analysis (Cont.)
- Based on the full data set, the log Bayes factor is −12.19, translating to a Bayes factor of about 196,811 to 1 in favor of the MDP model.
- We further investigate the expected log Bayes factor for a
range of smaller sample sizes. For each sample size, we generate 300 subsamples.
- If we only had a subset of the observations with size n = 106, the Bayes factor would be B_{P,NP} ≈ e^{4.64} ≈ 104, which provides strong evidence for the Gaussian parametric model.
Model Comparison in the OFHS analysis (Cont.)
- After matching the prior concentration to the unit information
prior, we calibrate the priors with training samples of size 50.
- At sample size n = 106, the calibrated Bayes factor is CB_{P,NP} ≈ e^{0.64} ≈ 1.9, which expresses a very weak model preference; at the full sample size, the eventual calibrated Bayes factor is CB_{P,NP} ≈ e^{−16.18}, or about 10.6 million to one in favor of the MDP model.
- We find the swing from inconclusive evidence for modest
sample sizes to conclusive evidence in favor of the MDP model for the full sample far more palatable than the swing from very strong evidence in one direction to conclusive evidence in the opposite direction.
Concluding remarks
- Huge swings in the Bayes factor are not unique to parametric vs. nonparametric model comparisons. They are prevalent in small vs. large model comparisons.
- The original Bayes factor can be very misleading in this situation. There seems to be no sound remedy through alteration of the prior distributions; such methods destroy the cohesiveness of the analysis.
- To make a fair comparison between small and large models,
careful calibration of the Bayes factor is needed, and this can be done through the use of training samples. Such adjustment is also needed whenever priors are poorly specified or are rule-based.
- The approach applies broadly: regression models, discrete data models, dependence models (spatial, temporal, other), complex hierarchical models, ...
A disturbing message
- The story has been that the Bayes factor is inadequate for
model comparison in “hard” problems.
- But the Bayes factor is merely an expression of Bayes’
Theorem.
- Model comparison involves a contrast across submodels of a
“hyper-model”
- So, what is there to say that we should consider the MDP component of our hyper-model a submodel?
- If we split our model up differently, should we calibrate differently?