SLIDE 1

Prediction and Model Comparison

Applied Bayesian Statistics

Dr. Earvin Balderama

Department of Mathematics & Statistics, Loyola University Chicago

October 31, 2017

Last edited October 25, 2017 by <ebalderama@luc.edu>
SLIDE 2 MCMC

(Bayesian) Modeling is all about

MCMCMC

SLIDE 3 MCMC

MCMCMC

• Steps in (Bayesian) modeling:
  1. Model Creation (Choice; Computation)
  2. Model Checking (Criticism; Diagnostics)
  3. Model Comparison (Choice; Selection; Change)

SLIDE 4 MCMC

MCMCMC

• Steps in (Bayesian) modeling:
  1. Model Creation (Choice; Computation)
  2. Model Checking (Criticism; Diagnostics)
  3. Model Comparison (Choice; Selection; Change)
  4. Repeat!

SLIDE 5 MCMC

What we’re focusing on today

Recall conditional distributions:

$$f(A \mid B) = \frac{f(A, B)}{f(B)} \qquad \text{(conditional = joint / marginal)}$$

This time, we'll give some attention to the marginal distribution:

$$f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{f(y)} = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}$$
SLIDE 6 MCMC

Outline

1. Some Popular Bayesian fit statistics
   • Connection to classical statistics
2. Predictive Distributions
   • Prior predictive distribution
   • Posterior predictive distribution
   • Posterior predictive checks
3. Predictive Performance
   • Precision
   • Accuracy
   • Extreme values

SLIDE 7 MCMC

Modeling

Classical methods...
1. Standardized Pearson residuals
2. p-values
3. Likelihood ratio
4. MLE

...also apply in a Bayesian analysis:
1. Posterior mean of the standardized residuals
2. Posterior probabilities
3. Bayes factor
4. Posterior mean

SLIDE 8 Popular Model Fit Statistics

Bayes Factor

For determining which model fits the data "better," the Bayes factor is commonly used in a hypothesis test. Given data y and two competing models, M1 and M2, with parameter vectors θ1 and θ2, respectively, the Bayes factor is a measure of how much the data favors Model 1 over Model 2:

$$BF(y) = \frac{f_1(y)}{f_2(y)} = \frac{\int f(y \mid \theta_1)\, f(\theta_1)\, d\theta_1}{\int f(y \mid \theta_2)\, f(\theta_2)\, d\theta_2}$$

Note: The Bayes factor is an odds ratio: the ratio of the posterior and prior odds of favoring Model 1 over Model 2.
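In conjugate settings the marginal likelihoods in this ratio have closed forms, which makes a small worked example possible. The following R sketch is entirely illustrative (the function name marg, the Beta priors, and the data values are assumptions, not from the slides); it compares two binomial models that differ only in their priors:

# marginal likelihood of y successes in n trials under a Beta(a, b) prior
# (the beta-binomial pmf): binomial likelihood integrated against the prior
marg <- function(y, n, a, b) choose(n, y) * beta(y + a, n - y + b) / beta(a, b)

# Bayes factor favoring Model 1 (flat Beta(1, 1) prior) over
# Model 2 (informative Beta(10, 10) prior), for y = 7 successes in n = 10 trials
BF <- marg(7, 10, 1, 1) / marg(7, 10, 10, 10)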
SLIDE 9 Popular Model Fit Statistics

Bayes Factor

The good:
• More robust than frequentist hypothesis testing.
• Often used for testing a "full model" vs. a "reduced model," as in classical statistics.
• One model doesn't need to be nested within the other model.

The bad:
• Difficult to compute, although easy to approximate with software.
• Only defined for proper marginal density functions.
• Computation is conditional on one of the models being true.

Because of this, Gelman considers Bayes factors irrelevant and prefers looking at distance measures between data and model.

Many distance measures to choose from! One of which is...

SLIDE 10 Popular Model Fit Statistics

DIC

Like many good measures of model fit and comparison, the Deviance Information Criterion (DIC) includes
1. how well the model fits the data (goodness of fit), and
2. the complexity of the model (effective number of parameters).

The Deviance Information Criterion (DIC) is given by

$$DIC = \bar{D} + p_D$$

where
1. $\bar{D} = E(D)$ is the "mean deviance," and
2. $p_D = \bar{D} - D(\bar{\theta})$ is the "mean deviance minus the deviance at the posterior means."

SLIDE 11 Popular Model Fit Statistics

Deviance

Define the deviance as

$$D = -2 \log f(y \mid \theta)$$

Example: Poisson likelihood

$$D = -2 \log \prod_i \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} = -2 \sum_i \left( -\mu_i + y_i \log \mu_i - \log(y_i!) \right)$$
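As a minimal sketch, the deviance can be evaluated at each posterior draw in R. This reuses the y and post.lambda names from the snippets later in the deck and assumes a common Poisson mean rather than observation-specific µi:

# D(s) = -2 * sum_i log f(y_i | lambda^(s)), one value per posterior draw
D.samples <- sapply(post.lambda,
                    function(lam) -2 * sum(dpois(y, lam, log = TRUE)))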
SLIDE 12 Popular Model Fit Statistics

DIC

DIC can then be rewritten as

$$DIC = \bar{D} + p_D = D(\bar{\theta}) + 2 p_D = -2 \log f(y \mid \bar{\theta}) + 2 p_D$$

(using $\bar{D} = p_D + D(\bar{\theta})$), which is a generalization of

$$AIC = -2 \log f\left(y \mid \hat{\theta}_{MLE}\right) + 2k$$

DIC can be used to compare different models as well as different methods. Preferred models have low DIC values.
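Continuing the earlier sketch (same assumed names, common Poisson mean), DIC follows directly from the deviance draws:

Dbar <- mean(D.samples)  # mean deviance
# deviance evaluated at the posterior mean of lambda
Dhat <- -2 * sum(dpois(y, mean(post.lambda), log = TRUE))
pD   <- Dbar - Dhat      # effective number of parameters
DIC  <- Dbar + pD        # equivalently Dhat + 2 * pD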

SLIDE 13 Popular Model Fit Statistics

DIC

Requires the joint posterior distribution to be approximately multivariate normal. Doesn't work well with:
• highly non-linear models
• mixture models
• models with discrete parameters
• models with missing data

If pD is negative:
• the log-likelihood may be non-concave
• the prior may be misspecified
• the posterior mean may not be a good estimator

SLIDE 14 Predictive Distributions

Predictions

Maybe a better (best?) way to decide between competing models is to rank them based on how “well” each model does in predicting future observations.

SLIDE 15 Predictive Distributions

The plug-in approach to prediction

Example: Consider the regression model

$$Y_i \overset{ind}{\sim} \text{Normal}\left( \beta_0 + X_{i1}\beta_1 + \cdots + X_{ip}\beta_p,\ \sigma^2 \right)$$

Suppose we have a new covariate vector $X_{new}$ and we would like to predict the corresponding response $Y_{new}$. The "plug-in" approach would be to fix β and σ at their posterior means $\hat{\beta}$ and $\hat{\sigma}$ to make predictions:

$$Y_{new} \mid \hat{\beta}, \hat{\sigma} \sim \text{Normal}\left( X_{new}\hat{\beta},\ \hat{\sigma}^2 \right).$$

SLIDE 16 Predictive Distributions

The plug-in approach to prediction

However, this plug-in approach suppresses uncertainty about the parameters, β and σ. Therefore, the prediction intervals will be too narrow, leading to undercoverage. We need to account for all uncertainty when making predictions, including our uncertainty about β and σ.

SLIDE 17 Predictive Distributions

Predictive distributions

In Bayesian analyses, predictive distributions are used for comparing models in terms of how "well" each model does in predicting future observations.

The idea is that we want to explore the predictive distributions of the unknown observations, which account for the uncertainty in predicting those observations. Having distributions for the unknown future observations comes naturally in a Bayesian analysis because of the uncertainty distributions for the unknown model parameters.

First, a question: Before any data is observed, what could we use for predictions?

SLIDE 18 Predictive Distributions

Prior Predictive Distribution

Before any data is observed, what could we use for predictions? We have a likelihood function, but to account for all uncertainty when making predictions, we marginalize over the model parameters. The marginal likelihood is what one would expect the data to look like after averaging over the prior distribution of θ, so it is also called the prior predictive distribution:

$$f(y) = \int f(y \mid \theta)\, f(\theta)\, d\theta$$
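Sampling from the prior predictive is straightforward: draw θ from the prior, then draw y from the likelihood given that draw. A hedged sketch for a Poisson likelihood with a Gamma prior (the prior family and hyperparameters are illustrative assumptions, not from the slides):

S <- 5000
lambda.prior <- rgamma(S, shape = 2, rate = 0.5)  # theta ~ prior
y.prior.pred <- rpois(S, lambda.prior)            # y ~ f(y | theta), one per draw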
SLIDE 19 Predictive Distributions

Posterior Predictive Distribution

More interestingly, what if a set of data y has already been observed? How can we make predictions for future (or new, or unobserved) observations ynew? We can sample from the marginal posterior likelihood of ynew, called the posterior predictive distribution (PPD):

$$f(y_{new} \mid y) = \int f(y_{new} \mid \theta)\, f(\theta \mid y)\, d\theta$$

This distribution is what one would expect ynew to look like after observing y and averaging over the posterior distribution of θ given y. The concept of the PPD applies generally (e.g., logistic regression).

SLIDE 20 Predictive Distributions

Posterior Predictive Distribution

Equivalently, ynew can be considered missing values and treated as additional parameters to be estimated in a Bayesian framework. (More on missing values later.)

Example: For a complete dataset, we may want to randomly assign NA values to some number m of observations, which creates a test set ymis = {y1, y2, . . . , ym}. After MCMC, the m posterior predictive distributions, P1, P2, . . . , Pm, can be used to compute overall measures of model goodness-of-fit, as well as predictive performance measures for each yi in the test set.
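A minimal sketch of creating such a test set in R (all names illustrative; the MCMC itself would then impute the NA entries as extra parameters):

m <- 20                              # illustrative test-set size
mis.idx <- sample(seq_along(y), m)   # randomly choose m observations to hold out
y.mis <- y[mis.idx]                  # held-out truth, kept aside for scoring
y.obs <- y
y.obs[mis.idx] <- NA                 # fit the model to y.obs; ynew fills the NAs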

SLIDE 21 Predictive Distributions

Posterior Predictive Distribution

The MCMC framework makes it very easy to draw samples from Ynew's PPD. This is one of the reasons for the claim that "Bayesian methods naturally quantify uncertainty."

Example: For each MCMC iteration t,
1. we have updates $\beta^{(t)}$ and $\sigma^{2(t)}$;
2. we sample $y_{new}^{(t)} \sim \text{Normal}\left( X\beta^{(t)},\ \sigma^{2(t)} \right)$.

Then $y_{new}^{(1)}, y_{new}^{(2)}, \ldots, y_{new}^{(S)}$ are samples from the PPD.
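A minimal sketch of this loop, assuming Xnew is a new covariate vector (including the intercept) and that post.beta (an S x p matrix) and post.sigma2 (a length-S vector) hold the posterior draws; all of these names are assumptions:

S <- nrow(post.beta)
ynew <- numeric(S)
for (t in 1:S) {
  mu.t <- sum(Xnew * post.beta[t, ])  # X_new %*% beta^(t)
  # draw from the likelihood at this iteration's parameter values
  ynew[t] <- rnorm(1, mean = mu.t, sd = sqrt(post.sigma2[t]))
}
# ynew now holds S draws from the posterior predictive distribution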

SLIDE 22 Predictive Distributions

Posterior Predictive Ordinate

The posterior predictive ordinate (PPOi) is the density of the posterior predictive distribution evaluated at an observation yi:

$$PPO_i = f(y_i \mid y) = \int f(y_i \mid \theta)\, f(\theta \mid y)\, d\theta$$

PPOi can be used to estimate the probability of observing yi in the future after having already observed y. We can estimate the ith posterior predictive ordinate by

$$\widehat{PPO}_i = \frac{1}{S} \sum_{s=1}^{S} f\left( y_i \mid \theta^{(s)} \right)$$
SLIDE 23 Predictive Distributions

Posterior Predictive Ordinate

We can easily compute PPOi after running MCMC. Example (Poisson count model): For each observation i, compute the summand at every iteration of the MCMC, then average over the S iterations to get PPOi. In R, suppose our data is the vector y, and the posterior samples are in the vector post.lambda.

ppo <- numeric(N)  # N = number of observations
for (i in 1:N) {
  # average the Poisson density of y[i] over the posterior draws of lambda
  ppo[i] <- mean(dpois(y[i], post.lambda))
}

SLIDE 24 Predictive Distributions

Conditional Predictive Ordinate

The conditional predictive ordinate (CPOi) estimates the probability of observing yi in the future after having already observed y−i:

$$CPO_i = f(y_i \mid y_{-i}) = \cdots = \left( \int \frac{1}{f(y_i \mid \theta)}\, f(\theta \mid y)\, d\theta \right)^{-1}$$

CPOi can be estimated by taking the inverse of the posterior mean of the inverse density function value of yi (the harmonic mean of the likelihood of yi):

$$\widehat{CPO}_i = \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{f\left( y_i \mid \theta^{(s)} \right)} \right)^{-1}$$

Low CPOi values suggest possible outliers, high-leverage points, and influential observations.

SLIDE 25 Predictive Distributions

Conditional Predictive Ordinate

Proof (using $f(y_{-i} \mid \theta) = f(y \mid \theta) / f(y_i \mid \theta)$, which holds when the observations are conditionally independent given θ):

$$
\begin{aligned}
CPO_i = f(y_i \mid y_{-i})
&= \left( \frac{f(y_{-i})}{f(y)} \right)^{-1} \\
&= \left( \int \frac{f(y_{-i} \mid \theta)\, f(\theta)}{f(y)}\, d\theta \right)^{-1} \\
&= \left( \int \frac{1}{f(y_i \mid \theta)} \cdot \frac{f(y \mid \theta)\, f(\theta)}{f(y)}\, d\theta \right)^{-1} \\
&= \left( \int \frac{1}{f(y_i \mid \theta)}\, f(\theta \mid y)\, d\theta \right)^{-1} \\
&= \left( E_{\theta \mid y}\!\left[ \frac{1}{f(y_i \mid \theta)} \right] \right)^{-1}
\end{aligned}
$$

SLIDE 26 Predictive Distributions

Conditional Predictive Ordinate

We can easily compute CPOi after running MCMC. Example (Poisson count model): In R, suppose our data is the vector y, and the posterior samples are in the vector post.lambda.

cpo <- numeric(N)  # N = number of observations
for (i in 1:N) {
  # inverse of the posterior mean of the inverse likelihood (harmonic mean)
  cpo[i] <- 1 / mean(1 / dpois(y[i], post.lambda))
}

SLIDE 27 Predictive Distributions

Predictive Ordinates

Estimate of PPO:

$$\widehat{PPO}_i = \frac{1}{S} \sum_{s=1}^{S} f\left( y_i \mid \theta^{(s)} \right)$$

Estimate of CPO:

$$\widehat{CPO}_i = \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{f\left( y_i \mid \theta^{(s)} \right)} \right)^{-1}$$

Notes:
• PPO is good for prediction, but violates the likelihood principle.
• CPO is based on leave-one-out cross-validation.

SLIDE 28 Predictive Distributions

LPML

The log pseudo-marginal likelihood (LPML) is the sum of the log CPOs and is an estimator of the log marginal likelihood:

$$LPML = \sum_{i=1}^{n} \log(CPO_i)$$

Preferred models have large LPML values.
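Given the cpo vector from the earlier snippet, LPML is a one-liner in R:

lpml <- sum(log(cpo))  # larger LPML indicates a preferred model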

SLIDE 29 Predictive Distributions

For hypothesis testing

A ratio of pseudo-marginal likelihoods (on the log scale, a difference of LPMLs) is a surrogate for the Bayes factor. Another overall measure for model comparison is the posterior Bayes factor, which is simply the Bayes factor but using the posterior predictive distributions.

SLIDE 30 Predictive Performance

Predictive Performance

So far, we have seen different ways to rank models in terms of how good they seem to be at prediction. Let's look now at some measures that quantify the actual performance in prediction, which always boils down to two things:
1. Precision
2. Accuracy

Combining information from measures of precision (e.g., MSE) and measures of accuracy (e.g., coverage) is important for model comparison.
SLIDE 31 Predictive Performance

Measures of Predictive Precision

The mean absolute deviation (MAD) is the mean of the absolute deviations between each observed value, yi, and the median of its posterior predictive distribution, Pi.

Mean Absolute Deviation:

$$MAD = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \tilde{P}_i \right|$$

where $\tilde{P}_i$ denotes the median of $P_i$.

Note: You can also use the median absolute deviation (also MAD!) for a more robust statistic.

SLIDE 32 Predictive Performance

Measures of Predictive Precision

MSE is the mean of the squared deviations (errors) between each observed value, yi, and the mean of its posterior predictive distribution, Pi.

Mean Squared Error:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \bar{P}_i \right)^2$$

The average standard deviation of the Pi's can also be helpful.

Mean Standard Deviation:

$$SD = \frac{1}{n} \sum_{i=1}^{n} \sigma_{P_i}$$
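A hedged sketch of all three measures, assuming P is an S x m matrix whose columns hold posterior predictive samples for the m held-out observations y.mis (names carried over from the earlier illustrative snippets):

P.med  <- apply(P, 2, median)     # PPD medians
P.mean <- apply(P, 2, mean)       # PPD means
MAD <- mean(abs(y.mis - P.med))   # mean absolute deviation
MSE <- mean((y.mis - P.mean)^2)   # mean squared error
SD  <- mean(apply(P, 2, sd))      # average PPD standard deviation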

SLIDE 33 Predictive Performance

Measures of Predictive Accuracy

Coverage is the proportion of test-set observations yi falling inside some interval of their posterior predictive distributions Pi.

90% Coverage:

$$C(90\%) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left( P_i^{(.05)} < y_i < P_i^{(.95)} \right)$$

where $\mathbf{1}$ is the indicator function and $P_i^{(q)}$ is the estimated qth quantile of Pi.

This shows how well the model does in creating posterior predictive distributions that actually capture the true value.

Note: A high coverage probability can simply be a result of high-variance posterior predictive distributions.
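Under the same assumptions as before (PPD sample matrix P, held-out values y.mis), 90% coverage can be estimated as:

lo <- apply(P, 2, quantile, probs = 0.05)  # estimated .05 quantiles
hi <- apply(P, 2, quantile, probs = 0.95)  # estimated .95 quantiles
C90 <- mean(y.mis > lo & y.mis < hi)       # proportion inside the 90% intervals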

SLIDE 34 Predictive Performance

Prediction of Extreme Values

The Brier score is the squared difference between the posterior predictive probability of exceeding a certain value and whether or not the actual observation exceeds that value.

The Brier score for a test-set observation yi, given a certain value c, can be computed as

$$BS_i = \left( \mathbf{1}(y_i > c) - P_i(y_i > c) \right)^2$$

where $\mathbf{1}$ is the indicator function and $P_i(y > c)$ is the posterior predictive probability that y > c.
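A minimal sketch under the same assumptions, with an illustrative threshold standing in for c:

c0 <- 10                                     # illustrative exceedance threshold c
p.exceed <- colMeans(P > c0)                 # posterior predictive P(y_i > c)
BS <- (as.numeric(y.mis > c0) - p.exceed)^2  # Brier score per test-set observation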

SLIDE 35 Predictive Performance

Prediction of Extreme Values

Another score that can be used as a measure of predictive accuracy for extreme values is the quantile score. The quantile score for a test-set observation yi can be computed as

$$QS_i = 2 \times \left( \mathbf{1}\left( y_i < P_i^{(q)} \right) - q \right) \times \left( P_i^{(q)} - y_i \right)$$

where $\mathbf{1}$ is the indicator function and $P_i^{(q)}$ is the estimated qth quantile of Pi.
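And the quantile score at a chosen level q, same assumptions:

q <- 0.95                               # illustrative quantile level
Pq <- apply(P, 2, quantile, probs = q)  # estimated q-th quantile of each PPD
QS <- 2 * (as.numeric(y.mis < Pq) - q) * (Pq - y.mis)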
SLIDE 36 Predictive Performance

Prediction of Extreme Values

The average of the quantile scores and the average of the Brier scores can be used for model comparison; they evaluate how well a model captures extreme values. Smaller scores are better: larger Brier scores suggest a lack of predictive accuracy, and larger quantile scores suggest that the observed value is very far from its estimated quantile value from P.

SLIDE 37 Conclusion

Other Posterior Predictive Checks

1. Other Bayesian p-value tests
2. Gelman chi-square tests, and other chi-square tests
3. Quantile ratio
4. Predictive concordance
5. Bayesian Predictive Information Criterion (BPIC)
6. L-criterion
7. ...and many more...

SLIDE 38 Conclusion

Summary

1. Separate your research into the three MC's.
2. The list of model checks is not exhaustive!
3. Choose some based on the focus of your research.
4. Statistics is only a guide.
