Regression
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics, Loyola University Chicago
October 24, 2017
Suppose the data are $Y_i \overset{ind}{\sim} \text{Normal}(\mu, \sigma^2)$. If we want to model $Y_i$ in terms of covariate information, what we are really saying is that $\mu$ varies across the $i = 1, \dots, n$ observations (instead of remaining constant). The (multiple) linear regression model is
$$Y_i \overset{ind}{\sim} \text{Normal}(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + X_{i1}\beta_1 + \cdots + X_{ip}\beta_p.$$
Bayesian and classical linear regression are similar if $n \gg p$ and the priors are uninformative. However, the results can differ for challenging problems, and the interpretation is always different.
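As a concrete illustration, here is a minimal R sketch that simulates data from this model (the seed, sizes, and coefficient values are all illustrative choices, not from the notes):

```r
# Simulate n observations from the linear regression model above
set.seed(1)                        # illustrative seed
n <- 100; p <- 3                   # illustrative sizes
X <- matrix(rnorm(n * p), n, p)    # covariates
beta <- c(2, 0.5, -1, 0)           # true (beta0, beta1, ..., betap)
sigma <- 1
mu <- beta[1] + X %*% beta[-1]     # mu_i = beta0 + Xi1*beta1 + ... + Xip*betap
Y <- rnorm(n, mean = mu, sd = sigma)
```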
The least squares estimate of $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$ is
$$\hat{\beta}_{OLS} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} (Y_i - \mu_i)^2, \quad \text{where } \mu_i = \beta_0 + X_{i1}\beta_1 + \cdots + X_{ip}\beta_p.$$
If the errors are Gaussian, then the likelihood is
$$f(\mathbf{Y} \mid \beta, \sigma^2) \propto \prod_{i=1}^{n} \exp\left[-\frac{(Y_i - \mu_i)^2}{2\sigma^2}\right] = \exp\left[-\frac{\sum_{i=1}^{n} (Y_i - \mu_i)^2}{2\sigma^2}\right],$$
so $\hat{\beta}_{OLS}$ is also the MLE. Note: $\hat{\beta}_{OLS}$ is unbiased even if the errors are non-Gaussian.
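A quick numerical check of this equivalence, on illustrative simulated data: minimizing the sum of squared errors with a generic optimizer recovers the same estimates as R's lm().

```r
# Minimizing the SSE directly reproduces the lm() estimates
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n, mean = 2 + X %*% c(0.5, -1, 0), sd = 1)

sse <- function(b) sum((Y - (b[1] + X %*% b[-1]))^2)   # objective
fit_optim <- optim(rep(0, p + 1), sse, method = "BFGS")
fit_lm    <- lm(Y ~ X)
cbind(optim = fit_optim$par, lm = coef(fit_lm))        # should agree closely
```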
In matrix notation, let $\mathbf{Y} = (Y_1, \dots, Y_n)^T$ be the response vector and $X$ be the $n \times (p+1)$ matrix of covariates (whose first column is all ones, for the intercept). Then the mean of $\mathbf{Y}$ is $X\beta$, and the least squares solution is
$$\hat{\beta}_{OLS} = \operatorname*{argmin}_{\beta} (\mathbf{Y} - X\beta)^T(\mathbf{Y} - X\beta) = (X^TX)^{-1}X^T\mathbf{Y}.$$
If the errors are Gaussian, then the sampling distribution is
$$\hat{\beta}_{OLS} \sim \text{Normal}\left(\beta, \sigma^2 (X^TX)^{-1}\right).$$
Note: If $\sigma^2$ is estimated, then the sampling distribution is multivariate t.
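A minimal R sketch of the closed-form solution (illustrative data; the names X0, beta_hat, and V are mine, not from the notes):

```r
# Closed-form OLS via the normal equations
set.seed(1)
n <- 100; p <- 3
X0 <- matrix(rnorm(n * p), n, p)
Y  <- rnorm(n, mean = 2 + X0 %*% c(0.5, -1, 0), sd = 1)
X  <- cbind(1, X0)                                  # n x (p+1), column of ones first

beta_hat <- solve(crossprod(X), crossprod(X, Y))    # (X'X)^{-1} X'Y
sigma2   <- 1                                       # treat sigma^2 as known for now
V        <- sigma2 * solve(crossprod(X))            # sampling covariance of beta_hat
```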
The likelihood is $Y_i \overset{ind}{\sim} \text{Normal}(\mu_i, \sigma^2)$. We will need to set priors for $\beta_0, \beta_1, \dots, \beta_p, \sigma^2$. Note: For the purpose of setting priors and better interpretability of the coefficient estimates later on, it is helpful to standardize both the response and each covariate to have mean 0 and variance 1 (see the short sketch after this list). Many priors for $\beta$ have been considered:
1. Improper priors
2. Gaussian priors
3. Double-exponential priors
4. ...
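The standardization mentioned in the note above is a one-liner in R; a minimal sketch with made-up data:

```r
# Standardize the response and each covariate to mean 0, variance 1
set.seed(1)
X <- matrix(rnorm(100 * 3), 100, 3)
Y <- rnorm(100, mean = 2 + X %*% c(0.5, -1, 0), sd = 1)
Ys <- as.numeric(scale(Y))   # (Y - mean(Y)) / sd(Y)
Xs <- scale(X)               # column-wise centering and scaling
```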
The flat prior is $f(\beta) = 1$. (This is also the Jeffreys prior.) Note: This prior is improper, but the posterior is proper under the same conditions required by least squares. With $\sigma^2$ known, the posterior is
$$\beta \mid \mathbf{Y} \sim \text{Normal}\left(\hat{\beta}_{OLS}, \sigma^2 (X^TX)^{-1}\right).$$
Therefore, shouldn't the results be similar to least squares? How do they differ?
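A hedged sketch of sampling from this posterior, assuming $\sigma^2$ is known and using the mvtnorm package (my choice; any multivariate normal sampler would do). The point estimates match OLS, but we get a full posterior to summarize.

```r
# Draw from the flat-prior posterior Normal(beta_hat_OLS, sigma^2 (X'X)^{-1})
library(mvtnorm)
set.seed(1)
X0 <- matrix(rnorm(100 * 3), 100, 3)
Y  <- rnorm(100, mean = 2 + X0 %*% c(0.5, -1, 0), sd = 1)
X  <- cbind(1, X0)

sigma2   <- 1                                     # treated as known here
beta_hat <- solve(crossprod(X), crossprod(X, Y))
V        <- sigma2 * solve(crossprod(X))
draws    <- rmvnorm(5000, mean = drop(beta_hat), sigma = V)
colMeans(draws)                              # approximately beta_hat_OLS
apply(draws, 2, quantile, c(0.025, 0.975))   # posterior credible intervals
```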
Because we rarely know $\sigma$, we set a prior for the error variance $\sigma^2$, typically an InverseGamma$(a, b)$ with $a$ and $b$ set to something small, say $a = b = 0.01$. The posterior for $\beta$ then follows a multivariate t centered at $\hat{\beta}_{OLS}$. The Jeffreys prior is $f(\beta, \sigma^2) = \frac{1}{\sigma^2}$, which is the limit as $a, b \to 0$.
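Both full conditionals are conjugate here, so a simple Gibbs sampler works. Below is a minimal sketch under a flat prior on $\beta$ and an InverseGamma(0.01, 0.01) prior on $\sigma^2$; all names and settings are illustrative.

```r
# Gibbs sampler: flat prior on beta, InverseGamma(a, b) prior on sigma^2
library(mvtnorm)
set.seed(1)
X0 <- matrix(rnorm(100 * 3), 100, 3)
Y  <- rnorm(100, mean = 2 + X0 %*% c(0.5, -1, 0), sd = 1)
X  <- cbind(1, X0); n <- nrow(X)

a <- b <- 0.01
XtX_inv  <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, Y)

S <- 5000
beta   <- matrix(NA, S, ncol(X))
sigma2 <- numeric(S)
s2 <- 1                                   # initial value
for (s in 1:S) {
  # beta | sigma^2, Y ~ Normal(beta_hat, sigma^2 (X'X)^{-1})
  beta[s, ] <- rmvnorm(1, mean = drop(beta_hat), sigma = s2 * XtX_inv)
  # sigma^2 | beta, Y ~ InverseGamma(a + n/2, b + SSE/2)
  sse <- sum((Y - X %*% beta[s, ])^2)
  s2  <- 1 / rgamma(1, a + n/2, b + sse/2)
  sigma2[s] <- s2
}
```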
Another common prior is Zellner's g-prior,
$$\beta \sim \text{Normal}\left(\mathbf{0}, \frac{\sigma^2}{g} (X^TX)^{-1}\right).$$
Note: This prior is proper assuming $X$ is full rank. The posterior mean is
$$\frac{1}{1+g} \, \hat{\beta}_{OLS}.$$
Note: $g$ controls the amount of shrinkage. $g = \frac{1}{n}$ is common, and is called the unit information prior.
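A tiny illustration of the resulting shrinkage under the unit information prior (names made up; builds on the OLS sketch above):

```r
# Posterior mean under the g-prior: shrink the OLS estimate toward 0
set.seed(1)
X0 <- matrix(rnorm(100 * 3), 100, 3)
Y  <- rnorm(100, mean = 2 + X0 %*% c(0.5, -1, 0), sd = 1)
X  <- cbind(1, X0); n <- nrow(X)

beta_hat <- solve(crossprod(X), crossprod(X, Y))
g <- 1 / n                      # unit information prior
beta_g <- beta_hat / (1 + g)    # very mild shrinkage when n is large
```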
If there are many covariates, or if the covariates are collinear, then $\hat{\beta}_{OLS}$ is unstable. Independent priors can counteract collinearity:
$$\beta_j \overset{ind}{\sim} \text{Normal}\left(0, \frac{\sigma^2}{g}\right).$$
The posterior mode is
$$\operatorname*{argmin}_{\beta} \sum_{i=1}^{n} (Y_i - \mu_i)^2 + g \sum_{j=1}^{p} \beta_j^2.$$
Note: In classical statistics, this is known as the ridge regression solution and is used to stabilize the least squares solution.
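The ridge posterior mode has the familiar closed form $(X^TX + gI)^{-1}X^T\mathbf{Y}$. A minimal sketch on standardized data, so there is no intercept to worry about (g = 5 is an arbitrary illustrative value):

```r
# Ridge / posterior-mode solution (X'X + g I)^{-1} X'Y on standardized data
set.seed(1)
X <- scale(matrix(rnorm(100 * 3), 100, 3))
Y <- as.numeric(scale(rnorm(100, mean = X %*% c(0.5, -1, 0), sd = 1)))

g <- 5                                        # illustrative shrinkage level
beta_ridge <- solve(crossprod(X) + g * diag(ncol(X)), crossprod(X, Y))
beta_ols   <- solve(crossprod(X), crossprod(X, Y))
cbind(ridge = beta_ridge, ols = beta_ols)     # ridge estimates pulled toward 0
```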
An increasingly popular prior is the double-exponential, or Bayesian LASSO (BLASSO), prior. The prior is $\beta_j \sim \text{DE}(\tau)$ with PDF
$$f(\beta) \propto \exp\left(-\frac{|\beta|}{\tau}\right).$$
Note the absolute value. The shape of the PDF is more peaked at 0 than a Gaussian. This favors settings where there are many $\beta_j$ near zero and a few large $\beta_j$; that is, $p$ is large but most of the covariates are noise.
[Figure: Gaussian vs. BLASSO (double-exponential) prior densities for β, plotted over −3 ≤ β ≤ 3; the BLASSO density is more sharply peaked at 0.]
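A short R sketch that approximately reproduces this figure; the DE density is written out directly so no extra package is needed, and τ = 1/√2 makes both priors have variance 1 (my assumption about how the figure was scaled):

```r
# Standard Gaussian vs. double-exponential (BLASSO) prior densities
curve(dnorm(x), from = -3, to = 3, ylim = c(0, 1),
      xlab = expression(beta), ylab = "Prior density", lty = 1)
tau <- 1 / sqrt(2)                             # DE variance is 2*tau^2 = 1
curve(exp(-abs(x) / tau) / (2 * tau), add = TRUE, lty = 2)
legend("topright", legend = c("Gaussian", "BLASSO"), lty = 1:2)
```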
The posterior mode is
$$\operatorname*{argmin}_{\beta} \sum_{i=1}^{n} (Y_i - \mu_i)^2 + g \sum_{j=1}^{p} |\beta_j|.$$
Note: In classical statistics, this is known as the LASSO solution and is used to add stability by shrinking estimates towards 0, and also by setting some coefficients exactly to 0. Covariates with coefficients set to 0 can be removed from the analysis. Therefore, LASSO performs variable selection and estimation simultaneously!
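For comparison, the classical LASSO solution can be computed with the glmnet package (not mentioned in these notes; note also that its tuning parameter λ is scaled differently from the g above, since glmnet divides the squared-error term by the sample size):

```r
# Classical LASSO via glmnet; many covariates are pure noise by construction
library(glmnet)
set.seed(1)
X <- scale(matrix(rnorm(100 * 10), 100, 10))    # 10 covariates, most are noise
Y <- as.numeric(X %*% c(2, -1.5, rep(0, 8)) + rnorm(100))

fit <- glmnet(X, Y, alpha = 1)    # alpha = 1 gives the LASSO penalty
coef(fit, s = 0.1)                # several coefficients set exactly to 0
```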
In logistic regression, we have a binary response $Y_i \in \{0, 1\}$, and
$$\text{logit}[P(Y_i = 1)] = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip},$$
so that
$$P(Y_i = 1) = \text{expit}(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}) \in [0, 1].$$
The logit link is the log-odds, $\text{logit}(x) = \log\left(\frac{x}{1-x}\right)$, and its inverse is $\text{expit}(x) = \frac{e^x}{1 + e^x}$. The coefficient $\beta_j$ represents the change in the log odds of $Y_i = 1$ corresponding to a one-unit increase in $X_{ij}$.
All of the priors discussed apply. Computationally, however, the full conditional distributions are no longer conjugate, so we must use Metropolis sampling.
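A minimal random-walk Metropolis sketch for logistic regression, with illustrative Normal(0, 4) priors on the coefficients and a hand-tuned proposal standard deviation (all settings are my assumptions, not from the notes):

```r
# Random-walk Metropolis for logistic regression
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))                        # intercept + one covariate
p_true <- 1 / (1 + exp(-(X %*% c(-0.5, 1))))   # expit of the linear predictor
Y <- rbinom(n, 1, p_true)

log_post <- function(beta) {
  eta <- X %*% beta
  sum(Y * eta - log(1 + exp(eta))) +           # Bernoulli log-likelihood
    sum(dnorm(beta, 0, 2, log = TRUE))         # Normal(0, 4) log-prior
}

S <- 5000
beta <- matrix(NA, S, 2)
b    <- c(0, 0)                                # initial value
lp   <- log_post(b)
for (s in 1:S) {
  cand    <- b + rnorm(2, 0, 0.2)              # random-walk proposal
  lp_cand <- log_post(cand)
  if (log(runif(1)) < lp_cand - lp) { b <- cand; lp <- lp_cand }
  beta[s, ] <- b
}
colMeans(beta[-(1:1000), ])                    # posterior means after burn-in
```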
The dlaplace function in the rmutil package gives the density of the double-exponential (Laplace) distribution. Of course, there are also rlaplace, plaplace, etc.
The BLR function in the BLR package is probably the most common for Bayesian linear regression. It also works well for BLASSO, and is super fast.
The MCMClogit function in the MCMCpack package performs Metropolis sampling efficiently for logistic regression. The MCMCpack package also includes functions for several other regression methods, e.g., MCMCpoisson and MCMCprobit for Poisson and probit regression, respectively.
Another option is to code your own MCMC sampler in R.
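A hedged usage sketch for two of these functions; the argument names follow the package documentation as I recall it, so check ?dlaplace and ?MCMClogit before relying on them:

```r
library(rmutil)
library(MCMCpack)

# Double-exponential (Laplace) density: location m, dispersion s
dlaplace(0, m = 0, s = 1)

# Metropolis sampling for logistic regression; b0/B0 give the Normal prior
# mean and precision on the coefficients (illustrative data)
df  <- data.frame(y = rbinom(100, 1, 0.5), x = rnorm(100))
fit <- MCMClogit(y ~ x, data = df, burnin = 1000, mcmc = 10000,
                 b0 = 0, B0 = 0.01)
summary(fit)
```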