SLIDE 1

Bayesian linear regression

Dr. Jarad Niemi

STAT 544 - Iowa State University

April 23, 2019

Jarad Niemi (STAT544@ISU) Bayesian linear regression April 23, 2019 1 / 24

SLIDE 2

Outline

  • Linear regression
    • Classical regression
    • Default Bayesian regression
    • Conjugate subjective Bayesian regression
  • Simulating from the posterior
    • Inference on functions of parameters
    • Posterior for optimum of a quadratic

SLIDE 3

Linear Regression

Basic idea: understand the relationship between a response y and explanatory variables x = (x1, . . . , xk) based on data from experimental units indexed by i. If we assume linearity, independence, normality, and constant variance, then we have

yi ∼ N(β1 xi1 + · · · + βk xik, σ²) independently,

where xi1 = 1 if we want to include an intercept. In matrix notation, we have

y ∼ N(Xβ, σ²I)

where y = (y1, . . . , yn)′, β = (β1, . . . , βk)′, and X is an n × k full-rank matrix with each row being xi = (xi1, . . . , xik).

SLIDE 4

Classical regression

How do you find confidence intervals for β?

What is the MLE for β?

β̂ = β̂_MLE = (X′X)⁻¹X′y

What is the sampling distribution for β̂?

β̂ ∼ t_{n−k}(β, s²(X′X)⁻¹), where s² = SSE/(n − k) and SSE = (y − Xβ̂)′(y − Xβ̂).

What is the sampling distribution for s²?

(n − k)s²/σ² ∼ χ²_{n−k}
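These closed-form quantities are easy to check numerically. Below is a minimal sketch in Python/numpy (the course material uses R; the simulated data, seed, and variable names here are illustrative, chosen to mirror the slide notation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(70, 90, n)])  # intercept + one covariate
beta_true = np.array([-0.6, 0.2])
y = X @ beta_true + rng.normal(0, 1, n)

# MLE: beta_hat = (X'X)^{-1} X'y (lstsq is the numerically stable route)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# s^2 = SSE / (n - k)
e = y - X @ beta_hat
s2 = e @ e / (n - k)
```

Here beta_hat and s2 are the β̂ and s² of the slide; their sampling distributions (t and scaled χ²) follow as stated.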

SLIDE 5

Default Bayesian regression

Assume the standard noninformative prior p(β, σ²) ∝ 1/σ². Then the posterior is

p(β, σ²|y) = p(β|σ², y) p(σ²|y)
β|σ², y ∼ N(β̂, σ²V_β)
σ²|y ∼ Inv-χ²(n − k, s²)
β|y ∼ t_{n−k}(β̂, s²V_β)

where

β̂ = (X′X)⁻¹X′y
V_β = (X′X)⁻¹
s² = (y − Xβ̂)′(y − Xβ̂)/(n − k)

The posterior is proper if n > k and rank(X) = k.

SLIDE 6

Comparison to classical regression

In classical regression, we have fixed, but unknown, true parameters β0 and σ² and quantify our uncertainty about these parameters using the sampling distributions of the error variance and regression coefficients, i.e.

(n − k)s²/σ² ∼ χ²_{n−k}   and   β̂ ∼ t_{n−k}(β0, s²(X′X)⁻¹).

In the default Bayesian regression, we still have the fixed, but unknown, true parameters, but quantify our uncertainty about these parameters using prior and posterior distributions, i.e.

(n − k)s²/σ² | y ∼ χ²_{n−k}   and   β|y ∼ t_{n−k}(β̂, s²(X′X)⁻¹).
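This exact correspondence can be checked numerically: under the default prior, the central 95% credible interval for each coefficient is the same interval as the classical 95% confidence interval. A sketch in Python (assuming numpy and scipy are available; the simulated data and names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = e @ e / (n - k)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

# beta_j | y ~ t_{n-k}(beta_hat_j, se_j^2), so the 95% posterior interval is
# beta_hat_j +/- t_{n-k, 0.975} * se_j -- exactly the classical CI.
tq = stats.t.ppf(0.975, df=n - k)
ci = np.column_stack([beta_hat - tq * se, beta_hat + tq * se])
```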

SLIDE 7

Cricket chirps

As an example, consider the relationship between the number of cricket chirps (in 15 seconds) and temperature (in Fahrenheit), from the example in LearnBayes::blinreg.

[Figure: chirps (per 15 seconds) versus temperature (Fahrenheit).]

SLIDE 8

Default Bayesian regression

summary(m <- lm(chirps~temp))

Call:
lm(formula = chirps ~ temp)

Residuals:
     Min       1Q   Median       3Q      Max
-1.74107 -0.58123  0.02956  0.58250  1.50608

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.61521    3.14434  -0.196 0.847903
temp         0.21568    0.03919   5.504 0.000102 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9849 on 13 degrees of freedom
Multiple R-squared:  0.6997, Adjusted R-squared:  0.6766
F-statistic: 30.29 on 1 and 13 DF,  p-value: 0.0001015

confint(m) # Credible intervals
                 2.5 %    97.5 %
(Intercept) -7.4081577 6.1777286
temp         0.1310169 0.3003406

SLIDE 9

Default Bayesian regression - Full posteriors

[Figure: posterior densities of beta1 and beta2, and a histogram of posterior draws of sigma.]

SLIDE 10

Fully conjugate subjective Bayesian inference

If we assume the following normal-gamma prior

β|σ² ∼ N(m0, σ²C0)
σ² ∼ Inv-χ²(v0, s0²)

then the posterior is

β|σ², y ∼ N(mn, σ²Cn)
σ²|y ∼ Inv-χ²(vn, sn²)

with

mn = m0 + C0X′(XC0X′ + I)⁻¹(y − Xm0)
Cn = C0 − C0X′(XC0X′ + I)⁻¹XC0
vn = v0 + n
vn sn² = v0 s0² + (y − Xm0)′(XC0X′ + I)⁻¹(y − Xm0)
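The update above can be packaged as a small function. This is a sketch in Python rather than the course's R; the function name, simulated data, and the weak-prior sanity check are mine:

```python
import numpy as np

def conjugate_update(y, X, m0, C0, v0, s20):
    """Posterior hyperparameters for beta|s2 ~ N(m0, s2*C0), s2 ~ Inv-chi2(v0, s20)."""
    n = len(y)
    Minv = np.linalg.inv(X @ C0 @ X.T + np.eye(n))   # (X C0 X' + I)^{-1}
    r = y - X @ m0
    mn = m0 + C0 @ X.T @ Minv @ r
    Cn = C0 - C0 @ X.T @ Minv @ X @ C0
    vn = v0 + n
    s2n = (v0 * s20 + r @ Minv @ r) / vn
    return mn, Cn, vn, s2n

# Sanity check: with a very diffuse prior, mn should recover least squares
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=25)
mn, Cn, vn, s2n = conjugate_update(y, X, np.zeros(2), 1e8 * np.eye(2), 1, 1.0)
```

With C0 very large the prior is nearly flat, so mn essentially reproduces the least-squares estimate; shrinking C0 pulls mn toward m0.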

SLIDE 11

Information about chirps per 15 seconds

Let Yi be the average number of chirps per 15 seconds and Xi be the temperature in Fahrenheit. If we assume

Yi ∼ N(β0 + β1 Xi, σ²) independently,

then β0 is the expected number of chirps at 0 degrees Fahrenheit and β1 is the expected increase in the number of chirps (per 15 seconds) for each one-degree increase in Fahrenheit. Based on prior experience, the prior β1 ∼ N(0, 1) might be reasonable.

SLIDE 12

Subjective Bayesian regression

m = arm::bayesglm(chirps~temp, # Default prior for \beta_0 is N(0,Inf)
                  prior.mean  = 0,   # E[\beta_1]
                  prior.scale = 1,   # V[\beta_1]
                  prior.df    = Inf) # normal prior
summary(m)

Call:
arm::bayesglm(formula = chirps ~ temp, prior.mean = 0, prior.scale = 1,
    prior.df = Inf)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-1.73940 -0.57939  0.03139  0.58435  1.50809

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.61478    3.14415  -0.196 0.847999
temp         0.21565    0.03919   5.503 0.000102 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9700575)

    Null deviance: 41.993 on 14 degrees of freedom
Residual deviance: 12.611 on 13 degrees of freedom
AIC: 45.966

Number of Fisher Scoring iterations: 11

SLIDE 13

Subjective vs Default

# Subjective analysis
m$coefficients
(Intercept)        temp
 -0.6147847   0.2156511
confint(m)
                 2.5 %    97.5 %
(Intercept) -6.7780731 5.5476365
temp         0.1388701 0.2924879

# compared to default analysis
tmp = lm(chirps~temp)
tmp$coefficients
(Intercept)        temp
 -0.6152146   0.2156787
confint(tmp)
                 2.5 %    97.5 %
(Intercept) -7.4081577 6.1777286
temp         0.1310169 0.3003406

SLIDE 14

Subjective vs Default

[Figure: chirps versus temperature with the subjective and default fitted regression lines.]

SLIDE 15

Shrinkage (as V [β1] gets smaller)

[Figure: estimates of beta0 and beta1 plotted against the prior variance V (log scale, 1e−02 to 1e+02).]

SLIDE 16

Shrinkage (as V [β1] gets smaller)

[Figure: chirps versus temperature with fitted regression lines for prior variances V from 1e−02 to 1e+02.]
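The pattern in these figures can be reproduced from the conjugate formulas: as the prior variance V of the slope shrinks, its posterior mean is pulled toward the prior mean of 0. An illustrative Python sketch (the data are simulated to resemble the chirps example; all names are mine, not the course's code):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 15
x = rng.uniform(70, 90, n)
y = -0.6 + 0.2 * x + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])

def post_mean_slope(V):
    # m_n = C0 X'(X C0 X' + I)^{-1} y  with m0 = 0 and a near-flat intercept prior
    C0 = np.diag([1e6, V])
    M = X @ C0 @ X.T + np.eye(n)
    mn = C0 @ X.T @ np.linalg.solve(M, y)
    return mn[1]

slopes = {V: post_mean_slope(V) for V in [1e2, 1e0, 1e-2, 1e-4]}
# As V decreases, the posterior mean of the slope moves toward 0
```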

SLIDE 17

Simulating from the posterior

Although the full posterior for β and σ² is available, the decomposition p(β, σ²|y) = p(β|σ², y) p(σ²|y) suggests an approach to simulating from the posterior via

1. (σ²)^(j) ∼ Inv-χ²(n − k, s²) and
2. β^(j) ∼ N(β̂, (σ²)^(j) V_β).

This also provides an approach to obtaining posteriors for any function γ = f(β, σ²) of the parameters via

p(γ|y) = ∫∫ p(γ|β, σ², y) p(β|σ², y) p(σ²|y) dβ dσ²
       = ∫∫ p(γ|β, σ²) p(β|σ², y) p(σ²|y) dβ dσ²
       = ∫∫ I(γ = f(β, σ²)) p(β|σ², y) p(σ²|y) dβ dσ²

by adding the step

3. γ^(j) = f(β^(j), (σ²)^(j)).
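The three steps above can be sketched directly, e.g. in Python with numpy (the course code is in R; the simulated data, seed, and the particular choice of f here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
V_beta = np.linalg.inv(X.T @ X)
df = n - k
e = y - X @ beta_hat
s2 = e @ e / df

n_sims = 10_000
# 1. (sigma^2)^(j) ~ Inv-chi2(n-k, s^2): draw chi^2_{n-k}, then invert and scale
sigma2 = df * s2 / rng.chisquare(df, n_sims)
# 2. beta^(j) ~ N(beta_hat, (sigma^2)^(j) V_beta)
z = rng.multivariate_normal(np.zeros(k), V_beta, n_sims)
beta = beta_hat + np.sqrt(sigma2)[:, None] * z
# 3. gamma^(j) = f(beta^(j), (sigma^2)^(j)); here f is an illustrative ratio
gamma = beta[:, 1] / beta[:, 2]
```

Posterior quantiles of gamma then summarize p(γ|y) without any asymptotic approximation.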

SLIDE 18

Posterior for global maximum

Consider this potato yield data set

[Figure: potato yield versus nitrogen rate (N.rate).]

with a goal of estimating the optimal nitrogen rate.

SLIDE 19

Posterior for global maximum

Let Yi be the potato yield and Xi be the nitrogen rate. We assume the model

Yi ∼ N(β0 + β1 Xi + β2 Xi², σ²) independently.

Assuming this quadratic curve is correct, the maximum occurs at γ = −β1/(2β2).

m = LearnBayes::blinreg(d$yield, cbind(1, d$N.rate, d$N.rate^2), 1e4)
beta1 = m$beta[,2]; beta2 = m$beta[,3]
gamma = -beta1/(2*beta2)
round(quantile(gamma, c(.025,.5,.975)))
 2.5%   50% 97.5%
  124   157   280

This does not require any data asymptotics or approximations, e.g. the delta method.

SLIDE 20

Summary

Model: y ∼ N(Xβ, σ²I)

Default Bayesian analysis corresponds exactly to classical regression analysis:

p(β, σ²) ∝ 1/σ²  ⟹  β|σ², y ∼ N(β̂, σ²(X′X)⁻¹),  σ²|y ∼ Inv-χ²(n − k, s²)

Conjugate subjective Bayesian analysis:

β|σ² ∼ N(m0, σ²C0),  σ² ∼ Inv-χ²(v0, s0²)  ⟹  β|σ², y ∼ N(mn, σ²Cn),  σ²|y ∼ Inv-χ²(vn, sn²)

Obtain functions of parameters and their uncertainty by simulating the parameters from their joint posterior, calculating the function, and taking posterior quantiles.

SLIDE 21

Computation

For numerical stability and efficiency, the QR decomposition can be used to calculate posterior quantities.

Definition. For an n × k matrix X, a QR decomposition is X = QR for an n × k matrix Q with orthonormal columns and a k × k upper triangular matrix R.

The quantities of interest are

V_β = (X′X)⁻¹ = ([QR]′QR)⁻¹ = (R′Q′QR)⁻¹ = (R′R)⁻¹ = R⁻¹(R′)⁻¹
β̂ = (X′X)⁻¹X′y = R⁻¹(R′)⁻¹R′Q′y = R⁻¹Q′y
Rβ̂ = Q′y

The last equation is useful because R is upper triangular, so the system of linear equations can be solved without computing the inverse of R.
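The same computation can be sketched in Python/numpy, where the reduced QR plays the role of R's qr.Q()/qr.R(); the simulated data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

Q, R = np.linalg.qr(X)   # reduced QR: Q is n x k, R is k x k upper triangular

# Solve R beta_hat = Q'y instead of forming (X'X)^{-1} X'y
beta_hat = np.linalg.solve(R, Q.T @ y)

# V_beta = (X'X)^{-1} = R^{-1} (R')^{-1}
Rinv = np.linalg.solve(R, np.eye(k))
V_beta = Rinv @ Rinv.T
```

Solving Rβ̂ = Q′y directly avoids ever forming X′X, which is the numerical point of the slide.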

SLIDE 22

Cricket chirps

library(MASS)
X = cbind(1, temp)
n = nrow(X)
k = ncol(X)
y = matrix(chirps, n, 1)

qr = qr(X); Q = qr.Q(qr); R = qr.R(qr)
stopifnot(all.equal(X, Q%*%R),
          all.equal(rep(1,k), colSums(Q^2)),
          all.equal(diag(nrow=k), t(Q)%*%Q))

# Check for posterior propriety
stopifnot(n > k, qr$rank == k)

# Calculate posterior hyperparameters
Rinv = solve(qr.R(qr))
vbeta = Rinv %*% t(Rinv)
betahat = qr.solve(qr, y)
df = n - k
e = qr.resid(qr, y)
s2 = sum(e^2)/df

# Simulate from the posterior
n.sims = 10000
sigma = sqrt(1/rgamma(n.sims, df/2, df*s2/2))
beta = matrix(betahat, n.sims, k, byrow=T) +
       sigma * mvrnorm(n.sims, rep(0,k), vbeta)

SLIDE 23

Cricket chirps

[Figure: posterior densities of the intercept, the temperature coefficient, and sigma.]

SLIDE 24

Monte Carlo error

# sigma^2
sqrt(df*s2/qchisq(c(.975,.025), df)) # Exact
[1] 0.7140166 1.5867368
quantile(sigma, c(.025,.975)) # MC
     2.5%     97.5%
0.7190957 1.6007158

# beta
confint(lm(chirps~temp)) # Exact
                 2.5 %    97.5 %
(Intercept) -7.4081577 6.1777286
temp         0.1310169 0.3003406
t(apply(beta, 2, quantile, probs=c(.025,.975))) # MC
                 2.5%     97.5%
(Intercept) -7.488562 6.2069189
temp         0.131191 0.3016954