  1. Bayesian linear regression. Dr. Jarad Niemi, STAT 544, Iowa State University, April 23, 2019.

  2. Outline
     - Linear regression
     - Classical regression
     - Default Bayesian regression
     - Conjugate subjective Bayesian regression
     - Simulating from the posterior
     - Inference on functions of parameters
     - Posterior for optimum of a quadratic

  3. Linear regression. Basic idea: understand the relationship between a response $y$ and explanatory variables $x = (x_1, \ldots, x_k)$ based on data from experimental units indexed by $i$. If we assume linearity, independence, normality, and constant variance, then we have
     $$ y_i \stackrel{ind}{\sim} N(\beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \sigma^2) $$
     where $x_{i1} = 1$ if we want to include an intercept. In matrix notation, we have
     $$ y \sim N(X\beta, \sigma^2 \mathrm{I}) $$
     where $y = (y_1, \ldots, y_n)'$, $\beta = (\beta_1, \ldots, \beta_k)'$, and $X$ is an $n \times k$ full-rank matrix whose $i$th row is $x_i = (x_{i1}, \ldots, x_{ik})$.
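     As a quick illustration, this model is easy to simulate in base R. The sample size, covariate range, coefficients, and error standard deviation below are all arbitrary choices, not values from the slides; later sketches reuse this setup.

       # Simulate y ~ N(X beta, sigma^2 I) with an intercept column (x_{i1} = 1)
       set.seed(1)
       n     <- 15
       x     <- runif(n, 70, 90)   # one explanatory variable (arbitrary range)
       X     <- cbind(1, x)        # n x k design matrix with k = 2
       beta  <- c(-0.6, 0.2)       # made-up coefficients
       sigma <- 1                  # made-up error standard deviation
       y     <- rnorm(n, X %*% beta, sigma)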

  4. Classical regression. How do you find confidence intervals for $\beta$? What is the MLE for $\beta$?
     $$ \hat{\beta} = \hat{\beta}_{MLE} = (X'X)^{-1}X'y $$
     What is the sampling distribution for $\hat{\beta}$?
     $$ \hat{\beta} \sim t_{n-k}\left(\beta, s^2 (X'X)^{-1}\right) $$
     where $s^2 = SSE/(n-k)$ and $SSE = (y - X\hat{\beta})'(y - X\hat{\beta})$. What is the sampling distribution for $s^2$?
     $$ \frac{(n-k)\,s^2}{\sigma^2} \sim \chi^2_{n-k} $$
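     These formulas can be checked numerically against lm(); a minimal sketch, assuming the simulated data from the earlier note (all values arbitrary):

       # Check beta_hat = (X'X)^{-1} X'y and s^2 = SSE / (n - k) against lm()
       set.seed(1)
       n <- 15; x <- runif(n, 70, 90)
       X <- cbind(1, x); y <- rnorm(n, X %*% c(-0.6, 0.2), 1)
       k <- ncol(X)

       beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # MLE
       SSE      <- sum((y - X %*% beta_hat)^2)
       s2       <- SSE / (n - k)

       m <- lm(y ~ x)
       all.equal(unname(drop(beta_hat)), unname(coef(m)))  # should be TRUE
       all.equal(s2, summary(m)$sigma^2)                   # should be TRUE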

  5. Default Bayesian regression. Assume the standard noninformative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$; then the posterior is
     $$ p(\beta, \sigma^2 | y) = p(\beta | \sigma^2, y)\, p(\sigma^2 | y) $$
     $$ \beta | \sigma^2, y \sim N(\hat{\beta}, \sigma^2 V_\beta) $$
     $$ \sigma^2 | y \sim \text{Inv-}\chi^2(n-k, s^2) $$
     $$ \beta | y \sim t_{n-k}(\hat{\beta}, s^2 V_\beta) $$
     where
     $$ \hat{\beta} = (X'X)^{-1}X'y, \quad V_\beta = (X'X)^{-1}, \quad s^2 = \frac{1}{n-k}(y - X\hat{\beta})'(y - X\hat{\beta}). $$
     The posterior is proper if $n > k$ and $\mathrm{rank}(X) = k$.
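     Since the marginal posterior $\beta | y$ is a $t_{n-k}$ centered at $\hat{\beta}$ with scale matrix $s^2 V_\beta$, equal-tailed credible intervals come from $t$ quantiles. A sketch on the same simulated data; under this noninformative prior the intervals coincide with confint() from lm():

       # 95% intervals from beta | y ~ t_{n-k}(beta_hat, s^2 V_beta)
       set.seed(1)
       n <- 15; x <- runif(n, 70, 90)
       X <- cbind(1, x); y <- rnorm(n, X %*% c(-0.6, 0.2), 1)
       k <- ncol(X)

       V_beta   <- solve(t(X) %*% X)
       beta_hat <- drop(V_beta %*% t(X) %*% y)
       s2       <- sum((y - X %*% beta_hat)^2) / (n - k)
       se       <- sqrt(s2 * diag(V_beta))

       cbind(beta_hat - qt(0.975, n - k) * se,
             beta_hat + qt(0.975, n - k) * se)   # compare with confint(lm(y ~ x))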

  6. Comparison to classical regression. In classical regression, we have fixed, but unknown, true parameters $\beta_0$ and $\sigma_0^2$ and quantify our uncertainty about these parameters using the sampling distributions of the error variance and regression coefficients, i.e.
     $$ \frac{(n-k)\,s^2}{\sigma_0^2} \sim \chi^2_{n-k} \quad \text{and} \quad \hat{\beta} \sim t_{n-k}(\beta_0, s^2(X'X)^{-1}). $$
     In default Bayesian regression, we still have the fixed, but unknown, true parameters, but quantify our uncertainty about these parameters using prior and posterior distributions, i.e.
     $$ \frac{(n-k)\,s^2}{\sigma^2} \,\Big|\, y \sim \chi^2_{n-k} \quad \text{and} \quad \beta | y \sim t_{n-k}(\hat{\beta}, s^2(X'X)^{-1}). $$

  7. Cricket chirps. As an example, consider the relationship between the number of cricket chirps (in 15 seconds) and temperature (in Fahrenheit), from the example in LearnBayes::blinreg.
     [Figure: scatterplot of chirps (14-20) versus temp (70-90 degrees Fahrenheit).]

  8. Default Bayesian regression

     summary(m <- lm(chirps ~ temp))

     Call:
     lm(formula = chirps ~ temp)

     Residuals:
          Min       1Q   Median       3Q      Max
     -1.74107 -0.58123  0.02956  0.58250  1.50608

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept) -0.61521    3.14434  -0.196 0.847903
     temp         0.21568    0.03919   5.504 0.000102 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Residual standard error: 0.9849 on 13 degrees of freedom
     Multiple R-squared:  0.6997, Adjusted R-squared:  0.6766
     F-statistic: 30.29 on 1 and 13 DF,  p-value: 0.0001015

     confint(m)  # credible intervals
                      2.5 %    97.5 %
     (Intercept) -7.4081577 6.1777286
     temp         0.1310169 0.3003406

  9. Default Bayesian regression - full posteriors
     [Figure: marginal posterior densities for beta1 and beta2, and a histogram of posterior draws of sigma.]

  10. Fully conjugate subjective Bayesian inference. If we assume the following normal-gamma prior,
      $$ \beta | \sigma^2 \sim N(m_0, \sigma^2 C_0), \qquad \sigma^2 \sim \text{Inv-}\chi^2(v_0, s_0^2), $$
      then the posterior is
      $$ \beta | \sigma^2, y \sim N(m_n, \sigma^2 C_n), \qquad \sigma^2 | y \sim \text{Inv-}\chi^2(v_n, s_n^2) $$
      with
      $$ m_n = m_0 + C_0 X'(X C_0 X' + \mathrm{I})^{-1}(y - X m_0) $$
      $$ C_n = C_0 - C_0 X'(X C_0 X' + \mathrm{I})^{-1} X C_0 $$
      $$ v_n = v_0 + n $$
      $$ v_n s_n^2 = v_0 s_0^2 + (y - X m_0)'(X C_0 X' + \mathrm{I})^{-1}(y - X m_0). $$
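      A direct transcription of these updating equations into base R; the function name conjugate_posterior and the example hyperparameters are illustrative, not from the slides:

        # Posterior (m_n, C_n, v_n, s_n^2) for the conjugate normal-gamma prior
        conjugate_posterior <- function(y, X, m0, C0, v0, s20) {
          n <- length(y)
          W <- solve(X %*% C0 %*% t(X) + diag(n))  # (X C0 X' + I)^{-1}
          r <- y - X %*% m0                        # residual from the prior mean
          m_n <- m0 + C0 %*% t(X) %*% W %*% r
          C_n <- C0 - C0 %*% t(X) %*% W %*% X %*% C0
          v_n <- v0 + n
          s2n <- drop(v0 * s20 + t(r) %*% W %*% r) / v_n
          list(m = drop(m_n), C = C_n, v = v_n, s2 = s2n)
        }
        # e.g., a vague prior on the intercept and unit prior variance
        # (times sigma^2) on the slope:
        # conjugate_posterior(y, X, m0 = c(0, 0), C0 = diag(c(1e6, 1)),
        #                     v0 = 1, s20 = 1)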

  11. Information about chirps per 15 seconds. Let $Y_i$ be the average number of chirps per 15 seconds and $X_i$ the temperature in Fahrenheit, and assume
      $$ Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 X_i, \sigma^2). $$
      Then $\beta_0$ is the expected number of chirps at 0 degrees Fahrenheit, and $\beta_1$ is the expected increase in the number of chirps (per 15 seconds) for each one-degree increase in Fahrenheit. Based on prior experience, the prior $\beta_1 \sim N(0, 1)$ might be reasonable.

  12. Subjective Bayesian regression

      m = arm::bayesglm(chirps ~ temp,    # default prior for beta_0 is N(0, Inf)
                        prior.mean  = 0,   # E[beta_1]
                        prior.scale = 1,   # V[beta_1]
                        prior.df    = Inf) # normal prior
      summary(m)

      Call:
      arm::bayesglm(formula = chirps ~ temp, prior.mean = 0, prior.scale = 1,
          prior.df = Inf)

      Deviance Residuals:
           Min       1Q   Median       3Q      Max
      -1.73940 -0.57939  0.03139  0.58435  1.50809

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept) -0.61478    3.14415  -0.196 0.847999
      temp         0.21565    0.03919   5.503 0.000102 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      (Dispersion parameter for gaussian family taken to be 0.9700575)

      Null deviance: 41.993 on 14 degrees of freedom
      Residual deviance: 12.611 on 13 degrees of freedom
      AIC: 45.966

      Number of Fisher Scoring iterations: 11

  13. Subjective vs default

      # subjective analysis
      m$coefficients
      (Intercept)        temp
       -0.6147847   0.2156511
      confint(m)
                       2.5 %    97.5 %
      (Intercept) -6.7780731 5.5476365
      temp         0.1388701 0.2924879

      # compared to default analysis
      tmp = lm(chirps ~ temp)
      tmp$coefficients
      (Intercept)        temp
       -0.6152146   0.2156787
      confint(tmp)
                       2.5 %    97.5 %
      (Intercept) -7.4081577 6.1777286
      temp         0.1310169 0.3003406

  14. Subjective vs default
      [Figure: scatterplot of chirps (14-20) versus temp (70-90), comparing the subjective and default fits.]

  15. Shrinkage (as $V[\beta_1]$ gets smaller)
      [Figure: estimates of beta0 and beta1 plotted against the prior variance V, from 1e-02 to 1e+02.]

  16. Shrinkage (as $V[\beta_1]$ gets smaller)
      [Figure: chirps versus temp with fitted lines for prior variances V = 1e+02, 1e+01, 1e+00, 1e-01, 1e-02.]

  17. Simulating from the posterior. Although the full posterior for $\beta$ and $\sigma^2$ is available, the decomposition
      $$ p(\beta, \sigma^2 | y) = p(\beta | \sigma^2, y)\, p(\sigma^2 | y) $$
      suggests an approach to simulating from the posterior via
      1. $(\sigma^2)^{(j)} \sim \text{Inv-}\chi^2(n-k, s^2)$ and
      2. $\beta^{(j)} \sim N(\hat{\beta}, (\sigma^2)^{(j)} V_\beta)$.
      This also provides an approach to obtaining posteriors for any function $\gamma = f(\beta, \sigma^2)$ of the parameters via
      $$ p(\gamma | y) = \int\!\!\int p(\gamma | \beta, \sigma^2, y)\, p(\beta | \sigma^2, y)\, p(\sigma^2 | y)\, d\beta\, d\sigma^2 $$
      $$ = \int\!\!\int p(\gamma | \beta, \sigma^2)\, p(\beta | \sigma^2, y)\, p(\sigma^2 | y)\, d\beta\, d\sigma^2 $$
      $$ = \int\!\!\int \mathrm{I}(\gamma = f(\beta, \sigma^2))\, p(\beta | \sigma^2, y)\, p(\sigma^2 | y)\, d\beta\, d\sigma^2 $$
      by adding the step
      3. $\gamma^{(j)} = f(\beta^{(j)}, (\sigma^2)^{(j)})$.
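      A sketch of steps 1-3 in base R on the simulated data from earlier, drawing $\beta$ with a Cholesky factor of $(\sigma^2)^{(j)} V_\beta$ and taking, as an arbitrary example, $\gamma$ to be the mean response at temperature 80:

        # Monte Carlo draws from p(beta, sigma^2 | y) under the default prior
        set.seed(1)
        n <- 15; x <- runif(n, 70, 90)
        X <- cbind(1, x); y <- rnorm(n, X %*% c(-0.6, 0.2), 1)
        k <- ncol(X)

        V_beta   <- solve(t(X) %*% X)
        beta_hat <- drop(V_beta %*% t(X) %*% y)
        s2       <- sum((y - X %*% beta_hat)^2) / (n - k)

        J      <- 1e4
        sigma2 <- (n - k) * s2 / rchisq(J, n - k)  # 1. Inv-chi^2(n-k, s^2) draws
        beta   <- t(sapply(sigma2, function(s2j)   # 2. N(beta_hat, s2j * V_beta) draws
          beta_hat + drop(t(chol(s2j * V_beta)) %*% rnorm(k))))
        gamma  <- beta[, 1] + beta[, 2] * 80       # 3. e.g. gamma = E[y | temp = 80]
        quantile(gamma, c(0.025, 0.975))           # posterior interval for gamma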

  18. Posterior for global maximum. Consider this potato yield data set, with the goal of estimating the optimal nitrogen rate; a sketch of the approach follows below.
      [Figure: scatterplot of yield (20-50) versus N.rate (0-250).]
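      The deck goes on to fit a quadratic in N.rate; the idea can be sketched as follows. For $E[y] = \beta_1 + \beta_2 x + \beta_3 x^2$ with $\beta_3 < 0$, the maximizing rate is $\gamma = -\beta_2/(2\beta_3)$, so step 3 above yields its posterior. The data below are simulated stand-ins with made-up coefficients, not the actual potato data:

        # Posterior for the nitrogen rate maximizing a quadratic yield curve
        set.seed(1)
        N.rate <- rep(seq(0, 250, by = 50), each = 3)
        yield  <- rnorm(length(N.rate), 20 + 0.25 * N.rate - 0.00075 * N.rate^2, 3)
        X <- cbind(1, N.rate, N.rate^2)
        n <- nrow(X); k <- ncol(X)

        V_beta   <- solve(t(X) %*% X)
        beta_hat <- drop(V_beta %*% t(X) %*% yield)
        s2       <- sum((yield - X %*% beta_hat)^2) / (n - k)

        J      <- 1e4
        sigma2 <- (n - k) * s2 / rchisq(J, n - k)
        beta   <- t(sapply(sigma2, function(s2j)
          beta_hat + drop(t(chol(s2j * V_beta)) %*% rnorm(k))))
        x_max  <- -beta[, 2] / (2 * beta[, 3])  # optimum, valid when beta[, 3] < 0
        quantile(x_max, c(0.025, 0.5, 0.975))   # posterior summary of the optimal rate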
