Bayesian Linear Regression, Seung-Hoon Na, Chonbuk National University


slide-1
SLIDE 1

Bayesian Linear Regression

Seung-Hoon Na Chonbuk National University

slide-2
SLIDE 2

Bayesian Linear Regression

  • Compute the full posterior over $\mathbf{w}$ and $\sigma^2$
  • Case 1) the noise variance $\sigma^2$ is known

– Use a Gaussian prior

  • Case 2) the noise variance $\sigma^2$ is unknown

– Use a normal inverse gamma (NIG) prior

slide-3
SLIDE 3

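Posterior: $\sigma^2$ is known

  • The likelihood: Gaussian, with an offset term $\mu$
  • The conjugate prior: Gaussian
  • Offset: integrated out by putting an improper prior on $\mu$

Further assume that the output is centered, so that the offset can be dropped from the likelihood.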
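For reference, the quantities named on this slide are usually written as below. This is a sketch in standard notation rather than a copy of the slide: the prior mean $\mathbf{w}_0$ and prior covariance $\mathbf{V}_0$ are assumed symbols, and the third line additionally assumes the input columns are centered.

```latex
\begin{align}
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \mu, \sigma^2)
  &= \mathcal{N}(\mathbf{y} \mid \mu \mathbf{1}_N + \mathbf{X}\mathbf{w},\ \sigma^2 \mathbf{I}_N) \\
p(\mathbf{w})
  &= \mathcal{N}(\mathbf{w} \mid \mathbf{w}_0,\ \mathbf{V}_0) \\
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)
  &\propto \exp\left( -\frac{1}{2\sigma^2}
     \lVert \mathbf{y} - \bar{y}\,\mathbf{1}_N - \mathbf{X}\mathbf{w} \rVert_2^2 \right)
  \quad \text{(after integrating out } \mu \text{ with } p(\mu) \propto 1\text{)}
\end{align}
```

With a centered output, $\bar{y} = 0$ and the last expression depends only on $\lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2$.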

slide-4
SLIDE 4

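Posterior: $\sigma^2$ is known

  • The posterior:

– If the prior mean is $\mathbf{w}_0 = \mathbf{0}$ and the prior covariance is $\mathbf{V}_0 = \tau^2 \mathbf{I}$, then the posterior mean reduces to the ridge estimate with $\lambda = \sigma^2 / \tau^2$:

$$\mathbf{w}_N = \left( \frac{\sigma^2}{\tau^2} \mathbf{I} + \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}$$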

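The ridge connection stated above can be checked numerically. The following is a minimal sketch, assuming the standard Gaussian posterior $\mathcal{N}(\mathbf{w} \mid \mathbf{w}_N, \mathbf{V}_N)$ with $\mathbf{V}_N^{-1} = \mathbf{V}_0^{-1} + \mathbf{X}^T\mathbf{X}/\sigma^2$ and $\mathbf{w}_N = \mathbf{V}_N(\mathbf{V}_0^{-1}\mathbf{w}_0 + \mathbf{X}^T\mathbf{y}/\sigma^2)$. The function name, the synthetic data, and the values of `sigma2` and `tau2` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def posterior_known_noise(X, y, sigma2, w0, V0):
    """Gaussian posterior N(w | wN, VN) when the noise variance sigma2 is known."""
    V0_inv = np.linalg.inv(V0)
    VN = np.linalg.inv(V0_inv + X.T @ X / sigma2)
    wN = VN @ (V0_inv @ w0 + X.T @ y / sigma2)
    return wN, VN

# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
sigma2, tau2 = 0.25, 4.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=N)

# Zero-mean isotropic prior: w0 = 0, V0 = tau2 * I.
wN, VN = posterior_known_noise(X, y, sigma2, np.zeros(D), tau2 * np.eye(D))

# Ridge estimate with regularizer lam = sigma2 / tau2.
lam = sigma2 / tau2
w_ridge = np.linalg.solve(lam * np.eye(D) + X.T @ X, X.T @ y)

print(np.allclose(wN, w_ridge))
```

Unlike ridge regression, the Bayesian treatment also returns the covariance `VN`, which quantifies the remaining uncertainty about the weights.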
slide-5
SLIDE 5

Posterior: $\sigma^2$ is known

  • 1D example

– The true parameters: $w_0 = -0.3$, $w_1 = 0.5$

  • Sequential Bayesian inference
  • Posterior given the first $n$ data points
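Sequential inference in this 1D example can be sketched as repeated conjugate updates, where the posterior after each data point becomes the prior for the next. The noise variance, the prior variance, the input range, and the helper name `update` are assumptions made for illustration; only the true parameters $(-0.3, 0.5)$ come from the slide.

```python
import numpy as np

def update(w_prev, V_prev, Phi, y, sigma2):
    """One conjugate Gaussian update: prior N(w_prev, V_prev) -> posterior."""
    V_prev_inv = np.linalg.inv(V_prev)
    V_new = np.linalg.inv(V_prev_inv + Phi.T @ Phi / sigma2)
    w_new = V_new @ (V_prev_inv @ w_prev + Phi.T @ y / sigma2)
    return w_new, V_new

rng = np.random.default_rng(1)
w_true = np.array([-0.3, 0.5])      # true parameters from the slide
sigma2 = 0.04                       # assumed known noise variance
tau2 = 0.5                          # assumed prior variance
x = rng.uniform(-1.0, 1.0, size=20)
Phi = np.column_stack([np.ones_like(x), x])   # features [1, x]
y = Phi @ w_true + rng.normal(scale=np.sqrt(sigma2), size=20)

# Process the data one point at a time (n = 1, 2, ..., 20).
w_seq, V_seq = np.zeros(2), tau2 * np.eye(2)
for n in range(20):
    w_seq, V_seq = update(w_seq, V_seq, Phi[n:n + 1], y[n:n + 1], sigma2)

# Process all 20 points in a single batch update for comparison.
w_batch, V_batch = update(np.zeros(2), tau2 * np.eye(2), Phi, y, sigma2)

print(np.allclose(w_seq, w_batch), np.allclose(V_seq, V_batch))
```

The sequential and batch posteriors agree because the Gaussian prior is conjugate to the Gaussian likelihood, so the order and grouping of the data do not matter.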
slide-6
SLIDE 6

[Figure: sequential Bayesian inference with true parameters $w_0 = -0.3$, $w_1 = 0.5$; panels for $n = 0$, $n = 1$, $n = 2$, and $n = 20$ data points]

slide-7
SLIDE 7

Posterior Predictive: $\sigma^2$ is known

  • The posterior predictive distribution at a test point $\mathbf{x}$: Gaussian
  • The plug-in approximation: constant error bar
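The contrast between the two error bars can be sketched as follows, assuming the standard Gaussian predictive $\mathcal{N}(y \mid \mathbf{w}_N^T \mathbf{x},\ \sigma^2 + \mathbf{x}^T \mathbf{V}_N \mathbf{x})$ and a plug-in predictive with fixed variance $\sigma^2$. The data, the prior variance, and the function name `predictive` are illustrative assumptions.

```python
import numpy as np

def predictive(Phi_star, wN, VN, sigma2):
    """Posterior predictive mean and variance at each row of Phi_star."""
    mean = Phi_star @ wN
    # sigma2 (noise) + x^T VN x (parameter uncertainty), row by row.
    var = sigma2 + np.einsum("ij,jk,ik->i", Phi_star, VN, Phi_star)
    return mean, var

rng = np.random.default_rng(2)
sigma2, tau2 = 0.04, 1.0            # assumed noise and prior variances
x_train = rng.uniform(-1.0, 1.0, size=10)
Phi = np.column_stack([np.ones_like(x_train), x_train])
y = Phi @ np.array([-0.3, 0.5]) + rng.normal(scale=np.sqrt(sigma2), size=10)

# Posterior under a zero-mean prior with covariance tau2 * I.
VN = np.linalg.inv(np.eye(2) / tau2 + Phi.T @ Phi / sigma2)
wN = VN @ (Phi.T @ y / sigma2)

# Test points at increasing distance from the training inputs.
x_test = np.array([0.0, 2.0, 5.0])
Phi_test = np.column_stack([np.ones_like(x_test), x_test])
mean, var = predictive(Phi_test, wN, VN, sigma2)
var_plugin = np.full_like(var, sigma2)   # plug-in: same variance everywhere

print(var, var_plugin)
```

The Bayesian variance grows as the test point moves away from the training data, while the plug-in variance stays at $\sigma^2$ everywhere.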
slide-8
SLIDE 8

Posterior Predictive: $\sigma^2$ is known

slide-9
SLIDE 9

Posterior Predictive: $\sigma^2$ is known

[Figure, two panels]
  • 10 samples from the plug-in approximation to the posterior predictive
  • 10 samples from the posterior predictive

slide-10
SLIDE 10

Bayesian linear regression: $\sigma^2$ is unknown

  • The likelihood:
  • The natural conjugate prior:
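The likelihood and the normal inverse gamma (NIG) prior named above are usually written as follows. This is a sketch in standard notation; the hyperparameter symbols $\mathbf{w}_0$, $\mathbf{V}_0$, $a_0$, $b_0$ and the shape/scale parameterization of the inverse gamma are assumptions.

```latex
\begin{align}
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)
  &= \mathcal{N}(\mathbf{y} \mid \mathbf{X}\mathbf{w},\ \sigma^2 \mathbf{I}_N) \\
p(\mathbf{w}, \sigma^2)
  &= \mathrm{NIG}(\mathbf{w}, \sigma^2 \mid \mathbf{w}_0, \mathbf{V}_0, a_0, b_0)
   = \mathcal{N}(\mathbf{w} \mid \mathbf{w}_0,\ \sigma^2 \mathbf{V}_0)\,
     \mathrm{IG}(\sigma^2 \mid a_0, b_0) \\
\mathrm{IG}(\sigma^2 \mid a, b)
  &= \frac{b^a}{\Gamma(a)}\, (\sigma^2)^{-(a+1)} \exp\left( -\frac{b}{\sigma^2} \right)
\end{align}
```

Note that the prior covariance of $\mathbf{w}$ is scaled by $\sigma^2$, which is what makes the prior conjugate when the noise variance is unknown.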
slide-11
SLIDE 11

Inverse Wishart Distribution

  • Similarly,

If D = 1, the Wishart reduces to the Gamma distribution

slide-12
SLIDE 12

Inverse Wishart Distribution

If D = 1, this reduces to the inverse Gamma
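The two one-dimensional reductions can be written out explicitly. The sketch below assumes the parameterization $\mathrm{Wi}(\boldsymbol{\Lambda} \mid \mathbf{S}, \nu) \propto |\boldsymbol{\Lambda}|^{(\nu - D - 1)/2} \exp(-\tfrac{1}{2}\mathrm{tr}(\boldsymbol{\Lambda}\mathbf{S}^{-1}))$ and $\mathrm{IW}(\boldsymbol{\Sigma} \mid \mathbf{S}, \nu) \propto |\boldsymbol{\Sigma}|^{-(\nu + D + 1)/2} \exp(-\tfrac{1}{2}\mathrm{tr}(\mathbf{S}^{-1}\boldsymbol{\Sigma}^{-1}))$, with the gamma in shape/rate form and the inverse gamma in shape/scale form; other texts use different conventions.

```latex
\begin{align}
\mathrm{Wi}(\lambda \mid s^{-1}, \nu)
  &\propto \lambda^{\nu/2 - 1} \exp\left( -\frac{s}{2}\lambda \right)
  &&\Rightarrow\quad
  \mathrm{Wi}(\lambda \mid s^{-1}, \nu) = \mathrm{Ga}\left(\lambda \,\Big|\, \frac{\nu}{2}, \frac{s}{2}\right) \\
\mathrm{IW}(\sigma^2 \mid s^{-1}, \nu)
  &\propto (\sigma^2)^{-(\nu/2 + 1)} \exp\left( -\frac{s}{2\sigma^2} \right)
  &&\Rightarrow\quad
  \mathrm{IW}(\sigma^2 \mid s^{-1}, \nu) = \mathrm{IG}\left(\sigma^2 \,\Big|\, \frac{\nu}{2}, \frac{s}{2}\right)
\end{align}
```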

slide-13
SLIDE 13

Bayesian linear regression: $\sigma^2$ is unknown

  • The posterior:
  • The posterior marginals
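Under the NIG prior, the posterior is again NIG, with parameters $\mathbf{V}_N = (\mathbf{V}_0^{-1} + \mathbf{X}^T\mathbf{X})^{-1}$, $\mathbf{w}_N = \mathbf{V}_N(\mathbf{V}_0^{-1}\mathbf{w}_0 + \mathbf{X}^T\mathbf{y})$, $a_N = a_0 + N/2$, and $b_N = b_0 + \tfrac{1}{2}(\mathbf{w}_0^T\mathbf{V}_0^{-1}\mathbf{w}_0 + \mathbf{y}^T\mathbf{y} - \mathbf{w}_N^T\mathbf{V}_N^{-1}\mathbf{w}_N)$. The marginals are $p(\sigma^2 \mid \mathcal{D}) = \mathrm{IG}(a_N, b_N)$ and $p(\mathbf{w} \mid \mathcal{D}) = \mathcal{T}(\mathbf{w}_N, \tfrac{b_N}{a_N}\mathbf{V}_N, 2a_N)$. The following is a minimal sketch of these standard updates; the function name, the hyperparameter values, and the synthetic data are assumptions.

```python
import numpy as np

def nig_posterior(X, y, w0, V0, a0, b0):
    """Posterior NIG parameters (wN, VN, aN, bN) for unknown noise variance."""
    N = X.shape[0]
    V0_inv = np.linalg.inv(V0)
    VN_inv = V0_inv + X.T @ X
    VN = np.linalg.inv(VN_inv)
    wN = VN @ (V0_inv @ w0 + X.T @ y)
    aN = a0 + N / 2
    bN = b0 + 0.5 * (w0 @ V0_inv @ w0 + y @ y - wN @ VN_inv @ wN)
    return wN, VN, aN, bN

# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(3)
N, D = 100, 2
X = rng.normal(size=(N, D))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=np.sqrt(0.5), size=N)

a0, b0 = 1.0, 1.0                   # assumed prior hyperparameters
wN, VN, aN, bN = nig_posterior(X, y, np.zeros(D), 10.0 * np.eye(D), a0, b0)

# Marginal over sigma^2 is IG(aN, bN); its mean exists for aN > 1.
sigma2_post_mean = bN / (aN - 1)

# Marginal over w is Student t with these parameters.
t_loc, t_scale, t_dof = wN, (bN / aN) * VN, 2 * aN

print(sigma2_post_mean, t_dof)
```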
slide-14
SLIDE 14

Bayesian linear regression: $\sigma^2$ is unknown

  • The posterior predictive: Student T distribution
  • Given new test inputs
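For $m$ new test inputs $\tilde{\mathbf{X}}$, the Student t posterior predictive is usually written as below. This is a sketch in standard notation, assuming NIG posterior parameters $\mathbf{w}_N$, $\mathbf{V}_N$, $a_N$, $b_N$; these symbols are not preserved in the transcript.

```latex
\begin{equation}
p(\tilde{\mathbf{y}} \mid \tilde{\mathbf{X}}, \mathcal{D})
  = \mathcal{T}\left( \tilde{\mathbf{y}} \,\Big|\, \tilde{\mathbf{X}} \mathbf{w}_N,\
      \frac{b_N}{a_N} \left( \mathbf{I}_m + \tilde{\mathbf{X}} \mathbf{V}_N \tilde{\mathbf{X}}^T \right),\
      2 a_N \right)
\end{equation}
```

The scale has two parts: $\mathbf{I}_m$ accounts for observation noise and $\tilde{\mathbf{X}} \mathbf{V}_N \tilde{\mathbf{X}}^T$ for uncertainty in the weights.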
slide-15
SLIDE 15

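Bayesian linear regression: $\sigma^2$ is unknown – Uninformative prior

  • It is common to set $a_0 = b_0 = 0$, corresponding to an uninformative prior for $\sigma^2$, and to set $\mathbf{w}_0 = \mathbf{0}$ and $\mathbf{V}_0 = g(\mathbf{X}^T\mathbf{X})^{-1}$ for a positive value $g$ (the g-prior)
  • The unit information prior: $g = N$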
slide-16
SLIDE 16

Bayesian linear regression: $\sigma^2$ is unknown – Uninformative prior

  • An uninformative prior: use the uninformative limit of the conjugate g-prior, which corresponds to setting $g = \infty$

slide-17
SLIDE 17

Bayesian linear regression: $\sigma^2$ is unknown – Uninformative prior

  • The marginal distribution of the weights:
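Under the uninformative limit, the marginal over the weights is commonly given as $p(\mathbf{w} \mid \mathcal{D}) = \mathcal{T}(\mathbf{w} \mid \hat{\mathbf{w}}, \tfrac{s^2}{N - D}\mathbf{C}, N - D)$, where $\hat{\mathbf{w}}$ is the least-squares estimate, $\mathbf{C} = (\mathbf{X}^T\mathbf{X})^{-1}$, and $s^2 = (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})^T(\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})$. The sketch below computes these three parameters; the function name and the synthetic data are assumptions.

```python
import numpy as np

def uninformative_weight_marginal(X, y):
    """Location, scale matrix, and degrees of freedom of the Student t marginal."""
    N, D = X.shape
    C = np.linalg.inv(X.T @ X)
    w_hat = C @ X.T @ y                  # least-squares (MLE) estimate
    resid = y - X @ w_hat
    s2 = resid @ resid                   # residual sum of squares
    dof = N - D
    scale = (s2 / dof) * C
    return w_hat, scale, dof

# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(4)
N, D = 30, 2
X = rng.normal(size=(N, D))
y = X @ np.array([0.8, -1.2]) + rng.normal(scale=0.3, size=N)

w_hat, scale, dof = uninformative_weight_marginal(X, y)
print(w_hat, np.sqrt(np.diag(scale)), dof)
```

The location coincides with the frequentist least-squares estimate, and the square roots of the diagonal of `scale` match the usual standard errors of the coefficients.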
slide-18
SLIDE 18

Bayesian linear regression: Evidence procedure

  • Evidence procedure

– An empirical Bayes procedure for picking the hyperparameters

– Choose $\alpha$ and $\lambda$ to maximize the marginal likelihood, where $\lambda = 1/\sigma^2$ is the precision of the observation noise and $\alpha$ is the precision of the prior

– Provides an alternative to using cross validation

slide-19
SLIDE 19

Bayesian linear regression: Evidence procedure
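One simple way to illustrate the evidence procedure is a grid search over the two precisions. The sketch below assumes a zero-mean prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ and uses the closed-form log marginal likelihood of the linear-Gaussian model. The grid search stands in for the iterative re-estimation updates usually used with this procedure, and the function name, grid ranges, and data are assumptions.

```python
import numpy as np

def log_evidence(X, y, alpha, lam):
    """log p(y | alpha, lam) with prior precision alpha and noise precision lam."""
    N, D = X.shape
    A = alpha * np.eye(D) + lam * X.T @ X          # posterior precision
    mN = lam * np.linalg.solve(A, X.T @ y)         # posterior mean
    E = 0.5 * lam * np.sum((y - X @ mN) ** 2) + 0.5 * alpha * mN @ mN
    _, logdetA = np.linalg.slogdet(A)
    return (0.5 * D * np.log(alpha) + 0.5 * N * np.log(lam)
            - E - 0.5 * logdetA - 0.5 * N * np.log(2 * np.pi))

# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(5)
N, D = 40, 3
X = rng.normal(size=(N, D))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.5, size=N)

# Evaluate the evidence on a log-spaced grid and keep the maximizer.
alphas = np.logspace(-3, 3, 25)
lams = np.logspace(-3, 3, 25)
scores = np.array([[log_evidence(X, y, a, l) for l in lams] for a in alphas])
i, j = np.unravel_index(np.argmax(scores), scores.shape)
alpha_hat, lam_hat = alphas[i], lams[j]

print(alpha_hat, lam_hat, 1.0 / lam_hat)   # last value estimates sigma^2
```

No held-out data is needed: the hyperparameters are chosen from the training set alone, which is why the procedure is an alternative to cross validation.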