SLIDE 1

Gaussian Processes

Seung-Hoon Na, Chonbuk National University

SLIDE 2

Gaussian Process Regression

  • Predictions using noisy observations

– The case of a single test input:
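In standard notation (cf. Rasmussen and Williams, GPML §2.2), with noisy observations $y = f(\mathbf{x}) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_y^2)$, and $\mathbf{K}_y \triangleq \mathbf{K} + \sigma_y^2 \mathbf{I}_N$, the single-test-input predictive distribution is

$$p(f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}) = \mathcal{N}\!\left(f_* \mid \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{y},\; k_{**} - \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{k}_*\right)$$

where $\mathbf{k}_* = [\kappa(\mathbf{x}_*, \mathbf{x}_1), \ldots, \kappa(\mathbf{x}_*, \mathbf{x}_N)]^T$ and $k_{**} = \kappa(\mathbf{x}_*, \mathbf{x}_*)$.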

SLIDE 3

Gaussian Process Regression

  • Computational and numerical issues

– It is unwise to directly invert $\mathbf{K}_y$
– Instead, we use a Cholesky decomposition

Marginal probability
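In the same notation, the marginal probability of the data is

$$\log p(\mathbf{y} \mid \mathbf{X}) = -\tfrac{1}{2}\mathbf{y}^T \mathbf{K}_y^{-1}\mathbf{y} - \tfrac{1}{2}\log|\mathbf{K}_y| - \tfrac{N}{2}\log 2\pi,$$

both terms of which the Cholesky factor makes cheap to evaluate.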

SLIDE 4

Gaussian Process Regression

  • Predictive variance: $\sigma_*^2(\mathbf{x}_*) = k_{**} - \mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{k}_*$

  • $\operatorname{cholesky}(\mathbf{K}_y) = \mathbf{L}\mathbf{L}^T$

  • $\mathbf{k}_*^T \mathbf{K}_y^{-1} \mathbf{k}_* = (\mathbf{L}^{-1}\mathbf{k}_*)^T (\mathbf{L}^{-1}\mathbf{k}_*)$

  • $\mathbf{v} = \mathbf{L}^{-1}\mathbf{k}_* = \mathbf{L} \backslash \mathbf{k}_*$, so $\sigma_*^2 = k_{**} - \mathbf{v}^T \mathbf{v}$
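A minimal NumPy/SciPy sketch of this Cholesky-based recipe (cf. GPML Algorithm 2.1); `kernel` and `sigma_y` are illustrative placeholders, not names from the slides:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gp_predict(X, y, X_star, kernel, sigma_y):
    """GP regression prediction via Cholesky (cf. GPML Algorithm 2.1).
    kernel(A, B): pairwise kernel matrix; sigma_y: observation noise std."""
    N = X.shape[0]
    K_y = kernel(X, X) + sigma_y**2 * np.eye(N)      # K_y = K + sigma_y^2 I
    L = cholesky(K_y, lower=True)                    # K_y = L L^T
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
    K_s = kernel(X, X_star)                          # N x M cross-covariances
    mean = K_s.T @ alpha                             # k_*^T K_y^{-1} y
    V = solve_triangular(L, K_s, lower=True)         # columns are v = L \ k_*
    var = np.diag(kernel(X_star, X_star)) - np.sum(V**2, axis=0)
    # log marginal likelihood: -1/2 y^T alpha - sum_i log L_ii - N/2 log(2 pi)
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * N * np.log(2 * np.pi)
    return mean, var, log_ml
```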
SLIDE 5

Cholesky Decomposition

  • The Cholesky decomposition(CD)

– The CD of a symmetric, positive definite matrix A decomposes A into a product of a lower triangular matrix L and its transpose: $\mathbf{A} = \mathbf{L}\mathbf{L}^T$

  • Solving linear system using CD:

– To solve $\mathbf{A}\mathbf{x} = \mathbf{b}$, we have two steps:

  • First solve $\mathbf{L}\mathbf{z} = \mathbf{b}$ for $\mathbf{z}$ by forward substitution
  • Then solve $\mathbf{L}^T\mathbf{x} = \mathbf{z}$ for $\mathbf{x}$ by back substitution

  • Computing the determinant of a matrix
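The Cholesky factor gives the determinant directly: $|\mathbf{A}| = \prod_i L_{ii}^2$, so $\log|\mathbf{A}| = 2\sum_i \log L_{ii}$, which avoids overflow.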
SLIDE 6

Gaussian Process Classification

  • The main difficulty is that the Gaussian prior is not conjugate to the Bernoulli/multinoulli likelihood, so several approximations are available:

– Gaussian approximation
– Expectation propagation (Kuss and Rasmussen 2005; Nickisch and Rasmussen 2008)
– Variational inference (Girolami and Rogers 2006; Opper and Archambeau 2009)
– MCMC (Neal 1997; Christensen et al. 2006)

SLIDE 7

Gaussian Process Classification: Binary classification

– Logistic regression: $p(y_i \mid \mathbf{x}_i) = \operatorname{sigm}(y_i f(\mathbf{x}_i))$, with $y_i \in \{-1, +1\}$
– Probit regression: $p(y_i \mid \mathbf{x}_i) = \Phi(y_i f(\mathbf{x}_i))$
– $f$: given a GP prior, as in GP regression

SLIDE 8

Gaussian Process Classification

  • Define the log of the unnormalized posterior
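Up to constants, the standard objective (cf. GPML §3.4) is

$$\ell(\mathbf{f}) \triangleq \log p(\mathbf{y} \mid \mathbf{f}) + \log p(\mathbf{f} \mid \mathbf{X}) = \log p(\mathbf{y} \mid \mathbf{f}) - \tfrac{1}{2}\mathbf{f}^T \mathbf{K}^{-1}\mathbf{f} - \tfrac{1}{2}\log|\mathbf{K}| - \tfrac{N}{2}\log 2\pi$$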
SLIDE 9

Gaussian Process Classification

  • Formula for the gradient and Hessian of the log posterior:
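In standard form:

$$\nabla\ell = \nabla\log p(\mathbf{y}\mid\mathbf{f}) - \mathbf{K}^{-1}\mathbf{f}, \qquad \nabla\nabla\ell = -(\mathbf{K}^{-1} + \mathbf{W}), \qquad \mathbf{W} \triangleq -\nabla\nabla\log p(\mathbf{y}\mid\mathbf{f})$$

($\mathbf{W}$ is diagonal because the likelihood factorizes across cases.)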
SLIDE 10

Gaussian Process Classification

  • Use IRLS to find the MAP estimate
  • At convergence, the Gaussian approximation of the posterior:
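In standard form, the Newton/IRLS update and the resulting approximation are

$$\mathbf{f}^{\mathrm{new}} = (\mathbf{K}^{-1} + \mathbf{W})^{-1}\!\left(\mathbf{W}\mathbf{f} + \nabla\log p(\mathbf{y}\mid\mathbf{f})\right), \qquad p(\mathbf{f}\mid\mathbf{X},\mathbf{y}) \approx \mathcal{N}\!\left(\hat{\mathbf{f}},\; (\mathbf{K}^{-1} + \hat{\mathbf{W}})^{-1}\right)$$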

SLIDE 11

Gaussian Process Classification

  • Computing the posterior predictive
  • The predictive mean:
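Since $\mathbf{K}^{-1}\hat{\mathbf{f}} = \nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$ at the mode, the standard form is

$$\mathbb{E}[f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}] = \mathbf{k}_*^T\mathbf{K}^{-1}\hat{\mathbf{f}} = \mathbf{k}_*^T\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$$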
SLIDE 12

Gaussian Process Classification

  • The predictive variance:

– Use the law of total variance

https://www.macroeconomics.tu-berlin.de/fileadmin/fg124/financial_crises/exercise/Variances.pdf
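With $p(f_* \mid \mathbf{f}) = \mathcal{N}\!\left(\mathbf{k}_*^T\mathbf{K}^{-1}\mathbf{f},\; k_{**} - \mathbf{k}_*^T\mathbf{K}^{-1}\mathbf{k}_*\right)$ and $\operatorname{cov}(\mathbf{f}\mid\mathbf{y}) \approx (\mathbf{K}^{-1}+\mathbf{W})^{-1}$, the law of total variance $\operatorname{var}(f_*) = \mathbb{E}[\operatorname{var}(f_*\mid\mathbf{f})] + \operatorname{var}(\mathbb{E}[f_*\mid\mathbf{f}])$ gives

$$\sigma_*^2 = k_{**} - \mathbf{k}_*^T\mathbf{K}^{-1}\mathbf{k}_* + \mathbf{k}_*^T\mathbf{K}^{-1}(\mathbf{K}^{-1}+\mathbf{W})^{-1}\mathbf{K}^{-1}\mathbf{k}_*$$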

SLIDE 13

Gaussian Process Classification

  • The predictive variance:

Matrix inversion lemma
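Applying the matrix inversion lemma collapses the two correction terms into the standard result

$$\sigma_*^2 = k_{**} - \mathbf{k}_*^T(\mathbf{K} + \mathbf{W}^{-1})^{-1}\mathbf{k}_*$$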

SLIDE 14

Matrix inversion lemma

  • Consider a general partitioned matrix $\mathbf{M} = \begin{pmatrix} \mathbf{E} & \mathbf{F} \\ \mathbf{G} & \mathbf{H} \end{pmatrix}$, where we assume $\mathbf{E}$ and $\mathbf{H}$ are invertible
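The lemma, in this notation:

$$(\mathbf{E} - \mathbf{F}\mathbf{H}^{-1}\mathbf{G})^{-1} = \mathbf{E}^{-1} + \mathbf{E}^{-1}\mathbf{F}(\mathbf{H} - \mathbf{G}\mathbf{E}^{-1}\mathbf{F})^{-1}\mathbf{G}\mathbf{E}^{-1}$$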

SLIDE 15

Gaussian Process Classification

  • Convert to a predictive distribution for binary responses: $\pi_* = p(y_* = 1 \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}) = \int \operatorname{sigm}(f_*)\, p(f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y})\, df_*$

  • This can be approximated using

– Monte Carlo approximation
– Probit approximation
– …
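For reference, the probit approximation is the standard closed form $\int \operatorname{sigm}(f)\,\mathcal{N}(f \mid \mu, \sigma^2)\,df \approx \operatorname{sigm}\!\left(\kappa(\sigma^2)\,\mu\right)$ with $\kappa(\sigma^2) = (1 + \pi\sigma^2/8)^{-1/2}$.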

SLIDE 16

Gaussian Process Classification

  • Marginal likelihood

– Used to optimize the kernel parameters
– Applying the Laplace approximation, we have:
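In standard form (cf. GPML §3.4):

$$\log p(\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta}) \approx \log p(\mathbf{y}\mid\hat{\mathbf{f}}) - \tfrac{1}{2}\hat{\mathbf{f}}^T\mathbf{K}^{-1}\hat{\mathbf{f}} - \tfrac{1}{2}\log\left|\mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}\right|$$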

  • Computing the derivatives

– Now, since $\hat{\mathbf{f}}$ and $\mathbf{W}$, as well as $\mathbf{K}$, depend on $\boldsymbol{\theta}$

  • More complex than in the regression case
SLIDE 17

Gaussian Process Classification

  • Laplace approximation to the marginal likelihood.

SLIDE 18

Gaussian Process Classification

  • Numerically stable computation

– To avoid inverting $\mathbf{K}$ or $\mathbf{W}$, introduce the matrix $\mathbf{B} = \mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}$:

  • $\mathbf{B}$ has eigenvalues bounded below by 1 and can be safely inverted

– Applying the matrix inversion lemma, we have:
– The IRLS update, now:
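The standard identities (cf. GPML Algorithm 3.1), with $\mathbf{b} = \mathbf{W}\mathbf{f} + \nabla\log p(\mathbf{y}\mid\mathbf{f})$ and $\mathbf{L}\mathbf{L}^T = \mathbf{B}$:

$$(\mathbf{K} + \mathbf{W}^{-1})^{-1} = \mathbf{W}^{1/2}\mathbf{B}^{-1}\mathbf{W}^{1/2}, \qquad \mathbf{f}^{\mathrm{new}} = \mathbf{K}\mathbf{a}, \quad \mathbf{a} = \mathbf{b} - \mathbf{W}^{1/2}\mathbf{L}^T\backslash\!\left(\mathbf{L}\backslash(\mathbf{W}^{1/2}\mathbf{K}\mathbf{b})\right)$$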

SLIDE 19

Gaussian Process Classification

  • Numerically stable computation

– At convergence, we have: $\hat{\mathbf{f}} = \mathbf{K}\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$
– The log-marginal likelihood is: $\log p(\mathbf{y}\mid\mathbf{X}) \approx \log p(\mathbf{y}\mid\hat{\mathbf{f}}) - \tfrac{1}{2}\mathbf{a}^T\hat{\mathbf{f}} - \sum_i \log L_{ii}$, with $\mathbf{a} = \mathbf{K}^{-1}\hat{\mathbf{f}}$
– where we exploited: $\tfrac{1}{2}\log|\mathbf{B}| = \sum_i \log L_{ii}$ and $|\mathbf{B}| = |\mathbf{K}|\,|\mathbf{K}^{-1}+\mathbf{W}| = |\mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}|$
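A minimal NumPy/SciPy sketch of this stable computation (following GPML Algorithm 3.1); the likelihood callbacks are illustrative placeholders (for the logistic likelihood, `grad_ll(f, y)` would return $(\mathbf{y}+1)/2 - \operatorname{sigm}(\mathbf{f})$ and `neg_hess_ll(f, y)` the vector $\operatorname{sigm}(\mathbf{f})(1 - \operatorname{sigm}(\mathbf{f}))$):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def laplace_mode(K, y, log_ll, grad_ll, neg_hess_ll, n_iter=20):
    """Numerically stable Newton/IRLS loop for the Laplace approximation
    in GP classification (cf. GPML Algorithm 3.1). Callback names are
    illustrative; neg_hess_ll returns the diagonal of W as a vector."""
    N = K.shape[0]
    f = np.zeros(N)                                    # start at the prior mean
    for _ in range(n_iter):
        W = neg_hess_ll(f, y)
        sW = np.sqrt(W)
        B = np.eye(N) + sW[:, None] * K * sW[None, :]  # B = I + W^1/2 K W^1/2
        L = cholesky(B, lower=True)                    # safe: eigenvalues of B >= 1
        b = W * f + grad_ll(f, y)
        # a = b - W^1/2 L^T \ (L \ (W^1/2 K b))   (matrix inversion lemma)
        v = solve_triangular(L, sW * (K @ b), lower=True)
        a = b - sW * solve_triangular(L.T, v, lower=False)
        f = K @ a                                      # IRLS update: f_new = K a
    # log p(y|X) ~= log p(y|f_hat) - 1/2 a^T f_hat - sum_i log L_ii
    log_Z = log_ll(f, y) - 0.5 * a @ f - np.sum(np.log(np.diag(L)))
    return f, log_Z
```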

SLIDE 20

Gaussian Process Classification

  • Numerically stable computation

– Compute the predictive distribution
– Here, at the mode:
– Thus, the predictive mean:
– Also, we use:
– Thus, the predictive variance:
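In the standard stable form (cf. GPML Algorithm 3.2): at the mode $\mathbf{K}^{-1}\hat{\mathbf{f}} = \nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$, so

$$\mathbb{E}[f_*] = \mathbf{k}_*^T\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}}), \qquad \mathbf{v} = \mathbf{L}\backslash(\mathbf{W}^{1/2}\mathbf{k}_*), \qquad \operatorname{var}[f_*] = k_{**} - \mathbf{v}^T\mathbf{v},$$

using $(\mathbf{K}+\mathbf{W}^{-1})^{-1} = \mathbf{W}^{1/2}\mathbf{B}^{-1}\mathbf{W}^{1/2}$ from above.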

SLIDE 21

Gaussian Process Classification

SLIDE 22

Gaussian Process Classification

– Manual parameters, short length scale.

The posterior predictive probability for the red circle class generated by a GP with an SE kernel. The thick black line is the decision boundary if we threshold at a probability of 0.5.

SLIDE 23

Gaussian Process Classification

  • Learned parameters, long length scale
SLIDE 24

Gaussian Process Classification: Multi-class classification

– Again, we will use a Gaussian approximation to the posterior

  • 1) Use IRLS to compute the mode
  • 2) Apply the Gaussian approximation at the mode
SLIDE 25

Gaussian Process Classification: Multi-class classification

– The unnormalized log posterior:
– $\mathbf{y}$: a dummy encoding of the $y_i$'s with the same layout as $\mathbf{f}$
– $\mathbf{K}$: a block-diagonal matrix containing $\mathbf{K}_c$, one block per class
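In standard form (cf. GPML §3.5), with $C$ classes and $\mathbf{f} = (\mathbf{f}^1; \ldots; \mathbf{f}^C)$:

$$\ell(\mathbf{f}) = -\tfrac{1}{2}\mathbf{f}^T\mathbf{K}^{-1}\mathbf{f} + \mathbf{y}^T\mathbf{f} - \sum_{i=1}^{N}\log\sum_{c=1}^{C}\exp f_i^c + \mathrm{const}$$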

SLIDE 26

Gaussian Process Classification: Multi-class classification

– Use IRLS to compute the mode
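The standard softmax ingredients (cf. GPML §3.5): with $\boldsymbol{\pi} = \operatorname{softmax}(\mathbf{f})$ computed per case,

$$\nabla\log p(\mathbf{y}\mid\mathbf{f}) = \mathbf{y} - \boldsymbol{\pi}, \qquad \mathbf{W} = \operatorname{diag}(\boldsymbol{\pi}) - \boldsymbol{\Pi}\boldsymbol{\Pi}^T, \qquad \mathbf{f}^{\mathrm{new}} = (\mathbf{K}^{-1}+\mathbf{W})^{-1}(\mathbf{W}\mathbf{f} + \mathbf{y} - \boldsymbol{\pi})$$

where $\boldsymbol{\Pi}$ vertically stacks the $\operatorname{diag}(\boldsymbol{\pi}^c)$ blocks.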

SLIDE 27

Gaussian Process Classification: Multi-class classification

  • The posterior predictive:
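Per class $c$, the latent predictive mean takes the standard form (using $\hat{\mathbf{f}} = \mathbf{K}(\mathbf{y} - \hat{\boldsymbol{\pi}})$ at the mode):

$$\mathbb{E}[f_*^c] = \mathbf{k}_{*,c}^T(\mathbf{y}^c - \hat{\boldsymbol{\pi}}^c)$$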
SLIDE 28

Gaussian Process Classification: Multi-class classification

– The covariance of the latent response

  • Compute the posterior predictive for the visible response:
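This is typically approximated by Monte Carlo over the Gaussian approximation:

$$p(y_* = c \mid \mathbf{x}_*) \approx \frac{1}{S}\sum_{s=1}^{S}\operatorname{softmax}(\mathbf{f}_*^{(s)})_c, \qquad \mathbf{f}_*^{(s)} \sim \mathcal{N}(\boldsymbol{\mu}_*, \boldsymbol{\Sigma}_*)$$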

SLIDE 29

Gaussian Process Classification: Multi-class classification

  • Computing the marginal likelihood

– Similar to the binary case