  1. Gaussian Processes Seung-Hoon Na Chonbuk National University

  2. Gaussian Process Regression • Predictions using noisy observations – The case of a single test input:
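
The equations on this slide were lost in extraction; what follows is the standard predictive distribution for a single test input $\mathbf{x}_*$ under observation noise, as in Rasmussen and Williams (2006), where $K_y = K + \sigma_y^2 I_N$, $\mathbf{k}_* = [\kappa(\mathbf{x}_*, \mathbf{x}_1), \dots, \kappa(\mathbf{x}_*, \mathbf{x}_N)]^\top$, and $k_{**} = \kappa(\mathbf{x}_*, \mathbf{x}_*)$:

```latex
p(f_* \mid \mathbf{x}_*, X, \mathbf{y})
  = \mathcal{N}\!\left(f_* \mid \mathbf{k}_*^\top K_y^{-1}\mathbf{y},\;
                        k_{**} - \mathbf{k}_*^\top K_y^{-1}\mathbf{k}_*\right)
```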

  3. Gaussian Process Regression • Computational and numerical issues – It is unwise to directly invert $K_y$ – Instead, we use a Cholesky decomposition, $K_y = LL^\top$ • Marginal probability (see below)
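
The marginal probability referenced here is the standard GP evidence (the slide's formula was also lost in extraction); in the notation above:

```latex
\log p(\mathbf{y} \mid X)
  = -\tfrac{1}{2}\,\mathbf{y}^\top K_y^{-1}\mathbf{y}
    - \tfrac{1}{2}\log|K_y|
    - \tfrac{N}{2}\log 2\pi
```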

  4. Gaussian Process Regression • With the Cholesky factorization $K_y = LL^\top$, $L = \mathrm{cholesky}(K_y)$, the quadratic form in the predictive variance becomes $\mathbf{k}_*^\top K_y^{-1}\mathbf{k}_* = (L^{-1}\mathbf{k}_*)^\top(L^{-1}\mathbf{k}_*) = \mathbf{v}^\top\mathbf{v}$ • Hence $\mathbb{V}[f_*] = k_{**} - \mathbf{v}^\top\mathbf{v}$ • $\mathbf{v} = L^{-1}\mathbf{k}_* = L \backslash \mathbf{k}_*$, computed by forward substitution rather than explicit inversion
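
A minimal NumPy sketch of these computations, following Algorithm 2.1 of Rasmussen and Williams (2006); the squared-exponential kernel and all hyperparameter values are illustrative assumptions, not taken from the slides:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sq_exp_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (SE) kernel matrix between row-vector inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_regression(X, y, X_star, noise_var=0.1):
    """Cholesky-based GP regression: predictive mean/variance and log evidence."""
    N = X.shape[0]
    K_y = sq_exp_kernel(X, X) + noise_var * np.eye(N)   # K_y = K + sigma_y^2 I
    L = cholesky(K_y, lower=True)                       # K_y = L L^T
    alpha = solve_triangular(                           # alpha = K_y^{-1} y,
        L.T, solve_triangular(L, y, lower=True))        # via two triangular solves
    k_star = sq_exp_kernel(X, X_star)                   # N x M cross-covariances
    mean = k_star.T @ alpha                             # k_*^T K_y^{-1} y
    v = solve_triangular(L, k_star, lower=True)         # v = L \ k_*
    var = sq_exp_kernel(X_star, X_star) - v.T @ v       # k_** - v^T v
    log_evidence = (-0.5 * y @ alpha                    # -1/2 y^T K_y^{-1} y
                    - np.sum(np.log(np.diag(L)))        # -1/2 log|K_y|
                    - 0.5 * N * np.log(2 * np.pi))
    return mean, var, log_evidence

# Toy usage on synthetic data
X = np.linspace(-3, 3, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
mean, var, lml = gp_regression(X, y, np.array([[0.5]]))
```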

  5. Cholesky Decomposition • The Cholesky decomposition (CD) – The CD of a symmetric, positive-definite matrix A decomposes A into a product of a lower-triangular matrix L and its transpose: $A = LL^\top$ • Solving a linear system using the CD – To solve $A\mathbf{x} = \mathbf{b}$, we have two steps: solve $L\mathbf{z} = \mathbf{b}$ by forward substitution, then $L^\top\mathbf{x} = \mathbf{z}$ by back substitution • Computing the determinant of a matrix: $\log|A| = 2\sum_i \log L_{ii}$
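
For reference, SciPy wraps these two triangular solves; a small sketch (the 3x3 matrix is a made-up positive-definite example):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 2.0, 0.6],   # symmetric, positive definite (example)
              [2.0, 5.0, 1.0],
              [0.6, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])

c, low = cho_factor(A, lower=True)        # A = L L^T (L in the lower triangle of c)
x = cho_solve((c, low), b)                # forward then back substitution
logdet = 2 * np.sum(np.log(np.diag(c)))   # log|A| = 2 * sum_i log L_ii

assert np.allclose(A @ x, b)
assert np.isclose(logdet, np.linalg.slogdet(A)[1])
```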

  6. Gaussian Process Classification • The main difficulty is that the Gaussian prior is not conjugate to the Bernoulli/multinoulli likelihood, so several approximations are available – Gaussian (Laplace) approximation – Expectation propagation (Kuss and Rasmussen 2005; Nickisch and Rasmussen 2008) – Variational inference (Girolami and Rogers 2006; Opper and Archambeau 2009) – MCMC (Neal 1997; Christensen et al. 2006)

  7. Gaussian Process Classification • Binary classification: place a GP prior on a latent function $f$ and pass it through a sigmoid – Logistic regression: $p(y_i \mid \mathbf{x}_i) = \mathrm{sigm}(y_i f(\mathbf{x}_i))$ – Probit regression: $p(y_i \mid \mathbf{x}_i) = \Phi(y_i f(\mathbf{x}_i))$, with $y_i \in \{-1, +1\}$ – $f$: given a GP prior, as in GP regression

  8. Gaussian Process Classification • Define the log of the unnormalized posterior
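
The formula itself did not survive extraction; with a GP prior $p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{0}, K)$, the standard form is:

```latex
\ell(\mathbf{f}) = \log p(\mathbf{y} \mid \mathbf{f}) + \log p(\mathbf{f} \mid X)
  = \log p(\mathbf{y} \mid \mathbf{f})
    - \tfrac{1}{2}\,\mathbf{f}^\top K^{-1}\mathbf{f}
    - \tfrac{1}{2}\log|K| - \tfrac{N}{2}\log 2\pi
```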

  9. Gaussian Process Classification • Formulas for the gradient and Hessian of the log posterior $\ell(\mathbf{f})$:
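
Assuming the slide showed the usual gradient and Hessian (Rasmussen and Williams 2006, ch. 3):

```latex
\nabla\ell = \nabla\log p(\mathbf{y} \mid \mathbf{f}) - K^{-1}\mathbf{f}, \qquad
\nabla\nabla\ell = -W - K^{-1}, \quad
W \triangleq -\nabla\nabla\log p(\mathbf{y} \mid \mathbf{f}) \ \text{(diagonal)}
```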

  10. Gaussian Process Classification • Use IRLS to find the MAP estimate • At convergence, the Gaussian approximation of the posterior:
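
The Newton/IRLS step and the resulting Gaussian approximation take the standard form:

```latex
\mathbf{f}^{\text{new}}
  = (K^{-1} + W)^{-1}\left(W\mathbf{f} + \nabla\log p(\mathbf{y} \mid \mathbf{f})\right),
\qquad
p(\mathbf{f} \mid X, \mathbf{y})
  \approx \mathcal{N}\!\left(\hat{\mathbf{f}},\, (K^{-1} + \hat{W})^{-1}\right)
```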

  11. Gaussian Process Classification • Computing the posterior predictive • The predictive mean:
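
The predictive mean of the latent function, using $\hat{\mathbf{f}} = K\,\nabla\log p(\mathbf{y} \mid \hat{\mathbf{f}})$ at the mode:

```latex
\mathbb{E}[f_* \mid \mathbf{x}_*, X, \mathbf{y}]
  = \mathbf{k}_*^\top K^{-1}\hat{\mathbf{f}}
  = \mathbf{k}_*^\top \nabla\log p(\mathbf{y} \mid \hat{\mathbf{f}})
```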

  12. Gaussian Process Classification • The predictive variance: – Use the law of total variance https://www.macroeconomics.tu-berlin.de/fileadmin/fg124/financial_crises/exercise/Variances.pdf
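
Applying the law of total variance, $\mathrm{var}[f_*] = \mathbb{E}[\mathrm{var}[f_* \mid \mathbf{f}]] + \mathrm{var}[\mathbb{E}[f_* \mid \mathbf{f}]]$, with $\mathrm{cov}[\mathbf{f}] \approx (K^{-1} + W)^{-1}$ under the Gaussian approximation:

```latex
\mathrm{var}[f_* \mid \mathbf{x}_*, X, \mathbf{y}]
  = k_{**} - \mathbf{k}_*^\top K^{-1}\mathbf{k}_*
    + \mathbf{k}_*^\top K^{-1} (K^{-1} + W)^{-1} K^{-1} \mathbf{k}_*
```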

  13. Gaussian Process Classification • The predictive variance: simplify using the matrix inversion lemma
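
Collapsing the two variance terms with the matrix inversion lemma gives the compact form:

```latex
\mathrm{var}[f_* \mid \mathbf{x}_*, X, \mathbf{y}]
  = k_{**} - \mathbf{k}_*^\top \left(K + W^{-1}\right)^{-1}\mathbf{k}_*
```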

  14. Matrix inversion lemma • Consider a general partitioned matrix $M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}$, where we assume $E$ and $H$ are invertible
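
In this notation the lemma (Woodbury identity) reads:

```latex
\left(E - F H^{-1} G\right)^{-1}
  = E^{-1} + E^{-1} F \left(H - G E^{-1} F\right)^{-1} G E^{-1}
```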

  15. Gaussian Process Classification • Convert to a predictive distribution for binary responses • This can be approximated using – Monte Carlo approximation – Probit approximation – …
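
A small sketch of two of these approximations to $\pi_* = \int \mathrm{sigm}(f_*)\,\mathcal{N}(f_* \mid \mu, \sigma^2)\,df_*$; the input values $\mu$ and $\sigma^2$ are made up for illustration:

```python
import numpy as np

def predict_mc(mu, sigma2, n_samples=100_000, seed=0):
    """Monte Carlo: average the sigmoid over samples of the latent f_*."""
    rng = np.random.default_rng(seed)
    f = rng.normal(mu, np.sqrt(sigma2), size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-f)))

def predict_probit_approx(mu, sigma2):
    """Probit approximation: sigm(z) ~ Phi(z * sqrt(pi/8)) makes the
    Gaussian integral analytic, giving sigm(kappa * mu)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * sigma2 / 8.0)
    return 1.0 / (1.0 + np.exp(-kappa * mu))

mu, sigma2 = 0.8, 2.0   # hypothetical predictive mean/variance of f_*
print(predict_mc(mu, sigma2), predict_probit_approx(mu, sigma2))
```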

  16. Gaussian Process Classification • Marginal likelihood – Used to optimize the kernel parameters – Applying the Laplace approximation, we have the expression below • Computing the derivatives – Now, since $\hat{\mathbf{f}}$ and $W$, as well as $K$, depend on the kernel parameters $\boldsymbol{\theta}$, the gradient has both explicit and implicit terms • More complex than in the regression case
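
The Laplace-approximated log marginal likelihood has the standard form (Rasmussen and Williams 2006, eq. 3.32):

```latex
\log p(\mathbf{y} \mid X, \boldsymbol{\theta})
  \approx \log p(\mathbf{y} \mid \hat{\mathbf{f}})
    - \tfrac{1}{2}\,\hat{\mathbf{f}}^\top K^{-1}\hat{\mathbf{f}}
    - \tfrac{1}{2}\log\left|I_N + W^{1/2} K W^{1/2}\right|
```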

  17. Gaussian Process Classification • Laplace approximation to the marginal likelihood.

  18. Gaussian Process Classification • Numerically stable computation – To avoid inverting K or W, introduce the matrix $B = I_N + W^{1/2} K W^{1/2}$ • $B$ has eigenvalues bounded below by 1, so it can be safely inverted – Applying the matrix inversion lemma: $(K^{-1} + W)^{-1} = K - K W^{1/2} B^{-1} W^{1/2} K$ – The IRLS update then takes the form sketched below:
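
A minimal NumPy sketch of this stable IRLS/Newton iteration, following Algorithm 3.1 of Rasmussen and Williams (2006) for the logistic likelihood with labels $y_i \in \{-1,+1\}$; the function and variable names are mine:

```python
import numpy as np
from scipy.linalg import cholesky, cho_solve

def laplace_mode(K, y, max_iter=50, tol=1e-9):
    """Mode f_hat of the GP-classification posterior (logistic likelihood,
    y in {-1,+1}) via the numerically stable IRLS update; also returns the
    Laplace approximation to the log marginal likelihood."""
    N = len(y)
    f = np.zeros(N)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-f))              # sigm(f)
        grad = (y + 1) / 2.0 - pi                  # grad of log p(y|f)
        W = pi * (1.0 - pi)                        # -Hessian of log p(y|f), diagonal
        sW = np.sqrt(W)
        B = np.eye(N) + sW[:, None] * K * sW[None, :]   # B = I + W^1/2 K W^1/2
        L = cholesky(B, lower=True)
        b = W * f + grad
        a = b - sW * cho_solve((L, True), sW * (K @ b)) # matrix inversion lemma
        f_new = K @ a                              # IRLS update: f = K a
        if np.max(np.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    log_lik = -np.sum(np.log1p(np.exp(-y * f)))    # log p(y | f_hat)
    log_q = -0.5 * (a @ f) + log_lik - np.sum(np.log(np.diag(L)))
    return f, a, log_q
```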

  19. Gaussian Process Classification • Numerically stable computation – At convergence, we have $\hat{\mathbf{f}} = K\mathbf{a}$ with $\mathbf{a} = K^{-1}\hat{\mathbf{f}} = \nabla\log p(\mathbf{y} \mid \hat{\mathbf{f}})$ – The log marginal likelihood is given below, where we exploited $|B| = |I_N + W^{1/2} K W^{1/2}| = |K|\,|K^{-1} + W|$
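
In the notation of the sketch above, with $B = LL^\top$:

```latex
\log q(\mathbf{y} \mid X, \boldsymbol{\theta})
  = -\tfrac{1}{2}\,\mathbf{a}^\top \hat{\mathbf{f}}
    + \log p(\mathbf{y} \mid \hat{\mathbf{f}})
    - \sum_i \log L_{ii}
```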

  20. Gaussian Process Classification • Numerically stable computation – Compute the predictive distribution – Here, at the mode, $\hat{\mathbf{f}} = K\,\nabla\log p(\mathbf{y} \mid \hat{\mathbf{f}})$ – Thus, the predictive mean: $\bar{f}_* = \mathbf{k}_*^\top \nabla\log p(\mathbf{y} \mid \hat{\mathbf{f}})$ – Also, we use $(K + W^{-1})^{-1} = W^{1/2} B^{-1} W^{1/2}$ – Thus, the predictive variance: $\mathrm{var}[f_*] = k_{**} - \mathbf{v}^\top\mathbf{v}$, with $\mathbf{v} = L \backslash (W^{1/2}\mathbf{k}_*)$
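
Continuing the sketch, the corresponding stable prediction step (in the spirit of Algorithm 3.2 of Rasmussen and Williams 2006), reusing `laplace_mode` from above; `K_star` holds rows of test-train covariances and `k_starstar` the test-point prior variances:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def laplace_predict(K, y, K_star, k_starstar):
    """Predictive mean and variance of the latent f_* under the Laplace approx."""
    f_hat, _, _ = laplace_mode(K, y)
    pi = 1.0 / (1.0 + np.exp(-f_hat))
    grad = (y + 1) / 2.0 - pi                     # grad log p(y|f) at the mode
    W = pi * (1.0 - pi)
    sW = np.sqrt(W)
    B = np.eye(len(y)) + sW[:, None] * K * sW[None, :]
    L = cholesky(B, lower=True)
    mean = K_star @ grad                          # f*_bar = k_*^T grad log p(y|f_hat)
    v = solve_triangular(L, sW[:, None] * K_star.T, lower=True)  # v = L \ W^1/2 k_*
    var = k_starstar - np.sum(v * v, axis=0)      # k_** - v^T v
    return mean, var
```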

  21. Gaussian Process Classification

  22. Gaussian Process Classification • The posterior predictive probability of the red-circle class, generated by a GP classifier with an SE kernel. The thick black line is the decision boundary obtained by thresholding at a probability of 0.5 – Manually set parameters, short length scale

  23. Gaussian Process Classification • Learned parameters, long length scale

  24. Gaussian Process Classification: Multi-class classification – Again, we will use a Gaussian approximation to the posterior • 1) Use IRLS to compute the mode • 2) Apply the Gaussian approximation at the mode

  25. Gaussian Process Classification: Multi-class classification – The unnormalized log posterior (below) – $\mathbf{y}$: a dummy (one-of-C) encoding of the $y_i$'s, with the same layout as $\mathbf{f}$ – $K$: a block-diagonal matrix containing the per-class kernel matrices $K_c$
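
Assuming the softmax likelihood, the standard form (Rasmussen and Williams 2006, eq. 3.35) is:

```latex
\ell(\mathbf{f}) = -\tfrac{1}{2}\,\mathbf{f}^\top K^{-1}\mathbf{f}
  + \mathbf{y}^\top \mathbf{f}
  - \sum_{i=1}^{N} \log\!\left(\sum_{c=1}^{C} \exp f_i^c\right) + \text{const}
```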

  26. Gaussian Process Classification: Multi-class classification – Use IRLS to compute the mode

  27. Gaussian Process Classification: Multi-class classification • The posterior predictive:
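
The equation was lost in extraction; under the Laplace approximation, the latent predictive mean for class $c$ is commonly written (Rasmussen and Williams 2006, eq. 3.39) as:

```latex
\mathbb{E}\!\left[f_*^c \mid \mathbf{x}_*, X, \mathbf{y}\right]
  = \mathbf{k}_c(\mathbf{x}_*)^\top \left(\mathbf{y}^c - \hat{\boldsymbol{\pi}}^c\right)
```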

  28. Gaussian Process Classification: Multi-class classification – The covariance of the latent response • Compute the posterior predictive for the visible response:
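
The visible-response predictive is obtained by pushing the latent Gaussian through the softmax, which has no closed form and is typically approximated by Monte Carlo:

```latex
\bar{\boldsymbol{\pi}}_* = \int \mathrm{softmax}(\mathbf{f}_*) \,
  \mathcal{N}\!\left(\mathbf{f}_* \mid \bar{\mathbf{f}}_*, \Sigma_*\right) d\mathbf{f}_*
  \approx \frac{1}{S}\sum_{s=1}^{S} \mathrm{softmax}\!\left(\mathbf{f}_*^{(s)}\right)
```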

  29. Gaussian Process Classification: Multi-class classification • Computing the marginal likelihood – similar to the binary case
