
Modern Gaussian Processes: Scalable Inference and Novel Applications (Part II-b): Approximate Inference



  1. Modern Gaussian Processes: Scalable Inference and Novel Applications (Part II-b): Approximate Inference
Edwin V. Bonilla and Maurizio Filippone
CSIRO's Data61, Sydney, Australia and EURECOM, Sophia Antipolis, France
July 14th, 2019

  2. Challenges in Bayesian Reasoning with Gaussian Process Priors
Running example: a $20 million geothermal well, with geological surveys and explorations as data.
• $p(\mathbf{f})$: prior over geology and rock properties
• $p(\mathbf{y} \mid \mathbf{f})$: observation model's likelihood
• $p(\mathbf{f} \mid \mathbf{y})$: posterior geological model
$$p(\mathbf{f} \mid \mathbf{y}, \boldsymbol{\theta}) = \frac{p(\mathbf{f} \mid \boldsymbol{\theta})\, p(\mathbf{y} \mid \mathbf{f})}{\underbrace{\int p(\mathbf{f} \mid \boldsymbol{\theta})\, p(\mathbf{y} \mid \mathbf{f})\, \mathrm{d}\mathbf{f}}_{\text{hard bit}}}$$
Challenges:
◮ Non-linear likelihood models
◮ Large datasets
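For a Gaussian likelihood the "hard bit" integral has a closed form, but for the non-linear likelihoods above it does not. A minimal NumPy sketch of the brute-force alternative, a naive Monte Carlo estimate of $p(\mathbf{y}) = \mathbb{E}_{p(\mathbf{f} \mid \boldsymbol{\theta})}[p(\mathbf{y} \mid \mathbf{f})]$ obtained by sampling $\mathbf{f}$ from the GP prior; the toy 1-D inputs and Bernoulli likelihood are assumptions for illustration, not the geothermal model on the slide.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0, jitter=1e-6):
    # Squared-exponential covariance matrix over inputs X (N x 1).
    d2 = (X - X.T) ** 2
    K = variance * np.exp(-0.5 * d2 / lengthscale ** 2)
    return K + jitter * np.eye(len(X))

rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 20)[:, None]          # toy inputs (assumption)
y = (X[:, 0] > 0).astype(float)              # toy binary labels (assumption)

K = rbf_kernel(X)
L = np.linalg.cholesky(K)

# Naive Monte Carlo estimate of p(y) = E_{p(f|theta)}[p(y|f)]:
# draw prior samples f ~ N(0, K) and average the (Bernoulli) likelihood.
S = 5000
F = L @ rng.standard_normal((len(X), S))     # prior samples, one per column
probs = 1.0 / (1.0 + np.exp(-F))             # sigmoid link
lik = np.prod(np.where(y[:, None] == 1, probs, 1 - probs), axis=0)
print("naive MC estimate of p(y):", lik.mean())
# The estimate is unbiased but its variance grows quickly with N,
# which is one reason this integral is the 'hard bit'.
```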

  3. Automated Probabilistic Reasoning
• Approximate inference: deterministic (e.g. variational inference, VI) and stochastic (e.g. MCMC) approaches
[Diagram: inference methods placed along axes of computational efficiency and automation]
Goal: build generic yet practical inference tools for practitioners and researchers
• Other dimensions:
◮ Accuracy
◮ Convergence

  4. Outline
1 Latent Gaussian Process Models (LGPMs)
2 Variational Inference
3 Scalability through Inducing Variables and Stochastic Variational Inference (SVI)

  5. Latent Gaussian Process Models (LGPMs)

  6. Latent Gaussian Process Models (LGPMs)
Supervised learning with data $\mathcal{D} = \{\mathbf{x}_n, \mathbf{y}_n\}_{n=1}^{N}$
• Factorised GP priors over $Q$ latent functions:
$$f_j(\mathbf{x}) \sim \mathcal{GP}\!\left(0, \kappa_j(\mathbf{x}, \mathbf{x}'; \boldsymbol{\theta})\right), \qquad p(\mathbf{F} \mid \mathbf{X}, \boldsymbol{\theta}) = \prod_{j=1}^{Q} \mathcal{N}(\mathbf{F}_{\cdot j}; \mathbf{0}, \mathbf{K}_j)$$
• Factorised likelihood over observations:
$$p(\mathbf{Y} \mid \mathbf{X}, \mathbf{F}, \boldsymbol{\phi}) = \prod_{n=1}^{N} p(\mathbf{Y}_{n\cdot} \mid \mathbf{F}_{n\cdot}, \boldsymbol{\phi})$$
What can we model within this framework?
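A minimal NumPy sketch of the factorised prior above: $Q$ independent latent functions, each drawn from a zero-mean GP with its own RBF kernel, evaluated at the training inputs. The kernel choice and hyperparameters are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance=1.0, jitter=1e-6):
    # kappa_j(x, x'; theta): squared-exponential covariance over inputs X (N x D).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2) + jitter * np.eye(len(X))

def sample_latent_functions(X, lengthscales, rng):
    # Draw one sample F (N x Q) from p(F | X, theta) = prod_j N(F_.j; 0, K_j).
    N, Q = len(X), len(lengthscales)
    F = np.zeros((N, Q))
    for j, ell in enumerate(lengthscales):
        Kj = rbf_kernel(X, ell)
        Lj = np.linalg.cholesky(Kj)
        F[:, j] = Lj @ rng.standard_normal(N)
    return F

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 2))        # N=50 inputs in D=2 (assumption)
F = sample_latent_functions(X, lengthscales=[0.5, 1.0, 2.0], rng=rng)
print(F.shape)                               # (50, 3): one column per latent function
```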

  7. Examples of LGPMs (1)
• Multi-output regression
• Multi-class classification
◮ $P = Q$ classes
◮ softmax likelihood
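For the multi-class case the $P = Q$ latent functions at an input are coupled only through the softmax likelihood. A small sketch of that likelihood term for one data point, written for clarity rather than efficiency; the latent values would come from the factorised GP prior of the previous slide.

```python
import numpy as np

def softmax_log_likelihood(f_n, y_n):
    # log p(y_n = c | F_n.) = f_nc - log sum_j exp(f_nj), computed stably.
    f_n = np.asarray(f_n, dtype=float)
    log_norm = np.max(f_n) + np.log(np.exp(f_n - np.max(f_n)).sum())
    return f_n[y_n] - log_norm

# Example: Q = P = 3 latent values at one input, true class 2.
print(softmax_log_likelihood([0.3, -1.2, 2.0], y_n=2))
```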

  8. Examples of LGPMs (2)
• Inversion problems

  9. Examples of LGPMs (3)
• Log Gaussian Cox processes (LGCPs)
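The slide only names LGCPs, so the following is a hedged reminder of the usual construction rather than the tutorial's own formulation: the intensity of an inhomogeneous Poisson process is modelled as $\lambda(\mathbf{x}) = \exp(f(\mathbf{x}))$ with $f$ drawn from a GP, and the domain is discretised into bins with a Poisson count likelihood per bin. The binning and toy numbers below are assumptions.

```python
import numpy as np
from scipy.special import gammaln

def lgcp_log_likelihood(f, counts, bin_width):
    # log p(y | f) for a binned LGCP: y_b ~ Poisson(exp(f_b) * bin_width).
    rate = np.exp(f) * bin_width
    return np.sum(counts * np.log(rate) - rate - gammaln(counts + 1))

# Toy example: 10 bins of width 0.5, latent log-intensity f, observed counts.
f = np.array([0.1, 0.3, 0.8, 1.2, 1.0, 0.5, 0.0, -0.4, -0.2, 0.1])
counts = np.array([1, 1, 2, 3, 2, 1, 1, 0, 1, 1])
print(lgcp_log_likelihood(f, counts, bin_width=0.5))
```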

  10. Inference in LGPMs
We only require access to 'black-box' likelihoods. How can we carry out inference in these general models?

  11. Variational Inference

  12. Variational Inference (VI): Optimise Rather than Integrate
Recall our posterior estimation problem:
$$\underbrace{p(\mathbf{F} \mid \mathbf{Y})}_{\text{posterior}} = \frac{1}{\underbrace{p(\mathbf{Y})}_{\text{marginal likelihood}}}\; \underbrace{p(\mathbf{Y} \mid \mathbf{F})}_{\text{conditional likelihood}}\; \underbrace{p(\mathbf{F})}_{\text{prior}}$$
• Estimating $p(\mathbf{Y}) = \int p(\mathbf{F})\, p(\mathbf{Y} \mid \mathbf{F})\, \mathrm{d}\mathbf{F}$ is hard
• Instead, find an approximation $q(\mathbf{F} \mid \boldsymbol{\lambda}) \approx p(\mathbf{F} \mid \mathbf{Y})$ by minimising
$$\mathrm{KL}\!\left[q(\mathbf{F} \mid \boldsymbol{\lambda}) \,\|\, p(\mathbf{F} \mid \mathbf{Y})\right] \stackrel{\text{def}}{=} \mathbb{E}_{q(\mathbf{F} \mid \boldsymbol{\lambda})} \log \frac{q(\mathbf{F} \mid \boldsymbol{\lambda})}{p(\mathbf{F} \mid \mathbf{Y})}$$
Properties: $\mathrm{KL}[q \,\|\, p] \ge 0$, and $\mathrm{KL}[q \,\|\, p] = 0$ iff $q = p$.
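A minimal sketch of the quantity being minimised: a Monte Carlo estimate of $\mathrm{KL}[q \,\|\, p]$ from samples of $q$, checked against the closed form for two univariate Gaussians. The Gaussians are stand-ins; in the GP setting $p(\mathbf{F} \mid \mathbf{Y})$ cannot be evaluated pointwise, which is why the next slides work with the ELBO instead.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# q = N(0.5, 0.8^2), p = N(0, 1): stand-in densities for illustration.
mq, sq = 0.5, 0.8
mp, sp = 0.0, 1.0

# Monte Carlo: KL[q || p] = E_q[log q(f) - log p(f)].
f = rng.normal(mq, sq, size=100_000)
kl_mc = np.mean(norm.logpdf(f, mq, sq) - norm.logpdf(f, mp, sp))

# Closed form for two univariate Gaussians, for comparison.
kl_exact = np.log(sp / sq) + (sq**2 + (mq - mp)**2) / (2 * sp**2) - 0.5
print(kl_mc, kl_exact)   # the two values should agree closely
```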

  13. Decomposition of the Marginal Likelihood
$$\log p(\mathbf{Y}) = \mathrm{KL}\!\left[q(\mathbf{F} \mid \boldsymbol{\lambda}) \,\|\, p(\mathbf{F} \mid \mathbf{Y})\right] + \mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$$
[Fig: $\log p(\mathbf{Y})$ split into $\mathrm{KL}[q \,\|\, p]$ and $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$; reproduced from Bishop (2006)]
• $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$ is a lower bound on the log marginal likelihood
• The optimum is achieved when $q = p$
• Maximising $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$ is equivalent to minimising $\mathrm{KL}[q(\mathbf{F} \mid \boldsymbol{\lambda}) \,\|\, p(\mathbf{F} \mid \mathbf{Y})]$
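The decomposition follows in a few lines from Bayes' rule; a sketch of the standard derivation in the notation of the slides:

```latex
\begin{align*}
\log p(\mathbf{Y})
 &= \mathbb{E}_{q(\mathbf{F} \mid \boldsymbol{\lambda})}\!\left[\log p(\mathbf{Y})\right] \\
 &= \mathbb{E}_{q}\!\left[\log \frac{p(\mathbf{F}, \mathbf{Y})}{p(\mathbf{F} \mid \mathbf{Y})}\right] \\
 &= \mathbb{E}_{q}\!\left[\log \frac{p(\mathbf{F}, \mathbf{Y})}{q(\mathbf{F} \mid \boldsymbol{\lambda})}\right]
  + \mathbb{E}_{q}\!\left[\log \frac{q(\mathbf{F} \mid \boldsymbol{\lambda})}{p(\mathbf{F} \mid \mathbf{Y})}\right] \\
 &= \underbrace{\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})}_{\text{lower bound}}
  + \underbrace{\mathrm{KL}\!\left[q(\mathbf{F} \mid \boldsymbol{\lambda}) \,\|\, p(\mathbf{F} \mid \mathbf{Y})\right]}_{\ge\, 0}.
\end{align*}
```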

  14. Variational Inference Strategy
• The evidence lower bound $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$ can be written as:
$$\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda}) \stackrel{\text{def}}{=} \underbrace{\mathbb{E}_{q(\mathbf{F} \mid \boldsymbol{\lambda})} \log p(\mathbf{Y} \mid \mathbf{F})}_{\text{expected log likelihood (ELL)}} - \underbrace{\mathrm{KL}\!\left[q(\mathbf{F} \mid \boldsymbol{\lambda}) \,\|\, p(\mathbf{F})\right]}_{\text{KL(approx. posterior } \| \text{ prior)}}$$
• ELL is a model-fit term and KL is a penalty term
• What family of distributions?
◮ As flexible as possible
◮ Tractability is the main constraint
◮ No risk of over-fitting
[Fig from Bishop (2006)]
We want to maximise $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda})$ wrt the variational parameters $\boldsymbol{\lambda}$
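A minimal NumPy sketch of this objective for a single latent function with a full Gaussian variational posterior $q(\mathbf{f}) = \mathcal{N}(\mathbf{m}, \mathbf{S})$ and GP prior $\mathcal{N}(\mathbf{0}, \mathbf{K})$: the KL term between two Gaussians has a closed form, and the ELL is estimated by Monte Carlo using only likelihood evaluations. The Bernoulli likelihood and toy data are assumptions for illustration, not the tutorial's own code.

```python
import numpy as np

def gauss_kl(m, L_S, K):
    # KL[ N(m, S) || N(0, K) ] with S = L_S L_S^T, both N x N.
    N = len(m)
    L_K = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L_K, m)
    A = np.linalg.solve(L_K, L_S)                      # L_K^{-1} L_S
    logdet_K = 2 * np.sum(np.log(np.diag(L_K)))
    logdet_S = 2 * np.sum(np.log(np.diag(L_S)))
    return 0.5 * (np.sum(A**2) + alpha @ alpha - N + logdet_K - logdet_S)

def elbo(m, L_S, K, y, log_lik, n_samples=64, rng=None):
    # ELBO = E_q[log p(y | f)] - KL[q || p], with the ELL term by Monte Carlo.
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((n_samples, len(m)))
    f_samples = m + eps @ L_S.T                        # f ~ N(m, S)
    ell = np.mean([log_lik(f, y) for f in f_samples])
    return ell - gauss_kl(m, L_S, K)

# Toy problem: GP binary classification with a Bernoulli (sigmoid) likelihood.
def bernoulli_log_lik(f, y):
    return np.sum(y * f - np.log1p(np.exp(f)))         # sum_n log p(y_n | f_n)

rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 15)[:, None]
y = (X[:, 0] > 0).astype(float)
K = np.exp(-0.5 * (X - X.T)**2) + 1e-6 * np.eye(15)    # RBF prior covariance
m = np.zeros(15)
L_S = 0.1 * np.eye(15)                                 # q covariance S = L_S L_S^T
print("ELBO at initialisation:", elbo(m, L_S, K, y, bernoulli_log_lik, rng=rng))
```

In practice one would optimise m and L_S (and the kernel hyperparameters) by gradient ascent on this quantity.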

  15. Automated VI for LGPMs (Nguyen and Bonilla, NeurIPS, 2014)
Goal: approximate the intractable posterior $p(\mathbf{F} \mid \mathbf{Y})$ with the variational distribution
$$q(\mathbf{F} \mid \boldsymbol{\lambda}) = \sum_{k=1}^{K} \pi_k\, q_k(\mathbf{F} \mid \boldsymbol{\lambda}_k) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{Q} \mathcal{N}(\mathbf{F}_{\cdot j}; \mathbf{m}_{kj}, \mathbf{S}_{kj})$$
with variational parameters $\boldsymbol{\lambda} = \{\mathbf{m}_{kj}, \mathbf{S}_{kj}\}$.
Recall $\mathcal{L}_{\mathrm{elbo}}(\boldsymbol{\lambda}) = -\,\mathrm{KL} + \mathrm{ELL}$:
• The KL term can be bounded using Jensen's inequality
◮ Exact gradients with respect to the parameters
• The ELL and its gradients can be estimated efficiently
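The slide states that the KL term can be bounded with Jensen's inequality without showing the bound. A hedged sketch of the standard argument, written for a mixture of Gaussians over a single latent function to keep indices light (the intractable piece is the entropy of the mixture; the cross term against the Gaussian prior is exact):

```latex
% q(f) = sum_k pi_k N(f; m_k, S_k), prior p(f) = N(f; 0, K).
\begin{align*}
-\mathrm{KL}[q \,\|\, p] &= \underbrace{-\mathbb{E}_{q}[\log q(\mathbf{f})]}_{\text{entropy, intractable}}
  \;+\; \underbrace{\mathbb{E}_{q}[\log p(\mathbf{f})]}_{\text{closed form}} \\[2pt]
-\mathbb{E}_{q}[\log q(\mathbf{f})]
  &= -\sum_k \pi_k\, \mathbb{E}_{q_k}[\log q(\mathbf{f})]
   \;\ge\; -\sum_k \pi_k \log \mathbb{E}_{q_k}[q(\mathbf{f})]  && \text{(Jensen)} \\
  &= -\sum_k \pi_k \log \sum_l \pi_l\, \mathcal{N}(\mathbf{m}_k; \mathbf{m}_l, \mathbf{S}_k + \mathbf{S}_l), \\[2pt]
\mathbb{E}_{q}[\log p(\mathbf{f})]
  &= \sum_k \pi_k \left[ \log \mathcal{N}(\mathbf{m}_k; \mathbf{0}, \mathbf{K})
     - \tfrac{1}{2} \mathrm{tr}\!\left(\mathbf{K}^{-1} \mathbf{S}_k\right) \right].
\end{align*}
```

Both terms are closed-form functions of the variational parameters, which is what makes exact gradients of this part of the bound possible.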

  16. Expected Log Likelihood Term
Th. 1 (Efficient estimation): The ELL and its gradients can be estimated using expectations over univariate Gaussian distributions.
With $q_{k(n)} \stackrel{\text{def}}{=} q_{k(n)}(\mathbf{F}_{n\cdot} \mid \boldsymbol{\lambda}_{k(n)})$ denoting the marginal of component $k$ over the latent values at $\mathbf{x}_n$:
$$\mathbb{E}_{q_k} \log p(\mathbf{Y} \mid \mathbf{F}) = \sum_{n=1}^{N} \mathbb{E}_{q_{k(n)}} \log p(\mathbf{Y}_{n\cdot} \mid \mathbf{F}_{n\cdot})$$
$$\nabla_{\boldsymbol{\lambda}_{k(n)}} \mathbb{E}_{q_{k(n)}} \log p(\mathbf{Y}_{n\cdot} \mid \mathbf{F}_{n\cdot}) = \mathbb{E}_{q_{k(n)}}\!\left[ \nabla_{\boldsymbol{\lambda}_{k(n)}} \log q_{k(n)}(\mathbf{F}_{n\cdot} \mid \boldsymbol{\lambda}_{k(n)}) \, \log p(\mathbf{Y}_{n\cdot} \mid \mathbf{F}_{n\cdot}) \right]$$
Practical consequences:
• Can use unbiased Monte Carlo estimates
• Gradients of the likelihood are not required (only likelihood evaluations)
• Holds for all $Q \ge 1$
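A minimal sketch of the estimator in Th. 1 for one data point and a single diagonal Gaussian component: the ELL and its gradient with respect to the variational mean are estimated by Monte Carlo via the log-derivative (score-function) identity stated above, so only evaluations of $\log p(\mathbf{Y}_{n\cdot} \mid \mathbf{F}_{n\cdot})$ are needed, never its gradients. The Bernoulli-style likelihood is an assumption for illustration.

```python
import numpy as np

def ell_and_grad_mc(m, s, log_lik_n, n_samples=10_000, rng=None):
    # Monte Carlo estimate of E_q[log p(y_n | f_n)] with q = N(m, diag(s^2)),
    # and of its gradient wrt m via the score-function identity
    #   grad_m E_q[log p] = E_q[ grad_m log q(f) * log p(y_n | f) ],
    # which only needs evaluations of the log-likelihood.
    rng = rng or np.random.default_rng(0)
    f = m + s * rng.standard_normal((n_samples, len(m)))    # f ~ q
    logp = np.array([log_lik_n(fi) for fi in f])             # black-box likelihood
    score_m = (f - m) / s**2                                  # grad_m log N(f; m, diag(s^2))
    ell = logp.mean()
    grad_m = (score_m * logp[:, None]).mean(axis=0)
    return ell, grad_m

# Example: one observation y_n = 1 with Q = 2 latent values and a toy
# Bernoulli likelihood that uses the sum of the latents (assumption).
def log_lik_n(f_n, y_n=1.0):
    eta = f_n.sum()
    return y_n * eta - np.log1p(np.exp(eta))

m = np.array([0.2, -0.1])
s = np.array([0.5, 0.5])
ell, grad_m = ell_and_grad_mc(m, s, log_lik_n)
print("ELL estimate:", ell)
print("grad wrt variational means:", grad_m)
```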

  17. Scalability through Inducing Variables and Stochastic Variational Inference (SVI)
