Modern Gaussian Processes: Scalable Inference and Novel Applications - PowerPoint PPT Presentation

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part II-b) Approximate Inference Edwin V. Bonilla and Maurizio Filippone CSIRO’s Data61, Sydney, Australia and EURECOM, Sophia Antipolis, France July 14 th , 2019 1

Challenges in Bayesian Reasoning with Gaussian Process Priors p ( f ) : prior over geology and rock properties p ( y | f ) : observation model’s likelihood $20 Million geothermal well Geol. surveys and explorations 2

Challenges in Bayesian Reasoning with Gaussian Process Priors p ( f ) : prior over geology and rock properties p ( y | f ) : observation model’s likelihood p ( f | y ) : posterior geological model: p ( f | θ ) p ( y | f ) $20 Million geothermal well p ( f | y , θ ) = � p ( f | θ ) p ( y | f ) d f � �� hard bit Challenges: ◮ Non-linear likelihood models ◮ Large datasets Geol. surveys and explorations 2

Automated Probabilistic Reasoning • Approximate inference Deterministic Stochastic Goal: Build generic yet practical VI inference tools for Computational E ffi ciency practitioners and researchers MCMC Automation 3

Automated Probabilistic Reasoning • Approximate inference Deterministic Stochastic Goal: Build generic yet practical VI inference tools for Computational E ffi ciency practitioners and researchers • Other dimensions: MCMC ◮ Accuracy ◮ Convergence Automation 3

Outline 1 Latent Gaussian Process Models (LGPMs) 2 Variational Inference 3 Scalability through Inducing Variables and Stochastic Variational Inference (SVI) 4

Latent Gaussian Process Models (LGPMs)

Latent Gaussian Process Models (LGPMs) Supervised learning D = { x n , y n } N n =1 • Factorised GP priors over Q latent functions: f j ( x ) ∼ GP (0 , κ j ( x , x ′ ; θ )) Q � p ( F | X , θ ) = N ( F · j ; 0 , K j ) j =1 5

Latent Gaussian Process Models (LGPMs) Supervised learning D = { x n , y n } N n =1 • Factorised GP priors over Q latent functions: f j ( x ) ∼ GP (0 , κ j ( x , x ′ ; θ )) Q � p ( F | X , θ ) = N ( F · j ; 0 , K j ) j =1 • Factorised likelihood over observations N � p ( Y | X , F , φ ) = p ( Y n · | F n · , φ ) n =1 5

Latent Gaussian Process Models (LGPMs) Supervised learning D = { x n , y n } N n =1 • Factorised GP priors over Q latent functions: f j ( x ) ∼ GP (0 , κ j ( x , x ′ ; θ )) Q � p ( F | X , θ ) = N ( F · j ; 0 , K j ) j =1 • Factorised likelihood over observations N � p ( Y | X , F , φ ) = p ( Y n · | F n · , φ ) n =1 What can we model within this framework? 5

Examples of LGPMs (1) • Multi-output regression • Multi-class classification ◮ P = Q classes ◮ softmax likelihood 6

Examples of LGPMs (2) • Inversion problems 7

Examples of LGPMs (3) • Log Gaussian Cox processes (LGCPs) 8

Inference in LGPMs We only require access to ‘black-box’ likelihoods. How can we carry out inference in these general models? 9

Variational Inference

Variational Inference (VI): Optimise Rather than Integrate Recall our posterior estimation problem: 1 p ( F | Y ) p ( Y | F ) = p ( F ) p ( Y ) � �� posterior �� prior conditional likelihood marginal likelihood 10

Variational Inference (VI): Optimise Rather than Integrate Recall our posterior estimation problem: 1 p ( F | Y ) p ( Y | F ) = p ( F ) p ( Y ) � �� posterior �� prior conditional likelihood marginal likelihood � • Estimating p ( Y ) = p ( F ) p ( Y | F ) d F is hard 10

Variational Inference (VI): Optimise Rather than Integrate Recall our posterior estimation problem: 1 p ( F | Y ) p ( Y | F ) = p ( F ) p ( Y ) � �� posterior �� prior conditional likelihood marginal likelihood � • Estimating p ( Y ) = p ( F ) p ( Y | F ) d F is hard • Instead, approximate q ( F | λ ) ≈ p ( F | Y ) to minimize: = E q ( F | λ ) log q ( F | λ ) def kl [ q ( F | λ ) � p ( F | Y )] p ( F | Y ) 10

Decomposition of the Marginal Likelihood log p ( Y ) = kl [ q ( F | λ ) � p ( F | Y )] + L elbo ( λ ) KL[q ∥ p] log p ( Y ) ℒ ELBO ( λ ) Fig reproduced from Bishop (2006) • L elbo ( λ ) is a lower bound on the log marginal likelihood • The optimum is achieved when q = p • Maximizing L elbo ( λ ) ≡ minimizing kl [ q ( F | λ ) � p ( F | Y )] 11

Variational Inference Strategy • The evidence lower bound L elbo ( λ ) can be written as: def E q ( F | λ ) log p ( Y | F ) L elbo ( λ ) − kl [ q ( F | λ ) � p ( F )] = � �� KL(approx. posterior � prior) expected log likelihood (ELL) • ELL is a model-fit term and KL is a penalty term 12

Variational Inference Strategy • The evidence lower bound L elbo ( λ ) can be written as: def E q ( F | λ ) log p ( Y | F ) L elbo ( λ ) = − kl [ q ( F | λ ) � p ( F )] � �� KL(approx. posterior � prior) expected log likelihood (ELL) • ELL is a model-fit term and KL is a penalty term 1 • What family of distributions? 0.8 ◮ As flexible as possible 0.6 ◮ Tractability is the main 0.4 constraint 0.2 ◮ No risk of over-fitting 0 −2 −1 0 1 2 3 4 Fig from Bishop (2006) 12

Variational Inference Strategy • The evidence lower bound L elbo ( λ ) can be written as: def E q ( F | λ ) log p ( Y | F ) L elbo ( λ ) = − kl [ q ( F | λ ) � p ( F )] � �� KL(approx. posterior � prior) expected log likelihood (ELL) • ELL is a model-fit term and KL is a penalty term 1 • What family of distributions? 0.8 ◮ As flexible as possible 0.6 ◮ Tractability is the main 0.4 constraint 0.2 ◮ No risk of over-fitting 0 −2 −1 0 1 2 3 4 Fig from Bishop (2006) We want to maximise L elbo ( λ ) wrt variational parameters λ 12

Automated VI for LGPMs (Nguyen and Bonilla, NeurIPS, 2014) Goal : Approximate intractable posterior p ( F | Y ) with variational distribution K K Q � � � q ( F | λ ) = π k q k ( F | λ ) = N ( F k ; m kj , S kj ) π k k =1 k =1 j =1 with variational parameters λ = { m kj , S kj } , 13

Automated VI for LGPMs (Nguyen and Bonilla, NeurIPS, 2014) Goal : Approximate intractable posterior p ( F | Y ) with variational distribution K K Q � � � q ( F | λ ) = π k q k ( F | λ ) = N ( F k ; m kj , S kj ) π k k =1 k =1 j =1 with variational parameters λ = { m kj , S kj } , Recall L elbo ( λ ) = - KL + ELL: • KL term can be bounded using Jensen’s inequality ◮ Exact gradients of parameters 13

Automated VI for LGPMs (Nguyen and Bonilla, NeurIPS, 2014) Goal : Approximate intractable posterior p ( F | Y ) with variational distribution K K Q � � � q ( F | λ ) = π k q k ( F | λ ) = N ( F k ; m kj , S kj ) π k k =1 k =1 j =1 with variational parameters λ = { m kj , S kj } , Recall L elbo ( λ ) = - KL + ELL: • KL term can be bounded using Jensen’s inequality ◮ Exact gradients of parameters ELL and its gradients can be estimated efficiently 13

Expected Log Likelihood Term Th.1: Efficient estimation The ELL and its gradients can be estimated using expectations over univariate Gaussian distributions. def = q k ( n ) ( F · n | λ k ( n ) ) q k ( n ) N � E q k log p ( Y | F ) = E q k ( n ) log p ( Y n · | F n · ) n =1 ∇ λ k ( n ) E q k ( n ) log p ( Y n · | F n · ) = E q k ( n ) ∇ λ k ( n ) log q k ( n ) ( F · n | λ k ( n ) ) log p ( Y n · | F 14

Expected Log Likelihood Term Th.1: Efficient estimation The ELL and its gradients can be estimated using expectations over univariate Gaussian distributions. def = q k ( n ) ( F · n | λ k ( n ) ) q k ( n ) N � E q k log p ( Y | F ) = E q k ( n ) log p ( Y n · | F n · ) n =1 ∇ λ k ( n ) E q k ( n ) log p ( Y n · | F n · ) = E q k ( n ) ∇ λ k ( n ) log q k ( n ) ( F · n | λ k ( n ) ) log p ( Y n · | F Practical consequences • Can use unbiased Monte Carlo estimates • Gradients of the likelihood are not required (only likelihood evaluations) • Holds ∀ Q ≥ 1 14

Scalability through Inducing Variables and Stochastic Variational Inference (SVI)

Modern Gaussian Processes: Scalable Inference and Novel Applications - PowerPoint PPT Presentation

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part II-b) Approximate Inference Edwin V. Bonilla and Maurizio Filippone CSIROs Data61, Sydney, Australia and EURECOM, Sophia Antipolis, France July 14 th , 2019 1

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part IV) Theory & Code

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part III) Applications,

MODERN 1 MODERN 2 MODERN 3 MODERN 4 MODERN A peep at some distant orb has power to raise

Scalable Gaussian Processes Zhenwen Dai Amazon September 4, 2018 @GPSS2018 Zhenwen Dai (Amazon)

Scalable Gaussian Processes Zhenwen Dai Amazon 9 September 2019 @GPSS 2019 Zhenwen Dai (Amazon)

Gaussian Processes Dan Cervone NYU CDS November 10, 2015 Dan Cervone (NYU CDS) Gaussian

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee

CMPUT 466 Introduction to Gaussian Processes Dan Lizotte The Plan Introduction to Gaussian

Scalable Gaussian processes with a twist of Probabilistic Numerics Kurt Cutajar EURECOM, Sophia

Non-Gaussian likelihoods for Gaussian Processes Alan Saul Outline Motivation Non-Gaussian

Gaussian Processes for Big Data James Hensman joint work with Nicol o Fusi, Neil D. Lawrence

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Novel Gaits for a Novel Novel Gaits for a Novel Crawling/Grasping Mechanism Crawling/Grasping

State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2

A Novel Framework For Scalable Video A Novel Framework For Scalable Video Streaming Over

Welcome to CE488 Environmental Geotechnics Lecture #7 Dept. of Civil Engineering IIT Bombay

Geretsried in Bavaria an example of a geothermal well drilling plan vs reality International

POWER PRODUCTION IN SOULTZ Daniel Fritsch EEIG Philippe Lutz EDF R&D 15 Septembre

Tectonics and geothermal Tectonics and geothermal exploration and production exploration and

Mute/unmute Start/stop video This workshop will be recorded Chat Enter your

Hydrates Bearing Sediments Stability Conditions Overlying Sediment Layer (Geothermal

economy Charlotte Banks Energy Research & Project Officer APSE Energy www.apse.org.uk The

Gamma- -Ray Burst observation with GLAST Ray Burst observation with GLAST Gamma F. Piron F.

Modern Gaussian Processes: Scalable Inference and Novel Applications - PowerPoint PPT Presentation

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part II-b) Approximate Inference Edwin V. Bonilla and Maurizio Filippone CSIROs Data61, Sydney, Australia and EURECOM, Sophia Antipolis, France July 14 th , 2019 1

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part IV) Theory &amp; Code

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part III) Applications,

MODERN 1 MODERN 2 MODERN 3 MODERN 4 MODERN A peep at some distant orb has power to raise

Scalable Gaussian Processes Zhenwen Dai Amazon September 4, 2018 @GPSS2018 Zhenwen Dai (Amazon)

Scalable Gaussian Processes Zhenwen Dai Amazon 9 September 2019 @GPSS 2019 Zhenwen Dai (Amazon)

Gaussian Processes Dan Cervone NYU CDS November 10, 2015 Dan Cervone (NYU CDS) Gaussian

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee

CMPUT 466 Introduction to Gaussian Processes Dan Lizotte The Plan Introduction to Gaussian

Scalable Gaussian processes with a twist of Probabilistic Numerics Kurt Cutajar EURECOM, Sophia

Non-Gaussian likelihoods for Gaussian Processes Alan Saul Outline Motivation Non-Gaussian

Gaussian Processes for Big Data James Hensman joint work with Nicol o Fusi, Neil D. Lawrence

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Novel Gaits for a Novel Novel Gaits for a Novel Crawling/Grasping Mechanism Crawling/Grasping

State Space Gaussian Processes with Non-Gaussian Likelihoods Hannes Nickisch 1 Arno Solin 2

A Novel Framework For Scalable Video A Novel Framework For Scalable Video Streaming Over

Welcome to CE488 Environmental Geotechnics Lecture #7 Dept. of Civil Engineering IIT Bombay

Geretsried in Bavaria an example of a geothermal well drilling plan vs reality International

POWER PRODUCTION IN SOULTZ Daniel Fritsch EEIG Philippe Lutz EDF R&amp;D 15 Septembre

Tectonics and geothermal Tectonics and geothermal exploration and production exploration and

Mute/unmute Start/stop video This workshop will be recorded Chat Enter your

Hydrates Bearing Sediments Stability Conditions Overlying Sediment Layer (Geothermal

economy Charlotte Banks Energy Research &amp; Project Officer APSE Energy www.apse.org.uk The

Gamma- -Ray Burst observation with GLAST Ray Burst observation with GLAST Gamma F. Piron F.

Modern Gaussian Processes: Scalable Inference and Novel Applications (Part IV) Theory & Code

POWER PRODUCTION IN SOULTZ Daniel Fritsch EEIG Philippe Lutz EDF R&D 15 Septembre

economy Charlotte Banks Energy Research & Project Officer APSE Energy www.apse.org.uk The