slide-1
SLIDE 1

Advances in Gaussian Processes

Tutorial at NIPS 2006 in Vancouver

Carl Edward Rasmussen

Max Planck Institute for Biological Cybernetics, Tübingen

December 4th, 2006

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 1 / 55

slide-2
SLIDE 2

The Prediction Problem

[Figure: atmospheric CO2 concentration (ppm) versus year, 1960–2020, with the future values to be predicted marked "?".]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 2 / 55

slide-3
SLIDE 3

The Prediction Problem

[Figure: atmospheric CO2 concentration (ppm) versus year, 1960–2020.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 3 / 55

slide-4
SLIDE 4

The Prediction Problem

[Figure: atmospheric CO2 concentration (ppm) versus year, 1960–2020.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 4 / 55

slide-5
SLIDE 5

The Prediction Problem

[Figure: atmospheric CO2 concentration (ppm) versus year, 1960–2020.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 5 / 55

slide-6
SLIDE 6

The Prediction Problem

Ubiquitous questions:

  • Model fitting
    • how do I fit the parameters?
    • what about overfitting?
  • Model Selection
    • how do I find out which model to use?
    • how sure can I be?
  • Interpretation
    • what is the accuracy of the predictions?
    • can I trust the predictions, even if
      • . . . I am not sure about the parameters?
      • . . . I am not sure of the model structure?

Gaussian processes solve some of the above, and provide a practical framework to address the remaining issues.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 6 / 55

slide-7
SLIDE 7

Outline

Part I: foundations

  • What is a Gaussian process?
    • from distribution to process
    • distribution over functions
    • the marginalization property
  • Inference
    • Bayesian inference
    • posterior over functions
    • predictive distribution
    • marginal likelihood
  • Occam’s Razor
    • automatic complexity penalty

Part II: advanced topics

  • Example
    • priors over functions
    • hierarchical priors using hyperparameters
    • learning the covariance function
  • Approximate methods for classification
  • Gaussian process latent variable models
  • Sparse methods

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 7 / 55

slide-8
SLIDE 8

The Gaussian Distribution

The Gaussian distribution is given by

p(x | µ, Σ) = N(µ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp( −½ (x − µ)⊤ Σ⁻¹ (x − µ) ),

where µ is the mean vector and Σ the covariance matrix.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 8 / 55
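The density above translates directly into code; the following is a small NumPy sketch (not part of the tutorial) evaluating the log of N(µ, Σ) via a Cholesky factor for numerical stability.

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    """Log density of N(mu, Sigma) at x: -D/2 log(2 pi) - 1/2 log|Sigma| - 1/2 (x-mu)' Sigma^{-1} (x-mu)."""
    D = len(mu)
    L = np.linalg.cholesky(Sigma)              # Sigma = L L'
    z = np.linalg.solve(L, x - mu)             # z' z = (x-mu)' Sigma^{-1} (x-mu)
    return -0.5 * z @ z - np.sum(np.log(np.diag(L))) - 0.5 * D * np.log(2 * np.pi)
```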

slide-9
SLIDE 9

Conditionals and Marginals of a Gaussian

[Figure: a two-dimensional joint Gaussian shown together with one of its conditionals (left) and one of its marginals (right).]

Both the conditionals and the marginals of a joint Gaussian are again Gaussian.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 9 / 55

slide-10
SLIDE 10

What is a Gaussian Process?

A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables. Informally: infinitely long vector ≃ function.

Definition: a Gaussian process is a collection of random variables, any finite number of which have (consistent) Gaussian distributions.

A Gaussian distribution is fully specified by a mean vector, µ, and covariance matrix Σ:

f = (f1, . . . , fn)⊤ ∼ N(µ, Σ),   indexes i = 1, . . . , n

A Gaussian process is fully specified by a mean function m(x) and covariance function k(x, x′):

f(x) ∼ GP( m(x), k(x, x′) ),   indexes: x

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 10 / 55

slide-11
SLIDE 11

The marginalization property

Thinking of a GP as a Gaussian distribution with an infinitely long mean vector and an infinite by infinite covariance matrix may seem impractical. . .

. . . luckily we are saved by the marginalization property:

Recall: p(x) = ∫ p(x, y) dy.

For Gaussians:

p(x, y) = N( [a; b], [A B; B⊤ C] )   ⇒   p(x) = N(a, A)

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 11 / 55

slide-12
SLIDE 12

Random functions from a Gaussian Process

Example one-dimensional Gaussian process:

p(f(x)) ∼ GP( m(x) = 0, k(x, x′) = exp(−½(x − x′)²) ).

To get an indication of what this distribution over functions looks like, focus on a finite subset of function values f = (f(x1), f(x2), . . . , f(xn))⊤, for which

f ∼ N(0, Σ),   where Σij = k(xi, xj).

Then plot the coordinates of f as a function of the corresponding x values.
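As an aside (not from the slides), this recipe is a few lines of NumPy; the grid, seed and number of samples below are arbitrary illustrative choices.

```python
import numpy as np

def k_se(a, b):
    """Squared exponential covariance with unit length scale: exp(-(x - x')^2 / 2)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x = np.linspace(-5, 5, 200)                   # finite subset of inputs
K = k_se(x, x) + 1e-10 * np.eye(len(x))       # Sigma_ij = k(x_i, x_j), plus jitter
L = np.linalg.cholesky(K)

rng = np.random.default_rng(0)
f = L @ rng.standard_normal((len(x), 3))      # three draws of f ~ N(0, Sigma)
# plotting each column of f against x gives random functions like those on the next slide
```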

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 12 / 55

slide-13
SLIDE 13

Some values of the random function

[Figure: random function values f(x) plotted against the inputs x, for x in (−5, 5) and f(x) roughly in (−1.5, 1.5).]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 13 / 55

slide-14
SLIDE 14

Sequential Generation

Factorize the joint distribution

p(f1, . . . , fn | x1, . . . , xn) = ∏_{i=1}^{n} p(fi | fi−1, . . . , f1, xi, . . . , x1),

and generate function values sequentially. What do the individual terms look like? For Gaussians:

p(x, y) = N( [a; b], [A B; B⊤ C] )   ⇒   p(x|y) = N( a + BC⁻¹(y − b), A − BC⁻¹B⊤ )

Do try this at home!
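Taking up that invitation, here is one way the home experiment might look in NumPy (an illustrative sketch; the input range and seed are arbitrary): each new function value is drawn from the Gaussian conditional above, given everything generated so far.

```python
import numpy as np

def k_se(a, b):
    """Squared exponential covariance with unit length scale."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(1)
xs = np.empty(0)                                  # inputs generated so far
fs = np.empty(0)                                  # corresponding function values

for x_new in rng.uniform(-6, 6, size=50):
    if xs.size == 0:
        mean, var = 0.0, 1.0                      # prior marginal N(0, k(x, x))
    else:
        B = k_se(np.array([x_new]), xs)           # cross-covariances, shape (1, n)
        C = k_se(xs, xs) + 1e-10 * np.eye(xs.size)
        mean = float(B @ np.linalg.solve(C, fs))           # a + B C^{-1}(y - b), with a = b = 0
        var = float(1.0 - B @ np.linalg.solve(C, B.T))     # A - B C^{-1} B'
    f_new = mean + np.sqrt(max(var, 0.0)) * rng.standard_normal()
    xs, fs = np.append(xs, x_new), np.append(fs, f_new)
# (xs, fs) now traces out one random function, generated one point at a time
```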

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 14 / 55

slide-15
SLIDE 15

[Figure: a function of two inputs drawn at random from a Gaussian process with Gaussian (squared exponential) covariance, plotted over (−6, 6) × (−6, 6).]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 15 / 55

slide-16
SLIDE 16

Maximum likelihood, parametric model

Supervised parametric learning:

  • data: x, y
  • model: y = fw(x) + ε

Gaussian likelihood:

p(y | x, w, Mi) ∝ ∏_c exp( −½ (yc − fw(xc))² / σ²noise ).

Maximize the likelihood:

wML = argmax_w p(y | x, w, Mi).

Make predictions, by plugging in the ML estimate: p(y∗ | x∗, wML, Mi)

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 16 / 55

slide-17
SLIDE 17

Bayesian Inference, parametric model

Supervised parametric learning:

  • data: x, y
  • model: y = fw(x) + ε

Gaussian likelihood:

p(y | x, w, Mi) ∝ ∏_c exp( −½ (yc − fw(xc))² / σ²noise ).

Parameter prior: p(w | Mi)

Posterior parameter distribution by Bayes’ rule p(a|b) = p(b|a)p(a)/p(b):

p(w | x, y, Mi) = p(w | Mi) p(y | x, w, Mi) / p(y | x, Mi)

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 17 / 55

slide-18
SLIDE 18

Bayesian Inference, parametric model, cont.

Making predictions:

p(y∗ | x∗, x, y, Mi) = ∫ p(y∗ | w, x∗, Mi) p(w | x, y, Mi) dw

Marginal likelihood:

p(y | x, Mi) = ∫ p(w | Mi) p(y | x, w, Mi) dw.

Model probability:

p(Mi | x, y) = p(Mi) p(y | x, Mi) / p(y | x)

Problem: integrals are intractable for most interesting models!

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 18 / 55

slide-19
SLIDE 19

Non-parametric Gaussian process models

In our non-parametric model, the “parameters” are the function itself!

Gaussian likelihood:

y | x, f(x), Mi ∼ N(f, σ²noise I)

(Zero mean) Gaussian process prior:

f(x) | Mi ∼ GP( m(x) ≡ 0, k(x, x′) )

Leads to a Gaussian process posterior:

f(x) | x, y, Mi ∼ GP( mpost(x) = k(x, x)[K(x, x) + σ²noise I]⁻¹ y,
                      kpost(x, x′) = k(x, x′) − k(x, x)[K(x, x) + σ²noise I]⁻¹ k(x, x′) )

And a Gaussian predictive distribution:

y∗ | x∗, x, y, Mi ∼ N( k(x∗, x)⊤[K + σ²noise I]⁻¹ y,
                       k(x∗, x∗) + σ²noise − k(x∗, x)⊤[K + σ²noise I]⁻¹ k(x∗, x) )

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 19 / 55
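These predictive equations are short enough to implement directly; below is a hedged NumPy sketch (function names, hyperparameter values and the toy data are mine, not the tutorial’s).

```python
import numpy as np

def k_se(a, b, ell=1.0, sf=1.0):
    """Squared exponential covariance sf^2 exp(-(x - x')^2 / (2 ell^2))."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_predict(x, y, xstar, ell=1.0, sf=1.0, sn=0.1):
    """Predictive mean and variance of y* under a zero-mean GP with SE covariance."""
    K = k_se(x, x, ell, sf) + sn**2 * np.eye(len(x))   # K + sigma_noise^2 I
    ks = k_se(x, xstar, ell, sf)                       # k(x, x*), shape (n, m)
    alpha = np.linalg.solve(K, y)                      # [K + sigma_noise^2 I]^{-1} y
    mean = ks.T @ alpha
    var = sf**2 + sn**2 - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    return mean, var

# toy usage
rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
mu, s2 = gp_predict(x, y, np.linspace(-5, 5, 100))
```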

slide-20
SLIDE 20

Prior and Posterior

[Figure: functions drawn from the GP prior (left) and from the posterior after conditioning on a few observations (right), with inputs x in (−5, 5) and outputs f(x) roughly in (−2, 2).]

Predictive distribution:

p(y∗ | x∗, x, y) ∼ N( k(x∗, x)⊤[K + σ²noise I]⁻¹ y,
                      k(x∗, x∗) + σ²noise − k(x∗, x)⊤[K + σ²noise I]⁻¹ k(x∗, x) )

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 20 / 55

slide-21
SLIDE 21

Graphical model for Gaussian Process

[Figure: graphical model. Latent function values f1, . . . , fn and f∗1, f∗2, f∗3 are all pairwise connected; each training latent fi is linked to its input xi and observation yi, and each test latent f∗m to its input x∗m and prediction y∗m.]

Square nodes are observed (clamped), round nodes stochastic (free). All pairs of latent variables are connected. Predictions y∗ depend only on the corresponding single latent f∗. Notice that adding a triplet x∗m, f∗m, y∗m does not influence the distribution. This is guaranteed by the marginalization property of the GP. This explains why we can make inference using a finite amount of computation!

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 21 / 55

slide-22
SLIDE 22

Some interpretation

Recall our main result:

f∗ | X∗, X, y ∼ N( K(X∗, X)[K(X, X) + σ²n I]⁻¹ y,
                   K(X∗, X∗) − K(X∗, X)[K(X, X) + σ²n I]⁻¹ K(X, X∗) )

The mean is linear in two ways:

µ(x∗) = k(x∗, X)[K(X, X) + σ²n I]⁻¹ y = ∑_{c=1}^{n} βc y(c) = ∑_{c=1}^{n} αc k(x∗, x(c)).

The last form is most commonly encountered in the kernel literature.

The variance is the difference between two terms:

V(x∗) = k(x∗, x∗) − k(x∗, X)[K(X, X) + σ²n I]⁻¹ k(X, x∗),

the first term is the prior variance, from which we subtract a (positive) term, telling how much the data X has explained. Note that the variance is independent of the observed outputs y.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 22 / 55

slide-23
SLIDE 23

The marginal likelihood

Log marginal likelihood:

log p(y | x, Mi) = −½ y⊤K⁻¹y − ½ log |K| − (n/2) log(2π)

is the combination of a data fit term and a complexity penalty. Occam’s Razor is automatic.

Learning in Gaussian process models involves finding

  • the form of the covariance function, and
  • any unknown (hyper-) parameters θ.

This can be done by optimizing the marginal likelihood:

∂ log p(y | x, θ, Mi) / ∂θj = ½ y⊤K⁻¹ (∂K/∂θj) K⁻¹y − ½ trace( K⁻¹ ∂K/∂θj )

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 23 / 55
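For reference, a small NumPy sketch (my own, not from the slides) of the log marginal likelihood and its gradient with respect to a single hyperparameter, given the full covariance K (including the noise term) and ∂K/∂θj:

```python
import numpy as np

def lml_and_grad(y, K, dK):
    """Log marginal likelihood of a zero-mean GP and its gradient w.r.t. one
    hyperparameter, given the covariance K and its derivative dK."""
    n = y.size
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))        # K^{-1} y
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
    Kinv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    grad = 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(Kinv @ dK)
    return lml, grad
```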

slide-24
SLIDE 24

Example: Fitting the length scale parameter

Parameterized covariance function:

k(x, x′) = v² exp( −(x − x′)²/(2ℓ²) ) + σ²n δxx′.

[Figure: observations together with the mean posterior predictive function for three length scales: too short, good, and too long.]

The mean posterior predictive function is plotted for 3 different length scales (the green curve corresponds to optimizing the marginal likelihood). Notice that an almost exact fit to the data can be achieved by reducing the length scale – but the marginal likelihood does not favour this!

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 24 / 55

slide-25
SLIDE 25

Why, in principle, does Bayesian Inference work? Occam’s Razor

[Figure: the evidence P(Y|Mi) plotted against all possible data sets Y, for a model that is too simple, one that is too complex, and one that is "just right".]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 25 / 55

slide-26
SLIDE 26

An illustrative analogous example

Imagine the simple task of fitting the variance, σ², of a zero-mean Gaussian to a set of n scalar observations. The log likelihood is

log p(y | µ, σ²) = −½ ∑i (yi − µ)²/σ² − (n/2) log(σ²) − (n/2) log(2π)

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 26 / 55

slide-27
SLIDE 27

From random functions to covariance functions

Consider the class of linear functions:

f(x) = ax + b,   where a ∼ N(0, α), and b ∼ N(0, β).

We can compute the mean function:

µ(x) = E[f(x)] = ∫∫ f(x) p(a) p(b) da db = ∫ ax p(a) da + ∫ b p(b) db = 0,

and covariance function:

k(x, x′) = E[(f(x) − 0)(f(x′) − 0)] = ∫∫ (ax + b)(ax′ + b) p(a) p(b) da db
         = ∫ a²xx′ p(a) da + ∫ b² p(b) db + (x + x′) ∫∫ ab p(a) p(b) da db = αxx′ + β.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 27 / 55
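A quick Monte Carlo sanity check of this result (the values of α, β, x and x′ are illustrative, not from the slides):

```python
import numpy as np

alpha, beta = 2.0, 0.5
rng = np.random.default_rng(0)
a = rng.normal(0.0, np.sqrt(alpha), size=200_000)   # a ~ N(0, alpha)
b = rng.normal(0.0, np.sqrt(beta), size=200_000)    # b ~ N(0, beta)

x, xp = 1.3, -0.7
f_x, f_xp = a * x + b, a * xp + b
print(np.mean(f_x * f_xp))                          # Monte Carlo estimate of k(x, x')
print(alpha * x * xp + beta)                        # analytic value alpha x x' + beta
```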

slide-28
SLIDE 28

From random functions to covariance functions II

Consider the class of functions (sums of squared exponentials):

f(x) = lim_{n→∞} (1/n) ∑i γi exp(−(x − i/n)²),   where γi ∼ N(0, 1), ∀i
     = ∫_{−∞}^{∞} γ(u) exp(−(x − u)²) du,   where γ(u) ∼ N(0, 1), ∀u.

The mean function is:

µ(x) = E[f(x)] = ∫_{−∞}^{∞} exp(−(x − u)²) ∫_{−∞}^{∞} γ p(γ) dγ du = 0,

and the covariance function:

E[f(x)f(x′)] = ∫ exp( −(x − u)² − (x′ − u)² ) du
             = ∫ exp( −2(u − (x + x′)/2)² + (x + x′)²/2 − x² − x′² ) du ∝ exp( −(x − x′)²/2 ).

Thus, the squared exponential covariance function is equivalent to regression using infinitely many Gaussian shaped basis functions placed everywhere, not just at your training points!

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 28 / 55

slide-29
SLIDE 29

Using finitely many basis functions may be dangerous!

[Figure: observations on the interval (−10, 10) with a question mark over the region away from the data.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 29 / 55

slide-30
SLIDE 30

Model Selection in Practice; Hyperparameters

There are two types of task: finding the form and the parameters of the covariance function. Typically, our prior is too weak to quantify aspects of the covariance function. We use a hierarchical model with hyperparameters. E.g., in ARD (automatic relevance determination):

k(x, x′) = v0² exp( −∑_{d=1}^{D} (xd − x′d)²/(2vd²) ),   hyperparameters θ = (v0, v1, . . . , vD, σ²n).

[Figure: sample functions of two inputs x1, x2 for v1 = v2 = 1, for v1 = v2 = 0.32, and for v1 = 0.32, v2 = 1.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 30 / 55
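The ARD covariance is straightforward to code; here is an assumed NumPy version (the test inputs and length scales are only illustrative):

```python
import numpy as np

def k_ard(X, Xp, v0=1.0, v=(1.0, 1.0)):
    """ARD squared exponential: v0^2 exp(-sum_d (x_d - x'_d)^2 / (2 v_d^2))."""
    v = np.asarray(v)
    d2 = ((X[:, None, :] - Xp[None, :, :]) / v) ** 2    # per-dimension scaled squared distances
    return v0**2 * np.exp(-0.5 * d2.sum(axis=-1))

X = np.random.default_rng(0).uniform(-2, 2, size=(5, 2))
print(k_ard(X, X, v0=1.0, v=(0.32, 1.0)))               # dimension 1 varies on a short length scale
```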

slide-31
SLIDE 31

Rational quadratic covariance function

The rational quadratic (RQ) covariance function:

kRQ(r) = ( 1 + r²/(2αℓ²) )^(−α),   with α, ℓ > 0,

can be seen as a scale mixture (an infinite sum) of squared exponential (SE) covariance functions with different characteristic length-scales. Using τ = ℓ⁻² and p(τ | α, β) ∝ τ^(α−1) exp(−ατ/β):

kRQ(r) = ∫ p(τ | α, β) kSE(r | τ) dτ ∝ ∫ τ^(α−1) exp( −ατ/β ) exp( −τr²/2 ) dτ ∝ ( 1 + r²/(2αℓ²) )^(−α).

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 31 / 55

slide-32
SLIDE 32

Rational quadratic covariance function II

[Figure: left, the RQ covariance as a function of input distance for α = 1/2, α = 2 and α → ∞; right, sample functions f(x) drawn with each of these values.]

The limit α → ∞ of the RQ covariance function is the SE.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 32 / 55

slide-33
SLIDE 33

Matérn covariance functions

Stationary covariance functions can be based on the Matérn form:

k(x, x′) = ( 1 / (Γ(ν) 2^(ν−1)) ) ( √(2ν) |x − x′| / ℓ )^ν Kν( √(2ν) |x − x′| / ℓ ),

where Kν is the modified Bessel function of the second kind of order ν, and ℓ is the characteristic length scale.

Sample functions from Matérn forms are ⌈ν⌉ − 1 times differentiable. Thus, the hyperparameter ν can control the degree of smoothness.

Special cases:

  • kν=1/2(r) = exp(−r/ℓ): Laplacian covariance function, Brownian motion (Ornstein-Uhlenbeck)
  • kν=3/2(r) = ( 1 + √3 r/ℓ ) exp( −√3 r/ℓ ) (once differentiable)
  • kν=5/2(r) = ( 1 + √5 r/ℓ + 5r²/(3ℓ²) ) exp( −√5 r/ℓ ) (twice differentiable)
  • kν→∞(r) = exp( −r²/(2ℓ²) ): smooth (infinitely differentiable)

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 33 / 55
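The special cases above have simple closed forms; a small NumPy sketch of my own, covering only those cases:

```python
import numpy as np

def matern(r, ell=1.0, nu=1.5):
    """Matérn covariance for nu in {1/2, 3/2, 5/2, inf}; r is the input distance |x - x'|."""
    s = np.abs(r) / ell
    if nu == 0.5:
        return np.exp(-s)                                        # Ornstein-Uhlenbeck
    if nu == 1.5:
        return (1 + np.sqrt(3) * s) * np.exp(-np.sqrt(3) * s)    # once differentiable
    if nu == 2.5:
        return (1 + np.sqrt(5) * s + 5 * s**2 / 3) * np.exp(-np.sqrt(5) * s)
    if np.isinf(nu):
        return np.exp(-0.5 * s**2)                               # squared exponential limit
    raise ValueError("general nu requires the modified Bessel function (scipy.special.kv)")
```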

slide-34
SLIDE 34

Matérn covariance functions II

Univariate Matérn covariance function with unit characteristic length scale and unit variance:

[Figure: left, the covariance as a function of input distance; right, sample functions f(x), for ν = 1/2, ν = 1, ν = 2 and ν → ∞.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 34 / 55

slide-35
SLIDE 35

Periodic, smooth functions

To create a distribution over periodic functions of x, we can first map the inputs to u = (sin(x), cos(x))⊤, and then measure distances in the u space. Combined with the SE covariance function with characteristic length scale ℓ, we get:

kperiodic(x, x′) = exp( −2 sin²(π(x − x′))/ℓ² )

[Figure: three functions drawn at random; left ℓ > 1, and right ℓ < 1.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 35 / 55

slide-36
SLIDE 36

The Prediction Problem

[Figure: atmospheric CO2 concentration (ppm) versus year, 1960–2020, with the future values to be predicted marked "?".]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 36 / 55

slide-37
SLIDE 37

Covariance Function

The covariance function consists of several terms, parameterized by a total of 11 hyperparameters:

  • long-term smooth trend (squared exponential):
    k1(x, x′) = θ1² exp( −(x − x′)²/θ2² ),
  • seasonal trend (quasi-periodic smooth):
    k2(x, x′) = θ3² exp( −2 sin²(π(x − x′))/θ5² ) × exp( −½(x − x′)²/θ4² ),
  • short- and medium-term anomaly (rational quadratic):
    k3(x, x′) = θ6² ( 1 + (x − x′)²/(2θ8θ7²) )^(−θ8),
  • noise (independent Gaussian, and dependent):
    k4(x, x′) = θ9² exp( −(x − x′)²/(2θ10²) ) + θ11² δxx′.

k(x, x′) = k1(x, x′) + k2(x, x′) + k3(x, x′) + k4(x, x′)

Let’s try this with the gpml software (http://www.gaussianprocess.org/gpml).

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 37 / 55
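For readers without the gpml toolbox, the composite covariance itself is easy to transcribe; below is an assumed NumPy sketch of the sum of the four terms (it is not the gpml implementation, and th is simply an array holding the eleven hyperparameters):

```python
import numpy as np

def co2_kernel(x, xp, th):
    """Sum of the four covariance terms above; th holds theta_1 ... theta_11."""
    d = x[:, None] - xp[None, :]
    k1 = th[0]**2 * np.exp(-d**2 / th[1]**2)                           # long-term trend
    k2 = th[2]**2 * np.exp(-2 * np.sin(np.pi * d)**2 / th[4]**2) \
                  * np.exp(-0.5 * d**2 / th[3]**2)                     # quasi-periodic seasonal
    k3 = th[5]**2 * (1 + d**2 / (2 * th[7] * th[6]**2))**(-th[7])      # medium-term anomaly
    k4 = th[8]**2 * np.exp(-d**2 / (2 * th[9]**2)) \
                  + th[10]**2 * (d == 0)                               # dependent + independent noise
    return k1 + k2 + k3 + k4
```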

slide-38
SLIDE 38

Long- and medium-term mean predictions

[Figure: long- and medium-term mean predictions of the CO2 concentration (ppm) against year, 1960–2020.]

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 38 / 55

slide-39
SLIDE 39

Mean Seasonal Component

[Figure: contour plot of the mean seasonal component (in ppm) as a function of month and year, 1960–2020.]

Seasonal component: magnitude θ3 = 2.4 ppm, decay-time θ4 = 90 years. Dependent noise, magnitude θ9 = 0.18 ppm, decay θ10 = 1.6 months. Independent noise, magnitude θ11 = 0.19 ppm.

Optimize or integrate out? See MacKay [5].

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 39 / 55

slide-40
SLIDE 40

Binary Gaussian Process Classification

[Figure: left, a latent function f(x); right, the corresponding class probability π(x) obtained by squashing f through the sigmoid.]

The class probability is related to the latent function, f, through:

p(y = 1 | f(x)) = π(x) = Φ( f(x) ),

where Φ is a sigmoid function, such as the logistic or cumulative Gaussian. Observations are independent given f, so the likelihood is

p(y | f) = ∏_{i=1}^{n} p(yi | fi) = ∏_{i=1}^{n} Φ(yi fi).

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 40 / 55

slide-41
SLIDE 41

Prior and Posterior for Classification

We use a Gaussian process prior for the latent function:

f | X, θ ∼ N(0, K)

The posterior becomes:

p(f | D, θ) = p(y | f) p(f | X, θ) / p(D | θ) = ( N(f | 0, K) / p(D | θ) ) ∏_{i=1}^{m} Φ(yi fi),

which is non-Gaussian. The latent value at the test point, f(x∗), is

p(f∗ | D, θ, x∗) = ∫ p(f∗ | f, X, θ, x∗) p(f | D, θ) df,

and the predictive class probability becomes

p(y∗ | D, θ, x∗) = ∫ p(y∗ | f∗) p(f∗ | D, θ, x∗) df∗,

both of which are intractable to compute.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 41 / 55

slide-42
SLIDE 42

Gaussian Approximation to the Posterior

We approximate the non-Gaussian posterior by a Gaussian:

p(f | D, θ) ≃ q(f | D, θ) = N(m, A)

then q(f∗ | D, θ, x∗) = N(f∗ | µ∗, σ∗²), where

µ∗ = k∗⊤ K⁻¹ m
σ∗² = k(x∗, x∗) − k∗⊤ (K⁻¹ − K⁻¹AK⁻¹) k∗.

Using this approximation with the cumulative Gaussian likelihood:

q(y∗ = 1 | D, θ, x∗) = ∫ Φ(f∗) N(f∗ | µ∗, σ∗²) df∗ = Φ( µ∗ / √(1 + σ∗²) )

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 42 / 55
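Assuming the Gaussian approximation N(m, A) has already been found (by Laplace or EP, as on the next slide), the predictive class probability above is only a few lines of code; a sketch using NumPy and SciPy, with illustrative argument names:

```python
import numpy as np
from scipy.stats import norm

def predict_class_prob(k_star, k_ss, K, m, A):
    """Approximate p(y* = 1 | D, x*) from a Gaussian approximation N(m, A) to the
    posterior over training latents, with cumulative Gaussian likelihood."""
    Kinv_kstar = np.linalg.solve(K, k_star)
    mu_star = Kinv_kstar @ m                                        # k*' K^{-1} m
    var_star = k_ss - k_star @ (Kinv_kstar - np.linalg.solve(K, A @ Kinv_kstar))
    return norm.cdf(mu_star / np.sqrt(1.0 + var_star))
```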

slide-43
SLIDE 43

Laplace’s method and Expectation Propagation

How do we find a good Gaussian approximation N(m, A) to the posterior? Laplace’s method: find the Maximum A Posteriori (MAP) latent values fMAP, and use a local expansion (Gaussian) around this point, as suggested by Williams and Barber [10]. Variational bounds: bound the likelihood by some tractable expression. A local variational bound for each likelihood term was given by Gibbs and MacKay [1]; a lower bound based on Jensen’s inequality by Opper and Seeger [7]. Expectation Propagation: use an approximation of the likelihood, such that the moments of the marginals of the approximate posterior match the (approximate) moments of the posterior, Minka [6]. Laplace’s method and EP were compared by Kuss and Rasmussen [3].

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 43 / 55

slide-44
SLIDE 44

Gaussian process latent variable models

GPs can be used for non-linear dimensionality reduction (unsupervised learning). Observed (high-dimensional) data Ydc, where 1 ≤ d ≤ D indexes dimensions and 1 ≤ c ≤ n indexes cases. Assume that each visible coordinate, yd, is modeled by a separate GP using some latent (low-dimensional) inputs x. Find the best latent inputs by maximizing the marginal likelihood under the constraint that all visible variables must share the same latent values. Computationally, this isn’t too expensive, as all dimensions are modeled using the same covariance matrix K. This is the GPLVM model proposed by Lawrence [4].

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 44 / 55

slide-45
SLIDE 45

Gaussian process latent variable models

Motion capture example, representing 102-D data in 2-D, borrowed from Neil Lawrence.

  • Finding the latent variables is a high-dimensional, non-linear, optimization problem with local optima.
  • GPLVM defines a map from latent to observed space, not a generative model.
  • Mapping new latent coordinates to (distributions over) observations is easy. Finding the latent coordinates (pre-image) for new cases is difficult.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 45 / 55

slide-46
SLIDE 46

Sparse Approximations

Recall the graphical model for a Gaussian process. Inference is expensive because the latent variables are fully connected.

[Figure: the GP graphical model from slide 21, with the training latents f1, . . . , fn and test latents f∗1, f∗2, f∗3 all pairwise connected, each with its own input and output node.]

Exact inference: O(n³). Sparse approximations: solve a smaller, sparse, approximation of the original problem. Algorithm: subset of data. Are there better ways to sparsify?

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 46 / 55

slide-47
SLIDE 47

Inducing Variables

Because of the marginalization property, we can introduce more latent variables without changing the distribution of the original variables.

[Figure: the GP graphical model augmented with inducing variables u1, u2 and their inducing inputs s1, s2, connected to all of the latent function values.]

The u = (u1, u2, . . .)⊤ are called inducing variables. The inducing variables have associated inducing inputs, s, but no associated output values.

The marginalization property ensures that p(f, f∗) = ∫ p(f, f∗, u) du

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 47 / 55

slide-48
SLIDE 48

The Central Approximations

In a unifying treatment, Quiñonero-Candela and Rasmussen [2] assume that training and test sets are conditionally independent given u.

[Figure: graphical model in which the training latents f and the test latents f∗ are connected only through the inducing variables u (with inducing inputs s1, s2).]

Assume: p(f, f∗) ≃ q(f, f∗), where

q(f, f∗) = ∫ q(f∗ | u) q(f | u) p(u) du.

The inducing variables induce the dependencies between training and test cases. Different sparse algorithms in the literature correspond to different

  • choices of the inducing inputs
  • further approximations
Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 48 / 55

slide-49
SLIDE 49

Training and test conditionals

The exact training and test conditionals are:

p(f | u) = N( Kf,u Ku,u⁻¹ u, Kf,f − Qf,f )
p(f∗ | u) = N( Kf∗,u Ku,u⁻¹ u, Kf∗,f∗ − Qf∗,f∗ ),

where Qa,b = Ka,u Ku,u⁻¹ Ku,b.

These equations are easily recognized as the usual predictive equations for GPs. The effective prior is:

q(f, f∗) = N( 0, [ Kf,f  Qf,∗ ; Q∗,f  K∗,∗ ] )

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 49 / 55
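The low-rank matrix Qa,b = Ka,u Ku,u⁻¹ Ku,b appears in all of the sparse approximations that follow; a small NumPy helper (my own sketch, with a jitter term added for numerical stability):

```python
import numpy as np

def nystrom_Q(K_au, K_uu, K_ub, jitter=1e-8):
    """Q_{a,b} = K_{a,u} K_{u,u}^{-1} K_{u,b}, computed via a Cholesky factor of K_{u,u}."""
    m = K_uu.shape[0]
    L = np.linalg.cholesky(K_uu + jitter * np.eye(m))
    W = np.linalg.solve(L, K_au.T)        # L^{-1} K_{u,a}
    V = np.linalg.solve(L, K_ub)          # L^{-1} K_{u,b}
    return W.T @ V
```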

slide-50
SLIDE 50

Example: Subset of Regressors

Replace both training and test conditionals by deterministic relations:

q(f | u) = N( Kf,u Ku,u⁻¹ u, 0 )
q(f∗ | u) = N( Kf∗,u Ku,u⁻¹ u, 0 ).

The effective prior becomes

qSOR(f, f∗) = N( 0, [ Qf,f  Qf,∗ ; Q∗,f  Q∗,∗ ] ),

showing that SOR is just a GP with (degenerate) covariance function Q.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 50 / 55

slide-51
SLIDE 51

Example: Sparse parametric Gaussian processes

Snelson and Ghahramani [8] introduced the idea of sparse GP inference based on a pseudo data set, integrating out the targets, and optimizing the inputs. Equivalently, in the unifying scheme:

q(f | u) = N( Kf,u Ku,u⁻¹ u, diag[Kf,f − Qf,f] )
q(f∗ | u) = p(f∗ | u).

The effective prior becomes

qFITC(f, f∗) = N( 0, [ Qf,f − diag[Qf,f − Kf,f]  Qf,∗ ; Q∗,f  K∗,∗ ] ),

which can be computed efficiently. The Bayesian Committee Machine [9] uses block diag instead of diag, and takes the inducing variables to be the test cases.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 51 / 55
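The FITC training-set covariance, Qf,f with its diagonal corrected back to that of Kf,f, is easy to form explicitly for small problems; an assumed NumPy sketch (not Snelson and Ghahramani’s code):

```python
import numpy as np

def fitc_train_covariance(K_ff, K_fu, K_uu, jitter=1e-8):
    """Effective FITC training covariance Q_ff - diag[Q_ff - K_ff]."""
    m = K_uu.shape[0]
    L = np.linalg.cholesky(K_uu + jitter * np.eye(m))
    V = np.linalg.solve(L, K_fu.T)                  # L^{-1} K_uf
    Q_ff = V.T @ V                                  # K_fu K_uu^{-1} K_uf
    return Q_ff - np.diag(np.diag(Q_ff) - np.diag(K_ff))
```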

slide-52
SLIDE 52

Sparse approximations

Most published sparse approximations can be understood in a single graphical model framework. The inducing inputs (or expansion points, or support vectors) may be a subset of the training data, or completely free. The approximations are understood as exact inference in a modified model (rather than approximate inference for the exact model).

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 52 / 55

slide-53
SLIDE 53

Conclusions

Complex non-linear inference problems can be solved by manipulating plain old Gaussian distributions.

  • Bayesian inference is tractable for GP regression, and approximations exist for classification
  • predictions are probabilistic
  • compare different models (via the marginal likelihood)

GPs are a simple and intuitive means of specifying prior information and explaining data; they are equivalent to other models (RVMs, splines) and closely related to SVMs. Outlook:

  • new interesting covariance functions
  • application to structured data
  • better understanding of sparse methods

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 53 / 55

slide-54
SLIDE 54

More on Gaussian Processes

Rasmussen and Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. http://www.GaussianProcess.org/gpml

Gaussian process web (code, papers, etc.): http://www.GaussianProcess.org

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 54 / 55

slide-55
SLIDE 55

A few references

[1] Gibbs, M. N. and MacKay, D. J. C. (2000). Variational Gaussian Process Classifiers. IEEE Transactions on Neural Networks, 11(6):1458–1464.

[2] Quiñonero-Candela, J. and Rasmussen, C. E. (2005). A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 6:1939–1959.

[3] Kuss, M. and Rasmussen, C. E. (2005). Assessing Approximate Inference for Binary Gaussian Process Classification. Journal of Machine Learning Research, 6:1679–1704.

[4] Lawrence, N. (2005). Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models. Journal of Machine Learning Research, 6:1783–1816.

[5] MacKay, D. J. C. (1999). Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation, 11(5):1035–1068.

[6] Minka, T. P. (2001). A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology.

[7] Seeger, M. (2003). Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, School of Informatics, University of Edinburgh. http://www.cs.berkeley.edu/~mseeger

[8] Snelson, E. and Ghahramani, Z. (2006). Sparse Gaussian Processes using Pseudo-inputs. In Advances in Neural Information Processing Systems 18. MIT Press.

[9] Tresp, V. (2000). A Bayesian Committee Machine. Neural Computation, 12(11):2719–2741.

[10] Williams, C. K. I. and Barber, D. (1998). Bayesian Classification with Gaussian Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351.

Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 55 / 55