CSci 8980: Advanced Topics in Graphical Models Gaussian Processes - PowerPoint PPT Presentation

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007

Gaussian Processes Outline

Gaussian Processes Outline Parametric Bayesian Regression

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions GP Regression

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions GP Regression GP Classification

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions GP Regression GP Classification We will use

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions GP Regression GP Classification We will use Primary: Carl Rasmussen’s GP tutorial slides (NIPS’06)

Gaussian Processes Outline Parametric Bayesian Regression Parameters to Functions GP Regression GP Classification We will use Primary: Carl Rasmussen’s GP tutorial slides (NIPS’06) Secondary: Hanna Wallach’s slides on regression

The Prediction Problem 420 CO 2 concentration, ppm 400 380 360 340 320 1960 1980 2000 2020 year Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 3 / 55

Maximum likelihood, parametric model Supervised parametric learning: • data: x ✱ y • model: y = f w ( x ) + ε Gaussian likelihood: � exp (− 1 2 ( y c − f w ( x c )) 2 /σ 2 p ( y | x ✱ w ✱ M i ) ∝ noise ) ✳ c Maximize the likelihood: w ML = argmax p ( y | x ✱ w ✱ M i ) ✳ w Make predictions, by plugging in the ML estimate: p ( y ∗ | x ∗ ✱ w ML ✱ M i ) Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 16 / 55

Bayesian Inference, parametric model Supervised parametric learning: • data: x ✱ y • model: y = f w ( x ) + ε Gaussian likelihood: � 2 ( y c − f w ( x c )) 2 /σ 2 exp (− 1 p ( y | x ✱ w ✱ M i ) ∝ noise ) ✳ c Parameter prior: p ( w | M i ) Posterior parameter distribution by Bayes rule p ( a | b ) = p ( b | a ) p ( a ) / p ( b ) : p ( w | x ✱ y ✱ M i ) = p ( w | M i ) p ( y | x ✱ w ✱ M i ) p ( y | x ✱ M i ) Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 17 / 55

Bayesian Inference, parametric model, cont. Making predictions: � p ( y ∗ | x ∗ ✱ x ✱ y ✱ M i ) = p ( y ∗ | w ✱ x ∗ ✱ M i ) p ( w | x ✱ y ✱ M i ) d w Marginal likelihood: � p ( y | x ✱ M i ) = p ( w | M i ) p ( y | x ✱ w ✱ M i ) d w ✳ Model probability: p ( M i | x ✱ y ) = p ( M i ) p ( y | x ✱ M i ) p ( y | x ) Problem: integrals are intractable for most interesting models! Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 18 / 55

Bayesian Linear Regression Bayesian Linear Regression (2) Likelihood of parameters is: P ( y | X , w ) = N ( X ⊤ w , σ 2 I ) . Assume a Gaussian prior over parameters: P ( w ) = N ( 0 , Σ p ) . Apply Bayes’ theorem to obtain posterior: P ( w | y , X ) ∝ P ( y | X , w ) P ( w ) . Hanna M. Wallach hmw26@cam.ac.uk Introduction to Gaussian Process Regression

Bayesian Linear Regression Bayesian Linear Regression (3) Posterior distribution over w is: P ( w | y , X ) = N ( 1 + 1 σ 2 A − 1 X y , A − 1 ) where A = Σ − 1 σ 2 XX ⊤ . p Predictive distribution is: � P ( f ⋆ | x ⋆ , X , y ) = f ( x ⋆ | w ) P ( w | X , y ) d w = N ( 1 σ 2 x ⋆ ⊤ A − 1 X y , x ⋆ ⊤ A − 1 x ⋆ ) . Hanna M. Wallach hmw26@cam.ac.uk Introduction to Gaussian Process Regression

Non-parametric Gaussian process models In our non-parametric model, the “parameters” is the function itself! Gaussian likelihood: y | x ✱ f ( x ) ✱ M i ∼ N ( f ✱ σ 2 noise I ) (Zero mean) Gaussian process prior: � m ( x ) ≡ 0 ✱ k ( x ✱ x ′ ) � f ( x ) | M i ∼ GP Leads to a Gaussian process posterior m post ( x ) = k ( x ✱ x )[ K ( x ✱ x ) + σ 2 noise I ] − 1 y ✱ � f ( x ) | x ✱ y ✱ M i ∼ GP k post ( x ✱ x ′ ) = k ( x ✱ x ′ ) − k ( x ✱ x )[ K ( x ✱ x ) + σ 2 noise I ] − 1 k ( x ✱ x ′ ) � ✳ And a Gaussian predictive distribution: y ∗ | x ∗ ✱ x ✱ y ✱ M i ∼ N k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 y ✱ � k ( x ∗ ✱ x ∗ ) + σ 2 noise − k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 k ( x ∗ ✱ x ) � Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 19 / 55

The Gaussian Distribution The Gaussian distribution is given by p ( x | µ ✱ Σ ) = N ( µ ✱ Σ ) = ( 2 π ) − D / 2 | Σ | − 1 / 2 exp − 1 2 ( x − µ ) ⊤ Σ − 1 ( x − µ ) � � where µ is the mean vector and Σ the covariance matrix. Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 8 / 55

Conditionals and Marginals of a Gaussian joint Gaussian joint Gaussian conditional marginal Both the conditionals and the marginals of a joint Gaussian are again Gaussian. Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 9 / 55

What is a Gaussian Process? A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables. Informally: infinitely long vector ≃ function Definition : a Gaussian process is a collection of random variables, any finite number of which have (consistent) Gaussian distributions. � A Gaussian distribution is fully specified by a mean vector, µ , and covariance matrix Σ : f = ( f 1 ✱ ✳ ✳ ✳ ✱ f n ) ⊤ ∼ N ( µ ✱ Σ ) ✱ indexes i = 1 ✱ ✳ ✳ ✳ ✱ n A Gaussian process is fully specified by a mean function m ( x ) and covariance function k ( x ✱ x ′ ) : m ( x ) ✱ k ( x ✱ x ′ ) � � f ( x ) ∼ GP indexes: x ✱ Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 10 / 55

The marginalization property Thinking of a GP as a Gaussian distribution with an infinitely long mean vector and an infinite by infinite covariance matrix may seem impractical. . . . . . luckily we are saved by the marginalization property : Recall: � p ( x ) = p ( x ✱ y ) d y ✳ For Gaussians: �� a � A B � �� p ( x ✱ y ) = N p ( x ) = N ( a ✱ A ) ⇒ = B ⊤ b C ✱ Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 11 / 55

Random functions from a Gaussian Process Example one dimensional Gaussian process: m ( x ) = 0 ✱ k ( x ✱ x ′ ) = exp (− 1 2 ( x − x ′ ) 2 ) � � p ( f ( x )) ∼ GP ✳ To get an indication of what this distribution over functions looks like, focus on a finite subset of function values f = ( f ( x 1 ) ✱ f ( x 2 ) ✱ ✳ ✳ ✳ ✱ f ( x n )) ⊤ , for which f ∼ N ( 0 ✱ Σ ) ✱ where Σ ij = k ( x i ✱ x j ) . Then plot the coordinates of f as a function of the corresponding x values. Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 12 / 55

Some values of the random function 1.5 1 0.5 output, f(x) 0 −0.5 −1 −1.5 −5 0 5 input, x Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 13 / 55

Non-parametric Gaussian process models In our non-parametric model, the “parameters” is the function itself! Gaussian likelihood: y | x ✱ f ( x ) ✱ M i ∼ N ( f ✱ σ 2 noise I ) (Zero mean) Gaussian process prior: � m ( x ) ≡ 0 ✱ k ( x ✱ x ′ ) � f ( x ) | M i ∼ GP Leads to a Gaussian process posterior m post ( x ) = k ( x ✱ x )[ K ( x ✱ x ) + σ 2 noise I ] − 1 y ✱ � f ( x ) | x ✱ y ✱ M i ∼ GP k post ( x ✱ x ′ ) = k ( x ✱ x ′ ) − k ( x ✱ x )[ K ( x ✱ x ) + σ 2 noise I ] − 1 k ( x ✱ x ′ ) � ✳ And a Gaussian predictive distribution: y ∗ | x ∗ ✱ x ✱ y ✱ M i ∼ N k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 y ✱ � k ( x ∗ ✱ x ∗ ) + σ 2 noise − k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 k ( x ∗ ✱ x ) � Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 19 / 55

Prior and Posterior 2 2 1 1 output, f(x) output, f(x) 0 0 −1 −1 −2 −2 −5 0 5 −5 0 5 input, x input, x Predictive distribution: k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 y ✱ p ( y ∗ | x ∗ ✱ x ✱ y ) ∼ N � k ( x ∗ ✱ x ∗ ) + σ 2 noise − k ( x ∗ ✱ x ) ⊤ [ K + σ 2 noise I ] − 1 k ( x ∗ ✱ x ) � Rasmussen (MPI for Biological Cybernetics) Advances in Gaussian Processes December 4th, 2006 20 / 55

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes - PowerPoint PPT Presentation

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian Processes Outline Parametric

CSci 8980: Advanced Topics in Graphical Models Variational Inference Instructor: Arindam Banerjee

CSci 8980: Advanced Topics in Graphical Models Mixture Models, EM, Exponential Families

CSci 8980: Advanced Topics in Graphical Models Application: Gene Expression Analysis Instructor:

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam

CSci 8980: Advanced Topics in Graphical Models Expectation Propagation Instructor: Arindam

CSci 8980: Advanced Topics in Graphical Models MCMC, Gibbs Sampling Instructor: Arindam Banerjee

Probabilistic Graphical Models Probabilistic Graphical Models Gaussian Network Models Fall 2019

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Graphical Models Graphical Models Bayesian Networks Siamak Ravanbakhsh Fall 2019 Previously on

Transforming Graphical System Models to Graphical Attack Models ! Joint work with Marieta

Probabilistic Graphical Models Lecture 21: Advanced Gaussian Processes Andrew Gordon Wilson

Special Topics: CSci 8980 Machine Learning in Computer Systems Jon B. Weissman (jon@cs.umn.edu)

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Grbner Bases of Gaussian Graphical Models Alex Fink, Jenna Rajchgot, Seth Sullivant Queen Mary

Probabilistic Graphical Models Probabilistic Graphical Models Variable elimination Siamak

Probabilistic Graphical Models CMSC 678 UMBC Probabilistic Graphical Models A graph G that

Information Rates for Phase Noise Channels Luca Barletta Politecnico di Milano

Konstantin Tretjakov (kt@ut.ee) 22. november 2005 RVM & Eponine p.1/ ??

CAN WE TRUST THE GOSPELS? DR PETER J. WILLIAMS @DRPJWILLIAMS MAIN SOURCES ABOUT EMPEROR TIBERIUS

Linear Regression Machine Learning Hamid Beigy Sharif University of Technology Fall 1393 Hamid

Exploiting Synergy Between Testing and Inferred Partial Specifications Tao Xie David Notkin

The complexity of string partitioning Anne Condon 1 nuch 1 , 2 Chris Thachuk 1 J an Ma 1

Correlators of operators on Wilson loops in N=4 SYM and AdS 2 /CFT 1 Arkady Tseytlin M.

Robust Statistics in Stata Ben Jann University of Bern, ben.jann@soz.unibe.ch 2017 London Stata

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes - PowerPoint PPT Presentation

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian Processes Outline Parametric

CSci 8980: Advanced Topics in Graphical Models Variational Inference Instructor: Arindam Banerjee

CSci 8980: Advanced Topics in Graphical Models Mixture Models, EM, Exponential Families

CSci 8980: Advanced Topics in Graphical Models Application: Gene Expression Analysis Instructor:

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam

CSci 8980: Advanced Topics in Graphical Models Expectation Propagation Instructor: Arindam

CSci 8980: Advanced Topics in Graphical Models MCMC, Gibbs Sampling Instructor: Arindam Banerjee

Probabilistic Graphical Models Probabilistic Graphical Models Gaussian Network Models Fall 2019

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Graphical Models Graphical Models Bayesian Networks Siamak Ravanbakhsh Fall 2019 Previously on

Transforming Graphical System Models to Graphical Attack Models ! Joint work with Marieta

Probabilistic Graphical Models Lecture 21: Advanced Gaussian Processes Andrew Gordon Wilson

Special Topics: CSci 8980 Machine Learning in Computer Systems Jon B. Weissman (jon@cs.umn.edu)

Lecture 3 Capacity of Multiuser Gaussian Channels The Gaussian uplink: 6.1 The fading

Grbner Bases of Gaussian Graphical Models Alex Fink, Jenna Rajchgot, Seth Sullivant Queen Mary

Probabilistic Graphical Models Probabilistic Graphical Models Variable elimination Siamak

Probabilistic Graphical Models CMSC 678 UMBC Probabilistic Graphical Models A graph G that

Information Rates for Phase Noise Channels Luca Barletta Politecnico di Milano

Konstantin Tretjakov (kt@ut.ee) 22. november 2005 RVM &amp; Eponine p.1/ ??

CAN WE TRUST THE GOSPELS? DR PETER J. WILLIAMS @DRPJWILLIAMS MAIN SOURCES ABOUT EMPEROR TIBERIUS

Linear Regression Machine Learning Hamid Beigy Sharif University of Technology Fall 1393 Hamid

Exploiting Synergy Between Testing and Inferred Partial Specifications Tao Xie David Notkin

The complexity of string partitioning Anne Condon 1 nuch 1 , 2 Chris Thachuk 1 J an Ma 1

Correlators of operators on Wilson loops in N=4 SYM and AdS 2 /CFT 1 Arkady Tseytlin M.

Robust Statistics in Stata Ben Jann University of Bern, ben.jann@soz.unibe.ch 2017 London Stata

Konstantin Tretjakov (kt@ut.ee) 22. november 2005 RVM & Eponine p.1/ ??