
Gaussian Process
Lei Tang, Arizona State University, July 31, 2007


  1. Gaussian Process. Lei Tang, Arizona State University, July 31, 2007.

  2. Gaussian processes are known as kriging in geostatistics. Autoregressive moving average models, Kalman filters, and radial basis function networks can all be viewed as forms of Gaussian process models.

  3. Linear regression revisited. With $y(x) = w^T \phi(x)$ and prior $p(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$, the vector of function values $y = \Phi w$ has mean $\mathbb{E}[y] = \Phi\,\mathbb{E}[w] = 0$ and covariance $\operatorname{cov}[y] = \mathbb{E}[y y^T] = \Phi\,\mathbb{E}[w w^T]\,\Phi^T = \frac{1}{\alpha}\Phi\Phi^T = K$, where $K$ is the Gram matrix with elements $K_{nm} = k(x_n, x_m) = \frac{1}{\alpha}\,\phi(x_n)^T \phi(x_m)$.
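A minimal numerical check of this claim (not from the slides): drawing $w$ from the prior and forming $y = \Phi w$ yields an empirical covariance matching the Gram matrix $\frac{1}{\alpha}\Phi\Phi^T$. The basis functions, $\alpha$, and sample count are illustrative choices.

```python
# Sketch: verify cov[y] = (1/alpha) * Phi @ Phi.T for y = Phi @ w,
# w ~ N(0, alpha^{-1} I). Basis and alpha are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                                    # weight-prior precision
x = np.linspace(-1, 1, 5)                      # 5 input points
Phi = np.stack([np.ones_like(x), x, x**2], 1)  # polynomial basis, N x M

K = Phi @ Phi.T / alpha                        # analytic Gram matrix

# Monte-Carlo estimate of cov[y] from samples of w ~ N(0, alpha^{-1} I)
W = rng.normal(scale=alpha ** -0.5, size=(100000, Phi.shape[1]))
Y = W @ Phi.T                                  # each row is one sampled y
print(np.allclose(np.cov(Y, rowvar=False), K, atol=0.05))  # -> True
```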

  4. Gaussian Process. A Gaussian process is defined as a probability distribution over functions $y(x)$ such that the set of values of $y(x)$ evaluated at an arbitrary set of points $x_1, \dots, x_N$ jointly have a Gaussian distribution. Gaussian random field: the case where the input vector $x$ is two-dimensional. Stochastic process: $y(x)$ is specified by giving the joint probability distribution for any finite set of values $y(x_1), \dots, y(x_N)$ in a consistent manner.

  5. GP Connection to Kernels. For a Gaussian stochastic process, the joint distribution over the $N$ variables $y_1, \dots, y_N$ is specified completely by the second-order statistics. For most applications we have no prior knowledge about the mean, so by symmetry we take the mean of $y(x)$ to be zero. The Gaussian process is then determined by the covariance of $y(x)$, which is specified by the kernel function: $\mathbb{E}[y(x_n)\, y(x_m)] = k(x_n, x_m)$.

  6. Two Examples of GPs. Specify the covariance (kernel) directly. (1) Gaussian kernel: $k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$. (2) Exponential kernel: $k(x, x') = \exp(-\theta |x - x'|)$, which corresponds to the Ornstein-Uhlenbeck process originally introduced to model Brownian motion. [Figure: sample functions drawn from the GP prior under each kernel.]
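A small sketch of how such priors can be visualized: sample functions from a zero-mean GP on a grid under each of the two kernels above. The grid, $\sigma$, $\theta$, and jitter value are assumptions for illustration, not settings from the talk.

```python
# Sketch: draw sample functions from a zero-mean GP prior under the
# Gaussian and exponential (Ornstein-Uhlenbeck) kernels of slide 6.
import numpy as np

def gaussian_kernel(x1, x2, sigma=0.3):
    return np.exp(-(x1 - x2) ** 2 / (2 * sigma ** 2))

def exponential_kernel(x1, x2, theta=3.0):     # OU covariance
    return np.exp(-theta * np.abs(x1 - x2))

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)
for kernel in (gaussian_kernel, exponential_kernel):
    K = kernel(x[:, None], x[None, :]) + 1e-6 * np.eye(len(x))  # jitter
    samples = rng.multivariate_normal(np.zeros(len(x)), K, size=5)
    print(kernel.__name__, samples.shape)      # 5 sample paths each
```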

  7. GP for Regression with Random Noise. If the noise on the observed target values is taken into account: $p(t_n \mid y_n) = \mathcal{N}(t_n \mid y_n, \beta^{-1})$, $p(t \mid y) = \mathcal{N}(t \mid y, \beta^{-1} I_N)$, and $p(y) = \mathcal{N}(y \mid 0, K)$, so $p(t) = \int p(t \mid y)\, p(y)\, dy = \mathcal{N}(t \mid 0, C)$, where $C(x_n, x_m) = k(x_n, x_m) + \beta^{-1}\delta_{nm}$: the covariances simply add. Hint: matrix inversion lemma $(B^{-1} + C D^{-1} C^T)^{-1} = B - B C (D + C^T B C)^{-1} C^T B$.
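As a sketch of the marginal covariance above, with an assumed squared-exponential kernel and an arbitrary noise precision $\beta$, the noise term only adds $\beta^{-1}$ to the diagonal of $K$:

```python
# Sketch: build C = K + beta^{-1} I and draw one vector of noisy targets.
# The kernel width and beta are placeholder values, not from the talk.
import numpy as np

beta = 25.0                                    # noise precision
x = np.linspace(0, 1, 6)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))  # kernel part
C = K + np.eye(len(x)) / beta                  # covariance of the targets t

t = np.random.default_rng(2).multivariate_normal(np.zeros(len(x)), C)
print(t)                                       # one draw of noisy targets
```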


  10. [Figure slide; only plot axes survived extraction.]

  11. Commonly Used Kernel for Regression: $k(x_n, x_m) = \theta_0 \exp\!\left(-\frac{\theta_1}{2}\,\|x_n - x_m\|^2\right) + \theta_2 + \theta_3\, x_n^T x_m$.
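A direct transcription of this kernel into code might look as follows; the $\theta$ values below are arbitrary placeholders, not settings used in the presentation.

```python
# Sketch: the four-parameter kernel of slide 11, evaluated on two inputs.
import numpy as np

def kernel(xn, xm, theta=(1.0, 4.0, 0.0, 0.0)):
    t0, t1, t2, t3 = theta                     # illustrative hyperparameters
    sq = np.sum((xn - xm) ** 2)                # ||x_n - x_m||^2
    return t0 * np.exp(-0.5 * t1 * sq) + t2 + t3 * np.dot(xn, xm)

print(kernel(np.array([0.1]), np.array([0.3])))
```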

  12. GP for Prediction. $p(t_{N+1}) = \mathcal{N}(t_{N+1} \mid 0, C_{N+1})$ with $C_{N+1} = \begin{pmatrix} C_N & k \\ k^T & c \end{pmatrix}$, giving the predictive mean $m(x_{N+1}) = k^T C_N^{-1} t$ and variance $\sigma^2(x_{N+1}) = c - k^T C_N^{-1} k$. If we rewrite $m(x_{N+1}) = \sum_{n=1}^{N} a_n k(x_n, x_{N+1})$ and the kernel function depends only on the distance $\|x_n - x_m\|$, we obtain an expansion in radial basis functions.
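A minimal sketch of these predictive equations, using an assumed squared-exponential kernel, made-up training data, and an arbitrary noise precision:

```python
# Sketch: predictive mean m(x_{N+1}) = k^T C_N^{-1} t and
# variance sigma^2(x_{N+1}) = c - k^T C_N^{-1} k from slide 12.
import numpy as np

def k_se(a, b, sigma=0.3):                     # squared-exponential kernel
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

beta = 100.0                                   # noise precision (assumed)
x = np.array([0.1, 0.4, 0.7, 0.9])             # training inputs (made up)
t = np.sin(2 * np.pi * x)                      # training targets (made up)

C_N = k_se(x[:, None], x[None, :]) + np.eye(len(x)) / beta

def predict(x_new):
    k = k_se(x, x_new)                         # vector k(x_n, x_{N+1})
    c = k_se(x_new, x_new) + 1.0 / beta
    v = np.linalg.solve(C_N, k)                # C_N^{-1} k, via a solve
    mean = v @ t                               # m(x_{N+1})
    var = c - k @ v                            # sigma^2(x_{N+1})
    return mean, var

print(predict(0.5))
```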


  14. [Figure slide; only plot axes survived extraction.]

  15. Computation Time for GP Regression. (1) Training. GP: inversion of an $N \times N$ matrix, $O(N^3) + O(N^2)$. Linear basis function model: inversion of an $M \times M$ matrix, $O(M^3) + O(M^2)$. (2) Prediction. GP: $O(N)$; linear basis function model: $O(M)$. Advantages of GPs: if the number of basis functions is larger than the number of data points, the GP is computationally more efficient; there is no need to construct the basis functions explicitly; and the hyperparameters can be learned (by maximum likelihood estimation).


  17. Automatic Relevance Determination. The previous example does not consider the relative importance of each input dimension. Define a kernel as $k(x, x') = \theta_0 \exp\!\left(-\frac{1}{2}\sum_{i=1}^{2} \gamma_i (x_i - x_i')^2\right)$. Automatically learning the hyperparameters yields ARD, which automatically determines the relative importance of each input dimension.
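A sketch of this ARD kernel for a two-dimensional input; the $\gamma$ values are illustrative. A large $\gamma_i$ makes the kernel sensitive to dimension $i$, while $\gamma_i \to 0$ effectively removes that dimension.

```python
# Sketch: ARD kernel of slide 17 with per-dimension scales gamma_i.
import numpy as np

def ard_kernel(x, xp, theta0=1.0, gamma=(10.0, 0.01)):
    g = np.asarray(gamma)                      # one scale per input dimension
    return theta0 * np.exp(-0.5 * np.sum(g * (x - xp) ** 2))

a = np.array([0.2, 0.9])
b = np.array([0.3, 0.1])
print(ard_kernel(a, b))   # barely sensitive to the second coordinate
```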

  18. GP for Classification. Similar to logistic/probit regression, use a nonlinear activation function to map $(-\infty, +\infty)$ into the probability interval $(0, 1)$. Introduce a latent variable $a$; the target given the latent variable is $p(t \mid a) = \sigma(a)^t (1 - \sigma(a))^{1-t}$, and the latent variables $a$ follow a Gaussian process. For prediction, $p(t_{N+1} = 1 \mid t_N) = \int p(t_{N+1} = 1 \mid a_{N+1})\, p(a_{N+1} \mid t_N)\, da_{N+1}$. Unfortunately, this integral is analytically intractable and must be approximated using sampling methods or analytical approximations.
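One of the workarounds mentioned above is sampling. A crude sketch, assuming $p(a_{N+1} \mid t_N)$ has already been approximated by a Gaussian with placeholder mean and variance, estimates the predictive probability by Monte Carlo:

```python
# Sketch: Monte Carlo estimate of p(t_{N+1}=1 | t_N) = E[sigmoid(a_{N+1})]
# under an assumed Gaussian p(a_{N+1} | t_N) = N(mu, s2). mu, s2 are placeholders.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

mu, s2 = 0.8, 0.5                              # assumed Gaussian over a_{N+1}
a_samples = np.random.default_rng(3).normal(mu, np.sqrt(s2), size=100000)
print(sigmoid(a_samples).mean())               # approx p(t_{N+1} = 1 | t_N)
```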


  21. GP Classification: Prediction. Use a Gaussian approximation to the posterior distribution over $a_{N+1}$: $p(a_{N+1} \mid t_N) = \int p(a_{N+1} \mid a_N)\, p(a_N \mid t_N)\, da_N$, where $p(a_{N+1} \mid a_N) = \mathcal{N}(a_{N+1} \mid k^T C_N^{-1} a_N,\ c - k^T C_N^{-1} k)$. We still need to estimate $p(a_N \mid t_N)$, for which a Gaussian approximation is used. This is reasonable because the shape of a single-mode distribution is often close to Gaussian, and as the number of data points falling in a fixed region of $x$-space increases, the corresponding uncertainty in the function $a(x)$ decreases, asymptotically approaching a Gaussian.
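A sketch of the combination above: if $p(a_N \mid t_N)$ is approximated by $\mathcal{N}(m, S)$, marginalizing the linear-Gaussian conditional $p(a_{N+1} \mid a_N)$ gives a Gaussian over $a_{N+1}$ whose mean and variance can be computed directly. All matrices and vectors below are placeholder numbers, not quantities from the talk.

```python
# Sketch: combine p(a_{N+1} | a_N) = N(k^T C_N^{-1} a_N, c - k^T C_N^{-1} k)
# with an assumed Gaussian approximation p(a_N | t_N) ~ N(m, S).
import numpy as np

C_N = np.array([[1.0, 0.5], [0.5, 1.0]])       # assumed Gram matrix + noise
k = np.array([0.6, 0.3])                       # k(x_n, x_{N+1})
c = 1.0                                        # k(x_{N+1}, x_{N+1}) + noise
m = np.array([0.9, -0.4])                      # Gaussian approx: mean of a_N
S = np.array([[0.2, 0.0], [0.0, 0.2]])         # Gaussian approx: cov of a_N

w = np.linalg.solve(C_N, k)                    # C_N^{-1} k
mean = w @ m                                   # E[a_{N+1} | t_N]
var = c - k @ w + w @ S @ w                    # var[a_{N+1} | t_N]
print(mean, var)
```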

