Fitting Covariance and Multioutput Gaussian Processes

Neil D. Lawrence
GPSS, 13th September 2016

Outline: Constructing Covariance, GP Limitations, Kalman Filter


  1–2. Learning Covariance Parameters Can we determine covariance parameters from the data? $N(y \mid 0, K) = \frac{1}{(2\pi)^{n/2}|K|^{1/2}} \exp\left(-\frac{1}{2} y^\top K^{-1} y\right)$ The parameters are inside the covariance function (matrix): $k_{i,j} = k(x_i, x_j; \theta)$.

  3. Learning Covariance Parameters Taking the log gives $\log N(y \mid 0, K) = -\frac{1}{2}\log|K| - \frac{1}{2} y^\top K^{-1} y - \frac{n}{2}\log 2\pi$. The parameters are inside the covariance function (matrix): $k_{i,j} = k(x_i, x_j; \theta)$.

  4. Learning Covariance Parameters The negative log likelihood (up to a constant) gives the objective function $E(\theta) = \frac{1}{2}\log|K| + \frac{1}{2} y^\top K^{-1} y$. The parameters are inside the covariance function (matrix): $k_{i,j} = k(x_i, x_j; \theta)$.
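As an aside (not from the slides), a minimal numpy sketch of evaluating this objective for an exponentiated quadratic covariance; the function names, kernel parameters, and the added noise/jitter term are illustrative choices:

```python
import numpy as np

def exp_quad_kernel(X, lengthscale=1.0, variance=1.0, noise=1e-6):
    """Exponentiated quadratic covariance k(x_i, x_j; theta) with noise on the diagonal."""
    sq_dists = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2) + noise * np.eye(len(X))

def objective(K, y):
    """E(theta) = 0.5 log|K| + 0.5 y^T K^{-1} y, computed via the Cholesky factor."""
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))             # log|K| = 2 sum_i log L_ii
    return 0.5 * log_det + 0.5 * y @ alpha

X = np.linspace(-2, 2, 20)
y = np.sin(3 * X) + 0.1 * np.random.randn(20)
print(objective(exp_quad_kernel(X, lengthscale=0.5, noise=0.01), y))
```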

  5. Eigendecomposition of Covariance A useful decomposition for understanding the objective function: $K = R \Lambda^2 R^\top$, where $\Lambda$ is a diagonal matrix and $R^\top R = I$. The diagonal of $\Lambda$ represents distance along the axes; $R$ gives a rotation of these axes.

  6–11. Capacity control: $\log|K|$. In two dimensions, $\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ and $|\Lambda| = \lambda_1 \lambda_2$. (Figure: ellipse with axes $\lambda_1$ and $\lambda_2$ and area proportional to $|\Lambda|$.)

  12–13. Capacity control: $\log|K|$. In three dimensions, $\Lambda = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}$ and $|\Lambda| = \lambda_1 \lambda_2 \lambda_3$.

  14–15. Capacity control: $\log|K|$. Applying the rotation $R$ gives $R\Lambda = \begin{bmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}$, but the determinant is unchanged: $|R\Lambda| = \lambda_1 \lambda_2$.
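As a quick numerical check (my own addition, not in the slides), the determinant term can be read off the eigenvalues: with $K = R\Lambda^2 R^\top$, $\log|K| = 2\sum_i \log \lambda_i$, which is the capacity-control term in $E(\theta)$. The kernel and its parameters below are illustrative:

```python
import numpy as np

# Small positive definite covariance from an exponentiated quadratic kernel (illustrative).
X = np.linspace(0, 1, 4)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3 ** 2) + 1e-8 * np.eye(4)

eigvals, R = np.linalg.eigh(K)      # K = R diag(eigvals) R^T, with R^T R = I
Lambda = np.sqrt(eigvals)           # eigvals are the entries of Lambda^2

# log|K| is twice the sum of log(lambda_i).
print(np.log(np.linalg.det(K)), 2.0 * np.sum(np.log(Lambda)))
print(np.allclose(K, R @ np.diag(Lambda ** 2) @ R.T))
```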

  16–18. Data Fit: $\frac{1}{2} y^\top K^{-1} y$. (Figure: data plotted in the $(y_1, y_2)$ plane together with covariance contours whose eigen-axes are $\lambda_1$ and $\lambda_2$.)

  19–27. Learning Covariance Parameters Can we determine length scales and noise levels from the data? $E(\theta) = \frac{1}{2}\log|K| + \frac{1}{2} y^\top K^{-1} y$ (Figure: the objective plotted against the length scale $\ell$, together with the corresponding GP fits to $y(x)$ as $\ell$ varies.)
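A sketch of how the length scale and noise level might be fitted by minimising $E(\theta)$; the exponentiated quadratic kernel, the log-space parameterisation, and the use of scipy.optimize are my own choices rather than anything prescribed by the slides:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(log_params, X, y):
    """E(theta) for an exponentiated quadratic kernel plus observation noise."""
    lengthscale, noise = np.exp(log_params)              # log space keeps both positive
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / lengthscale ** 2)
    K += noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return np.sum(np.log(np.diag(L))) + 0.5 * y @ alpha  # 0.5 log|K| + 0.5 y^T K^-1 y

X = np.linspace(-2, 2, 30)
y = np.sin(2 * X) + 0.2 * np.random.randn(30)

# Different initialisations can converge to different local optima of E(theta).
res = minimize(neg_log_lik, x0=np.log([1.0, 0.1]), args=(X, y))
print("length scale:", np.exp(res.x[0]), "noise variance:", np.exp(res.x[1]))
```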

  28. Gene Expression Example ◮ Given expression levels in the form of a time series from Della Gatta et al. (2008). ◮ To detect whether a gene is expressed or not, fit a GP to each gene (Kalaitzis and Lawrence, 2011).

  29. Kalaitzis and Lawrence, BMC Bioinformatics 2011, 12:180. http://www.biomedcentral.com/1471-2105/12/180 RESEARCH ARTICLE, Open Access. A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression. Alfredo A Kalaitzis and Neil D Lawrence. Abstract Background: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process. Results: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach which can be used to filter quiet genes or, for the case of time series in the form of expression ratios, quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). We compare on both simulated and experimental data, showing that the proposed approach considerably outperforms the current state of the art.
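As a rough sketch of this style of analysis (using the GPy library, which is not necessarily what the paper used; the kernel choices, score, and toy data are illustrative only), one could rank genes by how much a smooth GP model improves on a noise-only model:

```python
import numpy as np
import GPy

def gp_score(t, y):
    """Log likelihood ratio of a smooth GP model against a noise-only model for one gene."""
    t, y = t.reshape(-1, 1), y.reshape(-1, 1)

    # Time-varying model: RBF (exponentiated quadratic) covariance plus observation noise.
    m_rbf = GPy.models.GPRegression(t, y, GPy.kern.RBF(1))
    m_rbf.optimize()

    # "Quiet gene" model: constant level plus observation noise only.
    m_flat = GPy.models.GPRegression(t, y, GPy.kern.Bias(1))
    m_flat.optimize()

    return m_rbf.log_likelihood() - m_flat.log_likelihood()

# Rank genes by how much better the temporal model explains each profile.
t = np.arange(7.0)
expression = {"geneA": np.sin(t / 2.0), "geneB": 0.05 * np.random.randn(len(t))}
ranking = sorted(expression, key=lambda g: gp_score(t, expression[g]), reverse=True)
print(ranking)
```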

  30. (Figure: contour plot of the Gaussian process likelihood over $\log_{10}$ length scale and $\log_{10}$ SNR.)

  31. (Figure: likelihood contours and the corresponding GP fit to $y(x)$.) Optima: length scale of 1.2221 and $\log_{10}$ SNR of 1.9654; log likelihood is -0.22317.

  32. (Figure: likelihood contours and the corresponding GP fit to $y(x)$.) Optima: length scale of 1.5162 and $\log_{10}$ SNR of 0.21306; log likelihood is -0.23604.

  33. (Figure: likelihood contours and the corresponding GP fit to $y(x)$.) Optima: length scale of 2.9886 and $\log_{10}$ SNR of -4.506; log likelihood is -2.1056.

  34. Outline Constructing Covariance GP Limitations Kalman Filter

  35. Limitations of Gaussian Processes ◮ Inference is $O(n^3)$ due to the matrix inverse (in practice use Cholesky). ◮ Gaussian processes don’t deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images). ◮ The widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!).

  36. Outline Constructing Covariance GP Limitations Kalman Filter

  37. Simple Markov Chain ◮ Assume a 1-d latent state, a vector over time, $x = [x_1 \ldots x_T]$. ◮ Markov property: $x_i = x_{i-1} + \epsilon_i$, $\epsilon_i \sim N(0, \alpha) \implies x_i \sim N(x_{i-1}, \alpha)$. ◮ Initial state: $x_0 \sim N(0, \alpha_0)$. ◮ If $x_0 \sim N(0, \alpha)$ we have a Markov chain for the latent states. ◮ The Markov chain is specified by an initial distribution (Gaussian) and a transition distribution (Gaussian).
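A minimal sketch of sampling this Gauss-Markov chain (my own, with an arbitrary seed, so the numbers differ from the walkthrough on the following slides):

```python
import numpy as np

def sample_gauss_markov(T=10, alpha=1.0, seed=0):
    """Sample x_i = x_{i-1} + eps_i with eps_i ~ N(0, alpha) and x_0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(alpha), size=T)
    x = np.zeros(T + 1)
    for i in range(1, T + 1):
        x[i] = x[i - 1] + eps[i - 1]     # Markov property: x_i ~ N(x_{i-1}, alpha)
    return x

print(sample_gauss_markov())
```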

  38–46. Gauss Markov Chain (Figure: the sampled chain $x$ plotted against $t$.) With $x_0 = 0$ and $\epsilon_i \sim N(0, 1)$, the chain unfolds one step at a time, the value added at step $i$ being the sampled $\epsilon_i$:
  $x_1 = 0.000 - 2.24 = -2.24$
  $x_2 = -2.24 + 0.457 = -1.78$
  $x_3 = -1.78 + 0.178 = -1.6$
  $x_4 = -1.6 - 0.292 = -1.89$
  $x_5 = -1.89 - 0.501 = -2.39$
  $x_6 = -2.39 + 1.32 = -1.08$
  $x_7 = -1.08 + 0.989 = -0.0881$
  $x_8 = -0.0881 - 0.842 = -0.93$
  $x_9 = -0.93 - 0.410 = -1.34$

  47. Multivariate Gaussian Properties: Reminder If $z \sim N(\mu, C)$ and $x = Wz + b$ then $x \sim N(W\mu + b, WCW^\top)$.

  48. Multivariate Gaussian Properties: Reminder Simplified: if $z \sim N(0, \sigma^2 I)$ and $x = Wz$ then $x \sim N(0, \sigma^2 WW^\top)$.
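A quick Monte Carlo check of the affine property (the dimensions, matrices, and sample size below are arbitrary illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
C = np.array([[2.0, 0.3], [0.3, 1.0]])
W = np.array([[1.0, 0.5], [0.0, 2.0], [1.0, 1.0]])
b = np.array([0.5, 0.0, -1.0])

z = rng.multivariate_normal(mu, C, size=500_000)   # z ~ N(mu, C)
x = z @ W.T + b                                    # x = W z + b

# Empirical mean and covariance should match N(W mu + b, W C W^T).
print(np.allclose(x.mean(axis=0), W @ mu + b, atol=0.05))
print(np.allclose(np.cov(x.T), W @ C @ W.T, atol=0.05))
```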

  49–53. Matrix Representation of Latent Variables $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}$ so that $x_1 = \epsilon_1$, $x_2 = \epsilon_1 + \epsilon_2$, $x_3 = \epsilon_1 + \epsilon_2 + \epsilon_3$, $x_4 = \epsilon_1 + \epsilon_2 + \epsilon_3 + \epsilon_4$, $x_5 = \epsilon_1 + \epsilon_2 + \epsilon_3 + \epsilon_4 + \epsilon_5$.

  54. Matrix Representation of Latent Variables In matrix form: $x = L_1 \epsilon$, where $L_1$ is the lower triangular matrix of ones above.
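A small numpy check (illustrative, not from the slides) that multiplying by the lower triangular matrix of ones produces the cumulative sums above:

```python
import numpy as np

T = 5
L1 = np.tril(np.ones((T, T)))                 # lower triangular matrix of ones
eps = np.random.default_rng(2).normal(size=T)

x = L1 @ eps                                  # x_i = eps_1 + ... + eps_i
print(np.allclose(x, np.cumsum(eps)))
```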

  55. Multivariate Process ◮ Since $x$ is linearly related to $\epsilon$ we know $x$ is also a Gaussian process. ◮ Simply invoke our properties of multivariate Gaussian densities.

  56–59. Latent Process $x = L_1 \epsilon$, $\epsilon \sim N(0, \alpha I) \implies x \sim N(0, \alpha L_1 L_1^\top)$.
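As an aside not stated on the slides (though it follows directly from the matrix of ones), the implied covariance $\alpha L_1 L_1^\top$ has entries $\alpha \min(i, j)$, so the marginal variance grows along the chain. A quick check:

```python
import numpy as np

T, alpha = 5, 1.0
L1 = np.tril(np.ones((T, T)))

K = alpha * L1 @ L1.T                          # covariance of x = L1 eps, eps ~ N(0, alpha I)

i, j = np.indices((T, T))
print(np.allclose(K, alpha * np.minimum(i + 1, j + 1)))   # K[i, j] = alpha * min(i+1, j+1)
```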
