
SLIDE 1

State Space Gaussian Processes with Non-Gaussian Likelihoods

Hannes Nickisch¹, Arno Solin², Alexander Grigorievskiy²,³

¹Philips Research, ²Aalto University, ³Silo.AI

ICML2018

July 13, 2018

SLIDE 2

Poster #151 Non-Gaussian State Space GPs Nickisch, Solin, Grigorievskiy 2/ 14

Outline

◮ Gaussian processes
◮ Temporal GPs as stochastic differential equations (SDEs)
◮ Learning and inference with Gaussian likelihoods
◮ Speeding up computation of state space model parameters
◮ Non-Gaussian likelihoods
◮ Approximate inference algorithms
◮ Computational primitives and how to compute them
◮ Experiments

SLIDE 3


Gaussian Processes (GPs)

Def: A Gaussian process (GP) is a stochastic process such that, for any finite set of inputs t, the corresponding outputs f(t) are jointly Gaussian: f(t) ∼ N(m(t), K(t, t | θ)). Denoted: f(t) ∼ GP(m(t), k(t, t′ | θ))

◮ Used as a prior over continuous functions in statistical models
◮ Properties (e.g. smoothness) are determined by the covariance function k(t, t′ | θ)
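As a quick illustration of the definition (not from the slides): a minimal NumPy sketch that builds a covariance matrix and draws sample paths from a zero-mean GP prior. The squared-exponential kernel and its hyperparameters here are arbitrary choices for illustration.

```python
import numpy as np

def se_kernel(t, tp, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(t, t') = s2 * exp(-(t - t')^2 / (2 l^2))."""
    d = t[:, None] - tp[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Sample paths from the zero-mean GP prior f(t) ~ GP(0, k)
t = np.linspace(0.0, 5.0, 100)
K = se_kernel(t, t) + 1e-9 * np.eye(len(t))  # jitter for numerical stability
samples = np.random.default_rng(0).multivariate_normal(np.zeros(len(t)), K, size=3)
```

Changing the lengthscale or swapping the kernel changes the smoothness of the sampled functions, which is the sense in which the covariance function determines the GP's properties.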

SLIDE 4


Temporal Gaussian Processes

◮ Input data is 1-D, usually time
◮ Fully probabilistic (Bayesian) approach
◮ Structural components are conveniently combined by covariance operations
◮ Applicable to unevenly sampled data

Challenges:

◮ Large datasets
◮ Non-Gaussian likelihoods


SLIDE 6


GP as a Stochastic Differential Equation (SDE)

Addressing challenge 1 (large datasets).

Given a 1-D time series {y_i, t_i}, i = 1, …, N:

◮ Gaussian process model:
  f(t) ∼ GP(m(t), k(t, t′))  (GP prior)
  y | f ∼ ∏_{i=1}^N P(y_i | f(t_i))  (likelihood)
◮ Latent posterior:
  Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)
◮ Equivalent stochastic differential equation (SDE) [3]:
  df(t)/dt = F f(t) + L w(t),  f(0) ∼ N(0, P∞)
  y | f ∼ ∏_{i=1}^N P(y_i | H f(t_i))
◮ f(t) = H f(t), i.e. the GP value is a linear readout of the state vector f(t)
◮ w(t) is multidimensional white noise
◮ F, L, H, P∞ are determined by the covariance function k [3]
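For concreteness, here is what F, L, H and P∞ look like for the Matérn-3/2 covariance, a standard closed-form case from [3]. This is a sketch for exposition, not the paper's implementation:

```python
import numpy as np

def matern32_to_state_space(lengthscale, variance):
    """State space form (F, L, H, Pinf) of the Matern-3/2 covariance
    k(tau) = variance * (1 + lam*|tau|) * exp(-lam*|tau|), lam = sqrt(3)/lengthscale.
    The spectral density of the driving white noise w(t) is q = 4 * lam**3 * variance."""
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0],
                  [-lam**2, -2.0 * lam]])        # feedback matrix
    L = np.array([[0.0], [1.0]])                 # noise effect matrix
    H = np.array([[1.0, 0.0]])                   # measurement model: f(t) = H f_state(t)
    Pinf = np.array([[variance, 0.0],
                     [0.0, lam**2 * variance]])  # stationary state covariance
    return F, L, H, Pinf
```

P∞ is the stationary solution of the Lyapunov equation F P∞ + P∞ F⊤ + q L L⊤ = 0, which is one way to sanity-check such a conversion.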


SLIDE 8


Inference and Learning with Gaussian likelihood

Gaussian likelihood: P(y_i | f(t_i)) = N(y_i | f(t_i), σ_n²)

◮ Posterior parameters:
  W = σ_n⁻² I_n,  α = (K + W⁻¹)⁻¹ (y − m)
◮ Evidence:
  log Z_GPR = −½ α⊤(y − m) − ½ log|K + W⁻¹| − (N/2) log(2π)
◮ The naïve approach has O(N³) complexity
◮ Solve the SDE between time points (equivalent discrete-time model):
  f_i = A_{i−1} f_{i−1} + q_{i−1},  q_{i−1} ∼ N(0, Q_{i−1})
  y_i = H f_i + ε_i,  ε_i ∼ N(0, σ_n²)
◮ Parameters of the discrete model:
  A_i = A[Δt_i] = e^{Δt_i F},  Q_i = P∞ − A_i P∞ A_i⊤
◮ Inference and learning by the Kalman filter (KF) and Rauch–Tung–Striebel (RTS) smoother in O(N) complexity
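The O(N) evidence computation above can be sketched as a Kalman filter over the discrete-time model, assuming a zero prior mean and a stationary state space model (all names and hyperparameters here are illustrative, not the paper's code):

```python
import numpy as np
from scipy.linalg import expm

def kalman_loglik(t, y, F, H, Pinf, sigma2_n):
    """O(N) log marginal likelihood log Z_GPR for the discrete-time model
    f_i = A_{i-1} f_{i-1} + q_{i-1}, y_i = H f_i + eps_i, where
    A_i = expm(dt_i * F) and Q_i = Pinf - A_i Pinf A_i^T (stationary model)."""
    m, P = np.zeros(F.shape[0]), Pinf.copy()
    ll, t_prev = 0.0, None
    for t_i, y_i in zip(t, y):
        if t_prev is not None:                    # predict to the next time point
            A = expm((t_i - t_prev) * F)
            m = A @ m
            P = A @ P @ A.T + (Pinf - A @ Pinf @ A.T)
        t_prev = t_i
        v = y_i - (H @ m)[0]                      # innovation
        S = (H @ P @ H.T)[0, 0] + sigma2_n        # innovation variance
        k = (P @ H.T)[:, 0] / S                   # Kalman gain
        m = m + k * v                             # measurement update
        P = P - np.outer(k, (H @ P)[0])
        ll += -0.5 * (np.log(2.0 * np.pi * S) + v**2 / S)
    return ll
```

For kernels with an exact state space form (e.g. Matérn), this filter reproduces the dense O(N³) GP evidence exactly, up to floating-point error.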


SLIDE 10


Fast computation of Ai and Qi by interpolation

Problem:

◮ When there are many distinct Δt_i, computing the discrete-time parameters can be slow

Solution:

◮ ψ: s ↦ e^{sX} is a smooth mapping, hence interpolate (similar to KISS-GP [4])
◮ Evaluate ψ on an equispaced grid s_1, s_2, …, s_K, where s_j = s_0 + j · Δs
◮ Use 4-point interpolation: A ≈ c_1 A_{j−1} + c_2 A_j + c_3 A_{j+1} + c_4 A_{j+2}, where the coefficients c_1, …, c_4 are efficiently computable

[Figure: evaluation time (s) vs. number of training inputs n, comparing the naïve approach against the state space method with K = 2000 and K = 10 grid points; mean ± min/max errors visualized.]
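A sketch of the interpolation idea, using Lagrange cubic weights as one concrete choice of 4-point coefficients (the grid parameters and function names are illustrative, not the paper's):

```python
import numpy as np
from scipy.linalg import expm

def interp_expm(F, dts, s0, ds, K):
    """Approximate A(dt) = expm(dt * F) by 4-point (cubic) interpolation
    of values precomputed on the equispaced grid s_j = s0 + j * ds."""
    grid = s0 + ds * np.arange(K)
    A_grid = [expm(s * F) for s in grid]          # K matrix exponentials, computed once
    out = []
    for dt in dts:
        j = int(np.clip(np.floor((dt - s0) / ds), 1, K - 3))
        u = (dt - grid[j]) / ds                   # local coordinate, typically in [0, 1)
        c = (-u * (u - 1) * (u - 2) / 6.0,        # Lagrange cubic weights for the
             (u + 1) * (u - 1) * (u - 2) / 2.0,   # nodes j-1, j, j+1, j+2
             -(u + 1) * u * (u - 2) / 2.0,
             (u + 1) * u * (u - 1) / 6.0)
        out.append(sum(ci * A_grid[j - 1 + i] for i, ci in enumerate(c)))
    return out
```

The point is that K matrix exponentials plus cheap weighted sums replace one matrix exponential per distinct Δt_i, which pays off when there are many unevenly spaced time points.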

SLIDE 11


Non-Gaussian Likelihoods

Addressing challenge 2 (non-Gaussian likelihoods).

Posterior as a Gaussian approximation:

Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)

◮ Laplace approximation (LA)
◮ Variational Bayes (VB)
◮ Direct Kullback–Leibler minimization (KL)
◮ Assumed Density Filtering (ADF), a.k.a. single-sweep Expectation Propagation (EP)

Laplace approximation:

◮ log P(f | D) ∝ log P(y | f) + log P(f | t)
◮ Find the mode f̂ of this function by Newton's method
◮ The negative Hessian of the log likelihood at the mode gives the precision term W = −∇² log P(y | f̂)
◮ log Z_LA = −½ [α⊤ mvm_K(α) + ld_K(W) − 2 Σ_i log P(y_i | f̂_i)]
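A dense-matrix sketch of the LA mode search, with a Poisson likelihood as the example and following the numerically stable Newton iteration of [2] (Alg. 3.1). This is for exposition only; the paper computes the same quantities through the state space form:

```python
import numpy as np

def laplace_mode(K, y, lik_grads, n_iter=30):
    """Newton iterations for the Laplace approximation: maximize
    log P(y | f) - 0.5 f^T K^{-1} f (zero-mean GP prior).
    lik_grads(f, y) returns (grad, w) with w = -diag Hessian of log P(y | f)."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        g, w = lik_grads(f, y)
        sw = np.sqrt(w)
        B = np.eye(n) + sw[:, None] * K * sw[None, :]    # well-conditioned matrix
        b = w * f + g
        a = b - sw * np.linalg.solve(B, sw * (K @ b))    # stable Newton step
        f = K @ a                                        # f = (K^{-1} + W)^{-1} b
    return f, w

def poisson_grads(f, y):
    """Gradient and curvature of log P(y_i | f_i) = y_i f_i - exp(f_i) - log(y_i!)."""
    mu = np.exp(f)
    return y - mu, mu
```

At convergence f̂ satisfies the stationarity condition K⁻¹ f̂ = ∇ log P(y | f̂), and W = exp(f̂) is exactly the negative Hessian term used in the precision (K⁻¹ + W).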

SLIDE 12


Computational Primitives

The following computational primitives allow one to cast the covariance approximation in more generic terms:

◮ Linear system solving: solve_K(W, r) := (K + W⁻¹)⁻¹ r
◮ Matrix-vector multiplications: mvm_K(r) := K r
◮ Log-determinants: ld_K(W) := log|B| with well-conditioned B = I + W^{1/2} K W^{1/2}
◮ Predictions need the latent mean E[f∗] and variance V[f∗]
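Dense O(N³) reference versions of the first three primitives, for a diagonal W (the paper's contribution is computing the same quantities in O(N) via the state space form; these are only to pin down the definitions):

```python
import numpy as np

def mvm_K(K, r):
    """Matrix-vector multiplication mvm_K(r) := K r."""
    return K @ r

def solve_K(K, w, r):
    """Linear system solve solve_K(W, r) := (K + W^{-1})^{-1} r with W = diag(w)."""
    return np.linalg.solve(K + np.diag(1.0 / w), r)

def ld_K(K, w):
    """Log-determinant ld_K(W) := log|B|, B = I + W^{1/2} K W^{1/2}."""
    sw = np.sqrt(w)
    B = np.eye(len(w)) + sw[:, None] * K * sw[None, :]
    return np.linalg.slogdet(B)[1]
```

B is used instead of K + W⁻¹ because it stays well-conditioned even when entries of W⁻¹ blow up; the two are related by log|B| = log|K + W⁻¹| + log|W|.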

SLIDE 13


Tackling computational primitives

Using the state space form of temporal GPs.

SpInGP:

◮ The first two computational primitives are calculated using the SpInGP [5] approach
◮ The idea: use the state space form to compose the inverse of the covariance matrix, which turns out to be block-tridiagonal

KF and RTS smoothing:

◮ The last two primitives are solved by Kalman filtering and RTS smoothing
◮ Predictions are computed by primitive 4 and then by propagation through the likelihood

Comments:

◮ Derivatives of the computational primitives, required for learning, are computed in a similar way
◮ SpInGP involves computations with block-tridiagonal matrices; these are similar to KF and RTS smoothing (see the appendix of [1])

SLIDE 14


Experiments 2-3

Experiments are designed to emphasize the paper's findings and statements:

1. A robust regression study (Student's t likelihood) with n = 34,154 observations
2. Numerical effects in non-Gaussian likelihoods
SLIDE 15


Experiment 4

◮ A new, interesting data set of commercial airline accident dates scraped from Wikipedia [6]
◮ Accidents over a time span of ∼100 years, n = 35,959 days
◮ We model the accident intensity as a log Gaussian Cox process (Poisson likelihood)
◮ The GP prior is set up as: k(t, t′) = k_Mat.(t, t′) + k_per.(t, t′) k_Mat.(t, t′)
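A sketch of this prior's structure: a Matérn trend term plus a quasi-periodic product term (periodic × Matérn). All hyperparameter values below are hypothetical placeholders, not the ones used in the experiment:

```python
import numpy as np

def matern32(tau, lengthscale, variance):
    """Matern-3/2 covariance as a function of the lag tau."""
    a = np.sqrt(3.0) / lengthscale * np.abs(tau)
    return variance * (1.0 + a) * np.exp(-a)

def periodic(tau, period, lengthscale_p):
    """Standard periodic covariance (exp-sine-squared)."""
    return np.exp(-2.0 * np.sin(np.pi * tau / period) ** 2 / lengthscale_p**2)

def accident_prior(t, tp):
    """k(t, t') = k_Mat(t, t') + k_per(t, t') * k_Mat(t, t');
    hypothetical lengthscales: a slow trend plus a yearly quasi-periodic part."""
    tau = t[:, None] - tp[None, :]
    return (matern32(tau, 200.0, 1.0)
            + periodic(tau, 365.25, 1.0) * matern32(tau, 2000.0, 0.5))
```

Sums and products of valid covariance functions are again valid covariance functions, which is the "combining structural components by covariance operations" property mentioned earlier.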

SLIDE 16


Conclusions

◮ This paper brings together research done in state space GPs and non-Gaussian approximate inference
◮ We improve stability and provide additional speed-up by fast computation of the state space model parameters
◮ We provide unifying code for all approaches in the GPML toolbox v4.2 [7]
◮ Visit our poster: #151

SLIDE 17


References

[1] H. Nickisch, A. Solin, and A. Grigorievskiy (2018). State Space Gaussian Processes with Non-Gaussian Likelihood. In ICML.
[2] C. E. Rasmussen and C. K. I. Williams (2006). Gaussian Processes for Machine Learning. The MIT Press.
[3] J. Hartikainen and S. Särkkä (2010). Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In MLSP.
[4] A. G. Wilson and H. Nickisch (2015). Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP). In ICML.
[5] A. Grigorievskiy, N. Lawrence, and S. Särkkä (2017). Parallelizable Sparse Inverse Formulation Gaussian Processes (SpInGP). In MLSP.
[6] Wikipedia (2018). List of accidents and incidents involving commercial aircraft. URL https://en.wikipedia.org/wiki/List_of_accidents_and_incidents_involving_commercial_aircraft
[7] C. E. Rasmussen and H. Nickisch (2010). Gaussian Processes for Machine Learning (GPML) Toolbox. In JMLR.