  1. State Space Gaussian Processes with Non-Gaussian Likelihoods
Hannes Nickisch (1), Arno Solin (2), Alexander Grigorievskiy (2, 3)
(1) Philips Research, (2) Aalto University, (3) Silo.AI
ICML 2018, July 13, 2018

  2. Outline
◮ Gaussian processes
◮ Temporal GPs as stochastic differential equations (SDEs)
◮ Learning and inference with Gaussian likelihoods
◮ Speeding up computation of the state space model parameters
◮ Non-Gaussian likelihoods
◮ Approximate inference algorithms
◮ Computational primitives and how to compute them
◮ Experiments

  3. Gaussian Processes (GPs)
Def: A Gaussian process (GP) is a stochastic process such that for any finite set of inputs t the corresponding outputs f are jointly Gaussian, f ∼ N(m(t), K(t, t′ | θ)). Denoted: f(t) ∼ GP(m(t), k(t, t′ | θ)).
◮ Used as a prior over continuous functions in statistical models
◮ Properties (e.g. smoothness) are determined by the covariance function k(t, t′ | θ)
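To make the definition concrete, here is a minimal sketch (not from the slides) of drawing sample paths from a zero-mean GP prior; the squared-exponential covariance and its hyperparameters are illustrative assumptions:

```python
import numpy as np

def k_se(t1, t2, ell=1.0, s2=1.0):
    """Squared-exponential covariance k(t, t') = s2 exp(-(t - t')^2 / (2 ell^2))."""
    d = t1[:, None] - t2[None, :]
    return s2 * np.exp(-0.5 * (d / ell) ** 2)

t = np.linspace(0.0, 10.0, 200)
K = k_se(t, t) + 1e-9 * np.eye(t.size)    # jitter for a stable Cholesky
L = np.linalg.cholesky(K)
samples = L @ np.random.randn(t.size, 3)  # three draws f ~ N(0, K)
```

Swapping in a different covariance (e.g. a Matérn) changes the smoothness of the draws, which is the sense in which k(t, t′ | θ) determines the properties of the prior.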

  4. Temporal Gaussian Processes
◮ Input data is 1-D, usually time
◮ Fully probabilistic (Bayesian) approach
◮ Structural components are conveniently combined by covariance operations
Challenges:
◮ Large datasets
◮ Non-Gaussian likelihoods
◮ Applicability to unevenly sampled data


  5. GP as a Stochastic Differential Equation (SDE) (addressing challenge 1)
Given a 1-D time series {y_i, t_i}, i = 1, ..., N:
◮ Gaussian process model:
    f(t) ∼ GP(m(t), k(t, t′))  (GP prior)
    y | f ∼ ∏_{i=1}^{n} P(y_i | f(t_i))  (likelihood)
◮ Equivalent stochastic differential equation (SDE) [3]:
    df(t)/dt = F f(t) + L w(t),  f_0 ∼ N(0, P∞)
    y | f ∼ ∏_{i=1}^{n} P(y_i | H f(t_i))
◮ The scalar process is read out of the state vector, f(t) = H f(t), where w(t) is multidimensional white noise
◮ F, L, H, P∞ are determined from the covariance k [3] (a concrete example follows below)
◮ Latent posterior: Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)
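As an illustration of how F, L, H, and P∞ follow from the covariance, below is a hedged sketch for one concrete kernel: the Matérn-3/2 covariance k(r) = s2 (1 + λ|r|) exp(-λ|r|), λ = √3/ℓ, has a well-known two-dimensional companion-form state space representation. The function name and hyperparameters are ours, not from the paper:

```python
import numpy as np

def matern32_to_ss(ell=1.0, s2=1.0):
    """State space form (F, L, H, q, Pinf) of the Matern-3/2 covariance."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0],
                  [-lam**2, -2.0 * lam]])  # drift matrix
    L = np.array([[0.0], [1.0]])           # noise gain
    H = np.array([[1.0, 0.0]])             # f(t) = H f(t): observe first state
    q = 4.0 * lam**3 * s2                  # white-noise spectral density
    Pinf = np.diag([s2, lam**2 * s2])      # stationary cov: F Pinf + Pinf F' + L q L' = 0
    return F, L, H, q, Pinf
```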


  6. Inference and Learning with a Gaussian Likelihood
Gaussian likelihood: P(y_i | f(t_i)) = N(y_i | f(t_i), σ_n²)
◮ Solve the SDE between time points (equivalent discrete-time model):
    f_i = A_{i-1} f_{i-1} + q_{i-1},  q_{i-1} ∼ N(0, Q_{i-1})
    y_i = H f_i + ε_i,  ε_i ∼ N(0, σ_n²)
◮ Parameters of the discrete model:
    A_i = A[Δt_i] = e^{Δt_i F},  Q_i = P∞ − A_i P∞ A_iᵀ
◮ Posterior parameters:
    W = σ_n^{-2} I,  α = (K + W⁻¹)⁻¹ (y − m)
◮ Evidence:
    log Z_GPR = −½ αᵀ(y − m) − ½ log |K + W⁻¹| − (N/2) log(2π)
◮ Inference and learning by the Kalman filter (KF) and Rauch-Tung-Striebel (RTS) smoother in O(N) complexity (a sketch follows below)
◮ The naïve approach has O(N³) complexity
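A minimal sketch of this O(N) recursion, assuming the state space matrices from the Matérn-3/2 sketch above and zero prior mean: discretize between (possibly uneven) time points with A_i = e^{Δt_i F} and Q_i = P∞ − A_i P∞ A_iᵀ, then run a Kalman filter that accumulates the log evidence. This is a plain reference implementation, not the paper's code:

```python
import numpy as np
from scipy.linalg import expm

def kalman_loglik(t, y, F, H, Pinf, sn2):
    """Log marginal likelihood log Z_GPR for a zero-mean state space GP."""
    m = np.zeros((F.shape[0], 1))
    P = Pinf.copy()
    ll = 0.0
    prev = t[0]
    for ti, yi in zip(t, y):
        dt = ti - prev
        if dt > 0:                             # predict through the SDE
            A = expm(dt * F)                   # A_i = e^{dt F}
            Q = Pinf - A @ Pinf @ A.T          # Q_i = Pinf - A Pinf A'
            m = A @ m
            P = A @ P @ A.T + Q
        prev = ti
        v = (yi - H @ m).item()                # innovation
        s = (H @ P @ H.T).item() + sn2         # innovation variance
        ll += -0.5 * (np.log(2.0 * np.pi * s) + v * v / s)
        k = (P @ H.T) / s                      # Kalman gain
        m = m + k * v
        P = P - k @ (H @ P)
    return ll
```

Up to numerics, this matches the dense log Z_GPR from the slide while never forming the N × N matrix K.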


  7. Fast Computation of A_i and Q_i by Interpolation
Problem:
◮ When there are many distinct Δt_i, computing the parameters A_i and Q_i can be slow
Solution:
◮ ψ: s ↦ e^{sX} is a smooth mapping, hence amenable to interpolation (similar to KISS-GP [4])
◮ Evaluate ψ on an equispaced grid s_1, s_2, ..., s_K, where s_j = s_0 + j·Δs
◮ Use 4-point interpolation: A ≈ c_1 A_{j-1} + c_2 A_j + c_3 A_{j+1} + c_4 A_{j+2}; the coefficients {c_i}, i = 1..4, are efficiently computable (see the sketch below)
[Figure: evaluation time (s) vs. number of training inputs n, comparing the naïve approach with the state space method for K = 2000 and K = 10; mean ± min/max errors visualized.]
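A hedged sketch of the interpolation trick: precompute A_j = e^{s_j F} on the equispaced grid and combine four neighbours with cubic-convolution weights. The Catmull-Rom weights below are one plausible, efficiently computable choice of {c_i}; the paper's exact scheme may differ:

```python
import numpy as np
from scipy.linalg import expm

def build_grid(F, s0, ds, K):
    """Precompute A_j = expm(s_j F) on the grid s_j = s0 + j*ds, j = 0..K-1."""
    return np.stack([expm((s0 + j * ds) * F) for j in range(K)])

def interp_expm(As, s0, ds, dt):
    """Approximate expm(dt*F) as c1 A_{j-1} + c2 A_j + c3 A_{j+1} + c4 A_{j+2}."""
    x = (dt - s0) / ds
    j = int(np.floor(x))                      # left grid index; assumes 1 <= j <= K-3
    u = x - j                                 # fractional offset in [0, 1)
    c = np.array([(-u**3 + 2*u**2 - u) / 2,   # Catmull-Rom cubic weights;
                  (3*u**3 - 5*u**2 + 2) / 2,  # they sum to 1, and u = 0
                  (-3*u**3 + 4*u**2 + u) / 2, # recovers A_j exactly
                  (u**3 - u**2) / 2])
    return np.tensordot(c, As[j-1:j+3], axes=1)
```

Each interpolated A then costs four scaled matrix additions instead of a fresh matrix exponential, which is where the speed-up over the naïve evaluation comes from.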

  8. Non-Gaussian Likelihoods (addressing challenge 2)
Posterior as a Gaussian approximation: Q(f | D) = N(f | m + Kα, (K⁻¹ + W)⁻¹)
Approximate inference schemes:
◮ Laplace approximation (LA)
◮ Variational Bayes (VB)
◮ Direct Kullback-Leibler minimization (KL)
◮ Assumed density filtering (ADF), a.k.a. single-sweep expectation propagation (EP)
Laplace approximation:
◮ log P(f | D) = log P(y | f) + log P(f | t) + const
◮ Find the mode f̂ of this function by Newton's method
◮ The negative Hessian at the mode f̂ gives the precision: W = −∇² log P(y | f̂)
◮ log Z_LA = −½ αᵀ mvm_K(α) + Σ_i log P(y_i | f̂_i) − ½ ld_K(W)
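A sketch of the LA branch under an assumed Poisson likelihood with a log link, using the standard stabilized Newton iteration; dense matrices are used for clarity, whereas the paper evaluates the same quantities through the state space primitives on the following slide:

```python
import numpy as np
from scipy.special import gammaln

def laplace_poisson(K, m, y, iters=20):
    """Newton mode-finding for log P(y|f) - 0.5 (f - m)' K^{-1} (f - m)."""
    n = y.size
    f = m.copy()
    for _ in range(iters):
        grad = y - np.exp(f)           # d log P(y|f) / df  (Poisson, log link)
        W = np.exp(f)                  # W = -d^2 log P(y|f) / df^2  (diagonal)
        sW = np.sqrt(W)
        B = np.eye(n) + sW[:, None] * K * sW[None, :]   # B = I + W^1/2 K W^1/2
        Lc = np.linalg.cholesky(B)
        b = W * (f - m) + grad
        a = b - sW * np.linalg.solve(Lc.T, np.linalg.solve(Lc, sW * (K @ b)))
        f = m + K @ a                  # Newton step toward the mode f_hat
    log_lik = np.sum(y * f - np.exp(f) - gammaln(y + 1.0))
    logZ = -0.5 * a @ (f - m) + log_lik - np.sum(np.log(np.diag(Lc)))
    return f, W, logZ                  # logZ matches the slide's log Z_LA
```

Here −½ aᵀ(f − m) plays the role of −½ αᵀ mvm_K(α), and Σ log diag(Lc) equals ½ ld_K(W).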

  9. Computational Primitives
The following computational primitives allow the covariance approximation to be cast in more generic terms:
◮ Linear system solving: solve_K(W, r) := (K + W⁻¹)⁻¹ r
◮ Matrix-vector multiplication: mvm_K(r) := K r
◮ Log-determinant: ld_K(W) := log |B|, with the well-conditioned matrix B = I + W^{1/2} K W^{1/2}
◮ Predictions need the latent mean E[f∗] and variance V[f∗]
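For reference, a dense sketch of the four primitives and the prediction formulas they feed; the point of the paper is that the state space form evaluates all of these in O(N) rather than the O(N³) of the naïve versions below (zero prior mean assumed in the prediction helper, and W passed as a vector of diagonal precisions):

```python
import numpy as np

def solve_K(K, W, r):
    """solve_K(W, r) := (K + W^{-1})^{-1} r."""
    return np.linalg.solve(K + np.diag(1.0 / W), r)

def mvm_K(K, r):
    """mvm_K(r) := K r."""
    return K @ r

def ld_K(K, W):
    """ld_K(W) := log |B|, B = I + W^{1/2} K W^{1/2} (well-conditioned form)."""
    sW = np.sqrt(W)
    B = np.eye(K.shape[0]) + sW[:, None] * K * sW[None, :]
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(B))))

def predict(K, Kstar, kss, W, alpha):
    """Latent predictive moments under the Gaussian approximate posterior."""
    mean = Kstar.T @ alpha                       # E[f*] = k*' alpha
    v = np.linalg.solve(K + np.diag(1.0 / W), Kstar)
    var = kss - np.sum(Kstar * v, axis=0)        # V[f*] = k** - k*' (K + W^-1)^-1 k*
    return mean, var
```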

  10. Tackling the Computational Primitives
Using the state space form of temporal GPs.
SpInGP:
◮ The first two computational primitives are computed with the SpInGP approach [5]
◮ The idea: in state space form, the inverse of the covariance matrix can be composed directly, and it turns out to be block tridiagonal (see the sketch below)
KF and RTS smoothing:
◮ The last two primitives are solved by Kalman filtering and RTS smoothing
◮ Predictions are computed by primitive 4 and then by propagation through the likelihood
Comments:
◮ Derivatives of the computational primitives, required for learning, are computed in a similar way
◮ SpInGP involves computations with block-tridiagonal matrices; these computations are similar to KF and RTS smoothing (see the appendix of [1])
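A sketch of the structural fact SpInGP exploits, in our own notation: with transitions f_{i+1} = A_i f_i + q_i and f_1 ∼ N(0, P∞), the stacked vector G f of (f_1, f_2 − A_1 f_1, ...) collects independent N(0, P∞) and N(0, Q_i) variables, so the joint prior over all states has inverse covariance K⁻¹ = Gᵀ Q_b⁻¹ G with G block bidiagonal, and K⁻¹ is block tridiagonal. Illustrative assembly only:

```python
import numpy as np
import scipy.sparse as sp

def joint_precision(As, Qs, Pinf):
    """Block-tridiagonal K^{-1} = G' Qb^{-1} G for states f_1..f_N.

    As: N-1 transition matrices A_i; Qs: N-1 process noise covariances Q_i.
    """
    d = Pinf.shape[0]
    N = len(As) + 1
    G = sp.eye(N * d, format="lil")              # identity diagonal blocks
    for i, A in enumerate(As):
        G[(i + 1) * d:(i + 2) * d, i * d:(i + 1) * d] = -A   # subdiagonal -A_i
    Qb_inv = sp.block_diag([np.linalg.inv(Pinf)] +
                           [np.linalg.inv(Q) for Q in Qs])
    Gc = G.tocsr()
    return (Gc.T @ Qb_inv @ Gc).tocsc()          # block tridiagonal by construction
```

Sparse Cholesky or banded solvers on this matrix give the linear solves and log-determinants in O(N), which is why the first two primitives become cheap.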

  11. Experiments 2-3
The experiments are designed to emphasize the paper's findings and claims:
2. A robust regression study (Student's t likelihood) with n = 34,154 observations
3. Numerical effects in non-Gaussian likelihoods

  12. Experiment 4
◮ A new and interesting data set: commercial airline accident dates scraped from Wikipedia [6]
◮ Accidents over a time span of ∼100 years, n = 35,959 days
◮ We model the accident intensity as a log Gaussian Cox process (Poisson likelihood)
◮ The GP prior is set up as: k(t, t′) = k_Mat.(t, t′) + k_per.(t, t′) · k_Mat.(t, t′)
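A small sketch of that prior with hypothetical hyperparameters: a slow Matérn trend plus a periodic component damped by a Matérn, a common quasi-periodic construction; the actual values are learned in the paper:

```python
import numpy as np

def k_matern32(r, ell, s2):
    """Matern-3/2: s2 (1 + sqrt(3)|r|/ell) exp(-sqrt(3)|r|/ell)."""
    a = np.sqrt(3.0) * np.abs(r) / ell
    return s2 * (1.0 + a) * np.exp(-a)

def k_periodic(r, period, ell_p):
    """Standard periodic covariance, unit magnitude."""
    return np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / ell_p ** 2)

def k_accidents(t1, t2):
    """k(t, t') = k_Mat(t, t') + k_per(t, t') * k_Mat(t, t'), time in days."""
    r = t1[:, None] - t2[None, :]
    return (k_matern32(r, ell=5 * 365.0, s2=1.0) +       # slow trend
            k_periodic(r, period=365.25, ell_p=1.0) *    # yearly pattern...
            k_matern32(r, ell=2 * 365.0, s2=1.0))        # ...allowed to drift
```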

  13. Conclusions
◮ This paper brings together research on state space GPs and non-Gaussian approximate inference
◮ We improve stability and provide additional speed-up through fast computation of the state space model parameters
◮ We provide unifying code for all approaches in the GPML toolbox v4.2 [7]
◮ Visit our poster: #151
