  1. State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes. William Wilkinson∗, Paul Chang∗, Michael Riis Andersen†, Arno Solin∗. Aalto University∗, Technical University of Denmark†. ICML 2020

  2. Motivation • We're interested in long temporal and spatio-temporal data with interesting non-conjugate GP models (e.g. classification, log-Gaussian Cox processes). • Idea: we should treat the temporal dimension in a fundamentally different manner to the other dimensions. State Space Expectation Propagation, Wilkinson et al. 1/10

  3. Approximate Inference in Temporal GPs. There exists a dual kernel / SDE form for most popular Gaussian process (GP) models:
     Kernel form: f(t) ~ GP(0, K_θ(t, t′)), y_k ~ p(y_k | f(t_k))
     State-space (SDE) form: f_k = A_{θ,k} f_{k−1} + q_k, q_k ~ N(0, Q_k); y_k = h(f_k, σ_k), σ_k ~ N(0, Σ_k)

  4. Approximate Inference in Temporal GPs (cont.) In the state-space form, inference is O(n) via Kalman filtering and smoothing.
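The O(n) claim can be illustrated with a minimal Kalman filter for the simplest conjugate case: a Matérn-1/2 (exponential) kernel, whose state-space form is a scalar Ornstein-Uhlenbeck process, with a Gaussian likelihood. This is a sketch in my own notation, not the authors' code:

```python
import numpy as np

def kalman_filter_ou(t, y, lengthscale=10.0, variance=1.0, noise=1.0):
    """O(n) GP regression with a Matern-1/2 kernel, written in its
    equivalent 1-D state-space / SDE form:
        f_k = A_k f_{k-1} + q_k,   q_k ~ N(0, Q_k),
        y_k = f_k + sigma_k,       sigma_k ~ N(0, noise).
    Returns the filtered means and variances."""
    n = len(t)
    m, P = 0.0, variance                       # stationary prior
    ms, Ps = np.zeros(n), np.zeros(n)
    for k in range(n):
        if k > 0:
            dt = t[k] - t[k - 1]
            A = np.exp(-dt / lengthscale)      # transition A_{theta,k}
            Q = variance * (1.0 - A ** 2)      # process noise Q_k
            m, P = A * m, A ** 2 * P + Q       # predict step
        K = P / (P + noise)                    # Kalman gain
        m = m + K * (y[k] - m)                 # update mean
        P = (1.0 - K) * P                      # update variance
        ms[k], Ps[k] = m, P
    return ms, Ps
```

Each time step costs a constant amount of work, so the full sweep is linear in the number of observations, versus O(n³) for naive kernel-form GP regression.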

  5. Approximate Inference. Kalman filter update step: p(f_k | y_{1:k}) ∝ N(m_k^predict, P_k^predict) p(y_k | f(t_k)). [Figure: f(t) and data over t = 0–300]

  6. Approximate Inference. Kalman filter update step: p(f_k | y_{1:k}) ∝ N(m_k^predict, P_k^predict) p(y_k | f(t_k)) ≈ N(m_k^predict, P_k^predict) N(m_k^site, P_k^site), where N(m_k^site, P_k^site) is the "site". [Figure: f(t) and data over t = 0–300]

  7. Approximate Inference. Kalman filter update step: p(f_k | y_{1:k}) ∝ N(m_k^predict, P_k^predict) p(y_k | f(t_k)) ≈ N(m_k^predict, P_k^predict) N(m_k^site, P_k^site). Approximate inference amounts to selecting the site parameters. [Figure: f(t) and data over t = 0–300]



  13. Approximate Inference. Smoothing: update the posterior with future observations, p(f_k | y_{1:N}) = N(m_k^post., P_k^post.). [Figure: smoothed posterior over f(t), t = 0–300]
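The backward smoothing pass is the standard Rauch-Tung-Striebel recursion; a scalar sketch under my own notation (the filtered moments and per-step transition terms come from a forward Kalman sweep such as the one above):

```python
import numpy as np

def rts_smoother(ms_filt, Ps_filt, A, Q):
    """Rauch-Tung-Striebel backward pass for a scalar linear-Gaussian
    state-space model f_k = A_k f_{k-1} + q_k, q_k ~ N(0, Q_k).
    Takes filtered moments p(f_k | y_{1:k}) and returns the smoothed
    marginal posteriors p(f_k | y_{1:N}) = N(m_k^post, P_k^post)."""
    n = len(ms_filt)
    m_s, P_s = ms_filt.copy(), Ps_filt.copy()
    for k in range(n - 2, -1, -1):
        m_pred = A[k + 1] * ms_filt[k]                  # one-step prediction
        P_pred = A[k + 1] ** 2 * Ps_filt[k] + Q[k + 1]
        G = Ps_filt[k] * A[k + 1] / P_pred              # smoother gain
        m_s[k] = ms_filt[k] + G * (m_s[k + 1] - m_pred)
        P_s[k] = Ps_filt[k] + G ** 2 * (P_s[k + 1] - P_pred)
    return m_s, P_s
```

The backward pass is again O(n), so a full filter-smoother sweep stays linear in the number of time steps.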


  21. Approximate Inference. Our contribution: given the marginal posterior N(m_k^post., P_k^post.), we show how approximate inference amounts to a simple site parameter update rule during smoothing. [Figure: smoothed posterior over f(t), t = 0–300]

  22. Approximate Inference. Our contribution: given the marginal posterior N(m_k^post., P_k^post.), we show how approximate inference amounts to a simple site parameter update rule during smoothing. This encompasses:
     • Power Expectation Propagation
     • Variational Inference (with natural gradients)
     • Extended Kalman Smoothing
     • Unscented / Gauss-Hermite Kalman Smoothing
     • Posterior Linearisation
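The overall scheme described above can be sketched as an iterated loop: run a standard Kalman filter-smoother treating the current Gaussian sites as pseudo-observations, then refresh each site from its smoothed marginal via a chosen update rule. The structure below is a schematic reading of the idea with a Matérn-1/2 prior for brevity; the `site_update` signature and all names are my assumptions, not the authors' code:

```python
import numpy as np

def iterated_inference(t, y, site_update, kernel_var=1.0, ell=10.0,
                       n_sweeps=5):
    """Filter-smoother sweeps against Gaussian sites N(m_site, P_site),
    with the sites refreshed from the marginal posterior after each
    backward pass. `site_update(y_k, m_post, P_post)` encodes the chosen
    inference scheme (EP, VI, EKS, ...)."""
    n = len(t)
    m_site, P_site = np.zeros(n), np.full(n, 1e6)   # weak initial sites
    for _ in range(n_sweeps):
        # forward Kalman filter, sites acting as Gaussian observations
        m, P = 0.0, kernel_var
        mf, Pf = np.zeros(n), np.zeros(n)
        A, Q = np.zeros(n), np.zeros(n)
        for k in range(n):
            if k > 0:
                dt = t[k] - t[k - 1]
                A[k] = np.exp(-dt / ell)
                Q[k] = kernel_var * (1.0 - A[k] ** 2)
                m, P = A[k] * m, A[k] ** 2 * P + Q[k]
            K = P / (P + P_site[k])
            m, P = m + K * (m_site[k] - m), (1.0 - K) * P
            mf[k], Pf[k] = m, P
        # backward RTS smoother: marginal posteriors p(f_k | y_{1:N})
        mp, Pp = mf.copy(), Pf.copy()
        for k in range(n - 2, -1, -1):
            mpred = A[k + 1] * mf[k]
            Ppred = A[k + 1] ** 2 * Pf[k] + Q[k + 1]
            G = Pf[k] * A[k + 1] / Ppred
            mp[k] = mf[k] + G * (mp[k + 1] - mpred)
            Pp[k] = Pf[k] + G ** 2 * (Pp[k + 1] - Ppred)
        # site refresh from the marginal posterior
        for k in range(n):
            m_site[k], P_site[k] = site_update(y[k], mp[k], Pp[k])
    return mp, Pp
```

With a Gaussian likelihood the site is exact (mean y_k, variance equal to the noise), and the loop reduces to ordinary Kalman smoothing; the non-conjugate schemes differ only in the `site_update` rule.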

  23. Parameter Update Rules (with ∇L_k = dL_k/dm_k). Power Expectation Propagation:
     q_cavity(f_k) = q_post.(f_k) / q_site^α(f_k)
     L_k = log E_{q_cavity}[p^α(y_k | f_k)]
     P_k^site = −α (∇²L_k)^{−1} − P_k^cavity
     m_k^site = m_k^cavity − (∇²L_k)^{−1} ∇L_k
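A scalar sketch of the Power-EP update above: the log-partition L_k is evaluated with Gauss-Hermite quadrature, and its derivatives with respect to the cavity mean are taken by central finite differences (a shortcut for this sketch; an implementation would more likely differentiate analytically or with autodiff):

```python
import numpy as np

def pep_site_update(y, m_cav, P_cav, log_lik, alpha=1.0, eps=1e-4):
    """Power-EP site update (scalar case):
        L_k      = log E_{q_cav}[p^alpha(y_k | f_k)]
        P_site   = -alpha (d2L)^{-1} - P_cav
        m_site   = m_cav - (d2L)^{-1} dL
    `log_lik(y, f)` is the log-likelihood, vectorised over f."""
    x, w = np.polynomial.hermite_e.hermegauss(20)  # probabilists' Hermite
    w = w / w.sum()                                 # weights for N(0, 1)

    def L(m):
        f = m + np.sqrt(P_cav) * x                  # cavity quadrature points
        return np.log(np.sum(w * np.exp(alpha * log_lik(y, f))))

    dL = (L(m_cav + eps) - L(m_cav - eps)) / (2 * eps)
    d2L = (L(m_cav + eps) - 2 * L(m_cav) + L(m_cav - eps)) / eps ** 2
    P_site = -alpha / d2L - P_cav
    m_site = m_cav - dL / d2L
    return m_site, P_site
```

Sanity check: for a Gaussian likelihood N(y | f, s) with α = 1, the update recovers the exact site (mean y, variance s), since then ∇²L_k = −1/(P_cav + s).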

  24. Parameter Update Rules (cont.) Variational Inference:
     L_k = E_{q_post.}[log p(y_k | f_k)]
     P_k^site = −(∇²L_k)^{−1}
     m_k^site = m_k^post. − (∇²L_k)^{−1} ∇L_k
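The variational update differs from Power EP only in the objective (an expected log-likelihood under the posterior rather than a log-partition under the cavity) and in dropping the cavity correction. A matching scalar sketch, again using quadrature plus finite differences as shortcuts:

```python
import numpy as np

def vi_site_update(y, m_post, P_post, log_lik, eps=1e-4):
    """Natural-gradient VI site update (scalar case):
        L_k    = E_{q_post}[log p(y_k | f_k)]
        P_site = -(d2L)^{-1}
        m_site = m_post - (d2L)^{-1} dL"""
    x, w = np.polynomial.hermite_e.hermegauss(20)
    w = w / w.sum()

    def L(m):
        f = m + np.sqrt(P_post) * x                 # posterior quadrature points
        return np.sum(w * log_lik(y, f))

    dL = (L(m_post + eps) - L(m_post - eps)) / (2 * eps)
    d2L = (L(m_post + eps) - 2 * L(m_post) + L(m_post - eps)) / eps ** 2
    P_site = -1.0 / d2L
    m_site = m_post - dL / d2L
    return m_site, P_site
```

For a Gaussian likelihood N(y | f, s), ∇²L_k = −1/s, so the site is again exact (mean y, variance s), matching the EP case.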

  25. Parameter Update Rules (cont.) Extended Kalman Smoother (with H_f = dh/df, H_σ = dh/dσ, and σ_k ~ N(0, Σ_k)):
     v_k = y_k − h(m_k^post., 0)
     S_k = H_f P_k^post. H_f^⊤ + H_σ Σ_k H_σ^⊤
     P_k^site = (H_f^⊤ (H_σ Σ_k H_σ^⊤)^{−1} H_f)^{−1}
     m_k^site = m_k^post. + (P_k^post. + P_k^site) H_f^⊤ S_k^{−1} v_k
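A scalar sketch of the extended-Kalman-smoother update above, with the Jacobians H_f and H_σ taken by finite differences at (m_k^post., 0) for brevity (names and signatures are my assumptions):

```python
import numpy as np

def eks_site_update(y, m_post, P_post, h, Sigma, eps=1e-6):
    """Extended-Kalman-smoother site update (scalar case).
    h(f, sigma) is the observation model, sigma ~ N(0, Sigma)."""
    Hf = (h(m_post + eps, 0.0) - h(m_post - eps, 0.0)) / (2 * eps)
    Hs = (h(m_post, eps) - h(m_post, -eps)) / (2 * eps)
    v = y - h(m_post, 0.0)             # innovation v_k
    R = Hs * Sigma * Hs                # linearised noise variance
    S = Hf * P_post * Hf + R           # innovation variance S_k
    P_site = R / (Hf * Hf)             # (Hf^T R^{-1} Hf)^{-1}
    m_site = m_post + (P_post + P_site) * Hf / S * v
    return m_site, P_site
```

Sanity check: for the linear model h(f, σ) = f + σ both Jacobians are 1, and the update reduces to the exact Gaussian site (mean y, variance Σ).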


  28. A Unifying Perspective • For sequential data, the EKF / UKF / GHKF are equivalent to single-sweep EP where the moment matching is solved via linearisation. • The iterated Kalman smoothers (EKS / UKS / GHKS) can also be recovered under certain parameter choices. But note that they optimise a different objective to EP (see paper for details). • We show how natural gradient VI updates are surprisingly similar to the EP updates (when using a similar parametrisation).

  29. New Algorithms • We propose to mix the beneficial properties of EP with the efficiency of classical smoothers.
