state space methods for temporal gps

State space methods for temporal GPs Arno Solin Assistant Professor - PowerPoint PPT Presentation

State space methods for temporal GPs. Arno Solin, Assistant Professor in Machine Learning, Department of Computer Science, Aalto University. Gaussian Process Summer School, September 11, 2019. @arnosolin, arno.solin.fi


  1. State space methods for temporal GPs. Arno Solin, Assistant Professor in Machine Learning, Department of Computer Science, Aalto University. Gaussian Process Summer School, September 11, 2019. @arnosolin, arno.solin.fi

  2. Outline
     ◮ Motivation: Temporal models
     ◮ Three views into GPs
     ◮ State space models
     ◮ General likelihoods
     ◮ Spatio-temporal GPs
     ◮ Further extensions
     ◮ Recap

  3. Motivation: Temporal models
     ◮ One-dimensional problems (the data has a natural ordering)
     ◮ Spatio-temporal models (something developing over time)
     ◮ Long / unbounded data (sensor data streams, daily observations, etc.)

  4. Three views into GPs
     [Diagram: a GP seen through three representations: kernel (moment), spectral (Fourier), and state space (path).]

  5. Kernel (moment) representation
     $$f(t) \sim \mathrm{GP}(\mu(t), \kappa(t, t')) \quad \text{(GP prior)}$$
     $$\mathbf{y} \mid \mathbf{f} \sim \prod_i p(y_i \mid f(t_i)) \quad \text{(likelihood)}$$
     ◮ Let's focus on the GP prior only.
     ◮ A temporal Gaussian process (GP) is a random function $f(t)$ such that the joint distribution of $f(t_1), \ldots, f(t_n)$ is always Gaussian.
     ◮ Mean and covariance functions have the form:
       $$\mu(t) = \mathrm{E}[f(t)], \qquad \kappa(t, t') = \mathrm{E}[(f(t) - \mu(t))(f(t') - \mu(t'))^\top].$$
     ◮ Convenient for model specification, but expanding the kernel to a covariance matrix can be problematic (the notorious $O(n^3)$ scaling).

  6. Spectral (Fourier) representation
     ◮ The Fourier transform of a function $f(t) : \mathbb{R} \to \mathbb{R}$ is
       $$\mathcal{F}[f](\mathrm{i}\omega) = \int_{\mathbb{R}} f(t) \exp(-\mathrm{i}\omega t) \, \mathrm{d}t$$
     ◮ For a stationary GP, the covariance function can be written in terms of the difference between two inputs:
       $$\kappa(t, t') \triangleq \kappa(t - t')$$
     ◮ Wiener–Khinchin: If $f(t)$ is a stationary Gaussian process with covariance function $\kappa(t)$, then its spectral density is $S(\omega) = \mathcal{F}[\kappa]$.
     ◮ Spectral representation of a GP in terms of spectral density function:
       $$S(\omega) = \mathrm{E}[\tilde{f}(\mathrm{i}\omega) \, \tilde{f}^\top(-\mathrm{i}\omega)]$$

  7. State space (path) representation [1/3]
     ◮ Path or state space representation as solution to a linear time-invariant (LTI) stochastic differential equation (SDE):
       $$\mathrm{d}\mathbf{f} = \mathbf{F}\,\mathbf{f}\,\mathrm{d}t + \mathbf{L}\,\mathrm{d}\boldsymbol{\beta},$$
       where $\mathbf{f} = (f, \mathrm{d}f/\mathrm{d}t, \ldots)$ and $\boldsymbol{\beta}(t)$ is a vector of Wiener processes.
     ◮ Equivalently, but more informally
       $$\frac{\mathrm{d}\mathbf{f}(t)}{\mathrm{d}t} = \mathbf{F}\,\mathbf{f}(t) + \mathbf{L}\,\mathbf{w}(t),$$
       where $\mathbf{w}(t)$ is white noise.
     ◮ The model now consists of a drift matrix $\mathbf{F} \in \mathbb{R}^{m \times m}$, a diffusion matrix $\mathbf{L} \in \mathbb{R}^{m \times s}$, and the spectral density matrix of the white noise process $\mathbf{Q}_c \in \mathbb{R}^{s \times s}$.
     ◮ The scalar-valued GP can be recovered by $f(t) = \mathbf{h}^\top \mathbf{f}(t)$.

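  8. State space (path) representation [2/3]
     ◮ The initial state is given by a stationary state $\mathbf{f}(0) \sim \mathrm{N}(\mathbf{0}, \mathbf{P}_\infty)$ which fulfils
       $$\mathbf{F}\,\mathbf{P}_\infty + \mathbf{P}_\infty \mathbf{F}^\top + \mathbf{L}\,\mathbf{Q}_c \mathbf{L}^\top = \mathbf{0}$$
     ◮ The covariance function at the stationary state can be recovered by
       $$\kappa(t, t') = \begin{cases} \mathbf{h}^\top \mathbf{P}_\infty \exp((t' - t)\,\mathbf{F})^\top \mathbf{h}, & t' \geq t \\ \mathbf{h}^\top \exp((t - t')\,\mathbf{F})\,\mathbf{P}_\infty \mathbf{h}, & t' < t \end{cases}$$
       where $\exp(\cdot)$ denotes the matrix exponential function.
     ◮ The spectral density function at the stationary state can be recovered by
       $$S(\omega) = \mathbf{h}^\top (\mathbf{F} + \mathrm{i}\omega \mathbf{I})^{-1} \mathbf{L}\,\mathbf{Q}_c \mathbf{L}^\top (\mathbf{F} - \mathrm{i}\omega \mathbf{I})^{-\top} \mathbf{h}$$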

  9. State space (path) representation [3/3]
     ◮ Just as the kernel has to be evaluated into a covariance matrix for computations, the SDE can be solved at discrete time points $\{t_i\}_{i=1}^n$.
     ◮ The resulting model is a discrete state space model:
       $$\mathbf{f}_i = \mathbf{A}_{i-1}\,\mathbf{f}_{i-1} + \mathbf{q}_{i-1}, \qquad \mathbf{q}_i \sim \mathrm{N}(\mathbf{0}, \mathbf{Q}_i),$$
       where $\mathbf{f}_i = \mathbf{f}(t_i)$.
     ◮ The discrete-time model matrices are given by:
       $$\mathbf{A}_i = \exp(\mathbf{F}\,\Delta t_i),$$
       $$\mathbf{Q}_i = \int_0^{\Delta t_i} \exp(\mathbf{F}\,(\Delta t_i - \tau))\,\mathbf{L}\,\mathbf{Q}_c \mathbf{L}^\top \exp(\mathbf{F}\,(\Delta t_i - \tau))^\top \, \mathrm{d}\tau,$$
       where $\Delta t_i = t_{i+1} - t_i$.
     ◮ If the model is stationary, $\mathbf{Q}_i$ is given by
       $$\mathbf{Q}_i = \mathbf{P}_\infty - \mathbf{A}_i\,\mathbf{P}_\infty \mathbf{A}_i^\top$$
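
The discretization on this slide can be sketched numerically. The example below assumes a Matérn ν = 3/2 prior as the test model; its matrices are a standard closed form that is not given on the slide, and the values of `ell`, `sigma2`, and `dt` are arbitrary. It computes A and Q with the stationary shortcut and compares Q against a midpoint-rule approximation of the integral.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Assumed example model: Matern nu=3/2 with lengthscale ell and variance sigma2.
ell, sigma2 = 1.5, 2.0
lam = np.sqrt(3.0) / ell
F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])   # drift matrix
L = np.array([[0.0], [1.0]])                         # diffusion matrix
Qc = np.array([[4.0 * lam**3 * sigma2]])             # white-noise spectral density

# Stationary covariance: solves F P + P F^T + L Qc L^T = 0.
Pinf = solve_continuous_lyapunov(F, -L @ Qc @ L.T)

def discretize(dt):
    """Return (A, Q) for a time step dt using the stationary shortcut."""
    A = expm(F * dt)
    Q = Pinf - A @ Pinf @ A.T
    return A, Q

def Q_by_quadrature(dt, num=500):
    """Midpoint-rule approximation of the integral that defines Q."""
    taus = (np.arange(num) + 0.5) * dt / num
    total = np.zeros_like(Pinf)
    for tau in taus:
        E = expm(F * (dt - tau))
        total += E @ L @ Qc @ L.T @ E.T
    return total * dt / num

dt = 0.3
A, Q = discretize(dt)
Q_num = Q_by_quadrature(dt)   # should agree with Q up to quadrature error
```

The shortcut avoids the integral entirely, which is why stationary models are cheap to discretize even on irregular time grids.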

  10. Three views into GPs
     [Figure, three panels: covariance function $\kappa(\tau)$ against $\tau = t - t'$; spectral density function $S(\omega)$ against $\omega$; sample functions, output $f(t)$ against input $t$.]

  11. Example: Exponential covariance function
     ◮ Exponential covariance function (Ornstein-Uhlenbeck process):
       $$\kappa(t, t') = \exp(-\lambda\,|t - t'|)$$
     ◮ Spectral density function:
       $$S(\omega) = \frac{2}{\lambda + \omega^2/\lambda}$$
     ◮ Path representation: Stochastic differential equation (SDE)
       $$\frac{\mathrm{d}f(t)}{\mathrm{d}t} = -\lambda\,f(t) + w(t),$$
       or using the notation from before: $F = -\lambda$, $L = 1$, $Q_c = 2\lambda$, $h = 1$, and $P_\infty = 1$.
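
A quick consistency check of the three views for this example, written with plain scalars. It is a sketch under one stated assumption: the white-noise spectral density is taken as Q_c = 2λ, which is the value at which the stationarity condition returns P∞ = 1. The value of `lam` and the test points are arbitrary.

```python
import math

lam = 0.7            # arbitrary decay rate for the illustration
F, L, h = -lam, 1.0, 1.0
Qc = 2.0 * lam       # assumed value: makes the stationary variance equal to 1

# Stationary variance from the scalar equation F*P + P*F + L*Qc*L = 0.
Pinf = -L * Qc * L / (2.0 * F)

def kernel_from_state_space(tau):
    """h * Pinf * exp(|tau| * F) * h, the stationary covariance."""
    return h * Pinf * math.exp(abs(tau) * F) * h

def spectral_density_from_state_space(omega):
    """h (F + i w)^-1 L Qc L (F - i w)^-1 h for the scalar model."""
    val = h / (F + 1j * omega) * L * Qc * L / (F - 1j * omega) * h
    return val.real

tau, omega = 1.3, 2.1
k_direct = math.exp(-lam * abs(tau))          # kernel view
S_direct = 2.0 / (lam + omega**2 / lam)       # spectral view
```

Both state space expressions should reproduce the kernel and spectral density formulas on the slide to machine precision.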

  12. Examples of applicable GP priors

  13. Applicable GP priors
     ◮ The covariance function needs to be Markovian (or approximated as such).
     ◮ Covers many common stationary and non-stationary models.
     ◮ Sums of kernels: $\kappa(t, t') = \kappa_1(t, t') + \kappa_2(t, t')$
       • Stacking of the state spaces
       • State dimension: $m = m_1 + m_2$
     ◮ Product of kernels: $\kappa(t, t') = \kappa_1(t, t')\,\kappa_2(t, t')$
       • Kronecker sum of the models
       • State dimension: $m = m_1 m_2$
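
The two composition rules can be sketched for stationary models described by (F, P∞, h). The Matérn ν = 3/2 building block and the helper names `matern32`, `cov`, `add`, and `multiply` are assumptions made for this illustration, not part of the slides.

```python
import numpy as np
from scipy.linalg import block_diag, expm

def matern32(ell, sigma2):
    """Assumed building block: (F, Pinf, h) of a Matern nu=3/2 prior."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    Pinf = np.diag([sigma2, lam**2 * sigma2])
    h = np.array([1.0, 0.0])
    return F, Pinf, h

def cov(model, tau):
    """Stationary covariance h^T Pinf exp(tau F)^T h for tau >= 0."""
    F, Pinf, h = model
    return h @ Pinf @ expm(tau * F).T @ h

def add(m1, m2):
    """Sum of kernels: stack the two state spaces (dimension m1 + m2)."""
    (F1, P1, h1), (F2, P2, h2) = m1, m2
    return block_diag(F1, F2), block_diag(P1, P2), np.concatenate([h1, h2])

def multiply(m1, m2):
    """Product of kernels: Kronecker sum of the drifts (dimension m1 * m2)."""
    (F1, P1, h1), (F2, P2, h2) = m1, m2
    I1, I2 = np.eye(F1.shape[0]), np.eye(F2.shape[0])
    F = np.kron(F1, I2) + np.kron(I1, F2)
    return F, np.kron(P1, P2), np.kron(h1, h2)

a, b = matern32(1.0, 1.0), matern32(3.0, 0.5)
tau = 0.8
k_sum = cov(add(a, b), tau)        # should equal cov(a) + cov(b)
k_prod = cov(multiply(a, b), tau)  # should equal cov(a) * cov(b)
```

The product rule works because the matrix exponential of a Kronecker sum factorizes into the Kronecker product of the individual exponentials.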

  14. Example: GP regression, $O(n^3)$

  15. Example: GP regression, $O(n^3)$
     ◮ Consider the GP regression problem with input–output training pairs $\{(t_i, y_i)\}_{i=1}^n$:
       $$f(t) \sim \mathrm{GP}(0, \kappa(t, t')),$$
       $$y_i = f(t_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{N}(0, \sigma_n^2)$$
     ◮ The posterior mean and variance for an unseen test input $t_*$ is given by (see previous lectures):
       $$\mathrm{E}[f_*] = \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y},$$
       $$\mathrm{V}[f_*] = K_{**} - \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_*^\top$$
     ◮ Note the inversion of the $n \times n$ matrix.
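
For reference, a minimal sketch of these batch formulas in NumPy. It assumes the exponential covariance function from the earlier example and synthetic data; the function names `exp_kernel` and `gp_predict` are illustrative. A Cholesky factorization replaces the explicit inverse, but the cost is still cubic in n.

```python
import numpy as np

def exp_kernel(t1, t2, lam=1.0):
    """Exponential covariance exp(-lam |t - t'|) between two sets of inputs."""
    return np.exp(-lam * np.abs(t1[:, None] - t2[None, :]))

def gp_predict(t, y, t_star, noise_var, lam=1.0):
    """Posterior mean and variance at t_star; dominated by the n x n factorization."""
    K = exp_kernel(t, t, lam) + noise_var * np.eye(len(t))
    k_star = exp_kernel(t_star, t, lam)          # shape (n_star, n)
    chol = np.linalg.cholesky(K)                 # O(n^3) step
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, k_star.T)
    mean = k_star @ alpha
    var = np.diag(exp_kernel(t_star, t_star, lam)) - np.sum(v**2, axis=0)
    return mean, var

# Synthetic data, made up for the illustration.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=50))
y = np.sin(t) + 0.1 * rng.standard_normal(50)
t_star = np.linspace(0.0, 10.0, 5)
mean, var = gp_predict(t, y, t_star, noise_var=0.01)
```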

  16. Example: GP regression, $O(n^3)$

  17. Example: GP regression, $O(n)$
     ◮ The sequential solution (goes under the name 'Kalman filter') considers one data point at a time, hence the linear time-scaling.
     ◮ Start from $\mathbf{m}_0 = \mathbf{0}$ and $\mathbf{P}_0 = \mathbf{P}_\infty$ and for each data point iterate the following steps.
     ◮ Kalman prediction:
       $$\mathbf{m}_{i|i-1} = \mathbf{A}_{i-1}\,\mathbf{m}_{i-1|i-1},$$
       $$\mathbf{P}_{i|i-1} = \mathbf{A}_{i-1}\,\mathbf{P}_{i-1|i-1}\,\mathbf{A}_{i-1}^\top + \mathbf{Q}_{i-1}.$$
     ◮ Kalman update:
       $$v_i = y_i - \mathbf{h}^\top \mathbf{m}_{i|i-1},$$
       $$S_i = \mathbf{h}^\top \mathbf{P}_{i|i-1}\,\mathbf{h} + \sigma_n^2,$$
       $$\mathbf{K}_i = \mathbf{P}_{i|i-1}\,\mathbf{h}\,S_i^{-1},$$
       $$\mathbf{m}_{i|i} = \mathbf{m}_{i|i-1} + \mathbf{K}_i\,v_i,$$
       $$\mathbf{P}_{i|i} = \mathbf{P}_{i|i-1} - \mathbf{K}_i\,S_i\,\mathbf{K}_i^\top.$$
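
The prediction and update steps can be sketched as a short loop. This example assumes the exponential (Ornstein-Uhlenbeck) prior with unit variance, so the state is one-dimensional, and it uses synthetic data; the function name `kalman_filter` is illustrative. The first prediction uses a zero time step, so it simply returns the stationary prior.

```python
import numpy as np

def kalman_filter(t, y, noise_var, lam=1.0):
    """Filtering means and covariances, processing one data point at a time."""
    # Assumed model: F = -lam, h = 1, Pinf = 1, so A = exp(-lam * dt).
    h = np.array([1.0])
    Pinf = np.array([[1.0]])
    m, P = np.zeros(1), Pinf.copy()        # m_0 = 0, P_0 = Pinf
    means, covs = [], []
    t_prev = t[0]
    for ti, yi in zip(t, y):
        # Kalman prediction over the step from t_prev to ti.
        A = np.array([[np.exp(-lam * (ti - t_prev))]])
        Q = Pinf - A @ Pinf @ A.T
        m, P = A @ m, A @ P @ A.T + Q
        # Kalman update with the scalar observation yi.
        v = yi - h @ m
        S = h @ P @ h + noise_var
        K = P @ h / S
        m = m + K * v
        P = P - np.outer(K, K) * S
        means.append(m.copy())
        covs.append(P.copy())
        t_prev = ti
    return np.array(means), np.array(covs)

# Synthetic data, made up for the illustration.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=50))
y = np.sin(t) + 0.1 * rng.standard_normal(50)
means, covs = kalman_filter(t, y, noise_var=0.01)
```

The filtering result at the last time point has seen all the data, so it should coincide with the batch GP posterior at that input; earlier points still need the backward sweep.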

  18. Example: GP regression, $O(n)$
     ◮ To condition all time-marginals on all data, run a backward sweep (Rauch–Tung–Striebel smoother):
       $$\mathbf{m}_{i+1|i} = \mathbf{A}_i\,\mathbf{m}_{i|i},$$
       $$\mathbf{P}_{i+1|i} = \mathbf{A}_i\,\mathbf{P}_{i|i}\,\mathbf{A}_i^\top + \mathbf{Q}_i,$$
       $$\mathbf{G}_i = \mathbf{P}_{i|i}\,\mathbf{A}_i^\top\,\mathbf{P}_{i+1|i}^{-1},$$
       $$\mathbf{m}_{i|n} = \mathbf{m}_{i|i} + \mathbf{G}_i\,(\mathbf{m}_{i+1|n} - \mathbf{m}_{i+1|i}),$$
       $$\mathbf{P}_{i|n} = \mathbf{P}_{i|i} + \mathbf{G}_i\,(\mathbf{P}_{i+1|n} - \mathbf{P}_{i+1|i})\,\mathbf{G}_i^\top.$$
     ◮ The marginal mean and variance can be recovered by:
       $$\mathrm{E}[f_i] = \mathbf{h}^\top \mathbf{m}_{i|n}, \qquad \mathrm{V}[f_i] = \mathbf{h}^\top \mathbf{P}_{i|n}\,\mathbf{h}$$
     ◮ The log marginal likelihood can be evaluated as a by-product of the Kalman update:
       $$\log p(\mathbf{y}) = -\frac{1}{2} \sum_{i=1}^{n} \left( \log |2\pi S_i| + v_i^\top S_i^{-1} v_i \right)$$
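
A compact sketch of the forward filter, the backward sweep, and the log marginal likelihood, again assuming the unit-variance exponential (Ornstein-Uhlenbeck) prior so that every quantity is a scalar. The function name `filter_smoother` and the synthetic data are illustrative. Note the indexing: here `A[i]` and `Q[i]` describe the step into point i, which shifts the subscripts by one relative to the slide.

```python
import numpy as np

def filter_smoother(t, y, noise_var, lam=1.0):
    """Kalman filter + RTS smoother for a scalar-state OU prior (h = 1)."""
    n = len(t)
    Pinf = 1.0                                  # assumed stationary variance
    dts = np.diff(t, prepend=t[0])              # step into point i (zero for the first)
    A = np.exp(-lam * dts)
    Q = Pinf - A * Pinf * A
    mf, Pf = np.zeros(n), np.zeros(n)           # filtered moments m_{i|i}, P_{i|i}
    m, P, loglik = 0.0, Pinf, 0.0
    for i in range(n):
        m, P = A[i] * m, A[i] * P * A[i] + Q[i]         # prediction
        v, S = y[i] - m, P + noise_var                   # innovation and its variance
        loglik -= 0.5 * (np.log(2.0 * np.pi * S) + v * v / S)
        K = P / S
        m, P = m + K * v, P - K * S * K                  # update
        mf[i], Pf[i] = m, P
    ms, Ps = mf.copy(), Pf.copy()               # smoothed moments m_{i|n}, P_{i|n}
    for i in range(n - 2, -1, -1):
        # A[i + 1] and Q[i + 1] carry the state from t_i to t_{i+1}.
        mp = A[i + 1] * mf[i]
        Pp = A[i + 1] * Pf[i] * A[i + 1] + Q[i + 1]
        G = Pf[i] * A[i + 1] / Pp
        ms[i] = mf[i] + G * (ms[i + 1] - mp)
        Ps[i] = Pf[i] + G * (Ps[i + 1] - Pp) * G
    return ms, Ps, loglik

# Synthetic data, made up for the illustration.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=50))
y = np.sin(t) + 0.1 * rng.standard_normal(50)
ms, Ps, loglik = filter_smoother(t, y, noise_var=0.01)
```

On this model the smoothed marginals and the log marginal likelihood should match the batch GP regression results, at a cost that grows linearly with the number of data points.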

  19. Example: GP regression, $O(n)$

  20. Basic regression example
     ◮ Number of births in the US (from BDA3 by Gelman et al.)
     ◮ Daily data between 1969–1988 ($n = 7305$)
     ◮ GP regression with a prior covariance function:
       $$\kappa(t, t') = \kappa_{\mathrm{Mat.}}^{\nu=5/2}(t, t') + \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{year}}(t, t')\,\kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{week}}(t, t')\,\kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t')$$
     ◮ Learn hyperparameters by optimizing the marginal likelihood

  21. Basic regression example (continued)
     [Figure: Explaining changes in number of births in the US]

  22. Connection to banded precision matrices

  23. Precision matrices
     ◮ Covariance (Gram) matrix: $\mathbf{K} = \kappa(\mathbf{X}, \mathbf{X})$
     ◮ Precision matrix: $\mathbf{Q} = \mathbf{K}^{-1}$
     [Figure, two heat maps over indices 0 to 6: $\mathbf{K} = k(\mathbf{X}, \mathbf{X})$ and $\mathbf{Q} = k(\mathbf{X}, \mathbf{X})^{-1}$.]
     ◮ For Markovian models the precision is sparse! (block tri-diagonal), see Durrande et al. (2019)
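
The sparsity claim is easy to check numerically. The sketch below assumes the exponential covariance function with λ = 1, whose state is one-dimensional, so the blocks are scalars and the precision should be plain tri-diagonal. The seven sorted input locations are made up for the illustration.

```python
import numpy as np

# Seven sorted, irregularly spaced inputs (arbitrary values).
t = np.array([0.0, 0.7, 1.1, 2.4, 3.0, 4.6, 5.1])

K = np.exp(-np.abs(t[:, None] - t[None, :]))   # Gram matrix of exp(-|t - t'|)
Q = np.linalg.inv(K)                           # precision matrix

# Entries more than one step away from the diagonal.
i, j = np.indices(Q.shape)
off_band = np.abs(i - j) > 1
max_off_band = np.max(np.abs(Q[off_band]))     # numerically zero
min_first_off_diag = np.min(np.abs(np.diag(Q, 1)))   # clearly non-zero
```

The banded structure relies on the inputs being sorted in time; permuting them permutes the rows and columns of the precision and hides the band.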
