State space methods for temporal GPs

Arno Solin
Assistant Professor in Machine Learning
Department of Computer Science, Aalto University
Gaussian Process Summer School, September 11, 2019
@arnosolin
arno.solin.fi

Outline
◮ Motivation: Temporal models
◮ Three views into GPs
◮ State space models
◮ General likelihoods
◮ Spatio-temporal GPs
◮ Further extensions
◮ Recap

Motivation: Temporal models
◮ One-dimensional problems (the data has a natural ordering)
◮ Spatio-temporal models (something developing over time)
◮ Long / unbounded data (sensor data streams, daily observations, etc.)

Three views into GPs
◮ Kernel (moment)
◮ Spectral (Fourier)
◮ State space (path)

Kernel (moment) representation
$$f(t) \sim \mathcal{GP}(\mu(t), \kappa(t, t')) \qquad \text{(GP prior)}$$
$$\mathbf{y} \mid \mathbf{f} \sim \prod_i p(y_i \mid f(t_i)) \qquad \text{(likelihood)}$$
◮ Let's focus on the GP prior only.
◮ A temporal Gaussian process (GP) is a random function $f(t)$ such that the joint distribution of $f(t_1), \ldots, f(t_n)$ is always Gaussian.
◮ Mean and covariance functions have the form:
$$\mu(t) = \mathrm{E}[f(t)], \qquad \kappa(t, t') = \mathrm{E}[(f(t) - \mu(t))(f(t') - \mu(t'))^\mathsf{T}].$$
◮ Convenient for model specification, but expanding the kernel to a covariance matrix can be problematic (the notorious $O(n^3)$ scaling).

Spectral (Fourier) representation
◮ The Fourier transform of a function $f(t) : \mathbb{R} \to \mathbb{R}$ is
$$\mathcal{F}[f](\mathrm{i}\omega) = \int_{\mathbb{R}} f(t) \exp(-\mathrm{i}\omega t)\, \mathrm{d}t$$
◮ For a stationary GP, the covariance function can be written in terms of the difference between two inputs:
$$\kappa(t, t') \triangleq \kappa(t - t')$$
◮ Wiener–Khinchin: If $f(t)$ is a stationary Gaussian process with covariance function $\kappa(t)$, then its spectral density is $S(\omega) = \mathcal{F}[\kappa]$.
◮ Spectral representation of a GP in terms of spectral density function:
$$S(\omega) = \mathrm{E}[\tilde{f}(\mathrm{i}\omega)\, \tilde{f}^\mathsf{T}(-\mathrm{i}\omega)]$$
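The Wiener–Khinchin relation can be checked numerically. The sketch below is an illustration only: it takes the exponential covariance $\kappa(\tau) = \exp(-\lambda|\tau|)$, integrates it against the Fourier kernel with a plain trapezoidal rule, and compares the result with the closed form $2\lambda/(\lambda^2 + \omega^2)$. The decay rate `lam = 1.5`, the truncation `tau_max`, and all function names are assumptions made for this example.

```python
import math

def kernel_exp(tau, lam):
    """Exponential covariance kappa(tau) = exp(-lam * |tau|)."""
    return math.exp(-lam * abs(tau))

def spectral_density_numeric(omega, lam, tau_max=40.0, n=40001):
    """Trapezoidal approximation of the Fourier integral of kernel_exp.

    The kernel is even, so the imaginary part cancels and only the
    cosine term is integrated over [-tau_max, tau_max].
    """
    step = 2.0 * tau_max / (n - 1)
    total = 0.0
    for k in range(n):
        tau = -tau_max + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * kernel_exp(tau, lam) * math.cos(omega * tau)
    return total * step

def spectral_density_exact(omega, lam):
    """Closed form 2*lam / (lam^2 + omega^2) for the exponential kernel."""
    return 2.0 * lam / (lam ** 2 + omega ** 2)

lam = 1.5  # assumed decay rate, chosen only for illustration
for omega in (0.0, 1.0, 3.0):
    print(omega, spectral_density_numeric(omega, lam), spectral_density_exact(omega, lam))
```

The two columns should agree up to the quadrature error of the trapezoidal rule.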

State space (path) representation [1/3]
◮ Path or state space representation as solution to a linear time-invariant (LTI) stochastic differential equation (SDE):
$$\mathrm{d}\mathbf{f} = \mathbf{F}\, \mathbf{f}\, \mathrm{d}t + \mathbf{L}\, \mathrm{d}\boldsymbol{\beta},$$
where $\mathbf{f} = (f, \mathrm{d}f/\mathrm{d}t, \ldots)$ and $\boldsymbol{\beta}(t)$ is a vector of Wiener processes.
◮ Equivalently, but more informally,
$$\frac{\mathrm{d}\mathbf{f}(t)}{\mathrm{d}t} = \mathbf{F}\, \mathbf{f}(t) + \mathbf{L}\, \mathbf{w}(t),$$
where $\mathbf{w}(t)$ is white noise.
◮ The model now consists of a drift matrix $\mathbf{F} \in \mathbb{R}^{m \times m}$, a diffusion matrix $\mathbf{L} \in \mathbb{R}^{m \times s}$, and the spectral density matrix of the white noise process $\mathbf{Q}_c \in \mathbb{R}^{s \times s}$.
◮ The scalar-valued GP can be recovered by $f(t) = \mathbf{h}^\mathsf{T} \mathbf{f}(t)$.

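State space (path) representation [2/3]
◮ The initial state is given by a stationary state $\mathbf{f}(0) \sim \mathrm{N}(\mathbf{0}, \mathbf{P}_\infty)$ which fulfils
$$\mathbf{F} \mathbf{P}_\infty + \mathbf{P}_\infty \mathbf{F}^\mathsf{T} + \mathbf{L} \mathbf{Q}_c \mathbf{L}^\mathsf{T} = \mathbf{0}$$
◮ The covariance function at the stationary state can be recovered by
$$\kappa(t, t') = \begin{cases} \mathbf{h}^\mathsf{T} \mathbf{P}_\infty \exp((t' - t)\mathbf{F})^\mathsf{T} \mathbf{h}, & t' \geq t \\ \mathbf{h}^\mathsf{T} \exp((t - t')\mathbf{F})\, \mathbf{P}_\infty \mathbf{h}, & t' < t \end{cases}$$
where $\exp(\cdot)$ denotes the matrix exponential function. In both cases the exponent is a non-negative multiple of $\mathbf{F}$, so the covariance decays with $|t - t'|$.
◮ The spectral density function at the stationary state can be recovered by
$$S(\omega) = \mathbf{h}^\mathsf{T} (\mathbf{F} + \mathrm{i}\omega \mathbf{I})^{-1} \mathbf{L} \mathbf{Q}_c \mathbf{L}^\mathsf{T} (\mathbf{F} - \mathrm{i}\omega \mathbf{I})^{-\mathsf{T}} \mathbf{h}$$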
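The three recovery formulas can be tried out on a concrete model. The sketch below assumes the standard state space form of the Matérn $\nu = 3/2$ kernel (the matrices `F`, `L`, `Qc`, `h` are not given on the slide and are stated here as an assumption), solves the Lyapunov equation with SciPy, and compares the recovered covariance and spectral density with the known closed forms. Hyperparameter values and function names are illustrative.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

ell, sigma2 = 1.0, 1.0            # assumed length-scale and magnitude
lam = np.sqrt(3.0) / ell

# Assumed Matern-3/2 model: state is (f, df/dt), so m = 2 and s = 1.
F = np.array([[0.0, 1.0], [-lam ** 2, -2.0 * lam]])
L = np.array([[0.0], [1.0]])
Qc = np.array([[4.0 * lam ** 3 * sigma2]])
h = np.array([1.0, 0.0])

# Stationary covariance: solve F P + P F^T + L Qc L^T = 0.
Pinf = solve_continuous_lyapunov(F, -L @ Qc @ L.T)

def kernel_from_state_space(tau):
    """kappa for lag tau; |tau| is used because the scalar kernel is symmetric."""
    return h @ Pinf @ expm(abs(tau) * F).T @ h

def kernel_matern32(tau):
    """Closed-form Matern-3/2 covariance for comparison."""
    r = lam * abs(tau)
    return sigma2 * (1.0 + r) * np.exp(-r)

def spectral_density(omega):
    """S(omega) = h^T (F + iwI)^-1 L Qc L^T (F - iwI)^-T h."""
    eye = np.eye(F.shape[0])
    left = np.linalg.inv(F + 1j * omega * eye)
    right = np.linalg.inv(F - 1j * omega * eye).T
    return (h @ left @ L @ Qc @ L.T @ right @ h).real

print(Pinf)
print(kernel_from_state_space(0.7), kernel_matern32(0.7))
print(spectral_density(1.3), 4.0 * lam ** 3 * sigma2 / (lam ** 2 + 1.3 ** 2) ** 2)
```

For this model the Lyapunov solution should come out as $\mathrm{diag}(\sigma^2, \lambda^2 \sigma^2)$.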

State space (path) representation [3/3]
◮ Just as the kernel has to be evaluated into a covariance matrix for computations, the SDE can be solved at discrete time points $\{t_i\}_{i=1}^n$.
◮ The resulting model is a discrete state space model:
$$\mathbf{f}_i = \mathbf{A}_{i-1} \mathbf{f}_{i-1} + \mathbf{q}_{i-1}, \qquad \mathbf{q}_i \sim \mathrm{N}(\mathbf{0}, \mathbf{Q}_i),$$
where $\mathbf{f}_i = \mathbf{f}(t_i)$.
◮ The discrete-time model matrices are given by:
$$\mathbf{A}_i = \exp(\mathbf{F} \Delta t_i),$$
$$\mathbf{Q}_i = \int_0^{\Delta t_i} \exp(\mathbf{F}(\Delta t_i - \tau))\, \mathbf{L} \mathbf{Q}_c \mathbf{L}^\mathsf{T} \exp(\mathbf{F}(\Delta t_i - \tau))^\mathsf{T}\, \mathrm{d}\tau,$$
where $\Delta t_i = t_{i+1} - t_i$.
◮ If the model is stationary, $\mathbf{Q}_i$ is given by
$$\mathbf{Q}_i = \mathbf{P}_\infty - \mathbf{A}_i \mathbf{P}_\infty \mathbf{A}_i^\mathsf{T}$$
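A scalar instance makes the discretisation concrete. The sketch below assumes a one-dimensional model $\mathrm{d}f/\mathrm{d}t = -\lambda f + w$ with white noise spectral density `qc`, for which the Lyapunov equation gives $P_\infty = q_c / (2\lambda)$. It computes $A_i$ and $Q_i$ on an irregular time grid using the stationary shortcut, and checks one $Q_i$ against a midpoint-rule evaluation of the integral. The values of `lam` and `qc` and the helper names are assumptions for illustration.

```python
import math

def discretize_scalar(times, lam, qc):
    """Return [(A_i, Q_i)] for consecutive time points of a scalar LTI SDE."""
    p_inf = qc / (2.0 * lam)            # solves -2*lam*P + qc = 0
    steps = []
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        a = math.exp(-lam * dt)         # A_i = exp(F * dt) with F = -lam
        q = p_inf - a * p_inf * a       # stationary shortcut for Q_i
        steps.append((a, q))
    return steps

def q_by_quadrature(dt, lam, qc, n=2000):
    """Midpoint-rule evaluation of the integral definition of Q_i."""
    width = dt / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * width
        phi = math.exp(-lam * (dt - tau))
        total += phi * qc * phi
    return total * width

lam = 0.5
qc = 2.0 * lam                          # chosen so that P_inf = 1
times = [0.0, 0.3, 1.0, 1.2, 2.5]       # irregular sampling is fine
for (a, q), dt in zip(discretize_scalar(times, lam, qc),
                      [t1 - t0 for t0, t1 in zip(times[:-1], times[1:])]):
    print(dt, a, q, q_by_quadrature(dt, lam, qc))
```

Longer gaps give a smaller $A_i$ and a $Q_i$ closer to $P_\infty$: the state forgets its past and reverts to the stationary prior.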

[Figure: Three views into GPs. Panels: covariance function $\kappa(\tau)$ against $\tau = t - t'$; spectral density function $S(\omega)$ against $\omega$; sample functions, output $f(t)$ against input $t$.]

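Example: Exponential covariance function
◮ Exponential covariance function (Ornstein–Uhlenbeck process):
$$\kappa(t, t') = \exp(-\lambda |t - t'|)$$
◮ Spectral density function:
$$S(\omega) = \frac{2}{\lambda + \omega^2 / \lambda}$$
◮ Path representation: Stochastic differential equation (SDE)
$$\frac{\mathrm{d}f(t)}{\mathrm{d}t} = -\lambda f(t) + w(t),$$
or using the notation from before: $F = -\lambda$, $L = 1$, $Q_c = 2\lambda$, $h = 1$, and $P_\infty = 1$. The value $Q_c = 2\lambda$ follows from the stationarity condition $F P_\infty + P_\infty F^\mathsf{T} + L Q_c L^\mathsf{T} = -2\lambda + Q_c = 0$, and it reproduces the spectral density above, since $S(\omega) = Q_c / (\lambda^2 + \omega^2)$.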

Examples of applicable GP priors

Applicable GP priors
◮ The covariance function needs to be Markovian (or approximated as such).
◮ Covers many common stationary and non-stationary models.
◮ Sums of kernels: $\kappa(t, t') = \kappa_1(t, t') + \kappa_2(t, t')$
  • Stacking of the state spaces
  • State dimension: $m = m_1 + m_2$
◮ Product of kernels: $\kappa(t, t') = \kappa_1(t, t')\, \kappa_2(t, t')$
  • Kronecker sum of the models
  • State dimension: $m = m_1 m_2$
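The sum and product rules can be sketched for stationary models. The example below assumes each model is summarised by a triple `(F, Pinf, h)`, which is enough to evaluate the stationary covariance $\mathbf{h}^\mathsf{T} \mathbf{P}_\infty \exp(|\tau| \mathbf{F})^\mathsf{T} \mathbf{h}$. It combines an exponential (Ornstein–Uhlenbeck) model with a Matérn-3/2 model. The Matérn matrices, the helper names, and the parameter values are assumptions for illustration, not taken from the slides.

```python
import numpy as np
from scipy.linalg import block_diag, expm

def ou(lam):
    """Exponential kernel with unit variance: returns (F, Pinf, h), m = 1."""
    return np.array([[-lam]]), np.array([[1.0]]), np.array([1.0])

def matern32(lam, sigma2=1.0):
    """Assumed Matern-3/2 model: returns (F, Pinf, h), m = 2."""
    F = np.array([[0.0, 1.0], [-lam ** 2, -2.0 * lam]])
    return F, np.diag([sigma2, lam ** 2 * sigma2]), np.array([1.0, 0.0])

def kernel(F, Pinf, h, tau):
    """Stationary covariance recovered from the state space model."""
    return h @ Pinf @ expm(abs(tau) * F).T @ h

def sum_model(model1, model2):
    """Sum of kernels: stack the two state spaces block-diagonally."""
    (F1, P1, h1), (F2, P2, h2) = model1, model2
    return block_diag(F1, F2), block_diag(P1, P2), np.concatenate([h1, h2])

def product_model(model1, model2):
    """Product of kernels: Kronecker sum of drifts, Kronecker products elsewhere."""
    (F1, P1, h1), (F2, P2, h2) = model1, model2
    eye1, eye2 = np.eye(F1.shape[0]), np.eye(F2.shape[0])
    F = np.kron(F1, eye2) + np.kron(eye1, F2)
    return F, np.kron(P1, P2), np.kron(h1, h2)

m_ou, m_mat = ou(0.7), matern32(1.3)
tau = 0.5
print(kernel(*sum_model(m_ou, m_mat), tau), kernel(*m_ou, tau) + kernel(*m_mat, tau))
print(kernel(*product_model(m_ou, m_mat), tau), kernel(*m_ou, tau) * kernel(*m_mat, tau))
```

Here the sum model has state dimension $1 + 2 = 3$ and the product model $1 \cdot 2 = 2$, matching the dimension rules in the list.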

Example: GP regression, $O(n^3)$

Example: GP regression, $O(n^3)$
◮ Consider the GP regression problem with input–output training pairs $\{(t_i, y_i)\}_{i=1}^n$:
$$f(t) \sim \mathcal{GP}(0, \kappa(t, t')), \qquad y_i = f(t_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{N}(0, \sigma_n^2)$$
◮ The posterior mean and variance for an unseen test input $t_*$ are given by (see previous lectures):
$$\mathrm{E}[f_*] = \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y}, \qquad \mathrm{V}[f_*] = K_{**} - \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_*^\mathsf{T}$$
◮ Note the inversion of the $n \times n$ matrix.
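For reference, the batch solution can be sketched in a few lines of NumPy. This is a minimal illustration assuming the exponential kernel with unit variance. The function names and the Cholesky-based solve are choices made for this example, and the factorisation of the $n \times n$ matrix is the $O(n^3)$ step.

```python
import numpy as np

def kernel_exp(t1, t2, lam=1.0):
    """Exponential kernel matrix between two 1-D arrays of time points."""
    return np.exp(-lam * np.abs(t1[:, None] - t2[None, :]))

def gp_regression(t, y, t_star, noise_var, lam=1.0):
    """Posterior mean and variance of f at t_star (naive batch solution)."""
    K = kernel_exp(t, t, lam)
    k_star = kernel_exp(t_star, t, lam)
    K_ss = kernel_exp(t_star, t_star, lam)

    # Cholesky factorisation of the n x n matrix: the cubic-cost step.
    chol = np.linalg.cholesky(K + noise_var * np.eye(len(t)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))

    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

t = np.array([0.0, 0.4, 1.1, 1.5, 3.0])
y = np.array([0.3, -0.1, 0.5, 0.9, -0.4])
mean, var = gp_regression(t, y, np.linspace(0.0, 3.0, 7), noise_var=0.1, lam=0.8)
print(mean)
print(var)
```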


Example: GP regression, $O(n)$
◮ The sequential solution (goes under the name 'Kalman filter') considers one data point at a time, hence the linear time-scaling.
◮ Start from $\mathbf{m}_0 = \mathbf{0}$ and $\mathbf{P}_0 = \mathbf{P}_\infty$ and for each data point iterate the following steps.
◮ Kalman prediction:
$$\mathbf{m}_{i|i-1} = \mathbf{A}_{i-1} \mathbf{m}_{i-1|i-1}, \qquad \mathbf{P}_{i|i-1} = \mathbf{A}_{i-1} \mathbf{P}_{i-1|i-1} \mathbf{A}_{i-1}^\mathsf{T} + \mathbf{Q}_{i-1}.$$
◮ Kalman update:
$$\begin{aligned} v_i &= y_i - \mathbf{h}^\mathsf{T} \mathbf{m}_{i|i-1}, \\ S_i &= \mathbf{h}^\mathsf{T} \mathbf{P}_{i|i-1} \mathbf{h} + \sigma_n^2, \\ \mathbf{K}_i &= \mathbf{P}_{i|i-1} \mathbf{h}\, S_i^{-1}, \\ \mathbf{m}_{i|i} &= \mathbf{m}_{i|i-1} + \mathbf{K}_i v_i, \\ \mathbf{P}_{i|i} &= \mathbf{P}_{i|i-1} - \mathbf{K}_i S_i \mathbf{K}_i^\mathsf{T}. \end{aligned}$$
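The prediction and update steps translate almost line by line into code. The sketch below assumes the exponential (Ornstein–Uhlenbeck) prior with unit variance, so the state dimension is $m = 1$, but it keeps the matrix form of the recursion. It also assumes that the stationary prior applies at the first observation time, so the first prediction step uses $\Delta t = 0$. The log marginal likelihood is accumulated from the innovations $v_i$ and $S_i$. All names and data values are illustrative.

```python
import numpy as np

def kalman_filter(t, y, lam, noise_var):
    """Kalman filter for GP regression with an exponential-kernel prior."""
    h = np.array([1.0])
    Pinf = np.array([[1.0]])
    m, P = np.zeros(1), Pinf.copy()        # m_0 = 0, P_0 = P_inf
    means, covs, log_lik = [], [], 0.0
    t_prev = t[0]                          # assumption: prior holds at t_1
    for ti, yi in zip(t, y):
        # Discretise the SDE over the gap since the previous point.
        A = np.array([[np.exp(-lam * (ti - t_prev))]])
        Q = Pinf - A @ Pinf @ A.T
        # Prediction step.
        m = A @ m
        P = A @ P @ A.T + Q
        # Update step.
        v = yi - h @ m
        S = h @ P @ h + noise_var
        K = P @ h / S
        m = m + K * v
        P = P - np.outer(K, K) * S
        log_lik -= 0.5 * (np.log(2.0 * np.pi * S) + v * v / S)
        means.append(m)
        covs.append(P)
        t_prev = ti
    return np.array(means), np.array(covs), log_lik

t = np.array([0.0, 0.4, 1.1, 1.5, 3.0])
y = np.array([0.3, -0.1, 0.5, 0.9, -0.4])
means, covs, log_lik = kalman_filter(t, y, lam=0.8, noise_var=0.1)
print(means[:, 0], covs[:, 0, 0], log_lik)
```

Each iteration costs $O(m^3)$ regardless of $n$, so the full pass is linear in the number of data points. The filtered estimate at the last time point already conditions on all data.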

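Example: GP regression, $O(n)$
◮ To condition all time-marginals on all data, run a backward sweep (Rauch–Tung–Striebel smoother):
$$\begin{aligned} \mathbf{m}_{i+1|i} &= \mathbf{A}_i \mathbf{m}_{i|i}, \\ \mathbf{P}_{i+1|i} &= \mathbf{A}_i \mathbf{P}_{i|i} \mathbf{A}_i^\mathsf{T} + \mathbf{Q}_i, \\ \mathbf{G}_i &= \mathbf{P}_{i|i} \mathbf{A}_i^\mathsf{T} \mathbf{P}_{i+1|i}^{-1}, \\ \mathbf{m}_{i|n} &= \mathbf{m}_{i|i} + \mathbf{G}_i (\mathbf{m}_{i+1|n} - \mathbf{m}_{i+1|i}), \\ \mathbf{P}_{i|n} &= \mathbf{P}_{i|i} + \mathbf{G}_i (\mathbf{P}_{i+1|n} - \mathbf{P}_{i+1|i}) \mathbf{G}_i^\mathsf{T}. \end{aligned}$$
◮ The marginal mean and variance can be recovered by:
$$\mathrm{E}[f_i] = \mathbf{h}^\mathsf{T} \mathbf{m}_{i|n}, \qquad \mathrm{V}[f_i] = \mathbf{h}^\mathsf{T} \mathbf{P}_{i|n} \mathbf{h}$$
◮ The log marginal likelihood can be evaluated as a by-product of the Kalman update:
$$\log p(\mathbf{y}) = -\frac{1}{2} \sum_{i=1}^n \left( \log |2\pi S_i| + v_i^\mathsf{T} S_i^{-1} v_i \right)$$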
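The backward sweep can be sketched together with the forward pass it depends on. As an assumption for this example, the prior is again the exponential (Ornstein–Uhlenbeck) kernel with unit variance and the stationary prior is placed at the first observation time. The result is compared with the batch GP posterior at the training inputs. Function names and data values are illustrative.

```python
import numpy as np

def filter_and_smooth(t, y, lam, noise_var):
    """Kalman filter followed by a Rauch-Tung-Striebel smoother (m = 1)."""
    n = len(t)
    h = np.array([1.0])
    Pinf = np.array([[1.0]])
    mf, Pf = np.zeros((n, 1)), np.zeros((n, 1, 1))

    # Forward pass: filtered means and covariances.
    m, P = np.zeros(1), Pinf.copy()
    for i in range(n):
        dt = t[i] - t[i - 1] if i > 0 else 0.0
        A = np.array([[np.exp(-lam * dt)]])
        m = A @ m
        P = A @ P @ A.T + (Pinf - A @ Pinf @ A.T)
        S = h @ P @ h + noise_var
        K = P @ h / S
        m = m + K * (y[i] - h @ m)
        P = P - np.outer(K, K) * S
        mf[i], Pf[i] = m, P

    # Backward pass: condition every time-marginal on all data.
    ms, Ps = mf.copy(), Pf.copy()
    for i in range(n - 2, -1, -1):
        A = np.array([[np.exp(-lam * (t[i + 1] - t[i]))]])
        Q = Pinf - A @ Pinf @ A.T
        m_pred = A @ mf[i]
        P_pred = A @ Pf[i] @ A.T + Q
        G = Pf[i] @ A.T @ np.linalg.inv(P_pred)
        ms[i] = mf[i] + G @ (ms[i + 1] - m_pred)
        Ps[i] = Pf[i] + G @ (Ps[i + 1] - P_pred) @ G.T

    # Marginals: E[f_i] = h^T m_{i|n}, V[f_i] = h^T P_{i|n} h.
    return ms @ h, Ps @ h @ h

t = np.array([0.0, 0.4, 1.1, 1.5, 3.0])
y = np.array([0.3, -0.1, 0.5, 0.9, -0.4])
mean_s, var_s = filter_and_smooth(t, y, lam=0.8, noise_var=0.1)

# Batch GP posterior at the training inputs, for comparison.
K = np.exp(-0.8 * np.abs(t[:, None] - t[None, :]))
C = K + 0.1 * np.eye(len(t))
print(mean_s, K @ np.linalg.solve(C, y))
print(var_s, np.diag(K - K @ np.linalg.solve(C, K)))
```

For this Markovian prior the smoothed marginals should coincide with the batch solution up to rounding, while the cost stays linear in $n$.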


Basic regression example
◮ Number of births in the US (from BDA3 by Gelman et al.)
◮ Daily data between 1969–1988 ($n = 7305$)
◮ GP regression with a prior covariance function:
$$\begin{aligned} \kappa(t, t') ={} & \kappa_{\text{Mat.}}^{\nu=5/2}(t, t') + \kappa_{\text{Mat.}}^{\nu=3/2}(t, t') \\ & + \kappa_{\text{Per.}}^{\text{year}}(t, t')\, \kappa_{\text{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\text{Per.}}^{\text{week}}(t, t')\, \kappa_{\text{Mat.}}^{\nu=3/2}(t, t') \end{aligned}$$
◮ Learn hyperparameters by optimizing the marginal likelihood

[Figure: Explaining changes in number of births in the US]

Connection to banded precision matrices

Precision matrices
◮ Covariance (Gram) matrix: $\mathbf{K} = \kappa(\mathbf{X}, \mathbf{X})$
◮ Precision matrix: $\mathbf{Q} = \mathbf{K}^{-1}$
[Figure: heat maps of $\mathbf{K} = k(\mathbf{X}, \mathbf{X})$ and $\mathbf{Q} = k(\mathbf{X}, \mathbf{X})^{-1}$]
◮ For Markovian models the precision is sparse! (block tri-diagonal)
◮ See Durrande et al. (2019)
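The sparsity claim is easy to check numerically. The sketch below assumes the exponential kernel (a Markovian model with state dimension 1, so the blocks are $1 \times 1$) on seven sorted, equally spaced inputs. It inverts the dense Gram matrix and looks at the entries outside the tri-diagonal band. The grid and variable names are illustrative.

```python
import numpy as np

t = np.linspace(0.0, 6.0, 7)                     # sorted inputs
K = np.exp(-np.abs(t[:, None] - t[None, :]))     # dense covariance matrix
Q = np.linalg.inv(K)                             # precision matrix

# Largest magnitude outside the tri-diagonal band (second diagonal and beyond).
off_band = np.abs(np.triu(Q, k=2)).max()

np.set_printoptions(precision=3, suppress=True)
print(Q)
print("max |Q_ij| for |i - j| >= 2:", off_band)
```

The covariance matrix is dense, but every entry of the precision matrix outside the band should vanish up to rounding error.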
