8 Approximate inference in switching linear dynamical systems using Gaussian mixtures

David Barber

8.1 Introduction

The linear dynamical system (LDS) (see Section 1.3.2) is a standard time series model in which a latent linear process generates the observations. Complex time series which are not well described globally by a single LDS may be divided into segments, each modelled by a potentially different LDS. Such models can handle situations in which the underlying model 'jumps' from one parameter setting to another. For example, a single LDS might well represent the normal flows in a chemical plant. However, if there is a break in a pipeline, the dynamics of the system changes from one set of linear flow equations to another. This scenario could be modelled by two sets of linear systems, each with different parameters, with a discrete latent variable $s_t \in \{\text{normal}, \text{pipe broken}\}$ at each time indicating which of the LDSs is most appropriate at the current time. This is called a switching LDS (SLDS) and is used in many disciplines, from econometrics to machine learning [2, 9, 15, 13, 12, 6, 5, 19, 21, 16].

8.2 The switching linear dynamical system

At each time $t$, a switch variable $s_t \in \{1, \dots, S\}$ describes which of a set of LDSs is to be used. The observation (or 'visible') variable $v_t \in \mathbb{R}^V$ is linearly related to the hidden state $h_t \in \mathbb{R}^H$ by

$$v_t = B(s_t)h_t + \eta^v(s_t), \qquad \eta^v(s_t) \sim \mathcal{N}\left(\eta^v(s_t) \mid \bar{v}(s_t), \Sigma^v(s_t)\right). \qquad (8.1)$$

Here $s_t$ describes which of the set of emission matrices $B(1), \dots, B(S)$ is active at time $t$. The observation noise $\eta^v(s_t)$ is drawn from one of a set of Gaussians with different means $\bar{v}(s_t)$ and covariances $\Sigma^v(s_t)$. The transition of the continuous hidden state $h_t$ is linear,

$$h_t = A(s_t)h_{t-1} + \eta^h(s_t), \qquad \eta^h(s_t) \sim \mathcal{N}\left(\eta^h(s_t) \mid \bar{h}(s_t), \Sigma^h(s_t)\right), \qquad (8.2)$$

and the switch variable $s_t$ selects a single transition matrix from the available set $A(1), \dots, A(S)$. The Gaussian transition noise $\eta^h(s_t)$ also depends on the switch variables. The dynamics of $s_t$ itself is Markovian, with transition $p(s_t|s_{t-1})$. For the more general 'augmented' (aSLDS) model the switch $s_t$ is dependent on both the previous $s_{t-1}$ and $h_{t-1}$.



Figure 8.1 The independence structure of the aSLDS. Square nodes $s_t$ denote discrete switch variables; $h_t$ are continuous latent/hidden variables, and $v_t$ continuous observed/visible variables. The discrete state $s_t$ determines which linear dynamical system from a finite set of linear dynamical systems is operational at time $t$. In the SLDS, links from $h$ to $s$ are not normally considered.

The probabilistic model defines a joint distribution (see Fig. 8.1)

$$p(v_{1:T}, h_{1:T}, s_{1:T}) = \prod_{t=1}^{T} p(v_t|h_t, s_t)\, p(h_t|h_{t-1}, s_t)\, p(s_t|h_{t-1}, s_{t-1}),$$

with

$$p(v_t|h_t, s_t) = \mathcal{N}\left(v_t \mid \bar{v}(s_t) + B(s_t)h_t, \Sigma^v(s_t)\right), \qquad p(h_t|h_{t-1}, s_t) = \mathcal{N}\left(h_t \mid \bar{h}(s_t) + A(s_t)h_{t-1}, \Sigma^h(s_t)\right).$$

At time $t = 1$, $p(s_1|h_0, s_0)$ denotes the prior $p(s_1)$, and $p(h_1|h_0, s_1)$ denotes $p(h_1|s_1)$. The SLDS can be thought of as a marriage between a hidden Markov model and an LDS. The SLDS is also called a jump Markov model/process, switching Kalman filter, switching linear Gaussian state space model, or conditional linear Gaussian model.
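To make the generative process concrete, the following is a minimal sketch of ancestral sampling from a standard SLDS. The parameter container `theta`, the helper name `sample_slds` and the choice of $\mathcal{N}(\bar h(s_1), \Sigma^h(s_1))$ as the prior $p(h_1|s_1)$ are illustrative assumptions of this sketch, not the chapter's notation:

```python
import numpy as np

def sample_slds(theta, pi1, trans, T, rng=None):
    """Ancestral sampling from a standard SLDS (no links from h to s).

    theta: dict with per-state lists A, B, Sigma_h, Sigma_v, h_bar, v_bar
           (a hypothetical container). pi1: prior p(s_1).
    trans: S x S matrix with trans[s_old, s_new] = p(s_t = s_new | s_{t-1} = s_old)."""
    rng = rng or np.random.default_rng()
    S = len(pi1)
    s = rng.choice(S, p=pi1)
    h = rng.multivariate_normal(theta["h_bar"][s], theta["Sigma_h"][s])
    ss, hs, vs = [], [], []
    for t in range(T):
        if t > 0:
            s = rng.choice(S, p=trans[s])                                # p(s_t | s_{t-1})
            h = rng.multivariate_normal(
                theta["h_bar"][s] + theta["A"][s] @ h, theta["Sigma_h"][s])  # Eq. (8.2)
        v = rng.multivariate_normal(
            theta["v_bar"][s] + theta["B"][s] @ h, theta["Sigma_v"][s])      # Eq. (8.1)
        ss.append(s); hs.append(h); vs.append(v)
    return np.array(ss), np.array(hs), np.array(vs)
```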

8.2.1 Exact inference is computationally intractable

Performing exact filtered and smoothed inference in the SLDS is intractable, scaling exponentially with time, see for example [16]. As an informal explanation, consider filtered posterior inference, for which, by analogy with Section 1.4.1, the forward pass is

$$p(s_{t+1}, h_{t+1}|v_{1:t+1}) = \sum_{s_t} \int_{h_t} p(s_{t+1}, h_{t+1}|s_t, h_t, v_{t+1})\, p(s_t, h_t|v_{1:t}). \qquad (8.3)$$

At time step 1, $p(s_1, h_1|v_1) = p(h_1|s_1, v_1)p(s_1|v_1)$ is an indexed Gaussian. At time step 2, due to the summation over the states $s_1$, $p(s_2, h_2|v_{1:2})$ is an indexed set of $S$ Gaussians. In general, at time $t$, $p(s_t, h_t|v_{1:t})$ is an indexed set of $S^{t-1}$ Gaussians. Even for small $t$, the number of components required to exactly represent the filtered distribution is computationally intractable. Analogously, smoothing is also intractable.

The origin of the intractability of the SLDS therefore differs from 'structural intractability' since, in terms of the cluster variables $x_{1:T}$ with $x_t \equiv (s_t, h_t)$ and visible variables $v_{1:T}$, the graph of the distribution is singly connected. From a purely graph-theoretic viewpoint, one would therefore envisage no difficulty in carrying out inference. Indeed, as we saw above, the derivation of the filtering algorithm is straightforward since the graph of $p(x_{1:T}, v_{1:T})$ is singly connected. However, the numerical representation of the messages requires an exponentially increasing number of terms. In order to deal with this intractability, several approximation schemes have been introduced [8, 9, 15, 13, 12]. Here we focus on techniques which approximate the switch conditional posteriors using a limited mixture of Gaussians. Since the exact posterior distributions are mixtures of Gaussians, but with an exponentially large number of components, the aim is to drop low-weight components such that the resulting limited number of Gaussians still accurately represents the posterior.



8.3 Gaussian sum filtering

Equation (8.3) describes the exact filtering recursion. Whilst the number of mixture components increases exponentially with time, intuitively one would expect that there is an effective time scale over which the previous visible information is relevant. In general, the influence of ancient observations will be much less relevant than that of recent observations. This suggests that a limited number of components in the Gaussian mixture should suffice to accurately represent the filtered posterior [1].

Our aim is to form a recursion for $p(s_t, h_t|v_{1:t})$ using a Gaussian mixture approximation of $p(h_t|s_t, v_{1:t})$. Given an approximation of the filtered distribution $p(s_t, h_t|v_{1:t}) \approx q(s_t, h_t|v_{1:t})$, the exact recursion (8.3) is approximated by

$$q(s_{t+1}, h_{t+1}|v_{1:t+1}) = \sum_{s_t} \int_{h_t} p(s_{t+1}, h_{t+1}|s_t, h_t, v_{t+1})\, q(s_t, h_t|v_{1:t}). \qquad (8.4)$$

This approximation to the filtered posterior at the next time step will contain $S$ times more components than at the previous time step. Therefore, to prevent an exponential explosion in mixture components, we need to collapse this mixture in a suitable way. We will deal with this once the new mixture representation for the filtered posterior has been computed. To derive the updates it is useful to break the filtered approximation from Eq. (8.4) into continuous and discrete parts,

$$q(h_t, s_t|v_{1:t}) = q(h_t|s_t, v_{1:t})\, q(s_t|v_{1:t}), \qquad (8.5)$$

and derive separate filtered update formulae, as described below. An important remark is that many techniques approximate $p(h_t|s_t, v_{1:t})$ using a single Gaussian. Naturally, this gives rise to a mixture of Gaussians for $p(h_t|v_{1:t})$. However, in making a single Gaussian approximation to $p(h_t|s_t, v_{1:t})$ the representation of the posterior may be poor. Our aim here is to maintain an accurate approximation to $p(h_t|s_t, v_{1:t})$ by using a mixture of Gaussians.

8.3.1 Continuous filtering

The exact representation of $p(h_t|s_t, v_{1:t})$ is a mixture with $S^{t-1}$ components. To retain computational feasibility we approximate this with a limited $I$-component mixture

$$q(h_t|s_t, v_{1:t}) = \sum_{i_t=1}^{I} q(h_t|i_t, s_t, v_{1:t})\, q(i_t|s_t, v_{1:t}),$$

where $q(h_t|i_t, s_t, v_{1:t})$ is a Gaussian parameterised with mean $f(i_t, s_t)$ and covariance $F(i_t, s_t)$. Strictly speaking, we should use the notation $f_t(i_t, s_t)$ since, for each time $t$, we have a set of means indexed by $i_t, s_t$, but we drop these dependencies in the notation used here.

To find a recursion for the approximating distribution, we first assume that we know the filtered approximation $q(h_t, s_t|v_{1:t})$ and then propagate this forwards using the exact dynamics. To do so, consider first the exact relation

$$q(h_{t+1}|s_{t+1}, v_{1:t+1}) = \sum_{s_t, i_t} q(h_{t+1}, s_t, i_t|s_{t+1}, v_{1:t+1}) = \sum_{s_t, i_t} q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1})\, q(s_t, i_t|s_{t+1}, v_{1:t+1}). \qquad (8.6)$$



Wherever possible we substitute the exact dynamics and evaluate each of the two factors above. By decomposing the update in this way, the new filtered approximation is of the form of a Gaussian mixture, where $q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1})$ is Gaussian and $q(s_t, i_t|s_{t+1}, v_{1:t+1})$ are the weights or mixing proportions of the components. We describe below how to compute these terms explicitly. Equation (8.6) produces a new Gaussian mixture with $I \times S$ components, which we collapse back to $I$ components at the end of the computation.

Evaluating $q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1})$

We aim to find a filtering recursion for $q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1})$. Since this is conditional on switch states and components, this corresponds to a single LDS forward step, which can be evaluated by considering first the joint distribution

$$q(h_{t+1}, v_{t+1}|s_t, i_t, s_{t+1}, v_{1:t}) = \int_{h_t} p(h_{t+1}, v_{t+1}|h_t, s_{t+1}, v_{1:t})\, q(h_t|s_t, i_t, v_{1:t}),$$

where $s_t, i_t$ have been dropped from the first factor and $s_{t+1}$ from the second by the conditional independencies of the model, and subsequently conditioning on $v_{t+1}$. To ease the burden on notation we assume $\bar{h}_t, \bar{v}_t \equiv 0$ for all $t$. By propagating $q(h_t|v_{1:t}, i_t, s_t) = \mathcal{N}\left(h_t \mid f(i_t, s_t), F(i_t, s_t)\right)$ with the dynamics (8.1) and (8.2), we obtain that $q(h_{t+1}, v_{t+1}|s_t, i_t, s_{t+1}, v_{1:t})$ is a Gaussian with covariance and mean elements

$$\Sigma_{hh} = A(s_{t+1})F(i_t, s_t)A^{\mathsf{T}}(s_{t+1}) + \Sigma^h(s_{t+1}), \qquad \Sigma_{vv} = B(s_{t+1})\Sigma_{hh}B^{\mathsf{T}}(s_{t+1}) + \Sigma^v(s_{t+1}),$$
$$\Sigma_{vh} = B(s_{t+1})\Sigma_{hh} = \Sigma_{hv}^{\mathsf{T}}, \qquad \mu_v = B(s_{t+1})A(s_{t+1})f(i_t, s_t), \qquad \mu_h = A(s_{t+1})f(i_t, s_t). \qquad (8.7)$$

These results are obtained from integrating the exact dynamics over $h_t$, using the results in Section 1.4.7. To find $q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1})$ we condition $q(h_{t+1}, v_{t+1}|s_t, i_t, s_{t+1}, v_{1:t})$ on $v_{t+1}$ using the standard Gaussian conditioning formula, Section 1.4.7, to obtain $q(h_{t+1}|s_t, i_t, s_{t+1}, v_{1:t+1}) = \mathcal{N}\left(h_{t+1} \mid \mu_{h|v}, \Sigma_{h|v}\right)$ with

$$\mu_{h|v} = \mu_h + \Sigma_{hv}\Sigma_{vv}^{-1}\left(v_{t+1} - \mu_v\right), \qquad \Sigma_{h|v} = \Sigma_{hh} - \Sigma_{hv}\Sigma_{vv}^{-1}\Sigma_{vh}.$$
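The following is a minimal NumPy sketch of this single LDS forward step, under the zero-mean assumption used in the text; the helper name `lds_forward_step` stands in for the chapter's LDSFORWARD routine and is not from the source:

```python
import numpy as np

def lds_forward_step(f, F, v_next, A, B, Sigma_h, Sigma_v):
    """One LDS forward step for a fixed component (i_t, s_t) and switch s_{t+1}:
    propagate N(h_t | f, F) through the dynamics to get Eq. (8.7), then
    condition on v_{t+1}. Also returns log q(v_{t+1} | i_t, s_t, s_{t+1}, v_{1:t})."""
    # Joint statistics of (h_{t+1}, v_{t+1}) given the past, Eq. (8.7)
    Shh = A @ F @ A.T + Sigma_h
    Svv = B @ Shh @ B.T + Sigma_v
    Svh = B @ Shh                        # = Shv.T
    mu_h = A @ f
    mu_v = B @ mu_h
    # Condition on v_{t+1} via the standard Gaussian conditioning formula
    K = np.linalg.solve(Svv, Svh).T      # Shv @ inv(Svv), Svv symmetric
    mu_hv = mu_h + K @ (v_next - mu_v)
    S_hv = Shh - K @ Svh
    # Log-density of v_{t+1} under N(mu_v, Svv): this component's evidence
    d = v_next - mu_v
    _, logdet = np.linalg.slogdet(2 * np.pi * Svv)
    logp = -0.5 * (d @ np.linalg.solve(Svv, d) + logdet)
    return mu_hv, S_hv, logp
```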

Evaluating the mixture weights $q(s_t, i_t|s_{t+1}, v_{1:t+1})$

The mixture weight in Eq. (8.6) can be found from

$$q(s_t, i_t|s_{t+1}, v_{1:t+1}) \propto q(v_{t+1}|i_t, s_t, s_{t+1}, v_{1:t})\, q(s_{t+1}|i_t, s_t, v_{1:t})\, q(i_t|s_t, v_{1:t})\, q(s_t|v_{1:t}).$$

The factor $q(v_{t+1}|i_t, s_t, s_{t+1}, v_{1:t})$ is Gaussian with mean $\mu_v$ and covariance $\Sigma_{vv}$, as given in Eq. (8.7). The factors $q(i_t|s_t, v_{1:t})$ and $q(s_t|v_{1:t})$ are given from the previous filtered iteration. Finally, $q(s_{t+1}|i_t, s_t, v_{1:t})$ is found from (where angled brackets denote expectation)

$$q(s_{t+1}|i_t, s_t, v_{1:t}) = \begin{cases} \left\langle p(s_{t+1}|h_t, s_t) \right\rangle_{q(h_t|i_t, s_t, v_{1:t})} & \text{augmented SLDS}, \\ p(s_{t+1}|s_t) & \text{standard SLDS}, \end{cases} \qquad (8.8)$$



Figure 8.2 Gaussian sum filtering. The leftmost column depicts the previous Gaussian mixture approximation $q(h_t, s_t|v_{1:t})$ for two states $S = 2$ (dashed and solid) and three mixture components $I = 3$. The area of the oval indicates the weight of the component. There are $S = 2$ different linear systems which take each of the components of the mixture into a new filtered state, the arrow indicating which dynamic system is used. After one time step each mixture component branches into a further $S$ components, so that the joint approximation $q(h_{t+1}, s_{t+1}|v_{1:t+1})$ contains $S^2 I$ components (middle column). To keep the representation computationally tractable, the mixture of Gaussians for each state $s_{t+1}$ is collapsed back to $I$ components. This means that each state needs to be approximated by a smaller $I$-component mixture of Gaussians. There are many ways to achieve this. A naive but computationally efficient approach is to simply ignore the lowest weight components, as depicted in the right column.

where the result above for the standard SLDS follows from the independence assumptions present in the standard SLDS. In the aSLDS, the term in Eq. (8.8) will generally need to be computed numerically. A simple approximation is to evaluate Eq. (8.8) at the mean value of the distribution $q(h_t|i_t, s_t, v_{1:t})$. To take covariance information into account, an alternative would be to draw samples from the Gaussian $q(h_t|i_t, s_t, v_{1:t})$ and thus approximate the average of $p(s_{t+1}|h_t, s_t)$ by sampling. Note that this does not equate Gaussian sum filtering with a sequential sampling procedure, such as particle filtering, Section 1.6.4. The sampling here is exact, for which no convergence issues arise.
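A sketch of these two options (mean evaluation versus exact sampling), under the assumption that the aSLDS supplies a callable switch transition $p(s_{t+1}|h_t, s_t)$; the function and argument names are illustrative:

```python
import numpy as np

def switch_predictive(p_s_given_h, f, F, n_samples=0, rng=None):
    """Approximate <p(s_{t+1}|h_t, s_t)>_{q(h_t|i_t, s_t, v_{1:t})}, Eq. (8.8),
    for the aSLDS. p_s_given_h(h) returns the length-S vector p(s_{t+1}|h_t=h, s_t);
    f, F are the mean and covariance of q(h_t|i_t, s_t, v_{1:t})."""
    if n_samples == 0:
        return p_s_given_h(f)              # simple evaluation at the mean
    rng = rng or np.random.default_rng()
    hs = rng.multivariate_normal(f, F, size=n_samples)   # exact sampling
    return np.mean([p_s_given_h(h) for h in hs], axis=0)
```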

Closing the recursion

We are now in a position to calculate Eq. (8.6). For each setting of the variable $s_{t+1}$, we have a mixture of $I \times S$ Gaussians. To prevent the number of components increasing exponentially, we numerically collapse $q(h_{t+1}|s_{t+1}, v_{1:t+1})$ back to $I$ Gaussians to form

$$q(h_{t+1}|s_{t+1}, v_{1:t+1}) \to \sum_{i_{t+1}=1}^{I} q(h_{t+1}|i_{t+1}, s_{t+1}, v_{1:t+1})\, q(i_{t+1}|s_{t+1}, v_{1:t+1}).$$

Any method of choice may be supplied to collapse a mixture to a smaller mixture. A straightforward approach is to repeatedly merge low-weight components, as explained in Section 8.3.4. In this way the new mixture coefficients $q(i_{t+1}|s_{t+1}, v_{1:t+1})$, $i_{t+1} \in \{1, \dots, I\}$, are defined. This completes the description of how to form a recursion for the continuous filtered posterior approximation $q(h_{t+1}|s_{t+1}, v_{1:t+1})$ in Eq. (8.5).

8.3.2 Discrete filtering

A recursion for the switch variable distribution in Eq. (8.5) is given by

$$q(s_{t+1}|v_{1:t+1}) \propto \sum_{i_t, s_t} q(s_{t+1}, i_t, s_t, v_{t+1}, v_{1:t}).$$

The right-hand side of the above equation is proportional to

$$\sum_{s_t, i_t} q(v_{t+1}|s_{t+1}, i_t, s_t, v_{1:t})\, q(s_{t+1}|i_t, s_t, v_{1:t})\, q(i_t|s_t, v_{1:t})\, q(s_t|v_{1:t}),$$

for which all terms have been computed during the recursion for $q(h_{t+1}|s_{t+1}, v_{1:t+1})$. We now have all the quantities required to compute the Gaussian sum approximation of the filtering forward pass. A schematic representation of Gaussian sum filtering is given in Fig. 8.2 and the pseudo-code is presented in Algorithm 8.1.



Algorithm 8.1 aSLDS forward pass. Approximate the filtered posterior $p(s_t|v_{1:t}) \equiv \alpha_t$, $p(h_t|s_t, v_{1:t}) \equiv \sum_{i_t} w_t(i_t, s_t)\, \mathcal{N}\left(h_t \mid f_t(i_t, s_t), F_t(i_t, s_t)\right)$. Also return the approximate log-likelihood $L \equiv \log p(v_{1:T})$. $I_t$ are the number of components in each Gaussian mixture approximation. We require $I_1 = 1$, $I_2 \le S$, $I_t \le S \times I_{t-1}$. $\theta(s) = \{A(s), B(s), \Sigma^h(s), \Sigma^v(s), \bar{h}(s), \bar{v}(s)\}$. See also Algorithm 1.1.

  for $s_1 \gets 1$ to $S$ do
    $\{f_1(1, s_1), F_1(1, s_1), \hat{p}\} = \text{LDSFORWARD}(0, 0, v_1; \theta(s_1))$
    $\alpha_1 \gets p(s_1)\hat{p}$
  end for
  for $t \gets 2$ to $T$ do
    for $s_t \gets 1$ to $S$ do
      for $i \gets 1$ to $I_{t-1}$, and $s \gets 1$ to $S$ do
        $\{\mu_{x|y}(i, s), \Sigma_{x|y}(i, s), \hat{p}\} = \text{LDSFORWARD}(f_{t-1}(i, s), F_{t-1}(i, s), v_t; \theta(s_t))$
        $p^*(s_t|i, s) \equiv \left\langle p(s_t|h_{t-1}, s_{t-1} = s) \right\rangle_{p(h_{t-1}|i_{t-1}=i, s_{t-1}=s, v_{1:t-1})}$
        $p'(s_t, i, s) \gets w_{t-1}(i, s)\, p^*(s_t|i, s)\, \alpha_{t-1}(s)\, \hat{p}$
      end for
      Collapse the $I_{t-1} \times S$ mixture of Gaussians defined by $\mu_{x|y}$, $\Sigma_{x|y}$ and weights $p(i, s|s_t) \propto p'(s_t, i, s)$ to a mixture with $I_t$ components, $p(h_t|s_t, v_{1:t}) \approx \sum_{i_t=1}^{I_t} p(i_t|s_t, v_{1:t})\, p(h_t|s_t, i_t, v_{1:t})$. This defines the new means $f_t(i_t, s_t)$, covariances $F_t(i_t, s_t)$ and mixture weights $w_t(i_t, s_t) \equiv p(i_t|s_t, v_{1:t})$.
      Compute $\alpha_t(s_t) \propto \sum_{i,s} p'(s_t, i, s)$
    end for
    normalise $\alpha_t$
    $L \gets L + \log \sum_{s_t, i, s} p'(s_t, i, s)$
  end for
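A compact sketch of this forward pass for the standard SLDS (where the term $p^*(s_t|i, s)$ reduces to the Markov transition) follows, assuming zero means, the `lds_forward_step` helper above and a `collapse_mixture` routine of the kind described in Section 8.3.4 below; the names and data layout are assumptions of this sketch, not part of the algorithm's specification:

```python
import numpy as np

def gaussian_sum_filter(v, theta, pi1, trans, I_max):
    """Gaussian sum filtering (sketch of Algorithm 8.1, standard SLDS).
    Returns comps[s] = [(w_t(i,s), f_t(i,s), F_t(i,s)), ...] at the final time,
    the switch posteriors alpha, and the approximate log-likelihood L."""
    T, S, H = len(v), len(pi1), theta["A"][0].shape[0]
    pars = lambda s: (theta["A"][s], theta["B"][s],
                      theta["Sigma_h"][s], theta["Sigma_v"][s])
    # t = 1: one component per state, from LDSFORWARD(0, 0, v_1; theta(s))
    comps, alpha = [], np.zeros(S)
    for s in range(S):
        mu, Sig, logp = lds_forward_step(np.zeros(H), np.zeros((H, H)),
                                         v[0], *pars(s))
        comps.append([(1.0, mu, Sig)])
        alpha[s] = pi1[s] * np.exp(logp)
    L = np.log(alpha.sum()); alpha /= alpha.sum()
    for t in range(1, T):
        new_comps, rho = [], np.zeros(S)
        for s_new in range(S):
            ws, mus, Sigs = [], [], []
            for s in range(S):
                for (w, f, F) in comps[s]:
                    mu, Sig, logp = lds_forward_step(f, F, v[t], *pars(s_new))
                    # p'(s_t, i, s) = w_{t-1}(i,s) p(s_t|s) alpha_{t-1}(s) p_hat
                    ws.append(w * trans[s, s_new] * alpha[s] * np.exp(logp))
                    mus.append(mu); Sigs.append(Sig)
            rho[s_new] = np.sum(ws)              # unnormalised alpha_t(s_new)
            new_comps.append(collapse_mixture(mus, Sigs,
                                              np.array(ws) / rho[s_new], I_max))
        comps, alpha = new_comps, rho / rho.sum()
        L += np.log(rho.sum())
    return comps, alpha, L
```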

8.3.3 The likelihood $p(v_{1:T})$

The likelihood $p(v_{1:T})$ may be found from

$$p(v_{1:T}) = \prod_{t=0}^{T-1} p(v_{t+1}|v_{1:t}), \qquad p(v_{t+1}|v_{1:t}) \approx \sum_{i_t, s_t, s_{t+1}} q(v_{t+1}|i_t, s_t, s_{t+1}, v_{1:t})\, q(s_{t+1}|i_t, s_t, v_{1:t})\, q(i_t|s_t, v_{1:t})\, q(s_t|v_{1:t}).$$

In the above expression, all terms have been computed when forming the recursion for the filtered posterior $q(h_{t+1}, s_{t+1}|v_{1:t+1})$.

8.3.4 Collapsing Gaussians

We wish to collapse a mixture of $N$ Gaussians

$$p(x) = \sum_{i=1}^{N} p_i\, \mathcal{N}\left(x \mid \mu_i, \Sigma_i\right) \qquad (8.9)$$



to a mixture of $K < N$ Gaussians. We present a simple method which has the advantage of computational efficiency, but the disadvantage that no spatial information about the mixture is used [20]. First we describe how to collapse a mixture to a single Gaussian $\mathcal{N}(x \mid \mu, \Sigma)$. This can be achieved by setting $\mu$ and $\Sigma$ to be the mean and covariance of the mixture distribution (8.9), that is

$$\mu = \sum_i p_i \mu_i, \qquad \Sigma = \sum_i p_i\left(\Sigma_i + \mu_i\mu_i^{\mathsf{T}}\right) - \mu\mu^{\mathsf{T}}.$$
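In code, this moment-matching collapse, together with the retain-and-merge heuristic described in the next paragraph, might look as follows (a sketch; the function names are illustrative):

```python
import numpy as np

def collapse_to_single(mus, Sigmas, ws):
    """Moment-match sum_i w_i N(x|mu_i, Sigma_i) to a single Gaussian
    with the mixture's mean and covariance."""
    ws, mus, Sigmas = np.asarray(ws), np.asarray(mus), np.asarray(Sigmas)
    mu = np.einsum('i,ij->j', ws, mus)
    Sigma = np.einsum('i,ijk->jk', ws,
                      Sigmas + np.einsum('ij,ik->ijk', mus, mus)) - np.outer(mu, mu)
    return mu, Sigma

def collapse_mixture(mus, Sigmas, ws, K):
    """Reduce an N-component mixture to at most K components by retaining the
    K-1 largest-weight Gaussians and moment-matching the remainder into one."""
    order = np.argsort(ws)[::-1]
    if len(ws) <= K:
        return [(ws[i], mus[i], Sigmas[i]) for i in order]
    keep, rest = order[:K - 1], order[K - 1:]
    w_rest = sum(ws[i] for i in rest)
    mu, Sig = collapse_to_single([mus[i] for i in rest],
                                 [Sigmas[i] for i in rest],
                                 [ws[i] / w_rest for i in rest])
    return [(ws[i], mus[i], Sigmas[i]) for i in keep] + [(w_rest, mu, Sig)]
```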

To collapse to a $K$-component mixture we may retain the $K - 1$ Gaussians with the largest mixture weights and merge the remaining $N - K + 1$ Gaussians into a single Gaussian using the above method. Alternative heuristics, such as recursively merging the two Gaussians with the lowest mixture weights, are also reasonable. More sophisticated methods which retain some spatial information would clearly be potentially useful. The method presented in [15] is a suitable approach which considers removing Gaussians which are spatially similar, thereby retaining a sense of diversity over the possible solutions.

8.3.5 Relation to other methods

Gaussian sum filtering can be considered a form of 'analytical particle filtering' in which, instead of point distributions (delta functions) being propagated, Gaussians are propagated. The collapse operation to a smaller number of Gaussians is analogous to resampling in particle filtering. Since a Gaussian is more expressive than a delta function, the Gaussian sum filter is generally an improved approximation technique over using point particles. See [3] for a numerical comparison. This Gaussian sum filtering method is an instance of assumed density filtering, see Section 1.5.2.

8.4 Expectation correction

Approximating the smoothed posterior $p(h_t, s_t|v_{1:T})$ is more involved than filtering and requires additional approximations. For this reason smoothing is more prone to failure, since there are more assumptions that need to be satisfied for the approximations to hold. The route we take here is to assume that a Gaussian sum filtered approximation has been carried out, and then approximate the $\gamma$ backward pass, analogous to that of Section 1.4.3. The exact backward pass for the SLDS reads

$$p(h_t, s_t|v_{1:T}) = \sum_{s_{t+1}} \int_{h_{t+1}} p(h_t, s_t|h_{t+1}, s_{t+1}, v_{1:t})\, p(h_{t+1}, s_{t+1}|v_{1:T}), \qquad (8.10)$$

where $p(s_{t+1}|v_{1:T})$ and $p(h_{t+1}|s_{t+1}, v_{1:T})$ are the discrete and continuous components of the smoothed posterior at the next time step. The recursion runs backwards in time, beginning with the initialisation $p(h_T, s_T|v_{1:T})$ set by the filtered result (at time $t = T$, the filtered and smoothed posteriors coincide). Apart from the fact that the number of mixture components will increase at each step, computing the integral over $h_{t+1}$ in Eq. (8.10) is problematic since the conditional distribution term is non-Gaussian in $h_{t+1}$. For this reason it is more useful in deriving an approximate recursion to begin with the exact relation

$$p(s_t, h_t|v_{1:T}) = \sum_{s_{t+1}} p(s_{t+1}|v_{1:T})\, p(h_t|s_t, s_{t+1}, v_{1:T})\, p(s_t|s_{t+1}, v_{1:T}),$$



Figure 8.3 The EC backpass approximates $p(h_{t+1}|s_{t+1}, s_t, v_{1:T})$ by $p(h_{t+1}|s_{t+1}, v_{1:T})$. The motivation for this is that $s_t$ influences $h_{t+1}$ only indirectly through $h_t$. However, $h_t$ will most likely be heavily influenced by $v_{1:t}$, so that not knowing the state of $s_t$ is likely to be of secondary importance. The light shaded node is the variable we wish to find the posterior for. The values of the dark shaded nodes are known, and the dashed node indicates a known variable which is assumed unknown in forming the approximation.

which can be expressed more directly in terms of the SLDS dynamics as

$$p(s_t, h_t|v_{1:T}) = \sum_{s_{t+1}} p(s_{t+1}|v_{1:T})\, \big\langle p(h_t|h_{t+1}, s_t, s_{t+1}, v_{1:t}) \big\rangle_{p(h_{t+1}|s_t, s_{t+1}, v_{1:T})}\, \big\langle p(s_t|h_{t+1}, s_{t+1}, v_{1:T}) \big\rangle_{p(h_{t+1}|s_{t+1}, v_{1:T})},$$

where the dependence of the first factor on $v_{t+1:T}$ has been cancelled using the conditional independencies of the model. In forming the recursion we assume access to the distribution $p(s_{t+1}, h_{t+1}|v_{1:T})$ from the future time step. However, we also require the distribution $p(h_{t+1}|s_t, s_{t+1}, v_{1:T})$, which is not directly known and needs to be inferred – a computationally challenging task. In the expectation correction (EC) approach [3] one assumes the approximation (see Fig. 8.3)

$$p(h_{t+1}|s_t, s_{t+1}, v_{1:T}) \approx p(h_{t+1}|s_{t+1}, v_{1:T}), \qquad (8.11)$$

resulting in an approximate recursion for the smoothed posterior

$$p(s_t, h_t|v_{1:T}) \approx \sum_{s_{t+1}} p(s_{t+1}|v_{1:T})\, \big\langle p(h_t|h_{t+1}, s_t, s_{t+1}, v_{1:t}) \big\rangle_{h_{t+1}}\, \big\langle p(s_t|h_{t+1}, s_{t+1}, v_{1:T}) \big\rangle_{h_{t+1}},$$

where $\langle \cdot \rangle_{h_{t+1}}$ represents averaging with respect to the distribution $p(h_{t+1}|s_{t+1}, v_{1:T})$. In carrying out the above approximate recursion, we will end up with a mixture of Gaussians that grows at each time step. To avoid the exponential explosion problem, we use a finite mixture approximation, $p(h_{t+1}, s_{t+1}|v_{1:T}) \approx q(h_{t+1}, s_{t+1}|v_{1:T}) = q(h_{t+1}|s_{t+1}, v_{1:T})\, q(s_{t+1}|v_{1:T})$, and plug this into the recursion above. A recursion for the approximation is given by

$$q(h_t, s_t|v_{1:T}) = \sum_{s_{t+1}} q(s_{t+1}|v_{1:T})\, \big\langle q(h_t|h_{t+1}, s_t, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|s_{t+1}, v_{1:T})}\, \big\langle q(s_t|h_{t+1}, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|s_{t+1}, v_{1:T})}. \qquad (8.12)$$

As for filtering, wherever possible, we replace approximate terms by their exact counterparts and parameterise the posterior using

$$q(h_{t+1}, s_{t+1}|v_{1:T}) = q(h_{t+1}|s_{t+1}, v_{1:T})\, q(s_{t+1}|v_{1:T}). \qquad (8.13)$$

To reduce the notational burden, here we outline the method only for the case of using a single component approximation in both the forward and backward passes. The extension to using a mixture to approximate each $p(h_{t+1}|s_{t+1}, v_{1:T})$ is conceptually straightforward and deferred to Section 8.4.5. In the single Gaussian case we assume we have a Gaussian approximation available for $q(h_{t+1}|s_{t+1}, v_{1:T}) = \mathcal{N}\left(h_{t+1} \mid g(s_{t+1}), G(s_{t+1})\right)$, which below is 'propagated' backwards in time using the smoothing recursion.



8.4.1 Continuous smoothing

For given $s_t, s_{t+1}$, a recursion for the smoothed continuous variable is obtained using

$$q(h_t|s_t, s_{t+1}, v_{1:T}) = \int_{h_{t+1}} p(h_t|h_{t+1}, s_t, s_{t+1}, v_{1:t})\, q(h_{t+1}|s_{t+1}, v_{1:T}). \qquad (8.14)$$

In forming the recursion, we assume that we know the distribution $q(h_{t+1}|s_{t+1}, v_{1:T}) = \mathcal{N}\left(h_{t+1} \mid g(s_{t+1}), G(s_{t+1})\right)$. To compute Eq. (8.14) we then perform a single update of the LDS backward recursion, Algorithm 1.2.

8.4.2 Discrete smoothing

The second average in Eq. (8.12) corresponds to a recursion for the discrete variable and is given by

$$\big\langle q(s_t|h_{t+1}, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|s_{t+1}, v_{1:T})} \equiv q(s_t|s_{t+1}, v_{1:T}).$$

This average cannot be computed in closed form. The simplest approach is to approximate it by evaluation at the mean,¹ that is

$$\big\langle q(s_t|h_{t+1}, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|s_{t+1}, v_{1:T})} \approx q(s_t|h_{t+1}, s_{t+1}, v_{1:t})\Big|_{h_{t+1} = \langle h_{t+1}|s_{t+1}, v_{1:T}\rangle},$$

where $\langle h_{t+1}|s_{t+1}, v_{1:T}\rangle$ is the mean of $h_{t+1}$ with respect to $q(h_{t+1}|s_{t+1}, v_{1:T})$. This gives the approximation

$$\big\langle q(s_t|h_{t+1}, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|s_{t+1}, v_{1:T})} \approx \frac{1}{Z}\, \frac{e^{-\frac{1}{2} z_{t+1}^{\mathsf{T}}(s_t, s_{t+1})\, \Sigma^{-1}(s_t, s_{t+1}|v_{1:t})\, z_{t+1}(s_t, s_{t+1})}}{\sqrt{\det\left(\Sigma(s_t, s_{t+1}|v_{1:t})\right)}}\, q(s_t|s_{t+1}, v_{1:t}),$$

where $z_{t+1}(s_t, s_{t+1}) \equiv \langle h_{t+1}|s_{t+1}, v_{1:T}\rangle - \langle h_{t+1}|s_t, s_{t+1}, v_{1:t}\rangle$, and $Z$ ensures normalisation over $s_t$; $\Sigma(s_t, s_{t+1}|v_{1:t})$ is the filtered covariance of $h_{t+1}$ given $s_t, s_{t+1}$ and the observations $v_{1:t}$, which may be taken from $\Sigma_{hh}$ in Eq. (8.7). Approximations which take covariance information into account can also be considered, although the above method may suffice in practice [3, 17].

¹In general this approximation has the form $\langle f(x) \rangle_{p(x)} \approx f\left(\langle x \rangle_{p(x)}\right)$.
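A sketch of this mean-evaluation step for a fixed $s_{t+1}$; the function and argument names are chosen here for illustration:

```python
import numpy as np

def ec_switch_weights(g_next, mu_pred, Sigma_pred, q_st_given_next):
    """Approximate q(s_t | s_{t+1}, v_{1:T}) by evaluation at the mean.
    g_next: <h_{t+1} | s_{t+1}, v_{1:T}>; mu_pred[s], Sigma_pred[s]: filtered
    mean/covariance of h_{t+1} given s_t = s (Sigma_hh in Eq. (8.7));
    q_st_given_next[s] = q(s_t = s | s_{t+1}, v_{1:t})."""
    S = len(q_st_given_next)
    w = np.zeros(S)
    for s in range(S):
        z = g_next - mu_pred[s]                       # z_{t+1}(s_t, s_{t+1})
        _, logdet = np.linalg.slogdet(Sigma_pred[s])
        w[s] = np.exp(-0.5 * (z @ np.linalg.solve(Sigma_pred[s], z) + logdet)) \
               * q_st_given_next[s]
    return w / w.sum()                                # Z normalises over s_t
```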

8.4.3 Collapsing the mixture

From Sections 8.4.1 and 8.4.2 we now have all the terms to compute the approximation to Eq. (8.12). As for the filtering, however, the number of mixture components is multiplied by $S$ at each iteration. To prevent an exponential explosion of components, the mixture is then collapsed to a single Gaussian, $q(h_t, s_t|v_{1:T}) = q(h_t|s_t, v_{1:T})\, q(s_t|v_{1:T})$. The collapse to a mixture is discussed in Section 8.4.5.



8.4.4 Relation to other methods

A classical smoothing approximation for the SLDS is generalised pseudo Bayes (GPB) [2, 12, 11]. GPB makes the approximation

$$p(s_t|s_{t+1}, v_{1:T}) \approx p(s_t|s_{t+1}, v_{1:t}),$$

which depends only on the filtered posterior for $s_t$ and does not include any information passing through the variable $h_{t+1}$. Since $p(s_t|s_{t+1}, v_{1:t}) \propto p(s_{t+1}|s_t)\, p(s_t|v_{1:t})$, computing the smoothed recursion for the switch states in GPB is equivalent to running the RTS backward pass on a hidden Markov model, independently of the backward recursion for the continuous variables:

$$p(s_t|v_{1:T}) = \sum_{s_{t+1}} p(s_t, s_{t+1}|v_{1:T}) = \sum_{s_{t+1}} p(s_t|s_{t+1}, v_{1:T})\, p(s_{t+1}|v_{1:T}) \approx \sum_{s_{t+1}} p(s_t|s_{t+1}, v_{1:t})\, p(s_{t+1}|v_{1:T}) = \sum_{s_{t+1}} \frac{p(s_{t+1}|s_t)\, p(s_t|v_{1:t})}{\sum_{s_t'} p(s_{t+1}|s_t')\, p(s_t'|v_{1:t})}\, p(s_{t+1}|v_{1:T}).$$
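This HMM-style backward pass depends only on the filtered switch posteriors and the transition matrix, so it can be stated in a few lines; a sketch (function name and array layout are assumptions):

```python
import numpy as np

def gpb_switch_smoother(alpha, trans):
    """GPB smoothed switch posteriors from filtered ones alone.
    alpha[t, s] = p(s_t|v_{1:t}); trans[s, s'] = p(s_{t+1}=s'|s_t=s)."""
    T, S = alpha.shape
    gamma = np.zeros_like(alpha)
    gamma[-1] = alpha[-1]                 # filtered = smoothed at t = T
    for t in range(T - 2, -1, -1):
        joint = trans * alpha[t][:, None]                  # p(s'|s) p(s_t=s|v_{1:t})
        cond = joint / joint.sum(axis=0, keepdims=True)    # p(s_t=s|s_{t+1}=s', v_{1:t})
        gamma[t] = cond @ gamma[t + 1]
    return gamma
```

Note that no continuous-variable quantities enter this routine at all, which is precisely the limitation discussed next.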

The only information the GPB method uses to form the smoothed distribution $p(s_t|v_{1:T})$ from the filtered distribution $p(s_t|v_{1:t})$ is the Markov switch transition $p(s_{t+1}|s_t)$. This approximation discounts some information from the future, since information passed via the continuous variables is not taken into account. In contrast to GPB, EC preserves future information passing through the continuous variables.

8.4.5 Using mixtures in the expectation correction backward pass

The extension to the mixture case is straightforward, based on the representation

$$p(h_t|s_t, v_{1:T}) \approx \sum_{j_t=1}^{J} q(j_t|s_t, v_{1:T})\, q(h_t|j_t, s_t, v_{1:T}).$$

Analogously to the case with a single component,

$$q(h_t, s_t|v_{1:T}) = \sum_{i_t, j_{t+1}, s_{t+1}} p(s_{t+1}|v_{1:T})\, p(j_{t+1}|s_{t+1}, v_{1:T})\, q(h_t|j_{t+1}, s_{t+1}, i_t, s_t, v_{1:T})\, \big\langle q(i_t, s_t|h_{t+1}, j_{t+1}, s_{t+1}, v_{1:t}) \big\rangle_{q(h_{t+1}|j_{t+1}, s_{t+1}, v_{1:T})}.$$

The average in the last line of the above equation can be tackled using the same techniques as outlined in the single Gaussian case. To approximate $q(h_t|j_{t+1}, s_{t+1}, i_t, s_t, v_{1:T})$ we consider this as the marginal of the joint distribution

$$q(h_t, h_{t+1}|i_t, s_t, j_{t+1}, s_{t+1}, v_{1:T}) = q(h_t|h_{t+1}, i_t, s_t, j_{t+1}, s_{t+1}, v_{1:t})\, q(h_{t+1}|i_t, s_t, j_{t+1}, s_{t+1}, v_{1:T}).$$

As in the case of a single mixture, the problematic term is $q(h_{t+1}|i_t, s_t, j_{t+1}, s_{t+1}, v_{1:T})$. Analogously to Eq. (8.11), we make the assumption

$$q(h_{t+1}|i_t, s_t, j_{t+1}, s_{t+1}, v_{1:T}) \approx q(h_{t+1}|j_{t+1}, s_{t+1}, v_{1:T}),$$



meaning that information about the current switch state $s_t, i_t$ is ignored. We can then form

$$p(h_t|s_t, v_{1:T}) = \sum_{i_t, j_{t+1}, s_{t+1}} p(i_t, j_{t+1}, s_{t+1}|s_t, v_{1:T})\, p(h_t|i_t, s_t, j_{t+1}, s_{t+1}, v_{1:T}).$$

This mixture can then be collapsed to a smaller mixture to give

$$p(h_t|s_t, v_{1:T}) \approx \sum_{j_t} q(j_t|s_t, v_{1:T})\, q(h_t|j_t, v_{1:T}).$$

The resulting procedure is sketched in Algorithm 8.2, including using mixtures in both forward and backward passes.

Algorithm 8.2 aSLDS: EC backward pass. Approximates $p(s_t|v_{1:T})$ and $p(h_t|s_t, v_{1:T}) \equiv \sum_{j_t=1}^{J_t} u_t(j_t, s_t)\, \mathcal{N}\left(h_t \mid g_t(j_t, s_t), G_t(j_t, s_t)\right)$ using a mixture of Gaussians. $J_T = I_T$, $J_t \le S \times I_t \times J_{t+1}$. This routine needs the results from Algorithm 8.1.

  $G_T \gets F_T$, $g_T \gets f_T$, $u_T \gets w_T$
  for $t \gets T - 1$ to $1$ do
    for $s \gets 1$ to $S$, $s' \gets 1$ to $S$, $i \gets 1$ to $I_t$, $j' \gets 1$ to $J_{t+1}$ do
      $(\mu, \Sigma)(i, s, j', s') = \text{LDSBACKWARD}(g_{t+1}(j', s'), G_{t+1}(j', s'), f_t(i, s), F_t(i, s), \theta(s'))$
      $p(i_t, s_t|j_{t+1}, s_{t+1}, v_{1:T}) = \left\langle p(s_t = s, i_t = i|h_{t+1}, s_{t+1} = s', j_{t+1} = j', v_{1:t}) \right\rangle_{p(h_{t+1}|s_{t+1}=s', j_{t+1}=j', v_{1:T})}$
      $p(i, s, j', s'|v_{1:T}) \gets p(s_{t+1} = s'|v_{1:T})\, u_{t+1}(j', s')\, p(i_t, s_t|j_{t+1}, s_{t+1}, v_{1:T})$
    end for
    for $s_t \gets 1$ to $S$ do
      Collapse the mixture defined by weights $p(i_t = i, s_{t+1} = s', j_{t+1} = j'|s_t, v_{1:T}) \propto p(i, s_t, j', s'|v_{1:T})$, means $\mu(i_t, s_t, j_{t+1}, s_{t+1})$ and covariances $\Sigma(i_t, s_t, j_{t+1}, s_{t+1})$ to a mixture with $J_t$ components. This defines the new means $g_t(j_t, s_t)$, covariances $G_t(j_t, s_t)$ and mixture weights $u_t(j_t, s_t)$.
      $p(s_t|v_{1:T}) \gets \sum_{i_t, j', s'} p(i_t, s_t, j', s'|v_{1:T})$
    end for
  end for

8.5 Demonstration: traffic flow

An illustration of modelling and inference with an SLDS is given by the following network of traffic flow problem, Fig. 8.4. Here there are four junctions $a, b, c, d$, and traffic flows along the roads in the direction indicated. Traffic flows into junction $a$ and then goes via different routes to $d$. Flow out of a junction must match the flow into a junction (up to noise). There are traffic light switches at junctions $a$ and $b$ which, depending on their state, route traffic differently along the roads. Using $\phi$ to denote flow, we model the flows using the switching linear system

$$\begin{pmatrix} \phi_a(t) \\ \phi_{a\to d}(t) \\ \phi_{a\to b}(t) \\ \phi_{b\to d}(t) \\ \phi_{b\to c}(t) \\ \phi_{c\to d}(t) \end{pmatrix} = \begin{pmatrix} \phi_a(t-1) \\ \phi_a(t-1)\left(0.75\, \mathbb{I}\left[s^a(t)=1\right] + 1\, \mathbb{I}\left[s^a(t)=2\right]\right) \\ \phi_a(t-1)\left(0.25\, \mathbb{I}\left[s^a(t)=1\right] + 1\, \mathbb{I}\left[s^a(t)=3\right]\right) \\ \phi_{a\to b}(t-1)\, 0.5\, \mathbb{I}\left[s^b(t)=1\right] \\ \phi_{a\to b}(t-1)\left(0.5\, \mathbb{I}\left[s^b(t)=1\right] + 1\, \mathbb{I}\left[s^b(t)=2\right]\right) \\ \phi_{b\to c}(t-1) \end{pmatrix}$$



Figure 8.4 A representation of the traffic flow between junctions $a, b, c, d$, with traffic lights at $a$ and $b$. If $s^a = 1$, $a \to d$ and $a \to b$ carry 0.75 and 0.25 of the flow out of $a$ respectively. If $s^a = 2$ all the flow from $a$ goes through $a \to d$; for $s^a = 3$, all the flow goes through $a \to b$. For $s^b = 1$ the flow out of $b$ is split equally between $b \to d$ and $b \to c$. For $s^b = 2$ all flow out of $b$ goes along $b \to c$.

Figure 8.5 Time evolution of the traffic flow measured at two points in the network. Sensors measure the total flow into the network, $\phi_a(t)$, and the total flow out of the network, $\phi_d(t) = \phi_{a\to d}(t) + \phi_{b\to d}(t) + \phi_{c\to d}(t)$. The total inflow at $a$ undergoes a random walk. Note that the flow measured at $d$ can momentarily drop to zero if all traffic is routed through $a \to b \to c$ for two time steps.

By identifying the flows at time $t$ with a six-dimensional vector hidden variable $h_t$, we can write the above flow equations as

$$h_t = A(s_t)h_{t-1} + \eta^h_t$$

for a set of suitably defined matrices $A(s)$ indexed by the switch variable $s = (s^a, s^b)$, which takes $3 \times 2 = 6$ states. We additionally include noise terms to model cars parking or de-parking during a single time frame. The covariance $\Sigma^h$ is diagonal, with a larger variance at the inflow point $a$ to model that the total volume of traffic entering the system can vary. Noisy measurements of the flow into the network are taken at $a$,

$$v_{1,t} = \phi_a(t) + \eta^v_1(t),$$

along with a noisy measurement of the total flow out of the system at $d$,

$$v_{2,t} = \phi_d(t) = \phi_{a\to d}(t) + \phi_{b\to d}(t) + \phi_{c\to d}(t) + \eta^v_2(t).$$

The observation model can be represented by $v_t = Bh_t + \eta^v_t$ using a constant $2 \times 6$ projection matrix $B$. This is clearly a very crude model of traffic flows since in a real system one cannot have negative flows. Nevertheless, it serves to demonstrate the principles of modelling and inference using switching models. The switch variables follow a simple Markov transition $p(s_t|s_{t-1})$ which biases the switches to remain in the same state in preference to jumping to another state. Given the above system and a prior which initialises all flow at $a$, we draw samples from the model using forward (ancestral) sampling and form the observations $v_{1:100}$, Fig. 8.5. Using only the observations and the known model structure, we then attempt to infer the latent switch variables and traffic flows using Gaussian sum filtering with $I = 2$ and EC with $J = 1$ mixture components per switch state, Fig. 8.6.
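Under the flow equations above, the six transition matrices $A(s)$ and the projection $B$ can be constructed directly; a sketch (the state ordering and function name are conventions of this sketch):

```python
import numpy as np

def traffic_A(sa, sb):
    """Transition matrix A(s) for the traffic network, with hidden state
    h = (phi_a, phi_ad, phi_ab, phi_bd, phi_bc, phi_cd) and switch
    s = (sa, sb), sa in {1,2,3}, sb in {1,2}."""
    A = np.zeros((6, 6))
    A[0, 0] = 1.0                                # phi_a: random walk (plus noise)
    A[1, 0] = {1: 0.75, 2: 1.0, 3: 0.0}[sa]      # flow a -> d
    A[2, 0] = {1: 0.25, 2: 0.0, 3: 1.0}[sa]      # flow a -> b
    A[3, 2] = 0.5 if sb == 1 else 0.0            # flow b -> d
    A[4, 2] = {1: 0.5, 2: 1.0}[sb]               # flow b -> c
    A[5, 4] = 1.0                                # flow c -> d
    return A

# Observation: inflow at a, and total outflow at d
B = np.zeros((2, 6))
B[0, 0] = 1.0                 # v_1 = phi_a + noise
B[1, [1, 3, 5]] = 1.0         # v_2 = phi_ad + phi_bd + phi_cd + noise
```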

8.6 Comparison of smoothing techniques

In order to demonstrate the potential advantages of Gaussian sum approximate inference, we chose a problem that, from the viewpoint of classical signal processing, is difficult, with changes in the switches not identifiable by simply eyeballing a short-time Fourier transform of the signal. The setup is described in Fig. 8.7.




Figure 8.6 Given the observations from Fig. 8.5 we infer the flows and switch states of all the latent variables. (a) The correct latent flows through time, along with the switch variable states used to generate the data. (b) Filtered flows based on an $I = 2$ Gaussian sum forward pass approximation. Plotted are the six components of the vector $\langle h_t|v_{1:t}\rangle$, with the posterior distributions of the $s^a$ and $s^b$ traffic light states, $p(s^a_t|v_{1:t})$ and $p(s^b_t|v_{1:t})$, plotted below. (c) Smoothed flows $\langle h_t|v_{1:T}\rangle$ and corresponding smoothed switch states $p(s_t|v_{1:T})$ using EC with $J = 1$.

Figure 8.7 A sample from the inference problem for comparison of smoothing techniques. This scalar observation data ($V = 1$) of $T = 100$ time steps is generated from an LDS with two switch states, $S = 2$: $A(s) = 0.9999 \times \mathrm{orth}(\mathrm{randn}(H, H))$, $B(s) = \mathrm{randn}(V, H)$, $\bar{v}_t \equiv 0$, $\bar{h}_1 = 10 \times \mathrm{randn}(H, 1)$, $\bar{h}_{t>1} = 0$, $\Sigma^h_1 = I_H$, $p_1$ uniform. The output bias is zero. The hidden space has dimension $H = 30$, with low transition noise, $\Sigma^h(s) = 0.01 I_H$, and $p(s_{t+1}|s_t) \propto \mathbf{1}_{S \times S}$. The output noise is relatively high, $\Sigma^v(s) = 30 I_V$.
The latent trajectory $h$ ($H = 30$) is near deterministic; however, the noise in the observations ($V = 1$) produces high uncertainty as to where this trajectory is, needing to track multiple hypotheses until there is sufficient data to reliably reveal which of the likely hypotheses is correct. For this reason, the filtered distribution is typically strongly multimodal, and methods that do not address this perform poorly. The transition and emission matrices were sampled at random (see Fig. 8.7 for details); each random sampling of parameters generated a problem instance on which the rival algorithms were evaluated. We sequentially generated hidden states $h_t, s_t$ and observations $v_t$ from the known SLDS and then, given only the model and the observations, the task was to infer $p(h_t, s_t|v_{1:T})$. Since the exact computation is exponential in $T$, a formal evaluation of the method is infeasible. A simple alternative is to assume that the original sample states $s_{1:T}$ are the 'correct' inferred states, and compare the most probable posterior smoothed estimates $\arg\max_{s_t} p(s_t|v_{1:T})$ with the assumed correct sample $s_t$. Performance over a set of problem instances can then be assessed. All deterministic algorithms were initialised using the corresponding filtered results, as was Gibbs sampling. The running time of all algorithms was set to be roughly equal to that of EC using $I = J = 4$ mixture components per switch state.

8.6.1 Filtering as approximate smoothing

In Fig. 8.8 we present the accuracy of techniques that use filtering $p(s_t|v_{1:t})$ as an approximation of the smoothed posterior $p(s_t|v_{1:T})$. Many of the smoothing approaches we



Figure 8.8 The number of errors in estimating a binary switch $p(s_t|v_{1:T})$ over a time series of length $T = 100$, for the problem described in Fig. 8.7; 50 errors corresponds to random guessing. Plotted are histograms of the errors over 1000 experiments. (PF) Particle filter. (RBPF) Rao–Blackwellised PF. (EP) Expectation propagation. (GSFS) Gaussian sum filtering using a single Gaussian. (KimS) Kim's smoother, i.e. GPB, using the results from GSFS. (ECS) Expectation correction using a single Gaussian ($I = J = 1$). (GSFM) GSF using a mixture of $I = 4$ Gaussians. (KimM) Kim's smoother using the results from GSFM. (ECM) Expectation correction using a mixture with $I = J = 4$ components. In Gibbs sampling, we use the initialisation from GSFM.

examined are initialised with filtered approximations and it is therefore interesting to see the potential improvement that each smoothing algorithm makes over the filtered estimates. In each case the histogram of errors in estimating the switch variable $s_t$ for $t$ from 1 to 50 is presented, based on a set of 1000 realisations of the observed sequence.

Particle filter  We included the particle filter (PF) for comparison with Gaussian sum filtering (GSF) and to demonstrate the baseline performance of using filtering to approximate the smoothed posterior. For the PF, 1000 particles were used, with Kitagawa resampling [14]. In this case the PF performs poorly due to the difficulty of representing continuous high-dimensional posterior distributions ($H = 30$) using a limited number of point particles, and the underlying use of importance sampling in generating the samples. The Rao–Blackwellised particle filter (RBPF) [7] attempts to alleviate the difficulty of sampling in high-dimensional spaces by approximate integration over the continuous hidden variable. For the RBPF, 500 particles were used, with Kitagawa resampling. We were unable to improve on the standard PF results using this technique, most likely due to the fact that the available RBPF implementation assumed a single Gaussian per switch state.

Gaussian sum filtering  Whilst our interest is primarily in smoothing, for EC and GPB we need to initialise the smoothing algorithm with filtered estimates. Using a single Gaussian (GSFS) gives reasonable results, though using a mixture with $I = 4$ components per switch state (GSFM) improves on the single component case considerably, indicating the multimodal nature of the filtered posterior. Not surprisingly, the best filtered results are given using GSF, since this is better able to represent the variance in the filtered posterior than the PF sampling methods.

8.6.2 Approximate smoothing

Generalised pseudo Bayes  For this problem, Kim's GPB method does not improve on the Gaussian sum filtered results, using either a mixture $I = 4$ (KimM) or a single component $I = 1$ (KimS). In this case the GPB approximation of ignoring all future observations, provided only the future switch state $s_{t+1}$ is known, is too severe.



Expectation correction  For EC we use the mean approximation for the numerical integration of Eq. (8.12). In ECS, a single Gaussian per switch state is used ($I = J = 1$), with the backward recursion initialised with the filtered posterior. The method improves on GSFS, though the performance is still limited due to only using a single component. When using multiple components, $I = J = 4$, ECM has excellent performance, with a near perfect estimation of the most likely smoothed posterior class. One should bear in mind that EC, GPB, Gibbs and EP are initialised with the same GSF filtered distributions.

Gibbs sampling  Gibbs sampling provides an alternative non-deterministic procedure [4]. For a fixed state sequence $s_{1:T}$, $p(v_{1:T}|s_{1:T})$ is easily computable since this is just the likelihood of an LDS with transitions and emissions specified by the assumed given switch states. We therefore attempt to sample from the posterior $p(s_{1:T}|v_{1:T}) \propto p(v_{1:T}|s_{1:T})\, p(s_{1:T})$ directly. We implemented a standard single-site Gibbs sampler in which all switch variables are fixed except $s_t$, and then a sample from $p(s_t|s_{\setminus t}, v_{1:T})$ is drawn, with 100 update sweeps running forwards from the start to the end time and then in the reverse direction. The average of the final 80 forward-backward sweeps was used to estimate the smoothed posterior. Such a procedure may work well provided that the initial setting of $s_{1:T}$ is in a region of high probability – otherwise sampling by such individual coordinate updates may be extremely inefficient. For this case, even if the Gibbs sampler is initialised in a reasonable way (using the results from GSFM), the sampler is unable to deal with the strong temporal correlations in the posterior and on average degrades the filtering results.
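A naive sketch of one such single-site sweep, assuming a helper `loglik` that evaluates $\log p(v_{1:T}|s_{1:T})$ by running an LDS filter for the fixed switch path; note that this recomputes the full likelihood for every candidate state, which is part of why such coordinate updates are expensive:

```python
import numpy as np

def gibbs_sweep(s, v, loglik, trans, rng):
    """One forward single-site Gibbs sweep over the switch path s_{1:T}:
    resample each s_t from p(s_t|s_{\t}, v_{1:T}) ∝ p(v_{1:T}|s_{1:T}) p(s_{1:T})."""
    T, S = len(s), trans.shape[0]
    for t in range(T):
        logp = np.zeros(S)
        for k in range(S):
            s[t] = k
            logp[k] = loglik(s)                        # log p(v_{1:T} | s_{1:T})
            if t > 0:
                logp[k] += np.log(trans[s[t - 1], k])  # prior p(s_t | s_{t-1})
            if t < T - 1:
                logp[k] += np.log(trans[k, s[t + 1]])  # prior p(s_{t+1} | s_t)
        p = np.exp(logp - logp.max()); p /= p.sum()
        s[t] = rng.choice(S, p=p)
    return s
```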

Expectation propagation  Expectation propagation (EP) [18] has had considerable success in approximate inference in graphical models, and has been ported to approximate inference in the SLDS (see Chapter 7). For our particular problem, we found EP to be numerically unstable, often struggling to converge.² To encourage convergence, we used the damping method in [10], performing 20 iterations with a damping factor of 0.5. Nevertheless, the disappointing performance of EP is most likely due to conflicts resulting from numerical instabilities introduced by the frequent conversions between moment and canonical representations. In addition, the current implementation of EP assumes that the posterior is well represented by a single Gaussian per switch state, and the inherent multimodality of the posterior in this experiment may render this assumption invalid.

²Generalised EP [21], which groups variables together, improves on the results but is still far inferior to the EC results presented here – Onno Zoeter, personal communication.

8.7 Summary

Exact inference in the class of switching linear dynamical systems is computationally difficult. Whilst deriving exact filtering and smoothing recursions is straightforward, representing the filtered and smoothed posterior for time step $t$ requires a mixture of Gaussians with a number of components scaling exponentially with $t$. This suggests the Gaussian sum class of approximations in which a limited number of Gaussian components is retained. We discussed how both filtering and smoothing recursions can be derived for approximations in which multiple Gaussians are assumed per switch state, extending on previous approximations which assume a single Gaussian per switch state. In extreme cases we showed how using such mixture approximations of the switch conditional posterior can improve the accuracy of the approximation considerably. Our Gaussian sum smoothing approach, expectation correction, can be viewed as a form of analytic particle smoothing.


Rather than propagating point distributions (delta functions), as in the particle approaches, EC propagates Gaussians, which are more able to represent the variability of the distribution, particularly in high dimensions. An important consideration in time series models is numerical stability. Expectation correction uses standard forward and backward linear dynamical system message updates and is therefore able to take advantage of well-studied methods for numerically stable inference. Our smoothing technique has been successfully applied to time series of length $O(10^5)$ without numerical difficulty [17]. Code for implementing EC is available from the author's website and is part of the BRMLtoolbox.

Bibliography

[1] D. L. Alspach and H. W. Sorenson. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4):439–448, 1972.
[2] Y. Bar-Shalom and X.-R. Li. Estimation and Tracking: Principles, Techniques and Software. Artech House, 1998.
[3] D. Barber. Expectation correction for smoothing in switching linear Gaussian state space models. Journal of Machine Learning Research, 7:2515–2540, 2006.
[4] C. Carter and R. Kohn. Markov chain Monte Carlo in conditionally Gaussian state space models. Biometrika, 83:589–601, 1996.
[5] A. T. Cemgil, B. Kappen and D. Barber. A generative model for music transcription. IEEE Transactions on Audio, Speech and Language Processing, 14(2):679–694, 2006.
[6] S. Chib and M. Dueker. Non-Markovian regime switching with endogenous states and time-varying state strengths. Econometric Society 2004 North American Summer Meetings 600, Econometric Society, August 2004.
[7] A. Doucet, N. de Freitas, K. Murphy and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Uncertainty in Artificial Intelligence, 2000.
[8] S. Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer, 2006.
[9] Z. Ghahramani and G. E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):963–996, 1998.
[10] T. Heskes and O. Zoeter. Expectation propagation for approximate inference in dynamic Bayesian networks. In A. Darwiche and N. Friedman, editors, Uncertainty in Artificial Intelligence, pages 216–223, 2002.
[11] C.-J. Kim. Dynamic linear models with Markov-switching. Journal of Econometrics, 60:1–22, 1994.
[12] C.-J. Kim and C. R. Nelson. State-Space Models with Regime Switching. MIT Press, 1999.
[13] G. Kitagawa. The two-filter formula for smoothing and an implementation of the Gaussian-sum smoother. Annals of the Institute of Statistical Mathematics, 46(4):605–623, 1994.
[14] G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25, 1996.
[15] U. Lerner, R. Parr, D. Koller and G. Biswas. Bayesian fault detection and diagnosis in dynamic systems. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pages 531–537, 2000.
[16] U. N. Lerner. Hybrid Bayesian networks for reasoning about complex systems. PhD thesis, Stanford University, 2002.
[17] B. Mesot and D. Barber. Switching linear dynamical systems for noise robust speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 15(6):1850–1858, 2007.
[18] T. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, MIT Media Lab, 2001.
[19] V. Pavlovic, J. M. Rehg and J. MacCormick. Learning switching linear models of human motion. In Advances in Neural Information Processing Systems (NIPS) 13, pages 981–987. MIT Press, 2001.
[20] D. M. Titterington, A. F. M. Smith and U. E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985.
[21] O. Zoeter. Monitoring non-linear and switching dynamical systems. PhD thesis, Radboud University Nijmegen, 2005.

Contributor David Barber, Department of Computer Science, University College London
