Analyzing multiple time series using a dynamic latent variables principal component analysis model (PowerPoint presentation)


S. Dossou-Gbété, February 9, 2011


SLIDE 1

Analyzing multiple time series using a dynamic latent variables principal component analysis model

S. Dossou-Gbété
February 9, 2011

SLIDE 2

Outline

1 Introduction
2 Probabilistic Principal Components Analysis and potential dynamic extension
3 Statistical method
4 Case study: Performances of a wastewater treatment plant


SLIDE 6

1 Introduction

SLIDE 7

Introduction

Probabilistic Principal Component Analysis (PPCA) [3, 2] and Principal Component Analysis (PCA) [1, 2] are two statistical methods designed for analyzing multivariate data. In this setting, the multivariate data are considered as response variables, assuming that latent variables (unobserved effects) could explain the variations among individual observations.

[1] Anderson T.W. (1984): Estimating Linear Statistical Relationships. Annals of Statistics, 12, pp. 1–45.
[2] Bishop C.M. (2006): Pattern Recognition and Machine Learning. Springer.
[3] Tipping M.E. & Bishop C.M. (1999): Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 61(3), pp. 611–622.

SLIDE 8

Introduction ...

These methods have proved their ability to cope with a large number of variables without running into the scarce-degrees-of-freedom problems often faced in regression-based analyses. Similar considerations apply to multivariate time series if they are thought of as response variables, assuming that the variations over time of the individual observations could be explained by hidden, time-varying stochastic mechanisms.

SLIDE 9

Introduction ...

These latent time-varying components could describe trends in the observed time series as well as the relationships between them.

This motivates extending Probabilistic Principal Component Analysis so as to take explicitly into account the time component that is inherent to the analysis of multivariate time series.

SLIDE 10

2 Probabilistic Principal Components Analysis and potential dynamic extension

SLIDE 11

Dynamic extension of Probabilistic Principal Components Analysis

Let us consider a multivariate time series {x_t, t = 1:T} where x_t = (x_tj)_{j=1:p} is a vector of p ≥ 2 numerical measurements. It is assumed there is a k-dimensional latent Gaussian process Z_t, t = 0:T, such that for each t = 1:T the following assumptions are fulfilled:

  Z_0 ∼ N(0, I_k)
  Z_t − Z_{t−1} ∼ N(0, I_k)

where

  • η_t = Z_t − Z_{t−1} are i.i.d. Gaussian random vectors;
  • F^Z_s is the σ-field generated by {Z_u, u = 0:s}; F^Z_s gathers the information from the beginning of the process up to time s.

SLIDE 12

Dynamic extension of Probabilistic Principal Components Analysis ...

For each t = 1:T the measurement x_t is the realisation of a p-dimensional random variable X_t such that

  E[X_t | F^Z_t] = µ + A Z_t
  X_t − E[X_t | F^Z_t] ∼ N(0, σ² I_p)

where ε_t = X_t − E[X_t | F^Z_t] are i.i.d. Gaussian random vectors.

The model's specification is completed by the additional assumption that the Gaussian processes η_t and ε_t are mutually independent.
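Under these assumptions the model is straightforward to simulate. The sketch below (Python with NumPy; all dimensions and parameter values are illustrative choices, not taken from the slides) draws a latent random-walk Z_t and observations X_t = µ + A Z_t + ε_t:

```python
import numpy as np

def simulate(T=200, p=5, k=2, sigma=0.5, seed=0):
    """Simulate the dynamic latent-variable PCA model:
    Z_0 ~ N(0, I_k), Z_t = Z_{t-1} + eta_t with eta_t ~ N(0, I_k),
    X_t = mu + A Z_t + eps_t with eps_t ~ N(0, sigma^2 I_p)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=p)       # illustrative mean vector
    A = rng.normal(size=(p, k))   # illustrative loading matrix
    # Cumulative sum of i.i.d. N(0, I_k) increments gives the random walk,
    # with Z[0] ~ N(0, I_k) as required by the model.
    Z = np.cumsum(rng.normal(size=(T + 1, k)), axis=0)
    eps = sigma * rng.normal(size=(T, p))
    X = mu + Z[1:] @ A.T + eps    # observations for t = 1:T
    return X, Z, A, mu

X, Z, A, mu = simulate()
print(X.shape, Z.shape)  # (200, 5) (201, 2)
```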


SLIDE 13

Dynamic extension of Probabilistic Principal Components Analysis ...

This is a latent variables model that can be regarded as a specific case of the state space model. The random process Z_t is used to model unknown trend components and can be thought of as a state variable whose dynamic behavior is described by a random walk.

SLIDE 14

Dynamic extension of Probabilistic Principal Components Analysis ...

  • X_t is then a Gaussian process;
  • the marginal components of X_t are independent conditionally on F^Z_t.

SLIDE 15

Model's identifiability

The model's parameters are A, µ and σ², also called hyperparameters in the factor analysis terminology. As it stands, however, there is a lack of identifiability: for any rotation matrix Ω, the matrix of factor loadings and the trend component could be redefined as A* = AΩ and Z*_t = ᵗΩ Z_t respectively.

SLIDE 16

Model's identifiability ...

For probabilistic PCA, identifiability is achieved to some extent if the loading matrix A satisfies the constraint

  ᵗA A = diag(λ²_j, j = 1:k).

This constraint means that the columns of the matrix A are set to constitute an orthogonal system of vectors belonging to R^p.

SLIDE 17

Model's identifiability ...

The model may then be reparametrized by considering

  • the scalar parameters µ and σ² and the numerical sequence λ²_j, j = 1:k;
  • and a sequence of orthonormal vectors u_j ∈ R^p, j = 1:k, such that A = U Λ^{1/2} with U = [u_j]_{j=1:k} and Λ = diag(λ²_j, j = 1:k).
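The reparametrization A = U Λ^{1/2} can be checked numerically. In this sketch (illustrative values, not from the slides) an orthonormal U is obtained via a QR decomposition and the identifiability constraint ᵗA A = diag(λ²_j) is verified:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 6, 3

# Orthonormal columns u_j via reduced QR of a random p x k matrix (illustrative)
U, _ = np.linalg.qr(rng.normal(size=(p, k)))
lam2 = np.array([4.0, 2.0, 0.5])      # the lambda_j^2 sequence (illustrative)
A = U @ np.diag(np.sqrt(lam2))        # A = U Lambda^{1/2}

# The constraint tA A = diag(lambda_j^2) holds by construction
gram = A.T @ A
assert np.allclose(gram, np.diag(lam2))
print(np.round(gram, 6))
```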

SLIDE 18

3 Statistical method

  • EM algorithm
  • Filtering, forecasting and smoothing the latent process
  • Algorithms


SLIDE 20

The log-likelihood of the model's parameters

Let Θ = (µ, σ², A) denote the unknown parameters of the statistical model. The density of the joint distribution of the sequence {(X_t)_{t=1:T}, (Z_t)_{t=0:T}} of random variables can be written as

  g_T((x_t)_{t=1:T}, (z_t)_{t=0:T}) = h_T(z_t, t = 0:T) ∏_{t=1}^{T} g_t(x_t | z_{t−j}, j = 0:t)

with

  h_T(z_t, t = 0:T) = h_0(z_0) ∏_{t=1}^{T} h_t(z_t | z_{t−j}, j = 1:t)

SLIDE 21

The log-likelihood of the model's parameters ...

Taking into account the model's specification as set out in the previous subsection, this density is proportional to the expression

  (1/σ²)^T exp(−(1/2) ‖z_0‖²) ∏_{t=1}^{T} exp(−(1/2) ‖z_t − z_{t−1}‖²) × ∏_{t=1}^{T} exp(−(1/(2σ²)) ‖x_t − µ − A z_t‖²)

SLIDE 22

The log-likelihood of the model's parameters ...

Hence the complete likelihood (which assumes that both x_t and z_t are observed) is

  L_{X,Z}(Θ) = cte − (1/2) ‖z_0‖² − (1/2) ∑_{t=1}^{T} ‖z_t − z_{t−1}‖² − T log(σ²) − (1/(2σ²)) ∑_{t=1}^{T} ‖x_t − µ − A z_t‖²

where Θ denotes the set of the model's parameters µ, σ², λ and U.
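The complete-data log-likelihood above translates directly into code. A minimal sketch (Python/NumPy; the inputs are simulated and the additive constant `cte` is dropped):

```python
import numpy as np

def complete_loglik(X, Z, mu, A, sigma2):
    """Complete-data log-likelihood L_{X,Z}(Theta) up to an additive constant:
    -1/2 ||z_0||^2 - 1/2 sum_t ||z_t - z_{t-1}||^2
    - T log(sigma^2) - 1/(2 sigma^2) sum_t ||x_t - mu - A z_t||^2."""
    T = X.shape[0]
    dz = np.diff(Z, axis=0)          # z_t - z_{t-1}, t = 1:T
    resid = X - mu - Z[1:] @ A.T     # x_t - mu - A z_t
    return (-0.5 * np.sum(Z[0] ** 2)
            - 0.5 * np.sum(dz ** 2)
            - T * np.log(sigma2)
            - np.sum(resid ** 2) / (2.0 * sigma2))

# Illustrative call on simulated data
rng = np.random.default_rng(2)
T, p, k = 50, 4, 2
Z = np.cumsum(rng.normal(size=(T + 1, k)), axis=0)
A = rng.normal(size=(p, k))
mu = np.zeros(p)
X = mu + Z[1:] @ A.T + 0.3 * rng.normal(size=(T, p))
ll = complete_loglik(X, Z, mu, A, 0.09)
print(float(ll))
```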

SLIDE 23

EM algorithm

Since the z_t are unobserved we do not have the complete data, and we resort to the EM algorithm [1], which provides an iterative procedure for computing the maximum likelihood estimates of the model's parameters based on the incomplete data x_t, t = 1:T. Moreover, the EM algorithm allows parts of the observation vector x_t to be missing at a number of observation times [2, ?].

The EM algorithm computes a maximum likelihood estimate of the unknown parameter Θ through an iterative scheme that alternates between:

  • evaluating the expected complete-data log-likelihood given the observations, which directly corresponds to solving the smoothing problem (E-step);
  • and maximizing this expectation with respect to the unknown parameters (M-step).

SLIDE 24

3 Statistical method

  • EM algorithm
  • Filtering, forecasting and smoothing the latent process
  • Algorithms

SLIDE 25

Preliminaries

Statistical evaluation of the unobservable state Z_t in terms of

  • X_s, s < t, is the forecasting task: Z^s_t with s < t;
  • X_s, s ≤ t, is the filtering task;
  • X_s, s = 1:T, is the smoothing task, which depends on past, present and future: Z^T_t with t = 1:T.

SLIDE 26

Kalman filter for latent process forecasting

The aim is to compute statistics that estimate the value z^s_t as a realisation of Z_t given the dataset x_u, u = 1:s, with s ≤ t. Define, for s ≤ t:

  • F^X_s, the σ-field generated by X_u, u = 1:s;
  • the conditional expectation Z^s_t = E[Z_t | F^X_s], with Z^0_0 = 0;
  • P^s_t = Var[Z_t | F^X_s] and P^s_{t,t−1} = Cov[Z_t, Z_{t−1} | F^X_s], with P^0_0 = I_k.

SLIDE 27

Kalman filter for latent process forecasting ...

Define the innovations

  e_t = X_t − E[X_t | F^X_{t−1}] = X_t − µ − A Z^{t−1}_t

e_t, t = 1:T, are centered Gaussian random vectors and their covariance matrices are

  Σ_t = A P^{t−1}_t ᵗA + σ² I_p

SLIDE 28

Kalman filter for latent process forecasting ...

One has

  Z^t_t = Z^{t−1}_t + K_t e_t

with

  K_t = P^{t−1}_t ᵗA (A P^{t−1}_t ᵗA + σ² I_p)^{−1} = P^{t−1}_t ᵗA Σ_t^{−1}

SLIDE 29

Kalman filter for latent process forecasting ...

Thus the following statements hold:

  Z^{t−1}_t = Z^{t−1}_{t−1}
  P^{t−1}_t = P^{t−1}_{t−1} + I_k
  P^t_t = [I − K_t A] P^{t−1}_t
  K_t = P^{t−1}_t ᵗA (A P^{t−1}_t ᵗA + σ² I_p)^{−1} = P^{t−1}_t ᵗA Σ_t^{−1}

K_t is the so-called Kalman gain. Given A and σ², the predicted and filtered values z^{t−1}_t and z^t_t of Z_t are obtained by evaluating the above set of recursions for t = 1:T.
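The filter recursions above, specialized to this model (random-walk state, observation matrix A, noise σ² I_p), can be sketched as follows. The data are assumed already centered (µ subtracted), and all inputs are simulated for illustration:

```python
import numpy as np

def kalman_filter(X, A, sigma2):
    """Kalman filter for the random-walk state model.
    Recursions from the slides: predict Z_t^{t-1} = Z_{t-1}^{t-1},
    P_t^{t-1} = P_{t-1}^{t-1} + I_k; update with gain
    K_t = P_t^{t-1} A' (A P_t^{t-1} A' + sigma2 I_p)^{-1}."""
    T, p = X.shape
    k = A.shape[1]
    z = np.zeros(k)                    # Z_0^0 = 0
    P = np.eye(k)                      # P_0^0 = I_k
    z_filt = np.empty((T, k))
    P_filt = np.empty((T, k, k))
    for t in range(T):
        z_pred = z                     # Z_t^{t-1} = Z_{t-1}^{t-1}
        P_pred = P + np.eye(k)         # P_t^{t-1} = P_{t-1}^{t-1} + I_k
        S = A @ P_pred @ A.T + sigma2 * np.eye(p)  # innovation covariance Sigma_t
        K = P_pred @ A.T @ np.linalg.inv(S)        # Kalman gain K_t
        e = X[t] - A @ z_pred                      # innovation e_t
        z = z_pred + K @ e                         # Z_t^t
        P = (np.eye(k) - K @ A) @ P_pred           # P_t^t
        z_filt[t], P_filt[t] = z, P
    return z_filt, P_filt

# Illustrative run on simulated, centered data
rng = np.random.default_rng(3)
T, p, k = 100, 4, 2
A = rng.normal(size=(p, k))
Ztrue = np.cumsum(rng.normal(size=(T, k)), axis=0)
X = Ztrue @ A.T + 0.5 * rng.normal(size=(T, p))
z_filt, P_filt = kalman_filter(X, A, 0.25)
print(z_filt.shape)  # (100, 2)
```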

SLIDE 30

Kalman smoother

The aim is to compute statistics that estimate the values z^T_t as realisations of Z_t, t = 1:T, given the whole dataset x_u, u = 1:T. Let F^X_T denote the σ-field generated by X_t, t = 1:T.

For t = 1:T, F^X_T is generated by

  • F^X_{t−1},
  • the innovation Z_t − Z^{t−1}_t = Z_t − E[Z_t | F^X_{t−1}],
  • η_s, s = t:T,
  • and ε_s, s = (t+1):T.

SLIDE 31

Kalman smoother ...

  E[Z_{t−1} | F^X_T] = E[Z_{t−1} | F^X_{t−1}] + E[Z_{t−1} | Z_t − Z^{t−1}_t]

and

  E[Z_{t−1} | Z_t − Z^{t−1}_t] = Cov[Z_{t−1}, Z_t − Z^{t−1}_t] (P^{t−1}_t)^{−1} (Z_t − Z^{t−1}_t)
                               = P^{t−1}_{t−1} (P^{t−1}_t)^{−1} (Z_t − Z^{t−1}_t)

SLIDE 32

Kalman smoother ...

Thus:

  • Z^T_{t−1} = Z^{t−1}_{t−1} + J_{t−1} (Z^T_t − Z^{t−1}_t), where J_{t−1} = P^{t−1}_{t−1} (P^{t−1}_t)^{−1};
  • P^T_{t−1} = P^{t−1}_{t−1} + J_{t−1} (P^T_t − P^{t−1}_t) ᵗJ_{t−1}.

[1] Dempster A.P., Laird N. & Rubin D.B. (1977): Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), pp. 1–38.
[2] Shumway R.H. & Stoffer D.S. (2006): Time Series Analysis and Its Applications: With R Examples, 2nd edition. Springer-Verlag.
[3] Zuur A.F. et al. (2003): Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics, 14, pp. 665–685.
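The backward smoother recursions above need the filtered and predicted moments, so a compact version of the forward filter from the earlier slides is repeated here (simulated, centered inputs; illustrative only):

```python
import numpy as np

def kalman_filter_smoother(X, A, sigma2):
    """Forward Kalman filter then backward (RTS-type) smoother for the
    random-walk state model: J_{t-1} = P_{t-1}^{t-1} (P_t^{t-1})^{-1},
    Z_{t-1}^T = Z_{t-1}^{t-1} + J_{t-1} (Z_t^T - Z_t^{t-1})."""
    T, p = X.shape
    k = A.shape[1]
    Ik, Ip = np.eye(k), np.eye(p)
    z_filt = np.empty((T, k)); P_filt = np.empty((T, k, k))
    z_pred = np.empty((T, k)); P_pred = np.empty((T, k, k))
    z, P = np.zeros(k), Ik                 # Z_0^0 = 0, P_0^0 = I_k
    for t in range(T):                     # forward pass
        z_pred[t] = z
        P_pred[t] = P + Ik
        K = P_pred[t] @ A.T @ np.linalg.inv(A @ P_pred[t] @ A.T + sigma2 * Ip)
        z = z_pred[t] + K @ (X[t] - A @ z_pred[t])
        P = (Ik - K @ A) @ P_pred[t]
        z_filt[t], P_filt[t] = z, P
    z_sm = z_filt.copy(); P_sm = P_filt.copy()
    for t in range(T - 1, 0, -1):          # backward pass
        J = P_filt[t - 1] @ np.linalg.inv(P_pred[t])
        z_sm[t - 1] = z_filt[t - 1] + J @ (z_sm[t] - z_pred[t])
        P_sm[t - 1] = P_filt[t - 1] + J @ (P_sm[t] - P_pred[t]) @ J.T
    return z_sm, P_sm

# Illustrative run
rng = np.random.default_rng(4)
T, p, k = 80, 4, 2
A = rng.normal(size=(p, k))
Ztrue = np.cumsum(rng.normal(size=(T, k)), axis=0)
X = Ztrue @ A.T + 0.5 * rng.normal(size=(T, p))
z_sm, P_sm = kalman_filter_smoother(X, A, 0.25)
print(z_sm.shape)  # (80, 2)
```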

SLIDE 33

3 Statistical method

  • EM algorithm
  • Filtering, forecasting and smoothing the latent process
  • Algorithms

SLIDE 34

General feature of the EM algorithm

Let Θ denote the set of the model's parameters. The EM algorithm consists in alternating between

  • the computation of the conditional expectation of the complete-data likelihood (expectation step),
  • and a multivariate normal maximum likelihood maximization (maximization step) [2, ?].

SLIDE 35

Expectation step

The expectation step at the j-th iteration of the EM algorithm consists in the calculation of

  Q(Θ, Θ̂_{j−1}) = E_{Θ̂_{j−1}} {−L_{X,Z}(Θ | X)}

where Θ̂_{j−1} gathers the current values of the estimates of the model's parameters computed at the (j−1)-th iteration and X = (X_t)_{t=1:T}. This conditional expectation is obtained by using the Kalman filter and smoother [2, ?].

The M-step consists in the maximization of Q(Θ, Θ̂_{j−1}) with respect to Θ and results in a multivariate normal maximum likelihood maximization.
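The overall E/M alternation can be sketched as a loop. The M-step updates below are deliberately simplified illustrations, not the full derivation: they update A and σ² from the smoothed first moments only, ignoring the P^T_t correction terms that the exact expected complete-data likelihood would include, and no rotation constraint is enforced on A:

```python
import numpy as np

def smooth(X, A, sigma2):
    """Compact Kalman filter + backward smoother for the random-walk
    state model (recursions from the previous slides); returns z^T_t."""
    T, p = X.shape; k = A.shape[1]
    Ik, Ip = np.eye(k), np.eye(p)
    z, P = np.zeros(k), Ik
    z_f = np.empty((T, k)); P_f = np.empty((T, k, k))
    z_p = np.empty((T, k)); P_p = np.empty((T, k, k))
    for t in range(T):
        z_p[t], P_p[t] = z, P + Ik
        K = P_p[t] @ A.T @ np.linalg.inv(A @ P_p[t] @ A.T + sigma2 * Ip)
        z = z_p[t] + K @ (X[t] - A @ z_p[t])
        P = (Ik - K @ A) @ P_p[t]
        z_f[t], P_f[t] = z, P
    z_s = z_f.copy()
    for t in range(T - 1, 0, -1):
        J = P_f[t - 1] @ np.linalg.inv(P_p[t])
        z_s[t - 1] = z_f[t - 1] + J @ (z_s[t] - z_p[t])
    return z_s

def em_sketch(X, k, n_iter=15, seed=0):
    """EM alternation: E-step = smoothing under the current parameters,
    M-step = closed-form updates. Simplified: smoothed second moments
    are approximated by outer products of the smoothed means."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    mu, A, sigma2 = X.mean(axis=0), rng.normal(size=(p, k)), 1.0
    for _ in range(n_iter):
        z_s = smooth(X - mu, A, sigma2)                       # E-step
        A = (X - mu).T @ z_s @ np.linalg.inv(z_s.T @ z_s)     # M-step: loadings
        resid = X - mu - z_s @ A.T
        sigma2 = np.mean(resid ** 2)                          # M-step: noise variance
        mu = np.mean(X - z_s @ A.T, axis=0)                   # M-step: mean
    return mu, A, sigma2

# Illustrative run on simulated data
rng = np.random.default_rng(5)
T, p, k = 60, 4, 2
Ztrue = np.cumsum(rng.normal(size=(T, k)), axis=0)
Atrue = rng.normal(size=(p, k))
X = 1.0 + Ztrue @ Atrue.T + 0.4 * rng.normal(size=(T, p))
mu_hat, A_hat, s2_hat = em_sketch(X, k)
print(round(float(s2_hat), 4))
```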

SLIDE 36

4 Case study: Performances of a wastewater treatment plant

  • Data for case study


SLIDE 38

Data for case study

As an example of how the method works we consider a real-world dataset that describes the behavior of a wastewater treatment plant for 527 consecutive days, corresponding to the period 1990–1991. The dataset is available by anonymous ftp from the UCI machine learning repository of databases.

SLIDE 39

Data for case study ...

Each daily record is described by 38 variables. 29 variables are daily means of measurements of 8 quality indicators taken at several places in the plant:

  • at the input,
  • after the pretreatment (primary settler),
  • at the input of the biological reactor (secondary settler),
  • and at the water output of the plant.

SLIDE 40

Data for case study ...

The 9 remaining variables are calculated performances of the primary and secondary treatments and for the whole plant.

SLIDE 41

Preliminary investigation

The figure below shows the variations over time of four water quality indicators at the beginning of the water treatment process.

SLIDE 42

Preliminary investigation ...

Figure: variations over time of four water quality indicators [image not recovered in this extraction].

SLIDE 43

One of the main characteristics of these time series is that they exhibit local mean levels that vary over time. Such behavior of the individual time series could be well described by models that combine a stochastic trend component with an additive noise, such as a random walk model. Thus a latent variables model like the one studied here could provide a suitable method for analyzing these time series together and highlighting the relationships between them.

SLIDE 44

Some numerical results

1 Dimension of the latent trend components

  dimension   likelihood     AIC
  1           117876.98   117644.98
  2           110414.94   110106.94
  3           110559.51   110175.51
  4           110462.78   110002.78
  5           110491.58   109955.58
  6           110493.56   109881.56

SLIDE 45

Some numerical results ...

2 Trend components

SLIDE 46

Some numerical results ...

SLIDE 47

Loadings

  var        1          2          3          4
   1     2348.62    2717.45    2775.27   −3866.41
   2        0.05       0.04       0.11       0.02
   3        0.03       0.01       0.04       0.00
   4       −7.53      −3.73       2.15      15.42
   5       −7.11     −10.55      10.61      36.12
   6       −7.45       2.07       6.83       9.63
   7       −0.83      −1.66      −0.83       2.79
   8       −0.12      −0.03       0.24       0.40
   9       94.94     −41.24     130.55     138.40
  10        0.03       0.01       0.03       0.00
  11      −10.30      −3.54       1.64      17.02
  12       −9.56       3.02      10.25      13.22
  13       −0.43      −1.39      −0.44       2.36
  14       −0.16       0.02       0.39       0.52
  15       99.52     −38.84     139.29     139.79
  16        0.02       0.00       0.02       0.00
  17       −0.91       0.61       8.61       8.81
  18        4.80      −0.82      19.29      17.92
  19       −0.23       0.80       3.47       2.43
  20       −1.09      −1.56      −1.20       2.41
  21        0.00      −0.00       0.03       0.03
  22       95.37     −40.91     136.74     144.07
  23        0.00      −0.00      −0.02      −0.01
  24       −0.01       0.50       1.19       0.30
  25        1.58       0.61       5.77       3.56
  26        0.69       0.78       2.32       0.38
  27       −1.20      −0.70      −1.37       0.76
  28        0.00       0.00       0.01      −0.00
  29       96.15     −36.55     131.43     129.93
  30       −1.90      −0.44      −1.78       0.57
  31       −0.46       0.05       0.33       0.63
  32        0.70       0.98       1.90      −0.40
  33       −1.24      −0.02      −0.23       0.93
  34        0.13       0.13       0.39       0.08

(Minus signs appeared as list markers in the source extraction and have been reconstructed for the entries shown as negative.)