
Time Series Modelling and Kalman Filters



  1. Time Series Modelling and Kalman Filters
     Chris Williams, School of Informatics, University of Edinburgh
     November 2010

     Outline
     ◮ Stochastic processes
     ◮ AR, MA and ARMA models
     ◮ The Fourier view
     ◮ Parameter estimation for ARMA models
     ◮ Linear-Gaussian HMMs (Kalman filtering)
     ◮ Reading: Handout on Time Series Modelling: AR, MA, ARMA and All That
     ◮ Reading: Bishop 13.3 (but not 13.3.2, 13.3.3)

     Example time series
     ◮ FTSE 100
     ◮ Meteorology: temperature, pressure, ...
     ◮ Seismology
     ◮ Electrocardiogram (ECG)
     ◮ ...

     Stochastic Processes
     ◮ A stochastic process is a family of random variables X(t), t ∈ T, indexed by a parameter t in an index set T
     ◮ We will consider discrete-time stochastic processes where T = Z (the integers)
     ◮ A time series is said to be strictly stationary if the joint distribution of X(t_1), ..., X(t_n) is the same as the joint distribution of X(t_1 + τ), ..., X(t_n + τ) for all t_1, ..., t_n, τ
     ◮ A time series is said to be weakly stationary if its mean is constant and its autocovariance function depends only on the lag, i.e. E[X(t)] = µ for all t and Cov[X(t), X(t + τ)] = γ(τ)
     ◮ A Gaussian process is a family of random variables, any finite number of which have a joint Gaussian distribution
     ◮ The ARMA models we will study are stationary Gaussian processes
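To make the weak-stationarity definitions above concrete, here is a minimal sketch (not part of the original slides) that simulates a trivially stationary series, Gaussian white noise, and estimates its mean and autocovariance function γ(τ). The function name, series length, and seed are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)   # white noise: a trivially stationary process

def autocovariance(x, lag):
    """Sample autocovariance gamma(lag) about the sample mean."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    n = len(x)
    return np.sum((x[:n - lag] - mu) * (x[lag:] - mu)) / n

print("sample mean:", x.mean())                                 # should be close to 0
print([round(autocovariance(x, k), 3) for k in range(4)])       # roughly [1, 0, 0, 0]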

  2. Autoregressive (AR) Models
     ◮ Example AR(1): x_t = α x_{t−1} + w_t, where w_t ∼ N(0, σ²)
     ◮ By repeated substitution we get x_t = w_t + α w_{t−1} + α² w_{t−2} + ...
     ◮ Hence E[X(t)] = 0, and if |α| < 1 the process is stationary with
       Var[X(t)] = (1 + α² + α⁴ + ...) σ² = σ² / (1 − α²)
     ◮ Similarly
       Cov[X(t), X(t−k)] = α^k Var[X(t−k)] = α^k σ² / (1 − α²)

     [Figure: example realizations of an AR(1) process for α = 0.5 and α = −0.5]

     ◮ The general case is an AR(p) process
       x_t = Σ_{i=1}^p α_i x_{t−i} + w_t
     ◮ Notice how x_t is obtained by a (linear) regression from x_{t−1}, ..., x_{t−p}, hence an autoregressive process
     ◮ Introduce the backward shift operator B, so that B x_t = x_{t−1}, B² x_t = x_{t−2}, etc.
     ◮ Then the AR(p) model can be written as φ(B) x_t = w_t, where φ(B) = 1 − α_1 B − ... − α_p B^p
     ◮ The condition for stationarity is that all the roots of φ(B) lie outside the unit circle

     Yule-Walker Equations
     ◮ Multiplying the AR(p) equation by x_{t−k} gives
       x_t x_{t−k} = Σ_{i=1}^p α_i x_{t−i} x_{t−k} + w_t x_{t−k}
     ◮ Taking expectations (and exploiting stationarity) we obtain
       γ_k = Σ_{i=1}^p α_i γ_{k−i},   k = 1, 2, ...
     ◮ Use p simultaneous equations to obtain the γ's from the α's. For inference, can solve a linear system to obtain the α's given estimates of the γ's (see the sketch below)
     ◮ Example: AR(1) process, γ_k = α^k γ_0
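As a quick numerical check of the AR(1) autocovariance formula and the Yule-Walker relation above, here is a minimal sketch; the simulation length, seed, and parameter values are illustrative assumptions, not values from the slides.

import numpy as np

alpha, sigma = 0.5, 1.0
rng = np.random.default_rng(1)
n = 100_000

# Simulate x_t = alpha * x_{t-1} + w_t with w_t ~ N(0, sigma^2)
x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + rng.normal(0.0, sigma)

def gamma_hat(x, k):
    """Sample autocovariance at lag k."""
    mu = x.mean()
    return np.mean((x[:len(x) - k] - mu) * (x[k:] - mu))

# Compare sample autocovariances with gamma_k = alpha^k * sigma^2 / (1 - alpha^2)
for k in range(3):
    theory = alpha**k * sigma**2 / (1 - alpha**2)
    print(f"gamma_{k}: sample {gamma_hat(x, k):.3f}  theory {theory:.3f}")

# Yule-Walker for the AR(1) case: alpha is recovered as gamma_1 / gamma_0
print("alpha estimate:", gamma_hat(x, 1) / gamma_hat(x, 0))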

  3. Vector AR processes
     ◮ We can in general consider modelling multivariate (as opposed to univariate) time series
       x_t = Σ_{i=1}^p A_i x_{t−i} + G w_t
       where the A_i's and G are square matrices
     ◮ An AR(2) process can be written as a vector AR(1) process:
       (x_t, x_{t−1})ᵀ = [α_1 α_2; 1 0] (x_{t−1}, x_{t−2})ᵀ + [1 0; 0 0] (w_t, w_{t−1})ᵀ
     ◮ In general an AR(p) process can be written as a vector AR(1) process with a p-dimensional state vector (cf ODEs)

     [Figure: graphical model illustrating an AR(2) process, with example realizations for α_1 = 0.2, α_2 = 0.1 and for α_1 = 1.0, α_2 = −0.5]

     Moving Average (MA) processes
     ◮ x_t = Σ_{j=0}^q β_j w_{t−j}   (linear filter)
           = θ(B) w_t
       with scaling so that β_0 = 1 and θ(B) = 1 + Σ_{j=1}^q β_j B^j
     ◮ [Figure: example MA(1) process, showing the w's and the resulting x's]
     ◮ We have E[X(t)] = 0, and
       Var[X(t)] = (1 + β_1² + ... + β_q²) σ²
       Cov[X(t), X(t−k)] = E[ (Σ_{j=0}^q β_j w_{t−j}) (Σ_{i=0}^q β_i w_{t−k−i}) ]
                         = σ² Σ_{j=0}^{q−k} β_{j+k} β_j   for k = 0, 1, ..., q
                         = 0   for k > q
     ◮ Note that the covariance "cuts off" for k > q (checked numerically in the sketch below)
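The following sketch illustrates the MA autocovariance formula and its cut-off beyond lag q. It simulates an MA(2) process as a linear filter of white noise; the coefficients, seed, and series length are illustrative choices of mine, not values from the slides.

import numpy as np

beta = np.array([1.0, 0.6, 0.3])        # beta_0 = 1 by the slides' scaling convention
sigma, n = 1.0, 200_000
q = len(beta) - 1
rng = np.random.default_rng(2)

w = rng.normal(0.0, sigma, size=n)
x = np.convolve(w, beta, mode="valid")  # x_t = sum_j beta_j * w_{t-j}: a linear filter

def gamma_hat(x, k):
    """Sample autocovariance at lag k."""
    mu = x.mean()
    return np.mean((x[:len(x) - k] - mu) * (x[k:] - mu))

# Compare with sigma^2 * sum_{j=0}^{q-k} beta_{j+k} beta_j for k <= q, and 0 for k > q
for k in range(q + 3):
    theory = sigma**2 * np.sum(beta[k:] * beta[:len(beta) - k]) if k <= q else 0.0
    print(f"lag {k}: sample {gamma_hat(x, k):+.3f}  theory {theory:+.3f}")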

  4. ARMA(p, q) processes
     ◮ x_t = Σ_{i=1}^p α_i x_{t−i} + Σ_{j=0}^q β_j w_{t−j}
       i.e. φ(B) x_t = θ(B) w_t
     ◮ Writing an AR(p) process as an MA(∞) process:
       φ(B) x_t = w_t
       x_t = (1 − α_1 B − ... − α_p B^p)^{−1} w_t = (1 + β_1 B + β_2 B² + ...) w_t
     ◮ Similarly an MA(q) process can be written as an AR(∞) process
     ◮ Utility of ARMA(p, q) is potential parsimony

     The Fourier View
     ◮ ARMA models are linear time-invariant systems. Hence sinusoids are their eigenfunctions (Fourier analysis)
     ◮ This means it is natural to consider the power spectrum of the ARMA process. The power spectrum S(k) can be determined from the {α}, {β} coefficients
     ◮ Roughly speaking, S(k) is the amount of power allocated on average to the eigenfunction e^{2πikt}
     ◮ This is a useful way to understand some properties of ARMA processes, but we will not pursue it further here
     ◮ If you want to know more, see e.g. Chatfield (1989) chapter 7 or Diggle (1990) chapter 4

     Parameter Estimation
     ◮ Let the vector of observations be x = (x(t_1), ..., x(t_n))ᵀ
     ◮ Estimate and subtract the constant offset µ̂ if this is non-zero
     ◮ ARMA models driven by Gaussian noise are Gaussian processes. Thus the likelihood L(x; α, β) is a multivariate Gaussian, and we can optimize it wrt the parameters (e.g. by gradient ascent)
     ◮ AR(p) models
       x_t = Σ_{i=1}^p α_i x_{t−i} + w_t
       can be viewed as the linear regression of x_t on the p previous time steps; α and σ² can be estimated using linear regression (see the sketch below)
     ◮ This viewpoint also enables the fitting of nonlinear AR models

     Model Order Selection, References
     ◮ For an MA(q) process there should be a cut-off in the autocorrelation function for lags greater than q
     ◮ For general ARMA models this is a model order selection problem, discussed in an upcoming lecture
     ◮ Some useful books:
       ◮ The Analysis of Time Series: An Introduction. C. Chatfield, Chapman and Hall, 4th edition, 1989
       ◮ Time Series: A Biostatistical Introduction. P. J. Diggle, Clarendon Press, 1990
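Here is a hedged sketch of the "AR(p) fitting as linear regression" view described above: regress x_t on (x_{t−1}, ..., x_{t−p}) by ordinary least squares to estimate the α coefficients and the noise variance σ². The function name, test parameters, and seed are illustrative, not code from the lecture.

import numpy as np

def fit_ar_least_squares(x, p):
    """Estimate AR(p) coefficients and noise variance by least squares."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                              # subtract the estimated offset mu-hat
    # Design matrix: row for target x_t contains (x_{t-1}, ..., x_{t-p})
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    y = x[p:]
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ alpha
    return alpha, residuals.var()                 # (alpha estimates, sigma^2 estimate)

# Usage on a simulated AR(2) series with alpha = (1.0, -0.5), as in the earlier figure
rng = np.random.default_rng(3)
x = np.zeros(50_000)
for t in range(2, len(x)):
    x[t] = 1.0 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()
print(fit_ar_least_squares(x, p=2))               # alpha estimates close to (1.0, -0.5)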

  5. Linear-Gaussian HMMs
     [Figure: graphical model with latent states z_1, z_2, z_3, ..., z_N linked by the transition matrix A, each emitting an observation x_1, x_2, x_3, ..., x_N through C]
     ◮ HMM with continuous state-space and observations
     ◮ Filtering problem known as Kalman filtering
     ◮ Dynamical model
       z_{n+1} = A z_n + w_{n+1}
       where w_{n+1} ∼ N(0, Γ) is Gaussian noise, i.e. p(z_{n+1} | z_n) ∼ N(A z_n, Γ)
     ◮ Observation model
       x_n = C z_n + v_n
       where v_n ∼ N(0, Σ) is Gaussian noise, i.e. p(x_n | z_n) ∼ N(C z_n, Σ)
     ◮ Initialization
       p(z_1) ∼ N(µ_0, V_0)

     Inference Problem – filtering
     ◮ As the whole model is Gaussian, we only need to compute means and variances
       p(z_n | x_1, ..., x_n) ∼ N(µ_n, V_n)
       µ_n = E[z_n | x_1, ..., x_n]
       V_n = E[(z_n − µ_n)(z_n − µ_n)ᵀ | x_1, ..., x_n]
     ◮ Recursive update split into two parts
     ◮ Time update
       p(z_n | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n)
     ◮ Measurement update
       p(z_{n+1} | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n, x_{n+1})
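Before the filtering recursions on the next slides, it can help to see the generative model itself in code. The sketch below (not from the lecture) draws a sample path from a linear-Gaussian state-space model; the particular A, C, Γ, Σ matrices, a 2-D constant-velocity state observed through its position, are illustrative assumptions of mine.

import numpy as np

rng = np.random.default_rng(4)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # transition: position integrates velocity
C = np.array([[1.0, 0.0]])            # observe the position component only
Gamma = 0.01 * np.eye(2)              # state noise covariance
Sigma = np.array([[0.25]])            # observation noise covariance
mu0, V0 = np.zeros(2), np.eye(2)      # p(z_1) ~ N(mu0, V0)

N = 50
z = rng.multivariate_normal(mu0, V0)  # draw z_1
states, observations = [], []
for n in range(N):
    observations.append(C @ z + rng.multivariate_normal(np.zeros(1), Sigma))  # x_n = C z_n + v_n
    states.append(z)
    z = A @ z + rng.multivariate_normal(np.zeros(2), Gamma)                   # z_{n+1} = A z_n + w_{n+1}

print("first five observations:", np.round(np.array(observations[:5]).ravel(), 2))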

  6. ◮ Time update
       z_{n+1} = A z_n + w_{n+1}
       thus
       E[z_{n+1} | x_1, ..., x_n] = A µ_n
       cov(z_{n+1} | x_1, ..., x_n) =: P_n = A V_n Aᵀ + Γ
     ◮ Measurement update (like the posterior in Factor Analysis)
       µ_{n+1} = A µ_n + K_{n+1} (x_{n+1} − C A µ_n)
       V_{n+1} = (I − K_{n+1} C) P_n
       where
       K_{n+1} = P_n Cᵀ (C P_n Cᵀ + Σ)^{−1}
     ◮ K_{n+1} is known as the Kalman gain matrix

     Simple example
     ◮ z_{n+1} = z_n + w_{n+1},   w_n ∼ N(0, 1)
       x_n = z_n + v_n,   v_n ∼ N(0, 1)
       p(z_1) ∼ N(0, σ²)
     ◮ In the limit σ² → ∞ we find
       µ_3 = (5 x_3 + 2 x_2 + x_1) / 8
       (see the sketch below)
     ◮ Notice how later data has more weight
     ◮ Compare z_{n+1} = z_n (so that w_n has zero variance); then
       µ_3 = (x_3 + x_2 + x_1) / 3

     Applications
     "Much as a coffee filter serves to keep undesirable grounds out of your morning mug, the Kalman filter is designed to strip unwanted noise out of a stream of data." Barry Cipra, SIAM News 26(5), 1993
     ◮ Navigational and guidance systems
     ◮ Radar tracking
     ◮ Sonar ranging
     ◮ Satellite orbit determination

     Extensions
     Dealing with non-linearity
     ◮ The Extended Kalman Filter (EKF): if x_n = f(z_n) + v_n where f is a non-linear function, can linearize f, e.g. around E[z_n | x_1, ..., x_{n−1}]. Works for weak non-linearities
     ◮ For very non-linear problems use sampling methods (known as particle filters). Example: work of Blake and Isard on tracking, see http://www.robots.ox.ac.uk/~vdg/dynamics.html
     It is possible to train KFs using a forward-backward algorithm
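Here is a hedged sketch of the filtering recursions above (time update followed by measurement update), checked against the slides' simple example: a random walk with unit state and observation noise and a near-diffuse prior (σ² very large), for which µ_3 should come out close to (5 x_3 + 2 x_2 + x_1)/8. The function name and the specific observation values are illustrative assumptions.

import numpy as np

def kalman_filter(observations, A, C, Gamma, Sigma, mu0, V0):
    """Return the filtered means mu_n = E[z_n | x_1, ..., x_n] for each step."""
    mu, V = np.asarray(mu0, float), np.asarray(V0, float)
    means = []
    for n, x in enumerate(observations):
        if n > 0:                        # time update: propagate the previous posterior
            mu = A @ mu
            V = A @ V @ A.T + Gamma      # predictive covariance (P in the slides' notation)
        K = V @ C.T @ np.linalg.inv(C @ V @ C.T + Sigma)   # Kalman gain
        mu = mu + K @ (np.asarray(x, float) - C @ mu)      # measurement update
        V = (np.eye(len(mu)) - K @ C) @ V
        means.append(mu.copy())
    return means

# The slides' simple example: random walk, unit noises, near-diffuse prior on z_1
A = C = Gamma = Sigma = np.eye(1)
mu0, V0 = np.zeros(1), 1e12 * np.eye(1)      # sigma^2 -> infinity, approximated numerically
x1, x2, x3 = 1.0, 2.0, 4.0                   # arbitrary illustrative observations
means = kalman_filter([[x1], [x2], [x3]], A, C, Gamma, Sigma, mu0, V0)
print(means[-1].item(), (5 * x3 + 2 * x2 + x1) / 8)   # both ~3.125; agreement to several decimals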
