

SLIDE 1

Introduction to Time Series Data and Analysis

Simon Taylor

Department of Mathematics and Statistics

20 April 2016

SLIDE 2

Contents

What is Time Series Data?
Analysis Tools
  • Trace Plot
  • Auto-Correlation Function
  • Spectrum
Time Series Models
  • Moving Average
  • Auto-Regressive
Further Topics

SLIDE 3

What is Time Series Data?

A time series is a set of observations made sequentially through time. Examples:

  • Changes in execution time, RAM or bandwidth usage.
  • Number of times a piece of software ran in consecutive time periods.
  • Financial, geophysical, marketing, demographic, etc.

The objectives of time series analysis are:

  • Description – How does the data vary over time?
  • Explanation – What causes the observed variation?
  • Prediction – What are the future values of the series?
  • Control – Aim to improve control over the process.

SLIDE 4

Common Questions

Q: How important is preserving data order?
A: Very! Changing the order breaks the dependence between measurements.

Q: How frequently do I need to take measurements?
A: It depends:

  • Too sparse, and you risk missing the dependence structure.
  • Too frequent, and you are swamped with noise.

Figure: Sampling frequency – the same signal sampled at 1 Hz, 10 Hz and 100 Hz.

SLIDE 5

Why is time series important in benchmarking?

Q: Can I use simple summary statistics?
A: You can, but they only describe overall properties.

Q: Can't I just interpolate between data points?
A: Signals are often subject to uncontrollable random noise, so the error from interpolation may be large if the noise is large.

Figure: Three time series X, Y and Z, each with x̄ = 0 and s² = 1.
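The figure's point can be made numerically. The following sketch (illustrative, not from the slides; the series X, Y, Z are my own constructions) builds three series with identical sample mean and variance but very different dependence structures – only a time-aware statistic such as the lag-1 autocorrelation tells them apart.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000

def standardise(x):
    """Rescale a series to sample mean 0 and sample variance 1."""
    return (x - x.mean()) / x.std(ddof=1)

# X: independent noise; Y: strongly persistent AR(1); Z: smooth cycle plus noise.
x = standardise(rng.normal(size=T))
y = np.zeros(T)
noise = rng.normal(size=T)
for t in range(1, T):
    y[t] = 0.95 * y[t - 1] + noise[t]
y = standardise(y)
z = standardise(np.sin(2 * np.pi * np.arange(T) / 100) + 0.3 * rng.normal(size=T))

def lag1_corr(v):
    """Sample correlation between the series and itself shifted by one step."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

# All three share mean 0 and variance 1, yet their lag-1 autocorrelations differ sharply.
for name, v in [("X", x), ("Y", y), ("Z", z)]:
    print(name, round(v.mean(), 3), round(v.var(ddof=1), 3), round(lag1_corr(v), 2))
```

Summary statistics cannot distinguish the three, but the lag-1 autocorrelation is near 0 for X and large for Y and Z.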

SLIDE 6

Analysis Tools – Trace Plot

A trace plot is a graph of the measurements against time. Easy to visually identify key features:

  • Trends – Long-term changes in the mean level.
  • Seasonality – Regular peaks and falls in the measurements.
  • Outliers – Unusual measurements that are inconsistent with the rest of the data.
  • Discontinuities – Abrupt changes to the underlying process.
SLIDE 7

Analysis Tools – Auto-correlation function

Correlation measures the linear dependence between two data sets. The auto-correlation measures the correlation between all data pairs at lag k apart:

r_k = Σ_{t=1}^{T−k} (x_t − x̄)(x_{t+k} − x̄) / ((T − 1) s²),

where x̄ and s² are the sample mean and variance.

Figure: Lag 5 ACF calculation.
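The sample auto-correlation formula translates directly into code. Here is a minimal NumPy sketch (the function name `acf` is my choice; statsmodels ships a production version as `statsmodels.tsa.stattools.acf`):

```python
import numpy as np

def acf(x, max_lag):
    """Sample auto-correlation r_k for lags k = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xbar = x.mean()
    s2 = x.var(ddof=1)  # sample variance with (T - 1) in the denominator
    return np.array([
        np.sum((x[: T - k] - xbar) * (x[k:] - xbar)) / ((T - 1) * s2)
        for k in range(max_lag + 1)
    ])
```

By construction r_0 = 1, and for white noise the remaining lags should hover near zero (roughly within ±2/√T).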

SLIDE 8

Analysis Tools – Spectrum

The spectrum describes how the power in a time series varies across frequencies:

I(ω) = (1/(πT)) | Σ_{t=1}^{T} x_t e^{−i2πtω} |², for ω ∈ (0, 1/2].

It identifies prominent seasonal and cyclic variation.

Figure: Fourier decomposition of the time series X_t into components at frequencies 1/4, 1/5, 1/6, 1/7 and 1/8, and its spectrum.
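The spectrum above can be estimated with a fast Fourier transform. A minimal sketch (the function name and the choice to evaluate at the Fourier frequencies j/T are mine):

```python
import numpy as np

def periodogram(x):
    """I(omega) = |sum_{t=1}^T x_t e^{-i 2 pi t omega}|^2 / (pi T),
    evaluated at the Fourier frequencies omega_j = j/T, 0 < omega_j <= 1/2."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    freqs = np.arange(1, T // 2 + 1) / T
    dft = np.fft.rfft(x)[1 : T // 2 + 1]  # drop the zero-frequency term
    return freqs, np.abs(dft) ** 2 / (np.pi * T)

# A pure cycle at frequency 1/8 produces a single dominant peak there.
t = np.arange(64)
freqs, I = periodogram(np.cos(2 * np.pi * t / 8))
print(freqs[np.argmax(I)])  # 0.125
```

The location of the peak recovers the frequency of the seasonal component, exactly as the figure illustrates.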

SLIDE 9

Time series models

Let X_{1:T} = {X_1, . . . , X_T} denote a sequence of T measurements. A time series is stationary if, for any t and lag k, the joint distributions of the subsets X_{1:t} and X_{(1+k):(t+k)} are the same. A time series is weakly stationary if the first two moments are constant over time: E[X_t] = µ and Cov(X_t, X_{t+k}) = γ(k).

Gaussian White Noise Process, GWNP

The time series {Z_t} follows a Gaussian white noise process if the Z_t are independent with: Z_t ∼ N(0, σ²), t = 1, . . . , T

SLIDE 10

Gaussian White Noise Process

Figure: Gaussian white noise process.
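A GWNP is straightforward to simulate, and its defining properties are easy to check empirically. An illustrative sketch (seed and sample size are arbitrary):

```python
import numpy as np

# Simulate Z_t ~ N(0, sigma^2), independent over t.
rng = np.random.default_rng(42)
sigma, T = 1.0, 5000
z = rng.normal(0.0, sigma, size=T)

# The sample moments should be close to the theoretical mean 0 and variance
# sigma^2, and the lag-1 autocorrelation close to 0 (no serial dependence).
print(round(z.mean(), 2), round(z.var(ddof=1), 2))
print(round(np.corrcoef(z[:-1], z[1:])[0, 1], 2))
```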

SLIDE 11

MA(q) process

Moving Average Process of Order q, MA(q)

The process {X_t} is a moving average process of order q if:

X_t = β_0 Z_t + β_1 Z_{t−1} + · · · + β_q Z_{t−q},

where {Z_t} is a GWNP and β_0, . . . , β_q are constants (β_0 = 1).

Expectation: E[X_t] = 0.

Auto-covariance:

Cov(X_t, X_{t+k}) = σ² Σ_{i=0}^{q−|k|} β_i β_{i+|k|} for |k| = 0, . . . , q; 0 otherwise.
SLIDE 12

MA(q) process

Figure: Left: MA(1), β1 = 0.9. Right: MA(2), (β1, β2) = (−0.4, 0.9).
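The defining equation gives a direct simulation recipe: generate the noise once, then form each X_t as a sliding weighted sum of the q+1 most recent noise terms. A minimal sketch (the helper `simulate_ma` is my own):

```python
import numpy as np

def simulate_ma(betas, T, sigma=1.0, seed=0):
    """Simulate X_t = Z_t + beta_1 Z_{t-1} + ... + beta_q Z_{t-q} (beta_0 = 1)."""
    rng = np.random.default_rng(seed)
    coeffs = np.concatenate(([1.0], np.asarray(betas, dtype=float)))
    z = rng.normal(0.0, sigma, size=T + len(betas))
    # "valid" convolution forms each X_t from the q+1 most recent noise terms.
    return np.convolve(z, coeffs, mode="valid")

# MA(1) with beta_1 = 0.9: from the autocovariance formula, the theoretical
# lag-1 ACF is beta_1 / (1 + beta_1^2) = 0.9 / 1.81, about 0.50; lags beyond
# q = 1 should be near zero.
x = simulate_ma([0.9], T=20000)
print(round(np.corrcoef(x[:-1], x[1:])[0, 1], 2))
```

This matches the left panel of the figure: a single non-zero autocorrelation at lag 1, consistent with the formula's cut-off at lag q.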

SLIDE 13

AR(p) process

Autoregressive Process of Order p, AR(p)

The process {Y_t} is an autoregressive process of order p if:

Y_t = α_1 Y_{t−1} + · · · + α_p Y_{t−p} + Z_t,

where {Z_t} is a GWNP and α_1, . . . , α_p are constants.

Expectation: E[Y_t] = 0.

Auto-covariance for AR(1):

Cov(Y_t, Y_{t+k}) = σ² α_1^{|k|} / (1 − α_1²), provided |α_1| < 1.

SLIDE 14

AR(p) process

Figure: Left: AR(1), α1 = 0.9. Right: AR(2), (α1, α2) = (0.8, −0.64).
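An AR(p) process must be simulated recursively, since each value feeds into the next. A minimal sketch (the helper `simulate_ar` and the burn-in length are my own choices; the burn-in lets the recursion forget its arbitrary zero start):

```python
import numpy as np

def simulate_ar(alphas, T, sigma=1.0, burn=500, seed=1):
    """Simulate Y_t = alpha_1 Y_{t-1} + ... + alpha_p Y_{t-p} + Z_t."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    p = len(alphas)
    y = np.zeros(p + burn + T)
    z = rng.normal(0.0, sigma, size=p + burn + T)
    for t in range(p, len(y)):
        # Most recent value first: [Y_{t-1}, ..., Y_{t-p}].
        y[t] = alphas @ y[t - p : t][::-1] + z[t]
    return y[p + burn :]  # discard burn-in so the series is near stationarity

# AR(1) with alpha_1 = 0.9: the autocovariance formula at k = 0 gives
# variance sigma^2 / (1 - alpha_1^2), about 5.26, and lag-1 ACF alpha_1 = 0.9.
y = simulate_ar([0.9], T=20000)
print(round(y.var(ddof=1), 1), round(np.corrcoef(y[:-1], y[1:])[0, 1], 2))
```

Unlike the MA(q) cut-off, the AR(1) autocorrelation decays geometrically (α_1^|k|), which is the slow decay visible in the left panel of the figure.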

SLIDE 15

Non-stationary process

Figure: Examples of non-stationary processes – a random walk, a non-stationary AR(2), an AR(1) with a change in mean, and a concatenated Haar MA.
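The random walk in the first panel, X_t = X_{t−1} + Z_t, shows why non-stationarity matters: its variance grows linearly in t (Var(X_t) = tσ² for a GWNP increment), so no single variance describes the whole series. A minimal sketch checking this across many simulated paths (path count and time points are arbitrary):

```python
import numpy as np

# Simulate many independent random walk paths X_t = X_{t-1} + Z_t, X_0 = 0.
rng = np.random.default_rng(3)
n_paths, T = 2000, 1000
steps = rng.normal(size=(n_paths, T))   # GWNP increments with sigma = 1
paths = steps.cumsum(axis=1)            # cumulative sums give the walks

# Cross-sectional variance at t = 100 and t = 1000; theory predicts 100 and 1000.
print(round(paths[:, 99].var(), 0), round(paths[:, 999].var(), 0))
```

Because the variance depends on t, a stationary model fitted to such a series would be misspecified; this motivates differencing and the further topics below.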

SLIDE 16

On-going Research in Time Series

Figure: Keyword cloud from the Journal of Time Series Analysis, 2002–2015. Red: Models, Navy: Properties, Grey: Inference & Methods.

SLIDE 17

Further Reading

  • Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008) Time Series Analysis: Forecasting and Control. 4th ed., John Wiley & Sons.
  • Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 6th ed., CRC Press.
  • Signal Processing Toolbox, MATLAB (http://uk.mathworks.com/products/signal/)
  • Statsmodels, Python (http://statsmodels.sourceforge.net/)

SLIDE 18

Simon Taylor

Department of Mathematics and Statistics

20 April 2016