Introduction to Time Series Data and Analysis
Simon Taylor
Department of Mathematics and Statistics
20 April 2016
Contents
- What is Time Series Data?
- Analysis Tools: Trace Plot, Auto-Correlation Function, Spectrum
- Time Series Models: Moving Average, Auto-Regressive
- Further Topics
What is Time Series Data?
A time series is a set of observations made sequentially through time. Examples:
- Changes in execution time, RAM or bandwidth usage.
- The number of times a piece of software has run in consecutive periods of time.
- Financial, geophysical, marketing, demographic, etc.
The objectives in time series analysis are:
- Description: How does the data vary over time?
- Explanation: What causes the observed variation?
- Prediction: What are the future values of the series?
- Control: Aim to improve control over the process.
Common Questions
Q: How important is preserving data order?
A: Very! Changing the data order breaks the dependence between measurements.
Q: How frequently do I need to take measurements?
A: It depends:
- Too sparse: you risk missing the dependence structure.
- Too frequent: you are swamped with noise.
Figure: Sampling frequency (panels at 1 Hz, 10 Hz and 100 Hz).
Why is time series analysis important in benchmarking?
Q: Can I use simple summary statistics?
A: You can, but they only describe overall properties, not how the data evolves over time.
Q: Can't I just interpolate between data points?
A: Signals are often subject to uncontrollable random noise, so the error from interpolation may be large when the noise is large.
Figure: Three time series X, Y and Z, each with x̄ = 0 and s² = 1.
Analysis Tools – Trace Plot
A trace plot is a graph of the measurements against time. It makes key features easy to identify visually (a minimal plotting sketch follows the list below):
- Trends – Long-term changes in the mean level.
- Seasonality – Regular peaks and falls in the measurements.
- Outliers – Unusual measurements that are inconsistent with the rest of the data.
- Discontinuities – Abrupt changes to the underlying process.
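A minimal plotting sketch, assuming Python with NumPy and matplotlib (the simulated series combining a trend, a seasonal cycle and noise is purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative series: linear trend + seasonal cycle + random noise.
rng = np.random.default_rng(1)
t = np.arange(200)
x = 0.02 * t + np.sin(2 * np.pi * t / 25) + rng.normal(0.0, 0.5, size=t.size)

# The trace plot is simply the measurements drawn against their time index.
plt.plot(t, x)
plt.xlabel("Time")
plt.ylabel("Measurement")
plt.title("Trace plot")
plt.show()
```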
Analysis Tools – Auto-correlation function
Correlation measures the linear dependence between two data sets. Auto-correlation measures the correlation between all data pairs at lag k apart:
$$r_k = \frac{\sum_{t=1}^{T-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{(T-1)s^2},$$
where $\bar{x}$ and $s^2$ are the sample mean and variance.
Figure: Lag 5 ACF calculation.
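As a concrete illustration of the formula above, a small NumPy sketch (the function name `acf` and the white-noise example are my own; statsmodels provides a ready-made, slightly differently normalised version):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag, following the formula above."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xbar = x.mean()
    s2 = x.var(ddof=1)                      # sample variance with the (T - 1) divisor
    return np.array([
        np.sum((x[:T - k] - xbar) * (x[k:] - xbar)) / ((T - 1) * s2)
        for k in range(max_lag + 1)
    ])

# Example: the ACF of white noise should be near zero at every non-zero lag.
z = np.random.default_rng(0).normal(size=500)
print(acf(z, max_lag=5))
```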
Analysis Tools – Spectrum
The spectrum describes how the power in a time series varies across frequencies:
$$I(\omega) = \frac{1}{\pi T} \left| \sum_{t=1}^{T} x_t e^{i 2\pi t \omega} \right|^2, \quad \text{for } \omega \in (0, 1/2].$$
It identifies prominent seasonal and cyclic variation.
Figure: Fourier decomposition and spectrum of the time series Xt.
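A rough sketch of the periodogram $I(\omega)$ above, evaluated at the Fourier frequencies $\omega = j/T$ (NumPy; the example series with a period-8 cycle is purely illustrative):

```python
import numpy as np

def spectrum(x):
    """Periodogram I(omega) at the Fourier frequencies omega = j/T with 0 < omega <= 1/2."""
    x = np.asarray(x, dtype=float)
    T = x.size
    freqs = np.arange(1, T // 2 + 1) / T               # frequencies in (0, 1/2]
    dft = np.fft.fft(x)[1:T // 2 + 1]                  # matching Fourier coefficients
    return freqs, np.abs(dft) ** 2 / (np.pi * T)       # |sum_t x_t e^{i 2 pi t omega}|^2 / (pi T)

# A series with a strong cycle of period 8 should show a peak near frequency 1/8.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 8) + np.random.default_rng(0).normal(0.0, 0.3, t.size)
freqs, I = spectrum(x)
print(freqs[np.argmax(I)])    # expected to be close to 0.125
```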
Time series models
Let $X_{1:T} = \{X_1, \ldots, X_T\}$ denote a sequence of T measurements. A time series is stationary if the joint distribution of any subset $X_{1:t}$ is the same as that of the lag-k shifted subset $X_{(1+k):(t+k)}$. A time series is weakly stationary if the first two moments are constant over time: $E[X_t] = \mu$ and $\mathrm{Cov}(X_t, X_{t+k}) = \gamma(k)$.
Gaussian White Noise Process, GWNP
The time series $\{Z_t\}$ follows a Gaussian white noise process if $Z_t \sim N(0, \sigma^2)$ independently, for $t = 1, \ldots, T$.
Gaussian White Noise Process
Figure: Gaussian white noise process.
MA(q) process
Moving Average Process of Order q, MA(q)
The process $\{X_t\}$ is a moving average process of order q if:
$$X_t = \beta_0 Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q},$$
where $\{Z_t\}$ is a GWNP and $\beta_0, \ldots, \beta_q$ are constants ($\beta_0 = 1$).
Expectation: $E[X_t] = 0$.
Auto-covariance:
$$\mathrm{Cov}(X_t, X_{t+k}) = \begin{cases} \sigma^2 \sum_{i=0}^{q-|k|} \beta_i \beta_{i+|k|}, & |k| = 0, \ldots, q; \\ 0, & \text{otherwise.} \end{cases}$$
MA(q) process
Figure: Left: MA(1), β1 = 0.9. Right: MA(2), (β1, β2) = (−0.4, 0.9).
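A minimal simulation sketch for the MA(q) definition above, assuming NumPy (the function name `simulate_ma` is my own; the coefficients echo the MA(2) panel of the figure):

```python
import numpy as np

def simulate_ma(betas, T, sigma=1.0, seed=0):
    """Simulate X_t = Z_t + beta_1 Z_{t-1} + ... + beta_q Z_{t-q}, with {Z_t} Gaussian white noise."""
    rng = np.random.default_rng(seed)
    q = len(betas)
    z = rng.normal(0.0, sigma, size=T + q)             # q extra draws so every X_t has its q lags
    weights = np.concatenate(([1.0], np.asarray(betas, dtype=float)))  # beta_0 = 1
    return np.convolve(z, weights, mode="valid")       # length-T series

# MA(2) with (beta_1, beta_2) = (-0.4, 0.9), as in the right panel of the figure.
x = simulate_ma([-0.4, 0.9], T=500)
print(x.mean(), x.var())   # mean near 0, variance near sigma^2 * (1 + 0.4^2 + 0.9^2)
```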
AR(p) process
Autoregressive Process of Order p, AR(p)
The process $\{Y_t\}$ is an autoregressive process of order p if:
$$Y_t = \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p} + Z_t,$$
where $\{Z_t\}$ is a GWNP and $\alpha_1, \ldots, \alpha_p$ are constants.
Expectation: $E[Y_t] = 0$.
Auto-covariance for AR(1):
$$\mathrm{Cov}(Y_t, Y_{t+k}) = \frac{\sigma^2 \alpha_1^{|k|}}{1 - \alpha_1^2}, \quad \text{provided } |\alpha_1| < 1.$$
AR(p) process
Figure: Left: AR(1), α1 = 0.9. Right: AR(2), (α1, α2) = (0.8, −0.64).
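A corresponding sketch for simulating an AR(p) process by direct recursion with a burn-in period, again assuming NumPy (the function name `simulate_ar` is my own; the coefficients echo the AR(2) panel of the figure):

```python
import numpy as np

def simulate_ar(alphas, T, sigma=1.0, seed=0, burn_in=200):
    """Simulate Y_t = alpha_1 Y_{t-1} + ... + alpha_p Y_{t-p} + Z_t, with {Z_t} Gaussian white noise."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    p = alphas.size
    n = T + burn_in
    y = np.zeros(n)
    z = rng.normal(0.0, sigma, size=n)
    for t in range(p, n):
        y[t] = alphas @ y[t - p:t][::-1] + z[t]        # alpha_1*y_{t-1} + ... + alpha_p*y_{t-p} + Z_t
    return y[burn_in:]                                 # drop the burn-in so the zero start is forgotten

# AR(2) with (alpha_1, alpha_2) = (0.8, -0.64), as in the right panel of the figure.
y = simulate_ar([0.8, -0.64], T=500)
```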
Non-stationary process
Figure: Examples of non-stationary processes: a random walk, a non-stationary AR(2), an AR(1) with a mean change, and a concatenated Haar MA.
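For contrast with the stationary models above, a one-line sketch of the random-walk panel: the cumulative sum of Gaussian white noise, whose variance grows with t rather than staying constant.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=1000)
random_walk = np.cumsum(z)   # X_t = X_{t-1} + Z_t, so Var(X_t) = t * sigma^2 grows without bound
```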
On-going Research in Time Series
Figure: Keyword cloud from the Journal of Time Series Analysis, 2002–2015. Red: Models, Navy: Properties, Grey: Inference & Methods
Further Reading
- Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008) Time Series Analysis: Forecasting and Control. 4th ed., John Wiley & Sons.
- Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 6th ed., CRC Press.
- Signal Processing Toolbox, MATLAB (http://uk.mathworks.com/products/signal/)
- Statsmodels, Python (http://statsmodels.sourceforge.net/)
Simon Taylor
Department of Mathematics and Statistics
20 April 2016