
SLIDE 1

NONPARAMETRIC TIME SERIES ANALYSIS USING GAUSSIAN PROCESSES

Sotirios Damouras

Advisor: Mark Schervish. Committee: Rong Chen (external), Anthony Brockwell, Rob Kass, Cosma Shalizi, Larry Wasserman

SLIDE 2

Overview

  • Present a new nonparametric estimation procedure, based on Gaussian Process regression, for modeling nonlinear time series (conditional mean)

  • Outline

    – Review of literature
      ∗ Dynamics
      ∗ Estimation (parametric and nonparametric)
    – Proposed method
      ∗ Description
      ∗ Example and comments
      ∗ Approximate inference
      ∗ Theoretical results
      ∗ Model selection
    – Applications
      ∗ Univariate series (natural sciences)
      ∗ Bivariate series (financial econometrics)

SLIDE 3

Motivation

  • Linear time series (ARMA) models have robust theoretical properties and estimation procedures, but lack modeling flexibility

  • Many real-life time series exhibit nonlinear behavior (limit cycles, amplitude-dependent frequency, etc.) which cannot be captured by linear models

  • The Canadian lynx time series is a famous example of nonlinearity

[Figure: (a) Canadian lynx log10 series, 1820–1920; (b) directed lag-1 scatter plot of $X_t$ against $X_{t-1}$]

SLIDE 4

Nonlinear Dynamics

  • Nonlinear Autoregressive Model (NLAR)

    $X_t = f(X_{t-1}, \dots, X_{t-p}) + \epsilon_t$

    Curse of dimensionality

  • Nonlinear Additive Autoregressive Model (NLAAR)

    $X_t = f_1(X_{t-1}) + \dots + f_p(X_{t-p}) + \epsilon_t$

    Problems with back-fitting, non-identifiability

  • Functional Coefficient Autoregressive Model (FAR)

    $X_t = f_1(U_t^{(1)}) X_{t-1} + \dots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t$

    Preferred for time series (simulated in the sketch below)
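To make the FAR dynamics concrete, here is a minimal simulation sketch in Python; the coefficient functions f1, f2, the choice U_t = X_{t-1}, and the noise scale are hypothetical illustrations, not taken from the slides.

```python
# Minimal sketch: simulate a FAR(2) process
#   X_t = f1(U_t) X_{t-1} + f2(U_t) X_{t-2} + eps_t,  with U_t = X_{t-1},
# using hypothetical coefficient functions chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

def f1(u):
    return 0.5 + 0.3 * np.exp(-u**2)   # hypothetical smooth coefficient

def f2(u):
    return -0.4 * np.tanh(u)           # hypothetical smooth coefficient

T = 500
X = np.zeros(T)
for t in range(2, T):
    u = X[t - 1]                        # the argument variable U_t
    X[t] = f1(u) * X[t - 1] + f2(u) * X[t - 2] + 0.1 * rng.standard_normal()
```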

SLIDE 5

Parametric Estimation

  • Threshold autoregressive (TAR) model [Tong, 1990]

    – Define regimes by a common threshold variable $U_t$
    – Coefficient functions $f_i(U_t)$ are piecewise constant:

      $X_t = \sum_{i=1}^{k} \left( \alpha_0^{(i)} + \alpha_1^{(i)} X_{t-1} + \dots + \alpha_p^{(i)} X_{t-p} + \epsilon_t^{(i)} \right) I(U_t \in A_i)$

    – Estimate a linear model within each regime (see the sketch below)

  • Other alternatives are less general/popular, e.g.

    – Exponential autoregressive (EXPAR) model [Haggan and Ozaki, 1990]
    – Smooth transition autoregressive (STAR) model [Teräsvirta, 1994]
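As a rough illustration of the regime-wise estimation (assuming a known threshold and delay, which real TAR fitting must search over), a sketch:

```python
# Minimal sketch: OLS fit of a 2-regime TAR(p) with a known threshold on
# the delay variable U_t = X_{t-d}. Threshold/delay selection is omitted.
import numpy as np

def fit_tar(X, p=2, d=1, threshold=0.0):
    coefs = {}
    for regime, in_regime in {"low": np.less_equal, "high": np.greater}.items():
        rows, ys = [], []
        for t in range(p, len(X)):
            if in_regime(X[t - d], threshold):
                rows.append([1.0] + [X[t - i] for i in range(1, p + 1)])
                ys.append(X[t])
        # per-regime linear model: [alpha_0, alpha_1, ..., alpha_p]
        coefs[regime], *_ = np.linalg.lstsq(np.array(rows), np.array(ys),
                                            rcond=None)
    return coefs
```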

SLIDE 6

Nonparametric Estimation

  • Kernel methods: all functions share the same argument $U_t$. Run a local regression around $U$, based on a kernel $K_h$

    – Arranged Local Regression (ALR) [Chen and Tsay, 1993]

      $\min_{\{\alpha_i\}} \sum_t \left( X_t - \sum_{i=1}^{p} \alpha_i X_{t-i} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$

    – Local Linear Regression (LLR) [Cai, Fan and Yao, 2000], sketched below

      $\min_{\{\alpha_i, \beta_i\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \alpha_i + \beta_i (U_t - U) \right] X_{t-i} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$

  • Splines [Huang and Shen, 2004]: different arguments $U_t^{(i)}$; the number of spline bases $m_i$ controls smoothness

    $\min_{\{\alpha_{i,j}\}} \sum_t \left( X_t - \sum_{i=1}^{p} \sum_{j=1}^{m_i} \alpha_{i,j} B_{i,j}(U_t^{(i)}) X_{t-i} \right)^2 \;\Rightarrow\; \hat{f}_i(U) = \sum_{j=1}^{m_i} \hat{\alpha}_{i,j} B_{i,j}(U)$
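A minimal sketch of the LLR estimator above, under simplifying assumptions (a Gaussian kernel and a user-supplied bandwidth h; not the cited authors' code):

```python
# Minimal sketch: local linear estimation of the functional coefficients
# f_1(u), ..., f_p(u) at a single point u, solving the weighted least
# squares problem from this slide with a Gaussian kernel K_h.
import numpy as np

def llr_coefficients(X, U, u, p=2, h=0.5):
    rows, ys, ws = [], [], []
    for t in range(p, len(X)):
        du = U[t] - u
        # regressors [X_{t-i}, (U_t - u) X_{t-i}] for each lag i
        rows.append([v for i in range(1, p + 1)
                       for v in (X[t - i], du * X[t - i])])
        ys.append(X[t])
        ws.append(np.exp(-0.5 * (du / h) ** 2))      # kernel weight
    sw = np.sqrt(np.array(ws))
    theta, *_ = np.linalg.lstsq(np.array(rows) * sw[:, None],
                                np.array(ys) * sw, rcond=None)
    return theta[::2]    # the alpha_i, i.e. the estimates f_i(u)
```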

SLIDE 7

Gaussian Process Regression

  • A random function $f$ follows a Gaussian Process (GP) with mean function $\mu(\cdot)$ and covariance function $C(\cdot, \cdot)$, denoted $f \sim GP(\mu, C)$, if for all $n \in \mathbb{N}$

    $[f(X_1), \dots, f(X_n)]^\top \sim N_n(\mu, C)$, where $\mu = [\mu(X_1), \dots, \mu(X_n)]^\top$ and $C = \{C(X_i, X_j)\}_{i,j=1}^{n}$

  • GP regression is a Bayesian nonparametric technique (see the sketch below)

    – GP prior on the regression function: $f \sim GP(\mu, C)$
    – The covariance function $C(\cdot, \cdot)$ controls the smoothing
    – Data from $Y_i = f(X_i) + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ (conjugacy)
    – The posterior distribution of $f$ also follows a GP
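A minimal sketch of standard GP regression (zero prior mean, squared-exponential covariance, Cholesky-based solve; the hyperparameter values are illustrative, and this is not the thesis implementation):

```python
# Minimal sketch: GP regression posterior mean/variance at test points,
# following the usual Cholesky computation. Assumes zero prior mean.
import numpy as np

def sq_exp(a, b, tau=1.0, h=1.0):
    return tau**2 * np.exp(-np.subtract.outer(a, b) ** 2 / h**2)

def gp_posterior(X, Y, X_test, sigma=0.1, tau=1.0, h=1.0):
    K = sq_exp(X, X, tau, h) + sigma**2 * np.eye(len(X))
    Ks = sq_exp(X_test, X, tau, h)
    L = np.linalg.cholesky(K)                         # the O(T^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    mean = Ks @ alpha                                 # posterior mean
    V = np.linalg.solve(L, Ks.T)
    var = np.diag(sq_exp(X_test, X_test, tau, h)) - np.sum(V**2, axis=0)
    return mean, var                                  # posterior variance
```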

SLIDE 8

Proposed Method I

Adopt FAR dynamics, allowing different arguments (the implied distribution of the series under this prior is sketched below):

$X_t = f_1(U_t^{(1)}) X_{t-1} + \dots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$

Prior specification

  • A-priori independent $f_i \sim GP(\mu_i, C_i)$
    → different smoothness for each function

  • Constant prior mean $\mu_i(x) \equiv \mu_i$
    → prior bias toward a linear model

  • Gaussian covariance kernel $C_i(x, x') = \tau_i^2 \exp\{-(x - x')^2 / h_i^2\}$
    → relatively smooth (infinitely differentiable) functions
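On my reading of this prior (an assumption, not code from the thesis), the series is conditionally Gaussian given its lags, with mean and covariance assembled from the $\mu_i$ and $C_i$; a sketch of that assembly:

```python
# Minimal sketch (my reading of the model, flagged as an assumption):
#   E[X_t]        = sum_i mu_i X_{t-i}
#   Cov(X_t, X_s) = sum_i X_{t-i} X_{s-i} C_i(U_t^(i), U_s^(i)) + sigma^2 1{t=s}
# which is what evaluating the (conditional) likelihood requires.
import numpy as np

def far_gp_mean_cov(lags, U, kernels, mus, sigma):
    """lags[i][t] = X_{t-(i+1)}; U[i][t] = U_t^{(i+1)}; kernels[i] = C_{i+1}."""
    n = len(lags[0])
    mean = sum(mu * Xi for mu, Xi in zip(mus, lags))
    cov = sigma**2 * np.eye(n)
    for Xi, Ui, Ci in zip(lags, U, kernels):
        cov += np.outer(Xi, Xi) * Ci(Ui[:, None], Ui[None, :])
    return mean, cov
```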

SLIDE 9

Proposed Method II

  • Estimation

    – Use the “conditional likelihood”, treating the first p observations as fixed
    – A-posteriori, each function follows a GP
    – Likelihood-based estimation (on-line inference, relation to HMMs)

  • Prediction (a Monte Carlo sketch follows below)

    – Predictions follow naturally from the functions’ posterior
    – One-step-ahead predictive distributions are available explicitly
    – Multi-step-ahead predictive distributions are analytically intractable; approximate them by Monte Carlo simulation
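A minimal Monte Carlo sketch of the multi-step approximation, with a hypothetical interface: draw_coef(i, u) samples f_i(u) from its posterior (pointwise and independently for brevity, which ignores posterior dependence across steps):

```python
# Minimal sketch: approximate the k-step-ahead predictive distribution by
# iterating one-step draws along simulated paths. draw_coef is hypothetical.
import numpy as np

def simulate_paths(history, draw_coef, sigma, p, k, n_paths=1000, seed=0):
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, k))
    for m in range(n_paths):
        x = list(history)                 # observed series, length >= p
        for _ in range(k):
            u = x[-1]                     # argument U_t (lag-1 for illustration)
            mean = sum(draw_coef(i, u) * x[-i] for i in range(1, p + 1))
            x.append(mean + sigma * rng.standard_normal())
        paths[m] = x[-k:]
    return paths   # summarize columns for point forecasts / intervals
```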

SLIDE 10

Proposed Method III

Empirical Bayes for choosing the prior parameters θ of the mean and covariance functions

  • Distribute prior uncertainty evenly among the functions
  • Select θ that maximizes the marginal log-likelihood $\ell(X_{(p+1):T} \mid \theta)$ (sketched below)

    – Closed-form expression for $\ell$
    – The gradient $\nabla_\theta \ell$ is available with little extra effort
    – Convenient for selecting many parameters

  • The likelihood $\ell$ automatically penalizes function variability

    – Allows constant $f_i$ as the bandwidth $h_i \to \infty$

  • Initial values from an AR model

    – Avoid bad local maxima
    – Favor constant functions
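A minimal sketch of the empirical Bayes step, assuming a generic constructor build_mean_cov(theta) for the marginal Gaussian (a hypothetical name); a gradient-free optimizer is used for brevity, though the slides note that $\nabla_\theta \ell$ is cheap:

```python
# Minimal sketch: choose theta by maximizing the Gaussian marginal
# log-likelihood l(X | theta) = log N(y | mu(theta), K(theta)).
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y, build_mean_cov):
    mu, K = build_mean_cov(theta)            # hypothetical model constructor
    L = np.linalg.cholesky(K)
    r = np.linalg.solve(L, y - mu)
    # negative log density, dropping the (n/2) log(2 pi) constant
    return 0.5 * r @ r + np.log(np.diag(L)).sum()

def empirical_bayes(theta0, y, build_mean_cov):
    res = minimize(neg_loglik, theta0, args=(y, build_mean_cov),
                   method="Nelder-Mead")
    return res.x
```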

SLIDE 11

Canadian Lynx Data I

Estimated functional coefficients for the model $X_t = f_1(X_{t-2}) X_{t-1} + f_2(X_{t-2}) X_{t-2} + \epsilon_t$

[Figure: panels (a) $f_1$ and (b) $f_2$ against $X_{t-2}$, comparing the GP, TAR, LLR, and spline estimates; rug marks show the observed $X_{t-2}$ values]

SLIDE 12

Canadian Lynx Data II

Fitted values

[Figure: one-step-ahead fitted values over time for the GP, TAR, LLR, and spline models]

  • Different models give similar one-step-ahead predictions

SLIDE 13

Canadian Lynx Data III

Iterated dynamics

[Figure: iterated dynamics of the fitted GP, TAR, LLR, and spline models in the $(X_{t-1}, X_t)$ plane]

  • Don’t need much flexibility for capturing the nonlinear behavior

SLIDE 14

Comments

  • TAR

    – Common thresholds for all coefficient functions
    – The likelihood is discontinuous w.r.t. the thresholds
      ∗ Complications as the number of regimes increases
      ∗ Must resort to graphical/ad-hoc methods

  • Kernel methods

    – Common argument for all coefficient functions
    – Single bandwidth, so similar smoothness across estimates
    – No regularization (boundary behavior, extrapolation)

  • Splines

    – Extrapolate linearly
      ∗ Unbounded estimates
      ∗ Unstable models
    – The problem persists even with regularization

SLIDE 15

Approximate Inference I

  • GP estimation is computationally expensive

    – Need to invert T × T covariance matrices
    – Scales as $O(T^3)$, infeasible for large T

  • Reduced-rank approximation (sketched below)

    – Posterior mean: $f_i(U) = \sum_{t=1}^{T} \beta_{i,t} C_i(U, U_t^{(i)})$
    – Approximation: $f_i(U) \approx \sum_{j=1}^{m_i} \beta_{i,j} C_i(U, B_j^{(i)})$, with $m_i \ll T$
    – Basis points $(B_1^{(i)}, \dots, B_{m_i}^{(i)})$, similar to spline knots
    – Estimation scales as $O\left( \left( \sum_{i=1}^{p} m_i \right)^2 T \right)$
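A minimal sketch of a generic reduced-rank (subset-of-regressors style) fit under my assumptions; the basis points and the vectorized kernel are user-supplied, and this is not the thesis algorithm:

```python
# Minimal sketch: represent f(U) ~ sum_j beta_j C(U, B_j) on m << T basis
# points and solve the regularized normal equations in O(m^2 T).
import numpy as np

def reduced_rank_fit(U, y, basis, kernel, noise=0.1):
    Phi = kernel(U[:, None], basis[None, :])       # T x m
    Kbb = kernel(basis[:, None], basis[None, :])   # m x m
    beta = np.linalg.solve(Phi.T @ Phi + noise**2 * Kbb, Phi.T @ y)
    return lambda u: kernel(np.atleast_1d(u)[:, None], basis[None, :]) @ beta
```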
SLIDE 16

Approximate Inference II

Example on the Canadian lynx data

[Figure: exact vs. reduced-rank estimates of (a) $f_1$ and (b) $f_2$ for m = 10, 5, 3 basis points; rug marks show the data]

  • The approximation works well for smooth functions

SLIDE 17

Approximate Inference III

  • Implementation

    – A small $m_i$ (around 10) is sufficient
    – Modify the kernel to ensure numerical stability

  • Fixed number of stochastic parameters $\{\beta_{i,j}\}$

    – Low memory cost
    – Enables bigger models, e.g. multivariate

  • Extend the method to state-space models (see the filtering sketch below)

    – Treat $\{\beta_{i,j}\}$ as unobserved variables
    – Linearize the model and apply the Kalman filter
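A minimal sketch of the filtering idea as I read it (an assumption): with the weights {βi,j} treated as a static state, each observation is a linear-Gaussian measurement, so a Kalman observation update applies:

```python
# Minimal sketch: one Kalman observation update for the static state beta,
# with measurement x_t = phi_t' beta + eps, eps ~ N(0, sigma2); phi_t stacks
# the basis-function regressors at time t.
import numpy as np

def kalman_step(m, P, phi_t, x_t, sigma2):
    s = phi_t @ P @ phi_t + sigma2        # predictive variance of x_t
    k = P @ phi_t / s                     # Kalman gain
    m = m + k * (x_t - phi_t @ m)         # updated posterior mean of beta
    P = P - np.outer(k, phi_t @ P)        # updated posterior covariance
    return m, P
```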

SLIDE 18

Theoretical Results

  • Posterior means are solutions to a penalized least squares problem in the reproducing kernel Hilbert spaces $\mathcal{H}_i$ (defined by the $C_i$):

    $\min_{\{f_i \in \mathcal{H}_i\}} \frac{1}{\sigma^2} \sum_t \left( X_t - f_1(U_t^{(1)}) X_{t-1} - \dots - f_p(U_t^{(p)}) X_{t-p} \right)^2 + \sum_i \| f_i \|_{\mathcal{H}_i}^2$

  • Consistency: $\| \hat{f}_i - f_i \|_{\mathcal{H}_i(C)}^2 \to 0$ over compact sets $C$

    – Assumes the true functions $f_i \in \mathcal{H}_i$
    – Identifiability and ergodicity conditions

  • The result extends to nonlinear time series regression

    $Y_t = f_1(U_t^{(1)}) X_t^{(1)} + \dots + f_p(U_t^{(p)}) X_t^{(p)} + \epsilon_t$

  • Approximate inference: the estimates converge to appropriate projections of the true functions $f_i$

SLIDE 19

Model Fitting

  • Model specification

    – Which terms to include?
    – Constant coefficients (partially linear)?

  • Model selection: minimize standard information criteria (AIC, BIC)

    – Forward algorithm (sketched below)
      ∗ Incrementally add new terms $f_i(U^{(i)}) X_{t-i}$
      ∗ Check whether coefficients are constant
    – Yields parsimonious models (nonlinearity only where needed)

  • Diagnostic procedures based on

    – Recursive residuals → fit
    – Simulation → stability and dynamics
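A minimal sketch of the greedy forward step; fit_and_score is a hypothetical routine returning, e.g., the BIC of a fitted model containing the given terms (not the thesis code):

```python
# Minimal sketch: greedy forward selection of lag terms by an information
# criterion (smaller is better), stopping when no candidate improves it.
def forward_select(candidates, fit_and_score, max_terms=5):
    selected, best = [], float("inf")
    for _ in range(max_terms):
        scores = {c: fit_and_score(selected + [c])
                  for c in candidates if c not in selected}
        term, score = min(scores.items(), key=lambda kv: kv[1])
        if score >= best:
            break
        selected.append(term)
        best = score
    return selected, best
```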

SLIDE 20

Sunspot Data I

  • Wolf’s sunspot numbers, transformed as $X_t = 2(\sqrt{\text{sunspots}_t + 1} - 1)$

    [Figure: transformed sunspot series, 1700–2000]

  • TAR:

    $X_t = \begin{cases} 1.92 + .84 X_{t-1} + .07 X_{t-2} - .32 X_{t-3} + .15 X_{t-4} - .20 X_{t-5} + .01 X_{t-6} + .19 X_{t-7} - .27 X_{t-8} + .21 X_{t-9} + .01 X_{t-10} + .09 X_{t-11} + \epsilon_t^{(1)}, & \text{if } X_{t-8} \le 11.93 \\ 4.27 + 1.44 X_{t-1} - .84 X_{t-2} + .06 X_{t-3} + \epsilon_t^{(2)}, & \text{if } X_{t-8} > 11.93 \end{cases}$

SLIDE 21

Sunspot Data II

  • GP: $X_t = f_1(X_{t-3}) X_{t-1} + f_2(X_{t-1}) X_{t-5} + f_3(X_{t-2}) X_{t-7} + \epsilon_t$

    [Figure: estimated GP coefficient functions $f_1$, $f_2$, $f_3$]

  • LLR: $X_t = f_1(X_{t-3}) X_{t-1} + f_2(X_{t-3}) X_{t-2} + f_3(X_{t-3}) X_{t-3} + f_4(X_{t-3}) X_{t-6} + f_5(X_{t-3}) X_{t-8} + \epsilon_t$

    [Figure: estimated LLR coefficient functions $f_1$–$f_5$]

  • Splines: $X_t = f_1(X_{t-2}) X_{t-1} + f_2(X_{t-2}) X_{t-2} + f_3(X_{t-2}) X_{t-3} + f_4(X_{t-2}) X_{t-9} + f_5(X_{t-2}) X_{t-11} + \epsilon_t$

    [Figure: estimated spline coefficient functions $f_1$–$f_5$]

SLIDE 22

Sunspot Data III

[Figure: (a) iterated dynamics of the GP, TAR, LLR, and spline models; (b) mean absolute error of iterative multi-step-ahead predictions vs. lead time for the AR, GP, TAR, LLR, and spline models]

  • GP gives a simpler model with at least as good behavior

SLIDE 23

Index-Futures Data I

  • High-frequency (1-minute) S&P 500 index and futures data, May and Nov. 1993

    [Figure: log-index and log-futures series, May]

  • Theoretical futures price: $F_{t,\tau} = S_t \exp\{(r_{t,\tau} - q_{t,\tau})(\tau - t)\}$

    $\Rightarrow$ Mispricing error: $Z_t = \ln F_{t,\tau} - \ln S_t - (r_{t,\tau} - q_{t,\tau})(\tau - t)$

    [Figure: mispricing error series, May]

SLIDE 24

Index-Futures Data II

  • Cointegration

    – The series $\{Y_t = (Y_{1,t}, Y_{2,t})\}$ are integrated ($\Delta Y_t$ is stationary)
    – They have a stationary linear combination $Z_t = a_1 Y_{1,t} + a_2 Y_{2,t}$

  • Vector error-correction representation (a two-step fitting sketch follows below)

    $\Delta Y_t = a_0 + \sum_{i \ge 1} A_i \Delta Y_{t-i} + b Z_{t-1} + \epsilon_t$

  • The data exhibit nonlinear behavior; the error-correcting intensity is

    – Higher when $Z_t > 0$ (arbitrage opportunity)
    – Lower when $Z_t \approx 0$ (transaction costs)
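A minimal sketch of a textbook two-step (Engle–Granger style) error-correction fit for one equation, under simplifying assumptions; the thesis models are richer than this:

```python
# Minimal sketch: (1) estimate the cointegrating combination by OLS,
# (2) regress Delta y1 on an intercept, Z_{t-1}, and lagged differences.
import numpy as np

def fit_ecm(y1, y2, n_lags=1):
    A = np.column_stack([np.ones_like(y2), y2])
    (c, a), *_ = np.linalg.lstsq(A, y1, rcond=None)    # y1 ~ c + a*y2
    Z = y1 - c - a * y2                                # mispricing error
    d1, d2 = np.diff(y1), np.diff(y2)
    rows = [np.ones(len(d1) - n_lags), Z[n_lags:-1]]   # intercept, Z_{t-1}
    for i in range(1, n_lags + 1):
        rows += [d1[n_lags - i:-i], d2[n_lags - i:-i]] # lagged differences
    coef, *_ = np.linalg.lstsq(np.column_stack(rows), d1[n_lags:], rcond=None)
    return coef   # [a_0, b (error-correction term), A_i terms]
```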

SLIDE 25

Index-Futures Data III

  • GP: nonlinearity enters only through two terms; futures returns are unpredictable

    $\Delta \ln S_t = f_1 Z_{t-1} + f_2(Z_{t-1}) \Delta \ln F_{t-1,\tau} + f_3(Z_{t-1}) \Delta \ln F_{t-2,\tau} + f_4 \Delta \ln F_{t-3,\tau} + f_5 \Delta \ln F_{t-4,\tau} + f_6 \Delta \ln F_{t-5,\tau} + f_7 \Delta \ln F_{t-6,\tau} + f_8 \Delta \ln S_{t-9} + \epsilon_{2t}$

    $\Delta \ln F_{t,\tau} = \epsilon_{1t}$

    [Figure: fitted values of $\Delta \ln S_t$ against $Z_{t-1}$]

  • Threshold vector error-correction models

    – [Martens, Kofman and Vorst, 1998]: 5 regimes, ~8 terms
    – [Forbes, Kalb and Kofman, 1999]: 3 regimes, 16 terms (Bayesian)
    – [Tsay, 1998]: 3 regimes, 16 terms

SLIDE 26

Index-Futures Data IV

Mean squared error of iterative multi-step-ahead predictions for Nov.’s data

[Figure: mean squared prediction error vs. lead time for the GP, MKV, FKK, and Tsay models: (a) index log-returns; (b) futures log-returns]

  • The threshold models overfit the futures series

SLIDE 27

Summary

  • Presented a nonparametric procedure, based on GPs, for modeling nonlinear time series that is practical, flexible, and parsimonious, and compares favorably to competing methods

  • Future Work

    – Alternative prior specification (covariance and mean functions)
    – Alternative estimation (relax conjugacy)
      ∗ MCMC
      ∗ Particle filtering
    – Theory
      ∗ Convergence rates, adaptivity
      ∗ Model selection consistency
    – More applications

SLIDE 28

Thank You. Questions?