NONPARAMETRIC TIME SERIES ANALYSIS USING GAUSSIAN PROCESSES

Sotirios Damouras
Advisor: Mark Schervish
Committee: Rong Chen (external), Anthony Brockwell, Rob Kass, Cosma Shalizi, Larry Wasserman
Overview
- Present a new nonparametric estimation procedure based on Gaussian
  Process regression for modeling nonlinear time series (conditional mean)
- Outline
  – Review of literature
    ∗ Dynamics
    ∗ Estimation (parametric and nonparametric)
  – Proposed method
    ∗ Description
    ∗ Example
  – Comments
    ∗ Approximate inference
    ∗ Theoretical results
    ∗ Model selection
  – Applications
    ∗ Univariate series (natural sciences)
    ∗ Bivariate series (financial econometrics)
Motivation
- Linear time series (ARMA) models have robust theoretical properties and
estimation procedures, but lack modeling flexibility
- Many real-life time series exhibit nonlinear behavior (limit cycles,
  amplitude-dependent frequency, etc.) which cannot be captured by linear models
- The Canadian lynx time series is a famous example of nonlinearity
[Figure: (a) Canadian lynx log10-series, 1820–1920; (b) directed lag-1 scatter plot of Xt against Xt−1]
Nonlinear Dynamics
- Nonlinear Autoregressive Model (NLAR)
  Xt = f(Xt−1, . . . , Xt−p) + εt
  → curse of dimensionality
- Nonlinear Additive Autoregressive Model (NLAAR)
  Xt = f1(Xt−1) + . . . + fp(Xt−p) + εt
  → problems with back-fitting, non-identifiability
- Functional Coefficient Autoregressive Model (FAR)
  Xt = f1(U(1)t)Xt−1 + . . . + fp(U(p)t)Xt−p + εt
  → preferred for time series
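As a concrete illustration of FAR dynamics, here is a minimal simulation sketch; the coefficient functions f1, f2 and all parameter values are hypothetical choices made for this example, not taken from the talk:

```python
import numpy as np

def simulate_far2(T, f1, f2, sigma=0.1, seed=0):
    """Simulate X_t = f1(U_t) X_{t-1} + f2(U_t) X_{t-2} + eps_t,
    with threshold variable U_t = X_{t-1} and eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for t in range(2, T):
        u = x[t - 1]
        x[t] = f1(u) * x[t - 1] + f2(u) * x[t - 2] + sigma * rng.standard_normal()
    return x

# Hypothetical smooth coefficient functions (kept inside a stable region)
f1 = lambda u: 0.6 + 0.3 * np.tanh(u)   # varies with the threshold variable
f2 = lambda u: -0.4                      # effectively constant

x = simulate_far2(500, f1, f2)
```

Because the coefficients multiply the lags directly, the FAR form keeps an AR-like structure while letting the dynamics change smoothly with U_t.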
Parametric Estimation
- Threshold autoregressive (TAR) model [Tong, 1990]
– Define regimes by a common threshold variable Ut
– Coefficient functions fi(Ut) are piecewise constant:
  Xt = Σ_{i=1}^{k} [ α(i)0 + α(i)1 Xt−1 + . . . + α(i)p Xt−p + ε(i)t ] I(Ut ∈ Ai)
– Estimate a linear model within each regime
- Other alternatives are less general/popular, e.g.
  – Exponential autoregressive (EXPAR) model [Haggan and Ozaki, 1990]
  – Smooth transition autoregressive (STAR) model [Teräsvirta, 1994]
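The regime-wise estimation step ("estimate a linear model within each regime") can be sketched as follows, assuming two regimes and a lagged threshold variable; the function name and setup are illustrative, not the authors' code:

```python
import numpy as np

def fit_tar2(x, p, d, threshold):
    """Fit a two-regime TAR(p) by OLS within each regime.
    Regimes are defined by the threshold variable U_t = x[t-d]."""
    x = np.asarray(x, float)
    T = len(x)
    start = max(p, d)
    # Design matrix: intercept plus lags X_{t-1}, ..., X_{t-p}
    X = np.column_stack([np.ones(T - start)] +
                        [x[start - i:T - i] for i in range(1, p + 1)])
    y = x[start:]
    low = x[start - d:T - d] <= threshold   # regime indicator I(U_t in A_1)
    betas = []
    for mask in (low, ~low):
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        betas.append(beta)
    return betas  # [intercept, alpha_1, ..., alpha_p] per regime
```

In practice the threshold and delay d are themselves chosen by a grid search, which is where the discontinuous-likelihood complications mentioned later arise.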
Nonparametric Estimation
- Kernel methods: all functions share the same argument Ut.
  Run a local regression around U, based on a kernel Kh
  – Arranged Local Regression (ALR) [Chen and Tsay, 1993]
    min over {αi} of Σt ( Xt − α1Xt−1 − . . . − αpXt−p )² Kh(Ut − U) ⇒ f̂i(U) = α̂i
  – Local Linear Regression (LLR) [Cai, Fan and Yao, 2000]
    min over {αi, βi} of Σt [ Xt − Σ_{i=1}^{p} ( αi + βi(Ut − U) ) Xt−i ]² Kh(Ut − U) ⇒ f̂i(U) = α̂i
- Splines [Huang and Shen, 2004]: different arguments U(i)t; the number of spline
  bases mi controls smoothness
    min over {αi,j} of Σt [ Xt − Σ_{i=1}^{p} Σ_{j=1}^{mi} αi,j Bi,j(U(i)t) Xt−i ]² ⇒ f̂i(U) = Σ_{j=1}^{mi} α̂i,j Bi,j(U)
Gaussian Process Regression
- A random function f follows a Gaussian Process (GP) with mean function µ(·)
  and covariance function C(·, ·), denoted f ∼ GP(µ, C), if for all n ∈ N
    [f(X1), . . . , f(Xn)]⊤ ∼ Nn(µ, C), where µ = [µ(X1), . . . , µ(Xn)]⊤ and C = {C(Xi, Xj)}_{i,j=1}^{n}
- GP regression is a Bayesian nonparametric technique
  – GP prior on the regression function: f ∼ GP(µ, C)
  – The covariance function C(·, ·) controls the smoothing
  – Data from Yi = f(Xi) + εi, where εi ∼ N(0, σ²) (conjugacy)
  – The posterior distribution of f also follows a GP
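A minimal sketch of the GP regression posterior mean under the conjugate model above, assuming a zero prior mean and the squared-exponential kernel used later in the talk:

```python
import numpy as np

def gp_posterior_mean(X, y, Xstar, tau2=1.0, h=1.0, sigma2=1e-4):
    """Posterior mean of f at Xstar for f ~ GP(0, C) with
    C(x, x') = tau2 * exp(-(x - x')^2 / h^2) and data y = f(X) + N(0, sigma2)."""
    def C(a, b):
        return tau2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / h ** 2)
    K = C(X, X) + sigma2 * np.eye(len(X))   # noisy covariance of observations
    return C(Xstar, X) @ np.linalg.solve(K, y)
```

With small noise variance the posterior mean nearly interpolates the data; larger sigma2 or a larger bandwidth h smooths it out, which is exactly the sense in which the covariance function "controls smoothing".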
Proposed Method I
Adopt FAR dynamics, but allow different arguments:
  Xt = f1(U(1)t)Xt−1 + . . . + fp(U(p)t)Xt−p + εt,  εt ∼ N(0, σ²)

Prior specification
- A-priori independent fi ∼ GP(µi, Ci)
  → different smoothness for each function
- Constant prior mean µi(x) ≡ µi
  → prior bias toward a linear model
- Gaussian covariance kernel Ci(x, x′) = τi² exp{ −(x − x′)²/hi² }
  → relatively smooth functions (infinitely differentiable)
Proposed Method II
- Estimation
  – Use the “conditional likelihood”, treating the first p observations as fixed
  – A-posteriori each function follows a GP
  – Likelihood-based estimation (on-line inference, relation to HMMs)
- Prediction
  – Predictions follow naturally from the functions’ posterior
  – One-step-ahead predictive distributions are available explicitly
  – Multi-step-ahead predictive distributions are analytically intractable;
    approximate them by Monte Carlo simulation
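The Monte Carlo approximation for multi-step-ahead prediction can be sketched as follows; here `step_fn` is a placeholder for the fitted conditional-mean dynamics (e.g. the FAR posterior mean), and the argument layout is my own convention:

```python
import numpy as np

def mc_forecast(lags, step_fn, horizon, sigma=0.1, n_sims=2000, seed=0):
    """Approximate multi-step-ahead predictive means by simulating
    X_t = step_fn(lags) + eps_t forward; `lags` holds the p most
    recent observed values, most recent last."""
    rng = np.random.default_rng(seed)
    paths = np.tile(np.asarray(lags, float), (n_sims, 1))
    means = []
    for _ in range(horizon):
        new = np.array([step_fn(row) for row in paths])     # conditional mean
        new = new + sigma * rng.standard_normal(n_sims)     # add innovation noise
        paths = np.column_stack([paths[:, 1:], new])        # shift the lag window
        means.append(new.mean())
    return np.array(means)
```

Averaging simulated paths rather than iterating the mean matters for nonlinear models, where E[f(X)] differs from f(E[X]).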
Proposed Method III
Empirical Bayes for choosing the prior parameters θ of the mean and covariance functions
- Distribute prior uncertainty evenly among the functions
- Select θ to maximize the marginal log-likelihood ℓ(X(p+1):T | θ)
  – Closed-form expression for ℓ
  – The gradient ∇θℓ is available with little extra effort
  – Convenient for selecting many parameters
- The likelihood ℓ automatically penalizes function variability
  – Allows constant fi as the bandwidth hi → ∞
- Initial values from an AR model
  – Avoid bad local maxima
  – Favor constant functions
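For a single GP regression, the marginal log-likelihood that empirical Bayes maximizes has a closed form; a sketch is below (the log-parameterization of θ is my own choice, convenient for unconstrained optimization, and the talk's θ covers all p functions jointly):

```python
import numpy as np

def marginal_loglik(theta, X, y):
    """Marginal log-likelihood l(y | theta) for y = f(X) + noise,
    f ~ GP(0, tau2 * exp(-(x - x')^2 / h2)); theta = log(tau2, h2, sigma2)."""
    tau2, h2, sigma2 = np.exp(theta)
    K = tau2 * np.exp(-(X[:, None] - X[None, :]) ** 2 / h2)
    K += sigma2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                       # stable factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()              # -0.5 * log det K
            - 0.5 * len(X) * np.log(2 * np.pi))
```

Maximizing this (e.g. with any gradient-based optimizer on its negative) trades data fit against model complexity, which is how the likelihood "automatically penalizes function variability".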
Canadian Lynx Data I
Estimated functional coefficients for the model Xt = f1(Xt−2)Xt−1 + f2(Xt−2)Xt−2 + εt
[Figure: estimated coefficient functions (a) f1 and (b) f2, comparing GP, TAR, LLR, and spline estimates]
Canadian Lynx Data II
Fitted values
[Figure: fitted values for the lynx series, comparing GP, TAR, LLR, and spline models]
- Different models give similar one-step-ahead predictions
Canadian Lynx Data III
Iterated dynamics
[Figure: iterated dynamics of the fitted GP, TAR, LLR, and spline models in the (Xt−1, Xt) plane]
- Not much flexibility is needed to capture the nonlinear behavior
Comments
- TAR
  – Common thresholds for all coefficient functions
  – Likelihood is discontinuous w.r.t. the thresholds
    ∗ Complications as the number of regimes increases
    ∗ Must resort to graphical/ad-hoc methods
- Kernel methods
  – Common argument for all coefficient functions
  – Single bandwidth, so similar smoothness across estimates
  – No regularization (boundary behavior, extrapolation)
- Splines
  – Extrapolate linearly
    ∗ Unbounded estimates
    ∗ Unstable models
  – Problem persists even with regularization
Approximate Inference I
- GP estimation is computationally expensive
  – Need to invert T × T covariance matrices
  – Scales as O(T³), infeasible for large T
- Reduced-rank approximation
  – Posterior mean: fi(U) = Σ_{t=1}^{T} βi,t Ci(U, U(i)t)
  – Approximation: fi(U) ≈ Σ_{j=1}^{mi} βi,j Ci(U, B(i)j), with mi ≪ T
  – Basis points (B(i)1, . . . , B(i)mi), similar to spline knots
  – Estimation scales as O( (Σ_{i=1}^{p} mi)² T )
Approximate Inference II
Example on Canadian lynx data
[Figure: exact vs. reduced-rank (m = 10, 5, 3) estimates of (a) f1 and (b) f2 for the lynx data]
- The approximation works well for smooth functions
Approximate Inference III
- Implementation
  – A small mi (around 10) is sufficient
  – Modify the kernel to ensure numerical stability
- Fixed number of stochastic parameters {βi,j}
  – Low memory cost
  – Enables bigger models, e.g. multivariate
- Extension to state-space models
  – Treat {βi,j} as unobserved variables
  – Linearize the model, apply the Kalman filter
Theoretical Results
- Posterior means are solutions to a penalized least squares problem in the
  reproducing kernel Hilbert spaces Hi (defined by the Ci):
    min over {fi ∈ Hi} of (1/σ²) Σt ( Xt − f1(U(1)t)Xt−1 − . . . − fp(U(p)t)Xt−p )² + Σi ‖fi‖²_{Hi}
- Consistency: ‖f̂i − fi‖_{Hi(C)} → 0 over compact sets C
  – Assumes the true functions fi ∈ Hi
  – Identifiability and ergodicity conditions
- The result extends to nonlinear time series regression
    Yt = f1(U(1)t)X(1)t + . . . + fp(U(p)t)X(p)t + εt
- Approximate inference: the estimates converge to appropriate projections of
  the true functions fi
Model Fitting
- Model specification
  – Which terms to include?
  – Constant coefficients (partially linear model)?
- Model selection: minimize standard information criteria (AIC, BIC)
  – Forward algorithm
    ∗ Incrementally add new terms fi(U(i)t)Xt−i
    ∗ Check whether coefficients are constant
  – Parsimonious models (nonlinearity only where needed)
- Diagnostic procedures based on
  – Recursive residuals → fit
  – Simulation → stability and dynamics
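The forward algorithm with an information criterion can be sketched generically; here `fit` and the candidate encoding are placeholders for the actual model-fitting routine, and the greedy search below is a simplified stand-in for the procedure described above:

```python
import numpy as np

def bic(rss, T, k):
    """BIC for a Gaussian model: k parameters, residual sum of squares rss."""
    return T * np.log(rss / T) + k * np.log(T)

def forward_select(candidates, fit, T):
    """Greedy forward selection: at each round add the candidate term that
    most reduces BIC; fit(terms) returns (rss, n_params) for that model."""
    terms, best = [], np.inf
    improved = True
    while improved:
        improved = False
        for c in [c for c in candidates if c not in terms]:
            rss, k = fit(terms + [c])
            score = bic(rss, T, k)
            if score < best:
                best, pick, improved = score, c, True
        if improved:
            terms.append(pick)
    return terms
```

Because BIC penalizes every extra parameter, the search stops adding terms once the fit improvement no longer pays for the added complexity, yielding the parsimonious models the slide mentions.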
Sunspot Data I
- Wolf’s sunspot numbers, transformed as 2(√(Sunspot + 1) − 1)
[Figure: transformed sunspot series, 1700–2000]
- TAR:
  Xt = 1.92 + .84Xt−1 + .07Xt−2 − .32Xt−3 + .15Xt−4 − .2Xt−5 + .01Xt−6 + .19Xt−7 − .27Xt−8
       + .21Xt−9 + .01Xt−10 + .09Xt−11 + ε(1)t,   if Xt−8 ≤ 11.93
  Xt = 4.27 + 1.44Xt−1 − .84Xt−2 + .06Xt−3 + ε(2)t,   if Xt−8 > 11.93
Sunspot Data II
- GP: Xt = f1(Xt−3)Xt−1 + f2(Xt−1)Xt−5 + f3(Xt−2)Xt−7 + εt
  [Figure: estimated GP coefficient functions f1, f2, f3]
- LLR: Xt = f1(Xt−3)Xt−1 + f2(Xt−3)Xt−2 + f3(Xt−3)Xt−3 + f4(Xt−3)Xt−6 + f5(Xt−3)Xt−8 + εt
  [Figure: estimated LLR coefficient functions f1, . . . , f5]
- Splines: Xt = f1(Xt−2)Xt−1 + f2(Xt−2)Xt−2 + f3(Xt−2)Xt−3 + f4(Xt−2)Xt−9 + f5(Xt−2)Xt−11 + εt
  [Figure: estimated spline coefficient functions f1, . . . , f5]
Sunspot Data III
[Figure: (a) iterated dynamics of the GP, TAR, LLR, and spline models;
 (b) mean absolute error of iterative multi-step-ahead predictions against lead time
 for the AR, GP, TAR, LLR, and spline models]
- GP gives a simpler model with at least as good behavior
Index-Futures Data I
- High-freq. (1-min.) S&P 500 Index and Futures data, May and Nov. 1993
[Figure: log-Index and log-Futures series, May 1993]
- Theoretical futures price: Ft,τ = St exp{(rt,τ − qt,τ)(τ − t)}
  ⇒ Mispricing error: Zt = ln Ft,τ − ln St − (rt,τ − qt,τ)(τ − t)
[Figure: mispricing error Zt, May 1993]
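Computing the mispricing error from the cost-of-carry relation is a direct vectorized calculation; the array names below are my own, and ttm stands for the time to maturity τ − t:

```python
import numpy as np

def mispricing_error(log_F, log_S, r, q, ttm):
    """Z_t = ln F_{t,tau} - ln S_t - (r_{t,tau} - q_{t,tau}) * (tau - t)."""
    log_F = np.asarray(log_F, float)
    log_S = np.asarray(log_S, float)
    return log_F - log_S - (np.asarray(r, float) - np.asarray(q, float)) * np.asarray(ttm, float)
```

When the futures contract is exactly fairly priced under cost-of-carry, Z_t is identically zero; deviations from zero are what the error-correction dynamics act on.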
Index-Futures Data II
- Cointegration
– The series {Yt = (Y1,t, Y2,t)} are integrated (∆Yt is stationary)
– They have a stationary linear combination Zt = a1Y1,t + a2Y2,t
- Vector error-correction representation
    ∆Yt = a0 + Σ_{i≥1} Ai∆Yt−i + bZt−1 + εt
- The data exhibit nonlinear behavior; the error-correcting intensity is
  – Higher when Zt > 0 (arbitrage opportunity)
  – Lower when Zt ≈ 0 (transaction costs)
Index-Futures Data III
- GP: nonlinearity enters through only two terms; futures returns are unpredictable
    ∆ln St = f1Zt−1 + f2(Zt−1)∆ln Ft−1,τ + f3(Zt−1)∆ln Ft−2,τ + f4∆ln Ft−3,τ + f5∆ln Ft−4,τ
             + f6∆ln Ft−5,τ + f7∆ln Ft−6,τ + f8∆ln St−9 + ε2,t
    ∆ln Ft,τ = ε1,t
  [Figure: fitted values of ∆ln St plotted against Zt−1]
- Threshold vector error-correction models
  – [Martens, Kofman and Vorst, 1998]: 5 regimes, ∼ 8 terms
  – [Forbes, Kalf and Kofman, 1999]: 3 regimes, 16 terms (Bayesian)
  – [Tsay, 1998]: 3 regimes, 16 terms
Index-Futures Data IV
Mean squared error of iterative multi-step-ahead predictions for the November data
[Figure: mean squared prediction error against lead time for (a) index log-returns and
 (b) futures log-returns, comparing the GP, MKV, FKK, and Tsay models]
- Threshold models overfit the futures series
Summary
- Presented a nonparametric procedure based on GPs for modeling nonlinear
  time series that is practical, flexible, and parsimonious, and compares
  favorably to competing methods
- Future work
  – Alternative prior specification (covariance and mean functions)
  – Alternative estimation (relax conjugacy)
    ∗ MCMC
    ∗ Particle filtering
  – Theory
    ∗ Convergence rates, adaptivity
    ∗ Model selection consistency
  – More applications