NONPARAMETRIC TIME SERIES ANALYSIS USING GAUSSIAN PROCESSES
Sotirios Damouras
Advisor: Mark Schervish
Committee: Rong Chen (external), Anthony Brockwell, Rob Kass, Cosma Shalizi, Larry Wasserman

Overview
• Present new nonparametric estimation procedure based on Gaussian Process regression for modeling nonlinear time series (conditional mean)
• Outline
  – Review of Literature
    ∗ Dynamics
    ∗ Estimation (parametric and nonparametric)
  – Proposed Method
    ∗ Description
    ∗ Example - Comments
    ∗ Approximate Inference
    ∗ Theoretical results
    ∗ Model selection
  – Applications
    ∗ Univariate series (natural sciences)
    ∗ Bivariate series (financial econometrics)

Motivation
• Linear time series (ARMA) models have robust theoretical properties and estimation procedures, but lack modeling flexibility
• Many real-life time series exhibit nonlinear behavior (limit cycles, amplitude-dependent frequency, etc.) which cannot be captured by linear models
• The Canadian lynx time series is a well-known example of nonlinearity
[Figure: (a) Canadian lynx log-series, $\log_{10}(\text{Lynx})$ against Year; (b) directed lag-1 scatter plot, $X_t$ against $X_{t-1}$]

Nonlinear Dynamics
• Nonlinear Autoregressive Model (NLAR)
  $X_t = f(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t$
  Curse of dimensionality
• Nonlinear Additive Autoregressive Model (NLAAR)
  $X_t = f_1(X_{t-1}) + \ldots + f_p(X_{t-p}) + \epsilon_t$
  Problems with back-fitting, non-identifiability
• Functional Coefficient Autoregressive Model (FAR)
  $X_t = f_1(U_t^{(1)}) X_{t-1} + \ldots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t$
  Preferred for time series
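A minimal sketch of the FAR recursion, simulating a FAR(2) series. The function name simulate_far, the two coefficient functions and all parameter values are assumptions made for illustration and are not taken from the slides; the coefficients are chosen so that $|f_1| + |f_2| < 1$, which keeps the simulated path stable.

```python
import numpy as np

def simulate_far(coef_funcs, arg_lags, T, sigma=0.1, burn_in=200, seed=0):
    """Simulate X_t = sum_i f_i(X_{t-d_i}) * X_{t-i} + eps_t.

    coef_funcs[i-1] is f_i and arg_lags[i-1] is d_i, so the argument
    of f_i is U_t^(i) = X_{t-d_i} (an illustrative choice of argument).
    """
    rng = np.random.default_rng(seed)
    p = len(coef_funcs)
    start = max(p, max(arg_lags))
    x = np.zeros(start + burn_in + T)
    for t in range(start, len(x)):
        mean = sum(f(x[t - d]) * x[t - i]
                   for i, (f, d) in enumerate(zip(coef_funcs, arg_lags), start=1))
        x[t] = mean + sigma * rng.standard_normal()
    return x[-T:]  # drop the start-up values and the burn-in period

# Hypothetical coefficient functions: f1 varies with its argument, f2 is constant.
f1 = lambda u: 0.3 + 0.4 * np.exp(-u ** 2)
f2 = lambda u: -0.2 + 0.0 * u
series = simulate_far([f1, f2], arg_lags=[2, 2], T=300)
```

With a constant f1 the same code reduces to an ordinary linear AR(2) simulation, which is one way to see FAR as a direct generalization of the linear model.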

Parametric Estimation
• Threshold autoregressive (TAR) model [Tong, 1990]
  – Define regimes by common threshold variable $U_t$
  – Coefficient functions $f_i(U_t)$ are piecewise constant
    $X_t = \sum_{i=1}^{k} \left( \alpha_0^{(i)} + \alpha_1^{(i)} X_{t-1} + \ldots + \alpha_p^{(i)} X_{t-p} + \epsilon_t^{(i)} \right) I(U_t \in A_i)$
  – Estimate linear model within each regime
• Other alternatives less general/popular, e.g.
  – Exponential autoregressive (EXPAR) model [Haggan and Ozaki, 1990]
  – Smooth transition autoregressive (STAR) model [Teräsvirta, 1994]
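The "linear model within each regime" step can be sketched as below, assuming the thresholds and the delay d (with $U_t = X_{t-d}$) are already known. The function name fit_tar and the simulated two-regime example are hypothetical; choosing the thresholds themselves is the hard part and is not covered here.

```python
import numpy as np

def fit_tar(x, p, d, thresholds):
    """Least-squares AR(p) fit (with intercept) inside each regime.

    Regimes are the intervals cut out by the sorted thresholds, and the
    threshold variable is assumed to be U_t = X_{t-d}.
    Returns an array with one row [alpha_0, alpha_1, ..., alpha_p] per regime.
    """
    x = np.asarray(x, dtype=float)
    start = max(p, d)
    y = x[start:]
    u = x[start - d: len(x) - d]
    design = np.column_stack(
        [np.ones(len(y))] + [x[start - i: len(x) - i] for i in range(1, p + 1)]
    )
    # Regime index: 0 for u <= thresholds[0], 1 for the next interval, and so on.
    regime = np.searchsorted(np.asarray(thresholds, dtype=float), u)
    coefs = []
    for r in range(len(thresholds) + 1):
        mask = regime == r
        beta, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
        coefs.append(beta)
    return np.array(coefs)

# Illustrative data: slope 0.6 when X_{t-1} <= 0 and -0.4 otherwise.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(1, len(x)):
    slope = 0.6 if x[t - 1] <= 0 else -0.4
    x[t] = slope * x[t - 1] + 0.5 * rng.standard_normal()
coefs = fit_tar(x, p=1, d=1, thresholds=[0.0])
```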

Nonparametric Estimation
• Kernel methods: All functions share the same argument $U_t$. Run local regression around $U$, based on kernel $K_h$
  – Arranged Local Regression (ALR) [Chen and Tsay, 1993]
    $\min_{\{\alpha_i\}} \sum_t \left( X_t - \alpha_1 X_{t-1} - \ldots - \alpha_p X_{t-p} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$
  – Local Linear Regression (LLR) [Cai, Fan and Yao, 2000, 1993]
    $\min_{\{\alpha_i, \beta_i\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \alpha_i + \beta_i (U_t - U) \right] X_{t-i} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$
• Splines [Huang and Shen, 2004]
  Different arguments $U_t^{(i)}$, number of spline bases $m_i$ controls smoothness
    $\min_{\{\alpha_{i,j}\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \sum_{j=1}^{m_i} \alpha_{i,j} B_{i,j}(U_t^{(i)}) \right] X_{t-i} \right)^2 \;\Rightarrow\; \hat{f}_i(U) = \sum_{j=1}^{m_i} \hat{\alpha}_{i,j} B_{i,j}(U)$
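The local linear criterion can be written as a weighted least-squares problem at each evaluation point. The sketch below assumes a Gaussian kernel for $K_h$ and a common argument $U_t = X_{t-d}$; the function name llr_coefficients and the AR(1) test series are illustrative assumptions only.

```python
import numpy as np

def llr_coefficients(x, p, d, u0, h):
    """Local linear estimates of f_1(u0), ..., f_p(u0), with U_t = X_{t-d}.

    Minimizes sum_t (X_t - sum_i [a_i + b_i (U_t - u0)] X_{t-i})^2 K_h(U_t - u0)
    and returns the fitted a_i. The Gaussian kernel is an assumed choice.
    """
    x = np.asarray(x, dtype=float)
    start = max(p, d)
    y = x[start:]
    lags = np.column_stack([x[start - i: len(x) - i] for i in range(1, p + 1)])
    u = x[start - d: len(x) - d]
    weights = np.exp(-0.5 * ((u - u0) / h) ** 2)
    # Columns: X_{t-i} for the levels a_i, then (U_t - u0) X_{t-i} for the slopes b_i.
    design = np.hstack([lags, lags * (u - u0)[:, None]])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[:p]

# On a linear AR(1) series the estimated coefficient function should be roughly flat.
rng = np.random.default_rng(3)
x = np.zeros(3000)
for t in range(1, len(x)):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
f1_at_zero = llr_coefficients(x, p=1, d=1, u0=0.0, h=1.0)
```

Note that a single bandwidth h is shared by every coefficient function, which is the limitation the kernel-methods bullet points at.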

Gaussian Process Regression
• Random function $f$ follows Gaussian Process (GP) with mean function $\mu(\cdot)$ and covariance function $C(\cdot, \cdot)$, denoted by $f \sim GP(\mu, C)$, if $\forall n \in \mathbb{N}$
    $[f(X_1), \ldots, f(X_n)]^\top \sim N_n(\boldsymbol{\mu}, \mathbf{C})$
    $\boldsymbol{\mu} = [\mu(X_1), \ldots, \mu(X_n)]^\top$
    $\mathbf{C} = \{ C(X_i, X_j) \}_{i,j=1}^{n}$
• GP regression is a Bayesian nonparametric technique
  – GP prior on regression function $f \sim GP(\mu, C)$
  – Covariance function $C(\cdot, \cdot)$ controls smoothing
  – Data from $Y_i = f(X_i) + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ (conjugacy)
  – Posterior distribution of $f$ also follows GP
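The conjugacy statement can be made concrete with the standard Gaussian conditioning formulas. This is a minimal sketch for a scalar input with a constant prior mean and a Gaussian kernel; the names gauss_kernel and gp_posterior, the hyperparameter values and the sine test function are assumptions for illustration.

```python
import numpy as np

def gauss_kernel(a, b, tau=1.0, h=1.0):
    """Gaussian covariance tau^2 * exp(-(a - b)^2 / h^2) for 1-D inputs."""
    return tau ** 2 * np.exp(-((a[:, None] - b[None, :]) ** 2) / h ** 2)

def gp_posterior(x_train, y_train, x_new, sigma=0.1, mu=0.0, tau=1.0, h=1.0):
    """Posterior mean and pointwise variance of f at x_new.

    Model: y_i = f(x_i) + eps_i, eps_i ~ N(0, sigma^2), f ~ GP(mu, C).
    """
    K = gauss_kernel(x_train, x_train, tau, h) + sigma ** 2 * np.eye(len(x_train))
    K_new = gauss_kernel(x_new, x_train, tau, h)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mu))
    mean = mu + K_new @ alpha
    v = np.linalg.solve(L, K_new.T)
    var = tau ** 2 - np.sum(v ** 2, axis=0)  # prior variance minus explained part
    return mean, var

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(30)
mean, var = gp_posterior(x_train, y_train, x_train, sigma=0.1)
```

The Cholesky factorization is where the cubic cost in the number of observations comes from.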

Proposed Method I
Adopt FAR dynamics, allow different arguments
  $X_t = f_1(U_t^{(1)}) X_{t-1} + \ldots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$
Prior Specification
• A-priori independent $f_i \sim GP(\mu_i, C_i)$ → different smoothness for each function
• Constant prior mean $\mu_i(x) \equiv \mu_i$ → prior bias toward linear model
• Gaussian covariance kernel $C_i(x, x') = \tau_i^2 \exp\left( -\|x - x'\|^2 / h_i^2 \right)$ → relatively smooth functions (infinitely differentiable)

Proposed Method II
• Estimation
  – Use “conditional likelihood”, treating first $p$ observations as fixed
  – A-posteriori each function follows a GP
  – Likelihood-based estimation (on-line inference, relation to HMM)
• Prediction
  – Predictions follow naturally from functions’ posterior
  – One-step-ahead predictive distributions available explicitly
  – Multi-step-ahead predictive distributions analytically intractable. Approximate by Monte-Carlo simulation
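One way to see why each function is a-posteriori a GP: under the conditional likelihood the lagged values act as fixed regressors, so the observations and the function values are jointly Gaussian. The sketch below computes the resulting posterior means for fixed hyperparameters. It is an illustration under stated assumptions, not the author's implementation: the name far_gp_fit, the choice $U_t^{(i)} = X_{t-d_i}$, the Gaussian kernel with squared distance, and all numerical values are assumed here.

```python
import numpy as np

def far_gp_fit(x, p, d, mu, tau, h, sigma):
    """Posterior means of f_1..f_p in X_t = sum_i f_i(X_{t-d_i}) X_{t-i} + eps_t.

    Priors: independent f_i ~ GP(mu_i, C_i), C_i(a, b) = tau_i^2 exp(-(a-b)^2 / h_i^2).
    The first max(p, max(d)) observations are treated as fixed.
    """
    x = np.asarray(x, dtype=float)
    start = max(p, max(d))
    y = x[start:]
    lags = [x[start - i: len(x) - i] for i in range(1, p + 1)]
    args = [x[start - d[i]: len(x) - d[i]] for i in range(p)]
    cov = sigma ** 2 * np.eye(len(y))
    mean = np.zeros(len(y))
    for i in range(p):
        C = tau[i] ** 2 * np.exp(-((args[i][:, None] - args[i][None, :]) ** 2) / h[i] ** 2)
        cov += lags[i][:, None] * C * lags[i][None, :]  # D_i C_i D_i
        mean += mu[i] * lags[i]
    weights = np.linalg.solve(cov, y - mean)

    def f_hat(i, u):
        """Posterior mean of coefficient function i (0-based) at points u."""
        u = np.atleast_1d(np.asarray(u, dtype=float))
        k = tau[i] ** 2 * np.exp(-((u[:, None] - args[i][None, :]) ** 2) / h[i] ** 2)
        return mu[i] + k @ (lags[i] * weights)

    return f_hat

# Illustrative check on a linear AR(1) series with coefficient 0.5.
rng = np.random.default_rng(2)
x = np.zeros(1500)
for t in range(1, len(x)):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
f_hat = far_gp_fit(x, p=1, d=[1], mu=[0.0], tau=[1.0], h=[3.0], sigma=1.0)
estimate = f_hat(0, [-1.0, 1.0])
```

A one-step-ahead predictive mean then follows by plugging the posterior means into the FAR equation at the latest observed lags.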

Proposed Method III
Empirical Bayes for choosing prior parameters $\theta$ of mean and covariance functions
• Distribute prior uncertainty evenly among functions
• Select $\theta$ that maximizes marginal log-likelihood $\ell(X_{(p+1):T} \mid \theta)$
  – Closed form expression for $\ell$
  – Gradient $\nabla_\theta \ell$ is available with little extra effort
  – Convenient for selecting many parameters
• Likelihood $\ell$ automatically penalizes function variability
  – Allow constant $f_i$ as bandwidth $h_i \to \infty$
• Initial values from AR model
  – Avoid bad local maxima
  – Favor constant functions
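For reference, the closed form can be written out with the standard Gaussian marginal likelihood. The symbols $y$, $x_i$, $D_i$, $\mathbf{C}_i$, $m_\theta$, $\Sigma_\theta$ and $r$ are notation introduced here for illustration and do not appear on the slides; the sketch assumes the FAR model with independent GP priors and the conditional likelihood described earlier.

```latex
% Assumed notation: n = T - p,  y = (X_{p+1}, \ldots, X_T)^\top,
% x_i = (X_{p+1-i}, \ldots, X_{T-i})^\top,  D_i = \mathrm{diag}(x_i),
% \mathbf{C}_i = \{ C_i(U_s^{(i)}, U_t^{(i)}) \}_{s,t}.
\begin{align}
  m_\theta &= \sum_{i=1}^{p} \mu_i \, x_i , \qquad
  \Sigma_\theta = \sum_{i=1}^{p} D_i \mathbf{C}_i D_i + \sigma^2 I_n , \qquad
  r = y - m_\theta , \\
  \ell(X_{(p+1):T} \mid \theta)
    &= -\tfrac{1}{2} \, r^\top \Sigma_\theta^{-1} r
       - \tfrac{1}{2} \log \lvert \Sigma_\theta \rvert
       - \tfrac{n}{2} \log 2\pi , \\
  \frac{\partial \ell}{\partial \theta_k}
    &= \tfrac{1}{2} \, r^\top \Sigma_\theta^{-1}
       \frac{\partial \Sigma_\theta}{\partial \theta_k}
       \Sigma_\theta^{-1} r
       - \tfrac{1}{2} \, \mathrm{tr}\!\left( \Sigma_\theta^{-1}
       \frac{\partial \Sigma_\theta}{\partial \theta_k} \right)
       \quad \text{for a covariance parameter } \theta_k .
\end{align}
```

The log-determinant term is what penalizes function variability: a more flexible prior inflates $\lvert \Sigma_\theta \rvert$ unless the data fit improves enough to compensate.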

Canadian Lynx Data I
Estimated functional coefficients for model
  $X_t = f_1(X_{t-2}) X_{t-1} + f_2(X_{t-2}) X_{t-2} + \epsilon_t$
[Figure: estimated coefficient functions (a) $f_1$ and (b) $f_2$ over the argument range 2.0 to 3.5, for GP, TAR, LLR and SPLINES]

Canadian Lynx Data II
Fitted values
[Figure: fitted values for GP, TAR, LLR and SPLINES]
• Different models give similar one-step-ahead predictions

Canadian Lynx Data III
Iterated dynamics
[Figure: iterated dynamics, one panel each for GP, TAR, LLR and Splines, both axes ranging from 2.0 to 3.5]
• Don't need much flexibility for capturing nonlinear behavior

Comments
• TAR
  – Common thresholds for all coefficient functions
  – Likelihood is discontinuous w.r.t. thresholds
    ∗ Complications as number of regimes increases
    ∗ Resort to graphical/ad-hoc methods
• Kernel methods
  – Common argument for all coefficient functions
  – Single bandwidth, similar smoothness across estimates
  – No regularization (boundary behavior, extrapolation)
• Splines
  – Extrapolate linearly
    ∗ Unbounded estimates
    ∗ Unstable models
  – Problem persists even with regularization

Approximate Inference I
• GP estimation is computationally expensive
  – Need to invert $T \times T$ covariance matrices
  – Scale as $O(T^3)$, infeasible for large $T$
• Reduced rank approximation
  – Posterior mean: $f_i(U) = \sum_{t=1}^{T} \beta_{i,t} C_i(U, U_t^{(i)})$
  – Approximation: $f_i(U) \approx \sum_{j=1}^{m_i} \beta_{i,j} C_i(U, B_j^{(i)})$, $m_i \ll T$
  – Basis points $(B_1^{(i)}, \ldots, B_{m_i}^{(i)})$, similar to Spline knots
  – Estimation scales as $O\left( \left( \sum_{i=1}^{p} m_i \right)^2 T \right)$
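To illustrate the idea of replacing $T$ kernel terms by $m \ll T$ basis points, the sketch below uses the well-known "subset of regressors" approximation in a plain GP regression setting. It is a swapped-in stand-in rather than the exact scheme of the slides; the names gauss_kernel and reduced_rank_fit, the jitter term, the basis placement and the test function are all assumptions.

```python
import numpy as np

def gauss_kernel(a, b, tau=1.0, h=1.0):
    """Gaussian covariance tau^2 * exp(-(a - b)^2 / h^2) for 1-D inputs."""
    return tau ** 2 * np.exp(-((a[:, None] - b[None, :]) ** 2) / h ** 2)

def reduced_rank_fit(x, y, basis, sigma=0.1, tau=1.0, h=1.0, jitter=1e-8):
    """Fit f(u) ~= sum_j beta_j C(u, B_j) using m basis points B_j.

    Only an m x m system is solved; building it costs O(m^2 T).
    """
    K_bx = gauss_kernel(basis, x, tau, h)      # m x T
    K_bb = gauss_kernel(basis, basis, tau, h)  # m x m
    A = K_bx @ K_bx.T + sigma ** 2 * K_bb + jitter * np.eye(len(basis))
    beta = np.linalg.solve(A, K_bx @ y)

    def f_hat(u):
        u = np.atleast_1d(np.asarray(u, dtype=float))
        return gauss_kernel(u, basis, tau, h) @ beta

    return f_hat

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 5.0, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
basis = np.linspace(0.0, 5.0, 10)  # m = 10 evenly spaced basis points
f_hat = reduced_rank_fit(x, y, basis, sigma=0.1, h=0.7)
grid = np.linspace(0.2, 4.8, 20)
approx = f_hat(grid)
```

The number of unknowns is fixed at m regardless of the series length, which is what makes larger models and longer series manageable.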

Approximate Inference II
Example on Canadian lynx data
[Figure: estimated coefficient functions (a) $f_1$ and (b) $f_2$ over the argument range 2.0 to 3.5, exact fit compared with approximations using m = 10, m = 5 and m = 3]
• Approximation works well for smooth functions

Approximate Inference III
• Implementation
  – Small $m_i$ (around 10) is sufficient
  – Modify kernel to ensure numerical stability
• Fixed number of stochastic parameters $\{\beta_{i,j}\}$
  – Low memory cost
  – Bigger models, e.g. multivariate
• Extend method to State-Space models
  – Treat $\{\beta_{i,j}\}$ as unobserved variables
  – Linearize model, apply Kalman filter

Theoretical Results
• Posterior means are solutions to penalized least squares problem in reproducing kernel Hilbert spaces $\mathcal{H}_i$ (defined by $C_i$)
    $\min_{\{f_i \in \mathcal{H}_i\}} \frac{1}{\sigma^2} \sum_t \left( X_t - f_1(U_t^{(1)}) X_{t-1} - \ldots - f_p(U_t^{(p)}) X_{t-p} \right)^2 + \sum_i \| h_i \|_{\mathcal{H}_i}^2$
• Consistency: $\| \hat{f}_i - f_i \|_{\mathcal{H}_i(C)}^2 \to 0$ over compact $C$
  – Assume true functions $f_i \in \mathcal{H}_i$
  – Identifiability and ergodicity conditions
• Extend result to nonlinear time series regression
    $Y_t = f_1(U_t^{(1)}) X_t^{(1)} + \ldots + f_p(U_t^{(p)}) X_t^{(p)} + \epsilon_t$
• Approximate inference: estimates converge to appropriate projections of true functions $f_i$
