

1. NONPARAMETRIC TIME SERIES ANALYSIS USING GAUSSIAN PROCESSES
   Sotirios Damouras
   Advisor: Mark Schervish
   Committee: Rong Chen (external), Anthony Brockwell, Rob Kass, Cosma Shalizi, Larry Wasserman

2. Overview
   • Present a new nonparametric estimation procedure based on Gaussian process regression for modeling nonlinear time series (conditional mean)
   • Outline
     – Review of Literature
       ∗ Dynamics
       ∗ Estimation (parametric and nonparametric)
     – Proposed Method
       ∗ Description
       ∗ Example - Comments
       ∗ Approximate Inference
       ∗ Theoretical results
       ∗ Model selection
     – Applications
       ∗ Univariate series (natural sciences)
       ∗ Bivariate series (financial econometrics)

3. Motivation
   • Linear time series (ARMA) models have robust theoretical properties and estimation procedures, but lack modeling flexibility
   • Many real-life time series exhibit nonlinear behavior (limit cycles, amplitude-dependent frequency, etc.) which cannot be captured by linear models
   • The Canadian lynx time series is a famous example of nonlinearity
   [Figure: (a) Canadian lynx log-series, $\log_{10}(\mathrm{Lynx})$ vs. Year, 1820-1920; (b) directed lag-1 scatter plot of $X_t$ vs. $X_{t-1}$]

4. Nonlinear Dynamics
   • Nonlinear Autoregressive Model (NLAR)
     $X_t = f(X_{t-1}, \dots, X_{t-p}) + \epsilon_t$
     – Curse of dimensionality
   • Nonlinear Additive Autoregressive Model (NLAAR)
     $X_t = f_1(X_{t-1}) + \dots + f_p(X_{t-p}) + \epsilon_t$
     – Problems with back-fitting, non-identifiability
   • Functional Coefficient Autoregressive Model (FAR)
     $X_t = f_1(U_t^{(1)}) X_{t-1} + \dots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t$
     – Preferred for time series (a simulation sketch follows below)
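To make the FAR dynamics concrete, here is a minimal simulation sketch. The coefficient functions f1, f2, the noise level, and the choice of threshold variable $U_t = X_{t-2}$ are illustrative assumptions, not quantities from the talk.

```python
import numpy as np

# Simulate a functional-coefficient AR (FAR) process of order 2:
#   X_t = f1(X_{t-2}) * X_{t-1} + f2(X_{t-2}) * X_{t-2} + eps_t
# The coefficient functions below are hypothetical smooth choices that keep
# the process stable over the range the functions take.

rng = np.random.default_rng(0)

def f1(u):
    return 1.2 - 0.3 * np.exp(-u**2)   # hypothetical coefficient function

def f2(u):
    return -0.5 + 0.2 * np.tanh(u)     # hypothetical coefficient function

T = 500
x = np.zeros(T)
x[:2] = rng.normal(size=2)
for t in range(2, T):
    u = x[t - 2]                        # argument/threshold variable U_t
    x[t] = f1(u) * x[t - 1] + f2(u) * x[t - 2] + 0.1 * rng.normal()
```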

5. Parametric Estimation
   • Threshold autoregressive (TAR) model [Tong, 1990]
     – Define regimes by a common threshold variable $U_t$
     – Coefficient functions $f_i(U_t)$ are piecewise constant:
       $X_t = \sum_{i=1}^{k} \left( \alpha_0^{(i)} + \alpha_1^{(i)} X_{t-1} + \dots + \alpha_p^{(i)} X_{t-p} + \epsilon_t^{(i)} \right) I(U_t \in A_i)$
     – Estimate a linear model within each regime (see the sketch below)
   • Other alternatives are less general/popular, e.g.
     – Exponential autoregressive (EXPAR) model [Haggan and Ozaki, 1990]
     – Smooth transition autoregressive (STAR) model [Teräsvirta, 1994]
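A minimal sketch of the within-regime estimation step, assuming the thresholds are already known; selecting them (typically by a grid search) is omitted here. The function name and interface are hypothetical.

```python
import numpy as np

# Fit a TAR model with *given* thresholds: split observations into regimes by
# the threshold variable U_t and fit a separate linear AR(p) model (with
# intercept) by least squares within each regime.

def fit_tar(x, u, thresholds, p=2):
    """x: series; u: threshold variable aligned with x; thresholds: sorted cut points."""
    edges = np.concatenate(([-np.inf], np.asarray(thresholds), [np.inf]))
    t_idx = np.arange(p, len(x))
    # Design matrix [1, X_{t-1}, ..., X_{t-p}]
    Z = np.column_stack([np.ones(len(t_idx))] + [x[t_idx - i] for i in range(1, p + 1)])
    y = x[t_idx]
    coefs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (u[t_idx] >= lo) & (u[t_idx] < hi)   # observations in regime A_i
        coefs.append(np.linalg.lstsq(Z[mask], y[mask], rcond=None)[0])
    return coefs
```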

6. Nonparametric Estimation
   • Kernel methods: all functions share the same argument $U_t$. Run a local regression around $U$, based on a kernel $K_h$
     – Arranged Local Regression (ALR) [Chen and Tsay, 1993]
       $\min_{\{\alpha_i\}} \sum_t (X_t - \alpha_1 X_{t-1} - \dots - \alpha_p X_{t-p})^2 K_h(U_t - U) \;\Rightarrow\; \hat f_i(U) = \hat\alpha_i$
     – Local Linear Regression (LLR) [Cai, Fan and Yao, 2000] (a sketch follows below)
       $\min_{\{\alpha_i, \beta_i\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \alpha_i + \beta_i (U_t - U) \right] X_{t-i} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat f_i(U) = \hat\alpha_i$
   • Splines [Huang and Shen, 2004]: different arguments $U_t^{(i)}$; the number of spline bases $m_i$ controls smoothness
     $\min_{\{\alpha_{i,j}\}} \sum_t \left( X_t - \sum_{i=1}^{p} \sum_{j=1}^{m_i} \alpha_{i,j} B_{i,j}(U_t^{(i)}) X_{t-i} \right)^2 \;\Rightarrow\; \hat f_i(U) = \sum_{j=1}^{m_i} \hat\alpha_{i,j} B_{i,j}(U)$
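For comparison, a minimal sketch of the LLR estimator at a single evaluation point $U$, assuming a Gaussian kernel and a given bandwidth h; the function name and interface are hypothetical.

```python
import numpy as np

# Local linear FAR estimation at a point U: solve a kernel-weighted least
# squares problem in (alpha_i, beta_i) and return alpha_i as the estimate
# of f_i(U). A Gaussian kernel stands in for K_h.

def llr_far(x, u, U, h, p=2):
    t_idx = np.arange(p, len(x))
    lags = np.column_stack([x[t_idx - i] for i in range(1, p + 1)])  # X_{t-i}
    du = u[t_idx] - U
    Z = np.hstack([lags, lags * du[:, None]])      # [X_{t-i}, (U_t - U) X_{t-i}]
    w = np.exp(-0.5 * (du / h) ** 2)               # Gaussian kernel weights
    W = np.sqrt(w)[:, None]                        # weighted LS via sqrt weights
    coef = np.linalg.lstsq(Z * W, x[t_idx] * W[:, 0], rcond=None)[0]
    return coef[:p]                                # alpha_1..p, i.e. f_i(U) estimates
```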

7. Gaussian Process Regression
   • A random function $f$ follows a Gaussian process (GP) with mean function $\mu(\cdot)$ and covariance function $C(\cdot,\cdot)$, denoted $f \sim GP(\mu, C)$, if for all $n \in \mathbb{N}$
     $[f(X_1), \dots, f(X_n)]^\top \sim N_n(\boldsymbol{\mu}, \mathbf{C})$, where $\boldsymbol{\mu} = [\mu(X_1), \dots, \mu(X_n)]^\top$ and $\mathbf{C} = \{C(X_i, X_j)\}_{i,j=1}^{n}$
   • GP regression is a Bayesian nonparametric technique
     – GP prior on the regression function, $f \sim GP(\mu, C)$
     – The covariance function $C(\cdot,\cdot)$ controls smoothing
     – Data from $Y_i = f(X_i) + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ (conjugacy)
     – The posterior distribution of $f$ also follows a GP (see the sketch below)
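A minimal sketch of the closed-form GP regression posterior under a zero prior mean and a squared-exponential kernel; the hyperparameter values are placeholders.

```python
import numpy as np

# GP regression: given data (X, y) from Y_i = f(X_i) + eps_i with
# f ~ GP(0, C) and Gaussian noise, the posterior mean and variance at test
# points Xstar are available in closed form.

def sqexp(a, b, tau=1.0, h=1.0):
    d = a[:, None] - b[None, :]
    return tau**2 * np.exp(-(d / h) ** 2)          # squared-exponential kernel

def gp_posterior(X, y, Xstar, sigma=0.1):
    K = sqexp(X, X) + sigma**2 * np.eye(len(X))    # prior covariance + noise
    Ks = sqexp(Xstar, X)                           # cross-covariance
    mean = Ks @ np.linalg.solve(K, y)              # posterior mean
    var = sqexp(Xstar, Xstar).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))  # posterior variance
    return mean, var
```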

8. Proposed Method I
   Adopt FAR dynamics, allow different arguments:
     $X_t = f_1(U_t^{(1)}) X_{t-1} + \dots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$
   Prior specification
   • A-priori independent $f_i \sim GP(\mu_i, C_i)$ → different smoothness for each function
   • Constant prior mean $\mu_i(x) \equiv \mu_i$ → prior bias toward a linear model
   • Gaussian covariance kernel $C_i(x, x') = \tau_i^2 \exp\left( -\|x - x'\|^2 / h_i^2 \right)$ → relatively smooth functions (infinitely differentiable)
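Under the conditional-likelihood treatment (lagged values held fixed as regressors) and the a-priori independence of the $f_i$, the signal $g_t = \sum_i f_i(U_t^{(i)}) X_{t-i}$ has covariance $\mathrm{Cov}(g_s, g_t) = \sum_i X_{s-i} X_{t-i} \, C_i(U_s^{(i)}, U_t^{(i)})$. A sketch of building that matrix follows; the kernel parameters are placeholders.

```python
import numpy as np

# Implied prior covariance of the FAR-GP signal across time points, combining
# the independent GP priors on f_1..f_p with the (fixed) lagged regressors.

def far_gp_cov(lags, U, taus, hs):
    """lags: (T, p) matrix of X_{t-i}; U: (T, p) matrix of arguments U_t^{(i)}."""
    T, p = lags.shape
    K = np.zeros((T, T))
    for i in range(p):
        d = U[:, i][:, None] - U[:, i][None, :]
        Ci = taus[i] ** 2 * np.exp(-d**2 / hs[i] ** 2)   # Gaussian kernel C_i
        K += np.outer(lags[:, i], lags[:, i]) * Ci       # scaled by X_{s-i} X_{t-i}
    return K
```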

9. Proposed Method II
   • Estimation
     – Use the “conditional likelihood”, treating the first p observations as fixed
     – A-posteriori, each function follows a GP
     – Likelihood-based estimation (on-line inference, relation to HMMs)
   • Prediction
     – Predictions follow naturally from the functions’ posterior
     – One-step-ahead predictive distributions are available explicitly
     – Multi-step-ahead predictive distributions are analytically intractable; approximate by Monte Carlo simulation (see the sketch below)
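A minimal sketch of the Monte Carlo approximation to multi-step prediction. Here `predict_one`, returning the one-step-ahead predictive mean and variance given the last p values, is an assumed interface, not part of the talk.

```python
import numpy as np

# Multi-step forecasting by simulation: draw from the one-step predictive
# distribution, append the draw to the history, and iterate.

def mc_forecast(predict_one, history, steps, n_paths=1000, seed=0):
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, steps))
    for k in range(n_paths):
        h = list(history)                        # last p observed values
        for s in range(steps):
            m, v = predict_one(np.asarray(h))    # one-step predictive mean/variance
            x_next = m + np.sqrt(v) * rng.normal()
            paths[k, s] = x_next
            h = h[1:] + [x_next]                 # slide the window forward
    return paths                                 # summarize e.g. with quantiles
```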

10. Proposed Method III
   Empirical Bayes for choosing the prior parameters $\theta$ of the mean and covariance functions
   • Distribute prior uncertainty evenly among the functions
   • Select $\theta$ that maximizes the marginal log-likelihood $\ell(X_{(p+1):T} \mid \theta)$ (a numerical sketch follows below)
     – Closed-form expression for $\ell$
     – The gradient $\nabla_\theta \ell$ is available with little extra effort
     – Convenient for selecting many parameters
   • The likelihood $\ell$ automatically penalizes function variability
     – Allows constant $f_i$ as the bandwidth $h_i \to \infty$
   • Initial values from an AR model
     – Avoid bad local maxima
     – Favor constant functions
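A minimal numerical sketch of the empirical-Bayes step: the marginal log-likelihood is a multivariate normal log-density, evaluated stably via a Cholesky factor. `mean` and `build_cov` are assumed callables returning the prior mean and the marginal covariance of the data (e.g. the implied signal covariance plus $\sigma^2 I$); the analytic gradient mentioned on the slide is replaced here by the optimizer's finite differences for brevity.

```python
import numpy as np
from scipy.optimize import minimize

# Negative marginal log-likelihood of y ~ N(mean(theta), build_cov(theta)).

def neg_marginal_loglik(theta, y, mean, build_cov):
    m, K = mean(theta), build_cov(theta)
    L = np.linalg.cholesky(K)                  # K = L L^T
    z = np.linalg.solve(L, y - m)              # whitened residuals
    return (0.5 * z @ z                        # quadratic form
            + np.log(np.diag(L)).sum()         # 0.5 * log|K|
            + 0.5 * len(y) * np.log(2 * np.pi))

# Usage (theta0 is an initial value, e.g. from a fitted AR model):
# res = minimize(neg_marginal_loglik, theta0, args=(y, mean, build_cov))
```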

11. Canadian Lynx Data I
   Estimated functional coefficients for the model
     $X_t = f_1(X_{t-2}) X_{t-1} + f_2(X_{t-2}) X_{t-2} + \epsilon_t$
   [Figure: estimated coefficient functions (a) $f_1$ and (b) $f_2$ over the range of $X_{t-2}$, comparing GP, TAR, LLR, and spline fits]

12. Canadian Lynx Data II
   [Figure: fitted values from the GP, TAR, LLR, and spline models overlaid on the lynx log-series]
   • Different models give similar one-step-ahead predictions

13. Canadian Lynx Data III
   Iterated dynamics
   [Figure: iterated dynamics of the fitted GP, TAR, LLR, and spline models, shown as lag-1 phase plots]
   • Don’t need much flexibility to capture the nonlinear behavior

14. Comments
   • TAR
     – Common thresholds for all coefficient functions
     – Likelihood is discontinuous w.r.t. the thresholds
       ∗ Complications as the number of regimes increases
       ∗ Must resort to graphical/ad-hoc methods
   • Kernel methods
     – Common argument for all coefficient functions
     – Single bandwidth, so similar smoothness across estimates
     – No regularization (boundary behavior, extrapolation)
   • Splines
     – Extrapolate linearly
       ∗ Unbounded estimates
       ∗ Unstable models
     – Problem persists even with regularization

15. Approximate Inference I
   • GP estimation is computationally expensive
     – Need to invert $T \times T$ covariance matrices
     – Scales as $O(T^3)$, infeasible for large $T$
   • Reduced-rank approximation (see the sketch below)
     – Posterior mean: $f_i(U) = \sum_{t=1}^{T} \beta_{i,t} C_i(U, U_t^{(i)})$
     – Approximation: $f_i(U) \approx \sum_{j=1}^{m_i} \beta_{i,j} C_i(U, B_j^{(i)})$, with $m_i \ll T$
     – Basis points $(B_1^{(i)}, \dots, B_{m_i}^{(i)})$, similar to spline knots
     – Estimation scales as $O\left( \left( \sum_{i=1}^{p} m_i \right)^2 T \right)$
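A minimal sketch of a reduced-rank fit, in the spirit of a subset-of-regressors approximation, for a single function; the basis points B, kernel parameters, and function name are placeholders. Forming the $m \times m$ normal equations costs $O(m^2 T)$, matching the scaling above.

```python
import numpy as np

# Reduced-rank GP fit: represent f through m basis points B_1..B_m,
# f(U) ≈ sum_j beta_j C(U, B_j), and estimate beta by penalized least squares:
#   beta = (K_BU K_UB + sigma^2 K_BB)^{-1} K_BU y.

def reduced_rank_fit(U, y, B, tau=1.0, h=1.0, sigma=0.1):
    def kern(a, b):
        return tau**2 * np.exp(-((a[:, None] - b[None, :]) / h) ** 2)
    K_UB = kern(U, B)                            # (T, m) feature matrix
    K_BB = kern(B, B)                            # (m, m) penalty matrix
    A = K_UB.T @ K_UB + sigma**2 * K_BB          # m x m system, O(m^2 T) to form
    beta = np.linalg.solve(A, K_UB.T @ y)
    return lambda Unew: kern(np.atleast_1d(Unew), B) @ beta   # fitted f(.)
```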

16. Approximate Inference II
   Example on the Canadian lynx data
   [Figure: exact posterior means of (a) $f_1$ and (b) $f_2$ compared with reduced-rank approximations using m = 10, 5, and 3 basis points]
   • The approximation works well for smooth functions

17. Approximate Inference III
   • Implementation
     – Small $m_i$ (around 10) is sufficient
     – Modify the kernel to ensure numerical stability
   • Fixed number of stochastic parameters $\{\beta_{i,j}\}$
     – Low memory cost
     – Enables bigger models, e.g. multivariate
   • Extend the method to state-space models
     – Treat $\{\beta_{i,j}\}$ as unobserved variables
     – Linearize the model, apply the Kalman filter

18. Theoretical Results
   • Posterior means are solutions to a penalized least squares problem in the reproducing kernel Hilbert spaces $\mathcal{H}_i$ (defined by the $C_i$):
     $\min_{\{f_i \in \mathcal{H}_i\}} \frac{1}{\sigma^2} \sum_t \left( X_t - f_1(U_t^{(1)}) X_{t-1} - \dots - f_p(U_t^{(p)}) X_{t-p} \right)^2 + \sum_i \|f_i\|_{\mathcal{H}_i}^2$
   • Consistency: $\|\hat f_i - f_i\|_{\mathcal{H}_i(C)}^2 \to 0$ over compact sets $C$
     – Assume the true functions $f_i \in \mathcal{H}_i$
     – Identifiability and ergodicity conditions
   • Extend the result to nonlinear time series regression
     $Y_t = f_1(U_t^{(1)}) X_t^{(1)} + \dots + f_p(U_t^{(p)}) X_t^{(p)} + \epsilon_t$
   • Approximate inference: estimates converge to appropriate projections of the true functions $f_i$
