
NONPARAMETRIC TIME SERIES ANALYSIS USING GAUSSIAN PROCESSES

Sotirios Damouras
Advisor: Mark Schervish
Committee: Rong Chen (external), Anthony Brockwell, Rob Kass, Cosma Shalizi, Larry Wasserman

Overview
• Present a new nonparametric estimation procedure, based on Gaussian Process regression, for modeling the conditional mean of nonlinear time series
• Outline
  – Review of Literature
    ∗ Dynamics
    ∗ Estimation (parametric and nonparametric)
  – Proposed Method
    ∗ Description
    ∗ Example - Comments
    ∗ Approximate Inference
    ∗ Theoretical results
    ∗ Model selection
  – Applications
    ∗ Univariate series (natural sciences)
    ∗ Bivariate series (financial econometrics)

Motivation
• Linear time series (ARMA) models have robust theoretical properties and estimation procedures, but lack modeling flexibility
• Many real-life time series exhibit nonlinear behavior (limit cycles, amplitude-dependent frequency, etc.) which cannot be captured by linear models
• The Canadian lynx time series is a famous example of nonlinearity

[Figure: (a) Canadian lynx log-series, log10(Lynx) against Year; (b) Directed lag-1 scatter plot, X_t against X_{t-1}]

Nonlinear Dynamics
• Nonlinear Autoregressive Model (NLAR)
  $$X_t = f(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t$$
  Curse of dimensionality
• Nonlinear Additive Autoregressive Model (NLAAR)
  $$X_t = f_1(X_{t-1}) + \ldots + f_p(X_{t-p}) + \epsilon_t$$
  Problems with back-fitting, non-identifiability
• Functional Coefficient Autoregressive Model (FAR)
  $$X_t = f_1(U_t^{(1)}) X_{t-1} + \ldots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t$$
  Preferred for time series
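To make the FAR recursion concrete, here is a minimal simulation sketch. The coefficient functions `f1` and `f2`, the delays in `arg_lags`, the noise level and the helper name `simulate_far` are all illustrative assumptions, not taken from the slides; the arguments are assumed to be lagged values of the series, U_t^{(i)} = X_{t-d_i}.

```python
import numpy as np

def simulate_far(coef_funcs, arg_lags, n_obs, sigma=0.1, seed=0):
    """Simulate X_t = sum_i f_i(X_{t - d_i}) * X_{t - i} + eps_t.

    coef_funcs[i-1] is f_i and arg_lags[i-1] is the delay d_i, so the
    argument of f_i is U_t^{(i)} = X_{t - d_i} (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    p = len(coef_funcs)
    n_init = max(p, max(arg_lags))      # values needed to start the recursion
    x = np.zeros(n_obs + n_init)
    x[:n_init] = rng.normal(scale=sigma, size=n_init)
    for t in range(n_init, n_obs + n_init):
        mean = sum(f(x[t - d]) * x[t - i]
                   for i, (f, d) in enumerate(zip(coef_funcs, arg_lags), start=1))
        x[t] = mean + rng.normal(scale=sigma)
    return x[n_init:]

# Hypothetical coefficient functions; |f1| + |f2| <= 0.9 keeps the recursion stable.
def f1(u):
    return 0.2 + 0.4 * np.exp(-u ** 2)

def f2(u):
    return -0.3

series = simulate_far([f1, f2], arg_lags=[2, 2], n_obs=200)
```

Both functions share the argument X_{t-2} here; giving each function its own delay only requires changing `arg_lags`.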

Parametric Estimation
• Threshold autoregressive (TAR) model [Tong, 1990]
  – Define regimes by common threshold variable $U_t$
  – Coefficient functions $f_i(U_t)$ are piecewise constant
    $$X_t = \sum_{i=1}^{k} \left( \alpha_0^{(i)} + \alpha_1^{(i)} X_{t-1} + \ldots + \alpha_p^{(i)} X_{t-p} + \epsilon_t^{(i)} \right) I(U_t \in A_i)$$
  – Estimate linear model within each regime
• Other alternatives less general/popular, e.g.
  – Exponential autoregressive (EXPAR) model [Haggan and Ozaki, 1990]
  – Smooth transition autoregressive (STAR) model [Teräsvirta, 1994]
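The step "estimate linear model within each regime" can be sketched as follows, assuming the thresholds are already known and the threshold variable is a lagged value of the series. The function name `fit_tar`, the simulated two-regime series and its coefficients are hypothetical illustrations.

```python
import numpy as np

def fit_tar(x, thresholds, p=1, delay=1):
    """Fit an AR(p) model with intercept by least squares within each regime.

    Regimes are the intervals cut out by `thresholds`, applied to the
    threshold variable U_t = X_{t - delay} (an assumed choice).
    """
    x = np.asarray(x, dtype=float)
    start = max(p, delay)
    y = x[start:]
    # Design matrix: intercept followed by lags 1..p.
    lags = np.column_stack([x[start - i:len(x) - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(len(y)), lags])
    u = x[start - delay:len(x) - delay]
    regime = np.digitize(u, thresholds)     # regime index for every t
    coefs = {}
    for r in range(len(thresholds) + 1):
        mask = regime == r
        coefs[r], *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
    return coefs

# Hypothetical two-regime TAR(1) series with threshold 0.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(1, len(x)):
    phi = 0.6 if x[t - 1] < 0 else -0.4
    x[t] = phi * x[t - 1] + rng.normal()

coefs = fit_tar(x, thresholds=[0.0])    # coefs[r] = [intercept, slope] in regime r
```

In practice the thresholds themselves must also be chosen, which this sketch does not attempt.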

Nonparametric Estimation
• Kernel methods: All functions share the same argument $U_t$. Run local regression around $U$, based on kernel $K_h$
  – Arranged Local Regression (ALR) [Chen and Tsay, 1993]
    $$\min_{\{\alpha_i\}} \sum_t \left( X_t - \alpha_1 X_{t-1} - \ldots - \alpha_p X_{t-p} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$$
  – Local Linear Regression (LLR) [Cai, Fan and Yao, 2000, 1993]
    $$\min_{\{\alpha_i, \beta_i\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \alpha_i + \beta_i (U_t - U) \right] X_{t-i} \right)^2 K_h(U_t - U) \;\Rightarrow\; \hat{f}_i(U) = \hat{\alpha}_i$$
• Splines [Huang and Shen, 2004]: Different arguments $U_t^{(i)}$, number of spline bases $m_i$ controls smoothness
    $$\min_{\{\alpha_{i,j}\}} \sum_t \left( X_t - \sum_{i=1}^{p} \left[ \sum_{j=1}^{m_i} \alpha_{i,j} B_{i,j}(U_t^{(i)}) \right] X_{t-i} \right)^2 \;\Rightarrow\; \hat{f}_i(U) = \sum_{j=1}^{m_i} \hat{\alpha}_{i,j} B_{i,j}(U)$$
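The local constant (ALR-type) criterion above is a weighted least squares problem, which the following sketch solves at a single point. The Gaussian form of the kernel, the helper name `local_far_coefficients` and the synthetic data (regression-style rather than a real series) are assumptions made for illustration.

```python
import numpy as np

def local_far_coefficients(y, lagged, u_obs, u0, h):
    """Kernel-weighted least squares estimate of (f_1(u0), ..., f_p(u0)).

    y      : responses X_t, shape (n,)
    lagged : regressors [X_{t-1}, ..., X_{t-p}], shape (n, p)
    u_obs  : common argument U_t for every t, shape (n,)
    """
    weights = np.exp(-0.5 * ((u_obs - u0) / h) ** 2)    # Gaussian K_h (assumed)
    sw = np.sqrt(weights)
    # Minimising sum_t w_t (y_t - lagged_t @ alpha)^2 is ordinary least
    # squares on rows rescaled by sqrt(w_t).
    alpha, *_ = np.linalg.lstsq(lagged * sw[:, None], y * sw, rcond=None)
    return alpha

# Synthetic check with f_1(u) = sin(u) and f_2(u) = 0.5 (hypothetical).
rng = np.random.default_rng(0)
n = 2000
u_obs = rng.uniform(-2.0, 2.0, size=n)
lagged = rng.normal(size=(n, 2))
y = np.sin(u_obs) * lagged[:, 0] + 0.5 * lagged[:, 1] + 0.1 * rng.normal(size=n)

estimate = local_far_coefficients(y, lagged, u_obs, u0=0.5, h=0.2)
```

Repeating the call over a grid of `u0` values traces out the estimated coefficient functions; a single bandwidth `h` is shared by all of them.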

Gaussian Process Regression
• Random function $f$ follows Gaussian Process (GP) with mean function $\mu(\cdot)$ and covariance function $C(\cdot, \cdot)$, denoted by $f \sim GP(\mu, C)$, if $\forall n \in \mathbb{N}$
  $$[f(X_1), \ldots, f(X_n)]^\top \sim N_n(\boldsymbol{\mu}, \mathbf{C}), \qquad \boldsymbol{\mu} = [\mu(X_1), \ldots, \mu(X_n)]^\top, \qquad \mathbf{C} = \{ C(X_i, X_j) \}_{i,j=1}^{n}$$
• GP regression is a Bayesian nonparametric technique
  – GP prior on regression function $f \sim GP(\mu, C)$
  – Covariance function $C(\cdot, \cdot)$ controls smoothing
  – Data from $Y_i = f(X_i) + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ (conjugacy)
  – Posterior distribution of $f$ also follows GP
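As a small illustration of the conjugacy statement, the sketch below computes the GP posterior mean and covariance using the standard Gaussian conditioning formulas. The constant prior mean, the Gaussian kernel, the hyperparameter values and the function names are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(a, b, tau=1.0, h=1.0):
    """C(x, x') = tau^2 * exp(-(x - x')^2 / h^2) for 1-D inputs."""
    diff = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
    return tau ** 2 * np.exp(-diff ** 2 / h ** 2)

def gp_posterior(x_train, y_train, x_new, mu=0.0, sigma=0.1, tau=1.0, h=1.0):
    """Posterior mean and covariance of f at x_new under f ~ GP(mu, C)."""
    k_tt = gaussian_kernel(x_train, x_train, tau, h) + sigma ** 2 * np.eye(len(x_train))
    k_nt = gaussian_kernel(x_new, x_train, tau, h)
    k_nn = gaussian_kernel(x_new, x_new, tau, h)
    chol = np.linalg.cholesky(k_tt)
    # Solve with the Cholesky factor instead of forming an explicit inverse.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - mu))
    v = np.linalg.solve(chol, k_nt.T)
    mean = mu + k_nt @ alpha
    cov = k_nn - v.T @ v
    return mean, cov

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
# With almost no noise, the posterior mean nearly interpolates the data.
post_mean, post_cov = gp_posterior(x_train, y_train, x_train, sigma=1e-3)
```

The Cholesky factorisation of the n × n matrix is the cubic-cost step that dominates for long series.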

Proposed Method I
Adopt FAR dynamics, allow different arguments
$$X_t = f_1(U_t^{(1)}) X_{t-1} + \ldots + f_p(U_t^{(p)}) X_{t-p} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)$$
Prior Specification
• A-priori independent $f_i \sim GP(\mu_i, C_i)$ → different smoothness for each function
• Constant prior mean $\mu_i(x) \equiv \mu_i$ → prior bias toward linear model
• Gaussian covariance kernel $C_i(x, x') = \tau_i^2 \exp\left( -\frac{\| x - x' \|^2}{h_i^2} \right)$ → relatively smooth functions (infinitely differentiable)

Proposed Method II
• Estimation
  – Use "conditional likelihood", treating first $p$ observations as fixed
  – A-posteriori each function follows a GP
  – Likelihood-based estimation (on-line inference, relation to HMM)
• Prediction
  – Predictions follow naturally from functions' posterior
  – One-step-ahead predictive distributions available explicitly
  – Multi-step-ahead predictive distributions analytically intractable. Approximate by Monte-Carlo simulation
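One way to see why each coefficient function is a-posteriori a GP: once the lagged values are treated as fixed regressors, the conditional likelihood is Gaussian and linear in the function values, so standard GP conditioning applies. The sketch below computes the posterior means under that reading. It is an illustrative sketch, not necessarily the author's implementation; the function names, hyperparameters and synthetic data are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, tau, h):
    diff = a[:, None] - b[None, :]
    return tau ** 2 * np.exp(-diff ** 2 / h ** 2)

def far_gp_posterior_mean(y, lagged, args, grid, mu, tau, h, sigma):
    """Posterior mean of each coefficient function f_i on `grid`.

    y      : X_t for t = p+1..T, shape (n,)
    lagged : column i holds the regressor multiplying f_i, shape (n, p)
    args   : column i holds U_t^{(i)}, shape (n, p)
    mu, tau, h : length-p prior means, scales and bandwidths
    """
    n, p = lagged.shape
    cov_y = sigma ** 2 * np.eye(n)
    for i in range(p):
        k_i = gaussian_kernel(args[:, i], args[:, i], tau[i], h[i])
        # Covariance of f_i(U_t) X_{t-i} across t: D_i K_i D_i, D_i = diag(X_{t-i}).
        cov_y += lagged[:, [i]] * k_i * lagged[:, i]
    resid = y - lagged @ np.asarray(mu, dtype=float)
    weights = np.linalg.solve(cov_y, resid)
    means = []
    for i in range(p):
        k_star = gaussian_kernel(grid, args[:, i], tau[i], h[i])
        means.append(mu[i] + k_star @ (lagged[:, i] * weights))
    return np.column_stack(means)

# Synthetic data with f_1(u) = sin(u), f_2(u) = 0.5 and a common argument (hypothetical).
rng = np.random.default_rng(0)
n = 300
args = np.repeat(rng.uniform(-2.0, 2.0, size=(n, 1)), 2, axis=1)
lagged = rng.normal(size=(n, 2))
y = np.sin(args[:, 0]) * lagged[:, 0] + 0.5 * lagged[:, 1] + 0.1 * rng.normal(size=n)

grid = np.linspace(-1.5, 1.5, 7)
post = far_gp_posterior_mean(y, lagged, args, grid, mu=[0.0, 0.0],
                             tau=[1.0, 1.0], h=[1.0, 1.0], sigma=0.1)
```

A one-step-ahead predictive mean follows by evaluating each posterior mean at the next argument value and multiplying by the corresponding lagged observation.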

Proposed Method III
Empirical Bayes for choosing prior parameters $\theta$ of mean and covariance functions
• Distribute prior uncertainty evenly among functions
• Select $\theta$ that maximizes marginal log-likelihood $\ell(X_{(p+1):T} \mid \theta)$
  – Closed form expression for $\ell$
  – Gradient $\nabla_\theta \ell$ is available with little extra effort
  – Convenient for selecting many parameters
• Likelihood $\ell$ automatically penalizes function variability
  – Allow constant $f_i$ as bandwidth $h_i \to \infty$
• Initial values from AR model
  – Avoid bad local maxima
  – Favor constant functions
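The closed form of the marginal log-likelihood can be sketched as a Gaussian log-density whose covariance sums the contributions of the coefficient functions and the noise. This follows the usual GP marginal likelihood algebra with lagged values treated as fixed; the function name, parameter layout and synthetic data are assumptions for illustration.

```python
import numpy as np

def far_gp_log_marginal(y, lagged, args, mu, tau, h, sigma):
    """Log marginal likelihood with the coefficient functions integrated out."""
    n, p = lagged.shape
    cov_y = sigma ** 2 * np.eye(n)
    for i in range(p):
        diff = args[:, i][:, None] - args[:, i][None, :]
        k_i = tau[i] ** 2 * np.exp(-diff ** 2 / h[i] ** 2)
        cov_y += lagged[:, [i]] * k_i * lagged[:, i]    # D_i K_i D_i
    resid = y - lagged @ np.asarray(mu, dtype=float)
    chol = np.linalg.cholesky(cov_y)
    whitened = np.linalg.solve(chol, resid)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (whitened @ whitened + log_det + n * np.log(2.0 * np.pi))

# Data generated with a constant coefficient (hypothetical linear model).
rng = np.random.default_rng(0)
n = 200
args = rng.uniform(-2.0, 2.0, size=(n, 1))
lagged = rng.normal(size=(n, 1))
y = 0.5 * lagged[:, 0] + 0.1 * rng.normal(size=n)

ll_wiggly = far_gp_log_marginal(y, lagged, args, mu=[0.0], tau=[1.0], h=[0.1], sigma=0.1)
ll_flat = far_gp_log_marginal(y, lagged, args, mu=[0.0], tau=[1.0], h=[100.0], sigma=0.1)
```

For these linear data the very large bandwidth, which makes the coefficient function nearly constant, should score higher than the small one, illustrating how the likelihood penalizes unnecessary variability. Maximizing over the parameters would be done with a numerical optimizer, which is omitted here.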

Canadian Lynx Data I
Estimated functional coefficients for model
$$X_t = f_1(X_{t-2}) X_{t-1} + f_2(X_{t-2}) X_{t-2} + \epsilon_t$$
[Figure: estimated coefficient functions (a) $f_1$ and (b) $f_2$ for GP, TAR, LLR and SPLINES, plotted over argument values 2.0 to 3.5]

Canadian Lynx Data II
[Figure: fitted values for GP, TAR, LLR and SPLINES over observations 0 to 100]
• Different models give similar one-step-ahead predictions

Canadian Lynx Data III
[Figure: iterated dynamics, one panel each for GP, TAR, LLR and Splines, both axes from 2.0 to 3.5]
• Don't need much flexibility for capturing nonlinear behavior

Comments
• TAR
  – Common thresholds for all coefficient functions
  – Likelihood is discontinuous w.r.t. thresholds
    ∗ Complications as number of regimes increases
    ∗ Resort to graphical/ad-hoc methods
• Kernel methods
  – Common argument for all coefficient functions
  – Single bandwidth, similar smoothness across estimates
  – No regularization (boundary behavior, extrapolation)
• Splines
  – Extrapolate linearly
    ∗ Unbounded estimates
    ∗ Unstable models
  – Problem persists even with regularization

Approximate Inference I
• GP estimation is computationally expensive
  – Need to invert $T \times T$ covariance matrices
  – Scale as $O(T^3)$, infeasible for large $T$
• Reduced rank approximation
  – Posterior mean: $f_i(U) = \sum_{t=1}^{T} \beta_{i,t} C_i(U, U_t^{(i)})$
  – Approximation: $f_i(U) \approx \sum_{j=1}^{m_i} \beta_{i,j} C_i(U, B_j^{(i)})$, $m_i \ll T$
  – Basis points $(B_1^{(i)}, \ldots, B_{m_i}^{(i)})$, similar to Spline knots
  – Estimation scales as $O\left( \left( \sum_{i=1}^{p} m_i \right)^2 T \right)$
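The effect of restricting the posterior mean to a few basis points can be sketched for a single regression function. The sketch assumes a "subset of regressors" style prior on the weights, which the slides do not specify, and uses hypothetical names, data and settings; the FAR version would multiply each basis column by the matching lagged value.

```python
import numpy as np

def gaussian_kernel(a, b, tau=1.0, h=1.0):
    diff = a[:, None] - b[None, :]
    return tau ** 2 * np.exp(-diff ** 2 / h ** 2)

def reduced_rank_mean(u_obs, y, basis, grid, sigma=0.1, jitter=1e-8):
    """Posterior mean of f on `grid` using f(U) ~ sum_j beta_j C(U, B_j)."""
    m = len(basis)
    k_bb = gaussian_kernel(basis, basis) + jitter * np.eye(m)
    phi = gaussian_kernel(u_obs, basis)         # n x m design matrix
    # Penalised least squares: ||y - phi beta||^2 + sigma^2 beta' K_BB beta,
    # solved as one stacked least-squares problem for numerical stability.
    chol = np.linalg.cholesky(k_bb)
    stacked = np.vstack([phi, sigma * chol.T])
    target = np.concatenate([y, np.zeros(m)])
    beta, *_ = np.linalg.lstsq(stacked, target, rcond=None)
    return gaussian_kernel(grid, basis) @ beta

def exact_mean(u_obs, y, grid, sigma=0.1):
    """Exact zero-mean GP posterior mean, requiring an n x n solve."""
    k = gaussian_kernel(u_obs, u_obs) + sigma ** 2 * np.eye(len(u_obs))
    return gaussian_kernel(grid, u_obs) @ np.linalg.solve(k, y)

rng = np.random.default_rng(0)
u_obs = rng.uniform(-2.0, 2.0, size=200)
y = np.sin(u_obs) + 0.1 * rng.normal(size=200)
grid = np.linspace(-1.5, 1.5, 7)
basis = np.linspace(-2.0, 2.0, 10)      # m = 10 equally spaced basis points

approx = reduced_rank_mean(u_obs, y, basis, grid)
exact = exact_mean(u_obs, y, grid)
```

The reduced problem has only `m` unknowns, so the cost grows linearly in the number of observations rather than cubically.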

Approximate Inference II
Example on Canadian lynx data
[Figure: exact estimates of (a) $f_1$ and (b) $f_2$ compared with reduced rank approximations using m = 10, 5 and 3, plotted over argument values 2.0 to 3.5]
• Approximation works well for smooth functions

Approximate Inference III
• Implementation
  – Small $m_i$ (around 10) is sufficient
  – Modify kernel to ensure numerical stability
• Fixed number of stochastic parameters $\{\beta_{i,j}\}$
  – Low memory cost
  – Bigger models, e.g. multivariate
• Extend method to State-Space models
  – Treat $\{\beta_{i,j}\}$ as unobserved variables
  – Linearize model, apply Kalman filter

Theoretical Results
• Posterior means are solutions to penalized least squares problem in reproducing kernel Hilbert spaces $\mathcal{H}_i$ (defined by $C_i$)
  $$\min_{\{f_i \in \mathcal{H}_i\}} \; \frac{1}{\sigma^2} \sum_t \left( X_t - f_1(U_t^{(1)}) X_{t-1} - \ldots - f_p(U_t^{(p)}) X_{t-p} \right)^2 + \sum_i \| h_i \|^2_{\mathcal{H}_i}$$
• Consistency: $\| \hat{f}_i - f_i \|^2_{\mathcal{H}_i(C)} \to 0$ over compact $C$
  – Assume true functions $f_i \in \mathcal{H}_i$
  – Identifiability and ergodicity conditions
• Extend result to nonlinear time series regression
  $$Y_t = f_1(U_t^{(1)}) X_t^{(1)} + \ldots + f_p(U_t^{(p)}) X_t^{(p)} + \epsilon_t$$
• Approximate inference: estimates converge to appropriate projections of true functions $f_i$
