Penalty terms for estimation of ARMA models: A Bayesian inspiration - PowerPoint PPT Presentation

Penalty terms for estimation of ARMA models: A Bayesian inspiration ITISE Granada 2018 Helgi T´ omasson University of Iceland helgito@hi.is September 18-21, 2018

Plan of talk • A brief review of parameterization of ARMA time-series models • The role of the a prior distribution in the Bayesian estimation • Review of partial fractions and residue calculus • Implementation of smoothness priors in ARMA models • Exact calculation of the distance between spectral shapes • Implementation in R • Conclusion and discussion

On the ARMA model • A noise observation of a linear differential equation: y ′′ + ay ′ + by = 0 , non-stochastic y ′′ + ay ′ + by = a stochastic concept . • A classical discrete time version: Y t = φ 1 Y t − 1 + ε t + θ 1 ε t − 1 , ε t white noise • Or a continuous time version: Y ′ ( t ) + α Y ( t ) = σ ( dW ( t ) + β d (2) W ( t )) , dW ( t ) white-noise .

• A representation of a continous-time ARMA(p,q), CARMA(p,q) process in terms of the differential operator D is: Y ( p ) ( t ) + α 1 Y ( p − 1) ( t ) + · · · + α p Y ( t ) = σ d ( W ( t ) + β 1 W 1 ( t ) + · · · + β q W ( q )) ( t )) , or α ( D ) Y ( t ) = σ β ( D ) dW ( t ) , α ( z ) = z p + α 1 z p − 1 + · · · + α p , β ( z ) = 1 + β 1 z + · · · + β q z q . • The spectral density of Y ( t ) is a rational function: f ( ω ) = σ 2 β ( i ω ) β ( − i ω ) α ( i ω ) α ( − i ω ) . 2 π A key feature in this paper. Similar formulas apply for the ususal discrete-time ARMA models. Then the polynomials are in exp ( − i ω ).

• For stationarity we need the realpart of the roots of the polynomal α ( z ) to be negative. Similar to the discrete-time case where roots of a certain polynomial need to be on the correct side of the unit circle. • In continuous-time we also need p > q .

The role of the prior in Bayesian estimation • Bayesian inference about a parameter θ is based on the posterior-distribution which is proportional to the likelihood-function times the prior distribution. π ( θ | data ) ∝ likelihood(data | θ ) π ( θ ) � �� model prior • Then a Bayesian estimator can be calculated, e.g. by calculating the expected value, or the mode of posterior etc.

An example, the normal mean • A possible approach for Bayesian inference on µ in N ( µ, σ 2 ), σ known is: X | µ, σ ∼ N ( µ, σ 2 ) , µ | σ ∼ N ( µ 0 , σ 2 0 ) , σ 0 = τσ. Given data, x 1 , . . . , x n , and reparameterizing, v = 1 /σ , v 0 = k 0 v , the log-posterior is ( σ known): log ( p ( µ, v | x 1 , . . . , x n , µ 0 , k 0 )) = n constant + n + 1 � ( x i − µ ) 2 / 2 2 log( v ) − k 0 v ( µ − µ 0 ) 2 / 2 2 log( v ) − v . i =1 � �� B A

• The number k 0 expresses the certainty in the prior. If k 0 is set to zero and the log-posterior (as a function of µ ) is maximized the result is the ML estimator and a nonzero k 0 biases the ML-estimate towards the prior ( µ 0 ). • Maximization of the log-posterior can be interpreted as a penalized maximum-likelihood. • I.e. a deviance from µ 0 is penalized. • AIC, BIC, R 2 -adjusted are examples of penalizing terms. • The added term penalizes for a more complicated (less reasonable) model.

The role of parameters in ARMA models • The parameters in the polynomials α ( z ) and β ( z ) are auto-correlation parameters, and the parameter σ is a scale parameter. • If normality is assumed and the polynomials α ( z ) and β ( z ) were known inference about σ is similar to inference about σ in a normal model. E.g. a posterior like: gamma ( a + n / 2 , b + y ′ M ( α , β ) − 1 y / 2) . • It is very difficult to have a good intuition about the auto-correlation function. • The interpretation of the spectral density is easier and therefore perhaps more natural to express a prior on the parameters in the polynomials α ( z ) and β ( z ) based on properties of the spectral function. • One might e.g. state a prior on the smoothness of the spectral function or its closeness to a particular spectral function.

More than the number of parameters • The AIC, BIC and the R 2 -adjusted all penalize by using a function of the number of estimated parameters. The number of parameters is not always the natural way of grading complexity. In regression it seems reasonable that the model: y = a + bx + e , is simpler than: y = sin(cos( ax )) a exp( − bx ) / x b + e . • The ARMA(1,0) model: dY + Y = dW , , is actually the same as: Y (4) + 4 Y (3) + 6 Y (2) + 4 Y (1) + Y = d ( W + 3 W (1) + 3 W (2) + W (3) ) . That is the ARMA(1,0) is a special case of (many) ARMA(4,3) models. Estimation of six additional parameters might result in a spectral function with an unreasonable shape. However, it might be of interest to estimate a model which is more complicated than an AR(1). One might, however, want restrict the freedom of the additional parameters.

• In time-series analysis, just as in non-parametric regression a smoothness restriction may be enforced on the fitted values. That is the sharp spikes and turns are penalized. In economics a well known procedure of this type is the Hodrick-Prescott filter. In stationary time-series analysis a natural form of a priori information might consist of a specification of the spectral function or some features of the spectral function. In analogy with the Hodrick-Prescott filter one can introduce a term that penalizes for sharp spikes and turns, e.g., a term proportional to: � ∞ ( f ′′ ( ω )) 2 d ω. −∞ • One might also want that the estimated spectrum is close to a particular spectral function.

How to measure distance between functions? • Here the I only discuss the Kullback-Leibler distance measure. � ∞ log( f ( ω ) KLD ( f , f ∗ ) = f ∗ ( ω )) f ( ω ) d ω. −∞ • Here I use f ∗ because I do not work directly with the distance between two spectral curves, but only with the proportionality between two spectral curves: f ( ω ) = β ( i ω ) β ( − i ω ) α ( i ω ) α ( − i ω ) , f ∗ ( ω ) = σ ∗ β 0 ( i ω ) β 0 ( − i ω ) α 0 ( i ω ) α 0 ( − i ω ) , where σ ∗ is chosen such that, � ∞ � ∞ f ( ω ) d ω = f ∗ ( ω ) d ω. −∞ −∞

Some computational aspects • The fact that the spectral function of an ARMA model is rational allows for an exact calculation of integrals like: � ∞ ( f ′′ ( ω )) 2 d ω. −∞ • By use of partial fractions the function: � q j =1 (1 + µ 2 j ω 2 ) f ( ω ) = σ 2 α ( i ω ) α ( − i ω ) = σ 2 β ( i ω ) β ( − i ω ) j ) , � p j =1 ( ω 2 + λ 2 2 π 2 π can be written as: f ( ω ) = σ 2 a 1 a p b 1 b p 2 π ( ω − i λ 1 + · · · w − i λ p + ω + i λ 1 + · · · ω + i λ p ) , where λ j are the roots of the AR polynomial, α ( z ). Another way is: f ( ω ) = σ 2 c 1 c p 2 π ( + · · · + ) . ω 2 + λ 2 ω 2 + λ 2 p 1

Residue calculus • The residue calculus of complex analysis offers a useful tool for calculating integrals of rational functions. The residue theorem states that � � h ( x ) dx = 2 π i Res ( h ( z )) , over a certain path, where the sum is evaluated over the residues of the function h (Kreyszig, 1999). • f ′′ ( ω ) = σ 2 2 a 1 2 a p 2 b 1 2 b p 2 π ( ( ω − i λ 1 ) 3 + · · · ( w − i λ p ) 3 + ( ω + i λ 1 ) 3 + · · · ( ω + i λ p ) 3 ) , f ′′ ( ω ) 2 will contain p terms of the type a j / ( ω − i λ j ) 6 and p terms b k / ( ω + i λ k ) 6 and p ( p − 1) terms, k � = j , of the type a k a j / (( ω − i λ k ) 3 )( ω − i λ j ) 3 ) and similarly p ( p − 1) terms, k � = j , b k b j / (( ω + i λ j )( ω + i λ j )). The residues in the upper half-plane of these terms sums to zero. The integral will be the sum of the 2 p 2 terms of the type a k b j / (( ω − i λ k ) 3 ( ω + i λ j ) 3 ) .

The residues of these terms are of the form: 3 · 4 a k b j / ( − ( i λ k + i λ j ) 5 ) , and the integral therefore, � ∞ p p a k b j � � ( f ′′ ( ω )) 2 d ω = 2 π i · 2 3 · 4 − ( i λ k + i λ j ) 5 . −∞ k =1 j =1 Similarly one can use residue calculus to create of measure of steep hills in the spectrum, � ∞ ( f ′ ( ω )) 2 d ω, −∞ or weighing ( f ′ ( ω )) 2 or ( f ′′ ( ω )) 2 with a rational function. • I have checked this numerically. Dual roots will give somewhat more complicated formulas.

More partial fractions The partial fraction trick can also be applied to calculate the Kullback-Leibner (KL) metric, as a measure of the distance between two functions, f and f 0 (e.g. a prior). � � D ( f 1 ; f 0 ) = f 1 ( ω ) log( f 1 ( ω )) d ω − f 1 ( ω ) log( f 0 ( ω )) d ω. Using the second partial fraction formulation of the spectral density the termsthat need to be integrated will be of the form: c 1 , k c 1 , j 1 , k ) log( w 2 + λ 2 1 , k ) log(1 + µ 2 1 , j w 2 ) . − 1 , j ) , and ( ω 2 + λ 2 ( ω 2 + λ 2

Penalty terms for estimation of ARMA models: A Bayesian inspiration - PowerPoint PPT Presentation

Penalty terms for estimation of ARMA models: A Bayesian inspiration ITISE Granada 2018 Helgi T omasson University of Iceland helgito@hi.is September 18-21, 2018 Plan of talk A brief review of parameterization of ARMA time-series models

WRL Submission on Property Rights AMM 16-18 October 2019 P. Rogers What is ARMA? ARMA will

Lecture 6. Bayesian estimation Lecture 6. Bayesian estimation 1 (172) 6. Bayesian estimation

I 4 - Bayesian parameter estimation in a normal model STAT 587 (Engineering) Iowa State

Outline Stochastic processes Time Series Modelling and Kalman Filters AR, MA and ARMA

Penalty under s. 270A and Immunity from penalty under s. 270AA Jagdish T Punjabi May 04, 2019

Complexity of a quadratic penalty accelerated inexact proximal point method W. Kong 1 J.G. Melo 2

Lecture 8 AR, MA, and ARMA Models 9/27/2018 1 AR models AR(p) models We can generalize from

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Depth Based Procedures For Estimation ARMA and GARCH Models Daniel Kosiorowski Cracow University

Bayesian hierarchical models Bruno Nicenboim / Shravan Vasishth 2020-03-14 1 Bayesian

Bayesian Estimation of Low-rank Matrices Pierre Alquier Journes de Statistique du Sud,

Jisc-ARMA ORCID Pilot Project Progress Meeting 17/09/2014 Catherine Grout & Verena Weigert,

RETENTION SCHEDULE DEVELOPMENT AND ENFORCEMENT: ONE RECORDS MANAGERS SEVEN YEAR ODYSSEY ARMA

INVEST IN YOUR PROFESSION ARMA Region Leadership Conference 2019 w w w.armaedfoundation.org The

ARMA Professional Qualifications Marie Garnett Head of Professional Development Professional

Time-varying Combinations of Bayesian Dynamic Models and Equity Momentum Strategies Herman K. van

Bayesian Estimation of Autoregressive Moving-Average Processes as Exogenous Shock Processes in

Embedded Bayesian Perception & Risk Assessment for ADAS & Autonomous Cars Christian

cuML: A Library for GPU Accelerated Machine Learning Onur Yilmaz, Ph.D. | oyilmaz@nvidia.com |

Hierarchical Navigation Algorithms In Support of Mars Exploration Robert H. Bishop The

Mathematical Strategies for Filtering Turbulent Systems: Sparse Observations, Model Errors, and

Traffic Classification Rotsos Charalampos , Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani

A Bayesian Model of Pronoun Production and Interpretation Andrew Kehler UCSD Linguistics

Sambuz

Useful Links

Newsletter

Mail Us

Penalty terms for estimation of ARMA models: A Bayesian inspiration - PowerPoint PPT Presentation

Penalty terms for estimation of ARMA models: A Bayesian inspiration ITISE Granada 2018 Helgi T omasson University of Iceland helgito@hi.is September 18-21, 2018 Plan of talk A brief review of parameterization of ARMA time-series models

WRL Submission on Property Rights AMM 16-18 October 2019 P. Rogers What is ARMA? ARMA will

Lecture 6. Bayesian estimation Lecture 6. Bayesian estimation 1 (172) 6. Bayesian estimation

I 4 - Bayesian parameter estimation in a normal model STAT 587 (Engineering) Iowa State

Outline Stochastic processes Time Series Modelling and Kalman Filters AR, MA and ARMA

Penalty under s. 270A and Immunity from penalty under s. 270AA Jagdish T Punjabi May 04, 2019

Complexity of a quadratic penalty accelerated inexact proximal point method W. Kong 1 J.G. Melo 2

Lecture 8 AR, MA, and ARMA Models 9/27/2018 1 AR models AR(p) models We can generalize from

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Depth Based Procedures For Estimation ARMA and GARCH Models Daniel Kosiorowski Cracow University

Bayesian hierarchical models Bruno Nicenboim / Shravan Vasishth 2020-03-14 1 Bayesian

Bayesian Estimation of Low-rank Matrices Pierre Alquier Journes de Statistique du Sud,

Jisc-ARMA ORCID Pilot Project Progress Meeting 17/09/2014 Catherine Grout &amp; Verena Weigert,

RETENTION SCHEDULE DEVELOPMENT AND ENFORCEMENT: ONE RECORDS MANAGERS SEVEN YEAR ODYSSEY ARMA

INVEST IN YOUR PROFESSION ARMA Region Leadership Conference 2019 w w w.armaedfoundation.org The

ARMA Professional Qualifications Marie Garnett Head of Professional Development Professional

Time-varying Combinations of Bayesian Dynamic Models and Equity Momentum Strategies Herman K. van

Bayesian Estimation of Autoregressive Moving-Average Processes as Exogenous Shock Processes in

Embedded Bayesian Perception &amp; Risk Assessment for ADAS &amp; Autonomous Cars Christian

cuML: A Library for GPU Accelerated Machine Learning Onur Yilmaz, Ph.D. | oyilmaz@nvidia.com |

Hierarchical Navigation Algorithms In Support of Mars Exploration Robert H. Bishop The

Mathematical Strategies for Filtering Turbulent Systems: Sparse Observations, Model Errors, and

Traffic Classification Rotsos Charalampos , Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani

A Bayesian Model of Pronoun Production and Interpretation Andrew Kehler UCSD Linguistics

Sambuz

Useful Links

Newsletter

Mail Us

Jisc-ARMA ORCID Pilot Project Progress Meeting 17/09/2014 Catherine Grout & Verena Weigert,

Embedded Bayesian Perception & Risk Assessment for ADAS & Autonomous Cars Christian