penalty terms for estimation of arma models a bayesian
play

Penalty terms for estimation of ARMA models: A Bayesian inspiration - PowerPoint PPT Presentation

Penalty terms for estimation of ARMA models: A Bayesian inspiration ITISE Granada 2018 Helgi T omasson University of Iceland helgito@hi.is September 18-21, 2018 Plan of talk A brief review of parameterization of ARMA time-series models


  1. Penalty terms for estimation of ARMA models: A Bayesian inspiration ITISE Granada 2018 Helgi T´ omasson University of Iceland helgito@hi.is September 18-21, 2018

  2. Plan of talk • A brief review of parameterization of ARMA time-series models • The role of the a prior distribution in the Bayesian estimation • Review of partial fractions and residue calculus • Implementation of smoothness priors in ARMA models • Exact calculation of the distance between spectral shapes • Implementation in R • Conclusion and discussion

  3. On the ARMA model • A noise observation of a linear differential equation: y ′′ + ay ′ + by = 0 , non-stochastic y ′′ + ay ′ + by = a stochastic concept . • A classical discrete time version: Y t = φ 1 Y t − 1 + ε t + θ 1 ε t − 1 , ε t white noise • Or a continuous time version: Y ′ ( t ) + α Y ( t ) = σ ( dW ( t ) + β d (2) W ( t )) , dW ( t ) white-noise .

  4. • A representation of a continous-time ARMA(p,q), CARMA(p,q) process in terms of the differential operator D is: Y ( p ) ( t ) + α 1 Y ( p − 1) ( t ) + · · · + α p Y ( t ) = σ d ( W ( t ) + β 1 W 1 ( t ) + · · · + β q W ( q )) ( t )) , or α ( D ) Y ( t ) = σ β ( D ) dW ( t ) , α ( z ) = z p + α 1 z p − 1 + · · · + α p , β ( z ) = 1 + β 1 z + · · · + β q z q . • The spectral density of Y ( t ) is a rational function: f ( ω ) = σ 2 β ( i ω ) β ( − i ω ) α ( i ω ) α ( − i ω ) . 2 π A key feature in this paper. Similar formulas apply for the ususal discrete-time ARMA models. Then the polynomials are in exp ( − i ω ).

  5. • For stationarity we need the realpart of the roots of the polynomal α ( z ) to be negative. Similar to the discrete-time case where roots of a certain polynomial need to be on the correct side of the unit circle. • In continuous-time we also need p > q .

  6. The role of the prior in Bayesian estimation • Bayesian inference about a parameter θ is based on the posterior-distribution which is proportional to the likelihood-function times the prior distribution. π ( θ | data ) ∝ likelihood(data | θ ) π ( θ ) � �� � ���� model prior • Then a Bayesian estimator can be calculated, e.g. by calculating the expected value, or the mode of posterior etc.

  7. An example, the normal mean • A possible approach for Bayesian inference on µ in N ( µ, σ 2 ), σ known is: X | µ, σ ∼ N ( µ, σ 2 ) , µ | σ ∼ N ( µ 0 , σ 2 0 ) , σ 0 = τσ. Given data, x 1 , . . . , x n , and reparameterizing, v = 1 /σ , v 0 = k 0 v , the log-posterior is ( σ known): log ( p ( µ, v | x 1 , . . . , x n , µ 0 , k 0 )) = n constant + n + 1 � ( x i − µ ) 2 / 2 2 log( v ) − k 0 v ( µ − µ 0 ) 2 / 2 2 log( v ) − v . i =1 � �� � � �� � B A

  8. • The number k 0 expresses the certainty in the prior. If k 0 is set to zero and the log-posterior (as a function of µ ) is maximized the result is the ML estimator and a nonzero k 0 biases the ML-estimate towards the prior ( µ 0 ). • Maximization of the log-posterior can be interpreted as a penalized maximum-likelihood. • I.e. a deviance from µ 0 is penalized. • AIC, BIC, R 2 -adjusted are examples of penalizing terms. • The added term penalizes for a more complicated (less reasonable) model.

  9. The role of parameters in ARMA models • The parameters in the polynomials α ( z ) and β ( z ) are auto-correlation parameters, and the parameter σ is a scale parameter. • If normality is assumed and the polynomials α ( z ) and β ( z ) were known inference about σ is similar to inference about σ in a normal model. E.g. a posterior like: gamma ( a + n / 2 , b + y ′ M ( α , β ) − 1 y / 2) . • It is very difficult to have a good intuition about the auto-correlation function. • The interpretation of the spectral density is easier and therefore perhaps more natural to express a prior on the parameters in the polynomials α ( z ) and β ( z ) based on properties of the spectral function. • One might e.g. state a prior on the smoothness of the spectral function or its closeness to a particular spectral function.

  10. More than the number of parameters • The AIC, BIC and the R 2 -adjusted all penalize by using a function of the number of estimated parameters. The number of parameters is not always the natural way of grading complexity. In regression it seems reasonable that the model: y = a + bx + e , is simpler than: y = sin(cos( ax )) a exp( − bx ) / x b + e . • The ARMA(1,0) model: dY + Y = dW , , is actually the same as: Y (4) + 4 Y (3) + 6 Y (2) + 4 Y (1) + Y = d ( W + 3 W (1) + 3 W (2) + W (3) ) . That is the ARMA(1,0) is a special case of (many) ARMA(4,3) models. Estimation of six additional parameters might result in a spectral function with an unreasonable shape. However, it might be of interest to estimate a model which is more complicated than an AR(1). One might, however, want restrict the freedom of the additional parameters.

  11. • In time-series analysis, just as in non-parametric regression a smoothness restriction may be enforced on the fitted values. That is the sharp spikes and turns are penalized. In economics a well known procedure of this type is the Hodrick-Prescott filter. In stationary time-series analysis a natural form of a priori information might consist of a specification of the spectral function or some features of the spectral function. In analogy with the Hodrick-Prescott filter one can introduce a term that penalizes for sharp spikes and turns, e.g., a term proportional to: � ∞ ( f ′′ ( ω )) 2 d ω. −∞ • One might also want that the estimated spectrum is close to a particular spectral function.

  12. How to measure distance between functions? • Here the I only discuss the Kullback-Leibler distance measure. � ∞ log( f ( ω ) KLD ( f , f ∗ ) = f ∗ ( ω )) f ( ω ) d ω. −∞ • Here I use f ∗ because I do not work directly with the distance between two spectral curves, but only with the proportionality between two spectral curves: f ( ω ) = β ( i ω ) β ( − i ω ) α ( i ω ) α ( − i ω ) , f ∗ ( ω ) = σ ∗ β 0 ( i ω ) β 0 ( − i ω ) α 0 ( i ω ) α 0 ( − i ω ) , where σ ∗ is chosen such that, � ∞ � ∞ f ( ω ) d ω = f ∗ ( ω ) d ω. −∞ −∞

  13. Some computational aspects • The fact that the spectral function of an ARMA model is rational allows for an exact calculation of integrals like: � ∞ ( f ′′ ( ω )) 2 d ω. −∞ • By use of partial fractions the function: � q j =1 (1 + µ 2 j ω 2 ) f ( ω ) = σ 2 α ( i ω ) α ( − i ω ) = σ 2 β ( i ω ) β ( − i ω ) j ) , � p j =1 ( ω 2 + λ 2 2 π 2 π can be written as: f ( ω ) = σ 2 a 1 a p b 1 b p 2 π ( ω − i λ 1 + · · · w − i λ p + ω + i λ 1 + · · · ω + i λ p ) , where λ j are the roots of the AR polynomial, α ( z ). Another way is: f ( ω ) = σ 2 c 1 c p 2 π ( + · · · + ) . ω 2 + λ 2 ω 2 + λ 2 p 1

  14. Residue calculus • The residue calculus of complex analysis offers a useful tool for calculating integrals of rational functions. The residue theorem states that � � h ( x ) dx = 2 π i Res ( h ( z )) , over a certain path, where the sum is evaluated over the residues of the function h (Kreyszig, 1999). • f ′′ ( ω ) = σ 2 2 a 1 2 a p 2 b 1 2 b p 2 π ( ( ω − i λ 1 ) 3 + · · · ( w − i λ p ) 3 + ( ω + i λ 1 ) 3 + · · · ( ω + i λ p ) 3 ) , f ′′ ( ω ) 2 will contain p terms of the type a j / ( ω − i λ j ) 6 and p terms b k / ( ω + i λ k ) 6 and p ( p − 1) terms, k � = j , of the type a k a j / (( ω − i λ k ) 3 )( ω − i λ j ) 3 ) and similarly p ( p − 1) terms, k � = j , b k b j / (( ω + i λ j )( ω + i λ j )). The residues in the upper half-plane of these terms sums to zero. The integral will be the sum of the 2 p 2 terms of the type a k b j / (( ω − i λ k ) 3 ( ω + i λ j ) 3 ) .

  15. The residues of these terms are of the form: 3 · 4 a k b j / ( − ( i λ k + i λ j ) 5 ) , and the integral therefore, � ∞ p p a k b j � � ( f ′′ ( ω )) 2 d ω = 2 π i · 2 3 · 4 − ( i λ k + i λ j ) 5 . −∞ k =1 j =1 Similarly one can use residue calculus to create of measure of steep hills in the spectrum, � ∞ ( f ′ ( ω )) 2 d ω, −∞ or weighing ( f ′ ( ω )) 2 or ( f ′′ ( ω )) 2 with a rational function. • I have checked this numerically. Dual roots will give somewhat more complicated formulas.

  16. More partial fractions The partial fraction trick can also be applied to calculate the Kullback-Leibner (KL) metric, as a measure of the distance between two functions, f and f 0 (e.g. a prior). � � D ( f 1 ; f 0 ) = f 1 ( ω ) log( f 1 ( ω )) d ω − f 1 ( ω ) log( f 0 ( ω )) d ω. Using the second partial fraction formulation of the spectral density the termsthat need to be integrated will be of the form: c 1 , k c 1 , j 1 , k ) log( w 2 + λ 2 1 , k ) log(1 + µ 2 1 , j w 2 ) . − 1 , j ) , and ( ω 2 + λ 2 ( ω 2 + λ 2

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend