SLIDE 1 FMS161/MASM18 Financial Statistics Lecture 2, Linear Time Series
Erik Lindström
SLIDES 2–5 Systems with discrete time
Linear systems
◮ Can be represented in polynomial or state space model form while being
  ◮ causal
  ◮ time-invariant
  ◮ stationary
◮ Stability
  ◮ Lyapunov stable: ∥x(t) − xe∥ < ϵ
  ◮ Asymptotically stable: limt→∞ ∥x(t) − xe∥ = 0
◮ Discrete time models are written as a difference equation
◮ Impulse Response - h(s, t) or h(τ)
◮ Transfer Function - H(z)
◮ Frequency Function - H(e^{i2πf})
Typical process: the SARIMAX process
SLIDE 6
Impulse response
◮ A causal, linear, stable system (Gaussian or
non-Gaussian) has a well-defined impulse response h(·).
◮ The impulse response is the output of a system if
we let the input be 1 at time zero and then zero for the rest of the time.
◮ The output for a general input u is given as
y(t) = ∑_{i=0}^{∞} h(i)u(t − i) = (h ∗ u)(t)
It is the convolution of the input u and the impulse response h (a numerical sketch follows below).
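A minimal numerical sketch (not from the slides) of evaluating the convolution sum with NumPy; the impulse response h and input u are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical impulse response of a stable causal system: h(i) = 0.5^i
h = 0.5 ** np.arange(20)                            # truncated after 20 lags
u = np.random.default_rng(0).standard_normal(200)   # arbitrary input signal

# y(t) = sum_{i>=0} h(i) u(t - i): discrete convolution, truncated to len(u)
y = np.convolve(u, h)[:len(u)]
```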
SLIDE 7
Difference equations
Difference equation representation for the ARX/ARMA structure:
yt + a1yt−1 + · · · + apyt−p = ut + b1ut−1 + · · · + bqut−q
Using the delay operator, z^{−1}yt = yt−1, this leads to the transfer function (which can be defined also for a system not following this linear difference equation)
yt = [(1 + b1z^{−1} + · · · + bqz^{−q}) / (1 + a1z^{−1} + · · · + apz^{−p})] ut (1)
   = [B(z)/A(z)] ut = H(z) ut
with (the latter equation with a Z-transform interpretation of the operations)
H(z) = ∑_{τ=0}^{∞} h(τ)z^{−τ};   Y(z) = H(z)U(z)
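A hedged sketch of reading off the impulse response from H(z) = B(z)/A(z): the coefficients below are made up for illustration, and scipy's lfilter convention a = [1, a1, ...], b = [1, b1, ...] is assumed.

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical polynomials: A(z) = 1 - 0.7 z^{-1}, B(z) = 1 + 0.4 z^{-1}
a = [1.0, -0.7]
b = [1.0, 0.4]

# Feed a unit impulse through H(z) = B(z)/A(z) to obtain h(0), h(1), ...
impulse = np.zeros(30)
impulse[0] = 1.0
h = lfilter(b, a, impulse)
```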
SLIDE 8
Frequency representation
The frequency function is defined from the transfer function as
H(e^{i2πf}) = H(f)
giving an amplitude and phase shift of an input trigonometric signal, e.g.
u(k) = cos(2πfk)   ⇒   y(k) = |H(f)| cos(2πfk + arg H(f)),   |f| ≤ 0.5
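A small sketch (assuming scipy is available; the B and A coefficients are again illustrative) of evaluating the amplitude |H(f)| and phase arg H(f) on a frequency grid:

```python
import numpy as np
from scipy.signal import freqz

b, a = [1.0, 0.4], [1.0, -0.7]   # illustrative B(z), A(z)

# Evaluate H(e^{i 2 pi f}) for f in [0, 0.5]; freqz expects radians/sample
f = np.linspace(0.0, 0.5, 256)
_, H = freqz(b, a, worN=2 * np.pi * f)
amplitude, phase = np.abs(H), np.angle(H)
```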
SLIDE 9 Spectrum
◮ If we filter standard white noise, i.e. a sequence
of i.i.d. zero-mean random variables with
variance one, through a linear system with frequency function H, then we get a signal with spectrum R(f) = |H(f)|². The spectrum at frequency f is the average energy in the output at frequency f.
◮ The spectrum is also the Fourier transform of the covariance function γ(k) = E[XnXn−k], with
R(f) = ∑_{k=−∞}^{∞} γ(k)e^{−i2πfk} = [γ(·) is symmetric] = ∑_{k=−∞}^{∞} γ(k) cos(2πfk). (2)
Note: The covariance is not symmetric for multivariate processes (think Granger causality). A numerical sketch of R(f) follows below.
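Combining the two views, the spectrum of filtered unit-variance white noise can be computed as |H(f)|²; a sketch with the same illustrative filter as above:

```python
import numpy as np
from scipy.signal import freqz

b, a = [1.0, 0.4], [1.0, -0.7]   # illustrative filter from before
f = np.linspace(0.0, 0.5, 513)
_, H = freqz(b, a, worN=2 * np.pi * f)
R = np.abs(H) ** 2               # R(f) = |H(f)|^2 for standard white-noise input
```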
SLIDE 10
Inverse filtering in discrete time
AIM: Reconstruct the input u from the output y signal.
◮ Assume that we have a filter g which, like h, is linear, stable, and time-invariant. It then follows that
w(k) = (g ∗ y)(k) = (g ∗ h ∗ u)(k) (3)
◮ We say that g is an inverse if w(k) = u(k) for all k, or equivalently that there exists a causal and stable g such that
(g ∗ h)(k) = δ(k)   ⇔   G(z)H(z) = 1
NOTE: Causality means that we reconstruct the signal from old values only, i.e. g(k) = 0 ∀ k < 0 (sketched below).
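A minimal sketch of inverse filtering, assuming scipy: an invertible MA(1) filter is undone by the corresponding AR(1) filter, since G(z) = 1/H(z) is causal and stable when the zero of H lies inside the unit circle.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
u = rng.standard_normal(500)

# Filter u through an invertible MA(1): H(z) = 1 - 0.5 z^{-1}
y = lfilter([1.0, -0.5], [1.0], u)

# Inverse filter G(z) = 1 / (1 - 0.5 z^{-1}): causal and stable since |0.5| < 1
w = lfilter([1.0], [1.0, -0.5], y)

assert np.allclose(w, u)   # w reconstructs the input
```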
SLIDE 11 ARMA(p,q)-filter
◮ The process is defined as
yt + a1yt−1 + · · · + apyt−p = xt + c1xt−1 + · · · + cqxt−q
◮ The corresponding transfer function is given by
H(z) = (1 + c1z^{−1} + · · · + cqz^{−q}) / (1 + a1z^{−1} + · · · + apz^{−p}) = C(z)/A(z)
◮ Properties (see the root check sketched below):
  ◮ Frequency function: H(e^{i2πf}), f ∈ (−0.5, 0.5]
  ◮ Stability: the poles, i.e. the solutions πi of A(z^{−1}) = 0, satisfy
|πi| < 1, i = 1, . . . , p
  ◮ Invertibility: the zeroes, i.e. the solutions ηi of C(z^{−1}) = 0, satisfy
|ηi| < 1, i = 1, . . . , q
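A sketch of checking stability and invertibility numerically (coefficients assumed for illustration): with A(z) = 1 + a1 z^{−1} + · · · + ap z^{−p}, the poles are the roots of z^p + a1 z^{p−1} + · · · + ap, which np.roots computes directly.

```python
import numpy as np

# Hypothetical ARMA(2,1) coefficients
a = [1.0, -1.5, 0.7]   # 1, a1, a2
c = [1.0, -0.5]        # 1, c1

poles = np.roots(a)    # roots of z^2 + a1 z + a2
zeros = np.roots(c)    # roots of z + c1

stable     = np.all(np.abs(poles) < 1)   # all poles inside the unit circle
invertible = np.all(np.abs(zeros) < 1)   # all zeros inside the unit circle
```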
SLIDE 12
ARMAX process
We can combine moving average (MA) and exogenous variables in the ARMAX process
yt + a1yt−1 + · · · + apyt−p = xt + b1xt−1 + · · · + bqxt−q + et + c1et−1 + · · · + cret−r (4)
or, in transfer function form,
yt = [B(z)/A(z)] xt + [C(z)/A(z)] et
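A simulation sketch of an ARMAX process under assumed low-order coefficients, applying scipy's lfilter to each transfer function term:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)   # exogenous input (placeholder data)
e = rng.standard_normal(n)   # white noise

# y = B(z)/A(z) x + C(z)/A(z) e, with hypothetical coefficients
a = [1.0, -0.6]              # A(z)
b = [1.0, 0.3]               # B(z)
c = [1.0, 0.4]               # C(z)
y = lfilter(b, a, x) + lfilter(c, a, e)
```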
SLIDE 13
Autocorrelation and cross-correlation
◮ The autocovariance is defined as
γ(k) = E[YtYt+k] (5)
with corresponding autocorrelation function
ρ(k) = γ(k)/γ(0) (6)
◮ The cross-covariance is defined as
γXY(k) = E[XtYt+k] (7)
with corresponding cross-correlation function
ρXY(k) = γXY(k)/√(γX(0)γY(0)) (8)
Sample versions of these are sketched below.
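A minimal estimator sketch of mean-corrected sample versions of (5)–(8); the ACF is the special case x = y. Function name and data are illustrative.

```python
import numpy as np

def sample_ccf(x, y, max_lag):
    """Sample cross-correlation rho_xy(k) for k = 0..max_lag."""
    n = len(x)
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x**2) * np.sum(y**2))   # sqrt(gamma_x(0) gamma_y(0)) * n
    return np.array([np.sum(x[: n - k] * y[k:]) / denom
                     for k in range(max_lag + 1)])

# The ACF is obtained as sample_ccf(data, data, 20)
```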
SLIDE 14
Autocovariance for ARMA
◮ Consider the ARMA(p,q) process
Yt + a1Yt−1 + · · · + apYt−p = et + c1et−1 + · · · + cqet−q (9)
◮ The autocovariance then satisfies
γ(k) + a1γ(k−1) + · · · + apγ(k−p) = ckγeY(0) + · · · + cqγeY(q−k) (10)
This is known as the Yule-Walker equation.
◮ Proof: Multiply by Yt−k and use that
E[et−lYt−k] = 0 for k > l (a solver for the AR case is sketched below).
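For a pure AR(p) process (q = 0) the right-hand side of (10) vanishes for k ≥ 1, so the a-coefficients can be solved from estimated autocovariances; a minimal sketch:

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients a1..ap from the sample autocovariance."""
    x = x - x.mean()
    n = len(x)
    gamma = np.array([np.sum(x[: n - k] * x[k:]) / n for k in range(p + 1)])
    # gamma(k) + a1 gamma(k-1) + ... + ap gamma(k-p) = 0 for k = 1..p
    G = np.array([[gamma[abs(k - j)] for j in range(1, p + 1)]
                  for k in range(1, p + 1)])
    return np.linalg.solve(G, -gamma[1 : p + 1])
```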
SLIDE 15
Cointegration
◮ It is rather common that financial time series
{X(t)} are non-stationary, often integrated.
◮ This means that ∇X(t) is typically stationary. We
then say that X(t) is an integrated process, X(t) ∼ I(1).
◮ Assume that the processes X(t) ∼ I(1) and
Y(t) ∼ I(1), but X(t) − βY(t) ∼ I(0). We then say that X(t) and Y(t) are cointegrated (see the sketch below).
◮ NOTE: The asymptotic theory for β is
non-standard.
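A hedged sketch of a two-step (Engle-Granger style) cointegration check, assuming statsmodels is available; x and y are placeholders for the two I(1) series. Note that, in line with the slide's remark, standard ADF critical values are not strictly valid for residuals with an estimated β.

```python
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger(x, y):
    """Step 1: estimate beta by OLS; step 2: ADF test on the residual."""
    beta = sm.OLS(x, sm.add_constant(y)).fit().params[1]
    resid = x - beta * y
    pvalue = adfuller(resid)[1]   # caveat: nominal p-value only, since beta
                                  # is estimated (non-standard asymptotics)
    return beta, pvalue
```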
SLIDE 16 Log-real money and Bond rates 1974-1985
[Four panels, 1975–1985: Money (levels and differences) and Bond rate (levels and differences).]
Figure: Log-real money and interest rates
SLIDE 17
Estimation
Two dominant approaches:
◮ Optimization-based estimation (LS, WLS, ML,
PEM, GMM)
◮ Matching properties (MM, GMM, EF, ML, IV)
We focus on the optimization-based estimators today.
SLIDE 18
General properties
◮ Denote the true parameter θ0
◮ Introduce an estimator θ̂ = T(X1, . . . , XN)
◮ Observation: The estimator is a function of data. Implications...
◮ Bias: b = E[T(X)] − θ0
◮ Consistency: T(X) →p θ0
◮ Efficiency: Var(T(X)) ≥ IN(θ0)^{−1}, where
IN(θ0)ij = −E[∂²ℓ(X1, . . . , XN, θ)/∂θi∂θj]|θ=θ0 = Cov[∂ℓ/∂θi, ∂ℓ/∂θj]|θ=θ0
and ℓ is the log-likelihood function.
SLIDE 19
Estimators
The maximum likelihood estimator is defined as
θ̂MLE = arg max ℓ(θ) = arg max ∑_{n=1}^{N} log pθ(xn | x1, . . . , xn−1) (11)
The asymptotics for the MLE are given by
√N (θ̂MLE − θ0) →d N(0, I_F^{−1}). (12)
Hint: MLmax will help you during the labs and project (a generic numerical sketch follows below).
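As a hedged illustration of how such an estimator can be computed numerically (this is not the course's MLmax routine), one can minimize the negative conditional Gaussian log-likelihood of an AR(1) model with scipy:

```python
import numpy as np
from scipy.optimize import minimize

def ar1_negloglik(theta, x):
    """Negative conditional log-likelihood of x_t = a x_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    a, log_sigma = theta
    sigma2 = np.exp(2 * log_sigma)        # parametrized via log sigma for positivity
    resid = x[1:] - a * x[:-1]
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid**2 / sigma2)

# x: observed series (placeholder); x0 is an arbitrary starting point
# res = minimize(ar1_negloglik, x0=[0.0, 0.0], args=(x,))
# a_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```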
SLIDE 20
Estimators
The general so-called M-estimator (e.g. GMM) is defined as
θ̂ = arg min Q(θ) = arg min log Q(θ) (13)
The asymptotics for that estimator are given by
√N (θ̂ − θ0) →d N(0, J^{−1}IJ^{−1}) (14)
with
J = E[∇θ∇θ log Q] (15)
I = E[(∇θ log Q)(∇θ log Q)^T] (16)
SLIDE 21
ML methods for Gaussian processes
◮ Say that we have a sample
Y = {yt}, t = 1, 2, . . . , n, from a Gaussian process. We then have that Y ∈ N(µ(θ), Σ(θ)), where θ is a vector of parameters.
◮ The log-likelihood for Y can be written as
ℓ(Y, θ) = −(1/2) log(det(2πΣ(θ))) (17)
− (1/2) (Y − µ(θ))^T Σ(θ)^{−1} (Y − µ(θ)). (18)
If we can calculate the likelihood, then it follows that we can use a standard optimization routine to maximize ℓ(Y, θ) and thereby estimate θ (a numerical sketch follows below).
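A minimal sketch of evaluating (17)–(18) stably via a Cholesky factorization (function name and arguments are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglik(y, mu, Sigma):
    """log N(y; mu, Sigma), using log det(2 pi Sigma) = n log 2 pi + log det Sigma."""
    n = len(y)
    L, lower = cho_factor(Sigma, lower=True)
    r = y - mu
    quad = r @ cho_solve((L, lower), r)            # r^T Sigma^{-1} r
    logdet = 2 * np.sum(np.log(np.diag(L)))        # log det Sigma
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
```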
SLIDE 22
Example: AR(2)
Y = (x3, . . . , xN)^T,   X = ( x2   x1
                                x3   x2
                                ...  ...
                                xN−1 xN−2 )
Then
θ̂ = (X^T X)^{−1}(X^T Y)
with
X^T X = ( ∑ x_{i−1}²        ∑ x_{i−1}x_{i−2}
          ∑ x_{i−1}x_{i−2}   ∑ x_{i−2}² )
and
X^T Y = ( ∑ x_i x_{i−1}
          ∑ x_i x_{i−2} )
Solve! Explanation on the blackboard (a least-squares sketch follows below).
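A minimal least-squares sketch mirroring the slide's X and Y; note that with these regressors the fitted coefficients estimate (−a1, −a2) under the convention xt + a1xt−1 + a2xt−2 = et.

```python
import numpy as np

def ar2_ls(x):
    """Least-squares fit of an AR(2) model, with X and Y built as on the slide."""
    Y = x[2:]                                   # x_3, ..., x_N
    X = np.column_stack([x[1:-1], x[:-2]])      # rows (x_{t-1}, x_{t-2})
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return theta   # estimates (-a1, -a2)
```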
SLIDE 23 An ARMA example
ARMA(1,1) model: xt + 0.7xt−1 = et − 0.5et−1, {et}t=0,1,2,... i.i.d. ∈ N(0, 1).
[Three panels: Realisation (1000 samples), Covariance function, and Spectrum of the process.]
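The realisation can be reproduced in outline (a sketch; the seed is arbitrary) by filtering Gaussian noise through the slide's ARMA(1,1) filter:

```python
import numpy as np
from scipy.signal import lfilter

# x_t + 0.7 x_{t-1} = e_t - 0.5 e_{t-1}, e_t i.i.d. N(0, 1)
rng = np.random.default_rng(2)
e = rng.standard_normal(1000)
x = lfilter([1.0, -0.5], [1.0, 0.7], e)
```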
SLIDE 24
Explanation of the approximative methods on the blackboard:
◮ ML
◮ 2LS
SLIDE 25 Comparison, LS2 and MLE
ARMA(1,1) model: xt + 0.7xt−1 = et − 0.5et−1, {et}t=0,1,2,... i.i.d. ∈ N(0, 1).
[Four normal probability plots: LS2 and MLE estimates of a1 and c1.]
1000 estimations using 1000 observations each.
SLIDE 26 Extra material
Feel free to dig deeper into any of:
◮ Lindgren, G., Rootzén, H., & Sandsten, M. (2013).
Stationary Stochastic Processes for Scientists and Engineers. CRC Press.
◮ Jakobsson, A. (2015). An Introduction to Time
Series Modeling. Studentlitteratur AB.
◮ Madsen, H. (2007). Time Series Analysis. CRC
Press.
◮ (PhD level) Lindgren, G. (2012). Stationary
Stochastic Processes: Theory and Applications. CRC Press.