SLIDE 1

Simulated maximum likelihood for time series models

with nonlinear non-Gaussian observation equations: new results and an application

Siem Jan Koopman

Vrije Universiteit Amsterdam Tinbergen Institute

Simulated maximum likelihood for time series models – p. 1

SLIDE 2


Presentation for NSF/NBER Time Series Conference, Heidelberg, Germany, 22-24 Sept 2005 Joint work with Borus Jungbacker. http://staff.feweb.vu.nl/koopman

SLIDE 3

Motivation

  • Stochastic volatility models
    • Basic SV model;
    • SV plus Xs and microstructure noise;
    • Multivariate extensions of SV.
  • Credit risk and default modelling
    • Basic model;
    • Modelling defaults using a binomial panel time series model;
    • Modelling rating transitions using a dynamic event-history (stochastic durations) model with multiple states.
  • Trend-cycle decomposition models with asymmetric cycles.
  • Binary models for the fertility of age cohorts of women (longitudinal models).

SLIDE 4

High-frequency data (tick-by-tick)

Prices and returns of IBM, November 1, 2002, against seconds.

[Figure: two panels, 10:00–16:00. Top: price of IBM stock, 11/1/2002 (range 78.5–80.5). Bottom: log returns (range −0.25 to 0.50).]

SLIDE 5

An intra-day returns model

Jungbacker and Koopman (2005) consider a model for returns with stochastic volatility, intra-day seasonality and possible micro-structure noise, represented by the nonlinear state space model

Rt = exp{(ξ + g(t))/2} exp(ht/2) εt + σU Wt,
ht = φ ht−1 + ση ηt,    t = 1, . . . , n,

where g(t) is a smooth function that captures the intra-day seasonality. We note that

  • the log-volatility ht follows an AR(1) process, which can be generalised;
  • the noise Wt (incorporating micro-structure noise) follows an MA process but can be any linear stationary process;
  • we pursue a model-based approach using maximum likelihood estimation based on importance sampling (IS) techniques;
  • IS needs to be modified as we will present next.
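As a quick illustration, the model above can be simulated directly. The sketch below uses NumPy with illustrative parameter values (not the estimates reported later in these slides) and a hypothetical cosine stand-in for the smooth intra-day pattern g(t); the noise Wt is taken IID for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23400                          # one trading day of 1-second returns
phi, sigma_eta = 0.96, 0.25        # illustrative AR(1) log-volatility parameters
xi, sigma_u = -8.0, 0.005          # illustrative level and micro-structure noise scale
s = np.arange(n)
g = 0.3 * np.cos(2 * np.pi * s / n)    # hypothetical stand-in for the smooth g(t)

# h_t = phi * h_{t-1} + sigma_eta * eta_t
h = np.zeros(n)
eta = rng.standard_normal(n)
for t in range(1, n):
    h[t] = phi * h[t - 1] + sigma_eta * eta[t]

# R_t = exp{(xi + g(t))/2} exp(h_t/2) eps_t + sigma_u * W_t  (W_t IID here)
eps = rng.standard_normal(n)
W = rng.standard_normal(n)
R = np.exp((xi + g) / 2) * np.exp(h / 2) * eps + sigma_u * W
```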

SLIDE 6

Estimation methodology: importance sampling (IS)

This slide motivates the work behind this presentation. For most of the applications above, large amounts of data are at hand, so efficient estimation methods are needed; Bayesian methods such as MCMC and particle filtering tend to be quite slow in practice. On the other hand, IS does not always “work”. However, we have diagnostics to validate the effectiveness of the methodology. But when it works, it works, and it works fast. This research aims to widen the applicability of IS: to overcome various difficulties and to make it work even better. New derivations of the Kalman filter, smoother and simulation smoother have come as by-products!

SLIDE 7

Model for the state vector: linear Gaussian

The linear Gaussian state process is defined as follows. αt+1 = dt + Ttαt + ηt, ηt ∼ NID(0, Qt), t = 1, . . . , n, where system vector dt and system matrices Tt and Qt are fixed and known for t = 1, . . . , n. The initial state vector is normally distributed with mean a and variance matrix P, that is α1 ∼ N(a, P). The disturbances ηt (t = 1, . . . , n) are serially independent and are independent of the initial state vector. A vector stochastic process with these properties is defined as a linear Gaussian state process.
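A minimal sketch of drawing one path from this state process, assuming a time-invariant two-dimensional system with illustrative values for dt, Tt and Qt (none of these come from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 100, 2
d = np.zeros(q)                            # time-invariant system vector d_t
T = np.array([[0.9, 0.1],
              [0.0, 0.8]])                 # time-invariant T_t
Q = 0.1 * np.eye(q)                        # time-invariant Q_t
a, P = np.zeros(q), np.eye(q)              # alpha_1 ~ N(a, P)

# alpha_{t+1} = d_t + T_t alpha_t + eta_t, eta_t ~ NID(0, Q_t)
alpha = np.empty((n, q))
alpha[0] = rng.multivariate_normal(a, P)
for t in range(n - 1):
    alpha[t + 1] = d + T @ alpha[t] + rng.multivariate_normal(np.zeros(q), Q)
```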

SLIDE 8

Model for the state vector: linear Gaussian

The linear Gaussian state process can be expressed by the multivariate normal density α ∼ N(d, Ω), where α = (α1′, . . . , αn′)′ and

d = T (a′, d1′, . . . , dn−1′)′,    Ω = T diag(P, Q1, . . . , Qn−1) T′,

where T is the lower block unity triangular matrix whose (i, j) block equals Ti−1 Ti−2 · · · Tj for i > j and the identity for i = j. Further, we have

log p(α) = −(qn/2) log 2π − (1/2) log |Ω| − (1/2)(α − d)′ Ω⁻¹ (α − d).

SLIDE 9

Gaussian observation model

The linear Gaussian observation model for vector Yt is given by

Yt = ct + αt + εt,    εt ∼ NID(0, Ht),    t = 1, . . . , n,

where vector ct and matrix Ht are fixed and known for t = 1, . . . , n. For the data vector Y = (Y1′, . . . , Yn′)′, we have

Y | α ∼ N(c + α, H),

with c = (c1′, . . . , cn′)′ and block diagonal matrix H = diag(H1, . . . , Hn).

Since E(Y) = c + d, Var(Y) = Σ = Ω + H and Cov(α, Y) = Ω, it follows from a standard lemma for the multivariate normal density that the conditional mean and variance are given by

E(α|Y) = d + ΩΣ⁻¹(Y − c − d),    Var(α|Y) = Ω − ΩΣ⁻¹Ω.

SLIDE 10

Mean and mode estimation for Gaussian model

The Kalman filter and smoother evaluate the mean E(αt|Y) and variance Var(αt|Y) in a recursive and computationally efficient way, see Durbin and Koopman (2001). Since all densities are Gaussian, the mode of p(α|Y), denoted by α̂, is equal to the mean of p(α|Y). Applying a standard inversion lemma, it follows that

α̂ = d + ΩΣ⁻¹(Y − c − d),    with Σ = Ω + H,
α̂ = (Ω⁻¹ + H⁻¹)⁻¹ (H⁻¹(Y − c) + Ω⁻¹d).

It should be emphasized that the Kalman filter and smoother effectively compute α̂.
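The equivalence of the two expressions for α̂ can be checked numerically; the matrices below are illustrative random positive definite stand-ins for Ω and H, not quantities from the model:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
G = rng.standard_normal((m, m)); Omega = G @ G.T + m * np.eye(m)  # spd stand-in for Omega
B = rng.standard_normal((m, m)); H = B @ B.T + m * np.eye(m)      # spd stand-in for H
d, c, Y = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m)
Sigma = Omega + H

# First expression: d + Omega Sigma^{-1} (Y - c - d)
lhs = d + Omega @ np.linalg.solve(Sigma, Y - c - d)
# Second expression: (Omega^{-1} + H^{-1})^{-1} (H^{-1}(Y - c) + Omega^{-1} d)
M = np.linalg.inv(np.linalg.inv(Omega) + np.linalg.inv(H))
rhs = M @ (np.linalg.solve(H, Y - c) + np.linalg.solve(Omega, d))
assert np.allclose(lhs, rhs)
```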

SLIDE 11

The mode for nonlinear non-Gaussian observations

Consider a nonlinear non-Gaussian density p(Y|α) with α as before and with the independent channel assumption

p(Y|α) = ∏t=1..n p(Yt|αt).

The mode is obtained by maximising p(α|Y) numerically since an analytical solution is not available. The standard Newton-Raphson method is adopted: for a given guess g of the mode α̂, a new guess is

g⁺ = g − [p̈(α|Y)|α=g]⁻¹ ṗ(α|Y)|α=g,

where the step length is one and

ṗ(·|·) = ∂ log p(·|·) / ∂α,    p̈(·|·) = ∂² log p(·|·) / ∂α∂α′.

SLIDE 12

The mode for nonlinear non-Gaussian observations

Since log p(α|Y) = log p(Y|α) + log p(α) − log p(Y), we have

ṗ(α|Y) = ṗ(Y|α) − Ω⁻¹(α − d),    p̈(α|Y) = p̈(Y|α) − Ω⁻¹.

The independent channel assumption implies that p̈(Y|α) is block diagonal. Then,

g⁺ = g − [p̈(Y|α)|α=g − Ω⁻¹]⁻¹ [ṗ(Y|α)|α=g − Ω⁻¹(g − d)]
   = [Ω⁻¹ − p̈(Y|α)|α=g]⁻¹ [ṗ(Y|α)|α=g − p̈(Y|α)|α=g g + Ω⁻¹d]
   = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d),

where A = −[p̈(Y|α)|α=g]⁻¹ and x = g + A ṗ(Y|α)|α=g. Since the structures of the Gaussian mode estimator and the nonlinear non-Gaussian mode “steps” are similar, g⁺ is computed by the KFS applied with Y = x, c = 0, H = A.
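A dense-matrix sketch of this Newton-Raphson step on an illustrative non-Gaussian model, Poisson observations with a Gaussian AR(1) signal (this example is not from the slides; in practice the KFS replaces the O(n³) dense solves below):

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi, q = 30, 0.7, 0.1

# Prior alpha ~ N(d, Omega): stationary AR(1) covariances q/(1-phi^2) * phi^|s-t|
idx = np.arange(n)
Omega = (q / (1 - phi**2)) * phi ** np.abs(idx[:, None] - idx[None, :])
d = np.zeros(n)

# Illustrative observations: y_t | alpha_t ~ Poisson(exp(alpha_t))
alpha_true = rng.multivariate_normal(d, Omega)
y = rng.poisson(np.exp(alpha_true)).astype(float)

g = np.zeros(n)
for _ in range(50):
    A = np.diag(np.exp(-g))                 # A = -[d^2 log p(y|alpha)]^{-1}, diagonal here
    x = g + np.exp(-g) * (y - np.exp(g))    # x = g + A * (d log p(y|alpha))
    lhs = np.linalg.inv(Omega) + np.linalg.inv(A)
    g_new = np.linalg.solve(lhs, np.linalg.solve(A, x) + np.linalg.solve(Omega, d))
    if np.max(np.abs(g_new - g)) < 1e-10:
        g = g_new
        break
    g = g_new

# At the mode, the gradient of log p(alpha|y) vanishes
grad = y - np.exp(g) - np.linalg.solve(Omega, g - d)
```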

SLIDE 13

The mode for nonlinear non-Gaussian observations

To summarize,

  • For the Gaussian model Y = c + α + ε with ε ∼ N(0, H) and H block diagonal, the KFS computes
    α̂ = (Ω⁻¹ + H⁻¹)⁻¹ (H⁻¹(Y − c) + Ω⁻¹d).
  • For a non-Gaussian model, the mode is obtained numerically via Newton-Raphson, where at each step we compute, for a given g,
    g⁺ = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d),
    where A = −[p̈(Y|α)|α=g]⁻¹ and x = g + A ṗ(Y|α)|α=g.
  • By considering the Gaussian model with Y = x, c = 0, H = A, it follows that the KFS computes g⁺.

SLIDE 14

What if p̈(Y|α) is positive ???

The arguments used are only valid when the block elements of A are positive semi-definite (H = A is the variance matrix of the Gaussian observation model). Therefore, all block elements of p̈(Y|α) need to be negative definite. In case elements of p̈(Y|α) are positive, Theorem 1 of the paper claims that the KFS can still be used, although no appeal can be made to the linear Gaussian model. Nevertheless, the numerical algorithm for computing the next guess of the mode is equivalent to the KFS equations.

So the KFS can work with a “variance matrix” H that is negative definite. This sounds strange initially, but viewing the KFS as an operator for specially structured matrices does help.

SLIDE 15

Justification for KFS when p̈(Y|α) is psd

Proposition 1: Consider α ∼ N(d, Ω) as before and Σ = A + Ω for any block diagonal matrix A with appropriate dimensions. If the LU decomposition Σ = LU exists, with lower block unity triangular L and upper block triangular U, the forwards recursion

vt = xt − at,    Ft = At + Pt,    Kt = Tt Pt Ft⁻¹,    Lt = Tt − Kt,
at+1 = dt + Tt at + Kt vt,    Pt+1 = Tt Pt Lt′ + Qt∗,

solves the set of linear equations Lv = x for v with x given. This result holds even when At is negative definite for any t = 1, . . . , n. Although Ft is not necessarily pd, this algorithm is the same as the Kalman filter.

SLIDE 16

Justification for KFS when p̈(Y|α) is psd

Proposition 2: Given the definitions, condition and result in Proposition 1, the computation u = Σ⁻¹x is carried out by the backwards recursion

ut = Ft⁻¹ vt − Kt′ rt,    rt−1 = ut + Tt′ rt,

for t = n, n − 1, . . . , 1 with rn = 0. The algorithm is carried out after the forwards recursion of Proposition 1 and the storage of vt, Ft and Kt for t = 1, . . . , n.

SLIDE 17

Justification for KFS when p̈(Y|α) is psd

Proposition 3: Given the definitions, conditions and results in Propositions 1 and 2, the computation a∗ = d + ΩΣ⁻¹(x − d) is carried out by the forwards recursion

a∗t+1 = dt + Tt a∗t + Qt rt,    t = 1, . . . , n,

with a∗1 = a1 + P1 r0. The algorithm is carried out after the earlier recursions and the storage of rt for t = 0, . . . , n − 1. Again, it can be shown that

d + ΩΣ⁻¹(x − d) = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d).

The right-hand side is the computation of the next mode guess. We have therefore shown that the standard KFS can be used for finding the mode. The derivations of Propositions 1, 2 and 3 are in the paper.

SLIDE 18

Line search for finding mode

  • In current procedures for finding the mode using Newton-Raphson, a line search is not implemented.
  • Introducing a scalar 0 < λ ≤ 1 in the Newton-Raphson step gives
    g⁺λ = g − λ [p̈(α|Y)|α=g]⁻¹ ṗ(α|Y)|α=g,
    where the line search procedure consists of finding a λ such that
    p(α|Y)|α=g⁺λ > p(α|Y)|α=g.
  • Equivalently, we have
    g⁺λ = g + λ(g⁺ − g),
    where g⁺ is computed as before. Note that g⁺ = g⁺λ with λ = 1.
  • After calculation of g⁺ − g, the line search procedure for λ can start.
  • Note that the score ṗ(θ|y) can also be evaluated by the KFS, see the paper.
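A minimal backtracking sketch of this line search: halve λ until the objective improves. In practice `logpost`, `g` and `g_plus` would come from the Newton-Raphson mode iteration; here a toy quadratic objective stands in for log p(α|Y).

```python
import numpy as np

def line_search(logpost, g, g_plus, max_halvings=20):
    """Halve lambda until logpost(g + lambda*(g+ - g)) improves on logpost(g)."""
    delta = g_plus - g
    lam = 1.0                       # lambda = 1 recovers the plain Newton-Raphson step
    for _ in range(max_halvings):
        if logpost(g + lam * delta) > logpost(g):
            return g + lam * delta
        lam /= 2.0
    return g                        # no improving step length found

# Toy check with an overshooting Newton step on logpost(v) = -||v||^2
logpost = lambda v: -float(v @ v)
g = np.array([2.0, 2.0])
g_plus = np.array([-3.0, -3.0])            # overshoots the maximiser at the origin
g_new = line_search(logpost, g, g_plus)    # lambda = 1/2 gives (-0.5, -0.5)
```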

SLIDE 19

Simulation for linear Gaussian observation models

  • Simulation smoothing refers to drawing samples from pG(α|Y), where pG(α, Y) is implied by the linear Gaussian state space model.
  • From earlier results, it follows that pG(α|Y) = N(α̂, V) with
    α̂ = d + ΩΣ⁻¹(Y − d) = (Ω⁻¹ + H⁻¹)⁻¹ (Ω⁻¹d + H⁻¹Y),
    V = Ω − ΩΣ⁻¹Ω = (Ω⁻¹ + H⁻¹)⁻¹,
    where Σ = Ω + H.
  • The standard procedure for simulating from a multivariate normal distribution is to carry out a Cholesky decomposition V = LL′ and to compute
    α̃ = α̂ + Lu,    u ∼ N(0, I),
    such that α̃ ∼ pG(α|Y) = N(α̂, V).
  • Efficient algorithms for sampling from N(α̂, V) are developed by de Jong and Shephard (1995) and Durbin and Koopman (2002).
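For small n the Cholesky route can be written directly (the KFS-based samplers of de Jong-Shephard and Durbin-Koopman replace this O(n³) computation); the Ω and H below are illustrative random positive definite matrices, with c = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
G = rng.standard_normal((m, m)); Omega = G @ G.T + m * np.eye(m)   # illustrative spd Omega
B = rng.standard_normal((m, m)); H = B @ B.T + m * np.eye(m)       # illustrative spd H
d = np.zeros(m)
Y = rng.standard_normal(m)

V = np.linalg.inv(np.linalg.inv(Omega) + np.linalg.inv(H))   # V = (Omega^-1 + H^-1)^-1
alpha_hat = V @ (np.linalg.solve(Omega, d) + np.linalg.solve(H, Y))
L = np.linalg.cholesky(V)           # V = L L'
u = rng.standard_normal(m)
alpha_tilde = alpha_hat + L @ u     # one draw from p_G(alpha|Y) = N(alpha_hat, V)
```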

SLIDE 20

Sampling for nonlinear non-Gaussian models

  • Direct sampling from p(α|Y) is not possible since in most cases no analytical expression is available for p(α|Y).
  • Since the mode is computed via Newton-Raphson, for which a linear Gaussian model is considered at each step, we can also adopt this model at the mode of p(α|Y).
  • Then, by adopting the simulation smoothing methods of the previous slide, we effectively sample around the mode of p(α|Y) with a curvature implied by −p̈⁻¹(Y|α) at α = α̂.
  • It is anticipated that these samples from pG(α|Y) (yes, pG refers to the “mode” model) mimic samples from p(α|Y) for obvious reasons.
  • These ideas and concepts are used for computing the likelihood function p(y) via Monte Carlo integration.

SLIDE 21

Sampling for nonlinear non-Gaussian models

  • Sampling around the mode of p(α|Y) with a curvature implied by −p̈⁻¹(Y|α) at α = α̂ is done by considering pG(α, y) and applying the simulation smoothing method of de Jong and Shephard (1995) or of Durbin and Koopman (2002).
  • However, what to do if p̈(Y|α) is positive ???
  • Durbin and Koopman (2002) cannot be used since it requires unconditional sampling from pG(Y|α). Although this is simple, it is not possible with negative variances!
  • The proof of de Jong and Shephard (1995) is also based on observation densities that do not exist in this case.
  • A new algorithm (with derivation) is developed based on matrix algebra, slightly tedious.
  • It can be shown that the new algorithm is equivalent to the de Jong and Shephard (1995) algorithm. This method can therefore work with negative variances!

SLIDE 22

Theorem 2: simulation smoothing

Consider the linear Gaussian signal vector α ∼ N(µ, Ω). Further, consider x and A as defined earlier and evaluated at θ = θ̂. Both A and Σ = Ω + A can be nd while V = Ω − ΩΣ⁻¹Ω = A − AΣ⁻¹A is pd. Sampling from

N(α̂, A − AΣ⁻¹A)

is carried out by the Kalman filter and the simulation smoothing equations

Ct = At⁻¹ − Ft⁻¹ − Kt′ Nt Kt,
Rt = Ct⁻¹ (At⁻¹ − Kt′ Nt Tt),
wt ∼ N(0, Ct),
ut = At (wt + Ft⁻¹ vt − Kt′ rt),
rt−1 = At⁻¹ ut − Rt′ wt + Tt′ rt,
Nt−1 = Rt′ Ct Rt − At⁻¹ + Tt′ Nt Tt,

for t = n, n − 1, . . . , 1 and with the initialisations rn = 0 and Nn = 0. It can be shown that these equations are equivalent to the ones of de Jong and Shephard (1995). However, the equations here are computationally more efficient.

SLIDE 23

Computing likelihood function via importance sampling

  • IS is based on the simulation of αt given the observations Y.
  • Simulations can be used to evaluate the likelihood function and to estimate αt (signal extraction).
  • When the simulations are conditional on y, the samples are informative with respect to y.
  • A naive Monte Carlo estimator of the likelihood function p(y) = ∫ p(y|α) p(α) dα is based on the unconditional density:
    p̂(y) = M⁻¹ Σi=1..M p(y|αⁱ),    αⁱ ∼ p(α).
    This simple estimator is poor since many simulations will make no contribution to p(y|α).
  • Therefore a very large number of simulations M is needed to obtain only an inaccurate Monte Carlo estimate p̂(y).
  • We therefore use IS!
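The naive estimator can be illustrated on a toy model where p(y) is known in closed form, y|α ∼ N(α, I) and α ∼ N(0, I), so that p(y) = N(0, 2I). This toy example is not the SV model; it only shows that unconditional draws largely miss the region where p(y|α) is non-negligible.

```python
import numpy as np

rng = np.random.default_rng(5)
n, M = 10, 20000
y = np.sqrt(2.0) * rng.standard_normal(n)    # a draw from the true p(y) = N(0, 2I)

alphas = rng.standard_normal((M, n))         # alpha^i ~ p(alpha) = N(0, I)
# log p(y | alpha^i) for y | alpha ~ N(alpha, I)
logp = -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y - alphas) ** 2).sum(axis=1)
p_hat = np.exp(logp).mean()                  # naive estimator M^{-1} sum_i p(y|alpha^i)
p_true = np.exp(-0.5 * n * np.log(4 * np.pi) - 0.25 * y @ y)
rel_err = abs(p_hat - p_true) / p_true       # typically sizable even for this large M
```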

SLIDE 24

Computing likelihood function via importance sampling

  • Given draws αⁱ ∼ pG(α|y) for i = 1, . . . , M, an efficient Monte Carlo estimator of the likelihood is given by
    p̂(y) = M⁻¹ Σi=1..M p(y|αⁱ) p(αⁱ) / pG(αⁱ|y)
         = pG(y) M⁻¹ Σi=1..M p(y|αⁱ) / pG(y|αⁱ),
    where pG(α) = p(α) and pG(α|Y) is the approximating model based on the mode and the curvature −p̈⁻¹(Y|α).
  • In a similar way, we obtain estimates of α:
    ᾱ = ∫ α p(α|y) dα = p(y)⁻¹ ∫ α p(α, y) dα = p(y)⁻¹ ∫ α p(y|α) p(α) dα.
    The Monte Carlo estimator of ᾱ is given by
    ᾱ̂ = [Σi=1..M αⁱ p(y|αⁱ)/pG(y|αⁱ)] / [Σi=1..M p(y|αⁱ)/pG(y|αⁱ)].

SLIDE 25

An application: intra-day SV model

Jungbacker and Koopman (2005) consider a model for returns with stochastic volatility, intra-day seasonality and possible micro-structure noise, represented by the nonlinear state space model

Rt = exp{(ξ + g(t))/2} exp(ht/2) εt + σU Wt,    (1)
ht = φ ht−1 + ση ηt,    t = 1, . . . , n,    (2)

where g(t) is a smooth function that captures the intra-day seasonality. We note that

  • the log-volatility ht follows an AR(1) process, which can be generalised;
  • the noise Wt (incorporating micro-structure noise) follows an MA process but can be any linear stationary process;
  • we pursue a model-based approach using maximum likelihood estimation based on importance sampling (IS) techniques;
  • IS needs to be modified as discussed here.

SLIDE 26

IS for SV model with noise

  • For the case with micro-structure IID noise, the approximating model is based on
    Ht⁻¹ = −p̈t = (1/2)(bt − bt²) + (bt − 1/2) bt Rt²/at,    (3)
    Yt = ht + (1/2) Ht bt (Rt²/at − 1),    (4)
    where at = exp(ht) + σU² and bt = exp(ht)/at. We note that at > 0 and 0 < bt ≤ 1.
  • Ht > 0 can only be guaranteed when σU² < exp(ht), implying bt > 1/2: the micro-noise variation must be smaller than the volatility variation. This cannot be guaranteed.
  • Derivations are in the paper.
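The sign behaviour of Ht⁻¹ can be checked directly; the helper below is a sketch of equation (3) with hypothetical inputs (the parameter values are illustrative, not estimates):

```python
import numpy as np

def H_inv(h, R, sigma_u2):
    """Sketch of eq. (3): H_t^{-1} = (1/2)(b - b^2) + (b - 1/2) b R^2 / a."""
    a = np.exp(h) + sigma_u2       # a_t = exp(h_t) + sigma_U^2 > 0
    b = np.exp(h) / a              # 0 < b_t <= 1
    return 0.5 * (b - b * b) + (b - 0.5) * b * R * R / a

# b_t > 1/2 (noise variance below exp(h_t)): H_t^{-1} positive for any return
assert H_inv(h=0.0, R=10.0, sigma_u2=0.1) > 0
# b_t < 1/2 (noise variance dominates): a large return makes H_t^{-1} negative
assert H_inv(h=0.0, R=10.0, sigma_u2=3.0) < 0
```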

SLIDE 27

Estimating a model with SV for one day

More interesting from theoretical and empirical perspectives are the results for the returns model with SV and intra-daily seasonality. The parameter estimates are as follows.

For the model without micro-structure noise:
  • φ = 0.961, ση² = 0.0619, log σ = −7.977, γ2 = −1.654, γ3 = −1.135.

For the model with micro-structure noise:
  • σU² = 0.00003985 (σU = 0.00631), φ = 0.955, ση² = 0.0821, log σ = −8.033, γ2 = −1.629, γ3 = −1.065.

Although the micro-structure noise appears small, it has a big impact on the estimate of ση.

SLIDE 28

Estimated volatility for one day

(i) log-volatility, (ii) intra-daily effect, (iii) integrated volatility.

[Figure: three panels against intra-day time in seconds (1800–23400): (i) log-volatility, (ii) intra-daily effect, (iii) integrated volatility.]

SLIDE 29

Estimated volatility for one day

Estimates of log σt′ for an interval of 30 minutes.

[Figure: estimated log σt′ from 10:00 to 10:30; vertical axis −1.5 to 1.5.]

SLIDE 30

Estimated volatility for one day

Estimates of log σt′ for intervals of 5 minutes.

[Figure: four panels of estimated log σt′ over consecutive 5-minute intervals from 13:00 to 13:20; vertical axis −2 to 2.]

SLIDE 31

Estimates of daily volatility for 61 days

(i) RV, (ii) LL model; (iii) with seasonal; (iv) with seasonal & SV.

[Figure: four panels (i)–(iv) of daily volatility estimates over the 61 days.]

SLIDE 32

Estimates of daily volatility for 61 days

ACF: (i) RV, (ii) LL model; (iii) with seasonal; (iv) with seasonal & SV.

[Figure: four panels (i)–(iv) of autocorrelation functions up to lag 10.]
