SLIDE 1

Simulated maximum likelihood for time series models

with nonlinear non-Gaussian observation equations: new results and an application

Siem Jan Koopman

Vrije Universiteit Amsterdam Tinbergen Institute

Simulated maximum likelihood for time series models – p. 1

SLIDE 2


Presentation for NSF/NBER Time Series Conference, Heidelberg, Germany, 22-24 Sept 2005 Joint work with Borus Jungbacker. http://staff.feweb.vu.nl/koopman

SLIDE 3

Motivation

  • Stochastic volatility models
    • Basic SV model;
    • SV plus Xs and microstructure noise;
    • Multivariate extensions of SV.
  • Credit risk and default modelling
    • Basic model;
    • Modelling defaults using a binomial panel time series model;
    • Modelling rating transitions using a dynamic event-history (stochastic durations) model with multiple states.
  • Trend-cycle decomposition models with asymmetric cycles.
  • Binary models for the fertility of age cohorts of women (longitudinal models).

SLIDE 4

High-frequency data (tick-by-tick)

Prices and returns of IBM, November 1, 2002, against seconds.

[Figure: two panels, 10:00–16:00. Top: price of IBM stock, 11/1/2002 (range 78.5–80.5). Bottom: log returns (range −0.25 to 0.50).]

SLIDE 5

An intra-day returns model

Jungbacker and Koopman (2005) consider a model for returns with stochastic volatility, intra-day seasonality and possible micro-structure noise, represented by the nonlinear state space model

Rt = exp{(ξ + g(t))/2} exp(ht/2) εt + σU Wt,
ht = φ ht−1 + ση ηt,    t = 1, . . . , n,

where g(t) is a smooth function that captures the intra-day seasonality. We note that

  • the log-volatility ht follows an AR(1) process, which can be generalised;
  • the noise Wt (incorporating micro-structure noise) follows an MA process but can be any linear stationary process;
  • we pursue a model-based approach using maximum likelihood estimation based on importance sampling (IS) techniques;
  • IS needs to be modified as we will present next.
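As a quick illustration, the model above can be simulated directly. The sketch below uses NumPy with illustrative parameter values (not the estimates reported later in these slides) and a hypothetical cosine stand-in for the smooth intra-day pattern g(t); the noise Wt is taken IID for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23400                          # one trading day of 1-second returns
phi, sigma_eta = 0.96, 0.25        # illustrative AR(1) log-volatility parameters
xi, sigma_u = -8.0, 0.005          # illustrative level and micro-structure noise scale
s = np.arange(n)
g = 0.3 * np.cos(2 * np.pi * s / n)    # hypothetical stand-in for the smooth g(t)

# h_t = phi * h_{t-1} + sigma_eta * eta_t
h = np.zeros(n)
eta = rng.standard_normal(n)
for t in range(1, n):
    h[t] = phi * h[t - 1] + sigma_eta * eta[t]

# R_t = exp{(xi + g(t))/2} exp(h_t/2) eps_t + sigma_u * W_t  (W_t IID here)
eps = rng.standard_normal(n)
W = rng.standard_normal(n)
R = np.exp((xi + g) / 2) * np.exp(h / 2) * eps + sigma_u * W
```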

SLIDE 6

Estimation methodology: importance sampling (IS)

This slide motivates the work behind this presentation. For most of the applications above, large amounts of data are at hand, so efficient estimation methods are needed; Bayesian methods such as MCMC and particle filtering tend to be quite slow in practice. On the other hand, IS does not always “work”. However, we have diagnostics to validate the effectiveness of the methodology. But when it works, it works, and it works fast. This research aims to widen the applicability of IS: to overcome various difficulties and to make it work even better. New derivations of the Kalman filter, smoother and simulation smoother have come as by-products!

SLIDE 7

Model for the state vector: linear Gaussian

The linear Gaussian state process is defined as follows. αt+1 = dt + Ttαt + ηt, ηt ∼ NID(0, Qt), t = 1, . . . , n, where system vector dt and system matrices Tt and Qt are fixed and known for t = 1, . . . , n. The initial state vector is normally distributed with mean a and variance matrix P, that is α1 ∼ N(a, P). The disturbances ηt (t = 1, . . . , n) are serially independent and are independent of the initial state vector. A vector stochastic process with these properties is defined as a linear Gaussian state process.
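A minimal sketch of drawing one path from this state process, assuming a time-invariant two-dimensional system with illustrative values for dt, Tt and Qt (none of these come from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 100, 2
d = np.zeros(q)                            # time-invariant system vector d_t
T = np.array([[0.9, 0.1],
              [0.0, 0.8]])                 # time-invariant T_t
Q = 0.1 * np.eye(q)                        # time-invariant Q_t
a, P = np.zeros(q), np.eye(q)              # alpha_1 ~ N(a, P)

# alpha_{t+1} = d_t + T_t alpha_t + eta_t, eta_t ~ NID(0, Q_t)
alpha = np.empty((n, q))
alpha[0] = rng.multivariate_normal(a, P)
for t in range(n - 1):
    alpha[t + 1] = d + T @ alpha[t] + rng.multivariate_normal(np.zeros(q), Q)
```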

SLIDE 8

Model for the state vector: linear Gaussian

The linear Gaussian state process can be expressed by the multivariate normal density α ∼ N(d, Ω), where α = (α1′, . . . , αn′)′ and

d = T (a′, d1′, . . . , dn−1′)′,    Ω = T diag(P, Q1, . . . , Qn−1) T′,

where T is the lower block unity triangular matrix whose (i, j) block equals Ti−1 Ti−2 · · · Tj for i > j and the identity for i = j. Further, we have

log p(α) = −(qn/2) log 2π − (1/2) log |Ω| − (1/2)(α − d)′ Ω⁻¹ (α − d).

SLIDE 9

Gaussian observation model

The linear Gaussian observation model for vector Yt is given by

Yt = ct + αt + εt,    εt ∼ NID(0, Ht),    t = 1, . . . , n,

where vector ct and matrix Ht are fixed and known for t = 1, . . . , n. For the data vector Y = (Y1′, . . . , Yn′)′, we have

Y | α ∼ N(c + α, H),

with c = (c1′, . . . , cn′)′ and block diagonal matrix H = diag(H1, . . . , Hn).

Since E(Y) = c + d, Var(Y) = Σ = Ω + H and Cov(α, Y) = Ω, it follows from a standard lemma for the multivariate normal density that the conditional mean and variance are given by

E(α|Y) = d + ΩΣ⁻¹(Y − c − d),    Var(α|Y) = Ω − ΩΣ⁻¹Ω.

SLIDE 10

Mean and mode estimation for Gaussian model

The Kalman filter and smoother evaluate the mean E(αt|Y) and variance Var(αt|Y) in a recursive and computationally efficient way, see Durbin and Koopman (2001). Since all densities are Gaussian, the mode of p(α|Y), denoted by α̂, is equal to the mean of p(α|Y). Applying a standard inversion lemma, it follows that

α̂ = d + ΩΣ⁻¹(Y − c − d),    with Σ = Ω + H,
α̂ = (Ω⁻¹ + H⁻¹)⁻¹ (H⁻¹(Y − c) + Ω⁻¹d).

It should be emphasized that the Kalman filter and smoother effectively compute α̂.
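The equivalence of the two expressions for α̂ can be checked numerically; the matrices below are illustrative random positive definite stand-ins for Ω and H, not quantities from the model:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
G = rng.standard_normal((m, m)); Omega = G @ G.T + m * np.eye(m)  # spd stand-in for Omega
B = rng.standard_normal((m, m)); H = B @ B.T + m * np.eye(m)      # spd stand-in for H
d, c, Y = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m)
Sigma = Omega + H

# First expression: d + Omega Sigma^{-1} (Y - c - d)
lhs = d + Omega @ np.linalg.solve(Sigma, Y - c - d)
# Second expression: (Omega^{-1} + H^{-1})^{-1} (H^{-1}(Y - c) + Omega^{-1} d)
M = np.linalg.inv(np.linalg.inv(Omega) + np.linalg.inv(H))
rhs = M @ (np.linalg.solve(H, Y - c) + np.linalg.solve(Omega, d))
assert np.allclose(lhs, rhs)
```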

SLIDE 11

The mode for nonlinear non-Gaussian observations

Consider a nonlinear non-Gaussian density p(Y|α) with α as before and with the independent channel assumption

p(Y|α) = ∏t=1..n p(Yt|αt).

The mode is obtained by maximising p(α|Y) numerically since an analytical solution is not available. The standard Newton-Raphson method is adopted: for a given guess g of the mode α̂, a new guess is

g⁺ = g − [p̈(α|Y)|α=g]⁻¹ ṗ(α|Y)|α=g,

where the step length is one and

ṗ(·|·) = ∂ log p(·|·) / ∂α,    p̈(·|·) = ∂² log p(·|·) / ∂α∂α′.

SLIDE 12

The mode for nonlinear non-Gaussian observations

Since log p(α|Y) = log p(Y|α) + log p(α) − log p(Y), we have

ṗ(α|Y) = ṗ(Y|α) − Ω⁻¹(α − d),    p̈(α|Y) = p̈(Y|α) − Ω⁻¹.

The independent channel assumption implies that p̈(Y|α) is block diagonal. Then,

g⁺ = g − [p̈(Y|α)|α=g − Ω⁻¹]⁻¹ [ṗ(Y|α)|α=g − Ω⁻¹(g − d)]
   = [Ω⁻¹ − p̈(Y|α)|α=g]⁻¹ [ṗ(Y|α)|α=g − p̈(Y|α)|α=g g + Ω⁻¹d]
   = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d),

where A = −[p̈(Y|α)|α=g]⁻¹ and x = g + A ṗ(Y|α)|α=g. Since the structures of the Gaussian mode estimator and the nonlinear non-Gaussian mode “steps” are similar, g⁺ is computed by the KFS applied with Y = x, c = 0, H = A.
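A dense-matrix sketch of this Newton-Raphson step on an illustrative non-Gaussian model, Poisson observations with a Gaussian AR(1) signal (this example is not from the slides; in practice the KFS replaces the O(n³) dense solves below):

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi, q = 30, 0.7, 0.1

# Prior alpha ~ N(d, Omega): stationary AR(1) covariances q/(1-phi^2) * phi^|s-t|
idx = np.arange(n)
Omega = (q / (1 - phi**2)) * phi ** np.abs(idx[:, None] - idx[None, :])
d = np.zeros(n)

# Illustrative observations: y_t | alpha_t ~ Poisson(exp(alpha_t))
alpha_true = rng.multivariate_normal(d, Omega)
y = rng.poisson(np.exp(alpha_true)).astype(float)

g = np.zeros(n)
for _ in range(50):
    A = np.diag(np.exp(-g))                 # A = -[d^2 log p(y|alpha)]^{-1}, diagonal here
    x = g + np.exp(-g) * (y - np.exp(g))    # x = g + A * (d log p(y|alpha))
    lhs = np.linalg.inv(Omega) + np.linalg.inv(A)
    g_new = np.linalg.solve(lhs, np.linalg.solve(A, x) + np.linalg.solve(Omega, d))
    if np.max(np.abs(g_new - g)) < 1e-10:
        g = g_new
        break
    g = g_new

# At the mode, the gradient of log p(alpha|y) vanishes
grad = y - np.exp(g) - np.linalg.solve(Omega, g - d)
```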

SLIDE 13

The mode for nonlinear non-Gaussian observations

To summarize,

  • For the Gaussian model Y = c + α + ε with ε ∼ N(0, H) and H block diagonal, the KFS computes
    α̂ = (Ω⁻¹ + H⁻¹)⁻¹ (H⁻¹(Y − c) + Ω⁻¹d).
  • For a non-Gaussian model, the mode is obtained numerically via Newton-Raphson, where at each step we compute, for a given g,
    g⁺ = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d),
    where A = −[p̈(Y|α)|α=g]⁻¹ and x = g + A ṗ(Y|α)|α=g.
  • By considering the Gaussian model with Y = x, c = 0, H = A, it follows that the KFS computes g⁺.

SLIDE 14

What if p̈(Y|α) is positive ???

The arguments used are only valid when the block elements of A are positive semi-definite (H = A is the variance matrix of the Gaussian observation model). Therefore, all block elements of p̈(Y|α) need to be negative definite. In case elements of p̈(Y|α) are positive, Theorem 1 of the paper claims that the KFS can still be used, although no appeal can be made to the linear Gaussian model. Nevertheless, the numerical algorithm for computing the next guess of the mode is equivalent to the KFS equations.

So the KFS can work with a “variance matrix” H that is negative definite. This sounds strange initially, but viewing the KFS as an operator for specially structured matrices does help.

SLIDE 15

Justification for KFS when p̈(Y|α) is psd

Proposition 1: Consider α ∼ N(d, Ω) as before and Σ = A + Ω for any block diagonal matrix A with appropriate dimensions. If the LU decomposition Σ = LU exists, with lower block unity triangular L and upper block triangular U, the forwards recursion

vt = xt − at,    Ft = At + Pt,    Kt = Tt Pt Ft⁻¹,    Lt = Tt − Kt,
at+1 = dt + Tt at + Kt vt,    Pt+1 = Tt Pt Lt′ + Qt∗,

solves the set of linear equations Lv = x for v with x given. This result holds even when At is negative definite for any t = 1, . . . , n. Although Ft is not necessarily pd, this algorithm is the same as the Kalman filter.

SLIDE 16

Justification for KFS when p̈(Y|α) is psd

Proposition 2: Given the definitions, condition and result in Proposition 1, the computation u = Σ⁻¹x is carried out by the backwards recursion

ut = Ft⁻¹ vt − Kt′ rt,    rt−1 = ut + Tt′ rt,

for t = n, n − 1, . . . , 1 with rn = 0. The algorithm is carried out after the forwards recursion of Proposition 1 and the storage of vt, Ft and Kt for t = 1, . . . , n.

SLIDE 17

Justification for KFS when p̈(Y|α) is psd

Proposition 3: Given the definitions, conditions and results in Propositions 1 and 2, the computation a∗ = d + ΩΣ⁻¹(x − d) is carried out by the forwards recursion

a∗t+1 = dt + Tt a∗t + Qt rt,    t = 1, . . . , n,

with a∗1 = a1 + P1 r0. The algorithm is carried out after the earlier recursions and the storage of rt for t = 0, . . . , n − 1. Again, it can be shown that

d + ΩΣ⁻¹(x − d) = (Ω⁻¹ + A⁻¹)⁻¹ (A⁻¹x + Ω⁻¹d).

The right-hand side is the computation of the next mode guess. We have therefore shown that the standard KFS can be used for finding the mode. The derivations of Propositions 1, 2 and 3 are in the paper.

SLIDE 18

Line search for finding mode

  • In current procedures for finding the mode using Newton-Raphson, a line search is not implemented.
  • Introducing a scalar 0 < λ ≤ 1 in the Newton-Raphson step gives
    g⁺λ = g − λ [p̈(α|Y)|α=g]⁻¹ ṗ(α|Y)|α=g,
    where the line search procedure consists of finding a λ such that
    p(α|Y)|α=g⁺λ > p(α|Y)|α=g.
  • Equivalently, we have
    g⁺λ = g + λ(g⁺ − g),
    where g⁺ is computed as before. Note that g⁺ = g⁺λ with λ = 1.
  • After calculation of g⁺ − g, the line search procedure for λ can start.
  • Note that the score ṗ(θ|y) can also be evaluated by the KFS, see the paper.
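A minimal backtracking sketch of this line search: halve λ until the objective improves. In practice `logpost`, `g` and `g_plus` would come from the Newton-Raphson mode iteration; here a toy quadratic objective stands in for log p(α|Y).

```python
import numpy as np

def line_search(logpost, g, g_plus, max_halvings=20):
    """Halve lambda until logpost(g + lambda*(g+ - g)) improves on logpost(g)."""
    delta = g_plus - g
    lam = 1.0                       # lambda = 1 recovers the plain Newton-Raphson step
    for _ in range(max_halvings):
        if logpost(g + lam * delta) > logpost(g):
            return g + lam * delta
        lam /= 2.0
    return g                        # no improving step length found

# Toy check with an overshooting Newton step on logpost(v) = -||v||^2
logpost = lambda v: -float(v @ v)
g = np.array([2.0, 2.0])
g_plus = np.array([-3.0, -3.0])            # overshoots the maximiser at the origin
g_new = line_search(logpost, g, g_plus)    # lambda = 1/2 gives (-0.5, -0.5)
```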

SLIDE 19

Simulation for linear Gaussian observation models

  • Simulation smoothing refers to drawing samples from pG(α|Y), where pG(α, Y) is implied by the linear Gaussian state space model.
  • From earlier results, it follows that pG(α|Y) = N(α̂, V) with
    α̂ = d + ΩΣ⁻¹(Y − d) = (Ω⁻¹ + H⁻¹)⁻¹ (Ω⁻¹d + H⁻¹Y),
    V = Ω − ΩΣ⁻¹Ω = (Ω⁻¹ + H⁻¹)⁻¹,
    where Σ = Ω + H.
  • The standard procedure for simulating from a multivariate normal distribution is to carry out a Cholesky decomposition V = LL′ and to compute
    α̃ = α̂ + Lu,    u ∼ N(0, I),
    such that α̃ ∼ pG(α|Y) = N(α̂, V).
  • Efficient algorithms for sampling from N(α̂, V) are developed by de Jong and Shephard (1995) and Durbin and Koopman (2002).
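For small n the Cholesky route can be written directly (the KFS-based samplers of de Jong-Shephard and Durbin-Koopman replace this O(n³) computation); the Ω and H below are illustrative random positive definite matrices, with c = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
G = rng.standard_normal((m, m)); Omega = G @ G.T + m * np.eye(m)   # illustrative spd Omega
B = rng.standard_normal((m, m)); H = B @ B.T + m * np.eye(m)       # illustrative spd H
d = np.zeros(m)
Y = rng.standard_normal(m)

V = np.linalg.inv(np.linalg.inv(Omega) + np.linalg.inv(H))   # V = (Omega^-1 + H^-1)^-1
alpha_hat = V @ (np.linalg.solve(Omega, d) + np.linalg.solve(H, Y))
L = np.linalg.cholesky(V)           # V = L L'
u = rng.standard_normal(m)
alpha_tilde = alpha_hat + L @ u     # one draw from p_G(alpha|Y) = N(alpha_hat, V)
```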

SLIDE 20

Sampling for nonlinear non-Gaussian models

  • Direct sampling from p(α|Y) is not possible since in most cases no analytical expression is available for p(α|Y).
  • Since the mode is computed via Newton-Raphson, for which a linear Gaussian model is considered at each step, we can also adopt this model at the mode of p(α|Y).
  • Then, by adopting the simulation smoothing methods of the previous slide, we effectively sample around the mode of p(α|Y) with a curvature implied by −p̈⁻¹(Y|α) at α = α̂.
  • It is anticipated that these samples from pG(α|Y) (yes, pG refers to the “mode” model) mimic samples from p(α|Y) for obvious reasons.
  • These ideas and concepts are used for computing the likelihood function p(y) via Monte Carlo integration.

SLIDE 21

Sampling for nonlinear non-Gaussian models

  • Sampling around the mode of p(α|Y) with a curvature implied by −p̈⁻¹(Y|α) at α = α̂ is done by considering pG(α, y) and applying the simulation smoothing method of de Jong and Shephard (1995) or of Durbin and Koopman (2002).
  • However, what to do if p̈(Y|α) is positive ???
  • Durbin and Koopman (2002) cannot be used since it requires unconditional sampling from pG(Y|α). Although this is simple, it is not possible with negative variances!
  • The proof of de Jong and Shephard (1995) is also based on observation densities that do not exist in this case.
  • A new algorithm (with derivation) is developed based on matrix algebra, slightly tedious.
  • It can be shown that the new algorithm is equivalent to the de Jong and Shephard (1995) algorithm. This method can therefore work with negative variances!

SLIDE 22

Theorem 2: simulation smoothing

Consider the linear Gaussian signal vector α ∼ N(µ, Ω). Further, consider x and A as defined earlier and evaluated at θ = θ̂. Both A and Σ = Ω + A can be nd while V = Ω − ΩΣ⁻¹Ω = A − AΣ⁻¹A is pd. Sampling from

N(α̂, A − AΣ⁻¹A)

is carried out by the Kalman filter and the simulation smoothing equations

Ct = At⁻¹ − Ft⁻¹ − Kt′ Nt Kt,
Rt = Ct⁻¹ (At⁻¹ − Kt′ Nt Tt),
wt ∼ N(0, Ct),
ut = At (wt + Ft⁻¹ vt − Kt′ rt),
rt−1 = At⁻¹ ut − Rt′ wt + Tt′ rt,
Nt−1 = Rt′ Ct Rt − At⁻¹ + Tt′ Nt Tt,

for t = n, n − 1, . . . , 1 and with the initialisations rn = 0 and Nn = 0. It can be shown that these equations are equivalent to the ones of de Jong and Shephard (1995). However, the equations here are computationally more efficient.

SLIDE 23

Computing likelihood function via importance sampling

  • IS is based on the simulation of αt given the observations Y.
  • Simulations can be used to evaluate the likelihood function and to estimate αt (signal extraction).
  • When the simulations are conditional on y, the samples are informative with respect to y.
  • A naive Monte Carlo estimator of the likelihood function p(y) = ∫ p(y|α) p(α) dα is based on the unconditional density:
    p̂(y) = M⁻¹ Σi=1..M p(y|αⁱ),    αⁱ ∼ p(α).
    This simple estimator is poor since many simulations will make no contribution to p(y|α).
  • Therefore a very large number of simulations M is needed to obtain only an inaccurate Monte Carlo estimate p̂(y).
  • We therefore use IS!
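The naive estimator can be illustrated on a toy model where p(y) is known in closed form, y|α ∼ N(α, I) and α ∼ N(0, I), so that p(y) = N(0, 2I). This toy example is not the SV model; it only shows that unconditional draws largely miss the region where p(y|α) is non-negligible.

```python
import numpy as np

rng = np.random.default_rng(5)
n, M = 10, 20000
y = np.sqrt(2.0) * rng.standard_normal(n)    # a draw from the true p(y) = N(0, 2I)

alphas = rng.standard_normal((M, n))         # alpha^i ~ p(alpha) = N(0, I)
# log p(y | alpha^i) for y | alpha ~ N(alpha, I)
logp = -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y - alphas) ** 2).sum(axis=1)
p_hat = np.exp(logp).mean()                  # naive estimator M^{-1} sum_i p(y|alpha^i)
p_true = np.exp(-0.5 * n * np.log(4 * np.pi) - 0.25 * y @ y)
rel_err = abs(p_hat - p_true) / p_true       # typically sizable even for this large M
```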

SLIDE 24

Computing likelihood function via importance sampling

  • Given draws αⁱ ∼ pG(α|y) for i = 1, . . . , M, an efficient Monte Carlo estimator of the likelihood is given by
    p̂(y) = M⁻¹ Σi=1..M p(y|αⁱ) p(αⁱ) / pG(αⁱ|y)
         = pG(y) M⁻¹ Σi=1..M p(y|αⁱ) / pG(y|αⁱ),
    where pG(α) = p(α) and pG(α|Y) is the approximating model based on the mode and the curvature −p̈⁻¹(Y|α).
  • In a similar way, we obtain estimates of α:
    ᾱ = ∫ α p(α|y) dα = p(y)⁻¹ ∫ α p(α, y) dα = p(y)⁻¹ ∫ α p(y|α) p(α) dα.
    The Monte Carlo estimator of ᾱ is given by
    ᾱ̂ = [Σi=1..M αⁱ p(y|αⁱ)/pG(y|αⁱ)] / [Σi=1..M p(y|αⁱ)/pG(y|αⁱ)].

SLIDE 25

An application: intra-day SV model

Jungbacker and Koopman (2005) consider a model for returns with stochastic volatility, intra-day seasonality and possible micro-structure noise, represented by the nonlinear state space model

Rt = exp{(ξ + g(t))/2} exp(ht/2) εt + σU Wt,    (1)
ht = φ ht−1 + ση ηt,    t = 1, . . . , n,    (2)

where g(t) is a smooth function that captures the intra-day seasonality. We note that

  • the log-volatility ht follows an AR(1) process, which can be generalised;
  • the noise Wt (incorporating micro-structure noise) follows an MA process but can be any linear stationary process;
  • we pursue a model-based approach using maximum likelihood estimation based on importance sampling (IS) techniques;
  • IS needs to be modified as discussed here.

SLIDE 26

IS for SV model with noise

  • For the case with micro-structure IID noise, the approximating model is based on
    Ht⁻¹ = −p̈t = (1/2)(bt − bt²) + (bt − 1/2) bt Rt²/at,    (3)
    Yt = ht + (1/2) Ht bt (Rt²/at − 1),    (4)
    where at = exp(ht) + σU² and bt = exp(ht)/at. We note that at > 0 and 0 < bt ≤ 1.
  • Ht > 0 can only be guaranteed when σU² < exp(ht), implying bt > 1/2: the micro-noise variation must be smaller than the volatility variation. This cannot be guaranteed.
  • Derivations are in the paper.
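The sign behaviour of Ht⁻¹ can be checked directly; the helper below is a sketch of equation (3) with hypothetical inputs (the parameter values are illustrative, not estimates):

```python
import numpy as np

def H_inv(h, R, sigma_u2):
    """Sketch of eq. (3): H_t^{-1} = (1/2)(b - b^2) + (b - 1/2) b R^2 / a."""
    a = np.exp(h) + sigma_u2       # a_t = exp(h_t) + sigma_U^2 > 0
    b = np.exp(h) / a              # 0 < b_t <= 1
    return 0.5 * (b - b * b) + (b - 0.5) * b * R * R / a

# b_t > 1/2 (noise variance below exp(h_t)): H_t^{-1} positive for any return
assert H_inv(h=0.0, R=10.0, sigma_u2=0.1) > 0
# b_t < 1/2 (noise variance dominates): a large return makes H_t^{-1} negative
assert H_inv(h=0.0, R=10.0, sigma_u2=3.0) < 0
```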

SLIDE 27

Estimating a model with SV for one day

More interesting from theoretical and empirical perspectives are the results for the returns model with SV and intra-daily seasonality. The parameter estimates are as follows.

For the model without micro-structure noise:
  • φ = 0.961, ση² = 0.0619, log σ = −7.977, γ2 = −1.654, γ3 = −1.135.

For the model with micro-structure noise:
  • σU² = 0.00003985 (σU = 0.00631), φ = 0.955, ση² = 0.0821, log σ = −8.033, γ2 = −1.629, γ3 = −1.065.

Although the micro-structure noise appears small, it has a big impact on the estimate of ση.

SLIDE 28

Estimated volatility for one day

(i) log-volatility, (ii) intra-daily effect, (iii) integrated volatility.

[Figure: three panels against intra-day time in seconds (1800–23400): (i) log-volatility, (ii) intra-daily effect, (iii) integrated volatility.]

SLIDE 29

Estimated volatility for one day

Estimates of log σt′ for an interval of 30 minutes.

[Figure: estimated log σt′ from 10:00 to 10:30; vertical axis −1.5 to 1.5.]

SLIDE 30

Estimated volatility for one day

Estimates of log σt′ for intervals of 5 minutes.

[Figure: four panels of estimated log σt′ over consecutive 5-minute intervals from 13:00 to 13:20; vertical axis −2 to 2.]

SLIDE 31

Estimates of daily volatility for 61 days

(i) RV, (ii) LL model; (iii) with seasonal; (iv) with seasonal & SV.

[Figure: four panels (i)–(iv) of daily volatility estimates over the 61 days.]

SLIDE 32

Estimates of daily volatility for 61 days

ACF: (i) RV, (ii) LL model; (iii) with seasonal; (iv) with seasonal & SV.

[Figure: four panels (i)–(iv) of autocorrelation functions up to lag 10.]
