SLIDE 1 Multivariate and Partially Observed Models
Erik Lindström
SLIDE 4
Briefly on multivariate models
Consider the Vector-AR(1) (VAR) process:
X_{n+1} = A X_n + \varepsilon_{n+1}, \quad \varepsilon \sim MVN(0, \Sigma) (1)
Can we estimate the parameters? Yes, with a bit of matrix algebra tricks. Writing down the log-likelihood... [checking the Matrix Cookbook] leads to
\hat{A} = \Big( \sum_{n=1}^{N-1} X_{n+1} X_n^T \Big) \Big( \sum_{n=1}^{N-1} X_n X_n^T \Big)^{-1} (2)
\hat{\Sigma} = \frac{1}{N-1} \sum_{n=1}^{N-1} \hat{\varepsilon}_n \hat{\varepsilon}_n^T (3)
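To make the matrix algebra concrete, here is a minimal numerical sketch (Python/NumPy is an assumption here, not the course's tooling) that simulates a VAR(1) process and recovers A and Σ via eqs. (2)-(3):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-dimensional VAR(1) process: X_{n+1} = A X_n + eps_{n+1}
A_true = np.array([[0.8, 0.1],
                   [0.0, 0.5]])
Sigma_true = np.array([[0.5, 0.1],
                       [0.1, 0.3]])
N = 10_000
X = np.zeros((N, 2))
L = np.linalg.cholesky(Sigma_true)
for n in range(N - 1):
    X[n + 1] = A_true @ X[n] + L @ rng.standard_normal(2)

# Closed-form ML estimates, eqs. (2)-(3)
X0, X1 = X[:-1], X[1:]                     # X_n and X_{n+1}
A_hat = (X1.T @ X0) @ np.linalg.inv(X0.T @ X0)
resid = X1 - X0 @ A_hat.T                  # eps_hat_n = X_{n+1} - A_hat X_n
Sigma_hat = resid.T @ resid / (N - 1)
print(A_hat, Sigma_hat, sep="\n")
```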
SLIDE 9
Motivation, partially observed models
◮ Used when regressors are unobservable
◮ E.g. when the regressor dimension is larger than the observable state dimension (think stoch. vol.)
◮ or interest rate models
◮ or credit models (hidden jump intensity process)
◮ Missing observations can be treated in this framework.
SLIDE 11
Examples
◮ Stochastic volatility
y_t = \sigma_t \eta_t (4)
\log \sigma_t^2 = a_0 + a_1 \log \sigma_{t-1}^2 + e_t (5)
◮ Short rate models
dr_t = \alpha(\beta - r_t)\,dt + \sqrt{\gamma + \delta r_t}\,dW_t (6)
P(t, T) = A(t, T)\, e^{-B(t,T) r_t} (7)
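The bond price formula (7) is explicit once the affine coefficients are known. As an illustration, a sketch for the Vasicek special case of (6) (δ = 0, so the diffusion is a constant σ = √γ), where A and B have classical closed forms:

```python
import numpy as np

def vasicek_bond_price(r_t, tau, alpha, beta, sigma):
    """Zero-coupon bond price P(t,T) = A(t,T) exp(-B(t,T) r_t) in the
    Vasicek model dr = alpha*(beta - r) dt + sigma dW, with tau = T - t."""
    B = (1.0 - np.exp(-alpha * tau)) / alpha
    A = np.exp((B - tau) * (alpha**2 * beta - sigma**2 / 2) / alpha**2
               - sigma**2 * B**2 / (4 * alpha))
    return A * np.exp(-B * r_t)

# Illustrative parameter values, not from the slides
print(vasicek_bond_price(r_t=0.03, tau=5.0, alpha=0.5, beta=0.04, sigma=0.01))
```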
SLIDE 15
Stoch vol.
Let us start with the stoch vol. model:
y_t = \sigma_t \eta_t (8)
\log \sigma_t^2 = a_0 + a_1 \log \sigma_{t-1}^2 + e_t (9)
◮ \sigma_t^2 is not directly observable
◮ but can be estimated.
◮ Likelihood p(y_1, \ldots, y_T) = \int p(\sigma_1, y_1, \ldots, \sigma_T, y_T) \, d\sigma_{1:T}?
◮ Dependence structure?
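A short simulation sketch of (8)-(9) makes the structure visible (parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the log-variance AR(1) and the observations, eqs. (8)-(9)
a0, a1, sigma_e = -0.5, 0.95, 0.2   # illustrative values
T = 1_000
log_s2 = np.zeros(T)
log_s2[0] = a0 / (1 - a1)           # start in the stationary mean
for t in range(1, T):
    log_s2[t] = a0 + a1 * log_s2[t - 1] + sigma_e * rng.standard_normal()
y = np.exp(log_s2 / 2) * rng.standard_normal(T)   # y_t = sigma_t * eta_t
```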
SLIDE 16
General state space models
All models we use can be written in general state space form:
y_t = h(x_t, \eta_t) (10)
x_t = f(x_{t-1}, e_t) (11)
◮ x is a hidden (unobservable) Markov process (cf. HMM)
◮ y is observed.
◮ y_t|x_t is independent of y_s, s = 1..t-1, t+1..T.
◮ These rather simple structures can generate complex models!
SLIDE 17
Structure
All models we use can be written in state space form:
y_t = h(x_t, \eta_t) (12)
x_t = f(x_{t-1}, e_t) (13)
◮ These equations imply transition probabilities, i.e. we can derive p(x_t|x_{t-1}) and p(y_t|x_t) from the model setup.
◮ We also need p(x_0), i.e. initial conditions.
Likelihood
The likelihood can be written as p(y1, . . . , yT) = p(y1)
T
∏
t=2
p(yt|y1:t−1), where y1:t−1 is shorthand notation for {y1, . . . , yt−1}. We can write p(yt|y1:t−1) = ∫ p(yt|xt)p(xt|y1:t−1)dxt and p(xt|y1:t−1) = ∫ p(xt|xt−1)p(xt−1|y1:t−1)dxt−1
SLIDE 21 Filter density
◮ The density for the hidden state x_t, using the information y_{1:t}, is called the filter density, p(x_t|y_{1:t}).
◮ We can derive the filter density from
p(x_t|y_{1:t}) = \frac{p(y_t|x_t) \, p(x_t|y_{1:t-1})}{p(y_t|y_{1:t-1})} = \frac{p(y_t|x_t) \, p(x_t|y_{1:t-1})}{\int p(y_t|x_t) \, p(x_t|y_{1:t-1}) \, dx_t}.
SLIDE 23
Predictive density
◮ The density for the hidden state x_t, using the information y_{1:t-1}, is called the predictive density, p(x_t|y_{1:t-1}).
◮ We can derive the predictive density from
p(x_t|y_{1:t-1}) = \int p(x_t|x_{t-1}) \, p(x_{t-1}|y_{1:t-1}) \, dx_{t-1}.
SLIDE 24 Recursion
1. We have the filter density p(x_0) at time 0.
2. At time t, generate the predictive density p(x_{t+1}|y_{1:t}).
3. At time t + 1, calculate p(y_{t+1}|y_{1:t}) and update the filter density p(x_{t+1}|y_{1:t+1}). Repeat from step 2.
SLIDE 25
Working recursions
Essentially there are only two recursions known in closed form:
◮ HMM (finite state space)
◮ Kalman filter (linear, Gaussian models)
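For the finite state space case the recursion reduces to matrix-vector arithmetic. A minimal sketch (the two-state example at the end is purely illustrative):

```python
import numpy as np

def hmm_filter(y, P_trans, emission, p0):
    """Exact filter recursion on a finite state space.
    P_trans[i, j] = p(x_t = j | x_{t-1} = i),
    emission(y_t)[j] = p(y_t | x_t = j)."""
    alpha = p0.copy()
    loglik = 0.0
    for y_t in y:
        alpha = alpha @ P_trans            # predictive density p(x_t | y_{1:t-1})
        alpha *= emission(y_t)             # times likelihood p(y_t | x_t)
        c = alpha.sum()                    # normalization = p(y_t | y_{1:t-1})
        loglik += np.log(c)
        alpha /= c                         # filter density p(x_t | y_{1:t})
    return alpha, loglik

# Two-state example with Gaussian emissions of different means
means = np.array([-1.0, 1.0])
emission = lambda y_t: np.exp(-0.5 * (y_t - means)**2) / np.sqrt(2 * np.pi)
P = np.array([[0.95, 0.05], [0.10, 0.90]])
alpha, ll = hmm_filter(np.array([0.3, 1.2, -0.8]), P, emission, np.array([0.5, 0.5]))
```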
SLIDE 27
Kalman filter
Why does it give closed form recursions? Short answer: the Gaussian density is the exponential of a second order polynomial.
Model:
Y_t = C X_t + \eta_t, \quad \eta_t \in N(0, \Gamma)
X_t = A X_{t-1} + e_t, \quad e_t \in N(0, \Sigma)
[1] Assume initial distribution p(x_0|\mathcal{F}_0) = \phi(x_0; m_0, P_0).
SLIDE 29 Kalman filter
[2] Calculate the predictive density
p(x_1|\mathcal{F}_0) = \int p(x_1|x_0) \, p(x_0|\mathcal{F}_0) \, dx_0.
Here p(x_1|x_0) = \phi(x_1; A x_0, \Sigma), thus giving
\propto \int e^{-\frac{1}{2}(x_1 - A x_0)^T \Sigma^{-1} (x_1 - A x_0)} \, e^{-\frac{1}{2}(x_0 - m_0)^T P_0^{-1} (x_0 - m_0)} \, dx_0.
Some calculations give
= \phi(x_1; A m_0, A P_0 A^T + \Sigma).
SLIDE 31
Kalman filter
[3] The filter density is more complicated. We have
p(x_t|y_{1:t}) = \frac{p(y_t|x_t) \, p(x_t|y_{1:t-1})}{\int p(y_t|x_t) \, p(x_t|y_{1:t-1}) \, dx_t}.
Thus
p(x_1|y_1) = \frac{p(y_1|x_1) \, p(x_1|\mathcal{F}_0)}{p(y_1|\mathcal{F}_0)} \propto p(y_1|x_1) \, p(x_1|\mathcal{F}_0).
Note that the likelihood is a normalization, and independent of x_1.
SLIDE 32
Kalman filter
Tedious calculations give
p(x_1|y_1) = \phi(x_1;\ m_{1|0} + K_1(y_1 - C m_{1|0}),\ P_{1|0} - K_1 C P_{1|0}),
where, with var(\eta) = \Gamma,
m_{1|0} = A m_0
P_{1|0} = A P_0 A^T + \Sigma
\Omega = C P_{1|0} C^T + \Gamma
K_1 = P_{1|0} C^T \Omega^{-1}
Still Gaussian!
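The same recursion written as code: one predict/update cycle of the Kalman filter, returning the innovation log-likelihood contribution as a by-product (a minimal sketch following the notation above):

```python
import numpy as np

def kalman_step(m, P, y, A, C, Sigma, Gamma):
    """One predict/update cycle of the Kalman filter.
    Returns the filter mean/covariance and log p(y_t | y_{1:t-1})."""
    # Prediction: p(x_t | y_{1:t-1}) = N(m_pred, P_pred)
    m_pred = A @ m
    P_pred = A @ P @ A.T + Sigma
    # Update: innovation v, innovation covariance Omega, Kalman gain K
    v = y - C @ m_pred
    Omega = C @ P_pred @ C.T + Gamma
    K = P_pred @ C.T @ np.linalg.inv(Omega)
    m_new = m_pred + K @ v
    P_new = P_pred - K @ C @ P_pred
    # Innovation likelihood contribution, log N(v; 0, Omega)
    d = len(v)
    loglik = -0.5 * (d * np.log(2 * np.pi)
                     + np.log(np.linalg.det(Omega))
                     + v @ np.linalg.solve(Omega, v))
    return m_new, P_new, loglik
```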
SLIDE 33
Non-linear models
Approximate non-linear models with a local linear model:
◮ EKF
◮ Sigma point filters / Unscented Kalman Filter (UKF)
◮ Ensemble filters (EnKF)
Or use Monte Carlo methods (we will cover these later):
◮ Particle filters
◮ Hybrid particle filters
SLIDE 34
Extended Kalman filter (EKF)
Use Taylor expansions to approximate the non-linear model with a linear model.
y_{t+1} = h(x_{t+1}, \eta_{t+1}) (14)
x_{t+1} = f(x_t, e_{t+1}) (15)
◮ Prediction
m_{t+1|t} = f(m_{t|t}, 0),
P_{t+1|t} = F_t P_{t|t} F_t^T + \Sigma,
F_t = f'_x(m_{t|t}, 0).
SLIDE 35
Extended Kalman filter (EKF)
While the filtering is given by:
◮ Filtering
m_{t+1|t+1} = m_{t+1|t} + K_t (y_{t+1} - h(m_{t+1|t}, 0)),
P_{t+1|t+1} = P_{t+1|t} - K_t H_t P_{t+1|t},
where
\Omega_t = H_t P_{t+1|t} H_t^T + \Gamma,
K_t = P_{t+1|t} H_t^T \Omega_t^{-1},
H_t = h'_x(m_{t+1|t}, 0).
Even better are Iterated EKFs, obtained by reinterpreting the problem as an optimization of the log-posterior.
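As a sketch, one EKF cycle under the simplifying assumption of additive noise (the Jacobians F_t and H_t are supplied by the caller; this mirrors the equations above but is not the slides' own code):

```python
import numpy as np

def ekf_step(m, P, y, f, h, F_jac, H_jac, Sigma, Gamma):
    """One EKF cycle for x_t = f(x_{t-1}) + e_t, y_t = h(x_t) + eta_t
    (additive noise for simplicity)."""
    # Prediction, linearizing f around the current filter mean
    F = F_jac(m)
    m_pred = f(m)
    P_pred = F @ P @ F.T + Sigma
    # Update, linearizing h around the predicted mean
    H = H_jac(m_pred)
    Omega = H @ P_pred @ H.T + Gamma
    K = P_pred @ H.T @ np.linalg.inv(Omega)
    m_new = m_pred + K @ (y - h(m_pred))
    P_new = P_pred - K @ H @ P_pred
    return m_new, P_new
```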
SLIDE 39
Other filters
Alternatives include:
◮ Iterated Kalman filters improve the quality of the linearizations.
◮ Sigma point filters (UKF) use cleverly sampled points to derive the first and second moment.
◮ Ensemble filters (EnKF) use Monte Carlo methods to approximate the first and second central moments.
◮ These are in general more accurate than the EKF.
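A sketch of the sigma-point construction at the heart of the UKF (the scaling parameters and the test non-linearity are illustrative choices; the defaults follow the convention in Särkkä, 2013):

```python
import numpy as np

def sigma_points(m, P, alpha=1.0, beta=0.0, kappa=None):
    """Generate the 2n+1 unscented sigma points and their weights."""
    n = len(m)
    if kappa is None:
        kappa = 3.0 - n                        # classic choice
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)      # matrix square root
    pts = np.vstack([m, m + S.T, m - S.T])     # shape (2n+1, n)
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    return pts, wm, wc

# Propagate through a non-linearity g and re-estimate the moments
m, P = np.array([0.0, 1.0]), 0.1 * np.eye(2)
pts, wm, wc = sigma_points(m, P)
g = lambda x: np.sin(x)                        # illustrative non-linearity
Y = g(pts)
mean_y = wm @ Y
cov_y = (wc[:, None] * (Y - mean_y)).T @ (Y - mean_y)
```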
SLIDE 41
Case: Calibration of options
The most common method for calibrating options to market data today is some non-linear weighted least squares estimator
\theta = \arg\min_\theta \sum_i \lambda_i \left( c^{Market}_t(S_i, K_i, r_i, \tau_i) - c^{Model}_t(S_i, K_i, r_i, \tau_i; \theta) \right)^2 (16)
where c^{Market}(S_i, K_i, r_i, \tau_i) are the market prices that depend on the underlying asset S_i, strike level K_i, interest rate r_i and time to maturity \tau_i, and \lambda_i are weights. There are two main (implicitly related) problems with this approach:
◮ The parameter estimates are noisy,
◮ Old data is typically discarded, as only the most recent data is used.
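In code, eq. (16) is a standard weighted least squares fit. A sketch using scipy.optimize.least_squares with a deliberately toy model_price stand-in (a real application would plug in e.g. a Heston or Bates pricer; the weights enter via their square roots):

```python
import numpy as np
from scipy.optimize import least_squares

def model_price(S, K, r, tau, theta):
    # Hypothetical one-parameter stand-in for c^Model(S, K, r, tau; theta)
    sigma = theta[0]
    return np.maximum(S - K * np.exp(-r * tau), 0) + sigma * np.sqrt(tau)

def wls_residuals(theta, quotes, lam):
    # One weighted residual per quote; least_squares minimizes
    # the sum of their squares, i.e. eq. (16)
    return [np.sqrt(l) * (c_mkt - model_price(S, K, r, tau, theta))
            for (S, K, r, tau, c_mkt), l in zip(quotes, lam)]

quotes = [(100, 95, 0.02, 0.5, 8.1), (100, 105, 0.02, 0.5, 2.3)]
lam = [1.0, 1.0]
fit = least_squares(wls_residuals, x0=[0.2], args=(quotes, lam))
print(fit.x)
```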
SLIDE 42 Simulated data from the Bates (1996) model
[Figure: eight panels showing parameter trajectories over time for κ, ξ, σV, ρ, Vt, λ, µ and δ.]
Figure: WLS estimates (red circles) and the true parameters (solid blue line). The WLS works most of the time...
SLIDE 43
Calibration through filtering
(Lindström et al., 2008) rewrite the calibration problem as a filtering problem, augmenting the latent states with the parameter vector:
c^{Market}_t(S_n, K_i, r_i, \tau_i) = c^{Model}_t(S_n, K_i, r_i, \tau_i; \theta_n) + \eta_n, (17)
\theta_n = \theta_{n-1} + e_n. (18)
This decomposes the change of the option prices into changes in the underlying state variables (i.e. the index level), changes in the parameters (which are captured by the random walk dynamics) and pure noise due to the bid-ask spread.
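A sketch of the resulting filter on the parameter dynamics (17)-(18): the dynamics are the identity, so prediction only inflates the covariance, and the measurement update uses a numerically differentiated pricer (price_fn is a hypothetical stand-in mapping θ to the day's vector of model prices; Q, R are the noise covariances of e_n and η_n):

```python
import numpy as np

def track_parameters(theta0, P0, observations, price_fn, Q, R, eps=1e-5):
    """EKF on theta_n = theta_{n-1} + e_n, c_n = price_fn(theta_n) + eta_n."""
    theta, P = theta0.copy(), P0.copy()
    for c_mkt in observations:
        # Prediction: identity dynamics, only the covariance grows
        P = P + Q
        # Numerical Jacobian of the pricer w.r.t. the parameters
        base = price_fn(theta)
        H = np.column_stack([
            (price_fn(theta + eps * np.eye(len(theta))[j]) - base) / eps
            for j in range(len(theta))])
        # Standard EKF measurement update
        Omega = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(Omega)
        theta = theta + K @ (c_mkt - base)
        P = P - K @ H @ P
        yield theta.copy()
```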
SLIDE 44 Same graph with data from the Bates (1996) model
[Figure: eight panels showing parameter trajectories over time for κ, ξ, σV, ρ, Vt, λ, µ and δ.]
Figure: WLS estimates (red circles), the true parameters (solid blue line) and filter estimates (black dots).
SLIDE 46
Filters
We have successfully used
◮ The EKF (not optimal though...)
◮ The Iterated EKF (Lindström et al., 2008)
◮ The UKF (Wiktorsson & Lindström, 2014)
◮ The EnKF (Lindström & Guo, 2013)
◮ Tuned UKF (Lindström & Åkerlindh, 2016)
However, (simple) Monte Carlo filters do not work very well for this problem.
SLIDE 49
Quadratic Hedging for free!
◮ The quadratic hedging problem was given by
(\hat{\alpha}, \hat{\beta}) = \arg\min E\left[ \left( \pi(S(T)) - \alpha S(T) - \beta B(T) \right)^2 \,\middle|\, \mathcal{F}(0) \right]. (19)
◮ The solution to this problem is given by the means, variances and covariances between these assets.
◮ Most/all of these are computed in the filter (Lindström & Guo, 2013)!
SLIDE 52
Proof
Check the derivations! Example (EKF):
h(m_{t+1|t}, 0): predicted mean (20)
\Omega_t: covariance of the assets (21)
This works if the model is modified by also including the underlying asset S in the model:
c^{Market}_t(S_n, K_i, r_i, \tau_i) = c^{Model}_t(S_n, K_i, r_i, \tau_i; \theta_n) + \eta^{(c)}_n, (22)
S^{Market}_t = S_n + \eta^{(S)}_n, (23)
\theta_n = \theta_{n-1} + e_n. (24)
SLIDE 56
Parameter estimation
Essentially two options (there are more...)
◮ Direct maximization of the log-likelihood function
◮ The EM-algorithm
The EM-algorithm consists of two steps:
◮ Compute the expectation of the intermediate quantity
Q(\theta|\theta_m) = E\left[ \log p_\theta(X, Y) \,|\, Y, \theta_m \right] (25)
◮ Maximize Q(\theta|\theta_m) according to
\theta_{m+1} = \arg\max_\theta Q(\theta|\theta_m) (26)
SLIDE 57
EM cont.
Note that the intermediate quantity can be written as
Q(\theta|\theta_m) = E\left[ \log p_\theta(X, Y) \,|\, Y, \theta_m \right] (27)
= E\Big[ \sum_{n=1}^{N} \log p_\theta(X_n|X_{n-1}) + \log p_\theta(Y_n|X_n) \,\Big|\, Y, \theta_m \Big] (28)
The solution is then almost given by the VAR(1) estimates.
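To see why: for a linear Gaussian state equation X_n = A X_{n-1} + e_n (an assumption matching the VAR slide, not spelled out here), maximizing (28) over A gives the same normal equations as eq. (2), only with the sample products replaced by smoothed expectations delivered by a (Kalman) smoother:

```latex
\hat{A}_{m+1} =
  \Big( \sum_{n=1}^{N} \mathbb{E}\big[ X_n X_{n-1}^T \mid Y, \theta_m \big] \Big)
  \Big( \sum_{n=1}^{N} \mathbb{E}\big[ X_{n-1} X_{n-1}^T \mid Y, \theta_m \big] \Big)^{-1}
```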
SLIDE 58
Some references
◮ Särkkä, S. (2013). Bayesian Filtering and Smoothing (Vol. 3). Cambridge University Press. https://users.aalto.fi/~ssarkka/
◮ Lindström, E., Ströjby, J., Brodén, M., Wiktorsson, M. & Holst, J. (2008). Sequential Calibration of Options. Computational Statistics and Data Analysis, 52, 2877-2891.
◮ Lindström, E. & Guo, J. (2013). Simultaneous Calibration and Hedging. Quantitative and Qualitative Analysis in Social Sciences, 7(1).
◮ Lindström, E. & Åkerlindh, C. (2016). Optimal Adaptive Sequential Calibration of Option Models. Forthcoming in Springer Handbook of Recent Advances in Commodity and Financial Modelling: Quantitative Methods in Banking, Finance, Insurance, Energy and Commodity Markets.