SLIDE 1

Multivariate and Partially observed models

Erik Lindström

SLIDE 4

Briefly on multivariate models

Consider the Vector-AR(1) (VAR) process:

  Xn+1 = A Xn + εn+1,   ε ∼ MVN(0, Σ)   (1)

Can we estimate the parameters? Yes, with a bit of matrix algebra tricks. Writing down the log-likelihood... [checking the Matrix Cookbook] leads to

  Â = (∑_{n=1}^{N−1} Xn+1 Xn^T) (∑_{n=1}^{N−1} Xn Xn^T)^{−1}   (2)

  Σ̂ = (1/(N−1)) ∑_{n=1}^{N−1} ε̂n ε̂n^T   (3)
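As a numerical illustration of eqs. (2)-(3), a minimal sketch (not from the slides; variable names and the small simulation at the end are mine):

    import numpy as np

    def estimate_var1(X):
        """Estimate A and Sigma in X_{n+1} = A X_n + eps_{n+1} from an (N x d) array."""
        X0, X1 = X[:-1], X[1:]                      # X_n and X_{n+1}, stacked as rows
        # Eq. (2): A_hat = (sum X_{n+1} X_n^T)(sum X_n X_n^T)^{-1}
        A_hat = (X1.T @ X0) @ np.linalg.inv(X0.T @ X0)
        resid = X1 - X0 @ A_hat.T                   # residuals eps_hat_{n+1}
        # Eq. (3): Sigma_hat from the residual outer products
        Sigma_hat = (resid.T @ resid) / resid.shape[0]
        return A_hat, Sigma_hat

    # Usage: simulate a small VAR(1) with Sigma = I and recover the parameters
    rng = np.random.default_rng(0)
    A = np.array([[0.7, 0.1], [0.0, 0.5]])
    X = np.zeros((1000, 2))
    for n in range(999):
        X[n + 1] = A @ X[n] + rng.standard_normal(2)
    A_hat, Sigma_hat = estimate_var1(X)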

SLIDE 9

Motivation, partially observed models

◮ Used when regressors are unobservable
◮ E.g. when the regressor dimension is larger than the observable state dimension (think stoch. vol).
◮ or interest rate models
◮ or credit models (hidden jump intensity process)
◮ Missing observations can be treated in this framework.

SLIDE 11

Examples

◮ Stochastic volatility

  yt = σt ηt   (4)
  log σ²t = a0 + a1 log σ²t−1 + et   (5)

◮ Short rate models

  drt = α(β − rt) dt + √(γ + δ rt) dWt   (6)
  P(t, T) = A(t, T) e^{−B(t,T) rt}   (7)
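To make the stochastic volatility example concrete, a small simulation sketch of eqs. (4)-(5); the parameter values are illustrative and not from the slides:

    import numpy as np

    def simulate_sv(T, a0=-0.7, a1=0.95, sigma_e=0.3, seed=1):
        """Simulate y_t = sigma_t * eta_t where log sigma_t^2 follows an AR(1)."""
        rng = np.random.default_rng(seed)
        log_s2 = np.zeros(T)
        y = np.zeros(T)
        log_s2[0] = a0 / (1.0 - a1)               # start in the stationary mean
        for t in range(T):
            if t > 0:
                log_s2[t] = a0 + a1 * log_s2[t - 1] + sigma_e * rng.standard_normal()
            y[t] = np.exp(0.5 * log_s2[t]) * rng.standard_normal()   # sigma_t * eta_t
        return y, log_s2

    y, log_s2 = simulate_sv(1000)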

SLIDE 15

Stoch vol.

Let us start with the stoch vol. model.

  yt = σt ηt   (8)
  log σ²t = a0 + a1 log σ²t−1 + et   (9)

◮ σ²t is not directly observable
◮ but can be estimated.
◮ Likelihood p(y1, . . . , yT) = ∫ p(σ1, y1, . . . , σT, yT) dσ1:T?
◮ Dependence structure?

SLIDE 16

General state space models

All models we use can be written in general state space form

  yt = h(xt, ηt)   (10)
  xt = f(xt−1, et)   (11)

◮ x is a hidden (unobservable) Markov process (cf. HMM)
◮ y is observed.
◮ yt|xt is independent of ys, s = 1..t − 1, t + 1..T.
◮ These rather simple structures can generate complex models!
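For instance, the stochastic volatility example above fits this form with xt = log σ²t. A minimal sketch of the two functions (one possible convention for how the noises enter; names are mine):

    import numpy as np

    # h(x_t, eta_t): observation y_t = exp(x_t / 2) * eta_t
    def h(x, eta):
        return np.exp(0.5 * x) * eta

    # f(x_{t-1}, e_t): hidden state x_t = a0 + a1 * x_{t-1} + e_t
    def f(x_prev, e, a0=-0.7, a1=0.95):
        return a0 + a1 * x_prev + e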

SLIDE 17

Structure

All models we use can be written in state space form

  yt = h(xt, ηt)   (12)
  xt = f(xt−1, et)   (13)

◮ These equations imply transition probabilities, i.e. we can derive p(xt|xt−1) and p(yt|xt) from the model setup.

◮ We also need p(x0), i.e. initial conditions.

SLIDE 19

Likelihood

The likelihood can be written as

  p(y1, . . . , yT) = p(y1) ∏_{t=2}^{T} p(yt|y1:t−1),

where y1:t−1 is shorthand notation for {y1, . . . , yt−1}. We can write

  p(yt|y1:t−1) = ∫ p(yt|xt) p(xt|y1:t−1) dxt

and

  p(xt|y1:t−1) = ∫ p(xt|xt−1) p(xt−1|y1:t−1) dxt−1.

SLIDE 21

Filter density

◮ The density for the hidden state xt, using the information y1:t, is called the filter density, p(xt|y1:t).
◮ We can derive the filter density from

  p(xt|y1:t) = p(yt|xt) p(xt|y1:t−1) / p(yt|y1:t−1),

◮ or equivalently

  p(xt|y1:t) = p(yt|xt) p(xt|y1:t−1) / ∫ p(yt|xt) p(xt|y1:t−1) dxt.

SLIDE 23

Predictive density

◮ The density for the hidden state xt, using the information y1:t−1, is called the predictive density, p(xt|y1:t−1).
◮ We can derive the predictive density from

  p(xt|y1:t−1) = ∫ p(xt|xt−1) p(xt−1|y1:t−1) dxt−1.

SLIDE 24

Recursion

1. We have the filter density p(x0) at time 0.
2. At time t, generate the predictive density p(xt+1|y1:t).
3. At time t + 1, calculate p(yt+1|y1:t) and update the filter density p(xt+1|y1:t+1). Repeat from step 2.

SLIDE 25

Working recursions

Essentially there are only 2 recursions known in closed form:

◮ HMM (finite state space)
◮ Kalman filter (linear, Gaussian models)
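For the finite state space case the recursion can be spelled out exactly. A minimal sketch of the forward (filter) recursion for an HMM (not from the slides; the emission interface and names are mine):

    import numpy as np

    def hmm_filter(P, emission, p0, y):
        """Exact filter for a finite state space.
        P[i, j] = p(x_t = j | x_{t-1} = i), emission(y, j) = p(y | x_t = j),
        p0 = distribution of the first state."""
        filt = p0.copy()
        loglik = 0.0
        for t, yt in enumerate(y):
            pred = filt @ P if t > 0 else filt       # predictive p(x_t | y_{1:t-1})
            lik = np.array([emission(yt, j) for j in range(len(p0))])
            joint = lik * pred                       # p(y_t | x_t) p(x_t | y_{1:t-1})
            loglik += np.log(joint.sum())            # adds log p(y_t | y_{1:t-1})
            filt = joint / joint.sum()               # filter density p(x_t | y_{1:t})
        return filt, loglik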

SLIDE 27

Kalman filter

Why does it give closed form recursions? Short answer: the Gaussian density is the exponential of a second-order polynomial. Model:

  Yt = C Xt + ηt,   ηt ∼ N(0, Γ)
  Xt = A Xt−1 + et,   et ∼ N(0, Σ)

[1] Assume initial distribution p(x0|F0) = φ(x0; m0, P0).

SLIDE 29

Kalman filter

[2] Calculate the predictive density

  p(x1|F0) = ∫ p(x1|x0) p(x0|F0) dx0.

Here p(x1|x0) = φ(x1; A x0, Σ), thus giving

  p(x1|F0) ∝ ∫ exp(−(1/2)(x1 − A x0)^T Σ^{−1} (x1 − A x0)) exp(−(1/2)(x0 − m0)^T P0^{−1} (x0 − m0)) dx0.

Some calculations give

  p(x1|F0) = φ(x1; A m0, A P0 A^T + Σ).

SLIDE 31

Kalman filter

[3] The filter density is more complicated. We have

  p(xt|y1:t) = p(yt|xt) p(xt|y1:t−1) / ∫ p(yt|xt) p(xt|y1:t−1) dxt.

Thus

  p(x1|y1) = p(y1|x1) p(x1|F0) / p(y1|F0) ∝ p(y1|x1) p(x1|F0).

Note that the likelihood term is a normalization, and independent of x1.

SLIDE 32

Kalman filter

Tedious calculations give

  p(x1|y1) = φ(x1; m1|0 + K1(y1 − C m1|0), P1|0 − K1 C P1|0),

where (with var(η) = Γ)

  m1|0 = A m0
  P1|0 = A P0 A^T + Σ
  Ω = C P1|0 C^T + Γ
  K1 = P1|0 C^T Ω^{−1}

Still Gaussian!
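A compact sketch of the resulting recursion, one prediction/update step for the linear Gaussian model above (notation follows the slides; the implementation details are mine):

    import numpy as np

    def kalman_step(m, P, y, A, C, Sigma, Gamma):
        """One step for Y_t = C X_t + eta, X_t = A X_{t-1} + e."""
        # Prediction: p(x_t | y_{1:t-1}) = phi(x_t; m_pred, P_pred)
        m_pred = A @ m
        P_pred = A @ P @ A.T + Sigma
        # Update: innovation covariance Omega, Kalman gain K, filter density
        Omega = C @ P_pred @ C.T + Gamma
        K = P_pred @ C.T @ np.linalg.inv(Omega)
        m_filt = m_pred + K @ (y - C @ m_pred)
        P_filt = P_pred - K @ C @ P_pred
        return m_filt, P_filt

    # Usage: start from (m0, P0) and iterate over the observations ys
    # m, P = m0, P0
    # for y in ys:
    #     m, P = kalman_step(m, P, y, A, C, Sigma, Gamma)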

SLIDE 33

Non-linear models

Approximate non-linear models with a local linear model:

◮ EKF
◮ Sigma point filters / Unscented Kalman Filter (UKF)
◮ Ensemble filters (EnKF)

Or use Monte Carlo methods (will cover this later):

◮ Particle filters
◮ Hybrid particle filters

SLIDE 34

Extended Kalman filter (EKF)

Use Taylor expansions to approximate the non-linear model with a linear model.

  yt+1 = h(xt+1, ηt+1)   (14)
  xt+1 = f(xt, et+1)   (15)

◮ Prediction

  mt+1|t = f(mt|t, 0),
  Pt+1|t = Ft Pt|t Ft^T + Σ,
  Ft = f′x(mt|t, 0).

SLIDE 35

Extended Kalman filter (EKF)

While the filtering is given by:

◮ Filtering

  mt+1|t+1 = mt+1|t + Kt (yt+1 − h(mt+1|t, 0))
  Pt+1|t+1 = Pt+1|t − Kt Ht Pt+1|t,

where

  Ωt = Ht Pt+1|t Ht^T + Γ
  Kt = Pt+1|t Ht^T Ωt^{−1}
  Ht = h′x(mt+1|t, 0)

Even better are Iterated EKFs, obtained by reinterpreting the problem as an optimization of the log-posterior.
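A corresponding EKF sketch, with the Jacobians of f and h passed in as functions (a minimal illustration under the notation above, not the author's code):

    import numpy as np

    def ekf_step(m, P, y, f, h, F_jac, H_jac, Sigma, Gamma):
        """One EKF step for y_{t+1} = h(x_{t+1}, eta), x_{t+1} = f(x_t, e)."""
        # Prediction, linearizing f around the current estimate m (noise set to 0)
        F = F_jac(m)
        m_pred = f(m, 0.0)
        P_pred = F @ P @ F.T + Sigma
        # Filtering, linearizing h around the predicted mean
        H = H_jac(m_pred)
        Omega = H @ P_pred @ H.T + Gamma
        K = P_pred @ H.T @ np.linalg.inv(Omega)
        m_filt = m_pred + K @ (y - h(m_pred, 0.0))
        P_filt = P_pred - K @ H @ P_pred
        return m_filt, P_filt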

SLIDE 39

Other filters

Alternatives include:

◮ Iterated Kalman filters improve the quality of the linearizations.
◮ Sigma point filters (UKF) use cleverly sampled points to derive the first and second moment.
◮ Ensemble filters (EnKF) use Monte Carlo methods to approximate the first and second central moments.
◮ These are in general more accurate than the EKF.
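A rough sketch of the ensemble idea, using the stochastic ("perturbed observations") EnKF variant and assuming, for brevity, a linear observation operator C (both assumptions are mine, not from the slides):

    import numpy as np

    def enkf_step(ensemble, y, f, C, Sigma, Gamma, rng):
        """One EnKF step; ensemble is an (N, d) array of state samples."""
        N = ensemble.shape[0]
        # Forecast: propagate each member through f and add process noise
        forecast = np.array([f(x) for x in ensemble])
        forecast += rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=N)
        # Sample mean and covariance of the forecast ensemble
        mean = forecast.mean(axis=0)
        X = forecast - mean
        P = X.T @ X / (N - 1)
        # Kalman-type update with perturbed observations
        Omega = C @ P @ C.T + Gamma
        K = P @ C.T @ np.linalg.inv(Omega)
        y_pert = y + rng.multivariate_normal(np.zeros(Gamma.shape[0]), Gamma, size=N)
        return forecast + (y_pert - forecast @ C.T) @ K.T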

SLIDE 41

Case: Calibration of options

The most common method for calibrating options to market data today is some non-linear weighted least squares estimator

  θ̂ = arg min_θ ∑_i λi ( cMarket_t(Si, Ki, ri, τi) − cModel_t(Si, Ki, ri, τi; θ) )²   (16)

where cMarket_t(Si, Ki, ri, τi) is the market price, which depends on the underlying asset Si, strike level Ki, interest rate ri and time to maturity τi, and λi are weights. There are two main (implicitly related) problems with this approach:

◮ The parameter estimates are noisy,
◮ Old data is typically discarded, as only the most recent data is used.
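A sketch of how such a WLS calibration is typically coded; c_model stands for a pricing function (e.g. Bates (1996)) and everything here is illustrative rather than the author's implementation:

    import numpy as np
    from scipy.optimize import least_squares

    def calibrate_wls(theta0, c_market, S, K, r, tau, lam, c_model):
        """Minimize sum_i lam_i * (c_market_i - c_model_i(theta))^2, cf. eq. (16)."""
        def residuals(theta):
            model = np.array([c_model(S[i], K[i], r[i], tau[i], theta)
                              for i in range(len(c_market))])
            return np.sqrt(lam) * (c_market - model)   # weighted residuals
        return least_squares(residuals, theta0).x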

SLIDE 42

Simulated data from the Bates (1996) model

[Figure: eight panels of parameter paths over time for κ, ξ, σV, ρ, Vt, λ, µ and δ.]

Figure: WLS estimates (red circles) and the true parameters (solid blue line). The WLS works most of the time...

SLIDE 43

Calibration through filtering

(Lindström et al., 2008) rewrites the calibration problem as a filtering problem, augmenting the latent states with the parameter vector:

  cMarket_t(Sn, Ki, ri, τi) = cModel_t(Sn, Ki, ri, τi; θn) + ηn,   (17)
  θn = θn−1 + en.   (18)

This decomposes the change of the option prices into changes in the underlying state variables (i.e. the index level), changes in the parameters (captured by the random walk dynamics) and pure noise due to the bid-ask spread.
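In state space terms, eqs. (17)-(18) mean that the parameter vector becomes part of the hidden state and is tracked by one of the non-linear filters above; a schematic sketch (names and structure are mine):

    import numpy as np

    # Hidden state: the parameter vector theta_n (possibly augmented with latent
    # variables such as the variance V_t in the Bates model).
    def transition(theta_prev, e):
        return theta_prev + e                  # random walk dynamics, eq. (18)

    def observation(theta, S, K, r, tau, c_model):
        # Stack the model prices for all quoted options, eq. (17); the
        # observation noise eta_n then models the bid-ask spread.
        return np.array([c_model(S, K[i], r[i], tau[i], theta)
                         for i in range(len(K))])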

SLIDE 44

Same graph with data from the Bates (1996) model

[Figure: eight panels of parameter paths over time for κ, ξ, σV, ρ, Vt, λ, µ and δ.]

Figure: WLS estimates (red circles), the true parameters (solid blue line) and filter estimates (black dots).

SLIDE 46

Filters

We have successfully used

◮ The EKF (not optimal though...)
◮ The Iterated EKF (Lindström et al., 2008)
◮ The UKF (Wiktorsson & Lindström, 2014)
◮ The EnKF (Lindström & Guo, 2013)
◮ Tuned UKF (Lindström & Åkerlindh, 2016)

However, (simple) Monte Carlo filters do not work very well for this problem.

SLIDE 49

Quadratic Hedging for free!

◮ The quadratic hedging problem was given by

  (α̂, β̂) = arg min_{α,β} E[ (π(S(T)) − α S(T) − β B(T))² | F(0) ].   (19)

◮ The solution to this problem is given by the means, variances and covariances between these assets.
◮ Most/all of these are computed in the filter (Lindström & Guo, 2013)!
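A minimal derivation sketch of that claim (assuming the bank account B(T) is deterministic, with all moments conditional on F(0); this step is not spelled out on the slide): setting the derivatives of (19) with respect to α and β to zero gives

  \hat{\alpha} = \frac{\operatorname{Cov}\big(\pi(S(T)),\, S(T)\big)}{\operatorname{Var}\big(S(T)\big)},
  \qquad
  \hat{\beta} = \frac{\operatorname{E}\big[\pi(S(T))\big] - \hat{\alpha}\,\operatorname{E}\big[S(T)\big]}{B(T)},

so the hedge only needs the conditional means, the variance of S(T) and its covariance with the payoff, which the filter already provides.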

SLIDE 52

Proof

Check the derivations! Example (EKF).

  h(mt+1|t, 0) - predicted mean   (20)
  Ωt - covariance of the assets   (21)

This works if the model is modified by also including the underlying asset S in the model:

  cMarket_t(Sn, Ki, ri, τi) = cModel_t(Sn, Ki, ri, τi; θn) + η(c)_n,   (22)
  SMarket_t = Sn + η(S)_n,   (23)
  θn = θn−1 + en.   (24)

SLIDE 56

Parameter estimation

Essentially two options (there are more...)

◮ Direct maximization of the log-likelihood function
◮ The EM-algorithm

The EM-algorithm consists of two steps:

◮ Compute the expectation of the intermediate quantity

  Q(θ|θm) = E[ log pθ(X, Y) | Y, θm ]   (25)

◮ Maximize Q(θ|θm) according to

  θm+1 = arg max_θ Q(θ|θm)   (26)

SLIDE 57

EM cont.

Note that the intermediate quantity can be written as

  Q(θ|θm) = E[ log pθ(X, Y) | Y, θm ]   (27)
          = E[ ∑_{n=1}^{N} ( log pθ(Xn|Xn−1) + log pθ(Yn|Xn) ) | Y, θm ]   (28)

The solution is then almost given by the VAR(1) estimates.
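To make the last remark concrete (a sketch for the linear Gaussian model of the Kalman filter slides; this step is not spelled out on the slide): maximizing (28) over A gives eq. (2) with the products replaced by their smoothed expectations,

  \hat{A}_{m+1} = \Big( \sum_{n=1}^{N} \operatorname{E}\big[X_n X_{n-1}^{T} \mid Y, \theta_m\big] \Big)
                  \Big( \sum_{n=1}^{N} \operatorname{E}\big[X_{n-1} X_{n-1}^{T} \mid Y, \theta_m\big] \Big)^{-1},

where the conditional expectations are computed with a Kalman smoother, and Σ (and the observation parameters) are updated analogously to eq. (3). This is why the M-step is "almost" the VAR(1) estimator.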

SLIDE 58

Some references

◮ Särkkä, S. (2013). Bayesian Filtering and Smoothing (Vol. 3). Cambridge University Press. https://users.aalto.fi/~ssarkka/
◮ Lindström, E., Ströjby, J., Brodén, M., Wiktorsson, M. & Holst, J. (2008). Sequential Calibration of Options. Computational Statistics and Data Analysis, 52, 2877-2891.
◮ Lindström, E. & Guo, J. (2013). Simultaneous Calibration and Hedging. Quantitative and Qualitative Analysis in Social Sciences, 7(1).
◮ Lindström, E. & Åkerlindh, C. (2016). Optimal Adaptive Sequential Calibration of Option Models. Forthcoming in Springer Handbook of Recent Advances in Commodity and Financial Modelling: Quantitative Methods in Banking, Finance, Insurance, Energy and Commodity Markets.
