
Lecture on Parameter Estimation for Stochastic Differential Equations

Erik Lindström


Recap

◮ We are interested in the parameters θ in the stochastic integral equation
$$X(t) = X(0) + \int_0^t \mu_\theta(s, X(s))\,ds + \int_0^t \sigma_\theta(s, X(s))\,dW(s) \qquad (1)$$
Why?
◮ Model validation
◮ Risk management
◮ Advanced hedging (Greeks, 9.2.2, and quadratic hedging, 9.2.2.1 (P/Q))


Some asymptotics

Consider the arithmetic Brownian motion
$$dX(t) = \mu\,dt + \sigma\,dW(t) \qquad (2)$$
The drift is estimated by computing the mean and compensating for the sampling interval $\delta = t_{n+1} - t_n$:
$$\hat\mu = \frac{1}{\delta N}\sum_{n=0}^{N-1}\left(X(t_{n+1}) - X(t_n)\right). \qquad (3)$$
Expanding this expression reveals that the MLE is given by
$$\hat\mu = \frac{X(t_N) - X(t_0)}{t_N - t_0} = \mu + \sigma\,\frac{W(t_N) - W(t_0)}{t_N - t_0}. \qquad (4)$$
The MLE for the diffusion ($\sigma$) parameter is given by
$$\hat\sigma^2 = \frac{1}{\delta(N-1)}\sum_{n=0}^{N-1}\left(X(t_{n+1}) - X(t_n) - \hat\mu\delta\right)^2 \xrightarrow{d} \sigma^2\,\frac{\chi^2(N-1)}{N-1} \qquad (5)$$
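The estimators (3)-(5) are easy to check numerically. A minimal sketch; the parameter values, grid and seed are illustrative choices, not from the lecture:

```python
import numpy as np

# Simulate arithmetic Brownian motion dX = mu*dt + sigma*dW on a regular
# grid and evaluate the MLEs (3)-(5).
rng = np.random.default_rng(1)
mu, sigma = 0.5, 0.2
delta, N = 0.01, 10_000

dX = mu * delta + sigma * np.sqrt(delta) * rng.standard_normal(N)
X = np.concatenate(([0.0], np.cumsum(dX)))

# (3): sample-mean drift estimator, compensated for the sampling interval.
mu_hat = (X[1:] - X[:-1]).sum() / (delta * N)
# (4): the sum telescopes, so only the endpoints matter -- which is why
# mu_hat keeps the large variance sigma^2/(t_N - t_0) regardless of N.
assert np.isclose(mu_hat, (X[-1] - X[0]) / (delta * N))

# (5): diffusion estimator from the drift-adjusted squared increments.
sigma2_hat = ((X[1:] - X[:-1] - mu_hat * delta) ** 2).sum() / (delta * (N - 1))
print(mu_hat, sigma2_hat)
```

Note the asymmetry predicted by (4)-(5): $\hat\sigma^2$ is accurate already on short samples, while $\hat\mu$ only improves with the observation horizon $t_N - t_0$, not with the sampling frequency.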


A simple method

Many data sets are sampled at high frequency, making the bias due to discretizing the SDE with one of the schemes in Chapter 12 acceptable. The simplest discretization, the explicit Euler method, would for the stochastic differential equation
$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t) \qquad (6)$$
correspond to the Discretized Maximum Likelihood (DML) estimator given by
$$\hat\theta_{DML} = \arg\max_{\theta\in\Theta} \sum_{n=1}^{N-1} \log \phi\left(X(t_{n+1});\, X(t_n) + \mu(t_n, X(t_n))\Delta,\; \Sigma(t_n, X(t_n))\Delta\right) \qquad (7)$$
where $\phi(x; m, P)$ is the density for a multivariate Normal distribution with argument $x$, mean $m$ and covariance $P$, and
$$\Sigma(t, X(t)) = \sigma(t, X(t))\,\sigma(t, X(t))^T. \qquad (8)$$
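A sketch of (7) for a concrete model. The Ornstein-Uhlenbeck drift $\mu(x) = -\theta_1 x$ and constant diffusion $\theta_2$ are stand-ins for the general coefficients in (6), and all numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an Ornstein-Uhlenbeck process dX = -th1*X dt + th2*dW by
# Euler-Maruyama, then maximize the Euler (Gaussian) likelihood, eq. (7).
rng = np.random.default_rng(2)
theta_true = (2.0, 0.5)
dt, N = 0.01, 20_000

X = np.zeros(N + 1)
for n in range(N):
    X[n + 1] = (X[n] - theta_true[0] * X[n] * dt
                + theta_true[1] * np.sqrt(dt) * rng.standard_normal())

def neg_dml_loglik(theta):
    th1, th2 = theta
    m = X[:-1] - th1 * X[:-1] * dt    # Euler one-step mean
    v = th2 ** 2 * dt                 # Euler one-step variance (Sigma*Delta)
    return 0.5 * np.sum(np.log(2 * np.pi * v) + (X[1:] - m) ** 2 / v)

theta_dml = minimize(neg_dml_loglik, x0=[1.0, 1.0], method="Nelder-Mead").x
print(theta_dml)
```

For small $\Delta t$ the estimate lands near (2.0, 0.5), but as the consistency slide stresses, the discretization bias does not vanish for fixed $\Delta t$ no matter how much data is used.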


Consistency

◮ The DMLE is generally NOT consistent.
◮ Approximate ML estimators (13.5) are, provided enough computational resources are allocated:
  ◮ Simulation based estimators
  ◮ Fokker-Planck based estimators
  ◮ Series expansions.
◮ GMM-type estimators (13.6) are consistent if the moments are correctly specified (which is a non-trivial problem!)


Simulation based estimators

◮ Discretely observed SDEs are Markov processes
◮ Then it follows that
$$p_\theta(x_t \mid x_s) = E_\theta\left[\,p_\theta(x_t \mid x_\tau) \mid \mathcal{F}(s)\,\right], \qquad t > \tau > s \qquad (9)$$
This is the Pedersen algorithm.
◮ Improved by Durham & Gallant (2002), Lindström (2012), Whitaker et al. (2016), etc.
◮ Works very well for multivariate models!
◮ and is easily (...) extended to Lévy driven SDEs.
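A sketch of (9) as a Monte Carlo recipe: simulate paths from $x_s$ to an intermediate time $\tau$ and average the one-step Euler density of the final step $\tau \to t$. The OU model below is an illustrative choice, picked because its exact transition density is known, so the estimate can be sanity-checked; `M`, `n_sub` and the evaluation point are arbitrary:

```python
import numpy as np

# Pedersen estimator for p(x_t | x_s) of dX = -2 X dt + 0.5 dW.
rng = np.random.default_rng(3)
mu = lambda x: -2.0 * x
sigma = lambda x: 0.5 + 0.0 * x

def pedersen_density(x_s, x_t, delta, n_sub=8, M=50_000):
    dt = delta / n_sub
    x = np.full(M, x_s)
    for _ in range(n_sub - 1):           # simulate x_s -> x_tau (Euler)
        x = x + mu(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal(M)
    m, v = x + mu(x) * dt, sigma(x) ** 2 * dt   # last-step Euler density
    dens = np.exp(-(x_t - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return dens.mean()                   # Monte Carlo average, eq. (9)

# Sanity check against the exact OU transition density:
delta, x_s, x_t = 0.5, 0.3, 0.1
m_ex = x_s * np.exp(-2.0 * delta)
v_ex = 0.5 ** 2 / (2 * 2.0) * (1 - np.exp(-2 * 2.0 * delta))
p_ex = np.exp(-(x_t - m_ex) ** 2 / (2 * v_ex)) / np.sqrt(2 * np.pi * v_ex)
p_mc = pedersen_density(x_s, x_t, delta)
print(p_mc, p_ex)   # close; the gap is Euler bias plus Monte Carlo noise
```

Increasing `n_sub` shrinks the discretization bias and increasing `M` shrinks the Monte Carlo variance, which is exactly the trade-off the bridge samplers on the next slides improve.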


Some key points

◮ A naive implementation only provides a point-wise estimate - use CRNs or importance sampling
◮ Variance reduction helps (antithetic variates, control variates)
◮ The near-optimal importance sampler is a bridge process, as it reduces variance AND improves the asymptotics.
◮ There is a version that is completely bias free, albeit somewhat restrictive in terms of the class of feasible models.
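A toy illustration of the antithetic-variates point, using a plain Brownian functional $E[\exp(W(T))]$ (our example, with the known answer $e^{T/2}$) rather than a transition-density estimate:

```python
import numpy as np

# Antithetic variates: pair each draw Z with -Z (valid because W(T) is
# symmetric) and average h over the pair.  T, M and h are illustrative.
rng = np.random.default_rng(4)
T, M = 1.0, 100_000
h = lambda w: np.exp(w)                  # E[h(W(T))] = exp(T/2) exactly

Z = rng.standard_normal(M)
plain = h(np.sqrt(T) * Z)                              # crude Monte Carlo
anti = 0.5 * (h(np.sqrt(T) * Z) + h(-np.sqrt(T) * Z))  # antithetic pairs

print(plain.mean(), anti.mean(), np.exp(T / 2))
print(plain.var(), anti.var())
```

Each antithetic pair costs two evaluations of `h`, so the fair comparison is `anti.var()` against `plain.var() / 2`; for this `h` the antithetic estimator still wins.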

Fokker-Planck

Consider the expectation
$$E[h(X(t)) \mid \mathcal{F}(0)] = \int h(x(t))\,p(x(t) \mid x(0))\,dx(t) \qquad (10)$$
and then
$$\frac{\partial}{\partial t} E[h(X(t)) \mid \mathcal{F}(0)] \qquad (11)$$
There are two possible ways to compute this: directly, and using the Itô formula. Equating these yields
$$\frac{\partial p}{\partial t}(x(t) \mid x(0)) = \mathcal{A}^\star p(x(t) \mid x(0)) \qquad (12)$$
where
$$\mathcal{A}^\star p(x(t)) = -\frac{\partial}{\partial x(t)}\left(\mu(\cdot)\,p(x(t))\right) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2(t)}\left(\sigma^2(\cdot)\,p(x(t))\right). \qquad (13)$$
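The PDE (12)-(13) can be integrated directly. A minimal finite-difference sketch for constant coefficients (arithmetic Brownian motion), where $\mathcal{A}^\star p$ reduces to $-\mu\,\partial p/\partial x + (\sigma^2/2)\,\partial^2 p/\partial x^2$ and the exact Gaussian solution is available for comparison; grid, time step and parameters are all illustrative:

```python
import numpy as np

# Explicit finite-difference integration of the Fokker-Planck equation.
mu, sig = 0.5, 0.4
x = np.linspace(-2.5, 3.5, 601)
dx = x[1] - x[0]
dt = 0.2 * dx ** 2 / sig ** 2            # keeps the explicit scheme stable

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

t0, T = 0.25, 1.0
p = gauss(x, mu * t0, sig ** 2 * t0)     # start from the exact density at t0
for _ in range(int(round((T - t0) / dt))):
    dpdx = (np.roll(p, -1) - np.roll(p, 1)) / (2 * dx)       # central diff
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx ** 2
    p = p + dt * (-mu * dpdx + 0.5 * sig ** 2 * lap)         # p_t = A* p

err = np.max(np.abs(p - gauss(x, mu * T, sig ** 2 * T)))
print(err, p.sum() * dx)                 # small error, total mass stays ~ 1
```

The `np.roll` stencils are periodic, which is harmless here because the density is negligible at the domain edges; a production solver would use an implicit scheme (cf. the Padé comparisons below) and proper boundary conditions.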


Example of the Fokker-Planck equation

From (Lindström, 2007)

(Surface plot of p(s, x_s; t, x_t) over time and the spatial grid.)

Figure: Fokker-Planck equation computed for the CKLS process


Comments on the PDE approach

Generally better than the Monte Carlo method in low dimensional problems.

(Log-log plot of MAE versus computation time for Durham-Gallant, Poulsen, Order2-Padé(1,1) and Order4-Padé(2,2).)

Figure: Comparing Monte Carlo, 2nd order and 4th order numerical approximations of the Fokker-Planck equation


Discussion

◮ Fokker-Planck is the preferred method if the state space is non-trivial (see Pedersen et al., 2011)
◮ Successfully used in 1-d and 2-d problems
◮ but the ``curse of dimensionality'' will eventually make the method infeasible


Series expansion

◮ The solution to the Fokker-Planck equation when
$$dX(t) = \mu\,dt + \sigma\,dW(t) \qquad (14)$$
is $p(x(t) \mid x(0)) = N(x(t);\, x(0) + \mu t,\, \sigma^2 t)$.
◮ Hermite polynomials are the orthogonal polynomial basis when using a Gaussian as weight function.
◮ This is used in the 'series expansion approach', see e.g. (Aït-Sahalia, 2002, 2008)


Key ideas

Transform from X → Y → Z where Z is approximately standard Gaussian. We assume that
$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dW(t) \qquad (15)$$
First step:
$$Y(t) = \int^{X(t)} \frac{du}{\sigma(u)} \qquad (16)$$
It then follows that
$$dY(t) = \mu_Y(Y(t))\,dt + dW(t) \qquad (17)$$
Second step: transform
$$Z(t_k) = \frac{Y(t_k) - Y(t_{k-1})}{\sqrt{t_k - t_{k-1}}}. \qquad (18)$$
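A case where both steps are explicit: geometric Brownian motion $dX = \mu X\,dt + \sigma X\,dW$ (our stand-in example for (15)). Step one gives $Y(t) = \int^{X(t)} du/(\sigma u) = \log X(t)/\sigma$, which has unit diffusion; step two rescales the increments. Parameters and step size are illustrative:

```python
import numpy as np

# Lamperti transform for GBM, then the Z-transform of eq. (18).
rng = np.random.default_rng(6)
mu, sigma = 0.1, 0.3
dt, N = 0.01, 200_000

W = np.sqrt(dt) * rng.standard_normal(N)
X = np.exp(np.cumsum((mu - sigma ** 2 / 2) * dt + sigma * W))  # exact GBM

Y = np.log(X) / sigma          # step 1, eq. (16): unit-diffusion process
Z = np.diff(Y) / np.sqrt(dt)   # step 2, eq. (18)

print(Z.mean(), Z.std())       # mean ~ mu_Y*sqrt(dt) (small), std ~ 1
```

For GBM the transform is exact, so Z is Gaussian with unit variance and a small $O(\sqrt{\Delta t})$ mean; for a general drift, Z is only approximately standard Gaussian, which is what the Hermite expansion corrects.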

slide-35
SLIDE 35

Expansion

A Hermite expansion for the density pZ at order J is given by pJ

Z(z|y(0), tk−tk−1) = φ(z) J

j=0

ηj(tk−tk−1, y0)Hj(z) (19) where Hj(z) = ez2/2 dj dzj e−z2/2. (20) The coefficients are computed by projecting the density onto the basis functions Hj z (recall Hilbert space theory)

j t y0

1 j Hj z pJ

Z z y 0

tk tk

1 dz

(21)

slide-36
SLIDE 36

Expansion

A Hermite expansion for the density $p_Z$ at order $J$ is given by
$$p_Z^J(z \mid y(0), t_k - t_{k-1}) = \phi(z) \sum_{j=0}^{J} \eta_j(t_k - t_{k-1}, y_0)\, H_j(z) \qquad (19)$$
where
$$H_j(z) = e^{z^2/2}\,\frac{d^j}{dz^j}\, e^{-z^2/2}. \qquad (20)$$
The coefficients are computed by projecting the density onto the basis functions $H_j(z)$ (recall Hilbert space theory):
$$\eta_j(t, y_0) = \frac{1}{j!} \int H_j(z)\, p_Z^J(z \mid y(0), t_k - t_{k-1})\, dz \qquad (21)$$
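The $H_j$ in (20) are, up to the sign $(-1)^j$, the probabilists' Hermite polynomials $He_j$, which satisfy the recurrence $He_{j+1}(z) = z\,He_j(z) - j\,He_{j-1}(z)$ and are orthogonal under the Gaussian weight with $E[He_i(Z)He_j(Z)] = j!\,\delta_{ij}$. A quick Monte Carlo check (sample size is arbitrary):

```python
import numpy as np

# Build He_j from the three-term recurrence and verify orthogonality
# under the standard Gaussian weight phi(z) by Monte Carlo.
rng = np.random.default_rng(7)
Z = rng.standard_normal(1_000_000)

def hermite(j, z):
    h_prev, h = np.ones_like(z), z
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h = h, z * h - k * h_prev
    return h

G = np.array([[np.mean(hermite(i, Z) * hermite(j, Z)) for j in range(4)]
              for i in range(4)])
print(np.round(G, 2))   # approximately diag(0!, 1!, 2!, 3!) = diag(1,1,2,6)
```

It is this orthogonality that makes the projection (21) a simple integral against $H_j$ rather than a full least-squares problem.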


Practical concerns

◮ The series expansion can be extremely accurate.
◮ The standard approach is to compute $\eta_j$ by Taylor expansion up to order $(t_k - t_{k-1})^K = (\Delta t)^K$
◮ Some restrictions (to so-called 'reducible diffusions') apply when using the method for multivariate diffusions.

slide-38
SLIDE 38

Other alternatives - GMM/EF

What about non-likelihood methods?

◮ The model is governed by some p-dimensional

parameter. Suppose some set of features are important, hl x l 1 q p Compute f x t h1 x t E h1 Xt hq xt E hq Xt (22) and form JN 1 N

N n 1

f x n

T

W 1 N

N n 1

f x n (23)

slide-39
SLIDE 39

Other alternatives - GMM/EF

What about non-likelihood methods?

◮ The model is governed by some p-dimensional

parameter.

◮ Suppose some set of features are important,

hl(x), l = 1, . . . , q ≥ p Compute f x t h1 x t E h1 Xt hq xt E hq Xt (22) and form JN 1 N

N n 1

f x n

T

W 1 N

N n 1

f x n (23)

slide-40
SLIDE 40

Other alternatives - GMM/EF

What about non-likelihood methods?

◮ The model is governed by some p-dimensional parameter.
◮ Suppose some set of features are important, $h_l(x)$, $l = 1, \ldots, q \geq p$
◮ Compute
$$f(x(t); \theta) = \begin{pmatrix} h_1(x(t)) - E_\theta[h_1(X_t)] \\ \vdots \\ h_q(x_t) - E_\theta[h_q(X_t)] \end{pmatrix} \qquad (22)$$
and form
$$J_N(\theta) = \left(\frac{1}{N}\sum_{n=1}^{N} f(x(n); \theta)\right)^T W \left(\frac{1}{N}\sum_{n=1}^{N} f(x(n); \theta)\right) \qquad (23)$$


GMM

The Generalized Method of Moments (GMM) estimator is then given by
$$\hat\theta = \arg\min_\theta J_N(\theta) \qquad (24)$$
It can be shown that
$$\sqrt{N}\left(\hat\theta_N - \theta_0\right) \to N(0, \Sigma) \qquad (25)$$
where
$$\Sigma = \left(\Gamma_N^T\, \Omega_N^{-1}\, \Gamma_N\right)^{-1} \qquad (26)$$
and $\Gamma_N$ and $\Omega_N$ are estimates of
$$\Gamma = E\left[\frac{\partial f(x, \theta)}{\partial \theta^T}\right], \qquad \Omega = \mathrm{Var}\left[f(x, \theta)\right]. \qquad (27)$$
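A sketch of (22)-(24) for a stationary OU model $dX = -\kappa X\,dt + \sigma\,dW$ (an illustrative choice, $p = 2$). The features are the first two moments plus a lag-one product - a slight extension of the pointwise $h_l(x)$ in (22), needed here to identify $\kappa$ - and $W$ is the identity:

```python
import numpy as np
from scipy.optimize import minimize

# GMM for OU: match mean, variance and lag-1 autocovariance.
rng = np.random.default_rng(8)
kappa, sig = 2.0, 0.5
dt, N = 0.1, 50_000

a, s = np.exp(-kappa * dt), np.sqrt(sig ** 2 / (2 * kappa))
X = np.empty(N)
X[0] = s * rng.standard_normal()
for n in range(N - 1):               # exact OU transition
    X[n + 1] = a * X[n] + s * np.sqrt(1 - a ** 2) * rng.standard_normal()

def J(theta):                        # eq. (23) with W = I
    k, sg = theta
    v = sg ** 2 / (2 * k)            # stationary variance under theta
    fbar = np.array([X.mean() - 0.0,
                     (X ** 2).mean() - v,
                     (X[:-1] * X[1:]).mean() - v * np.exp(-k * dt)])
    return fbar @ fbar

theta_gmm = minimize(J, x0=[1.0, 1.0], method="Nelder-Mead").x
print(theta_gmm)   # roughly (kappa, sigma) = (2.0, 0.5)
```

A two-step version would re-run the minimization with $W = \hat\Omega^{-1}$ to reach the efficiency bound in (26).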


Quasi Likelihood

What if we treat the distribution as Gaussian but with correctly specified mean and covariance?

◮ This is called Quasi Likelihood.
◮ Asymptotics can be derived from the GMM asymptotics
◮ (Höök & Lindström, 2016) showed that this can be extremely efficient from a computational point of view, essentially O(1) for N observations.
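For the OU stand-in model the conditional mean and variance are available in closed form, so the quasi-likelihood can be written down directly (and here it even coincides with the exact likelihood, since the OU transition really is Gaussian); all numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Quasi-likelihood for dX = -kappa*X dt + sigma*dW: Gaussian transitions
# with the correctly specified conditional mean and variance.
rng = np.random.default_rng(9)
kappa, sig = 2.0, 0.5
dt, N = 0.1, 20_000

a = np.exp(-kappa * dt)
s2 = sig ** 2 / (2 * kappa) * (1 - np.exp(-2 * kappa * dt))
X = np.zeros(N)
for n in range(N - 1):
    X[n + 1] = a * X[n] + np.sqrt(s2) * rng.standard_normal()

def neg_quasi_loglik(theta):
    k, sg = theta
    m = np.exp(-k * dt) * X[:-1]                        # exact cond. mean
    v = sg ** 2 / (2 * k) * (1 - np.exp(-2 * k * dt))   # exact cond. var
    return 0.5 * np.sum(np.log(2 * np.pi * v) + (X[1:] - m) ** 2 / v)

theta_ql = minimize(neg_quasi_loglik, x0=[1.0, 1.0], method="Nelder-Mead").x
print(theta_ql)
```

Contrast this with the DML sketch earlier: there the Euler moments carried an $O(\Delta t)$ bias, while the correctly specified moments used here remove it.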

References

◮ Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics, 55-71.

◮ Aït-Sahalia, Y. (2002). Maximum Likelihood Estimation of Discretely Sampled Diffusions: A Closed-form Approximation Approach. Econometrica, 70(1), 223-262.

◮ Lindström, E. (2007). Estimating parameters in diffusion processes using an approximate maximum likelihood approach. Annals of Operations Research, 151(1), 269-288.


◮ Durham, G. B., & Gallant, A. R. (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business & Economic Statistics, 20(3), 297-338.

◮ Lindström, E. (2012). A regularized bridge sampler for sparsely sampled diffusions. Statistics and Computing, 22(2), 615-623.

◮ Whitaker, G. A., Golightly, A., Boys, R. J., & Sherlock, C. (2016). Improved bridge constructs for stochastic differential equations. Statistics and Computing, 1-16.

◮ Höök, L. J., & Lindström, E. (2016). Efficient computation of the quasi likelihood function for discretely observed diffusion processes. Computational Statistics & Data Analysis.
