SLIDE 1 Lecture on Parameter Estimation for Stochastic Differential Equations
Erik Lindström
SLIDE 2
Recap
◮ We are interested in the parameters θ in the Stochastic Integral Equation
X(t) = X(0) + ∫₀ᵗ µθ(s, X(s)) ds + ∫₀ᵗ σθ(s, X(s)) dW(s)   (1)
Why?
◮ Model validation
◮ Risk management
◮ Advanced hedging (Greeks 9.2.2 and quadratic hedging 9.2.2.1 (P/Q))
SLIDE 6
Some asymptotics
Consider the arithmetic Brownian motion
dX(t) = µ dt + σ dW(t)   (2)
The drift is estimated by computing the mean and compensating for the sampling interval δ = tn+1 − tn:
ˆµ = (1/(δN)) ∑_{n=0}^{N−1} (X(tn+1) − X(tn)).   (3)
Expanding this expression reveals that the MLE is given by
ˆµ = (X(tN) − X(t0))/(tN − t0) = µ + σ (W(tN) − W(t0))/(tN − t0).   (4)
The MLE for the diffusion (σ) parameter is given by
ˆσ² = (1/(δ(N − 1))) ∑_{n=0}^{N−1} (X(tn+1) − X(tn) − ˆµδ)²  →d  σ² χ²(N − 1)/(N − 1).   (5)
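As a numerical check of the estimators (3)-(5), they can be evaluated on a simulated path of the arithmetic Brownian motion; a minimal sketch (the parameter values are our own choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate arithmetic Brownian motion dX = mu dt + sigma dW on a regular grid.
mu, sigma = 0.5, 2.0
delta, N = 0.01, 100_000
increments = mu * delta + sigma * rng.normal(0.0, np.sqrt(delta), N)
X = np.concatenate(([0.0], np.cumsum(increments)))   # X(t_0), ..., X(t_N)

# Drift MLE, eq. (3): mean of the increments, compensated for the sampling.
mu_hat = np.sum(increments) / (delta * N)

# Eq. (4): the drift MLE depends on the data only through the endpoints.
assert np.isclose(mu_hat, (X[-1] - X[0]) / (delta * N))

# Diffusion MLE, eq. (5): approximately sigma^2 * chi2(N-1)/(N-1).
sigma2_hat = np.sum((increments - mu_hat * delta) ** 2) / (delta * (N - 1))
```

Note that, consistent with eq. (4), taking more samples on a fixed horizon does not improve the drift estimate; only the diffusion estimate benefits from high-frequency sampling.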
SLIDE 10
A simple method
Many data sets are sampled at high frequency, making the bias due to discretization of the SDE by some of the schemes in Chapter 12 acceptable. The simplest discretization, the explicit Euler method, would for the stochastic differential equation
dX(t) = µ(t, X(t)) dt + σ(t, X(t)) dW(t)   (6)
correspond to the Discretized Maximum Likelihood (DML) estimator given by
ˆθDML = arg max_{θ∈Θ} ∑_{n=1}^{N−1} log φ(X(tn+1), X(tn) + µ(tn, X(tn))∆, Σ(tn, X(tn))∆)   (7)
where φ(x, m, P) is the density of a multivariate Normal distribution with argument x, mean m and covariance P, and
Σ(t, X(t)) = σ(t, X(t)) σ(t, X(t))ᵀ.   (8)
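A minimal sketch of the DML estimator (7). The model used here, an Ornstein-Uhlenbeck process, and all parameter values are our own illustrative choices, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Illustrative model (our choice): Ornstein-Uhlenbeck, dX = -theta X dt + sigma dW,
# i.e. mu(t, x) = -theta x and Sigma(t, x) = sigma^2 in eqs. (6)-(8).
theta_true, sigma_true, dt, N = 1.0, 0.5, 0.01, 20_000
X = np.empty(N + 1)
X[0] = 1.0
m_fac = np.exp(-theta_true * dt)                    # exact OU transition moments
v_ou = sigma_true**2 * (1 - np.exp(-2 * theta_true * dt)) / (2 * theta_true)
for n in range(N):
    X[n + 1] = rng.normal(X[n] * m_fac, np.sqrt(v_ou))

def neg_dml(params):
    """Negative Euler (DML) log likelihood of eq. (7): Gaussian density with
    mean X_n + mu(t_n, X_n) dt and variance Sigma(t_n, X_n) dt."""
    theta, sigma = params
    if sigma <= 0.0:
        return np.inf
    resid = X[1:] - (X[:-1] - theta * X[:-1] * dt)
    var = sigma**2 * dt
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid**2 / var)

theta_hat, sigma_hat = minimize(neg_dml, x0=[0.5, 1.0], method="Nelder-Mead").x
```

With ∆ = 0.01 the discretization bias is modest here, but the next slide explains why this cannot be relied upon in general.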
SLIDE 12
Consistency
◮ The DMLE is generally NOT consistent.
◮ Approximate ML estimators (13.5) are, provided enough computational resources are allocated:
◮ Simulation based estimators
◮ Fokker-Planck based estimators
◮ Series expansions.
◮ GMM-type estimators (13.6) are consistent if the moments are correctly specified (which is a non-trivial problem!)
SLIDE 15
Simulation based estimators
◮ Discretely observed SDEs are Markov processes
◮ Then it follows that
pθ(xt|xs) = Eθ[pθ(xt|xτ)|F(s)],  t > τ > s   (9)
This is the Pedersen algorithm.
◮ Improved by Durham & Gallant (2002), Lindström (2012) and Whitaker et al. (2016), among others.
◮ Works very well for multivariate models!
◮ and is easily (...) extended to Lévy driven SDEs.
SLIDE 19
Some key points
◮ A naive implementation only provides a pointwise estimate; use CRNs or importance sampling
◮ Variance reduction helps (antithetic variates, control variates)
◮ The near optimal importance sampler is a bridge process, as it reduces variance AND improves the asymptotics.
◮ There is a version that is completely bias free, albeit somewhat restrictive in terms of the class of models.
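A naive (pointwise) Monte Carlo evaluation of eq. (9) can be sketched as follows. Arithmetic Brownian motion is used so the estimate can be compared with the exact transition density; the model and all tuning constants are our own choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Pedersen's estimator of p(x_t | x_s), eq. (9): simulate M Euler paths from
# x_s up to an intermediate time tau = t - h, then average the one-step Euler
# (Gaussian) density from X(tau) to x_t.
mu, sigma = 0.2, 1.0
x_s, x_t, t = 0.0, 0.5, 1.0
K, M = 8, 100_000                 # K Euler sub-steps, M simulated paths
h = t / K

x = np.full(M, x_s)
for _ in range(K - 1):            # Euler steps up to tau = t - h
    x += mu * h + sigma * np.sqrt(h) * rng.normal(size=M)

p_hat = norm.pdf(x_t, loc=x + mu * h, scale=sigma * np.sqrt(h)).mean()
p_exact = norm.pdf(x_t, loc=x_s + mu * t, scale=sigma * np.sqrt(t))
```

Rerunning this for a different θ with fresh random numbers gives a noisy, non-smooth likelihood surface, which is exactly why common random numbers or importance sampling are needed in practice.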
SLIDE 20
Fokker-Planck
Consider the expectation
E[h(X(t))|F(0)] = ∫ h(x(t)) p(x(t)|x(0)) dx(t)   (10)
and then
∂/∂t E[h(X(t))|F(0)]   (11)
There are two possible ways to compute this: directly, and using the Itô formula. Equating these yields
∂p/∂t (x(t)|x(0)) = A⋆ p(x(t)|x(0))   (12)
where
A⋆ p(x(t)) = −∂/∂x(t) (µ(·) p(x(t))) + (1/2) ∂²/∂x²(t) (σ²(·) p(x(t))).   (13)
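For constant µ and σ (our choice of example), eqs. (12)-(13) can be solved with a basic explicit finite-difference scheme and compared against the known Gaussian solution; the grid, step sizes, and the narrow-Gaussian stand-in for the point mass at x(0) are all our own choices:

```python
import numpy as np

# Explicit finite-difference solve of the Fokker-Planck equation (12)-(13)
# for constant mu and sigma, checked against the exact Gaussian solution.
mu, sigma = 0.1, 0.5
x = np.linspace(-3.0, 3.0, 601)
dx = x[1] - x[0]
t0, T = 0.05, 0.5                      # start at t0 > 0 with a resolved density

def exact(t):
    return np.exp(-(x - mu * t) ** 2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)

p = exact(t0)
dt = 0.2 * dx**2 / sigma**2            # well inside the explicit stability limit
steps = int((T - t0) / dt)
for _ in range(steps):
    d1 = np.zeros_like(p)
    d2 = np.zeros_like(p)
    d1[1:-1] = (p[2:] - p[:-2]) / (2 * dx)               # central first derivative
    d2[1:-1] = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2    # central second derivative
    p = p + dt * (-mu * d1 + 0.5 * sigma**2 * d2)        # eq. (13) with constant coeffs

p_exact = exact(t0 + steps * dt)
```

Higher-order schemes (e.g. the Padé approximations compared later) reduce the number of grid points and time steps needed for a given accuracy.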
SLIDE 23 Example of the Fokker-Planck equation
From (Lindström, 2007)
(Surface plot of p(s, xs; t, xt) over the time and state grid.)
Figure: Fokker-Planck equation computed for the CKLS process
SLIDE 24 Comments on the PDE approach
Generally better than the Monte Carlo method in low dimensional problems.
(Log-log plot of MAE versus computation time for Durham-Gallant, Poulsen, 2nd order Padé(1,1) and 4th order Padé(2,2).)
Figure: Comparing Monte Carlo, 2nd order and 4th order numerical approximations of the Fokker-Planck equation
SLIDE 25
Discussion
◮ Fokker-Planck is the preferred method if the state space is non-trivial (see Pedersen et al., 2011)
◮ Successfully used in 1-d and 2-d problems
◮ but the "curse of dimensionality" will eventually make the method infeasible
SLIDE 28
Series expansion
◮ The solution to the Fokker-Planck equation when
dX(t) = µ dt + σ dW(t)   (14)
is p(x(t)|x(0)) = N(x(t); x(0) + µt, σ²t).
◮ Hermite polynomials are the orthogonal polynomial basis when using a Gaussian as weight function.
◮ This is used in the 'series expansion approach', see e.g. (Aït-Sahalia, 2002, 2008)
SLIDE 31
Key ideas
Transform from X → Y → Z where Z is approximately standard Gaussian. We assume that
dX(t) = µ(X(t)) dt + σ(X(t)) dW(t)   (15)
First step:
Y(t) = ∫^{X(t)} du/σ(u)   (16)
It then follows that
dY(t) = µY(Y(t)) dt + dW(t)   (17)
Second step: transform
Z(tk) = (Y(tk) − Y(tk−1)) / √(tk − tk−1).   (18)
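As a worked instance of the first transformation step (16)-(17), take geometric Brownian motion (our choice of example):

```latex
% For dX(t) = \mu X(t)\,dt + \sigma X(t)\,dW(t), i.e. \sigma(x) = \sigma x,
% the first step (16) gives
Y(t) = \int^{X(t)} \frac{du}{\sigma u} = \frac{\log X(t)}{\sigma},
% and Ito's formula yields the unit-diffusion dynamics of eq. (17):
dY(t) = \Big(\frac{\mu}{\sigma} - \frac{\sigma}{2}\Big)\,dt + dW(t),
\qquad \mu_Y(y) = \frac{\mu}{\sigma} - \frac{\sigma}{2}.
```

The diffusion coefficient of Y is identically one, so the increments of Y scaled by √(tk − tk−1) as in (18) are close to standard Gaussian for small sampling intervals.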
SLIDE 35
Expansion
A Hermite expansion for the density pZ at order J is given by
p^J_Z(z|y(0), tk − tk−1) = φ(z) ∑_{j=0}^{J} ηj(tk − tk−1, y0) Hj(z)   (19)
where
Hj(z) = e^{z²/2} (d^j/dz^j) e^{−z²/2}.   (20)
The coefficients are computed by projecting the density onto the basis functions Hj(z) (recall Hilbert space theory):
ηj(t, y0) = (1/j!) ∫ Hj(z) p^J_Z(z|y(0), tk − tk−1) dz.   (21)
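The basis functions (20) and the orthogonality relation that makes the projection (21) work can be verified symbolically; a sketch using sympy:

```python
import sympy as sp

z = sp.symbols('z')

def H(j):
    """Hermite basis function of eq. (20): H_j(z) = e^{z^2/2} d^j/dz^j e^{-z^2/2}."""
    return sp.simplify(sp.exp(z**2 / 2) * sp.diff(sp.exp(-z**2 / 2), z, j))

phi = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)      # standard Gaussian weight

# Orthogonality behind eq. (21): the integral of H_j H_k phi equals j! when
# j == k and 0 otherwise, so integrating H_j against the expansion (19)
# isolates the single coefficient eta_j (up to the factor 1/j!).
gram = [[sp.integrate(H(j) * H(k) * phi, (z, -sp.oo, sp.oo))
         for k in range(4)] for j in range(4)]
```

For instance H(1) = −z and H(2) = z² − 1, and the Gram matrix is diagonal with entries j!.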
SLIDE 37
Practical concerns
◮ The series expansion can be extremely accurate.
◮ The standard approach is to compute ηj by Taylor expansion up to order (tk − tk−1)^K = (∆t)^K
◮ Some restrictions (so-called 'reducible diffusions') apply when using the method for multivariate diffusions.
SLIDE 38
Other alternatives - GMM/EF
What about non-likelihood methods?
◮ The model is governed by some p-dimensional parameter.
◮ Suppose some set of features are important, hl(x), l = 1, . . . , q ≥ p
◮ Compute
f(x(t); θ) = (h1(x(t)) − Eθ[h1(Xt)], . . . , hq(x(t)) − Eθ[hq(Xt)])ᵀ   (22)
◮ and form
JN(θ) = ((1/N) ∑_{n=1}^{N} f(x(n); θ))ᵀ W ((1/N) ∑_{n=1}^{N} f(x(n); θ))   (23)
SLIDE 41
GMM
The Generalized Method of Moments (GMM) estimator is then given by
ˆθ = arg min JN(θ)   (24)
It can be shown that
√N (ˆθN − θ0) → N(0, Σ)   (25)
where
Σ = (ΓNᵀ ΩN⁻¹ ΓN)⁻¹   (26)
and ΓN and ΩN are estimates of
Γ = E[∂f(x, θ)/∂θᵀ],   Ω = Var[f(x, θ)].   (27)
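A minimal sketch of the criterion (23)-(24) for arithmetic Brownian motion with q = p = 2 moment conditions and W = I; the feature choice and all parameter values are our own:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# GMM sketch for increments y_n = X(t_{n+1}) - X(t_n) of arithmetic Brownian
# motion: features h1(y) = y and h2(y) = y^2, with model-implied moments
# E[h1] = mu*delta and E[h2] = sigma^2*delta + (mu*delta)^2, so q = p = 2.
mu, sigma, delta, N = 0.5, 2.0, 0.1, 50_000
y = rng.normal(mu * delta, sigma * np.sqrt(delta), N)

W = np.eye(2)                               # weighting matrix in eq. (23)

def J(params):
    """GMM criterion (23): quadratic form in the averaged moment conditions."""
    m, s = params
    f = np.array([np.mean(y) - m * delta,
                  np.mean(y**2) - (s**2 * delta + (m * delta) ** 2)])
    return f @ W @ f

mu_hat, sigma_hat = minimize(J, x0=[0.0, 1.0], method="Nelder-Mead").x
```

With q = p the system is exactly identified and the choice of W is irrelevant; for q > p an estimate of Ω⁻¹ as weighting matrix gives the efficient GMM estimator of eqs. (25)-(27).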
SLIDE 43
Quasi Likelihood
What if we treat the distribution as Gaussian, but with correctly specified mean and covariance?
◮ This is called Quasi Likelihood.
◮ Asymptotics can be derived from the GMM asymptotics.
◮ (Höök & Lindström, 2016) showed that this can be extremely efficient from a computational point of view, essentially O(1) for N.
SLIDE 44 References
◮ Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics, 55-71.
◮ Aït-Sahalia, Y. (2002). Maximum Likelihood Estimation of Discretely Sampled Diffusions: A Closed-form Approximation Approach. Econometrica, 70(1), 223-262.
◮ Lindström, E. (2007). Estimating parameters in
diffusion processes using an approximate maximum likelihood approach. Annals of Operations Research, 151(1), 269-288.
SLIDE 45 References
◮ Durham, G. B. & Gallant, A. R. (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business & Economic Statistics, 20(3), 297-338.
◮ Lindström, E. (2012). A regularized bridge
sampler for sparsely sampled diffusions. Statistics and Computing, 22(2), 615-623.
◮ Whitaker, G. A., Golightly, A., Boys, R. J., &
Sherlock, C. (2016). Improved bridge constructs for stochastic differential equations. Statistics and Computing, 1-16.
◮ Höök, L. J., & Lindström, E. (2016). Efficient
computation of the quasi likelihood function for discretely observed diffusion processes. Computational Statistics & Data Analysis.
SLIDE 46