Measuring price and volatility from high-frequency stock prices
Siem Jan Koopman, Free University Amsterdam, Netherlands, with Borus Jungbacker (Amsterdam) and Eugenie Hol (ING)
Presentation at Mathematisches Forschungsinstitut Oberwolfach
Outline of presentation
Motivation is to forecast stock index volatility.
- Empirical results of a forecast study
- Forecasts based on high frequency data are superior
- Methods for measuring volatility using high frequency data
- Model for measuring price and volatility for tick-by-tick data
Data
Standard & Poor's 100 (S&P 100) stock index transaction prices during the period from 6 January 1997 to 14 November 2003.
Daily return $R_n$ is the first difference between log closing prices,
$$R_n = 100(\ln P_n - \ln P_{n-1}), \quad n = 1, \ldots, N,$$
where $P_n$ is the closing asset price at trading day $n$.
Intraday (5-minute) return is taken between successive log prices,
$$R_{n,d} = 100(\ln P_{n,d} - \ln P_{n,d-1}), \quad n = 1, \ldots, N, \quad d = 1, \ldots, D,$$
where $P_{n,d}$ is the asset price at trading day $n$ and 5-minute period $d$.
Overnight return is
$$R_{n,0} = 100(\ln P_{n,0} - \ln P_{n-1,D}).$$
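The return definitions can be illustrated with a minimal Python sketch, assuming NumPy and a plain array of prices. The function names and the toy prices are hypothetical and not part of the presentation.

```python
import numpy as np

def pct_log_returns(prices):
    """Return 100 * (ln P_k - ln P_{k-1}) for consecutive prices."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))

def overnight_return(open_price, previous_close):
    """Return R_{n,0} = 100 * (ln P_{n,0} - ln P_{n-1,D})."""
    return 100.0 * (np.log(open_price) - np.log(previous_close))

# Toy closing prices (made up for illustration only).
closes = [100.0, 101.0, 99.5]
daily = pct_log_returns(closes)          # two daily returns from three closes
gap = overnight_return(100.5, 100.0)     # open above the previous close
```

The same `pct_log_returns` function applies to a day's 5-minute price grid, in which case it yields the intraday returns $R_{n,d}$.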
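Data
Realised volatility is computed as
$$\tilde\sigma_n^2 = R_{n,0}^2 + \sum_{d=1}^{D} R_{n,d}^2, \quad n = 1, \ldots, N,$$
but the overnight return is special, so it is better to take account of this:
$$\tilde\sigma_n^2 = \frac{\hat\sigma_{oc}^2 + \hat\sigma_{co}^2}{\hat\sigma_{oc}^2} \sum_{d=1}^{D} R_{n,d}^2,$$
where
$$\hat\sigma_{oc}^2 = \frac{10{,}000}{N} \sum_{n=1}^{N} (\log P_{n,D} - \log P_{n,0})^2, \qquad \hat\sigma_{co}^2 = \frac{10{,}000}{N} \sum_{n=1}^{N} (\log P_{n,0} - \log P_{n-1,D})^2.$$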
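Both versions of realised volatility can be sketched in Python as follows. This is an illustration under assumptions: the array layout (one row per day, column 0 the open, last column the close), the function name and the simulated prices are all hypothetical.

```python
import numpy as np

def realised_variance(log_prices, prev_close):
    """Simple and overnight-scaled realised variance per day.

    log_prices: array (N, D+1) of intraday log prices, column 0 = open, column D = close.
    prev_close: array (N,) with the previous day's closing log price.
    """
    lp = np.asarray(log_prices, dtype=float)
    pc = np.asarray(prev_close, dtype=float)
    intraday = 100.0 * np.diff(lp, axis=1)        # R_{n,d}, d = 1..D
    overnight = 100.0 * (lp[:, 0] - pc)           # R_{n,0}
    sum_sq = (intraday ** 2).sum(axis=1)
    rv_simple = overnight ** 2 + sum_sq
    # 10,000/N * sum of squared log differences equals the mean of squared percentage returns.
    var_oc = np.mean((100.0 * (lp[:, -1] - lp[:, 0])) ** 2)   # open-to-close
    var_co = np.mean(overnight ** 2)                          # close-to-open
    rv_scaled = (var_oc + var_co) / var_oc * sum_sq
    return rv_simple, rv_scaled

# Simulated toy data: each day is expressed relative to a previous close of zero.
rng = np.random.default_rng(0)
N, D = 250, 78
prev_close = np.zeros(N)
opens = rng.normal(0.0, 0.005, size=N)
steps = rng.normal(0.0, 0.001, size=(N, D))
lp = opens[:, None] + np.concatenate([np.zeros((N, 1)), np.cumsum(steps, axis=1)], axis=1)
rv_simple, rv_scaled = realised_variance(lp, prev_close)
```

The scaled version inflates the intraday sum of squares by a constant factor, so a single noisy overnight return does not dominate any one day's estimate.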
Implied volatility $s_n^2$ is obtained from the Chicago Board Options Exchange Market Volatility Index (VIX), which is based on a highly liquid options market. The VIX index is calculated from midpoint bid-ask option prices using a binomial method that takes into account the level and timing of dividend payments. The Black-Scholes assumption of constant volatility introduces bias into the implied volatility measure, but the magnitude of the bias is small for near-the-money and close-to-maturity options.
Data is based on the S&P 100 stock index for the period between 6 January 1997 and 14 November 2003 (1725 observations).
Summary statistics of return and volatility time series

              daily return         realised vol.                                  implied vol.
              R_n       R_n^2      \tilde\sigma_n^2   \log\tilde\sigma_n^2   s_n^2     \log s_n^2
Mean          0.020     1.889      0.920              −0.612                 26.46     3.253
Stand.Dev.    1.374     4.058      1.359              0.981                  5.998     0.208
Skewness      −0.122    7.918      5.109              0.245                  1.266     0.744
Exc.Kurt.     5.621     110.8      39.80              0.524                  1.482     0.135
Minimum       −8.994               0.004              −5.484                 16.84     2.834
Maximum       5.702     80.89      15.38              2.733                  50.48     3.922
[Figure: Data, 1997 to 2003: $R_n$, $R_n^2$, $\tilde\sigma_n^2$, $\log\tilde\sigma_n^2$, $s_n^2$, $\log s_n^2$ (row-wise).]
Volatility modelling
Consider spot price $P(t)$ with return defined as
$$R(t) = \log P(t) - \log P(0), \quad t > 0,$$
which follows the continuous time process
$$dR(t) = \mu(t)\,dt + \sigma(t)\,dW(t), \quad t > 0,$$
where $\mu(t)$ is the drift process, $\sigma(t)$ is the spot volatility and $W(t)$ is standard Brownian motion. Mean and variance of spot volatility are given by
$$E[\sigma^2(t)] = \xi, \qquad \mathrm{var}[\sigma^2(t)] = \omega^2.$$
The actual volatility for the $n$-th day interval of length $h$ is then defined as
$$\sigma_n^2 = \sigma^*(hn) - \sigma^*((n-1)h), \quad \text{where} \quad \sigma^*(t) = \int_0^t \sigma^2(s)\,ds.$$
OU type models for SV
It is established that realised volatility (rv) $\tilde\sigma_n^2$ is an accurate estimator of actual volatility (av) $\sigma_n^2$. Barndorff-Nielsen and Shephard (2002) have studied the statistical properties of this estimator and its error $\sigma_n^2 - \tilde\sigma_n^2$. They also conclude that a model for spot volatility $\sigma^2(t)$ can significantly improve estimation of actual volatility. A candidate model for $\sigma^2(t)$ is based on the superposition of OU processes $\tau_j(t)$, that is
$$\sigma^2(t) = \sum_{j=1}^{J} \tau_j(t), \qquad d\tau_j(t) = -\lambda_j \tau_j(t)\,dt + dz_j(\lambda_j t), \qquad (1)$$
where $z_j(t)$ is an independent Lévy process (with non-negative increments, known as a subordinator) and $\lambda_j$ is unknown.
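Barndorff-Nielsen and Shephard (2001, 2002): The SDE defining $\tau_j(t)$ implies its acf to be
$$\mathrm{corr}\{\tau_j(t), \tau_j(t+s)\} = e^{-\lambda_j |s|}.$$
Assuming $E(\tau_j(t)) = w_j \xi$ and $\mathrm{var}(\tau_j(t)) = w_j \omega^2$, the acf for $\sigma^2(t)$ is
$$\mathrm{corr}\{\sigma^2(t), \sigma^2(t+s)\} = \sum_{j=1}^{J} w_j e^{-\lambda_j |s|}.$$
It follows that the acf of the $j$-th component of av,
$$\tau_n^j \equiv \int_{(n-1)h}^{nh} \tau_j(t)\,dt,$$
is
$$\mathrm{corr}(\tau_n^j, \tau_{n+m}^j) = \frac{(1 - e^{-\lambda_j h})^2}{2(e^{-\lambda_j h} - 1 + \lambda_j h)}\, e^{-\lambda_j h (m-1)}, \quad m = 1, 2, \ldots,$$
where $h$ is the length of the day interval.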
These convenient "BNS" results imply that the $\tau_n^j$ have ARMA(1,1) representations:
$$\tau_{n+1}^j = w_j \xi + \phi_j(\tau_n^j - w_j \xi) + \theta_j \eta_{n-1}^j + \eta_n^j, \qquad \eta_n^j \sim WN(0, \sigma_{\eta_j}^2),$$
where $WN(0, \sigma^2)$ refers to a white noise process with zero mean and variance $\sigma^2$. It follows that the autoregressive parameter $\phi_j$ equals $e^{-\lambda_j h}$, while Barndorff-Nielsen and Shephard (2003) show that
$$\theta_j = \frac{1 - \sqrt{1 - 4\vartheta_j^2}}{2\vartheta_j}, \quad \text{with} \quad \vartheta_j = \frac{\mathrm{corr}(\tau_n^j, \tau_{n+1}^j) - \phi_j}{(1 + \phi_j^2) - 2\phi_j\,\mathrm{corr}(\tau_n^j, \tau_{n+1}^j)}.$$
Finally, the key to modelling realised volatility in this way is the set of results in Barndorff-Nielsen and Shephard (2001), see next slide.
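As a numerical illustration of the mapping from $\lambda_j$ to the ARMA(1,1) coefficients, the following Python sketch evaluates the formulas above for one component. The function name and the value $\lambda = 0.5$ are assumptions chosen for illustration; the final lines check that the implied ARMA(1,1) lag-one autocorrelation reproduces the one derived from the OU process.

```python
import math

def ou_arma_params(lam, h=1.0):
    """Map the OU decay rate lam to (phi, rho1, theta) of the ARMA(1,1) for integrated tau."""
    phi = math.exp(-lam * h)
    # Lag-one autocorrelation of the integrated component (m = 1 in the acf formula).
    rho1 = (1.0 - phi) ** 2 / (2.0 * (phi - 1.0 + lam * h))
    vartheta = (rho1 - phi) / ((1.0 + phi ** 2) - 2.0 * phi * rho1)
    # Invertible root of vartheta * theta^2 - theta + vartheta = 0.
    theta = (1.0 - math.sqrt(1.0 - 4.0 * vartheta ** 2)) / (2.0 * vartheta)
    return phi, rho1, theta

phi, rho1, theta = ou_arma_params(0.5)   # lam = 0.5 is an illustrative value
# Standard ARMA(1,1) lag-one autocorrelation implied by (phi, theta).
rho1_arma = (1.0 + phi * theta) * (phi + theta) / (1.0 + 2.0 * phi * theta + theta ** 2)
```

For $\lambda = 0.5$ and $h = 1$ this gives $\phi \approx 0.607$ and $\theta \approx 0.264$.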
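Define the error $u_n = \sigma_n^2 - \tilde\sigma_n^2$. BNS establish it to be Gaussian with mean zero and variance
$$\sigma_u^2 = 2D \left\{ (\xi h / D)^2 + \sum_{j=1}^{J} \frac{2 w_j \omega^2}{\lambda_j^2} \left( e^{-\lambda_j h / D} - 1 + \lambda_j h / D \right) \right\},$$
where $D$ is the number of intra-daily intervals used to calculate $\tilde\sigma_n^2$.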
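The error variance formula is easy to evaluate numerically. The sketch below is an illustration only: the function name and all parameter values are hypothetical. It shows that $\sigma_u^2$ shrinks roughly like $2h^2(\xi^2 + \omega^2)/D$ as the number of intraday intervals $D$ grows.

```python
import math

def rv_error_variance(xi, omega2, weights, lambdas, D, h=1.0):
    """Variance of u_n for a superposition of OU components sampled D times per day."""
    delta = h / D                                  # length of one intraday interval
    acc = (xi * delta) ** 2
    for w, lam in zip(weights, lambdas):
        acc += 2.0 * w * omega2 / lam ** 2 * (math.exp(-lam * delta) - 1.0 + lam * delta)
    return 2.0 * D * acc

# Illustrative values (not estimates from the presentation).
xi, omega2 = 0.5, 0.25
v13 = rv_error_variance(xi, omega2, [1.0], [0.1], D=13)   # 30-minute returns
v78 = rv_error_variance(xi, omega2, [1.0], [0.1], D=78)   # 5-minute returns
```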
So the rv model becomes (assuming the av model is valid)
$$\tilde\sigma_n^2 = \sum_{j=1}^{J} \tau_n^j + u_n, \qquad \tau_n^j \sim \text{ARMA}(1,1), \qquad u_n \sim NIID(0, \sigma_u^2),$$
which is an unobserved ARMA components model. The model can be formulated in state space form to be estimated and to compute forecasts. Note that the model is not Gaussian due to $\eta_n^j$.
Long memory ARFIMA model for rv
Empirical work on realised volatility points out that $\tilde\sigma_n^2$ exhibits long memory features. This is more so when logs are taken. The suggestion is to model rv by an ARFIMA model, see Andersen, Bollerslev, Diebold and Labys (2001, 2003). The ARFIMA(1, d, 1) model with mean $\mu$ is given by
$$(1 - \phi L)(1 - L)^d (\tilde\sigma_n^2 - \mu) = (1 + \theta L)\varepsilon_n,$$
where $d$, $\phi$ and $\theta$ are unknown and $\varepsilon_n$ is assumed Gaussian WN. Estimation is carried out by maximum likelihood and related procedures also provide forecasts, see Sowell (1992). For computational details, see Doornik and Ooms (2003).
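The fractional difference operator $(1 - L)^d$ in the ARFIMA model expands into an infinite lag polynomial with weights $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k - 1 - d)/k$. The Python sketch below computes the truncated weights and applies them to a series. It illustrates the operator only, not the maximum likelihood estimation referred to above, and the function names are hypothetical.

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of the expansion of (1 - L)^d."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def frac_diff(x, d):
    """Apply (1 - L)^d to x, truncating the expansion at the start of the sample."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    # For each k, x[k::-1] is x[k], x[k-1], ..., x[0], matching w[0], ..., w[k].
    return np.array([np.dot(w[: k + 1], x[k::-1]) for k in range(len(x))])

w_long_memory = frac_diff_weights(0.4, 4)   # d = 0.4 is an illustrative value
```

With $d = 1$ the weights reduce to $(1, -1, 0, \ldots)$, the ordinary first difference; for $0 < d < 0.5$ they decay slowly, which is the source of the long memory.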
Volatility models for daily returns
The SV model is based on the continuous-time process for returns. By discretising the return process at daily intervals and by assuming an AR process for log-volatility, we obtain
$$R_n = \mu + \sigma_n \varepsilon_n, \qquad \varepsilon_n \sim NID(0, 1),$$
$$\sigma_n^2 = \sigma^{*2} \exp(h_n),$$
$$h_{n+1} = \phi h_n + \sigma_\eta \eta_n, \qquad \eta_n \sim NID(0, 1), \qquad h_1 \sim NID(0, \sigma_\eta^2 / \{1 - \phi^2\}),$$
for $n = 1, \ldots, N$, where $\mu$ is taken to be fixed and zero. The likelihood function can be constructed using simulation methods such as the ones developed by Shephard and Pitt (1997) and Durbin and Koopman (1997). For application to SV models, see also Sandmann and Koopman (1998). Note that similar methods can also be used to estimate SV models with leverage, see Koopman and Shephard (2003).
The GARCH model can also be considered for daily returns and in its simplest form is given by
$$R_n = \sigma_n \varepsilon_n, \qquad \varepsilon_n \sim NID(0, 1), \qquad n = 1, \ldots, N,$$
$$\sigma_n^2 = \omega + \alpha R_{n-1}^2 + \beta \sigma_{n-1}^2,$$
with parameter restrictions $\omega > 0$, $\alpha \geq 0$, $\beta \geq 0$ and $\alpha + \beta < 1$. The techniques of estimation and forecasting for this model are well established.
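The GARCH(1,1) variance recursion can be sketched in a few lines of Python. This is an illustration under assumptions: the function name is hypothetical, and starting the recursion at the unconditional variance $\omega / (1 - \alpha - \beta)$ is a common convention rather than something stated on the slides.

```python
import numpy as np

def garch_variance(returns, omega, alpha, beta):
    """Conditional variances sigma2[0..N], where sigma2[N] is the one-step-ahead forecast."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r) + 1)
    sigma2[0] = omega / (1.0 - alpha - beta)   # assumed start: unconditional variance
    for n in range(len(r)):
        sigma2[n + 1] = omega + alpha * r[n] ** 2 + beta * sigma2[n]
    return sigma2

# Toy returns and parameter values (illustrative only).
sigma2 = garch_variance([1.0, -2.0], omega=0.1, alpha=0.1, beta=0.8)
```

The last element of `sigma2` is the forecast for the day after the sample, which is the quantity compared against realised volatility in a forecasting study.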
SV and GARCH models can be extended by including volatility measures in the volatility equation.
SV with explanatory variable:
$$h_n = \gamma \log s_{n-1}^2 + \eta_n^*, \qquad \eta_n^* = \phi \eta_{n-1}^* + \sigma_\eta \eta_n.$$
GARCH with explanatory variable:
$$\sigma_n^2 = \omega + \alpha R_{n-1}^2 + \beta \sigma_{n-1}^2 + \gamma s_{n-1}^2.$$
Here $s_n^2$ can be realised volatility (rv) or implied volatility (iv).
Forecasting results: evaluated against $\tilde\sigma_m^2$, for the Standard & Poor's 100 with evaluation period from 17 October 2001 to 14 November 2003.

Forecast loss functions
Model           MSE     MAE     HMSE    HMAE    R^2
UC 1            1.248   0.613   2.495   1.240   0.522
UC 2            0.996   0.505   1.546   0.792   0.593
ARFIMA          0.991   0.508   1.610   0.813   0.598
ARFIMA (log)    1.149   0.472   1.030   0.555   0.597
SV              2.433   1.240   5.080   2.948   0.386
SVX rv          2.256   1.037   3.368   2.063   0.437
SVX iv          3.132   1.082   3.422   2.048   0.343
GARCH           2.837   1.348   5.339   3.174   0.405
GARCHX rv       3.134   1.228   4.603   2.738   0.421
GARCHX iv       2.872   1.297   5.079   2.720   0.419
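The slides do not define the loss functions in the table. The Python sketch below uses one common set of definitions, which is an assumption: MSE and MAE on the forecast error, and the heteroskedasticity-adjusted versions HMSE and HMAE on the relative error $1 - \hat\sigma^2 / \tilde\sigma^2$. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def forecast_losses(forecast, realised):
    """MSE, MAE and heteroskedasticity-adjusted HMSE, HMAE (assumed definitions)."""
    f = np.asarray(forecast, dtype=float)
    a = np.asarray(realised, dtype=float)
    err = f - a
    rel = 1.0 - f / a            # relative error; requires realised > 0
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "HMSE": float(np.mean(rel ** 2)),
        "HMAE": float(np.mean(np.abs(rel))),
    }

# Toy forecasts and realised volatilities (made up for illustration).
losses = forecast_losses([1.0, 2.0], [2.0, 2.0])
```

The adjusted versions weight errors relative to the level of volatility, so a miss on a quiet day counts as much as a proportionally equal miss on a turbulent day.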
[Figure: Volatility forecasts: GARCH, SV, RV-UC, RV-ARFIMA, 2002 to 2003.]
[Figure: Volatility forecasts: GARCH, SV, RV-UC, RV-ARFIMA, September 2002 to November 2002.]
More elaborate discussions and results are presented in the paper, Koopman, Jungbacker and Hol (2004). Overall, it shows that realised volatility models are superior in volatility forecasting. The next aim is to improve estimation of actual volatility by not "pre-filtering" the returns data: measuring price and volatility from high-frequency stock prices.
Measuring realised volatility
High-frequency price data (tick by tick) is subject to irregularities in recording and to market micro-structure. Current practice in computing realised volatility is to construct 5-minute returns and compute rv from these. However, such data can be messy and a regular daily series of 5-minute returns is not always available. Also, bid-ask spreads in the data can be huge (ways to capture these require a lot of extra data and modelling). One approach to obtaining a regular set of 5-minute returns is to linearly interpolate between bid-ask bounces, as in Andersen, Bollerslev, Diebold and Ebens (2001). More flexible spline interpolations are used by Hansen and Lunde (2003), and Fourier methods are used by Malliavin and Mancino (2002) and Barucci and Reno (2002). First we adopt a model-based version of these interpolations.
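Spline model in state space representation
Consider the smoothing problem where the log price $p(t)$ is a continuous function of $t > 0$. To smooth $p(t)$ by a function $\mu(t)$, we observe tick prices (bids and asks) $p(t_i)$ for $i = 1, \ldots, n$, where $0 < t_1 < \ldots < t_n < T$ ($t_i$ is a tick). We can choose $\mu(t)$ to be a twice-differentiable function on $(0, T)$ which minimises
$$\sum_{i=1}^{n} [p(t_i) - \mu(t_i)]^2 + \lambda \int_0^T \left( \frac{\partial^2 \mu(t)}{\partial t^2} \right)^2 dt.$$
This problem can be represented as a state space model
$$p(t) = \mu(t) + \varepsilon(t), \quad t = t_1, \ldots, t_n, \qquad \varepsilon(t_i) \sim N[0, \sigma^2(t_i)],$$
with state equation
$$d \begin{pmatrix} \mu(t) \\ \nu(t) \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \mu(t) \\ \nu(t) \end{pmatrix} dt + \sigma_\zeta \begin{pmatrix} 0 \\ dW(t) \end{pmatrix}.$$
In discrete time, we obtain the model $p_i = \mu_i + \varepsilon_i$ with
$$\begin{pmatrix} \mu_{i+1} \\ \nu_{i+1} \end{pmatrix} = \begin{pmatrix} 1 & \delta_i \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_i \\ \nu_i \end{pmatrix} + \begin{pmatrix} \xi_i \\ \zeta_i \end{pmatrix}, \quad i = 1, \ldots, n,$$
where the disturbances are Gaussian and correlated with each other.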
Here $\delta_i$ is the distance in seconds between consecutive tick prices (it can be zero!). The variance of $\zeta_i$, as a ratio of the variance of $\varepsilon_i$, equals $q = 1/\lambda$. This model can be used to smooth out the micro-structure in tick prices and to obtain a regular set of 5-minute quotes from which rv can be computed, for example. In standard smoothing, $q$ (or $\lambda$) is fixed. Here, we estimate $q$ for each day by standard maximum likelihood methods using the Kalman filter (see www.ssfpack.com). It turns out that the $q$ estimates are very close to rv, up to a constant.
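The discrete-time spline model can be filtered with a few lines of NumPy. The sketch below is not the SsfPack implementation used in the presentation: the function names are hypothetical, the diffuse initial state is approximated by a large variance `big`, and the measurement variance is held constant. The disturbance covariance for a gap $\delta$ is the standard result for an integrated random walk, $q \sigma_\varepsilon^2$ times $[[\delta^3/3, \delta^2/2], [\delta^2/2, \delta]]$.

```python
import numpy as np

def spline_system(delta, q):
    """Transition matrix and relative disturbance covariance for a time gap delta."""
    T = np.array([[1.0, delta], [0.0, 1.0]])
    Q = q * np.array([[delta ** 3 / 3.0, delta ** 2 / 2.0],
                      [delta ** 2 / 2.0, delta]])
    return T, Q

def kalman_filter_spline(p, times, q, sigma2_eps=1.0, big=1e7):
    """Filtered level mu_i given p_1..p_i for the cubic spline state space model."""
    p = np.asarray(p, dtype=float)
    times = np.asarray(times, dtype=float)
    a = np.zeros(2)                 # state (level, slope)
    P = big * np.eye(2)             # large variance approximates a diffuse prior
    filtered = np.empty(len(p))
    for i in range(len(p)):
        # Update step: only the level is observed.
        v = p[i] - a[0]
        F = P[0, 0] + sigma2_eps
        K = P[:, 0] / F
        a = a + K * v
        P = P - np.outer(K, P[0, :])
        filtered[i] = a[0]
        # Prediction step over the (possibly zero) gap to the next tick.
        if i + 1 < len(p):
            T, Q = spline_system(times[i + 1] - times[i], q)
            a = T @ a
            P = T @ P @ T.T + sigma2_eps * Q
    return filtered

# Noise-free linear toy "prices": the filter should track them almost exactly.
t = np.arange(20.0)
p_lin = 2.0 + 0.5 * t
level = kalman_filter_spline(p_lin, t, q=0.01)
```

Estimating $q$ by maximum likelihood would wrap this recursion in an optimiser over the prediction errors `v` and their variances `F`. A zero gap gives an identity transition and no state disturbance, so repeated time stamps are handled without special treatment.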
[Figure: Realised volatility: $\log \tilde\sigma_n^2$ and estimated q's (logged).]
[Figure: Realised volatility: $\log \tilde\sigma_n^2$ versus q's (logged). Scatter plot with axes: realised volatility (in logs), estimated smoothing parameter q (in logs).]
Such results are encouraging. As we take a closer look at some daily data patterns, it becomes clear that tick prices at the opening and closure of the trading day have higher variation than during the main trading hours. Therefore we extend the spline model with different q’s within the day: we allow smoothing parameter to be a spline itself ! Let’s look at an example of one day of tick prices.
[Figure: Tick prices in one day with smoothed spline and errors. Panels: tick price and price process (spline); measurement error; smoothing parameter spline; price innovation.]
The spline model may not be satisfactory since the theoretical model would have
$$p(t) = \mu(t) + \varepsilon(t), \quad t = t_1, \ldots, t_n, \qquad \varepsilon(t_i) \sim N[0, \sigma^2(t_i)],$$
with state equation $d\mu(t) = \sigma(t)\,dW(t)$. In discrete time, we obtain the model
$$p_i = \mu_i + \varepsilon_i, \qquad \mu_{i+1} = \mu_i + q_i \sigma_\eta \eta_i,$$
where $\eta_i$ is WN with $\mathrm{var}(\eta_i) = \delta_i \sigma_\eta^2$.
At the opening and closure of the trading day, $q_i$ is higher: volatility seasonality.
[Figure: Tick prices in one day with theoretical price and errors. Panels: tick price and price process (spline); measurement error; smoothing parameter spline; price innovation.]
This model produces small measurement noise, but it is likely that there is serial correlation in the error due to market micro-structure. This can be captured by including an AR(1) component in the price equation. By standardising the one-step ahead prediction errors of the decomposition model, we effectively deseasonalise the intra-day returns.
[Figure: Tick prices with predicted spline and returns. Panels: tick price and predicted price; standardised prediction errors (RETURNS); tv q.]
We can now concentrate on the stochastic volatility in
$$d\mu(t)^* = \sigma(t)\,dW(t),$$
where $\mu(t)^*$ refers to the process of $\mu(t)$ corrected for seasonal heteroskedasticity. Remaining dynamic volatility can be captured by the stochastic volatility model.
We can model the constructed intraday returns by the discretised SV model,
$$R_i = \mu + \sigma_i \varepsilon_i, \qquad \varepsilon_i \sim NID(0, 1),$$
$$\sigma_i^2 = \sigma^{*2} \exp(h_i),$$
$$h_{i+1} = \phi h_i + \sigma_\eta \eta_i, \qquad \eta_i \sim NID(0, 1), \qquad h_1 \sim NID(0, \sigma_\eta^2 / \{1 - \phi^2\}).$$
However, we deal with tick returns rather than day returns. The SV model can be generalised by formulating $h_i$ as a sum of stationary ARMA components.
[Figure: Absolute returns and estimated actual volatility. Panels: absolute tick returns and estimated actual volatility; log volatility.]
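The SV model:
$$R_i = \mu + \sigma_i \varepsilon_i, \qquad \varepsilon_i \sim NID(0, 1),$$
$$\sigma_i^2 = \sigma^{*2} \exp(h_i),$$
$$h_{i+1} = \phi h_i + \sigma_\eta \eta_i, \qquad \eta_i \sim NID(0, 1), \qquad h_1 \sim NID(0, \sigma_\eta^2 / \{1 - \phi^2\}).$$
The estimates were given by $\hat\phi = 0.84$, $\hat\sigma_\eta^2 = 0.141$, $\hat\sigma^{*2} = 0.881$. Computations were done with SsfPack (www.ssfpack.com).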
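To get a feel for what these estimates imply, the discretised SV model can be simulated at the reported parameter values. The Python sketch below is an illustration only: the function name, the sample length and the seed are assumptions, and it simulates from the model rather than reproducing the estimation.

```python
import numpy as np

def simulate_sv(n, phi, sigma2_eta, sigma2_star, mu=0.0, seed=0):
    """Simulate n returns and variances from the discretised SV model."""
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    # Draw h_1 from the stationary distribution of the AR(1) log-volatility.
    h[0] = rng.normal(0.0, np.sqrt(sigma2_eta / (1.0 - phi ** 2)))
    for i in range(n - 1):
        h[i + 1] = phi * h[i] + np.sqrt(sigma2_eta) * rng.normal()
    sigma2 = sigma2_star * np.exp(h)
    returns = mu + np.sqrt(sigma2) * rng.normal(size=n)
    return returns, sigma2

# Parameter values taken from the estimates reported above; n = 1300 ticks is illustrative.
returns, sigma2 = simulate_sv(1300, phi=0.84, sigma2_eta=0.141, sigma2_star=0.881)
```

With $\phi = 0.84$ a volatility shock has a half-life of about four ticks, since $0.84^4 \approx 0.5$.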
[Figure: Just one more example: tick prices and returns. Panels: tick price and predicted price; standardised prediction errors (RETURNS); tv q.]
SV estimates: $\hat\phi = 0.90$, $\hat\sigma_\eta^2 = 0.062$, $\hat\sigma^{*2} = 0.969$.
[Figure: two panels for this second example day.]
These empirical results have motivated us to formalise this by setting up a model for tick prices that requires no pre-filtering of the tick data.
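In discrete form, the model will possibly be
$$p_i = \mu_i + \sigma_i \varepsilon_i, \qquad \mu_{i+1} = \mu_i + \sigma_i q_i \zeta_i, \qquad \log \sigma_i = h_i = \text{sum of ARMA components}.$$
Note that $\sigma_i$ is common to both $\varepsilon_i$ and $\zeta_i$, which effectively means that $\sigma_i$ is the standard deviation of the innovations (that is, of the returns).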
Model:
$$p_i = \mu_i + \sigma_i \varepsilon_i, \qquad \mu_{i+1} = \mu_i + \sigma_i q_i \zeta_i, \qquad \log \sigma_i = h_i = \text{sum of ARMA components}.$$
- $\sigma_i$ is the (deseasonalised) standard deviation of returns as shown in previous examples;
- $q_i$ is the seasonal volatility within the day, which can be restricted to be the same for all days;
- $\mu_i$ follows a random walk here; if this does not produce sufficient smoothness to eliminate microstructure, we turn to higher-order smoothness models;
- the daily actual volatility is $\sum_i \sigma_i^2 q_i^2$.
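The tick model and the daily actual volatility it implies can be sketched by simulation. Everything specific in the Python block below is an assumption made for illustration: the function names, the U-shaped intraday pattern for $q_i$, the constant $\sigma_i$ and the number of ticks.

```python
import numpy as np

def simulate_tick_model(sigma, q, mu0=0.0, seed=0):
    """Simulate tick prices p_i = mu_i + sigma_i*eps_i with mu_{i+1} = mu_i + sigma_i*q_i*zeta_i."""
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(sigma)
    mu = np.empty(n)
    mu[0] = mu0
    for i in range(n - 1):
        mu[i + 1] = mu[i] + sigma[i] * q[i] * rng.normal()
    p = mu + sigma * rng.normal(size=n)      # observed price = efficient price + noise
    return p, mu

def daily_actual_volatility(sigma, q):
    """Daily actual volatility: the sum over ticks of sigma_i^2 * q_i^2."""
    sigma = np.asarray(sigma, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(sigma ** 2 * q ** 2))

# Hypothetical U-shaped intraday seasonality: q is higher at the open and the close.
n = 1000
x = np.linspace(0.0, 1.0, n)
q_seasonal = 0.01 * (1.0 + 4.0 * (x - 0.5) ** 2)
sigma_const = np.ones(n)
p_sim, mu_sim = simulate_tick_model(sigma_const, q_seasonal)
av = daily_actual_volatility(sigma_const, q_seasonal)
```

In the full model $\sigma_i$ would itself be stochastic, driven by the sum of ARMA components for $h_i$; holding it constant here isolates the contribution of the intraday seasonal pattern.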
It is not straightforward to estimate this model since both mean and variance are stochastic. However, the model fits in the framework of Koopman and Bos (2004, JBES) and can be estimated by