Machine Learning for Multi-step Ahead Forecasting of Volatility



slide-1
SLIDE 1

Machine Learning for Multi-step Ahead Forecasting of Volatility Proxies

Jacopo De Stefani, Ir. - jdestefa@ulb.ac.be
Prof. Gianluca Bontempi - gbonte@ulb.ac.be
Olivier Caelen, PhD - olivier.caelen@worldline.com
Dalila Hattab, PhD - dalila.hattab@equensworldline.com

MIDAS 2017 - ECML-PKDD
Hotel Aleksandar Palace, Skopje, FYROM
Monday 18th September, 2017

slide-2
SLIDE 2

Problem overview

[Figure: CAC40 first series, 2012-01-02 to 2013-11-04: price chart with volume (100,000s) and MACD (12,26,9) panels]

2/32

slide-3
SLIDE 3

What is volatility?

Definition

Volatility is a statistical measure of the dispersion of returns for a given security or market index.

[Figure: returns $r_t$ against t [days], contrasting a high-volatility period with a low-volatility period]

3/32

slide-4
SLIDE 4

A closer look at the data - Volatility proxies

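[Figure: price $P_t$ against t [days] over Calendar Day 0 and Calendar Day 1, marking the opening, high, low and closing prices $P^{(o)}$, $P^{(h)}$, $P^{(l)}$, $P^{(c)}$ of each day, the pre-opening period, and the day fractions $f$ and $1 - f$. The prices $P^{(o)}_t$, $P^{(h)}_t$, $P^{(l)}_t$, $P^{(c)}_t$ feed the volatility proxy, which outputs $\sigma^P_t$.]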

4/32

slide-5
SLIDE 5

Models for volatility

Volatility models

◮ Past volatility
  ◮ Average-based: HA, MA, ES, EWMA, STES
  ◮ Simple Regression: SR-AR, SR-TAR, SR-ARMA
  ◮ Random Walk
◮ ARCH
  ◮ Symmetric: ARCH(q), GARCH(p,q)
  ◮ Asymmetric: EGARCH(p,q), GJR-GARCH(p,q), QGARCH(p,q), ST-GARCH(p,q), RS-GARCH(p,q)
  ◮ Extended: Component-GARCH(p,q), RGARCH(p,q)
◮ Machine Learning
  ◮ Univariate: NN, k-NN, SVR
  ◮ Multivariate

5/32

slide-8
SLIDE 8

Models for volatility

Volatility models

◮ Past volatility
  ◮ Average-based: HA, MA, ES, EWMA, STES
  ◮ Simple Regression: SR-AR, SR-TAR, SR-ARMA
  ◮ Random Walk
◮ ARCH
  ◮ Symmetric: ARCH(q), GARCH(p,q)
  ◮ Asymmetric: EGARCH(p,q), GJR-GARCH(p,q), QGARCH(p,q), ST-GARCH(p,q), RS-GARCH(p,q)
  ◮ Extended: Component-GARCH(p,q), RGARCH(p,q)
◮ Machine Learning
  ◮ Univariate: NN, k-NN, SVR
  ◮ Multivariate

[Overlay labels: Established Research, Future Research]

5/32

slide-9
SLIDE 9

Multistep ahead TS forecasting - Taieb [2014]

Definition

Given a univariate time series $\{y_1, \dots, y_T\}$ comprising $T$ observations, forecast the next $H$ observations $\{y_{T+1}, \dots, y_{T+H}\}$, where $H$ is the forecast horizon.

Hypotheses:

◮ Autoregressive model $y_t = m(y_{t-1}, \dots, y_{t-d}) + \varepsilon_t$ with lag order (embedding) $d$

◮ $\varepsilon_t$ is an i.i.d. stochastic noise term with $\mu_\varepsilon = 0$ and $\sigma^2_\varepsilon = \sigma^2$

6/32

slide-12
SLIDE 12

Multistep ahead forecasting for volatility

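State-of-the-art: NAR (1 input, 1 output)

$[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}] \;\rightarrow\; m(\sigma^P) \;\rightarrow\; [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+H}]$

Proposed model: NARX (2 inputs, 1 output)

$[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}],\ [\sigma^X_{t-d}, \dots, \sigma^X_{t-1}] \;\rightarrow\; m(\sigma^P, \sigma^X) \;\rightarrow\; [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+H}]$

Future work ($M + 1$ inputs, $M + 1$ outputs)

$[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}],\ \dots,\ [\sigma^{X_M}_{t-d}, \dots, \sigma^{X_M}_{t-1}] \;\rightarrow\; m(\sigma^P, \dots, \sigma^{X_M}) \;\rightarrow\; [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+H}],\ \dots,\ [\hat\sigma^{X_M}_t, \dots, \hat\sigma^{X_M}_{t+H}]$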

7/32

slide-13
SLIDE 13

Multistep ahead forecasting for volatility

Direct method

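◮ A separate model $f_h$ is fitted for each horizon $h$.
◮ The forecast at step $h$ is made using the $h$-th model.
◮ Dataset examples ($d = 3$, $h = 3$):

Direct NAR

x                                                        | y
$\sigma^P_3$, $\sigma^P_2$, $\sigma^P_1$                 | $\sigma^P_5$
$\sigma^P_4$, $\sigma^P_3$, $\sigma^P_2$                 | $\sigma^P_6$
...                                                      | ...
$\sigma^P_{T-5}$, $\sigma^P_{T-6}$, $\sigma^P_{T-7}$     | $\sigma^P_{T-2}$

Direct NARX

x                                                                                                               | y
$\sigma^P_3$, $\sigma^P_2$, $\sigma^P_1$, $\sigma^X_3$, $\sigma^X_2$, $\sigma^X_1$                              | $\sigma^P_5$
$\sigma^P_4$, $\sigma^P_3$, $\sigma^P_2$, $\sigma^X_4$, $\sigma^X_3$, $\sigma^X_2$                              | $\sigma^P_6$
...                                                                                                             | ...
$\sigma^P_{T-5}$, $\sigma^P_{T-6}$, $\sigma^P_{T-7}$, $\sigma^X_{T-5}$, $\sigma^X_{T-6}$, $\sigma^X_{T-7}$      | $\sigma^P_{T-2}$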
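
The dataset construction behind the direct strategy can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the helper name `make_direct_dataset` and the toy series are hypothetical, and the target is assumed to lie h steps after the most recent lagged value.

```python
from typing import List, Optional, Sequence, Tuple


def make_direct_dataset(
    target: Sequence[float],
    d: int,
    h: int,
    exog: Optional[Sequence[float]] = None,
) -> Tuple[List[List[float]], List[float]]:
    """Build (x, y) pairs for the direct strategy at one horizon h.

    Each input row holds the d most recent values of `target` (newest first),
    followed by the d most recent values of `exog` when it is given (NARX).
    Assumption: y is the value of `target` h steps after the newest lag.
    """
    if exog is not None and len(exog) != len(target):
        raise ValueError("target and exog must have the same length")
    xs: List[List[float]] = []
    ys: List[float] = []
    # `last` is the index of the newest lagged value in each row.
    for last in range(d - 1, len(target) - h):
        row = [target[last - k] for k in range(d)]
        if exog is not None:
            row += [exog[last - k] for k in range(d)]
        xs.append(row)
        ys.append(target[last + h])
    return xs, ys


# Toy series standing in for sigma^P and sigma^X (made-up values).
sigma_p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sigma_x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

x_nar, y_nar = make_direct_dataset(sigma_p, d=3, h=3)
x_narx, y_narx = make_direct_dataset(sigma_p, d=3, h=3, exog=sigma_x)
```

One such dataset would be built, and one model fitted, for every horizon h from 1 to H; the NARX rows simply append the lagged external proxy to the NAR rows.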

8/32

slide-14
SLIDE 14

Experimental setup

$[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}],\ [\sigma^X_{t-d}, \dots, \sigma^X_{t-1}] \;\rightarrow\; m(\sigma^P, \sigma^X) \;\rightarrow\; [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+H}]$ (2 TS inputs, 1 TS output)

Data: Volatility proxies $\sigma^X$, $\sigma^P$ from CAC40:

◮ Price based
  ◮ $\sigma^i$ family - Garman and Klass [1980]
◮ Return based
  ◮ GARCH (1,1) model - Hansen and Lunde [2005]
  ◮ Sample standard deviation

Models:

◮ Feedforward Neural Networks (NAR, NARX)
◮ k-Nearest Neighbours (NAR, NARX)
◮ Support Vector Regression (NAR, NARX)
◮ Naive (w/o $\sigma^X$)
◮ GARCH(1,1) (w/o $\sigma^X$)
◮ Average (w/o $\sigma^X$)

9/32

slide-15
SLIDE 15

Correlation meta-analysis (cf. Field [2001])

[Figure: correlation matrix (colour scale from −1 to 1) between Volume, $\sigma^1$, $\sigma^6$, $\sigma^4$, $\sigma^5$, $\sigma^2$, $\sigma^3$, $r_t$, $\sigma^0$, $\sigma^{SD}_{250}$, $\sigma^{SD}_{100}$, $\sigma^{SD}_{50}$ and $\sigma^G$]

◮ 40 time series (CAC40)
◮ Time range: 05-01-2009 to 22-10-2014
◮ 1489 OHLC samples per TS
◮ Hierarchical clustering using Ward Jr [1963]
◮ All correlations are statistically significant

10/32

slide-16
SLIDE 16

NARX forecaster - Results ANN

11/32

slide-17
SLIDE 17

NARX forecaster - Results ANN

12/32

slide-18
SLIDE 18

NARX forecaster - Results KNN

13/32

slide-19
SLIDE 19

NARX forecaster - Results KNN

14/32

slide-20
SLIDE 20

NARX forecaster - Results SVR

15/32

slide-21
SLIDE 21

NARX forecaster - Results SVR

16/32

slide-22
SLIDE 22

Conclusions

◮ Correlation clustering among proxies belonging to the same family, i.e. $\sigma^i_t$ and $\sigma^{SD,n}_t$.

◮ All ML methods outperform the reference GARCH method, both in the single-input and the multiple-input configuration.

◮ Only the addition of an external regressor, and only for h > 8, brings a statistically significant improvement (paired t-test, pv = 0.05).

◮ No model appears to clearly outperform all the others on every horizon, but SVR generally performs better than ANN and k-NN.

17/32

slide-23
SLIDE 23

Thank you for your attention!
Any questions/comments?
jacopo.de.stefani@ulb.ac.be
Find the paper at:

18/32

slide-24
SLIDE 24

Bibliography I

Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.

Andy P Field. Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6(2):161, 2001.

Mark B Garman and Michael J Klass. On the estimation of security price volatilities from historical data. Journal of Business, pages 67–78, 1980.

19/32

slide-25
SLIDE 25

Bibliography II

Peter R Hansen and Asger Lunde. A forecast comparison of volatility models: does anything beat a GARCH(1,1)? Journal of Applied Econometrics, 20(7):873–889, 2005.

Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006.

Souhaib Ben Taieb. Machine learning strategies for multi-step-ahead time series forecasting. PhD thesis, 2014.

Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.

20/32

slide-26
SLIDE 26

Appendix

21/32

slide-27
SLIDE 27

System overview

[Diagram: system pipeline]

◮ Data preprocessing: Raw OHLC data → Missing values imputation → Imputed OHLC data → Proxy generation → $\sigma^i_t$, $\sigma^{SD}_t$, $\sigma^G_t$
◮ Correlation analysis
◮ Model choice (user choice): {ANN, KNN}
◮ Evaluation choice (user choice): {RO, RW}
◮ Model identification → $m^*$, $\theta^*$ → Forecaster

22/32

slide-30
SLIDE 30

Correlation analysis - Methodology

[Diagram: for each time series $j = 1, \dots, N$, the proxies $\sigma^i(j)$, $\sigma^{SD}(j)$, $\sigma^G(j)$ pass through corr(·) to give corr($\sigma(j)$); the meta-analysis toolkit combines corr($\sigma(1)$), ..., corr($\sigma(N)$) into corr($\sigma^{AGG}$)]

◮ 40 time series (CAC40)
◮ Time range: 05-01-2009 to 22-10-2014 ⇒ 1489 OHLC samples per TS

23/32

slide-31
SLIDE 31

NARX forecaster - Methodology

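[Diagram: model identification. Elements: the original DGP with disturbances d; past values $\sigma^J_p$ and $\sigma^X_p$ as inputs; future values $\sigma^J_f$ and their forecast $\hat\sigma^J_f$, compared through the error e; structural identification over {ANN, KNN} and parametric identification over {RO, RW}, yielding the parameters $\theta^*$ and the model $m^*(\theta^*, \sigma^J_p, \sigma^X_p)$]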

24/32

slide-32
SLIDE 32

Volatility proxies (1) - Garman and Klass [1980]

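◮ Closing prices

$$\hat\sigma_0(t) = \left[\ln\left(\frac{P^{(c)}_{t+1}}{P^{(c)}_t}\right)\right]^2 = r_t^2 \qquad (1)$$

◮ Opening/Closing prices

$$\hat\sigma_1(t) = \underbrace{\frac{1}{2f} \cdot \left[\ln\left(\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right)\right]^2}_{\text{Nightly volatility}} + \underbrace{\frac{1}{2(1-f)} \cdot \left[\ln\left(\frac{P^{(c)}_t}{P^{(o)}_t}\right)\right]^2}_{\text{Intraday volatility}} \qquad (2)$$

◮ OHLC prices

$$\hat\sigma_2(t) = \frac{1}{2 \ln 4} \cdot \left[\ln\left(\frac{P^{(h)}_t}{P^{(l)}_t}\right)\right]^2 \qquad (3)$$

$$\hat\sigma_3(t) = \underbrace{\frac{a}{f} \cdot \left[\ln\left(\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right)\right]^2}_{\text{Nightly volatility}} + \underbrace{\frac{1-a}{1-f} \cdot \hat\sigma_2(t)}_{\text{Intraday volatility}} \qquad (4)$$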
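
As a rough illustration, Eqs. (1) to (4) translate into Python as below. The function names and the sample prices are hypothetical, and the values chosen for `f` and `a` are arbitrary placeholders rather than settings taken from the slides.

```python
import math


def sigma0(close_t: float, close_next: float) -> float:
    """Eq. (1): squared close-to-close log return."""
    return math.log(close_next / close_t) ** 2


def sigma1(open_t: float, close_t: float, open_next: float, f: float) -> float:
    """Eq. (2): nightly term weighted by 1/(2f), intraday by 1/(2(1 - f))."""
    nightly = math.log(open_next / close_t) ** 2 / (2.0 * f)
    intraday = math.log(close_t / open_t) ** 2 / (2.0 * (1.0 - f))
    return nightly + intraday


def sigma2(high_t: float, low_t: float) -> float:
    """Eq. (3): high/low range estimator."""
    return math.log(high_t / low_t) ** 2 / (2.0 * math.log(4.0))


def sigma3(high_t: float, low_t: float, close_t: float,
           open_next: float, f: float, a: float) -> float:
    """Eq. (4): nightly term plus the rescaled range estimator of Eq. (3)."""
    nightly = a / f * math.log(open_next / close_t) ** 2
    intraday = (1.0 - a) / (1.0 - f) * sigma2(high_t, low_t)
    return nightly + intraday


# Made-up prices for day t and for day t+1 (illustration only).
open_t, high_t, low_t, close_t = 100.0, 104.0, 98.0, 102.0
open_next, close_next = 103.0, 101.0
f, a = 0.3, 0.2  # arbitrary placeholder weights

proxies = {
    "sigma0": sigma0(close_t, close_next),
    "sigma1": sigma1(open_t, close_t, open_next, f),
    "sigma2": sigma2(high_t, low_t),
    "sigma3": sigma3(high_t, low_t, close_t, open_next, f, a),
}
```

Each proxy is a sum of squared log price ratios with positive weights, so all four values are non-negative by construction.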

25/32

slide-33
SLIDE 33

Volatility proxies (2) - Garman and Klass [1980]

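◮ OHLC prices

$$u = \ln\left(\frac{P^{(h)}_t}{P^{(o)}_t}\right) \qquad d = \ln\left(\frac{P^{(l)}_t}{P^{(o)}_t}\right) \qquad c = \ln\left(\frac{P^{(c)}_t}{P^{(o)}_t}\right) \qquad (5)$$

$$\hat\sigma_4(t) = 0.511(u - d)^2 - 0.019[c(u + d) - 2ud] - 0.383c^2 \qquad (6)$$

$$\hat\sigma_5(t) = 0.511(u - d)^2 - (2 \ln 2 - 1)c^2 \qquad (7)$$

$$\hat\sigma_6(t) = \underbrace{\frac{a}{f} \cdot \left[\ln\left(\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right)\right]^2}_{\text{Nightly volatility}} + \underbrace{\frac{1-a}{1-f} \cdot \hat\sigma_4(t)}_{\text{Intraday volatility}} \qquad (8)$$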
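
Eqs. (5) to (8) can be sketched in the same way. The names below are hypothetical, the prices are made up, and `f` and `a` are again arbitrary placeholder weights.

```python
import math
from typing import Tuple


def normalised_logs(open_t: float, high_t: float,
                    low_t: float, close_t: float) -> Tuple[float, float, float]:
    """Eq. (5): log high, low and close relative to the day's open."""
    u = math.log(high_t / open_t)
    d = math.log(low_t / open_t)
    c = math.log(close_t / open_t)
    return u, d, c


def sigma4(u: float, d: float, c: float) -> float:
    """Eq. (6)."""
    return 0.511 * (u - d) ** 2 - 0.019 * (c * (u + d) - 2.0 * u * d) - 0.383 * c ** 2


def sigma5(u: float, d: float, c: float) -> float:
    """Eq. (7), with the coefficients as printed on the slide."""
    return 0.511 * (u - d) ** 2 - (2.0 * math.log(2.0) - 1.0) * c ** 2


def sigma6(u: float, d: float, c: float, close_t: float,
           open_next: float, f: float, a: float) -> float:
    """Eq. (8): nightly term plus the rescaled estimator of Eq. (6)."""
    nightly = a / f * math.log(open_next / close_t) ** 2
    intraday = (1.0 - a) / (1.0 - f) * sigma4(u, d, c)
    return nightly + intraday


# Made-up prices for day t and the next open (illustration only).
u, d, c = normalised_logs(open_t=100.0, high_t=104.0, low_t=98.0, close_t=102.0)
s4 = sigma4(u, d, c)
s5 = sigma5(u, d, c)
s6 = sigma6(u, d, c, close_t=102.0, open_next=103.0, f=0.3, a=0.2)
```

When the next open equals the current close, the nightly term vanishes and the sixth proxy reduces to the rescaled fourth one.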

26/32

slide-34
SLIDE 34

Volatility proxies (3)

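◮ GARCH (1,1) model - Hansen and Lunde [2005]

$$\sigma^G_t = \sqrt{\omega + \sum_{j=1}^{p} \beta_j (\sigma^G_{t-j})^2 + \sum_{i=1}^{q} \alpha_i \varepsilon^2_{t-i}}$$

where $\varepsilon_{t-i} \sim N(0, 1)$, with the coefficients $\omega$, $\alpha_i$, $\beta_j$ fitted according to Bollerslev [1986].

◮ Sample standard deviation

$$\sigma^{SD,n}_t = \sqrt{\frac{1}{n-1} \sum_{i=0}^{n-1} (r_{t-i} - \bar r_n)^2} \quad \text{where} \quad r_t = \ln\left(\frac{P^{(c)}_t}{P^{(c)}_{t-1}}\right), \quad \bar r_n = \frac{1}{n} \sum_{j=t-n+1}^{t} r_j$$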
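
The sample standard deviation proxy is easy to sketch as a rolling computation over log returns. The helper names are hypothetical, and the code assumes that the mean and the squared deviations are taken over the same window of n returns.

```python
import math
from typing import List, Sequence


def log_returns(close: Sequence[float]) -> List[float]:
    """r_t = ln(P_t / P_{t-1}) for consecutive closing prices."""
    return [math.log(close[t] / close[t - 1]) for t in range(1, len(close))]


def rolling_sample_sd(returns: Sequence[float], n: int) -> List[float]:
    """Sample standard deviation of the last n returns, for every t that has
    a full window. Assumption: mean and deviations share the same window."""
    if n < 2:
        raise ValueError("n must be at least 2")
    out: List[float] = []
    for t in range(n - 1, len(returns)):
        window = returns[t - n + 1 : t + 1]
        mean = sum(window) / n
        out.append(math.sqrt(sum((r - mean) ** 2 for r in window) / (n - 1)))
    return out


# Made-up closing prices (illustration only).
closes = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0]
sd_3 = rolling_sample_sd(log_returns(closes), n=3)
```

With five returns and n = 3, the result contains three values, one per complete window.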

27/32

slide-35
SLIDE 35

Hyndman and Koehler [2006] - Error measures

Error measures

◮ Scale independent: MAPE, MdAPE, RMSPE, RMdSPE, sMAPE, sMdAPE
◮ Scale dependent: MSE, RMSE, MAE, MdAE
◮ Relative errors: MRAE, MdRAE, GMRAE, MASE
◮ Relative measures: RelX, Percent Better

28/32

slide-36
SLIDE 36

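Hyndman and Koehler [2006] - Scale dependent

Scale dependent: MSE, RMSE, MAE, MdAE

$e_t = y_t - \hat y_t$

◮ MSE: $\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat y_t)^2$

◮ RMSE: $\sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat y_t)^2}$

◮ MAE: $\frac{1}{n} \sum_{t=1}^{n} |y_t - \hat y_t|$

◮ MdAE: $\mathrm{Md}_{t \in \{1, \dots, n\}}(|y_t - \hat y_t|)$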
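
A minimal sketch of the four scale-dependent measures, using only the Python standard library. The function names and the sample values are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """e_t = y_t - yhat_t."""
    return [y - f for y, f in zip(actual, forecast)]


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    e = errors(actual, forecast)
    return sum(x * x for x in e) / len(e)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return math.sqrt(mse(actual, forecast))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    e = errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def mdae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return median([abs(x) for x in errors(actual, forecast)])


# Made-up observations and forecasts (illustration only).
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.0, 2.0, 4.0]
scores = {
    "MSE": mse(actual, forecast),    # 0.3125
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),    # 0.375
    "MdAE": mdae(actual, forecast),  # 0.25
}
```

All four are expressed in the units of the series (or its square for MSE), which is why they cannot be compared across series of different scale.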

29/32

slide-37
SLIDE 37

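Hyndman and Koehler [2006] - Scale independent

Scale independent: MAPE, MdAPE, RMSPE, RMdSPE, sMAPE, sMdAPE

◮ MAPE: $\frac{1}{n} \sum_{t=1}^{n} \left| 100 \cdot \frac{y_t - \hat y_t}{y_t} \right|$

◮ MdAPE: $\mathrm{Md}_{t \in \{1, \dots, n\}}\left( \left| 100 \cdot \frac{y_t - \hat y_t}{y_t} \right| \right)$

◮ RMSPE: $\sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( 100 \cdot \frac{y_t - \hat y_t}{y_t} \right)^2}$

◮ RMdSPE: $\sqrt{\mathrm{Md}_{t \in \{1, \dots, n\}}\left( \left( 100 \cdot \frac{y_t - \hat y_t}{y_t} \right)^2 \right)}$

◮ sMAPE: $\frac{1}{n} \sum_{t=1}^{n} 200 \cdot \frac{|y_t - \hat y_t|}{y_t + \hat y_t}$

◮ sMdAPE: $\mathrm{Md}_{t \in \{1, \dots, n\}}\left( 200 \cdot \frac{|y_t - \hat y_t|}{y_t + \hat y_t} \right)$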
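
The percentage-based measures can be sketched along the same lines. The helper names and sample values are hypothetical, and the code assumes strictly positive observations and forecasts so that no denominator is zero.

```python
import math
from statistics import median
from typing import List, Sequence


def percentage_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """p_t = 100 * (y_t - yhat_t) / y_t."""
    return [100.0 * (y - f) / y for y, f in zip(actual, forecast)]


def symmetric_terms(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """200 * |y_t - yhat_t| / (y_t + yhat_t)."""
    return [200.0 * abs(y - f) / (y + f) for y, f in zip(actual, forecast)]


def mape(actual, forecast) -> float:
    p = percentage_errors(actual, forecast)
    return sum(abs(x) for x in p) / len(p)


def mdape(actual, forecast) -> float:
    return median([abs(x) for x in percentage_errors(actual, forecast)])


def rmspe(actual, forecast) -> float:
    p = percentage_errors(actual, forecast)
    return math.sqrt(sum(x * x for x in p) / len(p))


def rmdspe(actual, forecast) -> float:
    return math.sqrt(median([x * x for x in percentage_errors(actual, forecast)]))


def smape(actual, forecast) -> float:
    s = symmetric_terms(actual, forecast)
    return sum(s) / len(s)


def smdape(actual, forecast) -> float:
    return median(symmetric_terms(actual, forecast))


# Made-up observations and forecasts (illustration only).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
mape_value = mape(actual, forecast)    # percentage errors are -10, 10 and 0
smape_value = smape(actual, forecast)
```

The percentage errors here are undefined when an observation is zero, which matters for volatility proxies that can get very close to zero.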

30/32

slide-38
SLIDE 38

Hyndman and Koehler [2006] - Relative errors

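Relative Errors: MRAE, MdRAE, GMRAE, MASE

$r_t = \frac{e_t}{e^*_t}$, where $e^*_t$ is the error of the benchmark method

◮ MRAE: $\frac{1}{n} \sum_{t=1}^{n} |r_t|$

◮ MdRAE: $\mathrm{Md}_{t \in \{1, \dots, n\}}(|r_t|)$

◮ GMRAE: $\left( \prod_{t=1}^{n} |r_t| \right)^{1/n}$

◮ MASE: $\frac{1}{T} \sum_{t=1}^{T} \left( \frac{|e_t|}{\frac{1}{T-1} \sum_{i=2}^{T} |Y_i - Y_{i-1}|} \right)$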
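
A small sketch of the relative-error measures. The names and sample values are hypothetical; the benchmark forecast is an assumed input, and its errors must be non-zero for the ratios to exist (GMRAE additionally needs non-zero relative errors because it takes logarithms).

```python
import math
from statistics import median
from typing import List, Sequence


def relative_errors(actual: Sequence[float], forecast: Sequence[float],
                    benchmark: Sequence[float]) -> List[float]:
    """r_t = e_t / e*_t, with e*_t the error of the benchmark forecast."""
    return [(y - f) / (y - b) for y, f, b in zip(actual, forecast, benchmark)]


def mrae(r: Sequence[float]) -> float:
    return sum(abs(x) for x in r) / len(r)


def mdrae(r: Sequence[float]) -> float:
    return median([abs(x) for x in r])


def gmrae(r: Sequence[float]) -> float:
    """Geometric mean of |r_t|, computed through logarithms."""
    return math.exp(sum(math.log(abs(x)) for x in r) / len(r))


def mase(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error scaled by the mean absolute one-step difference."""
    n = len(actual)
    scale = sum(abs(actual[i] - actual[i - 1]) for i in range(1, n)) / (n - 1)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / (n * scale)


# Made-up observations, forecasts and benchmark forecasts (illustration only).
actual = [10.0, 12.0, 11.0, 13.0]
forecast = [11.0, 11.0, 12.0, 12.0]
benchmark = [8.0, 14.0, 9.0, 15.0]

r = relative_errors(actual, forecast, benchmark)  # every ratio is -0.5
mase_value = mase(actual, forecast)               # 1 / (5/3) = 0.6
```

Values below 1 indicate that the analysed forecast has smaller errors than the benchmark (or, for MASE, than the one-step naive differences).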

31/32

slide-39
SLIDE 39

Hyndman and Koehler [2006] - Relative measures

Relative Measures: RelX, Percent Better

◮ RelX: $\frac{X}{X_b}$

◮ Percent Better: $PB(X) = 100 \cdot \frac{1}{n} \sum_{\text{forecasts}} I(X < X_b)$

where

◮ $X$: error measure of the analyzed method

◮ $X_b$: error measure of the benchmark
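
Both relative measures reduce to a few lines of Python. The function names and the error values below are hypothetical, and each list is assumed to hold one error value per forecast.

```python
from typing import Sequence


def rel_x(x: float, x_bench: float) -> float:
    """RelX: error measure of the method divided by that of the benchmark."""
    return x / x_bench


def percent_better(x_values: Sequence[float], bench_values: Sequence[float]) -> float:
    """PB: percentage of forecasts where the method's error is strictly
    smaller than the benchmark's error."""
    if len(x_values) != len(bench_values):
        raise ValueError("both sequences must have the same length")
    wins = sum(1 for x, b in zip(x_values, bench_values) if x < b)
    return 100.0 * wins / len(x_values)


# Made-up error values for four forecasts (illustration only).
method_errors = [0.2, 0.5, 0.3, 0.4]
bench_errors = [0.3, 0.4, 0.3, 0.6]

pb = percent_better(method_errors, bench_errors)  # 2 wins out of 4
ratio = rel_x(0.2, 0.4)
```

Ties do not count as wins here because the indicator uses a strict inequality.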

32/32