Financial High Frequency Data, Per Mykland, University of Chicago


SLIDE 1

What is High Frequency Data? Likelihood Connection

Financial High Frequency Data

Per Mykland, University of Chicago, October 2012

Mykland Financial High Frequency Data

SLIDE 2

Outline

1. What is High Frequency Data?
  • The data
  • Basic statistical inference

2. Likelihood Connection
  • General Connection
  • Some Applications

SLIDE 3

What is High Frequency Data? Likelihood Connection The data Basic statistical inference

High Frequency Data

In our case: financial prices and/or volumes.

Intra-day:
  • transactions tick-by-tick, from TAQ, Reuters, etc.
  • quotes (bid, ask), from the same sources
  • limit order books: harder to get, but more information
  • stocks, bonds, futures, currencies, ...
  • low latency data

Close to continuous observation:
  • up to several observations per second

SLIDE 4

Example of Transaction Data (medium density data)

Merck excerpt, April 4, 2005:

Symbol  Date      Time     Price  Size
MRK     20050405  9:41:37  32.69   100
MRK     20050405  9:41:42  32.68   100
MRK     20050405  9:41:43  32.69   300
MRK     20050405  9:41:44  32.68  1000
MRK     20050405  9:41:48  32.69  2900
MRK     20050405  9:41:48  32.68   200
MRK     20050405  9:41:48  32.68   200
MRK     20050405  9:41:51  32.68  4200
MRK     20050405  9:41:52  32.69  1000
MRK     20050405  9:41:53  32.68   300
MRK     20050405  9:41:57  32.69   200
MRK     20050405  9:42:03  32.67  2500
MRK     20050405  9:42:04  32.69   100
MRK     20050405  9:42:05  32.69   300
MRK     20050405  9:42:15  32.68  3500
MRK     20050405  9:42:17  32.69   800
MRK     20050405  9:42:17  32.68   500
MRK     20050405  9:42:17  32.68   300
MRK     20050405  9:42:17  32.68   100
MRK     20050405  9:42:20  32.69  6400

Total of 6302 Merck transactions on that day.

On same day: 80982 Microsoft (MSFT) transactions.

Four years later, on April 6, 2009:
  • 63846 Merck transactions
  • 144842 MSFT transactions
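To make the record layout concrete, here is a minimal parsing sketch. It assumes whitespace-separated fields in the order symbol, date (YYYYMMDD), time, price, trade size; the field names and the `Trade` class are hypothetical and do not come from the slides or from any data vendor's format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trade:
    # Field names are assumptions made for this illustration.
    symbol: str
    timestamp: datetime
    price: float
    size: int


def parse_trade(line: str) -> Trade:
    """Parse one record such as 'MRK 20050405 9:41:37 32.69 100'."""
    symbol, date, time, price, size = line.split()
    timestamp = datetime.strptime(f"{date} {time}", "%Y%m%d %H:%M:%S")
    return Trade(symbol, timestamp, float(price), int(size))


# First four records of the excerpt on this slide.
lines = [
    "MRK 20050405 9:41:37 32.69 100",
    "MRK 20050405 9:41:42 32.68 100",
    "MRK 20050405 9:41:43 32.69 300",
    "MRK 20050405 9:41:44 32.68 1000",
]
trades = [parse_trade(s) for s in lines]
total_volume = sum(t.size for t in trades)
seconds_spanned = (trades[-1].timestamp - trades[0].timestamp).total_seconds()
```

Timestamps at one-second resolution mean several trades can share a timestamp, as in the 9:41:48 and 9:42:17 records, so any downstream estimator has to decide how to treat ties.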

SLIDE 5

Evolution of Data Size

[Figure: number of Merck transactions on the first Monday in April, plotted by year (1995 to 2010). Vertical axis: mrk, ticks from 20000 to 80000.]

SLIDE 6

Evolution of Data Size

[Figure: number of Merck transactions on the first Monday in April, plotted by year (1995 to 2010), on a log scale. Vertical axis: log(mrk), ticks from 8 to 11.]

SLIDE 7

CME at Midnight

E-mini SP500 Futures, May 3, 2007. Total of 62659 trades on that day.

Time      CentiSec  Quantity  Price
00:00:42  47          1       150150
00:01:04  69          1       150150
00:01:04  80          1       150150
00:01:05  64          1       150150
00:01:06  56          1       150150
00:01:32  09         20       150150
00:01:32  09         34       150150
00:01:52  24          1       150150
00:02:32  03         10       150150
00:02:58  43          1       150175
00:02:58  43          1       150175
00:02:58  43          1       150175
00:02:58  43          5       150175
00:02:58  43          1       150175
00:02:58  43          1       150175
00:03:42  75          1       150150
00:03:43  20          1       150150
00:04:22  75          1       150150
00:04:24  39          1       150150

SLIDE 8

CME in the Morning

E-mini SP500 Futures, May 3, 2007. Total of 62659 trades on that day.

Time      CentiSec  Quantity  Price
10:00:00  25          1       150850
10:00:00  25         29       150850
10:00:00  45          1       150850
10:00:00  87         10       150850
10:00:01  73         50       150850
10:00:01  87         37       150850
10:00:01  88        463       150850
10:00:01  95          1       150850
10:00:01  95          1       150850
10:00:01  95         48       150850
10:00:01  95          2       150850
10:00:01  95          1       150850
10:00:01  95          3       150850
10:00:01  95          2       150850
10:00:01  98          1       150850
10:00:01  98          4       150850
10:00:01  98          2       150850
10:00:01  98          3       150850
10:00:02  04          5       150850

SLIDE 9

Turning Data into Knowledge

Modern quantitative finance uses high frequency constructions in stochastic processes:
  • to price assets, underlying and derivative
  • to construct trading strategies

The high frequency data are the empirical realization of the same processes.

The data open a new angle on quantitative finance:
  • better estimators and models
  • well crafted daily summaries (relationship to sufficiency)
  • combination with longer horizon macroeconomic data
  • a complement to cross-sectionally based (implied) quantities
  • unification of econometrics, risk management, and quantitative finance?
  • a new way of having fun with semimartingales

SLIDE 10

Direct Impact of Estimating Intraday or "Spot" Quantities

  • Asset management, portfolio optimization
  • Empirical or conservative options hedging
  • Risk management
  • Early detection of abrupt changes in market conditions
  • Better trade execution
  • Input to longer run models

This is relevant both from the public and private perspective.

SLIDE 11

Statistical Inference in High Frequency Data

Natural to use the same model as in quantitative finance, the Itô process:
  • log securities price: $X_t = X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dB_s$
  • $B_t$ is Brownian motion; $\mu_t$ and $\sigma_t$ can be random processes
  • the model can also include jumps (different but related results)

High frequency data formalism:
  • up to several transactions per second, sampling times $0 = t_0 < t_1 < \dots < t_n = T$
  • time period of analysis $[0, T]$: one day (5 min, 2 weeks)
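The model on this slide can be simulated on a discrete grid, which is handy for testing estimators. The following is a minimal sketch using an Euler scheme on an equidistant grid; the constant drift, the deterministic U-shaped volatility curve and the function name `simulate_log_price` are illustrative assumptions, not part of the talk.

```python
import numpy as np


def simulate_log_price(n, T=1.0, x0=0.0, mu=0.05, seed=0):
    """Euler scheme for dX_t = mu dt + sigma_t dB_t on n equidistant steps.

    sigma_t is a deterministic U-shaped curve chosen only for illustration;
    in the slide's model mu_t and sigma_t may be random processes.
    Returns the grid t (n+1 points), the path X (n+1 points) and
    sigma evaluated at the left endpoint of each step (n points).
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    sigma = 0.2 * (1.0 + 0.5 * np.cos(2.0 * np.pi * t[:-1] / T))
    dB = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments
    dX = mu * dt + sigma * dB                   # Euler increments of X
    X = x0 + np.concatenate(([0.0], np.cumsum(dX)))
    return t, X, sigma


# One "day" with one observation per second of a 6.5 hour session.
t, X, sigma = simulate_log_price(n=23400)
```

Because sigma is evaluated at the left endpoint of each step, the simulated path is exactly of the piecewise-constant-volatility type, with every step its own block.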

SLIDE 12

Quantities that can be Estimated in Data from One Day

  • Classical target, integrated volatility:
    $\langle X, X\rangle_T = \int_0^T \sigma_t^2\,dt = \lim_{\Delta t \to 0} \sum_{t_{i+1} \le T} (X_{t_{i+1}} - X_{t_i})^2$
  • Other powers of volatility: $\int_0^T \sigma_t^p\,dt$
  • Leverage effect: $\langle \sigma^2, X\rangle_T$, or corresponding correlation
  • Volatility of volatility: $\langle \sigma^2, \sigma^2\rangle_T$
  • Regression of one process on another, integrated alphas and betas, ANOVA
  • Same quantities, but instantaneously
  • Nonparametric trading strategies
  • Liquidity; time to execution

SLIDE 13

The Classical case: Realized Volatility (RV) as Measure of Integrated Volatility

High frequency data: possibility to estimate $\langle X, X\rangle_T$ very precisely.

Usual estimator ("realized volatility"):

$RV = \sum_{0 < t_{i+1} \le T} (X_{t_{i+1}} - X_{t_i})^2$

  • consistent as $\Delta t \to 0$ (stochastic calculus)
  • widely used (Andersen, Bollerslev, many others)
  • convergence rate $n^{1/2}$, asymptotically mixed normal, with estimable variance (Barndorff-Nielsen & Shephard, Jacod & Protter, M & Z)
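A minimal sketch of the RV estimator on simulated data follows. It assumes constant volatility, zero drift and equidistant sampling, in which case the standard error of RV is $\sqrt{2\sigma^4 T^2/n}$; the helper name `realized_volatility` and all parameter values are illustrative assumptions.

```python
import numpy as np


def realized_volatility(log_prices):
    """Sum of squared increments of the observed log price path."""
    return float(np.sum(np.diff(log_prices) ** 2))


# Simulated check under constant volatility and no drift (assumptions).
rng = np.random.default_rng(1)
n, T, sigma = 23400, 1.0, 0.2
increments = rng.normal(0.0, sigma * np.sqrt(T / n), size=n)
X = np.concatenate(([0.0], np.cumsum(increments)))

rv = realized_volatility(X)
integrated_vol = sigma ** 2 * T                     # target: int_0^T sigma_t^2 dt
std_error = np.sqrt(2.0 * sigma ** 4 * T ** 2 / n)  # constant-sigma special case
```

With 23400 observations the standard error is under one percent of the target, which is the sense in which the integrated volatility can be estimated "very precisely" when there is no noise.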

SLIDE 14

Microstructure Noise, and The Hidden Semimartingale model ("Nugget Effect")

  • Observed log stock price: $Y_{t_i} = X_{t_i} + \epsilon_i$
  • $X_t$ is the latent log price, a semimartingale, say an Itô process $dX_t = \mu_t\,dt + \sigma_t\,dB_t$
  • $\epsilon_i$ is stationary or iid, or similar
  • In financial data this is a more realistic model because of microstructure
  • Small deviations from the semimartingale model are allowable because it may not be possible to take advantage of these for arbitrage
  • Noise need not mess up options hedging
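A short worked calculation shows what the noise does to RV computed from the observed prices. It is a sketch under an added assumption: the $\epsilon_i$ are iid with mean zero and variance $E\epsilon^2$, and independent of $X$ (a special case of "stationary or iid, or similar").

```latex
\begin{align*}
\sum_{i} (Y_{t_{i+1}} - Y_{t_i})^2
  &= \sum_{i} (X_{t_{i+1}} - X_{t_i})^2
   + 2 \sum_{i} (X_{t_{i+1}} - X_{t_i})(\epsilon_{i+1} - \epsilon_i)
   + \sum_{i} (\epsilon_{i+1} - \epsilon_i)^2 , \\
E\big[(\epsilon_{i+1} - \epsilon_i)^2\big] &= 2\,E\epsilon^2 , \\
E\Big[\sum_{i} (Y_{t_{i+1}} - Y_{t_i})^2 \,\Big|\, X\Big]
  &= \sum_{i} (X_{t_{i+1}} - X_{t_i})^2 + 2 n\, E\epsilon^2 .
\end{align*}
```

The cross term has conditional mean zero, so the bias is $2nE\epsilon^2$: it grows linearly in the number of observations $n$ instead of vanishing, which is why sampling more frequently makes the naive RV worse under this model.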

SLIDE 15

Theology vs. Data: RV vs Sampling Interval

[Figure: dependence of estimated volatility on number of subgrids. Horizontal axis: K, the number of subgrids (5 to 20). Vertical axis: volatility of AA, Jan 4, 2001, annualized, square root scale (0.6 to 1.2).]

SLIDE 16

RV as One Samples More Frequently

[Figure: dependence of estimated volatility on sampling frequency. Horizontal axis: sampling frequency in seconds (50 to 200). Vertical axis: volatility of AA, Jan 4, 2001, annualized, square root scale (0.6 to 1.2).]
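The kind of plot shown on the last two slides can be imitated on simulated data. The sketch below computes RV from every k-th observation, averaged over the k possible starting offsets (subgrids), for noisy observations of a constant-volatility path. The iid Gaussian noise, the parameter values and the name `subsampled_rv` are assumptions for illustration; this does not reproduce the AA data.

```python
import numpy as np


def subsampled_rv(y, k):
    """RV from every k-th observation, averaged over the k subgrids."""
    return float(np.mean([np.sum(np.diff(y[offset::k]) ** 2)
                          for offset in range(k)]))


rng = np.random.default_rng(2)
n, T, sigma, noise_sd = 23400, 1.0, 0.2, 0.001
X = np.concatenate(([0.0],
                    np.cumsum(rng.normal(0.0, sigma * np.sqrt(T / n), n))))
Y = X + rng.normal(0.0, noise_sd, n + 1)   # observed = latent + iid noise

# RV as a function of the sampling interval (in number of ticks).
signature = {k: subsampled_rv(Y, k) for k in (1, 5, 20, 60, 300)}

# Under these assumptions E[RV at k=1] is about sigma^2 T + 2 n noise_sd^2.
expected_rv_all_ticks = sigma ** 2 * T + 2 * n * noise_sd ** 2
```

In this simulation the estimate at the highest frequency is dominated by the noise term, while sparser sampling moves it back toward the integrated volatility at the cost of higher variance.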

SLIDE 17

What is High Frequency Data? Likelihood Connection General Connection Some Applications

Can We Learn Anything from Parametric Inference?

Thought experiment: what if one pretended that $\sigma_t$ is constant over blocks of M sampling times $t_i$?

One possibility: parametric inference for each block, then aggregate results across blocks.

Does this give estimators that are:
  • consistent?
  • efficient?
  • different from current estimators?

Important agendas: multivariate case, elusive univariate quantities (such as leverage effect, volatility of volatility).

Sufficiency can inform summaries.
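For the simplest target the thought experiment can be carried out by hand. The following sketch assumes equidistant sampling with spacing $\Delta t$, zero drift, and a constant $\sigma$ within block $i$, so that the $M$ increments in the block are iid $N(0, \sigma^2 \Delta t)$.

```latex
\begin{align*}
\ell_i(\sigma^2) &= -\frac{M}{2}\log(\sigma^2 \Delta t)
   - \frac{1}{2\sigma^2 \Delta t} \sum_{t_j \in \text{block } i} (\Delta X_{t_j})^2
   + \text{const}, \\
\hat\sigma_i^2 &= \frac{1}{M \Delta t} \sum_{t_j \in \text{block } i} (\Delta X_{t_j})^2 , \\
\sum_i \hat\sigma_i^2 \, M \Delta t &= \sum_{j} (\Delta X_{t_j})^2 = RV .
\end{align*}
```

So for integrated volatility, block-wise maximum likelihood followed by aggregation gives back RV. The block approach produces something new only for other functionals of volatility, where the block estimate enters nonlinearly.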

SLIDE 18

A statistical "risk neutral" (equivalent martingale) measure

  • Can without much loss of generality assume $\mu_t = 0$
  • Results shown for $\mu_t = 0$ carry over to general drifts $\mu_t$ using Girsanov's Theorem and stable convergence
  • Measure change commutes with stable convergence
  • Same phenomenon: cannot consistently estimate $\int_0^T \mu_t\,dt$
  • References: Rényi (1963), Aldous and Eagleson (1978), Hall & Heyde (1980), Rootzén (1980), Jacod & Protter (1998).

To be precise:
  • True probability Q: $dX_t = \mu_t\,dt + \sigma_t\,dB_t^Q$
  • Statistical risk neutral probability P: $dX_t = \sigma_t\,dB_t$
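The measure change between P and Q can be written out explicitly. This is the standard Girsanov form, stated here as a sketch under an added integrability assumption (for example Novikov's condition on $\mu_t/\sigma_t$) that the slide does not spell out.

```latex
\begin{align*}
\frac{dQ}{dP} &= \exp\Big\{ \int_0^T \frac{\mu_t}{\sigma_t}\, dB_t
   - \frac{1}{2} \int_0^T \Big(\frac{\mu_t}{\sigma_t}\Big)^2 dt \Big\}, \\
B_t^Q &= B_t - \int_0^t \frac{\mu_s}{\sigma_s}\, ds
   \quad \text{is a Brownian motion under } Q, \\
dX_t &= \sigma_t\, dB_t = \mu_t\, dt + \sigma_t\, dB_t^Q .
\end{align*}
```

The density does not depend on the sample size $n$, and P and Q are equivalent on $[0, T]$. That equivalence is also why $\int_0^T \mu_t\,dt$ cannot be estimated consistently from data on a fixed time interval.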

SLIDE 19

An approximate model

Sampling times: $\mathcal{G}_n = \{0 = t_{n,0} < t_{n,1} < \dots < t_{n,n} = T\}$

Break points $\tau_{n,i}$: every M'th observation, or other subset of $\mathcal{G}_n$

Approximate model $P_n$: $dX_t = \sigma_{\tau_{n,i-1}}\,dB_t$ for $t \in (\tau_{n,i-1}, \tau_{n,i}]$

For our results need: asymptotic decoupling delay (ADD)

$K(t) = \lim_{n \to \infty} \sum_i \sum_{t_{n,j} \in (\tau_{n,i-1}, \tau_{n,i}] \cap [0,t]} (t_{n,j} - \tau_{n,i-1}) = \frac{(M-1)\,t}{2}$ in the equidistant case

Under $P_n$, for $\tau_i < t_j \le \tau_{i+1}$, conditionally on $\mathcal{F}_{\tau_{n,i-1}}$: the increments $\Delta X_{t_j} = X_{t_j} - X_{t_{j-1}}$ are independent normal vectors,

$\Delta X_{t_j} \sim N(0, \sigma_{\tau_{n,i-1}}^{\top} \sigma_{\tau_{n,i-1}} \Delta t_j)$

SLIDE 20

The main approximation result

Suppose $\sigma_t^2$ is itself an Itô process:

$d\sigma_t^2 = \nu_t\,dt + \xi_t\,dW_t$

Assume $\nu_t$, $\xi_t$ are locally bounded; also $\inf \sigma_t^2 > 0$ (smallest eigenvalue > 0 in the case of a matrix process).

Theorem (subject to regularity conditions): $P$ and $P_n$ are mutually absolutely continuous as probabilities on $(X_{t_0}, \dots, X_{t_n})$, and

$\frac{dP}{dP_n}(X_{t_0}, \dots, X_{t_n}) \to \exp\{\eta Z - \tfrac{1}{2}\eta^2\}$ as $n \to \infty$, stably in law under $P_n$,

where $Z$ is $N(0,1)$, independent of $\mathcal{F}_T$, and where

$\eta^2 = \underbrace{\frac{3}{8} \int_0^T \sigma_t^{-6} \big(\langle \sigma^2, X\rangle'_t\big)^2\,dt}_{\text{related to leverage effect}} + \underbrace{\frac{1}{2} \int_0^T \sigma_t^{-4}\, \langle \sigma^2, \sigma^2\rangle'_t\,dK(t)}_{\text{related to \# in each block}}$

In other words, $P_n$ and $P$ are contiguous (Le Cam).

SLIDE 21

Implications of contiguity for Estimation

$\theta$: quantity to be estimated, say $\theta = \int_0^T \sigma_t^4\,dt$

  • Suppose under $P_n$: $n^{1/4}(\hat\theta_n - \theta) \to N(0, d^2)$, stably in law
  • Then under $P$: $n^{1/4}(\hat\theta_n - \theta) \to N(b, d^2)$, stably in law
  • Convergence is stable in law; more generally: $n^{\alpha}(\hat\theta_n - \theta)$
  • For this: weak additional assumptions

Effect of approximation is therefore:
  • Possible asymptotic bias $b = d\,\eta \times$ $P_n$-asymptotic covariance between $n^{1/4}(\hat\theta_n - \theta)$ and $\log \frac{dP}{dP_n}$
  • Consistency, order of convergence, and asymptotic variance $d^2$ all stay the same

SLIDE 22

Example: Estimating Integrals of Powers of Volatility

Target ("parameter"): $\theta = \int_0^T \sigma_t^p\,dt$

"Moment" estimator: $\hat\theta_n = c_p \left(\frac{n}{T}\right)^{p/2 - 1} \sum_{j=1}^{n} |\Delta X_{t_j}|^p$

Under our approximation, for $\tau_i < t_j \le \tau_{i+1}$: the increments $\Delta X_{t_j} = X_{t_j} - X_{t_{j-1}}$ are iid $N(0, \sigma_{\tau_i}^2 \Delta t)$.

UMVU estimator of $\sigma_{\tau_i}^2 \Delta t$:

$\widehat{\sigma_{\tau_i}^2 \Delta t} = \frac{1}{M} \sum_{t_j \in (\tau_{i-1}, \tau_i]} \Delta X_{t_j}^2$

The UMVU estimator of $\sigma_{\tau_i}^p$ is proportional to $(\hat\sigma_{\tau_i}^2)^{p/2}$.

Improved efficiency for $\theta$ under $P_n$. Carries over to $P$.
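The two estimators can be compared on simulated data. The sketch below assumes equidistant sampling, zero drift and (for the check) constant volatility. The normalizing constants are derived from standard normal and chi-square moments and are my own working, not copied from the slide; the function names are hypothetical.

```python
import math

import numpy as np


def abs_moment(p):
    """E|Z|^p for Z ~ N(0,1)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def moment_estimator(dX, T, p):
    """Scaled sum of |increments|^p; unbiased when sigma is constant."""
    n = len(dX)
    scale = (n / T) ** (p / 2.0 - 1.0) / abs_moment(p)
    return float(scale * np.sum(np.abs(dX) ** p))


def block_estimator(dX, T, p, M):
    """Block-wise estimator built on (sum of squares in block)^(p/2).

    Within a block S = sum(dX^2) ~ sigma^2 dt chi2_M, so
    E[S^(p/2)] = (sigma^2 dt)^(p/2) 2^(p/2) Gamma((M+p)/2) / Gamma(M/2).
    Trailing increments that do not fill a block are dropped.
    """
    n = len(dX)
    dt = T / n
    blocks = np.asarray(dX[: (n // M) * M]).reshape(-1, M)
    S = np.sum(blocks ** 2, axis=1)
    log_c = ((p / 2.0) * math.log(2.0)
             + math.lgamma((M + p) / 2.0) - math.lgamma(M / 2.0))
    sigma_p_hat = S ** (p / 2.0) / (math.exp(log_c) * dt ** (p / 2.0))
    return float(np.sum(sigma_p_hat) * M * dt)   # sum of sigma^p x block length


rng = np.random.default_rng(3)
n, T, sigma, p, M = 20000, 1.0, 0.2, 4, 20
dX = rng.normal(0.0, sigma * np.sqrt(T / n), size=n)
theta = sigma ** p * T
theta_moment = moment_estimator(dX, T, p)
theta_block = block_estimator(dX, T, p, M)
```

Both estimates target the same quantity; the efficiency gain of the block version for different p and M is the subject of the next slide and is not reproduced by a single simulated path.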

SLIDE 23

Asymptotic Relative Efficiency of Estimators

[Figure: two panels, M=20 and M=100. Horizontal axis: p (1 to 6). Vertical axis: asymptotic relative efficiency (0.38 to 0.98).]

SLIDE 24

ANOVA (Analysis of Variance/Variation)

The setting:
  • Processes $X_t$ (can be multivariate) and $Y_t$ are observed at times $t_j$
  • The two processes are related by
    $dY_t = \sum_{i=1}^{p} f_t^{(i)}\,dX_t^{(i)} + dZ_t$, with $\langle X^{(i)}, Z\rangle_t = 0$ for all $t$ and $i$

Problem: estimate $\langle Z, Z\rangle_T$ based on the data (Z (2001))

Three interpretations of Z:
  • idiosyncratic component of the price of a stock
  • hedging error of an option
  • estimation or model error

Simple solution: regular regression and ANOVA in each block of M observations. Can use traditional regression theory.

$\widehat{\langle Z, Z\rangle}_T = \frac{M}{M-p} \sum_{\text{blocks } i} (\text{RSS in block } i)$
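The block-wise ANOVA estimator can be sketched in a few lines. The version below regresses the increments of Y on the increments of X without an intercept (matching the zero-drift convention used earlier) and sums the residual sums of squares with the M/(M-p) correction. The simulated processes, parameter values and the name `block_anova` are assumptions for illustration.

```python
import numpy as np


def block_anova(dY, dX, M):
    """Estimate <Z,Z>_T as M/(M-p) times the sum of block-wise RSS.

    dY has shape (n,), dX has shape (n, p). Only complete blocks of M
    increments are used; each block is an ordinary least squares fit
    of dY on dX with no intercept.
    """
    n, p = dX.shape
    total_rss = 0.0
    for start in range(0, n - M + 1, M):
        A = dX[start:start + M]
        b = dY[start:start + M]
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        resid = b - A @ coef
        total_rss += float(resid @ resid)
    return M / (M - p) * total_rss


# Simulated example with p = 2 regressors and a time-varying coefficient.
rng = np.random.default_rng(4)
n, T, M = 20000, 1.0, 20
t = np.linspace(0.0, T, n + 1)
dX = rng.normal(size=(n, 2)) * np.array([0.2, 0.3]) * np.sqrt(T / n)
dZ = rng.normal(0.0, 0.1 * np.sqrt(T / n), size=n)   # residual, sigma_Z = 0.1
dY = (1.0 + t[:-1]) * dX[:, 0] + 0.5 * dX[:, 1] + dZ

zz_hat = block_anova(dY, dX, M)
zz_true = 0.1 ** 2 * T
```

The M/(M-p) factor is the usual degrees-of-freedom correction: each block regression uses up p degrees of freedom, so the raw RSS underestimates the residual variation.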

SLIDE 25

Properties of estimator $\widehat{\langle Z, Z\rangle}_T$

  • Asymptotic bias $b = 0$
  • Asymptotic variance for $n^{1/2}\big(\widehat{\langle Z, Z\rangle}_T - \langle Z, Z\rangle_T\big)$:
    $2\,\frac{M}{M-p} \int_0^T \big(\langle Z, Z\rangle'_t\big)^2\,dt$
  • Similar to results for $p = 1$ when $M \to \infty$ with $n$

SLIDE 26

Integrated Beta

Same setting as for ANOVA.

Regression equation:

$dY_t = \sum_{i=1}^{p} f_t^{(i)}\,dX_t^{(i)} + dZ_t$, with $\langle X^{(i)}, Z\rangle_t = 0$ for all $t$ and $i$

Problem: estimate $\int_0^T f_t^{(i)}\,dt$

Same solution as for ANOVA: regular regression in each block of M observations. Can use traditional regression theory.

Asymptotic variance of $n^{1/2}\Big(\widehat{\int_0^T f_t\,dt} - \int_0^T f_t\,dt\Big)$ is

$\frac{MT}{M-p-1} \int_0^T \langle Z, Z\rangle'_t \big(\langle X, X\rangle'_t\big)^{-1}\,dt$
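A matching sketch for the integrated beta: fit a regression in each block, then sum the block coefficients weighted by the length of the block in time. The weighting by block length, the no-intercept fit and the simulated example with p = 1 are assumptions made for this illustration.

```python
import numpy as np


def integrated_beta(dY, dX, t, M):
    """Sum over blocks of (block regression coefficient) x (block duration).

    dY has shape (n,), dX has shape (n, p), t has shape (n + 1,).
    Returns an array of p estimates of int_0^T f_t^(i) dt.
    Only complete blocks of M increments are used.
    """
    n, p = dX.shape
    total = np.zeros(p)
    for start in range(0, n - M + 1, M):
        A = dX[start:start + M]
        b = dY[start:start + M]
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        total += coef * (t[start + M] - t[start])
    return total


# Simulated example: f_t = 1 + t on [0, 1], so the target is 1.5.
rng = np.random.default_rng(5)
n, T, M = 20000, 1.0, 20
t = np.linspace(0.0, T, n + 1)
dX = rng.normal(0.0, 0.2 * np.sqrt(T / n), size=(n, 1))
dZ = rng.normal(0.0, 0.1 * np.sqrt(T / n), size=n)
dY = (1.0 + t[:-1]) * dX[:, 0] + dZ

beta_hat = integrated_beta(dY, dX, t, M)
```

The same loop with `coef` replaced by the residual sum of squares gives the ANOVA estimator, which is the sense in which the slide calls this the "same solution".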

SLIDE 27

Preaveraging and local MA(1)-ness

[Figure: two autocorrelation function (ACF) plots against lag, ACF axis from 0.0 to 1.0. Panel 1: midquote, May 3 2007 (lags up to 50). Panel 2: 1-min averaged data (lags up to 30).]
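Pre-averaging in its simplest form replaces each block of consecutive observations by its mean. The sketch below applies this to simulated noisy prices and computes the lag-1 autocorrelation of the returns before and after averaging. The iid noise model, the block size and both function names are assumptions; this does not reproduce the figures, whose exact construction the slides do not state.

```python
import numpy as np


def preaverage(y, block):
    """Means over consecutive non-overlapping blocks; a ragged tail is dropped."""
    m = (len(y) // block) * block
    return np.asarray(y[:m], dtype=float).reshape(-1, block).mean(axis=1)


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(6)
n, sigma, noise_sd = 23400, 0.2, 0.001
X = np.cumsum(rng.normal(0.0, sigma * np.sqrt(1.0 / n), n))   # latent log price
Y = X + rng.normal(0.0, noise_sd, n)                          # observed log price

raw_acf1 = lag1_autocorr(np.diff(Y))                  # tick-by-tick returns
avg_acf1 = lag1_autocorr(np.diff(preaverage(Y, 60)))  # returns of block means
```

Differencing iid noise produces an MA(1) series, so the raw returns have negative lag-1 autocorrelation (approaching -1/2 when noise dominates). Averaging over a block shrinks the noise variance by the block size, so the pre-averaged series is driven mainly by the latent price.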

SLIDE 28

Best Bid Process

[Figure: two autocorrelation function (ACF) plots against lag, ACF axis from 0.0 to 1.0. Panel 1: Best Bid on May 3 2007 (lags up to 30). Panel 2: 1-min Averaged Best Bid (lags up to 30).]

SLIDE 29

Best Ask Process

[Figure: two autocorrelation function (ACF) plots against lag, ACF axis from 0.0 to 1.0. Panel 1: Best Ask on May 3 2007 (lags up to 30). Panel 2: 1-min Averaged Best Ask (lags up to 30).]

SLIDE 30

Before and After

[Figure: four panels of log price (vertical axis from 11.918 to 11.924).
  Panel 1: midquote series for 05-03 trading session, against time (20000 to 80000).
  Panel 2: pre-averaged 15 sec series, against index (1000 to 5000).
  Panel 3: pre-averaged 1 min series, against index (200 to 1400).
  Panel 4: pre-averaged 5 min series, against index (50 to 250).]

SLIDE 31

The Limits to Inference

Effective Sample Size in Different Situations

"parameter"                      Microstructure Absent        Microstructure Present
                                 daily         spot           daily         spot
volatility, regression, ANOVA    O_p(n)        O_p(n^{1/2})   O_p(n^{1/2})  O_p(n^{1/4})
leverage effect, vol of vol      O_p(n^{1/2})  O_p(n^{1/4})   O_p(n^{1/4})  O_p(n^{1/8})

n is the daily number of transactions/quotes; effective sample size = (rate of convergence)^2

SLIDE 32

The Limits to Inference

Rate of Convergence in Different Situations

"parameter"                      Microstructure Absent        Microstructure Present
                                 daily         spot           daily         spot
volatility, regression, ANOVA    O_p(n^{1/2})  O_p(n^{1/4})   O_p(n^{1/4})  O_p(n^{1/8})
leverage effect, vol of vol      O_p(n^{1/4})  O_p(n^{1/8})   O_p(n^{1/8})  O_p(n^{1/16})
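To get a feel for these rates, the exponents in the table can be turned into effective sample sizes for a concrete n. The sketch below uses n = 62659, the E-mini trade count quoted earlier in the talk; the dictionary layout and the function name are assumptions, and constants hidden in the O_p terms are ignored.

```python
# Rate-of-convergence exponents a (rate = n^a), keyed by
# (parameter group, microstructure, horizon), read off the table above.
RATE_EXPONENTS = {
    ("volatility, regression, ANOVA", "absent", "daily"): 1 / 2,
    ("volatility, regression, ANOVA", "absent", "spot"): 1 / 4,
    ("volatility, regression, ANOVA", "present", "daily"): 1 / 4,
    ("volatility, regression, ANOVA", "present", "spot"): 1 / 8,
    ("leverage effect, vol of vol", "absent", "daily"): 1 / 4,
    ("leverage effect, vol of vol", "absent", "spot"): 1 / 8,
    ("leverage effect, vol of vol", "present", "daily"): 1 / 8,
    ("leverage effect, vol of vol", "present", "spot"): 1 / 16,
}


def effective_sample_size(n, rate_exponent):
    """Effective sample size = (rate of convergence)^2 = n^(2a)."""
    return n ** (2 * rate_exponent)


n = 62659  # E-mini SP500 futures trades on May 3, 2007
ess = {key: effective_sample_size(n, a) for key, a in RATE_EXPONENTS.items()}

for (group, noise, horizon), value in ess.items():
    print(f"{group:32s} microstructure {noise:8s} {horizon:6s} {value:10.1f}")
```

Ignoring constants, tens of thousands of trades shrink to an effective sample of a few hundred for daily volatility once microstructure noise is present, and to single digits for spot leverage effect or volatility of volatility.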

SLIDE 33

Conclusions

  • A vast amount of data
  • How to turn into knowledge?
  • How to turn into regulation? risk management? trading?
  • Financial prices have "error"
  • Connection to parametric inference

SLIDE 34

Political High Frequency Data: Romney on Intrade, 23 Oct 2012