

Slide 1

Principal Component Analysis of High Frequency Data

Yacine Aït-Sahalia†  Dacheng Xiu‡

†Department of Economics, Princeton University
‡Booth School of Business, University of Chicago

FERM 2014, Central University of Finance and Economics June 28, 2014

Slide 2

Motivation

◮ Principal component analysis (PCA) is one of the oldest and most popular techniques for multivariate analysis.
  1. Pearson (1901, Philosophical Magazine)
  2. Hotelling (1933, J. Educ. Psych.)
◮ PCA is a dimension-reduction technique that seeks to describe the multivariate structure of the data.
◮ The central idea is to identify a small number of factors that effectively summarize the variation in the data.

Slide 3

Statistical Inference on PCA

◮ Estimating the eigenvalues of the sample covariance matrix is the key step towards PCA.
◮ Anderson (1963, AOS) studies the statistical inference problem for the eigenvalues and finds that

$$\sqrt{n}\,(\hat\lambda - \lambda) \xrightarrow{d} N\!\left(0,\ 2\,\mathrm{Diag}(\lambda_1^2, \lambda_2^2, \dots, \lambda_d^2)\right),$$

where $\hat\lambda$ and $\lambda$ are the vectors of eigenvalues of the sample and population covariance matrices, respectively, and $\lambda$ is simple (no repeated entries).
◮ When eigenvalues are repeated, the asymptotic theory is considerably more complicated to use.

Slide 4

Drawbacks

There are at least two obvious drawbacks of the classical asymptotic theory.

◮ The first is that the derivation requires i.i.d. observations and multivariate normality.
  ◮ Extensions to non-normality or time-series data are possible, e.g. Waternaux (1976, AOS), Tyler (1983, AOS), Stock and Watson (1998, WP).
◮ The second drawback is the curse of dimensionality.
  ◮ It is well known that when $n/d \to C \ge 1$,

  $$\hat\lambda_1 \xrightarrow{a.s.} (1 + C^{-1/2})^2,$$

  even though the true eigenvalue is 1; see e.g. Geman (1980, AOP), Bai (1999, Statist. Sinica), and Johnstone (2001, AOS).

Slide 5

Limited Applications

◮ An ETF that tracks the Dow Jones index holds d = 30 stocks. Its covariance matrix has 465 free parameters if no additional structure is imposed.
◮ It is very demanding to conduct nonparametric inference with a limited amount of data: years of daily data are required.
◮ Moreover, stock returns exhibit time-varying volatility and heavy tails, which deviate from i.i.d. normality to a great extent.

Slide 6

Why Is Using High Frequency Data Better?

◮ The dimension does not need to be “large.”
  ◮ The large amount of data eases the curse of dimensionality, to the extent that asymptotic results with the dimension held fixed may serve as rather good approximations.
  ◮ E.g., a typical trading day provides at least 78 observations of 5-minute returns.
◮ The time span does not need to be “long.”
  ◮ Fixed time span [0, T], with T = 1 day or 1 month.
  ◮ The in-fill asymptotic framework enables nonparametric analysis of general continuous-time stochastic processes.
  ◮ Instead of estimating “expectations,” we measure “realizations.”

Slide 7

Main Contribution

◮ We define the concept of (realized) PCA for data sampled from a continuous-time stochastic process within a fixed time window.
◮ We develop asymptotic theory for spectral functions, eigenvalues, eigenvectors, and principal components, under general nonparametric models, using intraday data.
◮ Empirically, we use this new technique to analyze the constituents of the Dow Jones Index and document a factor structure within a short window.

Slide 8

PCA and Factor Models

◮ Applications in finance and macroeconomics: Ross (1976, JET), Stock and Watson (2002, JBES).
◮ Classical PCA and factor analysis: Hotelling (1933, J. Educ. Psych.), Thomson (1934, J. Educ. Psych.), Anderson and Amemiya (1988, AOS).
◮ Large-d setting: Chamberlain and Rothschild (1983, ECMA), Connor and Korajczyk (1998, JFE), Stock and Watson (2002, JASA), Bai and Ng (2002, ECMA), Bai (2003, ECMA), Forni, Hallin, Lippi, and Reichlin (1999, RofE&S), Lam and Yao (2012, AOS).

Slide 9

A Large Literature on High Frequency Data

$$dX_t = \mu_t\,dt + \sigma_t\,dW_t + dJ_t$$

◮ QV and components of QV, e.g. $\int_0^t \sigma_u^2\,du$ and $\sum_{u \le t} (\Delta J_u)^2$.
◮ Covariance, e.g. $\int_0^t \sigma_u\sigma_u^\intercal\,du$.
◮ Downside risk, e.g. $\sum_{u \le t} (\Delta J_u)^2\, 1_{\{\Delta J_u < 0\}}$.
◮ Skewness, e.g. $\sum_{u \le t} (\Delta J_u)^3$.
◮ Other nonlinear functionals of volatility: $\int_0^t \sigma_u^4\,du$, $\int_0^t 1_{\{\sigma_u^2 \le x\}}\,du$, $\int_0^t e^{-x\sigma_u^2}\,du$.
◮ Testing for jumps; estimation of jump activity and tails.
◮ Robustness to microstructure noise, asynchronous trading, and endogenous trading time.
◮ Leverage effect.
◮ From realized to spot variance: $\sigma_u\sigma_u^\intercal$.

Slide 10

Spot Variance Related Papers

◮ Jacod and Rosenbaum (2013, AOS): $\int_0^t g(\sigma_s\sigma_s^\intercal)\,ds$, e.g. $\int_0^t \sigma_s^4\,ds$.
◮ Fixed T, fixed d: eigenvalue-related problems.
  ◮ Tests of rank: Jacod, Lejay, and Talay (2008, Bernoulli), Jacod and Podolskij (2013, AOS).
◮ Fixed T, large d:
  ◮ High-dimensional covariance matrix estimation with high frequency data: Wang and Zou (2010, JASA), Tao, Wang, and Zhou (2013, AOS), Tao, Wang, and Chen (2013, ET), and Tao, Wang, Yao, and Zou (2011, JASA).
  ◮ Spectral distribution of the realized covariance matrix: Zheng and Li (2011, AOS).

Slide 11

Classical PCA

Suppose R is a d-dimensional vector-valued random variable. The first principal component is the linear combination $\gamma_1^\intercal R$ that maximizes its variance. The weight $\gamma_1$ solves the optimization problem

$$\max_{\gamma_1}\ \gamma_1^\intercal c\,\gamma_1, \quad \text{subject to}\ \gamma_1^\intercal\gamma_1 = 1,$$

where $c = \mathrm{cov}(R)$. Introducing a Lagrange multiplier $\lambda_1$, the problem is to maximize

$$\gamma_1^\intercal c\,\gamma_1 - \lambda_1(\gamma_1^\intercal\gamma_1 - 1),$$

which yields $c\,\gamma_1 = \lambda_1\gamma_1$ and $\gamma_1^\intercal c\,\gamma_1 = \lambda_1$.

Slide 12

Classical PCA (continued)

◮ Therefore, $\lambda_1$ is the largest eigenvalue of the population covariance matrix c, and $\gamma_1$ is the corresponding eigenvector.
◮ The second principal component solves the optimization problem

$$\max_{\gamma_2}\ \gamma_2^\intercal c\,\gamma_2, \quad \text{subject to}\ \gamma_2^\intercal\gamma_2 = 1 \ \text{and}\ \mathrm{cov}(\gamma_1^\intercal R,\ \gamma_2^\intercal R) = 0.$$

It turns out that the solution $\gamma_2$ is the eigenvector corresponding to the second-largest eigenvalue $\lambda_2$.
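To make the construction concrete, here is a minimal numpy sketch (illustrative, not from the slides; the toy data matrix R and its dimensions are arbitrary) that carries out classical PCA by eigendecomposition of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))  # toy n x d data

c = np.cov(R, rowvar=False)              # sample covariance matrix c
lam, gamma = np.linalg.eigh(c)           # ascending eigenvalues, orthonormal eigenvectors
lam, gamma = lam[::-1], gamma[:, ::-1]   # reorder so that lambda_1 >= lambda_2 >= ...

# gamma[:, 0] solves max_w w'cw subject to w'w = 1, and lambda_1 = gamma_1' c gamma_1
assert np.isclose(gamma[:, 0] @ c @ gamma[:, 0], lam[0])

# principal components: projections of the demeaned data on the eigenvectors
pcs = (R - R.mean(axis=0)) @ gamma
```

Note that np.linalg.eigh returns eigenvalues in ascending order, so they are reversed to match the convention $\lambda_1 \ge \lambda_2 \ge \dots$ used throughout.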

Slide 13

Continuous-Time Model

We consider a d-dimensional Itô semimartingale, defined on a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge 0}, P)$, with the following representation:

$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + J_t,$$

and, writing $c_t = (\sigma\sigma^\intercal)_t$,

$$c_t = c_0 + \int_0^t \tilde b_s\,ds + \int_0^t \tilde\sigma_s\,d\widetilde W_s + \tilde J_t,$$

where W is a d-dimensional Brownian motion, $\widetilde W$ is another Brownian motion, possibly correlated with W, and $J_t$ and $\tilde J_t$ are the price and volatility jumps.

Slide 14

Principal Component Analysis

How do we introduce PCA in this setting?

◮ Instead of maximizing the variance, we maximize the continuous component of the quadratic variation.
◮ Theorem: There exists a sequence $\{\lambda_{g,s}, \gamma_{g,s}\}$, $1 \le g \le d$, $0 \le s \le t$, such that

$$c_s\,\gamma_{g,s} = \lambda_{g,s}\,\gamma_{g,s}, \quad \gamma_{g,s}^\intercal\gamma_{g,s} = 1, \quad \text{and} \quad \gamma_{h,s}^\intercal c_s\,\gamma_{g,s} = 0 \ \text{for}\ h \neq g,$$

where $\lambda_{1,s} \ge \lambda_{2,s} \ge \dots \ge \lambda_{d,s} \ge 0$. Moreover, for any càdlàg, vector-valued adapted process $\gamma_s$ such that $\gamma_s^\intercal\gamma_s = 1$ and, for $1 \le h \le g - 1$,

$$\left\langle \int_0^u \gamma_{s-}^\intercal\,dX_s,\ \int_0^u \gamma_{h,s-}^\intercal\,dX_s \right\rangle^{c} = 0,$$

we have

$$\int_0^u \lambda_{g,s}\,ds \ \ge\ \left\langle \int_0^u \gamma_{s-}^\intercal\,dX_s,\ \int_0^u \gamma_{s-}^\intercal\,dX_s \right\rangle^{c}.$$

Slide 15

Eigenvalue as a Function

Let us start with the integrated eigenvalues, which reveal the relative importance of the different components.

◮ Lemma: The function $\lambda : \mathcal M_d^+ \to \bar{\mathbb R}_d^+$ is Lipschitz.
  ◮ $\bar{\mathbb R}_d^+$ is the subset of $\mathbb R^d$ of nonnegative vectors with entries in decreasing order.
  ◮ $\mathcal M_d^+$ is the space of non-negative (positive semidefinite) matrices.
◮ Therefore, $\int_0^t \lambda(c_s)\,ds$ is well-defined.
◮ Moreover, $\lambda_g$, if simple, is a $C^\infty$ function, and so is its corresponding eigenvector function $\gamma_g$, which is unique up to a sign.

Slide 16

Estimation Strategy

◮ The idea is simple:
  1. Decompose the interval [0, t] into many subintervals.
  2. Estimate $c_s$ within each subinterval using the local sample covariance matrix $\hat c_s$.
  3. Aggregate the eigenvalues $\lambda(\hat c_s)$ of the local estimates.
◮ Evidently, we need some understanding of the derivatives of $\lambda(\cdot)$ with respect to a matrix, as the estimation error depends on the smoothness of $\lambda(\cdot)$.
◮ And how do we handle repeated eigenvalues, given that $\lambda$ is then only Lipschitz?

Slide 17

Spectral Functions

◮ It turns out we should consider estimating an even more general quantity, $\int_0^t F(c_s)\,ds$.
◮ A spectral function is a function of non-negative matrices that satisfies $F(c) = F(O^\intercal c\,O)$ for any orthogonal matrix O.
◮ In other words, a spectral function depends on the matrix only through its eigenvalues.
◮ Therefore, we can write $F(c) = (f \circ \lambda)(c)$.
◮ F is spectral $\iff$ f is symmetric, i.e. $f(Px) = f(x)$ for any vector $x \in \bar{\mathbb R}_d^+$ and any permutation matrix P.

Slide 18

Properties of Spectral Functions

◮ Lemma: The symmetric function f is k times continuously differentiable at a point $\lambda(c) \in \bar{\mathbb R}_d^+$ if and only if the spectral function $F = f \circ \lambda$ is k times continuously differentiable at the point $c \in \mathcal M_d^+$, for $k = 0, 1, 2, \dots, \infty$.
◮ All the spectral functions on the next slide are twice differentiable.

Slide 19

Examples

◮ $f(x) = \sum_{j=1}^d x_j \implies F(c) = \mathrm{Tr}(c)$.
◮ $f(x) = \prod_{j=1}^d x_j \implies F(c) = \det(c)$.
◮ $f(x) = \bar x_k$, the k-th largest entry of x.
  ◮ $F(c)$ = the k-th eigenvalue of c, when it is simple.
  ◮ This function is differentiable only when $\bar x_{k-1} > \bar x_k > \bar x_{k+1}$.
◮ $f(x) = \frac{1}{g_l - g_{l-1}} \sum_{j=g_{l-1}+1}^{g_l} \bar x_j$.
  ◮ $F(c)$ = the k-th eigenvalue of c when it is repeated, averaged over its group.
  ◮ This function is differentiable when $\bar x_{g_{l-1}} > \bar x_{g_{l-1}+1} = \dots = \bar x_{g_l} > \bar x_{g_l+1}$.
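These examples are easy to check numerically. The sketch below (with a randomly generated positive semidefinite matrix; not from the slides) verifies the trace and determinant cases and the defining invariance $F(c) = F(O^\intercal c\,O)$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
c = A @ A.T                                        # a positive semidefinite matrix
O, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random orthogonal matrix

lam = np.sort(np.linalg.eigvalsh(c))[::-1]         # ordered eigenvalues of c

assert np.isclose(lam.sum(), np.trace(c))          # f = sum      => F = Tr
assert np.isclose(lam.prod(), np.linalg.det(c))    # f = product  => F = det

# spectral invariance: the eigenvalues (hence any F) are unchanged by rotation
lam_rot = np.sort(np.linalg.eigvalsh(O.T @ c @ O))[::-1]
assert np.allclose(lam, lam_rot)
```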

Slide 20

Estimation

1. Denote the distance between adjacent observations by $\Delta_n$. At each $i\Delta_n$, form a block of length $k_n\Delta_n$ and estimate $c_{i\Delta_n}$ by

$$\hat c_{i\Delta_n} = \frac{1}{k_n\Delta_n} \sum_{j=1}^{k_n} (\Delta_{i+j}^n X)(\Delta_{i+j}^n X)^\intercal\, 1_{\{\|\Delta_{i+j}^n X\| \le u_n\}},$$

where $u_n = \alpha\,\Delta_n^{\varpi}$ and $\Delta_i^n X = X_{i\Delta_n} - X_{(i-1)\Delta_n}$.

2. Our estimator of $\int_0^t F(c_s)\,ds$ is

$$V(\Delta_n, X; F) = \Delta_n \sum_{i=0}^{[t/\Delta_n] - k_n} f\big(\hat\lambda_{i\Delta_n}\big), \quad \text{where}\ \hat\lambda_{i\Delta_n} := \lambda(\hat c_{i\Delta_n}).$$
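A minimal numpy sketch of steps 1–2 (an illustration under simplifying assumptions, not the authors' code): X is assumed to be an (n+1) × d array of log prices on a regular grid, and the truncation constants alpha and varpi below are illustrative choices consistent with the constraint on $\varpi$ stated on the next slide.

```python
import numpy as np

def spot_cov(X, i, k_n, delta_n, alpha=5.0, varpi=0.47):
    """Truncated local covariance estimator c_hat at time i * Delta_n."""
    dX = np.diff(X[i:i + k_n + 1], axis=0)        # increments Delta^n_{i+j} X, j = 1..k_n
    u_n = alpha * delta_n ** varpi                # truncation level u_n = alpha * Delta_n^varpi
    keep = np.linalg.norm(dX, axis=1) <= u_n      # discard increments carrying jumps
    return (dX[keep].T @ dX[keep]) / (k_n * delta_n)

def V(X, delta_n, k_n, f):
    """Riemann-sum estimator V(Delta_n, X; F) of int_0^t F(c_s) ds, with F = f o lambda."""
    n = X.shape[0] - 1
    total = 0.0
    for i in range(n - k_n + 1):
        lam = np.sort(np.linalg.eigvalsh(spot_cov(X, i, k_n, delta_n)))[::-1]
        total += f(lam)
    return delta_n * total

# e.g. f = lambda lam: lam[0] targets the integrated largest eigenvalue (no bias correction yet)
```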

Slide 21

Consistency and Failure of CLT

◮ Theorem: Suppose f is $C^3$ and has polynomial growth. As $k_n \to \infty$ and $k_n\Delta_n \to 0$, and as long as $1/(4-2r) \le \varpi < 1/2$,

$$V(\Delta_n, X; F) \xrightarrow{p} \int_0^t F(c_s)\,ds.$$

◮ Theorem: As $k_n^2\Delta_n \to 0$, we have

$$k_n\left( V(\Delta_n, X; F) - \int_0^t F(c_s)\,ds \right) \xrightarrow{p} \frac{1}{2} \sum_{j,k,l,m=1}^d \int_0^t \partial^2_{jk,lm} F(c_s)\,\big(c_{jl,s}\,c_{km,s} + c_{jm,s}\,c_{kl,s}\big)\,ds.$$

◮ The estimation error is therefore dominated by a bias of order $1/k_n$, which is why no CLT holds for the raw estimator and a bias correction is required.

Slide 22

The Bias-Corrected Estimator and its CLT

◮ The bias-corrected estimator:

$$V'(\Delta_n, X; F) = \Delta_n \sum_{i=0}^{[t/\Delta_n] - k_n} \left[ F(\hat c_{i\Delta_n}) - \frac{1}{2k_n} \sum_{j,k,l,m=1}^d \partial^2_{jk,lm} F(\hat c_{i\Delta_n})\, \big( \hat c_{jl,i\Delta_n}\hat c_{km,i\Delta_n} + \hat c_{jm,i\Delta_n}\hat c_{kl,i\Delta_n} \big) \right].$$

◮ Theorem (CLT): As $k_n^3\Delta_n \to \infty$ and $k_n^2\Delta_n \to 0$,

$$\frac{1}{\sqrt{\Delta_n}} \left( V'(\Delta_n, X; F) - \int_0^t F(c_s)\,ds \right) \xrightarrow{\mathcal L\text{-}s} \mathcal W_t,$$

where

$$E(\mathcal W_{p,t}\,\mathcal W_{q,t} \mid \mathcal F) = \sum_{j,k,l,m=1}^d \int_0^t \partial_{jk} F_p(c_s)\, \partial_{lm} F_q(c_s)\, \big( c_{jl,s}\,c_{km,s} + c_{jm,s}\,c_{kl,s} \big)\,ds.$$

Slide 23

Return to the Integrated Eigenvalues

◮ Assumption: The eigenvalues maintain the following group structure:

$$\lambda_1(c_s) = \dots = \lambda_{g_1}(c_s) > \lambda_{g_1+1}(c_s) = \dots = \lambda_{g_2}(c_s) > \dots > \lambda_{g_{r-1}}(c_s) > \lambda_{g_{r-1}+1}(c_s) = \dots = \lambda_{g_r}(c_s) > 0.$$

Moreover, the structure, i.e. $\{g_1, g_2, \dots, g_r\}$, does not vary over [0, t]. (This is not needed for consistency.)

◮ Choose the spectral function F accordingly:

$$F \circ \lambda(\cdot) = \left( \frac{1}{g_1}\sum_{j=1}^{g_1} \lambda_j(\cdot),\ \frac{1}{g_2 - g_1}\sum_{j=g_1+1}^{g_2} \lambda_j(\cdot),\ \dots,\ \frac{1}{g_r - g_{r-1}}\sum_{j=g_{r-1}+1}^{g_r} \lambda_j(\cdot) \right).$$

Slide 24

The CLT of Eigenvalues

◮ Corollary: The bias-corrected estimator takes the following form:

$$V'(\Delta_n, X; F_p^\lambda) = \frac{\Delta_n}{g_p - g_{p-1}} \sum_{i=0}^{[t/\Delta_n] - k_n}\ \sum_{h=g_{p-1}+1}^{g_p} \left[ \hat\lambda_{h,i\Delta_n} - \frac{1}{k_n}\,\mathrm{Tr}\!\left( (\hat\lambda_{h,i\Delta_n} I - \hat c_{i\Delta_n})^+\, \hat c_{i\Delta_n} \right) \hat\lambda_{h,i\Delta_n} \right].$$

The asymptotic covariance matrix is diagonal:

$$E\!\left(\mathcal W_t^\lambda (\mathcal W_t^\lambda)^\intercal \mid \mathcal F\right) = \mathrm{Diag}\!\left( \frac{2}{g_1}\int_0^t \lambda_{g_1,s}^2\,ds,\ \frac{2}{g_2 - g_1}\int_0^t \lambda_{g_2,s}^2\,ds,\ \dots,\ \frac{2}{g_r - g_{r-1}}\int_0^t \lambda_{g_r,s}^2\,ds \right).$$

Slide 25

Eigenvectors

Suppose $\gamma_{g,s}$ is the vector-valued function corresponding to the eigenvector of $c_s$ associated with a simple eigenvalue $\lambda_{g,s}$, for each $s \in [0, t]$. We have

$$\frac{1}{\sqrt{\Delta_n}} \left( \Delta_n \sum_{i=0}^{[t/\Delta_n] - k_n} \left[ \hat\gamma_{g,i\Delta_n} + \frac{1}{2k_n} \sum_{p \neq g} \frac{\hat\lambda_{g,i\Delta_n}\,\hat\lambda_{p,i\Delta_n}}{(\hat\lambda_{g,i\Delta_n} - \hat\lambda_{p,i\Delta_n})^2}\,\hat\gamma_{g,i\Delta_n} \right] - \int_0^t \gamma_{g,s}\,ds \right) \xrightarrow{\mathcal L\text{-}s} \mathcal W_t^\gamma,$$

where the covariance matrix is given by

$$E\!\left(\mathcal W_t^\gamma (\mathcal W_t^\gamma)^\intercal \mid \mathcal F\right) = \int_0^t \lambda_{g,s} \left[ (\lambda_{g,s} I - c_s)^+\, c_s\, (\lambda_{g,s} I - c_s)^+ \right] ds.$$
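In code, the estimator and its correction term look as follows (a sketch under the same simplifying assumptions as before; since eigenvectors are identified only up to sign, each local eigenvector's sign is fixed here by its largest-magnitude entry):

```python
import numpy as np

def integrated_eigenvector(X, delta_n, k_n, g):
    """Bias-corrected estimator of int_0^t gamma_{g,s} ds for a simple lambda_g."""
    n, d = X.shape[0] - 1, X.shape[1]
    total = np.zeros(d)
    for i in range(n - k_n + 1):
        dX = np.diff(X[i:i + k_n + 1], axis=0)
        c_hat = (dX.T @ dX) / (k_n * delta_n)
        lam, vec = np.linalg.eigh(c_hat)
        lam, vec = lam[::-1], vec[:, ::-1]                # descending eigenvalue order
        gam = vec[:, g - 1]
        gam = gam * np.sign(gam[np.argmax(np.abs(gam))])  # fix the sign convention
        # correction: (1/2k_n) * sum_{p != g} lam_g lam_p / (lam_g - lam_p)^2
        others = np.delete(np.arange(d), g - 1)
        corr = (lam[g - 1] * lam[others] / (lam[g - 1] - lam[others]) ** 2).sum() / (2 * k_n)
        total += gam * (1.0 + corr)
    return delta_n * total
```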

Slide 26

Principal Components

◮ Suppose $\gamma_{g,s}$ is the vector-valued function corresponding to the eigenvector of $c_s$ associated with a simple root $\lambda_{g,s}$, for each $s \in [0, t]$. We have

$$\sum_{i=1}^{[t/(k_n\Delta_n)] - k_n} \hat\gamma_{g,(i-1)k_n\Delta_n}^\intercal \left( X_{(i+1)k_n\Delta_n} - X_{i k_n\Delta_n} \right) \xrightarrow{p} \int_0^t \gamma_{g,s-}^\intercal\,dX_s.$$

◮ So far, we have estimated $\int_0^t \lambda(c_s)\,ds$, $\int_0^t \gamma_{g,s-}^\intercal\,dX_s$, and $\int_0^t \gamma_{g,s}\,ds$.
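A sketch of the realized principal component (illustrative; block-level spot covariances as above, no truncation): the eigenvector estimated on one block is applied to the price increment over the following block, mirroring the predictable integrand $\hat\gamma_{g,(i-1)k_n\Delta_n}$.

```python
import numpy as np

def realized_pc(X, delta_n, k_n, g):
    """Realized g-th principal component: the eigenvector estimated on one block
    is applied to the price increment over the following block."""
    n, d = X.shape[0] - 1, X.shape[1]
    m = n // k_n                                   # number of blocks of length k_n
    pc = 0.0
    for i in range(1, m - 1):
        dX = np.diff(X[(i - 1) * k_n:i * k_n + 1], axis=0)
        c_hat = (dX.T @ dX) / (k_n * delta_n)      # c_hat at time (i-1) * k_n * Delta_n
        vec = np.linalg.eigh(c_hat)[1][:, ::-1]    # eigenvectors, descending eigenvalues
        gam = vec[:, g - 1]
        gam = gam * np.sign(gam[np.argmax(np.abs(gam))])  # consistent sign across blocks
        pc += gam @ (X[(i + 1) * k_n] - X[i * k_n])
    return pc
```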

Slide 27

PCA on the Integrated Covariance?

Why not apply the usual PCA technique to the integrated covariance matrix $\int_0^t c_s\,ds$?

◮ The resulting “eigenvalues” and “principal components” do not have the usual interpretations: the first eigenvalue $\lambda_1\!\left(\int_0^t c_s\,ds\right)$ is not the variance (in any sense) of the first principal component $\gamma_1^\intercal(X_t - X_0)$. By contrast, our construction satisfies

$$\int_0^t \lambda_{1,s}\,ds = \left\langle \int \gamma_{1,s}^\intercal\,dX_s,\ \int \gamma_{1,s}^\intercal\,dX_s \right\rangle^{c}_t.$$

◮ Our PCA can uncover a factor structure even when the factor loadings and the factors are time-varying.

Slide 28

Simulation Model

The cross-section of log stock prices X follows a continuous-time factor model:

$$dX_t = \beta_t\,dY_t + dZ_t,$$

where Y is unobserved and Z is an idiosyncratic component, orthogonal to Y. Suppose Y follows a three-factor model:

$$\begin{pmatrix} dY_{1t} \\ dY_{2t} \\ dY_{3t} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} dt + \begin{pmatrix} \sigma_{1t} & & \\ & \sigma_{2t} & \\ & & \sigma_{3t} \end{pmatrix} \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix} \begin{pmatrix} dW_{1t} \\ dW_{2t} \\ dW_{3t} \end{pmatrix} + J_t^Y\,dN_t,$$

where, for $1 \le i \le 3$,

$$d\sigma_{it}^2 = \kappa_i(\theta_i - \sigma_{it}^2)\,dt + \xi_i\,\sigma_{it}\left( \rho_i\,dB_{it} + \sqrt{1 - \rho_i^2}\,dW_{it} \right) + J_{it}^\sigma\,dN_t, \qquad dZ_{it} = z_i\,dU_{it}.$$

Slide 29

Simulation Design

◮ We simulate intraday returns of up to 30 stocks at 5-second frequency, spanning 1 week.
◮ There are 3 distinct eigenvalues, which reflect the local factor structure of the simulated data.
◮ The remaining 27 population eigenvalues are identical, driven by the idiosyncratic variation:

$$\frac{d[X,X]_t^c}{dt} = \underbrace{\beta_t\,\frac{d[Y,Y]_t^c}{dt}\,\beta_t^\intercal}_{\text{rank } 3} + \underbrace{\frac{d[Z,Z]_t^c}{dt}}_{\text{diagonal, full rank}}.$$

◮ We fix $k_n = \theta\,\Delta_n^{-1/2}\log(d)$, with $\theta = 0.25$, where d is the dimension of X.
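An Euler-scheme sketch of this design (illustrative parameter values; the drift and the jump terms are omitted for brevity, and the loadings beta are held constant):

```python
import numpy as np

rng = np.random.default_rng(42)
d, n_fac = 30, 3
delta_n = 5.0 / (6.5 * 3600)          # 5 seconds, in units of one 6.5-hour trading day
n = int(5 / delta_n)                  # one week = five trading days of observations

kappa, theta, xi, rho = 4.0, 0.04, 0.5, -0.6           # volatility dynamics (illustrative)
corr = np.array([[1.00, 0.30, 0.20],
                 [0.30, 1.00, 0.25],
                 [0.20, 0.25, 1.00]])                  # factor correlation matrix
beta = rng.uniform(0.3, 1.5, size=(d, n_fac))          # constant factor loadings
z = 0.10                                               # idiosyncratic volatility

sig2 = np.full(n_fac, theta)
X = np.zeros((n + 1, d))
for t in range(n):
    dW = rng.multivariate_normal(np.zeros(n_fac), corr) * np.sqrt(delta_n)
    dB = rng.standard_normal(n_fac) * np.sqrt(delta_n)
    dY = np.sqrt(sig2) * dW                            # three correlated factors
    sig2 = np.abs(sig2 + kappa * (theta - sig2) * delta_n
                  + xi * np.sqrt(sig2) * (rho * dB + np.sqrt(1 - rho ** 2) * dW))
    dZ = z * np.sqrt(delta_n) * rng.standard_normal(d)
    X[t + 1] = X[t] + beta @ dY + dZ                   # dX = beta dY + dZ
```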

Slide 30

Verification of the CLT for the Eigenvalues

We estimate the 3 simple integrated eigenvalues as well as the average of the remaining 27 identical eigenvalues.

[Figure: four panels, “CLT: 1st Eigenvalue” through “CLT: 4th Eigenvalue,” showing the distributions of the standardized estimation errors on (−4, 4).]

Slide 31

Integrated Eigenvalues

                     1 Week, 5 Seconds               1 Week, 1 Minute
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
3           1.329   -0.001    0.012    0.990    1.330   -0.002    0.042    1.002
5           2.508   -0.001    0.023    0.990    2.509   -0.003    0.082    1.034
10          4.684   -0.001    0.044    0.990    4.686   -0.007    0.152    1.026
15          6.457   -0.002    0.060    0.994    6.460   -0.011    0.208    1.024
20          7.480   -0.003    0.070    0.993    7.484   -0.012    0.241    1.027
25          8.292   -0.003    0.077    0.994    8.296   -0.014    0.267    1.028
30          8.771   -0.003    0.082    0.994    8.775   -0.015    0.283    1.030

                     1 Week, 5 Minutes              1 Month, 5 Minutes
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
3           1.333   -0.011    0.096    1.050    1.748    0.001    0.065    0.988
5           2.514   -0.012    0.183    1.076    3.430    0.002    0.126    1.004
10          4.696   -0.033    0.336    1.076    6.344   -0.007    0.234    1.006
15          6.473   -0.054    0.463    1.093    8.771   -0.019    0.324    1.019
20          7.499   -0.064    0.537    1.098   10.158   -0.026    0.374    1.020
25          8.313   -0.069    0.596    1.107   11.319   -0.031    0.416    1.022
30          8.793   -0.072    0.631    1.111   12.008   -0.035    0.441    1.023

Table: Simulation Results of 1st Eigenvalue Estimation

Slide 32

Integrated Eigenvalues

                     1 Week, 5 Seconds               1 Week, 1 Minute
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
5           0.123    0.000    0.001    1.028    0.123    0.000    0.003    1.164
10          0.123    0.000    0.000    1.026    0.123    0.000    0.002    1.183
15          0.123    0.000    0.000    1.046    0.123    0.000    0.001    1.260
20          0.123    0.000    0.000    1.046    0.123    0.000    0.001    1.263
25          0.123    0.000    0.000    1.044    0.123    0.000    0.001    1.252
30          0.123    0.000    0.000    1.076    0.123    0.000    0.001    1.241

                     1 Week, 5 Minutes              1 Month, 5 Minutes
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
5           0.123    0.000    0.007    1.305    0.123    0.003    0.004    1.689
10          0.123    0.002    0.005    1.762    0.123    0.004    0.003    3.250
15          0.123    0.002    0.005    2.301    0.123    0.004    0.003    4.415
20          0.123    0.002    0.005    2.602    0.123    0.004    0.003    5.194
25          0.123    0.002    0.004    2.724    0.123    0.004    0.003    5.428
30          0.123    0.002    0.004    2.787    0.123    0.004    0.003    5.620

Table: Simulation Results of Repeated Eigenvalue Estimation

Slide 33

The First Integrated Eigenvector

                     1 Week, 5 Seconds               1 Week, 1 Minute
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
3          -0.207    0.000    0.003    0.995   -0.207    0.000    0.011    0.985
           -0.554    0.000    0.004    0.984   -0.554    0.000    0.015    0.982
            0.806    0.000    0.002    0.992    0.807    0.000    0.008    0.979
5           0.042    0.000    0.003    1.020    0.042    0.000    0.009    0.998
           -0.462    0.000    0.004    1.032   -0.463    0.000    0.013    0.998
           -0.321    0.000    0.002    0.984   -0.321    0.001    0.009    1.037
           -0.192    0.000    0.002    0.986   -0.192    0.000    0.007    1.036
            0.802    0.000    0.002    1.043    0.802    0.000    0.006    0.979
10          0.065    0.000    0.002    1.007    0.065    0.000    0.008    0.997
            0.316    0.000    0.002    1.014    0.316    0.000    0.007    1.008
           -0.379    0.000    0.003    1.008   -0.379    0.000    0.010    1.005
            0.179    0.000    0.002    0.988    0.179    0.000    0.007    1.023
            0.092    0.000    0.005    1.021    0.092    0.000    0.018    1.009
           -0.221    0.000    0.002    0.991   -0.221    0.000    0.009    1.024
           -0.117    0.000    0.002    1.007   -0.117    0.000    0.006    1.052
            0.037    0.000    0.002    0.968    0.037    0.000    0.006    1.053
            0.594    0.000    0.004    1.020    0.594    0.000    0.014    0.995
            0.543    0.000    0.003    1.032    0.543    0.000    0.011    0.997

Table: Simulation Results of Eigenvector Estimation (each row is one component of the first eigenvector; the blocks correspond to d = 3, 5, and 10 stocks)

Slide 34

The First Integrated Eigenvector (continued)

                     1 Week, 5 Seconds               1 Week, 1 Minute
# Stocks    True     Bias     Stdev    RMSE     True     Bias     Stdev    RMSE
            0.048    0.000    0.001    1.009    0.048    0.000    0.005    1.024
            0.218    0.000    0.001    1.007    0.219    0.000    0.005    1.037
           -0.281    0.000    0.002    1.005   -0.281    0.000    0.006    1.031
            0.129    0.000    0.001    0.978    0.129    0.000    0.004    1.046
            0.049    0.000    0.003    1.021    0.049    0.000    0.011    1.045
           -0.146    0.000    0.002    0.988   -0.146    0.000    0.006    1.052
           -0.091    0.000    0.001    1.004   -0.091    0.000    0.004    1.065
            0.014    0.000    0.001    0.966    0.014    0.000    0.005    1.072
            0.436    0.000    0.002    1.021    0.436    0.000    0.007    1.010
            0.393    0.000    0.002    1.022    0.393    0.000    0.007    1.031
           -0.187    0.000    0.001    0.993   -0.187    0.000    0.003    1.045
            0.399    0.000    0.001    1.003    0.399    0.000    0.003    1.015
            0.089    0.000    0.001    1.028    0.089    0.000    0.003    1.039
            0.020    0.000    0.001    0.975    0.020    0.000    0.004    1.012
30          0.059    0.000    0.001    1.031    0.059    0.000    0.005    1.059
           -0.064    0.000    0.001    1.048   -0.064    0.000    0.004    1.015
           -0.056    0.000    0.001    1.029   -0.056    0.000    0.004    1.029
            0.207    0.000    0.001    1.027    0.207    0.000    0.004    1.024
            0.184    0.000    0.001    1.008    0.184    0.000    0.005    1.037
            0.185    0.000    0.001    0.987    0.185    0.000    0.005    1.014
            0.113    0.000    0.001    1.023    0.113    0.000    0.004    1.008
            ...

Slide 35

Data

◮ We collect intraday returns of the Dow Jones constituents over the 2003–2012 period from the NYSE TAQ database.
◮ There are 39 different symbols in total, as the constituents may have changed over time.
◮ These stocks have superior liquidity, which mitigates the issues of microstructure noise and asynchronous trading.
◮ Data are sampled at 1-minute frequency and grouped by week.

Slide 36

Time Series Plots of the Cumulative Returns

[Figure: cumulative returns of the Dow Jones constituents, 2003–2013.]

Slide 37

The Scree Graph

◮ We conduct PCA weekly and plot the time-series average of the integrated eigenvalues against their order, i.e. the scree graph.

[Figure: the scree graph of the 30 averaged integrated eigenvalues, plotted against their order.]

Slide 38

Percentage Variation Explained by Eigenvalues

◮ Here we provide the time series of the first three eigenvalues as well as the average of the remaining 27 eigenvalues.

[Figure: weekly percentage of variation explained, 2003–2013; series shown: 1st, 2nd, 3rd, and the average of the eigenvalues ≥ 4.]

Slide 39

The First Three PCs

[Figure: time series of the first three principal components (PC1, PC2, PC3), 2003–2013.]

Slide 40

Time-Varying Factor Loadings on the First PC

[Figure: weekly loadings on the first PC for each constituent, 2003–2013. Panels: GE, XOM, KO, IBM, PG, DD, AA, UTX, MMM, MRK, AXP, MCD, BA, CAT, DIS, HPQ, JNJ, WMT, HD, INTC, T, MSFT, JPM.]

Slide 41

Conclusion

◮ We propose principal component analysis for high frequency data.
  ◮ The asymptotic theory is developed under general assumptions.
  ◮ The in-fill asymptotic approximations work well in finite samples, even when the dimension is as large as 30 and the time span as short as a week.
  ◮ This is in sharp contrast with classical PCA and recent large-d, large-n asymptotic results, in which stronger assumptions are required to obtain much weaker results.

Slide 42

Conclusion

◮ Empirical results suggest:
  ◮ In the short run, stock returns exhibit a factor structure.
  ◮ Three factors explain up to 50–60% of the variation in the data.
  ◮ The idiosyncratic component explains up to 30%.
  ◮ During the crisis, the first PC alone accounts for up to 70% of the variation.