SLIDE 1 Modeling Financial Durations Using Estimating Functions
Yaohua Zhang (1), Jian Zou (2), Nalini Ravishanker (1), Aerambamoorthy Thavaneswaran (3)
(1) Department of Statistics, University of Connecticut; (2) Department of Statistics, Worcester Polytechnic Institute; (3) Department of Statistics, University of Manitoba
QPRC, June 15, 2017
SLIDE 2
Outline
◮ Introduction
◮ Estimating Functions Approach for Log ACD Models
◮ Simulation Study
◮ Application on Real Stock Prices
◮ Summary
SLIDE 3
Background
◮ Investigators are interested in studying the behavior of the exchange rate process
◮ High-frequency price quote data inherently arrive at irregularly spaced times, so the duration between consecutive data points is not uniform
◮ Traditional discrete-time models, which bin the data into equally spaced time intervals, are inadequate (bins that are too small are mostly empty; bins that are too large smooth away detail)
SLIDE 4 Why Do We Care?
◮ Information is important! (How long will it be until prices change?)
◮ Rothschild family
◮ A trader wants to know the time interval, as it could influence the speed with which he places an order
◮ In an active market, a price may last much less than a minute, or even a second
◮ If an automated trading system is used, opportunities may be eliminated
SLIDE 5
Literature Review
◮ Engle & Russell (1998) proposed a nonlinear model for irregularly spaced inter-event durations, called the Autoregressive Conditional Duration (ACD) model
◮ The authors treat the arrival times of the data as a point process with an intensity defined conditional on past activity
◮ Several generalizations have been discussed in the literature (Thavaneswaran et al. 2014)
◮ Developing fast and accurate methods for fitting models to long time series of durations under the least restrictive assumptions is an interesting ongoing research problem
SLIDE 6 A Review of Duration Models
Let $x_i = t_i - t_{i-1}$, where $i = 1, 2, \ldots$, denote a time series of durations, and let $\mathcal{F}^x_{i-1}$ denote the information associated with previous durations. The ACD(p, q) model (Engle & Russell, 1998) is defined as:
$$x_i = \psi_i \varepsilon_i / \mu_\varepsilon, \quad \text{where} \quad \psi_i = \omega + \sum_{j=1}^{p} \alpha_j x_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j}.$$
The conditions $\omega > 0$, $\alpha_j \ge 0$ for $j = 1, \ldots, p$, $\beta_j \ge 0$ for $j = 1, \ldots, q$, and $\sum_{j=1}^{p} \alpha_j + \sum_{j=1}^{q} \beta_j < 1$ ensure that the durations process is non-negative and weakly stationary.
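As a hedged illustration of the ACD recursion, the following Python sketch simulates an ACD(1, 1) series. The function name `simulate_acd11`, the unit-exponential innovations (so that $\mu_\varepsilon = 1$), the burn-in length and the parameter values are all assumptions made for this example, not choices taken from the talk.

```python
import random

def simulate_acd11(n, omega, alpha, beta, seed=0, burn_in=500):
    """Simulate x_i = psi_i * eps_i, psi_i = omega + alpha*x_{i-1} + beta*psi_{i-1}.

    Assumption: eps_i ~ Exp(1), so mu_eps = 1 and E[x_i | past] = psi_i.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = random.Random(seed)
    psi = omega / (1.0 - alpha - beta)  # unconditional mean as a starting value
    x = psi
    out = []
    for i in range(n + burn_in):
        psi = omega + alpha * x + beta * psi   # conditional mean recursion
        x = psi * rng.expovariate(1.0)         # multiplicative innovation
        if i >= burn_in:                       # drop the start-up transient
            out.append(x)
    return out

# Hypothetical parameter values; the implied mean duration is 0.1 / (1 - 0.8) = 0.5.
durations = simulate_acd11(20000, omega=0.1, alpha=0.1, beta=0.7)
sample_mean = sum(durations) / len(durations)
```

Under these assumptions the sample mean should sit close to $\omega / (1 - \alpha - \beta)$, which gives a quick sanity check on the recursion.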
SLIDE 7 A Review of Duration Models Cont’d
The Log ACD(p, q) model (Bauwens 2000, Pacurar 2008) relaxes the restrictions on the parameters that ensure non-negativity of the durations and thus provides greater flexibility than the ACD(p, q) model:
$$x_i = \exp(\psi_i)\, \varepsilon_i / \mu_\varepsilon, \quad \text{where} \quad \psi_i = \omega + \sum_{j=1}^{p} \alpha_j \log x_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j},$$
where the condition $\sum_{j=1}^{\max(p, q)} (\alpha_j + \beta_j) < 1$ ensures weak stationarity.
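To make the added flexibility concrete, here is a minimal Python sketch that simulates a Log ACD(1, 1) series with a negative $\alpha$, something the ACD model would not allow. The name `simulate_log_acd11`, the Exp(1) innovations (so $\mu_\varepsilon = 1$), the zero start-up values and the parameter values are assumptions for illustration only.

```python
import math
import random

def simulate_log_acd11(n, omega, alpha, beta, seed=0, burn_in=500):
    """Simulate x_i = exp(psi_i) * eps_i with
    psi_i = omega + alpha * log(x_{i-1}) + beta * psi_{i-1}.

    Assumption: eps_i ~ Exp(1), so mu_eps = 1.
    """
    # The slides state the condition alpha + beta < 1; the absolute value
    # used here is a slightly more conservative check (an assumption).
    if abs(alpha + beta) >= 1:
        raise ValueError("need |alpha + beta| < 1 for weak stationarity")
    rng = random.Random(seed)
    psi, log_x = 0.0, 0.0            # assumed start-up values
    out = []
    for i in range(n + burn_in):
        psi = omega + alpha * log_x + beta * psi
        x = math.exp(psi) * rng.expovariate(1.0)  # always positive
        log_x = math.log(x)
        if i >= burn_in:
            out.append(x)
    return out

# Hypothetical parameters with a negative alpha.
x = simulate_log_acd11(20000, omega=0.04, alpha=-0.05, beta=0.7)
mean_log = sum(math.log(v) for v in x) / len(x)
```

Because the durations enter through `exp(psi)`, they stay positive whatever the signs of the coefficients.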
SLIDE 8 The Problem
Suppose durations data $\{x_i\}_{i=1}^{n}$ that follow the Log ACD(p, q) model are available. Let g = max(p, q). The maximum likelihood estimates (MLEs) of θ may be obtained by maximizing the conditional likelihood function (Tsay 2009):
$$L(\theta \mid \mathbf{x}_n) = \prod_{i=g+1}^{n} f(x_i \mid \mathbf{x}_{i-1}, \theta).$$
◮ In practice, the true $f_\varepsilon(\cdot)$ is usually unknown
◮ In some cases, the ML or QML approach may not be feasible (Thavaneswaran, Ravishanker & Liang, 2014)
◮ The model orders (p, q) are unknown
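For reference, the conditional likelihood can be sketched in Python for one specific case. The sketch assumes a Log ACD(1, 1) model with Exp(1) innovations, so that $x_i$ given the past is exponential with mean $\exp(\psi_i)$; the function name `log_acd11_exp_loglik` and the start-up value $\psi = 0$ are hypothetical choices, and the innovation density is exactly the part that is usually unknown in practice.

```python
import math

def log_acd11_exp_loglik(theta, x):
    """Conditional log-likelihood of a Log ACD(1, 1) model.

    Assumption: eps_i ~ Exp(1), so x_i | past is exponential with mean exp(psi_i).
    """
    omega, alpha, beta = theta
    psi = 0.0        # assumed start-up value for the recursion
    loglik = 0.0
    for i in range(1, len(x)):   # g = max(p, q) = 1: condition on the first duration
        psi = omega + alpha * math.log(x[i - 1]) + beta * psi
        # log density of an exponential with mean exp(psi)
        loglik += -psi - x[i] * math.exp(-psi)
    return loglik

# Toy durations and parameter values, for illustration only.
toy = [0.8, 1.3, 0.4, 2.1, 0.9]
value = log_acd11_exp_loglik((0.05, 0.05, 0.7), toy)
```

Maximizing this function over `theta` with a numerical optimizer would give the (quasi-)MLE under the stated exponential assumption.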
SLIDE 9
General Framework
◮ We propose a semi-parametric estimation approach which is based on combined martingale estimating functions
◮ It only requires the specification of the first four conditional moments of the duration process
◮ Our method can be easily extended by adding a penalty term
SLIDE 10 General Framework Cont’d
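Suppose $x_i$ is a realization of a duration process and let $\mathcal{F}^x_{i-1}$ denote the information associated with $\{x_1, \ldots, x_{i-1}\}$. Suppose the first four conditional moments of $\{x_i\}$ given $\mathcal{F}^x_{i-1}$ are $\mu_i(\theta)$, $\sigma_i^2(\theta)$, $\gamma_i(\theta)$, and $\kappa_i(\theta)$. Define the linear and quadratic martingale differences by $m_i(\theta) = x_i - \mu_i(\theta)$ and $M_i(\theta) = m_i^2(\theta) - \sigma_i^2(\theta)$. Their quadratic variations and covariation are
$$\langle m \rangle_i = E[m_i^2(\theta) \mid \mathcal{F}^x_{i-1}] = \sigma_i^2(\theta),$$
$$\langle M \rangle_i = E[m_i^4(\theta) \mid \mathcal{F}^x_{i-1}] - \big(E[m_i^2(\theta) \mid \mathcal{F}^x_{i-1}]\big)^2 = \kappa_i(\theta) - \sigma_i^4(\theta),$$
$$\langle m, M \rangle_i = E[m_i^3(\theta) \mid \mathcal{F}^x_{i-1}] = \gamma_i(\theta).$$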
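The quantities on this slide can be computed directly once the four conditional moments are specified. The Python sketch below does so; the function names are hypothetical, and the helper `exp_log_acd_moments` assumes Exp(1) innovations with $\mu_\varepsilon = 1$ in a Log ACD model, for which the conditional mean, variance, third and fourth central moments are $e^{\psi}$, $e^{2\psi}$, $2e^{3\psi}$ and $9e^{4\psi}$.

```python
import math

def martingale_terms(x, mu, sigma2, gamma, kappa):
    """Linear and quadratic martingale differences and their variations."""
    m = x - mu               # linear martingale difference m_i
    M = m * m - sigma2       # quadratic martingale difference M_i
    return {
        "m": m,
        "M": M,
        "var_m": sigma2,                 # <m>_i
        "var_M": kappa - sigma2 ** 2,    # <M>_i = kappa_i - sigma_i^4
        "cov_mM": gamma,                 # <m, M>_i
    }

def exp_log_acd_moments(psi):
    """Conditional moments of x_i = exp(psi) * eps_i, assuming eps_i ~ Exp(1)."""
    scale = math.exp(psi)
    # mean, variance, third central moment, fourth central moment
    return scale, scale ** 2, 2.0 * scale ** 3, 9.0 * scale ** 4

# Illustrative values: observed duration 3.0 with psi_i = 0.
terms = martingale_terms(3.0, *exp_log_acd_moments(0.0))
```

Only these four moments enter the computation; no full innovation density is required, which is the point of the semi-parametric approach.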
SLIDE 11 General Framework Cont’d
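Consider the class $\mathcal{M}$ of zero-mean, square integrable p-dimensional martingale estimating functions,
$$\mathcal{M} = \Big\{ g_n(\theta) : g_n(\theta) = \sum_{i=1}^{n} \big( a_{i-1}(\theta)\, m_i(\theta) + b_{i-1}(\theta)\, M_i(\theta) \big) \Big\},$$
where $a_{i-1}(\theta)$ and $b_{i-1}(\theta)$ are p × q matrices that are functions of θ and $x_1, \ldots, x_{i-1}$, $1 \le i \le n$.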
SLIDE 12
Three Approaches
◮ Nonlinear Equation Solver Estimation (NESE): solve the system of nonlinear equations $g^*_C(\theta) = 0$ for θ
◮ Approximate Vector Recursive Estimation (AVRE): estimate θ via recursive formulas
◮ Approximate Iterated Scalar Recursive Estimation (AISRE): estimate θ through a sequence of scalar recursions for each component, iterating these to convergence
SLIDE 13 Starting Values for the Recursion
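Suppose $\{x_i\}$ follows the Log ACD(p, q) model. The natural logarithm of $x_i$ is $y_i = \log x_i$. Then,
$$\begin{aligned}
y_i &= \psi_i + \log \varepsilon_i - \log \mu_\varepsilon \\
&= \omega + \sum_{j=1}^{p} \alpha_j y_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j} + \log \varepsilon_i - \log \mu_\varepsilon \\
&= \omega + \sum_{j=1}^{p} \alpha_j y_{i-j} + \sum_{j=1}^{q} \beta_j \big( y_{i-j} - \log \varepsilon_{i-j} + \log \mu_\varepsilon \big) + \log \varepsilon_i - \log \mu_\varepsilon \\
&= \omega^\star + \sum_{j=1}^{p} \alpha_j y_{i-j} + \sum_{j=1}^{q} \beta_j y_{i-j} - \sum_{j=1}^{q} \beta_j \nu_{i-j} + \nu_i,
\end{aligned}$$
from which it follows that $y_i = \log x_i$ follows an ARMA(max(p, q), q) model with non-normal errors, i.e.,
$$\Big(1 - \sum_{j=1}^{\max(p, q)} (\alpha_j + \beta_j) B^j \Big) y_i = \omega^\star + \Big(1 - \sum_{j=1}^{q} \beta_j B^j \Big) \nu_i.$$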
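In the simplest case, Log ACD(1, 0), the ARMA representation reduces to an AR(1) model for the log durations, $y_i = \omega^\star + \alpha_1 y_{i-1} + \nu_i$, so ordinary least squares gives quick starting values. The Python sketch below shows this special case only; the function name `ar1_starting_values` is hypothetical, and using OLS here is an assumption of the example rather than the procedure used in the talk. For q > 0 a full ARMA fit would be needed.

```python
import math

def ar1_starting_values(durations):
    """OLS fit of y_i = omega_star + alpha * y_{i-1} + nu_i on log durations.

    Returns (omega_star, alpha) as starting values for a Log ACD(1, 0) recursion.
    """
    y = [math.log(d) for d in durations]
    y_lag, y_cur = y[:-1], y[1:]
    n = len(y_cur)
    mean_lag = sum(y_lag) / n
    mean_cur = sum(y_cur) / n
    sxx = sum((a - mean_lag) ** 2 for a in y_lag)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(y_lag, y_cur))
    alpha0 = sxy / sxx                          # slope estimate
    omega_star0 = mean_cur - alpha0 * mean_lag  # intercept estimate
    return omega_star0, alpha0

# Noise-free toy series with y_i = 1 + 0.5 * y_{i-1}, for illustration only.
toy_logs = [0.0, 1.0, 1.5, 1.75, 1.875]
omega_star0, alpha0 = ar1_starting_values([math.exp(v) for v in toy_logs])
```

Note that the intercept recovered this way is $\omega^\star$, not $\omega$; converting between the two requires the mean of $\log \varepsilon_i$, which depends on the assumed innovation distribution.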
SLIDE 14 Simulation Study
Table: Percentiles of parameter estimates for the Log ACD(p, q) models for L = 250 simulated durations of length n = 7500.
fε(.)             Param   True   NESE                 AVRE                 AISRE
                                  5th   50th   95th    5th   50th   95th    5th   50th   95th
Gamma(0.6, 0.7)   ω       0.25   0.23   0.25   0.26   0.23   0.25   0.26   0.24   0.25   0.27
                  α       0.06   0.04   0.06   0.08   0.04   0.06   0.08   0.05   0.06   0.08
Exp(1)            ω       0.04   0.02   0.04   0.07   0.02   0.04   0.18   0.03   0.04   0.06
                  α       0.05   0.03   0.05   0.07   0.02   0.05   0.24   0.04   0.05   0.07
                  β       0.75   0.42   0.74   0.89   0.48   0.73   0.83   0.62   0.73   0.83
Weibull(0.4, 0.5) ω       1.00   0.37   1.12   3.65   0.63   1.06   1.83   0.63   1.08   1.90
                  α       0.05   0.01   0.05   0.08  −0.03   0.05   0.26   0.04   0.05   0.07
                  β       0.60  −0.45   0.55   0.85   0.32   0.58   0.75   0.29   0.57   0.75
Weibull(0.9, 0.9) ω       0.50   0.37   0.51   0.68   0.42   0.51   0.62   0.42   0.51   0.62
                  α1      0.05   0.03   0.05   0.07   0.03   0.05   0.07   0.03   0.05   0.07
                  α2      0.10   0.07   0.10   0.13   0.07   0.10   0.13   0.08   0.10   0.12
                  β       0.60   0.47   0.59   0.69   0.52   0.59   0.65   0.52   0.59   0.65
Gamma(0.5, 0.8)   ω       0.15   0.07   0.18   0.63  −0.15   0.20   0.55   0.09   0.19   1.61
                  α1      0.10   0.08   0.10   0.11  −0.02   0.10   0.22   0.08   0.10   0.12
                  α2     −0.05  −0.07  −0.04   0.01  −0.55  −0.04   0.19  −0.07  −0.04  −0.01
                  β1      0.05  −0.54   0.02   0.15  −0.34  −0.01   0.37  −0.26   0.01   0.14
                  β2      0.70   0.28   0.68   0.78   0.11   0.66   0.78   0.45   0.67   0.76
SLIDE 15
Penalized Estimating Equations
◮ Penalized methods are usually used in regression settings
◮ However, the literature on variable selection in estimating equations is sparse
◮ Wang et al. (2012) and the references therein discussed penalized generalized estimating equations in the longitudinal setting
SLIDE 16 Penalized Estimating Equations Cont’d
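◮ Recap
$$\mathcal{M} = \Big\{ g_n(\theta) : g_n(\theta) = \sum_{i=1}^{n} \big( a_{i-1}(\theta)\, m_i(\theta) + b_{i-1}(\theta)\, M_i(\theta) \big) \Big\}$$
◮ Now
$$g^*_C(\theta) - n\, p'_\lambda(|\theta|)$$
where $p'_\lambda(|\theta|)$ is the first derivative of the Smoothly Clipped Absolute Deviation (SCAD) penalty (Fan & Li 2001) and is defined as
$$p'_\lambda(|\theta|) = \lambda \Big\{ I(|\theta| \le \lambda) + \frac{(a\lambda - |\theta|)_+}{(a - 1)\lambda}\, I(|\theta| > \lambda) \Big\}$$
◮ Remark: SCAD can achieve unbiasedness (unlike LASSO), sparsity and continuity.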
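The SCAD derivative is simple to code. The following Python sketch implements the formula on this slide for a scalar θ; the function name `scad_derivative` is hypothetical, and the default `a = 3.7` is the value suggested by Fan and Li (2001), which is an assumption here since the slides do not state the value used.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative p'_lambda(|theta|) of the SCAD penalty (lam > 0, a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam                        # LASSO-like constant rate near zero
    # lam * (a*lam - t)_+ / ((a - 1) * lam) simplifies to (a*lam - t)_+ / (a - 1)
    return max(a * lam - t, 0.0) / (a - 1.0)

# Illustrative values with lam = 0.1: small, moderate and large coefficients.
small = scad_derivative(0.05, 0.1)
moderate = scad_derivative(0.2, 0.1)
large = scad_derivative(1.0, 0.1)
```

The derivative equals λ for small coefficients, decays linearly between λ and aλ, and vanishes beyond aλ, so large coefficients are not shrunk.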
SLIDE 17
Illustrative Simulation Study
Table: Percentiles of parameter estimates for the Log ACD(p, 0) models for L = 500 durations of length n = 7500.
fε(.)             Param   True   EF w Penalty
                                  5th   50th   95th
Gamma(0.5, 0.6)   ω       0.25   0.22   0.25   0.27
                  α1      0.20   0.19   0.20   0.20
                  α2      0.10   0.09   0.10   0.10
Gamma(0.5, 0.6)   ω       0.10   0.06   0.09   0.14
                  α1      0.10   0.09   0.10   0.10
                  α2      0.05   0.04   0.05   0.05
                  α3      0.05   0.04   0.05   0.05
                  α4      0.10   0.09   0.10   0.10
SLIDE 18
Illustrative Simulation Study Cont’d
[Figure: estimates θ (ω, α1, ..., α20) plotted against the penalty parameter λ]
Figure: Solution path for the simulated Log ACD(2, 0) model. The vertical bar represents the optimal λ.
SLIDE 19
Application on Real Stock Prices
◮ Stock price data from a few trading days in June 2013 for four assets (BAC, GE, IBM and MMM)
◮ The data set is obtained from the Trade and Quotes (TAQ) database at Wharton Research Data Services (WRDS) from the Wharton School at the University of Pennsylvania
◮ An event occurs when the change in log return between two successive transactions exceeds a given threshold, $\varpi$. We then define a duration as the elapsed time between two successive occurrences of this event
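The event-based definition of a duration can be sketched in a few lines of Python. This sketch reads the definition as "the absolute log return between two successive transactions exceeds the threshold", which is an interpretation; the function name `event_durations` and the toy data are hypothetical and are not taken from the TAQ data set.

```python
import math

def event_durations(times, prices, threshold):
    """Durations between successive threshold-exceeding price moves.

    Assumption: an event occurs at transaction j when
    |log(prices[j] / prices[j-1])| > threshold.
    """
    event_times = [
        times[j]
        for j in range(1, len(prices))
        if abs(math.log(prices[j] / prices[j - 1])) > threshold
    ]
    # Elapsed time between consecutive events
    return [t1 - t0 for t0, t1 in zip(event_times, event_times[1:])]

# Toy transactions (times in seconds), for illustration only.
toy_times = [0, 1, 2, 5, 9, 10]
toy_prices = [100.0, 100.0, 101.0, 101.0, 102.0, 102.01]
toy_durations = event_durations(toy_times, toy_prices, threshold=0.005)
```

A larger threshold thins the event sequence and therefore lengthens the resulting durations.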
SLIDE 20
Diurnal Effect
Following Tsay (2005), we adjust the raw durations to remove the time-of-day effect:
$$X_i = x_i\, \phi(t_i).$$
[Figure: Time of Day effect. Mean of durations plotted against time in seconds.]
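A crude version of the diurnal adjustment can be sketched in Python as follows. The sketch assumes that $X_i$ is the raw duration and $\phi(t_i)$ the time-of-day factor, so the adjusted duration is $x_i = X_i / \phi(t_i)$, and it estimates $\phi$ by the mean duration within fixed-width time bins. Both the binning estimator and the name `diurnal_adjust` are assumptions of this example; Tsay (2005) uses smoother estimates of the daily pattern.

```python
def diurnal_adjust(times, durations, bin_width):
    """Divide each raw duration by the mean duration of its time-of-day bin."""
    sums, counts = {}, {}
    for t, d in zip(times, durations):
        b = int(t // bin_width)            # index of the time-of-day bin
        sums[b] = sums.get(b, 0.0) + d
        counts[b] = counts.get(b, 0) + 1
    phi = {b: sums[b] / counts[b] for b in sums}   # binned estimate of phi(t)
    return [d / phi[int(t // bin_width)] for t, d in zip(times, durations)]

# Toy data: seconds since the open and raw durations, for illustration only.
toy_times = [0, 100, 1000, 1100]
toy_raw = [2.0, 4.0, 1.0, 3.0]
adjusted = diurnal_adjust(toy_times, toy_raw, bin_width=1000)
```

After this adjustment the durations average to one within each bin, so the remaining variation reflects dynamics rather than the intraday pattern.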
SLIDE 21
Model fitting for BAC
Table: Parameter estimates for adjusted BAC durations in June, 2013.
Date      ω̂  α̂1  α̂2  α̂3  β̂1  β̂2  β̂3  d
20130603  0.227  0.047  0.001
20130604  0.319  0.103  0.005
20130605  0.258  0.074  0.001
20130606  0.289  0.093  0.006
20130607  0.261  0.034  0.000
20130610  0.299  0.011  0.005
20130611  −0.009  −1.146  0.054
20130612  0.330  0.051  0.001
20130613  0.293  0.025  0.001
20130614  0.137  0.046  0.048  −0.057  0.753  0.111
20130617  −0.015  0.555  0.305
20130618  0.045  0.014  0.041  −0.009  0.167  0.394  0.341  0.162
20130619  −0.501  −0.625  0.767
20130620  0.240  0.066  0.002
20130621  0.213  0.059  0.002
SLIDE 22 Model Fitting Using Penalized EF
Table: Parameter estimates for adjusted durations in June, 2013 using penalized estimating functions.
            BAC                  GE                   IBM                  MMM
Date          λ     ω̂      d      λ     ω̂      d      λ     ω̂      d      λ     ω̂      d
20130603    0.7  0.220  0.010    0.5  0.275  0.009     10  0.425  0.023     14  0.428  0.025
20130604    0.7  0.306  0.018    0.7  0.358  0.012     10  0.421  0.029     20  0.447  0.029
20130605    0.5  0.251  0.012    1.0  0.364  0.026     10  0.405  0.041     41  0.463  0.022
20130606    0.5  0.255  0.007    1.0  0.341  0.021     10  0.426  0.031     50  0.437  0.055
20130607    1.0  0.289  0.020    1.0  0.335  0.019     10  0.442  0.030     34  0.467  0.024
20130610    0.7  0.315  0.019    1.0  0.344  0.016     10  0.437  0.036     24  0.430  0.027
20130611    0.5  0.324  0.009    1.0  0.369  0.013     10  0.437  0.032     50  0.475  0.037
20130612    0.5  0.285  0.010    1.0  0.397  0.010     10  0.417  0.037      2  0.458  0.025
20130613    0.7  0.316  0.019    1.0  0.360  0.027     10  0.427  0.021      8  0.419  0.048
20130614    0.7  0.341  0.013    1.0  0.386  0.035     10  0.451  0.032     88  0.505  0.034
20130617    0.7  0.316  0.019    0.7  0.332  0.008     10  0.443  0.028     11  0.401  0.020
20130618    0.7  0.341  0.013    1.0  0.318  0.035     10  0.464  0.040     24  0.445  0.016
20130619    1.0  0.339  0.040    0.7  0.285  0.013     40  0.454  0.072     90  0.499  0.085
20130620    1.0  0.231  0.013    0.5  0.271  0.011    1.0  0.359  0.010     11  0.401  0.020
20130621    1.0  0.203  0.015    0.5  0.269  0.007    1.0  0.323  0.025     24  0.445  0.016
SLIDE 23
Summary
◮ Three different estimation approaches for modeling durations using Log ACD(p, q)
◮ Good starting values are important
◮ Our method is naturally appealing for a wide range of financial modeling problems
◮ The penalized approach may encounter overfitting problems
SLIDE 24
Questions?
Thank You!