Arthur Charpentier, Causality & (non-Gaussian) Time Series, P7

Causality with Non-Gaussian Time Series

Arthur Charpentier (Université de Rennes 1 & UQÀM)

Université Paris 7 Diderot, May 2016. http://freakonometrics.hypotheses.org
Motivation (Earthquakes)

[Figure: number of earthquakes (magnitude > 2) per 15 sec., from 15 days before to 15 days after a major earthquake (magnitude > 6.5), normalized so that the average before is 100; one curve for earthquakes on the same tectonic plate as the major one, one for earthquakes on a different tectonic plate.]

see Boudreault & C. (2011) on contagion among tectonic plates
@freakonometrics
Motivation (Onsite vs. Online)

- onsite: protestors, camped-out, arrests and injuries
- vs. online: #indignados, #occupy and #vinegar on Twitter & Facebook

see Bastos, Mercea & C. (2015)
Multivariate Stationary Time Series

Definition A time series $(X_t = (X_{1,t}, \cdots, X_{d,t}))_{t\in\mathbb{Z}}$ with values in $\mathbb{R}^d$ is called a VAR(1) process if
$$\begin{cases}
X_{1,t} = \phi_{1,1}X_{1,t-1} + \phi_{1,2}X_{2,t-1} + \cdots + \phi_{1,d}X_{d,t-1} + \varepsilon_{1,t}\\
X_{2,t} = \phi_{2,1}X_{1,t-1} + \phi_{2,2}X_{2,t-1} + \cdots + \phi_{2,d}X_{d,t-1} + \varepsilon_{2,t}\\
\qquad\vdots\\
X_{d,t} = \phi_{d,1}X_{1,t-1} + \phi_{d,2}X_{2,t-1} + \cdots + \phi_{d,d}X_{d,t-1} + \varepsilon_{d,t}
\end{cases}\qquad(1)$$
or equivalently
$$\underbrace{\begin{pmatrix}X_{1,t}\\ X_{2,t}\\ \vdots\\ X_{d,t}\end{pmatrix}}_{X_t} = \underbrace{\begin{pmatrix}\phi_{1,1}&\phi_{1,2}&\cdots&\phi_{1,d}\\ \phi_{2,1}&\phi_{2,2}&\cdots&\phi_{2,d}\\ \vdots&&\ddots&\vdots\\ \phi_{d,1}&\phi_{d,2}&\cdots&\phi_{d,d}\end{pmatrix}}_{\Phi} \underbrace{\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\\ \vdots\\ X_{d,t-1}\end{pmatrix}}_{X_{t-1}} + \underbrace{\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\\ \vdots\\ \varepsilon_{d,t}\end{pmatrix}}_{\varepsilon_t}$$
Multivariate Stationary Time Series

for some real-valued $d\times d$ matrix $\Phi$, and some i.i.d. random vectors $\varepsilon_t$ with values in $\mathbb{R}^d$. Assume that $\varepsilon_t$ is a Gaussian white noise $\mathcal{N}(\mathbf{0},\Sigma)$, with density
$$f(\varepsilon)=\frac{1}{\sqrt{(2\pi)^d\,|\det\Sigma|}}\exp\left(-\frac{\varepsilon^{\mathsf{T}}\Sigma^{-1}\varepsilon}{2}\right),\qquad\forall\varepsilon\in\mathbb{R}^d.$$
Assume also that $\varepsilon_t$ is independent of $\mathcal{X}_{t-1}=\sigma(\{X_{t-1},X_{t-2},\cdots\})$: $(\varepsilon_t)_{t\in\mathbb{Z}}$ is the innovation process.

Definition A time series $(X_t)_{t\in\mathbb{N}}$ is said to be (weakly) stationary if
- $\mathbb{E}(X_t)$ is independent of $t$ ($=:\mu$)
- $\mathrm{cov}(X_t,X_{t-h})$ is independent of $t$ ($=:\gamma(h)$), called the autocovariance matrix
Multivariate Stationary Time Series

Define the autocorrelation matrix $\rho(h):=\Delta^{-1}\gamma(h)\Delta^{-1}$, where $\Delta:=\mathrm{diag}(\gamma(0))^{1/2}$ is the diagonal matrix of standard deviations.

Let $(X_t)_{t\in\mathbb{N}}$ be a VAR(1) time series, $X_t=\Phi X_{t-1}+\varepsilon_t$.

Proposition $(X_t)_{t\in\mathbb{N}}$ is a stationary VAR(1) time series if and only if the $d$ eigenvalues of $\Phi$ all have modulus strictly less than 1.

Proposition If $(X_t)_{t\in\mathbb{N}}$ is a stationary VAR(1) time series, $\gamma(h)=\Phi^h\gamma(0)$, $h\in\mathbb{N}$.
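The eigenvalue condition can be checked numerically; a minimal Python sketch (the matrix $\Phi$ and the standard Gaussian noise below are illustrative, not from the slides):

```python
import math
import random

# Stationarity of a VAR(1): X_t = Phi X_{t-1} + eps_t is stationary iff all
# eigenvalues of Phi have modulus < 1. For a 2x2 Phi the eigenvalues solve
# z^2 - tr(Phi) z + det(Phi) = 0.
Phi = [[0.5, 0.2],
       [0.1, 0.3]]
tr = Phi[0][0] + Phi[1][1]
det = Phi[0][0] * Phi[1][1] - Phi[0][1] * Phi[1][0]
disc = tr * tr - 4 * det
if disc >= 0:
    moduli = [abs((tr + math.sqrt(disc)) / 2), abs((tr - math.sqrt(disc)) / 2)]
else:                       # complex conjugate pair, common modulus sqrt(det)
    moduli = [math.sqrt(det)] * 2
spectral_radius = max(moduli)

# Simulate the process; the sample mean should settle near the stationary mean 0.
random.seed(1)
x0 = x1 = 0.0
total0 = 0.0
n = 20_000
for _ in range(n):
    e0, e1 = random.gauss(0, 1), random.gauss(0, 1)
    x0, x1 = (Phi[0][0] * x0 + Phi[0][1] * x1 + e0,
              Phi[1][0] * x0 + Phi[1][1] * x1 + e1)
    total0 += x0
mean1 = total0 / n
```

Here the spectral radius is about 0.57, so the simulated path is stationary.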
Causality, in dimension 2

Two stationary time series $(X_t,Y_t)_{t\in\mathbb{Z}}$. Heuristics on independence:
$$f(x_t,y_t|\boldsymbol{X}_{t-1},\boldsymbol{Y}_{t-1}) = f(x_t|\boldsymbol{X}_{t-1})\cdot f(y_t|\boldsymbol{Y}_{t-1})$$
Write (with $\boldsymbol{X}$ for $\boldsymbol{X}_{t-1}$)
$$\underbrace{\frac{f(x_t,y_t|\boldsymbol{X},\boldsymbol{Y})}{f(x_t|\boldsymbol{X})\cdot f(y_t|\boldsymbol{Y})}}_{(X,Y)} = \underbrace{\frac{f(x_t|\boldsymbol{X},\boldsymbol{Y})}{f(x_t|\boldsymbol{X})}}_{X\to Y}\cdot \underbrace{\frac{f(y_t|\boldsymbol{X},\boldsymbol{Y})}{f(y_t|\boldsymbol{Y})}}_{X\leftarrow Y}\cdot \underbrace{\frac{f(x_t,y_t|\boldsymbol{X},\boldsymbol{Y})}{f(x_t|\boldsymbol{X},\boldsymbol{Y})\cdot f(y_t|\boldsymbol{X},\boldsymbol{Y})}}_{X\Leftrightarrow Y}$$
Gouriéroux, Monfort & Renault (1987) define the following Kullback causality measure
$$C(X,Y)=\mathbb{E}\left[\log\frac{f(X_t,Y_t|\boldsymbol{X},\boldsymbol{Y})}{f(X_t|\boldsymbol{X})\cdot f(Y_t|\boldsymbol{Y})}\right]$$
Causality, in dimension 2

$$C(X\to Y)=\mathbb{E}\left[\log\frac{f(X_t|\boldsymbol{X},\boldsymbol{Y})}{f(X_t|\boldsymbol{X})}\right]\qquad C(Y\to X)=\mathbb{E}\left[\log\frac{f(Y_t|\boldsymbol{X},\boldsymbol{Y})}{f(Y_t|\boldsymbol{Y})}\right]$$
$$C(X\Leftrightarrow Y)=\mathbb{E}\left[\log\frac{f(X_t,Y_t|\boldsymbol{X},\boldsymbol{Y})}{f(X_t|\boldsymbol{X},\boldsymbol{Y})\cdot f(Y_t|\boldsymbol{X},\boldsymbol{Y})}\right]$$
so that $C(X,Y)=C(X\to Y)+C(X\leftarrow Y)+C(X\Leftrightarrow Y)$.

From Granger (1969),
- $(X)$ causes $(Y)$ at time $t$ if $\mathcal{L}(y_t|\boldsymbol{X}_{t-1},\boldsymbol{Y}_{t-1})\neq\mathcal{L}(y_t|\boldsymbol{Y}_{t-1})$
- $(X)$ causes $(Y)$ instantaneously at time $t$ if $\mathcal{L}(y_t|\boldsymbol{X}_t,\boldsymbol{Y}_{t-1})\neq\mathcal{L}(y_t|\boldsymbol{X}_{t-1},\boldsymbol{Y}_{t-1})$
Causality, in dimension 2, for VAR(1) time series

$$\underbrace{\begin{pmatrix}X_t\\ Y_t\end{pmatrix}}_{X_t} = \underbrace{\begin{pmatrix}\phi_{1,1}&\phi_{1,2}\\ \phi_{2,1}&\phi_{2,2}\end{pmatrix}}_{\Phi}\underbrace{\begin{pmatrix}X_{t-1}\\ Y_{t-1}\end{pmatrix}}_{X_{t-1}} + \underbrace{\begin{pmatrix}u_t\\ v_t\end{pmatrix}}_{\varepsilon_t},\quad\text{with } \mathrm{Var}\begin{pmatrix}u_t\\ v_t\end{pmatrix}=\begin{pmatrix}\sigma_u^2&\sigma_{uv}\\ \sigma_{uv}&\sigma_v^2\end{pmatrix}$$
From Granger (1969) (see also Toda & Phillips (1994)),
- $(X)$ causes $(Y)$ at time $t$, $X\to Y$, if $\phi_{2,1}\neq 0$
- $(Y)$ causes $(X)$ at time $t$, $Y\to X$, if $\phi_{1,2}\neq 0$
- $(X)$ causes $(Y)$ instantaneously at time $t$, $X\Leftrightarrow Y$, if $\sigma_{u,v}\neq 0$
Testing Causality, in dimension d

For lagged causality, we test $H_0:\Phi\in\mathcal{P}$ against $H_1:\Phi\notin\mathcal{P}$, where $\mathcal{P}$ is a set of matrices with a constrained shape, e.g. $\mathcal{P}$ is the set of $d\times d$ diagonal matrices for lagged independence, or a set of block triangular matrices for lagged causality.

Proposition Let $\widehat{\Phi}$ denote the conditional maximum likelihood estimate of $\Phi$ in the non-constrained VAR(1) model, and $\widehat{\Phi}_c$ denote the conditional maximum likelihood estimate of $\Phi$ in the constrained model; then, under suitable conditions,
$$2[\log\mathcal{L}(\boldsymbol{X},\widehat{\Phi}|X_0)-\log\mathcal{L}(\boldsymbol{X},\widehat{\Phi}_c|X_0)]\overset{\mathcal{L}}{\to}\chi^2(d^2-\dim(\mathcal{P})),\text{ as }T\to\infty,\text{ under }H_0.$$

Example Testing $(X_{1,t})\leftarrow(X_{2,t})$ is testing whether $\phi_{1,2}=0$, or not.
Modeling Counts Processes

Steutel & van Harn (1979) defined a thinning operator as follows.

Definition Define operator $\circ$ as
$$p\circ N=\sum_{i=1}^N Y_i = Y_1+\cdots+Y_N \text{ if } N\neq 0,\text{ and }0\text{ otherwise},$$
where $N$ is a random variable with values in $\mathbb{N}$, $p\in[0,1]$, and $Y_1,Y_2,\cdots$ are i.i.d. Bernoulli variables, independent of $N$, with $\mathbb{P}(Y_i=1)=p=1-\mathbb{P}(Y_i=0)$.

Thus $p\circ N$ is a compound sum of i.i.d. Bernoulli variables: given $N$, $p\circ N$ has a binomial distribution $\mathcal{B}(N,p)$.

Note that $p\circ(q\circ N)\overset{\mathcal{L}}{=}[pq]\circ N$ for all $p,q\in[0,1]$. Further, $\mathbb{E}(p\circ N)=p\,\mathbb{E}(N)$ and $\mathrm{Var}(p\circ N)=p^2\mathrm{Var}(N)+p(1-p)\mathbb{E}(N)$.
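The moment formulas can be verified by simulation; a small sketch, assuming $N\sim\mathcal{P}(\lambda)$ so that $\mathbb{E}(N)=\mathrm{Var}(N)=\lambda$ and both moments of $p\circ N$ reduce to $p\lambda$ (thinning preserves the Poisson family):

```python
import math
import random

# Check E(p∘N) = p E(N) and Var(p∘N) = p^2 Var(N) + p(1-p) E(N)
# by Monte Carlo, for N ~ Poisson(lam); both reduce to p*lam here.
random.seed(42)
p, lam, n_sims = 0.4, 5.0, 200_000

def poisson(l):
    # Knuth's multiplicative method, fine for small l
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def thin(p, n):
    # p ∘ n: sum of n i.i.d. Bernoulli(p) variables
    return sum(1 for _ in range(n) if random.random() < p)

samples = [thin(p, poisson(lam)) for _ in range(n_sims)]
mean = sum(samples) / n_sims
var = sum((s - mean) ** 2 for s in samples) / n_sims
```

Both `mean` and `var` should be close to `p * lam = 2.0`.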
(Poisson) Integer AutoRegressive processes INAR(1)

Based on that thinning operator, Al-Osh & Alzaid (1987) and McKenzie (1985) defined the integer autoregressive process of order 1:

Definition A time series $(X_t)_{t\in\mathbb{N}}$ with values in $\mathbb{N}$ is called an INAR(1) process if
$$X_t=p\circ X_{t-1}+\varepsilon_t,\qquad(2)$$
where $(\varepsilon_t)$ is a sequence of i.i.d. integer-valued random variables, i.e.
$$X_t=\sum_{i=1}^{X_{t-1}}Y_i+\varepsilon_t,$$
where the $Y_i$'s are i.i.d. $\mathcal{B}(p)$.

Such a process can be related to Galton-Watson processes.
INAR(1) & Galton-Watson

$$X_{t+1}=\sum_{i=1}^{X_t}Y_i+\varepsilon_{t+1},\text{ where the } Y_i\text{'s are i.i.d. }\mathcal{B}(p)$$
Proposition
$$\mathbb{E}(X_t)=\frac{\mathbb{E}(\varepsilon_t)}{1-p},\quad \mathrm{Var}(X_t)=\gamma(0)=\frac{p\,\mathbb{E}(\varepsilon_t)+\mathrm{Var}(\varepsilon_t)}{1-p^2}\quad\text{and}\quad\gamma(h)=\mathrm{cov}(X_t,X_{t-h})=p^h\gamma(0).$$

It is common to assume that the $\varepsilon_t$ are independent variables with a Poisson distribution $\mathcal{P}(\lambda)$, with probability function
$$\mathbb{P}(\varepsilon_t=k)=e^{-\lambda}\frac{\lambda^k}{k!},\qquad k\in\mathbb{N}.$$

Proposition If the $(\varepsilon_t)$ are Poisson random variables, then $(X_t)$ will also be a sequence of Poisson random variables.

Note that we also assume that $\varepsilon_t$ is independent of $\mathcal{X}_{t-1}$, i.e. past observations $X_0,X_1,\cdots,X_{t-1}$. Thus, $(\varepsilon_t)_{t\in\mathbb{N}}$ is called the innovation process.

Proposition $(X_t)_{t\in\mathbb{N}}$ is a stationary INAR(1) time series if and only if $p\in[0,1)$.

Proposition If $(X_t)_{t\in\mathbb{N}}$ is a stationary INAR(1) time series, $(X_t)_{t\in\mathbb{N}}$ is a homogeneous Markov chain.
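A minimal simulation of a Poisson INAR(1) path, checking the stationary mean $\lambda/(1-p)$ and the lag-one autocorrelation $p$ (parameter values are illustrative):

```python
import math
import random

# Poisson INAR(1): X_t = p ∘ X_{t-1} + eps_t, eps_t ~ Poisson(lam).
# Stationary mean is lam / (1 - p); lag-1 autocorrelation is p.
random.seed(0)
p, lam, T = 0.6, 2.0, 100_000

def poisson(l):
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

x = poisson(lam / (1 - p))        # start at the stationary distribution
xs = []
for _ in range(T):
    survivors = sum(1 for _ in range(x) if random.random() < p)  # p ∘ X_{t-1}
    x = survivors + poisson(lam)
    xs.append(x)

mean = sum(xs) / T                 # should be close to lam / (1 - p) = 5
num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, T))
den = sum((v - mean) ** 2 for v in xs)
rho1 = num / den                   # should be close to p = 0.6
```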
Markov Property of INAR(1) Time Series

$$\pi(x_t,x_{t-1})=\mathbb{P}(X_t=x_t|X_{t-1}=x_{t-1})=\sum_{k=0}^{x_t}\underbrace{\mathbb{P}\left(\sum_{i=1}^{x_{t-1}}Y_i=x_t-k\right)}_{\text{Binomial}}\cdot\underbrace{\mathbb{P}(\varepsilon=k)}_{\text{Poisson}}.$$
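The transition probability is a binomial-Poisson convolution, which can be computed exactly and checked against a Monte Carlo frequency; a sketch with illustrative $p$ and $\lambda$:

```python
import math
import random

# pi(x_t, x_{t-1}) = sum_{k=0}^{x_t} P(Binomial(x_{t-1}, p) = x_t - k) * P(Poisson(lam) = k)
p, lam = 0.6, 2.0

def transition(x_prev, x_next):
    total = 0.0
    for k in range(x_next + 1):
        j = x_next - k                      # survivors of the thinning
        if j <= x_prev:
            total += (math.comb(x_prev, j) * p**j * (1 - p)**(x_prev - j)
                      * math.exp(-lam) * lam**k / math.factorial(k))
    return total

def poisson(l):
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

random.seed(7)
n_sims = 200_000
analytic = transition(4, 3)                 # P(X_t = 3 | X_{t-1} = 4)
hits = sum(1 for _ in range(n_sims)
           if sum(1 for _ in range(4) if random.random() < p) + poisson(lam) == 3)
freq = hits / n_sims
```

Both values should agree to Monte Carlo accuracy (the exact value is about 0.1865).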
Inference of INAR(1) Processes

Consider a Poisson INAR(1) process; the likelihood is then
$$\mathcal{L}(p,\lambda;X_0,\boldsymbol{X})=\left[\prod_{t=1}^n f_t(X_t)\right]\cdot\frac{\lambda^{X_0}}{(1-p)^{X_0}X_0!}\exp\left(-\frac{\lambda}{1-p}\right)$$
where
$$f_t(y)=e^{-\lambda}\sum_{i=0}^{\min\{X_t,X_{t-1}\}}\frac{\lambda^{y-i}}{(y-i)!}\binom{X_{t-1}}{i}p^i(1-p)^{X_{t-1}-i},\qquad t=1,\cdots,n.$$
Maximum likelihood estimators are $(\widehat{p},\widehat{\lambda})\in\operatorname{argmax}\{\log\mathcal{L}(p,\lambda;(X_0,\boldsymbol{X}))\}$.
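Conditional maximum likelihood can be sketched with a crude grid search over $(p,\lambda)$, summing the log transition probabilities (grids and simulated data below are illustrative; a real implementation would use a numerical optimizer):

```python
import math
import random
from collections import Counter

random.seed(3)
true_p, true_lam, T = 0.5, 1.5, 1000

def poisson(l):
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def transition(x_prev, x_next, p, lam):
    # Binomial-Poisson convolution pi(x_next | x_prev)
    total = 0.0
    for k in range(x_next + 1):
        j = x_next - k
        if j <= x_prev:
            total += (math.comb(x_prev, j) * p**j * (1 - p)**(x_prev - j)
                      * math.exp(-lam) * lam**k / math.factorial(k))
    return total

# Simulate a path and record transition-pair counts
x, pairs = 3, Counter()
for _ in range(T):
    x_new = sum(1 for _ in range(x) if random.random() < true_p) + poisson(true_lam)
    pairs[(x, x_new)] += 1
    x = x_new

def cond_loglik(p, lam):
    return sum(n * math.log(transition(a, b, p, lam))
               for (a, b), n in pairs.items())

p_grid = [i / 20 for i in range(1, 20)]      # 0.05 ... 0.95
lam_grid = [j / 10 for j in range(5, 31)]    # 0.5 ... 3.0
p_hat, lam_hat = max(((p, l) for p in p_grid for l in lam_grid),
                     key=lambda pl: cond_loglik(*pl))
```

The grid maximizer should land near the true values (0.5, 1.5) up to the grid resolution and sampling noise.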
Multivariate Integer Autoregressive processes MINAR(1)

Let $\boldsymbol{X}_t:=(X_{1,t},\cdots,X_{d,t})$ denote a multivariate vector of counts.

Definition Let $P:=[p_{i,j}]$ be a $d\times d$ matrix with entries in $[0,1]$. If $\boldsymbol{X}=(X_1,\cdots,X_d)$ is a random vector with values in $\mathbb{N}^d$, then $P\circ\boldsymbol{X}$ is a $d$-dimensional random vector, with $i$-th component
$$[P\circ\boldsymbol{X}]_i=\sum_{j=1}^d p_{i,j}\circ X_j,\qquad i=1,\cdots,d,$$
where all counting variates $Y$ in the $p_{i,j}\circ X_j$'s are assumed to be independent.

Note that $P\circ(Q\circ\boldsymbol{X})\overset{\mathcal{L}}{=}[PQ]\circ\boldsymbol{X}$. Further, $\mathbb{E}(P\circ\boldsymbol{X})=P\,\mathbb{E}(\boldsymbol{X})$, and
$$\mathbb{E}\left[(P\circ\boldsymbol{X})(P\circ\boldsymbol{X})^{\mathsf{T}}\right]=P\,\mathbb{E}(\boldsymbol{X}\boldsymbol{X}^{\mathsf{T}})P^{\mathsf{T}}+\Delta,$$
with $\Delta:=\mathrm{diag}(V\,\mathbb{E}(\boldsymbol{X}))$ where $V$ is the $d\times d$ matrix with entries $p_{i,j}(1-p_{i,j})$.
Multivariate Integer Autoregressive processes MINAR(1)

Definition A time series $(\boldsymbol{X}_t)$ with values in $\mathbb{N}^d$ is called a $d$-variate MINAR(1) process if
$$\boldsymbol{X}_t=P\circ\boldsymbol{X}_{t-1}+\boldsymbol{\varepsilon}_t\qquad(3)$$
for all $t$, for some $d\times d$ matrix $P$ with entries in $[0,1]$, and some i.i.d. random vectors $\boldsymbol{\varepsilon}_t$ with values in $\mathbb{N}^d$.

$(\boldsymbol{X}_t)$ is a Markov chain with states in $\mathbb{N}^d$, with transition probabilities
$$\pi(\boldsymbol{x}_t,\boldsymbol{x}_{t-1})=\mathbb{P}(\boldsymbol{X}_t=\boldsymbol{x}_t|\boldsymbol{X}_{t-1}=\boldsymbol{x}_{t-1})\qquad(4)$$
satisfying
$$\pi(\boldsymbol{x}_t,\boldsymbol{x}_{t-1})=\sum_{\boldsymbol{k}=\boldsymbol{0}}^{\boldsymbol{x}_t}\mathbb{P}(P\circ\boldsymbol{x}_{t-1}=\boldsymbol{x}_t-\boldsymbol{k})\cdot\mathbb{P}(\boldsymbol{\varepsilon}=\boldsymbol{k}).$$
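Matrix thinning and the MINAR(1) recursion are straightforward to simulate; the sketch below also checks the stationary mean $\boldsymbol{\mu}=(\mathbb{I}-P)^{-1}\boldsymbol{\lambda}$ ($P$ and $\boldsymbol{\lambda}$ are illustrative, with independent Poisson innovations):

```python
import math
import random

random.seed(5)
P = [[0.3, 0.2],
     [0.1, 0.4]]
lam = [1.0, 2.0]
T = 50_000

def poisson(l):
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def thin(p, n):
    # p ∘ n, a Binomial(n, p) draw
    return sum(1 for _ in range(n) if random.random() < p)

# [P ∘ X]_i = sum_j p_{i,j} ∘ X_j, with independent thinnings
x = [2, 3]
sums = [0.0, 0.0]
for _ in range(T):
    x = [thin(P[0][0], x[0]) + thin(P[0][1], x[1]) + poisson(lam[0]),
         thin(P[1][0], x[0]) + thin(P[1][1], x[1]) + poisson(lam[1])]
    sums[0] += x[0]
    sums[1] += x[1]
means = [s / T for s in sums]

# theoretical mu = (I - P)^{-1} lam, solved by hand in the 2x2 case
a, b, c, d = 1 - P[0][0], -P[0][1], -P[1][0], 1 - P[1][1]
det = a * d - b * c
mu = [( d * lam[0] - b * lam[1]) / det,
      (-c * lam[0] + a * lam[1]) / det]       # here mu = [2.5, 3.75]
```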
Inference for MINAR(1)

Proposition Let $(\boldsymbol{X}_t)$ be a $d$-variate MINAR(1) process satisfying stationarity conditions, as well as technical assumptions (called C1-C6 in Franke & Subba Rao (1993)); then the conditional maximum likelihood estimate $\widehat{\theta}$ of $\theta=(P,\Lambda)$ is asymptotically normal,
$$\sqrt{n}(\widehat{\theta}-\theta)\overset{\mathcal{L}}{\to}\mathcal{N}(\boldsymbol{0},\Sigma^{-1}(\theta)),\text{ as }n\to\infty.$$
Further,
$$2[\log\mathcal{L}(\boldsymbol{N},\widehat{\theta}|\boldsymbol{N}_0)-\log\mathcal{L}(\boldsymbol{N},\theta|\boldsymbol{N}_0)]\overset{\mathcal{L}}{\to}\chi^2(d^2+\dim(\lambda)),\text{ as }n\to\infty.$$
Granger causality with BINAR(1)

Consider the bivariate INAR(1) process
$$\underbrace{\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}}_{\boldsymbol{X}_t}=\underbrace{\begin{pmatrix}p_{1,1}&p_{1,2}\\ p_{2,1}&p_{2,2}\end{pmatrix}}_{P}\circ\underbrace{\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}}_{\boldsymbol{X}_{t-1}}+\underbrace{\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}}_{\boldsymbol{\varepsilon}_t},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&\varphi\\ \varphi&\lambda_2\end{pmatrix}$$
$(X_{1,t})$ and $(X_{2,t})$ are instantaneously related if $\boldsymbol{\varepsilon}$ is a correlated noise.
Granger causality with BINAR(1)

1. $(X_1)$ and $(X_2)$ are instantaneously related if $\boldsymbol{\varepsilon}$ is a correlated noise ($\varphi\neq 0$),
$$\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}=\begin{pmatrix}p_{1,1}&p_{1,2}\\ p_{2,1}&p_{2,2}\end{pmatrix}\circ\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&\star\\ \star&\lambda_2\end{pmatrix}$$
Granger causality with BINAR(1)

2. $(X_1)$ and $(X_2)$ are independent, $(X_1)\perp(X_2)$, if $P$ is diagonal, i.e. $p_{1,2}=p_{2,1}=0$, and $\varepsilon_1$ and $\varepsilon_2$ are independent,
$$\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}=\begin{pmatrix}p_{1,1}&0\\ 0&p_{2,2}\end{pmatrix}\circ\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&0\\ 0&\lambda_2\end{pmatrix}$$
Granger causality with BINAR(1)

3. $(X_1)$ causes $(X_2)$ but $(X_2)$ does not cause $(X_1)$, $(X_1)\to(X_2)$, if $P$ is a lower triangular matrix, i.e. $p_{1,2}=0$ while $p_{2,1}\neq 0$,
$$\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}=\begin{pmatrix}p_{1,1}&0\\ \star&p_{2,2}\end{pmatrix}\circ\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&\varphi\\ \varphi&\lambda_2\end{pmatrix}$$
Granger causality with BINAR(1)

4. $(X_2)$ causes $(X_1)$ but $(X_1)$ does not cause $(X_2)$, $(X_1)\leftarrow(X_2)$, if $P$ is an upper triangular matrix, i.e. $p_{2,1}=0$ while $p_{1,2}\neq 0$,
$$\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}=\begin{pmatrix}p_{1,1}&\star\\ 0&p_{2,2}\end{pmatrix}\circ\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&\varphi\\ \varphi&\lambda_2\end{pmatrix}$$
Granger causality with BINAR(1)

5. $(X_1)$ causes $(X_2)$ and conversely, i.e. a feedback effect $(X_1)\leftrightarrow(X_2)$, if $P$ is a full matrix, i.e. $p_{1,2},p_{2,1}\neq 0$,
$$\begin{pmatrix}X_{1,t}\\ X_{2,t}\end{pmatrix}=\begin{pmatrix}p_{1,1}&\star\\ \star&p_{2,2}\end{pmatrix}\circ\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix},\quad\text{with }\mathrm{Var}\begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix}=\begin{pmatrix}\lambda_1&\varphi\\ \varphi&\lambda_2\end{pmatrix}$$
Bivariate Poisson BINAR(1)

A classical distribution for $\boldsymbol{\varepsilon}_t$ is the bivariate Poisson distribution, with one common shock, i.e.
$$\begin{cases}\varepsilon_{1,t}=M_{1,t}+M_{0,t}\\ \varepsilon_{2,t}=M_{2,t}+M_{0,t}\end{cases}$$
where $M_{1,t}$, $M_{2,t}$ and $M_{0,t}$ are independent Poisson variates, with parameters $\lambda_1-\varphi$, $\lambda_2-\varphi$ and $\varphi$, respectively. In that case, $\boldsymbol{\varepsilon}_t:=(\varepsilon_{1,t},\varepsilon_{2,t})$ has joint probability function
$$e^{-[\lambda_1+\lambda_2-\varphi]}\,\frac{(\lambda_1-\varphi)^{k_1}}{k_1!}\,\frac{(\lambda_2-\varphi)^{k_2}}{k_2!}\sum_{i=0}^{\min\{k_1,k_2\}}\binom{k_1}{i}\binom{k_2}{i}i!\left[\frac{\varphi}{[\lambda_1-\varphi][\lambda_2-\varphi]}\right]^i$$
with $\lambda_1,\lambda_2>0$ and $\varphi\in[0,\min\{\lambda_1,\lambda_2\}]$. Set
$$\boldsymbol{\lambda}=\begin{pmatrix}\lambda_1\\ \lambda_2\end{pmatrix}\quad\text{and}\quad\Lambda=\begin{pmatrix}\lambda_1&\varphi\\ \varphi&\lambda_2\end{pmatrix}$$
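The common-shock construction gives $\mathrm{cov}(\varepsilon_1,\varepsilon_2)=\varphi$; a quick Monte Carlo check (parameter values are illustrative):

```python
import math
import random

# eps1 = M1 + M0, eps2 = M2 + M0 with independent Poissons of means
# lam1 - phi, lam2 - phi and phi, so cov(eps1, eps2) = Var(M0) = phi.
random.seed(9)
lam1, lam2, phi, n = 2.0, 3.0, 0.8, 200_000

def poisson(l):
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

pairs = []
for _ in range(n):
    m0 = poisson(phi)                       # common shock
    pairs.append((poisson(lam1 - phi) + m0, poisson(lam2 - phi) + m0))

m1 = sum(a for a, _ in pairs) / n           # should be close to lam1
m2 = sum(b for _, b in pairs) / n           # should be close to lam2
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n   # close to phi
```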
Bivariate Poisson BINAR(1) and Granger causality

For instantaneous causality, we test $H_0:\varphi=0$ against $H_1:\varphi\neq 0$.

Proposition Let $\widehat{\boldsymbol{\lambda}}$ denote the conditional maximum likelihood estimate of $\boldsymbol{\lambda}=(\lambda_1,\lambda_2,\varphi)$ in the non-constrained MINAR(1) model, and $\widehat{\boldsymbol{\lambda}}^{\perp}$ denote the conditional maximum likelihood estimate of $\boldsymbol{\lambda}^{\perp}=(\lambda_1,\lambda_2,0)$ in the constrained model (when the innovation has independent margins); then, under suitable conditions,
$$2[\log\mathcal{L}(\boldsymbol{X},\widehat{\boldsymbol{\lambda}}|\boldsymbol{X}_0)-\log\mathcal{L}(\boldsymbol{X},\widehat{\boldsymbol{\lambda}}^{\perp}|\boldsymbol{X}_0)]\overset{\mathcal{L}}{\to}\chi^2(1),\text{ as }n\to\infty,\text{ under }H_0.$$
Bivariate Poisson BINAR(1) and Granger causality

For lagged causality, we test $H_0:P\in\mathcal{P}$ against $H_1:P\notin\mathcal{P}$, where $\mathcal{P}$ is a set of matrices with a constrained shape, e.g. $\mathcal{P}$ is the set of $d\times d$ diagonal matrices for lagged independence, or a set of block triangular matrices for lagged causality.

Proposition Let $\widehat{P}$ denote the conditional maximum likelihood estimate of $P$ in the non-constrained MINAR(1) model, and $\widehat{P}_c$ denote the conditional maximum likelihood estimate of $P$ in the constrained model; then, under suitable conditions,
$$2[\log\mathcal{L}(\boldsymbol{X},\widehat{P}|\boldsymbol{X}_0)-\log\mathcal{L}(\boldsymbol{X},\widehat{P}_c|\boldsymbol{X}_0)]\overset{\mathcal{L}}{\to}\chi^2(d^2-\dim(\mathcal{P})),\text{ as }n\to\infty,\text{ under }H_0.$$

Example Testing $(X_{1,t})\leftarrow(X_{2,t})$ is testing whether $p_{1,2}=0$, or not.
Autocorrelation of MINAR(1) processes

Proposition Consider a MINAR(1) process with representation $\boldsymbol{X}_t=P\circ\boldsymbol{X}_{t-1}+\boldsymbol{\varepsilon}_t$, where $(\boldsymbol{\varepsilon}_t)$ is the innovation process, with $\boldsymbol{\lambda}:=\mathbb{E}(\boldsymbol{\varepsilon}_t)$ and $\Lambda:=\mathrm{Var}(\boldsymbol{\varepsilon}_t)$. Let $\boldsymbol{\mu}:=\mathbb{E}(\boldsymbol{X}_t)$ and $\gamma(h):=\mathrm{cov}(\boldsymbol{X}_t,\boldsymbol{X}_{t-h})$. Then $\boldsymbol{\mu}=[\mathbb{I}-P]^{-1}\boldsymbol{\lambda}$ and, for all $h\in\mathbb{N}$, $\gamma(h)=P^h\gamma(0)$, with $\gamma(0)$ solution of
$$\gamma(0)=P\gamma(0)P^{\mathsf{T}}+(\Delta+\Lambda).$$
See Boudreault & C. (2011) for additional properties.
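The equation for $\gamma(0)$ is a discrete Lyapunov-type fixed point, which can be solved by iterating the map when the spectral radius of $P$ is below one; a sketch for an illustrative 2×2 example:

```python
# gamma(0) = P gamma(0) P^T + (Delta + Lambda), solved by fixed-point iteration.
# P, lam and Lambda below are illustrative.
P = [[0.3, 0.2],
     [0.1, 0.4]]
lam = [1.0, 2.0]
Lam = [[1.0, 0.3],       # Var(eps); off-diagonal = instantaneous dependence
       [0.3, 2.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# mu = (I - P)^{-1} lam, by hand in the 2x2 case
a, b, c, d = 1 - P[0][0], -P[0][1], -P[1][0], 1 - P[1][1]
det = a * d - b * c
mu = [(d * lam[0] - b * lam[1]) / det, (-c * lam[0] + a * lam[1]) / det]

# Delta = diag(V mu) with V_{ij} = p_{ij}(1 - p_{ij})
V = [[p * (1 - p) for p in row] for row in P]
Delta = [[V[0][0] * mu[0] + V[0][1] * mu[1], 0.0],
         [0.0, V[1][0] * mu[0] + V[1][1] * mu[1]]]

G = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):                 # contraction: converges geometrically
    G = add(matmul(matmul(P, G), transpose(P)), add(Delta, Lam))
G_next = add(matmul(matmul(P, G), transpose(P)), add(Delta, Lam))
residual = max(abs(G_next[i][j] - G[i][j]) for i in range(2) for j in range(2))
```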
Granger causality $X_1\to X_2$ or $X_1\leftarrow X_2$

Plates: 1. North American, 2. Eurasian, 3. Okhotsk, 4. Pacific (East), 5. Pacific (West), 6. Amur, 7. Indo-Australian, 8. African, 9. Indo-Chinese, 10. Arabian, 11. Philippine, 12. Cocos, 13. Caribbean, 14. Somali, 15. South American, 16. Nazca, 17. Antarctic

[Figures: 17 × 17 matrices of pairwise Granger causality tests between the plates, at 3-hour and 6-hour time steps.]
Granger causality $X_1\to X_2$ or $X_1\leftarrow X_2$

[Figures: 17 × 17 matrices of pairwise Granger causality tests between the plates, at 12-hour and 24-hour time steps.]
Granger causality $X_1\to X_2$ or $X_1\leftarrow X_2$

[Figures: 17 × 17 matrices of pairwise Granger causality tests between the plates, at 36-hour and 48-hour time steps.]
Using Ranks for Time Series

Haugh (1976) suggested using ranks to test for independence. Let $R_t$ denote the rank of $X_t$ within $\{X_1,\cdots,X_T\}$, and set
$$U_t=\frac{R_t}{T}=\frac{1}{T}\sum_{s=1}^T \mathbf{1}_{X_s\leq X_t}=\widehat{F}_X(X_t)$$
and similarly
$$V_t=\frac{S_t}{T}=\frac{1}{T}\sum_{s=1}^T \mathbf{1}_{Y_s\leq Y_t}=\widehat{F}_Y(Y_t).$$
See also Dufour (1981) for rank tests for serial dependence.
Causality, in dimension 2

From Taamouti, Bouezmarni & El Ghouch (2014), consider some copula-based causality approach:
$$C(X\to Y)=\mathbb{E}\left[\log\frac{f(X_t|\boldsymbol{X},\boldsymbol{Y})}{f(X_t|\boldsymbol{X})}\right]$$
can be written, for Markov (order 1) processes,
$$C(X\to Y)=\mathbb{E}\left[\log\frac{f(X_t|X_{t-1},Y_{t-1})}{f(X_t|X_{t-1})}\right]=\mathbb{E}\left[\log\frac{f(X_t,X_{t-1},Y_{t-1})\cdot f(X_{t-1})}{f(X_t,X_{t-1})\cdot f(X_{t-1},Y_{t-1})}\right]$$
i.e.
$$C(X\to Y)=\mathbb{E}\left[\log\frac{c(F_X(X_t),F_X(X_{t-1}),F_Y(Y_{t-1}))}{c(F_X(X_t),F_X(X_{t-1}))\cdot c(F_X(X_{t-1}),F_Y(Y_{t-1}))}\right]$$
Using a Probit-type Transformation

Following Geenens, C. & Paindaveine (2014), consider some probit-type transformation, for stationary time series:
$$X_t^{\star}=\Phi^{-1}(\widehat{U}_t)=\Phi^{-1}(\widehat{F}_X(X_t))\qquad Y_t^{\star}=\Phi^{-1}(\widehat{V}_t)=\Phi^{-1}(\widehat{F}_Y(Y_t))$$
Application in Bastos, Mercea & C. (2015).
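A sketch of the rank-based probit transform (using $T+1$ in the denominator, a common convention assumed here, so that $\Phi^{-1}$ is applied strictly inside $(0,1)$; the exponential sample is illustrative):

```python
import random
from statistics import NormalDist

# Rank-based probit transform: U_t = R_t / (T + 1), then X*_t = Phi^{-1}(U_t).
random.seed(11)
T = 1000
xs = [random.expovariate(1.0) for _ in range(T)]   # any non-Gaussian series

order = sorted(range(T), key=lambda i: xs[i])
rank = [0] * T
for r, i in enumerate(order, start=1):
    rank[i] = r                                    # R_t, ranks 1..T

u = [rank[i] / (T + 1) for i in range(T)]          # empirical cdf values
z = [NormalDist().inv_cdf(ui) for ui in u]         # probit transform

mean_z = sum(z) / T          # ~0 by symmetry of the quantile grid
var_z = sum(v * v for v in z) / T                  # ~1
```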
Online vs. Onsite Causality

For #occupy and #indignados:

[Causality graphs linking Facebook (F) and Twitter (T) activity to onsite variables: Protestors (P), Injuries (I), Arrests (A) and Camped (C).]