Clustering Random Walk Time Series (GSI 2015 - Geometric Science of Information)

SLIDE 1

Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion

Clustering Random Walk Time Series

GSI 2015 - Geometric Science of Information
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
29 October 2015

Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

SLIDE 2

1. Introduction
2. Geometry of Random Walk Time Series
3. The Hierarchical Block Model
4. Conclusion

SLIDE 3

Context (data from www.datagrapple.com)

SLIDE 4

What is a clustering program?

Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.

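Example of a clustering program: We aim at finding k groups by positioning k group centers $\{c_1, \dots, c_k\}$ so that the total squared distance from each data point in $\{x_1, \dots, x_n\}$ to its nearest center is minimal:

$$\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \min_{j=1}^{k} d(x_i, c_j)^2$$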

But, what is the distance d between two random walk time series?
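The objective on this slide can be evaluated for any choice of distance d. The sketch below is only an illustration: the function names and the toy points are hypothetical, and the Euclidean distance is used purely as a placeholder for d, which the rest of the talk replaces with a distance suited to random walks.

```python
import math

def euclidean(x, c):
    """Placeholder distance d: Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def clustering_cost(points, centers, d=euclidean):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(d(x, c) for c in centers) ** 2 for x in points)

# Toy data (hypothetical): two well separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_centers = [(0.0, 0.5), (10.0, 10.5)]
bad_centers = [(5.0, 5.0), (5.0, 6.0)]

cost_good = clustering_cost(points, good_centers)
cost_bad = clustering_cost(points, bad_centers)
```

Because `d` is a parameter, the same cost function can be reused once a distance between time series has been chosen.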

SLIDE 5

What are clusters of Random Walk Time Series?

French banks and building materials CDS over 2006-2015

SLIDE 8

Geometry of RW TS ≡ Geometry of Random Variables

i.i.d. observations:

$$\begin{array}{lcccc}
X_1: & X_1^1, & X_1^2, & \dots, & X_1^T \\
X_2: & X_2^1, & X_2^2, & \dots, & X_2^T \\
\vdots & \vdots & \vdots & & \vdots \\
X_N: & X_N^1, & X_N^2, & \dots, & X_N^T
\end{array}$$

Which distances $d(X_i, X_j)$ between dependent random variables?
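One common reading of this identification, assumed here rather than stated on the slide, is that the increments of a random walk are the i.i.d. observations of the underlying random variable. The sketch below simulates N hypothetical Gaussian random walks and takes first differences to obtain the N rows of T observations shown above; the function names and parameters are illustrative.

```python
import random

def simulate_random_walk(T, rng, sigma=1.0):
    """Return a random walk of length T + 1 starting at 0 with Gaussian steps."""
    walk = [0.0]
    for _ in range(T):
        walk.append(walk[-1] + rng.gauss(0.0, sigma))
    return walk

def increments(walk):
    """First differences of a series, read here as the observations X^1, ..., X^T."""
    return [b - a for a, b in zip(walk, walk[1:])]

rng = random.Random(0)
N, T = 3, 500
walks = [simulate_random_walk(T, rng) for _ in range(N)]
observations = [increments(w) for w in walks]  # N rows of T observations
```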

SLIDE 9

Pitfalls of a basic distance

Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and whose correlation is $\rho(X, Y) \in [-1, 1]$.

$$E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2\sigma_X\sigma_Y(1 - \rho(X, Y))$$

Now, consider the following values for correlation:

$\rho(X, Y) = 0$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $E[(X - Y)^2] \gg 1$ instead of the distance 0 expected from comparing two equal Gaussians.

$\rho(X, Y) = 1$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
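The closed form for $E[(X - Y)^2]$ can be checked numerically. The following is a minimal sketch with hypothetical parameter values: it compares the formula with a Monte Carlo estimate for two independent Gaussians that share the same margin, and evaluates the perfectly correlated case where only $(\mu, \sigma)$ matter.

```python
import math
import random

def expected_sq_diff(mu_x, sigma_x, mu_y, sigma_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian (X, Y)."""
    return ((mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2
            + 2.0 * sigma_x * sigma_y * (1.0 - rho))

def monte_carlo_sq_diff(mu_x, sigma_x, mu_y, sigma_y, rho, n, rng):
    """Estimate E[(X - Y)^2] by sampling correlated Gaussians."""
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # rho * z1 + sqrt(1 - rho^2) * z2 has unit variance and correlation rho with z1.
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        total += (x - y) ** 2
    return total / n

rng = random.Random(42)
# Same margin N(0, 10^2) for X and Y, but independent: the value is far from 0.
closed_indep = expected_sq_diff(0.0, 10.0, 0.0, 10.0, rho=0.0)
sampled_indep = monte_carlo_sq_diff(0.0, 10.0, 0.0, 10.0, 0.0, 100_000, rng)
# Perfect correlation: only the parameters (mu, sigma) contribute.
closed_comonotone = expected_sq_diff(-5.0, 1.0, 5.0, 1.0, rho=1.0)
```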

SLIDE 10

Pitfalls of a basic distance

[Figure: probability density functions of Gaussians N(−5, 1) and N(5, 1), Gaussians N(−5, 3) and N(5, 3), and Gaussians N(−5, 10) and N(5, 10). Green, red and blue Gaussians are equidistant using L2 geometry on the parameter space (µ, σ).]

SLIDE 11

Sklar’s Theorem

Theorem (Sklar's Theorem (1959)): For any random vector $X = (X_1, \dots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as

$$P(X_1, \dots, X_N) = C(P_1(X_1), \dots, P_N(X_N)),$$

where $C$, a multivariate distribution with uniform marginals, is known as the copula of $X$.
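Read constructively, the theorem says that a joint distribution can be assembled from a copula and a free choice of margins. The sketch below is an illustration under assumptions not taken from the slides: it uses a Gaussian copula with a hypothetical correlation of 0.7, and two arbitrary margins (exponential and Gaussian) applied through their quantile functions.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u, v) with uniform margins and Gaussian-copula dependence."""
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs

def apply_margins(pairs, inv_cdf_x, inv_cdf_y):
    """Map copula samples to (x, y) through the inverse marginal cdfs."""
    return [(inv_cdf_x(u), inv_cdf_y(v)) for u, v in pairs]

def exponential_inv(u):
    """Quantile function of the Exp(1) distribution."""
    return -math.log(1.0 - u)

rng = random.Random(1)
copula_sample = sample_gaussian_copula(rho=0.7, n=2000, rng=rng)

# Hypothetical margins glued to the same copula C.
gaussian_inv = NormalDist(mu=5.0, sigma=2.0).inv_cdf
joint_sample = apply_margins(copula_sample, exponential_inv, gaussian_inv)
```

Changing the two quantile functions changes the margins of `joint_sample` but leaves the ordering of the observations, and hence the dependence structure, untouched.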

SLIDE 13

The Copula Transform

Definition (The Copula Transform): Let $X = (X_1, \dots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector

$$U = (U_1, \dots, U_N) := P(X) = (P_1(X_1), \dots, P_N(X_N))$$

is known as the copula transform.

The $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have $x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x)$, thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.
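The probability integral transform can be checked on a margin whose cdf is known. In this sketch the exponential distribution is an arbitrary choice made for illustration: applying its cdf to exponential samples should give values whose mean and variance are close to those of U[0, 1], namely 1/2 and 1/12.

```python
import math
import random

def exponential_cdf(x, rate=1.0):
    """Cdf of an exponential distribution, used here as the known margin P_i."""
    return 1.0 - math.exp(-rate * x)

rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(20_000)]
u = [exponential_cdf(x) for x in sample]

mean_u = sum(u) / len(u)
var_u = sum((v - mean_u) ** 2 for v in u) / len(u)
```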

SLIDE 14

The Copula Transform

[Figure: X ∼ U[0,1] against Y ∼ ln(X), with ρ ≈ 0.84; P_X(X) against P_Y(Y), with ρ = 1.]

The copula transform is invariant to strictly increasing transformations.
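The example on this slide can be reproduced approximately. The sketch below draws X uniformly on (0, 1], sets Y = ln(X), and compares the Pearson correlation of the raw values with the correlation after a rank-based copula transform; the helper names, seed and sample size are arbitrary choices, so the raw correlation will only be in the neighborhood of the value shown in the figure.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def normalized_ranks(x):
    """Rank / T for each value (assumes no ties, as with continuous data)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [r / len(x) for r in ranks]

rng = random.Random(3)
x = [1.0 - rng.random() for _ in range(5000)]  # uniform on (0, 1], avoids log(0)
y = [math.log(v) for v in x]                   # strictly increasing transform of x

rho_raw = pearson(x, y)
rho_copula = pearson(normalized_ranks(x), normalized_ranks(y))
```

Since ln is strictly increasing, X and Y have identical ranks, so the correlation computed after the transform is 1 up to floating-point error.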

SLIDE 15

Deheuvels’ Empirical Copula Transform

Let $(X_1^t, \dots, X_N^t)$, $1 \le t \le T$, be T observations from a random vector $(X_1, \dots, X_N)$ with continuous margins.

Since one cannot directly obtain the corresponding copula observations $(U_1^t, \dots, U_N^t) = (P_1(X_1^t), \dots, P_N(X_N^t))$, where $t = 1, \dots, T$, without knowing a priori $(P_1, \dots, P_N)$, one can instead use the empirical copula transform.

Definition (The Empirical Copula Transform): Estimate the N empirical margins

$$P_i^T(x) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}(X_i^t \le x), \quad 1 \le i \le N,$$

to obtain the T empirical observations

$$(\tilde{U}_1^t, \dots, \tilde{U}_N^t) = (P_1^T(X_1^t), \dots, P_N^T(X_N^t)).$$

Equivalently, since $\tilde{U}_i^t = R_i^t / T$, $R_i^t$ being the rank of observation $X_i^t$, the empirical copula transform can be considered as the normalized rank transform. In practice:

from scipy.stats import rankdata
x_transform = rankdata(x) / len(x)
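For readers without SciPy, the normalized rank transform can be sketched in plain Python. The function below is an illustrative stand-in, not the authors' code: it assigns tied values their average rank, which matches the default tie handling of `scipy.stats.rankdata`, and it is applied series by series to a small hypothetical panel.

```python
def empirical_copula_transform(series):
    """Normalized ranks R_t / T of one series (ties get their average rank)."""
    T = len(series)
    order = sorted(range(T), key=lambda t: series[t])
    ranks = [0.0] * T
    start = 0
    while start < T:
        end = start
        # Extend the block [start, end] over equal values.
        while end + 1 < T and series[order[end + 1]] == series[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return [r / T for r in ranks]

# Hypothetical panel of N = 2 series with T = 4 observations each.
panel = {
    "X1": [0.3, -1.2, 2.5, 0.0],
    "X2": [10.0, 30.0, 20.0, 20.0],
}
transformed = {name: empirical_copula_transform(x) for name, x in panel.items()}
```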

SLIDE 16

Generic Non-Parametric Distance

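$$d_\theta^2(X_i, X_j) = \theta \, 3E\left[|P_i(X_i) - P_j(X_j)|^2\right] + (1 - \theta) \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$$

(i) $0 \le d_\theta \le 1$; (ii) for $0 < \theta < 1$, $d_\theta$ is a metric; (iii) $d_\theta$ is invariant under diffeomorphism.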
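A plug-in estimate of $d_\theta$ can be sketched from two samples. The estimators below are assumptions made for illustration, not necessarily those used by the authors: the dependence term uses normalized ranks, and the distribution term approximates the densities by histograms on a common grid, with a hypothetical default of 20 bins.

```python
import math
import random

def normalized_ranks(x):
    """Rank / T for each value (assumes no ties)."""
    order = sorted(range(len(x)), key=lambda t: x[t])
    ranks = [0] * len(x)
    for rank, t in enumerate(order, start=1):
        ranks[t] = rank
    return [r / len(x) for r in ranks]

def dependence_term(x, y):
    """Estimate 3 E[|P_i(X_i) - P_j(X_j)|^2] from normalized ranks."""
    u, v = normalized_ranks(x), normalized_ranks(y)
    return 3.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def histogram_density(x, lo, width, bins):
    """Piecewise-constant density estimate on `bins` equal-width cells."""
    counts = [0] * bins
    for value in x:
        counts[min(int((value - lo) / width), bins - 1)] += 1
    return [c / (len(x) * width) for c in counts]

def distribution_term(x, y, bins=20):
    """Estimate the squared Hellinger distance between the two margins."""
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    if hi == lo:
        return 0.0  # both samples are the same constant
    width = (hi - lo) / bins
    fx = histogram_density(x, lo, width, bins)
    fy = histogram_density(y, lo, width, bins)
    return 0.5 * width * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(fx, fy))

def d_theta(x, y, theta=0.5, bins=20):
    """Plug-in estimate of the distance d_theta between two samples."""
    d2 = theta * dependence_term(x, y) + (1.0 - theta) * distribution_term(x, y, bins)
    return math.sqrt(d2)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [a + 0.5 * rng.gauss(0.0, 1.0) for a in x]
distance_xy = d_theta(x, y, theta=0.5)
```

Setting `theta=1.0` keeps only the dependence information, and `theta=0.0` keeps only the distribution information.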

SLIDE 17

Generic Non-Parametric Distance

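$$d_0^2 : \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \mathrm{Hellinger}^2$$

$$d_1^2 : 3E\left[|P_i(X_i) - P_j(X_j)|^2\right] = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v) \, du \, dv$$

Remark: If $f(x, \theta) = c_\Phi(u_1, \dots, u_N; \Sigma) \prod_{i=1}^{N} f_i(x_i; \nu_i)$ then

$$ds^2 = ds^2_{\mathrm{GaussCopula}} + \sum_{i=1}^{N} ds^2_{\mathrm{margins}}$$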
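The identity between the dependence term and Spearman's rank correlation $\rho_S$ follows from the moments of uniform variables: for U and V uniform on [0, 1], $E[(U - V)^2] = 1/6 - 2\,\mathrm{Cov}(U, V)$ and $\rho_S = 12\,\mathrm{Cov}(U, V)$, hence $3E[(U - V)^2] = (1 - \rho_S)/2$. The sketch below checks this numerically on hypothetical data; with empirical ranks the two sides agree up to a factor $(T^2 - 1)/T^2$.

```python
import random

def ranks(x):
    """Ranks 1..T of the values in x (assumes no ties)."""
    order = sorted(range(len(x)), key=lambda t: x[t])
    r = [0] * len(x)
    for rank, t in enumerate(order, start=1):
        r[t] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho through 1 - 6 * sum(d^2) / (T (T^2 - 1)), valid without ties."""
    T = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (T * (T * T - 1))

def d1_squared(x, y):
    """Empirical 3 E[|U_i - U_j|^2] with U = rank / T."""
    T = len(x)
    return 3.0 * sum(((a - b) / T) ** 2 for a, b in zip(ranks(x), ranks(y))) / T

rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [a + rng.gauss(0.0, 1.0) for a in x]

lhs = d1_squared(x, y)
rhs = (1.0 - spearman_rho(x, y)) / 2.0
```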

SLIDE 19

The Hierarchical Block Model

A model of nested partitions

The nested partitions defined by the model can be seen on the distance matrix, for a proper distance and the right permutation of the data points.

In practice, one observes and works with the above distance matrix, which is identical to the left one up to a permutation of the data.
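The relation between the two matrices can be illustrated on a toy example. Everything in this sketch is hypothetical (the two-level labels, the three distance levels, the seed): it builds a distance matrix with nested blocks, shuffles rows and columns with one permutation to mimic what is observed in practice, and recovers the block structure with the inverse permutation.

```python
import random

def nested_partition_distance(coarse, fine, d_fine=0.1, d_coarse=0.5, d_between=1.0):
    """Toy distance matrix: small within a fine cluster, medium within a coarse one."""
    n = len(fine)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if fine[i] == fine[j]:
                D[i][j] = d_fine
            elif coarse[i] == coarse[j]:
                D[i][j] = d_coarse
            else:
                D[i][j] = d_between
    return D

def permute(D, perm):
    """Reorder rows and columns of D with the same permutation."""
    return [[D[i][j] for j in perm] for i in perm]

coarse = [0, 0, 0, 0, 1, 1, 1, 1]
fine = [0, 0, 1, 1, 2, 2, 3, 3]
D_sorted = nested_partition_distance(coarse, fine)  # blocks visible on the diagonal

rng = random.Random(5)
perm = list(range(len(fine)))
rng.shuffle(perm)
D_observed = permute(D_sorted, perm)  # same distances, blocks hidden

inverse = [0] * len(perm)
for position, index in enumerate(perm):
    inverse[index] = position
D_recovered = permute(D_observed, inverse)
```

A clustering algorithm has to find such a reordering without knowing `perm`.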

SLIDE 20

Results: Data from Hierarchical Block Model

Adjusted Rand Index

Algo.   Distance        Distrib       Correl        Correl+Distrib
HC-AL   (1 − ρ)/2       0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
HC-AL   E[(X − Y)^2]    0.00 ±0.00    0.09 ±0.12    0.55 ±0.05
HC-AL   GPR θ = 0       0.34 ±0.01    0.01 ±0.01    0.06 ±0.02
HC-AL   GPR θ = 1       0.00 ±0.01    0.99 ±0.01    0.56 ±0.01
HC-AL   GPR θ = .5      0.34 ±0.01    0.59 ±0.12    0.57 ±0.01
HC-AL   GNPR θ = 0      1             0.00 ±0.00    0.17 ±0.00
HC-AL   GNPR θ = 1      0.00 ±0.00    1             0.57 ±0.00
HC-AL   GNPR θ = .5     0.99 ±0.01    0.25 ±0.20    0.95 ±0.08
AP      (1 − ρ)/2       0.00 ±0.00    0.99 ±0.07    0.48 ±0.02
AP      E[(X − Y)^2]    0.14 ±0.03    0.94 ±0.02    0.59 ±0.00
AP      GPR θ = 0       0.25 ±0.08    0.01 ±0.01    0.05 ±0.02
AP      GPR θ = 1       0.00 ±0.01    0.99 ±0.01    0.48 ±0.02
AP      GPR θ = .5      0.06 ±0.00    0.80 ±0.10    0.52 ±0.02
AP      GNPR θ = 0      1             0.00 ±0.00    0.18 ±0.01
AP      GNPR θ = 1      0.00 ±0.01    1             0.59 ±0.00
AP      GNPR θ = .5     0.39 ±0.02    0.39 ±0.11    1
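The scores above are Adjusted Rand Index values, which compare a recovered partition with the true one and are corrected for chance agreement. The function below is a generic sketch of the standard pair-counting formula, written for illustration and not taken from the authors' code; the name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)

# Label names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions coincide, values near 0 correspond to chance-level agreement, and negative values are possible.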

SLIDE 21

Results: Application to Credit Default Swap Time Series

Distance matrices computed on CDS time series exhibit a hierarchical block structure (Marti, Very, Donnat, Nielsen, IEEE ICMLA 2015).

[Figure: (un)stability of clusters with L2 distance.]

[Figure: stability of clusters with the proposed distance.]

SLIDE 22

Consistency

Definition (Consistency of a clustering algorithm): A clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that the algorithm A recovers all the partitions in P converges to 1 when T → ∞.

Definition (Space-conserving algorithm): A space-conserving algorithm does not distort the space, i.e. the distance $D_{ij}$ between two clusters $C_i$ and $C_j$ is such that

$$D_{ij} \in \left[ \min_{x \in C_i, y \in C_j} d(x, y), \ \max_{x \in C_i, y \in C_j} d(x, y) \right].$$
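The space-conserving property can be checked directly for a given pair of clusters. In this sketch the clusters, the distance and the helper names are hypothetical; single, average and complete linkage fall inside the interval by construction, while a deliberately distorting "sum" linkage, introduced here only as a counterexample, does not.

```python
from itertools import product

def pairwise(ci, cj, d):
    """All distances d(x, y) with x in C_i and y in C_j."""
    return [d(x, y) for x, y in product(ci, cj)]

def single_linkage(ci, cj, d):
    return min(pairwise(ci, cj, d))

def complete_linkage(ci, cj, d):
    return max(pairwise(ci, cj, d))

def average_linkage(ci, cj, d):
    values = pairwise(ci, cj, d)
    return sum(values) / len(values)

def sum_linkage(ci, cj, d):
    """Hypothetical linkage used as a counterexample: it inflates distances."""
    return sum(pairwise(ci, cj, d))

def is_space_conserving(linkage, ci, cj, d):
    """Check D_ij in [min d(x, y), max d(x, y)] for one pair of clusters."""
    values = pairwise(ci, cj, d)
    return min(values) <= linkage(ci, cj, d) <= max(values)

def absolute_difference(x, y):
    return abs(x - y)

ci, cj = [0.0, 1.0, 2.0], [10.0, 12.0]
checks = {
    linkage.__name__: is_space_conserving(linkage, ci, cj, absolute_difference)
    for linkage in (single_linkage, average_linkage, complete_linkage, sum_linkage)
}
```

Checking one pair of clusters is only a sanity test; the definition requires the property to hold for every pair.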

SLIDE 23

Consistency

Theorem (Consistency of space-conserving algorithms (Andler, Marti, Nielsen, Donnat, 2015)): Space-conserving algorithms (e.g., Single, Average, Complete Linkage) are consistent with respect to the Hierarchical Block Model.

[Figure: three panels, for T = 100, T = 1000 and T = 10000.]

SLIDE 25

Discussion and questions?

Avenues for research:
- distances on (copula, margins)
- clustering using multivariate dependence information
- clustering using multi-wise dependence information

Optimal Copula Transport for Clustering Multivariate Time Series, Marti, Nielsen, Donnat, 2015
