Clustering Random Walk Time Series (GSI 2015 - Geometric Science of Information)

Clustering Random Walk Time Series
GSI 2015 - Geometric Science of Information
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
29 October 2015

1. Introduction
2. Geometry of Random Walk Time Series
3. The Hierarchical Block Model
4. Conclusion

Context (data from www.datagrapple.com)

What is a clustering program?
Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in different groups.
Example of a clustering program: We aim at finding $k$ groups by positioning $k$ group centers $\{c_1, \dots, c_k\}$ so that, for the data points $\{x_1, \dots, x_n\}$, they solve
$$\min_{c_1, \dots, c_k} \sum_{i=1}^{n} \min_{j=1}^{k} d(x_i, c_j)^2$$
But what is the distance $d$ between two random walk time series?
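The objective on this slide can be evaluated for a given set of centers. The sketch below is only an illustration: it assumes that $d$ is the Euclidean distance, which is exactly the choice the slide goes on to question, and the function name `clustering_cost` and the toy data are hypothetical.

```python
import numpy as np

def clustering_cost(points, centers):
    """Sum over points of the squared distance to the nearest center.

    Assumes d is the Euclidean distance (an illustrative choice only).
    points: array of shape (n, dim); centers: array of shape (k, dim).
    """
    # Pairwise differences, shape (n, k, dim)
    diffs = points[:, None, :] - centers[None, :, :]
    # Squared distances d(x_i, c_j)^2, shape (n, k)
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Inner min over centers j, outer sum over points i
    return float(np.sum(np.min(sq_dists, axis=1)))

# Toy data: two well separated pairs of points and one center per pair
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers = np.array([[0.0, 0.5], [10.0, 0.5]])
cost = clustering_cost(points, centers)
```

Minimizing this cost over the centers is the outer $\min$ of the slide; the sketch only evaluates the cost for fixed centers.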

What are clusters of Random Walk Time Series?
[Figure: French banks and building materials CDS over 2006-2015]

Geometry of RW TS ≡ Geometry of Random Variables
i.i.d. observations:
$X_1: X_1^1, X_1^2, \dots, X_1^T$
$X_2: X_2^1, X_2^2, \dots, X_2^T$
$\vdots$
$X_N: X_N^1, X_N^2, \dots, X_N^T$
Which distances $d(X_i, X_j)$ between dependent random variables?
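The slide does not say how the i.i.d. observations are obtained from a random walk. A common reading, assumed here, is that they are the increments of each walk. The sketch below simulates $N$ Gaussian random walks (the Gaussian choice, the sizes and the variable names are all hypothetical) and recovers the increments by differencing.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 1000  # hypothetical number of series and of observations

# i.i.d. increments X_i^t, one row per random variable X_i
increments = rng.normal(size=(N, T))

# Each random walk is the cumulative sum of its increments
walks = np.cumsum(increments, axis=1)

# Differencing the walks (with an implicit starting value of 0)
# gives back the i.i.d. observations
recovered = np.diff(walks, axis=1, prepend=0.0)
```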

Pitfalls of a basic distance
Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$, and whose correlation is $\rho(X, Y) \in [-1, 1]$.
$$E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2 \sigma_X \sigma_Y (1 - \rho(X, Y))$$
Now, consider the following values for the correlation:
- $\rho(X, Y) = 0$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $E[(X - Y)^2] \gg 1$ instead of the distance 0 expected from comparing two equal Gaussians.
- $\rho(X, Y) = 1$, so $E[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
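The closed form for $E[(X - Y)^2]$ can be evaluated directly to see both pitfalls. This is a minimal sketch; the function name and the numerical values are hypothetical.

```python
def expected_squared_difference(mu_x, sigma_x, mu_y, sigma_y, rho):
    """Closed form of E[(X - Y)^2] for a bivariate Gaussian vector (X, Y)."""
    return (
        (mu_x - mu_y) ** 2
        + (sigma_x - sigma_y) ** 2
        + 2.0 * sigma_x * sigma_y * (1.0 - rho)
    )

# Same distribution N(0, 10^2), no correlation: the value is large
# although the two Gaussians are equal
same_law_uncorrelated = expected_squared_difference(0.0, 10.0, 0.0, 10.0, 0.0)

# Same distribution, perfect correlation: the value is 0
same_law_correlated = expected_squared_difference(0.0, 10.0, 0.0, 10.0, 1.0)

# Perfect correlation, different distributions: only the marginal terms remain
different_law_correlated = expected_squared_difference(-5.0, 1.0, 5.0, 3.0, 1.0)
```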

Pitfalls of a basic distance
[Figure: probability density functions of Gaussians $N(-5, 1)$ and $N(5, 1)$, Gaussians $N(-5, 3)$ and $N(5, 3)$, and Gaussians $N(-5, 10)$ and $N(5, 10)$.]
Green, red and blue Gaussians are equidistant using $L_2$ geometry on the parameter space $(\mu, \sigma)$.
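The point of the figure can be checked numerically. The sketch below compares the $L_2$ distance on the parameter space $(\mu, \sigma)$ with a distance between the distributions themselves. The Hellinger distance is used for the comparison because it appears later in the deck; choosing it here is an assumption, as are the function names. The second parameter of each Gaussian is read as $\sigma$, following the slide's parameter space.

```python
import math

def parameter_l2(mu1, sigma1, mu2, sigma2):
    """Euclidean distance between (mu1, sigma1) and (mu2, sigma2)."""
    return math.hypot(mu1 - mu2, sigma1 - sigma2)

def hellinger_gaussians(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between two univariate Gaussians (closed form)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient: integral of sqrt(p * q)
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - bc))

sigmas = (1.0, 3.0, 10.0)
l2_distances = [parameter_l2(-5.0, s, 5.0, s) for s in sigmas]
hellinger_distances = [hellinger_gaussians(-5.0, s, 5.0, s) for s in sigmas]
```

The parameter-space distance is the same for the three pairs, while the Hellinger distance shrinks as $\sigma$ grows and the two densities overlap more.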

Sklar's Theorem
Theorem (Sklar's Theorem, 1959): For any random vector $X = (X_1, \dots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as
$$P(X_1, \dots, X_N) = C(P_1(X_1), \dots, P_N(X_N)),$$
where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.

The Copula Transform
Definition (The Copula Transform): Let $X = (X_1, \dots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector
$$U = (U_1, \dots, U_N) := P(X) = (P_1(X_1), \dots, P_N(X_N))$$
is known as the copula transform.
The $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have $x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x)$, thus $P_i(X_i) \sim U[0, 1]$.
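The probability integral transform can be illustrated when the marginal cdf is known. The sketch below assumes a Gaussian margin with hypothetical parameters and applies its cdf to the samples; the result should look uniform on $[0, 1]$, with mean close to $1/2$ and variance close to $1/12$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical margin: X ~ N(2, 3^2)
loc, scale = 2.0, 3.0
x = rng.normal(loc=loc, scale=scale, size=10_000)

# Apply the true cdf P to the samples: U = P(X)
u = norm.cdf(x, loc=loc, scale=scale)

u_mean = float(u.mean())
u_var = float(u.var())
```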

The Copula Transform
[Figure, two scatter plots. Left: $Y \sim \ln(X)$ against $X \sim U[0, 1]$, $\rho \approx 0.84$. Right: $P_Y(Y)$ against $P_X(X)$, $\rho = 1$.]
The Copula Transform invariance to strictly increasing transformation
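The invariance shown in the figure can be reproduced on simulated data. This sketch uses normalized ranks as a stand-in for the unknown cdfs, which is an assumption at this point of the deck; sample size, seed and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
T = 1000

# X uniform on (0, 1]; the shift avoids taking log(0)
x = 1.0 - rng.uniform(size=T)
# Y is a strictly increasing transformation of X
y = np.log(x)

# Pearson correlation on the raw values: strictly below 1
pearson_raw = float(np.corrcoef(x, y)[0, 1])

# Pearson correlation after the normalized rank transform: equal to 1,
# because a strictly increasing map preserves ranks
u = rankdata(x) / T
v = rankdata(y) / T
pearson_ranks = float(np.corrcoef(u, v)[0, 1])
```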

Deheuvels' Empirical Copula Transform
Let $(X_1^t, \dots, X_N^t)$, $1 \le t \le T$, be $T$ observations from a random vector $(X_1, \dots, X_N)$ with continuous margins. Since one cannot directly obtain the corresponding copula observations $(U_1^t, \dots, U_N^t) = (P_1(X_1^t), \dots, P_N(X_N^t))$, where $t = 1, \dots, T$, without knowing a priori $(P_1, \dots, P_N)$, one can instead use the following.
Definition (The Empirical Copula Transform): estimate the $N$ empirical margins $P_i^T(x) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}(X_i^t \le x)$, $1 \le i \le N$, to obtain the $T$ empirical observations $(\tilde{U}_1^t, \dots, \tilde{U}_N^t) = (P_1^T(X_1^t), \dots, P_N^T(X_N^t))$.
Equivalently, since $\tilde{U}_i^t = R_i^t / T$, $R_i^t$ being the rank of observation $X_i^t$, the empirical copula transform can be considered as the normalized rank transform.
In practice: x_transform = rankdata(x)/len(x)
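The one-liner from the slide extends to a whole data set by ranking each variable separately. The sketch below assumes `rankdata` is `scipy.stats.rankdata` and that observations are stored with one row per time $t$ and one column per variable $i$; the function name and that layout are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def empirical_copula_transform(observations):
    """Normalized rank transform applied to each column.

    observations: array of shape (T, N), one row per observation t
    and one column per variable X_i (an assumed layout).
    Returns an array of the same shape with entries R_i^t / T.
    """
    observations = np.asarray(observations, dtype=float)
    T = observations.shape[0]
    # Rank each column independently, then divide by the sample size
    ranks = np.apply_along_axis(rankdata, 0, observations)
    return ranks / T

data = np.array([[3.0, 10.0],
                 [1.0, 30.0],
                 [2.0, 20.0]])
u_tilde = empirical_copula_transform(data)
```

Each column of `u_tilde` takes values in $\{1/T, 2/T, \dots, 1\}$ when there are no ties, which is what continuous margins guarantee almost surely.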

Generic Non-Parametric Distance
$$d_\theta^2(X_i, X_j) = \theta \, 3 \, E\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$$
(i) $0 \le d_\theta \le 1$; (ii) for $0 < \theta < 1$, $d_\theta$ is a metric; (iii) $d_\theta$ is invariant under diffeomorphism.
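A sample version of $d_\theta$ can be sketched as follows. The slide gives only the population formula, so the estimators are assumptions: the dependence term uses normalized ranks in place of $P_i$ and $P_j$, and the distribution term uses histograms on a shared grid in place of the densities. The function names and the default number of bins are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def dependence_term(x, y):
    """Estimate 3 * E[|P_i(X_i) - P_j(X_j)|^2] with normalized ranks."""
    T = len(x)
    u = rankdata(x) / T
    v = rankdata(y) / T
    return 3.0 * float(np.mean((u - v) ** 2))

def distribution_term(x, y, bins=50):
    """Estimate the squared Hellinger distance with shared-grid histograms."""
    lo = min(np.min(x), np.min(y))
    hi = max(np.max(x), np.max(y))
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()  # bin probabilities for x
    q = q / q.sum()  # bin probabilities for y
    return 0.5 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def d_theta(x, y, theta=0.5, bins=50):
    """Sample version of d_theta for two series of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    squared = (theta * dependence_term(x, y)
               + (1.0 - theta) * distribution_term(x, y, bins))
    return float(np.sqrt(squared))

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 0.5 * a + rng.normal(scale=2.0, size=500)
distance_ab = d_theta(a, b, theta=0.5)
distance_aa = d_theta(a, a, theta=0.5)
```

With $\theta = 1$ only the dependence between the two series matters, and with $\theta = 0$ only the difference between their marginal distributions does.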

Generic Non-Parametric Distance
$$d_0^2 = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \text{Hellinger}^2$$
$$d_1^2 = 3 \, E\left[ |P_i(X_i) - P_j(X_j)|^2 \right] = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v) \, du \, dv$$
Remark: If $f(x, \theta) = c_\Phi(u_1, \dots, u_N; \Sigma) \prod_{i=1}^{N} f_i(x_i; \nu_i)$, then
$$ds^2 = ds^2_{\text{GaussCopula}} + \sum_{i=1}^{N} ds^2_{\text{margins}}$$
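The identity $d_1^2 = (1 - \rho_S)/2$, with $\rho_S$ read as Spearman's rank correlation, can be checked on simulated data. In this sketch the rank-based estimate of $d_1^2$ is an assumption, as are the sample size and the way the two series are generated. On a finite sample without ties the two sides differ by a factor $(T^2 - 1)/T^2$, so they agree only approximately.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(0)
T = 2000

# Two dependent series (hypothetical data-generating process)
x = rng.normal(size=T)
y = x + rng.normal(size=T)

# Left-hand side: 3 * E[|P_i(X_i) - P_j(X_j)|^2] with normalized ranks
u = rankdata(x) / T
v = rankdata(y) / T
lhs = 3.0 * float(np.mean((u - v) ** 2))

# Right-hand side: (1 - rho_S) / 2 with Spearman's rank correlation
rho_s, _ = spearmanr(x, y)
rhs = (1.0 - float(rho_s)) / 2.0
```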
