Agglomerative 2-3 Hierarchical Clustering: theoretical improvements and tests


  1. Agglomerative 2-3 Hierarchical Clustering: theoretical improvements and tests
     Sergiu Chelcea (1), Patrice Bertrand (1,2), Brigitte Trousse (1)
     (1) Action AxIS, INRIA Sophia-Antipolis, France
     (2) ENST Bretagne, France
     LastName.FirstName@inria.fr
     GfKl 2003, 13 March 2003

  2. Outline
     - The classical case of AHC
     - 2-3 Hierarchies
       - Definitions
       - Properties
     - Algorithm of 2-3 AHC
     - Analysis of complexity
     - Application on simulated data
       - Experimental validation of complexity
     - Ongoing and future work

  3. Context
     - Hierarchies
     - 2-3 Hierarchies: Bertrand 2002
     - Pyramids: Diday 1984-86, Fichet 1987
     - Weak Hierarchies: Bandelt, Dress 1989; Diatta, Fichet 1994

  4. Hierarchies (1/3)
     We recall some definitions related to the hierarchical case that will be extended to the 2-3 hierarchies:
     - Hierarchy:
       - each cluster is nonempty
       - $E$ and the singletons are clusters
       - each pair of clusters $(A, B)$ is hierarchical: $A \cap B \in \{\emptyset, A, B\}$
     - Remark: a hierarchy admits at most $n-1$ non-trivial clusters
     - Indexed hierarchy: each cluster is associated to a positive real number $f$, where $\forall A, B \in S,\ A \subset B \Rightarrow f(A) < f(B)$
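
The hierarchy conditions above can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: the function name `is_hierarchy` and the representation of clusters as sets are assumptions made for illustration.

```python
from itertools import combinations


def is_hierarchy(clusters, ground_set):
    """Check the three hierarchy conditions on a family of clusters.

    `clusters` is an iterable of sets, `ground_set` is E.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    family = {frozenset(c) for c in clusters}
    E = frozenset(ground_set)

    # Each cluster is nonempty and is a subset of E.
    if frozenset() in family or any(not c <= E for c in family):
        return False
    # E and the singletons are clusters.
    if E not in family or any(frozenset([x]) not in family for x in E):
        return False
    # Each pair (A, B) is hierarchical: A ∩ B is empty, A, or B.
    for a, b in combinations(family, 2):
        inter = a & b
        if inter and inter != a and inter != b:
            return False
    return True


E = {1, 2, 3, 4}
tree = [{1}, {2}, {3}, {4}, {1, 2}, {3, 4}, E]
print(is_hierarchy(tree, E))             # True
print(is_hierarchy(tree + [{2, 3}], E))  # False: {1, 2} and {2, 3} overlap
```

The example family has three clusters that are not singletons, which matches the $n-1$ bound for $n = 4$.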

  5. Agglomerative Hierarchical Classification (2/3)
     Vocabulary:
     - set inclusion order on the set of clusters:
       - predecessor/successor
       - comparable clusters
     - candidate clusters (unmarked) = maximal clusters
     - data input: a dissimilarity $\delta : E \times E \to [0, \infty)$ with $\delta(a, b) = \delta(b, a) \ge \delta(a, a) = 0,\ \forall a, b \in E$
     - aggregation index $\mu$ (link between clusters):
       - single linkage
       - complete linkage
       - average linkage
     - usually $f(X \cup Y) = \mu(X, Y)$
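
The three aggregation indices named above have standard definitions, sketched here in Python. The function names and the choice of passing the dissimilarity as a callable `delta` are assumptions for illustration, not notation from the slides.

```python
def single_linkage(X, Y, delta):
    """mu(X, Y) = smallest dissimilarity between a point of X and a point of Y."""
    return min(delta(a, b) for a in X for b in Y)


def complete_linkage(X, Y, delta):
    """mu(X, Y) = largest dissimilarity between a point of X and a point of Y."""
    return max(delta(a, b) for a in X for b in Y)


def average_linkage(X, Y, delta):
    """mu(X, Y) = mean dissimilarity over all pairs (a, b) with a in X, b in Y."""
    return sum(delta(a, b) for a in X for b in Y) / (len(X) * len(Y))


# Toy dissimilarity on numbers: symmetric, nonnegative, zero on the diagonal.
def delta(a, b):
    return abs(a - b)


X, Y = {0, 1}, {4, 6}
print(single_linkage(X, Y, delta))    # 3
print(complete_linkage(X, Y, delta))  # 6
print(average_linkage(X, Y, delta))   # 4.5
```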

  6. Algorithm AHC (3/3)
     1. Initialisation: iter ← 0; clusters are the singletons of set $E$; $f$ ← 0.
     2. iter ← iter + 1; merge the two clusters $X$ and $Y$ which are, in the sense of $\mu$, the nearest; compute $f(X \cup Y)$.
     3. Reduction: eliminate the successors found on the same level $f$ with their predecessor, if there are any.
     4. Update $\mu$, predecessor links, successor links.
     5. Stopping rule: repeat steps 2-4 until the set $E$ becomes a cluster.
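
The five steps above can be sketched as a naive Python loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: it recomputes $\mu$ from the dissimilarity at every iteration instead of maintaining it, it uses $f(X \cup Y) = \mu(X, Y)$, it expects at least one point, and the names `ahc` and `single_linkage` are hypothetical.

```python
from itertools import combinations


def single_linkage(X, Y, delta):
    return min(delta(a, b) for a in X for b in Y)


def ahc(points, delta, mu):
    """Naive AHC sketch; returns a dict mapping each cluster to its level f."""
    # Step 1: the clusters are the singletons, all at level 0.
    candidates = [frozenset([p]) for p in points]  # maximal (unmarked) clusters
    f = {c: 0.0 for c in candidates}
    E = frozenset(points)

    # Step 5: repeat until E becomes a cluster.
    while E not in f:
        # Step 2: merge the two nearest candidate clusters.
        X, Y = min(combinations(candidates, 2),
                   key=lambda pair: mu(pair[0], pair[1], delta))
        merged = X | Y
        f[merged] = mu(X, Y, delta)
        # Step 3 (reduction): drop a non-singleton successor that sits on
        # the same level as its new predecessor.
        for child in (X, Y):
            if len(child) > 1 and f[child] == f[merged]:
                del f[child]
        # Step 4: X and Y are no longer maximal; mu is recomputed on demand.
        candidates = [c for c in candidates if c not in (X, Y)] + [merged]
    return f


levels = ahc([0, 1, 4, 6], lambda a, b: abs(a - b), single_linkage)
for cluster, level in levels.items():
    if len(cluster) > 1:
        print(sorted(cluster), level)
```

On the four points 0, 1, 4, 6 with single linkage this produces the clusters {0, 1}, {4, 6} and $E$ at levels 1, 2 and 3. The quadratic scan in step 2 is kept for readability only.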

  7. 2-3 Hierarchies: Definitions
     - Proper intersection: $A$ properly intersects $B$ if $A \cap B \notin \{\emptyset, A, B\}$
     - Concept: in a 2-3 hierarchy, for any three clusters at least two pairs of them are hierarchical
     - 2-3 Hierarchy [Bertrand 2002]:
       - each cluster is nonempty
       - $E$ and singletons are clusters
       - the proper intersection of two clusters is also a cluster
       - each cluster properly intersects no more than one other cluster
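
The four conditions of the 2-3 hierarchy definition translate directly into a checker. The sketch below is an illustration only; the names `properly_intersects` and `is_23_hierarchy` are hypothetical, and clusters are assumed to be given as sets.

```python
def properly_intersects(a, b):
    """A properly intersects B if A ∩ B is neither empty, nor A, nor B."""
    inter = a & b
    return bool(inter) and inter != a and inter != b


def is_23_hierarchy(clusters, ground_set):
    """Check the 2-3 hierarchy conditions as worded in the definition above."""
    family = {frozenset(c) for c in clusters}
    E = frozenset(ground_set)

    # Each cluster is nonempty; E and the singletons are clusters.
    if frozenset() in family or E not in family:
        return False
    if any(frozenset([x]) not in family for x in E):
        return False
    for a in family:
        partners = [b for b in family if properly_intersects(a, b)]
        # Each cluster properly intersects no more than one other cluster.
        if len(partners) > 1:
            return False
        # The proper intersection of two clusters is also a cluster.
        if any((a & b) not in family for b in partners):
            return False
    return True


singletons = [{1}, {2}, {3}, {4}]
E = {1, 2, 3, 4}
print(is_23_hierarchy(singletons + [{2, 3}, {1, 2, 3}, {2, 3, 4}, E], E))  # True
print(is_23_hierarchy(singletons + [{1, 2}, {2, 3}, {3, 4}, E], E))        # False
```

The second family fails because {2, 3} properly intersects both {1, 2} and {3, 4}.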

  8. 2-3 Hierarchies: Properties [Bertrand 2002]
     - The number of elements of a 2-3 hierarchy that are not reduced to singletons is at most $\left\lfloor \frac{3}{2}(n-1) \right\rfloor$.
     - Each 2-3 hierarchical set system on $E$ is a collection of intervals of some linear order defined on $E$.
     [Figure: panels "2-3 Hierarchy" and "Pyramid"]
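
As a quick check of the bound, here is a worked instance for $n = 4$. The set system used is an illustrative example and does not come from the slides.

$$\left\lfloor \tfrac{3}{2}(n-1) \right\rfloor = \left\lfloor \tfrac{3}{2} \cdot 3 \right\rfloor = \lfloor 4.5 \rfloor = 4$$

On $E = \{1, 2, 3, 4\}$, the four clusters $\{2,3\}$, $\{1,2,3\}$, $\{2,3,4\}$ and $E$ reach this bound: $\{1,2,3\}$ and $\{2,3,4\}$ properly intersect in $\{2,3\}$, which is itself a cluster, and no cluster properly intersects more than one other. A hierarchy on the same set admits at most $n - 1 = 3$ such clusters.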

  9. Algorithm of 2-3 AHC
     1. Initialisation: iter ← 0; clusters are the singletons of set $E$; $f$ ← 0.
     2. iter ← iter + 1; merge the two clusters $X$ and $Y$ which are, in the sense of $\mu$, the nearest non-comparable clusters such that at least one of them is maximal; compute $f(X \cup Y)$.
     3. Merge $X \cup Y$ and the other predecessor of $X$ or $Y$, if it exists; compute $f(X \cup Y)$.
     4. Reduction: eliminate the successors found on the same level $f$ with their predecessor, if there are any.
     5. Update $\mu$, predecessor links, successor links.
     6. Stopping rule: repeat steps 2-5 until the set $E$ becomes a cluster.
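
The selection rule of step 2 is what distinguishes this algorithm from classical AHC, so it is sketched below in Python. This covers only the rule as worded on the slide (nearest pair, non-comparable, at least one maximal); the intermediate merging, reduction and update steps, and any further bookkeeping the authors' algorithm performs, are left out. The names `comparable`, `is_maximal` and `select_pair` are hypothetical.

```python
from itertools import combinations


def comparable(a, b):
    """Two clusters are comparable if one is included in the other."""
    return a <= b or b <= a


def is_maximal(c, clusters):
    """A cluster is maximal if no other current cluster strictly contains it."""
    return not any(c < other for other in clusters)


def select_pair(clusters, mu):
    """Return (mu(X, Y), X, Y) for the nearest admissible pair, or None."""
    best = None
    for X, Y in combinations(clusters, 2):
        if comparable(X, Y):
            continue  # only non-comparable clusters may be merged
        if not (is_maximal(X, clusters) or is_maximal(Y, clusters)):
            continue  # at least one of the two must be maximal
        d = mu(X, Y)
        if best is None or d < best[0]:
            best = (d, X, Y)
    return best


def mu(X, Y):
    # Single linkage on numbers, used here only as a toy aggregation index.
    return min(abs(a - b) for a in X for b in Y)


clusters = [frozenset(s) for s in ({0}, {1}, {4}, {6}, {0, 1})]
print(select_pair(clusters, mu))
```

In this example the pair ({0}, {1}) is skipped because neither cluster is maximal once {0, 1} exists, while a pair such as ({1}, {4}) stays admissible because {4} is maximal. That is how a cluster can end up merged with two different clusters.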

  10. Algorithm of 2-3 AHC
      - Generalizes the AHC: a cluster can be merged with two different clusters
      - Double single linkage [Jullien, Bertrand 2002]: $f(X \cup Y) = \min\{\mu(X, Y),\ \mu(X \cup Y, Z) : Z \text{ candidate cluster}\}$
      - Complexity: $O(n^2 \log n)$

  11. Analysis of Complexity (1/3)
      We use an ordered dissimilarity matrix on three levels:
      - dissimilarity values
      - cardinality of the two clusters
      - lexicographical order
      Step 1. Initialisation: compute and order the dissimilarity matrix, $O(n^2 \log n)$
      Step 2. Merge $X$ and $Y$ …: retrieve $(X, Y)$ from the data structure and create $X \cup Y$, $O(1)$
      Step 3. Merge $X \cup Y$ and …: intermediate merging with $O(n)$ complexity
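
A three-level ordering of this kind can be expressed as a composite sort key. The Python sketch below is an assumption-laden illustration: the slide does not say how the two cardinalities are combined or how the lexicographical order is defined, so the sum of the sizes and the sorted element lists are used here as stand-ins, and `ordered_pairs` is a hypothetical name.

```python
from itertools import combinations


def ordered_pairs(clusters, mu):
    """Sort all cluster pairs by (dissimilarity, cardinality, lexicographic name)."""
    def key(pair):
        X, Y = pair
        # Assumption: cardinality level = total size of the two clusters.
        # Assumption: lexicographic level = sorted element lists of the pair.
        names = sorted([sorted(X), sorted(Y)])
        return (mu(X, Y), len(X) + len(Y), names)

    # n clusters give n(n-1)/2 pairs, so this sort costs O(n^2 log n)
    # comparisons, in line with step 1.
    return sorted(combinations(clusters, 2), key=key)


def mu(X, Y):
    # Single linkage on numbers, used only as a toy aggregation index.
    return min(abs(a - b) for a in X for b in Y)


a, b, c = frozenset({0}), frozenset({1}), frozenset({-1, -2})
for X, Y in ordered_pairs([c, b, a], mu):
    print(sorted(X), sorted(Y), mu(X, Y))
```

Here the pairs ({1}, {0}) and ({-2, -1}, {0}) tie on dissimilarity 1, and the smaller total cardinality puts the pair of singletons first.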

  12. Analysis of Complexity (2/3)
      Step 4. Reduction: we have five possible cases of reduction when merging a cluster.
      [Figure: the reduction cases; surviving labels: α, β1, β2, X', Y', Z]
      - eliminate the successors found on the same level with their predecessor
      - complexity $O(n)$

  13. Analysis of Complexity (3/3)
      Step 5. Update $\mu$:
      - compute new dissimilarities and store them in the matrix, $O(n \log n)$
      - eliminate dissimilarities containing non-candidate clusters, $O(n \log n)$
      Total complexity of the algorithm: $O(n^2 \log n)$ (step 1) $+\ n \times O(n \log n)$ (steps 2-5) $\to O(n^2 \log n)$
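
The total can be unpacked from the per-step costs given on the three complexity slides. The derivation below assumes, as the factor $n$ in the total suggests, that steps 2-5 are repeated $O(n)$ times.

$$
\begin{aligned}
T(n) &= \underbrace{O(n^2 \log n)}_{\text{step 1}} + n \cdot \Bigl( \underbrace{O(1)}_{\text{step 2}} + \underbrace{O(n)}_{\text{step 3}} + \underbrace{O(n)}_{\text{step 4}} + \underbrace{O(n \log n)}_{\text{step 5}} \Bigr) \\
     &= O(n^2 \log n) + n \cdot O(n \log n) \\
     &= O(n^2 \log n)
\end{aligned}
$$

The update of $\mu$ in step 5 dominates each iteration, and the loop as a whole costs no more than the initial sort.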
