Improved Clustering Algorithms for the Random Cluster Graph Model (PowerPoint presentation transcript)


  1. Improved Clustering Algorithms for the Random Cluster Graph Model. Ron Shamir, Dekel Tsur, Tel Aviv University.

  2. The Clustering Problem. Input: A graph G = (V, E), where edges represent similarity between vertices. Output: A partition of V into sets such that there are many edges between vertices in the same set and few edges between vertices in different sets.

  4. The Random Cluster Graph Model. A graph G = (V, E) built by the following process:

     1. V is partitioned into disjoint sets V_1, ..., V_m (clusters).
     2. Mates (vertices from the same set) are connected by an edge with probability p.
     3. Non-mates are connected by an edge with probability r < p.

     The edges are independent.
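
The generative process above is easy to simulate. Below is a minimal Python sketch of the model; the function name and the edge representation (a set of frozenset pairs) are my own choices for illustration, not something from the slides.

```python
import random
from itertools import combinations

def random_cluster_graph(cluster_sizes, p, r, seed=None):
    """Sample a graph from the random cluster graph model:
    mates are joined with probability p, non-mates with probability r < p,
    and all edges are independent."""
    rng = random.Random(seed)
    clusters, start = [], 0
    for size in cluster_sizes:
        clusters.append(list(range(start, start + size)))
        start += size
    label = {v: i for i, cl in enumerate(clusters) for v in cl}
    edges = set()
    for u, v in combinations(range(start), 2):
        prob = p if label[u] == label[v] else r
        if rng.random() < prob:
            edges.add(frozenset((u, v)))
    return clusters, edges
```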

  5. The Clustering Problem. Input: A cluster graph G. Output: The clusters V_1, ..., V_m. Notation: n = |V|, k = min_i |V_i|, ∆ = p − r.

  6-9. Previous Results (table built up over four frames)

     General case:

       Paper                  k                                    ∆                   Complexity
       Ben-Dor et al. 99      Ω(n)                                 Ω(1)                n² log^{O(1)} n
       This paper             Ω(∆^{-1} √(n · max(log n, ∆^{-ε})))                      O(mn² / log n)

     Equal-sized clusters:

       Paper                  m       ∆                            Complexity
       Dyer and Frieze 86     2       Ω(n^{-1/4} log^{1/4} n)      O(n²)
       Boppana 87             2       Ω(n^{-1/2} √(log n))         n^{O(1)}
       Jerrum and Sorkin 93   2       Ω(n^{-1/6 + ε})              O(n⁴)
       Condon and Karp 99     O(1)    Ω(n^{-1/2 + ε})              O(n²)
       This paper                     Ω(m · n^{-1/2} √(log n))     O(mn² log n)

     (One frame also lists O(n log n) as the running time of this paper's general-case algorithm.)

     Notation: n = |V|, k = min_i |V_i|, ∆ = p − r.

  10. More Notation. For a graph G = (V, E):

     w.h.p. = with probability 1 − n^{−Ω(1)}
     N(v) = the neighbors of v
     d_S(v) = |N(v) ∩ S|

     [Figure: a vertex v with two of its neighbors inside a set S, so d_S(v) = 2.]
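
To make the notation concrete, d_S(v) can be computed as follows for the edge representation used in the earlier sketch (a hypothetical helper, not part of the paper):

```python
def d_S(edges, v, S):
    """d_S(v) = |N(v) ∩ S|: the number of neighbors of v inside S."""
    return sum(1 for u in S if u != v and frozenset((u, v)) in edges)
```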

  11. Top Level Description. A set S ⊆ V is called a subcluster if S ⊆ V_i for some cluster V_i. Our algorithm, while G is not empty:

     Find seed: find a subcluster S of size Θ(log n / ∆²).
     Expand: find the whole cluster V_i which contains S, and remove it from G.
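
A rough Python outline of this loop, assuming hypothetical find_seed and expand_cluster routines (their contents correspond to the later slides); this is only a sketch of the control flow, not the authors' implementation.

```python
def cluster_graph(vertices, edges, find_seed, expand_cluster):
    """Top-level loop: repeatedly find a seed subcluster, expand it to the
    full cluster containing it, and remove that cluster from the graph."""
    remaining = set(vertices)
    clusters = []
    while remaining:
        S = find_seed(remaining, edges)                # subcluster of size Θ(log n / ∆²)
        cluster = expand_cluster(remaining, edges, S)  # whole V_i containing S
        clusters.append(cluster)
        remaining -= set(cluster)                      # remove the found cluster
    return clusters
```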

  12-15. Expanding a subcluster S (built up over four frames). Suppose that S ⊆ V_i and |S| = Θ(log n / ∆²). Consider d_S(v) for v ∈ V − S:

     E[d_S(v)] = |S|p if v ∈ V_i, and |S|r otherwise.

     Using a Chernoff-like bound, w.h.p. |d_S(v) − E[d_S(v)]| < (1/2)D, where D = Θ(√(|S| log n)).

     [Figure: the values d_S(v) on a line; non-mates concentrate in an interval of width D around |S|r, vertices of V_i in an interval of width D around |S|p, and the gap between the two intervals is |S|∆ − D > D.]

  16. Expanding a subcluster S

     1. Order V − S = {v_1, ..., v_{n−|S|}} such that d_S(v_1) ≥ d_S(v_2) ≥ ... ≥ d_S(v_{n−|S|}).
     2. Let D = Θ(√(|S| log n)).
     3. If max_j {d_S(v_j) − d_S(v_{j+1})} < D, then return V.
     4. Otherwise, let j be the first index for which d_S(v_j) − d_S(v_{j+1}) ≥ D. Return S ∪ {v_1, ..., v_j}.
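
A minimal Python sketch of this expansion step, reusing the edge representation from the earlier blocks. The slides give D only up to a Θ(·); the constant c below is an assumption.

```python
import math

def expand_subcluster(vertices, edges, S, c=1.0):
    """Expand a seed subcluster S (slide 16): sort the remaining vertices by
    d_S, look for a gap of at least D in the sorted degrees, and keep the
    vertices above that gap together with S."""
    n = len(vertices)
    D = c * math.sqrt(len(S) * math.log(n))
    rest = [v for v in vertices if v not in S]
    deg = {v: sum(1 for u in S if frozenset((u, v)) in edges) for v in rest}
    rest.sort(key=lambda v: deg[v], reverse=True)
    gaps = [deg[rest[j]] - deg[rest[j + 1]] for j in range(len(rest) - 1)]
    if not gaps or max(gaps) < D:
        return set(vertices)                          # no clear gap: return all of V
    j = next(i for i, g in enumerate(gaps) if g >= D)
    return set(S) | set(rest[:j + 1])
```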

  17. Finding a Subcluster: Imbalance. For two disjoint sets L, R of vertices of equal size, the (L, R)-imbalance of V_i (Jerrum and Sorkin 93) is

     I(V_i, L, R) = (|V_i ∩ L| − |V_i ∩ R|) / |L|.

     The imbalance of L, R is max{I(V_1, L, R), ..., I(V_m, L, R)}. The secondary imbalance of L, R is the second largest value.
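
For concreteness, the imbalance of a single cluster with respect to a split (L, R) could be computed like this (a hypothetical helper):

```python
def imbalance(Vi, L, R):
    """I(V_i, L, R) = (|V_i ∩ L| - |V_i ∩ R|) / |L| for equal-sized L, R."""
    Vi = set(Vi)
    return (len(Vi & set(L)) - len(Vi & set(R))) / len(L)
```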

  18. Finding a Subcluster

     1. Find L, R with large imbalance and small secondary imbalance.
     2. Let f(v) = d_L(v) − d_R(v) and D = Θ(√(|L| log n)).
     3. Randomly choose Θ(m² log n / ∆²) vertices from V − (L ∪ R) into a set S.
     4. Order S = {v_1, ..., v_s} such that f(v_1) ≥ ... ≥ f(v_s).
     5. If max_j {f(v_j) − f(v_{j+1})} < D, then return. (L, R are "bad".)
     6. Otherwise, let j be the first index for which f(v_j) − f(v_{j+1}) ≥ D. Return {v_1, ..., v_j}.
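
A Python sketch of steps 2-6 for a given pair (L, R), in the same hypothetical setting as the earlier blocks; the constants c_D and c_S stand in for the hidden Θ(·) constants and are assumptions.

```python
import math
import random

def find_subcluster(vertices, edges, L, R, m, delta, c_D=1.0, c_S=1.0, seed=None):
    """Sample vertices outside L and R, sort them by f(v) = d_L(v) - d_R(v),
    and return the prefix above the first gap of size at least D.
    Returns None when L, R are "bad"."""
    rng = random.Random(seed)
    n = len(vertices)
    D = c_D * math.sqrt(len(L) * math.log(n))
    excluded = set(L) | set(R)
    pool = [v for v in vertices if v not in excluded]
    s = min(len(pool), int(c_S * m * m * math.log(n) / delta ** 2))
    sample = rng.sample(pool, s)
    f = {v: sum(1 for u in L if frozenset((u, v)) in edges)
            - sum(1 for u in R if frozenset((u, v)) in edges)
         for v in sample}
    sample.sort(key=lambda v: f[v], reverse=True)
    gaps = [f[sample[j]] - f[sample[j + 1]] for j in range(len(sample) - 1)]
    if not gaps or max(gaps) < D:
        return None                                   # L, R are "bad"
    j = next(i for i, g in enumerate(gaps) if g >= D)
    return set(sample[:j + 1])
```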

  19-21. Correctness of the Algorithm (built up over three frames). Denote b_i = I(V_i, L, R) and l = |L|. Suppose that b_1 ≥ b_2 ≥ ... ≥ b_m.

     Lemma: If b_1 ≥ Ω(√(log n) / (∆ √l)) and b_2 ≤ (1/2) b_1, then w.h.p. the algorithm returns a subcluster.

     Proof: For v ∈ V_i, E[f(v)] = ∆ l b_i, and w.h.p. |f(v) − E[f(v)]| < (1/2)D.

     [Figure: the values f(v) on a line; for each V_i they concentrate in an interval of width D around ∆ l b_i (shown for V_3, V_2, V_1), and the gap between the interval of V_1 and the next interval is > D.]

  22-25. Finding the Sets L, R: Initialization (built up over four frames)

     1. L_0, R_0 ← ∅. Let l = Θ(m² / ∆²).
     2. Randomly select a vertex u and l pairs of vertices.
     3. For each pair of vertices, if only one vertex is a neighbor of u, place that vertex in L_0 and the other vertex in R_0. Otherwise, randomly place one vertex in L_0 and the other vertex in R_0.

     [Figure: the vertex u with the sampled pairs split between L_0 and R_0.]
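
A Python sketch of this initialization, in the same setting as the earlier blocks. The sample size uses l = c · m²/∆² for an assumed constant c, and the pairs are drawn from distinct vertices, which is a simplification.

```python
import random

def initialize_LR(vertices, edges, m, delta, c=1.0, seed=None):
    """Pick a random vertex u and l disjoint random pairs; within each pair,
    put a vertex adjacent to u (if exactly one is) into L0 and the other
    into R0; otherwise split the pair at random."""
    rng = random.Random(seed)
    l = max(1, int(c * m * m / delta ** 2))
    vertices = list(vertices)
    u = rng.choice(vertices)
    others = [v for v in vertices if v != u]
    rng.shuffle(others)
    pairs = list(zip(others[0::2], others[1::2]))[:l]
    L0, R0 = [], []
    for v, w in pairs:
        v_adj = frozenset((u, v)) in edges
        w_adj = frozenset((u, w)) in edges
        if v_adj != w_adj:                            # exactly one is a neighbor of u
            first, second = (v, w) if v_adj else (w, v)
        else:                                         # neither or both: place at random
            first, second = (v, w) if rng.random() < 0.5 else (w, v)
        L0.append(first)
        R0.append(second)
    return u, L0, R0
```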

  26. Analysis of the Initialization. Suppose that u ∈ V_1. If v ∈ V_1 and w ∉ V_1, then P[v is a neighbor of u] = p > r = P[w is a neighbor of u]. Hence, using Chernoff bounds and the Hoeffding-Azuma inequality, w.h.p.

     I(V_1, L_0, R_0) ≈ (1 − 1/m) · ∆/m,
     I(V_i, L_0, R_0) ≈ −∆/m² for i > 1.
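
Chaining the earlier sketches gives a rough empirical check of these approximations; the parameter values below are arbitrary examples, and the imbalance of the cluster containing u should typically stand out as the largest.

```python
# Rough check of the initialization analysis with arbitrary parameters.
clusters, edges = random_cluster_graph([500, 500, 500], p=0.8, r=0.2, seed=1)
vertices = [v for cl in clusters for v in cl]
u, L0, R0 = initialize_LR(vertices, edges, m=3, delta=0.6, c=16.0, seed=1)
for i, cl in enumerate(clusters, start=1):
    # the cluster containing u should typically show the largest imbalance
    print(i, u in cl, round(imbalance(cl, L0, R0), 3))
```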

  27-28. Finding the Sets L, R: 1st Iteration

     4. If L_0, R_0 are "good" (yielding a subcluster), stop.
     5. Let f_0(v) = d_{L_0}(v) − d_{R_0}(v).
     6. L_1, R_1 ← ∅. Randomly select l pairs of unchosen vertices.
     7. For each pair v, w: if f_0(v) ≠ f_0(w), place the vertex with the larger f_0-value in L_1 and the other vertex in R_1.

     [Figure: a pair with f_0(v) = 1 and f_0(w) = −2; v is placed in L_1 and w in R_1.]
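
Steps 5-7 could look as follows in the same hypothetical Python setting; pairs with equal f_0-values are simply skipped here, since the slide only specifies the unequal case.

```python
import random

def refine_LR(vertices, edges, L0, R0, l, seed=None):
    """One refinement round: score unchosen vertices by f0(v) = d_L0(v) - d_R0(v),
    then, for each of l fresh pairs, send the higher-scoring vertex to L1 and
    the other to R1."""
    rng = random.Random(seed)
    chosen = set(L0) | set(R0)
    pool = [v for v in vertices if v not in chosen]
    rng.shuffle(pool)
    pairs = list(zip(pool[0::2], pool[1::2]))[:l]

    def f0(v):
        return (sum(1 for x in L0 if frozenset((x, v)) in edges)
                - sum(1 for x in R0 if frozenset((x, v)) in edges))

    L1, R1 = [], []
    for v, w in pairs:
        fv, fw = f0(v), f0(w)
        if fv == fw:
            continue                                  # the slide only handles f0(v) != f0(w)
        first, second = (v, w) if fv > fw else (w, v)
        L1.append(first)
        R1.append(second)
    return L1, R1
```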
