Improved Clustering Algorithms for the Random Cluster Graph Model - PowerPoint PPT Presentation

Improved Clustering Algorithms for the Random Cluster Graph Model
Ron Shamir, Dekel Tsur
Tel Aviv University

The Clustering Problem
Input: A graph $G = (V, E)$, whose edges represent similarity between vertices.
Output: A partition of $V$ into sets such that there are many edges between vertices in the same set and few edges between vertices in different sets.

The Random Cluster Graph Model
A graph $G = (V, E)$ built by the following process:
1. $V$ is partitioned into disjoint sets $V_1, \dots, V_m$ (the clusters).
2. Mates (vertices from the same set) are connected by an edge with probability $p$.
3. Non-mates are connected by an edge with probability $r < p$.
The edges are chosen independently.
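
A minimal Python sketch of this generative process, useful for experimenting with the algorithms on the following slides. The function name `random_cluster_graph`, the numbering of vertices as 0..n-1, and the representation of the graph as a dictionary of neighbor sets are illustrative assumptions, not part of the talk.

```python
import random
from itertools import combinations

def random_cluster_graph(sizes, p, r, seed=None):
    """Sample G = (V, E) from the random cluster graph model.

    sizes lists the cluster sizes |V_1|, ..., |V_m|; vertices are 0..n-1.
    Returns (adj, clusters): adj maps each vertex to its set of neighbors,
    clusters is the list of vertex sets V_1, ..., V_m.
    """
    if not 0 <= r < p <= 1:
        raise ValueError("the model requires 0 <= r < p <= 1")
    rng = random.Random(seed)
    clusters, label, start = [], {}, 0
    # Step 1: partition V into consecutive blocks of the requested sizes.
    for i, size in enumerate(sizes):
        block = set(range(start, start + size))
        clusters.append(block)
        label.update((v, i) for v in block)
        start += size
    adj = {v: set() for v in range(start)}
    # Steps 2-3: every pair gets an edge independently, with probability
    # p for mates (same cluster) and r for non-mates.
    for u, v in combinations(range(start), 2):
        if rng.random() < (p if label[u] == label[v] else r):
            adj[u].add(v)
            adj[v].add(u)
    return adj, clusters
```

With `p = 1` and `r = 0` the sample is a disjoint union of cliques, which is a convenient sanity check; the quadratic loop over all pairs is only meant for small experiments.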

The Clustering Problem
Input: A cluster graph $G$.
Output: The clusters $V_1, \dots, V_m$.
Notation: $n = |V|$, $k = \min_i |V_i|$, $\Delta = p - r$.

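Previous Results
General case (requirements on $k$ and $\Delta$; complexity):
Ben-Dor et al 99: $k = \Omega(n)$, $\Delta = \Omega(1)$; $n^2 \log^{O(1)} n$
This paper: $k = \Omega(\Delta^{-1}\sqrt{n}\,\max(\log n, \Delta^{-\varepsilon}))$; $O(mn^2/\log n)$
Equal sized clusters (requirements on $m$ and $\Delta$; complexity):
Dyer and Frieze 86: $m = 2$, $\Delta = \Omega(n^{-1/4}\log^{1/4} n)$; $O(n^2)$
Boppana 87: $m = 2$, $\Delta = \Omega(n^{-1/2}\sqrt{\log n})$; $n^{O(1)}$
Jerrum and Sorkin 93: $m = 2$, $\Delta = \Omega(n^{-1/6+\varepsilon})$; $O(n^4)$
Condon and Karp 99: $m = O(1)$, $\Delta = \Omega(n^{-1/2+\varepsilon})$; $O(n^2)$
This paper: $\Delta = \Omega(m n^{-1/2}\sqrt{\log n})$; $O(mn^2 \log n)$
Here $n = |V|$, $k = \min_i |V_i|$, $\Delta = p - r$.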

More Notation
For a graph $G = (V, E)$:
w.h.p. = with probability $1 - n^{-\Omega(1)}$
$N(v)$ = the set of neighbors of $v$
$d_S(v) = |N(v) \cap S|$
[Figure: a vertex $v$ with two neighbors in a set $S$, so $d_S(v) = 2$.]

Top Level Description
A set $S \subseteq V$ is called a subcluster if $S \subseteq V_i$ for some cluster $V_i$.
Our algorithm: while $G$ is not empty:
Find seed: find a subcluster $S$ of size $\Theta(\log n/\Delta^2)$.
Expand: find the whole cluster $V_i$ that contains $S$, and remove it from $G$.
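
The loop can be sketched in Python as follows, assuming the graph is stored as a dictionary of neighbor sets. The names `cluster_all`, `find_seed` and `expand` are hypothetical; the seed-finding and expansion procedures are described on later slides, so in this sketch they are simply passed in as functions.

```python
def cluster_all(adj, find_seed, expand):
    """Top-level loop: find a seed, expand it to its cluster, delete, repeat.

    adj maps each vertex to its neighbor set.
    find_seed(graph) should return a subcluster S of the current graph and
    expand(graph, S) the whole cluster containing S; both are hypothetical
    callables supplied by the caller.
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    clusters = []
    while graph:  # "while G is not empty"
        seed = find_seed(graph)
        cluster = set(expand(graph, seed))
        if not cluster or not cluster.issubset(graph):
            raise ValueError("expand must return a nonempty set of current vertices")
        clusters.append(cluster)
        # Delete the cluster's vertices together with all incident edges.
        graph = {v: nbrs - cluster for v, nbrs in graph.items() if v not in cluster}
    return clusters
```

Removing each recovered cluster shrinks the graph, so later iterations only have to separate the clusters that remain.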

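Expanding a subcluster S
Suppose that $S \subseteq V_i$ and $|S| = \Theta(\log n/\Delta^2)$. Consider $d_S(v)$ for $v \in V - S$:
$E[d_S(v)] = |S|p$ if $v \in V_i$, and $E[d_S(v)] = |S|r$ otherwise.
Using a Chernoff-like bound, w.h.p. $|d_S(v) - E[d_S(v)]| < D/2$ for every such $v$, where $D = \Theta(\sqrt{|S| \log n})$.
Choosing the constant in $|S| = \Theta(\log n/\Delta^2)$ large enough ensures $|S|\Delta - D > D$, so the mates of $S$ are separated from all other vertices by a gap larger than $D$.
[Figure: number line of $d_S(v)$ with marks at $0$, $|S|r$ and $|S|p$; the vertices of $V_i$ lie within $D/2$ of $|S|p$, all other vertices within $D/2$ of $|S|r$, leaving a gap of $|S|\Delta - D > D$.]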
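
To make the constants concrete, here is one possible instantiation, assuming Hoeffding's inequality as the "Chernoff-like bound" and natural logarithms; the constants 2 and 16 are illustrative and not taken from the talk. For $v \notin S$, $d_S(v)$ is a sum of $|S|$ independent indicator variables, one per potential edge to $S$.

```latex
\Pr\bigl[\,|d_S(v) - E[d_S(v)]| \ge t\,\bigr] \le 2\exp\!\left(-\frac{2t^2}{|S|}\right),
\qquad t = \sqrt{|S|\ln n} \;\Rightarrow\; \text{failure probability} \le 2n^{-2}.

% Union bound over the at most n vertices of V - S: every deviation is < D/2
% with probability at least 1 - 2/n, where
D = 2t = 2\sqrt{|S|\ln n}.

% Gap condition from the figure:
|S|\Delta - D > D
\iff |S|\Delta > 4\sqrt{|S|\ln n}
\iff |S| > \frac{16\ln n}{\Delta^2},
% which is consistent with |S| = \Theta(\log n / \Delta^2).
```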

Expanding a subcluster S
1. Order $V - S = \{v_1, \dots, v_{n-|S|}\}$ such that $d_S(v_1) \ge d_S(v_2) \ge \dots \ge d_S(v_{n-|S|})$.
2. Let $D = \Theta(\sqrt{|S| \log n})$.
3. If $\max_j \{d_S(v_j) - d_S(v_{j+1})\} < D$, then return $V$ (w.h.p. every vertex of $V - S$ is then a mate of $S$, so $V$ itself is the cluster).
4. Otherwise, let $j$ be the first index for which $d_S(v_j) - d_S(v_{j+1}) \ge D$. Return $S \cup \{v_1, \dots, v_j\}$.
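
A Python sketch of steps 1 to 4, assuming adjacency sets and natural logarithms. The name `expand` and the constant `c` in $D = c\sqrt{|S|\ln n}$ are hypothetical stand-ins for the $\Theta(\cdot)$; taking $n$ as the number of vertices still in the graph is also an assumption of this sketch.

```python
import math

def expand(adj, S, c=2.0):
    """Steps 1-4: grow a subcluster S into the whole cluster containing it.

    adj maps each remaining vertex to its neighbor set; S is a vertex set
    assumed to lie inside one cluster. c is a hypothetical constant.
    """
    S = set(S)
    n = len(adj)
    # Step 2: gap threshold D = c * sqrt(|S| ln n).
    D = c * math.sqrt(len(S) * math.log(n)) if n > 1 else 0.0
    # Step 1: sort V - S by d_S(v) = |N(v) ∩ S|, largest first.
    others = [v for v in adj if v not in S]
    d = {v: len(adj[v] & S) for v in others}
    order = sorted(others, key=lambda v: d[v], reverse=True)
    # Step 4: cut at the first gap d_S(v_j) - d_S(v_{j+1}) >= D.
    for j in range(len(order) - 1):
        if d[order[j]] - d[order[j + 1]] >= D:
            return S | set(order[: j + 1])
    # Step 3: no large gap, so all remaining vertices form one cluster.
    return set(adj)
```

For example, on two disjoint 10-cliques with `S = {0, 1, 2, 3, 4}`, the mates of `S` have `d = 5` and all other vertices `d = 0`, so with `c=1.0` the cut falls right after the fifth vertex of `V - S`.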

Finding a Subcluster: Imbalance
For two disjoint sets $L, R$ of vertices of equal size, the $L,R$-imbalance of $V_i$ (Jerrum and Sorkin 93) is
$I(V_i, L, R) = \frac{|V_i \cap L| - |V_i \cap R|}{|L|}$.
The imbalance of $L, R$ is $\max\{I(V_1, L, R), \dots, I(V_m, L, R)\}$. The secondary imbalance of $L, R$ is the second largest of these values.
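
The imbalance is a quantity for the analysis: computing it needs the true clusters, which the algorithm does not know, so the following Python sketch is only useful in simulations. The name `imbalance_profile` is hypothetical; it returns the imbalance and the secondary imbalance of a pair $L, R$.

```python
def imbalance_profile(clusters, L, R):
    """Return (imbalance, secondary imbalance) of the pair L, R.

    clusters is a list of vertex sets V_1, ..., V_m with m >= 2; L and R
    must be disjoint, nonempty and of equal size.
    """
    L, R = set(L), set(R)
    if not L or len(L) != len(R) or L & R:
        raise ValueError("L and R must be disjoint, nonempty and of equal size")
    if len(clusters) < 2:
        raise ValueError("the secondary imbalance needs at least two clusters")
    # I(V_i, L, R) = (|V_i ∩ L| - |V_i ∩ R|) / |L| for every cluster V_i.
    values = sorted(
        ((len(V & L) - len(V & R)) / len(L) for V in clusters), reverse=True
    )
    return values[0], values[1]
```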

Finding a Subcluster
1. Find $L, R$ with large imbalance and small secondary imbalance.
2. Let $f(v) = d_L(v) - d_R(v)$ and $D = \Theta(\sqrt{|L| \log n})$.
3. Randomly choose $\Theta(m^2 \log n/\Delta^2)$ vertices from $V - (L \cup R)$ into a set $S$.
4. Order $S = \{v_1, \dots, v_s\}$ such that $f(v_1) \ge \dots \ge f(v_s)$.
5. If $\max_j \{f(v_j) - f(v_{j+1})\} < D$, then return ($L, R$ are "bad").
6. Otherwise, let $j$ be the first index for which $f(v_j) - f(v_{j+1}) \ge D$. Return $\{v_1, \dots, v_j\}$.
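
Steps 2 to 6 can be sketched in Python as below, given a pair $L, R$ from step 1. The name `find_subcluster` and the constants `c_sample` and `c_gap`, which stand in for the two $\Theta(\cdot)$ expressions, are hypothetical; returning `None` plays the role of "$L, R$ are bad".

```python
import math
import random

def find_subcluster(adj, L, R, m, delta, c_sample=1.0, c_gap=2.0, rng=None):
    """Steps 2-6: return a candidate subcluster, or None if L, R are "bad".

    adj maps vertices to neighbor sets; L, R are disjoint vertex sets of
    equal size; c_sample and c_gap are hypothetical constants.
    """
    if rng is None:
        rng = random.Random()
    L, R = set(L), set(R)
    n = len(adj)
    # Step 2: gap threshold D = c_gap * sqrt(|L| ln n).
    D = c_gap * math.sqrt(len(L) * math.log(n))
    # Step 3: sample about c_sample * m^2 ln n / delta^2 vertices outside L ∪ R.
    pool = [v for v in adj if v not in L and v not in R]
    s = min(len(pool), math.ceil(c_sample * m * m * math.log(n) / delta ** 2))
    sample = rng.sample(pool, s)
    # Step 4: sort the sample by f(v) = d_L(v) - d_R(v), largest first.
    f = {v: len(adj[v] & L) - len(adj[v] & R) for v in sample}
    order = sorted(sample, key=lambda v: f[v], reverse=True)
    # Steps 5-6: cut at the first gap of at least D, if there is one.
    for j in range(len(order) - 1):
        if f[order[j]] - f[order[j + 1]] >= D:
            return set(order[: j + 1])
    return None
```

When $L$ lies entirely in one cluster and $R$ entirely in another, $f$ is strongly positive on the first cluster and strongly negative on the second, so the first large gap isolates sampled mates of $L$.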

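Correctness of the Algorithm
Denote $b_i = I(V_i, L, R)$ and $l = |L|$. Suppose that $b_1 \ge b_2 \ge \dots \ge b_m$.
Lemma: If $b_1 \ge \Omega\left(\frac{\sqrt{\log n}}{\Delta\sqrt{l}}\right)$ and $b_2 \le \frac{1}{2} b_1$, then w.h.p. the algorithm returns a subcluster.
Proof: For $v \in V_i$, $E[f(v)] = \Delta l b_i$, and w.h.p. $|f(v) - E[f(v)]| < D/2$.
[Figure: number line of $f(v)$ with marks at $0$, $\Delta l b_3$, $\Delta l b_2$ and $\Delta l b_1$; the values for $V_3$, $V_2$ and $V_1$ lie around these marks, and the $V_1$ group is separated from the rest by a gap larger than $D$.]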
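
The expectation in the proof follows by linearity of expectation, and the gap bound spells out why the two hypotheses of the lemma suffice. In this sketch the $\Omega(\cdot)$ is written as a constant $C$ and $D = c\sqrt{l \ln n}$; both constants are illustrative. Here $v \in V_i$ is a sampled vertex, so $v \notin L \cup R$.

```latex
E[d_L(v)] = |V_i \cap L|\,p + (l - |V_i \cap L|)\,r, \qquad
E[d_R(v)] = |V_i \cap R|\,p + (l - |V_i \cap R|)\,r \qquad (|R| = |L| = l)

E[f(v)] = (|V_i \cap L| - |V_i \cap R|)(p - r) = \Delta\, l\, b_i

% Lowest V_1 value versus highest value of any V_i, i >= 2 (b_i <= b_2 <= b_1/2):
(\Delta l b_1 - D/2) - (\Delta l b_2 + D/2) \ge \frac{\Delta l b_1}{2} - D
  \ge \frac{C}{2}\sqrt{l \ln n} - c\sqrt{l \ln n} > D
\quad\text{whenever } b_1 \ge \frac{C\sqrt{\ln n}}{\Delta\sqrt{l}} \text{ and } C > 4c.

% All V_1 values lie in an interval of width < D, so if the sample contains
% vertices of V_1, the first gap of size >= D separates exactly those vertices.
```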

Finding the Sets L, R: Initialization
1. $L_0, R_0 \leftarrow \emptyset$. Let $l = \Theta(m^2/\Delta^2)$.
2. Randomly select a vertex $u$ and $l$ pairs of vertices.
3. For each pair of vertices, if exactly one vertex is a neighbor of $u$, place that vertex in $L_0$ and the other vertex in $R_0$. Otherwise, randomly place one vertex in $L_0$ and the other in $R_0$.
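
A Python sketch of steps 1 to 3. Two details are assumptions not fixed by the slide: the $2l$ paired vertices are distinct and different from $u$, and "randomly place" is a fair coin flip. The name `initialize` is hypothetical.

```python
import random

def initialize(adj, l, rng=None):
    """Steps 1-3: build L_0, R_0 from a random vertex u and l random pairs.

    adj maps vertices to neighbor sets. Returns (u, L0, R0).
    """
    if rng is None:
        rng = random.Random()
    vertices = list(adj)
    u = rng.choice(vertices)
    others = [v for v in vertices if v != u]
    if 2 * l > len(others):
        raise ValueError("not enough vertices for l disjoint pairs")
    # Assumption: the l pairs use 2l distinct vertices, none equal to u.
    picked = rng.sample(others, 2 * l)
    L0, R0 = set(), set()
    for a, b in zip(picked[0::2], picked[1::2]):
        a_adj, b_adj = a in adj[u], b in adj[u]
        if a_adj != b_adj:
            # Exactly one of the pair is a neighbor of u: it goes to L_0.
            a, b = (a, b) if a_adj else (b, a)
        elif rng.random() < 0.5:
            # Both or neither are neighbors of u: place them at random.
            a, b = b, a
        L0.add(a)
        R0.add(b)
    return u, L0, R0
```

Each pair containing exactly one mate of $u$ sends that mate to $L_0$ with probability $p(1-r) + \frac{1}{2}(pr + (1-p)(1-r))$, which is where the imbalance towards $u$'s cluster comes from.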

Analysis of the Initialization
Suppose that $u \in V_1$. If $v \in V_1$ and $w \notin V_1$, then
$P[v \text{ is a neighbor of } u] = p > r = P[w \text{ is a neighbor of } u]$.
Hence, using Chernoff bounds and the Hoeffding-Azuma inequality, w.h.p.
$I(V_1, L_0, R_0) \approx \left(1 - \frac{1}{m}\right)\frac{\Delta}{m}$ and $I(V_i, L_0, R_0) \approx -\frac{1}{m} \cdot \frac{\Delta}{m}$ for $i > 1$.
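
The step behind "hence" can be made explicit for a single pair $(v, w)$ with $v \in V_1 \setminus \{u\}$ and $w \notin V_1$: the pair is split deterministically only when exactly one of its vertices is adjacent to $u$, and otherwise by a symmetric coin flip, so the symmetric part cancels. This derivation uses only the independence of the edges $vu$ and $wu$.

```latex
q = pr + (1-p)(1-r) \qquad \text{(both or neither adjacent to } u\text{)}

\Pr[v \in L_0] = p(1-r) + \tfrac{1}{2}q, \qquad
\Pr[v \in R_0] = (1-p)\,r + \tfrac{1}{2}q

\Pr[v \in L_0] - \Pr[v \in R_0] = p(1-r) - (1-p)\,r = p - r = \Delta > 0
```

So every vertex of $V_1$ that is paired with a non-mate is pushed towards $L_0$ with bias $\Delta$; the concentration bounds on the slide turn these per-pair biases into the stated imbalances.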

Finding the Sets L, R: 1st Iteration
4. If $L_0, R_0$ are "good" (yielding a subcluster), stop.
5. Let $f_0(v) = d_{L_0}(v) - d_{R_0}(v)$.
6. $L_1, R_1 \leftarrow \emptyset$. Randomly select $l$ pairs of unchosen vertices.
7. For each pair $v, w$, if $f_0(v) \ne f_0(w)$, place the vertex with the larger $f_0$-value in $L_1$ and the other vertex in $R_1$.
[Figure: a pair with $f_0(v) = 1$ and $f_0(w) = -2$; $v$ is placed in $L_1$ and $w$ in $R_1$.]
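
Steps 5 to 7 can be sketched in Python as below. The slide does not say what happens when $f_0(v) = f_0(w)$; breaking such ties with a fair coin, as in the initialization, is an assumption, as are the name `refine` and the `chosen` argument (the vertices already used, such as $u$, $L_0$ and $R_0$).

```python
import random

def refine(adj, L0, R0, chosen, l, rng=None):
    """Steps 5-7: split l fresh pairs by f_0(v) = d_{L0}(v) - d_{R0}(v).

    adj maps vertices to neighbor sets; chosen holds vertices already used.
    Returns the new pair of sets (L1, R1).
    """
    if rng is None:
        rng = random.Random()
    L0, R0 = set(L0), set(R0)
    pool = [v for v in adj if v not in chosen]
    if 2 * l > len(pool):
        raise ValueError("not enough unchosen vertices for l pairs")
    picked = rng.sample(pool, 2 * l)
    f0 = {v: len(adj[v] & L0) - len(adj[v] & R0) for v in picked}
    L1, R1 = set(), set()
    for v, w in zip(picked[0::2], picked[1::2]):
        # Put the vertex with the larger f_0-value first; ties go to a coin flip.
        if f0[v] < f0[w] or (f0[v] == f0[w] and rng.random() < 0.5):
            v, w = w, v
        L1.add(v)
        R1.add(w)
    return L1, R1
```

Because $E[f_0(v)] = \Delta l\, I(V_i, L_0, R_0)$ for $v \in V_i$, vertices of the cluster with the largest imbalance tend to win their comparisons, so a small initial imbalance can grow from one iteration to the next.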
