Community Detection: A Simple Example
Joon Ho Park, Yumlembam Hemajit and Ki-Ho Lee
Project Motivation
- To understand the basics of community detection
- To apply ideas from traditional community detection methods to a known system
- To work out how proteins can be clustered, by analogy with this project
Quick Review of Community Detection
- Traditional Methods of Clustering
– Graph Partitioning
- Dividing vertices in groups of predefined size
- Minimizing cut size (# edges running between clusters)
– Hierarchical clustering
- Including small clusters in larger clusters according to similarity
- Agglomerative (bottom-up) or divisive (top-down) algorithms
– Partitional clustering
- Distance between vertices = dissimilarity between vertices
- E.g., k-means clustering: minimizing the total intra-cluster distance
– Spectral clustering
- Clustering by eigenvectors of matrices (e.g., similarity matrix)
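The k-means idea mentioned above can be illustrated with a minimal sketch of Lloyd's iteration. It assumes the vertices have already been given positions on a line (the points and starting centers below are hypothetical, not taken from the example network), and the function name `kmeans_1d` is illustrative only.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's iteration on 1-D points; `iterations` must be at least 1.

    Each round assigns every point to its nearest center, then moves each
    center to the mean of its assigned points.
    """
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # A center whose cluster is empty stays where it was.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data: two well-separated groups of positions.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

The result depends on the starting centers, so in practice the iteration is usually repeated from several initial guesses.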
[Figure: the example network. Place names shown: Mountain Top, Valley Top, Mountain Hub, Mountain Condominium & Ski, Valley Condominium & Ski]
- # of vertices: 16
- # of edges: 27
- Vertex labels: Z1, VT, V, VH, Z2, MT, H, AT1, AP1, MH, AT2, MC&S, AP3, Z3, VC&S, AP2
Graph Partitioning
- Simplest conditions
– Dividing into two groups of equal size
– Minimal # of edges between two groups
– Maximal # of edges inside the modules
– Kernighan-Lin algorithm
- Maximizing Q
- Q = (# of edges inside the modules) – (# of edges lying between them)
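As a minimal sketch of the two quantities defined above, the following Python computes the cut size and Q for a split into two groups. The edge list is a hypothetical toy graph (two triangles joined by one edge), not the 16-vertex network from the slides, and the name `cut_and_q` is illustrative.

```python
def cut_and_q(edges, group_a):
    """Return (cut size, Q) for a split into group_a and all other vertices.

    Q = (# of edges inside the groups) - (# of edges between them).
    """
    group_a = set(group_a)
    # An edge is cut when exactly one of its endpoints lies in group_a.
    cut = sum(1 for u, v in edges if (u in group_a) != (v in group_a))
    inside = len(edges) - cut
    return cut, inside - cut


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
TOY_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

good = cut_and_q(TOY_EDGES, {0, 1, 2})  # split along the joining edge
bad = cut_and_q(TOY_EDGES, {0, 1, 3})   # split that mixes the two triangles
print(good, bad)  # (1, 5) (5, -3)
```

Every edge is either inside a group or cut, so for a graph with m edges Q = m - 2 × (cut size). Minimizing the cut size and maximizing Q are therefore the same task on a fixed graph.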
[Figures: successive two-way partitions of the network, with the cut size and Q reported on each slide]
- Cut size: 11, Q = 4
- Cut size: 9, Q = 9
- Cut size: 8, Q = 10
- Cut size: 6, Q = 15
- Cut size: 5, Q = 17 (same values on the final four slides)
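The step-by-step reduction of the cut size can be sketched with a greedy pairwise swap. This is a simplified heuristic in the spirit of Kernighan-Lin, not the full algorithm (which also accepts temporarily worse swaps within a pass). The toy graph and the name `greedy_swap_bisection` are assumptions for illustration.

```python
from itertools import product


def cut_size(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))


def greedy_swap_bisection(edges, group_a, group_b):
    """Swap one vertex pair at a time while the cut size strictly decreases.

    Group sizes never change, because every move exchanges one vertex
    from each group.
    """
    a, b = set(group_a), set(group_b)
    best = cut_size(edges, a)
    improved = True
    while improved:
        improved = False
        for u, v in product(sorted(a), sorted(b)):
            trial = (a - {u}) | {v}
            c = cut_size(edges, trial)
            if c < best:
                a, b, best = trial, (b - {v}) | {u}, c
                improved = True
                break  # rescan from the new partition
    return a, b, best


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
TOY_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

result = greedy_swap_bisection(TOY_EDGES, {0, 1, 3}, {2, 4, 5})
print(result)
```

On this toy graph the cut size drops from 5 to 4 to 1. A purely greedy rule can stall in a local minimum on larger graphs, which is the weakness that the full Kernighan-Lin pass is designed to reduce.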
Hierarchical Clustering
- Simplest conditions
– Divisive algorithm
- Clusters are iteratively split by removing edges connecting vertices with low similarity
– Vertex similarity
- Defined by the # of edge- (or vertex-) independent paths between two vertices
- Independent paths do not share any edge (vertex).
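The number of edge-independent paths between two vertices equals the maximum flow between them when every edge has capacity 1 (Menger's theorem). The sketch below counts these paths with breadth-first augmenting paths. The toy graph and the name `edge_independent_paths` are assumptions for illustration, not the network or code used in the slides.

```python
from collections import defaultdict, deque


def edge_independent_paths(edges, source, target):
    """Count edge-disjoint paths between source and target (unit-capacity max flow)."""
    if source == target:
        raise ValueError("source and target must differ")
    cap = defaultdict(int)
    nbrs = defaultdict(set)
    for u, v in edges:
        # An undirected edge can carry one unit of flow in either direction.
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)

    count = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return count
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        count += 1


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
TOY_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

similarity = {e: edge_independent_paths(TOY_EDGES, *e) for e in TOY_EDGES}
weakest = min(similarity, key=similarity.get)
print(similarity, weakest)
```

Here the joining edge (2, 3) has similarity 1 while every triangle edge has similarity 2, so a divisive step that removes the lowest-similarity edge separates the two triangles.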
[Figure: the network with each of the 27 edges labeled by the similarity of its endpoints; the labels take the values 2, 3 and 4]
[Figures: four further views of the network in which only a subset of edges carries similarity labels (values 2 and 3)]
[Figure: the network with vertices renumbered 1 to 16]
- Cut size as a function of the index vector: $\text{cut size} = \frac{1}{4}\, s^{T} L s$
- $L$: the 16 × 16 Laplacian matrix of the network, with vertex degrees (e.g. 2, 3, 4) on the diagonal and $-1$ for each pair of connected vertices
- $s$: index vector with 16 entries, $+1$ for vertices in one group and $-1$ for vertices in the other (eight of each)
- Value shown with this index vector: 5
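The relation between the cut size, the Laplacian and the index vector can be checked numerically. The sketch below builds the Laplacian of a hypothetical toy graph (not the 16-vertex network) and evaluates the quadratic form in plain Python. The function names are illustrative.

```python
def laplacian(n, edges):
    """Laplacian of an undirected graph on vertices 0..n-1: degrees on the
    diagonal and -1 for each connected pair."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def cut_from_index_vector(L, s):
    """Evaluate (1/4) * s^T L s for an index vector s with entries +1 or -1."""
    n = len(s)
    total = sum(s[i] * L[i][j] * s[j] for i in range(n) for j in range(n))
    return total / 4


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
TOY_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
L = laplacian(6, TOY_EDGES)

along_bridge = cut_from_index_vector(L, [1, 1, 1, -1, -1, -1])
mixed = cut_from_index_vector(L, [1, 1, -1, 1, -1, -1])
print(along_bridge, mixed)
```

The identity holds because $s^{T} L s$ equals the sum of $(s_u - s_v)^2$ over all edges, and each cut edge contributes 4 while each uncut edge contributes 0.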
[Figures: the similarity-labeled network and one of the partially labeled views from the hierarchical clustering slides, shown again]
Not good enough!
Conclusions
- Graph partitioning provides a basic idea for community detection
- The concept of similarity carries over to hierarchical, partitional and spectral clustering
- We realized that community detection can be used to cluster protein databases if the similarity is replaced by a structural score (TM-score or RMSD, etc.)