SLIDE 1
Graph Clustering
What is clustering? Finding patterns in data, or grouping similar data-points together into clusters. Clustering algorithms for numeric data: Lloyd's k-means, EM.
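As a concrete illustration (not from the slides), a minimal NumPy sketch of Lloyd's k-means on numeric data; the blob data and function name are invented for the example:

```python
import numpy as np

def lloyd_kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        # (keep the old centroid if a cluster becomes empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Two well-separated blobs of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels, centers = lloyd_kmeans(X, 2)
```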
SLIDE 2
SLIDE 3
Examples of good clustering:
IMAGE SEGMENTATION
SLIDE 4
Graph Clustering:
Graphical representation of data as undirected graphs.
GRAPH PARTITIONING!!
SLIDE 5
Graph clustering:
Undirected graphs. Clustering of vertices on the basis of edge structure. How do we define a graph cluster?
In its loosest sense, a graph cluster is a connected component. In its strictest sense, it is a maximal clique of the graph.
Many edges within each cluster. Few edges between clusters.
SLIDE 6
Graph terminology:
SLIDE 7
Graph partitioning:
SLIDE 8
Graph Partitioning:
The optimization problem for normalized cuts is intractable (NP-hard). Hence we resort to spectral clustering and approximation algorithms.
SLIDE 9
More graph notation:
Adjacency matrix A, degree matrix D. The properties of the Laplacian of a graph are more interesting for the characterization of a graph than those of the adjacency matrix. The unnormalized graph Laplacian is defined as L = D - A.
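A minimal sketch (illustrative, not from the slides) of building the unnormalized Laplacian L = D - A for a toy graph:

```python
import numpy as np

# Toy undirected graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))                 # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1
D = np.diag(A.sum(axis=1))           # degree matrix
L = D - A                            # unnormalized graph Laplacian

# Row sums of L are zero: L annihilates the constant vector.
print(L @ np.ones(n))                # all zeros
```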
SLIDE 10
Properties of the Laplacian:
1. For every vector f: f'Lf = (1/2) Σ_ij w_ij (f_i - f_j)².
2. L is symmetric and positive semi-definite.
3. 0 is an eigenvalue of the Laplacian, with the constant vector as a corresponding eigenvector.
4. L has n non-negative eigenvalues 0 = λ1 ≤ λ2 ≤ ... ≤ λn.
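These four properties can be checked numerically; a small sketch (the path graph and test vector are arbitrary choices, not from the slides):

```python
import numpy as np

# Laplacian of a path graph on 4 vertices.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
w, V = np.linalg.eigh(L)             # eigenvalues in ascending order

f = np.array([3.0, -1.0, 2.0, 0.5])
# Property 1: f'Lf equals (1/2) * sum_ij w_ij (f_i - f_j)^2.
quad = 0.5 * sum(A[i, j] * (f[i] - f[j]) ** 2
                 for i in range(4) for j in range(4))
assert np.isclose(f @ L @ f, quad)
# Properties 2-4: symmetric, positive semi-definite (non-negative
# eigenvalues), smallest eigenvalue 0 with the constant eigenvector.
assert np.allclose(L, L.T) and np.all(w >= -1e-10)
assert np.isclose(w[0], 0) and np.allclose(L @ np.ones(4), 0)
```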
SLIDE 11
Number of Components:
SLIDE 12
Graph spectra:
The multiplicity of the eigenvalue 0 gives the number of connected components in the graph.
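A short illustration of this fact, counting near-zero Laplacian eigenvalues on a graph with two components (the tolerance value is an implementation choice):

```python
import numpy as np

def n_components_spectral(A, tol=1e-9):
    """Count connected components as the multiplicity of eigenvalue 0 of L."""
    L = np.diag(A.sum(axis=1)) - A
    w = np.linalg.eigvalsh(L)
    return int(np.sum(w < tol))

# Graph with two components: a triangle {0,1,2} and an edge {3,4}.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1
print(n_components_spectral(A))  # -> 2
```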
SLIDE 13
Graph generation models:
Uniform random model: all edges equiprobable; Poissonian degree distribution; no cluster structure.
Planted partition model: l partitions of the vertex set; intra-cluster edge probability p and inter-cluster edge probability q.
Caveman graphs, R-MAT generation, etc. Fuzzy graphs?
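A sketch of a planted-partition sampler (function name and parameter values are invented for illustration; p is the within-block edge probability, q the between-block probability):

```python
import numpy as np

def planted_partition(sizes, p, q, seed=0):
    """Sample a planted-partition graph: edge prob p within a block, q between."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = len(labels)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                A[i, j] = A[j, i] = 1
    return A, labels

# Two blocks of 30 vertices: dense inside, sparse between.
A, labels = planted_partition([30, 30], p=0.8, q=0.05)
```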
SLIDE 14
General clustering paradigms:
Hierarchical clustering vs. flat clustering. Hierarchical:
- Top down
- Bottom up
SLIDE 15
Overview:
Cut-based methods:
Become NP-hard with the introduction of size constraints. Approximation algorithms minimize graph conductance.
Maximum flow:
Using results by Goldberg and Tarjan. Reasonable for small graphs.
Graph-spectrum based:
Stable perturbation analysis. Good even when the graph is not exactly block diagonal. Typically, the second smallest eigenvalue is taken as the graph characteristic.
Spectrum of the graph transition matrix for a blind walk.
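A sketch of spectral bisection via the eigenvector of the second smallest Laplacian eigenvalue (the Fiedler vector); the two-clique test graph and sign-split rule are illustrative choices, not from the slides:

```python
import numpy as np

def fiedler_bisection(A):
    """Split vertices by the sign of the second-smallest Laplacian eigenvector."""
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = V[:, 1]               # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler >= 0).astype(int)

# Two dense 4-cliques joined by a single bridge edge.
n = 8
A = np.zeros((n, n))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
labels = fiedler_bisection(A)       # one label per clique
```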
SLIDE 16
Overview:
Could experiment with properties of different Laplacians. Typically outperforms k-means and other traditional clustering algorithms. Computationally infeasible for large graphs. Workarounds?
SLIDE 17
Voltage-potential view: ☺
Related to 'betweenness' of edges. Not stable to the placement of random sources and sinks.
SLIDE 18
Markov random walks:
Vertices in the same cluster are quickly reachable. A random walk started in one of the clusters is likely to remain there for a long time.
The Perron-Frobenius theorem ensures that the largest eigenvalue associated with a transition matrix is always 1 (relation with the graph Laplacian). The components of the second eigenvector of the transition matrix serve as a measure of absorption time.
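An illustrative check of both claims on a small two-cluster graph (P = D⁻¹A is the standard random-walk transition matrix; the example graph and step count are invented):

```python
import numpy as np

# Two triangles joined by a single bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

# Perron-Frobenius: the largest eigenvalue of P is 1.
w = np.linalg.eigvals(P)
print(np.isclose(np.max(w.real), 1.0))  # -> True

# A walk started inside a cluster tends to stay there for a while:
x = np.zeros(6)
x[0] = 1.0                              # start at vertex 0
for _ in range(3):
    x = x @ P                           # three steps of the walk
print(x[:3].sum() > x[3:].sum())        # -> True: most mass still in {0,1,2}
```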
SLIDE 19