Graph Clustering (PowerPoint PPT Presentation)



SLIDE 1

Graph Clustering

SLIDE 2

What is clustering?

Finding patterns in data, or grouping similar data-points together into clusters.

Clustering algorithms for numeric data:

Lloyd’s K-means, EM clustering, spectral clustering, etc.
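As a point of comparison for the graph methods that follow, Lloyd’s K-means on numeric data can be sketched as below (an illustrative minimal implementation, not taken from the slides; assumes numpy):

```python
# Minimal sketch of Lloyd's K-means: alternate nearest-center assignment
# and center updates until the iteration budget is spent.
import numpy as np

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Cluster the rows of X into k groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs: the algorithm should recover them.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
labels, centers = lloyd_kmeans(X, k=2)
```

EM clustering generalizes this by replacing the hard assignment with soft responsibilities; spectral clustering, the focus of these slides, instead works on a graph built from the data.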

SLIDE 3

Examples of good clustering:

IMAGE SEGMENTATION

SLIDE 4

Graph Clustering:

Graphical representation of data as undirected graphs.

GRAPH PARTITIONING!!

SLIDE 5

Graph clustering:

Undirected graphs. Clustering of vertices on the basis of edge structure. How do we define a graph cluster?

In its loosest sense, a graph cluster is a connected component. In its strictest sense, it’s a maximal clique of a graph.

Many vertices within each cluster. Few edges between clusters.

SLIDE 6

Graph terminology:

SLIDE 7

Graph partitioning:

SLIDE 8

Graph Partitioning:

The optimization problem for normalized cuts is intractable (an NP-hard problem).

Hence we resort to spectral clustering and approximation algorithms.
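For reference, the normalized cut objective mentioned here is usually written as follows (a standard definition, stated for completeness rather than taken from the slides; w_ij are edge weights and d_i vertex degrees):

```latex
\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(B)},
\qquad
\mathrm{cut}(A,B) = \sum_{i \in A,\; j \in B} w_{ij},
\qquad
\mathrm{vol}(A) = \sum_{i \in A} d_i .
```

Minimizing this over all partitions (A, B) is NP-hard, which is what motivates the spectral relaxation.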

SLIDE 9

More Graph notation:

Adjacency matrix A; degree matrix D. The properties of the Laplacian of a graph are found to be more interesting for the characterization of a graph than those of the adjacency matrix. The unnormalized graph Laplacian is defined as L = D - A.
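The definition above can be checked on a tiny example (a sketch; assumes numpy):

```python
# Build the unnormalized graph Laplacian L = D - A for a path graph 0-1-2.
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # symmetric adjacency, no self-loops
D = np.diag(A.sum(axis=1))               # degree matrix
L = D - A                                # unnormalized Laplacian

# Each row of L sums to zero, so the constant vector lies in its null space.
row_sums = L @ np.ones(3)
```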

SLIDE 10

Properties of the Laplacian:

1. For every vector f in R^n: f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2.

2. L is symmetric and positive semi-definite.

3. 0 is an eigenvalue of the Laplacian, with the constant vector as a corresponding eigenvector.

4. L has n non-negative eigenvalues.
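The four properties can be verified numerically on a small graph (an illustrative check; assumes numpy):

```python
# Check the listed Laplacian properties on a small undirected graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)     # L is symmetric, so eigh applies

symmetric = np.allclose(L, L.T)                    # property 2
psd = bool(np.all(eigvals >= -1e-10))              # properties 2 and 4
has_zero = bool(np.isclose(eigvals[0], 0.0))       # property 3
const_in_null = np.allclose(L @ np.ones(4), 0.0)   # constant eigenvector

# Property 1: f^T L f equals (1/2) * sum_ij w_ij (f_i - f_j)^2.
f = np.array([1.0, -2.0, 0.5, 3.0])
quad = f @ L @ f
expected = 0.5 * sum(A[i, j] * (f[i] - f[j]) ** 2
                     for i in range(4) for j in range(4))
```

Note that "positive semi-definite" is the correct statement: L always has eigenvalue 0, so it is never strictly positive definite.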

SLIDE 11

Number of Components:

SLIDE 12

Graph spectra:

The multiplicity of the eigenvalue 0 gives the number of connected components in the graph.
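This fact can be demonstrated on a graph with two obvious components (a sketch; assumes numpy):

```python
# Count connected components via the multiplicity of eigenvalue 0.
# Two disjoint triangles should give multiplicity 2.
import numpy as np

tri = np.ones((3, 3)) - np.eye(3)        # triangle adjacency
A = np.block([[tri, np.zeros((3, 3))],
              [np.zeros((3, 3)), tri]])  # two disconnected triangles
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)          # ascending eigenvalues
n_components = int(np.sum(np.isclose(eigvals, 0.0, atol=1e-9)))
```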

SLIDE 13

Graph Generation models:

Uniform random model

All edges equiprobable; Poissonian degree distribution; no cluster structure.

Planted partition model

l partitions of the vertex set; intra-cluster edge probability p and inter-cluster edge probability q.

Caveman graphs, RMAT generation etc. Fuzzy graphs??
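A planted partition generator can be sketched as follows (illustrative; assumes numpy, and assumes the usual convention that p governs within-cluster edges and q between-cluster edges):

```python
# Sketch of a planted partition model: l equal-size vertex groups,
# edge probability p inside a group and q between groups.
import numpy as np

def planted_partition(l, size, p, q, seed=0):
    rng = np.random.default_rng(seed)
    n = l * size
    labels = np.repeat(np.arange(l), size)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p, q)
    # Sample an undirected simple graph: draw the strict upper
    # triangle, then mirror it to keep A symmetric.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(int), labels

A, labels = planted_partition(l=2, size=50, p=0.9, q=0.05)
```

With p well above q the sampled graph has clear cluster structure, which makes this model a convenient testbed for the clustering algorithms discussed later.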

SLIDE 14

General clustering paradigms:

Hierarchical clustering vs. flat clustering. Hierarchical:

Top down

Bottom up

SLIDE 15

Overview:

Cut based methods:

Become NP-hard with the introduction of size constraints. Approximation algorithms minimizing graph conductance.

Maximum flow:

Using results by Goldberg and Tarjan. Reasonable for small graphs.

Graph Spectrum based:

Stable perturbation analysis. Good even when the graph is not exactly block diagonal. Typically, the second smallest eigenvalue is taken as the graph characteristic.

Spectrum of the graph transition matrix for a blind walk.
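The spectral approach referred to here can be sketched as a bipartition by the sign of the second eigenvector of the Laplacian, the Fiedler vector (an illustrative minimal version; assumes numpy):

```python
# Spectral bipartitioning: split vertices by the sign of the eigenvector
# belonging to the second smallest Laplacian eigenvalue (Fiedler vector).
import numpy as np

# Two 4-cliques joined by a single bridge edge between vertices 3 and 4.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalue order
fiedler = eigvecs[:, 1]                  # second smallest eigenvalue's vector
labels = (fiedler > 0).astype(int)       # sign pattern gives the two clusters
```

On this graph the sign pattern separates the two cliques, cutting only the bridge edge.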

SLIDE 16

Overview:

Could experiment with properties of different Laplacians. Typically outperforms k-means and other traditional clustering algorithms.

Computationally infeasible for large graphs. Workarounds?

SLIDE 17

Voltage-potential view: ☺

Related to ‘betweenness’ of edges.

Not stable to placement of random sources and sinks.

SLIDE 18

Markov Random walks:

Vertices in the same cluster are quickly reachable. A random walk started in one of the clusters is likely to remain there for a long time.

The Perron-Frobenius theorem ensures that the largest eigenvalue associated with a transition matrix is always 1 (relation with the graph Laplacian).

The component of the second eigenvector of the transition matrix serves as a measure of absorption time.
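The random-walk view above can be illustrated by forming the transition matrix P = D^-1 A and checking its Perron eigenvalue (a sketch; assumes numpy):

```python
# Random-walk transition matrix of an undirected graph: P = D^-1 A.
# Its largest eigenvalue is 1, as guaranteed by Perron-Frobenius for a
# row-stochastic matrix.
import numpy as np

# Two 4-cliques joined by a bridge edge: a clear two-cluster graph.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1

P = A / A.sum(axis=1, keepdims=True)     # each row sums to 1
eigvals = np.linalg.eigvals(P)           # P is not symmetric in general
largest = float(np.max(eigvals.real))
```

A walk started inside either clique leaves it only through the single bridge edge, so it tends to stay inside its cluster for a long time, which is exactly the intuition the slide describes.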

SLIDE 19

Thank you.