Graph-based Clustering Transform the data into a graph - - PowerPoint PPT Presentation



SLIDE 1

Graph-based Clustering

  • Transform the data into a graph representation
    – Vertices are the data points to be clustered
    – Edges are weighted based on similarity between data points
  ⇒ Graph partitioning: each connected component is a cluster

SLIDE 2

Clustering as Graph Partitioning

  • Two things are needed:
    1. An objective function to determine what would be the best way to “cut” the edges of a graph
    2. An algorithm to find the optimal partition (optimal according to the objective function)

SLIDE 3

Objective Function for Partitioning

  • Suppose we want to partition the set of vertices V into two sets: V1 and V2
    – One possible objective function is to minimize the graph cut:

  $\mathrm{Cut}(V_1, V_2) = \sum_{i \in V_1,\, j \in V_2} w_{ij}$

  where $w_{ij}$ is the weight of the edge between nodes i and j

  [Figure: two alternative partitions of the example graph v1..v6, one with Cut = 0.2 and one with Cut = 0.4]

SLIDE 4

Objective Function for Partitioning

  • Limitation of minimizing graph cut:

    – The optimal solution might be to split a single node from the rest of the graph, which is not a desirable solution

  [Figure: example graph v1..v6 in which the minimum cut (Cut = 0.1) splits off a single node]

SLIDE 5

Objective Function for Partitioning

  • We should not only minimize the graph cut, but also look for “balanced” clusters

  $\mathrm{Ratio\,cut}(V_1, V_2) = \frac{\mathrm{Cut}(V_1, V_2)}{|V_1|} + \frac{\mathrm{Cut}(V_1, V_2)}{|V_2|}$

  $\mathrm{Normalized\,cut}(V_1, V_2) = \frac{\mathrm{Cut}(V_1, V_2)}{d(V_1)} + \frac{\mathrm{Cut}(V_1, V_2)}{d(V_2)}$, where $d(V_i) = \sum_{j \in V_i} d_j$ and $d_j = \sum_k w_{jk}$

  V1 and V2 are the sets of nodes in partitions 1 and 2; |Vi| is the number of nodes in partition Vi

SLIDE 6

Example

  Splitting off a single node: Cut = 0.1; Ratio cut = 0.1/1 + 0.1/5 = 0.12; Normalized cut = 0.1/0.1 + 0.1/1.5 = 1.07
  Balanced split: Cut = 0.2; Ratio cut = 0.2/3 + 0.2/3 = 0.13; Normalized cut = 0.2/1 + 0.2/0.6 = 0.53

  [Figure: the weighted example graph v1..v6 (edge weights 0.1, 0.3, 0.2, 0.1, 0.1), showing the two candidate cuts]

SLIDE 7

Example

If the graph is unweighted (or all edges have the same weight):

  Splitting off a single node: Cut = 1; Ratio cut = 1/1 + 1/5 = 1.2; Normalized cut = 1/1 + 1/9 = 1.11
  Balanced split: Cut = 1; Ratio cut = 1/3 + 1/3 = 0.67; Normalized cut = 1/5 + 1/5 = 0.2

  [Figure: the example graph v1..v6 with all edge weights equal to 1, showing the two candidate cuts]
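These numbers can be reproduced mechanically. The sketch below assumes a reconstruction of the example graph from the stated cut values (a tree with edges v1–v3, v2–v3, v3–v4, v4–v5, v4–v6 and weights 0.1, 0.3, 0.2, 0.1, 0.1); the function names are my own:

```python
import numpy as np

# Assumed edge list for the weighted example graph (nodes 0..5 = v1..v6)
edges = [(0, 2, 0.1), (1, 2, 0.3), (2, 3, 0.2), (3, 4, 0.1), (3, 5, 0.1)]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

def cut(W, A, B):
    """Total weight of edges crossing from partition A to partition B."""
    return W[np.ix_(A, B)].sum()

def ratio_cut(W, A, B):
    c = cut(W, A, B)
    return c / len(A) + c / len(B)

def normalized_cut(W, A, B):
    d = W.sum(axis=1)          # weighted node degrees
    c = cut(W, A, B)
    return c / d[A].sum() + c / d[B].sum()

print(ratio_cut(W, [0], [1, 2, 3, 4, 5]))        # 0.12  (splitting off v1)
print(normalized_cut(W, [0], [1, 2, 3, 4, 5]))   # ~1.07
print(ratio_cut(W, [0, 1, 2], [3, 4, 5]))        # ~0.13 (balanced split)
print(normalized_cut(W, [0, 1, 2], [3, 4, 5]))   # ~0.53
```

With this edge list, all four values agree with the slide's worked numbers.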

SLIDE 8

Algorithm for Graph Partitioning

  • How to minimize the objective function?
    – We can use a heuristic (greedy) approach to do this
      · Example: METIS graph partitioning (http://www.cs.umn.edu/~metis)
    – An elegant way to optimize the function is by using ideas from spectral graph theory
      · This leads to a class of algorithms known as spectral clustering

SLIDE 9

Spectral Clustering

  • Spectral properties of a graph
    – Spectral properties: eigenvalues/eigenvectors of the adjacency matrix can be used to represent a graph
  • There exists a relationship between the spectral properties of a graph and the graph partitioning problem

SLIDE 10

Spectral Properties of a Graph

  • Start with a similarity/adjacency matrix, W, of a graph
  • Define a diagonal matrix D:

  $D_{ij} = \begin{cases} \sum_{k=1}^{n} w_{ik} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

    – If W is a binary 0/1 matrix, then $D_{ii}$ represents the degree of node i

SLIDE 11

Preliminaries

  [Figure: graph with two connected components, v1–v3–v2 and v5–v4–v6, all edge weights 1]

  Two clusters give two block-diagonal matrices:

  W =
    [ 0 0 1 0 0 0 ]
    [ 0 0 1 0 0 0 ]
    [ 1 1 0 0 0 0 ]
    [ 0 0 0 0 1 1 ]
    [ 0 0 0 1 0 0 ]
    [ 0 0 0 1 0 0 ]

  $D_{ij} = \begin{cases} \sum_{k=1}^{n} w_{ik} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

  D = diag(1, 1, 2, 2, 1, 1)

SLIDE 12

Graph Laplacian Matrix

  [Figure: the same two-component graph v1–v3–v2 and v5–v4–v6, all edge weights 1, giving two block matrices]

  Laplacian: L = D − W

  L =
    [  1  0 -1  0  0  0 ]
    [  0  1 -1  0  0  0 ]
    [ -1 -1  2  0  0  0 ]
    [  0  0  0  2 -1 -1 ]
    [  0  0  0 -1  1  0 ]
    [  0  0  0 -1  0  1 ]

  The Laplacian also has a block structure
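As a quick numeric sanity check (node ordering assumed as in the figure, with v3 and v4 the two component centers), the Laplacian of this two-component graph can be built and diagonalized in a few lines of NumPy:

```python
import numpy as np

# Two components, each a 3-node star: v1-v3, v2-v3 and v5-v4, v6-v4
edges = [(0, 2), (1, 2), (3, 4), (3, 5)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))        # diag(1, 1, 2, 2, 1, 1)
L = D - W

eigvals = np.linalg.eigvalsh(L)   # sorted ascending
print(eigvals)                    # two zero eigenvalues, one per component
```

The spectrum comes out as 0, 0, 1, 1, 3, 3: the multiplicity of the zero eigenvalue equals the number of connected components.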

SLIDE 13

Properties of Graph Laplacian

  • L = (D – W) is a symmetric matrix
  • L is a positive semi-definite matrix

– Consequence: all eigenvalues of L are ≥ 0

SLIDE 14

Spectral Clustering

Consider a data set with N data points

  1. Construct an N × N similarity matrix, W
  2. Compute the N × N Laplacian matrix, L = D − W
  3. Compute the k “smallest” eigenvectors of L
     a) Each eigenvector $v_i$ is an N × 1 column vector
     b) Create a matrix V containing eigenvectors v1, v2, ..., vk as columns (you may exclude the first eigenvector)
  4. Cluster the rows in V using k-means or other clustering algorithms into k clusters
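The four steps can be sketched directly in NumPy/SciPy. This is a minimal illustration rather than a production implementation (scikit-learn's SpectralClustering is the robust route); the helper name and its defaults are my own:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k, drop_first=False, seed=0):
    """Steps 2-4 above: Laplacian -> k smallest eigenvectors -> k-means."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, eigvecs = np.linalg.eigh(L)     # columns sorted by ascending eigenvalue
    V = eigvecs[:, 1:k + 1] if drop_first else eigvecs[:, :k]
    _, labels = kmeans2(V, k, minit='++', seed=seed)
    return labels

# Two-component example graph: rows of V collapse to one point per component,
# so k-means trivially recovers the components
W = np.zeros((6, 6))
for i, j in [(0, 2), (1, 2), (3, 4), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_clustering(W, 2))
```

For a disconnected graph the first k eigenvectors are constant on each component, so the row-clustering step is exact; on a connected graph it is an approximation of the ratio-cut optimum.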

SLIDE 15

Example

SLIDE 16

Summary

  • Spectral properties of a graph (i.e., eigenvalues and eigenvectors) contain information about the clustering structure
  • To find k clusters, apply k-means or other algorithms to the first k eigenvectors of the graph Laplacian matrix

SLIDE 17

Minimum Spanning Tree

  • Given the MST of the data points, remove the longest (most inconsistent) edge, then the next longest edge, and so on
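A minimal sketch of this procedure with SciPy (the point coordinates are made up for illustration):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

# Made-up 2D points: two compact groups separated by a large gap
pts = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3],
                [5, 5], [5.1, 4.9], [5.2, 5.2]])
D = squareform(pdist(pts))                 # pairwise Euclidean distances
mst = minimum_spanning_tree(D).toarray()   # MST edge weights (one direction)

# Remove the single longest ("inconsistent") MST edge -> 2 clusters
i, j = np.unravel_index(np.argmax(mst), mst.shape)
mst[i, j] = 0
n_comp, labels = connected_components(mst, directed=False)
print(n_comp, labels)
```

Removing the m longest edges in the same way yields m + 1 clusters.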

SLIDE 18

SLIDE 19

  • One useful statistic that can be estimated from the MST is the edge length distribution
  • For instance, in the case of 2 dense clusters immersed in a sparse set of points, the distribution is bimodal: short edges inside the dense clusters and long edges through the sparse background

SLIDE 20

Cluster Validity

  • Which clustering method is appropriate for a particular data set?
  • How does one determine whether the results of a clustering method truly characterize the data?
  • How do you know when you have a good set of clusters?
  • Is it unusual to find a cluster as compact and isolated as the observed clusters?
  • How do we guard against elaborate interpretation of randomly distributed data?

SLIDE 21


Cluster Validity

  • Clustering algorithms find clusters, even if there are no natural clusters in the data
    – Easy to design new methods; difficult to validate them

  [Figure: K-Means with K = 3 applied to 100 uniformly distributed 2D data points]

  • Cluster stability: perturb the data by bootstrapping. How do the clusters change over the ensemble?

SLIDE 22

Hierarchical Clustering

  • Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Two approaches:
  • Agglomerative (“bottom up”): each point starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy; more popular
  • Divisive (“top down”): all points start in one cluster, and splits are performed recursively as one moves down the hierarchy

  How to define similarity between two clusters, or between a point and a cluster?

SLIDE 23

Agglomerative Clustering Example

  • Cluster six elements {a}, {b}, {c}, {d}, {e} and {f} in 2D; use Euclidean distance as the dissimilarity measure
  • Build the hierarchy from the individual elements by progressively merging clusters
  • Which elements to merge in a cluster? Usually, merge the two closest elements, according to the chosen distance

SLIDE 24

Suppose we have merged the two closest elements b and c to obtain clusters {a}, {b, c}, {d}, {e} and {f}. To merge further, we need the distance between {a} and {b, c}. Two common ways to define the distance between two clusters:

  • The maximum distance between elements of each cluster (also called complete-linkage clustering): max { d(x, y) : x ∈ A, y ∈ B }
  • The minimum distance between elements of each cluster (single-linkage clustering): min { d(x, y) : x ∈ A, y ∈ B }

Stop clustering either when the clusters are too far apart to be merged or when there is a sufficiently small number of clusters.
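Both linkage rules are available in scipy.cluster.hierarchy; a small sketch on six made-up 2D points (the coordinates are illustrative, not the slide's):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six illustrative 2D points: three near the origin, three farther out
pts = np.array([[0, 0], [0.3, 0], [0.2, 0.3],
                [4, 4], [4.2, 4.1], [6, 6.2]])

Z_single = linkage(pts, method='single')      # min inter-cluster distance
Z_complete = linkage(pts, method='complete')  # max inter-cluster distance

# Cut each dendrogram into 2 clusters
labels_single = fcluster(Z_single, t=2, criterion='maxclust')
labels_complete = fcluster(Z_complete, t=2, criterion='maxclust')
print(labels_single, labels_complete)
```

On well-separated data like this, both linkages agree; single-link tends to chain through bridges of points, while complete-link favors compact, roughly equal-diameter clusters.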

Single-link v. Complete-link Hierarchical Clustering

SLIDE 25

2D PCA Projection of Iris Data

SLIDE 26

Minimum Spanning Tree Clustering of Iris Data (2D PCA Projection)

SLIDE 27

K-Means Clustering of Iris Data (Clustering Assignments shown on 2D PCA Projection)

SLIDE 28

Single-link Clustering of Iris Data

SLIDE 29

Complete-link Clustering of Iris Data

SLIDE 30

Angkor Wat

Hindu temple built by a Khmer king around 1150 AD; the Khmer kingdom declined in the 15th century; French explorers discovered the hidden ruins in the late 1800s

SLIDE 31

Apsaras of Angkor Wat

  • Angkor Wat contains a unique gallery of ~2,000 women depicted in detailed full-body portraits
  • What facial types are represented in these portraits?
SLIDE 32

Clustering of Apsara Faces

How to validate the clusters or groups?

  [Figure: 127 facial landmarks, shape alignment, and example single-link clusters 1–10]

SLIDE 33

Ground Truth

Khmer Dance and Cultural Center

SLIDE 34

Exploratory Data Analysis

Clustering with large weights assigned to chin and nose. Example devata faces from the clusters differ largely in chin and nose, thereby reflecting the weights chosen for similarity.

2D MDS Projection of the Similarity Matrix

SLIDE 35

Exploratory Data Analysis

3D MDS Projection of the Similarity matrix

SLIDE 36

SLIDE 37

Spectral Clustering & Graph Partitioning

  • We have shown that the spectral properties of a graph are related to its clusters
    – How are they related to minimizing graph cut?

SLIDE 38

Graph Partitioning

  • Recall the objective functions for graph partitioning:

  $\mathrm{Ratio\,cut}(V_1, V_2) = \frac{\mathrm{Cut}(V_1, V_2)}{|V_1|} + \frac{\mathrm{Cut}(V_1, V_2)}{|V_2|}$

  $\mathrm{Normalized\,cut}(V_1, V_2) = \frac{\mathrm{Cut}(V_1, V_2)}{d(V_1)} + \frac{\mathrm{Cut}(V_1, V_2)}{d(V_2)}$

  where $d(V_i) = \sum_{j \in V_i} d_j$, $d_j = \sum_k w_{jk}$, and $\mathrm{Cut}(V_1, V_2) = \sum_{i \in V_1,\, j \in V_2} w_{ij}$

SLIDE 39

Ratio Cut

  • Let $x_i$ indicate the membership of node $v_i$ in a cluster:

  $x_i = \begin{cases} \sqrt{|V_2|/|V_1|} & \text{if } v_i \in V_1 \\ -\sqrt{|V_1|/|V_2|} & \text{if } v_i \in V_2 \end{cases}$

  • Also:

  $x^T L x = \frac{1}{2}\sum_{i,j} w_{ij}\,(x_i - x_j)^2 = \frac{1}{2}\sum_{i \in V_1,\, j \in V_2} w_{ij}\,(x_i - x_j)^2 + \frac{1}{2}\sum_{i \in V_2,\, j \in V_1} w_{ij}\,(x_i - x_j)^2$

    – where L is the graph Laplacian matrix (terms with both i and j in the same partition vanish, since then $x_i = x_j$)

SLIDE 40

Ratio Cut

Substituting the definition of x:

  $x^T L x = \frac{1}{2}\sum_{i \in V_1,\, j \in V_2} w_{ij}\left(\sqrt{\tfrac{|V_2|}{|V_1|}} + \sqrt{\tfrac{|V_1|}{|V_2|}}\right)^2 + \frac{1}{2}\sum_{i \in V_2,\, j \in V_1} w_{ij}\left(\sqrt{\tfrac{|V_1|}{|V_2|}} + \sqrt{\tfrac{|V_2|}{|V_1|}}\right)^2$

  $= \mathrm{Cut}(V_1, V_2)\left(\frac{|V_2|}{|V_1|} + \frac{|V_1|}{|V_2|} + 2\right)$

  $= \mathrm{Cut}(V_1, V_2)\left(\frac{|V_1| + |V_2|}{|V_1|} + \frac{|V_1| + |V_2|}{|V_2|}\right)$

  $= (|V_1| + |V_2|)\left(\frac{\mathrm{Cut}(V_1, V_2)}{|V_1|} + \frac{\mathrm{Cut}(V_1, V_2)}{|V_2|}\right)$

  $= |V| \times \mathrm{RatioCut}(V_1, V_2)$

SLIDE 41

Ratio Cut

  • Therefore:

  $\min_{V_1, V_2} \mathrm{RatioCut}(V_1, V_2) = \frac{1}{|V|}\,\min_x\, x^T L x$

    – Thus, we have related ratio cut to the Laplacian matrix L
    – But there is one issue:
      · The trivial solution is x = 0, the vector of all zeros
      · We need to look for a non-trivial solution
    – Look for constraints that must be satisfied by x:

  $x^T \mathbf{1} = \sum_{i=1}^{n} x_i = |V_1|\sqrt{\tfrac{|V_2|}{|V_1|}} - |V_2|\sqrt{\tfrac{|V_1|}{|V_2|}} = \sqrt{|V_1||V_2|} - \sqrt{|V_1||V_2|} = 0$

  The solution x must be orthogonal to the vector of all 1s

SLIDE 42

Ratio Cut

Another constraint that must be satisfied by x:

  $x^T x = \sum_{i=1}^{n} x_i^2 = |V_1|\left(\sqrt{\tfrac{|V_2|}{|V_1|}}\right)^2 + |V_2|\left(-\sqrt{\tfrac{|V_1|}{|V_2|}}\right)^2 = |V_2| + |V_1| = n$

  (as before, $\min_{V_1, V_2} \mathrm{RatioCut}(V_1, V_2) = \frac{1}{|V|}\,\min_x\, x^T L x$)
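Both constraints are easy to verify numerically for any partition sizes; for example, with assumed sizes |V1| = 2 and |V2| = 4:

```python
import numpy as np

# Illustrative partition sizes |V1| = 2, |V2| = 4 (n = 6)
n1, n2 = 2, 4
x = np.concatenate([np.full(n1, np.sqrt(n2 / n1)),
                    np.full(n2, -np.sqrt(n1 / n2))])

sum_x = x.sum()     # x^T 1: orthogonal to the all-ones vector -> 0
norm_x = x @ x      # x^T x -> n1 + n2 = 6
print(sum_x, norm_x)
```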

SLIDE 43

Ratio Cut

  • This is a constrained optimization problem:

  $\min_x\, x^T L x \quad \text{subject to} \quad x^T x = n$

  where

  $x_i = \begin{cases} \sqrt{|V_2|/|V_1|} & \text{if } v_i \in V_1 \\ -\sqrt{|V_1|/|V_2|} & \text{if } v_i \in V_2 \end{cases}$

  • Instead, we solve a relaxation of the problem, letting x be real-valued. With a Lagrange multiplier:

  $F = x^T L x - \lambda\,(x^T x - n), \qquad \frac{\partial F}{\partial x} = 0 \;\Rightarrow\; Lx = \lambda x$

SLIDE 44

Putting It Altogether

  • We have shown that:
    – Minimizing the graph cut is equivalent to finding an x that minimizes $x^T L x$ such that $x^T x = n$
    – The solution for x is given by the eigenvectors of L
    – Thus, the spectral decomposition of the graph Laplacian is equivalent to the solution of the graph partitioning problem

SLIDE 45

Spectral Clustering with Ratio Cut

  $\min_x\, x^T L x = \min_x\, \lambda\, x^T x = \lambda_{\min}\, n$

  • But $\lambda_{\min} = 0$, with eigenvector $\mathbf{1} = (1\ 1\ \ldots\ 1)^T$
  • Since we want a solution where $x^T \mathbf{1} = 0$, we need $x \neq \mathbf{1}$
  • Instead of the smallest eigenvalue, we look for the eigenvector corresponding to the next smallest eigenvalue
  • In summary, finding the eigenvector that corresponds to the second smallest eigenvalue is a relaxation of the ratio cut graph partitioning problem (for k = 2)
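A small demonstration on an unweighted six-node graph (two three-node stars joined at their centers, an assumed example): the sign pattern of the eigenvector for the second-smallest eigenvalue (the Fiedler vector) recovers the natural two-way split:

```python
import numpy as np

# Edges v1-v3, v2-v3, v3-v4, v4-v5, v4-v6 (nodes 0..5)
edges = [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]          # eigenvector of the 2nd-smallest eigenvalue
side = fiedler > 0            # sign gives the relaxed ratio-cut partition
print(np.round(vals, 2))      # second-smallest eigenvalue ~ 0.44
print(side)
```

For this graph the second-smallest eigenvalue is (5 − √17)/2 ≈ 0.44, and the sign split separates {v1, v2, v3} from {v4, v5, v6}, cutting exactly the bridge edge.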

SLIDE 46

Properties of Graph Laplacian

  • L = (D − W) is a symmetric matrix
  • L is a positive semi-definite matrix
    – For all real-valued vectors x: $x^T L x \ge 0$
    – Consequence: all eigenvalues of L are ≥ 0

  $x^T L x = x^T (D - W)\, x = x^T D x - x^T W x = \sum_i d_i x_i^2 - \sum_{i,j} w_{ij}\, x_i x_j$

  $= \frac{1}{2}\left(\sum_i d_i x_i^2 - 2\sum_{i,j} w_{ij}\, x_i x_j + \sum_j d_j x_j^2\right) = \frac{1}{2}\sum_{i,j=1}^{N} w_{ij}\,(x_i - x_j)^2 \ge 0$

  (where $d_i = \sum_j w_{ij}$)

SLIDE 47

Properties of Laplacian Matrix

Suppose $e = [1\ 1\ \ldots\ 1]^T$. Then

  $We = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1d} \\ w_{21} & w_{22} & \cdots & w_{2d} \\ \vdots & & & \vdots \\ w_{d1} & w_{d2} & \cdots & w_{dd} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} \sum_j w_{1j} \\ \sum_j w_{2j} \\ \vdots \\ \sum_j w_{dj} \end{bmatrix} = \begin{bmatrix} D_{11} \\ D_{22} \\ \vdots \\ D_{dd} \end{bmatrix} = \begin{bmatrix} D_{11} & & \\ & \ddots & \\ & & D_{dd} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = De$

SLIDE 48

Properties of Laplacian Matrix

  Eigenvalue equation: $We = De \;\Rightarrow\; (D - W)\,e = 0 \;\Rightarrow\; Le = 0 \cdot e$

  • Since $e \neq [0 \ldots 0]^T$, $\lambda = 0$ is an eigenvalue of L
    – with corresponding eigenvector $e = [1\ 1\ \ldots\ 1]^T$
    – Furthermore, since L is positive semi-definite, 0 is the smallest eigenvalue of L

SLIDE 49

Properties of Laplacian Matrix

  • More generally, if L is block diagonal,

  $L = \begin{bmatrix} L_1 & & & \\ & L_2 & & \\ & & \ddots & \\ & & & L_k \end{bmatrix}$

  – then:
    · There are k eigenvalues of L which have the value 0
    · The corresponding eigenvectors are

  $\begin{bmatrix} e \\ 0 \\ \vdots \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ e \\ \vdots \\ 0 \end{bmatrix},\ \ldots,\ \begin{bmatrix} 0 \\ 0 \\ \vdots \\ e \end{bmatrix}$

  where e is $[1\ 1 \ldots 1]^T$ of the appropriate block length
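A quick numeric check, using three disconnected triangles (an illustrative choice) so that L is block diagonal with k = 3 blocks:

```python
import numpy as np
from scipy.linalg import block_diag

# Laplacian of a triangle (complete graph K3): eigenvalues 0, 3, 3
L_tri = 3 * np.eye(3) - np.ones((3, 3))

# Block-diagonal Laplacian of three disconnected triangles
L = block_diag(L_tri, L_tri, L_tri)
eigvals, eigvecs = np.linalg.eigh(L)
n_zero = int(np.sum(np.abs(eigvals) < 1e-9))
print(n_zero)   # one zero eigenvalue per block -> 3
```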

SLIDE 50

Properties of Laplacian Matrix

  [Figure: the two-component example graph v1–v3–v2 and v5–v4–v6, unit edge weights]

  L =
    [  1  0 -1  0  0  0 ]
    [  0  1 -1  0  0  0 ]
    [ -1 -1  2  0  0  0 ]
    [  0  0  0  2 -1 -1 ]
    [  0  0  0 -1  1  0 ]
    [  0  0  0 -1  0  1 ]

  Eigenvalues of L: $\Lambda = \mathrm{diag}(0,\ 0,\ 1,\ 1,\ 3,\ 3)$

  Eigenvectors of L (one possible orthonormal choice, as columns):

  V =
    [ .58   0   .71   0   .41   0  ]
    [ .58   0  -.71   0   .41   0  ]
    [ .58   0    0    0  -.82   0  ]
    [  0   .58   0   .71   0   .41 ]
    [  0   .58   0  -.71   0   .41 ]
    [  0   .58   0    0    0  -.82 ]

SLIDE 51

Properties of Laplacian Matrix

  Eigenvalues of L: $\Lambda = \mathrm{diag}(0,\ 0,\ 1,\ 1,\ 3,\ 3)$

  Eigenvectors of L (as columns):

  V =
    [ .58   0   .71   0   .41   0  ]
    [ .58   0  -.71   0   .41   0  ]
    [ .58   0    0    0  -.82   0  ]
    [  0   .58   0   .71   0   .41 ]
    [  0   .58   0  -.71   0   .41 ]
    [  0   .58   0    0    0  -.82 ]

  If we cluster the data using only the first 2 eigenvectors, we get the two desired clusters

SLIDE 52

Properties of Laplacian Matrix

  [Figure: the example graph with an additional unit-weight edge joining v3 and v4, connecting the two components]

  W =
    [ 0 0 1 0 0 0 ]
    [ 0 0 1 0 0 0 ]
    [ 1 1 0 1 0 0 ]
    [ 0 0 1 0 1 1 ]
    [ 0 0 0 1 0 0 ]
    [ 0 0 0 1 0 0 ]

  Laplacian, L = D − W:

  L =
    [  1  0 -1  0  0  0 ]
    [  0  1 -1  0  0  0 ]
    [ -1 -1  3 -1  0  0 ]
    [  0  0 -1  3 -1 -1 ]
    [  0  0  0 -1  1  0 ]
    [  0  0  0 -1  0  1 ]

  The clusters are no longer well separated

SLIDE 53

Properties of Laplacian Matrix

  L =
    [  1  0 -1  0  0  0 ]
    [  0  1 -1  0  0  0 ]
    [ -1 -1  3 -1  0  0 ]
    [  0  0 -1  3 -1 -1 ]
    [  0  0  0 -1  1  0 ]
    [  0  0  0 -1  0  1 ]

  Eigenvalues of L: $\Lambda = \mathrm{diag}(0,\ 0.44,\ 1,\ 1,\ 3,\ 4.56)$

  Eigenvectors of L (one possible orthonormal choice, as columns):

  V =
    [ .41  .46  .65  .28  .29  .18 ]
    [ .41  .46 -.65 -.28  .29  .18 ]
    [ .41  .26   0    0  -.58 -.66 ]
    [ .41 -.26   0    0  -.58  .66 ]
    [ .41 -.46  .28 -.65  .29 -.18 ]
    [ .41 -.46 -.28  .65  .29 -.18 ]

SLIDE 54

Properties of Laplacian Matrix

Eigenvalues of the graph Laplacian: 0, 0.5505, 0.5505, 3, 3, 3, 3, 5.4495, 5.4495

Eigenvectors of the Laplacian: the first column of V is the constant vector (entries ≈ .33)

The first three eigenvectors can be used to obtain 3 clusters