Hierarchical & Spectral Clustering, Lecture 13
David Sontag, New York University
Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein
Agglomerative Clustering
- Agglomerative clustering:
– First merge very similar instances
– Incrementally build larger clusters out of smaller clusters
- Algorithm:
– Maintain a set of clusters
– Initially, each instance is in its own cluster
– Repeat:
- Pick the two closest clusters
- Merge them into a new cluster
- Stop when there’s only one cluster left
- Produces not one clustering, but a family of clusterings represented by a dendrogram
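The algorithm above can be sketched directly in Python. This is a naive illustration under stated assumptions, not an efficient implementation: the function name `agglomerative`, its `dist` and `linkage` parameters, and recording the dendrogram as a list of merges are all hypothetical choices made here.

```python
from itertools import combinations

def agglomerative(points, dist, linkage=min):
    """Naive agglomerative clustering; returns the dendrogram as a merge list.

    points  : list of instances
    dist    : dist(p, q) -> distance between two instances
    linkage : combines the pairwise distances between two clusters into one
              number (min = closest pair, max = farthest pair)
    Each merge is recorded as (cluster_a, cluster_b, distance), where clusters
    are frozensets of indices into `points`.
    """
    # Initially, each instance is in its own cluster
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def cluster_dist(pair):
        a, b = pair
        return linkage(dist(points[i], points[j]) for i in a for j in b)

    # Repeat until only one cluster is left
    while len(clusters) > 1:
        # Pick the two closest clusters (checks every pair: slow but simple)
        a, b = min(combinations(clusters, 2), key=cluster_dist)
        # Merge them into a new cluster
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((a, b, cluster_dist((a, b))))
    return merges
```

For example, on the points `[0, 1, 5, 6, 20]` with `dist=lambda x, y: abs(x - y)`, single link (`linkage=min`) records merge distances 1, 1, 4, 14, while complete link (`linkage=max`) records 1, 1, 6, 20. Reading the merge list from the top down gives the family of clusterings.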
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?
- Many options:
– Closest pair (single-link clustering)
– Farthest pair (complete-link clustering)
– Average of all pairs
- Different choices create different clustering behaviors
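The three options can be written as functions that turn point-to-point distances into a cluster-to-cluster distance. A minimal sketch, assuming clusters are plain collections of points and `dist` is any point distance; the function names are hypothetical.

```python
from statistics import mean

def single_link(A, B, dist):
    """Closest pair: distance of the nearest cross-cluster pair."""
    return min(dist(a, b) for a in A for b in B)

def complete_link(A, B, dist):
    """Farthest pair: distance of the most distant cross-cluster pair."""
    return max(dist(a, b) for a in A for b in B)

def average_link(A, B, dist):
    """Average over all |A| * |B| cross-cluster pairs."""
    return mean(dist(a, b) for a in A for b in B)
```

With `A = [0, 1]`, `B = [4, 10]` and absolute difference as the distance, the pairwise distances are 4, 10, 3, 9, so single link gives 3, complete link gives 10 and average link gives 6.5. Because single link only needs one close pair, it tends to chain clusters together through intermediate points, while complete link tends to favor compact clusters.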
[Figure: dendrograms over the same eight points (1-8) built with farthest pair (complete-link clustering) and closest pair (single-link clustering); pictures from Thorsten Joachims]
Clustering Behavior
[Figure: average, farthest (complete-link) and nearest (single-link) dendrograms on mouse tumor data from Hastie et al.]
Agglomerative Clustering
When can this be expected to work?
[Figure: closest pair (single-link clustering) dendrogram over points 1-8]
- Strong separation property: all points are more similar to points in their own cluster than to any points in any other cluster
- If this property holds, the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
- Slightly weaker (stability) conditions are handled by average-link clustering (Balcan et al., 2008)
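The strong separation property can be checked directly when the true clustering is known. A minimal sketch, assuming a dense similarity matrix given as a list of lists (higher means more similar) and ground-truth labels; the function name `has_strong_separation` is hypothetical.

```python
def has_strong_separation(sim, labels):
    """True iff every point is more similar to every other point in its own
    cluster than to any point in any other cluster.

    sim    : n x n similarity matrix (list of lists), higher = more similar
    labels : labels[i] is the true cluster of point i
    """
    n = len(labels)
    for i in range(n):
        same = [sim[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        other = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        # The least similar own-cluster point must still beat every outsider
        if same and other and min(same) <= max(other):
            return False
    return True
```

For points `[0, 1, 2, 10, 11]` on a line with similarity `-|x - y|`, the labeling `[0, 0, 0, 1, 1]` has strong separation, but `[0, 0, 1, 1, 1]` does not: point 1 is as similar to point 2 (another cluster) as to point 0 (its own).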
Spectral Clustering
Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola
Spectral clustering
[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]
[Figure: "two circles" data set, 2 clusters, as found by K-means (left) and by spectral clustering (right)]
Spectral clustering
[Figures from Ng, Jordan, Weiss NIPS ‘01]
[Figure panels: nips, 8 clusters; lineandballs, 3 clusters; fourclouds, 2 clusters; squiggles, 4 clusters; twocircles, 2 clusters; threecircles-joined, 2 clusters; threecircles-joined, 3 clusters]
Spectral clustering
- Group points based on links in a graph
[Figure: graph with node groups A and B]
[Slide from James Hays]
How to Create the Graph?
- It is common to use a Gaussian kernel to compute similarity between objects
- One could create
– A fully connected graph
– A k-nearest neighbor graph (each node is only connected to its k nearest neighbors)
[Figure: graph with node groups A and B]
[Slide from Alan Fern]
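Both graph constructions can be sketched with NumPy. This assumes the kernel form $W(i,j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ (conventions differ; some drop the factor 2), a bandwidth `sigma` that you must choose, and an "either endpoint" rule for making the k-nearest-neighbor graph symmetric. The function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph: W(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X is an (n, dim) array of points; sigma is an assumed bandwidth.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_graph(W, k):
    """k-nearest-neighbor graph: keep edge (i, j) if j is among i's k most
    similar nodes or i is among j's (symmetric "or" rule)."""
    n = W.shape[0]
    keep = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(-W[i])        # most similar first
        order = order[order != i][:k]    # never select the node itself
        keep[i, order] = True
    keep = keep | keep.T
    return np.where(keep, W, 0.0)
```

The fully connected graph keeps every (possibly tiny) affinity, while the k-nearest-neighbor graph is sparse; a "mutual" rule (`keep & keep.T`) is another common choice.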
Can we use minimum cut for clustering?
[Shi & Malik ‘00]
- Minimum cut tends to favor cutting off small sets of isolated nodes
Graph partitioning
Graph Terminology
- Degree of nodes: $d_i = \sum_j W(i,j)$
- Volume of a set: $\mathrm{Vol}(A) = \sum_{i \in A} d_i$
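Degree and volume are one-liners on a dense affinity matrix. A minimal sketch, assuming `W` is a symmetric NumPy array and node sets are given as Python collections of indices; the function names are hypothetical.

```python
import numpy as np

def degrees(W):
    """Degree of each node: d_i = sum_j W(i, j)."""
    return W.sum(axis=1)

def volume(W, A):
    """Volume of a node set A: Vol(A) = sum of d_i over i in A."""
    return degrees(W)[sorted(A)].sum()
```

For the path graph `W = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]` the degrees are 1, 3, 2, so `volume(W, {0, 2})` is 3. The volume of all nodes is twice the total edge weight.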
Graph Cut
- Consider a partition of the graph into two parts, A and B
- cut(A,B): the sum of the weights of the edges that connect the two groups, $\mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} W(i,j)$
- An intuitive goal is to find the partition that minimizes the cut
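The cut, and a brute-force search for its minimum, can be sketched as follows. The exhaustive search is only feasible for tiny graphs; the function names and the seven-node toy graph are hypothetical, chosen to show a weakness of the raw cut: it tends to split off a single weakly attached node rather than separate two natural groups.

```python
import itertools
import numpy as np

def cut(W, A, B):
    """cut(A, B): total weight of edges with one endpoint in A, one in B."""
    return W[np.ix_(sorted(A), sorted(B))].sum()

def min_cut_bruteforce(W):
    """Try every bipartition (A, B) with |A| <= n/2; return (cut, A)."""
    n = W.shape[0]
    best = None
    for size in range(1, n // 2 + 1):
        for A in itertools.combinations(range(n), size):
            B = [i for i in range(n) if i not in A]
            c = cut(W, A, B)
            if best is None or c < best[0]:
                best = (c, set(A))
    return best

def toy_graph():
    """Triangles {0,1,2} and {3,4,5} joined by a 0.5 edge, plus node 6
    hanging off node 5 by a 0.2 edge."""
    W = np.zeros((7, 7))
    edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1),
             (4, 5, 1), (2, 3, 0.5), (5, 6, 0.2)]
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return W
```

On `toy_graph()`, the minimum cut isolates node 6 (cut 0.2), even though separating the two triangles costs only 0.5 and matches the visible structure.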
Normalized Cut
- Consider the connectivity between groups relative to the volume of each group
$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{Vol}(B)}$$
$$\mathrm{Ncut}(A,B) = \mathrm{cut}(A,B)\,\frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A)\,\mathrm{Vol}(B)}$$
- For a fixed total volume Vol(A) + Vol(B), this is minimized when Vol(A) and Vol(B) are equal. Thus Ncut encourages balanced cuts.
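The Ncut formula can be evaluated directly and minimized by brute force on a tiny graph. A sketch under the same assumptions as a dense NumPy `W`; the function names and the seven-node toy graph (two triangles joined by a 0.5 edge, plus node 6 attached to node 5 by a 0.2 edge) are hypothetical. Unlike the raw cut, Ncut penalizes splitting off the low-volume node 6.

```python
import itertools
import numpy as np

def ncut(W, A):
    """Ncut(A, B) = cut(A,B)/Vol(A) + cut(A,B)/Vol(B), B = complement of A."""
    A = sorted(A)
    B = [i for i in range(W.shape[0]) if i not in A]
    d = W.sum(axis=1)              # node degrees
    c = W[np.ix_(A, B)].sum()      # cut(A, B)
    return c / d[A].sum() + c / d[B].sum()

def min_ncut_bruteforce(W):
    """Exhaustive search over bipartitions with |A| <= n/2 (tiny graphs only)."""
    n = W.shape[0]
    candidates = (set(A) for k in range(1, n // 2 + 1)
                  for A in itertools.combinations(range(n), k))
    return min(candidates, key=lambda A: ncut(W, A))

def toy_graph():
    W = np.zeros((7, 7))
    edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1),
             (4, 5, 1), (2, 3, 0.5), (5, 6, 0.2)]
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return W
```

Here the total volume is 13.4. Isolating node 6 gives Ncut = 0.2/0.2 + 0.2/13.2 (about 1.02), while separating the triangles gives 0.5/6.5 + 0.5/6.9 (about 0.15), so the brute-force minimum is the balanced split `{0, 1, 2}`.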
Solving NCut
- How to minimize Ncut?
- Let $W$ be the similarity matrix, $W(i,j) = W_{ij}$; let $D$ be the diagonal matrix, $D(i,i) = \sum_j W(i,j)$; let $x$ be a vector in $\{1,-1\}^N$, with $x(i) = 1 \iff i \in A$.
- With some simplifications, we can show:
$$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W)\, y}{y^T D y}$$
Subject to: $y^T D \mathbf{1} = 0$
- Rayleigh quotient
- NP-hard! (y takes discrete values)
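- Relax the optimization problem into the continuous domain by solving the generalized eigenvalue system:
$$\min_y \; y^T (D - W)\, y \quad \text{subject to} \quad y^T D y = 1$$
- Which gives: $(D - W)\, y = \lambda D y$
- Note that $(D - W)\,\mathbf{1} = 0$, so the first eigenvector is $y_0 = \mathbf{1}$ with eigenvalue 0.
- The eigenvector with the second smallest eigenvalue is the real-valued solution to this problem! It automatically satisfies $y^T D \mathbf{1} = 0$, because eigenvectors of this system with distinct eigenvalues are $D$-orthogonal.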
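The relaxed problem can be solved numerically by substituting $z = D^{1/2} y$, which turns $(D - W)\,y = \lambda D y$ into the ordinary symmetric problem $D^{-1/2}(D - W)D^{-1/2} z = \lambda z$. A minimal sketch, assuming a symmetric NumPy `W` with strictly positive degrees; the function name `generalized_eigs` is hypothetical. It lets you check the claims above: the smallest eigenvalue is 0 with a constant eigenvector, and the next eigenvector satisfies $y^T D \mathbf{1} = 0$.

```python
import numpy as np

def generalized_eigs(W):
    """Solve (D - W) y = lambda D y for symmetric W with positive degrees.

    Returns eigenvalues in ascending order and the y vectors as columns.
    """
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)                 # diagonal of D^{-1/2}
    L = np.diag(d) - W                   # D - W
    L_sym = s[:, None] * L * s[None, :]  # D^{-1/2} (D - W) D^{-1/2}
    vals, Z = np.linalg.eigh(L_sym)      # ascending eigenvalues, orthonormal z
    Y = s[:, None] * Z                   # back-transform: y = D^{-1/2} z
    return vals, Y
```

Since the $z$ vectors returned by `eigh` are orthonormal, the corresponding $y$ vectors satisfy $y_i^T D y_j = 0$ for $i \neq j$, which is exactly the $D$-orthogonality used above.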
2-way Normalized Cuts
1. Compute the affinity matrix $W$ and the degree matrix $D$, where $D$ is diagonal and $D(i,i) = \sum_j W(i,j)$.
2. Solve $(D - W)\, y = \lambda D y$, where $D - W$ is called the Laplacian matrix.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
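The three steps translate almost line for line into NumPy. A minimal sketch, assuming a symmetric affinity matrix with positive degrees and a split at the constant value 0; the function name `two_way_ncut` is hypothetical.

```python
import numpy as np

def two_way_ncut(W):
    """2-way normalized cut sketch; returns a boolean mask for one side."""
    # 1. Degree matrix D (stored as its diagonal d)
    d = W.sum(axis=1)
    # 2. Solve (D - W) y = lambda D y via the symmetric form
    #    D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z
    s = 1.0 / np.sqrt(d)
    laplacian = np.diag(d) - W
    _, Z = np.linalg.eigh(s[:, None] * laplacian * s[None, :])
    # 3. Eigenvector with the second smallest eigenvalue (eigh sorts ascending)
    y = s * Z[:, 1]
    return y > 0  # split at the constant value 0
```

The sign of an eigenvector is arbitrary, so which side comes back as `True` can change between runs or libraries; only the partition itself is meaningful. On a graph made of two triangles joined by a weak edge, with one extra node hanging weakly off the second triangle, this sketch separates the first triangle from the rest.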
Creating a Bipartition Using the 2nd Eigenvector
- Sometimes there is no clear threshold for splitting on the second eigenvector, since it takes continuous values
- How to choose the splitting point?
a) Pick a constant value (0 or 0.5).
b) Pick the median value as the splitting point.
c) Look for the splitting point that has the minimum Ncut value:
1. Choose n possible splitting points.
2. Compute the Ncut value of each.
3. Pick the minimum.
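Option (c) can be sketched as a small search over thresholds. This assumes the n candidate splitting points are evenly spaced strictly between the smallest and largest entries of `y` (one reasonable choice among several), so both sides of every split are non-empty; the function name `best_split` and its default `n=10` are hypothetical.

```python
import numpy as np

def best_split(W, y, n=10):
    """Return (Ncut value, threshold) for the best of n candidate splits of y."""
    d = W.sum(axis=1)
    best = None
    # 1. n candidate thresholds strictly between min(y) and max(y)
    for t in np.linspace(y.min(), y.max(), n + 2)[1:-1]:
        a = np.flatnonzero(y > t)
        b = np.flatnonzero(y <= t)
        # 2. Ncut value of the split {y > t} vs {y <= t}
        c = W[np.ix_(a, b)].sum()
        score = c / d[a].sum() + c / d[b].sum()
        # 3. Keep the minimum
        if best is None or score < best[0]:
            best = (score, t)
    return best
```

Using the sorted distinct values of `y` as candidates instead would try every possible split along the eigenvector at the cost of more Ncut evaluations.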
Spectral clustering: example
[Figure: example data set, two scatter-plot panels]
Tommi Jaakkola, MIT CSAIL
Spectral clustering: example cont’d
[Plot: components of the eigenvector corresponding to the second largest eigenvalue, one value per data point]
K-way Partition?
- Recursive bipartitioning (Hagen et al., '91)
– Recursively apply the bipartitioning algorithm in a hierarchical divisive manner
– Disadvantages: inefficient, unstable
- Clustermultipleeigenvectors
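One common way to cluster multiple eigenvectors follows Ng, Jordan and Weiss (NIPS '01): embed each node as a row of the top k eigenvectors of $D^{-1/2} W D^{-1/2}$, normalize the rows to unit length, and run k-means on them. The sketch below assumes that recipe (it may differ in details from the method this lecture goes on to present), a symmetric `W` with zero diagonal and positive degrees, and a simple farthest-point seeding for k-means; the function name `spectral_kway` and its parameters are hypothetical.

```python
import numpy as np

def spectral_kway(W, k, iters=100, seed=0):
    """K-way spectral clustering sketch; returns n labels in range(k)."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    M = s[:, None] * W * s[None, :]                   # D^{-1/2} W D^{-1/2}
    _, V = np.linalg.eigh(M)                          # ascending eigenvalues
    U = V[:, -k:]                                     # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # unit-length rows

    # Plain k-means on the rows of U, seeded by farthest-point selection
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        gap = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Compared with recursive bipartitioning, this computes one eigendecomposition and splits into all k groups at once. The top k eigenvectors of $D^{-1/2} W D^{-1/2}$ correspond to the k smallest eigenvalues of the normalized Laplacian used in the 2-way derivation.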