Clustering. Lecture 8. David Sontag, New York University. Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein.
Clustering
Clustering:
– Unsupervised learning
– Requires data, but no labels
– Detect patterns, e.g. in
  - Grouping emails or search results
  - Customer shopping patterns
  - Regions of images
– Useful when you don't know what you're looking for
– But: can get gibberish
Clustering
- Basic idea: group together similar instances
- Example: 2D point patterns
- What could “similar” mean?
– One option: small Euclidean distance (squared): dist(x, y) = ‖x − y‖₂²
– Clustering results are crucially dependent on the measure of similarity (or distance) between the "points" to be clustered
Clustering algorithms
- Hierarchical algorithms
  – Bottom-up: agglomerative
  – Top-down: divisive
- Partition algorithms (flat)
  – K-means
  – Mixture of Gaussians
  – Spectral clustering
Clustering examples
Image segmentation. Goal: break up the image into meaningful or perceptually similar regions.
[Slide from James Hays]
Clustering examples
Clustering gene expression data
Eisen et al., PNAS 1998
Clustering examples
Cluster news articles
Clustering examples
Cluster people by space and time
[Image from Pilho Kim]
Clustering examples
Clustering languages
[Image from scienceinschool.org]
Clustering examples
Clustering languages
[Image from dhushara.com]
Clustering examples
Clustering species ("phylogeny")
[Lindblad-Toh et al., Nature 2005]
Clustering examples
Clustering search queries
K-Means
- An iterative clustering algorithm
  – Initialize: pick K random points as cluster centers
  – Alternate:
    1. Assign data points to the closest cluster center
    2. Change the cluster center to the average of its assigned points
  – Stop when no points' assignments change
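A minimal NumPy sketch of this loop (the function name `kmeans` and its defaults are mine, not from the lecture):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Lloyd's algorithm. X: (N, D) array. Returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K distinct random data points as the cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = None
    for _ in range(max_iters):
        # Step 1: assign every point to its closest center (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        new_assign = d2.argmin(axis=1)
        # Stop when no point's assignment changes.
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Step 2: move each center to the average of its assigned points.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers, assign
```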
K-means clustering: Example

- Pick K random points as cluster centers (means). Shown here for K = 2.
K-means clustering: Example

Iterative Step 1
- Assign data points to the closest cluster center
K-means clustering: Example

Iterative Step 2
- Change the cluster center to the average of the assigned points
K-means clustering: Example

- Repeat until convergence
Properties of K-means algorithm

- Guaranteed to converge in a finite number of iterations
- Running time per iteration:
  1. Assign data points to the closest cluster center: O(KN) time
  2. Change the cluster center to the average of its assigned points: O(N) time
!"#$%& '(%)#*+#%,#
!"#$%&'($
- . /012(340"05#!"
- 6. /01!#(340"05#
- – 7$8#3$*40$9:#*0)$40)#(; $%:,(5#*(2<#=$)#
!"#$%&'()#*+,
- !"#$-&'()#*+,
!"#$%& 4$8#&$%$94#*%$40%+(340"05$40(%$33*($,=2#$,=&4#30&+>$*$%4##:4( :#,*#$=#(?@#,40)#A 4=>&+>$*$%4##:4(,(%)#*+#
[Slide from Alan Fern] with respect to
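Writing out the derivative step the slide alludes to (a standard calculation; the name F for the objective is an assumption):

```latex
\frac{\partial F}{\partial \mu_j}
  = \frac{\partial}{\partial \mu_j} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2
  = -2 \sum_{i \in C_j} (x_i - \mu_j) = 0
  \;\Longrightarrow\;
  \mu_j = \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i
```

Setting the gradient to zero recovers exactly the "average of its assigned points" update of step 2.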
Example: K-Means for Segmentation
[Figure: K = 2 segmentation alongside the original image]

Goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance.
Example: K-Means for Segmentation
[Figure: K = 2, K = 3, and K = 10 segmentations alongside the original image]
Example: Vector quantization
FIGURE 14.9. Sir Ronald A. Fisher (1890–1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.
[Figure from Hastie et al. book]
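A hedged sketch of the 2 × 2 block VQ described in the caption, reusing the `kmeans` sketch from earlier (the helper name and array shapes are assumptions). With 200 code vectors, each 2 × 2 patch costs log₂ 200 ≈ 7.6 bits for 4 pixels, i.e. roughly the 1.9 bits/pixel quoted above.

```python
import numpy as np

def block_vq(img, block=2, n_codes=200):
    """Compress a grayscale image by K-means clustering of its patches."""
    H, W = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    # Cut the image into non-overlapping (block x block) patches.
    patches = (img[:H, :W]
               .reshape(H // block, block, W // block, block)
               .transpose(0, 2, 1, 3)
               .reshape(-1, block * block)
               .astype(float))
    # The codebook is the set of K-means centers; the code is the assignment.
    codebook, code = kmeans(patches, n_codes)
    # Reconstruct: replace every patch by its nearest code vector.
    recon = (codebook[code]
             .reshape(H // block, W // block, block, block)
             .transpose(0, 2, 1, 3)
             .reshape(H, W))
    return recon
```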
Initialization
- K-means algorithm is a heuristic
  – Requires initial means
  – It does matter what you pick!
  – What can go wrong?
  – Various schemes for preventing this kind of thing: variance-based split / merge, initialization heuristics
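One simple safeguard among the schemes named above, sketched assuming the `kmeans` function from earlier: multiple random restarts, keeping the run with the lowest within-cluster sum of squares.

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10):
    """Run K-means from several random initializations and keep the best."""
    best_sse, best = np.inf, None
    for seed in range(n_restarts):
        centers, assign = kmeans(X, K, seed=seed)
        # Within-cluster sum of squares: the K-means objective itself.
        sse = ((X - centers[assign]) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (centers, assign)
    return best
```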
K-Means Getting Stuck
A local optimum:
Would be better to have one cluster here … and two clusters here.
K-means not able to properly cluster
[Figure: data plotted with axes X and Y]
Changing the features (distance function) can help
[Figure: the same data plotted with axes θ and R]
Hierarchical Clustering
Agglomerative Clustering
- Agglomerative clustering:
  – First merge very similar instances
  – Incrementally build larger clusters out of smaller clusters
- Algorithm:
  – Maintain a set of clusters
  – Initially, each instance is in its own cluster
  – Repeat:
    - Pick the two closest clusters
    - Merge them into a new cluster
    - Stop when there's only one cluster left
- Produces not one clustering, but a family of clusterings represented by a dendrogram
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?
- Many options (see the sketch after this list):
  – Closest pair (single-link clustering)
  – Farthest pair (complete-link clustering)
  – Average of all pairs
- Different choices create different clustering behaviors
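A sketch combining the algorithm from the previous slide with these three linkage options (a naive, roughly O(N³) implementation; all names are illustrative):

```python
import numpy as np

LINKAGES = {
    "single":   min,                            # closest pair
    "complete": max,                            # farthest pair
    "average":  lambda ds: sum(ds) / len(ds),   # average of all pairs
}

def agglomerative(X, linkage="single"):
    """Naive agglomerative clustering; returns the merge history
    (a dendrogram recorded as a list of (cluster_a, cluster_b) merges)."""
    link = LINKAGES[linkage]
    # Initially, each instance is in its own cluster.
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        # Pick the two closest clusters under the chosen linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [np.linalg.norm(X[i] - X[j])
                      for i in clusters[a] for j in clusters[b]]
                d = link(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        # Merge them into a new cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```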
Agglomerative Clustering
- How should we define "closest" for clusters with multiple elements?

[Figure: dendrograms over points 1–8 under farthest pair (complete-link) and closest pair (single-link) clustering]
[Pictures from Thorsten Joachims]
Clustering Behavior

[Figure: average-link, farthest-pair, and nearest-pair clusterings of mouse tumor data from Hastie et al.]
Agglomerative Clustering

When can this be expected to work?

- Strong separation property: all points are more similar to points in their own cluster than to any points in any other cluster
- Then the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
- Slightly weaker (stability) conditions are handled by average-link clustering (Balcan et al., 2008)
Spectral Clustering
Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola
Spectral clustering
[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]
[Figure: the "two circles" dataset, 2 clusters: K-means vs. spectral clustering]
Spectral clustering
[Figures from Ng, Jordan, Weiss NIPS ‘01]
[Figure: spectral clustering results on toy datasets: nips (8 clusters), line and balls (3 clusters), four clouds (2 clusters), squiggles (4 clusters), two circles (2 clusters), three circles joined (2 and 3 clusters)]
Spectral clustering

Group points based on links in a graph
[Figure: a graph linking two groups of points, A and B] [Slide from James Hays]
!"#$"%&'($'$)'*&(+),
- -$./0"11"2$"3/'(*(3//.(24'&2'5$"
0"1+3$'/.1.5(&.$67'$#''2"78'0$/
- 92'0"35:0&'($'
– ;<35560"22'0$':=&(+) – 42'(&'/$2'.=)7"&=&(+)>'(0)2":'./"256 0"22'0$':$".$/42'(&'/$2'.=)7"&/?
A B [Slide from Alan Fern]
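A sketch of both graph constructions under a Gaussian kernel (the bandwidth `sigma` and the symmetrization rule are assumptions):

```python
import numpy as np

def affinity_graph(X, sigma=1.0, k=None):
    """Gaussian-kernel similarity W(i,j) = exp(-||xi - xj||^2 / (2 sigma^2)).
    Fully connected by default; if k is given, keep only each node's
    k nearest neighbors (an edge survives if either endpoint keeps it)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    if k is not None:
        # Zero out everything except each row's k largest similarities.
        keep = np.argsort(W, axis=1)[:, -k:]
        mask = np.zeros_like(W, dtype=bool)
        mask[np.arange(len(W))[:, None], keep] = True
        W = np.where(mask | mask.T, W, 0.0)  # symmetrize
    return W
```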
Spectral clustering for segmentation
[Slide from James Hays]
Can we use minimum cut for clustering?
[Shi & Malik ‘00]
Graph partitioning
Graph terminologies

- Degree of a node: d_i = Σ_j W(i, j)
- Volume of a set: Vol(A) = Σ_{i ∈ A} d_i
Graph Cut

- Consider a partition of the graph into two parts, A and B
- Cut(A, B): the sum of the weights of the edges that connect the two groups
- An intuitive goal is to find the partition that minimizes the cut
Normalized Cut

- Consider the connectivity between the groups relative to the volume of each group:

  Ncut(A, B) = cut(A, B)/Vol(A) + cut(A, B)/Vol(B) = cut(A, B) · (Vol(A) + Vol(B)) / (Vol(A) · Vol(B))

- Ncut is minimized when Vol(A) and Vol(B) are equal, thus encouraging a balanced cut
Solving NCut

- How to minimize Ncut?
- Let W be the similarity matrix, with entries W(i, j); let D be the diagonal degree matrix, D(i, i) = Σ_j W(i, j); let x be a vector in {1, −1}^N with x(i) = 1 iff i ∈ A
- With some simplifications, we can show:

  min_x Ncut(x) = min_y ( yᵀ (D − W) y ) / ( yᵀ D y ), subject to yᵀ D 1 = 0

- This is a Rayleigh quotient
- NP-Hard! (y takes discrete values)
Solving NCut

- Relax the optimization problem into the continuous domain by solving the generalized eigenvalue system

  min_y yᵀ (D − W) y, subject to yᵀ D y = 1

- Which gives: (D − W) y = λ D y
- Note that (D − W) 1 = 0, so the first eigenvector is y₁ = 1, with eigenvalue 0
- The second smallest eigenvector is the real-valued solution to this problem!
2-way Normalized Cuts

- 1. Compute the affinity matrix W and the degree matrix D; D is diagonal, with D(i, i) = Σ_j W(i, j)
- 2. Solve (D − W) y = λ D y, where (D − W) is called the Laplacian matrix
- 3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts
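A compact sketch of these three steps; SciPy's `scipy.linalg.eigh` handles the generalized symmetric eigenproblem, and thresholding at zero is option (a) from the next slide:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """2-way normalized cut on an affinity matrix W."""
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # Laplacian matrix
    # Solve (D - W) y = lambda D y; eigh returns eigenvalues in ascending order.
    vals, vecs = eigh(L, D)
    y = vecs[:, 1]               # eigenvector with the 2nd smallest eigenvalue
    return y, y > 0              # the boolean mask defines the parts A and B
```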
Creating a Bipartition Using the 2nd Eigenvector

- Sometimes there is not a clear threshold to split on, since the second eigenvector takes continuous values
- How to choose the splitting point?
  a) Pick a constant value (0, or 0.5)
  b) Pick the median value as the splitting point
  c) Look for the splitting point that has the minimum Ncut value:
     1. Choose n possible splitting points
     2. Compute the Ncut value for each
     3. Pick the minimum
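Option (c) made concrete, assuming the affinity matrix W and the second eigenvector y from the sketch above:

```python
import numpy as np

def ncut_value(W, mask):
    """Ncut(A, B) = cut(A, B)/Vol(A) + cut(A, B)/Vol(B)."""
    d = W.sum(axis=1)
    cut = W[mask][:, ~mask].sum()
    return cut / d[mask].sum() + cut / d[~mask].sum()

def best_split(W, y, n_points=32):
    """Scan candidate thresholds over y and keep the split of minimum Ncut."""
    best_val, best_mask = np.inf, None
    for t in np.linspace(y.min(), y.max(), n_points):
        mask = y > t
        if mask.all() or not mask.any():   # skip degenerate splits
            continue
        val = ncut_value(W, mask)
        if val < best_val:
            best_val, best_mask = val, mask
    return best_mask
```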
Spectral clustering: example
[Slide from Tommi Jaakkola, MIT CSAIL]
Spectral clustering: example cont’d
Components of the eigenvector corresponding to the second largest eigenvalue
[Slide from Tommi Jaakkola, MIT CSAIL]
K-way Partition?

- Recursive bipartitioning (Hagen et al., '91)
  – Recursively apply the bipartitioning algorithm in a hierarchical, divisive manner
  – Disadvantages: inefficient, unstable
- Cluster multiple eigenvectors (see the sketch below)
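A sketch of the multiple-eigenvector route: embed each node with the first k generalized eigenvectors, then run K-means on the embeddings, in the spirit of Ng, Jordan & Weiss (the row normalization and the reuse of the earlier `kmeans` sketch are my choices):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_kway(W, k):
    """K-way spectral clustering: embed nodes with the first k eigenvectors
    of (D - W) y = lambda D y, then cluster the embeddings with K-means."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)    # eigenvalues in ascending order
    U = vecs[:, :k]                # one k-dimensional embedding per node
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize (assumes no zero rows)
    _, labels = kmeans(U, k)       # the kmeans sketch from earlier in these notes
    return labels
```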