

SLIDE 1

Clustering: Lecture 8

David Sontag, New York University

Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein

SLIDE 2

Clustering

Clustering:

– Unsupervised learning
– Requires data, but no labels
– Detect patterns, e.g. in:

  • Group emails or search results
  • Customer shopping patterns
  • Regions of images

– Useful when you don’t know what you’re looking for
– But: can get gibberish

SLIDE 3

Clustering

  • Basic idea: group together similar instances
  • Example: 2D point patterns
SLIDE 4

Clustering

  • Basic idea: group together similar instances
  • Example: 2D point patterns
SLIDE 5

Clustering

  • Basic idea: group together similar instances
  • Example: 2D point patterns
  • What could “similar” mean?

– One option: small Euclidean distance (squared)
– Clustering results are crucially dependent on the measure of similarity (or distance) between the “points” to be clustered:

$$\mathrm{dist}(\vec{x}, \vec{y}) = \|\vec{x} - \vec{y}\|^2$$
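As a concrete reading of that formula, the all-pairs squared Euclidean distance is one line of NumPy. A minimal sketch; the function name is illustrative, not from the slides:

```python
import numpy as np

def sq_euclidean(X, Y):
    """dist(x, y) = ||x - y||^2 for every pair of rows in X (N, D) and Y (M, D)."""
    # Broadcasting gives an (N, M, D) difference array; sum out the feature axis.
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
```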

SLIDE 6

Clustering algorithms

  • Hierarchical algorithms
– Bottom up: agglomerative
– Top down: divisive

  • Partition algorithms (flat)
– K-means
– Mixture of Gaussians
– Spectral clustering

SLIDE 7

Clustering examples

Image segmentation. Goal: Break up the image into meaningful or perceptually similar regions

[Slide from James Hayes]

SLIDE 8

Clustering examples

Clustering gene expression data

Eisen et al., PNAS 1998

SLIDE 9

Clustering examples

Cluster news articles

SLIDE 10

Clustering examples

Cluster people by space and time

[Image from Pilho Kim]

SLIDE 11

Clustering examples

Clustering languages

[Image from scienceinschool.org]

SLIDE 12

Clustering examples

Clustering languages

[Image from dhushara.com]

SLIDE 13

Clustering examples

Clustering species (“phylogeny”)

[Lindblad-Toh et al., Nature 2005]

SLIDE 14

Clustering examples

Clustering search queries

SLIDE 15

K-Means

  • An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
  1. Assign data points to closest cluster center
  2. Change the cluster center to the average of its assigned points
– Stop when no points’ assignments change

SLIDE 16

K-Means

  • An iterative clustering algorithm
– Initialize: Pick K random points as cluster centers
– Alternate:
  1. Assign data points to closest cluster center
  2. Change the cluster center to the average of its assigned points
– Stop when no points’ assignments change

SLIDE 17

K-means clustering: Example

  • Pick K random points as cluster centers (means). Shown here for K = 2.

SLIDE 18

K-means clustering: Example

Iterative Step 1

  • Assign data points to closest cluster center

SLIDE 19

K-means clustering: Example

Iterative Step 2

  • Change the cluster center to the average of the assigned points

SLIDE 20

K-means clustering: Example

  • Repeat until convergence

SLIDE 21

Properties of K-means algorithm

  • Guaranteed to converge in a finite number of iterations
  • Running time per iteration:
  1. Assign data points to closest cluster center: O(KN) time
  2. Change the cluster center to the average of its assigned points: O(N) time

SLIDE 22

!"#$%& '(%)#*+#%,#

!"#$%&'($

  • . /012(340"05#!"
  • 6. /01!#(340"05#
  • – 7$8#3$*40$9:#*0)$40)#(; $%:&#44(5#*(2<#=$)#

!"#$%&'()#*+,

  • !"#$-&'()#*+,

!"#$%& 4$8#&$%$94#*%$40%+(340"05$40(%$33*($,=2#$,=&4#30&+>$*$%4##:4( :#,*#$&#4=#(?@#,40)#A 4=>&+>$*$%4##:4(,(%)#*+#

[Slide from Alan Fern] with respect to
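One way to see the alternating argument concretely is to track $F(\mu, C)$ across both half-steps and check that it never increases. A small sketch reusing the K-means updates from the earlier sketch; all names and data are illustrative:

```python
import numpy as np

def kmeans_objective(X, centers, assign):
    # F(mu, C) = sum_i || mu_{C(i)} - x_i ||^2
    return ((X - centers[assign]) ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
centers = X[rng.choice(len(X), size=3, replace=False)].copy()
assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)

for _ in range(10):
    prev = kmeans_objective(X, centers, assign)
    # Half-step 1: fix mu, optimize C (reassign points to closest centers).
    assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    # Half-step 2: fix C, optimize mu (the mean minimizes the sum of squares).
    for k in range(3):
        if (assign == k).any():
            centers[k] = X[assign == k].mean(axis=0)
    assert kmeans_objective(X, centers, assign) <= prev + 1e-9  # monotone decrease
```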

SLIDE 23

Example: K-Means for Segmentation

[Figure: original image and its K-means segmentation with K = 2]

The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance.

SLIDE 24

Example: K-Means for Segmentation

[Figure: original image and K-means segmentations with K = 2, 3, and 10]

SLIDE 25

Example: K-Means for Segmentation

[Figure: original image and K-means segmentations with K = 2, 3, and 10]

SLIDE 26

Example: Vector quantization

FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern day statistics, to whom we owe maximum-likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024×1024 grayscale image at 8 bits per pixel. The center image is the result of 2×2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.

[Figure from Hastie et al. book]
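Block VQ of this kind is K-means on image patches: the “code vectors” are the cluster centers, and each 2×2 block is replaced by its nearest center. A minimal sketch, assuming the kmeans function from the earlier sketch is in scope; the array layout and names are illustrative:

```python
import numpy as np

def block_vq(img, n_codes=200, block=2, rng=0):
    """Quantize a grayscale image by clustering its block x block patches."""
    H = img.shape[0] - img.shape[0] % block
    W = img.shape[1] - img.shape[1] % block
    # Cut into non-overlapping blocks, flattened to (block*block)-dim vectors.
    patches = (img[:H, :W].astype(float)
               .reshape(H // block, block, W // block, block)
               .transpose(0, 2, 1, 3)
               .reshape(-1, block * block))
    codebook, assign = kmeans(patches, n_codes, rng)
    # Reconstruct: replace each patch with its code vector.
    recon = codebook[assign].reshape(H // block, W // block, block, block)
    return recon.transpose(0, 2, 1, 3).reshape(H, W)
```

With 200 code vectors, each block index costs log2(200) ≈ 7.6 bits for 4 pixels, which is where the 1.9 bits/pixel rate in the caption comes from.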

SLIDE 27

Initialization

  • K-means algorithm is a heuristic
– Requires initial means
– It does matter what you pick!
– What can go wrong?
– Various schemes for preventing this kind of thing: variance-based split / merge, initialization heuristics
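One initialization heuristic in wide use (k-means++, not named on the slide) spreads the initial centers out by sampling each new center with probability proportional to its squared distance from the centers already chosen. A minimal sketch:

```python
import numpy as np

def kmeanspp_init(X, K, rng=0):
    """k-means++ seeding: returns K rows of X to use as initial centers."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        # Far-away points are proportionally more likely to become the next center.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```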

SLIDE 28

K-Means Getting Stuck

A local optimum:

Would be better to have one cluster here … and two clusters here

SLIDE 29

K-means not able to properly cluster


SLIDE 30

Changing the features (distance function) can help

[Figure: the data re-represented in polar coordinates (angle θ, radius R)]
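For ring-shaped data like the previous slide’s, re-representing each point by its radius and angle is exactly such a feature change. A small sketch, assuming the kmeans function from the earlier sketch; K-means on the radius feature then separates the rings:

```python
import numpy as np

def polar_features(X):
    """Map 2D points (x, y) to (R, theta), so concentric rings differ in one coordinate."""
    r = np.hypot(X[:, 0], X[:, 1])
    theta = np.arctan2(X[:, 1], X[:, 0])
    return np.column_stack([r, theta])

# Two concentric noisy rings: K-means on raw (x, y) fails, but the radius
# feature alone separates them cleanly.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=400)
radii = np.repeat([1.0, 3.0], 200) + rng.normal(scale=0.1, size=400)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
centers, labels = kmeans(polar_features(X)[:, [0]], 2)
```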

SLIDE 31

Hierarchical Clustering

SLIDE 32

Agglomerative Clustering

  • Agglomerative clustering:
– First merge very similar instances
– Incrementally build larger clusters out of smaller clusters

  • Algorithm:
– Maintain a set of clusters
– Initially, each instance is in its own cluster
– Repeat:
  • Pick the two closest clusters
  • Merge them into a new cluster
  • Stop when there’s only one cluster left

  • Produces not one clustering, but a family of clusterings represented by a dendrogram (a runnable sketch follows below)
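SciPy implements exactly this bottom-up loop. A minimal sketch of building the dendrogram and cutting it into a chosen number of clusters; the data and the cut level are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# Bottom-up merging; `method` is the cluster-distance rule discussed on the next slides:
# 'single' = closest pair, 'complete' = farthest pair, 'average' = average of all pairs.
Z = linkage(X, method='single')

# Z encodes the whole family of clusterings; cut the tree to get, say, 3 clusters.
labels = fcluster(Z, t=3, criterion='maxclust')
```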

SLIDE 33

Agglomerative Clustering

  • How should we define “closest” for clusters with multiple elements?

SLIDE 34

Agglomerative Clustering

  • How should we define “closest” for clusters with multiple elements?

  • Many options:
– Closest pair (single-link clustering)
– Farthest pair (complete-link clustering)
– Average of all pairs

  • Different choices create different clustering behaviors

SLIDE 35

Agglomerative Clustering

  • How should we define “closest” for clusters with multiple elements?

[Figures: dendrograms over eight points, comparing closest pair (single-link) and farthest pair (complete-link) clustering]

[Pictures from Thorsten Joachims]

SLIDE 36

Clustering Behavior

[Figures: average, farthest (complete), and nearest (single) linkage on mouse tumor data from Hastie et al.]

SLIDE 37

Agglomerative Clustering

When can this be expected to work?

[Figure: single-link (closest pair) dendrogram over eight points]

Strong separation property: all points are more similar to points in their own cluster than to any points in any other cluster. Then the true clustering corresponds to some pruning of the tree obtained by single-link clustering! Slightly weaker (stability) conditions are solved by average-link clustering (Balcan et al., 2008).

SLIDE 38

Spectral Clustering

Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola

SLIDE 39

Spectral clustering

[Shi & Malik ’00; Ng, Jordan, Weiss NIPS ’01]

[Figure: “two circles” data, 2 clusters: K-means vs. spectral clustering]

SLIDE 40

Spectral clustering

[Figures from Ng, Jordan, Weiss NIPS ’01]

[Figure: spectral clustering results on the nips (8 clusters), lineandballs (3 clusters), fourclouds (2 clusters), squiggles (4 clusters), twocircles (2 clusters), and threecircles-joined (2 and 3 clusters) datasets]

SLIDE 41

Spectral clustering

Group points based on links in a graph

[Figure: graph with two groups, A and B. Slide from James Hays]

SLIDE 42

!"#$"%&'($'$)'*&(+),

  • -$./0"11"2$"3/'(*(3//.(24'&2'5$"

0"1+3$'/.1.5(&.$67'$#''2"78'0$/

  • 92'0"35:0&'($'

– ;<35560"22'0$':=&(+) – 42'(&'/$2'.=)7"&=&(+)>'(0)2":'./"256 0"22'0$':$".$/42'(&'/$2'.=)7"&/?

A B [Slide from Alan Fern]
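A minimal sketch of the Gaussian-kernel similarity graph; the bandwidth sigma is a free parameter, and the fully connected variant is shown:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)): fully connected similarity graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W
```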

SLIDE 43

Spectral clustering for segmentation

[Slide from James Hays]

SLIDE 44

Can we use minimum cut for clustering?

[Shi & Malik ‘00]

SLIDE 45

Graph partitioning

SLIDE 46

Graph Terminologies

  • Degree of a node: $d_i = \sum_j w_{i,j}$
  • Volume of a set: $\mathrm{Vol}(A) = \sum_{i \in A} d_i$
SLIDE 47

Graph Cut

  • Consider a partition of the graph into two parts A and B
  • Cut(A, B): sum of the weights of the set of edges that connect the two groups, $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{i,j}$
  • An intuitive goal is to find the partition that minimizes the cut

SLIDE 48

Normalized Cut

  • Consider the connectivity between groups relative to the volume of each group:

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)} = \mathrm{cut}(A, B)\,\frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A)\,\mathrm{Vol}(B)}$$

  • Minimized when Vol(A) and Vol(B) are equal, thus encouraging a balanced cut
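Given the affinity matrix W, cut, volume, and Ncut are each a line or two of code. A sketch; the boolean-mask interface is illustrative:

```python
import numpy as np

def ncut_value(W, in_A):
    """Ncut for the bipartition given by the boolean mask in_A (True = node in A)."""
    in_B = ~in_A
    cut = W[np.ix_(in_A, in_B)].sum()   # total edge weight crossing the partition
    degrees = W.sum(axis=1)
    vol_A, vol_B = degrees[in_A].sum(), degrees[in_B].sum()
    return cut / vol_A + cut / vol_B
```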

SLIDE 49

Solving NCut

  • How to minimize Ncut?

Let $W$ be the similarity matrix, $W(i, j) = w_{i,j}$;
let $D$ be the diagonal matrix with $D(i, i) = \sum_j W(i, j)$;
let $y$ be a vector in $\{1, -1\}^N$ with $y(i) = 1 \Leftrightarrow x_i \in A$.

  • With some simplifications, we can show:

$$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W) y}{y^T D y}$$

subject to $y^T D \mathbf{1} = 0$ (y takes discrete values)

  • This is a Rayleigh quotient, but with the discreteness constraint on y it is NP-Hard!
SLIDE 50
Solving NCut

  • Relax the optimization problem into the continuous domain by solving the generalized eigenvalue system

$$\min_y \; y^T (D - W) y \quad \text{subject to} \quad y^T D y = 1$$

  • Which gives: $(D - W) y = \lambda D y$
  • Note that $(D - W)\mathbf{1} = 0$, so the first eigenvector is $y_1 = \mathbf{1}$, with eigenvalue $0$.
  • The second smallest eigenvector is the real-valued solution to this problem!!

SLIDE 51

2-way Normalized Cuts

1. Compute the affinity matrix W and the degree matrix D; D is diagonal with $D(i, i) = \sum_j W(i, j)$.
2. Solve the generalized eigenvalue problem $(D - W) y = \lambda D y$, where $L = D - W$ is called the Laplacian matrix.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
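The three steps fit in a few lines. A sketch assuming the gaussian_affinity helper from the earlier sketch; scipy.linalg.eigh solves the generalized symmetric eigenproblem and returns eigenvalues in ascending order:

```python
import numpy as np
from scipy.linalg import eigh

def two_way_ncut(X, sigma=1.0):
    """Bipartition points by the second-smallest generalized eigenvector."""
    W = gaussian_affinity(X, sigma)    # step 1: affinity matrix
    D = np.diag(W.sum(axis=1))         # step 1: degree matrix
    L = D - W                          # the Laplacian matrix
    # Step 2: solve (D - W) y = lambda D y.
    eigvals, eigvecs = eigh(L, D)
    # Step 3: the eigenvector with the second smallest eigenvalue, thresholded
    # at 0 (see the next slide for other ways to pick the splitting point).
    return eigvecs[:, 1] > 0
```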

SLIDE 52

Creating a Bipartition Using the 2nd Eigenvector

  • Sometimes there is not a clear threshold to split based on the second eigenvector, since it takes continuous values
  • How to choose the splitting point?
a) Pick a constant value (0, or 0.5).
b) Pick the median value as the splitting point.
c) Look for the splitting point that has the minimum Ncut value:
  1. Choose n possible splitting points.
  2. Compute the Ncut value for each.
  3. Pick the minimum.
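Option (c) is a one-dimensional search over candidate thresholds. A sketch reusing the ncut_value helper from the earlier sketch:

```python
import numpy as np

def best_split(W, y, n_candidates=32):
    """Choose the threshold on eigenvector y whose bipartition has minimum Ncut."""
    # 1. Choose n possible splitting points (here: quantiles of y).
    thresholds = np.quantile(y, np.linspace(0.05, 0.95, n_candidates))
    # Keep only proper bipartitions (both sides non-empty).
    masks = [y > t for t in thresholds]
    masks = [m for m in masks if m.any() and (~m).any()]
    # 2.-3. Compute the Ncut value of each and pick the minimum.
    return min(masks, key=lambda m: ncut_value(W, m))
```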

SLIDE 53

Spectral clustering: example

[Figure: example 2D data for spectral clustering]

Tommi Jaakkola, MIT CSAIL

SLIDE 54

Spectral clustering: example cont’d

[Figure: components of the eigenvector corresponding to the second largest eigenvalue]

Tommi Jaakkola, MIT CSAIL

SLIDE 55

K-way Partition?

  • Recursive bipartitioning (Hagen et al., ’91)
– Recursively apply the bipartitioning algorithm in a hierarchical, divisive manner.
– Disadvantages: inefficient, unstable

  • Cluster multiple eigenvectors
– Build a reduced space from multiple eigenvectors.
– Commonly used in recent papers
– A preferable approach; it is like doing dimension reduction, then k-means (see the sketch below)
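The “reduced space” approach in code, roughly in the style of Ng, Jordan, Weiss ’01: embed each point with the first k generalized eigenvectors, then run k-means in that space. The row normalization is a detail from that paper, not this slide; kmeans and gaussian_affinity are the earlier sketches:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_kway(X, k, sigma=1.0):
    """Cluster into k groups: Laplacian eigenvector embedding + k-means."""
    W = gaussian_affinity(X, sigma)
    D = np.diag(W.sum(axis=1))
    # Take the k eigenvectors with smallest eigenvalues of (D - W) y = lambda D y.
    _, eigvecs = eigh(D - W, D)
    U = eigvecs[:, :k]
    # Row-normalize so each embedded point lies on the unit sphere, then k-means.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans(U, k)
    return labels
```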
