Hierarchical & Spectral Clustering (Lecture 13)



slide-1
SLIDE 1

Hierarchical & Spectral Clustering
Lecture 13

David Sontag
New York University

Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein

slide-2
SLIDE 2

Agglomerative Clustering

  • Agglomerative clustering:

– First merge very similar instances
– Incrementally build larger clusters out of smaller clusters

  • Algorithm:

– Maintain a set of clusters
– Initially, each instance is in its own cluster
– Repeat:

  • Pick the two closest clusters
  • Merge them into a new cluster
  • Stop when there’s only one cluster left

  • Produces not one clustering, but a family of clusterings represented by a dendrogram
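The loop above can be sketched in a few lines of plain Python. This is a naive illustration, not an efficient implementation: the function names `agglomerate` and `single_link`, the 1-D example points, and the use of closest-pair distance as the default notion of "closest" are all assumptions made for this sketch. The returned merge history plays the role of the dendrogram, since stopping after any prefix of the merges yields one clustering from the family.

```python
from itertools import combinations


def single_link(a, b, dist):
    """Closest-pair distance between clusters a and b (assumed default linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, dist, linkage=single_link):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    # Initially, each instance is in its own cluster.
    clusters = [(p,) for p in points]
    merges = []
    # Repeat until only one cluster is left.
    while len(clusters) > 1:
        # Pick the two closest clusters (brute force over all pairs).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        d = linkage(clusters[i], clusters[j], dist)
        merges.append((clusters[i], clusters[j], d))
        # Merge them into a new cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical 1-D data: two tight pairs and one far-away point.
points = [1.0, 1.5, 5.0, 5.2, 9.0]
history = agglomerate(points, dist=lambda p, q: abs(p - q))
for a, b, d in history:
    print(f"merge {a} + {b} at distance {d:.1f}")
```

Cutting the history after the first two merges leaves three clusters here; cutting after three leaves two.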

slide-3
SLIDE 3

Agglomerative Clustering

  • How should we define closest for clusters with multiple elements?

slide-4
SLIDE 4

Agglomerative Clustering

  • How should we define closest for clusters with multiple elements?
  • Many options:

– Closest pair (single-link clustering)
– Farthest pair (complete-link clustering)
– Average of all pairs

  • Different choices create different clustering behaviors
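As a small illustration, the three options can each be written as a function of two clusters and a pointwise distance. The function names and the two example clusters are assumptions made for this sketch.

```python
def single_link(a, b, dist):
    """Closest pair: smallest distance between a point of a and a point of b."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b, dist):
    """Farthest pair: largest distance between a point of a and a point of b."""
    return max(dist(p, q) for p in a for q in b)


def average_link(a, b, dist):
    """Average of all pairs: mean distance over every cross-cluster pair."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


# Hypothetical 1-D clusters; the cross-cluster distances are 3, 7, 2 and 6.
dist = lambda p, q: abs(p - q)
a, b = (0.0, 1.0), (3.0, 7.0)
print(single_link(a, b, dist), complete_link(a, b, dist), average_link(a, b, dist))
```

The same pair of clusters is "2 apart", "7 apart", or "4.5 apart" depending on the choice, which is why the resulting merge orders can differ.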

slide-5
SLIDE 5

Agglomerative Clustering

  • How should we define closest for clusters with multiple elements?

[Figure: two dendrograms over points 1-8, one for farthest pair (complete-link clustering) and one for closest pair (single-link clustering). Pictures from Thorsten Joachims]

slide-6
SLIDE 6

Clustering Behavior

[Figure: three dendrograms of mouse tumor data, labeled Average, Farthest, and Nearest. Mouse tumor data from Hastie et al.]

slide-7
SLIDE 7

Agglomerative Clustering

When can this be expected to work?

[Figure: closest pair (single-link clustering) dendrogram over points 1-8]

  • Strong separation property: all points are more similar to points in their own cluster than to any points in any other cluster
  • Then, the true clustering corresponds to some pruning of the tree obtained by single-link clustering!
  • Slightly weaker (stability) conditions are handled by average-link clustering (Balcan et al., 2008)

slide-8
SLIDE 8

Spectral Clustering

Slides adapted from James Hays, Alan Fern, and Tommi Jaakkola

slide-9
SLIDE 9

Spectral clustering

[Shi & Malik ‘00; Ng, Jordan, Weiss NIPS ‘01]

[Figure: the "two circles" data set with 2 clusters, shown in two panels: K-means and spectral clustering]

slide-10
SLIDE 10

Spectral clustering

[Figures from Ng, Jordan, Weiss NIPS ‘01]

[Figure: spectral clustering results on seven panels: nips (8 clusters), lineandballs (3 clusters), fourclouds (2 clusters), squiggles (4 clusters), twocircles (2 clusters), threecircles-joined (2 clusters), threecircles-joined (3 clusters)]

slide-11
SLIDE 11

Spectral clustering

  • Group points based on links in a graph

[Figure: a graph whose nodes fall into two groups, A and B. Slide from James Hays]

slide-12
SLIDE 12

How to Create the Graph?

  • It is common to use a Gaussian kernel to compute similarity between objects
  • One could create

– A fully connected graph
– K-nearest neighbor graph (each node is only connected to its K-nearest neighbors)

[Figure: a graph whose nodes fall into two groups, A and B. Slide from Alan Fern]
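A Gaussian-kernel similarity graph and its k-nearest-neighbor sparsification could be built roughly as follows. This is a sketch under assumptions: the kernel is parametrized as exp(-||x_i - x_j||^2 / (2 sigma^2)), self-loops are removed, and the k-NN graph is symmetrized by keeping an edge if either endpoint selects it. The function names, `sigma`, and the six example points are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph: W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops (an assumed convention)
    return W


def knn_graph(W, k):
    """Keep, for each node, only the edges to its k most similar neighbors."""
    n = W.shape[0]
    nearest = np.argsort(-W, axis=1)[:, :k]  # k largest weights in each row
    keep = np.zeros((n, n), dtype=bool)
    keep[np.arange(n)[:, None], nearest] = True
    keep = keep | keep.T  # symmetrize: keep an edge if either endpoint picks it
    return np.where(keep, W, 0.0)


# Hypothetical data: two groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
W_full = gaussian_affinity(X, sigma=0.5)
W_knn = knn_graph(W_full, k=2)
print(np.count_nonzero(W_full), np.count_nonzero(W_knn))
```

With k = 2 every node keeps only edges inside its own group, so the k-NN graph has no edges between the two groups, while the fully connected graph retains tiny but nonzero cross-group weights.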

slide-13
SLIDE 13

Can we use minimum cut for clustering?

[Shi & Malik ‘00]

slide-14
SLIDE 14

Graph partitioning

slide-15
SLIDE 15

Graph Terminologies

  • Degree of nodes: $d_i = \sum_j W(i, j)$
  • Volume of a set: $\mathrm{Vol}(A) = \sum_{i \in A} d_i$
slide-16
SLIDE 16

Graph Cut

  • Consider a partition of the graph into two parts A and B
  • Cut(A, B): sum of the weights of the set of edges that connect the two groups
  • An intuitive goal is to find the partition that minimizes the cut
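A brute-force sketch of the cut objective on a tiny graph is given below. The graph is hypothetical: two triangles joined by an edge of weight 0.3, plus a seventh node attached by a single edge of weight 0.1. The helper names are assumptions, and exhaustive search is only feasible for very small graphs.

```python
import numpy as np
from itertools import product


def cut(W, in_A):
    """Sum of the weights of edges with one endpoint in A and the other in B."""
    in_A = np.asarray(in_A, dtype=bool)
    return W[np.ix_(in_A, ~in_A)].sum()


def min_cut_brute_force(W):
    """Try every bipartition; node 0 is fixed in A to skip mirrored splits."""
    n = W.shape[0]
    best_value, best_in_A = np.inf, None
    for rest in product([True, False], repeat=n - 1):
        in_A = np.array((True,) + rest)
        if in_A.all():
            continue  # B would be empty
        value = cut(W, in_A)
        if value < best_value:
            best_value, best_in_A = value, in_A
    return best_value, best_in_A


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, a bridge 2-3, and a pendant node 6.
W = np.zeros((7, 7))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.3), (5, 6, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

value, in_A = min_cut_brute_force(W)
print(value, np.flatnonzero(~in_A))
```

On this graph the minimum cut simply detaches the weakly connected node 6 rather than separating the two triangles, which illustrates why the raw cut value tends to favor small, isolated groups.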

slide-17
SLIDE 17

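Normalized Cut

  • Consider the connectivity between groups relative to the volume of each group

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)}$$

$$\mathrm{Ncut}(A, B) = \mathrm{cut}(A, B)\,\frac{\mathrm{Vol}(A) + \mathrm{Vol}(B)}{\mathrm{Vol}(A)\,\mathrm{Vol}(B)}$$

  • For a fixed cut value and total volume, the second factor is minimized when Vol(A) and Vol(B) are equal, which encourages a balanced cut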
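The effect of the normalization can be checked numerically. The sketch below uses a hypothetical 7-node graph (two triangles joined by a weak edge, plus a pendant node) and compares the Ncut value of detaching the pendant node against that of separating the triangles; the function name `ncut` and the graph are assumptions for illustration.

```python
import numpy as np
from itertools import product


def ncut(W, in_A):
    """Ncut(A, B) = cut(A, B) / Vol(A) + cut(A, B) / Vol(B)."""
    in_A = np.asarray(in_A, dtype=bool)
    degrees = W.sum(axis=1)               # degree of each node
    cut_AB = W[np.ix_(in_A, ~in_A)].sum()
    return cut_AB / degrees[in_A].sum() + cut_AB / degrees[~in_A].sum()


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, a bridge 2-3, and a pendant node 6.
W = np.zeros((7, 7))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.3), (5, 6, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

isolate = np.array([True] * 6 + [False])       # B = {6}: smallest raw cut (0.1)
balanced = np.array([True] * 3 + [False] * 4)  # A = {0,1,2}: raw cut 0.3
print(ncut(W, isolate), ncut(W, balanced))

# Exhaustive search over bipartitions (node 0 fixed in A), feasible only for tiny graphs.
candidates = [np.array((True,) + rest) for rest in product([True, False], repeat=6)]
candidates = [c for c in candidates if not c.all()]
best = min(candidates, key=lambda c: ncut(W, c))
print(np.flatnonzero(best))
```

Detaching node 6 has the smaller raw cut but an Ncut value above 1, because Vol({6}) is tiny; the split between the two triangles has a much smaller Ncut value.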

slide-18
SLIDE 18

Solving NCut

  • How to minimize Ncut?

Let $W$ be the similarity matrix, $W(i, j) = W_{i,j}$;
let $D$ be the diagonal matrix, $D(i, i) = \sum_j W(i, j)$;
let $x$ be a vector in $\{1, -1\}^N$, $x(i) = 1 \Leftrightarrow i \in A$.

  • With some simplifications, we can show:

$$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}$$

  • The right-hand side is a Rayleigh quotient
  • Subject to: $y^T D \mathbf{1} = 0$ (y takes discrete values)
  • NP-Hard!
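The identity between Ncut and the Rayleigh quotient can be checked numerically for a given partition. The sketch below assumes the discrete vector used by Shi & Malik (up to scaling): y_i = 1 for i in A and y_i = -b for i in B, with b = Vol(A) / Vol(B). The random similarity matrix, the partition, and the function names are hypothetical.

```python
import numpy as np


def ncut(W, in_A):
    """Ncut(A, B) = cut(A, B) / Vol(A) + cut(A, B) / Vol(B)."""
    d = W.sum(axis=1)
    cut_AB = W[np.ix_(in_A, ~in_A)].sum()
    return cut_AB / d[in_A].sum() + cut_AB / d[~in_A].sum()


def rayleigh_quotient(W, y):
    """y^T (D - W) y / (y^T D y), with D the diagonal degree matrix."""
    D = np.diag(W.sum(axis=1))
    return (y @ (D - W) @ y) / (y @ D @ y)


def partition_to_y(W, in_A):
    """Discrete y: 1 on A and -b on B, where b = Vol(A) / Vol(B) (assumed construction)."""
    d = W.sum(axis=1)
    b = d[in_A].sum() / d[~in_A].sum()
    return np.where(in_A, 1.0, -b)


# Hypothetical symmetric similarity matrix with zero diagonal.
rng = np.random.default_rng(0)
M = rng.random((6, 6))
W = (M + M.T) / 2.0
np.fill_diagonal(W, 0.0)

in_A = np.array([True, True, False, True, False, False])
y = partition_to_y(W, in_A)
d = W.sum(axis=1)
print(ncut(W, in_A), rayleigh_quotient(W, y), y @ d)
```

The first two printed values agree up to rounding and the third is numerically zero, matching the constraint $y^T D \mathbf{1} = 0$. The difficulty lies in searching over all such discrete vectors, not in evaluating the quotient.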
slide-19
SLIDE 19
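Solving NCut

  • Relax the optimization problem into the continuous domain by solving the generalized eigenvalue system: $\min_y y^T (D - W)\, y$ subject to $y^T D\, y = 1$
  • Which gives: $(D - W)\, y = \lambda D\, y$
  • Note that $(D - W)\mathbf{1} = 0$, so the first eigenvector is $y_0 = \mathbf{1}$ with eigenvalue 0.
  • The eigenvector with the second smallest eigenvalue is the real-valued solution to this problem!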
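One way to solve the generalized system with NumPy alone is to rewrite it as an ordinary symmetric eigenproblem: substituting y = D^{-1/2} z turns (D - W) y = lambda D y into (I - D^{-1/2} W D^{-1/2}) z = lambda z. The sketch below takes that route; the function name and the example graph are assumptions, and every node is assumed to have positive degree.

```python
import numpy as np


def generalized_eigs(W):
    """Solve (D - W) y = lambda D y via the symmetric form; eigenvalues ascend."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes no isolated nodes
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, Z = np.linalg.eigh(L_sym)
    Y = d_inv_sqrt[:, None] * Z    # map each column z back to y = D^{-1/2} z
    return eigvals, Y


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, a bridge 2-3, and a pendant node 6.
W = np.zeros((7, 7))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.3), (5, 6, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

eigvals, Y = generalized_eigs(W)
D = np.diag(W.sum(axis=1))
print(np.round(eigvals[:2], 4))
print(np.round(Y[:, 0], 4))  # constant vector (up to scale and sign)
print(np.round(Y[:, 1], 4))  # real-valued relaxation of the partition indicator
```

The smallest eigenvalue is numerically zero with a constant eigenvector, and the second eigenvalue is a lower bound on the best discrete Ncut value, since the relaxation only enlarges the feasible set.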

slide-20
SLIDE 20

2-way Normalized Cuts

  • 1. Compute the affinity matrix W and the degree matrix D; D is diagonal and $D(i, i) = \sum_j W(i, j)$
  • 2. Solve $(D - W)\, y = \lambda D\, y$, where $D - W$ is called the Laplacian matrix
  • 3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph into two parts.
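The three steps can be put together in a short end-to-end sketch. Several choices here are assumptions rather than part of the recipe above: the affinity is a Gaussian kernel with a hypothetical bandwidth `sigma`, the eigenproblem is solved through the equivalent symmetric form, and the split is made at zero.

```python
import numpy as np


def two_way_ncut(X, sigma=1.0):
    """Bipartition the rows of X; returns (boolean labels, second eigenvector)."""
    # Step 1: affinity matrix W (Gaussian kernel, no self-loops) and degrees.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)

    # Step 2: solve (D - W) y = lambda D y using y = D^{-1/2} z.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * Z[:, 1]      # eigenvector with the second smallest eigenvalue

    # Step 3: bipartition by thresholding at 0 (an assumed splitting rule).
    return y > 0, y


# Hypothetical data: two small squares of points, far apart.
X = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.3, 0.3],
              [3.0, 3.0], [3.3, 3.0], [3.0, 3.3], [3.3, 3.3]])
labels, y = two_way_ncut(X, sigma=1.0)
print(labels.astype(int))
```

The sign of an eigenvector is arbitrary, so which group ends up labeled `True` can differ between runs or platforms; only the grouping itself is meaningful.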

slide-21
SLIDE 21

Creating Bipartition Using 2nd Eigenvector

  • Sometimes there is not a clear threshold to split based on the second vector since it takes continuous values
  • How to choose the splitting point?

a) Pick a constant value (0, or 0.5).
b) Pick the median value as splitting point.
c) Look for the splitting point that has the minimum Ncut value:

1. Choose n possible splitting points.
2. Compute Ncut value.
3. Pick minimum.
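The three options can be compared on a small example. In this sketch the candidate splitting points for option (c) are taken to be every position in the sorted order of the eigenvector, which is one possible reading of "n possible splitting points"; the helper names and the graph are hypothetical.

```python
import numpy as np


def ncut(W, in_A):
    """Ncut(A, B) = cut(A, B) / Vol(A) + cut(A, B) / Vol(B)."""
    d = W.sum(axis=1)
    cut_AB = W[np.ix_(in_A, ~in_A)].sum()
    return cut_AB / d[in_A].sum() + cut_AB / d[~in_A].sum()


def second_eigenvector(W):
    """Second-smallest generalized eigenvector of (D - W) y = lambda D y."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(L_sym)
    return d_inv_sqrt * Z[:, 1]


def split_min_ncut(W, y):
    """Option (c): try every split of the sorted y values, keep the lowest Ncut."""
    order = np.argsort(y)
    best_score, best_in_A = np.inf, None
    for i in range(1, len(y)):  # both sides stay non-empty
        in_A = np.zeros(len(y), dtype=bool)
        in_A[order[i:]] = True  # A = the nodes with the largest y values
        score = ncut(W, in_A)
        if score < best_score:
            best_score, best_in_A = score, in_A
    return best_in_A, best_score


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, a bridge 2-3, and a pendant node 6.
W = np.zeros((7, 7))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.3), (5, 6, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

y = second_eigenvector(W)
score_zero = ncut(W, y > 0.0)             # option (a), constant 0
score_median = ncut(W, y > np.median(y))  # option (b)
in_A, score_best = split_min_ncut(W, y)   # option (c)
print(score_zero, score_median, score_best)
```

Option (c) costs one Ncut evaluation per candidate but can never do worse than the other candidates it includes; the median rule forces equal-sized groups, which may cut through a natural cluster when the groups have different sizes.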

slide-22
SLIDE 22

Spectral clustering: example

[Figure: two scatter plots of the example data set]

Tommi Jaakkola, MIT CSAIL

slide-23
SLIDE 23

Spectral clustering: example cont’d

[Figure: eigenvector components plotted against point index]

Components of the eigenvector corresponding to the second largest eigenvalue

slide-24
SLIDE 24

K-way Partition?

  • Recursive bi-partitioning (Hagen et al., '91)

– Recursively apply bi-partitioning algorithm in a hierarchical divisive manner.
– Disadvantages: inefficient, unstable

  • Cluster multiple eigenvectors

– Build a reduced space from multiple eigenvectors.
– Commonly used in recent papers
– A preferable approach: it's like doing dimension reduction then k-means
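The second approach might look like the sketch below: embed each point using the first k generalized eigenvectors, then run k-means in that reduced space. This is one variant among several in the literature, and the details are assumptions made for illustration: the Gaussian affinity, the bandwidth `sigma`, the use of unnormalized embedding rows, and the simple k-means with farthest-point initialization.

```python
import numpy as np


def spectral_embedding(X, k, sigma=1.0):
    """Rows are points in the space of the first k generalized eigenvectors."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    return d_inv_sqrt[:, None] * Z[:, :k]


def kmeans(E, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [E[0]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d2)])  # farthest point from current centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels


# Hypothetical data: three tight groups of three points each.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 0.0], [3.2, 0.0], [3.0, 0.2],
              [0.0, 3.0], [0.2, 3.0], [0.0, 3.2]])
E = spectral_embedding(X, k=3, sigma=1.0)
labels = kmeans(E, k=3)
print(labels)
```

In the embedded space, points from the same group sit almost on top of each other, so even a very simple k-means separates them; the cluster numbering itself is arbitrary.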

slide-25
SLIDE 25