Lecture 23:
− Spectral clustering
− Hierarchical clustering
− What is a good clustering?

Aykut Erdem
May 2016, Hacettepe University
Last time: K-Means
An iterative clustering algorithm:
− Initialize: pick K random points as cluster centers (means)
− Alternate:
  1. Assign each data instance to its closest mean
  2. Update each mean to the average of its assigned points
(slide by David Sontag)
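The two alternating steps can be sketched in NumPy (a minimal illustration rather than the lecture's code; the blob data, the seed, and K = 2 below are made up for the demo):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K random data points as the cluster centers (means).
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # 1. Assign each data instance to its closest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Update each mean to the average of its assigned points.
        new_means = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else means[k] for k in range(K)])
        if np.allclose(new_means, means):   # converged: means stopped moving
            break
        means = new_means
    return labels, means

# Two well-separated blobs; K-means with K = 2 should recover them.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, means = kmeans(X, K=2)
```

With well-separated blobs like these, every point in a blob ends up with the same label, and the two blobs get different labels.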
[Figure: K-means segmentation of three example images, each shown as the original image and the results for K = 2, K = 3, and K = 10]
(slides by David Sontag)
FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern statistics, to whom we owe […] many other fundamental concepts. The image on the left is a 1024×1024 grayscale image at 8 bits per pixel. The center image is the result of 2×2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses […]
[Figure from Hastie et al. book]
(slide by David Sontag)
[…], Fua, and S. Süsstrunk, "SLIC Superpixels Compared to State-of-the-Art Superpixel Methods," IEEE T-PAMI, 2012
λ: spatial regularization parameter
Word-count vector for a document:
aardvark 0, about 2, all 2, Africa 1, apple …, anxious …, …, gas 1, …, Zaire …
(slide by Carlos Guestrin)
Bag-of-visual-words pipeline:
1. Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
2. Normalize each patch
3. Compute SIFT descriptor [Lowe ’99]
(slide by Josef Sivic)
Vector quantization
(slide by Josef Sivic)
Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (e.g., the wheel of a motorbike).
(slide by Andrew Zisserman)
[Figure: bag-of-words histogram, with codewords on the x-axis and frequency on the y-axis]
(slide by Fei Fei Li)
Similarity Graph: G(V, E, W)
− V: vertices (data points)
− E: edge between two points if their similarity > 0
− W: edge weights (similarities)
Partition the graph so that edges within a group have large weights and edges across groups have small weights.
(slide by Aarti Singh)
[Figure: example graph on vertices a, b, c, d, e and its binary Adjacency Matrix, with rows and columns indexed by a, b, c, d, e; entry (i, j) is 1 if vertices i and j are connected]
(slide by Bill Freeman and Antonio Torralba)
[Figure: the same example graph with weighted edges and its Affinity Matrix W = (W_ij), with off-diagonal entries such as 0.1, 0.2, 0.3, 0.4, 0.6, 0.7 and 1 on the diagonal]
(slide by Bill Freeman and Antonio Torralba)
ε-neighborhood graph:
W_ij = 1 if ||x_i − x_j|| ≤ ε, and 0 otherwise
where ||x_i − x_j|| is the distance between data points; ε controls the size of the neighborhood.
(slide by Aarti Singh)
Gaussian kernel similarity:
W_ij = exp(−||x_i − x_j||² / σ²)
where ||x_i − x_j|| is the distance between data points; σ controls the size of the neighborhood.
(slide by Aarti Singh)
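Both graph constructions translate directly into code (a minimal sketch; the example points and the values of `eps` and `sigma` are made up):

```python
import numpy as np

def pairwise_dists(X):
    """Euclidean distance matrix with entries ||x_i - x_j||."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def epsilon_graph(X, eps):
    """W_ij = 1 if ||x_i - x_j|| <= eps, else 0 (no self-loops)."""
    W = (pairwise_dists(X) <= eps).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def gaussian_graph(X, sigma):
    """W_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    D = pairwise_dists(X)
    return np.exp(-D**2 / sigma**2)

# Two nearby points and one far-away point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W_eps = epsilon_graph(X, eps=1.0)    # connects only the two nearby points
W_rbf = gaussian_graph(X, sigma=1.0) # dense, but far pairs get tiny weight
```

In the ε-graph, far-apart points simply have no edge; in the Gaussian graph, every pair gets a weight, but it decays quickly with distance.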
Example: three points in feature space, with affinity
W_ij = exp(−||z_i − z_j||² / σ²)
With an appropriate σ, the eigenvectors of W group the points: the first 2 eigenvectors separate the points as desired.
[British Machine Vision Conference, pp. 103−108, 1990]
(slide by Bill Freeman and Antonio Torralba)
[Figure: data points, their affinity matrix, and the corresponding eigenvector; the eigenvector is roughly piecewise-constant, one level per group]
(slide by Bill Freeman and Antonio Torralba)
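The grouping effect of the eigenvectors can be checked numerically (a sketch with made-up 1-D points and a hand-picked σ; four points are used instead of three so each group has a pair):

```python
import numpy as np

# Four points in two tight groups on the line: {0.0, 0.1} and {4.0, 4.1}.
z = np.array([[0.0], [0.1], [4.0], [4.1]])
sigma = 1.0
D = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)  # pairwise distances
W = np.exp(-D**2 / sigma**2)                               # affinity matrix

# W is symmetric, so eigh returns real eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(W)
emb = vecs[:, -2:]   # rows = embedding of each point by the top-2 eigenvectors

# Points in the same group land on (nearly) the same embedding row.
d_same = np.linalg.norm(emb[0] - emb[1])   # same group: tiny
d_diff = np.linalg.norm(emb[0] - emb[2])   # different groups: large
```

The rows of the top-2 eigenvector matrix collapse onto one location per group, which is exactly what makes clustering in the embedded space easy.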
Partition the vertices into two sets A and B (slide by Steven Seitz).
Cut: the sum of the weights of the cut edges:
cut(A, B) = Σ_{u∈A, v∈B} W(u, v)
(slide by Bill Freeman and Antonio Torralba)
[Figure: minimum cut examples]
(slide by Svetlana Lazebnik)
[Figure: the ideal cut versus cuts with lesser weight than the ideal cut; the minimum cut can prefer cutting off small, isolated sets of vertices]
(slide from Khurram Hassan-Shafique, CAP5415 Computer Vision 2003; slide by Bill Freeman and Antonio Torralba)
Normalized cut. Write the graph as V, one cluster as A and the other as B:

Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)

where
cut(A, B) = Σ_{u∈A, v∈B} W(u, v), with A ∩ B = ∅
assoc(A, B) = Σ_{u∈A, v∈B} W(u, v), A and B not necessarily disjoint

assoc(A, V) is the sum of the weights of all edges with one end in A; cut(A, B) is the sum of the weights of edges with one end in A and one end in B.
(slide by Bill Freeman and Antonio Torralba)
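These definitions translate directly into code (a minimal sketch; the weight matrix below is a made-up graph of two dense pairs joined by one weak edge):

```python
import numpy as np

def cut(W, A, B):
    """cut(A, B) = sum of W(u, v) over u in A, v in B."""
    return W[np.ix_(A, B)].sum()

def assoc(W, A, V):
    """assoc(A, V) = sum of W(u, v) over u in A, v in V."""
    return W[np.ix_(A, V)].sum()

def ncut(W, A, B):
    """Ncut(A, B) = cut/assoc(A, V) + cut/assoc(B, V)."""
    V = list(range(W.shape[0]))
    c = cut(W, A, B)
    return c / assoc(W, A, V) + c / assoc(W, B, V)

# Two dense pairs {0, 1} and {2, 3} joined by one weak edge (weight 0.1).
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
good = ncut(W, [0, 1], [2, 3])   # cuts only the weak edge: 0.1/2.1 + 0.1/2.1
bad = ncut(W, [0, 2], [1, 3])    # cuts both strong edges: much larger
```

The natural partition, which severs only the weak edge, gets a far smaller Ncut than a partition that splits the dense pairs.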
Minimizing Ncut can be relaxed to a generalized eigenvalue problem of the form (D − W)y = λD y, where D is the diagonal degree matrix; the eigenvector with the second-smallest eigenvalue gives the real-valued solution.
(slide by Svetlana Lazebnik)
To obtain a discrete partition from the eigenvector, split its entries at a fixed value (threshold), or find the threshold that minimizes the Ncut cost.
(slide by Svetlana Lazebnik)
[Figure: k-means vs. spectral clustering on two datasets; on one both perform the same, on the other spectral clustering is superior]
(slide by Aarti Singh)
[Figure: k-means output vs. spectral clustering output on the same data]
(slide by Aarti Singh)
[Figure: the similarity matrix and the second eigenvector of the graph Laplacian]
(slide by Aarti Singh)
How to choose the number of clusters k? [Ng et al., 2001]
Choose the k that maximizes the eigengap (the difference between consecutive eigenvalues).
(slide by Aarti Singh)
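The eigengap heuristic is a few lines of code (a sketch; the use of the unnormalized Laplacian L = D − W is an assumption, since the slides do not fix which Laplacian to examine, and the toy graph is made up):

```python
import numpy as np

def choose_k(W):
    """Choose k by the largest eigengap of the graph Laplacian L = D - W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # unnormalized graph Laplacian (assumption)
    vals = np.linalg.eigvalsh(L)       # eigenvalues in ascending order
    gaps = np.diff(vals)               # differences between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1    # number of eigenvalues before the largest gap

# Toy graph: two disconnected edges, so L has two zero eigenvalues and k = 2.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
```

For a graph with c connected components the Laplacian has exactly c zero eigenvalues, so the gap after them is large and the heuristic returns k = c.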
The choice of similarity measure matters; for Gaussian kernels, the choice of σ.
[Figure: the same data under a good similarity measure vs. a poor similarity measure]
(slide by Aarti Singh)
Number of possible dendrograms with n leafs = (2n − 3)! / [2^(n−2) (n − 2)!]

n    Number of possible dendrograms
2    1
3    3
4    15
5    105
…    …
10   34,459,425

(slide by Andrew Moore)
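The table can be reproduced directly from the formula (a short sketch):

```python
from math import factorial

def num_dendrograms(n):
    """Number of binary dendrograms with n leaves: (2n-3)! / [2^(n-2) (n-2)!]."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))   # 1, 3, 15, 105, 34459425
```

The count grows super-exponentially, which is why exhaustive search over dendrograms is hopeless and greedy merging is used instead.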
Bottom-up (agglomerative) clustering: start with each item in its own cluster, find the best pair to merge into a new cluster, and repeat until all clusters are fused together.
(slide by Andrew Moore)
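The bottom-up loop can be sketched as follows (a minimal illustration; taking "best pair" to mean the pair with the closest members, i.e. single linkage, is one of several possible choices, and the data is made up):

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Merge the closest pair of clusters (single linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]   # start: each item in its own cluster
    while len(clusters) > n_clusters:
        best, best_d = None, np.inf
        # Find the best (closest) pair of clusters to merge.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best, best_d = (a, b), d
        a, b = best
        clusters[a] = clusters[a] + clusters[b]   # fuse the best pair
        del clusters[b]
    return clusters

X = np.array([[0.0], [0.2], [5.0], [5.1]])
result = agglomerative(X, 2)   # the two nearby pairs end up together
```

Stopping when a target number of clusters remains is one convention; running until a single cluster is left instead yields the full dendrogram.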
Evaluating clustering: compare the computed clusters against hidden patterns or latent classes in gold standard data (i.e., known classes and clusters).
Purity of a cluster with n_i members: the fraction of its members that belong to its majority class.
Cluster I: Purity = 1/6 (max(5, 1, 0)) = 5/6
Cluster II: Purity = 1/6 (max(1, 4, 1)) = 4/6
Cluster III: Purity = 1/5 (max(2, 0, 3)) = 3/5
(slide by Eric P. Xing)
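The per-cluster purity computation above can be reproduced (a short sketch; the class counts are those from the slide's example):

```python
def purity(counts):
    """Purity of one cluster: majority-class count over cluster size."""
    return max(counts) / sum(counts)

print(purity([5, 1, 0]))   # Cluster I:   5/6
print(purity([1, 4, 1]))   # Cluster II:  4/6
print(purity([2, 0, 3]))   # Cluster III: 3/5
```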
Let
TC = TC_1 ∪ TC_2 ∪ … ∪ TC_n
CC = CC_1 ∪ CC_2 ∪ … ∪ CC_m
be the target and computed clusterings, respectively: n clusters in TC, m clusters in CC.
(slide by Christophe Giraud-Carrier)
Rand index: a measure of clustering agreement. How similar are these two ways of partitioning the data?
(slide by Christophe Giraud-Carrier)
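Agreement between two partitions can be measured by counting pairs of items that the two clusterings treat the same way (a sketch of the plain Rand index; the label vectors are hypothetical):

```python
from itertools import combinations

def rand_index(tc, cc):
    """Fraction of item pairs on which the two clusterings agree:
    both put the pair in one cluster, or both keep it apart."""
    pairs = list(combinations(range(len(tc)), 2))
    agree = 0
    for i, j in pairs:
        same_tc = tc[i] == tc[j]   # together in the target clustering?
        same_cc = cc[i] == cc[j]   # together in the computed clustering?
        if same_tc == same_cc:
            agree += 1
    return agree / len(pairs)

tc = [0, 0, 1, 1]   # target clustering
cc = [0, 0, 1, 2]   # computed clustering splits the second target cluster
r = rand_index(tc, cc)   # agrees on 5 of 6 pairs
```

Identical partitions score 1; the score stays fairly high even for partial agreement, which motivates the chance-corrected (adjusted) version below.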
Adjusted Rand index: an extension of the Rand index that attempts to account for items that may have been clustered by chance.
(slide by Christophe Giraud-Carrier)
Measure of purity with respect to the target clustering:

Purity(CC, TC) = (1/N) Σ_{i=1}^{m} max_{TC_j ∈ TC} |CC_i ∩ TC_j|

where N is the total number of items.
(slide by Christophe Giraud-Carrier)