Lecture 21:
− Clustering
− K-Means
Aykut Erdem
December 2018, Hacettepe University
Last time: Boosting
Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote, weighted by their strength.
slide by Aarti Singh & Barnabas Poczos
slide by Jiri Matas and Jan Šochman
slide by Tamara Broderick
(… too quickly, expensive to label data, etc.) … when the cartoon looks so easy?
[Krivitsky Handcock 2008]
Example: clustering words [Blei 2003]
Datum: word (a binary vector indicating the documents it appears in)
Similarity: how many documents exist where two words co-occur
Example: clustering documents [Carpineto et al. 2009]
Datum: document (a vector over topics)
Dissimilarity: distance between topic distributions
Example: clustering pixels [Fei-Fei 2011]
Datum: pixel (its RGB values and its horizontal and vertical location)
Dissimilarity: difference in color + difference in location

… and then evaluate them by some criterion
slide by Andrew Moore

Represent each datum as a feature vector, e.g. x3 = (1.5, 6.2) with features Distance East and Distance North:

         Distance East   Distance North
    x1        1.2             5.9
    x2        4.3             2.1
    x3        1.5             6.2
    …
    xN        4.1             2.3

More generally the features are just Feature 1, Feature 2, …, so xn = (xn,1, xn,2), e.g. x3 = (x3,1, x3,2).
Dissimilarity between two data points, e.g. x3 and x17:

    dis(x3, x17) = (x3,1 − x17,1)^2 + (x3,2 − x17,2)^2

and in general, with D features (one term for each feature):

    dis(x3, x17) = Σ_{d=1}^{D} (x3,d − x17,d)^2
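As a minimal Python sketch of this squared-difference dissimilarity (x3 is taken from the table above; the value for x17 is illustrative, since the slides do not list it):

```python
import numpy as np

def dis(x_a, x_b):
    """Squared Euclidean dissimilarity: sum over features of squared differences."""
    x_a, x_b = np.asarray(x_a, dtype=float), np.asarray(x_b, dtype=float)
    return float(np.sum((x_a - x_b) ** 2))

x3 = (1.5, 6.2)
x17 = (4.1, 2.3)     # illustrative value; the slides do not give x17
print(dis(x3, x17))  # (1.5-4.1)^2 + (6.2-2.3)^2 = 6.76 + 15.21 ≈ 21.97
```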
Represent each cluster k by a center µk, itself a feature vector like the data, e.g. µ1 = (µ1,1, µ1,2):

    µ1, µ2, …, µK

and by the set of points assigned to it:

    Sk = set of points in cluster k,  giving S1, S2, …, SK
K-means objective: choose the centers and the cluster assignments to minimize the global dissimilarity

    Σ_{k=1}^{K}  Σ_{xn ∈ Sk}  Σ_{d=1}^{D}  (xn,d − µk,d)^2

(for each cluster, for each point in that cluster, for each feature)
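The triple sum in the objective can be computed directly; a minimal sketch (the example points, centers, and assignments are illustrative):

```python
import numpy as np

def kmeans_objective(X, centers, assignments):
    """Global dissimilarity: for each cluster, for each point in that cluster,
    for each feature, add the squared difference to the cluster center."""
    total = 0.0
    for k, mu_k in enumerate(centers):
        S_k = X[assignments == k]             # points currently in cluster k
        total += float(np.sum((S_k - mu_k) ** 2))
    return total

X = np.array([[1.2, 5.9], [4.3, 2.1], [1.5, 6.2], [4.1, 2.3]])
centers = np.array([[1.35, 6.05], [4.2, 2.2]])   # illustrative centers
assignments = np.array([0, 1, 0, 1])
print(kmeans_objective(X, centers, assignments))  # 0.09 + 0.04 ≈ 0.13
```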
K-means algorithm:

✦ Initialize the K centers: randomly draw n from 1,…,N without replacement and set µk ← xn for each center.
✦ Repeat until no change:
  ✤ Assign each data point to the cluster with the closest center:
      For n = 1,…,N
        Find k with smallest dis(xn, µk)
        Put xn ∈ Sk (and no other cluster)
  ✤ Assign each cluster center to be the mean of its cluster’s data points:
      For k = 1,…,K
        µk ← |Sk|^{−1} Σ_{xn ∈ Sk} xn
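The procedure above (Lloyd's algorithm) as a short, self-contained Python sketch; variable and function names are illustrative:

```python
import numpy as np

def kmeans(X, K, seed=0):
    """K-means as on the slides: draw K data points without replacement as
    initial centers, then repeat the two assignment steps until no change."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    assignments = np.full(len(X), -1)
    while True:
        # For n = 1,...,N: find k with smallest dis(x_n, mu_k); put x_n in S_k.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new = d2.argmin(axis=1)
        if np.array_equal(new, assignments):
            return centers, assignments       # no change: converged
        assignments = new
        # For k = 1,...,K: mu_k <- |S_k|^-1 * sum of the points in S_k.
        for k in range(K):
            S_k = X[assignments == k]
            if len(S_k):                      # guard against an empty cluster
                centers[k] = S_k.mean(axis=0)

X = [[1.2, 5.9], [4.3, 2.1], [1.5, 6.2], [4.1, 2.3]]
centers, labels = kmeans(X, K=2)
```

On these four points the two clusters recover the two visible groups; `dis` here is the squared Euclidean dissimilarity defined earlier, and minimizing it gives the same assignments as the Euclidean distance.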
Global dissimilarity only useful for comparing clusterings.
slide by David Sontag
– Take the partial derivative with respect to µi and set it to zero; we recover the mean update for µi.
K-means takes an alternating optimization approach; each step is guaranteed to decrease (or leave unchanged) the objective, so the algorithm is guaranteed to converge (to a local optimum).
slide by Alan Fern
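A sketch of the omitted algebra, using the objective J defined earlier:

```latex
\frac{\partial J}{\partial \mu_i}
  = \frac{\partial}{\partial \mu_i}
    \sum_{k=1}^{K} \sum_{x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2
  = -2 \sum_{x_n \in S_i} (x_n - \mu_i) = 0
\quad\Longrightarrow\quad
\mu_i = \frac{1}{|S_i|} \sum_{x_n \in S_i} x_n
```

Setting the derivative to zero yields exactly the mean-update step of the algorithm.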
slide by Kristen Grauman
[Images: Original image; K = 2; K = 3; K = 10]
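The effect shown in these images can be reproduced by running K-means on pixel RGB values and repainting each pixel with its cluster center. A minimal sketch on a synthetic two-color image (the image, K, iteration count, and initialization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 32x32 RGB "image": a reddish left half and a bluish right half.
img = np.zeros((32, 32, 3))
img[:, :16] = [0.9, 0.1, 0.1]
img[:, 16:] = [0.1, 0.1, 0.9]
img += 0.05 * rng.standard_normal(img.shape)

K = 2
pixels = img.reshape(-1, 3)          # datum: one RGB vector per pixel
centers = pixels[[0, -1]].copy()     # init from one pixel in each region
for _ in range(10):                  # a few Lloyd iterations suffice here
    d2 = ((pixels[:, None] - centers[None]) ** 2).sum(-1)
    labels = d2.argmin(1)
    centers = np.array([pixels[labels == k].mean(0) for k in range(K)])

quantized = centers[labels].reshape(img.shape)  # image repainted with K colors
```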
110
FIGURE 14.9. Sir Ronald A. Fisher (1890 − 1962) was one of the founders
many other fundamental concepts. The image on the left is a 1024×1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses
[Figure from Hastie et al. book]
slide by David Sontag
SLIC superpixels [… Fua, and S. Süsstrunk, "SLIC Superpixels Compared to State-of-the-art Superpixel Methods," IEEE T-PAMI, 2012]
λ: spatial regularization parameter
112
aardvark 0 about 2 all 2 Africa 1 apple anxious ... gas 1 ...
1 … Zaire
slide by Carlos Guestrin
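The word-count vector above can be built from a fixed vocabulary; a minimal sketch (the vocabulary and the example document are illustrative, and counts default to 0 for absent words):

```python
from collections import Counter

# Illustrative fixed vocabulary (the slide's full vocabulary is not shown).
vocabulary = ["aardvark", "about", "all", "Africa", "apple", "anxious", "gas", "oil", "Zaire"]

doc = "about all about gas all Africa"
counts = Counter(doc.split())

# Datum: one vector of word counts per document, indexed by the vocabulary.
x = [counts[w] for w in vocabulary]
print(x)  # [0, 2, 2, 1, 0, 0, 1, 0, 0]
```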
slide by Fei Fei Li
1. Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
2. Normalize patch
3. Compute SIFT descriptor [Lowe ’99]
4. Vector quantization
slide by Josef Sivic
Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (wheel of a motorbike).
slide by Andrew Zisserman
[Histogram: frequency of codewords]
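The final bag-of-visual-words histogram counts how often each codeword is the nearest one to a descriptor. A sketch with made-up 2-D descriptors (real SIFT descriptors are 128-dimensional, and the codebook would come from K-means on training descriptors):

```python
import numpy as np

# Hypothetical codebook of 3 visual words (cluster centers).
codewords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Hypothetical SIFT-like descriptors extracted from one image.
descriptors = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.05, 0.0], [1.1, -0.1]])

# Vector quantization: assign each descriptor to its nearest codeword.
d2 = ((descriptors[:, None] - codewords[None]) ** 2).sum(-1)
nearest = d2.argmin(1)

# Bag-of-visual-words representation: frequency of each codeword.
hist = np.bincount(nearest, minlength=len(codewords))
print(hist)  # [2 2 1]
```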