

SLIDE 1

Lecture 21:
− Clustering
− K-Means

Aykut Erdem

December 2018, Hacettepe University

SLIDE 2

Last time… Boosting

  • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote
  • On each iteration t:
    − weight each training example by how incorrectly it was classified
    − learn a hypothesis ht
    − learn a strength for this hypothesis, αt
  • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strength
  • Practically useful
  • Theoretically interesting

slide by Aarti Singh & Barnabas Poczos

SLIDE 3

Last time… The AdaBoost Algorithm

slide by Jiri Matas and Jan Šochman

SLIDE 4

Today

  • What is clustering?
  • K-means algorithm

SLIDE 5

What is clustering?

SLIDES 6–16

Clustering

  • Grouping data according to similarity

[Figure: e.g. an archaeological dig; artifact locations plotted as Distance East vs. Distance North, progressively grouped into clusters]

slide by Tamara Broderick

SLIDES 17–19

Clustering vs. Classification

  • Clustering: grouping data according to similarity
  • Classification: predicting new labels from old labels

[Figure: the artifact scatter plot, with points labeled Family A, Family B, Family C]

slide by Tamara Broderick

SLIDES 20–25

Why use clustering… …instead of classification, when the cartoon looks so easy?

  • Exploratory data analysis
  • Classes are unspecified (unknown, changing too quickly, expensive to label data, etc.)
  • High-dimensional data
  • Big data
  • Data not numerical

Example [Krivitsky & Handcock 2008]:
  • Datum: a binary vector specifying whether a person has each interest
  • Similarity: the number of common interests of two people

slide by Tamara Broderick

SLIDES 26–30

Why use clustering… Topic Analysis [Blei 2003]

  • Datum: a binary vector indicating, for each document, whether the word occurs in it
  • Similarity: how many documents exist where two words co-occur

slide by Tamara Broderick

SLIDES 31–34

Why use clustering… Document clustering [Carpineto et al. 2009]

  • Datum: a vector of topic occurrences for a document
  • Dissimilarity: distance between the topic distributions of two documents

slide by Tamara Broderick

SLIDES 35–37

Why use clustering… Image segmentation [Fei-Fei 2011]

  • Datum: pixel RGB values and pixel horizontal and vertical locations
  • Dissimilarity: difference in color + difference in location

slide by Tamara Broderick

SLIDE 38

Clustering algorithms

  • Partitioning algorithms: construct various partitions and then evaluate them by some criterion
    − K-means
    − Mixture of Gaussians
    − Spectral Clustering
  • Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion
    − Bottom-up: agglomerative
    − Top-down: divisive

slide by Eric Xing
SLIDE 39

Desirable Properties of a Clustering Algorithm

  • Scalability (in terms of both time and space)
  • Ability to deal with different data types
  • Minimal requirements for domain knowledge to determine input parameters
  • Ability to deal with noisy data
  • Interpretability and usability
  • Optional: incorporation of user-specified constraints

slide by Andrew Moore

SLIDE 40

K-Means Clustering

SLIDE 41

K-Means Clustering

Benefits
  • Fast
  • Conceptually straightforward
  • Popular

slide by Tamara Broderick

SLIDE 42

K-Means: Preliminaries

slide by Tamara Broderick

SLIDES 43–50

K-Means: Preliminaries

Datum: a vector of D continuous values

  • Concretely, e.g. x3 = (1.5, 6.2), with Feature 1 = Distance East and Feature 2 = Distance North
  • In general, xn = (xn,1, …, xn,D):

        Feature 1   Feature 2
  x1      x1,1        x1,2       e.g.  1.2   5.9
  x2      x2,1        x2,2       e.g.  4.3   2.1
  x3      x3,1        x3,2       e.g.  1.5   6.2
  …
  xN      xN,1        xN,2       e.g.  4.1   2.3

slide by Tamara Broderick

SLIDES 51–57

K-Means: Preliminaries

Dissimilarity: distance as the crow flies, made precise as the squared Euclidean distance. For two points x3 and x17:

  dis(x3, x17) = (x3,1 − x17,1)² + (x3,2 − x17,2)²

and in general, summing over each of the D features:

  dis(x3, x17) = Σ_{d=1}^{D} (x3,d − x17,d)²

slide by Tamara Broderick
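A minimal NumPy sketch of this dissimilarity (the function name sq_dist and the example points are ours, not from the slides):

```python
import numpy as np

def sq_dist(x, y):
    """Squared Euclidean distance: dis(x, y) = sum_d (x_d - y_d)^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))

# Two illustrative 2-D points in the style of the slides:
print(sq_dist((1.5, 6.2), (4.1, 2.3)))  # (1.5-4.1)^2 + (6.2-2.3)^2 = 21.97
```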

SLIDES 58–69

K-Means: Preliminaries

Cluster summary:
  • K = number of clusters
  • K cluster centers: µ1, µ2, …, µK, each a vector of D features, e.g. µ1 = (µ1,1, µ1,2)
  • Data assignments to clusters: S1, S2, …, SK, where Sk = the set of points in cluster k

[Figure: scatter plot of the data grouped into three clusters with centers µ1, µ2, µ3]

slide by Tamara Broderick

SLIDES 70–75

K-Means: Preliminaries

Dissimilarity (global):

  dis_global = Σ_{k=1}^{K} Σ_{n: xn ∈ Sk} Σ_{d=1}^{D} (xn,d − µk,d)²

  • first sum: for each cluster
  • second sum: for each data point in the kth cluster
  • third sum: for each feature

slide by Tamara Broderick
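As a sketch, the global dissimilarity is a one-liner in NumPy once the centers and assignments are stored as arrays (the names and array layout here are our own assumptions):

```python
import numpy as np

def global_dissimilarity(X, centers, assign):
    """dis_global: sum over clusters k, points x_n in S_k, and features d
    of (x_{n,d} - mu_{k,d})^2.

    X:       (N, D) data matrix, one row per point
    centers: (K, D) matrix of cluster centers mu_1, ..., mu_K
    assign:  length-N integer array; assign[n] = k means x_n is in S_k
    """
    # centers[assign] is the (N, D) matrix pairing each point with its center
    return float(np.sum((np.asarray(X, dtype=float) - centers[assign]) ** 2))
```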

SLIDES 76–77

K-Means Algorithm

  • Initialize K cluster centers
  • Repeat until convergence:
    ✦ Assign each data point to the cluster with the closest center
    ✦ Assign each cluster center to be the mean of its cluster’s data points

slide by Tamara Broderick

SLIDES 78–98

K-Means Algorithm (in full)

  • For k = 1, …, K:
    ✦ Randomly draw n from 1, …, N without replacement
    ✦ µk ← xn
  • Repeat until S1, …, SK don’t change (equivalently, until there is no change in dis_global):
    ✦ For n = 1, …, N:
      ✤ Find the k with smallest dis(xn, µk)
      ✤ Put xn ∈ Sk (and in no other Sj)
    ✦ For k = 1, …, K:
      ✤ µk ← |Sk|⁻¹ Σ_{n: xn ∈ Sk} xn

slide by Tamara Broderick
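Putting the pseudocode together, here is a runnable sketch: initialization by drawing K distinct data points, then alternating the assignment and mean-update steps until the assignments stop changing. The helper names are ours, and the empty-cluster guard is an implementation detail the slides do not address:

```python
import numpy as np

def kmeans(X, K, rng=None):
    """K-means as in the pseudocode above."""
    rng = np.random.default_rng(0) if rng is None else rng
    X = np.asarray(X, dtype=float)
    N, _ = X.shape
    # For k = 1..K: draw n from 1..N without replacement; mu_k <- x_n
    centers = X[rng.choice(N, size=K, replace=False)].copy()
    assign = np.full(N, -1)
    while True:
        # Assignment step: find the k with smallest dis(x_n, mu_k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # S_1..S_K unchanged: stop
            break
        assign = new_assign
        # Update step: mu_k <- |S_k|^{-1} * sum of the points in S_k
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # guard: leave the center in place if S_k is empty
                centers[k] = members.mean(axis=0)
    return centers, assign
```

Stopping when the assignments stop changing matches the "or no change in dis_global" rule on the slides, since the centers (and hence dis_global) are a deterministic function of the assignments.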

SLIDES 99–101

K-Means: Evaluation

  • Will it terminate? Yes. Always.
  • Is the clustering any good? The global dissimilarity is only useful for comparing clusterings.

slide by Tamara Broderick

SLIDE 102

K-Means: Evaluation

  • Guaranteed to converge in a finite number of iterations
  • Running time per iteration:
    1. Assign data points to the closest cluster center: O(KN) time
    2. Change each cluster center to the average of its assigned points: O(N) time

slide by David Sontag

SLIDE 103

K-Means: Evaluation

Objective

  1. Fix µ, optimize C: assign each point to its closest center.
  2. Fix C, optimize µ: take the partial derivative with respect to µi and set it to zero; this gives µi = the mean of the points assigned to cluster i.

K-Means takes an alternating optimization approach; each step is guaranteed to decrease the objective, and thus the algorithm is guaranteed to converge.

slide by Alan Fern
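The equation elided on the slide can be reconstructed; a worked version of the µ step, for the squared-error objective with the assignments C held fixed:

```latex
J(\mu) = \sum_{i=1}^{K} \sum_{n \in C_i} \lVert x_n - \mu_i \rVert^2,
\qquad
\frac{\partial J}{\partial \mu_i} = -2 \sum_{n \in C_i} (x_n - \mu_i) = 0
\;\Longrightarrow\;
\mu_i = \frac{1}{|C_i|} \sum_{n \in C_i} x_n .
```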

SLIDE 104

Demo time…

SLIDE 105

K-Means Algorithm: Some Issues

  • How to set K?
  • Sensitive to initial centers: use multiple initializations (see the sketch below)
  • Sensitive to outliers
  • Detects spherical clusters
  • Assumes means can be computed: it requires continuous, numerical features

slide by Kristen Grauman
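A sketch of the multiple-initializations remedy, reusing the kmeans and global_dissimilarity sketches from earlier; the restart count and seeding scheme are arbitrary choices of ours:

```python
import numpy as np

def kmeans_restarts(X, K, n_init=10, seed=0):
    """Run K-means from several random initializations and keep the
    clustering with the smallest global dissimilarity."""
    best = None
    for i in range(n_init):
        rng = np.random.default_rng(seed + i)
        centers, assign = kmeans(X, K, rng=rng)
        score = global_dissimilarity(X, centers, assign)
        if best is None or score < best[0]:
            best = (score, centers, assign)
    return best  # (dis_global, centers, assignments)
```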

SLIDE 106

K-Means Example Applications

SLIDES 107–109

Example: K-Means for Segmentation

The goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.

[Figure: original image and K-means segmentations with K = 2, K = 3, and K = 10]

slide by David Sontag
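A minimal sketch of K-means segmentation in the spirit of these slides, reusing the kmeans sketch from earlier. Clustering on RGB alone matches the figure; appending λ-scaled pixel coordinates to each feature vector would give the color-plus-location dissimilarity of slides 35–37:

```python
import numpy as np

def segment_image(img, K, rng=None):
    """Cluster pixels by color and repaint each pixel with its cluster
    center's color. img: (H, W, 3) float array with values in [0, 1]."""
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3)                   # datum: one RGB pixel
    centers, assign = kmeans(pixels, K, rng=rng)  # sketch from earlier
    return centers[assign].reshape(H, W, 3)
```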

SLIDE 110

Example: Vector quantization

FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern day statistics, to whom we owe maximum-likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024×1024 grayscale image at 8 bits per pixel. The center image is the result of 2×2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.

[Figure from Hastie et al. book]

slide by David Sontag
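A sketch of the 2×2 block VQ described in the caption, with our own patch-extraction code and the kmeans sketch from earlier. The caption's arithmetic checks out: with 200 code vectors, log2(200) ≈ 7.6 bits per 4-pixel block, i.e. about 1.9 bits/pixel; with 4 code vectors, 2 bits per block = 0.5 bits/pixel:

```python
import numpy as np

def block_vq(img, K, b=2, rng=None):
    """b x b block vector quantization of a grayscale image via K-means.
    img: (H, W) array with H and W divisible by b."""
    H, W = img.shape
    # Cut the image into b x b patches, flattened to rows of length b*b
    patches = (img.reshape(H // b, b, W // b, b)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, b * b))
    codebook, assign = kmeans(patches, K, rng=rng)  # sketch from earlier
    # Rebuild the image from each patch's nearest code vector
    out = codebook[assign].reshape(H // b, W // b, b, b)
    return out.transpose(0, 2, 1, 3).reshape(H, W)
```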

SLIDE 111

Example: Simple Linear Iterative Clustering (SLIC) superpixels

R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, "SLIC Superpixels Compared to State-of-the-art Superpixel Methods," IEEE T-PAMI, 2012

λ: spatial regularization parameter

SLIDE 112

Bag of Words model

[Word-count vector: aardvark 0, about 2, all 2, Africa 1, apple …, anxious …, …, gas 1, …, oil 1, …, Zaire …]

slide by Carlos Guestrin

SLIDES 113–114

[Figure: an object image and its corresponding bag of ‘words’]

slide by Fei Fei Li

SLIDE 115

Interest Point Features

Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03] → Normalize patch → Compute SIFT descriptor [Lowe ’99]

slide by Josef Sivic

SLIDE 116

Patch Features

slide by Josef Sivic

SLIDE 117

Dictionary Formation

slide by Josef Sivic

SLIDE 118

Clustering (usually K-means)

Vector quantization

slide by Josef Sivic
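Sketching the pipeline of slides 115–121 end to end, under our own naming and again reusing the kmeans sketch; a real system would compute SIFT descriptors with a feature library rather than receive them as an array:

```python
import numpy as np

def build_dictionary(descriptors, K, rng=None):
    """Visual dictionary: K-means cluster centers over local descriptors
    (e.g. 128-D SIFT) serve as the 'visual words' / codewords."""
    codebook, _ = kmeans(descriptors, K, rng=rng)  # sketch from earlier
    return codebook

def bow_histogram(descriptors, codebook):
    """Image representation: quantize each descriptor to its nearest
    visual word, then count codeword frequencies (cf. slide 121)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)) / len(words)
```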

SLIDE 119

Clustered Image Patches

slide by Fei Fei Li

SLIDE 120

Visual synonyms and polysemy

Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (the wheel of a motorbike).

slide by Andrew Zisserman

SLIDE 121

Image Representation

[Figure: an image represented as a histogram over codewords; axes: codewords vs. frequency]

slide by Fei Fei Li