BBM406 Fundamentals of Machine Learning, Lecture 21: Clustering (K-Means)

SLIDE 1

BBM406 Fundamentals of Machine Learning

Lecture 21: Clustering (K-Means)

Aykut Erdem // Hacettepe University // Fall 2019

Photo by Unsplash user @foodiesfeed

SLIDE 2

Last time… Boosting

  • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote
  • On each iteration t:
    ✦ Weight each training example by how incorrectly it was classified
    ✦ Learn a hypothesis ht
    ✦ Compute a strength αt for this hypothesis
  • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strengths
  • Practically useful
  • Theoretically interesting

slide by Aarti Singh & Barnabas Poczos

SLIDE 3

Last time… The AdaBoost Algorithm

slide by Jiri Matas and Jan Šochman

SLIDE 4

Today

  • What is clustering?
  • The K-means algorithm

SLIDE 5

What is clustering?

SLIDES 6-16

Clustering

  • Grouping data according to similarity

e.g. an archaeological dig: each artifact's location is a data point, plotted as Distance East vs. Distance North, and nearby artifacts group together.

[figure: artifact locations in the Distance East / Distance North plane, revealed step by step as clusters]

slide by Tamara Broderick

SLIDES 17-19

Clustering vs. Classification

  • Clustering: grouping data according to similarity
  • Classification: predicting new labels from old labels

[figure: the same artifact locations, now labeled Family A, Family B, Family C]

slide by Tamara Broderick

SLIDES 20-23

Why use clustering… instead of classification, when the cartoon looks so easy?

  • Exploratory data analysis
  • Classes are unspecified (unknown, changing too quickly, expensive to label data, etc.)
  • High-dimensional data
  • Big data
  • Data not numerical

Example: social networks [Krivitsky & Handcock 2008]
  • Datum: a binary vector specifying whether a person has each interest
  • Similarity: the number of common interests of two people

slide by Tamara Broderick

SLIDES 24-29

Why use clustering… instead of classification? (Same motivations as above.)

Example: topic analysis [Blei 2003]
  • Datum: a binary vector indicating, for each document, whether a given word occurs in it
  • Similarity: how many documents exist where two words co-occur

slide by Tamara Broderick

SLIDES 30-33

Why use clustering… instead of classification? (Same motivations as above.)

Example: document clustering [Carpineto et al. 2009]
  • Datum: a vector of topic occurrences in a document
  • Dissimilarity: distance between the topic distributions of two documents

slide by Tamara Broderick

SLIDES 34-37

Why use clustering… instead of classification? (Same motivations as above.)

Example: image segmentation [Fei-Fei 2011]
  • Datum: a pixel's RGB values and its horizontal and vertical location
  • Dissimilarity: difference in color + difference in location

slide by Tamara Broderick

SLIDE 38

Clustering algorithms

  • Partitioning algorithms: construct various partitions and then evaluate them by some criterion
    ✦ K-means
    ✦ Mixture of Gaussians
    ✦ Spectral clustering
  • Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion
    ✦ Bottom-up: agglomerative
    ✦ Top-down: divisive

slide by Eric Xing

SLIDE 39

Desirable Properties of a Clustering Algorithm

  • Scalability (in terms of both time and space)
  • Ability to deal with different data types
  • Minimal requirements for domain knowledge to determine input parameters
  • Ability to deal with noisy data
  • Interpretability and usability
  • Optional: incorporation of user-specified constraints

slide by Andrew Moore

SLIDE 40

K-Means Clustering

SLIDE 41

K-Means Clustering

Benefits:
  • Fast
  • Conceptually straightforward
  • Popular

slide by Tamara Broderick

SLIDES 42-49

K-Means: Preliminaries

Datum: a vector of D continuous values. In the dig example each point has two features, Distance East and Distance North; renaming these Feature 1 and Feature 2, the n-th datum is xn = (xn,1, xn,2), e.g. x3 = (1.5, 6.2). Stacking all N data points gives an N × D matrix with rows x1, x2, …, xN and entries xn,d.

[figure: the data matrix with rows x1 = (1.2, 5.9), x2 = (4.3, 2.1), x3 = (1.5, 6.2), …, xN = (4.1, 2.3), and x3 plotted in the Feature 1 / Feature 2 plane]

slide by Tamara Broderick

SLIDES 50-56

K-Means: Preliminaries

Dissimilarity: distance "as the crow flies", i.e. squared Euclidean distance. For two data points x3 and x17 with two features:

dis(x_3, x_{17}) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2

In general, summing over each feature d = 1, …, D:

dis(x_3, x_{17}) = \sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2

slide by Tamara Broderick
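
As a concrete illustration, here is a minimal Python sketch of this dissimilarity (not from the slides; the function name `squared_euclidean` and the values chosen for x17 are our own):

```python
import numpy as np

def squared_euclidean(x, y):
    """dis(x, y) = sum over features d of (x_d - y_d)^2."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ diff)

x3 = np.array([1.5, 6.2])           # example point from the slides
x17 = np.array([4.1, 2.3])          # illustrative values, not given on the slides
print(squared_euclidean(x3, x17))   # (1.5 - 4.1)**2 + (6.2 - 2.3)**2 = 21.97
```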

SLIDES 57-69

K-Means: Preliminaries

Cluster summary:
  • K = number of clusters
  • K cluster centers µ1, µ2, …, µK; each center is a vector of D feature values, e.g. µ1 = (µ1,1, µ1,2)
  • Data assignments to clusters: Sk = the set of points in cluster k, giving sets S1, S2, …, SK

[figure: the data in the Feature 1 / Feature 2 plane with three centers µ1, µ2, µ3 and their assigned points]

slide by Tamara Broderick
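
In code, this summary is just two arrays; a small sketch (the names N, K, D, centers, assign are ours, not from the slides):

```python
import numpy as np

N, K, D = 100, 3, 2                 # points, clusters, features
centers = np.zeros((K, D))          # row k holds the center mu_k
assign = np.zeros(N, dtype=int)     # assign[n] = k  means  x_n is in S_k
S_2 = np.flatnonzero(assign == 2)   # indices of the points in cluster k = 2
```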

SLIDES 70-75

K-Means: Preliminaries

Dissimilarity (global): for each cluster, for each data point in the k-th cluster, for each feature, sum the squared difference from the cluster center:

dis_{global} = \sum_{k=1}^{K} \sum_{n: x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2

slide by Tamara Broderick
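
A sketch of this global objective in Python (assuming X is an N × D NumPy array of data, centers is K × D, and assign holds each point's cluster index; the function name is ours):

```python
import numpy as np

def global_dissimilarity(X, centers, assign):
    """Sum over each cluster k, each point x_n in S_k, and each
    feature d of (x_{n,d} - mu_{k,d})^2."""
    total = 0.0
    for k in range(len(centers)):
        members = X[assign == k]                 # the points in S_k
        total += np.sum((members - centers[k]) ** 2)
    return float(total)
```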

SLIDES 76-77

K-Means Algorithm

  • Initialize K cluster centers
  • Repeat until convergence:
    ✦ Assign each data point to the cluster with the closest center
    ✦ Assign each cluster center to be the mean of its cluster's data points

slide by Tamara Broderick

SLIDES 78-84

K-Means Algorithm

  • Initialize: for k = 1, …, K:
    ✦ Randomly draw n from 1, …, N without replacement
    ✦ Set µk ← xn
  • Repeat until S1, …, SK don't change (equivalently, until there is no change in dis_global):
    ✦ Assign each data point to the cluster with the closest center
    ✦ Assign each cluster center to be the mean of its cluster's data points

slide by Tamara Broderick

SLIDES 85-98

K-Means Algorithm

  • Initialize: for k = 1, …, K:
    ✦ Randomly draw n from 1, …, N without replacement
    ✦ Set µk ← xn
  • Repeat until S1, …, SK don't change:
    ✦ For n = 1, …, N:
      ❖ Find the k with smallest dis(xn, µk)
      ❖ Put xn ∈ Sk (and in no other Sj)
    ✦ For k = 1, …, K:
      ❖ \mu_k \leftarrow |S_k|^{-1} \sum_{n: x_n \in S_k} x_n

[figure: the algorithm animated on the example data: centers start at randomly chosen data points, each point is assigned to its nearest center, each center moves to the mean of its assigned points, and the process repeats until the assignments stop changing]

slide by Tamara Broderick
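
Putting the pieces together, here is a compact, runnable sketch of the slides' pseudocode in Python/NumPy: initialize by drawing K data points without replacement, then alternate the assignment and mean-update steps until the assignments stop changing. Function and variable names are our own, and we guard against the empty-cluster corner case, which the slides do not discuss:

```python
import numpy as np

def k_means(X, K, rng=None):
    """K-means as on the slides: mu_k <- a randomly drawn datum, then repeat
    nearest-center assignment and mean updates until S_1, ..., S_K are fixed."""
    if rng is None:
        rng = np.random.default_rng(0)
    N, D = X.shape
    # Initialize: mu_k <- x_n for K indices n drawn without replacement
    centers = X[rng.choice(N, size=K, replace=False)].astype(float)
    assign = np.full(N, -1)
    while True:
        # Assignment step: for each n, find k with smallest dis(x_n, mu_k)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # N x K
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # S_1, ..., S_K unchanged: stop
            return centers, assign
        assign = new_assign
        # Update step: mu_k <- |S_k|^(-1) * sum of the x_n in S_k
        for k in range(K):
            members = X[assign == k]
            if len(members):                     # keep old mu_k if S_k is empty
                centers[k] = members.mean(axis=0)
```

On the dig example, `k_means(X, K=3)` with X an N × 2 array of (Distance East, Distance North) pairs would return the three centers and each artifact's cluster index.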

SLIDES 99-101

K-Means: Evaluation

  • Will it terminate?
    ✦ Yes. Always.
  • Is the clustering any good?
    ✦ The global dissimilarity is only useful for comparing clusterings.

slide by Tamara Broderick

SLIDE 102

K-Means: Evaluation

  • Guaranteed to converge in a finite number of iterations
  • Running time per iteration:
    1. Assign data points to the closest cluster center: O(KN) time
    2. Change each cluster center to the average of its assigned points: O(N) time

slide by David Sontag

SLIDE 103

K-Means: Evaluation

Objective:
  1. Fix µ, optimize C (the assignments): assign each point to the cluster with the closest center.
  2. Fix C, optimize µ: take the partial derivative with respect to µk and set it to zero; the optimizer is the mean of the points in cluster k.

K-Means takes an alternating optimization approach; each step is guaranteed to decrease the objective, so the algorithm is guaranteed to converge.

slide by Alan Fern
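
The result of that derivative can be made explicit. Holding the assignments C fixed, the terms of dis_global that involve µk are the squared distances of the points in Sk to µk, and

\frac{\partial}{\partial \mu_k} \sum_{n: x_n \in S_k} \| x_n - \mu_k \|^2
  = -2 \sum_{n: x_n \in S_k} (x_n - \mu_k) = 0
\quad \Longrightarrow \quad
\mu_k = |S_k|^{-1} \sum_{n: x_n \in S_k} x_n

i.e. the optimal center is the mean of its cluster's data points, which is exactly the algorithm's update step.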

SLIDE 104

Demo time…

SLIDE 105

K-Means Algorithm: Some Issues

  • How to set K?
  • Sensitive to initial centers: use multiple initializations (see the sketch below)
  • Sensitive to outliers
  • Detects spherical clusters
  • Assumes means can be computed: it requires continuous, numerical features

slide by Kristen Grauman
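
A common remedy for the initialization sensitivity is restarts: run K-means several times from different random initializations and keep the run with the smallest global dissimilarity. A sketch reusing the k_means and global_dissimilarity functions defined above (the function name and defaults are ours):

```python
import numpy as np

def k_means_restarts(X, K, n_restarts=10, seed=0):
    """Return (score, centers, assign) for the restart with lowest dis_global."""
    best = None
    for r in range(n_restarts):
        rng = np.random.default_rng(seed + r)    # a fresh initialization per run
        centers, assign = k_means(X, K, rng=rng)
        score = global_dissimilarity(X, centers, assign)
        if best is None or score < best[0]:
            best = (score, centers, assign)
    return best
```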

SLIDE 106

Next Lecture: K-Means applications, spectral clustering, hierarchical clustering, and what is a good clustering?