Clustering: K-Means & Mixture Models


SLIDE 1

Clustering: K-Means & Mixture Models

Prof. Mike Hughes

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Many ideas/slides attributable to: Emily Fox (UW), Erik Sudderth (UCI)
SLIDE 2


What will we learn?

[Course overview figure: data examples $\{x_n\}_{n=1}^N$; the three paradigms (Supervised Learning, Unsupervised Learning, Reinforcement Learning); a task summary; and a performance measure.]

SLIDE 3


Task: Clustering

[Paradigm diagram: Supervised, Unsupervised, Reinforcement Learning, with clustering highlighted as an unsupervised task.]

SLIDE 4

Clustering: Unit Objectives

  • Understand key challenges
    • How to choose the number of clusters?
    • How to choose the shape of clusters?
  • K-means clustering (deep dive)
    • Shape: linear boundaries (nearest Euclidean centroid)
    • Explain the algorithm as an instance of "coordinate descent": update some variables while holding others fixed
    • Need smart init and multiple restarts to avoid local optima
  • Mixture models (primer)
    • Advantages of soft assignments and covariances

SLIDE 5

Examples of Clustering


SLIDE 6

Clustering Animals by Features


SLIDE 7

Clustering Images


SLIDE 8

Image Compression


The image on the right achieves a compression factor of around 1 million in the number of possible pixel values!

Original: possible pixel values (R, G, B) = 256 × 256 × 256 ≈ 16.8 million. Compressed: each pixel takes one of 16 fixed (R, G, B) values.
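A sketch of how such compression works via k-means color quantization (my own illustration, not the course's code; `img` is an assumed `(H, W, 3)` uint8 array):

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, n_colors=16, seed=0):
    """Replace every pixel with the nearest of n_colors learned (R, G, B) values."""
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, random_state=seed, n_init=10).fit(pixels)
    # Map each pixel to the centroid color of its assigned cluster
    palette = km.cluster_centers_.astype(np.uint8)
    return palette[km.labels_].reshape(H, W, 3)
```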

SLIDE 9


Understanding Genes

SLIDE 10

How to cluster these points?


SLIDE 11

How to cluster these points?


SLIDE 12

Key Questions


$$\min_{m \in \mathbb{R}^F} \; \sum_{n=1}^{N} (x_n - m)^T (x_n - m)$$
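As context for this key question (a standard derivation, not text from the slide): setting the gradient to zero shows the best single center is the sample mean.

$$\nabla_m \sum_{n=1}^{N} (x_n - m)^T (x_n - m) = -2 \sum_{n=1}^{N} (x_n - m) = 0 \quad\Longrightarrow\quad m^\star = \frac{1}{N} \sum_{n=1}^{N} x_n$$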

SLIDE 13

K-Means


SLIDE 14

Input:

  • Dataset of N example feature vectors
  • Number of clusters K


SLIDE 15

K-Means Goals

  • Assign each example to one of K clusters
    • Assumption: clusters are exclusive
  • Minimize Euclidean distance from examples to cluster centers
    • Assumption: isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data

SLIDE 16

K-Means output

  • Centroid vectors (one per cluster k in 1, … K): real-valued, length F (# features)
  • Assignments (one per example n in 1, … N): a one-hot vector indicating which of the K clusters example n is assigned to

SLIDE 17

Use Euclidean distance

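The formula on the slide did not survive extraction; the standard Euclidean distance between example $x_n$ and centroid $\mu_k$ that it refers to is:

$$d(x_n, \mu_k) = \|x_n - \mu_k\|_2 = \sqrt{\sum_{f=1}^{F} (x_{nf} - \mu_{kf})^2}$$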

SLIDE 18

K-means Optimization Problem

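The slide's equation was lost in extraction; the standard K-means objective it refers to has the form:

$$\min_{\{r_{nk}\}, \{\mu_k\}} \; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|_2^2 \qquad \text{s.t.} \quad r_{nk} \in \{0, 1\}, \;\; \sum_{k=1}^{K} r_{nk} = 1$$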

SLIDE 19

K-Means Algorithm


Initialize cluster means. Repeat until converged:
  1) Update per-example assignments: for each n in 1:N, find the cluster k* whose centroid minimizes the distance to x_n; set the assignment of n to indicate k*.
  2) Update per-cluster centroids: for each k in 1:K, set the centroid to the mean of the data vectors assigned to k.
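A minimal NumPy sketch of this loop (my own illustration; names like `mu` and `z` are assumptions, not the course's starter code):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    N, F = X.shape
    mu = X[rng.choice(N, size=K, replace=False)].copy()  # init: K random examples
    for _ in range(n_iters):
        # 1) Assignment step: nearest centroid for every example (N x K distances)
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = dists.argmin(axis=1)
        # 2) Centroid step: mean of the examples assigned to each cluster
        new_mu = np.stack([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: centroids stopped moving
            break
        mu = new_mu
    cost = dists[np.arange(N), z].sum()  # sum of squared distances to centroids
    return mu, z, cost
```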

SLIDE 20

K-Means Algorithm


Initialize cluster means. Repeat until converged:
  1) Update per-example assignments
  2) Update per-cluster centroids

SLIDE 21

Each update improves the cost


SLIDE 22

K-Means Algo: Coordinate Descent


E-step (per-example step): update assignments.
M-step (per-centroid step): update centroid locations.
Each step yields a cost equal to or lower than before.

Credit: Jake VanderPlas

SLIDE 23

Demo!

http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html


SLIDE 24

Demo 2 (Choose initial clusters)

https://www.naftaliharris.com/blog/visualizing-k-means-clustering/


Pick a dataset and fix a K value (e.g., 2 clusters). Can you find a different fixed-point solution from your neighbor? What does this mean about the objective?

SLIDE 25

K-means Boundaries are Linear

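Why the boundaries are linear (a standard argument, not text from the slide): the region where cluster k beats cluster j is a half-space, because comparing squared distances reduces to a linear inequality in $x$:

$$\|x - \mu_k\|^2 \le \|x - \mu_j\|^2 \;\;\Longleftrightarrow\;\; 2(\mu_j - \mu_k)^T x \le \|\mu_j\|^2 - \|\mu_k\|^2$$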

SLIDE 26

Decisions when applying k-means

  • How to initialize the clusters?
  • How to choose K?


SLIDE 27

Initialization: K-means++


SLIDE 28

Possible Initializations


  • Draw K random centroid locations
  • Choose K data vectors as centroids
    • Uniformly at random

What can go wrong?

SLIDE 29
  • Toy Example: Cluster these 4 points with K=2

[Figure: four points to cluster, with separations of D units in one direction and 1 unit in the other.]

SLIDE 30

No Guarantees on Cost!


BAD solution: cost scales with the distance D, which could be arbitrarily larger than 1. OPTIMAL solution: cost is O(1).

SLIDE 31

Better init: k-means++


Arthur & Vassilvitskii SODA ‘07

Step 1: choose an example uniformly at random as the first centroid.
Repeat for k = 2, 3, … K: choose an example with probability proportional to its squared distance from the nearest existing centroid.
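A NumPy sketch of this seeding procedure (an illustration under assumed names, not the paper's reference code):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """k-means++ seeding: later centroids are drawn far from earlier ones."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    centroids = [X[rng.integers(N)]]  # step 1: uniform random first centroid
    for _ in range(1, K):
        C = np.array(centroids)
        # Squared distance from each example to its nearest chosen centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to d^2
        centroids.append(X[rng.choice(N, p=d2 / d2.sum())])
    return np.array(centroids)
```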

SLIDE 32

k-means++: Guarantees on Quality


Arthur & Vassilvitskii SODA ‘07

Theorem: in expectation, this initialization achieves a cost within an O(log K) factor of the optimal cost.

Step 1: choose an example uniformly at random as the first centroid.
Repeat for k = 2, 3, … K: choose an example with probability proportional to its squared distance from the nearest centroid.

SLIDE 33

Use cost to decide among multiple runs of k-means

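A sketch of this practice, assuming the `kmeans` helper from the earlier sketch and an `(N, F)` array `X`:

```python
# Run k-means from several random initializations; keep the lowest-cost run.
runs = [kmeans(X, K=3, seed=s) for s in range(10)]
mu_best, z_best, cost_best = min(runs, key=lambda run: run[2])
```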

SLIDE 34

How to pick K in K-means?


SLIDE 35

Same data. Which K is best?


SLIDE 36

Use cost function? No!


As K grows, the globally optimal cost always decreases (local optima may not). In the limit K → N, the cost is zero.

SLIDE 37

Add complexity penalty!


We want adding clusters to increase the penalized cost unless the extra clusters help "enough".
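One illustrative form of such a penalized score (the exact penalty on the slide is an assumption here):

$$\text{score}(K) = \sum_{n=1}^{N} \min_{k} \|x_n - \mu_k\|_2^2 \;+\; \lambda K$$

where a larger penalty weight $\lambda$ favors fewer clusters.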

SLIDE 38

Computation Issues


SLIDE 39

K-Means Computation

  • Most expensive step: updating assignments
    • N × K distance calculations
  • Scalable?
    • Don't need to update all examples; just grab a minibatch
    • Can do stochastic learning-rate updates too
  • Parallelizable?
    • Yes. Given fixed centroids, minibatches of examples (the assignment step) can be processed in parallel
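For example, scikit-learn's minibatch variant implements this idea (a usage sketch, assuming `X` is an `(N, F)` array):

```python
from sklearn.cluster import MiniBatchKMeans

# Each iteration updates centroids using a small random batch of examples
mbk = MiniBatchKMeans(n_clusters=8, batch_size=256, random_state=0)
mbk.fit(X)
centroids = mbk.cluster_centers_  # (8, F) centroid vectors
cost = mbk.inertia_               # within-cluster sum-of-squares cost
```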

SLIDE 40

Improved clustering: Gaussian mixture model


SLIDE 41

Improving K-Means

  • Assign each example to one of K clusters
    • Assumption: clusters are exclusive
    • Improvement: soft probabilistic assignment
  • Minimize Euclidean distance from examples to cluster centers
    • Assumption: isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data
    • Improvement: model cluster covariance

SLIDE 42

Gaussian Mixture Model


SLIDE 43

Gaussian Mixture Model


  • Mean vectors (one per cluster k in 1, … K): real-valued, length F (# features)
  • Covariance matrices (one per cluster k in 1, … K): F × F square symmetric matrix, positive definite (invertible)
  • Soft assignments (one per example n in 1, … N): probabilistic! A vector that sums to one
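For reference, the density these parameters define (standard GMM form; the mixture weights $\pi_k$ are not listed on the slide and are an addition here):

$$p(x_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$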

SLIDE 44

Covariance Models


[Figure: covariance options, from spherical (most similar to k-means) to full (more flexible).]

Credit: Jake VanderPlas

SLIDE 45

GMM Training


Maximize the likelihood of the data.

Beyond this course: one can show this looks a lot like K-means' simplified objective.

Algorithm: coordinate ascent!
  E-step: update soft assignments r
  M-step: update means and covariances
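scikit-learn's GaussianMixture implements this EM loop (a usage sketch, assuming `X` is an `(N, F)` array):

```python
from sklearn.mixture import GaussianMixture

# EM training: E-step updates soft assignments, M-step updates parameters
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)

r = gmm.predict_proba(X)          # soft assignments: (N, 3), rows sum to one
means = gmm.means_                # (3, F) cluster means
covs = gmm.covariances_           # (3, F, F) cluster covariances
```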

SLIDE 46

Special Case

  • K-means is a GMM with:
    • Hard winner-take-all assignments
    • Spherical covariance constraints

SLIDE 47

Clustering: Unit Objectives

  • Understand key challenges
    • How to choose the number of clusters?
    • How to choose the shape of clusters?
  • K-means clustering (deep dive)
    • Shape: linear boundaries (nearest Euclidean centroid)
    • Explain the algorithm as an instance of "coordinate descent": update some variables while holding others fixed
    • Need smart init and multiple restarts to avoid local optima
  • Mixture models (primer)
    • Advantages of soft assignments and covariances