Clustering: K-Means & Mixture models
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/
Many ideas/slides attributable to: Emily Fox (UW), Erik Sudderth (UCI)
- Prof. Mike Hughes
What will we learn?
Mike Hughes - Tufts COMP 135 - Spring 2019
Overview: for each of the three paradigms (Supervised Learning, Unsupervised Learning, Reinforcement Learning), we consider the data examples {x_n}, n = 1, ..., N, a task summary, and a performance measure.
Among Supervised, Unsupervised, and Reinforcement Learning, clustering is an Unsupervised Learning task.
The image on the right reduces the number of possible colors by a factor of around one million. Possible pixel values (R, G, B): 256 x 256 x 256 ≈ 16.8 million. After quantization: one of 16 fixed (R, G, B) values.
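This kind of color quantization can be done by clustering pixel colors with K-means. A minimal sketch (the toy random image and the scikit-learn API choices here are illustrative assumptions, not from the slides):

```python
# Color quantization via K-means: cluster all pixels in RGB space,
# then replace each pixel by its cluster's centroid color.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(32, 32, 3))        # toy 32x32 RGB image

pixels = image.reshape(-1, 3).astype(float)          # N x 3 data matrix
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)

# Each pixel becomes its nearest centroid color: at most 16 distinct colors remain.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
```

A real image would simply replace the random `image` array; the rest of the pipeline is unchanged.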
Each cluster k = 1, ..., K has a centroid m_k ∈ R^F (one entry per feature); the data are the feature vectors {x_n}, n = 1, ..., N.
K-means assumes that Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data.
Assignment vector r_n: one-hot vector that indicates which of the K clusters example n is assigned to. Centroid vector m_k: real-valued, length = # features F.
Repeat until converged:
For each k in 1:K: set centroid m_k to the mean of the data vectors assigned to cluster k.
For each n in 1:N: find the cluster k* that minimizes the distance from x_n to its centroid; set r_n to indicate k*.
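The alternating steps above can be sketched in plain NumPy. This is a minimal illustration under assumed setup (X is an (N, F) data matrix; initialization by picking random examples is one common choice, not prescribed by the slides):

```python
# Minimal K-means: alternate assignment and centroid-update steps.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.RandomState(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centroids
    z = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: each example joins its nearest centroid.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        z = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # fixed point reached: converged
            break
        mu = new_mu
    return mu, z

# Two well-separated blobs should each get their own cluster.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 10.0])
mu, z = kmeans(X, K=2)
```

Each pass through the loop can only keep the cost the same or lower it, which is why checking for an unchanged `mu` is a valid stopping rule.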
E-step (per-example step): update the assignments r_n. M-step (per-centroid step): update the centroid locations m_k. Each step yields a cost equal to or lower than before, so the algorithm converges.
Credit: Jake VanderPlas
Pick a dataset and fix a value of K (e.g., 2 clusters). Can you find a different fixed-point solution from your neighbor? What does this mean about the objective?
What can go wrong?
(Figure: example dataset with clusters separated by D units.)
BAD solution: cost scales with the distance D, which could be arbitrarily large. OPTIMAL solution: cost will be O(1).
Arthur & Vassilvitskii SODA ‘07
Step 1: choose an example uniformly at random as the first centroid.
Repeat for k = 2, 3, ..., K: choose the next centroid from among the examples, with probability proportional to its squared distance from the nearest centroid chosen so far.
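The k-means++ seeding procedure can be sketched in NumPy as follows (a minimal illustration; the two-blob toy data are an assumption for demonstration):

```python
# k-means++ seeding (Arthur & Vassilvitskii, 2007): later centroids are
# sampled with probability proportional to squared distance from the
# nearest centroid already chosen.
import numpy as np

def kmeanspp_init(X, K, seed=0):
    rng = np.random.RandomState(seed)
    centroids = [X[rng.randint(len(X))]]         # step 1: uniform choice
    for _ in range(1, K):
        C = np.array(centroids)
        # Squared distance of each example to its nearest chosen centroid.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        probs = d2 / d2.sum()                    # prob. proportional to d^2
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 8.0])
init = kmeanspp_init(X, K=2)
```

Because far-away examples are much more likely to be chosen, the initial centroids tend to spread across the data, avoiding the bad initializations described above.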
As K increases, the globally optimal cost always decreases (local optima found in practice may not). In the limit K -> N, the cost is zero: every example is its own centroid.
We want a criterion in which adding additional clusters increases the cost if they don't help "enough."
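One simple way to realize this is to add a per-cluster penalty to the K-means cost, so an extra cluster is only worthwhile if it reduces the cost by more than the penalty. A sketch (the penalty weight `lam`, the three-blob toy data, and the scikit-learn API are illustrative assumptions):

```python
# Choose K by minimizing a penalized K-means cost: inertia + lam * K.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) + c for c in ([0, 0], [8, 0], [0, 8])])  # 3 blobs

lam = 50.0   # penalty per cluster (assumed; tune to your data's scale)
scores = {}
for K in range(1, 8):
    cost = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).inertia_
    scores[K] = cost + lam * K   # penalized objective

best_K = min(scores, key=scores.get)
```

With well-separated blobs, the unpenalized cost drops sharply up to the true number of clusters and only slowly afterward, so the penalized objective bottoms out there.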
K-means scales to large datasets: examples can be processed (in the assignment step) in parallel.
Recall the limitation: K-means assumes that Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data.
Soft assignment vector r_n: probabilistic, entries sum to one (contrast with K-means' one-hot vector). Mean vector: real-valued, length = # features F. Covariance matrix: F x F square symmetric matrix, positive definite (invertible).
Credit: Jake VanderPlas
Objective: maximize the likelihood of the data.
Beyond this course: one can show this looks a lot like K-means' simplified objective.
Algorithm: coordinate ascent (Expectation-Maximization). E-step: update the soft assignments r. M-step: update the means and covariances.
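In practice, this EM procedure is available off the shelf. A sketch using scikit-learn's `GaussianMixture` (the two-blob toy data are an illustrative assumption):

```python
# Fit a Gaussian mixture with EM, then inspect the soft assignments.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(60, 2), rng.randn(60, 2) + 6.0])

gmm = GaussianMixture(n_components=2, covariance_type='full',
                      random_state=0).fit(X)

# Soft assignments: each row is a probability vector summing to one
# (contrast with K-means' hard one-hot assignments).
r = gmm.predict_proba(X)       # shape (120, 2)
```

`covariance_type='full'` fits a separate F x F symmetric positive-definite covariance per component, which is exactly the extra flexibility K-means lacks.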