Clustering
Aarti Singh
Slides courtesy: Eric Xing
Machine Learning 10-701/15-781 Oct 25, 2010
Unsupervised Learning

"Learning from unlabeled/unannotated data" (without supervision)

[Figure: unlabeled data fed into a learning algorithm]

What can we predict from unlabeled data?
Clustering

Grouping data into clusters such that there is
– high intra-class similarity
– low inter-class similarity
Clustering is the most common form of unsupervised learning.
What is similarity?

Hard to define, but we know it when we see it. We take a more pragmatic approach: think in terms of a distance (rather than similarity) between feature vectors, or correlations between random variables.
Distance metrics

For vectors x = (x1, x2, ..., xp) and y = (y1, y2, ..., yp):

Euclidean distance:  $d(x,y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$
Manhattan distance:  $d(x,y) = \sum_{i=1}^{p} |x_i - y_i|$
Sup-distance:        $d(x,y) = \max_{1 \le i \le p} |x_i - y_i|$

Example (p = 2): for two points whose coordinates differ by 4 and 3, the Euclidean distance is 5, the Manhattan distance is 7, and the sup-distance is 4.
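A minimal NumPy sketch of the three metrics; the two example points below are chosen to match the example above (coordinates differing by 4 and 3):

import numpy as np

x = np.array([0.0, 0.0])                      # example point x
y = np.array([4.0, 3.0])                      # example point y (differs by 4 and 3)

euclidean = np.sqrt(np.sum((x - y) ** 2))     # sqrt(4^2 + 3^2) = 5.0
manhattan = np.sum(np.abs(x - y))             # |4| + |3|       = 7.0
sup_dist  = np.max(np.abs(x - y))             # max(4, 3)       = 4.0
print(euclidean, manhattan, sup_dist)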
Pearson correlation coefficient

For random vectors x = (x1, x2, ..., xp) and y = (y1, y2, ..., yp) (e.g. expression levels):

$\rho(x,y) = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2 \, \sum_{i=1}^{p} (y_i - \bar{y})^2}}$, where $\bar{x} = \frac{1}{p}\sum_{i=1}^{p} x_i$ and $\bar{y} = \frac{1}{p}\sum_{i=1}^{p} y_i$.

The coefficient ranges from -1 (-ve, negative correlation) to +1 (+ve, positive correlation).
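A small NumPy sketch of the same formula (the example vectors are made up for illustration):

import numpy as np

def pearson(x, y):
    # center both vectors, then normalize by the product of their norms
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(pearson(x, y))    # 1.0  (+ve: perfectly positively correlated)
print(pearson(x, -y))   # -1.0 (-ve: perfectly negatively correlated)
# np.corrcoef(x, y)[0, 1] computes the same quantity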
Hierarchical Clustering

Bottom-up (agglomerative): start with each object in a separate cluster, and repeat:
– Join the most similar pair of clusters
– Update the similarity of the new cluster to the other clusters
until there is only one cluster.
Greedy – less accurate but simple; typically computationally expensive (see the sketch below).

Top-down (divisive): start with all the data in a single cluster, and repeat:
– Split each cluster into two using a partition-based algorithm
until each object is a separate cluster.
More accurate but complex; can be computationally cheaper.
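A rough Python sketch of the bottom-up (agglomerative) loop; the function and variable names are mine, and cluster_distance is a pluggable definition of cluster similarity (examples below):

def agglomerative(D, cluster_distance):
    """Greedy bottom-up clustering on an n x n distance matrix D.
    cluster_distance(ci, cj, D) gives the distance between two clusters
    (lists of object indices). Returns the sequence of merges."""
    clusters = [[i] for i in range(len(D))]          # each object starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # find and join the closest (most similar) pair of clusters
        d, i, j = min((cluster_distance(ci, cj, D), i, j)
                      for i, ci in enumerate(clusters)
                      for j, cj in enumerate(clusters) if j > i)
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]      # merged cluster replaces cluster i
        del clusters[j]                              # ... and cluster j is removed
    return merges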
Different algorithms differ in how the similarity between two clusters is defined (and hence updated); see the sketches after this list:
– Nearest Neighbor (single linkage): similarity between their closest members.
– Furthest Neighbor (complete linkage): similarity between their furthest members.
– Centroid: similarity between the clusters' centers of gravity.
– Average linkage: average similarity of all cross-cluster pairs.
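Minimal sketches of the four definitions, written as distances so that smaller means more similar (the function names are mine; D is a pairwise distance matrix over the original objects):

def single_link(ci, cj, D):       # "nearest neighbor": distance between closest members
    return min(D[a][b] for a in ci for b in cj)

def complete_link(ci, cj, D):     # "furthest neighbor": distance between furthest members
    return max(D[a][b] for a in ci for b in cj)

def average_link(ci, cj, D):      # average over all cross-cluster pairs
    return sum(D[a][b] for a in ci for b in cj) / (len(ci) * len(cj))

def centroid_link(ci, cj, X):     # distance between the clusters' centers of gravity
    import numpy as np            # needs the raw feature vectors X, not just D
    mu_i = np.mean([X[a] for a in ci], axis=0)
    mu_j = np.mean([X[b] for b in cj], axis=0)
    return float(np.linalg.norm(mu_i - mu_j))

# e.g. agglomerative(D, single_link), using the loop sketched earlier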
Single-Link Method (example)

Euclidean distance matrix for four objects a, b, c, d:

      a   b   c
  b   2
  c   5   3
  d   6   5   4

(1) The closest pair is (a, b) at distance 2; merge it. Under single link, the updated distances are the minima over cross-cluster pairs:

      a,b   c
  c    3
  d    5    4

(2) Merge {a,b} and c at distance 3:

      a,b,c
  d    4

(3) Merge {a,b,c} and d at distance 4, leaving the single cluster {a,b,c,d}.
Complete-Link Method (example)

Same Euclidean distance matrix:

      a   b   c
  b   2
  c   5   3
  d   6   5   4

(1) The closest pair is again (a, b) at distance 2; merge it. Under complete link, the updated distances are the maxima over cross-cluster pairs:

      a,b   c
  c    5
  d    6    4

(2) The closest pair is now (c, d) at distance 4; merge it:

      a,b
  c,d   6

(3) Merge {a,b} and {c,d} at distance 6, leaving the single cluster {a,b,c,d}.
[Figure: dendrograms of the example under single-link and complete-link clustering; the vertical axis shows the merge distance (ticks at 2, 4, 6).]
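For reference, SciPy can compute and draw such dendrograms; a brief usage sketch (the random data X is just a placeholder):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(20, 2)                 # placeholder: 20 points in 2-D
for method in ("single", "complete"):
    Z = linkage(X, method=method)         # pairwise Euclidean distances by default
    dendrogram(Z)                         # vertical axis = merge distance
    plt.title(method + "-link")
    plt.show()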
Shape of clusters and outliers

– Single-linkage: allows anisotropic and non-convex cluster shapes, but is sensitive to outliers/noise.
– Complete-linkage: assumes isotropic, convex cluster shapes, but is robust to outliers.
Computational complexity (hierarchical agglomerative clustering)
– Compute all pairwise similarities and sort them to find the largest: O(n² log n).
– After each merge, update the similarity between the merged cluster and the other clusters.
– For the updates not to dominate the overall cost, computing the updated similarity to each other cluster must be done in constant time. (Homework)
Partitioning Algorithms
– Goal: partition the objects into a set of K clusters that optimizes a chosen partitioning criterion.
– Globally optimal: exhaustively enumerate all partitions.
– Effective heuristic method: the K-means algorithm.
K-means Algorithm

Input: the desired number of clusters, k.
Initialize: the k cluster centers (randomly, if necessary).
Iterate:
  1. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
  2. Re-estimate the k cluster centers (i.e., the centroid or mean of each cluster), assuming the memberships found above are correct.
Termination: if none of the N objects changed membership in the last iteration, exit; otherwise go to step 1.
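A compact NumPy sketch of the algorithm above, with random initialization (the function and variable names are mine, not from the slides):

import numpy as np

def kmeans(X, k, seed=0):
    """X: (N, p) data matrix. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]    # initialize k centers
    memberships = np.full(len(X), -1)
    while True:
        # 1. assign each object to the nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_memberships = dists.argmin(axis=1)
        if np.array_equal(new_memberships, memberships):      # no membership changed: stop
            return centers, memberships
        memberships = new_memberships
        # 2. re-estimate each center as the mean of its assigned objects
        for j in range(k):
            if np.any(memberships == j):                      # keep old center if cluster is empty
                centers[j] = X[memberships == j].mean(axis=0)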
[Figures: K-means iterations on example data; each assignment step partitions the space into the Voronoi diagram of the current cluster centers.]
Computational complexity (K-means)

At each iteration:
– Computing the distance between each of the n objects and the K cluster centers is O(Kn).
– Computing the cluster centers: each object gets added once to some cluster, O(n).
For l iterations, the total cost is O(lKn).
Seed choice
– A poor choice of initial centers can lead to convergence to a sub-optimal clustering.
– Select good seeds using a heuristic (e.g., the object least similar to any existing mean); see the sketch below.
– Try out multiple starting points (very important!!!)
– Initialize with the results of another method.
– Further reading: the k-means++ algorithm of Arthur and Vassilvitskii.

K-means also assumes isotropic, convex clusters.
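A sketch of the k-means++ seeding idea mentioned above: pick each new center with probability proportional to its squared distance from the nearest center chosen so far (this is a paraphrase of Arthur and Vassilvitskii, not their code):

import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Choose k well-spread initial centers from the rows of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]                    # first center: uniform at random
    while len(centers) < k:
        # squared distance of every point to its nearest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])     # sample proportional to d^2
    return np.array(centers)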
How many clusters?
– One criterion: the K-means objective function (total within-cluster distortion).
– Look for a "knee" in the objective function as K varies (a small sketch follows).
– Can you pick K by minimizing the objective over K? (Homework)
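A small sketch that computes the K-means objective over a range of K, so the "knee" can be read off the printed curve; it reuses the kmeans sketch from earlier and synthetic data:

import numpy as np

def kmeans_objective(X, centers, memberships):
    # total squared distance of every object to its assigned cluster center
    return float(np.sum((X - centers[memberships]) ** 2))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=mu, size=(50, 2)) for mu in ([0, 0], [5, 5], [0, 5])])
for k in range(1, 8):
    centers, memberships = kmeans(X, k)       # kmeans() defined in the earlier sketch
    print(k, round(kmeans_objective(X, centers, memberships), 1))
# look for the knee in the printed values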