Machine Learning (AIMS) - MT 2017
2. Clustering
Varun Kanade
University of Oxford
November 7, 2017

Outline

This week, we will study some approaches to clustering
◮ Defining an objective function for clustering
◮ k-Means formulation for clustering
◮ Multidimensional Scaling
◮ Hierarchical clustering
◮ Spectral clustering
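◮ Weighted dissimilarity between (real-valued) attributes:
  $d(x, x') = f\left( \sum_{i=1}^{D} w_i \, d_i(x_i, x'_i) \right)$
◮ In the simplest setting $w_i = 1$, $d_i(x_i, x'_i) = (x_i - x'_i)^2$ and $f(z) = z$, which gives the squared Euclidean distance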
◮ Weights allow us to emphasise features differently
◮ If features are ordinal or categorical then define distance suitably
◮ Standardisation (mean 0, variance 1) may or may not help
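The weighted dissimilarity above can be sketched in a few lines of NumPy. This is only an illustration: the function name `weighted_dissimilarity` and its parameter names are hypothetical, and the defaults correspond to the simplest setting (unit weights, squared differences, identity f).

```python
import numpy as np

def weighted_dissimilarity(x, x_prime, weights=None, per_feature=None, f=None):
    """Compute f(sum_i w_i * d_i(x_i, x'_i)) for two feature vectors.

    All names here are illustrative assumptions, not taken from the slides.
    """
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    # Default: w_i = 1 for every feature.
    weights = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    # Default: d_i(a, b) = (a - b)^2, applied feature by feature.
    if per_feature is None:
        per_feature = lambda a, b: (a - b) ** 2
    # Default: f(z) = z.
    if f is None:
        f = lambda z: z
    return f(np.sum(weights * per_feature(x, x_prime)))
```

With the defaults this returns the squared Euclidean distance; passing `f=np.sqrt` gives the Euclidean distance, and a zero weight removes a feature from the comparison.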
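◮ Partition the data into clusters $C_1, \dots, C_k$ so as to minimise the objective
  $W = \sum_{j=1}^{k} \sum_{i \in C_j} \|x_i - \mu_j\|^2$, where $\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$ is the mean of cluster $C_j$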
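◮ Fixing the means $(\mu_j)_{j=1}^{k}$, finding a partition $(C_j)_{j=1}^{k}$ that minimises $W$ is easy:
  $C_j = \{ i : \|x_i - \mu_j\| = \min_{j'} \|x_i - \mu_{j'}\| \}$
◮ Fixing the partition $(C_j)_{j=1}^{k}$, the choice of means that minimises $W$ is
  $\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$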
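Alternating between these two easy steps (assign each point to its nearest mean, then recompute each mean) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `k_means`, the initialisation at k randomly chosen data points, the iteration cap, and the handling of empty clusters (their mean is left unchanged) are choices made here, not taken from the slides.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Alternating minimisation of W = sum_j sum_{i in C_j} ||x_i - mu_j||^2."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initialise the means at k distinct data points.
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest mean.
        sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mean becomes the average of its cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old mean if a cluster is empty
                means[j] = members.mean(axis=0)
    W = ((X - means[labels]) ** 2).sum()
    return labels, means, W
```

Neither step can increase W, so the procedure stops at a local optimum; different seeds may give different partitions.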
[Figure: MSE on test vs K for K-means]
◮ As in the case of PCA, larger k will give better value of the objective
◮ Choose suitable k by identifying a "kink" or "elbow" in the curve
(Source: Kevin Murphy, Chap 11)
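To look for an elbow, one can tabulate the objective for a range of k. The sketch below is an assumption-laden illustration: it uses a compact alternating-minimisation routine with random initialisation, and it reports the within-cluster sum of squares on the data used for fitting rather than the test-set MSE shown in the figure. All function names are hypothetical.

```python
import numpy as np

def within_cluster_ss(X, labels):
    """Sum of squared distances of each point to the mean of its own cluster."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))

def lloyd_labels(X, k, n_iters=50, seed=0):
    """Compact k-means: returns cluster labels after a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each point to its nearest mean.
        labels = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Recompute the mean of every non-empty cluster.
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return labels

def objective_curve(X, ks, seed=0):
    """Map each k to the objective value reached for that k."""
    X = np.asarray(X, dtype=float)
    return {k: within_cluster_ss(X, lloyd_labels(X, k, seed=seed)) for k in ks}
```

Plotting the returned values against k and looking for the point where the decrease flattens is one way to pick k; because each run only reaches a local optimum, the curve need not be perfectly monotone.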
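◮ $D_{ij} = \|x_i - x_j\|^2 = x_i^T x_i - 2 x_i^T x_j + x_j^T x_j$
◮ Writing $M_{ij} = x_i^T x_j$, this is $D_{ij} = M_{ii} - 2 M_{ij} + M_{jj}$
◮ Assuming $\sum_i x_i = 0$, $M$ can be recovered from $D$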
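One standard way to carry out this recovery is double centring, as in classical multidimensional scaling. The sketch below assumes that D holds squared Euclidean distances and that M denotes the matrix of inner products $x_i^T x_j$ of the centred points; the function names are hypothetical.

```python
import numpy as np

def gram_from_squared_distances(D):
    """Recover M (inner products of centred points) from squared distances D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centring matrix J = I - (1/n) 1 1^T removes the M_ii and M_jj terms.
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def classical_mds(D, dim):
    """Embed n points in `dim` dimensions from their squared-distance matrix."""
    M = gram_from_squared_distances(D)
    eigvals, eigvecs = np.linalg.eigh(M)      # M is symmetric
    top = np.argsort(eigvals)[::-1][:dim]     # indices of the largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)
```

The embedding is determined only up to rotation and reflection, so it reproduces the pairwise distances rather than the original coordinates.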
◮ In certain applications, it may be easier to define pairwise similarities or
◮ Many machine learning algorithms require (or are more naturally
◮ Multidimensional Scaling gives a way to find an embedding of the data in
◮ Measurements of different species and individuals within species
◮ Top-level and low-level categories in news articles
◮ Country, county, town level data
◮ Agglomerative: Bottom-up, clusters formed by merging smaller clusters
◮ Divisive: Top-down, clusters formed by splitting larger clusters
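◮ Single Linkage: $D(C, C') = \min_{x \in C, x' \in C'} d(x, x')$
◮ Complete Linkage: $D(C, C') = \max_{x \in C, x' \in C'} d(x, x')$
◮ Average Linkage: $D(C, C') = \frac{1}{|C| |C'|} \sum_{x \in C, x' \in C'} d(x, x')$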
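The three linkage rules, and a naive bottom-up merging loop that uses them, can be sketched as below. This is an illustration only: it assumes d is the Euclidean distance, the function names are hypothetical, and the loop recomputes all cluster distances at every merge, which is far slower than practical implementations.

```python
import numpy as np

def pairwise(C, C_prime):
    """Matrix of Euclidean distances d(x, x') for x in C and x' in C'."""
    C = np.asarray(C, dtype=float)
    C_prime = np.asarray(C_prime, dtype=float)
    return np.sqrt(((C[:, None, :] - C_prime[None, :, :]) ** 2).sum(axis=2))

def single_linkage(C, C_prime):
    return pairwise(C, C_prime).min()    # closest pair

def complete_linkage(C, C_prime):
    return pairwise(C, C_prime).max()    # farthest pair

def average_linkage(C, C_prime):
    return pairwise(C, C_prime).mean()   # average over all |C||C'| pairs

def agglomerative(X, n_clusters, linkage=single_linkage):
    """Merge the two closest clusters until n_clusters remain."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(X[clusters[a]], X[clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters
```

Recording the sequence of merges, rather than stopping at a fixed number of clusters, would give the full hierarchy.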
$\sum_j W_{ij}$