MACHINE LEARNING - MSc Course
APPLIED MACHINE LEARNING
Methods for Clustering: K-means, Soft K-means, DBSCAN
Objectives: Learn basic techniques for data ...
[Figure legend: Person1 with glasses, Person1 without glasses, Person2 without glasses, Person2 with glasses]
[Figure: Projection onto first two principal components after PCA]
[Figure: Projection onto e1 against e2]
[Figure: Projection onto e1 against e3]
[Figure: axes x1, x2, x3; labels: Outliers (noise), Relevant Data]
K-Means clustering generates a number K of disjoint clusters to minimize:
$$J(\mu^1, \dots, \mu^K) = \sum_{k=1}^{K} \sum_{x^i \in c^k} \|x^i - \mu^k\|^2$$
$x^i$: ith data point
$\mu^k$: geometric centroid
$c^k$: cluster label or number
In MLDemos, each centroid is initialized on a distinct datapoint, so no two centroids overlap.
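Assignment step: compute the distance from each data point to each centroid, $d(x^i, \mu^k) = \|x^i - \mu^k\|$, and assign each data point to its closest centroid:
$$k^i = \arg\min_k d(x^i, \mu^k)$$
$x^i$: ith data point
$\mu^k$: geometric centroid
If a tie happens (i.e. two centroids are equidistant to a data point), one assigns the data point to the winning centroid with the smallest label k.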
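Update step: move each centroid to the mean of the data points assigned to it:
$$\mu^k = \frac{\sum_i r_i^k x^i}{\sum_i r_i^k}$$
where the responsibility $r_i^k$ is 1 if data point $x^i$ is assigned to centroid $k$ and 0 otherwise.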
Stopping criterion: go back to step 2 and repeat the process until the clusters are stable, i.e. until no data point changes cluster.
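The initialization, assignment, update and stopping steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MLDemos implementation: the function name `kmeans`, its parameters and the toy data are hypothetical, the distance is assumed to be Euclidean, and ties are resolved toward the smallest centroid index.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Hard K-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: each centroid starts on a different datapoint.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for each point.
        # min() returns the first minimum, so ties go to the smallest index.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion: the clusters are stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

Which of the two groups receives label 0 depends on the random initialization, so only the grouping itself is meaningful.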
[Figure: axes x1, x2; label: Intersection points]
[Figure panels: L1-Norm, L2-Norm, L3-Norm, L8-Norm]
The algorithm is guaranteed to converge in a finite number of iterations, but it converges to a local optimum! It is hence very sensitive to the initialization of the centroids.
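A small deterministic example makes the sensitivity to initialization concrete. The sketch below is an illustration under stated assumptions: the helper `lloyd`, the four-point dataset and both initializations are hypothetical, and the distance is Euclidean. Both runs converge to a stable clustering, but only one of them reaches the lower distortion.

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Run hard K-means from the given initial centroids; return (centroids, rss)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances of the points to their assigned centroid.
    rss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, rss


# Four corners of a 4-by-1 rectangle (hypothetical toy data).
corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Initialization A separates left from right: the better solution.
_, rss_a = lloyd(corners, [(0.0, 0.5), (4.0, 0.5)])
# Initialization B separates bottom from top: also stable, but much worse.
_, rss_b = lloyd(corners, [(2.0, 0.0), (2.0, 1.0)])
print(rss_a, rss_b)  # 1.0 16.0
```

Running the algorithm from several initializations and keeping the run with the lowest distortion is the usual remedy.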
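Soft K-means assigns to each data point $x^i$ a responsibility for each cluster $k$:
$$r_i^k = \frac{e^{-\beta\, d(\mu^k, x^i)}}{\sum_{k'} e^{-\beta\, d(\mu^{k'}, x^i)}}$$
$r_i^k$: responsibility of cluster $k$ for point $x^i$, with $r_i^k \in [0,1]$. Normalized over clusters: $\sum_k r_i^k = 1$.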
The model parameters, i.e. the means, are adjusted to match the weighted sample means of the data points that they are responsible for.
$$\mu^k = \frac{\sum_i r_i^k x^i}{\sum_i r_i^k}$$
where $r_i^k \in [0,1]$ is the responsibility of cluster $k$ for point $x^i$, normalized over clusters: $\sum_k r_i^k = 1$.
The update algorithm of the soft K-means is identical to that of the hard K-means, aside from the fact that the responsibilities to a particular cluster are now real numbers varying between 0 and 1.
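The two soft K-means steps can be sketched as follows. This is a minimal sketch under assumptions: the function name `soft_kmeans`, the parameter name `beta` for the factor in the exponent, the use of the Euclidean distance for d, the fixed number of iterations and the toy data are all choices made for illustration.

```python
import math


def soft_kmeans(points, centroids, beta, n_iter=50):
    """Soft K-means sketch; returns (centroids, responsibilities).

    The responsibilities are those of the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    resp = []
    for _ in range(n_iter):
        # Assignment step: r[i][k] is proportional to exp(-beta * d(mu_k, x_i)).
        resp = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            d_min = min(dists)  # shifting by the minimum avoids underflow of all terms
            weights = [math.exp(-beta * (d - d_min)) for d in dists]
            total = sum(weights)
            resp.append([w / total for w in weights])  # normalized over clusters
        # Update step: each mean becomes the responsibility-weighted sample mean.
        for k in range(len(centroids)):
            r_k = [r[k] for r in resp]
            norm = sum(r_k)
            if norm > 0:  # a cluster with zero total responsibility stays put
                centroids[k] = tuple(
                    sum(r * p[d] for r, p in zip(r_k, points)) / norm
                    for d in range(len(points[0]))
                )
    return centroids, resp


# Hypothetical 1-D toy data with two groups, and hand-picked initial means.
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
means, resp = soft_kmeans(data, [(2.0,), (8.0,)], beta=10.0)
print(means)
```

With a large value of the exponent factor the responsibilities are close to 0 or 1 and the result approaches hard K-means; with a small value every mean is pulled toward all points.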
[Figure: Soft K-means algorithm with a small (left), medium (center) and large (right) setting; panel labels: 10, 5, 1]
[Figure: Iterations of the Soft K-means algorithm from the random initialization (left) to convergence (right), computed with a parameter value of 10]
Advantages:
Drawbacks:
Different initial partitions can result in different final clusters.
It is therefore good practice to run the algorithm several times, with different initializations and different K values, to determine the optimal number of clusters.
K-means takes into account only the distance between the means and data points; it has no representation of the variance of the data within each cluster.
K-means imposes a fixed shape for each cluster (sphere).
[Figure sequence: axes x1, x2, x3; labels: Outliers (noise), Relevant Data, Cluster 1, Cluster 2]
                     | K-means               | DBSCAN
Hyperparameters      | K: number of clusters |
Computational cost   | O(K*M)                | O(M*log(M)), M: number of datapoints
Type of cluster      | Globular              | Non-globular (arbitrary shapes, non-linear boundaries)
Robustness to noise  | Not robust            | Robust to outliers within ε
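To make the DBSCAN column concrete, here is a minimal sketch of the standard DBSCAN procedure. Several things are assumptions rather than content of the slides: the function name `dbscan`, the parameter names `eps` (the neighborhood radius) and `min_pts` (the minimum number of neighbors for a core point), and the toy data. The neighbor search is brute force, hence quadratic in the number of datapoints; the O(M*log(M)) cost in the table requires a spatial index, which is omitted here.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical toy data: two dense squares and one isolated outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 30)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is not specified in advance, and the isolated point is labeled as noise instead of being forced into a cluster.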
ADVANCED MACHINE LEARNING
RSS (Residual Sum of Squares), a measure of distortion:
$$\mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu^k\|^2$$
For K = M clusters, RSS = 0.
[Figure: M: 100 datapoints, N: 2 dimensions]
One looks at the slope of the decrease of the measure as K increases.
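As a small worked example, the residual sum of squares of a hard clustering is the sum of the squared distances of the points to their assigned centroid. The helper name `rss` and the four-point dataset below are hypothetical; the second call checks the limiting case in which every point is its own cluster.

```python
def rss(points, labels, centroids):
    """Residual sum of squares of a hard clustering."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


# Hypothetical toy data: four corners of a 4-by-2 rectangle.
data = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]

# K = 2: left pair and right pair, each centroid at the mean of its pair.
rss_k2 = rss(data, [0, 0, 1, 1], [(0.0, 1.0), (4.0, 1.0)])

# K = M: every point is its own centroid, so the distortion vanishes.
rss_km = rss(data, [0, 1, 2, 3], data)

print(rss_k2, rss_km)  # 4.0 0.0
```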
Procedure: run K-means while monotonically increasing the number of clusters; for each number of clusters, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau.
Optimal K is at the 'elbow' of the curve.
[Figure: M: 100 datapoints, N: 2 dimensions; K: 4 clusters]
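Reading the elbow off a curve is usually done by eye. One possible way to automate it, shown here purely as an illustration, is to pick the K at which the slope of the RSS curve changes the most. This second-difference heuristic is an assumption and is not taken from the slides; the function name `elbow` and the RSS values are hypothetical, and the K values are assumed to be consecutive integers.

```python
def elbow(rss_by_k):
    """Pick K at the sharpest bend of an RSS curve given as {K: RSS}.

    Heuristic (an assumption): choose the K that maximizes the drop in
    slope, (RSS[K-1] - RSS[K]) - (RSS[K] - RSS[K+1]).
    """
    ks = sorted(rss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (rss_by_k[prev] - rss_by_k[k]) - (rss_by_k[k] - rss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical best-of-several-runs RSS values for K = 1..7.
curve = {1: 400.0, 2: 250.0, 3: 140.0, 4: 60.0, 5: 52.0, 6: 47.0, 7: 44.0}
print(elbow(curve))  # 4
```

On curves without a pronounced bend this heuristic returns an arbitrary K, which is one reason a penalized criterion can be preferable.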
Cluster Analysis of Hedge Funds
[N. Das, 9th Int. Conf. on Computing Economics and Finance, 2011]
No legal definition of hedge funds exists; the term covers a wide category of investment funds with high risk and high returns, and a variety of strategies for guiding the investment.
Research question: classify the type of hedge fund based on the information provided to the client.
Data dimensions (features): asset class, size of the hedge fund, incentive fee, risk level, and liquidity of the hedge fund, among others.
Procedure: run K-means while monotonically increasing the number of clusters; for each number of clusters, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau.
[Figure: curve against Number of Clusters (K), with the Cutoff marked]
Optimal results are found with 7 clusters.
The 'elbow' or 'plateau' method for choosing the optimal K from the RSS curve can be unreliable for certain datasets.
[Figure: M: 100 datapoints, N: 3 dimensions]
Which one is the 'optimal' K? We don't know! We need an additional penalty or criterion!
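AIC (Akaike Information Criterion):
$$\mathrm{AIC} = -2 \ln(L) + 2B$$
BIC (Bayesian Information Criterion):
$$\mathrm{BIC} = -2 \ln(L) + B \ln(M)$$
$L$: maximum likelihood of the model; $B$: number of free parameters; $M$: number of datapoints.
The second term is a penalty for an increase in computational costs due to the number of parameters and, for BIC, the number of datapoints.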
Choosing AIC versus BIC depends on the application: Is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution.
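AIC for K-means:
$$\mathrm{AIC}_{RSS} = \mathrm{RSS} + 2B, \qquad \mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu^k\|^2$$
The factor 2 is the weighting factor.
Number of free parameters: $B = K \cdot N$, with $K$: # clusters and $N$: # dimensions.
In the general form $\mathrm{AIC} = -2 \ln(L) + 2B$, $L$ is the likelihood of the model and $B$ the number of free parameters.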
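BIC for K-means:
$$\mathrm{BIC}_{RSS} = \mathrm{RSS} + \ln(M)\, B, \qquad \mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu^k\|^2$$
The weighting factor $\ln(M)$ penalizes with respect to the number of datapoints $M$ (i.e. computational complexity).
Number of free parameters: $B = K \cdot N$, with $K$: # clusters and $N$: # dimensions.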
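The penalized criteria can be evaluated directly from an RSS curve. The sketch below assumes the RSS-based forms AIC = RSS + 2B and BIC = RSS + ln(M) B with B = K*N free parameters; the function names and the RSS values are hypothetical and serve only to show how the penalty turns a monotonically decreasing curve into one with a minimum.

```python
import math


def aic_rss(rss, k, n_dims):
    """AIC for K-means: RSS + 2 * B, with B = K * N free parameters."""
    return rss + 2 * k * n_dims


def bic_rss(rss, k, n_dims, n_points):
    """BIC for K-means: RSS + ln(M) * B, with B = K * N free parameters."""
    return rss + math.log(n_points) * k * n_dims


# Hypothetical best-run RSS values for K = 1..5, with M = 100 and N = 2.
rss_by_k = {1: 100.0, 2: 40.0, 3: 20.0, 4: 18.0, 5: 17.0}
best_aic = min(rss_by_k, key=lambda k: aic_rss(rss_by_k[k], k, 2))
best_bic = min(rss_by_k, key=lambda k: bic_rss(rss_by_k[k], k, 2, 100))
print(best_aic, best_bic)  # 3 3
```

Because ln(M) exceeds 2 as soon as M is larger than 7, BIC penalizes additional clusters more strongly than AIC on all but tiny datasets.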
Procedure: run K-means while monotonically increasing the number of clusters; for each number of clusters, run K-means with several initializations and take the best run.
[Figure: M: 100 datapoints, N: 3 dimensions; K: 2 clusters]
     | DBSCAN large ε | DBSCAN medium ε | DBSCAN small ε
RSS  | 43             | 26              | 0.5
BIC  | 42             | 34              | 78
AIC  | 69             | 51              | 24
     | K-means | DBSCAN large ε | DBSCAN medium ε | DBSCAN small ε
RSS  | 51      | 95             | 59              | 0.6
BIC  | 65      | 118            | 88              | 331
AIC  | 55      | 102            | 67              | 93
(careful: similar but not the same F-measure as the F-measure we will see for classification!)
Tradeoff between clustering correctly all datapoints of the same class in the same cluster and making sure that each cluster contains points of only one class.
$$F(C, K) = \sum_{c_i \in C} \frac{|c_i|}{M} \max_k F(c_i, k)$$
$$F(c_i, k) = \frac{2\, R(c_i, k)\, P(c_i, k)}{R(c_i, k) + P(c_i, k)}$$
$$R(c_i, k) = \frac{n_{ik}}{|c_i|}, \qquad P(c_i, k) = \frac{n_{ik}}{|k|}$$
$M$: number of labeled datapoints; $C = \{c_i\}$: the set of classes; $K$: number of clusters; $n_{ik}$: number of members of class $c_i$ and of cluster $k$.
[Figure legend: Class 1, Class 2, Labeled, Unlabeled]
Example:
$$R(c_1, k=1) = \frac{2}{2} = 1, \qquad R(c_2, k=2) = \frac{4}{4} = 1$$
$$P(c_1, k=1) = \frac{2}{6}, \qquad P(c_2, k=2) = \frac{4}{6}$$
so that $F(c_1, k=1) = 0.5$ and $F(c_2, k=2) = 0.8$, and
$$F(C, K) = \frac{2}{6} F(c_1, k=1) + \frac{4}{6} F(c_2, k=2) = 0.7$$
The maximum over $k$ picks, for each class, the cluster with the maximal F1 measure.
Recall $R(c_i, k)$: proportion of datapoints of the class correctly classified/clustered.
Precision $P(c_i, k)$: proportion of datapoints of the same class in the cluster.
The weight $|c_i|/M$ penalizes by the fraction of labeled points in each class.
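The clustering F-measure can be computed from three tables of counts. The sketch below assumes the definition F(C, K) = sum over classes of |c_i|/M times the best F(c_i, k) over clusters, with recall n_ik/|c_i| and precision n_ik/|k|. The function name and argument layout are hypothetical; the numbers reproduce the slide example, in which each cluster holds 6 points in total.

```python
def clustering_f_measure(class_sizes, cluster_sizes, counts):
    """F(C, K) = sum_i |c_i| / M * max_k F(c_i, k).

    class_sizes[i]   : number of labeled points of class i, |c_i|
    cluster_sizes[k] : number of points in cluster k, |k|
    counts[i][k]     : number of points of class i in cluster k, n_ik
    """
    m = sum(class_sizes)  # M: number of labeled datapoints
    total = 0.0
    for i, c_size in enumerate(class_sizes):
        best = 0.0
        for k, k_size in enumerate(cluster_sizes):
            n_ik = counts[i][k]
            if n_ik == 0:
                continue  # recall and precision are both zero
            recall = n_ik / c_size
            precision = n_ik / k_size
            best = max(best, 2 * recall * precision / (recall + precision))
        total += c_size / m * best
    return total


# Slide example: class 1 has 2 labeled points, all in cluster 1;
# class 2 has 4 labeled points, all in cluster 2; each cluster holds 6 points.
f = clustering_f_measure([2, 4], [6, 6], [[2, 0], [0, 4]])
print(round(f, 2))  # 0.7
```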
El-Khoury, S., Miao, Li and Billard, A. (2013) On the Generation of a Variety of Grasps. Robotics and Autonomous Systems Journal.