CS 472 - Clustering 1
Unsupervised Learning and Clustering
• In unsupervised learning you are given a data set with no output classifications (labels)
• Clustering is an important type of unsupervised learning
– PCA was another type of unsupervised learning
• Problematic, e.g. when k is pre-defined (How about k = 2 above?)
• If k = 3 above then it could be its own cluster, rarely used, but at least
• Could remove clusters with 1 or few elements as a post-process step
• Can significantly adjust cluster radius, and cause it to absorb other
• Detection non-trivial – when is it really an outlier?
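
The post-process idea above (removing clusters with 1 or few elements) can be sketched as follows. This is only an illustration: the function name `drop_small_clusters`, the `min_size` threshold, and the toy points and labels are assumptions, not part of the original slides.

```python
from collections import Counter

def drop_small_clusters(points, labels, min_size=2):
    """Split points into kept members and outliers.

    A point is treated as an outlier when its cluster has fewer than
    min_size members (min_size is an assumed, tunable threshold).
    """
    sizes = Counter(labels)  # cluster label -> number of members
    kept_points, kept_labels, outliers = [], [], []
    for point, label in zip(points, labels):
        if sizes[label] >= min_size:
            kept_points.append(point)
            kept_labels.append(label)
        else:
            outliers.append(point)
    return kept_points, kept_labels, outliers

# Toy data: two tight clusters and one far-away point in its own cluster.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (20.0, 20.0)]
labels = [0, 0, 1, 1, 2]
kept_points, kept_labels, outliers = drop_small_clusters(points, labels, min_size=2)
```

Choosing `min_size` runs into the same question as the last bullet: a small cluster may be noise or may be correct but unusual data.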
• Then just measure distance to the centroid
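
A minimal sketch of measuring distance to a centroid, assuming the centroid is the per-dimension mean of the cluster members and the distance is Euclidean. The helper names and the sample points are illustrative, not from the slides.

```python
import math

def centroid(points):
    """Per-dimension mean of a non-empty list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def euclidean(a, b):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

cluster = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
c = centroid(cluster)          # (2.0, 2.0)
d = euclidean((5.0, 6.0), c)   # 5.0, a 3-4-5 triangle
```

Once a cluster is summarized by its centroid, one distance computation per cluster replaces a comparison against every member.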
$i = 1, \dots, |X_c|$
$\mathrm{dist}_{ij}$ (index $j$), $i = 1, \dots, |C|$
• Could use cluster validity metrics (e.g. Silhouette) to help in the decision
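
As one hedged illustration of a validity metric, here is a small pure-Python version of the standard mean Silhouette score. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b). Euclidean distance, the function names, and the toy data are assumptions made for this sketch.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean Silhouette over all points; values near 1 mean tight, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # p is in `own` at distance 0, so divide by the number of *other* members
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette_score(points, [0, 1, 0, 1])   # pairs each point with a far one
```

The natural grouping scores higher than the mismatched one, which is the kind of comparison the bullet has in mind.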
• K-medoids finds medoid (median) centers rather than average centers and
• Could compare different solutions for a specific k value by seeing which
• And test solutions with different k values using Silhouette or another metric
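
A minimal sketch of the difference between an average center and a medoid. It assumes Euclidean distance and the usual definition of a medoid: the cluster member with the smallest total distance to the other members. The helper names and sample cluster are illustrative, not from the slides.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_center(points):
    """Average center: per-dimension mean, usually not an actual data point."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def medoid(points):
    """Medoid: the member of the cluster minimizing total distance to all members."""
    return min(points, key=lambda c: sum(euclidean(c, p) for p in points))

# Four nearby points plus one distant point.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]
avg = mean_center(cluster)  # pulled toward the distant point
med = medoid(cluster)       # stays on one of the four nearby points
```

In this toy cluster the average lands near (11.2, 11.2), far from every member, while the medoid remains at (2.0, 2.0).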