SLIDE 1 Unsupervised learning
Clustering and Dimensionality Reduction
Marta Arias marias@cs.upc.edu
Fall 2018
SLIDE 2
Clustering
Partition input examples into similar subsets
SLIDE 4
Clustering
Main challenges
◮ How to measure similarity?
◮ How many clusters?
◮ How do we evaluate the clusters?
Algorithms we will cover
◮ K-means
◮ Hierarchical clustering
SLIDE 5 K-means clustering
Intuition
◮ Input data are:
  ◮ m examples x1, .., xm, and
  ◮ K, the number of desired clusters
◮ Clusters are represented by cluster centers µ1, .., µK
◮ Given centers µ1, .., µK, each center defines a cluster: the subset of inputs xi that are closer to it than to any other center
SLIDE 6 K-means clustering
Intuition
The aim is to find
◮ cluster centers µ1, .., µK, and
◮ a cluster assignment z = (z_1, .., z_m), where z_i ∈ {1, .., K}
  ◮ z_i is the cluster assigned to example x_i
such that µ1, .., µK and z minimize the cost function

J(µ1, .., µK, z) = Σ_{i=1..m} ‖x_i − µ_{z_i}‖²
SLIDE 7 K-means clustering
Cost function

J(µ1, .., µK, z) = Σ_{i=1..m} ‖x_i − µ_{z_i}‖²

Pseudocode
◮ Pick initial centers µ1, .., µK at random
◮ Repeat until convergence:
  ◮ Optimize z in J(µ1, .., µK, z) keeping µ1, .., µK fixed:
    set each z_i to the closest center, z_i = arg min_k ‖x_i − µ_k‖²
  ◮ Optimize µ1, .., µK in J(µ1, .., µK, z) keeping z fixed:
    for each k = 1, .., K, set µ_k = (1 / |{i | z_i = k}|) Σ_{i: z_i = k} x_i
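A minimal NumPy sketch of this loop (the function name kmeans and the parameters n_iter and seed are illustrative, not from the slides):

    import numpy as np

    def kmeans(X, K, n_iter=100, seed=0):
        """Alternate the two optimization steps until the centers stop moving."""
        rng = np.random.default_rng(seed)
        # Pick K distinct input points as the random initial centers
        centers = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iter):
            # Assignment step: z_i = arg min_k ||x_i - mu_k||^2
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            z = dists.argmin(axis=1)
            # Update step: mu_k = mean of the points assigned to cluster k
            # (keep the old center if a cluster ends up empty)
            new_centers = np.array([X[z == k].mean(axis=0) if np.any(z == k)
                                    else centers[k] for k in range(K)])
            if np.allclose(new_centers, centers):
                break  # converged
            centers = new_centers
        return centers, z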
SLIDE 8
K-Means illustrated
SLIDE 9
Limitations of k-Means
K-Means works well if...
◮ Clusters are spherical
◮ Clusters are well separated
◮ Clusters are of similar volumes
◮ Clusters have similar numbers of points
... so improve it with a more general model
◮ Mixture of Gaussians (sketched below):
  ◮ learn it using Expectation Maximization
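A minimal scikit-learn sketch of fitting a mixture of Gaussians with EM (the toy data and parameter values are illustrative assumptions):

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    # Toy data with clusters of very different spreads, where plain K-means struggles
    X, _ = make_blobs(n_samples=500, centers=3,
                      cluster_std=[0.5, 1.5, 3.0], random_state=0)

    # covariance_type="full" lets each Gaussian stretch into non-spherical shapes;
    # fit() runs Expectation Maximization internally
    gmm = GaussianMixture(n_components=3, covariance_type="full",
                          random_state=0).fit(X)
    labels = gmm.predict(X)       # hard cluster assignments
    probs = gmm.predict_proba(X)  # soft (probabilistic) assignments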
SLIDE 10
Hierarchical clustering
Output is a dendrogram
SLIDE 11 Agglomerative hierarchical clustering
Bottom-up
Pseudocode
1. Start with one cluster per example
2. Repeat until all examples are in one cluster:
  ◮ merge the two closest clusters
(Next example from D. Blei’s course at Princeton)
SLIDES 12-36 Example
[Figures omitted: agglomerative clustering of a 2-D dataset (axes V1, V2), one plot per iteration 001 through 024, showing the two closest clusters being merged at each step.]
SLIDE 37 Agglomerative hierarchical clustering
Bottom-up
Pseudocode
1. Start with one cluster per example
2. Repeat until all examples are in one cluster:
  ◮ merge the two closest clusters
Defining distance between clusters (i.e., sets of points)
◮ Single Linkage: d(X, Y) = min_{x∈X, y∈Y} d(x, y)
◮ Complete Linkage: d(X, Y) = max_{x∈X, y∈Y} d(x, y)
◮ Group Average: d(X, Y) = (1 / (|X| · |Y|)) Σ_{x∈X, y∈Y} d(x, y)
◮ Centroid Distance: d(X, Y) = d((1/|X|) Σ_{x∈X} x, (1/|Y|) Σ_{y∈Y} y)
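These linkage rules are implemented in SciPy's hierarchical clustering module; a minimal sketch on random data (the data itself is an illustrative assumption):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    X = np.random.default_rng(0).normal(size=(20, 2))  # 20 points in 2-D

    # method can be "single", "complete", "average", or "centroid",
    # matching the four definitions above (centroid assumes Euclidean distance)
    Z = linkage(X, method="average")

    dendrogram(Z)  # plot the merge tree
    plt.show()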
SLIDE 38
Many, many, many other algorithms available...
SLIDE 39
Clustering with scikit-learn I
K-means: an example with the Iris dataset
SLIDE 40
Clustering with scikit-learn II
K-means: an example with the Iris dataset
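The code listing from these two slides did not survive extraction; a minimal sketch of the same experiment with scikit-learn (parameter values are illustrative):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X = load_iris().data  # 150 examples, 4 features

    # Fit K-means with K=3 (Iris has three species)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    print(km.cluster_centers_)  # the learned centers mu_1, .., mu_3
    print(km.labels_[:10])      # cluster assignments z_i for the first 10 examples
    print(km.inertia_)          # the cost J at convergence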
SLIDE 41
Clustering with scikit-learn III
Hierarchical clustering: an example with the Iris dataset
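Again the listing is lost; a minimal sketch with scikit-learn's AgglomerativeClustering (the linkage choice is an illustrative assumption):

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import load_iris

    X = load_iris().data

    # Bottom-up merging with average linkage, cut at 3 clusters
    agg = AgglomerativeClustering(n_clusters=3, linkage="average").fit(X)
    print(agg.labels_)  # cluster assignment for each example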
SLIDE 42
Dimensionality reduction I
The curse of dimensionality
◮ When dimensionality increases, data becomes increasingly sparse in the space that it occupies
◮ Definitions of density and distance between points (critical for many tasks!) become less meaningful
◮ Visualization and qualitative analysis become impossible
SLIDE 43
Dimensionality reduction II
The curse of dimensionality
And so dimensionality reduction methods...
◮ avoid or at least mitigate the curse of dimensionality
◮ reduce the time and memory required
◮ allow data to be more easily visualized
◮ may help eliminate irrelevant features
◮ may reduce noise
SLIDE 44
Principal Components Analysis
Find linear projections of original coordinates that maximize variance
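Formally, the first principal component is the unit vector w_1 that maximizes the variance of the projected data; in LaTeX notation (S denotes the sample covariance matrix of the centered data X, a notational assumption not on the slide):

    w_1 = \arg\max_{\|w\|=1} \operatorname{Var}(Xw) = \arg\max_{\|w\|=1} w^\top S w

The maximizer is the eigenvector of S with the largest eigenvalue; later components repeat this maximization under orthogonality to the previous ones.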
SLIDE 45 t-SNE: t-distributed stochastic neighbor embedding
A non-linear method that preserves local neighborhood structure
From https://lvdmaaten.github.io/tsne/
SLIDE 46
Dimensionality reduction with scikit-learn
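The slide's listing is not reproduced; a minimal sketch of both methods on Iris (parameter values are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X = load_iris().data

    # PCA: project onto the 2 directions of maximal variance
    X_pca = PCA(n_components=2).fit_transform(X)

    # t-SNE: non-linear 2-D embedding that preserves local neighborhoods
    X_tsne = TSNE(n_components=2, perplexity=30,
                  random_state=0).fit_transform(X)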
SLIDE 47
So, we are done for this course
Lots of important things we have left out!
◮ Online and incremental learning; data mining for streams
◮ Important models: Support Vector Machines, Neural Nets (and Deep learning)
◮ Kernel methods and learning from structured objects
◮ Ensemble methods: random forests, boosting, bagging, etc.
◮ Spatial and temporal learning
◮ Feature selection methods
◮ many, many more...
SLIDE 48 To wrap up
Reading assignment
Article by Pedro Domingos: "A few useful things to know about machine learning"
Exam: December 17, 2018
It will be multiple choice, consisting of quick conceptual questions. No calculator is needed. No notes allowed. If any formula is needed, it will be given in the exam statement.