Pattern Analysis and Machine Intelligence
Lecture Notes on Clustering (II) 2011-2012
Davide Eynard
eynard@elet.polimi.it
Department of Electronics and Information, Politecnico di Milano
Course Schedule [Tentative]
Date         Topic
07/05/2012   Clustering I: Introduction, K-Means
14/05/2012   Clustering II: K-Means alternatives, Hierarchical, SOM
21/05/2012   Clustering III: Mixture of Gaussians, DBSCAN, Jarvis-Patrick
28/05/2012   Clustering IV: Evaluation Measures
K-Means limits
K-Means is sensitive to the choice of the initial centroids, and it struggles when the natural clusters have:
- differing sizes
- differing densities
- non-globular shapes
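A minimal sketch of the initialization problem (an assumed example, not from the slides), using scikit-learn's KMeans with a single random initialization per run: different seeds can converge to different local minima of the total squared error (inertia).

    # Hypothetical illustration of K-Means sensitivity to the initial centroids.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Toy data: three blobs of different sizes (values made up for this sketch).
    X = np.vstack([rng.normal([0, 0], 0.5, size=(100, 2)),
                   rng.normal([5, 5], 0.5, size=(20, 2)),
                   rng.normal([0, 5], 0.5, size=(20, 2))])

    for seed in (0, 1, 2):
        # n_init=1 with a random init makes the dependence on the starting centroids visible.
        km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
        print("seed", seed, "inertia", round(km.inertia_, 1))
    # Different seeds may end in different local minima, i.e. different clusterings.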
K-Means: higher K
What if we tried to increase K to solve K-Means problems?
K-Medoids
- The mean is sensitive to outliers, which distort the distribution of the data; moreover, the centroid could be not part of the cluster (it is generally not one of the data points).
- K-Medoids uses a medoid as the representative point of the cluster: the medoid is the representative point of a set, i.e. the data point whose average dissimilarity to all the other points in the set is minimal.
PAM
PAM stands for Partitioning Around Medoids. In its standard formulation the algorithm is the following:
1. Select K of the N data points as the initial medoids.
2. Associate each data point to the closest medoid.
3. For each medoid m and each non-medoid point o, try swapping m with o and compute the total cost (the sum of the distances of all points to their closest medoid) of the resulting configuration.
4. If the best swap lowers the cost, perform it and go back to step 2; otherwise stop.
PAM
- PAM is more robust than K-Means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean (can you tell why?)
- PAM works well on small data sets, but it does not scale to large data sets: each iteration of the swap phase costs O(K(N-K)^2), where N is the number of points and K is # of clusters
- PAM does not need the positions of points but just their distances!
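A minimal sketch of this swap-based search (assumed code, not from the course material); consistently with the last point above, it only needs a precomputed distance matrix D:

    import numpy as np

    def pam(D, k, max_iter=100, seed=0):
        """Toy Partitioning Around Medoids on a precomputed distance matrix D (n x n)."""
        n = D.shape[0]
        rng = np.random.default_rng(seed)
        medoids = rng.choice(n, size=k, replace=False)

        def cost(meds):
            # Each point is assigned to its closest medoid; the cost is the sum of those distances.
            return D[:, meds].min(axis=1).sum()

        best = cost(medoids)
        for _ in range(max_iter):
            improved = False
            for mi in range(k):
                for o in range(n):
                    if o in medoids:
                        continue
                    trial = medoids.copy()
                    trial[mi] = o                # tentatively swap medoid mi with non-medoid o
                    c = cost(trial)
                    if c < best:                 # keep the swap only if it lowers the total cost
                        best, medoids, improved = c, trial, True
            if not improved:
                break
        labels = D[:, medoids].argmin(axis=1)
        return medoids, labels

    # Usage on random 2-D points (pairwise Euclidean distances computed explicitly):
    X = np.random.default_rng(1).random((30, 2))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids, labels = pam(D, k=3)
    print("medoid indices:", medoids)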
Fuzzy C-Means
Fuzzy C-Means (FCM, developed by Dunn in 1973 and improved by Bezdek in 1981) is a method of clustering which allows one piece of data to belong to two or more clusters.
It is based on the minimization of the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \|x_i - c_j\|^2, \qquad 1 \le m < \infty$$

where:
- $m$ is any real number greater than 1 (fuzziness coefficient),
- $u_{ij}$ is the degree of membership of $x_i$ in the cluster $j$,
- $x_i$ is the $i$-th of the $d$-dimensional measured data,
- $c_j$ is the $d$-dimensional center of the cluster,
- $\|\cdot\|$ is any norm expressing the similarity between measured data and the center.
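As a concrete reading of the objective, here is a small sketch (assumed, not from the slides) that evaluates $J_m$ with NumPy, using the Euclidean norm:

    import numpy as np

    def fcm_objective(X, C, U, m=2.0):
        """J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2 (Euclidean norm assumed)."""
        # dist2[i, j] = squared distance between point i and center j
        dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        return ((U ** m) * dist2).sum()

    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]])    # 3 points, d = 2
    C = np.array([[0.0, 0.1], [5.0, 5.0]])                # 2 cluster centers
    U = np.array([[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]])  # memberships, rows sum to 1
    print(fcm_objective(X, C, U))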
K-Means vs. FCM
In K-Means each point belongs to exactly one cluster; in FCM points do not belong exclusively to one cluster, but they may belong to several clusters (with different membership values).
Data representation
In K-Means the membership matrix U is binary: each row contains a single 1 (the cluster the point belongs to) and 0 elsewhere. In FCM each row contains membership degrees that sum to 1:

$$(KM)\; U_{N \times C} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix} \qquad (FCM)\; U_{N \times C} = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ \vdots & \vdots \\ 0.9 & 0.1 \end{pmatrix}$$
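A quick NumPy sketch (illustrative values only, not from the slides) of the same idea: hard memberships have exactly one 1 per row, fuzzy memberships are rows of degrees that sum to 1.

    import numpy as np

    # Hard (K-Means-style) memberships: one 1 per row.
    U_km = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])
    # Fuzzy (FCM-style) memberships: rows are degrees that sum to 1.
    U_fcm = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]])

    assert (U_km.sum(axis=1) == 1).all()          # each point in exactly one cluster
    assert np.allclose(U_fcm.sum(axis=1), 1.0)    # memberships sum to 1
    print(U_fcm.argmax(axis=1))                   # "hardening" FCM: take the strongest membership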
FCM Algorithm
The algorithm is composed of the following steps:
1. Initialize the membership matrix $U = [u_{ij}]$, $U^{(0)}$.
2. At step $k$, compute the cluster centers $c_j$ using $U^{(k)}$:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

3. Update the membership matrix, obtaining $U^{(k+1)}$:

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\|x_i - c_j\|}{\|x_i - c_k\|} \right)^{\frac{2}{m-1}}}$$

4. If $\|U^{(k+1)} - U^{(k)}\| < \varepsilon$, stop; otherwise go back to step 2.
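A compact NumPy sketch of these steps (an assumed implementation, not the original course code), following the update formulas above and stopping when the membership matrix changes by less than eps:

    import numpy as np

    def fuzzy_c_means(X, C, m=2.0, eps=1e-5, max_iter=300, seed=0):
        """X: (N, d) data, C: number of clusters. Returns (centers, memberships)."""
        N = X.shape[0]
        rng = np.random.default_rng(seed)
        U = rng.random((N, C))
        U /= U.sum(axis=1, keepdims=True)                 # step 1: random memberships, rows sum to 1

        for _ in range(max_iter):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]            # step 2: weighted centers
            dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
            dist = np.fmax(dist, 1e-12)                               # avoid division by zero
            # step 3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
            U_new = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
            if np.linalg.norm(U_new - U) < eps:                       # step 4: convergence test
                return centers, U_new
            U = U_new
        return centers, U

    # Usage on two synthetic blobs:
    X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
                   np.random.default_rng(2).normal(6, 1, (50, 2))])
    centers, U = fuzzy_c_means(X, C=2)
    print(centers.round(2))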
An Example
FCM Demo
Time for a demo!
Hierarchical Clustering
Agglomerative Hierarchical Clustering
Given a set of N items to be clustered, and an N×N distance (or dissimilarity) matrix, the basic process of agglomerative hierarchical clustering is the following:
1. Start by assigning each item to its own cluster, so that you have N clusters, each containing just one item. Let the distances (dissimilarities) between the clusters be the same as the dissimilarities between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster. Now, you have one cluster less.
3. Compute the distances (dissimilarities) between the new cluster and each of the old ones.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
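A minimal from-scratch sketch of this loop (assumed code, not from the course material), using single linkage in step 3 just to make the between-cluster distance concrete:

    import numpy as np

    def agglomerative(D, target_clusters=1):
        """Naive agglomerative clustering on a distance matrix D, with single linkage."""
        n = D.shape[0]
        clusters = [{i} for i in range(n)]                 # step 1: one cluster per item
        while len(clusters) > target_clusters:
            # step 2: find the closest pair of clusters (single linkage: min pairwise distance)
            best = (None, None, np.inf)
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                    if d < best[2]:
                        best = (a, b, d)
            a, b, _ = best
            clusters[a] |= clusters[b]                     # merge the two clusters...
            del clusters[b]                                # ...now you have one cluster less
        return clusters                                    # steps 2-3 repeat until done

    X = np.random.default_rng(0).random((8, 2))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(agglomerative(D, target_clusters=3))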
Single Linkage (SL) clustering
In single-linkage clustering, the distance between two clusters is the shortest distance from any member of one cluster to any member of the other one (greatest similarity).
Complete Linkage (CL) clustering
In complete-linkage clustering, the distance between two clusters is the greatest distance from any member of one cluster to any member of the other one (smallest similarity).
Group Average (GA) clustering
In group-average clustering, the distance between two clusters is the average distance from any member of one cluster to any member of the other one.
About distances
- If the data exhibit a strong clustering tendency, all 3 methods produce similar results.
- SL needs only a single pair of points to be close in order to merge two clusters, so the produced clusters can violate the "compactness" property (clusters with large diameters).
- CL produces compact clusters (small diameters), but can violate the "closeness" property (points may end up closer to members of other clusters than to members of their own).
- GA is a compromise: it tends to produce compact clusters that are relatively far apart. BUT it depends on the dissimilarity scale.
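As a quick experiment (assumed code using SciPy, not part of the original slides), the three criteria can be run on the same data and cut at the same number of clusters; on elongated or noisy data the resulting partitions often differ:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.vstack([np.random.default_rng(0).normal(0, 1, (30, 2)),
                   np.random.default_rng(1).normal(8, 1, (30, 2))])

    for method in ("single", "complete", "average"):
        Z = linkage(X, method=method)                      # agglomerative merge tree (dendrogram data)
        labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters
        print(method, np.bincount(labels)[1:])             # cluster sizes for each linkage criterion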
Hierarchical algorithms limits
- Strength of MIN (single linkage): it can handle non-elliptical shapes.
- Limitations of MIN: it is sensitive to noise and outliers (chaining effect).
- Strength of MAX (complete linkage): it is less susceptible to noise and outliers.
- Limitations of MAX: it tends to break large clusters and is biased towards globular shapes.
Hierarchical clustering: Summary
- It does not produce a single partition, but a hierarchy from which any desired collection of groups can be obtained by cutting the dendrogram at the proper level.
- It is computationally expensive: O(n^2) space for the proximity matrix and up to O(n^3) time in naive implementations, where n is the number of objects.
Hierarchical Clustering Demo
Time for another demo!
Self Organizing Feature Maps
Kohonen Self Organizing Feature Maps (a.k.a. SOM) provide a way to represent multidimensional data in much lower dimensional spaces (usually one or two dimensions). The reduction is performed so that the topological relationships within the training set are maintained.
Example: mapping of colors from their three dimensional components (i.e., red, green and blue) into two dimensions.
Self Organizing Feature Maps: The Topology
- The map is a lattice of nodes; each node has a fixed topological position in the lattice and holds a vector of weights of the same dimension as the input vectors.
- A SOM does not need a target output to be specified; instead, where the node weights match the input vector, that area of the lattice is selectively optimized to more closely resemble the data vector.
Self Organizing Feature Maps: The Algorithm
Training occurs in several steps over many iterations:
1. Each node's weights are initialized.
2. A vector is chosen at random from the training data and presented to the lattice.
3. Every node is examined to find the one whose weights are most like the input vector (the winning node is commonly known as the Best Matching Unit, BMU).
4. The radius of the neighborhood of the BMU is calculated (this value starts large, typically set to the 'radius' of the lattice, but diminishes each time-step). Any nodes found within this radius are deemed to be inside the BMU's neighborhood.
5. The weights of the nodes inside the BMU's neighborhood are adjusted to make them more like the input vector; the closer a node is to the BMU, the more its weights get altered.
6. Steps 2-5 are repeated for N iterations.
Practical Learning of Self Organizing Feature Maps
There are a few things that have to be specified in the previous algorithm:
- The distance between a node's weight vector $w_i$ and the input vector $x$, typically the Euclidean one:

$$\|x - w_i\| = \sqrt{\sum_{k=1}^{p} (x[k] - w_i[k])^2}$$

- The neighborhood function, e.g. a Gaussian centered on the BMU:

$$h_{ij} = e^{-\frac{(i-j)^2}{2\sigma^2}}$$

- The weight update rule, applied with learning rate $\alpha(t)$ only to the nodes inside the neighborhood $N(t)$ of the BMU:

$$w_i(t+1) = \begin{cases} w_i(t) + \alpha(t)\,[x(t) - w_i(t)] & \text{if } i \in N(t) \\ w_i(t) & \text{if } i \notin N(t) \end{cases}$$
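To make these choices concrete, here is a compact NumPy sketch of a 1-D SOM (an assumed toy implementation, not the course's): it uses the Euclidean distance for the BMU and the common variant of the update rule in which the crisp neighborhood is replaced by the Gaussian weight h_ij, with alpha(t) and sigma(t) decaying over time.

    import numpy as np

    def train_som_1d(X, n_nodes=10, n_iter=2000, alpha0=0.5, sigma0=None, seed=0):
        """Toy 1-D SOM: nodes arranged on a line, weights with the input dimension."""
        rng = np.random.default_rng(seed)
        p = X.shape[1]
        W = rng.random((n_nodes, p))                      # step 1: random initial weights
        sigma0 = sigma0 or n_nodes / 2.0                  # initial neighborhood 'radius'
        idx = np.arange(n_nodes)

        for t in range(n_iter):
            x = X[rng.integers(len(X))]                   # step 2: random training vector
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # step 3: BMU by Euclidean distance
            alpha = alpha0 * np.exp(-t / n_iter)          # decaying learning rate alpha(t)
            sigma = sigma0 * np.exp(-t / n_iter)          # decaying neighborhood radius sigma(t)
            h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))   # steps 4-5: Gaussian h_ij
            W += alpha * h[:, None] * (x - W)             # w_i(t+1) = w_i(t) + alpha(t) h_ij [x - w_i(t)]
        return W

    # Usage: map random RGB colors onto a line of 10 nodes.
    colors = np.random.default_rng(1).random((200, 3))
    W = train_som_1d(colors)
    print(W.round(2))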
Self Organizing Feature Maps Demo
Courtesy of: http://www.ai-junkie.com