Pattern Analysis and Machine Intelligence
Lecture Notes on Clustering (III) 2010-2011
Davide Eynard
eynard@elet.polimi.it
Department of Electronics and Information Politecnico di Milano
Course Schedule [Tentative]

Date        Topic
13/04/2011  Clustering I: Introduction, K-means
20/04/2011  Clustering II: K-M alternatives, Hierarchical, SOM
27/04/2011  Clustering III: Mixture of Gaussians, DBSCAN, J-P
04/05/2011  Clustering IV: Evaluation Measures
Lecture outline

- Mixture of Gaussians
- DBSCAN
- Jarvis-Patrick
Mixture of Gaussians
Clustering as a Mixture of Gaussians

Model-based clustering assumes the data were generated by a model, and tries to recover that model from the data:
- each cluster is represented by a parametric distribution, like a Gaussian (continuous) or a Poisson (discrete)
- the entire dataset is modeled by a mixture of these distributions

A mixture model with high likelihood tends to have the following traits:
- component distributions have high "peaks" (data in one cluster are tight)
- the mixture model "covers" the data well (dominant patterns in data are captured by component distributions)
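To make the model concrete: the density that a mixture assigns to an observation x is the weighted sum of its component densities (this is the standard mixture formulation, not spelled out on the slide):

p(x) = Σi=1..K P(ωi) p(x|ωi)

where the weights P(ωi) sum to 1 and p(x|ωi) is the chosen parametric density, e.g. a Gaussian centered on µi.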
Advantages of Model-Based Clustering

- Well-studied statistical inference techniques are available
- Flexibility in choosing the component distribution
- A density estimate is obtained for each cluster
- A "soft" (probabilistic) classification is available
Mixture of Gaussians
It is the most widely used model-based clustering method: we can actually consider clusters as Gaussian distributions centered on their barycentres, as in the figure, where the grey circle marks the first standard deviation of each distribution.
How does it work?

- Assume the dataset was generated by K Gaussian components ω1, . . . , ωK, each chosen with prior probability P(ωi)
- The components share a known variance σ²; the unknowns are the priors P(ω1), . . . , P(ωK) and the class means µ1, . . . , µK
- Given candidate parameters, we can compute P(x|ωi, µ1, µ2, . . . , µK) (probability that an observation from class ωi would have value x given class means µ1, . . . , µK)
- ... Can we do it? How? (let's first look at some examples on Expectation Maximization...)
The Algorithm

The algorithm is composed of the following steps:

1. Initialization: pick random starting values for the parameters

   λ0 = {µ1(0), . . . , µk(0), p1(0), . . . , pk(0)}

   where pi(t) is shorthand for P(ωi) at the t-th iteration

2. E-step: compute the expected class membership ("ownership") of each point xk for each class ωj, using Bayes' rule:

   P(ωj|xk, λt) = P(xk|ωj, λt) P(ωj|λt) / P(xk|λt)
                = p(xk|ωj, µj(t), σ²) pj(t) / Σi p(xk|ωi, µi(t), σ²) pi(t)

3. M-step: re-estimate means and priors from the memberships:

   µi(t+1) = Σk P(ωi|xk, λt) xk / Σk P(ωi|xk, λt)

   pi(t+1) = (Σk P(ωi|xk, λt)) / R,   where R is the number of records

4. Iterate steps 2 and 3 until λt converges
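For concreteness, here is a minimal NumPy sketch of these steps. It assumes spherical Gaussians with a known, shared variance σ², as on this slide; the function name em_gmm and all variable names are illustrative, not from the lecture.

import numpy as np

def em_gmm(X, K, sigma=1.0, iters=100, seed=0):
    """EM for a mixture of K spherical Gaussians with fixed variance."""
    R, d = X.shape                                # R = number of records
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(R, size=K, replace=False)]  # step 1: random means
    p = np.full(K, 1.0 / K)                       # step 1: uniform priors
    for _ in range(iters):
        # E-step: ownership w[k, j] = P(w_j | x_k, lambda_t) via Bayes' rule
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (R, K)
        lik = np.exp(-sq / (2 * sigma ** 2)) * p   # proportional to p(x_k|w_j) p_j
        w = lik / lik.sum(axis=1, keepdims=True)   # normalize over classes
        # M-step: weighted means and class frequencies
        mu = (w[:, :, None] * X[:, None, :]).sum(axis=0) / w.sum(axis=0)[:, None]
        p = w.sum(axis=0) / R
    return mu, p, w

# Usage: two well-separated blobs in 2D
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
mu, p, w = em_gmm(X, K=2)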
Mixture of Gaussians Demo
Time for a demo!
Question
What if we had a dataset like this?
DBSCAN
DBSCAN: background

- DBSCAN is a density-based algorithm: density is estimated by counting the points inside a neighborhood of fixed radius
- The Eps-neighborhood of a point p is the set of points within a given radius Eps of p: NEps(p) = {q ∈ D | dist(p, q) ≤ Eps}
DBSCAN: background

- A point p is a core point wrt. (Eps, MinPts) if its Eps-neighborhood contains at least MinPts points: |NEps(p)| ≥ MinPts
DBSCAN: core, border and noise points

- A core point has at least MinPts points within distance Eps
- A border point is not a core point, but falls within the Eps-neighborhood of a core point
- A noise point is any point that is neither a core point nor a border point

[Figure: example with Eps = 10, MinPts = 4]
DBSCAN: background

- A point p is directly density-reachable from a point q wrt. (Eps, MinPts) if:
  - p ∈ NEps(q), and
  - q is a core point, i.e. |NEps(q)| ≥ MinPts
  (the relation is symmetric for pairs of core points)
- A point p is density-reachable from a point q if there is a chain of points p1, . . . , pn (where p1 = q and pn = p) such that pi+1 is directly density-reachable from pi for every i
- A point p is density-connected to a point q if there is a point o such that both p and q are density-reachable from o
  (two border points of the same cluster C may not be density-reachable from each other, but there must be a core point in C from which both border points are density-reachable)
DBSCAN: background

- A cluster is a set of density-connected points which is maximal wrt. density-reachability
- Noise is the set of points in the dataset not belonging to any of its clusters
DBSCAN algorithm

- Select an arbitrary point p
- Retrieve all points density-reachable from p wrt. Eps and MinPts
- If p is a core point, a cluster is formed
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the dataset
- Continue the process until all points have been processed
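A compact Python sketch of these steps follows. The brute-force O(n²) neighborhood search is for clarity only (real implementations use a spatial index); the function name dbscan and the label convention (0 = noise, positive integers = cluster ids) are illustrative choices, not from the lecture.

import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    nbrs = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # NEps(p), p included
    labels = np.zeros(n, dtype=int)           # 0 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for p in range(n):                        # select an arbitrary point p
        if visited[p]:
            continue
        visited[p] = True
        if len(nbrs[p]) < min_pts:            # border or noise point: move on
            continue
        cluster += 1                          # p is a core point: new cluster
        labels[p] = cluster
        seeds = list(nbrs[p])                 # expand density-reachable points
        while seeds:
            q = seeds.pop()
            if not visited[q]:
                visited[q] = True
                if len(nbrs[q]) >= min_pts:   # q is also a core point
                    seeds.extend(nbrs[q])
            if labels[q] == 0:                # claim border/core point for cluster
                labels[q] = cluster
    return labels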
DBSCAN evaluation

- Complexity: O(n²) with a naive neighborhood search, O(n log n) when a spatial index is available
- Robust to noise, and does not require the number of clusters in advance
- Sensitive to the choice of Eps and MinPts, and struggles when clusters have widely varying densities
When DBSCAN works well

- Resistant to noise
- Can handle clusters of different shapes and sizes
Clustering using a similarity measure

- In some applications, long "stringy" clusters are the rule, not the exception
- The algorithm must be self-scaling, since it is expected to find both straggly, diverse clusters and tight ones
- A globular concept of a cluster is not acceptable
Jarvis-Patrick
Jarvis-Patrick

- Similarity is based on shared near neighbors rather than on distances in a Euclidean vector space
- For each point, compute the list of its k near neighbors
- Two points are placed in the same cluster if their respective k nearest neighbor lists match in at least kt entries
- It is also required that the tested points themselves belong to the common neighborhood (each point must appear in the other's neighbor list)
Jarvis-Patrick

[Figure: automatic scaling of neighborhoods (k = 5)]
Jarvis-Patrick

[Figure: "trap condition" for k = 7: Xi belongs to Xj's neighborhood, but not vice versa]
JP algorithm

1. Compute the list of the k near neighbors of each point. Once the neighborhood lists have been tabulated, the raw data can be discarded.
2. Each point's cluster label is initially set to the first entry of the corresponding neighborhood row (each point is its own 0th neighbor).
3. Test each pair of points: if both 0th neighbors are found in both neighborhood rows and at least kt neighbor matches exist between the two rows, replace both label entries by the smaller of the two existing entries.
4. If the above test is successful, replace all occurrences of the higher label (throughout the entire label table) with the lower label.
5. The process terminates with identical labeling of the points belonging to the same clusters.
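The following Python sketch mirrors steps 1-5 above for points stored in a NumPy array. The function name jarvis_patrick is an illustrative choice, and the shared-neighbor count here includes the two points themselves, a detail that varies between descriptions of the method.

import numpy as np

def jarvis_patrick(X, k, kt):
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    # Step 1: neighborhood rows; each point is its own 0th neighbor
    # (argsort lists it first, since its distance to itself is 0).
    nbrs = np.argsort(dist, axis=1)[:, :k + 1]
    labels = nbrs[:, 0].copy()                 # step 2: label = 0th neighbor
    for i in range(n):
        for j in range(i + 1, n):
            mutual = (i in nbrs[j]) and (j in nbrs[i])      # both 0th neighbors present
            shared = len(np.intersect1d(nbrs[i], nbrs[j]))  # neighbor matches
            if mutual and shared >= kt:        # step 3: points belong together
                lo, hi = sorted((labels[i], labels[j]))
                labels[labels == hi] = lo      # step 4: relabel everywhere
    return labels                              # step 5: equal labels = same cluster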
JP: alternative approaches

[Figure: similarity matrix]
JP: alternative approaches

[Figure: hierarchical clustering dendrogram]
JP: conclusions

Pros:
- No need to specify the number of clusters in advance
- Works with any similarity measure, and adapts to clusters of varying sizes, shapes, and densities

Cons:
- The neighborhood lists are expensive to generate (all pairwise similarities must be computed)
- Results depend strongly on the choice of k and kt
Bibliography

Andrew W. Moore, "Clustering with Gaussian Mixtures", Statistical Data Mining Tutorials, http://www.cs.cmu.edu/~awm/tutorials