
SLIDE 1

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info
Mohammed J. Zaki¹  Wagner Meira Jr.²

¹Department of Computer Science
Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 14: Hierarchical Clustering


SLIDE 2

Hierarchical Clustering

The goal of hierarchical clustering is to create a sequence of nested partitions, which can be conveniently visualized via a tree or hierarchy of clusters, also called the cluster dendrogram. The clusters in the hierarchy range from the fine-grained to the coarse-grained: the lowest level of the tree (the leaves) consists of each point in its own cluster, whereas the highest level (the root) consists of all points in one cluster.

Agglomerative hierarchical clustering methods work in a bottom-up manner. Starting with each of the n points in a separate cluster, they repeatedly merge the most similar pair of clusters until all points are members of the same cluster.
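As a quick illustration, here is a minimal sketch (the five points are made up) that builds such a dendrogram with SciPy and plots it:

```python
# Minimal sketch: build and plot a cluster dendrogram with SciPy.
# The data points are illustrative, not from the book.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [6.0, 0.5]])
Z = linkage(X, method="single")   # bottom-up merges, closest pair first
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.ylabel("merge distance")
plt.show()
```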


SLIDE 3

Hierarchical Clustering: Nested Partitions

Given a dataset D = {x1, ..., xn}, where xi ∈ Rᵈ, a clustering C = {C1, ..., Ck} is a partition of D. A clustering A = {A1, ..., Ar} is said to be nested in another clustering B = {B1, ..., Bs} if and only if r > s, and for each cluster Ai ∈ A there exists a cluster Bj ∈ B such that Ai ⊆ Bj.

Hierarchical clustering yields a sequence of n nested partitions C1, ..., Cn, where the clustering Ct−1 is nested in the clustering Ct. The cluster dendrogram is a rooted binary tree that captures this nesting structure, with an edge between cluster Ci ∈ Ct−1 and cluster Cj ∈ Ct if Ci is nested in Cj, that is, if Ci ⊂ Cj.
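The nesting condition is mechanical to check; here is a minimal sketch (the helper name is mine) using frozensets:

```python
# Sketch: test whether clustering A is nested in clustering B,
# i.e. |A| > |B| and every cluster of A is contained in some cluster of B.
def is_nested(A, B):
    return len(A) > len(B) and all(
        any(a <= b for b in B) for a in A   # a ⊆ b for some b ∈ B
    )

A = [frozenset("AB"), frozenset("CD"), frozenset("E")]   # C3 from the next slide
B = [frozenset("ABCD"), frozenset("E")]                  # C4 from the next slide
print(is_nested(A, B))  # True
```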


SLIDE 4

Hierarchical Clustering Dendrogram

[Figure: dendrogram over points A–E. A and B merge into AB, C and D merge into CD, AB and CD merge into ABCD, and finally ABCD and E merge into ABCDE at the root.]

The dendrogram represents the following sequence of nested partitions:

Clustering   Clusters
C1           {A}, {B}, {C}, {D}, {E}
C2           {AB}, {C}, {D}, {E}
C3           {AB}, {CD}, {E}
C4           {ABCD}, {E}
C5           {ABCDE}

with Ct−1 ⊂ Ct for t = 2,...,5. We assume that A and B are merged before C and D.


SLIDE 5

Number of Hierarchical Clusterings

The total number of different dendrograms with n leaves is given as:

∏_{m=1}^{n−1} (2m − 1) = 1 × 3 × 5 × 7 × ··· × (2n − 3) = (2n − 3)!!

For example, for n = 3 the count is (2·3 − 3)!! = 3!! = 3, matching the three trees in panel (c) below.

[Figure: all possible dendrograms with (a) n = 1 leaf (one tree), (b) n = 2 leaves (one tree), and (c) n = 3 leaves (three trees).]
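A direct way to evaluate this count (a minimal sketch, nothing book-specific) is the double factorial:

```python
# Sketch: number of distinct dendrograms over n leaves, (2n − 3)!!
from math import prod

def num_dendrograms(n):
    return prod(2 * m - 1 for m in range(1, n))   # product over m = 1 .. n−1

print([num_dendrograms(n) for n in range(1, 6)])  # [1, 1, 3, 15, 105]
```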


SLIDE 6

Agglomerative Hierarchical Clustering

In agglomerative hierarchical clustering, we begin with each of the n points in a separate cluster. We repeatedly merge the two closest clusters until all points are members of the same cluster.

Given a set of clusters C = {C1, C2, ..., Cm}, we find the closest pair of clusters Ci and Cj and merge them into a new cluster Cij = Ci ∪ Cj. Next, we update the set of clusters by removing Ci and Cj and adding Cij, that is, C = (C \ {Ci, Cj}) ∪ {Cij}. We repeat the process until C contains only one cluster. If a target number of clusters k is specified, we can instead stop the merging process when exactly k clusters remain.


SLIDE 7

Agglomerative Hierarchical Clustering Algorithm

AgglomerativeClustering(D, k):

1  C ← {Ci = {xi} | xi ∈ D}              // Each point in a separate cluster
2  ∆ ← {δ(xi, xj) : xi, xj ∈ D}          // Compute distance matrix
3  repeat
4      Find the closest pair of clusters Ci, Cj ∈ C
5      Cij ← Ci ∪ Cj                     // Merge the clusters
6      C ← (C \ {Ci, Cj}) ∪ {Cij}        // Update the clustering
7      Update distance matrix ∆ to reflect the new clustering
8  until |C| = k
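For concreteness, here is a naive O(n³) Python sketch of this procedure using the single-link distance (illustrative, not the authors' implementation):

```python
# Naive sketch of AgglomerativeClustering(D, k) with single-link
# distances; recomputes cluster distances from the point-level
# distance matrix instead of maintaining ∆ incrementally.
import numpy as np

def agglomerative_clustering(X, k):
    clusters = [[i] for i in range(len(X))]   # each point in its own cluster
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > k:
        # find the closest pair of clusters (single link)
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])       # Cij = Ci ∪ Cj
        del clusters[b]                       # C = (C \ {Ci, Cj}) ∪ {Cij}
    return clusters

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]], dtype=float)
print(agglomerative_clustering(X, k=2))       # [[0, 1], [2, 3, 4]]
```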


SLIDE 8

Distance between Clusters

Single, Complete and Average

A typical distance between two points is the Euclidean distance or L2-norm:

δ(x, y) = ‖x − y‖₂ = ( Σ_{i=1}^{d} (xi − yi)² )^{1/2}

Single Link: The minimum distance between a point in Ci and a point in Cj:

δ(Ci, Cj) = min{δ(x, y) | x ∈ Ci, y ∈ Cj}

Complete Link: The maximum distance between points in the two clusters:

δ(Ci, Cj) = max{δ(x, y) | x ∈ Ci, y ∈ Cj}

Group Average: The average pairwise distance between points in Ci and Cj:

δ(Ci, Cj) = ( Σ_{x∈Ci} Σ_{y∈Cj} δ(x, y) ) / (ni · nj)
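The three measures translate directly into code; a small sketch over NumPy arrays (the helper names are mine):

```python
# Sketch: single, complete, and group-average distances between two
# clusters Ci, Cj given as (n, d) NumPy arrays.
import numpy as np

def pairwise(Ci, Cj):
    """All Euclidean distances δ(x, y) for x ∈ Ci, y ∈ Cj."""
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=-1)

def single_link(Ci, Cj):   return pairwise(Ci, Cj).min()
def complete_link(Ci, Cj): return pairwise(Ci, Cj).max()
def group_average(Ci, Cj): return pairwise(Ci, Cj).mean()  # sum / (ni · nj)

Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [6.0, 0.0]])
print(single_link(Ci, Cj), complete_link(Ci, Cj), group_average(Ci, Cj))
# 3.0 6.0 4.5
```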


SLIDE 9

Distance between Clusters: Mean and Ward’s

Mean Distance: The distance between two clusters is defined as the distance between the means or centroids of the two clusters:

δ(Ci, Cj) = δ(µi, µj)

Minimum Variance or Ward's Method: The distance between two clusters is defined as the increase in the sum of squared errors (SSE) when the two clusters are merged:

δ(Ci, Cj) = ∆SSEij = SSEij − SSEi − SSEj

where the SSE for a given cluster Ci is SSEi = Σ_{x∈Ci} ‖x − µi‖². After simplification, we get:

δ(Ci, Cj) = ( ni · nj / (ni + nj) ) · ‖µi − µj‖²

Ward's measure is therefore a weighted version of the mean distance measure.
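The simplification can be checked numerically; a sketch with two small random clusters (the data is arbitrary):

```python
# Numerical check that Ward's ∆SSE equals the closed form
# (ni·nj/(ni+nj)) · ||µi − µj||², on two random clusters.
import numpy as np

def sse(C):
    """Sum of squared errors of a cluster about its centroid."""
    return np.sum((C - C.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
Ci, Cj = rng.normal(size=(4, 2)), rng.normal(loc=3.0, size=(6, 2))

delta_sse = sse(np.vstack([Ci, Cj])) - sse(Ci) - sse(Cj)
ni, nj = len(Ci), len(Cj)
closed_form = ni * nj / (ni + nj) * np.sum((Ci.mean(0) - Cj.mean(0)) ** 2)
print(np.isclose(delta_sse, closed_form))  # True
```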


SLIDE 10

Single Link Agglomerative Clustering

Initial pairwise distance matrix:

δ   B  C  D  E
A   1  3  2  4
B      3  2  3
C         1  3
D            5

After merging A and B (single-link distance 1):

δ    C  D  E
AB   3  2  3
C       1  3
D          5

After merging C and D (distance 1):

δ    CD  E
AB   2   3
CD       3

After merging AB and CD (distance 2):

δ      E
ABCD   3

[Figure: the resulting single-link dendrogram over A, B, C, D, E, with merges at heights 1 (A, B), 1 (C, D), 2 (AB, CD), and 3 (ABCD, E).]
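These merges can be reproduced with SciPy by passing the initial distances as a condensed matrix (a sketch; ties such as the two distance-1 merges may be broken in either order):

```python
# Sketch: single-link clustering from the slide's distance matrix.
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
import numpy as np

# Rows/columns in the order A, B, C, D, E.
D = np.array([
    [0, 1, 3, 2, 4],
    [1, 0, 3, 2, 3],
    [3, 3, 0, 1, 3],
    [2, 2, 1, 0, 5],
    [4, 3, 3, 5, 0],
], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)  # each row: merged cluster ids, merge distance, new cluster size
```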


SLIDE 11

Lance–Williams Formula

Whenever two clusters Ci and Cj are merged into Cij, we need to update the distance matrix by recomputing the distances from the newly created cluster Cij to all other clusters Cr (r ≠ i and r ≠ j). The Lance–Williams formula provides a general equation to recompute the distances for all of the cluster proximity measures:

δ(Cij, Cr) = αi · δ(Ci, Cr) + αj · δ(Cj, Cr) + β · δ(Ci, Cj) + γ · |δ(Ci, Cr) − δ(Cj, Cr)|

The coefficients αi, αj, β, and γ differ from one measure to another.
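As a sketch, the formula itself is a one-liner; the coefficient values per measure appear on the next slide:

```python
# Sketch: the Lance–Williams distance update.
def lance_williams(d_ir, d_jr, d_ij, a_i, a_j, beta, gamma):
    return a_i * d_ir + a_j * d_jr + beta * d_ij + gamma * abs(d_ir - d_jr)

# Single link (αi = αj = 1/2, β = 0, γ = −1/2) recovers the minimum,
# complete link (γ = +1/2) recovers the maximum:
print(lance_williams(3.0, 2.0, 1.0, 0.5, 0.5, 0.0, -0.5))  # 2.0 = min(3, 2)
print(lance_williams(3.0, 2.0, 1.0, 0.5, 0.5, 0.0,  0.5))  # 3.0 = max(3, 2)
```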


SLIDE 12

Lance–Williams Formulas for Cluster Proximity

Measure          αi                    αj                    β                     γ
Single link      1/2                   1/2                   0                     −1/2
Complete link    1/2                   1/2                   0                     1/2
Group average    ni/(ni+nj)            nj/(ni+nj)            0                     0
Mean distance    ni/(ni+nj)            nj/(ni+nj)            −(ni·nj)/(ni+nj)²     0
Ward's measure   (ni+nr)/(ni+nj+nr)    (nj+nr)/(ni+nj+nr)    −nr/(ni+nj+nr)        0
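The table rows can be sanity-checked against direct computation; for example, the group-average row (β = γ = 0) must match the average pairwise distance after a merge. A self-contained sketch on arbitrary random clusters:

```python
# Sketch: check the group-average row of the table against a direct
# computation of the average pairwise distance after merging Ci and Cj.
import numpy as np

def avg_dist(A, B):
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()

rng = np.random.default_rng(1)
Ci, Cj, Cr = rng.normal(size=(3, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))

ni, nj = len(Ci), len(Cj)
updated = ni / (ni + nj) * avg_dist(Ci, Cr) + nj / (ni + nj) * avg_dist(Cj, Cr)
direct = avg_dist(np.vstack([Ci, Cj]), Cr)
print(np.isclose(updated, direct))  # True
```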


SLIDE 13

Iris Dataset: Complete Link Clustering

[Figure: scatter plot of the Iris dataset in the (u1, u2) plane, with the three complete-link clusters drawn as circles (C1), triangles (C2), and squares (C3).]

Contingency Table:

               iris-setosa   iris-virginica   iris-versicolor
C1 (circle)         50              0                0
C2 (triangle)        0              1               36
C3 (square)          0             49               14
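The experiment is easy to approximate with scikit-learn (a sketch; I am assuming u1 and u2 are the first two principal components, and the exact counts may differ slightly from the slide):

```python
# Sketch: complete-link clustering of Iris in a 2-D PCA projection,
# followed by a cluster-vs-species contingency table.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
import pandas as pd

iris = load_iris()
X2 = PCA(n_components=2).fit_transform(iris.data)   # assumed u1, u2
labels = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(X2)
print(pd.crosstab(labels, iris.target_names[iris.target]))
```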


SLIDE 14

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info
Mohammed J. Zaki¹  Wagner Meira Jr.²

¹Department of Computer Science
Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 14: Hierarchical Clustering
