
SLIDE 1

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info
Mohammed J. Zaki¹  Wagner Meira Jr.²

¹Department of Computer Science
Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 14: Hierarchical Clustering


SLIDE 2

Hierarchical Clustering

The goal of hierarchical clustering is to create a sequence of nested partitions, which can be conveniently visualized via a tree or hierarchy of clusters, also called the cluster dendrogram. The clusters in the hierarchy range from the fine-grained to the coarse-grained: the lowest level of the tree (the leaves) consists of each point in its own cluster, whereas the highest level (the root) consists of all points in one cluster.

Agglomerative hierarchical clustering methods work in a bottom-up manner. Starting with each of the n points in a separate cluster, they repeatedly merge the most similar pair of clusters until all points are members of the same cluster.
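As a quick illustration, here is a minimal sketch (the five points are made up) that builds such a dendrogram with SciPy and plots it:

```python
# Minimal sketch: build and plot a cluster dendrogram with SciPy.
# The data points are illustrative, not from the book.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [6.0, 0.5]])
Z = linkage(X, method="single")   # bottom-up merges, closest pair first
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.ylabel("merge distance")
plt.show()
```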


SLIDE 3

Hierarchical Clustering: Nested Partitions

Given a dataset D = {x1, ..., xn}, where xi ∈ Rᵈ, a clustering C = {C1, ..., Ck} is a partition of D. A clustering A = {A1, ..., Ar} is said to be nested in another clustering B = {B1, ..., Bs} if and only if r > s, and for each cluster Ai ∈ A there exists a cluster Bj ∈ B such that Ai ⊆ Bj.

Hierarchical clustering yields a sequence of n nested partitions C1, ..., Cn, where the clustering Ct−1 is nested in the clustering Ct. The cluster dendrogram is a rooted binary tree that captures this nesting structure, with an edge between cluster Ci ∈ Ct−1 and cluster Cj ∈ Ct if Ci is nested in Cj, that is, if Ci ⊂ Cj.
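The nesting condition is mechanical to check; here is a minimal sketch (the helper name is mine) using frozensets:

```python
# Sketch: test whether clustering A is nested in clustering B,
# i.e. |A| > |B| and every cluster of A is contained in some cluster of B.
def is_nested(A, B):
    return len(A) > len(B) and all(
        any(a <= b for b in B) for a in A   # a ⊆ b for some b ∈ B
    )

A = [frozenset("AB"), frozenset("CD"), frozenset("E")]   # C3 from the next slide
B = [frozenset("ABCD"), frozenset("E")]                  # C4 from the next slide
print(is_nested(A, B))  # True
```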


SLIDE 4

Hierarchical Clustering Dendrogram

[Figure: dendrogram over points A–E. A and B merge into AB, C and D merge into CD, AB and CD merge into ABCD, and finally ABCD and E merge into ABCDE at the root.]

The dendrogram represents the following sequence of nested partitions:

Clustering   Clusters
C1           {A}, {B}, {C}, {D}, {E}
C2           {AB}, {C}, {D}, {E}
C3           {AB}, {CD}, {E}
C4           {ABCD}, {E}
C5           {ABCDE}

with Ct−1 ⊂ Ct for t = 2,...,5. We assume that A and B are merged before C and D.


SLIDE 5

Number of Hierarchical Clusterings

The total number of different dendrograms with n leaves is given as:

∏_{m=1}^{n−1} (2m − 1) = 1 × 3 × 5 × 7 × ··· × (2n − 3) = (2n − 3)!!

For example, for n = 3 the count is (2·3 − 3)!! = 3!! = 3, matching the three trees in panel (c) below.

[Figure: all possible dendrograms with (a) n = 1 leaf (one tree), (b) n = 2 leaves (one tree), and (c) n = 3 leaves (three trees).]
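A direct way to evaluate this count (a minimal sketch, nothing book-specific) is the double factorial:

```python
# Sketch: number of distinct dendrograms over n leaves, (2n − 3)!!
from math import prod

def num_dendrograms(n):
    return prod(2 * m - 1 for m in range(1, n))   # product over m = 1 .. n−1

print([num_dendrograms(n) for n in range(1, 6)])  # [1, 1, 3, 15, 105]
```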


SLIDE 6

Agglomerative Hierarchical Clustering

In agglomerative hierarchical clustering, we begin with each of the n points in a separate cluster. We repeatedly merge the two closest clusters until all points are members of the same cluster.

Given a set of clusters C = {C1, C2, ..., Cm}, we find the closest pair of clusters Ci and Cj and merge them into a new cluster Cij = Ci ∪ Cj. Next, we update the set of clusters by removing Ci and Cj and adding Cij, that is, C = (C \ {Ci, Cj}) ∪ {Cij}. We repeat the process until C contains only one cluster. If a target number of clusters k is specified, we can instead stop the merging process when exactly k clusters remain.


SLIDE 7

Agglomerative Hierarchical Clustering Algorithm

AgglomerativeClustering(D, k):

1  C ← {Ci = {xi} | xi ∈ D}              // Each point in a separate cluster
2  ∆ ← {δ(xi, xj) : xi, xj ∈ D}          // Compute distance matrix
3  repeat
4      Find the closest pair of clusters Ci, Cj ∈ C
5      Cij ← Ci ∪ Cj                     // Merge the clusters
6      C ← (C \ {Ci, Cj}) ∪ {Cij}        // Update the clustering
7      Update distance matrix ∆ to reflect the new clustering
8  until |C| = k
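For concreteness, here is a naive O(n³) Python sketch of this procedure using the single-link distance (illustrative, not the authors' implementation):

```python
# Naive sketch of AgglomerativeClustering(D, k) with single-link
# distances; recomputes cluster distances from the point-level
# distance matrix instead of maintaining ∆ incrementally.
import numpy as np

def agglomerative_clustering(X, k):
    clusters = [[i] for i in range(len(X))]   # each point in its own cluster
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > k:
        # find the closest pair of clusters (single link)
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])       # Cij = Ci ∪ Cj
        del clusters[b]                       # C = (C \ {Ci, Cj}) ∪ {Cij}
    return clusters

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]], dtype=float)
print(agglomerative_clustering(X, k=2))       # [[0, 1], [2, 3, 4]]
```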


SLIDE 8

Distance between Clusters

Single, Complete and Average

A typical distance between two points is the Euclidean distance or L2-norm:

δ(x, y) = ‖x − y‖₂ = ( Σ_{i=1}^{d} (xi − yi)² )^{1/2}

Single Link: The minimum distance between a point in Ci and a point in Cj:

δ(Ci, Cj) = min{δ(x, y) | x ∈ Ci, y ∈ Cj}

Complete Link: The maximum distance between points in the two clusters:

δ(Ci, Cj) = max{δ(x, y) | x ∈ Ci, y ∈ Cj}

Group Average: The average pairwise distance between points in Ci and Cj:

δ(Ci, Cj) = ( Σ_{x∈Ci} Σ_{y∈Cj} δ(x, y) ) / (ni · nj)
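The three measures translate directly into code; a small sketch over NumPy arrays (the helper names are mine):

```python
# Sketch: single, complete, and group-average distances between two
# clusters Ci, Cj given as (n, d) NumPy arrays.
import numpy as np

def pairwise(Ci, Cj):
    """All Euclidean distances δ(x, y) for x ∈ Ci, y ∈ Cj."""
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=-1)

def single_link(Ci, Cj):   return pairwise(Ci, Cj).min()
def complete_link(Ci, Cj): return pairwise(Ci, Cj).max()
def group_average(Ci, Cj): return pairwise(Ci, Cj).mean()  # sum / (ni · nj)

Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [6.0, 0.0]])
print(single_link(Ci, Cj), complete_link(Ci, Cj), group_average(Ci, Cj))
# 3.0 6.0 4.5
```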


SLIDE 9

Distance between Clusters: Mean and Ward’s

Mean Distance: The distance between two clusters is defined as the distance between the means or centroids of the two clusters:

δ(Ci, Cj) = δ(µi, µj)

Minimum Variance or Ward's Method: The distance between two clusters is defined as the increase in the sum of squared errors (SSE) when the two clusters are merged:

δ(Ci, Cj) = ∆SSEij = SSEij − SSEi − SSEj

where the SSE for a given cluster Ci is SSEi = Σ_{x∈Ci} ‖x − µi‖². After simplification, we get:

δ(Ci, Cj) = ( ni · nj / (ni + nj) ) · ‖µi − µj‖²

Ward's measure is therefore a weighted version of the mean distance measure.
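The simplification can be checked numerically; a sketch with two small random clusters (the data is arbitrary):

```python
# Numerical check that Ward's ∆SSE equals the closed form
# (ni·nj/(ni+nj)) · ||µi − µj||², on two random clusters.
import numpy as np

def sse(C):
    """Sum of squared errors of a cluster about its centroid."""
    return np.sum((C - C.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
Ci, Cj = rng.normal(size=(4, 2)), rng.normal(loc=3.0, size=(6, 2))

delta_sse = sse(np.vstack([Ci, Cj])) - sse(Ci) - sse(Cj)
ni, nj = len(Ci), len(Cj)
closed_form = ni * nj / (ni + nj) * np.sum((Ci.mean(0) - Cj.mean(0)) ** 2)
print(np.isclose(delta_sse, closed_form))  # True
```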


SLIDE 10

Single Link Agglomerative Clustering

Initial pairwise distance matrix:

δ   B  C  D  E
A   1  3  2  4
B      3  2  3
C         1  3
D            5

After merging A and B (single-link distance 1):

δ    C  D  E
AB   3  2  3
C       1  3
D          5

After merging C and D (distance 1):

δ    CD  E
AB   2   3
CD       3

After merging AB and CD (distance 2):

δ      E
ABCD   3

[Figure: the resulting single-link dendrogram over A, B, C, D, E, with merges at heights 1 (A, B), 1 (C, D), 2 (AB, CD), and 3 (ABCD, E).]
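These merges can be reproduced with SciPy by passing the initial distances as a condensed matrix (a sketch; ties such as the two distance-1 merges may be broken in either order):

```python
# Sketch: single-link clustering from the slide's distance matrix.
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
import numpy as np

# Rows/columns in the order A, B, C, D, E.
D = np.array([
    [0, 1, 3, 2, 4],
    [1, 0, 3, 2, 3],
    [3, 3, 0, 1, 3],
    [2, 2, 1, 0, 5],
    [4, 3, 3, 5, 0],
], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)  # each row: merged cluster ids, merge distance, new cluster size
```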


SLIDE 11

Lance–Williams Formula

Whenever two clusters Ci and Cj are merged into Cij, we need to update the distance matrix by recomputing the distances from the newly created cluster Cij to all other clusters Cr (r ≠ i and r ≠ j). The Lance–Williams formula provides a general equation to recompute the distances for all of the cluster proximity measures:

δ(Cij, Cr) = αi · δ(Ci, Cr) + αj · δ(Cj, Cr) + β · δ(Ci, Cj) + γ · |δ(Ci, Cr) − δ(Cj, Cr)|

The coefficients αi, αj, β, and γ differ from one measure to another.
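As a sketch, the formula itself is a one-liner; the coefficient values per measure appear on the next slide:

```python
# Sketch: the Lance–Williams distance update.
def lance_williams(d_ir, d_jr, d_ij, a_i, a_j, beta, gamma):
    return a_i * d_ir + a_j * d_jr + beta * d_ij + gamma * abs(d_ir - d_jr)

# Single link (αi = αj = 1/2, β = 0, γ = −1/2) recovers the minimum,
# complete link (γ = +1/2) recovers the maximum:
print(lance_williams(3.0, 2.0, 1.0, 0.5, 0.5, 0.0, -0.5))  # 2.0 = min(3, 2)
print(lance_williams(3.0, 2.0, 1.0, 0.5, 0.5, 0.0,  0.5))  # 3.0 = max(3, 2)
```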


SLIDE 12

Lance–Williams Formulas for Cluster Proximity

Measure          αi                    αj                    β                     γ
Single link      1/2                   1/2                   0                     −1/2
Complete link    1/2                   1/2                   0                     1/2
Group average    ni/(ni+nj)            nj/(ni+nj)            0                     0
Mean distance    ni/(ni+nj)            nj/(ni+nj)            −(ni·nj)/(ni+nj)²     0
Ward's measure   (ni+nr)/(ni+nj+nr)    (nj+nr)/(ni+nj+nr)    −nr/(ni+nj+nr)        0
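The table rows can be sanity-checked against direct computation; for example, the group-average row (β = γ = 0) must match the average pairwise distance after a merge. A self-contained sketch on arbitrary random clusters:

```python
# Sketch: check the group-average row of the table against a direct
# computation of the average pairwise distance after merging Ci and Cj.
import numpy as np

def avg_dist(A, B):
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).mean()

rng = np.random.default_rng(1)
Ci, Cj, Cr = rng.normal(size=(3, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))

ni, nj = len(Ci), len(Cj)
updated = ni / (ni + nj) * avg_dist(Ci, Cr) + nj / (ni + nj) * avg_dist(Cj, Cr)
direct = avg_dist(np.vstack([Ci, Cj]), Cr)
print(np.isclose(updated, direct))  # True
```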


SLIDE 13

Iris Dataset: Complete Link Clustering

[Figure: scatter plot of the Iris dataset in the (u1, u2) plane, with the three complete-link clusters drawn as circles (C1), triangles (C2), and squares (C3).]

Contingency Table:

               iris-setosa   iris-virginica   iris-versicolor
C1 (circle)         50              0                0
C2 (triangle)        0              1               36
C3 (square)          0             49               14
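The experiment is easy to approximate with scikit-learn (a sketch; I am assuming u1 and u2 are the first two principal components, and the exact counts may differ slightly from the slide):

```python
# Sketch: complete-link clustering of Iris in a 2-D PCA projection,
# followed by a cluster-vs-species contingency table.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
import pandas as pd

iris = load_iris()
X2 = PCA(n_components=2).fit_transform(iris.data)   # assumed u1, u2
labels = AgglomerativeClustering(n_clusters=3, linkage="complete").fit_predict(X2)
print(pd.crosstab(labels, iris.target_names[iris.target]))
```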


SLIDE 14

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info
Mohammed J. Zaki¹  Wagner Meira Jr.²

¹Department of Computer Science
Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 14: Hierarchical Clustering
