Hierarchical Clustering - MAT 6480W / STT 6705V - Guy Wolf (Université de Montréal, Fall 2019)


SLIDE 1

Geometric Data Analysis

Hierarchical Clustering

MAT 6480W / STT 6705V

Guy Wolf guy.wolf@umontreal.ca

Université de Montréal, Fall 2019

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 1 / 17

SLIDE 2

Outline

1. Hierarchical clustering
   - Divisive & agglomerative approaches
   - Dendrogram visualization
   - Bisecting k-means

2. Agglomerative clustering
   - Single linkage
   - Complete linkage
   - Average linkage
   - Ward’s method

3. Large-scale clustering
   - CURE
   - BIRCH
   - Chameleon

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 2 / 17

SLIDE 3

Hierarchical clustering

Question: how many clusters should we find in the data? Suggestion: why not consider all options in a single hierarchy?

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 3 / 17

SLIDE 8

Hierarchical clustering

A hierarchical approach can be useful when considering versatile cluster shapes: [figure: 2-means result]

By first detecting many small clusters, and then merging them, we can uncover patterns that are challenging for partitional methods.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 3 / 17

SLIDE 9

Hierarchical clustering

A hierarchical approach can be useful when considering versatile cluster shapes: [figure: 10-means result]

By first detecting many small clusters, and then merging them, we can uncover patterns that are challenging for partitional methods.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 3 / 17

SLIDE 10

Hierarchical clustering

Divisive & agglomerative approaches

Hierarchical clustering methods produce a set of nested clusters organized in a hierarchy tree. The cluster hierarchy is typically visualized using dendrograms.

Such approaches are applied either to provide multiresolution data organization, or to alleviate computational challenges when clustering big datasets.

In general, two approaches are applied to build nested clusters: divisive clustering and agglomerative clustering. Divisive approaches start with the entire data as one cluster, and then iteratively split “loose” clusters until a stopping criterion (e.g., k clusters or tight enough clusters) is satisfied. Agglomerative approaches start with small tight clusters, or even with single-point clusters, and then iteratively merge close clusters until only a single one remains.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 4 / 17

SLIDE 11

Hierarchical clustering

Dendrogram visualization

A dendrogram is a tree graph that visualizes a sequence of cluster merges or divisions: Divisive methods take a top-down root → leaves approach. Agglomerative ones take a bottom-up leaves → root approach.
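To make the dendrogram concrete, here is a minimal sketch using SciPy's agglomerative clustering utilities; the toy data, the 'average' linkage choice, and the plotting details are illustrative and not part of the course material.

```python
# Illustrative sketch: building and plotting a dendrogram with SciPy's
# agglomerative clustering utilities (sample data and parameters are arbitrary).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Toy dataset: three Gaussian blobs in the plane
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in ([0, 0], [3, 0], [0, 3])])

# Bottom-up (agglomerative) merge sequence; 'average' is one of several linkage choices
Z = linkage(X, method="average")

# The dendrogram visualizes the merge hierarchy from leaves (points) to root (one cluster)
dendrogram(Z)
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()
```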

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 5 / 17

SLIDE 12

Hierarchical clustering

Bisecting k-means

Bisecting k-means is a divisive algorithm that utilizes k-means iteratively to bisect the data into clusters.

Bisecting k-means
- Use 2-means to split the data into two clusters¹
- While there are fewer than k clusters:
  - Select C as the cluster with the highest SSE
  - Use 2-means to split C into two clusters¹
  - Replace C with the two new clusters

The hierarchical approach in this case is used to stabilize some of the weaknesses of the original k-means algorithm, and not for data organization purposes.

¹ Choose the best SSE out of t attempts
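As a rough illustration of the procedure above, the following sketch implements bisecting k-means on top of scikit-learn's KMeans, with n_init playing the role of the t attempts; the function name bisecting_kmeans and its default values are made up for this example.

```python
# Hedged sketch of bisecting k-means using scikit-learn's KMeans for each 2-means split.
# The function name and defaults are illustrative, not the course's reference code.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, t=10, random_state=0):
    """Return a list of index arrays, one per cluster (k clusters in total)."""
    clusters = [np.arange(len(X))]          # start with all points in one cluster
    while len(clusters) < k:
        # Select the cluster with the highest SSE (sum of squared distances to its mean)
        sses = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        worst = int(np.argmax(sses))
        idx = clusters.pop(worst)
        # Split it with 2-means, keeping the best of t attempts (lowest SSE)
        km = KMeans(n_clusters=2, n_init=t, random_state=random_state).fit(X[idx])
        clusters.append(idx[km.labels_ == 0])
        clusters.append(idx[km.labels_ == 1])
    return clusters
```

For instance, bisecting_kmeans(X, k=4) would return four index sets whose union covers the data.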

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 6 / 17

SLIDE 13

Hierarchical clustering

Bisecting k-means

Example

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 6 / 17

SLIDE 23

Agglomerative clustering

Agglomerative clustering approaches are more popular than divisive ones. They all use variations of the following simple algorithm:

Agglomerative clustering paradigm
- Build a singleton cluster for each data point
- Repeat the following steps:
  - Find the two closest clusters
  - Merge these two clusters together
- Until there is only a single cluster

Two main choices distinguish agglomerative clustering algorithms:
1. How to quantify proximity between clusters
2. How to merge clusters and efficiently update this proximity
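A naive sketch of this paradigm is given below, using the minimum pairwise distance (single-linkage style) as the cluster proximity; the helper names and the brute-force search are purely illustrative, and real implementations maintain and update a proximity matrix instead.

```python
# Naive sketch of the agglomerative paradigm: repeatedly merge the two closest clusters.
# Cluster proximity here is single linkage (minimum pairwise distance); names are illustrative.
import numpy as np
from itertools import combinations

def cluster_distance(X, a, b):
    """Single-linkage distance between clusters a and b (lists of point indices)."""
    return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

def agglomerate(X):
    """Return the sequence of merges as (cluster_a, cluster_b, distance) triples."""
    clusters = [[i] for i in range(len(X))]   # one singleton cluster per data point
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters...
        (ia, ib) = min(combinations(range(len(clusters)), 2),
                       key=lambda p: cluster_distance(X, clusters[p[0]], clusters[p[1]]))
        a, b = clusters[ia], clusters[ib]
        merges.append((a, b, cluster_distance(X, a, b)))
        # ...and merge them into one cluster
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [a + b]
    return merges
```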

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 7 / 17

SLIDE 24

Agglomerative clustering

Agglomerative clustering approaches are more popular than divisive ones. They all use variations of the following simple algorithm:

Agglomerative clustering paradigm
- Build a singleton cluster for each data point
- Repeat the following steps:
  - Find the two closest clusters
  - Merge these two clusters together
- Until there is only a single cluster

With proper implementation, this approach is also helpful for Big Data processing, since each iteration considers a smaller coarse-grained version of the dataset.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 7 / 17

SLIDE 29

Agglomerative clustering

Linkage

How to quantify distance or similarity between clusters?

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 8 / 17

SLIDE 30

Agglomerative clustering

Linkage

How to quantify distance or similarity between clusters? Suggestion #1: represent clusters by centroids and use distance/similarity between them.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 8 / 17

SLIDE 31

Agglomerative clustering

Linkage

How to quantify distance or similarity between clusters? Suggestion #1: represent clusters by centroids and use distance/similarity between them. Problem: this approach ignores the shapes of the clusters.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 8 / 17

SLIDE 32

Agglomerative clustering

Linkage

How to quantify distance or similarity between clusters? Suggestion #1: represent clusters by centroids and use distance/similarity between them. Problem: this approach ignores the shapes of the clusters. Suggestion #2: combine pairwise distances between each point in one cluster and each point in the other cluster. This approach is called linkage.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 8 / 17

SLIDE 33

Agglomerative clustering

Single linkage

Single linkage uses minimal distance (or maximum similarity) between a point in one cluster and a point in the other cluster. Only one inter-cluster link determines the distance, while many other links can be significantly weaker.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 9 / 17

SLIDE 34

Agglomerative clustering

Single linkage

Example

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 9 / 17

SLIDE 35

Agglomerative clustering

Complete linkage

Complete linkage uses maximal distance (or minimal similarity) between a point in one cluster and a point in the other cluster. In some sense, all inter-cluster links are considered since they must all be strong to have a small distance.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 10 / 17

SLIDE 36

Agglomerative clustering

Complete linkage

Example

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 10 / 17

SLIDE 37

Agglomerative clustering

Average linkage

Average linkage uses mean distance (or similarity) between points in one cluster and points in the other cluster.

Less susceptible than single- and complete-linkage to noise and outliers, but biased toward globular clusters.
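The three linkage rules can be written directly in terms of the pairwise distances between the two clusters; a small illustrative sketch (the function names are made up):

```python
# Illustrative definitions of single, complete, and average linkage between two clusters,
# computed from all pairwise distances between a point in A and a point in B.
import numpy as np

def pairwise_dists(A, B):
    """All distances between rows of A (one cluster) and rows of B (the other cluster)."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def single_linkage(A, B):
    return pairwise_dists(A, B).min()      # distance of the closest pair

def complete_linkage(A, B):
    return pairwise_dists(A, B).max()      # distance of the farthest pair

def average_linkage(A, B):
    return pairwise_dists(A, B).mean()     # mean over all inter-cluster pairs
```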

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 11 / 17

SLIDE 38

Agglomerative clustering

Average linkage

Example

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 11 / 17

SLIDE 39

Agglomerative clustering

Ward’s method

Instead of considering connectivity between clusters, we can also consider the impact of merging clusters on their quality. Ward’s method compares the total SSE of the two clusters to the SSE of a single cluster obtained by merging them. It is similar to average-linkage with squared distances as dissimilarities. Like average-linkage, it is biased toward globular clusters while being somewhat stable to noise and outliers. Ward’s method provides an agglomerative/hierarchical analogue to k-means.
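In practice, all four strategies are typically obtained by switching the linkage rule of an off-the-shelf routine; a hedged example with SciPy's hierarchy module, on toy data and with an arbitrary choice of three clusters:

```python
# Comparing single, complete, average, and Ward linkage with SciPy's hierarchy module.
# The toy data and the choice of three clusters are illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(30, 2)) for c in ([0, 0], [4, 0], [2, 3])])

for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)                    # full merge hierarchy
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
    print(method, np.bincount(labels)[1:])           # cluster sizes for each linkage rule
```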

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 12 / 17

SLIDE 40

Large-scale clustering

Hierarchical clustering is not only useful for data organization, but also for large-scale data processing, even without special interpretability. A common approach for clustering big data is to iteratively coarse-grain the data to reduce its size, until a desired resolution (e.g., number or size of clusters) is reached. Each coarse-graining iteration is achieved by finding (and merging) small tight clusters.

Two of the main challenges in implementing such approaches are:
1. Finding a compact representation of clusters that allows merging and comparisons
2. An efficient data scanning strategy during the initial cluster construction process and the coarse-graining iterations

Such methods also apply various advanced implementation techniques that are beyond the scope of this course.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 13 / 17

SLIDE 41

Large-scale clustering

CURE

CURE (Clustering Using REpresentatives) extends the idea of k-means by choosing a small set of r points to represent the cluster instead of a single centroid point.

Given a cluster, CURE chooses a set of representative points {x1, …, xr} using the following steps:
- Compute the centroid ĉ of the cluster.
- Set x1 to be the farthest point in the cluster from ĉ.
- For i = 2, …, r, set xi to be the farthest point from all previous representatives x1, …, xi−1.

Notice that these representatives aim to capture the borders of the cluster rather than its center, unlike k-means.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 14 / 17

SLIDE 42

Large-scale clustering

CURE

Using border points as cluster representatives allows CURE to capture non-globular and concave-shaped clusters. However, the farthest-point selection is sensitive to noise and outliers, so CURE “shrinks” these points toward the cluster center. Each representative xi is replaced with x̂i = xi − α(xi − ĉ). Since this shrinkage is relative to the distance, outliers are more affected by it than other points.

The shrinkage factor α controls the correction magnitude, and setting α = 1 gives the classic centroid-based cluster representation.
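A minimal sketch of these two steps, farthest-point selection followed by the shrinkage x̂i = xi − α(xi − ĉ); the function name and the default values of r and α are illustrative.

```python
# Sketch of CURE-style representative selection: farthest-point picks followed by
# shrinkage toward the centroid. Function name and default values are illustrative.
import numpy as np

def cure_representatives(cluster_points, r=5, alpha=0.2):
    """Pick r border-capturing representatives of a cluster and shrink them toward its centroid."""
    c_hat = cluster_points.mean(axis=0)                 # cluster centroid
    reps = []
    # x1: farthest point from the centroid
    reps.append(cluster_points[np.argmax(np.linalg.norm(cluster_points - c_hat, axis=1))])
    # x2..xr: each farthest (in min-distance sense) from all previously chosen representatives
    for _ in range(1, r):
        dists = np.min(
            [np.linalg.norm(cluster_points - rep, axis=1) for rep in reps], axis=0)
        reps.append(cluster_points[np.argmax(dists)])
    reps = np.array(reps)
    # Shrink each representative toward the centroid: x_hat = x - alpha * (x - c_hat)
    return reps - alpha * (reps - c_hat)
```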

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 14 / 17

SLIDE 43

Large-scale clustering

CURE

Using the cluster representatives, CURE applies a single-link agglomerative clustering approach, based on the minimal distance between representatives. Additionally, instead of clustering the entire data at once, CURE partitions the dataset into smaller local partitions. Then, agglomerative clustering is applied to each of them (e.g., in parallel). Finally, the coarse-grained data from these clusterings are merged together, and agglomerative clustering is applied on this cluster collection. The hierarchical approach in CURE is mainly aimed at coping with computational challenges, rather than finding a hierarchical data organization.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 14 / 17

SLIDE 44

Large-scale clustering

CURE

The full CURE algorithm also includes sampling and outlier-removal steps, as described in the following pipeline: [pipeline figure]

More details can be found in: CURE: an efficient clustering algorithm for large databases (Guha, Rastogi, & Shim, 1998).

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 14 / 17

SLIDE 45

Large-scale clustering

BIRCH

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is an efficient centroid-based clustering method that aims to reduce memory-related overheads of the clustering process.

The main principle in BIRCH is to use a single scan of the data in order to produce tight clusters that can then be iteratively merged to form a cluster hierarchy. To enable this scan, the algorithm requires a compressed in-memory data structure with efficient amortized insertion time for each newly scanned data point.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 46

Large-scale clustering

BIRCH

The BIRCH algorithm introduces the notion of Clustering Features (CF) to compactly represent clusters:

Cluster features

Given a cluster $C = \{x_1, \ldots, x_m\} \subseteq \mathbb{R}^n$, its cluster features are the triple $CF = (m, LS, SS) \in \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^n$ where $LS = \sum_{i=1}^{m} x_i$ and $SS[j] = \sum_{i=1}^{m} (x_i[j])^2$, $j = 1, \ldots, n$.

Notice that given two disjoint clusters $C_1$ and $C_2$, their features are easily merged as $CF_{1,2} = CF_1 + CF_2$ for $C_1 \cup C_2$.
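A small sketch of the CF triple and its additive merge (the class name and layout are made up for illustration):

```python
# Sketch of a BIRCH clustering feature: CF = (m, LS, SS), merged by componentwise addition.
# The class name and structure are illustrative.
import numpy as np
from dataclasses import dataclass

@dataclass
class CF:
    m: int            # number of points in the cluster
    LS: np.ndarray    # linear sum of the points, sum_i x_i
    SS: np.ndarray    # coordinatewise sum of squares, SS[j] = sum_i x_i[j]^2

    @classmethod
    def from_points(cls, X):
        return cls(len(X), X.sum(axis=0), (X ** 2).sum(axis=0))

    def merge(self, other):
        # Disjoint clusters merge by simply adding their features: CF_{1,2} = CF_1 + CF_2
        return CF(self.m + other.m, self.LS + other.LS, self.SS + other.SS)
```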

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 47

Large-scale clustering

BIRCH

Not only are the CF easily merged, but they also hold sufficient information for the computation of many important cluster properties, such as centroid, radius, diameter, and SSE.

Examples

Centroid: $\hat{c} = \frac{1}{m}\sum_{x \in C} x = \frac{1}{m} LS$

Radius: consider $R^2 = \frac{1}{m}\sum_{x \in C} \|x - \hat{c}\|^2$; then $\sum_{x \in C} \|x - \hat{c}\|^2 = \sum_{x \in C} \|x\|^2 + m\|\hat{c}\|^2 - 2\sum_{x \in C} \langle x, \hat{c}\rangle$, but then $\sum_{x \in C} \langle x, \hat{c}\rangle = \langle \sum_{x \in C} x, \hat{c}\rangle = m\|\hat{c}\|^2$ and $\sum_{x \in C} \|x\|^2 = \sum_{x \in C}\sum_{j=1}^{n} (x[j])^2 = \sum_{j=1}^{n} SS[j]$, thus we get $R = \sqrt{\frac{1}{m}\|SS\|_1 - \frac{1}{m^2}\|LS\|_2^2}$
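These formulas can be checked numerically from the CF triple alone; a short illustrative sketch on arbitrary toy data:

```python
# Centroid and radius computed from the CF triple, following the derivation above:
#   c_hat = LS / m   and   R = sqrt( SS.sum()/m - ||LS||^2 / m^2 )
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
m, LS, SS = len(X), X.sum(axis=0), (X ** 2).sum(axis=0)   # the CF triple of this cluster

c_hat = LS / m                                             # centroid from CF
R = np.sqrt(SS.sum() / m - np.dot(LS, LS) / m ** 2)        # radius from CF

# Sanity check against the direct definitions
c = X.mean(axis=0)
assert np.allclose(c_hat, c)
assert np.isclose(R, np.sqrt(((X - c) ** 2).sum(axis=1).mean()))
```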

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 48

Large-scale clustering

BIRCH

The clusters in BIRCH are built incrementally by scanning the dataset and inserting each data point into the closest cluster. These insertions amount to simple updates of the CF of the cluster. However, the clusters are constrained to have a bounded diameter, and if no cluster can absorb a data point, the scan creates a new singleton cluster.

In order to enable efficient nearest-cluster searches, the CFs are organized in a balanced CF-tree. The size of the tree is determined by technical considerations (e.g., memory size) in order to minimize paging and I/O overheads of the scanning process.
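A much-simplified sketch of this incremental scan, using a flat list of CFs instead of the actual CF-tree and a radius bound in place of the diameter constraint; the threshold and names are illustrative.

```python
# Simplified sketch of BIRCH-style incremental insertion: each scanned point is absorbed by
# the closest existing cluster unless that would exceed a tightness bound (radius here, for
# simplicity), in which case a new singleton cluster is created. A flat list of CFs stands
# in for the actual CF-tree.
import numpy as np

def birch_scan(X, max_radius=0.5):
    cfs = []                                   # each CF is a dict {m, LS, SS}
    for x in X:
        if cfs:
            centroids = np.array([cf["LS"] / cf["m"] for cf in cfs])
            i = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))   # closest cluster
            cand = {"m": cfs[i]["m"] + 1, "LS": cfs[i]["LS"] + x, "SS": cfs[i]["SS"] + x ** 2}
            R = np.sqrt(cand["SS"].sum() / cand["m"]
                        - np.dot(cand["LS"], cand["LS"]) / cand["m"] ** 2)
            if R <= max_radius:                # absorb: tightness constraint still satisfied
                cfs[i] = cand
                continue
        # Otherwise start a new singleton cluster
        cfs.append({"m": 1, "LS": x.copy(), "SS": x ** 2})
    return cfs
```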

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 49

Large-scale clustering

BIRCH

Each node in the tree is limited to have at most B clusters. The leaves of this tree hold tight clusters, while other nodes hold super-clusters, which also correspond to branches & child-nodes.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 50

Large-scale clustering

BIRCH

When a new point is scanned, the algorithm recursively finds the closest CF in each node (starting from the root) and follows the corresponding branch to traverse the CF tree until the closest CF is found in a leaf node. Once a CF is found in a leaf node, the algorithm checks whether it can absorb the data point under the bounded diameter constraint. If a data point is absorbed by a cluster, its CF is updated accordingly; otherwise a new CF is created in the leaf node.

If the leaf node now has more than B clusters, it is split in two, and the CF entries in its parent node are updated to replace the CF entry of the removed branch with two CF entries for the added branches.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 51

Large-scale clustering

BIRCH

Once a CF is found in a leaf node, the algorithm checks whether it can absorb the data point under the bounded diameter constraint. If a data point is absorbed by a cluster, its CF is updated accordingly; otherwise a new CF is created in the leaf node.

If the leaf node now has more than B clusters, it is split in two, and the CF entries in its parent node are updated to replace the CF entry of the removed branch with two CF entries for the added branches.

In any case, each update also triggers updates in all the ancestor nodes on the path toward the root of the CF tree. Similar to the leaf node update, these updates may cause some internal nodes to be split. If the root splits, it becomes an internal node and a new root is created.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 52

Large-scale clustering

BIRCH

The full BIRCH algorithm uses the following steps: [pipeline figure]

More details can be found in: BIRCH: an efficient data clustering method for very large databases (Zhang, Ramakrishnan, & Livny, 1996).

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 15 / 17

SLIDE 53

Large-scale clustering

Chameleon

Chameleon uses graph partitioning together with graph-oriented agglomerative clustering to enable efficient and robust clustering in big datasets. It follows three main phases.

Preprocessing phase: Chameleon starts by computing a sparse k-NN graph to capture local relationships between data points. Notice that k-NN neighborhoods are more robust in variable-density data.
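The preprocessing phase can be sketched with scikit-learn's k-NN graph utility; note that Chameleon works with similarity weights, whereas the graph below stores raw distances, and the choice of k is arbitrary.

```python
# Sketch of Chameleon's preprocessing phase: a sparse k-NN graph over the data points.
# The choice of k and the use of scikit-learn here are illustrative; Chameleon itself
# operates on similarity weights rather than the raw distances stored below.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))

# Sparse adjacency matrix: each point is connected to its k nearest neighbors,
# with edge weights given by the distances (mode="distance").
knn_graph = kneighbors_graph(X, n_neighbors=10, mode="distance", include_self=False)
# Symmetrize so that an edge exists if either endpoint selects the other as a neighbor
knn_graph = knn_graph.maximum(knn_graph.T)
```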

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 16 / 17

SLIDE 54

Large-scale clustering

Chameleon

Partitioning phase: multilevel graph partitioning is used to find many well-connected clusters in the data. The working assumption of the algorithm is that these should be subclusters of the true data clusters.

Hierarchical phase: agglomerative clustering is applied to iteratively merge (sub)clusters. Instead of using linkage, Chameleon considers interconnectivity and closeness between two clusters. Each iteration merges a pair of clusters with the highest relative interconnectivity and relative closeness.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 16 / 17

SLIDE 55

Large-scale clustering

Chameleon

Interconnectivity between clusters is defined as the sum of edge weights that cross from one cluster to another. Closeness between clusters is defined as the average of these weights. The relative version of these quantities is obtained by normalization with the average corresponding quantity measured over bisections that split each cluster into two equal-size parts. More details can be found in: Chameleon: Hierarchical clustering using dynamic modeling (Karypis, Han, & Kumar, 1999).
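The absolute versions of these two quantities can be read directly off a similarity-weighted adjacency matrix; a small sketch (the relative, bisection-normalized variants used by Chameleon are omitted):

```python
# Sketch of (absolute) interconnectivity and closeness between two clusters on a
# similarity-weighted graph: the sum and the mean of the edge weights crossing between them.
# The relative versions (normalized by each cluster's internal bisection) are omitted here.
import numpy as np

def interconnectivity_and_closeness(W, cluster_a, cluster_b):
    """W: symmetric similarity matrix; cluster_a, cluster_b: disjoint lists of node indices."""
    crossing = W[np.ix_(cluster_a, cluster_b)]            # weights of edges from A to B
    weights = crossing[crossing > 0]                       # keep only existing edges
    ec = weights.sum()                                     # interconnectivity: total crossing weight
    closeness = weights.mean() if weights.size else 0.0    # closeness: average crossing weight
    return ec, closeness
```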

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 16 / 17

SLIDE 56

Summary

Hierarchical clustering provides a multiscale data organization. Dendrograms are typically used for visualizing the recovered nested cluster structure.

Agglomerative clustering is a popular approach to build cluster hierarchies, based on linkage or on the impact on a suitable cluster quality measure. It is also useful as a coarse-graining tool for scalable data processing.

CURE performs scalable clustering based on sampling, partitioning, and representative selection. BIRCH uses an efficient CF-tree construction to optimize memory-handling overheads of the clustering process. Chameleon is based on sparse graph partitioning, and an alternative cluster proximity that combines closeness and interconnectivity.

Many different methods are available to improve & extend these principles in both general and application-specific settings.

MAT 6480W (Guy Wolf) Hierarchical Clustering UdeM - Fall 2019 17 / 17