SLIDE 1

Finding Clusters

Types of Clustering Approaches:

  • Linkage Based, e.g. Hierarchical Clustering
  • Clustering by Partitioning, e.g. k-Means
  • Density Based Clustering, e.g. DBScan
  • Grid Based Clustering

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä.

SLIDE 2

Hierarchical Clustering

SLIDE 3

Hierarchical clustering

(Figure: two-dimensional MDS scatter plot of the Iris data; legend: Iris setosa, Iris versicolor, Iris virginica.)

In the two-dimensional MDS (Sammon mapping) representation of the Iris data set, two clusters can be identified. (The colours, indicating the species of the flowers, are ignored here.)

SLIDE 4

Hierarchical clustering

Hierarchical clustering builds clusters step by step.

SLIDE 5

Hierarchical clustering

Hierarchical clustering builds clusters step by step. Usually a bottom-up strategy is applied by first considering each data object as a separate cluster and then step by step joining clusters together that are close to each other. This approach is called agglomerative hierarchical clustering.

SLIDE 6

Hierarchical clustering

Hierarchical clustering builds clusters step by step. Usually a bottom-up strategy is applied by first considering each data object as a separate cluster and then step by step joining clusters together that are close to each other. This approach is called agglomerative hierarchical clustering. In contrast to agglomerative hierarchical clustering, divisive hierarchical clustering starts with the whole data set as a single cluster and then divides clusters step by step into smaller clusters.

SLIDE 7

Hierarchical clustering

Hierarchical clustering builds clusters step by step. Usually a bottom-up strategy is applied by first considering each data object as a separate cluster and then step by step joining clusters together that are close to each other. This approach is called agglomerative hierarchical clustering. In contrast to agglomerative hierarchical clustering, divisive hierarchical clustering starts with the whole data set as a single cluster and then divides clusters step by step into smaller clusters. In order to decide which data objects should belong to the same cluster, a (dis-)similarity measure is needed.

SLIDE 8

Hierarchical clustering

Hierarchical clustering builds clusters step by step. Usually a bottom-up strategy is applied by first considering each data object as a separate cluster and then step by step joining clusters together that are close to each other. This approach is called agglomerative hierarchical clustering. In contrast to agglomerative hierarchical clustering, divisive hierarchical clustering starts with the whole data set as a single cluster and then divides clusters step by step into smaller clusters. In order to decide which data objects should belong to the same cluster, a (dis-)similarity measure is needed. Note: We do not need to have access to features; all that is needed for hierarchical clustering is an n × n matrix [di,j], where di,j is the (dis-)similarity of data objects i and j (n is the number of data objects).

SLIDE 9

Hierarchical clustering: Dissimilarity matrix

The dissimilarity matrix [di,j] should at least satisfy the following conditions:

  • di,j ≥ 0, i.e. dissimilarity cannot be negative.
  • di,i = 0, i.e. each data object is completely similar to itself.
  • di,j = dj,i, i.e. data object i is (dis-)similar to data object j to the same degree as data object j is (dis-)similar to data object i.

It is often useful if the dissimilarity is a (pseudo-)metric, also satisfying the triangle inequality di,k ≤ di,j + dj,k.
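These conditions are easy to check in code. A minimal sketch in Python (not part of the slides; the function name and tolerance are illustrative), assuming NumPy is available:

```python
# Hedged sketch: check the conditions a dissimilarity matrix should satisfy.
import numpy as np

def check_dissimilarity_matrix(D, check_triangle=False, tol=1e-12):
    D = np.asarray(D, dtype=float)
    ok = (
        D.ndim == 2 and D.shape[0] == D.shape[1]      # square n x n matrix
        and np.all(D >= -tol)                         # d_ij >= 0
        and np.allclose(np.diag(D), 0.0, atol=tol)    # d_ii = 0
        and np.allclose(D, D.T, atol=tol)             # d_ij = d_ji
    )
    if ok and check_triangle:
        # pseudo-metric property: d_ik <= d_ij + d_jk for all i, j, k
        n = D.shape[0]
        for j in range(n):
            ok = ok and np.all(D <= D[:, [j]] + D[[j], :] + tol)
    return ok

D = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]])
print(check_dissimilarity_matrix(D, check_triangle=True))  # True
```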

SLIDE 10

Agglomerative hierarchical clustering: Algorithm

Input: n × n dissimilarity matrix [di,j].

1. Start with n clusters: each data object forms its own cluster.
2. Reduce the number of clusters by joining the two clusters that are most similar (least dissimilar).
3. Repeat step 2 until there is only one cluster left containing all data objects.
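Assuming SciPy is available, this agglomerative scheme can be run directly on a precomputed dissimilarity matrix; a hedged sketch with made-up matrix values:

```python
# Hedged sketch: agglomerative clustering from an n x n dissimilarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Symmetric dissimilarity matrix with zero diagonal (illustrative values).
D = np.array([[0.0,  2.0,  6.0, 10.0],
              [2.0,  0.0,  5.0,  9.0],
              [6.0,  5.0,  0.0,  4.0],
              [10.0, 9.0,  4.0,  0.0]])

# SciPy expects the condensed (upper-triangle) form of the matrix.
Z = linkage(squareform(D), method="average")   # one merge per step
print(Z)                                       # merge history
print(fcluster(Z, t=2, criterion="maxclust"))  # labels for 2 clusters
```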

SLIDE 11

Measuring dissimilarity between clusters

The dissimilarity between two clusters containing only one data object each is simply the dissimilarity of the two data objects specified in the dissimilarity matrix [di,j].

SLIDE 12

Measuring dissimilarity between clusters

The dissimilarity between two clusters containing only one data object each is simply the dissimilarity of the two data objects specified in the dissimilarity matrix [di,j]. But how do we compute the dissimilarity between clusters that contain more than one data object?

SLIDE 13

Measuring dissimilarity between clusters

SLIDE 14

Measuring dissimilarity between clusters

Centroid: Distance between the centroids (mean value vectors) of the two clusters.¹

¹ Requires that we can compute the mean vector!

SLIDE 15

Measuring dissimilarity between clusters

Centroid: Distance between the centroids (mean value vectors) of the two clusters.¹
Average Linkage: Average dissimilarity between all pairs of points of the two clusters.

¹ Requires that we can compute the mean vector!

SLIDE 16

Measuring dissimilarity between clusters

Centroid: Distance between the centroids (mean value vectors) of the two clusters.¹
Average Linkage: Average dissimilarity between all pairs of points of the two clusters.
Single Linkage: Dissimilarity between the two most similar data objects of the two clusters.

¹ Requires that we can compute the mean vector!

SLIDE 17

Measuring dissimilarity between clusters

Centroid: Distance between the centroids (mean value vectors) of the two clusters.¹
Average Linkage: Average dissimilarity between all pairs of points of the two clusters.
Single Linkage: Dissimilarity between the two most similar data objects of the two clusters.
Complete Linkage: Dissimilarity between the two most dissimilar data objects of the two clusters.

¹ Requires that we can compute the mean vector!
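For two clusters given as point sets, these measures can be computed directly. A small sketch, assuming Euclidean distance and SciPy; the two clusters are made-up 2-D points:

```python
# Hedged sketch: the four inter-cluster dissimilarities for two point sets.
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # cluster 1
B = np.array([[4.0, 4.0], [5.0, 4.0]])               # cluster 2

D = cdist(A, B)                          # pairwise Euclidean distances
print("single linkage  :", D.min())      # most similar pair
print("complete linkage:", D.max())      # most dissimilar pair
print("average linkage :", D.mean())     # mean over all pairs
print("centroid        :", np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))
```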

SLIDE 19

Measuring dissimilarity between clusters

Single linkage can “follow chains” in the data (may be desirable in certain applications). Complete linkage leads to very compact clusters. Average linkage also tends clearly towards compact clusters.

SLIDE 20

Measuring dissimilarity between clusters

(Figure: clustering results with single linkage and with complete linkage.)

SLIDE 21

Measuring dissimilarity between clusters

Ward’s method

Another strategy for merging clusters. In contrast to single, complete or average linkage, it takes the number of data objects in each cluster into account.

SLIDE 22

Measuring dissimilarity between clusters

The updated dissimilarity between the newly formed cluster C ∪ C′ and the cluster C′′ is computed in the following way:

d′(C ∪ C′, C′′) =
  • single linkage: min{ d′(C, C′′), d′(C′, C′′) }
  • complete linkage: max{ d′(C, C′′), d′(C′, C′′) }
  • average linkage: ( |C| · d′(C, C′′) + |C′| · d′(C′, C′′) ) / ( |C| + |C′| )
  • Ward: ( (|C| + |C′′|) · d′(C, C′′) + (|C′| + |C′′|) · d′(C′, C′′) − |C′′| · d′(C, C′) ) / ( |C| + |C′| + |C′′| )
  • centroid²: ( 1 / ( |C ∪ C′| · |C′′| ) ) · Σ_{x ∈ C ∪ C′} Σ_{y ∈ C′′} d(x, y)

² If metric, usually the mean vector needs to be computed!
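The single, complete, average and Ward rules above can be written as a small update helper (the centroid rule, which needs the individual points or mean vectors, is left out). A hedged sketch; the function name and argument order are my own:

```python
# Hedged sketch: dissimilarity of the merged cluster (C u C') to a third cluster C''.
def merged_dissimilarity(method, d_c_cc, d_cp_cc, d_c_cp=0.0,
                         n_c=1, n_cp=1, n_cc=1):
    """d_c_cc = d'(C, C''), d_cp_cc = d'(C', C''), d_c_cp = d'(C, C');
    n_c, n_cp, n_cc are the cluster sizes |C|, |C'|, |C''|."""
    if method == "single":
        return min(d_c_cc, d_cp_cc)
    if method == "complete":
        return max(d_c_cc, d_cp_cc)
    if method == "average":
        return (n_c * d_c_cc + n_cp * d_cp_cc) / (n_c + n_cp)
    if method == "ward":
        return ((n_c + n_cc) * d_c_cc + (n_cp + n_cc) * d_cp_cc
                - n_cc * d_c_cp) / (n_c + n_cp + n_cc)
    raise ValueError(method)

# e.g. merging two singleton clusters with distances 2.0 and 6.0 to C''
print(merged_dissimilarity("average", 2.0, 6.0))  # 4.0
```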

SLIDE 23

Dendrograms

The cluster merging process arranges the data points in a binary tree. Draw the data tuples at the bottom or on the left (equally spaced if they are multi-dimensional). Draw a connection between clusters that are merged, with the distance to the data points representing the distance between the clusters.

SLIDE 24

Hierarchical clustering

Example

Clustering of the 1-dimensional data set {2, 12, 16, 25, 29, 45}. All three approaches to measure the distance between clusters lead to different dendrograms.
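This example can be reproduced with SciPy's linkage function; a minimal sketch (dendrogram drawing is omitted) that prints the merge heights for the three methods:

```python
# Hedged sketch: hierarchical clustering of the 1-D data set {2, 12, 16, 25, 29, 45}.
import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[2.0], [12.0], [16.0], [25.0], [29.0], [45.0]])

for method in ("centroid", "single", "complete"):
    Z = linkage(data, method=method)   # merge heights differ per method
    print(method, Z[:, 2])             # distances at which clusters are merged
```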

SLIDE 25

Hierarchical clustering

(Figure: dendrograms of the example for centroid, single linkage, and complete linkage.)

SLIDE 26

Dendrograms

SLIDE 29

Choosing the right clusters

Simplest Approach:

  • Specify a minimum desired distance between clusters.
  • Stop merging clusters if the closest two clusters are farther apart than this distance.

Visual Approach:

  • Merge clusters until all data points are combined into one cluster.
  • Draw the dendrogram and find a good cut level.
  • Advantage: The cut need not be strictly horizontal.

More Sophisticated Approaches:

  • Analyze the sequence of distances in the merging process.
  • Try to find a step in which the distance between the two clusters merged is considerably larger than the distance of the previous step.
  • Several heuristic criteria exist for this step selection (see the sketch below).
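The "largest gap" idea from the last list can be sketched as follows, assuming SciPy: compare successive merge distances in the linkage matrix and cut just above the last merge before the largest jump (the thresholding details are illustrative):

```python
# Hedged sketch: choose a cut level from the largest jump in merge distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[2.0], [12.0], [16.0], [25.0], [29.0], [45.0]])
Z = linkage(data, method="single")

heights = Z[:, 2]                        # merge distances, non-decreasing
gaps = np.diff(heights)                  # jumps between successive merges
cut = heights[np.argmax(gaps)] + 1e-9    # cut just above the merge before the largest jump
labels = fcluster(Z, t=cut, criterion="distance")
print(heights, labels)
```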

SLIDE 30

Heatmaps

A heatmap combines a dendrogram resulting from clustering the data, a dendrogram resulting from clustering the attributes and colours to indicate the values of the attributes.
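Such a combined heatmap-plus-dendrograms plot can be produced, for example, with seaborn's clustermap function (assumed available); a minimal sketch on random data:

```python
# Hedged sketch: heatmap with row and column dendrograms on illustrative data.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(10, 4)), columns=["a1", "a2", "a3", "a4"])

# Rows and columns are clustered hierarchically; cell colours show the values.
sns.clustermap(data, method="average", metric="euclidean", cmap="viridis")
plt.show()
```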

SLIDE 31

Example: Heatmap and dendrogram

(Figure: heatmap with dendrograms for a small two-attribute (x, y) data set of 10 objects; colour key and histogram shown.)

SLIDE 32

Example: Heatmap and dendrogram

(Figure: heatmap and dendrogram for a data set of 150 objects; colour key and histogram shown.)

SLIDE 33

Example: Heatmap and dendrogram

(Figure: heatmap and dendrogram for a data set of 100 objects; colour key and histogram shown.)

SLIDE 34

Iris Data: Heatmap and dendrogram

(Figure: heatmap and dendrograms for the Iris data with attribute columns sw, sl, pl, pw; colour key and histogram shown.)

SLIDE 35

Divisive hierarchical clustering

The top-down approach of divisive hierarchical clustering is rarely used. In agglomerative clustering the minimum of the pairwise dissimilarities has to be determined, leading to a quadratic complexity in each step (quadratic in the number of clusters still present in the corresponding step). In divisive clustering, for each cluster all possible splits would have to be considered. In the first step, there are 2^(n−1) − 1 possible splits, where n is the number of data objects.

SLIDE 36

What is Similarity?

SLIDE 37

How to cluster these objects?

SLIDE 40

Clustering example

SLIDE 41

Clustering example

SLIDE 42

Clustering example

SLIDE 43

Scaling

The previous three slides show the same data set. In the second slide, the unit on the x-axis was changed to centi-units. In the third slide, the unit on the y-axis was changed to centi-units.

SLIDE 44

Scaling

The previous three slides show the same data set. In the second slide, the unit on the x-axis was changed to centi-units. In the third slide, the unit on the y-axis was changed to centi-units. Clusters should not depend on the measurement unit!

SLIDE 45

Scaling

The previous three slides show the same data set. In the second slide, the unit on the x-axis was changed to centi-units. In the third slide, the unit on the y-axis was changed to centi-units. Clusters should not depend on the measurement unit! Therefore, some kind of normalisation (see the chapter on data preparation) should be carried out before clustering.

SLIDE 46

Complex Similarities: An Example

A few Adrenalin-like drug candidates: Adrenalin and the candidates (B), (C), (D), (E).

SLIDE 47

Complex Similarities: An Example

Similarity: Polarity

SLIDE 48

Complex Similarities: An Example

Dissimilarity: Hydrophobic / Hydrophilic

SLIDE 49

Complex Similarities: An Example

Similar to Adrenalin... Adrenalin, Amphetamin, Ephedrin, Dopamin, MDMA.

SLIDE 50

Complex Similarities: An Example

Similar to Adrenalin... but some cross the blood-brain barrier: Adrenalin, Amphetamin (Speed), Ephedrin, Dopamin, MDMA (Ecstasy).

SLIDE 51

Similarity Measures

SLIDE 52

Notion of (dis-)similarity: Numerical attributes

Various choices for dissimilarities between two numerical vectors x and y:

  • Minkowski (Lp): dp(x, y) = ( Σ_{i=1..n} |xi − yi|^p )^(1/p)
  • Euclidean (L2): dE(x, y) = √( (x1 − y1)² + … + (xn − yn)² )
  • Manhattan (L1): dM(x, y) = |x1 − y1| + … + |xn − yn|
  • Tschebyschew (L∞): d∞(x, y) = max{ |x1 − y1|, …, |xn − yn| }
  • Cosine: dC(x, y) = 1 − x⊤y / ( ‖x‖ · ‖y‖ )
  • Tanimoto: dT(x, y) = x⊤y / ( ‖x‖² + ‖y‖² − x⊤y )
  • Pearson: Euclidean distance of the z-score transformed x and y
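These measures translate directly into code. A hedged sketch using NumPy/SciPy; the two vectors are made up:

```python
# Hedged sketch: the dissimilarity measures listed above for two numerical vectors.
import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 4.0])

print("Minkowski (p=3):", distance.minkowski(x, y, p=3))
print("Euclidean      :", distance.euclidean(x, y))
print("Manhattan      :", distance.cityblock(x, y))
print("Tschebyschew   :", distance.chebyshev(x, y))
print("Cosine         :", distance.cosine(x, y))            # 1 - cosine of the angle
print("Tanimoto       :", x @ y / (x @ x + y @ y - x @ y))  # as defined above
z = lambda v: (v - v.mean()) / v.std()                      # z-score transform
print("Pearson        :", distance.euclidean(z(x), z(y)))
```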

SLIDE 53

Notion of (dis-)similarity: Binary attributes

The two values (e.g. 0 and 1) of a binary attribute can be interpreted as some property being absent (0) or present (1). In this sense, a vector of binary attributes can be interpreted as the set of properties that the corresponding object has.

Example

The binary vector (0, 1, 1, 0, 1) corresponds to the set of properties {a2, a3, a5}. The binary vector (0, 0, 0, 0, 0) corresponds to the empty set. The binary vector (1, 1, 1, 1, 1) corresponds to the set {a1, a2, a3, a4, a5}.

SLIDE 54

Notion of (dis-)similarity: Binary attributes

Dissimilarity measures for two vectors of binary attributes. Each data object is represented by the corresponding set of properties that are present (X and Y denote these sets, Ω the set of all properties).

  • simple match: dS = 1 − (b + n) / (b + n + x)
  • Russel & Rao: dR = 1 − b / (b + n + x)   (for sets: 1 − |X ∩ Y| / |Ω|)
  • Jaccard: dJ = 1 − b / (b + x)   (for sets: 1 − |X ∩ Y| / |X ∪ Y|)
  • Dice: dD = 1 − 2b / (2b + x)   (for sets: 1 − 2|X ∩ Y| / (|X| + |Y|))

where, counting the predicates (binary attributes),
  b = number that hold in both records,
  n = number that hold in neither record,
  x = number that hold in only one of the two records.

Example: for the binary vectors 101000 and 111000, i.e. the sets X = {a1, a3} and Y = {a1, a2, a3}, we get b = 2, n = 3, x = 1, and therefore dS = 1/6 ≈ 0.167, dR = 2/3 ≈ 0.667, dJ = 1/3 ≈ 0.333, dD = 0.20.
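The example row can be verified with a few lines of Python (variable names are my own; m is used for the mismatch count to avoid clashing with the vector x):

```python
# Hedged sketch: binary dissimilarity measures for the example vectors above.
x = [1, 0, 1, 0, 0, 0]   # set X = {a1, a3}
y = [1, 1, 1, 0, 0, 0]   # set Y = {a1, a2, a3}

b = sum(xi == 1 and yi == 1 for xi, yi in zip(x, y))   # hold in both
n = sum(xi == 0 and yi == 0 for xi, yi in zip(x, y))   # hold in neither
m = sum(xi != yi for xi, yi in zip(x, y))              # hold in only one

d_simple  = 1 - (b + n) / (b + n + m)   # 1/6 ~ 0.167
d_russel  = 1 - b / (b + n + m)         # 2/3 ~ 0.667
d_jaccard = 1 - b / (b + m)             # 1/3 ~ 0.333
d_dice    = 1 - 2 * b / (2 * b + m)     # 0.20
print(d_simple, d_russel, d_jaccard, d_dice)
```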

SLIDE 55

Notion of (dis-)similarity: Nominal attributes

Nominal attributes may be transformed into a set of binary attributes, each of them indicating one particular feature of the attribute (1-of-n coding).

SLIDE 56

Notion of (dis-)similarity: Nominal attributes

Nominal attributes may be transformed into a set of binary attributes, each of them indicating one particular feature of the attribute (1-of-n coding).

Example

Attribute Manufacturer with the values BMW, Chrysler, Dacia, Ford, Volkswagen:

  manufacturer → binary vector
  Volkswagen → 00001
  Dacia → 00100
  Ford → 00010

Then one of the dissimilarity measures for binary attributes can be applied.
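A hedged sketch of 1-of-n coding for the Manufacturer attribute, followed by a Jaccard-style comparison of two encoded values (the helper function is my own):

```python
# Hedged sketch: 1-of-n coding of a nominal attribute plus a binary dissimilarity.
values = ["BMW", "Chrysler", "Dacia", "Ford", "Volkswagen"]

def one_of_n(value):
    # one binary attribute per possible value
    return [1 if v == value else 0 for v in values]

a = one_of_n("Volkswagen")   # [0, 0, 0, 0, 1]
b = one_of_n("Dacia")        # [0, 0, 1, 0, 0]

# Jaccard dissimilarity on the binary vectors (1.0 for different values, 0.0 for equal)
both = sum(ai and bi for ai, bi in zip(a, b))
one  = sum(ai != bi for ai, bi in zip(a, b))
print(a, b, 1 - both / (both + one))
```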

SLIDE 57

Notion of (dis-)similarity: Nominal attributes

Nominal attributes may be transformed into a set of binary attributes, each of them indicating one particular feature of the attribute (1-of-n coding).

Example

Attribute Manufacturer with the values BMW, Chrysler, Dacia, Ford, Volkswagen:

  manufacturer → binary vector
  Volkswagen → 00001
  Dacia → 00100
  Ford → 00010

Then one of the dissimilarity measures for binary attributes can be applied. Another way to measure similarity between two vectors of nominal attributes is to compute the proportion of attributes where both vectors have the same value, leading to the Russel & Rao dissimilarity measure.

SLIDE 58

Prototype Based Clustering

SLIDE 59

Prototype Based Clustering

Given: a data set of size n.
Return: a set of typical examples (prototypes) of size k ≪ n.

SLIDE 60

k-Means clustering

  • Choose a number k of clusters to be found (user input).
  • Initialize the cluster centres randomly (for instance, by randomly selecting k data points).
  • Data point assignment: Assign each data point to the cluster centre that is closest to it (i.e. closer than any other cluster centre).
  • Cluster centre update: Compute new cluster centres as the mean vectors of the assigned data points. (Intuitively: centre of gravity if each data point has unit weight.)
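The two alternating steps translate into a short NumPy sketch (initialisation by randomly selecting k data points, as suggested above; data, convergence test and the handling of edge cases are illustrative):

```python
# Hedged sketch: basic k-means loop (empty clusters are not handled here).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # k random data points
    for _ in range(n_iter):
        # data point assignment: closest centre for every point
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # cluster centre update: mean vector of the assigned points
        new_centres = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(20, 2)) for c in (0.0, 5.0)])
print(kmeans(X, k=2)[0])
```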

SLIDE 61

k-Means clustering

Repeat these two steps (data point assignment and cluster centre update) until the cluster centres do not change anymore. It can be shown that this scheme must converge, i.e. the update of the cluster centres cannot go on forever.

SLIDE 62

k-Means clustering

Aim: Minimize the objective function

  f = Σ_{i=1..k} Σ_{j=1..n} uij · dij

under the constraints uij ∈ {0, 1} and Σ_{i=1..k} uij = 1 for all j = 1, …, n, where uij indicates whether data point j is assigned to cluster i and dij is the (typically squared Euclidean) distance between data point j and cluster centre i.
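The objective can be evaluated for given centres and assignments; a small sketch in which dij is taken to be the squared Euclidean distance (an assumption, since the slide leaves dij unspecified):

```python
# Hedged sketch: evaluate the k-means objective for given centres and labels.
import numpy as np

def kmeans_objective(X, centres, labels):
    # f = sum_i sum_j u_ij * d_ij, with u_ij = 1 iff point j is assigned to cluster i
    d = np.linalg.norm(X - centres[labels], axis=1) ** 2
    return d.sum()

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
centres = np.array([[0.5, 0.0], [5.0, 5.0]])
labels = np.array([0, 0, 1])
print(kmeans_objective(X, centres, labels))   # 0.25 + 0.25 + 0.0 = 0.5
```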

SLIDE 63

Alternating optimization

Assuming the cluster centres to be fixed, uij = 1 should be chosen for the cluster i to which data object xj has the smallest distance, in order to minimize the objective function.

Assuming the assignments to the clusters to be fixed, each cluster centre should be chosen as the mean vector of the data objects assigned to the cluster, in order to minimize the objective function.

SLIDE 64

k-Means clustering: Example

SLIDE 70

k-Means clustering: Local minima

Clustering is successful in this example: The clusters found are those that would have been formed intuitively. Convergence is achieved after only 5 steps. (This is typical: convergence is usually very fast.) However: The clustering result is fairly sensitive to the initial positions of the cluster centres. With a bad initialisation clustering may fail (the alternating update process gets stuck in a local minimum).
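A common remedy is to run k-means several times from different random initialisations and keep the run with the smallest objective value; scikit-learn's KMeans does this via its n_init parameter. A minimal sketch, library assumed available:

```python
# Hedged sketch: multiple random restarts to reduce the risk of bad local minima.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(30, 2)) for c in (0.0, 4.0, 8.0)])

# 10 random initialisations; the run with the lowest objective (inertia) is kept.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_, km.cluster_centers_)
```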

SLIDE 71

k-Means clustering: Local minima

SLIDE 72

Gaussian Mixture Models

SLIDE 73

Gaussian mixture models – EM clustering

Assumption: Data was generated by sampling a set of normal distributions. (The probability density is a mixture of normal distributions.) Aim: Find the parameters for the normal distributions and how much each normal distribution contributes to the data.

SLIDE 74

Gaussian mixture models

(Figure: two normal distributions.)

SLIDE 75

Gaussian mixture models

(Figures: two normal distributions, and the mixture model in which both normal distributions contribute 50%.)

SLIDE 76

Gaussian mixture models

(Figures: two normal distributions; the mixture model in which both contribute 50%; and a mixture model in which one normal distribution contributes 10% and the other 90%.)

SLIDE 77

Gaussian mixture models

SLIDE 78

Gaussian mixture models – EM clustering

Assumption: Data were generated by sampling a set of normal distributions. (The probability density is a mixture of normal distributions.) Aim: Find the parameters for the normal distributions and how much each normal distribution contributes to the data. Algorithm: EM clustering (expectation maximisation). Alternating scheme in which the parameters of the normal distributions and the likelihoods of the data points to be generated by the corresponding normal distributions are estimated.
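Such a mixture model can be fitted, for instance, with scikit-learn's GaussianMixture, which implements an EM scheme of this kind; a minimal sketch on illustrative 1-D data:

```python
# Hedged sketch: fitting a Gaussian mixture model with EM (data are made up).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# sample from two normal distributions contributing 50% each
X = np.concatenate([rng.normal(-1.0, 0.5, 200),
                    rng.normal(2.0, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.weights_)              # estimated mixture proportions
print(gmm.means_.ravel())        # estimated means of the normal distributions
print(gmm.predict_proba(X[:3]))  # soft cluster memberships of the first points
```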

SLIDE 79

Density Based Clustering

SLIDE 80

Density-based clustering

For numerical data, density-based clustering algorithms often yield the best results. Principle: A connected region with high data density corresponds to one cluster. DBScan is one of the most popular density-based clustering algorithms.

SLIDE 81

Density-based clustering: DBScan

Principle idea of DBScan:

1. Find a data point where the data density is high, i.e. in whose ε-neighbourhood there are at least ℓ other points. (ε and ℓ are parameters of the algorithm to be chosen by the user.)
2. All the points in the ε-neighbourhood are considered to belong to one cluster.
3. Expand this ε-neighbourhood (the cluster) as long as the high density criterion is satisfied.
4. Remove the cluster (all data points assigned to the cluster) from the data set and continue with step 1 as long as data points with a high data density around them can be found.
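DBScan's ε and ℓ correspond to the eps and min_samples parameters of scikit-learn's DBSCAN implementation; a minimal sketch on illustrative data:

```python
# Hedged sketch: DBSCAN on two made-up blobs.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 3.0)])

# eps ~ the neighbourhood radius; min_samples ~ the density threshold
# (note: min_samples counts the point itself, unlike "at least l other points").
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
print(set(db.labels_))   # cluster labels; -1 marks noise points
```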

SLIDE 82

Density-based clustering: DBScan

(Figure: DBScan illustration on a grid; legend: grid cell, neighbourhood, cell with at least 3 hits.)
