Dissimilarity Plots: A Visual Exploration Tool for Partitional - PowerPoint PPT Presentation

Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering CSE Colloquium Dr. Michael Hahsler Department of Computer Science and Engineering, Lyle School of Engineering, Southern Methodist University. Dallas, April 3, 2009.

Motivation Clustering: assignment of objects to groups (clusters) so that objects from the same cluster are more similar to each other than objects from different clusters. 150 150 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 100 100 ● ● ● ● y y ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 50 ● 50 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 0 0 20 40 60 80 100 120 0 20 40 60 80 100 120 x x Assess the quality of a cluster solution: ❼ Typically judged by intra and inter-cluster similarities ❼ Visualization for judging the quality of a clustering and to explore the cluster structure 2

Motivation (cont’d) Dendrograms (Hartigan, 1967) for hierarchical clustering: Cluster Dendrogram Cluster Dendrogram 150 150 100 100 Height Height 50 50 0 0 → Unfortunately dendrograms are only possible for hierarchical/nested clusterings. 3

Outline 1. Clustering Basics 2. Existing Visualization Techniques 3. Matrix Shading 4. Seriation 5. Creating Dissimilarity Plots 6. Examples 4

Clustering Basics ❼ Partition: Each point is assigned to a (single) group. Γ : R m → { 1 , 2 , . . . , k } ❼ Typical partitional clustering algorithm: k -means Source: Wikipedia ( http://en.wikipedia.org/wiki/K-means_algorithm ) ❼ Dissimilarity (distance) matrix: d : O × O → R D O 1 O 2 O 3 O 4 O 1 0 4 1 8 4 0 2 2 O 2 1 2 0 3 O 3 8 2 3 0 O 4 5

Visualization Techniques for Partitions Project objects into 2-dimensional space (dimensionality reduction techniques, e.g., PCA, MDS; Pison et al. , 1999). Projection (PCA) Projection (MDS) 0.6 2 9 12 10 40 0.4 8 ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 6 ● ● ● ● 7 ● ● ● ● ● ● ● ● 20 0.2 ● ● ● 3 1 ● ● ● ● ● ● ● ● ● ● 11 4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Component 2 Component 2 ● ● ● 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 ● ● ● ● ● 0.0 ● ● ● ● ● 0 ● ● ● ● 3 ● ● ● ● 4 ● ● ● ● −0.2 ● −20 −0.4 −40 −0.6 −60 −50 0 50 −0.5 0.0 0.5 Component 1 Component 1 These two components explain 100 % of the point variability. These two components explain 40.59 % of the point variability. → Problems with dimensionality (figure to the right: 16 dimensional data) 6

Visualization Techniques for Partitions (cont’d) ❼ Visualize metrics calculated from inter and intra-cluster similarities to judge cluster quality. For example, silhouette width (Rousseeuw, 1987; Kaufman and Rousseeuw, 1990). Silhouette plot 4 clusters C j n = 75 j : n j | ave i ∈ Cj s i 1 : 23 | 0.75 2 : 20 | 0.73 3 : 15 | 0.80 4 : 17 | 0.67 0.0 0.2 0.4 0.6 0.8 1.0 Silhouette width s i Average silhouette width : 0.74 → Only a diagnostic tool for cluster quality ❼ Several other visualization methods (e.g., based on self-organizing maps and neighborhood graphs) are reviewed in Leisch (2008). → Typically hide structure within clusters or are limited by the number of clusters and dimensionality of data. 7

Matrix Shading Each cell of the matrix (typically a dissimilarity matrix) is represented by a gray value (see, e.g., Sneath and Sokal, 1973; Ling, 1973; Gale et al. , 1984). Initially matrix shading was used with hierarchical clustering → heatmaps. For graph-based partitional clustering: CLUSION (Strehl and Ghosh, 2003). Uses coarse seriation such that “good” clusters from blocks around the main diagonal. CLUSION allows to judge cluster quality but does not reveal the structure of the data → Dissimilarity plots: improve matrix shading/CLUSION with (near) optimal placement of clusters and objects using seriation 8

Seriation Part of combinatorial data analysis (Arabie and Hubert, 1996) ❼ Aim: arrange objects in a linear order given available data and some loss function in order to reveal structural information. ❼ Problem: Requires to solve a discrete optimization problem → solution space grows by the order of O ( n !) Techniques: 1. Partial enumeration methods (currently solve problems with n ≤ 40 ) ❼ dynamic programming (Hubert et al. , 1987) ❼ branch-and-bound (Brusco and Stahl, 2005) 2. Heuristics for larger problems 9

Seriation (cont’d) Set of n objects O = { O 1 , O 2 , . . . , O n } (1) Symmetric dissimilarity matrix D = ( d ij ) (2) where d ij for 1 ≤ i, j ≤ n represents the dissimilarity between O i and O j , and d ii = 0 for all i . Permutation function Ψ reorders the objects in D by simultaneously permuting rows and columns Define a loss function L to evaluate a given permutation Seriation is the optimization problem: Ψ ∗ = argmin L (Ψ( D )) (3) Ψ 10

Column/row gradient measures Perfect anti-Robinson matrix (Robinson, 1951): A symmetric matrix where the values in all rows and columns only increase when moving away from the main diagonal Gradient conditions (Hubert et al. , 1987): d ik ≤ d ij 1 ≤ i < k < j ≤ n ; within rows: for (4) d kj ≤ d ij 1 ≤ i < k < j ≤ n. within columns: for (5) D Ψ(D) O 1 O 2 O 3 O 4 O 1 O 3 O 2 O 4 O 1 O 1 0 4 1 8 0 1 4 8 4 0 2 2 1 0 2 3 O 2 O 3 1 2 0 3 4 2 0 2 O 3 O 2 8 2 3 0 8 3 2 0 O 4 O 4 In an anti-Robinson matrix the smallest dissimilarity values appear close to the main diagonal, therefore, the closer objects are together in the order of the matrix, the higher their similarity. Note: Most matrices can only be brought into a near anti-Robinson form. 11

Column/row gradient measures (cont’d) Loss measure (quantifies the divergence from anti-Robinson form): � � L ( D ) = f ( d ik , d ij ) + f ( d kj , d ij ) (6) i<k<j i<k<j where f ( · , · ) is a function which defines how a violation or satisfaction of a gradient condition for an object triple ( O i , O k and O j ) is counted. Raw number of violations minus satisfactions:  − 1 z > y ; if   f ( z, y ) = sign( y − z ) = 0 z = y ; if (7)  +1 z < y. if  Weight each satisfaction or violation by its magnitude (absolute difference between the values): f ( z, y ) = | y − z | sign( y − z ) = y − z (8) 12

Anti-Robinson events An even simpler loss function can be created in the same way as the gradient measures above by concentrating on violations only. � � L ( D ) = f ( d ik , d ij ) + f ( d kj , d ij ) (9) i<k<j i<k<j To only count the violations we use � 1 z < y if and f ( z, y ) = I ( z, y ) = (10) 0 otherwise. I ( · ) is an indicator function returning 1 only for violations. Chen (2002) also introduced a weighted versions of this loss function by using the absolute deviations as weights: f ( z, y ) = | y − z | I ( z, y ) (11) 13

Hamiltonian path length The dissimilarity matrix D can be represented as a finite weighted graph G = (Ω , E ) where the set of objects constitute the vertices Ω = { O 1 , O 2 , . . . , O n } and each edge e ij ∈ E between the objects O i , O j ∈ Ω has a weight w ij associated which represents the dissimilarity d ij . An order Ψ of the objects can be seen as a path through the graph where each node is visited exactly once, i.e., a Hamiltonian path. Minimizing the Hamiltonian path length results in a seriation optimal with respect to dissimilarities between neighboring objects (see, e.g., Hubert, 1974; Caraux and Pinloche, 2005). D O 1 O 2 O 3 O 4 O 2 The loss function based on the Hamiltonian O 1 0 4 1 8 path length is: 4 0 2 2 O 2 O 1 O 3 n − 1 � 1 2 0 3 O 3 L ( D ) = d i,i +1 (12) 8 2 3 0 O 4 i =1 O 4 This optimization problem is related to the traveling salesperson problem (Gutin and Punnen, 2002) for which good solvers and efficient heuristics exist. 14

Creating dissimilarity plots We use matrix shading with two improvements: 1. Rearrange clusters: more similar clusters are placed closer together (macro-structure). 2. Rearrange objects: show micro-structure Γ Ψ 1 Ψ c Ψ 2 Ψ 3 Ψ 4 D D i Ψ i ( D i ) Ψ c ( D c ) The assignment function Γ assigns a cluster membership to each object (provided by a partitional clustering algorithm) 15

Examples We use the column/row gradient measure as the loss function for seriation. ❼ Placement (seriation) of clusters is done using branch-and-bound to find the optimal solution ❼ Placement (seriation) of objects within its cluster uses a simulated annealing heuristic Seriation algorithms are provides by Brusco and Stahl (2005) and are available in the R extension package seriation (Hahsler et al. , 2008). 16

Dissimilarity Plots: A Visual Exploration Tool for Partitional - PowerPoint PPT Presentation

Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering CSE Colloquium Dr. Michael Hahsler Department of Computer Science and Engineering, Lyle School of Engineering, Southern Methodist University. Dallas, April 3, 2009.

Some Clustering Methods on Some Clustering Methods on Some Clustering Methods on Dissimilarity

??? Encode dissimilarity between locations as edge weights distance

SynAthina Onli line Tools 1. . A mapping tool 2. A Community Tool 3. An Archive Tool 3. An

Analysis of variance and regression November 13, 2007 SAS graphics Scatter plots

How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need? Fabrice Rossi SAMM,

Reranking with Contextual dissimilarity measures from representational Bregman k -means VISAPP

Quantale-valued dissimilarity Lili Shen (joint with Hongliang Lai, Yuanye Tao and Dexue Zhang)

Biovision team 2 Retina Visual cortex 3 Retina Visual cortex 3 Retina Visual cortex 3

Visual Encoding of Dissimilarity Data via Topology-Preserving Map

Interpretation of forest plots Part I 1 At the end of this lecture, you should be able to

Quantile plots: New planks in an old campaign Nicholas J. Cox Department of Geography 1

Plotting Dr. Mihail September 25, 2018 (Dr. Mihail) Plots September 25, 2018 1 / 24 Plots

TDR Plots Elizabeth Worcester LBL PWG Meeting May 13, 2019 1 Intro Brief update with new

CS320: Performance Evaluation Plotting data sets Semi log plots Log log plots Analyzing Program

TDR plots updates p L. Escudero for the Pandora Team DUNE FD Sim/Reco meeting 15th of

Intro to R - 4. Base R Plots OIT/SMU Libraries Data Science Workshop Series Michael Hahsler OIT,

Spatio-temporal Aggregation for Visual Analysis of Movements Gennady Andrienko & Natalia

Visual Tools & Methods for Data Cleaning DaQuaTa International Workshop 2017, Lyon, FR

Jenna M. Smith, Coordinator of Assessment & Marketing Big Data movement Pressure

CVD Congressional Visit Day (Spring Semester) What? Travel to D.C Meet with Senators

False Influencing Alexandra J. Roberts IPSC at DePaul College of Law August 8, 2019 Influencers

Sewol Ferry Disaster Apr 16, 2014 , 294 out of 476 people died 325 of the passengers were secondary

Computing Visual Similarity with Social Context Shuqiang Jiang

Bringing Clang and LLVM to Visual C++ users Reid Kleckner Google C++ devs demand a good

Dissimilarity Plots: A Visual Exploration Tool for Partitional - PowerPoint PPT Presentation

Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering CSE Colloquium Dr. Michael Hahsler Department of Computer Science and Engineering, Lyle School of Engineering, Southern Methodist University. Dallas, April 3, 2009.

Some Clustering Methods on Some Clustering Methods on Some Clustering Methods on Dissimilarity

??? Encode dissimilarity between locations as edge weights distance

SynAthina Onli line Tools 1. . A mapping tool 2. A Community Tool 3. An Archive Tool 3. An

Analysis of variance and regression November 13, 2007 SAS graphics Scatter plots

How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need? Fabrice Rossi SAMM,

Reranking with Contextual dissimilarity measures from representational Bregman k -means VISAPP

Quantale-valued dissimilarity Lili Shen (joint with Hongliang Lai, Yuanye Tao and Dexue Zhang)

Biovision team 2 Retina Visual cortex 3 Retina Visual cortex 3 Retina Visual cortex 3

Visual Encoding of Dissimilarity Data via Topology-Preserving Map

Interpretation of forest plots Part I 1 At the end of this lecture, you should be able to

Quantile plots: New planks in an old campaign Nicholas J. Cox Department of Geography 1

Plotting Dr. Mihail September 25, 2018 (Dr. Mihail) Plots September 25, 2018 1 / 24 Plots

TDR Plots Elizabeth Worcester LBL PWG Meeting May 13, 2019 1 Intro Brief update with new

CS320: Performance Evaluation Plotting data sets Semi log plots Log log plots Analyzing Program

TDR plots updates p L. Escudero for the Pandora Team DUNE FD Sim/Reco meeting 15th of

Intro to R - 4. Base R Plots OIT/SMU Libraries Data Science Workshop Series Michael Hahsler OIT,

Spatio-temporal Aggregation for Visual Analysis of Movements Gennady Andrienko &amp; Natalia

Visual Tools &amp; Methods for Data Cleaning DaQuaTa International Workshop 2017, Lyon, FR

Jenna M. Smith, Coordinator of Assessment &amp; Marketing Big Data movement Pressure

CVD Congressional Visit Day (Spring Semester) What? Travel to D.C Meet with Senators

False Influencing Alexandra J. Roberts IPSC at DePaul College of Law August 8, 2019 Influencers

Sewol Ferry Disaster Apr 16, 2014 , 294 out of 476 people died 325 of the passengers were secondary

Computing Visual Similarity with Social Context Shuqiang Jiang

Bringing Clang and LLVM to Visual C++ users Reid Kleckner Google C++ devs demand a good

Spatio-temporal Aggregation for Visual Analysis of Movements Gennady Andrienko & Natalia

Visual Tools & Methods for Data Cleaning DaQuaTa International Workshop 2017, Lyon, FR

Jenna M. Smith, Coordinator of Assessment & Marketing Big Data movement Pressure