Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
CSE Colloquium
- Dr. Michael Hahsler
Department of Computer Science and Engineering, Lyle School of Engineering, Southern Methodist University. Dallas, April 3, 2009.
[Figure: example plots with axes x and y, followed by plots with a Height axis.]
Source: Wikipedia (http://en.wikipedia.org/wiki/K-means_algorithm)
Dissimilarity matrix D:

        O1   O2   O3   O4
  O1     0    4    1    8
  O2     4    0    2    2
  O3     1    2    0    3
  O4     8    2    3    0
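A dissimilarity matrix such as D can be held as a nested list and reordered by applying the same permutation to its rows and columns. The sketch below is illustrative only: the helper names (is_dissimilarity_matrix, reorder, path_length) are hypothetical, and the brute-force search for the order with the shortest path through neighbouring objects is one common seriation criterion assumed here for illustration, not necessarily the criterion used in the talk.

```python
from itertools import permutations

# Dissimilarity matrix D for the objects O1..O4 (values from the slide).
LABELS = ["O1", "O2", "O3", "O4"]
D = [
    [0, 4, 1, 8],
    [4, 0, 2, 2],
    [1, 2, 0, 3],
    [8, 2, 3, 0],
]


def is_dissimilarity_matrix(d):
    """Check that d is square, symmetric, and has a zero diagonal."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    return all(
        d[i][i] == 0 and all(d[i][j] == d[j][i] for j in range(n))
        for i in range(n)
    )


def reorder(d, order):
    """Apply the same permutation to the rows and columns of d."""
    return [[d[i][j] for j in order] for i in order]


def path_length(d, order):
    """Sum of dissimilarities between objects adjacent in the order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


# Assumption: exhaustive search is only feasible for very small n.
best_order = min(permutations(range(len(D))), key=lambda o: path_length(D, o))
best_labels = [LABELS[i] for i in best_order]
reordered = reorder(D, best_order)
```

For this D the search returns the order O1, O3, O2, O4 with a path length of 5, which places small dissimilarities next to the diagonal of the reordered matrix.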
[Figure: Projection (PCA). Axes: Component 1, Component 2. These two components explain 100 % of the point variability.]
[Figure: Projection (MDS). Axes: Component 1, Component 2. These two components explain 40.59 % of the point variability.]
[Figure: Silhouette plot. Axis: silhouette width s_i, from 0.0 to 1.0.]
Average silhouette width: 0.74 (n = 75, 4 clusters)

  Cluster j   n_j   Average s_i over C_j
  1           23    0.75
  2           20    0.73
  3           15    0.80
  4           17    0.67
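The silhouette width of an object i is commonly defined as s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the average dissimilarity of i to the other members of its own cluster and b_i is the smallest average dissimilarity of i to the members of any other cluster. The following is a minimal sketch of that definition in plain Python. The function name silhouette_widths and the two-cluster partition of the small 4-object matrix are assumptions made for illustration; they are not the data behind the plot above.

```python
def silhouette_widths(d, labels):
    """Return the silhouette width s_i of every object.

    d      -- symmetric dissimilarity matrix as a list of lists
    labels -- cluster label of each object, in the same order as d
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            # Convention: singleton clusters (or a single cluster) get 0.
            widths.append(0.0)
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical example: four objects O1..O4 split into {O1, O3} and {O2, O4}.
D = [
    [0, 4, 1, 8],
    [4, 0, 2, 2],
    [1, 2, 0, 3],
    [8, 2, 3, 0],
]
labels = [1, 2, 1, 2]
s = silhouette_widths(D, labels)
average_width = sum(s) / len(s)
```

Values of s_i close to 1 indicate objects that sit well inside their cluster, values near 0 indicate objects between clusters, and negative values indicate objects that are on average closer to another cluster.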
[Figure: Projection (PCA) shown with the Silhouette plot for the same data. PCA axes: Component 1, Component 2. These two components explain 100 % of the point variability.]
Average silhouette width: 0.74 (n = 75, 4 clusters)

  Cluster j   n_j   Average s_i over C_j
  1           23    0.75
  2           20    0.73
  3           15    0.80
  4           17    0.67
[Figure: Projection (PCA) shown with the Silhouette plot for the same data. PCA axes: Component 1, Component 2. These two components explain 44.74 % of the point variability.]
Average silhouette width: 0.13 (n = 250, 3 clusters)

  Cluster j   n_j   Average s_i over C_j
  1           103   0.09
  2           90    0.13
  3           57    0.20
[Figure: Projection (MDS) shown with the Silhouette plot for the same data. MDS axes: Component 1, Component 2. These two components explain 40.59 % of the point variability.]
Average silhouette width: 0.14 (n = 435, 12 clusters)

  Cluster j   n_j   Average s_i over C_j
  1           35    0.34
  2           48    0.15
  3           45    −0.01
  4           36    0.08
  5           38    0.05
  6           32    0.08
  7           43    0.22
  8           37    0.27
  9           18    0.09
  10          52    0.07
  11          20    0.33
  12          31    0.05
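The overall average silhouette width is the mean of s_i over all objects, so it can be recovered as the size-weighted mean of the per-cluster averages. The short check below uses the cluster sizes and averages printed for the 12-cluster example; because those averages are rounded to two decimals, the recomputed value only approximately matches the reported 0.14. The variable names are chosen for illustration.

```python
# (cluster size n_j, average silhouette width of cluster j), as printed
# for the 12-cluster example. The averages are rounded to two decimals.
clusters = [
    (35, 0.34), (48, 0.15), (45, -0.01), (36, 0.08),
    (38, 0.05), (32, 0.08), (43, 0.22), (37, 0.27),
    (18, 0.09), (52, 0.07), (20, 0.33), (31, 0.05),
]

# Total number of objects across all clusters.
n = sum(size for size, _ in clusters)

# Size-weighted mean of the per-cluster average silhouette widths.
overall = sum(size * avg for size, avg in clusters) / n

print(f"n = {n}, average silhouette width = {overall:.2f}")
```

The same check applied to the other two examples gives a quick way to spot transcription errors in the per-cluster figures.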