Gravitational Clustering of the Self-Organizing Map - Nejc Ilc

ICANNGA 2011, Ljubljana. Gravitational Clustering of the Self-Organizing Map. Nejc Ilc, Andrej Dobnikar, University of Ljubljana, Faculty of Computer and Information Science.


  1. GRAVITATIONAL CLUSTERING OF THE SELF-ORGANIZING MAP
     Nejc Ilc, Andrej Dobnikar
     University of Ljubljana, Faculty of Computer and Information Science
     ICANNGA 2011, Ljubljana

  2. INTRODUCTION
     • Tools needed to deal with
       - data/web mining
       - huge (social) networks
       - gene expression data
       - image segmentation
     ICANNGA, April 2011

  3. INTRODUCTION
     Figure: Visualization of the Internet (credits: Opte Project)

  5. INTRODUCTION
     Figure: Connections between neurons in human brain (credits: Van J. Wedeen, M.D., MGH/Harvard U.)

  7. INTRODUCTION
     Figure: Heat map of gene expression profile (credits: Manfred Gessler)

  9. INTRODUCTION
     Figure: Image segmentation (credits: T. Riklin-Raviv, N. Sochen and N. Kiryati)

  11. CLUSTERING
      • unsupervised process of organizing data into "natural" groups
      • approaches
        - information theory
        - graphs
        - fuzzy logic
        - …
        - artificial neural networks

  12. CLUSTERING WITH SOM
      • Self-Organizing Map [Kohonen, 1982]
      • Advantages
        - visualization of high-dimensional data
        - preserves topology and density of input data
      • Problem
        - SOM is not a "true" clustering method
        - more neurons than the expected number of clusters
        - How to group neurons into clusters?

  13. CLUSTERING OF SOM
      • K-means, hierarchical [Vesanto & Alhoniemi, 2000]
      • Emergent SOM [Ultsch, 2007]
        - watershed algorithm
        - neurons > 1000
      • Surface flooding [Brugger et al., 2008]
        - automatically finds number of clusters

  14. GSOM – THE IDEA

  15. GSOM – LEVEL ONE
      • train SOM on input data
      • identify winning neurons
      • remove interpolating neurons
      $n_j = [n_{j1}, n_{j2}, \dots, n_{jE}]$
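
The three level-one steps can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`train_som`, `bmu`, `winning_neurons`), the learning-rate and neighbourhood schedules, and the reading of "interpolating neurons" as neurons that never win for any input are all assumptions.

```python
import math
import random

def bmu(weights, x):
    """Index of the best-matching unit (closest prototype n_j) for input x."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM online with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every prototype from a randomly chosen input point.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            win_r, win_c = grid[bmu(weights, x)]
            for j, (r, c) in enumerate(grid):
                d2 = (r - win_r) ** 2 + (c - win_c) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
            step += 1
    return weights

def winning_neurons(weights, data):
    """Keep neurons that win for at least one input; the rest are dropped
    (assumed here to be the 'interpolating' neurons)."""
    winners = sorted({bmu(weights, x) for x in data})
    return [weights[j] for j in winners]
```

For example, `winning_neurons(train_som(data, 4, 4), data)` returns the subset of the 16 prototypes that are the best match for at least one point in `data`.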

  16. GSOM – LEVEL TWO
      • Gravitational clustering [Wright, 1977; Gomez et al., 2003]
      • BMU → mass point (m = 1)
      • "Move & merge" steps
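
A simplified "move & merge" loop over unit-mass points might look like the sketch below. It is a hedged illustration of the general idea, not the GSOM procedure from the slides: the parameter names `g`, `decay` and `merge_dist`, the inverse-square pull with a capped step, the decay of `g` after each iteration, and the midpoint merge rule are all assumptions.

```python
import math

def gravitational_clustering(points, g=0.001, decay=0.05,
                             merge_dist=0.05, max_iter=100):
    """Return one cluster label per input point (hypothetical simplified rule)."""
    pos = [list(p) for p in points]                # current particle positions
    members = [[i] for i in range(len(points))]    # input points carried by each particle
    for _ in range(max_iter):
        if len(pos) < 2:
            break
        # Move: every particle is displaced by the summed pull of the others (m = 1).
        moves = []
        for i, p in enumerate(pos):
            delta = [0.0] * len(p)
            for j, q in enumerate(pos):
                if i == j:
                    continue
                dist = math.dist(p, q)
                if dist == 0.0:
                    continue
                # Inverse-square pull, capped at half the distance per pair.
                pull = min(g / dist ** 2, dist / 2.0)
                for k in range(len(p)):
                    delta[k] += pull * (q[k] - p[k]) / dist
            moves.append(delta)
        for p, delta in zip(pos, moves):
            for k in range(len(p)):
                p[k] += delta[k]
        # Merge: fuse any pair closer than merge_dist; the fused particle keeps unit mass.
        merged = True
        while merged:
            merged = False
            for i in range(len(pos)):
                for j in range(i + 1, len(pos)):
                    if math.dist(pos[i], pos[j]) < merge_dist:
                        pos[i] = [(a + b) / 2.0 for a, b in zip(pos[i], pos[j])]
                        members[i].extend(members[j])
                        del pos[j], members[j]
                        merged = True
                        break
                if merged:
                    break
        g *= 1.0 - decay                           # weaken gravity so distant groups stay apart
    labels = [0] * len(points)
    for label, group in enumerate(members):
        for i in group:
            labels[i] = label
    return labels
```

Because the number of surviving particles is not fixed in advance, the number of clusters falls out of the loop rather than being supplied as an input.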

  17. EXPERIMENT
      • GSOM compared to
        - EM GMM [Dempster et al., 1977]
        - CS [Jenssen et al., 2003]
        - SOMkM [Vesanto & Alhoniemi, 2000]
      • datasets
        - 6 artificial (2D with complex shapes)
        - 3 real from UCI (Iris, Wine, LettersABC)
      • 100 runs of each algorithm; we measure:
        - Clustering Error (CE): minimal, average
        - elapsed time
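
The slides do not define CE. One common reading, assumed here, is the fraction of points left misassigned under the best one-to-one matching between predicted clusters and true classes. The sketch below uses that assumption; the function name `clustering_error` is hypothetical, and the brute-force search over matchings is only practical for a handful of clusters.

```python
from itertools import permutations

def clustering_error(true_labels, pred_labels):
    """Fraction of points misassigned under the best one-to-one label matching.

    Labels must be sortable and must not be None (None is used as padding).
    """
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad the true label set so that surplus predicted clusters map to "no class".
    size = max(len(true_ids), len(pred_ids))
    targets = true_ids + [None] * (size - len(true_ids))
    best_hits = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / len(true_labels)
```

Under this definition a clustering that matches the true partition up to a renaming of labels gets CE = 0, and every detected cluster beyond the true number can only add errors.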

  18. RESULTS – GIANT
      method    CE
      EM GMM    0.0
      CS        0.219
      SOMkM     0.352
      GSOM      0.0

  19. RESULTS – WAVE
      method    CE
      EM GMM    0.280
      CS        0.130
      SOMkM     0.126
      GSOM      0.0

  20. RESULTS – RANKS
      Figure: Mean Rank, for minimal CE and for average CE
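
A mean rank of this kind can be computed by ranking the methods on each dataset (lower CE is better) and averaging the ranks per method. The sketch below is illustrative only. It uses just the CE values read from the Giant and Wave result slides, so its output is not the figure's data, which covers all datasets. The pairing of values to methods follows the slide layout, and the average-rank handling of ties is an assumption.

```python
def average_ranks(scores):
    """Rank methods by score, lower is better; tied methods share the average rank."""
    ordered = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0      # mean of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks

def mean_ranks(table):
    """Average each method's rank over all datasets in the table."""
    per_dataset = [average_ranks(row) for row in table.values()]
    methods = per_dataset[0].keys()
    return {m: sum(r[m] for r in per_dataset) / len(per_dataset) for m in methods}

# CE values as read from the Giant and Wave slides (two datasets only).
ce = {
    "Giant": {"EM GMM": 0.0, "CS": 0.219, "SOMkM": 0.352, "GSOM": 0.0},
    "Wave": {"EM GMM": 0.280, "CS": 0.130, "SOMkM": 0.126, "GSOM": 0.0},
}
print(mean_ranks(ce))
```

On these two datasets alone GSOM gets the lowest mean rank (1.25), ahead of EM GMM (2.75), with CS and SOMkM tied at 3.0.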

  21. RESULTS – ELAPSED TIME
      Figure panels: Hepta (N = 212), LettersABC (N = 1719)

  22. RESULTS – NUMBER OF CLUSTERS
      • number of detected clusters

      dataset       true number   GSOM
      Giant         2             2
      Hepta         7             7
      Ring          2             4
      Wave          2             2
      Moon          4             4
      Flag          3             3
      Iris          3             3
      Wine          3             3
      LettersABC    3             7

  25. GSOM - SUMMARY
      + finds clusters of complex shapes, linearly non-separable
      + insensitive to unbalanced density of clusters
      + number of clusters automatically detected
      + usage of topology relations – neighbourhood
      + less computationally intensive
      + intuitive
      - 8 parameters to adjust
      - sometimes unstable behaviour

  26. FUTURE WORK
      • implementing heuristics for setting parameters automatically
      • study of clustering ensembles based on GSOM
        - could the non-deterministic nature of GSOM be an advantage?
      • application of GSOM to clustering of gene expression data

  27. DATASETS PROPERTIES

      dataset       number of points   number of dimensions   number of clusters
      Giant         862                2                      2
      Hepta         212                2                      7
      Ring          800                2                      2
      Wave          293                2                      2
      Moon          514                2                      4
      Flag          640                2                      3
      Iris          150                4                      3
      Wine          178                13                     3
      LettersABC    1719               16                     3

  28. GSOM PARAMETERS SETTING

      dataset       SOM size   SOM grid   𝐇        𝚬𝐇      α      p
      Giant         13 x 11    rect.      0.0008   0.045   0.01   0.1
      Hepta         9 x 8      rect.      0.0008   0.060   0.01   0.1
      Ring          11 x 10    rect.      0.0008   0.045   0.01   0.1
      Wave          14 x 12    rect.      0.0008   0.045   0.01   0.1
      Moon          20 x 10    rect.      0.0008   0.045   0.01   0.0
      Flag          14 x 9     rect.      0.0008   0.045   0.01   0.1
      Iris          12 x 5     rect.      0.0008   0.045   0.01   0.1
      Wine          7 x 5      rect.      0.0008   0.030   0.01   0.1
      LettersABC    12 x 9     rect.      0.0010   0.030   0.01   0.1
