fuzzy clustering in parallel universes


Sections: Motivation, Clustering in Parallel, Experiments, Remarks. Fuzzy Clustering in Parallel Universes. Bernd Wiswedel, Michael R. Berthold. ALTANA Chair for Applied Computer Science, Bioinformatics and Information Mining, University of Konstanz, Germany.


  1. Fuzzy Clustering in Parallel Universes. Bernd Wiswedel, Michael R. Berthold. ALTANA Chair for Applied Computer Science, Bioinformatics and Information Mining, University of Konstanz, Germany. What is to be presented at NAFIPS’05.

  2. Outline
     ◮ Motivation
       ◮ Parallel Universes
     ◮ Fuzzy Clustering in Parallel Universes
       ◮ Recall Fuzzy c-Means (One Universe)
       ◮ A new objective function
       ◮ An Algorithm for Parallel Universes
     ◮ Experiments
     ◮ Remarks

  3. What are these Parallel Universes?
     ◮ So far: Data given in a single descriptor space
       ◮ Mostly high-dimensional and numeric
       ◮ Definition of exactly one similarity measure
     ◮ Now: Multiple representations for the data: Parallel Universes
       ◮ Universes encode different properties of the data
       ◮ Different similarity measures
       ◮ Standard learning techniques not applicable

  4. Parallel Universes: Bioinformatics Example
     ◮ Molecular Data Analysis: Descriptors based on
       ◮ Fingerprints
       ◮ Numerical features derived from 3D shapes, surface charge distribution, etc.
       ◮ Comparisons of chemical graphs
     ◮ Different descriptors can encode different structural information
     ◮ None of them shows satisfactory prediction results

  5. Parallel Universes: Other Examples
     ◮ Web Mining: Documents described by
       ◮ Bag of Words, TF-IDF
       ◮ Anchor text of hyperlinks that point to the document
     ◮ Image Clustering:
       ◮ Textual description
       ◮ Color histograms
       ◮ Textures

  6. Parallel Universes: Artificial Example [Figure: three plots, one each for Universe 1, Universe 2 and Universe 3; both axes run from 0 to 1]

  7. Parallel Universes: Artificial Example [Figure: two rows of three plots, each row with one panel for Universe 1, Universe 2 and Universe 3; both axes run from 0 to 1]

  8. Parallel Universes: Challenges
     ◮ Naive Approach (Michael’s famous Holzhammer, i.e. sledgehammer):
       ◮ Consider only one universe at a time: ignores information in other universes
       ◮ Construct a joint feature space: difficult and introduces artefacts
     ◮ Need for tools that:
       ◮ Incorporate all universes at once
       ◮ Allow identifying (local) clusters that occur in only very few universes (or just one)

  9. Recall Fuzzy c-Means
     ◮ One universe with a pre-defined distance measure
     ◮ Iteratively computes partitioning values $v_{i,k}$ and cluster representatives $\vec{w}_k$
     ◮ Minimizes the objective function
       $$\sum_{i=1}^{|T|} \sum_{k=1}^{c} v_{i,k}^{m} \, d(\vec{w}_k, \vec{x}_i)^2,$$
       subject to:
       $$\forall i: \sum_{k=1}^{c} v_{i,k} = 1$$
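The single-universe case can be sketched in a few lines of plain Python. The slide only states the objective and its constraint; the two update rules below are the standard fuzzy c-means alternating updates, and the squared Euclidean distance, the fuzzifier default `m=2.0`, the small `eps` guard and all function names are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice for d(w, x)^2)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fcm_objective(X, W, V, m=2.0):
    """Sum over patterns i and clusters k of v_ik^m * d(w_k, x_i)^2."""
    return sum(V[i][k] ** m * sq_dist(W[k], X[i])
               for i in range(len(X)) for k in range(len(W)))

def update_memberships(X, W, m=2.0, eps=1e-12):
    """v_ik proportional to (d_ik^2)^(-1/(m-1)); each row sums to 1."""
    V = []
    for x in X:
        inv = [(sq_dist(w, x) + eps) ** (-1.0 / (m - 1.0)) for w in W]
        total = sum(inv)
        V.append([t / total for t in inv])
    return V

def update_prototypes(X, V, c, m=2.0):
    """Each prototype is the mean of the patterns weighted by v_ik^m."""
    dim = len(X[0])
    W = []
    for k in range(c):
        weights = [V[i][k] ** m for i in range(len(X))]
        total = sum(weights)
        W.append([sum(wt * x[j] for wt, x in zip(weights, X)) / total
                  for j in range(dim)])
    return W

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Alternate both updates for a fixed number of iterations."""
    rng = random.Random(seed)
    W = [list(x) for x in rng.sample(X, c)]  # prototypes start on data points
    for _ in range(iters):
        V = update_memberships(X, W, m)
        W = update_prototypes(X, V, c, m)
    return W, update_memberships(X, W, m)
```

Calling `fuzzy_c_means(X, c=2)` on a list of equal-length coordinate lists returns the prototypes and the partition matrix; a fixed iteration count is used here in place of a proper termination criterion.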

  10. A new Objective Function for Parallel Universes
      ◮ Multiple universes (denoted with $u$)
      ◮ Memberships of patterns to universes, $z_{i,u}$
      ◮ Minimizes the objective function
        $$\sum_{i=1}^{|T|} \sum_{u=1}^{|U|} z_{i,u}^{n} \sum_{k=1}^{c_u} v_{i,k,u}^{m} \, d_u(\vec{w}_{k,u}, \vec{x}_{i,u})^2,$$
        subject to:
        $$\forall i: \sum_{u=1}^{|U|} z_{i,u} = 1$$
        $$\forall i, u: \sum_{k=1}^{c_u} v_{i,k,u} = 1.$$
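To make the index structure of this objective concrete, the following sketch evaluates it for given memberships and prototypes and checks the two constraints. The nested-list layout (`X[u][i]`, `W[u][k]`, `V[u][i][k]`, `Z[i][u]`), the function names and the squared Euclidean distance used for every $d_u$ are assumptions for illustration, not part of the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed here for every universe)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def parallel_objective(X, W, V, Z, m=2.0, n=2.0):
    """Sum over i, u of z_iu^n * sum over k of v_iku^m * d_u(w_ku, x_iu)^2."""
    total = 0.0
    for u in range(len(X)):
        for i in range(len(X[u])):
            inner = sum(V[u][i][k] ** m * sq_dist(W[u][k], X[u][i])
                        for k in range(len(W[u])))
            total += Z[i][u] ** n * inner
    return total

def constraints_hold(V, Z, tol=1e-9):
    """Check sum_u z_iu = 1 for all i and sum_k v_iku = 1 for all i, u."""
    z_ok = all(abs(sum(row) - 1.0) < tol for row in Z)
    v_ok = all(abs(sum(row) - 1.0) < tol for part in V for row in part)
    return z_ok and v_ok

# One pattern, two one-dimensional universes, one cluster per universe.
X = [[[0.0]], [[2.0]]]
W = [[[1.0]], [[0.0]]]
V = [[[1.0]], [[1.0]]]
Z = [[0.5, 0.5]]
value = parallel_objective(X, W, V, Z)  # 0.25 * 1 + 0.25 * 4 = 1.25
```

With a single universe and $z_{i,u} = 1$ the expression reduces to the fuzzy c-means objective.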

  11. An Algorithm for Parallel Universes
      Select: Distance metrics $d_u(\cdot,\cdot)^2$; number of clusters $c_u$
      Initiate: Partition matrices $v_{i,k,u}$ and cluster prototypes $\vec{w}_{k,u}$ randomly; equal weight for all memberships, $z_{i,u} = \frac{1}{|U|}$
      Train:
        Repeat
          Update partitioning values ($v_{i,k,u}$)
          Update memberships of patterns to universes ($z_{i,u}$)
          Compute prototypes ($\vec{w}_{k,u}$)
        until a termination criterion has been satisfied
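The training loop can be sketched as follows. The slides do not give the update equations, so the ones used here are an assumption: they are what a Lagrange-multiplier derivation yields for the stated objective when every $d_u$ is the squared Euclidean distance. Also assumed are the data layout `X[u][i]`, the prototype initialisation from randomly chosen data points (the slide initialises the partition matrices randomly as well), the `eps` guard, and the termination criterion based on prototype movement.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (assumed for every universe)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def normalized_inverse(costs, exponent, eps=1e-12):
    """Weights proportional to cost^(-1/(exponent-1)), scaled to sum to 1."""
    inv = [(c + eps) ** (-1.0 / (exponent - 1.0)) for c in costs]
    total = sum(inv)
    return [t / total for t in inv]

def cluster_parallel_universes(X, c, m=2.0, n=2.0, iters=100, tol=1e-8, seed=0):
    """X[u][i]: pattern i in universe u; c[u]: cluster count in universe u."""
    rng = random.Random(seed)
    n_univ, n_pat = len(X), len(X[0])
    # Initiate: prototypes on random data points, z_iu = 1/|U|.
    W = [[list(x) for x in rng.sample(X[u], c[u])] for u in range(n_univ)]
    Z = [[1.0 / n_univ] * n_univ for _ in range(n_pat)]
    V = []
    for _ in range(iters):
        # Update partitioning values v_iku inside each universe.
        D = [[[sq_dist(w, X[u][i]) for w in W[u]] for i in range(n_pat)]
             for u in range(n_univ)]
        V = [[normalized_inverse(D[u][i], m) for i in range(n_pat)]
             for u in range(n_univ)]
        # Update memberships z_iu from the per-universe cost of pattern i.
        for i in range(n_pat):
            cost = [sum(v ** m * d for v, d in zip(V[u][i], D[u][i]))
                    for u in range(n_univ)]
            Z[i] = normalized_inverse(cost, n)
        # Compute prototypes as means weighted by z_iu^n * v_iku^m.
        shift = 0.0
        for u in range(n_univ):
            for k in range(c[u]):
                wts = [Z[i][u] ** n * V[u][i][k] ** m for i in range(n_pat)]
                total = sum(wts)
                new = [sum(wt * X[u][i][j] for i, wt in enumerate(wts)) / total
                       for j in range(len(X[u][0]))]
                shift = max(shift, sq_dist(new, W[u][k]))
                W[u][k] = new
        if shift < tol:  # assumed termination criterion
            break
    return W, V, Z
```

The returned `Z[i][u]` can be read as the degree to which pattern `i` is explained by universe `u`; a universe in which the pattern lies far from all prototypes receives a small weight.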

  12. Experiments
      ◮ Artificial data with up to 5 universes
      ◮ Patterns were randomly assigned a universe in which they cluster
      ◮ In all other universes they represent noise
      ◮ Compared to a joint feature space with the “correct” number of clusters
      ◮ Entropy-based error measure: 1 for a good clustering, 0 for a bad one
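A data set of the kind described above could be generated along the following lines. The slides give no generation details, so everything specific here is an assumption: two-dimensional universes on the unit square, Gaussian clusters with a hypothetical `spread`, uniform noise, and the function and parameter names.

```python
import random

def make_parallel_data(n_patterns, centers, spread=0.05, seed=0):
    """centers[u] lists hypothetical 2-D cluster centers for universe u.

    Returns X with X[u][i] the point of pattern i in universe u, and
    home with home[i] the universe in which pattern i clusters.
    """
    rng = random.Random(seed)
    n_univ = len(centers)
    X = [[] for _ in range(n_univ)]
    home = []
    for _ in range(n_patterns):
        h = rng.randrange(n_univ)  # universe in which this pattern clusters
        home.append(h)
        for u in range(n_univ):
            if u == h:
                cx, cy = rng.choice(centers[u])
                point = [rng.gauss(cx, spread), rng.gauss(cy, spread)]
            else:
                # In every other universe the pattern is uniform noise.
                point = [rng.random(), rng.random()]
            X[u].append(point)
    return X, home

centers = [[(0.2, 0.2), (0.8, 0.8)], [(0.5, 0.5)], [(0.2, 0.8), (0.8, 0.2)]]
X, home = make_parallel_data(30, centers)
```

The `home` list plays the role of ground truth when judging whether a clustering assigned each pattern to the universe in which it actually clusters.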

  13. Experiments [Figure: two rows of three plots, each row with one panel for Universe 1, Universe 2 and Universe 3; both axes run from 0 to 1]

  15. Experiments

  16. Conclusions
      ◮ Data given in multiple descriptor spaces, i.e. Parallel Universes
      ◮ Extended fuzzy c-Means that uses membership values for patterns to universes
      ◮ “Good” results on artificial data
      ◮ Open problems: number of clusters, noise, overlapping clusters

  17. What did ICML reviewers say?
      ◮ “Why should the contribution of an example to one view be inverse proportional to the contribution in other views?”
      ◮ “Relationship to subspace clustering algorithms?”
      ◮ “You do not justify why you prefer a fuzzy approach compared to a clean probabilistic one?”
      ◮ “I find it strange to assign objects to views.”
