Fuzzy Clustering in Parallel Universes, Bernd Wiswedel and Michael R. Berthold



SLIDE 1

Motivation Clustering in Parallel Experiments Remarks

Fuzzy Clustering in Parallel Universes

Bernd Wiswedel Michael R. Berthold

ALTANA Chair for Applied Computer Science Bioinformatics and Information Mining University of Konstanz, Germany

What is to be presented at Nafips’05

Bernd Wiswedel, Michael R. Berthold Fuzzy Clustering in Parallel Universes

SLIDE 2

Outline

◮ Motivation
  ◮ Parallel Universes
◮ Fuzzy Clustering in Parallel Universes
  ◮ Recall Fuzzy c-Means (One Universe)
  ◮ A new objective function
  ◮ An Algorithm for Parallel Universes
◮ Experiments
◮ Remarks

SLIDE 3

What are these Parallel Universes?

◮ So far: Data given in a single descriptor space
  ◮ Mostly high-dimensional and numeric
  ◮ Definition of exactly one similarity measure
◮ Now: Multiple representations for the data: Parallel Universes
  ◮ Universes encode different properties of the data
  ◮ Different similarity measures
  ◮ Standard learning techniques not applicable

SLIDE 4

Parallel Universes: Bioinformatics Example

◮ Molecular Data Analysis: Descriptors based on
  ◮ Fingerprints
  ◮ Numerical features derived from 3D shapes, surface charge distribution, etc.
  ◮ Comparisons of chemical graphs
◮ Different descriptors can encode different structural information
◮ None of them on its own shows satisfactory prediction results

SLIDE 5

Parallel Universes: Other Examples

◮ Web Mining: Documents described by
  ◮ Bag of Words, TF-IDF
  ◮ Anchor text of hyperlinks that point to the document
◮ Image Clustering:
  ◮ Textual description
  ◮ Color histograms
  ◮ Textures

SLIDE 6

Parallel Universes: Artificial Example

[Figure: three plots of the artificial data, one per universe (Universe 1, Universe 2, Universe 3); both axes carry ticks from 0.2 to 1.]

SLIDE 7

Parallel Universes: Artificial Example

[Figure: two sets of three plots of the artificial data, one plot per universe (Universe 1, Universe 2, Universe 3) in each set; both axes carry ticks from 0.2 to 1.]

SLIDE 8

Parallel Universes: Challenges

◮ Naive approaches (Michael's famous Holzhammer, i.e. sledgehammer, method):
  ◮ Consider only one universe at a time: ignores information in other universes
  ◮ Construct a joint feature space: difficult and introduces artefacts
◮ Need for tools that:
  ◮ Incorporate all universes at once
  ◮ Allow identifying (local) clusters that occur in only a few universes (or just one)

SLIDE 9

Recall Fuzzy c-Means

◮ One universe with a pre-defined distance measure
◮ Iteratively computes partitioning values $v_{i,k}$ and cluster representatives $\vec{w}_k$
◮ Minimizes the objective function

\[
\sum_{i=1}^{|T|} \sum_{k=1}^{c} v_{i,k}^{m} \, d(\vec{w}_k, \vec{x}_i)^2 ,
\quad \text{subject to } \forall i: \sum_{k=1}^{c} v_{i,k} = 1
\]
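The single-universe case can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes squared Euclidean distance for d, and the function name `fuzzy_c_means`, its parameters, and the stopping rule are choices made here for the example.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch (assumes squared Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Random initial partition matrix v of shape (|T|, c); each row sums to 1.
    v = rng.random((X.shape[0], c))
    v /= v.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Prototypes w_k: means of the patterns weighted by v^m.
        vm = v ** m
        w = (vm.T @ X) / vm.sum(axis=0)[:, None]
        # Squared distances d(w_k, x_i)^2, shape (|T|, c).
        # The small constant avoids division by zero.
        d2 = ((X[:, None, :] - w[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Partition update: v_ik proportional to d2^(-1/(m-1)),
        # normalized so that each pattern's values sum to 1.
        inv = d2 ** (-1.0 / (m - 1.0))
        v_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(v_new - v).max() < tol
        v = v_new
        if converged:
            break
    # Value of the objective function for the final v and w.
    objective = ((v ** m) * d2).sum()
    return v, w, objective
```

Calling `fuzzy_c_means(X, c=2)` on an array `X` of shape (|T|, dim) returns the partition matrix, the prototypes, and the objective value; the constraint on the partitioning values holds by construction because each row is normalized.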

SLIDE 10

A new Objective Function for Parallel Universes

◮ Multiple universes (denoted by $u$)
◮ Memberships of patterns to universes, $z_{i,u}$
◮ Minimizes the objective function

\[
\sum_{i=1}^{|T|} \sum_{u=1}^{|U|} z_{i,u}^{n} \sum_{k=1}^{c_u} v_{i,k,u}^{m} \, d_u(\vec{w}_{k,u}, \vec{x}_{i,u})^2 ,
\]

subject to

\[
\forall i: \sum_{u=1}^{|U|} z_{i,u} = 1 ,
\qquad
\forall i, u: \sum_{k=1}^{c_u} v_{i,k,u} = 1 .
\]
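To make the index structure concrete, the following sketch evaluates this objective for given memberships and prototypes. It is an illustration under assumptions: every d_u is taken to be squared Euclidean distance, and the function name `parallel_universe_objective` and the list-of-arrays layout are hypothetical choices for this example.

```python
import numpy as np

def parallel_universe_objective(X, W, V, Z, m=2.0, n=2.0):
    """Evaluate the parallel-universe objective (squared Euclidean d_u assumed).

    X[u]: (|T|, dim_u) patterns as seen in universe u
    W[u]: (c_u, dim_u) cluster prototypes in universe u
    V[u]: (|T|, c_u)   partitioning values v_{i,k,u}
    Z:    (|T|, |U|)   memberships of patterns to universes z_{i,u}
    """
    total = 0.0
    for u, (x, w, v) in enumerate(zip(X, W, V)):
        # Squared distances d_u(w_{k,u}, x_{i,u})^2, shape (|T|, c_u).
        d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
        # Inner sum over the clusters of universe u, one value per pattern.
        per_pattern = ((v ** m) * d2).sum(axis=1)
        # Weight by z_{i,u}^n and sum over all patterns.
        total += ((Z[:, u] ** n) * per_pattern).sum()
    return total
```

Because each universe keeps its own array, the universes may differ in dimensionality and in the number of clusters c_u, which a single joint feature matrix would not allow.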

SLIDE 11

An Algorithm for Parallel Universes

Select:
(1) Distance metrics $d_u(\cdot, \cdot)^2$; number of clusters $c_u$
Initiate:
(2) Partition matrices $v_{i,k,u}$ and cluster prototypes $\vec{w}_{k,u}$ randomly; equal weight for all memberships, $z_{i,u} = 1/|U|$
Train:
(3) Repeat
(4)   Update partitioning values ($v_{i,k,u}$)
(5)   Update memberships of patterns to universes ($z_{i,u}$)
(6)   Compute prototypes ($\vec{w}_{k,u}$)
(7) until a termination criterion has been satisfied
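The slides do not spell out the update rules, so the sketch below fills them in as an assumption: with squared Euclidean distances, setting the derivatives of the Lagrangian of the objective to zero gives partitioning values proportional to $d_u^{-2/(m-1)}$, universe memberships proportional to $D_{i,u}^{-1/(n-1)}$ with $D_{i,u} = \sum_k v_{i,k,u}^m d_u(\vec{w}_{k,u}, \vec{x}_{i,u})^2$, and prototypes as means weighted by $z_{i,u}^n v_{i,k,u}^m$. The function name, the termination test, and the choice to draw initial prototypes from the data are hypothetical. Because partitioning values are updated first in each pass, only the prototypes need a random start here.

```python
import numpy as np

def sq_dists(x, w):
    """Squared Euclidean distances, shape (|T|, c_u); epsilon avoids zeros."""
    return ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2) + 1e-12

def fuzzy_c_means_parallel(X, c, m=2.0, n=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternating optimization sketch; X is a list with one array per universe."""
    rng = np.random.default_rng(seed)
    T, U = X[0].shape[0], len(X)
    # Initiate: prototypes drawn at random from the patterns, z_{i,u} = 1/|U|.
    W = [x[rng.choice(T, size=cu, replace=False)].copy() for x, cu in zip(X, c)]
    Z = np.full((T, U), 1.0 / U)
    V = [None] * U
    for _ in range(n_iter):
        Z_old = Z.copy()
        D = np.empty((T, U))
        for u in range(U):
            # Update partitioning values within universe u.
            d2 = sq_dists(X[u], W[u])
            inv = d2 ** (-1.0 / (m - 1.0))
            V[u] = inv / inv.sum(axis=1, keepdims=True)
            # Weighted distance of each pattern to the clustering of universe u.
            D[:, u] = ((V[u] ** m) * d2).sum(axis=1)
        # Update memberships of patterns to universes.
        inv = D ** (-1.0 / (n - 1.0))
        Z = inv / inv.sum(axis=1, keepdims=True)
        for u in range(U):
            # Compute prototypes as means weighted by z^n * v^m.
            g = (Z[:, u:u + 1] ** n) * (V[u] ** m)
            W[u] = (g.T @ X[u]) / g.sum(axis=0)[:, None]
        # Termination criterion (an assumption): memberships stop changing.
        if np.abs(Z - Z_old).max() < tol:
            break
    return V, Z, W
```

A pattern that lies close to a prototype in one universe and far from all prototypes elsewhere receives a high z value for that universe, so it contributes little to the prototypes of the others.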

SLIDE 12

Experiments

◮ Artificial data with up to 5 universes
◮ Patterns were randomly assigned a universe in which they cluster
◮ In all other universes they represent noise
◮ Compared to a joint feature space with the “correct” number of clusters
◮ Entropy-based error measure: 1 for a good clustering, 0 for a bad one
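A data set of this kind could be generated roughly as follows. Everything beyond the description above is an assumption made for illustration: the unit-square value range, Gaussian cluster shapes, uniform noise, the parameter defaults, and the name `make_parallel_universe_data` are not taken from the slides.

```python
import numpy as np

def make_parallel_universe_data(n_patterns=300, n_universes=3,
                                clusters_per_universe=2, dim=2,
                                spread=0.03, seed=0):
    """Hypothetical generator: each pattern clusters in exactly one universe."""
    rng = np.random.default_rng(seed)
    # Randomly assign every pattern a "home" universe in which it clusters.
    home = rng.integers(0, n_universes, size=n_patterns)
    # Within its home universe, each pattern belongs to one cluster.
    cluster = rng.integers(0, clusters_per_universe, size=n_patterns)
    centers = rng.uniform(0.2, 0.8,
                          size=(n_universes, clusters_per_universe, dim))
    X = []
    for u in range(n_universes):
        # By default a pattern is uniform noise in this universe.
        x = rng.uniform(0.0, 1.0, size=(n_patterns, dim))
        # Patterns whose home is u are drawn around their cluster center.
        mask = home == u
        x[mask] = centers[u, cluster[mask]] + rng.normal(
            0.0, spread, size=(int(mask.sum()), dim))
        X.append(x)
    return X, home, cluster
```

The returned `home` array is the ground truth against which learned universe memberships could be compared.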

SLIDE 13

Experiments

[Figure: two sets of three plots, one plot per universe (Universe 1, Universe 2, Universe 3) in each set; both axes carry ticks from 0.2 to 1.]

SLIDE 15

Experiments

SLIDE 16

Conclusions

◮ Data given in multiple descriptor spaces, i.e. Parallel Universes
◮ Extended fuzzy c-Means that uses membership values of patterns to universes
◮ “Good” results on artificial data
◮ Open problems: number of clusters, noise, overlapping clusters

SLIDE 17

What did ICML reviewers say?

◮ “Why should the contribution of an example to one view be inverse proportional to the contribution in other views?”
◮ “Relationship to subspace clustering algorithms?”
◮ “You do not justify why you prefer a fuzzy approach compared to a clean probabilistic one?”
◮ “I find it strange to assign objects to views.”
