

SLIDE 1

LUDWIG-MAXIMILIANS-UNIVERSITÄT MÜNCHEN, INSTITUTE FOR INFORMATICS, DATABASE SYSTEMS GROUP

Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?

MultiClust@KDD 2010

Hans-Peter Kriegel, Arthur Zimek

Ludwig-Maximilians-Universität München
Munich, Germany
http://www.dbs.ifi.lmu.de
{kriegel, zimek}@dbs.ifi.lmu.de

SLIDE 2

Outline

  • 1. Subspace Clustering
  • 2. Ensemble Clustering
  • 3. Alternative Clustering
  • 4. Multiview Clustering
  • 5. Discussion

Kriegel/Zimek: What can we learn from each other? (MultiClust@KDD 2010)

SLIDE 3

Subspace Clustering

  • Task: identify clusters of similar objects
  • similarity defined w.r.t. a certain subspace of the data space
  • different subspaces for different clusters
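
As a minimal illustration of similarity defined with respect to a subspace, the sketch below compares two objects using only a chosen subset of attributes. The function name `subspace_distance`, the toy objects, and the choice of Euclidean distance are assumptions made for this example; the slides do not prescribe a particular distance.

```python
import math

def subspace_distance(p, q, dims):
    """Euclidean distance between p and q, restricted to the attribute indices in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

# Two toy objects with three attributes (illustrative values only).
a = (1.0, 5.0, 0.2)
b = (1.1, 9.0, 0.9)

full = subspace_distance(a, b, dims=(0, 1, 2))  # all attributes
sub = subspace_distance(a, b, dims=(0,))        # attribute 0 only
print(f"full space: {full:.3f}, subspace (0,): {sub:.3f}")
```

The two objects are far apart in the full space but close in the subspace spanned by attribute 0, so whether they count as similar depends on the subspace considered.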


SLIDE 4

Subspace Clustering

  • Subspaces: different selection / weighting / combination of attributes
  • learn subspace and clustering simultaneously (interdependency)
  • strategies:
    – top-down (learn spatial characteristics of initially built sets of objects)
    – bottom-up (learn 1-d clusters, combine them to 2-d clusters, etc. (APRIORI)) => many irrelevant clusters

[Figure labels: irrelevant attribute; relevant attribute / relevant subspace]
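
The bottom-up strategy can be sketched with a simple grid: find dense intervals per attribute, then join subspaces level by level, APRIORI style. This is a simplified illustration and not a specific published algorithm. The names `dense_units` and `bottom_up`, the equal-width grid, the assumption that attribute values lie in [0, 1), and the parameters `n_bins` and `min_count` are all choices made for this example. Pruning is done per subspace rather than per grid cell to keep the code short.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, dims, n_bins, min_count):
    """Grid cells in subspace `dims` that contain at least min_count points.

    Assumes attribute values in [0, 1), cut into n_bins equal-width intervals.
    """
    cells = Counter(
        tuple(min(int(p[d] * n_bins), n_bins - 1) for d in dims) for p in points
    )
    return {cell for cell, count in cells.items() if count >= min_count}

def bottom_up(points, n_dims, n_bins=4, min_count=3):
    """Grow dense subspaces level by level (1-d, 2-d, ...)."""
    result = {}
    level = []
    # Level 1: single attributes with at least one dense interval.
    for d in range(n_dims):
        units = dense_units(points, (d,), n_bins, min_count)
        if units:
            result[(d,)] = units
            level.append((d,))
    # Level k+1: join k-d subspaces; keep a candidate only if all of its
    # k-d projections were dense (the APRIORI-style pruning step).
    while level:
        candidates = set()
        for s, t in combinations(level, 2):
            merged = tuple(sorted(set(s) | set(t)))
            if len(merged) == len(s) + 1 and all(
                sub in result for sub in combinations(merged, len(s))
            ):
                candidates.add(merged)
        level = []
        for dims in sorted(candidates):
            units = dense_units(points, dims, n_bins, min_count)
            if units:
                result[dims] = units
                level.append(dims)
    return result

# Toy data: four objects are close in attributes 0 and 1; attribute 2 is spread out.
points = [
    (0.10, 0.10, 0.05),
    (0.12, 0.15, 0.30),
    (0.15, 0.12, 0.55),
    (0.18, 0.20, 0.80),
    (0.90, 0.60, 0.95),
]
found = bottom_up(points, n_dims=3)
for dims, units in sorted(found.items()):
    print(dims, sorted(units))
```

On this toy data the same group of objects is reported for subspaces (0,), (1,), and (0, 1), which is a small instance of the redundancy that the bottom-up strategy tends to produce.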

SLIDE 5

Ensemble Clustering

  • basic idea: combine different clusterings to obtain one single, more reliable clustering
  • tasks:
    – how to create diverse clusterings
    – how to combine different clusterings
  • induce diversity of clusterings
    – use different feature-subsets
    – use different database subsets
    – use different clustering algorithms
  • correspondence between clusterings
    – useful for judging on redundancy of clusters?
    – a lot of different answers
    – but: could it not be that different clusterings are just different, yet both meaningful?
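
One common way to combine clusterings without matching cluster labels is a co-association matrix: count how often two objects share a cluster, then link pairs that co-occur often enough. The sketch below is one such option under stated assumptions, not the method advocated by the slides. The function names, the `threshold` parameter, and the toy base clusterings are hypothetical.

```python
def co_association(clusterings, n):
    """Fraction of base clusterings that put objects i and j in the same cluster."""
    co = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(clusterings)
    return co

def consensus(clusterings, n, threshold=0.5):
    """Link objects that co-occur in more than `threshold` of the clusterings
    and return the connected components as consensus labels."""
    co = co_association(clusterings, n)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three base clusterings of six objects, e.g. obtained from different feature subsets.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # label names are arbitrary; object 2 is grouped differently
    [0, 0, 0, 2, 2, 1],  # object 5 is grouped differently
]
print(consensus(base, n=6))
```

The majority vote absorbs the two deviating assignments. A base clustering that is genuinely different yet meaningful would be outvoted in the same way.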


SLIDE 6

Alternative Clustering

  • given a clustering, use diversity or non-redundancy as a constraint to find a different clustering
  • techniques:
    – ensemble techniques
    – use different subspaces
  • relationship to subspace clustering:
    – subspace clustering can learn from the treatment of non-redundancy
    – alternative clustering can learn to allow for a certain level of redundancy


SLIDE 7

Multiview Clustering

  • seek different clusterings in different subspaces
  • special case of alternative clustering?
    – constraint: orthogonality of subspaces
  • special case of subspace clustering?
    – allowing maximal overlap of clusters
    – seeking minimally redundant clusters by accommodating different concepts
  • emphasizes the observation known from subspace clustering: highly overlapping clusters in different subspaces need not be redundant nor meaningless
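
The observation about overlapping clusters can be made concrete by measuring overlap separately for objects and for attributes. In the sketch below a subspace cluster is modelled as a pair (object ids, attribute ids). This representation, the helper `jaccard`, and the example clusters are assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical subspace clusters: (set of object ids, set of attribute ids).
cluster_a = ({1, 2, 3, 4, 5}, {0, 1})
cluster_b = ({1, 2, 3, 4, 6}, {2, 3})

object_overlap = jaccard(cluster_a[0], cluster_b[0])
subspace_overlap = jaccard(cluster_a[1], cluster_b[1])
print(f"object overlap: {object_overlap:.2f}, subspace overlap: {subspace_overlap:.2f}")
```

The two clusters share most of their objects but none of their attributes, so a redundancy test based on objects alone would discard one of them even though they may describe different concepts.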


SLIDE 8

Discussion

subspace clustering

  • goal: different clusters in different subspaces
  • problem: redundancy of clusters (same clusters reported for different subspaces)

ensemble clustering

  • goal: different subspaces shall induce the same clusters
  • problem: correspondence of clusterings? What about actually different clusterings?

multiview clustering

  • goal: find different cluster concepts in different subspaces
  • problem: balance between admissible overlap of clusters and difference between concepts

alternative clustering

  • goal: given a clustering, find a different clustering
  • problem: which level of redundancy is admissible?

SLIDE 9

Discussion

  • how should we treat diversity of clustering solutions?
    – should diverse clusterings always be unified (ensemble)?
    – under which conditions is a unification of diverse clusterings meaningful?
  • can we learn from diversity itself?
    – again ensemble: exceptional clustering in one subspace will be outnumbered and lost – could it not be especially interesting?
  • how to treat redundancy (esp. overlap)?
    – when does a cluster qualify as redundant w.r.t. another cluster, when does it represent a different concept (despite a certain overlap)?

[Figure: scale from high redundancy (subspace clustering) to low redundancy (alternative clustering), with "?" in between]

  • how to assess similarity between clustering solutions?
    – possible overlap between clusters makes this problem really difficult
    – no simple mapping
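
For clusterings without overlap, pair counting avoids the need for a mapping between cluster labels. The sketch below computes the Rand index, a standard pair-counting measure; the function name and the toy label lists are chosen for this example. It assumes every object carries exactly one label per clustering, so it does not cover overlapping clusters, which is where the difficulty noted above begins.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair agrees if both partitions put it in the same cluster, or both
    separate it. Requires at least two objects.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

x = [0, 0, 0, 1, 1, 1]
y = [1, 1, 1, 0, 0, 0]  # same partition as x, labels renamed
z = [0, 0, 1, 1, 2, 2]  # a different partition

print(rand_index(x, y))
print(rand_index(x, z))
```

Renaming labels does not change the score, so no explicit correspondence between clusters is needed. Once an object may belong to several clusters, "same cluster" is no longer a yes/no property of a pair and the measure would have to be redefined.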
