Ludwig-Maximilians-Universität München, Institute for Informatics, Database Systems Group
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering
Outline
- 1. Subspace Clustering
- 2. Ensemble Clustering
- 3. Alternative Clustering
- 4. Multiview Clustering
- 5. Discussion
Kriegel/Zimek: What can we learn from each other? (MultiClust@KDD 2010)
Subspace Clustering
- Task: identify clusters of similar objects
- similarity defined w.r.t. a certain subspace of the data space
- different subspaces for different clusters
Subspace Clustering
- Subspaces: different
  – selection
  – weighting
  – combination
  of attributes
- learn subspace and clustering simultaneously (interdependency)
- strategies:
  – top-down (learn spatial characteristics of initially built sets of objects)
  – bottom-up (learn 1-d clusters, combine them to 2-d clusters, etc. (APRIORI)) => many irrelevant clusters
[Figure: example data with axes labelled "irrelevant attribute" and "relevant attribute / relevant subspace"]
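The bottom-up strategy can be sketched as follows. This is a minimal grid-based illustration in the spirit of APRIORI-style candidate generation, not a specific published algorithm; the function names `dense_units_1d` and `bottom_up_2d`, the bin `width`, and the density threshold `min_pts` are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def dense_units_1d(points, width, min_pts):
    """Map each dense 1-d grid unit (attribute, bin) to the ids of the points in it."""
    units = defaultdict(set)
    for pid, p in enumerate(points):
        for attr, value in enumerate(p):
            units[(attr, int(value // width))].add(pid)
    return {u: ids for u, ids in units.items() if len(ids) >= min_pts}


def bottom_up_2d(points, width, min_pts):
    """Join dense 1-d units of different attributes into dense 2-d units."""
    one_d = dense_units_1d(points, width, min_pts)
    two_d = {}
    for (u, ids_u), (v, ids_v) in combinations(sorted(one_d.items()), 2):
        if u[0] == v[0]:
            continue  # same attribute: this pair does not span a 2-d subspace
        shared = ids_u & ids_v
        # A unit that is dense in 2-d must be dense in both 1-d projections,
        # so only pairs of dense 1-d units need to be checked.
        if len(shared) >= min_pts:
            two_d[(u, v)] = shared
    return one_d, two_d


# Toy data (assumed): attributes 0 and 1 carry a cluster, attribute 2 is spread out.
points = [
    (1.1, 5.2, 0.5), (1.3, 5.4, 3.5), (1.2, 5.1, 6.5), (1.4, 5.3, 9.5),
    (7.0, 2.0, 1.5), (8.5, 9.0, 4.5),
]
one_d, two_d = bottom_up_2d(points, width=1.0, min_pts=3)
print(one_d)
print(two_d)
```

On this toy data only attributes 0 and 1 yield dense units, and their join is the single 2-d candidate. With many attributes, the number of candidate combinations grows quickly, which is one way to read the "many irrelevant clusters" remark.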
Ensemble Clustering
- basic idea: combine different clusterings to obtain one single, more reliable clustering
- tasks:
  – how to create diverse clusterings
  – how to combine different clusterings
- induce diversity of clusterings:
  – use different feature subsets
  – use different database subsets
  – use different clustering algorithms
- correspondence between clusterings:
  – useful for judging on redundancy of clusters?
  – a lot of different answers
  – but: could it not be that different clusterings are just different, yet both meaningful?
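One common way to combine clusterings without solving the label correspondence problem directly is a co-association (evidence accumulation) approach. The sketch below is an illustration under assumptions, not the method of the talk: the function names `co_association` and `consensus` and the majority `threshold` are chosen for this example.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of objects shares a cluster."""
    n = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def consensus(labelings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the clusterings
    and return the connected components as one flat clustering."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), frac in co_association(labelings).items():
        if frac > threshold:
            parent[find(j)] = find(i)
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three base clusterings of six objects (assumed toy input).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # label names differ and object 2 is grouped differently
    [0, 0, 0, 1, 1, 1],
]
print(consensus(labelings))
```

Because only pairwise co-membership is counted, label names never need to be matched. The majority vote also shows the concern raised above: the deviating second clustering is simply outvoted, even if it was meaningful.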
Alternative Clustering
- given a clustering, use diversity or non-redundancy as a constraint to find a different clustering
- techniques:
  – ensemble techniques
  – use different subspaces
- relationship to subspace clustering:
  – subspace clustering can learn from the treatment of non-redundancy
  – alternative clustering can learn to allow for a certain level of redundancy
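The idea of using non-redundancy as a constraint can be illustrated with a small sketch. It assumes that redundancy between two flat clusterings is measured by the Rand index and that the alternative is picked from a fixed list of candidates; actual alternative clustering methods search for the clustering rather than select one. The names `rand_index`, `pick_alternative`, and `max_similarity` are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two flat clusterings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def pick_alternative(given, candidates, max_similarity):
    """Return the candidate least similar to `given` among those that satisfy
    the non-redundancy constraint, or None if no candidate is admissible."""
    admissible = [c for c in candidates if rand_index(given, c) <= max_similarity]
    return min(admissible, key=lambda c: rand_index(given, c), default=None)


given = [0, 0, 1, 1]
candidates = [
    [1, 1, 0, 0],  # same partition with renamed labels: fully redundant
    [0, 1, 0, 1],  # groups the objects along a different concept
]
alternative = pick_alternative(given, candidates, max_similarity=0.5)
print(alternative)
```

The parameter `max_similarity` makes the admissible level of redundancy explicit: a strict value rejects everything that resembles the given clustering, while a looser value tolerates some redundancy.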
Multiview Clustering
- seek different clusterings in different subspaces
- special case of alternative clustering?
  – constraint: orthogonality of subspaces
- special case of subspace clustering?
  – allowing maximal overlap of clusters
  – seeking minimally redundant clusters by accommodating different concepts
- emphasizes the observation known from subspace clustering: highly overlapping clusters in different subspaces need be neither redundant nor meaningless
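The last observation can be made concrete with a hypothetical redundancy rule: a cluster counts as redundant w.r.t. another only if both the object sets and the subspaces overlap strongly. The rule, the function `is_redundant`, and both thresholds are assumptions for illustration, not a criterion proposed in the talk.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_redundant(c1, c2, min_object_overlap=0.8, min_subspace_overlap=0.5):
    """Hypothetical rule: redundant only if objects AND subspaces overlap strongly."""
    objects1, attrs1 = c1
    objects2, attrs2 = c2
    return (jaccard(objects1, objects2) >= min_object_overlap
            and jaccard(attrs1, attrs2) >= min_subspace_overlap)


# Each cluster is (set of object ids, set of attribute indices spanning its subspace).
cluster_a = ({0, 1, 2, 3, 4}, {0, 1})
cluster_b = ({0, 1, 2, 3, 4}, {2, 3})     # same objects, orthogonal subspace
cluster_c = ({0, 1, 2, 3, 4}, {0, 1, 4})  # same objects, nearly the same subspace

print(is_redundant(cluster_a, cluster_b))
print(is_redundant(cluster_a, cluster_c))
```

Under this rule `cluster_b` is kept as a different concept despite containing exactly the same objects as `cluster_a`, whereas `cluster_c` is flagged as redundant.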
Discussion
subspace clustering
- goal: different clusters in different subspaces
- problem: redundancy of clusters (same clusters reported for different subspaces)
ensemble clustering
- goal: different subspaces shall induce the same clusters
- problem: correspondence of clusterings? What about actually different clusterings?
multiview clustering
- goal: find different cluster concepts in different subspaces
- problem: balance between admissible overlap of clusters and difference between concepts
alternative clustering
- goal: given a clustering, find a different clustering
- problem: which level of redundancy is admissible?
Discussion
- how should we treat diversity of clustering solutions?
  – should diverse clusterings always be unified (ensemble)?
  – under which conditions is a unification of diverse clusterings meaningful?
- can we learn from diversity itself?
  – again ensemble: exceptional clustering in one subspace will be outnumbered and lost; could it not be especially interesting?
- how to treat redundancy (esp. overlap)?
  – when does a cluster qualify as redundant w.r.t. another cluster, when does it represent a different concept (despite a certain overlap)?
[Figure: scale labelled "subspace clustering", "?", "alternative clustering" above "high redundancy" and "low redundancy"]
- how to assess similarity between clustering solutions?
  – possible overlap between clusters makes this problem really difficult
  – no simple mapping
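Why overlap rules out a simple mapping can be seen in a small sketch that compares two solutions cluster by cluster. It assumes clusters are given as sets of object ids and uses Jaccard similarity as the overlap measure; the function names and the toy solutions are made up for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two clusters given as sets of object ids."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_matrix(solution_a, solution_b):
    """Pairwise Jaccard similarities between the clusters of two solutions."""
    return [[jaccard(a, b) for b in solution_b] for a in solution_a]


def best_matches(solution_a, solution_b):
    """Index of the most similar cluster in B for each cluster in A.
    The result is not guaranteed to be a one-to-one mapping."""
    matrix = overlap_matrix(solution_a, solution_b)
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Objects 2 and 3 belong to two clusters in solution A (overlapping clusters).
solution_a = [{0, 1, 2, 3}, {2, 3, 4}]
solution_b = [{0, 1, 2, 3, 4}, {5}]
matrix = overlap_matrix(solution_a, solution_b)
matches = best_matches(solution_a, solution_b)
print(matrix)
print(matches)
```

Both clusters of solution A are best matched by the same cluster of solution B, while the second cluster of B has no counterpart at all, so a one-to-one correspondence between the two solutions does not exist here.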