Less is More: Non-Redundant Subspace Clustering
Ira Assent (Aalborg University, Denmark); Emmanuel Müller, Stephan Günnemann, Ralph Krieger, Thomas Seidl (RWTH Aachen University, Germany)
MultiClust Workshop at SIGKDD 2010, July 25, 2010

Detection of Non-Redundant Subspace Clusters I
[Figure: example clusters C1 to C6 over the attributes # boats in Miami, internet usage, sportive activities, and income]
Hidden clusters are described by different attribute sets.
Each object might be grouped in multiple clusters.
⇒ Novel challenges for subspace clustering

Detection of Non-Redundant Subspace Clusters II
[Figure: clusters C1, C3, C4 over the attributes # boats in Miami, # horses, freq. flyer miles, # cars, and income]
Subspace cluster: (rich; boat owner; car fan; globetrotter; horse fan)
Exponentially many projections: (rich), (boat owner), (rich; globetrotter), ...
Huge amount of redundant clusters
⇒ Typically number of clusters ≫ number of objects
⇒ Detection of all and only non-redundant subspace clusters

Overview
Main question: How can you use/extend non-redundant clustering ...
In this talk, we present:
  A survey of our contributions so far
  The generality of our techniques
  Our open source initiatives for the community
Research questions arise in the areas of:
  1. Effective Models
  2. Efficient Computation
  3. Evaluation and Exploration of Results

Notions and Related Work
Abstract subspace clustering definition
  Definition of object set O clustered in subspace S:
    C = (O, S) with O ⊆ DB, S ⊆ DIM
  Selection of result set M, a subset of all valid subspace clusters ALL:
    M = {(O₁, S₁), ..., (Oₙ, Sₙ)} ⊆ ALL
[Figure: lattice of all subspaces over dimensions 1 to 4, from the 1-D subspaces up to {1,2,3,4}]
Related work
  Subspace clustering: focus on definition of (O, S)
    ⇒ Output all valid subspace clusters, M = ALL (⇒ too many)
  Projected clustering: focus on definition of disjoint clusters in M
    ⇒ Unable to detect objects in multiple clusters (⇒ too few)
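The abstract definition C = (O, S) and the subspace lattice can be made concrete with a small sketch. The class name `SubspaceCluster` and the helper `subspace_lattice` are illustrative assumptions, not part of the talk or of any published implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SubspaceCluster:
    """Hypothetical container for C = (O, S)."""
    objects: frozenset  # O, a subset of the database DB (object ids)
    dims: frozenset     # S, a subset of the dimensions DIM


def subspace_lattice(dim_ids):
    """Yield every non-empty subset of dim_ids, from 1-D up to full-D."""
    dims = sorted(dim_ids)
    for k in range(1, len(dims) + 1):
        for combo in combinations(dims, k):
            yield frozenset(combo)


# The 4-dimensional lattice from the slide has 2^4 - 1 = 15 subspaces.
lattice = list(subspace_lattice({1, 2, 3, 4}))
example = SubspaceCluster(objects=frozenset({10, 11, 12}), dims=frozenset({1, 3}))
```

Because the number of subspaces grows as 2^d - 1, reporting every valid cluster in every subspace (M = ALL) quickly produces far more clusters than objects.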

Non-Redundant Subspace Clustering Models
Select M ⊆ ALL: Exclude redundant subspace clusters...
[Figure: overlapping clusters C1, C2, C3]
Local (pairwise) redundancy elimination [1][2]
  (O, S) is non-redundant iff ¬∃ (O′, S′) with O′ ⊆ O ∧ S′ ⊃ S ∧ |O′| ≥ R · |O|
  ⇒ Excludes large number of redundant subspace clusters
[1] Assent, Krieger, Müller and Seidl: DUSC: Dimensionality Unbiased Subspace Clustering, in ICDM 2007.
[2] Assent, Krieger, Müller and Seidl: INSCY: Indexing Subspace Clusters with In-Process-Removal of Redundancy, in ICDM 2008.
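The pairwise condition can be checked directly. The following is a minimal sketch that assumes clusters are given as (objects, dims) pairs of frozensets; the function names and the quadratic scan are illustrative choices, not the DUSC or INSCY implementation.

```python
def is_redundant(cluster, candidates, r):
    """True if some (O', S') satisfies O' ⊆ O, S' ⊃ S and |O'| >= r * |O|."""
    objs, dims = cluster
    for other_objs, other_dims in candidates:
        if (other_objs <= objs            # O' is a subset of O
                and other_dims > dims     # S' is a proper superset of S
                and len(other_objs) >= r * len(objs)):
            return True
    return False


def non_redundant(clusters, r):
    """Keep only the clusters that pass the pairwise redundancy test."""
    return [c for c in clusters if not is_redundant(c, clusters, r)]


# Toy data (made up): c_low is a 1-D projection of c_high and is dropped for r = 0.5.
c_high = (frozenset({1, 2, 3, 4}), frozenset({1, 2, 3}))
c_low = (frozenset({1, 2, 3, 4, 5}), frozenset({1}))
c_other = (frozenset({6, 7, 8}), frozenset({2}))
result = non_redundant([c_high, c_low, c_other], r=0.5)
```

With R close to 1, a low dimensional cluster is dropped only if a higher dimensional cluster retains nearly all of its objects; smaller R prunes more aggressively.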

Generalization of Redundancy Elimination
Relevant subspace clustering model [3]
  Include the most interesting subspace clusters
  Exclude redundant subspace clusters
  ⇒ Provide most relevant subspace clusters in result set
  ⇒ Extract novel knowledge with each cluster
[Figure: relevance model combining interestingness of clusters and redundancy of clusters to select the relevant clustering M from all possible clusters ALL]
Given any definition of subspace clusters C = (O, S)
  ⇒ Choose optimal subset M = {C₁, ..., Cₙ} ⊆ ALL
  Proof: Such an optimization is an NP-hard problem
[3] Müller, Assent, Günnemann, Krieger and Seidl: Relevant Subspace Clustering: Mining the Most Interesting Non-Redundant Concepts in High Dimensional Data, in ICDM 2009.
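Since choosing the optimal subset is NP-hard, one common way to approach such selection problems is a greedy heuristic. The sketch below is only an illustration of that general idea and is not the model or algorithm from [3]: the interestingness score (|O| · |S|), the overlap-based redundancy test, and the parameter `max_overlap` are all assumptions made for this example.

```python
def greedy_select(clusters, interestingness, max_overlap):
    """Pick clusters by descending score, skipping those that add little new.

    clusters: list of (objects, dims) pairs of non-empty frozensets.
    interestingness: hypothetical scoring function on a cluster.
    max_overlap: largest tolerated fraction of already covered objects.
    """
    selected = []
    covered = set()
    for objs, dims in sorted(clusters, key=interestingness, reverse=True):
        overlap = len(objs & covered) / len(objs)
        if overlap <= max_overlap:
            selected.append((objs, dims))
            covered |= objs
    return selected


def size_times_dims(cluster):
    """Assumed score: number of objects times number of dimensions."""
    objs, dims = cluster
    return len(objs) * len(dims)


a = (frozenset({1, 2, 3, 4}), frozenset({1, 2}))  # score 8
b = (frozenset({1, 2, 3}), frozenset({1}))        # score 3, fully covered by a
c = (frozenset({5, 6, 7}), frozenset({3, 4}))     # score 6, no overlap
chosen = greedy_select([a, b, c], size_times_dims, max_overlap=0.5)
```

A greedy pass like this gives no optimality guarantee; it only shows how interestingness and redundancy can be traded off when building M.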

Redundancy Pruning by Depth-First Processing
Pruning
  Applicable for local redundancy (simple pairwise model)
  Enables in-process pruning of redundant clusters
[Figure: two copies of the subspace lattice over dimensions 1 to 4, one labeled depth-first and one labeled breadth-first]
Step-by-step processing (k-D → (k+1)-D subspace!)
⇒ Scalability to high dimensional data?
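To see why depth-first processing allows in-process pruning, consider one possible traversal order of the lattice. The sketch below is an assumption for illustration, not the INSCY index structure: it walks a prefix tree of sorted dimension tuples and reports each subspace only after its descendants, so every higher dimensional superset has been visited before the subspace itself.

```python
def depth_first_order(dims):
    """Return all non-empty subspaces so that supersets precede subsets."""
    dims = sorted(dims)
    order = []

    def visit(prefix, start):
        for i in range(start, len(dims)):
            subspace = prefix + (dims[i],)
            visit(subspace, i + 1)   # descend to higher dimensional subspaces first
            order.append(subspace)   # report this subspace afterwards

    visit((), 0)
    return order


order3 = depth_first_order({1, 2, 3})
# order3 == [(1, 2, 3), (1, 2), (1, 3), (1,), (2, 3), (2,), (3,)]
```

In this order, by the time a low dimensional subspace is reported, the clusters of all its supersets are already known, so the pairwise redundancy test can be applied immediately. A breadth-first (level-wise) traversal would visit all 1-D subspaces first and reach the supersets only later.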

Scalable Subspace Processing
[Figure: subspace lattice over dimensions 1 to 4 with a direct jump; 2-D grid over Dimension 1 and Dimension 2 with interval 1 and interval 2]
Key idea: density estimation + steered jumps
  Subspace clusters are represented by many low dimensional projections
  Use 2-D projections to estimate density in higher subspace regions [4]
  Use k-D projections to jump directly to (k+x)-D subspaces [x ≫ 1]
  Best-first search: Intelligent steering to promising subspace regions
[4] Müller, Assent, Krieger, Günnemann and Seidl: DensEst: Density Estimation for Data Mining in High Dimensional Spaces, in SDM 2009.
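The idea of using 2-D projections to reason about higher dimensional regions can be illustrated with a simple bound. This sketch is not the DensEst estimator from [4]; it only uses the fact that a grid cell in a higher dimensional subspace cannot contain more objects than any of its 2-D projections. The grid layout (values in [0, 1), equal-width intervals) and all function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_histograms(points, n_intervals):
    """Count objects per 2-D grid cell for every pair of dimensions."""
    d = len(points[0])
    hists = {pair: Counter() for pair in combinations(range(d), 2)}
    for p in points:
        # Map each coordinate in [0, 1) to its interval index.
        cell = [min(int(x * n_intervals), n_intervals - 1) for x in p]
        for a, b in hists:
            hists[(a, b)][(cell[a], cell[b])] += 1
    return hists


def count_upper_bound(hists, dims, cell):
    """Upper bound on the object count of a cell in subspace dims.

    dims: sorted tuple of at least two dimension indices.
    cell: dict mapping each dimension in dims to an interval index.
    """
    return min(hists[(a, b)][(cell[a], cell[b])]
               for a, b in combinations(dims, 2))


points = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.9), (0.8, 0.8, 0.8)]
hists = pair_histograms(points, n_intervals=2)
bound = count_upper_bound(hists, (0, 1, 2), {0: 0, 1: 0, 2: 0})
```

Regions whose bound is already below a density threshold can be skipped without ever being materialized, which is the kind of information a best-first search could use to steer jumps toward promising higher dimensional subspaces.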

Challenges in Evaluation and Exploration
General challenge for clustering
  No ground truth available for clustering
  ⇒ Subjective evaluation by exploration requires visualization techniques and interactive exploration tools
  ⇒ Objective evaluations are incomparable using different implementations, databases and quality measures
  ⇒ We provide broad evaluation study & interactive exploration framework
Evaluation Study [5]
  Characterization of major paradigms
  Providing comparable baseline implementations
  Evaluation based on broad set of data sets, quality measures and parameter settings
[5] Müller, Günnemann, Assent and Seidl: Evaluating Clustering in Subspace Projections of High Dimensional Data, in VLDB 2009.

Open Source Framework
OpenSubspace framework
  Framework for research, education and application [6][7][8][9]
  Baselines for algorithm and evaluation measure development
[Figure: OpenSubspace with a unified algorithm repository (algo. A to D; labels: re-implementation, rare case: common implementation) and a unified evaluation repository (eval. 1 to 3)]
http://dme.rwth-aachen.de/OpenSubspace/
[6] Müller, Assent, Krieger, Jansen and Seidl: Morpheus: Interactive Exploration of Subspace Clustering, in KDD 2008.
[7] Assent, Müller, Krieger, Jansen and Seidl: Pleiades: Subspace Clustering and Evaluation, in PKDD 2008.
[8] Günnemann, Färber, Kremer and Seidl: CoDA: Interactive Cluster Based Concept Discovery, in VLDB 2010.
[9] Müller, Schiffer, Gerwert, Hannen, Jansen and Seidl: SOREX: Subspace Outlier Ranking Exploration Toolkit, in PKDD 2010.

Conclusion and Future Work
Subspace clustering is still an emerging research field...
Is the basis for a lot of further research
  Alternative subspace clustering
  Evaluation measures for subspace clustering
  Benchmark databases for subspace clustering
  ...
Thank you for your attention. Questions?
