Projective Clustering Ensembles (F. Gullo, C. Domeniconi, A. Tagarelli)

Introduction Two-objective PCE Single-objective PCE Experimental Evaluation Conclusion

Projective Clustering Ensembles
F. Gullo ∗, C. Domeniconi †, A. Tagarelli ∗
∗ Dept. of Electronics, Computer and Systems Science, University of Calabria, Italy
† Dept. of Computer Science, George Mason University, Virginia (USA)
IEEE International Conference on Data Mining (ICDM) 2009

Clustering Ensembles
- input: a set $\mathcal{E} = \{\mathcal{C}_1, \ldots, \mathcal{C}_m\}$ of clustering solutions (i.e., an ensemble)
- output: a consensus partition $\mathcal{C}^*$ computed according to a consensus function $F$
- goal: to reduce the (inevitable) bias of any single clustering solution due to the peculiarities of the specific clustering algorithm being used (ill-posed nature of clustering)

Projective Clustering
- input: a set $\mathcal{D}$ of $D$-dimensional points (data objects)
- output: a partition $\mathcal{C}$ of $\mathcal{D}$ and a set $\mathcal{S}$ of subspaces such that each $S \in \mathcal{S}$ is assigned to one (and only one) cluster $C \in \mathcal{C}$
- goal: to overcome issues due to the curse of dimensionality
- assumption: objects within the same cluster $C$ are close to each other if (and only if) they are projected onto the subspace $S$ associated with $C$
[figure borrowed from Procopiuc et al., SIGMOD '02]

Clustering Ensembles and Projective Clustering have so far been considered as two distinct problems...

Projective Clustering Ensembles (PCE)
- The PCE problem is addressed for the first time: given a set of projective clustering solutions (i.e., a projective ensemble), the objective is to discover a projective consensus partition
- Challenge: information about feature-to-cluster assignments has to be considered, so traditional clustering ensemble methods do not work!

Contributions
- rigorous formulations of PCE as an optimization problem
  - two-objective PCE
  - single-objective PCE
- well-founded heuristics for each formulation
  - MOEA-PCE
  - EM-PCE

Outline
1. Introduction
2. Two-objective PCE
3. Single-objective PCE
4. Experimental Evaluation
5. Conclusion

Projective clustering solution
Definition (projective clustering solution). Let $\mathcal{D} = \{\vec{o}_1, \ldots, \vec{o}_N\}$ be a set of $D$-dimensional points (data objects). A projective clustering solution $\mathcal{C}$ defined over $\mathcal{D}$ is a triple $\langle \mathcal{L}, \Gamma, \Delta \rangle$:
- $\mathcal{L} = \{\ell_1, \ldots, \ell_K\}$ is a set of cluster labels which uniquely represent the $K$ clusters
- $\Gamma : \mathcal{L} \times \mathcal{D} \to S_\Gamma$ is a function which stores the probability that object $\vec{o}_n$ belongs to the cluster labeled with $\ell_k$, $\forall k \in [1..K], n \in [1..N]$, such that $\sum_{k=1}^{K} \Gamma_{kn} = 1$, $\forall n \in [1..N]$, where $\Gamma_{kn}$ hereinafter refers to $\Gamma(\ell_k, \vec{o}_n)$
- $\Delta : \mathcal{L} \times [1..D] \to [0, 1]$ is a function which stores the probability that the $d$-th feature is a relevant dimension for the objects in the cluster labeled with $\ell_k$, $\forall k \in [1..K], d \in [1..D]$, such that $\sum_{d=1}^{D} \Delta_{kd} = 1$, $\forall k \in [1..K]$, where $\Delta_{kd}$ hereinafter refers to $\Delta(\ell_k, d)$
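The definition above can be made concrete with a small data structure. The following is only a sketch: the class name `ProjectiveClustering`, its field names, and the choice of nested lists for $\Gamma$ and $\Delta$ are illustrative assumptions, not part of the talk. It simply checks the two normalization constraints stated in the definition.

```python
from dataclasses import dataclass
from math import isclose
from typing import List


@dataclass
class ProjectiveClustering:
    """Hypothetical container for a projective clustering solution <L, Gamma, Delta>."""

    labels: List[str]          # L: K unique cluster labels
    gamma: List[List[float]]   # gamma[k][n]: probability that object n belongs to cluster k
    delta: List[List[float]]   # delta[k][d]: probability that feature d is relevant to cluster k

    def validate(self) -> None:
        k = len(self.labels)
        if len(set(self.labels)) != k:
            raise ValueError("cluster labels must be unique")
        if len(self.gamma) != k or len(self.delta) != k:
            raise ValueError("Gamma and Delta need one row per cluster label")
        n_objects = len(self.gamma[0])
        if any(len(row) != n_objects for row in self.gamma):
            raise ValueError("all rows of Gamma must cover the same objects")
        for row in self.gamma + self.delta:
            if any(not 0.0 <= p <= 1.0 for p in row):
                raise ValueError("entries must be probabilities in [0, 1]")
        # Constraint on Gamma: for every object n, memberships sum to 1 over clusters.
        for n in range(n_objects):
            column_sum = sum(self.gamma[c][n] for c in range(k))
            if not isclose(column_sum, 1.0, abs_tol=1e-9):
                raise ValueError(f"memberships of object {n} do not sum to 1")
        # Constraint on Delta: for every cluster k, feature relevances sum to 1.
        for c, row in enumerate(self.delta):
            if not isclose(sum(row), 1.0, abs_tol=1e-9):
                raise ValueError(f"feature relevances of cluster {c} do not sum to 1")


# Toy solution: K = 2 clusters, N = 3 objects, D = 4 features.
solution = ProjectiveClustering(
    labels=["l1", "l2"],
    gamma=[[1.0, 0.7, 0.0], [0.0, 0.3, 1.0]],
    delta=[[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.25, 0.75]],
)
solution.validate()  # passes silently when both constraints hold
```

In the toy solution the second object is softly assigned (0.7 / 0.3), and each cluster spreads its relevance over a different pair of features, which is the situation projective clustering is meant to capture.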

Two-objective PCE
Motivation: a projective consensus partition $\mathcal{C}^* = \langle \mathcal{L}^*, \Gamma^*, \Delta^* \rangle$ derived from an ensemble $\mathcal{E}$ should meet requirements related to:
- the data object clustering of the solutions in $\mathcal{E}$
- the feature-to-cluster assignment of the solutions in $\mathcal{E}$
$\Rightarrow$ PCE can be naturally formulated considering two objectives

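Two-objective PCE: formulation
$$\mathcal{C}^* = \arg\min_{\hat{\mathcal{C}}} \left\{ \Psi_o(\hat{\mathcal{C}}, \mathcal{E}, \mathcal{D}),\ \Psi_f(\hat{\mathcal{C}}, \mathcal{E}, \mathcal{D}) \right\}$$
where
$$\Psi_o(\hat{\mathcal{C}}, \mathcal{E}, \mathcal{D}) = \frac{1}{2} \sum_{\mathcal{C} \in \mathcal{E}} \left( \psi_o(\hat{\mathcal{C}}, \mathcal{C}) + \psi_o(\mathcal{C}, \hat{\mathcal{C}}) \right)$$
$$\Psi_f(\hat{\mathcal{C}}, \mathcal{E}, \mathcal{D}) = \frac{1}{2} \sum_{\mathcal{C} \in \mathcal{E}} \left( \psi_f(\hat{\mathcal{C}}, \mathcal{C}) + \psi_f(\mathcal{C}, \hat{\mathcal{C}}) \right)$$
and $\psi_o(\mathcal{C}_i, \mathcal{C}_j)$ (resp. $\psi_f(\mathcal{C}_i, \mathcal{C}_j)$) is computed by resorting to the extended Jaccard similarity coefficient applied to the $\Gamma_{kn}$ (resp. $\Delta_{kd}$) values of $\mathcal{C}_i$ and $\mathcal{C}_j$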
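A rough illustration of how such an objective might be evaluated is given below. The extended Jaccard coefficient of two real vectors, $u \cdot v / (\|u\|^2 + \|v\|^2 - u \cdot v)$, is standard. How the slide's $\psi$ turns it into a distance is not spelled out here, so the `psi` function below (average, over the clusters of the first solution, of one minus the best Jaccard match in the second) is an assumption made for illustration, as are all function names. The same code applies to $\Gamma$ rows (object-based) or $\Delta$ rows (feature-based).

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row per cluster (rows of Gamma or of Delta)


def extended_jaccard(u: Sequence[float], v: Sequence[float]) -> float:
    """Extended Jaccard similarity: u.v / (|u|^2 + |v|^2 - u.v)."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    # denom is zero only when both vectors are all zeros; treat them as identical.
    return dot / denom if denom > 0 else 1.0


def psi(rows_i: Matrix, rows_j: Matrix) -> float:
    """Assumed asymmetric distance from solution i to solution j.

    For each cluster of i, take 1 - (similarity to its best-matching cluster in j),
    then average over the clusters of i. This form is an illustrative assumption.
    """
    return sum(
        1.0 - max(extended_jaccard(u, v) for v in rows_j) for u in rows_i
    ) / len(rows_i)


def big_psi(candidate: Matrix, ensemble: Sequence[Matrix]) -> float:
    """Symmetrized sum over the ensemble, following the Psi formula of the slide."""
    return sum(0.5 * (psi(candidate, rows) + psi(rows, candidate)) for rows in ensemble)


# Two hard object-based representations (K = 2 clusters, N = 3 objects).
gamma_a = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gamma_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(psi(gamma_a, gamma_a), psi(gamma_a, gamma_b), big_psi(gamma_a, [gamma_a, gamma_b]))
```

Because the assumed `psi` is not symmetric in general, averaging `psi(candidate, rows)` and `psi(rows, candidate)` as in the slide's $\Psi$ gives a symmetric score per ensemble member.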

Two-objective PCE: heuristic
- in the two-objective PCE formulation, the objectives conflict with each other
- naïve solutions that (linearly) combine the two objectives into a single one have several drawbacks:
  - mixing non-commensurable objectives
  - hard setting of the weights needed for the linear combination
  - need for prior knowledge of the application domain
- idea: resort to the Multi-Objective Evolutionary Algorithms (MOEAs) domain $\Rightarrow$ we exploit the NSGA-II algorithm

Two-objective PCE: MOEA-PCE algorithm
Algorithm MOEA-PCE
Input: a projective ensemble $\mathcal{E}$ of size $M$, defined over a set $\mathcal{D}$ of $N$ $D$-dimensional objects; the number $K$ of clusters in the output projective consensus partitions; the population size $t$; the maximum number $I$ of iterations
Output: a set $\mathcal{S}^*$ of projective consensus partitions
  $\mathcal{S} \leftarrow$ populationRandomGen$(\mathcal{E}, t, K)$, $it \leftarrow 1$
  repeat
    $\rho \leftarrow$ computeParetoRanking$(\mathcal{S})$
    $\langle \mathcal{S}', \mathcal{S}'' \rangle \leftarrow \langle \check{\mathcal{S}}' \subset \mathcal{S}, \check{\mathcal{S}}'' \subset \mathcal{S} \rangle$ : $|\check{\mathcal{S}}'| = |\mathcal{S}|/2$, $|\check{\mathcal{S}}''| = |\mathcal{S}|/2$, $\check{\mathcal{S}}' \cup \check{\mathcal{S}}'' = \mathcal{S}$, $\rho(x') \le \rho(x'')$, $\forall x' \in \check{\mathcal{S}}', x'' \in \check{\mathcal{S}}''$
    $\mathcal{S}'_{CM} \leftarrow$ crossoverAndMutation$(\mathcal{S}')$
    $\mathcal{S} \leftarrow \mathcal{S}' \cup \mathcal{S}'_{CM}$
    $it \leftarrow it + 1$
  until $it = I$
  $\rho \leftarrow$ computeParetoRanking$(\mathcal{S})$
  $\mathcal{S}^* \leftarrow \{x' \in \mathcal{S} : \rho(x') \le \rho(x''), \forall x'' \in \mathcal{S}, x'' \ne x'\}$
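The control flow of the algorithm can be sketched generically as follows. This is not the authors' implementation: the PCE-specific parts (population generation, the two objectives, crossover and mutation) are passed in as callables, the ranking used here is a simple dominance count rather than the NSGA-II procedure, and the toy problem at the bottom (minimizing $x^2$ and $(x-2)^2$ over real numbers) only stands in for a real ensemble. All names are hypothetical, and the sketch assumes an even population size $t \ge 2$.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Scores = Tuple[float, float]  # (object-based objective, feature-based objective)


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b: no worse on both objectives and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def compute_pareto_ranking(scores: Sequence[Scores]) -> List[int]:
    """Assumed ranking: number of solutions dominating each one (0 = non-dominated)."""
    return [sum(dominates(other, s) for other in scores) for s in scores]


def moea_loop(
    random_individual: Callable[[random.Random], T],
    objectives: Callable[[T], Scores],
    crossover_and_mutation: Callable[[List[T], random.Random], List[T]],
    t: int,
    max_iter: int,
    rng: random.Random,
) -> List[T]:
    population = [random_individual(rng) for _ in range(t)]
    it = 1
    while True:
        ranks = compute_pareto_ranking([objectives(x) for x in population])
        order = sorted(range(len(population)), key=lambda i: ranks[i])
        # Keep the better-ranked half, then refill with its offspring.
        best_half = [population[i] for i in order[: len(population) // 2]]
        population = best_half + crossover_and_mutation(best_half, rng)
        it += 1
        if it >= max_iter:  # the listing's "until it = I"; >= also guards max_iter < 2
            break
    ranks = compute_pareto_ranking([objectives(x) for x in population])
    best = min(ranks)
    return [x for x, r in zip(population, ranks) if r == best]


# Toy stand-ins for the PCE-specific operators (assumptions for illustration only).
def toy_random(rng: random.Random) -> float:
    return rng.uniform(-1.0, 3.0)


def toy_objectives(x: float) -> Scores:
    return (x * x, (x - 2.0) ** 2)  # two conflicting objectives


def toy_variation(parents: List[float], rng: random.Random) -> List[float]:
    # One child per parent: average two random parents, then add small noise.
    return [
        0.5 * (rng.choice(parents) + rng.choice(parents)) + rng.gauss(0.0, 0.05)
        for _ in parents
    ]


front = moea_loop(toy_random, toy_objectives, toy_variation, t=20, max_iter=30, rng=random.Random(0))
```

The returned list plays the role of $\mathcal{S}^*$: every element has the minimum rank, so no returned solution is dominated by another member of the final population.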

Two-objective PCE: MOEA-PCE algorithm (2)
The proposed MOEA-PCE heuristic is based on the classic MOEA notions of:
- domination
- Pareto-optimality
- Pareto-ranking function ($\rho$)
MOEA-PCE works in $O(I \, t \, M \, K^2 \, (N + D))$
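For readers unfamiliar with these notions, the sketch below shows one common way to turn domination into a ranking: repeatedly peel off the set of non-dominated points (a Pareto front) and give each front the next rank. This layered scheme is in the spirit of the non-dominated sorting used by NSGA-II, but the quadratic-per-front implementation and the function names here are illustrative assumptions, not the ranking code used in the talk.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]  # objective values of one solution; lower is better


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b: no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_rank(scores: Sequence[Scores]) -> List[int]:
    """Rank 1 = Pareto-optimal within the set; rank r = optimal once ranks < r are removed."""
    ranks = [0] * len(scores)
    remaining = set(range(len(scores)))
    current = 1
    while remaining:
        # Current front: points not dominated by any other remaining point.
        front = {
            i
            for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = current
        remaining -= front
        current += 1
    return ranks


points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
print(pareto_rank(points))  # [1, 1, 1, 2, 3]
```

The first three points trade one objective against the other and so share rank 1; `(3.0, 3.0)` is dominated only by `(2.0, 2.0)`, and `(5.0, 5.0)` is dominated by every other point.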

Two-objective PCE: MOEA-PCE algorithm (3)
Weaknesses of MOEA-PCE:
- high complexity in the approach
- efficiency (mostly due to $I$)
- hard setting for $I$ and $t$
- results not easily interpretable (multiple output results)

Single-objective PCE: formulation
PCE formulation alternative to two-objective PCE:
$$\mathcal{C}^* = \arg\min_{\hat{\mathcal{C}}} Q(\hat{\mathcal{C}}, \mathcal{E})$$
subject to
$$\sum_{k=1}^{K} \hat{\Gamma}_{kn} = 1, \quad \forall n \in [1..N]$$
$$\sum_{d=1}^{D} \hat{\Delta}_{kd} = 1, \quad \forall k \in [1..K]$$
$$\hat{\Gamma}_{kn} \ge 0, \ \hat{\Delta}_{kd} \ge 0, \quad \forall k \in [1..K], n \in [1..N], d \in [1..D]$$
where
$$Q(\hat{\mathcal{C}}, \mathcal{E}) = \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{\Gamma}_{kn}^{\alpha} \sum_{h=1}^{H} \gamma_{hn} \sum_{d=1}^{D} \left( \hat{\Delta}_{kd} - \delta_{hd} \right)^2$$

Single-objective PCE: formulation (2)
$$Q(\hat{\mathcal{C}}, \mathcal{E}) = \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{\Gamma}_{kn}^{\alpha} \sum_{h=1}^{H} \gamma_{hn} \sum_{d=1}^{D} \left( \hat{\Delta}_{kd} - \delta_{hd} \right)^2$$
Rationale of function $Q$ at the basis of the proposed single-objective PCE formulation:
- it embeds both object-based and feature-based representations of the solutions in the ensemble
- it is essentially based on measuring, for each object, the "distance error" between the feature-based representation of the clusters in the consensus partition and the clusters in the solutions of the ensemble
- the discrepancy between two clusters is weighted by the probability that the object belongs to both (i.e., $\hat{\Gamma}_{kn} \times \gamma_{hn}$)
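To make the rationale concrete, the sketch below evaluates $Q$ directly from its four nested sums. It rests on assumptions that the slides do not state explicitly: $h$ is taken to index the $H$ clusters pooled from all ensemble solutions, with $\gamma_{hn}$ and $\delta_{hd}$ their object and feature representations, and $\alpha$ is treated as a plain exponent whose default value of 2 is chosen arbitrarily. The function name `q_objective` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def q_objective(
    gamma_hat: Matrix,   # gamma_hat[k][n]: consensus object-to-cluster assignments
    delta_hat: Matrix,   # delta_hat[k][d]: consensus feature-to-cluster assignments
    gamma_ens: Matrix,   # gamma_ens[h][n]: object assignments of the ensemble clusters
    delta_ens: Matrix,   # delta_ens[h][d]: feature assignments of the ensemble clusters
    alpha: float = 2.0,  # assumed default; the slides do not give a value
) -> float:
    total = 0.0
    n_objects = len(gamma_hat[0])
    for k, delta_k in enumerate(delta_hat):
        # Squared feature-space distance between consensus cluster k and each ensemble cluster h.
        dist = [
            sum((dk - dh) ** 2 for dk, dh in zip(delta_k, delta_h))
            for delta_h in delta_ens
        ]
        for n in range(n_objects):
            # Discrepancies weighted by how strongly object n belongs to ensemble cluster h.
            weighted = sum(gamma_ens[h][n] * dist[h] for h in range(len(delta_ens)))
            total += (gamma_hat[k][n] ** alpha) * weighted
    return total


# Toy ensemble: H = 2 clusters, N = 2 objects, D = 2 features.
gamma_ens = [[1.0, 0.0], [0.0, 1.0]]
delta_ens = [[1.0, 0.0], [0.0, 1.0]]

# A consensus that reproduces the ensemble incurs no error.
print(q_objective(gamma_ens, delta_ens, gamma_ens, delta_ens))  # 0.0

# Swapping the object assignments pairs every object with the "wrong" subspace.
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(q_objective(swapped, delta_ens, gamma_ens, delta_ens))  # 4.0
```

In the swapped case each object contributes a squared distance of 2 between the feature vectors `[1, 0]` and `[0, 1]`, which is how the object-based and feature-based information interact inside a single objective.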
