Multi-View Clustering via Joint Nonnegative Matrix Factorization


SLIDE 1

Multi-View Clustering via Joint Nonnegative Matrix Factorization

Jialu Liu¹, Chi Wang¹, Jing Gao², Jiawei Han¹

¹University of Illinois at Urbana-Champaign  ²University at Buffalo

May 2, 2013

SLIDE 2

Outline

1. Multi-View Clustering
2. Multi-View NMF (Standard NMF, Joint NMF)
3. Relation to PLSA
4. Experiments

SLIDE 3

Multi-View Datasets

Many real-world datasets naturally comprise multiple representations, or views.

SLIDE 4

We need to integrate them

Observing that these multiple representations often provide compatible and complementary information, it is natural to integrate them to obtain better performance, rather than relying on a single view. The key to learning from multiple views (multi-view learning) is to leverage each view's own knowledge in order to outperform simply concatenating the views.
SLIDE 5

Three ways to integrate

As we are interested in clustering, here are three common strategies.

1. Incorporate multi-view integration directly into the clustering process by optimizing certain loss functions.

2. First project the multi-view data into a common lower-dimensional subspace, then apply any clustering algorithm, such as k-means, to learn the partition.

3. Late integration (late fusion): derive a clustering solution from each individual view, then fuse all the solutions based on consensus.

SLIDE 6

Outline

1. Multi-View Clustering
2. Multi-View NMF (Standard NMF, Joint NMF)
3. Relation to PLSA
4. Experiments

SLIDE 7

Nonnegative Matrix Factorization

Let $X = [X_{\cdot,1}, \ldots, X_{\cdot,N}] \in \mathbb{R}_+^{M \times N}$ denote the nonnegative data matrix, where each column represents a data point and each row represents one attribute. NMF aims to find two nonnegative matrix factors $U = [U_{i,k}] \in \mathbb{R}_+^{M \times K}$ and $V = [V_{j,k}] \in \mathbb{R}_+^{N \times K}$ whose product provides a good approximation to $X$:

$$X \approx UV^T \qquad (1)$$

Here $K$ denotes the desired reduced dimension; to facilitate discussion, we call $U$ the basis matrix and $V$ the coefficient matrix.

SLIDE 8

Update Rule of NMF

One common reconstruction process can be formulated as a Frobenius-norm optimization problem:

$$\min_{U,V} \|X - UV^T\|_F^2, \quad \text{s.t. } U \ge 0,\ V \ge 0$$

Multiplicative update rules are executed iteratively to minimize the objective function as follows:

$$U_{i,k} \leftarrow U_{i,k}\,\frac{(XV)_{i,k}}{(UV^TV)_{i,k}}, \qquad V_{j,k} \leftarrow V_{j,k}\,\frac{(X^TU)_{j,k}}{(VU^TU)_{j,k}} \qquad (2)$$
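For concreteness, here is a minimal NumPy sketch of these multiplicative updates; the random initialization, fixed iteration count, and the small `eps` guard against division by zero are assumptions of this sketch, not part of the slides.

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-10, seed=0):
    """Sketch of NMF via the multiplicative updates in Equation (2)."""
    M, N = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((M, K))  # basis matrix, M x K
    V = rng.random((N, K))  # coefficient matrix, N x K
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)    # U_{i,k} update
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)  # V_{j,k} update
    return U, V
```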

SLIDE 9

NMF for Clustering

Note that given the NMF formulation in Equation (1), for an arbitrary invertible $K \times K$ matrix $Q$ we have

$$UV^T = (UQ^{-1})(QV^T) \qquad (3)$$

so there can be many possible solutions, and it is important to enforce constraints that make the factorization unique for clustering. One common way, if we use $V$ for clustering, is to normalize the basis matrix $U$ after convergence of the multiplicative updates:

$$U_{i,k} \leftarrow \frac{U_{i,k}}{\sqrt{\sum_i U_{i,k}^2}}, \qquad V_{j,k} \leftarrow V_{j,k}\,\sqrt{\sum_i U_{i,k}^2} \qquad (4)$$
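A short sketch of this normalization step, assuming the Euclidean column norms shown in Equation (4); rescaling $V$ in the opposite direction leaves the product $UV^T$ unchanged.

```python
import numpy as np

def normalize_factors(U, V, eps=1e-10):
    """Normalize basis columns to unit Euclidean norm, rescaling V to match."""
    norms = np.sqrt((U ** 2).sum(axis=0)) + eps  # ||U_{.,k}|| for each k
    return U / norms, V * norms                  # U V^T is unchanged
```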

SLIDE 10

Outline

1. Multi-View Clustering
2. Multi-View NMF (Standard NMF, Joint NMF)
3. Relation to PLSA
4. Experiments

SLIDE 11

Multi-View Notations

Assume that we are now given $n_v$ representations (i.e., views). Let $\{X^{(1)}, X^{(2)}, \ldots, X^{(n_v)}\}$ denote the data of all the views, where for each view $X^{(v)}$ we have the factorization $X^{(v)} \approx U^{(v)}(V^{(v)})^T$. Across views we have the same number of data points but allow different numbers of attributes; hence the $V^{(v)}$s have the same shape, while the $U^{(v)}$s can differ along the row dimension.

SLIDE 12

One Simple Baseline

Use a shared coefficient matrix but different basis matrices across views, as shown below:

$$\sum_{v=1}^{n_v} \lambda_v \left\|X^{(v)} - U^{(v)}(V^{(*)})^T\right\|_F^2$$

where $\lambda_v$ is the weight parameter and $V^{(*)}$ is the shared consensus. It is easy to verify that this baseline is equivalent to applying NMF directly to the concatenated features of all views.
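One way to see the equivalence concretely: stacking $\sqrt{\lambda_v}\,X^{(v)}$ along the feature axis turns the weighted sum above into a single Frobenius objective. A sketch reusing the `nmf()` helper from the earlier slide; the helper name and the square-root weighting are illustrative assumptions.

```python
import numpy as np

def concat_nmf(views, lambdas, K):
    """Baseline: NMF on lambda-weighted, feature-concatenated views."""
    X_cat = np.vstack([np.sqrt(lam) * Xv for lam, Xv in zip(lambdas, views)])
    U_cat, V_star = nmf(X_cat, K)  # V_star plays the role of V^(*)
    return U_cat, V_star
```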

SLIDE 13

Comment

$$\sum_{v=1}^{n_v} \lambda_v \left\|X^{(v)} - U^{(v)}(V^{(*)})^T\right\|_F^2$$

However, this simple model is not optimal, for two reasons:

1. Fixing the one-side factor is too hard an assumption; in many cases we prefer relatively soft constraints.

2. With proper normalization, previous work on single-view NMF has been shown to achieve better clustering performance.

SLIDE 14

Our Framework

[Framework diagram: the data of each of the $n_v$ views is factorized into a per-view model; each model is normalized and regularized towards a shared consensus.]

We require models learnt from different views to be softly regularized towards a consensus with proper normalization for clustering.

SLIDE 15

Our Approach

Fixing the one-side factor is too hard an assumption; we prefer relatively soft constraints. First, we incorporate the disagreement between each coefficient matrix $V^{(v)}$ and the consensus matrix $V^*$ into NMF:

$$\sum_{v=1}^{n_v} \left\|X^{(v)} - U^{(v)}(V^{(v)})^T\right\|_F^2 + \sum_{v=1}^{n_v} \lambda_v \left\|V^{(v)} - V^*\right\|_F^2, \quad \text{s.t. } U^{(v)}, V^{(v)}, V^* \ge 0 \qquad (5)$$

SLIDE 16

Our Approach

With proper normalization, previous work on single-view NMF has been shown to achieve better clustering performance. Second, we add constraints on the basis matrices $U^{(v)}$ of the different views to make the $V^{(v)}$s comparable and meaningful for clustering. W.l.o.g., assume $\|X^{(v)}\|_1 = 1$; we then want to minimize:

$$\sum_{v=1}^{n_v} \left\|X^{(v)} - U^{(v)}(V^{(v)})^T\right\|_F^2 + \sum_{v=1}^{n_v} \lambda_v \left\|V^{(v)} - V^*\right\|_F^2$$
$$\text{s.t. } \forall\, 1 \le k \le K,\ \|U^{(v)}_{\cdot,k}\|_1 = 1 \ \text{ and } \ U^{(v)}, V^{(v)}, V^* \ge 0 \qquad (6)$$
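A hedged helper that evaluates this objective for candidate factors; constraint handling is omitted, and the function and argument names are illustrative assumptions.

```python
import numpy as np

def joint_objective(views, Us, Vs, V_star, lambdas):
    """Value of the objective in Equations (5)/(6) for given factors."""
    recon = sum(np.linalg.norm(Xv - Uv @ Vv.T, 'fro') ** 2
                for Xv, Uv, Vv in zip(views, Us, Vs))
    reg = sum(lam * np.linalg.norm(Vv - V_star, 'fro') ** 2
              for lam, Vv in zip(lambdas, Vs))
    return recon + reg
```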

SLIDE 17

Why $\|X\|_1 = 1$ and $\|U_{\cdot,k}\|_1 = 1$?

Objective function:

$$\min_{U^{(v)},\, V^{(v)},\, V^*} \sum_{v=1}^{n_v} \left\|X^{(v)} - U^{(v)}(V^{(v)})^T\right\|_F^2 + \sum_{v=1}^{n_v} \lambda_v \left\|V^{(v)} - V^*\right\|_F^2$$
$$\text{s.t. } \forall\, 1 \le k \le K,\ \|U^{(v)}_{\cdot,k}\|_1 = 1 \ \text{ and } \ U^{(v)}, V^{(v)}, V^* \ge 0$$

Given $\|X\|_1 = 1$ and $\|U_{\cdot,k}\|_1 = 1$,

$$\|X\|_1 = \Big\|\sum_j X_{\cdot,j}\Big\|_1 \approx \sum_{k=1}^{K} \Big\|U_{\cdot,k} \sum_j V_{j,k}\Big\|_1 = \sum_{k=1}^{K} \Big\|\sum_j V_{j,k}\Big\|_1 = \|V\|_1$$

Therefore, $\|V\|_1 \approx 1$.

SLIDE 18

Objective Function

Previous:

$$\min_{U^{(v)},\, V^{(v)},\, V^*} \sum_{v=1}^{n_v} \left\|X^{(v)} - U^{(v)}(V^{(v)})^T\right\|_F^2 + \sum_{v=1}^{n_v} \lambda_v \left\|V^{(v)} - V^*\right\|_F^2$$
$$\text{s.t. } \forall\, 1 \le k \le K,\ \|U^{(v)}_{\cdot,k}\|_1 = 1 \ \text{ and } \ U^{(v)}, V^{(v)}, V^* \ge 0$$

Now:

$$\min_{U^{(v)},\, V^{(v)},\, V^*} \sum_{v=1}^{n_v} \left\|X^{(v)} - U^{(v)}(Q^{(v)})^{-1}Q^{(v)}(V^{(v)})^T\right\|_F^2 + \sum_{v=1}^{n_v} \lambda_v \left\|V^{(v)}Q^{(v)} - V^*\right\|_F^2$$
$$\text{s.t. } \forall\, 1 \le v \le n_v,\ U^{(v)} \ge 0,\ V^{(v)} \ge 0,\ V^* \ge 0$$

where

$$Q^{(v)} = \mathrm{Diag}\!\left(\sum_{i=1}^{M} U^{(v)}_{i,1},\ \sum_{i=1}^{M} U^{(v)}_{i,2},\ \ldots,\ \sum_{i=1}^{M} U^{(v)}_{i,K}\right)$$
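Computing $Q^{(v)}$ is just the column sums of $U^{(v)}$ placed on a diagonal; a one-line sketch (the function name is an assumption):

```python
import numpy as np

def scaling_Q(U):
    """Q^(v): diagonal matrix whose k-th entry is the column sum of U^(v)."""
    return np.diag(U.sum(axis=0))
```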

SLIDE 19

Iterative Update Rules

Fixing $V^*$, minimize over $U^{(v)}$ and $V^{(v)}$ until convergence (view superscripts dropped for readability):

$$U_{i,k} \leftarrow U_{i,k}\,\frac{(XV)_{i,k} + \lambda_v \sum_{j=1}^{N} V_{j,k} V^*_{j,k}}{(UV^TV)_{i,k} + \lambda_v \sum_{l=1}^{M} U_{l,k} \sum_{j=1}^{N} V_{j,k}^2}$$

$$U \leftarrow UQ^{-1}, \qquad V \leftarrow VQ$$

$$V_{j,k} \leftarrow V_{j,k}\,\frac{(X^TU)_{j,k} + \lambda_v V^*_{j,k}}{(VU^TU)_{j,k} + \lambda_v V_{j,k}}$$

Fixing $U^{(v)}$ and $V^{(v)}$, minimize over $V^*$, whose closed-form solution is automatically nonnegative:

$$V^* = \frac{\sum_{v=1}^{n_v} \lambda_v V^{(v)} Q^{(v)}}{\sum_{v=1}^{n_v} \lambda_v} \ge 0$$
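A hedged NumPy sketch of one alternating round for a single view, plus the consensus step. It relies on the fact that the $\lambda_v$ terms in the $U$ update depend only on the column index $k$, so they broadcast as `(K,)` vectors; variable names and the `eps` guard are assumptions of the sketch.

```python
import numpy as np

def update_view(X, U, V, V_star, lam, eps=1e-10):
    """One pass of the per-view updates with the consensus V* held fixed."""
    # U update: both lambda terms depend only on k, so (K,) vectors broadcast.
    num_U = X @ V + lam * (V * V_star).sum(axis=0)
    den_U = U @ (V.T @ V) + lam * U.sum(axis=0) * (V ** 2).sum(axis=0)
    U = U * num_U / (den_U + eps)
    # Normalize: U <- U Q^{-1}, V <- V Q, keeping Q's diagonal as a vector.
    q = U.sum(axis=0) + eps
    U, V = U / q, V * q
    # V update against the consensus.
    V = V * (X.T @ U + lam * V_star) / (V @ (U.T @ U) + lam * V + eps)
    return U, V

def update_consensus(Us, Vs, lambdas):
    """V*: lambda-weighted average of the rescaled matrices V^(v) Q^(v)."""
    num = sum(lam * Vv * Uv.sum(axis=0)
              for lam, Uv, Vv in zip(lambdas, Us, Vs))
    return num / sum(lambdas)
```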

SLIDE 20

Use V ∗ for Clustering

Once we obtain the consensus matrix $V^*$, the cluster label of data point $j$ can be computed as $\arg\max_k V^*_{j,k}$.

Alternatively, we can run k-means directly on $V^*$, treating it as a latent representation of the original data points.
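Both label-extraction options in a short sketch; the scikit-learn k-means import is an assumption, and any k-means implementation would do.

```python
labels = V_star.argmax(axis=1)  # cluster of point j: arg max_k V*_{j,k}

# Alternative: treat V* as a latent representation and run k-means on it.
from sklearn.cluster import KMeans
labels_km = KMeans(n_clusters=K, n_init=10).fit_predict(V_star)
```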

SLIDE 21

Outline

1. Multi-View Clustering
2. Multi-View NMF (Standard NMF, Joint NMF)
3. Relation to PLSA
4. Experiments

SLIDE 22

PLSA

Probabilistic Latent Semantic Analysis (PLSA) is a traditional topic-modeling technique for document analysis. It models the $M \times N$ term-document co-occurrence matrix $X$ (each entry $X_{ij}$ is the number of occurrences of word $w_i$ in document $d_j$) as being generated from a mixture model with $K$ components:

$$P(w, d) = \sum_{k=1}^{K} P(w \mid k)\, P(d, k)$$

SLIDE 23

Relation to NMF

$$P(w, d) = \sum_{k=1}^{K} P(w \mid k)\, P(d, k) \qquad\longleftrightarrow\qquad X = (UQ^{-1})(QV^T)$$

Early studies show that $UQ^{-1}$ (respectively $QV^T$) has the formal properties of the conditional probability matrix $[P(w \mid k)] \in \mathbb{R}_+^{M \times K}$ (respectively $[P(d, k)]^T \in \mathbb{R}_+^{K \times N}$). This provides the theoretical foundation for using NMF to conduct clustering. Due to this connection, joint NMF has a nice probabilistic interpretation: each element of the matrix $V^*$ is the consensus of $P(d \mid k)^{(v)}$ weighted by $\lambda_v P(d)^{(v)}$ across views.

SLIDE 24

Outline

1. Multi-View Clustering
2. Multi-View NMF (Standard NMF, Joint NMF)
3. Relation to PLSA
4. Experiments

SLIDE 25

Datasets

One synthetic and three real-world datasets are used in the experiments.

• Synthetic dataset: a two-view dataset in which noise is added to each view independently.
• 3-Sources text dataset: collected from three online news sources, BBC, Reuters, and The Guardian, reporting the same stories.
• Reuters Multilingual dataset: contains feature characteristics of documents originally written in different languages.
• UCI Handwritten Digit dataset: handwritten digits (0-9) from the UCI repository, represented by different feature sets.

SLIDE 26

Datasets

The important statistics of the four datasets are summarized in the following table.

dataset      size    # views  # clusters
Synthetic    10000   2        4
3-Sources    169     3        6
Reuters      600     3        6
Digit        2000    2        10

SLIDE 27

Compared Algorithms

We compared with the following algorithms:

• Single View (BSV and WSV): run NMF on each view separately; both the best and the worst single-view results are reported, referred to as BSV and WSV respectively.
• Feature Concatenation (ConcatNMF): concatenate the features of all views, then run NMF directly on this concatenated representation. The same normalization strategy is adopted.
• Collective NMF (ColNMF): use a shared coefficient matrix but different basis matrices across views.
• Co-regularized Spectral Clustering (Co-reguSC): apply a co-regularization framework to spectral clustering.
• Multi-View NMF (MultiNMF): the proposed algorithm. In our experiments we empirically set $\lambda_v$ to 0.01 for all views and datasets.

SLIDE 28

Performance

Accuracy (%):

Algorithm    Synthetic  3-Sources  Reuters   Digit
BSV          66.0±.09   60.8±.01   46.8±.02  68.5±.05
WSV          51.7±.11   49.1±.03   46.4±.00  63.4±.04
ConcatNMF    68.4±.14   58.6±.03   47.3±.00  67.8±.06
ColNMF       61.8±.08   61.3±.02   51.2±.00  66.0±.05
Co-reguSC    75.4±.00   47.8±.01   50.6±.02  86.6±.00
MultiNMF     92.0±.10   68.4±.06   53.5±.00  88.1±.01

Normalized Mutual Information (%):

Algorithm    Synthetic  3-Sources  Reuters   Digit
BSV          56.2±.10   53.0±.01   38.8±.02  63.4±.03
WSV          54.3±.05   44.1±.02   34.2±.00  60.3±.03
ConcatNMF    60.9±.14   51.7±.03   34.1±.00  62.4±.04
ColNMF       47.3±.07   55.2±.02   34.6±.00  62.1±.03
Co-reguSC    71.2±.00   41.4±.01   35.7±.01  77.0±.00
MultiNMF     84.0±.15   60.2±.06   40.9±.00  80.4±.01

Higher is better for both Accuracy and Normalized Mutual Information.

SLIDE 29

Parameter Study

There are $n_v$ parameters in our MultiNMF algorithm: one regularization parameter $\lambda_v$ per view.

The relative values of $\lambda_v$ across views reflect each view's importance, while the absolute values reflect how strongly we enforce the regularization constraint. In the extreme cases: when all $\lambda_v$ are 0, the problem reduces to running NMF with normalization on each view separately; when the $\lambda_v$ go to infinity, the matrices $V^{(v)}Q^{(v)}$ of all views share the same value.

SLIDE 30

Parameter Study

We study the absolute value of λv here.

[Figure: clustering accuracy (%) versus $\lambda_v$ (log scale, $10^{-3}$ to $10^{-1}$) on the Synthetic, Reuters, 3-Sources, and Digit datasets, comparing BSV, ConcatNMF, ColNMF, CoreguSC, and MultiNMF.]

SLIDE 31

Convergence Study

We prove in the paper that the multiplicative update rules converge. The figure below shows the convergence curves together with clustering performance.

[Figure: objective function value and accuracy (%) versus iteration number (up to 30) on the Synthetic, Reuters, 3-Sources, and Digit datasets.]

SLIDE 32

Computational Complexity Study

MultiNMF has time complexity linear in the number of data points, clusters, and views. We conduct experiments on the synthetic dataset, with a default setting of 10000 data points, 4 clusters, and 2 views; in each run we fix two of these aspects and vary the remaining one.

[Figure: running time (s) versus number of data points for MultiNMF and CoreguSC, on linear and log-log axes.]

SLIDE 33

Computational Complexity Study

[Figure: running time versus number of clusters (2-10) and number of views (2-8) for MultiNMF and CoreguSC, plotted on separate time axes.]