HINMF: A Matrix Factorization Method for Clustering in - PowerPoint PPT Presentation

HINMF: A Matrix Factorization Method for Clustering in Heterogeneous Information Networks Jialu Liu Jiawei Han University of Illinois at Urbana-Champaign August 5, 2013 amss

Outline HIN and Multi-View data 1 Previous Work 2 Standard NMF MultiNMF Relation to PLSA HINMF 3 4 Experiments amss

Heterogeneous Information Networks Term Venue Images Paper Term Author Users Tags Author In heterogeneous information networks (HIN), multiple types of nodes are connected by multiple types of links. amss

Star Schema Term Venue Images Paper Term Author Users Tags Author Star Schema By-typed Grey: Center type, White: Attribute type amss

Multi-View Learning Many datasets in real world are naturally comprised of different representations or views . amss

Connection between HIN and Multi-View data HIN following star schema can be viewed as a kind of multi-view relational data. Attribute types provide “views” for the center type. HIN Multi-View Data HIN with Star Schema amss

Common Motivation Observing that multiple subnetworks/representations often provide compatible and complementary information, it becomes natural for one to integrate them together to obtain better performance rather than relying on a single homogenenous/bipartite network or view. amss

Nonnegative Matrix Factorization Let X = [ X · , 1 , . . . , X · , N ] ∈ R M × N denote the nonnegative data matrix + where each column represents a data point and each row represents one attribute. NMF aims to find two non-negative matrix factors U = [ U i , k ] ∈ R M × K and V = [ V j , k ] ∈ R N × K whose product provides a + + good approximation to X : X ≈ UV T (1) Here K denotes the desired reduced dimension, and to facilitate discussions, we call U the basis matrix and V the coefficient matrix . amss

Update Rule of NMF One of the common reconstruction processes can be formulated as a Frobenius norm optimization problem, defined as: U , V || X − UV T || 2 min F , s . t . U ≥ 0 , V ≥ 0 Multiplicative update rules are executed iteratively to minimize the objective function as follows: ( X T U ) j , k ( XV ) i , k U i , k ← U i , k , V j , k ← V j , k (2) ( UV T V ) i , k ( VU T U ) j , k amss

NMF for Clustering Note that given the NMF formulation in Equation 1, for arbitrary invertible K × K matrix Q , we have UV T = ( UQ − 1 )( QV T ) (3) There can be many possible solutions, and it is important to enforce constraints to ensure uniqueness of the factorization for clustering. One of the common ways is to normalize basis matrix U after convergence of multiplicative updates if we use V for clustering: U i , k �� U 2 U i , k ← , V j , k ← V j , k (4) i , k �� i U 2 i i , k amss

Multi-View Notations Assume that we are now given n v representations (i.e., views). Let { X ( 1 ) , X ( 2 ) , . . . , X ( n v ) } denote the data of all the views, where for each view X ( v ) , we have factorizations that X ( v ) ≈ U ( v ) ( V ( v ) ) T . Here for different views, we have the same number of data points but allow for different number of attributes, hence V ( v ) s are of the same shape but U ( v ) s can differ along the row dimension across multiple views. amss

Framework of MultiNMF Normalize View 1 Model 1 Consensus Conse Data Normalize View 2 Model 2 Normalize Normalize n v n v View Model Models learnt from different views are requried to be softly regularized towards a consensus with proper normalization for clustering. amss

The Approach Firstly, the disagreement between coefficient matrix V ( v ) and the consensus matrix V ∗ are incorporated into NMF: n v n v � X ( v ) − U ( v ) ( V ( v ) ) T � 2 λ v � V ( v ) − V ∗ � 2 � � F + F v = 1 v = 1 s . t . U ( v ) , V ( v ) , V ∗ ≥ 0 (5) amss

The Approach Secondly, constraints on coefficient matrices U ( v ) in different views are added to make V ( v ) s comparable and meaningful for clustering. W.l.o.g., assume || X ( v ) || 1 = 1, we then want to minimize: n v n v � X ( v ) − U ( v ) ( V ( v ) ) T � 2 λ v � V ( v ) − V ∗ � 2 � � F + F v = 1 v = 1 · , k || 1 = 1 and U ( v ) , V ( v ) , V ∗ ≥ 0 s . t . ∀ 1 ≤ k ≤ K , || U ( v ) (6) amss

Why || X || 1 = 1 and || U · , k || 1 = 1? Objective function: n v n v � X ( v ) − U ( v ) ( V ( v ) ) T � 2 λ v � V ( v ) − V ∗ � 2 � � min F + F U ( v ) , V ( v ) , V ∗ v = 1 v = 1 · , k || 1 = 1 and U ( v ) , V ( v ) , V ∗ ≥ 0 s . t . ∀ 1 ≤ k ≤ K , || U ( v ) Given || X || 1 = 1 and || U · , k || 1 = 1, K K � � � � � || X || 1 = || X j || 1 ≈ || U · , k V j , k || 1 = || V j , k || 1 = || V || 1 j k = 1 j k = 1 j Therefore, || V || 1 ≈ 1 amss

Objective Function Previous: n v n v � X ( v ) − U ( v ) ( V ( v ) ) T � 2 λ v � V ( v ) − V ∗ � 2 � � min F + F U ( v ) , V ( v ) , V ∗ v = 1 v = 1 · , k || 1 = 1 and U ( v ) , V ( v ) , V ∗ ≥ 0 s . t . ∀ 1 ≤ k ≤ K , || U ( v ) Now: n v n v � X ( v ) − U ( v ) ( Q ( v ) ) − 1 Q ( v ) ( V ( v ) ) T � 2 λ v � V ( v ) Q ( v ) − V ∗ � 2 � � min F + F U ( v ) , V ( v ) , V ∗ v = 1 v = 1 s . t . ∀ 1 ≤ v ≤ n v , U ( v ) ≥ 0 , V ( v ) ≥ 0 , V ∗ ≥ 0 where � M M M � Q ( v ) = Diag U ( v ) U ( v ) U ( v ) � � � i , 1 , i , 2 , . . . , i , K i = 1 i = 1 i = 1 amss

Iterative Update Rules Fixing V ∗ , minimize over U ( v ) and V ( v ) until convergence: � N ( XV ) i , k + λ v j = 1 V j , k V ∗ j , k U i , k ← U i , k � M � N ( UV T V ) i , k + λ v j = 1 V 2 l = 1 U l , k j , k U ← UQ − 1 , V ← VQ ( X T U ) j , k + λ v V ∗ j , k V j , k ← V j , k ( VU T U ) j , k + λ v V j , k Fixing U ( v ) and V ( v ) , minimize over V ∗ : � n v v = 1 λ v V ( v ) Q ( v ) V ∗ = ≥ 0 � n v v = 1 λ v amss

Use V ∗ for Clustering Once we obtain the consensus matrix V ∗ , the cluster label of data point j could be computed as arg max k V ∗ j , k . Or we can simply use k -means directly on V ∗ where V ∗ is viewed as a latent representation of the original data points. amss

PLSA Probabilistic Latent Semantic Analysis (PLSA) is a traditional topic modeling technique for document analysis. It models the M × N term-document co-occurrence matrix X (each entry X ij is the number of occurrences of word w i in document d j ) as being generated from a mixture model with K components: K � P ( w , d ) = P ( w | k ) P ( d , k ) k = 1 amss

Relation to NMF K � P ( w , d ) = P ( w | k ) P ( d , k ) k = 1 X = ( UQ − 1 )( QV T ) Early studies show that ( UQ − 1 ) (or ( QV T ) ) has the formal properties of conditional probability matrix [ P ( w | k )] ∈ R M × K (or + [ P ( d , k )] T ∈ R K × N ). This provides theoretical foundation for using + NMF to conduct clustering. Due to this connection, joint NMF has a nice probabilistic interpretation: each element in the matrix V ∗ is the consensus of P ( d | k ) ( v ) weighted by λ v P ( d ) ( v ) from different views. amss

Extend MultiNMF to HIN Assume that we are now given T attribute types. Let { X ( 1 ) , X ( 2 ) , . . . , X ( T ) } denote the sub-networks, where for each subnework X ( t ) , we have factorizations that X ( t ) ≈ U ( t ) ( V ( t ) ) T . Normalize Sub-network 1 Model 1 Consensus Conse HIN N o r m a Sub-network 2 Model 2 l i z e Normalize Normalize S u b - n e t w o r k T Model T amss

HINMF > MultiNMF + HIN In HINMF, 1 We expect to get clustering on both center and attribute types at the same time. 2 We wish to learn the strength of each subnetwork automatically. amss

Objective Function T � � X ( t ) − U ( t ) ( V ( t ) ) T � 2 � β ( t ) min F U ( t ) s , V ( t ) s , V ∗ ,β ( t ) s t = 1 � + α � V ( t ) Q ( t ) − V ∗ � 2 F ] (7) U ( t ) ≥ 0 , V ( t ) ≥ 0 , V ∗ ≥ 0 , s . t . ∀ 1 ≤ t ≤ T , exp − β ( t ) = 1 � t We use α as a fixed parameter tuning the weight between NMF reconstruction error and the disagreement term. β ( t ) ’s are relative weights of different sub-networks learnt amss automatically from the HIN.

HINMF: A Matrix Factorization Method for Clustering in - PowerPoint PPT Presentation

HINMF: A Matrix Factorization Method for Clustering in Heterogeneous Information Networks Jialu Liu Jiawei Han University of Illinois at Urbana-Champaign August 5, 2013 amss Outline HIN and Multi-View data 1 Previous Work 2 Standard NMF

L101: Matrix Factorization In a nutshell Matrix factorization/completion you know? In NLP?

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender

Tensor Factorization via Matrix Factorization Volodymyr Kuleshov Arun Tejasvi Chaganty Percy

A Model For Mixed Linear-Tropical Matrix Factorization James Hook, Sanjar Karaev, Pauli Miettinen

Evolutionary Clustering Presenter: Lei Tang Evolutionary Clustering Evolutionary Clustering

Clustering A Categorization of Major Clustering Methods Partitioning Methods

Matrix Factorization For Topic Models Dr. Derek Greene Insight Latent Space Workshop

Singular Value Decomposition (matrix factorization) Singular Value Decomposition The SVD is a

Multi-View Clustering via Joint Nonnegative Matrix Factorization Jialu Liu 1 Chi Wang 1 Jing Gao 2

Matrix Factorization and Factorization Machines for Recommender Systems Chih-Jen Lin Department

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

k -means clustering Method to automatically separate data sets into distinct groups. Clustering

Trust based Clustering for Group Trust based Clustering for Group Trust based Clustering for

Viraj Anagal Kaushik Mada Presented to Dr. Mohamed Mahmoud ECE 6900 Fall 2014 Date: 09/29/2014 1

Short Summery Dynamic networks Omega network Static networks Main topologies:

ON EIGENFUNCTIONS OF THE STAR GRAPHS Vladislav Kabanov Institute of Mathematics and Mechanics

Our Pl Planet: EARTH February 20 th , 2019 The webinar will begin at 1:00 p.m. (MT) and will be

Fast and Efficient Container Startup at the Edge via Dependency Scheduling Silvery Fu 1 , Radhika

The Startup Sequence of STM32 Corrado Santoro ARSLAB - Autonomous and Robotic Systems Laboratory

Developing a negotiation strategy Goals of this session: Answer the most Which items for

NFS Don Porter CSE 506 Figure 1 The Fllesystem Interface The VFS interface is implemented