Hybrid Clustering of multi-view data via MLSVD
Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor
ESAT-SCD, Katholieke Universiteit Leuven
TDA, September 14, 2010, Bari, Italy

Outline
◮ Introduction
◮ Hybrid clustering of multi-view data
◮ Experiments
◮ Discussion and Outlook
◮ Acknowledgement

Introduction: Motivation
◮ Booming demand: grouping multi-view data for a better partition (Web mining, social networks, literature analysis).
◮ Clustering methods
  ◮ Most methods: single-view data
  ◮ Hybrid clustering: multi-view data
◮ Tensor methods
  ◮ A powerful tool to handle multi-way data sources.
  ◮ Multilinear singular value decomposition (MLSVD) (Tucker, 1964 & 1966; De Lathauwer et al, 2000a)

Figure: Demo of hybrid clustering by MLSVD on synthetic 3D data sets. Panels: original data; spectral clustering (spectral projection by each 2D single view); hybrid clustering (spectral projection by MLSVD).

Introduction: Related work
◮ Hybrid clustering: multiple kernel fusion (MKF) (Joachims et al, 2001) and clustering ensemble (Strehl & Ghosh, 2002)
◮ MLSVD-based clustering for image recognition (Huang & Ding, 2008)
◮ Multi-way latent semantic analysis (Sun et al, 2006)
◮ CANDECOMP/PARAFAC (CP): scientific publication data with multiple linkage (Dunlavy, Kolda, et al, 2006; Selee, Kolda et al, 2007)

Introduction: Main contributions
◮ An extendable framework of hybrid clustering based on MLSVD
  ◮ Modelling the multi-view data as a tensor
  ◮ Seeking a joint optimal subspace by tensor analysis
◮ Two novel clustering algorithms: AHC-MLSVD and WHC-HOOI.
◮ Experiments on both synthetic data and a real application on the Web of Science (WoS) journal database.

Hybrid clustering: Spectral clustering
Given $S \in \mathbb{R}^{N \times N}$, the affinity matrix (similarity matrix) of a graph $G$, and $D$, the degree matrix, the Laplacian matrix is
$$L = D^{-1/2} S D^{-1/2} \tag{1}$$
Let $U \in \mathbb{R}^{N \times M}$ be a relaxed indicator matrix, where $M$ is the number of clusters:
$$\max_{U} \; \operatorname{tr}(U^T L U) \quad \text{s.t. } U^T U = I. \tag{2}$$
The solution of spectral clustering is given by the eigenvalue decomposition of the matrix $L$ (Luxburg, 2007).
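As an illustration of Equations (1) and (2), the following is a minimal NumPy sketch, not the authors' implementation. It assumes a symmetric affinity matrix with strictly positive degrees; the function name `spectral_embedding` and the toy matrix are hypothetical. The trace in (2) is maximized by the eigenvectors of $L$ belonging to its $M$ largest eigenvalues, which is what the sketch returns.

```python
import numpy as np

def spectral_embedding(S, M):
    """Return the N x M relaxed indicator matrix U for a symmetric affinity matrix S."""
    degrees = S.sum(axis=1)                      # diagonal of the degree matrix D
    d_inv_sqrt = 1.0 / np.sqrt(degrees)          # assumes every degree is positive
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]   # L = D^{-1/2} S D^{-1/2}
    L = (L + L.T) / 2.0                          # remove rounding asymmetry
    _, eigvecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    return eigvecs[:, -M:]                       # eigenvectors of the M largest eigenvalues

# Toy affinity matrix with two groups: objects {0, 1} and objects {2, 3}.
S = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])
U = spectral_embedding(S, M=2)
print(U)
```

The rows of `U` would then typically be grouped with a standard algorithm such as k-means to obtain the final cluster labels.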

Hybrid clustering: Concept overview
Figure: Concept overview. Data sources 1, ..., n each yield a Laplacian matrix $L^{(1)}, ..., L^{(n)}$. Spectral clustering based on SVD: a single Laplacian matrix, matrix decomposition, optimal subspace $U$, clustering. Hybrid clustering based on MLSVD: the Laplacian matrices form a Laplacian tensor, tensor decomposition, optimal subspace $U$ and weighting vector, clustering.

Hybrid clustering: Laplacian tensor
From a set of $K$ Laplacian matrices $L^{(i)} \in \mathbb{R}^{N \times N}$, $i = 1, ..., K$, to a Laplacian tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times K}$.
Figure: The formulation of a Laplacian tensor (the $N \times N$ Laplacian matrices of views 1, 2, ..., K are stacked along the view mode).
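The stacking step can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the helper names `normalized_laplacian` and `laplacian_tensor` are hypothetical, the random symmetric similarity matrices are toy data, and each view is assumed to have positive degrees.

```python
import numpy as np

def normalized_laplacian(S):
    """L = D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))    # assumes positive degrees
    return d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def laplacian_tensor(similarities):
    """Stack K Laplacian matrices (N x N each) into an N x N x K tensor."""
    return np.stack([normalized_laplacian(S) for S in similarities], axis=2)

# Toy data: K = 3 views of N = 5 objects.
rng = np.random.default_rng(0)
views = []
for _ in range(3):
    X = rng.random((5, 5))
    views.append((X + X.T) / 2.0 + np.eye(5))    # symmetric, positive entries
A = laplacian_tensor(views)
print(A.shape)
```

With this layout, `A[:, :, k]` is the Laplacian matrix of view `k`, so every frontal slice of the tensor is symmetric.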

Hybrid clustering: AHC-MLSVD
Averaging multi-view data for joint analysis: $U \in \mathbb{R}^{N \times M}$ is the joint optimal subspace and $I \in \mathbb{R}^{K \times K}$ is an identity matrix.
Figure: Average hybrid clustering of multi-view data (tensor decomposition of the Laplacian tensor, with N objects and K views, into a core tensor with M subjects, the joint optimal subspace $U$ and the identity matrix $I$).

Hybrid clustering: AHC-MLSVD
The optimization of average hybrid clustering:
$$\max_{U} \; \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 I \right\|_F^2 \quad \text{s.t. } U^T U = I. \tag{3}$$
The solution is given by MLSVD (Tucker, 1964 & 1966; De Lathauwer et al, 2000a):
◮ An approximate solution
◮ Usually satisfactory results
◮ An upper bound on the approximation error
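A minimal sketch of how the MLSVD solution of (3) could be computed, assuming the Laplacian tensor is stored as an N x N x K NumPy array with symmetric slices. Because each slice is symmetric, the mode-1 and mode-2 unfoldings have the same left singular vectors, so a single SVD is enough in this sketch. The function names and the toy tensor are hypothetical, and this is not presented as the authors' code.

```python
import numpy as np

def ahc_mlsvd_subspace(A, M):
    """Leading M left singular vectors of the mode-1 unfolding of A (N x N x K)."""
    N, _, K = A.shape
    A1 = A.reshape(N, N * K)                     # mode-1 unfolding (column order is irrelevant here)
    left, _, _ = np.linalg.svd(A1, full_matrices=False)
    return left[:, :M]

def ahc_objective(A, U):
    """Squared Frobenius norm of A x1 U^T x2 U^T x3 I."""
    core = np.einsum("ia,jb,ijk->abk", U, U, A)  # M x M x K core tensor
    return np.linalg.norm(core) ** 2

# Toy tensor: two views of a noisy two-cluster pattern on six objects.
rng = np.random.default_rng(0)
block = np.kron(np.eye(2), np.ones((3, 3)))
slices = []
for _ in range(2):
    noise = 0.05 * rng.random((6, 6))
    slices.append(block / 3.0 + (noise + noise.T) / 2.0)
A = np.stack(slices, axis=2)

U = ahc_mlsvd_subspace(A, M=2)
print(ahc_objective(A, U), np.linalg.norm(A) ** 2)
```

The printed objective can never exceed the squared norm of the full tensor; the gap between the two numbers is the energy lost by projecting onto the subspace.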

Hybrid clustering: WHC-HOOI
Taking the effect of each single-view data into account: $W = \{\alpha_1, \alpha_2, \cdots, \alpha_K\}^T$ is the weighting vector, with $\alpha_i$ the weighting factor of view $i$.
Figure: Weighted hybrid clustering of multi-view data (tensor decomposition of the Laplacian tensor, with N objects and K views, into a core tensor with M subjects, the joint optimal subspace $U$ and the weighting vector $W$).

Hybrid clustering: WHC-HOOI
The equivalent optimization of weighted hybrid clustering:
$$\max_{U, W} \; \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 W^T \right\|_F^2 \quad \text{s.t. } U^T U = I \text{ and } W^T W = 1. \tag{4}$$
The solution is given by higher-order orthogonal iteration (HOOI) (Kroonenberg & De Leeuw, 1980; De Lathauwer et al, 2000b):
◮ An optimal solution
◮ An appropriate weight for each view
◮ Other tensor methods
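The alternating updates behind (4) can be sketched as follows. This is a simplified, symmetric variant of HOOI written for illustration only: it assumes both object modes share the same factor $U$, initializes $U$ from the mode-1 SVD and $W$ uniformly, and uses hypothetical names (`whc_hooi`, `n_iter`, `tol`). It should not be read as the authors' exact procedure.

```python
import numpy as np

def whc_hooi(A, M, n_iter=50, tol=1e-10):
    """Alternate between the subspace U (N x M) and the view weights W (length K)."""
    N, _, K = A.shape
    U = np.linalg.svd(A.reshape(N, N * K), full_matrices=False)[0][:, :M]
    W = np.full(K, 1.0 / np.sqrt(K))             # uniform start with unit norm
    previous = -np.inf
    for _ in range(n_iter):
        # U-step: the mode-1 unfolding of A x2 U^T x3 W^T equals (sum_k W_k L_k) U.
        L_weighted = A @ W                       # N x N weighted sum of the slices
        U = np.linalg.svd(L_weighted @ U, full_matrices=False)[0]
        # W-step: leading left singular vector of the mode-3 unfolding of the core.
        core = np.einsum("ia,jb,ijk->abk", U, U, A)
        G3 = core.reshape(M * M, K).T            # K x M^2
        W = np.linalg.svd(G3, full_matrices=False)[0][:, 0]
        if W.sum() < 0:                          # resolve the sign ambiguity
            W = -W
        objective = np.linalg.norm(W @ G3) ** 2
        if abs(objective - previous) < tol:
            break
        previous = objective
    return U, W

# Toy tensor: three views of a noisy two-cluster pattern on six objects.
rng = np.random.default_rng(1)
block = np.kron(np.eye(2), np.ones((3, 3))) / 3.0
slices = []
for scale in (0.01, 0.05, 0.2):
    noise = scale * rng.random((6, 6))
    slices.append(block + (noise + noise.T) / 2.0)
A = np.stack(slices, axis=2)
U, W = whc_hooi(A, M=2)
print(W)
```

The returned `W` has unit Euclidean norm, matching the constraint $W^T W = 1$, and its entries play the role of the per-view weights $\alpha_k$.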

Experiments: Clustering of a multiplex network
Multiplex network: a group of networks which share the same nodes but have multiple types of links (Mucha et al, 2010).
The synthetic multiplex network:
◮ Three clusters with 50, 100 and 200 members, respectively
◮ Three views generated with different noise
◮ Three interaction matrices, one from each view ⇒ a tensor
Figure: The adjacency matrices from a synthetic multiplex network
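A data set with this shape can be generated along the following lines. The slides do not state the noise model, so this sketch assumes a simple stochastic block model in which each view has its own probability of spurious between-cluster links. The function name, `p_in`, `noise_levels` and `seed` are all hypothetical choices.

```python
import numpy as np

def synthetic_multiplex(sizes=(50, 100, 200), noise_levels=(0.1, 0.2, 0.3),
                        p_in=0.5, seed=0):
    """Return an N x N x K adjacency tensor and the true cluster labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same_cluster = labels[:, None] == labels[None, :]
    N = labels.size
    views = []
    for p_out in noise_levels:                   # one noise level per view (assumed)
        prob = np.where(same_cluster, p_in, p_out)
        upper = np.triu(rng.random((N, N)) < prob, k=1)   # sample each pair once
        views.append((upper | upper.T).astype(float))     # symmetric, no self-loops
    return np.stack(views, axis=2), labels

tensor, labels = synthetic_multiplex()
print(tensor.shape, np.bincount(labels))
```

Each slice `tensor[:, :, k]` is one interaction matrix; turning the slices into Laplacian matrices gives the input for the hybrid clustering step.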

Clustering of a multiplex network
Figure: Clustering of the multiplex network A (views A1, A2, A3). Panels: spectral clustering (spectral projection by each single view); hybrid clustering (spectral projection by MLSVD).

Experiments: Application on Web of Science (WoS) journal database
◮ Objective: obtain a good scientific mapping from the WoS journals
◮ Integrating two views of the data: textual information and journal cross-citations, with $N = 8{,}305$ and $d_{\text{text}} = 669{,}700$
◮ Cosine similarity matrix of both text and cross-citation
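For reference, a cosine similarity matrix can be computed as in the sketch below. It uses a small dense array for illustration; data at the scale quoted above would need a sparse representation, which is not shown. The function name and the toy feature matrix are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (objects x features)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # leave all-zero rows unchanged
    X_unit = X / norms
    return X_unit @ X_unit.T

# Toy example: three objects described by two features.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
S = cosine_similarity_matrix(X)
print(np.round(S, 3))
```

Applying the same function to the text vectors and to the cross-citation vectors gives one similarity matrix per view.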

Experiments: Clustering evaluation measures
◮ Standard categories: Essential Science Indicators (ESI) from WoS
◮ Normalized mutual information (NMI)
$$\mathrm{NMI} = \frac{2 \times H(\{c_i\}, \{l_i\})}{H(\{c_i\}) + H(\{l_i\})} \tag{5}$$
where $H(\{c_i\}, \{l_i\})$ is the mutual information between the clustering labels $\{c_i\}_{i=1}^{n}$ and the reference category indicators $\{l_i\}_{i=1}^{n}$, and $H(\{c_i\})$ and $H(\{l_i\})$ are their entropies.
◮ Cognitive analysis by a bibliometrist
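The NMI score can be computed directly from two label vectors. The sketch below assumes the normalization by the sum of the two entropies, so identical partitions score 1 and independent ones score 0. The helper names are hypothetical, and returning 1.0 when both partitions are trivial is a convention chosen here, not something stated in the slides.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(c, l):
    """Mutual information (in nats) between two label vectors of equal length."""
    c = np.asarray(c)
    l = np.asarray(l)
    _, ci = np.unique(c, return_inverse=True)
    _, li = np.unique(l, return_inverse=True)
    joint = np.zeros((ci.max() + 1, li.max() + 1))
    np.add.at(joint, (ci, li), 1.0)              # contingency table
    joint /= c.size
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / outer[nonzero])))

def nmi(c, l):
    """NMI = 2 * MI / (H(c) + H(l)); defined as 1.0 if both entropies are zero."""
    denominator = entropy(c) + entropy(l)
    return 2.0 * mutual_information(c, l) / denominator if denominator > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score depends only on how the objects are grouped, not on the label names, which is why the first call treats the relabelled partition as a perfect match.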

Experiments: Clustering performance
Figure: NMI validation of various clustering methods on WoS journal database (cluster number: 22). Vertical axis: NMI index, from 0.2 to 0.55; horizontal axis: different clustering methods, grouped into multi-view and single-view methods.

Experiments: Visualization of the journal clusters obtained by HC-MLSVD
Figure: Visualization of 22 clusters on the WoS journal database (the node: a journal cluster, where the circle size is proportional to its scale; the edge: cross-citation between two journal clusters; the annotated terms: the top three text terms within each journal cluster).
Cluster labels:
1. firm, price, market
2. steel, microstructur, corros
3. cultivar, plant, milk
4. ocean, seismic, rock
5. surgeri, clinic, arteri
6. semant, phonolog, cortex
7. algorithm, fuzzi, wireless
8. protein, cell, gene
9. catalyst, polym, acid
10. music, literari, essai
11. speci, habitat, forest
12. soil, water, sludg
13. crack, turbul, heat
14. galaxi, star, stellar
15. quantum, quark, neutrino
16. polit, social, court
17. dope, crystal, optic
18. tumor, cancer, carcinoma
19. student, teacher, school
20. algebra, theorem, asymptot
21. nurs, schizophrenia, health
22. dog, hors, infect

Discussion and outlook: Discussion
◮ Extendable hybrid clustering framework:
  ◮ Other learning tasks on multi-view data (classification, spectral embedding, collaborative filtering)
  ◮ Other tensor-based solutions
  ◮ Other matrices (similarity matrices, modularity matrices)
Figure: The $N \times N$ modularity matrices of views 1, 2, ..., K (N nodes each) stacked into a modularity tensor.

Discussion and outlook: Outlook
◮ Scalability issue: large-scale databases and efficient implementation
◮ Multi-mode tensor (currently 3-mode): dynamic data analysis
◮ Other potential tensor methods (CP, INDSCAL, DEDICOM)

Acknowledgement
Research supported by (1) the KUL ESAT SISTA research group and (2) the China Scholarship Council (CSC, No. 2006153005). Thanks to Dr. Carlos Alzate of K.U.Leuven for discussions.

Thank you for your attention!
