Hybrid Clustering of multi-view data via MLSVD

Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor. ESAT-SCD, Katholieke Universiteit Leuven. TDA, September 14, 2010, Bari, Italy.


  1. Hybrid Clustering of multi-view data via MLSVD. Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor. ESAT-SCD, Katholieke Universiteit Leuven. TDA, September 14, 2010, Bari, Italy.

  2. Outline: Introduction; Hybrid clustering of multi-view data; Experiments; Discussion and Outlook; Acknowledgement

  4. Introduction: Motivation
     ◮ Booming demand: grouping multi-view data for a better partition (Web mining, social networks, literature analysis).
     ◮ Clustering methods
        ◮ Most methods: single-view data
        ◮ Hybrid clustering: multi-view data
     ◮ Tensor methods
        ◮ A powerful tool for handling multi-way data sources.
        ◮ Multilinear singular value decomposition (MLSVD) (Tucker, 1964 & 1966; De Lathauwer et al, 2000a)

  5. Figure: Demo of hybrid clustering by MLSVD on synthetic 3D data sets. Panels: original data; spectral clustering (spectral projection by each 2D single view); hybrid clustering (spectral projection by MLSVD).

  6. Introduction: Related work
     ◮ Hybrid clustering: multiple kernel fusion (MKF) (Joachims et al, 2001) and clustering ensemble (Strehl & Ghosh, 2002)
     ◮ MLSVD-based clustering for image recognition (Huang & Ding, 2008)
     ◮ Multi-way latent semantic analysis (Sun et al, 2006)
     ◮ CANDECOMP/PARAFAC (CP): scientific publication data with multiple linkage (Dunlavy, Kolda, et al, 2006; Selee, Kolda et al, 2007)

  7. Introduction: Main contributions
     ◮ An extendable framework of hybrid clustering based on MLSVD
        ◮ Modelling the multi-view data as a tensor
        ◮ Seeking a joint optimal subspace by tensor analysis
     ◮ Two novel clustering algorithms: AHC-MLSVD and WHC-HOOI.
     ◮ Experiments on both synthetic data and a real application: the Web of Science (WoS) journal database.

  9. Hybrid clustering: Spectral clustering
     Given $S \in \mathbb{R}^{N \times N}$, the affinity matrix (similarity matrix) of a graph $G$, and $D$, its degree matrix, the Laplacian matrix is
     \[ L = D^{-1/2} S D^{-1/2}. \tag{1} \]
     Let $U \in \mathbb{R}^{N \times M}$ be a relaxed indicator matrix, where $M$ is the number of clusters. Spectral clustering solves
     \[ \max_{U} \operatorname{tr}(U^T L U) \quad \text{s.t.} \quad U^T U = I. \tag{2} \]
     The solution is given by the eigenvalue decomposition of $L$: the columns of $U$ are the $M$ dominant eigenvectors (von Luxburg, 2007).
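
Equations (1) and (2) can be illustrated with a minimal NumPy sketch. The function names and the toy similarity matrix are illustrative assumptions, not taken from the slides, and the final clustering step on the rows of U (for example k-means) is left out.

```python
import numpy as np

def normalized_laplacian(S):
    """Eq. (1): L = D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S."""
    d = S.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every degree is positive
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spectral_subspace(L, M):
    """Eq. (2): the M dominant eigenvectors of L maximize tr(U^T L U)."""
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:M]    # indices of the M largest
    return eigvecs[:, top], eigvals[top]

# Toy similarity matrix with two loosely connected groups (hypothetical data).
S = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])
L = normalized_laplacian(S)
U, top_eigvals = spectral_subspace(L, M=2)
```

With this convention the objective value tr(U^T L U) equals the sum of the M largest eigenvalues of L.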

  10. Hybrid clustering: Concept overview
      Figure (diagram), two pipelines side by side:
      ◮ Spectral clustering based on SVD: a single data source → Laplacian matrix L → matrix decomposition → optimal subspace U → clustering.
      ◮ Hybrid clustering based on MLSVD: data sources 1, 2, ..., n → Laplacian matrices L(1), L(2), ..., L(n) → Laplacian tensor → tensor decomposition → optimal subspace U and weighting vector → clustering.

  11. Hybrid clustering: Laplacian tensor
      From a set of $K$ Laplacian matrices $L^{(i)} \in \mathbb{R}^{N \times N}$, $i = 1, \ldots, K$, to a Laplacian tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times K}$.
      Figure: The formulation of a Laplacian tensor. The Laplacian matrices of views 1 to K (each N objects × N objects) are stacked along a third mode of size K.
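
A minimal sketch of this stacking step, assuming the K Laplacian matrices are already available as NumPy arrays. The helper name `laplacian_tensor` and the toy matrices are hypothetical.

```python
import numpy as np

def laplacian_tensor(laplacians):
    """Stack K matrices of shape (N, N) into a tensor of shape (N, N, K)."""
    mats = [np.asarray(L, dtype=float) for L in laplacians]
    N = mats[0].shape[0]
    for L in mats:
        if L.shape != (N, N):
            raise ValueError("every view must provide an N x N matrix")
    return np.stack(mats, axis=2)          # the third mode indexes the views

# Three toy "views" over the same 4 objects (hypothetical values).
L1 = np.eye(4)
L2 = 0.5 * np.ones((4, 4))
L3 = np.diag([1.0, 2.0, 3.0, 4.0])
A = laplacian_tensor([L1, L2, L3])
```

The frontal slice `A[:, :, i]` is then the Laplacian matrix of view i + 1.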

  12. Hybrid clustering: AHC-MLSVD
      Averaging multi-view data for joint analysis.
      Figure: Average hybrid clustering of multi-view data. The Laplacian tensor (N objects × N objects × K views) is decomposed into a core tensor (M subjects × M subjects × K), the joint optimal subspace U along both object modes, and an identity matrix I along the view mode.
      $U \in \mathbb{R}^{N \times M}$: the joint optimal subspace; $I \in \mathbb{R}^{K \times K}$: an identity matrix.

  13. Hybrid clustering: AHC-MLSVD
      The optimization of average hybrid clustering:
      \[ \max_{U} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 I \right\|_F^2 \quad \text{s.t.} \quad U^T U = I. \tag{3} \]
      Solved by MLSVD (Tucker, 1964 & 1966; De Lathauwer et al, 2000a):
      ◮ An approximate solution
      ◮ Usually satisfactory results
      ◮ An upper bound on the approximation error
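
The truncated-MLSVD step behind Eq. (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: U is taken as the M dominant left singular vectors of the mode-1 unfolding of the tensor, the random symmetric slices stand in for real Laplacian matrices, and the clustering of the rows of U is omitted.

```python
import numpy as np

def ahc_mlsvd_subspace(A, M):
    """Truncated MLSVD estimate of U: dominant left singular vectors
    of the mode-1 unfolding of A (shape N x N*K)."""
    N = A.shape[0]
    A1 = A.reshape(N, -1)                  # mode-1 unfolding
    U_full, _, _ = np.linalg.svd(A1, full_matrices=False)
    return U_full[:, :M]

def objective(A, U):
    """Value of || A x_1 U^T x_2 U^T x_3 I ||_F^2 from Eq. (3)."""
    core = np.einsum('ia,jb,ijk->abk', U, U, A)
    return float(np.sum(core ** 2))

# Hypothetical tensor: K random symmetric slices.
rng = np.random.default_rng(0)
N, K, M = 30, 3, 3
slices = []
for _ in range(K):
    X = rng.random((N, N))
    slices.append((X + X.T) / 2)           # symmetrize each view
A = np.stack(slices, axis=2)

U = ahc_mlsvd_subspace(A, M)
value = objective(A, U)
```

Because every slice is symmetric, the mode-1 and mode-2 unfoldings share the same left singular vectors, which is why a single U is used for both object modes in this sketch.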

  14. Hybrid clustering: WHC-HOOI
      Taking the effect of each single-view data set into account.
      Figure: Weighted hybrid clustering of multi-view data. The Laplacian tensor (N objects × N objects × K views) is decomposed into a core tensor (M subjects × M subjects × 1), the joint optimal subspace U along both object modes, and a weighting vector W (K × 1) along the view mode.
      $W = \{\alpha_1, \alpha_2, \cdots, \alpha_K\}^T$: the weighting factor of each view.

  15. Hybrid clustering: WHC-HOOI
      The equivalent optimization of weighted hybrid clustering:
      \[ \max_{U, W} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 W^T \right\|_F^2 \quad \text{s.t.} \quad U^T U = I \ \text{and} \ W^T W = 1. \tag{4} \]
      Solved by higher-order orthogonal iteration (HOOI) (Kroonenberg & De Leeuw, 1980; De Lathauwer et al, 2000b):
      ◮ An optimal solution
      ◮ An appropriate weight for each view
      ◮ Other tensor methods
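
A hedged sketch of an alternating scheme in the spirit of HOOI for Eq. (4). Several choices here are assumptions made for illustration: a single U is shared by both object modes, U is initialized from the truncated MLSVD, W starts as uniform weights, and a fixed number of sweeps replaces a convergence test. The function name `whc_hooi` and the random tensor are hypothetical.

```python
import numpy as np

def whc_hooi(A, M, n_iter=30):
    """Alternate between updating U (N x M) and W (length K) for Eq. (4)."""
    N, _, K = A.shape
    # Initialization: truncated MLSVD for U, uniform unit-norm weights for W.
    U = np.linalg.svd(A.reshape(N, -1), full_matrices=False)[0][:, :M]
    W = np.ones(K) / np.sqrt(K)
    for _ in range(n_iter):
        # Fix W: project modes 2 and 3, keep the dominant left singular vectors.
        B = np.einsum('ijk,jb,k->ib', A, U, W)            # shape (N, M)
        U = np.linalg.svd(B, full_matrices=False)[0][:, :M]
        # Fix U: project modes 1 and 2, keep the dominant mode-3 singular vector.
        C = np.einsum('ia,jb,ijk->kab', U, U, A).reshape(K, M * M)
        W = np.linalg.svd(C, full_matrices=False)[0][:, 0]
        if W.sum() < 0:                                   # resolve sign ambiguity
            W = -W
    return U, W

# Hypothetical tensor: K random symmetric slices.
rng = np.random.default_rng(1)
N, K, M = 30, 3, 3
slices = []
for _ in range(K):
    X = rng.random((N, N))
    slices.append((X + X.T) / 2)
A = np.stack(slices, axis=2)

U, W = whc_hooi(A, M)
```

The entries of W play the role of the view weights α_1, ..., α_K, and the rows of U would then be passed to a clustering step.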

  17. Experiments: Clustering of a multiplex network
      Multiplex network: a group of networks that share the same nodes but have multiple types of links (Mucha et al, 2010).
      The synthetic multiplex network:
      ◮ Three clusters with 50, 100 and 200 members, respectively
      ◮ Three views generated with different noise
      ◮ Three interaction matrices, one from each view ⇒ a tensor
      Figure: The adjacency matrices from a synthetic multiplex network
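
The slides do not say how the noisy views were generated, so the following is only one possible way to build a comparable synthetic multiplex network. The block-model construction and the link probabilities `p_in` and `p_out` are assumptions chosen for illustration; only the cluster sizes and the number of views come from the slide.

```python
import numpy as np

def synthetic_multiplex(sizes=(50, 100, 200),
                        p_in=(0.30, 0.25, 0.20),
                        p_out=(0.05, 0.10, 0.15),
                        seed=0):
    """Return an (N, N, K) adjacency tensor and the true cluster labels.
    Each view is a random block-structured graph with its own noise level."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same_cluster = labels[:, None] == labels[None, :]
    views = []
    for pi, po in zip(p_in, p_out):
        prob = np.where(same_cluster, pi, po)             # link probabilities
        upper = np.triu(rng.random(prob.shape) < prob, k=1)
        views.append((upper | upper.T).astype(float))     # symmetric, no self-loops
    return np.stack(views, axis=2), labels

A, labels = synthetic_multiplex()
```

Each frontal slice of `A` is one adjacency matrix, so the tensor has one slice per view over the same 350 nodes.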

  18. Clustering of a multiplex network. Figure panels: the multiplex network A with views A1, A2 and A3; spectral clustering (spectral projection by each single view); hybrid clustering (spectral projection by MLSVD).

  19. Experiments: Application on Web of Science (WoS) journal database
      ◮ Objective: obtain a good scientific mapping from the WoS journals
      ◮ Integrating two views: textual information and journal cross-citations, with $N = 8{,}305$ and $d_{\text{text}} = 669{,}700$
      ◮ Cosine similarity matrices for both text and cross-citation data
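
As a small illustration of the cosine similarity step, the sketch below computes a similarity matrix from a dense feature matrix with one row per object. The function name and the toy data are hypothetical; data at the scale quoted on the slide would presumably need sparse matrices, which this sketch does not attempt.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                # leave all-zero rows as zero vectors
    Xn = X / norms
    return Xn @ Xn.T

# Three toy objects described by three features (hypothetical values).
X = np.array([[1, 0, 2],
              [2, 0, 4],
              [0, 3, 0]])
S = cosine_similarity_matrix(X)
```

Rows 0 and 1 point in the same direction, so their similarity is 1, while row 2 is orthogonal to both and has similarity 0 with them.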

  20. Experiments: Clustering evaluation measures
      ◮ Standard categories: Essential Science Indicators (ESI) from WoS
      ◮ Normalized mutual information (NMI)
        \[ \mathrm{NMI} = \frac{2 \times H(\{c_i\}, \{l_i\})}{H(\{c_i\}) + H(\{l_i\})} \tag{5} \]
        where $H(\{c_i\}, \{l_i\})$ is the mutual information between the clustering labels $\{c_i\}_{i=1}^{n}$ and the reference category indicators $\{l_i\}_{i=1}^{n}$, and $H(\{c_i\})$ and $H(\{l_i\})$ are their entropies.
      ◮ Cognitive analysis by a bibliometrist
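
A minimal sketch of the NMI measure in Eq. (5), assuming the normalization divides twice the mutual information by the sum of the two entropies. The function names and the toy label vectors are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(c, l):
    """Mutual information between two label vectors of equal length."""
    c_vals, c_idx = np.unique(c, return_inverse=True)
    l_vals, l_idx = np.unique(l, return_inverse=True)
    joint = np.zeros((len(c_vals), len(l_vals)))
    np.add.at(joint, (c_idx, l_idx), 1)    # contingency table
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

def nmi(c, l):
    """NMI = 2 * MI / (H(c) + H(l)); assumes at least one vector is not constant."""
    return 2.0 * mutual_information(c, l) / (entropy(c) + entropy(l))

same_partition = nmi([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])   # labels renamed only
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])                  # no shared structure
```

Under this definition the score does not depend on how the clusters are named: a relabelled copy of the reference partition scores 1, and statistically independent labelings score 0.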

  21. Experiments: Clustering performance
      Figure: NMI validation of various clustering methods on the WoS journal database (cluster number: 22). Y-axis: NMI index (0.2 to 0.55); x-axis: different clustering methods, grouped into multi-view and single-view methods.

  22. Experiments: Visualization of the journal clusters obtained by HC-MLSVD
      Figure: Visualization of 22 clusters on the WoS journal database (node: a journal cluster, with circle size proportional to its scale; edge: cross-citation between two journal clusters; annotated terms: the top three text terms within each journal cluster).
      Cluster 1: firm, price, market
      Cluster 2: steel, microstructur, corros
      Cluster 3: cultivar, plant, milk
      Cluster 4: ocean, seismic, rock
      Cluster 5: surgeri, clinic, arteri
      Cluster 6: semant, phonolog, cortex
      Cluster 7: algorithm, fuzzi, wireless
      Cluster 8: protein, cell, gene
      Cluster 9: catalyst, polym, acid
      Cluster 10: music, literari, essai
      Cluster 11: speci, habitat, forest
      Cluster 12: soil, water, sludg
      Cluster 13: crack, turbul, heat
      Cluster 14: galaxi, star, stellar
      Cluster 15: quantum, quark, neutrino
      Cluster 16: polit, social, court
      Cluster 17: dope, crystal, optic
      Cluster 18: tumor, cancer, carcinoma
      Cluster 19: student, teacher, school
      Cluster 20: algebra, theorem, asymptot
      Cluster 21: nurs, schizophrenia, health
      Cluster 22: dog, hors, infect

  24. Discussion and outlook: Discussion
      ◮ Extendable hybrid clustering framework:
         ◮ Other learning tasks on multi-view data (classification, spectral embedding, collaborative filtering)
         ◮ Other tensor-based solutions
         ◮ Other matrices (similarity matrices, modularity matrices)
      Diagram: the modularity matrices of views 1 to K (each N nodes × N nodes) are stacked into a modularity tensor (N nodes × N nodes × K views).

  25. Discussion and outlook: Outlook
      ◮ Scalability: large-scale databases and efficient implementation
      ◮ Multi-mode tensors (currently 3-mode): dynamic data analysis
      ◮ Other potential tensor methods (CP, INDSCAL, DEDICOM)

  27. Acknowledgement: Research supported by (1) the KUL ESAT SISTA research group; (2) the China Scholarship Council (CSC, No. 2006153005). (3) Thanks to Dr. Carlos Alzate of K.U.Leuven for discussions.

  28. Thank you for your attention!
