Hybrid Clustering of multi-view data via MLSVD
Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor
ESAT-SCD Katholieke Universiteit Leuven
TDA, September 14, 2010, Bari, Italy
Outline: Introduction, Hybrid clustering of
◮ Growing demand: grouping multi-view data to obtain better partitions
◮ Clustering methods
◮ Most methods: single-view data
◮ Hybrid clustering: multi-view data
◮ Tensor methods
◮ Powerful tool for handling multi-way data sources
◮ Multi-linear singular value decomposition (MLSVD) (Tucker, 1964 &
[Figure: original data; spectral projection by each 2D single view (spectral clustering) compared with spectral projection by MLSVD (hybrid clustering)]
◮ Hybrid clustering: multiple kernel fusion (MKF) (Joachims et al.,
◮ MLSVD based clustering on image recognition (Huang & Ding,
◮ Multi-way latent semantic analysis (Sun et al., 2006)
◮ CANDECOMP/PARAFAC (CP): Scientific publication data with
◮ An extendable framework of hybrid clustering based on MLSVD
◮ Modelling the multi-view data as a tensor
◮ Seeking a joint optimal subspace by tensor analysis
◮ Two novel clustering algorithms: AHC-MLSVD and WHC-HOOI
◮ Experiments on both synthetic data and a real application on Web
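The "multi-view data as a tensor" step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each view is given as a symmetric, non-negative N × N similarity matrix, and it assumes the normalized form D^{-1/2} S D^{-1/2} as the per-view Laplacian. The function names `normalized_laplacian` and `laplacian_tensor` are hypothetical.

```python
import numpy as np

def normalized_laplacian(S):
    """Return D^{-1/2} S D^{-1/2} for a symmetric, non-negative similarity matrix S.

    Assumption: this normalisation is one common choice in spectral
    clustering; the slides do not state which form is used.
    """
    d = S.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def laplacian_tensor(views):
    """Stack the K per-view Laplacians (each N x N) into an N x N x K tensor."""
    return np.stack([normalized_laplacian(S) for S in views], axis=2)

# Usage: three random symmetric views of 6 objects.
rng = np.random.default_rng(0)
views = []
for _ in range(3):
    X = rng.random((6, 6))
    views.append((X + X.T) / 2)
A = laplacian_tensor(views)
print(A.shape)
```

Each frontal slice `A[:, :, k]` is the Laplacian of view k, so single-view spectral clustering remains available on any slice.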
[Figure: from matrix decomposition to tensor decomposition. Spectral clustering based on SVD: data source, Laplacian matrix L (objects by subjects), optimal subspace U, clustering. Hybrid clustering: data sources 1 to n, Laplacian matrices 1 to n (L(1), L(2), L(3)), Laplacian tensor, optimal subspace U with weighting vector, clustering.]
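The single-view pipeline in the figure (Laplacian matrix, optimal subspace, clustering) can be sketched as below. This is an illustrative sketch under assumptions: the Laplacian is taken as D^{-1/2} S D^{-1/2}, the subspace rows are normalized to unit length, and a plain Lloyd's k-means with a deterministic farthest-point start stands in for whatever clustering step the authors used. The names `kmeans` and `spectral_clustering` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centre.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(S, k):
    """Cluster the objects of one view from its similarity matrix S (all degrees > 0)."""
    d = S.sum(axis=1)
    L = S / np.sqrt(np.outer(d, d))          # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, -k:]                         # subspace of the k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

# Usage: two well-separated blocks of 4 and 6 objects.
S = np.full((10, 10), 0.01)
S[:4, :4] = 1.0
S[4:, 4:] = 1.0
labels = spectral_clustering(S, 2)
print(labels)
```

In the hybrid setting the same final step applies, with U obtained from the Laplacian tensor instead of a single Laplacian matrix.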
[Figure: the Laplacian matrices of views 1, 2, ..., K (each N objects × N objects) are stacked into a Laplacian tensor (N × N × K). Tensor decomposition yields a core tensor (M × M × K), the joint optimal subspace U (N objects × M subjects) with its transpose U^T, and an identity matrix I (K × K) in the view mode.]
$$\max_{U} \; \left\| \mathcal{A} \times_1 U^{T} \times_2 U^{T} \times_3 I \right\|_F^2,$$
◮ An approximate solution
◮ Usually satisfactory results
◮ An upper bound on the approximation error
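A minimal sketch of the approximate (truncated MLSVD) solution follows. It assumes the Laplacian tensor is stored as an N × N × K NumPy array with symmetric frontal slices, and takes U as the M leading left singular vectors of the mode-1 unfolding, which is the standard truncated-MLSVD step. The function name `ahc_mlsvd_subspace` is hypothetical and the code is not taken from the authors.

```python
import numpy as np

def ahc_mlsvd_subspace(A, M):
    """Return an N x M orthonormal basis U from the truncated MLSVD of A.

    A is N x N x K. Row i of the unfolding holds every entry A[i, :, :];
    the column order does not change the left singular vectors.
    """
    N, _, K = A.shape
    A1 = A.reshape(N, N * K)                      # mode-1 unfolding
    U, _, _ = np.linalg.svd(A1, full_matrices=False)
    return U[:, :M]

# Usage: three identical views built from a matrix with known eigenvalues,
# so the recovered subspace can be compared with its top eigenvectors.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
S = Q @ np.diag([5.0, 4.0, 1.0, 0.5, 0.1]) @ Q.T
A = np.stack([S, S, S], axis=2)
U = ahc_mlsvd_subspace(A, 2)
print(U.shape)
```

The rows of U then serve as the joint spectral embedding of the N objects, to be clustered by any standard method such as k-means.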
[Figure: the Laplacian tensor (N objects × N objects × K views) is decomposed into a core tensor (M × M × 1), the joint optimal subspace U (N objects × M subjects) with its transpose U^T, and a weighting vector W (K × 1) in the view mode.]
$$\max_{U,\,W} \; \left\| \mathcal{A} \times_1 U^{T} \times_2 U^{T} \times_3 W^{T} \right\|_F^2,$$
◮ An optimal solution
◮ An appropriate weight for each view data
◮ Other tensor methods
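The weighted objective can be illustrated with an alternating scheme in the spirit of HOOI. This is a sketch under stated assumptions, not the authors' WHC-HOOI implementation: the frontal slices of A are symmetric, U has orthonormal columns, W is a unit-norm vector of length K, and the U-update exploits symmetry by taking the eigenvectors of the weighted sum of slices with the largest absolute eigenvalues. The names `objective`, `top_subspace` and `whc_hooi_sketch` are hypothetical.

```python
import numpy as np

def objective(A, U, w):
    """||A x1 U^T x2 U^T x3 w^T||_F^2, computed as ||U^T (sum_k w_k A_k) U||_F^2."""
    B = np.tensordot(A, w, axes=([2], [0]))
    return np.linalg.norm(U.T @ B @ U) ** 2

def top_subspace(B, M):
    """Eigenvectors of symmetric B with the M largest absolute eigenvalues."""
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(-np.abs(vals))[:M]
    return vecs[:, idx]

def whc_hooi_sketch(A, M, n_iter=50, tol=1e-10):
    N, _, K = A.shape
    w = np.ones(K) / np.sqrt(K)            # start from equal view weights
    history = []
    for _ in range(n_iter):
        # Fix w, update the joint subspace U.
        B = np.tensordot(A, w, axes=([2], [0]))
        U = top_subspace(B, M)
        # Fix U, update w: leading left singular vector of the K x M^2 matrix
        # whose k-th row is vec(U^T A_k U).
        C = np.einsum('ia,ijk,jb->kab', U, A, U).reshape(K, M * M)
        w = np.linalg.svd(C, full_matrices=False)[0][:, 0]
        if w.sum() < 0:                    # fix the sign ambiguity of the SVD
            w = -w
        history.append(objective(A, U, w))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return U, w, history

# Usage: three random symmetric views of 12 objects.
rng = np.random.default_rng(1)
N, K = 12, 3
A = np.zeros((N, N, K))
for k in range(K):
    X = rng.normal(size=(N, N))
    A[:, :, k] = (X + X.T) / 2
U, w, history = whc_hooi_sketch(A, 3)
print(np.round(w, 3))
```

Each half-step maximizes the objective exactly with the other factor held fixed, so the recorded objective values do not decrease; the entries of w play the role of the per-view weights.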
◮ Three clusters with 50, 100, and 200 members, respectively
◮ Three views generated by different noise
◮ Three interaction matrices from each view =
[Figure: views A1, A2 and A3 forming a multiplex network; spectral projection by each single view (spectral clustering) compared with spectral projection by MLSVD (hybrid clustering)]
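A synthetic setup of this kind can be mocked up as follows. Only the cluster sizes (50, 100, 200) and the number of views come from the slides; the block-structured ground truth, the Gaussian noise model, the noise levels and the clipping to non-negative values are all assumptions made for illustration. The function name `make_views` is hypothetical.

```python
import numpy as np

def make_views(sizes=(50, 100, 200), noise_levels=(0.5, 1.0, 1.5), seed=0):
    """Generate one noisy symmetric interaction matrix per noise level.

    Assumptions: the ideal interaction is 1 within a cluster and 0 across
    clusters; each view adds symmetric Gaussian noise of a different scale.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    block = (labels[:, None] == labels[None, :]).astype(float)
    views = []
    for sigma in noise_levels:
        noise = rng.normal(scale=sigma, size=block.shape)
        noise = (noise + noise.T) / 2          # keep the view symmetric
        views.append(np.clip(block + noise, 0.0, None))
    return labels, views

labels, views = make_views()
print(len(views), views[0].shape)
```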
◮ Objective: Obtain a good scientific mapping from the WoS
◮ Integrating two views of the data: textual information and journal
◮ Cosine similarity matrix of both text and cross-citation
◮ Standard categories: Essential Science Indicators (ESI) from WoS
◮ Normalized mutual information (NMI)
$\{c_i\}_{i=1}^{n}$ and reference category indicators $\{l_i\}_{i=1}^{n}$, $H(\{c_i\})$
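A small sketch of the NMI computation between cluster indicators c and reference category indicators l is given below. The exact normalization used on the slide is not recoverable from the text, so the geometric-mean form I(c; l) / sqrt(H(c) H(l)) is an assumption here; the function names `entropy` and `nmi` are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(c, l):
    """Normalized mutual information, assuming geometric-mean normalisation."""
    c = np.asarray(c).ravel()
    l = np.asarray(l).ravel()
    _, ci = np.unique(c, return_inverse=True)
    _, li = np.unique(l, return_inverse=True)
    # Joint distribution of (cluster, category) pairs.
    joint = np.zeros((ci.max() + 1, li.max() + 1))
    np.add.at(joint, (ci, li), 1)
    joint /= joint.sum()
    pc = joint.sum(axis=1, keepdims=True)
    pl = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / (pc @ pl)[nz]))
    hc, hl = entropy(c), entropy(l)
    if hc == 0 or hl == 0:
        return 0.0
    return mi / np.sqrt(hc * hl)

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])      # identical partitions, relabelled
indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])     # statistically independent partitions
print(same, indep)
```

The score does not depend on how clusters are numbered, which is why it suits comparing a clustering against the ESI categories.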
◮ Cognitive analysis by a bibliometrist
[Figure: NMI index (axis range 0.2 to 0.55) for different clustering methods. Single-view: Citation, Text. Multi-view: MKF, P M M M V S K M S C M G S A (tick labels fused), AdacVote, CP-ALS, AHC-MLSVD, WHC-HOOI.]
1. firm, price, market
2. steel, microstructur, corros
3. cultivar, plant, milk
4. ocean, seismic, rock
5. surgeri, clinic, arteri
6. semant, phonolog, cortex
7. algorithm, fuzzi, wireless
8. protein, cell, gene
9. catalyst, polym, acid
10. music, literari, essai
11. speci, habitat, forest
12. soil, water, sludg
13. crack, turbul, heat
14. galaxi, star, stellar
15. quantum, quark, neutrino
16. polit, social, court
17. dope, crystal, optic
18. tumor, cancer, carcinoma
19. student, teacher, school
20. algebra, theorem, asymptot
21. nurs, schizophrenia, health
22. dog, hors, infect
◮ Extendable hybrid clustering framework:
◮ Other learning tasks of multi-view data (classification, spectral
◮ Other tensor based solutions
◮ Other matrices (similarity matrices, modularity matrices)
[Figure: the modularity matrices of views 1, 2, ..., K (each N nodes × N nodes) are stacked into a modularity tensor (N nodes × N nodes × K views).]
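The modularity-tensor variant can be sketched in the same way as the Laplacian tensor. The sketch assumes undirected views given as symmetric adjacency matrices with at least one edge each, and uses the standard modularity matrix B = A - d d^T / (2m); the function names `modularity_matrix` and `modularity_tensor` are hypothetical.

```python
import numpy as np

def modularity_matrix(adj):
    """Modularity matrix B = A - d d^T / (2m) of an undirected graph."""
    d = adj.sum(axis=1)
    two_m = d.sum()                     # twice the total edge weight
    return adj - np.outer(d, d) / two_m

def modularity_tensor(adjs):
    """Stack the K per-view modularity matrices into an N x N x K tensor."""
    return np.stack([modularity_matrix(a) for a in adjs], axis=2)

# Usage: two small views over the same 4 nodes.
a1 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 0],
               [1, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)
a2 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)
T = modularity_tensor([a1, a2])
print(T.shape)
```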
◮ Scalability issue: large-scale databases and efficient
◮ Multiple-mode tensor (currently 3-mode): dynamic data
◮ Other potential tensor methods (CP