Hybrid Clustering of multi-view data via MLSVD


slide-1
SLIDE 1

Hybrid Clustering of multi-view data via MLSVD

Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor

ESAT-SCD Katholieke Universiteit Leuven

TDA, September 14, 2010, Bari, Italy

slide-2
SLIDE 2

Outline

Introduction
Hybrid clustering of multi-view data
Experiments
Discussion and Outlook
Acknowledgement

slide-4
SLIDE 4

Introduction

Motivation

◮ Booming demand: grouping multi-view data to obtain a better partition (Web mining, social networks, literature analysis).

◮ Clustering methods
  ◮ Most methods: single-view data
  ◮ Hybrid clustering: multi-view data

◮ Tensor methods
  ◮ A powerful tool for handling multi-way data sources.
  ◮ Multilinear singular value decomposition (MLSVD) (Tucker, 1964 & 1966; De Lathauwer et al, 2000a)

slide-5
SLIDE 5

[Figure panels: original data; spectral projection by each 2D single view (spectral clustering); spectral projection by MLSVD (hybrid clustering)]

Figure: Demo of a hybrid clustering by MLSVD on synthetic 3D data sets

slide-6
SLIDE 6

Introduction

Related work

◮ Hybrid clustering: multiple kernel fusion (MKF) (Joachims et al, 2001) and clustering ensembles (Strehl & Ghosh, 2002)

◮ MLSVD-based clustering for image recognition (Huang & Ding, 2008)

◮ Multi-way latent semantic analysis (Sun et al, 2006)

◮ CANDECOMP/PARAFAC (CP): scientific publication data with multiple linkage (Dunlavy, Kolda, et al, 2006; Selee, Kolda et al, 2007)

slide-7
SLIDE 7

Introduction

Main contributions

◮ An extendable framework of hybrid clustering based on MLSVD
  ◮ Modelling the multi-view data as a tensor
  ◮ Seeking a joint optimal subspace by tensor analysis

◮ Two novel clustering algorithms: AHC-MLSVD and WHC-HOOI.

◮ Experiments on both synthetic data and a real application to the Web of Science (WoS) journal database.

slide-9
SLIDE 9

Hybrid clustering

Spectral clustering

Given $S \in \mathbb{R}^{N \times N}$, the affinity matrix (similarity matrix) of a graph $G$, and $D$, the degree matrix, our Laplacian matrix is

$$L = D^{-1/2} S D^{-1/2} \tag{1}$$

Let $U \in \mathbb{R}^{N \times M}$ be a relaxed indicator matrix, where $M$ is the number of clusters:

$$\max_{U} \operatorname{tr}(U^T L U), \quad \text{s.t. } U^T U = I. \tag{2}$$

The eigenvalue decomposition of the matrix $L$ gives the solution of spectral clustering (Luxburg, 2007).
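A minimal NumPy sketch of equations (1) and (2), for illustration only. The function names `normalized_laplacian` and `spectral_embedding` are hypothetical, and the final grouping step (typically k-means on the rows of U) is an assumption that is left out here.

```python
import numpy as np

def normalized_laplacian(S):
    """Return L = D^{-1/2} S D^{-1/2}, with D the diagonal degree matrix of S."""
    degrees = S.sum(axis=1)
    # Guard isolated nodes (degree 0) against division by zero.
    safe = np.where(degrees > 0, degrees, 1.0)
    inv_sqrt = np.where(degrees > 0, 1.0 / np.sqrt(safe), 0.0)
    return inv_sqrt[:, None] * S * inv_sqrt[None, :]

def spectral_embedding(L, M):
    """Return U whose columns are eigenvectors of L for its M largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, -M:][:, ::-1]       # keep the top M, largest first

# Toy similarity matrix with two fully connected groups (3 and 2 objects).
S = np.zeros((5, 5))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
U = spectral_embedding(normalized_laplacian(S), M=2)
```

Objects in the same group get identical rows in U, which is what makes the rows easy to cluster afterwards.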

slide-10
SLIDE 10

Hybrid clustering

Concept overview

[Figure: concept overview. Matrix decomposition (spectral clustering based on SVD): a single data source gives a Laplacian matrix L, whose optimal subspace U is used for clustering. Tensor decomposition (hybrid clustering based on MLSVD): data sources 1, 2, ..., n give Laplacian matrices 1, 2, ..., n, which form a Laplacian tensor; its decomposition gives an optimal subspace U and a weighting vector, which are used for clustering.]

slide-11
SLIDE 11

Hybrid clustering

Laplacian tensor

From a set of $K$ Laplacian matrices $L^{(i)} \in \mathbb{R}^{N \times N}$, $i = 1, \ldots, K$, to a Laplacian tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times K}$.

[Figure: the K Laplacian matrices (N objects × N objects), one for each of View 1, View 2, ..., View K, are stacked along a third mode of size K to form the Laplacian tensor]

Figure: The formulation of a Laplacian tensor
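As a rough illustration of this construction, the K Laplacian matrices can be stacked along a third axis. The helper name `laplacian_tensor` is hypothetical, and using the last array axis for the views is an assumption about the layout.

```python
import numpy as np

def laplacian_tensor(laplacians):
    """Stack K Laplacian matrices of shape (N, N) into a tensor of shape (N, N, K)."""
    laplacians = [np.asarray(L, dtype=float) for L in laplacians]
    shape = laplacians[0].shape
    if len(shape) != 2 or shape[0] != shape[1]:
        raise ValueError("each Laplacian must be a square matrix")
    if any(L.shape != shape for L in laplacians):
        raise ValueError("all views must share the same N objects")
    return np.stack(laplacians, axis=2)

# Two toy views over the same 3 objects.
view_1 = np.eye(3)
view_2 = np.full((3, 3), 1.0 / 3.0)
A = laplacian_tensor([view_1, view_2])  # A[:, :, k] is the Laplacian of view k
```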

slide-12
SLIDE 12

Hybrid clustering

AHC-MLSVD

Averaging multi-view data for joint analysis:

[Figure: tensor decomposition of the Laplacian tensor (N objects × N objects × K views) into a core tensor (M × M × K), the joint optimal subspace U (N objects × M subjects) and its transpose U^T, and a K × K identity matrix I]

Figure: Average hybrid clustering of multi-view data

$U \in \mathbb{R}^{N \times M}$: the joint optimal subspace; $I \in \mathbb{R}^{K \times K}$: an identity matrix.

slide-13
SLIDE 13

Hybrid clustering

AHC-MLSVD

The optimization of average hybrid clustering:

$$\max_{U} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 I \right\|_F^2, \quad \text{s.t. } U^T U = I. \tag{3}$$

The solution is given by MLSVD (Tucker, 1964 & 1966; De Lathauwer et al, 2000a):

◮ An approximate solution
◮ Usually satisfactory results
◮ An upper bound on the approximation error
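One way to read equation (3) in code is the truncated MLSVD factor for the object modes: the M dominant left singular vectors of the mode-1 unfolding of the tensor. The sketch below assumes an (N, N, K) NumPy array with symmetric slices; the function names are hypothetical, and this is not claimed to be the authors' implementation.

```python
import numpy as np

def ahc_mlsvd(A, M):
    """Return U (N x M): the M dominant left singular vectors of the mode-1 unfolding."""
    N, _, K = A.shape
    # Mode-1 unfolding: row i collects every entry A[i, :, :].
    A1 = A.reshape(N, N * K)
    left, _, _ = np.linalg.svd(A1, full_matrices=False)
    return left[:, :M]

def ahc_objective(A, U):
    """Evaluate || A x_1 U^T x_2 U^T x_3 I ||_F^2 (the identity leaves mode 3 unchanged)."""
    core = np.einsum('ia,jb,ijk->abk', U, U, A)
    return float(np.sum(core ** 2))

# Two identical noise-free views with two groups (3 and 2 objects).
L = np.zeros((5, 5))
L[:3, :3] = 1.0 / 3.0
L[3:, 3:] = 1.0 / 2.0
A = np.stack([L, L], axis=2)
U = ahc_mlsvd(A, M=2)
```

In this noise-free toy case the subspace captures the whole tensor, so the objective equals the squared Frobenius norm of A; with noisy views it is only an approximation.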

slide-14
SLIDE 14

Hybrid clustering

WHC-HOOI

Taking the effect of each single-view data into account:

[Figure: tensor decomposition of the Laplacian tensor (N objects × N objects × K views) into a core tensor (M × M × 1), the joint optimal subspace U (N objects × M subjects) and its transpose U^T, and a weighting vector W (K × 1)]

Figure: Weighted hybrid clustering of multi-view data

$W = \{\alpha_1, \alpha_2, \cdots, \alpha_K\}^T$: the weighting factor of each view.

slide-15
SLIDE 15

Hybrid clustering

WHC-HOOI

The equivalent optimization of weighted hybrid clustering:

$$\max_{U, W} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 W^T \right\|_F^2, \quad \text{s.t. } U^T U = I \text{ and } W^T W = 1. \tag{4}$$

The solution is given by higher-order orthogonal iteration (HOOI) (Kroonenberg & De Leeuw, 1980; De Lathauwer et al, 2000b):

◮ An optimal solution
◮ An appropriate weight for each view
◮ Other tensor methods
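The alternating updates behind equation (4) can be sketched as follows. This is a simplified variant under stated assumptions: modes 1 and 2 are tied to the same U, the start point is a truncated MLSVD, and the sign of W is fixed so that its entries sum to a non-negative value. The function name `whc_hooi` and the stopping rule are hypothetical.

```python
import numpy as np

def whc_hooi(A, M, n_iter=100, tol=1e-12):
    """Alternate between the subspace U (N x M) and the unit-norm view weights W (K,)."""
    N, _, K = A.shape
    # Initialise from the truncated MLSVD of the mode-1 and mode-3 unfoldings.
    U = np.linalg.svd(A.reshape(N, N * K), full_matrices=False)[0][:, :M]
    W = np.linalg.svd(A.reshape(N * N, K).T, full_matrices=False)[0][:, 0]
    previous = -np.inf
    for _ in range(n_iter):
        # Update U: project modes 2 and 3, keep the dominant left singular vectors.
        B = np.einsum('ijk,jb,k->ib', A, U, W)
        U = np.linalg.svd(B, full_matrices=False)[0][:, :M]
        # Update W: project modes 1 and 2, keep the dominant mode-3 singular vector.
        C = np.einsum('ia,jb,ijk->abk', U, U, A).reshape(M * M, K)
        W = np.linalg.svd(C.T, full_matrices=False)[0][:, 0]
        objective = float(np.sum((C @ W) ** 2))
        if abs(objective - previous) <= tol * max(objective, 1.0):
            break
        previous = objective
    if W.sum() < 0:
        W = -W  # the objective does not depend on the sign of W
    return U, W

# One informative view and one weak noisy view over 5 objects.
rng = np.random.default_rng(0)
clean = np.zeros((5, 5))
clean[:3, :3] = 1.0 / 3.0
clean[3:, 3:] = 1.0 / 2.0
noise = rng.standard_normal((5, 5))
A = np.stack([clean, 0.01 * (noise + noise.T)], axis=2)
U, W = whc_hooi(A, M=2)
```

In this toy setting the informative view receives the larger weight, which matches the intent of weighting each view by its contribution.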

slide-17
SLIDE 17

Experiments

Clustering of a multiplex network

Multiplex network: a group of networks that share the same nodes but have multiple types of links (Mucha et al, 2010).

The synthetic multiplex network:

◮ Three clusters with 50, 100, and 200 members, respectively

◮ Three views generated with different noise

◮ Three interaction matrices, one from each view ⇒ a tensor

Figure: The adjacency matrices from a synthetic multiplex network
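A synthetic multiplex network of this shape could be generated along the following lines. The slides do not state the generative model, so everything beyond the cluster sizes and the number of views is an assumption: a stochastic block model is used, and the link probabilities `p_in` and `noise_levels` are hypothetical values.

```python
import numpy as np

def synthetic_multiplex(sizes=(50, 100, 200), noise_levels=(0.1, 0.2, 0.3),
                        p_in=0.6, seed=0):
    """Return (tensor, labels): an N x N x K stack of symmetric 0/1 adjacency matrices."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same_cluster = labels[:, None] == labels[None, :]
    views = []
    for noise in noise_levels:
        # Link probability: p_in inside a cluster, `noise` between clusters.
        prob = np.where(same_cluster, p_in, noise)
        upper = np.triu(rng.random(prob.shape) < prob, k=1)
        views.append((upper | upper.T).astype(float))  # symmetric, no self-loops
    return np.stack(views, axis=2), labels

A, labels = synthetic_multiplex()
```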

slide-18
SLIDE 18

Clustering of a multiplex network

[Figure: a multiplex network A with views A1, A2, and A3; spectral projection by each single view (spectral clustering) compared with spectral projection by MLSVD (hybrid clustering)]

slide-19
SLIDE 19

Experiments

Application on Web of Science (WoS) journal database

◮ Objective: obtain a good scientific mapping of the WoS journals

◮ Integrating two views of the data: textual information and journal cross-citations, with $N = 8{,}305$ and $d_{\text{text}} = 669{,}700$

◮ Cosine similarity matrices for both text and cross-citations
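For reference, a cosine similarity matrix can be computed from a feature matrix with one row per journal. This is a dense toy sketch with a hypothetical function name; at the scale quoted above a sparse representation would presumably be needed.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Return S with S[i, j] = <x_i, x_j> / (||x_i|| ||x_j||) for the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Rows of all zeros are left as zeros instead of dividing by zero.
    unit_rows = X / np.where(norms == 0, 1.0, norms)
    return unit_rows @ unit_rows.T

# Three toy objects described by two features.
X = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 3.0]])
S = cosine_similarity_matrix(X)
```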

slide-20
SLIDE 20

Experiments

Clustering evaluation measures

◮ Standard categories: Essential Science Indicators (ESI) from WoS

◮ Normalized mutual information (NMI)

$$\mathrm{NMI} = \frac{2 \times H(\{c_i\}, \{l_i\})}{H(\{c_i\}) + H(\{l_i\})} \tag{5}$$

where $H(\{c_i\}, \{l_i\})$ is the mutual information between the clustering labels $\{c_i\}_{i=1}^{n}$ and the reference category indicators $\{l_i\}_{i=1}^{n}$, and $H(\{c_i\})$ and $H(\{l_i\})$ are their entropies.

◮ Cognitive analysis by a bibliometrist
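The NMI measure can be sketched in a few lines. The version below assumes the common normalization in which twice the mutual information is divided by the sum of the two entropies; the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector, skipping zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def nmi(clusters, categories):
    """NMI = 2 * I(c; l) / (H(c) + H(l)) for two label vectors of equal length."""
    _, c = np.unique(np.asarray(clusters), return_inverse=True)
    _, l = np.unique(np.asarray(categories), return_inverse=True)
    joint = np.zeros((c.max() + 1, l.max() + 1))
    np.add.at(joint, (c, l), 1.0)  # contingency table of label co-occurrences
    joint /= joint.sum()
    h_c = entropy(joint.sum(axis=1))
    h_l = entropy(joint.sum(axis=0))
    mutual_information = h_c + h_l - entropy(joint.ravel())
    total = h_c + h_l
    return 2.0 * mutual_information / total if total > 0 else 1.0

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # label names do not matter
```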

slide-21
SLIDE 21

Experiments

Clustering performance

[Bar chart: NMI index (axis from 0.2 to 0.55) for different clustering methods. Single-view: Citation, Text. Multi-view: MKF, PMM, MVSKM, SCMG, SA, AdacVote, CP-ALS, AHC-MLSVD, WHC-HOOI]

Figure: NMI validation of various clustering methods on the WoS journal database (cluster number: 22)

slide-22
SLIDE 22

Experiments

Visualization of the journal clusters obtained by HC-MLSVD

1. firm, price, market
2. steel, microstructur, corros
3. cultivar, plant, milk
4. ocean, seismic, rock
5. surgeri, clinic, arteri
6. semant, phonolog, cortex
7. algorithm, fuzzi, wireless
8. protein, cell, gene
9. catalyst, polym, acid
10. music, literari, essai
11. speci, habitat, forest
12. soil, water, sludg
13. crack, turbul, heat
14. galaxi, star, stellar
15. quantum, quark, neutrino
16. polit, social, court
17. dope, crystal, optic
18. tumor, cancer, carcinoma
19. student, teacher, school
20. algebra, theorem, asymptot
21. nurs, schizophrenia, health
22. dog, hors, infect

Figure: Visualization of 22 clusters on the WoS journal database (nodes: journal clusters, with circle size proportional to cluster size; edges: cross-citations between two journal clusters; annotated terms: the top three text terms within each journal cluster)

slide-24
SLIDE 24

Discussion and outlook

Discussion

◮ Extendable hybrid clustering framework:
  ◮ Other learning tasks on multi-view data (classification, spectral embedding, collaborative filtering)
  ◮ Other tensor-based solutions
  ◮ Other matrices (similarity matrices, modularity matrices)

[Figure: K modularity matrices (N nodes × N nodes), one for each of View 1, View 2, ..., View K, stacked into a modularity tensor]
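As an illustration of swapping in another matrix type, the sketch below builds a modularity tensor from per-view adjacency matrices. It assumes the standard Newman modularity matrix B = A - d d^T / (2m) for an undirected graph with at least one edge, which the slides do not spell out; the function names are hypothetical.

```python
import numpy as np

def modularity_matrix(adj):
    """Newman modularity matrix B = A - d d^T / (2m) of an undirected graph with edges."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges; assumed non-zero
    return adj - np.outer(degrees, degrees) / two_m

def modularity_tensor(adjacency_views):
    """Stack the modularity matrices of K views into an N x N x K tensor."""
    return np.stack([modularity_matrix(a) for a in adjacency_views], axis=2)

# Two toy views over the same 4 nodes.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
pairs = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])
T = modularity_tensor([path, pairs])
```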

slide-25
SLIDE 25

Discussion and outlook

Outlook

◮ Scalability: large-scale databases and efficient implementation

◮ Multi-mode tensors (currently 3-mode): dynamic data analysis

◮ Other potential tensor methods (CP, INDSCAL, DEDICOM)

slide-27
SLIDE 27

Acknowledgement

Research supported by (1) the KUL ESAT SISTA research group and (2) the China Scholarship Council (CSC, No. 2006153005). Thanks to Dr. Carlos Alzate of K.U.Leuven for discussions.

slide-28
SLIDE 28

Thank you for your attention!