Co-clustering for large datasets
Mohamed Nadif
LIPADE, Université Paris Descartes, France
Joint work with G. Govaert and L. Lazhar
Nadif (LIPADE) AAFD’14, April 29-30, 2014 Co-clustering 1 / 35
Introduction
Introduction Co-clustering methods
[Figure: two panels. Left: data3 (original data matrix). Right: reordered data, co-clustering result.]
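The reordered view in the figure can be sketched as follows: sorting rows and columns by their cluster labels places the members of each block next to each other. This is a minimal illustration, not the code used for the talk; the function name `reorder_by_clusters`, the toy matrix, and the labels are all assumptions made up for the example.

```python
def reorder_by_clusters(matrix, row_labels, col_labels):
    """Return the matrix with rows and columns grouped by cluster label."""
    # sorted() is stable, so the original order is kept inside each cluster.
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    reordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return reordered, row_order, col_order


# Toy 4 x 4 binary matrix (hypothetical, not data from the slides).
toy = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
row_labels = [0, 1, 0, 1]  # assumed row partition
col_labels = [0, 1, 0, 1]  # assumed column partition

reordered, row_order, col_order = reorder_by_clusters(toy, row_labels, col_labels)
for row in reordered:
    print(row)
```

After reordering, the ones of the toy matrix form two diagonal blocks, which is the visual effect the reordered panel conveys.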
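Label z_i and its indicator coding (z_i1, z_i2, z_i3); blank cells of the slide are written as 0:

z_i   z_i1  z_i2  z_i3
3     0     0     1
2     0     1     0
3     0     0     1
2     0     1     0
1     1     0     0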
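The two codings of a partition shown above (a label z_i per object, or a 0/1 indicator row z_i1, ..., z_ig) can be converted into each other. The sketch below assumes labels numbered from 1 to g; the function names and the label vector are illustrative choices, not taken from the talk.

```python
def to_indicator(labels, n_clusters):
    """Turn labels z_i in {1, ..., g} into rows (z_i1, ..., z_ig) of 0/1 values."""
    return [[1 if z == k else 0 for k in range(1, n_clusters + 1)] for z in labels]


def to_labels(indicator):
    """Inverse coding: position of the single 1 in each row, counted from 1."""
    return [row.index(1) + 1 for row in indicator]


labels = [3, 2, 3, 2, 1]  # illustrative label vector with g = 3 clusters
indicator = to_indicator(labels, 3)
for z, row in zip(labels, indicator):
    print(z, row)
```

Each indicator row sums to 1, so the column sums of the indicator matrix give the cluster sizes.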
[Diagram; surviving labels: mean, T1]
Introduction Binary data
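[Table: 10 x 10 binary data matrix, rows a to j, columns 1 to 10. Only the 1s survive; their column positions were lost in extraction.]
[Table: the same matrix with rows and columns reordered. Row clusters: A = {a, d, h}, B = {b, e, f, j}, C = {c, g, i}. Column clusters: 1 = {1, 3, 5, 8, 10}, 2 = {2, 4, 6, 7, 9}.]
[Table: 3 x 2 binary summary, row clusters A, B, C by column clusters 1, 2. A single 1 survives in row A and in row B, none in row C; column positions were lost.]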
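A reduced table like the 3 x 2 summary can be obtained by replacing each block with a single binary value. The sketch below uses the majority value of each block; that the slide uses exactly this rule is an assumption, and the toy matrix, labels, and the name `block_summary` are hypothetical (the cell values of the slide's table are not recoverable).

```python
def block_summary(matrix, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Summarize each (row cluster, column cluster) block by its majority value."""
    ones = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    sizes = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            k, l = row_labels[i], col_labels[j]
            ones[k][l] += x
            sizes[k][l] += 1
    # A block is summarized by 1 only if strictly more than half its cells are 1.
    return [
        [1 if 2 * ones[k][l] > sizes[k][l] else 0 for l in range(n_col_clusters)]
        for k in range(n_row_clusters)
    ]


# Toy 5 x 4 binary matrix (made up for illustration).
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
row_labels = [0, 0, 1, 1, 2]  # three row clusters
col_labels = [0, 0, 1, 1]     # two column clusters

summary = block_summary(toy, row_labels, col_labels, 3, 2)
for row in summary:
    print(row)
```

The summary is robust to a few cells that disagree with their block, which is why a noisy matrix can still yield a clean reduced table.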
[Table: the reordered binary matrix again. Row clusters: A = {a, d, h}, B = {b, e, f, j}, C = {c, g, i}. Column clusters: 1 = {1, 3, 5, 8, 10}, 2 = {2, 4, 6, 7, 9}. Positions of the 1s were lost in extraction.]
[Equations lost in extraction; surviving subscripts: $k\ell$, $i\ell$, $kj$]
Introduction Continuous data
[Diagram; surviving labels: mean, T1]
[Equations lost in extraction; surviving subscripts: $i\ell$, $kj$]
[Figure: three pairs of panels, each showing a data matrix next to its reordered version ("Reordered data: co-clustering result"). Panel titles, in order: "Balanced data: data2", "data3", "Unbalanced data: data1".]
Latent block model and CML approach
Latent block model and CML approach Bernoulli Latent block models
Latent block model and CML approach Gaussian latent block models
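[Equation fragments; the leading terms and the surrounding sums were lost in extraction]
$\dots - 2\mu_{k\ell}\, x^{w}_{i\ell} + \mu_{k\ell}^{2}$ (terms indexed by $i\ell$ and $k\ell$)
$\dots - 2\mu_{k\ell}\, x^{z}_{kj} + \mu_{k\ell}^{2}$ (terms indexed by $kj$ and $k\ell$)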
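The surviving pattern "$\dots - 2\mu x + \mu^2$" is what appears when a sum of squared deviations $(x - \mu)^2$ is expanded and rewritten with the mean and the mean of squares of the values in a block. The numeric check below illustrates that identity only; reading the lost leading term as a mean of squares is an assumption, and the names and values are hypothetical.

```python
def direct_cost(values, mu):
    """Sum of squared deviations from mu, computed cell by cell."""
    return sum((x - mu) ** 2 for x in values)


def reduced_cost(values, mu):
    """Same quantity, computed from two summaries of the values only."""
    n = len(values)
    mean = sum(values) / n                    # plays the role of a reduced value x
    mean_sq = sum(x * x for x in values) / n  # assumed form of the lost leading term
    return n * (mean_sq - 2 * mu * mean + mu ** 2)


# Hypothetical cells of one row restricted to one column cluster.
row_block = [1.0, 2.0, 4.0]
mu = 2.0  # hypothetical block mean parameter

direct = direct_cost(row_block, mu)
reduced = reduced_cost(row_block, mu)
print(direct, reduced)
```

The practical point is that the reduced form touches only per-block summaries, not every cell, which matters for large data matrices.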
Latent block model and CML approach Asymmetric Gaussian model
[Equation fragment; the rest was lost in extraction] $\dfrac{1}{2\sigma_{k\ell}^{2}}$
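[Equation fragments; the leading terms and the surrounding sums were lost in extraction]
$\dots - 2\mu_{k\ell}\, x^{w}_{i\ell} + \mu_{k\ell}^{2}$ (terms indexed by $i\ell$ and $k\ell$)
$\dots - 2\mu_{k\ell}\, x^{z}_{kj} + \mu_{k\ell}^{2}$ (terms indexed by $kj$ and $k\ell$)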
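Error      Model  LBVEM  LBCEM  CEM  EM  EM-w  CEM-w
δ(z, z′)   M1     [four values of 1 survive; column positions lost]
δ(z, z′)   M2     11     12     21   19  15    15
δ(z, z′)   M3     29     41     41   39  44    42
δ(w, w′)   M1     [a single "-" survives; column position lost]
δ(w, w′)   M2     5      5      30   -   30    30
δ(w, w′)   M3     20     35     48   -   47    48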
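The error columns compare an estimated partition with a reference one. A minimal sketch, assuming δ(z, z′) denotes the proportion of misclassified objects after the best one-to-one matching of cluster labels (the slide does not define it): the brute-force search over label permutations below is fine for a handful of clusters. The name `partition_error` and the label vectors are hypothetical.

```python
from itertools import permutations


def partition_error(true_labels, found_labels, n_clusters):
    """Smallest misclassification rate over all relabelings of found_labels."""
    n = len(true_labels)
    best = n
    for perm in permutations(range(n_clusters)):
        # perm[f] is the reference label assigned to the found cluster f.
        mismatches = sum(1 for t, f in zip(true_labels, found_labels) if t != perm[f])
        best = min(best, mismatches)
    return best / n


true_z = [0, 0, 1, 1, 2, 2]   # hypothetical reference partition
found_z = [1, 1, 0, 0, 2, 0]  # hypothetical estimate with swapped labels

error = partition_error(true_z, found_z, 3)
print(error)
```

Cluster labels are arbitrary, so the matching step is needed: without it, a perfect partition with permuted labels would be scored as an error.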
Factorization
Factorization Nonnegative Matrix Factorization
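[Equation fragment: $u_{ik}^{2}$ under an index $i$; the rest of the slide, a list of three numbered items, was lost in extraction]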
Factorization Nonnegative Matrix Tri-Factorization
Dataset     Measure  DNMF   ODNMF  ONM3F  ONMTF  NBVD
Classic30   Acc      96.67  100    100    100    96.67
Classic30   NMI      89.97  100    100    100    89.97
Classic150  Acc      98.66  98.66  99.33  98.66  98.66
Classic150  NMI      94.04  94.04  97.02  94.04  94.04
NG2         Acc      77.6   86.2   74.6   74.2   77.4
NG2         NMI      19.03  43.47  18.27  16.03  23.31
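For readers unfamiliar with the NMI rows, here is a minimal sketch of normalized mutual information between two partitions. The slides do not state which normalization was used; dividing by the geometric mean of the two entropies is an assumption here, and the function name `nmi` and the label vectors are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its cluster sizes."""
    return -sum(c / n * log(c / n) for c in counts.values())


def nmi(labels_a, labels_b):
    """Mutual information of two partitions divided by sqrt(H(a) * H(b))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    h_a, h_b = entropy(count_a, n), entropy(count_b, n)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual / sqrt(h_a * h_b)


same_up_to_names = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # identical partitions, renamed
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])       # partitions that share no information
print(same_up_to_names, independent)
```

Unlike accuracy, NMI needs no matching of cluster labels, which is one reason the two measures can rank methods differently.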
Conclusion