Deep Generative Models for Clustering: A Semi-supervised and Unsupervised Approach


slide-1
SLIDE 1

Deep Generative Models for Clustering: A Semi-supervised and Unsupervised Approach

Jhosimar George Arias Figueroa
Advisor: Gerberth Adín Ramírez Rivera
Master's Thesis Defense

February 19, 2018

slide-2
SLIDE 2

Motivation

slide-3
SLIDE 3

Huge amount of unlabeled data!

Image Credit: Ruslan Salakhutdinov

Labeling huge amounts of data is hard and expensive.

1

slide-4
SLIDE 4

Huge amount of unlabeled data!

Image Credit: Ruslan Salakhutdinov

Labeling huge amounts of data is hard and expensive. Can we discover the underlying hidden structure of the data in a semi-supervised or unsupervised way?

1

slide-5
SLIDE 5

What is Clustering?

Goal: Find distinct groups such that similar elements belong to the same group.

2

slide-6
SLIDE 6

What if we have a small amount of labeled data?

Large amount of Unlabeled data (ImageNet) Small amount of Labeled data (CIFAR-10)

Can we learn good representations and cluster data in a semi-supervised way?

3

slide-7
SLIDE 7

Two types of Semi-supervised Clustering:

Class labels (seeded points); pairwise constraints (must-link or cannot-link)

Our work focuses on the first type of semi-supervised clustering: Use of labels as seeds

4

slide-8
SLIDE 8

Semi-supervised Clustering: Related Works

slide-9
SLIDE 9

Intuition

5

slide-10
SLIDE 10

Learning Representations: Neural Networks

How to learn feature representations? Train such that features can be used to perform classification (supervised learning).

6

slide-11
SLIDE 11

Auxiliary Task: Semi-supervised Embedding

  • Deep Learning via Semi-Supervised Embedding, Weston et al, ICML’08

7

slide-12
SLIDE 12

Learning Representations: Autoencoder

How to learn feature representations? Train such that the features can be used to reconstruct the original data. Autoencoding: encoding the data itself.

8

slide-13
SLIDE 13

Auxiliary Task: Clean Encoder and Denoising Decoder

  • Semi-Supervised Learning with Ladder Networks, Rasmus et al. NIPS’15
  • Deep Embedded Regularized Clustering (DEPICT), Dizaji et al. ICCV’17

9

slide-14
SLIDE 14

Variational Autoencoder

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 10

slide-15
SLIDE 15

Variational Autoencoder

The posterior distribution $p_\theta(z|x)$ is intractable because of $p_\theta(x)$:

$$p_\theta(z|x) = \frac{p_\theta(x|z)\,p_\theta(z)}{p_\theta(x)}, \qquad p_\theta(x) = \int_z p_\theta(x, z)\,dz$$

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 10

slide-16
SLIDE 16

Variational Autoencoder

Approximate pθ(z|x) with a tractable distribution qφ(z|x)

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 10

slide-17
SLIDE 17

Variational Autoencoder - Lower Bound

Variational Lower Bound:
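The bound itself is not reproduced in the extracted slide text; a standard statement of it, consistent with the derivation in the appendix (Slide 104), is:

$$\log p_\theta(x) \ge \mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}\left(q_\phi(z|x)\,\|\,p_\theta(z)\right)$$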

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 11

slide-18
SLIDE 18

Generative Models for Semi-supervised learning

Inference Model Generative Model Probabilistic graphical models of M1+M2, Kingma et al, NIPS14 Inference Model Generative Model Auxiliary Deep Generative Models, Maaloe et al, ICML’16 12

slide-19
SLIDE 19

Stochastic Continuous Nodes: Reparameterization Trick

We can 'externalize' the randomness in z by re-parameterizing it as deterministic: z = μ + σ ⊙ ε.

13
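As an illustration (not code from the thesis), a minimal PyTorch-style sketch of the reparameterization trick; the tensor shapes are placeholders:

```python
import torch

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so gradients flow through mu and sigma.
    """
    std = torch.exp(0.5 * log_var)   # sigma from log-variance
    eps = torch.randn_like(std)      # external noise, no gradient needed
    return mu + std * eps

# usage: mu, log_var are encoder outputs of shape (batch, latent_dim)
mu = torch.zeros(4, 2)
log_var = torch.zeros(4, 2)
z = reparameterize(mu, log_var)      # differentiable w.r.t. mu and log_var
```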

slide-20
SLIDE 20

Stochastic Discrete Nodes: Gumbel-Max Trick

We can sample from a categorical distribution with the Gumbel-Max trick. 14
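Since the slide figure is not reproduced in the text, here is the standard statement of the trick (general fact, not taken from the slide image): perturb the log-probabilities with i.i.d. Gumbel noise and take an arg max,

$$c = \arg\max_k \left(\log \pi_k + g_k\right), \qquad g_k = -\log(-\log(u_k)), \quad u_k \sim \mathrm{Uniform}(0, 1).$$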

slide-21
SLIDE 21

Stochastic Discrete Nodes: Gumbel-Softmax distribution

We can approximate arg max with a softmax. 15
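A minimal sketch of Gumbel-Softmax sampling in PyTorch; it is illustrative only, and the temperature handling and shapes are assumptions rather than the thesis code:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, eps=1e-20):
    """Differentiable approximation of a one-hot categorical sample.

    logits: unnormalized log-probabilities, shape (batch, K).
    tau:    temperature; small tau approaches one-hot, large tau approaches uniform.
    """
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + eps) + eps)   # Gumbel(0, 1) noise
    return F.softmax((logits + gumbel) / tau, dim=-1)

# usage
logits = torch.log(torch.tensor([[0.7, 0.2, 0.1]]))
y = gumbel_softmax_sample(logits, tau=0.5)           # soft, nearly one-hot sample
```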

slide-22
SLIDE 22

Avoiding marginalization over discrete variables

Marginalizing out c over all categories is an expensive process which requires multiple gradient estimations. Gumbel-Softmax allows us to backpropagate through a single-sample gradient estimate.

16

slide-23
SLIDE 23

Semi-supervised Clustering: Proposed Method

slide-24
SLIDE 24

Intuition

17

slide-25
SLIDE 25

Proposed Probabilistic Model

Inference Model Generative Model

Proposed probabilistic model

18

slide-26
SLIDE 26

Proposed Probabilistic Model

Inference Model Generative Model

Variational Lower Bound:

19

slide-27
SLIDE 27

Proposed Loss Function

Variational Lower Bound: Use of Auxiliary Tasks:

20

slide-28
SLIDE 28

Proposed Architecture

Inference Model Generative Model

Proposed Architecture:

21

slide-29
SLIDE 29

Reconstruction Loss

$$\mathcal{L}_R = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|x_i|} \left(x_i - \tilde{x}_i\right)^2$$
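A sketch of this dimension-normalized MSE in PyTorch, assuming x and x_tilde are flattened per-sample tensors (not the thesis's exact code):

```python
import torch

def reconstruction_loss(x, x_tilde):
    """Squared error averaged over pixels (1/|x_i|) and over the batch (1/N)."""
    per_sample = ((x - x_tilde) ** 2).mean(dim=1)   # 1/|x_i| * sum of squared errors
    return per_sample.mean()                         # 1/N over the batch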

22

slide-30
SLIDE 30

Clustering Loss

$$\mathcal{L}_C = \mathrm{KL}\left(q_{ik}\,\|\,U(0,1)\right) = \frac{1}{N \log K} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{ik} \log\left(K\,q_{ik}\right)$$
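A sketch matching the formula as reconstructed above; the 1/log K normalization follows the slide fragments and should be treated as an assumption:

```python
import math
import torch

def clustering_loss(q, eps=1e-10):
    """KL(q || Uniform) for cluster responsibilities q of shape (N, K)."""
    n, k = q.shape
    kl = (q * torch.log(k * q + eps)).sum()
    return kl / (n * math.log(k))
```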

23

slide-31
SLIDE 31

Assignment Process

24

slide-32
SLIDE 32

Assignment Process

24

slide-33
SLIDE 33

Assignment Process

24

slide-34
SLIDE 34

Assignment Loss

25

slide-35
SLIDE 35

Assignment Loss

25

slide-36
SLIDE 36

Assignment Loss

$$\mathrm{NLL} = -\log p(c|x)$$

$$\mathcal{L}_A = \sum_{i=1}^{N} \left[ -\log p(c_i|x_i) + \sum_{d \in C} \log p(d|x_i) \right]$$

25
slide-37
SLIDE 37

Assignment Loss

$$\mathcal{L}_A = \sum_{i=1}^{N} \left[ -\log p(c_i|x_i) + \sum_{d \in C} \log p(d|x_i) \right]$$

Normalized Loss:

$$\mathcal{L}_A = \frac{1}{2N} \sum_{i=1}^{N} \left[ \tanh\left(-\log p(c_i|x_i)\right) + \sum_{d \in C} \left(\tanh\left(\log p(d|x_i)\right) + 1\right) \right]$$

25
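A sketch of the normalized assignment loss as reconstructed above, applied to the seeded (labeled) examples; the exact grouping of the +1 term and the range of the inner sum are assumptions recovered from the slide fragments:

```python
import torch

def assignment_loss(log_p, seed_labels):
    """Normalized assignment loss for the labeled (seed) samples.

    log_p:       log p(c|x), shape (N, K), for the N labeled samples.
    seed_labels: ground-truth cluster indices, shape (N,).
    """
    n, k = log_p.shape
    correct = -log_p[torch.arange(n), seed_labels]   # -log p(c_i | x_i)
    others = torch.tanh(log_p) + 1.0                 # tanh(log p(d | x_i)) + 1
    loss = torch.tanh(correct) + others.sum(dim=1)
    return loss.sum() / (2 * n)
```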
slide-38
SLIDE 38

Feature Loss

26

slide-39
SLIDE 39

Feature Loss

26

slide-40
SLIDE 40

Feature Loss

$$D(f, r) = \frac{1}{\sqrt{2}\,|r|} \sum_{l=1}^{|r|} \left\| f - r_l \right\|$$

27

slide-41
SLIDE 41

Feature Loss

$$D(f, r) = \frac{1}{\sqrt{2}\,|r|} \sum_{l=1}^{|r|} \left\| f - r_l \right\|$$

27

slide-42
SLIDE 42

Feature Loss

$$D(f, r) = \frac{1}{\sqrt{2}\,|r|} \sum_{l=1}^{|r|} \left\| f - r_l \right\|$$

$$\mathcal{L}_F = \frac{\alpha}{L} \sum_{i}^{L} D(f_i, r_i) + \frac{1-\alpha}{U} \sum_{j}^{U} D(f_j, r_j)$$

α weights the importance of labeled distances

27

slide-43
SLIDE 43

Feature Loss

$$D(f, r) = \frac{1}{\sqrt{2}\,|r|} \sum_{l=1}^{|r|} \left\| f - r_l \right\|$$

$$\mathcal{L}_F = \frac{\alpha}{L} \sum_{i}^{L} D(f_i, r_i) + \frac{1-\alpha}{U} \sum_{j}^{U} D(f_j, r_j)$$

α weights the importance of labeled distances

27
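A sketch of the feature loss under the reconstruction above; the norm used in D, the per-sample reference sets r, and the labeled/unlabeled averaging are assumptions recovered from the slide fragments (α = 0.6 is the value from Slide 47):

```python
import math
import torch

def pairwise_distance(f, r):
    """D(f, r): average distance from feature f (dim,) to reference features r (|r|, dim)."""
    return torch.norm(f.unsqueeze(0) - r, dim=1).sum() / (math.sqrt(2.0) * r.shape[0])

def feature_loss(f_lab, r_lab, f_unlab, r_unlab, alpha=0.6):
    """Weighted average of labeled and unlabeled feature distances."""
    lab = torch.stack([pairwise_distance(f, r) for f, r in zip(f_lab, r_lab)]).mean()
    unlab = torch.stack([pairwise_distance(f, r) for f, r in zip(f_unlab, r_unlab)]).mean()
    return alpha * lab + (1.0 - alpha) * unlab
```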

slide-44
SLIDE 44

Experiments and Results

slide-45
SLIDE 45

Datasets

Synthetic data; MNIST (70,000 images, 10 classes, 28×28); SVHN (99,289 images, 10 classes, 32×32)

28

slide-46
SLIDE 46

Analysis of Normalized Loss Functions

[Figure] Training loss (%) vs. iteration number for L_A, L_R, L_C, and L_F: training loss at different epochs.

[Figure] Probabilities (π) vs. iteration number for (a) w_A = 0, (b) w_A = 1, (c) w_A = 5: effect of the assignment loss (L_A) over the categorical loss (L_C) at different epochs.

29

slide-47
SLIDE 47

Hyperparameter Selection

[Figure] ACC (%) vs. iteration number and NMI (%) vs. iteration number for different weight settings (w_A, w_C): performance of weights (w_A, w_C) at different iterations.

Selected hyperparameters of our model:

Hyperparameter   f_sz   η       τ     α     κ   w_A
Value            150    0.001   1.0   0.6   1   10

30

slide-48
SLIDE 48

Synthetic Data - Circles

few labeled samples iteration = 1 iteration = 5 iteration = 8 iteration = 10 iteration = 12

31

slide-49
SLIDE 49

Synthetic Data - Moons

few labeled samples iteration = 1 iteration = 3 iteration = 5 iteration = 10 iteration = 20

32

slide-50
SLIDE 50

Synthetic Data - Moons

few labeled samples iteration = 1 iteration = 3 iteration = 5 iteration = 10 iteration = 20

33

slide-51
SLIDE 51

Clustering: Performance

Clustering Accuracy (ACC) and Normalized Mutual Information (NMI), on MNIST for different unsupervised algorithms.

Model                                     NMI      ACC
GMVAE, Dilokthanakul et al, arXiv'16      –        0.778
VaDE, Jiang et al, IJCAI'17               –        0.945
JULE-SF, Yang et al, CVPR'16              0.876    0.940
JULE-RC, Yang et al, CVPR'16              0.915    0.961
DEPICT, Dizaji et al, arXiv'17            0.916    0.965
Proposed                                  0.954    0.984

Note that our results are not directly comparable with the unsupervised methods; however, we want to show our model's clustering results. Larger values of ACC and NMI indicate better performance.

34

slide-52
SLIDE 52

Classification: MNIST Performance

Semi-supervised test error (%) benchmarks on MNIST for 100 randomly and evenly distributed labeled data.

Model                                     100 labeled examples
SWWAE, Zhao et al, ICLR'16                8.71 (± 0.34)
EmbedCNN, Weston et al, ICML'08           7.75
Small-CNN, Rasmus et al, NIPS'15          6.43 (± 0.84)
M1+M2, Kingma et al, NIPS'14              3.33 (± 0.14)
DEPICT, Dizaji et al, arXiv'17            2.65 (± 0.35)
Conv-CatGAN, Springenberg, ICLR'16        1.39 (± 0.28)
SDGM, Maaløe et al, ICML'16               1.32 (± 0.07)
ADGM, Maaløe et al, ICML'16               0.96 (± 0.02)
Improved GAN, Salimans et al, NIPS'16     0.93 (± 0.65)
Conv-Ladder, Rasmus et al, NIPS'15        0.89 (± 0.50)
Proposed                                  1.65 (± 0.10)

Smaller values for test error indicate better performance. All the results of the related works are reported from the original papers.

35

slide-53
SLIDE 53

Classification: MNIST Performance

Semi-supervised test error (%) benchmarks on MNIST for 100 randomly and evenly distributed labeled data.

Model                                     100 labeled examples
SWWAE, Zhao et al, ICLR'16                8.71 (± 0.34)
EmbedCNN, Weston et al, ICML'08           7.75
Small-CNN, Rasmus et al, NIPS'15          6.43 (± 0.84)
M1+M2, Kingma et al, NIPS'14              3.33 (± 0.14)
DEPICT, Dizaji et al, arXiv'17            2.65 (± 0.35)
Conv-CatGAN, Springenberg, ICLR'16        1.39 (± 0.28)
SDGM, Maaløe et al, ICML'16               1.32 (± 0.07)
ADGM, Maaløe et al, ICML'16               0.96 (± 0.02)
Improved GAN, Salimans et al, NIPS'16     0.93 (± 0.65)
Conv-Ladder, Rasmus et al, NIPS'15        0.89 (± 0.50)
Proposed                                  1.65 (± 0.10)

Smaller values for test error indicate better performance. Colored rows denote Bayesian methods.

36

slide-54
SLIDE 54

Classification: SVHN Performance

Semi-supervised test error (%) benchmarks on SVHN for 1000 randomly and evenly distributed labeled data.

Model                                     1000 labeled examples
M1+TSVM, Kingma et al, NIPS'14            55.33 (± 0.11)
M1+M2, Kingma et al, NIPS'14              36.02 (± 0.10)
SWWAE, Zhao et al, ICLR'16                23.56
ADGM, Maaløe et al, ICML'16               22.86
SDGM, Maaløe et al, ICML'16               16.61 (± 0.24)
Improved GAN, Salimans et al, NIPS'16     8.11 (± 1.3)
Proposed                                  21.74 (± 0.41)

Smaller values for test error indicate better performance. All the results of the related works are reported from the original papers.

37

slide-55
SLIDE 55

Clustering: Visualization MNIST

(a) Epoch 1 (b) Epoch 5 (c) Epoch 20 (d) Epoch 50 (e) Epoch 80 (f) Epoch 100

38

slide-56
SLIDE 56

Image Generation

Use feature vector obtained from gφ(x) and vary the category c (one-hot).

39

slide-57
SLIDE 57

Unsupervised Clustering

slide-58
SLIDE 58

What if we have no labeled data?

Large amount of Unlabeled data (ImageNet) No Labeled data (CIFAR-10)

Can we learn good representations and cluster data in an unsupervised way?

40

slide-59
SLIDE 59

Unsupervised Clustering: Related Works

slide-60
SLIDE 60

Intuition

41

slide-61
SLIDE 61

Use of pretrained features

42

slide-62
SLIDE 62

Fine-tuning

  • Deep Embedding Clustering (DEC), Xie et al. ICML’16
  • Deep Clustering Network (DCN), Yang et al. ICML’17
  • Improved Deep Embedding Clustering (IDEC), Guo et al. IJCAI’17

43

slide-63
SLIDE 63

End-To-End

  • Joint Unsupervised Learning (JULE), Yang et al. CVPR’16
  • Deep Embedded Regularized Clustering (DEPICT), Dizaji et al. ICCV’17

44

slide-64
SLIDE 64

Complex Structure: Generative Models

  • Variational Deep Embedding, Jiang et al. IJCAI'17
  • Gaussian Mixture Variational Autoencoders, Dilokthanakul et al. arXiv’17.

45

slide-65
SLIDE 65

Unsupervised Clustering: Proposed Method

slide-66
SLIDE 66

Intuition

46

slide-67
SLIDE 67

Our Probabilistic Model

Inference Model Generative Model 47

slide-68
SLIDE 68

Stacked M1+M2 generative model

Inference Model Generative Model M1 model Semi-supervised Learning with Deep Generative Models, Kingma et al, NIPS’14 48

slide-69
SLIDE 69

Stacked M1+M2 generative model

Inference Model Generative Model M1 model Inference Model Generative Model M2 model Semi-supervised Learning with Deep Generative Models, Kingma et al, NIPS’14 48

slide-70
SLIDE 70

Stacked M1+M2 generative model

Inference Model Generative Model M1 model Inference Model Generative Model M2 model Inference Model Generative Model Probabilistic graphical models of M1+M2 Semi-supervised Learning with Deep Generative Models, Kingma et al, NIPS’14 48

slide-71
SLIDE 71

Stacked M1+M2: Training

Train M1 model and use its feature representations to train M2 model separately.

49

slide-72
SLIDE 72

Problem with hierarchical stochastic latent variables

Inactive Stochastic Units:

Ladder Variational Autoencoders , Sonderby et al, NIPS’16 50

slide-73
SLIDE 73

Problem with hierarchical stochastic latent variables

Inactive Stochastic Units:

Ladder Variational Autoencoders , Sonderby et al, NIPS’16

Solutions require complex models:

Inference Model Generative Model Auxiliary Deep Generative Models, Maaloe et al, ICML’16 51

slide-74
SLIDE 74

Avoiding the problem of hierarchical stochastic variables

Replace the stochastic layer that produces x with a deterministic one, x̂ = g(x).

52

slide-75
SLIDE 75

Other Differences with M1+M2 model

  • Use of Gumbel-Softmax instead of marginalization over stochastic discrete variables.

53

slide-76
SLIDE 76

Other Differences with M1+M2 model

  • Use of Gumbel-Softmax instead of marginalization over stochastic discrete variables.

  • Training end-to-end instead of pre-training.

53

slide-77
SLIDE 77

Other Differences with M1+M2 model

  • Use of Gumbel-Softmax instead of marginalization over stochastic discrete variables.

  • Training end-to-end instead of pre-training.
  • Unsupervised model: Labels are not required.

53

slide-78
SLIDE 78

Variational Lower Bound

Inference Model Generative Model

Loss Function:

Ltotal = LR + LC + LG

54

slide-79
SLIDE 79

Proposed Model

55

slide-80
SLIDE 80

Reconstruction Loss

$$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + \mathcal{L}_G$$
$$\mathcal{L}_{BCE} = -\left(x \log(\tilde{x}) + (1 - x)\log(1 - \tilde{x})\right)$$
$$\mathcal{L}_{MSE} = \left\|x - \tilde{x}\right\|^2$$

56

slide-81
SLIDE 81

Gaussian and Categorical Regularizers

$$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + \mathcal{L}_G$$

$$\mathcal{L}_G = \mathrm{KL}\left(\mathcal{N}(\mu(x), \sigma(x)) \,\|\, \mathcal{N}(0, 1)\right) = -\frac{1}{2} \sum_{k=1}^{K} \left(1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2\right)$$

$$\mathcal{L}_C = \mathrm{KL}\left(\mathrm{Cat}(\pi) \,\|\, U(0, 1)\right) = \sum_{k=1}^{K} \pi_k \log\left(K \pi_k\right)$$
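A sketch of these two regularizers in PyTorch; the shapes and reductions are assumptions, since the slides do not show implementation details:

```python
import torch

def gaussian_kl(mu, log_var):
    """KL(N(mu, sigma) || N(0, I)), summed over latent dimensions."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=-1)

def categorical_kl(pi, eps=1e-10):
    """KL(Cat(pi) || Uniform(K)) = sum_k pi_k log(K * pi_k)."""
    k = pi.shape[-1]
    return torch.sum(pi * torch.log(k * pi + eps), dim=-1)
```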

57

slide-82
SLIDE 82

Experiments

slide-83
SLIDE 83

Datasets

MNIST (70,000 images, 10 classes, 28×28); USPS (9,298 images, 10 classes, 16×16); REUTERS-10K (10,000 documents, 4 classes)

58

slide-84
SLIDE 84

Analysis of Clustering Performance

[Figure] Performance (%) vs. iteration number for ACC and NMI: clustering performance at each epoch, considering all loss weights equal to 1.

59

slide-85
SLIDE 85

Analysis of loss functions weights

$$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + w_G \mathcal{L}_G$$

[Figure] ACC (%) and NMI (%) as a function of the loss-function weight (w∗) for L_R, L_C, and L_G.

60

slide-86
SLIDE 86

Quantitative Results: Clustering Performance

Clustering performance, ACC (%) and NMI (%), on all datasets.

Method        MNIST                        USPS                         REUTERS-10K
              ACC          NMI             ACC          NMI             ACC          NMI
k-means       53.24        –               66.82        –               51.62        –
GMM           53.73        –               –            –               54.72        –
AE+k-means    81.82        74.73           69.31        66.20           70.52        39.79
AE+GMM        82.18        –               –            –               70.13        –
GMVAE         82.31 (± 4)  –               –            –               –            –
DCN           83.00        81.00           –            –               –            –
DEC           86.55        83.72           74.08        75.29           73.68        49.76
IDEC          88.06        86.72           76.05        78.46           75.64        49.81
VaDE          94.46        –               –            –               79.83        –
Proposed      85.75 (± 8)  82.13 (± 5)     72.58 (± 3)  67.01 (± 2)     80.41 (± 5)  52.13 (± 5)

(– denotes values not reported by the original works.)

Larger values for ACC and NMI indicate better performance. Colored rows denote methods that require pre-training. All the results of the related works are reported from the original papers.

61

slide-87
SLIDE 87

Quantitative Results: Classification - MNIST Performance

MNIST test error-rate (%) for kNN.

Method      k = 3    k = 5    k = 10
VAE         18.43    15.69    14.19
DLGMM       9.14     8.38     8.42
VaDE        2.20     2.14     2.22
Proposed    3.46     3.30     3.44

Smaller values for test error indicate better performance.

62

slide-88
SLIDE 88

Qualitative Results: Image Generation

10 clusters

Fix the category c (one-hot) and vary the latent variable z.

63

slide-89
SLIDE 89

Qualitative Results: Image Generation

7 clusters 14 clusters

64

slide-90
SLIDE 90

Qualitative Results: Style Generation

Input a test image x (first column) through qφ(z|x̂).

65

slide-91
SLIDE 91

Qualitative Results: Style Generation

Use the vector obtained from qφ(z|x̂) and vary the category c (one-hot).

65

slide-92
SLIDE 92

Qualitative Results: Clustering Visualization

(a) Epoch 1 (b) Epoch 5 (c) Epoch 20 (d) Epoch 50 (e) Epoch 150 (f) Epoch 300

Visualization of the feature representations on MNIST data set at different epochs.

66

slide-93
SLIDE 93

Conclusions and Future Work

slide-94
SLIDE 94

Contributions

For semi-supervised clustering our contributions were:

  • a semi-supervised auxiliary task which aims to define clustering assignments.

67

slide-95
SLIDE 95

Contributions

For semi-supervised clustering our contributions were:

  • a semi-supervised auxiliary task which aims to define clustering assignments.

  • a regularization on the feature representations of the data.

67

slide-96
SLIDE 96

Contributions

For semi-supervised clustering our contributions were:

  • a semi-supervised auxiliary task which aims to define clustering assignments.
  • a regularization on the feature representations of the data.
  • a loss function that combines a variational loss with our auxiliary task to guide the learning process.

67

slide-97
SLIDE 97

Contributions

For unsupervised clustering our contributions were:

  • a combination of deterministic and stochastic layers to solve the problem of hierarchical stochastic variables, allowing end-to-end learning.

68

slide-98
SLIDE 98

Contributions

For unsupervised clustering our contributions were:

  • a combination of deterministic and stochastic layers to solve the problem of hierarchical stochastic variables, allowing end-to-end learning.
  • a simple deep generative model represented by the combination of a simple Gaussian and a categorical distribution.

68

slide-99
SLIDE 99

Future Works

  • Use of clustering algorithms (e.g., k-means, DBSCAN, agglomerative clustering, etc.) over the feature representations to improve the learning process.

69

slide-100
SLIDE 100

Future Works

  • Use of clustering algorithms (e.g., k-means, DBSCAN, agglomerative clustering, etc.) over the feature representations to improve the learning process.
  • Improvements of our probabilistic generative model can be made by using generative adversarial networks (GANs).

69

slide-101
SLIDE 101

Publications

  • J. Arias and A. Ramírez. Learning to Cluster with Auxiliary Tasks: A Semi-Supervised Approach. In Conference on Graphics, Patterns and Images (SIBGRAPI), Niterói, 2017.
    Source code: https://gitlab.com/mipl/clustering-sibgrapi-2017

  • J. Arias and A. Ramírez. Is Simple Better?: Revisiting Simple Generative Models for Unsupervised Clustering. In Second Workshop on Bayesian Deep Learning (NIPS), Long Beach, 2017.
    Source code: https://gitlab.com/mipl/simple-vae-clustering

70

slide-102
SLIDE 102

Thanks!

71

slide-103
SLIDE 103

Appendix

slide-104
SLIDE 104

Variational Autoencoder - Lower Bound

$$\begin{aligned}
\mathrm{KL}(q(z|x)\,\|\,p(z|x)) &= \int_z q(z|x) \log \frac{q(z|x)}{p(z|x)} = \mathbb{E}_{z\sim q(z|x)}\left[\log \frac{q(z|x)}{p(z|x)}\right] \\
&= \mathbb{E}_{z\sim q(z|x)}\left[\log q(z|x) - \log p(z|x)\right] \\
&= \mathbb{E}_{z\sim q(z|x)}\left[\log q(z|x) - \log p(x, z) + \log p(x)\right] \\
&= \mathbb{E}_{z\sim q(z|x)}\left[\log q(z|x) - \log p(x, z)\right] + \log p(x) \\
&= -\mathbb{E}_{z\sim q(z|x)}\left[\log p(x, z) - \log q(z|x)\right] + \log p(x) \\
&= -\mathbb{E}_{z\sim q(z|x)}\left[\log p(x|z) + \log p(z) - \log q(z|x)\right] + \log p(x) \\
&= -\mathbb{E}_{z\sim q(z|x)}\left[\log p(x|z)\right] + \mathbb{E}_{z\sim q(z|x)}\left[\log q(z|x) - \log p(z)\right] + \log p(x)
\end{aligned}$$

Then, reordering the terms,

$$\log p(x) = \underbrace{\mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}(q_\phi(z|x)\,\|\,p_\theta(z))}_{\mathcal{L}(\theta,\phi)} + \underbrace{\mathrm{KL}(q_\phi(z|x)\,\|\,p_\theta(z|x))}_{\geq 0}$$

$$\log p_\theta(x) \geq \mathcal{L}(\theta, \phi) \qquad \text{(Variational Lower Bound, ELBO)}$$

$$\theta^*, \phi^* = \arg\max_{\theta,\phi} \mathcal{L}(\theta, \phi)$$

Training: maximize the lower bound.

72

slide-105
SLIDE 105

Semi-supervised Clustering - Evidence Lower Bound

$$\begin{aligned}
\mathrm{KL}(q(c, f|x)\,\|\,p(c, f|x)) &= \sum_{c,f} q(c, f|x) \log \frac{q(c, f|x)}{p(c, f|x)} = \mathbb{E}_{c,f\sim q(c,f|x)}\left[\log \frac{q(c, f|x)}{p(c, f|x)}\right] \\
&= \mathbb{E}_{c,f\sim q(c,f|x)}\left[\log q(c, f|x) - \log p(c, f|x)\right] \\
&= \mathbb{E}_{c,f\sim q(c,f|x)}\left[\log q(c, f|x) - \log p(c, f, x) + \log p(x)\right] \\
&= \mathbb{E}_{c,f\sim q(c,f|x)}\left[\log q(c, f|x) - \log p(c, f, x)\right] + \log p(x) \\
&= -\mathbb{E}_{c,f\sim q(c,f|x)}\left[\log p(c, f, x) - \log q(c, f|x)\right] + \log p(x)
\end{aligned}$$

Then, reordering the terms,

$$\log p(x) = \underbrace{\mathbb{E}_{c,f\sim q(c,f|x)}\left[\log p(c, f, x) - \log q(c, f|x)\right]}_{\mathcal{L}(\theta,\psi,\phi)=\mathcal{L}} + \underbrace{\mathrm{KL}(q(c, f|x)\,\|\,p(c, f|x))}_{\geq 0}$$

73

slide-106
SLIDE 106

Semi-supervised Clustering - Evidence Lower Bound

The variational lower bound, L, also called evidence lower bound (ELBO), can be expanded:

$$\begin{aligned}
\mathcal{L} &= \mathbb{E}_{q(c,f|x)}\left[\log p(c, f, x) - \log q(c, f|x)\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f) + \log p(c, f) - \log q(c, f|x)\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(c,f|x)}\left[\log q(c, f|x) - \log p(c, f)\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(f|x)}\left[\mathbb{E}_{q(c|f)}\left[\log q(c|f) + \log q(f|x) - \log p(c, f)\right]\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(f|x)}\left[\mathbb{E}_{q(c|f)}\left[\log \frac{q(c|f)}{p(c|f)}\right] + \log q(f|x) - \log p(f)\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(f|x)}\left[\mathrm{KL}(q(c|f)\,\|\,p(c|f)) + \log q(f|x) - \log p(f)\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(f|x)}\left[\mathrm{KL}(q(c|f)\,\|\,p(c|f))\right] - \mathbb{E}_{q(f|x)}\left[\log \frac{q(f|x)}{p(f)}\right] \\
&= \mathbb{E}_{q(c,f|x)}\left[\log p(x|c, f)\right] - \mathbb{E}_{q(f|x)}\left[\mathrm{KL}(q(c|f)\,\|\,p(c|f))\right] - \mathrm{KL}(q(f|x)\,\|\,p(f))
\end{aligned}$$

74

slide-107
SLIDE 107

Unsupervised Clustering - Evidence Lower Bound

$$\begin{aligned}
\mathrm{KL}(q(z, c|\hat{x})\,\|\,p(z, c|x)) &= \sum_{z,c} q(z, c|\hat{x}) \log \frac{q(z, c|\hat{x})}{p(z, c|x)} = \mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log \frac{q(z, c|\hat{x})}{p(z, c|x)}\right] \\
&= \mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log q(z, c|\hat{x}) - \log p(z, c|x)\right] \\
&= \mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log q(z, c|\hat{x}) - \log p(z, c, x) + \log p(x)\right] \\
&= \mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log q(z, c|\hat{x}) - \log p(z, c, x)\right] + \log p(x) \\
&= -\mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log p(z, c, x) - \log q(z, c|\hat{x})\right] + \log p(x)
\end{aligned}$$

Then, reordering the terms,

$$\log p(x) = \underbrace{\mathbb{E}_{z,c\sim q(z,c|\hat{x})}\left[\log p(z, c, x) - \log q(z, c|\hat{x})\right]}_{\mathcal{L}(\theta,\phi)=\mathcal{L}} + \underbrace{\mathrm{KL}(q(z, c|\hat{x})\,\|\,p(z, c|x))}_{\geq 0}$$

75

slide-108
SLIDE 108

Unsupervised Clustering - Evidence Lower Bound

The variational lower bound, L, also called evidence lower bound (ELBO), can be expanded:

$$\begin{aligned}
\mathcal{L} &= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(z, c, x) - \log q(z, c|\hat{x})\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c) + \log p(z, c) - \log q(z, c|\hat{x})\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathbb{E}_{q(z,c|\hat{x})}\left[\log q(z, c|\hat{x}) - \log p(z, c)\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathbb{E}_{q(z|\hat{x})}\left[\mathbb{E}_{q(c|\hat{x})}\left[\log q(c|\hat{x}) + \log q(z|\hat{x}) - \log p(z, c)\right]\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathbb{E}_{q(z|\hat{x})}\left[\mathbb{E}_{q(c|\hat{x})}\left[\log \frac{q(c|\hat{x})}{p(c)}\right] + \log q(z|\hat{x}) - \log p(z)\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathbb{E}_{q(z|\hat{x})}\left[\mathrm{KL}(q(c|\hat{x})\,\|\,p(c)) + \log q(z|\hat{x}) - \log p(z)\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathbb{E}_{q(z|\hat{x})}\left[\mathrm{KL}(q(c|\hat{x})\,\|\,p(c))\right] - \mathbb{E}_{q(z|\hat{x})}\left[\log \frac{q(z|\hat{x})}{p(z)}\right] \\
&= \mathbb{E}_{q(z,c|\hat{x})}\left[\log p(x|z, c)\right] - \mathrm{KL}(q(c|\hat{x})\,\|\,p(c)) - \mathrm{KL}(q(z|\hat{x})\,\|\,p(z))
\end{aligned}$$

76

slide-109
SLIDE 109

Problems with stochastic latent variables

Gradients are difficult to obtain:

$$\nabla_\phi \mathcal{L}_{\theta,\phi}(x) = \nabla_\phi \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right] = \int \nabla_\phi\, q_\phi(z|x)\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right] dz$$

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 77

slide-110
SLIDE 110

Continuous stochastic variables: Reparameterization Trick

We can 'externalize' the randomness in z by re-parameterizing the variable as a deterministic function: z = μ + σ ⊙ ε.

$$\nabla_\phi \mathcal{L}_{\theta,\phi}(x) = \nabla_\phi \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right] = \nabla_\phi \mathbb{E}_{p(\epsilon)}\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right]$$

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 78

slide-111
SLIDE 111

Problems with stochastic latent variables

Gradients are difficult to obtain:

$$\nabla_\phi \mathcal{L}_{\theta,\phi}(x) = \nabla_\phi \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right] = \int \nabla_\phi\, q_\phi(z|x)\left[\log p_\theta(x, z) - \log q_\phi(z|x)\right] dz$$

Auto-Encoding Variational Bayes, Kingma et al, ICLR’14 79

slide-112
SLIDE 112

Discrete stochastic variables: Gumbel-Max Trick

Gradients are difficult to obtain:

Image Credit: UofG Machine Learning Research Group 80

slide-113
SLIDE 113

Discrete stochastic variables: Gumbel-Softmax

We can approximate arg max with a softmax.

As τ → 0 we obtain a one-hot vector; as τ → +∞ we obtain a uniform distribution.

Categorical Reparameterization with Gumbel-Softmax, Jang et al, ICLR'17
A Continuous Relaxation of Discrete Random Variables, Maddison et al, ICLR'17

81

slide-114
SLIDE 114

Clustering Metrics: Clustering Accuracy (ACC)

For a set of N input elements, this metric is defined as:

$$\mathrm{ACC} = \frac{\sum_{i=1}^{N} \mathbb{1}\{l_i = \mathrm{map}(c_i)\}}{N},$$

where:

  • li is the ground truth label of i-th input,
  • ci is the cluster assignment produced by the algorithm,
  • map(ci) is the optimal mapping function.
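A sketch of ACC using SciPy's Hungarian (Kuhn-Munkres) solver for the optimal mapping; this is an illustrative implementation, not the thesis code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(labels, clusters):
    """ACC = fraction of samples whose cluster, under the optimal cluster-to-label
    mapping (Kuhn-Munkres on the contingency table), matches the ground truth."""
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    k = max(labels.max(), clusters.max()) + 1
    contingency = np.zeros((k, k), dtype=np.int64)
    for l, c in zip(labels, clusters):
        contingency[c, l] += 1
    row, col = linear_sum_assignment(-contingency)   # negate to maximize matches
    return contingency[row, col].sum() / labels.size
```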

82

slide-115
SLIDE 115

Clustering Accuracy (ACC): Hungarian Algorithm

Predicted clusters (rows) vs. ground truth (columns)

  1. Calculate the contingency table between the clusters defined by the algorithm and the true categories.

         T1   T2   T3
    C1    4    1    1
    C2    2    5
    C3    1    5    2

83

slide-116
SLIDE 116

Clustering Accuracy (ACC): Hungarian Algorithm

Predicted clusters (rows) vs. ground truth (columns)

  2. Create a bipartite graph from the contingency table.

         T1   T2   T3
    C1    4    1    1
    C2    2    5
    C3    1    5    2

83

slide-117
SLIDE 117

Clustering Accuracy (ACC): Hungarian Algorithm

Predicted clusters (rows) vs. ground truth (columns)

  3. Apply the Kuhn-Munkres algorithm.

         T1   T2   T3
    C1    4    1    1
    C2    2    5
    C3    1    5    2

83

slide-118
SLIDE 118

Clustering Accuracy (ACC): Simple Approach

Probabilities q(c|x) after training.

    C1     C2     C3     T
    0.6    0.2    0.2    2
    0.1    0.4    0.5    1
    0.8    0.1    0.1    1
    0.3    0.6    0.1    3
    0.1    0.75   0.15   2
    0.7    0.1    0.2    1
    0.2    0.1    0.7    2
    0.05   0.05   0.9    2
    0.7    0.2    0.1    3
    0.1    0.8    0.1    3
    0.1    0.6    0.3    1

For each cluster k, we find the validation example xi that maximizes q(ck|xi) and assign the label of xi to all the elements that were assigned to cluster k.
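A sketch of this simple mapping rule (illustrative only; variable names are placeholders, not from the thesis):

```python
import numpy as np

def simple_cluster_to_label_map(q, labels):
    """For each cluster k, take the label of the validation example that
    maximizes q(c_k | x_i) and assign it to every element of cluster k.

    q:      responsibilities q(c|x), shape (N, K).
    labels: ground-truth labels of the validation examples, shape (N,).
    """
    mapping = {}
    for k in range(q.shape[1]):
        i_best = int(np.argmax(q[:, k]))     # most confident example for cluster k
        mapping[k] = int(labels[i_best])
    return mapping

# usage: the predicted label of sample i is mapping[np.argmax(q[i])]
```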

84

slide-119
SLIDE 119

Clustering Accuracy (ACC): Simple Approach

Cluster 1 Assignments

    C1     C2     C3     T
    0.6    0.2    0.2    2
    0.1    0.4    0.5    1
    0.8    0.1    0.1    1
    0.3    0.6    0.1    3
    0.1    0.75   0.15   2
    0.7    0.1    0.2    1
    0.2    0.1    0.7    2
    0.05   0.05   0.9    2
    0.7    0.2    0.1    3
    0.1    0.8    0.1    3
    0.1    0.6    0.3    1

85

slide-120
SLIDE 120

Clustering Accuracy (ACC): Simple Approach

Cluster 1 is mapped to Ground Truth 1

    C1     C2     C3     T
    0.6    0.2    0.2    2
    0.1    0.4    0.5    1
    0.8    0.1    0.1    1
    0.3    0.6    0.1    3
    0.1    0.75   0.15   2
    0.7    0.1    0.2    1
    0.2    0.1    0.7    2
    0.05   0.05   0.9    2
    0.7    0.2    0.1    3
    0.1    0.8    0.1    3
    0.1    0.6    0.3    1

85

slide-121
SLIDE 121

Clustering Accuracy (ACC): Simple Approach

Table 1: Assignment of labels to clusters based on maximum q(ck|xi)

    C1     C2     C3     T
    0.6    0.2    0.2    2
    0.1    0.4    0.5    1
    0.8    0.1    0.1    1
    0.3    0.6    0.1    3
    0.1    0.75   0.15   2
    0.7    0.1    0.2    1
    0.2    0.1    0.7    2
    0.05   0.05   0.9    2
    0.7    0.2    0.1    3
    0.1    0.8    0.1    3
    0.1    0.6    0.3    1

85

slide-122
SLIDE 122

Clustering Metrics: Normalized Mutual Information (NMI)

For two arbitrary variables T and C, representing the ground truth labels and cluster labels respectively, NMI is defined as follows:

$$\mathrm{NMI}(T, C) = \frac{I(T, C)}{\sqrt{H(T)\,H(C)}}, \qquad (1)$$

where:

  • I(T, C) denotes the mutual information between T and C,
  • H(∗) denotes the entropy.
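A sketch of NMI with the geometric-mean normalization reconstructed above (an assumption; scikit-learn's normalized_mutual_info_score with average_method='geometric' gives the same quantity):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def nmi(true_labels, cluster_labels):
    """NMI(T, C) = I(T, C) / sqrt(H(T) * H(C)), computed in nats."""
    def entropy(x):
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))
    i_tc = mutual_info_score(true_labels, cluster_labels)   # mutual information in nats
    return i_tc / np.sqrt(entropy(true_labels) * entropy(cluster_labels))
```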

86

slide-123
SLIDE 123

Mutual Information (MI)

Mutual information quantifies the information shared by the two clusterings. MI tells us the reduction in the entropy of the class labels that we get if we know the cluster labels (similar to information gain in decision trees): I(T, C) = H(T) − H(T|C)

87

slide-124
SLIDE 124

Mutual Information (MI)

Perfect Correlation (NMI=1) Independent (NMI = 0)

88

slide-125
SLIDE 125

Semi-supervised Clustering: SVHN Image Generation

Use feature vector obtained from gφ(x) and vary the category c (one-hot).

89