SLIDE 1

Deep Gaussian Mixture Models

Cinzia Viroli

(University of Bologna, Italy)

joint with Geoff McLachlan (University of Queensland, Australia)

JOCLAD 2018, Lisbon, April 5, 2018

SLIDE 2

Outline

- Deep Learning
- Mixture Models
- Deep Gaussian Mixture Models

ECDA 2017 Deep GMM 2

SLIDE 3

Deep Learning

SLIDE 4

Deep Learning

Deep Learning is a trendy topic in the machine learning community.

SLIDE 5

Deep Learning

What is Deep Learning?

Deep Learning is a set of machine learning algorithms able to gradually learn a huge number of parameters in an architecture composed of multiple nonlinear transformations (a multi-layer structure).

SLIDE 6

Deep Learning

Example of Learning

SLIDE 7

Deep Learning

Example of Deep Learning

SLIDE 8

Deep Learning

Facebook’s DeepFace

DeepFace (Yaniv Taigman) is a deep learning facial recognition system that employs a nine-layer neural network with over 120 million connection weights. It identifies human faces in digital images with an accuracy of 97.35%.

SLIDE 9

Mixture Models

SLIDE 10

Mixture Models

Gaussian Mixture Models (GMM)

In model-based clustering, data are assumed to come from a finite mixture model (McLachlan and Peel, 2000; Fraley and Raftery, 2002). For quantitative data each mixture component is usually modeled as a multivariate Gaussian distribution:

f(y; θ) = ∑_{j=1}^{k} π_j φ^(p)(y; μ_j, Σ_j)

Growing popularity, widely used.
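The mixture density above can be evaluated directly from its definition. A minimal numpy sketch (the weights, means, and covariances below are invented illustrative values, not from the talk):

```python
import numpy as np

def mvn_pdf(y, mu, Sigma):
    """Multivariate normal density phi_p(y; mu, Sigma)."""
    p = len(mu)
    d = y - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

def gmm_density(y, pis, mus, Sigmas):
    """f(y; theta) = sum_j pi_j * phi_p(y; mu_j, Sigma_j)."""
    return sum(pi * mvn_pdf(y, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))

# k = 2 components in p = 2 dimensions (illustrative parameters)
pis = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

print(gmm_density(np.zeros(2), pis, mus, Sigmas))
```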

SLIDE 13

Mixture Models

Gaussian Mixture Models (GMM)

However, in recent years, a lot of research has been done to address two issues:

- High-dimensional data: when the number of observed variables is large, it is well known that GMM represents an over-parameterized solution.
- Non-Gaussian data: when data are not Gaussian, GMM could require more components than true clusters, thus requiring merging or alternative distributions.

SLIDE 15

Mixture Models

High-dimensional data

Some solutions (among others): model-based clustering; dimensionally reduced model-based clustering.

- Banfield and Raftery (1993) and Celeux and Govaert (1995): proposed constrained GMM based on a parameterization of the generic component-covariance matrix via its spectral decomposition: Σ_i = λ_i A_i^⊤ D_i A_i
- Bouveyron et al. (2007): proposed a different parameterization of the generic component-covariance matrix
- Ghahramani and Hinton (1997) and McLachlan et al. (2003): Mixtures of Factor Analyzers (MFA)
- Yoshida et al. (2004), Baek and McLachlan (2008), Montanari and Viroli (2010): Factor Mixture Analysis (FMA) or Common MFA
- McNicholas and Murphy (2008): eight parameterizations of the covariance matrices in MFA

SLIDE 18

Mixture Models

Non-Gaussian data

Some solutions (among others): more components than clusters; non-Gaussian distributions.

- Merging mixture components (Hennig, 2010; Baudry et al., 2010; Melnykov, 2016)
- Mixtures of mixtures models (Li, 2005) and, in the dimensionally reduced space, mixtures of MFA (Viroli, 2010)
- Mixtures of skew-normal, skew-t and canonical fundamental skew distributions (Lin, 2009; Lee and McLachlan, 2011-2017)
- Mixtures of generalized hyperbolic distributions (Subedi and McNicholas, 2014; Franczak et al., 2014)
- MFA with non-normal distributions (McLachlan et al., 2007; Andrews and McNicholas, 2011; and many recent proposals by McNicholas, McLachlan and colleagues)

SLIDE 21

Deep Gaussian Mixture Models

SLIDE 22

Deep Gaussian Mixture Models

Why Deep Mixtures?

A Deep Gaussian Mixture Model (DGMM) is a network of multiple layers of latent variables, where, at each layer, the variables follow a mixture of Gaussian distributions.

SLIDE 23

Deep Gaussian Mixture Models

Gaussian Mixtures vs Deep Gaussian Mixtures

Given data y, of dimension n × p, the mixture model

f(y; θ) = ∑_{j=1}^{k1} π_j φ^(p)(y; μ_j, Σ_j)

can be rewritten as a linear model with a certain prior probability:

y = μ_j + Λ_j z + u  with probability π_j

where z ∼ N(0, I_p), u is an independent specific random error with u ∼ N(0, Ψ_j), and Σ_j = Λ_j Λ_j^⊤ + Ψ_j.
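The linear-model form can be checked by simulation: sampling y = μ_j + Λ_j z + u for one component j reproduces the covariance Σ_j = Λ_j Λ_j^⊤ + Ψ_j. A minimal sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3

# Illustrative parameters for a single component j (made up for this sketch)
mu = np.array([1.0, -1.0, 0.5])
Lam = rng.normal(size=(p, p))          # Lambda_j
Psi = np.diag([0.2, 0.3, 0.1])         # Psi_j

# y = mu_j + Lambda_j z + u,  z ~ N(0, I_p),  u ~ N(0, Psi_j)
n = 200_000
z = rng.normal(size=(n, p))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
y = mu + z @ Lam.T + u

# Implied covariance: Sigma_j = Lambda_j Lambda_j^T + Psi_j
Sigma = Lam @ Lam.T + Psi
print(np.max(np.abs(np.cov(y, rowvar=False) - Sigma)))  # small sampling error
```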

SLIDE 24

Deep Gaussian Mixture Models

Gaussian Mixtures vs Deep Gaussian Mixtures

Now suppose we replace z ∼ N(0, I_p) with

f(z; θ) = ∑_{j=1}^{k2} π_j^(2) φ^(p)(z; μ_j^(2), Σ_j^(2))

This defines a Deep Gaussian Mixture Model (DGMM) with h = 2 layers.

SLIDE 25

Deep Gaussian Mixture Models

Deep Gaussian Mixtures

Imagine h = 2, k2 = 4 and k1 = 2:

- k = 8 possible paths (total subcomponents)
- M = 6 real subcomponents (shared set of parameters)
- M < k thanks to the tying
- Special mixtures of mixtures model (Li, 2005)
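The two counts follow directly from the layer sizes: the number of paths is the product of the k_l, while parameter tying means only the sum of the k_l distinct Gaussian subcomponents is estimated. A quick check for the example above:

```python
from math import prod

def path_counts(ks):
    """ks = [k_1, ..., k_h]: number of components per layer of a DGMM."""
    k = prod(ks)   # possible paths through the network
    M = sum(ks)    # distinct subcomponents actually parameterized (tying)
    return k, M

print(path_counts([2, 4]))  # -> (8, 6), matching k = 8 and M = 6 on the slide
```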

SLIDE 29

Deep Gaussian Mixture Models

Do we really need DGMM?

Consider the k = 4 clustering problem

[Scatter plot of the Smile data]

SLIDE 30

Deep Gaussian Mixture Models

Do we really need DGMM?

A deep mixture with h = 2, k1 = 4, k2 = 2 (k = 8 paths, M = 6)

[Boxplots of the Adjusted Rand Index for kmeans, pam, hclust, mclust, msn, mst and deepmixt, ranging roughly from 0.4 to 0.9]

SLIDE 32

Deep Gaussian Mixture Models

Do we really need DGMM?

A deep mixture with h = 2, k1 = 4, k2 = 2 (k = 8 paths, M = 6)

- In the DGMM we cluster data into k1 groups (k1 < k) through f(y|z): the remaining components in the previous layer(s) act as a density approximation of global non-Gaussian components
- Automatic tool for merging mixture components: merging is unit-dependent
- Thanks to its multilayered architecture, the deep mixture provides a way to estimate increasingly complex relationships as the number of layers increases.

SLIDE 36

Deep Gaussian Mixture Models

Do we really need DGMM?

Clustering on 100 generated datasets:

[Boxplots of the Adjusted Rand Index for kmeans, pam, hclust, mclust, msn, mst and deepmixt, ranging roughly from 0.4 to 0.9]

n = 1000, p = 2, number of parameters in the DGMM d = 50. What about higher-dimensional problems?

SLIDE 38

Deep Gaussian Mixture Models

Dimensionally reduced DGMM

- Tang et al. (2012) proposed a deep mixture of factor analyzers with a stepwise greedy search algorithm: a separate and independent estimation for each layer (error propagation)
- Here a general strategy is presented, and estimation is obtained in a unique procedure by a stochastic EM
- Fast for h < 4; computationally more demanding as h increases

SLIDE 39

Deep Gaussian Mixture Models

Dimensionally reduced DGMM

Suppose h layers. Given y, of dimension n × p, at each layer a linear model describes the data with a certain prior probability as follows:

(1) y_i = η^(1)_{s1} + Λ^(1)_{s1} z^(1)_i + u^(1)_i  with prob. π^(1)_{s1}, s1 = 1, ..., k1
(2) z^(1)_i = η^(2)_{s2} + Λ^(2)_{s2} z^(2)_i + u^(2)_i  with prob. π^(2)_{s2}, s2 = 1, ..., k2
...
(h) z^(h−1)_i = η^(h)_{sh} + Λ^(h)_{sh} z^(h)_i + u^(h)_i  with prob. π^(h)_{sh}, sh = 1, ..., kh

where u is independent of z, and the layers are sequentially described by latent variables with a progressively decreasing dimension, r1, r2, ..., rh, with p > r1 > r2 > ... > rh ≥ 1.
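The layered generative process can be sketched by ancestral sampling: draw the deepest z^(h) from a standard normal, then propagate downwards through a randomly chosen component at each layer. All parameters below are invented for illustration, and mixing weights are taken uniform for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [4, 3, 2, 1]   # p > r1 > r2 > r3 >= 1  (illustrative)
ks = [2, 4, 2]        # components per layer (illustrative)

# Random illustrative parameters: layer l maps dimension dims[l+1] -> dims[l]
layers = []
for l in range(len(ks)):
    comps = []
    for _ in range(ks[l]):
        eta = rng.normal(size=dims[l])
        Lam = rng.normal(size=(dims[l], dims[l + 1]))
        Psi = np.diag(rng.uniform(0.1, 0.5, size=dims[l]))
        comps.append((eta, Lam, Psi))
    layers.append(comps)

def sample_dgmm(n):
    """Ancestral sampling: z(h) ~ N(0, I), then z(l-1) = eta + Lam z(l) + u."""
    z = rng.normal(size=(n, dims[-1]))       # deepest latent layer
    for l in reversed(range(len(ks))):
        s = rng.integers(ks[l], size=n)      # component chosen per unit (uniform weights)
        out = np.empty((n, dims[l]))
        for i in range(n):
            eta, Lam, Psi = layers[l][s[i]]
            u = rng.multivariate_normal(np.zeros(dims[l]), Psi)
            out[i] = eta + Lam @ z[i] + u
        z = out
    return z

y = sample_dgmm(500)
print(y.shape)  # n observations of dimension p = 4
```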

SLIDE 40

Deep Gaussian Mixture Models

Let Ω be the set of all possible paths through the network. The generic path s = (s1, ..., sh) has a probability πs of being sampled, with

∑_{s∈Ω} π_s = ∑_{s1,...,sh} π_(s1,...,sh) = 1.

The DGMM can be written as f(y; Θ) = ∑_{s∈Ω} π_s N(y; μ_s, Σ_s), where

μ_s = η^(1)_{s1} + Λ^(1)_{s1}(η^(2)_{s2} + Λ^(2)_{s2}(... (η^(h−1)_{s_{h−1}} + Λ^(h−1)_{s_{h−1}} η^(h)_{sh})))
    = η^(1)_{s1} + ∑_{l=2}^{h} ( ∏_{m=1}^{l−1} Λ^(m)_{sm} ) η^(l)_{sl}

and

Σ_s = Ψ^(1)_{s1} + Λ^(1)_{s1}(Λ^(2)_{s2}(... (Λ^(h)_{sh} Λ^(h)⊤_{sh} + Ψ^(h)_{sh}) ...) Λ^(2)⊤_{s2}) Λ^(1)⊤_{s1}
    = Ψ^(1)_{s1} + ∑_{l=2}^{h} ( ∏_{m=1}^{l−1} Λ^(m)_{sm} ) Ψ^(l)_{sl} ( ∏_{m=1}^{l−1} Λ^(m)_{sm} )^⊤ + ( ∏_{m=1}^{h} Λ^(m)_{sm} ) ( ∏_{m=1}^{h} Λ^(m)_{sm} )^⊤
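For h = 2 the nested and expanded expressions for μ_s and Σ_s coincide; a small numerical check with invented parameters for one path:

```python
import numpy as np

rng = np.random.default_rng(2)
p, r1, r2 = 4, 3, 2

# Invented parameters for one path s = (s1, s2)
eta1, eta2 = rng.normal(size=p), rng.normal(size=r1)
Lam1, Lam2 = rng.normal(size=(p, r1)), rng.normal(size=(r1, r2))
Psi1 = np.diag(rng.uniform(0.1, 1.0, p))
Psi2 = np.diag(rng.uniform(0.1, 1.0, r1))

# Nested form: mu_s = eta(1) + Lam(1) eta(2); Sigma_s from the recursion
mu_nested = eta1 + Lam1 @ eta2
Sigma_nested = Psi1 + Lam1 @ (Lam2 @ Lam2.T + Psi2) @ Lam1.T

# Expanded form: sums over products of loading matrices
mu_compact = eta1 + Lam1 @ eta2
Sigma_compact = (Psi1 + Lam1 @ Psi2 @ Lam1.T
                 + (Lam1 @ Lam2) @ (Lam1 @ Lam2).T)

print(np.allclose(mu_nested, mu_compact),
      np.allclose(Sigma_nested, Sigma_compact))  # True True
```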

SLIDE 42

Deep Gaussian Mixture Models

By considering the data as the zero layer, y = z^(0), in a DGMM all the marginal distributions of the latent variables z^(l), and their conditional distributions given the upper level of the network, are Gaussian mixtures.

Marginals:
f(z^(l); Θ) = ∑_{s̃=(s_{l+1},...,s_h)} π_{s̃} N(z^(l); μ̃^(l+1)_{s̃}, Σ̃^(l+1)_{s̃})   (2)

Conditionals:
f(z^(l)|z^(l+1); Θ) = ∑_{i=1}^{k_{l+1}} π^(l+1)_i N(η^(l+1)_i + Λ^(l+1)_i z^(l+1), Ψ^(l+1)_i)   (3)

To ensure identifiability: at each layer from 1 to h − 1, the conditional distribution of the latent variables f(z^(l)|z^(l+1); Θ) has zero mean and identity covariance matrix, and Λ^⊤ Ψ^{−1} Λ is diagonal.

SLIDE 45

Deep Gaussian Mixture Models

Two-layer DGMM

(1) y_i = η^(1)_{s1} + Λ^(1)_{s1} z^(1)_i + u^(1)_i  with prob. π^(1)_{s1}, s1 = 1, ..., k1
(2) z^(1)_i = η^(2)_{s2} + Λ^(2)_{s2} z^(2)_i + u^(2)_i  with prob. π^(2)_{s2}, s2 = 1, ..., k2

where z^(2)_i ∼ N(0, I_{r2}), Λ^(1)_{s1} is a (factor loading) matrix of dimension p × r1, Λ^(2)_{s2} has dimension r1 × r2, and Ψ^(1)_{s1}, Ψ^(2)_{s2} are square matrices of dimension p × p and r1 × r1 respectively. The two latent variables have dimension r1 < p and r2 < r1.

SLIDE 46

Deep Gaussian Mixture Models

Two-layer DGMM

(1) y_i = η^(1)_{s1} + Λ^(1)_{s1} z^(1)_i + u^(1)_i  with prob. π^(1)_{s1}, s1 = 1, ..., k1
(2) z^(1)_i = η^(2)_{s2} + Λ^(2)_{s2} z^(2)_i + u^(2)_i  with prob. π^(2)_{s2}, s2 = 1, ..., k2

It includes:

- MFA: h = 1, Ψ^(1)_{s1} diagonal and z^(1)_i ∼ N(0, I_{r1});
- FMA (or common MFA): h = 2 with k1 = 1, Ψ^(1) diagonal and Λ^(2)_{s2} = {0};
- Mixtures of MFA: h = 2 with k1 > 1, Ψ^(1)_{s1} diagonal and Λ^(2)_{s2} = {0};
- Deep MFA (Tang et al., 2012): h = 2, Ψ^(1)_{s1} and Ψ^(2)_{s2} diagonal.

SLIDE 50

Deep Gaussian Mixture Models

Fitting the DGMM

Thanks to the hierarchical form of the architecture of the DGMM, the EM algorithm seems to be the natural procedure. Conditional expectation for h = 2:

E_{z,s|y;Θ′}[log Lc(Θ)] = ∑_{s∈Ω} ∫ f(z^(1), s|y; Θ′) log f(y|z^(1), s; Θ) dz^(1)
  + ∑_{s∈Ω} ∫∫ f(z^(1), z^(2), s|y; Θ′) log f(z^(1)|z^(2), s; Θ) dz^(1) dz^(2)
  + ∫ f(z^(2)|y; Θ′) log f(z^(2)) dz^(2)
  + ∑_{s∈Ω} f(s|y; Θ′) log f(s; Θ)

SLIDE 52

Deep Gaussian Mixture Models

Fitting the DGMM via a Stochastic EM

Draw unobserved observations, or samples of observations, from their conditional density given the observed data:

- SEM (Celeux and Diebolt, 1985)
- MCEM (Wei and Tanner, 1990)

The strategy adopted is to draw pseudorandom observations at each layer of the network through the conditional density f(z^(l)|z^(l−1), s; Θ′), from l = 1 to l = h, by considering as known the variables at the upper level of the model for the current fit of the parameters, where at the first layer z^(0) = y.

SLIDE 54

Deep Gaussian Mixture Models

Stochastic EM

For l = 1, ..., h:

- S-STEP (z^(l−1)_i is known): generate M replicates z^(l)_{i,m} from f(z^(l)_i | z^(l−1)_i, s; Θ′)
- E-STEP: approximate
  E[z^(l)_i | z^(l−1)_i, s; Θ′] ≈ (1/M) ∑_{m=1}^{M} z^(l)_{i,m}
  and
  E[z^(l)_i z^(l)⊤_i | z^(l−1)_i, s; Θ′] ≈ (1/M) ∑_{m=1}^{M} z^(l)_{i,m} z^(l)⊤_{i,m}
- M-STEP: compute the current estimates of the parameters
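The S- and E-steps above amount to a plain Monte Carlo estimate of the conditional first and second moments. A minimal sketch for one unit at one layer, with an invented Gaussian conditional f(z|·) = N(m, S):

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented conditional f(z(l) | z(l-1), s): a Gaussian with mean m, covariance S
m = np.array([1.0, -2.0])
S = np.array([[1.0, 0.3], [0.3, 0.5]])

# S-STEP: draw M replicates from the conditional
M = 100_000
z = rng.multivariate_normal(m, S, size=M)

# E-STEP: Monte Carlo approximations of E[z] and E[z z^T]
Ez = z.mean(axis=0)                                  # approx. m
Ezz = (z[:, :, None] * z[:, None, :]).mean(axis=0)   # approx. S + m m^T

print(np.round(Ez, 2), np.allclose(Ezz, S + np.outer(m, m), atol=0.05))
```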

SLIDE 55

Deep Gaussian Mixture Models

Real examples

Wine data: p = 27 chemical and physical properties of k = 3 types of wine from the Piedmont region of Italy: Barolo (59), Grignolino (71), and Barbera (48). Clusters are well separated and most clustering methods give high clustering performance on these data.

Olive data: percentage composition of p = 8 fatty acids found by lipid fraction of 572 Italian olive oils coming from k = 3 regions: Southern Italy (323), Sardinia (98), and Northern Italy (151). Clustering is not a very difficult task even if the clusters are not balanced.

SLIDE 56

Deep Gaussian Mixture Models

Real examples

Ecoli data: proteins classified into their various cellular localization sites based on their amino acid sequences; p = 7 variables, n = 336 units, k = 8 unbalanced classes: cytoplasm (143), inner membrane without signal sequence (77), periplasm (52), inner membrane with uncleavable signal sequence (35), outer membrane (20), outer membrane lipoprotein (5), inner membrane lipoprotein (2), inner membrane with cleavable signal sequence (2).

SLIDE 57

Deep Gaussian Mixture Models

Real examples

Vehicle data: silhouettes of vehicles represented from many different angles; p = 18 variables, n = 846 units, k = 4 types of vehicles: a double decker bus (218), Chevrolet van (199), Saab 9000 (217) and an Opel Manta 400 (212). A difficult task: very hard to distinguish between the two cars.

SLIDE 58

Deep Gaussian Mixture Models

Real examples

Satellite data: multi-spectral scanner image data purchased from NASA by the Australian Centre for Remote Sensing; 4 digital images of the same scene in different spectral bands, on a 3 × 3 square neighborhood of pixels; p = 36 variables, n = 6435 images, k = 6 groups of images: red soil (1533), cotton crop (703), grey soil (1358), damp grey soil (626), soil with vegetation stubble (707) and very damp grey soil (1508). A difficult task due to both unbalanced groups and dimensionality.

SLIDE 59

Deep Gaussian Mixture Models

Results

- DGMM: h = 2 and h = 3 layers, k1 = k*, k2 = 1, 2, ..., 5 (and k3 = 1, 2, ..., 5), all possible models with p > r1 > ... > rh ≥ 1
- 10 different starting points
- Model selection by BIC
- Comparison with Gaussian Mixture Models (GMM), skew-normal and skew-t mixture models (SNmm and STmm), k-means, the Partition Around Medoids (PAM), hierarchical clustering with Ward distance (Hclust), Factor Mixture Analysis (FMA), and Mixtures of Factor Analyzers (MFA)
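The comparison metric used throughout is the Adjusted Rand Index between estimated and true partitions. A self-contained sketch of Hubert and Arabie's ARI from the contingency table (any standard implementation would do):

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Hubert & Arabie's ARI from the contingency table of two partitions."""
    classes, ct = np.unique(labels_true, return_inverse=True)
    clusters, cp = np.unique(labels_pred, return_inverse=True)
    n = len(labels_true)
    table = np.zeros((len(classes), len(clusters)), dtype=int)
    for i, j in zip(ct, cp):
        table[i, j] += 1
    sum_comb = sum(comb(int(nij), 2) for nij in table.flat)
    a = sum(comb(int(ni), 2) for ni in table.sum(axis=1))
    b = sum(comb(int(nj), 2) for nj in table.sum(axis=0))
    expected = a * b / comb(n, 2)
    max_index = (a + b) / 2
    return (sum_comb - expected) / (max_index - expected)

# Identical partitions give ARI = 1; the labeling itself does not matter
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]))  # 1.0
```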

SLIDE 60

Deep Gaussian Mixture Models

Results

Model selection by BIC:

- Wine data: h = 2, p = 27, r1 = 3, r2 = 2 and k1 = 3, k2 = 1
- Olive data: h = 2, p = 8, r1 = 5, r2 = 1 and k1 = 3, k2 = 1
- Ecoli data: h = 2, p = 7, r1 = 2, r2 = 1 and k1 = 8, k2 = 1
- Vehicle data: h = 2, p = 18, r1 = 7, r2 = 1 and k1 = 4, k2 = 3
- Satellite data: h = 3, p = 36, r1 = 13, r2 = 2, r3 = 1 and k1 = 6, k2 = 2, k3 = 1
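Model selection by BIC penalizes the maximized log-likelihood by the number of free parameters. A generic sketch (the candidate log-likelihoods and parameter counts below are placeholder values, not those of the fitted DGMMs):

```python
import numpy as np

def bic(loglik, d, n):
    """BIC = -2 * log-likelihood + d * log(n); lower is better."""
    return -2.0 * loglik + d * np.log(n)

# Hypothetical candidate models: (max log-likelihood, number of free parameters d)
candidates = {"h=2, r=(3,2)": (-1540.0, 120), "h=3, r=(3,2,1)": (-1525.0, 180)}
n = 178
scores = {name: bic(ll, d, n) for name, (ll, d) in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 1))
```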

SLIDE 61

Deep Gaussian Mixture Models

Results

(ARI = Adjusted Rand Index; — = value not available)

         Wine          Olive         Ecoli         Vehicle       Satellite
         ARI    m.r.   ARI    m.r.   ARI    m.r.   ARI    m.r.   ARI    m.r.
kmeans   0.930  0.022  0.448  0.234  0.548  0.298  0.071  0.629  0.529  0.277
PAM      0.863  0.045  0.725  0.107  0.507  0.330  0.073  0.619  0.531  0.292
Hclust   0.865  0.045  0.493  0.215  0.518  0.330  0.092  0.623  0.446  0.337
GMM      0.917  0.028  0.535  0.195  0.395  0.414  0.089  0.621  0.461  0.374
SNmm     0.964  0.011  0.816  0.168  —      —      0.125  0.566  0.440  0.390
STmm     0.085  0.511  0.811  0.171  —      —      0.171  0.587  0.463  0.390
FMA      0.361  0.303  0.706  0.213  0.222  0.586  0.093  0.595  0.367  0.426
MFA      0.983  0.006  0.914  0.052  0.525  0.330  0.090  0.626  0.589  0.243
DGMM     0.983  0.006  0.997  0.002  0.749  0.187  0.191  0.481  0.604  0.249

SLIDE 62

Deep Gaussian Mixture Models

Final remarks

- ‘Deep’ means a multilayer architecture. Deep NNs work very well in machine learning (supervised classification). Our aim: unsupervised classification.
- Deep mixtures require large n! Model selection is another issue. Computationally intensive for h > 3, but for h = 2 and h = 3 results are promising.
- Being a generalization of mixtures (and of MFA), it is guaranteed to work as well as these methods.
- Remember: for simple clustering problems, using a DGMM is like using a ‘sledgehammer to crack a nut’.

SLIDE 66

Deep Gaussian Mixture Models

References

Celeux, G. and J. Diebolt (1985). The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly.
Hennig, C. (2010). Methods for merging Gaussian mixture components. Advances in Data Analysis and Classification.
Li, J. (2005). Clustering based on a multilayer mixture model. Journal of Computational and Graphical Statistics.
McLachlan, G., D. Peel, and R. Bean (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis.
McNicholas, P. D. and T. B. Murphy (2008). Parsimonious Gaussian mixture models. Statistics and Computing.
Tang, Y., G. E. Hinton, and R. Salakhutdinov (2012). Deep mixtures of factor analysers. Proceedings of the 29th International Conference on Machine Learning.
Viroli, C. (2010). Dimensionally reduced model-based clustering through mixtures of factor mixture analyzers. Journal of Classification.
Viroli, C. and McLachlan, G. (2018). Deep Gaussian Mixture Models. Statistics and Computing.