
Hierarchical Gaussian Mixture Model - Vincent Garcia, Frank Nielsen, and Richard Nock (presentation slides)



1. Hierarchical Gaussian Mixture Model
Vincent Garcia (1), Frank Nielsen (1,2), and Richard Nock (3)
(1) École Polytechnique (Paris, France)
(2) Sony Computer Science Laboratories (Tokyo, Japan)
(3) UAG-CEREGMIA (Martinique, France)
January 2010

2. Introduction: Mixture models

A mixture model f is a powerful framework for estimating a probability density function:
    f(x) = \sum_{i=1}^{n} \alpha_i f_i(x)
where
    f_i is a statistical distribution
    \alpha_i is a weight such that \alpha_i \geq 0 and \sum_{i=1}^{n} \alpha_i = 1

Mixture of Gaussians (MoG), also called Gaussian mixture model (GMM):
    f_i(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\frac{(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)}{2} \right)

Mixture of exponential families (MEF):
    f_i(x; \Theta_i) = \exp\{ \langle \Theta_i, t(x) \rangle - F(\Theta_i) + k(x) \}
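As a minimal illustration of the mixture density above (not part of the original slides), the following Python/NumPy sketch evaluates a Gaussian mixture model at a point; the function names gaussian_pdf and gmm_pdf and the example parameters are chosen here for illustration only.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density f_i(x; mu_i, Sigma_i)."""
    d = mu.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

def gmm_pdf(x, weights, means, covs):
    """Mixture density f(x) = sum_i alpha_i f_i(x)."""
    return sum(a * gaussian_pdf(x, mu, sig)
               for a, mu, sig in zip(weights, means, covs))

# Example: a 2-component GMM in 2D.
weights = np.array([0.3, 0.7])
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_pdf(np.array([1.0, 0.5]), weights, means, covs))
```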

3. Introduction: Mixture simplification

Mixture models usually contain a lot of components:
  => Estimation of statistical measures is computationally expensive
  => Need to reduce the number of components

Two options:
  - Re-learn a simpler mixture model from the dataset (computationally expensive)
  - Simplify the mixture model f (the most appropriate method)

Let f be a mixture of n components.

Mixture simplification problem:
  - How to compute a mixture g of m (m < n) components such that g is the best approximation of f?
  - What is the optimal value of m?

[Figure: density estimation using a kernel-based Parzen estimator.]

4. Bregman divergence and Bregman centroid: Exponential family

The exponential family is a wide class of distributions:
    f(x; \Theta) = \exp\{ \langle \Theta, t(x) \rangle - F(\Theta) + k(x) \}
where
    \Theta is the natural parameter
    F(\Theta) is the log normalizer
    t(x) is the sufficient statistic
    k(x) is the carrier measure

Gaussian, Laplacian, Poisson, binomial, multinomial, Bernoulli, Rayleigh, Gamma, Beta, and Dirichlet distributions are all exponential families.

The Gaussian distribution is an exponential family:
    \Theta = (\theta, \Theta) = \left( \Sigma^{-1}\mu, \tfrac{1}{2}\Sigma^{-1} \right)
    F(\Theta) = \tfrac{1}{4} \mathrm{tr}(\Theta^{-1} \theta \theta^\top) - \tfrac{1}{2} \log\det\Theta + \tfrac{d}{2} \log\pi
    t(x) = (x, -x x^\top)
    k(x) = 0

Frank Nielsen and Vincent Garcia, "Statistical exponential families: A digest with flash cards", arXiv, http://arxiv.org/abs/0911.4863, November 2009.
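The sketch below (an illustration, not code from the authors) converts a Gaussian's source parameters (mu, Sigma) into the natural parameters given above and evaluates the log normalizer F; the helper names are hypothetical.

```python
import numpy as np

def gaussian_to_natural(mu, sigma):
    """Natural parameters (theta, Theta) of a Gaussian, following the
    parameterization above: theta = Sigma^{-1} mu, Theta = (1/2) Sigma^{-1}."""
    sigma_inv = np.linalg.inv(sigma)
    return sigma_inv @ mu, 0.5 * sigma_inv

def log_normalizer(theta, Theta):
    """Log normalizer F of the Gaussian exponential family:
    F = (1/4) tr(Theta^{-1} theta theta^T) - (1/2) log det Theta + (d/2) log pi."""
    d = theta.shape[0]
    return (0.25 * theta @ np.linalg.solve(Theta, theta)
            - 0.5 * np.log(np.linalg.det(Theta))
            + 0.5 * d * np.log(np.pi))

# Example: a 2D Gaussian.
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
theta, Theta = gaussian_to_natural(mu, sigma)
print(log_normalizer(theta, Theta))
```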

5. Bregman divergence and Bregman centroid: Relative entropy and Bregman divergence

The fundamental measure between statistical distributions is the relative entropy, also called the Kullback-Leibler divergence:
    D_{KL}(f_i \| f_j) = \int f_i(x) \log \frac{f_i(x)}{f_j(x)} \, dx

The Kullback-Leibler divergence is an asymmetric distance.

For two distributions belonging to the same exponential family (EF), we have
    D_{KL}(f_i \| f_j) = D_F(\Theta_j \| \Theta_i)
where
    D_F(\Theta_j \| \Theta_i) = F(\Theta_j) - F(\Theta_i) - \langle \Theta_j - \Theta_i, \nabla F(\Theta_i) \rangle

  => We can define algorithms adapted to MEF, while classical algorithms are adapted to MoG.
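A generic Bregman divergence is straightforward to compute once F and its gradient are available. The sketch below is a minimal illustration (names chosen for this example), checked against the generator F(x) = 1/2 ||x||^2, for which D_F reduces to half the squared Euclidean distance.

```python
import numpy as np

def bregman_divergence(F, grad_F, theta_p, theta_q):
    """D_F(theta_p || theta_q) = F(theta_p) - F(theta_q) - <theta_p - theta_q, grad F(theta_q)>."""
    return F(theta_p) - F(theta_q) - (theta_p - theta_q) @ grad_F(theta_q)

# Sanity check with F(x) = 0.5 ||x||^2, whose Bregman divergence is
# half the squared Euclidean distance.
F = lambda x: 0.5 * x @ x
grad_F = lambda x: x
p, q = np.array([1.0, 2.0]), np.array([0.0, -1.0])
print(bregman_divergence(F, grad_F, p, q))   # 5.0
print(0.5 * np.sum((p - q) ** 2))            # 5.0
```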

6. Bregman divergence and Bregman centroid: Bregman centroids

A mixture of exponential families
    f(x) = \sum_{i=1}^{n} \alpha_i f_i(x; \Theta_i)
can be seen as a set of weighted distributions
    S = \{ \{\alpha_1, \Theta_1\}, \{\alpha_2, \Theta_2\}, \cdots, \{\alpha_n, \Theta_n\} \}

Bregman centroids:
    \Theta_R = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i D_F(\Theta_i \| \Theta)
    \Theta_L = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i D_F(\Theta \| \Theta_i)
    \Theta_S = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i SD_F(\Theta, \Theta_i)
where SD_F is the symmetric Bregman divergence
    SD_F(\Theta, \Theta_i) = \frac{ D_F(\Theta_i \| \Theta) + D_F(\Theta \| \Theta_i) }{2}

7. Bregman divergence and Bregman centroid: Bregman centroids

Right-sided centroid:
    \Theta_R = \frac{\sum_i \alpha_i \Theta_i}{\sum_i \alpha_i}

Left-sided centroid:
    \Theta_L = \nabla F^{*}\left( \frac{\sum_i \alpha_i \nabla F(\Theta_i)}{\sum_i \alpha_i} \right)

Computation of the symmetric centroid \Theta_S:
  1. Compute \Theta_R and \Theta_L.
  2. The symmetric centroid belongs to the geodesic linking \Theta_R and \Theta_L:
         \Theta_\lambda = \nabla F^{*}\left( \lambda \nabla F(\Theta_R) + (1 - \lambda) \nabla F(\Theta_L) \right)
  3. We know that
         SD_F(\Theta_S, \Theta_R) = SD_F(\Theta_S, \Theta_L)
  4. A standard binary search on \lambda allows one to quickly find the symmetric centroid for a given precision.
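A possible implementation of this geodesic bisection is sketched below, assuming the caller supplies grad F, its inverse grad F*, and D_F; it uses the equality condition SD_F(Theta_S, Theta_R) = SD_F(Theta_S, Theta_L) stated on the slide and stops at a given precision on lambda. The 1D Poisson-family example at the end is only an illustration.

```python
import numpy as np

def symmetric_bregman(D_F, p, q):
    """SD_F(p, q) = (D_F(q || p) + D_F(p || q)) / 2."""
    return 0.5 * (D_F(q, p) + D_F(p, q))

def symmetric_centroid(theta_R, theta_L, grad_F, grad_F_star, D_F, tol=1e-9):
    """Binary search on lambda along the geodesic
    Theta_lambda = grad F*(lambda grad F(Theta_R) + (1 - lambda) grad F(Theta_L))."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        theta_lam = grad_F_star(lam * grad_F(theta_R) + (1 - lam) * grad_F(theta_L))
        # If Theta_lambda is still closer (in SD_F) to Theta_R, decrease lambda.
        if symmetric_bregman(D_F, theta_lam, theta_R) < symmetric_bregman(D_F, theta_lam, theta_L):
            hi = lam
        else:
            lo = lam
    return theta_lam

# Example with the 1D generator F(theta) = exp(theta) (Poisson family):
F, grad_F, grad_F_star = np.exp, np.exp, np.log
D_F = lambda p, q: F(p) - F(q) - (p - q) * grad_F(q)
print(symmetric_centroid(1.0, 3.0, grad_F, grad_F_star, D_F))
```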

8. Bregman divergence and Bregman centroid: Bregman centroids

[Figure: the initial set contains 4 univariate Gaussians with sigma^2 = 6; the plot of f(x) shows the initial Gaussians together with the right-sided, left-sided, and symmetric centroids.]

9. Bregman hierarchical clustering: Hierarchical clustering

Methods that build a hierarchical clustering of a set of objects (points):
  - Agglomerative method
  - Divisive method

Let S be a set of n points and let {S_1, S_2, ..., S_n} be a partition of S:
    S_1 \cup S_2 \cup \cdots \cup S_n = S
    S_i \cap S_j = \emptyset \quad \text{for } i \neq j

Agglomerative method:
  1. Find the two closest subsets S_i and S_j
  2. Merge the subsets S_i and S_j
  3. Go back to 1. until one single set remains

The hierarchical clustering is stored in a dendrogram (hierarchical data structure).

Classical distances between sets A and B (linkage criteria):

    Criterion           Formula
    Minimum distance    D_min(A, B) = \min \{ d(a, b) \mid a \in A, b \in B \}
    Maximum distance    D_max(A, B) = \max \{ d(a, b) \mid a \in A, b \in B \}
    Average distance    D_av(A, B)  = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a, b)
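The three linkage criteria can be written directly from their definitions; the sketch below (illustrative only, with names chosen here) takes an arbitrary pairwise distance d and is exercised with the Euclidean distance between 2D points.

```python
import numpy as np

def d_min(A, B, d):
    """Minimum (single) linkage: min over all cross pairs."""
    return min(d(a, b) for a in A for b in B)

def d_max(A, B, d):
    """Maximum (complete) linkage: max over all cross pairs."""
    return max(d(a, b) for a in A for b in B)

def d_avg(A, B, d):
    """Average linkage: mean over all cross pairs."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

# Example with the Euclidean distance between 2D points.
euclid = lambda a, b: float(np.linalg.norm(a - b))
A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([4.0, 3.0]), np.array([5.0, 1.0])]
print(d_min(A, B, euclid), d_max(A, B, euclid), d_avg(A, B, euclid))
```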

10. Bregman hierarchical clustering: Bregman hierarchical clustering

Adaptation of hierarchical clustering to mixtures of exponential families:
  - A mixture of exponential families f is seen as a set of weighted distributions
        S = \{ \{\alpha_1, \Theta_1\}, \{\alpha_2, \Theta_2\}, \cdots, \{\alpha_n, \Theta_n\} \}
  - The distance d() between two distributions is the weighted Bregman divergence
        d(\{\alpha_i, \Theta_i\}, \{\alpha_j, \Theta_j\}) = \alpha_i \alpha_j D_F(\Theta_i \| \Theta_j)
  - The right-sided, the left-sided, or the symmetric Bregman divergence can be used
  - The process starts with subsets containing one weighted distribution each
  - The closest distribution subsets are found using the classical linkage criteria
  - The final dendrogram is called a hierarchical mixture model
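Below is a minimal agglomerative sketch of this procedure, assuming a caller-supplied Bregman divergence D_F(Theta_i || Theta_j) and using single linkage by default; it records the merge history, which plays the role of the dendrogram. Function and variable names are illustrative, not the authors' code.

```python
import numpy as np
from itertools import combinations

def bregman_hierarchical_clustering(weights, thetas, D_F, linkage=min):
    """Agglomerative clustering of weighted distributions {alpha_i, Theta_i}.
    The pairwise distance is alpha_i * alpha_j * D_F(Theta_i || Theta_j);
    `linkage` (min, max, ...) aggregates it over cluster pairs."""
    clusters = [[i] for i in range(len(weights))]   # lists of component indices
    merges = []                                     # (cluster_a, cluster_b) per step

    def dist(ci, cj):
        return linkage(weights[i] * weights[j] * D_F(thetas[i], thetas[j])
                       for i in ci for j in cj)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Example with the squared Euclidean divergence (generator F(x) = 0.5 ||x||^2).
D_F = lambda p, q: 0.5 * float(np.sum((p - q) ** 2))
merges = bregman_hierarchical_clustering(
    [0.2, 0.3, 0.5],
    [np.array([0.0]), np.array([0.1]), np.array([5.0])],
    D_F)
```

The merge history is what the simplification step on the next slide replays: the subsets remaining after the first n - m merges define the resolution-m mixture.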

11. Bregman hierarchical clustering: Mixture simplification

From the hierarchical mixture model (denoted h), we can extract a simpler mixture g of m components (resolution m):
    g = \sum_{j=1}^{m} \beta_j g_j

  1. Extract from h the m subsets {S_1, ..., S_m} remaining after iteration n - m.
  2. The distribution g_j is the centroid (right-sided, left-sided, or symmetric centroid) of the subset S_j.
  3. The weight \beta_j is computed as
         \beta_j = \sum_{i \,:\, \{\alpha_i, \Theta_i\} \in S_j} \alpha_i

The hierarchical mixture model contains all the resolutions from 1 (one distribution) to n (the initial mixture model).
The simplification process is fast (computation of m centroids).
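A sketch of this extraction step, assuming the merge-history format produced by the clustering sketch above; the right-sided centroid (weighted average of natural parameters) is used here for g_j, but the left-sided or symmetric centroid could be substituted.

```python
def subsets_at_resolution(merges, n, m):
    """Replay the first n - m merges of the merge history to obtain the m
    subsets of component indices remaining at resolution m."""
    clusters = [[i] for i in range(n)]
    for ca, cb in merges[: n - m]:
        clusters = [c for c in clusters if c != ca and c != cb] + [ca + cb]
    return clusters

def simplify_mixture(weights, thetas, subsets):
    """Resolution-m mixture: beta_j is the sum of the alpha_i in S_j, and g_j is
    taken as the right-sided centroid, i.e. the weighted arithmetic mean of the
    natural parameters in S_j."""
    betas, centroids = [], []
    for S in subsets:
        total = sum(weights[i] for i in S)
        betas.append(total)
        centroids.append(sum(weights[i] * thetas[i] for i in S) / total)
    return betas, centroids
```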

12. Experiments: Mixture simplification

Evolution of the mixture simplification quality D_KL(f, g) as a function of the resolution:
  - Influence of the linkage criterion
  - Influence of the Bregman divergence side
Initial mixture f: 32 3D Gaussians learnt from the image Baboon.

[Figure: D_KL(f, g) versus resolution; left panel comparing the minimum, maximum, and average distance linkage criteria, right panel comparing the right-sided, left-sided, and symmetric divergences.]
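D_KL(f, g) between two Gaussian mixtures has no closed form, so it is typically estimated numerically; the slides do not specify the estimator, and the Monte Carlo sketch below is one standard choice (names and sample size are illustrative).

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng):
    """Draw n samples from a GMM given as (weights, means, covs)."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[i], covs[i]) for i in idx])

def log_gmm_pdf(x, weights, means, covs):
    """Log-density of a GMM at a single point x."""
    logs = []
    for a, mu, sig in zip(weights, means, covs):
        d = len(mu)
        diff = x - mu
        quad = diff @ np.linalg.solve(sig, diff)
        logdet = np.linalg.slogdet(sig)[1]
        logs.append(np.log(a) - 0.5 * (quad + logdet + d * np.log(2 * np.pi)))
    return np.logaddexp.reduce(logs)

def kl_monte_carlo(f, g, n=10000, seed=0):
    """Estimate D_KL(f || g) = E_f[log f(x) - log g(x)] by sampling from f.
    f and g are (weights, means, covs) tuples."""
    rng = np.random.default_rng(seed)
    xs = sample_gmm(*f, n, rng)
    return np.mean([log_gmm_pdf(x, *f) - log_gmm_pdf(x, *g) for x in xs])
```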

13. Experiments: Mixture simplification

Application of mixture simplification to clustering-based image segmentation.

[Figure: segmentations of the input image obtained with m = 1, 2, 4, 8, 16, and 32 components.]
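The slides only show segmentation results; one common way to obtain such a segmentation from a (possibly simplified) GMM over pixel colors is to assign each pixel to its most probable component, as in this hypothetical sketch (all names and parameters are illustrative).

```python
import numpy as np

def segment_image(pixels, weights, means, covs):
    """Assign each pixel (a d-dimensional color vector) to the mixture
    component with the highest posterior score alpha_i f_i(x)."""
    scores = np.empty((pixels.shape[0], len(weights)))
    for i, (a, mu, sig) in enumerate(zip(weights, means, covs)):
        diff = pixels - mu
        quad = np.einsum('nd,dk,nk->n', diff, np.linalg.inv(sig), diff)
        logdet = np.linalg.slogdet(sig)[1]
        scores[:, i] = np.log(a) - 0.5 * (quad + logdet)
    return np.argmax(scores, axis=1)   # one label per pixel

# Example: a random "image" of 100 RGB pixels and a 2-component model.
pixels = np.random.rand(100, 3)
labels = segment_image(pixels,
                       weights=[0.5, 0.5],
                       means=[np.zeros(3), np.ones(3)],
                       covs=[np.eye(3) * 0.1, np.eye(3) * 0.1])
```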

14. Experiments: Optimal mixture model

The optimal mixture model g has to:
  - be as compact as possible
  - reach a minimum quality: D_KL(f, g) < t

The hierarchical mixture model allows one to quickly compute a simpler mixture.
A standard binary search allows one to find the optimal mixture model for a given mixture quality.

[Figure: D_KL(f, g) as a function of m for the images Baboon, Lena, Shanty, and Colormap.]
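A sketch of that search, where simplify(m) and kl_to_f(g) are hypothetical caller-supplied helpers (extraction of the resolution-m mixture from the hierarchical mixture model, and estimation of D_KL(f, g)); the search assumes D_KL(f, g) decreases as m grows, which is what the plots on this slide show.

```python
def optimal_resolution(n, simplify, kl_to_f, t):
    """Smallest resolution m such that D_KL(f, g_m) < t, found by binary
    search over m in [1, n]."""
    lo, hi = 1, n
    best = n
    while lo <= hi:
        m = (lo + hi) // 2
        if kl_to_f(simplify(m)) < t:
            best = m          # m is good enough; try fewer components
            hi = m - 1
        else:
            lo = m + 1        # quality too low; need more components
    return best
```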

15. Experiments: Optimal mixture model

Number of components of the optimal mixture model with D_KL(f, g) < 0.2:
  - Baboon: 11 components
  - Lena: 14 components
  - Shantytown: 16 components
  - Colormap: 23 components

The estimation of D_KL(f, g) accounts for 99% of the computation time.
