SLIDE 1

Hierarchical Gaussian Mixture Model

Vincent Garcia¹, Frank Nielsen¹,², and Richard Nock³

¹ École Polytechnique (Paris, France)   ² Sony Computer Science Laboratories (Tokyo, Japan)   ³ UAG-CEREGMIA (Martinique, France)

January 2010

SLIDE 2


Mixture models

A mixture model $f$ is a powerful framework for estimating a probability density function:
$$f(x) = \sum_{i=1}^{n} \alpha_i f_i(x)$$
where
  • $f_i$ is a statistical distribution
  • $\alpha_i$ is a weight such that $\alpha_i \geq 0$ and $\sum_{i=1}^{n} \alpha_i = 1$

Mixture of Gaussians (MoG), or Gaussian mixture model (GMM):
$$f_i(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\frac{(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)}{2} \right)$$

Mixture of exponential families (MEF):
$$f_i(x; \Theta_i) = \exp\big\{ \langle \Theta_i, t(x) \rangle - F(\Theta_i) + k(x) \big\}$$
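To make the definitions concrete, here is a minimal Java sketch (illustrative only, not jMEF code; the weights, means, and variances are made up) that evaluates a univariate GMM density at a point:

```java
// A minimal sketch: evaluating a univariate Gaussian mixture density
// f(x) = sum_i alpha_i f_i(x; mu_i, sigma_i^2) at a point.
public final class GmmDensity {
    // Gaussian pdf f_i(x; mu, sigma^2), the d = 1 case of the GMM formula.
    static double gaussianPdf(double x, double mu, double var) {
        return Math.exp(-(x - mu) * (x - mu) / (2.0 * var))
                / Math.sqrt(2.0 * Math.PI * var);
    }

    public static void main(String[] args) {
        double[] alpha = {0.5, 0.3, 0.2};  // weights: alpha_i >= 0, sum to 1
        double[] mu    = {0.0, 2.0, 5.0};  // component means
        double[] var   = {1.0, 0.5, 2.0};  // component variances

        double x = 1.0, f = 0.0;
        for (int i = 0; i < alpha.length; i++)
            f += alpha[i] * gaussianPdf(x, mu[i], var[i]);  // weighted sum
        System.out.println("f(" + x + ") = " + f);
    }
}
```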

SLIDE 3


Mixture simplification

Mixture models usually contain many components:
⇒ Estimating statistical measures is computationally expensive
⇒ Need to reduce the number of components

Two options: re-learn a simpler mixture model from the dataset (computationally expensive), or simplify the mixture model $f$ directly (the most appropriate method).

Mixture simplification problem: let $f$ be a mixture of $n$ components. How do we compute a mixture $g$ of $m$ components ($m < n$) such that $g$ is the best approximation of $f$? What is the optimal value of $m$?

[Figure: density estimation using a kernel-based Parzen estimator]

SLIDE 4


Exponential family

The exponential families form a wide class of distributions:
$$f(x; \Theta) = \exp\big\{ \langle \Theta, t(x) \rangle - F(\Theta) + k(x) \big\}$$
where
  • $\Theta$ – natural parameters
  • $F(\Theta)$ – log normalizer
  • $t(x)$ – sufficient statistic
  • $k(x)$ – carrier measure

The Gaussian, Laplacian, Poisson, binomial, multinomial, Bernoulli, Rayleigh, Gamma, Beta, and Dirichlet distributions are all exponential families. The Gaussian distribution is an exponential family with
$$\Theta = (\theta, \Theta) = \left( \Sigma^{-1}\mu,\ \frac{1}{2}\Sigma^{-1} \right)$$
$$F(\Theta) = \frac{1}{4}\,\mathrm{tr}\!\left(\Theta^{-1}\theta\theta^\top\right) - \frac{1}{2}\log\det\Theta + \frac{d}{2}\log\pi$$
$$t(x) = (x, -xx^\top), \qquad k(x) = 0$$

Frank Nielsen and Vincent Garcia, "Statistical exponential families: A digest with flash cards," arXiv, http://arxiv.org/abs/0911.4863, November 2009.
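A hedged sketch of the univariate ($d = 1$) case, specialized by hand from the formulas above (illustrative, not jMEF code): the natural parameters reduce to $\theta = (\mu/\sigma^2,\ 1/(2\sigma^2))$ with $t(x) = (x, -x^2)$, and the canonical form reproduces the usual Gaussian pdf.

```java
// Univariate Gaussian as an exponential family:
// theta = (mu/sigma^2, 1/(2 sigma^2)), t(x) = (x, -x^2), k(x) = 0,
// F(theta) = theta1^2/(4 theta2) - (1/2) log theta2 + (1/2) log pi.
public final class GaussianAsEF {
    public static void main(String[] args) {
        double mu = 1.0, var = 2.0, x = 0.5;

        // Natural parameters of N(mu, sigma^2).
        double t1 = mu / var;           // theta1 = mu / sigma^2
        double t2 = 1.0 / (2.0 * var);  // theta2 = 1 / (2 sigma^2)

        // Log normalizer F(theta).
        double F = t1 * t1 / (4.0 * t2) - 0.5 * Math.log(t2)
                 + 0.5 * Math.log(Math.PI);

        // Canonical form exp{<theta, t(x)> - F(theta) + k(x)}, k(x) = 0.
        double canonical = Math.exp(t1 * x + t2 * (-x * x) - F);

        // Reference: the usual Gaussian pdf; both values agree (~0.2650).
        double pdf = Math.exp(-(x - mu) * (x - mu) / (2.0 * var))
                   / Math.sqrt(2.0 * Math.PI * var);
        System.out.println(canonical + " == " + pdf);
    }
}
```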

SLIDE 5


Relative entropy and Bregman divergence

The fundamental measure between statistical distributions is the relative entropy, also called the Kullback-Leibler divergence:
$$D_{\mathrm{KL}}(f_i \,\|\, f_j) = \int f_i(x) \log \frac{f_i(x)}{f_j(x)}\, dx$$
The Kullback-Leibler divergence is an asymmetric distance. For two distributions belonging to the same exponential family, we have
$$D_{\mathrm{KL}}(f_i \,\|\, f_j) = D_F(\Theta_j \,\|\, \Theta_i)$$
where $D_F$ is the Bregman divergence
$$D_F(\Theta_j \,\|\, \Theta_i) = F(\Theta_j) - F(\Theta_i) - \langle \Theta_j - \Theta_i, \nabla F(\Theta_i) \rangle$$
⇒ We can define algorithms adapted to MEFs, whereas classical algorithms are adapted to MoGs.
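The identity $D_{\mathrm{KL}}(f_i \| f_j) = D_F(\Theta_j \| \Theta_i)$ can be checked numerically in the univariate Gaussian case. A sketch (illustrative, not jMEF code; the closed-form KL expression between univariate Gaussians is a standard fact assumed here, not taken from the slide):

```java
// Check D_KL(f_i || f_j) = D_F(theta_j || theta_i) for univariate Gaussians,
// with D_F(p || q) = F(p) - F(q) - <p - q, grad F(q)>
// and grad F(theta) = (mu, -(mu^2 + sigma^2)).
public final class KlAsBregman {
    static double F(double t1, double t2) {     // log normalizer, d = 1
        return t1 * t1 / (4.0 * t2) - 0.5 * Math.log(t2)
             + 0.5 * Math.log(Math.PI);
    }

    public static void main(String[] args) {
        double mui = 0.0, vari = 1.0;   // f_i = N(0, 1)
        double muj = 1.0, varj = 2.0;   // f_j = N(1, 2)

        // Natural parameters and grad F at theta_i.
        double ti1 = mui / vari, ti2 = 1.0 / (2.0 * vari);
        double tj1 = muj / varj, tj2 = 1.0 / (2.0 * varj);
        double g1 = mui, g2 = -(mui * mui + vari);

        double bregman = F(tj1, tj2) - F(ti1, ti2)
                       - (tj1 - ti1) * g1 - (tj2 - ti2) * g2;

        // Closed-form KL between univariate Gaussians, for comparison.
        double kl = Math.log(Math.sqrt(varj / vari))
                  + (vari + (mui - muj) * (mui - muj)) / (2.0 * varj) - 0.5;

        System.out.println(bregman + " == " + kl);  // both ~0.3466
    }
}
```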

SLIDE 6


Bregman centroids

A mixture of exponential families
$$f(x) = \sum_{i=1}^{n} \alpha_i f_i(x; \Theta_i)$$
can be seen as a set of weighted distributions
$$S = \big\{ \{\alpha_1, \Theta_1\}, \{\alpha_2, \Theta_2\}, \cdots, \{\alpha_n, \Theta_n\} \big\}$$

Bregman centroids:
$$\Theta_R = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i D_F(\Theta_i \,\|\, \Theta)$$
$$\Theta_L = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i D_F(\Theta \,\|\, \Theta_i)$$
$$\Theta_S = \arg\min_{\Theta} \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i \, SD_F(\Theta, \Theta_i)$$
where $SD_F$ is the symmetric Bregman divergence
$$SD_F(\Theta, \Theta_i) = \frac{D_F(\Theta_i \,\|\, \Theta) + D_F(\Theta \,\|\, \Theta_i)}{2}$$

SLIDE 7


Bregman centroids

Right-sided centroid:
$$\Theta_R = \frac{\sum_i \alpha_i \Theta_i}{\sum_i \alpha_i}$$
Left-sided centroid:
$$\Theta_L = \nabla F^* \left( \frac{\sum_i \alpha_i \nabla F(\Theta_i)}{\sum_i \alpha_i} \right)$$
Computation of the symmetric centroid $\Theta_S$:
1. Compute $\Theta_R$ and $\Theta_L$.
2. The symmetric centroid belongs to the geodesic linking $\Theta_R$ and $\Theta_L$:
$$\Theta_\lambda = \nabla F^* \big( \lambda \nabla F(\Theta_R) + (1 - \lambda) \nabla F(\Theta_L) \big)$$
3. We know that $SD_F(\Theta_S, \Theta_R) = SD_F(\Theta_S, \Theta_L)$.
4. A standard binary search on $\lambda$ allows one to quickly find the symmetric centroid for a given precision.
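A univariate Gaussian sketch of all three centroids (illustrative, not the jMEF implementation; the closed forms of $\nabla F$ and $\nabla F^*$ used below are derived by hand from the $d = 1$ log normalizer of slide 4):

```java
import java.util.Arrays;

// Right-sided, left-sided, and symmetric Bregman centroids for a set of
// weighted univariate Gaussians. Natural parameters are double[2] pairs;
// grad F(theta) = (mu, -(mu^2 + sigma^2)) maps to expectation parameters,
// and grad F* inverts it.
public final class BregmanCentroids {
    static double F(double[] t) {                 // log normalizer, d = 1
        return t[0] * t[0] / (4 * t[1]) - 0.5 * Math.log(t[1])
             + 0.5 * Math.log(Math.PI);
    }
    static double[] gradF(double[] t) {           // natural -> expectation
        double mu = t[0] / (2 * t[1]), var = 1 / (2 * t[1]);
        return new double[]{mu, -(mu * mu + var)};
    }
    static double[] gradFstar(double[] e) {       // expectation -> natural
        double mu = e[0], var = -e[1] - e[0] * e[0];
        return new double[]{mu / var, 1 / (2 * var)};
    }
    static double D(double[] p, double[] q) {     // D_F(p || q)
        double[] g = gradF(q);
        return F(p) - F(q) - (p[0] - q[0]) * g[0] - (p[1] - q[1]) * g[1];
    }
    static double SD(double[] p, double[] q) {    // symmetric divergence
        return (D(p, q) + D(q, p)) / 2;
    }

    public static void main(String[] args) {
        double[] a = {0.4, 0.6};                  // weights (sum to 1)
        double[][] t = {{0.0, 0.5}, {1.0, 0.25}}; // N(0, 1) and N(2, 2)

        // Right-sided centroid: weighted average in natural coordinates.
        double[] tR = {a[0] * t[0][0] + a[1] * t[1][0],
                       a[0] * t[0][1] + a[1] * t[1][1]};

        // Left-sided centroid: weighted average in expectation coordinates.
        double[] eL = new double[2];
        for (int i = 0; i < t.length; i++) {
            double[] e = gradF(t[i]);
            eL[0] += a[i] * e[0];
            eL[1] += a[i] * e[1];
        }
        double[] tL = gradFstar(eL);

        // Symmetric centroid: binary search on lambda along the geodesic
        // until SD_F(t_lambda, tR) = SD_F(t_lambda, tL).
        double[] eR = gradF(tR);
        double lo = 0, hi = 1;
        double[] tS = tL;
        for (int k = 0; k < 50; k++) {
            double lam = (lo + hi) / 2;
            tS = gradFstar(new double[]{lam * eR[0] + (1 - lam) * eL[0],
                                        lam * eR[1] + (1 - lam) * eL[1]});
            if (SD(tS, tR) > SD(tS, tL)) lo = lam; else hi = lam;
        }
        System.out.println("right:     " + Arrays.toString(tR));
        System.out.println("left:      " + Arrays.toString(tL));
        System.out.println("symmetric: " + Arrays.toString(tS));
    }
}
```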

SLIDE 8


Bregman centroids

[Figure: initial set of 4 univariate Gaussians (σ² = 6), with their right-sided, left-sided, and symmetric centroids]

SLIDE 9


Hierarchical clustering

Methods for building a hierarchical clustering of a set of objects (points):
  • Agglomerative method
  • Divisive method

Let $S$ be a set of $n$ points and let $\{S_1, S_2, \cdots, S_n\}$ be a partition of $S$:
$$S_1 \cup S_2 \cup \cdots \cup S_n = S, \qquad S_i \cap S_j = \emptyset \quad \text{for } i \neq j$$

Agglomerative method:
1. Find the two closest subsets $S_i$ and $S_j$.
2. Merge the subsets $S_i$ and $S_j$.
3. Go back to step 1 until a single set remains.

The hierarchical clustering is stored in a dendrogram (a hierarchical data structure).

Classical distances between sets $A$ and $B$ (linkage criteria):

| Criterion | Formula |
| --- | --- |
| Minimum distance | $D_{\min}(A, B) = \min\{d(a, b) \mid a \in A,\, b \in B\}$ |
| Maximum distance | $D_{\max}(A, B) = \max\{d(a, b) \mid a \in A,\, b \in B\}$ |
| Average distance | $D_{\mathrm{av}}(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$ |
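A generic sketch of the agglomerative loop (illustrative; plain 1-D points with the minimum-distance linkage rather than weighted distributions, and merges are printed instead of being stored in a dendrogram). In the MEF setting of the next slide, $d(a, b)$ would be the weighted Bregman divergence:

```java
import java.util.ArrayList;
import java.util.List;

// Agglomerative clustering on 1-D points with minimum-distance linkage:
// repeatedly find the two closest subsets, merge them, and repeat until
// a single set remains.
public final class Agglomerative {
    // Minimum-distance linkage D_min(A, B) = min{ |a - b| : a in A, b in B }.
    static double dMin(List<Double> a, List<Double> b) {
        double best = Double.POSITIVE_INFINITY;
        for (double x : a) for (double y : b)
            best = Math.min(best, Math.abs(x - y));
        return best;
    }

    public static void main(String[] args) {
        List<List<Double>> sets = new ArrayList<>();
        for (double x : new double[]{0.0, 0.2, 3.0, 3.1, 9.0})
            sets.add(new ArrayList<>(List.of(x)));   // singleton subsets

        while (sets.size() > 1) {
            int bi = 0, bj = 1;                      // 1. find closest pair
            for (int i = 0; i < sets.size(); i++)
                for (int j = i + 1; j < sets.size(); j++)
                    if (dMin(sets.get(i), sets.get(j))
                            < dMin(sets.get(bi), sets.get(bj))) { bi = i; bj = j; }
            System.out.println("merge " + sets.get(bi) + " + " + sets.get(bj));
            sets.get(bi).addAll(sets.remove(bj));    // 2. merge; 3. repeat
        }
    }
}
```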
SLIDE 10


Bregman hierarchical clustering

Adaptation of the hierarchical clustering to mixtures of exponential families. A mixture of exponential families $f$ is seen as a set of weighted distributions
$$S = \big\{ \{\alpha_1, \Theta_1\}, \{\alpha_2, \Theta_2\}, \cdots, \{\alpha_n, \Theta_n\} \big\}$$
The distance between two distributions is the weighted Bregman divergence
$$d(\{\alpha_i, \Theta_i\}, \{\alpha_j, \Theta_j\}) = \alpha_i \alpha_j D_F(\Theta_i \,\|\, \Theta_j)$$
  • The right-sided, left-sided, or symmetric Bregman divergence can be used
  • The process starts with subsets containing a single weighted distribution
  • The closest distribution subsets are found using the classical linkage criteria
  • The final dendrogram is called a hierarchical mixture model

SLIDE 11


Bregman hierarchical clustering: mixture simplification

From the hierarchical mixture model (denoted $h$), we can extract a simpler mixture $g$ of $m$ components (resolution $m$):
$$g = \sum_{j=1}^{m} \beta_j g_j$$
1. Extract from $h$ the $m$ subsets $\{S_1, \cdots, S_m\}$ remaining after iteration $n - m$.
2. The distribution $g_j$ is the centroid (right-sided, left-sided, or symmetric) of the subset $S_j$.
3. The weight $\beta_j$ is computed as
$$\beta_j = \sum_i \alpha_i \quad \text{s.t. } \{\alpha_i, \Theta_i\} \in S_j$$

The hierarchical mixture model contains all resolutions from 1 (a single distribution) to $n$ (the initial mixture model). The simplification process is fast (computation of $m$ centroids).
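A sketch of the extraction step (illustrative, not jMEF code; the subsets and weights below are made-up stand-ins for the output of the hierarchical clustering), computing $\beta_j$ and, for the right-sided case, the centroid of each subset in natural coordinates:

```java
// Extract a resolution-m mixture from m subsets of weighted distributions
// {alpha_i, theta_i}: beta_j sums the weights in S_j, and the right-sided
// centroid is the weighted average of natural parameters in S_j.
public final class ExtractResolution {
    public static void main(String[] args) {
        // Two subsets of weighted univariate Gaussians (made-up values).
        double[][] alphas = {{0.1, 0.3}, {0.2, 0.4}};
        double[][][] thetas = {
            {{0.0, 0.5}, {1.0, 0.25}},   // S_1
            {{2.0, 0.5}, {3.0, 0.5}}     // S_2
        };
        for (int j = 0; j < alphas.length; j++) {
            double beta = 0;
            double[] c = new double[2];
            for (int i = 0; i < alphas[j].length; i++) {
                beta += alphas[j][i];                  // beta_j = sum alpha_i
                c[0] += alphas[j][i] * thetas[j][i][0];
                c[1] += alphas[j][i] * thetas[j][i][1];
            }
            c[0] /= beta;                              // right-sided centroid
            c[1] /= beta;
            System.out.printf("g_%d: beta=%.2f theta=(%.3f, %.3f)%n",
                    j + 1, beta, c[0], c[1]);
        }
    }
}
```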

SLIDE 12


Mixture simplification

Evolution of the mixture simplification quality $D_{\mathrm{KL}}(f, g)$ as a function of the resolution:
  • Influence of the linkage criterion
  • Influence of the Bregman divergence side
Initial mixture $f$: 32 three-dimensional Gaussians learnt from the image Baboon.

[Plots: $D_{\mathrm{KL}}(f, g)$ vs. resolution for the minimum, maximum, and average linkage criteria (left) and for the right-sided, left-sided, and symmetric divergences (right)]

SLIDE 13


Mixture simplification

Application of mixture simplification to clustering-based image segmentation.
[Figure: segmentation results for m = 1, 2, 4, 8, 16, and 32, next to the original image]

SLIDE 14


Optimal mixture model

The optimal mixture model $g$ has to:
  • be as compact as possible
  • reach a minimum quality, $D_{\mathrm{KL}}(f, g) < t$

The hierarchical mixture model makes it possible to quickly compute a simpler mixture. A standard binary search finds the optimal mixture model for a given mixture quality (a sketch follows the plot below).

[Plot: $D_{\mathrm{KL}}(f, g)$ vs. $m$ for the images Baboon, Lena, Shanty, and Colormap]
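A sketch of that binary search on the resolution $m$ (illustrative; `kl(m)` is a made-up decreasing proxy for an evaluation of $D_{\mathrm{KL}}(f, g_m)$, and monotonicity in $m$ is assumed):

```java
// Binary search for the smallest resolution m with D_KL(f, g_m) < t,
// assuming the quality improves as m grows.
public final class OptimalResolution {
    static double kl(int m) { return 4.0 / m; }  // placeholder for D_KL(f, g_m)

    public static void main(String[] args) {
        double t = 0.2;               // quality threshold
        int lo = 1, hi = 32;          // resolutions 1 .. n
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (kl(mid) < t) hi = mid;   // mid is good enough: go smaller
            else lo = mid + 1;           // mid too coarse: go larger
        }
        System.out.println("optimal m = " + lo);  // 21 for this proxy
    }
}
```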

SLIDE 15


Optimal mixture model

Optimal mixture models obtained with $D_{\mathrm{KL}}(f, g) < 0.2$:
  • Baboon: 11 components
  • Lena: 14 components
  • Shantytown: 16 components
  • Colormap: 23 components

Estimating $D_{\mathrm{KL}}(f, g)$ accounts for 99% of the computation time.

SLIDE 16

Conclusion

A new method to build a hierarchical mixture model, adapted to any mixture of exponential families. The proposed method makes it possible to:
  • quickly simplify the initial mixture model
  • learn the optimal number of components in the simplified mixture

ICASSP 2010 (accepted); Signal Processing (Elsevier, submitted).

jMEF: a Java™ library to create and manage mixtures of exponential families:
  • Estimation of an MEF using Bregman soft clustering
  • Simplification of an MEF using Bregman hard clustering
  • Hierarchical representation of an MEF using Bregman hierarchical clustering
  • Learning of the optimal MEF using Bregman hierarchical clustering

Cross-platform and open-source: http://www.lix.polytechnique.fr/~nielsen/MEF/
JMLR Machine Learning Open Source Software (submitted).
All results presented were obtained using jMEF.
