k-Maximum Likelihood Estimator for mixtures of generalized Gaussians


SLIDE 1

k-Maximum Likelihood Estimator for mixtures of generalized Gaussians

ICPR 2012, Tokyo, Japan
Olivier Schwander, Aurélien Schutz, Yannick Berthoumieu, Frank Nielsen

Laboratoire d’informatique, École Polytechnique, France; Laboratoire IMS, Université de Bordeaux, France; Sony Computer Science Laboratories Inc., Tokyo, Japan

November 14, 2012 (updated version)

SLIDE 2

Outline

Motivation and background
  ◮ Target applications
  ◮ Generalized Gaussian
  ◮ Exponential families

k-Maximum Likelihood estimator
  ◮ Complete log-likelihood
  ◮ Algorithm
  ◮ Key points

Mixtures of generalized Gaussian distribution
  ◮ Direct applications of k-MLE
  ◮ Rewriting complete log-likelihood
  ◮ Experiments

SLIDE 3

Textures

Description
◮ Wavelet transform

Tasks
◮ Classification
◮ Retrieval

[Figure: texture samples from the Brodatz dataset]

SLIDE 4

Popular models

Modeling wavelet coefficient distribution

◮ generalized Gaussian distribution (Do 2002, Mallat 1996)
◮ mixture of generalized Gaussian distributions (Allili 2012)

SLIDE 5

Generalized Gaussian

Definition

f(x; µ, α, β) = β / (2αΓ(1/β)) · exp(−(|x − µ| / α)^β)

◮ µ: mean (real number)
◮ α: scale (positive real number)
◮ β: shape (positive real number)

Multivariate version: a product of one-dimensional laws.
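A direct NumPy/SciPy transcription of this density (a minimal sketch; the function names are ours):

```python
import numpy as np
from scipy.special import gamma

def gg_pdf(x, mu, alpha, beta):
    """Univariate generalized Gaussian density f(x; mu, alpha, beta)."""
    norm = beta / (2.0 * alpha * gamma(1.0 / beta))
    return norm * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def gg_pdf_multivariate(x, mu, alpha, beta):
    """Multivariate version: a product of one-dimensional laws,
    one (mu, alpha, beta) triple per coordinate."""
    return np.prod([gg_pdf(xi, m, a, b)
                    for xi, m, a, b in zip(x, mu, alpha, beta)])
```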

SLIDE 6

Properties and examples

Contains

◮ Gaussian: β = 2
◮ Laplace: β = 1
◮ Uniform: β → ∞

Maximum likelihood estimator

◮ Iterative procedure (Newton-Raphson); a code sketch follows below

Exponential family

◮ For a fixed β

[Figure: generalized Gaussian densities for β = 0.5, 1.0, 2.0, 10.0]
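The slide only names the Newton-Raphson procedure; as an illustration, the sketch below instead solves the classical ML estimating equation for the shape β from Do (2002), cited on slide 4, by root bracketing (the exact equation, the bracket, and the fixed µ are our assumptions, not from this deck):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def gg_mle_fixed_mu(x, mu=0.0, lo=0.05, hi=20.0):
    """ML estimate of (alpha, beta) for a generalized Gaussian with known mu,
    via the shape estimating equation of Do (2002)."""
    y = np.abs(np.asarray(x, dtype=float) - mu)
    n = len(y)
    y = y[y > 0]  # zero deviations contribute nothing to the sums below

    def score(beta):  # derivative of the profile log-likelihood in beta
        s = np.sum(y ** beta)
        return (1.0 + digamma(1.0 / beta) / beta
                - np.sum(y ** beta * np.log(y)) / s
                + np.log(beta / n * s) / beta)

    beta = brentq(score, lo, hi)  # bracket may need widening for extreme data
    alpha = (beta / n * np.sum(y ** beta)) ** (1.0 / beta)
    return alpha, beta
```

As a sanity check, Gaussian input should give β ≈ 2 and α ≈ √2 σ, since β = 2 turns the density into exp(−(x − µ)²/α²) up to normalization.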

SLIDE 7

Exponential families

Definition

p(x; λ) = pF(x; θ) = exp(⟨t(x) | θ⟩ − F(θ) + k(x))

◮ λ: source parameter
◮ t(x): sufficient statistic
◮ θ: natural parameter
◮ F(θ): log-normalizer
◮ k(x): carrier measure

F is a strictly convex and differentiable function; ⟨· | ·⟩ is a scalar product.

Generalized Gaussian

Fixed µ and β

◮ t(x) = −|x − µ|^β
◮ θ = α^(−β)
◮ F(θ) = −(1/β) log θ + log(2Γ(1/β) / β)
◮ k(x) = 0
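A quick numerical check (a sketch, using the log-normalizer reconstructed above) that this decomposition reproduces the density of slide 5:

```python
import numpy as np
from scipy.special import gamma

mu, alpha, beta = 0.0, 1.5, 0.8
theta = alpha ** (-beta)                        # natural parameter

def F(theta):                                   # log-normalizer, fixed mu and beta
    return -np.log(theta) / beta + np.log(2.0 * gamma(1.0 / beta) / beta)

x = 0.7
t = -np.abs(x - mu) ** beta                     # sufficient statistic
p_ef = np.exp(t * theta - F(theta))             # exponential-family form, k(x) = 0
p_direct = (beta / (2.0 * alpha * gamma(1.0 / beta))
            * np.exp(-(np.abs(x - mu) / alpha) ** beta))
assert np.isclose(p_ef, p_direct)
```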

SLIDE 8

A large class of distributions

Gaussian or normal (generic, isotropic Gaussian, diagonal Gaussian, rectified Gaussian or Wald distributions, log-normal), Poisson, Bernoulli, binomial, multinomial (trinomial, Hardy-Weinberg distribution), Laplacian, Gamma (including the chi-squared), Beta, exponential, Wishart, Dirichlet, Rayleigh, probability simplex, negative binomial, Weibull, Fisher-von Mises, Pareto distributions, skew logistic, hyperbolic secant, etc.

With a large set of tools

◮ Bregman Soft Clustering (EM-like algorithm)
◮ Bregman Hard Clustering (k-means-like algorithm)
◮ Kullback-Leibler divergence (through Bregman divergence)

Strong links with the Bregman divergences (Banerjee 2005)

SLIDE 9

Bregman divergence

Definition and properties

◮ BF(p, q) = F(p) − F(q) − ⟨p − q | ∇F(q)⟩
◮ F is a strictly convex and differentiable function
◮ Centroids known in closed form

Legendre duality

◮ F⋆(η) = sup_θ {⟨θ | η⟩ − F(θ)}
◮ η = ∇F(θ), θ = ∇F⋆(η)

Bijection with exponential families

log pF(x | θ) = −BF⋆(t(x) : η) + F⋆(t(x)) + k(x)
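A minimal sketch of the definition above; the two sanity checks (squared Euclidean distance from F(x) = ⟨x | x⟩, Kullback-Leibler from the negative entropy) are standard instances, not taken from the deck:

```python
import numpy as np

def bregman(F, gradF, p, q):
    """B_F(p, q) = F(p) - F(q) - <p - q | grad F(q)>."""
    return F(p) - F(q) - np.dot(p - q, gradF(q))

# F(x) = <x|x> gives the squared Euclidean distance:
F, gradF = lambda x: np.dot(x, x), lambda x: 2.0 * x
p, q = np.array([1.0, 2.0]), np.array([0.0, 1.0])
assert np.isclose(bregman(F, gradF, p, q), np.sum((p - q) ** 2))

# F(x) = sum x log x (negative entropy) gives the Kullback-Leibler
# divergence between points of the probability simplex:
F, gradF = lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0
p, q = np.array([0.3, 0.7]), np.array([0.5, 0.5])
assert np.isclose(bregman(F, gradF, p, q), np.sum(p * np.log(p / q)))
```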

SLIDE 10

Usual setup: expectation-maximization

Joint probability with missing component labels

◮ Observations from a finite mixture:

  p(x1, z1, …, xn, zn) = ∏_i p(zi | ω) p(xi | zi, θ)

◮ Marginalization:

  p(x1, …, xn | ω, θ) = ∏_i ∑_j p(zi = j | ω) p(xi | zi = j, θ)

EM maximizes the average log-likelihood

  l̄ = (1/n) log p(x1, …, xn | ω, θ) = (1/n) ∑_i log ∑_j p(zi = j | ω) p(xi | zi = j, θ)
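For concreteness, a small sketch of the objective l̄ (the Gaussian components and all numbers are illustrative assumptions, not part of the deck):

```python
import numpy as np
from scipy.stats import norm

def average_log_likelihood(x, weights, params):
    """l_bar = (1/n) sum_i log sum_j w_j p(x_i | theta_j)."""
    # dens[i, j] = p(x_i | theta_j), here Gaussian purely for illustration
    dens = np.column_stack([norm.pdf(x, loc=m, scale=s) for m, s in params])
    return np.mean(np.log(dens @ np.asarray(weights)))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
print(average_log_likelihood(x, [0.5, 0.5], [(-2.0, 1.0), (3.0, 0.5)]))
```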

SLIDE 11

Complete log-likelihood

Complete average log-likelihood

l̄′ = (1/n) log p(x1, z1, …, xn, zn)
   = (1/n) ∑_i log ∏_j (ωj p(xi; θj))^{δj(zi)}
   = (1/n) ∑_i ∑_j δj(zi) (log p(xi; θj) + log ωj)

But p is an exponential family

  log p(xi; θj) = log pF(xi; θj) = −BF⋆(t(xi) : ηj) + F⋆(t(xi)) + k(xi)

where the last two terms, F⋆(t(xi)) + k(xi), do not depend on θ.

SLIDE 12

With fixed weights

Equivalent problem

◮ Minimizing

  −l̄′ = (1/n) ∑_i ∑_j δj(zi) (BF⋆(t(xi) : ηj) − log ωj) = (1/n) ∑_i min_j (BF⋆(t(xi) : ηj) − log ωj)

(the optimal hard labels assign each point to the component minimizing its cost, hence the min over j)

This is a Bregman k-means with BF⋆(· : ηj) − log ωj as the divergence.

SLIDE 13

k-Maximum Likelihood estimator

Nielsen 2012

1. Initialization (random or k-MLE++)
2. Assignment: zi = argmin_j (BF⋆(t(xi) : ηj) − log ωj) (gives a partition into clusters Cj)
3. Update of the ηj parameters: ηj = (1/|Cj|) ∑_{x ∈ Cj} t(x) (the Bregman centroid; see the sketch below)
4. Goto step 2 until local convergence
5. Update of the weights: ωj = |Cj| / n
6. Goto step 2 until local convergence
SLIDE 14

Key points

k-MLE

◮ optimizes the complete log-likelihood
◮ is faster than EM
◮ converges finitely to a local maximum

Limitations

◮ All the components must belong to the same family
◮ F⋆ may be difficult to compute (it may lack a closed form)

What if each component belongs to a different exponential family?

SLIDE 15

Direct applications of k-MLE

◮ A hard-assignment counterpart of EM (Bregman Soft Clustering)

A mixture model

◮ with all components of the mixture in the same exponential family
◮ generalized Gaussians sharing the same µ: same mean
◮ generalized Gaussians sharing the same β: same shape
◮ one degree of freedom left: α (scale)

May be useful

◮ See mixtures of Laplace distributions (β = 1)

Not enough for texture description

SLIDE 16

Complete log-likelihood revisited

Complete average log-likelihood

l̄′ = (1/n) log p(x1, z1, …, xn, zn) = (1/n) ∑_i ∑_j δj(zi) (log p(xi; θj) + log ωj)

Each component is an exponential family

  l̄′ = (1/n) ∑_{i=1}^{n} ∑_{j=1}^{k} δj(zi) [ −BFj⋆(t(xi) : ηj) + Fj⋆(t(xi)) + kj(xi) + log ωj ]

where the bracketed term is denoted −Uj(xi, ηj).
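Spelling out Uj with the bijection of slide 9 (pure rearrangement, nothing new): since log pFj(xi | θj) = −BFj⋆(t(xi) : ηj) + Fj⋆(t(xi)) + kj(xi),

  Uj(xi, ηj) = BFj⋆(t(xi) : ηj) − Fj⋆(t(xi)) − kj(xi) − log ωj = −log(ωj pFj(xi | θj))

so minimizing Uj over j is exactly a maximum-likelihood assignment, which is how the next slides use it.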

SLIDE 17

Optimizing the log-likelihood

Equivalent problem

◮ Minimizing

  −l̄′ = (1/n) ∑_{i=1}^{n} ∑_{j=1}^{k} δj(zi) Uj(xi, ηj)

Uj

◮ Neither a distance nor a divergence
◮ Can even be negative

k-means still works well (assignment step done by maximum likelihood)

SLIDE 18

Full algorithm: k-MLE-GG

1. Initialization
2. Assignment: zi = argmax_j log(ωj pFj(xi | θj)) (see the sketch below)
3. Update of the ηj parameters
4. Goto step 2 until local convergence
5. Choose the exponential family (µj and βj with MLE)
6. Update of the weights ωj
7. Goto step 2 until local convergence
SLIDE 19

Comparison with Gaussian EM

On simulated data

[Figure: two panels, "Time" and "Log-likelihood ratio", comparing EM and k-MLE]

◮ A mixture of generalized Gaussians is faster to learn than a mixture of simple Gaussians!
◮ Performs similarly (log-likelihood)

SLIDE 20

Comparison with generalized Gaussian EM

Allili 2010

On a texture of the Brodatz dataset

Performs similarly on a classification task

SLIDE 21

Conclusion

Contributions

◮ Extension of a powerful algorithm
◮ More general than k-MLE or EM
◮ Still faster than a classical EM
◮ Mixtures with components not belonging to the same exponential family

Perspectives

◮ Exponential law / Rayleigh → Weibull
◮ Any parametrized exponential family

SLIDE 22

Bibliography

◮ F. Nielsen. k-MLE: A fast algorithm for learning statistical mixture models. http://arxiv.org/abs/1203.5181
◮ M.S. Allili. Wavelet Modelling Using Finite Mixtures of Generalized Gaussian Distributions: Application to Texture Discrimination and Retrieval. IEEE Transactions on Image Processing, 2012.
