Bhattacharyya clustering with applications to mixture simplifications

SLIDE 1

Mean Exponential Family Application

Bhattacharyya clustering with applications to mixture simplifications

ICPR 2010, Istanbul, Turkey

Frank Nielsen (1,2), Sylvain Boltz (1), Olivier Schwander (1,3)

(1) École Polytechnique, France
(2) Sony Computer Science Laboratories, Japan
(3) ENS Cachan, France

August 24, 2010

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

SLIDE 2

Mean
◮ Definition
◮ Burbea-Rao divergences
◮ Burbea-Rao centroid

Exponential Family
◮ Definition
◮ Bhattacharyya distance
◮ Closed-form formula

Application
◮ Statistical mixtures
◮ Mixture simplification

SLIDE 3

Introduction

Bhattacharyya distance

◮ Widely used to compare probability density functions
◮ Good statistical properties, related to Fisher information
◮ Measures the overlap between two distributions

Bhattacharyya coefficient

$B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx \le 1$

Bhattacharyya distance

$B(p, q) = -\log B_c(p, q) \ge 0$
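
As a minimal illustration of the two definitions above, the sketch below evaluates the coefficient and the distance for discrete distributions, where the integral becomes a sum over outcomes. The function names and the example vectors are hypothetical and not taken from the slides.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Discrete analogue of Bc(p, q): sum of sqrt(p_i * q_i) over outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_distance(p, q):
    """B(p, q) = -log Bc(p, q); infinite when the supports do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)

# Two hypothetical distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.5, 0.25, 0.25]
print(bhattacharyya_coefficient(p, q))  # strictly between 0 and 1
print(bhattacharyya_distance(p, q))     # strictly positive
```

The coefficient reaches 1 only when the two distributions coincide, which is why the distance is 0 in that case and grows as the overlap shrinks.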

SLIDE 4

Contributions

Drawbacks

◮ Few closed-form formulas are known
◮ Centroid estimation only for univariate Gaussians, without guarantees

Results

◮ Bhattacharyya distance between exponential families, using Burbea-Rao divergences
◮ Efficient scheme for the centroid
◮ Application to simplification of Gaussian mixtures

SLIDE 5

What is a mean ?

Euclidean geometry

◮ Given a set of n points $\{p_i\}$,
◮ the center of mass (a.k.a. center of gravity) is

$c = \frac{1}{n} \sum_i p_i$

Unique minimizer of average squared Euclidean distance

$c = \arg\min_p \sum_i \|p - p_i\|^2$

Definitions

◮ By axiomatization
◮ By optimization

SLIDE 6

Axiomatization

Axioms for a mean function M(x1, x2)

◮ Reflexivity : $M(x, x) = x$
◮ Symmetry : $M(x_1, x_2) = M(x_2, x_1)$
◮ Continuity : $M(\cdot, \cdot)$ continuous
◮ Strict monotonicity : $M(x_1, x_2) < M(x_1', x_2)$ for $x_1 < x_1'$
◮ Anonymity : $M(M(x_{11}, x_{12}), M(x_{21}, x_{22})) = M(M(x_{11}, x_{21}), M(x_{12}, x_{22}))$

Yields a unique family

$M(x_1, x_2) = f^{-1}\left(\frac{f(x_1) + f(x_2)}{2}\right)$

with f a continuous, strictly monotonic, increasing function

SLIDE 7

Examples and f-representation

Some f-means

◮ Arithmetic mean : $\frac{x_1 + x_2}{2}$ with $f(x) = x$
◮ Geometric mean : $\sqrt{x_1 x_2}$ with $f(x) = \log x$
◮ Harmonic mean : $\frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$ with $f(x) = \frac{1}{x}$

Arithmetic mean on the f-representation

◮ $y = f(x)$
◮ $f(\bar{x}) = \frac{1}{n} \sum_i f(x_i)$
◮ $\bar{y} = \frac{1}{n} \sum_i y_i$
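
The f-representation above can be sketched in a few lines: map the values through f, take the ordinary arithmetic mean, and map back through the inverse of f. The helper name `f_mean` and the sample values are illustrative assumptions.

```python
import math

def f_mean(values, f, f_inv):
    """Quasi-arithmetic mean: f_inv applied to the arithmetic mean of f(x_i)."""
    return f_inv(sum(f(x) for x in values) / len(values))

def arithmetic(values):
    return f_mean(values, lambda x: x, lambda y: y)

def geometric(values):
    return f_mean(values, math.log, math.exp)

def harmonic(values):
    return f_mean(values, lambda x: 1.0 / x, lambda y: 1.0 / y)

values = [1.0, 4.0]  # hypothetical positive inputs
print(arithmetic(values), geometric(values), harmonic(values))
```

All three means share the same code path; only the pair (f, inverse of f) changes.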

SLIDE 8

Optimization

Problem

$\min_x \sum_i \omega_i\, d(x, p_i) = \min_x L(x; \{p_i\}, \{\omega_i\}, d)$

Entropic mean (Ben-Tal et al., 1989)

◮ $d(p, q) = I_f(p, q) = p\, f\left(\frac{q}{p}\right)$ (Csiszár f-divergence)
◮ f is a strictly convex differentiable function with $f(1) = 0$ and $f'(1) = 0$

Some entropic means

◮ Arithmetic mean : $f(x) = -\log x + x - 1$
◮ Geometric mean : $f(x) = x \log x - x + 1$
◮ Harmonic mean : $f(x) = (x - 1)^2$

SLIDE 9

Bregman means

Bregman divergence

◮ $B_F(p, q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$
◮ F is a strictly convex and differentiable function

Convex problem

◮ unique minimizer
◮ $c = (\nabla F)^{-1}\left(\sum_i \omega_i \nabla F(p_i)\right)$

Since $B_F$ is not symmetric, there are two sided centroids

◮ Left-sided one : $\min_x \sum_i \omega_i B_F(x, p_i)$
◮ Right-sided one : $\min_x \sum_i \omega_i B_F(p_i, x)$
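
To make the two sided centroids concrete, here is a small sketch for scalars with the generator F(x) = x log x - x, chosen only for illustration (the slides do not fix a generator here). It assumes the usual definition B_F(p, q) = F(p) - F(q) - (p - q) F'(q). With this F, the gradient is log and its inverse is exp, so the left-sided centroid is a weighted geometric mean, while the right-sided centroid is the center of mass for any F.

```python
import math

def bregman(F, grad_F, p, q):
    """Scalar Bregman divergence B_F(p, q) = F(p) - F(q) - (p - q) * F'(q)."""
    return F(p) - F(q) - (p - q) * grad_F(q)

def left_centroid(points, weights, grad_F, grad_F_inv):
    """argmin_x sum_i w_i B_F(x, p_i) = (grad F)^{-1}(sum_i w_i grad F(p_i))."""
    return grad_F_inv(sum(w * grad_F(p) for p, w in zip(points, weights)))

def right_centroid(points, weights):
    """argmin_x sum_i w_i B_F(p_i, x): the center of mass, whatever F is."""
    return sum(w * p for p, w in zip(points, weights))

# Illustrative generator (an assumption): F(x) = x log x - x on x > 0.
F = lambda x: x * math.log(x) - x
grad_F = math.log
grad_F_inv = math.exp

points, weights = [1.0, 4.0], [0.5, 0.5]
left = left_centroid(points, weights, grad_F, grad_F_inv)   # geometric mean
right = right_centroid(points, weights)                     # arithmetic mean
print(left, right)
```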

SLIDE 10

Burbea-Rao divergence

Based on Jensen's inequality for a convex function F

$BR_F(p, q) = \frac{F(p) + F(q)}{2} - F\left(\frac{p + q}{2}\right) \ge 0$

Special case : Jensen-Shannon divergence

◮ $JS(p, q) = \frac{1}{2}\left(KL\left(p, \frac{p+q}{2}\right) + KL\left(q, \frac{p+q}{2}\right)\right)$
◮ $JS(p, q) = H\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2} \ge 0$
◮ $H(x) = -F(x) = -x \log x$ (Shannon entropy)
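
The special case can be checked numerically. The sketch below uses F(p) = sum_i p_i log p_i on discrete distributions and the convention with a factor 1/2 in front of the two KL terms, under which the Burbea-Rao divergence and the Jensen-Shannon divergence coincide. Function names and test vectors are assumptions made for the example.

```python
import math

def neg_entropy(p):
    """F(p) = sum_i p_i log p_i, with 0 log 0 = 0 (convex; H = -F)."""
    return sum(x * math.log(x) for x in p if x > 0.0)

def burbea_rao(F, p, q):
    """BR_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2.0 - F(mid)

def kl(p, q):
    """Kullback-Leibler divergence; assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * (kl(p, mid) + kl(q, mid))

p = [0.5, 0.5, 0.0]
q = [0.1, 0.3, 0.6]
print(burbea_rao(neg_entropy, p, q), jensen_shannon(p, q))  # equal up to rounding
```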

SLIDE 11

Symmetrizing Bregman divergences

Jeffreys-Bregman divergence

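$S_F(p, q) = \frac{1}{2}\left(B_F(p, q) + B_F(q, p)\right) = \frac{1}{2} \langle p - q, \nabla F(p) - \nabla F(q) \rangle$

Jensen-Bregman divergence

$J_F(p, q) = \frac{1}{2}\left(B_F\left(p, \frac{p+q}{2}\right) + B_F\left(q, \frac{p+q}{2}\right)\right) = \frac{F(p) + F(q)}{2} - F\left(\frac{p+q}{2}\right) = BR_F(p, q)$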
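
The step from the Jensen-Bregman divergence to the Burbea-Rao divergence can be spelled out as follows, assuming the usual definition $B_F(p, q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$ and writing $m$ for the midpoint. The gradient terms cancel because $p + q - 2m = 0$.

```latex
\begin{align*}
B_F(p, m) + B_F(q, m)
  &= F(p) + F(q) - 2F(m) - \langle p + q - 2m, \nabla F(m) \rangle \\
  &= F(p) + F(q) - 2F(m), \qquad m = \tfrac{p+q}{2}, \\
J_F(p, q) &= \tfrac{1}{2}\bigl(B_F(p, m) + B_F(q, m)\bigr)
  = \frac{F(p) + F(q)}{2} - F\!\left(\frac{p+q}{2}\right) = BR_F(p, q).
\end{align*}
```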

SLIDE 12

Burbea-Rao centroid

Optimization problem

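◮ $c = \arg\min_x \sum_i \omega_i BR_F(x, p_i) = \arg\min_x L(x)$
◮ $L(x) \equiv \underbrace{\frac{1}{2} F(x)}_{\text{convex}} \underbrace{- \sum_i \omega_i F\left(\frac{x + p_i}{2}\right)}_{\text{concave}}$

ConCave Convex Procedure (CCCP, NIPS 2001)

◮ iterative scheme
◮ $\nabla L_{\text{convex}}(x^{(t+1)}) = -\nabla L_{\text{concave}}(x^{(t)})$
◮ converges to a local minimum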

SLIDE 13

ConCave Convex Procedure

A decomposition into a convex part plus a concave part is possible for any function with bounded Hessian

SLIDE 14

Iterative algorithm for Burbea-Rao centroids

Initialization

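$x^{(0)}$ : center of mass (Bregman right-sided centroid), or symmetrized KL divergence

Iteration

$\nabla F(x^{(t+1)}) = \sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)$

Centroid

$x^{(t+1)} = (\nabla F)^{-1}\left(\sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)\right)$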
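
A minimal scalar sketch of this iteration is given below. It assumes normalized weights and takes F = exp (the Poisson log-normalizer) purely as an example, so that the gradient is exp and its inverse is log. The function names, tolerance, and iteration cap are illustrative choices, not part of the slides.

```python
import math

def burbea_rao_centroid(points, weights, grad_F, grad_F_inv,
                        tol=1e-12, max_iter=1000):
    """Fixed-point iteration x <- (grad F)^{-1}(sum_i w_i grad F((x + p_i) / 2))."""
    # x(0): center of mass of the input points.
    x = sum(w * p for p, w in zip(points, weights))
    for _ in range(max_iter):
        x_next = grad_F_inv(
            sum(w * grad_F((x + p) / 2.0) for p, w in zip(points, weights)))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

def burbea_rao_loss(x, points, weights, F):
    """Weighted sum of Burbea-Rao divergences from x to the points."""
    return sum(w * ((F(x) + F(p)) / 2.0 - F((x + p) / 2.0))
               for p, w in zip(points, weights))

# Hypothetical natural parameters with weights summing to 1.
points = [0.0, 1.0, 3.0]
weights = [0.2, 0.3, 0.5]
c = burbea_rao_centroid(points, weights, math.exp, math.log)
print(c, burbea_rao_loss(c, points, weights, math.exp))
```

For this particular F the update reduces to x/2 plus a constant, so the iteration contracts by a factor 1/2 per step and its limit can be written in closed form as 2 log(sum_i w_i exp(p_i / 2)); other generators generally need the iteration itself.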

SLIDE 15

Exponential family

Definition

$p(x; \lambda) = p_F(x; \theta) = \exp\left(\langle t(x), \theta \rangle - F(\theta) + k(x)\right)$

◮ λ source parameter
◮ θ natural parameter
◮ F(θ) log-normalizer
◮ k(x) carrier measure

SLIDE 16

Example

Poisson distribution

$p(x; \lambda) = \frac{\lambda^x}{x!} \exp(-\lambda)$

◮ $t(x) = x$
◮ $\theta = \log \lambda$
◮ $F(\theta) = \exp(\theta)$
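
The decomposition can be checked numerically. The sketch below compares the usual Poisson probability mass function with the exponential-family form exp(t(x) θ - F(θ) + k(x)). The carrier term k(x) = -log x! is not listed on the slide; it is supplied here because it is what the x! in the denominator requires.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability mass function in its source parameterization."""
    return lam ** x / math.factorial(x) * math.exp(-lam)

def poisson_exp_family(x, theta):
    """Same mass function written as exp(t(x) * theta - F(theta) + k(x))."""
    t = x                      # sufficient statistic t(x) = x
    F = math.exp(theta)        # log-normalizer F(theta) = exp(theta)
    k = -math.lgamma(x + 1)    # carrier term k(x) = -log x! (added assumption)
    return math.exp(t * theta - F + k)

lam = 3.5  # hypothetical rate
for x in range(5):
    print(x, poisson_pmf(x, lam), poisson_exp_family(x, math.log(lam)))
```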

SLIDE 17

Multivariate normal distribution

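Gaussian

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{\det \Sigma}} \exp\left(-\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2}\right)$

Exponential family

◮ $\theta = (\theta_1, \theta_2) = \left(\Sigma^{-1}\mu, \frac{1}{2}\Sigma^{-1}\right)$
◮ $F(\theta) = \frac{1}{4} \mathrm{tr}\left(\theta_2^{-1} \theta_1 \theta_1^T\right) - \frac{1}{2} \log \det \theta_2 + \frac{d}{2} \log \pi$
◮ $t(x) = (x, -x x^T)$
◮ $k(x) = 0$

Composite vector-matrix inner product

$\langle \theta, \theta' \rangle = \theta_1^T \theta_1' + \mathrm{tr}(\theta_2^T \theta_2')$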
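
For d = 1 the decomposition is easy to verify by hand or in code. The sketch below assumes the parameterization θ = (μ/σ², 1/(2σ²)), t(x) = (x, -x²), k(x) = 0, and F(θ) = θ1²/(4θ2) - (1/2) log θ2 + (1/2) log π, which is the univariate specialization of the formulas with θ1 the vector part and θ2 the matrix part. Function names and the sample parameters are hypothetical.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate normal density in its source parameterization (mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def natural_parameters(mu, var):
    """theta = (mu / var, 1 / (2 var)), the d = 1 case of the slide's theta."""
    return (mu / var, 1.0 / (2.0 * var))

def log_normalizer(theta):
    """F(theta) = theta1^2 / (4 theta2) - (1/2) log theta2 + (1/2) log pi."""
    t1, t2 = theta
    return t1 * t1 / (4.0 * t2) - 0.5 * math.log(t2) + 0.5 * math.log(math.pi)

def gaussian_exp_family(x, theta):
    """exp(<t(x), theta> - F(theta)) with t(x) = (x, -x^2) and k(x) = 0."""
    t1, t2 = theta
    return math.exp(x * t1 - x * x * t2 - log_normalizer(theta))

theta = natural_parameters(1.5, 0.8)  # hypothetical mean and variance
for x in (-1.0, 0.0, 2.0):
    print(x, gaussian_pdf(x, 1.5, 0.8), gaussian_exp_family(x, theta))
```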

SLIDE 18

Bhattacharyya distance

Bhattacharyya coefficient

◮ Amount of overlap between distributions
◮ $B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx$

Bhattacharyya distance

◮ $B(p, q) = -\log B_c(p, q)$

Metrization

◮ Hellinger-Matusita metric
◮ $H(p, q) = \sqrt{1 - B_c(p, q)}$
◮ Gives the same Voronoi diagram

SLIDE 19

Closed-form formula

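$B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx = \int \exp\left(\left\langle t(x), \frac{\theta_p + \theta_q}{2} \right\rangle - \frac{F(\theta_p) + F(\theta_q)}{2} + k(x)\right) dx$

$= \exp\left(F\left(\frac{\theta_p + \theta_q}{2}\right) - \frac{F(\theta_p) + F(\theta_q)}{2}\right) > 0$

The last step holds because $\exp\left(\langle t(x), \bar\theta \rangle - F(\bar\theta) + k(x)\right)$ with $\bar\theta = \frac{\theta_p + \theta_q}{2}$ is itself a density of the family and integrates to 1.

$B(p, q) = -\log B_c(p, q) = BR_F(\theta_p, \theta_q) \ge 0$

Equivalence

◮ Bhattacharyya distance between two members of the same exponential family
◮ Burbea-Rao divergence between their natural parameters, using the log-normalizer F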
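
The equivalence can be sanity-checked on the Poisson family, where θ = log λ and F(θ) = exp(θ). The sketch below compares a truncated numerical sum of sqrt(p(x) q(x)) with the Burbea-Rao divergence of the natural parameters. The truncation at 200 terms and the function names are assumptions made for the example.

```python
import math

def poisson_pmf(x, lam):
    """Poisson mass function, computed in log space to avoid overflow."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def bhattacharyya_numeric(lam_p, lam_q, terms=200):
    """-log of the truncated sum over x of sqrt(p(x) q(x))."""
    bc = sum(math.sqrt(poisson_pmf(x, lam_p) * poisson_pmf(x, lam_q))
             for x in range(terms))
    return -math.log(bc)

def burbea_rao(F, theta_p, theta_q):
    """BR_F(theta_p, theta_q) = (F(theta_p) + F(theta_q)) / 2 - F(midpoint)."""
    return (F(theta_p) + F(theta_q)) / 2.0 - F((theta_p + theta_q) / 2.0)

def bhattacharyya_closed_form(lam_p, lam_q):
    """Closed form: Burbea-Rao divergence of theta = log(lambda) with F = exp."""
    return burbea_rao(math.exp, math.log(lam_p), math.log(lam_q))

print(bhattacharyya_numeric(2.0, 5.0), bhattacharyya_closed_form(2.0, 5.0))
```

For Poisson distributions the closed form simplifies to half the squared difference of the square roots of the two rates, which gives a second way to cross-check the value.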

SLIDE 20

Examples

SLIDE 21

Gaussian Mixture Models

Mixture

◮ $\Pr(X = x) = \sum_i \omega_i \Pr(X = x \mid \mu_i, \Sigma_i)$
◮ each $\Pr(X = x \mid \mu_i, \Sigma_i)$ is a multivariate normal distribution

Soft Clustering

Expectation-Maximization algorithm, equivalent to soft Bregman clustering

SLIDE 22

Statistical images

http://www.informationgeometry.org/MEF/

RGBxy representation : 5D point set

SLIDE 23

Mixture simplification

Initialization

◮ Mixture of Gaussians, with Bregman soft clustering (≡ EM)

Simplification

◮ k-means using Bhattacharyya distance and centroids

Different k

◮ Hierarchical clustering
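
The simplification step can be sketched as a k-means loop over mixture components, here restricted to univariate Gaussians to keep the code short. This is an illustration under several assumptions, not the authors' implementation: the natural parameters are θ = (μ/σ², 1/(2σ²)), the centers are initialized with the first k components, each center is refreshed with a fixed number of centroid iterations, and a merged component receives the sum of the weights assigned to it. All function names are hypothetical.

```python
import math

def to_natural(mu, var):
    return (mu / var, 1.0 / (2.0 * var))

def from_natural(theta):
    return (theta[0] / (2.0 * theta[1]), 1.0 / (2.0 * theta[1]))

def F(theta):
    """Log-normalizer of the univariate Gaussian in natural parameters."""
    t1, t2 = theta
    return t1 * t1 / (4.0 * t2) - 0.5 * math.log(t2) + 0.5 * math.log(math.pi)

def grad_F(theta):
    mu, var = from_natural(theta)
    return (mu, -(mu * mu + var))

def grad_F_inv(eta):
    mu = eta[0]
    return to_natural(mu, -eta[1] - mu * mu)

def midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def bhattacharyya(a, b):
    """Bhattacharyya distance as a Burbea-Rao divergence on natural parameters."""
    return (F(a) + F(b)) / 2.0 - F(midpoint(a, b))

def br_centroid(thetas, weights, iters=100):
    """Burbea-Rao centroid via the fixed-point (CCCP) iteration."""
    total = sum(weights)
    w = [x / total for x in weights]
    c = tuple(sum(wi * t[d] for wi, t in zip(w, thetas)) for d in range(2))
    for _ in range(iters):
        grads = [grad_F(midpoint(c, t)) for t in thetas]
        c = grad_F_inv(tuple(sum(wi * g[d] for wi, g in zip(w, grads))
                             for d in range(2)))
    return c

def simplify(mixture, k, rounds=20):
    """mixture: list of (weight, mu, var); returns k merged components."""
    weights = [w for w, _, _ in mixture]
    thetas = [to_natural(mu, var) for _, mu, var in mixture]
    centers = thetas[:k]  # naive initialization (assumption)
    labels = [0] * len(thetas)
    for _ in range(rounds):
        # Assignment step: nearest center in Bhattacharyya distance.
        labels = [min(range(k), key=lambda j: bhattacharyya(t, centers[j]))
                  for t in thetas]
        # Update step: Burbea-Rao centroid of each non-empty cluster.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:
                centers[j] = br_centroid([thetas[i] for i in members],
                                         [weights[i] for i in members])
    merged = []
    for j in range(k):
        wj = sum(weights[i] for i, lab in enumerate(labels) if lab == j)
        mu, var = from_natural(centers[j])
        merged.append((wj, mu, var))
    return merged

# Hypothetical 4-component mixture reduced to 2 components.
mixture = [(0.25, 0.0, 1.0), (0.25, 10.0, 1.0), (0.25, 0.2, 1.2), (0.25, 10.3, 0.9)]
simplified = simplify(mixture, 2)
print(simplified)
```

A production version would at least add a convergence test and a better initialization; running the loop for several values of k is one simple way to explore different levels of simplification.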

SLIDE 24

Hierarchical clustering

SLIDE 25

Conclusion

Results

◮ Symmetrizing Bregman yields Burbea-Rao divergences
◮ Bhattacharyya between exponential families yields Burbea-Rao
◮ Closed-form formula for Bhattacharyya between EF
◮ Efficient scheme for BR centroid using CCCP

Applications

◮ Simplification of Gaussian Mixture Models
◮ Hierarchical Clustering

SLIDE 26

References

◮ Statistical exponential families : A digest with flash cards, F. Nielsen and V. Garcia, arXiv 2009.
◮ An optimal Bhattacharyya centroid algorithm for Gaussian clustering with applications in automatic speech recognition, ICASSP 2000.
◮ The concave-convex procedure, A. Yuille and A. Rangarajan, Neural Computation, vol. 15, no. 4, pp. 915-936, 2003.
◮ The Burbea-Rao and Bhattacharyya centroids, F. Nielsen and S. Boltz, arXiv 2010.

www.informationgeometry.org
