Simplifying mixtures of Parzen windows
GRETSI 2011, Bordeaux, France


SLIDE 1

Mixture Models Simplification Software library

Simplifying mixtures of Parzen windows

GRETSI 2011, Bordeaux, France Olivier Schwander Frank Nielsen

École Polytechnique

September 6, 2011

Olivier Schwander Simplifying mixtures of Parzen windows

SLIDE 2

Outline

Mixture Models
◮ Statistical mixtures
◮ Getting mixtures

Simplification
◮ k-means
◮ One-step clustering
◮ Experiments

Software library
◮ Presentation

SLIDE 3

Mixture models

Mixture

◮ Pr(X = x) = Σ_i ω_i Pr(X = x | μ_i, Θ_i)
◮ each Pr(X = x | μ_i, Θ_i) is a probability density function

Famous special case

Gaussian Mixture Models (GMM)
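To make the definition concrete, here is a minimal sketch (not from the slides; names are illustrative) that evaluates the density of a 1D Gaussian mixture:

```python
import numpy as np

def gmm_pdf(x, weights, means, sigmas):
    """Density of a 1D Gaussian mixture: sum_i w_i N(x; mu_i, sigma_i^2)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for w, mu, s in zip(weights, means, sigmas):
        # each component is a Gaussian pdf, weighted by w
        total += w * np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return total
```

When the weights sum to 1, the mixture is itself a probability density (it integrates to 1).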

SLIDE 4

Getting mixtures

Expectation-Maximization

Kernel density estimation

◮ also called the Parzen windows method
◮ one kernel per data point (often a Gaussian kernel)
◮ fixed bandwidth

[Figure: kernel density estimate of the data, density values up to about 0.012]
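A Parzen windows estimate is itself a mixture with one equally weighted kernel per sample; a minimal sketch (illustrative names, Gaussian kernel, fixed bandwidth):

```python
import numpy as np

def parzen_mixture(data, bandwidth):
    """Parzen windows estimate written as an explicit Gaussian mixture:
    one component per data point, uniform weights, fixed bandwidth."""
    n = len(data)
    weights = np.full(n, 1.0 / n)
    means = np.asarray(data, dtype=float)
    sigmas = np.full(n, bandwidth)
    return weights, means, sigmas

def kde_pdf(x, data, bandwidth):
    """Evaluate the Parzen windows density at the points x."""
    w, mu, s = parzen_mixture(data, bandwidth)
    x = np.asarray(x, dtype=float)[:, None]  # broadcast over components
    return (w * np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))).sum(axis=1)
```

This makes the size problem visible: n data points produce an n-component mixture.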

SLIDE 5

Why simplification?

A lot of components

◮ 120 × 120 = 14 400 Gaussians in the previous curve

KDE: good approximation, but

◮ very large mixture: time and memory problems
◮ a low number of components is often enough (as EM shows)

EM: small approximation, but we may want a fixed number of components without learning a new mixture:

◮ EM is slow
◮ we may not have the original dataset, just the model

SLIDE 6

k-means

[Figure: KDE mixture and its k-means simplification, both with density values up to about 0.012]

SLIDE 7

k-means

What do we need?

◮ a distance (or a divergence, or a dissimilarity measure)
◮ a centroid
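A Lloyd-type k-means over mixture components can be written with exactly those two ingredients as plug-ins; a sketch under illustrative names, with `divergence` and `centroid` supplied by the caller:

```python
import numpy as np

def kmeans_simplify(components, k, divergence, centroid, iters=20, seed=0):
    """Simplify a list of mixture components down to k, Lloyd-style:
    assign each component to its closest center (under `divergence`),
    then replace each center by the `centroid` of its cluster."""
    rng = np.random.default_rng(seed)
    centers = [components[i] for i in rng.choice(len(components), k, replace=False)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: divergence(c, centers[j]))
                  for c in components]
        for j in range(k):
            cluster = [c for c, l in zip(components, labels) if l == j]
            if cluster:  # keep the old center if its cluster emptied
                centers[j] = centroid(cluster)
    return centers
```

Any divergence/centroid pair from the following slides (Kullback-Leibler, Fisher, model centroids) can be dropped in without touching the loop.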

SLIDE 8

Kullback-Leibler divergence

Divergence

◮ D(P‖Q) = ∫ p(x) log (p(x)/q(x)) dx
◮ not a symmetric divergence

Centroids

◮ Left-sided one: min_x Σ_i ω_i B_F(x, p_i)
◮ Right-sided one: min_x Σ_i ω_i B_F(p_i, x)
◮ Various symmetrizations!
◮ Known in closed form
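Between two univariate Gaussians, for example, the divergence is available in closed form (a standard formula, stated here for illustration):

```python
import numpy as np

def kl_gaussian(mu1, s1, mu2, s2):
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2)) for 1D Gaussians."""
    return np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5
```

Swapping the arguments generally changes the value, which is why the left-sided and right-sided centroids above are distinct objects.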

SLIDE 9

Fisher divergence

Riemannian metric on the statistical manifold

Fisher information matrix

g_ij = I(θ_i, θ_j) = E[ (∂/∂θ_i) log p(X; θ) · (∂/∂θ_j) log p(X; θ) ]

ds² = Σ_ij g_ij dθ_i dθ_j
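As a concrete instance (a standard result, stated here for illustration): for a univariate Gaussian parametrized by θ = (μ, σ), the Fisher information matrix and line element are

```latex
g(\mu, \sigma) =
\begin{pmatrix}
  1/\sigma^2 & 0 \\
  0          & 2/\sigma^2
\end{pmatrix},
\qquad
ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2}.
```

Substituting u = μ/√2 gives ds² = 2 (du² + dσ²)/σ², the Poincaré upper half-plane metric up to a factor 2, which is why the distance formula on the next slide maps (μ, σ) to (μ/√2, σ) and carries a √2 factor.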

SLIDE 10

Fisher divergence formula

Known for zero-mean Gaussians

◮ not really interesting for mixtures. . .
◮ open problem for other cases

For 1D data

◮ Poincaré hyperbolic distance in the Poincaré upper half-plane

FRD(f_p, f_q) = √2 ln [ ( ‖(μ_p/√2, σ_p) − (μ_q/√2, −σ_q)‖ + ‖(μ_p/√2, σ_p) − (μ_q/√2, σ_q)‖ ) / ( ‖(μ_p/√2, σ_p) − (μ_q/√2, −σ_q)‖ − ‖(μ_p/√2, σ_p) − (μ_q/√2, σ_q)‖ ) ]
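This is the hyperbolic distance of the Poincaré upper half-plane applied to the points (μ/√2, σ), scaled by √2; a sketch with illustrative names:

```python
import numpy as np

def fisher_rao_gaussian(mu_p, s_p, mu_q, s_q):
    """Fisher-Rao distance between two 1D Gaussians, via the hyperbolic
    half-plane distance after the map (mu, sigma) -> (mu/sqrt(2), sigma)."""
    a = np.array([mu_p / np.sqrt(2), s_p])
    b = np.array([mu_q / np.sqrt(2), s_q])
    b_conj = np.array([mu_q / np.sqrt(2), -s_q])  # mirror through the real axis
    num = np.linalg.norm(a - b_conj) + np.linalg.norm(a - b)
    den = np.linalg.norm(a - b_conj) - np.linalg.norm(a - b)
    return np.sqrt(2) * np.log(num / den)
```

Unlike the Kullback-Leibler divergence, this distance is symmetric in its two arguments.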

SLIDE 11

Fisher centroids

No closed-form formula

◮ even for 1D Gaussians
◮ brute-force search for the minimizer? not very elegant

SLIDE 12

Model centroids

Centroid in constant curvature spaces

◮ from Poincaré upper half-plane to Poincaré disk
◮ from Poincaré disk to Klein disk
◮ from Klein disk to Minkowski model
◮ center of mass and renormalization
◮ from Minkowski model back to the Poincaré upper half-plane

[Figure: weighted center of mass ω1 p′1 + ω2 p′2 of points p1, p2 computed in the Minkowski model and projected back to the Klein disk]

• Galperin. A concept of the mass center of a system of material points in the constant curvature spaces. 1993.
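The round trip above can be sketched as follows (a minimal illustration of the Galperin-style construction, with made-up function names; half-plane points are complex numbers x + iσ, which correspond to 1D Gaussians via x = μ/√2):

```python
import numpy as np

def halfplane_to_klein(z):
    """Poincare half-plane -> Poincare disk (Cayley transform) -> Klein disk."""
    u = (z - 1j) / (z + 1j)
    k = 2 * u / (1 + abs(u) ** 2)
    return np.array([k.real, k.imag])

def klein_to_halfplane(k):
    """Klein disk -> Poincare disk -> half-plane (inverse Cayley transform)."""
    kk = complex(k[0], k[1])
    u = kk / (1 + np.sqrt(1 - abs(kk) ** 2))
    return (1j * (1 + u)) / (1 - u)

def hyperbolic_centroid(points, weights):
    """Galperin-style center of mass: lift Klein-disk points to the
    Minkowski hyperboloid, average, renormalize, and map back."""
    total = np.zeros(3)
    for z, w in zip(points, weights):
        k = halfplane_to_klein(z)
        x = np.array([1.0, k[0], k[1]]) / np.sqrt(1 - k @ k)  # hyperboloid lift
        total += w * x
    # renormalize with the Minkowski norm to land back on the hyperboloid
    c = total / np.sqrt(total[0] ** 2 - total[1] ** 2 - total[2] ** 2)
    k = c[1:] / c[0]  # central projection back to the Klein disk
    return klein_to_halfplane(k)
```

For two equally weighted points this construction returns their geodesic midpoint, and every step is a closed-form map, avoiding the brute-force minimization of the previous slide.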

Olivier Schwander Simplifying mixtures of Parzen windows

slide-13
SLIDE 13

Mixture Models Simplification Software library k-means One-step clustering Experiments

One-step clustering

What are we looking for?

◮ the best model? Failure: we only reach a local minimum...
◮ a good enough model? under which constraints?

What happens if we skip the iterations of the k-means?

◮ Faster!
◮ Quality?
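Dropping the Lloyd iterations leaves a single assignment and a single centroid computation; a sketch (illustrative names, same plug-in distance and centroid as before):

```python
import numpy as np

def one_step_simplify(components, k, divergence, centroid, seed=0):
    """One-step clustering: pick k initial centers, do a single assignment
    and a single centroid update -- no k-means iterations."""
    rng = np.random.default_rng(seed)
    centers = [components[i] for i in rng.choice(len(components), k, replace=False)]
    labels = [min(range(k), key=lambda j: divergence(c, centers[j]))
              for c in components]
    # one centroid computation per cluster; keep the seed if a cluster is empty
    return [centroid([c for c, l in zip(components, labels) if l == j] or [centers[j]])
            for j in range(k)]
```

The result depends on the initial centers, so it targets a "good enough" model rather than the best one.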

SLIDE 14

Experiments: log-likelihood

◮ EM and k-means with KL are very good, whatever the number of components
◮ k-means with model centroids and one-step k-means with model centroids just need a few more components

SLIDE 15

Experiments: time

◮ KL even slower than EM (a closed-form formula does not mean cheap computation)
◮ one-step clustering is really fast, with good quality

SLIDE 16

Bioinformatics application: prediction of RNA 3D structure

Previous work

◮ Dirichlet process mixtures
◮ high quality models but too slow

[Figure: original data; KDE and simplified KDE; EM and simplified EM]

Joint work with A. Sim, M. Levitt and J. Bernauer (INRIA and Stanford)

SLIDE 17

pyMEF: a Python library for Exponential families

Manipulation of mixtures of EF

◮ direct creation of mixtures
◮ learning of mixtures: Bregman soft clustering
◮ simplification of mixtures: Bregman hard clustering, model hard clustering
◮ visualization

Goals

◮ generic framework for EF (and Information Geometry) ◮ rapid prototyping (Python shell)

SLIDE 18

Conclusion

A better way to get mixtures

◮ compact mixtures
◮ fast to learn
◮ fast to use

One-step clustering

◮ Would need to be validated by a real application

pyMEF

◮ a library for all of this
◮ release coming soon (hopefully)
◮ http://www.lix.polytechnique.fr/~schwander/pyMEF
