Mixture Models and EM
SLIDE 1


Mixture Models and EM

Henrik I. Christensen

Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary


SLIDE 3

Introduction

- In many cases the uni-modal assumption of a normal distribution is a major challenge, e.g. when handling multiple hypotheses or modelling multiple instances (people, ...).
- Mixtures of Gaussians are a way to model richer distributions.
- A mixture of Gaussians can be considered a model with latent variables.
- Expectation Maximization (EM) is a general technique for finding maximum likelihood estimates in models with latent variables.
- Mixture models are widely used for clustering of data.
- K-means is another clustering technique that has similarities to EM.



SLIDE 5

K-means clustering

- Consider clustering of data $\{x_1, x_2, \ldots, x_N\}$ into $K$ groups.
- Assume for now that each data point lies in a $D$-dimensional Euclidean space.
- Each cluster is represented by a "center" estimate $\mu_i$.
- Challenge: how to find an optimal assignment of data to clusters?
- Introduce an indicator variable $r_{ni} \in \{0, 1\}$, named the 1-of-K coding.


SLIDE 6

K-means - Objective Function

We can then define an objective function / distortion measure

$$J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, \|x_n - \mu_i\|^2$$

Basically the sum of squared distances to the "centres". Goal: find the assignments $r_{ni}$ and the cluster centers $\mu_i$ that minimize $J$.


SLIDE 7

Iterative Algorithm

1. Choose initial values for $\mu_i$
2. Minimize $J$ with respect to $r_{ni}$
3. Minimize $J$ with respect to $\mu_i$
4. Repeat 2-3 until convergence
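A minimal NumPy sketch of this loop (the slides give no code; the function and variable names here are my own):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: alternate assignments (step 2) and center
    updates (step 3) until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers with K distinct data points
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = np.full(len(X), -1)
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest center (sets r_ni)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # N x K
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # step 4: converged
            break
        assign = new_assign
        # Step 3: move each center to the mean of its assigned points
        for i in range(K):
            if np.any(assign == i):
                mu[i] = X[assign == i].mean(axis=0)
    return mu, assign
```

Each of steps 2 and 3 can only decrease $J$ (or leave it unchanged), so the objective is non-increasing and the loop terminates.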

SLIDE 8

Algorithm details

Consider the indicator

$$r_{ni} = \begin{cases} 1 & \text{if } i = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$$

The extremum of $J$ with respect to $\mu_i$ is then defined by

$$2 \sum_{n=1}^{N} r_{ni} (x_n - \mu_i) = 0$$

or

$$\mu_i = \frac{\sum_n r_{ni} x_n}{\sum_n r_{ni}}$$

So $\mu_i$ is the mean of the $i$th cluster, thus the name K-means.


SLIDE 9

Small Example

[Figure: nine panels (a)-(i) of 2-D data, axes from −2 to 2, showing successive assignment and update steps of K-means on a small example.]


SLIDE 10

Objective Function

[Figure: the objective function $J$ (vertical axis, ticks at 500 and 1000) plotted against iteration number (1-4).]


SLIDE 11

Considerations

- Iterating over all data points in every iteration can be a challenge.
- "Smart" selection of candidate points for the cluster centers is important; even a uniform random selection can be adequate.
- Organizing the data in a graph/mesh can be essential for efficient access / handling of the data.


SLIDE 12

Iterative Updating

Sequential updating can be organized with

$$\mu_i^{\text{new}} = \mu_i^{\text{old}} + \eta_t (x_n - \mu_i^{\text{old}})$$

where $\eta_t$ is the learning rate, which typically decreases as more points are considered.
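A sketch of this sequential update, assuming a per-center $\eta_t = 1/t$ schedule (the slides only require that the rate decreases; the particular schedule is my choice):

```python
import numpy as np

def online_kmeans_step(mu, counts, x):
    """One sequential K-means update on a single new point x:
    move the nearest center toward x with a decreasing learning rate."""
    i = ((mu - x) ** 2).sum(axis=1).argmin()  # nearest center
    counts[i] += 1
    eta = 1.0 / counts[i]                     # eta_t decreases per center
    mu[i] += eta * (x - mu[i])                # mu_new = mu_old + eta (x - mu_old)
    return mu, counts
```

With this particular schedule each center is exactly the running mean of the points assigned to it so far.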


SLIDE 13

Generalization of K-means

In general the Euclidean norm might not always be optimal. The generalized version of the objective / distortion function is

$$J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, D(x_n, \mu_i)$$

Here $D(\cdot, \cdot)$ is a dissimilarity measure that might even handle robust outlier rejection.
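The assignment step only needs $D$ as a black box. A sketch with a hypothetical L1 dissimilarity, which is less sensitive to outliers than the squared Euclidean distance:

```python
import numpy as np

def assign_generalized(X, mu, D):
    """Assign each point to the cluster whose center minimizes D(x, mu)."""
    d = np.array([[D(x, m) for m in mu] for x in X])  # N x K dissimilarities
    return d.argmin(axis=1)

# L1 dissimilarity: the matching center update is then the
# coordinate-wise median rather than the mean (K-medians).
l1 = lambda x, m: np.abs(x - m).sum()
```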


SLIDE 14

Example of clustering - Image Compression

[Figure: image compression via K-means colour clustering, shown for K = 2, K = 3, K = 10 and the original image.]
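A sketch of how such compression can be obtained, reusing the kmeans sketch above: cluster the pixel colours, then replace each pixel by its cluster centre, so only K colours plus per-pixel indices need to be stored.

```python
import numpy as np

def compress_image(img, K):
    """Quantize an H x W x 3 image to K colours with K-means."""
    pixels = img.reshape(-1, 3).astype(float)
    mu, assign = kmeans(pixels, K)        # kmeans from the earlier sketch
    out = mu[assign].reshape(img.shape)   # each pixel -> its cluster centre
    return out.astype(img.dtype)
```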




SLIDE 17

Mixtures of Gaussians

Recall the original definition of a mixture

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Define an indicator variable $z$ characterized by

- $z_k \in \{0, 1\}$
- Only one of the dimensions has unit value: $\sum_k z_k = 1$

Assume a joint $p(x, z)$ and a conditional $p(x \mid z)$. We can then assume $p(z_k = 1) = \pi_k$.
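A small sketch of this generative view in 1-D, with illustrative parameter values (the numbers are made up): first draw the latent indicator z from the mixing coefficients, then draw x from the selected Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])      # mixing coefficients, sum to 1
mu = np.array([-2.0, 0.0, 3.0])     # component means
sigma = np.array([0.5, 1.0, 0.7])   # component standard deviations

def sample_mixture(n):
    """Ancestral sampling: z ~ Categorical(pi), then x ~ N(mu_z, sigma_z^2)."""
    z = rng.choice(len(pi), size=n, p=pi)   # latent component indicators
    return rng.normal(mu[z], sigma[z])      # observed values

x = sample_mixture(1000)
```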


SLIDE 18

The parameterization

We have for $\{\pi_k\}$ that

- $0 \le \pi_k \le 1$
- $\sum_k \pi_k = 1$

$p(z)$ can be considered

$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$

Similarly $p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$, or

$$p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k} \quad \Rightarrow \quad p(x) = \sum_z p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$


SLIDE 19

Mixtures

So why all the extra stuff? We can think of $p(x)$ as an observation over a joint distribution $p(x, z)$ where $z$ is a latent variable. For reference, introduce $p(z_k = 1 \mid x)$, also denoted $\gamma(z_k)$:

$$\gamma(z_k) = p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
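A sketch of computing these responsibilities for the 1-D example, using scipy.stats.norm for the Gaussian density:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, pi, mu, sigma):
    """gamma[n, k] = pi_k N(x_n | mu_k) / sum_j pi_j N(x_n | mu_j)."""
    g = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # N x K, unnormalized
    return g / g.sum(axis=1, keepdims=True)             # normalize over k
```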


SLIDE 20

Data Example

[Figure: three panels (a)-(c) of 2-D example data; axes span 0 to 1.]

SLIDE 21

Maximum Likelihood

Suppose we have a dataset $X = \{x_1, x_2, \ldots, x_N\}$. How can we model it using a mixture model?

$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$

[Figure: graphical model with latent $z_n$ and observed $x_n$ inside a plate over $N$, governed by parameters $\mu$, $\Sigma$, $\pi$.]
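A sketch of evaluating this log-likelihood in the 1-D case, with the same pi, mu, sigma arrays as in the sampling sketch:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, pi, mu, sigma):
    """ln p(X) = sum_n ln sum_k pi_k N(x_n | mu_k, sigma_k^2)."""
    p = (pi * norm.pdf(x[:, None], loc=mu, scale=sigma)).sum(axis=1)
    return np.log(p).sum()
```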


SLIDE 22

EM for Gaussian Mixtures

Consider the extremum of $\ln p()$ with respect to $\mu_k$:

$$\sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \, \Sigma_k^{-1} (x_n - \mu_k) = 0$$

$$\Rightarrow \quad \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n \quad \text{where} \quad N_k = \sum_n \gamma(z_{nk})$$


SLIDE 23

EM for Gaussian Mixtures

In a similar fashion we can compute the covariance

$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T$$

If we maximize with respect to the mixing coefficients $\pi_k$ we must optimize $\ln p$ while also respecting the constraint $\sum_k \pi_k = 1$. Using a Lagrange multiplier we have

$$\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$$

SLIDE 24

EM for Gaussian Mixtures

We obtain

$$0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda$$

which yields the intuitive solution

$$\pi_k = \frac{N_k}{N}$$
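A sketch of these three M-step updates for the 1-D case, given responsibilities gamma of shape N x K from the earlier sketch:

```python
import numpy as np

def m_step(x, gamma):
    """M-step: closed-form mu_k, sigma_k, pi_k from responsibilities."""
    Nk = gamma.sum(axis=0)                                   # N_k = sum_n gamma_nk
    mu = (gamma * x[:, None]).sum(axis=0) / Nk               # weighted means
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk  # weighted variances
    pi = Nk / len(x)                                         # pi_k = N_k / N
    return pi, mu, np.sqrt(var)
```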


SLIDE 25

EM for Gaussian Mixtures

- Select a set of initial values for $\pi$, $\mu$, and $\Sigma$.
- Perform an initial analysis (expectation).
- Re-estimate the values (maximize the likelihood).
- Iterate.


SLIDE 26

The detailed version

1. Initialize parameters.
2. Evaluate (E step):
   $$\gamma(z_k) = p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
3. Re-estimate the parameters $\mu_k^{\text{new}}$, $\Sigma_k^{\text{new}}$ and $\pi_k^{\text{new}}$ (M step).
4. Evaluate $\ln p(X \mid \pi, \mu, \Sigma)$ and check for convergence.
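Putting the pieces together, a minimal EM loop for the 1-D mixture, reusing the responsibilities, m_step, and log_likelihood sketches above (the initialization strategy is my own choice):

```python
import numpy as np

def em_gmm(x, K, n_iters=200, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture following the four steps above."""
    rng = np.random.default_rng(seed)
    # 1. Initialize: uniform weights, random data points as means
    pi = np.full(K, 1.0 / K)
    mu = x[rng.choice(len(x), size=K, replace=False)].astype(float)
    sigma = np.full(K, x.std())
    prev = -np.inf
    for _ in range(n_iters):
        gamma = responsibilities(x, pi, mu, sigma)  # 2. E step
        pi, mu, sigma = m_step(x, gamma)            # 3. M step
        ll = log_likelihood(x, pi, mu, sigma)       # 4. check convergence
        if ll - prev < tol:
            break
        prev = ll
    return pi, mu, sigma
```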

SLIDE 27

Small Example 2

[Figure: six panels (a)-(f) of 2-D data, axes from −2 to 2, showing EM fits of a Gaussian mixture; later panels are labelled L = 1, L = 2, L = 5 and L = 20.]


SLIDE 29

Summary

- Discussed mixture models as a way to model data.
- Mixture models as a way to consider $p(x, z)$ with latent $z$.
- K-means introduced as a flexible clustering technique.
- Introduction to EM for Gaussian mixture models.
- Next time a more general version of EM will be introduced.
