

SLIDE 1

Expectation Maximization

Henrik I. Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary


SLIDE 3

Introduction

• Last time we discussed mixture models.
• We used K-means and EM as ways to partition data.
• More generally, EM estimates latent variables such as class membership.
• Today a few other perspectives on EM will be discussed.



SLIDE 5

An alternative view

• Find the ML solution for a model with latent variables.
• We have a set of observed variables X, a set of latent variables Z, and a set of model parameters θ.
• Our criterion function is

$$\ln p(X|\theta) = \ln \sum_Z p(X, Z|\theta)$$

• Unfortunately the sum sits inside the ln expression, so the logarithm cannot be pushed onto the joint distribution and direct maximization is hard.


SLIDE 6

An alternative view

• If Z were observed or known, the problem would be simpler.
• If the complete set {X, Z} were known, estimation would be straightforward; X alone is considered an incomplete dataset.
• However, we can compute/estimate p(X|Z, θ).
• Iteratively, we can update the estimate of the distribution over Z.
• The estimate of Z can then be used to update the model parameters.


SLIDE 7

An alternative view

1. Choose an initial value $\theta^{old}$.
2. E-step: compute $p(Z|X, \theta^{old})$.
3. M-step: compute

$$\theta^{new} = \arg\max_\theta Q(\theta, \theta^{old})$$

where the expected complete-data log likelihood is

$$Q(\theta, \theta^{old}) = \sum_Z p(Z|X, \theta^{old}) \ln p(X, Z|\theta)$$

4. Check for convergence; return to step 2 if not done.
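As a minimal sketch of this loop (mine, not the slides'; `e_step`, `m_step`, and the returned log likelihood are hypothetical placeholders that a concrete model must supply):

```python
import numpy as np

def em(X, theta, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: alternate the E- and M-steps until the
    log likelihood stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        posterior = e_step(X, theta)       # p(Z | X, theta_old)
        theta, ll = m_step(X, posterior)   # argmax_theta Q(theta, theta_old)
        if ll - prev_ll < tol:             # EM never decreases ln p(X|theta)
            break
        prev_ll = ll
    return theta
```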


SLIDE 9

Mixtures of Bernoulli Distributions

• What if we had a mixture of discrete random variables?
• Consider a Bernoulli example: x is described by D binary variables $x_i$, controlled by the means $\mu_i$, i.e.

$$p(x|\mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$$

• Then we have

$$E[x] = \mu \qquad \mathrm{cov}[x] = \mathrm{diag}\{\mu_i(1 - \mu_i)\}$$
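A small sketch of this density (mine, not from the slides), evaluated in the log domain to avoid underflow when D is large:

```python
import numpy as np

def bernoulli_log_prob(x, mu):
    """ln p(x | mu) for a D-dimensional product of Bernoullis.
    x: binary vector or (N, D) array; mu: mean vector with entries in (0, 1)."""
    return np.sum(x * np.log(mu) + (1 - x) * np.log(1 - mu), axis=-1)
```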

SLIDE 10

Mixtures of Bernoulli Distributions

A mixture of K such components would then be

$$p(x|\mu, \pi) = \sum_{k=1}^{K} \pi_k\, p(x|\mu_k)$$

with

$$E[x] = \sum_{k=1}^{K} \pi_k \mu_k \qquad \mathrm{cov}[x] = \sum_{k=1}^{K} \pi_k \left( \Sigma_k + \mu_k \mu_k^T \right) - E[x]E[x]^T$$

where $\Sigma_k = \mathrm{diag}\{\mu_{ki}(1 - \mu_{ki})\}$. Our objective function would be

$$\ln p(X|\mu, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, p(x_n|\mu_k) \right\}$$
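A sketch of this objective (my code, not the slides'; SciPy's logsumexp keeps the inner sum numerically stable):

```python
import numpy as np
from scipy.special import logsumexp

def mixture_log_likelihood(X, mu, pi):
    """ln p(X | mu, pi) for a Bernoulli mixture.
    X: (N, D) binary array; mu: (K, D) component means; pi: (K,) weights."""
    # ln p(x_n | mu_k) for all n, k -> shape (N, K)
    log_px = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
    # ln sum_k pi_k p(x_n | mu_k), summed over n
    return np.sum(logsumexp(np.log(pi) + log_px, axis=1))
```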

SLIDE 11

EM for Bernoulli Mixtures

If we introduce an unobserved latent variable z (with 1-of-K coding),

$$p(x|z, \mu) = \prod_{k=1}^{K} p(x|\mu_k)^{z_k}$$

and the mixing distribution

$$p(z|\pi) = \prod_{k=1}^{K} \pi_k^{z_k}$$

The complete-data log likelihood is then

$$\ln p(X, Z|\mu, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}$$
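For completeness, a sketch of this quantity when Z is known (my helper, following the array conventions above):

```python
import numpy as np

def complete_data_log_likelihood(X, Z, mu, pi):
    """ln p(X, Z | mu, pi), assuming Z is an (N, K) one-hot matrix."""
    log_px = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
    return np.sum(Z * (np.log(pi) + log_px))
```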

SLIDE 12

EM for Bernoulli Mixtures

As before we can compute the responsibility γ:

$$\gamma(z_{nk}) = E[z_{nk}] = \frac{\pi_k\, p(x_n|\mu_k)}{\sum_{j=1}^{K} \pi_j\, p(x_n|\mu_j)}$$

From this we can derive a structure as seen earlier:

$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \qquad \mu_k = \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \qquad \pi_k = \frac{N_k}{N}$$
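Putting the E- and M-steps together, a minimal sketch of EM for a Bernoulli mixture (my implementation of the updates above, not code from the slides; the small `eps` guards the logs against means hitting exactly 0 or 1):

```python
import numpy as np
from scipy.special import logsumexp

def bernoulli_mixture_em(X, K, n_iter=50, eps=1e-10, seed=0):
    """EM for a mixture of Bernoullis. X: (N, D) binary array.
    Returns mixing weights pi (K,) and component means mu (K, D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))   # init away from 0 and 1
    for _ in range(n_iter):
        # E-step: responsibilities gamma(z_nk), computed in the log domain
        log_px = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
        log_r = np.log(pi) + log_px
        gamma = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M-step: N_k, weighted means mu_k, and pi_k = N_k / N
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        pi = Nk / N
    return pi, mu
```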

SLIDE 13

Small Bernoulli Mixture Example

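The slide's example figure did not survive extraction. As a hypothetical stand-in, a small synthetic run of the sketch above, with two made-up prototypes:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = np.array([[0.9, 0.9, 0.1, 0.1],    # two hypothetical prototypes
                    [0.1, 0.1, 0.9, 0.9]])
labels = rng.integers(0, 2, size=500)
X = (rng.random((500, 4)) < true_mu[labels]).astype(float)

pi, mu = bernoulli_mixture_em(X, K=2)
print(pi)   # roughly [0.5, 0.5]
print(mu)   # rows approximate the prototypes (up to permutation)
```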


SLIDE 15

Bayesian Linear Regression

We have

$$p(w|t) = N(w|m_N, S_N)$$

where

$$m_N = S_N \left( S_0^{-1} m_0 + \beta \Phi^T t \right) \qquad S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi$$

Treating w as a latent variable, the complete-data log likelihood is

$$\ln p(t, w|\alpha, \beta) = \ln p(t|w, \beta) + \ln p(w|\alpha)$$

SLIDE 16

EM for Bayesian Linear Regression

• In the E-step, compute the posterior for w.
• In the M-step, compute α and β given w.
• We can derive (see book)

$$\alpha = \frac{M}{m_N^T m_N + \mathrm{Tr}(S_N)}$$

and a similar expression for β.
• For the effective number of well-determined parameters we likewise get

$$\gamma = M - \alpha\, \mathrm{Tr}(S_N)$$
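A sketch of this iteration (mine, assuming the common zero-mean isotropic prior $p(w|\alpha) = N(w|0, \alpha^{-1}I)$; the β update is not spelled out on the slide, so the standard EM form is used here as an assumption):

```python
import numpy as np

def bayes_regression_em(Phi, t, n_iter=50, alpha=1.0, beta=1.0):
    """EM for the hyperparameters of Bayesian linear regression.
    Phi: (N, M) design matrix; t: (N,) targets."""
    N, M = Phi.shape
    for _ in range(n_iter):
        # E-step: posterior over w given the current alpha, beta
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # M-step: alpha update as on the slide
        alpha = M / (m_N @ m_N + np.trace(S_N))
        # beta update (assumed standard form, not shown on the slide)
        resid = t - Phi @ m_N
        beta = N / (resid @ resid + np.trace(Phi.T @ Phi @ S_N))
    return alpha, beta, m_N, S_N
```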


SLIDE 18

A general version of EM

The general problem we are trying to address:

• We have a set of observed variables X.
• We have a set of latent variables Z.
• We have a model parameter set θ.
• Goal: maximize p(X|θ).

Assumption:

• It is hard to optimize p(X|θ) directly.
• It is easier to optimize p(X, Z|θ).

Let's assume we can define a distribution q(Z) over the latent variables.

SLIDE 19

A general version of EM

We are trying to optimize

$$\ln p(X|\theta) = \ln p(X, Z|\theta) - \ln p(Z|X, \theta)$$

For any distribution q(Z) we can rewrite this as

$$\ln p(X|\theta) = L(q, \theta) + \mathrm{KL}(q\|p)$$

where

$$L(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z|\theta)}{q(Z)} \qquad \mathrm{KL}(q\|p) = -\sum_Z q(Z) \ln \frac{p(Z|X, \theta)}{q(Z)}$$

So L(q, θ) is a bound built from the joint distribution, and KL is the Kullback-Leibler divergence between q(Z) and p(Z|X, θ); since KL(q‖p) ≥ 0, L(q, θ) is a lower bound on ln p(X|θ).
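A tiny numeric check of this decomposition (mine, on a hypothetical two-valued Z):

```python
import numpy as np

# Verify ln p(X|theta) = L(q, theta) + KL(q || p) for a discrete Z.
p_xz = np.array([0.3, 0.2])          # p(X, Z=k | theta) at the observed X
p_x = p_xz.sum()                     # p(X | theta) = sum_Z p(X, Z | theta)
p_z_given_x = p_xz / p_x             # posterior p(Z | X, theta)
q = np.array([0.6, 0.4])             # any distribution q(Z)

L = np.sum(q * np.log(p_xz / q))             # lower bound L(q, theta)
KL = -np.sum(q * np.log(p_z_given_x / q))    # KL(q || p(Z | X, theta))
assert np.isclose(np.log(p_x), L + KL)       # decomposition holds exactly
```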

SLIDE 20

A general version of EM

We can now formulate the general algorithm:

• The E-step maximizes L(q, θ) with respect to q(Z) for fixed θ; the maximum is at q(Z) = p(Z|X, θ), where the KL term vanishes.
• The M-step maximizes L(q, θ) with respect to θ for fixed q(Z).



SLIDE 22

Summary

• Expectation maximization is widely used in robotics and in estimation generally.
• It is basically iterative generation of a model and optimization of that model.
• It is particularly useful for estimation with mixture models: optimize the component models and the mixing coefficients iteratively rather than in batch.
• An important tool to have available for estimation and learning.


SLIDE 23

A useful reference

• M. J. Wainwright & M. I. Jordan, "Graphical Models, Exponential Families and Variational Inference," Foundations and Trends in Machine Learning, Vol. 1, No. 1-2, 2008. http://www.nowpublishers.com/product.aspx?product=MAL
