


CSci 8980: Advanced Topics in Graphical Models
Mixture Models, EM, Exponential Families

Instructor: Arindam Banerjee
September 11, 2007


Incremental EM

Since the $z_i$ are independent, the optimal $\tilde{p}(Z) = \prod_i \tilde{p}(z_i)$
Sufficient to work with such $\tilde{p}$ in $F(\tilde{p}, \theta)$
Then $F(\tilde{p}, \theta) = \sum_i F_i(\tilde{p}_i, \theta)$, where $F_i(\tilde{p}_i, \theta) = E_{\tilde{p}_i}[\log p(x_i, z_i \mid \theta)] + H(\tilde{p}_i)$
Incremental algorithm that works one point at a time
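As a quick numeric illustration (not from the slides; data and parameter values are made up), the sketch below builds the per-point terms $F_i$ for a toy 2-component univariate Gaussian mixture and checks that, at the exact posteriors, $F(\tilde{p}, \theta) = \sum_i F_i$ equals the log-likelihood.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

rng = np.random.default_rng(0)
x = rng.normal(size=10)                                      # toy data
alpha = np.array([0.4, 0.6])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 0.5])

# log p(x_i, z_i = h | theta) for every point i and component h
log_joint = np.log(alpha) + np.column_stack([log_gauss(x, mu[h], sigma[h]) for h in range(2)])
log_lik = np.logaddexp(log_joint[:, 0], log_joint[:, 1])     # log p(x_i | theta)
post = np.exp(log_joint - log_lik[:, None])                  # exact posterior p(z_i | x_i, theta)

entropy = -np.sum(post * np.log(post), axis=1)               # H(p_i)
F_i = np.sum(post * log_joint, axis=1) + entropy             # E_{p_i}[log p(x_i, z_i)] + H(p_i)
print(np.allclose(F_i.sum(), log_lik.sum()))                 # True: F equals the log-likelihood
```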


Incremental EM (Contd.)

Basic Incremental EM

E-step: Choose a data item $i$ to be updated. Set $\tilde{p}_j^{(t)} = \tilde{p}_j^{(t-1)}$ for $j \neq i$, and set $\tilde{p}_i^{(t)} = p(z_i \mid x_i, \theta^{(t-1)})$
M-step: Set $\theta^{(t)}$ to $\arg\max_\theta E_{\tilde{p}^{(t)}}[\log p(x, z \mid \theta)]$

The M-step needs to look at all components of $\tilde{p}$
Can be simplified by using sufficient statistics
For a distribution $p(x \mid \theta)$, $s(x)$ is a sufficient statistic if $p(x \mid s(x), \theta) = p(x \mid s(x))$, which implies $p(x \mid \theta) = h(x)\, q(s(x) \mid \theta)$
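A minimal sketch of this loop for a 2-component univariate Gaussian mixture (an assumed example, not course code): each iteration refreshes the responsibility of a single point, and the M-step then re-estimates all parameters from the full responsibility matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
n = len(x)

alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
resp = np.full((n, 2), 0.5)                      # cached p_i(z_i), initialized uniformly

def posterior(xi, alpha, mu, var):
    # p(z_i = h | x_i, theta) by Bayes' rule, computed stably
    log_p = np.log(alpha) - 0.5 * np.log(2 * np.pi * var) - (xi - mu)**2 / (2 * var)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

for t in range(5 * n):
    i = t % n                                    # choose one data item to update
    resp[i] = posterior(x[i], alpha, mu, var)    # incremental E-step: refresh only p_i
    Nh = resp.sum(axis=0)                        # M-step: uses all cached responsibilities
    alpha = Nh / n
    mu = resp.T @ x / Nh
    var = (resp * (x[:, None] - mu)**2).sum(axis=0) / Nh

print(alpha, mu, var)
```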


Incremental EM with Sufficient Statistics

EM with sufficient statistics

E-step: Set $\tilde{s}^{(t)} = E_{\tilde{p}}[s(x, z)]$, where $\tilde{p}(z) = p(z \mid x, \theta^{(t-1)})$
M-step: Set $\theta^{(t)}$ to the maximum-likelihood $\theta$ given $\tilde{s}^{(t)}$

Incremental EM with sufficient statistics

E-step: Choose a data item $i$ to be updated. Set $\tilde{s}_j^{(t)} = \tilde{s}_j^{(t-1)}$ for $j \neq i$, set $\tilde{s}_i^{(t)} = E_{\tilde{p}_i}[s_i(x_i, z_i)]$ where $\tilde{p}_i(z_i) = p(z_i \mid x_i, \theta^{(t-1)})$, and set $\tilde{s}^{(t)} = \tilde{s}^{(t-1)} - \tilde{s}_i^{(t-1)} + \tilde{s}_i^{(t)}$
M-step: Set $\theta^{(t)}$ to the maximum-likelihood $\theta$ given $\tilde{s}^{(t)}$
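A minimal sketch of the running-total bookkeeping (variable names and values are hypothetical): per-point expected sufficient statistics are cached, and the global total is updated by subtracting the stale contribution and adding the refreshed one.

```python
import numpy as np

def expected_stats(xi, post1):
    # For a 2-component univariate Gaussian mixture, with post1 = p(z_i = 1 | x_i):
    # E[s_i] for s_i = [z, 1-z, z*x, (1-z)*x, z*x^2, (1-z)*x^2]
    p1, p2 = post1, 1.0 - post1
    return np.array([p1, p2, p1 * xi, p2 * xi, p1 * xi**2, p2 * xi**2])

x = np.array([0.3, -1.2, 2.5, 0.9])
post = np.array([0.5, 0.5, 0.5, 0.5])                 # current p(z_i = 1 | x_i)
s_point = np.stack([expected_stats(xi, p) for xi, p in zip(x, post)])
s_total = s_point.sum(axis=0)                         # running total of sufficient statistics

i = 2                                                 # data item chosen for the update
new_post_i = 0.9                                      # stands in for p(z_i = 1 | x_i, theta^(t-1))
new_stats_i = expected_stats(x[i], new_post_i)
s_total = s_total - s_point[i] + new_stats_i          # s^(t) = s^(t-1) - s_i^(t-1) + s_i^(t)
s_point[i] = new_stats_i
print(s_total)
```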


Example

Consider a mixture of 2 univariate Gaussians
Parameter set $\theta = (\alpha, \mu_1, \sigma_1, \mu_2, \sigma_2)$
Sufficient statistics $s_i(x_i, z_i) = [\, z_i,\ (1 - z_i),\ z_i x_i,\ (1 - z_i) x_i,\ z_i x_i^2,\ (1 - z_i) x_i^2 \,]$
Given $s(x, z) = \sum_i s(x_i, z_i) = (n_1, n_2, m_1, m_2, q_1, q_2)$:
$$\alpha = \frac{n_1}{n_1 + n_2}, \qquad \mu_h = \frac{m_h}{n_h}, \qquad \sigma_h^2 = \frac{q_h}{n_h} - \left(\frac{m_h}{n_h}\right)^2$$
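To make the mapping concrete, here is a small sketch (with made-up aggregate values) that recovers $(\alpha, \mu_h, \sigma_h^2)$ from $(n_1, n_2, m_1, m_2, q_1, q_2)$.

```python
import numpy as np

# Aggregated sufficient statistics (illustrative numbers):
n = np.array([40.0, 60.0])     # n_h = sum_i E[z_ih]
m = np.array([-80.0, 120.0])   # m_h = sum_i E[z_ih] x_i
q = np.array([200.0, 300.0])   # q_h = sum_i E[z_ih] x_i^2

alpha = n[0] / n.sum()         # mixing weight of component 1
mu = m / n                     # component means
var = q / n - (m / n)**2       # component variances
print(alpha, mu, var)
```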


Sparse EM

Consider a mixture model with many components
Most $p(z \mid x, \theta)$ will be negligibly small
Computation can be saved by freezing these
Only a small set of component posteriors needs to be updated:
$$\tilde{p}^{(t)}(z) = \begin{cases} q_z^{(t)} & \text{if } z \in S_t \\ Q^{(t)} r_z^{(t)} & \text{if } z \notin S_t \end{cases}$$
$S_t$ = the set of plausible values
Can be determined by a reasonable heuristic
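A minimal sketch of the idea (the thresholding rule and all values are assumptions, not from the slides): only components whose current responsibility exceeds a cutoff are recomputed; the rest keep their previous values, and the recomputed ones are rescaled so the posterior still sums to one.

```python
import numpy as np

resp = np.array([0.70, 0.25, 0.03, 0.01, 0.01])    # previous posterior p(z | x) over 5 components
S = np.where(resp > 0.05)[0]                        # plausible set S_t (cutoff is a heuristic)
frozen = np.setdiff1d(np.arange(len(resp)), S)

# Stand-ins for freshly computed unnormalized posteriors alpha_h * p_h(x); only entries in S are used
unnorm_new = np.array([2.0, 1.0, 0.4, 0.05, 0.05])

new = resp.copy()                                   # frozen components keep their old values
frozen_mass = resp[frozen].sum()
new[S] = unnorm_new[S] / unnorm_new[S].sum() * (1.0 - frozen_mass)
print(new, new.sum())                               # still sums to 1
```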


Other Variants

Generalized EM

The M-step finds $\theta^{(t)} = \arg\max_\theta E_{\tilde{p}}[\log p(x, z \mid \theta)]$
Instead, find a $\theta^{(t)}$ such that $E_{\tilde{p}}[\log p(x, z \mid \theta^{(t)})] \geq E_{\tilde{p}}[\log p(x, z \mid \theta^{(t-1)})]$

Hard assignments

Winner-take-all variant of EM
Assign probability 1 to one component and zero to all others
Hard clustering, equivalent to k-means
Does not directly optimize $L(\theta)$, but optimizes a lower bound on $L(\theta)$
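A minimal sketch of the winner-take-all variant for two spherical clusters (an assumed setting): the E-step becomes a 0/1 assignment to the closest center and the M-step becomes the per-cluster mean, which is exactly the k-means update.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
centers = X[rng.choice(len(X), 2, replace=False)]   # initialize with 2 random points

for _ in range(10):
    # Hard E-step: assign each point wholly to its closest center
    d = ((X[:, None, :] - centers[None, :, :])**2).sum(axis=2)
    z = d.argmin(axis=1)
    # M-step: each center becomes the mean of its assigned points (the k-means update)
    centers = np.array([X[z == h].mean(axis=0) for h in range(2)])

print(centers)
```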


Auxiliary Functions

Consider the problem of minimizing $F(x)$
$G(x, x')$ is an auxiliary function for $F(x)$ if $G(x, x') \geq F(x)$ and $G(x, x) = F(x)$
$F$ is non-increasing under the updates $x^{(t)} = \arg\min_x G(x, x^{(t-1)})$
By definition, $F(x^{(t)}) \leq G(x^{(t)}, x^{(t-1)}) \leq G(x^{(t-1)}, x^{(t-1)}) = F(x^{(t-1)})$
The sequence is guaranteed to converge to a local minimum
The argument reverses for maximization problems
EM updates are a special case of this general technique
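A minimal sketch of the auxiliary-function idea on a toy objective (everything here is invented for illustration): a quadratic upper bound $G$ built at the current iterate satisfies $G(x, x') \geq F(x)$ and $G(x, x) = F(x)$, so minimizing it repeatedly drives $F$ downhill.

```python
import numpy as np

def F(x):  return np.cos(x) + 0.25 * x**2            # toy objective, F''(x) <= 1.5 everywhere
def dF(x): return -np.sin(x) + 0.5 * x

L = 1.5                                              # curvature bound used by the majorizer
x = 4.0
for t in range(20):
    # G(y, x) = F(x) + dF(x)(y - x) + (L/2)(y - x)^2 satisfies G >= F and G(x, x) = F(x);
    # its minimizer is a damped gradient step.
    x_new = x - dF(x) / L
    assert F(x_new) <= F(x) + 1e-12                  # F(x_t) <= G(x_t, x_{t-1}) <= F(x_{t-1})
    x = x_new

print(x, F(x))
```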


Mixture of Gaussians

For multivariate Gaussians, each component is
$$p_h(x \mid \mu_h, \Sigma_h) = \frac{1}{(2\pi)^{d/2} |\Sigma_h|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h) \right)$$
The Mixture of Gaussians (MoG) model:
$$p(x \mid \alpha, \Theta) = \sum_{h=1}^{k} \alpha_h\, p_h(x \mid \mu_h, \Sigma_h)$$
One of the most widely used mixture models
Recent years have seen progress on non-EM algorithms
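A small sketch (parameter values are arbitrary) that evaluates the MoG density at a batch of points with plain NumPy.

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    # p_h(x | mu, Sigma) for each row of X
    d = len(mu)
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    norm = (2 * np.pi)**(d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

alpha = np.array([0.3, 0.7])
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]

X = np.array([[0.0, 0.0], [3.0, 3.0], [1.5, 1.5]])
density = sum(a * gauss_pdf(X, m, S) for a, m, S in zip(alpha, mus, Sigmas))
print(density)                                        # p(x | alpha, Theta) at each row of X
```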


EM for Mixture of Gaussians: E-step

The E-step is a direct application of Bayes' rule:
$$p(h \mid x, \alpha, \Theta) = \frac{\alpha_h\, p_h(x \mid \mu_h, \Sigma_h)}{\sum_{h'=1}^{k} \alpha_{h'}\, p_{h'}(x \mid \mu_{h'}, \Sigma_{h'})}$$
Use the current parameter values on the r.h.s.
Incremental and sparse variants can be applied in practice
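A minimal sketch of the responsibility computation for a 2-component univariate mixture (placeholder parameters), done in log space for numerical stability.

```python
import numpy as np

x = np.array([-2.1, 0.3, 2.4, 1.7])
alpha = np.array([0.4, 0.6])
mu = np.array([-2.0, 2.0])
var = np.array([1.0, 0.5])

# log of alpha_h * p_h(x_i) for every point i and component h
log_comp = (np.log(alpha)
            - 0.5 * np.log(2 * np.pi * var)
            - (x[:, None] - mu)**2 / (2 * var))
log_norm = np.logaddexp.reduce(log_comp, axis=1)      # log sum_h alpha_h p_h(x_i)
resp = np.exp(log_comp - log_norm[:, None])           # p(h | x_i, alpha, Theta)
print(resp)                                            # each row sums to 1
```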


EM for Mixture of Gaussians: M-step

The auxiliary function is
$$Q(\theta, \theta^{(t-1)}) = \sum_i \sum_h \log(\alpha_h)\, p(h \mid x_i, \theta^{(t-1)}) + \sum_i \sum_h \log p_h(x_i \mid \mu_h, \Sigma_h)\, p(h \mid x_i, \theta^{(t-1)})$$
Optimize over $(\alpha_h, \mu_h, \Sigma_h)$ for $h = 1, \ldots, k$
$\alpha$ is a discrete distribution, which gives an additional constraint
Focus on the first term for $\alpha_h$; this holds for all mixtures
Focus on the second term for $(\mu_h, \Sigma_h)$


EM for Mixture of Gaussians: M-step (Contd.)

For any finite mixture model,
$$\alpha_h = \frac{1}{N} \sum_{i=1}^{N} p(h \mid x_i, \theta^{(t-1)})$$
For the Mixture of Gaussians,
$$\mu_h = \frac{\sum_i x_i\, p(h \mid x_i, \theta^{(t-1)})}{\sum_i p(h \mid x_i, \theta^{(t-1)})}, \qquad \Sigma_h = \frac{\sum_i p(h \mid x_i, \theta^{(t-1)})\, (x_i - \mu_h)(x_i - \mu_h)^T}{\sum_i p(h \mid x_i, \theta^{(t-1)})}$$
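A minimal sketch of these M-step updates given an $n \times k$ responsibility matrix (data and responsibilities are random placeholders).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))                 # n points in d = 2 dimensions
R = rng.dirichlet(np.ones(3), size=100)       # stand-in responsibilities p(h | x_i), k = 3

Nh = R.sum(axis=0)                            # effective counts per component
alpha = Nh / len(X)                           # alpha_h = (1/N) sum_i p(h | x_i)
mu = (R.T @ X) / Nh[:, None]                  # responsibility-weighted means

Sigma = []
for h in range(R.shape[1]):
    diff = X - mu[h]
    Sigma.append((R[:, h, None] * diff).T @ diff / Nh[h])   # weighted covariance
Sigma = np.array(Sigma)
print(alpha, mu, Sigma.shape)
```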

Exponential Family Distributions

Multivariate parametric distributions of the form $p_\psi(x \mid \theta) = \exp(x^T\theta - \psi(\theta))\, p_0(x)$
$x$ is the sufficient statistic
$\theta$ is the natural parameter
$\psi(\cdot)$ is the cumulant or log-partition function
Expectation parameter $\mu = E[X] = \nabla\psi(\theta)$
Examples: Gaussian, Bernoulli, Poisson, Multinomial, Dirichlet
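A small numeric check for the Bernoulli case (values arbitrary): with $\psi(\theta) = \log(1 + e^\theta)$, the density normalizes and the mean parameter $\nabla\psi(\theta)$ matches $E[X]$ computed directly.

```python
import numpy as np

theta = 0.7
psi = np.log1p(np.exp(theta))                      # cumulant of the Bernoulli family
p = np.exp(np.array([0.0, 1.0]) * theta - psi)     # p(x | theta) for x = 0 and x = 1
print(p.sum())                                     # 1.0: the density normalizes

mean_direct = p[1]                                 # E[X]
eps = 1e-6
grad_psi = (np.log1p(np.exp(theta + eps)) - np.log1p(np.exp(theta - eps))) / (2 * eps)
print(mean_direct, grad_psi)                       # both equal sigmoid(theta)
```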


The Cumulant Function

The Laplace transform viewpoint:
$$L(\theta) = \exp(\psi(\theta)) = \int_x \exp(x^T\theta)\, p_0(x)\, dx = E_{p_0}[\exp(x^T\theta)]$$
Hölder's inequality implies: for $1 \leq p, q \leq \infty$ with $1/p + 1/q = 1$, $\;E[|X|^p]^{1/p}\, E[|Y|^q]^{1/q} \geq E[|XY|]$
Hence
$$\lambda\psi(\theta_1) + (1 - \lambda)\psi(\theta_2) = \log\!\left( E_{p_0}[\exp(x^T\theta_1)]^{\lambda}\, E_{p_0}[\exp(x^T\theta_2)]^{1-\lambda} \right) \geq \log\!\left( E_{p_0}[\exp(x^T(\lambda\theta_1 + (1 - \lambda)\theta_2))] \right) = \psi(\lambda\theta_1 + (1 - \lambda)\theta_2)$$
The cumulant $\psi(\theta)$ is a convex function
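A quick numeric check of the convexity claim for the Bernoulli cumulant $\psi(\theta) = \log(1 + e^\theta)$ (values arbitrary).

```python
import numpy as np

psi = lambda t: np.log1p(np.exp(t))
t1, t2 = -1.3, 2.0
for lam in np.linspace(0, 1, 11):
    lhs = lam * psi(t1) + (1 - lam) * psi(t2)
    rhs = psi(lam * t1 + (1 - lam) * t2)
    assert lhs >= rhs - 1e-12          # lambda psi(t1) + (1-lambda) psi(t2) >= psi(convex comb.)
print("convexity holds on the tested grid")
```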


Maximum Likelihood Estimation, Conjugate

Let $s = s(x)$ be the sufficient statistic for a set of points $x$
Maximizing the log-likelihood then gives $\phi(s) = \max_\theta\, (s^T\theta - \psi(\theta))$
It has a unique maximizer since $\psi(\theta)$ is convex
The conjugate of $\psi$ is $\phi(s) = \sup_\theta\, (s^T\theta - \psi(\theta))$
$\phi$ is a convex function of $s$
Technically, $\psi$ and $\phi$ are "Legendre" functions
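A small sketch (again the Bernoulli case) that computes the conjugate $\phi(s) = \sup_\theta (s\theta - \psi(\theta))$ on a grid and compares it with the closed form $s\log s + (1 - s)\log(1 - s)$, the negative entropy.

```python
import numpy as np

psi = lambda t: np.log1p(np.exp(t))               # Bernoulli cumulant
thetas = np.linspace(-20, 20, 20001)              # grid over natural parameters

s = 0.3
phi_grid = np.max(s * thetas - psi(thetas))       # numeric sup over the grid
phi_closed = s * np.log(s) + (1 - s) * np.log(1 - s)
print(phi_grid, phi_closed)                       # approximately equal
```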


Mixtures of Exponential Family Distributions

A finite mixture model:
$$p(x \mid \alpha, \Theta) = \sum_{h=1}^{k} \alpha_h\, p_\psi(x \mid \theta_h)$$
$\psi$ determines the family
All mixture components are of the same family
$\theta$ determines the distribution within the family
Each component has different parameters


Mixtures of Exponential Family Distributions (Contd.)

E-step: exactly the same as before
$$\alpha_h = \frac{1}{N} \sum_{i=1}^{N} p(h \mid x_i, \theta^{(t-1)})$$
M-step: taking the gradient w.r.t. $\theta_h$,
$$\nabla\psi(\theta_h) = \frac{\sum_i x_i\, p(h \mid x_i, \theta^{(t-1)})}{\sum_i p(h \mid x_i, \theta^{(t-1)})}$$
$\nabla\psi$ is monotonically increasing, so its inverse is well defined
Recall the expression for $\mu_h$ for Gaussian mixtures
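To make the $\nabla\psi$ inverse concrete, here is a minimal sketch for a mixture of two Poisson components (an assumed example): $\psi(\theta) = e^\theta$, so $\nabla\psi(\theta) = e^\theta$ and the M-step sets $\theta_h$ to the log of the responsibility-weighted mean.

```python
import numpy as np
from math import lgamma

x = np.array([0, 1, 1, 2, 7, 8, 9, 10], dtype=float)
alpha = np.array([0.5, 0.5])
theta = np.log(np.array([1.0, 8.0]))                 # natural parameters (log rates)

# E-step: responsibilities from log p(x | theta_h) = x*theta_h - exp(theta_h) - log(x!)
log_fact = np.array([lgamma(xi + 1) for xi in x])
log_comp = np.log(alpha) + x[:, None] * theta - np.exp(theta) - log_fact[:, None]
resp = np.exp(log_comp - np.logaddexp.reduce(log_comp, axis=1)[:, None])

# M-step: alpha_h and the mean parameter, mapped back through (grad psi)^{-1} = log
alpha = resp.mean(axis=0)
mean_param = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)   # grad psi(theta_h)
theta = np.log(mean_param)
print(alpha, np.exp(theta))                           # mixture weights and Poisson rates
```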


Mixture Models as a Bayes Net

[Plate diagram: the mixture model as a Bayes net, with mixing weights π, latent assignment z and observation x inside a plate over the n data points, and component parameters θ inside a plate over the k components.]
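A minimal sketch of the generative process the plate diagram encodes (component parameters are arbitrary): draw $z_i$ from the mixing weights, then draw $x_i$ from the chosen component.

```python
import numpy as np

rng = np.random.default_rng(4)
pi = np.array([0.3, 0.7])                 # mixing weights (drawn once)
mu = np.array([-2.0, 2.0])                # component parameters, one per component (k plate)
sigma = np.array([0.5, 1.0])

n = 8
z = rng.choice(len(pi), size=n, p=pi)     # latent assignments (n plate)
x = rng.normal(mu[z], sigma[z])           # observations
print(z, x)
```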