SLIDE 1 Lecture 12: Expectation Maximization & Variational Inference

Scribes: Daniel Zeiberg, Alesia Chernihova
SLIDE 2 Maximum Likelihood Estimation in GMMs

Easy: estimate $\theta$ when $z$ is "observed":
$$\theta^* = \arg\max_\theta \log p(y, z; \theta)$$

Problem: we need to marginalize over $z$:
$$p(y; \theta) = \int dz\, p(y, z; \theta), \qquad \log p(y; \theta) = \log \int dz\, p(y, z; \theta)$$

The integral throws a spanner in the works: the log no longer acts directly on $p(y, z; \theta)$, so the maximization does not decompose over clusters.
SLIDE 3 Expectation Maximization

Objective: $\theta^* = \arg\max_\theta \log p(y; \theta)$

Repeat until convergence ($\theta$ unchanged):

1. For $n$ in $1, \ldots, N$ (responsibilities, i.e. soft assignments of points to clusters):
$$\gamma_{nh} = \mathbb{E}[\mathbb{I}(z_n = h)] = \int dz_n\, p(z_n \mid y_n; \theta)\, \mathbb{I}(z_n = h)$$

2. For $h$ in $1, \ldots, K$:
$$\mu_h = \frac{1}{N_h} \sum_{n=1}^N \gamma_{nh}\, y_n \qquad \text{(empirical mean)}$$
$$\Sigma_h = \frac{1}{N_h} \sum_{n=1}^N \gamma_{nh}\, (y_n - \mu_h)(y_n - \mu_h)^\top \qquad \text{(empirical covariance)}$$
$$N_h = \sum_{n=1}^N \gamma_{nh} \quad \text{(\# points in cluster } h\text{)}, \qquad \pi_h = \frac{N_h}{N} \quad \text{(fraction in cluster } h\text{)}$$
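The two steps above can be sketched in code. The following is a minimal NumPy/SciPy implementation, not the lecture's reference code; the quantile-based initialization and the small ridge term on the covariances are my own choices for reproducibility and numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(y, K, n_iter=50):
    """Minimal EM for a Gaussian mixture: y is (N, D), K is the number of clusters."""
    N, D = y.shape
    # Deterministic initialization: spread initial means across the data range.
    mu = np.quantile(y, np.linspace(0.1, 0.9, K), axis=0)
    Sigma = np.stack([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_{nh} = p(z_n = h | y_n; theta)
        log_r = np.stack(
            [np.log(pi[h]) + multivariate_normal.logpdf(y, mu[h], Sigma[h])
             for h in range(K)], axis=1)                    # (N, K)
        log_r -= log_r.max(axis=1, keepdims=True)           # for numerical stability
        gamma = np.exp(log_r)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: weighted empirical moments
        Nh = gamma.sum(axis=0)                              # (K,)
        mu = (gamma.T @ y) / Nh[:, None]
        for h in range(K):
            d = y - mu[h]
            Sigma[h] = (gamma[:, h, None] * d).T @ d / Nh[h] + 1e-6 * np.eye(D)
        pi = Nh / N
    return pi, mu, Sigma
```

On well-separated synthetic clusters this recovers the component means and weights within a few dozen iterations.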
SLIDE 4 Expectation Maximization: Example (Iteration 0) [figure]

SLIDE 5 Expectation Maximization: Example (Iteration 1) [figure]

SLIDE 6 Expectation Maximization: Example (Iteration 2) [figure]

SLIDE 7 Expectation Maximization: Example (Iteration 3) [figure]

SLIDE 8 Expectation Maximization: Example (Iteration 4) [figure]

SLIDE 9 Expectation Maximization: Example (Iteration 5) [figure]

SLIDE 10 Expectation Maximization: Example (Iteration 6) [figure]
SLIDE 11 Intermezzo: Jensen's Inequality

Convex functions: the area above the curve is a convex set, so the chord lies above the function:
$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2)$$

Concave functions: the area below the curve is a convex set, so the chord lies below the function:
$$f(t x_1 + (1-t) x_2) \ge t f(x_1) + (1-t) f(x_2)$$

Corollary (random variables): for concave $f$, $f(\mathbb{E}[x]) \ge \mathbb{E}[f(x)]$. In particular, since $\log$ is concave, $\log \mathbb{E}[x] \ge \mathbb{E}[\log x]$.
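The corollary is easy to check numerically. A quick sketch (my own example, not from the slides), using samples from an exponential distribution as the positive random variable:

```python
import numpy as np

# Jensen's inequality for the concave function log: log E[x] >= E[log x].
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # positive samples

lhs = np.log(np.mean(x))   # log E[x]  (about log 2 for Exp(scale=2))
rhs = np.mean(np.log(x))   # E[log x]  (about log 2 - Euler's gamma)
assert lhs >= rhs          # concave f puts f(E[x]) above E[f(x)]
```

The gap $\log \mathbb{E}[x] - \mathbb{E}[\log x]$ is exactly the slack that the lower bound on the next slide leaves open.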
SLIDE 12 Lower Bounds on Marginal Likelihoods

Idea: use Jensen's inequality to define a lower bound. For any distribution $q(z)$,
$$\log p(y) = \log \int dz\, p(y, z) = \log \int dz\, q(z)\, \frac{p(y, z)}{q(z)} = \log \mathbb{E}_{q(z)}\!\left[\frac{p(y,z)}{q(z)}\right] \ge \mathbb{E}_{q(z)}\!\left[\log \frac{p(y,z)}{q(z)}\right] =: \mathcal{L}[q]$$

Gaussian mixture model: $p(y; \theta) = \int dz\, p(y, z; \theta)$, so with a variational distribution $q(z; \gamma)$,
$$\mathcal{L}(\theta, \gamma) := \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right] \le \log p(y; \theta)$$
SLIDE 13 Intermezzo: Kullback-Leibler Divergence

$$\mathrm{KL}(q(x)\,\|\,p(x)) := \int dx\, q(x) \log \frac{q(x)}{p(x)}$$

Measures how much $q(x)$ deviates from $p(x)$.

Properties:
1. $\mathrm{KL}(q(x)\,\|\,p(x)) \ge 0$ (non-negativity). By Jensen's inequality,
$$-\mathrm{KL}(q\,\|\,p) = \mathbb{E}_{q}\!\left[\log \frac{p(x)}{q(x)}\right] \le \log \mathbb{E}_{q}\!\left[\frac{p(x)}{q(x)}\right] = \log \int dx\, p(x) = \log 1 = 0$$
2. $\mathrm{KL}(q(x)\,\|\,p(x)) = 0 \iff q(x) = p(x)$: when $q = p$,
$$\int dx\, q(x) \log \frac{q(x)}{p(x)} = \int dx\, p(x) \log 1 = 0$$
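Both properties can be checked numerically. A small sketch (my own example, not from the lecture) comparing a Monte Carlo estimate of the KL between two univariate Gaussians against the known closed form:

```python
import numpy as np

# KL(q || p) for q = N(mu_q, s_q^2), p = N(mu_p, s_p^2).
mu_q, s_q = 0.0, 1.0
mu_p, s_p = 1.0, 2.0

# Closed form: log(s_p/s_q) + (s_q^2 + (mu_q - mu_p)^2) / (2 s_p^2) - 1/2
kl_exact = np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2 * s_p**2) - 0.5

# Monte Carlo: E_q[log q(x) - log p(x)] with x ~ q.
rng = np.random.default_rng(0)
x = rng.normal(mu_q, s_q, size=1_000_000)
log_q = -0.5 * ((x - mu_q) / s_q) ** 2 - np.log(s_q) - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * ((x - mu_p) / s_p) ** 2 - np.log(s_p) - 0.5 * np.log(2 * np.pi)
kl_mc = np.mean(log_q - log_p)

assert kl_exact >= 0                    # property 1
assert abs(kl_mc - kl_exact) < 0.01     # estimator agrees with closed form
```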
SLIDE 14 KL Divergence vs Lower Bound

$$\mathcal{L}(\theta, \gamma) = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right] = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(z \mid y; \theta)\, p(y; \theta)}{q(z; \gamma)}\right]$$

Since $\log p(y; \theta)$ does not depend on $z$, we can rewrite the bound as
$$\mathcal{L}(\theta, \gamma) = \log p(y; \theta) - \mathrm{KL}\big(q(z; \gamma)\,\big\|\,p(z \mid y; \theta)\big)$$

Here $\log p(y; \theta)$ does not depend on $\gamma$; only the KL term depends on $\gamma$.

Implication: maximizing $\mathcal{L}(\theta, \gamma)$ w.r.t. $\gamma$ is equivalent to minimizing $\mathrm{KL}(q(z; \gamma)\,\|\,p(z \mid y; \theta))$.
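The decomposition above is an exact identity, which can be verified on a toy model where all expectations are finite sums. A sketch with a hypothetical discrete latent variable (my own example):

```python
import numpy as np

# Check L(theta, gamma) = log p(y) - KL(q || p(z|y)) on a K-state discrete model.
rng = np.random.default_rng(0)
K = 4
p_joint = rng.random(K)
p_joint *= 0.3 / p_joint.sum()   # p(y, z) for one fixed y, with p(y) = 0.3
p_y = p_joint.sum()              # marginal p(y)
p_post = p_joint / p_y           # posterior p(z | y)

q = rng.random(K)
q /= q.sum()                     # arbitrary variational distribution

elbo = np.sum(q * np.log(p_joint / q))   # E_q[log p(y,z) / q(z)]
kl = np.sum(q * np.log(q / p_post))      # KL(q || p(z|y))

assert abs(elbo - (np.log(p_y) - kl)) < 1e-12   # exact identity
assert elbo <= np.log(p_y) + 1e-12               # hence a lower bound
```

The identity holds for every $q$; the bound is tight exactly when $q$ equals the posterior, since then the KL term vanishes.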
SLIDE 15 Algorithm: Generalized Expectation Maximization

Objective: $\mathcal{L}(\theta, \gamma) = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right] \le \log p(y; \theta)$

Initialize $\theta, \gamma$. Repeat until $\mathcal{L}(\theta, \gamma)$ is unchanged:
1. Expectation step (computes expected sufficient statistics): $\gamma = \arg\max_\gamma \mathcal{L}(\theta, \gamma)$
2. Maximization step (maximizes $\theta$ given computed statistics): $\theta = \arg\max_\theta \mathcal{L}(\theta, \gamma)$
SLIDE 16 Algorithm: Generalized Expectation Maximization

Objective: $\mathcal{L}(\theta, \gamma) = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right] \le \log p(y; \theta)$

Initialize $\theta$ and $q(z_n = h; \gamma) = \gamma_{nh} = p(z_n = h \mid y_n; \theta)$.

Repeat until $\mathcal{L}(\theta, \gamma)$ is unchanged (the distribution $q$ determines the moments $\mathbb{E}[t(z)]$):
1. Expectation step (determine the distribution): $\gamma = \arg\max_\gamma \mathcal{L}(\theta, \gamma) = \arg\min_\gamma \mathrm{KL}\big(q(z; \gamma)\,\big\|\,p(z \mid y; \theta)\big)$
2. Maximization step: $\theta = \arg\max_\theta \mathcal{L}(\theta, \gamma)$
SLIDE 17 Maximization Step: Update Parameters

$$\mathcal{L}(\theta, \gamma) = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right]$$

For an exponential-family likelihood $p(y_n \mid z_n = h; \eta) = h(y_n) \exp\!\big(\eta_h^\top t(y_n) - A(\eta_h)\big)$:
$$\log p(y \mid z; \eta) = \sum_{n=1}^N \sum_{h=1}^K \mathbb{I}(z_n = h)\,\big[\eta_h^\top t(y_n) - A(\eta_h)\big] + \text{const}$$

Differentiating the bound, and using $\partial A(\eta_h)/\partial \eta_h = \mathbb{E}[t(y); \eta_h]$:
$$\frac{\partial \mathcal{L}}{\partial \eta_h} = \sum_{n=1}^N \mathbb{E}_{q(z;\gamma)}[\mathbb{I}(z_n = h)]\,\big(t(y_n) - \mathbb{E}[t(y); \eta_h]\big) = \sum_{n=1}^N \gamma_{nh}\,\big(t(y_n) - \mathbb{E}[t(y); \eta_h]\big)$$

with $N_h = \sum_{n=1}^N \gamma_{nh}$.
SLIDE 18 Maximization Step: Update Parameters

$$\frac{\partial \mathcal{L}}{\partial \eta_h} = \sum_{n=1}^N \gamma_{nh}\,\big(t(y_n) - \mathbb{E}[t(y); \eta_h]\big), \qquad N_h = \sum_{n=1}^N \gamma_{nh}$$

Setting the gradient to zero gives the maximum-likelihood update: match the model's moments to the expected sufficient statistics,
$$\mathbb{E}[t(y); \eta_h] = \frac{1}{N_h} \sum_{n=1}^N \gamma_{nh}\, t(y_n)$$
SLIDE 19 Algorithm: Generalized Expectation Maximization

Objective: $\mathcal{L}(\theta, \gamma) = \mathbb{E}_{q(z;\gamma)}\!\left[\log \frac{p(y, z; \theta)}{q(z; \gamma)}\right] \le \log p(y; \theta)$

Initialize $\theta$ and $q(z_n = h; \gamma) = \gamma_{nh} = p(z_n = h \mid y_n; \theta)$.

Repeat until $\mathcal{L}(\theta, \gamma)$ is unchanged (the distribution $q$ determines the moments $\mathbb{E}[t(z)]$):
1. Expectation step (determine the distribution): $\gamma = \arg\max_\gamma \mathcal{L}(\theta, \gamma) = \arg\min_\gamma \mathrm{KL}\big(q(z; \gamma)\,\big\|\,p(z \mid y; \theta)\big)$
2. Maximization step (use the computed sufficient statistics to update the parameters): $\theta = \arg\max_\theta \mathcal{L}(\theta, \gamma)$, by matching moments
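The moment-matching M-step can be illustrated for a single univariate Gaussian component, whose sufficient statistics are $t(y) = (y, y^2)$. A toy sketch with synthetic data and uniform responsibilities (my own example, not from the slides):

```python
import numpy as np

# M-step by moment matching: set the component's moments E[y], E[y^2]
# equal to the responsibility-weighted empirical averages.
rng = np.random.default_rng(0)
y = rng.normal(3.0, 2.0, size=50_000)   # data drawn from N(3, 2^2)
gamma = np.ones_like(y)                  # degenerate case: all weight on one cluster

Nh = gamma.sum()
m1 = (gamma * y).sum() / Nh              # matched first moment  -> E[y]
m2 = (gamma * y**2).sum() / Nh           # matched second moment -> E[y^2]
mu_h = m1
var_h = m2 - m1**2                       # variance recovered from the two moments
```

With true responsibilities from an E-step, the same two lines give the per-cluster mean and variance updates of Slide 3.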
SLIDE 20 Variational Inference

Idea: approximate the posterior by maximizing a variational lower bound,
$$p(z, \theta \mid y) = \frac{p(y, z, \theta)}{p(y)} \approx q(z, \theta; \lambda)$$

$$\mathcal{L}(\lambda) = \mathbb{E}_{q(z,\theta;\lambda)}\!\left[\log \frac{p(y, z, \theta)}{q(z, \theta; \lambda)}\right] = \log p(y) - \mathrm{KL}\big(q(z, \theta; \lambda)\,\big\|\,p(z, \theta \mid y)\big) \le \log p(y)$$

Maximizing $\mathcal{L}(\lambda)$ is the same as minimizing the KL.
SLIDE 21 Intuition: Minimizing KL Divergences

$$p(y, x_1, x_2) = p(y \mid x_1, x_2)\, p(x_1, x_2)$$

Mean-field approximation:
$$q(x_1, x_2) := q(x_1)\, q(x_2), \qquad q(x_1) := \mathrm{Norm}(x_1; \mu_1, \sigma_1^2), \qquad q(x_2) := \mathrm{Norm}(x_2; \mu_2, \sigma_2^2)$$

$$\mathcal{L}(\mu, \sigma) := \mathbb{E}_{q}\!\left[\log \frac{p(y, x_1, x_2)}{q(x_1, x_2)}\right] = \log p(y) - \mathrm{KL}\big(q(x_1, x_2)\,\big\|\,p(x_1, x_2 \mid y)\big)$$

[figure: correlated posterior $p(x_1, x_2 \mid y)$ vs the narrower factorized approximation]

Intuition: the (reverse) KL divergence under-approximates the variance.
SLIDE 22 Intuition: Minimizing KL Divergences

$$p(y, x_1, x_2) = p(y \mid x_1, x_2)\, p(x_1, x_2), \qquad q(x_1, x_2) := q(x_1)\, q(x_2)$$
$$q(x_1) := \mathrm{Norm}(x_1; \mu_1, \sigma_1^2), \qquad q(x_2) := \mathrm{Norm}(x_2; \mu_2, \sigma_2^2)$$

Compare the forward divergence $\mathrm{KL}(p(x_1, x_2 \mid y)\,\|\,q(x_1, x_2))$ with the reverse divergence
$$\mathrm{KL}\big(q(x_1, x_2)\,\big\|\,p(x_1, x_2 \mid y)\big) = \int dx_1\, dx_2\, q(x_1, x_2) \log \frac{q(x_1, x_2)}{p(x_1, x_2 \mid y)}$$

Wherever $p \to 0$ while $q > 0$, the integrand $q \log(q/p) \to \infty$.

Intuition: minimizing the reverse KL forces $q(x_1, x_2) \to 0$ wherever $p(x_1, x_2 \mid y) \to 0$.
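The forward/reverse contrast can be made concrete in one dimension. The sketch below (my own illustrative example, not the slides' figure) fits a single Gaussian $q$ to a bimodal target $p$ by grid search: the forward KL spreads $q$ over both modes, while the zero-forcing reverse KL locks onto one mode with a small variance.

```python
import numpy as np
from scipy.special import rel_entr   # rel_entr(a, b) = a*log(a/b), safe at a = 0

xs = np.linspace(-10, 10, 4001)
dx = xs[1] - xs[0]

def norm_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Bimodal target: two well-separated unit-variance modes.
p = 0.5 * norm_pdf(xs, -3, 1) + 0.5 * norm_pdf(xs, 3, 1)

best_fwd, best_rev = None, None
for mu in np.linspace(-4, 4, 81):
    for s in np.linspace(0.5, 5, 46):
        q = norm_pdf(xs, mu, s)
        fwd = np.sum(rel_entr(p, q)) * dx   # KL(p || q): mass-covering
        rev = np.sum(rel_entr(q, p)) * dx   # KL(q || p): zero-forcing
        if best_fwd is None or fwd < best_fwd[0]:
            best_fwd = (fwd, mu, s)
        if best_rev is None or rev < best_rev[0]:
            best_rev = (rev, mu, s)

assert best_fwd[2] > 2.5         # forward KL: wide q covering both modes
assert best_rev[2] < 1.5         # reverse KL: narrow q ...
assert abs(best_rev[1]) > 2.0    # ... centered on one mode, not between them
```

The forward-KL optimum is the moment-matched Gaussian (mean $0$, variance $\approx 10$ here), while the reverse-KL optimum sits on a single mode, matching the "under-approximates variance" intuition above.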
SLIDE 23 Algorithm: Variational Expectation Maximization

Define: $q(z, \theta; \phi_z, \phi_\theta) = q(z; \phi_z)\, q(\theta; \phi_\theta)$

Objective: $\mathcal{L}(\phi_z, \phi_\theta) = \mathbb{E}_{q(z,\theta)}\!\left[\log \frac{p(y, z, \theta)}{q(z, \theta; \phi_z, \phi_\theta)}\right] \le \log p(y)$

Repeat until $\mathcal{L}(\phi_z, \phi_\theta)$ converges (change smaller than some threshold):
1. Expectation step: $\phi_z = \arg\max_{\phi_z} \mathcal{L}(\phi_z, \phi_\theta)$ (analogous to the EM step for $\gamma$)
2. Maximization step: $\phi_\theta = \arg\max_{\phi_\theta} \mathcal{L}(\phi_z, \phi_\theta)$ (updates the distribution $q(\theta; \phi_\theta)$ instead of a point estimate $\theta$)
SLIDE 24 Example: Gaussian Mixture (Simplified)

Generative model [figure: graphical model]:
$$\mu_{h,d} \sim \mathrm{Norm}(\mu_{0,d}, s_0^2)$$
$$z_n \sim \mathrm{Discrete}(1/K, \ldots, 1/K)$$
$$y_n \mid z_n = h \sim \mathrm{Norm}(\mu_h, \sigma^2 I)$$
SLIDE 25 Model Selection

Marginal likelihood ("average evidence"):
$$\mathcal{L} \le \log p(y) = \log \int dz\, d\theta\, p(y, z, \theta)$$

[figure: lower bound $\mathcal{L}$ against the number of clusters, peaking at $K = 2$]

Intuition: we can avoid overfitting by keeping the model with the highest $\mathcal{L}$.
SLIDE 26 Gaussian Mixture: Derivation of Updates

$$\mathcal{L}(\gamma, m, s) = \mathbb{E}_{q(z;\gamma)\,q(\mu; m, s)}\!\left[\log \frac{p(y, z, \mu)}{q(z; \gamma)\, q(\mu; m, s)}\right]$$
$$= \mathbb{E}_{q(z)q(\mu)}\big[\log p(y, z, \mu)\big] - \mathbb{E}_{q(z)}\big[\log q(z; \gamma)\big] - \mathbb{E}_{q(\mu)}\big[\log q(\mu; m, s)\big]$$

(the first term depends on $\gamma, m, s$; the second on $\gamma$; the third on $m, s$)

E-step: solve $\partial \mathcal{L} / \partial \gamma = 0$. M-step: solve $\partial \mathcal{L} / \partial m = 0$ and $\partial \mathcal{L} / \partial s = 0$.
SLIDE 27 Gaussian Mixture: Derivation of Updates

Idea: exploit exponential families. With $p(y_n \mid z_n = h, \mu) = h(y_n) \exp\!\big(\eta_h^\top t(y_n)\big)$,
$$\mathbb{E}_{q(z;\gamma)\,q(\mu)}\big[\log p(y \mid z, \mu)\big] = \sum_{n=1}^N \sum_{h=1}^K \underbrace{\mathbb{E}_{q(z;\gamma)}[\mathbb{I}(z_n = h)]}_{\text{depends on } \gamma}\; \underbrace{\mathbb{E}_{q(\mu)}\big[\log p(y_n \mid z_n = h, \mu)\big]}_{\text{depends on } m, s}$$

and $\mathbb{E}_{q(z;\gamma)}[\mathbb{I}(z_n = h)] = \gamma_{nh}$.
SLIDE 28 Example: Gaussian Mixture (Simplified)

Generative model [figure: graphical model]:
$$\mu_{h,d} \sim \mathrm{Norm}(\mu_{0,d}, s_0^2), \qquad z_n \sim \mathrm{Discrete}(1/K, \ldots, 1/K), \qquad y_n \mid z_n = h \sim \mathrm{Norm}(\mu_h, \sigma^2 I)$$

Variational distribution:
$$q(\mu, z) = q(\mu)\, q(z), \qquad q(z) = \prod_{n=1}^N q(z_n), \qquad q(\mu) = \prod_{h=1}^K \mathrm{Norm}(\mu_h; m_h, s_h^2 I)$$

Updates:

E-step:
$$\gamma_{nh} \propto \exp\!\Big(\mathbb{E}_{q(\mu)}\big[\log p(y_n \mid z_n = h, \mu)\big]\Big)$$

M-step (conjugate Gaussian-mean update, with $N_h = \sum_n \gamma_{nh}$):
$$s_h^2 = \left(\frac{1}{s_0^2} + \frac{N_h}{\sigma^2}\right)^{-1}, \qquad m_h = s_h^2 \left(\frac{m_0}{s_0^2} + \frac{1}{\sigma^2} \sum_{n=1}^N \gamma_{nh}\, y_n\right)$$