SLIDE 13
The Baum-Welch algorithm
- The complete log likelihood
- The expected complete log likelihood
- EM
  - The E step
  - The M step ("symbolically" identical to MLE)
The complete log likelihood:

$$
\ell_c(\theta; \mathbf{x}, \mathbf{y}) = \log p(\mathbf{x}, \mathbf{y})
= \log \prod_n \left( p(y_{n,1}) \prod_{t=2}^{T} p(y_{n,t} \mid y_{n,t-1}) \prod_{t=1}^{T} p(x_{n,t} \mid y_{n,t}) \right)
$$
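To make the factorization concrete, here is a minimal NumPy sketch (function and array names are my own, not from the slides) that evaluates this complete log likelihood for a single sequence whose state path is known:

```python
import numpy as np

def complete_log_likelihood(x, y, pi, a, b):
    """log p(x, y) for one sequence with a known state path.
    x:  (T,) observation symbol indices
    y:  (T,) hidden state indices
    pi: (K,) initial state distribution
    a:  (K, K) transitions, a[i, j] = p(y_t = j | y_{t-1} = i)
    b:  (K, M) emissions,   b[i, k] = p(x_t = k | y_t = i)
    """
    ll = np.log(pi[y[0]])                  # log p(y_1)
    ll += np.log(a[y[:-1], y[1:]]).sum()   # sum_{t=2..T} log p(y_t | y_{t-1})
    ll += np.log(b[y, x]).sum()            # sum_{t=1..T} log p(x_t | y_t)
    return ll
```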
Taking expectations under the posterior p(y | x) gives the expected complete log likelihood (sums over the state and symbol indices i, j, k written explicitly):

$$
\langle \ell_c(\theta; \mathbf{x}, \mathbf{y}) \rangle
= \sum_n \sum_i \langle y_{n,1}^i \rangle_{p(y_{n,1} \mid \mathbf{x}_n)} \log \pi_i
+ \sum_n \sum_{t=2}^{T} \sum_{i,j} \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t} \mid \mathbf{x}_n)} \log a_{i,j}
+ \sum_n \sum_{t=1}^{T} \sum_{i,k} x_{n,t}^k \langle y_{n,t}^i \rangle_{p(y_{n,t} \mid \mathbf{x}_n)} \log b_{i,k}
$$
The E step computes the expected sufficient statistics:

$$
\gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 \mid \mathbf{x}_n)
$$

$$
\xi_{n,t}^{i,j} = \langle y_{n,t-1}^i \, y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1,\; y_{n,t}^j = 1 \mid \mathbf{x}_n)
$$
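In practice γ and ξ come out of the forward-backward algorithm. A minimal unscaled sketch (suitable for short sequences only, since the unscaled recursions underflow for long ones; names are illustrative):

```python
import numpy as np

def e_step(x, pi, a, b):
    """Forward-backward for one sequence x of integer symbol indices.
    Returns gamma (T, K) and xi (T-1, K, K), where xi[t] holds the pair
    posterior for times (t, t+1), i.e. the slide's xi indexed at t+1."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi * b[:, x[0]]
    for t in range(1, T):                       # forward recursion
        alpha[t] = (alpha[t - 1] @ a) * b[:, x[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward recursion
        beta[t] = a @ (b[:, x[t + 1]] * beta[t + 1])
    px = alpha[-1].sum()                        # likelihood p(x)
    gamma = alpha * beta / px                   # gamma[t, i] = p(y_t^i = 1 | x)
    # xi[t, i, j] = alpha[t, i] * a[i, j] * b[j, x[t+1]] * beta[t+1, j] / p(x)
    xi = alpha[:-1, :, None] * a[None] * (b[:, x[1:]].T * beta[1:])[:, None, :] / px
    return gamma, xi
```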
The M step then re-estimates the parameters from these expected counts:

$$
a_{ij}^{ML} = \frac{\sum_n \sum_{t=2}^{T} \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i},
\qquad
b_{ik}^{ML} = \frac{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i \, x_{n,t}^k}{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i},
\qquad
\pi_i^{ML} = \frac{\sum_n \gamma_{n,1}^i}{N}
$$
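With γ and ξ in hand, the M-step formulas above are just normalized expected counts. A sketch under the same assumed array conventions (m_step is my name, not the slides'):

```python
import numpy as np

def m_step(gammas, xis, xs, M):
    """Re-estimate (pi, a, b) from the posteriors, one list entry per sequence n.
    gammas: list of (T_n, K); xis: list of (T_n - 1, K, K); xs: list of (T_n,)."""
    K = gammas[0].shape[1]
    pi = sum(g[0] for g in gammas) / len(gammas)             # pi_i = sum_n gamma_{n,1}^i / N
    a = sum(xi.sum(axis=0) for xi in xis)                    # expected transition counts i -> j
    a /= sum(g[:-1].sum(axis=0) for g in gammas)[:, None]    # expected visits to state i, t < T
    b = np.zeros((K, M))
    for g, x in zip(gammas, xs):
        np.add.at(b.T, x, g)                                 # b[i, k] += gamma_{n,t}^i [x_{n,t} = k]
    b /= b.sum(axis=1, keepdims=True)                        # row sums equal sum_t gamma, as on the slide
    return pi, a, b
```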
Unsupervised ML estimation
- Given x = x1…xN for which the true state path y = y1…yN is unknown:
0. Start with our best guess of a model M and parameters θ.
1. Estimate Aij, Bik in the training data:

$$
A_{ij} = \sum_{n,t} \langle y_{n,t-1}^i \, y_{n,t}^j \rangle,
\qquad
B_{ik} = \sum_{n,t} \langle y_{n,t}^i \rangle \, x_{n,t}^k
$$

2. Update θ according to Aij, Bik
   - Now a "supervised learning" problem
3. Repeat 1 & 2, until convergence.

This is called the Baum-Welch algorithm. We can get to a provably more (or equally) likely parameter set θ each iteration; a sketch of the full loop follows below.
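Putting the steps together, a minimal Baum-Welch loop, assuming the e_step and m_step sketches from the previous slide (random initialization and a fixed iteration count are arbitrary choices here, not prescribed by the slides):

```python
import numpy as np

def baum_welch(xs, K, M, n_iter=100, seed=0):
    """Unsupervised ML estimation for a discrete-emission HMM.
    xs: list of (T_n,) integer observation arrays; K states, M symbols."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(K))             # step 0: best-guess parameters theta
    a = rng.dirichlet(np.ones(K), size=K)
    b = rng.dirichlet(np.ones(M), size=K)
    for _ in range(n_iter):                    # step 3: repeat until convergence
        # step 1 (E step): expected counts under the current theta
        gammas, xis = zip(*(e_step(x, pi, a, b) for x in xs))
        # step 2 (M step): update theta; now a "supervised" re-estimation
        pi, a, b = m_step(list(gammas), list(xis), list(xs), M)
    return pi, a, b
```

Since each pass can only increase (or keep equal) the data likelihood, a practical convergence check is to stop once the improvement in log p(x) falls below a tolerance rather than running a fixed number of iterations.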