Lecture 4
Hidden Markov Models
Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Watson Group IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
$\arg\min_{w \in \text{vocab}} \text{distance}(A'_{\text{test}}, A'_w)$
$\arg\max_{w \in \text{vocab}} P(A'_{\text{test}} \mid w)$
Exponent of the $j$-th Gaussian component: $-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)$
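A minimal Python sketch of evaluating this exponent inside a Gaussian log-density, assuming a diagonal covariance (so $\Sigma_j^{-1}$ reduces to the reciprocals of per-dimension variances) and combining mixture components with log-sum-exp. The function names `diag_gaussian_logpdf` and `gmm_logpdf` and the example mixture are hypothetical; a full-covariance model would need a matrix inverse and determinant instead.

```python
import math
from typing import Sequence


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)).

    With a diagonal covariance, the exponent -1/2 (x - mu)^T Sigma^{-1} (x - mu)
    becomes -1/2 * sum_k (x_k - mu_k)^2 / var_k.
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    d = len(x)
    # Normalizer: -(d/2) log(2*pi) - (1/2) log|Sigma|, where |Sigma| = prod_k var_k.
    log_norm = -0.5 * d * math.log(2.0 * math.pi) - 0.5 * sum(math.log(v) for v in var)
    exponent = -0.5 * sum((xk - mk) ** 2 / vk for xk, mk, vk in zip(x, mean, var))
    return log_norm + exponent


def gmm_logpdf(x, weights, means, variances) -> float:
    """log sum_j p_j N(x; mu_j, diag(var_j)); weights must be positive.

    Terms are combined with log-sum-exp so that tiny densities do not underflow.
    """
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # Hypothetical 2-D, two-component mixture (not taken from the lecture).
    print(gmm_logpdf([0.5, -0.2], [0.4, 0.6],
                     [[0.0, 0.0], [1.0, -1.0]],
                     [[1.0, 1.0], [0.5, 2.0]]))
```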
$\arg\max_{w \in \text{vocab}} P(A'_{\text{test}} \mid w)$
$\log P(x_1^N) = \log\left(p_H^{H}\, p_T^{T}\right) = H \log p_H + T \log(1 - p_H)$
$p_H = \frac{H}{N}, \quad p_T = \frac{T}{N}$, where $N = H + T$.
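The closed form can be checked numerically. This sketch, with hypothetical helper names and hypothetical counts $H = 7$, $T = 3$, evaluates the coin log-likelihood and confirms that a grid search over $p_H$ lands on $H/N$.

```python
import math


def coin_log_likelihood(p_heads: float, heads: int, tails: int) -> float:
    """log P(x_1^N) = H log p_H + T log(1 - p_H), for 0 < p_H < 1."""
    return heads * math.log(p_heads) + tails * math.log(1.0 - p_heads)


def coin_mle(heads: int, tails: int) -> tuple:
    """Closed-form maximum-likelihood estimates p_H = H/N and p_T = T/N, with N = H + T."""
    n = heads + tails
    if n == 0:
        raise ValueError("need at least one flip")
    return heads / n, tails / n


if __name__ == "__main__":
    H, T = 7, 3  # hypothetical counts, for illustration only
    p_h, p_t = coin_mle(H, T)
    # A coarse grid search over p_H agrees with the closed form.
    grid = [i / 100 for i in range(1, 100)]
    best = max(grid, key=lambda p: coin_log_likelihood(p, H, T))
    print(p_h, p_t, best)  # 0.7 0.3 0.7
```

Because the log-likelihood is concave in $p_H$, the grid maximum sits next to (here exactly at) the analytic optimum.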
[Figure: arcs labeled H 0.9, T 0.1, T 0.8, H 0.2.]
$\sum_{x_i} p_{x_{i-1},x_i} = 1$ for all $x_{i-1}$.
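[Figure: Markov model over the symbols C, R, W with a start state; each arc is labeled with the symbol it enters and its probability, e.g. W/$p_{R,W}$.]

Transition probabilities $p_{x_{i-1},x_i}$ (rows: $x_{i-1}$; columns: $x_i$):

         C     R     W
start   0.5   0.2   0.3
C       0.1   0.7   0.2
R       0.4   0.1   0.5
W       0.1   0.6   0.3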
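A short sketch of how the weather model assigns a probability to a sequence: multiply one transition probability per symbol, starting from the start state. The dictionary `P` transcribes the arc labels of the weather figure (C, R, W plus a start state); the function names and the test sequence CCR are illustrative assumptions.

```python
import itertools

# Transition probabilities p_{x_{i-1}, x_i} from the weather model's arc labels;
# "start" plays the role of x_0.
P = {
    "start": {"C": 0.5, "R": 0.2, "W": 0.3},
    "C": {"C": 0.1, "R": 0.7, "W": 0.2},
    "R": {"C": 0.4, "R": 0.1, "W": 0.5},
    "W": {"C": 0.1, "R": 0.6, "W": 0.3},
}


def is_normalized(p, tol=1e-9) -> bool:
    """Check sum_{x_i} p_{x_{i-1}, x_i} = 1 for every x_{i-1}."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in p.values())


def sequence_prob(seq, p) -> float:
    """P(x_1^N) = prod_i p_{x_{i-1}, x_i}, with x_0 = "start"."""
    prob, prev = 1.0, "start"
    for sym in seq:
        prob *= p[prev][sym]
        prev = sym
    return prob


if __name__ == "__main__":
    print(is_normalized(P))
    print(sequence_prob("CCR", P))  # = 0.5 * 0.1 * 0.7, about 0.035
    # Because every row sums to 1, the probabilities of all length-3 sequences sum to 1.
    print(sum(sequence_prob(s, P) for s in itertools.product("CRW", repeat=3)))
```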
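$P(x_1^N) = \prod_{x_{i-1},x_i} p_{x_{i-1},x_i}^{\,c(x_{i-1},x_i)}$, where $c(x_{i-1},x_i)$ counts how often $x_i$ follows $x_{i-1}$ in the training data.
$p^{\text{MLE}}_{x_{i-1},x_i} = \frac{c(x_{i-1},x_i)}{\sum_{x} c(x_{i-1},x)}$, for example $p^{\text{MLE}}_{R,C} = \frac{c(R,C)}{\sum_{x} c(R,x)}$.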
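Counts $c(x_{i-1},x_i)$ from the training data (rows: $x_{i-1}$; columns: $x_i$):

         C    R    W   total
start    0    0    1     1
C       18    6    2    26
R        5   16    1    22
W        4    0    2     6

Resulting maximum-likelihood probabilities:

          C      R      W
start   0.000  0.000  1.000
C       0.692  0.231  0.077
R       0.227  0.727  0.045
W       0.667  0.000  0.333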
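Maximum-likelihood estimation for this model is row normalization of the count table. The sketch below (hypothetical function name `mle_transitions`) transcribes the weather counts and divides each row by its total.

```python
# Bigram counts c(x_{i-1}, x_i) from the weather training data (rows: x_{i-1}).
COUNTS = {
    "start": {"C": 0, "R": 0, "W": 1},
    "C": {"C": 18, "R": 6, "W": 2},
    "R": {"C": 5, "R": 16, "W": 1},
    "W": {"C": 4, "R": 0, "W": 2},
}


def mle_transitions(counts):
    """p^MLE_{x_{i-1}, x_i} = c(x_{i-1}, x_i) / sum_x c(x_{i-1}, x)."""
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        if total == 0:
            # The MLE is undefined for a context that never occurs in the data.
            raise ValueError(f"no transitions observed out of {prev!r}")
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


if __name__ == "__main__":
    for prev, row in mle_transitions(COUNTS).items():
        print(prev, {nxt: round(p, 3) for nxt, p in row.items()})
```

Rounded to three decimals, the output reproduces the probability table (e.g. $16/22 \approx 0.727$ for $R \to R$). It also shows a known weakness of the MLE: transitions never seen in training, such as $W \to R$, get probability 0.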
[Figure: arcs labeled C/0.4 and W/0.6.]
[Figure: HMM with states c and w; arcs are labeled with an output symbol and probability: C/0.1, W/0.1, C/0.1, W/0.1, C/0.6, W/0.2, C/0.2, W/0.6.]
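$\alpha(S,t) = \sum_{p \in \mathcal{P}(S,t)} P(p)$, where $\mathcal{P}(S,t)$ is the set of paths that output $x_1^t$ and end in state $S$ at time $t$
$= \sum_{p' \in \mathcal{P}(S',t-1),\; S' \xrightarrow{x_t} S} P(p') \times p_{S' \xrightarrow{x_t} S}$
$= \sum_{S' \xrightarrow{x_t} S} p_{S' \xrightarrow{x_t} S} \times \sum_{p' \in \mathcal{P}(S',t-1)} P(p')$
$= \sum_{S' \xrightarrow{x_t} S} p_{S' \xrightarrow{x_t} S} \times \alpha(S', t-1)$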
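The forward recursion can be sketched in Python as follows. Several details are assumptions because the extracted slides do not preserve them: arcs are stored as (source, destination, symbol, probability) tuples; the c/w arc labels are assigned to arcs in one plausible way (each state's outgoing probabilities sum to 1); paths start in state c; and any state may end the sequence. A brute-force sum over all paths is included to check that $\sum_S \alpha(S,T)$ equals $\sum_p P(p)$.

```python
from collections import defaultdict

# Each arc is (source S', destination S, output symbol x, probability p_{S' -x-> S}).
# Hypothetical reading of the c/w example: which label belongs to which arc is assumed.
ARCS = [
    ("c", "c", "C", 0.6), ("c", "c", "W", 0.2),
    ("c", "w", "C", 0.1), ("c", "w", "W", 0.1),
    ("w", "c", "C", 0.1), ("w", "c", "W", 0.1),
    ("w", "w", "C", 0.2), ("w", "w", "W", 0.6),
]
INITIAL = {"c": 1.0}  # assumed: every path starts in state c at t = 0


def forward(arcs, initial, obs):
    """Return alpha[t][S] for t = 0..T, where
    alpha(S, t) = sum over arcs S' -x_t-> S of p_{S' -x_t-> S} * alpha(S', t-1)."""
    alpha = [dict(initial)]
    for x in obs:  # x is x_t for t = 1..T
        cur = defaultdict(float)
        for src, dst, sym, p in arcs:
            if sym == x:
                cur[dst] += p * alpha[-1].get(src, 0.0)
        alpha.append(dict(cur))
    return alpha


def brute_force_total(arcs, initial, obs):
    """Sum P(p) over every path that outputs obs; exponential time, for checking only."""
    total = 0.0

    def extend(state, t, prob):
        nonlocal total
        if t == len(obs):
            total += prob
            return
        for src, dst, sym, p in arcs:
            if src == state and sym == obs[t]:
                extend(dst, t + 1, prob * p)

    for state, p0 in initial.items():
        extend(state, 0, p0)
    return total


if __name__ == "__main__":
    obs = "CWC"  # hypothetical observation sequence
    alpha = forward(ARCS, INITIAL, obs)
    print(sum(alpha[-1].values()), brute_force_total(ARCS, INITIAL, obs))
```

The forward pass costs time linear in $T$ times the number of arcs, while the brute-force check grows exponentially with $T$.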
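[Figure: shortest-path example graph; labels 1, 2, 3, 4, 19, 1, 3, 3, 10, 1, 1.]
$d(S) = \min_{S' \to S} \{ d(S') + \text{distance}(S', S) \}$
$\hat\alpha(S,t) = \max_{S' \xrightarrow{x_t} S} p_{S' \xrightarrow{x_t} S} \times \hat\alpha(S', t-1)$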
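A minimal sketch of the shortest-path dynamic program $d(S) = \min_{S' \to S} \{ d(S') + \text{distance}(S', S) \}$ on a directed acyclic graph, visiting nodes in topological order so that every $d(S')$ is final before it is used. The graph below is hypothetical, since the figure's edge weights cannot be matched to edges from the extracted labels.

```python
import math


def shortest_distances(edges, source, order):
    """Dynamic programming on a DAG:
    d(S) = min over edges S' -> S of { d(S') + distance(S', S) }.

    edges: {(S', S): distance}; order: a topological ordering of all nodes.
    """
    incoming = {node: [] for node in order}
    for (u, v), w in edges.items():
        incoming[v].append((u, w))
    d = {node: math.inf for node in order}
    d[source] = 0.0
    for node in order:  # every predecessor of `node` is finalized before `node`
        for prev, w in incoming[node]:
            d[node] = min(d[node], d[prev] + w)
    return d


if __name__ == "__main__":
    # Hypothetical graph, not the one in the lecture's figure.
    edges = {("A", "B"): 1, ("A", "C"): 4, ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 3}
    print(shortest_distances(edges, "A", ["A", "B", "C", "D"]))
    # {'A': 0.0, 'B': 1.0, 'C': 3.0, 'D': 6.0}
```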
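Viterbi decoding uses the same trellis as the forward algorithm but takes a max over incoming arcs instead of a sum, and records which arc achieved it so the best path can be traced back. This is a sketch on a hypothetical c/w model: tuple-encoded arcs, an assumed arc-to-label assignment, paths starting in state c, and any state allowed to end. Ties are broken by arc order.

```python
# Arcs are (source, destination, output symbol, probability); hypothetical values
# standing in for the c/w example (the arc-to-label assignment is assumed).
ARCS = [
    ("c", "c", "C", 0.6), ("c", "c", "W", 0.2),
    ("c", "w", "C", 0.1), ("c", "w", "W", 0.1),
    ("w", "c", "C", 0.1), ("w", "c", "W", 0.1),
    ("w", "w", "C", 0.2), ("w", "w", "W", 0.6),
]
INITIAL = {"c": 1.0}  # assumed start state


def viterbi(arcs, initial, obs):
    """Best single path: replace the sum in the forward recursion by a max,
        hat_alpha(S, t) = max over arcs S' -x_t-> S of p_{S' -x_t-> S} * hat_alpha(S', t-1),
    keeping a backpointer to the maximizing predecessor."""
    best = dict(initial)  # hat_alpha(., 0)
    backptrs = []         # backptrs[t-1][S] = best predecessor of S at time t
    for x in obs:
        scores, ptr = {}, {}
        for src, dst, sym, p in arcs:
            if sym != x or src not in best:
                continue
            score = best[src] * p
            if score > scores.get(dst, 0.0):
                scores[dst], ptr[dst] = score, src
        best = scores
        backptrs.append(ptr)
    if not best:
        return 0.0, []  # no path can output obs
    state = max(best, key=best.get)
    score, path = best[state], [state]
    for ptr in reversed(backptrs):  # trace back from time T to time 0
        state = ptr[state]
        path.append(state)
    return score, path[::-1]


if __name__ == "__main__":
    print(viterbi(ARCS, INITIAL, "CC"))
```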
$\sum_{x, S'} p_{S \xrightarrow{x} S'} = 1$ for all $S$.
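$P(x_1^N) = \prod_{S \xrightarrow{x} S'} p_{S \xrightarrow{x} S'}^{\,c(S \xrightarrow{x} S')}$
$p^{\text{MLE}}_{S \xrightarrow{x} S'} = \frac{c(S \xrightarrow{x} S')}{\sum_{x', S''} c(S \xrightarrow{x'} S'')}$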
$c(S \xrightarrow{x} S') = \sum_{t=1,\; x_t = x}^{T} c(S \xrightarrow{x_t} S', t)$
$c(S \xrightarrow{x_t} S', t) = \frac{1}{P(x_1^T)} \times p_{S \xrightarrow{x_t} S'} \times \alpha(S, t-1) \times \beta(S', t)$
$\alpha(S,t) = \sum_{S' \xrightarrow{x_t} S} p_{S' \xrightarrow{x_t} S} \times \alpha(S', t-1)$
$\beta(S,t) = \sum_{S \xrightarrow{x_{t+1}} S'} p_{S \xrightarrow{x_{t+1}} S'} \times \beta(S', t+1)$
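A sketch of the backward pass on a hypothetical c/w model (assumed arc-to-label assignment, paths starting in c, and $\beta(S,T) = 1$ for every state, i.e. any state may end). The check at the bottom uses the standard identity $\sum_S \alpha(S,t)\,\beta(S,t) = P(x_1^T)$, which holds for every $t$ because each path passes through exactly one state at time $t$.

```python
# Hypothetical c/w arcs (source, destination, symbol, probability) and start state.
ARCS = [
    ("c", "c", "C", 0.6), ("c", "c", "W", 0.2),
    ("c", "w", "C", 0.1), ("c", "w", "W", 0.1),
    ("w", "c", "C", 0.1), ("w", "c", "W", 0.1),
    ("w", "w", "C", 0.2), ("w", "w", "W", 0.6),
]
STATES = ["c", "w"]
INITIAL = {"c": 1.0, "w": 0.0}


def forward(arcs, obs):
    """alpha[t][S] = sum over arcs S' -x_t-> S of p * alpha[t-1][S']."""
    alpha = [dict(INITIAL)]
    for x in obs:
        cur = {s: 0.0 for s in STATES}
        for src, dst, sym, p in arcs:
            if sym == x:
                cur[dst] += p * alpha[-1][src]
        alpha.append(cur)
    return alpha


def backward(arcs, obs):
    """beta[t][S] for t = 0..T:
    beta(S, t) = sum over arcs S -x_{t+1}-> S' of p_{S -x_{t+1}-> S'} * beta(S', t+1),
    with beta(S, T) = 1 (assumes every state may end the utterance)."""
    T = len(obs)
    beta = [None] * (T + 1)
    beta[T] = {s: 1.0 for s in STATES}
    for t in range(T - 1, -1, -1):
        x = obs[t]  # obs is 0-based, so obs[t] is x_{t+1}
        beta[t] = {s: 0.0 for s in STATES}
        for src, dst, sym, p in arcs:
            if sym == x:
                beta[t][src] += p * beta[t + 1][dst]
    return beta


if __name__ == "__main__":
    obs = "CCW"  # hypothetical observation sequence
    alpha, beta = forward(ARCS, obs), backward(ARCS, obs)
    # sum_S alpha(S, t) * beta(S, t) should print the same value, P(x_1^T), for every t.
    for t in range(len(obs) + 1):
        print(t, sum(alpha[t][s] * beta[t][s] for s in STATES))
```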
$\alpha(c,2) = p_{c \xrightarrow{C} c} \times \alpha(c,1) + p_{w \xrightarrow{C} c} \times \alpha(w,1)$
$\beta(c,2) = p_{c \xrightarrow{W} c} \times \beta(c,3) + p_{c \xrightarrow{W} w} \times \beta(w,3)$
$c(c \xrightarrow{C} c, 2) = \frac{1}{P(x_1^T)} \times p_{c \xrightarrow{C} c} \times \alpha(c,1) \times \beta(c,2)$
$p^{\text{MLE}}_{c \xrightarrow{C} c} = \frac{c(c \xrightarrow{C} c)}{\sum_{x, S'} c(c \xrightarrow{x} S')}$
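Putting the pieces together gives one re-estimation step (Baum-Welch, an instance of EM): accumulate the posterior arc counts $c(S \xrightarrow{x_t} S', t) = p_{S \xrightarrow{x_t} S'}\,\alpha(S,t-1)\,\beta(S',t)/P(x_1^T)$ over time, then normalize over each source state's outgoing arcs. This is a sketch under stated assumptions: a hypothetical c/w arc assignment, a fixed start state c that is not re-estimated, any state allowed to end, and a made-up training string. EM guarantees that $P(x_1^T)$ never decreases from one iteration to the next.

```python
from collections import defaultdict

# Arcs: (source, destination, output symbol, probability). Hypothetical assignment of the
# c/w labels to arcs; each state's outgoing probabilities sum to 1.
ARCS = [
    ("c", "c", "C", 0.6), ("c", "c", "W", 0.2),
    ("c", "w", "C", 0.1), ("c", "w", "W", 0.1),
    ("w", "c", "C", 0.1), ("w", "c", "W", 0.1),
    ("w", "w", "C", 0.2), ("w", "w", "W", 0.6),
]
STATES = ["c", "w"]
INITIAL = {"c": 1.0, "w": 0.0}  # assumed fixed start state; not re-estimated


def forward(arcs, obs):
    """alpha[t][S] = sum over arcs S' -x_t-> S of p * alpha[t-1][S']."""
    alpha = [dict(INITIAL)]
    for x in obs:
        cur = {s: 0.0 for s in STATES}
        for src, dst, sym, p in arcs:
            if sym == x:
                cur[dst] += p * alpha[-1][src]
        alpha.append(cur)
    return alpha


def backward(arcs, obs):
    """beta[t][S] = sum over arcs S -x_{t+1}-> S' of p * beta[t+1][S']; beta[T][S] = 1."""
    beta = [{s: 1.0 for s in STATES}]
    for x in reversed(obs):  # builds beta[T-1], ..., beta[0] by prepending
        cur = {s: 0.0 for s in STATES}
        for src, dst, sym, p in arcs:
            if sym == x:
                cur[src] += p * beta[0][dst]
        beta.insert(0, cur)
    return beta


def reestimate(arcs, obs):
    """One EM step. Returns (new arcs, P(x_1^T) under the old arcs)."""
    alpha, beta = forward(arcs, obs), backward(arcs, obs)
    total = sum(alpha[-1].values())  # P(x_1^T); must be > 0
    counts = defaultdict(float)  # expected count of each arc, summed over t
    for t, x in enumerate(obs, start=1):
        for src, dst, sym, p in arcs:
            if sym == x:
                counts[(src, dst, sym)] += p * alpha[t - 1][src] * beta[t][dst] / total
    out = defaultdict(float)  # expected number of departures from each state
    for (src, _, _), c in counts.items():
        out[src] += c
    new_arcs = []
    for src, dst, sym, p in arcs:
        # Normalize over the source state's outgoing arcs; keep p if the state was never left.
        new_p = counts[(src, dst, sym)] / out[src] if out[src] > 0 else p
        new_arcs.append((src, dst, sym, new_p))
    return new_arcs, total


def train(arcs, obs, iterations):
    """Run several EM steps; return the final arcs and the likelihood before each step."""
    history = []
    for _ in range(iterations):
        arcs, likelihood = reestimate(arcs, obs)
        history.append(likelihood)
    return arcs, history


if __name__ == "__main__":
    final_arcs, history = train(ARCS, "CCWCWWC", 5)  # hypothetical training string
    print(history)  # non-decreasing, as EM guarantees
    for arc in final_arcs:
        print(arc)
```

Like any EM procedure, this only climbs to a local maximum, so different starting parameters can end at different models.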
[Figure: the same c/w arcs shown with three other sets of parameter values:
C/0.13 W/0.00 C/0.00 W/0.00 C/0.86 W/0.01 C/0.62 W/0.38
C/0.44 W/0.46 C/0.42 W/0.48 C/0.03 W/0.07 C/0.04 W/0.06
C/0.91 W/0.00 C/0.44 W/0.30 C/0.09 W/0.00 C/0.07 W/0.20]
[Figure: example HMM with output symbols a and b; values shown include 0.7, 0.3, 0.8, 0.2, 0.3, 0.7, 0.5, 0.5 on the model and 0.4, 0.03, 0.08, 0.04, 0.004, 0.21, 0.021, 0.008 in the accompanying computation.]
Summing over the 7 paths: $P(X) = \sum_i p(X, \text{path}_i) = 0.008632$
Step    P(X)
1       0.008632
2       0.02438
3       0.02508
100     0.03125004
600     0.037037037 (converged)
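[Figure: arc from state $S_i$ at time $t-1$ (forward probability $\alpha_{t-1}(i)$) to state $S_j$, emitting $x_t$ with output probability $b_t(j)$.]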
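[Figure: trellis for the observations a b a a over states 1 to 3 at times 0 to 4 (prefixes: empty, a, ab, aba, abaa), annotated with the values .547, .246, .045, .151, .101, .067, .045, .302, .201, .134, .151, .553, .821; one value is computed as .167 × .0625 × .333 × .5 / .008632 ≈ .201.]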
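$\arg\min_{w \in \text{vocab}} \text{distance}(A'_{\text{test}}, A'_w)$
$\arg\max_{w \in \text{vocab}} P(A'_{\text{test}} \mid w)$
Notation: $A'_{\text{test}} \Rightarrow x_{\text{test}}$.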
[Figure: arcs W/0.3, C/0.7, W/1.0, shown again with arc probabilities 0.3, 0.7, 1.0, each paired with a two-outcome output distribution (0.2/0.8, 0.7/0.3, 0.4/0.6).]
[Figure: arcs W/0.3, C/0.7, W/1.0 relabeled with output-distribution indices: 2/0.3, 1/0.7, 3/1.0.]
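$\sum_{g, S'} p_{S \xrightarrow{g} S'} = 1$ for all $S$, where $g$ indexes the output distribution on each arc.
Exponent of Gaussian component $j$ of output distribution $g$: $-\frac{1}{2}(x-\mu_{g,j})^T \Sigma_{g,j}^{-1} (x-\mu_{g,j})$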
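[Figure: arcs 1/0.3, 1/0.7, 1/1.0, all using output distribution 1.]
For the observations $0.3$ and $-0.1$, the arc scores contain the Gaussian factors $\exp\left(-\frac{(0.3-\mu_{1,1})^2}{2\sigma_{1,1}^2}\right)$ and $\exp\left(-\frac{(-0.1-\mu_{1,1})^2}{2\sigma_{1,1}^2}\right)$.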
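Each arc $S \xrightarrow{g} S'$ on a path contributes $p_{S \xrightarrow{g} S'} \times P_g(x)$ to the path probability, where $x$ is the observation emitted on that arc.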
$\alpha(S,t) = \sum_{S' \xrightarrow{g} S} p_{S' \xrightarrow{g} S} \times P_g(x_t) \times \alpha(S', t-1)$
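A sketch of this forward recursion when each arc carries an output distribution $g$ and the observations are real-valued. For brevity each $P_g$ is a single 1-D Gaussian rather than a mixture, and the model (states A and B, output distributions 1 to 3 with their means and variances) and the observation sequence are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """1-D normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical model: arcs (source, destination, output distribution g, p_{S -g-> S'})
# and one 1-D Gaussian (mean, variance) per g instead of a full GMM.
ARCS = [("A", "A", 1, 0.7), ("A", "B", 2, 0.3), ("B", "B", 3, 1.0)]
DISTS = {1: (0.0, 1.0), 2: (1.0, 1.0), 3: (2.0, 0.5)}
INITIAL = {"A": 1.0}


def forward_continuous(arcs, dists, initial, obs):
    """alpha(S, t) = sum over arcs S' -g-> S of p_{S' -g-> S} * P_g(x_t) * alpha(S', t-1)."""
    alpha = [dict(initial)]
    for x in obs:
        cur = {}
        for src, dst, g, p in arcs:
            prev = alpha[-1].get(src, 0.0)
            if prev == 0.0:
                continue  # unreachable source state contributes nothing
            mean, var = dists[g]
            cur[dst] = cur.get(dst, 0.0) + p * gaussian_pdf(x, mean, var) * prev
        alpha.append(cur)
    return alpha


if __name__ == "__main__":
    alpha = forward_continuous(ARCS, DISTS, INITIAL, [0.3, -0.1, 1.8])
    print(sum(alpha[-1].values()))  # likelihood P(x_1^T), summed over end states
```

For long utterances these products underflow; practical implementations keep $\alpha$ in the log domain or rescale it at every frame.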
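$c(S \xrightarrow{g} S', t) = \frac{1}{P(x_1^T)} \times p_{S \xrightarrow{g} S'} \times P_g(x_t) \times \alpha(S, t-1) \times \beta(S', t)$
$p^{\text{MLE}}_{S \xrightarrow{g} S'} = \frac{\sum_{t=1}^{T} c(S \xrightarrow{g} S', t)}{\sum_{g', S''} \sum_{t=1}^{T} c(S \xrightarrow{g'} S'', t)}$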
[Figure: word HMM with arcs labeled 1/0.5 through 6/0.5; each output-distribution index 1 to 6 appears on two arcs, each with probability 0.5.]
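$\arg\min_{w \in \text{vocab}} \text{distance}(A'_{\text{test}}, A'_w)$
$\arg\max_{w \in \text{vocab}} P(A'_{\text{test}} \mid w)$
$\text{distance}(A'_{\text{test}}, A'_w) \approx -\log P(A'_{\text{test}} \mid w)$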