Lecture 11: Viterbi and Forward Algorithms
Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Course webpage: http://kwchang.net/teaching/NLP16
1 CS6501 Natural Language Processing
Quiz 1
[Histogram of Quiz 1 score distribution; bins 0–5, 6–10, 11–15, 16–20, 21–25; counts up to 30]
How likely is it that the sentence "I love cat" occurs?
What is the most likely sequence of POS tags for "I love cat"?
How to learn the model?
P(𝒙) = Σ_𝒖 ∏_j P(x_j ∣ u_j) P(u_j ∣ u_{j−1})
For "I love cat" with tag set {N, V}, this sum enumerates all 2³ = 8 tag sequences:
= P("I love cat", "NNN"; 𝝁) + P("I love cat", "NNV"; 𝝁) + P("I love cat", "NVN"; 𝝁) + P("I love cat", "NVV"; 𝝁) + P("I love cat", "VNN"; 𝝁) + P("I love cat", "VNV"; 𝝁) + P("I love cat", "VVN"; 𝝁) + P("I love cat", "VVV"; 𝝁)
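The enumeration above can be sketched directly in Python. All probability values below are illustrative assumptions, not numbers from the lecture:

```python
from itertools import product

# Toy HMM over tags {N, V}; every probability here is an
# illustrative assumption, not a value from the lecture.
trans = {("<S>", "N"): 0.6, ("<S>", "V"): 0.4,   # P(u_j | u_{j-1})
         ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "I"): 0.5, ("N", "love"): 0.1, ("N", "cat"): 0.4,  # P(x_j | u_j)
        ("V", "I"): 0.1, ("V", "love"): 0.8, ("V", "cat"): 0.1}

def joint(words, tags):
    """P(x, u) = prod_j P(x_j | u_j) * P(u_j | u_{j-1})."""
    p, prev = 1.0, "<S>"
    for word, tag in zip(words, tags):
        p *= trans[(prev, tag)] * emit[(tag, word)]
        prev = tag
    return p

words = ["I", "love", "cat"]
# Enumerate all 2^3 = 8 tag sequences and sum their joint probabilities.
total = sum(joint(words, tags) for tags in product("NV", repeat=3))
print(total)
```

For a sentence of length L with |tags| tags this enumeration costs O(|tags|^L), which is exactly what the forward algorithm later avoids.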
P(𝒙) = Σ_𝒖 ∏_j P(x_j ∣ u_j) P(u_j ∣ u_{j−1})
Trellis over positions j = 1 … 4, with transition probabilities such as P(u_2 = 2 ∣ u_1 = 1) and P(u_3 = 1 ∣ u_2 = 1), and emission probabilities such as P(x_3 ∣ u_3 = 1), ⋯
𝝁 is the parameter set of the model; we drop it from the following slides for simplicity's sake.
Trellis over positions j = 1 … 4, states N, V, A at each position, showing emission probabilities P(I ∣ N), P(eat ∣ V), P(a ∣ V), P(fish ∣ A) and transition probabilities P(V ∣ N), P(V ∣ V), P(N ∣ ⟨S⟩), P(A ∣ V), ⋯
P(𝒙) = Σ_𝒖 ∏_j P(x_j ∣ u_j) P(u_j ∣ u_{j−1})
[Trellis: positions j = 1 … 4, states N, V, A at each position]
Group the tag sequences by the tag r at position l: P(𝒙_{1:l}, u_l = r) sums over all tag sequences whose tag at position l is r.
[Trellis: positions j = 1 … 4, states N, V, A]
[Trellis: positions j = l − 1 and j = l, states N, V, A]
[Trellis: positions j = l − 1 and j = l, states N, V, A]
Let's call this quantity 𝛼_l(r); the corresponding quantity at position l − 1 is 𝛼_{l−1}(r′). Then
𝛼_l(r) = Σ_{r′} 𝛼_{l−1}(r′) P(u_l = r ∣ u_{l−1} = r′) P(x_l ∣ u_l = r)
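The recursion can be sketched in a few lines of Python. The parameter names and all numbers below (`init`, `trans`, `emit`) are illustrative toy assumptions, not the lecture's values:

```python
# A minimal forward pass over a toy HMM.
def forward(words, states, init, trans, emit):
    # Base case: alpha_1(r) = q(u_1 = r) * P(x_1 | u_1 = r)
    alpha = {r: init[r] * emit[(r, words[0])] for r in states}
    for x in words[1:]:
        # alpha_l(r) = sum_{r'} alpha_{l-1}(r') * P(r | r') * P(x_l | r)
        alpha = {r: sum(alpha[rp] * trans[(rp, r)] for rp in states)
                    * emit[(r, x)]
                 for r in states}
    return sum(alpha.values())  # P(x) = sum_r alpha_L(r)

# Illustrative toy parameters (assumed, not from the lecture).
init = {"N": 0.6, "V": 0.4}
trans = {("N", "N"): 0.3, ("N", "V"): 0.7, ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "I"): 0.5, ("N", "love"): 0.1, ("N", "cat"): 0.4,
        ("V", "I"): 0.1, ("V", "love"): 0.8, ("V", "cat"): 0.1}
prob = forward(["I", "love", "cat"], ["N", "V"], init, trans, emit)
print(prob)
```

The loop runs in O(L · |tags|²) time instead of enumerating all |tags|^L tag sequences, and it returns the same value as the brute-force sum.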
[Trellis: positions j = l − 1 and j = l, states N, V, A]
[Trellis: positions j = l − 1 and j = l, states N, V, A]
Base case: the initial probability q(u_1 = r) gives 𝛼_1(r) = q(u_1 = r) P(x_1 ∣ u_1 = r).
From Julia Hockenmaier, Intro to NLP
[Forward algorithm pseudocode]
Example HMM with states C and H and observations 1, 2, 3:

          p(…|C)   p(…|H)   p(…|START)
p(1|…)    0.5      0.1
p(2|…)    0.4      0.2
p(3|…)    0.1      0.7
p(C|…)    0.8      0.2      0.5
p(H|…)    0.2      0.8      0.5

[Forward trellis over three time steps with states C, H]
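Plugging the table's numbers into the forward recursion gives the likelihood of a concrete observation sequence. The sequence [3, 1, 3] below is an arbitrary example choice, not one fixed by the slide:

```python
# Parameters copied from the table above.
init = {"C": 0.5, "H": 0.5}                           # p(. | START)
trans = {("C", "C"): 0.8, ("C", "H"): 0.2,
         ("H", "C"): 0.2, ("H", "H"): 0.8}            # p(state | prev state)
emit = {("C", 1): 0.5, ("C", 2): 0.4, ("C", 3): 0.1,
        ("H", 1): 0.1, ("H", 2): 0.2, ("H", 3): 0.7}  # p(obs | state)

obs = [3, 1, 3]  # example observation sequence (assumed)
# Base case, then one forward update per remaining observation.
alpha = {s: init[s] * emit[(s, obs[0])] for s in "CH"}
for x in obs[1:]:
    alpha = {s: sum(alpha[sp] * trans[(sp, s)] for sp in "CH") * emit[(s, x)]
             for s in "CH"}
prob = sum(alpha.values())  # P(obs)
print(prob)
```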
How likely is it that the sentence "I love cat" occurs?
What is the most likely sequence of POS tags for "I love cat"?
How to learn the model?
initial probability q(u_1)
argmax_𝒖 ∏_j P(x_j ∣ u_j) P(u_j ∣ u_{j−1})
Trellis over positions j = 1 … 4, with transition probabilities such as P(u_2 = 2 ∣ u_1 = 1) and P(u_3 = 1 ∣ u_2 = 1), and emission probabilities such as P(x_3 ∣ u_3 = 1), ⋯
argmax_𝒖 ∏_j P(x_j ∣ u_j) P(u_j ∣ u_{j−1})
[Trellis: positions j = 1 … 4, states N, V, A at each position]
max_{𝒖_{1:l}} P(𝒖_{1:l}, 𝒙_{1:l}) = max_r max_{𝒖_{1:l−1}} P(𝒖_{1:l−1}, u_l = r, 𝒙_{1:l})
Group the tag sequences by the tag at position l.
[Trellis: positions j = 1 … 4, states N, V, A]
max_{𝒖_{1:l−1}} P(𝒖_{1:l−1}, u_l = r, 𝒙_{1:l})
= max_{r′} max_{𝒖_{1:l−2}} P(𝒖_{1:l−2}, u_l = r, u_{l−1} = r′, 𝒙_{1:l})
= max_{r′} max_{𝒖_{1:l−2}} P(𝒖_{1:l−2}, u_{l−1} = r′, 𝒙_{1:l−1}) P(u_l = r ∣ u_{l−1} = r′) P(x_l ∣ u_l = r)
[Trellis: positions j = l − 1 and j = l, states N, V, A]
Let's call this quantity 𝛿_l(r); the corresponding quantity at position l − 1 is 𝛿_{l−1}(r′). Then
𝛿_l(r) = max_{r′} 𝛿_{l−1}(r′) P(u_l = r ∣ u_{l−1} = r′) P(x_l ∣ u_l = r)
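The full Viterbi procedure with backpointers can be sketched as follows; the parameters `init`, `trans`, `emit` are again illustrative toy assumptions, not the lecture's numbers:

```python
# Viterbi decoding with backpointers over a toy HMM.
def viterbi(words, states, init, trans, emit):
    # Base case: delta_1(r) = q(u_1 = r) * P(x_1 | u_1 = r)
    delta = {r: init[r] * emit[(r, words[0])] for r in states}
    backpointers = []
    for x in words[1:]:
        bp, new_delta = {}, {}
        for r in states:
            # b_l(r) = argmax_{r'} delta_{l-1}(r') * P(r | r')
            best = max(states, key=lambda rp: delta[rp] * trans[(rp, r)])
            bp[r] = best
            # delta_l(r) = delta_{l-1}(b_l(r)) * P(r | b_l(r)) * P(x_l | r)
            new_delta[r] = delta[best] * trans[(best, r)] * emit[(r, x)]
        delta = new_delta
        backpointers.append(bp)
    # Retrace from the best final tag, following backpointers.
    tag = max(states, key=lambda r: delta[r])
    tags = [tag]
    for bp in reversed(backpointers):
        tag = bp[tag]
        tags.append(tag)
    return tags[::-1]

# Illustrative toy parameters (assumed, not from the lecture).
init = {"N": 0.6, "V": 0.4}
trans = {("N", "N"): 0.3, ("N", "V"): 0.7, ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "I"): 0.5, ("N", "love"): 0.1, ("N", "cat"): 0.4,
        ("V", "I"): 0.1, ("V", "love"): 0.8, ("V", "cat"): 0.1}
best_tags = viterbi(["I", "love", "cat"], ["N", "V"], init, trans, emit)
print(best_tags)
```

The structure mirrors the forward algorithm exactly, with max in place of Σ plus a backpointer table to recover the argmax sequence.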
[Trellis: positions j = l − 1 and j = l, states N, V, A]
𝛿_l(r) = max_{r′} 𝛿_{l−1}(r′) P(u_l = r ∣ u_{l−1} = r′) P(x_l ∣ u_l = r)
Backpointer: b_l(r) = argmax_{r′} 𝛿_{l−1}(r′) P(u_l = r ∣ u_{l−1} = r′)
[Trellis: positions j = l − 1 and j = l, states N, V, A]
[Trellis: positions j = l − 1 and j = l, states N, V, A]
Base case: the initial probability q(u_1 = r) gives 𝛿_1(r) = q(u_1 = r) P(x_1 ∣ u_1 = r).
Retrieve the best tag sequence by following backpointers from the best final tag:
b_l(r) = argmax_{r′} 𝛿_{l−1}(r′) P(u_l = r ∣ u_{l−1} = r′)
The same HMM as before (states C, H; observations 1, 2, 3):

          p(…|C)   p(…|H)   p(…|START)
p(1|…)    0.5      0.1
p(2|…)    0.4      0.2
p(3|…)    0.1      0.7
p(C|…)    0.8      0.2      0.5
p(H|…)    0.2      0.8      0.5

[Viterbi trellis over three time steps with states C, H]
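Running Viterbi with the table's numbers recovers the best state sequence; the observation sequence [3, 1, 3] is again an example choice rather than one fixed by the slide:

```python
# Parameters from the table above; [3, 1, 3] is an assumed example sequence.
init = {"C": 0.5, "H": 0.5}
trans = {("C", "C"): 0.8, ("C", "H"): 0.2,
         ("H", "C"): 0.2, ("H", "H"): 0.8}
emit = {("C", 1): 0.5, ("C", 2): 0.4, ("C", 3): 0.1,
        ("H", 1): 0.1, ("H", 2): 0.2, ("H", 3): 0.7}

obs = [3, 1, 3]
# Base case: delta_1(s) = p(s | START) * p(obs_1 | s)
delta = {s: init[s] * emit[(s, obs[0])] for s in "CH"}
backpointers = []
for x in obs[1:]:
    # Best predecessor for each state, then the max-product update.
    bp = {s: max("CH", key=lambda sp: delta[sp] * trans[(sp, s)]) for s in "CH"}
    delta = {s: delta[bp[s]] * trans[(bp[s], s)] * emit[(s, x)] for s in "CH"}
    backpointers.append(bp)
# Retrace from the best final state.
state = max("CH", key=lambda s: delta[s])
path = [state]
for bp in reversed(backpointers):
    state = bp[state]
    path.append(state)
best_path = path[::-1]
print(best_path)
```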
How likely is it that the sentence "I love cat" occurs?
What is the most likely sequence of POS tags for "I love cat"?
How to learn the model?