Sequence Models
Spring 2020 CMPT 825: Natural Language Processing Adapted from slides from Danqi Chen and Karthik Narasimhan (Princeton COS 484)
SFU NatLangLab
Overview
Hidden Markov models (HMM)
Viterbi algorithm
Maximum entropy
function words
determiners (the, a)
frequently
(google), adjectives, adverbs
the training set. (e.g. man/NN)
words (Sequence modeling!)
s1 s2 s3 s4
The/?? cat/?? sat/?? on/?? the/?? mat/??
events.
s1 s2 s3 s4 the cat sat
       DT     NN     IN     VBD
DT     0.5    0.8    0.05   0.1
NN     0.05   0.2    0.15   0.6
IN     0.5    0.2    0.05   0.25
VBD    0.3    0.3    0.3    0.1
the cat sat
mat DT 0.5 NN 0.01 0.2 0.01 0.01 0.2 IN 0.4 VBD 0.01 0.1 0.01 0.01
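As a minimal sketch of how transition and emission tables combine, the joint probability of a tag sequence and a sentence under a first-order HMM is a product of one transition term and one emission term per position. The function name `hmm_joint_prob`, the dictionary layout, and all numbers below are assumptions made for illustration; they are hypothetical toy values, not the values from the tables above.

```python
def hmm_joint_prob(tags, words, start, trans, emit):
    """P(tags, words) = P(t_1) P(w_1|t_1) * prod_{i>1} P(t_i|t_{i-1}) P(w_i|t_i)."""
    if len(tags) != len(words):
        raise ValueError("tags and words must have the same length")
    p = 1.0
    prev = None
    for tag, word in zip(tags, words):
        if prev is None:
            # First position: use the start distribution.
            p *= start.get(tag, 0.0)
        else:
            # Later positions: transition from the previous tag.
            p *= trans.get(prev, {}).get(tag, 0.0)
        # Emission of the observed word from the current tag.
        p *= emit.get(tag, {}).get(word, 0.0)
        prev = tag
    return p


# Hypothetical toy parameters (missing entries are treated as probability 0).
start = {"DT": 0.8, "NN": 0.2}
trans = {"DT": {"NN": 0.9, "DT": 0.1}, "NN": {"VBD": 0.5, "NN": 0.5}}
emit = {"DT": {"the": 0.5}, "NN": {"cat": 0.2}, "VBD": {"sat": 0.1}}

joint = hmm_joint_prob(["DT", "NN", "VBD"], ["the", "cat", "sat"], start, trans, emit)
```

Here `joint` is the product (0.8 · 0.5) · (0.9 · 0.2) · (0.5 · 0.1); any tag sequence that uses a transition or emission missing from the toy tables gets probability 0.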
Decoding: find the most probable sequence of states ⟨s1, s2, ..., sn⟩ given the observations ⟨o1, o2, ..., on⟩.
[Figure: Viterbi trellis being filled in left to right, with candidate tags DT, NN, VBD, IN for the first words of "The cat sat".]
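$$
\begin{aligned}
M[2, \text{DT}] &= \max_k \; M[1, k] \, P(\text{DT} \mid k) \, P(\text{cat} \mid \text{DT}) \\
M[2, \text{NN}] &= \max_k \; M[1, k] \, P(\text{NN} \mid k) \, P(\text{cat} \mid \text{NN}) \\
M[2, \text{VBD}] &= \max_k \; M[1, k] \, P(\text{VBD} \mid k) \, P(\text{cat} \mid \text{VBD}) \\
M[2, \text{IN}] &= \max_k \; M[1, k] \, P(\text{IN} \mid k) \, P(\text{cat} \mid \text{IN})
\end{aligned}
$$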
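The recurrence above generalizes to every position: each cell M[i, t] keeps the best score of any tag sequence ending in tag t at word i, plus a backpointer to the previous tag that achieved it. The following is a minimal sketch under that reading. The function name `viterbi`, the dictionary-based tables, and all probabilities are hypothetical toy values assumed for illustration, not the parameters from the slides.

```python
def viterbi(words, tags, start, trans, emit):
    """Return (best_tag_sequence, its_probability) for a first-order HMM."""
    if not words:
        return [], 1.0

    def t_prob(k, t):
        return trans.get(k, {}).get(t, 0.0)

    def e_prob(t, w):
        return emit.get(t, {}).get(w, 0.0)

    # M[i][t]: probability of the best tag sequence for words[:i+1] ending in t.
    M = [{t: start.get(t, 0.0) * e_prob(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        M.append({})
        back.append({})
        for t in tags:
            # max over previous tags k of M[i-1][k] * P(t | k)
            best_k = max(tags, key=lambda k: M[i - 1][k] * t_prob(k, t))
            M[i][t] = M[i - 1][best_k] * t_prob(best_k, t) * e_prob(t, words[i])
            back[i][t] = best_k

    # Pick the best final tag, then follow backpointers to recover the path.
    last = max(tags, key=lambda t: M[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, M[-1][last]


# Hypothetical toy parameters (missing entries are treated as probability 0).
tags = ["DT", "NN", "VBD", "IN"]
start = {"DT": 0.6, "NN": 0.2, "VBD": 0.1, "IN": 0.1}
trans = {
    "DT": {"NN": 0.9, "DT": 0.05, "VBD": 0.05},
    "NN": {"VBD": 0.6, "NN": 0.2, "IN": 0.2},
    "VBD": {"IN": 0.5, "DT": 0.4, "NN": 0.1},
    "IN": {"DT": 0.8, "NN": 0.2},
}
emit = {
    "DT": {"the": 0.5},
    "NN": {"cat": 0.2, "sat": 0.01, "mat": 0.2},
    "VBD": {"sat": 0.1, "cat": 0.01},
    "IN": {"on": 0.4},
}

path, prob = viterbi(["the", "cat", "sat"], tags, start, trans, emit)
```

With these toy numbers the decoded path is DT NN VBD. The table has one cell per (position, tag) pair and each cell looks at every previous tag, so the cost is O(n · K²) for n words and K tags, instead of the Kⁿ sequences a brute-force search would score.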
[Figure: full Viterbi trellis for "The cat sat", with candidate tags DT, NN, VBD, IN at each position, repeated across successive slides.]
DT NN VB IN The cat sat
$$
\hat{S} = \arg\max_S P(S \mid O) = \arg\max_S \prod_i P(s_i \mid o_i, s_{i-1})
$$

$$
P(s_i \mid o_i, s_{i-1}) \propto \exp\big(w \cdot f(s_i, o_i, s_{i-1})\big)
$$
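To make the proportionality concrete, the local distribution can be obtained by exponentiating the score w · f for every candidate tag and normalizing over the tags. The sketch below assumes sparse indicator features stored in dictionaries; the names `memm_local_probs` and `toy_features`, the feature templates, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def memm_local_probs(tags, obs, prev_tag, weights, features):
    """P(s | obs, prev_tag), proportional to exp(w . f(s, obs, prev_tag))."""
    scores = {}
    for s in tags:
        feats = features(s, obs, prev_tag)
        # Dot product between the weight vector and the sparse feature vector.
        scores[s] = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores.values())
    exps = {s: math.exp(v - m) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def toy_features(s, obs, prev_tag):
    # Hypothetical indicator features: (word, tag) and (previous tag, tag).
    return {f"word={obs},tag={s}": 1.0, f"prev={prev_tag},tag={s}": 1.0}


# Hypothetical weights; features not listed here have weight 0.
weights = {"word=cat,tag=NN": 2.0, "prev=DT,tag=NN": 1.0}
probs = memm_local_probs(["DT", "NN", "VBD", "IN"], "cat", "DT", weights, toy_features)
```

In this toy setting only NN has a nonzero score (3.0), so it receives most of the probability mass while the other three tags share the remainder equally.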
$$
\hat{S} = \arg\max_S P(S \mid O) = \arg\max_S \prod_i P(s_i \mid o_n, o_{n-1}, \ldots, o_1, s_{i-1}, \ldots, s_1)
$$

$$
P(s_i \mid s_{i-1}, \ldots, s_1, O) \propto \exp\big(w \cdot f(s_i, s_{i-1}, \ldots, s_1, O)\big)
$$
(Assume the features depend only on the previous time step and the current observation.)
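Under that assumption, decoding can reuse the same dynamic-programming table as the HMM case, with the product of transition and emission probabilities replaced by a single locally normalized term. The recurrence below is a sketch in the M[i, ·] notation used earlier; the index j for the current tag and the initial symbol s_0 are assumed notation, not taken from the slides.

```latex
M[1, j] = P(s_1 = j \mid o_1, s_0), \qquad
M[i, j] = \max_k \; M[i-1, k] \, P(s_i = j \mid o_i, s_{i-1} = k), \qquad
\max_S P(S \mid O) = \max_j M[n, j]
```

Keeping a backpointer to the maximizing k in each cell recovers the best tag sequence, exactly as in HMM decoding.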