Sequential Data Modeling – The Structured Perceptron
Graham Neubig, Nara Institute of Science and Technology (NAIST)
Binary Prediction (2 choices)
A book review: "Oh, man I love this book!" / "This book is so boring..." → Is it positive? yes / no

Multi-class Prediction (several choices)
A tweet: "On the way to the park!" / "公園に行くなう!" ("Going to the park now!") → Its language: English / Japanese

Structured Prediction (millions of choices)
A sentence: "I read a book" → Its parts of speech: I/N read/VBD a/DET book/NN

Sequential prediction is a subset of structured prediction.
Given an introductory sentence from Wikipedia, predict whether the article is about a person:

Given: "Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods."
Predict: Yes!

Given: "Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture."
Predict: No!
Gonso was a Sanron sect priest ( 754 – 827 ) in the late Nara and early Heian periods .
Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura , Maizuru City , Kyoto Prefecture .
Gonso was a Sanron sect priest ( 754 – 827 ) in the late Nara and early Heian periods .
→ Contains "priest" → probably person! Contains "(<#>-<#>)" → probably person!

Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura , Maizuru City , Kyoto Prefecture .
→ Contains "site" → probably not person! Contains "Kyoto Prefecture" → probably not person!
Each feature gets a weight: positive if it indicates "yes", and negative if it indicates "no":

w_contains "priest" = 2          w_contains "site" = -3
w_contains "(<#>-<#>)" = 1       w_contains "Kyoto Prefecture" = -1

"Kuya (903-972) was a priest born in Kyoto Prefecture."
→ 2 + -1 + 1 = 2
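As a sketch, this scoring step can be written in a few lines of Python. The weights are the ones on this slide; the helper names `features` and `score` and the regular expression used for the (<#>-<#>) pattern are illustrative assumptions, not from the slides:

```python
import re

# Weights from the slide: positive features indicate "person".
w = {
    'contains "priest"': 2,
    'contains "(<#>-<#>)"': 1,
    'contains "site"': -3,
    'contains "Kyoto Prefecture"': -1,
}

def features(x):
    """Binary indicator features: which cues does the sentence contain?"""
    phi = {}
    if "priest" in x:
        phi['contains "priest"'] = 1
    if re.search(r"\(\d+-\d+\)", x):   # an assumed regex for (<#>-<#>)
        phi['contains "(<#>-<#>)"'] = 1
    if "site" in x:
        phi['contains "site"'] = 1
    if "Kyoto Prefecture" in x:
        phi['contains "Kyoto Prefecture"'] = 1
    return phi

def score(w, phi):
    """Sum the weights of the active features."""
    return sum(w.get(name, 0) * value for name, value in phi.items())

x = "Kuya (903-972) was a priest born in Kyoto Prefecture."
print(score(w, features(x)))  # 2 + 1 + -1 = 2
```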
y = sign(w ⋅ φ(x)) = sign(∑_{i=1}^{I} w_i ⋅ φ_i(x))
x = A site , located in Maizuru , Kyoto

φ_unigram "A"(x) = 1        φ_unigram "site"(x) = 1      φ_unigram ","(x) = 2
φ_unigram "located"(x) = 1  φ_unigram "in"(x) = 1
φ_unigram "Maizuru"(x) = 1  φ_unigram "Kyoto"(x) = 1
φ_unigram "the"(x) = 0      φ_unigram "temple"(x) = 0    …

The rest are all 0. For convenience, we use feature names (φ_unigram "A") instead of feature indexes (φ_1).
x = A site , located in Maizuru , Kyoto

φ_unigram "A"(x) = 1,       w_unigram "a" = 0
φ_unigram "site"(x) = 1,    w_unigram "site" = -3
φ_unigram ","(x) = 2,       w_unigram "," = 0
φ_unigram "located"(x) = 1, w_unigram "located" = 0
φ_unigram "in"(x) = 1,      w_unigram "in" = 0
φ_unigram "Maizuru"(x) = 1, w_unigram "Maizuru" = 0
φ_unigram "Kyoto"(x) = 1,   w_unigram "Kyoto" = 0
φ_unigram "priest"(x) = 0,  w_unigram "priest" = 2
φ_unigram "black"(x) = 0,   w_unigram "black" = 0

w ⋅ φ(x) = 0 + -3 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = -3
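A minimal sketch of the unigram features and the resulting dot product (the `UNI:` key prefix and variable names are assumptions for illustration; whitespace tokenization stands in for real tokenization):

```python
from collections import defaultdict

def create_unigram_features(x):
    """phi_unigram_w(x) = count of word w in x; everything else stays 0."""
    phi = defaultdict(int)
    for word in x.split():
        phi["UNI:" + word] += 1
    return phi

# Weights from the slide; any weight not listed is 0.
w = defaultdict(int)
w["UNI:site"] = -3
w["UNI:priest"] = 2

x = "A site , located in Maizuru , Kyoto"
phi = create_unigram_features(x)
score = sum(w[name] * value for name, value in phi.items())
print(score)  # only "site" has a nonzero weight, so the score is -3
```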
y    x
+1   FUJIWARA no Chikamori ( year of birth and death unknown ) was a samurai and poet who lived at the end of the Heian period .
+1   Ryonen ( 1646 - October 29 , 1711 ) was a Buddhist nun of the Obaku Sect who lived from the early Edo period to the mid-Edo period .
-1   A moat settlement is a village surrounded by a moat .
-1   Fushimi Momoyama Athletic Park is located in Momoyama-cho , Kyoto City , Kyoto Prefecture .
create map w
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y
            update_weights(w, phi, y)
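The pseudocode above can be fleshed out into runnable Python. The names `create_features`, `predict_one`, and `update_weights` follow the slide; the unigram features and the two-sentence toy data are invented for illustration:

```python
from collections import defaultdict

def create_features(x):
    phi = defaultdict(int)
    for word in x.split():
        phi["UNI:" + word] += 1
    return phi

def predict_one(w, phi):
    score = sum(w[name] * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def update_weights(w, phi, y):
    # Move each active feature's weight toward the correct label.
    for name, value in phi.items():
        w[name] += value * y

def train(data, iterations=10):
    w = defaultdict(int)
    for _ in range(iterations):
        for x, y in data:
            phi = create_features(x)
            if predict_one(w, phi) != y:
                update_weights(w, phi, y)
    return w

data = [("a priest born in Kyoto", 1), ("a site in Maizuru", -1)]
w = train(data)
print(predict_one(w, create_features("a priest born in Kyoto")))  # 1
```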
– Features for positive examples get a higher weight
– Features for negative examples get a lower weight
→ Every time we update, our predictions get better!
x = A site , located in Maizuru , Kyoto    y = -1

The prediction was wrong, so subtract φ(x) from w:

w_unigram "A" = -1        w_unigram "site" = -1    w_unigram "," = -2
w_unigram "located" = -1  w_unigram "in" = -1
w_unigram "Maizuru" = -1  w_unigram "Kyoto" = -1
x = Shoken , monk born in Kyoto    y = +1

The prediction was wrong again, so add φ(x) to w:

w_unigram "A" = -1        w_unigram "site" = -1    w_unigram "," = -1
w_unigram "located" = -1  w_unigram "in" = 0
w_unigram "Maizuru" = -1  w_unigram "Kyoto" = 0
w_unigram "Shoken" = 1    w_unigram "monk" = 1     w_unigram "born" = 1
Part-of-speech (POS) tagging: given a sentence X, predict its POS sequence Y.

Natural language processing ( NLP ) is a field of computer science
JJ NN NN -LRB- NN -RRB- VBZ DT NN IN NN NN
Find the most probable tag sequence given the sentence:

Ŷ = argmax_Y P(Y ∣ X)

This needs:
a model of word/POS interactions ("natural" is probably a JJ), and
a model of POS/POS interactions (NN comes after DET).
natural language processing ( nlp ) ...
<s> JJ NN NN LRB NN RRB ... </s>

P(Y) ≈ ∏_{i=1}^{I+1} P_T(y_i ∣ y_{i−1})      e.g. P_T(JJ|<s>) P_T(NN|JJ) P_T(NN|NN) …
P(X∣Y) ≈ ∏_{i=1}^{I} P_E(x_i ∣ y_i)          e.g. P_E(natural|JJ) P_E(language|NN) P_E(processing|NN) …
natural language processing ( nlp ) is …
<s> JJ NN NN LRB NN RRB VB … </s>

Count up the transitions and emissions: c(<s> JJ)++, c(JJ NN)++, …, c(JJ→natural)++, c(NN→language)++, …
Then divide by the context counts:

P_T(LRB|NN) = c(NN LRB)/c(NN) = 1/3
P_E(language|NN) = c(NN → language)/c(NN) = 1/3
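A sketch of this count-based training in Python (the function name `train_hmm` and the corpus format, a list of (words, tags) pairs, are assumptions):

```python
from collections import defaultdict

def train_hmm(corpus):
    """Estimate P_T(next|prev) = c(prev next)/c(prev) and
    P_E(word|tag) = c(tag->word)/c(tag) from a tagged corpus."""
    c_context = defaultdict(int)   # c(tag), plus c(<s>)
    c_trans = defaultdict(int)     # c(prev next)
    c_emit = defaultdict(int)      # c(tag -> word)
    for words, tags in corpus:
        prev = "<s>"
        for word, tag in zip(words, tags):
            c_trans[(prev, tag)] += 1
            c_context[prev] += 1
            c_emit[(tag, word)] += 1
            prev = tag
        c_trans[(prev, "</s>")] += 1   # final transition to </s>
        c_context[prev] += 1
    pt = {k: v / c_context[k[0]] for k, v in c_trans.items()}
    pe = {k: v / c_context[k[0]] for k, v in c_emit.items()}
    return pt, pe

words = "natural language processing ( nlp ) is".split()
tags = "JJ NN NN LRB NN RRB VB".split()
pt, pe = train_hmm([(words, tags)])
print(pt[("NN", "LRB")], pe[("NN", "language")])  # 1/3 each, as on the slide
```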
Find the answer with the highest probability (equivalently, the lowest negative log probability) using the Viterbi algorithm.
First, calculate the score of the first word for every POS tag, starting from 0:<s>.
(Search graph: 0:<s> → 1:NN, 1:JJ, 1:VB, 1:PRN, 1:NNP, … for the word "I".)

best_score["1 NN"]  = -log P_T(NN|<s>)  + -log P_E(I | NN)
best_score["1 JJ"]  = -log P_T(JJ|<s>)  + -log P_E(I | JJ)
best_score["1 VB"]  = -log P_T(VB|<s>)  + -log P_E(I | VB)
best_score["1 PRN"] = -log P_T(PRN|<s>) + -log P_E(I | PRN)
best_score["1 NNP"] = -log P_T(NNP|<s>) + -log P_E(I | NNP)
For the second word ("visited"), take the minimum over all possible previous POS tags.
(Search graph: 1:NN, 1:JJ, 1:VB, 1:PRN, 1:NNP, … → 2:NN, 2:JJ, 2:VB, 2:PRN, 2:NNP, ….)

best_score["2 NN"] = min(
    best_score["1 NN"]  + -log P_T(NN|NN)  + -log P_E(visited | NN),
    best_score["1 JJ"]  + -log P_T(NN|JJ)  + -log P_E(visited | NN),
    best_score["1 VB"]  + -log P_T(NN|VB)  + -log P_E(visited | NN),
    best_score["1 PRN"] + -log P_T(NN|PRN) + -log P_E(visited | NN),
    best_score["1 NNP"] + -log P_T(NN|NNP) + -log P_E(visited | NN),
    ... )

best_score["2 JJ"] = min(
    best_score["1 NN"]  + -log P_T(JJ|NN)  + -log P_E(visited | JJ),
    best_score["1 JJ"]  + -log P_T(JJ|JJ)  + -log P_E(visited | JJ),
    best_score["1 VB"]  + -log P_T(JJ|VB)  + -log P_E(visited | JJ),
    ... )
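The full recurrence, with back-pointers to recover the best path, might look like this in Python. A sketch: keys are (position, tag) tuples rather than the slides' "1 NN" strings, the toy costs at the bottom are invented, and missing entries are treated as infinite cost rather than smoothed:

```python
def viterbi(words, tags, log_pt, log_pe):
    """Viterbi search over negative log probabilities.
    log_pt[(prev, next)] and log_pe[(tag, word)] hold -log P values."""
    INF = float("inf")
    best_score = {(0, "<s>"): 0.0}
    best_edge = {}
    prev_tags = ["<s>"]
    for i, word in enumerate(words, start=1):
        for tag in tags:
            for prev in prev_tags:
                score = (best_score.get((i - 1, prev), INF)
                         + log_pt.get((prev, tag), INF)
                         + log_pe.get((tag, word), INF))
                if score < best_score.get((i, tag), INF):
                    best_score[(i, tag)] = score
                    best_edge[(i, tag)] = (i - 1, prev)
        prev_tags = tags
    # Final transition to </s>, then follow back-pointers.
    n = len(words)
    end = min(tags, key=lambda t: best_score.get((n, t), INF)
                                  + log_pt.get((t, "</s>"), INF))
    path, node = [], (n, end)
    while node[0] > 0:
        path.append(node[1])
        node = best_edge[node]
    return list(reversed(path))

# Invented toy costs: the cheap path is PRN VBD NNP.
log_pt = {("<s>", "PRN"): 1.0, ("PRN", "VBD"): 1.0, ("VBD", "NNP"): 1.0,
          ("NNP", "</s>"): 1.0, ("<s>", "NNP"): 2.0, ("NNP", "VBD"): 2.0}
log_pe = {("PRN", "I"): 1.0, ("NNP", "I"): 2.0,
          ("VBD", "visited"): 1.0, ("NNP", "Nara"): 1.0}
print(viterbi("I visited Nara".split(), ["PRN", "VBD", "NNP"], log_pt, log_pe))
```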
Classifiers
    Perceptron: lots of features, binary prediction
Generative Models
    HMM: conditional probabilities, structured prediction
Structured perceptron → classification with lots of features, for structured prediction
Normal HMM:

P(X,Y) = ∏_{i=1}^{I} P_E(x_i ∣ y_i) ∏_{i=1}^{I+1} P_T(y_i ∣ y_{i−1})

Log Likelihood:

log P(X,Y) = ∑_{i=1}^{I} log P_E(x_i ∣ y_i) + ∑_{i=1}^{I+1} log P_T(y_i ∣ y_{i−1})

Score:

S(X,Y) = ∑_{i=1}^{I} w_{E, y_i, x_i} + ∑_{i=1}^{I+1} w_{T, y_{i−1}, y_i}

When w_{E, y_i, x_i} = log P_E(x_i ∣ y_i) and w_{T, y_{i−1}, y_i} = log P_T(y_i ∣ y_{i−1}):
log P(X,Y) = S(X,Y)
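Computing S(X,Y) from two weight tables is a single pass over the sentence. A minimal sketch: missing weights default to 0, and the toy weights below are invented:

```python
def sequence_score(words, tags, w_t, w_e):
    """S(X,Y) = sum of transition weights w_T,prev,next (i = 1..I+1)
    plus emission weights w_E,tag,word (i = 1..I)."""
    s = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        s += w_t.get((prev, tag), 0.0) + w_e.get((tag, word), 0.0)
        prev = tag
    s += w_t.get((prev, "</s>"), 0.0)   # final transition to </s>
    return s

w_t = {("<s>", "PRN"): 1.0, ("PRN", "VBD"): 1.0}
w_e = {("PRN", "I"): 2.0}
print(sequence_score("I visited".split(), ["PRN", "VBD"], w_t, w_e))  # 1 + 2 + 1 = 4.0
```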
I visited Nara → Y1 = PRN VBD NNP
I visited Nara → Y2 = NNP VBD NNP

φT,<s>,PRN(X,Y1) = 1   φT,PRN,VBD(X,Y1) = 1   φT,VBD,NNP(X,Y1) = 1   φT,NNP,</s>(X,Y1) = 1
φE,PRN,"I"(X,Y1) = 1   φE,VBD,"visited"(X,Y1) = 1   φE,NNP,"Nara"(X,Y1) = 1
φCAPS,PRN(X,Y1) = 1   φCAPS,NNP(X,Y1) = 1   φSUF,VBD,"...ed"(X,Y1) = 1

φT,<s>,NNP(X,Y2) = 1   φT,NNP,VBD(X,Y2) = 1   φT,VBD,NNP(X,Y2) = 1   φT,NNP,</s>(X,Y2) = 1
φE,NNP,"I"(X,Y2) = 1   φE,VBD,"visited"(X,Y2) = 1   φE,NNP,"Nara"(X,Y2) = 1
φCAPS,NNP(X,Y2) = 2   φSUF,VBD,"...ed"(X,Y2) = 1
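Extracting such a feature vector can be sketched as follows; the tuple keys like ("T", prev, tag) are an illustrative encoding of the slide's φ_T, φ_E, φ_CAPS, and φ_SUF feature names:

```python
from collections import defaultdict

def create_features(words, tags):
    """phi(X, Y): transition (T), emission (E), capitalization (CAPS),
    and "...ed" suffix (SUF) features for a tagged sentence."""
    phi = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(words, tags):
        phi[("T", prev, tag)] += 1
        phi[("E", tag, word)] += 1
        if word[0].isupper():
            phi[("CAPS", tag)] += 1
        if word.endswith("ed"):
            phi[("SUF", tag, "...ed")] += 1
        prev = tag
    phi[("T", prev, "</s>")] += 1
    return phi

phi = create_features("I visited Nara".split(), ["NNP", "VBD", "NNP"])
print(phi[("CAPS", "NNP")])  # 2: both "I" and "Nara" are capitalized NNPs
```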
The search is the same as Viterbi, with weights in place of negative log probabilities.
(Search graph: 0:<s> → 1:NN, 1:JJ, 1:VB, 1:PRN, 1:NNP, … for the word "I".)

best_score["1 NN"]  = wT,<s>,NN  + wE,NN,I
best_score["1 JJ"]  = wT,<s>,JJ  + wE,JJ,I
best_score["1 VB"]  = wT,<s>,VB  + wE,VB,I
best_score["1 PRN"] = wT,<s>,PRN + wE,PRN,I
best_score["1 NNP"] = wT,<s>,NNP + wE,NNP,I
Other features (like CAPS) can be added at no extra cost:

best_score["1 NN"]  = wT,<s>,NN  + wE,NN,I  + wCAPS,NN
best_score["1 JJ"]  = wT,<s>,JJ  + wE,JJ,I  + wCAPS,JJ
best_score["1 VB"]  = wT,<s>,VB  + wE,VB,I  + wCAPS,VB
best_score["1 PRN"] = wT,<s>,PRN + wE,PRN,I + wCAPS,PRN
best_score["1 NNP"] = wT,<s>,NNP + wE,NNP,I + wCAPS,NNP
Increase the score of positive examples; decrease the score of negative examples.
I/PRN visited/VBD Nara/NNP
I/NNP visited/VBD Nara/NNP
I/NNP visited/VBD Nara/NNP
I/PRN visited/VBD Nara/NN
I/PRN visited/VB Nara/NNP
Compare the correct answer with the wrong answer that has the highest score:
φ(X, Y') for the correct answer (PRN VBD NNP):
φT,<s>,PRN = 1   φT,PRN,VBD = 1   φT,VBD,NNP = 1   φT,NNP,</s> = 1
φE,PRN,"I" = 1   φE,VBD,"visited" = 1   φE,NNP,"Nara" = 1
φCAPS,PRN = 1   φCAPS,NNP = 1   φSUF,VBD,"...ed" = 1

φ(X, Ŷ) for the highest-scoring wrong answer (NNP VBD NNP):
φT,<s>,NNP = 1   φT,NNP,VBD = 1   φT,VBD,NNP = 1   φT,NNP,</s> = 1
φE,NNP,"I" = 1   φE,VBD,"visited" = 1   φE,NNP,"Nara" = 1
φCAPS,NNP = 2   φSUF,VBD,"...ed" = 1

Difference φ(X, Y') − φ(X, Ŷ), which is added to w:
φT,<s>,PRN = 1   φT,<s>,NNP = -1   φT,PRN,VBD = 1   φT,NNP,VBD = -1
φT,VBD,NNP = 0   φT,NNP,</s> = 0
φE,PRN,"I" = 1   φE,NNP,"I" = -1   φE,VBD,"visited" = 0   φE,NNP,"Nara" = 0
φCAPS,PRN = 1   φCAPS,NNP = -1   φSUF,VBD,"...ed" = 0
create map w
for I iterations
    for each labeled pair X, Y_prime in the data
        Y_hat = hmm_viterbi(w, X)
        phi_prime = create_features(X, Y_prime)
        phi_hat = create_features(X, Y_hat)
        w += phi_prime - phi_hat
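A runnable sketch of this loop. To stay self-contained it brute-forces the argmax over all tag sequences instead of calling `hmm_viterbi` (fine only for very short sentences), uses only T and E features, and the two-sentence corpus is invented:

```python
from collections import defaultdict
from itertools import product

def create_features(words, tags):
    """Transition (T) and emission (E) features for a tagged sentence."""
    phi = defaultdict(int)
    prev = "<s>"
    for word, tag in zip(words, tags):
        phi[("T", prev, tag)] += 1
        phi[("E", tag, word)] += 1
        prev = tag
    phi[("T", prev, "</s>")] += 1
    return phi

def predict(w, words, tagset):
    # Stand-in for hmm_viterbi: exhaustively score every tag sequence.
    def s(tags):
        return sum(w[f] * v for f, v in create_features(words, tags).items())
    return max(product(tagset, repeat=len(words)), key=s)

def train(data, tagset, iterations=5):
    w = defaultdict(int)
    for _ in range(iterations):
        for words, y_prime in data:
            y_hat = predict(w, words, tagset)
            if y_hat != tuple(y_prime):
                for f, v in create_features(words, y_prime).items():
                    w[f] += v          # w += phi_prime
                for f, v in create_features(words, y_hat).items():
                    w[f] -= v          # w -= phi_hat
    return w

data = [("I visited Nara".split(), ["PRN", "VBD", "NNP"]),
        ("Nara was fun".split(), ["NNP", "VBD", "JJ"])]
tagset = ["PRN", "VBD", "NNP", "JJ"]
w = train(data, tagset)
print(predict(w, "I visited Nara".split(), tagset))  # ('PRN', 'VBD', 'NNP')
```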
The structured perceptron: a simple discriminative structured prediction model that combines the perceptron's flexible features with HMM-style Viterbi search.