NLP Programming Tutorial 11 – The Structured Perceptron
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Binary Prediction (2 choices)
    A book review: "Oh, man I love this book!" / "This book is so boring..."
    → Is it positive? yes / no

Multi-class Prediction (several choices)
    A tweet: "On the way to the park!" / "公園に行くなう!" ("Going to the park now!")
    → Its language: English / Japanese

Structured Prediction (millions of choices)
    A sentence: "I read a book"
    → Its syntactic parse: (S (N I) (VP (VBD read) (NP (DET a) (NN book))))
→ Most NLP problems are structured prediction!
Perceptron, SVM, Neural Net → lots of features, binary prediction
HMM POS Tagging, CFG Parsing → conditional probabilities, structured prediction
Collins "Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms" EMNLP02
Huang+ "Forest Reranking: Discriminative Parsing with Non-Local Features" ACL08
Liang+ "An End-to-End Discriminative Approach to Machine Translation" ACL06 (Neubig+ "Inducing a Discriminative Parser for Machine Translation Reordering" EMNLP12, plug :) )
Roark+ "Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm" ACL04
Natural language processing ( NLP ) is a field of computer science
JJ NN NN -LRB- NN -RRB- VBZ DT NN IN NN NN
natural language processing ( nlp ) ...
<s> JJ NN NN LRB NN RRB ... </s>

P(Y) ≈ ∏_{i=1}^{I+1} P_T(y_i | y_{i-1})    e.g. P_T(JJ|<s>), P_T(NN|JJ), P_T(NN|NN), …
P(X|Y) ≈ ∏_{i=1}^{I} P_E(x_i | y_i)        e.g. P_E(natural|JJ), P_E(language|NN), P_E(processing|NN), …
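As a sketch, the HMM decomposition above can be computed directly in log space. The probability tables and their values below are made up for illustration; only the structure of the sum follows the formulas:

```python
import math

# Toy HMM tables (the probability values are hypothetical)
PT = {("<s>", "JJ"): 0.1, ("JJ", "NN"): 0.4, ("NN", "NN"): 0.3, ("NN", "</s>"): 0.2}
PE = {("JJ", "natural"): 0.01, ("NN", "language"): 0.02, ("NN", "processing"): 0.02}

def hmm_log_prob(words, tags):
    """log P(X, Y): sum the log transition and log emission probabilities."""
    total = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        total += math.log(PT[(prev, tag)]) + math.log(PE[(tag, word)])
        prev = tag
    total += math.log(PT[(prev, "</s>")])  # transition into the end symbol
    return total
```

Working in log space avoids underflow when the sentence is long, since the product of many small probabilities becomes a sum of moderate negative numbers.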
Normal HMM:

    P(X,Y) = ∏_{i=1}^{I} P_E(x_i | y_i) · ∏_{i=1}^{I+1} P_T(y_i | y_{i-1})

Log likelihood:

    log P(X,Y) = ∑_{i=1}^{I} log P_E(x_i | y_i) + ∑_{i=1}^{I+1} log P_T(y_i | y_{i-1})

Score:

    S(X,Y) = ∑_{i=1}^{I} w_{E,y_i,x_i} + ∑_{i=1}^{I+1} w_{T,y_{i-1},y_i}

When w_{E,y_i,x_i} = log P_E(x_i | y_i) and w_{T,y_{i-1},y_i} = log P_T(y_i | y_{i-1}):

    log P(X,Y) = S(X,Y)
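Viewed as code, S(X,Y) is just a dot product between a weight map and a feature-count map. A minimal sketch, with feature names following the slides and hypothetical weight values:

```python
from collections import defaultdict

def score(w, phi):
    """S(X, Y) = w · φ(X, Y): sum each firing feature's weight times its count."""
    return sum(w[name] * count for name, count in phi.items())

# Hypothetical weights; any feature not listed defaults to weight 0
w = defaultdict(float, {"T,<S>,PRP": 1.2, "E,PRP,I": 0.8})
phi = {"T,<S>,PRP": 1, "E,PRP,I": 1, "E,VBD,visited": 1}
s = score(w, phi)  # features with weight 0 contribute nothing
```

Using a defaultdict for w means unseen features silently score 0, which is exactly the behavior the perceptron needs before training.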
Example: X = "I visited Nara", with two candidate tag sequences.

Y1 = PRP VBD NNP:
    φT,<S>,PRP(X,Y1) = 1   φT,PRP,VBD(X,Y1) = 1   φT,VBD,NNP(X,Y1) = 1   φT,NNP,</S>(X,Y1) = 1
    φE,PRP,"I"(X,Y1) = 1   φE,VBD,"visited"(X,Y1) = 1   φE,NNP,"Nara"(X,Y1) = 1
    φCAPS,PRP(X,Y1) = 1   φCAPS,NNP(X,Y1) = 1   φSUF,VBD,"...ed"(X,Y1) = 1

Y2 = NNP VBD NNP:
    φT,<S>,NNP(X,Y2) = 1   φT,NNP,VBD(X,Y2) = 1   φT,VBD,NNP(X,Y2) = 1   φT,NNP,</S>(X,Y2) = 1
    φE,NNP,"I"(X,Y2) = 1   φE,VBD,"visited"(X,Y2) = 1   φE,NNP,"Nara"(X,Y2) = 1
    φCAPS,NNP(X,Y2) = 2   φSUF,VBD,"...ed"(X,Y2) = 1
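A minimal sketch of a feature function that produces counts like the ones above, assuming the CAPS and "...ed" suffix features shown on the slide (the function name and dict-of-counts representation are mine, not from the tutorial):

```python
def create_features(words, tags):
    """Count transition, emission, CAPS, and suffix features for one (X, Y) pair."""
    phi = {}
    def add(name):
        phi[name] = phi.get(name, 0) + 1
    prev = "<S>"
    for word, tag in zip(words, tags):
        add("T,%s,%s" % (prev, tag))   # transition feature
        add("E,%s,%s" % (tag, word))   # emission feature
        if word[0].isupper():
            add("CAPS,%s" % tag)       # capitalization feature
        if word.endswith("ed"):
            add("SUF,%s,...ed" % tag)  # suffix feature
        prev = tag
    add("T,%s,</S>" % prev)            # transition into the end symbol
    return phi

phi2 = create_features(["I", "visited", "Nara"], ["NNP", "VBD", "NNP"])
```

Note that φCAPS,NNP comes out as 2 for the Y2 tagging, since both "I" and "Nara" are capitalized and tagged NNP, matching the example above.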
Viterbi search with an HMM, first step (from node 0:<S> to nodes 1:NN … 1:NNP):

best_score["1 NN"]  = -log PT(NN|<S>)  + -log PE(I | NN)
best_score["1 JJ"]  = -log PT(JJ|<S>)  + -log PE(I | JJ)
best_score["1 VB"]  = -log PT(VB|<S>)  + -log PE(I | VB)
best_score["1 PRP"] = -log PT(PRP|<S>) + -log PE(I | PRP)
best_score["1 NNP"] = -log PT(NNP|<S>) + -log PE(I | NNP)
best_score["2 NN"] = min(
    best_score["1 NN"]  + -log PT(NN|NN)  + -log PE(visited | NN),
    best_score["1 JJ"]  + -log PT(NN|JJ)  + -log PE(visited | NN),
    best_score["1 VB"]  + -log PT(NN|VB)  + -log PE(visited | NN),
    best_score["1 PRP"] + -log PT(NN|PRP) + -log PE(visited | NN),
    best_score["1 NNP"] + -log PT(NN|NNP) + -log PE(visited | NN),
    ... )
best_score["2 JJ"] = min(
    best_score["1 NN"] + -log PT(JJ|NN) + -log PE(visited | JJ),
    best_score["1 JJ"] + -log PT(JJ|JJ) + -log PE(visited | JJ),
    best_score["1 VB"] + -log PT(JJ|VB) + -log PE(visited | JJ),
    ... )
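The min over previous tags can be written directly in Python. The probabilities and first-step scores below are toy values chosen only to make the recursion concrete:

```python
import math

# Toy values for illustration (not from the tutorial)
PT = {("NN", "NN"): 0.1, ("JJ", "NN"): 0.3}   # transition probabilities
PE = {("NN", "visited"): 0.05}                # emission probability
best_score = {"1 NN": 2.0, "1 JJ": 1.0}       # assumed first-step scores

# best_score["2 NN"]: cheapest path that reaches tag NN at position 2
best_score["2 NN"] = min(
    best_score["1 %s" % prev]
    - math.log(PT[(prev, "NN")])
    - math.log(PE[("NN", "visited")])
    for prev in ["NN", "JJ"])
```

The emission term -log PE(visited | NN) is the same in every branch, so only the previous score and transition cost decide which edge wins.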
The same first step, with feature weights in place of log probabilities:

best_score["1 NN"]  = wT,<S>,NN  + wE,NN,I
best_score["1 JJ"]  = wT,<S>,JJ  + wE,JJ,I
best_score["1 VB"]  = wT,<S>,VB  + wE,VB,I
best_score["1 PRP"] = wT,<S>,PRP + wE,PRP,I
best_score["1 NNP"] = wT,<S>,NNP + wE,NNP,I
Adding the capitalization feature:

best_score["1 NN"]  = wT,<S>,NN  + wE,NN,I  + wCAPS,NN
best_score["1 JJ"]  = wT,<S>,JJ  + wE,JJ,I  + wCAPS,JJ
best_score["1 VB"]  = wT,<S>,VB  + wE,VB,I  + wCAPS,VB
best_score["1 PRP"] = wT,<S>,PRP + wE,PRP,I + wCAPS,PRP
best_score["1 NNP"] = wT,<S>,NNP + wE,NNP,I + wCAPS,NNP
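With weights, the first step becomes a plain sum of feature weights, no logarithms needed. A sketch with hypothetical weight values:

```python
from collections import defaultdict

# Hypothetical weights; any feature not listed has weight 0
w = defaultdict(float, {"T,<S>,PRP": 0.5, "E,PRP,I": 1.0, "CAPS,PRP": 0.2})

best_score = {}
for tag in ["NN", "JJ", "VB", "PRP", "NNP"]:
    # transition weight + emission weight + capitalization weight
    best_score["1 %s" % tag] = (w["T,<S>,%s" % tag]
                                + w["E,%s,I" % tag]
                                + w["CAPS,%s" % tag])
```

Because higher is now better, the search will take a max over these scores rather than the min used with negative log probabilities.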
create_trans(prev, next), e.g. create_trans("NNP", "VBD"):
    φ["T,NNP,VBD"] = 1
create_emit(tag, word), e.g. create_emit("NNP", "Nara"):
    φ["E,NNP,Nara"] = 1
    φ["CAPS,NNP"] = 1
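A possible implementation of these two helpers, returning feature-count maps as in the example above (the dict representation and the exact extra features are assumptions; the slide shows only CAPS):

```python
def create_trans(prev_tag, next_tag):
    """One transition feature, e.g. create_trans("NNP", "VBD") -> {"T,NNP,VBD": 1}."""
    return {"T,%s,%s" % (prev_tag, next_tag): 1}

def create_emit(tag, word):
    """Emission feature, plus extra features such as capitalization."""
    phi = {"E,%s,%s" % (tag, word): 1}
    if word[0].isupper():
        phi["CAPS,%s" % tag] = 1  # fires only for capitalized words
    return phi
```

Keeping each edge's features in a small map makes the score of an edge a simple dot product with the weight vector, and makes it easy to bolt on new features without touching the search.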
split line into words
I = length(words)
make maps best_score, best_edge
best_score["0 <s>"] = 0   # Start with <s>
best_edge["0 <s>"] = NULL
for i in 0 … I-1:
    for each prev in keys of possible_tags:
        for each next in keys of possible_tags:
            if best_score["i prev"] and transition["prev next"] exist:
                score = best_score["i prev"] +
                        w * (create_t(prev, next) + create_e(next, words[i]))
                if best_score["i+1 next"] is new or < score:
                    best_score["i+1 next"] = score
                    best_edge["i+1 next"] = "i prev"
# Finally, do the same for </s>
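Putting the pseudocode together, here is a runnable sketch of the forward step. The CAPS feature, the special-casing of <s> at i = 0, and all weight values are my additions for illustration; the search structure follows the pseudocode:

```python
from collections import defaultdict

def viterbi_forward(words, possible_tags, w):
    """Forward step of Viterbi search, maximizing the total feature weight w · φ.

    w is a defaultdict(float) mapping feature name -> weight (0 if unseen)."""
    I = len(words)
    best_score = {"0 <s>": 0.0}
    best_edge = {"0 <s>": None}
    for i in range(I):
        prevs = ["<s>"] if i == 0 else possible_tags  # only <s> exists at i = 0
        for prev in prevs:
            if "%d %s" % (i, prev) not in best_score:
                continue
            for nxt in possible_tags:
                # transition + emission (+ CAPS) weights along this edge
                score = (best_score["%d %s" % (i, prev)]
                         + w["T,%s,%s" % (prev, nxt)]
                         + w["E,%s,%s" % (nxt, words[i])]
                         + (w["CAPS,%s" % nxt] if words[i][0].isupper() else 0.0))
                key = "%d %s" % (i + 1, nxt)
                if key not in best_score or best_score[key] < score:
                    best_score[key] = score
                    best_edge[key] = "%d %s" % (i, prev)
    # finally, do the same for </s>
    for prev in possible_tags:
        key = "%d %s" % (I, prev)
        if key not in best_score:
            continue
        score = best_score[key] + w["T,%s,</s>" % prev]
        end = "%d </s>" % (I + 1)
        if end not in best_score or best_score[end] < score:
            best_score[end] = score
            best_edge[end] = key
    return best_score, best_edge

# Hypothetical weights that make PRP the best tag for "I"
w = defaultdict(float, {"T,<s>,PRP": 1.0, "E,PRP,I": 1.0, "T,PRP,</s>": 0.5})
best_score, best_edge = viterbi_forward(["I"], ["PRP", "NNP"], w)
```

The best path is then read off by following best_edge backward from the final "</s>" node, exactly as in the HMM version; only the edge scores have changed.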
Run the program on data/wiki-en-test.norm, then evaluate the output:
script/gradepos.pl data/wiki-en-test.pos my_answer.pos