NLP Programming Tutorial 11 – The Structured Perceptron


  1. NLP Programming Tutorial 11 – The Structured Perceptron
     Graham Neubig
     Nara Institute of Science and Technology (NAIST)

  2. Prediction Problems: given x, predict y
     ● Binary prediction (2 choices): a book review → is it positive?
       “Oh, man I love this book!” → yes; “This book is so boring...” → no
     ● Multi-class prediction (several choices): a tweet → its language
       “On the way to the park!” → English; “公園に行くなう!” → Japanese
     ● Structured prediction (millions of choices): a sentence → its syntactic parse
       “I read a book” → (S (NP (N I)) (VP (VBD read) (NP (DET a) (NN book))))

  3. Prediction Problems (continued): the same three types as above; structured prediction covers most NLP problems!

  4. So Far, We Have Learned
     ● Classifiers (Perceptron, SVM, Neural Net): lots of features, binary prediction
     ● Generative models (HMM POS tagging, CFG parsing): conditional probabilities, structured prediction

  5. Structured Perceptron
     ● Structured perceptron → classification with lots of features over structured models!
     ● It combines the feature-rich scoring of classifiers with the structured prediction of generative models.

  6. Uses of the Structured Perceptron (or Variants)
     ● POS tagging with HMMs: Collins, “Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms”, ACL 2002
     ● Parsing: Huang, “Forest Reranking: Discriminative Parsing with Non-Local Features”, ACL 2008
     ● Machine translation: Liang et al., “An End-to-End Discriminative Approach to Machine Translation”, ACL 2006 (and, as a plug, Neubig et al., “Inducing a Discriminative Parser for Machine Translation Reordering”, EMNLP 2012 :) )
     ● Discriminative language models: Roark et al., “Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm”, ACL 2004

  7. Example: Part of Speech (POS) Tagging
     ● Given a sentence X, predict its part of speech sequence Y:
       Natural/JJ language/NN processing/NN (/-LRB- NLP/NN )/-RRB- is/VBZ a/DT field/NN of/IN computer/NN science/NN
     ● A type of structured prediction

  8. Hidden Markov Models (HMMs) for POS Tagging
     ● POS→POS transition probabilities, like a bigram model: $P(Y) \approx \prod_{i=1}^{I+1} P_T(y_i \mid y_{i-1})$
     ● POS→word emission probabilities: $P(X \mid Y) \approx \prod_{i=1}^{I} P_E(x_i \mid y_i)$
     ● Example: the tag sequence <s> JJ NN NN LRB NN RRB ... </s> over “natural language processing ( nlp ) ...” is scored by
       $P_T(\text{JJ} \mid \langle s \rangle) \cdot P_T(\text{NN} \mid \text{JJ}) \cdot P_T(\text{NN} \mid \text{NN}) \cdots$ for the transitions and
       $P_E(\text{natural} \mid \text{JJ}) \cdot P_E(\text{language} \mid \text{NN}) \cdot P_E(\text{processing} \mid \text{NN}) \cdots$ for the emissions
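
As a small illustrative sketch (not part of the slides), the joint probability of one tagging under this HMM can be computed from two probability tables; the dictionary names p_t and p_e below are hypothetical, keyed by (previous tag, tag) and (tag, word):

    def hmm_joint_prob(words, tags, p_t, p_e):
        # Joint probability P(X, Y) of one tagging under the HMM:
        # product of POS->POS transitions and POS->word emissions.
        prob = 1.0
        prev = "<s>"
        for word, tag in zip(words, tags):
            prob *= p_t.get((prev, tag), 0.0)   # P_T(y_i | y_{i-1})
            prob *= p_e.get((tag, word), 0.0)   # P_E(x_i | y_i)
            prev = tag
        prob *= p_t.get((prev, "</s>"), 0.0)    # transition into the end symbol </s>
        return prob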

  9. Why are Features Good?
     ● We can easily try many different ideas:
     ● Are capitalized words usually nouns?
     ● Are words that end with -ed usually verbs? With -ing?

  10. Restructuring the HMM With Features
      ● Normal HMM: $P(X,Y) = \prod_{i=1}^{I} P_E(x_i \mid y_i) \prod_{i=1}^{I+1} P_T(y_i \mid y_{i-1})$

  11. Restructuring the HMM With Features (continued)
      ● Log likelihood: $\log P(X,Y) = \sum_{i=1}^{I} \log P_E(x_i \mid y_i) + \sum_{i=1}^{I+1} \log P_T(y_i \mid y_{i-1})$

  12. Restructuring the HMM With Features (continued)
      ● Score, replacing each log probability with a weight: $S(X,Y) = \sum_{i=1}^{I} w_{E,y_i,x_i} + \sum_{i=1}^{I+1} w_{T,y_{i-1},y_i}$

  13. Restructuring the HMM With Features (continued)
      ● When $w_{E,y_i,x_i} = \log P_E(x_i \mid y_i)$ and $w_{T,y_{i-1},y_i} = \log P_T(y_i \mid y_{i-1})$, then $\log P(X,Y) = S(X,Y)$
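
To make the restructured score concrete, here is a minimal sketch (my own, with a hypothetical weight dictionary w keyed by feature-name strings) that computes S(X,Y) as a sum of transition and emission weights:

    def hmm_feature_score(words, tags, w):
        # S(X, Y): sum of emission weights w_{E,y,x} and transition weights w_{T,y',y}.
        score = 0.0
        prev = "<s>"
        for word, tag in zip(words, tags):
            score += w.get("T,%s,%s" % (prev, tag), 0.0)   # transition weight
            score += w.get("E,%s,%s" % (tag, word), 0.0)   # emission weight
            prev = tag
        score += w.get("T,%s,</s>" % prev, 0.0)            # transition into </s>
        return score

If every weight is set to the corresponding log probability, this returns exactly log P(X,Y), matching the condition on the slide.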

  14. Example
      ● φ( “I visited Nara” tagged PRP VBD NNP ) gives:
        φ_{T,<S>,PRP}(X,Y₁) = 1, φ_{T,PRP,VBD}(X,Y₁) = 1, φ_{T,VBD,NNP}(X,Y₁) = 1, φ_{T,NNP,</S>}(X,Y₁) = 1,
        φ_{E,PRP,“I”}(X,Y₁) = 1, φ_{E,VBD,“visited”}(X,Y₁) = 1, φ_{E,NNP,“Nara”}(X,Y₁) = 1,
        φ_{CAPS,PRP}(X,Y₁) = 1, φ_{CAPS,NNP}(X,Y₁) = 1, φ_{SUF,VBD,“...ed”}(X,Y₁) = 1
      ● φ( “I visited Nara” tagged NNP VBD NNP ) gives:
        φ_{T,<S>,NNP}(X,Y₂) = 1, φ_{T,NNP,VBD}(X,Y₂) = 1, φ_{T,VBD,NNP}(X,Y₂) = 1, φ_{T,NNP,</S>}(X,Y₂) = 1,
        φ_{E,NNP,“I”}(X,Y₂) = 1, φ_{E,VBD,“visited”}(X,Y₂) = 1, φ_{E,NNP,“Nara”}(X,Y₂) = 1,
        φ_{CAPS,NNP}(X,Y₂) = 2, φ_{SUF,VBD,“...ed”}(X,Y₂) = 1
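
These feature vectors can be produced by a small feature extractor. The sketch below is my own illustration, hard-coding the CAPS and “...ed” suffix features used in the example and counting feature names in a dictionary:

    from collections import defaultdict

    def create_features(words, tags):
        # phi(X, Y): counts of transition, emission, capitalization and suffix features.
        phi = defaultdict(int)
        prev = "<S>"
        for word, tag in zip(words, tags):
            phi["T,%s,%s" % (prev, tag)] += 1     # transition feature
            phi["E,%s,%s" % (tag, word)] += 1     # emission feature
            if word[0].isupper():
                phi["CAPS,%s" % tag] += 1         # capitalization feature
            if word.endswith("ed"):
                phi["SUF,%s,...ed" % tag] += 1    # suffix feature
            prev = tag
        phi["T,%s,</S>" % prev] += 1              # transition into the end symbol
        return phi

For example, create_features("I visited Nara".split(), ["NNP", "VBD", "NNP"]) gives the CAPS,NNP feature a count of 2, just as in the second feature vector above.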

  15. Finding the Best Solution
      ● We must find the POS sequence that satisfies: $\hat{Y} = \operatorname{argmax}_Y \sum_i w_i \phi_i(X,Y)$
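
The argmax ranges over every possible tag sequence, so enumerating the candidates directly is only feasible for tiny examples. The sketch below (my own illustration, reusing the hypothetical create_features extractor from above) makes the exponential cost explicit and motivates the Viterbi search that follows:

    from itertools import product

    def argmax_brute_force(words, tag_set, w):
        # Enumerate all |tag_set| ** len(words) tag sequences and return the one
        # whose feature vector has the highest dot product with the weights w.
        best_tags, best_score = None, float("-inf")
        for tags in product(tag_set, repeat=len(words)):
            phi = create_features(words, list(tags))
            score = sum(w.get(name, 0.0) * count for name, count in phi.items())
            if score > best_score:
                best_tags, best_score = list(tags), score
        return best_tags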

  16. Remember: the HMM Viterbi Algorithm
      ● Forward step: calculate the best path to each node, i.e. the path with the lowest negative log probability
      ● Backward step: reproduce the path
      ● This is easy, almost the same as for word segmentation

  17. Forward Step: Part 1
      ● First, calculate the transition from <S> and the emission of the first word (“I”) for every POS:
        best_score[“1 NN”]  = -log P_T(NN|<S>)  + -log P_E(I|NN)
        best_score[“1 JJ”]  = -log P_T(JJ|<S>)  + -log P_E(I|JJ)
        best_score[“1 VB”]  = -log P_T(VB|<S>)  + -log P_E(I|VB)
        best_score[“1 PRP”] = -log P_T(PRP|<S>) + -log P_E(I|PRP)
        best_score[“1 NNP”] = -log P_T(NNP|<S>) + -log P_E(I|NNP)
        ...

  18. Forward Step: Middle Parts
      ● For middle words, calculate the minimum score over all possible previous POS tags:
        best_score[“2 NN”] = min(
            best_score[“1 NN”]  + -log P_T(NN|NN)  + -log P_E(visited|NN),
            best_score[“1 JJ”]  + -log P_T(NN|JJ)  + -log P_E(visited|NN),
            best_score[“1 VB”]  + -log P_T(NN|VB)  + -log P_E(visited|NN),
            best_score[“1 PRP”] + -log P_T(NN|PRP) + -log P_E(visited|NN),
            best_score[“1 NNP”] + -log P_T(NN|NNP) + -log P_E(visited|NN),
            ... )
        best_score[“2 JJ”] = min(
            best_score[“1 NN”]  + -log P_T(JJ|NN)  + -log P_E(visited|JJ),
            best_score[“1 JJ”]  + -log P_T(JJ|JJ)  + -log P_E(visited|JJ),
            best_score[“1 VB”]  + -log P_T(JJ|VB)  + -log P_E(visited|JJ),
            ... )
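
Putting the forward and backward steps together, a compact Viterbi decoder for the HMM could look like the sketch below. It is my own illustration, assuming the hypothetical probability dictionaries p_t and p_e from before, a fixed tag set, and crude 1e-10 smoothing for unseen events:

    import math

    def viterbi_decode(words, tag_set, p_t, p_e):
        # Forward step: best (lowest negative log probability) path to each node (i, tag).
        best_score = {(0, "<s>"): 0.0}
        best_edge = {(0, "<s>"): None}
        prev_tags = ["<s>"]
        for i, word in enumerate(words, start=1):
            for tag in tag_set:
                for prev in prev_tags:
                    score = (best_score[(i - 1, prev)]
                             - math.log(p_t.get((prev, tag), 1e-10))
                             - math.log(p_e.get((tag, word), 1e-10)))
                    if (i, tag) not in best_score or score < best_score[(i, tag)]:
                        best_score[(i, tag)] = score
                        best_edge[(i, tag)] = (i - 1, prev)
            prev_tags = list(tag_set)
        # Final transition into </s>.
        end = len(words) + 1
        for prev in prev_tags:
            score = best_score[(end - 1, prev)] - math.log(p_t.get((prev, "</s>"), 1e-10))
            if (end, "</s>") not in best_score or score < best_score[(end, "</s>")]:
                best_score[(end, "</s>")] = score
                best_edge[(end, "</s>")] = (end - 1, prev)
        # Backward step: follow best_edge pointers to reproduce the best path.
        tags, node = [], best_edge[(end, "</s>")]
        while node is not None and node != (0, "<s>"):
            tags.append(node[1])
            node = best_edge[node]
        return list(reversed(tags))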

  19. HMM Viterbi with Features
      ● Same as with probabilities, but use feature weights:
        best_score[“1 NN”]  = w_{T,<S>,NN}  + w_{E,NN,I}
        best_score[“1 JJ”]  = w_{T,<S>,JJ}  + w_{E,JJ,I}
        best_score[“1 VB”]  = w_{T,<S>,VB}  + w_{E,VB,I}
        best_score[“1 PRP”] = w_{T,<S>,PRP} + w_{E,PRP,I}
        best_score[“1 NNP”] = w_{T,<S>,NNP} + w_{E,NNP,I}
        ...

  20. HMM Viterbi with Features (continued)
      ● We can also add additional features:
        best_score[“1 NN”]  = w_{T,<S>,NN}  + w_{E,NN,I}  + w_{CAPS,NN}
        best_score[“1 JJ”]  = w_{T,<S>,JJ}  + w_{E,JJ,I}  + w_{CAPS,JJ}
        best_score[“1 VB”]  = w_{T,<S>,VB}  + w_{E,VB,I}  + w_{CAPS,VB}
        best_score[“1 PRP”] = w_{T,<S>,PRP} + w_{E,PRP,I} + w_{CAPS,PRP}
        best_score[“1 NNP”] = w_{T,<S>,NNP} + w_{E,NNP,I} + w_{CAPS,NNP}
        ...
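
With feature weights, the only change to the forward step is how each edge is scored: instead of adding negative log probabilities and taking a minimum, we add feature weights and take a maximum. A minimal sketch of one edge score, using a hypothetical weight dictionary w and the CAPS feature from the slide:

    def edge_score(prev_tag, tag, word, w):
        # Sum of the weights of all features that fire on this lattice edge.
        score = w.get("T,%s,%s" % (prev_tag, tag), 0.0)    # transition feature weight
        score += w.get("E,%s,%s" % (tag, word), 0.0)       # emission feature weight
        if word[0].isupper():
            score += w.get("CAPS,%s" % tag, 0.0)           # additional capitalization feature
        return score

In the forward step, best_score[“i tag”] then becomes the maximum over all previous tags of best_score[“i-1 prev”] + edge_score(prev, tag, word, w).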

  21. Learning in the Structured Perceptron
      ● Remember the perceptron algorithm: if there is a mistake, update $w \leftarrow w + y\,\phi(x)$
      ● This updates the weights to increase the score of positive examples and decrease the score of negative examples
      ● What is positive/negative in the structured perceptron?

  22. Learning in the Structured Perceptron (continued)
      ● Positive example, the correct feature vector: φ( “I visited Nara” tagged PRP VBD NNP )
      ● Negative example, an incorrect feature vector: φ( “I visited Nara” tagged NNP VBD NNP )
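
One training step of the structured perceptron then looks roughly like the sketch below (my own illustration, reusing the hypothetical create_features extractor and assuming a decode function such as a feature-based Viterbi search): predict with the current weights, and if the prediction is wrong, add the features of the correct answer and subtract the features of the predicted one:

    def perceptron_update(w, words, gold_tags, decode):
        # decode(words, w) is assumed to return the highest-scoring tag sequence
        # under the current weights (e.g. a feature-based Viterbi search).
        predicted = decode(words, w)
        if predicted != gold_tags:
            for name, count in create_features(words, gold_tags).items():
                w[name] = w.get(name, 0.0) + count    # raise the score of the correct answer
            for name, count in create_features(words, predicted).items():
                w[name] = w.get(name, 0.0) - count    # lower the score of the incorrect answer
        return w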
