Lecture 17: Statistical Parsing with PCFG


  1. Lecture 17: Statistical Parsing with PCFG. Kai-Wei Chang, CS @ University of Virginia, kw@kwchang.net. Course webpage: http://kwchang.net/teaching/NLP16

  2. Reading list: look at Mike Collins' notes on PCFGs and lexicalized PCFGs (http://www.cs.columbia.edu/~mcollins/)

  3. Phrase structure (constituency) trees can be modeled by context-free grammars (CFGs).
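As a concrete illustration (this toy grammar is hypothetical, not from the slides), a small CFG can be written down directly as a set of rewrite rules:

      # A hypothetical toy context-free grammar: LHS -> list of possible right-hand sides.
      toy_cfg = {
          "S":   [("NP", "VP")],
          "NP":  [("Det", "N"), ("time",)],
          "VP":  [("VP", "PP"), ("flies",)],
          "PP":  [("P", "NP")],
          "Det": [("an",)],
          "N":   [("arrow",)],
          "P":   [("like",)],
      }
      # Every tree licensed by these rules is a phrase-structure (constituency) tree, e.g.
      # (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))).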

  4. CKY algorithm
      for J := 1 to n
        add to [J-1, J] all categories for the J-th word
      for width := 2 to n
        for start := 0 to n-width          // this is I
          define end := start + width      // this is J
          for mid := start+1 to end-1      // find all I-to-J phrases
            for every rule X → Y Z in the grammar:
              if Y in [start, mid] and Z in [mid, end] then add X to [start, end]
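The same chart loop can be written as a short Python sketch; the toy grammar, word list, and function name below are hypothetical, purely for illustration:

      # Minimal CKY recognizer sketch over a hypothetical toy grammar.
      from collections import defaultdict

      binary_rules = {("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")}
      lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

      def cky_recognize(words, root="S"):
          n = len(words)
          chart = defaultdict(set)              # (start, end) -> categories covering that span
          for j in range(1, n + 1):             # width-1 spans come from the lexicon
              chart[(j - 1, j)] |= lexical.get(words[j - 1], set())
          for width in range(2, n + 1):
              for start in range(0, n - width + 1):
                  end = start + width
                  for mid in range(start + 1, end):
                      for X, Y, Z in binary_rules:
                          if Y in chart[(start, mid)] and Z in chart[(mid, end)]:
                              chart[(start, end)].add(X)
          return root in chart[(0, n)]

      print(cky_recognize("the dog saw the cat".split()))   # True under this toy grammar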

  5. Weighted CKY: Viterbi algorithm
      (assume the weights are log probabilities of rules)
      initialize all entries of chart to -∞
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] max= weight(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] max= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
      return chart[ROOT, 0, n]
      (Slides are modified from Jason Eisner's NLP course.)
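The Viterbi variant in the same Python style might look like the sketch below; the grammar and its log probabilities are hypothetical:

      # Viterbi CKY sketch: score of the best parse under a hypothetical toy PCFG.
      import math
      from collections import defaultdict

      lex_logprob = {("Det", "the"): math.log(1.0), ("N", "dog"): math.log(0.5),
                     ("N", "cat"): math.log(0.5), ("V", "saw"): math.log(1.0)}
      bin_logprob = {("S", "NP", "VP"): math.log(1.0), ("NP", "Det", "N"): math.log(1.0),
                     ("VP", "V", "NP"): math.log(1.0)}

      def viterbi_cky(words, root="S"):
          n = len(words)
          chart = defaultdict(lambda: -math.inf)    # (X, start, end) -> best log probability
          for i in range(1, n + 1):
              for (X, w), lp in lex_logprob.items():
                  if w == words[i - 1]:
                      chart[(X, i - 1, i)] = max(chart[(X, i - 1, i)], lp)
          for width in range(2, n + 1):
              for start in range(0, n - width + 1):
                  end = start + width
                  for mid in range(start + 1, end):
                      for (X, Y, Z), lp in bin_logprob.items():
                          cand = lp + chart[(Y, start, mid)] + chart[(Z, mid, end)]
                          chart[(X, start, end)] = max(chart[(X, start, end)], cand)
          return chart[(root, 0, n)]

      print(viterbi_cky("the dog saw the cat".split()))     # log probability of the best parse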

  6. Likelihood of a parse tree: why?

  7. Probabilistic Trees
      Just like language models or HMMs for POS tagging, we make independence assumptions!
      [Figure: parse tree of "time flies like an arrow":
      (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))]

  8. Chain rule: one word at a time
      p(time flies like an arrow)
        = p(time) * p(flies | time) * p(like | time flies)
          * p(an | time flies like) * p(arrow | time flies like an)

  9. Chain rule + independence assumptions (to get a trigram model)
      p(time flies like an arrow)
        ≈ p(time) * p(flies | time) * p(like | time flies)
          * p(an | flies like) * p(arrow | like an)
      (each word is conditioned on at most the two preceding words)
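As a small illustration of the trigram factorization in Python (the probability table below is entirely made up, just to make the product concrete):

      # Trigram factorization sketch; all probabilities are hypothetical illustrations.
      import math

      trigram_prob = {
          ("<s>", "<s>", "time"):    0.01,
          ("<s>", "time", "flies"):  0.10,
          ("time", "flies", "like"): 0.20,
          ("flies", "like", "an"):   0.30,
          ("like", "an", "arrow"):   0.40,
      }

      def sentence_logprob(words):
          # p(w1..wn) ≈ product of p(w_i | w_{i-2}, w_{i-1}), padding the left context with <s>.
          padded = ["<s>", "<s>"] + words
          return sum(math.log(trigram_prob[(padded[i - 2], padded[i - 1], padded[i])])
                     for i in range(2, len(padded)))

      print(sentence_logprob("time flies like an arrow".split()))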

  10. Chain rule, written differently
      p(time flies like an arrow)
        = p(time) * p(time flies | time) * p(time flies like | time flies)
          * p(time flies like an | time flies like)
          * p(time flies like an arrow | time flies like an)
      Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

  11. Chain rule + independence assumptions
      p(time flies like an arrow)
        = p(time) * p(time flies | time) * p(time flies like | time flies)
          * p(time flies like an | time flies like)
          * p(time flies like an arrow | time flies like an)
      With the independence assumptions, each factor is then simplified by dropping the
      more distant words from its conditioning context, just as in the trigram model.
      Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

  12. Chain rule: one node at a time
      [Figure: the parse tree of "time flies like an arrow" is generated top-down,
      one nonterminal expansion at a time.]
      p(tree | S) = p(S expands to NP VP | S)
                    * p(NP expands to "time" | tree generated so far)
                    * p(VP expands to VP PP | tree generated so far)
                    * ...

  13. Chain rule + independence assumptions
      [Figure: the same tree, but each expansion is conditioned only on the nonterminal
      being expanded, not on the rest of the tree generated so far.]
      p(tree | S) = p(NP VP | S) * p(time | NP) * p(VP PP | VP) * p(flies | VP) * ...

  14. Simplified notation
      p(tree | S) = p(S → NP VP | S)
                    * p(NP → time | NP)
                    * p(VP → VP PP | VP)
                    * p(VP → flies | VP)
                    * ...
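In Python, this product over rules can be sketched as follows; the rule probabilities are hypothetical, and the point is only that the tree probability is the product of one probability per rule occurrence:

      # PCFG tree probability sketch: multiply one (hypothetical) probability per rule used.
      rule_prob = {
          ("S",  ("NP", "VP")):  1.0,
          ("NP", ("time",)):     0.1,
          ("VP", ("VP", "PP")):  0.2,
          ("VP", ("flies",)):    0.3,
          ("PP", ("P", "NP")):   1.0,
          ("P",  ("like",)):     0.5,
          ("NP", ("Det", "N")):  0.4,
          ("Det", ("an",)):      0.5,
          ("N",  ("arrow",)):    0.2,
      }

      # A tree is (label, child, child, ...); a leaf child is a plain string.
      tree = ("S", ("NP", "time"),
                   ("VP", ("VP", "flies"),
                          ("PP", ("P", "like"),
                                 ("NP", ("Det", "an"), ("N", "arrow")))))

      def tree_prob(node):
          label, children = node[0], node[1:]
          rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
          p = rule_prob[(label, rhs)]
          for c in children:
              if not isinstance(c, str):
                  p *= tree_prob(c)
          return p

      print(tree_prob(tree))   # the product of the nine rule probabilities above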

  15. Three basic problems for HMMs
      - Likelihood of the input: forward algorithm
        (how likely is it that the sentence "I love cat" occurs?)
      - Decoding (tagging) the input: Viterbi algorithm
        (what are the POS tags of "I love cat"?)
      - Estimation (learning): find the best model parameters
        - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
        - Case 2: unsupervised (only unannotated text): forward-backward algorithm

  16. Three basic problems for phrase structure trees (PCFGs)
      - Likelihood of the input: inside algorithm
        (how likely is it that the sentence "I love cat" occurs?)
      - Decoding (parsing) the input: CKY algorithm
        (what is the parse tree of "I love cat"?)
      - Estimation (learning): find the best model parameters
        - Case 1: supervised (trees are annotated): maximum likelihood estimation (MLE),
          sketched below
        - Case 2: unsupervised (only unannotated text): inside-outside algorithm
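For the supervised case, MLE just counts: p(X → rhs | X) = count(X → rhs) / count(X). A minimal sketch under an assumed tree encoding (the two-tree treebank below is hypothetical):

      # MLE for PCFG rule probabilities from a hypothetical toy treebank.
      from collections import Counter, defaultdict

      treebank = [
          ("S", ("NP", "I"), ("VP", ("V", "love"), ("NP", "cat"))),
          ("S", ("NP", "I"), ("VP", ("V", "love"), ("NP", "dogs"))),
      ]

      def rules(node):
          # Yield one (lhs, rhs) pair per internal node of the tree.
          label, children = node[0], node[1:]
          rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
          yield (label, rhs)
          for c in children:
              if not isinstance(c, str):
                  yield from rules(c)

      rule_counts = Counter(r for tree in treebank for r in rules(tree))
      lhs_counts = defaultdict(int)
      for (lhs, _), c in rule_counts.items():
          lhs_counts[lhs] += c

      rule_prob = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
      print(rule_prob[("NP", ("cat",))])   # 0.25: "cat" is 1 of the 4 observed NP expansions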


  19. Probabilistic CKY: Inside algorithm
      initialize all entries of chart to 0
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] += prob(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] += prob(R) * chart[Y, start, mid] * chart[Z, mid, end]
      return chart[ROOT, 0, n]
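The inside algorithm in the same Python style; it sums over all parses instead of maximizing, again over a hypothetical toy grammar:

      # Inside algorithm sketch: chart[(X, i, j)] accumulates the total probability of all
      # ways X can derive words[i:j]. Grammar tables are hypothetical, as above.
      from collections import defaultdict

      lex_prob = {("Det", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5, ("V", "saw"): 1.0}
      bin_prob = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0}

      def inside(words, root="S"):
          n = len(words)
          chart = defaultdict(float)
          for i in range(1, n + 1):
              for (X, w), p in lex_prob.items():
                  if w == words[i - 1]:
                      chart[(X, i - 1, i)] += p
          for width in range(2, n + 1):
              for start in range(0, n - width + 1):
                  end = start + width
                  for mid in range(start + 1, end):
                      for (X, Y, Z), p in bin_prob.items():
                          chart[(X, start, end)] += p * chart[(Y, start, mid)] * chart[(Z, mid, end)]
          return chart[(root, 0, n)]

      print(inside("the dog saw the cat".split()))   # total probability of the sentence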

  20. How to build a width-6 phrase?
      Grammar: S → NP VP, NP → Det N, NP → NP PP, VP → V NP, VP → VP PP, PP → P NP
      A constituent over span [1, 7] can be built at any of the split points:
      [1,7] = [1,2] + [2,7], or [1,3] + [3,7], or [1,4] + [4,7], or [1,5] + [5,7], or [1,6] + [6,7]

  21. CKY: Recognition algorithm
      (note the initialization value and the combining operations; they are what change
      across the following variants)
      initialize all entries of chart to false
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] |= in_grammar(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] |= in_grammar(R) & chart[Y, start, mid] & chart[Z, mid, end]
      return chart[ROOT, 0, n]

  22. Weighted CKY: Viterbi algorithm (min-cost)
      initialize all entries of chart to ∞
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] min= weight(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] min= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
      return chart[ROOT, 0, n]

  23. Weighted CKY: Viterbi algorithm (max-prob)
      initialize all entries of chart to 0
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] max= weight(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] max= weight(R) * chart[Y, start, mid] * chart[Z, mid, end]
      return chart[ROOT, 0, n]

  24. Weighted CKY: Viterbi algorithm (max-logprob)
      initialize all entries of chart to -∞
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] max= weight(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] max= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
      return chart[ROOT, 0, n]

  25. Probabilistic CKY: Inside algorithm
      initialize all entries of chart to 0
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] += prob(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] += prob(R) * chart[Y, start, mid] * chart[Z, mid, end]
      return chart[ROOT, 0, n]

  26. Semiring-weighted CKY: the general algorithm!
      ⊗ is like "and" / ∀: it combines all of the several pieces needed to build an X.
      ⊕ is like "or" / ∃: it considers the alternative ways to build the X.
      initialize all entries of chart to the semiring zero
      for i := 1 to n
        for each rule R of the form X → word[i]
          chart[X, i-1, i] ⊕= semiring_weight(R)
      for width := 2 to n
        for start := 0 to n-width
          define end := start + width
          for mid := start+1 to end-1
            for each rule R of the form X → Y Z
              chart[X, start, end] ⊕= semiring_weight(R) ⊗ chart[Y, start, mid] ⊗ chart[Z, mid, end]
      return chart[ROOT, 0, n]
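A sketch of this general version in Python, parameterized by the semiring's zero, ⊕, and ⊗; plugging in (False, or, and), (0, +, *), or (-∞, max, +) recovers the recognition, inside, and Viterbi variants above. The grammar representation is the same hypothetical one used earlier.

      # Semiring-weighted CKY sketch: one chart loop, different (zero, plus, times).
      from collections import defaultdict

      def semiring_cky(words, lex_weight, bin_weight, zero, plus, times, root="S"):
          n = len(words)
          chart = defaultdict(lambda: zero)
          for i in range(1, n + 1):
              for (X, w), v in lex_weight.items():
                  if w == words[i - 1]:
                      chart[(X, i - 1, i)] = plus(chart[(X, i - 1, i)], v)
          for width in range(2, n + 1):
              for start in range(0, n - width + 1):
                  end = start + width
                  for mid in range(start + 1, end):
                      for (X, Y, Z), v in bin_weight.items():
                          piece = times(times(v, chart[(Y, start, mid)]), chart[(Z, mid, end)])
                          chart[(X, start, end)] = plus(chart[(X, start, end)], piece)
          return chart[(root, 0, n)]

      # Hypothetical toy grammar weights (plain probabilities here).
      lex = {("Det", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5, ("V", "saw"): 1.0}
      rules = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0}
      sent = "the dog saw the cat".split()

      # Inside (sum-product) semiring: total probability of the sentence.
      print(semiring_cky(sent, lex, rules, 0.0, lambda a, b: a + b, lambda a, b: a * b))
      # Viterbi (max-product) semiring: probability of the best parse.
      print(semiring_cky(sent, lex, rules, 0.0, max, lambda a, b: a * b))
      # Boolean semiring: does any parse exist at all?
      bool_lex = {k: True for k in lex}
      bool_rules = {k: True for k in rules}
      print(semiring_cky(sent, bool_lex, bool_rules, False, lambda a, b: a or b, lambda a, b: a and b))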
