4/23/09

Part-of-Speech Tagging & Parsing
Chloé Kiddon
(slides adapted and/or stolen outright from Andrew McCallum, Christopher Manning, and Julia Hockenmaier)

Announcements
• We do have a Hadoop cluster!
  ▫ It’s offsite. I need to know all groups who want it!
• You all have accounts for MySQL on the cubist machine (cubist.cs.washington.edu)
  ▫ Your folder is /projects/instr/cse454/a-f
• I’ll have a better email out this afternoon, I hope
• Grading of HW1 should be finished by next week.

Timely warning
• POS tagging and parsing are two large topics in NLP
• Usually covered in 2-4 lectures
• We have an hour and twenty minutes.

Part-of-speech tagging
• Often we want to know what part of speech (POS) or word class (noun, verb, …) should be assigned to each word in a piece of text
• Part-of-speech tagging assigns POS labels to words:

    JJ        JJ    NNS   VBP   RB
    Colorless green ideas sleep furiously.


Why do we care?
• Parsing (come to later)
• Speech synthesis
  ▫ INsult or inSULT, overFLOW or OVERflow, REad or reAD
• Information extraction: entities, relations
  ▫ Romeo loves Juliet vs. lost loves found again
• Machine translation

Penn Treebank Tagset
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NP Proper noun, singular
15. NPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PP Personal pronoun
19. PP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb

Ambiguity
• How many words are ambiguous?
• Buffalo buffalo buffalo.
Hockenmaier


Naïve approach!
• Pick the most common tag for the word
• 91% success rate!
Andrew McCallum

We have more information
• We are not just tagging words, we are tagging sequences of words
• For a sequence of words W: W = w1 w2 w3 … wn
• We are looking for the sequence of tags T: T = t1 t2 t3 … tn for which P(T | W) is maximized
Andrew McCallum

In an ideal world…
• Find all instances of a sequence in the dataset and pick the most common sequence of tags
  ▫ Count(“heat oil in a large pot”) = 0 ????
  ▫ Uhh…
• Sparse data problem
• Most sequences will never occur, or will occur too few times for good predictions
Andrew McCallum

Bayes’ Rule
• To find P(T|W), use Bayes’ Rule:
    P(T|W) = P(W|T) × P(T) / P(W)
    P(T|W) ∝ P(W|T) × P(T)
• We can maximize P(T|W) by maximizing P(W|T) × P(T)
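The naïve most-common-tag baseline above can be sketched in a few lines (a minimal sketch; the toy training sentences and the NN default for unknown words are invented for illustration):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, word_to_tag, default="NN"):
    """Tag each word with its most common training tag; unknowns get the default."""
    return [word_to_tag.get(w, default) for w in words]

# Toy training data (invented): "race" is NN twice, VBP once, so NN wins.
train = [[("the", "DT"), ("race", "NN"), ("ended", "VBD")],
         [("they", "PP"), ("race", "VBP"), ("cars", "NNS")],
         [("the", "DT"), ("race", "NN"), ("began", "VBD")]]
model = train_baseline(train)
print(tag_baseline(["the", "race", "ended"], model))  # ['DT', 'NN', 'VBD']
```

Because every occurrence of a word gets the same tag regardless of context, this is exactly the kind of tagger the ~91% figure refers to.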


Finding P(T)
• Generally,
    P(t1 t2 … tn) = P(t1) × P(t2 … tn | t1)
                  = P(t1) × P(t2 | t1) × P(t3 … tn | t1 t2)
                  = ∏i P(ti | t1 t2 … ti−1)
• Usually not feasible to accurately estimate more than tag bigrams (possibly trigrams)

Markov assumption
• Assume that the probability of a tag only depends on the tag that came directly before it:
    P(ti | t1 t2 … ti−1) = P(ti | ti−1)
• Then,
    P(t1 t2 … tn) = P(t1) × P(t2 | t1) × P(t3 | t2) × … × P(tn | tn−1) = ∏i P(ti | ti−1)
• Only need to count tag bigrams.

Putting it all together
• We can similarly assume
    P(wi | t1 … tn) = P(wi | ti)
• So:
    P(w1 … wn | t1 … tn) = P(w1 | t1) × P(w2 | t2) × … × P(wn | tn)
• And the final equation becomes:
    P(W | T) × P(T) = P(w1 | t1) × P(w2 | t2) × … × P(wn | tn) × P(t1) × P(t2 | t1) × P(t3 | t2) × … × P(tn | tn−1)

Process as an HMM
• Start in an initial state t0 with probability π(t0)
• Move from state ti to tj with transition probability a(tj | ti)
• In state ti, emit symbol wk with emission probability b(wk | ti)
• (Figure: a four-state example with states Det, Adj, Noun, Verb, transition arcs between them, and emission tables such as P(a|Det) = .4, P(the|Det) = .4, P(good|Adj) = .02, P(low|Adj) = .04, P(price|Noun) = .001, P(deal|Noun) = .0001)
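Under the two independence assumptions above, the joint score P(W|T) × P(T) factors into one transition term and one emission term per position. A minimal sketch in log space (the small probability tables are invented for illustration, loosely echoing the Det/Noun figure; the 1e-6 floor for unseen words is an assumption, not part of the model):

```python
import math

def log_joint(words, tags, pi, trans, emit, unk=1e-6):
    """log[P(W|T) * P(T)] = log pi(t1) + sum_i log P(ti|ti-1) + sum_i log P(wi|ti)."""
    score = math.log(pi[tags[0]]) + math.log(emit[tags[0]].get(words[0], unk))
    for i in range(1, len(words)):
        score += math.log(trans[tags[i - 1]][tags[i]])   # tag bigram term
        score += math.log(emit[tags[i]].get(words[i], unk))  # emission term
    return score

# Tiny invented model over two tags
pi = {"Det": 0.8, "Noun": 0.2}
trans = {"Det": {"Det": 0.1, "Noun": 0.9}, "Noun": {"Det": 0.5, "Noun": 0.5}}
emit = {"Det": {"the": 0.4, "a": 0.4}, "Noun": {"deal": 0.001, "price": 0.001}}
print(log_joint(["the", "deal"], ["Det", "Noun"], pi, trans, emit))
```

Working in log space avoids underflow: the product of many small probabilities becomes a sum of manageable negative numbers.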


Three Questions for HMMs
1. Evaluation – Given a sequence of words W = w1 w2 w3 … wn and an HMM model Θ, what is P(W | Θ)?
2. Tagging (decoding) – Given a sequence of words W and an HMM model Θ, find the most probable parse T = t1 t2 t3 … tn
3. Learning – Given a tagged (or untagged) dataset, find the HMM Θ that maximizes the probability of the data

Tagging
• Need to find the most likely tag sequence given a sequence of words
  ▫ maximizes P(W|T) × P(T) and thus P(T|W)
• Use Viterbi!

Trellis
• (Figure: a grid with tags t1 … tN on the vertical axis and time steps on the horizontal axis)
• Evaluation task: cell (tj, i) holds P(w1, w2, …, wi), tags given, in tj at time i
• Decoding task: cell (tj, i) holds max P(w1, w2, …, wi) in tj at time i


Tagging initialization
• For the first word, fill each trellis cell directly:
    trellis[w1][tj] = log P(w1 | tj) + log P(tj)

Tagging recursive step
• For each later word, take the best predecessor tag, then add the emission term:
    trellis[w2][tj] = max_k [ log P(tj | tk) + trellis[w1][tk] ] + log P(w2 | tj)
• Record a back pointer to the predecessor that achieved the max:
    backptr[w2][tj] = argmax_k [ log P(tj | tk) + trellis[w1][tk] ]
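The initialization and recursive step above assemble into a complete log-space Viterbi decoder. A minimal sketch (the model tables below are the same style of invented toy numbers as the lecture's Det/Noun figure, not real Treebank estimates; the 1e-6 floor for unseen words is an assumption):

```python
import math

def viterbi(words, tags, pi, trans, emit, unk=1e-6):
    """Find argmax_T log[P(W|T) * P(T)] with the Viterbi trellis."""
    def e(t, w):  # emission log-prob, with a floor for unseen words
        return math.log(emit[t].get(w, unk))

    # Initialization: trellis[0][t] = log P(w1|t) + log P(t)
    trellis = [{t: math.log(pi[t]) + e(t, words[0]) for t in tags}]
    back = [{}]
    # Recursion: best predecessor plus emission, keeping a back pointer
    for i in range(1, len(words)):
        row, ptr = {}, {}
        for t in tags:
            best_k = max(tags, key=lambda k: trellis[i - 1][k] + math.log(trans[k][t]))
            row[t] = trellis[i - 1][best_k] + math.log(trans[best_k][t]) + e(t, words[i])
            ptr[t] = best_k
        trellis.append(row)
        back.append(ptr)
    # Pick the best cell for the last word, then follow back pointers
    t = max(tags, key=lambda t: trellis[-1][t])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]

pi = {"Det": 0.8, "Noun": 0.2}
trans = {"Det": {"Det": 0.1, "Noun": 0.9}, "Noun": {"Det": 0.5, "Noun": 0.5}}
emit = {"Det": {"the": 0.4}, "Noun": {"deal": 0.001}}
print(viterbi(["the", "deal"], ["Det", "Noun"], pi, trans, emit))  # ['Det', 'Noun']
```

The trellis is O(n × N) cells and each cell takes O(N) work, so decoding is O(n × N²) rather than the O(Nⁿ) of enumerating all tag sequences.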


Pick best trellis cell for last word
• (Figure: the highest-scoring cell in the final column of the trellis is selected)

Use back pointers to pick best sequence
• (Figure: following the back pointers from that best final cell recovers the full tag sequence)

Learning a POS-tagging HMM
• Estimate the parameters in the model using counts:
    P(ti | ti−1) → Count(ti−1 ti) / Count(ti−1)
    P(wi | ti) → Count(wi tagged ti) / Count(all words tagged ti)
• With smoothing, this model can get 95-96% correct tagging

Problem with supervised learning
• Requires a large hand-labeled corpus
  ▫ Expensive to produce
  ▫ Doesn’t scale to new languages
  ▫ Doesn’t scale to new domains
• Instead, apply unsupervised learning with Expectation Maximization (EM)
  ▫ Expectation step: calculate the probability of all sequences using the current set of parameters
  ▫ Maximization step: re-estimate the parameters using the results from the E-step
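The count-based estimates above translate directly into code. A minimal sketch, without the smoothing the slide mentions (the two-sentence training corpus is invented for illustration):

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """MLE estimates: P(ti|ti-1) = C(ti-1 ti)/C(ti-1), P(w|t) = C(w tagged t)/C(t)."""
    tag_bigrams, tag_counts, emissions = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        tags = [t for _, t in sent]
        for word, tag in sent:
            tag_counts[tag] += 1
            emissions[(word, tag)] += 1
        for prev, cur in zip(tags, tags[1:]):
            tag_bigrams[(prev, cur)] += 1
    trans = lambda prev, cur: tag_bigrams[(prev, cur)] / tag_counts[prev]
    emit = lambda word, tag: emissions[(word, tag)] / tag_counts[tag]
    return trans, emit

# Invented toy corpus: DT is always followed by NN
train = [[("the", "DT"), ("deal", "NN")], [("the", "DT"), ("price", "NN")]]
trans, emit = estimate_hmm(train)
print(trans("DT", "NN"))   # 1.0
print(emit("deal", "NN"))  # 0.5
```

In practice these raw ratios assign probability zero to any unseen bigram or (word, tag) pair, which is exactly why the slide's 95-96% figure depends on smoothing.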


Lots of other techniques!
• Trigram models (more common)
• Text normalization
• Error-based transformation learning (“Brill learning”)
  ▫ Rule-based system
    - Calculate initial states: proper noun detection, tagged corpus
    - Acquire transformation rules, e.g. change VB to NN when the previous word was an adjective:
        “The long race finally ended”
• Minimally supervised learning
  ▫ Unlabeled data, but have a dictionary

Seems like POS-tagging is solved
• Penn Treebank POS-tagging accuracy ≈ human ceiling
  ▫ Human agreement is 97%
• In other languages, not so much

So now we are HMM Masters
• We can use HMMs to…
  ▫ Tag words in a sentence with their parts of speech
  ▫ Extract entities and other information from a sentence
• Can we use them to determine syntactic categories?

Syntax
• Refers to the study of the way words are arranged together, and the relationships between them.
• Prescriptive vs. Descriptive
• The goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language
• Parsing extracts the syntax from a sentence
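A single Brill-style transformation rule of the kind described above ("change VB to NN when the previous word was an adjective") can be sketched as follows (a minimal sketch; the rule representation and the initially mistagged sentence are invented for illustration, and real Brill taggers learn many such rules from tagging errors):

```python
def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Rewrite from_tag -> to_tag wherever the previous word carries prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

# "The long race finally ended", with "race" initially mistagged as VB
sentence = [("The", "DT"), ("long", "JJ"), ("race", "VB"),
            ("finally", "RB"), ("ended", "VBD")]
print(apply_rule(sentence, "VB", "NN", "JJ"))
# [('The', 'DT'), ('long', 'JJ'), ('race', 'NN'), ('finally', 'RB'), ('ended', 'VBD')]
```

Each learned rule is applied in sequence to the whole corpus, so the tagger's behavior stays transparent: every correction can be traced to one human-readable rule.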


Parsing applications
• High-precision Question-Answering systems
• Named Entity Recognition (NER) and information extraction
• Opinion extraction in product reviews
• Improved interaction during computer applications/games
Hockenmaier

Basic English sentence structure
• (Figure: “Ike eats cake” — Noun (subject), Verb (head), Noun (object))
Hockenmaier

Can we build an HMM?
• (Figure: states Noun (subject) → Verb (head) → Noun (object), emitting word sets {Ike, dogs, …}, {eat, sleep, …}, {cake, science, …})
Hockenmaier

Words take arguments
• I eat cake. ✓
• I sleep cake. ✗
• I give you cake. ✓
• I give cake. Hmm…
• I eat you cake??? ✗
• Subcategorization
  ▫ Intransitive verbs: take only a subject
  ▫ Transitive verbs: take a subject and an object
  ▫ Ditransitive verbs: take a subject, an object, and an indirect object
• Selectional preferences
  ▫ The object of eat should be edible
Hockenmaier


