Part-of-Speech Tagging & Parsing

4/23/09

Announcements
• We do have a Hadoop cluster!
  ▫ It’s offsite. I need to know all groups who want it!
• You all have accounts for MySQL on the cubist machine (cubist.cs.washington.edu)
  ▫ Your folder is /projects/instr/cse454/a-f
• I’ll have a better email out this afternoon, I hope
• Grading HW1 should be finished by next week.

Part-of-Speech Tagging & Parsing
Chloé Kiddon
(slides adapted and/or stolen outright from Andrew McCallum, Christopher Manning, and Julia Hockenmaier)

Timely warning
• POS tagging and parsing are two large topics in NLP
• Usually covered in 2-4 lectures
• We have an hour and twenty minutes.

Part-of-speech tagging
• Often want to know what part of speech (POS) or word class (noun, verb, …) should be assigned to words in a piece of text
• Part-of-speech tagging assigns POS labels to words:
  Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB.

Why do we care?
• Parsing (come to later)
• Speech synthesis
  ▫ INsult or inSULT, overFLOW or OVERflow, REad or reAD
• Information extraction: entities, relations
  ▫ Romeo loves Juliet vs. lost loves found again
• Machine translation

Penn Treebank Tagset
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NP Proper noun, singular
15. NPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PP Personal pronoun
19. PP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb

Ambiguity
• How many words are ambiguous?
  Buffalo buffalo buffalo.
(Hockenmaier)
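The question "How many words are ambiguous?" can be made concrete by counting the word types that appear with more than one tag in a tagged corpus. The sketch below assumes a tiny hypothetical corpus built from the "Romeo loves Juliet vs. lost loves found again" example, tagged with labels from the Penn Treebank list; lowercasing words before counting is a choice made here, not something the slides specify.

```python
from collections import defaultdict


def ambiguous_types(tagged_sentences):
    """Return the set of (lowercased) word types seen with more than one tag."""
    tags_for = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tags_for[word.lower()].add(tag)
    return {word for word, tags in tags_for.items() if len(tags) > 1}


# Hypothetical hand-tagged toy corpus (tags from the Penn Treebank list).
corpus = [
    [("Romeo", "NP"), ("loves", "VBZ"), ("Juliet", "NP")],
    [("lost", "JJ"), ("loves", "NNS"), ("found", "VBN"), ("again", "RB")],
]
print(sorted(ambiguous_types(corpus)))  # ['loves']
```

On a real treebank the same loop shows that most word types are unambiguous, while many frequent word tokens are ambiguous, which is why context matters for tagging.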

Naïve approach!
• Pick the most common tag for the word
• 91% success rate!
(Andrew McCallum)

We have more information
• We are not just tagging words, we are tagging sequences of words
  For a sequence of words W: $W = w_1 w_2 w_3 \ldots w_n$
  We are looking for a sequence of tags T: $T = t_1 t_2 t_3 \ldots t_n$, where $P(T \mid W)$ is maximized
(Andrew McCallum)

In an ideal world…
• Find all instances of a sequence in the dataset and pick the most common sequence of tags
  ▫ Count(“heat oil in a large pot”) = 0 ????
  ▫ Uhh…
• Sparse data problem
• Most sequences will never occur, or will occur too few times for good predictions

Bayes’ Rule
• To find $P(T \mid W)$, use Bayes’ Rule:
$$P(T \mid W) = \frac{P(W \mid T) \times P(T)}{P(W)}$$
$$P(T \mid W) \propto P(W \mid T) \times P(T)$$
• Since $P(W)$ is the same for every candidate tag sequence, we can maximize $P(T \mid W)$ by maximizing $P(W \mid T) \times P(T)$
(Andrew McCallum)
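The naïve approach (pick the most common tag for each word, ignoring its neighbors) fits in a few lines. This is a minimal sketch assuming a hypothetical three-sentence training set; the fallback tag "NN" for words never seen in training is an assumption, since the slide does not say how unknown words are handled.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to the tag it received most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_baseline(words, most_common_tag, default="NN"):
    """Tag each word independently; unseen words get a default tag (an assumption)."""
    return [most_common_tag.get(w, default) for w in words]


# Hypothetical training data: "race" is a noun twice and a verb once.
train = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("they", "PP"), ("race", "VB"), ("home", "RB")],
    [("a", "DT"), ("race", "NN"), ("began", "VBD")],
]
model = train_baseline(train)
print(tag_baseline(["the", "race", "ended", "quickly"], model))
# ['DT', 'NN', 'VBD', 'NN']
```

Because each word is tagged in isolation, "they race home" would also get race/NN here; using the surrounding tags is exactly what the sequence model on the following slides adds.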

Finding P(T)
• Generally, by the chain rule,
$$P(t_1 t_2 \ldots t_n) = P(t_1) \times P(t_2 \ldots t_n \mid t_1)$$
$$P(t_1 t_2 \ldots t_n) = P(t_1) \times P(t_2 \mid t_1) \times P(t_3 \ldots t_n \mid t_1 t_2)$$
$$P(t_1 t_2 \ldots t_n) = \prod_i P(t_i \mid t_1 t_2 \ldots t_{i-1})$$
• Usually not feasible to accurately estimate more than tag bigrams (possibly trigrams)

Markov assumption
• Assume that the probability of a tag only depends on the tag that came directly before it:
$$P(t_i \mid t_1 t_2 \ldots t_{i-1}) = P(t_i \mid t_{i-1})$$
• Then (reading $P(t_1 \mid t_0)$ as $P(t_1)$),
$$P(t_1 t_2 \ldots t_n) = P(t_1) \times P(t_2 \mid t_1) \times P(t_3 \mid t_2) \times \cdots \times P(t_n \mid t_{n-1}) = \prod_i P(t_i \mid t_{i-1})$$
• Only need to count tag bigrams.

Putting it all together
• We can similarly assume that each word depends only on its own tag:
$$P(w_i \mid t_1 \ldots t_n) = P(w_i \mid t_i)$$
• So:
$$P(w_1 \ldots w_n \mid t_1 \ldots t_n) = P(w_1 \mid t_1) \times P(w_2 \mid t_2) \times \cdots \times P(w_n \mid t_n)$$
• And the final equation becomes:
$$P(W \mid T) \times P(T) = P(w_1 \mid t_1) \times P(w_2 \mid t_2) \times \cdots \times P(w_n \mid t_n) \times P(t_1) \times P(t_2 \mid t_1) \times P(t_3 \mid t_2) \times \cdots \times P(t_n \mid t_{n-1})$$

Process as an HMM
• Start in an initial state $t_0$ with probability $\pi(t_0)$
• Move from state $t_i$ to $t_j$ with transition probability $a(t_j \mid t_i)$
• In state $t_i$, emit symbol $w_k$ with emission probability $b(w_k \mid t_i)$
[Figure: HMM with states Det, Adj, Noun, Verb connected by transition arcs; example emission probabilities:]
  P(w|Det): a .4, the .4
  P(w|Adj): good .02, low .04
  P(w|Noun): price .001, deal .0001
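The final equation above can be evaluated directly for one tagged sentence: one emission term per word and one transition term per tag bigram. In this sketch the emission values are the example numbers from the HMM slide, but the start and transition probabilities are hypothetical, since the figure's arcs cannot be matched to their numbers.

```python
# Emission probabilities b(w | t): the example values from the slide.
EMIT = {
    "Det": {"a": 0.4, "the": 0.4},
    "Adj": {"good": 0.02, "low": 0.04},
    "Noun": {"price": 0.001, "deal": 0.0001},
}
# Start and transition probabilities: HYPOTHETICAL values, not read off the slide.
START = {"Det": 0.8, "Adj": 0.1, "Noun": 0.1}
TRANS = {
    "Det": {"Adj": 0.3, "Noun": 0.7},
    "Adj": {"Adj": 0.1, "Noun": 0.9},
    "Noun": {"Noun": 0.5, "Verb": 0.5},
}


def joint_prob(words, tags):
    """P(W|T) * P(T) = P(t_1) P(w_1|t_1) * prod_{i>1} P(t_i|t_{i-1}) P(w_i|t_i).

    Missing entries are treated as probability 0.
    """
    p = START.get(tags[0], 0.0) * EMIT.get(tags[0], {}).get(words[0], 0.0)
    for i in range(1, len(words)):
        p *= TRANS.get(tags[i - 1], {}).get(tags[i], 0.0)  # transition term
        p *= EMIT.get(tags[i], {}).get(words[i], 0.0)      # emission term
    return p


print(joint_prob(["the", "low", "price"], ["Det", "Adj", "Noun"]))
print(joint_prob(["the", "low", "price"], ["Det", "Noun", "Noun"]))  # 0.0: Noun never emits "low"
```

With these made-up numbers the first tagging scores 0.8 × 0.4 × 0.3 × 0.04 × 0.9 × 0.001 ≈ 3.456e-6; real implementations usually add log probabilities instead, since long products underflow.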

Three Questions for HMMs
1. Evaluation – Given a sequence of words $W = w_1 w_2 w_3 \ldots w_n$ and an HMM model $\Theta$, what is $P(W \mid \Theta)$?
2. Decoding (or Tagging) – Given a sequence of words $W$ and an HMM model $\Theta$, find the most probable parse $T = t_1 t_2 t_3 \ldots t_n$
3. Learning – Given a tagged (or untagged) dataset, find the HMM $\Theta$ that maximizes the probability of the data

Tagging
• Need to find the most likely tag sequence given a sequence of words
  ▫ i.e., the sequence that maximizes $P(W \mid T) \times P(T)$ and thus $P(T \mid W)$
• Use Viterbi!

Trellis
[Figure: trellis with tags $t_1 \ldots t_j \ldots t_N$ on one axis and time steps on the other]
• Evaluation task: cell $(t_j, i)$ holds $P(w_1, w_2, \ldots, w_i, \text{in } t_j \text{ at time } i)$, summed over all tag paths into the cell
• Decoding task: cell $(t_j, i)$ holds $\max P(w_1, w_2, \ldots, w_i, \text{in } t_j \text{ at time } i)$, the score of the best single tag path into the cell
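The evaluation task fills the trellis with sums instead of maxima; this is usually called the forward algorithm. The sketch below assumes a hypothetical two-tag model (N, V) with made-up numbers so the result can be checked by hand.

```python
def forward(words, states, start, trans, emit):
    """Evaluation task: return P(W | model) by summing over every tag sequence.

    alpha[t] holds P(w_1 .. w_i, tag t at time i) for the current time step i.
    """
    alpha = {t: start.get(t, 0.0) * emit[t].get(words[0], 0.0) for t in states}
    for word in words[1:]:
        # New column: sum over all previous tags k, then emit the current word.
        alpha = {
            t: sum(alpha[k] * trans[k].get(t, 0.0) for k in states) * emit[t].get(word, 0.0)
            for t in states
        }
    return sum(alpha.values())


# Hypothetical two-tag model (N = noun, V = verb); the numbers are made up.
STATES = ["N", "V"]
START = {"N": 0.6, "V": 0.4}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"fish": 0.5, "sleep": 0.1}, "V": {"fish": 0.3, "sleep": 0.6}}

print(forward(["fish", "sleep"], STATES, START, TRANS, EMIT))  # approximately 0.159
```

As a check, the four tag paths NN, NV, VN, VV have probabilities 0.009, 0.126, 0.0096 and 0.0144, which sum to 0.159; the trellis gets the same total in O(n·N²) time instead of enumerating all Nⁿ paths.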

Tagging initialization
[Figure: trellis, first time step]
$$\text{trellis}[w_1][t_j] = \log P(w_1 \mid t_j) + \log P(t_j)$$

Tagging recursive step
[Figure: trellis, filling in the second time step]
$$\text{trellis}[w_2][t_j] = \max_k \left[ \log P(t_j \mid t_k) + \text{trellis}[w_1][t_k] \right] + \log P(w_2 \mid t_j)$$
• Record which previous tag achieved the maximum (the back pointer):
$$\text{backpointer}[w_2][t_j] = \arg\max_k \left[ \log P(t_j \mid t_k) + \text{trellis}[w_1][t_k] \right]$$
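The initialization and recursive step above, plus the back-pointer walk from the best cell in the last column, can be sketched as a small log-space Viterbi decoder. It reuses the hypothetical two-tag toy model (N, V) with made-up numbers; the dictionary layout and the function names are illustrative choices, not from the slides.

```python
import math


def _log(p):
    # log(0) is undefined; -inf makes impossible paths lose every max.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, states, start, trans, emit):
    """Decoding task: return (best tag sequence, its log probability)."""
    # Initialization: trellis[0][t_j] = log P(w_1 | t_j) + log P(t_j)
    trellis = [{tj: _log(emit[tj].get(words[0], 0.0)) + _log(start.get(tj, 0.0)) for tj in states}]
    backptr = [{}]
    for i in range(1, len(words)):
        column, pointers = {}, {}
        for tj in states:
            # Recursive step: score of reaching t_j from each previous tag t_k.
            scores = {tk: _log(trans[tk].get(tj, 0.0)) + trellis[i - 1][tk] for tk in states}
            best_k = max(scores, key=scores.get)  # argmax_k, stored as the back pointer
            column[tj] = scores[best_k] + _log(emit[tj].get(words[i], 0.0))
            pointers[tj] = best_k
        trellis.append(column)
        backptr.append(pointers)
    # Pick the best cell for the last word, then follow back pointers to the start.
    last = max(trellis[-1], key=trellis[-1].get)
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(backptr[i][tags[-1]])
    return tags[::-1], trellis[-1][last]


# Hypothetical two-tag model (N = noun, V = verb); the numbers are made up.
STATES = ["N", "V"]
START = {"N": 0.6, "V": 0.4}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"fish": 0.5, "sleep": 0.1}, "V": {"fish": 0.3, "sleep": 0.6}}

tags, logp = viterbi(["fish", "sleep"], STATES, START, TRANS, EMIT)
print(tags, math.exp(logp))  # ['N', 'V'] and roughly 0.126
```

Working in log space turns the products into sums, which avoids floating-point underflow on long sentences without changing which path wins.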

Pick best trellis cell for last word, then use back pointers to pick best sequence
[Figure: trellis with tags $t_1 \ldots t_j \ldots t_N$ against time steps, tracing back pointers from the best final cell]

Learning a POS-tagging HMM
• Estimate the parameters in the model using counts:
$$P(t_i \mid t_{i-1}) \approx \frac{\text{Count}(t_{i-1} t_i)}{\text{Count}(t_{i-1})}$$
$$P(w_i \mid t_i) \approx \frac{\text{Count}(w_i \text{ tagged } t_i)}{\text{Count}(\text{all words tagged } t_i)}$$
• With smoothing, this model can get 95-96% correct tagging

Problem with supervised learning
• Requires a large hand-labeled corpus
  ▫ Doesn’t scale to new languages
  ▫ Expensive to produce
  ▫ Doesn’t scale to new domains
• Instead, apply unsupervised learning with Expectation Maximization (EM)
  ▫ Expectation step: calculate the probability of all tag sequences using the current set of parameters
  ▫ Maximization step: re-estimate the parameters using the results from the E-step
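The two count ratios on the "Learning a POS-tagging HMM" slide are relative-frequency estimates. This is a minimal sketch without smoothing, on a hypothetical two-sentence corpus; the start symbol "<s>" (so that P(t_1) becomes P(t_1 | <s>)) is an assumption, and here Count(t_{i-1}) counts how often a tag precedes another tag, so each transition row sums to 1.

```python
from collections import Counter


def estimate_hmm(tagged_sentences):
    """Relative-frequency (maximum-likelihood) estimates of an HMM tagger, no smoothing."""
    bigram, prev_count = Counter(), Counter()
    emit_count, tag_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical start symbol
        for word, tag in sentence:
            bigram[(prev, tag)] += 1     # Count(t_{i-1} t_i)
            prev_count[prev] += 1        # Count(t_{i-1}) as a predecessor
            emit_count[(word, tag)] += 1  # Count(w_i tagged t_i)
            tag_count[tag] += 1           # Count(all words tagged t_i)
            prev = tag
    trans = {(p, t): c / prev_count[p] for (p, t), c in bigram.items()}
    emit = {(w, t): c / tag_count[t] for (w, t), c in emit_count.items()}
    return trans, emit


# Hypothetical hand-tagged corpus.
CORPUS = [
    [("the", "Det"), ("low", "Adj"), ("price", "Noun")],
    [("a", "Det"), ("deal", "Noun")],
]
trans, emit = estimate_hmm(CORPUS)
print(trans[("Det", "Adj")], emit[("price", "Noun")])  # 0.5 0.5
```

Any word or tag bigram absent from training gets probability 0 here, which is why the slide's 95-96% figure is quoted "with smoothing".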

Lots of other techniques!
• Trigram models (more common)
• Text normalization
• Error-based transformation learning (“Brill learning”)
  ▫ Rule-based system
    - Calculate initial states: proper noun detection, tagged corpus
    - Acquire transformation rules
    - e.g., change VB to NN when the previous word was an adjective: “The long race finally ended”
• Minimally supervised learning
  ▫ Unlabeled data but have a dictionary

Seems like POS-tagging is solved
• Penn Treebank POS-tagging accuracy ≈ human ceiling
  ▫ Human agreement 97%
• In other languages, not so much

So now we are HMM Masters
• We can use HMMs to…
  ▫ Tag words in a sentence with their parts of speech
  ▫ Extract entities and other information from a sentence
• Can we use them to determine syntactic categories?

Syntax
• Refers to the study of the way words are arranged together, and the relationship between them.
• Prescriptive vs. Descriptive
• Goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language
• Parsing extracts the syntax from a sentence
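Applying one learned transformation, such as "change VB to NN when the previous word was an adjective", is a single left-to-right pass over an already tagged sentence. The sketch below assumes a hypothetical initial tagging in which "race" was mis-tagged as a verb; it only applies a given rule and does not attempt the rule-learning part of Brill's method.

```python
def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Apply one transformation rule left to right: retag from_tag as to_tag
    whenever the previous word currently carries prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out


# Hypothetical initial tagging in which "race" was mis-tagged as a verb.
initial = [("The", "DT"), ("long", "JJ"), ("race", "VB"), ("finally", "RB"), ("ended", "VBD")]
fixed = apply_rule(initial, from_tag="VB", to_tag="NN", prev_tag="JJ")
print(fixed)
# [('The', 'DT'), ('long', 'JJ'), ('race', 'NN'), ('finally', 'RB'), ('ended', 'VBD')]
```

The condition here looks at the previous word's current tag, so earlier rewrites in the same pass can trigger later ones; whether to read the original or the updated tags is a design choice a full implementation has to fix.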

Parsing applications
• High-precision Question-Answering systems
• Named Entity Recognition (NER) and information extraction
• Opinion extraction in product reviews
• Improved interaction during computer applications/games

Basic English sentence structure
  Ike: Noun (subject)
  eats: Verb (head)
  cake: Noun (object)
(Hockenmaier)

Can we build an HMM?
  Noun (subject) → Verb (head) → Noun (object)
  Ike, dogs, … → eat, sleep, … → cake, science, …
(Hockenmaier)

Words take arguments
  I eat cake.
  I sleep cake.
  I give you cake.
  I give cake. Hmm…
  I eat you cake???
• Subcategorization
  ▫ Intransitive verbs: take only a subject
  ▫ Transitive verbs: take a subject and an object
  ▫ Ditransitive verbs: take a subject, object, and indirect object
• Selectional preferences
  ▫ The object of eat should be edible
(Hockenmaier)
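A Noun → Verb → Noun HMM only sees tag transitions, so nothing in it distinguishes "I sleep cake" from "I eat cake". One way to capture subcategorization is to store each verb's argument frame in the lexicon. The toy check below assumes a hypothetical three-verb lexicon, one-word arguments, and a strict reading of the three verb classes, so it rejects "I give cake" even though the slide leaves that case open ("Hmm…").

```python
# Hypothetical toy lexicon: number of objects each verb takes after it,
# using a strict reading of the three subcategorization classes.
FRAMES = {
    "sleep": 0,  # intransitive: subject only
    "eat": 1,    # transitive: subject + object
    "give": 2,   # ditransitive: subject + indirect object + object
}


def fits_frame(sentence):
    """Assumes toy sentences of the form: subject verb [object ...], one word each."""
    words = sentence.rstrip(".?!").split()
    verb, objects = words[1], words[2:]
    return len(objects) == FRAMES[verb]


for s in ["I eat cake.", "I sleep cake.", "I give you cake.", "I give cake.", "I eat you cake???"]:
    print(s, fits_frame(s))
```

Real verbs often allow several frames (for example, "I eat" is fine), so a practical lexicon would map each verb to a set of frames; selectional preferences such as "the object of eat should be edible" would need semantic information on top of this.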
