Part-of-Speech Tagging


SLIDE 1

Part-of-Speech Tagging

Berlin Chen 2003

References:
1. Speech and Language Processing, Chapter 8
2. Foundations of Statistical Natural Language Processing, Chapter 10
SLIDE 2

Review

  • Tagging (part-of-speech tagging)
    – The process of assigning (labeling) a part-of-speech or other lexical class marker to each word in a sentence (or a corpus)
      • Decide whether each word is a noun, verb, adjective, or whatever

The/AT representative/NN put/VBD chairs/NNS on/IN the/AT table/NN

    – An intermediate layer of representation of syntactic structure
      • When compared with syntactic parsing
    – Above 96% accuracy for the most successful approaches

SLIDE 3

Introduction

  • Parts-of-speech
    – Also known as POS, word classes, lexical tags, morphological classes
  • Tag sets
    – Penn Treebank: 45 word classes used (Marcus et al., 1993)
      • Penn Treebank is a parsed corpus
    – Brown corpus: 87 word classes used (Francis, 1979)
    – …

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

SLIDE 4

The Penn Treebank POS Tag Set

SLIDE 5

Disambiguation

  • Resolve the ambiguities and choose the proper tag for the context
  • Most English words are unambiguous (have only one tag), but many of the most common words are ambiguous
    – E.g.: “can” can be an (auxiliary) verb or a noun
    – E.g.: statistics of the Brown corpus
      • 11.5% of word types are ambiguous
      • But 40% of tokens are ambiguous

(However, the probabilities of the tags associated with a word are not equal → many ambiguous tokens are easy to disambiguate)

SLIDE 6

Process of POS Tagging

A String of Words + A Specified Tagset → Tagging Algorithm → A Single Best Tag for Each Word

Book/VB that/DT flight/NN ./.
Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.
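As a quick aside (not on the original slide), this input/output behavior can be reproduced with an off-the-shelf tagger. A minimal sketch, assuming NLTK is installed and its default English tagger model has been downloaded (resource names vary slightly across NLTK versions):

```python
# Minimal demo of the tagging process with NLTK's default English tagger.
# Assumes: pip install nltk, plus nltk.download('averaged_perceptron_tagger')
# (newer NLTK versions name the resource 'averaged_perceptron_tagger_eng').
import nltk

for sentence in ["Book that flight .", "Does that flight serve dinner ?"]:
    tokens = sentence.split()      # the slide's examples are already tokenized
    print(nltk.pos_tag(tokens))    # -> [(word, Penn Treebank tag), ...]
```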

SLIDE 7

POS Tagging Algorithms

  • Fall into one of two classes
  • Rule-based Tagger
    – Involves a large database of hand-written disambiguation rules
      • E.g., a rule specifies that an ambiguous word is a noun rather than a verb if it follows a determiner
      • ENGTWOL: a simple rule-based tagger based on the constraint grammar architecture
  • Stochastic/Probabilistic Tagger
    – Uses a training corpus to compute the probability of a given word having a given tag in a given context
    – E.g.: the HMM tagger chooses the best tag for a given word (maximizing the product of word likelihood and tag sequence probability)

SLIDE 8

POS Tagging Algorithms

  • Transformation-based/Brill Tagger
    – A hybrid approach
    – Like the rule-based approach, it determines the tag of an ambiguous word based on rules
    – Like the stochastic approach, the rules are automatically induced from a previously tagged training corpus with a machine learning technique

SLIDE 9

Rule-based POS Tagging

  • Two-stage architecture
    – First stage: use a dictionary to assign each word a list of potential parts-of-speech
    – Second stage: use large lists of hand-written disambiguation rules to winnow down this list to a single part-of-speech for each word

An example for the ENGTWOL tagger:

Pavlov had shown that salivation …
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG

A set of 1,100 constraints can be applied to the input sentence
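A minimal sketch of the two-stage idea, assuming a toy lexicon and a single hand-written constraint; the real ENGTWOL tagger uses a full lexicon and roughly 1,100 constraints in a much richer formalism:

```python
# Two-stage rule-based tagging sketch. The lexicon and the single constraint
# below are toy stand-ins for ENGTWOL's lexicon and its ~1,100 constraints.
LEXICON = {"the": {"DT"}, "can": {"MD", "NN", "VB"}, "rusted": {"VBD", "VBN"}}

def constraint_after_determiner(prev_tags, candidates):
    """If the previous word is unambiguously a determiner, discard verb readings."""
    if prev_tags == {"DT"} and candidates & {"NN"}:
        return candidates - {"MD", "VB", "VBD"}
    return candidates

def tag(words):
    prev, result = set(), []
    for w in words:
        cands = set(LEXICON.get(w.lower(), {"NN"}))       # stage 1: dictionary lookup
        cands = constraint_after_determiner(prev, cands)  # stage 2: constraints winnow
        result.append((w, cands))
        prev = cands
    return result

print(tag(["the", "can", "rusted"]))  # 'can' resolved to {'NN'} after 'the'
```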

SLIDE 10

Rule-based POS Tagging

  • Simple lexical entries in the ENGTWOL lexicon

(Figure: a table of sample lexical entries; PCP2 stands for “past participle”)

SLIDE 11

Rule-based POS Tagging

  • Example: It isn’t that odd!  vs.  I consider that odd.

(Figure: the ENGTWOL constraint for adverbial “that” — the ADV reading of “that” is kept when it modifies a following adjective, adverb, or quantifier, as in “It isn’t that odd!”, and discarded otherwise, as in “I consider that odd.”)

SLIDE 12

HMM-based Tagging

  • Also called Maximum Likelihood Tagging
    – Pick the most-likely tag for a word
  • For a given sentence or word sequence, an N-gram HMM tagger chooses the tag sequence that maximizes the following probability:

$$t_i = \arg\max_{\text{tag}} P\left(\text{word}_i \mid \text{tag}_i\right) \cdot P\left(\text{tag}_i \mid \text{previous } n-1 \text{ tags}\right)$$

word/lexical likelihood × tag sequence probability (N-gram HMM tagger)

SLIDE 13

HMM-based Tagging

  • Assumptions made here
    – Words are independent of each other
      • A word’s identity only depends on its tag
    – “Limited Horizon” and “Time Invariant” (“Stationary”)
      • A word’s tag only depends on the previous tag (limited horizon), and the dependency does not change over time (time invariance)
      • Time invariance means the tag dependency won’t change as tag sequences appear in different positions of a sentence
SLIDE 14

HMM-based Tagging

  • Apply the bigram-HMM tagger to choose the best tag for a given word
    – Choose the tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i
    – Through some simplifying Markov assumptions:

$$t_i = \arg\max_{j} P\left(t^{j} \mid t_{i-1}, w_i\right) = \arg\max_{j} \underbrace{P\left(t^{j} \mid t_{i-1}\right)}_{\text{tag sequence probability}} \; \underbrace{P\left(w_i \mid t^{j}\right)}_{\text{word/lexical likelihood}}$$
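A minimal sketch of this per-word decision rule in Python; the two probability tables are toy stand-ins (the numbers are the ones from the race example two slides ahead):

```python
# Bigram HMM decision for a single word:
# t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j).
# Both tables below are toy values, not real corpus statistics.
P_TRANS = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}            # P(t_j | t_prev)
P_EMIT  = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}   # P(w_i | t_j)

def best_tag(word, prev_tag, tagset=("VB", "NN")):
    return max(tagset,
               key=lambda t: P_TRANS.get((prev_tag, t), 0.0)
                             * P_EMIT.get((word, t), 0.0))

print(best_tag("race", "TO"))  # -> 'VB'
```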

SLIDE 15

HMM-based Tagging

  • Apply the bigram-HMM tagger to choose the best tag for a given word

$$
\begin{aligned}
t_i &= \arg\max_{j} P\left(t^{j} \mid w_i, t_{i-1}\right) \\
    &= \arg\max_{j} \frac{P\left(t^{j}, w_i \mid t_{i-1}\right)}{P\left(w_i \mid t_{i-1}\right)} && \text{(denominator is the same for all tags)} \\
    &= \arg\max_{j} P\left(t^{j}, w_i \mid t_{i-1}\right) \\
    &= \arg\max_{j} P\left(w_i \mid t^{j}, t_{i-1}\right) P\left(t^{j} \mid t_{i-1}\right) \\
    &= \arg\max_{j} \underbrace{P\left(t^{j} \mid t_{i-1}\right)}_{\text{tag sequence probability}} \; \underbrace{P\left(w_i \mid t^{j}\right)}_{\text{word/lexical likelihood}} && \text{(the probability of a word only depends on its tag)}
\end{aligned}
$$
SLIDE 16

HMM-based Tagging

  • Example: choose the best tag for a given word

Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN

to/TO race/???
P(VB|TO) · P(race|VB) = 0.34 × 0.00003 = 0.00001
P(NN|TO) · P(race|NN) = 0.021 × 0.00041 ≈ 0.000007

→ race is tagged VB (pretend that the previous word has already been tagged)
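Checking the slide’s arithmetic in Python:

```python
# Product of tag-transition probability and word likelihood for each candidate.
p_vb = 0.34  * 0.00003   # P(VB|TO) * P(race|VB) = 0.0000102 ~= 0.00001
p_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN) = 0.0000086 (slide rounds to 0.000007)
print("VB" if p_vb > p_nn else "NN")   # -> VB
```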

SLIDE 17

HMM-based Tagging

  • Apply the bigram-HMM tagger to choose the best sequence of tags for a given sentence

$$
\begin{aligned}
\hat{T} &= \arg\max_{T} P\left(T \mid W\right) = \arg\max_{T} \frac{P\left(W \mid T\right) P\left(T\right)}{P\left(W\right)} = \arg\max_{T} P\left(W \mid T\right) P\left(T\right) \\
&= \arg\max_{t_1, t_2, \ldots, t_n} P\left(w_1, w_2, \ldots, w_n \mid t_1, t_2, \ldots, t_n\right) P\left(t_1, t_2, \ldots, t_n\right) \\
&= \arg\max_{t_1, t_2, \ldots, t_n} \left[\prod_{i=1}^{n} P\left(w_i \mid t_i\right)\right] \left[\prod_{i=1}^{n} P\left(t_i \mid t_{i-1}\right)\right]
\end{aligned}
$$

(The probability of a word only depends on its tag, and the tag sequence is decomposed with the bigram assumption.)
SLIDE 18

HMM-based Tagging

  • The Viterbi algorithm for the bigram-HMM tagger

(Figure: a trellis over the word sequence w_1 … w_n; at each position i there is one state per tag t_1 … t_J, with initial probabilities π_1 … π_J at the first position and a MAX operation selecting the best predecessor state at each step)

SLIDE 19

HMM-based Tagging

  • The Viterbi algorithm for the bigram-HMM tagger

1. Initialization:  $\delta_1(j) = \pi_j \, P\left(w_1 \mid t^{j}\right), \quad 1 \le j \le J$

2. Induction:  $\delta_i(j) = \left[\max_{1 \le k \le J} \delta_{i-1}(k) \, P\left(t^{j} \mid t^{k}\right)\right] P\left(w_i \mid t^{j}\right), \quad 2 \le i \le n,\ 1 \le j \le J$

   $\psi_i(j) = \arg\max_{1 \le k \le J} \delta_{i-1}(k) \, P\left(t^{j} \mid t^{k}\right)$

3. Termination:  $X_n = \arg\max_{1 \le j \le J} \delta_n(j)$

   for i := n−1 to 1 step −1 do
     $X_i = \psi_{i+1}\left(X_{i+1}\right)$
   end
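A compact Python sketch of the three steps above; the probability tables (`init_p`, `trans_p`, `emit_p`) are assumed to be dictionaries estimated elsewhere, and for brevity no smoothing or log-space arithmetic is used (real implementations work with log probabilities to avoid underflow on long sentences):

```python
# Viterbi decoding for a bigram HMM tagger, mirroring the three steps above.
# init_p[t]     ~ pi_t, probability of tag t starting the sequence
# trans_p[k][j] ~ P(t_j | t_k), tag-transition probability
# emit_p[t][w]  ~ P(w | t), word likelihood
def viterbi(words, tags, init_p, trans_p, emit_p):
    # 1. Initialization: delta_1(j) = pi_j * P(w_1 | t_j)
    delta = [{t: init_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    psi = [{}]                       # psi_1 is unused
    # 2. Induction: delta_i(j) = max_k[delta_{i-1}(k) * P(t_j|t_k)] * P(w_i|t_j)
    for w in words[1:]:
        d, back = {}, {}
        for t in tags:
            k = max(tags, key=lambda p: delta[-1][p] * trans_p[p][t])
            d[t] = delta[-1][k] * trans_p[k][t] * emit_p[t].get(w, 0.0)
            back[t] = k              # psi_i(j): best predecessor tag
        delta.append(d)
        psi.append(back)
    # 3. Termination: best final tag, then backtrack through psi.
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for back in reversed(psi[1:]):
        path.append(back[path[-1]])
    return list(reversed(path))
```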

SLIDE 20

HMM-based Tagging

  • Apply the trigram-HMM tagger to choose the best sequence of tags for a given sentence
    – When the trigram model is used:

$$\hat{T} = \arg\max_{t_1, t_2, \ldots, t_n} \left[ P\left(t_1\right) P\left(t_2 \mid t_1\right) \prod_{i=3}^{n} P\left(t_i \mid t_{i-2}, t_{i-1}\right) \right] \prod_{i=1}^{n} P\left(w_i \mid t_i\right)$$

  • Maximum likelihood estimation based on the relative frequencies observed in the pre-tagged training corpus (labeled data):

$$P\left(t_i \mid t_{i-2}, t_{i-1}\right) = \frac{c\left(t_{i-2}, t_{i-1}, t_i\right)}{c\left(t_{i-2}, t_{i-1}\right)}, \qquad P\left(w_i \mid t_i\right) = \frac{c\left(w_i, t_i\right)}{c\left(t_i\right)}$$

Smoothing is needed!
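A minimal sketch of these maximum-likelihood counts over a pre-tagged corpus (the function names are my own, and the estimates are deliberately unsmoothed, which is exactly why the slide warns that smoothing is needed):

```python
# Maximum-likelihood estimates from a tagged corpus, as in the formulas above:
# P(t_i|t_{i-2},t_{i-1}) = c(t_{i-2},t_{i-1},t_i)/c(t_{i-2},t_{i-1}) and
# P(w_i|t_i) = c(w_i,t_i)/c(t_i). Unsmoothed: unseen events get 0.0.
from collections import Counter

def estimate(tagged_sents):            # tagged_sents: list of [(word, tag), ...]
    tri, bi, wt, uni = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sents:
        tags = [t for _, t in sent]
        for w, t in sent:
            wt[(w, t)] += 1
            uni[t] += 1
        for i in range(2, len(tags)):
            tri[(tags[i-2], tags[i-1], tags[i])] += 1
            bi[(tags[i-2], tags[i-1])] += 1
    def trans(t2, t1, t):              # P(t | t2, t1)
        return tri[(t2, t1, t)] / bi[(t2, t1)] if bi[(t2, t1)] else 0.0
    def emit(w, t):                    # P(w | t)
        return wt[(w, t)] / uni[t] if uni[t] else 0.0
    return trans, emit
```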

SLIDE 21

HMM-based Tagging

  • Apply the trigram-HMM tagger to choose the best sequence of tags for a given sentence

(Figure: the trellis over the word sequence w_1 … w_n, where each tag state is replicated once per possible previous tag t_1 … t_J — J copies of the tag states — and the MAX operation is taken over tag histories)

SLIDE 22

HMM-based Tagging

  • Probability re-estimation based on unlabeled data
    – The EM (Expectation-Maximization) algorithm is applied
    – Start with a dictionary that lists which tags can be assigned to which words
      » the word likelihood function can be estimated
      » tag transition probabilities are set to be equal
    – The EM algorithm learns (re-estimates) the word likelihood function for each tag and the tag transition probabilities
  • However, a tagger trained on hand-tagged data works better than one trained via EM

SLIDE 23

Transformation-based Tagging

  • Also called Brill tagging
    – An instance of Transformation-Based Learning (TBL)
  • Spirit
    – Like the rule-based approach, TBL is based on rules that specify what tags should be assigned to what words
    – Like the stochastic approach, rules are automatically induced from the data by a machine learning technique
  • Note that TBL is a supervised learning technique
    – It assumes a pre-tagged training corpus

SLIDE 24

Transformation-based Tagging

  • How the TBL rules are learned
    – Three major stages (see the sketch after this list)
      • Label every word with its most-likely tag, using a set of tagging rules
      • Examine every possible transformation (rewrite rule), and select the one that results in the most improved tagging (supervised!)
      • Re-tag the data according to this rule
    – The above three stages are repeated until some stopping criterion is reached
      • Such as insufficient improvement over the previous pass
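A schematic Python sketch of this learning loop under one template (“change tag a to tag b when the previous tag is z”); real Brill tagging uses many templates and efficient indexing, so treat this as illustration only:

```python
# Skeleton of transformation-based learning with a single template:
# "change tag a to tag b when the previous tag is z" (supervised: needs gold tags).
def tbl_learn(gold, init_tags, min_gain=1):
    current = list(init_tags)               # stage 1: most-likely-tag labeling
    rules = []
    while True:
        best_rule, best_gain = None, 0
        tags = set(gold) | set(current)
        for a in tags:                      # stage 2: try every instantiation
            for b in tags:
                for z in tags:
                    gain = sum(
                        (b == g) - (a == g)   # +1 if the rule fixes i, -1 if it breaks i
                        for i, g in enumerate(gold)
                        if i > 0 and current[i] == a and current[i-1] == z)
                    if gain > best_gain:
                        best_rule, best_gain = (a, b, z), gain
        if best_rule is None or best_gain < min_gain:
            break                           # stopping criterion: no net improvement
        a, b, z = best_rule                 # stage 3: re-tag with the chosen rule
        current = [b if i > 0 and t == a and current[i-1] == z else t
                   for i, t in enumerate(current)]
        rules.append(best_rule)
    return rules, current

rules, _ = tbl_learn(gold=["TO", "VB", "NN"], init_tags=["TO", "NN", "NN"])
print(rules)  # [('NN', 'VB', 'TO')] -- the "change NN to VB after TO" rule
```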

SLIDE 25

Transformation-based Tagging

  • Example

P(NN|race) = 0.98, P(VB|race) = 0.02, so race will initially be coded as NN (label every word with its most-likely tag):

1. is/VBZ expected/VBN to/TO race/NN tomorrow/NN
2. the/DT race/NN for/IN outer/JJ space/NN

Referring to the correct tag of each word, the tag of race in (1) is found to be wrong. Learn/pick the most suitable transformation rule (by examining every possible transformation):

Rewrite rule: change NN to VB when the previous tag is TO
expected/VBN to/TO race/NN → expected/VBN to/TO race/VB

SLIDE 26

Transformation-based Tagging

  • Templates (abstracted transforms)
    – The set of possible transformations may be infinite
      • Should limit the set of transformations
      • The design of a small set of templates is needed

(Figures: Brill’s templates, each beginning with “Change tag a to tag b when …”, and the rules learned by Brill’s original tagger, involving tags such as MD — modal verbs (should, can, …), VBN — verb, past participle, and VBZ — verb, 3sg present)

SLIDE 27

Transformation-based Tagging

  • Templates (abstracted transforms)
SLIDE 28

Transformation-based Tagging

  • Algorithm

(Figure: the TBL learning algorithm. The GET_BEST_INSTANCE procedure in the example algorithm is “Change tag from X to Y if the previous tag is Z”; it iterates over all combinations of tags and gets the best instance for each transformation.)

SLIDE 29

Multiple Tags and Multi-part Words

  • Multiple tags
    – A word is ambiguous between multiple tags and it is impossible or very difficult to disambiguate, so multiple tags are allowed, e.g.
      • adjective versus preterite versus past participle (JJ/VBD/VBN)
      • adjective versus noun as prenominal modifier (JJ/NN)
  • Multi-part words
    – Certain words are split, or some adjacent words are treated as a single word

would/MD n’t/RB
Children/NNS ’s/POS
in terms of (in/II31 terms/II32 of/II33)

SLIDE 30

Tagging of Unknown Words

  • Simplest unknown-word algorithm
    – Pretend that each unknown word is ambiguous among all possible tags, with equal probability
    – Must rely solely on the contextual POS-trigram to suggest the proper tag
  • Slightly more complex algorithm
    – Based on the idea that the probability distribution of tags over unknown words is very similar to the distribution of tags over words that occurred only once in the training set (unknown words are most likely to be nouns or verbs)
    – The likelihood P(w_i | t_i) for an unknown word is determined by the average of the distribution over all singletons in the training set (similar to Good-Turing?)

SLIDE 31

Tagging of Unknown Words

  • The most powerful unknown-word algorithms
    – Hand-designed features
      • The information about how the word is spelled (inflectional and derivational features), e.g.:
        – Words ending in -s (→ plural nouns)
        – Words ending in -ed (→ past participles)
      • The information of word capitalization (initial or non-initial) and hyphenation
    – Features induced by machine learning
      • E.g.: the TBL algorithm uses templates to induce useful English inflectional and derivational features and hyphenation

$$P\left(w_i \mid t_i\right) = p\left(\text{unknown word} \mid t_i\right) \cdot p\left(\text{capital} \mid t_i\right) \cdot p\left(\text{endings/hyph} \mid t_i\right)$$

(Other spelling cues include the first N letters and the last N letters of the word.)
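A toy sketch of combining the hand-designed feature probabilities in the formula above; every number here is a made-up placeholder rather than an estimate from real data, and the capitalization term is simplified to fire only for capitalized words:

```python
# Unknown-word likelihood as a product of feature probabilities, following the
# slide's formula. All values below are hypothetical placeholders.
P_UNK     = {"NN": 0.30, "NNS": 0.25, "VBD": 0.10, "VBN": 0.10}  # p(unknown|t)
P_CAPITAL = {"NN": 0.05, "NNS": 0.05, "VBD": 0.01, "VBN": 0.01}  # p(capital|t)
P_SUFFIX  = {("s", "NNS"): 0.60, ("ed", "VBD"): 0.50, ("ed", "VBN"): 0.45}

def unknown_likelihood(word, tag):
    suffix = "ed" if word.endswith("ed") else ("s" if word.endswith("s") else "")
    return (P_UNK.get(tag, 0.0)
            * (P_CAPITAL.get(tag, 0.0) if word[:1].isupper() else 1.0)
            * P_SUFFIX.get((suffix, tag), 0.01))  # small default when no suffix cue

print(unknown_likelihood("blurfed", "VBD"))  # the -ed ending favors VBD/VBN
```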

SLIDE 32

Evaluation of Taggers

  • Compare the tagged results with a human-labeled Gold Standard test set, in percentage of correctly tagged words
    – Most tagging algorithms have an accuracy of around 96~97% for simple tagsets like the Penn Treebank set
    – Upper bound (ceiling) and lower bound (baseline)
      • Ceiling: achieved by seeing how well humans do on the task
        – A 3~4% margin of error
      • Baseline: achieved by using the unigram most-likely tag for each word
        – 90~91% accuracy can be attained
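A minimal sketch of scoring a tagger against a gold standard and of the unigram most-likely-tag baseline mentioned above (function names are my own):

```python
# Tagging accuracy against a gold standard, plus the most-likely-tag baseline.
from collections import Counter, defaultdict

def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def unigram_baseline(train, test_words):     # train: list of (word, tag) pairs
    counts = defaultdict(Counter)
    for w, t in train:
        counts[w][t] += 1
    overall = Counter(t for _, t in train)
    return [counts[w].most_common(1)[0][0] if w in counts
            else overall.most_common(1)[0][0]    # back off for unknown words
            for w in test_words]
```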

SLIDE 33

Error Analysis

  • Confusion matrix
  • Major problems facing current taggers
    – NN (noun) versus NNP (proper noun) and JJ (adjective)
    – RP (particle) versus RB (adverb) versus JJ
    – VBD (past-tense verb) versus VBN (past participle) versus JJ

SLIDE 34

Applications of POS Tagging

  • Tell what words are likely to occur in a word’s vicinity
    – E.g., the vicinity of possessive or personal pronouns
  • Tell the pronunciation of a word
    – DIScount (noun) and disCOUNT (verb) …
  • Advanced ASR language models
    – Word-class N-grams
  • Partial parsing
    – The simplest case: find the noun phrases (names) or other phrases in a sentence
SLIDE 35

Applications of POS Tagging

  • Information retrieval
    – Word stemming
    – Help select nouns or other important words from a document
    – Phrase-level information
      • Phrase normalization, e.g.: book publishing vs. publishing of books
  • Information extraction
    – Semantic tags or categories

United, States, of, America → “United States of America”
secondary, education → “secondary education”

SLIDE 36

Applications of POS Tagging

  • Question Answering
    – Answer a user query that is formulated in the form of a question by returning an appropriate noun phrase such as a location, a person, or a date
      • E.g., “Who killed President Kennedy?”
  • In summary, taggers serve as fast, lightweight components that give sufficient information for many applications
    – But not always a desirable preprocessing stage for all applications
    – Many probabilistic parsers are now good enough!

SLIDE 37

Class-based N-grams

  • Use the lexical tag/category/class information to augment the N-gram models

$$P\left(w_n \mid w_{n-N+1}^{\,n-1}\right) = P\left(w_n \mid c_n\right) P\left(c_n \mid c_{n-N+1}^{\,n-1}\right)$$

i.e., the probability of a word given its class, times the probability of the class given the preceding classes.

    – Maximum likelihood estimation:

$$P\left(w_i \mid c_j\right) = \frac{C\left(w_i\right)}{C\left(c_j\right)}, \qquad P\left(c_k \mid c_j\right) = \frac{C\left(c_j\, c_k\right)}{\sum_{l} C\left(c_j\, c_l\right)}$$

Constraint: a word may only belong to one lexical category.
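A minimal sketch of the class-based bigram above, under the slide’s one-class-per-word constraint; all counts are hypothetical, and the class-bigram denominator uses C(c_j) as a stand-in for the sum over all class bigrams starting with c_j:

```python
# Class-based bigram: P(w_n | w_{n-1}) = P(w_n | c_n) * P(c_n | c_{n-1}),
# with each word assigned to exactly one class (the slide's constraint).
CLASS_OF = {"the": "DT", "flight": "NN", "dinner": "NN", "serve": "VB"}
C_WORD   = {"the": 900, "flight": 60, "dinner": 40, "serve": 50}   # C(w)
C_CLASS  = {"DT": 900, "NN": 100, "VB": 50}                        # C(c)
C_BIGRAM = {("DT", "NN"): 80, ("VB", "NN"): 30, ("NN", "VB"): 20}  # C(c_j c_k)

def p_word_given_prev(word, prev):
    c, c_prev = CLASS_OF[word], CLASS_OF[prev]
    p_w_given_c = C_WORD[word] / C_CLASS[c]                  # P(w|c) = C(w)/C(c)
    p_c_given_c = C_BIGRAM.get((c_prev, c), 0) / C_CLASS[c_prev]
    return p_w_given_c * p_c_given_c

print(p_word_given_prev("flight", "the"))  # P(flight | the) via classes
```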

SLIDE 38

行政院院長決定廢核四
(“The Premier of the Executive Yuan decided to scrap the Fourth Nuclear Power Plant”)

(Figure: the Bopomofo syllable sequence ㄒㄧㄥ ㄓㄥ ㄩㄢ ㄩㄢ ㄓㄤ ㄐㄩㄝ ㄉㄧㄥ ㄈㄟ ㄏㄜ ㄙ and its lattice of homophone character candidates, e.g. 興/行/鄭/政, 院/園, 長/漲, 決/覺/掘, 定/訂, 廢/非/費, 核/和/合, 四/賜, plus multi-character candidates such as 行政院, 院長, and 決定 — class information helps select the correct path)