Lecture 5: Part-of-Speech Tagging (Julia Hockenmaier)

CS447: Natural Language Processing http://courses.engr.illinois.edu/cs447 Lecture 5: Part-of-Speech Tagging Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center

POS tagging
Raw text: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Tagged text (output of the POS tagger): Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
Tagset: NNP: proper noun, CD: numeral, JJ: adjective, ...
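
The word_TAG notation on this slide is easy to read back into (word, tag) pairs. The following is a minimal sketch, not part of the original slides; the function name `parse_tagged` is a hypothetical helper introduced here for illustration.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST underscore so that a word containing '_' is kept whole.
        word, tag = token.rsplit("_", 1)
        pairs.append((word, tag))
    return pairs


tagged = "Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB"
pairs = parse_tagged(tagged)
words = [w for w, _ in pairs]
tags = [t for _, t in pairs]
```

Punctuation tokens such as `,_,` work with the same rule: the comma is both the word and its tag.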

Why POS tagging?
POS tagging is a prerequisite for further analysis:
- Speech synthesis: How to pronounce “lead”? INsult or inSULT, OBject or obJECT, OVERflow or overFLOW, DIScount or disCOUNT, CONtent or conTENT
- Parsing: What words are in the sentence?
- Information extraction: Finding names, relations, etc.
- Machine translation: The noun “content” may have a different translation from the adjective.

POS Tagging
Words often have more than one POS:
- The back door (adjective)
- On my back (noun)
- Win the voters back (particle)
- Promised to back the bill (verb)
The POS tagging task is to determine the POS tag for a particular instance of a word. Since there is ambiguity, we cannot simply look up the correct POS in a dictionary.
(These examples are from Dekang Lin.)

Defining a tagset

Defining a tag set
We have to define an inventory of labels for the word classes (i.e. the tag set):
- Most taggers rely on models that have to be trained on annotated (tagged) corpora. Evaluation also requires annotated corpora.
- Since human annotation is expensive and time-consuming, the tag sets used in a few existing labeled corpora become the de facto standard.
- Tag sets need to capture semantically or syntactically important distinctions that can easily be made by trained human annotators.

Defining an annotation scheme
A lot of NLP tasks require systems to map natural language text to another representation:
- POS tagging: Text ⟶ POS tagged text
- Syntactic parsing: Text ⟶ parse trees
- Semantic parsing: Text ⟶ meaning representations
- …: Text ⟶ …

Defining an annotation scheme
Training and evaluating models for these NLP tasks requires large corpora annotated with the desired representations.
Annotation at scale is expensive, so a few existing corpora and their annotations and annotation schemes (tag sets, etc.) often become the de facto standard for the field.
It is difficult to know what the ‘right’ annotation scheme should be for any particular task:
- How difficult is it to achieve high accuracy for that annotation?
- How useful is this annotation scheme for downstream tasks in the pipeline?
➩ We often can’t know the answer until we’ve annotated a lot of data…

Word classes
Open classes: Nouns, Verbs, Adjectives, Adverbs
Closed classes: Auxiliaries and modal verbs; Prepositions, Conjunctions; Pronouns, Determiners; Particles, Numerals
(see Appendix for details)

Defining a tag set
Tag sets have different granularities:
- Brown corpus (Francis and Kucera 1982): 87 tags
- Penn Treebank (Marcus et al. 1993): 45 tags. Simplified version of the Brown tag set (de facto standard for English now). Examples: NN: common noun (singular or mass): water, book; NNS: common noun (plural): books
- Prague Dependency Treebank (Czech): 4452 tags. Complete morphological analysis, e.g. AAFP3----3N----: nejnezajímavějším (Adjective Regular Feminine Plural Dative … Superlative) [Hajic 2006, VMC tutorial]

How much ambiguity is there?
Most word types are unambiguous.
[Table: number of tags per word type; the table contents are not preserved in this transcript]
NB: These numbers are based on word/tag combinations in the corpus. Many combinations that don’t occur in the corpus are equally correct.
But a large fraction of word tokens are ambiguous. Original Brown corpus: 40% of tokens are ambiguous.
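
The distinction between ambiguous word types and ambiguous word tokens can be made concrete with a small count. This is a sketch on an invented toy corpus (the sentences and the helper name `ambiguity_stats` are assumptions for illustration, not data from the Brown corpus).

```python
from collections import Counter, defaultdict


def ambiguity_stats(tagged_tokens):
    """Return (fraction of ambiguous word types, fraction of ambiguous tokens)."""
    tags_of = defaultdict(set)   # word type -> set of tags seen with it
    token_count = Counter()      # word type -> number of occurrences
    for word, tag in tagged_tokens:
        tags_of[word].add(tag)
        token_count[word] += 1
    # A type is ambiguous if it occurs with more than one tag in the corpus.
    ambiguous = [w for w, ts in tags_of.items() if len(ts) > 1]
    type_rate = len(ambiguous) / len(tags_of)
    token_rate = sum(token_count[w] for w in ambiguous) / sum(token_count.values())
    return type_rate, token_rate


# Toy corpus (hypothetical): only "back" occurs with two different tags.
toy = [("the", "DT"), ("back", "JJ"), ("door", "NN"), ("on", "IN"),
       ("my", "PRP$"), ("back", "NN"), ("the", "DT"), ("bill", "NN")]
type_rate, token_rate = ambiguity_stats(toy)
```

In the toy corpus one of six types is ambiguous, but it covers two of eight tokens, so the token-level rate is higher than the type-level rate. Frequent words tend to be the ambiguous ones, which is the effect the slide describes.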

Evaluating POS taggers

Evaluating POS taggers
Evaluation setup: Split data into separate training, (development) and test sets.
[Figure: the data split into TRAINING, DEV and TEST portions, shown in two alternative layouts]
Better setup: n-fold cross validation:
- Split data into n sets of equal size
- Run n experiments, using set i to test and the remainder to train
This gives average, maximal and minimal accuracies.
When comparing two taggers: Use the same test and training data with the same tag set.
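
The n-fold procedure above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the round-robin assignment of items to folds is one possible choice (fold sizes then differ by at most one when the data does not divide evenly).

```python
def n_fold_splits(data, n):
    """Yield (train, test) pairs; fold i serves as the test set in experiment i."""
    folds = [data[i::n] for i in range(n)]  # round-robin assignment to n folds
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def summarize(accuracies):
    """Average, maximal and minimal accuracy over the n experiments."""
    return sum(accuracies) / len(accuracies), max(accuracies), min(accuracies)


sentences = list(range(10))            # stand-ins for 10 tagged sentences
splits = list(n_fold_splits(sentences, 5))
```

Each experiment would train a tagger on `train`, measure accuracy on `test`, and pass the resulting list of accuracies to `summarize`.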

Evaluation metric: test accuracy
How many words in the unseen test data can you tag correctly?
State of the art on Penn Treebank: around 97%.
➩ How many sentences can you tag correctly?
Compare your model against a baseline:
- Standard: assign to each word its most likely tag (use the training corpus to estimate P(t|w))
- Baseline performance on Penn Treebank: around 93.7%
… and a (human) ceiling:
- How often do human annotators agree on the same tag? Penn Treebank: around 97%
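
The most-likely-tag baseline and the two accuracy measures (per word and per sentence) can be sketched as below. The toy sentences, the function names, and the fallback for unseen words (the overall most frequent tag) are assumptions made for this illustration; the slides do not say how unseen words are handled.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Estimate argmax_t P(t|w) for every training word by counting."""
    counts = defaultdict(Counter)   # word -> Counter of tags
    tag_totals = Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_totals[tag] += 1
    best_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = tag_totals.most_common(1)[0][0]  # assumption: fallback for unseen words
    return best_tag, default_tag


def tag_baseline(words, best_tag, default_tag):
    return [best_tag.get(w, default_tag) for w in words]


def accuracies(gold_sentences, predict):
    """Return (token accuracy, sentence accuracy) of `predict` on gold data."""
    tok_correct = tok_total = sent_correct = 0
    for sent in gold_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        hits = sum(p == g for p, g in zip(predict(words), gold))
        tok_correct += hits
        tok_total += len(gold)
        sent_correct += hits == len(gold)  # sentence counts only if every tag is right
    return tok_correct / tok_total, sent_correct / len(gold_sentences)


# Hypothetical toy data, for illustration only.
train = [[("the", "DT"), ("back", "NN")],
         [("the", "DT"), ("back", "NN"), ("door", "NN")],
         [("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")]]
test = [[("the", "DT"), ("door", "NN")],
        [("to", "TO"), ("back", "VB"), ("the", "DT"), ("zone", "NN")]]
best_tag, default_tag = train_baseline(train)
token_acc, sentence_acc = accuracies(
    test, lambda ws: tag_baseline(ws, best_tag, default_tag))
```

Sentence accuracy is always at most token accuracy. As a rough guide, if tagging errors were independent (a simplifying assumption), 97% token accuracy would give a fully correct 20-word sentence only about 54% of the time, since 0.97^20 ≈ 0.54.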

Is POS-tagging a solved task?
Penn Treebank POS-tagging accuracy ≈ human ceiling
Yes, but: Other languages with more complex morphology need much larger tag sets for tagging to be useful, and will contain many more distinct word forms in corpora of the same size. They often have much lower accuracies.

Qualitative evaluation
Generate a confusion matrix (for development data): How often was a word with tag i mistagged as tag j?
[Figure: confusion matrix of correct tags against predicted tags; each cell gives the % of errors caused by that confusion, e.g. mistagging VBN as JJ]
See what errors are causing problems:
- Noun (NN) vs ProperNoun (NNP) vs Adj (JJ)
- Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)
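
A confusion matrix of this kind can be built by counting (correct tag, predicted tag) pairs over the errors. The sketch below stores the matrix sparsely as a counter; the function names and the toy tag sequences are hypothetical.

```python
from collections import Counter


def confusion_counts(gold_tags, predicted_tags):
    """Count how often a word with correct tag i was mistagged as tag j."""
    errors = Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold != pred:
            errors[(gold, pred)] += 1
    return errors


def error_shares(errors):
    """Percentage of all errors caused by each (correct, predicted) confusion."""
    total = sum(errors.values())
    return {pair: 100.0 * n / total for pair, n in errors.most_common()}


# Hypothetical development-set tags, for illustration only.
gold = ["VBN", "NN", "VBD", "VBN", "JJ"]
pred = ["JJ", "NN", "VBN", "JJ", "JJ"]
errors = confusion_counts(gold, pred)
shares = error_shares(errors)
```

Here two of the three errors are VBN mistagged as JJ, so that confusion accounts for about two thirds of all errors.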

Building a POS tagger

Statistical POS tagging
Example: She promised to back the bill
$\mathbf{w} = w^{(1)} w^{(2)} w^{(3)} w^{(4)} w^{(5)} w^{(6)}$ = She promised to back the bill
$\mathbf{t} = t^{(1)} t^{(2)} t^{(3)} t^{(4)} t^{(5)} t^{(6)}$ = PRP VBD TO VB DT NN
What is the most likely sequence of tags $\mathbf{t} = t^{(1)} \dots t^{(N)}$ for the given sequence of words $\mathbf{w} = w^{(1)} \dots w^{(N)}$?
$\mathbf{t}^* = \operatorname{argmax}_{\mathbf{t}} P(\mathbf{t} \mid \mathbf{w})$

POS tagging with generative models
$$\operatorname{argmax}_{\mathbf{t}} P(\mathbf{t} \mid \mathbf{w}) = \operatorname{argmax}_{\mathbf{t}} \frac{P(\mathbf{t}, \mathbf{w})}{P(\mathbf{w})} = \operatorname{argmax}_{\mathbf{t}} P(\mathbf{t}, \mathbf{w}) = \operatorname{argmax}_{\mathbf{t}} P(\mathbf{t})\, P(\mathbf{w} \mid \mathbf{t})$$
$P(\mathbf{t}, \mathbf{w})$: the joint distribution of the labels we want to predict ($\mathbf{t}$) and the observed data ($\mathbf{w}$). We decompose $P(\mathbf{t}, \mathbf{w})$ into $P(\mathbf{t})$ and $P(\mathbf{w} \mid \mathbf{t})$ since these distributions are easier to estimate.
Models based on joint distributions of labels and observed data are called generative models: think of $P(\mathbf{t})\, P(\mathbf{w} \mid \mathbf{t})$ as a stochastic process that first generates the labels, and then generates the data we see, based on these labels.

Hidden Markov Models (HMMs)
HMMs are the most commonly used generative models for POS tagging (and other tasks, e.g. in speech recognition).
HMMs make specific independence assumptions when defining $P(\mathbf{t})$ and $P(\mathbf{w} \mid \mathbf{t})$:
$P(\mathbf{t})$ is an n-gram model over tags:
- Bigram HMM: $P(\mathbf{t}) = P(t^{(1)})\, P(t^{(2)} \mid t^{(1)})\, P(t^{(3)} \mid t^{(2)}) \cdots P(t^{(N)} \mid t^{(N-1)})$
- Trigram HMM: $P(\mathbf{t}) = P(t^{(1)})\, P(t^{(2)} \mid t^{(1)})\, P(t^{(3)} \mid t^{(2)}, t^{(1)}) \cdots P(t^{(N)} \mid t^{(N-1)}, t^{(N-2)})$
$P(t_i \mid t_j)$ or $P(t_i \mid t_j, t_k)$ are called transition probabilities.
In $P(\mathbf{w} \mid \mathbf{t})$ each word is generated by its tag:
$P(\mathbf{w} \mid \mathbf{t}) = P(w^{(1)} \mid t^{(1)})\, P(w^{(2)} \mid t^{(2)}) \cdots P(w^{(N)} \mid t^{(N)})$
$P(w \mid t)$ are called emission probabilities.
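
To make the bigram HMM factorization concrete, the sketch below estimates transition and emission probabilities by relative frequency from a tiny invented corpus and multiplies them out for one tagged sentence. Several things here are assumptions for illustration, not taken from the slides: the function names, the toy corpus, the absence of smoothing, and the use of a start symbol `<S>` so that the initial probability is written as a transition out of `<S>`. The exhaustive search over all tag sequences is exponential in sentence length and is only meant to show what the argmax ranges over.

```python
from collections import Counter, defaultdict
from itertools import product

START = "<S>"  # assumption: P(t(1)) is modelled as the transition P(t(1) | <S>)


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of transition and emission probabilities."""
    trans = defaultdict(Counter)  # previous tag -> Counter of next tags
    emit = defaultdict(Counter)   # tag -> Counter of words
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def normalize(table):
        return {ctx: {x: n / sum(c.values()) for x, n in c.items()}
                for ctx, c in table.items()}

    return normalize(trans), normalize(emit)


def joint_prob(words, tags, trans, emit):
    """P(t, w) = product over i of P(t(i) | t(i-1)) * P(w(i) | t(i))."""
    p = 1.0
    prev = START
    for w, t in zip(words, tags):
        p *= trans.get(prev, {}).get(t, 0.0) * emit.get(t, {}).get(w, 0.0)
        prev = t
    return p


def brute_force_tag(words, trans, emit):
    """argmax over ALL tag sequences; feasible only for toy inputs."""
    tagset = list(emit)
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: joint_prob(words, tags, trans, emit))


# Hypothetical two-sentence training corpus.
corpus = [[("she", "PRP"), ("promised", "VBD"), ("to", "TO"),
           ("back", "VB"), ("the", "DT"), ("bill", "NN")],
          [("the", "DT"), ("back", "NN")]]
trans, emit = estimate_hmm(corpus)
words = "she promised to back the bill".split()
best = brute_force_tag(words, trans, emit)
```

On this toy corpus "back" can be emitted by both VB and NN, but the transition TO → NN was never observed, so the sequence with "back" tagged VB is the only one with non-zero probability.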

HMMs as probabilistic automata
An HMM defines:
- Transition probabilities: P(t_i | t_j)
- Emission probabilities: P(w_i | t_i)
[Figure: automaton with the states DT, JJ, NN and VBZ; arcs between states are labelled with transition probabilities, and each state lists emission probabilities for sample words such as the, a, every, some, no, able, zealous, abandonment, zone, acts, yields]

What would the automaton for a trigram HMM with transition probabilities P(t_i | t_j t_k) look like? What about unigrams or n-grams?

Encoding a trigram model as FSA
Bigram model: States = Tag Unigrams
Trigram model: States = Tag Bigrams
[Figure: two automata over the tags DT, JJ, NN and VBZ. The bigram automaton has a start state (<S>, q0) and one state per tag. The trigram automaton has a start state <S> and one state per tag pair, e.g. DT_<S>, JJ_DT, JJ_JJ, NN_DT, NN_JJ, NN_NN, VBZ_NN]
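
The tag-bigram encoding can be sketched by enumerating the states and arcs of the trigram automaton. This is an illustrative construction under stated assumptions: states are written as (previous tag, current tag) pairs, the start state is written as the pair (<S>, <S>), and the function name is hypothetical. The slide's figure may order the two tags in a state label differently.

```python
START = "<S>"


def trigram_automaton(tagset):
    """Return (states, arcs) of an automaton whose states are tag bigrams."""
    tags = list(tagset)
    initial = (START, START)  # assumption: start state written as a pair
    states = ([initial]
              + [(START, t) for t in tags]          # first tag of the sentence
              + [(a, b) for a in tags for b in tags])
    # An arc (a, b) -> (b, c) reads tag c and carries the trigram
    # transition probability of c given the two preceding tags a and b.
    arcs = [((a, b), (b, c)) for (a, b) in states for c in tags]
    return states, arcs


states, arcs = trigram_automaton(["DT", "JJ", "NN", "VBZ"])
```

With 4 tags this gives 1 + 4 + 16 = 21 states and 21 × 4 = 84 arcs. Because each state remembers the last two tags, a transition between states only needs the one new tag, which is how a trigram model fits into the same automaton form as a bigram model.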
