Part-of-speech tagging
A simple but useful form of linguistic analysis
Many slides adapted from slides by Chris Manning

Parts of Speech
• Perhaps starting with Aristotle in the West (384–322 BCE), there was the idea of having parts of speech
  • a.k.a. lexical categories, word classes, "tags", POS
• The idea, still with us, that there are 8 parts of speech comes from Dionysius Thrax of Alexandria (c. 100 BCE)
  • But actually his 8 aren't exactly the ones we are taught today
    • Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
    • School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

Open class (lexical) words
• Nouns
  • Proper: IBM, Italy
  • Common: cat / cats, snow
• Verbs
  • Main: see, registered
• Adjectives: old, older, oldest
• Adverbs: slowly
• Numbers: 122,312, one
• … more
Closed class (functional) words
• Modals: can, had
• Determiners: the, some
• Prepositions: to, with
• Conjunctions: and, or
• Particles: off, up
• Pronouns: he, its
• Interjections: Ow, Eh
• … more

Open vs. Closed classes
• Closed:
  • determiners: a, an, the
  • pronouns: she, he, I
  • prepositions: on, under, over, near, by, …
  • Why "closed"?
• Open:
  • Nouns, Verbs, Adjectives, Adverbs.

POS Tagging
• Words often have more than one POS: back
  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word.

POS Tagging
• Input: Plays well with others
• Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS
• Output: Plays/VBZ well/RB with/IN others/NNS
• Penn Treebank POS tags: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
• Uses:
  • Text-to-speech (how do we pronounce "lead"?)
  • Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
  • As input to or to speed up a full parser
  • If you know the tag, you can back off to it in other tasks
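
As a rough illustration of the regexp use listed above, the sketch below matches (Det) Adj* N+ over an already tagged sentence. The one-letter code table and the name noun_phrases are assumptions made for this example; they are not part of the original slides or of any tagging library.

```python
import re

# Hypothetical mapping from Penn Treebank tags to one-letter codes,
# so that a tag sequence becomes a string a regexp can scan.
CODES = {
    "DT": "D",
    "JJ": "A", "JJR": "A", "JJS": "A",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
}

# (Det) Adj* N+ written over the one-letter codes.
NP_PATTERN = re.compile(r"D?A*N+")


def noun_phrases(tagged):
    """Return the word spans whose tag sequence matches (Det) Adj* N+."""
    # Any tag outside the table becomes "x", which the pattern never matches.
    codes = "".join(CODES.get(tag, "x") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(codes)
    ]


tagged = [("The", "DT"), ("back", "JJ"), ("door", "NN"),
          ("opens", "VBZ"), ("slowly", "RB")]
print(noun_phrases(tagged))
```

Because each token maps to exactly one character, match offsets in the code string are also token offsets, which keeps the slicing simple.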

POS tagging performance
• How many tags are correct? (Tag accuracy)
  • About 97% currently
  • But baseline is already 90%
    • Baseline is performance of stupidest possible method
      • Tag every word with its most frequent tag
      • Tag unknown words as nouns
  • Partly easy because
    • Many words are unambiguous
    • You get points for them (the, a, etc.) and for punctuation marks!
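
The baseline described above can be sketched in a few lines. This is a minimal version under stated assumptions: the training data is a toy list of (word, tag) sentences invented for the example, and the function names are hypothetical. The accuracy it yields says nothing about the 90% figure, which refers to real corpora.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_baseline(model, words, unknown_tag="NN"):
    """Tag known words with their most frequent tag, unknown words as nouns."""
    return [(w, model.get(w, unknown_tag)) for w in words]


def tag_accuracy(model, tagged_sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        for (_, gold), (_, guess) in zip(sentence, tag_baseline(model, words)):
            correct += gold == guess
            total += 1
    return correct / total


# Toy training data (an assumption for illustration only).
train = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(train)
print(tag_baseline(model, ["the", "back", "zebra"]))
```

Here "back" is seen twice as NN and once as VB, so the baseline always tags it NN, and the unseen word "zebra" falls back to NN as well.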

Deciding on the correct part of speech can be difficult even for people
• Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG (RP = particle)
• All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
• Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

How difficult is POS tagging?
• About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech
• But they tend to be very common words. E.g., that
  • I know that he is honest = IN (preposition or subordinating conjunction)
  • Yes, that play was nice = DT
  • You can't go that far = RB
• 40% of the word tokens are ambiguous
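
The gap between the type figure and the token figure can be made concrete with a small count. The sketch below assumes a tagged corpus given as (word, tag) pairs; the corpus shown is a toy built from the three "that" examples, so its rates are not the Brown corpus numbers, and the function name is hypothetical.

```python
from collections import defaultdict


def ambiguity_rates(tagged_tokens):
    """Return (share of ambiguous word types, share of ambiguous tokens)."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word.lower()].add(tag)
    # A type is ambiguous if it was seen with more than one tag.
    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_by_word)
    token_rate = sum(w.lower() in ambiguous for w, _ in tagged_tokens) / len(tagged_tokens)
    return type_rate, token_rate


# Toy corpus (an assumption for illustration only).
corpus = [
    ("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("he", "PRP"),
    ("is", "VBZ"), ("honest", "JJ"),
    ("that", "DT"), ("play", "NN"), ("was", "VBD"), ("nice", "JJ"),
    ("go", "VB"), ("that", "RB"), ("far", "RB"),
]
type_rate, token_rate = ambiguity_rates(corpus)
print(f"ambiguous types: {type_rate:.1%}, ambiguous tokens: {token_rate:.1%}")
```

Only one of the 11 types ("that") is ambiguous, but it accounts for 3 of the 13 tokens, which mirrors the slide's point that ambiguous types tend to be frequent words.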

Part-of-speech tagging revisited
A simple but useful form of linguistic analysis

Sources of information
• What are the main sources of information for POS tagging?
  • Knowledge of neighboring words
    • Bill  saw    that  man  yesterday
    • NNP   NN     DT    NN   NN
    • VB    VB(D)  IN    VB   NN
  • Knowledge of word probabilities
    • man is rarely used as a verb….
• The latter proves the most useful, but the former also helps

More and Better Features ⇒ Feature-based tagger
• Can do surprisingly well just looking at a word by itself:
  • Word             the: the → DT
  • Lowercased word  Importantly: importantly → RB
  • Prefixes         unfathomable: un- → JJ
  • Suffixes         Importantly: -ly → RB
  • Capitalization   Meridian: CAP → NNP
  • Word shapes      35-year: d-x → JJ
• Then build a maxent (or whatever) model to predict tag
  • Maxent P(t|w): 93.7% overall / 82.6% unknown
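
The word-internal features listed above could be extracted roughly as follows. This is only a sketch: the feature names, the fixed two-character prefix and suffix length, and the exact shape-collapsing rule are assumptions chosen to reproduce the slide's examples, not the feature set of any particular tagger.

```python
import re


def word_shape(word):
    """Collapse runs of capitals to X, lowercase letters to x, digits to d."""
    shape = re.sub(r"[A-Z]+", "X", word)
    shape = re.sub(r"[a-z]+", "x", shape)
    shape = re.sub(r"[0-9]+", "d", shape)
    return shape


def word_features(word):
    """Features computed from the word alone, with no context."""
    lower = word.lower()
    return {
        "word": word,
        "lower": lower,
        "prefix2": lower[:2],
        "suffix2": lower[-2:],
        "is_cap": word[:1].isupper(),
        "shape": word_shape(word),
    }


for w in ["the", "Importantly", "unfathomable", "Meridian", "35-year"]:
    print(w, word_features(w))
```

A classifier such as the maxent model mentioned on the slide would take these feature dictionaries as input; training one is outside the scope of this sketch.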

How to improve supervised results?
• Build better features!
• They/PRP left/VBD as/IN soon/RB as/IN he/PRP arrived/VBD ./. (the first "as" should be RB)
  • We could fix this with a feature that looked at the next word
• Intrinsic/NNP flaws/NNS remained/VBD undetected/VBN ./. ("Intrinsic" should be JJ)
  • We could fix this by linking capitalized words to their lowercase versions

Tagging Without Sequence Information
• Baseline: predict t0 from w0
• Three Words: predict t0 from w-1, w0, w1
    Model     Features  Token   Unknown
    Baseline  56,805    93.69%  82.61%
    3Words    239,767   96.57%  86.78%
• Using words only in a straight classifier works as well as a basic (HMM or discriminative) sequence model!!
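
To make the difference between the two models concrete, the sketch below builds the inputs each one would see at a given position. The feature names, the boundary markers "<s>" and "</s>", and the lowercasing (one simple way to link capitalized words to their lowercase versions) are assumptions for illustration. The classifier that would consume these features is not shown.

```python
def baseline_features(words, i):
    """Baseline model: only the current word w0."""
    return {"w0": words[i].lower()}


def window_features(words, i):
    """Three Words model: w-1, w0 and w+1, with no tag history at all."""
    prev_w = words[i - 1].lower() if i > 0 else "<s>"
    next_w = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return {"w-1": prev_w, "w0": words[i].lower(), "w+1": next_w}


words = ["They", "left", "as", "soon", "as", "he", "arrived", "."]
print(window_features(words, 2))  # first "as": next word is "soon"
print(window_features(words, 4))  # second "as": next word is "he"
```

The two occurrences of "as" get identical baseline features but different window features, which is the information a classifier needs to tell them apart.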

Overview: POS Tagging Accuracies
• Rough accuracies (overall / unknown words):
  • Most freq tag: ~90% / ~50%
  • Maxent P(t|w): 93.7% / 82.6%
  • Trigram HMM: ~95% / ~55%
  • MEMM tagger: 96.9% / 86.9%
  • Bidirectional dependencies: 97.2% / 90.0%
  • Upper bound: ~98% (human agreement)
• Most errors on unknown words

Summary of POS Tagging
• For tagging, the change from generative (HMM) to discriminative (ME) model does not by itself result in great improvement
• One profits from models for specifying dependence on overlapping features of the observation such as spelling, suffix analysis, etc.
• An MEMM allows integration of rich features of the observations and considers dependence with the previous word's tag, but can suffer strongly from assuming independence from following observations; this effect can be relieved by adding dependence on following words.
• This additional power (of the CRF, Structured Perceptron models) has been shown to result in improvements in accuracy
• The higher accuracy of discriminative models comes at the price of much slower training
