
Tagging in a nutshell


Vincent Claveau

IRISA - CNRS Rennes, France

Master 2 RI/BIG/MTIBH : DSS/ADT


Forewords

Sources

◮ Slides inspired by M. Rajman and J.-C. Chappelier, EPFL

Vocabulary

◮ tagging, French: étiquetage

◮ tag, Fr. étiquette

◮ Part-Of-Speech (partie-du-discours), morpho-syntactic categories, grammatical categories


The big picture

What do we want to do?

◮ assign a sequence of symbols (tags) to a sequence of symbols (the sentence)

◮ usually: one output symbol for each input symbol

◮ in this course, only one or two well-known approaches are covered

In the framework of the track P4

◮ cf. HMM and other stochastic approaches

Tagging in a nutshell Generalities

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


About the tagging task

Goals

◮ associate morpho-syntactic information to each word-form

◮ finite number of tags: tagset

◮ collateral advantage: lemmatization

Interests

◮ generalization: suppress morphological and lexical variability

◮ reducing the vocabulary size


About lemmatization

Reducing word-forms to their lemmas

◮ lemma = canonical form of word-forms that only differ by inflection

◮ the canonical form is language-dependent and arbitrary

Examples (French)

◮ ADJ: médicales → médical

◮ NOUN: chiens → chien, but chiennes → chienne
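A lexicon-lookup lemmatizer over these examples can be sketched in a few lines; the three-entry lexicon below is a hypothetical toy, not a real French resource.

```python
# Toy lemmatizer: map inflected word-forms to their canonical form (lemma).
# The lexicon entries are illustrative assumptions, not a real resource.
LEXICON = {
    "médicales": "médical",   # ADJ: feminine plural -> masculine singular
    "chiens": "chien",        # NOUN: masculine plural -> masculine singular
    "chiennes": "chienne",    # NOUN: feminine plural -> feminine singular
}

def lemmatize(form: str) -> str:
    # Unknown forms are returned unchanged (a fallback choice, also arbitrary).
    return LEXICON.get(form, form)
```

The choice of canonical form (masculine singular for adjectives, singular for nouns) is exactly the language-dependent convention mentioned above.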



Examples of tagged texts 1/2

Vous/PRV:pl faites/VCJ:pl preuve/SBC:sg de/PREP mesure/SBC:sg dans/PREP vos/DTN:pl propos/SBC:pl ,/, et/COO votre/DTN:sg discours/SBC:sg est/ECJ:sg toujours/ADV empreint/ADJ1PAR:sg de/PREP réserve/SBC:sg ./.
Vous/PRV:pl n’/ADV êtes/ECJ:pl certainement/ADV pas/ADV indifférent/SBC:sg ,/, mais/COO peu/ADV expansif/SBC:pl ./.
Votre/DTN:sg approche/SBC:sg plutôt/ADV formaliste/SBC:sg peut/VCJ:sg amener/VNCFF vos/DTN:pl interlocuteurs/SBC:pl à/PREP penser/VNCFF que/SUB vous/PRV:pl portez/VCJ:pl une/DTN:sg grande/ADJ:sg attention/SBC:sg aux/DTC:pl conventions/SBC:pl ou/COO aux/DTC:pl usages/SBC:pl ./.
Votre/DTN:sg comportement/SBC:sg peut/VCJ:sg ,/, par/PREP contre/PREP ,/, paraître/VNCFF assez/ADV fermé/ADJ2PAR:sg à/PREP ceux/PRO:pl qui/REL ont/ACJ:pl coutume/ADJ:sg de/PREP réagir/VNCFF spontanément/ADV ./.
Votre/DTN:sg approche/SBC:sg sérieuse/ADJ:sg peut/VCJ:sg amener/VNCFF vos/DTN:pl interlocuteurs/SBC:pl à/PREP penser/VNCFF que/SUB vous/PRV:pl considérez/VCJ:pl le/DTN:sg temps/SBC:sg comme/SUB un/DTN:sg...

Examples of tagged texts 2/2

===== DÉBUT DE PHRASE =====
1   3 6  Bien sûr   bien sûr   ADV         0x0000  Rgp           H 1  oblige
2   3 6  ,          ,          PCTFAIB             Ypw           H 1  oblige
3   3 6  rien       rien       A2 PII      0xE080  Pi-.sn  3—3 S 1   oblige
4   3 6  n’         ne         A2 ADV      0x0200  Rpn     5 V 1     oblige
5   3 6  oblige     obliger    A5 VINDP3S          Vmip3s  5 V 1     oblige
6   3 6  un         un         A3 DETIMS   0xA000  Da-ms-i 7—7 D 1   oblige
7   3 6  site Web   site web   NCMS        0xA040  Ncms    7—7 D 1   oblige
8   3 6  à          à          PREP        0x0000  Sp      9 F 1     oblige
9   3 6  choisir    choisir    VINF                Vmn--   9 F 1     oblige
10  3 6  un         un         A3 DETIMS   0xA000  Da-ms-i 11 D 1    oblige
11  3 6  nom        nom        NCMS        0xA040  Ncms    11 D 1    oblige
12  3 6  en         en         A3 PREP     0x0000  Sp      13 H 1    oblige
13  3 6  www        www        NCI         0xF020  Nc..    13 H 1    oblige
14  3 6  :          :          PCTFORTE            Yps
===== FIN DE PHRASE =====


Problems

Ambiguities

◮ most words are polyfunctional

◮ Ex. Fr.: règle: common noun, verb (indicative 1st person, 3rd person, subjunctive...)

◮ depends on the tagset

Contextual disambiguation

◮ use context to choose the most reliable part-of-speech

◮ je règle la longueur avec la règle

◮ hard task, not always possible

◮ la belle ferme le voile

◮ la petite brise la glace

Problems

Unknown word-form

◮ named entities

◮ person names, places, companies...

◮ imports

◮ words, phrases or sentences from another language: leasing...

◮ specialized terms

◮ from specialized domains: parenthésage, kinesimétrie...

◮ language register

◮ je la kiffe à donf, un fruit sur

Tagging in a nutshell Generalities

Formalization of the task

Sequence to sequence

◮ for each word, given its context, find the correct tag

◮ correct means there exists a ground truth (given by a human expert), but even humans may disagree in some cases

Two families of approaches

◮ symbolic: Brill’s tagger

◮ stochastic: Multext tagger (HMM)


Evaluation

Comparison with ground-truth

◮ human annotation

◮ costly: only on a big-enough extract of the corpus

Standard measures

◮ precision (sometimes recall)

◮ possibly evaluation category by category



Evaluation

Exercise

◮ from example 1, compute the precision and recall of the tagger on common nouns (SBC)

◮ how could you easily obtain a 100% recall?

◮ what would the precision be then?

◮ how could you easily obtain a 100% precision?

◮ what would the recall be then?

Evaluation

Exercise

◮ from example 1, compute the precision and recall of the tagger on common nouns (SBC)

◮ R = 14/15, P = 14/17

◮ how could you easily obtain a 100% recall?

◮ what would the precision be then?

◮ how could you easily obtain a 100% precision?

◮ what would the recall be then?
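The arithmetic of the answer can be checked in a few lines; the TP/FP/FN counts below are read off the slide's solution (14 correct SBC tags, 17 forms tagged SBC, 15 gold SBC occurrences).

```python
# Precision and recall from confusion counts, as used in the exercise above.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

# 14 correct SBC tags, 17 predicted SBC (so 3 false positives),
# 15 gold SBC (so 1 false negative).
p, r = precision_recall(tp=14, fp=17 - 14, fn=15 - 14)
```

Tagging everything SBC would give 100% recall (no false negatives) at the cost of very low precision, and vice versa, which is the point of the follow-up questions.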

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion

Tagging in a nutshell Symbolic approaches Brill’s tagger

Outline

Generalities Symbolic approaches Brill’s tagger Transducers Stochastic approaches Zoom on TreeTagger Conclusion


Brill’s tagger

Well-known approach

◮ widely used during the 1990s

◮ freely available, developed for many languages

◮ conceptually simple

Error-driven transformation based tagger

◮ error-driven → supervised learning

◮ transformation-based → using induced transformation rules


Overview



Brill’s algorithm

Input

◮ PoS lexicon: for each word-form, list of all the possible tags

Initialization

◮ for known words (i.e. in the lexicon): most frequent tag for this word-form

◮ for unknown words

◮ 1992: proper noun for words with a capital, noun for others

◮ 1994: machine learning of “guessing rules”
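The initialization step can be sketched as below; the lexicon format (word-form mapped to tag frequencies) and the toy entries are assumptions, and the unknown-word fallback is the 1992 capitalization heuristic described above.

```python
# Sketch of Brill's initialization: tag known words with their most frequent
# tag; back off to the capitalization heuristic for unknown words.
from collections import Counter

# word-form -> tag frequency counts (toy data, not a real PoS lexicon)
LEXICON = {
    "la": Counter({"DET": 900, "PRO": 100}),
    "ferme": Counter({"NOUN": 60, "VERB": 30, "ADJ": 10}),
}

def initial_tag(word: str) -> str:
    if word in LEXICON:
        # Most frequent tag for this word-form.
        return LEXICON[word].most_common(1)[0][0]
    # 1992 heuristic: proper noun if capitalized, common noun otherwise.
    return "NPROP" if word[:1].isupper() else "NOUN"
```

This baseline alone already tags most tokens correctly; the transformation rules learned next only repair its errors.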

Brill’s algorithm

Learning transformation rules

◮ for each rule, compute a score = # errors before applying the rule minus # errors after

◮ choose the best rule, add it to the rule base

◮ repeat while rules with score > threshold are proposed
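The greedy learning loop above can be sketched in a few lines; the rule representation and the `apply_rule` callback are assumptions for illustration, not Brill's actual data structures.

```python
# Sketch of the error-driven greedy loop: score each candidate rule by the
# error reduction it brings, keep the best one, reapply, repeat until no
# rule beats the threshold.
def count_errors(tags, gold):
    # Positions where the current tagging disagrees with the gold standard.
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(tags, gold, candidate_rules, apply_rule, threshold=0):
    tags, learned = list(tags), []
    while True:
        before = count_errors(tags, gold)
        scored = [(before - count_errors(apply_rule(r, tags), gold), r)
                  for r in candidate_rules]
        best_score, best_rule = max(scored, key=lambda sr: sr[0])
        if best_score <= threshold:
            return learned
        learned.append(best_rule)
        tags = apply_rule(best_rule, tags)
```

Note that each learned rule is applied before scoring the next one, so later rules are selected relative to the already-corrected tagging.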

Type of transformation rules

◮ lexical: assign a tag to an unknown word (not in the lexicon)

◮ contextual: change the tag of a given word based on its context


Brill’s algorithm

Examples of rules

◮ lexical: if condition then word ← tag

◮ suffix(word) = x or xy or xyz

◮ prefix(word) = x or xy or xyz

◮ word contains character x

◮ suppressing prefix/suffix gives a known word

◮ word is preceded by w’ (fixed for a given rule)

◮ contextual: if condition then tag ← tag

◮ (1st/2nd/3rd) tag before/after word is X

◮ tag bigram before/after word is YZ

◮ preceding or next word before/after is W’

◮ word is W and preceding or next word is W’

◮ word is W and preceding or next tag is Z
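As a toy illustration of the lexical rules above (suffix conditions for unknown words), the rule list and the French suffixes below are hypothetical examples, not rules actually learned by Brill's tagger.

```python
# Sketch of lexical (unknown-word) rules of the form
# "if suffix(word) = xyz then word <- tag"; rules are tried in order.
LEXICAL_RULES = [
    ("ement", "ADV"),   # Fr. adverbs in -ement: certainement, spontanément
    ("er", "VINF"),     # Fr. infinitives in -er: penser, amener
]

def guess_tag(word: str, default: str = "NOUN") -> str:
    for suffix, tag in LEXICAL_RULES:
        if word.endswith(suffix):
            return tag
    # No rule fires: fall back to the default initialization tag.
    return default
```

Rule order matters: "-ement" must be tested before "-er"-style shorter suffixes would shadow it.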

Outline

Generalities Symbolic approaches Brill’s tagger Transducers Stochastic approaches Zoom on TreeTagger Conclusion

Tagging in a nutshell Symbolic approaches Transducers

Inferring transducers

To be done

not in the course for this year

Tagging in a nutshell Stochastic approaches

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion

slide-5
SLIDE 5


Stochastic framework

◮ sequence of words: W_1^n = w_1 w_2 ... w_n

◮ tag each word of W_1^n

◮ looking for the tag sequence C_1^n = c_1 c_2 ... c_n such that P(c_1, ..., c_n | w_1, ..., w_n) is maximal

◮ that is, looking for C_1^n = argmax P(C_1^n | W_1^n)


HMM framework

Rewrite the formula

◮ Bayes: P(C_1^n | W_1^n) = P(W_1^n | C_1^n) · P(C_1^n) / P(W_1^n)

◮ note that P(W_1^n) does not change the argmax

◮ chain rule:

◮ P(C_1^n) = P(c_1) · P(c_2 | c_1) · ... · P(c_n | c_1 ... c_{n−1})

◮ P(W_1^n | C_1^n) = P(w_1 | C_1^n) · P(w_2 | w_1, C_1^n) · ... · P(w_n | w_1 ... w_{n−1}, C_1^n)

Simplifying hypotheses

◮ limited lexical conditioning: P(w_i | ..., c_i, ...) = P(w_i | c_i)

◮ limited dependency span: P(c_i | c_1 ... c_{i−1}) = P(c_i | c_{i−k}, ..., c_{i−1})


HMM framework

Final formula

◮ P(W_1^n | C_1^n) · P(C_1^n) = P(W_1^k | C_1^k) · P(C_1^k) · ∏_{i=k+1}^{n} P(w_i | c_i) · P(c_i | C_{i−k}^{i−1})

Eureka!

It’s a k-th order Markov Model

◮ word-forms = observations

◮ PoS tags = hidden states


HMM framework

Advantages

◮ well-known framework

◮ efficient (and available) algorithms

Two major algorithms

◮ Viterbi: linear O(n)

◮ Baum-Welch: iterative, estimating on observations (∼ non-supervised)

◮ estimated parameters: P(W_1^k | C_1^k), P(C_1^k), P(w_i | c_i), P(c_i | C_{i−k}^{i−1})
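As a concrete illustration, Viterbi decoding for a first-order (k = 1) HMM tagger can be sketched as below; the tagset and probability tables in the usage are toy assumptions, and probabilities are multiplied directly (a real implementation would work in log space).

```python
# Viterbi decoding for a first-order HMM tagger: find the tag sequence
# maximizing P(C) * P(W|C) under the two simplifying hypotheses above.
def viterbi(words, tags, start_p, trans_p, emit_p):
    # delta[t] = probability of the best path ending in tag t
    delta = {t: start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0) for t in tags}
    backptr = []
    for w in words[1:]:
        prev = delta
        delta, pointers = {}, {}
        for t in tags:
            # Best previous tag for reaching t at this position.
            best = max(tags, key=lambda s: prev[s] * trans_p[s].get(t, 0.0))
            delta[t] = prev[best] * trans_p[best].get(t, 0.0) * emit_p[t].get(w, 0.0)
            pointers[t] = best
        backptr.append(pointers)
    # Read the best path backwards through the stored pointers.
    last = max(tags, key=lambda t: delta[t])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

With a toy model where "la" is almost surely a determiner and DET → NOUN is much more likely than DET → VERB, the decoder picks the noun reading of an ambiguous form like "ferme".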


HMM framework

Supervised approach

◮ direct counting from hand-tagged texts

◮ problems with missing data / size of the training corpus
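Direct counting can be sketched as follows; the corpus format (a list of sentences, each a list of (word, tag) pairs) is an assumption, and the relative-frequency estimates shown here are unsmoothed, which is exactly where the missing-data problem bites.

```python
# Supervised estimation by counting: P(c_i | c_{i-1}) and P(w_i | c_i)
# as relative frequencies (MLE) over a hand-tagged corpus.
from collections import Counter, defaultdict

def estimate(tagged_sentences):
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        for (w, c) in sent:
            emit[c][w] += 1
        for (_, c1), (_, c2) in zip(sent, sent[1:]):
            trans[c1][c2] += 1
    # Normalize counts into conditional probability tables.
    trans_p = {c: {c2: n / sum(ctr.values()) for c2, n in ctr.items()}
               for c, ctr in trans.items()}
    emit_p = {c: {w: n / sum(ctr.values()) for w, n in ctr.items()}
              for c, ctr in emit.items()}
    return trans_p, emit_p
```

Any (tag, word) or (tag, tag) pair unseen in training gets probability zero here, so in practice smoothing or the hybrid approach below is needed.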

Unsupervised approach

◮ non-tagged corpus

◮ very dependent on initial conditions

Hybrid approach

◮ initialize with a small hand-tagged corpus and complete with a large non-tagged one


Other possible techniques

Close models

◮ HMM: historical

◮ MaxEnt: Maximum Entropy

◮ CRF: Conditional Random Fields, very popular

Not developed in this course

◮ see the course about information extraction

◮ additional slides on the website

◮ MRI: see G. Gravier’s course


Tagging in a nutshell Zoom on TreeTagger

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


TreeTagger

Readings

◮ cf. the provided article

Principle

◮ combines symbolic and stochastic approaches

◮ estimate the probability that a form (without context) has a certain tag (e.g. souris / NOM: 0.51%)

◮ estimate the probability that a word within its context has a given tag, using binary decision trees

Learning

◮ the trees are recursively built from a set of known tri-grams (i.e. from a training set)
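A TreeTagger-style tree can be sketched as nested binary tests on the preceding tags, with leaves holding a tag distribution P(tag | context); the node encoding and the toy tree below are illustrative assumptions, not TreeTagger's actual data structures.

```python
# Sketch of a binary decision tree over the tag context: internal nodes
# test a previous tag, leaves store P(tag | context).
def predict(node, prev_tags):
    # node is ("test", (offset, tag), yes_subtree, no_subtree) or ("leaf", dist)
    while node[0] == "test":
        offset, tag = node[1]
        node = node[2] if prev_tags[offset] == tag else node[3]
    return node[1]

# Toy tree: "is the previous tag DET?" (offset -1 = immediately preceding)
TREE = ("test", (-1, "DET"),
        ("leaf", {"NOUN": 0.7, "ADJ": 0.3}),    # yes: previous tag is DET
        ("leaf", {"VERB": 0.6, "NOUN": 0.4}))   # no: previous tag is not DET
```

Each root-to-leaf path is a conjunction of context tests, which is what the recursive construction from known tri-grams produces.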


Decision Trees in TreeTagger

Tagging in a nutshell Conclusion

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


Performances

Effective?

◮ a plateau has been reached for several years: 95-98% precision

◮ baseline techniques: 90%

◮ inter-annotator agreement is not perfect: hard to outperform 98%

Efficient?

◮ fast tools are needed to process huge corpora

◮ the learning step is usually done once per language/domain


Existing tools

French

◮ Brill http://research.microsoft.com/%7Ebrill/

◮ Cordial http://www.synapse-fr.com

◮ Multext http://aune.lpl.univ-aix.fr/projects/multext/

◮ TreeTagger http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

English

◮ Brill

◮ Multext

◮ TreeTagger

◮ Gate, LinguaStream...



Tagging problems

Generic approaches

◮ the algorithms are not specific to the PoS tagging problem