
Tagging in a nutshell


Vincent Claveau

IRISA - CNRS Rennes, France

Master 2 RI/BIG/MTIBH : DSS/ADT


Forewords

Sources

◮ Slides inspired by M. Rajman and J.-C. Chappelier, EPFL

Vocabulary

◮ tagging, French: étiquetage

◮ tag, Fr. étiquette

◮ Part-Of-Speech (partie-du-discours), morpho-syntactic categories, grammatical categories


The big picture

What do we want to do?

◮ assign a sequence of symbols (tags) to a sequence of symbols (the sentence)

◮ usually: one output symbol for each input symbol

◮ in this course, only one or two well-known approaches are covered

In the framework of the track P4

◮ cf. HMM and other stochastic approaches

Tagging in a nutshell Generalities

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


About the tagging task

Goals

◮ associate morpho-syntactic information to each word-form

◮ finite number of tags: tagset

◮ collateral advantage: lemmatization

Interests

◮ generalization: suppress morphological and lexical variability

◮ reducing the vocabulary size


About lemmatization

Reducing word-forms to their lemmas

◮ lemma = canonical form of word-forms that only differ by inflection

◮ the canonical form is language-dependent and arbitrary

Examples (French)

◮ ADJ: médicales → médical

◮ NOUN: chiens → chien, but chiennes → chienne
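A lexicon-lookup lemmatizer over these examples can be sketched in a few lines; the three-entry lexicon below is a hypothetical toy, not a real French resource.

```python
# Toy lemmatizer: map inflected word-forms to their canonical form (lemma).
# The lexicon entries are illustrative assumptions, not a real resource.
LEXICON = {
    "médicales": "médical",   # ADJ: feminine plural -> masculine singular
    "chiens": "chien",        # NOUN: masculine plural -> masculine singular
    "chiennes": "chienne",    # NOUN: feminine plural -> feminine singular
}

def lemmatize(form: str) -> str:
    # Unknown forms are returned unchanged (a fallback choice, also arbitrary).
    return LEXICON.get(form, form)
```

The choice of canonical form (masculine singular for adjectives, singular for nouns) is exactly the language-dependent convention mentioned above.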



Examples of tagged texts 1/2

Vous/PRV:pl faites/VCJ:pl preuve/SBC:sg de/PREP mesure/SBC:sg dans/PREP vos/DTN:pl propos/SBC:pl ,/, et/COO votre/DTN:sg discours/SBC:sg est/ECJ:sg toujours/ADV empreint/ADJ1PAR:sg de/PREP réserve/SBC:sg ./.
Vous/PRV:pl n’/ADV êtes/ECJ:pl certainement/ADV pas/ADV indifférent/SBC:sg ,/, mais/COO peu/ADV expansif/SBC:pl ./.
Votre/DTN:sg approche/SBC:sg plutôt/ADV formaliste/SBC:sg peut/VCJ:sg amener/VNCFF vos/DTN:pl interlocuteurs/SBC:pl à/PREP penser/VNCFF que/SUB vous/PRV:pl portez/VCJ:pl une/DTN:sg grande/ADJ:sg attention/SBC:sg aux/DTC:pl conventions/SBC:pl ou/COO aux/DTC:pl usages/SBC:pl ./.
Votre/DTN:sg comportement/SBC:sg peut/VCJ:sg ,/, par/PREP contre/PREP ,/, paraître/VNCFF assez/ADV fermé/ADJ2PAR:sg à/PREP ceux/PRO:pl qui/REL ont/ACJ:pl coutume/ADJ:sg de/PREP réagir/VNCFF spontanément/ADV ./.
Votre/DTN:sg approche/SBC:sg sérieuse/ADJ:sg peut/VCJ:sg amener/VNCFF vos/DTN:pl interlocuteurs/SBC:pl à/PREP penser/VNCFF que/SUB vous/PRV:pl considérez/VCJ:pl le/DTN:sg temps/SBC:sg comme/SUB un/DTN:sg...

Examples of tagged texts 2/2

===== DÉBUT DE PHRASE =====
1   3 6  Bien sûr   bien sûr   ADV         0x0000  Rgp           H 1  oblige
2   3 6  ,          ,          PCTFAIB             Ypw           H 1  oblige
3   3 6  rien       rien       A2 PII      0xE080  Pi-.sn  3—3 S 1   oblige
4   3 6  n’         ne         A2 ADV      0x0200  Rpn     5 V 1     oblige
5   3 6  oblige     obliger    A5 VINDP3S          Vmip3s  5 V 1     oblige
6   3 6  un         un         A3 DETIMS   0xA000  Da-ms-i 7—7 D 1   oblige
7   3 6  site Web   site web   NCMS        0xA040  Ncms    7—7 D 1   oblige
8   3 6  à          à          PREP        0x0000  Sp      9 F 1     oblige
9   3 6  choisir    choisir    VINF                Vmn--   9 F 1     oblige
10  3 6  un         un         A3 DETIMS   0xA000  Da-ms-i 11 D 1    oblige
11  3 6  nom        nom        NCMS        0xA040  Ncms    11 D 1    oblige
12  3 6  en         en         A3 PREP     0x0000  Sp      13 H 1    oblige
13  3 6  www        www        NCI         0xF020  Nc..    13 H 1    oblige
14  3 6  :          :          PCTFORTE            Yps
===== FIN DE PHRASE =====


Problems

Ambiguities

◮ most words are polyfunctional

◮ Ex. Fr.: règle: common noun, verb (indicative 1st person, 3rd person, subjunctive...)

◮ depends on the tagset

Contextual disambiguation

◮ use context to choose the most reliable part-of-speech

◮ je règle la longueur avec la règle

◮ hard task, not always possible

◮ la belle ferme le voile

◮ la petite brise la glace

Problems

Unknown word-form

◮ named entities

◮ person names, places, companies...

◮ imports

◮ words, phrases or sentences from another language: leasing...

◮ specialized terms

◮ from specialized domains: parenthésage, kinesimétrie...

◮ language register

◮ je la kiffe à donf, un fruit sur

Tagging in a nutshell Generalities

Formalization of the task

Sequence to sequence

◮ for each word, given its context, find the correct tag

◮ correct means there exists a ground truth (given by a human expert), but even humans may disagree in some cases

Two families of approaches

◮ symbolic: Brill’s tagger

◮ stochastic: Multext tagger (HMM)


Evaluation

Comparison with ground-truth

◮ human annotation

◮ costly: only on a big-enough extract of the corpus

Standard measures

◮ precision (sometimes recall)

◮ possibly evaluation category by category



Evaluation

Exercise

◮ from example 1, compute the precision and recall of the tagger on common nouns (SBC)

◮ how could you easily obtain a 100% recall?

◮ what would the precision be then?

◮ how could you easily obtain a 100% precision?

◮ what would the recall be then?

Evaluation

Exercise

◮ from example 1, compute the precision and recall of the tagger on common nouns (SBC)

◮ R = 14/15, P = 14/17

◮ how could you easily obtain a 100% recall?

◮ what would the precision be then?

◮ how could you easily obtain a 100% precision?

◮ what would the recall be then?
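The arithmetic of the answer can be checked in a few lines; the TP/FP/FN counts below are read off the slide's solution (14 correct SBC tags, 17 forms tagged SBC, 15 gold SBC occurrences).

```python
# Precision and recall from confusion counts, as used in the exercise above.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

# 14 correct SBC tags, 17 predicted SBC (so 3 false positives),
# 15 gold SBC (so 1 false negative).
p, r = precision_recall(tp=14, fp=17 - 14, fn=15 - 14)
```

Tagging everything SBC would give 100% recall (no false negatives) at the cost of very low precision, and vice versa, which is the point of the follow-up questions.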

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion

Tagging in a nutshell Symbolic approaches Brill’s tagger

Outline

Generalities Symbolic approaches Brill’s tagger Transducers Stochastic approaches Zoom on TreeTagger Conclusion


Brill’s tagger

Well-known approach

◮ widely used during the 1990s

◮ freely available, developed for many languages

◮ conceptually simple

Error-driven transformation based tagger

◮ error-driven → supervised learning

◮ transformation-based → using induced transformation rules


Overview



Brill’s algorithm

Input

◮ PoS lexicon: for each word-form, list of all the possible tags

Initialization

◮ for known words (i.e. in the lexicon): most frequent tag for this word-form

◮ for unknown words

◮ 1992: proper noun for words with a capital, noun for others

◮ 1994: machine learning of “guessing rules”
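The initialization step can be sketched as below; the lexicon format (word-form mapped to tag frequencies) and the toy entries are assumptions, and the unknown-word fallback is the 1992 capitalization heuristic described above.

```python
# Sketch of Brill's initialization: tag known words with their most frequent
# tag; back off to the capitalization heuristic for unknown words.
from collections import Counter

# word-form -> tag frequency counts (toy data, not a real PoS lexicon)
LEXICON = {
    "la": Counter({"DET": 900, "PRO": 100}),
    "ferme": Counter({"NOUN": 60, "VERB": 30, "ADJ": 10}),
}

def initial_tag(word: str) -> str:
    if word in LEXICON:
        # Most frequent tag for this word-form.
        return LEXICON[word].most_common(1)[0][0]
    # 1992 heuristic: proper noun if capitalized, common noun otherwise.
    return "NPROP" if word[:1].isupper() else "NOUN"
```

This baseline alone already tags most tokens correctly; the transformation rules learned next only repair its errors.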

Brill’s algorithm

Learning transformation rules

◮ for each rule, compute a score = # errors before applying the rule minus # errors after

◮ choose the best rule, add it to the rule base

◮ repeat while rules with score > threshold are proposed
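The greedy learning loop above can be sketched in a few lines; the rule representation and the `apply_rule` callback are assumptions for illustration, not Brill's actual data structures.

```python
# Sketch of the error-driven greedy loop: score each candidate rule by the
# error reduction it brings, keep the best one, reapply, repeat until no
# rule beats the threshold.
def count_errors(tags, gold):
    # Positions where the current tagging disagrees with the gold standard.
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(tags, gold, candidate_rules, apply_rule, threshold=0):
    tags, learned = list(tags), []
    while True:
        before = count_errors(tags, gold)
        scored = [(before - count_errors(apply_rule(r, tags), gold), r)
                  for r in candidate_rules]
        best_score, best_rule = max(scored, key=lambda sr: sr[0])
        if best_score <= threshold:
            return learned
        learned.append(best_rule)
        tags = apply_rule(best_rule, tags)
```

Note that each learned rule is applied before scoring the next one, so later rules are selected relative to the already-corrected tagging.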

Type of transformation rules

◮ lexical: assign a tag to an unknown word (not in the lexicon)

◮ contextual: change the tag of a given word based on its context


Brill’s algorithm

Examples of rules

◮ lexical: if condition then word ← tag

◮ suffix(word) = x or xy or xyz

◮ prefix(word) = x or xy or xyz

◮ word contains character x

◮ suppressing prefix/suffix gives a known word

◮ word is preceded by w’ (fixed for a given rule)

◮ contextual: if condition then tag ← tag

◮ (1st/2nd/3rd) tag before/after word is X

◮ tag bigram before/after word is YZ

◮ preceding or next word before/after is W’

◮ word is W and preceding or next word is W’

◮ word is W and preceding or next tag is Z
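As a toy illustration of the lexical rules above (suffix conditions for unknown words), the rule list and the French suffixes below are hypothetical examples, not rules actually learned by Brill's tagger.

```python
# Sketch of lexical (unknown-word) rules of the form
# "if suffix(word) = xyz then word <- tag"; rules are tried in order.
LEXICAL_RULES = [
    ("ement", "ADV"),   # Fr. adverbs in -ement: certainement, spontanément
    ("er", "VINF"),     # Fr. infinitives in -er: penser, amener
]

def guess_tag(word: str, default: str = "NOUN") -> str:
    for suffix, tag in LEXICAL_RULES:
        if word.endswith(suffix):
            return tag
    # No rule fires: fall back to the default initialization tag.
    return default
```

Rule order matters: "-ement" must be tested before "-er"-style shorter suffixes would shadow it.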

Outline

Generalities Symbolic approaches Brill’s tagger Transducers Stochastic approaches Zoom on TreeTagger Conclusion

Tagging in a nutshell Symbolic approaches Transducers

Inferring transducers

To be done

not in the course for this year

Tagging in a nutshell Stochastic approaches

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion

slide-5
SLIDE 5


Stochastic framework

◮ sequence of words: W_1^n = w_1 w_2 ... w_n

◮ tag each word of W_1^n

◮ looking for the tag sequence C_1^n = c_1 c_2 ... c_n such that P(c_1, ..., c_n | w_1, ..., w_n) is maximal

◮ that is, looking for C_1^n = argmax P(C_1^n | W_1^n)


HMM framework

Rewrite the formula

◮ Bayes: P(C_1^n | W_1^n) = P(W_1^n | C_1^n) · P(C_1^n) / P(W_1^n)

◮ note that P(W_1^n) does not change the argmax

◮ chain rule:

◮ P(C_1^n) = P(c_1) · P(c_2 | c_1) · ... · P(c_n | c_1 ... c_{n−1})

◮ P(W_1^n | C_1^n) = P(w_1 | C_1^n) · P(w_2 | w_1, C_1^n) · ... · P(w_n | w_1 ... w_{n−1}, C_1^n)

Simplifying hypotheses

◮ limited lexical conditioning: P(w_i | ..., c_i, ...) = P(w_i | c_i)

◮ limited dependency span: P(c_i | c_1 ... c_{i−1}) = P(c_i | c_{i−k}, ..., c_{i−1})


HMM framework

Final formula

◮ P(W_1^n | C_1^n) · P(C_1^n) = P(W_1^k | C_1^k) · P(C_1^k) · ∏_{i=k+1}^{n} P(w_i | c_i) · P(c_i | C_{i−k}^{i−1})

Eureka!

It’s a k-th order Markov Model

◮ word-forms = observations

◮ PoS tags = hidden states


HMM framework

Advantages

◮ well-known framework

◮ efficient (and available) algorithms

Two major algorithms

◮ Viterbi: linear O(n)

◮ Baum-Welch: iterative, estimating on observations (∼ non-supervised)

◮ estimated parameters: P(W_1^k | C_1^k), P(C_1^k), P(w_i | c_i), P(c_i | C_{i−k}^{i−1})
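As a concrete illustration, Viterbi decoding for a first-order (k = 1) HMM tagger can be sketched as below; the tagset and probability tables in the usage are toy assumptions, and probabilities are multiplied directly (a real implementation would work in log space).

```python
# Viterbi decoding for a first-order HMM tagger: find the tag sequence
# maximizing P(C) * P(W|C) under the two simplifying hypotheses above.
def viterbi(words, tags, start_p, trans_p, emit_p):
    # delta[t] = probability of the best path ending in tag t
    delta = {t: start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0) for t in tags}
    backptr = []
    for w in words[1:]:
        prev = delta
        delta, pointers = {}, {}
        for t in tags:
            # Best previous tag for reaching t at this position.
            best = max(tags, key=lambda s: prev[s] * trans_p[s].get(t, 0.0))
            delta[t] = prev[best] * trans_p[best].get(t, 0.0) * emit_p[t].get(w, 0.0)
            pointers[t] = best
        backptr.append(pointers)
    # Read the best path backwards through the stored pointers.
    last = max(tags, key=lambda t: delta[t])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

With a toy model where "la" is almost surely a determiner and DET → NOUN is much more likely than DET → VERB, the decoder picks the noun reading of an ambiguous form like "ferme".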


HMM framework

Supervised approach

◮ direct counting from hand-tagged texts

◮ problems with missing data / size of the training corpus
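Direct counting can be sketched as follows; the corpus format (a list of sentences, each a list of (word, tag) pairs) is an assumption, and the relative-frequency estimates shown here are unsmoothed, which is exactly where the missing-data problem bites.

```python
# Supervised estimation by counting: P(c_i | c_{i-1}) and P(w_i | c_i)
# as relative frequencies (MLE) over a hand-tagged corpus.
from collections import Counter, defaultdict

def estimate(tagged_sentences):
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        for (w, c) in sent:
            emit[c][w] += 1
        for (_, c1), (_, c2) in zip(sent, sent[1:]):
            trans[c1][c2] += 1
    # Normalize counts into conditional probability tables.
    trans_p = {c: {c2: n / sum(ctr.values()) for c2, n in ctr.items()}
               for c, ctr in trans.items()}
    emit_p = {c: {w: n / sum(ctr.values()) for w, n in ctr.items()}
              for c, ctr in emit.items()}
    return trans_p, emit_p
```

Any (tag, word) or (tag, tag) pair unseen in training gets probability zero here, so in practice smoothing or the hybrid approach below is needed.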

Unsupervised approach

◮ non-tagged corpus

◮ very dependent on initial conditions

Hybrid approach

◮ initialize with a small hand-tagged corpus and complete with a large non-tagged one


Other possible techniques

Close models

◮ HMM: historical

◮ MaxEnt: Maximum Entropy

◮ CRF: Conditional Random Fields, very popular

Not developed in this course

◮ see the course about information extraction

◮ additional slides on the website

◮ MRI: see G. Gravier’s course


Tagging in a nutshell Zoom on TreeTagger

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


TreeTagger

Readings

◮ cf. the provided article

Principle

◮ combines symbolic and stochastic approaches

◮ estimate the probability that a form (without context) has a certain tag (e.g. souris / NOM: 0.51%)

◮ estimate the probability that a word within its context has a given tag, using binary decision trees

Learning

◮ the trees are recursively built from a set of known tri-grams (i.e. from a training set)
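A TreeTagger-style tree can be sketched as nested binary tests on the preceding tags, with leaves holding a tag distribution P(tag | context); the node encoding and the toy tree below are illustrative assumptions, not TreeTagger's actual data structures.

```python
# Sketch of a binary decision tree over the tag context: internal nodes
# test a previous tag, leaves store P(tag | context).
def predict(node, prev_tags):
    # node is ("test", (offset, tag), yes_subtree, no_subtree) or ("leaf", dist)
    while node[0] == "test":
        offset, tag = node[1]
        node = node[2] if prev_tags[offset] == tag else node[3]
    return node[1]

# Toy tree: "is the previous tag DET?" (offset -1 = immediately preceding)
TREE = ("test", (-1, "DET"),
        ("leaf", {"NOUN": 0.7, "ADJ": 0.3}),    # yes: previous tag is DET
        ("leaf", {"VERB": 0.6, "NOUN": 0.4}))   # no: previous tag is not DET
```

Each root-to-leaf path is a conjunction of context tests, which is what the recursive construction from known tri-grams produces.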


Decision Trees in TreeTagger

Tagging in a nutshell Conclusion

Outline

Generalities Symbolic approaches Stochastic approaches Zoom on TreeTagger Conclusion


Performances

Effective?

◮ a plateau has been reached for several years: 95-98% precision

◮ baseline techniques: 90%

◮ inter-annotator agreement is not perfect: hard to outperform 98%

Efficient?

◮ fast tools are needed to process huge corpora

◮ the learning step is usually done once per language/domain


Existing tools

French

◮ Brill http://research.microsoft.com/%7Ebrill/

◮ Cordial http://www.synapse-fr.com

◮ Multext http://aune.lpl.univ-aix.fr/projects/multext/

◮ TreeTagger http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

English

◮ Brill

◮ Multext

◮ TreeTagger

◮ Gate, LinguaStream...



Tagging problems

Generic approaches

◮ the algorithms are not specific to the PoS tagging problem