POS Tagging with HMMs (L645 / B659, Dept. of Linguistics, Indiana University, Fall 2015)



SLIDE 1

POS Tagging

L645 / B659

Dept. of Linguistics, Indiana University

Fall 2015


SLIDE 2

Def. Part of Speech Tagging

POS Tagging = Assigning word class information to words

Example:

    the          man    bought  a           book
    determiner   noun   verb    determiner  noun
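For concreteness, here is a minimal sketch of automatic tagging with NLTK (the toolkit is my choice for illustration; the slides do not prescribe one). NLTK's default tagger uses Penn Treebank tags, so its output labels differ from the plain word-class names above.

```python
# Minimal sketch: POS tagging the example sentence with NLTK's default tagger.
# Assumes NLTK is installed and its tokenizer/tagger models have been downloaded,
# e.g. via nltk.download('punkt') and nltk.download('averaged_perceptron_tagger').
import nltk

tokens = nltk.word_tokenize("the man bought a book")
print(nltk.pos_tag(tokens))
# Expected: [('the', 'DT'), ('man', 'NN'), ('bought', 'VBD'), ('a', 'DT'), ('book', 'NN')]
```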


SLIDE 3

Linguistic Questions

◮ How do we divide the text into individual word tokens?
◮ How do we choose a tagset to represent all words?
◮ How do we select appropriate tags for individual words?


SLIDE 4

Tagsets

Size of tagsets

◮ English:

    TOSCA                 32
    Penn Treebank         36
    BNC C5                61
    Brown                 77
    LOB                  132
    London-Lund Corpus   197
    TOSCA-ICE            270

◮ Romanian: 614
◮ Hungarian: ca. 2,100


SLIDE 5

Penn Treebank Tagset

    CC    Coord. conjunction           RB    Adverb
    CD    Cardinal number              RBR   Adverb, comparative
    DT    Determiner                   RBS   Adverb, superlative
    EX    Existential there            RP    Particle
    FW    Foreign word                 SYM   Symbol
    IN    Prep. / subord. conj.        TO    to
    JJ    Adjective                    UH    Interjection
    JJR   Adjective, comparative       VB    Verb, base form
    JJS   Adjective, superlative       VBD   Verb, past tense
    LS    List item marker             VBG   Verb, gerund / present part.
    MD    Modal                        VBN   Verb, past part.
    NN    Noun, singular or mass       VBP   Verb, non-3rd p., sing. pres.
    NNS   Noun, plural                 VBZ   Verb, 3rd p. sing. pres.
    NP    Proper noun, singular        WDT   Wh-determiner
    NPS   Proper noun, plural          WP    Wh-pronoun
    PDT   Predeterminer                WP$   Possessive wh-pronoun
    POS   Possessive ending            WRB   Wh-adverb
    PRP   Personal pronoun             ,     Comma
    PRP$  Possessive pronoun           .     Sentence-final punctuation


SLIDE 6

Annotating POS Tags

Two fundamentally different approaches (a minimal sketch of both follows below):

◮ Start from scratch: find characteristics of words or their context (i.e., rules) that indicate the word class
  ◮ e.g., if a word ends in "ion", tag it as a noun
◮ Accumulate a lexicon and disambiguate words that have more than one tag
  ◮ e.g., possible categories for "about": preposition, adverb, particle
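A minimal sketch of both ideas (the rule and the lexicon entries here are made up for illustration, not taken from the slides):

```python
# Approach 1: rule-based, derive a tag from characteristics of the word itself.
def rule_based_tag(word):
    if word.endswith("ion"):      # e.g. "decision" -> noun
        return "NN"
    return "UNKNOWN"

# Approach 2: lexicon-based, look up the ambiguity class and disambiguate later.
LEXICON = {"about": ["IN", "RB", "RP"]}   # preposition, adverb, particle

def ambiguity_class(word):
    return LEXICON.get(word, ["UNKNOWN"])

print(rule_based_tag("decision"))   # NN
print(ambiguity_class("about"))     # ['IN', 'RB', 'RP']
```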


SLIDE 7

Automatic POS Tagging

Assumption: local context is sufficient

Examples:

◮ for the man: noun or verb?
◮ we will man: noun or verb?
◮ I can put: verb base form or past?
◮ re-cap real quick: adjective or adverb?


SLIDE 8

Bigram Tagging

◮ Basic assumption: a word's POS tag depends only on the word itself and on the POS tag of the previous word
◮ Use a lexicon to retrieve the ambiguity class for each word
  ◮ e.g., word: beginning, ambiguity class: [JJ, NN, VBG]
  ◮ For unknown words: use heuristics, e.g. all open-class POS tags
◮ Disambiguation: look for the most likely path through the possibilities (see the sketch below)
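A sketch of how the search space could be set up (the toy lexicon and open-class list are assumptions for illustration): each word contributes its ambiguity class, and unknown words fall back to all open-class tags.

```python
# Sketch: building the tag possibilities that bigram tagging must disambiguate among.
TOY_LEXICON = {
    "the": ["DT"],
    "beginning": ["JJ", "NN", "VBG"],
    "flies": ["NNS", "VBZ"],
}
OPEN_CLASS = ["NN", "NNS", "VB", "VBD", "JJ", "RB"]   # heuristic fallback for unknown words

def possibilities(tokens):
    return [TOY_LEXICON.get(tok, OPEN_CLASS) for tok in tokens]

print(possibilities(["the", "beginning", "frobnicated"]))
# [['DT'], ['JJ', 'NN', 'VBG'], ['NN', 'NNS', 'VB', 'VBD', 'JJ', 'RB']]
```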


SLIDE 9

Bigram Tagging – Example

Candidate tags per word (S = start state, E = end state):

           time   flies   like   an    arrow
    S  ->  NN     VBZ     IN     DT    NN     ->  E
           VB     NNS     VB
           JJ             RB


SLIDE 10

Bigram Tagging – Probabilities

P(t1 … t5) = P(t1 | S) · P(w1 | t1) · P(t2 | t1) · P(w2 | t2) · … · P(t5 | t4) · P(w5 | t5) · P(E | t5)

(Note: this is actually P(t1 … t5 | w1 … w5).)

Transition probabilities: the P(ti | ti−1) factors (shown in green on the original slide)
Lexical probabilities: the P(wi | ti) factors (shown in blue on the original slide)


SLIDE 11

Bigram Tagging – Probability Table

Probabilities (in %, i.e., on a 0 to 100 scale):

    Lexical                        Transition                   Transition
    P(time | NN)   =  7.0727       P(NN | S)   =  0.6823        P(IN | NNS) = 21.8302
    P(time | VB)   =  0.0005       P(VB | S)   =  0.5294        P(VB | VBZ) =  0.7002
    P(time | JJ)   =               P(JJ | S)   =  0.8033        P(VB | NNS) = 11.1406
    P(flies | VBZ) =  0.4754       P(VBZ | NN) =  3.9005        P(RB | VBZ) = 15.0350
    P(flies | NNS) =  0.1610       P(VBZ | VB) =  0.0566        P(RB | NNS) =  6.4721
    P(like | IN)   =  2.6512       P(VBZ | JJ) =  2.0934        P(DT | IN)  = 31.4263
    P(like | VB)   =  2.8413       P(NNS | NN) =  1.6076        P(DT | VB)  = 15.2649
    P(like | RB)   =  0.5086       P(NNS | VB) =  0.6566        P(DT | RB)  =  5.3113
    P(an | DT)     =  1.4192       P(NNS | JJ) =  2.4383        P(NN | DT)  = 38.0170
    P(arrow | NN)  =  0.0215       P(IN | VBZ) =  8.5862        P(E | NN)   =  0.2069
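Using these numbers, the score of one tag path can be multiplied out directly. The sketch below scores the path time/NN flies/VBZ like/IN an/DT arrow/NN from the example (values divided by 100 to convert from percent):

```python
# Scoring a single tag path with the table above (a sketch; only this one path is computed).
from math import prod

factors_percent = [
    0.6823,    # P(NN | S)
    7.0727,    # P(time | NN)
    3.9005,    # P(VBZ | NN)
    0.4754,    # P(flies | VBZ)
    8.5862,    # P(IN | VBZ)
    2.6512,    # P(like | IN)
    31.4263,   # P(DT | IN)
    1.4192,    # P(an | DT)
    38.0170,   # P(NN | DT)
    0.0215,    # P(arrow | NN)
    0.2069,    # P(E | NN)
]

score = prod(p / 100 for p in factors_percent)
print(f"{score:.3e}")   # roughly 1.5e-19
```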


SLIDE 12

Bigram Tagging – Counter-Examples

◮ start before
  ◮ start before the course or start before he is done
◮ real quick
  ◮ re-cap real quick or a real quick lunch
◮ barely changed
  ◮ he was barely changed or he barely changed his contents
◮ that beginning
  ◮ that beginning part or that beginning frightened the students or with that beginning early, he was forced ...


SLIDE 13

Maximum Likelihood Estimation

Simplest way to calculate such probabilities from a corpus:

    PMLE(tn | tn−1) = C(tn−1 tn) / C(tn−1)

    PMLE(wn | tn) = C(wn tn) / C(tn)

◮ Uses relative frequency
◮ Maximizes the probability of the corpus
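A minimal sketch of these counts on a toy tagged corpus (the corpus here is made up; any list of (word, tag) pairs would do):

```python
from collections import Counter

tagged = [("the", "DT"), ("man", "NN"), ("bought", "VBD"), ("a", "DT"), ("book", "NN")]
tags = [t for _, t in tagged]

tag_count      = Counter(tags)                 # C(t)
word_tag_count = Counter(tagged)               # C(w t)
bigram_count   = Counter(zip(tags, tags[1:]))  # C(t_{n-1} t_n)

def p_mle_transition(tag, prev_tag):
    """P_MLE(tag | prev_tag) = C(prev_tag tag) / C(prev_tag)."""
    return bigram_count[(prev_tag, tag)] / tag_count[prev_tag]

def p_mle_lexical(word, tag):
    """P_MLE(word | tag) = C(word tag) / C(tag)."""
    return word_tag_count[(word, tag)] / tag_count[tag]

print(p_mle_transition("NN", "DT"))   # 1.0: every DT in the toy corpus is followed by NN
print(p_mle_lexical("book", "NN"))    # 0.5: one of the two NN tokens is "book"
```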


SLIDE 14

Maximum Likelihood Estimation (2)

◮ Not a great estimator: zero probabilities for unseen events make them impossible
◮ Need a smoothing or discounting method to give minimal probabilities to unseen events
◮ Simplest possibility: learn from hapax legomena (words that appear only once), as sketched below
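One way to read the hapax idea (my own minimal sketch, not necessarily the exact method intended in the course): words seen only once in training behave most like unseen words, so the tag distribution over hapax legomena can stand in for the lexical probabilities of unknown words.

```python
# Sketch: estimate how tags distribute over hapax legomena (words occurring exactly once).
from collections import Counter

def hapax_tag_distribution(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs.
    Returns P(tag | hapax word), usable as a stand-in distribution for unknown words."""
    pairs = list(tagged_corpus)
    word_freq = Counter(word for word, _ in pairs)
    hapax_tags = Counter(tag for word, tag in pairs if word_freq[word] == 1)
    total = sum(hapax_tags.values())
    return {tag: count / total for tag, count in hapax_tags.items()}
```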


SLIDE 15

Motivating Hidden Markov Models

Thinking back to Markov models: we are now given a sequence of words and want to find the POS tags

◮ The underlying sequence of POS tags can be thought of as generating the words in the sentence
◮ Each state in the Markov model can be a POS tag
◮ We don't know the correct state sequence, hence a Hidden Markov Model (HMM)

This requires an additional emission matrix, linking words with POS tags (cf. P(arrow|NN))


SLIDE 16

Example HMM

Assume DET, N, and VB as hidden states, with this transition matrix (A):

            DET     N      VB
    DET    0.01   0.89   0.10
    N      0.30   0.20   0.50
    VB     0.67   0.23   0.10

... emission matrix (B):

           dogs   bit    the    chased   a      these   cats   ...
    DET    0.0    0.0    0.33   0.0      0.33   0.33    0.0    ...
    N      0.2    0.1    0.0    0.0      0.0    0.0     0.15   ...
    VB     0.1    0.6    0.0    0.3      0.0    0.0     0.0    ...

... and initial probability matrix (π):

    DET   0.7
    N     0.2
    VB    0.1
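The same model written out as arrays (a sketch; because the "..." columns of B are omitted here, its rows do not all sum to exactly 1):

```python
import numpy as np

states = ["DET", "N", "VB"]
vocab  = ["dogs", "bit", "the", "chased", "a", "these", "cats"]

A = np.array([            # transition matrix: rows = from-state, columns = to-state
    [0.01, 0.89, 0.10],   # DET
    [0.30, 0.20, 0.50],   # N
    [0.67, 0.23, 0.10],   # VB
])

B = np.array([            # emission matrix: rows = state, columns = word
    [0.0, 0.0, 0.33, 0.0, 0.33, 0.33, 0.0],   # DET
    [0.2, 0.1, 0.0,  0.0, 0.0,  0.0,  0.15],  # N
    [0.1, 0.6, 0.0,  0.3, 0.0,  0.0,  0.0],   # VB
])

pi = np.array([0.7, 0.2, 0.1])   # initial state probabilities for DET, N, VB
```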


SLIDE 17

Using Example HMM

In order to generate words, we:

  1. Choose a tag/state from π
  2. Choose an emitted word from the relevant row of B
  3. Choose a transition from the relevant row of A
  4. Repeat #2 & #3 until we hit a stopping point (a code sketch of this loop follows below)

◮ keeping track of probabilities as we go along

We could generate all possibilities this way and find the most probable sequence

◮ Want a more efficient way of finding the most probable sequence
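A sketch of the generation loop described above, using the toy A, B, and π (redefined here so the snippet stands alone). Since this toy model has no end state, generation simply stops after a fixed number of words:

```python
import numpy as np

states = ["DET", "N", "VB"]
vocab  = ["dogs", "bit", "the", "chased", "a", "these", "cats"]
A  = np.array([[0.01, 0.89, 0.10], [0.30, 0.20, 0.50], [0.67, 0.23, 0.10]])
B  = np.array([[0.0, 0.0, 0.33, 0.0, 0.33, 0.33, 0.0],
               [0.2, 0.1, 0.0,  0.0, 0.0,  0.0,  0.15],
               [0.1, 0.6, 0.0,  0.3, 0.0,  0.0,  0.0]])
pi = np.array([0.7, 0.2, 0.1])

rng = np.random.default_rng(0)

def generate(length=5):
    words, prob = [], 1.0
    s = rng.choice(len(states), p=pi)                    # 1. choose initial state from pi
    prob *= pi[s]
    for _ in range(length):
        w = rng.choice(len(vocab), p=B[s] / B[s].sum())  # 2. emit a word from row s of B
        prob *= B[s, w]                                  #    (row renormalized: "..." columns dropped)
        words.append(vocab[w])
        s_next = rng.choice(len(states), p=A[s])         # 3. choose a transition from row s of A
        prob *= A[s, s_next]
        s = s_next                                       # 4. repeat, tracking the running probability
    return words, prob

print(generate())
```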
