slide-1
SLIDE 1

Part-of-Speech Tagging

COSI 114 – Computational Linguistics James Pustejovsky March 17, 2017 Brandeis University

slide-2
SLIDE 2

Parts of Speech

— Perhaps starting with Aristotle in the West (384–322 BCE), the idea of having parts of speech

  • lexical categories, word classes, “tags”, POS

— Dionysius Thrax of Alexandria (c. 100 BCE):

8 parts of speech

  • Still with us! But his 8 aren’t exactly the ones we are taught today

– Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
– School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

slide-3
SLIDE 3

Open class (lexical) words:

  • Nouns: Proper (IBM, Italy), Common (cat / cats, snow)
  • Verbs: Main (see, registered)
  • Adjectives: old, older, oldest
  • Adverbs: slowly
  • Numbers: 122,312, one
  • Interjections: Ow, Eh
  • … more

Closed class (functional) words:

  • Modals: can, had
  • Prepositions: to, with
  • Particles: off, up
  • Determiners: the, some
  • Conjunctions: and, or
  • Pronouns: he, its
  • … more

slide-4
SLIDE 4

Open vs. Closed classes


  • Closed:

– determiners: a, an, the – pronouns: she, he, I – prepositions: on, under, over, near, by, … – Why “closed”?

  • Open:

– Nouns, Verbs, Adjectives, Adverbs.

slide-5
SLIDE 5

POS Tagging

— Words often have more than one POS: back

  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB

— The POS tagging problem is to determine the POS tag for a particular instance of a word.

slide-6
SLIDE 6

POS Tagging

— Input: Plays well with others

— Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS

— Output: Plays/VBZ well/RB with/IN others/NNS

— Uses:

  • MT: reordering of adjectives and nouns (say from Spanish to English)
  • Text-to-speech (how do we pronounce “lead”?)
  • Can write regexps like (Det) Adj* N+ over the output for phrases, etc.

  • Input to a syntactic parser
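As a sketch of the regexp idea above, the tag sequence can be flattened into a string and scanned with an ordinary regular expression. This is a minimal illustration over made-up tagged output, not a production chunker:

```python
import re

# Hypothetical tagger output as (word, tag) pairs, using Penn Treebank tags.
tagged = [("the", "DT"), ("old", "JJ"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("fresh", "JJ"), ("snow", "NN")]

# Flatten the tag sequence into a string so a regexp can scan it.
tag_str = " ".join(tag for _, tag in tagged) + " "

# (Det) Adj* N+ : optional determiner, any adjectives, one or more nouns.
np_re = re.compile(r"((?:DT )?(?:JJ )*(?:NNP?S? )+)")

phrases = []
for m in np_re.finditer(tag_str):
    # Map character offsets back to token indices by counting spaces.
    start = tag_str[:m.start()].count(" ")
    end = start + m.group(1).count(" ")
    phrases.append(" ".join(w for w, _ in tagged[start:end]))

print(phrases)  # ['the old cat', 'fresh snow']
```

The space-joined tag string is just a convenient trick for reusing ordinary regexp machinery on tag sequences.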

Penn Treebank POS tags

slide-7
SLIDE 7

The Penn TreeBank Tagset


slide-8
SLIDE 8

Penn Treebank tags


slide-9
SLIDE 9

POS tagging performance

— How many tags are correct? (Tag accuracy)

  • About 97% currently
  • But baseline is already 90%

– Baseline is performance of stupidest possible method

– Tag every word with its most frequent tag
– Tag unknown words as nouns

  • Partly easy because

– Many words are unambiguous
– You get points for them (the, a, etc.) and for punctuation marks!
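The baseline described above takes only a few lines of code. This is a minimal sketch over a tiny hypothetical training set; a real baseline would count tags over a tagged corpus such as the Penn Treebank:

```python
from collections import Counter, defaultdict

# Hypothetical toy training data of (word, tag) pairs.
train = [("the", "DT"), ("back", "NN"), ("back", "NN"), ("back", "VB"),
         ("a", "DT"), ("saw", "NN"), ("saw", "VBD"), ("saw", "VBD")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

# Most-frequent-tag baseline: each word gets its commonest training tag,
# and unknown words are tagged NN, as the slide describes.
most_freq = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words):
    return [(w, most_freq.get(w, "NN")) for w in words]

print(baseline_tag(["the", "saw", "glorped"]))
# [('the', 'DT'), ('saw', 'VBD'), ('glorped', 'NN')]
```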

slide-10
SLIDE 10

Deciding on the correct part of speech can be difficult even for people

— Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG

— All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN

— Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

slide-11
SLIDE 11

How difficult is POS tagging?

— About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech

— But they tend to be very common words.

E.g., that

  • I know that he is honest = IN
  • Yes, that play was nice = DT
  • You can’t go that far = RB

— 40% of the word tokens are ambiguous
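The type/token gap above can be reproduced on a toy scale: ambiguous words like that are few as types but frequent as tokens. A small sketch with made-up counts:

```python
from collections import defaultdict

# Hypothetical mini-corpus of (word, tag) tokens. In the Brown corpus the
# real figures are roughly 11% of types but ~40% of tokens.
tokens = [("that", "IN"), ("that", "DT"), ("that", "RB"), ("that", "IN"),
          ("play", "NN"), ("play", "VB"),
          ("honest", "JJ"), ("nice", "JJ"), ("go", "VB"), ("far", "RB")]

tags_per_type = defaultdict(set)
for w, t in tokens:
    tags_per_type[w].add(t)

ambiguous_types = {w for w, ts in tags_per_type.items() if len(ts) > 1}

type_rate = len(ambiguous_types) / len(tags_per_type)
token_rate = sum(1 for w, _ in tokens if w in ambiguous_types) / len(tokens)

# Ambiguous words are a minority of the vocabulary but, being common,
# a much larger share of running text.
print(f"ambiguous types:  {type_rate:.0%}")
print(f"ambiguous tokens: {token_rate:.0%}")
```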

slide-12
SLIDE 12

Sources of information

— What are the main sources of information for POS tagging?

  • Knowledge of neighboring words

– Bill saw that man yesterday
– NNP NN DT NN NN
– VB VB(D) IN VB NN

  • Knowledge of word probabilities

– man is rarely used as a verb….

— The latter proves the most useful, but the former also helps

slide-13
SLIDE 13

More and Better Features → Feature-based tagger

— Can do surprisingly well just looking at a word by itself:

  • Word (the: the → DT)
  • Lowercased word (Importantly: importantly → RB)
  • Prefixes (unfathomable: un- → JJ)
  • Suffixes (Importantly: -ly → RB)
  • Capitalization (Meridian: CAP → NNP)
  • Word shapes (35-year: d-x → JJ)

— Then build a classifier to predict tag

  • Maxent P(t|w): 93.7% overall / 82.6% unknown
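The single-word features listed above are straightforward to compute. A minimal sketch follows; the feature names are illustrative, and the exact feature set of any particular tagger will differ:

```python
import re

def word_features(word):
    """Features computable from the word in isolation (illustrative names)."""
    # Word shape: collapse letter runs to x/X and digit runs to d.
    shape = re.sub(r"\d+", "d", re.sub(r"[A-Z]+", "X", re.sub(r"[a-z]+", "x", word)))
    return {
        "word": word,
        "lower": word.lower(),
        "prefix2": word[:2],          # e.g. 'un-' suggests an adjective
        "suffix2": word[-2:],         # e.g. '-ly' suggests an adverb
        "is_cap": word[0].isupper(),  # capitalization suggests NNP
        "shape": shape,               # '35-year' -> 'd-x'
    }

print(word_features("Importantly"))
print(word_features("35-year"))
```

A classifier such as a maxent model is then trained on these feature dictionaries to predict the tag.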
slide-14
SLIDE 14

Overview: POS Tagging Accuracies

— Rough accuracies (overall / unknown words):

  • Most freq tag: ~90% / ~50%
  • Trigram HMM: ~95% / ~55%
  • Maxent P(t|w): 93.7% / 82.6%
  • TnT (HMM++): 96.2% / 86.0%
  • MEMM tagger: 96.9% / 86.9%
  • Bidirectional dependencies: 97.2% / 90.0%
  • Upper bound: ~98% (human agreement)

Most errors are on unknown words

slide-15
SLIDE 15

POS tagging as a sequence classification task

— We are given a sentence (an “observation” or “sequence of observations”):

  • Secretariat is expected to race tomorrow
  • She promised to back the bill

— What is the best sequence of tags which corresponds to this sequence of observations?

— Probabilistic view:

  • Consider all possible sequences of tags
  • Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
slide-16
SLIDE 16

How do we apply classification to sequences?

slide-17
SLIDE 17

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier NNP
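The sliding-window idea can be sketched as a feature extractor; the feature names here are illustrative, not from any particular tagger:

```python
def window_features(words, i, size=1):
    """Features for classifying words[i] from a window of its neighbors."""
    feats = {"word": words[i].lower()}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad past the sentence boundaries with a pseudo-token.
        feats[f"word[{offset:+d}]"] = words[j].lower() if 0 <= j < len(words) else "<pad>"
    return feats

sent = "John saw the saw".split()
# The two occurrences of 'saw' get different features from their neighbors.
print(window_features(sent, 1))  # {'word': 'saw', 'word[-1]': 'john', 'word[+1]': 'the'}
print(window_features(sent, 3))  # {'word': 'saw', 'word[-1]': 'the', 'word[+1]': '<pad>'}
```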

slide-18
SLIDE 18

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier VBD

slide-19
SLIDE 19

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier DT

slide-20
SLIDE 20

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier NN

slide-21
SLIDE 21

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier CC

slide-22
SLIDE 22

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier VBD

slide-23
SLIDE 23

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier TO

slide-24
SLIDE 24

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier VB

slide-25
SLIDE 25

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier PRP

slide-26
SLIDE 26

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier IN

slide-27
SLIDE 27

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier DT

slide-28
SLIDE 28

Sequence Labeling as Classification

— Classify each token independently, but use, as input features, information about the surrounding tokens (sliding window).

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier NN

slide-29
SLIDE 29

Sequence Labeling as Classification Using Outputs as Inputs

— Better input features are usually the categories of the surrounding tokens, but these are not available yet.

— Can use the category of either the preceding or succeeding tokens by going forward or backward and using previous output.

Slide from Ray Mooney
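Forward classification as described above can be sketched as a greedy loop. Here toy_classify is a hypothetical stand-in for a trained classifier, with hand-written rules just to show the previous output disambiguating the second saw:

```python
def forward_tag(words, classify):
    """Greedy left-to-right tagging: each prediction becomes a feature
    for the next token, as in forward classification."""
    tags = []
    for i, w in enumerate(words):
        prev_tag = tags[i - 1] if i > 0 else "<s>"
        tags.append(classify(w, prev_tag))
    return tags

# Hypothetical stand-in for a trained classifier.
def toy_classify(word, prev_tag):
    if word == "saw":
        return "NN" if prev_tag == "DT" else "VBD"   # 'the saw' vs 'John saw'
    return {"John": "NNP", "the": "DT"}.get(word, "NN")

print(forward_tag("John saw the saw".split(), toy_classify))
# ['NNP', 'VBD', 'DT', 'NN']
```

Backward classification is the same loop run right to left, feeding each prediction to the token on its left.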

slide-30
SLIDE 30

Forward Classification

Slide from Ray Mooney

John saw the saw and decided to take it to the table.

classifier NNP

slide-31
SLIDE 31

Forward Classification

Slide from Ray Mooney

NNP

John saw the saw and decided to take it to the table.

classifier VBD

slide-32
SLIDE 32

Forward Classification

Slide from Ray Mooney

NNP VBD John saw the saw and decided to take it to the table.

classifier DT

slide-33
SLIDE 33

Forward Classification

Slide from Ray Mooney

NNP

VBD DT John saw the saw and decided to take it to the table.

classifier NN

slide-34
SLIDE 34

Forward Classification

Slide from Ray Mooney

NNP VBD DT NN John saw the saw and decided to take it to the table.

classifier CC

slide-35
SLIDE 35

Forward Classification

Slide from Ray Mooney

NNP

VBD DT NN CC John saw the saw and decided to take it to the table.

classifier VBD

slide-36
SLIDE 36

Forward Classification

Slide from Ray Mooney

NNP

VBD DT NN CC VBD John saw the saw and decided to take it to the table.

classifier TO

slide-37
SLIDE 37

Forward Classification

Slide from Ray Mooney

NNP

VBD DT NN CC VBD TO John saw the saw and decided to take it to the table.

classifier VB

slide-38
SLIDE 38

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

DT NN John saw the saw and decided to take it to the table.

classifier IN

slide-39
SLIDE 39

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

IN DT NN John saw the saw and decided to take it to the table.

classifier PRP

slide-40
SLIDE 40

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

PRP IN DT NN John saw the saw and decided to take it to the table.

classifier VB

slide-41
SLIDE 41

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier TO

slide-42
SLIDE 42

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier VBD

slide-43
SLIDE 43

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

VBD TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier CC

slide-44
SLIDE 44

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

CC VBD TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier VBD

slide-45
SLIDE 45

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

VBD CC VBD TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier DT

slide-46
SLIDE 46

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

DT VBD CC VBD TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier VBD

slide-47
SLIDE 47

Backward Classification

— Disambiguating “to” in this case would be even easier backward.

Slide from Ray Mooney

VBD DT VBD CC VBD TO VB PRP IN DT NN John saw the saw and decided to take it to the table.

classifier NNP

slide-48
SLIDE 48

The Maximum Entropy Markov Model (MEMM)

— A sequence version of the logistic regression (also called maximum entropy) classifier.

— Find the best series of tags:

T̂ = argmax_T P(T | W) = argmax_T ∏_i P(t_i | w_i, t_{i-1})
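Under the usual MEMM factorization, the probability of a tag sequence is a product of local conditional probabilities, and the best sequence is the one that maximizes it. A toy sketch with made-up local probabilities; a real MEMM would get them from a trained logistic-regression classifier:

```python
import math
from itertools import product

# Hypothetical local distributions P(tag | word, prev_tag).
def local_prob(tag, word, prev_tag):
    table = {
        ("back", "TO"): {"VB": 0.8, "NN": 0.1, "RB": 0.1},
        ("back", "DT"): {"NN": 0.7, "VB": 0.1, "RB": 0.2},
    }
    return table.get((word, prev_tag), {}).get(tag, 1.0 / 3)

def sequence_log_prob(tags, words):
    """log P(T|W) = sum_i log P(t_i | w_i, t_{i-1}): the MEMM factorization."""
    lp, prev = 0.0, "<s>"
    for tag, word in zip(tags, words):
        lp += math.log(local_prob(tag, word, prev))
        prev = tag
    return lp

# Brute-force search over all tag sequences (fine for two words;
# Viterbi replaces this in practice).
words = ["to", "back"]
best = max(product(["TO", "VB", "NN", "RB"], repeat=2),
           key=lambda t: sequence_log_prob(t, words))
print(best)  # ('TO', 'VB')
```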

slide-49
SLIDE 49

The Maximum Entropy Markov Model (MEMM)

[Diagram: MEMM tagging of “Janet will back the bill” (NNP MD VB DT NN), starting from <s>; features available at each step include w_i, w_{i-1}, w_{i+1}, t_{i-1}, and t_{i-2}]

slide-50
SLIDE 50

Features for the classifier at each tag

[Diagram: the features used to tag each word of “Janet will back the bill” (NNP MD VB DT NN): the word w_i, its neighbors w_{i-1} and w_{i+1}, and the previous tags t_{i-1} and t_{i-2}]

slide-51
SLIDE 51

More features


slide-52
SLIDE 52

MEMM computes the best tag sequence


slide-53
SLIDE 53

MEMM Decoding

— Simplest algorithm: greedily pick the best tag for each word, left to right

— What we use in practice: the Viterbi algorithm

— A version of the same dynamic programming algorithm we used to compute minimum edit distance.
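The Viterbi step above can be sketched for an MEMM-style model whose local scores are P(t_i | w_i, t_{i-1}); toy_prob below is a made-up distribution just to exercise the decoder:

```python
import math

def viterbi(words, tags, local_prob):
    """Viterbi decoding over log P(t_i | w_i, t_{i-1}): dynamic programming
    over a table, like the minimum-edit-distance algorithm."""
    # best[i][t] = (log-score of best sequence for words[:i+1] ending in t,
    #               backpointer to the previous tag)
    best = [{t: (math.log(local_prob(t, words[0], "<s>")), None) for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(local_prob(t, words[i], p)))
                 for p in tags),
                key=lambda x: x[1])
            col[t] = (score, prev)
        best.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    seq = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        seq.append(tag)
    return seq[::-1]

def toy_prob(tag, word, prev_tag):
    # Hypothetical toy distributions, just to exercise the decoder.
    if word == "back":
        return {"VB": 0.8}.get(tag, 0.1) if prev_tag == "TO" else {"RB": 0.5}.get(tag, 0.25)
    return {"promised": {"VBD": 0.9}, "to": {"TO": 0.9}}.get(word, {}).get(tag, 0.05)

print(viterbi(["promised", "to", "back"], ["VBD", "TO", "VB", "RB"], toy_prob))
# ['VBD', 'TO', 'VB']
```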

slide-54
SLIDE 54

The Stanford Tagger

— Is a bidirectional version of the MEMM called a cyclic dependency network

— Stanford tagger:

  • http://nlp.stanford.edu/software/tagger.shtml
