Lecture 9: Part of Speech, Kai-Wei Chang, CS @ University of Virginia


slide-1
SLIDE 1

Lecture 9: Part of Speech

Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16

1 CS6501 Natural Language Processing

slide-2
SLIDE 2

This lecture

• Parts of speech (POS)
• POS tagsets

slide-3
SLIDE 3

Parts of Speech

• Traditional parts of speech
  • ~8 of them

slide-4
SLIDE 4

POS examples

• N (noun): chair, bandwidth, pacing
• V (verb): study, debate, munch
• ADJ (adjective): purple, tall, ridiculous
• ADV (adverb): unfortunately, slowly
• P (preposition): of, by, to
• PRO (pronoun): I, me, mine
• DET (determiner): the, a, that, those

slide-5
SLIDE 5

Parts of Speech

• A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags...
• Lots of debate within linguistics about the number, nature, and universality of these

slide-6
SLIDE 6

POS Tagging

• The process of assigning a part-of-speech to each word in a collection (sentence).

WORD    TAG
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
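
The word/tag pairs above map naturally onto a list of (word, tag) tuples. As a minimal sketch, assuming a hypothetical hand-written lexicon `TOY_LEXICON` in which every word has exactly one tag (real taggers must handle ambiguous and unknown words), tagging reduces to a lookup. The names `TOY_LEXICON`, `tag_sentence`, and the `UNK` fallback are illustrative and not from the lecture.

```python
# Toy lexicon covering only the example sentence; hypothetical and incomplete.
TOY_LEXICON = {
    "the": "DET",
    "koala": "N",
    "put": "V",
    "keys": "N",
    "on": "P",
    "table": "N",
}


def tag_sentence(words, lexicon, unknown_tag="UNK"):
    """Assign one tag per word by case-insensitive dictionary lookup."""
    return [(word, lexicon.get(word.lower(), unknown_tag)) for word in words]


sentence = "the koala put the keys on the table".split()
tagged = tag_sentence(sentence, TOY_LEXICON)

# Print one word/tag pair per line, tab-separated.
for word, tag in tagged:
    print(f"{word}\t{tag}")
```

A pure lookup like this cannot choose between several possible tags for the same word, which is why practical taggers use context.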

slide-7
SLIDE 7

Why is POS Tagging Useful?

• First step of a vast number of practical tasks
• Parsing
  • Need to know if a word is an N or V before you can parse
• Information extraction
  • Finding names, relations, etc.
• Speech synthesis/recognition
  • OBject / obJECT
  • OVERflow / overFLOW
  • DIScount / disCOUNT
  • CONtent / conTENT
• Machine translation
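
To illustrate the speech synthesis case, here is a small sketch in which a pronunciation table is keyed by (word, tag) instead of by word alone, so the tag selects the stress pattern. The table `STRESS`, the helper `stressed_form`, and the specific tag assigned to each stress pattern (the usual noun/verb or noun/adjective split) are assumptions for illustration, not a real pronunciation lexicon.

```python
# Hypothetical stress table: the same spelling is stressed differently
# depending on its part of speech.
STRESS = {
    ("object", "N"): "OBject",
    ("object", "V"): "obJECT",
    ("overflow", "N"): "OVERflow",
    ("overflow", "V"): "overFLOW",
    ("discount", "N"): "DIScount",
    ("discount", "V"): "disCOUNT",
    ("content", "N"): "CONtent",
    ("content", "ADJ"): "conTENT",
}


def stressed_form(word, tag):
    """Return the stress-marked form for (word, tag); fall back to the word itself."""
    return STRESS.get((word.lower(), tag), word)


noun_reading = stressed_form("object", "N")
verb_reading = stressed_form("object", "V")
print(noun_reading, verb_reading)
```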

slide-8
SLIDE 8

Open and Closed Classes

• Closed class: a small fixed membership
  • Prepositions: of, in, by, …
  • Pronouns: I, you, she, mine, his, them, …
  • Usually function words (short common words which play a role in grammar)
• Open class: new ones can be created
  • English has 4: nouns, verbs, adjectives, adverbs
  • Many languages have these 4, but not all!

slide-9
SLIDE 9

Open Class Words

• Nouns
  • Proper nouns (Boulder, Granby, Eli Manning)
  • Common nouns (the rest)
  • Count nouns and mass nouns
    • Count: have plurals, get counted: goat/goats, one goat, two goats
    • Mass: don’t get counted (snow, salt, communism) (*two snows)
• Verbs
  • In English, have morphological affixes (eat/eats/eaten)

slide-10
SLIDE 10

Closed Class Words

Examples:

• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, who, I, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …

slide-11
SLIDE 11

Prepositions from CELEX

CELEX: online dictionary
Frequency counts are from the COBUILD 16-million-word corpus

slide-12
SLIDE 12

English Particles

slide-13
SLIDE 13

Conjunctions

slide-14
SLIDE 14

Choosing a Tagset

• Could pick very coarse tagsets
  • N, V, Adj, Adv, Other
• More commonly used set is finer grained
  • E.g., “Penn TreeBank tagset”, 45 tags: PRP$, WRB, WP$, VBG
  • Brown corpus, 87 tags
• Prague Dependency Treebank (Czech)
  • 4452 tags
  • AAFP3----3N----: (nejnezajímavějším)
    Adj Regular Feminine Plural … Superlative [Hajic 2006, VMC tutorial]
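
The Prague tag is positional: each of its 15 characters encodes one morphological category, and `-` means "not applicable". A minimal sketch of splitting such a tag into its positions follows. The position names in `POSITIONS` follow the commonly documented PDT layout but are an assumption here, and `decode_positional_tag` is a hypothetical helper, not part of any treebank toolkit.

```python
# Assumed names for the 15 positions of a PDT-style positional tag.
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def decode_positional_tag(tag):
    """Map each non-empty position of the tag to its raw one-character value."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


decoded = decode_positional_tag("AAFP3----3N----")
for name, value in decoded.items():
    print(f"{name}: {value}")
```

Because every position can vary independently, the number of distinct tags grows quickly, which is one way to see how a tagset reaches thousands of tags.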

slide-15
SLIDE 15

Penn TreeBank POS Tagset

slide-16
SLIDE 16

Using the Penn Tagset

• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
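
The slash-separated word/TAG notation used in this example is easy to read back into (word, tag) pairs. The sketch below assumes tokens are separated by whitespace and that the tag follows the last `/` in each token; `parse_tagged` is a hypothetical helper name.

```python
def parse_tagged(line):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so a token such as './.'
        # yields the word '.' and the tag '.'.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


example = (
    "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
    "of/IN other/JJ topics/NNS ./."
)
pairs = parse_tagged(example)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```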
slide-17
SLIDE 17

Universal Tag set

• ~12 different tags
  • NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X
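
A coarse tagset like this is typically obtained by collapsing a finer one. The sketch below maps a handful of Penn Treebank tags onto the universal tags listed above. The dictionary `PENN_TO_UNIVERSAL` is deliberately partial, and sending unlisted tags to `X` is a simplification of this sketch rather than the official mapping.

```python
# Partial, illustrative mapping from Penn Treebank tags to universal tags.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "PRP": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT",
    ".": ".",
}


def to_universal(tagged):
    """Replace each Penn tag by its universal tag; unlisted tags become 'X'."""
    return [(word, PENN_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged]


penn_tagged = [
    ("The", "DT"), ("grand", "JJ"), ("jury", "NN"), ("commented", "VBD"),
    ("on", "IN"), ("a", "DT"), ("number", "NN"), ("of", "IN"),
    ("other", "JJ"), ("topics", "NNS"), (".", "."),
]
universal_tagged = to_universal(penn_tagged)
print(universal_tagged)
```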

slide-18
SLIDE 18

POS Tagging vs. Word Clustering

• Words often have more than one POS: back
  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB

These examples from Dekang Lin
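
One way to see the contrast with word clustering: a clustering assigns each word type to a single class, whereas tagging assigns a tag to each occurrence. The sketch below collects the set of tags observed for each word type from the four "back" examples. The variable names and the tiny observation list are illustrative assumptions.

```python
from collections import defaultdict

# (word, tag) observations taken from the four example phrases for "back",
# plus one unambiguous word for comparison.
observations = [
    ("back", "JJ"),   # The back door
    ("back", "NN"),   # On my back
    ("back", "RB"),   # Win the voters back
    ("back", "VB"),   # Promised to back the bill
    ("door", "NN"),   # The back door
]

# Collect every tag seen for each word type.
tags_by_word = defaultdict(set)
for word, tag in observations:
    tags_by_word[word].add(tag)

# A word type is ambiguous if it was seen with more than one tag.
ambiguous = sorted(w for w, tags in tags_by_word.items() if len(tags) > 1)
print(ambiguous)
print(sorted(tags_by_word["back"]))
```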

slide-19
SLIDE 19

How Hard is POS Tagging?

slide-20
SLIDE 20

POS tag sequences

• Some tag sequences are more likely to occur than others
• POS n-gram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_

Existing methods often model POS tagging as a sequence tagging problem
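
The idea that some tag sequences are more likely than others can be made concrete by counting tag bigrams in tagged text. The sketch below estimates P(tag | previous tag) by relative frequency on a one-sentence toy corpus. The helper `tag_bigram_probs` and the `<s>` start symbol are assumptions of this sketch; it illustrates the kind of statistic a sequence model can use and is not the lecture's method.

```python
from collections import Counter


def tag_bigram_probs(tagged_sentences):
    """Estimate P(cur_tag | prev_tag) by relative frequency over tag bigrams."""
    bigram_counts = Counter()
    prev_counts = Counter()
    for sentence in tagged_sentences:
        # Prepend a start symbol so the first tag also has a predecessor.
        tags = ["<s>"] + [tag for _, tag in sentence]
        for prev, cur in zip(tags, tags[1:]):
            bigram_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {
        (prev, cur): count / prev_counts[prev]
        for (prev, cur), count in bigram_counts.items()
    }


toy_corpus = [
    [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
     ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")],
]
probs = tag_bigram_probs(toy_corpus)
for (prev, cur), p in sorted(probs.items()):
    print(f"P({cur} | {prev}) = {p:.2f}")
```

In this toy corpus a determiner is always followed by a noun, while a noun is followed by a verb or a preposition equally often.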

slide-21
SLIDE 21

Evaluation

• How many words in the unseen test data can be tagged correctly?
• Usually evaluated on Penn Treebank
  • State of the art ~97%
  • Trivial baseline (most likely tag) ~94%
  • Human performance ~97%
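
The "most likely tag" baseline and the accuracy metric can both be sketched in a few lines. The code below assumes each word gets the tag it carried most often in training and that unseen words get the overall most frequent tag. The function names and the toy data are hypothetical, so the resulting number says nothing about Penn Treebank performance.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn each word's most frequent tag plus a default tag for unseen words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    model = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return model, default_tag


def accuracy(model, default_tag, tagged_sentences):
    """Fraction of test tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for sentence in tagged_sentences:
        for word, gold in sentence:
            predicted = model.get(word.lower(), default_tag)
            correct += int(predicted == gold)
            total += 1
    return correct / total


train = [
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
]
test = [[("the", "DT"), ("back", "JJ"), ("window", "NN")]]

model, default_tag = train_most_frequent_tag(train)
acc = accuracy(model, default_tag, test)
print(f"accuracy = {acc:.3f}")
```

The baseline ignores context entirely, so it always gives "back" the same tag. That is the kind of error a sequence model aims to fix.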
