Parts of Speech More Fine-Grained Classes More - - PowerPoint PPT Presentation



SLIDE 3

Parts of Speech

SLIDE 4

More Fine-Grained Classes

SLIDE 5

More Fine-Grained Classes

Actually, I ran home extremely quickly yesterday

SLIDE 6

The closed classes

SLIDE 7

Example of POS tagging

SLIDE 8

The Penn Treebank Part-of-Speech Tagset

SLIDE 9

The Universal POS tagset

https://universaldependencies.org

SLIDE 10

POS tagging

goal: resolve POS ambiguities

SLIDE 11

POS tagging

SLIDE 12

Most Frequent Class Baseline

Training on the WSJ corpus and testing on sections 22–24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.
SLIDE 13

Most Frequent Class Baseline

Training on the WSJ corpus and testing on sections 22–24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

  • ~97% tag accuracy is achievable by most algorithms

(HMMs, MEMMs, neural networks, rule-based algorithms)
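The baseline above can be sketched as follows: count each word's tag frequencies in training data, then tag every test token with its word's most frequent tag, falling back to the overall most frequent tag for unknown words. The toy corpus below is hypothetical, not WSJ data.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count how often each tag occurs with each word, and overall."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sentences:
        for word, tag in sent:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    best = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unknown words
    return best, default_tag

def tag_baseline(words, best, default_tag):
    return [best.get(w, default_tag) for w in words]

# Hypothetical toy training data (Penn Treebank-style tags).
train = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
         [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
         [("the", "DT"), ("runs", "NNS")]]
best, default = train_baseline(train)
print(tag_baseline(["the", "dog", "runs", "cat"], best, default))
# ['DT', 'NN', 'VBZ', 'DT']  ("cat" is unseen, so it gets the most frequent tag)
```

Note that "runs" is ambiguous (VBZ twice, NNS once in training), and the baseline always resolves it the same way regardless of context.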

SLIDE 14

Why POS tagging

▪ Text-to-speech

▪ record, lead, protest

▪ Lemmatization

▪ saw/V → see, saw/N → saw

▪ Preprocessing for harder disambiguation problems

▪ syntactic parsing
▪ semantic parsing

SLIDE 15

Generative sequence labeling: Hidden Markov Models

SLIDE 16
Hidden Markov Models

▪ In the real world many events are not observable
▪ Speech recognition: we observe acoustic features but not the phones
▪ POS tagging: we observe words but not the POS tags

[Diagram: hidden state sequence q1, q2, …, qn]

SLIDE 17

HMM

From J&M

SLIDE 18

HMM example

From J&M

SLIDE 19

HMMs: Algorithms

▪ Forward
▪ Viterbi
▪ Forward–Backward (Baum–Welch)

From J&M

SLIDE 20

HMM tagging as decoding

SLIDE 21

HMM tagging as decoding

How many possible choices?
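With a tagset of size T and a sentence of n words, there are T^n candidate tag sequences, which is why dynamic programming is needed. A quick illustration, assuming the Penn Treebank's 45 tags:

```python
# With T candidate tags per word and n words, there are T ** n possible
# tag sequences; exhaustive search is infeasible for realistic sentences.
T = 45  # size of the Penn Treebank tagset
for n in (3, 10, 20):
    print(f"n={n}: {T ** n:,} tag sequences")
```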

SLIDE 22

Part of speech tagging example

Slide credit: Noah Smith

SLIDE 23

The Viterbi Algorithm

SLIDE 24

The Viterbi Algorithm

SLIDE 25

The Viterbi Algorithm

SLIDE 26

The Viterbi Algorithm
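A minimal Viterbi sketch in log space, with hypothetical toy start, transition, and emission tables (not the J&M example):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state sequence, computed in log space to avoid underflow."""
    # V[t][s] = (best log-probability of a path ending in s at time t, backpointer)
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
          for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            score, prev = max(
                (V[t - 1][p][0] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs[t]]), p)
                for p in states)
            V[t][s] = (score, prev)
    # Follow backpointers from the best final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Hypothetical toy HMM over two tags.
states = ("NN", "VB")
start = {"NN": 0.7, "VB": 0.3}
trans = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.6, "VB": 0.4}}
emit = {"NN": {"fish": 0.6, "sleep": 0.4}, "VB": {"fish": 0.3, "sleep": 0.7}}
print(viterbi(("fish", "sleep"), states, start, trans, emit))  # ['NN', 'VB']
```

Each chart cell stores the best score of any path ending in that state plus a backpointer, so the chart has only |states| × n cells instead of |states|^n paths.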

SLIDE 27

Beam search
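Beam search approximates Viterbi by keeping only the top-k scoring partial tag sequences at each step. A sketch, reusing hypothetical toy HMM tables:

```python
import math

def beam_decode(obs, states, start_p, trans_p, emit_p, beam_width=2):
    """Keep only the top-`beam_width` partial tag sequences at each step."""
    beam = sorted(((math.log(start_p[s] * emit_p[s][obs[0]]), [s]) for s in states),
                  reverse=True)[:beam_width]
    for word in obs[1:]:
        candidates = []
        for score, path in beam:
            for s in states:
                candidates.append(
                    (score + math.log(trans_p[path[-1]][s] * emit_p[s][word]),
                     path + [s]))
        beam = sorted(candidates, reverse=True)[:beam_width]
    return beam[0][1]  # best surviving sequence (approximate, unlike Viterbi)

# Hypothetical toy HMM over two tags.
states = ("NN", "VB")
start = {"NN": 0.7, "VB": 0.3}
trans = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.6, "VB": 0.4}}
emit = {"NN": {"fish": 0.6, "sleep": 0.4}, "VB": {"fish": 0.3, "sleep": 0.7}}
print(beam_decode(("fish", "sleep"), states, start, trans, emit))  # ['NN', 'VB']
```

With beam_width equal to the number of states this is exact; with a narrower beam it trades accuracy for speed, since the true best path can be pruned early.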

SLIDE 28

HMMs: Algorithms

▪ Forward
▪ Viterbi
▪ Forward–Backward (Baum–Welch)

From J&M

SLIDE 29

The Forward Algorithm

sum instead of max
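A sketch of the forward recursion: it is Viterbi with the max over previous states replaced by a sum, so it returns the total probability of the observations rather than the single best path. The toy tables are hypothetical:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of the observation sequence, summing over all paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for word in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][word]
                 for s in states}
    return sum(alpha.values())

# Hypothetical toy HMM over two tags.
states = ("NN", "VB")
start = {"NN": 0.7, "VB": 0.3}
trans = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.6, "VB": 0.4}}
emit = {"NN": {"fish": 0.6, "sleep": 0.4}, "VB": {"fish": 0.3, "sleep": 0.7}}
print(forward(("fish", "sleep"), states, start, trans, emit))  # ≈ 0.303
```

In practice the sums are also done in log space (log-sum-exp) to avoid underflow on long sequences.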

SLIDE 30

Viterbi

▪ n-best decoding
▪ relationship to sequence alignment

SLIDE 31

Extending the HMM Algorithm to Trigrams

SLIDE 32

Unknown Words

▪ Word shape
  ▪ lower case → x
  ▪ upper case → X
  ▪ numbers → d
  ▪ punctuation retained as-is
  ▪ I.M.F → X.X.X
  ▪ DC10-30 → XXdd-dd
▪ Shorter word shape: runs of consecutive identical character types are collapsed
  ▪ DC10-30 → Xd-d
▪ Prefixes & suffixes
  ▪ -s, -ed, -ing
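The word-shape features above can be sketched as follows (the `word_shape` helper is hypothetical, not from a particular tagger):

```python
import re

def word_shape(word, short=False):
    """Hypothetical word-shape helper: lower case -> x, upper case -> X,
    digits -> d, punctuation kept as-is. With short=True, runs of the same
    character class are collapsed to a single character."""
    shape = "".join("x" if c.islower() else "X" if c.isupper()
                    else "d" if c.isdigit() else c
                    for c in word)
    return re.sub(r"(.)\1+", r"\1", shape) if short else shape

print(word_shape("I.M.F"))                # X.X.X
print(word_shape("DC10-30"))              # XXdd-dd
print(word_shape("DC10-30", short=True))  # Xd-d
```

Shapes like these let a tagger generalize from seen words to unknown ones with the same capitalization and digit pattern.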

SLIDE 33

Brants (2000)

▪ a trigram HMM
▪ handling unknown words
▪ 96.7% accuracy on the Penn Treebank

SLIDE 34

Generative vs. Discriminative models

▪ Generative models specify a joint distribution over the labels and the data; with such a model you can generate new data
▪ Discriminative models specify the conditional distribution of the label y given the data x; these models focus on how to discriminate between the classes

From Bamman

SLIDE 35

Maximum Entropy Markov Models (MEMM)

▪ HMM
▪ MEMM

SLIDE 36

Features in a MEMM

SLIDE 37

Features in a MEMM

▪ well-dressed

SLIDE 38

Decoding and Training MEMMs

SLIDE 39

Decoding MEMMs

greedy approach: doesn’t use evidence from future decisions
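A sketch of greedy left-to-right decoding; `toy_score` is a hypothetical stand-in for a trained MEMM's local distribution P(tag | word, previous tag):

```python
def greedy_decode(words, score):
    """Greedy left-to-right decoding: commit to the best tag at each position,
    conditioning only on the previous decision (no evidence from future words)."""
    prev, out = "<s>", []
    for w in words:
        local = score(prev, w)           # stand-in for P(tag | word, prev_tag)
        prev = max(local, key=local.get)
        out.append(prev)
    return out

# Hypothetical toy scorer, not a trained MEMM.
def toy_score(prev, word):
    if word == "to":
        return {"TO": 1.0, "NN": 0.1, "VB": 0.1}
    if prev == "TO":
        return {"VB": 0.9, "NN": 0.4}
    return {"NN": 0.8, "VB": 0.3}

print(greedy_decode(["will", "to", "fight"], toy_score))  # ['NN', 'TO', 'VB']
```

Because each decision is final, an early mistake cannot be revised by later evidence, which is exactly the weakness Viterbi decoding of MEMMs avoids.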

SLIDE 40

Decoding MEMMs

Viterbi

▪ filling the chart with
  ▪ HMM
  ▪ MEMM

SLIDE 41

Bidirectionality

▪ Label bias or observation bias problem

▪ will/NN to/TO fight/VB

▪ Linear-chain CRF (Lafferty et al. 2001)
▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
▪ bi-LSTM

SLIDE 42

Neural sequence tagger

▪ Lample et al. 2016, Neural Architectures for Named Entity Recognition

SLIDE 43

Multilingual POS tagging

▪ In morphologically rich languages like Czech, Hungarian, Turkish
  ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
  ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus
▪ ⇒ many UNKs
▪ more information is coded in morphology

SLIDE 44

Multilingual POS tagging

▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly

▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding

▪ Universal POS tagset accounts for cross-linguistic differences

SLIDE 45

Named Entity Recognition

SLIDE 46

Named Entity tags

SLIDE 47

Ambiguity in NER

SLIDE 48

NER as Sequence Labeling

IOB tagging scheme
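A sketch of converting entity spans to IOB tags; the span format here (token-index triples, end exclusive) is an assumption for illustration:

```python
def to_iob(tokens, spans):
    """Convert entity spans (start, end, type) over token indices into
    per-token IOB tags: B- opens an entity, I- continues it, O is outside."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags

tokens = ["American", "Airlines", "flew", "to", "Houston"]
print(to_iob(tokens, [(0, 2, "ORG"), (4, 5, "LOC")]))
# ['B-ORG', 'I-ORG', 'O', 'O', 'B-LOC']
```

The B-/I- distinction lets a sequence labeler recover entity boundaries even when two entities of the same type are adjacent.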

SLIDE 49

A feature-based algorithm for NER

SLIDE 50

A feature-based algorithm for NER

▪ gazetteers

▪ a list of place names providing millions of entries for locations with detailed geographical and political information
▪ binary indicator features

SLIDE 51

Evaluation of NER

▪ F-score
▪ segmentation is a confound
  ▪ e.g., American/B-ORG Airlines
  ▪ 2 errors: a false positive for O and a false negative for I-ORG
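A sketch of entity-level evaluation illustrating the segmentation confound: predicting only "American" as the ORG earns no credit, producing both a false positive and a false negative. Entities are represented here as hypothetical (start, end, type) triples:

```python
def entity_f1(gold, pred):
    """Entity-level P/R/F1: credit only exact (span, type) matches, so one
    segmentation error yields both a false positive and a false negative."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {(0, 2, "ORG")}        # gold: "American Airlines" is one ORG
pred = {(0, 1, "ORG")}        # system tagged only "American" -> no exact match
print(entity_f1(gold, pred))  # (0.0, 0.0, 0.0)
```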

SLIDE 52

HMMs in Automatic Speech Recognition

ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb “speech lab”

SLIDE 53

HMMs in Automatic Speech Recognition

[Diagram: words (w1, w2) generated by the language model; sound types (s1 … s7) and acoustic observations (a1 … a7) governed by the acoustic model]

SLIDE 54