Parts of Speech
More Fine-Grained Classes
Actually, I ran home extremely quickly yesterday
The closed classes
Example of POS tagging
The Penn Treebank Part-of-Speech Tagset
The Universal POS tagset
https://universaldependencies.org
POS tagging
goal: resolve POS ambiguities
POS tagging
Most Frequent Class Baseline
Training on the WSJ corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.
▪ ~97% tag accuracy achievable by most algorithms
(HMMs, MEMMs, neural networks, rule-based algorithms)
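The most-frequent-class baseline can be sketched in a few lines (toy training data, illustrative only):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """For each word, remember the tag it was seen with most often."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default = all_tags.most_common(1)[0][0]  # fallback tag for unknown words
    return most_frequent, default

def tag_baseline(words, most_frequent, default):
    return [most_frequent.get(w, default) for w in words]

# Toy corpus: "runs" is ambiguous between VBZ and NNS.
train = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
         [("the", "DT"), ("dog", "NN"), ("runs", "NNS")],
         [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")]]
model, default = train_baseline(train)
print(tag_baseline(["the", "runs", "walrus"], model, default))
```

Despite ignoring all context, this kind of per-word lookup is what earns the 92.34% figure on WSJ.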
Why POS tagging
▪ Text-to-speech
▪ record, lead, protest
▪ Lemmatization
▪ saw/V → see, saw/N → saw
▪ Preprocessing for harder disambiguation problems
▪ syntactic parsing ▪ semantic parsing
Generative sequence labeling: Hidden Markov Models
▪ In the real world many events are not observable
▪ Speech recognition: we observe acoustic features but not the phones
▪ POS tagging: we observe words but not the POS tags
Hidden Markov Models
[Diagram: hidden state chain q1 → q2 → … → qn]
HMM
From J&M
HMM example
From J&M
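As a concrete sketch of the tables an HMM tagger needs (toy states and probabilities, not taken from J&M):

```python
# Toy HMM: hidden states are POS tags, observations are words.
# All probabilities below are illustrative, not estimated from a corpus.
states = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}             # P(tag_1)
trans_p = {"DT": {"DT": 0.05, "NN": 0.85, "VB": 0.10},  # P(tag_i | tag_{i-1})
           "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
           "VB": {"DT": 0.50, "NN": 0.30, "VB": 0.20}}
emit_p = {"DT": {"the": 0.9, "a": 0.1},                 # P(word | tag)
          "NN": {"dog": 0.5, "walk": 0.3, "park": 0.2},
          "VB": {"walk": 0.7, "runs": 0.3}}

# Joint probability of one (tag sequence, word sequence) pair:
tags, words = ["DT", "NN", "VB"], ["the", "dog", "walk"]
p = start_p[tags[0]] * emit_p[tags[0]][words[0]]
for i in range(1, len(tags)):
    p *= trans_p[tags[i - 1]][tags[i]] * emit_p[tags[i]][words[i]]
print(p)
```

The generative story is visible in the arithmetic: pick a tag given the previous tag, then emit a word given the tag.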
HMMs: Algorithms
From J&M: Forward (likelihood), Viterbi (decoding), Forward–Backward / Baum–Welch (learning)
HMM tagging as decoding
How many possible choices?
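With |T| tags and a sentence of n words there are |T|^n candidate tag sequences, so brute-force enumeration is hopeless; a quick illustration (the sentence length is arbitrary):

```python
num_tags, sentence_len = 45, 10  # e.g. the 45 Penn Treebank tags
# Every word can take any tag, so the search space is exponential in n:
print(num_tags ** sentence_len)  # ~3.4e16 sequences for a 10-word sentence
```

Viterbi avoids this blow-up by sharing work across sequences, costing only O(|T|² · n).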
Part of speech tagging example
Slide credit: Noah Smith
The Viterbi Algorithm
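A minimal sketch of Viterbi decoding over a toy HMM (states and probabilities are illustrative, not from the slides):

```python
# Toy HMM tables for the sketch (illustrative probabilities).
states = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
           "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
           "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1}}
emit_p = {"DT": {"the": 1.0},
          "NN": {"dog": 0.6, "walk": 0.4},
          "VB": {"walk": 0.8, "barks": 0.2}}

def viterbi(words, states, start_p, trans_p, emit_p):
    """Best tag sequence under the HMM: max over previous states at each cell,
    with backpointers to recover the path."""
    V = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(words[t], 0.0)
            back[t][s] = prev
    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog", "walk"], states, start_p, trans_p, emit_p))
```

The chart V has one cell per (position, tag) pair, so the whole search runs in O(|T|² · n) rather than O(|T|^n).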
Beam search
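A minimal beam-search sketch; `score` here is a hypothetical local scoring function standing in for whatever model supplies log probabilities:

```python
def beam_search(words, tags, score, k=2):
    """Keep only the k highest-scoring partial tag sequences at each word,
    instead of the full Viterbi chart. `score(prefix, tag, word)` is a
    hypothetical stand-in returning a local log probability."""
    beam = [([], 0.0)]  # (tag sequence so far, cumulative log score)
    for word in words:
        candidates = [(prefix + [t], logp + score(prefix, t, word))
                      for prefix, logp in beam for t in tags]
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beam[0][0]

# Toy scorer: reward the tag whose name matches the word, uppercased.
toy_score = lambda prefix, t, w: 0.0 if t == w.upper() else -1.0
print(beam_search(["dt", "nn"], ["DT", "NN"], toy_score))
```

With k = |T| this degenerates into exhaustive search per step; small k trades optimality for speed, which matters once the local scorer is an expensive model.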
HMMs: Algorithms
From J&M: Forward (likelihood), Viterbi (decoding), Forward–Backward / Baum–Welch (learning)
The Forward Algorithm
sum instead of max
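Replacing the max in the Viterbi recurrence with a sum gives the Forward algorithm, which computes the total probability of the word sequence; a sketch over toy HMM tables (illustrative probabilities):

```python
# Toy HMM tables for the sketch (illustrative probabilities).
states = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
           "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
           "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1}}
emit_p = {"DT": {"the": 1.0},
          "NN": {"dog": 0.6, "walk": 0.4},
          "VB": {"walk": 0.8, "barks": 0.2}}

def forward(words, states, start_p, trans_p, emit_p):
    """P(words) = sum over ALL tag sequences: same chart as Viterbi,
    but each cell sums over previous states instead of maximizing."""
    alpha = {s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}
    for word in words[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states)
                    * emit_p[s].get(word, 0.0)
                 for s in states}
    return sum(alpha.values())

print(forward(["the", "dog", "walk"], states, start_p, trans_p, emit_p))
```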
Viterbi
▪ n-best decoding
▪ relationship to sequence alignment
Extending the HMM Algorithm to Trigrams
▪ Word shape
▪ lower case → x
▪ upper case → X
▪ numbers → d
▪ punctuation → .
▪ I.M.F → X.X.X
▪ DC10-30 → XXdd-dd
▪ Shorter word shape: runs of consecutive identical character types are collapsed
▪ DC10-30 → Xd-d
▪ Prefixes & suffixes
▪ -s, -ed, -ing
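The word-shape features above can be sketched as follows (the `short` flag produces the collapsed variant):

```python
import re

def word_shape(word, short=False):
    """Map each character to its class: lower → x, upper → X, digit → d;
    punctuation and other characters pass through unchanged."""
    shape = "".join("X" if c.isupper() else
                    "x" if c.islower() else
                    "d" if c.isdigit() else c
                    for c in word)
    if short:
        # Collapse runs of the same class: DC10-30 → Xd-d
        shape = re.sub(r"(.)\1+", r"\1", shape)
    return shape

print(word_shape("DC10-30"))              # XXdd-dd
print(word_shape("DC10-30", short=True))  # Xd-d
```

Shape features like these let a tagger generalize from seen words to unknown ones with the same capitalization and digit pattern.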
Unknown Words
Brants (2000)
▪ a trigram HMM
▪ handling unknown words
▪ 96.7% on the Penn Treebank
Generative vs. Discriminative models
▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data
▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes
From Bamman
Maximum Entropy Markov Models (MEMM)
▪ HMM
▪ MEMM
Features in a MEMM
Features in a MEMM
▪ well-dressed
Decoding and Training MEMMs
Decoding MEMMs
greedy approach: doesn’t use evidence from future decisions
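A sketch of the greedy decoder; `local_prob` is a hypothetical stand-in for the MEMM's learned conditional distribution:

```python
def greedy_decode(words, tags, local_prob):
    """Greedy left-to-right MEMM-style decoding: commit to the best tag given
    the current word and the previous tag only; later words cannot revise it.
    `local_prob(tag, word, prev_tag)` is a hypothetical stand-in for the
    MEMM's learned P(tag | word, prev_tag, features)."""
    out, prev = [], "<s>"
    for word in words:
        best = max(tags, key=lambda t: local_prob(t, word, prev))
        out.append(best)  # committed: no evidence from future words
        prev = best
    return out

# Toy local model: "the" is DT, everything else NN.
toy = lambda t, w, p: 1.0 if (w == "the") == (t == "DT") else 0.0
print(greedy_decode(["the", "dog"], ["DT", "NN"], toy))
```

Because each decision is final, one early mistake can cascade; Viterbi over the same local scores avoids this at the cost of filling the full chart.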
Decoding MEMMs
Viterbi: filling the chart with
▪ HMM: P(s | s′) · P(o_t | s)
▪ MEMM: P(s | s′, o_t)
Bidirectionality
▪ Label bias or observation bias problem
▪ will/NN to/TO fight/VB
▪ Linear-chain CRF (Lafferty et al. 2001)
▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
▪ bi-LSTM
Neural sequence tagger
▪ Lample et al. 2016: Neural Architectures for NER
Multilingual POS tagging
▪ In morphologically-rich languages like Czech, Hungarian, Turkish
▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus
▪ ⇒ many UNKs
▪ more information is coded in morphology
Multilingual POS tagging
▪ In non-word-space languages like Chinese word segmentation is either applied before tagging or done jointly
▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding
▪ Universal POS tagset accounts for cross-linguistic differences
Named Entity Recognition
Named Entity tags
Ambiguity in NER
NER as Sequence Labeling
IOB tagging scheme
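A sketch of converting entity spans to IOB tags (token indices and entity types are illustrative):

```python
def to_iob(tokens, spans):
    """Convert (start, end, type) entity spans to per-token IOB tags.
    `end` is exclusive; tokens outside any span get O."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # B- marks the entity's first token
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # I- marks its continuation
    return tags

tokens = ["American", "Airlines", "flew", "to", "Boston"]
print(to_iob(tokens, [(0, 2, "ORG"), (4, 5, "LOC")]))
# → ['B-ORG', 'I-ORG', 'O', 'O', 'B-LOC']
```

The B-/I- distinction is what lets the scheme represent two adjacent entities of the same type without merging them.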
A feature-based algorithm for NER
A feature-based algorithm for NER
▪ gazetteers
▪ a list of place names providing millions of entries for locations with detailed geographical and political information
▪ binary indicator features
Evaluation of NER
▪ F-score
▪ segmentation is a confound
▪ e.g., American/B-ORG Airlines
▪ 2 errors: a false positive for O and a false negative for I-ORG
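A sketch of entity-level F1 showing the segmentation confound: a partially correct span earns no credit, costing both a false positive and a false negative (spans and types below are illustrative):

```python
def entity_f1(gold, pred):
    """Entity-level F1: an entity counts only if its span AND type match
    exactly, so a segmentation error produces both a false positive
    (the wrong span we emitted) and a false negative (the gold span we missed)."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

# Gold: "American Airlines" is one ORG span; the system found only "American".
gold = [(0, 2, "ORG")]
pred = [(0, 1, "ORG")]
print(entity_f1(gold, pred))  # 0.0 — the partial match scores nothing
```

This is why entity-level scores look harsh compared with token-level accuracy: near misses at the boundary are penalized twice.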
HMMs in Automatic Speech Recognition
ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb “speech lab”
HMMs in Automatic Speech Recognition
[Diagram: words w1 w2 (language model) → sound types s1 … s7 → acoustic observations a1 … a7 (acoustic model)]