
Parts of Speech: More Fine-Grained Classes - PowerPoint PPT Presentation



  2. Parts of Speech

  3. More Fine-Grained Classes

  4. More Fine-Grained Classes: "Actually, I ran home extremely quickly yesterday"

  5. The closed classes

  6. Example of POS tagging

  7. The Penn Treebank Part-of-Speech Tagset

  8. The Universal POS tagset https://universaldependencies.org

  9. POS tagging goal: resolve POS ambiguities

  10. POS tagging

  11. Most Frequent Class Baseline: training on the WSJ corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

  12. Most Frequent Class Baseline: training on the WSJ corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34% ● ~97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
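
A minimal sketch of a most-frequent-tag baseline like the one on this slide. It assumes the training data is a list of sentences of (word, tag) pairs and falls back to the globally most frequent tag for unknown words; both choices are illustrative, not taken from the slides.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sents):
    """For each word, remember the tag it appears with most often in training."""
    counts = defaultdict(Counter)
    tag_counts = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_counts[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]  # fallback for unseen words
    return most_frequent, default_tag

def tag_baseline(words, most_frequent, default_tag):
    """Assign each word its most frequent training tag (default for unknown words)."""
    return [most_frequent.get(w, default_tag) for w in words]
```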

  13. Why POS tagging ▪ Text-to-speech ▪ record, lead, protest ▪ Lemmatization ▪ saw/V → see, saw/N → saw ▪ Preprocessing for harder disambiguation problems ▪ syntactic parsing ▪ semantic parsing

  14. Generative sequence labeling: Hidden Markov Models

  15. Hidden Markov Models ▪ In the real world, many events are not directly observable ▪ Speech recognition: we observe acoustic features but not the phones ▪ POS tagging: we observe words but not the POS tags ▪ (figure: hidden states q1 … qn generating observations o1 … on)
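
To make the hidden/observed distinction concrete, one way to write down an HMM's ingredients in code is shown below; the three-tag tag set and every probability are invented for illustration only.

```python
# Toy HMM for POS tagging: hidden states are tags, observations are words.
# All numbers below are made up for illustration.
states = ["DT", "NN", "VB"]

pi = {"DT": 0.6, "NN": 0.3, "VB": 0.1}         # initial tag probabilities P(t_1)
A = {                                          # transition probabilities P(t_i | t_{i-1})
    "DT": {"DT": 0.01, "NN": 0.89, "VB": 0.10},
    "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
    "VB": {"DT": 0.50, "NN": 0.30, "VB": 0.20},
}
B = {                                          # emission probabilities P(word | tag)
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "run": 0.1, "walk": 0.5},
    "VB": {"dog": 0.1, "run": 0.5, "walk": 0.4},
}
```

The later sketches for Viterbi, beam search, and the forward algorithm reuse these `states`, `pi`, `A`, `B` names.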

  16. HMM From J&M

  17. HMM example From J&M

  18. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch ▪ From J&M

  19. HMM tagging as decoding

  20. HMM tagging as decoding How many possible choices?

  21. Part of speech tagging example Slide credit: Noah Smith

  22. The Viterbi Algorithm

  23. The Viterbi Algorithm

  24. The Viterbi Algorithm

  25. The Viterbi Algorithm
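
A minimal Viterbi decoder over the toy HMM representation sketched after slide 15 (dictionaries `pi`, `A`, `B`); it works in log space and is meant only to illustrate the chart-plus-backpointer structure, not the slides' exact pseudocode.

```python
import math

def viterbi(words, states, pi, A, B):
    """Most probable tag sequence for `words` under the HMM (log space)."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    V = [{}]      # V[t][s]: best log-probability of any tag path ending in state s at time t
    back = [{}]   # backpointers to recover that path
    for s in states:
        V[0][s] = logp(pi.get(s, 0)) + logp(B.get(s, {}).get(words[0], 0))
        back[0][s] = None
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((r, V[t - 1][r] + logp(A.get(r, {}).get(s, 0))) for r in states),
                key=lambda x: x[1],
            )
            V[t][s] = best + logp(B.get(s, {}).get(words[t], 0))
            back[t][s] = prev
    state = max(V[-1], key=V[-1].get)   # best final state, then follow backpointers
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return list(reversed(path))
```

For example, `viterbi(["the", "dog", "run"], states, pi, A, B)` returns `["DT", "NN", "VB"]` with the toy numbers above.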

  26. Beam search
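
The sketch below shows the idea behind beam search decoding in the same toy setting: rather than keeping one score per state as Viterbi does, keep only the `beam_width` highest-scoring partial tag sequences at each step, trading exactness for speed. The interface mirrors the Viterbi sketch above and is illustrative only.

```python
import math

def beam_search_tag(words, states, pi, A, B, beam_width=3):
    """Approximate decoding: keep only the beam_width best partial tag sequences per word."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # Each hypothesis is a (log_score, tag_sequence) pair.
    beam = [(logp(pi.get(s, 0)) + logp(B.get(s, {}).get(words[0], 0)), [s]) for s in states]
    beam = sorted(beam, reverse=True)[:beam_width]
    for word in words[1:]:
        candidates = []
        for score, tags in beam:
            for s in states:
                new_score = (score
                             + logp(A.get(tags[-1], {}).get(s, 0))
                             + logp(B.get(s, {}).get(word, 0)))
                candidates.append((new_score, tags + [s]))
        beam = sorted(candidates, reverse=True)[:beam_width]
    return beam[0][1]   # tag sequence of the best surviving hypothesis
```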

  27. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch ▪ From J&M

  28. The Forward Algorithm: sum instead of max
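
A matching sketch of the forward algorithm, again over the toy dictionaries used above: the recursion has the same shape as Viterbi's, but it sums over previous states instead of maximizing, so it returns the total probability of the word sequence rather than the best tag path.

```python
def forward(words, states, pi, A, B):
    """P(words) under the HMM: Viterbi's recursion with sum in place of max.
    A real implementation would use log-sum-exp or scaling to avoid underflow."""
    alpha = {s: pi.get(s, 0) * B.get(s, {}).get(words[0], 0) for s in states}
    for word in words[1:]:
        alpha = {
            s: B.get(s, {}).get(word, 0)
               * sum(alpha[r] * A.get(r, {}).get(s, 0) for r in states)
            for s in states
        }
    return sum(alpha.values())
```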

  29. Viterbi ▪ n-best decoding ▪ relationship to sequence alignment

  30. Extending the HMM Algorithm to Trigrams

  31. Unknown Words ▪ Word shape ▪ lower case → x ▪ upper case → X ▪ numbers → d ▪ punctuation → . ▪ I.M.F → X.X.X ▪ DC10-30 → XXdd-dd ▪ Shorter word shape: runs of consecutive identical character types are collapsed ▪ DC10-30 → Xd-d ▪ Prefixes & suffixes ▪ -s, -ed, -ing
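
A small sketch of the word-shape features described on this slide; the function name and the `short` flag are mine, but the mapping (lower case → x, upper case → X, digits → d, punctuation kept, optionally collapsing runs) follows the slide's examples.

```python
import re

def word_shape(word, short=False):
    """Map a word to its shape: lower case → x, upper case → X, digit → d, other characters kept.
    With short=True, runs of identical shape characters are collapsed (DC10-30 → Xd-d)."""
    shape = []
    for ch in word:
        if ch.islower():
            shape.append("x")
        elif ch.isupper():
            shape.append("X")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    shape = "".join(shape)
    if short:
        shape = re.sub(r"(.)\1+", r"\1", shape)   # collapse consecutive repeats
    return shape

# word_shape("I.M.F")               → "X.X.X"
# word_shape("DC10-30")             → "XXdd-dd"
# word_shape("DC10-30", short=True) → "Xd-d"
```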

  32. Brants (2000) ▪ a trigram HMM ▪ handling unknown words ▪ 96.7% on the Penn Treebank

  33. Generative vs. Discriminative models ▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data ▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes From Bamman

  34. Maximum Entropy Markov Models (MEMM) ▪ HMM ▪ MEMM

  35. Features in a MEMM

  36. Features in a MEMM ▪ well-dressed
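
To make the slide's feature idea concrete, here is one possible local feature template for a word in context; the exact features (prefix, suffix, hyphen, capitalization, previous word and tag) are a common illustrative choice, not necessarily the template on the slide.

```python
def memm_features(words, i, prev_tag):
    """Local features for tagging position i; purely illustrative."""
    w = words[i]
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return {
        f"word={w.lower()}": 1,
        f"prev_word={prev_word}": 1,
        f"prev_tag={prev_tag}": 1,
        f"prefix1={w[:1]}": 1,
        f"suffix3={w[-3:]}": 1,
        "has_hyphen": int("-" in w),             # fires for e.g. "well-dressed"
        "is_capitalized": int(w[:1].isupper()),
    }
```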

  37. Decoding and Training MEMMs

  38. Decoding MEMMs greedy approach: doesn’t use evidence from future decisions
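
A sketch of the greedy left-to-right decoder this slide refers to, written against a stand-in scoring function: the point is that each tag is committed to immediately, so evidence from later words and tags can never overturn an early mistake.

```python
def greedy_decode(words, tagset, prob_tag):
    """Tag left to right, committing to the best tag at each position.
    prob_tag(word, prev_tag, tag) stands in for the trained MEMM's conditional
    P(tag | word, prev_tag, ...); earlier decisions are never revisited."""
    tags = []
    prev = "<s>"
    for word in words:
        best = max(tagset, key=lambda t: prob_tag(word, prev, t))
        tags.append(best)
        prev = best
    return tags
```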

  39. Decoding MEMMs Viterbi ▪ filling the chart with ▪ HMM ▪ MEMM

  40. Bidirectionality ▪ Label bias or observation bias problem ▪ will/NN to/TO fight/VB ▪ Linear-chain CRF (Lafferty et al. 2001) ▪ A bidirectional version of the MEMM (Toutanova et al. 2003) ▪ bi-LSTM

  41. Neural sequence tagger ▪ Lample et al. 2016 ▪ Neural Architectures for NER

  42. Multilingual POS tagging ▪ In morphologically-rich languages like Czech, Hungarian, Turkish ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus ▪ ⇒ many UNKs ▪ more information is coded in morphology

  43. Multilingual POS tagging ▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding ▪ The Universal POS tagset accounts for cross-linguistic differences

  44. Named Entity Recognition

  45. Named Entity tags

  46. Ambiguity in NER

  47. NER as Sequence Labeling IOB tagging scheme
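
For the IOB scheme named on this slide, a small helper like the one below shows how entity spans map onto per-token tags; the (start, end, type) span format with an exclusive end index is an assumption for the sketch.

```python
def spans_to_iob(tokens, entities):
    """Convert non-overlapping entity spans (start, end_exclusive, type) into per-token IOB tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

# spans_to_iob(["American", "Airlines", "said"], [(0, 2, "ORG")])
#   → ["B-ORG", "I-ORG", "O"]
```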

  48. A feature-based algorithm for NER

  49. A feature-based algorithm for NER ▪ gazetteers ▪ a list of place names providing millions of entries for locations with detailed geographical and political information ▪ binary indicator features

  50. Evaluation of NER ▪ F-score ▪ segmentation is a confound ▪ e.g., tagging only American/B-ORG in "American Airlines" ▪ 2 errors: a false positive for O and a false negative for I-ORG
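
Entity-level scoring with exact span matching is one standard way to compute the F-score mentioned above; the segmentation mistake in the slide's example then counts against both precision and recall. A minimal sketch, assuming entities are represented as (start, end, type) tuples:

```python
def entity_f1(gold_spans, pred_spans):
    """Entity-level precision, recall, and F1 with exact span-and-type matching."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                               # exact matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```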

  51. HMMs in Automatic Speech Recognition “speech lab” ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb

  52. HMMs in Automatic Speech Recognition ▪ (figure: words w1, w2 from the language model; sound types s1 … s7; acoustic observations a1 … a7 from the acoustic model)
