SLIDE 1
Yulia Tsvetkov
Algorithms for NLP
CS 11-711, Fall 2019
Lecture 7: HMMs, POS tagging
SLIDE 2
Readings for today's lecture
▪ J&M SLP3: https://web.stanford.edu/~jurafsky/slp3/8.pdf
▪ Collins (2011): http://www.cs.columbia.edu/~mcollins/hmms-spring2013.pdf
SLIDE 3
Levels of linguistic knowledge
Slide credit: Noah Smith
SLIDE 4
▪ Map a sequence of words to a sequence of labels
▪ Part-of-speech tagging (Church, 1988; Brants, 2000)
▪ Named entity recognition (Bikel et al., 1999)
▪ Text chunking and shallow parsing (Ramshaw and Marcus, 1995)
▪ Word alignment of parallel text (Vogel et al., 1996)
▪ Compression (Conroy and O'Leary, 2001)
▪ Acoustic models, discourse segmentation, etc.
Sequence Labeling
SLIDE 5
Sequence labeling as classification
SLIDE 6
Generative sequence labeling: Hidden Markov Models
SLIDE 7 Markov Chain: weather
the future is independent of the past given the present
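The Markov property can be sketched directly in code: the next state is drawn from a distribution that depends only on the current state, never on earlier history. The transition probabilities below are toy numbers for illustration, not taken from the slides.

```python
import random

# Toy weather transition probabilities (illustrative numbers only)
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, n, seed=0):
    """Sample n transitions; each next state depends only on the current one."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n):
        nxt_states, probs = zip(*TRANSITIONS[state].items())
        state = rng.choices(nxt_states, weights=probs)[0]
        path.append(state)
    return path
```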
SLIDE 8
Markov Chain
SLIDE 9 Markov Chain: words
the future is independent of the past given the present
SLIDE 10
▪ In the real world, many events are not directly observable
▪ Speech recognition: we observe acoustic features but not the phones
▪ POS tagging: we observe words but not the POS tags
Hidden Markov Models
[Diagram: hidden state sequence q1 → q2 → ... → qn]
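Concretely, an HMM is specified by an initial distribution, tag-to-tag transition probabilities, and tag-to-word emission probabilities. Sampling from it makes the "hidden" part explicit: the tags form a Markov chain, but we only observe the emitted words. All probabilities below are toy numbers for illustration.

```python
import random

# Toy HMM parameters (illustrative numbers only):
#   pi - initial tag probabilities
#   A  - tag-to-tag transition probabilities
#   B  - tag-to-word emission probabilities
pi = {"DET": 0.7, "NOUN": 0.3}
A = {"DET": {"DET": 0.1, "NOUN": 0.9},
     "NOUN": {"DET": 0.6, "NOUN": 0.4}}
B = {"DET": {"the": 0.8, "a": 0.2},
     "NOUN": {"dog": 0.5, "walk": 0.5}}

def generate(n, seed=0):
    """Generate n (tag, word) pairs: tags follow a Markov chain,
    and each word is emitted conditioned on its tag alone."""
    rng = random.Random(seed)
    def draw(dist):
        keys, probs = zip(*dist.items())
        return rng.choices(keys, weights=probs)[0]
    tag = draw(pi)
    pairs = []
    for _ in range(n):
        pairs.append((tag, draw(B[tag])))
        tag = draw(A[tag])
    return pairs
```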
SLIDE 12 HMM example
From J&M
SLIDE 13 Generative vs. Discriminative models
▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data
▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes
From Bamman
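In symbols (a standard formulation, not taken from the slide):

```latex
% Generative: model the joint distribution, then decode via Bayes' rule
P(x, y) = P(y)\,P(x \mid y), \qquad
\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y P(y)\,P(x \mid y)

% Discriminative: model the conditional distribution directly
\hat{y} = \arg\max_y P(y \mid x)
```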
SLIDE 14 Types of HMMs
▪ + many more
From J&M
SLIDE 15
HMM in Language Technologies
▪ Part-of-speech tagging (Church, 1988; Brants, 2000) ▪ Named entity recognition (Bikel et al., 1999) and other information extraction tasks ▪ Text chunking and shallow parsing (Ramshaw and Marcus, 1995) ▪ Word alignment of parallel text (Vogel et al., 1996) ▪ Acoustic models in speech recognition (emissions are continuous) ▪ Discourse segmentation (labeling parts of a document)
SLIDE 16 HMM Parameters
From J&M
SLIDE 17 HMMs: Questions
From J&M
SLIDE 18 HMMs: Algorithms
From J&M
▪ Forward
▪ Viterbi
▪ Forward–Backward (Baum–Welch)
SLIDE 19
HMM tagging as decoding
SLIDE 20
HMM tagging as decoding
SLIDE 21
HMM tagging as decoding
SLIDE 22
HMM tagging as decoding
SLIDE 23
HMM tagging as decoding
SLIDE 24
HMM tagging as decoding
SLIDE 25 HMM tagging as decoding
How many possible choices?
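For the question above: with T candidate tags per word and an n-word sentence, there are T^n possible tag sequences, so exhaustive enumeration is hopeless. A quick sanity check, assuming the 45-tag Penn Treebank tagset:

```python
# T**n candidate tag sequences for T tags and n words
T, n = 45, 10              # Penn Treebank tagset size, a 10-word sentence
print(T ** n)              # ~3.4e16 sequences -- far too many to enumerate
```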
SLIDE 26
Part of speech tagging example
Slide credit: Noah Smith
SLIDE 27 Part of speech tagging example
Slide credit: Noah Smith
Greedy decoding?
SLIDE 28 Part of speech tagging example
Slide credit: Noah Smith
Greedy decoding? Consider: “the old dog the footsteps of the young”
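A greedy decoder commits to the locally best tag at each position and never revises; the garden-path sentence above shows the danger, since an early commitment ("old" as adjective, "dog" as noun) cannot be undone. A minimal sketch on a toy two-state HMM (all probabilities are illustrative, not from the slides):

```python
# Toy two-state HMM (illustrative numbers): states H/C, observations 1-3
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

def greedy_decode(obs):
    """Pick the locally best state at each step; earlier choices are final."""
    scores = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    tag = max(scores, key=scores.get)
    tags = [tag]
    for o in obs[1:]:
        scores = {s: trans_p[tag][s] * emit_p[s][o] for s in states}
        tag = max(scores, key=scores.get)
        tags.append(tag)
    return tags

# On [3, 1, 3] greedy picks H, C, H, while exact search prefers H, H, H:
# a locally attractive second step leads the decoder down the wrong path.
```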
SLIDE 29
The Viterbi Algorithm
SLIDE 30
The Viterbi Algorithm
SLIDE 31
The Viterbi Algorithm
SLIDE 32
The Viterbi Algorithm
SLIDE 33
The Viterbi Algorithm
Complexity?
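Answering the complexity question: the two nested loops over positions and states give O(n·T²) time, versus O(T^n) for exhaustive search. A minimal log-space Viterbi sketch on a toy two-state HMM (all probabilities are illustrative, not from the slides):

```python
import math

# Toy two-state HMM (illustrative numbers): states H/C, observations 1-3
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    """Most likely state sequence; O(n * T^2) time, O(n * T) space."""
    # V[t][s]: log-probability of the best path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at time t
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Follow back-pointers from the best final state
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```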
SLIDE 34
Beam search
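A sketch of beam-search decoding on the same kind of toy HMM (illustrative probabilities, not from the slides): keep only the `beam` highest-scoring partial tag sequences at each step. A beam of 1 reduces to greedy decoding; wider beams trade speed for search quality.

```python
import math

# Toy two-state HMM (illustrative numbers): states H/C, observations 1-3
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

def beam_decode(obs, beam=2):
    """Keep only the `beam` best (log-score, sequence) hypotheses per step."""
    hyps = [(math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), [s])
            for s in states]
    hyps = sorted(hyps, key=lambda h: h[0], reverse=True)[:beam]
    for o in obs[1:]:
        expanded = [
            (score + math.log(trans_p[seq[-1]][s]) + math.log(emit_p[s][o]),
             seq + [s])
            for score, seq in hyps
            for s in states
        ]
        hyps = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam]
    return hyps[0][1]
```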
SLIDE 35
Viterbi
▪ n-best decoding
▪ relationship to sequence alignment
SLIDE 36 HMMs: Algorithms
From J&M
▪ Forward
▪ Viterbi
▪ Forward–Backward (Baum–Welch)
SLIDE 37
The Forward Algorithm
sum instead of max
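The forward algorithm uses exactly the Viterbi recurrence with the max replaced by a sum, so it returns the total likelihood of the observations (summed over all state sequences) rather than the single best path. A sketch on a toy two-state HMM (illustrative probabilities, not from the slides):

```python
# Toy two-state HMM (illustrative numbers): states H/C, observations 1-3
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(obs):
    """Total probability of obs, summed over all state sequences: O(n * T^2)."""
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())
```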
SLIDE 38
Parts of Speech
SLIDE 39
The closed classes
SLIDE 40
More Fine-Grained Classes
SLIDE 41
More Fine-Grained Classes
SLIDE 42
The Penn Treebank Part-of-Speech Tagset
SLIDE 43 The Universal POS tagset
https://universaldependencies.org
SLIDE 44
POS tagging
SLIDE 45
POS tagging
goal: resolve POS ambiguities
SLIDE 46
POS tagging
SLIDE 47 Most Frequent Class Baseline
Training on the WSJ corpus and testing on sections 22–24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of ...
SLIDE 48 Most Frequent Class Baseline
Training on the WSJ corpus and testing on sections 22–24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.
▪ 97% tag accuracy is achievable by most algorithms
(HMMs, MEMMs, neural networks, rule-based algorithms)
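The baseline above is simple to implement: tag each known word with the tag it received most often in training, and back off to the overall most frequent tag for unknown words. A sketch (the toy corpus and all names are my own, for illustration):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Most-frequent-class baseline: per-word most frequent training tag,
    with the corpus-wide most frequent tag as the unknown-word fallback."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    default = all_tags.most_common(1)[0][0]
    table = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    return lambda words: [table.get(w, default) for w in words]

# Tiny illustrative corpus of (word, tag) sentences
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("the", "DET")],
]
tagger = train_baseline(corpus)
```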