Algorithms for NLP
CS 11711, Fall 2019
Lecture 2: Language Models


Slide 1: Title
Algorithms for NLP (CS 11711, Fall 2019)
Lecture 2: Language Models
Yulia Tsvetkov

Slide 2: Announcements
▪ Homework 1 released on 9/3
▪ you need to attend the next lecture to understand it
▪ Chan will give an overview at the end of the next lecture
▪ + recitation on 9/6

Slides 3-8: 1-slide review of probability
Slide credit: Noah Smith
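The review equations themselves are images and do not appear in the text above; the following is a minimal set of identities that a one-slide probability review typically covers. Treat it as an assumption about what the slides show, not a transcription of them.

```latex
% Assumed contents of the one-slide probability review (not transcribed from the slides):
P(A, B) = P(A \mid B)\, P(B)                                            % product rule
P(A) = \sum_{b} P(A, B{=}b)                                             % marginalization
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}                           % Bayes' rule
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})    % chain rule
```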

Slides 9-15: opening example, revealed one sentence at a time

My legal name is Alexander Perchov. But all of my many friends dub me Alex, because that is a more flaccid-to-utter version of my legal name. Mother dubs me Alexi-stop-spleening-me!, because I am always spleening her. If you want to know why I am always spleening her, it is because I am always elsewhere with friends, and disseminating so much currency, and performing so many things that can spleen a mother. Father used to dub me Shapka, for the fur hat I would don even in the summer month. He ceased dubbing me that because I ordered him to cease dubbing me that. It sounded boyish to me, and I have always thought of myself as very potent and generative.

Slide 16: Language models play the role of ...
▪ a judge of grammaticality
▪ a judge of semantic plausibility
▪ an enforcer of stylistic consistency
▪ a repository of knowledge (?)

Slide 17: The Language Modeling problem
▪ Assign a probability to every sentence (or any string of words)
▪ finite vocabulary (e.g. words or characters): {the, a, telescope, …}
▪ infinite set of sequences:
  ▪ a telescope STOP
  ▪ a STOP
  ▪ the the the STOP
  ▪ I saw a woman with a telescope STOP
  ▪ STOP
  ▪ ...

Slide 18: The Language Modeling problem
▪ Assign a probability to every sentence (or any string of words)
▪ finite vocabulary (e.g. words or characters)
▪ infinite set of sequences

Slide 19
p(disseminating so much currency STOP) = 10^-15
p(spending a lot of money STOP) = 10^-9

Slide 20: The Language Modeling problem (Objections?)
▪ Assign a probability to every sentence (or any string of words)
▪ finite vocabulary (e.g. words or characters)
▪ infinite set of sequences

Slide 21: Motivation
▪ Machine translation
  ▪ p(strong winds) > p(large winds)
▪ Spell correction
  ▪ The office is about fifteen minuets from my house
  ▪ p(about fifteen minutes from) > p(about fifteen minuets from)
▪ Speech recognition
  ▪ p(I saw a van) >> p(eyes awe of an)
▪ Summarization, question-answering, handwriting recognition, OCR, etc.

Slide 22: Motivation
▪ Speech recognition: we want to predict a sentence given acoustics
(Figure: a speech waveform segmented into phones, labeled "s p ee ch" and "l a b".)

Slide 23: Motivation
▪ Speech recognition: we want to predict a sentence given acoustics
▪ Candidate transcriptions and their scores:

  the station signs are in deep in english         14732
  the stations signs are in deep in english        14735
  the station signs are in deep into english       14739
  the station 's signs are in deep in english      14740
  the station signs are in deep in the english     14741
  the station signs are indeed in english          14757
  the station 's signs are indeed in english       14760
  the station signs are indians in english         14790
  the station signs are indian in english          14799
  the stations signs are indians in english        14807
  the stations signs are indians and english       14815

Slides 24-26: Motivation: the Noisy-Channel Model
(Diagram, built up across the slides: a source emits w; the noisy channel turns w into the observed a; the decoder maps the observed a back to the best w.)

Slide 27: Motivation: the Noisy-Channel Model
▪ We want to predict a sentence given acoustics:

Slide 28: Motivation: the Noisy-Channel Model
▪ We want to predict a sentence given acoustics:
▪ The noisy-channel approach:

Slide 29: Motivation: the Noisy-Channel Model
▪ The noisy-channel approach: channel model × source model

Slide 30: Motivation: the Noisy-Channel Model
▪ The noisy-channel approach:
  ▪ the prior is the language model: a distribution over sequences of words (sentences)
  ▪ the likelihood is the acoustic model (HMMs)
Slide 31: Noisy channel example: Automatic Speech Recognition
▪ source model P(w): the Language Model
▪ channel model P(a|w): the Acoustic Model
▪ decoder: given the observed a, find w_best = argmax_w P(w|a) = argmax_w P(a|w) P(w)
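Written out, the decoding rule on this slide, including the Bayes-rule step that lets us drop P(a), which does not depend on w:

```latex
% Bayes' rule, then drop P(a) since it is constant in w:
w_{\text{best}} = \arg\max_{w} P(w \mid a)
                = \arg\max_{w} \frac{P(a \mid w)\, P(w)}{P(a)}
                = \arg\max_{w} \underbrace{P(a \mid w)}_{\text{acoustic model}} \; \underbrace{P(w)}_{\text{language model}}
```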

Slide 32: Noisy channel example: Automatic Speech Recognition
▪ source model P(w): the Language Model; channel model P(a|w): the Acoustic Model
▪ the same candidate transcriptions and scores as on slide 23
▪ the station 's signs are in deep in english

Slide 33: Noisy channel example: Machine Translation
▪ source model P(e): the Language Model
▪ channel model P(f|e): the Translation Model
▪ decoder: given the observed f, find e_best = argmax_e P(e|f) = argmax_e P(f|e) P(e)
▪ sent transmission: English; recovered transmission: French; recovered message: English’
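A minimal sketch of noisy-channel rescoring over an n-best list, in the spirit of the ASR and MT examples above. The function name, the dictionaries, and the toy log-scores are hypothetical illustrations, not values taken from the slides.

```python
def rescore_nbest(hypotheses, channel_logprob, lm_logprob):
    """Noisy-channel decoding over an n-best list:
    pick argmax_w [ log P(a|w) + log P(w) ], i.e. channel score + language model score."""
    return max(hypotheses, key=lambda w: channel_logprob[w] + lm_logprob[w])

# Toy usage with made-up log-scores (illustration only):
nbest = [
    "the station signs are in deep in english",
    "the station 's signs are indeed in english",
]
toy_channel = {nbest[0]: -100.0, nbest[1]: -105.0}   # log P(a|w): acoustic / translation model
toy_lm      = {nbest[0]:  -60.0, nbest[1]:  -20.0}   # log P(w): language model
print(rescore_nbest(nbest, toy_channel, toy_lm))
# -> "the station 's signs are indeed in english": the LM outweighs the small channel penalty
```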

Slide 34

Slide 35: Noisy Channel Examples
▪ speech recognition
▪ machine translation
▪ optical character recognition
▪ spelling and grammar correction
▪ handwriting recognition
▪ document summarization
▪ dialog generation
▪ linguistic decipherment
▪ etc.

Slide 36: Plan
▪ what is language modeling
▪ motivation
▪ how to build an n-gram LM
▪ how to estimate parameters from training data (n-gram probabilities)
▪ how to evaluate (perplexity)
▪ how to select vocabulary, what to do with OOVs (smoothing)

Slide 37: The Language Modeling problem
▪ Assign a probability to every sentence (or any string of words)
▪ finite vocabulary (e.g. words or characters)
▪ infinite set of sequences

Slide 38: A trivial model
▪ Assume we have n training sentences
▪ Let x1, x2, …, xn be a sentence, and c(x1, x2, …, xn) be the number of times it appeared in the training data
▪ Define a language model (a count-based sketch follows below)

Slide 39: A trivial model
▪ Same definition as on slide 38
▪ No generalization!
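A minimal sketch of the trivial model described above, assuming the usual count-based definition (a sentence's probability is its training count divided by the total number of training sentences). Unseen sentences get probability zero, which is exactly the "No generalization!" problem noted on slide 39.

```python
from collections import Counter

def train_trivial_lm(training_sentences):
    """training_sentences: a list of sentences, each a tuple of tokens ending in STOP."""
    counts = Counter(training_sentences)
    total = len(training_sentences)
    # p(sentence) = c(sentence) / number of training sentences; unseen sentences get 0.
    return lambda sentence: counts[sentence] / total

lm = train_trivial_lm([("a", "telescope", "STOP"), ("a", "STOP"), ("a", "STOP")])
print(lm(("a", "STOP")))                 # 2/3: seen twice in three training sentences
print(lm(("the", "telescope", "STOP")))  # 0.0: no generalization to unseen sentences
```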

Slide 40: Markov processes
▪ Markov processes: given a sequence of n random variables
▪ We want a sequence probability model

Slide 41: Markov processes
▪ Markov processes: given a sequence of n random variables
▪ We want a sequence probability model
▪ There are |V|^n possible sequences

Slide 42: First-order Markov process
▪ Chain rule

Slide 43: First-order Markov process
▪ Chain rule
▪ Markov assumption
(Both equations are written out below.)
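The two equations these slides name, written in their standard form (the slide images themselves are not reproduced here; the i = 1 term is just P(X_1 = x_1)):

```latex
% Chain rule (exact, no assumptions):
P(X_1{=}x_1, \dots, X_n{=}x_n) = \prod_{i=1}^{n} P(X_i{=}x_i \mid X_1{=}x_1, \dots, X_{i-1}{=}x_{i-1})
% First-order Markov assumption: each word depends only on the previous one:
P(X_1{=}x_1, \dots, X_n{=}x_n) = \prod_{i=1}^{n} P(X_i{=}x_i \mid X_{i-1}{=}x_{i-1})
```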

Slide 44: Second-order Markov process
▪ Relax the independence assumption:

Slide 45: Second-order Markov process
▪ Relax the independence assumption
▪ Simplify notation (see below)
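The second-order version referenced on these slides, in standard form. The padding convention x_0 = x_{-1} = * is a common choice assumed here, not something stated in the transcribed text:

```latex
% Second-order Markov assumption: condition on the two previous words:
P(X_1{=}x_1, \dots, X_n{=}x_n) = \prod_{i=1}^{n} P(X_i{=}x_i \mid X_{i-2}{=}x_{i-2}, X_{i-1}{=}x_{i-1})
% Simplified notation (assuming the padding convention x_0 = x_{-1} = *):
p(x_1, \dots, x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-2}, x_{i-1})
```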

Slide 46: Detail: variable length
▪ We want a probability distribution over sequences of any length

Slide 47: Detail: variable length
▪ Probability distribution over sequences of any length
▪ Always define Xn = STOP, where STOP is a special symbol

Slide 48: Detail: variable length
▪ Probability distribution over sequences of any length
▪ Always define Xn = STOP, where STOP is a special symbol
▪ Then use a Markov process as before
▪ We now have a probability distribution over all sequences
▪ Intuition: at every step you have probability 𝛽h of stopping (conditioned on history) and (1-𝛽h) of continuing

Slide 49

Slide 50: 3-gram LMs
▪ A trigram language model contains:
  ▪ a vocabulary V
  ▪ non-negative parameters q(w|u,v) for every trigram, such that … (constraint written out below)
  ▪ the probability of a sentence x1, …, xn, where xn = STOP, is … (formula written out below)
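The two statements left unfinished above, written out in the standard form for a trigram language model (an assumption about the slide equations, which are not reproduced in the text):

```latex
% Normalization constraint on the parameters (for every conditioning bigram u, v):
q(w \mid u, v) \ge 0, \qquad \sum_{w \in V \cup \{\mathrm{STOP}\}} q(w \mid u, v) = 1
% Sentence probability, with x_n = STOP and padding x_0 = x_{-1} = *:
p(x_1, \dots, x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-2}, x_{i-1})
```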

Slides 51-53: Example

Slide 54: Limitations?

Slide 55: Limitation
▪ The Markovian assumption is false
▪ We would want to model longer dependencies

Slide 56: Plan
▪ what is language modeling
▪ motivation
▪ how to build n-gram LMs
▪ how to estimate parameters from training data (n-gram probabilities)
▪ how to evaluate (perplexity)
▪ how to select vocabulary, what to do with OOVs (smoothing)

Slide 57: Empirical N-Grams
▪ How do we know P(w | history)?
▪ Use statistics from data (examples using Google N-Grams)
▪ E.g. what is P(door | the)?

Training counts:
  198015222    the first
  194623024    the same
  168504105    the following
  158562063    the world
  …
  14112454     the door
  23135851162  the *

Slide 58: Increasing N-Gram Order
▪ Higher orders capture more dependencies

Bigram counts:
  198015222    the first
  194623024    the same
  168504105    the following
  158562063    the world
  …
  14112454     the door
  23135851162  the *

Trigram counts:
  197302   close the window
  191125   close the door
  152500   close the gap
  116451   close the thread
  87298    close the deal
  …
  3785230  close the *

Bigram model:  P(door | the) = 0.0006
Trigram model: P(door | close the) = 0.05
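A small sketch showing how the probabilities on this slide follow from the counts above as relative frequencies (maximum-likelihood estimates). The counts are the ones printed on the slide; the helper function name is mine.

```python
def mle_conditional(count_history_word, count_history_total):
    """Maximum-likelihood estimate P(word | history) = c(history, word) / c(history, *)."""
    return count_history_word / count_history_total

# Bigram: P(door | the), from the Google N-gram counts above
print(mle_conditional(14112454, 23135851162))   # ~0.0006
# Trigram: P(door | close the)
print(mle_conditional(191125, 3785230))         # ~0.05
```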

Slide 59: Berkeley restaurant project sentences
▪ can you tell me about any good cantonese restaurants close by
▪ mid priced thai food is what i’m looking for
▪ tell me about chez panisse
▪ can you give me a listing of the kinds of food that are available
▪ i’m looking for a good place to eat breakfast
▪ when is caffe venezia open during the day

Slide 60: Bigram counts (~10K sentences)
▪ out of 9,222 sentences

Slide 61: Bigram probabilities

Slide 62: What did we learn
▪ p(English | want) < p(Chinese | want): people like Chinese stuff more, at least in this corpus
▪ p(to | want) = 0.66: English behaves in a certain way
▪ p(eat | to) = 0.28: English behaves in a certain way

Slide 63: Sparseness
▪ Maximum likelihood for estimating q
▪ Let c(w1, …, wn) be the number of times that n-gram appears in a corpus
▪ If the vocabulary has 20,000 words ⇒ the number of parameters is 8 x 10^12!

Slide 64: Sparseness
▪ Maximum likelihood for estimating q (written out below)
▪ Let c(w1, …, wn) be the number of times that n-gram appears in a corpus
▪ If the vocabulary has 20,000 words ⇒ the number of parameters is 8 x 10^12!
▪ Most n-grams will never be observed, even if they are linguistically plausible (Zipf's law)
▪ ⇒ Most sentences will have zero or undefined probabilities
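The maximum-likelihood estimator referred to above, plus the arithmetic behind the 8 x 10^12 figure (standard relative-frequency form):

```latex
% Maximum-likelihood (relative-frequency) estimate of the trigram parameters:
q_{\mathrm{ML}}(w \mid u, v) = \frac{c(u, v, w)}{c(u, v)}
% Parameter count with a 20,000-word vocabulary:
|V|^3 = 20{,}000^3 = 8 \times 10^{12}
```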

Slide 65: Plan
▪ what is language modeling
▪ motivation
▪ how to build n-gram LMs
▪ how to estimate parameters from training data (n-gram probabilities)
▪ how to evaluate (perplexity)
▪ how to select vocabulary, what to do with OOVs (smoothing)

Slide 66: Evaluation
▪ Extrinsic evaluation: build a new language model, use it for some task (MT, ASR, etc.)
▪ Intrinsic evaluation: measure how good we are at modeling language

Slide 67: Intrinsic evaluation
▪ Intuitively, language models should assign high probability to real language they have not seen before
▪ Want to maximize likelihood on test data, not training data
▪ Models derived from counts / sufficient statistics require generalization parameters to be tuned on held-out data to simulate test generalization
▪ Set hyperparameters to maximize the likelihood of the held-out data (usually with grid search or EM)

Data split:
▪ Training data: counts / parameters estimated here
▪ Held-out data: hyperparameters tuned here
▪ Test data: evaluate here

Slide 68: Evaluation: perplexity
▪ Test data: S = {s1, s2, …, s_sent}
▪ Parameters are not estimated from S
▪ Perplexity is the normalized inverse probability of S (formula below)

Slide 69: Evaluation: perplexity
▪ Test data: S = {s1, s2, …, s_sent}
▪ parameters are estimated on training data
▪ sent is the number of sentences in the test data
▪ M is the number of words in the test corpus
▪ A good language model has high p(S) and low perplexity
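The perplexity formula described in words above ("normalized inverse probability of S"), written out with sent and M as defined on slide 69:

```latex
% Perplexity of the test set S = {s_1, ..., s_sent}, where M is the total number of words in S:
\mathrm{perplexity}(S) = p(S)^{-1/M}
                       = \Bigl( \prod_{i=1}^{\text{sent}} p(s_i) \Bigr)^{-1/M}
                       = 2^{\,-\frac{1}{M} \sum_{i=1}^{\text{sent}} \log_2 p(s_i)}
```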

Slide 70: Understanding perplexity
▪ It’s a branching factor
  ▪ assign probability of 1 to the test data ⇒ perplexity = 1
  ▪ assign probability of 1/|V| to every word ⇒ perplexity = |V|
  ▪ assign probability of 0 to anything ⇒ perplexity = ∞
▪ this motivates the proper probability constraint
▪ cannot compare perplexities of LMs trained on different corpora

Slide 71: Typical values of perplexity
▪ When |V| = 50,000
  ▪ trigram model perplexity: 74 (<< 50,000)
  ▪ bigram model: 137
  ▪ unigram model: 955

Slide 72: Plan
▪ what is language modeling
▪ motivation
▪ how to build n-gram LMs
▪ how to estimate parameters from training data (n-gram probabilities)
▪ how to evaluate (perplexity)
▪ how to select vocabulary, what to do with OOVs (smoothing)
▪ better parameter estimation methods

Slide 73: Dealing with Out-of-Vocabulary terms
▪ Define a special OOV or “unknown” symbol unk; transform some (or all) rare words in the training data to unk (see the sketch below)
▪ You cannot fairly compare two language models that apply different unk treatments
▪ Build a language model at the character level
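A minimal sketch of the first strategy above: fix a vocabulary from the training data and map every other word to a special unk token before estimating the model. The frequency threshold and the token spelling "<unk>" are illustrative choices, not prescribed by the slide.

```python
from collections import Counter

def build_vocab(training_tokens, min_count=2):
    """Keep words seen at least min_count times; everything else will become <unk>."""
    counts = Counter(training_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def replace_oov(tokens, vocab, unk="<unk>"):
    """Map out-of-vocabulary tokens to the unk symbol."""
    return [w if w in vocab else unk for w in tokens]

train = "the cat sat on the mat the cat slept".split()
vocab = build_vocab(train)                   # {'the', 'cat'} with min_count=2
print(replace_oov("the dog sat".split(), vocab))   # ['the', '<unk>', '<unk>']
```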

Slide 74: Dealing with sparsity: Smoothing
▪ For most N-grams, we have few observations
▪ General approach: modify observed counts to improve estimates
  ▪ Discounting: allocate probability mass for unobserved events by discounting counts for observed events
  ▪ Interpolation: approximate counts of an N-gram using a combination of estimates from related, denser histories
  ▪ Back-off: approximate counts of an unobserved N-gram based on the proportion of back-off events (e.g., the N-1 gram)

Slide 75: Bias-variance tradeoff
▪ Given a corpus of length M

Slide 76

Slide 77: Linear interpolation
▪ Combine the three models to get all the benefits

Slide 78: Linear interpolation
▪ Need to verify the parameters define a probability distribution (see the formula below)
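The interpolated estimator these slides describe and the constraint needed for it to define a probability distribution, in standard form (the lambdas are the hyperparameters tuned on held-out data, as slide 79 shows):

```latex
% Interpolate the trigram, bigram, and unigram maximum-likelihood estimates:
q(w \mid u, v) = \lambda_1\, q_{\mathrm{ML}}(w \mid u, v) + \lambda_2\, q_{\mathrm{ML}}(w \mid v) + \lambda_3\, q_{\mathrm{ML}}(w)
% For this to define a probability distribution, the coefficients must satisfy:
\lambda_1 + \lambda_2 + \lambda_3 = 1, \qquad \lambda_i \ge 0
```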

Slide 79: Estimating coefficients
▪ Training data: counts / parameters estimated here
▪ Held-out data: hyperparameters tuned here
▪ Test data: evaluate here

Slide 80: Discounting methods
▪ Low count bigrams have high estimates

Slide 81: Discounting methods

Slide 82: Discounting + Backoff
▪ next time: Kneser-Ney Smoothing
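One standard way to combine discounting with backoff, in the spirit of these slides: a Katz-style bigram formulation with absolutely discounted counts. This is a common textbook variant and an assumption here; the exact formulation used in the lecture may differ.

```latex
% Discounted counts leave probability mass for unseen continuations:
c^{*}(v, w) = c(v, w) - d \qquad (\text{e.g. } d = 0.5)
% Backoff estimate: use discounted bigram counts where available, otherwise
% distribute the missing mass alpha(v) over unseen words in proportion to their unigram estimates:
q_{\mathrm{BO}}(w \mid v) =
\begin{cases}
  \dfrac{c^{*}(v, w)}{c(v)} & \text{if } c(v, w) > 0 \\[1.5ex]
  \alpha(v)\, \dfrac{q_{\mathrm{ML}}(w)}{\sum_{w' : c(v, w') = 0} q_{\mathrm{ML}}(w')} & \text{otherwise}
\end{cases}
\qquad \alpha(v) = 1 - \sum_{w : c(v, w) > 0} \frac{c^{*}(v, w)}{c(v)}
```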

Slide 83: Next class
▪ KN smoothing
▪ Efficient LMs
  ▪ relevant to your homework 1