SLIDE 1

CS 6956: Deep Learning for NLP

Language Modeling

SLIDE 2

Overview

  • What is a language model?
  • How do we evaluate language models?
  • Traditional language models
  • Feedforward neural networks for language modeling
  • Recurrent neural networks for language modeling

SLIDE 4

Language models

What is the probability of a sentence?

  – Grammatically incorrect or rare sentences should be less probable
  – Or equivalently, what is the probability of a word following a sequence of words?

“The cat chased a mouse” vs. “The cat chased a turnip”

Can be framed as a sequence modeling task. Two classes of models:

  – Count-based: Markov assumptions with smoothing
  – Neural models

We have seen this difference before. In this lecture, we will look at some details.

SLIDE 7

Overview

  • What is a language model?
  • How do we evaluate language models?
  • Traditional language models
  • Feedforward neural networks for language modeling
  • Recurrent neural networks for language modeling

SLIDE 8

Evaluating language models

Extrinsic evaluation

  • A good language model should help with an end task such as machine translation
    – If we have an MT system that uses language models to produce outputs…
    – …a better language model can produce better outputs
  • Do we need a downstream task just to evaluate a language model?
    – Can be slow, and depends on the quality of the downstream system

Can we define an intrinsic evaluation?

SLIDE 11

What is a good language model?

  • Should prefer good sentences to bad ones
    – It should assign higher probabilities to valid/grammatical/frequent sentences
    – It should assign lower probabilities to invalid/ungrammatical/rare sentences
  • Can we construct an evaluation metric that directly measures this?

Answer: Perplexity

SLIDE 13

Perplexity

A good language model should assign high probability to sentences that occur in the real world

  – Need a metric that captures this intuition, but normalizes for the length of sentences

Given a sentence $x_1 x_2 x_3 \cdots x_n$, define the perplexity of a language model as

$$\mathrm{Perplexity} = P(x_1 x_2 x_3 \cdots x_n)^{-\frac{1}{n}}$$

Lower perplexity corresponds to higher probability

SLIDE 16

Example: Uniformly likely words

Suppose we have n words in a sentence, and they are all independent and uniformly distributed over the vocabulary V

  – Would be a strange language…

$$\mathrm{Perplexity} = P(x_1 x_2 x_3 \cdots x_n)^{-\frac{1}{n}} = \left(\left(\tfrac{1}{|V|}\right)^{n}\right)^{-\frac{1}{n}} = |V|$$

SLIDE 17

Perplexity of history based models

Given a sentence $x_1 x_2 x_3 \cdots x_n$, define the perplexity of a language model as

$$\mathrm{Perplexity} = P(x_1 x_2 x_3 \cdots x_n)^{-\frac{1}{n}}$$

For a history based model, we have $P(x_1 \cdots x_n) = \prod_i P(x_i \mid x_{1:i-1})$, so

$$\mathrm{Perplexity} = \left(\prod_i P(x_i \mid x_{1:i-1})\right)^{-\frac{1}{n}}$$

$$\mathrm{Perplexity} = 2^{\log_2 \left(\prod_i P(x_i \mid x_{1:i-1})\right)^{-\frac{1}{n}}}$$

$$\mathrm{Perplexity} = 2^{-\frac{1}{n} \sum_i \log_2 P(x_i \mid x_{1:i-1})}$$

The exponent is the average number of bits needed to encode the sentence
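A minimal sketch of this computation in Python, using the log form to avoid numerical underflow (the function name and inputs are illustrative, not from the slides):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence, given P(x_i | x_{1:i-1}) for each token.

    token_probs: a list of conditional probabilities, one per token.
    Uses 2 ** (-1/n * sum_i log2 P(x_i | x_{1:i-1})) rather than the direct product.
    """
    n = len(token_probs)
    avg_bits = -sum(math.log2(p) for p in token_probs) / n
    return 2 ** avg_bits

# Five words, each with conditional probability 1/10: perplexity is 10
print(perplexity([0.1] * 5))  # ≈ 10.0, matching the uniform-words example
```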

SLIDE 22

Evaluating language models

Several benchmark sets are available

  – Penn Treebank Wall Street Journal corpus
    • Standard preprocessing by Mikolov
    • Vocabulary size: 10K words
    • Training size: 890K tokens
  – Billion Word Benchmark
    • English news text [Chelba et al., 2013]
    • Vocabulary size: ~793K
    • Training size: ~800M tokens

Standard methodology: train on the training set and evaluate on the test set

  – Some papers also continue training on the evaluation set, since no labels are needed

SLIDE 23

Overview

  • What is a language model?
  • How do we evaluate language models?
  • Traditional language models
  • Feedforward neural networks for language modeling
  • Recurrent neural networks for language modeling

SLIDE 24

Traditional language models

The goal: to compute $P(x_1 x_2 \cdots x_n)$ for any sequence of words

The (k+1)th order Markov assumption:

$$P(x_1 x_2 \cdots x_n) \approx \prod_i P(x_{i+1} \mid x_{i-k:i})$$

Each conditional probability needs to be estimated from data, by counting n-grams:

$$P(x_{i+1} \mid x_{i-k:i}) = \frac{\mathrm{count}(x_{i-k:i}, x_{i+1})}{\mathrm{count}(x_{i-k:i})}$$

The problem: zeros in the counts. The solution: smoothing

Many different methods for smoothing exist, e.g., additive smoothing with vocabulary V and a constant $\alpha$:

$$P(x_{i+1} \mid x_{i-k:i}) = \frac{\mathrm{count}(x_{i-k:i}, x_{i+1}) + \alpha}{\mathrm{count}(x_{i-k:i}) + \alpha|V|}$$

The current state-of-the-art non-neural smoothing method is modified Kneser-Ney smoothing
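A minimal sketch of such a count-based model with additive smoothing, assuming whitespace-tokenized sentences (the class and parameter names are illustrative):

```python
from collections import Counter

class AdditiveNgramLM:
    """Count-based language model with additive smoothing over (k+1)-grams."""

    def __init__(self, k=2, alpha=1.0):
        self.k = k                      # condition on the previous k words
        self.alpha = alpha              # additive smoothing constant
        self.context_counts = Counter()
        self.ngram_counts = Counter()
        self.vocab = set()

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>"] * self.k + sent.split() + ["</s>"]
            self.vocab.update(tokens)
            for i in range(self.k, len(tokens)):
                context = tuple(tokens[i - self.k:i])
                self.context_counts[context] += 1
                self.ngram_counts[context + (tokens[i],)] += 1

    def prob(self, word, context):
        """P(word | context) with additive smoothing."""
        context = tuple(context)
        num = self.ngram_counts[context + (word,)] + self.alpha
        den = self.context_counts[context] + self.alpha * len(self.vocab)
        return num / den

lm = AdditiveNgramLM(k=1)
lm.train(["the cat chased a mouse", "the cat chased a ball"])
print(lm.prob("mouse", ["a"]))   # seen n-gram: relatively high probability
print(lm.prob("turnip", ["a"]))  # unseen word: nonzero only because of smoothing
```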

SLIDE 32

Traditional language models

  • Pros:
    – Easy to train
    – Can scale to large corpora (with careful choice of algorithms)
      • Heafield et al. have written about this extensively
    – Work reasonably well
  • Cons:
    – Smoothing techniques are tricky to implement or modify
      • Need to implement backoff, etc.
    – Scaling to larger n-grams is expensive
    – Need to have seen words to generalize
      • After seeing “red ties” and “green ties”, we want to assign high probability to “blue ties”

SLIDE 34

Evaluation (perplexity)

  • Penn Treebank
    – Kneser-Ney 5-gram: 140 ppl
  • Billion Word Corpus
    – Kneser-Ney 5-gram: 67.6 ppl

SLIDE 35

Overview

  • What is a language model?
  • How do we evaluate language models?
  • Traditional language models
  • Feedforward neural networks for language modeling
  • Recurrent neural networks for language modeling

SLIDE 36

Feedforward neural language model [Bengio et al., 2003]

  • Input: A sequence of k words $x_{1:k}$ in a window
  • Output: A probability distribution over the next word

The computation, from input to output:

  – Embed each word $x_1, x_2, \ldots, x_k$
  – Concatenate the embeddings to get $\mathbf{x}$
  – Hidden layer: $\mathbf{h} = g(\mathbf{x}\mathbf{W}^1 + \mathbf{b}^1)$
  – Output: $\mathrm{softmax}(\mathbf{h}\mathbf{W}^2 + \mathbf{b}^2) = P(x_{k+1} \mid x_{1:k})$
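A minimal PyTorch sketch of this architecture (the dimensions, names, and the tanh nonlinearity are illustrative choices, not from the slides):

```python
import torch
import torch.nn as nn

class FeedforwardLM(nn.Module):
    """Feedforward language model in the style of Bengio et al. (2003)."""

    def __init__(self, vocab_size, k=4, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)     # word embedding matrix
        self.hidden = nn.Linear(k * embed_dim, hidden_dim)   # W^1, b^1
        self.output = nn.Linear(hidden_dim, vocab_size)      # W^2, b^2

    def forward(self, context):
        # context: (batch, k) word indices for the k-word window
        x = self.embed(context).flatten(start_dim=1)   # embed each word, then concatenate
        h = torch.tanh(self.hidden(x))                 # h = g(xW^1 + b^1)
        return torch.log_softmax(self.output(h), dim=-1)  # log P(x_{k+1} | x_{1:k})

# Usage: next-word log-probabilities for a batch of two 4-word contexts
model = FeedforwardLM(vocab_size=10000, k=4)
contexts = torch.randint(0, 10000, (2, 4))
log_probs = model(contexts)   # shape (2, 10000)
```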

SLIDE 43

Feedforward neural language model

  • Training data
    – k-grams from a corpus
    – Vocabulary includes all words in the training data
      • Plus extra symbols for unknown words and for the start and end of sentences
  • Trained with backpropagation (see the sketch after this slide)
  • Parameters:
    – The word embedding matrix
    – The W’s and b’s
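A minimal training-loop sketch for the FeedforwardLM class sketched earlier, using dummy data in place of real k-grams (all hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

# Assumes the FeedforwardLM class from the earlier sketch is in scope.
model = FeedforwardLM(vocab_size=10000, k=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()   # the model outputs log-probabilities, so NLL = cross-entropy

contexts = torch.randint(0, 10000, (32, 4))     # a batch of k-grams from the corpus
next_words = torch.randint(0, 10000, (32,))     # the word that follows each k-gram

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(contexts), next_words)
    loss.backward()   # backpropagation through the W's, b's, and the embedding matrix
    optimizer.step()
```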

SLIDE 44

Computational shortcuts

  • The final softmax $\mathrm{softmax}(\mathbf{h}\mathbf{W}^2 + \mathbf{b}^2)$ is over the entire vocabulary
    – Can be slow
  • Solutions (a related library shortcut is sketched after this slide):
    – Hierarchical softmax: an approximation that structures the softmax computation as traversing a tree with the |V| words at its leaves
      • O(log |V|) instead of O(|V|)
    – Noise contrastive estimation: replacing the softmax with a binary classifier (as we saw with word2vec)
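PyTorch does not ship a hierarchical softmax, but its adaptive softmax (nn.AdaptiveLogSoftmaxWithLoss, after Grave et al., 2016) is a closely related shortcut that also avoids a full |V|-way softmax; a minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# Frequent words go in a small "head" cluster; rarer words in cheaper "tail" clusters.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=128, n_classes=10000, cutoffs=[100, 1000]
)

hidden = torch.randn(32, 128)                 # hidden states for a batch of positions
next_words = torch.randint(0, 10000, (32,))   # the observed next words
out = adaptive(hidden, next_words)
print(out.loss)                               # average negative log-likelihood of the batch
```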

SLIDE 45

Feedforward neural language model

  • Pros:
    – Better perplexity
    – Scales better to larger n-grams
    – Flexible architecture that admits skip-grams, etc.
  • Cons:
    – Computationally expensive
    – Doesn’t improve translation quality over a Kneser-Ney smoothed model
      • Perhaps because it over-generalizes
      • Example: after seeing “yellow bananas” and “green bananas”, it may assign a high probability to “blue bananas”
      • The rigidity of a traditional language model may be preferred

SLIDE 47

Evaluation (perplexity)

  • Penn Treebank
    – Kneser-Ney 5-gram: 140 ppl
  • Billion Word Corpus
    – Kneser-Ney 5-gram: 67.6 ppl
    – Hierarchical softmax + 4-gram: 101.3 ppl

SLIDE 48

Overview

  • What is a language model?
  • How do we evaluate language models?
  • Traditional language models
  • Feedforward neural networks for language modeling
  • Recurrent neural networks for language modeling

SLIDE 49

Recurrent neural network language model [starting with Mikolov 2010-]

  • We are modeling a sequence of words
    – Let us use a sequence model for this
  • Can use any variant of an RNN (a minimal LSTM sketch follows this slide)
    – Vanilla RNN + gradient clipping [Mikolov]
    – LSTM, GRU units
  • Can also include context from previous sentences, or a topic from the document
    – In both cases, as the initial state or as part of the input for each word
  • We could even model language as a sequence of characters
    – Or a combination
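A minimal PyTorch sketch of a word-level LSTM language model (dimensions and names are illustrative):

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Word-level LSTM language model: predicts the next word at every position."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word indices; state carries history across calls
        emb = self.embed(tokens)
        hidden, state = self.lstm(emb, state)     # hidden: (batch, seq_len, hidden_dim)
        logits = self.output(hidden)              # one distribution per position
        return torch.log_softmax(logits, dim=-1), state

model = RNNLanguageModel(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 7))
log_probs, _ = model(tokens)   # log P(x_{t+1} | x_{1:t}) at every t; shape (2, 7, 10000)
```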

SLIDE 50

Samples from a language model [Mikolov et al., 2010], Penn Treebank

Kneser-Ney 5-gram:

  mr. rosen contends that vaccine deficit nearby in benefit plans to take and william gray but his capital-gains provision rural business buoyed by improved <unk> so <unk> that <unk> up <unk> progresss pending went into nielsen visited were issued soaring searching for an equity giving a chance affecting price after-tax legislator board closed down N cents

RNN language model:

  meanwhile american brands issued a new restructuring mix to <unk> from continuing operations in the west the stock over the most results of this is very low because he could n’t develop the peter <unk> chief executive officer says the family ariz. is left get to be working with the dollar

Note: these are perhaps cherry-picked examples; perplexity or extrinsic evaluations matter more

SLIDE 53

Evaluation (perplexity)

  • Penn Treebank
    – Kneser-Ney 5-gram: 147.8
    – Vanilla RNN 4-gram [Mikolov & Zweig 2012]: 142.1
    – Vanilla RNN 4-gram + topic model [Mikolov & Zweig 2012]: 126.4
    – LSTM [Zaremba et al. 2014]: 82.7
    – Variational LSTM [Gal & Ghahramani 2016]: 78.6
    – Other variants of LSTM significantly improve results:
      • AWD-LSTM + ensemble: 54.44
      • AWD-LSTM + ensemble + dynamic evaluation (i.e., test set adaptation): 47.69

SLIDE 54

Evaluation (perplexity)

  • Penn Treebank
    – Kneser-Ney 5-gram: 140
    – Vanilla RNN 4-gram [Mikolov & Zweig 2012]: 142.1
    – Vanilla RNN 4-gram + topic model [Mikolov & Zweig 2012]: 126.4
    – LSTM [Zaremba et al. 2014]: 82.7
    – Variational LSTM [Gal & Ghahramani 2016]: 78.6
    – Other variants of LSTM significantly improve results:
      • AWD-LSTM + ensemble: 54.44
      • AWD-LSTM + ensemble + test set adaptation: 47.69
  • Billion Word Corpus
    – Kneser-Ney 5-gram: 67.6
    – Hierarchical softmax + 4-gram: 101.3
    – Vanilla RNN 9-gram: 51.3
    – LSTM [Jozefowicz et al. 2016, Grave et al. 2016]: ~43.7
    – Best LSTM variant today: 24.3

SLIDE 55

Examples from a character-level RNN

A 3-layer RNN with 512 hidden units, trained on Shakespeare; text is sampled one character at a time, and each sampled character becomes the next input

https://karpathy.github.io/2015/05/21/rnn-effectiveness/
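A minimal sketch of that sampling loop, assuming a character-level model with the same (log_probs, state) interface as the word-level LSTM sketched earlier (all names are illustrative):

```python
import torch

def sample(model, char2idx, idx2char, prime="ROMEO:", length=200):
    """Generate text one character at a time, feeding each sample back in as the next input."""
    model.eval()
    with torch.no_grad():
        # Run the priming text through the model to build up the hidden state
        tokens = torch.tensor([[char2idx[c] for c in prime]])
        log_probs, state = model(tokens, None)
        out = list(prime)
        for _ in range(length):
            # Sample the next character from the distribution at the last position
            next_idx = torch.multinomial(log_probs[0, -1].exp(), num_samples=1)
            out.append(idx2char[next_idx.item()])
            # The sampled character becomes the next input
            log_probs, state = model(next_idx.view(1, 1), state)
    return "".join(out)
```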

SLIDE 56

What do we get by using an LSTM/GRU?

The hidden representation can remember where we are in the text

  – Can remember different aspects of this
  – Doesn’t have to remember only histories

SLIDE 57

Examples of LSTM hidden state in a language model

Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. "Visualizing and understanding recurrent networks." arXiv preprint arXiv:1506.02078 (2015).

SLIDE 58

Summary: Language models

  • Goal:
    – Probabilities of sentences
    – Various uses; for example, can be used to rank generated text as being valid or not
  • Two broad classes of approaches
    – Traditional language models: based on counts of words in context
    – Neural language models: today, driven by RNNs
    – Both need a lot of data to train
  • Evaluated using perplexity
    – Currently, neural language models seem to be the best