Language Modeling
CSE392 - Spring 2019: Special Topic in CS
Task

- Language Modeling (auto-complete)
- Probabilistic Modeling -- how?
  ○ ML: Logistic Regression
  ○ Probability Theory
Language Modeling

- Assigning a probability to sequences of words.

Version 1: Compute P(w1, w2, w3, w4, w5) = P(W)
  :probability of a sequence of words
  P(He ate the cake with the fork) = ?

Version 2: Compute P(w5 | w1, w2, w3, w4) = P(wn | w1, w2, ..., wn-1)
  :probability of a next word given history
  P(fork | He ate the cake with the) = ?

Applications:

- Auto-complete: What word is next?
- Machine Translation: Which translation is most likely?
- Spell Correction: Which word is most likely given the error?
- Speech Recognition: What did they just say? "eyes aw of an"

(example from Jurafsky, 2017)
Simple Solution: The Maximum Likelihood Estimate

Version 1: Compute P(w1, w2, w3, w4, w5) = P(W) :probability of a sequence of words

  P(He ate the cake with the fork) = count(He ate the cake with the fork) / count(* * * * * * *)

  (denominator: total number of observed 7-grams)

Version 2: Compute P(wn | w1, w2, ..., wn-1) :probability of a next word given history

  P(fork | He ate the cake with the) = count(He ate the cake with the fork) / count(He ate the cake with the)

Problem: even the Web isn't large enough to enable good estimates of most phrases.
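To make the sparsity problem concrete, here is a minimal Python sketch (the toy corpus and whitespace tokenization are illustrative assumptions, not from the slides) that estimates a whole-sequence probability by raw 7-gram counts; on any realistic corpus the numerator for most queries would be zero.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus; a real estimate would need an enormous corpus, and even then
# most 7-grams would never be observed.
corpus = "he ate the cake with the fork . she ate the soup with a spoon .".split()

seven_grams = Counter(ngrams(corpus, 7))
total = sum(seven_grams.values())   # count(* * * * * * *): all observed 7-grams

query = tuple("he ate the cake with the fork".split())
p_mle = seven_grams[query] / total  # count(query 7-gram) / total observed 7-grams
print(p_mle)                        # nonzero only if the exact 7-gram occurred
```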
Solution: Estimate from shorter sequences, using more sophisticated probability theory.

P(B|A) = P(B, A) / P(A)  ⇔  P(A)P(B|A) = P(B, A) = P(A, B)

P(A, B, C) = P(A)P(B|A)P(C|A, B)

The Chain Rule:

P(X1, X2, ..., Xn) = P(X1)P(X2|X1)P(X3|X1, X2)...P(Xn|X1, ..., Xn-1)

Markov Assumption:

P(Xn | X1, ..., Xn-1) ≈ P(Xn | Xn-k, ..., Xn-1), where k < n

(Example from Jurafsky, 2017)
What about Logistic Regression? Y = next word; P(Y|X) = P(Xn | X1, X2, X3, ...). Not a terrible option, but X1 through Xn-1 would be modeled as independent dimensions. Let's revisit this later.
Unigram Model: k = 0
Bigram Model: k = 1

(Example from Jurafsky, 2017)
Example generated sentence:
- outside, new, car, parking, lot, of, the, agreement, reached

P(X1 = "outside", X2 = "new", X3 = "car", ...) ≈ P(X1 = "outside") * P(X2 = "new" | X1 = "outside") * P(X3 = "car" | X2 = "new") * ...
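A minimal sketch of that bigram (Markov) factorization; every probability value below is a made-up placeholder for illustration, not an estimate from any corpus.

```python
# Hypothetical bigram model: P(first word) and P(next | previous).
p_first = {"outside": 1e-4}
bigram_prob = {
    ("outside", "new"): 2e-3,
    ("new", "car"): 1e-2,
    ("car", "parking"): 5e-3,
}

words = ["outside", "new", "car", "parking"]

# P(X1) * P(X2|X1) * P(X3|X2) * ...
p = p_first[words[0]]
for prev, cur in zip(words, words[1:]):
    p *= bigram_prob[(prev, cur)]
print(p)  # joint probability under the bigram (Markov) approximation
```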
Language Modeling

Building a model (or system / API) that can answer the following:

[Diagram: a sequence of natural language → Language Model → "How common is this sequence? What is the next word in the sequence?"]

How to build?

[Diagram: Training Corpus → training (fit, learn) → Language Model]

Bigram Counts

[Table: bigram counts from the training corpus; rows = first word (Xi-1), columns = second word (Xi). Example from (Jurafsky, 2017)]

Bigram model: Need to estimate: P(Xi | Xi-1) = count(Xi-1 Xi) / count(Xi-1)

P(Xi | Xi-1)

[Table: estimated bigram probabilities P(Xi | Xi-1); rows = first word (Xi-1), columns = second word (Xi). Example from (Jurafsky, 2017)]
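A minimal sketch of this estimation step (the toy corpus and whitespace tokenization are assumptions): build unigram and bigram counts over the training corpus, then form P(Xi | Xi-1) = count(Xi-1 Xi) / count(Xi-1).

```python
from collections import Counter

# Toy training corpus (one sentence per line); a real corpus would be far larger.
sentences = [
    "i want to eat chinese food",
    "i want to eat lunch",
    "i want chinese food",
]

unigram_counts = Counter()
bigram_counts = Counter()
for s in sentences:
    tokens = s.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def p_bigram(cur, prev):
    """MLE estimate P(Xi = cur | Xi-1 = prev) = count(prev cur) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / unigram_counts[prev]

print(p_bigram("to", "want"))       # 2/3
print(p_bigram("chinese", "want"))  # 1/3
```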
Language Modeling

Building a model (or system / API) that can answer the following:

[Diagram: a sequence of natural language → Trained Language Model → "How common is this sequence? What is the next word in the sequence?" (example next word: "food")]

[Diagram: Training Corpus → training (fit, learn) → Trained Language Model]

Test: Feed the model X1...Xi-1 and see how well it predicts Xi.

[Diagram: Test Corpus → Trained Language Model → Perplexity]
Evaluation

[Diagram: a sequence of natural language from the Test Corpus → Trained Language Model → What is the next word in the sequence?]

Perplexity: PP(W) = P(w1, w2, ..., wN)^(-1/N)

Apply the Chain Rule:

PP(W) = ( ∏i 1 / P(wi | w1, ..., wi-1) )^(1/N)

Thus, PP for bigrams:

PP(W) = ( ∏i 1 / P(wi | wi-1) )^(1/N)
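A minimal sketch of computing bigram perplexity over a test sequence, working in log space; it reuses the hypothetical p_bigram estimator from the earlier counting sketch, and real code would also need smoothing so that no term is zero.

```python
import math

def perplexity(tokens, cond_prob):
    """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), computed in log space."""
    log_sum = 0.0
    n = 0
    for prev, cur in zip(tokens, tokens[1:]):
        p = cond_prob(cur, prev)
        if p == 0.0:
            return float("inf")  # an unseen bigram makes unsmoothed perplexity infinite
        log_sum += math.log(p)
        n += 1
    return math.exp(-log_sum / n)

# Example (assumes the p_bigram function from the earlier counting sketch):
# print(perplexity("i want to eat lunch".split(), p_bigram))
```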
Coding Example: Modeling Tweets from POS data
1. Count unigrams, bigrams, and trigrams.
2. Train probabilities for the unigram, bigram, and trigram models (over the training set).
3. Generate language: use the trigram model when there is good evidence (high counts), backing off to the bigram or even the unigram model otherwise (see the sketch after this list).
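Here is a minimal sketch of that count-based backoff generation; the count-table names, the evidence threshold, and the sampling scheme are assumptions for illustration, not the course's reference implementation.

```python
import random
from collections import Counter

# Assumed to be filled in steps 1-2: unigram keys are tokens,
# bigram/trigram keys are tuples of tokens.
unigram_counts: Counter = Counter()
bigram_counts: Counter = Counter()
trigram_counts: Counter = Counter()

MIN_COUNT = 3  # "good evidence" threshold -- an assumption, tune as needed

def next_word(w1, w2):
    """Sample the next word, backing off trigram -> bigram -> unigram."""
    # Trigram candidates following the context (w1, w2)
    tri = {w3: c for (a, b, w3), c in trigram_counts.items() if (a, b) == (w1, w2)}
    if sum(tri.values()) >= MIN_COUNT:
        return random.choices(list(tri), weights=list(tri.values()))[0]
    # Back off to bigram context (w2)
    bi = {w3: c for (b, w3), c in bigram_counts.items() if b == w2}
    if sum(bi.values()) >= MIN_COUNT:
        return random.choices(list(bi), weights=list(bi.values()))[0]
    # Back off to the unigram distribution
    return random.choices(list(unigram_counts), weights=list(unigram_counts.values()))[0]
```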
Practical Considerations:

- Use log probabilities to keep numbers reasonable and to save computation (addition rather than multiplication).
- Out-of-vocabulary (OOV) words: choose a minimum frequency and mark rarer words as <OOV>.
- Sentence start and end markers: <s> this is a sentence </s>
- An alternative to backoff: interpolation.
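A minimal sketch combining these practicalities (vocabulary threshold, <OOV>, <s>/</s> markers, and log-space scoring); the names, threshold, and the bigram scorer it expects are illustrative assumptions.

```python
import math
from collections import Counter

MIN_FREQ = 2  # words seen fewer than this many times become <OOV>

def build_vocab(sentences):
    """Keep only words that meet the minimum frequency."""
    counts = Counter(w for s in sentences for w in s.split())
    return {w for w, c in counts.items() if c >= MIN_FREQ}

def preprocess(sentence, vocab):
    """Add sentence boundaries and map rare/unknown words to <OOV>."""
    return ["<s>"] + [w if w in vocab else "<OOV>" for w in sentence.split()] + ["</s>"]

def log_prob(tokens, p_bigram):
    """Sum of log P(w_i | w_{i-1}): addition in log space replaces multiplication."""
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = p_bigram(cur, prev)
        total += math.log(p) if p > 0 else float("-inf")
    return total
```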
Zeros and Smoothing

Bigrams that never occur in the training corpus get zero counts, so their unsmoothed MLE probabilities are zero.

[Table: unsmoothed bigram probabilities P(Xi | Xi-1), many entries zero; rows = first word (Xi-1), columns = second word (Xi). Example from (Jurafsky, 2017)]

Laplace ("Add one") smoothing: add 1 to all counts.

[Table: add-one smoothed bigram counts and probabilities. Example from (Jurafsky, 2017)]

P_add-1(Xi | Xi-1) = (count(Xi-1 Xi) + 1) / (count(Xi-1) + V), where V = vocabulary size
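A minimal sketch of add-one smoothing on top of count tables like those built in the earlier sketches (the table names repeat those earlier assumptions).

```python
def p_add1(cur, prev, bigram_counts, unigram_counts):
    """Laplace ("add one") smoothed bigram probability:
    (count(prev cur) + 1) / (count(prev) + V), with V = vocabulary size.
    Count tables are assumed to be collections.Counter objects, so a missing
    bigram simply counts as 0 before the +1 is added."""
    V = len(unigram_counts)  # number of distinct word types
    return (bigram_counts[(prev, cur)] + 1) / (unigram_counts[prev] + V)

# e.g. p_add1("lunch", "want", bigram_counts, unigram_counts) is nonzero
# even if "want lunch" never appeared in training.
```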
Why Smoothing? Generalizes

[Table: original bigram counts vs. counts reconstituted after smoothing. (Example from Jurafsky / originally Dan Klein)]

Add-one smoothing is blunt: it can lead to very large changes in the estimates. Better smoothing methods:

- Good-Turing
- Kneser-Ney

These are outside the scope of this course because we will eventually cover even stronger, deep-learning-based models.
Language Modeling Summary
- Two versions of assigning a probability to a sequence of words
- Applications
- The Chain Rule and the Markov Assumption
- Training a unigram, bigram, trigram model based on counts
- Evaluation: Perplexity
- Zeros, Low Counts, and Generalizability
- Add-one smoothing