

SLIDE 1

CS 4650/7650: Natural Language Processing

Language Modeling

Diyi Yang

1

Some slides borrowed from Yulia Tsvetkov at CMU and Kai-Wei Chang at UCLA

SLIDE 2

Logistics

- HW 1 Due
- HW 2 Out: Feb 3rd, 2020, 3:00pm

2

SLIDE 3

Piazza & Office Hours

- ~11 min response time

3

SLIDE 4

Review

- L2: Text classification
- L3: Neural networks for text classification

4

SLIDE 5

This Lecture

- Language Models
- What are N-gram models
- How to use probabilities

5

SLIDE 6

This Lecture

- What is the probability of “I like Georgia Tech at Atlanta”?
- What is the probability of “like I Atlanta at Georgia Tech”?

6

SLIDE 7

Language Models Play the Role of …

- A judge of grammaticality
- A judge of semantic plausibility
- An enforcer of stylistic consistency
- A repository of knowledge (?)

7

SLIDE 8

The Language Modeling Problem

- Assign a probability to every sentence (or any string of words)
- Finite vocabulary (e.g., words or characters): {the, a, telescope, …}
- Infinite set of sequences:
  - A telescope STOP
  - A STOP
  - The the the STOP
  - I saw a woman with a telescope STOP
  - STOP
  - …

8

SLIDE 9

Example

- P(disseminating so much currency STOP) is orders of magnitude smaller than P(spending so much currency STOP)

9

SLIDE 10

What Is A Language Model?

- Probability distributions over sentences (i.e., word sequences)

  P(W) = P(w_1 w_2 w_3 w_4 … w_n)

- Can use them to generate strings

  P(w_n | w_1 w_2 w_3 … w_{n-1})

- Rank possible sentences
  - P(“Today is Tuesday”) > P(“Tuesday Today is”)
  - P(“Today is Tuesday”) > P(“Today is Atlanta”)

10

SLIDE 11

Language Model Applications

- Machine Translation
  - p(strong winds) > p(large winds)
- Spell Correction
  - The office is about 15 minutes from my house
  - p(15 minutes from my house) > p(15 minuets from my house)
- Speech Recognition
  - p(I saw a van) >> p(eyes awe of an)
- Summarization, question answering, handwriting recognition, etc.

11

SLIDE 12

Language Model Applications

12

SLIDE 13

Language Model Applications

Language generation

https://pdos.csail.mit.edu/archive/scigen/

13

SLIDE 14

Bag-of-Words with N-grams

- N-gram: a contiguous sequence of n tokens from a given piece of text

http://recognize-speech.com/language-model/n-gram-model/comparison

14

SLIDE 15

N-grams Models

- Unigram model: p(w_1) p(w_2) p(w_3) … p(w_n)
- Bigram model: p(w_1) p(w_2 | w_1) p(w_3 | w_2) … p(w_n | w_{n-1})
- Trigram model: p(w_1) p(w_2 | w_1) p(w_3 | w_2, w_1) … p(w_n | w_{n-1}, w_{n-2})
- N-gram model: p(w_1) p(w_2 | w_1) … p(w_n | w_{n-1}, w_{n-2}, …, w_{n-N+1})
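Not part of the original slides: a minimal Python sketch of how these factorizations score a sentence, using made-up probability tables purely for illustration.

```python
# Toy probability tables, invented for illustration; a real model would
# estimate these from a corpus (see the MLE slides later in the deck).
unigram_p = {"today": 0.05, "is": 0.10, "tuesday": 0.01}
bigram_p = {("today", "is"): 0.4, ("is", "tuesday"): 0.05}

def unigram_score(words):
    """p(w_1) p(w_2) ... p(w_n) under the unigram model."""
    prob = 1.0
    for w in words:
        prob *= unigram_p.get(w, 0.0)
    return prob

def bigram_score(words):
    """p(w_1) p(w_2 | w_1) ... p(w_n | w_{n-1}) under the bigram model."""
    prob = unigram_p.get(words[0], 0.0)
    for prev, cur in zip(words, words[1:]):
        prob *= bigram_p.get((prev, cur), 0.0)
    return prob

print(unigram_score(["today", "is", "tuesday"]))  # 0.05 * 0.10 * 0.01 = 5e-05
print(bigram_score(["today", "is", "tuesday"]))   # 0.05 * 0.4  * 0.05 = 0.001
```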

15

SLIDE 16

The Language Modeling Problem

- Assign a probability to every sentence (or any string of words)
- Finite vocabulary (e.g., words or characters)
- Infinite set of sequences

  ∑_{x ∈ Σ*} p(x) = 1
  p(x) ≥ 0, ∀ x ∈ Σ*

16

SLIDE 17

A Trivial Model

- Assume we have N training sentences
- Let x_1, x_2, …, x_n be a sentence, and c(x_1, x_2, …, x_n) be the number of times it appeared in the training data
- Define a language model:

  p(x_1, x_2, …, x_n) = c(x_1, x_2, …, x_n) / N

- No generalization!
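A sketch (not in the slides) of this trivial model: it just normalizes raw sentence counts, so any sentence that never occurred in training gets probability zero.

```python
from collections import Counter

# A tiny invented "training set" of whole sentences.
training_sentences = [
    "the dog barks STOP",
    "the dog barks STOP",
    "a telescope STOP",
    "i saw a woman with a telescope STOP",
]
counts = Counter(training_sentences)
N = len(training_sentences)

def trivial_p(sentence):
    # p(x_1, ..., x_n) = c(x_1, ..., x_n) / N
    return counts[sentence] / N

print(trivial_p("the dog barks STOP"))  # 0.5
print(trivial_p("the cat barks STOP"))  # 0.0 -- no generalization
```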

17

SLIDE 18

Markov Processes

- Markov processes:
  - Given a sequence of n random variables X_1, X_2, …, X_n (e.g., n = 100), X_i ∈ V
  - We want a sequence probability model

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)

18

SLIDE 19

Markov Processes

- Markov processes:
  - Given a sequence of n random variables X_1, X_2, …, X_n, X_i ∈ V
  - We want a sequence probability model

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)

- There are |V|^n possible sequences

19

SLIDE 20

First-order Markov Processes

- Chain rule:

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)
    = P(X_1 = x_1) ∏_{i=2}^{n} P(X_i = x_i | X_1 = x_1, …, X_{i-1} = x_{i-1})

20

SLIDE 21

First-order Markov Processes

- Chain rule:

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)
    = P(X_1 = x_1) ∏_{i=2}^{n} P(X_i = x_i | X_1 = x_1, …, X_{i-1} = x_{i-1})
    = P(X_1 = x_1) ∏_{i=2}^{n} P(X_i = x_i | X_{i-1} = x_{i-1})    (Markov assumption)

21

SLIDE 22

First-order Markov Processes

- Chain rule:

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)
    = P(X_1 = x_1) ∏_{i=2}^{n} P(X_i = x_i | X_1 = x_1, …, X_{i-1} = x_{i-1})
    = P(X_1 = x_1) ∏_{i=2}^{n} P(X_i = x_i | X_{i-1} = x_{i-1})    (Markov assumption)

22

SLIDE 23

First-order Markov Processes

23

SLIDE 24

Second-order Markov Processes

- P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)
    = P(X_1 = x_1) × P(X_2 = x_2 | X_1 = x_1) × ∏_{i=3}^{n} P(X_i = x_i | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
- Simplify notation: x_0 = x_{-1} = *
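An illustrative sketch (not on the slide) of the x_0 = x_{-1} = * convention: pad the sentence with two * symbols so that every position has a full two-word history.

```python
def trigram_contexts(words):
    """Yield (x_{i-2}, x_{i-1}, x_i) triples, using '*' for x_0 and x_{-1}."""
    padded = ["*", "*"] + list(words)
    for i in range(2, len(padded)):
        yield padded[i - 2], padded[i - 1], padded[i]

print(list(trigram_contexts(["the", "dog", "barks", "STOP"])))
# [('*', '*', 'the'), ('*', 'the', 'dog'), ('the', 'dog', 'barks'), ('dog', 'barks', 'STOP')]
```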

24

SLIDE 25

Details: Variable Length

- We want a probability distribution over sequences of any length

25

SLIDE 26

Details: Variable Length

- Always define x_n = STOP, where STOP is a special symbol
- Then use a Markov process as before:

  P(X_1 = x_1, X_2 = x_2, …, X_n = x_n) = ∏_{i=1}^{n} P(X_i = x_i | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})

- We now have a probability distribution over all sequences
- Intuition: at every step there is some probability of generating STOP (and stopping), and the complementary probability of keeping going

26

SLIDE 27

The Process of Generating Sentences

Step 1: Initialize i = 1 and x_0 = x_{-1} = *
Step 2: Generate x_i from the distribution P(X_i = x_i | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
Step 3: If x_i = STOP, return the sequence x_1 ⋯ x_i. Otherwise, set i = i + 1 and return to Step 2.

27
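A minimal sampling sketch of the three-step procedure above. The table q of trigram parameters is invented for illustration; a real one would be estimated from data.

```python
import random

# Invented trigram parameters q(w | u, v), indexed by the history (u, v).
q = {
    ("*", "*"):       {"the": 1.0},
    ("*", "the"):     {"dog": 0.6, "cat": 0.4},
    ("the", "dog"):   {"barks": 1.0},
    ("the", "cat"):   {"meows": 1.0},
    ("dog", "barks"): {"STOP": 1.0},
    ("cat", "meows"): {"STOP": 1.0},
}

def generate():
    u, v = "*", "*"                 # Step 1: x_0 = x_{-1} = *
    sentence = []
    while True:
        dist = q[(u, v)]            # Step 2: sample x_i given the history
        word = random.choices(list(dist), weights=list(dist.values()))[0]
        if word == "STOP":          # Step 3: stop, or shift the history and repeat
            return sentence
        sentence.append(word)
        u, v = v, word

print(generate())  # e.g. ['the', 'dog', 'barks']
```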

SLIDE 28

3-gram LMs

- A trigram language model contains:
  - A vocabulary V
  - A non-negative parameter q(w | u, v) for every trigram, such that w ∈ V ∪ {STOP} and u, v ∈ V ∪ {*}
- The probability of a sentence x_1, x_2, …, x_n, where x_n = STOP, is

  p(x_1, …, x_n) = ∏_{i=1}^{n} q(x_i | x_{i-2}, x_{i-1})

28

SLIDE 29

3-gram LMs

- A trigram language model contains:
  - A vocabulary V
  - A non-negative parameter q(w | u, v) for every trigram, such that w ∈ V ∪ {STOP} and u, v ∈ V ∪ {*}
- The probability of a sentence x_1, x_2, …, x_n, where x_n = STOP, is

  p(x_1, …, x_n) = ∏_{i=1}^{n} q(x_i | x_{i-2}, x_{i-1})

29

SLIDE 30

3-gram LMs: Example

p(the dog barks STOP) = q(the | *, *) × …

30

SLIDE 31

3-gram LMs: Example

p(the dog barks STOP) = q(the | *, *)
                      × q(dog | *, the)
                      × q(barks | the, dog)
                      × q(STOP | dog, barks)
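The same product written as a tiny function; the q values here are placeholders chosen only to make the arithmetic concrete.

```python
# Placeholder parameter values, purely illustrative.
q = {
    ("the", "*", "*"):        0.5,
    ("dog", "*", "the"):      0.1,
    ("barks", "the", "dog"):  0.05,
    ("STOP", "dog", "barks"): 0.4,
}

def trigram_sentence_prob(words):
    """p(x_1 ... x_n) = product over i of q(x_i | x_{i-2}, x_{i-1}), with * padding."""
    padded = ["*", "*"] + list(words)
    prob = 1.0
    for i in range(2, len(padded)):
        prob *= q[(padded[i], padded[i - 2], padded[i - 1])]
    return prob

print(trigram_sentence_prob(["the", "dog", "barks", "STOP"]))
# 0.5 * 0.1 * 0.05 * 0.4 = 0.001
```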

31

SLIDE 32

Limitations

- The Markovian assumption is false:

  He is from France, so it makes sense that his first language is …

- We want to model longer dependencies

32

SLIDE 33

N-gram model

33

SLIDE 34

More Examples

34

- Yoav’s blog post: http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139
- 10-gram character-level LM:

First Citizen: Nay, then, that was hers, It speaks against your other service: But since the youth of the circumstance be spoken: Your uncle and one Baptista's daughter. SEBASTIAN: Do I stand till the break off.

SLIDE 35

Maximum Likelihood Estimation

35

- “Best” means “data likelihood reaches maximum”:

  θ̂ = argmax_θ p(X | θ)

SLIDE 36

Maximum Likelihood Estimation

36

Unigram language model: estimate θ, i.e., p(w | θ) = ?, from a document (a paper with total #words = 100):

  word          count   estimate
  text            10     10/100
  mining           5      5/100
  association      3      3/100
  database         3      3/100
  algorithm        2      2/100
  …                …          …
  query            1      1/100
  efficient        1      1/100
  …                …          …
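Not on the slide, but the estimation column is just count divided by the total number of words; a two-line sketch with the counts above:

```python
counts = {"text": 10, "mining": 5, "association": 3, "database": 3,
          "algorithm": 2, "query": 1, "efficient": 1}
total = 100  # total number of words in the paper

p = {w: c / total for w, c in counts.items()}
print(p["text"], p["mining"], p["query"])  # 0.1 0.05 0.01
```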

SLIDE 37

Which Bag of Words Is More Likely to Be Generated?

37

(Figure: two bags of characters; one is almost entirely “a”s with a few other letters, the other is a diverse mix of letters such as b, K, a, D, E, P, F, n)

SLIDE 38

Parameter Estimation

38

- General setting:
  - Given a (hypothesized and probabilistic) model that governs the random experiment
  - The model gives a probability p(X | θ) of any data X, which depends on the parameter θ
  - Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?
- Intuitively, take our best guess of θ
  - “best” means “best explaining/fitting the data”
- Generally an optimization problem

SLIDE 39

Maximum Likelihood Estimation

39

- Data: a collection of words, w_1, w_2, …, w_n
- Model: multinomial distribution p(W) with parameters θ_i = p(w_i)
- Maximum likelihood estimator:

  θ̂ = argmax_{θ ∈ Θ} p(W | θ)

SLIDE 40

Maximum Likelihood Estimation

40

p(X | θ) ∝ ∏_{i=1}^{N} θ_i^{c(w_i)}
(the proportionality constant depends only on the counts c(w_1), …, c(w_N), not on θ)

⇒ log p(X | θ) = ∑_{i=1}^{N} c(w_i) log θ_i + const

θ̂ = argmax_{θ ∈ Θ} ∑_{i=1}^{N} c(w_i) log θ_i

SLIDE 41

Maximum Likelihood Estimation

41

θ̂ = argmax_{θ ∈ Θ} ∑_{i=1}^{N} c(w_i) log θ_i

Lagrange multiplier:

  L(X, θ) = ∑_{i=1}^{N} c(w_i) log θ_i + λ (∑_{i=1}^{N} θ_i − 1)

Set partial derivatives to zero:

  ∂L/∂θ_i = c(w_i)/θ_i + λ = 0  →  θ_i = −c(w_i)/λ

Requirement from probability, ∑_{i=1}^{N} θ_i = 1:

  λ = −∑_{i=1}^{N} c(w_i)

ML estimate:

  θ_i = c(w_i) / ∑_{i=1}^{N} c(w_i)

SLIDE 42

Maximum Likelihood Estimation

42

- For N-gram language models:

  p(w_i | w_{i-1}, …, w_{i-N+1}) = c(w_i, w_{i-1}, …, w_{i-N+1}) / c(w_{i-1}, …, w_{i-N+1})
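A sketch of this count-ratio estimate for the bigram case (N = 2), on a toy two-sentence corpus rather than real data:

```python
from collections import Counter

corpus = ["i want to eat".split(), "i want chinese food".split()]

bigram_counts, unigram_counts = Counter(), Counter()
for sent in corpus:
    unigram_counts.update(sent)
    bigram_counts.update(zip(sent, sent[1:]))

def p_mle(word, prev):
    # p(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p_mle("want", "i"))   # 2/2 = 1.0
print(p_mle("to", "want"))  # 1/2 = 0.5
```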

SLIDE 43

Practical Issues

43

- We do everything in the log space
  - Avoid underflow
  - Adding is faster than multiplying

  log(p_1 × p_2) = log p_1 + log p_2
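A quick illustration of the underflow point: the product of many small probabilities collapses to 0.0 in floating point, while the sum of their logs stays perfectly representable.

```python
import math

probs = [1e-5] * 100            # 100 tokens, each with probability 1e-5

product = 1.0
for p in probs:
    product *= p
print(product)                   # 0.0 -- underflows (the true value is 1e-500)

log_prob = sum(math.log(p) for p in probs)
print(log_prob)                  # about -1151.3, no underflow
```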

SLIDE 44

More Resources

44

- Google n-gram: https://research.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html

  File sizes: approx. 24 GB compressed (gzip'ed) text files
  Number of tokens: 1,024,908,267,229
  Number of sentences: 95,119,665,584
  Number of unigrams: 13,588,391
  Number of bigrams: 314,843,401
  Number of trigrams: 977,069,902
  Number of fourgrams: 1,313,818,354
  Number of fivegrams: 1,176,470,663

SLIDE 45

More Resources

45

- Google n-gram viewer: https://books.google.com/ngrams/
- Data: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

SLIDE 46

46

SLIDE 47

47

SLIDE 48

48

SLIDE 49

49

SLIDE 50

How about Unseen Words/Phrases

50

- Example: the Shakespeare corpus consists of N = 884,647 word tokens and a vocabulary of V = 29,066 word types
- Only 30,000 word types occurred
  - Words not in the training data ⇒ 0 probability
- Only 0.04% of all possible bigrams occurred
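A sketch of the failure mode: with unsmoothed maximum likelihood estimates, a single unseen bigram zeroes out the whole sentence.

```python
from collections import Counter

train = ["i want to eat".split(), "i want chinese food".split()]
bigrams, contexts = Counter(), Counter()
for sent in train:
    bigrams.update(zip(sent, sent[1:]))
    contexts.update(sent[:-1])          # count each word as a bigram context

def p(word, prev):
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

test = "i want thai food".split()
prob = 1.0
for prev, cur in zip(test, test[1:]):
    prob *= p(cur, prev)
print(prob)  # 0.0 -- ("want", "thai") never occurred in training
```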

SLIDE 51

How to Estimate Parameters from Training Data

- How do we know p(w | history)?
- Use statistics from data (examples using Google N-Grams)
  - E.g., what is p(door | the)?

51

SLIDE 52

Increasing N-Gram Order

- Higher orders capture more dependencies

52

SLIDE 53

Berkeley Restaurant Project Sentences

- can you tell me about any good cantonese restaurants close by
- mid priced thai food is what i’m looking for
- tell me about chez panisse
- can you give me a listing of the kinds of food that are available
- i’m looking for a good place to eat breakfast
- when is cafe venezia open during the day

53

SLIDE 54

Bigram Counts (~10K Sentences)

54

SLIDE 55

Bigram Probabilities

55

SLIDE 56

What Did We Learn

- p(English | want) < p(Chinese | want): people like Chinese stuff more, at least in this corpus
- English behaves in a certain way
  - p(to | want) = 0.66
  - p(eat | to) = 0.28

56

SLIDE 57

Sparseness

- Maximum likelihood for estimating q
  - Let c(w_1, w_2, …, w_n) be the number of times that n-gram appears in a corpus

  q(w_i | w_{i-2}, w_{i-1}) = c(w_{i-2}, w_{i-1}, w_i) / c(w_{i-2}, w_{i-1})

- If the vocabulary has 20,000 words, the number of parameters is 8 × 10^12!
- Most n-grams will never be observed, even if they are linguistically plausible
- Most sentences will have zero or undefined probabilities
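The 8 × 10^12 figure is just |V|^3; a quick check of the arithmetic:

```python
V = 20_000                     # vocabulary size
print(f"{V ** 3:.1e}")         # 8.0e+12 trigram parameters q(w | u, v)
print(f"{V ** 2:.1e}")         # 4.0e+08 bigram parameters, already very large
```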

57

SLIDE 58

How To Evaluate

- Extrinsic: build a new language model, use it for some task (MT, ASR, etc.)
- Intrinsic: measure how good we are at modeling language

58

SLIDE 59

Intrinsic Evaluation

- Intuitively, language models should assign high probability to real language they have not seen before
- Want to maximize likelihood on test data, not training data
- Models derived from counts / sufficient statistics require generalization parameters to be tuned on held-out data to simulate test generalization
- Set hyperparameters to maximize the likelihood of the held-out data (usually with grid search or EM)

59

SLIDE 60

Intrinsic Evaluation

- Intuitively, language models should assign high probability to real language they have not seen before

60