SLIDE 1

Probability & Language Modeling

CMSC 473/673 UMBC

Some slides adapted from 3SLP, Jason Eisner

SLIDE 2

[Diagram: levels of linguistic analysis (orthography, morphology, lexemes, syntax, semantics, pragmatics, discourse) alongside VISION and AUDIO signals such as prosody, intonation, and color]

SLIDE 3

SLIDE 4

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

score( )

SLIDE 5

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ( )

SLIDE 6

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ( )

what’s a probability?

SLIDE 7

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ( )

what do we estimate?

Documents? Sentences? Words? Characters?

SLIDE 8

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ( )

what’s a word?

how to deal with morphology and orthography

SLIDE 9

Tree people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of an Shining Path attack today.

pθ( )

how do we estimate robustly?

SLIDE 10

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of an ISIS attack today.

pθ( )

how do we generalize?

SLIDE 11

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 12

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 13

Probability Takeaways

• Basic probability axioms and definitions
• Probabilistic independence
• Definition of joint probability
• Definition of conditional probability
• Bayes rule
• Probability chain rule

SLIDE 14

Kinds of Statistics

• Descriptive
• Confirmatory
• Predictive

The average grade on this assignment is 83.

SLIDE 15

Interpretations of Probability

• Past performance: 58% of the past 100 flips were heads
• Hypothetical performance: if I flipped the coin in many parallel universes…
• Subjective strength of belief: would pay up to 58 cents for a chance to win $1
• Output of some computable formula? p(heads) vs. q(heads)

SLIDE 16

Probabilities Measure Sets

[Venn diagram: all (known) outcomes involving the coin being flipped, containing the event "coin coming up heads"]

SLIDE 17

Probabilities Measure Sets

[Venn diagram: all (known) outcomes involving the coin being flipped; events: "coin coming up heads", "coin is ancient"]

SLIDE 18

Probabilities Measure Sets

[Venn diagram: all (known) outcomes involving the coin being flipped; events: "coin coming up heads", "coin is ancient", "defective minting process"]

SLIDE 19

Probabilities Measure Sets

[Venn diagram: all (known) outcomes involving the coin being flipped; events: "coin coming up heads", "coin is ancient", "defective minting process", "cafeteria serves egg salad"]

SLIDE 20

(Most) Probability Axioms

p(everything) = 1

SLIDE 21

(Most) Probability Axioms

p(everything) = 1
p(∅) = 0

SLIDE 22

(Most) Probability Axioms

p(everything) = 1
p(∅) = 0
p(A) ≤ p(B), when A ⊆ B

SLIDE 23

(Most) Probability Axioms

p(everything) = 1
p(∅) = 0
p(A) ≤ p(B), when A ⊆ B
p(A ∪ B) = p(A) + p(B), when A ∩ B = ∅

SLIDE 24

(Most) Probability Axioms

p(everything) = 1
p(∅) = 0
p(A) ≤ p(B), when A ⊆ B
p(A ∪ B) = p(A) + p(B), when A ∩ B = ∅

p(A ∪ B) ≠ p(A) + p(B) when A and B overlap

SLIDE 25

(Most) Probability Axioms

p(everything) = 1
p(∅) = 0
p(A) ≤ p(B), when A ⊆ B
p(A ∪ B) = p(A) + p(B), when A ∩ B = ∅

In general: p(A ∪ B) = p(A) + p(B) − p(A ∩ B)

SLIDE 26

Probabilities of Independent Events Multiply

p(ancient coin AND defective minting process)

SLIDE 27

Probabilities of Independent Events Multiply

p(ancient coin AND defective minting process) = p(ancient coin) ∗ p(defective minting process)

SLIDE 28

Probabilities of Independent Events Multiply

(a comma represents AND)

p(ancient coin, defective minting process) = p(ancient coin) ∗ p(defective minting process)

SLIDE 29

Joint Probabilities Are (Should Be) Symmetric

p(defective minting process, ancient coin) = p(defective minting process) ∗ p(ancient coin)

But the arguments to joint probabilities can have an order

SLIDE 30

Conditional Probabilities (Also) Measure Sets

p(heads | defective minting process)

SLIDE 31

Conditional Probabilities (Also) Measure Sets

p(heads | defective minting process) = p(heads AND defective minting process) / p(defective minting process)

SLIDE 32

Conditional Probabilities (Also) Measure Sets

p(heads | defective minting process) = p(heads AND defective minting process) / p(defective minting process)

defective process favors tails

SLIDE 33

Conditional Probabilities (Also) Measure Sets

p(heads | defective minting process) = p(heads AND defective minting process) / p(defective minting process)

defective process favors heads

SLIDE 34

Conditional Probabilities Are Probabilities

p(heads | egg salad) vs. p(heads | NOT egg salad)

SLIDE 35

Conditional Probabilities Are Probabilities

p(heads | egg salad) vs. p(heads | NOT egg salad)
p(heads | egg salad) vs. p(tails | egg salad)

SLIDE 36

Conditional Probabilities Are Probabilities

p(heads | egg salad) vs. p(heads | NOT egg salad)
p(heads | egg salad) vs. p(tails | egg salad)
p(heads | egg salad) vs. p(tails | NOT egg salad)

SLIDE 37

Bayes Rule

p(heads | defective minting process) = p(heads AND defective minting process) / p(defective minting process)

SLIDE 38

Bayes Rule

p(heads AND defective minting process) = p(heads | defective minting process) ∗ p(defective minting process)

SLIDE 39

Bayes Rule

p(heads AND defective minting process) = p(heads | defective minting process) ∗ p(defective minting process)

SLIDE 40

Bayes Rule

p(heads AND defective minting process) = p(defective minting process | heads) ∗ p(heads)

SLIDE 41

Bayes Rule

p(heads AND defective minting process) = p(defective minting process | heads) ∗ p(heads)
p(heads AND defective minting process) = p(heads | defective minting process) ∗ p(defective minting process)

SLIDE 42

Bayes Rule

p(heads | defective minting process) = p(defective minting process | heads) ∗ p(heads) / p(defective minting process)

SLIDE 43

Bayes Rule

p(Y | Z) = p(Z | Y) ∗ p(Y) / p(Z)

SLIDE 44

Bayes Rule

p(Y | Z) = p(Z | Y) ∗ p(Y) / p(Z)

p(Y | Z): posterior probability

SLIDE 45

Bayes Rule

p(Y | Z) = p(Z | Y) ∗ p(Y) / p(Z)

p(Y | Z): posterior probability
p(Z | Y): likelihood

SLIDE 46

Bayes Rule

p(Y | Z) = p(Z | Y) ∗ p(Y) / p(Z)

p(Y | Z): posterior probability
p(Z | Y): likelihood
p(Y): prior probability

SLIDE 47

Bayes Rule

p(Y | Z) = p(Z | Y) ∗ p(Y) / p(Z)

p(Y | Z): posterior probability
p(Z | Y): likelihood
p(Y): prior probability
p(Z): marginal likelihood (probability)
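To make the pieces concrete, here is a small numeric sketch of Bayes rule in Python; the coin numbers are invented for illustration and are not from the slides.

```python
# A toy numeric check of Bayes rule (made-up numbers): suppose 1% of coins have
# a defective minting process, a defective coin comes up heads 30% of the time,
# and a normal coin 50% of the time.
p_defect = 0.01                      # prior p(defective)
p_heads_given_defect = 0.30          # likelihood p(heads | defective)
p_heads_given_ok = 0.50              # likelihood p(heads | not defective)

# marginal likelihood p(heads), summing over both cases
p_heads = p_heads_given_defect * p_defect + p_heads_given_ok * (1 - p_defect)

# posterior p(defective | heads) via Bayes rule
p_defect_given_heads = p_heads_given_defect * p_defect / p_heads
print(p_defect_given_heads)  # ~0.006: seeing heads makes "defective" less likely
```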

SLIDE 48

Changing the Left

1 ≥ p(A)

SLIDE 49

Changing the Left

1 ≥ p(A) ≥ p(A, B)

SLIDE 50

Changing the Left

1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C)

SLIDE 51

Changing the Left

1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C) ≥ p(A, B, C, D)

SLIDE 52

Changing the Left

1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C) ≥ p(A, B, C, D) ≥ p(A, B, C, D, E)

(adding more events to a joint probability can only lower it)

SLIDE 53

Changing the Right

[Number line from 0 to 1 comparing p(A | B) with p(A)]

SLIDE 54

Changing the Right

[Number line from 0 to 1 comparing p(A | B) with p(A)]

SLIDE 55

Changing the Right

[Number line from 0 to 1 comparing p(A | B) with p(A)]

SLIDE 56

Changing the Right

Bias vs. Variance
• Lower bias: more specific to what we care about
• Higher variance: for a fixed number of observations, estimates become less reliable

SLIDE 57

Probability Chain Rule

p(y1, y2) = p(y1) ∗ p(y2 | y1)

Bayes rule

SLIDE 58

Probability Chain Rule

p(y1, y2, …, yT) = p(y1) p(y2 | y1) p(y3 | y1, y2) ⋯ p(yT | y1, …, yT−1)

SLIDE 59

Probability Chain Rule

p(y1, y2, …, yT) = p(y1) p(y2 | y1) p(y3 | y1, y2) ⋯ p(yT | y1, …, yT−1) = ∏_{j=1}^{T} p(yj | y1, …, yj−1)
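A minimal sketch of the chain rule in Python, with hypothetical per-step conditional probabilities standing in for a real model:

```python
import math

# step_probs[j] stands for p(y_j | y_1, ..., y_{j-1}); the values are made up.
step_probs = [0.2, 0.5, 0.1, 0.4]

# the joint probability is the product of the per-step conditionals
joint = 1.0
for p in step_probs:
    joint *= p
print(joint)  # p(y1, y2, y3, y4) = 0.2 * 0.5 * 0.1 * 0.4 = 0.004

# in practice we sum log-probabilities to avoid numerical underflow
log_joint = sum(math.log(p) for p in step_probs)
print(math.exp(log_joint))  # same value, computed stably
```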

SLIDE 60

Probability Takeaways

• Basic probability axioms and definitions
• Probabilistic independence
• Definition of joint probability
• Definition of conditional probability
• Bayes rule
• Probability chain rule

SLIDE 61

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 62

What Are Words?

• Linguists don't agree
• (Human) language-dependent
• White-space separation is sometimes okay (for written English longform)
• Social media? Spoken vs. written? Other languages?

SLIDE 63

What Are Words?

bat

http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

SLIDE 64

What Are Words?

bats

http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

SLIDE 65

What Are Words?

Fledermaus ("flutter mouse", the German word for bat)

http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

SLIDE 66

What Are Words?

pişirdiler: "They cooked it."
pişmişlermişlerdi: "They had it cooked."

SLIDE 67

What Are Words?

my leg is hurting nasty ):

(the emoticon "):" is itself a token)

SLIDE 68

Examples of Text Normalization

• Segmenting or tokenizing words
• Normalizing word formats
• Segmenting sentences in running text

SLIDE 69

What Are Words? Tokens vs. Types

The film got a great opening and the film went on to become a hit .

Tokens: an instance of that type in running text.

  • The
  • film
  • got
  • a
  • great
  • opening
  • and
  • the
  • film
  • went
  • on
  • to
  • become
  • a
  • hit
  • .

Types: an element of the vocabulary.

  • The
  • film
  • got
  • a
  • great
  • opening
  • and
  • the
  • went
  • on
  • to
  • become
  • hit
  • .
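A small Python sketch of the token/type distinction, using the slide's example sentence and simple whitespace tokenization:

```python
from collections import Counter

sentence = "The film got a great opening and the film went on to become a hit ."
tokens = sentence.split()   # whitespace tokenization (okay for this example)
types = Counter(tokens)     # vocabulary: distinct word forms

print(len(tokens))          # 16 tokens
print(len(types))           # 14 types ("film" and "a" repeat; "The" != "the")
print(types["film"])        # 2
```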
SLIDE 70

Some Issues with Tokenization

• mph, MPH
• M.D., MD
• Baltimore's mayor
• I'm, won't
• state-of-the-art
• San Francisco

SLIDE 71

CaSE inSensitive?

• Replace all letters with their lowercase versions
• Can be useful for information retrieval (IR), machine translation, language modeling

cat vs. Cat (there are other ways to signify a sentence beginning)

SLIDE 72

CaSE inSensitive?

• Replace all letters with their lowercase versions
• Can be useful for information retrieval (IR), machine translation, language modeling
• But… case can be useful: sentiment analysis, machine translation, information extraction

cat vs. Cat (there are other ways to signify a sentence beginning)
US vs. us

SLIDE 73

cat ≟ cats

• Lemma: same stem, part of speech, rough word sense (cat and cats: same lemma)
• Word form: the fully inflected surface form (cat and cats: different word forms)

SLIDE 74

Lemmatization

Reduce inflections or variant forms to base form

am, are, is → be
car, cars, car's, cars' → car

the boy's cars are different colors → the boy car be different color
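As a concrete (if partial) illustration, NLTK's WordNet lemmatizer handles cases like these; this sketch assumes NLTK is installed and the wordnet data has been fetched via nltk.download("wordnet"):

```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("cars"))          # "car"  (default part of speech is noun)
print(lemmatizer.lemmatize("are", pos="v"))  # "be"   (verbs need pos="v")
print(lemmatizer.lemmatize("colors"))        # "color"
```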

SLIDE 75

Morphosyntax

• Morphemes: the small meaningful units that make up words
• Stems: the core meaning-bearing units
• Affixes: bits and pieces that adhere to stems

SLIDE 76

Morphosyntax

• Morphemes: the small meaningful units that make up words
• Stems: the core meaning-bearing units
• Affixes: bits and pieces that adhere to stems
• Inflectional: (they) look → (they) looked; (they) run → (they) ran
• Derivational: (a) run → running (of the Bulls); code → codeable

SLIDE 77

Morphosyntax

• Morphemes: the small meaningful units that make up words
• Stems: the core meaning-bearing units
• Affixes: bits and pieces that adhere to stems
• Inflectional: (they) look → (they) looked; (they) run → (they) ran
• Derivational: (a) run → running (of the Bulls); code → codeable

Syntax: contractions can rewrite and reorder a sentence
Baltimore's [mayor's {campaign}] → [{the campaign} of the mayor] of Baltimore

SLIDE 78

Words vs. Sentences

• ! and ? are relatively unambiguous
• The period "." is quite ambiguous: sentence boundary; abbreviations like Inc. or Dr.; numbers like .02% or 4.3

Solution: write rules, build a classifier (a sketch follows below)
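A minimal rule-based splitter along these lines; the regular expression and abbreviation list are illustrative assumptions, and a real system would use a trained classifier:

```python
import re

# split on ., !, ? followed by whitespace and an uppercase letter, but skip
# periods in a small abbreviation list; numbers like .02% or 4.3 never match
# because their periods are not followed by whitespace.
ABBREVS = {"Dr.", "Inc.", "Mr.", "Ms.", "U.S."}

def split_sentences(text):
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        candidate = text[start:m.end()].strip()
        if candidate.split()[-1] in ABBREVS:  # likely an abbreviation, not a boundary
            continue
        sentences.append(candidate)
        start = m.end()
    if start < len(text):
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith pays 4.3 dollars. He is happy."))
# ['Dr. Smith pays 4.3 dollars.', 'He is happy.']
```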

SLIDE 79

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 80

Goal of Language Modeling

pθ( […text…] )

Learn a probabilistic model of text. Accomplished through observing text and updating model parameters to make the text more likely.

SLIDE 81

Goal of Language Modeling

pθ( […text…] )

Learn a probabilistic model of text. Accomplished through observing text and updating model parameters to make the text more likely.

0 ≤ pθ([…text…]) ≤ 1

Σ_{t : t is valid text} pθ(t) = 1

SLIDE 82

“The Unreasonable Effectiveness of Recurrent Neural Networks”

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

SLIDE 83

“The Unreasonable Effectiveness of Character- level Language Models” (and why RNNs are still cool)

http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139

“The Unreasonable Effectiveness of Recurrent Neural Networks”

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

SLIDE 84

Simple Count-Based

p(item)

SLIDE 85

Simple Count-Based

p(item) ∝ count(item)

(∝ means "proportional to")

SLIDE 86

Simple Count-Based

p(item) ∝ count(item) = count(item) / Σ_{any other item y} count(y)

SLIDE 87

Simple Count-Based

p(item) ∝ count(item) = count(item) / Σ_{any other item y} count(y)

(the denominator is a normalization constant)

SLIDE 88

Simple Count-Based

p(item) ∝ count(item)

sequence of characters → pseudo-words
sequence of words → pseudo-phrases
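A sketch of such a count-based (maximum likelihood) unigram model over words:

```python
from collections import Counter

corpus = "the film got a great opening and the film went on to become a hit ."
counts = Counter(corpus.split())
total = sum(counts.values())   # the normalization constant

def p(word):
    """Relative frequency estimate: p(word) = count(word) / total tokens."""
    return counts[word] / total

print(p("film"))    # 2/16 = 0.125
print(p("dragon"))  # 0.0: unseen items get probability zero (a problem smoothing fixes)
```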

SLIDE 89

Shakespearian Sequences of Characters

SLIDE 90

Shakespearian Sequences of Words

SLIDE 91

Novel Words, Novel Sentences

• "Colorless green ideas sleep furiously" – Chomsky (1957)
• Let's observe and record all sentences with our big, bad supercomputer
• Red ideas? Read ideas?

SLIDE 92

Probability Chain Rule

p(y1, y2, …, yT) = p(y1) p(y2 | y1) p(y3 | y1, y2) ⋯ p(yT | y1, …, yT−1)

SLIDE 93

Probability Chain Rule

p(y1, y2, …, yT) = p(y1) p(y2 | y1) p(y3 | y1, y2) ⋯ p(yT | y1, …, yT−1) = ∏_{j=1}^{T} p(yj | y1, …, yj−1)

SLIDE 94

N-Grams

Maintaining an entire inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously)

SLIDE 95

N-Grams

Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously) = p(Colorless) *

SLIDE 96

N-Grams

Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) *

SLIDE 97

N-Grams

Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

SLIDE 98

N-Grams

Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

apply the chain rule

SLIDE 99

N-Grams

Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces?

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

apply the chain rule

SLIDE 100

N-Grams

p(furiously | Colorless green ideas sleep)

How much does "Colorless" influence the choice of "furiously"?
SLIDE 101

N-Grams

p(furiously | Colorless green ideas sleep)

How much does "Colorless" influence the choice of "furiously"?

Remove history and contextual info

SLIDE 102

N-Grams

p(furiously | Colorless green ideas sleep)

How much does "Colorless" influence the choice of "furiously"?

Remove history and contextual info:
p(furiously | Colorless green ideas sleep) ≈ p(furiously | ideas sleep)

SLIDE 103

N-Grams

p(furiously | Colorless green ideas sleep)

How much does "Colorless" influence the choice of "furiously"?

Remove history and contextual info:
p(furiously | Colorless green ideas sleep) ≈ p(furiously | ideas sleep)

SLIDE 104

N-Grams

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

SLIDE 105

N-Grams

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

SLIDE 106

Trigrams

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | green ideas) * p(furiously | ideas sleep)

SLIDE 107

Trigrams

p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | green ideas) * p(furiously | ideas sleep)

SLIDE 108

Trigrams

p(Colorless green ideas sleep furiously) = p(Colorless | <BOS> <BOS>) * p(green | <BOS> Colorless) * p(ideas | Colorless green) * p(sleep | green ideas) * p(furiously | ideas sleep)

Consistent notation: Pad the left with <BOS> (beginning of sentence) symbols

SLIDE 109

Trigrams

p(Colorless green ideas sleep furiously) = p(Colorless | <BOS> <BOS>) * p(green | <BOS> Colorless) * p(ideas | Colorless green) * p(sleep | green ideas) * p(furiously | ideas sleep) * p(<EOS> | sleep furiously)

• Consistent notation: pad the left with <BOS> (beginning of sentence) symbols
• Fully proper distribution: pad the right with a single <EOS> (end of sentence) symbol
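A short sketch of the padding and factorization; trigram_factors is a hypothetical helper name:

```python
def trigram_factors(sentence, n=3):
    tokens = ["<BOS>"] * (n - 1) + sentence.split() + ["<EOS>"]
    # each word is conditioned on exactly the n-1 preceding tokens
    return [(tuple(tokens[i - n + 1:i]), tokens[i])
            for i in range(n - 1, len(tokens))]

for history, word in trigram_factors("Colorless green ideas sleep furiously"):
    print(f"p({word} | {' '.join(history)})")
# p(Colorless | <BOS> <BOS>)
# p(green | <BOS> Colorless)
# p(ideas | Colorless green)
# p(sleep | green ideas)
# p(furiously | ideas sleep)
# p(<EOS> | sleep furiously)
```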

SLIDE 110

N-Gram Terminology

n = 1: unigram, history size (Markov order) 0, e.g. p(furiously)

SLIDE 111

N-Gram Terminology

n = 1: unigram, history size (Markov order) 0, e.g. p(furiously)
n = 2: bigram, history size 1, e.g. p(furiously | sleep)

SLIDE 112

N-Gram Terminology

n = 1: unigram, history size (Markov order) 0, e.g. p(furiously)
n = 2: bigram, history size 1, e.g. p(furiously | sleep)
n = 3: trigram (3-gram), history size 2, e.g. p(furiously | ideas sleep)

SLIDE 113

N-Gram Terminology

n = 1: unigram, history size (Markov order) 0, e.g. p(furiously)
n = 2: bigram, history size 1, e.g. p(furiously | sleep)
n = 3: trigram (3-gram), history size 2, e.g. p(furiously | ideas sleep)
n = 4: 4-gram, history size 3, e.g. p(furiously | green ideas sleep)
general n: n-gram, history size n−1, p(wi | wi−n+1 … wi−1)

SLIDE 114

N-Gram Probability

p(x1, x2, x3, ⋯, xT) = ∏_{j=1}^{T} p(xj | xj−N+1, ⋯, xj−1)

SLIDE 115

Count-Based N-Grams (Unigrams)

p(item) ∝ count(item)

SLIDE 116

Count-Based N-Grams (Unigrams)

p(z) ∝ count(z)

SLIDE 117

Count-Based N-Grams (Unigrams)

p(z) ∝ count(z) = count(z) / Σ_v count(v)

SLIDE 118

Count-Based N-Grams (Unigrams)

p(z) ∝ count(z) = count(z) / Σ_v count(v)

(z and v range over word types)

SLIDE 119

Count-Based N-Grams (Unigrams)

p(z) ∝ count(z) = count(z) / W

(z is a word type; W is the total number of tokens observed)

SLIDE 120

Count-Based N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count
The | 1
film | 2
got | 1
a | 2
great | 1
opening | 1
and | 1
the | 1
went | 1
on | 1
to | 1
become | 1
hit | 1
. | 1

SLIDE 121

Count-Based N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Normalization
The | 1 | 16
film | 2 | 16
got | 1 | 16
a | 2 | 16
great | 1 | 16
opening | 1 | 16
and | 1 | 16
the | 1 | 16
went | 1 | 16
on | 1 | 16
to | 1 | 16
become | 1 | 16
hit | 1 | 16
. | 1 | 16

SLIDE 122

Count-Based N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Normalization | Probability
The | 1 | 16 | 1/16
film | 2 | 16 | 1/8
got | 1 | 16 | 1/16
a | 2 | 16 | 1/8
great | 1 | 16 | 1/16
opening | 1 | 16 | 1/16
and | 1 | 16 | 1/16
the | 1 | 16 | 1/16
went | 1 | 16 | 1/16
on | 1 | 16 | 1/16
to | 1 | 16 | 1/16
become | 1 | 16 | 1/16
hit | 1 | 16 | 1/16
. | 1 | 16 | 1/16

SLIDE 123

Count-Based N-Grams (Trigrams)

p(z | x, y) ∝ count(x, y, z)

order matters in conditioning; order matters in the count

SLIDE 124

Count-Based N-Grams (Trigrams)

p(z | x, y) ∝ count(x, y, z)

order matters in conditioning; order matters in the count

count(x, y, z) ≠ count(x, z, y) ≠ count(y, x, z) ≠ …

SLIDE 125

Count-Based N-Grams (Trigrams)

p(z | x, y) ∝ count(x, y, z) = count(x, y, z) / Σ_v count(x, y, v)

SLIDE 126

Count-Based N-Grams (Trigrams)

The film got a great opening and the film went on to become a hit .

Context | Word (Type) | Raw Count | Normalization | Probability
The film | The | 0 | 1 | 0/1
The film | film | 0 | 1 | 0/1
The film | got | 1 | 1 | 1/1
The film | went | 0 | 1 | 0/1
…
a great | great | 0 | 1 | 0/1
a great | opening | 1 | 1 | 1/1
a great | and | 0 | 1 | 0/1
a great | the | 0 | 1 | 0/1
…

SLIDE 127

Count-Based N-Grams (Lowercased Trigrams)

the film got a great opening and the film went on to become a hit .

Context | Word (Type) | Raw Count | Normalization | Probability
the film | the | 0 | 2 | 0/2
the film | film | 0 | 2 | 0/2
the film | got | 1 | 2 | 1/2
the film | went | 1 | 2 | 1/2
…
a great | great | 0 | 1 | 0/1
a great | opening | 1 | 1 | 1/1
a great | and | 0 | 1 | 0/1
a great | the | 0 | 1 | 0/1
…

SLIDE 128

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 129

Maximum Likelihood Estimates

• Maximizes the likelihood of the training set
• Do different corpora look the same?
• Low(er) bias, high(er) variance
• For large data: can actually do reasonably well

p(item) ∝ count(item)

SLIDE 130

n = 1

, , land of in , a teachers The , wilds the and gave a Etienne any two beginning without probably heavily that other useless the the a different . the able mines , unload into in foreign the the be either other Britain finally avoiding , for of have the cure , the Gutenberg-tm ; of being can as country in authority deviates as d seldom and They employed about from business marshal materials than in , they

SLIDE 131

n = 2

These varied with it to the civil wars , therefore , it did not for the company had the East India , the mechanical , the sum which were by barter , vol. i , and , conveniencies of all made to purchase a council of landlords , constitute a sum as an argument , having thus forced abroad , however , and influence in the one , or banker , will there was encouraged and more common trade to corrupt , profit , it ; but a master does not , twelfth year the consent that of volunteers and […] , the other hand , it certainly it very earnestly entreat both nations . In opulent nations in a revenue of four parts of production .

SLIDE 132

n = 3

His employer , if silver was regulated according to the temporary and occasional event .

What goods could bear the expense of defending themselves , than in the value of different sorts of goods , and placed at a much greater , there have been the effects of self-deception , this attention , but a very important ones , and which , having become of less than they ever were in this agreement for keeping up the business of weighing . After food , clothes , and a few months longer credit than is wanted , there must be sufficient to keep by him , are of such colonies to surmount . They facilitated the acquisition of the empire , both from the rents of land and labour of those pedantic pieces of silver which he can afford to take from the duty upon every quarter which they have a more equable distribution of employment .

SLIDE 133

n = 4

To buy in one market , in order to have it ; but the 8th of George III . The tendency of some of the great lords , gradually encouraged their villains to make upon the prices of corn , cattle , poultry , etc . Though it may , perhaps , in the mean time , that part of the governments of New England , the market , trade cannot always be transported to so great a number of seamen , not inferior to those of other European nations from any direct trade to America .

The farmer makes his profit by parting with it . But the government of that country below what it is in itself necessarily slow , uncertain , liable to be interrupted by the weather .

SLIDE 134

Maximum Likelihood Estimates

• Maximizes the likelihood of the training set
• Do different corpora look the same?
• For large data: can actually do reasonably well

p(item) ∝ count(item)

SLIDE 135

0s Are Not Your (Language Model's) Friend

p(item) ∝ count(item): count(item) = 0 → p(item) = 0

SLIDE 136

0s Are Not Your (Language Model's) Friend

• 0 probability → the item is impossible
• 0s annihilate: x ∗ y ∗ z ∗ 0 = 0
• Language is creative: new words keep appearing; existing words could appear in new contexts
• How much do you trust your data?

p(item) ∝ count(item): count(item) = 0 → p(item) = 0

SLIDE 137

Add-λ estimation

• Laplace smoothing, Lidstone smoothing
• Pretend we saw each word λ more times than we did
• Add λ to all the counts

SLIDE 138

Add-λ estimation

• Laplace smoothing, Lidstone smoothing
• Pretend we saw each word λ more times than we did
• Add λ to all the counts

p(z) ∝ count(z) + λ

SLIDE 139

Add-λ estimation

• Laplace smoothing, Lidstone smoothing
• Pretend we saw each word λ more times than we did
• Add λ to all the counts

p(z) ∝ count(z) + λ = (count(z) + λ) / Σ_v (count(v) + λ)

SLIDE 140

Add-λ estimation

• Laplace smoothing, Lidstone smoothing
• Pretend we saw each word λ more times than we did
• Add λ to all the counts

p(z) ∝ count(z) + λ = (count(z) + λ) / (W + Vλ)

(W: number of observed tokens; V: vocabulary size)
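A sketch of add-λ estimation for unigrams, reserving one <UNK> type so unseen words also get nonzero mass:

```python
from collections import Counter

corpus = "the film got a great opening and the film went on to become a hit ."
counts = Counter(corpus.split())
lam = 1.0                         # λ = 1 gives Laplace smoothing
vocab = set(counts) | {"<UNK>"}   # reserve one type for unseen words
W = sum(counts.values())          # number of observed tokens (16 here)
V = len(vocab)                    # vocabulary size (14 types + <UNK> = 15)

def p_add_lambda(word):
    if word not in vocab:
        word = "<UNK>"
    return (counts[word] + lam) / (W + V * lam)

print(p_add_lambda("film"))    # (2 + 1) / (16 + 15) ≈ 0.097
print(p_add_lambda("dragon"))  # (0 + 1) / (16 + 15) ≈ 0.032, no longer zero
```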

SLIDE 141

Add-λ N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Norm | Prob. | Add-λ Count | Add-λ Norm. | Add-λ Prob.
The | 1 | 16 | 1/16 | | |
film | 2 | 16 | 1/8 | | |
got | 1 | 16 | 1/16 | | |
a | 2 | 16 | 1/8 | | |
great | 1 | 16 | 1/16 | | |
opening | 1 | 16 | 1/16 | | |
and | 1 | 16 | 1/16 | | |
the | 1 | 16 | 1/16 | | |
went | 1 | 16 | 1/16 | | |
on | 1 | 16 | 1/16 | | |
to | 1 | 16 | 1/16 | | |
become | 1 | 16 | 1/16 | | |
hit | 1 | 16 | 1/16 | | |
. | 1 | 16 | 1/16 | | |

SLIDE 142

Add-1 N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Norm | Prob. | Add-1 Count | Add-1 Norm. | Add-1 Prob.
The | 1 | 16 | 1/16 | 2 | |
film | 2 | 16 | 1/8 | 3 | |
got | 1 | 16 | 1/16 | 2 | |
a | 2 | 16 | 1/8 | 3 | |
great | 1 | 16 | 1/16 | 2 | |
opening | 1 | 16 | 1/16 | 2 | |
and | 1 | 16 | 1/16 | 2 | |
the | 1 | 16 | 1/16 | 2 | |
went | 1 | 16 | 1/16 | 2 | |
on | 1 | 16 | 1/16 | 2 | |
to | 1 | 16 | 1/16 | 2 | |
become | 1 | 16 | 1/16 | 2 | |
hit | 1 | 16 | 1/16 | 2 | |
. | 1 | 16 | 1/16 | 2 | |

SLIDE 143

Add-1 N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Norm | Prob. | Add-1 Count | Add-1 Norm. | Add-1 Prob.
The | 1 | 16 | 1/16 | 2 | 16 + 14*1 = 30 |
film | 2 | 16 | 1/8 | 3 | 30 |
got | 1 | 16 | 1/16 | 2 | 30 |
a | 2 | 16 | 1/8 | 3 | 30 |
great | 1 | 16 | 1/16 | 2 | 30 |
opening | 1 | 16 | 1/16 | 2 | 30 |
and | 1 | 16 | 1/16 | 2 | 30 |
the | 1 | 16 | 1/16 | 2 | 30 |
went | 1 | 16 | 1/16 | 2 | 30 |
on | 1 | 16 | 1/16 | 2 | 30 |
to | 1 | 16 | 1/16 | 2 | 30 |
become | 1 | 16 | 1/16 | 2 | 30 |
hit | 1 | 16 | 1/16 | 2 | 30 |
. | 1 | 16 | 1/16 | 2 | 30 |

SLIDE 144

Add-1 N-Grams (Unigrams)

The film got a great opening and the film went on to become a hit .

Word (Type) | Raw Count | Norm | Prob. | Add-1 Count | Add-1 Norm. | Add-1 Prob.
The | 1 | 16 | 1/16 | 2 | 16 + 14*1 = 30 | 2/30 = 1/15
film | 2 | 16 | 1/8 | 3 | 30 | 3/30 = 1/10
got | 1 | 16 | 1/16 | 2 | 30 | 1/15
a | 2 | 16 | 1/8 | 3 | 30 | 1/10
great | 1 | 16 | 1/16 | 2 | 30 | 1/15
opening | 1 | 16 | 1/16 | 2 | 30 | 1/15
and | 1 | 16 | 1/16 | 2 | 30 | 1/15
the | 1 | 16 | 1/16 | 2 | 30 | 1/15
went | 1 | 16 | 1/16 | 2 | 30 | 1/15
on | 1 | 16 | 1/16 | 2 | 30 | 1/15
to | 1 | 16 | 1/16 | 2 | 30 | 1/15
become | 1 | 16 | 1/16 | 2 | 30 | 1/15
hit | 1 | 16 | 1/16 | 2 | 30 | 1/15
. | 1 | 16 | 1/16 | 2 | 30 | 1/15

SLIDE 145

Backoff and Interpolation

Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about

SLIDE 146

Backoff and Interpolation

Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about

Backoff: use the trigram if you have good evidence; otherwise bigram, otherwise unigram
SLIDE 147

Backoff and Interpolation

Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about

Backoff: use the trigram if you have good evidence; otherwise bigram, otherwise unigram

Interpolation: mix (average) unigram, bigram, trigram

SLIDE 148

Linear Interpolation

Simple interpolation:

p(z | y) = λ p2(z | y) + (1 − λ) p1(z), with 0 ≤ λ ≤ 1

SLIDE 149

Linear Interpolation

Simple interpolation:

p(z | y) = λ p2(z | y) + (1 − λ) p1(z), with 0 ≤ λ ≤ 1

Condition the λs on context:

p(z | x, y) = λ3(x, y) p3(z | x, y) + λ2(y) p2(z | y) + λ1 p1(z)
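A sketch of the context-independent version over tiny hypothetical trigram/bigram/unigram tables; the weights are made up, and in practice the λs are tuned on dev data (see the later slide on hyperparameters):

```python
# the λs must be non-negative and sum to 1 so the mixture stays a distribution
lambdas = (0.6, 0.3, 0.1)   # hypothetical weights

def p_interp(z, x, y, p3, p2, p1):
    l3, l2, l1 = lambdas
    return (l3 * p3.get((x, y, z), 0.0)
            + l2 * p2.get((y, z), 0.0)
            + l1 * p1.get(z, 0.0))

# tiny hypothetical probability tables:
p3 = {("green", "ideas", "sleep"): 0.5}
p2 = {("ideas", "sleep"): 0.2}
p1 = {"sleep": 0.01}
print(p_interp("sleep", "green", "ideas", p3, p2, p1))
# 0.6*0.5 + 0.3*0.2 + 0.1*0.01 = 0.361
```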

SLIDE 150

Backoff

Trust your statistics, up to a point:

p(z | x, y) ∝ p3(z | x, y)   if count(x, y, z) > 0
p(z | x, y) ∝ p2(z | y)      otherwise
SLIDE 151

Discounted Backoff

Trust your statistics, up to a point:

p(z | x, y) ∝ p3(z | x, y) − d      if count(x, y, z) > 0
p(z | x, y) ∝ γ(x, y) p2(z | y)     otherwise
SLIDE 152

Discounted Backoff

Trust your statistics, up to a point:

p(z | x, y) = p3(z | x, y) − d      if count(x, y, z) > 0
p(z | x, y) = γ(x, y) p2(z | y)     otherwise

d: discount constant; γ(x, y): context-dependent normalization constant
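A sketch of the backoff decision itself; computing γ(x, y) so that the distribution normalizes is the fiddly part, and here it is simply passed in (defaulting to 1.0, so this shows the structure of the rule rather than a properly normalized model):

```python
def p_backoff(z, x, y, p3, p2, discount=0.1, gamma=lambda x, y: 1.0):
    if (x, y, z) in p3:                       # count(x, y, z) > 0
        return p3[(x, y, z)] - discount       # trust the trigram, minus a discount
    return gamma(x, y) * p2.get((y, z), 0.0)  # otherwise back off to the bigram

# tiny hypothetical probability tables:
p3 = {("green", "ideas", "sleep"): 0.5}
p2 = {("ideas", "furiously"): 0.05}
print(p_backoff("sleep", "green", "ideas", p3, p2))      # 0.5 - 0.1 = 0.4
print(p_backoff("furiously", "green", "ideas", p3, p2))  # 1.0 * 0.05 = 0.05
```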

SLIDE 153

Setting Hyperparameters

Use a development corpus. Choose the λs to maximize the probability of the dev data:

– Fix the n-gram probabilities (on the training data)
– Then search for the λs that give the largest probability to the held-out set

[Training Data | Dev Data | Test Data]

SLIDE 154

Implementation: Unknown Words

Create an unknown word token <UNK>

Training:
1. Create a fixed lexicon L of size V
2. Change any word not in L to <UNK>
3. Train the LM as normal

Evaluation: use the <UNK> probabilities for any word not seen in training
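A sketch of this recipe; build_lexicon and apply_unk are hypothetical helper names:

```python
from collections import Counter

def build_lexicon(tokens, V):
    """Fixed lexicon L: the V most frequent types in the training data."""
    return {w for w, _ in Counter(tokens).most_common(V)}

def apply_unk(tokens, lexicon):
    """Rewrite any word not in L as <UNK>, then train the LM as normal."""
    return [w if w in lexicon else "<UNK>" for w in tokens]

train = "the film got a great opening and the film went on to become a hit .".split()
lexicon = build_lexicon(train, V=5)
print(apply_unk(train, lexicon))
# words outside the top-5 lexicon all become <UNK>;
# at test time, unseen words reuse the <UNK> statistics
```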

SLIDE 155

Other Kinds of Smoothing

Interpolated (modified) Kneser-Ney
Idea: how "productive" is a context? How many different word types v appear in a context x, y?

Good-Turing
Partition words into classes of occurrence; smooth class statistics; properties of classes are likely to predict properties of other classes

Witten-Bell
Idea: every observed type was at some point novel; give an MLE prediction for a novel type occurring

SLIDE 156

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models

SLIDE 157

Evaluating Language Models

What is "correct"? What is working "well"?

SLIDE 158

Evaluating Language Models

What is "correct"? What is working "well"?

[Training Data | Dev Data | Test Data]

SLIDE 159

Evaluating Language Models

What is "correct"? What is working "well"?

Training Data: acquire primary statistics for learning model parameters
Dev Data: fine-tune any secondary (hyper)parameters
Test Data: perform the final evaluation

SLIDE 160

Evaluating Language Models

What is "correct"? What is working "well"?

Training Data: acquire primary statistics for learning model parameters
Dev Data: fine-tune any secondary (hyper)parameters
Test Data: perform the final evaluation

DO NOT ITERATE ON THE TEST DATA

SLIDE 161

Evaluating Language Models

What is "correct"? What is working "well"?

Extrinsic: evaluate the LM in a downstream task
• Test an MT, ASR, etc. system and see which LM does better
• Propagates & conflates errors

SLIDE 162

Evaluating Language Models

What is "correct"? What is working "well"?

Extrinsic: evaluate the LM in a downstream task
• Test an MT, ASR, etc. system and see which LM does better
• Propagates & conflates errors

Intrinsic: treat the LM as its own downstream task
• Use perplexity (from information theory)

SLIDE 163

Perplexity

Lower is better: lower perplexity → less surprised

More outcomes → more surprised
Fewer outcomes → less surprised

SLIDE 164

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

hj: the n-gram history (n−1 items)

SLIDE 165

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)

SLIDE 166

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)
• log p(xj | hj): ≤ 0 (higher is better)

SLIDE 167

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)
• log p(xj | hj): ≤ 0 (higher is better)
• Σ_j log p(xj | hj): ≤ 0 (higher is better)

SLIDE 168

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)
• log p(xj | hj): ≤ 0 (higher is better)
• Σ_j log p(xj | hj): ≤ 0 (higher is better)
• −(1/N) Σ_j log p(xj | hj): ≥ 0 (lower is better)

SLIDE 169

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)
• log p(xj | hj): ≤ 0 (higher is better)
• Σ_j log p(xj | hj): ≤ 0 (higher is better)
• −(1/N) Σ_j log p(xj | hj): ≥ 0 (lower is better)
• perplexity: ≥ 0 (lower is better)

SLIDE 170

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) )

• p(xj | hj): ≥ 0 and ≤ 1 (higher is better)
• log p(xj | hj): ≤ 0 (higher is better)
• Σ_j log p(xj | hj): ≤ 0 (higher is better)
• −(1/N) Σ_j log p(xj | hj): ≥ 0 (lower is better)
• perplexity: ≥ 0 (lower is better)
• the base of exp and log must be the same

SLIDE 171

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = exp( −(1/N) Σ_{j=1}^{N} log p(xj | hj) ) = ( ∏_{j=1}^{N} 1 / p(xj | hj) )^(1/N)

weighted geometric average
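A sketch computing perplexity both ways, with hypothetical per-token probabilities; the two forms agree up to floating-point rounding:

```python
import math

# probs[j] stands for p(x_j | h_j) under the model being evaluated (made up here)
probs = [0.1, 0.25, 0.5, 0.05]
N = len(probs)

# form 1: exp of the average negative log-probability
ppl_exp = math.exp(-sum(math.log(p) for p in probs) / N)

# form 2: N-th root of the product of inverse probabilities (geometric average)
prod_inv = 1.0
for p in probs:
    prod_inv *= 1.0 / p
ppl_root = prod_inv ** (1.0 / N)

print(ppl_exp, ppl_root)  # identical values; lower is better
```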

SLIDE 172

Perplexity

Lower is better: lower perplexity → less surprised

perplexity = ( ∏_{j=1}^{N} 1 / p(xj | hj) )^(1/N)

471/671: Branching factor

SLIDE 173

Outline

• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models