Machine translation: p(strong winds) > p(large winds) - PowerPoint PPT Presentation



SLIDE 5

▪ Machine translation

▪ p(strong winds) > p(large winds)

▪ Spell correction

▪ The office is about fifteen minuets from my house
▪ p(about fifteen minutes from) > p(about fifteen minuets from)

▪ Speech recognition

▪ p(I saw a van) >> p(eyes awe of an)

▪ Summarization, question answering, handwriting recognition, OCR, etc.

SLIDE 6

▪ We want to predict a sentence given acoustics:

  w* = argmax_w P(w | a)

[Diagram: source w → noisy channel → observed a → decoder → best w]

SLIDE 7

▪ The noisy-channel approach:

[Diagram: source w → noisy channel → observed a → decoder → best w]

  P(w): the prior, given by the language model, a distribution over sequences of words (sentences)
  P(a | w): the likelihood, given by the acoustic model (HMMs)
SLIDE 8

Decoder hypotheses:

the station signs are in deep in english
the stations signs are in deep in english
the station signs are in deep into english
the station 's signs are in deep in english
the station signs are in deep in the english
the station signs are indeed in english
the station 's signs are indeed in english
the station signs are indians in english
the station signs are indian in english
the stations signs are indians in english
the stations signs are indians and english

[Diagram: source P(w) → channel P(a|w) → observed a → decoder → best w]

Language Model: P(w)    Acoustic Model: P(a|w)

the station 's signs are in deep in english

  argmax_w P(w | a) = argmax_w P(a | w) P(w)
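The argmax is computed hypothesis by hypothesis: score each candidate sentence by log P(a|w) + log P(w) and keep the best. A minimal sketch in Java (the scores below are hypothetical, not from the slides):

```java
public class NoisyChannel {
    // Pick the hypothesis w maximizing log P(a|w) + log P(w),
    // i.e. argmax_w P(a|w) P(w) computed in log space.
    static int argmaxHypothesis(double[] logAcoustic, double[] logLm) {
        int best = 0;
        for (int i = 1; i < logAcoustic.length; i++) {
            if (logAcoustic[i] + logLm[i] > logAcoustic[best] + logLm[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical log scores for three candidate transcriptions.
        double[] logAcoustic = {-10.0, -9.5, -12.0};
        double[] logLm = {-8.0, -9.0, -4.0};
        System.out.println(argmaxHypothesis(logAcoustic, logLm)); // 2
    }
}
```

Note the language model can rescue a hypothesis with a worse acoustic score, which is exactly the point of the prior.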

SLIDE 9

[Diagram: source P(e) → channel P(f|e) → observed f → decoder → best e]

Language Model: P(e)    Translation Model: P(f|e)

  argmax_e P(e | f) = argmax_e P(f | e) P(e)

sent transmission: English
recovered transmission: French
recovered message: English'

SLIDE 10

Decoder hypotheses with scores:

  the station signs are in deep in english         14732
  the stations signs are in deep in english        14735
  the station signs are in deep into english       14739
  the station 's signs are in deep in english      14740
  the station signs are in deep in the english     14741
  the station signs are indeed in english          14757
  the station 's signs are indeed in english       14760
  the station signs are indians in english         14790
  the station signs are indian in english          14799
  the stations signs are indians in english        14807
  the stations signs are indians and english       14815
SLIDE 11

▪ A language model is a distribution over sequences of words (sentences)

▪ What's w? (closed vs open vocabulary)
▪ What's n? (must sum to one over all lengths)
▪ Can have rich structure or be linguistically naive

▪ Why language models?

▪ Usually the point is to assign high weights to plausible sentences (cf. acoustic confusions)
▪ This is not the same as modeling grammaticality

SLIDE 12

▪ Language models are distributions over sentences
▪ N-gram models are built from local conditional probabilities
▪ The methods we've seen are backed by corpus n-gram counts

SLIDE 26

  Training Data:  counts / parameters from here
  Held-Out Data:  hyperparameters from here
  Test Data:      evaluate here

SLIDE 27

▪ We often want to make estimates from sparse statistics:
▪ Smoothing flattens spiky distributions so they generalize better:
▪ Very important all over NLP, but easy to do badly

  Before smoothing:        After smoothing:
  P(w | denied the)        P(w | denied the)
    3   allegations          2.5 allegations
    2   reports              1.5 reports
    1   claims               0.5 claims
    1   request              0.5 request
                             2   other
    7   total                7   total
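One scheme that reproduces the smoothed numbers above is absolute discounting: subtract a fixed d = 0.5 from each of the four seen counts and reserve the freed mass (4 × 0.5 = 2) for other words. A sketch (the method name and the "<other>" key are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AbsoluteDiscounting {
    // Subtract a fixed discount d from every seen count; the freed
    // mass (d * number of seen types) is reserved for unseen words.
    static Map<String, Double> discount(Map<String, Integer> counts, double d) {
        Map<String, Double> smoothed = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            smoothed.put(e.getKey(), e.getValue() - d);
        }
        smoothed.put("<other>", d * counts.size());
        return smoothed;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("allegations", 3);
        counts.put("reports", 2);
        counts.put("claims", 1);
        counts.put("request", 1);
        // Reproduces the slide: 2.5, 1.5, 0.5, 0.5, and 2 for "other".
        System.out.println(discount(counts, 0.5));
    }
}
```

The total stays 7: discounting reallocates probability mass, it never creates or destroys it.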

SLIDE 36

The LAMBADA dataset

Context: "Why?" "I would have thought you'd find him rather dry," she said. "I don't know about that," said Gabriel. "He was a great craftsman," said Heather. "That he was," said Flannery.

Target sentence: "And Polish, to boot," said _______.
Target word: Gabriel

[Paperno et al. 2016]

SLIDE 37

Other Techniques?

▪ Lots of other techniques

▪ Maximum entropy LMs
▪ Neural network LMs (soon)
▪ Syntactic / grammar-structured LMs (later)

SLIDE 38

How to Build an LM

SLIDE 39

▪ Good LMs need lots of n-grams!

[Brants et al, 2007]

SLIDE 40

▪ Key function: map from n-grams to counts

  searching for the best      192593
  searching for the right      45805
  searching for the cheapest   44965
  searching for the perfect    43959
  searching for the truth      23165
  searching for the "          19086
  searching for the most       15512
  searching for the latest     12670
  searching for the next       10120
  searching for the lowest     10080
  searching for the name        8402
  searching for the finest      8171
  ...

SLIDE 41

https://ai.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html

SLIDE 42

  • 24 GB compressed
  • 6 DVDs

SLIDE 43

[Recap of Slide 8: source P(w) → channel P(a|w) → observed a → decoder → best w, with the decoder hypotheses and argmax_w P(w | a) = argmax_w P(a | w) P(w)]

SLIDE 44

hash(cat) = 2    c(cat) = 12
hash(the) = 2    c(the) = 87
hash(and) = 5    c(and) = 76
hash(dog) = 7    c(dog) = 11
hash(have) = 2   c(have) = ?

[Hash table with slots 1-7 storing (key, value) pairs: cat→12, the→87, and→76, dog→11; hash(have) = 2 collides with cat and the]

SLIDE 45

HashMap<String, Long> ngram_counts = new HashMap<>();
String ngram1 = "I have a car";
String ngram2 = "I have a cat";
ngram_counts.put(ngram1, 123L);
ngram_counts.put(ngram2, 333L);

SLIDE 46

HashMap<String[], Long> ngram_counts = new HashMap<>();
String[] ngram1 = {"I", "have", "a", "car"};
String[] ngram2 = {"I", "have", "a", "cat"};
ngram_counts.put(ngram1, 123L);
ngram_counts.put(ngram2, 333L);
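One caveat with the String[] version: Java arrays inherit identity-based hashCode/equals from Object, so two arrays containing the same words are different keys. A small demonstration:

```java
import java.util.HashMap;

public class ArrayKeyPitfall {
    public static void main(String[] args) {
        HashMap<String[], Long> ngram_counts = new HashMap<>();
        String[] ngram1 = {"I", "have", "a", "cat"};
        ngram_counts.put(ngram1, 333L);

        // A second array with identical contents is a *different* key:
        // arrays compare by identity, so this lookup misses.
        String[] ngram2 = {"I", "have", "a", "cat"};
        System.out.println(ngram_counts.get(ngram1)); // 333
        System.out.println(ngram_counts.get(ngram2)); // null
    }
}
```

Keys that compare by content (a List<String>, or the joined String of the previous slide) avoid this, at the memory cost the next slide quantifies.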

SLIDE 47

Per 3-gram, HashMap<String[], Long> ngram_counts costs at least:

  1 pointer   = 8 bytes
  1 Long      = 8 bytes (object header) + 8 bytes (long)
  1 Map.Entry = 8 bytes (object header) + 3 x 8 bytes (pointers)
  1 String[]  = 8 bytes (object header) + 3 x 8 bytes (pointers)
                ... at best, if Strings are canonicalized

  Total: > 88 bytes

  4 billion n-grams * 88 bytes = 352 GB

Obvious alternatives:

  • Sorted arrays
  • Open addressing

SLIDE 48

hash(cat) = 2   c(cat) = 12
hash(the) = 2   c(the) = 87
hash(and) = 5   c(and) = 76
hash(dog) = 7   c(dog) = 11

[Empty hash table with slots 1-7 and key/value columns]

SLIDE 49

hash(cat) = 2    c(cat) = 12
hash(the) = 2    c(the) = 87
hash(and) = 5    c(and) = 76
hash(dog) = 7    c(dog) = 11
hash(have) = 2   c(have) = ?

[Hash table, slots 1-7, filled with cat, the, and, dog and their counts; hash(have) = 2 collides]

SLIDE 50

hash(cat) = 2   c(cat) = 12
hash(the) = 2   c(the) = 87
hash(and) = 5   c(and) = 76
hash(dog) = 7   c(dog) = 11

[Hash table, slots 1-7, with collision chains extending into overflow slots 14, 15, ...]

SLIDE 51

▪ Closed address hashing

▪ Resolve collisions with chains
▪ Easier to understand but bigger

▪ Open address hashing

▪ Resolve collisions with probe sequences
▪ Smaller but easy to mess up

▪ Direct-address hashing

▪ No collision resolution
▪ Just eject previous entries
▪ Not suitable for core LM storage
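A minimal open-addressing sketch with linear probing (class and method names are mine, not from the slides):

```java
public class OpenAddressMap {
    // Open addressing: on a collision, probe forward to the next slot.
    private final String[] keys;
    private final long[] vals;

    public OpenAddressMap(int capacity) {
        keys = new String[capacity];
        vals = new long[capacity];
    }

    public void put(String key, long val) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        // Linear probe sequence; assumes the table never fills up.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        vals[i] = val;
    }

    public Long get(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null; // never stored
    }

    public static void main(String[] args) {
        OpenAddressMap m = new OpenAddressMap(8);
        m.put("the cat", 12L);
        m.put("the dog", 11L);
        System.out.println(m.get("the cat")); // 12
        System.out.println(m.get("a van"));   // null
    }
}
```

The "easy to mess up" part is real: deletion, resizing, and load-factor control all need care that this sketch omits.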

SLIDE 52

[Recap of Slide 47: HashMap<String[], Long> costs > 88 bytes per 3-gram at best; obvious alternatives are sorted arrays and open addressing]

SLIDE 53

n-gram: the cat laughed    count: 233

word ids: 7, 1, 15

SLIDE 54

Got 3 numbers under 2^20 to store? They fit in a primitive 64-bit long:

  20 bits     20 bits     20 bits
  7           1           15
  0...00111   0...00001   0...01111
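The three 20-bit fields can be packed and unpacked with shifts and masks; a sketch assuming every word id is below 2^20 (names are mine):

```java
public class NgramPack {
    private static final long MASK_20 = (1L << 20) - 1;

    // Pack three word ids (each < 2^20) into one primitive 64-bit long.
    static long pack(long id1, long id2, long id3) {
        return (id1 << 40) | (id2 << 20) | id3;
    }

    // slot 0 = first word, slot 2 = last word.
    static long unpack(long packed, int slot) {
        return (packed >>> (20 * (2 - slot))) & MASK_20;
    }

    public static void main(String[] args) {
        long key = pack(7, 1, 15); // "the cat laughed" as ids 7, 1, 15
        System.out.println(unpack(key, 0)); // 7
        System.out.println(unpack(key, 1)); // 1
        System.out.println(unpack(key, 2)); // 15
    }
}
```

The packed long can then serve directly as a sort key or a hash-table key, with no per-n-gram objects at all.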

SLIDE 55

n-gram: the cat laughed    count: 233

n-gram encoding: 15176595

  32 bytes → 8 bytes

SLIDE 56

[Recap of Slide 47: HashMap<String[], Long> costs > 88 bytes per 3-gram at best; obvious alternatives are sorted arrays and open addressing]

SLIDE 57

c(the) = 23135851162 < 2^35

35 bits represent integers between 0 and 2^35.

  n-gram encoding (60 bits): 15176595    count (35 bits): 233

SLIDE 58

  • 24 GB compressed
  • 6 DVDs

SLIDE 59

# unique counts = 770000 < 2^20

20 bits represent the ranks of all counts.

  n-gram encoding (60 bits): 15176595    rank (20 bits): 3

  rank  count
  1     1
  2     2
  3     51
  ...   ...
        233
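Storing a 20-bit rank instead of a 35-bit count needs only one small side table of the distinct count values; a sketch with a hypothetical four-entry table:

```java
import java.util.Arrays;

public class CountRanks {
    // Sorted table of distinct count values; index = rank.
    // With ~770,000 unique counts (< 2^20), a rank fits in 20 bits
    // even though the raw counts need up to 35.
    private final long[] countByRank;

    public CountRanks(long[] sortedDistinctCounts) {
        countByRank = sortedDistinctCounts;
    }

    public int rankOf(long count) {
        // Binary search the sorted table; assumes the count occurs.
        return Arrays.binarySearch(countByRank, count);
    }

    public long countOf(int rank) {
        return countByRank[rank];
    }

    public static void main(String[] args) {
        // Hypothetical table of distinct counts.
        CountRanks ranks = new CountRanks(new long[]{1, 2, 51, 233});
        System.out.println(ranks.rankOf(233)); // 3
        System.out.println(ranks.countOf(2));  // 51
    }
}
```

The per-n-gram record shrinks from 60+35 to 60+20 bits; the one shared table costs a negligible few megabytes.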

SLIDE 60

Count lookup pipeline: trigram → bigram → unigram, with a vocabulary mapping words to ids and a count DB.

N-gram encoding scheme:
  unigram: f(id) = id
  bigram:  f(id1, id2) = ?
  trigram: f(id1, id2, id3) = ?

SLIDE 64

[Many details from Pauls and Klein, 2011]

SLIDE 67

Compression

SLIDE 69

Encoding "9" [Elias, 75]:

  length in unary, then the number in binary:  000 1001
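The Elias gamma code writes a number's bit length in unary (as leading zeros) and then the number in binary, so frequent small counts take few bits; 9 = 1001 in binary becomes 000 1001. A sketch over bit strings (a real implementation would pack bits, not characters):

```java
public class EliasGamma {
    // Gamma code of n >= 1: (bit length - 1) zeros, then n in binary.
    static String encode(long n) {
        String binary = Long.toBinaryString(n);
        return "0".repeat(binary.length() - 1) + binary;
    }

    static long decode(String code) {
        int zeros = code.indexOf('1'); // unary length prefix
        return Long.parseLong(code.substring(zeros, 2 * zeros + 1), 2);
    }

    public static void main(String[] args) {
        System.out.println(encode(9));         // 0001001
        System.out.println(decode("0001001")); // 9
        System.out.println(encode(1));         // 1
    }
}
```

Because the prefix announces the length, codes are self-delimiting and can be concatenated into one bit stream with no separators.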

SLIDE 70

Speed-Ups

SLIDE 72

LM can be more than 10x faster w/ direct-address caching

slide-73
SLIDE 73

▪ Simplest option: hash-and-hope

▪ Array of size K ~ N
▪ (optional) store hash of keys
▪ Store values in direct-address
▪ Collisions: store the max
▪ What kind of errors can there be?

▪ More complex options, like Bloom filters (originally for membership, but see Talbot and Osborne 07), perfect hashing, etc.
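A sketch of hash-and-hope as described: values in a direct-address array, collisions resolved by keeping the max, so the answer to the question above is that errors are one-sided: a query can only over-count, never under-count (class and method names are mine):

```java
public class HashAndHope {
    // Direct-address array of size K ~ N; no keys stored at all.
    private final long[] values;

    public HashAndHope(int capacity) {
        values = new long[capacity];
    }

    private int slot(String ngram) {
        return Math.floorMod(ngram.hashCode(), values.length);
    }

    // On collision, keep the max: the stored value for a slot is an
    // upper bound on the count of every n-gram hashing there.
    public void put(String ngram, long count) {
        int i = slot(ngram);
        values[i] = Math.max(values[i], count);
    }

    public long get(String ngram) {
        return values[slot(ngram)];
    }

    public static void main(String[] args) {
        HashAndHope counts = new HashAndHope(1 << 10);
        counts.put("the cat laughed", 233L);
        System.out.println(counts.get("the cat laughed")); // at least 233
    }
}
```

Optionally storing a small hash of each key turns most false hits into detectable misses at a few extra bits per entry.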