Language Modeling Recap
CMSC 473/673 UMBC
Some slides adapted from 3SLP
n-grams = Chain Rule + Backoff (Markov assumption)

N-Gram Terminology
How to (efficiently) compute p(Colorless green ideas sleep furiously)?

n | Commonly called  | History size (Markov order) | Example
1 | unigram          | 0                           | p(furiously)
2 | bigram           | 1                           | p(furiously | sleep)
3 | trigram (3-gram) | 2                           | p(furiously | ideas sleep)
4 | 4-gram           | 3                           | p(furiously | green ideas sleep)
n | n-gram           | n-1                         | p(w_i | w_{i-n+1} … w_{i-1})
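To make the factorization concrete, here is a minimal sketch of chain rule plus Markov assumption; the logprob argument is an assumed stand-in for whatever model supplies log p(word | history), not something defined on these slides.

def sentence_logprob(words, logprob, n=3):
    # Chain rule with an (n-1)-order Markov assumption: condition each
    # word on only the previous n-1 words instead of the full history.
    padded = ["<BOS>"] * (n - 1) + words + ["<EOS>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])  # the last n-1 words only
        total += logprob(padded[i], history)  # model's log p(word | history)
    return total

# e.g. sentence_logprob("Colorless green ideas sleep furiously".split(), logprob)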
Q: Why do we have all these options? Why is MLE not sufficient?
A: Do we trust our training corpus? (Insufficient counts → zero probabilities; corpora have lexical biases; …)
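A toy illustration of the zero-count problem (the training text and names here are illustrative, not from the slides): any bigram unseen in training gets MLE probability 0, which zeroes out every sentence containing it.

from collections import Counter

train = "the film got a great opening".split()
bigrams = Counter(zip(train, train[1:]))
unigrams = Counter(train)

def p_mle(word, prev):
    # relative frequency: count(prev word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_mle("got", "film"))   # 1.0: "film got" was seen in training
print(p_mle("went", "film"))  # 0.0: unseen bigram kills the whole product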
Q: What are the parameters we learn? A: The counts or normalized probability values
Q: What are the hyperparameters?
A: For Laplace, backoff, and Kneser-Ney: the adjustments to the counts. For interpolation: the reweighting values (the λs).
Training Data: acquire primary statistics for learning model parameters
Dev Data: fine-tune any secondary (hyper)parameters
Test Data: perform final evaluation
Interpolation: fix the n-gram probabilities/counts (estimated on the training data), then search for the λs that give the largest probability to the held-out dev set.
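A minimal sketch of that search, assuming stand-in component models p1 (unigram), p2 (bigram), and p3 (trigram) and a simple grid over the λs; a real implementation would be smarter (e.g. EM), but the idea is the same.

import numpy

def held_out_logprob(dev_trigrams, l1, l2, l3, p1, p2, p3):
    # dev-set log-probability under fixed component models, mixed as
    # p(z | x y) = l1*p1(z) + l2*p2(z | y) + l3*p3(z | x y)
    return sum(numpy.log(l1 * p1(z) + l2 * p2(z, y) + l3 * p3(z, x, y))
               for (x, y, z) in dev_trigrams)

def best_lambdas(dev_trigrams, p1, p2, p3, step=0.1):
    # grid over weights that are positive and sum to 1
    grid = numpy.arange(step, 1.0, step)
    candidates = [(l1, l2, 1.0 - l1 - l2)
                  for l1 in grid for l2 in grid if 1.0 - l1 - l2 > 1e-9]
    return max(candidates,
               key=lambda ls: held_out_logprob(dev_trigrams, *ls, p1, p2, p3))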
Average per-token negative log-likelihood: −(1/N) Σ_{j=1}^{N} log q(x_j); perplexity is exp of this quantity, where each x_j is predicted from its n-gram history (the previous n−1 items).
Training sentence:
The film got a great opening and the film went on to become a hit .

Q: With OOV, EOS, and BOS, how many types (for normalization)?
A: 16 (why don’t we count BOS?)
Context: x y | Word (Type): z | Raw Count | Add-1 Count | Norm. Probability p(z | x y)
The film     | The            | 0         | 1           | 1/17
The film     | film           | 0         | 1           | 1/17
The film     | got            | 1         | 2           | 2/17
The film     | went           | 0         | 1           | 1/17
The film     | OOV            | 0         | 1           | 1/17
The film     | EOS            | 0         | 1           | 1/17
…
a great      | great          | 0         | 1           | 1/17
a great      | opening        | 1         | 2           | 2/17
a great      | and            | 0         | 1           | 1/17
a great      | the            | 0         | 1           | 1/17
…

Each denominator is 17 (= 1 raw count of the context bigram + 16 types × 1).
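The table’s numbers can be reproduced with a short sketch (variable names are illustrative; the vocabulary is the 16 types counted above):

from collections import Counter

sentence = "The film got a great opening and the film went on to become a hit .".split()
vocab = set(sentence) | {"<UNK>", "<EOS>"}  # 14 word types + OOV + EOS = 16; BOS excluded
tokens = ["<BOS>", "<BOS>"] + sentence + ["<EOS>"]

trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigrams = Counter(zip(tokens, tokens[1:]))

def p_add1(z, x, y):
    # add-1 smoothing: bump every count by 1; the denominator grows by |V| = 16
    return (trigrams[(x, y, z)] + 1) / (bigrams[(x, y)] + len(vocab))

print(p_add1("got", "The", "film"))   # 2/17
print(p_add1("went", "The", "film"))  # 1/17 ("the film went" has lowercase "the")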
Q: What is the perplexity of the sentence “The film , a hit !”?

Trigrams        | MLE p(trigram) | UNK-ed trigrams  | Smoothed p(trigram)
<BOS> <BOS> The | 1              | <BOS> <BOS> The  | 2/17
<BOS> The film  | 1              | <BOS> The film   | 2/17
The film ,      | 0              | The film <UNK>   | 1/17
film , a        | 0              | film <UNK> a     | 1/16
, a hit         | 0              | <UNK> a hit      | 1/16
a hit !         | 0              | a hit <UNK>      | 1/17
hit ! <EOS>     | 0              | hit <UNK> <EOS>  | 1/16

Under MLE, the five unseen trigrams get probability 0, so the perplexity is infinity. After replacing the out-of-vocabulary tokens “,” and “!” with <UNK> and applying add-1 smoothing, every trigram has nonzero probability.

Perplexity (MLE): infinity
Perplexity (smoothed): 13.59
import numpy

# perplexity from one probability per trigram token
numpy.exp(-numpy.mean(numpy.log(probs_per_trigram_token)))

# equivalent, from n-gram type counts, where lp(t) is the log-probability of
# type t; the average must be weighted by how often each type occurs
total = sum(ngram_types.values())
numpy.exp(-sum(c * lp(t) for (t, c) in ngram_types.items()) / total)
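As a sanity check, plugging the smoothed trigram probabilities from the table above into the first formula reproduces the 13.59 figure:

import numpy

# smoothed probabilities for the seven trigrams of “The film , a hit !”
probs_per_trigram_token = [2/17, 2/17, 1/17, 1/16, 1/16, 1/17, 1/16]

perplexity = numpy.exp(-numpy.mean(numpy.log(probs_per_trigram_token)))
print(round(perplexity, 2))  # 13.59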