 
Topics in Computational Linguistics
Week 5: ngrams and language model
Shu-Kai Hsieh
Lab of Ontologies, Language Processing and e-Humanities
GIL, National Taiwan University
March 28, 2014

Outline
1 N-grams model
  Evaluation
  Smoothing Techniques
2 Web-scaled N-grams
3 Related Topics
4 The Entropy of Natural Languages
5 Lab

Language models
• Statistical/probabilistic language models aim to compute
  • either the prob. of a sentence or sequence of words, $P(S) = P(w_1, w_2, w_3, \ldots, w_n)$, or
  • the prob. of the upcoming word, $P(w_n \mid w_1, w_2, w_3, \ldots, w_{n-1})$ (which will turn out to be closely related to computing the probability of a sequence of words).
• The N-gram model is one of the most important tools in speech and language processing.
• Varied applications: spelling checker, MT, Speech Recognition, QA, etc.

Simple n-gram model
• Let's start with calculating $P(S)$, say, $P(S) = P(\text{學}, \text{語言}, \text{很}, \text{有趣})$ (roughly "learning languages is very interesting").

Review of Joint and Conditional Probability
• Recall that the conditional prob. of X given Y, $P(X \mid Y)$, is defined in terms of the prob. of Y, $P(Y)$, and the joint prob. of X and Y, $P(X, Y)$:

$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$$

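As a minimal sketch of this definition, the snippet below computes a conditional probability from a small joint distribution. The table `joint`, its outcome labels, and the helper names `marginal_y` and `conditional` are all made up for illustration; none of them come from the slides.

```python
from fractions import Fraction

# Hypothetical toy joint distribution P(X, Y); the numbers are invented.
joint = {
    ("rain", "cloudy"): Fraction(3, 10),
    ("dry", "cloudy"): Fraction(2, 10),
    ("rain", "clear"): Fraction(1, 10),
    ("dry", "clear"): Fraction(4, 10),
}

def marginal_y(y):
    """P(Y = y), obtained by summing the joint over all values of X."""
    return sum(p for (_, y_val), p in joint.items() if y_val == y)

def conditional(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint[(x, y)] / marginal_y(y)

print(conditional("rain", "cloudy"))  # 3/5
```

Exact fractions are used here only so that the arithmetic is easy to check by hand: (3/10) / (5/10) = 3/5.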
Review of Chain Rule of Probability
Conversely, the joint prob. $P(X, Y)$ can be expressed in terms of the conditional prob. $P(X \mid Y)$:

$$P(X, Y) = P(X \mid Y)\, P(Y)$$

which leads to the chain rule

$$P(X_1, X_2, X_3, \ldots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \ldots, X_{n-1}) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$

The Chain Rule applied to calculate joint probability of words in sentence
Chain rule of probability, writing $w_1^n$ for the word sequence $w_1, \ldots, w_n$:

$$P(S) = P(w_1^n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1^2) \cdots P(w_n \mid w_1^{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})$$

For the example sentence:

P(學) * P(語言 | 學) * P(很 | 學 語言) * P(有趣 | 學 語言 很)

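The decomposition can be sketched in a few lines of Python: each word is paired with its full history, giving one factor per word. The function name `chain_rule_factors` is a hypothetical helper, not something from the course materials.

```python
def chain_rule_factors(words):
    """Return (word, history) pairs, one per factor P(w_k | w_1 ... w_{k-1})."""
    return [(word, tuple(words[:k])) for k, word in enumerate(words)]

sentence = ["學", "語言", "很", "有趣"]
for word, history in chain_rule_factors(sentence):
    if history:
        print(f"P({word} | {' '.join(history)})")
    else:
        # The first word has an empty history.
        print(f"P({word})")
```

This only enumerates the factors; it does not estimate their values.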
How to Estimate these Probabilities?
• Maximum Likelihood Estimation (MLE) [1]: simply by counting in a corpus and normalizing the counts so that they lie between 0 and 1. (There are of course more sophisticated algorithms.)

Count and divide:

P(嗎 | 學 語言 很 有趣) = Count(學 語言 很 有趣 嗎) / Count(學 語言 很 有趣)

[1] MLE is sometimes called relative frequency.

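A minimal sketch of count and divide, assuming a corpus that is already tokenized into a flat list. The toy corpus and the helper names `count_sequence` and `mle_prob` are invented for illustration.

```python
def count_sequence(tokens, seq):
    """Count how many times the token sequence `seq` occurs in `tokens`."""
    seq = list(seq)
    n = len(seq)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == seq)

def mle_prob(tokens, history, word):
    """Relative frequency: Count(history + word) / Count(history)."""
    denom = count_sequence(tokens, history)
    if denom == 0:
        return None  # unseen history: the MLE estimate is undefined
    return count_sequence(tokens, list(history) + [word]) / denom

# Hypothetical toy corpus, already segmented into words.
corpus = "學 語言 很 有趣 嗎 學 語言 很 有趣 啊 學 語言 很 難".split()
print(mle_prob(corpus, ["學", "語言", "很", "有趣"], "嗎"))  # 0.5
```

The history occurs twice in this toy corpus and is followed by 嗎 once, hence 1/2. With a history this long, most real corpora would give a count of zero, which is the practical problem that motivates shortening the history.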
Markov Assumption: Don't look too far into the past
Simplified idea: instead of computing the prob. of a word given its entire history, we can approximate the history by just the last few words.

P(嗎 | 學 語言 很 有趣) ≈ P(嗎 | 有趣)

OR,

P(嗎 | 學 語言 很 有趣) ≈ P(嗎 | 很 有趣)

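The approximation amounts to truncating the history. A small sketch follows, where `markov_history` is a hypothetical helper and `n` is the order of the model (2 for bigrams, 3 for trigrams):

```python
def markov_history(history, n):
    """Keep only the last n-1 words of the history, as an n-gram model does."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return ()  # a unigram model ignores the history entirely
    return tuple(history[-(n - 1):])

history = ["學", "語言", "很", "有趣"]
print(markov_history(history, 2))  # ('有趣',)
print(markov_history(history, 3))  # ('很', '有趣')
```

The two printed histories correspond to the two approximations above.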
In other words
• Bi-gram model: approximates the prob. of a word given all the previous words, $P(w_n \mid w_1^{n-1})$, by using only the conditional prob. given the preceding word, $P(w_n \mid w_{n-1})$. Thus generalized as

$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})$$

• Tri-gram: (your turn)
• We can extend to trigrams, 4-grams, 5-grams, knowing that in general this is an insufficient model of language (because language has long-distance dependencies).

我 在 一 個 非常 奇特 的 機緣巧合 之下 學 梵文 (roughly "I came to study Sanskrit through a very peculiar coincidence")

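To make the window size concrete, here is a sketch that extracts n-grams from the example sentence. The function `ngrams` is a hypothetical helper, and the token list assumes the word segmentation shown on the slide (with 非常 treated as one word, which is a guess).

```python
def ngrams(tokens, n):
    """Return every contiguous n-token window of `tokens` as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["我", "在", "一", "個", "非常", "奇特", "的", "機緣巧合", "之下", "學", "梵文"]
print(ngrams(tokens, 2)[:3])  # the first three bigrams
print(ngrams(tokens, 3)[-1])  # the trigram that ends in 梵文
```

The subject 我 sits at the start of the sentence and the verb 學 near the end, so no window of two to five tokens contains both. This is one way to see the long-distance dependency problem.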
In other words
• So given the bi-gram assumption for the prob. of an individual word, we can compute the prob. of the entire sentence as

$$P(S) = P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$$

• Recall MLE in the JM book, equations (4.13)-(4.14).

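Putting the bigram approximation and MLE together, a minimal sketch might look like the following. The three training sentences are invented, the `<s>` start symbol standing in for $w_0$ is an assumption (a common convention, not stated on the slide), and `train_bigram` and `sentence_prob` are hypothetical names. No smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect history counts and bigram counts from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent  # assumed start-of-sentence symbol
        history_counts.update(padded[:-1])  # every token that has a successor
        bigram_counts.update(zip(padded, padded[1:]))
    return history_counts, bigram_counts

def sentence_prob(sent, history_counts, bigram_counts):
    """Product over k of the MLE estimate of P(w_k | w_{k-1})."""
    prob = 1.0
    padded = ["<s>"] + sent
    for prev, word in zip(padded, padded[1:]):
        if history_counts[prev] == 0:
            return 0.0  # unseen history: unsmoothed MLE assigns no mass
        prob *= bigram_counts[(prev, word)] / history_counts[prev]
    return prob

# Hypothetical toy training data, already segmented.
training = [
    ["學", "語言", "很", "有趣"],
    ["學", "語言", "很", "難"],
    ["學", "梵文", "很", "有趣"],
]
history_counts, bigram_counts = train_bigram(training)
print(sentence_prob(["學", "語言", "很", "有趣"], history_counts, bigram_counts))
```

For this toy data the factors are 3/3, 2/3, 2/2 and 2/3, so the result is about 0.444. Any sentence containing an unseen bigram gets probability 0 here.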
Example: Language Modeling of Alice.txt

Exercise
• Walk through the example of the Berkeley Restaurant Project sentences (pp. 90-91).

BTW, we usually do everything in log space to avoid underflow (also, adding is faster than multiplying):

$$\log(p_1 \cdot p_2 \cdot p_3) = \log p_1 + \log p_2 + \log p_3$$

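The underflow problem is easy to reproduce. The sketch below multiplies 100 made-up per-word probabilities directly and then sums their logs instead; the value 1e-5 and the sequence length are arbitrary choices for illustration.

```python
import math

# Hypothetical per-word probabilities for a 100-word sequence.
probs = [1e-5] * 100

# Direct multiplication: the true value, 1e-500, is smaller than the
# smallest positive double, so the running product collapses to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Log space: sum the logs instead of multiplying the probabilities.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # roughly -1151.29
```

Log probabilities can be compared directly, since the logarithm is monotonic, so there is usually no need to convert back.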