Natural Language Processing Lecture 5: Language Models and Smoothing


SLIDE 1

Natural Language Processing

Lecture 5: Language Models and Smoothing

SLIDE 2

Language Modeling

  • Is this sentence good?

– This is a pen
– Pen this is a

  • Help choose between options, help score options

– 他向记者介绍了发言的主要内容 ("He briefed reporters on the main contents of the statement")
– He briefed to reporters on the chief contents of the statement
– He briefed reporters on the chief contents of the statement
– He briefed to reporters on the main contents of the statement
– He briefed reporters on the main contents of the statement

SLIDE 3

One-Slide Review

  • Probability Terminology
  • Random variables take different values, depending on chance.
  • Notation:

– p(X = x) is the probability that r.v. X takes value x
– p(x) is shorthand for the same
– p(X) is the distribution over values X can take (a function)

  • Joint probability: p(X = x, Y = y)

– Independence
– Chain rule

  • Conditional probability: p(X = x | Y = y)
SLIDE 4

Unigram Model

  • Every word in Σ is assigned some probability.
  • Random variables W1, W2, ... (one per word).
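
A minimal sketch of how such a model could be estimated and sampled, assuming the training data is simply a Python list of tokens (the corpus and function names are illustrative, not from the slides):

```python
import random
from collections import Counter

def train_unigram(tokens):
    """Estimate p(w) for every word by relative frequency (MLE)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sample_unigram(probs, length=20):
    """Generate text by drawing each word independently from p(w)."""
    words = list(probs.keys())
    weights = list(probs.values())
    return " ".join(random.choices(words, weights=weights, k=length))

# Toy example:
corpus = "the cat sat on the mat the dog sat on the log".split()
model = train_unigram(corpus)
print(sample_unigram(model, length=10))
```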
SLIDE 5

Part of a Unigram Distribution

[rank 1] p(the) = 0.038, p(of) = 0.023, p(and) = 0.021, p(to) = 0.017, p(is) = 0.013, p(a) = 0.012, p(in) = 0.012, p(for) = 0.009, ...

[rank 1001] p(joint) = 0.00014, p(relatively) = 0.00014, p(plot) = 0.00014, p(DEL1SUBSEQ) = 0.00014, p(rule) = 0.00014, p(62.0) = 0.00014, p(9.1) = 0.00014, p(evaluated) = 0.00014, ...

SLIDE 6

Unigram Model as a Generator

[Random text sampled from the unigram model.]

SLIDE 7

Full History Model

  • Every word in Σ is assigned some probability, conditioned on every history.

SLIDE 8

Bill Clinton's unusually direct comment Wednesday on the possible role of race in the election was in keeping with the Clintons' bid to portray Obama, who is aiming to become the first black U.S. president, as the clear favorite, thereby lessening the potential fallout if Hillary Clinton does not win in South Carolina.

SLIDE 9

N-Gram Model

  • Every word in Σ is assigned some probability, conditioned on a fixed-length history (n – 1).
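
In the usual notation (the slide's own equations are not reproduced in this transcript), the n-gram assumption replaces the full history with the last n – 1 words:

```latex
p(w_1,\dots,w_m) \;=\; \prod_{i=1}^{m} p(w_i \mid w_1,\dots,w_{i-1})
               \;\approx\; \prod_{i=1}^{m} p(w_i \mid w_{i-n+1},\dots,w_{i-1})
```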

SLIDE 10

Bigram Model as a Generator

[Random text sampled from the bigram model.]

SLIDE 11

Trigram Model as a Generator

[Random text sampled from the trigram model.]

SLIDE 12

What’s in a word

  • Is punctuation a word?

– Does knowing the last “word” is a “,” help?

  • In speech

– I do uh main- mainly business processing
– Is "uh" a word?

SLIDE 13

For Thought

  • Do N-Gram models “know” English?
  • Unknown words
  • N-gram models and finite-state automata
SLIDE 14

Starting and Stopping

Unigram model:

...

Bigram model:

...

Trigram model:

...
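
The slide's equations are not reproduced in this transcript; one standard way to write the bigram case, assuming a start symbol <s> and a stop symbol </s>, is:

```latex
p(w_1,\dots,w_m,\texttt{</s>}) \;=\; p(w_1 \mid \texttt{<s>})\,
  \Bigl[\prod_{i=2}^{m} p(w_i \mid w_{i-1})\Bigr]\, p(\texttt{</s>} \mid w_m)
```

Higher-order models condition on additional start symbols; the unigram model needs only the stop symbol to define a proper distribution over sentences of any length.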

SLIDE 15

Evaluation

SLIDE 16

Which model is better?

  • Can I get a number for how good my model is on a test set?
  • What is P(test set | model)?
  • We measure this by perplexity
  • Perplexity is the probability of the test set, normalized by the number of words

SLIDE 17

Perplexity
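
The slide's formula is not reproduced in this transcript; the standard definition, consistent with the description on the previous slide, is, for a test set W = w_1 ... w_N:

```latex
\mathrm{PP}(W) \;=\; P(w_1 w_2 \dots w_N)^{-\frac{1}{N}}
              \;=\; \sqrt[N]{\frac{1}{P(w_1 w_2 \dots w_N)}}
```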

SLIDE 18

Perplexity of different models

  • Better models have lower perplexity

– WSJ: Unigram 962; Bigram 170; Trigram 109

  • Different tasks have different perplexities

– WSJ (109) vs. Bus Information Queries (~25)

  • The higher the conditional probability, the lower the perplexity
  • Perplexity is the average branching factor
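
A sketch of how such a perplexity number could be computed for a bigram model, assuming a helper bigram_prob(prev, word) that returns a smoothed conditional probability (the helper is hypothetical, not something defined on these slides):

```python
import math

def perplexity(test_tokens, bigram_prob):
    """Perplexity = exp of the average negative log-probability per predicted word."""
    log_prob = 0.0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_prob += math.log(bigram_prob(prev, word))  # must be > 0, hence smoothing
    n = len(test_tokens) - 1  # number of predicted words
    return math.exp(-log_prob / n)
```

The WSJ figures above (962 / 170 / 109) are perplexities of this kind, measured on held-out text.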
SLIDE 19

What about open-class words?

  • What is the probability of unseen words?

– (Naïve answer is 0.0)

  • But that’s not what you want

– Test set will usually include words not in training

  • What is the probability of

– P(Nebuchadnezzur | son of )

SLIDE 20

LM smoothing

  • Laplace or add-one smoothing

– Add one to all counts
– Or add "epsilon" to all counts
– You still need to know all your vocabulary

  • Have an OOV word in your vocabulary

– The probability of seeing an unseen word
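
For bigrams, add-one (Laplace) smoothing takes the standard form below, where V is the vocabulary size (including the OOV token); replacing the 1 with a small ε gives the "add-epsilon" variant mentioned above:

```latex
P_{\text{add-1}}(w_i \mid w_{i-1}) \;=\; \frac{C(w_{i-1} w_i) + 1}{C(w_{i-1}) + V}
```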

SLIDE 21

Good-Turing Smoothing

  • Good (1953), from Turing.

– Use the count of things you've seen once to estimate the count of things you've never seen.

  • Calculate the frequency of frequencies of N-grams

– Count of N-grams that appear 1 time
– Count of N-grams that appear 2 times
– Count of N-grams that appear 3 times
– …
– Estimate the new count as c* = (c + 1) · N_{c+1} / N_c

  • Change the counts a little so we get a better estimate for count 0
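
A minimal sketch of the count-of-counts computation, assuming the N-gram counts are already in a Python dict (names are illustrative):

```python
from collections import Counter

def good_turing(ngram_counts):
    """Discounted counts c* = (c + 1) * N_{c+1} / N_c, plus the mass reserved for unseen N-grams."""
    # N_c = number of distinct N-grams seen exactly c times ("frequency of frequencies")
    freq_of_freq = Counter(ngram_counts.values())
    discounted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        if n_next > 0:  # c* is undefined where N_{c+1} = 0; handled by smoothing N_c in practice
            discounted[c] = (c + 1) * n_next / n_c
    total = sum(ngram_counts.values())
    p_unseen = freq_of_freq.get(1, 0) / total  # probability mass given to count-0 events
    return discounted, p_unseen
```

In practice, large counts are usually left undiscounted and the N_c curve is smoothed before applying the formula.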

SLIDE 22

Good-Turing’s Discounted Counts

          AP Newswire bigrams          Berkeley Restaurant bigrams    Smith thesis bigrams
c         Nc              c*           Nc           c*                Nc          c*
0         74,671,100,000  0.0000270    2,081,496    0.002553          x           38,048 / x
1         2,018,046       0.446        5,315        0.533960          38,048      0.21147
2         449,721         1.26         1,419        1.357294          4,032       1.05071
3         188,933         2.24         642          2.373832          1,409       2.12633
4         105,668         3.24         381          4.081365          749         2.63685
5         68,379          4.22         311          3.781350          395         3.91899
6         48,190          5.19         196          4.500000          258         4.42248
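
As a quick check on the table, the c = 1 rows follow directly from the formula and the counts in the c = 2 rows:

```latex
c^*_{\text{AP}}(1) = \frac{2 \times N_2}{N_1} = \frac{2 \times 449{,}721}{2{,}018{,}046} \approx 0.446,
\qquad
c^*_{\text{Berkeley}}(1) = \frac{2 \times 1{,}419}{5{,}315} \approx 0.534
```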

SLIDE 23

Backoff

  • If no trigram, use bigram
  • If no bigram, use unigram
  • If no unigram … smooth the unigrams
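
A sketch of this fall-back logic in the "stupid backoff" style (a fixed weight α instead of proper discounting), assuming trigram/bigram/unigram count dicts; the function and variable names are illustrative:

```python
def backoff_score(w1, w2, w3, tri, bi, uni, total, alpha=0.4):
    """Score for p(w3 | w1 w2): use the trigram if seen, otherwise back off with weight alpha."""
    if tri.get((w1, w2, w3), 0) > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi.get((w2, w3), 0) > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    if uni.get(w3, 0) > 0:
        return alpha * alpha * uni[w3] / total
    return alpha ** 3 / total  # "smooth the unigrams": tiny score for unseen words
```

These are scores rather than true probabilities (they do not sum to 1), which is what distinguishes "stupid" backoff from Katz backoff on the next slide.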
SLIDE 24

Estimating p(w | history)

  • Relative frequencies (count & normalize)
  • Transform the counts:

– Laplace / "add one" / "add λ"
– Good-Turing discounting

  • Interpolate or "back off":

– With Good-Turing discounting: Katz backoff
– "Stupid" backoff
– Absolute discounting: Kneser-Ney
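
Simple linear interpolation, one way to realize the "interpolate" option above, mixes all the model orders with weights that sum to one (the λ's are typically tuned on held-out data):

```latex
\hat{p}(w_i \mid w_{i-2}, w_{i-1}) \;=\;
  \lambda_3\, p(w_i \mid w_{i-2}, w_{i-1})
+ \lambda_2\, p(w_i \mid w_{i-1})
+ \lambda_1\, p(w_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```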