Statistical NLP

Spring 2011
Lecture 2: Language Models
Dan Klein – UC Berkeley

Speech in a Slide
• Frequency gives pitch; amplitude gives volume.
  [Figure: amplitude waveform and spectrogram of the utterance "speech lab" (s p ee ch l a b)]
• Frequencies at each time slice are processed into observation vectors:
  … a12 a13 a12 a14 a14 …

The Noisy-Channel Model
• We want to predict a sentence given acoustics:
  $w^* = \arg\max_w P(w \mid a)$
• The noisy-channel approach (Bayes' rule; $P(a)$ is the same for every candidate $w$):
  $w^* = \arg\max_w \frac{P(a \mid w)\,P(w)}{P(a)} = \arg\max_w P(a \mid w)\,P(w)$
• Acoustic model $P(a \mid w)$: HMMs over word positions, with mixtures of Gaussians as emissions.
• Language model $P(w)$: distributions over sequences of words (sentences).

Acoustically Scored Hypotheses
the station signs are in deep in english      -14732
the stations signs are in deep in english     -14735
the station signs are in deep into english    -14739
the station 's signs are in deep in english   -14740
the station signs are in deep in the english  -14741
the station signs are indeed in english       -14757
the station 's signs are indeed in english    -14760
the station signs are indians in english      -14790
the station signs are indian in english       -14799
the stations signs are indians in english     -14807
the stations signs are indians and english    -14815

ASR System Components
[Figure: system diagram not recovered]

Translation: Codebreaking?
• “Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
  Warren Weaver (1955:18, quoting a letter he wrote in 1947)
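As a rough illustration of the noisy-channel decision rule, the Python sketch below reranks candidate transcriptions by adding an acoustic score log P(a | w) to a language-model score log P(w). The function name `noisy_channel_decode`, the `lm_weight` knob, and the toy language-model scores are hypothetical. Reading the hypothesis scores above as log P(a | w) on the same scale as the language model is an assumption made for illustration, not something the lecture states.

```python
from typing import Callable, Dict, List, Tuple


def noisy_channel_decode(
    acoustic_logprob: Dict[str, float],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 1.0,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Rank hypotheses by log P(a | w) + lm_weight * log P(w).

    acoustic_logprob maps each candidate sentence w to its acoustic score log P(a | w).
    lm_logprob returns log P(w) under some language model.
    lm_weight is a hypothetical tuning knob; 1.0 gives the plain noisy-channel rule.
    """
    scored = [(w, a + lm_weight * lm_logprob(w)) for w, a in acoustic_logprob.items()]
    # Highest combined log score first; P(a) is constant across w, so it is dropped.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[0][0], scored


if __name__ == "__main__":
    # Acoustic scores copied from the hypothesis list; treating them as log P(a | w)
    # on the same scale as the LM is an assumption made for illustration.
    acoustic = {
        "the station signs are in deep in english": -14732.0,
        "the station signs are indeed in english": -14757.0,
    }
    # Made-up language-model scores (hypothetical, not from any trained model).
    toy_lm = {
        "the station signs are in deep in english": -45.0,
        "the station signs are indeed in english": -15.0,
    }
    best, ranking = noisy_channel_decode(acoustic, toy_lm.__getitem__)
    print(best)  # the station signs are indeed in english
```

With these made-up LM scores the combined totals are -14777 for "in deep" and -14772 for "indeed", so the language model overturns the acoustic ranking; passing `lm_weight=0.0` recovers the purely acoustic choice.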

MT System Components
[Figure: system diagram not recovered]

N-Gram Model Decomposition
• Break sentence probability down (without deeper variables):
  $P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})$
• Impractical to condition on everything before:
  P(??? | Turn to page 134 and look at the picture of the)?
• N-gram models: assume each word depends only on a short linear history:
  $P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-k} \ldots w_{i-1})$

N-Gram Model Parameters
• The parameters of an n-gram model:
  ◦ The actual conditional probability estimates; we’ll call them θ.
• Obvious estimate: the relative frequency (maximum likelihood) estimate:
  $\hat{P}(w_i \mid w_{i-k} \ldots w_{i-1}) = \frac{c(w_{i-k} \ldots w_i)}{c(w_{i-k} \ldots w_{i-1})}$
• General approach:
  ◦ Take a training set X and a test set X’.
  ◦ Compute an estimate θ from X.
  ◦ Use it to assign probabilities to other sentences, such as those in X’.

Higher Order N-grams?
• Please close the door
• Please close the first window on the left

Training counts:
  198015222  the first       197302  close the window    3380  please close the door
  194623024  the same        191125  close the door      1601  please close the window
  168504105  the following   152500  close the gap       1164  please close the new
  158562063  the world       116451  close the thread    1159  please close the gate
          …                   87298  close the deal         …
   14112454  the door                                       0  please close the first
-----------                --------                    ------
23135851162  the *          3785230  close the *        13951  please close the *

Unigram Models
• Simplest case: unigrams
  $P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i)$, where $w_n = \mathrm{STOP}$
• Generative process: pick a word, pick a word, … until you pick STOP.
• As a graphical model: [Figure: independent nodes w1, w2, …, wn-1, STOP]
• Examples:
  ◦ [fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass.]
  ◦ [thrift, did, eighty, said, hard, 'm, july, bullish]
  ◦ [that, or, limited, the]
  ◦ []
  ◦ [after, any, on, consistently, hospital, lake, of, of, other, and, factors, raised, analyst, too, allowed, mexico, never, consider, fall, bungled, davison, that, obtain, price, lines, the, to, sass, the, the, further, board, a, details, machinists, the, companies, which, rivals, an, because, longer, oakes, percent, a, they, three, edward, it, currier, an, within, in, three, wrote, is, you, s., longer, institute, dentistry, pay, however, said, possible, to, rooms, hiding, eggs, approximate, financial, canada, the, so, workers, advancers, half, between, nasdaq]

Bigram Models
• Big problem with unigrams: P(the the the the) >> P(I like ice cream)!
• Condition on previous single word:
  $P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$, where $w_0 = \mathrm{START}$ and $w_n = \mathrm{STOP}$
• As a graphical model: [Figure: chain START → w1 → w2 → … → wn-1 → STOP]
• Obvious that this should help: in probabilistic terms, we’re using weaker conditional independence assumptions (what’s the cost?)
• Any better?
  ◦ [texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
  ◦ [outside, new, car, parking, lot, of, the, agreement, reached]
  ◦ [although, common, shares, rose, forty, six, point, four, hundred, dollars, from, thirty, seconds, at, the, greatest, play, disingenuous, to, be, reset, annually, the, buy, out, of, american, brands, vying, for, mr., womack, currently, sharedata, incorporated, believe, chemical, prices, undoubtedly, will, be, as, much, is, scheduled, to, conscientious, teaching]
  ◦ [this, would, be, a, record, november]
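A minimal sketch of the relative-frequency (maximum likelihood) estimate and the bigram factorization, assuming sentences arrive as lists of tokens and that `<s>` and `</s>` stand in for START and STOP (both symbols and the helper names are illustrative choices, not the lecture's code). No smoothing is applied, so any bigram missing from training gets probability zero.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

START, STOP = "<s>", "</s>"  # hypothetical boundary symbols for START and STOP

Bigram = Tuple[str, str]


def train_bigram_mle(sentences: Iterable[List[str]]) -> Dict[Bigram, float]:
    """Relative-frequency (maximum likelihood) estimates P(w | w') = c(w', w) / c(w')."""
    pair_counts: Counter = Counter()
    context_counts: Counter = Counter()
    for sent in sentences:
        tokens = [START] + sent + [STOP]
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[(prev, word)] += 1
            context_counts[prev] += 1  # c(w') counted wherever w' has a successor
    return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}


def sentence_logprob(model: Dict[Bigram, float], sent: List[str]) -> float:
    """log P(w_1 ... w_n) = sum_i log P(w_i | w_{i-1}), with w_0 = START and w_n = STOP."""
    tokens = [START] + sent + [STOP]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = model.get((prev, word), 0.0)
        if p == 0.0:
            return float("-inf")  # unseen bigram: MLE assigns zero probability
        total += math.log(p)
    return total


if __name__ == "__main__":
    train = [["please", "close", "the", "door"], ["please", "close", "the", "window"]]
    model = train_bigram_mle(train)
    print(model[("the", "door")])                                        # 0.5
    print(sentence_logprob(model, ["please", "close", "the", "door"]))   # math.log(0.5), about -0.693
    print(sentence_logprob(model, ["please", "close", "the", "first"]))  # -inf
```

The last call echoes the zero count for "please close the first" in the table: an unsmoothed relative-frequency model rules out any sentence containing an unseen n-gram.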

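The generative processes described for unigram models (pick a word, pick a word, … until STOP) and bigram models (draw each word conditioned on the previous one, starting from START) can be sketched with Python's `random.Random.choices`. The helper names, the `<s>`/`</s>` symbols, and the `max_len` cap (a guard against very long unigram samples) are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import DefaultDict, List

START, STOP = "<s>", "</s>"  # hypothetical boundary symbols


def unigram_counts(sentences: List[List[str]]) -> Counter:
    counts: Counter = Counter()
    for sent in sentences:
        counts.update(sent + [STOP])  # STOP is an ordinary outcome for the unigram model
    return counts


def bigram_counts(sentences: List[List[str]]) -> DefaultDict[str, Counter]:
    table: DefaultDict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        tokens = [START] + sent + [STOP]
        for prev, word in zip(tokens, tokens[1:]):
            table[prev][word] += 1
    return table


def sample_unigram(counts: Counter, rng: random.Random, max_len: int = 50) -> List[str]:
    """Pick a word, pick a word, ... until STOP (capped at max_len words)."""
    words = list(counts)
    weights = [counts[w] for w in words]
    out: List[str] = []
    while len(out) < max_len:
        word = rng.choices(words, weights=weights)[0]
        if word == STOP:
            break
        out.append(word)
    return out


def sample_bigram(table: DefaultDict[str, Counter], rng: random.Random, max_len: int = 50) -> List[str]:
    """Start at START and draw each word in proportion to c(previous, word) until STOP."""
    out: List[str] = []
    prev = START
    while len(out) < max_len:
        successors = table[prev]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
        if word == STOP:
            break
        out.append(word)
        prev = word
    return out


if __name__ == "__main__":
    corpus = [["please", "close", "the", "door"], ["please", "close", "the", "window"]]
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(sample_unigram(unigram_counts(corpus), rng))
    print(sample_bigram(bigram_counts(corpus), rng))
```

On a corpus this small the bigram sampler can only reproduce observed word pairs, so every sample is one of the two training sentences, while the unigram sampler ignores word order entirely.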
Regular Languages?
• N-gram models are (weighted) regular languages.
• Many linguistic arguments that language isn’t regular:
  ◦ Long-distance effects: “The computer which I had just put into the machine room on the fifth floor ___.”
  ◦ Recursive structure
• Why CAN we often get away with n-gram models?
• PCFG LM (later):
  ◦ [This, quarter, 's, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
  ◦ [It, could, be, announced, sometime, .]
  ◦ [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]

More N-Gram Examples
[Figure not recovered]

Measuring Model Quality
• The Shannon Game:
  ◦ How well can we predict the next word?
      When I eat pizza, I wipe off the ____
      Many children are allergic to ____
      I saw a ____
  ◦ Unigrams are terrible at this game. (Why?)
• “Entropy”: per-word test log likelihood (misnamed):
  $H(X') = -\frac{1}{N} \sum_{s \in X'} \log_2 P(s)$, where N is the number of words in the test set X’

  Example next-word distribution:
    grease  0.5
    sauce   0.4
    dust    0.05
    ….
    mice    0.0001
    ….
    the     1e-100

  Counts:
     3516  wipe off the excess
     1034  wipe off the dust
      547  wipe off the sweat
      518  wipe off the mouthpiece
        …
      120  wipe off the grease
        0  wipe off the sauce
        0  wipe off the mice
    -----
    28048  wipe off the *

Measuring Model Quality
• The game isn’t to pound out fake sentences!
  ◦ Obviously, generated sentences get “better” as we increase the model order.
  ◦ More precisely: using ML estimators, higher order always gives better likelihood on train, but not on test.
• What we really want to know is:
  ◦ Will our model prefer good sentences to bad ones?
  ◦ Bad ≠ ungrammatical!
  ◦ Bad ≈ unlikely
  ◦ Bad = sentences that our acoustic model really likes but aren’t the correct answer

Measuring Model Quality
• Problem with “entropy”:
  ◦ 0.1 bits of improvement doesn’t sound so good.
• “Solution”: perplexity
  $\mathrm{perplexity}(X') = 2^{H(X')}$
  ◦ Interpretation: average branching factor in model
• Important notes:
  ◦ It’s easy to get bogus perplexities by having bogus probabilities that sum to more than one over their event spaces. 30% of you will do this on HW1.
  ◦ Even though our models require a stop step, averages are per actual word, not per derivation step.

Measuring Model Quality (Speech)
• Word Error Rate (WER):
  $\mathrm{WER} = \frac{\text{insertions} + \text{deletions} + \text{substitutions}}{\text{true sentence size}}$
  Correct answer:     Andy saw a part of the movie
  Recognizer output:  And he saw apart of the movie
  WER: 4/7 = 57%
• The “right” measure:
  ◦ Task-error driven
  ◦ For speech recognition
  ◦ For a specific recognizer!
• Common issue: intrinsic measures like perplexity are easier to use, but extrinsic ones are more credible.
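To make the per-word averaging concrete, here is a small sketch that computes perplexity as 2 raised to the per-word entropy, counting the STOP step in each sentence's probability but not in the word count, as the notes above require. The function names and the uniform toy model are hypothetical; `is_normalized` is one simple guard against the bogus probabilities the notes warn about.

```python
from typing import Callable, Dict, Iterable, List


def perplexity(test_sentences: Iterable[List[str]], log2prob: Callable[[List[str]], float]) -> float:
    """2 ** H, with H = -(1/N) * sum of log2 P(sentence) over the test set.

    log2prob(sent) must include the probability of the final STOP step, but N counts
    only actual words, so STOP contributes to the numerator and not to N.
    """
    total_log2 = 0.0
    num_words = 0
    for sent in test_sentences:
        total_log2 += log2prob(sent)
        num_words += len(sent)  # actual words only, not derivation steps
    if num_words == 0:
        raise ValueError("test set contains no words")
    return 2.0 ** (-total_log2 / num_words)


def is_normalized(dist: Dict[str, float], tol: float = 1e-9) -> bool:
    """Sanity check: probabilities over one event space should sum to one."""
    return abs(sum(dist.values()) - 1.0) <= tol


if __name__ == "__main__":
    def uniform_log2prob(sent: List[str]) -> float:
        # Hypothetical model: 7 words plus STOP, each with probability 1/8 (3 bits per step).
        return -3.0 * (len(sent) + 1)

    print(perplexity([["a", "b", "c"]], uniform_log2prob))  # 16.0
    dist = {w: 1 / 8 for w in ["a", "b", "c", "d", "e", "f", "g", "STOP"]}
    print(is_normalized(dist))  # True
```

Under this uniform model a three-word sentence costs 12 bits over four steps. Averaging per actual word gives a perplexity of 2^4 = 16; dividing by derivation steps instead would report 2^3 = 8, so the choice of denominator matters when comparing numbers.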

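Word error rate can be computed with a standard word-level Levenshtein (edit-distance) alignment between the reference and the recognizer output. The sketch below is one such implementation (the function name is hypothetical) and reproduces the 4/7 figure for the example above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(insertions + deletions + substitutions) / len(reference), via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("Andy saw a part of the movie", "And he saw apart of the movie")
    print(f"{wer:.2f}")  # 0.57
```

Because insertions count toward the numerator while the denominator is the reference length, WER can exceed 100% when the recognizer output is much longer than the reference.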