
Statistical NLP
Spring 2010

Lecture 2: Language Models

Dan Klein – UC Berkeley

Speech in a Slide

  • Frequency gives pitch; amplitude gives volume
  • Frequencies at each time slice processed into observation vectors
  • [Figure: waveform and spectrogram, with the frame sequence rendered as acoustic observation symbols … a12 a13 a12 a14 a14 …]


TheNoisy-ChannelModel

Wewanttopredictasentencegivenacoustics: Thenoisychannelapproach:

Acousticmodel:HMMsover wordpositionswithmixtures

  • fGaussiansasemissions

Languagemodel: Distributionsoversequences

  • fwords(sentences)

Acoustically Scored Hypotheses

      the station signs are in deep in english          14732
      the stations signs are in deep in english         14735
      the station signs are in deep into english        14739
      the station's signs are in deep in english        14740
      the station signs are in deep in the english      14741
      the station signs are indeed in english           14757
      the station's signs are indeed in english         14760
      the station signs are indians in english          14790
      the station signs are indian in english           14799
      the stations signs are indians in english         14807
      the stations signs are indians and english        14815
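The hypotheses above are scored by the acoustic model alone; in the noisy-channel setup the decoder also adds a language-model score and takes the argmax. A minimal rescoring sketch (the scoring functions and hypothesis list are illustrative placeholders, not part of any real recognizer):

```python
def rescore(hypotheses, acoustic_logprob, lm_logprob):
    """Pick the word sequence w maximizing log P(a | w) + log P(w)."""
    return max(hypotheses, key=lambda w: acoustic_logprob(w) + lm_logprob(w))

# With a language model in the mix, a hypothesis like
# "the station's signs are indeed in english" can win even though
# its acoustic-only score is slightly worse.
```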

ASR System Components

  • [Figure: language model, acoustic model, and decoder]

Translation: Codebreaking?

  "Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

  Warren Weaver (1955:18, quoting a letter he wrote in 1947)


MT Overview

MT System Components

Other Noisy-Channel Processes

  • Spelling correction
  • Handwriting recognition
  • OCR
  • More…

Probabilistic Language Models

  • Goal: assign useful probabilities P(x) to sentences x
      Input: many observations of training sentences x
      Output: system capable of computing P(x)
  • Probabilities should broadly indicate plausibility of sentences
      P(I saw a van) >> P(eyes awe of an)
      P(artichokes intimidate zippers) ≈ 0
      In principle, "plausible" depends on the domain, context, speaker…
  • One option: empirical distribution over training sentences?
      Problem: doesn't generalize (at all)
  • Two aspects of generalization
      Decomposition: break sentences into small pieces which can be recombined in new ways (conditional independence)
      Smoothing: allow for the possibility of unseen pieces


N-GramModelDecomposition

Chainrule:breaksentenceprobabilitydown Impracticaltoconditiononeverythingbefore

P(???|Turntopage134andlookatthepictureofthe)?

N-grammodels:assumeeachworddependsonlyona shortlinearhistory Example:
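A small sketch of this factorization, assuming a hypothetical cond_logprob(word, history) estimator (any smoothed n-gram estimate would do):

```python
def sentence_logprob(words, cond_logprob, n=3):
    """Chain rule with a truncated history: sum of log P(w_i | previous n-1 words),
    including a final STOP event."""
    padded = ["<s>"] * (n - 1) + list(words) + ["STOP"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])
        total += cond_logprob(padded[i], history)
    return total
```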

N-Gram Model Parameters

  • The parameters of an n-gram model:
      The actual conditional probability estimates; we'll call them θ
      Obvious estimate: relative frequency, e.g. P_ML(w | h) = c(h, w) / c(h)
  • General approach
      Take a training set X and a test set X'
      Compute an estimate θ from X
      Use it to assign probabilities to other sentences, such as those in X'

  Training Counts:
      198015222   the first
      194623024   the same
      168504105   the following
      158562063   the world
      …
       14112454   the door
      total: 23135851162   the *
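A sketch of the obvious (relative-frequency) estimate built from counts like those above; the corpus format and names are illustrative:

```python
from collections import Counter

def train_mle_bigram(sentences):
    """Maximum-likelihood bigram estimate: P(w | h) = c(h, w) / c(h)."""
    pair_counts, history_counts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["STOP"]
        for h, w in zip(tokens, tokens[1:]):
            pair_counts[h, w] += 1
            history_counts[h] += 1
    def prob(w, h):
        return pair_counts[h, w] / history_counts[h] if history_counts[h] else 0.0
    return prob
```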


Higher Order N-grams?

  the *:
      198015222   the first
      194623024   the same
      168504105   the following
      158562063   the world
      …
       14112454   the door
      total: 23135851162   the *

  close the *:
      197302   close the window
      191125   close the door
      152500   close the gap
      116451   close the thread
       87298   close the deal
      total: 3785230   close the *

  please close the *:
      3380   please close the door
      1601   please close the window
      1164   please close the new
      1159   please close the gate
       900   please close the browser
      total: 13951   please close the *

  • Please close the door
  • Please close the first window on the left

Unigram Models

  • Simplest case: unigrams
      P(w_1 w_2 … w_n) = ∏_i P(w_i)
  • Generative process: pick a word, pick a word, … until you pick STOP
  • As a graphical model: [figure: independent word nodes w_1, w_2, …, STOP]
  • Examples:
      [fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass.]
      [thrift, did, eighty, said, hard, 'm, july, bullish]
      [that, or, limited, the]
      []
      [after, any, on, consistently, hospital, lake, of, of, other, and, factors, raised, analyst, too, allowed, mexico, never, consider, fall, bungled, davison, that, obtain, price, lines, the, to, sass, the, the, further, board, a, details, machinists, the, companies, which, rivals, an, because, longer, oakes, percent, a, they, three, edward, it, currier, an, within, in, three, wrote, is, you, s., longer, institute, dentistry, pay, however, said, possible, to, rooms, hiding, eggs, approximate, financial, canada, the, so, workers, advancers, half, between, nasdaq]
      …
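A minimal sketch of this generative process; the vocabulary and probabilities are placeholders (a real model would estimate them from counts):

```python
import random

def generate_unigram(vocab, probs):
    """Repeatedly draw independent words from P(w) until STOP is drawn."""
    words = []
    while True:
        w = random.choices(vocab, weights=probs, k=1)[0]
        if w == "STOP":
            return words
        words.append(w)

# generate_unigram(["the", "of", "inflation", "STOP"], [0.5, 0.3, 0.1, 0.1])
```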

Bigram Models

  • Big problem with unigrams: P(the the the the) >> P(I like ice cream)!
  • Condition on the previous single word:
      P(w_1 w_2 … w_n) = ∏_i P(w_i | w_{i-1})
  • Obvious that this should help – in probabilistic terms, we're using weaker conditional independence assumptions (what's the cost?)
  • Any better?
      [texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
      [outside, new, car, parking, lot, of, the, agreement, reached]
      [although, common, shares, rose, forty, six, point, four, hundred, dollars, from, thirty, seconds, at, the, greatest, play, disingenuous, to, be, reset, annually, the, buy, out, of, american, brands, vying, for, mr., womack, currently, sharedata, incorporated, believe, chemical, prices, undoubtedly, will, be, as, much, is, scheduled, to, conscientious, teaching]
      [this, would, be, a, record, november]
Regular Languages?

  • N-gram models are (weighted) regular languages
  • Many linguistic arguments that language isn't regular:
      Long-distance effects: "The computer which I had just put into the machine room on the fifth floor ___." (crashed)
      Recursive structure
  • Why CAN we often get away with n-gram models?
  • PCFG LM (later):
      [This, quarter, 's, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
      [It, could, be, announced, sometime, .]
      [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]


More N-Gram Examples

Measuring Model Quality

  • The game isn't to pound out fake sentences!
      Obviously, generated sentences get "better" as we increase the model order
      More precisely: using ML estimators, higher order always gives better likelihood on train, but not on test
  • What we really want to know is:
      Will our model prefer good sentences to bad ones?
      Bad ≠ ungrammatical!
      Bad ≈ unlikely
      Bad = sentences that our acoustic model really likes but aren't the correct answer


Measuring Model Quality

  • The Shannon Game: how well can we predict the next word?
      When I eat pizza, I wipe off the ____
      Many children are allergic to ____
      I saw a ____
  • Unigrams are terrible at this game. (Why?)
  • "Entropy": per-word test log likelihood (misnamed)

  A predictive distribution for "wipe off the ____":
      grease   0.5
      sauce    0.4
      dust     0.05
      …
      mice     0.0001
      …
      the      1e-100

  Training counts:
      3516   wipe off the excess
      1034   wipe off the dust
       547   wipe off the sweat
       518   wipe off the mouthpiece
      …
       120   wipe off the grease
         0   wipe off the sauce
         0   wipe off the mice
      total: 28048   wipe off the *

Measuring Model Quality

  • Problem with "entropy": 0.1 bits of improvement doesn't sound so good
  • "Solution": perplexity, i.e. P(test data)^(−1/N), the exponentiated per-word cross-entropy
      Interpretation: average branching factor in the model
  • Important notes:
      It's easy to get bogus perplexities by having bogus probabilities that sum to more than one over their event spaces. 30% of you will do this on HW1.
      Even though our models require a stop step, averages are per actual word, not per derivation step.


Measuring Model Quality

  • Word Error Rate (WER)
      WER = (insertions + deletions + substitutions) / true sentence size
      Correct answer:     Andy saw a part of the movie
      Recognizer output:  And he saw apart of the movie
  • The "right" measure:
      Task error driven
      For speech recognition
      For a specific recognizer!
  • Common issue: intrinsic measures like perplexity are easier to use, but extrinsic ones are more credible

  • [Figure: WER (0.2–1.0) as a function of training set size (200,000–1,000,000 words) for unigram, bigram, and rule-based models]
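A sketch of the WER computation above via a standard word-level edit distance (the function name is mine, not a particular toolkit's API):

```python
def word_error_rate(reference, hypothesis):
    """(insertions + deletions + substitutions) / true sentence length,
    found with dynamic-programming edit distance over words."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution / match
                          d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1)                           # insertion
    return d[len(r)][len(h)] / len(r)

# word_error_rate("Andy saw a part of the movie",
#                 "And he saw apart of the movie")   # -> 4/7 ≈ 0.57
```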

Sparsity

  • Problems with n-gram models:
      New words appear all the time:
          Synaptitute
          132,701.03
          multidisciplinarization
      New bigrams: even more often
      Trigrams or more – still worse!

Zipf's Law

  • Types (words) vs. tokens (word occurrences)
  • Broadly: most word types are rare ones
  • Specifically:
      Rank word types by token frequency
      Frequency is inversely proportional to rank
  • Not special to language: randomly generated character strings have this property (try it!)
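A quick way to try that claim; the alphabet, text length, and output format are arbitrary choices for the experiment:

```python
import random
from collections import Counter

def random_text_rank_frequency(num_chars=500_000, alphabet="abcdefg ", seed=0):
    """Generate random characters (space included), split into 'words', and
    print rank, frequency, and rank*frequency for the most common types,
    to eyeball how closely frequency tracks 1/rank."""
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(num_chars))
    counts = Counter(w for w in text.split() if w)
    for rank, (word, freq) in enumerate(counts.most_common(20), start=1):
        print(f"{rank:>3}  {word:<10} {freq:>7}  {rank * freq}")
```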


Parameter Estimation

  • Maximum likelihood estimates won't get us very far
  • Need to smooth these estimates
  • General method (procedurally)
      Take your empirical counts
      Modify them in various ways to improve estimates
  • General method (mathematically)
      Often can give estimators a formal statistical interpretation
      … but not always
      Approaches that are mathematically obvious aren't always what works

      3516   wipe off the excess
      1034   wipe off the dust
       547   wipe off the sweat
       518   wipe off the mouthpiece
      …
       120   wipe off the grease
         0   wipe off the sauce
         0   wipe off the mice
      total: 28048   wipe off the *

Smoothing

  • We often want to make estimates from sparse statistics:
      P(w | denied the):
          3   allegations
          2   reports
          1   claims
          1   request
          7   total
  • Smoothing flattens spiky distributions so they generalize better:
      P(w | denied the), smoothed:
          2.5   allegations
          1.5   reports
          0.5   claims
          0.5   request
          2     other
          7     total
  • Very important all over NLP, but easy to do badly!
  • We'll illustrate with bigrams today (h = previous word, could be anything).


Puzzle: Unknown Words

  • Imagine we look at 1M words of text
      We'll see many thousands of word types
      Some will be frequent, others rare
      Could turn into an empirical P(w)
  • Questions:
      What fraction of the next 1M words will be new words?
      How many total word types exist?

Language Models

  • In general, we want to place a distribution over sentences
  • Basic/classic solution: n-gram models
      Question: how to estimate conditional probabilities?
      Problems:
          Known words in unseen contexts
          Entirely unknown words
              Many systems ignore this – why?
              Often just lump all new words into a single UNK type
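A minimal sketch of the "single UNK type" trick; the count threshold and names are arbitrary assumptions:

```python
from collections import Counter

def build_vocab_and_remap(train_sentences, min_count=2, unk="<UNK>"):
    """Keep word types seen at least min_count times; map everything else to UNK
    so unseen test words still receive (shared) probability mass."""
    counts = Counter(w for sent in train_sentences for w in sent)
    vocab = {w for w, c in counts.items() if c >= min_count}
    def remap(sentence):
        return [w if w in vocab else unk for w in sentence]
    return vocab, remap
```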


Smoothing: Add-One, Etc.

  • Classic solution: add counts (Laplace smoothing / Dirichlet prior)
      P_add-λ(w | w') = (c(w', w) + λ) / (c(w') + λV)
      Add-one smoothing (λ = 1) especially often talked about
  • For a bigram distribution, can add counts shaped like the unigram distribution
  • Can consider hierarchical formulations: the trigram is recursively centered on the smoothed bigram estimate, etc. [MacKay and Peto, 94]
  • Can be derived from Dirichlet/multinomial conjugacy: the prior shape shows up as pseudo-counts
  • Problem: works quite poorly!
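A sketch of the add-λ estimate for bigrams (λ = 1 gives classic add-one); the count dictionaries are assumed to come from something like the earlier MLE training sketch:

```python
def add_lambda_bigram(pair_counts, history_counts, vocab_size, lam=1.0):
    """P(w | h) = (c(h, w) + lam) / (c(h) + lam * V): every event in the
    vocabulary gets lam pseudo-counts."""
    def prob(w, h):
        return (pair_counts.get((h, w), 0) + lam) / (history_counts.get(h, 0) + lam * vocab_size)
    return prob
```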

LinearInterpolation

Problem:issupportedbyfewcounts Classicsolution:mixturesofrelated,denserhistories,e.g.: ThemixtureapproachtendstoworkbetterthantheDirichlet priorapproachforseveralreasons

Canflexiblyincludemultipleback-offcontexts,notjustachain Oftenmultipleweights,dependingonbucketedcounts GoodwaysoflearningthemixtureweightswithEM(later) Notentirelyclearwhyitworkssomuchbetter

Allthedetailsyoucouldeverwant:[ChenandGoodman,98]
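A sketch of the interpolation with fixed weights; in practice the λs are usually bucketed by count and tuned on held-out data (e.g. with EM), and the component estimators are whatever trigram/bigram/unigram models you already have:

```python
def interpolated_trigram(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Mix the sparse trigram estimate with denser bigram and unigram back-offs.
    The weights must be non-negative and sum to 1 to keep a proper distribution."""
    l3, l2, l1 = lambdas
    def prob(w, u, v):            # history is (u, v) = the two previous words
        return l3 * p_tri(w, u, v) + l2 * p_bi(w, v) + l1 * p_uni(w)
    return prob
```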


Held-Out Data

  • Important tool for optimizing how models generalize:
      Set a small number of hyperparameters that control the degree of smoothing by maximizing the (log-)likelihood of held-out data
      Can use any optimization technique (line search or EM usually easiest)
  • Examples: [Figure: corpus split into Training Data | Held-Out Data | Test Data, with smoothing hyperparameters such as k tuned on the held-out portion]

Held-Out Reweighting

  • What's wrong with add-d smoothing?
  • Let's look at some real bigram counts [Church and Gale 91]:

      Count in 22M words    Actual c* (next 22M)    Add-one's c*    Add-0.0000027's c*
      1                     0.448                   2/7e-10         ~1
      2                     1.25                    3/7e-10         ~2
      3                     2.24                    4/7e-10         ~3
      4                     3.23                    5/7e-10         ~4
      5                     4.21                    6/7e-10         ~5
      Mass on new           9.2%                    ~100%           9.2%
      Ratio of 2/1          2.8                     1.5             ~2

  • Big things to notice:
      Add-one vastly overestimates the fraction of new bigrams
      Add-0.0000027 vastly underestimates the ratio 2*/1*
  • One solution: use held-out data to predict the map from c to c*


Good-Turing Reweighting I

  • We'd like to not need held-out data (why?)
  • Idea: leave-one-out validation
      N_k: number of types which occur k times in the entire corpus
      Take each of the c tokens out of the corpus in turn
      c "training" sets of size c−1, "held-out" sets of size 1
      How many held-out tokens are unseen in training?  N_1
      How many held-out tokens are seen k times in training?  (k+1) N_{k+1}
  • There are N_k words with training count k
      Each should occur with expected count (k+1) N_{k+1} / N_k
      Each should occur with probability (k+1) N_{k+1} / (c N_k)
  • [Figure: "Training" vs. "Held-Out" columns pairing each count class with the mass it predicts: N_1/N_0, 2N_2/N_1, 3N_3/N_2, …, 3511N_3511/N_3510, 4417N_4417/N_4416, …]
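A sketch of the raw Good-Turing count adjustment; no power-law fit is applied, so it misbehaves exactly where the next slide warns (jumpy or zero N_{k+1}):

```python
from collections import Counter

def good_turing_adjusted(counts):
    """Map each observed count k to k* = (k + 1) * N_{k+1} / N_k,
    where N_k is the number of types occurring exactly k times."""
    n = Counter(counts.values())                       # count-of-counts N_k
    return {item: (k + 1) * n.get(k + 1, 0) / n[k]     # 0 whenever N_{k+1} is empty
            for item, k in counts.items()}
```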

Good-Turing Reweighting II

  • Problem: what about "the"? (say k = 4417)
      For small k, N_k > N_{k+1}
      For large k, the counts are too jumpy, and zeros wreck the estimates
  • Simple Good-Turing [Gale and Sampson]: replace the empirical N_k with a best-fit power law once the count-of-count statistics get unreliable
  • [Figure: the same leave-one-out columns, with the jumpy empirical N_k values replaced by smoothed fits]


Good-Turing Reweighting III

  • Hypothesis: counts of k should be k* = (k+1) N_{k+1} / N_k

      Count in 22M words    Actual c* (next 22M)    GT's c*
      1                     0.448                   0.446
      2                     1.25                    1.26
      3                     2.24                    2.24
      4                     3.23                    3.24
      Mass on new           9.2%                    9.2%

  • Katz smoothing:
      Use GT discounted counts (roughly – Katz left large counts alone)
      Whatever mass is left goes to the empirical unigram

Kneser-Ney: Discounting

  • Kneser-Ney smoothing: a very successful estimator using two ideas
  • Idea 1: observed n-grams occur more in training than they will later:

      Count in 22M words    Future c* (next 22M)
      1                     0.448
      2                     1.25
      3                     2.24
      4                     3.23

  • Absolute discounting:
      No need to actually have held-out data; just subtract 0.75 (or some d)
      Maybe have a separate value of d for very low counts
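A sketch of absolute discounting for bigrams, backing off to some lower-order distribution (the empirical unigram here; Kneser-Ney, below, would use continuation probabilities instead). The value d = 0.75 follows the slide; the names are mine:

```python
def absolute_discount_bigram(pair_counts, history_counts, backoff_prob, d=0.75):
    """Subtract d from every seen bigram count; the freed-up mass,
    d * (distinct continuations of h) / c(h), goes to the back-off distribution."""
    continuations = {}
    for (h, _w) in pair_counts:
        continuations[h] = continuations.get(h, 0) + 1   # bigram types starting with h
    def prob(w, h):
        c_h = history_counts.get(h, 0)
        if c_h == 0:
            return backoff_prob(w)
        seen = max(pair_counts.get((h, w), 0) - d, 0) / c_h
        reserved = d * continuations.get(h, 0) / c_h
        return seen + reserved * backoff_prob(w)
    return prob
```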


Kneser-Ney: Continuation

  • Idea 2: type-based fertility
      Shannon game: There was an unexpected ____?
          delay?  Francisco?
      "Francisco" is more common than "delay"
      … but "Francisco" always follows "San"
      … so it's less "fertile"
  • Solution: type-continuation probabilities
      In the back-off model, we don't want the probability of w as a unigram
      Instead, we want the probability that w appears as a novel continuation
      For each word, count the number of bigram types it completes

Kneser-Ney

  • Kneser-Ney smoothing combines these two ideas:
      Absolute discounting
      Lower-order continuation probabilities
  • KN smoothing has repeatedly proven effective
  • [Teh, 2006] shows KN smoothing is a kind of approximate inference in a hierarchical Pitman-Yor process (and better approximations are superior to basic KN)
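A sketch of the continuation probability used for the lower-order model (it counts bigram types, not tokens; the function name is mine):

```python
from collections import defaultdict

def continuation_prob(pair_counts):
    """P_cont(w) = |{h : c(h, w) > 0}| / (number of distinct bigram types):
    how many different contexts w completes.  'Francisco' scores low because
    it essentially only follows 'San', however frequent the token is."""
    left_contexts = defaultdict(set)
    for (h, w) in pair_counts:
        left_contexts[w].add(h)
    total_bigram_types = len(pair_counts)
    return lambda w: len(left_contexts[w]) / total_bigram_types
```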


What Actually Works?

  • Trigrams and beyond:
      Unigrams, bigrams: generally useless
      Trigrams: much better (when there's enough data)
      4-, 5-grams: really useful in MT, but not so much for speech
  • Discounting:
      Absolute discounting, Good-Turing, held-out estimation, Witten-Bell
  • Context counting:
      Kneser-Ney construction of lower-order models
  • See the [Chen+Goodman] reading for tons of graphs!
  • [Graphs from Joshua Goodman]

Data >> Method?

  • Having more data is better…
  • … but so is using a better estimator
  • Another issue: N > 3 has huge costs in speech recognizers
  • [Figure: test entropy (5.5–10) vs. n-gram order (1–20) for Katz and Kneser-Ney smoothing, at training sizes of 100,000, 1,000,000, 10,000,000, and all available words]


Tons of Data?

Beyond N-Gram LMs

  • Lots of ideas we won't have time to discuss:
      Caching models: recent words more likely to appear again
      Trigger models: recent words trigger other words
      Topic models
  • A few recent ideas:
      Syntactic models: use tree models to capture long-distance syntactic effects [Chelba and Jelinek, 98]
      Discriminative models: set n-gram weights to improve final task accuracy rather than fit training set density [Roark, 05, for ASR; Liang et al., 06, for MT]
      Structural zeros: some n-grams are syntactically forbidden, keep estimates at zero [Mohri and Roark, 06]
      Bayesian document and IR models [Daumé 06]


Overview

  • So far: language models give P(s)
      Help model fluency for various noisy-channel processes (MT, ASR, etc.)
      N-gram models don't represent any deep variables involved in language structure or meaning
      Usually we want to know something about the input other than how likely it is (syntax, semantics, topic, etc.)
  • Next: Naïve-Bayes models
      We introduce a single new global variable
      Still a very simplistic model family
      Lets us model hidden properties of text, but only very non-local ones…
      In particular, we can only model properties which are largely invariant to word order (like topic)