Exploiting Syntactic Structure for Language Modeling

Ciprian Chelba, Frederick Jelinek



  1. Hierarchical Structure in Natural Language
  [Figure: lexicalized parse tree for "the contract ended with a loss of 7 cents", with constituents ended_S, ended_VP, with_PP, loss_NP, of_PP, cents_NP, contract_NP above the tagged words the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS]
  • Words are hierarchically organized in syntactic constituents (tree structure)
  • Part-of-Speech (POS) and Non-Terminal (NT) tags identify the type of constituent
  • Lexicalized annotation of intermediate nodes in the tree
  Identifying the syntactic structure: parsing
  ✔ Automatic parsing of natural language text is an area of active research
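A minimal sketch of how such a lexicalized tree can be represented, built for the "of 7 cents" fragment of the example above; the class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Constituent:
    """A node in a lexicalized parse tree: a constituent label plus the head word
    percolated up from its children; leaves carry the word itself as the head."""
    label: str            # NT tag (NP, PP, ...) or POS tag (IN, CD, NNS, ...)
    head: str             # lexical head of the constituent
    children: tuple = ()  # empty for leaves

# Leaves and lexicalized phrases for the "of 7 cents" fragment of the example tree:
of_IN     = Constituent("IN",  "of")
seven_CD  = Constituent("CD",  "7")
cents_NNS = Constituent("NNS", "cents")
cents_NP  = Constituent("NP",  "cents", (seven_CD, cents_NNS))  # cents_NP
of_PP     = Constituent("PP",  "of",    (of_IN, cents_NP))      # of_PP
```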

  2. Exploiting Syntactic Structure for Language Modeling
  Ciprian Chelba, Frederick Jelinek
  • Hierarchical Structure in Natural Language
  • Speech Recognition: Statistical Approach
  • Basic Language Modeling:
    – Measures for Language Model Quality
    – Current Approaches to Language Modeling
  • A Structured Language Model:
    – Language Model Requirements
    – Word and Structure Generation
    – Research Issues
    – Model Performance: Perplexity results on UPenn-Treebank
    – Model Performance: Perplexity and WER results on WSJ/SWB/BN
  • Any Future for the Structured Language Model?
    – Richer Syntactic Dependencies
    – Syntactic Structure Portability
    – Information Extraction from Text

  3. Speech Recognition: Statistical Approach
  [Figure: source-channel diagram. The speaker's mind produces the word string W; the speech producer and acoustic processor (the acoustic channel) turn it into the acoustic signal A; the linguistic decoder in the speech recognizer outputs the hypothesis Ŵ.]
  $\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W) \cdot P(W)$
  • $P(A \mid W)$: acoustic model (channel probability)
  • $P(W)$: language model (source probability)
  • search for the most likely word string $\hat{W}$
  ✔ Due to the large vocabulary size (tens of thousands of words), an exhaustive search is intractable.
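The decision rule above can be read as rescoring candidate word strings by acoustic and language model scores. A minimal sketch, assuming toy log-probability tables; all names and values below are hypothetical, not from the slides:

```python
# Hypothetical log-probability tables for one observed acoustic signal A:
# log P(A | W) from the acoustic model and log P(W) from the language model.
log_p_acoustic = {"the contract ended": -12.1, "the contract tended": -11.8}
log_p_language = {"the contract ended": -7.3, "the contract tended": -10.9}

def decode(hypotheses):
    """Return argmax_W [ log P(A | W) + log P(W) ], i.e. argmax_W P(A | W) * P(W)."""
    return max(hypotheses, key=lambda w: log_p_acoustic[w] + log_p_language[w])

print(decode(log_p_acoustic))  # -> "the contract ended"
```

In a real recognizer the maximization runs over an enormous hypothesis space, which is why the slide notes that exhaustive search is intractable and approximate search is used instead.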

  4. Basic Language Modeling
  Estimate the source probability $P(W)$, $W = w_1, w_2, \ldots, w_n$, from a training corpus: millions of words of text chosen for its similarity to the expected utterances.
  Parametric conditional models:
  $P_\theta(w_i \mid w_1 \ldots w_{i-1}), \quad \theta \in \Theta, \; w_i \in V$
  • $\Theta$: parameter space
  • $V$: source alphabet (vocabulary)
  ✔ Source Modeling Problem

  5. Measures for Language Model Quality
  Word Error Rate (WER)
  TRN: UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER *** HUGE AREAS
  HYP: ** UPSTATE NEW YORK SOMEWHERE UH ALL  ALL  THE HUGE AREAS
        1    0     0    0      0      0   1    1   1    0     0
  4 errors (1 deletion, 2 substitutions, 1 insertion) per 10 words in the transcription; WER = 40%
  Evaluating WER reduction is computationally expensive.
  Perplexity (PPL)
  $PPL(M) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \ln P_M(w_i \mid w_1 \ldots w_{i-1})\right)$
  ✔ different from maximum likelihood estimation: the test data is not seen during model estimation
  ✔ good models are smooth: $P_M(w_i \mid w_1 \ldots w_{i-1}) > \epsilon$
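A minimal sketch of the perplexity computation defined above, assuming a hypothetical model_logprob(word, history) callable that returns ln P_M(w_i | w_1 ... w_{i-1}):

```python
import math

def perplexity(model_logprob, sentences):
    """PPL(M) = exp(-(1/N) * sum_i ln P_M(w_i | w_1..w_{i-1})) over all test words."""
    total_logprob, n_words = 0.0, 0
    for words in sentences:
        for i, w in enumerate(words):
            total_logprob += model_logprob(w, words[:i])  # ln P_M(w_i | history)
            n_words += 1
    return math.exp(-total_logprob / n_words)

# Example with a (hypothetical) uniform model over a 10,000-word vocabulary:
uniform = lambda w, history: math.log(1.0 / 10000)
print(perplexity(uniform, [["the", "contract", "ended"]]))  # ≈ 10000 (= vocabulary size)
```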

  6. Current Approaches to Language Modeling
  Assume a Markov source of order $n$; equivalence classification of a given context:
  $[w_1 \ldots w_{i-1}] = w_{i-n+1} \ldots w_{i-1} = h$
  Data sparseness (3-gram model $P(w_i \mid w_{i-2}, w_{i-1})$):
  • approx. 70% of the trigrams in the training data have been seen only once
  • the rate of new (unseen) trigrams in test data, relative to those observed in a training corpus of 38 million words, is 32% for a 20,000-word vocabulary
  Smoothing: recursive linear interpolation among relative frequency estimates of different orders $f_k(\cdot), \; k = 0 \ldots n$, using a recursive mixing scheme:
  $P_n(u \mid z_1, \ldots, z_n) = \lambda(z_1, \ldots, z_n) \cdot P_{n-1}(u \mid z_1, \ldots, z_{n-1}) + (1 - \lambda(z_1, \ldots, z_n)) \cdot f_n(u \mid z_1, \ldots, z_n)$
  $P_{-1}(u) = \mathrm{uniform}(V)$
  Parameters: $\Theta = \{\lambda(z_1, \ldots, z_n); \; \mathrm{count}(u \mid z_1, \ldots, z_n), \; \forall (u \mid z_1, \ldots, z_n) \in T\}$
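A minimal sketch of the recursive interpolation above for a trigram model. The relative-frequency tables and weights are hypothetical placeholders, and a single weight per order is used here instead of the per-context weights λ(z_1 ... z_k) the slide defines:

```python
# Hypothetical relative-frequency tables f[k][(context, word)] for orders k = 0..2
# and per-order interpolation weights lam[k] (a simplification of the per-context
# weights lambda(z_1..z_k), which are normally estimated on held-out data).
VOCAB_SIZE = 10000
f = [dict(), dict(), dict()]   # f[k]: relative frequencies of order k
lam = [0.3, 0.5, 0.7]          # lam[k]: weight given to the lower-order model P_{k-1}

def p_interp(k, word, context):
    """P_k(u | z_1..z_k) = lam * P_{k-1}(u | z_1..z_{k-1}) + (1 - lam) * f_k(u | z_1..z_k)."""
    if k < 0:
        return 1.0 / VOCAB_SIZE                 # P_{-1}(u): uniform over the vocabulary
    rel_freq = f[k].get((context, word), 0.0)   # f_k relative frequency estimate
    return lam[k] * p_interp(k - 1, word, context[1:]) + (1.0 - lam[k]) * rel_freq

# Trigram probability P(w_i | w_{i-2}, w_{i-1}):
prob = p_interp(2, "ended", ("the", "contract"))
```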

  7. Exploiting Syntactic Structure for Language Modeling
  • Hierarchical Structure in Natural Language
  • Speech Recognition: Statistical Approach
  • Basic Language Modeling
  ☞ A Structured Language Model:
    – Language Model Requirements
    – Word and Structure Generation
    – Research Issues:
      • Model Component Parameterization
      • Pruning Method
      • Word Level Probability Assignment
      • Model Statistics Reestimation
    – Model Performance: Perplexity results on UPenn-Treebank
    – Model Performance: Perplexity and WER results on WSJ/SWB/BN

  8. A Structured Language Model
  • Generalize trigram modeling (local) by taking advantage of sentence structure (influence by more distant past)
  • Use exposed heads $h$ (words $w$ and their corresponding non-terminal tags $l$) for prediction:
  $P(w_i \mid T_i) = P(w_i \mid h_{-2}(T_i), h_{-1}(T_i))$
  where $T_i$ is the partial hidden structure, with head assignment, provided for the word prefix $W_i$.
  [Figure: partial parse of "the contract ended with a loss of 7 cents" with exposed heads ended_VP', with_PP, loss_NP, of_PP, cents_NP above the tagged words the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS]
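A minimal sketch of prediction from exposed heads rather than from the two preceding surface words. The probability table and its values are hypothetical, purely for illustration:

```python
# Hypothetical table P(w | h_-2, h_-1), keyed on the two most recently exposed heads;
# each head is a (head word, tag) pair. Values are illustrative only.
p_word_given_heads = {
    (("ended", "VP'"), ("with", "IN")): {"a": 0.20, "the": 0.15},
}

def predict_word_prob(exposed_heads, word, table=p_word_given_heads):
    """Return P(word | h_-2(T_i), h_-1(T_i)) for the current partial parse T_i."""
    h_minus_2, h_minus_1 = exposed_heads[-2], exposed_heads[-1]
    return table.get((h_minus_2, h_minus_1), {}).get(word, 0.0)

# The next word is predicted from the exposed heads of the partial parse (e.g. "ended"
# and "with" in the running example) rather than from the last two surface words alone.
```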

  9. Language Model Requirements
  • The model must operate left-to-right: $P(w_i \mid w_1 \ldots w_{i-1})$
  • In hypothesizing hidden structure, the model can use only the word prefix $W_i$, i.e., not the complete sentence $w_0, \ldots, w_i, \ldots, w_{n+1}$ as all conventional parsers do!
  • Model complexity must be limited; even the trigram model faces critical data sparseness problems
  • The model will assign joint probability to sequences of words and hidden parse structure: $P(T_i, W_i)$

  10.–13. [Figure animation: successive stages of building the partial parse for "the contract ended with a loss of 7 cents", with exposed heads ended_VP', with_PP, loss_NP, of_PP, cents_NP, contract_NP above the tagged words the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS]
  Model action sequence: ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP'; null; ...

  14. [Figure: the three model components operating on the same example: PREDICTOR (predict word), TAGGER (tag word), PARSER (adjoin_{left,right}, null)]
  Model action sequence: ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP'; null; ...

  15. Word and Structure Generation
  $P(T, W) = \prod_{i=1}^{n+1} \underbrace{P(w_i \mid h_{-2}, h_{-1})}_{\text{predictor}} \; \underbrace{P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag})}_{\text{tagger}} \; \underbrace{P(T_i \mid w_i, g_i, T_{i-1})}_{\text{parser}}$
  • The predictor generates the next word $w_i$ with probability $P(w_i = v \mid h_{-2}, h_{-1})$
  • The tagger attaches tag $g_i$ to the most recently generated word $w_i$ with probability $P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag})$
  • The parser builds the partial parse $T_i$ from $T_{i-1}$, $w_i$, and $g_i$ in a series of moves ending with null, where a parser move $a$ is made with probability $P(a \mid h_{-2}, h_{-1})$; $a \in \{(\text{adjoin-left}, \text{NTtag}), (\text{adjoin-right}, \text{NTtag}), \text{null}\}$
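A minimal sketch of evaluating the joint probability of one derivation under the three factors above. The per-word record structure and the component callables are hypothetical; each is assumed to return a probability:

```python
import math

def derivation_logprob(derivation, predictor, tagger, parser):
    """log P(T, W) summed over the per-word predictor, tagger and parser actions.
    `derivation` is a list of hypothetical per-word records carrying the word, its tag,
    the exposed heads (h_-2, h_-1) at prediction time, and the subsequent parser moves,
    each stored together with the heads exposed when that move was made."""
    logp = 0.0
    for step in derivation:
        logp += math.log(predictor(step.word, step.heads))         # P(w_i | h_-2, h_-1)
        logp += math.log(tagger(step.tag, step.word, step.heads))  # P(g_i | w_i, head tags)
        for move, heads in step.parser_moves:                      # sequence ends with null
            logp += math.log(parser(move, heads))                  # P(a | h_-2, h_-1)
    return logp
```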

  16. Research Issues
  • Model component parameterization: equivalence classifications for the model components $P(w_i = v \mid h_{-2}, h_{-1})$, $P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag})$, $P(a \mid h_{-2}, h_{-1})$
  • Huge number of hidden parses: need to prune them by discarding the unlikely ones
  • Word level probability assignment: calculate $P(w_i \mid w_1 \ldots w_{i-1})$ (a sketch follows below)
  • Model statistics estimation: unsupervised algorithm for maximizing $P(W)$ (minimizing perplexity)
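A minimal sketch of one way to assign the word-level probability: interpolate the head-based predictions of the surviving partial parses, weighting each parse by its normalized joint probability. The entry structure and function names are hypothetical:

```python
import math

def next_word_prob(word, stack_entries, predictor):
    """P(w_{k+1} | W_k) ≈ sum over surviving partial parses T_k of
       P(w_{k+1} | h_-2(T_k), h_-1(T_k)) * rho(W_k, T_k),
       where rho normalizes each parse's joint probability P(W_k, T_k)
       over the parses kept after pruning."""
    total_joint = sum(math.exp(e.log_joint) for e in stack_entries)
    prob = 0.0
    for entry in stack_entries:
        rho = math.exp(entry.log_joint) / total_joint       # parse weight
        prob += rho * predictor(word, entry.exposed_heads)  # P(w | h_-2, h_-1)
    return prob
```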

  17. Pruning Method
  The number of parses $T_k$ for a given word prefix $W_k$ is $|\{T_k\}| \sim O(2^k)$; prune most parses without discarding the most likely ones for a given sentence.
  Synchronous Multi-Stack Pruning Algorithm
  • the hypotheses are ranked according to $\ln P(W_k, T_k)$
  • each stack contains partial parses constructed by the same number of parser operations
  The width of the pruning is controlled by:
  • maximum number of stack entries
  • log-probability threshold
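A minimal sketch of applying the two pruning controls to one stack of partial-parse hypotheses; the entry structure and names are hypothetical:

```python
def prune_stack(entries, max_entries, logprob_threshold):
    """Keep at most `max_entries` hypotheses, ranked by ln P(W_k, T_k), and drop any
    hypothesis whose log probability falls more than `logprob_threshold` below the best."""
    ranked = sorted(entries, key=lambda e: e.log_joint, reverse=True)
    if not ranked:
        return ranked
    best = ranked[0].log_joint
    kept = [e for e in ranked if e.log_joint >= best - logprob_threshold]
    return kept[:max_entries]

# One such stack is maintained per number of parser operations performed, and all
# stacks are pruned synchronously as each new word is processed.
```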
