Natural Language Processing Anoop Sarkar - PowerPoint PPT Presentation

SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University October 20, 2017 0

Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University Part 1: Generative Models for Word Alignment 1

Statistical Machine Translation Generative Model of Word Alignment Word Alignments: IBM Model 3 Word Alignments: IBM Model 1 Finding the best alignment: IBM Model 1 Learning Parameters: IBM Model 1 IBM Model 2 Back to IBM Model 3 2

Statistical Machine Translation Noisy Channel Model e ∗ = arg max Pr( e ) · Pr( f | e ) e � �� Language Model Alignment Model 3

Alignment Task e Program Pr( e | f ) f Training Data ◮ Alignment Model : learn a mapping between f and e . Training data: lots of translation pairs between f and e . 4

Statistical Machine Translation The IBM Models ◮ The first statistical machine translation models were developed at IBM Research (Yorktown Heights, NY) in the 1980s ◮ The models were published in 1993: Brown et. al. The Mathematics of Statistical Machine Translation. Computational Linguistics . 1993. http://aclweb.org/anthology/J/J93/J93-2003.pdf ◮ These models are the basic SMT models, called: IBM Model 1, IBM Model 2, IBM Model 3, IBM Model 4, IBM Model 5 as they were called in the 1993 paper. ◮ We use e and f in the equations in honor of their system which translated from French to English. Trained on the Canadian Hansards (Parliament Proceedings) 5

Generative Model of Word Alignment ◮ English e : Mary did not slap the green witch ◮ “French” f : Maria no daba una botefada a la bruja verde ◮ Alignment a : { 1 , 3 , 4 , 4 , 4 , 5 , 5 , 7 , 6 } e.g. ( f 8 , e a 8 ) = ( f 8 , e 7 ) = (bruja, witch) Visualizing alignment a Mary did not slap the green witch Maria no daba una botefada a la bruja verde 7

Generative Model of Word Alignment Data Set ◮ Data set D of N sentences: D = { ( f (1) , e (1) ) , . . . , ( f ( N ) , e ( N ) ) } ◮ French f : ( f 1 , f 2 , . . . , f I ) ◮ English e : ( e 1 , e 2 , . . . , e J ) ◮ Alignment a : ( a 1 , a 2 , . . . , a I ) ◮ length( f ) = length( a ) = I 8

Word Alignments: IBM Model 3 Generative “story” for P ( f , a | e ) Mary did not slap the green witch Mary not slap slap slap the the green witch (fertility) Maria no daba una botefada a la verde bruja (translate) Maria no daba una botefada a la bruja verde (reorder) 11

Word Alignments: IBM Model 3 Sentence pair with alignment a = (4 , 3 , 1 , 2) 1 2 3 4 the house is small 1 2 3 4 klein ist das Haus If we know the parameter values we can easily compute the probability of this aligned sentence pair. Pr( f , a | e ) = n (1 | the ) × t ( das | the ) × d (3 | 1 , 4 , 4) × n (1 | house ) × t ( Haus | house ) × d (4 | 2 , 4 , 4) × n (1 | is ) × t ( ist | is ) × d (2 | 3 , 4 , 4) × n (1 | small ) × t ( klein | small ) × d (1 | 4 , 4 , 4) 14

Word Alignments: IBM Model 3 1 2 3 4 1 2 3 4 the house is small the building is small 1 2 3 4 1 2 3 4 klein ist das Haus das Haus ist klein 1 2 3 5 1 2 3 4 4 very is is the home small the house small 1 2 3 4 1 2 3 4 5 das Haus ist klitzeklein das Haus ist ja klein Parameter Estimation ◮ What is n (1 | very ) = ? and n (0 | very ) = ? ◮ What is t ( Haus | house ) = ? and t ( klein | small ) = ? ◮ What is d (1 | 4 , 4 , 4) = ? and d (1 | 1 , 4 , 4) = ? 15

Word Alignments: IBM Model 3 1 2 4 1 2 4 3 3 the house is small the building is small 1 2 3 4 1 2 3 4 klein ist das Haus das Haus ist klein 1 2 3 5 1 2 3 4 4 very the home is small the house is small 1 2 3 4 1 2 3 4 5 ist ist ja das Haus klitzeklein das Haus klein Parameter Estimation: Sum over all alignments I � � � Pr( f , a | e ) = n ( φ a i | e a i ) × t ( f i | e a i ) × d ( i | a i , I , J ) a a i =1 16

Word Alignments: IBM Model 3 Summary ◮ If we know the parameter values we can easily compute the probability Pr( a | f , e ) given an aligned sentence pair ◮ If we are given a corpus of sentence pairs with alignments we can easily learn the parameter values by using relative frequencies. ◮ If we do not know the alignments then perhaps we can produce all possible alignments each with a certain probability? IBM Model 3 is too hard: Let us try learning only t ( f i | e a i ) I � � � Pr( f , a | e ) = n ( φ a i | e a i ) × t ( f i | e a i ) × d ( i | a i , I , J ) a a i =1 17

Word Alignments: IBM Model 1 Generative “story” for Model 1 the house is small das Haus ist klein (translate) I � Pr( f , a | e ) = t ( f i | e a i ) i =1 20

Finding the best word alignment: IBM Model 1 Compute the arg max word alignment ˆ a = arg max Pr( a | e , f ) a ◮ For each f i in ( f 1 , . . . , f I ) build a = ( ˆ a 1 , . . . , ˆ a I ) a i = arg max ˆ t ( f i | e a i ) a i Many to one alignment ✓ One to many alignment ✗ 1 2 3 4 1 2 3 4 the house is small the house is small 1 2 4 1 2 4 3 3 das Haus ist klein das Haus ist klein 22

Learning parameters [from P.Koehn SMT book slides] ◮ We would like to estimate the lexical translation probabilities t ( e | f ) from a parallel corpus ◮ ... but we do not have the alignments ◮ Chicken and egg problem ◮ if we had the alignments , → we could estimate the parameters of our generative model ◮ if we had the parameters , → we could estimate the alignments 24

EM Algorithm [from P.Koehn SMT book slides] ◮ Incomplete data ◮ if we had complete data , we could estimate model ◮ if we had model , we could fill in the gaps in the data ◮ Expectation Maximization (EM) in a nutshell 1. initialize model parameters (e.g. uniform) 2. assign probabilities to the missing data 3. estimate model parameters from completed data 4. iterate steps 2–3 until convergence 25

EM Algorithm [from P.Koehn SMT book slides] ... la maison ... la maison blue ... la fleur ... ... the house ... the blue house ... the flower ... ◮ Initial step: all alignments equally likely ◮ Model learns that, e.g., la is often aligned with the 26

EM Algorithm [from P.Koehn SMT book slides] ... la maison ... la maison blue ... la fleur ... ... the house ... the blue house ... the flower ... ◮ After one iteration ◮ Alignments, e.g., between la and the are more likely 27

EM Algorithm [from P.Koehn SMT book slides] ... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ... ◮ After another iteration ◮ It becomes apparent that alignments, e.g., between fleur and flower are more likely (pigeon hole principle) 28

EM Algorithm [from P.Koehn SMT book slides] ... la maison ... la maison bleu ... la fleur ... ... the house ... the blue house ... the flower ... ◮ Convergence ◮ Inherent hidden structure revealed by EM 29

Natural Language Processing Anoop Sarkar - PowerPoint PPT Presentation

SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University October 20, 2017 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University Part 1:

Natural Language Processing: Part II Overview of Natural Language Processing (L90): ACS Lecture

Natural Language Processing: Part II Overview of Natural Language Processing (L90): ACS Lecture

Natural Language Processing: Part II Overview of Natural Language Processing (L90): ACS Lecture

Natural Language Processing: Part II Overview of Natural Language Processing (L90): ACS Paula

Natural Language Processing: Part II Overview of Natural Language Processing (L90): ACS Lecture

Information Extraction Industrial Natural Language Processing Industrial Natural Language

Natural Language Processing 1 Lecture 11: Language generation and summarisation Katia Shutova

Natural Language Processing 1 Lecture 10: Language generation and summarisation Katia Shutova

Natural Language Processing 1 Lecture 8: Compositional semantics and discourse processing Katia

Natural Language Processing Fall 2018 Frank Ferraro Natural language processing ITE 358

Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2019 Natural Language

MIA - Master on Artificial Intelligence Advanced Natural Language Processing Advanced Natural

Advanced Natural Language Processing: What is Natural Language Processing (NLP)? Background

Introduction Karl Stratos Rutgers University Karl Stratos CS 533: Natural Language Processing

Outline of todays lecture Overview of Natural Language Generation Components of Natural

Introduction to Natural Language Processing CMSC 470 Marine Carpuat Natural Language Processing

Overview Key exchange Session vs. interchange keys Classical, public key methods

Algorithms for Big Data (IX) Chihao Zhang Shanghai Jiao Tong University Nov. 15, 2019

Alice goes floating Frank Mittelbach TUG 2016, Toronto, Canada, July 2016 /Alice goes floating

Serializable Snapshot Isolation in PostgreSQL Dan Ports Kevin Grittner University of Washington

6.1 Shape Matching Hao Li http://cs621.hao-li.com 1 Acknowledgement Images and Slides are

Alignment-Guided Chunking Yanjun Ma , Nicolas Stroppa, Andy Way { yma,nstroppa,away }

Statistical NLP Spring 2011 Lecture 8: Word Alignment Dan Klein UC Berkeley Phrase-Based

Statistical Machine Translation Overview p EM algorithm Lecture 3 Improved word alignment

Sambuz

Useful Links

Newsletter

Mail Us