Statistical Machine Translation
Lecture 3: Word Alignment and Phrase Models

Philipp Koehn
pkoehn@inf.ed.ac.uk
School of Informatics, University of Edinburgh

Overview
- Statistical modeling
- EM algorithm
- Improved word alignment
- Phrase-based SMT


Statistical Modeling

- Learn P(f|e) from a parallel corpus
- Not sufficient data to estimate P(f|e) directly

Statistical Modeling (2)

    Mary did not slap the green witch
    Maria no daba una bofetada a la bruja verde

- Break the process into smaller steps
- Probabilities for smaller steps can be learned

Statistical Modeling (3)

- Generate a story of how an English string e gets to be a foreign string f
- Choices in the story are decided by reference to parameters, e.g., p(bruja|witch):

    Mary did not slap the green witch
      n(3|slap)
    Mary not slap slap slap the green witch
      p-null
    Mary not slap slap slap NULL the green witch
      t(la|the)
    Maria no daba una bofetada a la verde bruja
      d(4|4)
    Maria no daba una bofetada a la bruja verde

  (a small numeric sketch of scoring such a story follows after these slides)

Statistical Modeling (4)

- Formula for P(f|e) in terms of parameters
  - usually long and hairy, but mechanical to extract from the story
- Training to obtain parameter estimates from possibly incomplete data
  - off-the-shelf EM
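The slides name the parameter types (fertility n, NULL insertion p-null, lexical translation t, distortion d) but do not show a computation. The sketch below multiplies just the four parameters the slide happens to label, one per step type; a full derivation would include one such factor per word, and P(f|e) would sum the products over all derivations. Every numeric value here is invented for illustration, since the lecture gives none.

    # Score (part of) one derivation of the generative story, IBM Model 3 style.
    # All parameter values below are made up for illustration only.
    fertility = {("slap", 3): 0.3}      # n(3|slap): "slap" produces 3 Spanish words
    p_null = 0.02                       # probability of inserting a NULL-generated word
    translation = {("la", "the"): 0.4}  # t(la|the): lexical translation
    distortion = {(4, 4): 0.25}         # d(4|4): output position 4 stays at position 4

    # the probability contribution of these labeled steps is just their product
    p_story = (fertility[("slap", 3)] * p_null
               * translation[("la", "the")] * distortion[(4, 4)])
    print(f"p(labeled steps) = {p_story:.6f}")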

Parallel Corpora

    ... la maison ... la maison bleu ... la fleur ...
    ... the house ... the blue house ... the flower ...

- Incomplete data
  - English and foreign words, but no connections between them
- Chicken and egg problem
  - if we had the connections, we could estimate the parameters of our generative story
  - if we had the parameters, we could estimate the connections

EM Algorithm

- EM in a nutshell (a toy implementation follows after these slides)
  - initialize model parameters (e.g. uniform)
  - assign probabilities to the missing data
  - estimate model parameters from completed data
  - iterate
- Incomplete data
  - if we had complete data, we could estimate the model
  - if we had the model, we could fill in the gaps in the data

EM Algorithm (2)

- Initial step: all connections equally likely
- Model learns that, e.g., la is often connected with the

EM Algorithm (3)

- After one iteration
- Connections, e.g., between la and the, are more likely

EM Algorithm (4)

- After another iteration
- It becomes apparent that connections, e.g., between fleur and flower, are more likely (pigeonhole principle)

EM Algorithm (5)

- Convergence
- Inherent hidden structure revealed by EM
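The deck illustrates EM only pictorially, so here is a toy implementation of the loop described on the "EM Algorithm" slide, run on the three-fragment corpus from "Parallel Corpora". It is a minimal sketch under stated assumptions, not Koehn's code: it trains only the IBM Model 1 lexical probabilities t(e|f), omits the NULL word, and will not exactly reproduce the values quoted on the "EM Algorithm (6)" slide below.

    # Toy EM training of IBM Model 1 on the slide's three-fragment corpus.
    from collections import defaultdict
    from itertools import product

    corpus = [
        (["la", "maison"], ["the", "house"]),
        (["la", "maison", "bleu"], ["the", "blue", "house"]),
        (["la", "fleur"], ["the", "flower"]),
    ]

    # initial step: all connections equally likely (uniform t)
    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}
    t = {(e, f): 1.0 / len(e_vocab) for e, f in product(e_vocab, f_vocab)}

    for _ in range(10):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # expected counts c(f)
        for fs, es in corpus:
            for e in es:
                # E-step: distribute each English word's alignment probability
                # over the foreign words in the pair, proportional to t(e|f)
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    p = t[(e, f)] / norm
                    count[(e, f)] += p
                    total[f] += p
        # M-step: re-estimate t(e|f) from the probability-weighted counts
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}

    for pair in [("the", "la"), ("house", "maison"), ("flower", "fleur")]:
        print(f"t({pair[0]}|{pair[1]}) = {t[pair]:.3f}")

After a few iterations t(the|la) dominates, and the pigeonhole effect the slides mention pins fleur to flower: la and maison are explained by the and house, so fleur is the only candidate left for flower.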

EM Algorithm (6)

    ... la maison ... la maison bleu ... la fleur ...
    ... the house ... the blue house ... the flower ...

- Parameter estimation from the connected corpus, e.g.:
    p(la|the) = 0.453
    p(le|the) = 0.334
    p(maison|house) = 0.876
    p(bleu|blue) = 0.563

IBM Model 1

- Generative model:
  - foreign sentence f = f_1 ... f_l
  - English sentence e = e_1 ... e_m
  - each English word e_j is generated by a foreign word f_a(j), as defined by the alignment function a, with probability t
- Translation probability:

    p(e, a | f) = ε / (l+1)^m × Π_{j=1..m} t(e_j | f_a(j))

- The normalization factor ε is required to turn the formula into a proper probability function

One example

    das Haus ist klein
    the house is small

    f = das         f = Haus           f = ist           f = klein
    e      t(e|f)   e          t(e|f)  e        t(e|f)   e        t(e|f)
    the    0.7      house      0.8     is       0.8      small    0.4
    that   0.15     building   0.16    's       0.16     little   0.4
    which  0.075    home       0.02    exists   0.02     short    0.1
    who    0.05     household  0.015   has      0.015    minor    0.06
    this   0.025    shell      0.005   are      0.005    petty    0.04

    p(e, a | f) = ε / (4+1)^4 × t(the|das) × t(house|Haus) × t(is|ist) × t(small|klein)
                = ε / 625 × 0.7 × 0.8 × 0.8 × 0.4
                ≈ 0.00029 ε

IBM Model 1 and EM

- The EM algorithm consists of two steps
- Expectation-Step: apply the model to the data
  - parts of the model are hidden (here: alignments)
  - using the model, assign probabilities to possible values
- Maximization-Step: estimate the model from the data
  - take the assigned values as fact
  - collect counts (weighted by probabilities)
  - estimate the model from the counts
- Iterate these steps until convergence

IBM Model 1 and EM (2)

- We need to be able to compute:
  - Expectation-Step: probability of alignments
  - Maximization-Step: count collection

IBM Model 1 and EM: Expectation Step

- We need to compute p(a | e, f)
- Applying the chain rule: p(a | e, f) = p(e, a | f) / p(e | f)
- We already have the formula for p(e, a | f) (the definition of Model 1)
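To make the Expectation Step concrete, the sketch below enumerates every alignment function a for the "das Haus ist klein" example, scores each with the Model 1 formula, and normalizes to get p(a | e, f) = p(e, a | f) / p(e | f). Two assumptions not in the lecture: the NULL word is dropped (so the length factor is l^m rather than (l+1)^m), and word pairs not in the slide's table get a small default t of 0.01. The constant ε cancels in the posterior.

    # Expectation Step for the slide's example: posterior p(a|e,f) over all alignments.
    from itertools import product
    from math import prod

    f = ["das", "Haus", "ist", "klein"]
    e = ["the", "house", "is", "small"]
    t = {("the", "das"): 0.7, ("house", "Haus"): 0.8,
         ("is", "ist"): 0.8, ("small", "klein"): 0.4}

    def t_lookup(ej, fi):
        return t.get((ej, fi), 0.01)  # assumed default for unlisted pairs

    # p(e,a|f) for every alignment a: j -> a(j); epsilon is dropped, it cancels below
    scores = {
        a: prod(t_lookup(e[j], f[a[j]]) for j in range(len(e))) / len(f) ** len(e)
        for a in product(range(len(f)), repeat=len(e))
    }

    p_e_given_f = sum(scores.values())  # p(e|f) = sum of p(e,a|f) over all alignments
    best = max(scores, key=scores.get)
    print("most likely alignment:", best)            # expect the diagonal (0, 1, 2, 3)
    print(f"p(a|e,f) = {scores[best] / p_e_given_f:.4f}")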

