IBM Model 1 and the EM Algorithm

Philipp Koehn 10 September 2020


Lexical Translation

  • How to translate a word → look up in dictionary

Haus — house, building, home, household, shell.

  • Multiple translations

– some more frequent than others
– for instance: house and building are most common
– special cases: the Haus of a snail is its shell

  • Note: In all lectures, we translate from a foreign language into English


Collect Statistics

Look at a parallel corpus (German text along with English translation):

  Translation of Haus   Count
  house                 8,000
  building              1,600
  home                    200
  household               150
  shell                    50


Estimate Translation Probabilities

Maximum likelihood estimation:

  p_f(e) = \begin{cases}
    0.8   & \text{if } e = \text{house} \\
    0.16  & \text{if } e = \text{building} \\
    0.02  & \text{if } e = \text{home} \\
    0.015 & \text{if } e = \text{household} \\
    0.005 & \text{if } e = \text{shell}
  \end{cases}
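A minimal sketch of this estimation in Python, with the counts from the previous slide hard-coded (the variable names are illustrative, not from the lecture):

```python
# Translation counts for the German word "Haus" from the parallel corpus.
counts = {"house": 8000, "building": 1600, "home": 200,
          "household": 150, "shell": 50}

# Maximum likelihood estimate: relative frequencies.
total = sum(counts.values())              # 10,000
t_haus = {e: c / total for e, c in counts.items()}

print(t_haus["house"])                    # 0.8
print(t_haus["household"])                # 0.015
```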


Alignment

  • In a parallel text (or when we translate), we align words in one language with the words in the other

    das(1) Haus(2) ist(3) klein(4)
    the(1) house(2) is(3) small(4)

  • Word positions are numbered 1–4


Alignment Function

  • Formalizing alignment with an alignment function
  • Mapping an English target word at position i to a German source word at position j with a function a : i → j

  • Example

a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}


Reordering

Words may be reordered during translation

  klein(1) ist(2) das(3) Haus(4)
  the(1) house(2) is(3) small(4)

a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}


One-to-Many Translation

A source word may translate into multiple target words

  das(1) Haus(2) ist(3) klitzeklein(4)
  the(1) house(2) is(3) very(4) small(5)

a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}


Dropping Words

Words may be dropped when translated (German article das is dropped)

  das(1) Haus(2) ist(3) klein(4)
  house(1) is(2) small(3)

a : {1 → 2, 2 → 3, 3 → 4}


Inserting Words

  • Words may be added during translation

– the English word just does not have an equivalent in German
– we still need to map it to something: special NULL token

  NULL(0) das(1) Haus(2) ist(3) klein(4)
  the(1) house(2) is(3) just(4) small(5)

a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}
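In code, such an alignment function is just a mapping from English positions to German positions; a minimal sketch (position 0 stands for the NULL token, and the names are illustrative):

```python
# German source with NULL at position 0; English target indexed from 1.
f = ["NULL", "das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "just", "small"]

# Alignment function a : j -> i for the example above.
a = {1: 1, 2: 2, 3: 3, 4: 0, 5: 4}

for j in range(1, len(e) + 1):
    print(e[j - 1], "->", f[a[j]])       # e.g. "just -> NULL"
```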


IBM Model 1

  • Generative model: break up translation process into smaller steps

– IBM Model 1 only uses lexical translation

  • Translation probability

– for a foreign sentence f = (f_1, ..., f_{l_f}) of length l_f
– to an English sentence e = (e_1, ..., e_{l_e}) of length l_e
– with an alignment of each English word e_j to a foreign word f_i according to the alignment function a : j → i

  p(e, a|f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j | f_{a(j)})

– parameter \epsilon is a normalization constant


Example

das Haus ist klein

  t(e|das)         t(e|Haus)            t(e|ist)          t(e|klein)
  the     0.7      house      0.8       is      0.8       small   0.4
  that    0.15     building   0.16      's      0.16      little  0.4
  which   0.075    home       0.02      exists  0.02      short   0.1
  who     0.05     household  0.015     has     0.015     minor   0.06
  this    0.025    shell      0.005     are     0.005     petty   0.04

  p(e, a|f) = \frac{\epsilon}{4^3} \times t(\text{the}|\text{das}) \times t(\text{house}|\text{Haus}) \times t(\text{is}|\text{ist}) \times t(\text{small}|\text{klein})
            = \frac{\epsilon}{4^3} \times 0.7 \times 0.8 \times 0.8 \times 0.4
            = 0.0028\epsilon
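A minimal sketch of this computation, with the slide's translation tables hard-coded (all names are illustrative; epsilon is kept as a plain factor):

```python
# Lexical translation table t(e|f), taken from the slide above.
t = {
    "das":   {"the": 0.7, "that": 0.15, "which": 0.075, "who": 0.05, "this": 0.025},
    "Haus":  {"house": 0.8, "building": 0.16, "home": 0.02, "household": 0.015, "shell": 0.005},
    "ist":   {"is": 0.8, "'s": 0.16, "exists": 0.02, "has": 0.015, "are": 0.005},
    "klein": {"small": 0.4, "little": 0.4, "short": 0.1, "minor": 0.06, "petty": 0.04},
}

def p_e_a_given_f(e, f, a, epsilon=1.0):
    """IBM Model 1: p(e, a|f) = epsilon / (l_f + 1)^l_e * prod_j t(e_j | f_a(j))."""
    prob = epsilon / (len(f) + 1) ** len(e)
    for j, ej in enumerate(e, start=1):
        prob *= t[f[a[j] - 1]][ej]
    return prob

f = ["das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "small"]
a = {1: 1, 2: 2, 3: 3, 4: 4}
# The lexical product is 0.7 * 0.8 * 0.8 * 0.4 = 0.1792, as on the slide;
# here it is scaled by epsilon / (l_f + 1)^l_e rather than the slide's epsilon / 4^3.
print(p_e_a_given_f(e, f, a))
```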


finding translations


Centauri-Arcturan Parallel Text

  1a. ok-voon ororok sprok .
  1b. at-voon bichat dat .

  2a. ok-drubel ok-voon anok plok sprok .
  2b. at-drubel at-voon pippat rrat dat .

  3a. erok sprok izok hihok ghirok .
  3b. totat dat arrat vat hilat .

  4a. ok-voon anok drok brok jok .
  4b. at-voon krat pippat sat lat .

  5a. wiwok farok izok stok .
  5b. totat jjat quat cat .

  6a. lalok sprok izok jok stok .
  6b. wat dat krat quat cat .

  7a. lalok farok ororok lalok sprok izok enemok .
  7b. wat jjat bichat wat dat vat eneat .

  8a. lalok brok anok plok nok .
  8b. iat lat pippat rrat nnat .

  9a. wiwok nok izok kantok ok-yurp .
  9b. totat nnat quat oloat at-yurp .

  10a. lalok mok nok yorok ghirok clok .
  10b. wat nnat gat mat bat hilat .

  11a. lalok nok crrrok hihok yorok zanzanok .
  11b. wat nnat arrat mat zanzanat .

  12a. lalok rarok nok izok hihok mok .
  12b. wat nnat forat arrat vat gat .

Translation challenge: farok crrrok hihok yorok clok kantok ok-yurp (from Knight (1997): Automating Knowledge Acquisition for Machine Translation)


em algorithm


Learning Lexical Translation Models

  • We would like to estimate the lexical translation probabilities t(e|f) from a parallel corpus

  • ... but we do not have the alignments
  • Chicken and egg problem

– if we had the alignments,
  → we could estimate the parameters of our generative model
– if we had the parameters,
  → we could estimate the alignments


EM Algorithm

  • Incomplete data

– if we had complete data, we could estimate the model
– if we had the model, we could fill in the gaps in the data

  • Expectation Maximization (EM) in a nutshell
  • 1. initialize model parameters (e.g. uniform)
  • 2. assign probabilities to the missing data
  • 3. estimate model parameters from completed data
  • 4. iterate steps 2–3 until convergence


EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

  • Initial step: all alignments equally likely
  • Model learns that, e.g., la is often aligned with the


EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

  • After one iteration
  • Alignments, e.g., between la and the are more likely


EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

  • After another iteration
  • It becomes apparent that alignments, e.g., between fleur and flower, are more likely (pigeonhole principle)


EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

  • Convergence
  • Inherent hidden structure revealed by EM


EM Algorithm

... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...

  p(la|the) = 0.453
  p(le|the) = 0.334
  p(maison|house) = 0.876
  p(bleu|blue) = 0.563
  ...

  • Parameter estimation from the aligned corpus


IBM Model 1 and EM

  • EM Algorithm consists of two steps
  • Expectation-Step: Apply model to the data

– parts of the model are hidden (here: alignments)
– using the model, assign probabilities to possible values

  • Maximization-Step: Estimate model from data

– take assigned values as fact
– collect counts (weighted by probabilities)
– estimate model from counts

  • Iterate these steps until convergence


IBM Model 1 and EM

  • We need to be able to compute:

– Expectation-Step: probability of alignments
– Maximization-Step: count collection


IBM Model 1 and EM

  • Probabilities

  p(the|la) = 0.7        p(house|la) = 0.05
  p(the|maison) = 0.1    p(house|maison) = 0.8

  • Alignments (the four possible alignments of la maison ↔ the house; the slide's diagrams are summarized here as mappings)

  the → la,     house → maison:  p(e, a|f) = 0.56    p(a|e, f) = 0.824
  the → la,     house → la:      p(e, a|f) = 0.035   p(a|e, f) = 0.052
  the → maison, house → maison:  p(e, a|f) = 0.08    p(a|e, f) = 0.118
  the → maison, house → la:      p(e, a|f) = 0.005   p(a|e, f) = 0.007

  • Counts

  c(the|la) = 0.824 + 0.052       c(house|la) = 0.052 + 0.007
  c(the|maison) = 0.118 + 0.007   c(house|maison) = 0.824 + 0.118


IBM Model 1 and EM: Expectation Step

  • We need to compute p(a|e, f)
  • Applying the chain rule:

  p(a|e, f) = \frac{p(e, a|f)}{p(e|f)}

  • We already have the formula for p(e, a|f) (definition of Model 1)


IBM Model 1 and EM: Expectation Step

  • We need to compute p(e|f)

  p(e|f) = \sum_a p(e, a|f)
         = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} p(e, a|f)
         = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j | f_{a(j)})


IBM Model 1 and EM: Expectation Step

  p(e|f) = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j | f_{a(j)})
         = \frac{\epsilon}{(l_f + 1)^{l_e}} \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j | f_{a(j)})
         = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j | f_i)

  • Note the trick in the last line

– removes the need for an exponential number of products
→ this makes IBM Model 1 estimation tractable


The Trick

(case l_e = l_f = 2)

  \sum_{a(1)=0}^{2} \sum_{a(2)=0}^{2} \frac{\epsilon}{3^2} \prod_{j=1}^{2} t(e_j | f_{a(j)})

  = \frac{\epsilon}{3^2} \big[ t(e_1|f_0)\, t(e_2|f_0) + t(e_1|f_0)\, t(e_2|f_1) + t(e_1|f_0)\, t(e_2|f_2)
  + t(e_1|f_1)\, t(e_2|f_0) + t(e_1|f_1)\, t(e_2|f_1) + t(e_1|f_1)\, t(e_2|f_2)
  + t(e_1|f_2)\, t(e_2|f_0) + t(e_1|f_2)\, t(e_2|f_1) + t(e_1|f_2)\, t(e_2|f_2) \big]

  = \frac{\epsilon}{3^2} \big[ t(e_1|f_0)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))
  + t(e_1|f_1)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))
  + t(e_1|f_2)\,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2)) \big]

  = \frac{\epsilon}{3^2} \,(t(e_1|f_0) + t(e_1|f_1) + t(e_1|f_2)) \,(t(e_2|f_0) + t(e_2|f_1) + t(e_2|f_2))
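A quick numeric check of this factorization with made-up t values (a sketch; the nested-list layout is an assumption):

```python
import itertools
import math
import random

random.seed(0)
lf, le = 2, 2
# Made-up values t_vals[i][j] standing in for t(e_j | f_i), i = 0..lf.
t_vals = [[random.random() for _ in range(le)] for _ in range(lf + 1)]

# Left-hand side: sum over all (lf + 1)^le alignments of the product.
lhs = 0.0
for a in itertools.product(range(lf + 1), repeat=le):
    lhs += math.prod(t_vals[a[j]][j] for j in range(le))

# Right-hand side: product over positions j of sums over source words i.
rhs = math.prod(sum(t_vals[i][j] for i in range(lf + 1)) for j in range(le))

print(abs(lhs - rhs) < 1e-12)    # True: 9 alignment products collapse into 2 sums
```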


IBM Model 1 and EM: Expectation Step

  • Combine what we have:

  p(a|e, f) = \frac{p(e, a|f)}{p(e|f)}
            = \frac{\frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j | f_{a(j)})}
                   {\frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j | f_i)}
            = \prod_{j=1}^{l_e} \frac{t(e_j | f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j | f_i)}
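A minimal sketch of this posterior, applied to the two-word example from the earlier slide (NULL is ignored there, so it is ignored here too; names are illustrative):

```python
# t(e|f) values from the earlier la/maison slide.
t = {"la": {"the": 0.7, "house": 0.05},
     "maison": {"the": 0.1, "house": 0.8}}

e = ["the", "house"]
f = ["la", "maison"]

def p_a_given_ef(a):
    """prod_j t(e_j | f_a(j)) / sum_i t(e_j | f_i), per the formula above."""
    p = 1.0
    for j, ej in enumerate(e):
        p *= t[f[a[j]]][ej] / sum(t[fi][ej] for fi in f)
    return p

# The four alignments from the earlier slide, most likely first.
for a in [(0, 1), (0, 0), (1, 1), (1, 0)]:
    print(a, round(p_a_given_ef(a), 3))
# 0.824, 0.051, 0.118, 0.007 (the slide rounds the second value to 0.052)
```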


IBM Model 1 and EM: Maximization Step

  • Now we have to collect counts
  • Evidence from a sentence pair e,f that word e is a translation of word f:

  c(e|f; e, f) = \sum_a p(a|e, f) \sum_{j=1}^{l_e} \delta(e, e_j)\, \delta(f, f_{a(j)})

  • With the same simplification as before:

  c(e|f; e, f) = \frac{t(e|f)}{\sum_{i=0}^{l_f} t(e|f_i)} \sum_{j=1}^{l_e} \delta(e, e_j) \sum_{i=0}^{l_f} \delta(f, f_i)


IBM Model 1 and EM: Maximization Step

After collecting these counts over a corpus, we can estimate the model:

  t(e|f) = \frac{\sum_{(e, f)} c(e|f; e, f)}{\sum_{e'} \sum_{(e, f)} c(e'|f; e, f)}
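A minimal sketch of this re-estimation step, seeded with the expected counts from the earlier la/maison slide (the nested-dictionary layout is an assumption):

```python
# Expected counts c(e|f) summed over the corpus, from the earlier slide.
count = {"la":     {"the": 0.824 + 0.052, "house": 0.052 + 0.007},
         "maison": {"the": 0.118 + 0.007, "house": 0.824 + 0.118}}

# Maximization step: normalize the counts for each foreign word f.
t = {f: {e: c / sum(cs.values()) for e, c in cs.items()}
     for f, cs in count.items()}

print(round(t["la"]["the"], 3))      # 0.876 / (0.876 + 0.059) ≈ 0.937
```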


IBM Model 1 and EM: Pseudocode

Input: set of sentence pairs (e, f)
Output: translation probabilities t(e|f)

  initialize t(e|f) uniformly
  while not converged do
    // initialize
    count(e|f) = 0 for all e, f
    total(f) = 0 for all f
    for all sentence pairs (e, f) do
      // compute normalization
      for all words e in e do
        s-total(e) = 0
        for all words f in f do
          s-total(e) += t(e|f)
        end for
      end for
      // collect counts
      for all words e in e do
        for all words f in f do
          count(e|f) += t(e|f) / s-total(e)
          total(f) += t(e|f) / s-total(e)
        end for
      end for
    end for
    // estimate probabilities
    for all foreign words f do
      for all English words e do
        t(e|f) = count(e|f) / total(f)
      end for
    end for
  end while
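A runnable transcription of this pseudocode in Python (a sketch: names mirror the slide, a fixed iteration count replaces the convergence test, and the NULL token is omitted just as it is in the pseudocode):

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(e|f).

    corpus: list of sentence pairs (e_words, f_words), English first.
    """
    # Initialize t(e|f) uniformly over the English vocabulary.
    e_vocab = {e for e_s, _ in corpus for e in e_s}
    t = defaultdict(lambda: 1.0 / len(e_vocab))

    for _ in range(iterations):
        count = defaultdict(float)         # count(e|f)
        total = defaultdict(float)         # total(f)
        for e_s, f_s in corpus:
            # Compute normalization s-total(e) for each target word.
            s_total = {e: sum(t[e, f] for f in f_s) for e in e_s}
            # Collect fractional counts.
            for e in e_s:
                for f in f_s:
                    count[e, f] += t[e, f] / s_total[e]
                    total[f] += t[e, f] / s_total[e]
        # Estimate probabilities from counts.
        for e, f in count:
            t[e, f] = count[e, f] / total[f]
    return t
```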


Convergence

Training corpus:

  das Haus ↔ the house
  das Buch ↔ the book
  ein Buch ↔ a book

  e      f     initial  1st it.  2nd it.  3rd it.  ...  final
  the    das   0.25     0.5      0.6364   0.7479   ...  1
  book   das   0.25     0.25     0.1818   0.1208   ...  0
  house  das   0.25     0.25     0.1818   0.1313   ...  0
  the    buch  0.25     0.25     0.1818   0.1208   ...  0
  book   buch  0.25     0.5      0.6364   0.7479   ...  1
  a      buch  0.25     0.25     0.1818   0.1313   ...  0
  book   ein   0.25     0.5      0.4286   0.3466   ...  0
  a      ein   0.25     0.5      0.5714   0.6534   ...  1
  the    haus  0.25     0.5      0.4286   0.3466   ...  0
  house  haus  0.25     0.5      0.5714   0.6534   ...  1
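Running the sketch from the previous slide on this three-sentence corpus reproduces the table (train_ibm_model1 is the illustrative function defined above):

```python
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"]),
          (["a", "book"], ["ein", "buch"])]

t = train_ibm_model1(corpus, iterations=3)
print(round(t["the", "das"], 4))     # 0.7479, the 3rd-iteration value above
```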


Perplexity

  • How well does the model fit the data?
  • Perplexity: derived from probability of the training data according to the model

  \log_2 PP = - \sum_s \log_2 p(e_s | f_s)

  • Example (\epsilon = 1)

                           initial  1st it.  2nd it.  3rd it.  ...  final
  p(the haus|das haus)     0.0625   0.1875   0.1905   0.1913   ...  0.1875
  p(the book|das buch)     0.0625   0.1406   0.1790   0.2075   ...  0.25
  p(a book|ein buch)       0.0625   0.1875   0.1907   0.1913   ...  0.1875
  perplexity               4095     202.3    153.6    131.6    ...  113.8
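A minimal sketch of this computation, reusing the illustrative trainer and toy corpus defined above (the toy example drops the NULL word, so p(e|f) here uses l_f rather than l_f + 1):

```python
import math

def p_e_given_f(e_s, f_s, t, epsilon=1.0):
    """p(e|f) = epsilon / l_f^l_e * prod_j sum_i t(e_j|f_i), NULL omitted."""
    p = epsilon / len(f_s) ** len(e_s)
    for e in e_s:
        p *= sum(t[e, f] for f in f_s)
    return p

def perplexity(corpus, t):
    log2_pp = -sum(math.log2(p_e_given_f(e_s, f_s, t)) for e_s, f_s in corpus)
    return 2.0 ** log2_pp

t0 = train_ibm_model1(corpus, iterations=0)   # uniform initialization
print(perplexity(corpus, t0))                 # 4096; the slide reports 4095
```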


Higher IBM Models

  IBM Model 1   lexical translation
  IBM Model 2   adds absolute reordering model
  IBM Model 3   adds fertility model
  IBM Model 4   relative reordering model
  IBM Model 5   fixes deficiency

  • Only IBM Model 1 has a global maximum

– training of a higher IBM model builds on the previous model

  • Computationally biggest change in Model 3

– trick to simplify estimation does not work anymore
→ exhaustive count collection becomes computationally too expensive
– sampling over high probability alignments is used instead


word alignment


Word Alignment

Given a sentence pair, which words correspond to each other?

(figure: alignment matrix for the sentence pair "michael geht davon aus , dass er im haus bleibt" ↔ "michael assumes that he will stay in the house")


Word Alignment?

(figure: alignment matrix for "john wohnt hier nicht" ↔ "john does not live here", with question marks on the alignment of does)

Is the English word does aligned to the German wohnt (verb) or nicht (negation) or neither?


Word Alignment?

(figure: alignment matrix for "john biss ins grass" ↔ "john kicked the bucket")

How do the idioms "kicked the bucket" and "biss ins grass" match up? Outside this exceptional context, bucket is never a good translation for grass.


Measuring Word Alignment Quality

  • Manually align corpus with sure (S) and possible (P) alignment points (S ⊆ P)
  • Common metric for evaluating word alignments: Alignment Error Rate (AER)

  AER(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}

  • AER = 0: alignment A contains all sure alignment points and only possible alignment points
  • However: different applications require different precision/recall trade-offs
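A minimal sketch of AER as a function over sets of alignment points (the example points are made up):

```python
def aer(sure, possible, alignment):
    """Alignment Error Rate; assumes sure is a subset of possible."""
    a = set(alignment)
    return 1.0 - (len(a & sure) + len(a & possible)) / (len(a) + len(sure))

S = {(1, 1), (2, 2)}                 # sure points
P = S | {(2, 3)}                     # possible points (S ⊆ P)
print(aer(S, P, {(1, 1), (2, 2), (2, 3)}))   # 0.0: all sure, only possible
print(aer(S, P, {(1, 1), (3, 3)}))           # 0.5: one sure missed, one bad point
```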


symmetrization


Word Alignment with IBM Models

  • IBM Models create a many-to-one mapping

– words are aligned using an alignment function
– a function may return the same value for different input (one-to-many mapping)
– a function cannot return multiple values for one input (no many-to-one mapping)

  • Real word alignments have many-to-many mappings


Symmetrization

  • Run IBM Model training in both directions

→ two sets of word alignment points

  • Intersection: high precision alignment points
  • Union: high recall alignment points
  • Refinement methods explore the sets between intersection and union
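With alignment points stored as sets of (foreign position, English position) pairs, intersection and union are one-liners; a sketch with made-up points, assuming the reverse run's points have already been transposed into the same orientation:

```python
# Alignment points from the two training directions (made-up example).
e2f = {(1, 1), (2, 2), (3, 3), (4, 3)}
f2e = {(1, 1), (2, 2), (3, 3), (3, 4)}

intersection = e2f & f2e     # high precision: {(1, 1), (2, 2), (3, 3)}
union = e2f | f2e            # high recall: the three shared points plus two more

print(sorted(intersection))
print(sorted(union))
```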


Example

(figure: three alignment matrices for "Maria no daba una bofetada a la bruja verde" ↔ "Mary did not slap the green witch": English-to-Spanish, Spanish-to-English, and their intersection)


Growing Heuristics

(figure: alignment matrix for "Maria no daba una bofetada a la bruja verde" ↔ "Mary did not slap the green witch"; black: intersection, grey: additional points in the union)

  • Add alignment points from union based on heuristics:

– directly/diagonally neighboring points
– finally, add alignments that connect unaligned words in source and/or target

  • Popular method: grow-diag-final-and
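A simplified sketch of this growing idea (a toy rendering of grow-diag plus a final step, not the exact grow-diag-final-and procedure; all names are illustrative):

```python
# Direct and diagonal neighbor offsets.
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]

def grow(intersection, union):
    aligned = set(intersection)
    # Grow: repeatedly add union points that neighbor an aligned point
    # and cover a word that is still unaligned on at least one side.
    changed = True
    while changed:
        changed = False
        for i, j in sorted(union - aligned):
            neighbor = any((i + di, j + dj) in aligned for di, dj in NEIGHBORS)
            uncovered = (i not in {x for x, _ in aligned} or
                         j not in {y for _, y in aligned})
            if neighbor and uncovered:
                aligned.add((i, j))
                changed = True
    # Final: add remaining union points whose words are both still unaligned.
    for i, j in sorted(union - aligned):
        if (i not in {x for x, _ in aligned} and
                j not in {y for _, y in aligned}):
            aligned.add((i, j))
    return aligned
```

The result sits between the intersection (high precision) and the union (high recall), which is exactly the trade-off the refinement methods above explore.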
