SLIDE 1

Statistical Machine Translation

Josef van Genabith DFKI GmbH Josef.van_Genabith@dfki.de

Language Technology II SS 2014

May 13th, 2014

With some additional slides from Chris Dyer (MT Marathon 2011) and Sabine Hunsicker (LT SS 2012)

SLIDE 2

Overview

- Introduction: the basic idea
- IBM models: the noisy channel
- Phrase-Based SMT

SLIDE 3

- Want to learn translation from data
- Data = bitext: texts and their translations, aligned at sentence level
- Brown et al., "The Mathematics of Statistical Machine Translation: Parameter Estimation", Computational Linguistics, 1993
  - Tough going
- Fortunately: "A Statistical MT Tutorial Workbook", Kevin Knight, 1999
- These slides are based on Kevin Knight's explanations …

SLIDE 4


Mary did not slap the green witch
Mary not slap slap slap the green witch (fertility)
Mary not slap slap slap NULL the green witch (NULL insertion)
Maria no daba una bofetada a la verde bruja (word translation)
Maria no daba una bofetada a la bruja verde (distortion / reordering)

SLIDE 5

- A generative story: given a string in the source language, how can we generate a string in the target language that is a translation?
- Components of the story:
  - n: fertility
  - t: translation (between words)
  - d: distortion (reordering)
  - φ0: NULL-generated words
- Putting them into a model
- Learning the model (parameters) from data

SLIDE 6

- P(e)
- P(e, f) = P(e) × P(f) if e and f are independent
- P(e, f) = P(e) × P(f|e) if e and f are not independent
- P(e|f) = P(e, f) / P(f)
- P(e, f) = P(f, e)
- P(e|f) ≠ P(f|e) in general
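To make these identities concrete, here is a small Python sketch (not from the slides; the corpus counts are invented for illustration) that estimates the joint, marginal and conditional probabilities from a toy bitext and checks that P(e|f) = P(e, f) / P(f):

```python
from collections import Counter

# Toy parallel corpus: (English, French) sentence pairs with invented counts.
pairs = ([("the house", "la maison")] * 8
         + [("the home", "la maison")] * 2
         + [("the blue house", "la maison bleue")] * 5)

joint = Counter(pairs)            # counts for estimating P(e, f)
total = sum(joint.values())

def p_joint(e, f):                # P(e, f)
    return joint[(e, f)] / total

def p_f(f):                       # marginal P(f), summing over all e
    return sum(c for (_, f2), c in joint.items() if f2 == f) / total

def p_e_given_f(e, f):            # P(e|f) = P(e, f) / P(f)
    return p_joint(e, f) / p_f(f)

print(p_e_given_f("the house", "la maison"))   # ~0.8
print(p_e_given_f("the home", "la maison"))    # ~0.2
```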

SLIDE 7

- ê = argmax_e P(e|f)
- P(e|f) = P(f|e) × P(e) / P(f)
- ê = argmax_e P(e|f) = argmax_e P(f|e) × P(e) / P(f) = argmax_e P(f|e) × P(e)
  (P(f) is fixed for the observed f, so it can be dropped from the argmax)
- This is the Noisy Channel Model

SLIDE 8

The Noisy Channel Model

argmax_e P(f|e) × P(e)

- "The noisy channel works like this. We imagine that someone has e in his head, but by the time it gets on to the printed page it is corrupted by 'noise' and becomes f. To recover the most likely e, we reason about (1) what kinds of things people say in English, and (2) how English gets turned into French. These are sometimes called 'source modeling' and 'channel modeling.'" (Knight, 1999, p. 2)
- "People use the noisy channel metaphor for a lot of engineering problems, like actual noise on telephone transmissions." (ibid.)

SLIDE 9

The Noisy Channel Model

ê = argmax_e P(f|e) × P(e)

- P(e): the source model, the language model
- P(f|e): the channel model, the translation model


Source e --P(e)--> Channel P(f|e) --> Observed f. Given the observed f: what is the most likely e? ê
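As a minimal illustration of the decoding rule ê = argmax_e P(f|e) × P(e), here is a Python sketch; the candidate list and the language/translation model tables are toy values made up for this example:

```python
# Toy models (invented probabilities, for illustration only).
lm = {"the house": 0.6, "house the": 0.1, "the home": 0.3}   # P(e), language model
tm = {("la maison", "the house"): 0.5,                       # P(f|e), translation model
      ("la maison", "the home"): 0.4,
      ("la maison", "house the"): 0.5}

def decode(f, candidates):
    """Noisy-channel decoding: pick the e maximizing P(f|e) * P(e)."""
    return max(candidates, key=lambda e: tm.get((f, e), 0.0) * lm.get(e, 0.0))

print(decode("la maison", ["the house", "house the", "the home"]))
# 'the house': 0.5 * 0.6 = 0.30 beats 0.4 * 0.3 = 0.12 and 0.5 * 0.1 = 0.05
```

Note how the ungrammatical candidate "house the" is penalized by the language model even though its channel score equals that of "the house": this division of labour is the point of the noisy channel factorization.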

SLIDE 10

Interlude

Chris Dyer's slides from MT Marathon 2011 on the Noisy Channel and SMT


SLIDES 11-29

[Interlude slides by Chris Dyer, MT Marathon 2011, on the noisy channel and SMT; image-only slides, not reproduced in this transcript.]

SLIDE 30

End of Interlude

Back to our slides based on Kevin Knight’s 1999 workbook

SLIDE 31

Translation Modelling

- Remember: when translating f to e, we reason backwards
- We observe f
- We want to know which e is (most) likely to be uttered and likely to have been translated into f:

ê = argmax_e P(f|e) × P(e)

- Story: replace words in e by French words and scramble them around
- "What kind of a crackpot story is that?" (Kevin Knight, 1999)
- IBM Model 3

SLIDE 32

- What happens in translation? Actually a lot …
- EN: Mary did not slap the green witch
- ES: Maria no daba una bofetada a la bruja verde
- But from a purely external point of view:
  - source words get replaced by target words
  - words in the target are moved around ("reordered")
  - source and target need not be equally long …
- So, minimally, that is what we need to model …

SLIDE 33

Some parts of the Model

1. For each word e_i in an English sentence (i = 1 … l), we choose a fertility φ_i. The choice of fertility is dependent solely on the English word in question, nothing else.
2. For each word e_i, we generate φ_i French words: t(f|e). The choice of French word is dependent solely on the English word that generates it. It is not dependent on the English context around the English word. It is not dependent on other French words that have been generated from this or any other English word.
3. All those French words are permuted: d(j|i, l, m). Each French word is assigned an absolute target "position slot." For example, one word may be assigned position 3, and another word may be assigned position 2 -- the latter word would then precede the former in the final French sentence. The choice of position for a French word is dependent solely on the absolute position of the English word that generates it.
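Written out (and ignoring, for now, the NULL word and the combinatorial factors of the full model), these three steps give the probability of a French sentence f with word alignment a given English e as a product of fertility, translation and distortion parameters, in the spirit of Knight (1999):

```latex
P(f, a \mid e) \;\approx\;
  \prod_{i=1}^{l} n(\phi_i \mid e_i) \;\times\;
  \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \;\times\;
  \prod_{j=1}^{m} d(j \mid a_j, l, m)
```

Here a_j is the position of the English word that generated French word f_j, l is the English sentence length and m the French sentence length.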

SLIDE 34

Translation as String Rewriting


Mary did not slap the green witch
Mary not slap slap slap the the green witch (fertility n)
Maria no daba una bofetada a la verde bruja (word translation t)
Maria no daba una bofetada a la bruja verde (distortion d)

SLIDE 35

Parameters

- We would like to learn the parameters for fertility, (word) translation and distortion from data
- The parameters look like this: n(3 | slap), t(maison | house), d(5 | 2, 4, 6)
- And they have probabilities associated with them
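In code, such parameter tables are just mappings from events to probabilities. A hypothetical sketch with invented values:

```python
# Hypothetical parameter tables with invented probabilities.
n = {("slap", 3): 0.6, ("slap", 2): 0.3, ("slap", 1): 0.1}   # fertility n(phi | e)
t = {("maison", "house"): 0.8, ("domicile", "house"): 0.2}   # translation t(f | e)
d = {(5, 2, 4, 6): 0.4, (4, 2, 4, 6): 0.3}                   # distortion d(j | i, l, m)

# d(5 | 2, 4, 6): probability that the English word in position 2
# (of a 4-word English sentence) puts its French word in position 5
# (of a 6-word French sentence).
print(n[("slap", 3)], t[("maison", "house")], d[(5, 2, 4, 6)])
```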

SLIDE 36

NULL

- One more twist: spurious words
- E.g. function words can appear in the target that do not have correspondences in the source
- Pretend that every English sentence has a NULL word in position 0 that can generate spurious words in the target: t(a | NULL)
- Longer sentences are more likely to have more spurious words
- NULL therefore doesn't have a fertility distribution but a probability p1 with which it can generate a spurious word after each properly generated word; how many: φ0
- p0 = 1 − p1 is the probability of not tossing in a spurious word
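Concretely, if the real English words generate φ1 + … + φl French words in total, and each of them is followed by a spurious word with probability p1 (and not followed by one with probability p0 = 1 − p1), then the number φ0 of NULL-generated words is binomially distributed; this is the form used in Knight (1999):

```latex
P(\phi_0 \mid \phi_1, \dots, \phi_l) \;=\;
  \binom{\sum_{i=1}^{l} \phi_i}{\phi_0}\,
  p_1^{\phi_0}\, p_0^{\left(\sum_{i=1}^{l} \phi_i\right) - \phi_0}
```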

SLIDE 37


NULL Mary did not slap the green witch
Mary not slap slap slap the green witch (fertility)
Mary not slap slap slap NULL the green witch (NULL insertion)
Maria no daba una bofetada a la verde bruja (word translation)
Maria no daba una bofetada a la bruja verde (distortion)

SLIDE 38

Model 3

1. For each English word e_i indexed by i = 1, 2, …, l, choose a fertility φ_i with probability n(φ_i | e_i).
2. Choose the number φ_0 of "spurious" French words to be generated from e_0 = NULL, using probability p_1 and the sum of fertilities from step 1.
3. Let m be the sum of fertilities for all words, including NULL.
4. For each i = 0, 1, 2, …, l and each k = 1, 2, …, φ_i, choose a French word τ_{i,k} with probability t(τ_{i,k} | e_i).
5. For each i = 1, 2, …, l and each k = 1, 2, …, φ_i, choose a target French position π_{i,k} with probability d(π_{i,k} | i, l, m).
6. For each k = 1, 2, …, φ_0, choose a position π_{0,k} from the φ_0 − k + 1 remaining vacant positions in 1, 2, …, m, for a total probability of 1/φ_0!.
7. Output the French sentence with words τ_{i,k} in positions π_{i,k} (0 ≤ i ≤ l, 1 ≤ k ≤ φ_i).
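The generative story can be turned directly into a sampler. Below is a minimal Python sketch; all tables are toy values invented for the "Mary did not slap the green witch" example, and a uniform shuffle stands in for the learned distortion table d(j | i, l, m):

```python
import random

def sample(dist):
    """Draw one outcome from a {outcome: probability} table."""
    r = random.random()
    for outcome, p in dist.items():
        r -= p
        if r <= 0:
            return outcome
    return outcome  # guard against floating-point rounding

# Toy parameters (invented; real ones are estimated from bitext, e.g. with EM).
fertility = {"Mary": {1: 1.0}, "did": {0: 1.0}, "not": {1: 1.0},
             "slap": {3: 1.0}, "the": {1: 0.6, 2: 0.4},
             "green": {1: 1.0}, "witch": {1: 1.0}}              # n(phi | e)
translate = {"Mary": {"Maria": 1.0}, "not": {"no": 1.0},
             "slap": {"daba": 0.4, "una": 0.3, "bofetada": 0.3},
             "the": {"la": 0.7, "a": 0.3}, "green": {"verde": 1.0},
             "witch": {"bruja": 1.0}, "NULL": {"a": 1.0}}       # t(f | e)
p1 = 0.1                                                        # NULL insertion prob.

def generate(english):
    # Steps 1-3: fertility, then spurious NULL words after real ones.
    expanded = [e for e in english for _ in range(sample(fertility[e]))]
    with_null = []
    for e in expanded:
        with_null.append(e)
        if random.random() < p1:
            with_null.append("NULL")
    # Step 4: pick a French word for every copy from t(f | e).
    french = [sample(translate[e]) for e in with_null]
    # Steps 5-6: a uniform shuffle stands in for d(j | i, l, m).
    random.shuffle(french)
    return " ".join(french)                                     # step 7

print(generate("Mary did not slap the green witch".split()))
```

Because each of the three "slap" copies samples t(f | slap) independently, the sketch sometimes emits e.g. "daba daba bofetada"; the real model makes the same independence assumption.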

SLIDE 39

Another Interlude

Some slides from Sabine Hunsicker

SLIDE 40


Sources for Information

- MT in general, history:
  - http://www.MT-Archive.info: electronic repository and bibliography of articles, books and papers on topics in machine translation and computer-based translation tools; regularly updated, contains over 3300 items
  - Hutchins, Somers: An Introduction to Machine Translation. Academic Press, 1992, available under http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
- MT systems: Compendium of Translation Software, see http://www.hutchinsweb.me.uk/Compendium.htm
- Statistical Machine Translation: see www.statmt.org; the book by Philipp Koehn is available in the coli-bib

SLIDE 41


Use cases and requirements for MT

a) MT for assimilation ("inbound"): L2, L3, …, Ln → L1
   - Requirements: robustness, coverage
   - Daily throughput of online MT systems > 500 M words
b) MT for dissemination ("outbound"): L1 → L2, L3, …, Ln
   - Requirement: textual quality
   - Publishable quality can only be authored by humans; translation memories & CAT tools are mandatory for professional translators
c) MT for direct communication: L1 ↔ L2
   - Requirements: speech recognition, context dependence
   - Topic of many running and completed research projects (Verbmobil, TC-STAR, TransTac, …); the US military uses systems for spoken MT

SLIDE 42


On the Risks of Outbound MT

Some recent examples: 'I am not in the office at the moment. Please send any work to be translated.' (The Welsh text on a road sign turned out to be the translator's out-of-office auto-reply, printed as if it were the translation.)

SLIDE 43


Motivation for rule-based MT

- Good translation requires knowledge of linguistic rules
  - … for understanding the source text
  - … for generating well-formed target text
- Rule-based accounts for certain linguistic levels exist and should be used, especially for
  - morphology
  - syntax
- Writing one rule is better than finding hundreds of examples, as the rule will apply to new, unseen cases
- Following a set of rules can be more efficient than searching for the most probable translation in a large statistical model

SLIDE 44


Possible (rule-based) MT architectures

The "Vauquois Triangle"

SLIDE 45


Motivation for statistical MT

- Good translation requires knowledge and decisions on many levels:
  - syntactic disambiguation (POS, attachments)
  - semantic disambiguation (collocations, scope, word sense)
  - reference resolution
  - lexical choice in the target language
  - application-specific terminology, register, connotations, good style, …
- Rule-based models of all these levels are very expensive to build, maintain, and adapt to new domains
- Statistical approaches have been quite successful in many areas of NLP, once data has been annotated
- Learning from existing translations will focus on the distinctions that matter (not on the linguist's favorite subject)
- Translation corpora are available in rapidly growing amounts
- SMT can integrate rule-based modules (morphologies, lexicons)
- SMT can use feedback for on-line adaptation to domain and user preferences

SLIDE 46


History of SMT and Important Players I

- 1949: Warren Weaver: the translation problem can be largely solved by "statistical semantic studies"
- 1950s to 1970s: predominance of rule-based approaches
- 1966: ALPAC report: general discouragement for MT (in the US)
- 1980s: example-based MT proposed in Japan (Nagao); statistical approaches to speech recognition (Jelinek et al. at IBM)
- Late 1980s: statistical POS taggers, SMT models at IBM, work on translation alignment at Xerox (M. Kay)
- Early 1990s: many statistical approaches to NLP in general; IBM's Candide claimed to be as good as Systran
- Late 1990s: statistical MT successful as a fallback approach within the Verbmobil system (Ney, Och); wide distribution of translation memory technology (Trados) indicates big commercial potential of SMT
- 1999: Johns Hopkins workshop: open-source re-implementation of IBM's SMT methods (GIZA)

SLIDE 47


History of SMT and Important Players II

- Since 2001: DARPA/NIST evaluation campaign (XYZ → English), uses the BLEU score for automatic evaluation
- Various companies start marketing/exploring SMT: Language Weaver, aixplain GmbH, Linear B Ltd., esteam, Google Labs
- 2002: Philipp Koehn (ISI) makes the EuroParl corpus available
- 2003: Koehn, Och & Marcu propose statistical phrase-based MT
- 2004: ISI publishes Philipp Koehn's SMT decoder Pharaoh
- 2005: first SMT workshop with shared task
- 2006: Johns Hopkins workshop on the open-source factored SMT decoder Moses; start of the EuroMatrix project for MT between all EU languages; Acquis Communautaire (EU laws in 20+ languages) made available
- 2007: Google abandons Systran and switches to its own SMT technology
- 2009: start of EuroMatrixPlus, "bringing MT to the user"
- 2010: start of many additional MT-related EU projects (Let's MT, ACCURAT, …)