Statistical Machine Translation

Statistical Machine Translation
Lecture 2: Theory and Praxis of Decoding

Philipp Koehn
pkoehn@inf.ed.ac.uk
School of Informatics, University of Edinburgh

Statistical Machine Translation

• Components: Translation model, language model, decoder

[Figure: foreign/English parallel text → statistical analysis → Translation Model; English text → statistical analysis → Language Model; both models feed the Decoding Algorithm]

Phrase-Based Systems

• A number of research groups developed phrase-based systems (RWTH Aachen, Univ. of Southern California/ISI, CMU, IBM, Johns Hopkins Univ., Cambridge Univ., Univ. of Catalunya, ITC-irst, Univ. Edinburgh, Univ. of Maryland...)
• Systems differ in
  – training methods
  – model for phrase translation table
  – reordering models
  – additional feature functions
• Currently best method for SMT (MT?)
  – top systems in DARPA/NIST evaluation are phrase-based
  – best commercial system for Arabic-English is phrase-based

Phrase-Based Translation

Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada

• Foreign input is segmented in phrases
  – any sequence of words, not necessarily linguistically motivated
• Each phrase is translated into English
• Phrases are reordered

Phrase Translation Table

• Phrase Translations for “den Vorschlag”:

English            φ(e|f)     English            φ(e|f)
the proposal       0.6227     the suggestions    0.0114
’s proposal        0.1068     the proposed       0.0114
a proposal         0.0341     the motion         0.0091
the idea           0.0250     the idea of        0.0091
this proposal      0.0227     the proposal ,     0.0068
proposal           0.0205     its proposal       0.0068
of the proposal    0.0159     it                 0.0068
the proposals      0.0159     ...                ...

Decoding Process

Maria no dio una bofetada a la bruja verde

• Build translation left to right
  – select foreign words to be translated
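The phrase translation table for “den Vorschlag” can be pictured as a simple lookup structure. The following is a minimal sketch, not the data structure of any particular decoder: the names `PHRASE_TABLE` and `translation_options` are hypothetical, the apostrophe in "'s proposal" is written in ASCII, and only the entries visible on the slide are included (the slide's table is truncated with "...").

```python
# phi(e|f) values for the German phrase "den Vorschlag", copied from the slide.
# The slide's table ends in "...", so this is only the visible part.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
        "of the proposal": 0.0159,
        "the proposals": 0.0159,
        "the suggestions": 0.0114,
        "the proposed": 0.0114,
        "the motion": 0.0091,
        "the idea of": 0.0091,
        "the proposal ,": 0.0068,
        "its proposal": 0.0068,
        "it": 0.0068,
    }
}


def translation_options(foreign_phrase, top_k=3):
    """Return up to top_k (english, phi) pairs for a foreign phrase, best first.

    Hypothetical helper: an unknown phrase yields an empty list.
    """
    options = PHRASE_TABLE.get(foreign_phrase, {})
    ranked = sorted(options.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


best = translation_options("den Vorschlag")
for english, phi in best:
    print(f"{english}\t{phi:.4f}")
```

Sorting by φ(e|f) alone is only for illustration; as the later slides note, a decoder also weighs reordering and language model scores.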

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary

• Build translation left to right
  – select foreign words to be translated
  – find English phrase translation
  – add English phrase to end of partial translation
  – mark foreign words as translated

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary did not

• One to many translation

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary did not slap

• Many to one translation

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary did not slap the

• Many to one translation

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary did not slap the green

• Reordering
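The left-to-right build-up on these slides can be traced in a few lines of code. This is a minimal sketch under stated assumptions: the foreign spans in `STEPS` are my reading of which Spanish words each English phrase covers (the extracted slides only show the growing English output), and `decode`, `SOURCE`, and `STEPS` are hypothetical names. The last step, "witch", is the final translation step the deck reaches on its "Translation finished" slide.

```python
SOURCE = "Maria no dio una bofetada a la bruja verde".split()

# (first foreign index, last foreign index inclusive, English phrase),
# in the order the English output grows. Spans are assumed, not from the slides.
STEPS = [
    (0, 0, "Mary"),     # Maria
    (1, 1, "did not"),  # no: one-to-many translation
    (2, 4, "slap"),     # dio una bofetada: many-to-one translation
    (5, 6, "the"),      # a la: many-to-one translation
    (8, 8, "green"),    # verde: reordering, translated before bruja
    (7, 7, "witch"),    # bruja
]


def decode(source, steps):
    """Apply the steps in order and return (partial translation, coverage) pairs."""
    covered = [False] * len(source)
    english = []
    trace = []
    for start, end, phrase in steps:
        # Select foreign words to be translated; each may be used only once.
        if any(covered[start:end + 1]):
            raise ValueError("foreign words already translated")
        # Add the English phrase to the end of the partial translation.
        english.append(phrase)
        # Mark the foreign words as translated.
        for i in range(start, end + 1):
            covered[i] = True
        coverage = "".join("*" if c else "-" for c in covered)
        trace.append((" ".join(english), coverage))
    return trace


trace = decode(SOURCE, STEPS)
for partial, coverage in trace:
    print(coverage, partial)
```

The coverage string uses the same `*`/`-` notation as the hypothesis boxes later in the deck, one character per foreign word.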

Decoding Process

Maria no dio una bofetada a la bruja verde
Mary did not slap the green witch

• Translation finished

Translation Options

[Figure: translation options listed under the source words, row by row]
Maria no dio una bofetada a la bruja verde
Mary not give a slap to the witch green
did not a slap by green witch
no slap to the
did not give to
the
slap the witch

• Look up possible phrase translations
  – many different ways to segment words into phrases
  – many different ways to translate each phrase

Hypothesis Expansion

• Start with empty hypothesis
  – e: no English words
  – f: no foreign words covered
  – p: probability 1

e:
f: ---------
p: 1

Hypothesis Expansion

• Pick translation option
• Create hypothesis
  – e: add English phrase Mary
  – f: first foreign word covered
  – p: probability 0.534

e: Mary
f: *--------
p: .534

A Quick Word on Probabilities

• Not going into detail here, but...
• Translation Model
  – phrase translation probability p(Mary|Maria)
  – reordering costs
  – phrase/word count costs
  – ...
• Language Model
  – uses trigrams:
  – p(Mary did not) = p(Mary|<s>) * p(did|Mary,<s>) * p(not|Mary did)

Hypothesis Expansion

• Add another hypothesis

e:              e: Mary          e: witch
f: ---------    f: *--------     f: -------*-
p: 1            p: .534          p: .182
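The hypothesis boxes (e, f, p) and the trigram decomposition above can be sketched together as follows. This is an illustrative sketch, not the lecture's decoder: `Hypothesis`, `expand`, `coverage_string`, and `trigram_prob` are hypothetical names, the values 0.534 and 0.182 are taken from the slides and treated here as a single factor multiplied into the parent's probability, and the language model numbers in `TOY_LM` are invented purely so the chain rule can be run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    english: tuple   # e: English words produced so far
    coverage: tuple  # f: one flag per foreign word, True if translated
    prob: float      # p: probability of the partial translation


def empty_hypothesis(n_foreign):
    """No English words, no foreign words covered, probability 1."""
    return Hypothesis((), (False,) * n_foreign, 1.0)


def expand(hyp, start, end, phrase, option_prob):
    """Create a new hypothesis by translating foreign words start..end."""
    if any(hyp.coverage[start:end + 1]):
        raise ValueError("foreign words already covered")
    coverage = tuple(c or start <= i <= end for i, c in enumerate(hyp.coverage))
    return Hypothesis(hyp.english + tuple(phrase.split()),
                      coverage,
                      hyp.prob * option_prob)


def coverage_string(hyp):
    return "".join("*" if c else "-" for c in hyp.coverage)


def trigram_prob(words, lm):
    """Chain rule with at most two words of history, starting from <s>."""
    history = ("<s>",)
    prob = 1.0
    for word in words:
        prob *= lm[(history, word)]
        history = (history + (word,))[-2:]
    return prob


# Invented probabilities, for illustration only.
TOY_LM = {
    (("<s>",), "Mary"): 0.1,
    (("<s>", "Mary"), "did"): 0.2,
    (("Mary", "did"), "not"): 0.5,
}

root = empty_hypothesis(9)                 # 9 foreign words in the example
mary = expand(root, 0, 0, "Mary", 0.534)   # first foreign word covered
witch = expand(root, 7, 7, "witch", 0.182) # another hypothesis from the root
lm_score = trigram_prob(["Mary", "did", "not"], TOY_LM)

for hyp in (root, mary, witch):
    print(" ".join(hyp.english), coverage_string(hyp), hyp.prob)
```

Making hypotheses immutable means several expansions can share the same parent, which is what the "Add another hypothesis" slide depicts with two children of the empty hypothesis.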
