
Statistical Machine Translation, Lecture 2: Theory and Praxis of Decoding



  1. Slide 1: Statistical Machine Translation, Lecture 2: Theory and Praxis of Decoding
     Philipp Koehn, pkoehn@inf.ed.ac.uk
     School of Informatics, University of Edinburgh

     Slide 2: Statistical Machine Translation
     • Components: Translation model, language model, decoder
     [Diagram: foreign/English parallel text -> statistical analysis -> Translation Model; English text -> statistical analysis -> Language Model; both models feed the Decoding Algorithm]

     Slide 3: Phrase-Based Systems
     • A number of research groups developed phrase-based systems (RWTH Aachen, Univ. of Southern California/ISI, CMU, IBM, Johns Hopkins Univ., Cambridge Univ., Univ. of Catalunya, ITC-irst, Univ. Edinburgh, Univ. of Maryland...)
     • Systems differ in
       – training methods
       – model for phrase translation table
       – reordering models
       – additional feature functions
     • Currently best method for SMT (MT?)
       – top systems in DARPA/NIST evaluation are phrase-based
       – best commercial system for Arabic-English is phrase-based

     Slide 4: Phrase-Based Translation
     Morgen fliege ich nach Kanada zur Konferenz
     Tomorrow I will fly to the conference in Canada
     • Foreign input is segmented in phrases
       – any sequence of words, not necessarily linguistically motivated
     • Each phrase is translated into English
     • Phrases are reordered

     Slide 5: Phrase Translation Table
     • Phrase Translations for "den Vorschlag":

       English            φ(e|f)
       the proposal       0.6227
       's proposal        0.1068
       a proposal         0.0341
       the idea           0.0250
       this proposal      0.0227
       proposal           0.0205
       of the proposal    0.0159
       the proposals      0.0159
       the suggestions    0.0114
       the proposed       0.0114
       the motion         0.0091
       the idea of        0.0091
       the proposal ,     0.0068
       its proposal       0.0068
       it                 0.0068
       ...                ...

     Slide 6: Decoding Process
     Maria no dio una bofetada a la bruja verde
     • Build translation left to right
       – select foreign words to be translated
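The phrase translation table on slide 5 can be read as a lookup from a foreign phrase to scored English candidates. The following is a minimal sketch under that reading: the scores are copied from the slide (first eight rows only), while the dictionary layout and the helper name `best_translations` are assumptions made for illustration, not part of the lecture.

```python
# Scores copied from the "den Vorschlag" slide (first eight rows).
# The nested-dict layout is an assumption made for this sketch.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
        "of the proposal": 0.0159,
        "the proposals": 0.0159,
    }
}


def best_translations(foreign_phrase, table, k=3):
    """Return the k highest-scoring (english, score) pairs for a foreign phrase.

    Hypothetical helper: unknown phrases yield an empty list.
    """
    options = table.get(foreign_phrase, {})
    ranked = sorted(options.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


top3 = best_translations("den Vorschlag", PHRASE_TABLE)
for english, score in top3:
    print(f"{english}\t{score}")
```

A real decoder would not simply take the top entry: as the later slides show, each candidate is one translation option that competes with others during search.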

  2. Slide 7: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary
     • Build translation left to right
       – select foreign words to be translated
       – find English phrase translation
       – add English phrase to end of partial translation

     Slide 8: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary
     • Build translation left to right
       – select foreign words to be translated
       – find English phrase translation
       – add English phrase to end of partial translation
       – mark foreign words as translated

     Slide 9: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary did not
     • One to many translation

     Slide 10: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary did not slap
     • Many to one translation

     Slide 11: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary did not slap the
     • Many to one translation

     Slide 12: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary did not slap the green
     • Reordering
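The walkthrough on these slides (select foreign words, find an English phrase, append it, mark the words as translated) can be sketched as a loop over a fixed sequence of steps. The word spans below are an assumption inferred from the slide labels (one to many, many to one, reordering); the slides as extracted do not state the alignment explicitly, and the function name `decode` is hypothetical.

```python
SOURCE = "Maria no dio una bofetada a la bruja verde".split()

# (start, end, english) with end exclusive. The spans are an assumed
# alignment inferred from the slide labels, not given in the source.
STEPS = [
    (0, 1, "Mary"),      # Maria
    (1, 2, "did not"),   # no: one to many
    (2, 5, "slap"),      # dio una bofetada: many to one
    (5, 7, "the"),       # a la: many to one
    (8, 9, "green"),     # verde: reordering, translated before bruja
    (7, 8, "witch"),     # bruja
]


def decode(source, steps):
    """Replay the steps, returning (partial translation, coverage) after each."""
    covered = [False] * len(source)
    output = []
    trace = []
    for start, end, english in steps:
        # Each foreign word may be translated only once.
        if any(covered[start:end]):
            raise ValueError("foreign words translated twice")
        output.append(english)            # add English phrase to the end
        for i in range(start, end):       # mark foreign words as translated
            covered[i] = True
        coverage = "".join("*" if c else "-" for c in covered)
        trace.append((" ".join(output), coverage))
    return trace


trace = decode(SOURCE, STEPS)
for partial, coverage in trace:
    print(f"{coverage}  {partial}")
```

The English side only ever grows at the end, while the coverage string fills in out of order at the reordering step; that asymmetry is what "build translation left to right" refers to.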

  3. Slide 13: Decoding Process
     Maria no dio una bofetada a la bruja verde
     Mary did not slap the green witch
     • Translation finished

     Slide 14: Translation Options
     Maria no dio una bofetada a la bruja verde
     [Figure: translation options listed under the source sentence, row by row: Mary, not, give, a, slap, to, the, witch, green / did not, a slap, by, green witch / no, slap, to the / did not give, to / the / slap, the witch]
     • Look up possible phrase translations
       – many different ways to segment words into phrases
       – many different ways to translate each phrase

     Slide 15: Hypothesis Expansion
     Maria no dio una bofetada a la bruja verde
     [Figure: translation options as on slide 14]
     Hypothesis: e: (empty)  f: ---------  p: 1
     • Start with empty hypothesis
       – e: no English words
       – f: no foreign words covered
       – p: probability 1

     Slide 16: Hypothesis Expansion
     Maria no dio una bofetada a la bruja verde
     [Figure: translation options as on slide 14]
     Hypotheses: e: (empty)  f: ---------  p: 1  ->  e: Mary  f: *--------  p: .534
     • Pick translation option
     • Create hypothesis
       – e: add English phrase Mary
       – f: first foreign word covered
       – p: probability 0.534

     Slide 17: A Quick Word on Probabilities
     • Not going into detail here, but...
     • Translation Model
       – phrase translation probability p(Mary|Maria)
       – reordering costs
       – phrase/word count costs
       – ...
     • Language Model
       – uses trigrams:
       – p(Mary did not) = p(Mary|<s>) * p(did|Mary,<s>) * p(not|Mary did)

     Slide 18: Hypothesis Expansion
     Maria no dio una bofetada a la bruja verde
     [Figure: translation options as on slide 14]
     Hypotheses: e: (empty)  f: ---------  p: 1  ->  e: Mary  f: *--------  p: .534
                 e: (empty)  f: ---------  p: 1  ->  e: witch  f: -------*-  p: .182
     • Add another hypothesis
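The hypothesis records on the expansion slides (e, f, p) map naturally onto a small immutable data structure. The sketch below is one possible reading, not the lecture's implementation: the class `Hypothesis` and the functions `empty_hypothesis` and `expand` are hypothetical names, and the values 0.534 and 0.182 are taken from the slides and treated here as a single multiplicative score per option, although slide 17 indicates the real score combines translation model, reordering, and language model terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    english: tuple   # e: English words produced so far
    coverage: str    # f: '*' for covered foreign words, '-' otherwise
    prob: float      # p: probability of the partial translation


def empty_hypothesis(n_foreign):
    """Start state: no English words, nothing covered, probability 1."""
    return Hypothesis(english=(), coverage="-" * n_foreign, prob=1.0)


def expand(hyp, start, end, english, option_prob):
    """Apply one translation option covering foreign positions [start, end).

    option_prob is an assumed single score for the option; a full system
    would combine several model scores here.
    """
    if "*" in hyp.coverage[start:end]:
        raise ValueError("option overlaps already covered words")
    coverage = hyp.coverage[:start] + "*" * (end - start) + hyp.coverage[end:]
    return Hypothesis(
        english=hyp.english + tuple(english.split()),
        coverage=coverage,
        prob=hyp.prob * option_prob,
    )


# The source sentence has nine words.
root = empty_hypothesis(9)
mary = expand(root, 0, 1, "Mary", 0.534)     # first foreign word covered
witch = expand(root, 7, 8, "witch", 0.182)   # another hypothesis from the same root

for hyp in (root, mary, witch):
    print(f"e: {' '.join(hyp.english):6} f: {hyp.coverage} p: {hyp.prob}")
```

Keeping hypotheses immutable means the empty hypothesis can be expanded several times, which is how both the Mary and the witch hypotheses on slide 18 branch from the same starting point.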
