

1. Extended Translation Models in Phrase-based Decoding
   Andreas Guta, Joern Wuebker, Miguel Graça, Yunsu Kim and Hermann Ney
   surname@cs.rwth-aachen.de
   Tenth Workshop on Statistical Machine Translation (WMT), Lisbon, Portugal, 18.09.2015
   Human Language Technology and Pattern Recognition, Chair of Computer Science 6,
   Computer Science Department, RWTH Aachen University, Germany

2. Introduction
   Phrase-based translation models [Och & Tillmann+ 99, Zens & Och+ 02, Koehn & Och+ 03]
   ◮ phrases extracted from alignments obtained using GIZA++ [Och & Ney 03]
   ◮ estimation as relative frequencies of phrase pairs
   ◮ drawbacks:
     ⊲ single-word phrases translated without any context
     ⊲ uncaptured dependencies beyond phrase boundaries
     ⊲ difficulties with long-range reorderings
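The relative-frequency estimation mentioned above can be stated very compactly. Below is a minimal Python sketch (not from the talk; the function name and the toy data are illustrative) of how phrase translation probabilities are estimated as relative frequencies over extracted phrase pairs.

```python
from collections import Counter, defaultdict

def relative_frequencies(phrase_pairs):
    """Estimate p(source phrase | target phrase) as relative frequencies.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples,
    e.g. collected from word-aligned training data.
    """
    pair_counts = Counter(phrase_pairs)
    target_counts = Counter(tgt for _, tgt in phrase_pairs)

    probs = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        probs[tgt][src] = count / target_counts[tgt]
    return probs

# Illustrative usage with toy phrase pairs:
pairs = [("das haus", "the house"), ("das haus", "the house"), ("das", "the")]
print(relative_frequencies(pairs)["the house"])  # {'das haus': 1.0}
```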

3. Related Work
   ◮ bilingual language models [Niehues & Herrmann+ 11]
     ⊲ atomic source phrases, no reordering context
   ◮ reordering model based on sequence labeling [Feng & Peter+ 13]
     ⊲ modeling only reorderings
   ◮ operation sequence model (OSM) [Durrani & Fraser+ 13]
     ⊲ n-gram model based on minimal translation units
   ◮ neural network models for extended translation context
     ⊲ rescoring [Le & Allauzen+ 12, Sundermeyer & Alkhouli+ 14]
     ⊲ decoding [Devlin & Zbib+ 14, Auli & Gao 14, Alkhouli & Rietig+ 15]
     ⊲ stand-alone models [Sutskever & Vinyals+ 14, Bahdanau & Cho+ 15]
   ◮ joint translation and reordering models [Guta & Alkhouli+ 15]
     ⊲ word-based and simpler reordering approach than OSM
     ⊲ count models and neural networks (NNs)

4. This Work
   ◮ develop two variants of extended translation models (ETM)
     ⊲ extend IBM models by a bilingual word pair and a reordering operation
     ⊲ integrated into the log-linear framework of phrase-based decoding
     ⊲ explicit treatment of multiple alignments and unaligned words
   ◮ benefits:
     ⊲ lexical and reordering context for single-word phrases
     ⊲ dependencies across phrase boundaries
     ⊲ long-range source dependencies
   ◮ first step: implementation as smoothed count models
   ◮ long-term goal:
     ⊲ application as stand-alone models in decoding
     ⊲ retraining the word alignments

5. Extended Translation Models
   ◮ source sentence $f_1^J = f_1 \ldots f_j \ldots f_J$
   ◮ target sentence $e_1^I = e_1 \ldots e_i \ldots e_I$
   ◮ inverted alignment $b_1^I$ with $b_i \subseteq \{1, \ldots, J\}$
     ⊲ unaligned source positions $b_0$
   ◮ empty words $f_0$, $e_0$
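To make the notation concrete, here is a small illustrative example (toy sentence pair and assumed alignment, not from the talk): $b_i$ is the set of source positions aligned to target position $i$, $b_0$ collects the unaligned source positions, and index 0 stands for the empty words $f_0$, $e_0$.

```python
# Toy sentence pair; index 0 is reserved for the empty words f_0 / e_0.
src = [None, "er", "hat", "das", "buch", "ja", "gelesen"]   # f_1 .. f_6
tgt = [None, "he", "read", "the", "book"]                   # e_1 .. e_4

# Inverted alignment b_i: source positions aligned to target position i.
b = {
    1: {1},      # he   <- er
    2: {2, 6},   # read <- hat ... gelesen  (multiple alignment)
    3: {3},      # the  <- das
    4: {4},      # book <- buch
}
# b_0: unaligned source positions, generated from the empty target word e_0.
b[0] = set(range(1, len(src))) - set().union(*b.values())
print(b[0])   # {5} -> "ja" stays unaligned
```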

6. Jump Classes
   ◮ generalizing alignments to
     ⊲ jump classes for source positions aligned to subsequent target positions:
       insert (↓), stay (•), forward (→), jump forward, backward (←), jump backward
     ⊲ jump classes for source positions aligned to the same target position:
       forward (→), jump forward
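A minimal sketch of how such jump classes could be computed from the previous and the current aligned source positions. The slide only names the classes; the exact distance thresholds below are an assumption for illustration.

```python
def jump_class(j_prev, j_curr, target_unaligned=False):
    """Map an alignment jump to one of the coarse jump classes named on the slide.

    j_prev: previously aligned source position, j_curr: current aligned source
    position. The thresholds are one plausible choice (assumption).
    """
    if target_unaligned:
        return "insert"          # current target word has no aligned source word
    delta = j_curr - j_prev
    if delta == 0:
        return "stay"            # same source position aligned again
    if delta == 1:
        return "forward"         # monotone step
    if delta > 1:
        return "jump_forward"    # skipping source words
    if delta == -1:
        return "backward"
    return "jump_backward"

print(jump_class(3, 4))   # forward
print(jump_class(4, 2))   # jump_backward
```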

7. Extended Inverse Translation Model (EiTM)
   ◮ EiTM models the inverse probability $p(f_1^J \mid e_1^I)$

   $$p(f_1^J \mid e_1^I) = \max_{b_1^I} \left\{ \left[ \prod_{i=1}^{I} \underbrace{p(f_{b_i} \mid e_{i'}, e_i, f_{b_{i'}}, b_{i'}, b_i)}_{\text{lexicon model}} \cdot \underbrace{p(b_i \mid e_{i'}, e_i, f_{b_{i'}}, b_{i'})}_{\text{alignment model}} \right] \cdot \underbrace{p(f_{b_0} \mid e_0)}_{\text{deletion model}} \right\}$$

   ◮ current source words $f_{b_i}$ and target word $e_i$
   ◮ previous source words $f_{b_{i'}}$ and target word $e_{i'}$
   ◮ generalize alignments $b_{i'}$, $b_i$ to jump classes
   ◮ multiple source predecessors $j'$ in $b_{i'}$ or $b_i$
     ⊲ average probabilities over all $j'$
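The factorization above can be turned into a per-sentence score once an alignment is fixed. The following is a deliberately simplified sketch (assumptions: $i'$ is taken to be simply the preceding target position, and the lexicon, alignment and deletion models are passed in as placeholder callables standing for the smoothed count models), showing how probabilities are averaged over multiple source predecessors $j'$ and how unaligned source words are scored by the deletion model.

```python
import math

def eitm_log_score(b, I, lexicon_prob, alignment_prob, deletion_prob):
    """Schematic EiTM score for one sentence pair under a fixed inverted alignment b.

    b[i]: set of source positions aligned to target position i (b[0]: unaligned
    source positions). The *_prob arguments are placeholder callables for the
    lexicon, alignment and deletion models of the factorization above.
    """
    log_p = 0.0
    for i in range(1, I + 1):
        i_prev = i - 1                                  # simplification: i' = i - 1
        predecessors = b.get(i_prev, set()) or {0}      # j' in b_{i'}; 0 = sentence start
        # average the model probabilities over all source predecessors j'
        lex = sum(lexicon_prob(i, i_prev, j) for j in predecessors) / len(predecessors)
        ali = sum(alignment_prob(i, i_prev, j) for j in predecessors) / len(predecessors)
        log_p += math.log(lex) + math.log(ali)
    for j in b.get(0, set()):                           # unaligned source words from e_0
        log_p += math.log(deletion_prob(j))
    return log_p

# Illustrative call with dummy uniform models:
# eitm_log_score(b, I=4, lexicon_prob=lambda *a: 0.1,
#                alignment_prob=lambda *a: 0.2, deletion_prob=lambda j: 0.05)
```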

8. EiTM Example
   [figure omitted: example of an EiTM derivation]

9. Extended Direct Translation Model (EdTM)
   ◮ further aim: model $p(e_1^I \mid f_1^J)$ as well
   ◮ first approach by using the EiTM:
     ⊲ swap source and target corpora
     ⊲ invert also the alignment
   ◮ drawback:
     ⊲ source words not translated in monotone order
     ⊲ a source word preceding a phrase might not have been translated yet
     ⊲ its last aligned predecessor and the corresponding aligned target words are generally unknown
   ◮ dependencies beyond phrase boundaries cannot be captured
   ◮ develop the EdTM
     ⊲ swap source and target corpora, but keep $b_1^I$
     ⊲ incorporate dependencies beyond phrase boundaries

10. Extended Direct Translation Model (EdTM)
    ◮ EdTM models the direct probability $p(e_1^I \mid f_1^J)$

    $$p(e_1^I \mid f_1^J) = \max_{b_1^I} \left\{ \left[ \prod_{i=1}^{I} \underbrace{p(e_i \mid f_{b_{i'}}, f_{b_i}, e_{i'}, b_{i'}, b_i)}_{\text{lexicon model}} \cdot \underbrace{p(b_i \mid f_{b_{i'}}, f_{b_i}, e_{i'}, b_{i'})}_{\text{alignment model}} \right] \cdot \underbrace{p(e_0 \mid f_{b_0})}_{\text{deletion model}} \right\}$$

    ◮ differences to EiTM
      ⊲ lexicon model: swapped $e_i$ and $f_{b_i}$
      ⊲ alignment model: dependence on $f_{b_i}$ (instead of $e_i$)
      ⊲ deletion model: swapped $e_0$ and $f_{b_0}$

11. Count Models and Smoothing
    How to train the derived EdTM and EiTM models?
    ◮ estimate the Viterbi alignment using GIZA++ [Och & Ney 03]
    ◮ compute relative frequencies
    ◮ apply interpolated Kneser-Ney smoothing [Chen & Goodman 98]
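For concreteness, here is a sketch of one level of interpolated smoothing with absolute discounting, in the spirit of interpolated Kneser-Ney. The discount value, the data layout, and the fact that the Kneser-Ney continuation statistics of the lower-order distribution are hidden behind `lower_order_prob` are illustrative choices, not details from the talk.

```python
from collections import Counter

def interpolated_kneser_ney(counts, lower_order_prob, discount=0.75):
    """One interpolation level with absolute discounting (illustrative sketch).

    counts: Counter over (history, event) pairs, e.g. ((e_prev, e, jump_class), f).
    lower_order_prob: callable event -> probability from the lower-order model
    (for proper Kneser-Ney it would be built from continuation counts).
    Returns a callable prob(event, history).
    """
    history_totals = Counter()
    history_types = Counter()      # number of distinct events seen after each history
    for (history, event), c in counts.items():
        history_totals[history] += c
        history_types[history] += 1

    def prob(event, history):
        c_hw = counts.get((history, event), 0)
        c_h = history_totals.get(history, 0)
        if c_h == 0:
            return lower_order_prob(event)       # unseen history: back off completely
        lam = discount * history_types[history] / c_h
        return max(c_hw - discount, 0.0) / c_h + lam * lower_order_prob(event)

    return prob
```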

12. Integration into Phrase-based Decoding
    ◮ phrase-based decoder Jane 2 [Wuebker & Huck+ 12]
    ◮ log-linear model combination [Och & Ney 04]
      ⊲ tuning with minimum error rate training (MERT) [Och 03]
    ◮ annotation of phrase-table entries with word alignments
    ◮ extended translation models integrated as up to 4 additional features:
      ⊲ EdTM and EiTM
      ⊲ Source → Target and Target → Source
    ◮ search state extension:
      ⊲ store the source position aligned to the last translated target word
    ◮ context beyond phrase boundaries only in Source → Target direction
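A schematic sketch of the search-state extension described above (the class and field names are assumptions for illustration, not Jane 2's actual data structures): the hypothesis additionally carries the source position aligned to the last translated target word, which the ETM features can then condition on across phrase boundaries.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class HypothesisState:
    """Illustrative decoder search state, extended for the ETM features."""
    coverage: frozenset          # covered source positions
    lm_history: tuple            # usual LM state (last n-1 target words)
    last_aligned_src_pos: int    # extension: source position aligned to the last
                                 # translated target word (0 = sentence start)

def extend_state(state, phrase_src_positions, phrase_tgt_words, phrase_alignment):
    """Apply one phrase and update the extended state.

    phrase_alignment: within-phrase word alignment from the annotated phrase
    table, given as (src_offset, tgt_offset) pairs (illustrative format).
    """
    if phrase_alignment:
        # source offset aligned to the last aligned target word of the phrase
        s_off, _ = max(phrase_alignment, key=lambda st: st[1])
        new_last = phrase_src_positions[s_off]
    else:
        new_last = state.last_aligned_src_pos
    return replace(
        state,
        coverage=state.coverage | frozenset(phrase_src_positions),
        lm_history=(state.lm_history + tuple(phrase_tgt_words))[-3:],  # e.g. 4-gram LM
        last_aligned_src_pos=new_last,
    )
```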

13. Experimental Setups

                               IWSLT              IWSLT              BOLT               BOLT
                               German   English   English  French    Chinese  English   Arabic  English
    Sentences   full data          4.32M              26.05M              4.08M              0.92M
                in-domain          138K               185K                67.8K              0.92M
    Run. Words                 108M     109M      698M     810M      78M      86M       14M     16M
    Vocabulary                 836K     792K      2119K    2139K     384K     817K      285K    203K

    ◮ phrase-based systems
      ⊲ phrasal and lexical models (both directions)
      ⊲ word and phrase penalties
      ⊲ distortion model
      ⊲ 4-/5-gram language model (LM)
      ⊲ 7-gram word class LM [Wuebker & Peitz+ 13]
      ⊲ hierarchical reordering model (HRM) [Galley & Manning 08]

14. Results: IWSLT 2014 German → English

    test2010                                             BLEU [%]   TER [%]
    phrase-based system + HRM                              30.7       49.3
    + EiTM (Source ↔ Target)                               31.4       48.3
    + EdTM (Source ↔ Target)                               31.6       48.1
    + EiTM (Source → Target) + EdTM (Source → Target)      31.6       48.2
    + EiTM (Source ↔ Target) + EdTM (Source ↔ Target)      31.8       48.2

15. Results: Comparison to OSM
    ◮ all results measured in BLEU [%]

                                    IWSLT              BOLT
                                 De → En  En → Fr   Zh → En  Ar → En
    phrase-based system + HRM      30.7     33.1      17.0     24.0
    + ETM                          31.8     33.9      17.5     24.4
    + 7-gram OSM                   31.8     34.5      17.6     24.1

16. Conclusion
    ◮ integration of extended translation models into phrase-based decoding
      ⊲ lexical and reordering context beyond phrase boundaries
      ⊲ multiple and empty alignments
      ⊲ relative frequencies with interpolated Kneser-Ney smoothing
    ◮ improving phrase-based systems including HRM
      ⊲ by up to 1.1% BLEU and TER
      ⊲ by 0.7% BLEU on average for four large-scale tasks
    ◮ competitive to a 7-gram OSM
      ⊲ 0.1% BLEU less improvement on average on top of phrase-based systems including the HRM
    ◮ long-term goals:
      ⊲ retraining the alignments: joint optimization
      ⊲ stand-alone decoding without phrases

17. Thank you for your attention
    Andreas Guta
    surname@cs.rwth-aachen.de
    http://www-i6.informatik.rwth-aachen.de/
