Search-Aware Tuning for Machine Translation
Lemao Liu, Liang Huang (City University of New York)
EMNLP 2014. Presented by Taro Watanabe.

Parameter Tuning for MT
[Figure: tuning loop with weights w, input x, decoder (bins 0 to 4), output y, and an eval & update step]
• most tuning methods view the MT decoder as a black box
• “search-agnostic” tuning (MERT, MIRA, PRO, ...)
• but search error is actually a major cause of poor translation quality
• potentially good sub-translations are pruned early in search
• the final k-best list also lacks diversity (cf.: Y-chromosomal Adam, Mitochondrial Eve)

Search Error in MT

Parameter Tuning for MT
[Figure: tuning loop with weights w, input x, decoder, output y, and an eval & update step]
• most tuning methods view the MT decoder as a black box
• “search-agnostic” tuning (MERT, MIRA, PRO, ...)
• but search error is actually a major cause of poor translation quality
• potentially good sub-translations are pruned early in search
• Q: how to promote these promising sub-derivations?
• A: tune the ranking of the non-final bins as well as the final bin
• “search-aware tuning” (SA-MERT, SA-MIRA, SA-PRO, ...)
• Q: how to evaluate the “potential” of a sub-derivation?
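The contrast between search-agnostic and search-aware tuning can be sketched as a choice of which bins supply hypotheses to the tuner. This is a minimal illustration only: the `tuning_lists` helper and the toy `bins` below are hypothetical, and a real decoder would supply scored derivations rather than plain strings.

```python
def tuning_lists(bins, search_aware):
    """Return the hypothesis lists whose rankings the tuner would see.

    bins: one list of hypotheses per decoder bin; the last bin holds
          complete translations (hypothetical data layout).
    """
    if search_aware:
        # Search-aware: every non-empty bin contributes a ranking problem.
        return [hyps for hyps in bins if hyps]
    # Search-agnostic: only the final bin (the k-best list) is used.
    return [bins[-1]]


# Toy bins, loosely following the deck's running example.
bins = [
    ["I"],
    ["I from", "I fly"],
    ["I from Shanghai", "I fly from"],
    ["I from Shanghai fly to Beijing", "I fly from Shanghai to Beijing"],
]

agnostic = tuning_lists(bins, search_aware=False)
aware = tuning_lists(bins, search_aware=True)
```

Under these assumptions, `agnostic` holds one list (the final bin) while `aware` holds one list per bin, so a pruned-but-promising hypothesis such as "I fly" can influence the weights.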

Outline
• Motivations
• Evaluating Partial Derivations
  • challenges
  • method 1: naive partial BLEU
  • method 2: novel potential BLEU
• Search-Aware MERT, MIRA, and PRO
• Experiments
  • consistent +1 BLEU improvement with dense features

Challenges in Partial Evaluation
[Figure: bins 0 1 2 3 4]
• challenge 1: there are no “partial” references
• challenge 2: in phrase-based MT, partial translations in the same bin may cover different source words
source: 我从上海飞到北京
gloss: I from Shanghai fly to Beijing
reference: I flew from Shanghai to Beijing
partial 1: I from
partial 2: I fly

Method 1: Naive Partial BLEU
• naive solution: just evaluate against the full reference
• but using a prorated reference length
• proportional to the number of source words translated so far
• inspired by oracle extraction (Li & Khudanpur 10; Chiang 12)
• problem: favors hypotheses that translate “easier” words first
source: 我从上海飞到北京
gloss: I from Shanghai fly to Beijing
reference: I flew from Shanghai to Beijing
partial 1: I from (unigram=2)
partial 2: I fly (unigram=1) ✔
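The unigram counts in this example can be reproduced with a short sketch. It assumes whitespace tokenization of the English side and a six-token source segmentation matching the gloss; the function names are illustrative and not taken from the paper, and only the unigram match count and the prorated length are computed, not a full BLEU score.

```python
from collections import Counter


def clipped_unigram_matches(hypothesis, reference):
    """Count hypothesis words that also occur in the reference, clipped
    by how often each word occurs in the reference."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    return sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())


def prorated_reference_length(reference, covered_source_words, source_length):
    """Scale the reference length by the fraction of source words covered."""
    return len(reference) * covered_source_words / source_length


reference = "I flew from Shanghai to Beijing".split()
SOURCE_LENGTH = 6  # assumed segmentation: one source token per gloss word

partial_1 = "I from".split()  # covers 2 source words
partial_2 = "I fly".split()   # covers 2 source words

matches_1 = clipped_unigram_matches(partial_1, reference)
matches_2 = clipped_unigram_matches(partial_2, reference)
prorated = prorated_reference_length(reference, 2, SOURCE_LENGTH)
```

Both partials share the same prorated reference length, yet "I from" collects two unigram matches and "I fly" only one, which is the bias toward “easier” words noted above.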

Evaluating the “Potential”
• better to evaluate not the partial translation as is, but its potential
• do we want the oracle (best) or the average potential?
• the oracle is too hard to compute, and maybe not that useful
• want the “most likely” potential given the current model
[Figure: path from the start state to the current state, with possible futures labeled oracle, “most likely” potential, and worst]

Method 2: Potential BLEU
• the “most likely potential” BLEU of a derivation
• extend the partial derivation to cover uncovered words
• using the best monotonic translation for uncovered portions
• inspired by “future cost” in phrase-based decoding
• (inadmissible) A* heuristic computed by DP (Koehn, 2004)
$\bar{e}_x(d) = e(d) \circ \mathit{future}(d, x)$, where $e(d)$ is the partial translation (with reordering) and $\mathit{future}(d, x)$ is the monotonic translation of the uncovered source words
source: x = 我从上海飞到北京
gloss: I from Shanghai fly to Beijing
reference: I flew from Shanghai to Beijing
partial 1: I from + Shanghai fly to Beijing
partial 2: I fly + from Shanghai to Beijing
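A small sketch of how the completed translations compare against the reference. The future strings are copied from the example above rather than produced by a future-cost DP, and clipped n-gram match counts stand in for a full BLEU computation; all function names are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_matches(hypothesis, reference, n):
    """Clipped n-gram matches between a hypothesis and one reference."""
    ref_counts = Counter(ngrams(reference, n))
    hyp_counts = Counter(ngrams(hypothesis, n))
    return sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())


def potential_translation(partial, future):
    """Concatenate the partial translation e(d) with future(d, x)."""
    return partial + future


reference = "I flew from Shanghai to Beijing".split()

# Futures taken from the slide example (assumed output of monotonic completion).
potential_1 = potential_translation("I from".split(), "Shanghai fly to Beijing".split())
potential_2 = potential_translation("I fly".split(), "from Shanghai to Beijing".split())

unigrams_1 = clipped_matches(potential_1, reference, 1)
unigrams_2 = clipped_matches(potential_2, reference, 1)
bigrams_1 = clipped_matches(potential_1, reference, 2)
bigrams_2 = clipped_matches(potential_2, reference, 2)
```

After completion the two hypotheses tie on unigrams, but partial 2 matches more bigrams, so evaluating the potential ranks "I fly" above "I from", reversing the ordering that naive partial BLEU produced.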
