Confidence-based Rewriting of Machine Translation Output

Benjamin Marie (1, 2), Aurélien Max (1, 3)
(1) LIMSI-CNRS, (2) Lingua et Machina, (3) Université Paris-Sud

Outline: Introduction, Rewriter, Experiments, Analysis, Conclusion

Introduction
◮ Phrase-Based Statistical Machine Translation (PBSMT) systems use many features during decoding to assess the quality of translation hypotheses
◮ Other features are difficult to integrate into decoding, e.g.:
  ◮ they need a complete hypothesis, e.g. sentence-level syntactic features
  ◮ they are computationally costly, e.g. neural network language models
  ◮ they need a first decoding pass, e.g. a posteriori confidence models
◮ How can such features be used efficiently in PBSMT?

Benjamin MARIE (LIMSI-CNRS), Confidence-based Rewriting of MT output, 10/2014

Reranking of translation hypotheses
A solution
◮ rerank the n-best list of the decoder using new, complex features
◮ can achieve good performance with some features (Och et al., 2004; Carter and Monz, 2011; Le et al., 2012; Luong et al., 2014)
Two strong limitations
◮ lack of diversity in the n-best list (Gimpel et al., 2013)
◮ the reranker inherits the limited selection of hypotheses made by the decoder
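As a rough illustration of n-best reranking, the sketch below scores each hypothesis with a weighted sum of its features and sorts the list. The feature names ("decoder", "confidence"), their values and the weights are invented for the example; the slides do not specify them.

```python
from typing import Dict, List, Tuple

# A hypothesis is a translation plus its feature values (illustrative names).
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[str]:
    """Return the translations sorted by weighted feature sum, best first."""

    def score(features: Dict[str, float]) -> float:
        # Features without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    ranked = sorted(nbest, key=lambda hyp: score(hyp[1]), reverse=True)
    return [text for text, _ in ranked]


# Toy 2-best list: the decoder prefers the first hypothesis,
# but a (hypothetical) confidence feature prefers the second.
nbest = [
    ("he has refused a test now .", {"decoder": -4.0, "confidence": 0.2}),
    ("he refused a test now .", {"decoder": -4.5, "confidence": 0.9}),
]
reranked = rerank(nbest, {"decoder": 1.0, "confidence": 2.0})
```

Note that the reranker can only reorder what the decoder produced, which is the second limitation listed above.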

A rewriting system

A rewriter to extend the exploration
◮ idea: search for new promising hypotheses that are not in the n-best list
[Figure: the rewriting loop. The rewriting phrase table and the operations (replace, merge, split) are applied to the seed to generate a neighborhood; the neighborhood is ranked; if the 1-best is the seed, the 1-best is returned; otherwise the 1-best becomes the new seed.]

The seed: a hypothesis to rewrite
[Figure: the seed hypothesis]

A rewriting phrase table
[Figure: the seed and the rewriting phrase table]

A set of rewriting operations
[Figure: the seed, the rewriting phrase table, and the operations replace, merge and split]

Neighborhood generation
[Figure: the operations and the rewriting phrase table are applied to the seed to generate a neighborhood]

Neighborhood generation: replace
Source: il a refusé le test immédiatement .
Seed: he has refused a test now .
Hypotheses shown:
◮ he has refused a test now .
◮ he refused a test now .
◮ he had refused a test now .
◮ it has refused a test now .
◮ it refused a test now .
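One way to picture the replace operation is sketched below. It assumes a hypothesis is a list of (source phrase, target phrase) pairs and that the rewriting phrase table maps a source phrase to candidate target phrases; both representations, and the segmentation of the example sentence, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

BiPhrase = Tuple[str, str]  # (source phrase, target phrase)


def replace_neighbors(
    seed: List[BiPhrase], table: Dict[str, List[str]]
) -> List[List[BiPhrase]]:
    """Swap the target side of one bi-phrase at a time for an alternative."""
    neighbors = []
    for i, (src, tgt) in enumerate(seed):
        for alt in table.get(src, []):
            if alt != tgt:  # skip the translation the seed already uses
                neighbors.append(seed[:i] + [(src, alt)] + seed[i + 1:])
    return neighbors


def render(hyp: List[BiPhrase]) -> str:
    """Join the target phrases into a sentence."""
    return " ".join(tgt for _, tgt in hyp)


# Assumed segmentation of the slide example.
seed = [("il a", "he has"), ("refusé", "refused"),
        ("le test", "a test"), ("immédiatement .", "now .")]
# Toy rewriting table covering only the first source phrase.
table = {"il a": ["he has", "he", "he had", "it has", "it"]}
replaced = [render(h) for h in replace_neighbors(seed, table)]
```

With this toy table, the generated sentences are the four non-seed hypotheses listed on the slide.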

Neighborhood generation: split
Source: il a refusé le test immédiatement .
Seed: he has refused a test now .
Hypotheses shown:
◮ he has refused a test now .
◮ he is refused a test now .
◮ he had refused a test now .
◮ it has refused a test now .
◮ it have refused a test now .
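The split operation could be sketched as follows, under the same assumed representation (a hypothesis as a list of (source, target) bi-phrases, a table mapping source phrases to target phrases). Cutting the source phrase at every word boundary and combining all translations of both halves is an assumption; the slides do not say how split points or combinations are selected.

```python
from itertools import product
from typing import Dict, List, Tuple

BiPhrase = Tuple[str, str]  # (source phrase, target phrase)


def split_neighbors(
    seed: List[BiPhrase], table: Dict[str, List[str]]
) -> List[List[BiPhrase]]:
    """Split one bi-phrase into two and translate each half independently."""
    neighbors = []
    for i, (src, _) in enumerate(seed):
        words = src.split()
        for cut in range(1, len(words)):
            left, right = " ".join(words[:cut]), " ".join(words[cut:])
            # Both halves must be in the table, otherwise product() is empty.
            for l_tgt, r_tgt in product(table.get(left, []), table.get(right, [])):
                neighbors.append(
                    seed[:i] + [(left, l_tgt), (right, r_tgt)] + seed[i + 1:]
                )
    return neighbors


def render(hyp: List[BiPhrase]) -> str:
    """Join the target phrases into a sentence."""
    return " ".join(tgt for _, tgt in hyp)


seed = [("il a", "he has"), ("refusé", "refused"),
        ("le test", "a test"), ("immédiatement .", "now .")]
# Toy table: only "il" and "a" have entries, so only "il a" can be split.
table = {"il": ["he", "it"], "a": ["has", "have"]}
split_sentences = [render(h) for h in split_neighbors(seed, table)]
```

Because each half is translated on its own, split can produce ungrammatical combinations such as "it have", as on the slide; the ranking step is what filters them out.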

Neighborhood generation: merge
Source: il a refusé le test immédiatement .
Seed: he has refused a test now .
Hypotheses shown:
◮ he has refused a test now .
◮ he refused a test now .
◮ he rejected a test now .
◮ he has just refused a test now .
◮ he has a test now .
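A possible sketch of the merge operation, again assuming a hypothesis is a list of (source, target) bi-phrases. Merging only neighbouring bi-phrases in list order is a simplifying assumption (it ignores reordering), and the table entries are toy values.

```python
from typing import Dict, List, Tuple

BiPhrase = Tuple[str, str]  # (source phrase, target phrase)


def merge_neighbors(
    seed: List[BiPhrase], table: Dict[str, List[str]]
) -> List[List[BiPhrase]]:
    """Merge two adjacent bi-phrases and translate the joined source phrase."""
    neighbors = []
    for i in range(len(seed) - 1):
        merged_src = seed[i][0] + " " + seed[i + 1][0]
        for tgt in table.get(merged_src, []):
            neighbors.append(seed[:i] + [(merged_src, tgt)] + seed[i + 2:])
    return neighbors


def render(hyp: List[BiPhrase]) -> str:
    """Join the target phrases into a sentence."""
    return " ".join(tgt for _, tgt in hyp)


seed = [("il a", "he has"), ("refusé", "refused"),
        ("le test", "a test"), ("immédiatement .", "now .")]
# Toy table with translations for the merged source phrase only.
table = {"il a refusé": ["he refused", "he rejected", "he has just refused"]}
merged_sentences = [render(h) for h in merge_neighbors(seed, table)]
```

Each merged hypothesis has one bi-phrase fewer than the seed, so the longer source phrase can be translated as a unit ("he refused" instead of "he has" + "refused").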

Rewriting phrase table
Building the rewriting table
◮ Method 1: take the i best translations according to p(e | f)
◮ Method 2: take the bi-phrases appearing in the decoder k-best list
Method 1
◮ produces very large neighborhoods
◮ not suitable for costly features
Method 2
◮ produces a very small rewriting phrase table adapted to each sentence
◮ keeps only bi-phrases for which the decoder was the most confident
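The two ways of building the rewriting table might look like the sketch below. The data layouts are assumptions: the full phrase table is taken to map each source phrase to its targets with p(e | f), and each k-best hypothesis is taken to be a list of (source, target) bi-phrases. The probabilities are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BiPhrase = Tuple[str, str]  # (source phrase, target phrase)


def table_top_i(
    phrase_table: Dict[str, Dict[str, float]], i: int
) -> Dict[str, List[str]]:
    """Method 1: keep the i most probable targets of every source phrase."""
    return {
        src: [tgt for tgt, _ in
              sorted(cands.items(), key=lambda kv: kv[1], reverse=True)[:i]]
        for src, cands in phrase_table.items()
    }


def table_from_kbest(kbest: List[List[BiPhrase]]) -> Dict[str, List[str]]:
    """Method 2: keep only bi-phrases used in the decoder's k-best list."""
    table: Dict[str, List[str]] = defaultdict(list)
    for hyp in kbest:
        for src, tgt in hyp:
            if tgt not in table[src]:  # avoid duplicate entries
                table[src].append(tgt)
    return dict(table)


# Toy inputs with invented probabilities and hypotheses.
phrase_table = {"il a": {"he has": 0.5, "it has": 0.2, "he had": 0.3}}
kbest = [
    [("il a", "he has"), ("refusé", "refused")],
    [("il a", "he"), ("refusé", "refused")],
]
method1 = table_top_i(phrase_table, 2)
method2 = table_from_kbest(kbest)
```

Method 2 yields a table that is specific to the sentence being rewritten, which keeps neighborhoods small enough for costly features.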

Ranking of the neighborhood
[Figure: the neighborhood generated from the seed is ranked into an ordered list of hypotheses (1, 2, 3, 4, ..., i)]

Ranking of the neighborhood
Objective
◮ rank (manageable) neighborhoods using complex features
Training the reranker: 2 kinds of examples
◮ n-best lists produced by the decoder
◮ neighborhoods produced by one iteration of the rewriter
Training algorithm
◮ kb-mira (Cherry and Foster, 2012)

Greedy search
[Figure: the complete rewriting loop. After the neighborhood is ranked, the 1-best is returned if it is the seed (1-best == seed); otherwise (1-best != seed) the 1-best becomes the new seed and a new neighborhood is generated.]

Greedy search
◮ greedy search algorithm for PBSMT (Langlais et al., 2007)
◮ choose at each iteration the best rewriting / operation according to the (new) scoring function
Source: il a refusé le test immédiatement .
Reference: he refused the test straight away .
Seed:
  [il a]1 [refusé]2 [le test]3 [immédiatement .]4
  [he has]1 [refused]2 [a test]3 [now .]4
Iteration 1 (merge):
  [il a refusé]1 [le test]2 [immédiatement .]3
  [he refused]1 [a test]2 [now .]3
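The greedy loop described above can be sketched generically. The function names, the iteration cap and the toy neighborhood and scores are all hypothetical; in the talk, the neighborhood comes from the replace, split and merge operations and the score from the trained reranker.

```python
from typing import Callable, Dict, List


def greedy_rewrite(
    seed: str,
    neighbors_fn: Callable[[str], List[str]],
    score_fn: Callable[[str], float],
    max_iters: int = 100,  # safety cap, not part of the slides
) -> str:
    """Hill-climb: move to the best neighbor until the seed itself is 1-best."""
    for _ in range(max_iters):
        # The seed comes first, so max() keeps it when scores are tied.
        candidates = [seed] + neighbors_fn(seed)
        best = max(candidates, key=score_fn)
        if best == seed:  # 1-best == seed: return it
            return seed
        seed = best       # 1-best != seed: it becomes the new seed
    return seed


# Toy neighborhood graph and scores, invented for illustration.
toy_neighbors: Dict[str, List[str]] = {
    "he has refused a test now .": ["he refused a test now .",
                                    "it has refused a test now ."],
    "he refused a test now .": ["he refused the test now ."],
}
toy_scores: Dict[str, float] = {
    "he has refused a test now .": 0.1,
    "it has refused a test now .": 0.0,
    "he refused a test now .": 0.5,
    "he refused the test now .": 0.7,
}
final = greedy_rewrite(
    "he has refused a test now .",
    lambda hyp: toy_neighbors.get(hyp, []),
    lambda hyp: toy_scores[hyp],
)
```

The search stops at a local optimum of the scoring function: it returns as soon as no neighbor scores strictly higher than the current seed.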
