Confidence-based Rewriting of Machine Translation Output



  1. Confidence-based Rewriting of Machine Translation Output
     Benjamin Marie (1, 2), Aurélien Max (1, 3)
     (1) LIMSI-CNRS, (2) Lingua et Machina, (3) Université Paris-Sud

  2. Introduction
     Outline: Introduction, Rewriter, Experiments, Analysis, Conclusion
     ◮ Phrase-Based Statistical Machine Translation (PBSMT) systems use many features during decoding to assess the quality of translation hypotheses
     ◮ For other features, several integration difficulties must be overcome, e.g.:
       ◮ need for a complete hypothesis, e.g. sentence-level syntactic features
       ◮ computational cost, e.g. neural network language models
       ◮ need for a first decoding pass, e.g. a posteriori confidence models
     ◮ How can such features be used efficiently in PBSMT?
     Benjamin MARIE (LIMSI-CNRS), Confidence-based Rewriting of MT output, 10/2014

  3. Reranking of translation hypotheses
     A solution
     ◮ rerank the n-best list of the decoder using new, complex features
     ◮ can achieve good performance with some features (Och et al., 2004; Carter and Monz, 2011; Le et al., 2012; Luong et al., 2014)
     Two strong limitations
     ◮ lack of diversity (Gimpel et al., 2013)
     ◮ inherits the limited selection of hypotheses made by the decoder

  4. A rewriting system

  5. A rewriter to extend the exploration
     ◮ idea: search for new promising hypotheses that are not in the n-best list
     [Diagram: the seed and the rewriting phrase table feed "generate neighborhood" (operations: replace, merge, split); "rank neighborhood" orders hypotheses 1 to i; if 1-best == seed, return the 1-best; if 1-best != seed, the 1-best becomes the new seed]

  6. The seed: a hypothesis to rewrite
     [Diagram: seed]

  7. A rewriting phrase table
     [Diagram: rewriting phrase table; seed]

  8. A set of rewriting operations
     [Diagram: rewriting phrase table; operations (replace, merge, split); seed]

  9. Neighborhood generation
     [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood]

  10. Neighborhood generation: replace
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  11. Neighborhood generation: replace
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  12. Neighborhood generation: replace
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .
      Hypotheses:
      ◮ he has refused a test now .
      ◮ he refused a test now .
      ◮ he had refused a test now .
      ◮ it has refused a test now .
      ◮ it refused a test now .

  13. Neighborhood generation: split
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  14. Neighborhood generation: split
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  15. Neighborhood generation: split
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .
      Hypotheses:
      ◮ he has refused a test now .
      ◮ he is refused a test now .
      ◮ he had refused a test now .
      ◮ it has refused a test now .
      ◮ it have refused a test now .

  16. Neighborhood generation: merge
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  17. Neighborhood generation: merge
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .

  18. Neighborhood generation: merge
      Source: il a refusé le test immédiatement.
      Seed: he has refused a test now .
      Hypotheses:
      ◮ he has refused a test now .
      ◮ he refused a test now .
      ◮ he rejected a test now .
      ◮ he has just refused a test now .
      ◮ he has a test now .
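
The three operations illustrated on the preceding slides can be sketched in Python. The slides only show examples, so the exact definitions here are assumptions: a hypothesis is taken to be a monotone list of (source phrase, target phrase) pairs, the rewriting phrase table is a plain dictionary from a source phrase to its candidate translations, and all names (`BiPhrase`, `replace`, `split`, `merge`, `neighborhood`, `surface`) are hypothetical.

```python
from typing import Dict, List, Tuple

BiPhrase = Tuple[str, str]            # (source phrase, target phrase)
Hypothesis = List[BiPhrase]           # monotone segmentation (an assumption)
RewritingTable = Dict[str, List[str]]  # source phrase -> candidate targets


def replace(hyp: Hypothesis, table: RewritingTable) -> List[Hypothesis]:
    """Swap the target side of one bi-phrase for another translation."""
    out = []
    for i, (src, tgt) in enumerate(hyp):
        for alt in table.get(src, []):
            if alt != tgt:
                out.append(hyp[:i] + [(src, alt)] + hyp[i + 1:])
    return out


def split(hyp: Hypothesis, table: RewritingTable) -> List[Hypothesis]:
    """Cut one source phrase in two and translate each part separately."""
    out = []
    for i, (src, _) in enumerate(hyp):
        words = src.split()
        for cut in range(1, len(words)):
            left, right = " ".join(words[:cut]), " ".join(words[cut:])
            for lt in table.get(left, []):
                for rt in table.get(right, []):
                    out.append(hyp[:i] + [(left, lt), (right, rt)] + hyp[i + 1:])
    return out


def merge(hyp: Hypothesis, table: RewritingTable) -> List[Hypothesis]:
    """Join two adjacent source phrases and translate them as one phrase."""
    out = []
    for i in range(len(hyp) - 1):
        merged_src = hyp[i][0] + " " + hyp[i + 1][0]
        for tgt in table.get(merged_src, []):
            out.append(hyp[:i] + [(merged_src, tgt)] + hyp[i + 2:])
    return out


def neighborhood(hyp: Hypothesis, table: RewritingTable) -> List[Hypothesis]:
    """All hypotheses reachable from hyp with a single operation."""
    return replace(hyp, table) + split(hyp, table) + merge(hyp, table)


def surface(hyp: Hypothesis) -> str:
    """Target-side string of a hypothesis."""
    return " ".join(tgt for _, tgt in hyp)
```

With the seed segmented as `il a | refusé | le test | immédiatement .` and a table entry mapping `il a refusé` to `he refused`, `merge` yields `he refused a test now .`, which is the kind of rewriting shown on the merge slides. Note that `split` can return a hypothesis whose surface string equals the seed but whose segmentation differs.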

  19. Rewriting phrase table
      Building the rewriting table
      ◮ Method 1: take the i best translations according to p(e | f)
      ◮ Method 2: take the bi-phrases appearing in the decoder k-best list
      Method 1
      ◮ produces very large neighborhoods
      ◮ not suitable for costly features
      Method 2
      ◮ produces a very small rewriting phrase table adapted to each sentence
      ◮ keeps only bi-phrases for which the decoder was the most confident
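
The two ways of building the rewriting table can be sketched as follows. This is a minimal illustration, not the authors' implementation: the phrase table is assumed to be a nested dictionary of p(e | f) values, each k-best hypothesis is assumed to be a list of (source phrase, target phrase) pairs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def table_from_phrase_table(
    phrase_table: Dict[str, Dict[str, float]], i: int
) -> Dict[str, List[str]]:
    """Method 1: for each source phrase, keep the i targets with highest p(e | f)."""
    return {
        src: [
            tgt
            for tgt, _ in sorted(cands.items(), key=lambda kv: kv[1], reverse=True)[:i]
        ]
        for src, cands in phrase_table.items()
    }


def table_from_kbest(
    kbest: Iterable[List[Tuple[str, str]]]
) -> Dict[str, List[str]]:
    """Method 2: collect the bi-phrases used in one sentence's k-best list."""
    table: Dict[str, List[str]] = defaultdict(list)
    for hyp in kbest:
        for src, tgt in hyp:
            if tgt not in table[src]:  # keep each bi-phrase once
                table[src].append(tgt)
    return dict(table)
```

Method 1 is built once from the whole phrase table, whereas Method 2 is rebuilt for every sentence from that sentence's k-best list, which is why its table stays small.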

  20. Neighborhood generation
      [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood]

  21. Ranking of the neighborhood
      [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood; rank neighborhood, producing hypotheses 1 to i]

  22. Ranking of the neighborhood
      Objective
      ◮ rank (manageable) neighborhoods using complex features
      Training the reranker: two kinds of examples
      ◮ n-best lists produced by the decoder
      ◮ neighborhoods produced by one iteration of the rewriter
      Training algorithm
      ◮ kb-mira (Cherry and Foster, 2012)
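
Once weights have been learned, ranking a neighborhood amounts to scoring each hypothesis and sorting. The sketch below assumes a linear model over named features, which is the kind of model kb-mira tunes; the training step itself is not shown, and the feature extractor, feature names, and weights are hypothetical placeholders.

```python
from typing import Callable, Dict, List, TypeVar

H = TypeVar("H")
Features = Dict[str, float]


def score(features: Features, weights: Features) -> float:
    """Linear model: weighted sum of feature values (missing weights count as 0)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(
    candidates: List[H], extract: Callable[[H], Features], weights: Features
) -> List[H]:
    """Return the candidates sorted from best to worst model score."""
    return sorted(candidates, key=lambda c: score(extract(c), weights), reverse=True)
```

In practice `extract` would compute the complex features mentioned on the slide for a full hypothesis, which is feasible here because neighborhoods are kept to a manageable size.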

  23. Ranking of the neighborhood
      [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood; rank neighborhood, producing hypotheses 1 to i]

  24. Greedy search
      [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood; rank neighborhood, producing hypotheses 1 to i; if 1-best == seed, return the 1-best]

  25. Greedy search
      [Diagram: rewriting phrase table; operations (replace, merge, split); seed; generate neighborhood; rank neighborhood, producing hypotheses 1 to i; if 1-best == seed, return the 1-best; if 1-best != seed, the 1-best becomes the new seed]

  26. Greedy search
      ◮ greedy search algorithm for PBSMT (Langlais et al., 2007)
      ◮ choose at each iteration the best rewriting/operation according to the (new) scoring function
      Source: il a refusé le test immédiatement .
      Reference: he refused the test straight away .
      seed:
        il a (1) | refusé (2) | le test (3) | immédiatement . (4)
        ↓
        he has (1) | refused (2) | a test (3) | now . (4)

  27. Greedy search
      ◮ greedy search algorithm for PBSMT (Langlais et al., 2007)
      ◮ choose at each iteration the best rewriting/operation according to the (new) scoring function
      Source: il a refusé le test immédiatement .
      Reference: he refused the test straight away .
      seed:
        il a (1) | refusé (2) | le test (3) | immédiatement . (4)
        ↓
        he has (1) | refused (2) | a test (3) | now . (4)
      iteration 1 (merge):
        il a refusé (1) | le test (2) | immédiatement . (3)
        ↓
        he refused (1) | a test (2) | now . (3)
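
The loop drawn in the greedy search diagrams can be sketched generically. This is an illustration under stated assumptions, not the authors' code: `neighbors` stands for neighborhood generation, `score` for the trained scoring function, and `max_iterations` is a safety bound that does not appear on the slides.

```python
from typing import Callable, List, TypeVar

H = TypeVar("H")


def greedy_rewrite(
    seed: H,
    neighbors: Callable[[H], List[H]],
    score: Callable[[H], float],
    max_iterations: int = 100,
) -> H:
    """Repeatedly replace the seed by the best-scoring hypothesis around it."""
    current = seed
    for _ in range(max_iterations):
        # The seed competes with its own neighborhood. max() returns the first
        # maximal item, so ties are resolved in favor of the current seed.
        best = max([current] + neighbors(current), key=score)
        if best == current:   # 1-best == seed: nothing better, stop
            return current
        current = best        # 1-best != seed: the 1-best becomes the new seed
    return current
```

Because a new seed is accepted only when it scores strictly higher than the current one, the search stops at a local optimum of the scoring function, which need not be the best hypothesis overall.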
