The Effect of Translationese in Machine Translation Test Sets

WMT19, Florence, 2nd of August 2019
Mike Zhang, Information Science Programme, University of Groningen, The Netherlands, j.j.zhang.1@student.rug.nl
Antonio Toral, CLCG, University of Groningen, The Netherlands, a.toral.ruiz@rug.nl

Overview
1. What is translationese?
2. Translationese in MT data sets
3. Research Questions
4. Conclusions & Future work

What is translationese?

Translationese
Translated text (translationese) ≠ original text
• The differences do not indicate poor translation but rather a statistical phenomenon (Gellerstam, 1986)
• Simpler, more homogeneous, more explicit, interference from source language, aka translation universals (Baker, 1993)

Translationese in MT data sets

Translationese in MT data sets
What is the effect of translationese on MT?
• Mainly studied with respect to training data (Kurokawa et al., 2009; Lembersky, 2013)
• (Source original, Target translationese) > (Source translationese, Target original)
• Also studied with respect to dev data, in SMT (Stymne, 2017)
• Using tuning texts translated in the same original direction as the MT system tended to give a better score
• What about test data?

Translationese in Test
• Toral et al. (2018): translationese input favours MT systems, on Hassan et al. (2018)
[Diagram: the WMT ZH→EN test set (source ZH, reference EN) split into ORG (source originally written in ZH) and TRS (source translated from EN).]
[Figure: score (range [0,1]) by original language of the source sentence (zh, en) for the systems HT, MS and GG.]

Translationese in Test
• Toral et al. (2018): translationese input favours MT systems, on Hassan et al. (2018)
• Läubli et al. (2018), in similar fashion, show a stronger preference for human translations over MT when evaluating documents rather than isolated sentences, on Hassan et al. (2018)
• Building on the two works above, Graham et al. (2019) found evidence that translationese, compared to original text, can potentially reduce the accuracy of machine translation evaluations

Research Questions

Research Question(s)
1. Does the use of translationese in the source side of MT test sets unfairly favour MT systems?
2. If the answer to RQ1 is yes, does this effect of translationese have an impact on WMT’s system rankings?
3. If the answer to RQ1 is yes, would some language pairs be more affected than others?

This study
• Dataset: WMT16, WMT17, and WMT18 → 17 translation directions, 10 unique languages (Bojar et al., 2016, 2017, 2018).
• Human evaluation: Direct Assessment (DA), by bilingual crowd workers and participants (Graham et al., 2013, 2014, 2017).
[Diagram: the WMT ZH→EN test set (source ZH, reference EN) split into ORG (source originally written in ZH) and TRS (source translated from EN).]

RQ1: Does Translationese Affect Human Evaluation Scores?

RQ1: favouritism for translationese, WMT16
• Score difference in DA, ORG = original input, TRS = translationese input
• Consistent trend over all language pairs
[Figure: WMT16, score difference (DA) per language pair (deen, csen, fien, ruen, tren, roen) for the TRS and ORG subsets.]

WMT17
• Similar trend, TRS = inflation of scores, ORG = deflation of scores.
[Figure: WMT17, score difference (DA) per language pair (ende, deen, entr, enlv, encs, enru, enfi, enzh, csen, tren, zhen, fien, lven, ruen) for the TRS and ORG subsets.]

WMT18
• Again, same trend over all language pairs
• Does translationese unfairly favour MT systems?
• Yes!
[Figure: WMT18, score difference (DA) per language pair (deen, ende, enfi, enru, encs, entr, eten, enet, tren, enzh, fien, zhen, csen, ruen) for the TRS and ORG subsets.]
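The score differences reported for RQ1 can be sketched in a few lines. This is only an illustration under stated assumptions: the difference is taken to be the mean DA score of a subset (ORG or TRS) minus the mean DA score of the full WMT test set, and the function name, the record layout and the scores are hypothetical rather than taken from the study.

```python
from statistics import mean

def subset_score_differences(records, source_lang):
    """Return the mean score of each subset minus the mean over all records.

    records: list of (original_language, da_score) pairs (hypothetical layout).
    ORG = sentences originally written in the source language.
    TRS = sentences whose source side is itself a translation.
    """
    overall = mean(score for _, score in records)
    org = [score for lang, score in records if lang == source_lang]
    trs = [score for lang, score in records if lang != source_lang]
    return {"ORG": mean(org) - overall, "TRS": mean(trs) - overall}

# Made-up DA scores for a ZH→EN test set (not results from the study).
records = [("zh", 60.0), ("zh", 64.0), ("en", 70.0), ("en", 74.0)]
diffs = subset_score_differences(records, source_lang="zh")
print(diffs)
```

With these invented scores the TRS subset lands above the overall mean and the ORG subset below it, which is the direction of the trend described on the slides.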

RQ2: Do Systems’ Rankings Change?

RQ2: impact on WMT’s system rankings? (e.g. ZH → EN)
• Clusters change: WMT(1,4,7,8,11,12) → ORG(1,6,7,12) → TRS(1,3,5,12,14)

Another example (RU → EN)
• Clusters change: WMT(1,5,10) → ORG(1,10) → TRS(1,5,8,10)
• So would there be ranking changes?
• Yes, and clusters too!
• However, each subset uses only half of the data
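A comparison of system rankings between the full test set and a subset can be sketched as below. This is a simplified illustration, not the procedure used in the study: it orders systems by mean score only, whereas WMT's clusters additionally rely on significance testing, which the sketch does not attempt. The system names and scores are hypothetical.

```python
def rank_systems(mean_scores):
    """Return system names ordered from highest to lowest mean score."""
    return sorted(mean_scores, key=mean_scores.get, reverse=True)

def rank_changes(scores_a, scores_b):
    """Map each system whose rank differs to its (rank_in_a, rank_in_b) pair."""
    rank_a = {name: i + 1 for i, name in enumerate(rank_systems(scores_a))}
    rank_b = {name: i + 1 for i, name in enumerate(rank_systems(scores_b))}
    return {
        name: (rank_a[name], rank_b[name])
        for name in rank_a
        if rank_a[name] != rank_b[name]
    }

# Made-up mean DA scores on the full set (WMT) and on the ORG subset.
wmt = {"sysA": 73.0, "sysB": 72.5, "sysC": 70.0}
org = {"sysA": 68.0, "sysB": 69.0, "sysC": 66.0}
changes = rank_changes(wmt, org)
print(changes)
```

In this invented case the top two systems swap places when only original-language input is scored, while the third system keeps its position.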

RQ3: Are Some Languages More Affected?

Research Question 3: is there a trend?
LS vs. relative difference
• Language similarity (lang2vec (Littell et al., 2017)) vs. relative difference between WMT input and ORG input
• Low correlation (R = −0.15, p = 0.61)
[Figure: scatter plot of the language pairs (enfi, enru, encs, enet, entr, eten, enzh, deen, tren, fien, csen, ende, zhen, ruen). x-axis: similarity of the language pair using URIEL and lang2vec. y-axis: relative difference between original input and source input.]

Research Question 3: is there a trend?
Best system vs. relative difference
• Highest scoring system (with only ORG input) vs. relative difference between WMT input and ORG input
• High correlation! (R = −0.84, p = 0.00019)
• High differences could be due to under-resourced languages
[Figure: scatter plot of the language pairs (enfi, enru, encs, enet, entr, eten, enzh, deen, tren, fien, csen, ende, zhen, ruen). x-axis: score of the best system with original input. y-axis: relative difference between WMT input and original input.]
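The R values quoted for RQ3 are correlation coefficients between a per-language-pair quantity and the relative difference. A minimal sketch of such a computation follows. It assumes a Pearson correlation and a relative difference defined as a percentage of the ORG score, neither of which the slides spell out. The helper names and the data points are hypothetical.

```python
from math import sqrt

def relative_difference(wmt_score, org_score):
    """Percentage by which the WMT-input score exceeds the ORG-input score (assumed definition)."""
    return 100.0 * (wmt_score - org_score) / org_score

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Made-up best-system scores per language pair (not results from the study).
best_org = [60.0, 70.0, 80.0]   # best system, ORG input only
best_wmt = [66.0, 73.5, 80.8]   # same system, full WMT input
rel_diffs = [relative_difference(w, o) for w, o in zip(best_wmt, best_org)]
r = pearson_r(best_org, rel_diffs)
print(round(r, 2))
```

A strongly negative r in this toy data mirrors the reported pattern: the lower the best attainable score for a translation direction, the larger the relative difference.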

Conclusions & Future work

Conclusion
• Translationese: if present, it inflates DA scores. If removed, it lowers DA scores.
• Translation quality:
  • Correlation between the effect of translationese and the translation quality attainable for translation directions.
  • The effect of translationese tends to be high when an under-resourced language is present.
