Translation Quality Estimation: Past, Present, and Future
André Martins, MT Marathon, Lisbon, August 31st, 2017


SLIDE 1

Translation Quality Estimation: Past, Present, and Future

André Martins, MT Marathon, Lisbon, August 31st, 2017

André Martins (Unbabel), Quality Estimation, MTM, 31/8/17

SLIDE 2

This Talk

First part: largely based on Lucia Specia’s MTM16 slides
Second part: joint work with Marcin, Fabio, Ramon, Chris, Roman
Third part: my thoughts on the future of QE

SLIDE 3

Outline

1. MT Evaluation & Quality Estimation
2. Pushing the Limits of Quality Estimation
3. The Future

SLIDE 4

Why Do We Care About Evaluation?

In the business of developing MT, we need to:

• measure progress over new/alternative versions
• compare different MT systems
• decide whether a translation is good enough for a given purpose
• optimize parameters of MT systems
• understand where systems go wrong (diagnosis)
• ...

... remember Yvette’s lecture on Monday:

SLIDE 5

Why Do We Care About Evaluation?

One should optimize a system using the same metric that will be used to evaluate it
Issue: how to choose a metric?
The choice should be related to the system’s purpose (not the case in practice)
Other aspects matter for tuning (sentence/corpus-level, fast, cheap, differentiable, ...)

SLIDE 6

Complex Problem

What does quality mean? Fluent? Adequate? Both? Easy to post-edit? System A better than system B? ...

SLIDE 7

Complex Problem

What does quality mean? Fluent? Adequate? Both? Easy to post-edit? System A better than system B? ...
Quality for whom/what?
• End-user (gisting vs dissemination)
• Post-editor (light vs heavy post-editing)
• Other applications (e.g. CLIR)
• MT system (tuning or diagnosis for improvement)
• ...

SLIDE 8

Complex Problem

MT Do buy this product, it’s their craziest invention!

SLIDE 9

Complex Problem

MT Do buy this product, it’s their craziest invention! HT Do not buy this product, it’s their craziest invention!

SLIDE 10

Complex Problem

MT Do buy this product, it’s their craziest invention!
HT Do not buy this product, it’s their craziest invention!
Severe if the end-user does not speak the source language
Trivial for translators to post-edit

SLIDE 11

Complex Problem

MT Six-hours battery, 30 minutes to full charge last.

SLIDE 12

Complex Problem

MT Six-hours battery, 30 minutes to full charge last. HT The battery lasts 6 hours and it can be fully recharged in 30 minutes.

SLIDE 13

Complex Problem

MT Six-hours battery, 30 minutes to full charge last.
HT The battery lasts 6 hours and it can be fully recharged in 30 minutes.
OK for gisting: the meaning is preserved
Very costly to post-edit if the style is to be preserved

SLIDE 14

A Taxonomy of MT Evaluation Methods

Manual Automatic

SLIDE 15

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Scoring

SLIDE 16

Manual Assessment: Scoring

Is this translation correct?

SLIDE 17

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Scoring Ranking

SLIDE 18

Manual Assessment: Ranking

SLIDE 19

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Scoring Ranking Error annotation

SLIDE 20

MQM (Multidimensional Quality Metrics)

SLIDE 21

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing

SLIDE 22

Amount of Post-Editing

HTER
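HTER (human-targeted TER) measures how many word edits are needed to turn the MT output into its human post-edit, normalized by the post-edit length. A minimal sketch that approximates it with plain word-level Levenshtein distance (real HTER, as computed by TERCOM, also allows block shifts; the function names here are mine):

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def hter(mt, post_edit):
    """Approximate HTER: edits to turn MT into its post-edit, per PE word."""
    mt_toks, pe_toks = mt.split(), post_edit.split()
    return edit_distance(mt_toks, pe_toks) / max(len(pe_toks), 1)
```

An unedited translation thus scores 0.0, and scores grow with the amount of post-editing.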

SLIDE 23

Amount of Post-Editing

SLIDE 24

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing Reading comprehension

SLIDE 25

Reading Comprehension

SLIDE 26

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing Reading comprehension Eye-tracking

SLIDE 27

Eye-Tracking

SLIDE 28

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing Reading comprehension Reference-based Eye-tracking

SLIDE 29

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing Reading comprehension Reference-based Quality estimation

BLEU, Meteor, NIST, TER, WER, PER, CDER, BEER, CiDER, Cobalt, RATATOUILLE, RED, AMBER, PARMESAN, ...

Eye-tracking

SLIDE 30

Reference-Based Evaluation

Reference(s): a subset of the good translations, usually just one

Some metrics expand matching, e.g. synonyms in Meteor

Huge variation in reference translations, e.g.:

Source  不过这一切都由不得你 (However these all totally beyond the control of you.)
MT      But all this is beyond the control of you.

                                                      Human score  BLEU score
HT1  But all this is beyond your control.                     3.4       0.427
HT2  However, you cannot choose yourself.                     2         0.049
HT3  However, not everything is up to you to decide.          2         0.050
HT4  But you can’t choose that.                               2.8       0.055

SLIDE 31

Reference-Based Evaluation

Reference(s): a subset of the good translations, usually just one

Some metrics expand matching, e.g. synonyms in Meteor

Huge variation in reference translations, e.g.:

Source  不过这一切都由不得你 (However these all totally beyond the control of you.)
MT      But all this is beyond the control of you.

                                                      Human score  BLEU score
HT1  But all this is beyond your control.                     3.4       0.427
HT2  However, you cannot choose yourself.                     2         0.049
HT3  However, not everything is up to you to decide.          2         0.050
HT4  But you can’t choose that.                               2.8       0.055

Metrics completely disregard the source segment

SLIDE 32

Reference-Based Evaluation

Reference(s): a subset of the good translations, usually just one

Some metrics expand matching, e.g. synonyms in Meteor

Huge variation in reference translations, e.g.:

Source  不过这一切都由不得你 (However these all totally beyond the control of you.)
MT      But all this is beyond the control of you.

                                                      Human score  BLEU score
HT1  But all this is beyond your control.                     3.4       0.427
HT2  However, you cannot choose yourself.                     2         0.049
HT3  However, not everything is up to you to decide.          2         0.050
HT4  But you can’t choose that.                               2.8       0.055

Metrics completely disregard the source segment
Main problem: they cannot be applied to MT systems in use

SLIDE 33

A Taxonomy of MT Evaluation Methods

Manual Automatic

Direct asses. Task-based Scoring Ranking Error annotation Post-editing Reading comprehension Reference-based Quality estimation

BLEU, Meteor, NIST, TER, WER, PER, CDER, BEER, CiDER, Cobalt, RATATOUILLE, RED, AMBER, PARMESAN, ...

Eye-tracking

SLIDE 34

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly

SLIDE 35

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered

SLIDE 36

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is?

SLIDE 37

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is? Quality = Can a reader get the gist?

SLIDE 38

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is? Quality = Can a reader get the gist? Quality = Is it worth post-editing it?

SLIDE 39

Quality Estimation (Specia et al., 2013)

Quality Estimation (QE): metrics that provide an estimate on the quality of translations on the fly Quality defined by the data: purpose is clear, no comparison to references, source considered Quality = Can we publish it as is? Quality = Can a reader get the gist? Quality = Is it worth post-editing it? Quality = How much effort to fix it?

SLIDE 40

Related: Confidence in MT (Blatz et al., 2004; Ueffing and Ney, 2007)

Goal: augment the MT system to produce a confidence score
Quality Estimation is slightly more general and advantageous:
• it does not require access to the internals of the MT system (i.e. the MT system can be treated as a black box)
• it makes it possible to use several MT systems, several domains, etc.

SLIDE 41

Quality Estimation: Framework

Building a model:

X (examples of source texts & translations) and Y (quality scores for the examples in X) go through feature extraction; the resulting features are fed to machine learning, which produces the QE model. (Slide credit: Lucia Specia)

SLIDE 42

Quality Estimation: Framework

Applying the model:

A source text x_s′ is translated by the MT system into x_t′; feature extraction turns the pair into features, and the QE model outputs a quality score y′. (Slide credit: Lucia Specia)

SLIDE 43

Data and Levels of Granularity

Sentence level: 1–5 subjective scores, PE time, PE edits
Word level: OK/BAD, good/delete/replace, MQM
Phrase level: good/bad
Document level: PE effort

SLIDE 44

Features and Algorithms

Confidence features are extracted from the MT system, complexity features from the source text, fluency features from the translation, and adequacy features from the source/translation pair.

Algorithms can be used off-the-shelf

SLIDE 45

Sentence-Level QE: Baseline Setting

Features:

• number of tokens in the source and target sentences
• average source token length
• average number of occurrences of words in the target
• number of punctuation marks in source and target sentences
• LM probability of source and target sentences
• average number of translations per source word
• % of seen source n-grams
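To show how shallow these baseline features are, here is a minimal sketch computing a few of them (function name and dict layout are mine; the LM probabilities are assumed to be computed elsewhere and passed in):

```python
import string

def baseline_features(source, target, src_lm_logprob=None, tgt_lm_logprob=None):
    """A few of the sentence-level baseline QE features (illustrative sketch)."""
    src, tgt = source.split(), target.split()
    feats = {
        "src_len": len(src),                                    # tokens in source
        "tgt_len": len(tgt),                                    # tokens in target
        "avg_src_tok_len": sum(map(len, src)) / max(len(src), 1),
        "avg_tgt_occ": len(tgt) / max(len(set(tgt)), 1),        # avg occurrences per target type
        "src_punct": sum(t in string.punctuation for t in src),
        "tgt_punct": sum(t in string.punctuation for t in tgt),
    }
    if src_lm_logprob is not None:
        feats["src_lm"] = src_lm_logprob
    if tgt_lm_logprob is not None:
        feats["tgt_lm"] = tgt_lm_logprob
    return feats
```

These feature vectors are what the SVM regressor mentioned on the next slides is trained on.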

SLIDE 46

Sentence-Level QE: Baseline Setting

Features:

• number of tokens in the source and target sentences
• average source token length
• average number of occurrences of words in the target
• number of punctuation marks in source and target sentences
• LM probability of source and target sentences
• average number of translations per source word
• % of seen source n-grams

SVM regression with RBF kernel

SLIDE 47

Sentence-Level QE: Baseline Setting

Features:

• number of tokens in the source and target sentences
• average source token length
• average number of occurrences of words in the target
• number of punctuation marks in source and target sentences
• LM probability of source and target sentences
• average number of translations per source word
• % of seen source n-grams

SVM regression with RBF kernel

QuEst: http://www.quest.dcs.shef.ac.uk/

SLIDE 48

Sentence-Level QE: SOTA

Predicting HTER (WMT16), English-German:

System ID                    Pearson ↑  Spearman ↑
YSDA/SNTX+BLEU+SVM           0.525      –
POSTECH/SENT-RNN-QV2         0.460      0.483
SHEF-LIUM/SVM-NN-emb-QuEst   0.451      0.474
POSTECH/SENT-RNN-QV3         0.447      0.466
SHEF-LIUM/SVM-NN-both-emb    0.430      0.452
UGENT-LT3/SCATE-SVM2         0.412      0.418
UFAL/MULTIVEC                0.377      0.410
RTM/RTM-FS-SVR               0.376      0.400
UU/UU-SVM                    0.370      0.405
UGENT-LT3/SCATE-SVM1         0.363      0.375
RTM/RTM-SVR                  0.358      0.384
Baseline SVM                 0.351      0.390
SHEF/SimpleNets-SRC          0.182      –
SHEF/SimpleNets-TGT          0.182      –
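Systems in this shared task are ranked by Pearson correlation between predicted and gold HTER scores; for reference, a minimal implementation of the metric:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted and gold scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)
```

In practice one would use a library routine (e.g. scipy.stats.pearsonr), but the definition is this simple.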

SLIDE 49

Sentence-Level QE: SOTA

Predicting HTER (WMT17)

SLIDE 50

Word-Level QE: SOTA

Word-level ok/bad labels (WMT16)

SLIDE 51

Word-Level QE: SOTA

Word-level ok/bad labels (WMT17)

SLIDE 52

Outline

1. MT Evaluation & Quality Estimation
2. Pushing the Limits of Quality Estimation
3. The Future

SLIDE 53

Pushing the Limits of Quality Estimation

Recent TACL paper:

André F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramón Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Quality Estimation.” TACL, 5:205–218, 2017.

SLIDE 54

In a Nutshell

Quality estimation: evaluate a translation “on the fly” with no reference
Useful for predicting (or sidestepping) human post-editing effort
Until now: not really accurate enough for practical use

SLIDE 55

In a Nutshell

Quality estimation: evaluate a translation “on the fly” with no reference
Useful for predicting (or sidestepping) human post-editing effort
Until now: not really accurate enough for practical use
This paper: considerable improvements by:
• stacking a neural system and a linear sequential model
• using automatic post-editing as an auxiliary task

SLIDE 56

In a Nutshell

Quality estimation: evaluate a translation “on the fly” with no reference
Useful for predicting (or sidestepping) human post-editing effort
Until now: not really accurate enough for practical use
This paper: considerable improvements by:
• stacking a neural system and a linear sequential model
• using automatic post-editing as an auxiliary task
Overall (on the WMT16 En-De dataset):
• 57.47% (+7.95%) word-level F1-MULT
• 65.56% (+13.26%) sentence-level Pearson score

SLIDE 57

“But isn’t MT nearly indistinguishable from humans now?”

SLIDE 58

SLIDE 59

Wrong translation!

SLIDE 60

Wrong translation!

We can fix it with a human post-editor!

SLIDE 61

Le travail de La traduction automatique fonctionne-t-elle?

We can fix it with a human post-editor!

SLIDE 62

Le travail de La traduction automatique fonctionne-t-elle?

BAD OK OK OK BAD BAD

quality estimation

We can fix it with a human post-editor!

SLIDE 63

Quality Estimation (Blatz et al., 2004; Specia et al., 2013)

Le travail de La traduction automatique fonctionne-t-elle?

BAD OK OK OK BAD BAD

Sentence-level QE: predict edit distance (HTER)
Word-level QE: predict an OK/BAD label for each translated word

SLIDE 64

Quality Estimation (Blatz et al., 2004; Specia et al., 2013)

Le travail de La traduction automatique fonctionne-t-elle?

BAD OK OK OK BAD BAD

Sentence-level QE: predict edit distance (HTER)
Word-level QE: predict an OK/BAD label for each translated word
This paper: we first engineer a strong word-level QE system, then convert it to make sentence-level predictions too

SLIDE 65

Why Quality Estimation?

1. It informs an end user about the reliability of translated content
2. It decides if a translation is good to go or requires a human to fix it
3. It highlights to a human post-editor the words that need to be revised

SLIDE 66

Unbabel Translation Pipeline

(Credit: João Graça’s EAMT presentation)

SLIDE 67

Dataset of Post-Edited Translations

WMT 2016: English-German, IT domain
12,000 training sentences, 1,000 dev sentences, 2,000 test sentences
Quality labels obtained from post-edited translations via TERCOM (Snover et al., 2006)
In the paper: also experiments with WMT 2015 (English-Spanish)
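The labels come from aligning each MT sentence to its post-edit: MT words that survive into the post-edit are OK, the rest BAD. A rough sketch of this idea (the real pipeline uses TERCOM, which also handles block shifts; this approximation uses a plain longest-common-subsequence alignment):

```python
import difflib

def word_labels(mt_tokens, pe_tokens):
    """Approximate TERCOM-style word labels: MT words kept in the
    post-edit are OK, everything else BAD."""
    labels = ["BAD"] * len(mt_tokens)
    sm = difflib.SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in sm.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"
    return labels
```

For example, an MT output with one mistranslated word gets a single BAD in the corresponding position.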

SLIDE 68

Model #1: Linear Sequential Model

First shot: a discriminative, shallow, CRF-like model

BAD OK OK OK BAD BAD

The model predicts a label sequence ŷ_{1:N} ∈ {OK, BAD}^N:

    ŷ_{1:N} = argmax_{y_{1:N}}  Σ_{i=1}^{N} w · φ_u(s, t, A, y_i)            [unigram features]
                              + Σ_{i=1}^{N+1} w · φ_b(s, t, A, y_i, y_{i−1})   [bigram features]
SLIDE 69

Features

Unigram features (y_i ∧ ..., referenced by the i-th target word): Bias; Word, LeftWord, RightWord; SourceWord, SourceLeftWord, SourceRightWord; LargestNGramLeft/Right; SourceLargestNGramLeft/Right; PosTag, SourcePosTag; Word+LeftWord, Word+RightWord; Word+SourceWord, PosTag+SourcePosTag
Simple bigram features (y_i ∧ y_{i−1} ∧ ...): Bias
Rich bigram features (y_i ∧ y_{i−1} ∧ ... and y_{i+1} ∧ y_i ∧ ...): all of the above; Word+SourceWord, PosTag+SourcePosTag
Syntactic features (y_i ∧ ...): DepRel, Word+DepRel; HeadWord/PosTag+Word/PosTag; LeftSibWord/PosTag+Word/PosTag; RightSibWord/PosTag+Word/PosTag; GrandWord/PosTag+HeadWord/PosTag+Word/PosTag

SLIDE 70

Performance of Linear Model (Dev Set)

F1-MULT (WMT16, dev):
  unigrams only      40.05
  +simple bigrams    40.63
  +rich bigrams      43.65
  +syntactic (full)  46.11

Large impact of rich bigram features (3 points) and syntactic features (another 2.5 points); the net improvement exceeds 6 points over the unigram model
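F1-MULT, the word-level metric reported throughout these slides (and used in the WMT16 word-level task), is the product of the F1 scores of the OK and the BAD class. A minimal implementation:

```python
def f1_mult(gold, pred):
    """F1-MULT: product of per-class F1 for OK and BAD word labels."""
    def f1(cls):
        tp = sum(g == p == cls for g, p in zip(gold, pred))
        gold_pos = sum(g == cls for g in gold)
        pred_pos = sum(p == cls for p in pred)
        prec = tp / pred_pos if pred_pos else 0.0
        rec = tp / gold_pos if gold_pos else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1("OK") * f1("BAD")
```

Taking the product penalizes systems that do well on the majority OK class but poorly on BAD, or vice versa.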

SLIDE 71

Model #2: Neural Model

Second shot: a model inspired by QUETCH (Kreutzer et al., 2015), a feedforward neural net whose inputs are target words and their aligned source words
We depart from QUETCH in a few aspects:
• we add recurrent layers
• more depth
• embeddings for the POS tags (not just the words)
• dropout regularization
• layer normalization
In the paper: ablation experiments to validate the architecture

SLIDE 72

Architecture: embeddings for the source word, source POS, target word and target POS (3 × 64 and 3 × 50, i.e. a window of 3 positions), stacks of feedforward layers (100 + 50, 2 × 200, 2 × 400) interleaved with two BiGRU layers (100 and 200 units), and a final softmax producing OK/BAD.

SLIDE 73

Linear vs Neural Models (Test Set)

F1-MULT (WMT16, test):
  UGENT system  41.10
  LinearQE      46.16
  NeuralQE      47.29

UGENT is Tezcan et al. (2016), the best system at WMT16 after ours; both our linear and neural systems outperform it by a big margin

SLIDE 74

Model #3: Stacked Architecture

Stacked learning is a simple and effective way of ensembling structured models (Cohen and de Carvalho, 2005; Martins et al., 2008)
Key idea: include the neural model’s prediction as an additional feature in the linear model
Better performance if we use several neural models with different random initializations and data shuffles
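The stacking step itself is simple: each word's feature vector in the linear model is extended with the neural model's output. A minimal sketch (names are mine; in practice one column per ensemble member would be appended):

```python
def stack_features(linear_feats, neural_prob_ok):
    """Append the neural model's per-word OK probability as one extra
    feature column for the linear (stacked) model."""
    return [feats + [p] for feats, p in zip(linear_feats, neural_prob_ok)]
```

The linear model is then retrained on these augmented vectors, learning how much to trust the neural predictions in each context.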

SLIDE 75

Stacking Linear and Neural Models

F1-MULT (WMT16):
  UGENT system  41.10
  LinearQE      46.16
  NeuralQE      47.29
  StackedQE     50.27

Large improvement (3 points) just by combining the two systems! This shows that they are highly complementary. We won the WMT16 shared task with this approach ... but can we still do better?

SLIDE 76

Stacking Linear and Neural Models

F1-MULT (WMT16):
  UGENT system  41.10
  LinearQE      46.16
  NeuralQE      47.29
  StackedQE     50.27

Large improvement (3 points) just by combining the two systems! This shows that they are highly complementary. We won the WMT16 shared task with this approach ... but can we still do better? Yes.

SLIDE 77

Automatic Post-Editing (Simard et al., 2007)

... remember Marcin’s lecture on Wednesday:

SLIDE 78

Automatic Post-Editing (Simard et al., 2007)

Le travail de La traduction automatique fonctionne-t-elle?

BAD OK OK OK BAD BAD

Goal: automatically correct the output of MT
While word-level QE detects mistakes, APE seeks to correct them
Still, the two tasks are pretty similar.

SLIDE 79

Roundtrip and Log-Linear Combination

Current best APE system (Junczys-Dowmunt and Grundkiewicz, 2016):
• generate a large amount of artificial data (“roundtrip translations”)
• train two NMT systems: s → p and t → p
• combine the two systems with a log-linear model
Our strategy:
• train an APE system tuned for QE
• at test time, project the predicted post-edited text p onto quality labels

SLIDE 80

Roundtrip and Log-Linear Combination

Current best APE system (Junczys-Dowmunt and Grundkiewicz, 2016):
• generate a large amount of artificial data (“roundtrip translations”)
• train two NMT systems: s → p and t → p
• combine the two systems with a log-linear model
Our strategy:
• train an APE system tuned for QE
• at test time, project the predicted post-edited text p onto quality labels
Two key differences with respect to the other “pure QE” systems:
• learned from finer-grained information (the post-edited text)
• a lot more data (500,000 roundtrip translations)

SLIDE 81

Model #4: APE-Based Quality Estimation

F1-MULT (WMT16):
  UGENT system  41.10
  LinearQE      46.16
  NeuralQE      47.29
  StackedQE     50.27
  APE-QE        55.68

This strategy outperforms the “pure” QE systems by 5 points!

SLIDE 82

Model #5: Combining Linear, Neural, and APE

F1-MULT (WMT16):
  UGENT system   41.10
  LinearQE       46.16
  NeuralQE       47.29
  StackedQE      50.27
  APE-QE         55.68
  FullStackedQE  57.47

... combining all the models together, we get another improvement of 2 points!

SLIDE 83

Examples

Source          Combines the hue value of the blend color with the luminance and saturation of the base color to create the result color.
MT              Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
PE (Reference)  Kombiniert den Farbtonwert der Mischfarbe mit der Luminanz und Sättigung der Grundfarbe.
APE             Kombiniert den Farbton der Mischfarbe mit der Luminanz und die Sättigung der Grundfarbe, um die Ergebnisfarbe zu erstellen.
StackedQE       Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
ApeQE           Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.
FullStackedQE   Kombiniert den Farbton Wert der Angleichungsfarbe mit der Luminanz und Sättigung der Grundfarbe zu erstellen.

SLIDE 84

Performance over Sentence Length

Figure: Average number of words predicted as BAD by the different systems on the WMT16 gold dev set, for different bins of sentence length.

SLIDE 85

Sentence-Level Quality Estimation

Now that we have a strong word-level system, we apply a very simple procedure to obtain sentence-level predictions:
• pure QE: convert the word-level predictions into a sentence-level HTER prediction by counting the fraction of BAD words
• APE: directly use the HTER between t and p
• combined system: average of the two above
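This word-to-sentence conversion is nearly a one-liner; a sketch covering the cases above (function name is mine):

```python
def sentence_hter(word_labels, ape_hter=None):
    """Word-to-sentence conversion: fraction of BAD word labels,
    optionally averaged with an APE-derived HTER (combined system)."""
    frac_bad = sum(label == "BAD" for label in word_labels) / max(len(word_labels), 1)
    if ape_hter is None:
        return frac_bad
    return (frac_bad + ape_hter) / 2
```

No further training or tuning is involved, which makes the sentence-level numbers on the next slide all the more striking.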

SLIDE 86

Sentence-Level Quality Estimation

Pearson correlation (WMT16):
  YANDEX system  52.5
  StackedQE      54.93
  APE-QE         61.27
  FullStackedQE  65.56

YANDEX was the best sentence-level system at WMT16 (Kozlova et al., 2016). Our simple conversion led to impressive results, well above the SOTA!

SLIDE 87

This Year’s WMT17: Other Competitive Methods

SLIDE 88

This Year’s WMT17: Other Competitive Methods

SLIDE 89

Conclusions

New SOTA systems for word-level and sentence-level QE, considerably more accurate than previously existing systems:
• First, we proposed a new pure QE system which stacks a linear and a neural system
• Then, by relating the tasks of APE and word-level QE, we derived a new APE-based QE system
• Finally, we combined the two systems via a full stacking architecture
• The full system was extended to sentence-level QE by virtue of a simple word-to-sentence conversion (no further training or tuning)

SLIDE 90

Outline

1. MT Evaluation & Quality Estimation
2. Pushing the Limits of Quality Estimation
3. The Future

SLIDE 91

The Future of QE

Word level: go beyond OK/BAD labels
• look at missing words relative to the source (particularly relevant for NMT)
• mistakes of NMT systems have different patterns than those of PBMT
• how to estimate adequacy?
Sentence level:
• multi-task learning with the word level
• predict MQM scores (useful for quality assurance of crowd-sourced translation)
Document level:
• take inter-sentential context into account (very relevant for chat)
• evaluate the global coherence of the translation

SLIDE 92

The Future of QE

Beyond MT: human translation quality estimation
• much harder: humans have higher variability
• hard to distinguish good non-literal translations from complete rubbish

SLIDE 93

We’re Hiring!

Excited about MT, crowdsourcing and Lisbon? ⇒ jobs@unbabel.com.

SLIDE 94

Acknowledgments

EXPERT project (EU Marie Curie ITN No. 317471)
Fundação para a Ciência e a Tecnologia (FCT), through contracts UID/EEA/50008/2013 and UID/CEC/50021/2013
LearnBig project (PTDC/EEI-SII/7092/2014)
GoLocal project (grant CMUPERI/TIC/0046/2014)
Amazon Academic Research Awards program

SLIDE 95

References I

Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004). Confidence estimation for machine translation. In Proc. of the International Conference on Computational Linguistics, page 315.

Cohen, W. W. and de Carvalho, V. R. (2005). Stacked sequential learning. In IJCAI.

Junczys-Dowmunt, M. and Grundkiewicz, R. (2016). Log-linear combinations of monolingual and bilingual neural machine translation models for automatic post-editing. In Proceedings of the First Conference on Machine Translation, pages 751–758, Berlin, Germany. Association for Computational Linguistics.

Kozlova, A., Shmatova, M., and Frolov, A. (2016). YSDA participation in the WMT’16 quality estimation shared task. In Proceedings of the First Conference on Machine Translation, pages 793–799.

Kreutzer, J., Schamoni, S., and Riezler, S. (2015). Quality estimation from scratch (QUETCH): Deep learning for word-level translation quality estimation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 316–322.

Martins, A. F. T., Das, D., Smith, N. A., and Xing, E. P. (2008). Stacking dependency parsers. In Proc. of Empirical Methods in Natural Language Processing.

Simard, M., Ueffing, N., Isabelle, P., and Kuhn, R. (2007). Rule-based translation with statistical phrase-based post-editing. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 203–206.

SLIDE 96

References II

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages 223–231.

Specia, L., Shah, K., de Souza, J. G., and Cohn, T. (2013). QuEst - a translation quality estimation framework. In Proc. of the Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 79–84.

Tezcan, A., Hoste, V., and Macken, L. (2016). UGENT-LT3 SCATE submission for WMT16 shared task on quality estimation. In Proceedings of the First Conference on Machine Translation, pages 843–850, Berlin, Germany. Association for Computational Linguistics.

Ueffing, N. and Ney, H. (2007). Word-level confidence estimation for machine translation. Computational Linguistics, 33(1):9–40.
