Combining Crowd and AI to scale professional-quality translation - PowerPoint PPT Presentation

Combining Crowd and AI to scale professional-quality translation João Graça CTO

The Internet, 1997 80%   English

The Internet, 2017 30%   English 20%   Chinese 8%   6%   Spanish 5%   Japanese 4%   Portuguese 3%   German Arabic

“All translation firms together are able to translate far less than 1% of relevant content produced every day” CSA – MT Is Unavoidable to Keep Up with Content Volumes

Is Machine Translation already here? Everyone agrees that NMT is here to stay and much better than SMT. * “Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions”, Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang

Unbabel Experiments with Customer Service Tickets [Bar chart, BLEU: Moses (generic), Neural MT (generic), Moses (adapted), Neural MT (adapted); bar values, highest to lowest: 49.9, 43.5, 35.7, 29.6]

Is NMT really enough? * A Neural Network for Machine Translation, at Production Scale. Google Blog

Quality per Job [Chart: MQM (0% to 100%) per job (jobs 0 to 30); legend: Good, Not sure, Bad]

Will AI solve translation? [Chart: labels Jobs, Quality, Time; MQM 95 level; curve: Machine only]

Will AI solve translation? [Chart: labels Jobs, Human effort, Time; MQM 95 level]

Quality per Job [Chart: MQM (0% to 100%) per job (jobs 0 to 30); legend: Good, Not sure, Bad]

Unbabel Pipeline: Job, Machine Translation, Quality Estimation (Q.E.). High Q.E.: the translation goes to the Customer. Low Q.E.: the translation goes to Community Translators, followed by Q.E. Re-Eval.
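The routing on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold value, the function names, and the bounded number of post-editing rounds are hypothetical and do not come from the talk, which does not say what happens when the re-evaluation is still low.

```python
from typing import Callable, Tuple

def run_job(
    source: str,
    machine_translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    threshold: float = 0.8,   # hypothetical cut-off between "high" and "low" Q.E.
    max_rounds: int = 2,      # assumption: stop after a bounded number of re-evaluations
) -> Tuple[str, float, int]:
    """Return (translation, final Q.E. score, number of community rounds used)."""
    translation = machine_translate(source)
    score = estimate_quality(source, translation)
    rounds = 0
    # Low Q.E.: send to the community, then re-evaluate the post-edited text.
    while score < threshold and rounds < max_rounds:
        translation = post_edit(source, translation)
        score = estimate_quality(source, translation)
        rounds += 1
    # High Q.E. (or rounds exhausted): deliver to the customer.
    return translation, score, rounds

if __name__ == "__main__":
    # Toy stand-ins for the real components.
    mt = lambda s: s.upper()
    qe = lambda s, t: 0.9 if t.endswith("!") else 0.5
    pe = lambda s, t: t + "!"
    print(run_job("hi", mt, qe, pe))
```

Passing the three stages in as callables keeps the routing logic independent of any particular MT engine, QE model, or editing interface.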

Data Generation Engine [Pipeline diagram: Job, Machine Translation, Quality Estimation (Q.E.), Community Translators, Quality Estimation (Q.E.), Customer]

Data Generation Engine. Before: Initial text, NO DATA POINTS, Submitted text. After: Initial text; all changes in between (Mouse clicks, Key presses, Timestamps); Submitted text.

Keystroke Analysis. Raw data: a timestamped event log for nugget 3, for example: At 18:03:30: mouseClick, Cursor at 16, Selected: 0. At 18:03:31: Pressed Backspace, Cursor at 16, Selected: 0. At 18:03:31: Pressed Backspace, Cursor at 15, Selected: 0. At 18:03:35: Pressed Shift, Cursor at 25, Selected: 0. At 18:03:35: Pressed s, Cursor at 25, Selected: 0. At 18:03:35: Pressed i, Cursor at 26, Selected: 0. Processed information: Initial text “Espero que esto es útil”; Deleted word “ es”; Inserted word “sea”; Submitted text “Espero que esto sea útil”.
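To make the step from raw events to submitted text concrete, here is a minimal replay sketch. The event format (`click`, `backspace`, `key`) and the cursor positions are hypothetical simplifications chosen for this illustration; they are not the logging schema shown on the slide.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[object]]  # (kind, value); a simplified, assumed format

def replay(initial: str, events: List[Event]) -> str:
    """Apply click / backspace / key events to the initial text."""
    text, cursor = initial, 0
    for kind, value in events:
        if kind == "click":
            cursor = int(value)                 # move the cursor to a character offset
        elif kind == "backspace":
            if cursor > 0:                      # delete the character left of the cursor
                text = text[:cursor - 1] + text[cursor:]
                cursor -= 1
        elif kind == "key":
            text = text[:cursor] + str(value) + text[cursor:]
            cursor += len(str(value))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return text

if __name__ == "__main__":
    events = [
        ("click", 18),                          # cursor just after the word "es"
        ("backspace", None), ("backspace", None), ("backspace", None),  # removes " es"
        ("key", " "), ("key", "s"), ("key", "e"), ("key", "a"),         # types " sea"
    ]
    print(replay("Espero que esto es útil", events))
```

Replaying the log reproduces the submitted text, and every intermediate state between the initial and the submitted text becomes available as a data point.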

Unbabel Pipeline in three parts: MACHINE, QE, COMMUNITY [Pipeline diagram: Job, Machine Translation, Quality Estimation (Q.E.); High Q.E. to Customer; Low Q.E. to Crowd, then Q.E. Re-Eval]

Machine Translation Pipeline [Diagram components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]
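One way the components named on this slide could fit together is sketched below. The ordering (translation memory lookup first, then a router that picks a customer-specific engine, then customer-specific automatic post-editing) is an assumption made for illustration, as are all function and parameter names; the slide only lists the components.

```python
from typing import Callable, Dict

Engine = Callable[[str], str]

def translate(
    text: str,
    customer: str,
    memory: Dict[str, str],          # translation memory: source -> stored translation
    engines: Dict[str, Engine],      # MT engines keyed by customer, plus "generic"
    post_editors: Dict[str, Engine], # automatic post-editing (APE) models by customer
) -> str:
    # 1. Translation memory: reuse an exact match if one exists (assumed behaviour).
    if text in memory:
        return memory[text]
    # 2. MT router: prefer the customer's adapted engine, else fall back to generic.
    engine = engines.get(customer, engines["generic"])
    draft = engine(text)
    # 3. Customer APE: apply it only when the customer has one.
    ape = post_editors.get(customer)
    return ape(draft) if ape else draft

if __name__ == "__main__":
    memory = {"hello": "olá"}
    engines = {"generic": lambda t: "mt:" + t, "acme": lambda t: "acme-mt:" + t}
    post_editors = {"acme": lambda t: t + "+ape"}
    print(translate("hello", "acme", memory, engines, post_editors))
    print(translate("bye", "acme", memory, engines, post_editors))
    print(translate("bye", "other", memory, engines, post_editors))
```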

Machine Translation Models Phrase-based MT Neural MT

Customer Adaptation: Customer Support Tickets [Bar chart, BLEU (axis 30 to 60), comparing Google NMT with NMT systems adapted on increasing amounts of customer data; bar values, highest to lowest: 51.7, 48.8, 47.2, 46.9, 43.2, 43.1, 34.1]

Quality Estimation. Word-Level QE: Which words are translated correctly/incorrectly? Sentence-Level QE: How good is the entire translation?

Quality Estimation: Word-level QE example. MT output: “Hey lá , eu sou pesaroso sobre aquele !” Each of the nine tokens is tagged OK or BAD (five BAD, four OK).

QE Training [Diagram: Ticket source, MT (Bad translation), Unbabel, final (Good translation)]
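Pairs of machine output and final post-edited text can be turned into word-level training labels. The sketch below is a simplified illustration and not necessarily how Unbabel builds its data: it uses Python's `difflib.SequenceMatcher` as a stand-in for a proper edit-distance alignment, tagging MT tokens that survive into the final text as OK and the rest as BAD. The function name is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

def word_labels(mt_tokens: List[str], final_tokens: List[str]) -> List[str]:
    """Tag each MT token OK if it is kept in the final text, otherwise BAD."""
    labels = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block marks a run of tokens shared by the MT and the final text.
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"
    return labels

if __name__ == "__main__":
    mt = "Espero que esto es útil".split()
    final = "Espero que esto sea útil".split()
    for token, label in zip(mt, word_labels(mt, final)):
        print(f"{token}\t{label}")
```

On the sentence pair from the keystroke slide, only the token “es” is tagged BAD, since it is the only word the editor replaced.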

QE Word Level (F1_MULT, axis 0 to 60): Ugent System 41.1; LinearQE 45.2; NeuralQE 47.3; StackedQE 50.3 (WMT 16 WL QE Winner); APE-QE 55.7; FullStackedQE 57.5. Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
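For readers unfamiliar with the metric: F1_MULT, as used in the WMT word-level QE shared task, is the product of the F1 score on the OK class and the F1 score on the BAD class. The sketch below computes it from gold and predicted tag sequences; the function names and the toy data are illustrative only, and the slide reports the value multiplied by 100.

```python
from typing import List

def f1_for_label(gold: List[str], pred: List[str], label: str) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_mult(gold: List[str], pred: List[str]) -> float:
    """Product of F1 on OK tags and F1 on BAD tags."""
    return f1_for_label(gold, pred, "OK") * f1_for_label(gold, pred, "BAD")

if __name__ == "__main__":
    gold = ["OK", "OK", "BAD", "BAD", "OK"]
    pred = ["OK", "BAD", "BAD", "OK", "OK"]
    print(round(100 * f1_mult(gold, pred), 1))
```

Multiplying the two F1 scores penalises systems that do well on the frequent OK class while missing most BAD words.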

QE Sentence Level (Pearson Correlation, axis 0 to 70): YANDEX 52.5; StackedQE 54.9; APE-QE 61.3; FullStackedQE 65.6. Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
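The sentence-level comparison is scored with Pearson correlation between predicted and reference quality scores. A small worked sketch follows; the score lists are made-up toy values, and the slide's figures appear to be the correlation multiplied by 100.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    predicted = [0.10, 0.35, 0.20, 0.60]   # toy QE predictions per sentence
    reference = [0.05, 0.40, 0.25, 0.55]   # toy reference scores per sentence
    print(round(100 * pearson(predicted, reference), 1))
```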

QE in the Pipeline: Job, Machine Translation, Quality Estimation (Q.E.). High Q.E.: Customer. Low Q.E.: Community Translators, then Q.E. Re-Eval. Document-Level QE: how good is the entire document? Human QE: can we evaluate post-edit output? (Interesting numbers coming soon.)

Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool, Initial Text (MT), Editor Assignment, Interfaces, Quality Evaluation.

Unbabel Community: 50 000 Users

Distributed Pipeline Quality Estimation

Editors Pool. Testing Phase: Editors are tested when they sign up. Training Content: Editors get ratings for the tasks. Paid Content: The best editors have access to paid content. Expert (Evaluators, Annotators): Specialisation layers will grow with time.

How Editors are Evaluated

Editors Profiling

Editor Assignment (Editors: Pull)

Queue  Priority  SLA   Tasks/time
G      1000      6 H   2 m
G      1100      30 m  6 m
G      1000      2 D   10 m
G      1000      6 D   12 m
R      1100      20 m  18 m
R      1100      40 m  45 m
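A pull-based assignment like this is commonly implemented with a priority queue. The sketch below uses the queue, priority and SLA values from the slide, but the ordering rule is an assumption: a higher priority number is served first, and ties are broken by the shorter SLA. The helper names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Task = Dict[str, object]
HeapItem = Tuple[int, int, int, Task]

def build_heap(tasks: List[Task]) -> List[HeapItem]:
    """Order by priority (descending), then SLA in minutes (ascending)."""
    # The index i breaks remaining ties so task dicts are never compared.
    heap = [(-int(t["priority"]), int(t["sla_minutes"]), i, t)
            for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    return heap

def pull(heap: List[HeapItem]) -> Optional[Task]:
    """An editor pulls the most urgent task, or None if the queue is empty."""
    return heapq.heappop(heap)[3] if heap else None

TASKS: List[Task] = [
    {"queue": "G", "priority": 1000, "sla_minutes": 6 * 60},       # 6 H
    {"queue": "G", "priority": 1100, "sla_minutes": 30},           # 30 m
    {"queue": "G", "priority": 1000, "sla_minutes": 2 * 24 * 60},  # 2 D
    {"queue": "G", "priority": 1000, "sla_minutes": 6 * 24 * 60},  # 6 D
    {"queue": "R", "priority": 1100, "sla_minutes": 20},           # 20 m
    {"queue": "R", "priority": 1100, "sla_minutes": 40},           # 40 m
]

if __name__ == "__main__":
    heap = build_heap(TASKS)
    while (task := pull(heap)) is not None:
        print(task["queue"], task["priority"], task["sla_minutes"])
```

Under this rule the three priority-1100 tasks with SLAs measured in minutes are handed out before any of the longer-SLA tasks.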
