Combining Crowd and AI to scale professional-quality translation
João Graça CTO
The Internet, 1997: 80% English
The Internet, 2017: 30% English, 20% Chinese; Spanish, Japanese, Portuguese, German and Arabic each between 3% and 8%
CSA – MT Is Unavoidable to Keep Up with Content Volumes
* Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang. “Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions.”
Everyone agrees that NMT is here to stay and is much better than SMT
BLEU chart: Moses (generic), Neural MT (generic), Moses (adapted), Neural MT (adapted). Bar values: 49.9, 43.5, 35.7, 29.6
* A Neural Network for Machine Translation, at Production Scale. Google Blog
Chart: Good / Not sure / Bad, 0% to 100%; axes: MQM, Job
Machine only: time, jobs, quality (MQM 95)
Human effort: time, jobs (MQM 95)
Job > Machine Translation > Quality Estimation (Q.E.)
High Q.E.: delivered to Customer
Low Q.E.: sent to Community Translators, then re-evaluated (Re-Eval) before reaching Customer
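The routing step in this pipeline can be sketched in a few lines. This is a minimal illustration, not the production logic: the `Job` class, the `route` function, the 0-to-1 score scale and the `QE_THRESHOLD` value are all assumptions, since the slides do not state how the Q.E. score is scaled or where the cut-off sits.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the slides do not give the threshold actually used.
QE_THRESHOLD = 0.8


@dataclass
class Job:
    source: str
    mt_output: str
    qe_score: float  # assumed to lie in [0, 1], higher meaning better


def route(job: Job, threshold: float = QE_THRESHOLD) -> str:
    """Decide where a machine-translated job goes next."""
    if job.qe_score >= threshold:
        return "customer"   # High Q.E.: deliver the MT output directly
    return "community"      # Low Q.E.: hand over to community translators


jobs = [
    Job("Hello there", "Olá", 0.93),
    Job("Sorry about that", "Eu sou pesaroso sobre aquele", 0.21),
]
destinations = [route(job) for job in jobs]
```

After a community edit, the same `route` call could be applied again to the re-evaluated score, which mirrors the Re-Eval loop in the diagram.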
Before: Initial text, Submitted text (no data points in between)
After: Initial text, Submitted text, and all changes in between: mouse clicks, key presses, timestamps
Raw data > Processed information
At 18:03:30: In nugget 3 mouseClick Cursor at 16 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace Cursor at 16 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace Cursor at 15 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace
At 18:03:35: In nugget 3 Pressed Shift Cursor at 25 Selected: 0
At 18:03:35: In nugget 3 Pressed s Cursor at 25 Selected: 0
At 18:03:35: In nugget 3 Pressed i Cursor at 26 Selected: 0
At 18:03:35: In nugget 3 Pressed e
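Turning the raw log into processed information starts with parsing each event. The sketch below assumes the log format is exactly what the slide shows ("At HH:MM:SS: In nugget N ACTION [Cursor at C Selected: S]"); the `Event` type, the `EVENT_RE` pattern and `parse_events` are hypothetical names, not part of any Unbabel tool.

```python
import re
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    time: str
    nugget: int
    action: str
    cursor: Optional[int]    # None when the log entry carries no cursor info
    selected: Optional[int]


# Pattern inferred from the sample log only; other event types may exist.
EVENT_RE = re.compile(
    r"At (?P<time>\d{2}:\d{2}:\d{2}): In nugget (?P<nugget>\d+) "
    r"(?P<action>mouseClick|Pressed \S+)"
    r"(?: Cursor at (?P<cursor>\d+) Selected: (?P<selected>\d+))?"
)


def parse_events(log: str) -> List[Event]:
    """Extract every event found in a raw editor log."""
    events = []
    for m in EVENT_RE.finditer(log):
        cursor = m.group("cursor")
        selected = m.group("selected")
        events.append(Event(
            time=m.group("time"),
            nugget=int(m.group("nugget")),
            action=m.group("action"),
            cursor=int(cursor) if cursor is not None else None,
            selected=int(selected) if selected is not None else None,
        ))
    return events


raw = (
    "At 18:03:30: In nugget 3 mouseClick Cursor at 16 Selected: 0 "
    "At 18:03:31: In nugget 3 Pressed Backspace Cursor at 16 Selected: 0 "
    "At 18:03:31: In nugget 3 Pressed Backspace "
    "At 18:03:35: In nugget 3 Pressed s Cursor at 25 Selected: 0"
)
events = parse_events(raw)
backspaces = sum(e.action == "Pressed Backspace" for e in events)
```

Counts such as `backspaces`, or the gaps between timestamps, are examples of the processed information that could then be derived per editor.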
Initial text
“Espero que esto es útil”
Submitted text
“Espero que esto sea útil”
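With only the initial and submitted text available, the edit itself can still be recovered by a word-level diff. A minimal sketch using Python's standard `difflib`; whitespace tokenisation and the `word_edits` helper are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def word_edits(initial: str, submitted: str):
    """Return the non-equal word-level operations between two texts."""
    a, b = initial.split(), submitted.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


edits = word_edits("Espero que esto es útil", "Espero que esto sea útil")
# edits == [("replace", ["es"], ["sea"])]
```

The diff shows what changed but not how the editor got there, which is the gap the keystroke log fills.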
MACHINE QE COMMUNITY
Diagram: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result
Chart: BLEU on Customer Support Tickets, Phrase-based MT vs Neural MT (axis 30 to 60). Bar values: 51.7, 48.8, 47.2, 46.9, 43.2, 34.1, 43.1
Word-Level QE: Which words are translated correctly/incorrectly?
Sentence-Level QE: How good is the entire translation?
Word-level QE example
Hey lá , eu sou pesaroso sobre aquele !
OK OK OK OK BAD BAD BAD BAD BAD
Unbabel Ticket Bad translation Good translation
source MT final
F1_MULT (axis 15 to 60)
Ugent System: 41.1
LinearQE: 45.2
NeuralQE: 47.3
StackedQE: 50.3
APE-QE: 55.7
FullStackedQE: 57.5
Chart annotation: WMT 16 WL QE Winner
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
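F1_MULT, the word-level metric on this chart, is the product of the F1 score on the OK class and the F1 score on the BAD class. A minimal sketch of the computation; the gold tags are the ones from the word-level example, while the predicted tags are invented purely for illustration.

```python
def f1(gold, pred, label):
    """F1 score for one label over two aligned tag sequences."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_mult(gold, pred):
    """Product of F1 on OK tags and F1 on BAD tags."""
    return f1(gold, pred, "OK") * f1(gold, pred, "BAD")


gold = "OK OK OK OK BAD BAD BAD BAD BAD".split()
pred = "OK OK OK BAD BAD BAD BAD BAD OK".split()  # hypothetical system output
score = f1_mult(gold, pred)  # F1-OK = 0.75, F1-BAD = 0.80, product about 0.60
```

Multiplying the two F1 scores penalises a system that labels everything OK (or everything BAD), since one of the factors then drops to zero.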
Pearson Correlation (axis 17.5 to 70)
YANDEX: 52.5
StackedQE: 54.9
APE-QE: 61.3
FullStackedQE: 65.6
Chart annotation: WMT 16 WL QE Winner
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
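Pearson correlation here compares the scores a QE system predicts with the reference scores for the same sentences. A minimal pure-Python sketch; the score lists are made-up numbers, not data from the paper, and the chart reports the coefficient multiplied by 100.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sentence-level scores for four translations.
predicted = [0.10, 0.35, 0.50, 0.80]
reference = [0.15, 0.30, 0.65, 0.70]
r = pearson(predicted, reference)
```

A value near 1 means the system ranks and spaces translations much like the reference scores do; it says nothing about whether the absolute values agree.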
Document-Level QE: How good is the entire document?
Human QE: Can we evaluate post-edit output?
Interesting numbers coming soon
Goals: Quality, Speed, Cost
Pillars
Quality Estimation
Testing Phase: Editors are tested when they sign up
Training Content: Editors get ratings for the tasks
Paid Content: The best editors have access to paid content
Expert: Specialisation layers will grow with time
Evaluators, Annotators
Queue: G, G, G, G, R, R
Tasks/time: 6 m, 2 m, 10 m, 12 m, 18 m, 45 m
SLA: 30 m, 6 H, 2 D, 6 D, 20 m, 40 m
Priority: 1100, 1000, 1000, 1000, 1100, 1100
Topics
Pull
Editors
Rating: 4.2, 3.8, 4.3, 4.8
Native
Topics
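The task attributes above suggest a pull model in which each editor takes the most urgent task next. A minimal sketch under several assumptions the slides do not confirm: a larger Priority value means more urgent, ties are broken by the tighter SLA, and the values on the slide line up by column position. `build_queue` and `pull` are hypothetical names.

```python
import heapq


def build_queue(tasks):
    """Build a heap ordered by priority (high first), then SLA (tight first)."""
    heap = [(-t["priority"], t["sla_minutes"], t["id"]) for t in tasks]
    heapq.heapify(heap)
    return heap


def pull(heap):
    """An editor pulls the next task id, or None when the queue is empty."""
    if not heap:
        return None
    return heapq.heappop(heap)[2]


# Slide values, with the SLA converted to minutes (6 H = 360, 2 D = 2880, 6 D = 8640).
priorities = [1100, 1000, 1000, 1000, 1100, 1100]
slas = [30, 360, 2880, 8640, 20, 40]
tasks = [
    {"id": i, "priority": p, "sla_minutes": s}
    for i, (p, s) in enumerate(zip(priorities, slas))
]

queue = build_queue(tasks)
order = [pull(queue) for _ in range(len(tasks))]
# order == [4, 0, 5, 1, 2, 3]
```

Editor attributes such as Rating, Native and Topics could be added as filters on which tasks a given editor is allowed to pull, which this sketch leaves out.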
Regular distribution: 3.8
Smart distribution: 4.6
Improved rating
Timeline: MT, WAITING, Translator 1, WAITING, Translator 2, DELIVERY
Less waiting
Spelling, Tone, Formality, Consistency
External NLP Services: Spell Check, Syntax Parser, Word Aligner
Client Rule
Annotation Tool
Eval
Annotated
Learn
MACHINE + HUMAN
CORE API CHAT API CYRANO API
Customer Service Conversational
MESSAGING API
Language Engine
Bots
Tickets can come in many languages.
Unbabel’s Domain Adapted Machine Translation
Distributed Human Translation
English-speaking agent
Zendesk app connected with Unbabel API
Unbabel can offer the same Customer Satisfaction as native agents
SPEED, QUALITY
Minutes
Editors train the MT engine
Community of 50K+ translators
Skips Humans
HUMAN EFFORT TIME JOBS MQM 95
joao@unbabel.com