Combining Crowd and AI to scale professional-quality translation

SLIDE 1

Building universal understanding

Combining Crowd and AI to scale professional-quality translation

João Graça, CTO

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 41

slide-2
SLIDE 2

The internet, 1997

  • English: 80%

SLIDE 3

The internet, 2017

  • English: 30%
  • Chinese: 20%
  • Spanish: 8%
  • Japanese: 6%
  • Portuguese: 5%
  • German: 4%
  • Arabic: 3%

SLIDE 4

Language barriers = trade barriers

“Everyone speaks English” costs the UK £48B, or 3.5% of UK GDP, every year

Just 12% of EU retailers sell online to other EU countries

Just 15% of EU consumers buy online from other EU countries

SLIDE 5

Available Solutions

  • Machine Translation: affordable and fast, but quality not good enough
  • Professional Translation: expensive and slow

Lack of fast, affordable translation with human quality

SLIDE 6

“All translation firms together are able to translate far less than 1% of relevant content produced every day”

CSA – MT Is Unavoidable to Keep Up with Content Volumes

SLIDE 7

Will AI solve translation?

[Chart: MACHINE ONLY. Labels: TIME, JOBS, QUALITY, MQM 95]

SLIDE 8

Will AI solve translation?

[Chart: HUMAN EFFORT. Labels: TIME, JOBS, MQM 95]

SLIDE 9

Quality per Job

[Chart: MQM (0% to 100%) by job (6 to 30), with jobs classed as Good, Not sure or Bad]

SLIDE 10

Unbabel Pipeline

  • Original customer request
  • Machine Translation
  • Quality Estimation (Q.E.)
  • High Q.E.: translated customer request
  • Low Q.E.: Community Translators, then Re-Eval
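The routing on this slide can be sketched as a short loop: translate, estimate quality, and send low-scoring output to human editors until a re-evaluation passes. This is only a minimal sketch under stated assumptions. The names `Job`, `run_pipeline`, `threshold` and `max_passes`, the 0.9 cut-off, and the three callables are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    source: str
    translation: str = ""
    qe_score: float = 0.0
    human_passes: int = 0  # how many times community editors touched the job


def run_pipeline(
    job: Job,
    translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    threshold: float = 0.9,  # hypothetical "high Q.E." cut-off
    max_passes: int = 3,     # hypothetical cap on human re-edits
) -> Job:
    """Machine-translate, then loop through human post-editing while Q.E. is low."""
    job.translation = translate(job.source)
    job.qe_score = estimate_quality(job.source, job.translation)
    # Low Q.E.: hand over to community translators, then re-evaluate.
    while job.qe_score < threshold and job.human_passes < max_passes:
        job.translation = post_edit(job.source, job.translation)
        job.human_passes += 1
        job.qe_score = estimate_quality(job.source, job.translation)
    return job


if __name__ == "__main__":
    # Stub components, for illustration only.
    demo = run_pipeline(
        Job(source="I hope this is useful"),
        translate=lambda src: "Espero que esto es útil",
        estimate_quality=lambda src, tgt: 0.95 if "sea" in tgt else 0.5,
        post_edit=lambda src, tgt: tgt.replace(" es ", " sea "),
    )
    print(demo.translation, demo.qe_score, demo.human_passes)
```

Passing the three stages in as callables keeps the routing logic independent of any particular MT engine or QE model.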

SLIDE 11

Machine Translation Pipeline

[Diagram components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]

SLIDE 12

Customer Support Tickets

Customer Adaptation

[Bar chart: MQM (axis from 30 to 100) for SMT, NMT, Customized and Professional; bar values 94,0, 80,0, 65,0 and 50,0]

SLIDE 13

Quality Estimation

  • Word-Level QE: Which words are translated correctly/incorrectly?
  • Sentence-Level QE: How good is the entire translation?

SLIDE 14

Word-level QE example

Hey lá , eu sou pesaroso sobre aquele !

OK OK OK OK BAD BAD BAD BAD BAD

Quality Estimation
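As a rough illustration of how word-level tags like the ones on this slide might be consumed, the sketch below pairs each token with its tag, groups consecutive BAD tags into spans, and computes the share of OK tokens. The helper names `bad_spans` and `ok_ratio` are hypothetical, and the OK share is only an illustrative summary, not the sentence-level QE model described in the talk.

```python
from typing import List, Tuple


def bad_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive BAD tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "BAD" and start is None:
            start = i
        elif tag != "BAD" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def ok_ratio(tags: List[str]) -> float:
    """Fraction of tokens tagged OK (0.0 for an empty sentence)."""
    return tags.count("OK") / len(tags) if tags else 0.0


# Tokens and tags in the order they appear on the slide.
tokens = "Hey lá , eu sou pesaroso sobre aquele !".split()
tags = ["OK"] * 4 + ["BAD"] * 5

for start, end in bad_spans(tags):
    print("needs editing:", " ".join(tokens[start:end]))
print("share of OK tokens:", round(ok_ratio(tags), 2))
```

Spans rather than single words are a convenient unit if the tags are later used to highlight text for an editor.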

SLIDE 15

Unbabel Ticket: source, MT (bad translation), final (good translation)

QE Training
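One common way to turn (MT, final) pairs into word-level training labels is to align the two token sequences and mark every MT token that survives into the final text as OK and the rest as BAD. The sketch below does this with Python's standard `difflib`. It is an assumption about how such labels could be derived, not a description of Unbabel's training setup, and `word_labels` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List


def word_labels(mt_tokens: List[str], final_tokens: List[str]) -> List[str]:
    """Label each MT token OK if it is matched in the post-edited text, else BAD."""
    labels = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block marks a run of tokens shared by both sequences.
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"
    return labels


# Example pair taken from the Keystroke Analysis slide of this deck.
mt = "Espero que esto es útil".split()
final = "Espero que esto sea útil".split()
print(list(zip(mt, word_labels(mt, final))))
```

A production setup would more likely use an edit-distance aligner tuned for translation, but the idea of deriving labels from the post-edit is the same.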

SLIDE 16

QE in the Pipeline

  • Job
  • Machine Translation
  • Quality Estimation (Q.E.)
  • High Q.E.: Customer
  • Low Q.E.: Community Translators, then Re-Eval

  • Document-Level QE: How good is the entire document?
  • Human QE: Can we evaluate post-edit output?

SLIDE 17

Data Generation Engine

[Diagram components: Job, Machine Translation, Quality Estimation (Q.E.), Customer; Community Translators, Quality Estimation (Q.E.), Customer]

SLIDE 18

Data Generation Engine

  • Before: initial text and submitted text, NO DATA POINTS
  • After: initial text and submitted text, with data points: mouse clicks, key presses, timestamps

SLIDE 19

Raw data

At 18:03:30: In nugget 3 mouseClick Cursor at 16 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace Cursor at 16 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace Cursor at 15 Selected: 0
At 18:03:31: In nugget 3 Pressed Backspace Cursor at 14 Selected: 0
At 18:03:35: In nugget 3 Pressed Shift Cursor at 25 Selected: 0
At 18:03:35: In nugget 3 Pressed s Cursor at 25 Selected: 0
At 18:03:35: In nugget 3 Pressed i Cursor at 26 Selected: 0
At 18:03:35: In nugget 3 Pressed e Cursor at 27 Selected: 0

Processed information

Initial text: “Espero que esto es útil”

  • Deleted word “es”
  • Inserted word “sea”

Submitted text: “Espero que esto sea útil”

Keystroke Analysis
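The raw log entries on this slide follow a regular pattern, so a first processing step could be to parse them into structured events. The sketch below is a minimal illustration that assumes every entry has exactly the shape shown on the slide. The `Event` class, `EVENT_RE` and `parse_events` are hypothetical names, not part of any Unbabel tooling.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Pattern inferred from the log lines on the slide (an assumption).
EVENT_RE = re.compile(
    r"At (?P<time>\d{2}:\d{2}:\d{2}): In nugget (?P<nugget>\d+) "
    r"(?:(?P<click>mouseClick)|Pressed (?P<key>\S+)) "
    r"Cursor at (?P<cursor>\d+) Selected: (?P<selected>\d+)"
)


@dataclass
class Event:
    time: str
    nugget: int
    kind: str            # "click" or "key"
    key: Optional[str]   # key name for key presses, None for clicks
    cursor: int
    selected: int


def parse_events(raw: str) -> List[Event]:
    """Extract every log entry from raw text, whether or not it is line-broken."""
    return [
        Event(
            time=m["time"],
            nugget=int(m["nugget"]),
            kind="click" if m["click"] else "key",
            key=m["key"],
            cursor=int(m["cursor"]),
            selected=int(m["selected"]),
        )
        for m in EVENT_RE.finditer(raw)
    ]


raw_log = (
    "At 18:03:30: In nugget 3 mouseClick Cursor at 16 Selected: 0 "
    "At 18:03:31: In nugget 3 Pressed Backspace Cursor at 16 Selected: 0 "
    "At 18:03:35: In nugget 3 Pressed s Cursor at 25 Selected: 0"
)
events = parse_events(raw_log)
backspaces = sum(1 for e in events if e.key == "Backspace")
print(len(events), "events,", backspaces, "backspace(s)")
```

Structured events like these could then feed counts such as deletions per nugget or time between keystrokes.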

SLIDE 20
  • Editors Pool
  • Initial Text (MT)
  • Editor Assignment
  • Custom Editing Interfaces
  • Constant Quality Evaluation

Professional translation

Unbabel pillars

Speed, Quality, Cost

SLIDE 21

50,000 Users

Unbabel Community

SLIDE 22

Editors Pool

  • 1. Testing Phase: first tests right after signup
  • 2. Training Content: editors get rated with training tasks
  • 3. Paid Work: only the best rated editors have access to customer tasks
  • 4. Expert Editor: more specialization layers will be created (Evaluators, Annotators)

SLIDE 23

Evaluation Tool

Document Level Human QE

SLIDE 24

Deep Annotations

SLIDE 25

Error Analysis

SLIDE 26

QE for Annotation

Pre-fill with word level QE

SLIDE 27

Editors Profiling

SLIDE 28

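Editor Assignment

[Diagram: editors pull tasks from a queue. Queue labels: SLA (30 m, 6 H, 2 D, 6 D, 20 m, 40 m), Priority (1100, 1000, 1000, 1000, 1100, 1100), Topics, Tasks/time (6 m, 2 m, 10 m, 12 m, 18 m, 45 m), status marks (G, G, G, G, R, R). Editor labels: Rating (4.2, 3.8, 4.3, 4.8), Native, Topics]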

SLIDE 29

Editor Assignment

  • Regular distribution: 3.8 (old rating)
  • Smart distribution: 4.6 (improved rating)

SLIDE 30

Post-Editing Interfaces

SLIDE 31

QE on Interfaces

SLIDE 32

Post-Editing Interfaces

SLIDE 33

Time Spent on Job

[Timeline diagram: MT, Translator 1 and Translator 2 along a TIME axis ending at DELIVERY, with WAITING periods]

SLIDE 34

Time Spent on Job: Mobile

[Timeline diagram: MT, Translator 1 and Translator 2 along a TIME axis ending at DELIVERY, with WAITING periods; annotation: 20%]

SLIDE 35

Smartcheck

  • Checks: Spelling, Grammar, Consistency, Terminology, Register
  • External NLP Services: Spell Check, Syntax Parser, Word Aligner

Style Guides

Customer

Learning

Annotations

SLIDE 36

Smartcheck (QE Version)

  • Checks: Spelling, Grammar, Consistency, Terminology, Register

Style Guides

Customer

Learning

Annotations

Q.E.

Quality Estimation

SLIDE 37

MACHINE + HUMAN

CORE API CHAT API CYRANO API

Customer Service Conversational

MESSAGING API

Language OS

Language Engine

Bots

SLIDE 38

Unbabel for Customer Service

Unbabel adapts to any workflow

  • English-speaking agent
  • Unbabel’s Machine Translation
  • Distributed Human Translation

API

SLIDE 39

Customer Replies: Speed & Quality

  • Quality: 94
  • Speed: 20 minutes

SLIDE 40

Unbabel Chat

Native speaking in multiple languages

API

SLIDE 41

Chat

SLIDE 42

Chat Translation Flow

  • Editors train the MT engine
  • Community of 100K+ translators
  • Skips Humans

SLIDE 43

Chat Messages: Speed & Quality

  • Quality: 90
  • Speed: 2 minutes
  • MT: 80%

SLIDE 44

Other Use Cases

  • Reviews
  • Travel descriptions
  • Video
  • SEO
  • Newsletters

SLIDE 45

We’re Hiring

https://unbabel.com/careers/
