Combining Crowd and AI to scale professional-quality translation


  1. Building universal understanding. Combining Crowd and AI to scale professional-quality translation. João Graça, CTO. Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing, Boston, March 21, 2018 | Page 41

  2. The internet, 1997: 80% English

  3. The internet, 2017: 30% English, 20% Chinese, 8% Spanish, 6% Japanese, 5% Portuguese, 4% German, 3% Arabic

  4. Language barriers = trade barriers. “Everyone speaks English” costs the UK £48B (3.5% of UK GDP) every year. Just 12% of EU retailers sell online to other EU countries. Just 15% of EU consumers buy online from other EU countries.

  5. Available Solutions: lack of fast, affordable translation with human quality. Machine Translation: affordable, fast, quality not good enough. Professional Translation: expensive, slow.

  6. “All translation firms together are able to translate far less than 1% of relevant content produced everyday” (CSA, “MT Is Unavoidable to Keep Up with Content Volumes”)

  7. Will AI solve translation? [Figure: quality of jobs over time with machine only; labels: JOBS, QUALITY, MQM 95, TIME]

  8. Will AI solve translation? [Figure: human effort on jobs over time; labels: JOBS, HUMAN EFFORT, MQM 95, TIME]

  9. Quality per Job [Figure: MQM (0% to 100%) per job (jobs 0 to 30); legend: Good, Not sure, Bad]

  10. Unbabel Pipeline: Original customer request → Machine Translation → Quality Estimation. High Q.E.: translated customer request. Low Q.E.: Community Translators, then Q.E. Re-Eval.
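
The routing on the Unbabel Pipeline slide can be read as a simple loop: machine-translate, estimate quality, and send the job to community translators only while the estimate stays low. The sketch below is a minimal illustration in Python, not Unbabel's implementation. `run_pipeline`, the callables passed to it, the `threshold` value, `max_rounds`, and the English source sentence are all hypothetical choices made for the example.

```python
from typing import Callable, Tuple


def run_pipeline(
    source: str,
    translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    threshold: float = 0.8,  # hypothetical cut-off between "low" and "high" Q.E.
    max_rounds: int = 3,     # hypothetical cap on human editing rounds
) -> Tuple[str, int]:
    """Return the final translation and the number of human editing rounds used."""
    translation = translate(source)
    rounds = 0
    # Low Q.E.: hand the text to a community translator, then re-evaluate.
    while estimate_quality(source, translation) < threshold and rounds < max_rounds:
        translation = post_edit(source, translation)
        rounds += 1
    # High Q.E. (or round cap reached): the translation goes back to the customer.
    return translation, rounds


# Toy stand-ins so the sketch runs end to end; real components would be models and people.
def toy_mt(source: str) -> str:
    return "Espero que esto es útil"


def toy_qe(source: str, translation: str) -> float:
    return 0.4 if " es " in translation else 0.95


def toy_editor(source: str, translation: str) -> str:
    return translation.replace(" es ", " sea ")


final_text, human_rounds = run_pipeline("I hope this is helpful", toy_mt, toy_qe, toy_editor)
```

With these stand-ins the first estimate is low, so one human editing round is used before the re-evaluation passes. A translation that scores above the threshold on the first estimate is returned with zero rounds.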

  11. Machine Translation Pipeline [Diagram components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]

  12. Customer Adaptation: Customer Support Tickets [Bar chart of MQM (axis 30 to 100); bars labeled SMT, NMT, Customized, Professional; values shown: 50.0, 65.0, 80.0, 94.0]

  13. Quality Estimation. Word-Level QE: Which words are translated correctly/incorrectly? Sentence-Level QE: How good is the entire translation?

  14. Quality Estimation: word-level QE example. “Hey lá , eu sou pesaroso sobre aquele !”, with each token tagged OK or BAD (five BAD, four OK)

  15. QE Training [Diagram labels: Unbabel ticket with source, MT, and final texts; Bad translation; Good translation]
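
One common way to obtain word-level training labels from tickets like these is to align the MT output with the final post-edited text, tagging every MT token the editor kept as OK and every token that was changed or removed as BAD. Whether Unbabel derives its labels exactly this way is an assumption. The function name `tag_mt_tokens` is hypothetical, and the sentence pair is the one shown on the Keystroke Analysis slide.

```python
from difflib import SequenceMatcher
from typing import List


def tag_mt_tokens(mt: str, final: str) -> List[str]:
    """Tag each whitespace-separated MT token as OK (kept) or BAD (edited away)."""
    mt_tokens = mt.split()
    final_tokens = final.split()
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    # Every MT token inside a matching block survived post-editing unchanged.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


tags = tag_mt_tokens("Espero que esto es útil", "Espero que esto sea útil")
```

Here only the token “es”, which the editor replaced with “sea”, ends up tagged BAD.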

  16. QE in the Pipeline [Diagram: Customer Job → Machine Translation → Quality Estimation; high Q.E. or low Q.E.; Community Translators; Q.E. Re-Eval]. Document-Level QE: How good is the entire document? Human QE: Can we evaluate post-edit output?

  17. Data Generation Engine [Diagram components: Customer Job, Machine Translation, Quality Estimation, Community Translators, Quality Estimation, Customer]

  18. Data Generation Engine. Before: initial text, NO DATA POINTS, submitted text. After: initial text, with data points (mouse clicks, key presses, timestamps), submitted text.

  19. Keystroke Analysis. Raw data (all events in nugget 3, Selected: 0): At 18:03:30: mouseClick, cursor at 16. At 18:03:31: Pressed Backspace, cursor at 16; Pressed Backspace, cursor at 15; Pressed Backspace, cursor at 14. At 18:03:35: Pressed Shift, cursor at 25; Pressed s, cursor at 25; Pressed i, cursor at 26; Pressed e, cursor at 27. Processed information: initial text “Espero que esto es útil”; deleted word “ es”; inserted word “sea”; submitted text “Espero que esto sea útil”.
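
Logged events of this kind are enough to reconstruct what the editor did. The sketch below replays a stream of key events on the initial text to recover the submitted text. The `KeyEvent` structure and the `replay` function are hypothetical, and the cursor positions in the example are chosen so that the replay is self-consistent; they are not the values from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    key: str     # "Backspace", a single printable character, or another key name
    cursor: int  # cursor position (characters from the start) before the key press


def replay(initial: str, events: List[KeyEvent]) -> str:
    """Apply the logged key events to the initial text, one at a time."""
    text = initial
    for ev in events:
        if ev.key == "Backspace":
            if ev.cursor > 0:
                # Remove the character immediately before the cursor.
                text = text[: ev.cursor - 1] + text[ev.cursor:]
        elif len(ev.key) == 1:
            # Insert the typed character at the cursor.
            text = text[: ev.cursor] + ev.key + text[ev.cursor:]
        # Other events (e.g. "Shift", "mouseClick") do not change the text.
    return text


events = [
    KeyEvent("mouseClick", 18),  # hypothetical: click right after the word "es"
    KeyEvent("Backspace", 18),
    KeyEvent("Backspace", 17),
    KeyEvent("s", 16),
    KeyEvent("e", 17),
    KeyEvent("a", 18),
]
submitted = replay("Espero que esto es útil", events)
```

Comparing the replayed text with the initial text then yields the processed view of the edit: one word deleted and one word inserted.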

  20. Professional translation: Cost, Speed, Quality. Unbabel pillars: Editors Pool; Initial Text (MT); Editor Assignment; Custom Editing Interfaces; Constant Quality Evaluation

  21. Unbabel Community: 50,000 users

  22. Editors Pool: (1) Testing Phase: first tests right after signup; (2) Training Content: editors get rated with training tasks; (3) Paid Work: only the best rated editors have access to customer tasks; (4) Expert (Editor, Annotators, Evaluators): more specialization layers will be created

  23. Evaluation Tool: Document Level Human QE

  24. Deep Annotations

  25. Error Analysis

  26. QE for Annotation: pre-fill with word-level QE

  27. Editors Profiling

  28. Editor Assignment [Figure: a task queue (columns: Topics, Priority, SLA, Tasks/time) from which editors (columns: Rating, Native, Topics) pull tasks; individual table values are not recoverable]
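
A pull-based assignment like the one on the Editor Assignment slide can be sketched as follows: when an editor asks for work, they receive the most urgent queued task whose topic they cover, provided their rating is high enough for customer tasks. This is only an illustration under assumptions. The `Task` and `Editor` fields, the `pull_task` function, the `min_rating` cut-off, and the ordering rule (highest priority first, then shortest SLA) are hypothetical, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Task:
    task_id: str
    topic: str
    priority: int     # assumed: a larger number means more important
    sla_minutes: int  # assumed: minutes left before the SLA is missed


@dataclass
class Editor:
    rating: float
    topics: Set[str]


def pull_task(editor: Editor, queue: List[Task], min_rating: float = 4.0) -> Optional[Task]:
    """Remove and return the best matching task for this editor, or None."""
    if editor.rating < min_rating:
        return None  # only the best rated editors get customer tasks
    eligible = [t for t in queue if t.topic in editor.topics]
    if not eligible:
        return None
    # Highest priority first; among equal priorities, the tightest SLA first.
    best = min(eligible, key=lambda t: (-t.priority, t.sla_minutes))
    queue.remove(best)
    return best


queue = [
    Task("t1", "G", priority=1000, sla_minutes=10),
    Task("t2", "G", priority=1100, sla_minutes=6),
    Task("t3", "R", priority=1100, sla_minutes=18),
]
picked = pull_task(Editor(rating=4.3, topics={"G"}), queue)
```

In this example the editor covers only topic G, so the higher-priority G task is handed out and the other two stay in the queue.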

  29. Editor Assignment: smart distribution vs. regular distribution. Old rating: 3.8. Improved rating: 4.6.

  30. Post-Editing Interfaces

  31. QE on Interfaces

  32. Post-Editing Interfaces

  33. Time Spent on Job [Figure: delivery time broken down into MT, Translator 1, Translator 2, and waiting segments]

  34. Time Spent on Job: Mobile [Figure: delivery time broken down into MT, Translator 1, Translator 2, and waiting segments; -20%]
