Post-editing Effort for English to Arabic Hybrid Machine Translation

SLIDE 1

An Empirical Study:

Post-editing Effort for English to Arabic Hybrid Machine Translation

Hassan Sajjad, Francisco Guzman, Stephan Vogel

Qatar Computing Research Institute, HBKU

SLIDE 2

Introduction

  • Old Arabic documents
  • Translation of metadata from English to Arabic

SLIDE 3

Traditional Translation Process

[Diagram: Translators, Translation Company, and British Library, with a translation memory (TM)]

SLIDE 4

Problem

  • Many small, varied documents
  • Little overlap at the sentence/segment level
  • Few translation memory matches

– Much of the text must be translated from scratch

  • Time- and cost-inefficient

SLIDE 5

Solution: Hybrid Machine Translation

  • TM: readily available, high-precision translations
  • CMT: 100% recall

Hybrid MT = Translation Memory + Customized MT: combines the benefits of both!

SLIDE 6

Hybrid MT System

  • Translation Memory

– First pass: use strict matching to translate known words and phrases

  • Customized Machine Translation

– Second pass: translate the remaining text using the machine translation system

[Diagram: TM → CMT]
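The two passes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TM is modeled as a hypothetical dictionary from source-token tuples to target strings, "strict matching" is taken to mean exact, longest-first phrase lookup, and `mt` is a stand-in callable for the CMT system. Concatenating pieces in source order also ignores the reordering a real English-to-Arabic decoder would perform.

```python
from typing import Callable, Dict, List, Tuple


def hybrid_translate(
    tokens: List[str],
    tm: Dict[Tuple[str, ...], str],
    mt: Callable[[List[str]], str],
    max_len: int = 5,
) -> str:
    """Two-pass sketch: strict TM phrase matches first, MT for the gaps."""
    output: List[str] = []
    pending: List[str] = []  # untranslated tokens waiting for the MT pass
    i = 0
    while i < len(tokens):
        match = None
        # First pass: look for the longest exact TM phrase starting at i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in tm:
                match = (n, tm[phrase])
                break
        if match is None:
            pending.append(tokens[i])
            i += 1
            continue
        # Send the preceding gap to the MT stand-in, then emit the TM hit.
        if pending:
            output.append(mt(pending))
            pending = []
        output.append(match[1])
        i += match[0]
    # Second pass for any text left after the last TM match.
    if pending:
        output.append(mt(pending))
    return " ".join(output)
```

With a toy TM entry mapping ("british", "library") to المكتبة البريطانية and a fake MT function that only tags its input, `hybrid_translate("letter to the british library".split(), tm, fake_mt)` emits the TM translation for the matched phrase and hands "letter to the" to the MT stand-in.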

SLIDE 7

Aiming Higher: Post-editing for Quality

[Diagram: TM + CMT → Hybrid MT → Post Editors]

  • High quality
  • High consistency
  • Cost- and time-effective

SLIDE 8

Customized Machine Translation

  • A statistical machine translation system

– Trained specifically for the domain of the text to be translated

  • General practice

– Use Moses
– Train on the translation memory data
– Follow the recipe of a competition-grade system to ensure high quality

SLIDE 9

English to Arabic CMT

  • The best competition-grade pipeline involves:

– Arabic (de-)tokenization
    • Splitting morphologically rich words into smaller segments, and vice versa
    • +1.5 BLEU points improvement

– Arabic (de-)normalization
    • Mapping different forms of a letter to one form, and vice versa
    • +0.5 BLEU points improvement

This ensures high quality but does not guarantee less frustration for post-editors.
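To make (de-)tokenization concrete, here is a toy round trip in Python. It is a hypothetical rule-based splitter that handles only two prefixes, the conjunction و ("and") and the article ال ("the"), marking each split piece with a trailing "+"; the prefix list, the `MIN_STEM` threshold, and the marker convention are all assumptions. A competition-grade pipeline would use a trained segmenter such as Farasa (see References), since fixed rules like these wrongly split words that merely begin with the same letters.

```python
from typing import List

# Hypothetical toy prefix list; real segmenters handle many more clitics.
PREFIXES = ["و", "ال"]  # "wa" (and), "al" (the); و precedes ال in a word
MIN_STEM = 3  # assumed: do not split if fewer than 3 letters would remain


def tokenize(word: str) -> List[str]:
    """Split known prefixes off a word, marking each with a trailing '+'."""
    pieces: List[str] = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            pieces.append(prefix + "+")
            word = word[len(prefix):]
    pieces.append(word)
    return pieces


def detokenize(tokens: List[str]) -> str:
    """Reattach every token ending in '+' to the token that follows it."""
    words: List[str] = []
    glue = False  # True when the previous token was a marked prefix
    for tok in tokens:
        piece = tok[:-1] if tok.endswith("+") else tok
        if glue:
            words[-1] += piece
        else:
            words.append(piece)
        glue = tok.endswith("+")
    return " ".join(words)
```

In this toy, `tokenize("والكتاب")` ("and the book") gives `["و+", "ال+", "كتاب"]`, and because every split is marked, `detokenize` restores the original words exactly.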

SLIDE 10

Why?

The translation output requires:

  • De-tokenization and de-normalization
  • De-normalization introduces character-level errors

– Frustrating for the post-editor to correct
– Time-inefficient
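Why de-normalization causes character-level errors can be seen from a small Python sketch. The normalization map below is an assumed, commonly used scheme (alef variants to bare alef, ta marbuta to ha, alef maqsura to ya), not necessarily the one used in this work, and `naive_denormalize` is a hypothetical rule that restores a hamza-alef at the start of every word. Because normalization is many-to-one, any de-normalizer has to guess.

```python
# Assumed normalization map (a common scheme, not necessarily the authors').
NORMALIZE = {
    "أ": "ا", "إ": "ا", "آ": "ا",  # alef variants -> bare alef
    "ة": "ه",                      # ta marbuta -> ha
    "ى": "ي",                      # alef maqsura -> ya
}


def normalize(text: str) -> str:
    """Map every variant character to its single normalized form."""
    return "".join(NORMALIZE.get(ch, ch) for ch in text)


def naive_denormalize(text: str) -> str:
    """Hypothetical de-normalizer: put a hamza-alef back at each word start."""
    return " ".join(
        "أ" + w[1:] if w.startswith("ا") else w for w in text.split()
    )
```

For example, على ("on") and the name علي (Ali) normalize to the same string, and the rule above restores أحمد correctly but turns إسلام into the misspelled أسلام: a single-character error that a post-editor then has to fix.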

SLIDE 11

Recommended Practices for English-Arabic CMT

  • Don’t normalize

But

  • Always tokenize

– Improves word coverage
– Better translations

SLIDE 12

Let’s Talk Numbers: The British Library (BL) Case!

We compare:

  • Translation Memory (TM) only
  • Hybrid MT (TM + CMT)

Also:

  • Translator
  • Hybrid MT + Post-editing (PE)

Looking at:

  • Effectiveness
  • Quality
  • Consistency

SLIDE 13

Data

  • 1000 documents

– 90k parallel sentences/segments
– 489k tokens

  • 953 documents for training

– Remaining 47 documents for tuning and testing

SLIDE 14

Effectiveness of TM*

  • Exact match: 50% of segments, BUT covers only 7% of words
  • Fuzzy match: 84% of segments, BUT covers only 13.5% of words

More than 85% of words still need to be translated!

* Based on an assessment over X documents
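The gap between segment coverage and word coverage follows from segment length: short segments match the TM far more often than long ones. The Python sketch below computes both shares for exact matches; the segments and TM entries are invented for illustration only and are not data from this study.

```python
from typing import Dict, List, Tuple

# Invented toy data: two short catalogue-style segments and two long ones.
SEGMENTS = [
    "Letter",
    "Map",
    "Report on the pearl fisheries of the Gulf",
    "Correspondence concerning the appointment of a new agent",
]
TM = {"Letter": "رسالة", "Map": "خريطة"}  # hypothetical TM entries


def tm_coverage(segments: List[str], tm: Dict[str, str]) -> Tuple[float, float]:
    """Return (share of segments, share of words) covered by exact TM matches."""
    matched = [s for s in segments if s in tm]
    seg_share = len(matched) / len(segments)
    matched_words = sum(len(s.split()) for s in matched)
    total_words = sum(len(s.split()) for s in segments)
    return seg_share, matched_words / total_words
```

On this toy input, `tm_coverage(SEGMENTS, TM)` finds that half of the segments match, but they contain only 2 of 18 words (about 11%).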

SLIDE 15

Effectiveness of CMT

  • 100% of segments AND 99.9% of words translated!

SLIDE 16

Effectiveness of Hybrid MT

  • High precision

– TM exact matches

  • High recall

– CMT produces high-quality translations for the rest

SLIDE 17

Assessing Quality

  • BLEU

– Compares the output to a ‘reference’ translation

  BLEU (TM matching)   Strict   Partial
  TM                     7.07     21.01
  TM + CMT              54.60     48.54

CMT alone: 53.90 BLEU
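For reference, corpus-level BLEU can be sketched in Python as below: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty, on a 0 to 100 scale. This assumes one pre-tokenized reference per sentence and no smoothing; the slides do not say which scorer or tokenization produced the numbers above, and in practice a standard tool such as sacreBLEU or Moses' multi-bleu.perl would be used.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus BLEU with one reference per sentence, on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: credit each n-gram at most as often as the reference has it.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

An output identical to its reference scores 100; an output that is a correct but half-length prefix of its reference keeps perfect precision yet is capped at about 36.8 by the brevity penalty.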

SLIDE 18

Assessing Quality

  • TER: Translation Error Rate

– How much effort is needed to reach a perfect translation?
– Compares the output to a ‘reference’ translation

Hybrid MT can improve beyond that!

[Chart: percentage of effort required, TM vs. Hybrid MT]
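TER counts the word edits needed to turn the output into the reference, divided by the reference length. The Python sketch below is a simplification: it keeps insertions, deletions, and substitutions but drops TER's block-shift operation, so it amounts to a word error rate normalized by reference length and may score higher than true TER on reordered output.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete the hypothesis word
                curr[j - 1] + 1,         # insert the reference word
                prev[j - 1] + (h != r),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]


def ter_no_shifts(hyp: List[str], ref: List[str]) -> float:
    """Edits needed to turn hyp into ref, as a percentage of reference length."""
    return 100 * edit_distance(hyp, ref) / len(ref)
```

For instance, turning "a x c" into the reference "a b c d" takes one substitution and one insertion, i.e. 2 edits over 4 reference words, or 50%.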

SLIDE 19

Assessing Quality

  • TER vs. post-editing effort

– Post-editing the Hybrid MT output gives a similar effort estimate

[Chart: percentage of effort required for TM, Hybrid MT, and PE on Hybrid MT]

* PE is based on an assessment over 4 documents, using a junior translator

SLIDE 20

Consistency of Hybrid MT*

  • We compared Hybrid MT against a junior translator
  • We measured consistency with reference translations

Hybrid MT is more consistent with the reference translations

[Chart: overlap with reference translation, Hybrid MT vs. Translator]

* Based on an assessment over 4 documents

SLIDE 21

Speedup of Hybrid MT*

  • We compared Hybrid MT + PE against a junior translator

Hybrid MT + PE is 30% more efficient

[Chart: time taken to translate (minutes), Translator vs. Hybrid MT + PE]

* Based on an assessment over 4 documents

SLIDE 22

Conclusion

  • Hybrid MT

– High precision and high recall

  • Hybrid MT plus post-editing

– Efficient in terms of both time and cost
– Improves consistency

  • Customized MT for English-Arabic

– Don’t normalize, but always tokenize

SLIDE 23

References

  • Ahmed Abdelali, Kareem Darwish, Nadir Durrani, and Hamdy Mubarak. Farasa: A Fast and Furious Segmenter for Arabic. In NAACL-2016, San Diego, US.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open Source Toolkit for Statistical Machine Translation. In ACL-2007, Prague, Czech Republic.
  • Hassan Sajjad, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Kenton Murray, Fahad Al Obaidli, and Stephan Vogel. QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic Spoken Language Translation. In IWSLT-2013, Heidelberg, Germany.
SLIDE 24

Thank you