Post-editing Effort for English to Arabic Hybrid Machine Translation

An Empirical Study: Post-editing Effort for English to Arabic Hybrid Machine Translation
Hassan Sajjad, Francisco Guzman, Stephan Vogel
Qatar Computing Research Institute, HBKU

Introduction
• Old Arabic documents
• Translation of metadata from English to Arabic

Traditional Translation Process
[Diagram labels: TM, Translation Company, British Library, Translators]

Problem
• Various small documents
• Little overlap at sentence/segment level
• Few translation memory matches
  – A lot needs to be translated from scratch
• Time and cost inefficient

Solution: Hybrid Machine Translation
• TM: high precision, readily available translations
• CMT: 100% recall
• Hybrid MT: combines the benefits of both Translation Memory and Customized MT!

Hybrid MT System
• Translation Memory (TM)
  – First pass: use strict matching to translate known words and phrases
• Customized Machine Translation (CMT)
  – Second pass: translate the remaining text using a machine translation system
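The two passes above can be sketched as a small control-flow example. This is only an illustration under stated assumptions, not the authors' implementation: the function name `hybrid_translate`, the dictionary-based TM, the `max_len` phrase limit, and the placeholder target strings are all hypothetical, and the sketch ignores target-side reordering.

```python
from typing import Callable, Dict, List


def hybrid_translate(
    segment: str,
    tm: Dict[str, str],
    cmt: Callable[[str], str],
    max_len: int = 5,  # hypothetical cap on TM phrase length, in tokens
) -> str:
    """Two-pass sketch: strict TM lookup first, CMT for the remaining text."""
    tokens = segment.split()
    output: List[str] = []
    pending: List[str] = []  # tokens the TM could not cover
    i = 0
    while i < len(tokens):
        # First pass: longest exact phrase match starting at position i.
        match_len = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in tm:
                match_len = n
                break
        if match_len:
            # Second pass for the uncovered text collected so far.
            if pending:
                output.append(cmt(" ".join(pending)))
                pending = []
            output.append(tm[" ".join(tokens[i:i + match_len])])
            i += match_len
        else:
            pending.append(tokens[i])
            i += 1
    if pending:
        output.append(cmt(" ".join(pending)))
    return " ".join(output)


# Toy usage with placeholder "translations" (hypothetical data).
toy_tm = {"British Library": "TM[British Library]", "map": "TM[map]"}
toy_cmt = lambda text: f"CMT[{text}]"
result = hybrid_translate("old map of the British Library", toy_tm, toy_cmt)
```

In the toy usage, the TM supplies the two known phrases and the stand-in CMT function receives only the text left over between them.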

Aiming higher: Post Editing for Quality
[Diagram labels: TM, CMT, Hybrid MT, Post Editors]
• High quality
• High consistency
• Cost and time effective

Customized Machine Translation (CMT)
• A statistical machine translation system
  – Trained specifically for the domain of the text that needs to be translated
• General practice
  – Use Moses
  – Train on the data of the translation memory
  – Follow the recipe of a competition-grade system to ensure high quality

English to Arabic CMT
• Best competition-grade pipeline involves
  – Arabic (de-)tokenization
    • Splitting morphologically rich words into smaller segments and vice versa
    • +1.5 BLEU points improvement
  – Arabic (de-)normalization
    • Mapping different forms of a letter to one form and vice versa
    • +0.5 BLEU point improvement
This ensures high quality but does not guarantee less frustration for post-editors

Why?
CMT translation output requires:
• De-tokenization and de-normalization
• De-normalization introduces character-level errors
  – Frustrating for the post-editor to correct
  – Time inefficient
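A minimal sketch of why de-normalization is error-prone. The mapping table below is an assumption (a commonly used convention for Arabic normalization), not the table used in the study. Once several letter forms are mapped to one, two different words can collapse to the same string, so restoring the original form requires guessing.

```python
# Assumed normalization table (hypothetical; not taken from the slides).
NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
})


def normalize(text: str) -> str:
    """Map variant letter forms to a single canonical form."""
    return text.translate(NORMALIZE)


# Two distinct Arabic particles that differ only in the hamza position.
word_a = "\u0623\u0646"  # alef-with-hamza-above + noon
word_b = "\u0625\u0646"  # alef-with-hamza-below + noon

# After normalization they are indistinguishable, so a de-normalizer
# cannot recover the original spelling from the string alone.
collapsed = normalize(word_a) == normalize(word_b)
```

Because the mapping is many-to-one, any de-normalizer has to pick one spelling from context, and each wrong pick becomes a character-level error for the post-editor to fix.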

Recommended Practices for CMT of English-Arabic
• Don’t normalize
But
• Always tokenize
  – Improve coverage of words
  – Better translations

Let’s Talk about BL Case Numbers!
We compare:
• Translation Memory (TM) only
• Hybrid MT (TM + CMT)
Also:
• Translator
• Hybrid MT + Post editing (PE)
Looking at:
• Effectiveness
• Quality
• Consistency

Data
• 1000 documents
  – 90k parallel sentences/segments
  – 953 documents for training
    • 489k tokens
  – Rest for tune and test

Effectiveness of TM
[Figure: exact match vs. fuzzy match, with labels "BUT COVERS ONLY", "words", "segments"; values shown: 7%, 84%, 13.5%, 50%]
More than 85% of words still need to be translated!
* Based on an assessment over X documents
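The distinction between exact and fuzzy TM matches can be sketched as follows. This is an illustrative assumption, not the matching procedure used in the study: the function name `tm_match_rates`, the token-level similarity from Python's `difflib.SequenceMatcher`, and the 0.6 threshold are all hypothetical choices.

```python
from difflib import SequenceMatcher
from typing import Iterable, Set, Tuple


def tm_match_rates(
    segments: Iterable[str],
    tm_sources: Set[str],
    fuzzy_threshold: float = 0.6,  # hypothetical similarity cut-off
) -> Tuple[float, float]:
    """Return (exact-match rate, fuzzy-only match rate) over the segments."""
    segments = list(segments)
    exact = fuzzy = 0
    for seg in segments:
        if seg in tm_sources:
            exact += 1
            continue
        seg_tokens = seg.split()
        # Best token-level similarity against any TM source segment.
        best = max(
            (SequenceMatcher(None, seg_tokens, src.split()).ratio()
             for src in tm_sources),
            default=0.0,
        )
        if best >= fuzzy_threshold:
            fuzzy += 1
    total = len(segments)
    return exact / total, fuzzy / total


# Toy data (hypothetical): one exact hit, one fuzzy hit, one miss.
toy_tm_sources = {"map of Qatar", "map of Basra"}
toy_segments = ["map of Qatar", "map of Bahrain", "letter from the resident"]
exact_rate, fuzzy_rate = tm_match_rates(toy_segments, toy_tm_sources)
```

Counting matched words instead of matched segments would only require summing token counts in place of the per-segment increments.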

Effectiveness of CMT
100% of segments AND 99.9% of words translated!

Effectiveness of Hybrid MT
• High precision
  – TM exact matches
• High recall
  – CMT to produce high quality translations

Assessing Quality
• BLEU
  – Compare output to ‘reference’ translation

             Strict   Partial
  TM           7.07     21.01
  TM + CMT    54.60     48.54

The BLEU score for CMT alone is 53.90

Assessing Quality
• TER: Translation Error Rate
  – How much effort is needed to get a perfect translation?
  – Compare to ‘reference’ translation
[Bar chart: percentage of effort required (0% to 100%) for Hybrid MT and TM]
Hybrid MT can improve beyond that!
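As a rough illustration of the idea behind TER, the sketch below computes a word-level edit rate: the number of insertions, deletions, and substitutions needed to turn the output into the reference, divided by the reference length. It is a simplification and an assumption of this sketch, since full TER also counts block shifts as single edits, so this version can overestimate the score. The function name `word_edit_rate` is hypothetical.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length (no shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the hypothesis words
    # processed so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if hyp_word == ref_word else 1)
            deletion = prev[j] + 1       # drop a hypothesis word
            insertion = curr[j - 1] + 1  # add a missing reference word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four: a quarter of the reference must be edited.
rate = word_edit_rate("a b c d", "a x c d")
```

A score of 0.0 means the output already equals the reference; higher values mean more editing effort.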

Assessing Quality
• TER vs. post-editing effort
  – Similar effort estimation using post-editing of Hybrid MT
[Bar chart: percentage of effort required (0% to 100%) for PE on Hybrid MT, Hybrid MT, and TM]
* PE is based on an assessment over 4 documents, using a junior translator

Consistency of Hybrid MT
• We compared Hybrid MT versus a junior translator
• We measured consistency with reference translations
[Bar chart: overlap with reference translation (0% to 70%) for Hybrid MT and Translator]
Hybrid MT is more consistent with reference translations
* Based on an assessment over 4 documents

Speedup of Hybrid MT
• We compared Hybrid MT versus a junior translator
[Bar chart: time taken to translate (mins), axis 0 to 120, for Translator and Hybrid MT + PE]
Hybrid MT + PE is 30% more efficient
* Based on an assessment over 4 documents

Conclusion
• Hybrid MT
  – High precision and high recall
• Hybrid MT plus post-editing
  – Efficient in terms of both time and cost
  – Improves consistency
• Customized MT for English-Arabic
  – Don’t normalize but always tokenize

References
• Ahmed Abdelali, Kareem Darwish, Nadir Durrani, and Hamdy Mubarak. Farasa: A Fast and Furious Segmenter for Arabic. In NAACL-2016, San Diego, US.
• Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In ACL-2007, Prague, Czech Republic.
• Hassan Sajjad, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Kenton Murray, Fahad Al Obaidli, and Stephan Vogel. QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic Spoken Language Translation. In IWSLT-2013, Heidelberg, Germany.

Thank you
