SLIDE 1

Machine Translation Evaluation

(Based on Miloš Stanojević’s slides)

Iacer Calixto

Institute for Logic, Language and Computation
University of Amsterdam

May 18, 2018

SLIDE 2

Introduction

Machine Translation Pipeline

SLIDE 14

Introduction

“Good” versus “Bad” Translations

  • How bad can translations be?
    • Grammar errors:
      • Wrong subject-verb agreement: e.g. She do not dance.
      • Spelling mistakes: e.g. The dog is playin with the bal.
      • Etc.
    • Disfluent translations: e.g. She does not like [to] dance.
    • Etc.
  • What constitutes a good translation?
    • One that accounts for all the “units of meaning” in the source sentence?
    • One that reads fluently in the target language?
    • What about translating literature, e.g. Alice’s Adventures in Wonderland?
    • Or a philosophical treatise, e.g. Beyond Good and Evil?

SLIDE 17

Introduction

Good Translations - Fluency vs. Adequacy

  • Let’s simplify the problem:
    • One axis of our evaluation should account for target-language fluency;
    • Another axis should account for how adequately the source-sentence “units of meaning” are translated into the target language.
  • Examples:
    • The man is playing football (source sentence)
    • La femme joue au football (✓ fluent but ✗ adequate)
    • ✗Le homme joue ✗football (✗ fluent but ✓ adequate)
    • L’homme joue au football (✓ fluent and ✓ adequate)

SLIDE 18

Outline

1. Introduction
2. Outline
3. Motivation
4. Word-based Metrics
5. Feature-based Metric(s)
6. Wrap-up & Conclusions

SLIDE 21

Motivation

Why Machine Translation Evaluation?

  • Why do we need automatic evaluation of MT output?
    • Rapid system development;
    • Tuning MT systems;
    • Comparing different systems;
  • Ideally we would like to incorporate human feedback too, but human evaluation is too expensive...

SLIDE 24

Motivation

What is a Metric?

  • A function that computes the similarity between the output of an MT system (i.e. hypothesis or sys) and one or more human translations (reference translations or ref);
  • It can be interpreted in different ways:
    • Overlap between sys and ref: precision, recall... (see the sketch below);
    • Edit distance: insert, delete, shift;
    • Etc.
  • Different metrics make different choices;
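To make the “overlap” interpretation concrete, here is a minimal sketch (our own code, not from the slides; the function name is ours) of clipped bag-of-words precision and recall between a hypothesis (sys) and a reference (ref):

```python
from collections import Counter

def overlap_precision_recall(sys_tokens, ref_tokens):
    """Clipped bag-of-words overlap between a hypothesis (sys) and a reference (ref)."""
    sys_counts, ref_counts = Counter(sys_tokens), Counter(ref_tokens)
    # A sys word only counts as matched as many times as it occurs in ref (clipping).
    matches = sum(min(count, ref_counts[word]) for word, count in sys_counts.items())
    precision = matches / len(sys_tokens) if sys_tokens else 0.0
    recall = matches / len(ref_tokens) if ref_tokens else 0.0
    return precision, recall

sys_out = "john is playing in the park".split()
ref = "john plays in the park".split()
print(overlap_precision_recall(sys_out, ref))  # (0.666..., 0.8)
```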


SLIDE 25

Word-based Metrics

BLEU (Papineni et al., 2002)

  • BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n ), where p_n is the modified n-gram precision;
  • Commonly, we set N = 4, w_n = 1/N;
  • BP stands for “Brevity Penalty” and is computed by (see also the sketch below):
    • BP = 1 if c > r, and BP = exp(1 − r/c) otherwise;
    • c is the length of the candidate translation;
    • r is the effective reference corpus length.
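As a small illustration (our own sketch, not part of the original slides), the brevity penalty can be written directly from this definition:

```python
import math

def brevity_penalty(c, r):
    """BLEU brevity penalty: no penalty if the candidate is longer than the
    (effective) reference; otherwise exp(1 - r/c), which shrinks as the
    candidate gets shorter."""
    return 1.0 if c > r else math.exp(1.0 - r / c)

print(brevity_penalty(6, 5))  # 1.0   (candidate longer than reference)
print(brevity_penalty(4, 5))  # ~0.78 (short candidates are penalised)
```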

SLIDE 26

Word-based Metrics

BLEU (cont.)

  • ref: john plays in the park (length = 5)
  • hyp: john is playing in the park (length = 6)
  • 1-gram: ✓john ✗is ✗playing ✓in ✓the ✓park
  • BP = 1 (c > r)
  • For N = 1:
    • w_1 = 1/1 = 1
    • p_1 = 4/6 (4 of the 6 hypothesis unigrams match), therefore BLEU_1 = 1 · exp(1 · log(4/6)) ≈ 0.67.

SLIDE 27

Word-based Metrics

BLEU (cont.)

  • ref: john plays in the park (length = 5)
  • hyp: john is playing in the park (length = 6)
  • 1-gram: ✓john ✗is ✗playing ✓in ✓the ✓park
  • 2-gram: ✗john is, ✗is playing, ✗playing in, ✓in the, ✓the park
  • BP = 1 (c > r)
  • For N = 2:
    • w_1 = w_2 = 1/2 = 0.5
    • p_1 = 4/6, p_2 = 2/5 (2 of the 5 hypothesis bigrams match), and BLEU_2 = 1 · exp(1/2 · log(4/6) + 1/2 · log(2/5)) ≈ 0.52.
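These numbers can be checked with a short single-reference, sentence-level BLEU sketch (our own code, not from the slides; real implementations also handle multiple references and smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU against a single reference (simplified sketch)."""
    hyp, ref = hyp.split(), ref.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_counts = Counter(ngrams(ref, n))
        # Modified (clipped) n-gram precision p_n.
        matches = sum(min(c, ref_counts[g]) for g, c in Counter(hyp_ngrams).items())
        if not hyp_ngrams or matches == 0:
            return 0.0  # degenerate case; real implementations smooth instead
        log_precisions.append(math.log(matches / len(hyp_ngrams)))
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1.0 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)  # uniform weights w_n = 1/N

ref = "john plays in the park"
hyp = "john is playing in the park"
print(round(bleu(hyp, ref, max_n=1), 2))  # 0.67
print(round(bleu(hyp, ref, max_n=2), 2))  # 0.52
```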

SLIDE 32

Word-based Metrics

METEOR (Lavie and Agarwal, 2007; Denkowski and Lavie, 2014)

  • Uses alignments between reference and hypothesis to compute scores.
  • Accounts for different matching criteria (a small matcher sketch follows this list):
    • Exact: Match words if their surface forms are identical.
    • Stem: Stem words using a language-appropriate stemmer and match if the stems are identical.
    • Synonym: Match words if they share membership in any synonym set according to the WordNet database.
    • Paraphrase: Match phrases if they are listed as paraphrases in a language-appropriate paraphrase table.
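A minimal sketch of the first two stages of such a matcher cascade (exact, then stem); the function is our own illustration, and a full METEOR implementation would additionally consult WordNet synonym sets and a paraphrase table:

```python
from nltk.stem.porter import PorterStemmer  # pip install nltk; the Porter stemmer ships with the package

stemmer = PorterStemmer()

def match_stage(hyp_word, ref_word):
    """Return the matching stage (if any) under which two words would align."""
    if hyp_word.lower() == ref_word.lower():
        return "exact"
    if stemmer.stem(hyp_word.lower()) == stemmer.stem(ref_word.lower()):
        return "stem"
    # Synonym (WordNet) and paraphrase-table matching would be tried here.
    return None

print(match_stage("park", "park"))      # exact
print(match_stage("plays", "playing"))  # stem (both stem to "play")
print(match_stage("dog", "cat"))        # None
```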

SLIDE 33

Word-based Metrics

METEOR

  • α is a trained parameter (there are many more, but not shown here for brevity);
  • P is precision;
  • R is recall;
  • Pen is a fragmentation penalty (the score formula is sketched below).
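The score formula itself did not survive extraction; in the parameterized form described in the cited METEOR papers (our reconstruction, not the original slide), the pieces combine roughly as follows:

```python
def meteor_score(precision, recall, penalty, alpha=0.85):
    """Parameterized METEOR: a weighted harmonic mean of precision and recall,
    scaled down by the fragmentation penalty. alpha=0.85 is only illustrative."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    f_mean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    return (1 - penalty) * f_mean

print(round(meteor_score(precision=0.8, recall=0.9, penalty=0.1), 3))  # ~0.795
```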

SLIDE 34

Feature-based Metric(s)

BEER (Stanojević and Sima’an, 2014)

  • Example of a trained metric;
  • Developed by a colleague of ours in the ILLC (Miloš Stanojević);
  • Core idea: integrate different features in a linear model and train the metric.

SLIDE 35

Feature-based Metric(s)

BEER

  • Assume a linear model with features φ and weight vector w:
    • score(h, r) = w · φ(h, r)
  • There are human judgements that say that a translation h_good is better than a translation h_bad. We want:
    • score(h_good, r) > score(h_bad, r) ⇐⇒
    • w · φ_good > w · φ_bad ⇐⇒
    • w · φ_good − w · φ_bad > 0 ⇐⇒
    • w · (φ_good − φ_bad) > 0 and w · (φ_bad − φ_good) < 0
  • This transforms the task from a ranking task into a binary classification task (positive vs. negative), as sketched below.
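A minimal sketch of that reduction (our own illustration with made-up feature vectors, not the actual BEER features): every human judgement contributes one positive difference vector φ_good − φ_bad and one negative vector φ_bad − φ_good, and a linear classifier learns the weights w.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature vectors phi(h, r) for (good, bad) translation pairs; in BEER these
# would be character n-gram and reordering features, here they are made up.
pairs = [
    (np.array([0.9, 0.7, 0.8]), np.array([0.4, 0.5, 0.3])),
    (np.array([0.6, 0.8, 0.7]), np.array([0.5, 0.2, 0.4])),
    (np.array([0.8, 0.6, 0.9]), np.array([0.3, 0.4, 0.5])),
]

# Ranking -> binary classification: difference vectors labelled positive / negative.
X = np.vstack([good - bad for good, bad in pairs] + [bad - good for good, bad in pairs])
y = np.array([1] * len(pairs) + [0] * len(pairs))

w = LogisticRegression().fit(X, y).coef_[0]  # learned metric weights

def score(features):
    return float(w @ features)

good, bad = pairs[0]
print(score(good) > score(bad))  # True: the metric ranks the better translation higher
```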

SLIDE 36

Wrap-up & Conclusions

WMT Evaluation Shared Task [1]

http://www.statmt.org/wmt16/pdf/W16-2302.pdf

SLIDE 37

Wrap-up & Conclusions

WMT Evaluation Shared Task [2]

http://www.statmt.org/wmt16/pdf/W16-2302.pdf

SLIDE 38

Wrap-up & Conclusions

Conclusions

  • MT evaluation is important for system tuning and assessing how good a system is;
  • Different MT metrics: BLEU, METEOR, BEER.

Future work:

  • Quality estimation (evaluation of MT output without references);
  • Statistical significance testing;
  • Corpus- versus sentence-level metrics;
  • Hopefully we can talk about them some other time...

SLIDE 39

Wrap-up & Conclusions

References I

Denkowski, M. and Lavie, A. (2014). Meteor Universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation.

Lavie, A. and Agarwal, A. (2007). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, pages 228–231. Association for Computational Linguistics.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL ’02, pages 311–318.

Stanojević, M. and Sima’an, K. (2014). Fitting sentence level translation evaluation with many dense features. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 202–206, Doha, Qatar. Association for Computational Linguistics.
