Machine Translation for Human Translators: Carnegie Mellon Ph.D. Thesis (presentation slides)



slide-1
SLIDE 1

Machine Translation for Human Translators

Carnegie Mellon Ph.D. Thesis Michael Denkowski

Language Technologies Institute School of Computer Science Carnegie Mellon University

April 20, 2015 Thesis Committee:

Alon Lavie (chair), Carnegie Mellon University Chris Dyer, Carnegie Mellon University Jaime Carbonell, Carnegie Mellon University Gregory Shreve, Kent State University

1

slide-2
SLIDE 2

Motivating Examples

When is a translation “good enough”?

2

slide-3
SLIDE 3

Machine Translation

3

slide-4
SLIDE 4

Machine Translation

4

slide-5
SLIDE 5

Human Translation

International Organizations Global Businesses Community Projects Require human-quality translation of complex content Machine translation currently unable to deliver quality and consistency

5

slide-6
SLIDE 6

Human Translation

International Organizations Global Businesses Community Projects

$37 billion in 2014

Require human-quality translation of complex content Machine translation currently unable to deliver quality and consistency

5

slide-7
SLIDE 7

MT with Human Post-Editing

Source Document Machine Translation Human Editing Translated Document

Use machine translation to improve speed of human translation

6

slide-8
SLIDE 8

MT with Human Post-Editing

Source Document Machine Translation Human Editing Translated Document

Use machine translation to improve speed of human translation Increasing adoption by government organizations and businesses

6

slide-9
SLIDE 9

Post-Editing Example

Son comportement ne peut être qualifié que d'irréprochable .

7

slide-10
SLIDE 10

Post-Editing Example

Son comportement ne peut être qualifié que d'irréprochable . His behavior cannot be described as d'irréprochable .

7

slide-11
SLIDE 11

Post-Editing Example

Son comportement ne peut être qualifié que d'irréprochable . His behavior cannot be described as d'irréprochable . Its behavior can only be described as flawless .

7

slide-12
SLIDE 12

Post-Editing Example

Son comportement ne peut être qualifié que d'irréprochable . His behavior cannot be described as d'irréprochable . Its behavior can only be described as flawless . MT task: minimize work for human translators

7

slide-13
SLIDE 13

Machine Translation with Human Post-Editing

Post-editing faster and more accurate than unaided translation (Guerberof, 2009; Carl et al., 2011; Koehn, 2012; Zhechev, 2012; inter alia) Productivity gains but MT systems not engineered for human post-editing How can we extend MT systems to target post-editing?

8

slide-14
SLIDE 14

Thesis Statement

While general improvements in MT quality have led to improved performance and increased interest in this application, there has been relatively little work on designing translation systems specifically for post-editing. We present extensions to key components of MT pipelines that significantly reduce the amount of work required from human translators.

9

slide-15
SLIDE 15

Thesis Claims

We claim that:

10

slide-16
SLIDE 16

Thesis Claims

We claim that: The amount of work required of human translators can be reduced by translation systems that immediately learn from editor feedback.

10

slide-17
SLIDE 17

Thesis Claims

We claim that: The amount of work required of human translators can be reduced by translation systems that immediately learn from editor feedback. The usability of machine translations can be improved by automatically identifying the most costly types of translation errors and tuning MT systems to avoid them.

10

slide-18
SLIDE 18

Thesis Claims

We claim that: The amount of work required of human translators can be reduced by translation systems that immediately learn from editor feedback. The usability of machine translations can be improved by automatically identifying the most costly types of translation errors and tuning MT systems to avoid them. The most significant gains in post-editing productivity are realized when several system components can adapt in unison.

10

slide-19
SLIDE 19

Research Contributions

To support our claims, we make the following contributions to the research community:

11

slide-20
SLIDE 20

Research Contributions

To support our claims, we make the following contributions to the research community: a method for immediately incorporating post-editing data into a translation model

11

slide-21
SLIDE 21

Research Contributions

To support our claims, we make the following contributions to the research community: a method for immediately incorporating post-editing data into a translation model a technique for running an online learning algorithm that continuously updates feature weights during decoding

11

slide-22
SLIDE 22

Research Contributions

To support our claims, we make the following contributions to the research community: a method for immediately incorporating post-editing data into a translation model a technique for running an online learning algorithm that continuously updates feature weights during decoding a workflow for training and deploying adaptive MT systems for human translators using only normal training data

11

slide-23
SLIDE 23

Research Contributions

To support our claims, we make the following contributions to the research community: a method for immediately incorporating post-editing data into a translation model a technique for running an online learning algorithm that continuously updates feature weights during decoding a workflow for training and deploying adaptive MT systems for human translators using only normal training data an advanced automatic MT evaluation metric capable of fitting various measures of editing effort

11

slide-24
SLIDE 24

Research Contributions

To support our claims, we make the following contributions to the research community: a method for immediately incorporating post-editing data into a translation model a technique for running an online learning algorithm that continuously updates feature weights during decoding a workflow for training and deploying adaptive MT systems for human translators using only normal training data an advanced automatic MT evaluation metric capable of fitting various measures of editing effort an end-to-end post-editing pipeline that demonstrates the effectiveness of our adaptive systems in live post-editing scenarios

11

slide-25
SLIDE 25

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

12

slide-26
SLIDE 26

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

13

slide-27
SLIDE 27

Online Learning for MT

Statistical translation models built from bilingual data

14

slide-28
SLIDE 28

Online Learning for MT

Statistical translation models built from bilingual data Post-editing generates new bilingual data

14

slide-29
SLIDE 29

Online Learning for MT

Statistical translation models built from bilingual data Post-editing generates new bilingual data Goal: incorporate post-editing data back into model in real time Learn from feedback: avoid repeating the same translation errors

14

slide-30
SLIDE 30

Online Learning for MT

Batch learning (standard MT): Estimation Prediction

15

slide-31
SLIDE 31

Online Learning for MT

Batch learning (standard MT): Estimation Prediction Online learning (this work): Prediction Truth Update

15

slide-32
SLIDE 32

Online Learning for MT

Batch learning (standard MT): Estimation → Prediction. Online learning (this work): Prediction → Truth → Update. Requirement: all system components operate at the sentence level

15
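The prediction–truth–update cycle above can be sketched as a minimal adaptive system that folds each post-edit straight back into its model. This is a toy illustration with hypothetical component names, not the actual cdec or Moses API:

```python
# Minimal sketch of the sentence-level online learning loop:
# predict a translation, receive the post-edited truth, update immediately.

class AdaptiveMT:
    def __init__(self):
        self.seen = []  # (source, post_edit) pairs accumulated so far

    def predict(self, source):
        # Stand-in for decoding: reuse a stored post-edit if one exists.
        for src, tgt in reversed(self.seen):
            if src == source:
                return tgt
        return "<MT output for: %s>" % source

    def update(self, source, post_edit):
        # The truth (post-edited sentence) feeds straight back into the model.
        self.seen.append((source, post_edit))


system = AdaptiveMT()
hyp1 = system.predict("la vérité")        # first pass: no feedback yet
system.update("la vérité", "the truth")   # editor corrects the output
hyp2 = system.predict("la vérité")        # second pass benefits immediately
```

The point of the sketch is the requirement stated on the slide: every component must be able to update after a single sentence, not only after a full batch retrain.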

slide-33
SLIDE 33

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

16

slide-34
SLIDE 34

Post-Editing with Standard MT

[System diagram: Large LM, Grammar (X → f̄/ē), Weights (w_1 ... w_n), all Static → Decoder ← Input Sentence → Post-Editing]

17

slide-35
SLIDE 35

Machine Translation Formalism

Phrase-based machine translation (Koehn et al., 2003): la vérité → the truth. Match spans of input text against phrases we know how to translate.

18

slide-36
SLIDE 36

Machine Translation Formalism

Phrase-based machine translation (Koehn et al., 2003): la vérité → the truth
Match spans of input text against phrases we know how to translate
Hierarchical phrase-based MT (Chiang, 2007):
X → la X1 • the X1
X → vérité • truth
Generalization where phrases can contain other phrases
Phrases become rules in a synchronous context-free grammar

18
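The synchronous rules above can be illustrated with a toy derivation: applying a rule substitutes the same nonterminal on the source and target sides in parallel. A minimal sketch (hypothetical helper, not part of any MT toolkit):

```python
# Toy synchronous CFG: each rule pairs a source pattern with a target
# pattern sharing the nonterminal X1; applying it rewrites both sides.

def apply_rule(pair, rule):
    src_pat, tgt_pat = rule
    src, tgt = pair
    return src.replace("X1", src_pat), tgt.replace("X1", tgt_pat)

# X -> <la X1, the X1> followed by X -> <vérité, truth>
derivation = ("X1", "X1")
derivation = apply_rule(derivation, ("la X1", "the X1"))
derivation = apply_rule(derivation, ("vérité", "truth"))
# derivation == ("la vérité", "the truth")
```

The key property is that the two sides rewrite in lockstep, which is what lets a hierarchical system reorder arbitrarily large spans.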

slide-37
SLIDE 37

Hierarchical Phrase-Based Translation Example

Input sentence: Pourtant , la vérité est ailleurs selon moi .
Translation Grammar:
X → X1 est ailleurs X2 . • X2 , X1 lies elsewhere .
X → Pourtant , • Yet
X → la vérité • the truth
X → selon moi • in my view
Glue Grammar:
S → S1 X2 • S1 X2
S → X1 • X1

19

slide-38
SLIDE 38

Hierarchical Phrase-Based Translation Example

F: Pourtant , la vérité est ailleurs selon moi .  E:

20

slide-39
SLIDE 39

Hierarchical Phrase-Based Translation Example

F: Pourtant , la vérité est ailleurs selon moi .  E: the truth  (applied: X → la vérité • the truth)

20

slide-40
SLIDE 40

Hierarchical Phrase-Based Translation Example

F: Pourtant , la vérité est ailleurs selon moi .  E: Yet in my view the truth  (applied: X → Pourtant , • Yet ; X → la vérité • the truth ; X → selon moi • in my view)

20

slide-41
SLIDE 41

Hierarchical Phrase-Based Translation Example

F: Pourtant , la vérité est ailleurs selon moi .  E: Yet in my view , the truth lies elsewhere .  (adds: X → X1 est ailleurs X2 . • X2 , X1 lies elsewhere . plus glue rules S)

20

slide-42
SLIDE 42

Model Parameterization

Ambiguity: many ways to translate the same source phrase
Add feature scores that encode properties of translation:
X → devis • quote  (0.5, 10, 137, ...)
X → devis • estimate  (0.4, 13, 261, ...)
X → devis • specifications  (0.2, 5, 407, ...)
Decoder uses feature scores and weights to select the most likely translation derivation.

21

slide-43
SLIDE 43

Linear Translation Models

Single feature score for a translation derivation D with rule-local features h_i:
H_i(D) = Σ_{X → f̄/ē ∈ D} h_i(X → f̄/ē)
Score for a derivation using several features H_i ∈ H with weight vector w_i ∈ W:
S(D) = Σ_{i=1..|H|} w_i H_i(D)
Decoder selects translation with largest product W · H

22

slide-44
SLIDE 44

Linear Translation Models

Single feature score for a translation derivation D with rule-local features h_i:
H_i(D) = Σ_{X → f̄/ē ∈ D} h_i(X → f̄/ē)
Score for a derivation using several features H_i ∈ H with weight vector w_i ∈ W:
S(D) = Σ_{i=1..|H|} w_i H_i(D)
Decoder selects translation with largest product W · H
⇒ sentence-level prediction step

22
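The linear model above is just a dot product: sum each rule-local feature over the rules in a derivation, then weight and sum. A minimal sketch with illustrative feature names and values (not the toolkit's internal representation):

```python
# Linear translation model: H_i(D) sums feature i over the rules used in
# derivation D; S(D) is the weighted sum over all features.

def derivation_score(derivation, weights):
    # H_i(D): accumulate each rule-local feature over the derivation
    H = {}
    for rule_features in derivation:
        for name, value in rule_features.items():
            H[name] = H.get(name, 0.0) + value
    # S(D) = sum_i w_i * H_i(D)
    return sum(weights[name] * H.get(name, 0.0) for name in weights)

# Two rules in the derivation, each carrying a log-probability and a count
rules = [{"log_p": -0.5, "count": 1.0}, {"log_p": -1.2, "count": 1.0}]
W = {"log_p": 1.0, "count": 0.1}
score = derivation_score(rules, W)  # (-0.5 + -1.2) * 1.0 + 2 * 0.1 = -1.5
```

The decoder's job is then to search for the derivation maximizing this score.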

slide-45
SLIDE 45

Learning Translations

Learning translations

23

slide-46
SLIDE 46

Translation Model Estimation

Sentence-parallel bilingual text

F: Devis de garage en quatre étapes. Avec l'outil Auda-Taller, l'entreprise Audatex garantit que l'usager obtient un devis en seulement quatre étapes : identifier le véhicule, chercher la pièce de rechange, créer un devis et le générer. La facilité d'utilisation est un élément essentiel de ces systèmes, surtout pour convaincre les professionnels les plus âgés qui, dans une plus ou moins grande mesure, sont rétifs à l'utilisation de nouvelles techniques de gestion. E: A shop's estimate in four steps. With the AudaTaller tool, Audatex guarantees that the user gets an estimate in only 4 steps: identify the vehicle, look for the spare part, create an estimate and generate an estimate. User friendliness is an essential condition for these systems, especially to convincing older technicians, who, to varying degrees, are usually more reluctant to use new management techniques.

24

slide-47
SLIDE 47

Translation Model Estimation

Sentence-parallel bilingual text

F: Devis de garage en quatre étapes. Avec l'outil Auda-Taller, l'entreprise Audatex garantit que l'usager obtient un devis en seulement quatre étapes : identifier le véhicule, chercher la pièce de rechange, créer un devis et le générer. La facilité d'utilisation est un élément essentiel de ces systèmes, surtout pour convaincre les professionnels les plus âgés qui, dans une plus ou moins grande mesure, sont rétifs à l'utilisation de nouvelles techniques de gestion. E: A shop's estimate in four steps. With the AudaTaller tool, Audatex guarantees that the user gets an estimate in only 4 steps: identify the vehicle, look for the spare part, create an estimate and generate an estimate. User friendliness is an essential condition for these systems, especially to convincing older technicians, who, to varying degrees, are usually more reluctant to use new management techniques.

Each sentence is a training instance

24

slide-48
SLIDE 48

Model Estimation: Word Alignment

Brown et al. (1993), Dyer et al. (2013)

F: Devis de garage en quatre étapes  E: A shop 's estimate in four steps

25

slide-49
SLIDE 49

Model Estimation: Word Alignment

Brown et al. (1993), Dyer et al. (2013)

F: Devis de garage en quatre étapes  E: A shop 's estimate in four steps

25
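The word-alignment step cited above (Brown et al., 1993) can be sketched with a few EM iterations of an IBM Model 1 flavour on a toy bitext. This is an illustrative sketch, not the cited implementations:

```python
# IBM Model 1 sketch: EM estimates word translation probabilities t(e|f)
# from sentence-aligned bitext, with alignments as hidden variables.
from collections import defaultdict

bitext = [(["devis", "garage"], ["estimate", "shop"]),
          (["devis"], ["estimate"])]

t = defaultdict(lambda: 0.25)  # uniform initialisation of t(e|f)
for _ in range(10):
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in bitext:
        for e in es:
            z = sum(t[(e, f)] for f in fs)   # normaliser for this target word
            for f in fs:
                c = t[(e, f)] / z            # expected alignment count
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

# The second sentence pair disambiguates: "devis" aligns to "estimate"
```

Because "devis" co-occurs with "estimate" in both pairs but with "shop" in only one, EM concentrates probability mass on the correct link, which is exactly the signal the phrase extractor consumes next.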

slide-50
SLIDE 50

Model Estimation: Phrase Extraction

Koehn et al. (2003), Och and Ney (2004), Och et al. (1999)

F: Devis de garage en quatre étapes
E: A shop 's estimate in four steps
(word alignment grid)

26
slide-51
SLIDE 51

Model Estimation: Phrase Extraction

Koehn et al. (2003), Och and Ney (2004), Och et al. (1999)

F: Devis de garage en quatre étapes
E: A shop 's estimate in four steps
(word alignment grid)
Extracted phrase pairs include:
de garage → a shop 's
en quatre étapes → in four steps

26
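Phrase extraction (Koehn et al., 2003) keeps a source/target span pair only when it is consistent with the alignment, meaning no link crosses the box boundary. A minimal sketch of that check:

```python
# Consistent phrase-pair extraction: a span pair is extractable when no
# alignment link leaves the rectangle it defines.

def extract_phrases(f_words, e_words, alignment, max_len=4):
    pairs = set()
    for f1 in range(len(f_words)):
        for f2 in range(f1, min(f1 + max_len, len(f_words))):
            # target positions linked to the source span
            es = [e for f, e in alignment if f1 <= f <= f2]
            if not es:
                continue
            e1, e2 = min(es), max(es)
            # consistency: no link from inside the target span to outside f1..f2
            if all(f1 <= f <= f2 for f, e in alignment if e1 <= e <= e2):
                pairs.add((" ".join(f_words[f1:f2 + 1]),
                           " ".join(e_words[e1:e2 + 1])))
    return pairs

f = "en quatre étapes".split()
e = "in four steps".split()
links = [(0, 0), (1, 1), (2, 2)]
pairs = extract_phrases(f, e, links)
# contains ("quatre", "four") and ("en quatre étapes", "in four steps")
```

The hierarchical case on the following slides generalizes this by subtracting an already-extracted sub-phrase and replacing it with a nonterminal.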

slide-52
SLIDE 52

Model Estimation: Hierarchical Phrase Extraction

Chiang (2007)
F: Pourtant , la vérité est ailleurs selon moi .
E: Yet in my view , the truth lies elsewhere .
(word alignment grid)

27

slide-53
SLIDE 53

Model Estimation: Hierarchical Phrase Extraction

Chiang (2007)
F: Pourtant , la vérité est ailleurs selon moi .
E: Yet in my view , the truth lies elsewhere .
(word alignment grid)
Extracted: la vérité est ailleurs selon moi . → in my view , the truth lies elsewhere .

27

slide-54
SLIDE 54

Model Estimation: Hierarchical Phrase Extraction

Chiang (2007)
F: Pourtant , la vérité est ailleurs selon moi .
E: Yet in my view , the truth lies elsewhere .
(word alignment grid)
la vérité est ailleurs selon moi . → in my view , the truth lies elsewhere .
X1 est ailleurs X2 . → X2 , X1 lies elsewhere .

27

slide-55
SLIDE 55

Model Estimation: Hierarchical Phrase Extraction

Chiang (2007)
F: Pourtant , la vérité est ailleurs selon moi .
E: Yet in my view , the truth lies elsewhere .
(word alignment grid)
la vérité est ailleurs selon moi . → in my view , the truth lies elsewhere .
X1 est ailleurs X2 . → X2 , X1 lies elsewhere .

sentence-level rule learning

27

slide-56
SLIDE 56

Parameterization: Feature Scoring

Add feature functions to rules X → f̄/ē:
[Pipeline: Training Data (sentences 1..N) → Corpus Stats for X → f̄/ē → Scored Grammar (Global, Static) → Translate Sentence ← Input Sentence]

28

slide-57
SLIDE 57

Parameterization: Feature Scoring

Add feature functions to rules X → f̄/ē:
[Pipeline: Training Data (sentences 1..N) → Corpus Stats for X → f̄/ē → Scored Grammar (Global, Static) → Translate Sentence ← Input Sentence]
× corpus-level rule scoring

28

slide-58
SLIDE 58

Suffix Array Grammar Extraction

Brown (1996), Callison-Burch et al. (2005), Lopez (2008)

[Pipeline: Training Data → Suffix Array (Static) → Sample (per sentence 1..N) → Sample Stats for X → f̄/ē → Grammar (Sentence) → Translate Sentence ← Input Sentence]

29

slide-59
SLIDE 59

Scoring via Sampling

Suffix array statistics available in sample S for each source f̄:
c_S(f̄, ē): count of instances where f̄ is aligned to ē (co-occurrence count)
c_S(f̄): count of instances where f̄ is aligned to any target
|S|: total number of instances (equal to occurrences of f̄ in training data, up to the sample size)
Used to calculate feature scores for each rule at the time of extraction

30

slide-60
SLIDE 60

Scoring via Sampling

Suffix array statistics available in sample S for each source f̄:
c_S(f̄, ē): count of instances where f̄ is aligned to ē (co-occurrence count)
c_S(f̄): count of instances where f̄ is aligned to any target
|S|: total number of instances (equal to occurrences of f̄ in training data, up to the sample size)
Used to calculate feature scores for each rule at the time of extraction
× sentence-level grammar extraction, but static training data

30
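The "coherent" estimate computed from these sample statistics divides the co-occurrence count by the whole sample size |S|, so occurrences of f̄ that yielded no extractable rule still count against the probability. A minimal sketch of that calculation:

```python
# Coherent translation probability from a suffix-array sample S:
# CoherentP(e|f) = c_S(f̄, ē) / |S|, where |S| includes occurrences of f̄
# that were not aligned to any extractable target phrase.

def coherent_p(sample, target):
    """sample: list of aligned targets (None = f̄ occurred but unaligned)."""
    c_fe = sum(1 for t in sample if t == target)  # c_S(f̄, ē)
    return c_fe / len(sample)                     # divide by |S|, not c_S(f̄)

S = ["quote", "estimate", None, "quote"]  # 4 sampled occurrences of "devis"
p = coherent_p(S, "quote")  # 2/4 = 0.5; the unaligned instance still counts
```

Dividing by |S| rather than c_S(f̄) penalizes source phrases that frequently fail to produce a usable rule, which is the property the Lopez (2008) feature set relies on.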

slide-61
SLIDE 61

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

31

slide-62
SLIDE 62

Online Grammar Extraction

Denkowski et al. (EACL 2014)

[Pipeline: Training Data → Suffix Array (Static) → Sample → Sample Stats for X → f̄/ē → Grammar (Sentence) → Translate Sentence ← Input Sentence]

32

slide-63
SLIDE 63

Online Grammar Extraction

Denkowski et al. (EACL 2014)

[Pipeline: Training Data → Suffix Array (Static) → Sample → Sample Stats for X → f̄/ē → Grammar (Sentence) → Translate Sentence ← Input Sentence; Post-Edit Sentence → Lookup Table (Dynamic) → Sample Stats]

32

slide-64
SLIDE 64

Online Grammar Extraction

Denkowski et al. (EACL 2014)

Maintain dynamic lookup table for post-edit data
Pair each sample S from suffix array with exhaustive lookup L from lookup table
Parallel statistics available at grammar scoring time:
c_L(f̄, ē): count of instances where f̄ is aligned to ē (co-occurrence count)
c_L(f̄): count of instances where f̄ is aligned to any target
|L|: total number of instances (equal to occurrences of f̄ in post-edit data, no limit)

33
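The parallel statistics above can be sketched as a small dynamic structure that grows with each post-edited sentence and is merged with the static sample at scoring time. This is an illustrative sketch with hypothetical names, not the cdec implementation:

```python
# Dynamic lookup table L for post-edit data, merged with a static
# suffix-array sample S when scoring: CoherentP = (c_S + c_L) / (|S| + |L|).

class DynamicLookup:
    def __init__(self):
        self.instances = []  # aligned (f, e) phrase instances from post-edits

    def add_post_edit(self, phrase_pairs):
        self.instances.extend(phrase_pairs)

    def counts(self, f, e):
        c_fe = sum(1 for pf, pe in self.instances if pf == f and pe == e)
        size = sum(1 for pf, _ in self.instances if pf == f)  # |L| for f
        return c_fe, size

def coherent_p(c_S_fe, size_S, c_L_fe, size_L):
    # combined estimate over background sample and post-edit lookup
    return (c_S_fe + c_L_fe) / (size_S + size_L)

L = DynamicLookup()
L.add_post_edit([("devis", "quote"), ("devis", "quote")])
c_fe, size_L = L.counts("devis", "quote")
p = coherent_p(c_S_fe=1, size_S=4, c_L_fe=c_fe, size_L=size_L)  # (1+2)/(4+2)
```

Because the lookup is exhaustive rather than sampled, a translation confirmed even once by an editor immediately shifts the rule's score for every later sentence.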

slide-65
SLIDE 65

Rule Scoring

Denkowski et al. (EACL 2014)

Suffix array feature set (Lopez 2008)
Phrase features encode likelihood of translation rule given training data
Features scored with S:
CoherentP(e|f) = c_S(f̄, ē) / |S|
Count(f,e) = c_S(f̄, ē)
SampleCount(f) = |S|

34

slide-66
SLIDE 66

Rule Scoring

Denkowski et al. (EACL 2014)

Suffix array feature set (Lopez 2008)
Phrase features encode likelihood of translation rule given training data
Features scored with S and L:
CoherentP(e|f) = (c_S(f̄, ē) + c_L(f̄, ē)) / (|S| + |L|)
Count(f,e) = c_S(f̄, ē) + c_L(f̄, ē)
SampleCount(f) = |S| + |L|

34

slide-67
SLIDE 67

Rule Scoring

Denkowski et al. (EACL 2014)

Indicator features identify certain classes of rules
Features scored with S:
Singleton(f) = 1 if c_S(f̄) = 1, else 0
Singleton(f,e) = 1 if c_S(f̄, ē) = 1, else 0

35

slide-68
SLIDE 68

Rule Scoring

Denkowski et al. (EACL 2014)

Indicator features identify certain classes of rules
Features scored with S and L:
Singleton(f) = 1 if c_S(f̄) + c_L(f̄) = 1, else 0
Singleton(f,e) = 1 if c_S(f̄, ē) + c_L(f̄, ē) = 1, else 0
PostEditSupport(f,e) = 1 if c_L(f̄, ē) > 0, else 0

35

slide-69
SLIDE 69

Parameter Optimization

Denkowski et al. (EACL 2014)

Choose feature weights that maximize objective function (BLEU score) on a development corpus
Minimum error rate training (MERT) (Och, 2003): Translate → Optimize

36

slide-70
SLIDE 70

Parameter Optimization

Denkowski et al. (EACL 2014)

Choose feature weights that maximize objective function (BLEU score) on a development corpus
Minimum error rate training (MERT) (Och, 2003): Translate → Optimize
Margin infused relaxed algorithm (MIRA) (Chiang, 2012): Translate → Truth → Update

36
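The translate–truth–update loop can be sketched with a plain structured-perceptron step standing in for the full MIRA update (which additionally solves a margin-constrained quadratic program). A toy sketch with illustrative feature names:

```python
# Online weight update after one sentence: move weights toward the
# reference translation's features and away from the hypothesis'.
# (Perceptron-style stand-in for the MIRA update described above.)

def perceptron_update(weights, hyp_feats, truth_feats, lr=0.1):
    for name in set(hyp_feats) | set(truth_feats):
        delta = truth_feats.get(name, 0.0) - hyp_feats.get(name, 0.0)
        weights[name] = weights.get(name, 0.0) + lr * delta
    return weights

w = {"lm": 0.5, "tm": 0.5}
w = perceptron_update(w, hyp_feats={"lm": 2.0, "tm": 1.0},
                      truth_feats={"lm": 1.0, "tm": 2.0})
# w == {"lm": 0.4, "tm": 0.6}: the translation-model feature gains weight
```

Each sentence triggers one such update, which is what makes the optimizer compatible with the sentence-level requirement stated earlier.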

slide-71
SLIDE 71

Post-Editing with Standard MT

Denkowski et al. (EACL 2014)

[System diagram: Large LM, Grammar (X → f̄/ē), Weights (w_1 ... w_n), all Static → Decoder ← Input Sentence → Post-Editing]

37

slide-72
SLIDE 72

Post-Editing with Adaptive MT

Denkowski et al. (EACL 2014)

[System diagram: Large Bitext LM (Static) + PE Data, TM (X → f̄/ē, Dynamic), Weights (w_1 ... w_n, Dynamic) → Decoder ← Input Sentence → Post-Editing feeds back into the dynamic components]

38

slide-73
SLIDE 73

Overview

How can we build systems without translators in the loop?

39

slide-74
SLIDE 74

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

40

slide-75
SLIDE 75

Simulated Post-Editing

Denkowski et al. (EACL 2014)

Incremental training data: Source → Target (Reference)
Hola contestadora ... → Hello voicemail, my old ...
He llamado a servicio ... → I've called for tech ...
Ignoré la advertencia ... → I ignored my boss' ...
Ahora anochece, y mi ... → Now it's evening, and ...
Todavía sigo en espera ... → I'm still on hold. I'm ...
No creo que me hayas ... → I don't think you ...
Ya he presionado cada ... → I punched every touch ...

Use pre-generated references in place of post-editing (Hardt and Elming, 2010)
Build, evaluate, and deploy adaptive systems using only standard training data

41

slide-76
SLIDE 76

Simulated Post-Editing Experiments

Denkowski et al. (EACL 2014)

MT System (cdec) Hierarchical phrase-based model using suffix arrays Large 4-gram language model MIRA optimization Model Adaptation Update TM and weights independently and in conjunction Training Data WMT12 Spanish–English and NIST 2012 Arabic–English Evaluation Data WMT/NIST news (standard test sets) TED talks (totally blind out-of-domain test)

42

slide-77
SLIDE 77

Simulated Post-Editing Experiments

Denkowski et al. (EACL 2014)

Spanish–English and Arabic–English
[Bar charts: BLEU scores on the WMT/NIST, TED1, and TED2 test sets for the Baseline, Grammar, MIRA, and Both configurations]

43

slide-78
SLIDE 78

Simulated Post-Editing Experiments

Denkowski et al. (EACL 2014)

Spanish–English and Arabic–English
[Bar charts: BLEU scores on the WMT/NIST, TED1, and TED2 test sets for the Baseline, Grammar, MIRA, and Both configurations]

Up to 1.7 BLEU improvement over static baseline

43

slide-79
SLIDE 79

Recent Work

How can we better leverage incremental data?

44

slide-80
SLIDE 80

Translation Model Combination

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

cdec (Dyer et al., 2010) Single translation model updated with new data Single feature set that changes over time (summation) Moses (Koehn et al., 2007) Multiple translation models: background and post-editing Per-feature linear interpolation in context of full system Recent additions to Moses toolkit Dynamic suffix array phrase tables (Germann, 2014) Fast MIRA implementation (Cherry and Foster, 2012) Multiple phrase tables with runtime weight updates (Denkowski, 2014)

45

slide-81
SLIDE 81

Translation Model Combination

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

Spanish–English and Arabic–English
[Bar charts: BLEU scores on the WMT/NIST, TED1, and TED2 test sets for the Baseline, PE Support, Multi Model, and +MIRA configurations]

46

slide-82
SLIDE 82

Translation Model Combination

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

Spanish–English and Arabic–English
[Bar charts: BLEU scores on the WMT/NIST, TED1, and TED2 test sets for the Baseline, PE Support, Multi Model, and +MIRA configurations]

Up to 4.9 BLEU improvement over static baseline

46

slide-83
SLIDE 83

Related Work: Learning from Post-Editing

Updating translation grammars with post-editing data
Cache-based translation and language models (Nepveu et al., 2004; Bertoldi et al., 2013)
Store sufficient statistics in grammar (Ortiz-Martínez et al., 2010)
Distinguish between background and post-editing data (Hardt and Elming, 2010)
Updating feature weights during decoding
Various online learning algorithms to update MERT weights (Martínez-Gómez et al., 2012; López-Salcedo et al., 2012)
Algorithm for learning from binary classification examples (Saluja et al., 2012)

47

slide-84
SLIDE 84

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

48

slide-85
SLIDE 85

Tools for Human Translators

49

slide-86
SLIDE 86

TransCenter Post-Editing Interface

Denkowski and Lavie (AMTA 2012), Denkowski et al. (HaCat 2014)

50

slide-87
SLIDE 87

TransCenter Post-Editing Interface

Denkowski and Lavie (AMTA 2012), Denkowski et al. (HaCat 2014)

51

slide-88
SLIDE 88

TransCenter Post-Editing Interface

Denkowski and Lavie (AMTA 2012), Denkowski et al. (HaCat 2014)

52

slide-89
SLIDE 89

TransCenter Post-Editing Interface

Denkowski and Lavie (AMTA 2012), Denkowski et al. (HaCat 2014)

53

slide-90
SLIDE 90

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

54

slide-91
SLIDE 91

Post-Editing Field Test

Denkowski et al. (HaCat 2014)

Experimental Setup Six translation studies students from Kent State University post-edited MT output Text: 4 excerpts from TED talks translated from Spanish into English (100 sentences total) Two excerpts translated by static system, two by adaptive system (shuffled by user) Record post-editing effort (HTER) and translator rating

55

slide-92
SLIDE 92

Post-Editing Field Test

Denkowski et al. (HaCat 2014)

Results
Adaptive system significantly outperforms static baseline
Small improvement in simulated scenario leads to significant improvement in production

System    HTER ↓  Rating ↑  Sim PE BLEU ↑
Baseline  19.26   4.19      34.50
Adaptive  17.01   4.31      34.95

56

slide-93
SLIDE 93

Related Work: Computer-Aided Translation Tools

Translation software suites CASMACAT project: full-featured open source translator's workbench software (Ortiz-Martínez et al., 2012) MateCat project: enterprise-grade workbench with MT integration and project management (Federico, 2014; Cattelan, 2014) Novel CAT approaches Streamlined interface with both phrase prediction and post-editing (Green, 2014) Effectiveness of monolingual post-editing assisted by word alignments (Schwartz, 2014)

57

slide-94
SLIDE 94

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

58

slide-95
SLIDE 95

System Optimization

Parameter optimization (MIRA): choose feature weights W that maximize objective on tuning set
Automatic metrics approximate human evaluation of MT output against reference translations
Adequacy-based evaluation: good translations should be semantically similar to references
Several adequacy-driven research efforts: ACL WMT (Callison-Burch et al., 2011), NIST OpenMT (Przybocki et al., 2009)

59

slide-96
SLIDE 96

Standard MT Evaluation

Standard BLEU metric based on N-gram precision (P) (Papineni et al., 2002)
Matches spans of hypothesis E′ against reference E
Surface forms only, depends on multiple references to capture translation variation (expensive)
Jointly measures word choice and order
BLEU = BP × exp(Σ_{n=1..N} (1/N) log P_n)
BP = 1 if |E′| > |E|, else e^(1 − |E|/|E′|)

60
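The formula above can be sketched as a sentence-level BLEU computation (unsmoothed; real evaluations aggregate n-gram counts at the corpus level):

```python
# Sentence-level BLEU: geometric mean of clipped n-gram precisions,
# scaled by the brevity penalty BP.
import math
from collections import Counter

def bleu(hyp, ref, N=4):
    precisions = []
    for n in range(1, N + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        precisions.append(match / max(sum(h.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision zeroes the geometric mean
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / N)

hyp = "the truth lies elsewhere".split()
assert bleu(hyp, hyp) == 1.0  # identical output scores 1
assert bleu("a big house".split(), "the large home".split()) == 0.0
```

The second assertion is exactly the shortcoming discussed on the next slides: a perfectly acceptable paraphrase with no surface overlap scores zero.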

slide-97
SLIDE 97

Standard MT Evaluation

Shortcomings of BLEU metric (Banerjee and Lavie 2005, Callison-Burch et al., 2007): Evaluating surface forms misses correct translations N-grams have no notion of global coherence

61

slide-98
SLIDE 98

Standard MT Evaluation

Shortcomings of BLEU metric (Banerjee and Lavie 2005, Callison-Burch et al., 2007): Evaluating surface forms misses correct translations N-grams have no notion of global coherence E: The large home

61

slide-99
SLIDE 99

Standard MT Evaluation

Shortcomings of BLEU metric (Banerjee and Lavie 2005, Callison-Burch et al., 2007): Evaluating surface forms misses correct translations N-grams have no notion of global coherence
E: The large home
E′1: A big house  (BLEU = 0)
E′2: I am a dinosaur  (BLEU = 0)

61

slide-100
SLIDE 100

Post-Editing

Final translations must be human quality (editing required)
Good MT output should require less work for humans to edit
Human-targeted translation edit rate (HTER, Snover et al., 2006):
1. Human translators correct MT output
2. Automatically calculate number of edits using TER
TER = # edits / |E|
Edits: insertion, deletion, substitution, block shift
"Better" translations not always easier to post-edit

62
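The edit-counting step can be sketched with word-level edit distance. Block shifts are omitted here for brevity, so this sketch gives an upper bound on true TER:

```python
# TER sketch: word-level edit distance (insert/delete/substitute only)
# divided by reference length |E|.

def ter(hyp, ref):
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(ref)

ref = "the truth lies elsewhere".split()
hyp = "the truth is elsewhere".split()
assert ter(hyp, ref) == 0.25  # one substitution over four reference words
```

For HTER, the "reference" is the post-edited output itself, so the score directly measures how much the editor had to change.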

slide-101
SLIDE 101

Translation Example

WMT 2011 Czech–English Track

Translations scored by BLEU E: The problem is that life of the lines is two to four years.

63

slide-102
SLIDE 102

Translation Example

WMT 2011 Czech–English Track

Translations scored by BLEU
E: The problem is that life of the lines is two to four years.
E′1: The problem is that life is two lines, up to four years.
E′2: The problem is that the durability of lines is two or four years.

63

slide-103
SLIDE 103

Translation Example

WMT 2011 Czech–English Track

Translations scored by BLEU
E: The problem is that life of the lines is two to four years.
E′1: The problem is that life is two lines, up to four years.  (BLEU 0.49)
E′2: The problem is that the durability of lines is two or four years.  (BLEU 0.34)

63

slide-104
SLIDE 104

Translation Example

WMT 2011 Czech–English Track

Translations scored by BLEU, then post-edited and scored by edit rate
E: The problem is that life of the lines is two to four years.
E′1: The problem is that life is two lines, up to four years.  (BLEU 0.49, edit rate 0.29)
E′2: The problem is that the durability of lines is two or four years.  (BLEU 0.34, edit rate 0.14)
The higher-BLEU translation is harder to post-edit.

63

slide-105
SLIDE 105

Overview

Online learning for statistical MT Translation model review Real time model adaptation Simulated post-editing Post-editing software and experiments Kent State live post-editing Automatic metrics for post-editing Meteor automatic metric Evaluation and optimization for post-editing Conclusion and Future Work

64

slide-106
SLIDE 106

Meteor

Banerjee and Lavie (2005), Lavie and Denkowski (2009), Denkowski and Lavie (2011)

Motivation: address shortcomings of BLEU Flexible matching to capture translation variation Measure word choice and order separately, combine with tunable scoring function Measure sentence coherence globally Meteor: alignment-based tunable evaluation metric Align hypothesis E′ to reference E Compute score based on alignment quality

65

slide-107
SLIDE 107

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .

66

slide-108
SLIDE 108

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
Match type: exact

66

slide-109
SLIDE 109

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
Match types: exact, stem

66

slide-110
SLIDE 110

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
Match types: exact, stem, synonym

66

slide-111
SLIDE 111

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
Match types: exact, stem, synonym, paraphrase

66

slide-112
SLIDE 112

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
P, R (P and R weighted by match type, content vs function words)

66

slide-113
SLIDE 113

Meteor Alignment

Denkowski and Lavie (2011)

E′: The United States embassy know that dependable source .
E: The American embassy knows this from a reliable source .
P, R; Chunks = 2
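The chunk count on this slide can be sketched as counting maximal runs of matches that are contiguous and in order on both sides. The alignment representation here, a sorted list of (hypothesis index, reference index) pairs, is an assumption for illustration.

```python
def count_chunks(alignment):
    """Count chunks: maximal runs of word matches that are contiguous
    and monotone in both hypothesis and reference.
    `alignment` is a list of (hyp_index, ref_index) pairs sorted by
    hyp_index (an illustrative representation, not Meteor's internal one)."""
    chunks = 0
    prev = None
    for h_i, r_i in alignment:
        # Start a new chunk unless this match directly extends the last one.
        if prev is None or h_i != prev[0] + 1 or r_i != prev[1] + 1:
            chunks += 1
        prev = (h_i, r_i)
    return chunks
```

A fully monotone, contiguous alignment yields a single chunk; every reordering or gap adds one, which is how Meteor measures word-order quality.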

66

slide-114
SLIDE 114

Meteor Scoring

Denkowski and Lavie (2011)

P and R weighted by match type (w1, ..., wn) and content-function word weight (δ)

Fα = (P × R) / (α × P + (1 − α) × R)

Frag = Chunks / AvgMatches

Meteor = (1 − γ × Frag^β) × Fα

Tunable parameters:
• W = w1, ..., wn: weights for flexible match types
• α: balance between precision and recall
• β, γ: weight and severity of fragmentation
• δ: relative contribution of content versus function words
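The scoring function above translates directly into code. This is a minimal sketch: the default parameter values are illustrative placeholders, not the tuned values shipped with any Meteor release, and P and R are assumed to be precomputed with the match-type and content/function weights already applied.

```python
def meteor(P, R, chunks, avg_matches, alpha=0.85, beta=1.4, gamma=0.6):
    """Meteor scoring function: fragmentation-penalized weighted harmonic
    mean of precision and recall. `avg_matches` is the average number of
    matched words between hypothesis and reference.
    Default alpha/beta/gamma are illustrative, not tuned values."""
    if P == 0.0 or R == 0.0:
        return 0.0
    # Weighted harmonic mean of precision and recall.
    f_alpha = (P * R) / (alpha * P + (1 - alpha) * R)
    # Fragmentation: fewer, longer chunks mean better word order.
    frag = chunks / avg_matches
    return (1 - gamma * frag ** beta) * f_alpha
```

Holding P and R fixed, increasing the chunk count lowers the score, so the penalty isolates word-order (reordering) errors from word-choice errors.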

67

slide-115
SLIDE 115

Meteor and Post-Editing

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

Casting Meteor’s scoring features as post-editing measures:
• Precision → incorrect content (deletion)
• Recall → missing content (insertion)
• Fragmentation → incorrectly ordered content (reordering)
• Match types → partially correct content (minor edits)
• Content vs function → content vs grammaticality edits

Advantage over edit distance: error types are identified separately and combined with a parameterized scoring function

68

slide-116
SLIDE 116

Overview

• Online learning for statistical MT
  • Translation model review
  • Real time model adaptation
  • Simulated post-editing
• Post-editing software and experiments
  • Kent State live post-editing
• Automatic metrics for post-editing
  • Meteor automatic metric
  • Evaluation and optimization for post-editing
• Conclusion and Future Work

69

slide-117
SLIDE 117

Metrics Targeting Post-Editing

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

Startup
• Deploy system tuned with simulated post-editing and BLEU
• Collect enough data for a post-editing dev set

Retuning (second stage booster rocket)
• Tune Meteor to fit post-editing effort (keystroke, very close to rating)
• Tune system to new Meteor on new dev set
• Continue to adapt to Meteor in production

70

slide-118
SLIDE 118

Second Field Test

Denkowski (AMTA 2014 Workshop on Interactive and Adaptive MT)

Results
• Repeat post-editing experiments with second set of students and TED talks
• Compare BLEU- and Meteor-tuned adaptive systems (both optimized on TED talk data)
• Adapting to Meteor lowers BLEU but yields significant improvement in live post-editing
• Feasible in production: significant data and editing records

              HTER ↓   Rating ↑   Sim PE BLEU ↑
Adapt BLEU     20.1     4.16         27.3
Adapt Meteor   18.9     4.24         26.6

71

slide-119
SLIDE 119

Related Work: Automatic Metrics

Evaluation
• Shared metrics tasks at workshops on statistical machine translation (Callison-Burch et al., 2008, 2009, 2010, ...)
• TER-plus: extended version of TER with flexible matching and tunable weights (Snover et al., 2009)
• Stanford probabilistic edit distance metric with linguistic features (Wang and Manning, 2012)

Optimization
• Tuning to a metric tends to improve quality according to that metric (Cer et al., 2010)
• Effectiveness of tuning to a more sophisticated metric than BLEU (Liu et al., 2011)

72

slide-120
SLIDE 120

Overview

• Online learning for statistical MT
  • Translation model review
  • Real time model adaptation
  • Simulated post-editing
• Post-editing software and experiments
  • Kent State live post-editing
• Automatic metrics for post-editing
  • Meteor automatic metric
  • Evaluation and optimization for post-editing
• Conclusion and Future Work

73

slide-121
SLIDE 121

Conclusion

Real time adaptive MT systems
• Immediately incorporate post-editing data into translation models
• Run an online optimizer that continuously updates feature weights during decoding
• Simulate post-editing to train on normal system building data
• Best results when combining techniques: up to +4.9 BLEU
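The online weight update mentioned above can be sketched with a simple perceptron-style step toward the features of the post-edited reference. This is an illustrative stand-in, assuming a sparse feature-dict representation; the thesis uses a margin-based (MIRA-style) optimizer, which this sketch does not reproduce exactly.

```python
def online_update(weights, hyp_features, ref_features, lr=0.1):
    """One perceptron-style online update: move feature weights toward
    the post-edited reference's features and away from the model's own
    hypothesis. Features are sparse dicts {name: value}; `lr` is a
    hypothetical learning rate, not a tuned value."""
    updated = dict(weights)
    for f in set(hyp_features) | set(ref_features):
        # Gradient of a linear model score difference (reference - hypothesis).
        grad = ref_features.get(f, 0.0) - hyp_features.get(f, 0.0)
        updated[f] = updated.get(f, 0.0) + lr * grad
    return updated
```

Applied after every edited sentence, updates like this let feature weights track the translator's preferences without a full retuning run.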

74

slide-122
SLIDE 122

Conclusion

Live post-editing experiments
• TransCenter: interface that simplifies and records post-editing tasks
• Live experiments show a reduction in human labor when working with adaptive systems

75

slide-123
SLIDE 123

Conclusion

Automatic metrics for post-editing
• Meteor: MT evaluation metric capable of fitting various measures of editing effort
• Live experiments show further gains in translator productivity when systems adapt to Meteor

76

slide-124
SLIDE 124

Future Work

Adaptive MT systems
• Sparse features for more rapid, fine-grained adaptation (Chiang et al., 2009)
• Danger of overfitting; opportunity for more sophisticated optimizers

77

slide-125
SLIDE 125

Future Work

Adaptive MT systems
• Sparse features for more rapid, fine-grained adaptation (Chiang et al., 2009)
• Danger of overfitting; opportunity for more sophisticated optimizers

End-to-end workflows
• Integrate adaptive MT with advanced post-editing interfaces (Green et al., 2014; Schwartz et al., 2014)

77

slide-126
SLIDE 126

Future Work

Adaptive MT systems
• Sparse features for more rapid, fine-grained adaptation (Chiang et al., 2009)
• Danger of overfitting; opportunity for more sophisticated optimizers

End-to-end workflows
• Integrate adaptive MT with advanced post-editing interfaces (Green et al., 2014; Schwartz et al., 2014)

Automatic metrics
• Tune metrics to editing time (bottom-line cost)
• Requires a significant amount of data from a fixed pool of translators

77

slide-127
SLIDE 127

Open Source Software www.cs.cmu.edu/˜mdenkows

Building adaptive MT systems
• cdec Realtime: adaptive MT systems with cdec
• RTA: Realtime adaptive MT framework using Moses

Live post-editing
• TransCenter: post-editing data collection interface
• All Kent State post-editing data

Targeted automatic metrics
• Meteor: tunable MT evaluation metric

78

slide-128
SLIDE 128

Machine Translation for Human Translators

Carnegie Mellon Ph.D. Thesis Michael Denkowski

Language Technologies Institute School of Computer Science Carnegie Mellon University

April 20, 2015 Thesis Committee:

Alon Lavie (chair), Carnegie Mellon University Chris Dyer, Carnegie Mellon University Jaime Carbonell, Carnegie Mellon University Gregory Shreve, Kent State University

79