ML4HMT: DCU Teams Overview. Tsuyoshi Okita, Dublin City University.



SLIDE 1

ML4HMT: DCU Teams Overview

Tsuyoshi Okita Dublin City University

SLIDE 2

DCU Teams Overview

◮ Meta information
  ◮ DCU-Alignment: alignment information
  ◮ DCU-QE: quality information
  ◮ DCU-DA: domain ID information
  ◮ DCU-NPLM: latent variable information

SLIDE 3

Our Strategies

[Figure: overview of the DCU strategies. Systems A–D pair meta information (QE, topic/NPLM, and alignment, with baseline, DA, NPLM, and DA+NPLM variants) with the standard system combination pipeline (green): MBR decoding over a Lucy backbone, TER-based monolingual word alignment with IHMM alignment as external knowledge, confusion network construction, and monotonic consensus decoding. This presentation shows the tuning results of the blue lines.]

SLIDE 4

System Combination Overview

◮ System combination [Matusov et al., 05; Rosti et al., 07]
◮ Given: a set of MT outputs
  • 1. Build a confusion network
    ◮ Select a backbone by a Minimum-Bayes Risk (MBR) decoder (with MERT tuning)
    ◮ Run a monolingual word aligner
  • 2. Run a monotonic (consensus) decoder (with MERT tuning)
◮ We focus on three technical topics
  • 1. Minimum-Bayes Risk (MBR) decoder (with MERT tuning)
  • 2. Monolingual word aligner
  • 3. Monotonic (consensus) decoder (with MERT tuning)

SLIDE 10

System Combination Overview

Input 1: they are normally on a week .
Input 2: these are normally made in a week .
Input 3: este himself go normally in a week .
Input 4: these do usually in a week .

⇓ 1. MBR decoding

Backbone(2): these are normally made in a week .

⇓ 2. monolingual word alignment (S = substitution, D = deletion; ***** marks an empty arc)

Backbone(2): these are normally made in a week .
hyp(1): theyS are normally *****D onS a week .
hyp(3): esteS himselfS goS normallyS in a week .
hyp(4): these *****D doS usuallyS in a week .

⇓ 3. monotonic consensus decoding

Output: these are normally ***** in a week .

SLIDE 11

1. MBR Decoding

• 1. Given MT outputs, choose one sentence:

$$\hat{E}^{\mathrm{MBR}}_{\mathrm{best}} = \operatorname*{argmin}_{E'\in\mathcal{E}} R(E') = \operatorname*{argmin}_{E'\in\mathcal{E}} \sum_{E\in\mathcal{E}} L(E,E')\,P(E\mid F) = \operatorname*{argmin}_{E'\in\mathcal{E}} \sum_{E\in\mathcal{E}} \bigl(1-\mathrm{BLEU}_E(E')\bigr)\,P(E\mid F)$$

$$= \operatorname*{argmin}_{E'\in\mathcal{E}} \left[\, 1 - \begin{bmatrix} B_{E_1}(E_1) & B_{E_2}(E_1) & B_{E_3}(E_1) & B_{E_4}(E_1) \\ B_{E_1}(E_2) & B_{E_2}(E_2) & B_{E_3}(E_2) & B_{E_4}(E_2) \\ \vdots & & & \vdots \\ B_{E_1}(E_4) & B_{E_2}(E_4) & B_{E_3}(E_4) & B_{E_4}(E_4) \end{bmatrix} \right] \begin{bmatrix} P(E_1\mid F) \\ P(E_2\mid F) \\ P(E_3\mid F) \\ P(E_4\mid F) \end{bmatrix}$$

SLIDE 12

1. MBR Decoding

Input 1: they are normally on a week .
Input 2: these are normally made in a week .
Input 3: este himself go normally in a week .
Input 4: these do usually in a week .

$$= \operatorname*{argmin} \left[\, 1 - \begin{bmatrix} 1.0 & 0.259 & 0.221 & 0.245 \\ 0.267 & 1.0 & 0.366 & 0.377 \\ \vdots & & & \vdots \\ 0.245 & 0.366 & 0.346 & 1.0 \end{bmatrix} \right] \begin{bmatrix} 0.25 \\ 0.25 \\ 0.25 \\ 0.25 \end{bmatrix} = \operatorname*{argmin}\, [0.565,\ 0.502,\ 0.517,\ 0.506] = \text{Input 2}$$

Backbone(2): these are normally made in a week .
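The MBR selection above can be sketched in a few lines of Python. As a stand-in for sentence-level BLEU, this sketch uses a simple token-overlap F1 similarity, and the posteriors P(E|F) are uniform as on the slide; `overlap_sim` and `mbr_select` are illustrative names, not part of any ML4HMT code.

```python
from collections import Counter

def overlap_sim(a, b):
    """Multiset token-overlap F1: a simple stand-in for sentence-level BLEU."""
    ca, cb = Counter(a.split()), Counter(b.split())
    inter = sum((ca & cb).values())
    return 2.0 * inter / (len(a.split()) + len(b.split()))

def mbr_select(hyps, posteriors=None):
    """Return the index of the hypothesis with minimum expected loss,
    where the loss is 1 - similarity and the expectation is over P(E|F)."""
    n = len(hyps)
    p = posteriors or [1.0 / n] * n  # uniform posteriors, as on the slide
    risks = [sum(p[j] * (1.0 - overlap_sim(hyps[j], e_prime)) for j in range(n))
             for e_prime in hyps]
    return min(range(n), key=risks.__getitem__)

hyps = [
    "they are normally on a week .",
    "these are normally made in a week .",
    "este himself go normally in a week .",
    "these do usually in a week .",
]
backbone = hyps[mbr_select(hyps)]  # selects Input 2, as on the slide
```

Even with this crude similarity, Input 2 wins, because it shares the most material with the other three hypotheses.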

SLIDE 13

2. Monolingual Word Alignment

◮ TER-based monolingual word alignment
◮ The same words in different sentences are aligned
◮ Proceeds in a pairwise manner: Input 1 and the backbone, Input 3 and the backbone, Input 4 and the backbone (S = substitution, D = deletion; ***** marks an empty arc):

Backbone(2): these are normally made in a week .
hyp(1): theyS are normally *****D onS a week .

Backbone(2): these are normally made in a week .
hyp(3): esteS himselfS goS normallyS in a week .

Backbone(2): these are normally made in a week .
hyp(4): these *****D doS usuallyS in a week .
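The pairwise alignment step can be sketched with plain word-level edit distance (full TER alignment additionally allows block shifts, which this sketch omits); the `align` helper is an illustrative name.

```python
def align(backbone, hyp):
    """Align two token lists by word-level edit distance.
    Returns (cost, ops): each op labels a step as 'M' (match),
    'S' (substitution), 'D' (deletion: backbone word left unmatched,
    shown as ***** on the slides) or 'I' (insertion)."""
    n, m = len(backbone), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # DP table of edit costs
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if backbone[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match / substitute
                          d[i - 1][j] + 1,        # delete backbone word
                          d[i][j - 1] + 1)        # insert hyp word
    ops, i, j = [], n, m                          # backtrace
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and backbone[i - 1] == hyp[j - 1] else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            ops.append('M' if sub == 0 else 'S')
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append('D')
            i -= 1
        else:
            ops.append('I')
            j -= 1
    return d[n][m], ops[::-1]

backbone = "these are normally made in a week .".split()
cost, ops = align(backbone, "este himself go normally in a week .".split())
```

For hyp(3) the minimum cost is 4 edits, matching the four substitutions shown on the slide; where several paths tie (as for hyp(1) and hyp(4)), the labels depend on the tie-break.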

SLIDE 14

3. Monotonic Consensus Decoding

◮ Monotonic consensus decoding is a limited version of MAP decoding
  ◮ monotonic (position dependent)
  ◮ phrase selection depends on the position (local TMs + global LM)

$$e_{\mathrm{best}} = \operatorname*{argmax}_{e} \prod_{i=1}^{I} \phi(i\mid\bar{e}_i)\; p_{LM}(e) = \operatorname*{argmax}_{e} \{\phi(1\mid\text{these})\,\phi(2\mid\text{are})\,\phi(3\mid\text{normally})\,\phi(4\mid\varnothing)\,\phi(5\mid\text{in})\,\phi(6\mid\text{a})\,\phi(7\mid\text{week})\; p_{LM}(e),\ \ldots\} = \text{these are normally in a week} \quad (1)$$

Position-wise table (position ||| word ||| probability):

1 ||| these ||| 0.50
2 ||| are ||| 0.50
3 ||| normally ||| 0.50

1 ||| they ||| 0.25
2 ||| himself ||| 0.25
...

1 ||| este ||| 0.25
2 ||| ∅ ||| 0.25
...
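The position-wise selection above can be sketched as a minimal consensus decoder over the slide's aligned hypotheses: take the per-position argmax of the arc probabilities φ(i|w) and drop empty arcs. The LM term p_LM(e) is omitted here; on the slide the LM resolves the four-way tie at position 4 (made / ∅ / normally / usually), so this sketch breaks ties in favour of the empty arc as a stand-in.

```python
from collections import Counter

# Aligned hypotheses from the slides; "" marks an empty (deleted) arc.
aligned = [
    ["they",  "are",     "normally", "",         "on", "a", "week", "."],
    ["these", "are",     "normally", "made",     "in", "a", "week", "."],
    ["este",  "himself", "go",       "normally", "in", "a", "week", "."],
    ["these", "",        "do",       "usually",  "in", "a", "week", "."],
]

def consensus_decode(aligned):
    """Monotonic consensus decoding without an LM: per-position argmax
    over arc counts (proportional to phi(i|word)); ties go to the empty
    arc as a stand-in for the slide's LM tie-break."""
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        best = max(counts, key=lambda w: (counts[w], w == ""))
        if best:  # skip empty arcs in the output
            out.append(best)
    return " ".join(out)

print(consensus_decode(aligned))  # these are normally in a week .
```

This reproduces the slide's output, with the empty arc chosen at position 4.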

SLIDE 15

System Combination with Extra Alignment Information

Xiaofeng Wu, Tsuyoshi Okita, Josef van Genabith, Qun Liu Dublin City University

SLIDE 16

Table Of Contents

  • 1. Overview
  • 2. System Combination with IHMM
  • 3. Experiments
  • 4. Conclusions and Further Work

SLIDE 17

Objective

◮ Meta information
  ◮ Alignment information
◮ The ML4HMT dataset includes the alignment information the MT systems produce when they decode.
◮ The usual monolingual alignment in system combination does not use such external alignment information.

SLIDE 18

Standard System Combination Procedures

◮ Procedure: for a given set of MT outputs,
  • 1. (Standard approach) Choose the backbone by an MBR decoder from the MT outputs E_H:

$$\hat{E}^{\mathrm{MBR}}_{\mathrm{best}} = \operatorname*{argmin}_{E'\in E_H} R(E') = \operatorname*{argmin}_{E'\in E_H} \sum_{E\in\mathcal{E}} L(E,E')\,P(E\mid F) \quad (2)$$

$$= \operatorname*{argmax}_{E'\in E_H} \sum_{E\in\mathcal{E}} \mathrm{BLEU}_E(E')\,P(E\mid F) \quad (3)$$

  • 2. Run monolingual word alignment between the backbone and the translation outputs in a pairwise manner (this yields a confusion network).
    ◮ TER alignment [Sim et al., 06]
    ◮ IHMM alignment [He et al., 08]
  • 3. Run the (monotonic) consensus decoding algorithm to choose the best path in the confusion network.

SLIDE 19

Our System Combination Procedures

◮ Procedure: for a given set of MT outputs,
  • 1. (Standard approach) Choose the backbone by an MBR decoder from the MT outputs E_H:

$$\hat{E}^{\mathrm{MBR}}_{\mathrm{best}} = \operatorname*{argmin}_{E'\in E_H} R(E') = \operatorname*{argmin}_{E'\in E_H} \sum_{E\in\mathcal{E}} L(E,E')\,P(E\mid F) \quad (4)$$

$$= \operatorname*{argmax}_{E'\in E_H} \sum_{E\in\mathcal{E}} \mathrm{BLEU}_E(E')\,P(E\mid F) \quad (5)$$

  • 2. Run monolingual word alignment with prior knowledge (about alignment links) between the backbone and the translation outputs in a pairwise manner (this yields a confusion network).
  • 3. Run the (monotonic) consensus decoding algorithm to choose the best path in the confusion network.

SLIDE 20

IHMM Alignment [He et al., 08]

◮ Same as conventional HMM alignment [Vogel et al., 96] except:
  ◮ word semantic similarity and word surface similarity
◮ Word semantic similarity: source word sequence = hidden word sequence,

$$p(e'_j \mid e_i) = \sum_{k=0}^{K} p(f_k \mid e_i)\, p(e'_j \mid f_k, e_i) \approx \sum_{k=0}^{K} p(f_k \mid e_i)\, p(e'_j \mid f_k)$$

◮ Word surface similarity: exact match, longest matched prefix, longest common subsequence
  ◮ "week" and "week" (exact match)
  ◮ "week" and "weeks" (longest matched prefix)
  ◮ "week" and "biweekly" (longest common subsequence)
◮ Distance-based distortion penalty.
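The surface-similarity cues on this slide can be sketched as follows. How He et al., 08 combine the cues into a single score is not shown here, so this sketch only computes the two non-trivial measures on the slide's examples; the helper names are illustrative.

```python
def longest_matched_prefix(a, b):
    """Length of the longest common prefix of two words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def lcs_len(a, b):
    """Length of the longest common subsequence (LCS) of two words."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

exact = "week" == "week"                          # exact match
prefix = longest_matched_prefix("week", "weeks")  # 4: "week" is a prefix of "weeks"
subseq = lcs_len("week", "biweekly")              # 4: w-e-e-k occurs inside b-i-w-e-e-k-l-y
```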

SLIDE 21

Alignment Bias

◮ In (monotonic) consensus decoding, we give
  ◮ a big weight to Lucy alignments and
  ◮ a low weight to alignments that conflict with Lucy.
◮ This can be expressed as

$$p(E_\psi) = \theta_\psi \log p(E_\psi \mid F) \quad (6)$$

where ψ = 1, ..., N_nodes denotes the current node at which the beam search arrived; θ_ψ > 1 if the current node is a Lucy alignment, and θ_ψ = 1 if it is not.
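Equation (6) amounts to scaling each node's log-probability during the beam search. A minimal sketch, assuming a per-node flag marking Lucy alignments; the `is_lucy_node` flag and the `theta` default are illustrative, and the actual decoder integration is not shown on the slide.

```python
import math

def biased_node_score(p_e_given_f, is_lucy_node, theta=1.5):
    """Score a confusion-network node as in Eq. (6): the log-probability
    log p(E|F) is scaled by theta_psi, with theta_psi = theta > 1 for
    nodes from a Lucy alignment and theta_psi = 1 otherwise."""
    theta_psi = theta if is_lucy_node else 1.0
    return theta_psi * math.log(p_e_given_f)

score = biased_node_score(0.5, is_lucy_node=True, theta=1.5)
```

The experiments later in the deck tune this θψ over values from 1 to 10.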

SLIDE 22

Lucy Backbone

◮ We used the Lucy backbone since it seems better than the other backbones.

                 Devset (1000)       Testset (3003)
                 NIST     BLEU       NIST     BLEU
  TER Backbone   8.1168   0.3351     7.1092   0.2596
  Lucy Backbone  8.1328   0.3376     7.4546   0.2607

Table: Backbone selection results.

SLIDE 23

Extra Alignment Information Experiments

         Devset (1000)       Testset (3003)
  θψ     NIST     BLEU       NIST     BLEU
  1      8.1328   0.3376     7.4546   0.2607
  1.2    8.1179   0.3355     7.2109   0.2597
  1.5    8.1171   0.3355     7.4512   0.2578
  2      8.1252   0.3360     7.4532   0.2558
  4      8.1180   0.3354     7.3540   0.2569
  10     8.1190   0.3354     7.1026   0.2557

Table: The Lucy backbone with tuning of θψ.

SLIDE 24

Discussion: HMM-MAP (Bayesian HMM) Alignment

◮ Hidden Markov Model:

$$p(s_{1:T}, y_{1:T}) = p(s_1)\,p(y_1\mid s_1) \prod_{t=2}^{T} p(s_t\mid s_{t-1})\,p(y_t\mid s_t) \quad (7)$$

  ◮ p(s_t | s_{t-1}): transition matrix
  ◮ p(y_t | s_t): emission matrix
◮ HMM-MAP (Bayesian HMM)
  ◮ Prior on the transition matrix and the emission matrix
◮ IHMM-MAP
  ◮ Prior on the transition matrix and the emission matrix
  ◮ Word semantic similarity and word surface similarity
  ◮ Distance-based distortion penalty
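Equation (7) factorises the joint probability into initial, transition, and emission terms, which a tiny numeric sketch can check; the 2-state matrices below are made-up illustrative numbers, not from any aligner.

```python
def hmm_joint(pi, A, B, states, obs):
    """Joint probability p(s_{1:T}, y_{1:T}) of Eq. (7):
    p(s1) p(y1|s1) * prod_{t>=2} p(s_t|s_{t-1}) p(y_t|s_t)."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p

# Illustrative 2-state, 2-symbol model (made-up numbers)
pi = [0.6, 0.4]               # p(s1)
A = [[0.7, 0.3], [0.2, 0.8]]  # p(s_t | s_{t-1})
B = [[0.9, 0.1], [0.5, 0.5]]  # p(y_t | s_t)
p = hmm_joint(pi, A, B, states=[0, 1, 1], obs=[0, 1, 0])
# = 0.6 * 0.9 * 0.3 * 0.5 * 0.8 * 0.5
```

A Bayesian (MAP) variant would place priors on the rows of A and B rather than estimating them directly.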

SLIDE 25

Conclusion

◮ We focused on adding extra alignment information to consensus decoding.
◮ Our results show that choosing Lucy, an RBMT system, as the backbone gives a slightly better result (a 0.11% BLEU improvement) than the traditional TER backbone selection method.
◮ The extra alignment information we added in the decoding part did not improve performance.

SLIDE 26

Acknowledgement

Thank you for your attention.

◮ This research is supported by the 7th Framework Programme and the ICT Policy Support Programme of the European Commission through the T4ME project (Grant agreement No. 249119).
◮ This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation at Dublin City University.