The HIT-LTRC Machine Translation System for IWSLT 2012


  1. The HIT-LTRC Machine Translation System for IWSLT 2012
     Xiaoning Zhu, Yiming Cui, Conghui Zhu, Tiejun Zhao and Hailong Cao
     Harbin Institute of Technology

  2. Outline
     • Introduction
     • System summary
     • Pialign
     • Experiments
     • Conclusion and future work

  3. Introduction
     • Olympic shared task
     • Phrase-based model
     • Phrase table analysis
     • Phrase table combination
       – Pialign
       – Giza++

  4. System summary
     • Training: Corpus → Giza++ and Pialign → combine → model
     • Decoding: source text → phrase-based decoder → post-processing → target text

  5. System summary
     • Tools (a sketch of how they chain together follows)
       – Moses decoder
       – Giza++ for phrase extraction
       – Pialign for phrase extraction
       – SRILM for language model training
       – MERT for tuning
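
     The tools above form a fairly standard Moses training pipeline. Below is a minimal Python sketch of how they are typically chained; the file names, paths, and corpus layout are illustrative assumptions, not the authors' exact configuration, and Pialign's separate run plus the phrase table combination step are omitted.

```python
import subprocess

SRC, TRG = "zh", "en"  # hypothetical corpus layout: corpus/train.zh, corpus/train.en

# 1. Train a 3-gram language model with SRILM.
subprocess.run(["ngram-count", "-order", "3", "-interpolate", "-kndiscount",
                "-text", f"corpus/train.{TRG}", "-lm", "lm.arpa"], check=True)

# 2. Word alignment and phrase extraction: Moses' train-model.perl runs
#    Giza++ internally and builds the phrase and reordering tables.
#    (Moses usually expects an absolute path in the -lm argument.)
subprocess.run(["train-model.perl", "-root-dir", "train",
                "-corpus", "corpus/train", "-f", SRC, "-e", TRG,
                "-alignment", "grow-diag-final-and",
                "-reordering", "msd-bidirectional-fe",
                "-lm", "0:3:lm.arpa:0"], check=True)

# 3. Tune the feature weights with MERT on the dev set.
subprocess.run(["mert-moses.pl", f"dev.{SRC}", f"dev.{TRG}",
                "moses", "train/model/moses.ini"], check=True)

# 4. Decode the test set with the Moses decoder.
with open(f"test.{SRC}") as fin, open("test.hyp", "w") as fout:
    subprocess.run(["moses", "-f", "train/model/moses.ini"],
                   stdin=fin, stdout=fout, check=True)
```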

  6. System summary
     • Feature sets (a log-linear scoring sketch follows)
       – Bidirectional translation probabilities
       – Bidirectional lexical translation probabilities
       – MSD reordering model
       – Distortion model
       – Language model
       – Word penalty
       – Phrase penalty
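
     Moses combines a feature set like this in a log-linear model, scoring each translation hypothesis as a weighted sum of feature values. A minimal sketch follows; the feature values and uniform weights are invented numbers for illustration (in a real system the weights are exactly what MERT tunes).

```python
import math

# Illustrative feature values for one hypothesis; the names mirror the
# feature set above, the numbers are made up for the example.
features = {
    "p(e|f)":         math.log(0.40),  # forward translation probability
    "p(f|e)":         math.log(0.30),  # backward translation probability
    "lex(e|f)":       math.log(0.20),  # forward lexical weighting
    "lex(f|e)":       math.log(0.25),  # backward lexical weighting
    "lm":             math.log(1e-4),  # language model probability
    "distortion":     -2.0,            # distortion cost
    "word_penalty":   -5.0,            # one unit per output word
    "phrase_penalty": -3.0,            # one unit per phrase used
}

# Uniform weights for the sketch; MERT tunes these in practice.
weights = {name: 1.0 for name in features}

# Log-linear model score: weighted sum of (log) feature values.
score = sum(weights[name] * value for name, value in features.items())
print(f"hypothesis score: {score:.3f}")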

  7. Pialign
     • Phrases of multiple granularities are modeled directly
       + No mismatch between the alignment goal and the final goal
       + Completely probabilistic model, no heuristics
       + Competitive accuracy with a smaller phrase table
     • Uses a hierarchical model for Inversion Transduction Grammars (ITG)
     • Uses a Bayesian non-parametric Pitman-Yor process (a sampling sketch follows)
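
     A Pitman-Yor process like the one underlying Pialign can be sampled with the generalized Chinese restaurant construction. Below is a minimal sketch, assuming an arbitrary integer-valued base distribution and illustrative hyperparameters d (discount) and theta (strength); Pialign's actual base distribution over phrase pairs is far richer than this toy.

```python
import random
from collections import Counter

def pitman_yor_crp(n, d=0.5, theta=1.0, base=lambda: random.randint(0, 9)):
    """Draw n samples from a Pitman-Yor process via its Chinese
    restaurant representation (discount d, strength theta)."""
    tables = []   # number of customers seated at each table
    labels = []   # label drawn from the base distribution per table
    draws = []
    for i in range(n):  # i customers already seated
        # Open a new table with probability (theta + d*K) / (i + theta).
        if random.random() * (i + theta) < theta + d * len(tables):
            tables.append(1)
            labels.append(base())
            draws.append(labels[-1])
        else:
            # Join existing table t with probability proportional to (c_t - d).
            r = random.uniform(0, i - d * len(tables))
            acc = 0.0
            for t, c in enumerate(tables):
                acc += c - d
                if r <= acc:
                    tables[t] += 1
                    draws.append(labels[t])
                    break
            else:  # guard against floating-point shortfall
                tables[-1] += 1
                draws.append(labels[-1])
    return Counter(draws)

print(pitman_yor_crp(1000))
```

     The rich-get-richer behavior with a power-law tail is what lets the model prefer a compact set of reusable phrase pairs while still allowing new ones.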

  8. Parameter tuning of Pialign
     • samps (sampling frequency)
       – Too small: cannot correctly reflect the translation knowledge
       – Too big: will produce a sampling bias
       – Finally this value is set to 20 empirically

     Sampling times       1          20           80
     Phrase table scale   382,137    1,413,367    2,005,941

  9. Experiments
     • Corpus: HIT_train, HIT_dev, BTEC_train, BTEC_dev

     Name       Corpus                   #
     Corpus 1   BTEC_train + HIT_train   72,575
     Corpus 2   Corpus 1 + BTEC_dev      75,552
     Corpus 3   Corpus 2 + HIT_dev       77,609

  10. Experiments
     • Comparison of the Giza++ and Pialign phrase tables

     Corpus   Align     Total       Common    Different
     1        Giza++    1,182,913   409,443   773,470
              Pialign   1,385,520   409,443   976,077
     2        Giza++    1,208,128   418,788   789,340
              Pialign   1,413,367   418,788   994,579
     3        Giza++    1,236,688   428,377   808,306
              Pialign   1,445,577   428,377   1,017,200

  11. Experiments
     • Coverage of the test set (a computation sketch follows the table):
       c = (# of phrases in both the test set and the phrase table) / (# of phrases in the test set)

     Corpus   Align     Chinese   English
     1        Giza++    21.7%     36.0%
              Pialign   23.6%     38.3%
     2        Giza++    21.7%     36.1%
              Pialign   23.8%     38.7%
     3        Giza++    21.9%     36.6%
              Pialign   23.9%     38.9%
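
     A minimal sketch of how such a coverage statistic c can be computed: enumerate all token n-grams of the test set up to a maximum phrase length and count the fraction found in the phrase table. The maximum length of 7 and the whitespace tokenization are assumptions for illustration, not the authors' stated settings.

```python
def phrase_coverage(test_sentences, phrase_table, max_len=7):
    """Fraction of distinct test-set phrases (token n-grams up to
    max_len) that also appear as source phrases in the phrase table:
    c = |test phrases ∩ table phrases| / |test phrases|."""
    seen, covered = set(), set()
    for sent in test_sentences:
        toks = sent.split()
        for n in range(1, max_len + 1):
            for i in range(len(toks) - n + 1):
                phrase = " ".join(toks[i:i + n])
                seen.add(phrase)
                if phrase in phrase_table:
                    covered.add(phrase)
    return len(covered) / len(seen) if seen else 0.0

# Toy example with a hypothetical two-entry phrase table.
table = {"thank you", "very much"}
print(phrase_coverage(["thank you very much"], table))
```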

  12. Experiments
     • Translation results (BLEU) with Giza++ and Pialign
       – After we tuned the parameters with HIT_dev, the results became worse. This may be caused by a mismatch between HIT_dev and HIT_train.

     Corpus   Align     Before tuning   After tuning
     1        Giza++    20.76           19.97
              Pialign   20.80           19.70
     2        Giza++    20.62           18.40
              Pialign   21.20           19.66
     3        Giza++    20.51           15.52
              Pialign   20.54           15.10

  13. Experiments
     • Phrase table combination by linear interpolation (a sketch follows)

     Interpolation parameter   BLEU%
     0.4                       20.69
     0.5                       20.78
     0.6                       20.62
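
     A minimal sketch of linearly interpolating two phrase tables, assuming each table maps a phrase pair to its four translation-probability features, with each feature mixed as λ·p_giza + (1−λ)·p_pialign. Treating a pair absent from one table as probability 0 on that side is an assumption; other back-off schemes are possible.

```python
def interpolate_tables(giza, pialign, lam=0.5):
    """Linear interpolation of two phrase tables.

    Each table maps (src_phrase, trg_phrase) -> tuple of the four
    translation-probability features.
    """
    combined = {}
    for pair in set(giza) | set(pialign):
        g = giza.get(pair, (0.0, 0.0, 0.0, 0.0))
        p = pialign.get(pair, (0.0, 0.0, 0.0, 0.0))
        combined[pair] = tuple(lam * a + (1 - lam) * b
                               for a, b in zip(g, p))
    return combined

# Toy tables with invented probabilities; lam=0.5 matches the
# best-scoring interpolation parameter in the table above.
giza = {("你好", "hello"): (0.6, 0.5, 0.4, 0.3)}
pialign = {("你好", "hello"): (0.8, 0.6, 0.5, 0.4),
           ("谢谢", "thanks"): (0.7, 0.7, 0.6, 0.5)}
print(interpolate_tables(giza, pialign, lam=0.5))
```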

  14. Conclusion and future work
     • Tuning may not be useful when the dev set does not match the training set.
     • Pialign can achieve a better result with a smaller phrase table.
