1
NAIST at WAT 2014
Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014
Graham Neubig Nara Institute of Science and Technology (NAIST) 2014-10-4
2
Features of ASPEC
Translation between languages with different grammatical structures
流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。
an image was reconstituted in order to measure flowing plasma correctly .
for the accurate measurement of plasma flow image was reconstructed .
3
Solution?: 2-step Translation Process
Source: 我々 は 科学 論文 を 翻訳 する
Step 1 (preordering): 我々 翻訳 する 科学 論文
Step 2 (translation): we translate scientific papers
Other output shown in the figure: we translate science thesis
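To make step 1 concrete, here is a deliberately tiny, hypothetical preordering rule that reproduces the reordering on this slide. It is not the NAIST system's method (the talk goes on to argue for tree-to-string translation instead); the rule `toy_preorder` and its "SUBJ は OBJ を VERB" pattern are assumptions for illustration.

```python
def toy_preorder(tokens):
    """Hypothetical rule: 'SUBJ は OBJ を VERB...' -> 'SUBJ VERB... OBJ' (drops は and を)."""
    if "は" not in tokens or "を" not in tokens:
        return list(tokens)  # pattern not found: keep the original order
    wa, wo = tokens.index("は"), tokens.index("を")
    if wa > wo:
        return list(tokens)  # unexpected order: the rule does not apply
    subj = tokens[:wa]
    obj = tokens[wa + 1:wo]
    verb = tokens[wo + 1:]
    return subj + verb + obj


print(" ".join(toy_preorder("我々 は 科学 論文 を 翻訳 する".split())))
print(" ".join(toy_preorder("友達 と ご飯 を 食べ た".split())))
```

The second call shows the brittleness: a sentence without a は-marked subject passes through unchanged, so every new construction (and every language pair) needs more hand-written rules.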
4
This is a lot of work... :(
How do I make good Japanese-English preordering rules?!
How do I make good Japanese-Chinese preordering rules?!
What about error propagation?
What if better preordering accuracy doesn't equal better translation accuracy?
5
Evidence
6
Our Solution: Tree-to-String Translation [Liu+ 06]
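Source: 友達 と ご飯 を 食べ た
Parse tree (node labels with word spans):
(VP0-5 (PP0-1 (N0 友達) (P1 と)) (VP2-5 (PP2-3 (N2 ご飯) (P3 を)) (VP4-5 (V4 食べ) (SUF5 た))))
Rule target sides shown in the figure: x1 with x0 / x1 x0 / my friend / a meal / with / ate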
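The slide's tree and rule fragments can be sketched as a toy tree-to-string translator: rules match a tree fragment on the source side and emit a target string whose variables (`x0`, `x1`) are translated recursively. The exact rule set below is an assumption reconstructed from the fragments on the slide, and the greedy "first matching rule wins" search is a simplification; a real decoder such as Travatar extracts rules automatically and searches over competing derivations with learned scores.

```python
# A tree is (label, word) for a leaf or (label, [children]) for an internal node.
# On a rule's source side, a bare string such as "x0" is a variable matching any subtree.

def match(pattern, tree, bindings):
    """Return True if the source-side pattern matches the tree, recording variables."""
    if isinstance(pattern, str):
        bindings[pattern] = tree
        return True
    (p_label, p_rest), (t_label, t_rest) = pattern, tree
    if p_label != t_label:
        return False
    if isinstance(p_rest, str) or isinstance(t_rest, str):
        return p_rest == t_rest  # leaf: the words must be identical
    if len(p_rest) != len(t_rest):
        return False
    return all(match(p, t, bindings) for p, t in zip(p_rest, t_rest))

# Hypothetical rules (source-side pattern, target-side tokens).
RULES = [
    (("VP", [("PP", ["x0", ("P", "と")]), "x1"]), ["x1", "with", "x0"]),
    (("VP", [("PP", ["x0", ("P", "を")]), "x1"]), ["x1", "x0"]),
    (("VP", [("V", "食べ"), ("SUF", "た")]), ["ate"]),
    (("N", "友達"), ["my", "friend"]),
    (("N", "ご飯"), ["a", "meal"]),
]

def translate(tree):
    """Apply the first matching rule, then translate each bound variable recursively."""
    for source_side, target_side in RULES:
        bindings = {}
        if match(source_side, tree, bindings):
            output = []
            for token in target_side:
                if token in bindings:
                    output.extend(translate(bindings[token]))
                else:
                    output.append(token)
            return output
    raise ValueError(f"no rule covers node {tree[0]}")

SENT = ("VP", [
    ("PP", [("N", "友達"), ("P", "と")]),
    ("VP", [
        ("PP", [("N", "ご飯"), ("P", "を")]),
        ("VP", [("V", "食べ"), ("SUF", "た")]),
    ]),
])

print(" ".join(translate(SENT)))  # ate a meal with my friend
```

Note how the reordering (verb before object, "with" phrase at the end) is carried by the rules themselves, with no separate preordering step.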
7
Requirements for a Tree-to-String Model
This is a test . It uses data .
これ は テスト です 。 データ を 使用 します 。
Parallel Corpus
Source Sentence Parser
Alignments
Rule Extraction
Rule Scoring
Optimization
Tree-to-String Model
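As one illustration of the rule scoring step, a common feature in syntax-based SMT is the relative frequency of a target side given a source side, P(t | s) = count(s, t) / count(s). The sketch below assumes rules have already been extracted and are written as plain strings; the rules and their counts are hypothetical, and this is not a description of Travatar's full feature set.

```python
from collections import Counter

def relative_frequency(rule_instances):
    """Map each (source_side, target_side) pair to count(s, t) / count(s)."""
    instances = list(rule_instances)  # allow any iterable; we need two passes
    pair_counts = Counter(instances)
    source_counts = Counter(src for src, _ in instances)
    return {(src, tgt): n / source_counts[src]
            for (src, tgt), n in pair_counts.items()}

# Hypothetical extracted rule occurrences.
extracted = [
    ("VP(PP(x0 P(を)) x1)", "x1 x0"),
    ("VP(PP(x0 P(を)) x1)", "x1 x0"),
    ("VP(PP(x0 P(を)) x1)", "x1 x0"),
    ("VP(PP(x0 P(を)) x1)", "x0 x1"),
    ("N(ご飯)", "a meal"),
]

for rule, prob in sorted(relative_frequency(extracted).items()):
    print(rule, prob)
```

The reverse direction P(s | t) is computed the same way with the roles swapped; optimization then tunes how much weight each such feature gets.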
8
Reducing our workload
How do I make good Japanese-English preordering rules?!
How do I make good Japanese-Chinese preordering rules?!
What about error propagation?
What if better preordering accuracy doesn't equal better translation accuracy?
9
Forest-to-string Translation [Mi+ 08]
I saw a girl with a telescope
Forest nodes (label, span): PRP 0,1; NP 0,1; VBD 1,2; DT 2,3; NN 3,4; NP 2,4; IN 4,5; DT 5,6; NN 6,7; NP 5,7; PP 4,7; NP 2,7; VP 1,7; S 0,7
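A packed forest stores many parses as a hypergraph: each node (label, start, end) keeps a list of alternative child lists (hyperedges), and nodes shared by several parses are stored once. The hyperedges below are an assumed reconstruction of the slide's forest, encoding the two attachments of "with a telescope"; the code counts and enumerates the trees the forest represents, which is the set a forest-to-string translator can choose from instead of a single 1-best parse.

```python
import itertools
from functools import lru_cache

# Hyperedges: head node -> list of alternative tails (lists of child nodes).
# Nodes not listed here (the POS tags) are treated as leaves.
EDGES = {
    ("S", 0, 7): [[("NP", 0, 1), ("VP", 1, 7)]],
    ("NP", 0, 1): [[("PRP", 0, 1)]],
    ("VP", 1, 7): [
        [("VBD", 1, 2), ("NP", 2, 7)],                # the girl has the telescope
        [("VBD", 1, 2), ("NP", 2, 4), ("PP", 4, 7)],  # seeing with the telescope
    ],
    ("NP", 2, 7): [[("NP", 2, 4), ("PP", 4, 7)]],
    ("NP", 2, 4): [[("DT", 2, 3), ("NN", 3, 4)]],
    ("PP", 4, 7): [[("IN", 4, 5), ("NP", 5, 7)]],
    ("NP", 5, 7): [[("DT", 5, 6), ("NN", 6, 7)]],
}
ROOT = ("S", 0, 7)

@lru_cache(maxsize=None)
def count_trees(node):
    """Number of trees under node: sum over hyperedges of the product over children."""
    if node not in EDGES:
        return 1
    total = 0
    for tail in EDGES[node]:
        n = 1
        for child in tail:
            n *= count_trees(child)
        total += n
    return total

def trees(node):
    """Enumerate every tree under node as a bracketed string (exponential in general)."""
    if node not in EDGES:
        return [node[0]]
    out = []
    for tail in EDGES[node]:
        for combo in itertools.product(*(trees(c) for c in tail)):
            out.append("(" + node[0] + " " + " ".join(combo) + ")")
    return out

print(count_trees(ROOT))
for t in trees(ROOT):
    print(t)
```

Counting uses dynamic programming over shared nodes, so it stays cheap even when the number of trees grows exponentially; enumeration is only for display.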
10
Travatar Toolkit
Available open source! http://phontron.com/travatar
11
NAIST WAT System
12
WAT Results
Charts: BLEU and HUMAN scores for en-ja, ja-en, zh-ja, and ja-zh (NAIST vs. Other)
First place in all tasks!
NAIST margin over Other:
Metric | en-ja | ja-en | zh-ja | ja-zh
BLEU | +2.2 | +2.7 | +3.6 | +1.8
HUMAN | +13.0 | +15.0 | +28.3 | +3.8
13
System Elements
Same as [Neubig & Duh, ACL2014]
Recurrent Neural Net Language Model
Pre/post Processing (UNK splitting, transliteration)
Dictionaries
14
Recurrent Neural Network LM
I can eat an apple </s>
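The idea behind an RNN language model is that a hidden state carries the entire history, so each word is predicted from everything before it rather than from a fixed n-gram window. The sketch below shows an Elman-style RNN scoring the slide's example sentence. It uses untrained random weights, a toy vocabulary and hidden size that are assumptions, and it omits training (backpropagation through time), so the score itself is meaningless; only the computation is illustrative.

```python
import math
import random

# Toy vocabulary built from the slide's example sentence (assumption).
VOCAB = ["<s>", "I", "can", "eat", "an", "apple", "</s>"]
IDX = {w: i for i, w in enumerate(VOCAB)}
V, H = len(VOCAB), 8  # vocabulary size, hidden size (arbitrary choice)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Untrained random weights; a real model would learn these.
W_in = rand_matrix(H, V)   # one-hot input word -> hidden
W_rec = rand_matrix(H, H)  # previous hidden state -> hidden (the recurrence)
W_out = rand_matrix(V, H)  # hidden -> scores over the next word

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def step(word, h_prev):
    """h_t = tanh(W_in[:, x_t] + W_rec h_{t-1});  P(next word) = softmax(W_out h_t)."""
    x = IDX[word]
    h = [math.tanh(W_in[i][x] + sum(W_rec[i][j] * h_prev[j] for j in range(H)))
         for i in range(H)]
    probs = softmax([sum(W_out[k][i] * h[i] for i in range(H)) for k in range(V)])
    return h, probs

def sentence_logprob(words):
    """log P(sentence) = sum over t of log P(w_t | w_1 .. w_{t-1}), history kept in h."""
    h = [0.0] * H
    prev = "<s>"
    total = 0.0
    for w in words:
        h, probs = step(prev, h)
        total += math.log(probs[IDX[w]])
        prev = w
    return total

print(sentence_logprob("I can eat an apple </s>".split()))
```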
15
Pre/post processing
UNK segmentation (ja-en)
球内部 → 球 内部
試験 管立て → 試験 管 立て
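The slides do not say how unknown words are split, so the following is only one plausible approach, assumed for illustration: segment an unknown word into the fewest pieces that are all in the translation model's vocabulary, using dynamic programming, and leave it whole if no such split exists. The vocabulary here is hypothetical.

```python
def segment_unknown(word, vocab):
    """Split word into the fewest in-vocabulary pieces; return [word] if impossible."""
    n = len(word)
    best = [None] * (n + 1)  # best[i]: fewest-piece segmentation of word[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [word]

VOCAB = {"球", "内部", "試験", "管", "立て"}  # hypothetical known words
print(segment_unknown("球内部", VOCAB))
print(segment_unknown("管立て", VOCAB))
```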
Kanji Normalization (ja-zh, zh-ja)
イチョウ黄叶 → イチョウ黄葉
臭気鉴定师 → 臭気鑑定師
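The normalization examples map simplified Chinese characters to their Japanese kanji forms. A minimal sketch, assuming a character-level lookup table: the three entries below are taken from the slide's examples, while a real table would be far larger and would need care with characters such as 叶, which is also a valid Japanese kanji in its own right.

```python
# Character table covering only the slide's examples (assumption; real tables are larger).
SIMPLIFIED_TO_JA = str.maketrans({"叶": "葉", "鉴": "鑑", "师": "師"})

def normalize_kanji(text):
    """Replace each simplified character with its Japanese form, one character at a time."""
    return text.translate(SIMPLIFIED_TO_JA)

print(normalize_kanji("イチョウ黄叶"))
print(normalize_kanji("臭気鉴定师"))
```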
Transliteration (ja-en)
Japan インテック → Japan Intekku
Dictionary addition (ja-en)
膿瘍 → apostema
典型 → archetype
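For the transliteration example, a rule-based katakana romanizer already reproduces "Intekku". This is a sketch under stated assumptions: the slides do not describe the system's transliteration method, the kana table is a small hypothetical subset (other kana raise `KeyError`), and only the small tsu (ッ) doubling rule is handled.

```python
# Tiny katakana -> romaji table (hypothetical subset, not a full romanization standard).
KANA = {
    "ア": "a", "イ": "i", "ウ": "u", "エ": "e", "オ": "o",
    "カ": "ka", "キ": "ki", "ク": "ku", "ケ": "ke", "コ": "ko",
    "タ": "ta", "テ": "te", "ト": "to", "ン": "n",
}

def is_katakana(token):
    return all("\u30a0" <= ch <= "\u30ff" for ch in token)

def romanize(katakana):
    out = []
    double_next = False
    for ch in katakana:
        if ch == "ッ":  # small tsu: double the next consonant
            double_next = True
            continue
        roman = KANA[ch]  # KeyError for kana outside the toy table
        if double_next:
            roman = roman[0] + roman
            double_next = False
        out.append(roman)
    return "".join(out).capitalize()

def transliterate_unknowns(sentence):
    """Romanize tokens left in katakana in the output; keep all other tokens."""
    return " ".join(romanize(t) if is_katakana(t) else t for t in sentence.split())

print(transliterate_unknowns("Japan インテック"))  # Japan Intekku
```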
16
Conclusion
17
Future Work
(Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)
Starting soon! Training scripts to be available: http://phontron.com/project/wat2014