Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014

SLIDE 1

NAIST at WAT 2014

Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014

Graham Neubig Nara Institute of Science and Technology (NAIST) 2014-10-4

SLIDE 2

Features of ASPEC

  • Translation between languages with different grammatical structures

流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。
an image was reconstituted in order to measure flowing plasma correctly .

  • We all know: Phrase-based MT is not enough

for the accurate measurement of plasma flow image was reconstructed .

SLIDE 3

Solution?: 2-step Translation Process

  • Pre-ordering [Weblio, SAS_MT, NII, TMU, NICT]
    我々 は 科学 論文 を 翻訳 する → 我々 翻訳 する 科学 論文 → we translate scientific papers
  • RBMT+Statistical Post Editing [TOSHIBA, EIWA]
    我々 は 科学 論文 を 翻訳 する → we translate science thesis → we translate scientific papers
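The pre-ordering route can be sketched in a few lines of Python. This is a toy illustration only: the single reordering rule, the `PHRASES` table, and both function names are assumptions made up for this example, not the rules of any system named on the slide.

```python
def preorder_sov_to_svo(tokens):
    """Toy rule (assumed): 'S は O を V' becomes 'S V O', dropping the particles."""
    wa = tokens.index("は")
    wo = tokens.index("を")
    subject, obj, verb = tokens[:wa], tokens[wa + 1:wo], tokens[wo + 1:]
    return subject + verb + obj


# Hypothetical phrase table for the second, monotone translation step.
PHRASES = {
    ("我々",): "we",
    ("翻訳", "する"): "translate",
    ("科学",): "scientific",
    ("論文",): "papers",
}


def translate_monotone(tokens):
    """Translate left to right without reordering, longest phrase first."""
    out, i = [], 0
    while i < len(tokens):
        for n in (2, 1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += len(key)
                break
        else:
            out.append(tokens[i])  # pass unknown tokens through unchanged
            i += 1
    return out


source = "我々 は 科学 論文 を 翻訳 する".split()
reordered = preorder_sov_to_svo(source)
print(" ".join(reordered))
print(" ".join(translate_monotone(reordered)))
```

Even in this toy form the reordering rule is written by hand for one language pair, which is the burden the next slide complains about.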

SLIDE 4

This is a lot of work... :(

How do I make good Japanese-English preordering rules?!
How do I make good Japanese-Chinese preordering rules?!
What about error propagation?
What if better preordering accuracy doesn't equal better translation accuracy?

SLIDE 5

Evidence

SLIDE 6

Our Solution: Tree-to-String Translation [Liu+ 06]

[Figure: parse tree of 友達 と ご飯 を 食べ た (nodes N0, P1, N2, P3, V4, SUF5, PP0-1, PP2-3, VP4-5, VP2-5, VP0-5) with target-side fragments attached to its nodes: "my friend", "with", "a meal", "ate", "x1 with x0", "x1 x0"]
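A minimal sketch of how tree-to-string rules could be applied to this example. The tree shape follows the node labels on the slide, but the exact rule set, the `Node`/`Var` classes, and the `translate` function are assumptions for illustration. A real decoder such as Travatar searches over many competing rules with model scores; this sketch simply takes the first rule that matches.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    label: str
    children: tuple = ()        # child nodes (or Var placeholders in a rule)
    word: Optional[str] = None  # surface word, set on leaves only


class Var(NamedTuple):
    index: int  # which x_i this variable is
    label: str  # label the bound subtree must have


def match(pattern, node, bindings):
    """Match a rule's source side at `node`, recording variable bindings."""
    if isinstance(pattern, Var):
        if pattern.label != node.label:
            return False
        bindings[pattern.index] = node
        return True
    if pattern.label != node.label or pattern.word != node.word:
        return False
    if len(pattern.children) != len(node.children):
        return False
    return all(match(p, c, bindings) for p, c in zip(pattern.children, node.children))


def translate(node, rules):
    """Apply the first matching rule, translating bound subtrees recursively."""
    for source_side, target_side in rules:
        bindings = {}
        if match(source_side, node, bindings):
            out = []
            for item in target_side:
                if isinstance(item, int):   # x_i: splice in the subtree's translation
                    out.extend(translate(bindings[item], rules))
                else:                       # plain target word
                    out.append(item)
            return out
    raise ValueError(f"no rule matches node {node.label}")


def pp(noun, particle):
    return Node("PP", (Node("N", word=noun), Node("P", word=particle)))


# Source tree for 友達 と ご飯 を 食べ た.
TREE = Node("VP", (
    pp("友達", "と"),
    Node("VP", (
        pp("ご飯", "を"),
        Node("VP", (Node("V", word="食べ"), Node("SUF", word="た"))),
    )),
))

# Hypothetical rules in the spirit of the slide: (source fragment, target string).
RULES = [
    (Node("VP", (Node("PP", (Var(0, "N"), Node("P", word="と"))), Var(1, "VP"))), [1, "with", 0]),
    (Node("VP", (Node("PP", (Var(0, "N"), Node("P", word="を"))), Var(1, "VP"))), [1, 0]),
    (Node("VP", (Node("V", word="食べ"), Node("SUF", word="た"))), ["ate"]),
    (Node("N", word="友達"), ["my", "friend"]),
    (Node("N", word="ご飯"), ["a", "meal"]),
]

print(" ".join(translate(TREE, RULES)))  # ate a meal with my friend
```

Reordering is carried by the rules themselves ("x1 with x0", "x1 x0"), so no separate pre-ordering step is needed.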

SLIDE 7

Requirements for a Tree-to-String Model

This is a test . It uses data .

これ は テスト です 。 データ を 使用 します 。

Parallel Corpus

Source Sentence Parser

Alignments

Rule Extraction → Rule Scoring → Optimization

Tree-to-String Model
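The slide does not say how rules are scored. One common choice in statistical MT is relative frequency over the extracted rule instances, sketched below. The function name `score_rules` and the extracted rule list are made up for illustration and are not taken from the talk.

```python
from collections import Counter


def score_rules(extracted):
    """Relative-frequency estimate P(target | source) over extracted rule instances.

    `extracted` is a list of (source_side, target_side) pairs, one per
    occurrence found in the aligned, parsed parallel corpus.
    """
    pair_counts = Counter(extracted)
    source_totals = Counter(source for source, _ in extracted)
    return {
        (source, target): count / source_totals[source]
        for (source, target), count in pair_counts.items()
    }


# Hypothetical rule instances, as if extracted from a tiny corpus.
extracted = [
    ("N(ご飯)", "a meal"),
    ("N(ご飯)", "a meal"),
    ("N(ご飯)", "rice"),
    ("N(友達)", "my friend"),
]

for (source, target), prob in sorted(score_rules(extracted).items()):
    print(f"{source} -> {target}: {prob:.2f}")
```

Scores like these become features of the model, and the optimization step then tunes how much weight each feature gets.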

SLIDE 8

Reducing our work load.

How do I make good Japanese-English preordering rules?!
How do I make good Japanese-Chinese preordering rules?!
What about error propagation?
What if better preordering accuracy doesn't equal better translation accuracy?

X X X

SLIDE 9

Forest-to-string Translation [Mi+ 08]

I saw a girl with a telescope

[Figure: packed parse forest for the sentence above. Nodes (label start,end): PRP 0,1; NP 0,1; VBD 1,2; DT 2,3; NN 3,4; NP 2,4; IN 4,5; DT 5,6; NN 6,7; NP 5,7; PP 4,7; NP 2,7; VP 1,7; S 0,7]
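A packed forest can be represented as a hypergraph in which one node may have several alternative expansions. The sketch below uses the node labels and spans from the slide. The hyperedges are an assumption, reconstructed from the two usual readings of the sentence (the PP attaches to the noun or to the verb), and the `FOREST` and `count_trees` names are illustrative only.

```python
from functools import lru_cache
from math import prod

# head node -> list of alternative tail tuples (assumed hyperedges)
FOREST = {
    ("S", 0, 7): [(("NP", 0, 1), ("VP", 1, 7))],
    ("NP", 0, 1): [(("PRP", 0, 1),)],
    ("VP", 1, 7): [
        (("VBD", 1, 2), ("NP", 2, 7)),                # "with a telescope" modifies "a girl"
        (("VBD", 1, 2), ("NP", 2, 4), ("PP", 4, 7)),  # "with a telescope" modifies "saw"
    ],
    ("NP", 2, 7): [(("NP", 2, 4), ("PP", 4, 7))],
    ("NP", 2, 4): [(("DT", 2, 3), ("NN", 3, 4))],
    ("PP", 4, 7): [(("IN", 4, 5), ("NP", 5, 7))],
    ("NP", 5, 7): [(("DT", 5, 6), ("NN", 6, 7))],
}


@lru_cache(maxsize=None)
def count_trees(node):
    """Number of distinct trees rooted at `node` that the forest encodes."""
    edges = FOREST.get(node)
    if not edges:
        return 1  # preterminal: exactly one way to build it
    return sum(prod(count_trees(tail) for tail in tails) for tails in edges)


print(count_trees(("S", 0, 7)))  # 2
```

Both readings share most of their nodes, which is why a forest stays compact. A forest-to-string decoder can match rules against either reading instead of committing to the parser's single best tree.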

SLIDE 10

Travatar Toolkit

  • Forest-to-string translation toolkit
  • Supports training, decoding
  • Includes preprocessing scripts for parsing, etc.
  • Many other features (optimization, Hiero, etc...)

Available open source! http://phontron.com/travatar

SLIDE 11

NAIST WAT System

SLIDE 12

WAT Results

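First place in all tasks!

[Figure: two bar charts comparing NAIST with other systems on en-ja, ja-en, zh-ja, and ja-zh: BLEU (axis 10 to 50) and HUMAN (axis 20 to 60). Difference labels on the charts, in slide order: +2.2, +2.7, +3.6, +1.8, +13.0, +15.0, +28.3, +3.8]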

SLIDE 13

System Elements

Travatar!

Same as [Neubig & Duh, ACL2014]

Recurrent Neural Net Language Model

Pre/post Processing (UNK splitting, transliteration)

Dictionaries

SLIDE 14

Recurrent Neural Network LM

  • Vector representation → robustness
  • Recurrent architecture → longer context

I can eat an apple </s>
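Both bullets can be seen in a tiny Elman-style recurrent LM. Each word is looked up as a vector, and the hidden state carries the whole preceding context forward. The sketch below is pure Python with random, untrained weights, so the probabilities it produces are meaningless. The layer size, parameter names, and functions are all assumptions, and this is not the RNNLM used in the NAIST system.

```python
import math
import random

VOCAB = ["<s>", "I", "can", "eat", "an", "apple", "</s>"]
HIDDEN = 4  # hidden-layer size, chosen arbitrarily


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(seed=0):
    rng = random.Random(seed)
    return {
        "E": rand_matrix(len(VOCAB), HIDDEN, rng),  # word vectors
        "W": rand_matrix(HIDDEN, HIDDEN, rng),      # recurrent weights
        "U": rand_matrix(len(VOCAB), HIDDEN, rng),  # output weights
    }


def step(params, h_prev, word_id):
    """h_t = tanh(E[w_t] + W h_{t-1}); next-word distribution = softmax(U h_t)."""
    x = params["E"][word_id]
    h = [
        math.tanh(x[i] + sum(params["W"][i][j] * h_prev[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]
    scores = [sum(row[j] * h[j] for j in range(HIDDEN)) for row in params["U"]]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return h, [e / total for e in exps]


def sentence_logprob(params, words):
    """Sum of log P(next word | all previous words), ending with </s>."""
    ids = [VOCAB.index(w) for w in ["<s>"] + words + ["</s>"]]
    h = [0.0] * HIDDEN
    logprob = 0.0
    for current, following in zip(ids, ids[1:]):
        h, probs = step(params, h, current)
        logprob += math.log(probs[following])
    return logprob


print(sentence_logprob(init_params(), "I can eat an apple".split()))
```

Unlike an n-gram model, nothing in `step` truncates the history: the prediction of `</s>` still depends, through `h`, on the first word of the sentence.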

SLIDE 15

Pre/post processing

UNK segmentation (ja-en)

球内部 → 球 内部
試験 管立て → 試験 管 立て

Kanji Normalization (ja-zh, zh-ja)

イチョウ黄叶 → イチョウ黄葉
臭気鉴定师 → 臭気鑑定師

Transliteration (ja-en)

Japan インテック → Japan Intekku

Dictionary addition (ja-en)

膿瘍 → apostema
典型 → archetype
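Kanji normalization of this kind can be sketched as a character-level mapping. The three-entry table below covers only the characters in the slide's examples. A real system would need a full simplified-to-Japanese conversion table, and the names `SIMPLIFIED_TO_JAPANESE` and `normalize_kanji` are made up for this illustration.

```python
# Simplified Chinese character -> Japanese form, limited to the slide's examples.
SIMPLIFIED_TO_JAPANESE = str.maketrans({
    "叶": "葉",
    "鉴": "鑑",
    "师": "師",
})


def normalize_kanji(text):
    """Replace simplified characters with their Japanese counterparts."""
    return text.translate(SIMPLIFIED_TO_JAPANESE)


for word in ["イチョウ黄叶", "臭気鉴定师"]:
    print(word, "→", normalize_kanji(word))
```

Characters outside the table, including kana, pass through unchanged.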

SLIDE 16

Conclusion

SLIDE 17

Future Work

LOSE at next year's WAT.

(Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)

Starting soon! Training scripts to be available: http://phontron.com/project/wat2014