SLIDE 1

Experiments in English↔Japanese Tree-to-String Machine Translation

Graham Neubig Nara Institute of Science and Technology

10/20/2012

SLIDE 2

Introduction/Motivation

SLIDE 3

Translation Models

[Figure: the example pair "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した", with each side drawn as a string, a phrase-structure tree, and a dependency tree; a translation model maps from one source-side representation to one target-side representation]

SLIDE 4

Recent Usage in English↔Japanese

  • Phrase-based translation [Koehn+ 03] is still popular
  • Moses was used in 25 papers at NLP2012
  • Also, hierarchical phrase-based translation [Chiang 07], though [Feng+ 11] is one of the few examples

English: he visited the white house
Japanese: 彼 は ホワイト ハウス を 訪問 した

SLIDE 5

Recent Usage in English↔Japanese

  • Pre-ordering [Xia+ 04] is another popular technique
  • First used for Japanese by [Komachi+ 06]?
  • Used by Google [Xu+ 09], NTT [Isozaki+ 11], and others [Nguyen+ 08, Neubig+ 12]

Source dependencies: he visited the white house (subj, obj, det, adj)
Pre-ordering: subj v obj → subj obj v, giving "he the white house visited"
Translation: 彼 は ホワイト ハウス を 訪問 した
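The pre-ordering step can be sketched as a minimal head-final reordering, assuming a simple rule that moves every head after all of its dependents rather than the specific rule sets of the systems cited above. The `Token` class, the labels, and `head_final` are hypothetical; a real pre-orderer would start from parser output and apply more selective rules.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int   # index of the head token; -1 marks the root
    label: str  # dependency label (kept for readability, unused by the rule)


def head_final(tokens):
    """Reorder a projective dependency tree so each head follows its dependents."""
    children = {i: [] for i in range(len(tokens))}
    root = None
    for i, tok in enumerate(tokens):
        if tok.head < 0:
            root = i
        else:
            children[tok.head].append(i)  # appended in source order

    def visit(i):
        out = []
        for child in children[i]:   # dependents keep their relative source order
            out.extend(visit(child))
        out.append(tokens[i].word)  # the head is emitted last
        return out

    return visit(root)


sentence = [
    Token("he", 1, "subj"),
    Token("visited", -1, "root"),
    Token("the", 4, "det"),
    Token("white", 4, "adj"),
    Token("house", 1, "obj"),
]
print(" ".join(head_final(sentence)))  # he the white house visited
```

On the example, the rule turns subj v obj into subj obj v, which is closer to the Japanese word order that the downstream phrase-based system has to produce.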

SLIDE 6

Recent Usage in English↔Japanese

  • Dependency-to-dependency translation is used by Kyoto U [Nakazawa+ 06] and by rule-based systems

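[Figure: dependency trees for "he visited the white house" (nsubj, dobj, det) and "彼 は ホワイト ハウス を 訪問 した" (subj, dobj); example rule: 訪問 した with dependents X1 and X2 → "X1 visited X2", where X1 and X2 attach as nsubj and dobj]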

SLIDE 7

Recent Usage in English↔Japanese

  • String-to-tree models [Yamada+ 01] were used by NTT in the NTCIR task [Sudoh+ 11]

SLIDE 8

Recent Usage in English↔Japanese

[Figure: the string / phrase-structure tree / dependency grid for "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した", marking where the approaches above operate: (H)PBMT, pre-ordering, D2D, and S2T]

SLIDE 9

What about Tree-driven Models?!

[Figure: the string / phrase-structure tree / dependency grid for "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した", highlighting the tree-driven models T2S (tree-to-string) and D2S (dependency-to-string)]

SLIDE 10

Tree-to-String Models [Liu+ 06]

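Input tree: (VP0-5 (PP0-1 (N0 友達) (P1 と)) (VP2-5 (PP2-3 (N2 ご飯) (P3 を)) (VP4-5 (V4 食べ) (SUF5 た))))
Rules: VP0-5(PP0-1(x0:N0 と) x1:VP2-5) → x1 with x0; VP2-5(PP2-3(x0:N2 を) x1:VP4-5) → x1 x0; N0(友達) → a friend; N2(ご飯) → a meal; VP4-5(食べ た) → ate
Output: ate a meal with a friend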
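This derivation can be sketched as a toy tree-to-string translator. It is a minimal sketch, assuming hand-written rules and greedy first-match application from the top of the tree down; a real decoder extracts rules automatically and searches over competing derivations with a model score. `Var`, `match`, `translate`, and the rule list are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str   # target-side variable, e.g. "x0"
    label: str  # category the matched subtree must have, e.g. "N"


def match(pattern, tree, bindings):
    """Match a rule's source-side tree pattern against a parse tree, binding variables."""
    if isinstance(pattern, Var):
        if isinstance(tree, tuple) and tree[0] == pattern.label:
            bindings[pattern.name] = tree
            return True
        return False
    if isinstance(pattern, str):  # a terminal word must match exactly
        return pattern == tree
    if not isinstance(tree, tuple) or len(pattern) != len(tree) or pattern[0] != tree[0]:
        return False
    return all(match(p, t, bindings) for p, t in zip(pattern[1:], tree[1:]))


def translate(tree, rules):
    """Apply the first matching rule and recursively translate its variables."""
    for pattern, target in rules:
        bindings = {}
        if match(pattern, tree, bindings):
            out = []
            for item in target:
                if item in bindings:
                    out.extend(translate(bindings[item], rules))
                else:
                    out.append(item)
            return out
    raise ValueError(f"no rule matches {tree!r}")


# Parse of 友達 と ご飯 を 食べ た (span indices dropped from the labels).
tree = ("VP",
        ("PP", ("N", "友達"), ("P", "と")),
        ("VP",
         ("PP", ("N", "ご飯"), ("P", "を")),
         ("VP", ("V", "食べ"), ("SUF", "た"))))

rules = [
    (("VP", ("PP", Var("x0", "N"), ("P", "と")), Var("x1", "VP")), ["x1", "with", "x0"]),
    (("VP", ("PP", Var("x0", "N"), ("P", "を")), Var("x1", "VP")), ["x1", "x0"]),
    (("VP", ("V", "食べ"), ("SUF", "た")), ["ate"]),
    (("N", "友達"), ["a", "friend"]),
    (("N", "ご飯"), ["a", "meal"]),
]

print(" ".join(translate(tree, rules)))  # ate a meal with a friend
```

The reordering comes entirely from the target sides of the rules ("x1 with x0", "x1 x0"), which is why the slides credit tree-driven models with using syntactic structure for reordering.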

SLIDE 11

Dependency-to-String Models [Quirk+ 05]

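[Figure: "he visited the white house" with dependencies nsubj, dobj, det, aligned to "彼 は ホワイト ハウス を 訪問 した"; example rule: 訪問 した with dependents X1 and X2 → "X1 visited X2", where X1 and X2 fill the nsubj and dobj positions]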
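A dependency-to-string rule can be sketched in the same toy style, assuming each rule is keyed by a head unit (here a whole bunsetsu such as 訪問 した) plus the labels of its dependents, with X1, X2, ... standing for the dependents' translations in source order. The `DepNode` class, the labels, and the rule table are hypothetical and hand-written; treelet systems such as [Quirk+ 05] learn their rules from aligned data.

```python
from dataclasses import dataclass, field


@dataclass
class DepNode:
    words: str                                    # head unit, e.g. a bunsetsu
    children: list = field(default_factory=list)  # (label, DepNode) pairs in source order


# Hypothetical hand-written rules: (head words, dependent labels) -> target template.
RULES = {
    ("訪問 した", ("nsubj", "dobj")): ["X1", "visited", "X2"],
    ("彼 は", ()): ["he"],
    ("ホワイト ハウス を", ()): ["the", "white", "house"],
}


def is_slot(item):
    return item.startswith("X") and item[1:].isdigit()


def translate(node):
    labels = tuple(label for label, _ in node.children)
    template = RULES[(node.words, labels)]  # KeyError if no rule covers this treelet
    out = []
    for item in template:
        if is_slot(item):
            _, child = node.children[int(item[1:]) - 1]  # X1 -> first dependent
            out.extend(translate(child))
        else:
            out.append(item)
    return out


tree = DepNode("訪問 した", [
    ("nsubj", DepNode("彼 は")),
    ("dobj", DepNode("ホワイト ハウス を")),
])
print(" ".join(translate(tree)))  # he visited the white house
```

Because the rule is anchored on the head and names its dependents directly, the verb and its arguments are adjacent in the rule even when they are far apart in the surface string.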

SLIDE 12

T2S/D2S vs Phrase Based

  • + Better reordering through use of syntactic structure
  • + Very fast! (especially compared to HPBMT)
  • + Better lexical choice because long-range context is considered (especially D2S)

  • - Requires a parser
  • - Sensitive to parse errors
SLIDE 13

T2S/D2S vs Pre-ordering

  • + T2S/D2S jointly searches for reordering and translation
  • + T2S/D2S can easily handle lexicalized reordering
  • - Pre-ordering can find translation rules that overlap constituent boundaries

[Figure: "X が 好き" → "likes X" and "X が 高い" → "X is high", each drawn over a PP VP structure]

SLIDE 14

T2S vs. D2S

  • T2S: Can handle de-lexicalized rules = more general?

S(X1:NP VP(X2:VBD X3:NP)) → X1 X3 X2 (SVO → SOV)

  • D2S: Dependent words are close → good for lexical choice?

run (dobj: a program) vs. run (dobj: a marathon)

SLIDE 15

Experiments and Summary

SLIDE 16

Question: How well do modern statistical tree-to-string methods work for English↔Japanese translation?

SLIDE 17

Previous Research

  • Three examples for En→Ja?
  • [Quirk+ 06] Uses dependency treelet translation and shows improvement over PBMT
  • [Wu+ 10] Uses HPSG input and shows improvement over Joshua (HPBMT)
  • [DeNero+ 11] Shows forest-to-string does slightly better than syntactic pre-ordering in terms of BLEU
  • One example for Ja→En?
  • [Menezes+ 05] Uses dependency treelet translation, no direct comparison to other methods

SLIDE 18

Experimental Setup

  • System: In-house forest-to-string decoder “travatar”
  • Forest-to-string translation [Mi+ 08] with tree transducers
  • Alignment: GIZA++, rule extraction: GHKM, tuning: MERT
  • Data: Kyoto Free Translation Task (KFTT [Neubig 11]), ~350k sentences of Wikipedia data for training
  • Baseline: Moses PBMT, PBMT + Preordering [Neubig+ 12]

  • Evaluation: BLEU, RIBES, Acceptability (0-5)
SLIDE 19

Tree-to-String Settings (Explained in Detail Later)

  • Language Analysis:
  • En Parser: Stanford, Berkeley, Egret (Tree, Forest)
  • Ja: Juman+KNP, MeCab+Cabocha, KyTea+EDA
  • Composed Rules: 1, 2, 3, 4
  • Non-terminals: 1, 2, 3
  • Binarization: Left, Right
  • Null Attachment: Top, Exhaustive (1, 2)
  • Tuning: BLEU, RIBES, (BLEU+RIBES)/2
SLIDE 20

Summary (En-Ja)

[Chart: BLEU, RIBES, and Acceptability for PBMT, PBMT+Pre, T2S, and F2S]

SLIDE 21

Summary (Ja-En)

[Chart: BLEU, RIBES, and Acceptability for PBMT, PBMT+Pre, and T2S]

SLIDE 22

En-Ja F2S vs. PBMT+Pre

Input: Department of Sociology in Faculty of Letters opened .
PBMT+Pre: 開業 年 文学 部 社会 学科 。
F2S: 文学 部 社会 学 科 を 開設 。
F2S properly interprets the noun phrase + verb.

SLIDE 23

En-Ja F2S vs. PBMT+Pre

Input: Afterwards it was reconstructed but its influence declined .
PBMT+Pre: その 後 衰退 し た が 、 その 影響 を 受け て 再建 さ れ た もの で あ る 。
F2S: その 後 再建 さ れ て い た が 、 影響 力 は 衰え た 。
F2S properly reconstructs the relationship between the two verb phrases.

SLIDE 24

En-Ja F2S vs. PBMT+Pre

Input: Introduction of KANSAI THRU PASS Miyako Card
PBMT+Pre: スルッと kansai 都 カード の 導入
F2S: 伝来 スルッと KANSAI 都 カード
Parsing error: (NP (NP Introduction) (PP of KANSAI THRU PASS) (NP Miyako) (NP Card))

SLIDE 25

Ja-En T2S vs. PBMT+Pre

Input: 史実 に は 直接 の 関係 は な い 。
PBMT+Pre: in the historical fact is not directly related to it .
T2S: is not directly related to the historical facts .
T2S properly translates “… に は 関係 が” as “related to”.

SLIDE 26

Ja-En T2S vs. PBMT+Pre

Input: 九条 道家 は 嫡男 ・ 九条 教実 に 先立 た れ 、 次男 ・ 二 条 良実 は 事実 上 の 勘当 状態 に あ っ た 。
PBMT+Pre: michiie kujo was his eldest son and heir , norizane kujo , and his second son , yoshizane nijo was disinherited .
T2S: michiie kujo to his legitimate son kujo norizane died before him , and the second son , nijo yoshizane was virtually disowned .
T2S gives a much better division between clauses.

SLIDE 27

Ja-En T2S vs. PBMT+Pre

Input: 日本 語 日本 文学 科 1474 年 ~ 1478 年 - 山名 政 豊
PBMT+Pre: the department of japanese language and literature in 1474 to 1478 - masatoyo yamana
T2S: japanese language and literature masatoyo yamana 1474 shokoku-ji in -

Errors are due to more restrictive rule extraction (first example) and parse errors (second example; “Yamana” is a single noun phrase)

SLIDE 28

Effect of Language Analysis

SLIDE 29

Question: How much do the language analysis tools used affect translation?

SLIDE 30

Language Analysis (En-Ja):

  • Which parser provides better translations?
  • Stanford Parser, Berkeley Parser, Egret (a clone of the Berkeley parser that can output forests)

[Chart: RIBES and BLEU for PBMT, PBMT+Pre, Stanford, Berkeley, Egret, and Egret+F2S]

SLIDE 31

Language Analysis (Ja-En):

  • 3 morphological/dependency analysis combinations
  • Use head rules to convert dependencies into CFG trees
  • For bunsetsu-based parsers, the last content word is the head
  • Punctuation dependencies are reversed

                Juman+KNP    MeCab+CaboCha        KyTea+EDA
Segmentation    Long         Medium               Short
OOV             Simple       Simple               Model
Parsing unit    Bunsetsu     Bunsetsu             Word
Algorithm       CKY-style    Cascaded chunking    MST
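The head-rule conversion can be sketched for word-level dependencies (the KyTea+EDA setting): each word with dependents projects one phrase whose children are its dependents' subtrees plus the word itself, in surface order. This minimal sketch assumes a projective tree and a hypothetical head-POS-to-phrase-label table; it does not implement the bunsetsu head selection or the punctuation reversal listed above, and the example dependencies are illustrative rather than real parser output.

```python
# Hypothetical table mapping a head's POS tag to the label of the phrase it projects.
PHRASE = {"N": "NP", "P": "PP", "V": "VP"}


def to_cfg(words, tags, heads):
    """Convert a projective word-level dependency tree to a bracketed CFG tree."""
    children = [[] for _ in words]
    root = None
    for i, h in enumerate(heads):
        if h < 0:
            root = i
        else:
            children[h].append(i)

    def build(i):
        leaf = f"({tags[i]} {words[i]})"
        if not children[i]:
            return leaf  # a word without dependents stays a preterminal
        # For a projective tree, sorting by index keeps the surface word order.
        parts = [leaf if j == i else build(j) for j in sorted(children[i] + [i])]
        label = PHRASE.get(tags[i], tags[i] + "P")
        return f"({label} {' '.join(parts)})"

    return build(root)


words = ["彼", "は", "ホワイト", "ハウス", "を", "訪問", "した"]
tags = ["N", "P", "N", "N", "P", "N", "V"]
heads = [1, 6, 3, 4, 6, 6, -1]  # illustrative word-level dependencies
print(to_cfg(words, tags, heads))
# (VP (PP (N 彼) (P は)) (PP (NP (N ホワイト) (N ハウス)) (P を)) (N 訪問) (V した))
```

The resulting trees are flat (one phrase per head), so in practice they would still be binarized before rule extraction.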

SLIDE 32

Language Analysis (Ja-En):

[Chart: RIBES, BLEU, and Acceptability for PBMT, PBMT+Pre, Juman+KNP, MeCab+CaboCha, and KyTea+EDA]

SLIDE 33

EDA vs. KNP/CaboCha

Input: 向嶽寺派 祇園女御妹-後に平忠盛妻
MeCab+CaboCha: 向嶽寺 school 祇園女御 younger sister : later became the wife of taira no tadamori
KyTea+EDA: kogaku-ji temple school gion no nyogo younger sister - , later taira no tadamori 's wife
Smaller, more accurate segmentation provides better translations (EDA)

SLIDE 34

EDA vs. CaboCha/KNP

Input: 大宮学舎旧守衛所 文学部社会学科を設置
MeCab+CaboCha: former omiya campus . office department of faculty of letters society was established .
KyTea+EDA: omiya campus former guard office department of sociology , faculty of letters was established .
Word-based noun-phrase parsing helps translation (EDA)

SLIDE 35

EDA vs. CaboCha/KNP

Input: 芳崖と雅邦はともに地方の狩野派系絵師の家の出身であった。
MeCab+CaboCha: hogai and gaho both was from a family of local painters of the kano school .
KyTea+EDA: hogai and gaho from the family of the region of the kano together school series painter .
CaboCha/KNP wins followed no clear pattern. In this case, CaboCha attaches ともに → 出身, while EDA attaches ともに → 地方.

SLIDE 36

CaboCha vs. KNP

Input: 谷万太郎 1391年-山名氏清 1392年~1394年-畠山基国
JUMAN/KNP: taro million tani in 1391 , - the yamana clan - in 1392 - 1394 hatakeyama ) province
MeCab+CaboCha: mantaro tani 1391 , : ujikiyo yamana 1392 1394 : motokuni hatakeyama
The most prominent wins for CaboCha were in segmentation

SLIDE 37

Conclusion

  • Egret is best for English, and forests are important.
  • KyTea+EDA is best for Japanese
  • At the moment, morphological analysis is more important than parsing?

  • Future directions:
  • Forest-based parser!
  • Better bunsetsu→word dependency conversion rules
SLIDE 38

Other Settings

SLIDE 39

Question: What other settings have a significant effect on translation results?

SLIDE 40

Composed Rules

  • Combine two minimal rules into larger rules:

Minimal rules: VP2-5(PP2-3(x0:N2 を) x1:VP4-5) → x1 x0 and VP4-5(食べ た) → ate
Composed rule: VP2-5(PP2-3(x0:N2 を) VP4-5(食べ た)) → ate x0
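Rule composition can be sketched as substituting one rule into a non-terminal of another, on both the source tree pattern and the target string. This minimal sketch assumes rules are (tree pattern, target list) pairs with `Var` leaves for non-terminals, and that the two rules use distinct variable names (a real extractor would rename them). `Var`, `substitute`, `compose`, and `count_nonterminals` are hypothetical names; the last one counts the variables left in a rule, the quantity varied in the non-terminal experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str   # target-side variable name, e.g. "x0"
    label: str  # category the substituted subtree must have


def substitute(pattern, name, subtree):
    """Replace the variable `name` in a source-side tree pattern with `subtree`."""
    if isinstance(pattern, Var):
        if pattern.name != name:
            return pattern
        if subtree[0] != pattern.label:
            raise ValueError(f"{name} expects {pattern.label}, got {subtree[0]}")
        return subtree
    if isinstance(pattern, str):
        return pattern
    return (pattern[0],) + tuple(substitute(p, name, subtree) for p in pattern[1:])


def compose(outer, name, inner):
    """Plug rule `inner` into non-terminal `name` of rule `outer`."""
    (outer_src, outer_tgt), (inner_src, inner_tgt) = outer, inner
    src = substitute(outer_src, name, inner_src)
    tgt = []
    for item in outer_tgt:
        tgt.extend(inner_tgt if item == name else [item])
    return src, tgt


def count_nonterminals(pattern):
    if isinstance(pattern, Var):
        return 1
    if isinstance(pattern, str):
        return 0
    return sum(count_nonterminals(p) for p in pattern[1:])


# Minimal rules from the example (span indices dropped from the labels).
vp_rule = (("VP", ("PP", Var("x0", "N"), ("P", "を")), Var("x1", "VP")), ["x1", "x0"])
ate_rule = (("VP", ("V", "食べ"), ("SUF", "た")), ["ate"])

src, tgt = compose(vp_rule, "x1", ate_rule)
print(tgt)                      # ['ate', 'x0']
print(count_nonterminals(src))  # 1
```

Composing in this way trades generality for context: the composed rule only fires when 食べ た is present, but it no longer relies on the decoder to combine the two pieces correctly.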

SLIDE 41

Composed Rules (En-Ja)

  • Composed rules are very important

[Chart: BLEU and RIBES for PBMT, PBMT+Pre, and Comp 1 to Comp 4]

SLIDE 42

Number of Non-Terminals

0 NT: VP4-5(食べ た) → ate
1 NT: VP2-5(PP2-3(x0:N2 を) VP4-5(食べ た)) → ate x0
2 NT: VP2-5(PP2-3(x0:N2 を) x1:VP4-5) → x1 x0

SLIDE 43

Number of Non-Terminals (En-Ja)

  • 2 non-terminals are necessary, but more are harmful
  • Why? Are larger rules noisier?

[Chart: BLEU and RIBES for PBMT, PBMT+Pre, and NT 1 to NT 4]

SLIDE 44

Binarization (En-Ja)

None: (NP the White House)
Right: (NP the (NP' White House))
Left: (NP (NP' the White) House)
Target: ホワイト ハウス

  • Right or left binarization is much better than none
  • In general, right > left for En-Ja and left > right for Ja-En
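Left and right binarization of a flat constituent can be sketched as follows, assuming intermediate nodes are labeled with a prime (NP') as in the example trees; `binarize` is a hypothetical helper, and children may be words or already-built subtrees.

```python
def binarize(label, children, direction="right"):
    """Binarize a flat constituent, introducing intermediate nodes labeled LABEL'."""
    if len(children) <= 2:
        return (label, *children)
    bar = label if label.endswith("'") else label + "'"
    if direction == "right":
        return (label, children[0], binarize(bar, children[1:], direction))
    if direction == "left":
        return (label, binarize(bar, children[:-1], direction), children[-1])
    raise ValueError(f"unknown direction: {direction}")


flat_np = ["the", "White", "House"]
print(binarize("NP", flat_np, "right"))  # ('NP', 'the', ("NP'", 'White', 'House'))
print(binarize("NP", flat_np, "left"))   # ('NP', ("NP'", 'the', 'White'), 'House')
```

With right binarization, "White House" becomes its own constituent (NP'), so a tree-to-string rule can map it to ホワイト ハウス without also covering "the"; the unbinarized tree offers no such node.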

SLIDE 45

Tuning

  • Two evaluation measures:
  • BLEU correlated with fluency
  • RIBES correlated with adequacy
  • Tune for each of these measures with MERT
  • Considering both may also be worthwhile [Duh+ 12], so we also tune for the linear combination (BLEU+RIBES)/2
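RIBES is built around word-order agreement: normalized Kendall's tau, (τ + 1)/2, is the fraction of hypothesis word pairs whose order matches that of their aligned reference words. The sketch below assumes the hypothesis-reference word alignment is already given as reference positions in hypothesis order, with no ties, and it omits the unigram-precision and brevity-penalty factors that full RIBES multiplies in. It also shows the (BLEU+RIBES)/2 tuning target, assuming both scores are on the same 0-100 scale. Function names are hypothetical.

```python
from itertools import combinations


def normalized_kendall_tau(ref_positions):
    """(tau + 1) / 2, i.e. the fraction of word pairs whose order matches the reference."""
    if len(ref_positions) < 2:
        raise ValueError("need at least two aligned words")
    pairs = list(combinations(ref_positions, 2))  # pairs taken in hypothesis order
    in_order = sum(1 for a, b in pairs if a < b)
    return in_order / len(pairs)


def bleu_ribes(bleu, ribes):
    """Linear combination used as a tuning objective: (BLEU + RIBES) / 2."""
    return (bleu + ribes) / 2


# Reference: "he visited the white house" (positions 0..4).
# Hypothesis "he the white house visited" aligns to these reference positions:
print(normalized_kendall_tau([0, 2, 3, 4, 1]))  # 0.7
```

Moving a single verb to the end already costs 3 of the 10 word pairs, which illustrates why RIBES is sensitive to the long-distance reordering that separates English and Japanese.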

SLIDE 46

Tuning

[Charts: BLEU and RIBES after tuning for BLEU, RIBES, and BLEU+RIBES, for En-Ja and Ja-En]

SLIDE 47

Conclusion

SLIDE 48

Insights

  • How well does tree-to-string work for En-Ja, Ja-En?
  • As well as phrase-based with pre-ordering [Neubig+ 12]
  • Forest-to-string translation works better for En-Ja
  • Egret worked best for English-Japanese; KyTea+EDA worked best for Japanese-English

  • For Ja-En we need:
  • Better morphological analysis!
  • Pass multiple morphological analysis results to parsing!
  • n-best or forest based parser!
SLIDE 49

Thank You!