
Experiments in English↔Japanese Tree-to-String Machine Translation



  1. Experiments in English↔Japanese Tree-to-String Machine Translation. Graham Neubig, Nara Institute of Science and Technology, 10/20/2012

  2. Introduction/Motivation

  3. Translation Models
     [Figure: the example pair "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した" shown at three levels of representation on each side: string, tree (phrase structure), and dependency.]

  4. Recent Usage in English↔Japanese
     ● Phrase-based translation [Koehn+ 03] is still popular
       English: he visited the white house
       Japanese: 彼 は ホワイト ハウス を 訪問 した
     ● Moses used in 25 papers at NLP2012
     ● Also, hierarchical phrase-based translation [Chiang 07] ([Feng+ 11] is one of the few examples)

  5. Recent Usage in English↔Japanese
     ● Pre-ordering [Xia+ 04] is another popular technique
       Source: he visited the white house (dependencies: subj, det, adj, obj)
       Pre-ordering (subj v obj → subj obj v): he the white house visited
       Translation: 彼 は ホワイト ハウス を 訪問 した
     ● First used for Japanese by [Komachi+ 06]?
     ● Used by Google [Xu+ 09], NTT [Isozaki+ 11], others [Nguyen+ 08, Neubig+ 12]
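
The pre-ordering step on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming the parse has already been reduced to a flat list of (role, words) pairs; the function name `preorder_svo_to_sov` and the role labels are hypothetical and do not come from any of the cited systems.

```python
def preorder_svo_to_sov(constituents):
    """Move the verb after its arguments: subj v obj -> subj obj v.

    `constituents` is a list of (role, words) pairs in source order,
    where role is "subj", "v", or "obj" (hypothetical labels).
    """
    verbs = [c for c in constituents if c[0] == "v"]
    others = [c for c in constituents if c[0] != "v"]
    # Subject and object keep their original order; the verb goes last.
    return others + verbs


source = [
    ("subj", ["he"]),
    ("v", ["visited"]),
    ("obj", ["the", "white", "house"]),
]
reordered = preorder_svo_to_sov(source)
words = [w for _, phrase in reordered for w in phrase]
print(" ".join(words))  # he the white house visited
```

The reordered English is then passed to an ordinary phrase-based system, so reordering and translation are decided in two separate steps.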

  6. Recent Usage in English↔Japanese
     ● Dependency-to-dependency used by Kyoto U [Nakazawa+ 06] and rule-based systems
       [Figure: dependency trees for "he visited the white house" and "彼 は ホワイト ハウス を 訪問 した", linked by the rule "X1 visited X2" → "X1 X2 訪問 した".]

  7. Recent Usage in English↔Japanese
     ● String-to-tree models [Yamada+ 01] used by NTT in NTCIR task [Sudoh+ 11]

  8. Recent Usage in English↔Japanese
     [Figure: the string / tree (phrase structure) / dependency diagram for the example sentence pair, labeled with the approaches (H)PBMT, S2T, Pre-ordering, and D2D.]

  9. What about Tree-driven Models?!
     [Figure: the string / tree (phrase structure) / dependency diagram for the example sentence pair, labeled with T2S and D2S.]

  10. Tree-to-String Models [Liu+ 06]
      [Figure: derivation over the parse of "友達 と ご飯 を 食べ た", with nodes VP 0-5, PP 0-1, VP 2-5, PP 2-3, VP 4-5 and leaves N 0, P 1, N 2, P 3, V 4, SUF 5. Rules: "x1 with x0" at VP 0-5 and "x1 x0" at VP 2-5; 友達 → "a friend", ご飯 → "a meal", 食べ た → "ate". Output: "ate a meal with a friend".]
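
The derivation on slide 10 can be reproduced with a toy Python sketch. The rule tables below are written by hand for this one sentence, and keying the reordering rule on the particle (と or を) is an assumption about how the two VP rules on the slide are told apart. A real tree-to-string system extracts its rules from aligned data and searches over many competing derivations, which this sketch does not attempt.

```python
# A tree node is (label, children); a leaf child is a plain source word.
tree = ("VP", [
    ("PP", [("N", ["友達"]), ("P", ["と"])]),
    ("VP", [
        ("PP", [("N", ["ご飯"]), ("P", ["を"])]),
        ("VP", [("V", ["食べ"]), ("SUF", ["た"])]),
    ]),
])

# Hypothetical lexical rules: (label, source words) -> target words.
LEXICAL = {
    ("N", ("友達",)): ["a", "friend"],
    ("N", ("ご飯",)): ["a", "meal"],
    ("VP", ("食べ", "た")): ["ate"],
}

# Hypothetical reordering rules for VP -> PP(x0:N P) x1:VP, chosen by particle.
REORDER = {
    "と": ["x1", "with", "x0"],
    "を": ["x1", "x0"],
}


def leaves(node):
    """Return the source words under a node, left to right."""
    _, children = node
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


def translate(node):
    """Apply one rule at this node and recurse into its variables."""
    label, children = node
    key = (label, tuple(leaves(node)))
    if key in LEXICAL:
        return list(LEXICAL[key])
    pp, vp = children                 # expects VP -> PP VP
    noun, particle = pp[1]            # expects PP -> N P
    template = REORDER[particle[1][0]]
    bound = {"x0": translate(noun), "x1": translate(vp)}
    out = []
    for token in template:
        out.extend(bound.get(token, [token]))
    return out


print(" ".join(translate(tree)))  # ate a meal with a friend
```

Because each rule carries both a source tree fragment and a target string with variables, the reordering ("x1 with x0") and the word choice ("with" for と) are decided together.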

  11. Dependency-to-String Models [Quirk+ 05]
      [Figure: dependency tree for "he visited the white house" (nsubj, dobj, det) with the rule "X1 visited X2" → "X1 X2 訪問 した", giving "彼 は ホワイト ハウス を 訪問 した".]

  12. T2S/D2S vs Phrase Based
      ● + Better reordering through use of syntactic structure
      ● + Very fast! (especially compared to HPBMT)
      ● + Better lexical choice because long-range context considered (especially D2S)
      ● - Requires a parser
      ● - Sensitive to parse errors

  13. T2S/D2S vs Pre-ordering
      ● + T2S/D2S jointly searches for reordering and translation
      ● + T2S/D2S can easily handle lexicalized reordering
        (VP rules: "X が 高い" → "X is high", "X が 好き" → "likes X")
      ● - Pre-ordering can find translation rules that overlap constituent boundaries

  14. T2S vs. D2S
      ● T2S: Can handle de-lexicalized rules = more general?
        S(X1:NP VP(X2:VBD X3:NP)) → X1 X3 X2 (SVO → SOV)
      ● D2S: Dependent words are close → good for lexical choice?
        "run a program" vs. "run a marathon" (dobj of "run")
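
The lexical-choice point for D2S can be illustrated with a small Python sketch: the verb's translation is looked up together with its direct object, backing off to the verb alone. The table entries and the function name `choose_verb` are hypothetical, and the Japanese translations are illustrative choices that do not appear in the slides.

```python
# Hypothetical entries conditioned on (head verb, direct object).
WITH_DEPENDENT = {
    ("run", "program"): "実行する",
    ("run", "marathon"): "走る",
}

# Hypothetical backoff entries conditioned on the head verb alone.
HEAD_ONLY = {"run": "走る"}


def choose_verb(head, dobj=None):
    """Prefer a translation conditioned on the dobj; back off to the head."""
    return WITH_DEPENDENT.get((head, dobj), HEAD_ONLY[head])


print(choose_verb("run", "program"))
print(choose_verb("run", "marathon"))
```

In a dependency tree the verb and its object are adjacent nodes even when other words separate them in the string, which is why a rule covering both is easy to extract.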

  15. Experiments and Summary

  16. Question: How well do modern statistical tree-to-string methods work for English↔Japanese translation?

  17. Previous Research
      ● Three examples for En→Ja?
        ● [Quirk+ 06] Uses dependency treelet translation and shows improvement over PBMT
        ● [Wu+ 10] Uses HPSG input and shows improvement over Joshua (HPBMT)
        ● [DeNero+ 11] Shows forest-to-string does slightly better than syntactic pre-ordering in terms of BLEU
      ● One example for Ja→En?
        ● [Menezes+ 05] Uses dependency treelet translation, no direct comparison to other methods

  18. Experimental Setup
      ● System: In-house forest-to-string decoder “travatar”
        ● Forest-to-string translation [Mi+ 08] with tree transducers
        ● Alignment GIZA++, extraction GHKM, tuning MERT
      ● Data: Kyoto Free Translation Task (KFTT [Neubig 11]), ~350k sentences of Wikipedia data for training
      ● Baseline: Moses PBMT, PBMT + Preordering [Neubig+ 12]
      ● Evaluation: BLEU, RIBES, Acceptability (0-5)

  19. Tree-to-String Settings (Explained in Detail Later)
      ● Language Analysis:
        ● En Parser: Stanford, Berkeley, Egret (Tree, Forest)
        ● Ja: Juman+KNP, MeCab+Cabocha, KyTea+EDA
      ● Composed Rules: 1, 2, 3, 4
      ● Non-terminals: 1, 2, 3
      ● Binarization: Left, Right
      ● Null Attachment: Top, Exhaustive (1, 2)
      ● Tuning: BLEU, RIBES, (BLEU+RIBES)/2
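
The third tuning option is a plain average of the two metrics. A minimal Python sketch, assuming both scores are reported on the same scale; the function name `combined_score` and the example values are hypothetical and are not results from the experiments.

```python
def combined_score(bleu, ribes):
    """Average of BLEU and RIBES, both assumed to share one scale."""
    return (bleu + ribes) / 2.0


# Hypothetical scores for two candidate weight settings during tuning.
candidates = {"weights_a": (0.25, 0.75), "weights_b": (0.5, 0.25)}
best = max(candidates, key=lambda name: combined_score(*candidates[name]))
print(best)  # weights_a
```

Averaging lets the tuner trade n-gram precision (BLEU) against word-order agreement (RIBES) instead of optimizing one at the expense of the other.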

  20. Summary (En-Ja)
      [Figure: three bar charts comparing PBMT, PBMT+Pre, T2S, and F2S on BLEU (axis 18.5 to 21.5), RIBES (axis 62 to 69), and Acceptability (axis 2.2 to 3.2).]

  21. Summary (Ja-En)
      [Figure: three bar charts comparing PBMT, PBMT+Pre, and T2S on BLEU (axis 15.6 to 17), RIBES (axis 62 to 65.5), and Acceptability (axis 2.2 to 3.2).]

  22. En-Ja F2S vs. PBMT+Pre
      Input: Department of Sociology in Faculty of Letters opened .
      PBMT+Pre: 開業 年 文学 部 社会 学科 。
      F2S: 文学 部 社会 学 科 を 開設 。
      Properly interprets noun phrase + verb

  23. En-Ja F2S vs. PBMT+Pre
      Input: Afterwards it was reconstructed but its influence declined .
      PBMT+Pre: その 後 衰退 し た が 、 その 影響 を 受け て 再建 さ れ た もの で あ る 。
      F2S: その 後 再建 さ れ て い た が 、 影響 力 は 衰え た 。
      Properly reconstructs relationship between two verb phrases

  24. En-Ja F2S vs. PBMT+Pre
      Input: Introduction of KANSAI THRU PASS Miyako Card
      PBMT+Pre: スルッと kansai 都 カード の 導入
      F2S: 伝来 スルッと KANSAI 都 カード
      Parsing error: (NP (NP Introduction) (PP of KANSAI THRU PASS) (NP Miyako) (NP Card))

  25. Ja-En T2S vs. PBMT+Pre
      Input: 史実 に は 直接 の 関係 は な い 。
      PBMT+Pre: in the historical fact is not directly related to it .
      T2S: is not directly related to the historical facts .
      Properly translates “に は … 関係 が” as “related to”

  26. Ja-En T2S vs. PBMT+Pre
      Input: 九条 道家 は 嫡男 ・ 九条 教実 に 先立 た れ 、 次男 ・ 二 条 良実 は 事実 上 の 勘当 状態 に あ っ た 。
      PBMT+Pre: michiie kujo was his eldest son and heir , norizane kujo , and his second son , yoshizane nijo was disinherited .
      T2S: michiie kujo to his legitimate son kujo norizane died before him , and the second son , nijo yoshizane was virtually disowned .
      Much better division between clauses
