Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig, Taro Watanabe, Shinsuke Mori


  1. Title: Inducing a Discriminative Parser to Optimize Machine Translation Reordering (Graham Neubig, Taro Watanabe, Shinsuke Mori)

  2. Preordering
     ● Long-distance reordering is a weak point of SMT
     ● Preordering first reorders the source, then translates:
       F = kare wa gohan o tabeta → F' = kare wa tabeta gohan o → E = he ate rice
     ● A good preorderer will effectively find F' given F

  3. Syntactic Preordering
     ● Define reordering rules over a syntactic parse of the source
       [figure: a parse tree D over F = kare wa gohan o tabeta is transformed
       into D', yielding F' = kare wa tabeta gohan o, matching E = he ate rice]
     ● What if we don't have a parser in the source language?

  4. Bracketing Transduction Grammars [Wu 97]
     ● Binary CFGs with only straight (S) and inverted (I) non-terminals, and
       pre-terminals (T)
     ● Language independent
     ● A BTG tree uniquely defines a reordering
       [figure: a derivation D over F = kare wa gohan o tabeta yields
       F' = kare wa tabeta gohan o]
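The claim that a BTG tree uniquely defines a reordering can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tuple-based tree encoding and the `reorder` function are assumptions for the example, using the slide's sentence.

```python
# A BTG tree uniquely defines a reordering: straight ("S") nodes keep the
# order of their children, inverted ("I") nodes swap them, and pre-terminals
# ("T") cover a span of source words. Tree encoding here is illustrative.

def reorder(node, words):
    """Return the words in the order the BTG derivation prescribes."""
    label = node[0]
    if label == "T":                  # pre-terminal: a span of word indices
        lo, hi = node[1]
        return words[lo:hi]
    left, right = node[1], node[2]
    if label == "S":                  # straight: left child, then right child
        return reorder(left, words) + reorder(right, words)
    return reorder(right, words) + reorder(left, words)   # "I": inverted

words = ["kare", "wa", "gohan", "o", "tabeta"]
# S( S(T[kare], T[wa]), I(T[gohan o], T[tabeta]) )
tree = ("S",
        ("S", ("T", (0, 1)), ("T", (1, 2))),
        ("I", ("T", (2, 4)), ("T", (4, 5))))
print(" ".join(reorder(tree, words)))   # kare wa tabeta gohan o
```

Note that the inverted node over "gohan o tabeta" is what moves the verb before its object, reproducing the English-like order.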

  5. 3-Step BTG Grammar Training for Reordering [DeNero+ 11]
     Training, given F = kare wa gohan o tabeta, E = he ate rice, and an alignment A:
     1) Unsupervised bilingual grammar induction (several hand-tuned features)
        produces BTG trees over the training data
     2) Supervised parser training (max label accuracy) yields a parsing model
     3) Supervised reorderer training (max tree accuracy) yields a reordering model

  6. 3-Step BTG Grammar Induction for Reordering [DeNero+ 11]
     Testing, given F = kare wa gohan o tabeta:
     1) Parsing with the parsing model produces an unlabeled tree
     2) Reordering with the reordering model labels the tree (S/I) and outputs
        F' = kare wa tabeta gohan o

  7. Our Work: Inducing a Parser to Optimize Reordering
     ● What if we can reduce the three steps to one, and directly maximize
       reordering accuracy?
     ● Training: supervised learning (max reordering accuracy) from
       F = kare wa gohan o tabeta, E = he ate rice, and an alignment A yields
       a single parsing/reordering model
     ● Testing: the model parses F and directly outputs F' = kare wa tabeta gohan o

  8. Optimization Framework

  9. Optimization Framework
     ● Input: source sentence F (kare wa gohan o tabeta)
     ● Output: reordered source sentence F' (kare wa tabeta gohan o)
     ● Latent: bracketing transduction grammar derivation D

  10. Scores and Losses
      ● Define a score over source sentences and derivations:
        S(F, D; w) = Σ_i w_i · φ_i(F, D)
      ● Optimization finds the weight vector that minimizes the loss against
        the reference reorderings F'* over the training data:
        ŵ = argmin_w Σ_⟨F, F'*⟩ L(F'*, F'(argmax_D S(F, D; w)))
        where F'(D) is the reordering of F produced by derivation D
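The score S(F, D; w) is just a dot product between a sparse feature vector and the weights. A minimal sketch, with illustrative feature names (the actual feature templates appear on later slides):

```python
# Sketch of the linear score S(F, D; w) = sum_i w_i * phi_i(F, D), using
# sparse dicts for both the feature vector and the weights. The feature
# names below are made up for illustration.

def score(features, weights):
    """Dot product of a sparse feature vector with the weight vector."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

phi = {"left=kare": 1.0, "label=I": 1.0, "balance=right": 1.0}
w = {"left=kare": 0.5, "label=I": -0.25}   # unseen features score 0
print(score(phi, w))   # 0.25
```

Because the features factor over tree nodes (a later slide), the same dot product can be evaluated span by span inside dynamic-programming search.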

  11. Note: Latent Variable Ambiguity
      ● Many different BTG trees produce the same reordering
        [figure: two distinct derivations over kare wa gohan o tabeta both
        yield kare wa tabeta gohan o]
      ● Out of these, we want easy-to-reproduce trees
      ● [DeNero+ 11] finds trees with a bilingual parsing model
      ● Our model discovers trees during training

  12. Training: Latent Online Learning
      ● Find the model parse of maximal score
        D̄ = argmax_D S(F, D)
        and the oracle parse of maximal score among the parses of minimal loss
        D̂ = argmax_{D̃ ∈ argmin_D L(F, D)} S(F, D̃)
      ● Adjust the weights (example: perceptron)
        w ← w + φ(F, D̂) − φ(F, D̄)
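The perceptron update above moves the weights toward the oracle parse's features and away from the model parse's features. A minimal sketch with sparse vectors; the feature names are illustrative, and finding D̂ and D̄ (via parsing) is assumed to have happened already:

```python
# Sketch of the latent perceptron update w <- w + phi(F, D_oracle) - phi(F, D_model).
# Feature vectors are sparse dicts; shared features cancel out, so only the
# features on which the two parses disagree actually move.

def perceptron_update(w, phi_oracle, phi_model):
    """Update weights toward the oracle parse, away from the model parse."""
    for name, value in phi_oracle.items():
        w[name] = w.get(name, 0.0) + value
    for name, value in phi_model.items():
        w[name] = w.get(name, 0.0) - value
    return w

w = {}
phi_oracle = {"label=I": 1.0, "left=gohan": 1.0}   # illustrative features
phi_model = {"label=S": 1.0, "left=gohan": 1.0}
perceptron_update(w, phi_oracle, phi_model)
```

After this update the disputed label feature ("I" vs. "S") has moved, while the shared "left=gohan" feature nets out to zero.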

  13. Considering Loss in Online Learning
      ● Consider the loss (how bad is the mistake?): against the reference
        kare wa tabeta gohan o,
        kare wa gohan tabeta o has L = 1, while
        o gohan tabeta wa kare has L = 8
      ● During training, make high-loss trees easy to choose, so that the
        model must learn a large penalty to avoid them (loss-augmented search):
        D̄ = argmax_D S(F, D) + L(F, D)

  14. Parser

  15. Parsing Setup: Standard Discriminative Parser
      ● Features are independent with respect to each node
      ● Parsing and reordering are possible in O(n³) with CKY
      ● Multi-word pre-terminals are allowed (e.g. a single T may cover
        gohan o in kare wa gohan o tabeta)
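Because the features factor over nodes, the best derivation can be found by a CKY-style dynamic program over spans. The sketch below is an assumption-laden illustration (the `span_score` callback and the toy scorer are invented for the example; the paper's real scorer uses the feature templates from the following slides):

```python
# Minimal CKY sketch for BTG parsing in O(n^3): each span keeps its best
# derivation, built either as a pre-terminal ("T", possibly multi-word) or
# by combining two sub-spans with a straight ("S") or inverted ("I") node.

def cky_btg(n, span_score):
    """Return (score, tree) of the best-scoring BTG derivation over n words."""
    best = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # option 1: a (possibly multi-word) pre-terminal over the span
            best[i, j] = (span_score("T", i, j), ("T", (i, j)))
            # option 2: split at k and combine, straight or inverted
            for k in range(i + 1, j):
                for label in ("S", "I"):
                    s = (span_score(label, i, j)
                         + best[i, k][0] + best[k, j][0])
                    if s > best[i, j][0]:
                        best[i, j] = (s, (label, best[i, k][1], best[k, j][1]))
    return best[0, n]

def toy_score(label, i, j):
    # illustrative scorer, not the paper's features
    if label == "T":
        return 0.0 if j - i == 1 else -1.0   # prefer single-word pre-terminals
    return 1.0 if label == "I" else 0.0      # toy preference for inversion

best_score, best_tree = cky_btg(2, toy_score)
print(best_tree)   # ('I', ('T', (0, 1)), ('T', (1, 2)))
```

The three nested loops over span length, start position, and split point give the O(n³) bound stated on the slide.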

  16. Language Independent Features
      ● No linguistic analysis necessary
      ● Lexical: left, right, inside, outside, and boundary words of a span
      ● Class: same as lexical, but over induced word classes
      ● Phrase Table: whether the span exists in the phrase table
      ● Balance: left branching or right branching?

  17. Language Dependent Features
      ● POS Features: same as lexical, but over POS tags
        (e.g. PRN wa N o V for kare wa gohan o tabeta)
      ● CFG Features: whether nodes match a supervised parser's spans

  18. Reordering Losses

  19. Reordering Losses [Talbot+ 11]: Chunk Fragmentation
      ● How many chunks are necessary to reproduce the reference?
        System reordering:    <s> kare wa gohan o tabeta </s>
        Reference reordering: <s> kare wa tabeta gohan o </s>
      ● Loss:     L_chunk(F, D̃) = (number of chunks) − 1
      ● Accuracy: A_chunk(F, D̃) = 1 − ((number of chunks) − 1) / (J + 1)
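The chunk count can be computed by mapping each word of the system reordering to its reference position (with sentence-boundary markers) and counting the places where adjacent words are not adjacent in the reference. A minimal sketch; the function name is ours, and it assumes each word occurs once, as in the slide's example:

```python
# Sketch of the chunk fragmentation loss of [Talbot+ 11]: the number of
# contiguous chunks of the system reordering found in the reference, minus
# one. Boundary markers <s> and </s> are included, as on the slide.
# Assumes every word occurs exactly once in the sentence.

def chunk_loss(system, reference):
    """Return (loss, accuracy) where loss = chunks - 1, acc = 1 - loss/(J+1)."""
    sys_toks = ["<s>"] + system + ["</s>"]
    ref_pos = {w: i for i, w in enumerate(["<s>"] + reference + ["</s>"])}
    # a chunk boundary falls between adjacent system words that are not
    # adjacent (in order) in the reference
    breaks = sum(1 for a, b in zip(sys_toks, sys_toks[1:])
                 if ref_pos[b] != ref_pos[a] + 1)
    return breaks, 1.0 - breaks / (len(system) + 1)

system = "kare wa gohan o tabeta".split()
reference = "kare wa tabeta gohan o".split()
print(chunk_loss(system, reference))   # (3, 0.5)
```

Here the system output splits into the chunks [<s> kare wa] [gohan o] [tabeta] [</s>], i.e. 4 chunks and a loss of 3 out of J + 1 = 6 possible breaks.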

  20. Reordering Losses [Talbot+ 11]: Kendall's Tau
      ● How many pairs of words are in reversed order?
        System reordering:    kare wa gohan o tabeta
        Reference reordering: kare wa tabeta gohan o
      ● Loss:     L_tau(F, D̃) = number of reversed word pairs
      ● Accuracy: A_tau(F, D̃) = 1 − (reversed word pairs) / (potential reversed pairs)
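The tau loss counts word pairs whose relative order the system got wrong. A minimal sketch under the same assumption as before (each word occurs once; the function name is ours):

```python
# Sketch of the Kendall's tau loss of [Talbot+ 11]: the number of word
# pairs whose order in the system reordering is reversed relative to the
# reference, out of all n*(n-1)/2 pairs. Assumes each word occurs once.

def tau_loss(system, reference):
    """Return (loss, accuracy) over all word pairs."""
    ref_pos = {w: i for i, w in enumerate(reference)}
    order = [ref_pos[w] for w in system]
    reversed_pairs = sum(1 for i in range(len(order))
                         for j in range(i + 1, len(order))
                         if order[i] > order[j])
    total = len(order) * (len(order) - 1) // 2
    return reversed_pairs, 1.0 - reversed_pairs / total

system = "kare wa gohan o tabeta".split()
reference = "kare wa tabeta gohan o".split()
print(tau_loss(system, reference))
```

On the slide's example only the pairs (gohan, tabeta) and (o, tabeta) are reversed, giving a loss of 2 out of 10 potential pairs. Unlike chunk fragmentation, this loss grows with how far words are out of place, not just with how many breaks occur.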

  21. Calculating Loss by Node
      ● For large-margin training we must calculate the loss efficiently
        inside the loss-augmented search:
        D̄ = argmax_D S(F, D) + L(F, D)
      ● Both losses can be factored by node as well: each node's contribution
        decomposes into terms for its left span, right span, and the boundary
        between them (L_left, L_right, L_between; details in the paper)

  22. Experiments

  23. Experimental Setup
      ● English-Japanese and Japanese-English translation
      ● Data from the Kyoto Free Translation Task:

                    sent.   word (ja)   word (en)
        RM-train      602     14.5k       14.3k    (manually aligned)
        RM-test       555     11.2k       10.4k    (manually aligned)
        LM/TM        329k     6.08M       5.91M
        tune         1166     26.8k       24.3k
        test         1160     28.5k       26.7k

  24. Experimental Setup
      ● Reordering model training: 500 iterations, using Pegasos with
        regularization constant 10⁻³; the defaults are chunk fragmentation
        loss and the standard (language independent) features
      ● Translation: Moses with lexicalized reordering
      ● Compare: original order, 3-step training, and the proposed method

  25. Result: Proposed Model Improves Reordering
      ● Results for chunk fragmentation and Kendall's tau accuracy
        [bar charts: Orig vs. 3-Step vs. Proposed, en-ja and ja-en,
        accuracy axis 50–100]

  26. Result: Proposed Model Improves Translation
      ● Results for BLEU and RIBES
        [bar charts: Orig vs. 3-Step vs. Proposed, en-ja and ja-en;
        BLEU axis 15–25, RIBES axis 60–75]

  27. Result: Adding Linguistic Info (Generally) Helps
      ● Results for BLEU and RIBES
        [bar charts: Orig vs. Standard vs. +POS vs. +CFG, en-ja and ja-en;
        BLEU axis 15–25, RIBES axis 60–75]

  28. Result: Training Loss Affects Reordering
      ● The optimized criterion is higher on the test set as well
        [bar charts: Orig vs. Chunk vs. Tau vs. Chunk+Tau for chunk and tau
        accuracy, en-ja and ja-en, accuracy axis 50–100]

  29. Result: Training Loss Affects Translation
      ● Optimizing chunk fragmentation generally gives the best results
        [bar charts: Orig vs. Chunk vs. Tau vs. Chunk+Tau, en-ja and ja-en;
        BLEU axis 15–25, RIBES axis 60–75]
