SLIDE 1

Inducing a Discriminative Parser to Optimize Machine Translation Reordering

Graham Neubig (1,2,3), Taro Watanabe (2), Shinsuke Mori (1)

SLIDE 2

Preordering

  • Long-distance reordering is a weak point of SMT
  • Preordering first reorders, then translates
  • A good preorderer will effectively find F' given F

F  = kare wa gohan o tabeta
F' = kare wa tabeta gohan o
E  = he ate rice
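A preorderer's output can be viewed as a permutation of the source words. A minimal sketch, with an illustrative permutation for the Japanese example:

```python
def preorder(source_words, permutation):
    """Apply a preordering permutation to turn F into F'."""
    return [source_words[i] for i in permutation]

f = "kare wa gohan o tabeta".split()
# Hypothetical permutation putting the Japanese words in English order
f_prime = preorder(f, [0, 1, 4, 2, 3])
# f_prime: ['kare', 'wa', 'tabeta', 'gohan', 'o']
```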
SLIDE 3

Syntactic Preordering

  • Define rules over a syntactic parse of the source

F  = kare wa gohan o tabeta
F' = kare wa tabeta gohan o
E  = he ate rice

[Figure: source-side parse D (PRN, N, V, P, VP, S nodes) and the corresponding reordered tree D']
  • What if we don't have a parser in the source language?
SLIDE 4

Bracketing Transduction Grammars [Wu 97]

  • Binary CFGs with only straight (S) and inverted (I) non-terminals, and pre-terminals (T)
  • Language independent
  • A BTG tree uniquely defines a reordering

F  = kare wa gohan o tabeta
F' = kare wa tabeta gohan o

[Figure: BTG derivations D and D' over five pre-terminals (T), combined by straight (S) and inverted (I) nodes]
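The claim that a BTG tree uniquely defines a reordering can be sketched recursively: a straight (S) node keeps its children's order, an inverted (I) node swaps them. The derivation below is a hypothetical encoding of the running example:

```python
# A BTG node is either a pre-terminal ("T", [words]) or an internal
# node ("S"/"I", left_child, right_child).  "S" keeps the children's
# order; "I" swaps them, so each tree defines exactly one reordering.
def reorder(node):
    if node[0] == "T":
        return list(node[1])
    label, left, right = node
    l, r = reorder(left), reorder(right)
    return l + r if label == "S" else r + l

# Hypothetical derivation for "kare wa gohan o tabeta"
d = ("S",
     ("T", ["kare", "wa"]),
     ("I",
      ("T", ["gohan", "o"]),
      ("T", ["tabeta"])))
f_prime = reorder(d)
# f_prime: ['kare', 'wa', 'tabeta', 'gohan', 'o']
```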

SLIDE 5

3-Step BTG Grammar Training for Reordering [DeNero+ 11]

Training:

  F = kare wa gohan o tabeta    E = he ate rice    A = word alignment

  1) Bilingual grammar induction: unsupervised induction (several hand-tuned features)
  2) Parser training: supervised training (max tree accuracy) → parsing model
  3) Reorderer training: supervised training (max label accuracy) → reordering model

SLIDE 6

3-Step BTG Grammar Induction for Reordering [DeNero+ 11]

Testing:

  F = kare wa gohan o tabeta

  1) Parsing (parsing model): produce an unlabeled bracketing (X nodes)
  2) Reordering (reordering model): label each node straight (S) or inverted (I) to reorder

SLIDE 7

Our Work: Inducing a Parser to Optimize Reordering

  • What if we can reduce three steps to one, and directly maximize ordering accuracy?

Training: F = kare wa gohan o tabeta,  E = he ate rice,  A = word alignment
          → supervised learning (max reordering accuracy) → parsing/reordering model

Testing:  F = kare wa gohan o tabeta
          → parsing/reordering model → BTG derivation (T/S/I) and reordered output

SLIDE 8

Optimization Framework

SLIDE 9

Optimization Framework

  • Input: Source sentence F
  • Output: Reordered source sentence F'
  • Latent: Bracketing transduction grammar derivation D

F  = kare wa gohan o tabeta
D  = BTG derivation over F (T T T T T, combined by S, S, I, S)
F' = kare wa tabeta gohan o

SLIDE 10

Scores and Losses

  • Define a score over source sentences and derivations
  • Optimization finds a weight vector that minimizes loss

    S(F, D; w) = Σᵢ wᵢ · φᵢ(F, D)

    argmin_w  Σ_{⟨F, F'*⟩}  L(F'*, argmax_{F'←F,D} S(F, D; w))
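The score is a plain linear model over node features. A minimal sketch with a sparse feature dict and hypothetical feature names:

```python
# Minimal sketch of the linear score S(F, D; w) = sum_i w_i * phi_i(F, D),
# with the feature vector phi represented as a sparse dict.
def score(features, weights):
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

phi = {"left=kare": 1.0, "label=I": 1.0}   # hypothetical features phi(F, D)
w = {"left=kare": 0.5, "label=I": -0.2}    # hypothetical weights
s = score(phi, w)   # 0.5 - 0.2
```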

SLIDE 11

Note: Latent Variable Ambiguity

  • Many derivations produce the same reordering; of these, we want easy-to-reproduce trees
  • [DeNero+ 11] finds trees with a bilingual parsing model
  • Our model discovers trees during training

[Figure: two different BTG derivations of "kare wa gohan o tabeta" that yield the same reordering]

SLIDE 12

Training: Latent Online Learning

  • Find:
    - the model parse of maximal score
    - the oracle parse of maximal score among the parses of minimal loss

    D = argmax_D̃ S(F, D̃)                           (model parse)
    D̂ = argmax_{D̃ ∈ argmin_D' L(F, D')} S(F, D̃)    (oracle parse)

  • Adjust weights (example: perceptron):

    w ← w + φ(F, D̂) − φ(F, D)
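The perceptron update can be sketched as follows, with hypothetical feature names: add the oracle parse's features, subtract the model parse's features.

```python
# Sketch of one perceptron update in the latent online learner:
# move the weights toward the oracle parse's features phi(F, D-hat)
# and away from the model parse's features phi(F, D).
def perceptron_update(weights, oracle_feats, model_feats):
    for name, value in oracle_feats.items():
        weights[name] = weights.get(name, 0.0) + value
    for name, value in model_feats.items():
        weights[name] = weights.get(name, 0.0) - value
    return weights

w = perceptron_update({}, {"label=I": 1.0}, {"label=S": 1.0})
# w: {"label=I": 1.0, "label=S": -1.0}
```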

SLIDE 13

Considering Loss in Online Learning

  • Consider the loss (how bad is the mistake?) relative to the reference (L=0)
  • Make it easy to choose trees with high loss during training
    → to avoid high-loss trees, the model must learn to give them a large penalty

    D = argmax_D̃  S(F, D̃) + L(F, D̃)

[Figure: candidate reorderings of "kare wa gohan o tabeta" with losses L=1 and L=8, compared to the reference (L=0)]
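Loss-augmented selection can be sketched over an explicit candidate list; the derivation names, scores, and losses below are made up:

```python
# Loss-augmented decoding sketch: among candidate derivations, select
# the one that maximizes model score plus loss.  During training this
# surfaces high-loss trees that the model still scores highly.
candidates = [
    ("d_ref", 2.0, 0.0),   # (name, model score, loss) -- illustrative
    ("d_ok",  1.5, 1.0),
    ("d_bad", 0.5, 8.0),
]
best = max(candidates, key=lambda c: c[1] + c[2])
# best is "d_bad": its large loss makes it the most dangerous competitor
```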

SLIDE 14

Parser

SLIDE 15

Parsing Setup: Standard Discriminative Parser

  • Features are independent with respect to each node
  • Parsing and reordering are possible in O(n³) with CKY
  • Multi-word pre-terminals are allowed

[Figure: derivation over "kare wa gohan o tabeta" in which a single pre-terminal T covers the two words "kare wa"]
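The CKY dynamic program over T/S/I labels can be sketched as follows; `term_score` and `label_score` stand in for the model's node-scoring functions, which are not specified here:

```python
# Minimal CKY sketch for BTG parsing over n words: each span [i, j)
# scores either a single pre-terminal T, or the best straight (S) /
# inverted (I) combination of two sub-spans.  The three nested loops
# over spans and split points give the O(n^3) behavior.
def cky_best(n, term_score, label_score):
    best = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best[i, j] = term_score(i, j)  # pre-terminal T over [i, j)
            for k in range(i + 1, j):
                for label in ("S", "I"):
                    s = best[i, k] + best[k, j] + label_score(label, i, k, j)
                    best[i, j] = max(best[i, j], s)
    return best[0, n]
```

With a toy scorer that gives each single word 1.0 and every label 0.0, the best score for a 3-word sentence is 3.0.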

SLIDE 16

Language Independent Features

  • Lexical: Left, right, inside, outside, boundary words
  • Class: Same as lexical but induced classes
  • Phrase Table: Whether span exists in phrase table
  • Balance: Left branching or right branching?

  • No linguistic analysis necessary

[Figure: "kare wa gohan o tabeta" under an inverted (I) node, with induced word classes 7 12 39 12 5]
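A sketch of extracting such span features; the feature-template names are illustrative, not the paper's exact templates:

```python
# Language-independent span features: boundary words inside and outside
# the span, plus an indicator for whether the span is in the phrase table.
def span_features(words, i, j, phrase_table):
    outside_l = words[i - 1] if i > 0 else "<s>"
    outside_r = words[j] if j < len(words) else "</s>"
    return {
        "inside_left=" + words[i]: 1.0,
        "inside_right=" + words[j - 1]: 1.0,
        "outside_left=" + outside_l: 1.0,
        "outside_right=" + outside_r: 1.0,
        "in_phrase_table=" + str(" ".join(words[i:j]) in phrase_table): 1.0,
    }

f = "kare wa gohan o tabeta".split()
feats = span_features(f, 2, 4, {"gohan o"})   # span "gohan o"
```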

SLIDE 17

Language Dependent Features

  • POS Features: same as lexical features, but over POS tags
    (e.g. "kare wa gohan o tabeta" tagged N wa N o V)
  • CFG Features: whether nodes match a supervised parser's spans
    (e.g. spans labeled PRN, N, V, P, VP by a treebank parser)

[Figure: an inverted (I) node over the POS-tagged sentence and the corresponding supervised parse]

SLIDE 18

Reordering Losses

SLIDE 19

Reordering Losses [Talbot+ 11]: Chunk Fragmentation

  • How many chunks are necessary to reproduce the reference?

[Figure: system and reference reorderings of "<s> kare wa gohan o tabeta </s>", segmented into the contiguous chunks they share]

    L_chunk(F, D̃) = (number of chunks) − 1

    A_chunk(F, D̃) = 1 − (number of chunks − 1) / (J + 1)

  (J = sentence length)
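A sketch of the chunk fragmentation loss, assuming each word occurs once (illustrative only, not the paper's exact implementation):

```python
# Chunk-fragmentation sketch: with <s>/</s> as virtual positions -1 and
# J, count the breaks where consecutive system words are not adjacent
# in the reference order; the break count equals (number of chunks - 1).
def chunk_fragmentation_loss(system, reference):
    pos = {w: i for i, w in enumerate(reference)}
    seq = [-1] + [pos[w] for w in system] + [len(reference)]
    breaks = sum(1 for a, b in zip(seq, seq[1:]) if b != a + 1)
    return breaks  # = number of chunks - 1

ref = "kare wa tabeta gohan o".split()
sys_out = "kare wa gohan o tabeta".split()
loss = chunk_fragmentation_loss(sys_out, ref)   # 3 breaks
```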

SLIDE 20

Reordering Losses [Talbot+ 11]: Kendall's Tau

  • How many pairs of reversed words?

[Figure: system and reference reorderings of "kare wa gohan o tabeta", with crossing lines marking reversed word pairs]

    L_tau(F, D̃) = number of reversed word pairs

    A_tau(F, D̃) = 1 − (reversed word pairs) / (potential reversed word pairs)
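A sketch of the Kendall's tau loss, again assuming each word occurs once:

```python
# Kendall's tau loss sketch: count word pairs whose relative order in
# the system output is reversed with respect to the reference order.
from itertools import combinations

def kendall_tau_loss(system, reference):
    pos = {w: i for i, w in enumerate(reference)}
    seq = [pos[w] for w in system]
    return sum(1 for a, b in combinations(seq, 2) if a > b)

ref = "kare wa tabeta gohan o".split()
loss = kendall_tau_loss("kare wa gohan o tabeta".split(), ref)  # 2 reversed pairs
```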

SLIDE 21

Calculating Loss by Node

  • For large-margin training, we must calculate the loss efficiently
  • The loss can also be factored by node (details in the paper)

    D = argmax_D̃  S(F, D̃) + L(F, D̃)

[Figure: Kendall's tau and chunk losses factored over tree nodes as L_left, L_right, and L_between terms]

SLIDE 22

Experiments

SLIDE 23

Experimental Setup

  • English-Japanese and Japanese-English translation
  • Data from the Kyoto Free Translation Task

               sent.   word (ja)   word (en)
  RM-train       602       14.5k       14.3k   (manually aligned)
  RM-test        555       11.2k       10.4k   (manually aligned)
  LM/TM         329k       6.08M       5.91M
  tune          1166       26.8k       24.3k
  test          1160       28.5k       26.7k

SLIDE 24

Experimental Setup

  • Reordering Model Training:
    - 500 iterations
    - Pegasos with regularization constant 10⁻³
    - Default: chunk fragmentation loss, standard features
  • Translation: Moses with lexicalized reordering
  • Compare: original order, 3-step training, and the proposed method

SLIDE 25

Result: Proposed Model Improves Reordering

  • Results for chunk fragmentation and Kendall's tau accuracy

[Figure: bar charts of Kendall's tau and chunk accuracy (50–100) on en-ja and ja-en for Orig, 3-Step, and Proposed]
SLIDE 26

Result: Proposed Model Improves Translation

  • Results for BLEU and RIBES:

[Figure: bar charts of RIBES (60–75) and BLEU (15–25) on en-ja and ja-en for Orig, 3-Step, and Proposed]

SLIDE 27

Result: Adding Linguistic Info (Generally) Helps

[Figure: bar charts of RIBES (60–75) and BLEU (15–25) on en-ja and ja-en for Orig, Standard, +POS, and +CFG feature sets]

SLIDE 28

Result: Training Loss Affects Reordering

  • The optimized criterion is higher on the test set as well

[Figure: bar charts of Kendall's tau and chunk accuracy (50–100) on en-ja and ja-en for Orig and for training with Chunk, Tau, and Chunk+Tau losses]

SLIDE 29

Result: Training Loss Affects Translation

  • Optimizing chunk fragmentation generally gives the best results

[Figure: bar charts of RIBES (60–75) and BLEU (15–25) on en-ja and ja-en for Orig, Chunk, Tau, and Chunk+Tau]

SLIDE 30

Result: Automatic Alignments Are Better than Nothing, Worse than Manual

[Figure: bar charts of RIBES (60–75) and BLEU (15–25) on en-ja and ja-en for Orig, Man. 602, Auto 602, and Auto 10k alignment settings]

SLIDE 31

Parsing Result

[Figure: learned BTG parse reordering the English sentence "yoshimitsu ashikaga was the 3rd seii taishogun of the muromachi shogunate and reigned from 1368 to 1394 ." toward Japanese word order]

SLIDE 32

Parsing Result

[Figure: the same parse, annotated with learned phenomena: proper names, verb-final order, noun phrases, preposition → postposition, and coordination]

SLIDE 33

Conclusion

  • Presented a method to induce a discriminative parser to optimize machine translation reordering

  • Favorable results for English ↔ Japanese
  • Future Work:
  • Development of better features
  • Incorporation into tree-to-string translation
  • Probabilistic inference

Available Open Source: http://www.phontron.com/lader  (^ will be!)

SLIDE 34

Thank you!