SLIDE 1

Motivation Non-projective non-tree-local pre-reordering Reordering automaton GRU-RM Conclusions

Non-projective Dependency-based Pre-Reordering with Recurrent Neural Network for Machine Translation

Antonio Valerio Miceli Barone Giuseppe Attardi

University of Pisa, Italy

Jun 4, 2015

Antonio Valerio Miceli Barone, Giuseppe Attardi 1 / 15

SLIDE 5

Pre-reordering for statistical machine translation

  • Phrase-based machine translation quality is affected by the amount of reordering between languages [Birch et al. 2009]
  • Pre-reordering source sentences improves quality

German: die Katze hat die Frau gekauft .
Reordered: die Frau hat gekauft die Katze .
English: the woman has bought the cat .
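Pre-reordering can be read as applying a permutation of source positions before the sentence reaches the translation system. The following is a minimal Python sketch of that idea only; the helper `apply_permutation` and the index list `order` are illustrative assumptions written by hand, not the output of any model from the talk.

```python
def apply_permutation(tokens, order):
    """Return tokens rearranged so that output position k holds tokens[order[k]]."""
    if sorted(order) != list(range(len(tokens))):
        raise ValueError("order must be a permutation of the token positions")
    return [tokens[i] for i in order]

source = "die Katze hat die Frau gekauft .".split()
# Hand-written, English-like order: die Frau | hat | gekauft | die Katze | .
order = [3, 4, 2, 5, 0, 1, 6]
reordered = apply_permutation(source, order)
print(" ".join(reordered))
```

The validity check matters here because the rest of the talk is about models that must output a proper permutation, with every source position used exactly once.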

SLIDE 8

Pre-reordering approaches

Syntax-based hand-coded rules [Collins et al. 2005, ...]
  • High quality improvement
  • Require extensive language-specific linguistic expertise

Syntax-based statistical [Genzel, 2010, ...]
  • Trained on word alignments, exploit syntax
  • Only tree-local swaps, only constituency or projective dependency syntax

Syntax-free statistical [Tromble and Eisner, 2009, ...]
  • Trained on word alignments
  • Don't exploit syntax, only word pair features

SLIDE 10

Example

German: auf diese Weise wird die raffinierte Undurchsichtigkeit geschickt aufrechterhalten ; während der Euro " stark und stabil " sein sollte und die Währungsreserven anfangs lediglich zur Verteidigung während des Übergangszeitraums ( falls notwendig ) dienen sollten , erweist sich heute , daß weder die eine noch die andere dieser Behauptungen zutreffend waren und sich in Frankfurt überhaupt nichts tut !

Reordered: die raffinierte geschickt Undurchsichtigkeit aufrechterhalten ; während der Euro sollte auf sein " stark und stabil " und die Währungsreserven anfangs lediglich dienen sollten zur Verteidigung während des Übergangszeitraums ( diese Weise wird falls notwendig ) , erweist sich heute , daß weder die eine noch dieser Behauptungen zutreffend die andere waren und sich in Frankfurt überhaupt nichts tut !

English: the issue therefore remains skilfully blurred ; while the euro was intended to be ' strong and stable ' and the reserve assets were originally intended to provide protection during the transitional period ( should this prove necessary ) , it now appears that neither of these expectations has been fulfilled and Frankfurt is totally deadlocked !

SLIDE 19

Example

German: ... die Währungsreserven anfangs lediglich zur Verteidigung ... dienen sollten ...
English: ... the reserve assets were originally intended to provide protection ...
Reordered: ... die Währungsreserven anfangs lediglich dienen sollten zur Verteidigung ...

SLIDE 26

Our proposal

Arbitrary permutation prediction
  • Use syntactical features from a non-projective dependency parse
  • Allow non-tree-local moves

Leverage language model technology
  • Tried and tested approach [Feng et al. 2010, ...]
  • Positive probability mass to non-permutations
  • NNLMs have performance issues due to normalization over the dictionary

SLIDE 31

Recurrent Neural Network Reordering Model

  • Take the Recurrent Neural Network Language Model [Mikolov et al. 2010]
  • Change output soft-max to consider only unemitted words in the current sentence
  • Include syntax features (POS, DEPREL, parent POS, ...) and permutation-specific word pair features (distortion distance, dependency relation ...)

$$
\begin{aligned}
v(t) &= \tau\left(\Theta^{(1)} \cdot x(t) + \Theta^{REC} \cdot v(t-1)\right) \\
h_j(t) &= \left\langle \tau\left(\Theta^{(o)} \cdot x_o(j)\right),\ \theta^{(2)} \odot v(t-1) \right\rangle + \theta^{(\alpha)} \cdot \log(L_f - t) + \theta^{(bias)} \\
p_j(t) &= \frac{\exp h_j(t)}{\sum_{j'} \exp h_{j'}(t)}
\end{aligned}
\tag{1}
$$
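The scoring step in equation (1) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' Theano code: τ is taken to be tanh, the sum over j′ is taken to range over the unemitted positions only (as the second bullet suggests), and all names (`state_update`, `candidate_probs`, `sent_len`, the toy weights) are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def state_update(x_t, v_prev, theta_1, theta_rec):
    """v(t) = tanh(theta_1 . x(t) + theta_rec . v(t-1)); tanh is an assumed choice of tau."""
    pre = [a + b for a, b in zip(matvec(theta_1, x_t), matvec(theta_rec, v_prev))]
    return [math.tanh(p) for p in pre]

def candidate_probs(v_prev, cand_feats, unemitted, theta_o, theta_2,
                    theta_alpha, theta_bias, sent_len, t):
    """Softmax over the unemitted positions only, following h_j(t) and p_j(t)."""
    gated = [g * v for g, v in zip(theta_2, v_prev)]            # theta_2 (elementwise) v(t-1)
    shared = theta_alpha * math.log(sent_len - t) + theta_bias  # requires sent_len > t
    scores = {}
    for j in unemitted:
        proj = [math.tanh(p) for p in matvec(theta_o, cand_feats[j])]
        scores[j] = sum(p * g for p, g in zip(proj, gated)) + shared
    top = max(scores.values())                                  # subtract max for stability
    exps = {j: math.exp(h - top) for j, h in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Toy setup: 3-word sentence, word 1 already emitted, 2-dimensional state.
cand_feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
theta_o = [[0.5, -0.3], [0.2, 0.4]]
v_prev = state_update([1.0, 0.0], [0.0, 0.0],
                      [[0.3, 0.1], [-0.2, 0.6]], [[0.1, 0.0], [0.0, 0.1]])
probs = candidate_probs(v_prev, cand_feats, unemitted=[0, 2], theta_o=theta_o,
                        theta_2=[1.0, 1.0], theta_alpha=0.1, theta_bias=0.0,
                        sent_len=3, t=1)
print(probs)
```

Because the normalization runs over the remaining words of the sentence rather than over the dictionary, already-emitted words receive no probability, so the model only ever scores proper permutations.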

SLIDE 34

Results

Trained on German-to-English Europarl v7

Monolingual BLEU:

Reordering            BLEU    improvement
none                  62.10
unlex. Base RNN-RM    64.03   +1.93
lex. Base RNN-RM      63.99   +1.89

Translation BLEU:

Test set    system                BLEU    improvement
Europarl    baseline              33.00
Europarl    "oracle"              41.80   +8.80
Europarl    Collins               33.52   +0.52
Europarl    unlex. Base RNN-RM    33.41   +0.41
Europarl    lex. Base RNN-RM      33.38   +0.38
news2013    baseline              18.80
news2013    Collins               NA      NA
news2013    unlex. Base RNN-RM    19.19   +0.39
news2013    lex. Base RNN-RM      19.01   +0.21
news2009    baseline              18.09
news2009    Collins               18.74   +0.65
news2009    unlex. Base RNN-RM    18.50   +0.41
news2009    lex. Base RNN-RM      18.44   +0.35

SLIDE 42

Reordering automaton

Walk on dependency tree and emit words
  • EMIT, UP and DOWN_child actions
  • Constraints to ensure proper permutation and no cycles
  • Each permutation bijectively maps to an automaton execution

Directly predict sequence of actions [Miceli-Barone and Attardi, 2013]
  • Didn't work well. Execution lengths are variable

Or

  • Break sequence of actions into fragments at word emission boundaries
  • Predict permutation at word level, including fragments as features
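Breaking an execution into fragments at word emission boundaries can be sketched as follows. This is a minimal illustration, assuming that each fragment ends with its EMIT action (the slides do not say on which side of the boundary EMIT falls) and using a hypothetical string encoding of actions (`"UP"`, `"EMIT"`, `"DOWN:<child>"`).

```python
def split_into_fragments(actions):
    """Group actions into fragments, closing a fragment after each EMIT."""
    fragments, current = [], []
    for action in actions:
        current.append(action)
        if action == "EMIT":
            fragments.append(current)
            current = []
    if current:  # moves after the last EMIT, if any, form a trailing fragment
        fragments.append(current)
    return fragments

# Hand-written action list in the hypothetical string encoding.
actions = ["DOWN:dienen", "DOWN:Währungsreserven", "DOWN:die", "EMIT",
           "UP", "EMIT",
           "UP", "UP", "DOWN:anfangs", "EMIT"]
fragments = split_into_fragments(actions)
for fragment in fragments:
    print(len(fragment), fragment)
```

The fragments come out with different lengths, one per emitted word, which is why the word-level model later needs a way to turn each fragment into a fixed-size feature.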

SLIDE 51

Reordering automaton example

Starting from sollten
  DOWN_dienen DOWN_Wahr... DOWN_die EMIT die
  UP EMIT Währungsreserven
  UP UP DOWN_anfangs EMIT anfangs
  UP DOWN_dienen DOWN_zur DOWN_ledi... EMIT lediglich
  UP UP EMIT dienen
  UP EMIT sollten
  DOWN_dienen DOWN_zur EMIT zur
  DOWN_Vert... EMIT Verteidigung
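The walk above can be replayed with a small interpreter. The sketch below is an assumption-laden illustration: the child-to-parent map is inferred from the action trace itself rather than taken from a parser, words stand in for node identifiers (a real sentence would need positions, since words can repeat), the abbreviated subscripts are expanded to the words emitted next, and only the "emit each word once" constraint is checked, not the no-cycles constraint. `run_automaton` and the string encoding of actions are hypothetical names.

```python
def run_automaton(parent, start, actions):
    """Execute EMIT / UP / DOWN:<child> actions and return the emitted words in order."""
    node, emitted = start, []
    for action in actions:
        if action == "EMIT":
            if node in emitted:
                raise ValueError(f"{node} emitted twice")
            emitted.append(node)
        elif action == "UP":
            if parent[node] is None:
                raise ValueError("cannot move UP from the root")
            node = parent[node]
        elif action.startswith("DOWN:"):
            child = action[len("DOWN:"):]
            if parent.get(child) != node:
                raise ValueError(f"{child} is not a child of {node}")
            node = child
        else:
            raise ValueError(f"unknown action {action}")
    return emitted

# Child -> parent map inferred from the walk-through (an assumption, not parser output).
parent = {
    "sollten": None, "dienen": "sollten", "anfangs": "sollten",
    "Währungsreserven": "dienen", "zur": "dienen",
    "die": "Währungsreserven", "lediglich": "zur", "Verteidigung": "zur",
}
trace = ("DOWN:dienen DOWN:Währungsreserven DOWN:die EMIT UP EMIT "
         "UP UP DOWN:anfangs EMIT UP DOWN:dienen DOWN:zur DOWN:lediglich EMIT "
         "UP UP EMIT UP EMIT DOWN:dienen DOWN:zur EMIT DOWN:Verteidigung EMIT")
emitted = run_automaton(parent, "sollten", trace.split())
print(" ".join(emitted))
```

Replaying the trace yields the reordered fragment from the earlier example, with "dienen sollten" moved in front of "zur Verteidigung".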

SLIDE 59

Fragment RNN-RM

The RNN-RM requires fixed-size per-word features. Fragments have variable size
  • How do we transform fragments into fixed-size representations?
  • Use a recurrent neural network!

Two-level hierarchical recurrent neural network
  • Inner recurrence: one step per action in a fragment

$$
v_r(t_r) = \tau\left(\Theta^{(r1)} \cdot x_r(t_r) + \Theta^{rREC} \cdot v_r(t_r - 1)\right) \tag{2}
$$

  • Outer recurrence: one step per fragment, takes inner state vector as a feature

Everything is differentiable, train end-to-end with generalized backpropagation through time (Theano).
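The inner recurrence of equation (2) can be sketched as a function that folds a variable-length fragment into a fixed-size vector. This is a forward-pass illustration only, with no training; it assumes τ is tanh, a zero initial state, and a bare one-hot encoding of the action type as the per-action input (the actual model presumably uses richer features). The names `encode_fragment`, `ACTION_ONE_HOT` and the toy weights are hypothetical.

```python
import math

# Assumed per-action input: one-hot over the three action types.
ACTION_ONE_HOT = {"EMIT": [1.0, 0.0, 0.0], "UP": [0.0, 1.0, 0.0], "DOWN": [0.0, 0.0, 1.0]}

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def encode_fragment(fragment, theta_r1, theta_r_rec):
    """Run one inner-recurrence step per action; return the final inner state."""
    state = [0.0] * len(theta_r_rec)  # assumed zero initial state
    for action in fragment:
        x = ACTION_ONE_HOT[action]
        pre = [a + b for a, b in zip(matvec(theta_r1, x), matvec(theta_r_rec, state))]
        state = [math.tanh(p) for p in pre]
    return state

theta_r1 = [[0.5, -0.4, 0.3], [-0.2, 0.6, 0.1]]  # 2 x 3 toy weights
theta_r_rec = [[0.1, 0.2], [-0.3, 0.1]]          # 2 x 2 toy weights
short_vec = encode_fragment(["UP", "EMIT"], theta_r1, theta_r_rec)
long_vec = encode_fragment(["UP", "DOWN", "DOWN", "DOWN", "EMIT"], theta_r1, theta_r_rec)
print(len(short_vec), len(long_vec))
```

Both fragments map to vectors of the same size, which is what lets the outer, per-word recurrence consume them as ordinary features.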

SLIDE 65

Results

Monolingual BLEU:

Reordering                BLEU    improvement
none                      62.10
unlex. Base RNN-RM        64.03   +1.93
lex. Base RNN-RM          63.99   +1.89
unlex. Fragment RNN-RM    64.43   +2.33

Translation BLEU:

Test set    system                    BLEU    improvement
Europarl    baseline                  33.00
Europarl    "oracle"                  41.80   +8.80
Europarl    Collins                   33.52   +0.52
Europarl    unlex. Base RNN-RM        33.41   +0.41
Europarl    lex. Base RNN-RM          33.38   +0.38
Europarl    unlex. Fragment RNN-RM    33.54   +0.54
news2013    baseline                  18.80
news2013    Collins                   NA      NA
news2013    unlex. Base RNN-RM        19.19   +0.39
news2013    lex. Base RNN-RM          19.01   +0.21
news2013    unlex. Fragment RNN-RM    19.27   +0.47
news2009    baseline                  18.09
news2009    Collins                   18.74   +0.65
news2009    unlex. Base RNN-RM        18.50   +0.41
news2009    lex. Base RNN-RM          18.44   +0.35
news2009    unlex. Fragment RNN-RM    18.60   +0.51

Slow! O(L^3). 5 days of training, 3 days decoding (no GPU)

SLIDE 69


Gated Recurrent Unit Reordering Model

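Standard (Elman) recurrent neural networks are hard to train and do not capture long-distance dependencies well.

Gated Recurrent Unit [Cho et al. 2014], similar to LSTM:

\[
\begin{aligned}
v_{rst}(t) &= \pi\left(\Theta^{(1)}_{rst} \cdot x(t) + \Theta^{REC}_{rst} \cdot v(t-1)\right) \\
v_{upd}(t) &= \pi\left(\Theta^{(1)}_{upd} \cdot x(t) + \Theta^{REC}_{upd} \cdot v(t-1)\right) \\
v_{raw}(t) &= \tau\left(\Theta^{(1)} \cdot x(t) + \Theta^{REC} \cdot v(t-1) \odot v_{upd}(t)\right) \\
v(t) &= v_{rst}(t) \odot v(t-1) + (1 - v_{rst}(t)) \odot v_{raw}(t)
\end{aligned}
\tag{3}
\]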
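The recurrence in (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: π is taken to be the logistic sigmoid and τ the hyperbolic tangent, the gate v_upd(t) is assumed to multiply v(t − 1) before the recurrent matrix is applied (as in Cho et al. 2014), and the parameter names (`W_rst`, `U_rst`, and so on) and the random initialisation are hypothetical. Gate names follow the slide.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, assumed here to be the slide's pi."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, v_prev, params):
    """One step of recurrence (3); W_* play the role of Theta^(1), U_* of Theta^REC."""
    v_rst = sigmoid(params["W_rst"] @ x + params["U_rst"] @ v_prev)
    v_upd = sigmoid(params["W_upd"] @ x + params["U_upd"] @ v_prev)
    # Assumption: v_upd gates the previous state before the recurrent product.
    v_raw = np.tanh(params["W"] @ x + params["U"] @ (v_prev * v_upd))
    # Interpolate between the previous state and the candidate state.
    return v_rst * v_prev + (1.0 - v_rst) * v_raw

def init_params(n_in, n_hid, seed=0):
    """Small random weights; the scale 0.1 is an arbitrary illustrative choice."""
    rng = np.random.default_rng(seed)
    shapes = {
        "W_rst": (n_hid, n_in), "U_rst": (n_hid, n_hid),
        "W_upd": (n_hid, n_in), "U_upd": (n_hid, n_hid),
        "W": (n_hid, n_in), "U": (n_hid, n_hid),
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}

def run_gru(xs, params, n_hid):
    """Run the recurrence over a sequence of input vectors, starting from a zero state."""
    v = np.zeros(n_hid)
    for x in xs:
        v = gru_step(x, v, params)
    return v

if __name__ == "__main__":
    params = init_params(n_in=4, n_hid=3)
    final_state = run_gru(np.ones((5, 4)), params, n_hid=3)
    print(final_state.shape)
```

Because v(t) is a convex combination of v(t − 1) and a tanh output, the state stays inside (−1, 1) when it starts at zero, which is one reason gated units tend to be easier to train than plain Elman units.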


slide-72
SLIDE 72


Results

Monolingual BLEU:

Reordering                BLEU    improvement
none                      62.10
unlex. Base RNN-RM        64.03   +1.93
lex. Base RNN-RM          63.99   +1.89
unlex. Fragment RNN-RM    64.43   +2.33
unlex. Base GRU-RM        64.78   +2.68

Translation BLEU:

Test set   system                    BLEU    improvement
Europarl   baseline                  33.00
Europarl   "oracle"                  41.80   +8.80
Europarl   Collins                   33.52   +0.52
Europarl   unlex. Base RNN-RM        33.41   +0.41
Europarl   lex. Base RNN-RM          33.38   +0.38
Europarl   unlex. Fragment RNN-RM    33.54   +0.54
Europarl   unlex. Base GRU-RM        34.15   +1.15
news2013   baseline                  18.80
news2013   Collins                   NA      NA
news2013   unlex. Base RNN-RM        19.19   +0.39
news2013   lex. Base RNN-RM          19.01   +0.21
news2013   unlex. Fragment RNN-RM    19.27   +0.47
news2013   unlex. Base GRU-RM        19.28   +0.48
news2009   baseline                  18.09
news2009   Collins                   18.74   +0.65
news2009   unlex. Base RNN-RM        18.50   +0.41
news2009   lex. Base RNN-RM          18.44   +0.35
news2009   unlex. Fragment RNN-RM    18.60   +0.51
news2009   unlex. Base GRU-RM        18.58   +0.49
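The improvement column appears to be each system's BLEU minus the baseline BLEU on the same test set. A small Python sketch that recomputes it for the Europarl results above; the scores are copied from the slide, while the dictionary and the helper name `improvements` are purely illustrative:

```python
# BLEU scores copied from the Europarl translation results above.
europarl = {
    "baseline": 33.00,
    '"oracle"': 41.80,
    "Collins": 33.52,
    "unlex. Base RNN-RM": 33.41,
    "lex. Base RNN-RM": 33.38,
    "unlex. Fragment RNN-RM": 33.54,
    "unlex. Base GRU-RM": 34.15,
}

def improvements(scores, baseline_key="baseline"):
    """Return BLEU minus baseline BLEU for every non-baseline system."""
    base = scores[baseline_key]
    return {
        name: round(bleu - base, 2)
        for name, bleu in scores.items()
        if name != baseline_key
    }

if __name__ == "__main__":
    for name, delta in improvements(europarl).items():
        print(f"{name:24s} {delta:+.2f}")
```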


slide-80
SLIDE 80


Conclusions

  • Pre-reordering with non-projective dependency syntax and a recurrent neural network improves machine translation
  • Research question: do complex moves over a non-projective dependency tree add value, or do simple sibling swaps or sequence prediction suffice?
  • Fragment RNN-RM shows that complex moves add value
  • But it is perhaps too slow for practical applications
  • Base GRU-RM considers less sophisticated features, but it is good enough for practical applications
  • Future work: other language pairs

Thanks for your attention

Questions?
