

SLIDE 1

Document Context Neural Machine Translation with Memory Networks

Sameen Maruf, Gholamreza Haffari

Faculty of Information Technology Monash University

July 17, 2017


SLIDE 2

Overview

1. Introduction
2. Document MT as Structured Prediction
3. Document NMT with MemNets
4. Experiments and Analysis
5. Conclusion
6. References


SLIDE 8

Why document-level machine translation?

- Most MT models translate sentences independently
- Discourse phenomena, e.g. pronominal anaphora and lexical consistency, are therefore ignored, even though they may involve long-range dependencies

SLIDE 12

Why document-level machine translation?

- Attempts at document-level statistical MT did not yield significant empirical improvements [Hardmeier and Federico, 2010, Gong et al., 2011, Garcia et al., 2014]
- Previous context-aware NMT models use only local context, and report degraded performance when using target-side context [Jean et al., 2017, Wang et al., 2017, Bawden et al., 2018]
- We incorporate global source and target document contexts



SLIDE 20

Document MT as Structured Prediction

Two types of factors: f_θ(y_t; x_t, x_{-t}) and g_θ(y_t; y_{-t})

SLIDE 24

Document MT as Structured Prediction

Training objective: maximise P(y_1, ..., y_{|d|} | x_1, ..., x_{|d|})
⇒ maximise the pseudo-likelihood:

    argmax_θ  Π_{t=1}^{|d|}  P_θ(y_t | x_t, y_{-t}, x_{-t})    (1)

where f_θ and g_θ are subsumed in P_θ(y_t | x_t, y_{-t}, x_{-t})
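In log space, objective (1) is a sum of per-sentence conditional log-probabilities, each conditioned on the rest of the document on both sides. A minimal sketch, where `log_prob` and the toy scorer below are hypothetical stand-ins for the NMT model's conditional log P_θ(y_t | x_t, y_{-t}, x_{-t}):

```python
def pseudo_log_likelihood(doc_src, doc_trg, log_prob):
    """Log pseudo-likelihood of a document pair: the sum over sentences t
    of log P(y_t | x_t, y_{-t}, x_{-t})."""
    total = 0.0
    for t in range(len(doc_trg)):
        x_rest = doc_src[:t] + doc_src[t + 1:]   # x_{-t}
        y_rest = doc_trg[:t] + doc_trg[t + 1:]   # y_{-t}
        total += log_prob(doc_trg[t], doc_src[t], y_rest, x_rest)
    return total

# hypothetical toy scorer: longer target sentences get lower log-probability
def toy_log_prob(y_t, x_t, y_rest, x_rest):
    return -float(len(y_t.split()))

print(pseudo_log_likelihood(["a b", "c"], ["A B", "C"], toy_log_prob))  # → -3.0
```

At training time the true target context y_{-t} is available, which is what makes the per-sentence factors straightforward to optimise.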

SLIDE 30

Document MT as Structured Prediction

Challenge: at test time, the target document is not given
Solution: coordinate ascent (i.e. iterative decoding)

SLIDE 36

Document MT as Structured Prediction

Iterative Decoding

(figure only)
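The coordinate-ascent decoding above can be written as a simple loop: first produce a context-free draft of the whole target document, then repeatedly re-translate each sentence conditioned on the current draft of the rest. A sketch in which the `translate` callback is a hypothetical stand-in for the sentence-level decoder:

```python
def iterative_decode(doc_src, translate, n_iters=3):
    """Coordinate ascent over target sentences: each pass re-translates
    sentence t conditioned on the current draft of y_{-t}."""
    # pass 0: initial draft with no target-side context
    doc_trg = [translate(x, trg_context=None) for x in doc_src]
    for _ in range(n_iters):
        for t, x_t in enumerate(doc_src):
            context = doc_trg[:t] + doc_trg[t + 1:]   # y_{-t}
            doc_trg[t] = translate(x_t, trg_context=context)
    return doc_trg

# hypothetical toy decoder: tags its output with how much context it saw
def toy_translate(x, trg_context=None):
    n = 0 if trg_context is None else len(trg_context)
    return f"{x.upper()}|ctx={n}"

print(iterative_decode(["tere", "maailm"], toy_translate, n_iters=1))
# → ['TERE|ctx=1', 'MAAILM|ctx=1']
```

Each pass re-scores sentences against an increasingly complete target context, mimicking coordinate ascent on the document-level objective.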


SLIDE 38

Document NMT with MemNets

⇒ P_θ(y_t | x_t, y_{-t}, x_{-t})


SLIDE 42

Document NMT with MemNets

Memory-to-Context: s_{t,j} = GRU(s_{t,j-1}, E_T[y_{t,j-1}], c_{t,j}, c^src_t, c^trg_t)
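In Memory-to-Context, the document memory reads enter the decoder as extra inputs to the GRU state update. A numpy sketch under hypothetical toy dimensions (the GRU parameterisation here is the textbook one, not necessarily the paper's exact implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(prev_state, x, p):
    """A standard GRU update; p holds the gate weight matrices."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ prev_state)        # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ prev_state)        # reset gate
    h = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * prev_state))  # candidate state
    return (1 - z) * prev_state + z * h

def memory_to_context_step(s_prev, y_prev_emb, c_tj, c_src, c_trg, p):
    """Memory-to-Context: the document memory reads c_src, c_trg are fed
    to the decoder GRU alongside the usual inputs."""
    x = np.concatenate([y_prev_emb, c_tj, c_src, c_trg])
    return gru_cell(s_prev, x, p)

# hypothetical toy dimensions: state dim 4, each conditioning vector dim 3
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(4, 12)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(size=(4, 4)) for k in ("Uz", "Ur", "Uh")})
s = memory_to_context_step(np.zeros(4), *(rng.normal(size=3) for _ in range(4)), p)
print(s.shape)  # → (4,)
```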

SLIDE 43

Document NMT with MemNets

Memory-to-Output: y_{t,j} ~ softmax(W_y · r_{t,j} + W_ym · c^src_t + W_yt · c^trg_t + b_y)
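In Memory-to-Output, the memory reads instead enter just before the softmax, as additional linear terms in the output logits. A numpy sketch with hypothetical toy dimensions:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def memory_to_output_dist(r_tj, c_src, c_trg, p):
    """Memory-to-Output: the memory reads contribute extra linear terms
    to the output logits, just before the softmax."""
    logits = p["Wy"] @ r_tj + p["Wym"] @ c_src + p["Wyt"] @ c_trg + p["by"]
    return softmax(logits)

# hypothetical toy dimensions: vocabulary 5, hidden/context dim 3
rng = np.random.default_rng(1)
vocab, dim = 5, 3
p = {name: rng.normal(size=(vocab, dim)) for name in ("Wy", "Wym", "Wyt")}
p["by"] = np.zeros(vocab)
dist = memory_to_output_dist(*(rng.normal(size=dim) for _ in range(3)), p)
print(dist.shape)  # → (5,)
```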

SLIDE 44

Document NMT with MemNets

- Use only the source, only the target, or both external memories
- Use the Memory-to-Context or Memory-to-Output architecture to incorporate the different contexts
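The document context vectors c^src_t and c^trg_t used by both architectures are reads from external memories holding one cell per sentence of the document. The deck's MemNet read operation itself is in figures missing from this transcript; the sketch below is a generic soft-attention read, shown only to illustrate the idea:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def memory_read(query, memory):
    """Soft read over memory cells (one per sentence): dot-product
    addressing, then an attention-weighted average of the cells."""
    alpha = softmax(memory @ query)   # attention over sentences
    return alpha @ memory             # document context vector

# hypothetical toy setup: a 6-sentence document with 4-dim memory cells
rng = np.random.default_rng(2)
memory = rng.normal(size=(6, 4))
c = memory_read(rng.normal(size=4), memory)
print(c.shape)  # → (4,)
```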


SLIDE 49

Experimental Setup

Training/dev/test corpora statistics:

            corpus            #docs (H)    #sents (K)   avg doc len
  Fr→En     Ted-Talks         10/1.2/1.5   123/15/19    123/128/124
  Et→En     Europarl v7       150/10/18    209/14/25    14/14/14
  De→En     News-Commentary   49/.9/1.6    191/2/3      39/23/19

Evaluation metrics: BLEU, METEOR

Baselines:
- Context-free baseline (S-NMT)
- Local source context baselines: [Jean et al., 2017] and [Wang et al., 2017]

SLIDE 54

Memory-to-Context Results

BLEU scores:

               Fr→En    De→En    Et→En
  S-NMT        20.85    9.18     20.42
  S-NMT+src    21.91    10.2     22.1
  S-NMT+trg    21.74    9.97     21.94
  S-NMT+both   22       10.54    22.32

SLIDE 59

Memory-to-Output Results

BLEU scores:

               Fr→En    De→En    Et→En
  S-NMT        20.85    9.18     20.42
  S-NMT+src    21.8     9.98     21.5
  S-NMT+trg    21.76    10.04    21.82
  S-NMT+both   21.77    10.23    22.2

SLIDE 63

Main Results

BLEU scores:

                       Fr→En    De→En    Et→En
  [Jean et al., 2017]  21.95    10.26    21.67
  [Wang et al., 2017]  21.87    10.14    22.06
  S-NMT+src            21.91    10.2     22.1
  S-NMT+both           22       10.54    22.32

SLIDE 67

Example translation

  Source              qimonda täidab lissaboni strateegia eesmärke.
  Target              qimonda meets the objectives of the lisbon strategy.
  S-NMT               <UNK> is the objectives of the lisbon strategy.
  +Src Mem            the millennium development goals are fulfilling the millennium goals of the lisbon strategy.
  +Trg Mem            in writing. - (ro) the lisbon strategy is fulfilling the objectives of the lisbon strategy.
  +Both Mems          qimonda fulfils the aims of the lisbon strategy.
  [Wang et al., 2017] <UNK> fulfils the objectives of the lisbon strategy.

SLIDE 70

Example translation (contd.)

  Source              ... et riigis kehtib endiselt lukašenka diktatuur, mis rikub inim- ning etnilise vähemuse õigusi.
  Target              ... this country is still under the dictatorship of lukashenko, breaching human rights and the rights of ethnic minorities.
  S-NMT               ... the country still remains in a position of lukashenko to violate human rights and ethnic minorities.
  +Src Mem            ... the country still applies to the brutal dictatorship of human and ethnic minority rights.
  +Trg Mem            ... the country still keeps the <UNK> dictatorship that violates human rights and ethnic rights.
  +Both Mems          ... the country still persists in lukashenko's dictatorship that violate human rights and ethnic minority rights.
  [Wang et al., 2017] ... there is still a regime in the country that is violating the rights of human and ethnic minority in the country.


SLIDE 76

Conclusion

- Proposed a model which incorporates global source and target document contexts
- Proposed effective training and decoding methodologies for our model
- Future work: investigate document-context NMT models which incorporate specific discourse-level phenomena


SLIDE 78

References

Hardmeier, C. and Federico, M. (2010). Modelling pronominal anaphora in statistical machine translation. International Workshop on Spoken Language Translation.

Gong, Z. and Zhang, M. and Zhou, G. (2011). Cache-based document-level statistical machine translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Garcia, E. M. and España-Bonet, C. and Màrquez, L. (2014). Document-level machine translation as a re-translation process. Procesamiento del Lenguaje Natural, 53:103-110.

Jean, S. and Lauly, S. and Firat, O. and Cho, K. (2017). Does Neural Machine Translation Benefit from Larger Context? arXiv:1704.05135.

Wang, L. and Tu, Z. and Way, A. and Liu, Q. (2017). Exploiting Cross-Sentence Context for Neural Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Bawden, R. and Sennrich, R. and Birch, A. and Haddow, B. (2018). Evaluating Discourse Phenomena in Neural Machine Translation. Proceedings of NAACL-HLT 2018.