
SLIDE 1

Exploiting Cross-Sentence Context for Neural Machine Translation

Longyue Wang♥ Zhaopeng Tu♠ Andy Way♥ Qun Liu♥

♥ ADAPT Centre, Dublin City University ♠ Tencent AI Lab

SLIDE 2

Motivation

  • The majority of NMT models are sentence-level

[Figure: standard attention-based NMT over a single sentence; example source 这是 一个 生态 网络 。 <eos> ("this is an ecological network"), with attention weights over source positions and decoder states s_0 … s_T]

SLIDE 3

Motivation

  • The continuous vector representation of a symbol encodes multiple dimensions of similarity (Choi et al., 2016).

    Word       Axis   Nearest Neighbours
    notebook   1      diary, notebooks (notebook), sketchbook, jottings
               2      palmtop, notebooks (notebook), ipaq, laptop
    power      1      powers, authority (power), powerbase, sovereignity
               2      powers, electrohydraulic, microwatts, hydel (power)

SLIDE 4

Motivation

  • The continuous vector representation of a symbol encodes multiple dimensions of similarity.

  • Consistency is another critical issue in document-level translation.

    Past      那么 在 这个 问题 上 , 伊朗 的 …             well, on this issue , iran has a relatively …
    Past      在 任内 解决 伊朗 核 问题 , 不管是 用 和平 …   to resolve the iranian nuclear issue in his term , …
    Current   那 刚刚 提到 这个 … 谈判 的 问题 。            that just mentioned the issue of the talks …

SLIDE 5

Motivation

  • The cross-sentence context has proven helpful for the aforementioned two problems in multiple sequential tasks (Sordoni et al., 2015; Vinyals and Le, 2015; Serban et al., 2016).

SLIDE 6

Motivation

  • The cross-sentence context has proven helpful for the aforementioned two problems in multiple sequential tasks (Sordoni et al., 2015; Vinyals and Le, 2015; Serban et al., 2016).

  • However, it has received relatively little attention from the NMT research community.

SLIDE 7

Data and Setting

  • Chinese-English translation task
  • Training data: 1M sentence pairs from LDC corpora that contain document information
  • Tuning: NIST MT05; Test: NIST MT06 and MT08
  • Built on top of Nematus (https://github.com/EdinburghNLP/nematus)
  • Vocabulary size: 35K for both languages
  • Word embedding: 600; Hidden size: 1000
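For reference, these settings can be collected in one place. A minimal sketch; the key names below are illustrative and are not Nematus's actual configuration options:

```python
# Hypothetical configuration mirroring the settings on this slide.
# Key names are illustrative; they are not Nematus's actual options.
config = {
    "source_lang": "zh",
    "target_lang": "en",
    "train_pairs": 1_000_000,             # LDC sentence pairs with document information
    "dev_set": "NIST MT05",
    "test_sets": ["NIST MT06", "NIST MT08"],
    "vocab_size": 35_000,                 # for both languages
    "dim_word": 600,                      # word embedding size
    "dim_hidden": 1000,                   # RNN hidden size
}
```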
SLIDE 8

Approach

  • Use a hierarchical RNN to summarize the previous M source sentences (see the sketch below)

[Figure: hierarchical summarizer — a word-level RNN encodes each previous source sentence; a sentence-level RNN over the resulting sentence vectors produces the cross-sentence context]
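A minimal PyTorch sketch of this two-level summarizer, assuming GRU cells and batch size 1; the paper's implementation builds on Nematus (Theano), so the class and variable names here are illustrative:

```python
import torch
import torch.nn as nn

class CrossSentenceSummarizer(nn.Module):
    """Hierarchical RNN: a word-level GRU encodes each of the previous M
    source sentences into a vector; a sentence-level GRU runs over those
    vectors, and its final state serves as the cross-sentence context D."""

    def __init__(self, vocab_size=35_000, emb_dim=600, hid_dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.sent_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, history):
        # history: list of M LongTensors, each of shape (1, T_m) holding word ids
        sent_vecs = []
        for sent in history:
            _, h = self.word_rnn(self.embed(sent))  # h: (1, 1, hid_dim)
            sent_vecs.append(h.squeeze(0))          # (1, hid_dim)
        sents = torch.stack(sent_vecs, dim=1)       # (1, M, hid_dim)
        _, D = self.sent_rnn(sents)                 # D: (1, 1, hid_dim)
        return D.squeeze(0)                         # (1, hid_dim)
```

The final sentence-level state is the single vector D consumed by the two strategies that follow.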

SLIDE 9

Approach

  • Strategy I: Initialization — Encoder

[Figure: the cross-sentence context initializes the encoder's hidden state; example source 这是 一个 生态 网络 。 <eos>]

SLIDE 10

Approach

  • Strategy I: Initialization — Decoder

[Figure: the cross-sentence context initializes the decoder's hidden state s_0; decoder states s_0 … s_T]

SLIDE 11

Approach

  • Strategy I: Initialization — Both (sketched below)

[Figure: the cross-sentence context initializes both the encoder and the decoder]
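A minimal sketch of Strategy I under the same assumptions: D is mapped through a learned projection and a tanh nonlinearity to give the initial encoder state, the initial decoder state, or both (projection names are illustrative):

```python
import torch
import torch.nn as nn

class ContextInitializer(nn.Module):
    """Strategy I: turn the cross-sentence context D into initial
    hidden states for the encoder and/or the decoder."""

    def __init__(self, hid_dim=1000, mode="both"):
        super().__init__()
        assert mode in ("encoder", "decoder", "both")
        self.mode = mode
        self.enc_proj = nn.Linear(hid_dim, hid_dim)  # D -> encoder h_0
        self.dec_proj = nn.Linear(hid_dim, hid_dim)  # D -> decoder s_0

    def forward(self, D):
        # D: (batch, hid_dim) summary of the previous source sentences
        h0 = torch.tanh(self.enc_proj(D)) if self.mode in ("encoder", "both") else None
        s0 = torch.tanh(self.dec_proj(D)) if self.mode in ("decoder", "both") else None
        return h0, s0
```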

SLIDE 12

Results

  • Impact of components

    System       BLEU
    Baseline     30.57
    +Init_Enc    31.55
    +Init_Dec    31.90
    +Init_Both   32.00

SLIDE 13

Approach

  • Strategy II: Auxiliary Context

[Figure: the decoder attends over the current source sentence (intra-sentence context c_t) and additionally consumes the cross-sentence context at each step; example source 这是 一个 生态 网络 。 <eos>]

SLIDE 14

Approach

  • Strategy II: Auxiliary Context — three decoder variants (see the sketch below):

    (a) standard decoder:                       s_t = f(s_{t-1}, y_{t-1}, c_t)
    (b) decoder with auxiliary context:         s_t = f(s_{t-1}, y_{t-1}, c_t, D)
    (c) decoder with gating auxiliary context:  s_t = f(s_{t-1}, y_{t-1}, c_t, z_t ⊗ D),
        where the gate z_t is computed from s_{t-1}, y_{t-1} and D
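A minimal sketch of variant (c), assuming a GRU transition: the gate z_t is computed from s_{t-1}, y_{t-1} and D, and the gated context z_t ⊗ D enters the recurrent update together with y_{t-1} and the attention context c_t (layer shapes and names are illustrative):

```python
import torch
import torch.nn as nn

class GatedAuxContextDecoderCell(nn.Module):
    """One decoder step with gating auxiliary context, variant (c):
    a sigmoid gate z_t decides how much of the cross-sentence context D
    flows into the GRU transition at this step."""

    def __init__(self, emb_dim=600, hid_dim=1000):
        super().__init__()
        self.gate = nn.Linear(hid_dim + emb_dim + hid_dim, hid_dim)
        self.cell = nn.GRUCell(emb_dim + hid_dim + hid_dim, hid_dim)

    def forward(self, y_prev, c_t, D, s_prev):
        # y_prev: (B, emb_dim)  embedding of the previous target word
        # c_t:    (B, hid_dim)  intra-sentence attention context
        # D:      (B, hid_dim)  cross-sentence context
        # s_prev: (B, hid_dim)  previous decoder state
        z_t = torch.sigmoid(self.gate(torch.cat([s_prev, y_prev, D], dim=-1)))
        s_t = self.cell(torch.cat([y_prev, c_t, z_t * D], dim=-1), s_prev)
        return s_t
```

Feeding D directly without the gate gives variant (b); dropping D entirely recovers the standard decoder (a).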

SLIDE 15

Results

  • Impact of components

    System              BLEU
    Baseline            30.57
    +Aux. Ctx.          31.30
    +Gating Aux. Ctx.   32.24

SLIDE 16

Approach

  • Initialization + Gating Auxiliary Context (both strategies combined; see the sketch below)

[Figure: the cross-sentence context both initializes the encoder/decoder and feeds the gated auxiliary input at each decoding step; example source 这是 一个 生态 网络 。 <eos>, with intra-sentence context c_t]
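Wiring the two strategies together, reusing the hypothetical classes sketched on the earlier slides (shapes follow the 600/1000 settings from the Data and Setting slide):

```python
import torch

summarizer   = CrossSentenceSummarizer()          # slide 8
initializer  = ContextInitializer(mode="both")    # slides 9-11
decoder_step = GatedAuxContextDecoderCell()       # slide 14

history = [torch.randint(0, 35_000, (1, 8)) for _ in range(3)]  # 3 previous sentences
D = summarizer(history)                     # cross-sentence context, (1, 1000)
h0_enc, s_prev = initializer(D)             # Strategy I: initialize encoder and decoder
y_prev = torch.zeros(1, 600)                # embedding of the start-of-sentence token
c_t = torch.zeros(1, 1000)                  # attention context from the encoder (placeholder)
s_t = decoder_step(y_prev, c_t, D, s_prev)  # Strategy II: one gated decoder step
```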

SLIDE 17

Results

  • Impact of components

    System              BLEU
    Baseline            30.57
    +Init_Both          32.00
    +Gating Aux. Ctx.   32.24
    +Both               32.67

SLIDE 18

Analysis

  • Translation error statistics

    Errors   Ambiguity   Inconsistency   All
    Total        38           32          70
    Fixed        29           24          53
    New           7            8          15

SLIDE 19

Analysis

  • Case Study

    Hist.   [Chinese context sentence; unrecoverable in extraction]
    Input   [Chinese input sentence; unrecoverable in extraction]
    Ref.    Can it inhibit and deter corrupt officials?
    NMT     Can we contain and deter the enemy?
    Our     Can it contain and deter the corrupt officials?
SLIDE 20

Summary

  • We propose using a hierarchical RNN (HRNN) to summarize the previous source sentences, providing cross-sentence context for NMT.

  • Limitations:
    • Computationally expensive
    • Only exploits source-side sentences, because target-side history would suffer from error propagation
    • The context is encoded into a single fixed-length vector, which is not flexible
SLIDE 21

Publicly Available

  • The source code is publicly available at https://github.com/tuzhaopeng/LC-NMT
  • The trained models and translation results will be released

SLIDE 22
Reference

  • 1. Heeyoul Choi, Kyunghyun Cho, and Yoshua Bengio. Context-dependent word representation for neural machine translation. arXiv, 2016.

  • 2. Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. CIKM 2015.

  • 3. Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI 2016.

  • 4. Oriol Vinyals and Quoc Le. A neural conversational model. ICML Deep Learning Workshop, 2015.

SLIDE 23

Question & Answer