Exploiting Cross-Sentence Context for Neural Machine Translation


  1. Exploiting Cross-Sentence Context for Neural Machine Translation. Longyue Wang♥, Zhaopeng Tu♠, Andy Way♥, Qun Liu♥ (♥ADAPT Centre, Dublin City University; ♠Tencent AI Lab)

  2. Motivation • The majority of NMT models are sentence-level. [Figure: a standard attention-based encoder-decoder translating the single source sentence 这是 一个 生态 网络 。 ("This is an ecological network."), with attention weights (0.0, 0.1, 0.0, 0.2, 0.7, 0.1) combined into the context vector c_t that feeds each decoder state s_t]
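For reference, the sentence-level baseline in the figure computes its context vector with standard attention. The equations below are the usual attention formulation, reconstructed for readability rather than copied from the deck:

```latex
% Standard sentence-level attention (the usual formulation, shown for reference).
\begin{aligned}
e_{t,j} &= a(s_{t-1}, h_j)
  && \text{alignment score for source position } j \\
\alpha_{t,j} &= \frac{\exp(e_{t,j})}{\sum_{k}\exp(e_{t,k})}
  && \text{attention weights (the } 0.0, 0.1, \dots \text{ on the slide)} \\
c_t &= \sum_{j} \alpha_{t,j}\, h_j
  && \text{intra-sentence context vector} \\
s_t &= f(s_{t-1},\, y_{t-1},\, c_t)
  && \text{decoder state update}
\end{aligned}
```

Everything the model sees is confined to the current sentence pair; no information crosses sentence boundaries.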

  3. Motivation • The continuous vector representation of a symbol encodes multiple dimensions of similarity (Choi et al., 2016):

      Word      Axis   Nearest Neighbours
      notebook  1      diary, notebooks, (notebook), sketchbook, jottings
      notebook  2      palmtop, notebooks, (notebook), ipaq, laptop
      power     1      powers, authority, (power), powerbase, sovereignity
      power     2      powers, electrohydraulic, microwatts, hydel, (power)

  4. Motivation • Consistency is another critical issue in document-level translation: a recurring source word such as 问题 should keep the same translation ("issue") that the preceding sentences established.

      Past:    那么 在 这个 问题 上 , 伊朗 的 …  →  well, on this issue, iran has a relatively …
      Past:    在 任内 解决 伊朗 核 问题 , 不管是 用 和平 …  →  to resolve the iranian nuclear issue in his term, …
      Current: 那 刚刚 提到 这个 … 谈判 的 问题 。  →  that just mentioned the issue of the talks …

  5. Motivation • The cross-sentence context has proven helpful for the aforementioned two problems in multiple sequential tasks (Sordoni et al., 2015; Vinyals and Le, 2015; Serban et al., 2016).

  6. Motivation • However, it has received relatively little attention from the NMT research community.

  7. Data and Setting • Chinese-English translation task • Training data: 1M sentence pairs from LDC corpora that retain document information • Tuning: NIST MT05; Test: NIST MT06 and MT08 • The model is built on top of Nematus (https://github.com/EdinburghNLP/nematus) • Vocabulary size: 35K for both languages • Word embedding size: 600; hidden size: 1000
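As a rough illustration, these hyperparameters would map onto a Theano-era Nematus training call along the following lines. The parameter names follow the old nematus/nmt.py train() interface and all file paths are placeholders, so treat the whole invocation as an assumption to verify against your Nematus version:

```python
# Hypothetical Nematus (Theano-era) training config mirroring the slide's
# settings; parameter names from the old nematus/nmt.py train() interface,
# all data paths are placeholders.
from nmt import train

train(
    dim_word=600,             # word embedding size (slide: 600)
    dim=1000,                 # hidden layer size (slide: 1000)
    n_words=35000,            # target vocabulary size (slide: 35K)
    n_words_src=35000,        # source vocabulary size (slide: 35K)
    datasets=['corpus.zh', 'corpus.en'],        # placeholder training files
    valid_datasets=['mt05.zh', 'mt05.en'],      # NIST MT05 for tuning
    dictionaries=['vocab.zh.json', 'vocab.en.json'],
    saveto='model.npz',
)
```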

  8. Approach • Use a hierarchical RNN to summarize the previous M source sentences, as sketched below. [Figure: a word-level RNN encodes each of the M preceding source sentences up to <eos>; a sentence-level RNN runs over the resulting sentence vectors to produce the cross-sentence context]
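A minimal sketch of such a hierarchical summarizer in PyTorch. GRUs, the layer sizes, and using each sentence's final hidden state as its summary are illustrative choices, not necessarily the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class CrossSentenceSummarizer(nn.Module):
    """Summarize the previous M source sentences into one context vector D."""

    def __init__(self, vocab_size, emb_size=600, hidden_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        # Word-level RNN: reads one sentence, yields a sentence vector.
        self.word_rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        # Sentence-level RNN: reads the M sentence vectors in order.
        self.sent_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, history):
        # history: list of M LongTensors, each of shape (1, sentence_len).
        sent_vecs = []
        for sent in history:
            _, h_n = self.word_rnn(self.embed(sent))  # h_n: (1, 1, hidden)
            sent_vecs.append(h_n[-1])                 # final state = sentence vector
        sents = torch.stack(sent_vecs, dim=1)         # (1, M, hidden)
        _, d = self.sent_rnn(sents)                   # final state summarizes history
        return d[-1]                                  # D: (1, hidden)
```

The returned vector D is then fed into the translation model via the two integration strategies on the following slides.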

  9. Approach • Strategy 1: Initialization (Encoder). [Figure: the cross-sentence context D initializes the encoder's hidden state before it reads the current source sentence]

  10. Approach • Strategy 1: Initialization (Decoder). [Figure: D initializes the decoder's initial state s_0 ahead of the states s_1 … s_T]

  11. Approach • Strategy 1: Initialization (Both). [Figure: D initializes both the encoder and the decoder states; a sketch follows]
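A sketch of the initialization strategy. The separate learned projections and the tanh nonlinearity are plausible assumptions for illustration, not necessarily the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class ContextInitializer(nn.Module):
    """Strategy 1: use the cross-sentence context D to initialize RNN states."""

    def __init__(self, ctx_size=1000, hidden_size=1000):
        super().__init__()
        # Separate projections for encoder and decoder initialization
        # (an illustrative choice; the paper may share or differ).
        self.enc_proj = nn.Linear(ctx_size, hidden_size)
        self.dec_proj = nn.Linear(ctx_size, hidden_size)

    def forward(self, d):
        # d: (batch, ctx_size), the summary of the previous M source sentences.
        h0_enc = torch.tanh(self.enc_proj(d))  # initial encoder hidden state
        s0_dec = torch.tanh(self.dec_proj(d))  # initial decoder state s_0
        return h0_enc, s0_dec
```

Depending on the variant (Encoder, Decoder, or Both), one or both of the returned states replace the usual zero or mean-pooled initialization.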

  12. Results • Impact of components (BLEU):

      Baseline     30.57
      +Init_Enc    31.55
      +Init_Dec    31.9
      +Init_Both   32.0

  13. Approach • Strategy 2: Auxiliary Context. [Figure: the decoder attends over the current source sentence as usual, producing the intra-sentence context c_t, and additionally consumes the cross-sentence context D at each decoding step]

  14. Approach • Strategy 2: Auxiliary Context, with the state updates sketched below. [Figure: three decoder variants: (a) the standard decoder; (b) a decoder that adds the cross-sentence context D as an auxiliary input to the activation; (c) a decoder that first scales D by a learned gate z_t]
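In equation form, reconstructed from the figure. The exact inputs to the gate are an assumption based on standard gating mechanisms, so check the paper for the precise parameterization:

```latex
% Decoder state updates for the three variants in the figure.
\begin{aligned}
\text{(a) standard:} \quad & s_t = f(s_{t-1},\, y_{t-1},\, c_t) \\
\text{(b) auxiliary context:} \quad & s_t = f(s_{t-1},\, y_{t-1},\, c_t,\, D) \\
\text{(c) gating auxiliary context:} \quad
  & z_t = \sigma(U_z s_{t-1} + W_z y_{t-1} + C_z c_t) \\
  & s_t = f(s_{t-1},\, y_{t-1},\, c_t,\, z_t \odot D)
\end{aligned}
```

Here D is the cross-sentence context, c_t the intra-sentence attention context, and z_t a sigmoid gate that decides, per step and per dimension, how much cross-sentence information enters the decoder.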

  15. Results • Impact of components (BLEU):

      Baseline            30.57
      +Aux. Ctx.          31.3
      +Gating Aux. Ctx.   32.24

  16. Approach • Initialization + Gating Auxiliary Context. [Figure: the combined model: D initializes the encoder and decoder states and is also fed into every decoder step through the gate]

  17. Results • Impact of components (BLEU):

      Baseline            30.57
      +Init_Both          32.00
      +Gating Aux. Ctx.   32.24
      +Both               32.67

  18. Analysis • Translation error statistics: the combined model fixes 53 of the baseline's 70 errors while introducing 15 new ones.

      Errors   Ambiguity   Inconsistency   All
      Total    38          32              70
      Fixed    29          24              53
      New      7           8               15

  19. Analysis • Case Study: with the history in context, the model resolves the ambiguous object as "corrupt officials" rather than "the enemy".

      Hist.:  [Chinese history sentence, garbled in extraction]
      Input:  [Chinese input sentence, garbled in extraction]
      Ref.:   Can it inhibit and deter corrupt officials?
      NMT:    Can we contain and deter the enemy?
      Our:    Can it contain and deter the corrupt officials?

  20. Summary • We propose to use a hierarchical RNN to summarize the previous source sentences, providing cross-sentence context for NMT • Limitations • Computationally expensive • Exploits only source-side sentences, to avoid error propagation from previously translated targets • The context is encoded into a single fixed-length vector, which is not flexible

  21. Publicly Available • The source code is publicly available at https://github.com/tuzhaopeng/LC-NMT • The trained models and translation results will be released

  22. References
      1. Heeyoul Choi, Kyunghyun Cho, and Yoshua Bengio. Context-dependent word representation for neural machine translation. arXiv 2016.
      2. Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. CIKM 2015.
      3. Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI 2016.
      4. Oriol Vinyals and Quoc Le. A neural conversational model. ICML Deep Learning Workshop 2015.

  23. Question & Answer
