SLIDE 1

Selective Attention for Context-aware Neural Machine Translation

Sameen Maruf†, André F. T. Martins‡, Gholamreza Haffari†

†Faculty of Information Technology, Monash University, Australia
‡Unbabel & Instituto de Telecomunicações, Lisbon, Portugal

NAACL-HLT, Minneapolis, June 2019


SLIDE 2

Overview

1. The Whys?
2. Proposed Approach
3. Experiments and Analyses
4. Summary

SLIDE 3

Overview (current section: The Whys?)

SLIDE 4

The Whys?

Why document-level machine translation?

• Most state-of-the-art NMT models translate sentences independently
• Discourse phenomena such as pronominal anaphora and coherence, which may involve long-range dependencies, are therefore ignored
• Most work on document NMT uses only a few previous sentences as context, ignoring the rest of the document [Jean et al., 2017, Wang et al., 2017, Bawden et al., 2018, Voita et al., 2018, Tu et al., 2018, Zhang et al., 2018, Miculicich et al., 2018]
• Prior work on using the global document context for MT: [Maruf and Haffari, 2018]

SLIDE 5

The Whys?

Why selective attention for document MT?

Soft attention over all the words in the document context:
• forms a long tail that absorbs a significant share of the probability mass
• is incapable of ignoring irrelevant words
• does not scale to long documents

SLIDE 6

The Whys?

This Work

We propose a sparse and hierarchical attention approach for document NMT which:
• identifies the key sentences in the global document context, and
• attends to the key words within those sentences

SLIDE 7

Overview (current section: Proposed Approach)

SLIDE 8

Proposed Approach

Hierarchical Selective Context Attention

For each query word:
• αs: attention weights given to the sentences in the context
• αw: attention weights given to the words in the context
• αhier: re-scaled attention weights of the words in the context
• Vw: value representations of the words in the context

SLIDE 9

Proposed Approach

Hierarchical Selective Attention over Source Document

1. Sparse sentence-level key matching: identify the relevant sentences
   • Qs: representations of the words in the current sentence
   • Ks: representations of the sentences in the context
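The slide does not show the formula, but in the paper the sparsity at this step comes from replacing softmax with sparsemax [Martins and Astudillo, 2016], which projects the scores onto the probability simplex and assigns exactly zero weight to low-scoring sentences. A minimal single-head NumPy sketch of the sentence-level matching (all names and sizes are ours, for illustration; the actual model is multi-head):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax of Martins & Astudillo (2016): Euclidean projection of the
    score vector z onto the probability simplex. Unlike softmax, it returns
    exactly zero for low-scoring entries."""
    z_sorted = np.sort(z)[::-1]                  # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum          # entries kept in the support
    k_z = k[support][-1]                         # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z            # threshold
    return np.maximum(z - tau, 0.0)

d = 64                                           # model dimension (illustrative)
rng = np.random.default_rng(0)
q_s = rng.normal(size=d)                         # one query word of the current sentence (a row of Qs)
K_s = rng.normal(size=(20, d))                   # keys of 20 context sentences (Ks)

scores = K_s @ q_s / np.sqrt(d)                  # scaled dot-product scores
alpha_s = sparsemax(scores)                      # sparse sentence-level weights
print((alpha_s > 0).sum(), "of", len(alpha_s), "sentences receive non-zero weight")
```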

SLIDE 10

Proposed Approach

Hierarchical Selective Attention over Source Document

2. Sparse word-level key matching: identify the relevant words in the relevant sentences
   • Qw: representations of the words in the current sentence
   • Kw: representations of the words in the context

SLIDE 11

Proposed Approach

Hierarchical Selective Attention over Source Document

3. Re-scale the word-level attention weights with the sentence-level ones (giving αhier)

SLIDE 12

Proposed Approach

Hierarchical Selective Attention over Source Document

4. Read the word-level values with the re-scaled attention weights

Our sparse hierarchical attention module selectively focuses on the relevant sentences in the document context and then attends to the key words within those sentences.
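Putting steps 2-4 together, here is a sketch of the word-level attention, re-scaling, and read-out for one query word, using soft attention at the word level (the H-Attention(sp-soft) variant); the sparse sentence weights αs are taken as given, e.g., from the sparsemax step above. Names and sizes are ours:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, n_sent, sent_len = 64, 5, 10                  # illustrative sizes
rng = np.random.default_rng(1)

q_w = rng.normal(size=d)                         # one query word (a row of Qw)
K_w = rng.normal(size=(n_sent, sent_len, d))     # word keys, grouped by context sentence (Kw)
V_w = rng.normal(size=(n_sent, sent_len, d))     # word values (Vw)
alpha_s = np.array([0.0, 0.55, 0.0, 0.45, 0.0])  # sparse sentence weights from step 1

context = np.zeros(d)
for j in range(n_sent):
    if alpha_s[j] == 0.0:                        # sentences dropped by sparsemax cost nothing
        continue
    alpha_w = softmax(K_w[j] @ q_w / np.sqrt(d)) # step 2: word-level weights within sentence j
    alpha_hier = alpha_s[j] * alpha_w            # step 3: re-scale by the sentence weight
    context += alpha_hier @ V_w[j]               # step 4: read the word-level values

# `context` is this query word's document-context vector; the alpha_hier
# entries sum to 1 overall because alpha_s and each alpha_w sum to 1.
```

Because sentences dropped by sparsemax get exactly zero weight, the word-level pass only touches the selected sentences, which is what lets the module scale to long documents.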

SLIDE 13

Proposed Approach

Flat Attention over Source Document

Soft sentence-level attention over all the sentences in the document context
• K, V: representations of the sentences in the context

Comparison to [Maruf and Haffari, 2018]:
• multi-head attention
• dynamic

SLIDE 14

Proposed Approach

Flat Attention over Source Document

Soft word-level attention over all the words in the document context
• K, V: representations of the words in the context
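For contrast, the flat variants are ordinary single-level soft attention over the whole context; a single-head sketch of the word-level case (names and sizes ours; the actual model is multi-head). It makes the scalability issue from Slide 5 concrete: the score matrix grows with the total number of words in the document, and every word keeps a non-zero weight:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d, n_doc_words = 64, 2000                        # a 2000-word document context
rng = np.random.default_rng(2)

Q = rng.normal(size=(30, d))                     # words of the current sentence
K = rng.normal(size=(n_doc_words, d))            # every word in the document context
V = rng.normal(size=(n_doc_words, d))

alpha = softmax(Q @ K.T / np.sqrt(d))            # (30, 2000): one row per query word
context = alpha @ V                              # soft word-level context vectors

# Every context word gets a non-zero weight, so the long tail of irrelevant
# words still absorbs probability mass: the motivation for sparsemax.
```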

SLIDE 15

Proposed Approach

Document-level Context Layer

• Hierarchical selective or flat attention
• Monolingual context (source) integrated in the encoder
• Bilingual context (source & target) integrated in the decoder
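The slides do not show how the resulting context vector is merged back into the Transformer states; a common choice in this line of work (e.g., Zhang et al., 2018) is a per-dimension learned gate that interpolates between the sentence-level state and the document context. A sketch under that assumption (all names ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 64
rng = np.random.default_rng(3)

h = rng.normal(size=d)               # sentence-level state of one word
c = rng.normal(size=d)               # its document-context vector from the context layer
W_h = 0.1 * rng.normal(size=(d, d))  # learned matrices (random here, for illustration)
W_c = 0.1 * rng.normal(size=(d, d))

g = sigmoid(W_h @ h + W_c @ c)       # per-dimension gate: how much context to admit
h_out = g * h + (1.0 - g) * c        # gated state passed on to the next sub-layer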

SLIDE 16

Proposed Approach

Our Models and Settings

Our Models:

Hierarchical Attention over the context
• sparse at the sentence level, soft at the word level
• sparse at both the sentence and the word level

Flat Attention over the context
• soft at the sentence level
• soft at the word level

Our Settings:
• Offline document MT
• Online document MT
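These four variants are the model names used in the results slides; a hypothetical configuration map (labels from the results slides, structure ours) showing that they differ only in which normalizer each level uses, as in the sketches above:

```python
# Hypothetical mapping from the model names in the results slides to the
# normalizer used at each level of the context attention.
VARIANTS = {
    "H-Attention(sp-soft)": {"sentence": "sparsemax", "word": "softmax"},
    "H-Attention(sp-sp)":   {"sentence": "sparsemax", "word": "sparsemax"},
    "Attention(sent)":      {"sentence": "softmax",   "word": None},       # flat, sentence-level
    "Attention(word)":      {"sentence": None,        "word": "softmax"},  # flat, word-level
}
```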

SLIDE 17

Overview (current section: Experiments and Analyses)

SLIDE 18

Experiments and Analyses

Experimental Setup

Training/dev/test corpora statistics for En-De:

Domain     #Sentences (train/dev/test)   Document length (train/dev/test)
TED        0.21M / 9K / 2.3K             120.89 / 96.42 / 98.74
News       0.24M / 2K / 3K               38.93 / 26.78 / 19.35
Europarl   1.67M / 3.6K / 5.1K           14.14 / 14.95 / 14.06

Baselines:
• Context-agnostic baselines: RNNSearch, Transformer
• Local source-context baselines for online document MT: [Zhang et al., 2018] and [Miculicich et al., 2018]

Evaluation metrics: BLEU, METEOR

SLIDE 19

Experiments and Analyses

Bilingual Context Integration in Decoder (Online Setting)

BLEU scores:

Model                       TED     News    Europarl
Transformer                 23.28   22.78   28.72
[Miculicich et al., 2018]   24.39   24.38   29.58
Attention(sent)             24.29   24.75   29.56
Attention(word)             24.02   24.17   29.9
H-Attention(sp-soft)        24.62   24.36   29.8
H-Attention(sp-sp)          24.43   24.58   29.64

SLIDE 20

Experiments and Analyses

Analyses

• Automatic evaluation metrics for translation do not assess how well models translate inter-sentential phenomena
• We measure the accuracy of translating the English pronoun "it" to its German counterparts "es", "er" and "sie" using a contrastive test set [Müller et al., 2018]
• We perform a subjective evaluation in terms of adequacy and fluency [Läubli et al., 2018]
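The contrastive test set of [Müller et al., 2018] turns pronoun translation into a scoring task: each source sentence comes with a correct translation and contrastive variants that differ only in the pronoun (es/er/sie), and a model counts as correct when it scores the correct translation highest. A sketch of the evaluation loop; the score function (the model's log-probability of a target given the source and document context) is a stand-in, not an actual API:

```python
# Minimal sketch of contrastive pronoun evaluation; `score(src, tgt, ctx)`
# stands in for the model's log-probability of tgt given src and the
# document context ctx. It is not part of any released code.
def contrastive_accuracy(examples, score):
    correct = 0
    for ex in examples:
        # ex["correct"] is the reference; ex["contrastive"] holds the same
        # sentence with the pronoun swapped (es / er / sie).
        ref_score = score(ex["src"], ex["correct"], ex["context"])
        if all(ref_score > score(ex["src"], c, ex["context"])
               for c in ex["contrastive"]):
            correct += 1
    return correct / len(examples)
```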

SLIDE 21

Experiments and Analyses

Accuracy of pronoun translation vs. antecedent distance

[Bar chart: accuracy (0.5-0.8) by antecedent distance, rightmost bin ">3", for Transformer (0.59, 0.64), [Miculicich et al., 2018] (0.72, 0.66), Attention(sent) (0.73, 0.66), Attention(word) (0.69, 0.68), H-Attention(sp-soft) (0.69, 0.66) and H-Attention(sp-sp) (0.71, 0.69); every context-aware model outperforms the Transformer baseline.]

SLIDE 22

Experiments and Analyses

Model Complexity

Model                       #Params   Training (words/sec.)   Decoding (words/sec.)
Transformer                 50M       5100                    86.33
+Attention (sentence)       53.7M     3750                    83.84
+Attention (word)           53.7M     3100                    81.38
+H-Attention                54.2M     2600                    74.11
[Miculicich et al., 2018]   54.8M     1650                    76.90

SLIDE 23

Experiments and Analyses

Qualitative Analysis

Src: Croatia is their homeland , too .
Tgt: Kroatien ist auch ihre Heimat .
Transformer: Kroatien ist auch seine Heimat .
Our Model: Kroatien ist auch ihr Heimatland .

Head 8: top sentences, with attention to words related to the antecedent
sj−1: to name but a few , these include cooperation with the Hague Tribunal , efforts made so far in prosecuting corruption , restructuring the economy and finances and greater commitment and sincerity in eliminating the obstacles to the return of Croatia 's Serbian population .
sj−4: by signing a border arbitration agreement with its neighbour Slovenia , the new Croatian Government has not only eliminated an obstacle to the negotiating process , but has also paved the way for the resolution of other issues .

SLIDE 24

Overview (current section: Summary)

SLIDE 25

Summary

• We proposed a novel and scalable top-down approach to hierarchical attention for document NMT
• Experiments in two document MT settings show that our approach surpasses context-agnostic and context-aware baselines in the majority of cases
• Future work: investigate the benefits of sparse attention for better interpretability of context-aware NMT models

SLIDE 26

References I

Jean, S., Lauly, S., Firat, O., and Cho, K. (2017). Does Neural Machine Translation Benefit from Larger Context? arXiv:1704.05135.

Wang, L., Tu, Z., Way, A., and Liu, Q. (2017). Exploiting Cross-Sentence Context for Neural Machine Translation. In Proceedings of EMNLP 2017.

Bawden, R., Sennrich, R., Birch, A., and Haddow, B. (2018). Evaluating Discourse Phenomena in Neural Machine Translation. In Proceedings of NAACL-HLT 2018.

Voita, E., Serdyukov, P., Sennrich, R., and Titov, I. (2018). Context-aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of ACL 2018.

Tu, Z., Liu, Y., Shi, S., and Zhang, T. (2018). Learning to Remember Translation History with a Continuous Cache. Transactions of the Association for Computational Linguistics (TACL).

Zhang, J., Luan, H., Sun, M., Zhai, F., Xu, J., Zhang, M., and Liu, Y. (2018). Improving the Transformer Translation Model with Document-level Context. In Proceedings of EMNLP 2018.

Miculicich, L., Ram, D., Pappas, N., and Henderson, J. (2018). Document-level Neural Machine Translation with Hierarchical Attention Networks. In Proceedings of EMNLP 2018.

SLIDE 27

References II

Maruf, S. and Haffari, G. (2018). Document Context Neural Machine Translation with Memory Networks. In Proceedings of ACL 2018.

Neubig, G., Dyer, C., Goldberg, Y., Matthews, A., Ammar, W., Anastasopoulos, A., Ballesteros, M., Chiang, D., Clothiaux, D., Cohn, T., Duh, K., Faruqui, M., Gan, C., Garrette, D., Ji, Y., Kong, L., Kuncoro, A., Kumar, G., Malaviya, C., Michel, P., Oda, Y., Richardson, M., Saphra, N., Swayamdipta, S., and Yin, P. (2017). DyNet: The Dynamic Neural Network Toolkit. arXiv:1701.03980.

Müller, M., Rios, A., Voita, E., and Sennrich, R. (2018). A Large-Scale Test Set for the Evaluation of Context-aware Pronoun Translation in Neural Machine Translation. In Proceedings of WMT 2018.

Läubli, S., Sennrich, R., and Volk, M. (2018). Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation. In Proceedings of EMNLP 2018.

SLIDE 28

Implementation and Hyperparameters

Implementation: DyNet C++ interface [Neubig et al., 2017], using Transformer-DyNet (https://github.com/duyvuleo/Transformer-DyNet)

Parameter                        Details
#Layers                          4
#Heads                           8
Hidden dimensions                512
Feed-forward layer size          2048
Optimizer                        Adam (lr=0.0001)
Dropout (base model)             0.1
Dropout (document-level model)   0.2
Label smoothing                  0.1

Src/Tgt vocabulary sizes: TED 17.1k/23.2k, News 16.9k/23.3k, Europarl 16.6k/25.4k (joint BPE vocabulary size 30k)

SLIDE 29

Monolingual Context Integration in Encoder

SLIDE 30

Bilingual Context Integration in Decoder

SLIDE 31

Qualitative Analysis

Src: my thoughts are also with the victims .
Ref: meine Gedanken sind auch bei den Opfern .
Transformer: ich denke auch an die Opfer .
Our Model: meine Gedanken sind auch bei den Opfern .

Head 2: top sentences with attention to related words
sj−2: ( FR ) Madam President , many things have already been said , but I would like to echo all the words of sympathy and support that have already been addressed to the peoples of Tunisia and Egypt .
sj+4: it must implement a strong strategy towards these countries .
sj−1: they are a symbol of hope for all those who defend freedom .