SLIDE 1

Attention Models

Olof Mogren

Chalmers University of Technology

Feb 2016

Attention Models

  • Focus on parts of input
  • Improves NN performance on different tasks
  • IBM1 attention mechanism (1980s)

Attention Models

  • “One of the most exciting advancements”
  • Ilya Sutskever, Dec 2015

Arxiv 2016

  • Multi-Way, Multilingual Neural Machine Translation with a Shared ...
  • Incorporating Structural Alignment Biases into an Attentional Neural ...
  • Language to Logical Form with Neural Attention
  • Human Attention Estimation for Natural Images: An Automatic Gaze ...
  • Implicit Distortion and Fertility Models for Attention-based ...
  • Survey on the attention based RNN model and its applications in ...
  • From Softmax to Sparsemax: A Sparse Model of Attention and ...
  • A Convolutional Attention Network for Extreme Summarization ...
  • Learning Efficient Algorithms with Hierarchical Attentive Memory
  • Attentive Pooling Networks
  • Attention-Based Convolutional Neural Network for Machine ...
SLIDE 2

Modelling Language using RNNs

[Figure: unrolled RNN reading inputs x_1, x_2, x_3 and producing outputs y_1, y_2, y_3]

  • Language models: P(word_i | word_1, ..., word_{i−1}) (a toy sketch follows below)
  • Recurrent Neural Networks
  • Gated additive sequence modelling: LSTM (and variants) (details)
  • Fixed vector representation for sequences
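
To make the bullets above concrete, here is a minimal sketch, not taken from the talk, of one step of a vanilla RNN language model; the dimensions and weight matrices are made-up toy values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

# Toy sizes (assumptions): vocabulary V, embedding/hidden size d
V, d = 10, 8
rng = np.random.default_rng(0)
E   = rng.normal(scale=0.1, size=(V, d))   # word embeddings
W_h = rng.normal(scale=0.1, size=(d, d))   # recurrent weights
W_x = rng.normal(scale=0.1, size=(d, d))   # input weights
W_o = rng.normal(scale=0.1, size=(V, d))   # output projection

def rnn_lm_step(h, word_id):
    """One step of a plain RNN language model: update the hidden
    state and return P(word_i | word_1, ..., word_{i-1})."""
    h = np.tanh(W_h @ h + W_x @ E[word_id])
    return h, softmax(W_o @ h)

h = np.zeros(d)
for w in [1, 4, 2]:                 # toy word-id sequence
    h, p_next = rnn_lm_step(h, w)   # p_next sums to 1 over the vocabulary
```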

Encoder-Decoder Framework

[Figure: encoder reading the (reversed) input x_3, x_2, x_1, followed by the decoder emitting the output sequence y]

  • Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014

  • Neural Machine Translation (NMT)
  • Reversed input sentence!
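
In equations (a sketch using the same notation as the attention slides that follow), the framework factorizes as below; c is the fixed-vector bottleneck that attention later replaces:

```latex
% Encoder: read x_1..x_{T_x} and compress everything into one vector c
h_t = f(h_{t-1}, x_t), \qquad c = q(h_1, \ldots, h_{T_x}) \quad (\text{e.g. } c = h_{T_x})

% Decoder: every output word is conditioned on the same fixed c
p(y_1, \ldots, y_{T_y} \mid x) = \prod_{t=1}^{T_y} p(y_t \mid y_1, \ldots, y_{t-1}, c)
```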

NMT with Attention

p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i)

s_i = f(s_{i-1}, y_{i-1}, c_i)

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}

e_{ij} = a(s_{i-1}, h_j)

[Figure: bidirectional encoder states h_1, h_2, h_3, ..., h_T over the input x_1, x_2, x_3, ..., x_T; attention weights α_{t,1}, α_{t,2}, α_{t,3}, ..., α_{t,T} combine them into the context vector used by decoder states s_{t-1}, s_t when emitting y_{t-1}, y_t]

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, Cho, Bengio, ICLR 2015
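
A minimal numpy sketch of the context-vector computation defined above (not the authors' code); the alignment model a is assumed to be the small feed-forward scorer of the paper, and all sizes are toy values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy sizes (assumptions): T_x source positions, hidden size n, alignment size m
T_x, n, m = 5, 4, 3
rng = np.random.default_rng(0)
H      = rng.normal(size=(T_x, n))   # encoder states h_1 .. h_{T_x}
s_prev = rng.normal(size=n)          # previous decoder state s_{i-1}
W_a = rng.normal(size=(m, n))        # alignment model parameters (assumed MLP form)
U_a = rng.normal(size=(m, n))
v_a = rng.normal(size=m)

# e_ij = a(s_{i-1}, h_j): one relevance score per source position
e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ H[j]) for j in range(T_x)])

# alpha_ij = softmax_j(e_ij);  c_i = sum_j alpha_ij * h_j
alpha = softmax(e)
c_i = alpha @ H                      # context vector fed to the decoder step
```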

NMT with Attention

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}

e_{ij} = a(s_{i-1}, h_j)

[Figure: the alignment model a(s_{i-1}, h_j) scores each encoder state h_j against the previous decoder state s_{i-1}, yielding the weights α_{t,1}, α_{t,2}, ..., α_{t,T} in the same encoder-decoder diagram as before]

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, Cho, Bengio, ICLR 2015

SLIDE 3

Alignment - (more)

[Figure: soft alignment matrices from Bahdanau et al.
(a) "The agreement on the European Economic Area was signed in August 1992 . <end>" ↔ "L' accord sur la zone économique européenne a été signé en août 1992 . <end>"
(b) "It should be noted that the marine environment is the least known of environments . <end>" ↔ "Il convient de noter que l' environnement marin est le moins connu de l' environnement . <end>"]

Caption Generation

  • “Translating” from images to natural language

Caption Generation

  • Convolutional network: Oxford net, 19 layers, stacks of 3x3 conv-layers, max-pooling.
  • Annotation vectors: a = {a_1, ..., a_L}, a_i ∈ R^D
  • Attention over a (a minimal sketch follows below).
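
A minimal sketch of the attention step over the annotation vectors; the dot-product scorer and the shared dimension are simplifications for brevity (the model in the paper uses a learned scorer conditioned on the decoder state):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Assumed conv-net output: D feature maps on a 14x14 grid
D, G = 512, 14
rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(D, G, G))

# Annotation vectors a_1..a_L, a_i in R^D, one per spatial location (L = 196)
a = feature_maps.reshape(D, G * G).T       # shape (L, D)

h_dec  = rng.normal(size=D)                # decoder state (same size only for brevity)
scores = a @ h_dec                         # one score per image location (dot-product scorer: an assumption)
alpha  = softmax(scores)                   # where in the image to look
z      = alpha @ a                         # context vector used when emitting the next caption word
```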

Attention Visualization

SLIDE 4

Source Code Summarization

  • Predict function names given function body
  • Convolutional attention mechanism; 1D patterns (sketched below, after the citation)
  • Out-of-vocabulary terms handled (copy mechanism) (details)

A Convolutional Attention Network for Extreme Summarization of Source Code, Allamanis et al., Feb 2016 (arxiv draft)
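
A toy sketch of the 1D convolutional attention idea (assumed shapes, a single kernel, and no copy mechanism; the actual model stacks kernels K_l1, K_l2, ... and adds κ/λ for copying):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Assumed setup: a function body as T subtoken embeddings of size d, window k
T, d, k = 12, 16, 3
rng = np.random.default_rng(0)
tokens = rng.normal(size=(T, d))         # embedded body subtokens
kernel = rng.normal(size=(k, d))         # one 1D attention kernel

# Slide the kernel over the body: one score per window position
scores = np.array([np.sum(kernel * tokens[t:t + k]) for t in range(T - k + 1)])
alpha  = softmax(scores)                 # attention over body positions (cf. alpha in the next figure)

# Weighted summary of the body, used when predicting the next name subtoken
context = alpha @ tokens[:T - k + 1]
```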

Source Code Summarization

[Figure: target attention vectors while naming the body "{ return (mFlags & eBulletFlag) == eBulletFlag ; }". For each emitted name subtoken m1 = "is" (λ = 0.012), m2 = "bullet" (λ = 0.436), m3 = END (λ = 0.174), the attention vector α and the copy vector κ highlight different body subtokens.]

A Convolutional Attention Network for Extreme Summarization of Source Code, Allamanis et al., Feb 2016 (arxiv draft)

Memory Networks

  • Attention refers back to internal memory; state of encoder
  • Neural Turing Machines
  • (End-To-End) Memory Networks:

explicit memory mechanisms (out of scope today)

mogren@chalmers.se
http://mogren.one/
http://www.cse.chalmers.se/research/lab/

SLIDE 5

Appendix

Teaching Machines to Read and Comprehend, Dec 2015. Hermann, Kocisky, Grefenstette, Espeholt, Kay, Suleyman, Blunsom

DRAW

DRAW: A Recurrent Neural Network For Image Generation, 2015. Gregor, Danihelka, Graves, Rezende, Wierstra

Alignment - (back)

[Figure: soft alignment matrices, continued.
(c) "Destruction of the equipment means that Syria can no longer produce new chemical weapons . <end>" ↔ "La destruction de l' équipement signifie que la Syrie ne peut plus produire de nouvelles armes chimiques . <end>"
" This will change my future " ... ↔ "Cela va changer mon avenir avec ma famille " , a dit l' homme . <end>]

SLIDE 6

LSTM

Figure credit: Christopher Olah. (back)
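
For reference, the standard LSTM cell that the linked figures depict (the usual formulation; σ is the logistic sigmoid and ⊙ the elementwise product):

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)             % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)             % input gate
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)      % candidate cell state
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    % gated additive state update
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)             % output gate
h_t = o_t \odot \tanh(C_t)
```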

Source Code Summarization

  • K_l1: patterns in input
  • K_l2 (and K_α, K_κ): higher-level abstractions
  • α, κ: attention over input subtokens
  • Simple version: only K_α, for decoding
  • Complete version: uses K_λ for deciding on generation or copying (back)

A Convolutional Attention Network for Extreme Summarization of Source Code Allamanis et al. Feb 2016 (arxiv draft)

IBM Model 1: The first translation attention model!

A simple generative model for p(s|t) is derived by introducing a latent variable a into the conditional probability:

p(s \mid t) = \sum_{a} \frac{p(J \mid I)}{(I + 1)^J} \prod_{j=1}^{J} p(s_j \mid t_{a_j}),

where:

  • s and t are the input (source) and output (target) sentences of length J and I respectively,
  • a is a vector of length J consisting of integer indexes into the target sentence, known as the alignment,
  • p(J|I) is not important for training the model and we'll treat it as a constant ε.

To learn this model we use the EM algorithm to find the MLE values for the parameters p(s_j | t_{a_j}). (back)
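
A toy Python sketch of this EM procedure on a made-up two-sentence corpus; p(J|I) drops out of the E-step exactly because it is treated as the constant ε:

```python
from collections import defaultdict

# Toy parallel corpus (assumption): (source sentence s, target sentence t) pairs.
# A NULL token is prepended to t so source words may align to "nothing".
corpus = [("la maison".split(), "the house".split()),
          ("la fleur".split(),  "the flower".split())]
corpus = [(s, ["NULL"] + t) for s, t in corpus]

# Initialise the translation table p(s_j | t_i) uniformly
src_vocab = {w for s, _ in corpus for w in s}
p = defaultdict(lambda: 1.0 / len(src_vocab))

for _ in range(10):                               # EM iterations
    count = defaultdict(float)                    # expected counts c(s, t)
    total = defaultdict(float)
    for s, t in corpus:
        for sw in s:                              # E-step: posterior over alignments of sw
            norm = sum(p[(sw, tw)] for tw in t)
            for tw in t:
                delta = p[(sw, tw)] / norm
                count[(sw, tw)] += delta
                total[tw] += delta
    for (sw, tw), c in count.items():             # M-step: re-estimate p(s|t)
        p[(sw, tw)] = c / total[tw]

print(p[("maison", "house")])                     # grows toward 1 as EM converges
```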

Soft vs Hard Attention

Soft

  • Weighted average of whole input
  • Differentiable loss
  • Increased computational cost

Hard

  • Sample parts of input
  • Policy gradient
  • Variational methods
  • Reinforcement Learning
  • Decreased computational cost
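
A minimal sketch contrasting the two regimes above, with toy numbers; in practice the hard branch is trained with REINFORCE or variational methods, as listed:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))          # six annotation vectors (hypothetical input parts)
scores = rng.normal(size=6)          # scores from some alignment model (toy values)
alpha = softmax(scores)

# Soft attention: differentiable weighted average over the whole input
c_soft = alpha @ H

# Hard attention: sample one part; gradients require policy-gradient/variational training
idx = rng.choice(len(alpha), p=alpha)
c_hard = H[idx]
```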