SLIDE 1

Investigating Relational Recurrent Neural Networks with Variable Length Memory Pointer

Mahtab Ahmed and Robert E. Mercer
Department of Computer Science, University of Western Ontario, London, ON, Canada

SLIDE 2

Introduction

  • Memory-based Neural Networks can remember information longer while modelling temporal data.

  • Encode a Relational Memory Core (RMC) as the cell state inside an LSTM cell.

  • Uses standard Multi-head Self Attention.
  • Uses a variable length memory pointer.
  • Evaluate on four different tasks.
  • State of the art on one of them; on par with the other three.

SLIDE 3

Standard LSTM

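The slide presents the standard LSTM equations as an image. For reference, the usual formulation, with input x_t, previous hidden state h_{t-1}, and previous cell state c_{t-1}, is:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```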
SLIDE 4

The model: Fixed Length Memory Pointer

  • Apply Multi-head Self Attention and create a weighted version, N (see the sketch below).
  • Add a residual connection.
  • Apply a Layer-Normalization block on top of N.
  • Maintain separate versions of the mean and variance projection matrices.
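A minimal PyTorch sketch of this attention step, assuming an RMC-style update in which the memory attends over itself concatenated with the current input (class and variable names are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Weighted memory N via multi-head self-attention, residual connection,
    and layer normalization, as on this slide."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, memory: torch.Tensor, x_t: torch.Tensor) -> torch.Tensor:
        # memory: (b, m, d); x_t: (b, 1, d)
        kv = torch.cat([memory, x_t], dim=1)     # attend over memory + input
        attended, _ = self.attn(memory, kv, kv)  # weighted version N
        return self.norm(memory + attended)      # residual + layer norm
```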


[Figure: the fixed-length memory update applied to a random input at time t.]

SLIDE 5

The model: Fixed Length Memory Pointer (contd.)

  • n non-linear projections of h_t are applied, followed by a residual connection (see the sketch below).
  • The resultant tensor Y (having shape 2 × b × d) is split on the cardinal dimension to extract the memory.
  • The LSTM's candidate cell state is changed accordingly (equation shown on the slide).
  • y_t is replaced with the projected input (= X y_t) in all LSTM equations.


Note: f = ReLU and h_t = N.
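A sketch of this projection step under the shapes stated above (n = 2 projections, f = ReLU; names are hypothetical):

```python
import torch
import torch.nn as nn

class MemoryProjection(nn.Module):
    """Sketch: two non-linear projections of h_t with a residual connection;
    the result Y (shape 2 x b x d) is split on the cardinal dimension to
    extract the memory."""
    def __init__(self, d_model: int):
        super().__init__()
        # one linear map per projection; f = ReLU as noted on the slide
        self.proj = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(2))
        self.f = nn.ReLU()

    def forward(self, h_t: torch.Tensor) -> torch.Tensor:
        # h_t: (b, d); stack the projected copies -> Y of shape (2, b, d)
        y = torch.stack([self.f(p(h_t)) for p in self.proj], dim=0)
        y = y + h_t.unsqueeze(0)              # residual connection
        memory, _ = torch.split(y, 1, dim=0)  # split on the cardinal dimension
        return memory.squeeze(0)              # extracted memory, shape (b, d)
```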

SLIDE 6

Variable Length Memory Pointer

  • Share W across all time steps.
  • Apply all the steps as before.
  • For Layer-Normalization, maintain just one version of the mean and variance projection matrices.
  • Memory is still at the cardinal dimension.
  • Rather than looking at everything before, track a fixed window of words (n-grams), mimicking the behavior of a convolution kernel (see the sketch below).
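A sketch of the windowed pointer, assuming the window is realized by slicing the last k inputs before attending (the function name and slicing are illustrative):

```python
import torch

def windowed_memory(inputs: torch.Tensor, t: int, k: int) -> torch.Tensor:
    """Variable-length memory pointer sketch: instead of attending to
    everything before step t, keep only a fixed window of the last k inputs,
    like a convolution kernel sliding over the sequence.

    inputs: (seq_len, b, d); returns (window, b, d) with window <= k.
    """
    start = max(0, t - k)
    return inputs[start:t]  # the n-gram window ending at t
```

Setting k to the sequence length recovers the earlier behaviour of looking at everything before t.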

SLIDE 7

Model Architecture


[Figure: the unrolled model architecture. Each time step applies the same block: Multi-Head Attention with Layer Normalization, the LSTM equations, Non-Linear Projections with Layer Normalization, and a final Linear Projection; the diagram repeats this block across three unrolled time steps.]
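A hedged sketch of how one unrolled step of the diagram might compose, reusing the MemoryAttention and MemoryProjection sketches above (the exact wiring, in particular feeding the attended memory in as the cell state, is an assumption based on the slides, not the authors' code):

```python
import torch
import torch.nn as nn

class RelationalLSTMCellSketch(nn.Module):
    """One unrolled step of the diagram: attention + layer norm over the
    memory, the LSTM equations, the projection block producing the next
    memory, and a final linear projection of the output."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mem_attn = MemoryAttention(d_model)   # sketch from Slide 4
        self.cell = nn.LSTMCell(d_model, d_model)
        self.mem_proj = MemoryProjection(d_model)  # sketch from Slide 5
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x_t, h_c, memory):
        # x_t: (b, d); h_c: pair of (b, d) tensors; memory: (b, 1, d)
        n = self.mem_attn(memory, x_t.unsqueeze(1)).squeeze(1)
        # feeding N in as the cell state is an assumption from the slides
        h, c = self.cell(x_t, (h_c[0], n))
        new_memory = self.mem_proj(h).unsqueeze(1)  # next memory from h_t
        return self.out(h), (h, c), new_memory
```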

SLIDE 8

Sentence Pair Modelling


[Figure: sentence pair modelling architecture. Word representations of the left and right sentences feed two encoders; the resulting sentence representations are combined (⊕) and passed to a classifier that predicts the classes. Encoder setup follows InferSent: https://arxiv.org/abs/1705.02364]
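The ⊕ in the diagram, following InferSent, is typically the concatenation of the two sentence vectors with their absolute difference and element-wise product; a minimal sketch of that combination and the classifier head (names hypothetical, encoder details omitted):

```python
import torch
import torch.nn as nn

def combine(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """InferSent-style combination of two sentence representations:
    concatenation, absolute difference, and element-wise product."""
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)

class PairClassifier(nn.Module):
    """Sketch of the classifier head over the combined pair features."""
    def __init__(self, d_sent: int, n_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * d_sent, d_sent), nn.ReLU(),
            nn.Linear(d_sent, n_classes),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.mlp(combine(u, v))
```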

SLIDE 9

Hyperparameters


  • We tried a range of values for each hyperparameter. The ones that worked for us are bold-faced.
SLIDE 10

Experimental Results

  • Models marked with † are the ones that we implemented.

SLIDE 11

Attention Visualization


[Figure: attention weight visualization for a sentence pair. The extracted text is heavily garbled, but the pair appears to read: "He last worked in the Virginia state attorney general's office. Before that he held positions in Virginia, including deputy state attorney general."]

SLIDE 12

Conclusion

  • Extend the classical RMC with a variable length memory pointer.
  • Use a non-local context to compute an enhanced memory.
  • Design a sentence pair modelling architecture.
  • Evaluate on four different tasks.
  • On-par performance on most of the tasks and the best performance on one of them.
  • The model interprets the attention shifting very well.
  • The memory pointer length does not follow a uniform pattern across all datasets.

SLIDE 13

Thank you
