arXiv:1610.04211v2 [cs.CL] 17 Nov 2016

Gated End-to-End Memory Networks

Fei Liu ∗
The University of Melbourne
Victoria, Australia
fliu3@student.unimelb.edu.au

Julien Perez
Xerox Research Centre Europe
Grenoble, France
julien.perez@xrce.xerox.com

∗ Work done as an intern at Xerox Research Centre Europe.

Abstract

Machine reading using differentiable reasoning models has recently shown remarkable progress. In this context, End-to-End trainable Memory Networks (MemN2N) have demonstrated promising performance on simple natural language based reasoning tasks such as factual reasoning and basic deduction. However, other tasks, namely multi-fact question-answering, positional reasoning or dialog related tasks, remain challenging, particularly due to the necessity of more complex interactions between the memory and controller modules composing this family of models. In this paper, we introduce a novel end-to-end memory access regulation mechanism inspired by the current progress on the connection short-cutting principle in the field of computer vision. Concretely, we develop a Gated End-to-End trainable Memory Network architecture (GMemN2N). From the machine learning perspective, this new capability is learned in an end-to-end fashion without the use of any additional supervision signal, which is, as far as our knowledge goes, the first of its kind. Our experiments show significant improvements on the most challenging tasks in the 20 bAbI dataset, without the use of any domain knowledge. Then, we show improvements on the Dialog bAbI tasks, including the real human-bot conversation-based Dialog State Tracking Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state of the art.

1 Introduction

Deeper Neural Network models are more difficult to train, and recurrence tends to complicate this optimization problem (Srivastava et al., 2015b). While Deep Neural Network architectures have shown superior performance in numerous areas, such as image recognition, speech recognition and, more recently, text, the complexity of optimizing such large and non-convex parameter sets remains a challenge. Indeed, the so-called vanishing/exploding gradient problem has been mainly addressed using: (1) algorithmic responses, e.g., normalized initialization strategies (LeCun et al., 1998; Glorot and Bengio, 2010); (2) architectural ones, e.g., intermediate normalization layers which facilitate the convergence of networks composed of tens of hidden layers (He et al., 2015; Saxe et al., 2014). Another problem of memory-enhanced neural models is the necessity of regulating memory access at the controller level. Memory access operations can be supervised (Kumar et al., 2016), and the number of times they are performed tends to be fixed a priori (Sukhbaatar et al., 2015), a design choice which tends to be based on the presumed degree of difficulty of the task in question. Inspired by the recent success of object recognition in the field of computer vision (Srivastava et al., 2015a; Srivastava et al., 2015b), we investigate the use of a gating mechanism in the context of End-to-End Memory Networks (MemN2N) (Sukhbaatar et al., 2015) in order to regulate the access to the memory blocks in a differentiable fashion. The formulation is realized by gated connections between the memory access layers and the controller stack of a MemN2N. As a result, the model is able to dynamically determine how and when to skip its memory-based reasoning process.

Roadmap: Section 2 reviews state-of-the-art Memory Network models, connection short-cutting in neural networks and memory dynamics. In Section 3, we propose a differentiable gating mechanism in MemN2N. Sections 4 and 5 present a set of experiments on the 20 bAbI reasoning tasks and the Dialog bAbI dataset. We report new state-of-the-art results on several of the most challenging tasks of the set, namely positional reasoning, 3-argument relation and the DSTC-2 task, while maintaining equally competitive results on the rest.

2 Related Work

This section starts with an introduction of the primary elements of MemN2N. Then, we review two key elements relevant to this work, namely shortcut connections in neural networks and memory dynamics in such models.

2.1 End-to-End Memory Networks

The MemN2N architecture, introduced by Sukhbaatar et al. (2015), consists of two main components: supporting memories and final answer prediction. Supporting memories are in turn comprised of a set of input and output memory representations with memory cells. The input and output memory cells, denoted by $m_i$ and $c_i$, are obtained by transforming the input context $x_1, \ldots, x_n$ (or stories) using two embedding matrices $A$ and $C$ (both of size $d \times |V|$, where $d$ is the embedding size and $|V|$ the vocabulary size) such that $m_i = A\Phi(x_i)$ and $c_i = C\Phi(x_i)$, where $\Phi(\cdot)$ is a function that maps the input into a bag of dimension $|V|$. Similarly, the question $q$ is encoded using another embedding matrix $B \in \mathbb{R}^{d \times |V|}$, resulting in a question embedding $u = B\Phi(q)$. The input memories $\{m_i\}$, together with the embedding of the question $u$, are utilized to determine the relevance of each of the stories in the context, yielding a vector of attention weights

$$p_i = \mathrm{softmax}(u^\top m_i) \tag{1}$$

where $\mathrm{softmax}(a_i) = e^{a_i} / \sum_{j \in [1,n]} e^{a_j}$. Subsequently, the response $o$ from the output memory is constructed by the weighted sum:

$$o = \sum_i p_i c_i \tag{2}$$

For more difficult tasks requiring multiple supporting memories, the model can be extended to include more than one set of input/output memories by stacking a number of memory layers. In this setting, each memory layer is named a hop and the $(k+1)$th hop takes as input the output of the $k$th hop:

$$u^{k+1} = o^k + u^k \tag{3}$$

Lastly, the final step, the prediction of the answer to the question $q$, is performed by

$$\hat{a} = \mathrm{softmax}(W(o^K + u^K)) \tag{4}$$

where $\hat{a}$ is the predicted answer distribution, $W \in \mathbb{R}^{|V| \times d}$ is a parameter matrix for the model to learn and $K$ the total number of hops.

2.2 Shortcut Connections

Shortcut connections have been studied from both the theoretical and practical point of view in the general context of neural network architectures (Bishop, 1995; Ripley, 2007). More recently, Residual Networks (He et al., 2016) and Highway Networks (Srivastava et al., 2015a; Srivastava et al., 2015b) have been almost simultaneously proposed. While the former utilizes a residual calculus, the latter formulates a differentiable gateway mechanism, as proposed in Long Short-Term Memory networks, in order to cope with long-term dependency issues in the dataset in an end-to-end trainable manner. These two mechanisms were proposed as a structural solution to the so-called vanishing gradient problem by allowing the model to shortcut its layered transformation structure when necessary.

2.3 Memory Dynamics

The necessity of dynamically regulating the interaction between the so-called controller and the memory blocks of a Memory Network model has been studied in (Kumar et al., 2016; Xiong et al., 2016). In these works, the number of exchanges between the controller stack and the memory module of the network is either monitored in a hard supervised manner in the former or fixed a priori in the latter.

In this paper, we propose an end-to-end supervised model, with an automatically learned gating mechanism, to perform dynamic regulation of memory interaction. The next section presents the formulation of the new Gated End-to-End Memory Network (GMemN2N). This contribution can be placed in parallel to the recent transition from
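
As a rough illustration of Equations (1) to (4) of Section 2.1, the following is a minimal sketch of the ungated MemN2N forward pass in plain Python. It is not the authors' implementation: the function names (`memn2n_forward`, `bag_of_words`, `random_matrix`), the toy dimensions and the random initialization are all hypothetical, and reusing the same `A` and `C` at every hop is a simplifying assumption, since the text above does not specify how embeddings are shared across hops.

```python
import math
import random


def softmax(scores):
    """Softmax over a list of floats (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def bag_of_words(token_ids, vocab_size):
    """Phi(.): map a list of word ids to a |V|-dimensional count vector."""
    bag = [0.0] * vocab_size
    for token in token_ids:
        bag[token] += 1.0
    return bag


def memn2n_forward(story, question, A, B, C, W, hops):
    """Return the answer distribution of Eq. (4) for one story/question pair.

    Assumption: the same A and C are reused at every hop.
    """
    vocab_size = len(A[0])
    # Input and output memory cells: m_i = A Phi(x_i), c_i = C Phi(x_i).
    m = [matvec(A, bag_of_words(x, vocab_size)) for x in story]
    c = [matvec(C, bag_of_words(x, vocab_size)) for x in story]
    # Question embedding: u = B Phi(q).
    u = matvec(B, bag_of_words(question, vocab_size))
    for _ in range(hops):
        # Eq. (1): attention weights over the input memories.
        p = softmax([sum(a * b for a, b in zip(u, m_i)) for m_i in m])
        # Eq. (2): response as the attention-weighted sum of output memories.
        o = [sum(p_i * c_i[j] for p_i, c_i in zip(p, c)) for j in range(len(u))]
        # Eq. (3): the next controller state is o^k + u^k.
        u = [o_j + u_j for o_j, u_j in zip(o, u)]
    # After K hops, u holds o^K + u^K, so Eq. (4) reduces to softmax(W u).
    return softmax(matvec(W, u))


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small uniform random weights for the toy example."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, vocab = 4, 6  # toy embedding size and vocabulary size
    A, B, C = (random_matrix(d, vocab, rng) for _ in range(3))
    W = random_matrix(vocab, d, rng)
    story = [[0, 1, 2], [3, 4], [1, 5]]  # three sentences as word ids
    question = [2, 4]
    print(memn2n_forward(story, question, A, B, C, W, hops=3))
```

The returned list has one probability per vocabulary word. In this sketch the update of Eq. (3) is applied unconditionally at every hop, which is the fixed behaviour that the gating mechanism introduced in Section 3 is meant to regulate.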
