Investigating Relational Recurrent Neural Networks with Variable Length Memory Pointer
Mahtab Ahmed and Robert E. Mercer
Department of Computer Science, University of Western Ontario, London, ON, Canada
Introduction
- Memory-based Neural Networks can remember information longer while modelling temporal data.
- Encode a Relational Memory Core (RMC) as the cell state inside an LSTM cell.
- Uses standard Multi-head Self Attention.
- Uses a variable length memory pointer.
- Evaluate on four different tasks.
- State of the art on one of the tasks; on par on the other three.
Standard LSTM
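For reference, a standard LSTM cell (with σ the logistic sigmoid and ⊙ the element-wise product) computes:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```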
The model: Fixed Length Memory Pointer
- Apply standard multi-head self-attention to create a weighted version of the memory.
- Add a residual connection.
- Apply a Layer-Normalization block on top of the weighted memory.
- Maintain separate versions of the mean and variance projection matrices.
[Figure: the fixed length memory pointer block, shown operating on a random input at time t.]
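As a sketch of the attention step (the exact symbols on the slide did not survive extraction), with memory matrix M and the usual Transformer notation W^q, W^k, W^v, d_k for a single head:

```latex
\tilde{M} = \mathrm{softmax}\!\left(\frac{(M W^{q})(M W^{k})^{\top}}{\sqrt{d_k}}\right) M W^{v},
\qquad
M' = \mathrm{LayerNorm}(M + \tilde{M})
```

Multiple heads are computed in parallel and concatenated, as in standard multi-head self-attention.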
The model: Fixed Length Memory Pointer (contd.)
- n non-linear projections of the weighted memory are applied, followed by a residual connection.
- The resultant tensor (of shape 2 × b × d) is split on the cardinal dimension to extract the memory.
- The LSTM's candidate cell state is replaced by this extracted memory.
- x_t is replaced with the projected input (= W x_t) in all LSTM equations.
[Figure: the block diagram for this step, with f = ReLU.]
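A minimal PyTorch sketch of one possible reading of this cell; the class and variable names are ours, we assume the previous cell state acts as the memory row stacked with the projected input on the cardinal dimension, and the exact wiring in the paper may differ:

```python
import torch
import torch.nn as nn

class FixedMemoryPointerCell(nn.Module):
    """One LSTM step whose candidate cell state comes from a relational
    memory block (a sketch of our reading of the slides, not the
    authors' reference implementation)."""

    def __init__(self, d_model, n_heads=4, n_proj=2):
        super().__init__()
        # d_model must be divisible by n_heads.
        self.in_proj = nn.Linear(d_model, d_model)           # W: projects x_t
        self.attn = nn.MultiheadAttention(d_model, n_heads)  # standard MHSA
        self.ln1 = nn.LayerNorm(d_model)                     # separate mean/variance
        self.ln2 = nn.LayerNorm(d_model)                     # projections per block
        layers = []                                          # n non-linear projections
        for _ in range(n_proj):
            layers += [nn.Linear(d_model, d_model), nn.ReLU()]
        self.mlp = nn.Sequential(*layers)
        self.gates = nn.Linear(2 * d_model, 3 * d_model)     # i, f, o gates

    def forward(self, x_t, h_prev, c_prev):
        wx = self.in_proj(x_t)                         # projected input W x_t
        # Stack memory (previous cell state) and projected input on the
        # cardinal dimension -> shape (2, b, d).
        stacked = torch.stack([c_prev, wx], dim=0)
        attended, _ = self.attn(stacked, stacked, stacked)
        z = self.ln1(stacked + attended)               # residual + layer norm
        z = self.ln2(z + self.mlp(z))                  # projections + residual + norm
        # Split on the cardinal dimension; the memory row becomes the
        # LSTM's candidate cell state.
        c_tilde = z[0]
        # Standard LSTM gating, with x_t replaced by W x_t throughout.
        i, f, o = torch.chunk(self.gates(torch.cat([wx, h_prev], dim=-1)), 3, dim=-1)
        c_t = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * c_tilde
        h_t = torch.sigmoid(o) * torch.tanh(c_t)
        return h_t, c_t
```

Unrolled over time, as in the architecture figure below, the cell is applied step by step: h, c = cell(x_t, h, c).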
Variable Length Memory Pointer
- Share W across all time steps.
- Apply all the steps as before.
- For Layer-Normalization, maintain just one version of the mean and variance projection matrices.
- Memory is still on the cardinal dimension.
- Rather than attending over everything seen before:
  - Track a fixed window of words (n-grams).
  - Mimic the behavior of a convolution kernel (a sketch follows below).
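A sketch of the windowed variant, assuming the pointer restricts attention to the last n projected inputs (an n-gram window); names and wiring are our assumptions:

```python
import torch
import torch.nn as nn

class VariableMemoryPointer(nn.Module):
    """Attention over a sliding n-gram window instead of the full history,
    mimicking a convolution kernel (a sketch, not the reference code)."""

    def __init__(self, d_model, window, n_heads=4):
        super().__init__()
        self.window = window
        self.in_proj = nn.Linear(d_model, d_model)   # W, shared across all time steps
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.ln = nn.LayerNorm(d_model)              # one mean/variance projection

    def forward(self, xs, memory):
        # xs: (t, b, d) inputs seen so far; memory: (b, d) current memory.
        recent = self.in_proj(xs[-self.window:])     # last n projected inputs
        # Memory stays on the cardinal dimension: shape (1 + n, b, d).
        stacked = torch.cat([memory.unsqueeze(0), recent], dim=0)
        attended, _ = self.attn(stacked, stacked, stacked)
        normed = self.ln(stacked + attended)
        return normed[0]                             # updated memory row
```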
Model Architecture
[Figure: the model unrolled over three time steps; at each step the input passes through a Linear Projection, Multi-Head Attention with Layer Normalization, a Non-Linear Projection with Layer Normalization, and the LSTM equations.]
Sentence Pair Modelling
[Figure: sentence pair modelling architecture. Word representations of the left and right sentences feed two encoders; the resulting sentence representations are combined, InferSent-style (https://arxiv.org/abs/1705.02364), and passed to a classifier over the output classes.]
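A minimal sketch of the classifier side, assuming the InferSent-style combination of the two sentence representations from the cited paper (whether the authors use exactly these features is our assumption):

```python
import torch
import torch.nn as nn

def pair_features(u, v):
    # InferSent-style combination: concatenation, absolute difference,
    # and element-wise product of the two sentence representations.
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)

class SentencePairClassifier(nn.Module):
    def __init__(self, d_sent, n_classes, d_hidden=512):
        super().__init__()
        self.clf = nn.Sequential(
            nn.Linear(4 * d_sent, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, n_classes),
        )

    def forward(self, u, v):
        # u, v: (b, d_sent) encodings of the left and right sentences,
        # each produced by the shared encoder.
        return self.clf(pair_features(u, v))
```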
Hyperparameters
- We tried a range of values for each hyperparameter. The ones that worked for us are bold-faced.
Experimental Results
- Models marked with † are the ones that we implemented.
Attention Visualization
[Figure: attention weight heat maps for the sentence pair "He has worked in the Virginia attorney general's office." / "Before that he held … in Virginia, including deputy attorney general.", showing how the attention shifts between aligned words of the two sentences.]
Conclusion
- Extend the classical RMC with a variable length memory pointer.
- Use a non-local context to compute an enhanced memory.
- Design a sentence pair modelling architecture.
- Evaluate on four different tasks.
- On-par performance on most of the tasks and the best performance on one of them.
- The attention shifts are clearly interpretable in the visualizations.
- The memory pointer length does not follow a uniform pattern across all datasets.
Thank you