Introduction to RNNs
Arun Mallya
Best viewed with Computer Modern fonts installed
Outline
- Why Recurrent Neural Networks (RNNs)?
- The Vanilla RNN unit
- The RNN forward pass
- Backpropagation refresher
- The LSTM unit
  - Peephole LSTM
  - GRU
Why RNNs? Consider classifying binary strings of arbitrary length:
- Simple case: output YES if the number of 1s is even, else NO. E.g. 1000010101 → YES (four 1s), 100011 → NO (three 1s), …
- A network with a fixed-size input cannot solve this in general: there can always be a new sample longer than anything seen before.
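The parity task above is trivial to state in code, which makes it a clean benchmark for sequence models. A quick sketch (the function name is my own):

```python
def parity_label(bits: str) -> str:
    """Label a binary string YES if it contains an even number of 1s, else NO."""
    return "YES" if bits.count("1") % 2 == 0 else "NO"

print(parity_label("1000010101"))  # four 1s  -> YES
print(parity_label("100011"))      # three 1s -> NO
```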
The Vanilla RNN unit

[Figure: a single RNN unit with weights W reads the input xt and the previous hidden state ht−1 at each time-step, producing a new hidden state ht and an output yt. Unfolded over t = 1, 2, 3: h0, x1 → h1, y1; h1, x2 → h2, y2; h2, x3 → h3, y3. Matching colors indicate shared weights — the same W is used at every time-step.]

At each time-step t, the vanilla RNN computes

ht = tanh( W (xt; ht−1) )

and the output yt is read off ht, e.g. by a linear layer.
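This forward pass can be sketched in a few lines of NumPy (a minimal illustration; the dimensions, initialization scale, and all names are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                              # input and hidden sizes (arbitrary)
W = rng.normal(0, 0.1, (d_h, d_in + d_h))     # one shared weight matrix
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: h_t = tanh(W (x_t; h_{t-1}) + b)."""
    return np.tanh(W @ np.concatenate([x_t, h_prev]) + b)

h = np.zeros(d_h)                             # h0
xs = rng.normal(size=(3, d_in))               # a length-3 input sequence
for x_t in xs:
    h = rnn_step(x_t, h)                      # the same W is reused at every step
print(h.shape)  # (8,)
```

Because the loop reuses one `W`, the unit handles sequences of any length with a fixed number of parameters.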
Example: sentiment classification. Feed the words of a sentence — “The”, “food”, …, “good” — into the RNN one per time-step; the final hidden state hn summarizes the whole input, and a linear classifier on hn predicts the sentiment. The intermediate states h1, h2, … are simply ignored.

Alternatively, pool the hidden states over time, e.g. h = Sum(h1, …, hn), and feed h to the linear classifier.

http://deeplearning.net/tutorial/lstm.html
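The two readout strategies differ only in what is handed to the classifier. A small sketch with stand-in hidden states (all names, sizes, and the random stand-ins are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
hs = rng.normal(size=(5, 8))                  # hidden states h1..h5 from an RNN (stand-ins)
W_cls, b_cls = rng.normal(size=(2, 8)), np.zeros(2)   # 2-way linear classifier

# Readout 1: classify only the last hidden state h_n, ignore the rest.
logits_last = W_cls @ hs[-1] + b_cls

# Readout 2: pool over time first, h = Sum(h1..hn), then classify.
logits_sum = W_cls @ hs.sum(axis=0) + b_cls

print(logits_last.shape, logits_sum.shape)  # (2,) (2,)
```

Summing (or averaging) gives every time-step a direct path to the loss, which can help when the final state alone forgets early words.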
Example: image captioning — generating a sentence such as “The dog is hiding” for an image. A CNN encodes the image, and its features condition the RNN. At each time-step, a linear classifier over the vocabulary predicts the next word (“The”, then “dog”, …) from the current hidden state, and that word is fed back as the next input.

Show and Tell: A Neural Image Caption Generator, CVPR 2015
Example: character-level language modeling. Samples from an RNN trained on Shakespeare:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

VIOLA:
Why, Salisbury must find his flesh and thought
That which I am not aps, not a man and in fire,
To show the reining of the raven and the wars
To grace my hand reproach within, and not a fair are hand,
That Caesar and my goodly father's world;
When I was heaven of presence and our fleets,
We spare with hours, but cut thy council I am great,
Murdered and by thy master's ready there
My power to give thee but so much as hell:
Some service in the noble bondman here,
Would show him to her wine.

KING LEAR:
O, if you were a feeble sight, the courtesy of your law,
Your sight and several breath, will wear the gods
With his heads, and my hands are wonder'd at the deeds,
So drop upon your lordship's head, and your opinion
Shall be against your honour.
Input–output scenarios:
- Single → Single: feed-forward network
- Single → Multiple: image captioning
- Multiple → Single: sentiment classification
- Multiple → Multiple: translation, image captioning
The RNN forward pass

“Unfold” the network through time by making a copy of the unit at each time-step.

[Figure: the unfolded network — h0, x1 → (h1, y1); h1, x2 → (h2, y2); h2, x3 → (h3, y3).]
Backpropagation refresher

A layer y = f(x; W) transforms input x into output y, which eventually feeds a cost C. Given ∂C/∂y, backpropagation computes ∂C/∂x and ∂C/∂W.

For two stacked layers, y1 = f1(x; W1) and y2 = f2(y1; W2), the chain rule gives

∂C/∂W1 = ∂C/∂y2 · ∂y2/∂y1 · ∂y1/∂W1

Equations for common layers: http://arunmallya.github.io/writeups/nn/backprop.html

Gradient Accumulation: if the output y of f(x; W) is consumed by two layers f1(y; W1) and f2(y; W2), the gradients arriving along the two paths add up:

∂C/∂y = (∂C/∂y through f1) + (∂C/∂y through f2)
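Gradient accumulation is easy to verify numerically. A sketch with scalar toy functions (the functions and values are my own) — the analytic gradient sums the two paths and matches a finite-difference estimate:

```python
import numpy as np

# C = f1(f(x)) + f2(f(x)): the output y = f(x) is consumed by two layers,
# so dC/dy accumulates the gradients from both paths.
f  = lambda x: x**2          # y = f(x)
f1 = lambda y: 3.0 * y       # first consumer of y
f2 = lambda y: np.sin(y)     # second consumer of y
C  = lambda x: f1(f(x)) + f2(f(x))

x = 0.7
y = f(x)
dC_dy = 3.0 + np.cos(y)      # sum of the two paths: df1/dy + df2/dy
dC_dx = dC_dy * 2 * x        # chain rule through dy/dx = 2x

eps = 1e-6                   # finite-difference check
num = (C(x + eps) - C(x - eps)) / (2 * eps)
print(abs(dC_dx - num) < 1e-6)  # True
```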
The unfolded RNN is just a big feed-forward network that takes the whole sequence as an input, with a cost Ct attached to each output yt.

[Figure: the unfolded network with costs C1, C2, C3 attached to outputs y1, y2, y3.]

Gradients therefore follow from the usual backpropagation applied to the unfolded graph — backpropagation through time.
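Backpropagation through time can be sketched directly: run the forward pass caching every hidden state, then walk the time-steps in reverse, accumulating the weight gradient at each step because W is shared. This is a toy sketch (the sizes and the simple loss C = ½ Σt ‖ht‖² are my own choices), checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 4, 5
W = rng.normal(0, 0.3, (d_h, d_in + d_h))
xs = rng.normal(size=(T, d_in))

def forward(W):
    """Cache h0..hT for the vanilla RNN h_t = tanh(W (x_t; h_{t-1}))."""
    hs = [np.zeros(d_h)]
    for x_t in xs:
        hs.append(np.tanh(W @ np.concatenate([x_t, hs[-1]])))
    return hs

def loss(W):
    return 0.5 * sum((h**2).sum() for h in forward(W)[1:])

hs = forward(W)
dW = np.zeros_like(W)
dh = np.zeros(d_h)                        # gradient flowing into h_t from the future
for t in range(T, 0, -1):
    dh = dh + hs[t]                       # direct dC_t/dh_t for the toy loss
    da = dh * (1 - hs[t]**2)              # back through tanh
    z = np.concatenate([xs[t-1], hs[t-1]])
    dW += np.outer(da, z)                 # accumulate: W is shared across time
    dh = W[:, d_in:].T @ da               # pass gradient back to h_{t-1}

eps = 1e-6                                # finite-difference check of one entry
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps; Wm[0, 0] -= eps
num = (loss(Wp) - loss(Wm)) / (2 * eps)
print(abs(dW[0, 0] - num) < 1e-5)  # True
```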
The problem: vanishing and exploding gradients

[Figure: the unfolded network again — the gradient at an early hidden state must pass backwards through every intermediate time-step.]

To reach an early time-step, the backward pass multiplies the gradient by a factor of ∂ht/∂ht−1 once per step, and for the vanilla RNN each factor involves the recurrent weight matrix W (times the tanh derivative). Over many steps this product tends either to shrink towards zero (vanishing gradients) or to grow without bound (exploding gradients), roughly according to whether the largest singular value of W is below or above 1.¹

Exploding gradients can be controlled (e.g. by clipping); vanishing gradients call for a change of architecture. Remember ResNets? An additive path lets gradients flow back unchanged — the LSTM builds its cell state around the same idea.

¹ On the difficulty of training recurrent neural networks, Pascanu et al., ICML 2013
Long Short-Term Memory, Hochreiter and Schmidhuber, 1997
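The effect is easy to demonstrate with repeated matrix products. In this toy illustration (sizes and scales are mine), W is a scaled orthogonal matrix so that all its singular values equal `scale`; after 50 backward steps the gradient norm has either collapsed or exploded:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
norms = {}
for scale in (0.5, 2.0):
    # Scaled orthogonal matrix: every singular value of W equals `scale`.
    W = scale * np.linalg.qr(rng.normal(size=(d, d)))[0]
    g = np.ones(d)                 # stand-in for an incoming gradient
    for _ in range(50):            # 50 time-steps of backward flow
        g = W.T @ g
    norms[scale] = np.linalg.norm(g)
    print(scale, norms[scale])     # 0.5 -> vanishes, 2.0 -> explodes
```

With real RNNs the tanh derivative shrinks the product further, so vanishing is the common failure mode.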
The LSTM unit

[Figure: the LSTM cell — the input xt and the previous output ht−1 feed the cell and its input, forget, and output gates; the cell state ct is carried from one time-step to the next. Dashed lines indicate a time-lag, i.e. values from the previous step.]

Each gate is a sigmoid of the current input and the previous output:

it = σ( Wi (xt; ht−1) + bi )   (input gate)
ft = σ( Wf (xt; ht−1) + bf )   (forget gate)
ot = σ( Wo (xt; ht−1) + bo )   (output gate, similarly to it)

The cell state is updated additively — the forget gate scales the old state, the input gate scales the new candidate:

ct = ft ⊗ ct−1 + it ⊗ tanh( W (xt; ht−1) )

and the output is the gated, squashed cell state:

ht = ot ⊗ tanh( ct )
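These equations transcribe directly into code. A minimal NumPy sketch (the sigmoid helper, dimensions, and initialization are my own; ⊗ is elementwise multiplication):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
z_dim = d_in + d_h
Wi, Wf, Wo, W = (rng.normal(0, 0.2, (d_h, z_dim)) for _ in range(4))
bi = bf = bo = np.zeros(d_h)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])     # (x_t; h_{t-1})
    i = sigmoid(Wi @ z + bi)              # input gate
    f = sigmoid(Wf @ z + bf)              # forget gate
    o = sigmoid(Wo @ z + bo)              # output gate
    c = f * c_prev + i * np.tanh(W @ z)   # additive cell-state update
    h = o * np.tanh(c)                    # gated, squashed output
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(4, d_in)):    # a length-4 input sequence
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)  # (5,) (5,)
```

The additive term `f * c_prev` is the gradient highway: when f ≈ 1 the cell state, and hence the gradient, passes through time-steps nearly unchanged.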
References
- R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, ICML 2013
- S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 1997, 9(8), pp. 1735–1780
- K. Greff et al., LSTM: A search space odyssey, IEEE Transactions on Neural Networks and Learning Systems, 2016
- K. Cho et al. and Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, EMNLP 2014
- R. Jozefowicz, W. Zaremba, and I. Sutskever, An empirical exploration of recurrent network architectures, ICML (JMLR W&CP) 2015