Lecture 10: Recurrent Neural Networks
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 8 Feb 2016

Administrative: Midterm this Wednesday!
http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html
Vanilla Neural Networks
e.g. Image Captioning: image -> sequence of words
e.g. Sentiment Classification: sequence of words -> sentiment
e.g. Machine Translation: sequence of words -> sequence of words
e.g. Video classification at the frame level
Multiple Object Recognition with Visual Attention, Ba et al.
DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.
[Diagram: input vector x -> RNN (with a recurrent internal state) -> output y]
We usually want to predict an output vector at some time steps.
We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.
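As a minimal sketch of this recurrence (the helper name `rnn_step` and the toy running-sum `f_W` are illustrative, not from the lecture):

```python
import numpy as np

def rnn_step(h_prev, x, f_W):
    """One application of the recurrence h_t = f_W(h_{t-1}, x_t)."""
    return f_W(h_prev, x)

# Toy f_W: a running sum, just to show the shape of the recurrence.
f_W = lambda h, x: h + x

h = np.zeros(3)                     # initial state h_0
for x in [np.ones(3), np.ones(3)]:  # sequence of input vectors
    h = rnn_step(h, x, f_W)
print(h)  # [2. 2. 2.]
```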
Notice: the same function f_W and the same set of parameters W are used at every time step.
(Vanilla) RNN: the state consists of a single “hidden” vector h:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t
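One step of this vanilla RNN can be sketched in numpy with toy sizes and random weights (the variable names and dimensions are assumptions for illustration):

```python
import numpy as np

n, d = 4, 3  # hidden size, input size (toy values)
rng = np.random.default_rng(0)
Whh = rng.standard_normal((n, n)) * 0.01  # hidden-to-hidden weights
Wxh = rng.standard_normal((n, d)) * 0.01  # input-to-hidden weights
Why = rng.standard_normal((2, n)) * 0.01  # hidden-to-output weights (2 classes)

def step(h, x):
    h = np.tanh(Whh @ h + Wxh @ x)  # new hidden state
    y = Why @ h                     # prediction from the hidden state
    return h, y

h = np.zeros(n)
for x in rng.standard_normal((5, d)):  # a sequence of 5 input vectors
    h, y = step(h, x)
```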
Character-level language model example. Vocabulary: [h, e, l, o]. Example training sequence: “hello”.
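With this 4-character vocabulary, each input character is fed in as a one-hot vector, and the target at each step is the next character. A minimal sketch (helper names are illustrative):

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a one-hot vector over the vocabulary."""
    x = np.zeros(len(vocab))
    x[char_to_ix[ch]] = 1.0
    return x

# Inputs are "hell"; targets are the next character at each step, "ello".
inputs  = [one_hot(ch) for ch in 'hell']
targets = [char_to_ix[ch] for ch in 'ello']
```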
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
The gist walks through, in order: data I/O, initializations, the main loop, the loss function, and the softmax classifier.
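The per-step softmax loss and its gradient follow the usual cross-entropy pattern; a sketch in the spirit of min-char-rnn.py (a paraphrase, not the gist's exact code):

```python
import numpy as np

def softmax_loss(y, target_ix):
    """Cross-entropy loss and gradient for one time step's scores y."""
    p = np.exp(y - np.max(y))   # shift scores for numerical stability
    p /= np.sum(p)              # normalized probabilities
    loss = -np.log(p[target_ix])
    dy = p.copy()
    dy[target_ix] -= 1          # gradient w.r.t. the unnormalized scores
    return loss, dy
```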
[Samples during training: at first the output is nearly random; as we train more, and more, and more, the samples become increasingly text-like.]
LaTeX source
[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]
quote detection cell
line length tracking cell
if statement cell
quote/comment cell
code depth cell
Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
[Diagram: image captioning walkthrough on a test image. A CNN processes the image into a feature vector v; the RNN is seeded with a <START> token as x0.]

The image is injected into the recurrence through an extra term:

before: h = tanh(Wxh * x + Whh * h)
now:    h = tanh(Wxh * x + Whh * h + Wih * v)

At each step we compute scores y over the vocabulary and sample a word (e.g. first “straw”, then “hat”), feeding each sampled word back in as the next input. Sampling the <END> token finishes the caption.
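The modified recurrence can be sketched with toy dimensions (weights random and names assumed for illustration; here v is injected at the first step, as in the walkthrough):

```python
import numpy as np

n, d, v_dim = 4, 3, 5  # hidden, word-embedding, image-feature sizes (toy)
rng = np.random.default_rng(1)
Wxh = rng.standard_normal((n, d)) * 0.01      # word input -> hidden
Whh = rng.standard_normal((n, n)) * 0.01      # hidden -> hidden
Wih = rng.standard_normal((n, v_dim)) * 0.01  # image feature -> hidden

v = rng.standard_normal(v_dim)  # CNN feature of the test image
h = np.zeros(n)
x = np.zeros(d)                 # embedding of the <START> token (toy: zeros)

# now: h = tanh(Wxh*x + Whh*h + Wih*v) -- the image conditions the state
h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)
```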
[Tsung-Yi Lin et al. 2014] mscoco.org
Preview of fancier architectures: Show, Attend and Tell, Xu et al., 2015.
The RNN attends spatially to different parts of the image while generating each word of the sentence.
[Diagram: RNNs can be stacked in depth as well as unrolled in time; the same wiring applies to LSTMs, with hidden states flowing forward in time and upward in depth.]
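Stacking in depth can be sketched as follows: at each time step, each layer's new hidden state becomes the input to the layer above (a toy sketch with the input size set equal to the hidden size for simplicity; names are assumptions):

```python
import numpy as np

depth, n = 3, 4  # number of stacked layers, hidden size (toy values)
rng = np.random.default_rng(2)
# One (Whh, Wxh) pair per layer; layer l's input is layer l-1's hidden state.
Ws = [(rng.standard_normal((n, n)) * 0.01,
       rng.standard_normal((n, n)) * 0.01) for _ in range(depth)]

def deep_step(hs, x):
    """Advance the whole stack of RNN layers by one time step."""
    inp = x
    new_hs = []
    for (Whh, Wxh), h in zip(Ws, hs):
        h = np.tanh(Whh @ h + Wxh @ inp)
        new_hs.append(h)
        inp = h  # feed the hidden state upward in depth
    return new_hs

hs = [np.zeros(n) for _ in range(depth)]
for x in rng.standard_normal((5, n)):  # 5 time steps
    hs = deep_step(hs, x)
```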
Long Short Term Memory (LSTM)
[Hochreiter & Schmidhuber, 1997]

The vector from below (x) and the vector from before (h), each of size n, are concatenated and multiplied by a 4n x 2n weight matrix W, giving 4n numbers that are split into four n-dimensional gate vectors:

i = sigmoid(.)   input gate
f = sigmoid(.)   forget gate
o = sigmoid(.)   output gate
g = tanh(.)      candidate update
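A numpy sketch of this gate computation (toy n, random W; the slicing order i, f, o, g is one common convention):

```python
import numpy as np

n = 4  # hidden size; W maps the concatenated [x; h] (2n) to 4n pre-activations
rng = np.random.default_rng(3)
W = rng.standard_normal((4 * n, 2 * n)) * 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, h = rng.standard_normal(n), np.zeros(n)
z = W @ np.concatenate([x, h])  # 4n gate pre-activations
i = sigmoid(z[0*n:1*n])         # input gate
f = sigmoid(z[1*n:2*n])         # forget gate
o = sigmoid(z[2*n:3*n])         # output gate
g = np.tanh(z[3*n:4*n])         # candidate cell update
```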
The cell state c is updated additively: the forget gate f multiplicatively (x) scales the previous cell state, and the input gate i admits the candidate update g through the + junction. The hidden state h is a gated readout, o times tanh(c), which goes to the higher layer or to the prediction:

c_t = f * c_{t-1} + i * g
h_t = o * tanh(c_t)

(* denotes elementwise multiplication.)
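The cell and hidden updates can be sketched directly, with random gate values standing in for the computed ones (toy sizes; names are assumptions):

```python
import numpy as np

n = 4
rng = np.random.default_rng(4)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Gate values for one time step (random here; normally computed from x and h).
i, f, o = (sigmoid(rng.standard_normal(n)) for _ in range(3))
g = np.tanh(rng.standard_normal(n))

c_prev = np.zeros(n)
c = f * c_prev + i * g   # cell state: forget old contents, write new (additive)
h = o * np.tanh(c)       # hidden state: gated readout of the cell
```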
[Diagram: LSTM cells chained over time steps, with both h and c flowing from each step to the next.]
[Diagram: the cell state forms an additive “highway” through time; each + junction distributes gradients, so gradient flow along the cell state is uninterrupted (ignoring forget gates).]
Recall: “PlainNets” vs. ResNets
ResNet is to PlainNet what LSTM is to RNN, kind of.
Cute backprop signal video: http://imgur.com/gallery/vaNahKE
If the largest eigenvalue of the recurrent weight matrix Whh is > 1, the gradient will explode; if it is < 1, the gradient will vanish.
[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
We can control exploding gradients with gradient clipping, and control vanishing gradients with the LSTM.
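A sketch of global-norm gradient clipping in the spirit of Pascanu et al. (the function name and threshold are assumptions; min-char-rnn.py instead clips each gradient elementwise to [-5, 5]):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays if their global norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]  # direction preserved, norm capped
    return grads
```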
[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
GRU [Learning phrase representations using RNN encoder-decoder for statistical machine translation, Cho et al., 2014]
Summary: better recurrent architectures improve gradient flow. Exploding gradients are controlled with gradient clipping; vanishing gradients are controlled with additive interactions (LSTM).