Lecture 10: Recurrent Neural Networks

Fei-Fei Li & Andrej Karpathy & Justin Johnson
8 Feb 2016


SLIDE 1

Lecture 10:

Recurrent Neural Networks

SLIDE 2

Administrative

  • Midterm this Wednesday! woohoo!
  • A3 will be out ~Wednesday
SLIDE 3

SLIDE 4

http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html

SLIDE 5

http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html

SLIDE 6

Recurrent Networks offer a lot of flexibility:

Vanilla Neural Networks

SLIDE 7

Recurrent Networks offer a lot of flexibility:

e.g. Image Captioning: image -> sequence of words

SLIDE 8

Recurrent Networks offer a lot of flexibility:

e.g. Sentiment Classification: sequence of words -> sentiment

SLIDE 9

Recurrent Networks offer a lot of flexibility:

e.g. Machine Translation: seq of words -> seq of words

SLIDE 10

Recurrent Networks offer a lot of flexibility:

e.g. Video classification on frame level

SLIDE 11

Multiple Object Recognition with Visual Attention, Ba et al.

Sequential Processing of fixed inputs

SLIDE 12

DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.

Sequential Processing of fixed outputs
SLIDE 13

Recurrent Neural Network

[diagram: x -> RNN]

SLIDE 14

Recurrent Neural Network

[diagram: x -> RNN -> y]

usually want to predict a vector at some time steps

SLIDE 15

Recurrent Neural Network

[diagram: x -> RNN -> y]

We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.
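To make the recurrence concrete, here is a minimal NumPy sketch of the loop (the particular f_W below is just a placeholder; the lecture's concrete choice appears shortly):

    import numpy as np

    def f_W(h_prev, x):
        # placeholder for "some function with parameters W"
        return np.tanh(h_prev + x)

    h = np.zeros(8)                    # initial state
    for x in np.random.randn(10, 8):   # sequence of input vectors x_t
        h = f_W(h, x)                  # the SAME function at every time step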

SLIDE 16

Recurrent Neural Network

[diagram: x -> RNN -> y]

We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

Notice: the same function and the same set of parameters are used at every time step.
SLIDE 17

(Vanilla) Recurrent Neural Network

[diagram: x -> RNN -> y]

The state consists of a single “hidden” vector h:

    h_t = tanh(Whh * h_{t-1} + Wxh * x_t)
    y_t = Why * h_t
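As a sketch, one step of this vanilla RNN in NumPy (variable names are illustrative; biases are omitted here, as on the slide):

    import numpy as np

    H, D, V = 100, 4, 4               # hidden size, input size, output size
    Wxh = np.random.randn(H, D) * 0.01
    Whh = np.random.randn(H, H) * 0.01
    Why = np.random.randn(V, H) * 0.01

    def step(h_prev, x):
        h = np.tanh(Whh @ h_prev + Wxh @ x)   # h_t = tanh(Whh h_{t-1} + Wxh x_t)
        y = Why @ h                           # y_t = Why h_t
        return h, y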

SLIDE 18

Character-level language model example

Vocabulary: [h,e,l,o]
Example training sequence: “hello”

[diagram: x -> RNN -> y]
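A sketch of the setup, assuming the one-hot input encoding the slide uses over the 4-character vocabulary:

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        x = np.zeros(len(vocab))
        x[char_to_ix[ch]] = 1.0
        return x

    # training pairs for "hello": input each character, target the next one
    inputs, targets = "hell", "ello"
    xs = [one_hot(ch) for ch in inputs]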

SLIDE 19

Character-level language model example

Vocabulary: [h,e,l,o]
Example training sequence: “hello”

SLIDE 20

Character-level language model example

Vocabulary: [h,e,l,o]
Example training sequence: “hello”

SLIDE 21

Character-level language model example

Vocabulary: [h,e,l,o]
Example training sequence: “hello”

SLIDE 22

min-char-rnn.py gist: 112 lines of Python

(https://gist.github.com/karpathy/d4dee566867f8291f086)

SLIDE 23

min-char-rnn.py gist

Data I/O
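The data I/O section of the gist is a few lines: read a text file, list its unique characters, and build the char<->index maps (quoted from memory, so treat as approximate):

    data = open('input.txt', 'r').read()  # plain text file
    chars = list(set(data))
    data_size, vocab_size = len(data), len(chars)
    char_to_ix = {ch: i for i, ch in enumerate(chars)}
    ix_to_char = {i: ch for i, ch in enumerate(chars)}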

SLIDE 24

min-char-rnn.py gist

Initializations. Recall: h_t = tanh(Whh * h_{t-1} + Wxh * x_t), y_t = Why * h_t

SLIDE 25

min-char-rnn.py gist

Main loop

SLIDE 26

min-char-rnn.py gist

Main loop

SLIDE 27

min-char-rnn.py gist

Main loop

SLIDE 28

min-char-rnn.py gist

Main loop

SLIDE 29

min-char-rnn.py gist

Main loop

SLIDE 30

min-char-rnn.py gist

Loss function

  • forward pass (compute loss)
  • backward pass (compute param gradient)
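A condensed sketch of the forward half of that loss function, following the structure of the gist (the backward half walks the same steps in reverse, accumulating parameter gradients):

    import numpy as np

    def loss_forward(inputs, targets, hprev, Wxh, Whh, Why, bh, by, vocab_size):
        xs, hs, ps = {}, {-1: np.copy(hprev)}, {}
        loss = 0.0
        for t in range(len(inputs)):
            xs[t] = np.zeros((vocab_size, 1))
            xs[t][inputs[t]] = 1                                 # one-hot input character
            hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)  # hidden state
            ys = Why @ hs[t] + by                                # unnormalized scores
            ps[t] = np.exp(ys) / np.sum(np.exp(ys))              # softmax probabilities
            loss += -np.log(ps[t][targets[t], 0])                # cross-entropy loss
        return loss, xs, hs, ps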
SLIDE 31

min-char-rnn.py gist

Softmax classifier

SLIDE 32

min-char-rnn.py gist

recall the softmax gradient on the scores: dy = p, then dy[target] -= 1

SLIDE 33

min-char-rnn.py gist

SLIDE 34

[diagram: x -> RNN -> y]

SLIDE 35

SLIDE 36

at first: (samples are gibberish) -> train more -> train more -> train more (samples steadily improve)

SLIDE 37

SLIDE 38

  • open source textbook on algebraic geometry (LaTeX source)

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

Generated C code

SLIDE 43

SLIDE 44

SLIDE 45

Searching for interpretable cells

[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]

SLIDE 46

quote detection cell

Searching for interpretable cells

SLIDE 47

line length tracking cell

Searching for interpretable cells

SLIDE 48

if statement cell

Searching for interpretable cells

SLIDE 49

quote/comment cell

Searching for interpretable cells

SLIDE 50

code depth cell

Searching for interpretable cells

SLIDE 51

Image Captioning

  • Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
  • Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
  • Show and Tell: A Neural Image Caption Generator, Vinyals et al.
  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
  • Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

SLIDE 52

[diagram: Convolutional Neural Network -> Recurrent Neural Network]

SLIDE 53

test image

SLIDE 54

test image

SLIDE 55

test image

[diagram: the CNN's final classifier layer is crossed out (X)]

SLIDE 56

test image

x0 = <START>

SLIDE 57

test image

[diagram: x0 = <START> -> h0 -> y0, with the image feature v feeding into h0 through Wih]

before: h = tanh(Wxh * x + Whh * h)
now:    h = tanh(Wxh * x + Whh * h + Wih * v)
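A minimal sketch of that conditioned recurrence (v is the CNN's image feature vector; all variable names are illustrative):

    import numpy as np

    def captioning_step(h_prev, x, v, Wxh, Whh, Wih):
        # the image information enters the recurrence through the extra Wih * v term
        return np.tanh(Wxh @ x + Whh @ h_prev + Wih @ v)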

SLIDE 58

test image

[diagram: x0 = <START> -> h0 -> y0; sample! the word "straw" from the distribution y0]

SLIDE 59

test image

[diagram: the sampled word "straw" is fed back in as x1 -> h1 -> y1]

SLIDE 60

test image

[diagram: sample! the word "hat" from the distribution y1]

SLIDE 61

test image

[diagram: the sampled word "hat" is fed back in as x2 -> h2 -> y2]

SLIDE 62

test image

[diagram: <START> -> "straw" -> "hat" -> ...]

sample <END> token => finish.
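Putting the decode loop together as a sketch (embed, start_ix, and end_ix are assumed helpers/indices, not lecture code):

    import numpy as np

    def generate_caption(v, Wxh, Whh, Wih, Why, by, embed, start_ix, end_ix, max_len=20):
        h = np.zeros(Whh.shape[0])
        x = embed[start_ix]                     # x0 = <START> token embedding
        caption = []
        for _ in range(max_len):
            h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)
            scores = Why @ h + by
            p = np.exp(scores - scores.max())
            p /= p.sum()                        # softmax over the vocabulary
            ix = np.random.choice(len(p), p=p)  # sample! the next word
            if ix == end_ix:                    # sampled <END> token => finish
                break
            caption.append(ix)
            x = embed[ix]                       # feed the sampled word back in
        return caption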

SLIDE 63

Image Sentence Datasets

Microsoft COCO

[Tsung-Yi Lin et al. 2014] mscoco.org

currently: ~120K images, ~5 sentences each

SLIDE 64
SLIDE 65
SLIDE 66

Show Attend and Tell, Xu et al., 2015

Preview of fancier architectures

RNN attends spatially to different parts of images while generating each word of the sentence:

SLIDE 67

RNN: [diagram: hidden states arranged on a grid, with time on the horizontal axis and depth on the vertical axis]

SLIDE 68

RNN vs. LSTM: [diagram: the same time/depth grid, with the LSTM update replacing the vanilla recurrence at each cell]

SLIDE 69

LSTM

SLIDE 70

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

[diagram: the vector from below (x) and the vector from before (h), each of size n, are concatenated (2n) and multiplied by W (4n x 2n) to produce four n-dimensional gate vectors:

    i = sigmoid(.)   input gate
    f = sigmoid(.)   forget gate
    o = sigmoid(.)   output gate
    g = tanh(.)      candidate values]
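A NumPy sketch of that gate computation (n is the hidden size; names illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_gates(x, h_prev, W):
        n = h_prev.shape[0]
        z = W @ np.concatenate([h_prev, x])   # W is (4n, 2n): all gates at once
        i = sigmoid(z[0*n:1*n])               # input gate
        f = sigmoid(z[1*n:2*n])               # forget gate
        o = sigmoid(z[2*n:3*n])               # output gate
        g = np.tanh(z[3*n:4*n])               # candidate values
        return i, f, o, g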

SLIDE 71

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

[diagram: the forget gate f elementwise-multiplies (x) the previous cell state c]

SLIDE 72

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

[diagram: the gated candidate i (x) g is added (+) to the gated cell state f (x) c]

SLIDE 73

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

[diagram: cell state update c = f (x) c + i (x) g; then h = o (x) tanh(c)]

SLIDE 74

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

[diagram: cell state update c = f (x) c + i (x) g; then h = o (x) tanh(c), which goes to a higher layer, or to the prediction]
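Collecting the pieces into one full LSTM step (a sketch, reusing the lstm_gates helper above):

    import numpy as np

    def lstm_step(x, h_prev, c_prev, W):
        i, f, o, g = lstm_gates(x, h_prev, W)
        c = f * c_prev + i * g    # c_t = f (x) c_{t-1} + i (x) g   (elementwise)
        h = o * np.tanh(c)        # h_t = o (x) tanh(c_t)
        return h, c               # h goes to a higher layer, or to the prediction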

SLIDE 75

LSTM

[diagram: two LSTM cells chained, one timestep then the next: each timestep takes (h, x), forms the gates f, i, g, o, updates the cell state through the x and + interactions, and passes h and c on to the next timestep]

SLIDE 76

[diagram: in a plain RNN, the state is squeezed through the transformation f at every step; in an LSTM (ignoring forget gates), the cell state receives only additive (+) updates between steps, so gradients flow backward along an uninterrupted path]

SLIDE 77

Recall: “PlainNets” vs. ResNets

ResNet is to PlainNet what LSTM is to RNN, kind of.

SLIDE 78

Understanding gradient flow dynamics

Cute backprop signal video: http://imgur.com/gallery/vaNahKE

SLIDE 79

Understanding gradient flow dynamics

if the largest eigenvalue of the recurrent weight matrix Whh is > 1, the gradient will explode; if the largest eigenvalue is < 1, the gradient will vanish [On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
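A toy demonstration of why (not lecture code): backprop through T steps multiplies the gradient by Whh^T repeatedly, so its norm scales like (largest singular value)^T.

    import numpy as np

    T, n = 50, 10
    g = np.random.randn(n)                          # some upstream gradient
    for scale in (0.9, 1.1):                        # largest singular value < 1 vs > 1
        Q, _ = np.linalg.qr(np.random.randn(n, n))  # random orthogonal matrix
        Whh = scale * Q                             # all singular values equal `scale`
        v = g.copy()
        for _ in range(T):
            v = Whh.T @ v                           # one backprop step through time
        print(scale, np.linalg.norm(v))             # ~0.9^50: vanishes; ~1.1^50: explodes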

SLIDE 80

Understanding gradient flow dynamics

if the largest eigenvalue of the recurrent weight matrix Whh is > 1, the gradient will explode; if the largest eigenvalue is < 1, the gradient will vanish [On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]

can control exploding with gradient clipping; can control vanishing with the LSTM
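Two common clipping variants, as a sketch (min-char-rnn.py clips elementwise; Pascanu et al. rescale by the global norm):

    import numpy as np

    def clip_elementwise(grads, clip=5.0):
        # clip each gradient entry into [-clip, clip] (min-char-rnn.py style)
        return [np.clip(g, -clip, clip) for g in grads]

    def clip_by_norm(grads, max_norm=5.0):
        # rescale the whole gradient when its global norm is too large
        norm = np.sqrt(sum(np.sum(g * g) for g in grads))
        scale = min(1.0, max_norm / (norm + 1e-8))
        return [g * scale for g in grads]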

SLIDE 81

LSTM variants and friends

  • LSTM: A Search Space Odyssey, Greff et al., 2015
  • An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015
  • GRU: Learning phrase representations using RNN encoder-decoder for statistical machine translation, Cho et al., 2014

SLIDE 82

Summary

  • RNNs allow a lot of flexibility in architecture design
  • Vanilla RNNs are simple but don’t work very well
  • Common to use LSTM or GRU: their additive interactions improve gradient flow
  • Backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM)
  • Better/simpler architectures are a hot topic of current research
  • Better understanding (both theoretical and empirical) is needed.