Deep-Learning: Recurrent Neural Networks (RNN)

Pr. Fabien MOUTARDE
Center for Robotics, MINES ParisTech, PSL Université Paris
Fabien.Moutarde@mines-paristech.fr
http://people.mines-paristech.fr/fabien.moutarde


Acknowledgements

During preparation of these slides, I got inspiration and borrowed some slide content from several sources, in particular:

  • Fei-Fei Li + J. Johnson + S. Yeung: slides on "Recurrent Neural Networks" from the "Convolutional Neural Networks for Visual Recognition" course at Stanford http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture10.pdf
  • Yingyu Liang: slides on "Recurrent Neural Networks" from the "Deep Learning Basics" course at Princeton https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/DL_lecture9_RNN.pdf
  • Arun Mallya: slides "Introduction to RNNs" from the "Trends in Deep Learning and Recognition" course of Svetlana LAZEBNIK at University of Illinois at Urbana-Champaign http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf
  • Tingwu Wang: slides on "Recurrent Neural Network" for a course at University of Toronto https://www.cs.toronto.edu/%7Etingwuwang/rnn_tutorial.pdf
  • Christopher Olah: online tutorial "Understanding LSTM Networks" https://colah.github.io/posts/2015-08-Understanding-LSTMs/


Outline

  • Standard Recurrent Neural Networks
  • Training RNN: BackPropagation Through Time
  • LSTM and GRU
  • Applications of RNNs


Recurrent Neural Networks (RNN)

[Figure: a recurrent network drawn with a time-delay on each connection (left), and its equivalent form (right): a feedforward network whose extra inputs are the delayed values x2(t-1), x2(t-2) and x3(t-1)]


Canonical form of RNN

[Figure: the canonical form of a RNN: a non-recurrent (feedforward) network takes the external input U(t) and the previous state variables X(t-1), and produces the output Y(t) and the new state variables X(t); unit delays feed X(t) back as X(t-1)]


Time unfolding of RNN

[Figure: the RNN unfolded through time: three copies of the non-recurrent network, at times t-2, t-1 and t; each copy takes the external input U(·) and the state variables X(·-1) from the previous copy, and produces the output Y(·) and the state variables X(·) passed to the next copy]


Dynamic systems & RNN

A dynamic system with external input is defined by the recurrence:

X(t+1) = f( X(t), U(t+1) )

If using a Neural Net for f, this is EXACTLY a RNN!

[Figures from Deep Learning, Goodfellow, Bengio and Courville]


Standard (“vanilla”) RNN

State vector s ↔ vector h of hidden neurons:

ht = tanh( Whh ht-1 + Wxh xt )
yt = softMax( Why ht )
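A minimal NumPy sketch of these two equations (the weight shapes and the helper function are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def vanilla_rnn_step(x_t, h_prev, Wxh, Whh, Why, bh):
    """One time step of the vanilla RNN: new hidden state and output."""
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)  # the evolving "memory" h
    y_t = softmax(Why @ h_t)                      # yt = softMax(Why ht)
    return h_t, y_t
```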

Advantages of RNN

  • The hidden state s of the RNN builds a kind of lossy summary of the past
  • RNN totally adapted to processing SEQUENTIAL data (same computation formula applied at each time step, but modulated by the evolving "memory" contained in state s)
  • Universality of RNNs: any function computable by a Turing Machine can be computed by a finite-size RNN (Siegelmann and Sontag, 1995)


RNN hyper-parameters

  • As for MLP, main hyperparameter = size of hidden layer (= size of vector h)
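For instance, in PyTorch this hyperparameter is the hidden_size argument of the recurrent layer (all sizes below are arbitrary toy values):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=128, batch_first=True)  # 128 = size of h
x = torch.randn(8, 20, 32)   # batch of 8 sequences, 20 time steps, 32 features
outputs, h_last = rnn(x)     # outputs: (8, 20, 128); h_last: (1, 8, 128)
```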



RNN training

  • BackPropagation Through Time (BPTT): gradients update for a whole sequence
  • or Real Time Recurrent Learning (RTRL): gradients update for each frame in a sequence

[Figure: a temporal sequence unfolded over a horizon Nt = 4: the same weights W(t) are applied at steps t to t+4, per-step errors are collected, and the weights are only then updated to W(t+4)]


BackPropagation THROUGH TIME (BPTT)

  • Forward through entire sequence to compute SUM of losses at ALL (or part of) time steps
  • Then backprop through ENTIRE sequence to compute gradients
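A hedged PyTorch sketch of exactly this recipe (model, loss and data are toy assumptions): one forward pass over the whole sequence, a loss summed over all time steps, then a single backward pass through time:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
head = nn.Linear(32, 5)   # per-step output layer (an assumption for the sketch)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 15, 10)              # toy batch: 4 sequences of 15 steps
targets = torch.randint(0, 5, (4, 15))  # toy per-step class targets

outputs, _ = rnn(x)                     # forward through the ENTIRE sequence
loss = loss_fn(head(outputs).reshape(-1, 5),
               targets.reshape(-1))     # loss aggregated over ALL time steps
opt.zero_grad()
loss.backward()                         # backprop through the whole sequence
opt.step()
```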


BPTT computation principle

[Figure: the unfolded network as 3 replicated blocks all sharing the same weights W(t); each block takes its input U(·) and the state from the previous block, desired outputs D(t+1) and D(t+2) generate errors, the error signals dE/dX flow backward from block to block, and each block contributes a partial gradient, so that dW = dW1 + dW2 + dW3]


BPTT algorithm

Weight update over a horizon Nt:

W(t+Nt) = W(t) − λ gradW(E), with E = Σt (Yt − Dt)²

The gradient of the total error is the sum of the per-step gradients:

∂E/∂W = Σt=1..Nt ∂Et/∂W

For each time step, by the chain rule through the output:

∂Et/∂W = (∂Et/∂Yt) (∂Yt/∂Xt-1) (∂Xt-1/∂W)

and the state gradient propagates back through time:

∂Xt/∂W = Σk=1..t-1 (∂Xt/∂Xt-k) (∂Xt-k/∂W)

and (chain rule):

∂Xt/∂Xt-k = Πj=t-k+1..t (∂Xj/∂Xj-1)

where each ∂Xj/∂Xj-1 is the Jacobian matrix of the Feedforward net.

[Figure: the canonical form again: a Feedforward Network with input U(t) and output Y(t), whose state X(t) is fed back through a delay as X(t-1)]
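A NumPy sketch of that product of Jacobians for a small tanh RNN (random weights, purely illustrative): for X(j) = tanh(Wx X(j-1) + U(j)), each per-step Jacobian is diag(1 − X(j)²) · Wx, and their product gives ∂X(t)/∂X(t−k):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
Wx = rng.normal(scale=0.5, size=(n, n))
X = [np.zeros(n)]
for t in range(20):                    # simulate the state trajectory
    X.append(np.tanh(Wx @ X[-1] + rng.normal(size=n)))

J = np.eye(n)
for j in range(20, 10, -1):            # accumulate dX(20)/dX(10)
    J = J @ (np.diag(1 - X[j] ** 2) @ Wx)
print(np.linalg.norm(J))               # shrinks or blows up as k grows
```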


Vanishing/exploding gradient problem

  • If eigenvalues of the Jacobian matrix are > 1, then gradients tend to EXPLODE ⇒ learning will never converge.
  • Conversely, if eigenvalues of the Jacobian matrix are < 1, then gradients tend to VANISH ⇒ error signals can only affect small time lags ⇒ short-term memory.

⇒ Possible solution for exploding gradients: the CLIPPING trick
⇒ Possible solutions for vanishing gradients:
– use ReLU instead of tanh
– change what is inside the RNN!
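For the CLIPPING trick, PyTorch has a built-in gradient-norm clipping; a minimal sketch with a toy model and loss (both assumptions, not from the slides):

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 50, 10)      # long sequence: gradients may explode
out, _ = model(x)
loss = out.pow(2).mean()        # toy loss, just to get gradients flowing
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clipping
opt.step()
```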



Long Short-Term Memory (LSTM)

Problem of standard RNNs = no actual LONG-TERM memory.

LSTM = RNN variant solving this issue (proposed by Hochreiter & Schmidhuber in 1997)

  • Key idea = use "gates" that modulate the respective influences of input and memory

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]


LSTM gates

Gate = pointwise multiplication by a sigmoid output σ ∈ ]0;1[ ⇒ modulates between "let nothing through" and "let everything through"

  • FORGET gate: decides what to erase from the memory (cell state)
  • INPUT gate: decides what new content to write into the memory

⇒ next state = mix between pure memory and pure new input

[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]


LSTM summary

  • OUTPUT gate: the output ht is a filtered version of the memorized cell state

ALL weights Wf, Wi, Wc and Wo (and biases) are LEARNT

[Figure from Deep Learning book by I. Goodfellow, Y. Bengio & A. Courville]
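As a complement to the figures, a minimal NumPy sketch of one LSTM step, following the colah.github.io formulation referenced above (the concatenated-input convention and all shapes are assumptions); it uses exactly the learnt weights Wf, Wi, Wc and Wo named above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)        # FORGET gate: what memory to keep
    i = sigmoid(Wi @ z + bi)        # INPUT gate: what new content to write
    c_tilde = np.tanh(Wc @ z + bc)  # candidate new content
    c_t = f * c_prev + i * c_tilde  # next state = mix of memory and new input
    o = sigmoid(Wo @ z + bo)        # OUTPUT gate
    h_t = o * np.tanh(c_t)          # output = filtered view of the cell state
    return h_t, c_t
```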


Why does LSTM avoid vanishing gradients?

The cell state is updated ADDITIVELY (Ct = ft ⊙ Ct-1 + it ⊙ C̃t): the gradient flows backward along the cell state through pointwise multiplications by the forget-gate activations (close to 1 when the memory is kept), instead of repeated multiplications by the same squashing Jacobian as in a standard RNN.


Gated Recurrent Unit (GRU)

Simplified variant of LSTM (proposed by Cho et al. in 2014), with only 2 gates: a RESET gate and an UPDATE gate
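A minimal NumPy sketch of one GRU step with its two gates (following the standard Cho et al. formulation; weight shapes and the concatenated-input convention are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wr, Wz, Wh, br, bz, bh):
    z_in = np.concatenate([h_prev, x_t])
    r = sigmoid(Wr @ z_in + br)   # RESET gate: how much past state to use
    z = sigmoid(Wz @ z_in + bz)   # UPDATE gate: how much to renew the state
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)  # candidate
    return (1 - z) * h_prev + z * h_tilde  # interpolated new state
```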



Typical usages of RNNs

[Figure: typical RNN input/output patterns, from sequence input or sequence output to full Sequence to Sequence]


Combining RNN with CNN

Feed the features from the last convolutional layer of a CNN as input into a RNN, for example for image captioning.
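A hedged PyTorch sketch of this CNN → RNN pattern for captioning (all names, sizes and the exact wiring are illustrative assumptions, not the architecture from the slides): the CNN's last-layer features initialize the RNN state, which then decodes words:

```python
import torch
import torch.nn as nn

class CaptionNet(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=256, vocab=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)  # CNN features -> initial h
        self.embed = nn.Embedding(vocab, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab)

    def forward(self, cnn_features, word_ids):
        h0 = torch.tanh(self.proj(cnn_features)).unsqueeze(0)  # (1, B, H)
        outputs, _ = self.rnn(self.embed(word_ids), h0)
        return self.out(outputs)                     # per-step word logits

logits = CaptionNet()(torch.randn(2, 512), torch.randint(0, 1000, (2, 7)))
```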


Deep RNNs

Several RNNs stacked (like layers in MLP)
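In PyTorch, stacking recurrent layers is just the num_layers argument (sizes below are arbitrary):

```python
import torch
import torch.nn as nn

deep_rnn = nn.LSTM(input_size=32, hidden_size=64, num_layers=3, batch_first=True)
x = torch.randn(8, 20, 32)
outputs, (h_n, c_n) = deep_rnn(x)  # h_n: (3, 8, 64), one state per stacked layer
```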


Bi-directional RNNs

(e.g. for offline classification of a sequence of words)
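A bi-directional RNN reads the sequence in both directions, which suits such offline tasks where the whole sequence is available; in PyTorch this is the bidirectional flag (toy sizes):

```python
import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=32, hidden_size=64, bidirectional=True,
                batch_first=True)
x = torch.randn(8, 20, 32)
outputs, _ = birnn(x)   # outputs: (8, 20, 128), forward & backward concatenated
```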


Encoder-decoder RNN
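A hedged PyTorch sketch of the classic encoder-decoder layout (an assumption of the standard seq2seq arrangement, not necessarily the exact figure from the slide): the encoder's final state summarizes the source sequence and conditions the decoder:

```python
import torch
import torch.nn as nn

enc = nn.GRU(input_size=16, hidden_size=64, batch_first=True)
dec = nn.GRU(input_size=16, hidden_size=64, batch_first=True)

src = torch.randn(4, 12, 16)  # source sequence
tgt = torch.randn(4, 9, 16)   # target sequence (e.g. shifted embeddings)
_, h = enc(src)               # h summarizes the whole source sequence
dec_out, _ = dec(tgt, h)      # decoder starts from the encoder's state
```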


Applications of RNN/LSTM

Wherever data is intrinsically SEQUENTIAL

  • Speech recognition
  • Natural Language Processing (NLP)

– Machine Translation
– Image caption generation

  • Gesture recognition
  • Music generation
  • Potentially any kind of time-series!!


Summary and perspectives on Recurrent Neural Networks

  • For SEQUENTIAL data (speech, text, …, gestures, …)
  • Impressive results in Natural Language Processing (in particular Automated Real-Time Translation)
  • Training of standard RNNs can be tricky (vanishing gradient…)

  • LSTM / GRU now more used than standard RNNs

Any QUESTIONS?