SLIDE 1

Recurrent Neural Networks

Greg Mori - CMPT 419/726

Goodfellow, Bengio, and Courville: Deep Learning textbook
  • Ch. 10

SLIDE 2

Sequential Data with Neural Networks

  • Sequential input / output
  • Many inputs, many outputs: x1:T → y1:S
    • cf. object tracking, speech recognition with HMMs; on-line/batch processing
  • One input, many outputs: x → y1:S
    • e.g. image captioning
  • Many inputs, one output: x1:T → y
    • e.g. video classification

SLIDE 3

Outline

  • Recurrent Neural Networks
  • Long Short-Term Memory
  • Temporal Convolutional Networks
  • Examples

SLIDE 4

Outline

  • Recurrent Neural Networks
  • Long Short-Term Memory
  • Temporal Convolutional Networks
  • Examples

SLIDE 5

Hidden State

  • Basic idea: maintain a state ht
  • State at time t depends on input xt and previous state ht−1
  • It’s a neural network, so relation is non-linear function of these inputs and some parameters W: ht = f(ht−1, xt; W)

  • Parameters W and function f(·) reused at all time steps
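
As a concrete illustration, here is a minimal NumPy sketch of the recurrence ht = f(ht−1, xt; W); the tanh non-linearity, the split of W into (Wxh, Whh, bh), and the sizes are assumptions made for the example, not part of the slides.

```python
import numpy as np

def rnn_step(h_prev, x_t, Wxh, Whh, bh):
    """One recurrence step h_t = f(h_{t-1}, x_t; W), with f assumed to be
    tanh of an affine map of the previous state and the current input."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

# The same parameters W = (Wxh, Whh, bh) are reused at every time step.
rng = np.random.default_rng(0)
H, D = 4, 3                                   # hidden size, input size (arbitrary)
Wxh = 0.1 * rng.standard_normal((H, D))
Whh = 0.1 * rng.standard_normal((H, H))
bh = np.zeros(H)

h = np.zeros(H)                               # initial state h_0
for x_t in rng.standard_normal((5, D)):       # a length-5 input sequence
    h = rnn_step(h, x_t, Wxh, Whh, bh)
```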

SLIDE 6

Outputs

  • Output yt also depends on the hidden state:

yt = f(ht; Wy)

  • Again, parameters/function reused across time
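
Continuing the sketch above, one common (assumed) choice for the output map yt = f(ht; Wy) is a softmax over an affine function of the hidden state:

```python
import numpy as np

def rnn_output(h_t, Wy, by):
    """y_t = f(h_t; Wy): softmax of an affine map of the state (assumed form).
    Wy and by are shared across all time steps."""
    scores = Wy @ h_t + by
    scores = scores - scores.max()    # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()
```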

SLIDE 7

Outline

  • Recurrent Neural Networks
  • Long Short-Term Memory
  • Temporal Convolutional Networks
  • Examples

SLIDE 8

Gradients

  • Basic RNN not very effective
    • Need many time steps / complex model for challenging tasks
  • Gradients in learning are a problem
    • Too large: can be handled with gradient clipping (truncate gradient magnitude; see the sketch below)
    • Too small: can be handled with network structures / gating functions (LSTM, GRU)
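
A minimal sketch of the "too large" case: clipping (truncating) the gradient magnitude before a parameter update. The threshold of 5.0 is an arbitrary illustrative value.

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so its L2 norm does not exceed max_norm (gradient clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```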

SLIDE 9

Long Short-Term Memory

[Figure: an RNN unit vs. an LSTM unit. The LSTM unit takes xt and ht−1, computes input gate it, forget gate ft, output gate ot, and input modulation gate gt through σ and ϕ non-linearities, updates the cell state ct from ct−1, and produces the output ht = zt.]
  • Hochreiter and Schmidhuber, Neural Computation 1997
  • (Figure from Donahue et al. CVPR 2015)
  • Gating functions g(·), f(·), o(·) reduce vanishing gradients

SLIDE 10

Long Short-Term Memory


it = σ(Wxi xt + Whi ht−1 + bi)      (1)
ft = σ(Wxf xt + Whf ht−1 + bf)      (2)
ot = σ(Wxo xt + Who ht−1 + bo)      (3)
gt = tanh(Wxc xt + Whc ht−1 + bc)   (4)
ct = ft ⊙ ct−1 + it ⊙ gt            (5)
ht = ot ⊙ tanh(ct)                  (6)

see Graves, Liwicki, Fernandez, Bertolami, Bunke, and Schmidhuber, TPAMI 2009
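
A minimal NumPy sketch of one LSTM step following equations (1)–(6); the dictionary-based parameter layout is an assumption made for readability, not notation from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W holds input/recurrent weight matrices and b the biases,
    keyed by gate: 'i' input, 'f' forget, 'o' output, 'c' input modulation."""
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + b['i'])   # (1) input gate
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + b['f'])   # (2) forget gate
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + b['o'])   # (3) output gate
    g_t = np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + b['c'])   # (4) input modulation
    c_t = f_t * c_prev + i_t * g_t                             # (5) cell state; * is elementwise (⊙)
    h_t = o_t * np.tanh(c_t)                                   # (6) hidden state
    return h_t, c_t
```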

SLIDE 11

Outline

  • Recurrent Neural Networks
  • Long Short-Term Memory
  • Temporal Convolutional Networks
  • Examples

SLIDE 12

Convolutions to Aggregate over Time

[Figure 1(a) from Bai et al.: a dilated causal convolution stack with dilation factors d = 1, 2, 4, mapping inputs x0, x1, …, xT through hidden layers to outputs ŷ0, ŷ1, …, ŷT.]

  • Control history by d (dilation, holes in the filter) and k (width of the filter)
  • Causal convolution, only use elements from the past (see the sketch below)
  • Bai, Kolter, Koltun arXiv 2018
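
A minimal NumPy sketch of a causal dilated convolution, with filter width k and dilation d as on the slide; the left zero-padding is an assumed implementation detail that keeps the output the same length as the input.

```python
import numpy as np

def causal_dilated_conv1d(x, w, d=1):
    """y[t] depends only on x[t], x[t-d], ..., x[t-(k-1)d] (causal, dilated)."""
    k = len(w)
    pad = (k - 1) * d                            # left-pad so no future inputs are used
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * d]    # x[t], x[t-d], ..., x[t-(k-1)d]
        y[t] = taps @ w
    return y

x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, w=np.array([0.5, 0.3, 0.2]), d=2)   # k = 3, d = 2
```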

SLIDE 13

Residual (skip) Connections

[Figure 1(b, c) from Bai et al.: a TCN residual block (k, d) with two branches. The main branch applies dilated causal convolution, weight normalization, ReLU, and dropout twice; it is summed with an identity map of the input (or an optional 1x1 convolution). Panel (c) shows residual blocks (k = 3, d = 1) connecting inputs x0, …, xT to the next layer ẑ(1).]

  • Include residual connections to allow long-range modeling and gradient flow (a sketch follows below)
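
A minimal sketch of the residual block idea, reusing the causal_dilated_conv1d helper above on a single-channel signal; the weight normalization and dropout from the figure are omitted for brevity, and applying a final ReLU after the addition is an assumed detail.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2, d):
    """Two dilated causal convolutions plus a skip (identity) connection."""
    z = relu(causal_dilated_conv1d(x, w1, d))
    z = relu(causal_dilated_conv1d(z, w2, d))
    return relu(z + x)   # identity skip; a 1x1 conv would replace it if channel widths differed
```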

SLIDE 14

Outline

  • Recurrent Neural Networks
  • Long Short-Term Memory
  • Temporal Convolutional Networks
  • Examples

SLIDE 15

Example: Image Captioning

  • Karpathy and Fei-Fei, CVPR 2015

SLIDE 16

Example: Video Description

[Figure: sequence-to-sequence video description with stacked LSTMs. An encoding stage reads the video frames; a decoding stage then emits the caption "A man is talking" between <BOS> and <EOS>, with <pad> tokens at unused steps.]

  • S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko, ICCV 2015

SLIDE 17

Example: Machine Translation

  • Wu et al., Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016

SLIDE 18

Conclusion

  • Readings: http://www.deeplearningbook.org/contents/rnn.html
  • Recurrent neural networks can model sequential inputs/outputs
  • Input includes state (output) from previous time step
  • Different structures:
    • RNN with multiple inputs/outputs
    • Gated recurrent unit (GRU)
    • Long short-term memory (LSTM)
  • Error gradients back-propagated across the entire sequence