
Recurrent Neural Networks (Greg Mori, CMPT 419/726)



  1. Recurrent Neural Networks
  Greg Mori, CMPT 419/726
  Reference: Goodfellow, Bengio, and Courville, Deep Learning, Ch. 10

  2. Sequential Data with Neural Networks
  • Sequential input / output
  • Many inputs, many outputs: x_{1:T} → y_{1:S}; c.f. object tracking, speech recognition with HMMs; on-line/batch processing
  • One input, many outputs: x → y_{1:S}; e.g. image captioning
  • Many inputs, one output: x_{1:T} → y; e.g. video classification

  3. Outline: Recurrent Neural Networks, Long Short-Term Memory, Temporal Convolutional Networks, Examples

  4. Outline: Recurrent Neural Networks, Long Short-Term Memory, Temporal Convolutional Networks, Examples

  5. Hidden State
  • Basic idea: maintain a state h_t
  • The state at time t depends on the input x_t and the previous state h_{t-1}
  • Since it is a neural network, the relation is a non-linear function of these inputs and some parameters W: h_t = f(h_{t-1}, x_t; W), as sketched below
  • The parameters W and the function f(·) are reused at all time steps
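A minimal sketch of this recurrence, assuming the common tanh (Elman) parameterization; the weight names W_hh, W_xh, b_h and the sizes are illustrative, not taken from the slides:

```python
# Sketch of h_t = f(h_{t-1}, x_t; W), assuming a vanilla tanh RNN cell.
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    """One time step: combine previous state and current input, then squash."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# The same parameters are reused at every time step:
rng = np.random.default_rng(0)
H, D = 4, 3                               # hidden size, input size (illustrative)
W_hh = rng.normal(size=(H, H)) * 0.1
W_xh = rng.normal(size=(H, D)) * 0.1
b_h = np.zeros(H)

h = np.zeros(H)                           # initial state h_0
for x_t in rng.normal(size=(10, D)):      # a length-10 input sequence
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)
```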

  6. Outputs
  • The output y_t also depends on the hidden state: y_t = f(h_t; W_y); see the sketch below
  • Again, the parameters/function are reused across time
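A sketch of the readout y_t = f(h_t; W_y), assuming a linear map followed by a softmax (a common choice for classification); the bias b_y and the function name are illustrative:

```python
# Readout sketch: map the hidden state to an output distribution at each step.
import numpy as np

def rnn_output(h_t, W_y, b_y):
    logits = W_y @ h_t + b_y
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()
```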

  7. Outline: Recurrent Neural Networks, Long Short-Term Memory, Temporal Convolutional Networks, Examples

  8. Gradients
  • The basic RNN is not very effective
  • Challenging tasks need many time steps / a complex model
  • Gradients during learning are a problem:
    • Too large: can be handled with gradient clipping (truncating the gradient magnitude), as sketched below
    • Too small: can be handled with network structures / gating functions (LSTM, GRU)
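A minimal sketch of gradient clipping by global norm, assuming the gradients are held as a list of numpy arrays; the threshold 5.0 is illustrative:

```python
# If the global gradient norm exceeds max_norm, rescale all gradients so the
# norm equals max_norm; this truncates exploding gradients without changing
# their direction.
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```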

  9. Long Short-Term Memory
  [Figure: an RNN unit next to an LSTM unit, showing the input gate i_t, forget gate f_t, output gate o_t, input modulation gate g_t, and cell state c_t; figure from Donahue et al., CVPR 2015]
  • Hochreiter and Schmidhuber, Neural Computation 1997
  • The gating functions (input, forget, and output gates) reduce vanishing gradients

  10. Long Short-Term Memory
  [Figure: the same LSTM unit as on the previous slide]
  i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)   (1)
  f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)   (2)
  o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)   (3)
  g_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)   (4)
  c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t   (5)
  h_t = o_t ⊙ tanh(c_t)   (6)
  • See Graves, Liwicki, Fernandez, Bertolami, Bunke, and Schmidhuber, TPAMI 2009
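A direct transcription of equations (1)-(6) into a minimal sketch, assuming numpy arrays; the dictionary keys mirror the weight names in the equations and are otherwise illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps names like 'xi' to W_xi; b maps 'i' to b_i."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])   # (1) input gate
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])   # (2) forget gate
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])   # (3) output gate
    g_t = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])   # (4) input modulation
    c_t = f_t * c_prev + i_t * g_t            # (5) cell state; * is elementwise (⊙)
    h_t = o_t * np.tanh(c_t)                  # (6) hidden state
    return h_t, c_t
```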

  11. Outline: Recurrent Neural Networks, Long Short-Term Memory, Temporal Convolutional Networks, Examples

  12. Convolutions to Aggregate over Time
  [Figure 1(a): a dilated causal convolution stack in a TCN, with dilations d = 1, 2, 4 from inputs x_0 ... x_T up to outputs ŷ_0 ... ŷ_T]
  • Control how much history is used via d (dilation: holes in the filter) and k (width of the filter); a sketch follows below
  • Causal convolution: only use elements from the past
  • Bai, Kolter, and Koltun, arXiv 2018
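A minimal sketch of a single-channel 1-D dilated causal convolution; the function name and the left-padding scheme are illustrative. The output at time t only sees x[t], x[t-d], ..., x[t-(k-1)d], so each layer covers (k-1)d + 1 past steps:

```python
import numpy as np

def dilated_causal_conv1d(x, w, d=1):
    """x: (T,) signal, w: (k,) filter, d: dilation. Returns a (T,) output."""
    k = len(w)
    pad = (k - 1) * d                          # left-pad so the filter never looks ahead
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[pad + t - j * d] for j in range(k))
        for t in range(len(x))
    ])
```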

  13. Residual (skip) Connections
  [Figure 1(b, c): the TCN residual block; two rounds of dilated causal convolution → WeightNorm → ReLU → dropout, plus an identity map (or 1x1 convolution) skip connection from block input to output]
  • Include residual connections to allow long-range modeling and gradient flow; see the sketch below
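A sketch of the residual idea around a temporal transformation F, assuming the block output is ReLU(x + F(x)); WeightNorm and dropout are omitted here, and the names are illustrative:

```python
import numpy as np

def residual_block(x, transform):
    """x: (T,) input; transform: callable mapping (T,) -> (T,)."""
    return np.maximum(0.0, x + transform(x))   # identity skip, then ReLU

# e.g. wrapping the dilated causal convolution sketched above:
# y = residual_block(x, lambda s: dilated_causal_conv1d(s, w, d=2))
```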

  14. Outline: Recurrent Neural Networks, Long Short-Term Memory, Temporal Convolutional Networks, Examples

  15. Example: Image Captioning
  • Karpathy and Fei-Fei, CVPR 2015

  16. Example: Video Description
  [Figure: a sequence-to-sequence model; an encoding stage reads video frames through stacked LSTMs (with <pad> inputs), then a decoding stage is given <BOS> and emits "A man is talking <EOS>"]
  • S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko, ICCV 2015

  17. Example: Machine Translation
  • Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016

  18. Conclusion
  • Readings: http://www.deeplearningbook.org/contents/rnn.html
  • Recurrent neural networks can model sequential inputs/outputs
  • The input includes the state (output) from the previous time step
  • Different structures:
    • RNN with multiple inputs/outputs
    • Gated recurrent unit (GRU)
    • Long short-term memory (LSTM)
  • Error gradients are back-propagated across the entire sequence
