Sequential Data with Neural Networks: Recurrent Neural Networks

Greg Mori - CMPT 419/726
Reading: Goodfellow, Bengio, and Courville, Deep Learning textbook, Ch. 10

Outline

• Recurrent Neural Networks
• Long Short-Term Memory
• Temporal Convolutional Networks
• Examples


Sequential Input / Output

• Many inputs, many outputs: x_{1:T} → y_{1:S}
  • c.f. object tracking, speech recognition with HMMs; on-line/batch processing
• One input, many outputs: x → y_{1:S}
  • e.g. image captioning
• Many inputs, one output: x_{1:T} → y
  • e.g. video classification

Hidden State

• Basic idea: maintain a state h_t
• State at time t depends on input x_t and previous state h_{t−1}
• It's a neural network, so the relation is a non-linear function of these inputs and some parameters W:

    h_t = f(h_{t−1}, x_t; W)

• Parameters W and function f(·) reused at all time steps
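A minimal sketch of this recurrence in NumPy, assuming tanh as the non-linearity and splitting the parameters W into W_hh, W_xh, and b_h (the slides leave f and the parameterization abstract):

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    """One step of the recurrence h_t = f(h_{t-1}, x_t; W), with tanh as f."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# The same parameters are reused at every time step when unrolling.
rng = np.random.default_rng(0)
D, H, T = 3, 4, 5                      # input dim, state dim, sequence length
W_hh = rng.normal(scale=0.1, size=(H, H))
W_xh = rng.normal(scale=0.1, size=(H, D))
b_h = np.zeros(H)

h = np.zeros(H)                        # initial state h_0
for x_t in rng.normal(size=(T, D)):    # inputs x_1, ..., x_T
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)
```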

Outputs

• Output y_t also depends on the hidden state:

    y_t = f(h_t; W_y)

• Again, parameters/function reused across time

Gradients

• Basic RNN not very effective
  • Need many time steps / complex model for challenging tasks
• Gradients in learning are a problem
  • Too large: can be handled with gradient clipping (truncate gradient magnitude)
  • Too small: can be handled with network structures / gating functions (LSTM, GRU)

Long Short-Term Memory

[Figure: an RNN unit (a single σ non-linearity) compared with an LSTM unit, showing the input gate i_t, output gate o_t, forget gate f_t, input modulation gate g_t, and memory cell c_{t−1} → c_t. Figure from Donahue et al., CVPR 2015.]

• Hochreiter and Schmidhuber, Neural Computation 1997
• Gating functions i(·), f(·), o(·) reduce vanishing gradients

    i_t = σ(W_xi x_t + W_hi h_{t−1} + b_i)       (1)
    f_t = σ(W_xf x_t + W_hf h_{t−1} + b_f)       (2)
    o_t = σ(W_xo x_t + W_ho h_{t−1} + b_o)       (3)
    g_t = tanh(W_xc x_t + W_hc h_{t−1} + b_c)    (4)
    c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t              (5)
    h_t = o_t ⊙ tanh(c_t)                        (6)

• See Graves, Liwicki, Fernandez, Bertolami, Bunke, and Schmidhuber, TPAMI 2009
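Equations (1)–(6) transcribe almost line-for-line into NumPy. The dict-of-weights layout below (W['xi'] for W_xi, and so on) is an assumption for readability, not anything specified on the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step, following equations (1)-(6) above."""
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + b['i'])   # (1) input gate
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + b['f'])   # (2) forget gate
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + b['o'])   # (3) output gate
    g_t = np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + b['c'])   # (4) input modulation
    c_t = f_t * c_prev + i_t * g_t                             # (5) cell update
    h_t = o_t * np.tanh(c_t)                                   # (6) gated output
    return h_t, c_t

# Minimal usage with random parameters (D = input dim, H = state dim).
rng = np.random.default_rng(0)
D, H = 3, 4
W = {k: rng.normal(scale=0.1, size=(H, D if k[0] == 'x' else H))
     for k in ('xi', 'hi', 'xf', 'hf', 'xo', 'ho', 'xc', 'hc')}
b = {k: np.zeros(H) for k in 'ifoc'}
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

For the "too large" gradient case above, clipping typically rescales the gradient g to g · min(1, τ/‖g‖) for some threshold τ.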

Convolutions to Aggregate over Time

[Figure: a stack of dilated causal convolutions with filter size k = 3 and dilation factors d = 1, 2, 4; inputs x_0, x_1, ..., x_T at the bottom, outputs ŷ_0, ŷ_1, ..., ŷ_T at the top.]

• Control history by d (dilation: holes in the filter) and k (width of the filter); a NumPy sketch follows the examples below
• Causal convolution: only use elements from the past

Residual (skip) Connections

[Figure 1, architectural elements in a TCN: (a) the dilated causal convolution stack; (b) a residual block (k, d) with Dilated Causal Conv → WeightNorm → ReLU → Dropout, repeated, plus an identity map (or 1x1 conv) skip path; (c) a residual connection between layers ẑ^(i−1) = (ẑ_1^(i−1), ..., ẑ_T^(i−1)) and ẑ^(i).]

• Include residual connections to allow long-range modeling and gradient flow
• Bai, Kolter, Koltun, arXiv 2018

Example: Image Captioning

• Karpathy and Fei-Fei, CVPR 2015

Example: Video Description

[Figure: sequence-to-sequence LSTMs for video description. An encoding stage reads the input frames through stacked LSTMs; a decoding stage, started with <BOS> and <pad> inputs, emits "A man is talking <EOS>" over time.]

• S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko, ICCV 2015
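Returning to the TCN slides above: a sketch of a single dilated causal convolution over a 1-D sequence, to make the roles of k and d concrete. The zero left-padding is an assumed detail, chosen so the output at time t depends only on x_t, x_{t−d}, ..., x_{t−(k−1)d}:

```python
import numpy as np

def dilated_causal_conv(x, w, d):
    """1-D causal convolution with filter w (width k) and dilation d.

    y[t] combines x[t], x[t-d], ..., x[t-(k-1)*d] only -- no future
    elements -- giving a per-layer receptive field of (k-1)*d + 1.
    """
    k = len(w)
    x_pad = np.concatenate([np.zeros((k - 1) * d), x])  # zero-pad the past
    # w[k-1] multiplies the current element, w[0] the oldest one.
    return np.array([sum(w[j] * x_pad[t + j * d] for j in range(k))
                     for t in range(len(x))])

# Stacking layers with d = 1, 2, 4 (as in the figure) grows the history
# seen by the top layer exponentially with depth.
h = np.ones(8)
for d in (1, 2, 4):
    h = dilated_causal_conv(h, w=np.array([0.25, 0.25, 0.5]), d=d)
```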

Example: Machine Translation

• Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016

Conclusion

• Readings: http://www.deeplearningbook.org/contents/rnn.html
• Recurrent neural networks can model sequential inputs/outputs
  • Input includes state (output) from the previous time step
• Different structures:
  • RNN with multiple inputs/outputs
  • Gated recurrent unit (GRU; see the sketch below)
  • Long short-term memory (LSTM)
• Error gradients back-propagated across the entire sequence
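The GRU listed above is not expanded anywhere in these slides. For completeness, a sketch of one common formulation (Cho et al. 2014), in the same assumed dict-of-weights style as the LSTM sketch earlier; note that references differ on which term the update gate z_t multiplies:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, b):
    """One GRU step: update gate z_t, reset gate r_t, candidate state."""
    z_t = sigmoid(W['xz'] @ x_t + W['hz'] @ h_prev + b['z'])   # update gate
    r_t = sigmoid(W['xr'] @ x_t + W['hr'] @ h_prev + b['r'])   # reset gate
    h_tilde = np.tanh(W['xh'] @ x_t + W['hh'] @ (r_t * h_prev) + b['h'])
    return z_t * h_prev + (1.0 - z_t) * h_tilde                # mix old and new
```

With only two gates and no separate memory cell, the GRU is a lighter-weight alternative to the LSTM serving the same purpose of improving gradient flow.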
