Understanding LSTM Networks


  1. Understanding LSTM Networks

  2. Recurrent Neural Networks

  3. An unrolled recurrent neural network

  4. The Problem of Long-Term Dependencies

  5. RNN short-term dependencies: a language model trying to predict the next word based on the previous ones, e.g. "the clouds are in the sky". [Diagram: RNN unrolled over five steps, inputs $x_0 \dots x_4$ feeding a shared cell A with outputs $h_0 \dots h_4$]

  6. RNN long-term dependencies: a language model trying to predict the next word based on the previous ones, e.g. "I grew up in India… I speak fluent Hindi." [Diagram: RNN unrolled over many steps, inputs $x_0 \dots x_t$ and outputs $h_0 \dots h_t$, with a long gap between the relevant input and the prediction]

  7. Standard RNN

  8. Backpropagation Through Time (BPTT)

  9. RNN forward pass: $s_t = \tanh(U x_t + W s_{t-1})$, $\hat{y}_t = \mathrm{softmax}(V s_t)$, with loss $E(y, \hat{y}) = \sum_t E_t(y_t, \hat{y}_t) = -\sum_t y_t \log \hat{y}_t$. [Diagram: unrolled RNN sharing the weight matrices U, V, W across all time steps]
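
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The dimensions, random initialization, and one-hot inputs are assumptions chosen for the example, not part of the slides.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 8          # toy sizes, chosen for illustration
U = rng.normal(0, 0.1, (hidden_size, vocab_size))   # input-to-hidden
W = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden-to-hidden
V = rng.normal(0, 0.1, (vocab_size, hidden_size))   # hidden-to-output

def forward(xs):
    """xs: list of one-hot vectors; returns hidden states s_t and predictions y^_t."""
    s = np.zeros(hidden_size)            # s_{-1}
    states, preds = [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s)       # s_t = tanh(U x_t + W s_{t-1})
        states.append(s)
        preds.append(softmax(V @ s))     # y^_t = softmax(V s_t)
    return states, preds

# Usage: a random sequence of four one-hot tokens
xs = [np.eye(vocab_size)[i] for i in rng.integers(0, vocab_size, size=4)]
states, preds = forward(xs)
```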

  10. Backpropagation Through Time: the total gradient is $\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$. For step 3, $\frac{\partial E_3}{\partial W} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \frac{\partial s_3}{\partial W}$ with $s_3 = \tanh(U x_3 + W s_2)$. But $s_3$ depends on $s_2$, which depends on $W$ and $s_1$, and so on, so $\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \frac{\partial s_3}{\partial s_k} \frac{\partial s_k}{\partial W}$.
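
Continuing the sketch above (and assuming a cross-entropy loss, so that the gradient at the softmax is $\hat{y}_3 - y_3$), the sum over $k$ can be computed by walking backwards through the stored states. The target `y3` below is a hypothetical one-hot vector.

```python
# Reuses forward(), xs, U, V, W, hidden_size from the forward-pass sketch.
def bptt_grad_W(xs, y3):
    """Gradient of E_3 w.r.t. W, summing contributions from steps k = 3..0."""
    states, preds = forward(xs)
    dW = np.zeros_like(W)
    delta = V.T @ (preds[3] - y3)            # dE_3/ds_3 (softmax + cross-entropy)
    for k in range(3, -1, -1):
        da = (1 - states[k] ** 2) * delta    # back through tanh: ds_k -> da_k
        s_prev = states[k - 1] if k > 0 else np.zeros(hidden_size)
        dW += np.outer(da, s_prev)           # the (ds_k/dW) term for this k
        delta = W.T @ da                     # chain ds_k/ds_{k-1}, move to k-1
    return dW

y3 = np.eye(vocab_size)[3]                   # assumed one-hot target for step 3
dW = bptt_grad_W(xs, y3)
```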

  11. The Vanishing Gradient Problem: substituting $\frac{\partial s_3}{\partial s_k} = \prod_{j=k+1}^{3} \frac{\partial s_j}{\partial s_{j-1}}$ into the BPTT sum gives
      $\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial s_3} \left( \prod_{j=k+1}^{3} \frac{\partial s_j}{\partial s_{j-1}} \right) \frac{\partial s_k}{\partial W}$
      ● The derivative of a vector w.r.t. a vector is a matrix called the Jacobian
      ● The 2-norm of the above Jacobian matrix has an upper bound of 1
      ● tanh maps all values into a range between −1 and 1, and its derivative is bounded by 1
      ● With multiple matrix multiplications, gradient values shrink exponentially
      ● Gradient contributions from "far away" steps become zero
      ● Depending on activation functions and network parameters, gradients could explode instead of vanishing
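
A small experiment, reusing the toy weights from the forward-pass sketch, makes the shrinkage visible: accumulating the product of Jacobians $\frac{\partial s_j}{\partial s_{j-1}} = \mathrm{diag}(1 - s_j^2)\, W$ and printing its 2-norm shows it decaying toward zero.

```python
# Reuses U, W, hidden_size, vocab_size from the forward-pass sketch.
J = np.eye(hidden_size)
s = np.zeros(hidden_size)
x = np.eye(vocab_size)[0]                    # feed the same token repeatedly
for t in range(20):
    s = np.tanh(U @ x + W @ s)
    J = np.diag(1 - s ** 2) @ W @ J          # ds_t/ds_0 as a product of Jacobians
    if (t + 1) % 5 == 0:
        print(t + 1, np.linalg.norm(J, 2))   # spectral norm shrinks toward zero
```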

  12. Activation function

  13. Basic LSTM

  14. Unrolling the LSTM through time

  15. Constant error carousel: the standard recurrence $s_t = \tanh(U x_t + W s_{t-1})$ is replaced by a memory cell whose self-recurrent edge has its weight fixed at 1, so the stored value is carried across time steps without decay: $C_t = \tilde{C}_t \cdot i_t + C_{t-1}$. [Diagram: memory cell with candidate $\tilde{C}_t$, input gate $i_t$, output gate $o_t$, sigmoid units σ, product units Π, and output $C_t \cdot o_t$; an edge to the next time step and an edge from the previous time step (and current input); self-loop weight fixed at 1]

  16. Input gate ● Use contextual information to decide when to store input into memory ● Protect the memory from being overwritten by other irrelevant inputs [Diagram: the same memory cell as slide 15, highlighting the input gate $i_t$]

  17. Output gate ● Use contextual information to decide when to access the information in memory ● Block irrelevant information [Diagram: the same memory cell, highlighting the output gate $o_t$]

  18. Forget or reset gate: a gate $f_t$ on the self-loop lets the cell clear its own memory, changing the update to $C_t = \tilde{C}_t \cdot i_t + C_{t-1} \cdot f_t$. [Diagram: the same memory cell with a forget gate $f_t$ inserted on the recurrent self-loop]
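
In code, the carousel update of slides 15–18 can be sketched as follows, taking the gate activations as given; how the gates are computed from the input is covered in the walkthrough below.

```python
import numpy as np

def cell_update(C_prev, C_tilde, i_t, f_t, o_t):
    """One step of the gated carousel; gate activations are taken as given."""
    C_t = C_tilde * i_t + C_prev * f_t   # C_t = C~_t . i_t + C_{t-1} . f_t
    out = C_t * o_t                      # output gate scales what leaves the cell
    return C_t, out

# With f_t = 1 and i_t = 0 the cell carries C_{t-1} forward unchanged --
# this is the constant error carousel at work.
C, out = cell_update(np.array([0.5]), np.array([0.9]),
                     i_t=np.array([0.0]), f_t=np.array([1.0]), o_t=np.array([1.0]))
```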

  19. LSTM with four interacting layers

  20. The cell state

  21. Gates: a sigmoid layer

  22. Step-by-Step LSTM Walk Through

  23. Forget gate layer

  24. Input gate layer

  25. The current state

  26. Output layer
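
Slides 22–26 list the walkthrough steps without their formulas; the sketch below fills them in with the standard LSTM equations from colah's post ($f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$, and so on). The sizes and random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 4, 6                        # toy sizes
# One weight matrix and bias per layer, each acting on [h_{t-1}, x_t].
Wf, Wi, Wc, Wo = (rng.normal(0, 0.1, (n_hid, n_hid + n_in)) for _ in range(4))
bf, bi, bc, bo = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)            # forget gate layer (slide 23)
    i_t = sigmoid(Wi @ z + bi)            # input gate layer (slide 24)
    C_tilde = np.tanh(Wc @ z + bc)        # candidate cell values
    C_t = f_t * C_prev + i_t * C_tilde    # the current state (slide 25)
    o_t = sigmoid(Wo @ z + bo)            # output layer (slide 26)
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

# Usage: run three steps on random inputs.
h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(3, n_in)):
    h, C = lstm_step(x_t, h, C)
```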

  27. References
      ● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
      ● http://www.wildml.com/
      ● http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/
      ● http://deeplearning.net/tutorial/lstm.html
      ● https://theclevermachine.files.wordpress.com/2014/09/act-funs.png
      ● http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
      ● Lipton, Zachary C. & Berkowitz, John, "A Critical Review of Recurrent Neural Networks for Sequence Learning"
      ● Hochreiter, Sepp & Schmidhuber, Jürgen (1997), "Long Short-Term Memory", Neural Computation 9(8), 1735-1780
      ● Gers, F. A.; Schmidhuber, J. & Cummins, F. A. (2000), "Learning to Forget: Continual Prediction with LSTM", Neural Computation 12(10), 2451-2471
