
Recurrent Neural Networks: CS60010 Deep Learning, Abir Das, IIT Kharagpur


  1. Recurrent Neural Networks CS60010: Deep Learning Abir Das IIT Kharagpur Mar 11, 2020

  2. Agenda § Get introduced to different recurrent neural architectures, e.g., RNNs, LSTMs, GRUs. § Get introduced to tasks involving sequential inputs and/or sequential outputs.

  3. Resources § Deep Learning by I. Goodfellow, Y. Bengio and A. Courville [Link] [Chapter 10] § CS231n by Stanford University [Link] § Understanding LSTM Networks by Chris Olah [Link]

  4. Why do we Need another NN Model? § So far, we focused mainly on prediction problems with fixed-size inputs and outputs. § In image classification, the input is a fixed-size image and the output is its class; in video classification, the input is a fixed-size video and the output is its class; in bounding-box regression, the input is a fixed-size region proposal (resized/RoI pooled) and the output is the bounding-box coordinates.

  5. Why do we Need another NN Model? § Suppose we want our model to write down the caption of this image. Figure: Several people with umbrellas walk down a sidewalk on a rainy day. Image source: COCO Dataset, ICCV 2015

  6. Why do we Need another NN Model? § Will this work? Several people with umbrellas

  7. Why do we Need another NN Model? § Will this work? Several people with umbrellas § When the model generates 'people', we need a way to tell it that 'several' has already been generated, and similarly for the other words.

  8. Why do we Need another NN Model? Several people with umbrellas § e.g., Image Captioning: image -> sequence of words. Slide credit: Fei-Fei Li, Justin Johnson & Serena Yeung, CS231n Lecture 10, May 4, 2017

  9. Recurrent Neural Networks: Process Sequences § e.g., Image Captioning: image -> sequence of words § e.g., Sentiment Classification: sequence of words -> sentiment § e.g., Machine Translation: sequence of words -> sequence of words § e.g., Frame-Level Video Classification: sequence of frames -> sequence of labels. Image source: CS231n from Stanford

  10. Recurrent Neural Network § The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feedback connection, so that activations can flow around a loop. § The feedback connection allows information to persist: recall that generating 'people' requires remembering that 'several' has already been generated. § The simplest form of RNN has the previous set of hidden-unit activations feeding back into the network along with the inputs. Figure: inputs x_t and the delayed previous state h_{t-1} feed the hidden units, which produce the new state h_t and the outputs y_t.

  11. Recurrent Neural Network (Figure: the same network, with inputs, hidden units, a delay unit and outputs.) § Note that the concept of 'time', or sequential processing, comes into the picture. § The activations are updated one time-step at a time. § The task of the delay unit is simply to delay the hidden-layer activation until the next time-step.

  12. Recurrent Neural Network h_t = f(x_t, h_{t-1}), where h_t is the new state, h_{t-1} the old state, x_t the input vector and f some function. § f, in particular, can be a layer of a neural network.
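To make the recurrence concrete, here is a minimal Python sketch (not from the slides) of applying one and the same function f over a whole sequence, carrying the state forward one time-step at a time; the helper name process_sequence is hypothetical:

```python
def process_sequence(f, xs, h0):
    """Apply the recurrence h_t = f(x_t, h_{t-1}) over a sequence xs, starting from state h0."""
    h = h0
    for x in xs:        # one update per time-step
        h = f(x, h)     # new state from the current input and the old state
    return h
```

Whatever f is (a neural-network layer in our case), the same f, and hence the same parameters, is reused at every step; this is the weight sharing made explicit when the recurrence is unrolled on the next slide.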

  13. Recurrent Neural Network h_t = f(x_t, h_{t-1}): the new state from the input vector and the old state. § f, in particular, can be a layer of a neural network. § Let's unroll the recurrent connection. Figure: the RNN unrolled over time-steps t, t+1, ..., T, with input-to-hidden weights W_i, hidden-to-hidden weights W_h and hidden-to-output weights W_o drawn at every time-step. § Note that all weight matrices are the same across time-steps. So, the weights are shared for all the time-steps.

  14. Recurrent Neural Network: Forward Pass (Figure: the unrolled RNN as before.) a_t = W_h h_{t-1} + W_i x_t (1), h_t = g(a_t) (2), y_t = W_o h_t (3) § Note that we can have biases too; for simplicity these are omitted.
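A minimal NumPy sketch of this forward pass (not the lecture's code), assuming g = tanh and omitting biases as above; the names rnn_forward, D, H, O are hypothetical:

```python
import numpy as np

def rnn_forward(xs, W_i, W_h, W_o, h0):
    """Run eqns. (1)-(3) over a list of input vectors xs, starting from hidden state h0."""
    h, hs, ys = h0, [], []
    for x in xs:
        a = W_h @ h + W_i @ x    # eqn (1): a_t = W_h h_{t-1} + W_i x_t
        h = np.tanh(a)           # eqn (2): h_t = g(a_t), taking g = tanh (an assumption)
        y = W_o @ h              # eqn (3): y_t = W_o h_t
        hs.append(h)
        ys.append(y)
    return hs, ys

# Example usage: D-dim inputs, H hidden units, O outputs, T time-steps,
# with one shared set of weights used at every time-step.
D, H, O, T = 4, 8, 3, 5
rng = np.random.default_rng(0)
W_i, W_h, W_o = rng.normal(size=(H, D)), rng.normal(size=(H, H)), rng.normal(size=(O, H))
xs = [rng.normal(size=D) for _ in range(T)]
hs, ys = rnn_forward(xs, W_i, W_h, W_o, h0=np.zeros(H))
```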

  15. Recurrent Neural Network: BPTT (Figure: the unrolled RNN, annotated with eqns. (1)-(3).) § BPTT: Backpropagation through time

  16. Recurrent Neural Network: BPTT § BPTT: Backpropagation through time § Total loss L = Σ_{t=1}^{T} L_t, and we are after ∂L/∂W_o, ∂L/∂W_h and ∂L/∂W_i.
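Because the same weights are applied at every time-step, each of these gradients is itself a sum over time-steps. For reference (a standard identity, not derived on this slide), the output-weight gradient can be written, treating ∂L/∂y_t as a row vector:

```latex
\frac{\partial L}{\partial W_o} \;=\; \sum_{t=1}^{T} \left(\frac{\partial L}{\partial y_t}\right)^{\!\top} h_t^{\top}
```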

  17. Recurrent Neural Network: BPTT § Let's compute ∂L/∂y_t. Only L_t depends on y_t, so ∂L/∂y_t = ∂L/∂L_t · ∂L_t/∂y_t = 1 · ∂L_t/∂y_t = ∂L_t/∂y_t (4) § ∂L_t/∂y_t is computable depending on the particular form of the loss function.
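As a concrete (hypothetical) example, if the per-time-step loss is the squared error against a target d_t, eqn. (4) can be evaluated in closed form:

```latex
L_t = \tfrac{1}{2}\,\lVert y_t - d_t \rVert^{2}
\quad\Rightarrow\quad
\frac{\partial L_t}{\partial y_t} = (y_t - d_t)^{\top}
```

written as a row vector, consistent with how ∂L/∂y_T is used in eqn. (5) on the next slide.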

  18. Recurrent Neural Network: BPTT § Let's compute ∂L/∂h_t. The subtlety here is that all losses L_{t'} from time-step t onward are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last time-step.

  19. Recurrent Neural Network: BPTT § Let's compute ∂L/∂h_t. The subtlety here is that all losses L_{t'} from time-step t onward are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last time-step. ∂L/∂h_T = ∂L/∂y_T · ∂y_T/∂h_T = ∂L/∂y_T · W_o (5)

  20. Recurrent Neural Network: BPTT § Let's compute ∂L/∂h_t. The subtlety here is that all losses L_{t'} from time-step t onward are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last time-step. ∂L/∂h_T = ∂L/∂y_T · ∂y_T/∂h_T = ∂L/∂y_T · W_o (5) § ∂L/∂y_T we just computed in the last slide (eqn. (4)). § For a generic t, we need to compute ∂L/∂h_t. h_t affects y_t and also h_{t+1}. For this we will use something that we used while studying backpropagation for feedforward networks.
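The slides stop at this point, but the recursion they are building toward can be sketched in a few lines of NumPy (a sketch under assumptions, not the lecture's code): it continues the hypothetical rnn_forward example above, assumes the squared-error loss L = Σ_t ½‖y_t − d_t‖² with targets ds, and accumulates into ∂L/∂h_t both the path through y_t and the path through h_{t+1}:

```python
import numpy as np

def rnn_backward(xs, hs, ys, ds, W_i, W_h, W_o, h0):
    """BPTT sketch for the forward pass above (g = tanh, no biases),
    with per-time-step loss L_t = 0.5 * ||y_t - d_t||^2 (an assumption)."""
    dW_i, dW_h, dW_o = np.zeros_like(W_i), np.zeros_like(W_h), np.zeros_like(W_o)
    dh_next = np.zeros_like(hs[0])       # gradient arriving from h_{t+1}
    for t in reversed(range(len(xs))):
        h_prev = hs[t - 1] if t > 0 else h0
        dy = ys[t] - ds[t]               # dL_t/dy_t for squared error (eqn (4))
        dW_o += np.outer(dy, hs[t])      # from y_t = W_o h_t
        dh = W_o.T @ dy + dh_next        # h_t affects y_t and also h_{t+1}
        da = (1.0 - hs[t] ** 2) * dh     # through g = tanh: g'(a_t) = 1 - h_t^2
        dW_h += np.outer(da, h_prev)     # from a_t = W_h h_{t-1} + W_i x_t
        dW_i += np.outer(da, xs[t])
        dh_next = W_h.T @ da             # gradient sent back to h_{t-1}
    return dW_i, dW_h, dW_o
```

At t = T, dh_next is zero and dh reduces to W_o^T dy, which is eqn. (5) written with column vectors; for earlier t, dh_next carries the contribution of all later losses back through W_h, and unrolling that update backwards through the sequence is what "backpropagation through time" refers to.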
