SLIDE 1

Recurrent Neural Networks

CS60010: Deep Learning

Abir Das

IIT Kharagpur

Mar 11, 2020

SLIDE 2

Agenda

§ Get introduced to different recurrent neural architectures, e.g., RNNs, LSTMs, GRUs, etc.
§ Get introduced to tasks involving sequential inputs and/or sequential outputs.

SLIDE 3

Resources

§ Deep Learning by I. Goodfellow, Y. Bengio and A. Courville. [Link] [Chapter 10]
§ CS231n by Stanford University [Link]
§ Understanding LSTM Networks by Chris Olah [Link]

SLIDE 4

Why do we Need another NN Model?

§ So far, we have focused mainly on prediction problems with fixed-size inputs and outputs.
§ In image classification, the input is a fixed-size image and the output is its class; in video classification, the input is a fixed-size video and the output is its class; in bounding-box regression, the input is a fixed-size region proposal (resized/RoI pooled) and the output is the bounding-box coordinates.

SLIDE 5

Why do we Need another NN Model?

§ Suppose we want our model to write down the caption of this image.

Figure: Several people with umbrellas walk down a side walk on a rainy day.

Image source: COCO Dataset, ICCV 2015

SLIDE 6

Why do we Need another NN Model?

§ Will this work?

Several people with umbrellas

SLIDE 7

Why do we Need another NN Model?

§ Will this work?

Several people with umbrellas

§ When the model generates ‘people’, we need a way to tell the model that ‘several’ has already been generated and similarly for the other words.

SLIDE 8

Why do we Need another NN Model?

Several people with umbrellas

§ e.g., Image Captioning: image -> sequence of words

Image source: CS231n, Stanford (Fei-Fei Li, Justin Johnson & Serena Yeung), Lecture 10, May 4, 2017

SLIDE 9

Recurrent Neural Networks: Process Sequences

§ e.g., Image Captioning: image -> sequence of words
§ e.g., Sentiment Classification: sequence of words -> sentiment
§ e.g., Machine Translation: sequence of words -> sequence of words
§ e.g., Frame-level Video Classification: sequence of frames -> sequence of labels

Image source: CS231n from Stanford

SLIDE 10

Recurrent Neural Network

§ The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feedback connection, so that activations can flow around in a loop.
§ The feedback connection allows information to persist. Recall that generating 'people' requires remembering that 'several' has already been generated.
§ The simplest form of RNN has the previous set of hidden unit activations feeding back into the network along with the inputs.

[Figure: RNN block diagram - the inputs feed into the hidden units, whose activations h_t are fed back through a delay unit (as h_{t-1}) along with the next input, producing outputs y_t]

SLIDE 11

Recurrent Neural Network

[Figure: RNN block diagram with delayed hidden-state feedback, as on the previous slide]

§ Note that the concept of 'time' or sequential processing comes into the picture.
§ The activations are updated one time-step at a time.
§ The task of the delay unit is simply to delay the hidden layer activation until the next time-step.

SLIDE 12

Recurrent Neural Network

ℎ" = 𝑔 𝑦", ℎ"'(

New state Some function Old state Input vector

§ f, in particular, can be a layer of a neural network.

SLIDE 13

Recurrent Neural Network

ℎ" = 𝑔 𝑦", ℎ"'(

New state Some function Old state Input vector

§ f, in particular, can be a layer of a neural network. § Lets unroll the recurrent connection.

𝒊"#$ 𝑿& 𝑿' 𝑿( 𝒊" 𝑿& 𝑿& 𝒊")$ 𝑿' 𝑿( 𝑿' 𝑿( 𝒊")𝟑 𝒊+#$ 𝑿& 𝑿' 𝑿( 𝒚" 𝒚")$ 𝒚")- 𝒚+ 𝒛" 𝒛")$ 𝒛")- 𝒛+

§ Note that the weight matrices are the same across timesteps, i.e., the weights are shared for all the timesteps.

SLIDE 14

Recurrent Neural Network: Forward Pass

a_t = W_h h_{t-1} + W_i x_t    (1)
h_t = g(a_t)    (2)
y_t = W_o h_t    (3)

§ Note that we can have biases too; for simplicity these are omitted.
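A minimal NumPy sketch of this forward pass, assuming g = tanh (the function name, dimensions and random weights below are illustrative, not from the slides):

import numpy as np

def rnn_forward(xs, h0, Wh, Wi, Wo):
    # xs: (T, input_dim) sequence of input vectors; h0: (hidden_dim,) initial state.
    # Wh, Wi, Wo are the shared weight matrices (biases omitted, as on the slide).
    h = h0
    hs, ys = [], []
    for x in xs:                       # the same weights are reused at every timestep
        a = Wh @ h + Wi @ x            # eqn (1)
        h = np.tanh(a)                 # eqn (2), with g = tanh
        y = Wo @ h                     # eqn (3)
        hs.append(h)
        ys.append(y)
    return np.stack(hs), np.stack(ys)

# Example with arbitrary sizes:
rng = np.random.default_rng(0)
T, d, m, p = 5, 3, 4, 2
xs = rng.normal(size=(T, d))
Wh, Wi, Wo = (0.1 * rng.normal(size=s) for s in [(m, m), (m, d), (p, m)])
hs, ys = rnn_forward(xs, np.zeros(m), Wh, Wi, Wo)
print(ys.shape)    # (5, 2)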

SLIDE 15

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ BPTT: Backpropagation through time

SLIDE 16

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ BPTT: Backpropagation through time
§ Total loss L = Σ_{t=1}^{T} L_t, and we are after ∂L/∂W_o, ∂L/∂W_h and ∂L/∂W_i.

SLIDE 17

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ BPTT: Backpropagation through time
§ Total loss L = Σ_{t=1}^{T} L_t, and we are after ∂L/∂W_o, ∂L/∂W_h and ∂L/∂W_i.
§ Let's compute ∂L/∂y_t.

∂L/∂y_t = (∂L/∂L_t) (∂L_t/∂y_t) = 1 · ∂L_t/∂y_t    (4)

§ ∂L_t/∂y_t is computable depending on the particular form of the loss function.
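As a concrete case (not from the slides): with a squared-error loss against a target z_t, L_t = ½‖y_t − z_t‖², this term is simply ∂L_t/∂y_t = (y_t − z_t)ᵀ, a row vector in the convention used below where ∂L/∂h_T = (∂L/∂y_T) W_o.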

SLIDE 18

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let's compute ∂L/∂h_t. The subtlety here is that all L_t after timestep t are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last timestep.

SLIDE 19

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let's compute ∂L/∂h_t. The subtlety here is that all L_t after timestep t are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last timestep.

∂L/∂h_T = (∂L/∂y_T) (∂y_T/∂h_T) = (∂L/∂y_T) W_o    (5)

SLIDE 20

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let's compute ∂L/∂h_t. The subtlety here is that all L_t after timestep t are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last timestep.

∂L/∂h_T = (∂L/∂y_T) (∂y_T/∂h_T) = (∂L/∂y_T) W_o    (5)

§ ∂L/∂y_T we just computed on the last slide (eqn. (4)).
§ For a generic t, we need to compute ∂L/∂h_t. Note that h_t affects y_t and also h_{t+1}. For this we will use something that we used while studying backpropagation for feedforward networks.

SLIDE 21

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ If u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂L/∂h_t = (∂L/∂y_t)(∂y_t/∂h_t) + (∂L/∂h_{t+1})(∂h_{t+1}/∂h_t)    (6)

SLIDE 22

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ If u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂L/∂h_t = (∂L/∂y_t)(∂y_t/∂h_t) + (∂L/∂h_{t+1})(∂h_{t+1}/∂h_t)    (6)

§ ∂L/∂y_t we computed in eqn. (4).

SLIDE 23

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ If u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂L/∂h_t = (∂L/∂y_t)(∂y_t/∂h_t) + (∂L/∂h_{t+1})(∂h_{t+1}/∂h_t)    (6)

§ ∂y_t/∂h_t = W_o.

SLIDE 24

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ If u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂L/∂h_t = (∂L/∂y_t)(∂y_t/∂h_t) + (∂L/∂h_{t+1})(∂h_{t+1}/∂h_t)    (6)

§ ∂L/∂h_{t+1} is almost the same as ∂L/∂h_t; it is just for the next timestep.

SLIDE 25

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ If u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂L/∂h_t = (∂L/∂y_t)(∂y_t/∂h_t) + (∂L/∂h_{t+1})(∂h_{t+1}/∂h_t)    (6)

§ ∂h_{t+1}/∂h_t = (∂h_{t+1}/∂a_{t+1})(∂a_{t+1}/∂h_t) = g′ · W_h.
§ Since g is an elementwise operation, g′ will be a diagonal matrix.
§ In particular, if g is tanh, then ∂h_{t+1}/∂h_t = diag(1 − (h_{t+1,1})², 1 − (h_{t+1,2})², ..., 1 − (h_{t+1,m})²) · W_h

SLIDE 26

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ ∂L/∂h_t = (∂L/∂y_t) W_o + (∂L/∂h_{t+1}) diag(1 − (h_{t+1,1})², 1 − (h_{t+1,2})², ..., 1 − (h_{t+1,m})²) W_h
§ All the other things we can compute, but to compute ∂L/∂h_t we need ∂L/∂h_{t+1}.
§ From eqn. (5), we get ∂L/∂h_T, which gives ∂L/∂h_{T-1} and so on.

SLIDE 27

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ ∂L/∂h_t = (∂L/∂y_t) W_o + (∂L/∂h_{t+1}) diag(1 − (h_{t+1,1})², 1 − (h_{t+1,2})², ..., 1 − (h_{t+1,m})²) W_h
§ All the other things we can compute, but to compute ∂L/∂h_t we need ∂L/∂h_{t+1}.
§ From eqn. (5), we get ∂L/∂h_T, which gives ∂L/∂h_{T-1} and so on.
§ Now, ∂L/∂W_o = Σ_t ∂L_t/∂W_o = Σ_t (∂L_t/∂y_t)(∂y_t/∂W_o) = Σ_t (∂L_t/∂y_t) h_t
§ ∂L_t/∂y_t is computable depending on the form of the loss function.

SLIDE 28

Recurrent Neural Network: BPTT

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ ∂L/∂h_t = (∂L/∂y_t) W_o + (∂L/∂h_{t+1}) diag(1 − (h_{t+1,1})², 1 − (h_{t+1,2})², ..., 1 − (h_{t+1,m})²) W_h
§ All the other things we can compute, but to compute ∂L/∂h_t we need ∂L/∂h_{t+1}.
§ From eqn. (5), we get ∂L/∂h_T, which gives ∂L/∂h_{T-1} and so on.
§ Now, ∂L/∂W_o = Σ_t ∂L_t/∂W_o = Σ_t (∂L_t/∂y_t)(∂y_t/∂W_o) = Σ_t (∂L_t/∂y_t) h_t
§ ∂L_t/∂y_t is computable depending on the form of the loss function.
§ (Do it yourself) Similarly for ∂L/∂W_h and ∂L/∂W_i.
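A hedged NumPy sketch of the full BPTT recursion, following the same conventions as the forward-pass sketch above (g = tanh) and assuming a squared-error loss L_t = ½‖y_t − z_t‖²; the names and shapes are illustrative, and the W_h / W_i gradients are the standard "do it yourself" result rather than something stated on the slides:

import numpy as np

def bptt(xs, zs, h0, Wh, Wi, Wo):
    # xs: (T, d) inputs, zs: (T, p) targets, h0: (m,) initial hidden state.
    T = len(xs)
    hs, ys = [h0], []
    for t in range(T):                                   # forward pass, storing states
        hs.append(np.tanh(Wh @ hs[-1] + Wi @ xs[t]))
        ys.append(Wo @ hs[-1])

    dWo = np.zeros_like(Wo)
    dWh = np.zeros_like(Wh)
    dWi = np.zeros_like(Wi)
    da_next = np.zeros_like(h0)      # dL/da_{t+1}; zero beyond the last timestep

    for t in reversed(range(T)):                         # backward pass
        h, h_prev = hs[t + 1], hs[t]
        dy = ys[t] - zs[t]                               # dL_t/dy_t for squared error
        dWo += np.outer(dy, h)                           # dL/dWo = sum_t (dL_t/dy_t) h_t
        dh = dy @ Wo + da_next @ Wh                      # eqn (6): local term + term from t+1
        da = dh * (1.0 - h ** 2)                         # through g = tanh: dL/da_t
        dWh += np.outer(da, h_prev)                      # a_t = Wh h_{t-1} + Wi x_t
        dWi += np.outer(da, xs[t])
        da_next = da
    return dWo, dWh, dWi

Comparing these against finite-difference gradients is a quick sanity check for the recursion.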

SLIDE 29

Exploding or Vanishing Gradients

§ In recurrent nets (also in very deep nets), the final output is the composition of a large number of non-linear transformations.
§ The derivatives through these compositions will tend to be either very small or very large.
§ If h = (f ∘ g)(x) = f(g(x)), then h′(x) = f′(g(x)) g′(x).
§ If the gradients are small, the product is small.
§ If the gradients are large, the product is large.
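For intuition, a made-up numeric example (not from the slides): if each of 100 composed steps contributes a local derivative of 0.9, the product is 0.9¹⁰⁰ ≈ 2.7 × 10⁻⁵; if each contributes 1.1, it is 1.1¹⁰⁰ ≈ 1.4 × 10⁴. The gradient shrinks or grows exponentially with the number of timesteps.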

SLIDE 30

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let us see what happens with one learnable weight matrix θ = Wh

SLIDE 31

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let us see what happens with one learnable weight matrix, θ = W_h.

∂L/∂θ = Σ_{t=1}^{T} ∂L_t/∂θ

SLIDE 32

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let us see what happens with one learnable weight matrix, θ = W_h.

∂L/∂θ = Σ_{t=1}^{T} ∂L_t/∂θ,    ∂L_t/∂θ = (∂L_t/∂y_t)(∂y_t/∂θ)

SLIDE 33

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let us see what happens with one learnable weight matrix, θ = W_h.

∂L/∂θ = Σ_{t=1}^{T} ∂L_t/∂θ,    ∂L_t/∂θ = (∂L_t/∂y_t)(∂y_t/∂θ),    ∂y_t/∂θ = (∂y_t/∂h_t)(∂h_t/∂θ)

SLIDE 34

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Let us see what happens with one learnable weight matrix, θ = W_h.

∂L/∂θ = Σ_{t=1}^{T} ∂L_t/∂θ,    ∂L_t/∂θ = (∂L_t/∂y_t)(∂y_t/∂θ),    ∂y_t/∂θ = (∂y_t/∂h_t)(∂h_t/∂θ)

§ But h_t is a function of h_{t-1}, h_{t-2}, ..., h_2, h_1, and each of these is a function of θ.

SLIDE 35

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Now we will resort to our friend again: if u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

SLIDE 36

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Now we will resort to our friend again: if u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂h_t/∂θ = Σ_{k=1}^{t-1} (∂h_t/∂h_k)(∂h_k/∂θ)

SLIDE 37

Exploding or Vanishing Gradients

a_t = W_h h_{t-1} + W_i x_t,   h_t = g(a_t),   y_t = W_o h_t

§ Now we will resort to our friend again: if u = f(x, y), where x = φ(t), y = ψ(t), then ∂u/∂t = (∂u/∂x)(∂x/∂t) + (∂u/∂y)(∂y/∂t)

∂h_t/∂θ = Σ_{k=1}^{t-1} (∂h_t/∂h_k)(∂h_k/∂θ)

§ And ∂h_t/∂h_k = (∂h_t/∂h_{t-1})(∂h_{t-1}/∂h_{t-2}) ··· (∂h_{k+1}/∂h_k), a product of (t − k) Jacobians; with g = tanh, each factor ∂h_j/∂h_{j-1} is of the form diag(1 − (h_{j,1})², 1 − (h_{j,2})², ..., 1 − (h_{j,m})²) · W_h.
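A small numeric sketch (my own illustration, not from the slides) of how this product of Jacobians behaves; its size is roughly governed by the largest singular value of W_h raised to the power (t − k):

import numpy as np

def jacobian_product_norm(Wh, steps, rng):
    # Norm of prod_j diag(1 - h_j^2) Wh for stand-in tanh hidden states h_j.
    J = np.eye(Wh.shape[0])
    for _ in range(steps):
        h = np.tanh(rng.normal(size=Wh.shape[0]))
        J = np.diag(1.0 - h ** 2) @ Wh @ J
    return np.linalg.norm(J)

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
for scale in (0.3, 1.0, 3.0):                         # small vs large recurrent weights
    Wh = scale * base / np.linalg.norm(base, 2)       # largest singular value = scale
    print(scale, jacobian_product_norm(Wh, steps=50, rng=rng))
# Typically: the norm collapses towards 0 for scale 0.3 (vanishing)
# and blows up for scale 3.0 (exploding).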

SLIDE 38

Exploding or Vanishing Gradients

§ Exploding Gradients
  ◮ Easy to detect
  ◮ Clip the gradient at a threshold
§ Vanishing Gradients
  ◮ More difficult to detect
  ◮ Architectures designed to combat the problem of vanishing gradients. Example: LSTMs by Hochreiter & Schmidhuber.
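A minimal sketch of clipping by global norm, one common way to implement the "clip the gradient at a threshold" remedy (the threshold value and function name below are my own, not from the slides):

import numpy as np

def clip_gradients(grads, max_norm=5.0):
    # Rescale a list of gradient arrays so their combined L2 norm is at most max_norm.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

# e.g., applied to the BPTT gradients from the earlier sketch:
# dWo, dWh, dWi = clip_gradients([dWo, dWh, dWi], max_norm=5.0)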

SLIDE 39

Long Short Term Memory (LSTM)

§ Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time (e.g., hundreds of time steps).
§ They designed a memory cell using logistic and linear units with multiplicative interactions.
§ Information is handled using three gates, namely forget, input and output.

SLIDE 40

Recall: Vanilla RNNs

§ In a standard RNN the repeating module has a simple structure.

Source: Chris Olah's blog

SLIDE 41

LSTMs

§ LSTMs also have this chain-like structure, but the repeating module has a different structure.
§ Instead of having a single neural network layer, there are four, interacting in a very special way.
§ The notations are as shown in the figure.

Source: Chris Olah's blog

SLIDE 42

LSTM Memory/Cell State

§ The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
§ The cell state is kind of like a conveyor belt. It's very easy for information to just flow along it unchanged.
§ The LSTM does have the ability to remove or add information to the cell state, carefully regulated by gates.

Source: Chris Olah's blog

SLIDE 43

Gate

§ Composed of a sigmoid neural net layer and a pointwise multiplication operation.
§ Sigmoid: outputs numbers between zero and one
  ◮ Zero: "let nothing through"
  ◮ One: "let everything through"
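A minimal sketch of what one gate computes, a sigmoid layer plus pointwise multiplication (the names and the concatenated [h_{t-1}, x_t] input are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(W, b, h_prev, x, signal):
    # The sigmoid output lies in (0, 1): 0 blocks `signal`, 1 lets it through unchanged.
    g = sigmoid(W @ np.concatenate([h_prev, x]) + b)
    return g * signal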

Source: Chris Olah's blog

SLIDE 44

Gate

§ An LSTM has three such gates.

Source: Chris Olah's blog

SLIDE 45

Forget Gate

§ The first step is to decide what information is going to be thrown away from the cell state. This decision is made by the "forget gate layer".
§ It looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t-1}. A 1 represents "completely keep this" while a 0 represents "completely get rid of this".

Source: Chris Olah's blog

SLIDE 46

Input Gate

§ The next step is to decide what new information is going to be stored in the cell state. This has two parts.
§ First, a sigmoid layer called the "input gate layer" decides which values to update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state.
§ The next step combines these two to create an update to the state.

Source: Chris Olah's blog

SLIDE 47

Input Gate

§ It's now time to update the old cell state, C_{t-1}, into the new cell state C_t.
§ This is done by multiplying the old state by f_t, forgetting the things that were decided to be forgotten earlier, and then adding i_t ∗ C̃_t.

Source: Chris Olah's blog

SLIDE 48

Output Gate

§ Finally, we need to decide what is going to be the output. This output will be based on the cell state.
§ First, a sigmoid layer is run which decides what parts of the cell state are going to be output.
§ Then, the cell state is put through tanh (to push the values to be between −1 and 1) and this is multiplied by the output of the sigmoid gate.

Source: Chris Olah's blog
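Putting the three gates together, a sketch of one LSTM step in NumPy along the lines of Chris Olah's formulation (the weight names and the concatenated-input convention are assumptions for illustration; peephole and other variants are ignored):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, Wf, bf, Wi_, bi, Wc, bc, Wo_, bo):
    v = np.concatenate([h_prev, x])      # [h_{t-1}, x_t]

    f = sigmoid(Wf @ v + bf)             # forget gate: what to keep of C_{t-1}
    i = sigmoid(Wi_ @ v + bi)            # input gate: which values to update
    C_tilde = np.tanh(Wc @ v + bc)       # candidate values

    C = f * C_prev + i * C_tilde         # new cell state

    o = sigmoid(Wo_ @ v + bo)            # output gate: what parts of the state to expose
    h = o * np.tanh(C)                   # new hidden state / output
    return h, C

Each gate is exactly the sigmoid-plus-pointwise-multiplication pattern sketched earlier; the cell state C_t is updated additively, which is what lets information (and gradients) flow across many timesteps.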