CS7015 (Deep Learning) : Lecture 11 Convolutional Neural Networks, LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet and ResNet
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras

Module 11.1 : The convolution operation

Suppose we are tracking the position of an aeroplane using a laser sensor at discrete time intervals. Now suppose our sensor is noisy, so we only have noisy readings $x_0, x_1, x_2, \ldots$ To obtain a less noisy estimate we would like to average several measurements. More recent measurements are more important, so we would like to take a weighted average:

$$s_t = \sum_{a=0}^{\infty} x_{t-a} \, w_{-a} = (x * w)_t$$

Here $x$ is the input, $w$ is the filter and $*$ denotes the convolution.

In practice, we would only sum over a small window:

$$s_t = \sum_{a=0}^{6} x_{t-a} \, w_{-a}$$

The weight array ($w$) is known as the filter. We just slide the filter over the input and compute the value of $s_t$ based on a window around $x_t$.

Index: w_{-6}  w_{-5}  w_{-4}  w_{-3}  w_{-2}  w_{-1}  w_0
W:     0.01    0.01    0.02    0.02    0.04    0.4     0.5
X:     1.00  1.10  1.20  1.40  1.70  1.80  1.90  2.10  2.20  2.40  2.50  2.70
S:     0.00  1.80  0.00  0.00  0.00  0.00  0.00

$$s_6 = x_6 w_0 + x_5 w_{-1} + x_4 w_{-2} + x_3 w_{-3} + x_2 w_{-4} + x_1 w_{-5} + x_0 w_{-6}$$

Here the input (and the kernel) is one dimensional. Can we use a convolutional operation on a 2D input also?
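The windowed sum above can be checked numerically. The following is a minimal sketch in plain Python that uses the W and X values from the slide; the function name `weighted_average` is an illustrative choice and not part of the lecture.

```python
# Weights w_{-6} .. w_0 and measurements x_0 .. x_11, copied from the slide.
w = [0.01, 0.01, 0.02, 0.02, 0.04, 0.4, 0.5]
x = [1.00, 1.10, 1.20, 1.40, 1.70, 1.80, 1.90, 2.10, 2.20, 2.40, 2.50, 2.70]


def weighted_average(x, w):
    """Return s_t for every t that has a full window of len(w) measurements."""
    k = len(w)
    out = []
    for t in range(k - 1, len(x)):
        window = x[t - k + 1 : t + 1]  # x_{t-6} .. x_t
        # w is stored oldest-first, so w[-1] (= w_0) multiplies x_t.
        out.append(sum(xi * wi for xi, wi in zip(window, w)))
    return out


s = weighted_average(x, w)  # s_6 .. s_11
print(round(s[0], 3))       # s_6, about 1.811
```

With 12 measurements and a window of 7 weights, only six positions (s_6 to s_11) have a full window, which already hints at why the output is shorter than the input.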

We can think of images as 2D inputs. We would now like to use a 2D filter ($m \times n$). First let us see what the 2D formula looks like:

$$S_{ij} = (I * K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i-a,\,j-b} \, K_{a,b}$$

This formula looks at all the preceding neighbours $(i-a, j-b)$. In practice, we use the following formula, which looks at the succeeding neighbours:

$$S_{ij} = (I * K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i+a,\,j+b} \, K_{a,b}$$

Let us apply this idea to a toy example and see the results.

Input:
a  b  c  d
e  f  g  h
i  j  k  ℓ

Kernel:
w  x
y  z

Output:
aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+ℓz
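The toy example can be reproduced with numbers. This is a minimal sketch of the "succeeding neighbours" formula in plain Python; the function name `correlate2d` and the numeric values standing in for a..ℓ and w..z are assumptions made for illustration.

```python
def correlate2d(image, kernel):
    """S[i][j] = sum over a, b of image[i+a][j+b] * kernel[a][b].

    The kernel is only placed where it fits entirely inside the image.
    """
    m, n = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(m) for b in range(n))
            for j in range(cols - n + 1)
        ]
        for i in range(rows - m + 1)
    ]


# a..l mapped to 1..12, and the kernel (w, x, y, z) = (1, 0, 0, 1).
toy_input = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12]]
toy_kernel = [[1, 0],
              [0, 1]]

toy_output = correlate2d(toy_input, toy_kernel)
print(toy_output)  # first entry is aw + fz = 1 + 6 = 7
```

A 3 × 4 input and a 2 × 2 kernel give a 2 × 3 output, matching the six expressions on the slide.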

For the rest of the discussion we will use the following formula for convolution:

$$S_{ij} = (I * K)_{ij} = \sum_{a=\lfloor -\frac{m}{2} \rfloor}^{\lfloor \frac{m}{2} \rfloor} \; \sum_{b=\lfloor -\frac{n}{2} \rfloor}^{\lfloor \frac{n}{2} \rfloor} I_{i-a,\,j-b} \, K_{\frac{m}{2}+a,\,\frac{n}{2}+b}$$

In other words, we will assume that the kernel is centered on the pixel of interest. So we will be looking at both preceding and succeeding neighbors.
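A centered kernel can be sketched as follows. This is an illustration under stated assumptions, not the lecture's implementation: the kernel is assumed to have odd height and width, the offsets a and b are read as running from -⌊m/2⌋ to ⌊m/2⌋ (and likewise for n), the kernel index m/2 is read as ⌊m/2⌋, and only pixels where the kernel fits entirely inside the image are computed. The name `convolve2d_centered` is hypothetical.

```python
def convolve2d_centered(image, kernel):
    """Centered convolution: S[i][j] = sum image[i-a][j-b] * kernel[hm+a][hn+b]."""
    m, n = len(kernel), len(kernel[0])
    if m % 2 == 0 or n % 2 == 0:
        raise ValueError("this sketch assumes odd kernel dimensions")
    hm, hn = m // 2, n // 2  # half-widths, i.e. floor(m/2) and floor(n/2)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(hm, rows - hm):        # skip pixels too close to the border
        row = []
        for j in range(hn, cols - hn):
            row.append(sum(
                image[i - a][j - b] * kernel[hm + a][hn + b]
                for a in range(-hm, hm + 1)
                for b in range(-hn, hn + 1)
            ))
        out.append(row)
    return out


image = [[5 * r + c for c in range(5)] for r in range(5)]  # values 0..24

# A kernel with a single 1 at its centre returns the interior of the image.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
centre = convolve2d_centered(image, identity)

# A single 1 at the top-left (a = b = -1) picks the pixel at (i+1, j+1).
top_left = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
shifted = convolve2d_centered(image, top_left)
```

Because the formula indexes the image with $i-a$ and $j-b$, the kernel is effectively flipped relative to the "succeeding neighbours" version; for symmetric kernels the two give the same result.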

Let us see some examples of 2D convolutions applied to images.

Convolving the image with the kernel

1  1  1
1  1  1
1  1  1

blurs the image.

Convolving the image with the kernel

 0  -1   0
-1   5  -1
 0  -1   0

sharpens the image.

Convolving the image with the kernel

1   1  1
1  -8  1
1   1  1

detects the edges.
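The behaviour of these three kernels can be seen on tiny synthetic images. The sketch below is plain Python; the helper `apply_kernel` and the test images are assumptions made for illustration. The blur kernel is used exactly as shown (all ones), so it sums the nine neighbours; dividing by 9 would turn the sum into an average.

```python
def apply_kernel(image, kernel):
    """Slide the kernel over the image wherever it fits entirely inside."""
    m, n = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(m) for b in range(n))
            for j in range(cols - n + 1)
        ]
        for i in range(rows - m + 1)
    ]


BLUR = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
EDGE = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]

# A flat image (every pixel equal to 2) has no detail and no edges.
flat = [[2] * 4 for _ in range(4)]
flat_blur = apply_kernel(flat, BLUR)        # 9 * 2 everywhere
flat_sharp = apply_kernel(flat, SHARPEN)    # weights sum to 1: unchanged
flat_edge = apply_kernel(flat, EDGE)        # weights sum to 0: no response

# A vertical step from 0 to 1 makes the edge kernel respond on both sides.
step = [[0, 0, 1, 1] for _ in range(4)]
step_edge = apply_kernel(step, EDGE)
```

The sharpen weights sum to 1 and the edge weights sum to 0, which is why a flat region passes through the first unchanged and produces no response in the second.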

We will now see a working example of 2D convolution.

We just slide the kernel over the input image. Each time we slide the kernel we get one value in the output. The resulting output is called a feature map. We can use multiple filters to get multiple feature maps.

Question

In the 1D case, we slide a one dimensional filter over a one dimensional input. In the 2D case, we slide a two dimensional filter over a two dimensional input. What would happen in the 3D case?

What would a 3D filter look like? It will be 3D and we will refer to it as a volume. Once again we will slide the volume over the 3D input (for example, the R, G and B channels of an image) and compute the convolution operation. Note that in this lecture we will assume that the filter always extends to the depth of the image. In effect, we are doing a 2D convolution operation on a 3D input (because the filter moves along the height and the width but not along the depth). As a result the output will be 2D (only width and height, no depth). Once again we can apply multiple filters to get multiple feature maps.
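The idea that a full-depth filter yields a 2D output can be sketched as follows. This is an illustrative plain-Python example: the layout `volume[row][col][channel]`, the function name `convolve_volume` and the all-ones data are assumptions, and the "succeeding neighbours" form of the sum is used.

```python
def convolve_volume(volume, filt):
    """Slide a full-depth filter over height and width only; return a 2D map."""
    rows, cols, depth = len(volume), len(volume[0]), len(volume[0][0])
    f_rows, f_cols, f_depth = len(filt), len(filt[0]), len(filt[0][0])
    if f_depth != depth:
        raise ValueError("the filter must extend to the full depth of the input")
    return [
        [
            sum(volume[i + a][j + b][c] * filt[a][b][c]
                for a in range(f_rows)
                for b in range(f_cols)
                for c in range(depth))
            for j in range(cols - f_cols + 1)
        ]
        for i in range(rows - f_rows + 1)
    ]


# A 5 x 5 input with 3 channels (e.g. R, G, B) and two 3 x 3 x 3 filters.
rgb = [[[1, 1, 1] for _ in range(5)] for _ in range(5)]
ones_filter = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]
zero_filter = [[[0, 0, 0] for _ in range(3)] for _ in range(3)]

feature_map = convolve_volume(rgb, ones_filter)          # one 2D feature map
feature_maps = [convolve_volume(rgb, f)                  # one map per filter
                for f in (ones_filter, zero_filter)]
```

Each filter produces one 2D feature map, so applying several filters and stacking their maps gives an output whose depth equals the number of filters.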

Module 11.2 : Relation between input size, output size and filter size

So far we have not said anything explicit about the dimensions of the
1. inputs
2. filters
3. outputs
and the relations between them. We will see how they are related, but before that we will define a few quantities.

We first define the following quantities:
Width ($W_1$), Height ($H_1$) and Depth ($D_1$) of the original input
The Stride $S$ (we will come back to this later)
The number of filters $K$
The spatial extent ($F$) of each filter (the depth of each filter is the same as the depth of each input)
The output is $W_2 \times H_2 \times D_2$ (we will soon see a formula for computing $W_2$, $H_2$ and $D_2$)

Let us compute the dimension ($W_2$, $H_2$) of the output. Notice that we can't place the kernel at the corners as it will cross the input boundary. This is true for all the shaded points (the kernel crosses the input boundary). This results in an output which is of smaller dimensions than the input. As the size of the kernel increases, this becomes true for even more pixels. For example, let's consider a $5 \times 5$ kernel. We have an even smaller output now. In general,

$$W_2 = W_1 - F + 1$$
$$H_2 = H_1 - F + 1$$

We will refine this formula further.
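The size relation can be written as a small helper. This is a minimal sketch of the formula as stated so far (no stride or padding, which the lecture refines later); the function name `output_size` and the 7 × 7 example input are assumptions for illustration.

```python
def output_size(w1, h1, f):
    """Return (W2, H2) for an F x F filter placed only where it fully fits."""
    if f > w1 or f > h1:
        raise ValueError("the filter is larger than the input")
    return (w1 - f + 1, h1 - f + 1)


size_3x3 = output_size(7, 7, 3)  # (5, 5)
size_5x5 = output_size(7, 7, 5)  # (3, 3): a larger kernel shrinks the output more
```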
