CS480/680 Machine Learning Lecture 20: Convolutional Neural Network


  1. CS480/680 Machine Learning Lecture 20: Convolutional Neural Network. Zahra Sheikhbahaee, March 29, 2020. University of Waterloo, CS480/680 Winter 2020.

  2. Outline
◮ Convolution
◮ Zero Padding
◮ Stride
◮ Weight Sharing
◮ Pooling
◮ Convolutional neural net architectures: LeNet-5, AlexNet, ResNet, Inception

  3. Computer Vision Tasks Using Convolutional Networks
◮ Object detection (Figure: Faster R-CNN model)
◮ Semantic segmentation (Figure: FCN)
◮ Neural style transfer (Figure: B, The Shipwreck of the Minotaur by J.M.W. Turner, 1805; C, The Starry Night by Vincent van Gogh, 1889; D, Der Schrei by Edvard Munch, 1893)

  4. Convolutional Neural Networks
◮ A convolutional neural network (CNN) is designed to automatically and adaptively learn spatial hierarchies of features through the backpropagation algorithm.
◮ A deficiency of fully connected architectures is that the topology of the input is entirely ignored.
◮ Convolutional neural networks combine three mechanisms: local receptive fields, shared weights, and spatial or temporal subsampling.
◮ A CNN is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers.
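To make the building blocks concrete, here is a toy end-to-end CNN as a minimal PyTorch sketch; all layer sizes here are arbitrary choices for illustration, not from the lecture.

```python
# Toy CNN combining the three building blocks named above.
# All layer sizes are illustrative assumptions, not from the slides.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution: local receptive fields, shared weights
    nn.ReLU(),
    nn.MaxPool2d(2),                            # pooling: spatial subsampling, 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # fully connected classifier head
)

x = torch.randn(1, 1, 28, 28)  # a single 28x28 grayscale image
print(model(x).shape)          # torch.Size([1, 10])
```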

  5. Definition of Convolution
Mathematical definition:
$$g(x, y) = (K * I)(x, y) = \sum_{i=-a}^{a} \sum_{j=-b}^{b} K(i, j)\, I(x - i, y - j)$$
Figure: Edge detection with horizontal and vertical filters. The right-hand image is convolved with a 3 × 3 Sobel filter, which puts a little more weight on the central pixels. The coefficient matrices for the Sobel filter are
$$G_x = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix} \quad \text{and} \quad G_y = G_x^{T}.$$
An edge is where the pixel intensity changes in a noticeable way. A good way to express changes is by using derivatives. The Sobel operator approximates the derivative of an image by the gradient magnitude $G = \sqrt{G_x^2 + G_y^2}$.
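As a small illustration of the Sobel operator above, the following NumPy/SciPy sketch convolves a grayscale image with $G_x$ and $G_y$ and combines the results into the gradient magnitude; the image `img` is a placeholder, not data from the lecture.

```python
# Sobel edge detection via 2D convolution (NumPy + SciPy sketch).
import numpy as np
from scipy.signal import convolve2d

Gx = np.array([[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]])   # horizontal-gradient kernel
Gy = Gx.T                     # vertical-gradient kernel

img = np.random.rand(64, 64)  # placeholder grayscale image

gx = convolve2d(img, Gx, mode="same", boundary="symm")
gy = convolve2d(img, Gy, mode="same", boundary="symm")
edges = np.sqrt(gx**2 + gy**2)  # gradient magnitude G = sqrt(Gx^2 + Gy^2)
```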

  6. Detecting Vertical Edges
n_in: number of input features
n_out: number of output features
k: convolutional kernel size

  7. Some Other Kernel Examples

  8. Gabor Filters
◮ Gabor filters are common feature maps inspired by the human visual system and are used for texture analysis.
◮ A Gabor filter can be viewed as a sinusoidal plane wave of a particular frequency and orientation, modulated by a Gaussian envelope.
◮ Weights: grey → zero, white → positive, black → negative

  9. Padding Schemes
Definition of padding: the number of zeros concatenated at the beginning and at the end of an axis ($p$).
Why is padding important?
- Without it, the output shrinks at every layer.
- Information at the corners of the image is thrown away.
To preserve the size of the output as the input (with unit stride), padding of $p = \frac{k - 1}{2}$ is needed for the input image.
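As a quick worked check (my own example, not from the slide): with unit stride the output size is $n_{in} - k + 2p + 1$ (the $s = 1$ case of the formula on the next slide), so a 3 × 3 kernel needs
$$p = \frac{k-1}{2} = \frac{3-1}{2} = 1, \qquad n_{out} = n_{in} - 3 + 2 \cdot 1 + 1 = n_{in},$$
and the output keeps the input size.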

  10. Strided Convolution
Definition of stride: the distance between two consecutive positionings of the kernel along an axis ($s$).
Figure: (d) 2 × 2 strides, (e) unit strides
Padding with $p$ zeros changes the effective input size from $n_{in}$ to $n_{in} + 2p$. The size of the output is then
$$n_{out} = \left\lfloor \frac{n_{in} - k + 2p}{s} \right\rfloor + 1.$$
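The formula packages neatly into a one-line helper; this is an illustrative sketch (the function name is my own, though the example sizes 32/5 and 227/11/4 match the LeNet-5 and AlexNet layers discussed later).

```python
# Spatial output size of a convolution along one axis:
# n_out = floor((n_in - k + 2p) / s) + 1
def conv_output_size(n_in: int, k: int, p: int = 0, s: int = 1) -> int:
    return (n_in - k + 2 * p) // s + 1

print(conv_output_size(32, 5))              # 28 (LeNet-5 C1: no padding, unit stride)
print(conv_output_size(227, 11, p=0, s=4))  # 55 (AlexNet's first conv layer)
print(conv_output_size(7, 3, p=1))          # 7  ("same" padding with p = (k-1)/2)
```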

  11. Weight Sharing
- In CNNs, each filter is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map.
- This provides the basis for the invariance of the network outputs to translations and distortions of the input images.
- Weight sharing helps reduce over-fitting, owing to the reduced number of trainable parameters.
- Modeling local correlations is easy with CNNs through the weight-sharing scheme.
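A quick parameter count makes the saving concrete (the sizes are my own illustrative choice, though six shared 5 × 5 kernels happen to match LeNet-5's C1 layer below):

```python
# Fully connected layer vs. weight-shared convolution: parameter counts.
h, w, k, n_maps = 28, 28, 5, 6      # illustrative sizes

fc_params = (h * w) * (h * w)       # dense map from every pixel to every pixel
conv_params = n_maps * (k * k + 1)  # six shared 5x5 kernels, one bias each
print(fc_params, conv_params)       # 614656 vs. 156
```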

  12. Pooling Layers
Figure: Architecture of a CNN. FC is a fully connected layer, ReLU denotes a rectified linear unit, and $c_i$ is the number of input channels.
- A convolution layer computes feature response maps that involve multiple channels within some localized spatial region.
- A pooling layer is restricted to act within just one channel at a time, condensing the activation values in each spatially local region of the currently considered channel.
- Pooling operations play a role in producing downstream representations that are more robust to the effects of variations in the data while still preserving important motifs.
- A pooling layer does not have any parameters.

  13. Types of Pooling Layers
◮ Max pooling
◮ Average pooling
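The two operations can be sketched in a few lines of NumPy (a minimal sketch of my own; real frameworks provide these as built-in layers):

```python
# Non-overlapping 2x2 max and average pooling over one channel.
import numpy as np

def pool2x2(x: np.ndarray, op) -> np.ndarray:
    h, w = x.shape
    # group the array into non-overlapping 2x2 blocks, then reduce each block
    blocks = x[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return op(blocks, axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(x, np.max))   # max pooling: keeps the strongest activation per region
print(pool2x2(x, np.mean))  # average pooling: averages each region
```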

  14. Drawbacks of Max and Average Pooling
◮ Max pooling drawback: only the strongest activation in each region survives, so all other information in the region is discarded.
◮ Average pooling drawback: strong activations are diluted by being averaged with the many low or zero activations in the same region.

  15. Convolutional Neural Net Architecture (LeNet-5)
Figure: LeCun et al., 1998
◮ The input is a 32 × 32 grayscale image, which passes through the first convolutional layer.
◮ Layer C1 has 6 feature maps (filters) with a 5 × 5 filter size, a stride of one, and no padding. The image dimensions change from 32 × 32 × 1 to 28 × 28 × 6. The layer has 156 trainable parameters.
◮ Layer S2 is a subsampling layer with 6 feature maps. It has a 2 × 2 filter size and a stride of s = 2. The image dimensions are reduced to 14 × 14 × 6. Layer S2 has only 12 trainable parameters.
◮ Layer C3 is a convolutional layer with 16 feature maps. The filter size is 5 × 5 with a stride of 1, and it has 1516 trainable parameters.
◮ Layer S4 is an average pooling layer with a 2 × 2 filter size and a stride of 2. This layer has 16 feature maps with 32 parameters, and its output is reduced to 5 × 5 × 16.
◮ The fifth layer, C5, is a fully connected convolutional layer with 120 feature maps, each of size 1 × 1. Each of the 120 units in C5 is connected to all the 5 × 5 × 16 nodes in the S4 layer.
◮ The sixth layer is a fully connected layer, F6, with 84 units, and the output layer uses Euclidean radial basis function (RBF) units instead of a softmax function.
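The parameter counts quoted above can be verified with a little arithmetic (a worked check; C3's 1516 is lower than the dense 16 × (6 × 25 + 1) because the original paper connects S2 to C3 through a sparse connection table):

```python
# LeNet-5 trainable-parameter arithmetic for the quoted layers.
c1 = 6 * (5 * 5 * 1 + 1)     # 156: six 5x5 kernels over one input channel, plus biases
s2 = 6 * 2                   # 12: one trainable coefficient and one bias per map
s4 = 16 * 2                  # 32: same scheme with 16 maps
c5 = 120 * (5 * 5 * 16 + 1)  # 48120: each unit connects to all 400 S4 nodes
print(c1, s2, s4, c5)
```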

  16. Convolutional Neural Net Architecture (AlexNet)
Figure: Krizhevsky et al., 2012
◮ It contains 5 convolutional layers and 3 fully connected layers.
◮ The first convolutional layer filters the 227 × 227 × 3 input image with 96 kernels of size 11 × 11 × 3 with a stride of s = 4 pixels.
◮ The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48.
- Uses ReLU instead of tanh to add non-linearity, which accelerates training by a factor of 6 at the same accuracy.
- Uses dropout rather than other regularisation methods to deal with overfitting.
- Uses overlapping pooling to reduce the size of the network.
- Uses multiple GPUs to train its 62.3 million parameters.
- Employs local response normalization.
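The first two stages can be sketched in PyTorch to check the tensor shapes. This is a simplified sketch: the original network splits the 5 × 5 × 48 kernels across two GPUs, which is merged into a single 5 × 5 × 96 convolution here, and local response normalization is omitted.

```python
# First two convolutional stages of AlexNet, shape check only.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227x227x3 -> 55x55x96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),         # overlapping pooling -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2),  # -> 27x27x256
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 227, 227)
print(features(x).shape)  # torch.Size([1, 256, 27, 27])
```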

  17. Convolutional Neural Net Architecture (AlexNet)
◮ The parallelization scheme employed in AlexNet puts half of the kernels on each GPU.
◮ The GPUs communicate only in certain layers.
◮ The top 48 kernels on GPU 1 are color-agnostic.
◮ The bottom 48 kernels on GPU 2 are color-specific.
◮ This scheme reduces the top-1 and top-5 error rates by 1.7% and 1.2%, respectively.

  18. Deep Residual Networks
◮ Training deep neural networks with gradient-based optimizers and learning methods can cause vanishing and exploding gradients during backpropagation.
◮ The degradation problem: as the network depth increases, accuracy gets saturated and then degrades rapidly. Adding more layers to a suitably deep model leads to higher training error.
◮ A residual network is a solution to the aforementioned problems. These networks are easier to optimize and can gain accuracy from considerably increased depth.
◮ The shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers. With $x_l$ and $x_{l+1}$ the input and output of the $l$-th unit, $\mathcal{F}$ a residual function, and $h(x_l)$ an identity mapping:
$$y_l = h(x_l) + \mathcal{F}(x_l, W_l), \qquad x_{l+1} = f(y_l)$$
where $f$ is a ReLU. If $f$ is also an identity mapping, then $x_{l+1} = y_l$, so we have
$$x_{l+1} = x_l + \mathcal{F}(x_l, W_l), \qquad x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)$$
$$\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)$$
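The identity-shortcut case above translates directly into code; this is a minimal PyTorch sketch of one residual block (batch normalization, which the actual ResNet applies after each convolution, is omitted for brevity):

```python
# One residual block with an identity shortcut: x_{l+1} = f(x_l + F(x_l, W_l)).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.conv2(self.relu(self.conv1(x)))  # F(x_l, W_l)
        return self.relu(x + residual)                   # f is the ReLU

x = torch.randn(1, 64, 8, 8)
print(ResidualBlock(64)(x).shape)  # shape preserved: torch.Size([1, 64, 8, 8])
```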

  19. Convolutional Neural Net Architecture (ResNet)
Figure: He et al., 2015
◮ A 152-layer model for the ImageNet competition with 3.57% top-5 error (better than human performance).
◮ Every residual block has two 3 × 3 convolutional layers.
◮ Periodically, the number of filters is doubled and the feature maps are downsampled spatially using a stride of 2, which halves each spatial dimension.
◮ There is an additional convolutional layer at the beginning.
◮ There are no fully connected (FC) layers at the end, just a global average pooling layer (only an FC 1000 to the output classes).
