
Advanced Section #3: CNNs and Object Detection (AC 209B: Data Science)



  1. Advanced Section #3: CNNs and Object Detection
  AC 209B: Data Science
  Javier Zazo, Pavlos Protopapas

  2. Lecture Outline: Convnets review, Classic Networks, Residual networks, Other combination blocks, Object recognition systems, Face recognition systems

  3. Convnets review

  4. Motivation for convnets
  ◮ Fewer parameters (weights) than a fully connected (FC) network.
  ◮ Invariant to object translation.
  ◮ Can tolerate some distortion in the images.
  ◮ Capable of generalizing and learning features.
  ◮ Require grid-structured input.
  Source: http://cs231n.github.io/

  5. CNN layers
  ◮ Convolutional layer: formed by filters, feature maps, and activation functions.
    – Convolution can be full, same or valid.
    – Output size: n_output = floor((n_input − f + 2p) / s) + 1.
  ◮ Pooling layers: reduce the spatial size of the feature maps and help limit overfitting.
  ◮ Fully connected layers: mix spatial and channel features together.
  Source: http://cs231n.github.io/
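As a quick sanity check of the output-size formula, here is a minimal plain-Python sketch (the function name is ours, not from the lecture):

```python
import math

def conv_output_size(n_input, f, p, s):
    """n_output = floor((n_input - f + 2p) / s) + 1 for a conv or pooling layer."""
    return math.floor((n_input - f + 2 * p) / s) + 1

# 'valid' convolution: 32x32 input, 5x5 filter, stride 1, no padding -> 28x28
print(conv_output_size(32, f=5, p=0, s=1))  # 28
# 'same' convolution keeps the spatial size (p = (f - 1) / 2 when s = 1)
print(conv_output_size(32, f=5, p=2, s=1))  # 32
```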

  6. Introductory convolutional network example
  ◮ Architecture: input 32x32x1 → conv. layer (f = 5, s = 1, p = 0, 10 channels) → 28x28x10 → max-pooling (f = 2, s = 2, p = 0) → 14x14x10 → fully connected layer (200 neurons) → sigmoid or softmax output.
  ◮ Training parameters:
    – 250 weights on the conv. filter + 10 bias terms.
    – 0 weights on the max-pool.
    – 13 × 13 × 10 = 1,690 output elements after max-pool.
    – 1,690 × 200 = 338,000 weights + 200 bias in the FC layer.
    – Total: 338,460 parameters to be trained.
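A minimal tf.keras sketch of this toy network, assuming the layer sizes in the figure (the ReLU in the conv layer and the softmax on the 200-unit output are assumptions); model.summary() prints the per-layer shapes and parameter counts for comparison with the tally above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Toy network: 32x32x1 input -> conv (f=5, s=1, p=0, 10 filters)
# -> max-pool (f=2, s=2) -> fully connected output of 200 units.
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(10, kernel_size=5, strides=1, padding="valid",
                  activation="relu"),             # 5*5*1*10 weights + 10 biases
    layers.MaxPooling2D(pool_size=2, strides=2),  # no trainable parameters
    layers.Flatten(),
    layers.Dense(200, activation="softmax"),      # weights from the flattened pool output + 200 biases
])
model.summary()  # per-layer output shapes and parameter counts
```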

  7. Classic Networks

  8. LeNet-5
  ◮ Formulation is a bit outdated considering current practices.
  ◮ Uses convolutional layers followed by pooling layers and finishes with fully connected layers.
  ◮ Starts with high-dimensional features and reduces their size while increasing the number of channels.
  ◮ Around 60k parameters.
  ◮ Architecture: input 32x32x1 → conv. layer (f = 5, s = 1) → 28x28x6 → avg pool (f = 2, s = 2) → 14x14x6 → conv. layer (f = 5, s = 1) → 10x10x16 → avg pool (f = 2, s = 2) → 5x5x16 → FC 120 → FC 84 → output.
  Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
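A hedged tf.keras sketch of the LeNet-5 layout above (the tanh activations and the 10-way softmax output are assumptions; the original paper differs in several details):

```python
from tensorflow.keras import layers, models

# LeNet-5-style stack: conv/avg-pool twice, then FC layers of 120 and 84 units.
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, strides=1, activation="tanh"),   # 28x28x6
    layers.AveragePooling2D(pool_size=2, strides=2),                 # 14x14x6
    layers.Conv2D(16, kernel_size=5, strides=1, activation="tanh"),  # 10x10x16
    layers.AveragePooling2D(pool_size=2, strides=2),                 # 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),  # hypothetical 10-class output
])
print(lenet5.count_params())  # on the order of 60k parameters
```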

  9. AlexNet
  ◮ 1.2 million high-resolution (227x227x3) images in the ImageNet 2010 contest.
  ◮ 1000 different classes; NN with 60 million parameters to optimize (∼255 MB).
  ◮ Uses ReLU activation functions; GPUs for training; 12 layers.
  ◮ Architecture: input 227x227x3 → conv. layer (f = 11, s = 4) → 55x55x96 → max-pool (f = 3, s = 2) → 27x27x96 → conv. layer (f = 5, same) → 27x27x256 → max-pool (f = 3, s = 2) → 13x13x256 → conv. layer (f = 3, s = 1) → 13x13x384 → conv. layer (f = 3, s = 1) → 13x13x384 → conv. layer (f = 3, s = 1) → 13x13x256 → max-pool (f = 3, s = 2) → 6x6x256 → flatten 9216 → FC 4096 → FC 4096 → softmax 1000.
  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
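A rough tf.keras sketch of the layer sequence in the figure (single-stream version; the local response normalisation and dropout of the original paper are omitted here):

```python
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),       # 55x55x96
    layers.MaxPooling2D(3, strides=2),                         # 27x27x96
    layers.Conv2D(256, 5, padding="same", activation="relu"),  # 27x27x256
    layers.MaxPooling2D(3, strides=2),                         # 13x13x256
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # 13x13x384
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # 13x13x384
    layers.Conv2D(256, 3, padding="same", activation="relu"),  # 13x13x256
    layers.MaxPooling2D(3, strides=2),                         # 6x6x256
    layers.Flatten(),                                          # 9216
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(1000, activation="softmax"),
])
```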

  10. VGG-16 and VGG-19
  ◮ ImageNet Challenge 2014; 16 or 19 layers; 138 million parameters (522 MB).
  ◮ Convolutional layers use 3x3 filters, 'same' padding and stride s = 1.
  ◮ Max-pooling layers use a filter size f = 2 and stride s = 2.
  ◮ VGG-16 architecture: input 224x224x3 → [CONV 64] x2 → 224x224x64 → POOL → 112x112x64 → [CONV 128] x2 → 112x112x128 → POOL → 56x56x128 → [CONV 256] x3 → 56x56x256 → POOL → 28x28x256 → [CONV 512] x3 → 28x28x512 → POOL → 14x14x512 → [CONV 512] x3 → 14x14x512 → POOL → 7x7x512 → FC 4096 → FC 4096 → softmax 1000.
  Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014.
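The repeated [CONV n] x k / POOL pattern makes VGG-16 easy to express with a small helper; a sketch assuming tf.keras:

```python
from tensorflow.keras import layers, models

def vgg_block(model, filters, n_convs):
    """Append n_convs 3x3 'same' convolutions (s=1) followed by a 2x2 max-pool (s=2)."""
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(2, strides=2))

vgg16 = models.Sequential([layers.Input(shape=(224, 224, 3))])
for filters, n_convs in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
    vgg_block(vgg16, filters, n_convs)
vgg16.add(layers.Flatten())                       # 7x7x512 -> 25088
vgg16.add(layers.Dense(4096, activation="relu"))
vgg16.add(layers.Dense(4096, activation="relu"))
vgg16.add(layers.Dense(1000, activation="softmax"))
print(vgg16.count_params())                       # roughly 138 million
```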

  11. Residual networks

  12. Residual block
  ◮ Residual nets appeared in 2016 to train very deep NNs (100 or more layers).
  ◮ Their architecture uses 'residual blocks'.
  ◮ Plain network structure: a[l] → linear → z[l+1] → ReLU → a[l+1] → linear → z[l+2] → ReLU → a[l+2].
  ◮ Residual network block: the same structure, plus an identity shortcut from a[l] that is added before the second ReLU.
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

  13. Equations of the residual block
  ◮ Plain network:
    a[l] = g(z[l])
    z[l+1] = W[l+1] a[l] + b[l+1]
    a[l+1] = g(z[l+1])
    z[l+2] = W[l+2] a[l+1] + b[l+2]
    a[l+2] = g(z[l+2])
  ◮ Residual block:
    a[l] = g(z[l])
    z[l+1] = W[l+1] a[l] + b[l+1]
    a[l+1] = g(z[l+1])
    z[l+2] = W[l+2] a[l+1] + b[l+2]
    a[l+2] = g(z[l+2] + a[l])
  ◮ With this extra connection, gradients can travel backwards more easily.
  ◮ The residual block can very easily learn the identity function by setting W[l+2] = 0 and b[l+2] = 0.
  ◮ In that case, a[l+2] = g(a[l]) = a[l] for ReLU units.
    – It becomes a flexible block that can expand the capacity of the network, or simply turn into an identity function that does not affect training.
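A minimal sketch of the identity residual block in tf.keras functional style, written for fully connected layers to match the equations above (convolutional versions follow the same pattern); a[l] must already have `units` dimensions for the addition to be valid:

```python
from tensorflow.keras import layers

def residual_block(a_l, units):
    """Two linear + ReLU layers with a shortcut: a[l+2] = g(z[l+2] + a[l])."""
    z1 = layers.Dense(units)(a_l)       # z[l+1] = W[l+1] a[l] + b[l+1]
    a1 = layers.Activation("relu")(z1)  # a[l+1] = g(z[l+1])
    z2 = layers.Dense(units)(a1)        # z[l+2] = W[l+2] a[l+1] + b[l+2]
    # Add the shortcut a[l] before the second nonlinearity.
    return layers.Activation("relu")(layers.Add()([z2, a_l]))

# Usage: x = layers.Input(shape=(64,)); y = residual_block(x, units=64)
```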

  14. Residual network
  ◮ A residual network stacks residual blocks sequentially.
  ◮ The idea is to allow the network to become deeper without increasing the training complexity.
  [Figure: training error vs. number of layers for a plain network and a ResNet, comparing the 'theory' and 'practice' curves.]

  15. Residual network
  ◮ Residual networks implement blocks with convolutional layers that use the 'same' padding option (even when max-pooling).
    – This allows the block to learn the identity function.
  ◮ The designer may want to reduce the size of the features and use 'valid' padding.
    – In that case, the shortcut path can implement a new set of convolutional layers that reduces the size appropriately.
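A sketch of the convolutional case, assuming tf.keras: when the main path keeps the spatial size ('same' padding, stride 1) the shortcut is the identity; when it downsamples or changes the channel count, a convolution on the shortcut path matches the dimensions (the 1x1 projection used here is one common choice, not the only one):

```python
from tensorflow.keras import layers

def conv_residual_block(x, filters, downsample=False):
    """Convolutional residual block with an optional projection shortcut."""
    stride = 2 if downsample else 1
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, strides=1, padding="same")(y)
    shortcut = x
    if downsample or x.shape[-1] != filters:
        # Project the shortcut so its shape matches the main path.
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))
```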

  16. Residual network: 34-layer example
  [Figure: side-by-side comparison of VGG-19, a 34-layer plain network, and a 34-layer residual network, showing the 3x3 conv and pooling stacks at each output size (224, 112, 56, 28, 14, 7) down to avg pool and fc 1000. Source: He et al., 2016.]

  17. Classification error values on ImageNet
  ◮ AlexNet (2012) achieved a top-5 error of 15.3% (second place was 26.2%).
  ◮ ZFNet (2013) achieved a top-5 error of 14.8% (visualization of features).

  method                        top-1 err.   top-5 err.
  VGG [40] (ILSVRC'14)          -            8.43†
  GoogLeNet [43] (ILSVRC'14)    -            7.89
  VGG [40] (v5)                 24.4         7.1
  PReLU-net [12]                21.59        5.71
  BN-inception [16]             21.99        5.81
  ResNet-34 B                   21.84        5.71
  ResNet-34 C                   21.53        5.60
  ResNet-50                     20.74        5.25
  ResNet-101                    19.87        4.60
  ResNet-152                    19.38        4.49

  18. Dense Networks
  ◮ Goal: allow maximum information (and gradient) flow → connect every layer directly to every other layer.
  ◮ DenseNets exploit the potential of the network through feature reuse → no need to learn redundant feature maps.
  ◮ DenseNet layers are very narrow (e.g. 12 filters), and they just add a small set of new feature maps.

  19. Dense Networks II
  ◮ DenseNets do not sum the output feature maps of a layer with the incoming feature maps; instead, they concatenate them:
    a[l] = g([a[0], a[1], ..., a[l−1]])
  ◮ The dimensions of the feature maps remain constant within a block, but the number of feature maps grows with the growth rate k:
    k[l] = k[0] + k · (l − 1)
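A sketch of a dense block with growth rate k, assuming tf.keras (the batch normalisation and 1x1 bottleneck layers of the full DenseNet are omitted here):

```python
from tensorflow.keras import layers

def dense_block(x, n_layers, growth_rate=12):
    """Each layer produces growth_rate new feature maps and receives the
    concatenation of all previous feature maps in the block."""
    for _ in range(n_layers):
        new_maps = layers.Conv2D(growth_rate, 3, padding="same",
                                 activation="relu")(x)  # a small set of new feature maps
        x = layers.Concatenate()([x, new_maps])         # a[l] sees [a[0], ..., a[l-1]]
    return x
```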

  20. Dense Networks III: Full architecture

  21. Other combination blocks
