Advanced Section #3: CNNs and Object Detection
AC 209B: Data Science
Javier Zazo, Pavlos Protopapas
Lecture Outline
Convnets review Classic Networks Residual networks Other combination blocks Object recognition systems Face recognition systems
2
Convnets review
3
Motivation for convnets
◮ Fewer parameters (weights) than a fully connected network.
◮ Invariant to object translation.
◮ Can tolerate some distortion in the images.
◮ Capable of generalizing and learning features.
◮ Require grid-structured input.
4
Source: http://cs231n.github.io/
CNN layers
◮ Convolutional layer: formed by filters, feature maps, and activation functions.
– Convolutions can be full, same or valid: n_output = ⌊(n_input − f + 2p)/s⌋ + 1.
◮ Pooling layers: reduce the spatial size of the feature maps and help prevent overfitting.
◮ Fully connected layers: mix spatial and channel features together.
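The output-size formula above can be checked with a small helper (a plain-Python sketch; the function name is illustrative):

```python
def conv_output_size(n_input, f, s=1, p=0):
    """Spatial output size of a convolution or pooling layer:
    n_output = floor((n_input - f + 2p) / s) + 1."""
    return (n_input - f + 2 * p) // s + 1

# 'valid' convolution: 32x32 input, 5x5 filter, stride 1 -> 28x28
print(conv_output_size(32, f=5))        # 28
# 'same' convolution keeps the size: p = (f - 1)/2 for odd f, s = 1
print(conv_output_size(32, f=5, p=2))   # 32
# max-pooling: 28x28 input, 2x2 filter, stride 2 -> 14x14
print(conv_output_size(28, f=2, s=2))   # 14
```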
5
Source: http://cs231n.github.io/
Introductory convolutional network example
[Figure: input 32x32x1 → conv. layer (f = 5, s = 1, p = 0) → 28x28x10 → max-pooling (f = 2, s = 2, p = 0) → 14x14x10 → fully connected layer → sigmoid or softmax output neurons.]
◮ Training parameters:
– 250 weights on the conv. filter + 10 bias terms.
– 0 weights on the max-pool.
– 14 × 14 × 10 = 1,960 output elements after max-pool.
– 1,960 × 200 = 392,000 weights + 200 biases in the FC layer.
– Total: 392,460 parameters to be trained.
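The parameter bookkeeping can be reproduced with two small helpers (an illustrative sketch; by the output-size formula, the pooled feature map has 14 × 14 × 10 elements):

```python
def conv_params(f, ch_in, ch_out):
    """Trainable parameters of a conv layer: f*f*ch_in weights per filter, plus one bias each."""
    return f * f * ch_in * ch_out + ch_out

def fc_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# the introductory network: conv(5x5, 10 filters) on 32x32x1, max-pool to 14x14x10, FC with 200 units
conv = conv_params(5, 1, 10)        # 250 weights + 10 biases = 260
fc = fc_params(14 * 14 * 10, 200)   # max-pooling itself adds 0 parameters
print(conv, fc, conv + fc)          # 260 392200 392460
```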
6
Classic Networks
7
LeNet-5
◮ Formulation is a bit outdated considering current practices. ◮ Uses convolutional networks followed by pooling layers and finishes with fully connected layers. ◮ Starts with high dimensional features and reduces their size while increasing the number of channels. ◮ Around 60k parameters.
[Figure: LeNet-5 architecture: 32x32x1 → conv. layer (f = 5, s = 1) → 28x28x6 → avg-pool (f = 2, s = 2) → 14x14x6 → conv. layer (f = 5, s = 1) → 10x10x16 → avg-pool (f = 2, s = 2) → 5x5x16 → FC 120 → FC 84 → output.]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
8
AlexNet
◮ 1.2 million high-resolution (227x227x3) images in the ImageNet 2010 contest.
◮ 1000 different classes; NN with 60 million parameters to optimize (∼ 255 MB).
◮ Uses ReLU activation functions; GPUs for training; 12 layers.
[Figure: AlexNet architecture: 227x227x3 → conv. layer (f = 11, s = 4) → 55x55x96 → max-pool (f = 3, s = 2) → 27x27x96 → conv. layer (f = 5, same) → 27x27x256 → max-pool (f = 3, s = 2) → 13x13x256 → conv. layer (f = 3, s = 1) → 13x13x384 → conv. layer (f = 3, s = 1) → 13x13x384 → conv. layer (f = 3, s = 1) → 13x13x256 → max-pool (f = 3, s = 2) → 6x6x256 → FC 9216 → FC 4096 → FC 4096 → Softmax 1000.]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
9
VGG-16 and VGG-19
◮ ImageNet Challenge 2014; 16 or 19 layers; 138 million parameters (522 MB). ◮ Convolutional layers use ‘same’ padding and stride s = 1. ◮ Max-pooling layers use a filter size f = 2 and stride s = 2.
[Figure: VGG-16 architecture, where CONV = 3x3 filter (s = 1, same) and MAX-POOL = 2x2 filter (s = 2): 224x224x3 → [CONV 64]x2 → 224x224x64 → POOL → 112x112x64 → [CONV 128]x2 → 112x112x128 → POOL → 56x56x128 → [CONV 256]x3 → 56x56x256 → POOL → 28x28x256 → [CONV 512]x3 → 28x28x512 → POOL → 14x14x512 → [CONV 512]x3 → 14x14x512 → POOL → 7x7x512 → FC 4096 → FC 4096 → Softmax 1000.]
Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014.
10
Residual networks
11
Residual block
◮ Residual nets appeared in 2016 to train very deep NN (100 or more layers).
◮ Their architecture uses ‘residual blocks’.
◮ Plain network structure: a[l] → linear → z[l+1] → ReLU → a[l+1] → linear → z[l+2] → ReLU → a[l+2].
◮ Residual network block: the same chain, plus an identity shortcut that carries a[l] to an addition with z[l+2] before the final ReLU.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
12
Equations of the residual block
◮ Plain network:
a[l] = g(z[l])
z[l+1] = W[l+1] a[l] + b[l+1]
a[l+1] = g(z[l+1])
z[l+2] = W[l+2] a[l+1] + b[l+2]
a[l+2] = g(z[l+2])
◮ Residual block: identical, except for the last activation:
a[l+2] = g(z[l+2] + a[l])
◮ With this extra connection, gradients can travel backwards more easily.
◮ The residual block can very easily learn the identity function by setting W[l+2] = 0 and b[l+2] = 0.
◮ In that case, a[l+2] = g(a[l]) = a[l] for ReLU units (since a[l] ≥ 0).
– It becomes a flexible block that can expand the capacity of the network, or simply turn into an identity function that does not affect training.
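The equations above can be sketched in NumPy; setting W[l+2] = 0 and b[l+2] = 0 shows the block collapsing to the identity (function and variable names are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(a_l, W1, b1, W2, b2):
    """Two linear + ReLU layers with a shortcut: a[l+2] = g(z[l+2] + a[l])."""
    z1 = W1 @ a_l + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_l)           # shortcut added before the final ReLU

# with W2 = 0 and b2 = 0 the block is exactly the identity for ReLU units
a = np.array([0.5, 1.2, 0.0])       # a[l] is itself a ReLU output, so a >= 0
W1 = np.random.randn(3, 3); b1 = np.random.randn(3)
W2 = np.zeros((3, 3)); b2 = np.zeros(3)
out = residual_block(a, W1, b1, W2, b2)
print(np.allclose(out, a))          # True
```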
13
Residual network
◮ A residual network stacks residual blocks sequentially.
◮ The idea is to allow the network to become deeper without increasing the training complexity.
[Figure: training error vs. # layers. In practice, a plain network’s training error eventually grows with depth; in theory (and for a ResNet) it keeps decreasing.]
14
Residual network
◮ Residual networks implement blocks with convolutional layers that use ‘same’ padding option (even when max-pooling). – This allows the block to learn the identity function. ◮ The designer may want to reduce the size of features and use ‘valid’ padding. – In such case, the shortcut path can implement a new set of convolutional layers that reduces the size appropriately.
15
Residual network 34 layer example
[Figure: layer-by-layer comparison of VGG-19, a 34-layer plain network, and the 34-layer residual network. Output sizes shrink from 224 to 112, 56, 28, 14, 7 and finally 1, while the 3x3 conv channels grow from 64 to 128, 256 and 512; the plain and residual networks end with average pooling and fc 1000, VGG-19 with pooling, fc 4096, fc 4096 and fc 1000.]
16
Source: He2016
Classification error values on Imagenet
◮ Alexnet (2012) achieved a top-5 error of 15.3% (second place was 26.2%). ◮ ZFNet (2013) achieved a top-5 error of 14.8% (visualization of features).
method                       | top-1 err. | top-5 err.
VGG [40] (ILSVRC’14)         | –          | 8.43†
GoogLeNet [43] (ILSVRC’14)   | –          | 7.89
VGG [40] (v5)                | 24.4       | 7.1
PReLU-net [12]               | 21.59      | 5.71
BN-inception [16]            | 21.99      | 5.81
ResNet-34 B                  | 21.84      | 5.71
ResNet-34 C                  | 21.53      | 5.60
ResNet-50                    | 20.74      | 5.25
ResNet-101                   | 19.87      | 4.60
ResNet-152                   | 19.38      | 4.49
17
Dense Networks
◮ Goal: allow maximum information (and gradient) flow → connect every layer directly with each other.
◮ DenseNets exploit the potential of the network through feature reuse → no need to learn redundant feature maps.
◮ DenseNet layers are very narrow (e.g. 12 filters), and they just add a small set of new feature maps.
18
Dense Networks II
◮ DenseNets do not sum the output feature maps of a layer with the incoming feature maps; they concatenate them: a[l] = g([a[0], a[1], . . . , a[l−1]]).
◮ Dimensions of the feature maps remain constant within a block, but the number of filters changes between layers → growth rate k: k[l] = k[0] + k · (l − 1).
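The growth-rate formula can be checked directly (an illustrative sketch; k0 and k values are examples):

```python
def densenet_channels(l, k0, k):
    """Feature maps entering layer l of a dense block: the k0 initial maps
    plus k new maps contributed by each of the l-1 earlier layers."""
    return k0 + k * (l - 1)

# with k0 = 64 input maps and growth rate k = 12:
for l in (1, 2, 5):
    print(l, densenet_channels(l, 64, 12))   # 1 -> 64, 2 -> 76, 5 -> 112
```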
19
Dense Networks III: Full architecture
20
Other combination blocks
21
Network in network
◮ Influential concept in the deep learning literature [Lin2013].
◮ The authors’ goal was to generate a deeper network without simply stacking more layers.
◮ They replace the linear convolution filters with small perceptron layers:
– It is compatible with the backpropagation logic of neural nets.
– It can itself be a deep model, leading to rich separation between latent features.
◮ There is a ReLU operation after every neuron:
– A richer nonlinear function approximator can serve as a better feature extractor.
Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” 2013.
22
1x1 Convolution
◮ A particular case of the previous concept is the 1x1 convolution.
[Figure: a 1x1x32 filter applied to a 6x6x32 volume yields a 6x6x(#filters) output; e.g. CONV 1x1 with 32 filters + ReLU maps 28x28x192 to 28x28x32.]
◮ If the input had a single channel, the 1 × 1 convolution would correspond to a scalar multiplication.
◮ With a greater number of channels (say, 32), the convolutional filter has 1 × 1 × 32 elements (more than a simple scaling), followed by a non-linear activation.
◮ 1x1 convolutions lead to dimensionality reduction → a feature pooling technique.
– Reduces the number of parameters and hence the overfitting capacity of the network.
◮ FC layers can be regarded as 1x1 convolutions when their input is already 1 × 1 spatially.
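A 1x1 convolution is just a per-pixel matrix multiplication over the channel dimension, which a short NumPy sketch makes explicit (names and shapes are illustrative):

```python
import numpy as np

def conv1x1(x, W, b):
    """1x1 convolution + ReLU: at every pixel, the channel vector is
    multiplied by W. x: (H, W, C_in), W: (C_in, C_out), b: (C_out,)."""
    return np.maximum(x @ W + b, 0.0)

x = np.random.randn(28, 28, 192)        # the 28x28x192 input from the figure
W = np.random.randn(192, 32) * 0.01     # 32 filters of size 1x1x192
b = np.zeros(32)
y = conv1x1(x, W, b)
print(y.shape)                          # (28, 28, 32)
```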
23
Global Average Pooling
◮ Another idea from [Lin2013] is a technique to simplify the last layers of CNNs.
◮ In traditional CNNs, the feature maps of the last convolutional layer are flattened and passed to one or more FC layers, which are then passed to a softmax.
– An estimate says that these last FC layers contain 90% of the parameters of the NN.
◮ Global Average Pooling instead makes the last convolutional layer produce as many feature maps as there are classes to predict.
◮ Then each map is averaged, giving rise to the raw class scores fed to the softmax.
– No new parameters to train (unlike the FC layers), leading to less overfitting.
– Robust to spatial translations of the input.
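Global average pooling itself is a one-liner; a minimal NumPy sketch (shapes are illustrative):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each of the C class feature maps over its spatial dimensions.
    feature_maps: (H, W, C) -> raw class scores of shape (C,)."""
    return feature_maps.mean(axis=(0, 1))

maps = np.random.randn(7, 7, 10)        # one 7x7 map per class, 10 classes
scores = global_average_pooling(maps)
print(scores.shape)                     # (10,)
```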
24
Inception module
◮ The motivation behind inception networks is to use more than a single type of convolutional layer at each stage.
◮ Use 1 × 1, 3 × 3, 5 × 5 convolutional layers, and max-pooling layers in parallel.
◮ All modules use ‘same’ convolutions.
◮ Naïve implementation:
[Figure: naïve inception module: a 28x28x192 input feeds, in parallel, a 1x1 conv (64 ch.), a 3x3 ‘same’ conv (128 ch.), a 5x5 ‘same’ conv (32 ch.), and a 3x3 ‘same’ max-pool (s = 1, 32 ch.); concatenating the branches gives a 28x28x256 output.]
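Channel concatenation determines the module’s output depth; a sketch of the bookkeeping (function name is illustrative):

```python
def inception_output_channels(branch_channels):
    """Channel concatenation: the output depth is the sum of the branch
    depths; spatial size is preserved because every branch uses 'same'
    convolutions with stride 1."""
    return sum(branch_channels)

# the example module: 1x1 (64 ch.), 3x3 (128 ch.), 5x5 (32 ch.), max-pool (32 ch.)
print(inception_output_channels([64, 128, 32, 32]))   # 256 -> 28x28x256 output
```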
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
25
Inception module with dimension reductions
◮ Use 1 × 1 convolutions that reduce the size of the channel dimension. – The number of channels can vary from the input to the output.
[Figure: inception module with dimension reductions. The previous activation feeds four parallel branches: 1x1 CONV; 1x1 CONV then 3x3 CONV; 1x1 CONV then 5x5 CONV; MAX-POOL (3x3, s = 1, same) then 1x1 CONV. The branch outputs are joined by channel concatenation.]
26
GoogLeNet network
◮ The inception network is formed by stacking inception modules.
◮ It includes several auxiliary softmax output units to enforce regularization.
[Figure: full GoogLeNet architecture: a stem of 7x7 and 3x3 convolutions with max-pooling and LocalRespNorm, followed by nine inception modules (each a DepthConcat of 1x1, 3x3, 5x5 convolution and max-pool branches) interleaved with max-pooling, ending in average pooling and an FC layer; two auxiliary classifiers (softmax0, softmax1) branch off intermediate modules, and softmax2 is the final output.]
27
Summary of networks
◮ We are now reaching top-5 error rates lower than human classification error.
28
Object recognition systems
29
Sliding-window detectors
◮ Brute-force approach → several window sizes moved throughout the image.
◮ Patches are cut and warped → passed through a classification CNN.
◮ Pseudo-code:
for window in windows:
    patch = get_patch(image, window)
    results = detector(patch)
30
Sliding window architecture
31
Selective Search
◮ Goal: reduce the number of proposed windows → regions of interest (ROIs).
◮ Start with individual pixels as groups → merge groups by similarity.
– Capture all scales: use a hierarchical algorithm.
– Diversification: multiple strategies that consider all use cases.
– Fast to compute: should not become a bottleneck.
32
R-CNN
◮ Use ROI proposals to feed a CNN.
◮ Pseudo-code:
ROIs = region_proposal(image)
for ROI in ROIs:
    patch = get_patch(image, ROI)
    results = detector(patch)
33
Boundary box regressor
◮ ROI computation is expensive.
◮ In order to reduce computation → simplify the ROI proposal.
◮ Refine the anchors → FC layer and regression loss.
34
Fast R-CNN
◮ R-CNN is slow in training & inference → it repeats the feature extraction ∼2,000 times per image.
◮ Use a feature extractor (a CNN) to extract features for the whole image first.
◮ Warp the patches to a fixed size using ROI pooling and feed them to FC layers.
◮ Pseudo-code:
feature_maps = process(image)
ROIs = region_proposal(image)
for ROI in ROIs:
    patch = roi_pooling(feature_maps, ROI)
    results = detector2(patch)
35
ROI Pooling
◮ Perform max-pooling operations on feature maps for regions of different sizes.
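A minimal NumPy sketch of the idea: split a region into a fixed grid of bins and max-pool each (a simplified single-channel version; real implementations handle fractional bin boundaries):

```python
import numpy as np

def roi_pooling(feature_map, out_h, out_w):
    """Max-pool an arbitrary HxW region into a fixed out_h x out_w grid
    by splitting rows/columns into roughly equal bins."""
    h, w = feature_map.shape
    rows = np.array_split(np.arange(h), out_h)
    cols = np.array_split(np.arange(w), out_w)
    out = np.empty((out_h, out_w))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = feature_map[np.ix_(r, c)].max()
    return out

region = np.arange(36, dtype=float).reshape(6, 6)   # a 6x6 ROI cut from the feature maps
print(roi_pooling(region, 2, 2))                    # [[14. 17.] [32. 35.]]
```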
36
Faster R-CNN
◮ Substitute the region proposal with a Region proposal network (RPN). ◮ For each location in the feature maps, RPN makes k guesses.
37
RPN (Region Proposal Network)
◮ Faster R-CNN uses far more anchors. It deploys 9 anchor boxes: 3 different scales at 3 different aspect ratios. Using 9 anchors per location, it generates 2 × 9 objectness scores and 4 × 9 coordinates per location.
38
Visualizing ROI proposals
1. Perform ROI proposals (RPN or distance algorithm) → feed to CNN.
2. Output boundary box (refinements) and objectness score.
3. Perform per-class non-maximum suppression → removes duplicate objects.
39
Performance for R-CNN methods
◮ Faster R-CNN is much faster still.
40
Mask R-CNN
◮ Mask R-CNN adds another CNN branch to Faster R-CNN to predict masks for the detected regions.
◮ The additional mask branch only “colors” (segments) the pixels of each detected object.
41
ROI Align
◮ A refinement of ROI pooling.
◮ Makes every target cell have the same size.
◮ It also applies interpolation instead of rounding to cell boundaries.
42
Single Shot Detectors
◮ Goal: do not generate ROI proposals.
feature_maps = process(image)
results = detector(feature_maps)  # no more separate step for ROIs
◮ Sliding windows/ROIs require too many shapes to cover most objects, and running a detector to predict class and boundary box for each is expensive.
◮ Single-shot detectors predict both the boundary box and the class at the same time.
◮ These networks are trained end to end → they are very fast, and accuracy increases w.r.t. purpose-oriented subnetworks.
◮ Single-shot detectors often trade accuracy for real-time processing speed.
43
Single Shot Detectors II
◮ ROI based: ◮ Single Shot:
44
YOLO Architecture
◮ Number of predicted parameters on 8 × 8 output feature maps: the output tensor has shape (S, S, B · 5 + C), i.e. 8 × 8 × (B · 5 + 20) values for C = 20 classes.
◮ YOLO has evolved to YOLOv2, YOLO9000 and YOLOv3, with improvements such as multi-scale training, multiple box predictions, location boxes, word-tree search, FPN, etc.
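The output-size formula can be checked with a one-line helper (illustrative sketch; B = 2 is an example value):

```python
def yolo_output_params(S, B, C):
    """YOLO output tensor size: an S x S grid where each cell predicts
    B boxes (4 coordinates + 1 confidence each) plus C class probabilities."""
    return S * S * (B * 5 + C)

# an 8x8 grid, 2 boxes per cell, 20 classes:
print(yolo_output_params(8, 2, 20))    # 8*8*(2*5 + 20) = 1920
```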
45
SSD Architecture
◮ Uses a CNN as a feature extractor → same as Faster R-CNN.
◮ Then adds custom convolutional layers to make predictions.
◮ The previous model can detect large objects only → make independent object detections from multiple feature maps.
46
Feature Pyramid Networks (FPN)
◮ A feature extractor designed to improve accuracy and speed.
◮ It helps to generate higher-quality features.
◮ Higher layers have higher semantic value but lower spatial resolution.
◮ We can mix the information flow between the two.
47
Feature Pyramid Networks (FPN) II
◮ FPNs can work with object detectors → RPNs and classifiers:
48
Face recognition systems
49
Face recognition systems
◮ Verification
– Input: an image of a person to identify and an ID.
– Objective: decide whether the input image corresponds to the ID.
◮ Recognition
– Database of K people.
– Input: an image of a person to identify.
– Objective: identify the person in the database or reject the recognition.
◮ Recognition is a much harder problem than verification for a specified performance.
50
One-shot learning
Verification:
◮ We only have a single photo to learn the characteristics of a given person.
◮ Then, given a new photo, decide whether it shows the same person.
◮ We can construct a similarity function or distance between images: d(img1, img2).
– Then set a threshold τ to balance accuracy and precision.
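A minimal sketch of threshold-based verification, assuming the embeddings f(img) have already been computed by some network (all names and values are illustrative):

```python
import numpy as np

def verify(embedding1, embedding2, tau):
    """One-shot verification: same person iff d(img1, img2) = ||f1 - f2||^2 < tau."""
    d = np.sum((embedding1 - embedding2) ** 2)
    return bool(d < tau)

anchor = np.array([0.10, 0.90, 0.30])   # stored embedding of the known person
probe = np.array([0.12, 0.88, 0.31])    # embedding of the new photo
print(verify(anchor, probe, tau=0.1))   # True: squared distance is 0.0009
```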
51
Siamese network
◮ Build a NN to generate a latent representation of an image.
◮ Perform two independent forward passes on the two inputs, with shared weights.
◮ Construct a loss function based on the distance between latent features:
d(x, y) = ‖f(x) − f(y)‖² = ‖a[L](x) − a[L](y)‖²
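A NumPy sketch of the shared-weight computation (the linear “encoder” stands in for the CNN; names are illustrative):

```python
import numpy as np

def encoder(x, W):
    """Stand-in for the shared CNN: both images pass through the SAME weights."""
    return np.maximum(W @ x, 0.0)

def siamese_distance(x1, x2, W):
    """d(x1, x2) = ||f(x1) - f(x2)||^2 with a single shared encoder f."""
    diff = encoder(x1, W) - encoder(x2, W)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))         # one set of weights, used twice
img = rng.standard_normal(8)
print(siamese_distance(img, img, W))    # 0.0: identical inputs, identical encoding
```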
52
Loss functions
◮ The loss should be small for the same person and large for different people.
◮ Use cross-entropy and define:
f(x, y) = Σ_i w_i |a[L](x)_i − a[L](y)_i| + b_i
◮ χ² loss:
f(x, y) = Σ_i w_i (a[L](x)_i − a[L](y)_i)² / (a[L](x)_i + a[L](y)_i)
– The representations in DeepFace are normalized between 0 and 1 to reduce the sensitivity to illumination changes.
53
Triplet loss
◮ Given three images A, P, N (anchor, positive, negative), require:
‖f(A) − f(P)‖² + α < ‖f(A) − f(N)‖²
◮ Training: minimize over triplets the loss
L(A, P, N) = max(‖f(A) − f(P)‖² − ‖f(A) − f(N)‖² + α, 0)
◮ Evaluation:
d(x, y) = ‖a[L](x) − a[L](y)‖² ≶ τ
◮ Train on 10k pictures of 1k persons.
◮ Need to choose triplets that are “hard to train on”.
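The triplet loss can be sketched in NumPy (embedding values are illustrative; the second example mimics a “hard” negative that violates the margin):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    d_pos = np.sum((f_a - f_p) ** 2)
    d_neg = np.sum((f_a - f_n) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

f_a = np.array([0.0, 0.0])
f_p = np.array([0.1, 0.0])      # positive: close to the anchor
f_n = np.array([1.0, 1.0])      # easy negative: margin satisfied, zero loss
print(triplet_loss(f_a, f_p, f_n))        # 0.0
f_hard = np.array([0.2, 0.0])   # hard negative: too close to the anchor
print(triplet_loss(f_a, f_p, f_hard))     # 0.01 - 0.04 + 0.2 = 0.17
```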
Florian Schroff, Dmitry Kalenichenko, and James Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, 2015.