Convolutional Neural Networks
Rachel Hu and Zhi Zhang, Amazon AI (d2l.ai)


SLIDE 1

Convolutional Neural Networks

Rachel Hu and Zhi Zhang Amazon AI

SLIDE 2

Outline

  • GPUs
  • Convolutions
  • Pooling, Padding and Stride
  • Convolutional Neural Networks (LeNet)
  • Deep ConvNets (AlexNet)
  • Networks using Blocks (VGG)
  • Residual Neural Networks (ResNet)
SLIDE 3

GPUs

NVIDIA Turing TU102

SLIDE 4

Intel i7-6700K

  • 4 physical cores
  • Per core: 64KB L1 cache, 256KB L2 cache
  • Shared 8MB L3 cache
  • 30 GB/s bandwidth to RAM
SLIDE 5

GPU performance

SLIDE 6

High-end Gaming / Deep Learning PC

  • DDR4 RAM: 32 GB
  • Nvidia Titan RTX: 12 TFLOPS (130 TFLOPS for FP16 on Tensor Cores), 24 GB
  • Intel i7: 0.15 TFLOPS

SLIDE 7

High-end Gaming / Deep Learning PC

  • DDR4 RAM: 32 GB
  • Nvidia Titan RTX: 12 TFLOPS (130 TFLOPS for FP16 on Tensor Cores), 24 GB
  • Intel i7: 0.15 TFLOPS

ctx = npx.cpu()
ctx = npx.gpu(0)
x.copyto(ctx)

SLIDE 8

GPU Notebook

SLIDE 9

From fully connected to convolutions

SLIDE 10

Classifying Dogs and Cats in Images

  • Use a good camera: an RGB image easily has 36M elements
  • A single-hidden-layer MLP with hidden size 100 has 3.6 billion parameters
  • That exceeds the population of dogs and cats on earth (900M dogs + 600M cats)

SLIDE 11

Flashback - Network with one hidden layer

36M features → 100 neurons

h = σ(Wx + b)

3.6B parameters ≈ 14 GB
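The parameter and memory figures above follow from simple arithmetic; a quick check in plain Python (float32 storage assumed):

```python
# Dense layer from a 36M-element image to 100 hidden units.
input_features = 36_000_000
hidden_units = 100

params = hidden_units * input_features + hidden_units  # weights W plus biases b
print(params)               # 3,600,000,100 -- about 3.6B parameters

bytes_needed = params * 4   # 4 bytes per float32 parameter
print(bytes_needed / 1e9)   # ~14.4 GB, matching the slide's 14 GB
```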

SLIDE 12

Where is Waldo?

SLIDE 13

Two Principles

  • Translation invariance
  • Locality

SLIDE 14

Rethinking Dense Layers

  • Reshape input and output into matrices (width, height)
  • Reshape weights into a 4-D tensor, mapping (h, w) to (h′, w′)

h_{i,j} = Σ_{k,l} w_{i,j,k,l} x_{k,l} = Σ_{a,b} v_{i,j,a,b} x_{i+a,j+b}

where V is a re-indexing of W such that v_{i,j,a,b} = w_{i,j,i+a,j+b}

SLIDE 15

Idea #1 - Translation Invariance

  • A shift in x also leads to a shift in h
  • v should not depend on (i, j); fix this via v_{i,j,a,b} = v_{a,b}

h_{i,j} = Σ_{a,b} v_{i,j,a,b} x_{i+a,j+b}   becomes   h_{i,j} = Σ_{a,b} v_{a,b} x_{i+a,j+b}

That's a 2-D convolution (strictly speaking, a cross-correlation)

SLIDE 16

Idea #2 - Locality

  • We shouldn't look very far from x_{i,j} in order to assess what's going on at h_{i,j}
  • Outside a range Δ the parameters vanish: v_{a,b} = 0 for |a|, |b| > Δ

h_{i,j} = Σ_{a=−Δ}^{Δ} Σ_{b=−Δ}^{Δ} v_{a,b} x_{i+a,j+b}

SLIDE 17

Convolution

SLIDE 18

2-D Cross Correlation

(vdumoulin@ Github)

0 × 0 + 1 × 1 + 3 × 2 + 4 × 3 = 19
1 × 0 + 2 × 1 + 4 × 2 + 5 × 3 = 25
3 × 0 + 4 × 1 + 6 × 2 + 7 × 3 = 37
4 × 0 + 5 × 1 + 7 × 2 + 8 × 3 = 43
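The four sums above are exactly what a stride-1, no-padding cross-correlation computes. A minimal NumPy sketch (the helper name corr2d is ours):

```python
import numpy as np

def corr2d(X, K):
    """2-D cross-correlation: slide K over X with stride 1 and no padding."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = np.arange(9).reshape(3, 3)   # [[0,1,2],[3,4,5],[6,7,8]]
K = np.arange(4).reshape(2, 2)   # [[0,1],[2,3]]
Y = corr2d(X, K)
print(Y)                         # [[19. 25.] [37. 43.]]
```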

SLIDE 19

2-D Convolution Layer

  • Input matrix X: n_h × n_w
  • Kernel matrix W: k_h × k_w
  • Scalar bias b
  • Output matrix Y: (n_h − k_h + 1) × (n_w − k_w + 1)
  • W and b are learnable parameters

Y = X ⋆ W + b

SLIDE 20

Examples

Edge detection, sharpen, Gaussian blur (Wikipedia)

SLIDE 21

Examples

(Rob Fergus)

SLIDE 22

Convolutions Notebook

SLIDE 23

Padding and Stride

SLIDE 24

Padding

  • Given a 32 x 32 input image
  • Apply convolutional layer with 5 x 5 kernel
  • 28 x 28 output with 1 layer
  • 4 x 4 output with 7 layers
  • Shape decreases faster with larger kernels
  • Shape reduces from n_h × n_w to (n_h − k_h + 1) × (n_w − k_w + 1)

SLIDE 25

Padding

Padding adds rows/columns around input

0 × 0 + 0 × 1 + 0 × 2 + 0 × 3 = 0

SLIDE 26

Padding

  • With p_h padding rows and p_w padding columns, the output shape is (n_h − k_h + p_h + 1) × (n_w − k_w + p_w + 1)
  • A common choice is p_h = k_h − 1 and p_w = k_w − 1
  • Odd k_h: pad p_h/2 on both sides
  • Even k_h: pad ⌈p_h/2⌉ on top, ⌊p_h/2⌋ on bottom

SLIDE 27

Stride

  • Even with padding, the shape shrinks only linearly with the number of layers
  • Given a 224 x 224 input with a 5 x 5 kernel, it takes 55 layers to reduce the shape to 4 x 4
  • That requires a large amount of computation
SLIDE 28

Stride

  • Stride is the number of rows/columns stepped per slide of the window

Strides of 3 (height) and 2 (width):
0 × 0 + 0 × 1 + 1 × 2 + 2 × 3 = 8
0 × 0 + 6 × 1 + 0 × 2 + 0 × 3 = 6

SLIDE 29

Stride

  • Given stride s_h for the height and stride s_w for the width, the output shape is
    ⌊(n_h − k_h + p_h + s_h)/s_h⌋ × ⌊(n_w − k_w + p_w + s_w)/s_w⌋
  • With p_h = k_h − 1 and p_w = k_w − 1, this becomes
    ⌊(n_h + s_h − 1)/s_h⌋ × ⌊(n_w + s_w − 1)/s_w⌋
  • If the input height/width are divisible by the strides:
    (n_h/s_h) × (n_w/s_w)
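The floor formula above is easy to wrap in a helper; a small sketch in Python (the function name is ours):

```python
def conv_out(n, k, p=0, s=1):
    """Output size along one dimension: floor((n - k + p + s) / s)."""
    return (n - k + p + s) // s

# 32x32 input, 5x5 kernel, no padding: shape shrinks to 28x28
print(conv_out(32, 5))            # 28
# "same" padding p = k - 1 keeps the size
print(conv_out(32, 5, p=4))       # 32
# stride 2 with same padding halves it
print(conv_out(32, 5, p=4, s=2))  # 16
```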

SLIDE 30

Multiple Input and Output Channels

SLIDE 31

Multiple Input Channels

  • Color image may have three RGB channels
  • Converting to grayscale loses information
SLIDE 33

Multiple Input Channels

  • Have a kernel for each channel, and then sum the results over channels

(1 × 1 + 2 × 2 + 4 × 3 + 5 × 4) + (0 × 0 + 1 × 1 + 3 × 2 + 4 × 3) = 37 + 19 = 56
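The 56 above can be reproduced with a small NumPy sketch: cross-correlate each channel with its own kernel and sum (helper names are ours):

```python
import numpy as np

def corr2d(X, K):
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in(X, K):
    """Per-channel cross-correlation, summed over the c_i input channels."""
    return sum(corr2d(x, k) for x, k in zip(X, K))

X = np.array([np.arange(9).reshape(3, 3),        # channel 0: values 0..8
              np.arange(1, 10).reshape(3, 3)])   # channel 1: values 1..9
K = np.array([[[0, 1], [2, 3]],
              [[1, 2], [3, 4]]])
out = corr2d_multi_in(X, K)
print(out[0, 0])   # 56.0, matching the slide
```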

SLIDE 34

Multiple Input Channels

  • Input X: c_i × n_h × n_w
  • Kernel W: c_i × k_h × k_w
  • Output Y: m_h × m_w

Y = Σ_i X_{i,:,:} ⋆ W_{i,:,:}   (sum over the c_i input channels)

SLIDE 35

Multiple Output Channels

  • No matter how many input channels, so far we always get a single output channel
  • We can have multiple 3-D kernels, each one generating its own output channel
  • Input X: c_i × n_h × n_w
  • Kernel W: c_o × c_i × k_h × k_w
  • Output Y: c_o × m_h × m_w

Y_{i,:,:} = X ⋆ W_{i,:,:,:}   for i = 1, …, c_o
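With c_o such 3-D kernels stacked into a 4-D weight tensor, the output gains a channel dimension; a shape-level NumPy sketch (the function name is ours):

```python
import numpy as np

def corr2d_multi_in_out(X, K):
    """X: (c_i, n_h, n_w), K: (c_o, c_i, k_h, k_w) -> Y: (c_o, m_h, m_w)."""
    co, _, kh, kw = K.shape
    mh, mw = X.shape[1] - kh + 1, X.shape[2] - kw + 1
    Y = np.zeros((co, mh, mw))
    for o in range(co):             # one 3-D kernel per output channel
        for i in range(mh):
            for j in range(mw):
                Y[o, i, j] = (X[:, i:i + kh, j:j + kw] * K[o]).sum()
    return Y

X = np.random.rand(3, 8, 8)        # c_i = 3 input channels
K = np.random.rand(16, 3, 5, 5)    # c_o = 16 output channels
print(corr2d_multi_in_out(X, K).shape)   # (16, 4, 4)
```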

SLIDE 36

Multiple Input/Output Channels

  • Each output channel may recognize a particular pattern
  • The input-channel kernels recognize and combine patterns in the input

SLIDE 37

1 x 1 Convolutional Layer

k_h = k_w = 1 is a popular choice. It doesn't recognize spatial patterns, but fuses channels.
Equivalent to a dense layer with an n_h n_w × c_i input and a c_o × c_i weight.
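The equivalence to a dense layer can be checked numerically; a sketch in NumPy (variable names are ours):

```python
import numpy as np

ci, co, nh, nw = 3, 5, 4, 4
X = np.random.rand(ci, nh, nw)
W = np.random.rand(co, ci)       # a 1x1 kernel is just a c_o x c_i matrix

# 1x1 convolution: mix channels independently at every pixel
Y_conv = np.einsum('oc,chw->ohw', W, X)

# Same computation as a dense layer on the (n_h*n_w) x c_i matrix of pixels
Y_dense = (X.reshape(ci, -1).T @ W.T).T.reshape(co, nh, nw)

print(np.allclose(Y_conv, Y_dense))   # True
```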

SLIDE 38

2-D Convolution Layer Summary

  • Input X: c_i × n_h × n_w
  • Kernel W: c_o × c_i × k_h × k_w
  • Bias B: c_o × c_i
  • Output Y: c_o × m_h × m_w
    Y = X ⋆ W + B
  • Complexity (number of floating point operations, FLOP): O(c_i c_o k_h k_w m_h m_w)
    For c_i = c_o = 100, k_h = k_w = 5, m_h = m_w = 64: about 1 GFLOP per layer
  • 10 layers, 1M examples: 10 PFLOP
    (CPU at 0.15 TFLOPS: 18h; GPU at 12 TFLOPS: 14min)
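The timing figures follow directly from the complexity formula (counting one multiply-accumulate as one FLOP, as the slide does):

```python
ci = co = 100
kh = kw = 5
mh = mw = 64

flops = ci * co * kh * kw * mh * mw   # ~1.02e9: about 1 GFLOP per layer
total = flops * 10 * 1_000_000        # 10 layers x 1M examples: ~10 PFLOP

print(total / 0.15e12 / 3600)         # CPU at 0.15 TFLOPS: ~19 hours
print(total / 12e12 / 60)             # GPU at 12 TFLOPS: ~14 minutes
```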

SLIDE 39

Pooling Layer

SLIDE 40

Pooling

  • Convolution is sensitive to position
  • Detect vertical edges
  • We need some degree of invariance to translation
  • Lighting, object positions, scales, and appearance vary among images

(Figure: input X and edge-detector output Y; the output becomes 0 after a 1-pixel shift)

SLIDE 41

2-D Max Pooling

  • Returns the maximal value in the sliding window

max(0,1,3,4) = 4

SLIDE 42

2-D Max Pooling

  • Returns the maximal value in the sliding window

Conv output → 2 x 2 max pooling: the vertical edge detection becomes tolerant to a 1-pixel shift

SLIDE 43

Padding, Stride, and Multiple Channels

  • Pooling layers have similar padding and stride as convolutional layers
  • No learnable parameters
  • Apply pooling to each input channel to obtain the corresponding output channel

#output channels = #input channels

SLIDE 44

Average Pooling

  • Max pooling: the strongest pattern signal in a window
  • Average pooling: replace max with mean in max pooling
  • The average signal strength in a window

Max pooling vs. average pooling
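Both pooling modes fit in one small helper; a NumPy sketch with stride-1 windows (pool2d is our name; frameworks typically default the stride to the window size):

```python
import numpy as np

def pool2d(X, k, mode='max'):
    """k x k pooling window, stride 1, no padding."""
    Y = np.zeros((X.shape[0] - k + 1, X.shape[1] - k + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i:i + k, j:j + k]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y

X = np.arange(9).reshape(3, 3)
print(pool2d(X, 2))          # [[4. 5.] [7. 8.]]  -- max(0,1,3,4) = 4
print(pool2d(X, 2, 'avg'))   # [[2. 3.] [5. 6.]]  -- mean(0,1,3,4) = 2
```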

SLIDE 45

Pooling Notebook

SLIDE 46

LeNet

SLIDE 47

Handwritten Digit Recognition

SLIDE 48

MNIST

  • Centered and scaled
  • 60,000 training examples
  • 10,000 test examples
  • 28 x 28 grayscale images
  • 10 classes
SLIDE 49

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, 1998: Gradient-based learning applied to document recognition

SLIDE 50

Expensive if we have many outputs
SLIDE 51

LeNet Notebook

SLIDE 52

AlexNet

SLIDE 53

AlexNet

  • AlexNet won the ImageNet competition in 2012
  • Deeper and bigger LeNet
  • Key modifications
  • Dropout (regularization)
  • ReLU (training)
  • MaxPooling
  • Paradigm shift for computer vision

Before: manually engineered features + SVM
After: features learned by a CNN + softmax regression

SLIDE 54

AlexNet Architecture

LeNet AlexNet

Larger kernel size and stride because of the increased image size, and more output channels.

Larger pool size, change to max pooling

SLIDE 55

AlexNet Architecture

LeNet AlexNet

More output channels; 3 additional convolutional layers

SLIDE 56

AlexNet Architecture

LeNet AlexNet

Increase hidden size from 120 to 4096; 1000-class output

SLIDE 57

More Tricks

  • Change activation function from sigmoid to ReLU (no more vanishing gradient)
  • Add a dropout layer after the two hidden dense layers (better robustness / regularization)

  • Data augmentation
SLIDE 58

Complexity

          #parameters           FLOP
          AlexNet   LeNet       AlexNet   LeNet
Conv1     35K       150         101M      1.2M
Conv2     614K      2.4K        415M      2.4M
Conv3-5   3M        —           445M      —
Dense1    26M       0.48M       26M       0.48M
Dense2    16M       0.1M        16M       0.1M
Total     46M       0.6M        1G        4M
Increase  11x       1x          250x      1x

SLIDE 59

AlexNet Notebook

SLIDE 60

Inception

SLIDE 61

Picking the best convolution …

LeNet, AlexNet, VGG, and NiN each commit to particular choices: 1x1, 3x3, or 5x5 convolutions, max pooling, multiple 1x1 convolutions

SLIDE 62

Why choose? Just pick them all.

SLIDE 63

Inception Blocks

4 paths extract information from different aspects, then concatenate along the output-channel dimension

The paths extract features with convolutions of different spatial sizes; one path extracts spatial information with pooling; all paths keep the same width/height as the input

SLIDE 64

Inception Blocks

  • Allocate different capacity to each path
  • Reduce the channel size to lower model capacity

The first inception block, with channel sizes specified

SLIDE 65

Inception Blocks

           #parameters   FLOPS
Inception  0.16M         128M
3x3 Conv   0.44M         346M
5x5 Conv   1.22M         963M

Inception blocks have fewer parameters and lower computational complexity than a single 3x3 or 5x5 convolutional layer

  • Mix of different functions (powerful function class)
  • Memory and compute efficiency (good generalization)
SLIDE 66

GoogLeNet

  • 5 stages with 9 inception blocks

Stage 1 → Stage 2 → Stage 3 (2 blocks) → Stage 4 (5 blocks) → Stage 5 (2 blocks) → Output

SLIDE 67

The many flavors of Inception Networks

  • Inception-BN (v2) - Add batch normalization
  • Inception-V3 - Modified the inception block
  • Replace 5x5 by multiple 3x3 convolutions
  • Replace 5x5 by 1x7 and 7x1 convolutions
  • Replace 3x3 by 1x3 and 3x1 convolutions
  • Generally deeper stack
  • Inception-V4 - Add residual connections (more later)
SLIDE 68

GluonCV Model Zoo: https://gluon-cv.mxnet.io/model_zoo/classification.html (Inception V3)

SLIDE 69

Batch Normalization


SLIDE 70

Batch Normalization

  • The loss occurs at the last layer
  • Last layers learn quickly
  • Data is inserted at the bottom layer
  • When bottom layers change, everything changes
  • Last layers need to relearn many times
  • Slow convergence
  • This is similar to covariate shift

Can we avoid changing the last layers while learning the first layers?

SLIDE 71

Batch Normalization

  • Can we avoid changing the last layers while learning the first layers?
  • Fix the mean and variance, and adjust them separately:

μ_B = (1/|B|) Σ_{i∈B} x_i
σ_B² = (1/|B|) Σ_{i∈B} (x_i − μ_B)² + ε

x_{i+1} = γ (x_i − μ_B)/σ_B + β

where γ and β are learnable scale and shift parameters
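The normalization step is a few lines of NumPy; a per-feature sketch over a minibatch (training-mode statistics only; the running averages used at inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale by gamma, beta."""
    mu = x.mean(axis=0)                    # per-feature minibatch mean
    var = x.var(axis=0)                    # per-feature minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=7.0, scale=3.0, size=(64, 10))   # shifted, scaled batch
y = batch_norm(x, gamma=1.0, beta=0.0)
print(y.mean(), y.std())                   # ~0 and ~1
```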

SLIDE 72

This was the original motivation …

SLIDE 73

What Batch Norm Really Does

  • Doesn't really reduce covariate shift (Lipton et al., 2018)
  • Regularization by noise injection
  • Random shift per minibatch
  • Random scale per minibatch
  • No need to mix with dropout (both are capacity control)
  • Ideal minibatch size of 64 to 256

x_{i+1} = γ (x_i − μ̂_B)/σ̂_B + β

The empirical mean μ̂_B acts as a random offset, the empirical variance σ̂_B² as a random scale

SLIDE 74

Residual Networks

SLIDE 75

Does adding layers improve accuracy?


SLIDE 76

Residual Networks

  • Adding a layer changes the function class
  • We want adding a layer to strictly add to the function class
  • 'Taylor expansion'-style parametrization: f(x) = x + g(x)

He et al., 2015

SLIDE 77

ResNet Block in detail

SLIDE 78

In code

def forward(self, X):
    Y = npx.relu(self.bn1(self.conv1(X)))
    Y = self.bn2(self.conv2(Y))
    if self.conv3:
        X = self.conv3(X)
    return npx.relu(Y + X)
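Stripped of the convolutions and batch norm, the skip connection itself is easy to illustrate; a NumPy sketch of f(x) = relu(x + g(x)) (names are ours, and this is a simplification of the block above):

```python
import numpy as np

def residual_forward(X, g):
    """f(X) = relu(X + g(X)): the identity path is added to the residual branch."""
    return np.maximum(X + g(X), 0)

X = np.abs(np.random.randn(4, 8))   # nonnegative input, so relu(X) == X

# If the residual branch g outputs zero, the whole block is exactly the identity:
out = residual_forward(X, lambda x: np.zeros_like(x))
print(np.allclose(out, X))          # True: adding the layer cannot shrink the function class
```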

SLIDE 79

The many flavors of ResNet blocks

Try every permutation

SLIDE 80

ResNet Module

  • Downsample per module (stride=2)
  • Enforce some nontrivial nonlinearity per module (via 1x1 convolution)
  • Stack up in blocks

blk = nn.Sequential()
for i in range(num_residuals):
    if i == 0 and not first_block:
        blk.add(Residual(num_channels, use_1x1conv=True, strides=2))
    else:
        blk.add(Residual(num_channels))

SLIDE 81

Putting it all together

  • Same block structure as e.g. VGG or GoogLeNet
  • Residual connections add expressiveness
  • Pooling/stride for dimensionality reduction
  • Batch normalization for capacity control

… train it at scale …

SLIDE 82

GluonCV Model Zoo: https://gluon-cv.mxnet.io/model_zoo/classification.html (ResNet 152)

SLIDE 83

Jupyter Notebook

SLIDE 84

More Ideas

SLIDE 85

DenseNet (Huang et al., 2016)

  • ResNet combines x and f(x) by addition
  • DenseNet uses a higher-order 'Taylor series' expansion: concatenate instead of add
  • Occasionally need to reduce the resolution (transition layer)

x_{i+1} = [x_i, f_i(x_i)]

x_1 = x
x_2 = [x, f_1(x)]
x_3 = [x, f_1(x), f_2([x, f_1(x)])]
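Concatenation (rather than addition) is the whole trick; a toy sketch of how the feature count grows (names are ours):

```python
import numpy as np

def dense_forward(x, layers):
    """Each layer's output is concatenated onto its input, never added."""
    for f in layers:
        x = np.concatenate([x, f(x)])
    return x

x = np.ones(4)
# Two toy layers that each emit 4 new features: width grows 4 -> 8 -> 12
out = dense_forward(x, [lambda v: v[:4] * 2, lambda v: v[:4] * 3])
print(out.shape)   # (12,)
```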

SLIDE 86

Squeeze-Excite Net (Hu et al., 2017)

  • Learn global weighting function per channel
  • Allows for fast information transfer between pixels in different locations of the image

SLIDE 87

Separable Convolutions - all channels separate

  • Break up channels to the extreme: no mixing between channels
  • Parameters: k_h ⋅ k_w ⋅ c_i ⋅ c_o → k_h ⋅ k_w ⋅ c
  • Computation: m_h ⋅ m_w ⋅ k_h ⋅ k_w ⋅ c_i ⋅ c_o → m_h ⋅ m_w ⋅ k_h ⋅ k_w ⋅ c
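The savings come from dropping the c_i × c_o channel-mixing factor; a quick count for an assumed 3x3 layer with 128 channels:

```python
kh = kw = 3
ci = co = c = 128

full = kh * kw * ci * co   # standard convolution: 147,456 parameters
depthwise = kh * kw * c    # one k x k filter per channel: 1,152 parameters
print(full // depthwise)   # 128x fewer parameters (but no channel mixing)
```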

SLIDE 88

ShuffleNet (Zhang et al., 2018)

  • ResNeXt breaks the convolution into channel groups
  • ShuffleNet mixes channels by grouped shuffling (very efficient for mobile)
SLIDE 89

Outline

  • GPUs
  • Convolutions
  • Pooling, Padding and Stride
  • Convolutional Neural Networks (LeNet)
  • Deep ConvNets (AlexNet)
  • Networks using Blocks (VGG)
  • Residual Neural Networks (ResNet)