
Convolutional Neural Networks, Rachel Hu and Zhi Zhang, Amazon AI

  1. Convolutional Neural Networks Rachel Hu and Zhi Zhang Amazon AI d2l.ai

  2. Outline
     • GPUs
     • Convolutions
     • Pooling, Padding and Stride
     • Convolutional Neural Networks (LeNet)
     • Deep ConvNets (AlexNet)
     • Networks using Blocks (VGG)
     • Residual Neural Networks (ResNet)

  3. GPUs: NVIDIA Turing TU102

  4. Intel i7-6700K
     • 4 physical cores
     • Per core: 64 KB L1 cache, 256 KB L2 cache
     • Shared 8 MB L3 cache
     • 30 GB/s to RAM

  5. GPU performance

  6. High-end Gaming / Deep Learning PC
     • CPU: Intel i7, 0.15 TFLOPS, 32 GB DDR4
     • GPU: Nvidia Titan RTX, 12 TFLOPS (130 TFLOPS for FP16 Tensor Cores), 24 GB

  7. High-end Gaming / Deep Learning PC
     • CPU: Intel i7, 0.15 TFLOPS, 32 GB DDR4; selected with ctx = npx.cpu()
     • GPU: Nvidia Titan RTX, 12 TFLOPS (130 TFLOPS for FP16 Tensor Cores), 24 GB; selected with ctx = npx.gpu(0)
     • Copy data to a device with x.copyto(ctx)

  8. GPU Notebook

  9. From fully connected to convolutions

  10. Classifying Dogs and Cats in Images
      • Use a good camera
      • RGB image has 36M elements
      • A single-hidden-layer MLP with 100 hidden units has 3.6 billion parameters
      • That exceeds the population of dogs and cats on earth (900M dogs + 600M cats)

  11. Flashback - Network with one hidden layer
      • 36M features
      • 100 neurons
      • 3.6B parameters = 14 GB
        $h = \sigma(Wx + b)$
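
The arithmetic behind these figures can be sketched in a few lines. The 12-megapixel camera and the 4-byte (float32) parameter size are assumptions chosen to reproduce the 36M and 14 GB figures; the slide does not state them.

```python
# Assumption: a 12-megapixel RGB image, i.e. 12M pixels x 3 channels = 36M inputs.
pixels = 12_000_000
channels = 3
hidden_units = 100
bytes_per_param = 4  # assumption: parameters stored as float32

inputs = pixels * channels
weights = inputs * hidden_units            # weight matrix W of the hidden layer
size_gb = weights * bytes_per_param / 1e9  # decimal gigabytes

print(f"{inputs:,} inputs, {weights:,} weights, {size_gb:.1f} GB")
```

Biases and the output layer are ignored here because they are negligible next to the 3.6B first-layer weights.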

  12. Where is Waldo?

  13. Two Principles
      • Translation Invariance
      • Locality

  14. Rethinking Dense Layers
      • Reshape inputs and outputs into matrices (width, height)
      • Reshape weights into 4-D tensors mapping (h, w) to (h', w')
        $h_{i,j} = \sum_{k,l} w_{i,j,k,l} \, x_{k,l} = \sum_{a,b} v_{i,j,a,b} \, x_{i+a,j+b}$
      • V re-indexes W such that $v_{i,j,a,b} = w_{i,j,i+a,j+b}$

  15. Idea #1 - Translation Invariance
        $h_{i,j} = \sum_{a,b} v_{i,j,a,b} \, x_{i+a,j+b}$
      • A shift in x also leads to a shift in h
      • v should not depend on (i, j). Fix via $v_{i,j,a,b} = v_{a,b}$:
        $h_{i,j} = \sum_{a,b} v_{a,b} \, x_{i+a,j+b}$
      • That’s a 2-D cross-correlation (often loosely called a convolution)

  16. Idea #2 - Locality
        $h_{i,j} = \sum_{a,b} v_{a,b} \, x_{i+a,j+b}$
      • We shouldn’t look very far from x(i,j) in order to assess what’s going on at h(i,j)
      • Outside that range the parameters vanish: $v_{a,b} = 0$ for $|a|, |b| > \Delta$
        $h_{i,j} = \sum_{a=-\Delta}^{\Delta} \sum_{b=-\Delta}^{\Delta} v_{a,b} \, x_{i+a,j+b}$
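
A small sketch of the two ideas together: one shared local kernel is slid over the input, and shifting the input shifts the response by the same offset. The 5 x 5 input, the kernel values, and the helper names `corr2d` and `shift` are illustrative assumptions. For simplicity the window offsets run over 0..k-1 rather than the slide's -Δ..Δ.

```python
def corr2d(X, K):
    """Valid 2-D cross-correlation of nested-list matrices X and K."""
    kh, kw = len(K), len(K[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    return [[sum(X[i + a][j + b] * K[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def shift(X, di, dj):
    """Shift X down by di rows and right by dj columns, filling with zeros."""
    h, w = len(X), len(X[0])
    return [[X[i - di][j - dj] if 0 <= i - di < h and 0 <= j - dj < w else 0
             for j in range(w)]
            for i in range(h)]

# Hypothetical input: a small blob near the top-left corner.
X = [[0, 0, 0, 0, 0],
     [0, 1, 2, 0, 0],
     [0, 3, 4, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
# One shared 2x2 kernel v[a][b]; it does not depend on the position (i, j).
V = [[1, -1],
     [2, 0]]

H = corr2d(X, V)
H_shifted = corr2d(shift(X, 1, 1), V)

# The response to the shifted input is the shifted response.
print(H_shifted == shift(H, 1, 1))
```

The equality holds here because the blob stays away from the border; near the edges, values that slide out of the frame are lost.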

  17. Convolution

  18. 2-D Cross Correlation (illustration: vdumoulin @ GitHub)
      0 × 0 + 1 × 1 + 3 × 2 + 4 × 3 = 19,
      1 × 0 + 2 × 1 + 4 × 2 + 5 × 3 = 25,
      3 × 0 + 4 × 1 + 6 × 2 + 7 × 3 = 37,
      4 × 0 + 5 × 1 + 7 × 2 + 8 × 3 = 43.
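
The four sums above can be reproduced with a minimal cross-correlation sketch. The 3 x 3 input and 2 x 2 kernel are inferred from the products in the sums, and the function name `corr2d` is an illustrative choice.

```python
def corr2d(X, K):
    """Valid 2-D cross-correlation: slide K over X and sum the elementwise products."""
    kh, kw = len(K), len(K[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    Y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            Y[i][j] = sum(X[i + a][j + b] * K[a][b]
                          for a in range(kh) for b in range(kw))
    return Y

# Input and kernel inferred from the worked sums on the slide.
X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]
K = [[0, 1],
     [2, 3]]

Y = corr2d(X, K)
print(Y)  # [[19, 25], [37, 43]]
```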

  19. 2-D Convolution Layer
      • Input matrix X: $n_h \times n_w$
      • Kernel matrix W: $k_h \times k_w$
      • b: scalar bias
      • Output matrix Y: $(n_h - k_h + 1) \times (n_w - k_w + 1)$
        $Y = X \star W + b$
      • W and b are learnable parameters

  20. Examples: Edge Detection, Sharpen, Gaussian Blur (wikipedia)

  21. Examples (Rob Fergus)

  22. Convolutions Notebook

  23. Padding and Stride

  24. Padding
      • Given a 32 x 32 input image
      • Apply convolutional layer with 5 x 5 kernel
        • 28 x 28 output with 1 layer
        • 4 x 4 output with 7 layers
      • Shape decreases faster with larger kernels
      • Shape reduces from $n_h \times n_w$ to $(n_h - k_h + 1) \times (n_w - k_w + 1)$
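
The shrinkage can be checked by applying the size rule repeatedly. This is a sketch assuming stride 1 and no padding; `layers_until` is a hypothetical helper name.

```python
def layers_until(size, kernel, target):
    """Count unpadded, stride-1 conv layers needed to shrink `size` to `target` or below.

    Returns (number_of_layers, final_size) for one spatial axis.
    """
    layers = 0
    while size > target:
        size = size - kernel + 1  # each layer removes kernel - 1 rows/columns
        layers += 1
    return layers, size

print(layers_until(32, 5, 28))  # (1, 28)
print(layers_until(32, 5, 4))   # (7, 4)
```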

  25. Padding
      • Padding adds rows/columns around input
        0 × 0 + 0 × 1 + 0 × 2 + 0 × 3 = 0

  26. Padding
      • Padding $p_h$ rows and $p_w$ columns, output shape will be
        $(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1)$
      • A common choice is $p_h = k_h - 1$ and $p_w = k_w - 1$
        • Odd $k_h$: pad $p_h/2$ on both sides
        • Even $k_h$: pad $\lceil p_h/2 \rceil$ on top, $\lfloor p_h/2 \rfloor$ on bottom
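
A short sketch of the padding rule along one axis, assuming stride 1. The helper names and the 32-pixel input size are illustrative.

```python
import math

def split_padding(p):
    """Split total padding p into (top, bottom) = (ceil(p/2), floor(p/2))."""
    return math.ceil(p / 2), p // 2

def padded_output_size(n, k, p):
    """Output size along one axis: n - k + p + 1, with p the total padding."""
    return n - k + p + 1

for k in (3, 4, 5):
    p = k - 1  # the common choice that keeps the output size equal to the input size
    print(k, split_padding(p), padded_output_size(32, k, p))
```

For an odd kernel size the two halves are equal; for an even one the extra row goes on top.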

  27. Stride
      • Unpadded convolutions reduce the shape only linearly with the number of layers
      • Given a 224 x 224 input with a 5 x 5 kernel, it takes 55 layers to reduce the shape to 4 x 4 (each layer removes 4 rows and columns)
      • Requires a large amount of computation

  28. Stride
      • Stride is the #rows/#columns per slide
      • Strides of 3 and 2 for height and width:
        0 × 0 + 0 × 1 + 1 × 2 + 2 × 3 = 8
        0 × 0 + 6 × 1 + 0 × 2 + 0 × 3 = 6

  29. Stride
      • Given stride $s_h$ for the height and stride $s_w$ for the width, the output shape is
        $\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor$
      • With $p_h = k_h - 1$ and $p_w = k_w - 1$:
        $\lfloor (n_h + s_h - 1)/s_h \rfloor \times \lfloor (n_w + s_w - 1)/s_w \rfloor$
      • If input height/width are divisible by strides:
        $(n_h/s_h) \times (n_w/s_w)$
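
The shape formula and the strided example (strides of 3 and 2, one row/column of zeros on every side) can be checked together. This is a sketch: the 3 x 3 input and 2 x 2 kernel are inferred from the worked sums in the stride example, and both function names are hypothetical.

```python
def conv_output_size(n, k, p, s):
    """floor((n - k + p + s) / s) along one axis; p is the total padding."""
    return (n - k + p + s) // s

def corr2d_pad_stride(X, K, pad, stride):
    """Cross-correlation with `pad` zeros on every side and strides (s_h, s_w)."""
    s_h, s_w = stride
    kh, kw = len(K), len(K[0])
    width = len(X[0]) + 2 * pad
    # Build the zero-padded input P.
    P = ([[0] * width for _ in range(pad)]
         + [[0] * pad + list(row) + [0] * pad for row in X]
         + [[0] * width for _ in range(pad)])
    out_h = (len(P) - kh) // s_h + 1
    out_w = (width - kw) // s_w + 1
    return [[sum(P[i * s_h + a][j * s_w + b] * K[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]
K = [[0, 1],
     [2, 3]]

Y = corr2d_pad_stride(X, K, pad=1, stride=(3, 2))
print(Y)  # [[0, 8], [6, 8]]

# Total padding is 2 per axis (1 on each side); both formulas give 2.
print(conv_output_size(3, 2, 2, 3), conv_output_size(3, 2, 2, 2))  # 2 2
```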

  30. Multiple Input and Output Channels

  31. Multiple Input Channels
      • Color image may have three RGB channels
      • Converting to grayscale loses information

  32. Multiple Input Channels
      • Color image may have three RGB channels
      • Converting to grayscale loses information

  33. Multiple Input Channels
      • Have a kernel for each channel, and then sum results over channels
        (1 × 1 + 2 × 2 + 4 × 3 + 5 × 4) + (0 × 0 + 1 × 1 + 3 × 2 + 4 × 3) = 56

  34. Multiple Input Channels
      • Input X: $c_i \times n_h \times n_w$
      • Kernel W: $c_i \times k_h \times k_w$
      • Output Y: $m_h \times m_w$
        $Y = \sum_{i=1}^{c_i} X_{i,:,:} \star W_{i,:,:}$
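
A minimal sketch of the per-channel sum. The two input channels and two kernels are assumptions inferred from the worked value 56 in the multiple-input-channel example; the function names are illustrative.

```python
def corr2d(X, K):
    """Valid 2-D cross-correlation of nested-list matrices X and K."""
    kh, kw = len(K), len(K[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    return [[sum(X[i + a][j + b] * K[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def corr2d_multi_in(X, K):
    """Cross-correlate each input channel with its own kernel, then sum over channels."""
    per_channel = [corr2d(x, k) for x, k in zip(X, K)]
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(ch[i][j] for ch in per_channel) for j in range(out_w)]
            for i in range(out_h)]

# Two channels (c_i = 2), each 3x3, with one 2x2 kernel per channel.
X = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
K = [[[0, 1], [2, 3]],
     [[1, 2], [3, 4]]]

Y = corr2d_multi_in(X, K)
print(Y)  # [[56, 72], [104, 120]]
```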

  35. Multiple Output Channels
      • No matter how many input channels there are, so far we always get a single output channel
      • We can have multiple 3-D kernels, each of which generates an output channel
      • Input X: $c_i \times n_h \times n_w$
      • Kernel W: $c_o \times c_i \times k_h \times k_w$
      • Output Y: $c_o \times m_h \times m_w$
        $Y_{i,:,:} = X \star W_{i,:,:,:}$ for $i = 1, \ldots, c_o$

  36. Multiple Input/Output Channels
      • Each output channel may recognize a particular pattern
      • Input channel kernels recognize and combine patterns in inputs

  37. 1 x 1 Convolutional Layer
      • $k_h = k_w = 1$ is a popular choice. It doesn’t recognize spatial patterns, but fuses channels.
      • Equal to a dense layer with $n_h n_w \times c_i$ input and $c_o \times c_i$ weight.
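
The equivalence can be sketched by computing the same result two ways: as a 1 x 1 convolution over channels, and as a dense layer applied to each pixel's channel vector. The tensor values and both function names are illustrative assumptions.

```python
def conv1x1(X, W):
    """1x1 convolution. X: c_i x h x w, W: c_o x c_i. Returns c_o x h x w."""
    c_i, h, w = len(X), len(X[0]), len(X[0][0])
    return [[[sum(W[o][c] * X[c][i][j] for c in range(c_i))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(W))]

def dense_per_pixel(X, W):
    """Flatten X to (h*w) rows of length c_i, multiply each by W, reshape back."""
    c_i, h, w = len(X), len(X[0]), len(X[0][0])
    pixels = [[X[c][i][j] for c in range(c_i)]
              for i in range(h) for j in range(w)]          # (h*w) x c_i
    outputs = [[sum(W[o][c] * px[c] for c in range(c_i))
                for o in range(len(W))]
               for px in pixels]                            # (h*w) x c_o
    return [[[outputs[i * w + j][o] for j in range(w)]
             for i in range(h)]
            for o in range(len(W))]

# Hypothetical tensors: c_i = 3 input channels of size 2x2, c_o = 2 output channels.
X = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]],
     [[9, 10], [11, 12]]]
W = [[1, 0, -1],
     [2, 1, 0]]

Y = conv1x1(X, W)
print(Y == dense_per_pixel(X, W))  # True
```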

  38. 2-D Convolution Layer Summary
      • Input X: $c_i \times n_h \times n_w$
      • Kernel W: $c_o \times c_i \times k_h \times k_w$
      • Bias B: $c_o$ (one per output channel)
      • Output Y: $c_o \times m_h \times m_w$
        $Y = X \star W + B$
      • Complexity (number of floating point operations, FLOP): $O(c_i c_o k_h k_w m_h m_w)$
        With $c_i = c_o = 100$, $k_h = k_w = 5$, $m_h = m_w = 64$: 1 GFLOP
      • 10 layers, 1M examples: 10 PFLOP (CPU: 0.15 TFLOPS = 18 h, GPU: 12 TFLOPS = 14 min)
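
The estimate can be reproduced as a back-of-the-envelope calculation. This sketch counts one operation per multiply-accumulate and assumes both devices run at their peak rates, which real workloads rarely reach.

```python
c_i = c_o = 100
k_h = k_w = 5
m_h = m_w = 64

# One multiply-accumulate per (input channel, output channel, kernel cell, output pixel).
flop_per_layer = c_i * c_o * k_h * k_w * m_h * m_w
total = flop_per_layer * 10 * 1_000_000  # 10 layers, 1M examples

cpu_hours = total / 0.15e12 / 3600   # CPU at 0.15 TFLOPS
gpu_minutes = total / 12e12 / 60     # GPU at 12 TFLOPS

print(f"{flop_per_layer / 1e9:.2f} GFLOP per layer, {total / 1e15:.1f} PFLOP total")
print(f"CPU: {cpu_hours:.1f} h, GPU: {gpu_minutes:.1f} min")
```

The exact per-layer count is slightly above 1 GFLOP, so the times come out a little higher than the rounded figures of 18 h and 14 min; the roughly 80x gap between CPU and GPU is the point.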

  39. Pooling Layer

  40. Pooling
      • Convolution is sensitive to position
        • Detect vertical edges (figure: input X, output Y; 0 output with 1 pixel shift)
      • We need some degree of invariance to translation
        • Lighting, object positions, scales, appearance vary among images

  41. 2-D Max Pooling
      • Returns the maximal value in the sliding window
        max(0,1,3,4) = 4

  42. 2-D Max Pooling
      • Returns the maximal value in the sliding window
      • Figure: vertical edge detection, conv output, 2 x 2 max pooling (tolerant to 1 pixel shift)

  43. Padding, Stride, and Multiple Channels
      • Pooling layers have similar padding and stride as convolutional layers
      • No learnable parameters
      • Apply pooling for each input channel to obtain the corresponding output channel
        #output channels = #input channels

  44. Average Pooling
      • Max pooling: the strongest pattern signal in a window
      • Average pooling: replace max with mean in max pooling
        • The average signal strength in a window
      • Figure: max pooling vs. average pooling
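
Both pooling variants fit in one small function. This is a sketch with stride 1 and no padding; the 3 x 3 input is an assumption consistent with the max(0,1,3,4) = 4 example, and `pool2d` is an illustrative name.

```python
def pool2d(X, pool_size, mode="max"):
    """Slide a p_h x p_w window over X and take the max or the mean of each window."""
    p_h, p_w = pool_size
    out_h, out_w = len(X) - p_h + 1, len(X[0]) - p_w + 1
    Y = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [X[i + a][j + b] for a in range(p_h) for b in range(p_w)]
            if mode == "max":
                row.append(max(window))
            elif mode == "avg":
                row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown mode: {mode}")
        Y.append(row)
    return Y

X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]

print(pool2d(X, (2, 2), "max"))  # [[4, 5], [7, 8]]
print(pool2d(X, (2, 2), "avg"))  # [[2.0, 3.0], [5.0, 6.0]]
```

Note that the function has no weights: pooling only selects or averages values, so there is nothing to learn.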

  45. Pooling Notebook

  46. LeNet

  47. Handwritten Digit Recognition

  48. MNIST
      • Centered and scaled
      • 60,000 training data
      • 10,000 test data
      • 28 x 28 images
      • 10 classes

  49. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, 1998. Gradient-based learning applied to document recognition

  50. Expensive if we have many outputs

  51. LeNet Notebook

  52. AlexNet

  53. AlexNet
      • AlexNet won ImageNet competition in 2012
      • Deeper and bigger LeNet
      • Key modifications
        • Dropout (regularization)
        • ReLU (training)
        • MaxPooling
      • Paradigm shift for computer vision
      • Figure: manually engineered features with an SVM vs. features learned by a CNN with softmax regression

  54. AlexNet Architecture (AlexNet vs. LeNet)
      • Larger kernel size and stride because of the increased image size, and more output channels
      • Larger pool size, change to max pooling

  55. AlexNet Architecture (AlexNet vs. LeNet)
      • 3 additional convolutional layers
      • More output channels

  56. AlexNet Architecture (AlexNet vs. LeNet)
      • Increase hidden size from 120 to 4096
      • 1000 classes output

  57. More Tricks
      • Change activation function from sigmoid to ReLU (mitigates vanishing gradients)
      • Add a dropout layer after two hidden dense layers (better robustness / regularization)
      • Data augmentation

  58. Complexity

                 #parameters          FLOP
                 AlexNet   LeNet      AlexNet   LeNet
      Conv1      35K       150        101M      1.2M
      Conv2      614K      2.4K       415M      2.4M
      Conv3-5    3M        -          445M      -
      Dense1     26M       0.48M      26M       0.48M
      Dense2     16M       0.1M       16M       0.1M
      Total      46M       0.6M       1G        4M
      Increase   11x       1x         250x      1x

  59. AlexNet Notebook
