Convolutional Networks: Lecture slides for Chapter 9 of Deep Learning


1. Convolutional Networks Lecture slides for Chapter 9 of Deep Learning Ian Goodfellow 2016-09-12

2. Convolutional Networks
• Scale up neural networks to process very large images / video sequences
• Sparse connections
• Parameter sharing
• Automatically generalize across spatial translations of inputs
• Applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, ...)
(Goodfellow 2016)

3. Key Idea
• Replace matrix multiplication in neural nets with convolution
• Everything else stays the same: maximum likelihood, back-propagation, etc.
(Goodfellow 2016)
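A minimal NumPy sketch of this key idea (sizes and kernel values here are illustrative, not from the slides): a 1-D "valid" convolution is exactly multiplication by a sparse, banded matrix, so the rest of the training machinery is unchanged.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input signal
w = np.array([0.5, -0.5])                  # width-2 kernel (illustrative values)

# Direct "valid" convolution (cross-correlation, as used in deep learning).
direct = np.array([w @ x[i:i + 2] for i in range(len(x) - 1)])

# The same operation as a matrix multiply: each row of C holds the
# kernel, shifted one position to the right.
C = np.zeros((4, 5))
for i in range(4):
    C[i, i:i + 2] = w

assert np.allclose(C @ x, direct)
```

Because convolution is just this structured matrix multiply, back-propagation through it follows the usual rules for linear layers.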

4. Matrix (Dot) Product
C = AB. (2.4)
C_{i,j} = \sum_k A_{i,k} B_{k,j}. (2.5)
Shapes: an m x p matrix A times a p x n matrix B gives an m x n matrix C; the inner dimensions (p) must match.
(Goodfellow 2016)
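A quick numeric check of Eqs. (2.4)-(2.5), with arbitrary example matrices: the element-wise definition of C_{i,j} agrees with the library matrix product, and the shapes compose as m x p times p x n giving m x n.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # m x p = 2 x 3
B = np.arange(12.0).reshape(3, 4)  # p x n = 3 x 4
C = A @ B                          # m x n = 2 x 4

# Element-wise definition: C[i, j] = sum_k A[i, k] * B[k, j].
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                      for j in range(4)] for i in range(2)])

assert C.shape == (2, 4)
assert np.allclose(C, C_manual)
```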

5. Matrix Transpose
(A^T)_{i,j} = A_{j,i}. (2.3)
Example: A = [[A_{1,1}, A_{1,2}], [A_{2,1}, A_{2,2}], [A_{3,1}, A_{3,2}]] implies A^T = [[A_{1,1}, A_{2,1}, A_{3,1}], [A_{1,2}, A_{2,2}, A_{3,2}]].
Figure 2.1: The transpose of the matrix can be thought of as a mirror image across the main diagonal.
(AB)^T = B^T A^T. (2.9)
(Goodfellow 2016)
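The two identities on this slide can be verified numerically (example matrices chosen arbitrarily): the transpose mirrors indices, and the transpose of a product reverses the order of the factors.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 x 2, as in Figure 2.1
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])    # 2 x 3

# (A^T)_{i,j} = A_{j,i}
assert A.T[0, 1] == A[1, 0]

# (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)
```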

6. 2D Convolution
Figure 9.1: A 2x2 kernel [[w, x], [y, z]] slides over a 3x4 input [[a, b, c, d], [e, f, g, h], [i, j, k, l]], producing a 2x3 output. The entries are aw + bx + ey + fz, bw + cx + fy + gz, cw + dx + gy + hz in the first row, and ew + fx + iy + jz, fw + gx + jy + kz, gw + hx + ky + lz in the second.
(Goodfellow 2016)
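A direct implementation of Figure 9.1's "valid" 2-D convolution, with numeric stand-ins for the letters a through l and w through z: each output cell is the sum of the kernel weights times the overlapped inputs.

```python
import numpy as np

inp = np.arange(12.0).reshape(3, 4)       # stands in for a..l
ker = np.array([[1.0, 2.0], [3.0, 4.0]])  # stands in for w, x, y, z

# Slide the 2x2 kernel over every 2x2 patch of the 3x4 input.
out = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        out[i, j] = np.sum(inp[i:i + 2, j:j + 2] * ker)

# "Valid" output size: (3 - 2 + 1) x (4 - 2 + 1) = 2 x 3.
assert out.shape == (2, 3)
```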

7. Three Operations
• Convolution: like matrix multiplication. Takes an input, produces an output (hidden layer).
• "Deconvolution": like multiplication by the transpose of a matrix. Used to back-propagate error from output to input; also used for reconstruction in an autoencoder / RBM.
• Weight gradient computation: used to back-propagate error from output to weights; accounts for the parameter sharing.
(Goodfellow 2016)

8. Sparse Connectivity
Figure 9.2: Sparse connections due to a small convolution kernel: each output s_i connects to only a few inputs x_j, in contrast to the dense connections of full matrix multiplication, where every output connects to every input.
(Goodfellow 2016)

9. Sparse Connectivity
Figure 9.3: The same sparse-versus-dense contrast viewed from the input side: with a small convolution kernel, each input x_j influences only a few outputs s_i, whereas with dense connections it influences all of them.
(Goodfellow 2016)

10. Growing Receptive Fields
Figure 9.4: Stacking convolutional layers (x to h to g) grows the receptive field: units in the deeper layer g are indirectly connected to most or all of the input, even though each individual layer's connections are sparse.
(Goodfellow 2016)

11. Parameter Sharing
Figure 9.5: Convolution shares the same parameters across all spatial locations (the same kernel weights connect each x_i to the outputs), while traditional matrix multiplication does not share any parameters.
(Goodfellow 2016)
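Parameter sharing is what makes the difference in model size. A back-of-the-envelope count for the sizes suggested by Figure 9.5 (a length-5 input mapped to a length-5 output; the kernel width of 3 is an assumption for illustration):

```python
n = 5  # input and output length
k = 3  # assumed kernel width

# Convolution reuses one kernel at every position; a dense (fully
# connected) layer needs one weight per (input, output) pair.
conv_params = k
dense_params = n * n

assert conv_params == 3
assert dense_params == 25
```

For realistic image sizes the gap is enormous: the kernel size stays fixed while the dense count grows with the square of the number of pixels.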

12. Edge Detection by Convolution
Figure 9.6: Convolving the input image with a two-element kernel (values -1 and 1) subtracts each pixel from its neighbor, producing an output image in which only the vertical edges remain.
(Goodfellow 2016)
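A small sketch of this edge detector on a toy image (the image values and the reading of the kernel as [-1, 1] are assumptions for illustration): each output pixel is the difference between adjacent input pixels, so flat regions map to zero and edges light up.

```python
import numpy as np

# Toy image: a flat dark region next to a flat bright region.
img = np.array([[1.0, 1.0, 5.0, 5.0],
                [1.0, 1.0, 5.0, 5.0]])

# Convolving each row with the kernel [-1, 1] is the same as taking
# differences of horizontally adjacent pixels.
edges = img[:, 1:] - img[:, :-1]

assert edges.shape == (2, 3)
assert np.allclose(edges[0], [0.0, 4.0, 0.0])  # nonzero only at the edge
```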

13. Efficiency of Convolution
Input size: 320 by 280. Kernel size: 2 by 1. Output size: 319 by 280.
• Convolution: 2 stored floats; 319*280*3 = 267,960 float muls or adds.
• Dense matrix: 319*280*320*280 > 8e9 stored floats; > 16e9 float muls or adds.
• Sparse matrix: 2*319*280 = 178,640 stored floats; same number of muls or adds as convolution (267,960).
(Goodfellow 2016)
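The counts on this slide follow directly from the sizes. Reproducing the arithmetic:

```python
in_h, in_w = 320, 280    # input size
out_h, out_w = 319, 280  # output size for a 2x1 "valid" kernel

conv_stored = 2                             # just the two kernel weights
conv_ops = out_h * out_w * 3                # 2 muls + 1 add per output pixel
dense_stored = out_h * out_w * in_h * in_w  # full weight matrix
sparse_stored = 2 * out_h * out_w           # 2 nonzeros per output row

assert conv_ops == 267_960
assert sparse_stored == 178_640
assert dense_stored > 8e9
assert 2 * dense_stored > 16e9              # muls + adds for the dense case
```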

14. Convolutional Network Components
Figure 9.7: Two terminologies for the same architecture. In the complex-layer terminology, a single "convolutional layer" contains three stages: a convolution stage (affine transform), a detector stage (nonlinearity, e.g. rectified linear), and a pooling stage. In the simple-layer terminology, each stage is its own layer: a convolution layer, a detector layer, and a pooling layer, feeding the next layer.
(Goodfellow 2016)

15. Max Pooling and Invariance to Translation
Figure 9.8: Max pooling over the detector stage. When the input shifts by one pixel, every detector value changes, but most of the pooled outputs do not: detector outputs 0.1, 1., 0.2, 0.1 pool to 1., 1., 1., 0.2; after the shift, detector outputs 0.3, 0.1, 1., 0.2 pool to 0.3, 1., 1., 1.
(Goodfellow 2016)

16. Cross-Channel Pooling and Invariance to Learned Transformations
Figure 9.9: Pooling across channels rather than across space. A large response in detector unit 1 and a large response in detector unit 3 each produce a large response in the pooling unit, so the network becomes invariant to whichever learned transformation distinguishes those detectors.
(Goodfellow 2016)

17. Pooling with Downsampling
Figure 9.10: Max pooling with a pool width of three and a stride of two: detector outputs 0.1, 1., 0.2, 0.1, 0.0, 0.1 are reduced to pooled outputs 1., 0.2, 0.1, halving the size of the representation.
(Goodfellow 2016)
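The figure's pooling operation, written out: a max over windows of width three, taken every two positions, using the detector values shown on the slide.

```python
import numpy as np

det = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.1])  # detector outputs from Figure 9.10
width, stride = 3, 2

# Take the max over each pooling window; windows start every `stride`
# positions (the last window is truncated at the boundary).
pooled = np.array([det[i:i + width].max()
                   for i in range(0, len(det) - 1, stride)])

assert np.allclose(pooled, [1.0, 0.2, 0.1])
```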

18. Example Classification Architectures
Figure 9.11: Three architectures for 1,000-class classification, read from input to output.
• Fixed-size network: input image 256x256x3 → convolution + ReLU (256x256x64) → pooling with stride 4 (64x64x64) → convolution + ReLU (64x64x64) → pooling with stride 4 (16x16x64) → reshape to vector (16,384 units) → matrix multiply (1,000 units) → softmax over 1,000 class probabilities.
• Variable-size network: the same convolutional stack, but the final pooling maps to a fixed 3x3 grid (3x3x64) → reshape to vector (576 units) → matrix multiply (1,000 units) → softmax over 1,000 class probabilities.
• Fully convolutional network: the same stack up to the 16x16x64 pooling output → convolution (16x16x1,000) → average pooling (1x1x1,000) → softmax over 1,000 class probabilities.
(Goodfellow 2016)

19. Convolution with Stride
Figure 9.12: Strided convolution computes the downsampled outputs s_1, s_2, s_3 directly from x in a single step. The equivalent two-step approach first computes a full convolution z_1 through z_5 and then downsamples, wasting computation on values that are immediately discarded.
(Goodfellow 2016)
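Figure 9.12's equivalence as code (signal and kernel values are arbitrary): convolving with stride 2 in one pass gives the same result as full convolution followed by downsampling, while skipping the wasted work.

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
w = np.array([0.5, 0.5])  # width-2 kernel (illustrative values)

# Two-step: full "valid" convolution (z in the figure), then keep every
# other output.
full = np.array([w @ x[i:i + 2] for i in range(len(x) - 1)])
two_step = full[::2]

# One-step: strided convolution computes only the kept outputs.
one_step = np.array([w @ x[i:i + 2] for i in range(0, len(x) - 1, 2)])

assert np.allclose(one_step, two_step)
```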

20. Zero Padding Controls Size
Figure 9.13: Without zero padding, the representation shrinks at every layer, which limits how deep the network can be; with zero padding, the spatial size can be preserved through arbitrarily many layers.
(Goodfellow 2016)
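The size arithmetic behind Figure 9.13, sketched for a 1-D layer (the sizes n = 16 and k = 5 are assumptions for illustration): without padding, each layer shrinks the representation by k - 1; with (k - 1)/2 zeros on each side, the size is preserved.

```python
def out_size(n, k, pad):
    # "Valid" convolution output length, with `pad` zeros added per side.
    return n - k + 1 + 2 * pad

n, k = 16, 5
assert out_size(n, k, pad=0) == 12             # no padding: shrinks by k - 1
assert out_size(n, k, pad=(k - 1) // 2) == 16  # zero padding: size preserved
```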

21. Kinds of Connectivity
Figure 9.14:
• Local connection: like convolution, but with no sharing; each connection has its own weight (a through i).
• Convolution: the same two weights a, b are reused at every position.
• Fully connected: every input x_j connects to every output s_i.
(Goodfellow 2016)
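Parameter counts for the three schemes in Figure 9.14, for a length-5 input and output (the local and convolutional counts follow from the weight labels on the slide: a through i for local, a and b for convolution):

```python
n = 5       # input and output length

local = 9   # a..i: one weight per connection, nothing shared
conv = 2    # a, b: one pair of weights reused everywhere
fully = n * n  # every input connects to every output

assert local == 9 and conv == 2 and fully == 25
```

Local connectivity keeps the sparsity benefit of convolution but gives up the memory savings of sharing.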

22. Partial Connectivity Between Channels
Figure 9.15: Each channel of the output tensor connects to only a subset of the input tensor's channels. The figure distinguishes the tensors' channel coordinates from their spatial coordinates.
(Goodfellow 2016)

23. Tiled Convolution
Figure 9.16:
• Local connection (no sharing): weights a through i, one per connection.
• Tiled convolution: cycle between groups of shared parameters (a, b, c, d repeating across positions).
• Convolution: one group of parameters (a, b) shared everywhere.
(Goodfellow 2016)

24. Recurrent Pixel Labeling
Figure 9.17: A recurrent convolutional network for pixel labeling. The input X is mapped by U into a hidden representation H^(1); at each step t, V produces a label estimate Ŷ^(t) from H^(t), and W feeds that estimate back into the next hidden state H^(t+1).
(Goodfellow 2016)

25. Gabor Functions Figure 9.18 (Goodfellow 2016)

26. Gabor-like Learned Kernels Figure 9.19 (Goodfellow 2016)

27. Major Architectures
• Spatial Transducer Net: input size scales with output size, all layers are convolutional.
• All Convolutional Net: no pooling layers; strided convolution shrinks the representation size instead.
• Inception: complicated architecture designed to achieve high accuracy with low computational cost.
• ResNet: blocks of layers with the same spatial size, with each layer's output added to the same buffer that is repeatedly updated. Very many updates give a very deep net, but without vanishing gradients.
(Goodfellow 2016)
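The ResNet bullet can be sketched in a few lines; this is a toy stand-in (a random affine map plus ReLU in place of a real convolutional block, with made-up sizes), not the actual architecture. The point is the repeated additive update of one buffer, which lets gradients flow through the additions even in very deep stacks.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)  # toy feature vector (illustrative size)

def block(h, w):
    # Toy stand-in for a residual block's transformation.
    return np.maximum(0.0, w @ h)

weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]

buf = x
for w in weights:
    buf = buf + block(buf, w)  # identity shortcut: add the block output
                               # back into the same buffer

assert buf.shape == x.shape
```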
