Convolutional Networks
Lecture slides for Chapter 9 of Deep Learning Ian Goodfellow 2016-09-12
Convolutional networks: scale up neural networks to process very large images / video sequences
- Sparse connections
- Parameter sharing
- Applies to 1-D, 2-D, 3-D, … inputs

(Goodfellow 2016)
Review: the matrix (dot) product that convolution replaces

    C = AB                            (2.4)

    C_{i,j} = Σ_k A_{i,k} B_{k,j}     (2.5)

(the inner dimensions of A and B must match)
Matrix transpose (mirror across the main diagonal):

    A = [ A_{1,1}  A_{1,2} ]        A^T = [ A_{1,1}  A_{2,1}  A_{3,1} ]
        [ A_{2,1}  A_{2,2} ]   ⇒          [ A_{1,2}  A_{2,2}  A_{3,2} ]
        [ A_{3,1}  A_{3,2} ]

    (A^T)_{i,j} = A_{j,i}           (2.3)

    (AB)^T = B^T A^T                (2.9)
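These identities are easy to sanity-check numerically; a quick NumPy sketch (the 3×2 shape matches the slide's example, the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # 3x2, as in the slide's example
B = rng.standard_normal((2, 4))

# (2.3): transposing mirrors entries across the main diagonal
assert A.T[0, 2] == A[2, 0]

# (2.9): the transpose of a product reverses the order of the factors
assert np.allclose((A @ B).T, B.T @ A.T)
```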
Figure 9.1: 2-D convolution (without kernel flipping)

    Input:   a b c d      Kernel:  w x
             e f g h               y z
             i j k l

    Output:  aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
             ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+lz

(Goodfellow 2016)
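A minimal NumPy sketch of the operation in Figure 9.1 ("valid" 2-D cross-correlation, i.e. convolution without kernel flipping; `conv2d_valid` is an illustrative helper name, not library code):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation, as drawn in Figure 9.1."""
    H, W = x.shape
    h, w = k.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # slide the kernel over the input and take a weighted sum
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

x = np.arange(12.0).reshape(3, 4)   # plays the role of a..l
k = np.array([[1.0, 2.0],           # plays the role of w x
              [3.0, 4.0]])          #                   y z
out = conv2d_valid(x, k)

# Top-left output = a*w + b*x + e*y + f*z = 0*1 + 1*2 + 4*3 + 5*4 = 34
assert out.shape == (2, 3)
assert out[0, 0] == 34.0
```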
Figure 9.2: Sparse connectivity, viewed from below. Sparse connections due to the small convolution kernel: each output s_i is connected to only a few inputs, versus dense connections, where every output is connected to every input.

(Goodfellow 2016)
Figure 9.3: Sparse connectivity, viewed from above. Sparse connections due to the small convolution kernel: each input x_i influences only a few outputs, versus dense connections, where it influences all of them.

(Goodfellow 2016)
Figure 9.4: Growing receptive fields. Even though direct connections are sparse, units in deeper layers (g) are indirectly connected to most or all of the input.

(Goodfellow 2016)
Figure 9.5: Parameter sharing. Convolution shares the same parameters across all spatial locations; traditional matrix multiplication does not share any parameters.

(Goodfellow 2016)
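Sparsity and sharing (Figures 9.2–9.5) can be seen in one small sketch: a 1-D convolution is multiplication by a banded matrix whose rows all reuse the same kernel parameters (the values below are arbitrary, not from the slides):

```python
import numpy as np

k = np.array([1.0, 2.0, -1.0])            # width-3 kernel
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Valid cross-correlation: s_i = k_0 x_i + k_1 x_{i+1} + k_2 x_{i+2}
s = np.correlate(x, k, mode="valid")

# Equivalent matrix: sparse (zeros off the band) and shared
# (3 parameters instead of 3*5 = 15 free entries).
W = np.zeros((3, 5))
for i in range(3):
    W[i, i:i + 3] = k                     # same parameters on every row

assert np.allclose(W @ x, s)              # sparse + shared == convolution
```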
Figure 9.6: Edge detection by convolution (input image, kernel, output image).

(Goodfellow 2016)
Efficiency of edge detection. Input size: 320 × 280; kernel size: 2 × 1; output size: 319 × 280.

                      Convolution             Dense matrix              Sparse matrix
Stored floats         2                       319×280×320×280 > 8e9     2×319×280 = 178,640
Float muls or adds    319×280×3 = 267,960     > 16e9                    Same as convolution (267,960)

(Goodfellow 2016)
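The table's numbers can be reproduced directly (all sizes are from the slide; the variable names are illustrative):

```python
# Edge detection on a 320x280 image with a 2x1 kernel.
in_h, in_w = 320, 280
out_h, out_w = 319, 280

conv_stored = 2                            # only the kernel is stored
dense_stored = out_h * out_w * in_h * in_w # one weight per (output, input) pair
sparse_stored = 2 * out_h * out_w          # 2 nonzero weights per output

# Each output costs 2 multiplies + 1 add = 3 ops (conv and sparse alike).
conv_ops = out_h * out_w * 3

assert dense_stored > 8e9
assert sparse_stored == 178_640
assert conv_ops == 267_960
```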
Figure 9.7: Components of a typical convolutional network layer.

Complex layer terminology: input to layer → convolution stage (affine transform) → detector stage (nonlinearity, e.g. rectified linear) → pooling stage → next layer.

Simple layer terminology: input to layers → convolution layer (affine transform) → detector layer (nonlinearity, e.g. rectified linear) → pooling layer → next layer.

(Goodfellow 2016)
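The three stages of Figure 9.7 can be sketched as three small functions chained together (a 1-D toy with made-up values; the stage names follow the slide, the helper names are illustrative):

```python
import numpy as np

def conv_stage(x, k, b):
    """Convolution stage: affine transform (valid cross-correlation + bias)."""
    return np.correlate(x, k, mode="valid") + b

def detector_stage(a):
    """Detector stage: elementwise nonlinearity (rectified linear)."""
    return np.maximum(a, 0.0)

def pooling_stage(h, width=2):
    """Pooling stage: non-overlapping max pooling."""
    n = len(h) // width
    return h[:n * width].reshape(n, width).max(axis=1)

x = np.array([0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.5])
k = np.array([1.0, -1.0])
out = pooling_stage(detector_stage(conv_stage(x, k, b=0.0)))
assert np.allclose(out, [2.0, 1.5, 1.0])
```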
Figure 9.8: Max pooling introduces invariance to small translations: when the input shifts by one pixel, every detector stage output changes, but most pooling stage outputs do not.

(Goodfellow 2016)
Figure 9.9: Pooling over the outputs of several separately learned filters can learn invariance to transformations of the input: a large response in detector unit 1 or a large response in detector unit 3 both produce a large response in the pooling unit.

(Goodfellow 2016)
Figure 9.10: Max pooling with downsampling: pooling regions of width three with a stride of two reduce the representation size.

(Goodfellow 2016)
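A small sketch of the pooling behavior in Figures 9.8–9.10 (the detector values are illustrative, chosen in the spirit of the figure):

```python
import numpy as np

def max_pool(x, width=3, stride=1):
    """Max pooling over windows of the given width and stride."""
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, stride)])

d = np.array([0.1, 1.0, 0.2, 0.1])         # detector stage outputs
d_shift = np.array([0.3, 0.1, 1.0, 0.2])   # same pattern shifted right by one

p = max_pool(d, width=3, stride=1)
p_shift = max_pool(d_shift, width=3, stride=1)

# The detector outputs all changed, but the pooled outputs did not:
assert np.allclose(p, [1.0, 1.0])
assert np.allclose(p, p_shift)
```

Using a stride larger than one (as in Figure 9.10) additionally downsamples the representation.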
Figure 9.11: Examples of convolutional network architectures for classification.

Architecture 1 (fixed-size input, fully connected output):
  Input image: 256×256×3
  Convolution + ReLU: 256×256×64
  Pooling with stride 4: 64×64×64
  Convolution + ReLU: 64×64×64
  Pooling with stride 4: 16×16×64
  Reshape to vector: 16,384 units
  Matrix multiply: 1,000 units
  Softmax: 1,000 class probabilities

Architecture 2 (pooling to a fixed grid):
  Input image: 256×256×3
  Convolution + ReLU: 256×256×64
  Pooling with stride 4: 64×64×64
  Convolution + ReLU: 64×64×64
  Pooling to 3×3 grid: 3×3×64
  Reshape to vector: 576 units
  Matrix multiply: 1,000 units
  Softmax: 1,000 class probabilities

Architecture 3 (no fully connected layer):
  Input image: 256×256×3
  Convolution + ReLU: 256×256×64
  Pooling with stride 4: 64×64×64
  Convolution + ReLU: 64×64×64
  Pooling with stride 4: 16×16×64
  Convolution: 16×16×1,000
  Average pooling: 1×1×1,000
  Softmax: 1,000 class probabilities

(Goodfellow 2016)
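The shape bookkeeping for the first architecture in Figure 9.11 checks out with simple arithmetic (assuming 'same'-padded convolutions, which keep spatial size, as the listed shapes imply):

```python
# Trace the spatial size and channel count through architecture 1.
h, w, c = 256, 256, 3
c = 64                      # conv ('same' padding) + ReLU: 256x256x64
h, w = h // 4, w // 4       # pooling with stride 4: 64x64x64
                            # conv + ReLU keeps 64x64x64
h, w = h // 4, w // 4       # pooling with stride 4: 16x16x64

assert (h, w, c) == (16, 16, 64)
assert h * w * c == 16_384  # reshape to vector: 16,384 units
```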
Figure 9.12: Strided convolution (top) is equivalent to full convolution followed by downsampling (bottom), but avoids computing the intermediate values z that downsampling would discard.

(Goodfellow 2016)
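The equivalence in Figure 9.12 is easy to verify in a 1-D toy (illustrative helper names and values):

```python
import numpy as np

def conv1d(x, k):
    """Valid 1-D cross-correlation at every position."""
    return np.correlate(x, k, mode="valid")

def strided_conv1d(x, k, stride):
    """Strided convolution: compute only every `stride`-th output directly."""
    width = len(k)
    return np.array([np.dot(x[i:i + width], k)
                     for i in range(0, len(x) - width + 1, stride)])

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0])
k = np.array([1.0, 2.0])

# Same result as convolving everywhere and then downsampling,
# without computing the discarded positions.
assert np.allclose(strided_conv1d(x, k, stride=2), conv1d(x, k)[::2])
```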
Figure 9.13: The effect of zero padding on network size: without zero padding, the representation shrinks at every layer; with zero padding, the width is preserved, so the network can be made arbitrarily deep.

(Goodfellow 2016)
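The shrinkage in Figure 9.13 is just arithmetic; a sketch with an assumed width-16 input and width-5 kernel:

```python
width, kernel = 16, 5

# Without padding, each 'valid' convolution shrinks the width by kernel - 1.
w = width
for _ in range(3):
    w = w - (kernel - 1)
assert w == 4            # 16 -> 12 -> 8 -> 4: a few more layers and nothing is left

# With 'same' zero padding of (kernel - 1) // 2 on each side,
# the width is preserved, so depth is unlimited.
w = width
for _ in range(3):
    w = (w + 2 * ((kernel - 1) // 2)) - (kernel - 1)
assert w == 16
```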
Figure 9.14: Local connection (like convolution, but no sharing: a separate set of parameters at each location) vs. convolution (one shared kernel) vs. fully connected (every output connected to every input, no sharing).

(Goodfellow 2016)
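Ignoring edge effects, the parameter counts of the three patterns in Figure 9.14 compare as follows (a sketch for assumed sizes n = 5 and kernel width k = 3):

```python
n, k = 5, 3

conv_params = k            # one kernel, shared at every location
local_params = n * k       # a separate width-k filter per output (no sharing)
dense_params = n * n       # every output connected to every input

assert conv_params < local_params < dense_params   # 3 < 15 < 25
```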
Figure 9.15: Layout of the input tensor and output tensor of a convolution: spatial coordinates and channel coordinates.

(Goodfellow 2016)
Figure 9.16: Local connection (no sharing) vs. convolution (one group of parameters shared everywhere) vs. tiled convolution (cycle between groups of shared parameters).

(Goodfellow 2016)
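A 1-D sketch of tiled convolution as described in Figure 9.16: a small set of kernels is cycled as we move across the input (helper name and values are illustrative). With one kernel it reduces to ordinary convolution; with one kernel per position it becomes a locally connected layer.

```python
import numpy as np

def tiled_conv1d(x, kernels):
    """Tiled 1-D convolution: kernel choice cycles with position."""
    t = len(kernels)                  # tiling period
    width = len(kernels[0])
    return np.array([np.dot(x[i:i + width], kernels[i % t])
                     for i in range(len(x) - width + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ka = np.array([1.0, 0.0])             # group a
kb = np.array([0.0, 1.0])             # group b

out = tiled_conv1d(x, [ka, kb])       # alternates a, b, a, b, ...
assert np.allclose(out, [1.0, 3.0, 3.0, 5.0])

# With a single group, tiled convolution is ordinary convolution:
assert np.allclose(tiled_conv1d(x, [ka]), np.correlate(x, ka, mode="valid"))
```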
Figure 9.17: A recurrent convolutional network for pixel labeling. From input X, hidden tensors H(1), H(2), H(3) and predictions Ŷ(1), Ŷ(2), Ŷ(3) are computed step by step, with the kernels U, V, and W shared across all steps.

(Goodfellow 2016)
Figure 9.18: Gabor functions with a variety of parameter settings.

(Goodfellow 2016)
Figure 9.19: Gabor-like features learned by machine learning algorithms.

(Goodfellow 2016)
Major architectures
- Spatial transducer networks: input size scales with output size, all layers are convolutional
- All-convolutional networks: no pooling layers; use strided convolution to shrink representation size
- Inception: complicated architecture designed to achieve high accuracy with low computational cost
- ResNet: each block's output is added to a buffer that is repeatedly updated; many updates = very deep net, but without vanishing gradient.
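The last point can be illustrated with a minimal NumPy sketch (not from the slides; a forward-signal proxy for the gradient argument): a plain deep chain of small weights shrinks the signal geometrically, while a residual chain, which *adds* each layer's output to the running buffer, keeps an identity path that preserves it.

```python
import numpy as np

rng = np.random.default_rng(0)
W = [0.1 * rng.standard_normal((4, 4)) for _ in range(50)]  # 50 small-weight layers

# Plain chain: repeated multiplication contracts the signal toward zero.
x_plain = np.ones(4)
for Wi in W:
    x_plain = np.maximum(Wi @ x_plain, 0.0)

# Residual chain: x = x + f(x), so the identity path survives every layer.
x_res = np.ones(4)
for Wi in W:
    x_res = x_res + np.maximum(Wi @ x_res, 0.0)

assert np.linalg.norm(x_plain) < 1e-3   # vanished after 50 layers
assert np.all(x_res >= 1.0)             # identity path preserved the signal
```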