Convolutional Networks
Lecture slides for Chapter 9 of Deep Learning (Ian Goodfellow, 2016-09-12)


SLIDE 1

Convolutional Networks

Lecture slides for Chapter 9 of Deep Learning
Ian Goodfellow, 2016-09-12

SLIDE 2

(Goodfellow 2016)

Convolutional Networks

  • Scale up neural networks to process very large images / video sequences
  • Sparse connections
  • Parameter sharing
  • Automatically generalize across spatial translations of inputs
  • Applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, …)

SLIDE 3

Key Idea

  • Replace matrix multiplication in neural nets with convolution

  • Everything else stays the same
  • Maximum likelihood
  • Back-propagation
  • etc.
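The key idea can be checked numerically: a 1-D convolution is exactly a matrix multiplication by a sparse, parameter-shared matrix, so nothing else in the training machinery has to change. A minimal NumPy sketch (signal and kernel values chosen arbitrarily):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])   # input signal
k = np.array([0.5, -0.5])            # 2-tap kernel

# "Valid" cross-correlation: each output looks at a 2-element window of x.
conv_out = np.array([k @ x[i:i + 2] for i in range(4)])

# The same map written as matrix multiplication: each row of W holds the
# same two kernel weights (parameter sharing); everything else is zero
# (sparse connectivity).
W = np.zeros((4, 5))
for i in range(4):
    W[i, i:i + 2] = k
mat_out = W @ x

assert np.allclose(conv_out, mat_out)   # the two formulations agree
```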
SLIDE 4

Matrix (Dot) Product

C = AB. (2.4)

If A is m × p and B is p × n, then C is m × n: the inner dimensions (p) must match.

$C_{i,j} = \sum_k A_{i,k} B_{k,j}$. (2.5)
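Equation 2.5 written out in code, with shapes chosen arbitrarily to show the "must match" constraint:

```python
import numpy as np

m, p, n = 2, 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(m, p))
B = rng.normal(size=(p, n))   # A's column count must match B's row count

# C_{i,j} = sum_k A_{i,k} B_{k,j}, computed with explicit loops
C = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        for kk in range(p):
            C[i, j] += A[i, kk] * B[kk, j]

assert C.shape == (m, n)
assert np.allclose(C, A @ B)   # matches NumPy's matrix product
```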

SLIDE 5

Matrix Transpose

$A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{bmatrix} \Rightarrow A^\top = \begin{bmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{bmatrix}$

  • Figure 2.1: The transpose of a matrix can be thought of as a mirror image across the main diagonal.

$(A^\top)_{i,j} = A_{j,i}$. (2.3)

$(AB)^\top = B^\top A^\top$. (2.9)
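Both identities are quick to verify numerically (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))

# (A^T)_{i,j} = A_{j,i}   (eq. 2.3)
assert A.T[0, 2] == A[2, 0]

# (AB)^T = B^T A^T        (eq. 2.9)
assert np.allclose((A @ B).T, B.T @ A.T)
```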

SLIDE 6

2D Convolution

Input (3×4):        Kernel (2×2):
  a b c d             w x
  e f g h             y z
  i j k l

Output (2×3, "valid" convolution):
  aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
  ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+lz

Figure 9.1
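Figure 9.1's arithmetic can be sketched as a small "valid" cross-correlation (the operation most deep learning libraries call convolution). Here the letters a…l become the numbers 0…11 and w, x, y, z become 1, 2, 3, 4:

```python
import numpy as np

def conv2d_valid(inp, ker):
    """2-D 'valid' cross-correlation: slide the kernel over every position
    where it fits entirely inside the input."""
    H, W = inp.shape
    h, w = ker.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i + h, j:j + w] * ker)
    return out

inp = np.arange(12, dtype=float).reshape(3, 4)  # a..l -> 0..11
ker = np.array([[1., 2.],
                [3., 4.]])                      # w, x, y, z
out = conv2d_valid(inp, ker)

assert out.shape == (2, 3)                   # 3x4 input, 2x2 kernel -> 2x3
assert out[0, 0] == 0*1 + 1*2 + 4*3 + 5*4    # a*w + b*x + e*y + f*z
```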

SLIDE 7

Three Operations

  • Convolution: like matrix multiplication
      • Take an input, produce an output (hidden layer)
  • “Deconvolution”: like multiplication by the transpose of a matrix
      • Used to back-propagate error from output to input
      • Reconstruction in autoencoder / RBM
  • Weight gradient computation
      • Used to back-propagate error from output to weights
      • Accounts for the parameter sharing
SLIDE 8

Sparse Connectivity

Figure 9.2: sparse connections due to a small convolution kernel, contrasted with dense connections (inputs x1…x5, outputs s1…s5).

SLIDE 9

Sparse Connectivity

Figure 9.3: sparse connections due to a small convolution kernel, contrasted with dense connections (inputs x1…x5, outputs s1…s5).

SLIDE 10

Growing Receptive Fields

Figure 9.4: units in the deeper layer (g1…g5) are indirectly connected to a larger region of the input (x1…x5) than units in the shallower layer (h1…h5), even though every individual connection is sparse.

SLIDE 11

Parameter Sharing

Figure 9.5: convolution shares the same parameters across all spatial locations; traditional matrix multiplication does not share any parameters.

SLIDE 12

Edge Detection by Convolution

  • Kernel: two weights, 1 and −1, so each output pixel is the difference between two horizontally adjacent input pixels

Input, Kernel, Output: Figure 9.6
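A sketch of the edge detector from Figure 9.6, assuming kernel weights 1 and −1 so that each output pixel is the difference of two horizontally adjacent input pixels:

```python
import numpy as np

# A toy image: dark on the left, bright on the right, so the only
# horizontal change happens at the boundary between the two regions.
img = np.tile(np.array([0., 0., 0., 1., 1., 1.]), (4, 1))

# Correlate each row with the kernel [1, -1]: output = left pixel - right pixel.
edges = img[:, :-1] - img[:, 1:]

assert edges.shape == (4, 5)           # output is one pixel narrower
assert np.count_nonzero(edges) == 4    # nonzero only along the vertical edge
```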

SLIDE 13

Efficiency of Convolution

Input size: 320 by 280 Kernel size: 2 by 1 Output size: 319 by 280

                       Convolution              Dense matrix               Sparse matrix
  Stored floats        2                        319×280×320×280 > 8e9      2×319×280 = 178,640
  Float muls or adds   319×280×3 = 267,960      > 16e9                     Same as convolution (267,960)
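The counts on this slide can be verified with direct arithmetic: a 2-weight kernel, a 319×280 "valid" output, and two multiplies plus one add per output value:

```python
in_h, in_w = 320, 280
out_h, out_w = 319, 280      # valid convolution with a 2x1 kernel
kernel_weights = 2

stored_conv = kernel_weights                       # just the 2 kernel weights
stored_dense = (out_h * out_w) * (in_h * in_w)     # one weight per in/out pair
stored_sparse = kernel_weights * out_h * out_w     # two nonzeros per output

ops_conv = out_h * out_w * 3   # 2 multiplies + 1 add per output value

assert stored_dense > 8e9
assert stored_sparse == 178_640
assert ops_conv == 267_960
```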

SLIDE 14

Convolutional Network Components

Complex layer terminology:
  Input to layer → Convolution stage (affine transform) → Detector stage (nonlinearity, e.g. rectified linear) → Pooling stage → Next layer

Simple layer terminology:
  Input to layers → Convolution layer (affine transform) → Detector layer (nonlinearity, e.g. rectified linear) → Pooling layer → Next layer

Figure 9.7

SLIDE 15

Max Pooling and Invariance to Translation

Figure 9.8: detector-stage and pooling-stage activations before and after shifting the input by one pixel; every detector value changes, but most max-pooled values stay the same.

Figure 9.8
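Figure 9.8's point can be reproduced with a tiny 1-D example: shift the detector outputs by one position and most max-pooled values do not change. A sketch with arbitrary values:

```python
import numpy as np

def max_pool_1d(x, width=3):
    """Max pooling over windows of `width` with unit stride."""
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.1])
shifted = np.roll(detector, 1)     # detector outputs for a one-pixel shift

p1 = max_pool_1d(detector)
p2 = max_pool_1d(shifted)

# Nearly every detector value moved, but most pooled values are unchanged.
assert np.mean(detector == shifted) < 0.5
assert np.mean(p1 == p2) >= 0.5
```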

SLIDE 16

Cross-Channel Pooling and Invariance to Learned Transformations

A large response in detector unit 1 and a large response in detector unit 3 both lead to a large response in the pooling unit.

Figure 9.9

SLIDE 17

Pooling with Downsampling

Figure 9.10: max pooling with downsampling; striding the pooling windows reduces the representation size.

SLIDE 18

Example Classification Architectures

Architecture 1:
  Input image: 256×256×3
  → Convolution + ReLU: 256×256×64
  → Pooling with stride 4: 64×64×64
  → Convolution + ReLU: 64×64×64
  → Pooling with stride 4: 16×16×64
  → Reshape to vector: 16,384 units
  → Matrix multiply: 1,000 units
  → Softmax: 1,000 class probabilities

Architecture 2:
  Input image: 256×256×3
  → Convolution + ReLU: 256×256×64
  → Pooling with stride 4: 64×64×64
  → Convolution + ReLU: 64×64×64
  → Pooling to 3×3 grid: 3×3×64
  → Reshape to vector: 576 units
  → Matrix multiply: 1,000 units
  → Softmax: 1,000 class probabilities

Architecture 3:
  Input image: 256×256×3
  → Convolution + ReLU: 256×256×64
  → Pooling with stride 4: 64×64×64
  → Convolution + ReLU: 64×64×64
  → Pooling with stride 4: 16×16×64
  → Convolution: 16×16×1,000
  → Average pooling: 1×1×1,000
  → Softmax: 1,000 class probabilities

Figure 9.11
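The shape bookkeeping for the first pipeline can be checked with integer arithmetic, assuming size-preserving ("same") convolutions and pooling that divides the spatial size by its stride:

```python
h = w = 256
channels = 64          # after the first convolution + ReLU

h, w = h // 4, w // 4  # pooling with stride 4
assert (h, w) == (64, 64)

h, w = h // 4, w // 4  # second pooling with stride 4
assert (h, w) == (16, 16)

flat = h * w * channels
assert flat == 16_384  # the reshape-to-vector size on the slide
```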

SLIDE 19

Convolution with Stride

Figure 9.12: convolution with a stride of two, implemented either as a single strided operation or as unit-stride convolution followed by downsampling.

Figure 9.12
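Figure 9.12's equivalence is easy to verify: convolution with stride 2 yields the same values as unit-stride convolution followed by keeping every other output, but skips the wasted computation. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)
k = np.array([1., 2., 3.])
n_out = len(x) - len(k) + 1          # valid outputs at unit stride

unit_stride = np.array([k @ x[i:i + 3] for i in range(n_out)])
stride_two = np.array([k @ x[i:i + 3] for i in range(0, n_out, 2)])

# Strided convolution == convolve at unit stride, then downsample.
assert np.allclose(stride_two, unit_stride[::2])
```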

SLIDE 20

Zero Padding Controls Size

Figure 9.13: with zero padding the representation size can stay constant from layer to layer; without zero padding it shrinks at every layer.
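The size effect of zero padding, using NumPy's 1-D convolution modes as a stand-in (`valid` = no padding, `same` = enough zero padding to keep the size):

```python
import numpy as np

x = np.ones(8)            # an 8-unit layer
k = np.ones(3)            # a 3-wide kernel

no_pad = np.convolve(x, k, mode='valid')  # shrinks: 8 - 3 + 1 = 6 outputs
padded = np.convolve(x, k, mode='same')   # zero padding keeps 8 outputs

assert no_pad.size == 6
assert padded.size == 8
```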

SLIDE 21

Kinds of Connectivity

Figure 9.14: local connection (like convolution, but no sharing; distinct weights a…i), convolution (the same weights a, b used everywhere), and full connection.

SLIDE 22

Partial Connectivity Between Channels

Figure 9.15: each output channel connects to only a subset of the input channels; the input and output tensors have spatial coordinates and channel coordinates.

Figure 9.15

SLIDE 23

Tiled Convolution

Figure 9.16: local connection (no sharing), convolution (one group of parameters shared everywhere), and tiled convolution (cycling between groups of shared parameters).

SLIDE 24

Recurrent Pixel Labeling

Figure 9.17: a recurrent convolutional network for pixel labeling: input X, hidden layers H(1), H(2), H(3), and label estimates Ŷ(1), Ŷ(2), Ŷ(3), with weight tensors U, V, and W shared across steps.

SLIDE 25

Gabor Functions

Figure 9.18

SLIDE 26

Gabor-like Learned Kernels

Figure 9.19

SLIDE 27

Major Architectures

  • Spatial Transducer Net: input size scales with output size, all layers are convolutional
  • All Convolutional Net: no pooling layers, just use strided convolution to shrink representation size
  • Inception: complicated architecture designed to achieve high accuracy with low computational cost
  • ResNet: blocks of layers with the same spatial size, with each layer’s output added to the same buffer that is repeatedly updated. Very many updates = very deep net, but without vanishing gradient.
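The ResNet bullet can be sketched as repeated additive updates to a single buffer; `layer` below is a hypothetical stand-in for a size-preserving convolutional block:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    # Stand-in for a conv block that preserves size: ReLU(Wx).
    return np.maximum(0.0, w @ x)

x = rng.normal(size=8)
weights = [0.1 * rng.normal(size=(8, 8)) for _ in range(5)]

buf = x.copy()
for w in weights:
    buf = buf + layer(buf, w)   # each layer's output is ADDED to the buffer

# Many additive updates give depth while the identity path keeps
# gradients flowing back to the input.
assert buf.shape == x.shape
```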