CS 4803 / 7643: Deep Learning. Topics: Forward and backward through conv; (beginning of) convolutional neural network (CNN) architectures.



SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira Georgia Tech

Topics:

– Forward and backward through conv
– (Beginning) of convolutional neural network (CNN) architectures

SLIDE 2

Administrative

  • PS1/HW1 Due Feb 11th!

(C) Dhruv Batra & Zsolt Kira 2

SLIDE 3

(C) Dhruv Batra 3

Example: Reverse mode AD

[Figure: computation graph over inputs x1 and x2 with *, sin(·), and + nodes]
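The figure survives only as node labels. Assuming a wiring consistent with those nodes, y = x1 * x2 + sin(x1) (an assumption; the original graph may differ), reverse-mode AD can be sketched as a forward pass that stores intermediates followed by a backward walk over the graph:

```python
import math

# Hypothetical wiring consistent with the figure's nodes (+, sin, *):
# y = x1 * x2 + sin(x1)
def f_and_grad(x1, x2):
    # forward pass, storing intermediates
    a = x1 * x2
    b = math.sin(x1)
    y = a + b
    # reverse pass: walk the graph backward, one node at a time
    dy = 1.0
    da, db = dy, dy                      # + node routes the gradient to both inputs
    dx1 = da * x2 + db * math.cos(x1)    # * node and sin node both feed x1
    dx2 = da * x1
    return y, dx1, dx2

y, dx1, dx2 = f_and_grad(0.5, 2.0)
# matches the analytic gradient (x2 + cos(x1), x1)
assert abs(dx1 - (2.0 + math.cos(0.5))) < 1e-12
assert abs(dx2 - 0.5) < 1e-12
```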

SLIDE 4

Duality in Fprop and Bprop

(C) Dhruv Batra and Zsolt Kira 4

[Diagram: a SUM node in FPROP corresponds to a COPY node in BPROP, and vice versa]
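The sum/copy duality fits in a few lines (a minimal sketch of my own, not lecture code): a value that fans out (is copied) in the forward pass has the gradients from its consumers summed in the backward pass.

```python
# FPROP fan-out (COPY) becomes a SUM in BPROP, and vice versa.
# Example: x is copied into two branches forward, so dL/dx is the
# SUM of the branch gradients backward.
def forward(x):
    return x * 3.0, x + 1.0          # x fans out (COPY) to two branches

def backward(d1, d2):
    return d1 * 3.0 + d2 * 1.0       # branch gradients SUM at the copy point

y1, y2 = forward(2.0)
dx = backward(1.0, 1.0)
assert dx == 4.0                     # 3 from the * branch, 1 from the + branch
```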

SLIDE 5

Convolutions for programmers

  • Iterate over the kernel instead of the image


  • Later - implementation as matrix multiplication
  • Implement cross-correlation instead of convolution

(C) Peter Anderson
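The "iterate over the kernel instead of the image" idea can be sketched in NumPy (a hedged illustration, not the course's reference code; `xcorr2d_valid` is a made-up name). The two loops run over the FxF kernel taps, and each tap adds one shifted slice of the whole image, so the per-pixel work stays vectorized:

```python
import numpy as np

def xcorr2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, looping over the (small) kernel
    instead of over every image pixel."""
    H, W = image.shape
    kH, kW = kernel.shape
    oH, oW = H - kH + 1, W - kW + 1
    out = np.zeros((oH, oW))
    for i in range(kH):              # outer loops: kernel coordinates only
        for j in range(kW):
            # each kernel tap contributes a shifted slice of the image
            out += kernel[i, j] * image[i:i + oH, j:j + oW]
    return out
```

Note this implements cross-correlation (no kernel flip), which is what deep learning "conv" layers actually compute.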

SLIDE 6

Discrete convolution

  • Discrete Convolution!
  • Very similar to cross-correlation, but associative (convolution flips the kernel)

[Figures: 1D and 2D convolution of an input signal with a filter]
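Both claims above (associativity, and the relationship to correlation) can be checked with a small NumPy sketch; `conv1d_full` is a hypothetical helper implementing the 1D definition:

```python
import numpy as np

def conv1d_full(x, k):
    """'Full' 1D discrete convolution: (x * k)[n] = sum_m x[m] k[n - m]."""
    out = np.zeros(len(x) + len(k) - 1)
    for m, xm in enumerate(x):
        out[m:m + len(k)] += xm * k   # each input sample smears a copy of k
    return out

x = np.array([1.0, 2.0, 3.0])
a = np.array([0.0, 1.0])
b = np.array([1.0, 1.0])

# Associativity: (x * a) * b == x * (a * b) -- correlation lacks this property.
assert np.allclose(conv1d_full(conv1d_full(x, a), b),
                   conv1d_full(x, conv1d_full(a, b)))

# Correlation with a flipped kernel equals convolution.
assert np.allclose(np.correlate(x, a[::-1], mode="full"), conv1d_full(x, a))
```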

SLIDE 7

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato

(C) Dhruv Batra & Zsolt Kira 7

SLIDE 8 – SLIDE 22

Convolutional Layer (animation frames of the same diagram)

Slide Credit: Marc'Aurelio Ranzato

SLIDE 23

Convolution Layer

[Figure: a 32x32x3 image and a 5x5x3 filter]

Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products." Filters always extend the full depth of the input volume.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 24 – SLIDE 28

A closer look at spatial dimensions:

7x7 input (spatially), assume 3x3 filter => 5x5 output

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 29 – SLIDE 31

A closer look at spatial dimensions:

7x7 input (spatially), assume 3x3 filter applied with stride 2 => 3x3 output!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 32 – SLIDE 33

A closer look at spatial dimensions:

7x7 input (spatially), assume 3x3 filter applied with stride 3? Doesn't fit! Cannot apply a 3x3 filter to a 7x7 input with stride 3.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 34

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (doesn't fit)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
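The output-size formula can be wrapped in a few lines of Python (a sketch; `conv_output_size` is my own name, not course code):

```python
def conv_output_size(N, F, stride):
    """Spatial output size of a conv layer: (N - F) / stride + 1."""
    span = N - F
    if span % stride != 0:
        # e.g. N=7, F=3, stride 3 gives 2.33 -- the filter doesn't fit
        raise ValueError(f"{F}x{F} filter does not tile a {N}x{N} input at stride {stride}")
    return span // stride + 1

# N = 7, F = 3 (the slide's example):
assert conv_output_size(7, 3, 1) == 5
assert conv_output_size(7, 3, 2) == 3
# conv_output_size(7, 3, 3) raises ValueError
```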

SLIDE 35 – SLIDE 37

In practice: Common to zero pad the border

[Figure: 7x7 input with a 1-pixel border of zeros]

e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output? 7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F - 1)/2 (this preserves size spatially), e.g.:
F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
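The (F - 1)/2 rule can be checked quickly, assuming the padded form of the output-size formula, (N - F + 2P)/S + 1 (a sketch, not course code; `same_pad` is a made-up name):

```python
def same_pad(F):
    """Zero-padding that preserves spatial size at stride 1 (odd F)."""
    return (F - 1) // 2

# with stride 1 and P = (F-1)/2, a 7x7 input stays 7x7 for F = 3, 5, 7
for N, F in [(7, 3), (7, 5), (7, 7)]:
    P = same_pad(F)
    assert (N - F + 2 * P) // 1 + 1 == N
```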

SLIDE 38

Learn multiple filters.

E.g.: 200x200 image, 100 filters of size 10x10 => 10K parameters

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato

(C) Dhruv Batra & Zsolt Kira 38

SLIDE 39

[Figure: 32x32x3 input -> Convolution Layer -> 6 activation maps of size 28x28]

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We stack these up to get a “new image” of size 28x28x6!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 40

General Matrix Multiply (GEMM)

(C) Dhruv Batra & Zsolt Kira 40

Figure Credit: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
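The GEMM view can be sketched with a toy im2col in NumPy (hedged: this naive `im2col` is for illustration only, not how optimized libraries lay out memory). Each FxFxC patch becomes one column, each filter becomes one row of a weight matrix, and the convolution collapses into a single matrix multiply:

```python
import numpy as np

def im2col(x, F):
    """Unroll every FxF patch of a C x H x W input into one column."""
    C, H, W = x.shape
    oH, oW = H - F + 1, W - F + 1
    cols = np.empty((C * F * F, oH * oW))
    for i in range(oH):
        for j in range(oW):
            cols[:, i * oW + j] = x[:, i:i + F, j:j + F].ravel()
    return cols

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 5))        # C x H x W input
w = rng.standard_normal((4, 3, 3, 3))     # K filters, each C x F x F

# conv as GEMM: (K, C*F*F) @ (C*F*F, oH*oW), then fold back to K x oH x oW
out = (w.reshape(4, -1) @ im2col(x, 3)).reshape(4, 3, 3)

# one output entry really is a dot product of a filter with a patch
assert np.allclose(out[0, 0, 0], np.sum(w[0] * x[:, 0:3, 0:3]))
```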

SLIDE 41

Examples time: Input volume: 32x32x3 10 5x5 filters with stride 1, pad 2 Output volume size: ?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 42

Examples time: Input volume: 32x32x3 10 5x5 filters with stride 1, pad 2 Output volume size: (32+2*2-5)/1+1 = 32 spatially, so 32x32x10

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 43

Examples time: Input volume: 32x32x3 10 5x5 filters with stride 1, pad 2 Number of parameters in this layer?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 44

Examples time: Input volume: 32x32x3 10 5x5 filters with stride 1, pad 2 Number of parameters in this layer? each filter has 5*5*3 + 1 = 76 params (+1 for bias) => 76*10 = 760

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
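Both answers above fit in a tiny helper (a sketch; `conv_layer_stats` is my own name, not course code):

```python
def conv_layer_stats(in_ch, num_filters, F, N, stride, pad):
    """Output spatial size and parameter count of a conv layer."""
    out_size = (N - F + 2 * pad) // stride + 1
    params = num_filters * (F * F * in_ch + 1)   # +1 bias per filter
    return out_size, params

# the slides' example: 32x32x3 input, 10 5x5 filters, stride 1, pad 2
size, params = conv_layer_stats(3, 10, 5, 32, 1, 2)
assert (size, params) == (32, 760)   # 32x32x10 output, 76 params per filter
```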

SLIDE 45

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 46

Common settings:

  • K = powers of 2, e.g. 32, 64, 128, 512

  • F = 3, S = 1, P = 1
  • F = 5, S = 1, P = 2
  • F = 5, S = 2, P = ? (whatever fits)
  • F = 1, S = 1, P = 0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 47

Example: CONV layer in Torch

Torch is licensed under BSD 3-clause.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 48

Preview: A ConvNet is a sequence of convolution layers, interspersed with activation functions.

[Figure: 32x32x3 input -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 49

Backprop through Conv

(C) Dhruv Batra 49 Image Credit: Yann LeCun, Kevin Murphy
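One way to see backprop through conv concretely: for the cross-correlation forward pass, the gradient with respect to the input is a "full" correlation of the upstream gradient with the 180°-flipped kernel. A NumPy sketch with a numerical check (my own illustration, not the lecture's code):

```python
import numpy as np

def xcorr2d(x, k):
    """'Valid' cross-correlation (the forward op of a conv layer)."""
    oH, oW = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    return np.array([[np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
                      for j in range(oW)] for i in range(oH)])

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k = rng.standard_normal((3, 3))
dout = rng.standard_normal((3, 3))        # upstream gradient dL/dout

# dL/dx = 'full' correlation of dout with the 180-degree-flipped kernel:
# pad dout by F-1 on every side, then valid-correlate with the flipped k
pad = np.pad(dout, k.shape[0] - 1)
dx = xcorr2d(pad, k[::-1, ::-1])

# check one entry against a numerical derivative of L = sum(out * dout)
eps = 1e-6
x2 = x.copy()
x2[2, 2] += eps
num = np.sum((xcorr2d(x2, k) - xcorr2d(x, k)) * dout) / eps
assert abs(dx[2, 2] - num) < 1e-4
```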

SLIDE 50

Preview: A ConvNet is a sequence of convolutional layers, interspersed with activation functions.

[Figure: 32x32x3 -> CONV, ReLU (6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (10 5x5x6 filters) -> ... -> 24x24x10]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
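The shape arithmetic in such a stack can be checked in a few lines (a sketch assuming stride 1 and no padding, as in the slide):

```python
def shape_after_convs(size, channels, layers):
    """Track (H, W, C) through stride-1, no-padding conv layers.
    `layers` is a list of (num_filters, F) pairs."""
    for num_filters, F in layers:
        size = size - F + 1          # (N - F)/1 + 1
        channels = num_filters
    return size, size, channels

# 32x32x3 -> [6 5x5 filters] -> 28x28x6 -> [10 5x5 filters] -> 24x24x10
assert shape_after_convs(32, 3, [(6, 5), (10, 5)]) == (24, 24, 10)
```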

SLIDE 51

Convolutional Neural Networks

(C) Dhruv Batra 51 Image Credit: Yann LeCun, Kevin Murphy

SLIDE 52

The architecture of LeNet5

SLIDE 53

Handwriting Recognition Example

SLIDE 54

Translation Invariance

SLIDE 55

Some Rotation Invariance

SLIDE 56

Some Scale Invariance

SLIDE 57

Case Studies

  • There are several generations of ConvNets

– 2012 – 2014: AlexNet, ZFNet, VGGNet

  • Conv-Relu, Pooling, Fully connected, Softmax
  • Deeper ones (VGGNet) tend to do better

– 2014

  • Fully-convolutional networks for semantic segmentation
  • Matrix outputs rather than just one probability distribution

– 2014-2016

  • Fully-convolutional networks for classification
  • Fewer parameters, faster than comparable first-generation networks
  • GoogLeNet, ResNet

– 2014-2016

  • Detection layers (proposals)
  • Caption generation (combine with RNNs for language)