Lecture 9 - Convolutional Neural Networks (I2DL)



slide-1
SLIDE 1

Lecture 9 - Convolutional Neural Networks

I2DL: Prof. Niessner, Prof. Leal-Taixé

slide-2
SLIDE 2

Fully Connected Neural Network

[Figure: fully connected network; the number of layers is labeled Depth, the size of each layer is labeled Width]

slide-3
SLIDE 3

Problems using FC Layers on Images

  • How to process a tiny image with FC layers

[Figure: 5×5×3 image connected to a 3-neuron layer; 5 weights shown]

slide-4
SLIDE 4

Problems using FC Layers on Images

  • How to process a tiny image with FC layers

25 weights for the whole 5×5 image on 1 channel

[Figure: 5×5×3 image connected to a 3-neuron layer]

slide-5
SLIDE 5

Problems using FC Layers on Images

  • How to process a tiny image with FC layers

75 weights for the whole 5×5 image on the 3 channels

[Figure: 5×5×3 image connected to a 3-neuron layer]

slide-6
SLIDE 6

Problems using FC Layers on Images

  • How to process a tiny image with FC layers

75 weights for the whole 5×5 image on the three channels per neuron

[Figure: 5×5×3 image connected to a 3-neuron layer; 75 weights for each of the 3 neurons]

slide-7
SLIDE 7

Problems using FC Layers on Images

  • How to process a normal image with FC layers

[Figure: 1000×1000×3 image connected to a 3-neuron layer]

slide-8
SLIDE 8

Problems using FC Layers on Images

  • How to process a normal image with FC layers

3 billion weights

[Figure: 1000×1000×3 image connected to a 1000-neuron layer]

IMPRACTICAL
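The weight counts on the preceding slides follow from multiplying the number of input values by the number of neurons. A minimal sketch of that arithmetic (the function name `fc_weight_count` is only illustrative, and biases are ignored, as on the slides):

```python
def fc_weight_count(height, width, channels, neurons):
    """Weights (without biases) of a fully connected layer on a flattened image."""
    inputs_per_neuron = height * width * channels
    return inputs_per_neuron * neurons

# Tiny image: 5x5x3 gives 75 weights per neuron, times 3 neurons.
tiny = fc_weight_count(5, 5, 3, 3)

# Normal image: 1000x1000x3 gives 3 million weights per neuron, times 1000 neurons.
normal = fc_weight_count(1000, 1000, 3, 1000)

print(tiny)    # 225
print(normal)  # 3000000000
```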

slide-9
SLIDE 9

Why not simply more FC Layers?

We cannot make networks arbitrarily complex

  • Why not just go deeper and get better?

– No structure!!
– It is just brute force!
– Optimization becomes hard
– Performance plateaus / drops!

slide-10
SLIDE 10

Better Way than FC?

  • We want to restrict the degrees of freedom

– We want a layer with structure
– Weight sharing → using the same weights for different parts of the image

slide-11
SLIDE 11

Using CNNs in Computer Vision

[Li et al., CS231n Course Slides] Lecture 12: Detection and Segmentation

slide-12
SLIDE 12

Convolutions

slide-13
SLIDE 13

What are Convolutions?

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$

f = red, g = blue, f ∗ g = green

[Figures: convolution of two box functions; convolution of two Gaussians]

Application of a filter to a function. The ‘smaller’ one is typically called the filter kernel.

slide-14
SLIDE 14

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]

‘Slide’ the filter kernel from left to right; at each position, compute a single value in the output data

slide-15
SLIDE 15

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, …]

4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3

slide-16
SLIDE 16

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, …]

3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0

slide-17
SLIDE 17

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, …]

2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0

slide-18
SLIDE 18

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, …]

(−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1

slide-19
SLIDE 19

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, 10/3, …]

3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3

slide-20
SLIDE 20

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, 10/3, 4, …]

5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4

slide-21
SLIDE 21

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, 10/3, 4, 4, …]

2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4

slide-22
SLIDE 22

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, 10/3, 4, 4, 16/3]

5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3

slide-23
SLIDE 23

What are Convolutions?

Discrete case: box filter

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [??, 3, 0, 0, 1, 10/3, 4, 4, 16/3, ??]

What to do at boundaries?

slide-24
SLIDE 24

What are Convolutions?

Discrete case: box filter

What to do at boundaries?

Option 1: Shrink

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [3, 0, 0, 1, 10/3, 4, 4, 16/3]

slide-25
SLIDE 25

What are Convolutions?

Discrete case: box filter

What to do at boundaries?

Option 2: Pad (often 0’s)

f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]
f ∗ g = [7/3, 3, 0, 0, 1, 10/3, 4, 4, 16/3, 11/3]

0 ⋅ 1/3 + 4 ⋅ 1/3 + 3 ⋅ 1/3 = 7/3
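The box-filter walkthrough and both boundary options can be reproduced in a few lines of NumPy. This is a minimal sketch: the helper name `slide_filter` is illustrative, and it slides the kernel without flipping it, which makes no difference here because the box kernel is symmetric.

```python
import numpy as np

f = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)  # signal from the slides
g = np.full(3, 1 / 3)                                        # box filter kernel

def slide_filter(signal, kernel):
    """Slide the kernel over the signal; one dot product per position."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

# Option 1: shrink. 10 inputs and a kernel of size 3 give 8 outputs.
shrunk = slide_filter(f, g)

# Option 2: pad one zero on each side, so the output keeps all 10 positions.
padded = slide_filter(np.pad(f, 1), g)

print(shrunk)
print(padded)
```

With padding, the first and last entries come out as 7/3 and 11/3, matching the values on the slide.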

slide-26
SLIDE 26

Convolutions on Images

Image 5×5:
−5  3  2 −5  3
 4  3  2  1 −3
 1  0  3  3  5
−2  0  1  4  4
 5  6  7  9 −1

Kernel 3×3:
 0 −1  0
−1  5 −1
 0 −1  0

Output 3×3 so far: 6

5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6

slide-27
SLIDE 27

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1

5 ⋅ 2 + (−1) ⋅ 2 + (−1) ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 3 = 10 − 9 = 1

slide-28
SLIDE 28

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8

5 ⋅ 1 + (−1) ⋅ (−5) + (−1) ⋅ (−3) + (−1) ⋅ 3 + (−1) ⋅ 2 = 5 + 3 = 8

slide-29
SLIDE 29

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8 | −7

5 ⋅ 0 + (−1) ⋅ 3 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 3 = 0 − 7 = −7

slide-30
SLIDE 30

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8 | −7 9

5 ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 0 = 15 − 6 = 9

slide-31
SLIDE 31

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8 | −7 9 2

5 ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 5 + (−1) ⋅ 4 + (−1) ⋅ 3 = 15 − 13 = 2

slide-32
SLIDE 32

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8 | −7 9 2 | −5

5 ⋅ 0 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 6 + (−1) ⋅ (−2) = −5

slide-33
SLIDE 33

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3 so far: 6 1 8 | −7 9 2 | −5 −9

5 ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 7 + (−1) ⋅ 0 = 5 − 14 = −9

slide-34
SLIDE 34

Convolutions on Images

Image 5×5 and Kernel 3×3 as on the previous slide

Output 3×3:
 6  1  8
−7  9  2
−5 −9  3

5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3
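The nine per-position computations can be checked with a short NumPy sketch. The image entries below are the ones consistent with the arithmetic shown on the slides; the helper name `filter2d_valid` is illustrative. As on the slides, the kernel is applied without flipping (strictly a cross-correlation), which gives the same result as a convolution here because the kernel is symmetric.

```python
import numpy as np

image = np.array([
    [-5, 3, 2, -5,  3],
    [ 4, 3, 2,  1, -3],
    [ 1, 0, 3,  3,  5],
    [-2, 0, 1,  4,  4],
    [ 5, 6, 7,  9, -1],
])
kernel = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
])

def filter2d_valid(img, ker):
    """Apply the kernel at every position where it fits fully inside the image."""
    kh, kw = ker.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the window and the kernel, then sum.
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * ker)
    return out

output = filter2d_valid(image, kernel)
print(output)
```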

slide-35
SLIDE 35

Image Filters

  • Each kernel gives us a different image filter

[Figure: input image and the result of each filter]

Edge detection:
−1 −1 −1
−1  8 −1
−1 −1 −1

Sharpen:
 0 −1  0
−1  5 −1
 0 −1  0

Box mean:
1/9 ⋅
1 1 1
1 1 1
1 1 1

Gaussian blur:
1/16 ⋅
1 2 1
2 4 2
1 2 1
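One way to see why these kernels behave differently is to apply each of them to a perfectly flat 3×3 patch. The sketch below does that; the patch value 7 and the dictionary names are arbitrary illustrative choices. The edge-detection weights sum to 0, so a flat region gives no response, while the other three sum to 1 and leave the brightness unchanged.

```python
import numpy as np

kernels = {
    "edge": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "box_mean": np.ones((3, 3)) / 9,
    "gaussian": np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16,
}

flat_patch = np.full((3, 3), 7.0)  # hypothetical constant image region

# Response of each kernel at the centre of the flat patch.
responses = {name: float(np.sum(flat_patch * k)) for name, k in kernels.items()}

for name, value in responses.items():
    print(f"{name}: {value:.3f}")
```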

slide-36
SLIDE 36

Convolutions on RGB Images

Images have depth: e.g. RGB -> 3 channels

Convolve filter with image, i.e., ‘slide’ over it and:
– apply filter at each location
– dot products

image 32×32×3 (width 32, height 32, depth 3)
filter 5×5×3

Depth dimension *must* match; i.e., the filter extends the full depth of the input

slide-37
SLIDE 37

Convolutions on RGB Images

32×32×3 image (pixels X)
5×5×3 filter (weights vector w)

$z_i = w^T x_i + b$, with w of size (5×5×3)×1, x_i of size (5×5×3)×1, and b of size 1

1 number at a time: equal to the dot product between the filter weights w and the i-th chunk x_i of the image. Here: 5 ⋅ 5 ⋅ 3 = 75-dim dot product + bias

slide-38
SLIDE 38

Convolutions on RGB Images

Slide over all spatial locations x_i and compute all outputs z_i; w/o padding, there are 28×28 locations

[Figure: 32×32×3 image convolved with a 5×5×3 filter gives a 28×28×1 activation map (also called feature map)]

slide-39
SLIDE 39

Convolution Layer

slide-40
SLIDE 40

Convolution Layer

Let’s apply a different filter with different weights!

[Figure: 32×32×3 image convolved with a second 5×5×3 filter gives a second 28×28×1 activation map]

slide-41
SLIDE 41

Convolution Layer

Let’s apply **five** filters, each with different weights!

[Figure: Convolution “Layer”: 32×32×3 image convolved with five filters gives activation maps of size 28×28×5]

slide-42
SLIDE 42

Convolution Layer

  • A basic layer is defined by

– Filter width and height (depth is implicitly given)
– Number of different filter banks (#weight sets)

  • Each filter captures a different image characteristic
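A naive sketch of such a layer is given below: five 5×5×3 filters applied to a 32×32×3 input with stride 1 and no padding. The random values, the function name `conv_layer`, and the weight layout (filters, height, width, depth) are assumptions made for illustration; real frameworks use their own layouts and much faster implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))      # height x width x depth
filters = rng.standard_normal((5, 5, 5, 3))   # 5 filters, each 5x5x3 (assumed layout)
biases = np.zeros(5)                          # one bias per filter

def conv_layer(x, w, b):
    """Stride-1, no-padding convolution layer; one activation map per filter."""
    num_filters, fh, fw, _ = w.shape
    out_h = x.shape[0] - fh + 1
    out_w = x.shape[1] - fw + 1
    out = np.empty((out_h, out_w, num_filters))
    for k in range(num_filters):
        for r in range(out_h):
            for c in range(out_w):
                chunk = x[r:r + fh, c:c + fw, :]            # 5x5x3 chunk x_i
                out[r, c, k] = np.sum(chunk * w[k]) + b[k]  # 75-dim dot product + bias
    return out

activation_maps = conv_layer(image, filters, biases)
print(activation_maps.shape)  # (28, 28, 5)
```

The filter depth never appears as a free choice: it is taken from the input, which is what "depth is implicitly given" refers to.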

slide-43
SLIDE 43

Different Filters

  • Each filter captures different image characteristics:

– Horizontal edges
– Vertical edges
– Circles
– Squares
– …

[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks

slide-44
SLIDE 44

Dimensions of a Convolution Layer

slide-45
SLIDE 45

Convolution Layers: Dimensions

Input: 7×7
Filter: 3×3
Output: 5×5

[Figure: image 7×7, filter position 1]

slide-46
SLIDE 46

Convolution Layers: Dimensions

Input: 7×7
Filter: 3×3
Output: 5×5

[Figure: image 7×7, filter position 2]

slide-47
SLIDE 47

Convolution Layers: Dimensions

Input: 7×7
Filter: 3×3
Output: 5×5

[Figure: image 7×7, filter position 3]

slide-48
SLIDE 48

Convolution Layers: Dimensions

Input: 7×7
Filter: 3×3
Output: 5×5

[Figure: image 7×7, filter position 4]

slide-49
SLIDE 49

Convolution Layers: Dimensions

Input: 7×7
Filter: 3×3
Output: 5×5

[Figure: image 7×7, filter position 5]

slide-50
SLIDE 50

Convolution Layers: Stride

With a stride of 1

Input: 7×7
Filter: 3×3
Stride: 1
Output: 5×5

Stride of S: apply filter every S-th spatial location; i.e. subsample the image

[Figure: image 7×7]

slide-51
SLIDE 51

Convolution Layers: Stride

With a stride of 2

Input: 7×7
Filter: 3×3
Stride: 2
Output: 3×3

[Figure: image 7×7]

slide-54
SLIDE 54

Convolution Layers: Stride

With a stride of 3

Input: 7×7
Filter: 3×3
Stride: 3
Output: ? × ?

[Figure: image 7×7]

slide-56
SLIDE 56

Convolution Layers: Stride

With a stride of 3

Input: 7×7
Filter: 3×3
Stride: 3
Output: ? × ?

Does not really fit (remainder left) → Illegal stride for input & filter size!

[Figure: image 7×7]

slide-57
SLIDE 57

Convolution Layers: Dimensions

Input: N×N (input height and width of N)
Filter: F×F (filter height and width of F)
Stride: S
Output: $\left(\frac{N-F}{S} + 1\right) \times \left(\frac{N-F}{S} + 1\right)$

N = 7, F = 3, S = 1: $\frac{7-3}{1} + 1 = 5$
N = 7, F = 3, S = 2: $\frac{7-3}{2} + 1 = 3$
N = 7, F = 3, S = 3: $\frac{7-3}{3} + 1 = 2.\overline{3}$

Fractions are illegal
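The three cases above can be checked with a small helper. This is a sketch without padding; the function name `conv_output_size` and the choice to raise an error for a stride that does not fit are illustrative assumptions that mirror the "fractions are illegal" rule.

```python
def conv_output_size(n, f, s):
    """Output width/height for input size n, filter size f, stride s (no padding)."""
    span = n - f
    if span < 0 or span % s != 0:
        # (n - f) / s is not a whole number: the filter does not fit evenly.
        raise ValueError(f"stride {s} does not fit input {n} with filter {f}")
    return span // s + 1

print(conv_output_size(7, 3, 1))  # 5
print(conv_output_size(7, 3, 2))  # 3

try:
    conv_output_size(7, 3, 3)
except ValueError as err:
    print("illegal:", err)
```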

slide-58
SLIDE 58

Convolution Layers: Dimensions

Input image 32×32×3
→ Conv + ReLU (5 filters 5×5×3) → 28×28×5
→ Conv + ReLU (8 filters 5×5×5) → 24×24×8
→ Conv + ReLU (12 filters 5×5×8) → 20×20×12

Shrinking down so quickly (32→28→24→20) is typically not a good idea…
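The shrinking sequence can be traced by applying the output-size rule layer by layer. A minimal sketch, assuming stride 1 and no padding throughout as in the figure (the variable names are illustrative):

```python
# (number of filters, filter size) for each Conv + ReLU stage in the figure
layers = [(5, 5), (8, 5), (12, 5)]

size, depth = 32, 3
shapes = [(size, size, depth)]
for num_filters, f in layers:
    size = size - f + 1    # stride 1, no padding: lose f - 1 pixels per layer
    depth = num_filters    # output depth equals the number of filters
    shapes.append((size, size, depth))

for shape in shapes:
    print(shape)
```

Each filter's depth has to equal the depth of the layer's input, which is why the filters are 5×5×3, then 5×5×5, then 5×5×8.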

slide-59
SLIDE 59

Convolution Layers: Padding

Why padding?

  • Sizes get small too quickly
  • Corner pixel is only used once

slide-60
SLIDE 60

Convolution Layers: Padding

Image 7×7 + zero padding

Why padding?

  • Sizes get small too quickly
  • Corner pixel is only used once

slide-61
SLIDE 61

Convolution Layers: Padding

Image 7×7 + zero padding

Input (N×N): 7×7
Filter (F×F): 3×3
Padding (P): 1
Stride (S): 1
Output: 7×7

Most common is ‘zero’ padding

Output size: $\left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right)$

$\lfloor \cdot \rfloor$ denotes the floor operator (as in practice an integer division is performed)

slide-62
SLIDE 62

Convolution Layers: Padding

Image 7×7 + zero padding

Types of convolutions:

  • Valid convolution: using no padding
  • Same convolution: output = input size

Set padding to $P = \frac{F - 1}{2}$
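The relation between valid and same convolutions can be checked against the output-size formula with padding. A minimal sketch, with illustrative function names and restricted to odd filter sizes, where P = (F − 1)/2 is a whole number:

```python
def padded_output_size(n, f, s, p):
    """floor((n + 2p - f) / s) + 1, the output-size formula with padding."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding that keeps output size equal to input size for stride 1 and odd f."""
    if f % 2 == 0:
        raise ValueError("this sketch only covers odd filter sizes")
    return (f - 1) // 2

valid_7 = padded_output_size(7, 3, 1, 0)                  # valid: no padding
same_7 = padded_output_size(7, 3, 1, same_padding(3))     # same: P = 1
same_32 = padded_output_size(32, 5, 1, same_padding(5))   # same: P = 2

print(valid_7, same_7, same_32)  # 5 7 32
```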

slide-63
SLIDE 63

Convolution Layers: Dimensions

Example

Input image: 32×32×3
10 filters 5×5 (depth of 3 is implicitly given, i.e. 10 filters 5×5×3)
Stride 1
Pad 2

Remember, output: $\left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right)$

Output size is: $\frac{32 + 2 \cdot 2 - 5}{1} + 1 = 32$, i.e. 32×32×10

slide-64
SLIDE 64

Convolution Layers: Dimensions

Example

Input image: 32×32×3
10 filters 5×5 (i.e. 10 filters 5×5×3)
Stride 1
Pad 2

Remember, output: $\left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N + 2 \cdot P - F}{S} \right\rfloor + 1\right)$

Output size is: $\frac{32 + 2 \cdot 2 - 5}{1} + 1 = 32$, i.e. 32×32×10

slide-65
SLIDE 65

Convolution Layers: Dimensions

Example

Input image: 32×32×3
10 filters 5×5 (i.e. 10 filters 5×5×3)
Stride 1
Pad 2

Number of parameters (weights): each filter has 5×5×3 + 1 = 76 params (+1 for bias)

  • -> 76 ⋅ 10 = 760 parameters in layer
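The parameter count depends only on the filter size, the input depth, and the number of filters, not on the image width or height. A small sketch of that calculation (the function name `conv_param_count` is illustrative):

```python
def conv_param_count(num_filters, f, in_depth, bias=True):
    """Learnable parameters of a conv layer with square f x f filters."""
    per_filter = f * f * in_depth + (1 if bias else 0)
    return num_filters * per_filter

per_filter = conv_param_count(1, 5, 3)    # 5*5*3 weights + 1 bias
layer_total = conv_param_count(10, 5, 3)  # 10 such filters

print(per_filter)   # 76
print(layer_total)  # 760
```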

slide-66
SLIDE 66

Example

  • You are given a convolutional layer with 4 filters, kernel size 5, stride 1, and no padding that operates on an RGB image.
  • Q1: What are the dimensions and the shape of its weight tensor?

– A1: (3, 4, 5, 5)
– A2: (4, 5, 5)
– A3: depends on the width and height of the image

slide-68
SLIDE 68

Example

  • You are given a convolutional layer with 4 filters, kernel size 5, stride 1, and no padding that operates on an RGB image.
  • Q1: What are the dimensions and the shape of its weight tensor?

A1: (3, 4, 5, 5)

– Input channels (RGB = 3)
– Output size = 4 filters
– Filter size = 5×5

slide-69
SLIDE 69

Convolutional Neural Network (CNN)

slide-70
SLIDE 70

CNN Prototype

ConvNet is concatenation of Conv Layers and activations

Input image 32×32×3
→ Conv + ReLU (5 filters 5×5×3) → 28×28×5
→ Conv + ReLU (8 filters 5×5×5) → 24×24×8
→ Conv + ReLU (12 filters 5×5×8) → 20×20×12

slide-71
SLIDE 71

CNN Learned Filters

[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks

slide-72
SLIDE 72

Pooling

slide-73
SLIDE 73

Pooling Layer

[Li et al., CS231n Course Slides] Lecture 5: Convolutional Neural Networks

slide-74
SLIDE 74

Pooling Layer: Max Pooling

Single depth slice of input: 3 1 3 5 6 7 9 3 2 1 4 2 4 3

Max pool with 2×2 filters and stride 2

‘Pooled’ output: 6 9 3 4

slide-75
SLIDE 75

Pooling Layer

  • Conv Layer = ‘Feature Extraction’

– Computes a feature in a given region

  • Pooling Layer = ‘Feature Selection’

– Picks the strongest activation in a region

slide-76
SLIDE 76

Pooling Layer

  • Input is a volume of size $W_{in} \times H_{in} \times D_{in}$

  • Two hyperparameters

– Spatial filter extent F
– Stride S

  • Output volume is of size $W_{out} \times H_{out} \times D_{out}$

– $W_{out} = \frac{W_{in} - F}{S} + 1$
– $H_{out} = \frac{H_{in} - F}{S} + 1$
– $D_{out} = D_{in}$

  • Does not contain parameters; i.e., it is a fixed function

Filter count K and padding P make no sense here

slide-77
SLIDE 77

Pooling Layer

  • Input is a volume of size $W_{in} \times H_{in} \times D_{in}$

  • Two hyperparameters

– Spatial filter extent F
– Stride S

  • Output volume is of size $W_{out} \times H_{out} \times D_{out}$

– $W_{out} = \frac{W_{in} - F}{S} + 1$
– $H_{out} = \frac{H_{in} - F}{S} + 1$
– $D_{out} = D_{in}$

  • Does not contain parameters; i.e., it is a fixed function

Common settings:
– F = 2, S = 2
– F = 3, S = 2

slide-78
SLIDE 78

Pooling Layer: Average Pooling

  • Typically used deeper in the network

Single depth slice of input: 3 1 3 5 6 7 9 3 2 1 4 2 4 3

Average pool with 2×2 filters and stride 2

‘Pooled’ output: 2.5 6 1.75 3
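Max and average pooling differ only in the function applied to each window, which the sketch below makes explicit. The 4×4 input is an assumption: it is one arrangement that reproduces the pooled outputs shown on the slides (6 9 3 4 for max, 2.5 6 1.75 3 for average), since the exact layout of the original figure is not recoverable from the text. The name `pool2d` is illustrative.

```python
import numpy as np

# Hypothetical 4x4 depth slice, chosen to match the slides' pooled outputs.
x = np.array([
    [3, 1, 3, 5],
    [6, 0, 7, 9],
    [3, 2, 1, 4],
    [0, 2, 4, 3],
], dtype=float)

def pool2d(x, f, s, reduce_fn):
    """Apply reduce_fn to every f x f window, moving with stride s."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = reduce_fn(x[r * s:r * s + f, c * s:c * s + f])
    return out

max_pooled = pool2d(x, 2, 2, np.max)
avg_pooled = pool2d(x, 2, 2, np.mean)

print(max_pooled)
print(avg_pooled)
```

Neither variant has learnable parameters; the only choices are the window size F and the stride S.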

slide-79
SLIDE 79

CNN Prototype

[Li et al., CS231n Course Slides] Lecture 5: Convolutional Neural Networks

slide-80
SLIDE 80

Final Fully-Connected Layer

  • Same as what we had in ‘ordinary’ neural networks

– Make the final decision with the extracted features from the convolutions
– One or two FC layers typically

slide-81
SLIDE 81

Convolutions vs Fully-Connected

  • In contrast to fully-connected layers, we want to restrict the degrees of freedom

– FC is somewhat brute force
– Convolutions are structured

  • Sliding window with the same filter parameters to extract image features

– Concept of weight sharing
– Extract same features independent of location

slide-82
SLIDE 82

Receptive Field

slide-83
SLIDE 83

Receptive Field

  • Spatial extent of the connectivity of a convolutional filter

[Figure: 5×5 input convolved with a 3×3 filter = 3×3 output]

slide-84
SLIDE 84

Receptive Field

  • Spatial extent of the connectivity of a convolutional filter

[Figure: 5×5 input convolved with a 3×3 filter = 3×3 output]

3×3 receptive field = 1 output pixel is connected to 9 input pixels

slide-85
SLIDE 85

Receptive Field

  • Spatial extent of the connectivity of a convolutional filter

[Figure: 7×7 input, 3×3 output]

3×3 receptive field = 1 output pixel is connected to 9 input pixels

slide-88
SLIDE 88

Receptive Field

  • Spatial extent of the connectivity of a convolutional filter

[Figure: 7×7 input, 3×3 output]

5×5 receptive field on the original input: one output value is connected to 25 input pixels
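The growth from a 3×3 to a 5×5 receptive field can be computed with the standard recurrence for stacked convolutions: each layer adds (F − 1) times the product of the strides of the layers before it. The sketch below assumes the 7×7 → 3×3 example uses two 3×3 filters with stride 1; the function name `receptive_field` is illustrative.

```python
def receptive_field(layers):
    """Receptive field on the input for a stack of layers.

    layers: list of (filter_size, stride), ordered from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for f, s in layers:
        rf += (f - 1) * jump
        jump *= s
    return rf

one_layer = receptive_field([(3, 1)])           # single 3x3 conv
two_layers = receptive_field([(3, 1), (3, 1)])  # two stacked 3x3 convs

print(one_layer, one_layer ** 2)    # 3 9
print(two_layers, two_layers ** 2)  # 5 25
```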

slide-89
SLIDE 89

See you next time!

slide-90
SLIDE 90

References

  • Goodfellow et al. “Deep Learning” (2016),

– Chapter 9: Convolutional Networks

  • http://cs231n.github.io/convolutional-networks/