Lecture 9 - Convolutional Neural Networks
I2DL: Prof. Niessner, Prof. Leal-Taixé 1
Fully Connected Neural Network

A fully connected network is characterized by its depth (number of layers) and its width (number of neurons per layer).

Problems Using FC Layers
Example: a 5×5 image and a layer of 3 neurons.
– On 1 channel, each neuron connects to all 5×5 = 25 pixels → 25 weights per neuron for the whole 5×5 image.
– On 3 channels, each neuron needs 5×5×3 = 75 weights.
– For the 3-neuron layer, that is 3 ⋅ 75 = 225 weights in total.
A normal image with FC layers: for a 1000×1000×3 image and a layer of 1000 neurons, we already need 1000 ⋅ 1000 ⋅ 3 ⋅ 1000 = 3 billion weights.
Impractical!

We cannot make networks arbitrarily complex:
– No structure!! It is just brute force!
– Optimization becomes hard
– Performance plateaus / drops!
Instead:
– We want a layer with structure
– Weight sharing → using the same weights for different parts of the image
[Li et al., CS231n Course Slides] Lecture 12: Detection and Segmentation
Convolution: application of a filter to a function.

(𝑓 ∗ 𝑔)(𝑡) = ∫ 𝑓(𝜏) 𝑔(𝑡 − 𝜏) d𝜏

𝑓 = red, 𝑔 = blue, 𝑓 ∗ 𝑔 = green. Examples: convolution of two box functions; convolution of two Gaussians.

The 'smaller' one is typically called the filter (kernel).
Discrete case: box filter

Signal 𝑓 = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6], filter kernel 𝑔 = [1/3, 1/3, 1/3].

'Slide' the filter kernel from left to right; at each position, compute a single value of the output data:

4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3
3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0
2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0
(−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1
3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3
5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4
2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4
5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3

Output 𝑓 ∗ 𝑔 = [3, 0, 0, 1, 10/3, 4, 4, 16/3].
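The sliding computation above can be sketched in a few lines of NumPy (a minimal illustration; the function name is ours, not from the slides). Note that convolution strictly flips the kernel, but for a symmetric box filter this coincides with plain correlation:

```python
import numpy as np

def conv1d_valid(f, g):
    """Slide kernel g over signal f; one output value per position ('valid', stride 1)."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    n_out = len(f) - len(g) + 1  # output shrinks: N - F + 1
    return np.array([np.dot(f[i:i + len(g)], g) for i in range(n_out)])

f = [4, 3, 2, -5, 3, 5, 2, 5, 5, 6]
g = [1/3, 1/3, 1/3]          # box filter
print(conv1d_valid(f, g))    # [3, 0, 0, 1, 10/3, 4, 4, 16/3]
```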
What to do at boundaries?

Option 1: Shrink. Keep only the positions where the kernel fully fits; the output [3, 0, 0, 1, 10/3, 4, 4, 16/3] is shorter than the input.

Option 2: Pad (often with 0's), e.g., one zero on each side. At the left boundary:
0 ⋅ 1/3 + 4 ⋅ 1/3 + 3 ⋅ 1/3 = 7/3
giving the full-length output [7/3, 3, 0, 0, 1, 10/3, 4, 4, 16/3, 11/3].
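Both boundary options map directly onto NumPy's `np.convolve` modes (a small sketch with the signal from this example):

```python
import numpy as np

f = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)
g = np.array([1/3, 1/3, 1/3])

# Option 1: shrink -- 'valid' keeps only full overlaps, length N - F + 1 = 8
print(np.convolve(f, g, mode='valid'))

# Option 2: zero-pad -- 'same' keeps the input length N = 10
print(np.convolve(f, g, mode='same'))  # starts with 7/3, ends with 11/3
```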
Discrete case in 2D: image 5×5, kernel 3×3, stride 1, no padding → output 3×3.

The kernel is a sharpening-type filter: center weight 5, the four direct neighbors −1. Slide it over the image and compute one output value per position:

5 ⋅ 3 − 3 − 2 − 0 − 4 = 15 − 9 = 6
5 ⋅ 2 − 2 − 1 − 3 − 3 = 10 − 9 = 1
5 ⋅ 1 − (−5) − (−3) − 3 − 2 = 5 + 3 = 8
5 ⋅ 0 − 3 − 0 − 1 − 3 = −7
5 ⋅ 3 − 2 − 3 − 1 − 0 = 15 − 6 = 9
5 ⋅ 3 − 1 − 5 − 4 − 3 = 15 − 13 = 2
5 ⋅ 0 − 0 − 1 − 6 − (−2) = −5
5 ⋅ 1 − 3 − 4 − 7 − 0 = 5 − 14 = −9
5 ⋅ 4 − 3 − 4 − 9 − 1 = 20 − 17 = 3

Output 3×3:
 6   1   8
−7   9   2
−5  −9   3
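The same sliding-window procedure generalizes to 2D (a minimal sketch; the 5×5 ramp image below is a placeholder, since the slide's exact input is not fully recoverable, but the kernel is the cross-shaped sharpening filter from the walkthrough):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D 'valid' correlation with stride 1: slide the kernel over the image
    and take a sum of elementwise products at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# Cross-shaped sharpening kernel: center 5, direct neighbors -1
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

image = np.arange(25, dtype=float).reshape(5, 5)  # placeholder 5x5 input
out = conv2d_valid(image, kernel)
print(out.shape)  # (3, 3): 5 - 3 + 1 = 3 per dimension
```

On this linear ramp the four −1 neighbors of a center 𝑐 sum to 4𝑐, so each output value equals the center pixel itself.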
Common filter kernels (input → filtered output):

Edge detection:
−1 −1 −1
−1  8 −1
−1 −1 −1

Sharpen:
 0 −1  0
−1  5 −1
 0 −1  0

Box mean:
1/9 ⋅
1 1 1
1 1 1
1 1 1

Gaussian blur:
1/16 ⋅
1 2 1
2 4 2
1 2 1
Images have depth: e.g. RGB → 3 channels. Convolve the filter with the image, i.e., 'slide' it over the image and:
– apply the filter at each location
– compute dot products

Image 32×32×3 (width × height × depth), filter 5×5×3. The depth dimension *must* match, i.e., the filter extends over the full depth of the input.

One number at a time: 𝐴_𝑗 = 𝒙ᵀ𝒚_𝑗 + 𝑏, equal to the dot product between the filter weights 𝒙 (a (5×5×3)×1 vector) and the 𝑗-th 5×5×3 chunk 𝒚_𝑗 of the image (pixels 𝒀). Here: a 5 ⋅ 5 ⋅ 3 = 75-dim dot product, plus a bias.

Slide over all spatial locations 𝑦_𝑗 and compute all outputs 𝐴_𝑗; this yields the activation map (also called feature map). Without padding, there are 28×28 locations.

Convolve: 32×32×3 image ∗ 5×5×3 filter → 28×28×1 activation map.
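The per-location view 𝐴_𝑗 = 𝒙ᵀ𝒚_𝑗 + 𝑏 can be written out directly (a sketch; the random image and filter are placeholders, the names follow the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # 32x32x3 input image
x = rng.standard_normal((5, 5, 3))         # 5x5x3 filter: depth must match!
b = 0.1                                    # bias

# Without padding there are (32 - 5 + 1)^2 = 28x28 locations
A = np.zeros((28, 28))
for i in range(28):
    for j in range(28):
        y_j = image[i:i + 5, j:j + 5, :]       # j-th 5x5x3 chunk of the image
        A[i, j] = x.ravel() @ y_j.ravel() + b  # 75-dim dot product + bias

print(A.shape)  # (28, 28) activation map
```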
Let’s apply a different filter with different weights! Convolving the same 32×32×3 image with another 5×5×3 filter produces a second 28×28×1 activation map.
Let’s apply **five** filters, each with different weights! The 32×32×3 image is convolved into 5 activation maps of size 28×28, stacked to a 28×28×5 output.

A convolution “layer” is specified by:
– Filter width and height (the depth is implicitly given)
– The number of different filters (#weight sets)
The learned filters respond to image characteristics:
– Horizontal edges
– Vertical edges
– Circles
– Squares
– …

[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks
Convolution dimensions. Input: 7×7, Filter: 3×3, Stride: 1 → Output: 5×5. Sliding the 3×3 filter over the 7×7 image one position at a time yields 5 positions per row (and per column).

Stride of 𝑆: apply the filter at every 𝑆-th spatial location, i.e., subsample the image.
With a stride of 2: Input: 7×7, Filter: 3×3, Stride: 2 → Output: 3×3.

With a stride of 3: Input: 7×7, Filter: 3×3, Stride: 3 → Output: ?×?
It does not really fit (a remainder is left) → illegal stride for this input & filter size!
Output size: for input height/width 𝑁, filter height/width 𝐹, and stride 𝑆, the output is

((𝑁 − 𝐹)/𝑆 + 1) × ((𝑁 − 𝐹)/𝑆 + 1)

𝑁 = 7, 𝐹 = 3, 𝑆 = 1: (7 − 3)/1 + 1 = 5
𝑁 = 7, 𝐹 = 3, 𝑆 = 2: (7 − 3)/2 + 1 = 3
𝑁 = 7, 𝐹 = 3, 𝑆 = 3: (7 − 3)/3 + 1 = 2.33… → fractions are illegal
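The formula, with an explicit check for illegal strides, as a short helper (function name is ours):

```python
def conv_output_size(N, F, S):
    """Output size (N - F) / S + 1; a fractional result is illegal."""
    if (N - F) % S != 0:
        raise ValueError(f"stride {S} does not fit input {N} with filter {F}")
    return (N - F) // S + 1

print(conv_output_size(7, 3, 1))  # 5
print(conv_output_size(7, 3, 2))  # 3
# conv_output_size(7, 3, 3) -> ValueError: remainder is left over
```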
Shrinking down so quickly (32 → 28 → 24 → 20) is typically not a good idea…

Input image 32×32×3 → Conv + ReLU, 5 filters 5×5×3 → 28×28×5 → Conv + ReLU, 8 filters 5×5×5 → 24×24×8 → Conv + ReLU, 12 filters 5×5×8 → 20×20×12
Why padding? Image 7×7 + zero padding.

Input (𝑁×𝑁): 7×7, Filter (𝐹×𝐹): 3×3, Padding (𝑃): 1, Stride (𝑆): 1 → Output: 7×7.

Most common is ‘zero’ padding. Output size:

(⌊(𝑁 + 2𝑃 − 𝐹)/𝑆⌋ + 1) × (⌊(𝑁 + 2𝑃 − 𝐹)/𝑆⌋ + 1)

where ⌊⋅⌋ denotes the floor operator (as in practice an integer division is performed).

To keep the spatial size with stride 1, set the padding to 𝑃 = (𝐹 − 1)/2.
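The padded version of the output-size formula, including the floor, can be sketched the same way (helper name is ours):

```python
def conv_output_size(N, F, S, P=0):
    """Floor of (N + 2P - F) / S, plus 1 (integer division, as in practice)."""
    return (N + 2 * P - F) // S + 1

# 7x7 input, 3x3 filter, stride 1, padding 1 -> output stays 7x7
print(conv_output_size(7, 3, 1, P=1))   # 7

# 'Same' padding for stride 1 and odd filter size: P = (F - 1) / 2
F = 5
P = (F - 1) // 2
print(conv_output_size(32, F, 1, P))    # 32
```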
Types of convolutions:
– Valid convolution: using no padding
– Same convolution: pad such that the output size equals the input size (image 7×7 + zero padding)

Example
Input image: 32×32×3; 10 filters 5×5 (the depth of 3 is implicitly given); stride 1; pad 2.

Remember the output size: (⌊(𝑁 + 2𝑃 − 𝐹)/𝑆⌋ + 1) × (⌊(𝑁 + 2𝑃 − 𝐹)/𝑆⌋ + 1)

Output size: (32 + 2 ⋅ 2 − 5)/1 + 1 = 32, i.e. 32×32×10.

Example
Number of parameters (weights): each of the 10 filters 5×5×3 has 5 ⋅ 5 ⋅ 3 + 1 = 76 params (+1 for the bias), i.e. 76 ⋅ 10 = 760 parameters in total.
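The parameter count generalizes to a one-line helper (function and argument names are ours):

```python
def conv_layer_params(F, C_in, K):
    """Each of the K filters has F*F*C_in weights plus 1 bias."""
    per_filter = F * F * C_in + 1
    return K * per_filter

print(conv_layer_params(F=5, C_in=3, K=10))  # 10 * 76 = 760
```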
Quiz: what is the shape of the weight tensor of a convolution layer with 4 filters of kernel size 5, stride 1, and no padding that operates on an RGB image?

A1: (3, 4, 5, 5)
A2: (4, 5, 5)
A3: depends on the width and height of the image

Answer: A1: (3, 4, 5, 5), read as input channels (RGB = 3), output size = 4 filters, filter size = 5×5. The spatial size of the image does not matter.
A ConvNet is a concatenation of convolution layers and activations:

Input image 32×32×3 → Conv + ReLU, 5 filters 5×5×3 → 28×28×5 → Conv + ReLU, 8 filters 5×5×5 → 24×24×8 → Conv + ReLU, 12 filters 5×5×8 → 20×20×12
[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks

[Li et al., CS231n Course Slides] Lecture 5: Convolutional Neural Networks
Max Pooling

Take a single depth slice of the input; max pool with 2×2 filters and stride 2 produces the ‘pooled’ output: each value is the maximum over one 2×2 region, so a 4×4 slice becomes 2×2.

– Computes a feature in a given region
– Picks the strongest activation in a region
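A 2×2 / stride-2 max pool can be done with a reshape trick (a sketch; the 4×4 slice below is a hypothetical example, chosen to be consistent with the pooled values on the slides, since the original matrix is only partially recoverable):

```python
import numpy as np

x = np.array([[3, 1, 3, 5],
              [6, 0, 7, 9],
              [3, 0, 2, 4],
              [3, 1, 2, 4]], dtype=float)

# Group into non-overlapping 2x2 blocks, then take the max of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # rows: [6, 9] and [3, 4]
```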
Pooling layer dimensions:

Input: W_in × H_in × D_in

Hyperparameters:
– Spatial filter extent 𝐹
– Stride 𝑆

Output: W_out × H_out × D_out with
– W_out = (W_in − 𝐹)/𝑆 + 1
– H_out = (H_in − 𝐹)/𝑆 + 1
– D_out = D_in

A filter count 𝐾 and padding 𝑃 make no sense here.

Common settings: 𝐹 = 2, 𝑆 = 2 and 𝐹 = 3, 𝑆 = 2.
Average Pooling

The same single depth slice of the input; average pool with 2×2 filters and stride 2 averages each 2×2 region. Here the ‘pooled’ output is:

2.5    6
1.75   3
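Average pooling uses the same reshape trick with a mean instead of a max (the 4×4 slice is again a hypothetical input, chosen so that the pooled values match the 2.5, 6, 1.75, 3 shown above):

```python
import numpy as np

x = np.array([[3, 1, 3, 5],
              [6, 0, 7, 9],
              [3, 0, 2, 4],
              [3, 1, 2, 4]], dtype=float)

# Group into non-overlapping 2x2 blocks, then average each block
pooled = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(pooled)  # rows: [2.5, 6] and [1.75, 3]
```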
[Li et al., CS231n Course Slides] Lecture 5: Convolutional Neural Networks

Fully connected layers at the end:
– Make the final decision with the extracted features from the convolutions
– Typically one or two FC layers
Convolutions restrict the degrees of freedom:
– FC is somewhat brute force
– Convolutions are structured

Convolutions extract image features:
– Concept of weight sharing
– Extract the same features independent of the location
Receptive Field

5×5 input, 3×3 filter → 3×3 output. 3×3 receptive field: 1 output pixel is connected to 9 input pixels.

Stacking two such convolutions: 7×7 input → 5×5 intermediate → 3×3 output. Each intermediate pixel again sees a 3×3 region, so 1 pixel of the final 3×3 output has a 5×5 receptive field on the original input.
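The receptive-field growth for stacked convolutions can be computed with a small recurrence (a sketch under the usual assumption that each layer adds (𝐹 − 1) times the product of the previous strides; the function name is ours):

```python
def receptive_field(num_layers, F=3, S=1):
    """Receptive field of 1 output pixel after stacking num_layers FxF convs:
    r grows by (F - 1) * (product of all previous strides) per layer."""
    r, jump = 1, 1
    for _ in range(num_layers):
        r += (F - 1) * jump
        jump *= S
    return r

print(receptive_field(1))  # 3 -> one 3x3 conv: 3x3 receptive field
print(receptive_field(2))  # 5 -> two stacked 3x3 convs: 5x5 on the original input
```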
– Chapter 9: Convolutional Networks