Lecture 9 Recap
I2DL: Prof. Niessner, Prof. Leal-Taixé
What are Convolutions?

(f ∗ g)(u) = ∫ f(υ) g(u − υ) dυ

(Figure: the two input functions in red and blue, their convolution f ∗ g in green; convolution of two box functions, convolution of two Gaussians.)
A convolution is the application of a filter to a function; the 'smaller' function is typically called the filter kernel.
Discrete case: box filter with kernel [1/3, 1/3, 1/3]. Each output sample is the average of a 3-sample window of the input signal. What to do at boundaries? 1) Shrink the output (compute only full windows), or 2) Pad the input (e.g., with zeros). (Figure: a 1-D signal filtered with the box filter under both options.)
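The 1-D box filter above fits in a few lines of plain Python; the signal values below are made up for illustration, not the exact ones from the slide:

```python
def box_filter_1d(signal, k=3, pad=False):
    """Convolve a 1-D signal with a box filter of width k (all weights 1/k).

    pad=False -> 'shrink': only full windows, output length is len(signal) - k + 1.
    pad=True  -> zero-pad so the output has the same length as the input.
    """
    half = k // 2
    if pad:
        signal = [0.0] * half + list(signal) + [0.0] * half
    return [sum(signal[i:i + k]) / k for i in range(len(signal) - k + 1)]

signal = [3, 1, 10, 4, 16]               # made-up 1-D signal
print(box_filter_1d(signal))             # shrink: 3 output samples
print(box_filter_1d(signal, pad=True))   # pad: 5 output samples
```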
Discrete 2-D case: convolving a 5×5 image with a 3×3 kernel (no padding) gives a 3×3 output. Example entry: 5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3.
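The sliding-window computation can be written out directly. The image below is a placeholder, not the exact grid from the slide; the kernel is a sharpen-style one (center 5, cross of −1s) matching the arithmetic above:

```python
def conv2d_valid(image, kernel):
    """2-D convolution without padding ('valid'): output is (H-k+1) x (W-k+1).
    Implemented as cross-correlation, as is conventional in deep learning."""
    H, W = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(H - k + 1):
        row = []
        for j in range(W - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

kernel = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]            # sharpen-style kernel
image = [[1, 2, 3, 4, 5]] * 5    # placeholder 5x5 image
out = conv2d_valid(image, kernel)
print(len(out), len(out[0]))     # 3 3: a 5x5 image and a 3x3 kernel give a 3x3 output
```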
Common 3×3 kernels:
– Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
– Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
– Box mean: (1/9) [1 1 1; 1 1 1; 1 1 1]
– Gaussian blur: (1/16) [1 2 1; 2 4 2; 1 2 1]
Convolve: slide a 5×5×3 filter (the weights) over a 32×32×3 image (the pixels) and compute an output at every spatial location; this produces a 1×28×28 activation map (also called a feature map). Without padding, there are 28×28 locations.
Let's apply a different filter with different weights: convolving the same 32×32×3 image with a second 5×5×3 filter gives another 28×28 activation map.
Let's apply five filters, each with different weights: the resulting stack of five 28×28 activation maps is the output of a convolution "layer".
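A convolution layer is just the per-filter sliding window repeated over a bank of filters. The sketch below uses a toy 6×6×3 input and random 3×3×3 filters as stand-ins for the 32×32×3 image and 5×5×3 filters from the slide:

```python
import random

def conv_layer(image, filters):
    """Apply a bank of filters to a multi-channel image.
    image: C x H x W nested lists; each filter: C x k x k.
    Returns one (H-k+1) x (W-k+1) activation map per filter (no padding)."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0][0])
    maps = []
    for f in filters:
        amap = [[sum(image[c][i + a][j + b] * f[c][a][b]
                     for c in range(C) for a in range(k) for b in range(k))
                 for j in range(W - k + 1)]
                for i in range(H - k + 1)]
        maps.append(amap)
    return maps

def rand_filter():
    return [[[random.random() for _ in range(3)] for _ in range(3)] for _ in range(3)]

random.seed(0)
img = [[[random.random() for _ in range(6)] for _ in range(6)] for _ in range(3)]
maps = conv_layer(img, [rand_filter() for _ in range(5)])
print(len(maps), len(maps[0]), len(maps[0][0]))  # 5 4 4: five 4x4 activation maps
```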
Output size:
Input: N×N, Filter: F×F, Stride: S
Output: ((N − F)/S + 1) × ((N − F)/S + 1)

N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.33… → fractions are illegal!
Padding: set the padding to P = (F − 1)/2 (e.g., a 7×7 image + zero padding).
Types of convolutions:
– Valid convolution: no padding; the output shrinks.
– Same convolution: pad with P = (F − 1)/2 so that (for stride 1) the output has the same spatial size as the input.
Remember: Output = ((N + 2P − F)/S + 1) × ((N + 2P − F)/S + 1)
REMARK: in practice, integer division is typically used (i.e., apply the floor op).
Example: a 3×3 conv with same padding (P = 1) and stride 2 on a 64×64 input:
Output = floor((64 + 2 ⋅ 1 − 3)/2 + 1) × floor((64 + 2 ⋅ 1 − 3)/2 + 1) = floor(32.5) × floor(32.5) = 32 × 32
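The full formula with the floor applied can be checked in a few lines; the 64×64 input below is an assumption consistent with the floor(32.5) = 32 example:

```python
def conv_output_size(N, F, S=1, P=0):
    """Spatial output size of a convolution: floor((N + 2P - F) / S) + 1."""
    return (N + 2 * P - F) // S + 1

print(conv_output_size(7, 3, S=1))        # 5
print(conv_output_size(7, 3, S=2))        # 3
print(conv_output_size(64, 3, S=2, P=1))  # 32, i.e. floor(31.5) + 1
```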
Slide by Karpathy
Pooling: take a single depth slice of the input and apply max pooling with 2×2 filters and stride 2; each value of the 'pooled' output is the maximum of its 2×2 window. (Figure: a 4×4 depth slice pooled down to 2×2.)
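Max pooling on a single depth slice, sketched in plain Python; the 4×4 slice below is made up for illustration:

```python
def max_pool_2x2(x):
    """Max pool a single depth slice with 2x2 windows and stride 2.
    Assumes even height and width."""
    H, W = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, W, 2)]
            for i in range(0, H, 2)]

depth_slice = [[1, 1, 2, 4],    # made-up 4x4 depth slice
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]
print(max_pool_2x2(depth_slice))  # [[6, 8], [3, 4]]
```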
Receptive field: stacking two 3×3 convolutions on a 7×7 input gives a 3×3 output whose values each have a 5×5 receptive field on the original input.
[LeCun et al. ’98] LeNet
Input: 32×32 grayscale images. This one is labeled as class "7".
Max pooling is much more common (than average pooling).
LeNet: ~60k parameters.
ImageNet Dataset: ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
[Russakovsky et al., IJCV’15] “ImageNet Large Scale Visual Recognition Challenge.“
Top-1 score: check whether a sample's top class (i.e., the one with the highest probability) is the same as its target label.
Top-5 score: check whether the target label is among the 5 predictions with the highest probabilities.
Top-5 error: the fraction of samples for which the correct class was not in the top 5 predicted classes.
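A minimal top-k check, with made-up class probabilities:

```python
def topk_correct(probs, target, k):
    """True if `target` is among the k class indices with the highest probability."""
    topk = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
    return target in topk

probs = [0.05, 0.6, 0.1, 0.2, 0.03, 0.02]  # made-up class probabilities
print(topk_correct(probs, target=1, k=1))  # True: class 1 has the highest probability
print(topk_correct(probs, target=2, k=1))  # False: top-1 misses class 2
print(topk_correct(probs, target=2, k=5))  # True: class 2 is within the top 5
```

The top-5 error over a dataset is then the fraction of samples where `topk_correct(..., k=5)` is False.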
(Chart: ILSVRC top-5 error by year, non-CNN vs. CNN entries.)
[Krizhevsky et al. NIPS’12] AlexNet
AlexNet: ~60M parameters.
[Simonyan and Zisserman ICLR’15] VGGNet
VGGNet design: Conv = 3×3, stride 1, same padding; Maxpool = 2×2, stride 2.
VGGNet: ~138M parameters.
Deeper networks are harder to train.
Plain network: input → linear → non-linearity:
y_l = g(W_l y_{l−1} + b_l)
y_{l+1} = g(W_{l+1} y_l + b_{l+1})
Residual connections: the main path applies two linear layers (with non-linearities) to the input, while a skip connection carries y_{l−1} around them.
With the skip connection, the forward pass becomes:
Plain: y_{l+1} = g(W_{l+1} y_l + b_{l+1})
Residual: y_{l+1} = g(W_{l+1} y_l + b_{l+1} + y_{l−1})
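A residual block's forward pass can be sketched with numpy; the dimension and the zero weights are chosen to show the identity fallback discussed in this lecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(y_prev, W1, b1, W2, b2):
    """Two layers with a skip connection:
       y      = g(W1 @ y_prev + b1)
       y_next = g(W2 @ y + b2 + y_prev)   <- the skip adds the block input."""
    y = relu(W1 @ y_prev + b1)
    return relu(W2 @ y + b2 + y_prev)

rng = np.random.default_rng(0)
d = 4
y_prev = relu(rng.standard_normal(d))   # non-negative input (as if post-ReLU)
zeros = np.zeros((d, d))

# If the learned weights are ~zero, the block reduces to the identity:
out = residual_block(y_prev, zeros, np.zeros(d), zeros, np.zeros(d))
print(np.allclose(out, y_prev))  # True
```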
If the dimensions of y_{l−1} and the main-path output do not match, the skip connection applies a matrix of learned weights (a projection) or zero padding to y_{l−1}.
Plain net: any two stacked weight layers directly learn a mapping H(x): x → weight layer → ReLU → weight layer → ReLU → H(x).
Residual net: the stacked layers learn only the residual F(x); the identity x is added back before the final ReLU, giving F(x) + x.
[He et al. CVPR’16] ResNet
ResNet-152: 60M parameters
With increasing depth, the performance of plain networks starts to degrade.
Why do skip connections help? Wrap any sub-network (NN) with a skip connection. If the weights of the wrapped layers are pushed towards ~zero, the block reduces to the identity:
y_{l+1} = g(W_{l+1} y_l + b_{l+1} + y_{l−1}) → y_{l+1} = g(y_{l−1})
A residual block can therefore always fall back to passing its input through unchanged, and the learned residual can only improve on that.
Another example (5×5 image, 3×3 kernel, 3×3 output): 5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6.
1×1 convolutions: take the same 5×5 image and a 1×1 kernel, e.g. with the single value 2. What is the output size? Still 5×5: each output pixel is just the corresponding input pixel multiplied by the kernel value (e.g., 5 ⋅ 2 = 10).
On volumes: convolving a 32×32×3 image with a single 1×1×3 filter gives a 32×32×1 output.
With 5 filters of size 1×1×3, the output is 32×32×5. [Lin et al. 2013]
1×1 convolutions followed by a ReLU let the network learn complex (non-linear) functions across channels: a 32×32×200 volume passed through Conv 1×1×200 + ReLU with 32 filters becomes 32×32×32.
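A 1×1 convolution is just a per-pixel linear map across channels, so it can be written as one matrix multiply; the sketch below reproduces the 32×32×200 → 32×32×32 example with random values:

```python
import numpy as np

def conv1x1(x, W):
    """1x1 convolution: a per-pixel linear map across channels.
    x: (C_in, H, W) volume; W: (C_out, C_in) filter bank."""
    C_in, H, Wd = x.shape
    return (W @ x.reshape(C_in, H * Wd)).reshape(W.shape[0], H, Wd)

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 32, 32))   # 32x32x200 volume (channels first)
W = rng.standard_normal((32, 200))       # 32 filters of size 1x1x200
y = np.maximum(conv1x1(x, W), 0.0)       # + ReLU
print(y.shape)  # (32, 32, 32): spatial size unchanged, channels 200 -> 32
```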
Naive Inception module: the 28×28×192 input is processed by parallel branches (1×1 conv → 28×28×64, 3×3 conv → 28×28×128, 5×5 conv → 28×28×32, max pool → 28×28×192); concatenating along depth gives 28×28×416.
The problem is compute: convolving a 32×32×200 volume with 92 filters of size 5×5×200 (output 32×32×92, + ReLU) costs 5×5×200 multiplications per output value × 32×32×92 output values ≈ 470 million multiplications.
With a 1×1 bottleneck: first a Conv 1×1 + ReLU with 16 filters (32×32×200 → 32×32×16), then the Conv 5×5 + ReLU with 92 filters (→ 32×32×92). Multiplications: 1×1×200×32×32×16 + 5×5×16×32×32×92 ≈ 40 million, i.e. roughly a 10× reduction.
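The multiplication counts above can be verified directly:

```python
def conv_mults(H, W, C_in, k, C_out):
    """Multiplications for a conv with same spatial output:
    (k*k*C_in per output value) x (H*W*C_out output values)."""
    return k * k * C_in * H * W * C_out

direct = conv_mults(32, 32, 200, 5, 92)           # 5x5 conv applied directly
bottleneck = (conv_mults(32, 32, 200, 1, 16)      # 1x1 conv down to 16 channels
              + conv_mults(32, 32, 16, 5, 92))    # then the 5x5 conv
print(direct, bottleneck)   # ~470 million vs ~40 million multiplications
```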
[Szegedy et al CVPR’15] GoogLeNet
Inception module with dimension reduction: input 28×28×192; branches: 1×1 conv → 28×28×64; 1×1 conv (→ 28×28×96) then 3×3 conv → 28×28×128; 1×1 conv (→ 28×28×16) then 5×5 conv → 28×28×32; max pool then 1×1 conv → 28×28×32. Concatenation gives 28×28×256. The 1×1 conv after pooling is there because we do not want the max pool result to take up almost all of the output.
GoogLeNet stacks such Inception blocks, with extra max pool layers in between to reduce dimensionality.
Xception: depthwise separable convolutions instead of normal convolutions, plus skip connections.
[Chollet CVPR’17] Xception
Normal convolutions act on all channels.
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)
With grouped convolutions, filters are applied only to certain depth slices of the input features. Normal convolutions have groups set to 1; the convolutions shown here have groups set to 3.
But the depth size is always the same!
Solution: 1x1 convs!
Original convolution: 256 kernels of size 5×5×3. Multiplications: 256 × 5×5×3 × (8×8 locations) = 1,228,800.
Depthwise separable convolution instead:
– Depth-wise convolution: 3 kernels of size 5×5×1; multiplications: 5×5×3 × (8×8 locations) = 4,800
– 1×1 convolution: 256 kernels of size 1×1×3; multiplications: 256×1×1×3 × (8×8 locations) = 49,152
Total: 53,952 instead of 1,228,800: far fewer computations!
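The depthwise separable factorization on this slide can be checked arithmetically:

```python
locations = 8 * 8                            # 8x8 output locations

# Original: 256 kernels of size 5x5x3.
original = 256 * 5 * 5 * 3 * locations

# Depthwise separable: one 5x5x1 kernel per input channel,
# then 256 kernels of size 1x1x3 to mix the channels.
depthwise = 5 * 5 * 3 * locations
pointwise = 256 * 1 * 1 * 3 * locations

print(original, depthwise + pointwise)       # 1228800 vs 53952 multiplications
```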
Revolution of Depth: ImageNet classification top-5 error (%) over the years:
– ILSVRC 2010 (shallow): 28.2
– ILSVRC 2011 (shallow): 25.8
– AlexNet (ILSVRC 2012, 8 layers): 16.4
– ZF (ILSVRC 2013, 8 layers): 11.7
– VGG (ILSVRC 2014, 19 layers): 7.3
– GoogLeNet (ILSVRC 2014, 22 layers): 6.66
– Xception (2016, 36 layers): 5.5
– ResNet (ILSVRC 2015, 152 layers): 3.57
– Trimps-Soushen (ILSVRC 2016): 2.99
– SENet (ILSVRC 2017): 2.25
So far: classification, where the whole image maps to a single label such as "tabby cat".
Convert fully connected layers to convolutional layers!
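The conversion works because an FC layer over a flattened H×W×C feature map is exactly a convolution whose kernel covers the whole map. The sizes below (512 channels, 7×7 map, 10 classes) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((512, 7, 7))        # final conv feature map (C, H, W)
W_fc = rng.standard_normal((10, 512 * 7 * 7))  # FC layer mapping to 10 classes

# FC view: flatten the feature map, then matrix multiply.
fc_out = W_fc @ feat.reshape(-1)

# Conv view: reshape each FC row into a 512x7x7 kernel; with a 7x7 input
# there is exactly one spatial location, so the "convolution" is a
# single dot product per output channel.
W_conv = W_fc.reshape(10, 512, 7, 7)
conv_out = np.einsum('ochw,chw->o', W_conv, feat)

print(np.allclose(fc_out, conv_out))  # True: the FC layer IS a 7x7 conv
```

On a larger input, the conv view slides and produces a spatial map of class scores instead of a single vector, which is the point of fully convolutional networks.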
[Long and Shelhamer CVPR'15] FCN
How do we go back to the input size?
Upsampling by interpolation: original image vs. nearest-neighbor, bilinear, and bicubic interpolation. (Image: Michael Guerzhoy)
These interpolation schemes upsample with few artifacts.
Learned upsampling: combine upsampling with convolutions (+ CONVS), which is efficient. [A. Dosovitskiy, TPAMI 2017] "Learning to Generate Chairs, Tables and Cars with Convolutional Networks"
Transposed convolutions (never call them 'deconvolutions'): e.g., a 3×3 input is upsampled to a 5×5 output.
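A transposed convolution can be sketched as a scatter-add: each input value 'stamps' a scaled copy of the kernel onto the output grid. With stride 1 and a 3×3 kernel, the output size is (I − 1)·S + F, so a 3×3 input grows to 5×5; the values below are made up:

```python
def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution by scatter-add (no padding)."""
    H, W = len(x), len(x[0])
    k = len(kernel)
    oH, oW = (H - 1) * stride + k, (W - 1) * stride + k
    out = [[0.0] * oW for _ in range(oH)]
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # made-up 3x3 input
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
out = conv_transpose2d(x, kernel, stride=1)
print(len(out), len(out[0]))  # 5 5: the 3x3 input is upsampled to 5x5
```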
Encoder-decoder architecture with skip connections (aka U-Net).
U-Net architecture: each blue box is a multi-channel feature map; the number of channels is denoted at the top of the box and the x-y-size at its edge. White boxes are the copied feature maps.
[Ronneberger et al. MICCAI’15] U-Net
Left side: Contraction Path (Encoder)
– Repeated application of two unpadded 3×3 convolutions
– Each followed by a ReLU activation
– 2×2 max pooling with stride 2 for downsampling
– At each downsampling step, the number of channels is doubled
→ As before: height and width shrink while depth grows.
Right side: Expansion Path (Decoder): assigns class labels to each pixel.
– 2×2 up-convolution that halves the number of input channels
– Skip connections: outputs of up-convolutions are concatenated with the corresponding feature maps from the encoder
– Followed by two ordinary 3×3 convs
– Final layer: 1×1 conv to map 64 channels to the number of classes
→ Height and width grow back while depth shrinks.
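The contraction-path rules (two unpadded 3×3 convs, then 2×2 pooling, channels doubling) can be traced as a shape calculation, here with the 572×572 input size of the original U-Net paper:

```python
def unet_encoder_shapes(size, channels=64, steps=4):
    """Trace (spatial size, channels) through the U-Net contraction path:
    two unpadded 3x3 convs (each: size - 2), then 2x2 max pool (size // 2),
    doubling the channel count at every downsampling step."""
    shapes = []
    for _ in range(steps):
        size = size - 2 - 2        # two unpadded 3x3 convolutions
        shapes.append((size, channels))
        size //= 2                 # 2x2 max pool, stride 2
        channels *= 2
    return shapes

print(unet_encoder_shapes(572))
# [(568, 64), (280, 128), (136, 256), (64, 512)]
```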
We highly recommend reading through these papers!