AMMI – Introduction to Deep Learning 6.5. Residual networks
François Fleuret https://fleuret.org/ammi-2018/ Fri Nov 9 22:38:28 UTC 2018
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
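[Diagram: a unit built from two blocks labelled H and T, combined through a product (×), a difference (−) and a sum (+)]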
[Diagram, plain block: Linear → BN → ReLU → Linear → BN → ReLU]
[Diagram, block with a sum before the last ReLU: Linear → BN → ReLU → Linear → BN → + → ReLU]
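The second diagram differs from the first only by the "+" placed before the final ReLU. A minimal PyTorch sketch of such a block follows. It assumes that the sum adds the block's own input to the output of the second BN, which is the usual reading of a residual connection. The class name `ResidualMLPBlock`, the attribute names and the dimension 16 are hypothetical and do not come from the slides.

```python
import torch
from torch import nn


class ResidualMLPBlock(nn.Module):
    """Linear -> BN -> ReLU -> Linear -> BN, then add the input, then ReLU.

    Assumption: the '+' node of the diagram receives the block input x.
    """

    def __init__(self, dim):
        super().__init__()
        # Input and output sizes are equal so that x and the branch can be summed
        self.fc1 = nn.Linear(dim, dim)
        self.bn1 = nn.BatchNorm1d(dim)
        self.fc2 = nn.Linear(dim, dim)
        self.bn2 = nn.BatchNorm1d(dim)

    def forward(self, x):
        y = torch.relu(self.bn1(self.fc1(x)))
        y = self.bn2(self.fc2(y))
        # Residual connection: the branch output is added to the input
        return torch.relu(y + x)


torch.manual_seed(0)
block = ResidualMLPBlock(dim=16)
x = torch.randn(8, 16)  # batch of 8 samples
out = block(x)          # training mode: BN uses the batch statistics
```

With this layout, a branch that outputs zero leaves the block computing `relu(x)`, so the block only has to learn a correction to its input.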
[Diagram: x → conv1 → bn1 → relu → conv2 → bn2 → + → relu → y]
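The diagram names the layers conv1, bn1, conv2 and bn2 between x and y. One way to write such a block as a PyTorch module is sketched below, under several assumptions that are not recoverable from the diagram alone: the "+" adds x to the output of bn2, both convolutions keep the number of channels, and padding keeps the spatial size so that the sum is well defined. The class name `ResBlock`, the default kernel size of 3 and the tensor sizes in the usage lines are hypothetical.

```python
import torch
from torch import nn
from torch.nn import functional as F


class ResBlock(nn.Module):
    """Sketch of a convolutional residual block (layer names from the diagram)."""

    def __init__(self, nb_channels, kernel_size=3):
        super().__init__()
        # Assumption: odd kernel size, padded so the activation map size is preserved
        padding = (kernel_size - 1) // 2
        self.conv1 = nn.Conv2d(nb_channels, nb_channels, kernel_size, padding=padding)
        self.bn1 = nn.BatchNorm2d(nb_channels)
        self.conv2 = nn.Conv2d(nb_channels, nb_channels, kernel_size, padding=padding)
        self.bn2 = nn.BatchNorm2d(nb_channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        y = y + x  # assumed skip connection from the block input
        return F.relu(y)


torch.manual_seed(0)
res_block = ResBlock(nb_channels=8)
x = torch.randn(4, 8, 12, 12)  # batch of 4, 8 channels, 12x12 maps
y = res_block(x)
```

Because the block preserves the tensor shape, several of them can be stacked with `nn.Sequential` without any adapter in between.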
[Diagram: ... → φ → ...]
[Diagram: ... → φ → + → ...]
[Figure: layer listings of two networks, annotated with output sizes 224, 112, 56, 28, 14, 7, 1]
First network (listed twice in the figure): image → 7x7 conv, 64, /2 → pool, /2 → 6 × (3x3 conv, 64) → 8 × (3x3 conv, 128), the first with /2 → 12 × (3x3 conv, 256), the first with /2 → 6 × (3x3 conv, 512), the first with /2 → avg pool → fc 1000
Second network: image → 2 × (3x3 conv, 64) → pool, /2 → 2 × (3x3 conv, 128) → pool, /2 → 4 × (3x3 conv, 256) → pool, /2 → 4 × (3x3 conv, 512) → pool, /2 → 4 × (3x3 conv, 512) → pool, /2 → fc 4096 → fc 4096 → fc 1000
Figure 3. Example network architectures for ImageNet. Left: the
[Plots: error (%) during training. Left panel: plain-18 and plain-34. Right panel: ResNet-18 and ResNet-34.]
Figure 4. Training on ImageNet. Thin curves denote training error, and bold curves denote validation error of the center crops. Left: plain networks of 18 and 34 layers. Right: ResNets of 18 and 34 layers. In this plot, the residual networks have no extra parameter compared to their plain counterparts.
[Diagram: a chain of three blocks, each made of a Φ followed by a +]
[Diagram: the same chain, with a multiplication by ℬ(p1), ℬ(p2) and ℬ(p3) attached to the first, second and third block respectively]
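The second diagram attaches a factor ×ℬ(p1), ×ℬ(p2), ×ℬ(p3) to the three blocks, which suggests that each branch Φ is multiplied by a Bernoulli variable before the sum, as is done in stochastic depth. The sketch below is one possible reading, not a transcription of the slides. It assumes that p is the probability of keeping the branch, that one Bernoulli variable is drawn per forward pass during training, and that at evaluation time the branch is scaled by its expected gate value p. The class name `StochasticDepthBlock`, the use of `nn.Linear` for Φ and the values of p1, p2, p3 are hypothetical.

```python
import torch
from torch import nn


class StochasticDepthBlock(nn.Module):
    """Computes x + b * phi(x), with b ~ Bernoulli(p) during training."""

    def __init__(self, phi, p):
        super().__init__()
        self.phi = phi
        self.p = p  # assumed to be the probability of keeping the branch

    def forward(self, x):
        if self.training:
            # One draw per forward pass: the whole branch is kept or dropped
            if torch.rand(1).item() < self.p:
                return x + self.phi(x)
            return x
        # Evaluation: replace the random gate by its expectation p
        return x + self.p * self.phi(x)


torch.manual_seed(0)
keep_probs = [1.0, 0.8, 0.5]  # hypothetical values for p1, p2, p3
net = nn.Sequential(
    *[StochasticDepthBlock(nn.Linear(4, 4), p) for p in keep_probs]
)
x = torch.randn(2, 4)

net.train()
y_train = net(x)  # some branches may be skipped

net.eval()
y_eval = net(x)   # all branches used, each scaled by its p
```

When a branch is dropped, the block reduces to the identity, so the network that is actually trained on a given pass is shallower than the full one.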
[Figure: two groups of panels, labelled Gradients (gradient plotted against input) and Noise]
(a) 1-layer feedforward. (b) 24-layer feedforward. (c) 50-layer resnet. (d) Brown noise. (e) White noise.