Lecture 9 Recap — I2DL: Prof. Niessner, Prof. Leal-Taixé (PowerPoint presentation)



SLIDE 1

Lecture 9 Recap

SLIDE 2

What are Convolutions?

(g ∗ h)(u) = ∫ g(ν) h(u − ν) dν

g = red, h = blue, g ∗ h = green: convolution of two box functions; convolution of two Gaussians. A convolution is the application of a filter to a function; the 'smaller' one is typically called the filter kernel.
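The continuous definition above has a direct discrete analogue. As a quick sanity check (not from the slides), NumPy's `np.convolve` shows that convolving two box functions yields a triangle:

```python
import numpy as np

# Discrete analogue of (g * h)(u) = integral of g(v) h(u - v) dv.
g = np.ones(4)          # box function g (red)
h = np.ones(4)          # box function h (blue)
gh = np.convolve(g, h)  # g * h (green): a triangle of length 4 + 4 - 1 = 7

print(gh)  # [1. 2. 3. 4. 3. 2. 1.]
```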

SLIDE 3

What are Convolutions?

Discrete case: a box filter (1/3, 1/3, 1/3) slides over the signal, and each output value is the mean of three neighboring inputs, e.g., (3 + 5 + 2)/3 = 10/3 and (5 + 5 + 6)/3 = 16/3.

What to do at the boundaries?
1) Shrink: the output is smaller than the input.
2) Pad: extend the input, often with '0'.
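The box filter and the two boundary options can be sketched with NumPy's convolution modes (the signal values follow the slide's example; `'same'` zero-pads the input):

```python
import numpy as np

signal = np.array([3., 5., 2., 5., 5., 6.])
box = np.ones(3) / 3.0                           # box filter 1/3 1/3 1/3

shrunk = np.convolve(signal, box, mode='valid')  # shrink: output shorter than input
padded = np.convolve(signal, box, mode='same')   # zero-pad: output keeps input length

print(shrunk)  # [10/3, 4, 4, 16/3]
```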

SLIDE 4

Convolutions on Images

A 5×5 image is convolved with a 3×3 kernel, giving a 3×3 output (no padding). At each position, the kernel and the image patch are multiplied elementwise and summed, e.g., 5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3.
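A minimal sketch of this sliding-window computation, using hypothetical image values (the slide's exact grid is only partly recoverable):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as used in CNNs):
    slide the kernel over the image and sum the elementwise products."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# A 5x5 image and a 3x3 kernel give a 3x3 output, as on the slide.
img = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical pixel values
k = np.zeros((3, 3)); k[1, 1] = 1.0             # kernel that picks the patch center
print(conv2d_valid(img, k).shape)  # (3, 3)
```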

SLIDE 5

Image Filters

  • Each kernel gives us a different image filter:
    – Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
    – Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
    – Box mean: 1/9 ⋅ [1 1 1; 1 1 1; 1 1 1]
    – Gaussian blur: 1/16 ⋅ [1 2 1; 2 4 2; 1 2 1]

LET'S LEARN THESE FILTERS!

SLIDE 6

Convolutions on RGB Images

A 32×32×3 image (pixels x) is convolved with a 5×5×3 filter (weights w): slide the filter over all spatial locations and compute one output per location. The result is a 28×28×1 activation map (also called a feature map); without padding there are 28×28 locations.

SLIDE 7

Convolution Layer

32×32×3 image, 5×5×3 filter → 28×28×1 activation map. Let's apply a different filter with different weights: convolving again yields another 28×28×1 activation map.

SLIDE 8

Convolution Layer

Let's apply five filters, each with different weights: convolving the 32×32×3 image with five 5×5×3 filters yields 28×28×5 activation maps. This is a convolution "layer".

SLIDE 9

Convolution Layers: Dimensions

Input: N×N (input height and width), Filter: F×F, Stride: S
Output: ((N − F)/S + 1) × ((N − F)/S + 1)

N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.3333 → fractions are illegal

SLIDE 10

Convolution Layers: Padding

Set the padding to P = (F − 1)/2 (e.g., a 7×7 image with zero padding).

Types of convolutions:
  • Valid convolution: using no padding
  • Same convolution: padding such that output size = input size

SLIDE 11

Convolution Layers: Dimensions

Remember: Output = ((N + 2⋅P − F)/S + 1) × ((N + 2⋅P − F)/S + 1)

REMARK: in practice, integer division is typically used (i.e., apply the floor operator!)

Example: 3x3 conv with same padding and strides of 2 on a 64x64 RGB image -> N = 64, F = 3, P = 1, S = 2

Output: floor((64 + 2⋅1 − 3)/2 + 1) × floor((64 + 2⋅1 − 3)/2 + 1) = floor(32.5) × floor(32.5) = 32 × 32
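The output-size rule, including the floor remark, fits in a few lines of Python (integer division implements the floor for these non-negative values):

```python
def conv_output_size(n, f, s, p=0):
    """Output spatial size of a convolution: floor((N + 2P - F) / S) + 1."""
    return (n + 2 * p - f) // s + 1

# Slide examples, no padding:
print(conv_output_size(7, 3, 1))        # 5
print(conv_output_size(7, 3, 2))        # 3
# Same padding (P = 1), stride 2, on a 64x64 image:
print(conv_output_size(64, 3, 2, p=1))  # 32
```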

SLIDE 12

CNN Learned Filters

SLIDE 13

CNN Prototype

Slide by Karpathy

SLIDE 14

Pooling Layer: Max Pooling

A single depth slice of the input is max-pooled with 2×2 filters and stride 2: each 2×2 region is replaced by its maximum, giving the 'pooled' output.
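Max pooling over a single depth slice can be sketched in NumPy (the 4×4 values here are hypothetical, since the slide's grid is not fully recoverable):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single depth slice."""
    H, W = x.shape
    # Group the slice into 2x2 blocks and take the max of each block.
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1., 3., 2., 1.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool_2x2(x))  # [[6. 8.] [3. 4.]]
```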

SLIDE 15

Receptive Field

  • Spatial extent of the connectivity of a convolutional filter

With a 7×7 input, a 3×3 output has a 5×5 receptive field on the original input: one output value is connected to 25 input pixels.

SLIDE 16

Lecture 10 – CNNs (Part 2)

SLIDE 17

Classic Architectures

SLIDE 18

LeNet

  • Digit recognition: 10 classes
  • Input: 32×32 grayscale images (this one: labeled as class "7")

[LeCun et al. '98] LeNet

SLIDE 19

LeNet

  • Digit recognition: 10 classes
  • Valid convolution: size shrinks
  • How many conv filters are there in the first layer? → 6

SLIDE 20

LeNet

  • Digit recognition: 10 classes
  • At that time average pooling was used; now max pooling is much more common

SLIDE 21

LeNet

  • Digit recognition: 10 classes
  • Again valid convolutions: how many filters?

SLIDE 22

LeNet

  • Digit recognition: 10 classes
  • Use of tanh/sigmoid activations → not common now!

SLIDE 23

LeNet

  • Digit recognition: 10 classes
  • Conv -> Pool -> Conv -> Pool -> Conv -> FC

SLIDE 24

LeNet

  • Digit recognition: 10 classes
  • Conv -> Pool -> Conv -> Pool -> Conv -> FC
  • As we go deeper: Width and Height decrease, Number of Filters increases
  • ~60k parameters
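A back-of-the-envelope count for a LeNet-style stack lands near the 60k figure. This sketch assumes 5×5 kernels, full connectivity between the conv layers (the original used a sparse connection table), and the classic 6-16-120-84-10 layer widths:

```python
def conv_params(n_filters, k, in_ch):
    return n_filters * (k * k * in_ch) + n_filters  # weights + biases

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

total = (conv_params(6, 5, 1)        # C1: 6 filters of 5x5x1
         + conv_params(16, 5, 6)     # C3: 16 filters of 5x5x6 (ignoring sparse connectivity)
         + conv_params(120, 5, 16)   # C5: 120 filters of 5x5x16
         + fc_params(120, 84)        # F6: fully connected
         + fc_params(84, 10))        # output layer
print(total)  # 61706, i.e. roughly 60k parameters
```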

SLIDE 25

Test Benchmarks

  • ImageNet Dataset: ImageNet Large Scale Visual Recognition Competition (ILSVRC)

[Russakovsky et al., IJCV'15] "ImageNet Large Scale Visual Recognition Challenge."

SLIDE 26

Common Performance Metrics

  • Top-1 score: check if a sample's top class (i.e., the one with the highest probability) is the same as its target label
  • Top-5 score: check if your label is among your 5 first predictions (i.e., the predictions with the 5 highest probabilities)
  • → Top-5 error: percentage of test samples for which the correct class was not in the top 5 predicted classes
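Top-5 error can be sketched in a few lines of NumPy (the scores and labels here are hypothetical):

```python
import numpy as np

def top5_error(probs, labels):
    """Fraction of samples whose true label is NOT among the 5 highest-probability classes."""
    top5 = np.argsort(probs, axis=1)[:, -5:]  # indices of the 5 largest scores per sample
    hit = np.array([label in row for row, label in zip(top5, labels)])
    return 1.0 - hit.mean()

# Two samples over 10 classes:
probs = np.vstack([np.eye(10)[3],           # sample 0: all mass on class 3
                   np.linspace(0, 1, 10)])  # sample 1: class 0 has the lowest score
labels = np.array([3, 0])                   # sample 0 is a top-5 hit, sample 1 is a miss
print(top5_error(probs, labels))  # 0.5
```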

SLIDE 27

AlexNet

  • Cut the ImageNet error down by half (non-CNN methods vs. CNNs)

SLIDE 28

AlexNet

[Krizhevsky et al. NIPS'12] AlexNet

SLIDE 29

AlexNet

  • First filter with stride 4 to reduce the size significantly
  • 96 filters

SLIDE 30

AlexNet

  • Use of same convolutions
  • As with LeNet: Width and Height decrease, Number of Filters increases

SLIDE 31

AlexNet

SLIDE 32

AlexNet

  • Softmax for 1000 classes

SLIDE 33

AlexNet

  • Similar to LeNet but much bigger (~1000 times)
  • Use of ReLU instead of tanh/sigmoid
  • ~60M parameters

SLIDE 34

VGGNet

  • Striving for simplicity
  • CONV = 3x3 filters with stride 1, same convolutions
  • MAXPOOL = 2x2 filters with stride 2

[Simonyan and Zisserman ICLR'15] VGGNet

SLIDE 35

VGGNet

  • 2 consecutive convolutional layers, each one with 64 filters
  • What is the output size?

Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2
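Tracking the spatial size through this first block (assuming the paper's 224×224 input, which the slide does not restate): same convolutions keep the size, and the maxpool halves it:

```python
def same_conv(size):  # 3x3, stride 1, same padding: size unchanged
    return size

def maxpool(size):    # 2x2, stride 2: size halved
    return size // 2

size = 224            # assumed input resolution
size = same_conv(size)  # conv3-64: 224x224x64
size = same_conv(size)  # conv3-64: 224x224x64
size = maxpool(size)    # maxpool:  112x112x64
print(size)  # 112
```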

SLIDE 36

VGGNet

SLIDE 37

VGGNet

SLIDE 38

VGGNet

  • The number of filters is multiplied by 2

SLIDE 39

VGGNet

SLIDE 40

VGGNet

  • Conv -> Pool -> Conv -> Pool -> Conv -> FC
  • As we go deeper: Width and Height decrease, Number of Filters increases
  • Called VGG-16: 16 layers that have weights
  • Large, but its simplicity makes it appealing
  • ~138M parameters

SLIDE 41

VGGNet

  • A lot of architectures were analyzed

[Simonyan and Zisserman 2014]

SLIDE 42

Skip Connections

SLIDE 43

The Problem of Depth

  • As we add more and more layers, training becomes harder
  • Vanishing and exploding gradients
  • How can we train very deep nets?

SLIDE 44

Residual Block

  • Two layers (Input → Linear → Non-linearity)

x_l = g(W_l x_{l-1} + b_l)
x_{l+1} = g(W_{l+1} x_l + b_{l+1})

SLIDE 45

Residual Block

  • Two layers: a main path (Input → Linear → Linear) plus a skip connection from the input

SLIDE 46

Residual Block

  • Two layers

Without the skip connection: x_{l+1} = g(W_{l+1} x_l + b_{l+1})
With the skip connection: x_{l+1} = g(W_{l+1} x_l + b_{l+1} + x_{l-1})
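The two residual-block equations can be turned into a small NumPy sketch (fully connected layers instead of convolutions, for brevity). With all-zero weights the block reduces to the identity for non-negative inputs:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Two-layer residual block:
    x_l     = g(W_l x_{l-1} + b_l)
    x_{l+1} = g(W_{l+1} x_l + b_{l+1} + x_{l-1})  # skip connection adds the input"""
    h = relu(W1 @ x + b1)
    return relu(W2 @ h + b2 + x)

x = np.array([1.0, 2.0, 3.0])
Z = np.zeros((3, 3)); z = np.zeros(3)        # near-zero weights and biases
print(residual_block(x, Z, z, Z, z))  # [1. 2. 3.]  (the input passes through)
```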

SLIDE 47

Residual Block

  • Two layers
  • Usually use a same convolution, since we need the same dimensions
  • Otherwise we need to convert the dimensions with a matrix of learned weights or with zero padding

SLIDE 48

ResNet Block

  • Plain net: any two stacked layers, x → Weight Layer → ReLU → Weight Layer → ReLU → H(x)
  • Residual net: x → Weight Layer → ReLU → Weight Layer → add the identity x → ReLU, i.e., the block learns F(x) and outputs F(x) + x

[He et al. CVPR'16] ResNet

SLIDE 49

ResNet

  • Xavier/2 initialization
  • SGD + Momentum (0.9)
  • Learning rate 0.1, divided by 10 when the error plateaus
  • Mini-batch size 256
  • Weight decay of 1e-5
  • No dropout
  • ResNet-152: 60M parameters

SLIDE 50

ResNet

  • If we make the network deeper, at some point performance starts to degrade

SLIDE 51

ResNet

  • If we make the network deeper, at some point performance starts to degrade

SLIDE 52

Why do ResNets Work?

  • How is this block really affecting me?

SLIDE 53

Why do ResNets Work?

  • If the block's weights are ~zero, x_{l+1} = g(W_{l+1} x_l + b_{l+1} + x_{l-1}) reduces to x_{l+1} = g(x_{l-1})

SLIDE 54

Why do ResNets Work?

  • We kept the same values and added a non-linearity: x_{l+1} = g(x_{l-1})

SLIDE 55

Why do ResNets Work?

  • The identity is easy for the residual block to learn
  • Guaranteed it will not hurt performance; it can only improve

SLIDE 56

1x1 Convolutions

SLIDE 57

Recall: Convolutions on Images

A 5×5 image convolved with a 3×3 kernel gives a 3×3 output, e.g., 5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6.

SLIDE 58

1x1 Convolution

  • A 5×5 image and a 1×1 kernel (here with value 2): what is the output size?

SLIDE 59

1x1 Convolution

  • Each output value is the input value times the kernel, e.g., −5 ∗ 2 = −10

SLIDE 60

1x1 Convolution

  • Sliding the 1×1 kernel over the whole 5×5 image scales every entry by 2, e.g., −1 ∗ 2 = −2

SLIDE 61

1x1 Convolution

  • A 1×1 kernel keeps the spatial dimensions and scales the input

SLIDE 62

1x1 Convolution

  • Same as having a 3-neuron fully connected layer applied at each pixel: a 32×32×3 input with one 1×1×3 filter gives a 32×32×1 output

SLIDE 63

1x1 Convolution

  • As always, we use more convolutional filters: a 32×32×3 input with five 1×1×3 filters gives a 32×32×5 output [Li et al. 2013]

SLIDE 64

Using 1x1 Convolutions

  • Use it to shrink the number of channels, e.g., 32×32×200 → Conv 1×1×200 + ReLU → 32×32×32
  • Further adds a non-linearity → one can learn more complex functions
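Because a 1×1 convolution acts on each pixel independently, it is just a matrix multiply over the channel axis. A NumPy sketch with the slide's 200 → 32 channel sizes (the weights are random placeholders):

```python
import numpy as np

H, W, C_in, C_out = 32, 32, 200, 32
x = np.random.rand(H, W, C_in)         # input feature map
W1x1 = np.random.rand(C_out, C_in)     # one 1x1xC_in filter per output channel

y = np.einsum('hwc,oc->hwo', x, W1x1)  # channel mixing: 200 channels -> 32
y = np.maximum(y, 0.0)                 # + ReLU, as on the slide

print(y.shape)  # (32, 32, 32)
```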

SLIDE 65

Inception Layer

SLIDE 66

Inception Layer

  • Tired of choosing filter sizes?
  • Use them all!
  • All same convolutions
  • The 3x3 max pooling is done with stride 1

SLIDE 67

Inception Layer

  • Possible size of the output: on a 28×28×192 input, concatenating branch outputs of 28×28×64, 28×28×128, 28×28×32, and 28×28×192 gives 28×28×416
  • Not sustainable!

SLIDE 68

Inception Layer: Computational Cost

32×32×200 input → Conv 5x5x200 + ReLU → 32×32×92 output.
Multiplications: 5×5×200 (per output value) × 32×32×92 ≈ 470 million.

SLIDE 69

Inception Layer: Computational Cost

32×32×200 input → Conv 1x1 + ReLU → 32×32×16 → Conv 5x5 + ReLU → 32×32×92 output.
Multiplications: 1×1×200 × 32×32×16 + 5×5×16 × 32×32×92 ≈ 40 million.

Reduction of multiplications to ~1/10.
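Both multiplication counts can be verified with plain arithmetic (spatial size 32×32, cost per output value = kernel volume):

```python
# Direct 5x5 convolution on all 200 channels:
direct = (5 * 5 * 200) * (32 * 32 * 92)

# 1x1 bottleneck down to 16 channels, then the 5x5 convolution:
bottleneck = (1 * 1 * 200) * (32 * 32 * 16) + (5 * 5 * 16) * (32 * 32 * 92)

print(direct, bottleneck)  # 471040000 40960000, roughly an 11x reduction
```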

SLIDE 70

Inception Layer

[Szegedy et al. CVPR'15] GoogLeNet

SLIDE 71

Inception Layer: Dimensions

  • (b) Inception module with dimension reductions: on a 28×28×192 input, the branches produce 28×28×64, 28×28×128 (via a 28×28×96 reduction), 28×28×32 (via a 28×28×16 reduction), and 28×28×32, concatenated to a 28×28×256 output
  • We do not want the max pool result to take up almost all of the output

SLIDE 72

GoogLeNet: Using the Inception Layer

  • Inception blocks, with extra max pool layers to reduce dimensionality

SLIDE 73

Xception Net

  • "Extreme version of Inception": applies (modified) depthwise separable convolutions instead of normal convolutions
  • 36 conv layers, structured into several modules with skip connections
  • Outperforms Inception Net V3

[Chollet CVPR'17] Xception

SLIDE 74

Depth-wise Separable Convolutions

Normal convolutions act on all channels.

SLIDE 75

Depth-wise Separable Convolutions

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, groups=3)

Filters are applied only at certain depths of the features. Normal convolutions have groups set to 1; the convolutions used in this image have groups set to 3.

SLIDE 76

Depth-wise Separable Convolutions

But the depth size is always the same!

SLIDE 77

Depth-wise Separable Convolutions

Solution: 1x1 convs!

SLIDE 78

But why?

Original convolution: 256 kernels of size 5×5×3. Multiplications: 256×5×5×3 × (8×8 locations) = 1,228,800.

SLIDE 79

But why?

Original convolution: 256 kernels of size 5×5×3 → 256×5×5×3 × (8×8 locations) = 1,228,800 multiplications.
Depth-wise convolution: 3 kernels of size 5×5×1 → 5×5×3 × (8×8 locations) = 4,800 multiplications.
1×1 convolution: 256 kernels of size 1×1×3 → 256×1×1×3 × (8×8 locations) = 49,152 multiplications.

Fewer computations!
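These counts can also be verified with plain arithmetic (8×8 output locations, 3 input channels, 256 output channels):

```python
# One big 5x5x3 kernel per output channel:
original = 256 * 5 * 5 * 3 * (8 * 8)

# Depthwise separable: one 5x5x1 kernel per input channel,
# then 1x1x3 kernels to mix the channels.
depthwise = 5 * 5 * 3 * (8 * 8)
pointwise = 256 * 1 * 1 * 3 * (8 * 8)

print(original, depthwise + pointwise)  # 1228800 53952
```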

SLIDE 80

ImageNet Benchmark: Revolution of Depth

ImageNet classification top-5 error (%):

  • ILSVRC 2010 (shallow): 28.2
  • ILSVRC 2011 (shallow): 25.8
  • AlexNet (ILSVRC 2012, 8 layers): 16.4
  • ZF (ILSVRC 2013, 8 layers): 11.7
  • VGG (ILSVRC 2014, 19 layers): 7.3
  • GoogLeNet (ILSVRC 2014, 22 layers): 6.66
  • Xception (2016, 36 layers): 5.5
  • ResNet (ILSVRC 2015, 152 layers): 3.57
  • Trimps-Soushen (ILSVRC 2016): 2.99
  • SENet (ILSVRC 2017): 2.25

SLIDE 81

Fully Convolutional Network

SLIDE 82

Classification Network

"tabby cat"

SLIDE 83

FCN: Becoming Fully Convolutional

Convert fully connected layers to convolutional layers!

SLIDE 84

FCN: Becoming Fully Convolutional

SLIDE 85

FCN: Upsampling Output

SLIDE 86

Semantic Segmentation (FCN)

How do we go back to the input size?

[Long and Shelhamer '15] FCN

SLIDE 87

Types of Upsampling

  • 1. Interpolation

SLIDE 88

Types of Upsampling

  • 1. Interpolation: nearest neighbor, bilinear, or bicubic (compared against the original image)

Image: Michael Guerzhoy
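Nearest-neighbor interpolation, the simplest of the three, just repeats each pixel; a NumPy sketch:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbor interpolation: repeat each pixel along both axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.array([[1., 2.],
              [3., 4.]])
print(upsample_nearest(x))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```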

SLIDE 89

Types of Upsampling

  • 1. Interpolation: few artifacts

SLIDE 90

Types of Upsampling

  • 2. Transposed convolutions (+ convs, efficient)

[A. Dosovitskiy, TPAMI 2017] "Learning to Generate Chairs, Tables and Cars with Convolutional Networks"

SLIDE 91

Types of Upsampling

  • 2. Transposed convolution
    – Unpooling
    – Convolution filter (learned)
    – Also called up-convolution (never deconvolution)

Input 3×3 → Output 5×5

SLIDE 92

Refined Outputs

  • A cascade of unpooling + conv operations gets us to the encoder-decoder architecture
  • Even more refined: autoencoders with skip connections (aka U-Net)

SLIDE 93

U-Net

U-Net architecture: each blue box is a multichannel feature map, with the number of channels denoted at the top of the box and the spatial dimensions at its edge. White boxes are the copied feature maps.

[Ronneberger et al. MICCAI'15] U-Net

SLIDE 94

U-Net: Encoder

Left side: Contraction Path (Encoder)

  • Captures the context of the image
  • Follows the typical architecture of a CNN:
    – Repeated application of 2 unpadded 3x3 convolutions
    – Each followed by a ReLU activation
    – 2x2 maxpooling operation with stride 2 for downsampling
    – At each downsampling step, the number of channels is doubled
  • → As before: Height and Width decrease, Depth increases

SLIDE 95

U-Net: Decoder

Right side: Expansion Path (Decoder)

  • Upsampling to recover spatial locations for assigning class labels to each pixel:
    – 2x2 up-convolution that halves the number of input channels
    – Skip Connections: outputs of up-convolutions are concatenated with feature maps from the encoder
    – Followed by 2 ordinary 3x3 convs
    – Final layer: 1x1 conv to map 64 channels to the number of classes
  • Height and Width increase, Depth decreases

[Ronneberger et al. MICCAI'15] U-Net

SLIDE 96

See you next time!

SLIDE 97

References

We highly recommend reading through these papers!

  • AlexNet [Krizhevsky et al. 2012]
  • VGGNet [Simonyan & Zisserman 2014]
  • ResNet [He et al. 2015]
  • GoogLeNet [Szegedy et al. 2014]
  • Xception [Chollet 2016]
  • Fast R-CNN [Girshick 2015]
  • U-Net [Ronneberger et al. 2015]
  • EfficientNet [Tan & Le 2019]