Lecture 8 Recap
Prof. Leal-Taixé and Prof. Niessner


SLIDE 1

Lecture 8 Recap

Prof. Leal-Taixé and Prof. Niessner

SLIDE 2

What do we know so far?

Depth and width of networks

SLIDE 3

What do we know so far?

Activation functions (non-linearities):
  • Sigmoid: σ(x) = 1 / (1 + e^(−x))
  • tanh: tanh(x)
  • ReLU: max(0, x)
  • Leaky ReLU: max(0.1x, x)
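The four non-linearities above can be written out directly; a minimal sketch (the function names are ours, not from the lecture):

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); squashes input to (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    # max(0, x): zero for negative inputs, identity otherwise
    return max(0.0, x)

def leaky_relu(x, alpha=0.1):
    # max(alpha * x, x); the slide uses a slope of 0.1 for negative inputs
    return max(alpha * x, x)

print(sigmoid(0.0))      # 0.5
print(relu(-2.0))        # 0.0
print(leaky_relu(-2.0))  # -0.2
```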

SLIDE 4

What do we know so far?

Backpropagation

(Figure: worked backpropagation example on a computational graph, with forward values and the gradients propagated back through each node.)

SLIDE 5

What do we know so far?

Batchnorm (D = #features, N = mini-batch size)

SGD variations (Momentum, etc.)

SLIDE 6

Why Not Only More Layers?

  • We cannot make networks arbitrarily complex:
    – Why not just go deeper and get better?
    – No structure!
    – It's just brute force!
    – Optimization becomes hard
    – Performance plateaus / drops!

SLIDE 7

Dealing with Images

SLIDE 8

Using CNNs in Computer Vision

Credit: Li/Karpathy/Johnson

SLIDE 9

FC Layers on Images

  • How to process a tiny image with FC layers

5×5 image (3 channels), 3-neuron layer: 5 weights

SLIDE 10

FC Layers on Images

  • How to process a tiny image with FC layers

5×5 image (3 channels), 3-neuron layer: 25 weights for the whole 5×5 image

SLIDE 11

FC Layers on Images

  • How to process a tiny image with FC layers

5×5 image (3 channels), 3-neuron layer: 75 weights for the whole 5×5 image on the three channels

SLIDE 12

FC Layers on Images

  • How to process a tiny image with FC layers

5×5 image (3 channels), 3-neuron layer: 75 weights for the whole 5×5 image on the three channels — *per neuron* (75 weights each)

SLIDE 13

FC Layers on Images

  • How to process a normal image with FC layers

1000×1000 image (3 channels), 3-neuron layer

SLIDE 14

FC Layers on Images

  • How to process a normal image with FC layers

1000×1000 image (3 channels), 1000-neuron layer: 3 billion weights

IMPRACTICAL
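The 3-billion figure is plain arithmetic; as a quick sanity check (biases ignored, as on the slide):

```python
# Weight count for a fully-connected layer on a 1000x1000x3 image
# with a 1000-neuron layer: every neuron sees every pixel of every channel.
inputs_per_neuron = 1000 * 1000 * 3
neurons = 1000
weights = inputs_per_neuron * neurons
print(weights)  # 3000000000 -> 3 billion weights, impractical
```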

SLIDE 15

An Alternative to Fully-Connected

  • We want to restrict the degrees of freedom:
    – FC is somewhat brute force
    – We want a layer with structure
    – Weight sharing → using the same weights for different parts of the image

SLIDE 16

Convolutions

SLIDE 17

What are Convolutions?

(f ∗ g)(t) = ∫_{−∞}^{+∞} f(τ) g(t − τ) dτ

f = red, g = blue, f ∗ g = green
Examples: convolution of two box functions; convolution of two Gaussians.
A convolution is the application of a filter to a function; the 'smaller' one is typically called the filter kernel.

SLIDE 18

What are Convolutions?

Discrete case: box filter
f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6], g = [1/3, 1/3, 1/3]
'Slide' the filter kernel from left to right; at each position, compute a single value in the output data.

SLIDE 19

What are Convolutions?

Discrete case: box filter
4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3
Output so far: [3]

SLIDE 20

What are Convolutions?

Discrete case: box filter
3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0
Output so far: [3, 0]

SLIDE 21

What are Convolutions?

Discrete case: box filter
2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0
Output so far: [3, 0, 0]

SLIDE 22

What are Convolutions?

Discrete case: box filter
(−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1
Output so far: [3, 0, 0, 1]

SLIDE 23

What are Convolutions?

Discrete case: box filter
3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3
Output so far: [3, 0, 0, 1, 10/3]

SLIDE 24

What are Convolutions?

Discrete case: box filter
5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4
Output so far: [3, 0, 0, 1, 10/3, 4]

SLIDE 25

What are Convolutions?

Discrete case: box filter
2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4
Output so far: [3, 0, 0, 1, 10/3, 4, 4]

SLIDE 26

What are Convolutions?

Discrete case: box filter
5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3
Output so far: [3, 0, 0, 1, 10/3, 4, 4, 16/3]

SLIDE 27

What are Convolutions?

Discrete case: box filter
f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6], g = [1/3, 1/3, 1/3]
Output: [??, 3, 0, 0, 1, 10/3, 4, 4, 16/3, ??]

What to do at boundaries?

SLIDE 28

What are Convolutions?

What to do at boundaries?
1) Shrink: output [3, 0, 0, 1, 10/3, 4, 4, 16/3] (8 values for 10 inputs)
2) Pad (often with '0'): output [7/3, 3, 0, 0, 1, 10/3, 4, 4, 16/3, 11/3]
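The box-filter walkthrough above, including both boundary strategies, fits in a few lines; a sketch, where `conv1d` is our helper name (the padded variant assumes an odd-sized kernel):

```python
def conv1d(signal, kernel, pad=False):
    """'Slide' the kernel over the signal; one output value per position.
    pad=True zero-pads so the output has the same length as the input
    (assumes an odd kernel length)."""
    k = len(kernel)
    if pad:
        p = (k - 1) // 2
        signal = [0.0] * p + list(signal) + [0.0] * p
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

f = [4, 3, 2, -5, 3, 5, 2, 5, 5, 6]
g = [1 / 3, 1 / 3, 1 / 3]  # box filter

print(conv1d(f, g))            # shrink: 8 values, [3, 0, 0, 1, 10/3, 4, 4, 16/3]
print(conv1d(f, g, pad=True))  # pad:    10 values, 7/3 and 11/3 at the boundaries
```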

SLIDE 29

Convolutions on Images

Image 5×5:
  −5  3  2 −5  3
   4  3  2  1 −3
   1  0  3  3  5
  −2  0  1  4  4
   5  6  7  9  5

Kernel 3×3 (sharpen):
   0 −1  0
  −1  5 −1
   0 −1  0

Output 3×3, first entry:
5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6

SLIDE 30

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 2 + (−1) ⋅ 2 + (−1) ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 3 = 10 − 9 = 1
Output 3×3 so far: [6, 1]

SLIDE 31

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 1 + (−1) ⋅ (−5) + (−1) ⋅ (−3) + (−1) ⋅ 3 + (−1) ⋅ 2 = 5 + 5 + 3 − 3 − 2 = 8
Output 3×3 so far: [6, 1, 8]

SLIDE 32

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 0 + (−1) ⋅ 3 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 3 = 0 − 7 = −7
Output 3×3 so far: [6, 1, 8, −7]

SLIDE 33

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 0 = 15 − 6 = 9
Output 3×3 so far: [6, 1, 8, −7, 9]

SLIDE 34

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 5 + (−1) ⋅ 4 + (−1) ⋅ 3 = 15 − 13 = 2
Output 3×3 so far: [6, 1, 8, −7, 9, 2]

SLIDE 35

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 0 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 6 + (−1) ⋅ (−2) = −5
Output 3×3 so far: [6, 1, 8, −7, 9, 2, −5]

SLIDE 36

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Next output entry:
5 ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 7 + (−1) ⋅ 0 = 5 − 14 = −9
Output 3×3 so far: [6, 1, 8, −7, 9, 2, −5, −9]

SLIDE 37

Convolutions on Images

Image 5×5, sharpen kernel 3×3 (as on Slide 29). Last output entry:
5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3

Full output 3×3:
   6  1  8
  −7  9  2
  −5 −9  3
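The 5×5 walkthrough above can be reproduced directly; `conv2d` is our helper name, and the image values are transcribed from the slide computations (with a symmetric kernel, sliding-window filtering and convolution coincide):

```python
def conv2d(image, kernel):
    """Valid convolution (no padding, stride 1): compute one output value
    at every position where the kernel fits fully inside the image."""
    n, k = len(image), len(kernel)
    out_size = n - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_size)]
            for i in range(out_size)]

image = [[-5, 3, 2, -5,  3],
         [ 4, 3, 2,  1, -3],
         [ 1, 0, 3,  3,  5],
         [-2, 0, 1,  4,  4],
         [ 5, 6, 7,  9,  5]]

sharpen = [[ 0, -1,  0],
           [-1,  5, -1],
           [ 0, -1,  0]]

print(conv2d(image, sharpen))
# [[6, 1, 8], [-7, 9, 2], [-5, -9, 3]]
```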

SLIDE 38

Image Filters

  • Each kernel gives us a different image filter:
    – Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
    – Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
    – Box mean: 1/9 ⋅ [1 1 1; 1 1 1; 1 1 1]
    – Gaussian blur: 1/16 ⋅ [1 2 1; 2 4 2; 1 2 1]

LET'S LEARN THESE FILTERS!

SLIDE 39

Convolutions on RGB Images

Images have depth: e.g., RGB → 3 channels.
32×32×3 image (width × height × depth), 5×5×3 filter.
Convolve the filter with the image, i.e., 'slide' it over the image and:
→ apply the filter at each location
→ compute dot products

The depth dimension *must* match; i.e., the filter extends over the full depth of the input.

SLIDE 40

Convolutions on RGB Images

32×32×3 image (pixels x), 5×5×3 filter (weights w).
1 number: the dot product between the filter weights w and the i-th 5×5×3 chunk of the image.
Here: 5 ⋅ 5 ⋅ 3 = 75-dim dot product + bias: z_i = wᵀ x_i + b

SLIDE 41

Convolutions on RGB Images

32×32×3 image, 5×5×3 filter → 28×28 activation map (also called feature map).

Convolve: slide over all spatial locations x_i and compute all outputs z_i; without padding, there are 28×28 locations.

SLIDE 42

Convolution Layer

SLIDE 43

Convolution Layer

32×32×3 image, 5×5×3 filter → 28×28 activation map.
Let's apply a different filter with different weights → a second 28×28 activation map!

SLIDE 44

Convolution Layer

32×32×3 image. Let's apply **five** filters, each with different weights → five 28×28 activation maps: a convolution "layer".

SLIDE 45

Convolution Layer

  • A basic layer is defined by:
    – Filter width and height (depth is implicitly given)
    – Number of different filter banks (#weight sets)
  • Each filter captures a different image characteristic.

SLIDE 46

Different Filters

  • Each filter captures different image characteristics: horizontal edges, vertical edges, circles, squares, …

SLIDE 47

Dimensions of a Convolution Layer

SLIDE 48

Convolution Layers: Dimensions

Image 7×7. Input: 7×7, Filter: 3×3 → Output: 5×5

SLIDE 49

Convolution Layers: Dimensions

Image 7×7. Input: 7×7, Filter: 3×3 → Output: 5×5

SLIDE 50

Convolution Layers: Dimensions

Image 7×7. Input: 7×7, Filter: 3×3 → Output: 5×5

SLIDE 51

Convolution Layers: Dimensions

Image 7×7. Input: 7×7, Filter: 3×3 → Output: 5×5

SLIDE 52

Convolution Layers: Dimensions

Image 7×7. Input: 7×7, Filter: 3×3 → Output: 5×5

SLIDE 53

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 1 → Output: 5×5 (with a stride of 1)
Stride of n: apply the filter at every n-th spatial location, i.e., subsample the image.

SLIDE 54

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 2 → Output: 3×3 (with a stride of 2)

SLIDE 55

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 2 → Output: 3×3 (with a stride of 2)

SLIDE 56

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 2 → Output: 3×3 (with a stride of 2)

SLIDE 57

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 3 → Output: ? × ? (with a stride of 3)

SLIDE 58

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 3 → Output: ? × ? (with a stride of 3)

SLIDE 59

Convolution Layers: Stride

Input: 7×7, Filter: 3×3, Stride: 3 → does not really fit; a remainder is left over.
→ Illegal stride for this input & filter size!

SLIDE 60

Convolution Layers: Dimensions

Input height and width of N, filter height and width of F, stride S.
Input: N×N, Filter: F×F, Stride: S → Output: ((N − F)/S + 1) × ((N − F)/S + 1)

N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.3333 → fractions are illegal
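The output-size rule can be checked mechanically; `conv_output_size` is our helper name, not from the lecture:

```python
def conv_output_size(n, f, s):
    """Spatial output size (N - F) / S + 1 of a convolution without
    padding; raises if the stride does not fit (fractional result)."""
    if (n - f) % s != 0:
        raise ValueError("illegal stride for input & filter size")
    return (n - f) // s + 1

print(conv_output_size(7, 3, 1))  # 5
print(conv_output_size(7, 3, 2))  # 3
# conv_output_size(7, 3, 3) raises: (7 - 3) / 3 + 1 is fractional
```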

SLIDE 61

Convolution Layers: Dimensions

Input image 32×32×3 → [5 filters 5×5×3, Conv + ReLU] → 28×28×5 → [8 filters 5×5×5, Conv + ReLU] → 24×24×8 → [12 filters 5×5×8, Conv + ReLU] → 20×20×12

Shrinking down so quickly (32 → 28 → 24 → 20) is typically not a good idea…

SLIDE 62

Convolution Layers: Padding

Why padding?
  • Sizes get small too quickly
  • The corner pixel is only used once
SLIDE 63

Convolution Layers: Padding

Image 7×7 + zero padding

Why padding?
  • Sizes get small too quickly
  • The corner pixel is only used once
SLIDE 64

Convolution Layers: Padding

Most common is 'zero' padding.
Input: 7×7, Filter: 3×3, Padding: 1, Stride: 1 → Output: 7×7
Output size: ((N + 2P − F)/S + 1) × ((N + 2P − F)/S + 1)

SLIDE 65

Convolution Layers: Padding

Set the padding to P = (F − 1)/2.

Types of convolutions:
  • Valid convolution: using no padding
  • Same convolution: output size = input size
SLIDE 66

Con Convol

  • lution
  • n L

Layers ers: D : Dimen ension

  • ns

Example

  • Prof. Leal-Taixé and Prof. Niessner

66

Input image: 32×32×3 10 filters 5×5 Stride 1 Pad 2 Depth of 3 is implicitly given 3 32 32 10 'ilters 5×5×3 Output size is: 32 + 2 ⋅ 2 − 5 1 + 1 = 32 I.e., 32×32×10 3 Remember Output: (

345⋅678 9

+ 1)×(

345⋅678 9

+ 1)

SLIDE 67

Convolution Layers: Dimensions

Example: input image 32×32×3; 10 filters 5×5, stride 1, pad 2.
Output size: (32 + 2 ⋅ 2 − 5)/1 + 1 = 32, i.e., 32×32×10

SLIDE 68

Convolution Layers: Dimensions

Example: input image 32×32×3; 10 filters 5×5, stride 1, pad 2.
Number of parameters (weights): each filter has 5 ⋅ 5 ⋅ 3 + 1 = 76 params (+1 for the bias)
→ 76 ⋅ 10 = 760 params in the layer
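Both the padded output size and the parameter count of this example can be verified in one place; `conv_layer_stats` is our helper name:

```python
def conv_layer_stats(n, d_in, num_filters, f, s, p):
    """Spatial output size (N + 2P - F) / S + 1 and parameter count:
    F*F*D_in weights plus 1 bias per filter."""
    out = (n + 2 * p - f) // s + 1
    params = (f * f * d_in + 1) * num_filters
    return out, params

# Slide example: 32x32x3 input, 10 filters 5x5, stride 1, pad 2
out, params = conv_layer_stats(32, 3, 10, 5, 1, 2)
print(out, params)  # 32 760 -> output volume 32x32x10, 760 parameters
```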

SLIDE 69

Convolution Layers: Dimensions

  • Input is a volume of size H_in × W_in × D_in
  • Four hyperparameters:
    – Number of filters K
    – Spatial filter extent F
    – Stride S
    – Zero padding P
  • Output volume is of size H_out × W_out × D_out:
    – H_out = (H_in − F + 2P)/S + 1
    – W_out = (W_in − F + 2P)/S + 1
    – D_out = K
  • There are F ⋅ F ⋅ D_in weights per filter; i.e., a total of F ⋅ F ⋅ D_in ⋅ K weights and K biases.
  • In the output volume, the d-th depth slice (of size H_out × W_out) is the result of convolving the d-th filter over the input volume with a stride of S, offset by its bias.

Common settings:
K = powers of 2, e.g., 32, 64, 128, 512
F = 3, S = 1, P = 1
F = 5, S = 1, P = 2
F = 5, S = 2, P = (whatever fits)
F = 1, S = 1, P = 0

Slide by Li/Karpathy/Johnson

SLIDE 70

Convolutional Neural Network (CNN)

SLIDE 71

CNN Prototype

A ConvNet is a concatenation of convolution layers and activations:
Input image 32×32×3 → [5 filters 5×5×3, Conv + ReLU] → 28×28×5 → [8 filters 5×5×5, Conv + ReLU] → 24×24×8 → [12 filters 5×5×8, Conv + ReLU] → 20×20×12

SLIDE 72

CNN Learned Filters

SLIDE 73

CNN Prototype

Slide by Karpathy

SLIDE 74

Pooling

SLIDE 75

Pooling Layer

Slide by Li/Karpathy/Johnson

SLIDE 76

Pooling Layer: Max Pooling

Single depth slice of input:
3 1 3 5
0 6 7 9
3 2 1 4
2 0 4 3

Max pool with 2×2 filters and stride 2 → 'pooled' output:
6 9
3 4
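The 2×2/stride-2 pooling above can be sketched generically; `pool2d` is our helper name, and passing an averaging `op` gives the average-pooling variant shown a few slides later:

```python
def pool2d(x, f=2, s=2, op=max):
    """Pool each f x f window with stride s; op=max gives max pooling,
    an averaging function gives average pooling."""
    out_size = (len(x) - f) // s + 1
    return [[op([x[i * s + a][j * s + b] for a in range(f) for b in range(f)])
             for j in range(out_size)]
            for i in range(out_size)]

x = [[3, 1, 3, 5],
     [0, 6, 7, 9],
     [3, 2, 1, 4],
     [2, 0, 4, 3]]

print(pool2d(x, op=max))
# [[6, 9], [3, 4]]
print(pool2d(x, op=lambda w: sum(w) / len(w)))
# [[2.5, 6.0], [1.75, 3.0]]
```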

SLIDE 77

Pooling Layer

  • Conv layer = 'feature extraction': computes a feature in a given region
  • Pooling layer = 'feature selection': picks the strongest activation in a region

SLIDE 78

Pooling Layer

  • Input is a volume of size H_in × W_in × D_in
  • Two hyperparameters:
    – Spatial filter extent F
    – Stride S
  • Output volume is of size H_out × W_out × D_out:
    – H_out = (H_in − F)/S + 1
    – W_out = (W_in − F)/S + 1
    – D_out = D_in
  • Does not contain parameters; i.e., it is a fixed function.

Filter count and padding make no sense here.

SLIDE 79

Pooling Layer

  • Input is a volume of size H_in × W_in × D_in
  • Two hyperparameters:
    – Spatial filter extent F
    – Stride S
  • Output volume is of size H_out × W_out × D_out:
    – H_out = (H_in − F)/S + 1
    – W_out = (W_in − F)/S + 1
    – D_out = D_in
  • Does not contain parameters; i.e., it is a fixed function.

Common settings: F = 2, S = 2; F = 3, S = 2

SLIDE 80

Pooling Layer: Average Pooling

Single depth slice of input:
3 1 3 5
0 6 7 9
3 2 1 4
2 0 4 3

Average pool with 2×2 filters and stride 2 → 'pooled' output:
2.5  6
1.75 3

  • Typically used deeper in the network
slide-81
SLIDE 81

Convolutional Neural Network

SLIDE 82

Final Fully-Connected Layer

  • Same as what we had in 'ordinary' neural networks:
    – Make the final decision with the features extracted by the convolutions
    – Typically one or two FC layers

SLIDE 83

Convolutions vs Fully-Connected

  • In contrast to fully-connected layers, we want to restrict the degrees of freedom:
    – FC is somewhat brute force
    – Convolutions are structured
  • A sliding window with the same filter parameters extracts image features:
    – Concept of weight sharing
    – Extracts the same features independent of location

SLIDE 84

Convolutional Neural Network

  • It turns out that CNNs are similar to the visual cortex [Hubel & Wiesel, 59, 62, 68, …]

SLIDE 85

Backprop through CNN Layers

SLIDE 86

Backprop through CNN Layers

32×32×3 image (pixels x), 5×5×3 filter (weights w).
1 number: the dot product between the filter weights w and the i-th 5×5×3 chunk of the image.
Here: 5 ⋅ 5 ⋅ 3 = 75-dim dot product + bias: z_i = wᵀ x_i + b

SLIDE 87

Backprop through CNN Layers

http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/

SLIDE 88

Backprop through CNN Layers

Gradient with respect to the filter weights w11, w12, w21, w22.

http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/

SLIDE 89

Backprop through CNN Layers

The convolution can be written as a matrix C:
Input: 16-dim vector; Output: 4-dim vector (will be reshaped as 2×2 eventually).
The backward pass is simply multiplying with Cᵀ. [Dumoulin et al. 16]

Task for at home: think it through on a piece of paper :)
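The 'convolution as a matrix' view can be made concrete for the slide's setup: a 4×4 input (16-dim vector), a 3×3 kernel, and a 2×2 output (4-dim vector). This is a sketch; `conv_matrix` and `matvec` are our helper names, the kernel values are ours, and C is built for the sliding-window filtering CNNs actually compute:

```python
def conv_matrix(kernel, n):
    """Build the matrix C such that C @ flatten(x) equals the valid
    (no padding, stride 1) filtering of an n x n input x with the kernel."""
    k = len(kernel)
    m = n - k + 1                                  # output is m x m
    C = [[0.0] * (n * n) for _ in range(m * m)]
    for i in range(m):
        for j in range(m):
            for a in range(k):
                for b in range(k):
                    C[i * m + j][(i + a) * n + (j + b)] = kernel[a][b]
    return C

def matvec(M, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in M]

# 4x4 input -> 16-dim vector; 3x3 kernel -> 4-dim output (2x2 after reshape)
kernel = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # sharpen kernel from earlier slides
x = list(range(16))                             # flattened 4x4 input (example values)
C = conv_matrix(kernel, 4)
y = matvec(C, x)                                # forward pass: y = C x
# Backward pass: multiply the upstream gradient by C^T (here: all-ones gradient)
Ct = [list(col) for col in zip(*C)]
grad_x = matvec(Ct, [1.0, 1.0, 1.0, 1.0])
print(y)
```

Reshaping `y` to 2×2 gives the usual activation map; the same transpose trick is what transposed ("deconvolution") layers build on.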

SLIDE 90

Administrative Things

  • Next Monday: Deep Learning research projects at TUM
  • Monday June 25th: second CNN lecture (more about architectures, VGG, Inception)