Lecture 8 Recap
- Prof. Leal-Taixé and Prof. Niessner

What do we know so far?
- Network depth and width
Activation Functions (non-linearities)
- Sigmoid: σ(x) = 1 / (1 + e^(−x))
- tanh: tanh(x)
- ReLU: max(0, x)
- Leaky ReLU: max(0.1x, x)
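These non-linearities are each one line of NumPy; a minimal sketch (the function names are our own, not the slides'):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Element-wise max(0, x); tanh is available directly as np.tanh
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # max(alpha * x, x): keeps a small slope for negative inputs
    return np.maximum(alpha * x, x)
```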
Backpropagation
(Figure: worked backpropagation example on a small computational graph, with the forward values and the gradients computed at each node.)
Batchnorm
- D = #features, N = mini-batch size
SGD Variations (Momentum, etc.)
Why not just go deeper and get better?
– No structure!
– It’s just brute force!
– Optimization becomes hard
– Performance plateaus / drops!
Credit: Li/Karpathy/Johnson
How many weights do FC layers need on images?
- One neuron looking at a single 5-pixel row of a 5×5 image: 5 weights.
- One neuron looking at the whole 5×5 image: 25 weights.
- One neuron looking at the whole 5×5 image on the three channels: 75 weights — per neuron.
- A normal image with FC layers, e.g., 1000×1000×3, and a 1000-neuron layer: 1000 ⋅ 1000 ⋅ 3 ⋅ 1000 = 3 billion weights.
IMPRACTICAL
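The counts above follow from one weight per input-value-and-neuron pair; a small sanity-check sketch (the function name is our own):

```python
def fc_weights(height, width, channels, neurons):
    # A fully-connected layer: every neuron sees every input value
    return height * width * channels * neurons

# 5x5 RGB image, one neuron: 75 weights
# 1000x1000 RGB image, 1000-neuron layer: 3 billion weights
small = fc_weights(5, 5, 3, 1)
large = fc_weights(1000, 1000, 3, 1000)
```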
– FC is somewhat brute force
– We want a layer with structure
– Weight sharing → using the same weights for different parts of the image
Convolution
(f ∗ g)(x) = ∫ f(τ) g(x − τ) dτ    (f = red, g = blue, f ∗ g = green)
- Convolution of two box functions; convolution of two Gaussians
- Application of a filter to a function; the ‘smaller’ one is typically called the filter kernel
Discrete case: box filter
f = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6],  g = [1/3, 1/3, 1/3]
‘Slide’ the filter kernel from left to right; at each position, compute a single value in the output data:
- 4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3
- 3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0
- 2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0
- (−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1
- 3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3
- 5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4
- 2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4
- 5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3
f ∗ g = [3, 0, 0, 1, 10/3, 4, 4, 16/3]
What to do at boundaries?
1) Shrink: the output is shorter than the input.
2) Pad (e.g., with zeros): f ∗ g = [7/3, 3, 0, 0, 1, 10/3, 4, 4, 16/3, 11/3]
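The walkthrough above can be reproduced with NumPy's `convolve`; a sketch of the same example, where mode `valid` implements the ‘shrink’ option and `same` the zero-padding option:

```python
import numpy as np

f = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)
g = np.full(3, 1.0 / 3.0)  # box filter

shrink = np.convolve(f, g, mode="valid")  # no padding: 8 outputs
padded = np.convolve(f, g, mode="same")   # zero padding: 10 outputs
```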
2D case: a 5×5 image convolved with a 3×3 kernel (sharpen-style: center weight 5, the four direct neighbors −1, corners 0) gives a 3×3 output.
At each position, the output value is the dot product of the kernel with the 3×3 image patch underneath it, e.g., for the first position:
5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6
Computed entry by entry, the 3×3 output is:
 6   1   8
−7   9   2
−5  −9   3
Common filter kernels:
- Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
- Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
- Box mean: 1/9 ⋅ [1 1 1; 1 1 1; 1 1 1]
- Gaussian blur: 1/16 ⋅ [1 2 1; 2 4 2; 1 2 1]
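Any of these kernels can be applied with a few lines of naive Python; a sketch using the sharpen-style kernel (computed as cross-correlation without kernel flip, as is conventional in CNNs; the function name is our own):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive 'valid' 2D filtering: slide the kernel over the image,
    # taking the dot product with each patch (no padding)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)
```

Since the sharpen weights sum to 1, a constant image passes through unchanged — a quick way to check the implementation.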
Images have depth: e.g., RGB → 3 channels. Consider a 32×32×3 image (width 32, height 32, depth 3) and a 5×5×3 filter.
Convolve the filter with the image, i.e., ‘slide’ it over the image:
- The depth dimension *must* match; the filter extends the full depth of the input.
- Each location gives 1 number: the dot product between the filter weights w and the i-th 5×5×3 chunk of the image xᵢ. Here: a 5 ⋅ 5 ⋅ 3 = 75-dim dot product + bias: zᵢ = wᵀxᵢ + b
- Slide over all spatial locations xᵢ and compute all outputs zᵢ; w/o padding, there are 28×28 locations.
- The result is a 28×28 activation map (also called a feature map).
Let’s apply a different filter with different weights: we get another 28×28 activation map.
Let’s apply **five** filters, each with different weights: we get a 28×28×5 stack of activation maps — a convolution “layer”.
Hyperparameters so far:
– Filter width and height (depth is implicitly given)
– Number of different filter banks (#weight sets)
Filters respond to image characteristics: horizontal edges, vertical edges, circles, squares, …
Strided convolution (input: 7×7, filter: 3×3)
- Stride 1: apply the filter at every spatial location → output: 5×5.
- Stride 2: apply the filter at every 2nd spatial location → output: 3×3.
- Stride n: apply the filter at every n-th spatial location, i.e., subsample the image.
- Stride 3: does not really fit; a remainder is left over → illegal stride for this input & filter size!
Input: N×N (width and height), filter: F×F, stride: S
Output: ((N − F)/S + 1) × ((N − F)/S + 1)
- N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
- N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
- N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.33… → fractions are illegal
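The size formula is easy to make executable; a sketch with our own function name (the padding term P, covered on the following slides, defaults to 0 here):

```python
def conv_output_size(n, f, s, p=0):
    # Output spatial size of a convolution: (n - f + 2p) / s + 1.
    # Fractions are illegal, so reject strides that leave a remainder.
    num = n - f + 2 * p
    if num % s != 0:
        raise ValueError("illegal stride for this input & filter size")
    return num // s + 1
```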
Shrinking down so quickly (32 → 28 → 24 → 20) is typically not a good idea…
Input image 32×32×3 → Conv + ReLU (5 filters 5×5×3) → 28×28×5 → Conv + ReLU (8 filters 5×5×5) → 24×24×8 → Conv + ReLU (12 filters 5×5×8) → 20×20×12
Why padding? Without it, feature maps shrink quickly.
Most common is ‘zero’ padding.
Image 7×7 + zero padding: Input: 7×7, Filter: 3×3, Padding: 1, Stride: 1 → Output: 7×7
Output size: ((N − F + 2 ⋅ P)/S + 1) × ((N − F + 2 ⋅ P)/S + 1)
To preserve the input size with stride 1, set the padding to P = (F − 1)/2.
Types of convolutions:
- Valid convolution: using no padding
- Same convolution: padding such that the output size equals the input size
Example
Input image: 32×32×3; 10 filters 5×5 (the depth of 3 is implicitly given); stride 1; pad 2.
Output size: (32 + 2 ⋅ 2 − 5)/1 + 1 = 32, i.e., the output volume is 32×32×10.
(Remember: output is ((N − F + 2 ⋅ P)/S + 1) × ((N − F + 2 ⋅ P)/S + 1).)
Number of parameters (weights): each filter has 5×5×3 + 1 = 76 params (+1 for the bias), so the layer has 10 ⋅ 76 = 760 parameters.
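The example's arithmetic, as a quick check (the helper name is our own):

```python
def conv_layer_params(f, d_in, k):
    # Each filter: f*f*d_in weights + 1 bias; k filters in the layer
    return (f * f * d_in + 1) * k

# 32x32x3 input, 10 filters 5x5, stride 1, pad 2
out_size = (32 - 5 + 2 * 2) // 1 + 1   # spatial output size
params = conv_layer_params(5, 3, 10)   # weights + biases in the layer
```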
Conv layer summary. Accepts a volume of size W_in×H_in×D_in and requires four hyperparameters:
– Number of filters K
– Spatial filter extent F
– Stride S
– Zero padding P
Produces a volume of size W_out×H_out×D_out:
– W_out = (W_in − F + 2 ⋅ P)/S + 1
– H_out = (H_in − F + 2 ⋅ P)/S + 1
– D_out = K
The d-th activation map (W_out×H_out) is the result of the convolution of the d-th filter over the input volume with a stride of S, offset by its bias.
Slide by Li/Karpathy/Johnson
Common settings: K = powers of 2, e.g., 32, 64, 128, 512; F = 3, S = 1, P = 1; F = 5, S = 1, P = 2; F = 5, S = 2, P = (whatever fits); F = 1, S = 1, P = 0
A ConvNet is a concatenation of conv layers and activations:
Input image 32×32×3 → Conv + ReLU (5 filters 5×5×3) → 28×28×5 → Conv + ReLU (8 filters 5×5×5) → 24×24×8 → Conv + ReLU (12 filters 5×5×8) → 20×20×12
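The chain of spatial sizes follows from applying N − F + 1 per layer (stride 1, no padding); a sketch with our own helper name:

```python
def chain_sizes(n, filter_sizes):
    # Spatial size after each stride-1, no-padding conv layer
    sizes = [n]
    for f in filter_sizes:
        sizes.append(sizes[-1] - f + 1)
    return sizes

# Three 5x5 conv layers on a 32x32 input: 32 -> 28 -> 24 -> 20
sizes = chain_sizes(32, [5, 5, 5])
```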
Slide by Karpathy
Slide by Li/Karpathy/Johnson
Max pooling: on a single depth slice of the input, a max pool with 2×2 filters and stride 2 reduces each 2×2 region to its maximum value in the ‘pooled’ output.
– Computes a feature in a given region
– Picks the strongest activation in a region
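Max pooling on a single depth slice is one reshape in NumPy; a minimal sketch assuming even spatial dimensions (the example input below is illustrative, not the slide's):

```python
import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2: group pixels into 2x2 blocks
    # and keep the maximum of each block
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```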
Pooling layer summary. Accepts a volume of size W_in×H_in×D_in and requires two hyperparameters:
– Spatial filter extent F
– Stride S
Produces a volume of size W_out×H_out×D_out:
– W_out = (W_in − F)/S + 1
– H_out = (H_in − F)/S + 1
– D_out = D_in
Filter count and padding make no sense here.
Common settings: F = 2, S = 2 or F = 3, S = 2
Average pooling: on a single depth slice of the input, an average pool with 2×2 filters and stride 2 replaces each 2×2 region by its mean; in the example, the ‘pooled’ output is [2.5, 6, 1.75, 3].
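The same reshape trick gives average pooling; a sketch (the example input is our own, not the slide's):

```python
import numpy as np

def avg_pool_2x2(x):
    # 2x2 average pooling with stride 2: mean of each 2x2 block
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```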
Fully-connected layers at the end:
– Make the final decision with the features extracted by the convolutions
– Typically one or two FC layers
Why convolutions?
– They restrict the degrees of freedom: FC is somewhat brute force, convolutions are structured
– They extract image features
– Concept of weight sharing: extract the same features independent of the location
[Hubel & Wiesel, 59, 62, 68, …]
Recap: a 5×5×3 filter slid over a 32×32×3 image computes, at each location, 1 number: the 5 ⋅ 5 ⋅ 3 = 75-dim dot product between the filter weights w and the i-th image chunk xᵢ, plus a bias: zᵢ = wᵀxᵢ + b
Backpropagation in convolutional layers: because the weights are shared, the gradient of each filter weight (e.g., w11, w12, w21, w22 for a 2×2 filter) sums the contributions from all spatial positions where that weight was applied.
http://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
Convolution as a matrix multiply: flatten the input into a 16-dim vector x; the output is a 4-dim vector (re-shaped as 2×2 eventually), computed as Mx, where M is a sparse matrix containing the filter weights. The backward pass is simply multiplying with Mᵀ. [Dumoulin et al. 16]
Task for at home: think it through on a piece of paper :)
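A sketch of this matrix view (the helper and its construction are our own; the slide's concrete M is not reproduced here): each output row of M holds the kernel weights at the positions of the corresponding input patch, so M @ x.ravel() equals the valid convolution, and M.T carries gradients back to the input.

```python
import numpy as np

def conv_matrix(kernel, in_size):
    # Build M so that M @ x.ravel() equals the 'valid' filtering of the
    # in_size x in_size image x with the kernel (no flip, CNN convention)
    k = kernel.shape[0]
    out = in_size - k + 1
    M = np.zeros((out * out, in_size * in_size))
    for i in range(out):
        for j in range(out):
            for a in range(k):
                for b in range(k):
                    M[i * out + j, (i + a) * in_size + (j + b)] = kernel[a, b]
    return M
```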
Next: CNN architectures (…, VGG, Inception)