Lecture 9 Recap (I2DL: Prof. Niessner, Prof. Leal-Taixé)



  1. Lecture 9 Recap

  2. What are Convolutions? $(g * h)(u) = \int_{-\infty}^{+\infty} g(\tau)\, h(u - \tau)\, d\tau$, with $g$ = red, $h$ = blue, $g * h$ = green. Examples: convolution of two box functions, convolution of two Gaussians. A convolution is the application of a filter to a function; the 'smaller' one is typically called the filter kernel.

  3. What are Convolutions? Discrete case: box filter. Input signal: 4 3 2 -5 3 5 2 5 5 6, kernel: 1/3 1/3 1/3, giving the outputs 3, 0, 0, 1, 10/3, 4, 4, 16/3. What to do at boundaries? 1) Shrink: keep only the outputs where the kernel fits entirely inside the signal. 2) Pad: extend the signal, often with '0'; zero padding adds the boundary outputs 7/3 and 11/3.
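
A minimal sketch of this 1D box-filter example, assuming NumPy is available (the box kernel is symmetric, so convolution and correlation coincide here):

```python
import numpy as np

signal = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)
kernel = np.full(3, 1 / 3)                           # box filter [1/3, 1/3, 1/3]

shrink = np.convolve(signal, kernel, mode="valid")   # 1) Shrink: only the 8 fully-covered outputs
padded = np.convolve(signal, kernel, mode="same")    # 2) Pad with zeros: 10 outputs

print(shrink)   # [3. 0. 0. 1. 3.33 4. 4. 5.33]
print(padded)   # starts with 7/3 ≈ 2.33 and ends with 11/3 ≈ 3.67
```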

  4. Convolutions on Images. A 5x5 image convolved with a 3x3 kernel gives a 3x3 output (no padding). The kernel shown is the sharpen filter [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]; for one output position the weighted sum is 5·4 + (-1)·3 + (-1)·4 + (-1)·9 + (-1)·1 = 20 - 17 = 3.
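
A short sketch of this sliding-window computation; the helper function and the placeholder image values below are mine, only the sharpen kernel is taken from the slide:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over every fully-covered position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)
image = np.arange(25, dtype=float).reshape(5, 5)   # placeholder 5x5 image
print(conv2d_valid(image, sharpen).shape)          # (3, 3)
```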

  5. Image Filters. Each kernel gives us a different image filter (let's learn these filters!): Box mean: 1/9 · [[1, 1, 1], [1, 1, 1], [1, 1, 1]]; Edge detection: [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]; Sharpen: [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]; Gaussian blur: 1/16 · [[1, 2, 1], [2, 4, 2], [1, 2, 1]].
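
These four kernels can be tried directly, e.g. with SciPy (assumed to be installed); a random placeholder image stands in for the input shown on the slide:

```python
import numpy as np
from scipy.signal import convolve2d

kernels = {
    "box_mean":      np.ones((3, 3)) / 9,
    "edge_detect":   np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
    "sharpen":       np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "gaussian_blur": np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16,
}

image = np.random.rand(64, 64)                     # placeholder grayscale image
filtered = {name: convolve2d(image, k, mode="same", boundary="fill")
            for name, k in kernels.items()}        # 'same' keeps the 64x64 size
print({name: out.shape for name, out in filtered.items()})
```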

  6. Convolutions on RGB Images. A 32x32x3 image (pixels) is convolved with a 5x5x3 filter (weights): slide the filter over all spatial locations and compute one output value at each. Without padding there are 28x28 locations, so the result is a 28x28x1 activation map (also called a feature map).

  7. Convolution Layer. A 32x32x3 image convolved with a 5x5x3 filter gives a 28x28x1 activation map. Let's apply a different filter with different weights: each filter produces its own 28x28x1 activation map.

  8. Convolution Layer. A convolution "layer": let's apply five filters, each with different weights, to the 32x32x3 image. This yields five 28x28 activation maps, i.e. a 28x28x5 output volume.
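
A minimal sketch of this layer in PyTorch (assumed available); with no padding, five 5x5x3 filters on a 32x32x3 input give a 28x28x5 output volume:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=5)   # five 5x5x3 filters, no padding
x = torch.randn(1, 3, 32, 32)                                    # one 32x32 RGB image (NCHW)
print(conv(x).shape)                                             # torch.Size([1, 5, 28, 28])
```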

  9. Convolution Layers: Dimensions. Input: N x N (input width and height), filter: F x F (filter width and height), stride: S. Output: ((N - F)/S + 1) x ((N - F)/S + 1). Examples: N = 7, F = 3, S = 1: (7 - 3)/1 + 1 = 5; N = 7, F = 3, S = 2: (7 - 3)/2 + 1 = 3; N = 7, F = 3, S = 3: (7 - 3)/3 + 1 = 2.33; fractions are illegal.
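
The output-size formula as a small helper function (the name conv_output_size is mine); the padding term P is included already for the next slides:

```python
def conv_output_size(n: int, f: int, s: int, p: int = 0) -> float:
    """Spatial output size of a convolution: (N - F + 2P) / S + 1."""
    return (n - f + 2 * p) / s + 1

print(conv_output_size(7, 3, s=1))   # 5.0
print(conv_output_size(7, 3, s=2))   # 3.0
print(conv_output_size(7, 3, s=3))   # 2.33... -> fractions are illegal
```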

  10. Convolution Layers: Padding. Illustration: a 7x7 image surrounded by zero padding. Types of convolutions: Valid convolution: no padding is used (the output shrinks). Same convolution: the output has the same size as the input; set the padding to P = (F - 1)/2.

  11. Convolution Layers: Dimensions. Remember: Output = ((N - F + 2P)/S + 1) x ((N - F + 2P)/S + 1). REMARK: in practice, integer division is typically used (i.e., apply the floor operator!). Example: a 3x3 conv with same padding and stride 2 on a 64x64 RGB image -> N = 64, F = 3, P = 1, S = 2. Output: floor((64 - 3 + 2·1)/2 + 1) x floor((64 - 3 + 2·1)/2 + 1) = floor(32.5) x floor(32.5) = 32 x 32.
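
The worked example from this slide, reusing the conv_output_size helper sketched above and applying the floor operator as done in practice:

```python
import math

n, f, p, s = 64, 3, 1, 2                          # 64x64 input, 3x3 conv, same padding, stride 2
out = math.floor(conv_output_size(n, f, s=s, p=p))
print(out)                                        # floor(32.5) = 32 -> the output is 32x32
```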

  12. CNN Learned Filters

  13. CNN Prototype (slide by Karpathy)

  14. Pooling Layer: Max Pooling. Single depth slice of the input: [[3, 1, 3, 5], [6, 0, 7, 9], [3, 4, 3, 2], [1, 4, 0, 2]]. Max pooling with 2x2 filters and stride 2 gives the 'pooled' output [[6, 9], [4, 3]].
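
A sketch of this max-pooling step in PyTorch (assumed available), using the input values as reconstructed above:

```python
import torch
import torch.nn as nn

x = torch.tensor([[3., 1., 3., 5.],
                  [6., 0., 7., 9.],
                  [3., 4., 3., 2.],
                  [1., 4., 0., 2.]]).reshape(1, 1, 4, 4)   # single depth slice, NCHW layout

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).squeeze())                                   # tensor([[6., 9.], [4., 3.]])
```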

  15. Receptive Field. The spatial extent of the connectivity of a convolutional filter. Example: a 7x7 input reduced to a 3x3 output (e.g. by two stacked 3x3 convolutions); each output value has a 5x5 receptive field on the original input, i.e. one output value is connected to 25 input pixels.
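
A small sketch that computes the receptive field of stacked layers with the standard recurrence rf += (k - 1) * jump; the helper name and the example layer list are mine:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump    # each layer widens the field by (k - 1) times the current step size
        jump *= s
    return rf

print(receptive_field([(3, 1), (3, 1)]))   # 5 -> a 5x5 (25-pixel) patch of the original input
```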

  16. Lecture 10 – CNNs (part 2)

  17. Classic Architectures

  18. LeNet. Digit recognition: 10 classes. Input: 32 x 32 grayscale images; this one is labeled as class "7". [LeCun et al. '98]

  19. LeNet. Digit recognition: 10 classes. Valid convolution: the size shrinks. How many conv filters are there in the first layer? 6.

  20. LeNet. Digit recognition: 10 classes. At that time average pooling was used; now max pooling is much more common.

  21. LeNet. Digit recognition: 10 classes. Again valid convolutions; how many filters?

  22. LeNet. Digit recognition: 10 classes. Use of tanh/sigmoid activations -> not common now!

  23. LeNet. Digit recognition: 10 classes. Conv -> Pool -> Conv -> Pool -> Conv -> FC

  24. LeNet (60k parameters). Digit recognition: 10 classes. Conv -> Pool -> Conv -> Pool -> Conv -> FC. As we go deeper: width and height decrease, the number of filters increases.
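
A sketch of the LeNet-style network described on these slides (Conv -> Pool -> Conv -> Pool -> Conv -> FC, tanh activations, average pooling), written in PyTorch as an illustration rather than a faithful reimplementation of [LeCun et al. '98]:

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),     # 32x32x1 -> 28x28x6 (valid conv, 6 filters)
            nn.Tanh(),
            nn.AvgPool2d(2),                    # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),    # -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2),                    # -> 5x5x16
            nn.Conv2d(16, 120, kernel_size=5),  # -> 1x1x120
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet()
print(sum(p.numel() for p in model.parameters()))   # ~61.7k parameters ("60k" on the slide)
```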

  25. Test Benchmarks. Dataset: ImageNet; the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [Russakovsky et al., IJCV'15, "ImageNet Large Scale Visual Recognition Challenge"].

  26. Common Performance Metrics. Top-1 score: check whether a sample's top class (i.e. the one with the highest probability) is the same as its target label. Top-5 score: check whether the target label is among the 5 first predictions (i.e. the predictions with the 5 highest probabilities). -> Top-5 error: percentage of test samples for which the correct class was not in the top 5 predicted classes.
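
A minimal sketch of the top-1/top-5 metrics, assuming PyTorch; the logits and targets below are random placeholders:

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose target label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices                  # (batch, k) predicted class indices
    correct = (topk == targets.unsqueeze(1)).any(dim=1)   # (batch,) True if the target is in the top k
    return correct.float().mean().item()

logits = torch.randn(8, 1000)                    # e.g. scores over 1000 ImageNet classes
targets = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, targets, k=1))       # top-1 score
print(1 - topk_accuracy(logits, targets, k=5))   # top-5 error
```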

  27. AlexNet. Cut the ImageNet error down by half compared to the previous non-CNN entries.

  28. AlexNet [Krizhevsky et al. NIPS'12]

  29. AlexNet. The first filter uses stride 4 to reduce the size significantly; 96 filters. [Krizhevsky et al. NIPS'12]

  30. AlexNet. Use of same convolutions. As with LeNet: width and height decrease, the number of filters increases. [Krizhevsky et al. NIPS'12]

  31. AlexNet [Krizhevsky et al. NIPS'12]

  32. AlexNet. Softmax for 1000 classes. [Krizhevsky et al. NIPS'12]

  33. AlexNet. Similar to LeNet but much bigger (~1000 times). Use of ReLU instead of tanh/sigmoid. 60M parameters. [Krizhevsky et al. NIPS'12]
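
A quick way to check the parameter count, assuming a recent torchvision is installed (weights=None builds the architecture without downloading pretrained weights):

```python
import torchvision

model = torchvision.models.alexnet(weights=None)
print(sum(p.numel() for p in model.parameters()))   # ~61M parameters ("60M" on the slide)
```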

  34. VGGNet. Striving for simplicity: CONV = 3x3 filters with stride 1, same convolutions; MAXPOOL = 2x2 filters with stride 2. [Simonyan and Zisserman ICLR'15]

  35. VGGNet (Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2). Two consecutive convolutional layers, each with 64 filters. What is the output size? [Simonyan and Zisserman ICLR'15]

  36. VGGNet (Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2). [Simonyan and Zisserman ICLR'15]

  37. VGGNet (Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2). [Simonyan and Zisserman ICLR'15]

  38. VGGNet (Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2). The number of filters is multiplied by 2. [Simonyan and Zisserman ICLR'15]

  39. VGGNet (Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2). [Simonyan and Zisserman ICLR'15]

  40. VGGNet. Conv -> Pool -> Conv -> Pool -> Conv -> FC. As we go deeper: width and height decrease, the number of filters increases. Called VGG-16: 16 layers that have weights; 138M parameters. Large, but its simplicity makes it appealing. [Simonyan and Zisserman ICLR'15]
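
The same parameter-count check as for AlexNet above, again assuming torchvision is installed:

```python
import torchvision

model = torchvision.models.vgg16(weights=None)
print(sum(p.numel() for p in model.parameters()))   # ~138M parameters
```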

  41. VGGNet. A lot of architectures were analyzed. [Simonyan and Zisserman 2014]

  42. Skip Connections
