

  1. Convolutional Neural Nets II EECS 442 – Prof. David Fouhey Winter 2019, University of Michigan http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/

  2. Previously – Backpropagation. g(x) = (-x + 3)^2, built from three blocks: -n, n + 3, n^2. Forward pass: compute the function, x → -x → -x + 3 → (-x + 3)^2. Backward pass: compute the derivative of all parts of the function: 1 at the output, -2x + 6 with respect to (-x + 3), -2x + 6 with respect to -x, and 2x - 6 with respect to x.
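
A quick way to sanity-check this forward/backward decomposition is to code it up. A minimal sketch in plain Python (my construction, not the course's code):

```python
# Forward pass: build g(x) = (-x + 3)^2 from the three blocks on the slide.
def forward(x):
    a = -x           # block: -n
    b = a + 3        # block: n + 3
    g = b ** 2       # block: n^2
    return g, (a, b)

# Backward pass: multiply local derivatives from the output back to the input.
def backward(x):
    _, (a, b) = forward(x)
    dg_dg = 1.0            # derivative of the output w.r.t. itself
    dg_db = dg_dg * 2 * b  # d(n^2)/dn = 2n, evaluated at n = -x + 3
    dg_da = dg_db * 1.0    # d(n+3)/dn = 1
    dg_dx = dg_da * -1.0   # d(-n)/dn = -1
    return dg_dx           # equals 2x - 6

assert backward(5.0) == 2 * 5.0 - 6  # 4.0
```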

  3. Setting Up A Neural Net. [Diagram: input layer x1, x2; hidden layer h1–h4; output layer y1–y3.]

  4. Setting Up A Neural Net. [Diagram: input layer x1, x2; hidden layer 1 a1–a4; hidden layer 2 h1–h4; output layer y1–y3.]

  5. Fully Connected Network. Each neuron connects to each neuron in the previous layer. [Diagram: the same two-hidden-layer network.]

  6. Fully Connected Network. Define a new block, the "Linear Layer" (OK, technically it's affine): L(n) = Wn + b. You can get the gradient with respect to all the inputs (do on your own; useful trick: you have to be able to do the matrix multiply).
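
As a hedged sketch of what that exercise looks like in NumPy (variable names are mine; the slide leaves the derivation to you), note that the "matrix multiply trick" shows up as the transposes:

```python
import numpy as np

def linear_forward(W, b, n):
    """L(n) = W n + b: map a C-dim input to an F-dim output."""
    return W @ n + b

def linear_backward(W, n, dL_dout):
    """Given dLoss/doutput (F,), return gradients for all inputs."""
    dL_dW = np.outer(dL_dout, n)  # (F x C), same shape as W
    dL_db = dL_dout               # (F,),   same shape as b
    dL_dn = W.T @ dL_dout         # (C,),   gradient passed to the input
    return dL_dW, dL_db, dL_dn
```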

  7. Fully Connected Network. [Diagram: the same network redrawn as blocks: x → L(W1, b1) → f(n) → L(W2, b2) → f(n) → L(W3, b3) → f(n).]

  8. Convolutional Layer. New block: 2D Convolution: C(n) = n * W + b.

  9. Convolution Layer. One output value at location (x, y): b + sum_{i=1}^{Fh} sum_{j=1}^{Fw} sum_{k=1}^{c} F(i, j, k) · I(x + i, y + j, k). [Diagram: an Fh x Fw x c filter sliding over a 32x32x3 image.] Slide credit: Karpathy and Fei-Fei
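
A literal (and deliberately slow) transcription of that triple sum into Python, using zero-based indices in place of the slide's 1-based ones; the shapes follow the slide's 32x32x3 example:

```python
import numpy as np

def conv_at(I, F, b, x, y):
    """One output value: b + sum over filter height, width, and channels."""
    Fh, Fw, c = F.shape
    total = b
    for i in range(Fh):
        for j in range(Fw):
            for k in range(c):
                total += F[i, j, k] * I[x + i, y + j, k]
    return total

I = np.random.rand(32, 32, 3)  # the slide's 32x32x3 input
F = np.random.rand(5, 5, 3)    # one Fh x Fw x c filter
print(conv_at(I, F, b=0.0, x=0, y=0))
```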

  10. Convolutional Neural Network (CNN). [Diagram: x → C(W1, b1) → f(n) → C(W2, b2) → f(n) → C(W3, b3) → f(n).]

  11. Today. [Diagram: an HxWxC image → CNN → 1x1xF output.] Convert an HxW image into an F-dimensional vector:
  • What's the probability this image is a cat? (F=1)
  • Which of 1000 categories is this image? (F=1000)
  • At what GPS coordinate was this image taken? (F=2)
  • Identify the X,Y coordinates of 28 body joints of an image of a human (F=56)

  12. Today's Running Example: Classification. [Diagram: an HxWxC image → CNN → 1x1xF output.] Running example: image classification: P(image is class #1), P(image is class #2), ..., P(image is class #F).

  13. Today's Running Example: Classification. [Diagram: image of a hippo → CNN → outputs 0.5, 0.2, 0.1, 0.2; true label y_i: class #0, "Hippo".] Loss function: the softmax probability of the true class is exp(s_{y_i} - log sum_k exp(s_k)), where s are the network's scores; the loss is its negative log.

  14. Today's Running Example: Classification. [Diagram: image of a baboon → CNN → outputs 0.5, 0.2, 0.1, 0.2; true label y_i: class #3, "Baboon".] Same loss function: exp(s_{y_i} - log sum_k exp(s_k)).
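
In code, the softmax log loss from the last two slides looks like the following sketch (written in the log-sum-exp form, with the usual max-subtraction for numerical stability, which the slides don't show):

```python
import numpy as np

def softmax_log_loss(s, y):
    """Negative log-softmax of the true class y: -(s_y - log sum_k exp(s_k))."""
    s = s - s.max()  # stabilize so exp() can't overflow; doesn't change the result
    return -(s[y] - np.log(np.exp(s).sum()))

scores = np.array([0.5, 0.2, 0.1, 0.2])  # the network outputs on the slides
print(softmax_log_loss(scores, y=0))     # true class #0 ("Hippo"): lower loss
print(softmax_log_loss(scores, y=3))     # true class #3 ("Baboon"): higher loss
```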

  15. Model For Your Head. [Diagram: an HxWxC image → CNN → 1x1xF output.] Provide:
  • Examples of images and desired outputs
  • A sequence of layers producing a 1x1xF output
  • A loss function that measures success
  Then train the network; the network figures out the parameters that make this work.

  16. Layer Collection. You can construct functions out of layers. The only requirement is that the layers "fit" together. Optimization figures out what the parameters of the layers are. Image credit: lego.com

  17. Review – Pooling. Idea: we just want the spatial resolution of the activations/images to be smaller; applied per-channel. Max-pool with a 2x2 filter, stride 2:
  1 1 2 4
  5 6 7 8   →   6 8
  3 2 1 0       3 4
  1 1 3 4
  Slide credit: Karpathy and Fei-Fei

  18. Review – Pooling. Max-pool, 2x2 filter, stride 2 (the same example, stepped through window by window).
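
A minimal NumPy sketch of 2x2, stride-2 max pooling that reproduces the worked example above (single channel here; a real layer applies this to each channel independently):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over size x size windows, moving by stride."""
    H, W = x.shape
    out = np.empty((H // stride, W // stride))
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            out[i // stride, j // stride] = x[i:i+size, j:j+size].max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 4]])
print(max_pool(x))  # [[6. 8.], [3. 4.]]
```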

  19. Other Layers – Fully Connected. 1x1xC → 1x1xF. Map a C-dimensional feature to an F-dimensional feature using a linear transformation: W (an FxC matrix) plus b (an Fx1 vector). How can we write this as a convolution?

  20. Everything's a Convolution. 1x1xC → 1x1xF. Set Fh = 1, Fw = 1: a 1x1 convolution with F filters. The full sum b + sum_{i=1}^{Fh} sum_{j=1}^{Fw} sum_{k=1}^{c} F(i, j, k) · I(x + i, y + j, k) collapses to b + sum_{k=1}^{c} F(k) · I(x, y, k).
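
A small numerical check (my construction, not the slides') that a 1x1 convolution with F filters on a 1x1xC input computes exactly the fully connected layer from slide 19:

```python
import numpy as np

C, F = 8, 4
W = np.random.rand(F, C)  # FC weights (F x C)
b = np.random.rand(F)     # FC bias (F,)
n = np.random.rand(C)     # a 1x1xC feature, flattened to C values

fc_out = W @ n + b        # fully connected layer

# The same weights viewed as F filters, each of shape 1x1xC:
conv_out = np.array([b[f] + (W[f] * n).sum() for f in range(F)])

assert np.allclose(fc_out, conv_out)
```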

  21. Converting to a Vector. HxWxC → 1x1xF. How can we do this?

  22. Converting to a Vector* – Pool. HxWxC → 1x1xF. Average pool with an HxW filter, stride 1:
  1 1 2 4
  5 6 7 8   →   3.1
  3 2 1 0
  1 1 3 4
  *(If F == C)
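
The 3.1 on the slide is just the mean of all 16 entries; with an HxW filter each channel collapses to one number, so an HxWxC volume becomes 1x1xC. In NumPy:

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 4]], dtype=float)

# Average pooling with a filter the size of the whole input (one channel shown).
print(x.mean())  # 3.0625, the ~3.1 on the slide
```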

  23. Converting to a Vector – Convolve. HxWxC → 1x1xF. An HxW convolution with F filters: each filter produces a single value.

  24. Looking At Networks. We'll look at three landmark networks, each trained on a 1000-way classification task (ImageNet):
  • AlexNet (2012)
  • VGG-16 (2014)
  • ResNet (2015)

  25. AlexNet. [Diagram: Input 227x227x3 → Conv1 55x55x96 → Conv2 27x27x256 → Conv3 13x13x384 → Conv4 13x13x384 → Conv5 13x13x256 → FC6 1x1x4096 → FC7 1x1x4096 → Output 1x1x1000.] Each block is an HxWxC volume. You transform one volume into another with convolution.

  26. CNN Terminology. [Same AlexNet diagram.] Each entry in a volume is called an "activation", "neuron", or "feature".

  27. AlexNet. [Same AlexNet diagram.]

  28. AlexNet. [Diagram: Input 227x227x3 → Conv1 → 55x55x96 → ReLU → 55x55x96.] 11x11 filter, stride of 4: (227 - 11)/4 + 1 = 55.
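
The output-size rule used here, as a small helper (a sketch assuming no padding, which matches the 227 → 55 arithmetic; the pool stride of 2 is my assumption, consistent with the 55 → 27 transition on the surrounding slides):

```python
def conv_out_size(n, f, stride, pad=0):
    """Output spatial size of a convolution: (n + 2*pad - f) / stride + 1."""
    return (n + 2 * pad - f) // stride + 1

print(conv_out_size(227, 11, 4))  # 55: Conv1, 11x11 filter, stride 4
print(conv_out_size(55, 3, 2))    # 27: 3x3 max-pool, stride 2 (assumed)
```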

  29. AlexNet. [Same AlexNet diagram.] All layers are followed by ReLU. Red layers are followed by max-pool. Early layers have "normalization".

  30. AlexNet – Details. [Same AlexNet diagram.] Filter sizes per layer: Conv1 C:11 P:3, Conv2 C:5 P:3, Conv3 C:3, Conv4 C:3, Conv5 C:3 P:3. (C: size of conv; P: size of pool.)

  31. AlexNet. [Same AlexNet diagram.] 13x13 input, 1x1 output. How?

  32. AlexNet – How Many Parameters? [Same AlexNet diagram.]

  33. AlexNet – How Many Parameters? [Same AlexNet diagram.] Conv1: 96 11x11 filters on a 3-channel input: 11 x 11 x 3 x 96 + 96 = 34,944.

  34. AlexNet – How Many Parameters? [Same AlexNet diagram.] FC6 (note: max-pooled to 6x6 first): 4096 6x6 filters on a 256-channel input: 6 x 6 x 256 x 4096 + 4096 ≈ 38 million.

  35. AlexNet – How Many Parameters? [Same AlexNet diagram.] FC7: 4096 1x1 filters on a 4096-channel input: 1 x 1 x 4096 x 4096 + 4096 ≈ 17 million.

  36. AlexNet – How Many Parameters? How long would it take you to list the parameters of AlexNet at 4 s/parameter? 1 year? 4 years? 8 years? 16 years?
  • 62.4 million parameters (about 8 years at 4 s each)
  • The vast majority are in the fully connected layers
  • But... the paper notes that removing the convolutions is disastrous for performance.
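
The counting rule behind slides 33–35, wrapped in a helper (a sketch; the FC layers are counted as convolutions, exactly as the slides do):

```python
def conv_params(fh, fw, c_in, c_out):
    """Parameters in a conv layer: (filter weights + 1 bias) per filter."""
    return (fh * fw * c_in + 1) * c_out

print(conv_params(11, 11, 3, 96))    # Conv1:        34,944
print(conv_params(6, 6, 256, 4096))  # FC6 as conv:  37,752,832 (~38M)
print(conv_params(1, 1, 4096, 4096)) # FC7 as conv:  16,781,312 (~17M)
```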

  37. Dataset – ILSVRC
  • ImageNet Large Scale Visual Recognition Challenge
  • 1000 categories
  • 1.4M images

  38. Dataset – ILSVRC. [Figure: sample ILSVRC images.] Figure credit: O. Russakovsky

  39. Visualizing Filters. [Diagram: Input 227x227x3 → Conv1 → 55x55x96; the Conv1 filters.]
  • Q: How many input dimensions? A: 3.
  • What does the input mean? R, G, B, duh.

  40. What's Learned. [Figure: first-layer filters of a network trained to distinguish 1000 categories of objects. Remember, these filters go over color.] Figure credit: Karpathy and Fei-Fei

  41. Visualizing Later Filters. [Diagram: Input 227x227x3 → Conv1 55x55x96 → Conv2 27x27x256; the Conv2 filters.]
  • Q: How many input dimensions? A: 96... hmmm.
  • What does the input mean? Uh, the, uh, previous slide.

  42. Visualizing Later Filters. Understanding the meaning of the later filters from their values is typically impossible: there are too many input dimensions, and it's not even clear what the input means.

  43. Understanding Later Filters. [Same AlexNet diagram, split in two: Conv1–Conv5 is "a CNN that extracts a 13x13x256 output"; FC6–Output is "a 2-hidden-layer neural network".]

  44. Understanding Later Filters. [Same AlexNet diagram, split in two: Input–FC6 is "a CNN that extracts a 1x1x4096 feature"; FC7–Output is "a 1-hidden-layer NN".]

  45. Understanding Later Filters. [Diagram: Input 227x227x3 through Conv5 only: a CNN that extracts a 13x13x256 output.]

  46. Understanding Later Filters. Feed an image in, see what score the filter gives it; a more pleasant version of a real neuroscience procedure. [Diagram: two images each produce a 13x13x256 response.] Which one's bigger? What image makes the output biggest?
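
In a modern framework, this "feed an image in, see what the filter says" probe is a forward hook. A hedged sketch using torchvision's AlexNet variant (not the course's code; torchvision's channel counts differ slightly from the 2012 paper, and the weights argument assumes torchvision ≥ 0.13):

```python
import torch
import torchvision.models as models

net = models.alexnet(weights=None)  # pass pretrained weights to probe a trained net
net.eval()

activations = {}
def save_conv5(module, inputs, output):
    activations["conv5"] = output.detach()

# In torchvision's AlexNet, features[10] is the fifth conv layer.
net.features[10].register_forward_hook(save_conv5)

img = torch.rand(1, 3, 227, 227)  # stand-in for a real image
with torch.no_grad():
    net(img)

print(activations["conv5"].shape)        # torch.Size([1, 256, 13, 13])
print(activations["conv5"][0, 0].max())  # "score" of filter 0 on this image
```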

  47. [Figure: image regions that most strongly activate particular units, with their receptive fields drawn as white boxes.] Figure credit: Girshick et al., CVPR 2014.

  48. What's Up With the White Boxes? [Diagram: a 227x227x3 input and a 13x13x384 activation volume.]
