Convolutional Networks II
Bhiksha Raj

Story so far: Pattern classification tasks, such as "does this picture contain a cat" or "does this recording include HELLO", are best performed by scanning for the target pattern.


  1. Supervising the neocognitron Output class label(s) • Add an extra decision layer after the final C layer – Produces a class-label output • We now have a fully feed-forward MLP with shared parameters – All the S-cells within an S-plane have the same weights • Simple backpropagation can now train the S-cell weights in every plane of every layer – C-cells are not updated

  2. Scanning vs. multiple filters • Note: The original Neocognitron actually uses many identical copies of a neuron in each S and C plane

  3. Supervising the neocognitron Output class label(s) • The Math – Assuming square receptive fields, rather than elliptical ones – Receptive field of S-cells in the lth layer is K_l × K_l – Receptive field of C-cells in the lth layer is L_l × L_l

  4. Supervising the neocognitron Output class label(s) U_{S,l,n}(i,j) = σ( Σ_p Σ_{k=1}^{K_l} Σ_{m=1}^{K_l} w_{S,l,n}(p,k,m) U_{C,l−1,p}(i+k, j+m) ),  U_{C,l,n}(i,j) = max_{k∈(i, i+L_l), m∈(j, j+L_l)} U_{S,l,n}(k,m) • This is, however, identical to "scanning" (convolving) with a single neuron/filter (what LeNet actually did)

  5. Convolutional Neural Networks

  6. The general architecture of a convolutional neural network Output Multi-layer Perceptron • A convolutional neural network comprises "convolutional" and "downsampling" layers – The two may occur in any sequence, but typically they alternate • Followed by an MLP with one or more layers

  8. The general architecture of a convolutional neural network Output Multi-layer Perceptron • Convolutional layers and the MLP are learnable – Their parameters must be learned from training data for the target classification task • Downsampling layers are fixed and generally not learnable

  9. A convolutional layer Maps Previous layer • A convolutional layer comprises a series of "maps" – Corresponding to the "S-planes" in the Neocognitron – Variously called feature maps or activation maps

  10. A convolutional layer Previous layer • Each activation map has two components – A linear map, obtained by convolution over maps in the previous layer • Each linear map has, associated with it, a learnable filter – An activation that operates on the output of the convolution

  11. A convolutional layer Previous layer • All the maps in the previous layer contribute to each convolution

  12. A convolutional layer Previous layer • All the maps in the previous layer contribute to each convolution – Consider the contribution of a single map

  13. What is a convolution Example: 5×5 image with binary pixels [1 1 1 0 0; 0 1 1 1 0; 0 0 1 1 1; 0 0 1 1 0; 0 1 1 0 0], example 3×3 filter [1 0 1; 0 1 0; 1 0 1] with a bias: z(i,j) = Σ_{k=1}^{3} Σ_{m=1}^{3} f(k,m) I(i+k, j+m) + b • Scanning an image with a "filter" – Note: a filter is really just a perceptron, with weights and a bias

  14. What is a convolution Filter: [1 0 1; 0 1 0; 1 0 1], plus a bias Input Map: the 5×5 image • Scanning an image with a "filter" – At each location, the filter and the underlying map values are multiplied componentwise, and the products are added, along with the bias
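The componentwise multiply-and-sum scan described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's code; the function name `scan_filter` is mine, and the image and filter are the binary example from the slides.

```python
import numpy as np

def scan_filter(image, filt, bias=0.0):
    """Slide (correlate) a filter over an image, 'valid' positions only."""
    N, M = image.shape[0], filt.shape[0]
    out = np.zeros((N - M + 1, N - M + 1))
    for i in range(N - M + 1):
        for j in range(N - M + 1):
            # componentwise multiply the filter with the underlying patch,
            # sum the products, and add the bias
            out[i, j] = np.sum(image[i:i+M, j:j+M] * filt) + bias
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
filt = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]], dtype=float)
print(scan_filter(image, filt))  # [[4. 3. 4.] [2. 4. 3.] [2. 3. 4.]]
```

Each output entry is one perceptron evaluation of the filter at one scan position.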

  15.–18. The "Stride" between adjacent scanned locations need not be 1 [animation frames: the 3×3 filter [1 0 1; 0 1 0; 1 0 1] and its bias scan the 5×5 image with a stride of 2, producing the outputs 4, 4, 2, 4] • Scanning an image with a "filter" – The filter may proceed by more than 1 pixel at a time – E.g. with a "stride" of two pixels per shift
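The strided scan above can be sketched by jumping the window by `stride` pixels per shift. A minimal illustration with names of my choosing; the stride-2 result reproduces the four values from the animation.

```python
import numpy as np

def scan_filter_strided(image, filt, stride=1, bias=0.0):
    """Scan a filter over an image, moving `stride` pixels per shift."""
    N, M = image.shape[0], filt.shape[0]
    side = (N - M) // stride + 1
    out = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            r, c = i * stride, j * stride  # top-left corner of the window
            out[i, j] = np.sum(image[r:r+M, c:c+M] * filt) + bias
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
filt = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]], dtype=float)
print(scan_filter_strided(image, filt, stride=2))  # [[4. 4.] [2. 4.]]
```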

  19. Extending to multiple input maps Previous layer • We actually compute any individual convolutional map from all the maps in the previous layer

  20. Extending to multiple input maps Previous layer • We actually compute any individual convolutional map from all the maps in the previous layer • The actual processing is better understood if we modify our visualization of all the maps in a layer from a vertical arrangement to..

  21. Extending to multiple input maps Stacked arrangement of kth layer of maps Filter applied to kth layer of maps (convolutive component plus bias) • ..A stacked arrangement of planes • We can view the joint processing of the various maps as processing the stack using a three-dimensional filter

  22. Extending to multiple input maps bias z(i,j) = Σ_p Σ_{k=1}^{L} Σ_{m=1}^{L} w(p,k,m) Y_p(i+k, j+m) + b • The computation of the convolutive map at any location sums the convolutive outputs at all planes

  23. Extending to multiple input maps bias One map z(i,j) = Σ_p Σ_{k=1}^{L} Σ_{m=1}^{L} w(p,k,m) Y_p(i+k, j+m) + b • The computation of the convolutive map at any location sums the convolutive outputs at all planes

  24. Extending to multiple input maps bias All maps z(i,j) = Σ_p Σ_{k=1}^{L} Σ_{m=1}^{L} w(p,k,m) Y_p(i+k, j+m) + b • The computation of the convolutive map at any location sums the convolutive outputs at all planes

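The sum over all planes can be sketched as applying one three-dimensional filter to the stacked maps, exactly as the equation above describes: multiply the (P, L, L) filter against the (P, L, L) patch of the stack and sum everything, plus a bias. A minimal sketch with hypothetical names.

```python
import numpy as np

def conv_over_maps(maps, filt3d, bias=0.0):
    """maps: (P, N, N) stack of input maps; filt3d: (P, L, L) filter
    spanning all P input planes. Returns one 2-D output map."""
    P, N, _ = maps.shape
    _, L, _ = filt3d.shape
    side = N - L + 1
    out = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            # one multiply-and-sum across ALL planes at this location
            out[i, j] = np.sum(maps[:, i:i+L, j:j+L] * filt3d) + bias
    return out

maps = np.ones((2, 5, 5))      # two 5x5 input maps
filt3d = np.ones((2, 3, 3))    # one 3-D filter spanning both planes
print(conv_over_maps(maps, filt3d))  # every entry is 2*3*3 = 18
```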

  30. The size of the convolution [the 3×3 filter and bias scan the 5×5 input map] • Image size: 5×5 • Filter: 3×3 • Stride: 1 • Output size = ?

  32. The size of the convolution [stride-2 scan; outputs 4, 4, 2, 4] • Image size: 5×5 • Filter: 3×3 • Stride: 2 • Output size = ?

  34. The size of the convolution [N×N image, M×M filter with bias] • Image size: N×N • Filter: M×M • Stride: 1 • Output size = ?

  35. The size of the convolution [N×N image, M×M filter with bias] • Image size: N×N • Filter: M×M • Stride: S • Output size = ?

  36. The size of the convolution • Image size: N×N • Filter: M×M • Stride: S • Output size (each side) = ⌊(N − M)/S⌋ + 1 – Assuming you're not allowed to go beyond the edge of the input

  37. Convolution Size • Simple convolution size pattern: – Image size: N×N – Filter: M×M – Stride: S – Output size (each side) = ⌊(N − M)/S⌋ + 1 • Assuming you're not allowed to go beyond the edge of the input • Results in a reduction of the output size – Even if S = 1 – Not considered acceptable • If there's no active downsampling (through max pooling and/or S > 1), then the output map should ideally be the same size as the input
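The size formula above can be checked numerically. A one-line helper (name is mine) using integer floor division, which matches the ⌊·⌋ in the formula:

```python
def conv_output_side(N, M, S):
    """Output side for an N x N input, M x M filter, stride S, no padding."""
    assert N >= M, "filter must fit inside the image"
    return (N - M) // S + 1

print(conv_output_side(5, 3, 1))  # 3  (the 5x5 / 3x3 / stride-1 example)
print(conv_output_side(5, 3, 2))  # 2  (the stride-2 example)
```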

  38. Solution [the 5×5 image zero-padded to 7×7, scanned by the 3×3 filter with bias] • Zero-pad the input – Pad the input image/map all around • Add P_L columns of zeros on the left and P_R columns of zeros on the right • Add rows of zeros on the top and at the bottom in the same manner – P_L and P_R chosen such that: • P_L = P_R OR |P_L − P_R| = 1 • P_L + P_R = M − 1 – For stride 1, the result of the convolution is the same size as the original image

  39. Solution [the 5×5 image zero-padded to 7×7, scanned by the 3×3 filter with bias] • Zero-pad the input – Pad the input image/map all around – Pad as symmetrically as possible, such that.. – For stride 1, the result of the convolution is the same size as the original image
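The padding rule above (P_L + P_R = M − 1, as symmetric as possible) can be sketched with `np.pad`. The helper name is mine; it applies the same padding split to both axes of a square map.

```python
import numpy as np

def pad_for_same(image, M):
    """Zero-pad so a stride-1 scan with an M x M filter preserves size.
    Splits M-1 zeros per axis as symmetrically as possible:
    P_lo + P_hi = M - 1 and |P_lo - P_hi| <= 1."""
    total = M - 1
    p_lo = total // 2
    p_hi = total - p_lo
    return np.pad(image, ((p_lo, p_hi), (p_lo, p_hi)))  # zeros by default

img = np.ones((5, 5))
padded = pad_for_same(img, 3)
print(padded.shape)                 # (7, 7)
print(padded.shape[0] - 3 + 1)      # 5: stride-1 output side equals input side
```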

  40. Why convolution? • Convolutional neural networks are, in fact, equivalent to scanning with an MLP – Just run the entire MLP on each block separately, and combine results • As opposed to scanning (convolving) the picture with individual neurons/filters – Even computationally, the number of operations in both computations is identical • The neocognitron in fact views it equivalently to a scan • So why convolutions?

  41. Cost of Correlation • Correlation: y(i,j) = Σ_m Σ_n x(i+m, j+n) w(m,n) • Cost of scanning an N×N image with an M×M filter: O(N²M²) – M² multiplications at each of N² positions • Not counting boundary effects – Expensive, for large filters

  42. Correlation in Transform Domain • Correlation using DFTs: y = IDFT2( DFT2(x) ∘ conj(DFT2(w)) ) • Cost of doing this using the Fast Fourier Transform to compute the DFTs: O(N² log N) – Significant saving for large filters – Or if there are many filters
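The transform-domain identity above can be verified directly in NumPy: zero-pad the filter to image size, multiply the image's DFT by the conjugate of the filter's DFT, and invert. This gives a circular correlation; its first (N−M+1)² entries match the direct "valid" scan. A sketch for verification, not an optimized implementation.

```python
import numpy as np

def correlate_fft(image, filt):
    """Circular cross-correlation via 2-D FFTs: IDFT2(DFT2(x) * conj(DFT2(w)))."""
    F_img = np.fft.fft2(image)
    F_filt = np.fft.fft2(filt, s=image.shape)  # zero-pad filter to image size
    return np.real(np.fft.ifft2(F_img * np.conj(F_filt)))

def correlate_direct(image, filt):
    """Direct O(N^2 M^2) scan, 'valid' positions only, for comparison."""
    N, M = image.shape[0], filt.shape[0]
    out = np.zeros((N - M + 1, N - M + 1))
    for i in range(N - M + 1):
        for j in range(N - M + 1):
            out[i, j] = np.sum(image[i:i+M, j:j+M] * filt)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
valid = correlate_fft(img, w)[:6, :6]  # 8 - 3 + 1 = 6
print(np.allclose(valid, correlate_direct(img, w)))  # True
```

The FFT route costs O(N² log N) per map instead of O(N²M²), which is where the saving for large filters comes from.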

  43. A convolutional layer Previous layer • The convolution operation results in a convolution map • An activation is finally applied to every entry in the map

  44. The other component: Downsampling/Pooling Output Multi-layer Perceptron • Convolution (and activation) layers are followed intermittently by "downsampling" (or "pooling") layers – Often, they alternate with convolution, though this is not necessary

  45.–50. Recall: Max pooling [animation frames: a max window scans the input map, e.g. max(6, 3, 1, 4) = 6] • Max pooling selects the largest from a pool of elements • Pooling is performed by "scanning" the input

  51.–55. "Strides" [animation frames] • The "max" operations may "stride" by more than one pixel

  56. Max Pooling Single depth slice: [1 1 2 4; 5 6 7 8; 3 2 1 0; 1 2 3 4] → max pool with 2×2 filters and stride 2 → [6 8; 3 4] • An N×N picture compressed by a P×P max-pooling filter with stride D results in an output map of side ⌊(N − P)/D⌋ + 1

  57. Alternative to Max pooling: Mean Pooling Single depth slice: [1 1 2 4; 5 6 7 8; 3 2 1 0; 1 2 3 4] → mean pool with 2×2 filters and stride 2 → [3.25 5.25; 2 2] • An N×N picture compressed by a P×P pooling filter with stride D results in an output map of side ⌊(N − P)/D⌋ + 1
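Both pooling variants above can be sketched with one strided-window loop; only the reduction (max vs. mean) differs. The function name is mine; the demo reproduces the 4×4 depth slice from the slides.

```python
import numpy as np

def pool2d(x, P, D, mode="max"):
    """Pool P x P blocks of a 2-D map with stride D; mode is 'max' or 'mean'."""
    N = x.shape[0]
    side = (N - P) // D + 1
    out = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            block = x[i*D:i*D+P, j*D:j*D+P]
            out[i, j] = block.max() if mode == "max" else block.mean()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
print(pool2d(x, 2, 2, "max"))   # [[6. 8.] [3. 4.]]
print(pool2d(x, 2, 2, "mean"))  # [[3.25 5.25] [2.   2.  ]]
```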

  58. Other options [a small network is applied to each 2×2 block of the single depth slice [1 1 2 4; 5 6 7 8; 3 2 1 0; 1 2 3 4], striding by 2] • The pooling may even be a learned filter • The same network is applied on each block • (Again, a shared-parameter network)

  59. Other options [a small network is applied to each 2×2 block, striding by 2] Network in network • The pooling may even be a learned filter • The same network is applied on each block • (Again, a shared-parameter network)

  60. Setting everything together • Typical image classification task

  61. Convolutional Neural Networks • Input: 1 or 3 images – Black and white or color – Will assume color to be generic

  62. Convolutional Neural Networks • Input: 3 pictures (the R, G, B planes)

  64. Preprocessing • Typically works with square images – Filters are also typically square • Large images are a problem – Too much detail – Will need big networks • Typically scaled to small sizes, e.g. 32×32 or 128×128

  65. Convolutional Neural Networks I×I image • Input: 3 pictures

  66. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 I×I image • Input is convolved with a set of K₁ filters – Typically K₁ is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Filters are typically 5×5, 3×3, or even 1×1

  67. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 Small enough to capture fine features (particularly important for scaled-down images) I×I image • Input is convolved with a set of K₁ filters – Typically K₁ is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Filters are typically 5×5, 3×3, or even 1×1

  68. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 Small enough to capture fine features (particularly important for scaled-down images) What on earth is this? I×I image • Input is convolved with a set of K₁ filters – Typically K₁ is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Filters are typically 5×5, 3×3, or even 1×1

  69. The 1×1 filter • A 1×1 filter is simply a perceptron that operates over the depth of the map, but has no spatial extent – Takes one pixel from each of the maps (at a given location) as input
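The per-pixel perceptron described above can be sketched as a weighted sum down the depth axis of the map stack. The function name is mine; one scalar weight per input plane, one bias, no spatial extent.

```python
import numpy as np

def conv1x1(maps, weights, bias=0.0):
    """A 1x1 filter: a perceptron across depth at each pixel.
    maps: (P, N, N) stack of input maps; weights: (P,)."""
    # weighted sum over the plane axis, same spatial size as the input
    return np.tensordot(weights, maps, axes=([0], [0])) + bias

maps = np.stack([np.ones((4, 4)), 2 * np.ones((4, 4))])  # two 4x4 planes
w = np.array([0.5, 0.25])
out = conv1x1(maps, w)
print(out[0, 0])  # 0.5*1 + 0.25*2 = 1.0
```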

  70. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 I×I image • Input is convolved with a set of K₁ filters – Typically K₁ is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Better notation: filters are typically 5×5(×3), 3×3(×3), or even 1×1(×3)

  71. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 Parameters to choose: K₁, L and S 1. Number of filters K₁ 2. Size of filters L×L×3 + bias 3. Stride of convolution S Total number of parameters: K₁(3L² + 1) I×I image • Input is convolved with a set of K₁ filters – Typically K₁ is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Better notation: filters are typically 5×5(×3), 3×3(×3), or even 1×1(×3) – Typical stride: 1 or 2

  72. Convolutional Neural Networks K₁ total filters Filter size: L×L×3 I×I image • The input may be zero-padded according to the size of the chosen filters

  73. Convolutional Neural Networks K₁ filters of size L×L×3 The layer includes a convolution operation followed by an activation (typically a ReLU): z_n^(1)(i,j) = Σ_{c∈{R,G,B}} Σ_{k=1}^{L} Σ_{m=1}^{L} w_n^(1)(c,k,m) I_c(i+k, j+m) + b_n^(1),  Y_n^(1)(i,j) = f( z_n^(1)(i,j) ) I×I image • First convolutional layer: several convolutional filters – Filters are "3-D" (the third dimension is color) – Convolution is typically followed by a ReLU activation • Each filter creates a single 2-D output map
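The first-layer computation, a 3-D filter over the color planes followed by a ReLU, can be sketched directly from the equations. A naive-loop illustration under my own naming, not an efficient implementation.

```python
import numpy as np

def conv_layer(image_rgb, filters, biases):
    """First convolutional layer, stride 1, no padding.
    image_rgb: (3, I, I); filters: (K1, 3, L, L); biases: (K1,).
    Returns K1 activation maps after a ReLU."""
    K1, _, L, _ = filters.shape
    I = image_rgb.shape[1]
    side = I - L + 1
    Y = np.zeros((K1, side, side))
    for n in range(K1):
        for i in range(side):
            for j in range(side):
                # 3-D multiply-and-sum over color planes and spatial extent
                z = np.sum(image_rgb[:, i:i+L, j:j+L] * filters[n]) + biases[n]
                Y[n, i, j] = max(z, 0.0)  # ReLU activation
    return Y

img = np.ones((3, 5, 5))          # a 3-plane "color" input
filters = np.ones((2, 3, 3, 3))   # K1 = 2 filters of size 3x3x3
Y = conv_layer(img, filters, np.zeros(2))
print(Y.shape)  # (2, 3, 3): each filter creates one 2-D output map
```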

  74. Learnable parameters in the first convolutional layer • The first convolutional layer comprises K₁ filters, each of size L×L×3 – Spatial span: L×L – Depth: 3 (3 colors) • This represents a total of K₁(3L² + 1) parameters – "+1" because each filter also has a bias • All of these parameters must be learned

  75. Convolutional Neural Networks The layer pools P×P blocks of Y into a single value, with a stride D between adjacent blocks; each I×I map is reduced to (I/D)×(I/D): U_n^(1)(i,j) = max_{k∈((i−1)D+1, iD), m∈((j−1)D+1, jD)} Y_n^(1)(k,m) • First downsampling layer: from each P×P block of each map, pool down to a single value – For max pooling, during training keep track of which position had the highest value

  76. Convolutional Neural Networks U_n^(1)(i,j) = max_{k∈((i−1)D+1, iD), m∈((j−1)D+1, jD)} Y_n^(1)(k,m) Parameters to choose: size of pooling block P, pooling stride D Choices: max pooling or mean pooling? Or learned pooling? • First downsampling layer: from each P×P block of each map, pool down to a single value – For max pooling, during training keep track of which position had the highest value
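The note about remembering which position won the max can be sketched as a pooling pass that also records the argmax coordinates; during backpropagation the gradient is routed only to those positions. Names are mine; the demo uses the 4×4 depth slice from the earlier pooling slides.

```python
import numpy as np

def maxpool_with_indices(x, P, D):
    """Max-pool P x P blocks with stride D, remembering the winning
    positions (needed to route gradients during backprop)."""
    N = x.shape[0]
    side = (N - P) // D + 1
    out = np.zeros((side, side))
    idx = np.zeros((side, side, 2), dtype=int)
    for i in range(side):
        for j in range(side):
            block = x[i*D:i*D+P, j*D:j*D+P]
            k, m = np.unravel_index(np.argmax(block), block.shape)
            out[i, j] = block[k, m]
            idx[i, j] = (i*D + k, j*D + m)  # position in the original map
    return out, idx

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
out, idx = maxpool_with_indices(x, 2, 2)
print(out)        # [[6. 8.] [3. 4.]]
print(idx[0, 0])  # [1 1]: the 6 came from row 1, column 1
```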
