 
Convolutional Neural Networks
Paweł Liskowski
Institute of Computing Science, Poznań University of Technology
19 November 2018

Neural networks for computer vision
Regular neural nets don’t scale well to full images:
- an image of size 32 × 32 (3 color channels) → 3072 weights,
- an image of size 512 × 512 (3 color channels) → 786432 weights,
- ... per single neuron!
A fully connected layer with 256 units on the larger image contains roughly 200 million parameters!
- the network becomes hard to train,
- overfitting becomes a problem.
How to deal with this problem?

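The counts above follow from simple arithmetic, sketched below. The helper name `fc_weights_per_neuron` is made up for illustration, and 3 color channels are assumed because that is what makes 32 × 32 come out at 3072.

```python
def fc_weights_per_neuron(width, height, channels=3):
    """Weights a single fully connected neuron needs to see the whole image."""
    return width * height * channels


small = fc_weights_per_neuron(32, 32)     # 32 * 32 * 3
large = fc_weights_per_neuron(512, 512)   # 512 * 512 * 3
layer = large * 256                       # 256 such neurons, biases ignored

print(small, large, layer)
```

The last number is about 201 million, which matches the "roughly 200 million" figure for a 256-unit layer.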
Convolutional Neural Nets (CNNs)
Overview:
- the core idea is similar to ordinary NNs,
- a CNN consists of neurons that have learnable weights,
- each neuron performs a dot product followed by a non-linearity.
So what changes? The architecture is adapted to leverage the nature of images:
- neurons are organized in 3D volumes ...
- ... and connected to a small region in the previous layer,
- parameters are shared across hidden units,
- activations are pooled to reduce dimensionality.
Let’s review these ideas in detail.

Local connectivity
Idea: connect each neuron to only a local region of the input volume.
- the receptive field defines the spatial extent of this connectivity,
- ... but each neuron is always connected to all channels!
The connections are local in space, but always full along the depth of the volume.
Question: What problems does local connectivity solve?

Parameter sharing
Observation: if a feature is useful to compute at some position $(x_1, y_1)$, then it should also be useful to compute at position $(x_2, y_2)$.
- Each member of the kernel is used at every position of the input.
- Rather than learning a separate set of parameters for every location, we learn only one set.
- Neurons in each depth slice are constrained to use the same weights.
Effect:
- does not affect the computational cost of forward propagation,
- further reduces the storage requirements.

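To get a feel for the storage saving, here is a rough count for one depth slice. The sizes (a 32 × 32 × 3 input, 5 × 5 receptive fields, stride 1, no padding) are assumptions chosen for illustration, and biases are ignored.

```python
def conv_output_positions(width, height, field):
    """Number of positions a field x field window can take (stride 1, no padding)."""
    return (width - field + 1) * (height - field + 1)


WIDTH, HEIGHT, DEPTH = 32, 32, 3   # assumed input volume
FIELD = 5                          # assumed receptive field size

weights_per_neuron = FIELD * FIELD * DEPTH                # local in space, full along depth
positions = conv_output_positions(WIDTH, HEIGHT, FIELD)   # neurons in one depth slice

unshared = positions * weights_per_neuron   # every neuron owns its own weights
shared = weights_per_neuron                 # one set reused at every position

print(positions, unshared, shared)
```

Under these assumptions the slice stores 75 weights instead of 58800, while the forward pass still performs one dot product per neuron either way.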
Convolution Operation
The size of the output volume is controlled by the depth, stride and padding:
- depth corresponds to the number of filters we use,
- stride defines how we slide the filter,
- zero-padding adds zeros around the border.
What if there are more input channels?

Convolution Operation
1D example: input size $W = 5$, receptive field $F = 3$, padding $P = 1$, stride $S = 1$.
General formula for the output volume size:
$$\frac{W + 2P - F}{S} + 1$$

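The formula can be checked against a direct 1D implementation. This is a minimal sketch: the signal and kernel values are made up, the function names are hypothetical, and rejecting settings where the filter does not tile the padded input evenly is a design choice of the sketch rather than something the slide prescribes.

```python
def conv_output_size(w, f, p, s):
    """Output size (W + 2P - F) / S + 1; raises if the filter does not fit evenly."""
    span = w + 2 * p - f
    if span < 0 or span % s != 0:
        raise ValueError("filter does not fit the padded input with this stride")
    return span // s + 1


def conv1d(signal, kernel, padding=0, stride=1):
    """Slide `kernel` over the zero-padded `signal` (no kernel flip, as in CNN layers)."""
    padded = [0] * padding + list(signal) + [0] * padding
    f = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, padded[start:start + f]))
        for start in range(0, len(padded) - f + 1, stride)
    ]


signal = [1, 2, -1, 1, -3]   # made-up input of size W = 5
kernel = [1, 0, -1]          # made-up filter of size F = 3

out = conv1d(signal, kernel, padding=1, stride=1)
assert len(out) == conv_output_size(w=5, f=3, p=1, s=1)
print(out)
```

With $P = 1$ and $S = 1$ the output keeps the input size of 5; changing the stride to 2 gives $(5 + 2 - 3)/2 + 1 = 3$ outputs.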
Convolutional Layer
Reminder: all neurons in a single depth slice share weights.
- The forward pass can be computed as a convolution of the neuron’s weights with the input volume.
- It is common to refer to the sets of weights as a filter (or a kernel).
Example:
- input: image of size 32 × 32 × 3,
- 6 filters of size 5 × 5 × 3 → output volume of size 28 × 28 × 6.
Control questions: How many neurons are in the first conv layer? What about the number of weights?

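One way to work through the control questions is to count directly. The sketch below assumes stride 1 and no padding (which is what turns 32 into 28) and, as a further assumption not stated on the slide, one bias per filter.

```python
in_w, in_h, in_d = 32, 32, 3     # input volume from the example
num_filters, field = 6, 5        # 6 filters of size 5 x 5 x 3

out_w = in_w - field + 1         # stride 1, no padding
out_h = in_h - field + 1

neurons = out_w * out_h * num_filters          # one neuron per output activation
weights = field * field * in_d * num_filters   # shared weights, one set per filter
biases = num_filters                           # assumption: one bias per filter

print(neurons, weights, weights + biases)
```

So the layer has 4704 neurons but only 450 distinct weights (456 parameters if the assumed biases are counted), which is parameter sharing at work.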
Convolutional layer - summary
Accepts a volume of size $W_1 \times H_1 \times D_1$.
There are four hyperparameters:
- number of filters $K$,
- their spatial extent $F$,
- the stride $S$,
- the amount of zero-padding $P$.
Produces a volume of size $W_2 \times H_2 \times D_2$, where
$$W_2 = \frac{W_1 + 2P - F}{S} + 1, \qquad H_2 = \frac{H_1 + 2P - F}{S} + 1, \qquad D_2 = K.$$
There are $F \cdot F \cdot D_1$ weights per filter.

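The summary can be turned into a naive forward pass. This is only a sketch on nested Python lists, not an efficient or canonical implementation: the function name, the H × W × D list layout, the per-filter bias, and the toy input are all assumptions made for illustration. It also shows what happens with more input channels: each filter spans the full depth $D_1$.

```python
def conv_layer_forward(volume, filters, biases, stride=1, padding=0):
    """Naive conv layer forward pass.

    volume:  H1 x W1 x D1 nested lists
    filters: K filters, each F x F x D1 nested lists
    biases:  K numbers, one per filter (an assumption of this sketch)
    returns: H2 x W2 x K nested lists
    """
    h1, w1, d1 = len(volume), len(volume[0]), len(volume[0][0])
    f = len(filters[0])
    # Floor division: this sketch assumes the filter tiles the padded input evenly.
    h2 = (h1 + 2 * padding - f) // stride + 1
    w2 = (w1 + 2 * padding - f) // stride + 1

    def at(row, col, ch):
        # Read the input with implicit zero-padding around the border.
        row, col = row - padding, col - padding
        if 0 <= row < h1 and 0 <= col < w1:
            return volume[row][col][ch]
        return 0.0

    output = [[[0.0] * len(filters) for _ in range(w2)] for _ in range(h2)]
    for k, (filt, bias) in enumerate(zip(filters, biases)):
        for i in range(h2):
            for j in range(w2):
                total = bias
                for di in range(f):
                    for dj in range(f):
                        for ch in range(d1):  # always full along the depth
                            x = at(i * stride + di, j * stride + dj, ch)
                            total += filt[di][dj][ch] * x
                output[i][j][k] = total
    return output


# Made-up 4 x 4 x 2 input of ones and two 3 x 3 x 2 filters.
volume = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
sum_filter = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
zero_filter = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]

out = conv_layer_forward(volume, [sum_filter, zero_filter], biases=[0.0, 0.5])
print(len(out), len(out[0]), len(out[0][0]))  # H2, W2, D2
```

Here $W_1 = H_1 = 4$, $F = 3$, $P = 0$, $S = 1$ and $K = 2$, so the output is 2 × 2 × 2, and each filter holds $3 \cdot 3 \cdot 2 = 18$ weights.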
Pooling
A pooling layer progressively reduces the spatial size of the representation, which:
- reduces the number of parameters,
- makes the model computationally cheaper,
- controls overfitting.

Pooling
Most common approach: 2 × 2 filters with a stride of 2
- downsamples every depth slice of the input by 2 along both width and height,
- discards 75% of the activations,
- pooling sizes with F > 3 are considered too destructive,
- other possibilities: average, L2, pyramid, stochastic, ...
Pooling is no longer as popular as it used to be:
- to reduce the size of the representation one may use a larger stride instead,
- discarding pooling is crucial for generative models.

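A minimal sketch of 2 × 2 pooling with stride 2 on a single depth slice follows. It assumes max pooling (the slide does not name the operation, but lists average and others as alternatives), even input dimensions, and made-up activation values.

```python
def max_pool_2x2(depth_slice):
    """Max pooling with a 2 x 2 window and stride 2 on one depth slice (even sizes assumed)."""
    rows, cols = len(depth_slice), len(depth_slice[0])
    return [
        [
            max(
                depth_slice[r][c], depth_slice[r][c + 1],
                depth_slice[r + 1][c], depth_slice[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Made-up 4 x 4 activations.
activations = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool_2x2(activations)
kept = len(pooled) * len(pooled[0])
total = len(activations) * len(activations[0])
discarded_fraction = 1 - kept / total

print(pooled, discarded_fraction)
```

Each 2 × 2 window keeps one value out of four, which is where the 75% figure comes from. The layer has no learnable weights of its own.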
ConvNet architecture
Common pattern:
INPUT → [[CONV → ReLU] × N → POOL?] × M → [FC → ReLU] × K → FC
Rule of thumb: prefer a stack of small filters to one large receptive field.
[Figure: example ConvNet with an input layer (L0, image), convolution layer L1, pooling layer L2, convolution layer L3, pooling layer L4, and fully-connected layers L5 and L6]

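The rule of thumb can be made concrete with a small count. The sketch below assumes stride-1 convolutions, the same hypothetical channel count `C` going in and out of every layer, and ignores biases. It compares three stacked 3 × 3 layers with a single 7 × 7 layer.

```python
def stacked_receptive_field(num_layers, field):
    """Receptive field of `num_layers` stacked stride-1 conv layers with field x field filters."""
    return 1 + num_layers * (field - 1)


def conv_weights(field, channels_in, channels_out):
    """Weights of one conv layer, biases ignored."""
    return field * field * channels_in * channels_out


C = 64  # hypothetical channel count, kept constant through the stack

stack_field = stacked_receptive_field(num_layers=3, field=3)  # matches one 7 x 7 filter
stack_weights = 3 * conv_weights(3, C, C)                     # 27 * C * C
single_weights = conv_weights(7, C, C)                        # 49 * C * C

print(stack_field, stack_weights, single_weights)
```

Under these assumptions the stack sees the same 7 × 7 region with 27·C² weights instead of 49·C², and it applies a non-linearity after each of its three layers rather than once.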