SLIDE 1

Convolutional Neural Networks

Paweł Liskowski

Institute of Computing Science, Poznań University of Technology

19 November 2018

1 / 13

SLIDE 2

Neural networks for computer vision

Regular Neural Nets don’t scale well to full images:

  • an image of size 32 × 32 → 3072 weights,
  • an image of size 512 × 512 → 786432 weights . . . per single neuron!

A fully connected layer with 256 units then contains roughly 200 million parameters! The network becomes hard to train and prone to overfitting. How to deal with this problem?
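The parameter counts above are simple arithmetic; a quick Python sanity check (the helper name is ours, the numbers are the slide's):

```python
# Fully-connected parameter counts for full-resolution RGB images.

def fc_weights_per_neuron(width, height, channels=3):
    """Weights for one fully-connected neuron that sees the whole image."""
    return width * height * channels

small = fc_weights_per_neuron(32, 32)    # 3072 weights per neuron
large = fc_weights_per_neuron(512, 512)  # 786432 weights per neuron

# A fully-connected layer of 256 units on the 512x512 image:
layer_params = large * 256               # ~201 million weights
```

This is why the slide calls roughly 200 million parameters for a single 256-unit layer: 786432 × 256 = 201326592.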

2 / 13

SLIDE 3

Convolutional Neural Nets (CNNs)

Overview: the core idea is similar to ordinary NNs:

  • the network consists of neurons that have learnable weights,
  • each neuron performs a dot product followed by a non-linearity.

So what changes? The architecture is adapted to leverage the nature of images:

  • neurons are organized in 3D volumes . . .
  • . . . and connected to a small region of the previous layer,
  • parameters are shared across hidden units,
  • activations are pooled to reduce dimensionality.

Let’s review these ideas in detail.

3 / 13

SLIDE 4

Local connectivity

Idea: connect each neuron to only a local region of the input volume.

  • the receptive field defines the spatial extent of this connectivity,
  • . . . but each neuron is always connected to all channels!

The connections are local in space, but always full along the depth of the volume. Question: What problems does local connectivity solve?

4 / 13

SLIDE 5

Parameter sharing

Observation: if a feature is useful to compute at some position (x1, y1), then it should also be useful to compute at position (x2, y2). Each member of the kernel is used at every position of the input. Rather than learning a separate set of parameters for every location, we learn only one set: neurons in each depth slice are constrained to use the same weights.

Effect:

  • does not affect the computational cost of forward prop,
  • further reduces the storage requirements.
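Sharing can be seen in a minimal 1-D sketch (the input and kernel values below are made up for illustration): one kernel of 3 weights is slid over every position, so the layer stores 3 parameters regardless of input length.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 1.0, -0.5])   # the single shared kernel

# The same 3 weights are reused at every valid position (stride 1, no padding):
out = np.array([np.dot(w, x[i:i + 3]) for i in range(len(x) - 2)])
```

Learning a separate weight set per position would need 3 × 3 = 9 parameters here; sharing keeps it at 3.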

5 / 13

SLIDE 6

Convolution Operation

The size of the output volume is controlled by the depth, stride and padding:

  • depth corresponds to the number of filters we use,
  • stride defines how we slide the filter,
  • zero-padding adds zeros around the border.

What if there are more input channels?

6 / 13

SLIDE 7

Convolution Operation

1D example: input size W = 5, receptive field F = 3, padding P = 1, stride S = 1. General formula for the output volume size: (W + 2P − F)/S + 1
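The formula translates directly into a small helper (the function name is ours; the divisibility check just flags hyperparameters that don't tile the input evenly):

```python
def conv_output_size(W, F, P, S):
    """(W + 2P - F)/S + 1: spatial size of the output volume."""
    assert (W + 2 * P - F) % S == 0, "hyperparameters must tile the input evenly"
    return (W + 2 * P - F) // S + 1

# The slide's 1-D example: W=5, F=3, P=1, S=1
size = conv_output_size(5, 3, 1, 1)  # 5: this padding preserves the input size
```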

7 / 13

SLIDE 8

Convolutional Layer

Reminder: all neurons in a single depth slice share weights, so the forward pass can be computed as a convolution of the neurons’ weights with the input volume. It is common to refer to such a set of weights as a filter (or a kernel). Example: input image of size 32 × 32 × 3, 6 filters of size 5 × 5 × 3 → output volume of size 28 × 28 × 6. Control questions: How many neurons are in the first conv layer? What about the number of weights?
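One way to work through the control questions, assuming stride 1, no padding, and one bias per filter (the bias is our assumption; the slide does not mention it):

```python
# Slide's example: 32x32x3 input, six 5x5x3 filters, stride 1, no padding.
out_w = (32 - 5) // 1 + 1          # 28: spatial size of the output
neurons = out_w * out_w * 6        # 28 * 28 * 6 = 4704 neurons in the layer
weights = (5 * 5 * 3 + 1) * 6      # 456 parameters (+1 per filter for the bias)
```

Compare with a fully-connected layer of the same width, which would need 3072 weights for every one of those 4704 neurons.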

8 / 13

SLIDE 9

Convolutional layer - summary

Accepts a volume of size W1 × H1 × D1. There are four hyperparameters:

  • number of filters K,
  • their spatial extent F,
  • the stride S,
  • the amount of zero-padding P.

Produces a volume of size W2 × H2 × D2, where:

W2 = (W1 + 2P − F)/S + 1
H2 = (H1 + 2P − F)/S + 1
D2 = K

There are F · F · D1 weights per filter.
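The three summary formulas can be bundled into one shape function (a sketch; the name and argument order are ours):

```python
def conv_layer_shape(W1, H1, D1, K, F, S, P):
    """Output volume (W2, H2, D2) from the summary formulas above."""
    W2 = (W1 + 2 * P - F) // S + 1
    H2 = (H1 + 2 * P - F) // S + 1
    return (W2, H2, K)

# The earlier example layer: 32x32x3 input, K=6 filters of extent F=5
shape = conv_layer_shape(32, 32, 3, K=6, F=5, S=1, P=0)  # (28, 28, 6)
```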

9 / 13

SLIDE 10

Pooling

Pooling layer progressively reduces the spatial size of the representation:

  • reduces the number of parameters,
  • makes the model computationally cheaper,
  • helps control overfitting.

10 / 13

SLIDE 11

Pooling

Most common approach: 2 × 2 filters with a stride of 2:

  • downsamples every depth slice by a factor of 2 along both width and height,
  • discards 75% of the activations.

Pooling sizes with F > 3 are considered too destructive.

  • Other possibilities: average, L2, pyramid, stochastic, . . .

Pooling is no longer as popular as it used to be:

  • to reduce the size of the representation one may instead use a larger stride,
  • discarding pooling is crucial for generative models.
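The 2 × 2, stride-2 case can be checked with a short NumPy sketch (the 4 × 4 input is made up for illustration); each output keeps only the largest of a 2 × 2 block, i.e. 3 of every 4 activations are discarded:

```python
import numpy as np

x = np.array([[ 1.,  2.,  5.,  6.],
              [ 3.,  4.,  7.,  8.],
              [ 9., 10., 13., 14.],
              [11., 12., 15., 16.]])

# Split one depth slice into non-overlapping 2x2 blocks, take each block's max:
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # shape (2, 2)
```

The 4 × 4 slice becomes 2 × 2: halved along both width and height, as the slide states.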

11 / 13

SLIDE 12

ConvNet architecture

Common pattern: INPUT → [[CONV → ReLU] × N → POOL?] × M → [FC → ReLU] × K → FC Rule of thumb: prefer a stack of small filters to one large receptive field.

[Figure: example ConvNet architecture: input layer L0 (image) → convolution layer L1 → pooling layer L2 → convolution layer L3 → pooling layer L4 → fully-connected layers L5 and L6]
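The common pattern can be traced as a shape walk-through. Below is a sketch with N = 2, M = 2 and illustrative filter choices of ours (3 × 3 convolutions with padding 1, which preserve spatial size, and 2 × 2 pooling); ReLU does not change shapes, so it is omitted:

```python
shape = (32, 32, 3)                  # INPUT

def conv(s, k, f=3, p=1):            # CONV 3x3, pad 1, stride 1: size preserved
    w, h, _ = s
    return (w + 2 * p - f + 1, h + 2 * p - f + 1, k)

def pool(s):                         # POOL 2x2, stride 2: halves width & height
    return (s[0] // 2, s[1] // 2, s[2])

for _ in range(2):                   # [[CONV -> ReLU] x 2 -> POOL] x 2
    shape = pool(conv(conv(shape, 32), 32))
# shape is now (8, 8, 32); flatten it and feed the FC -> ReLU -> FC tail
```

Stacking two 3 × 3 convolutions per stage also illustrates the rule of thumb: two small filters cover a 5 × 5 receptive field with fewer parameters than one 5 × 5 filter.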

12 / 13