CS480/680 Machine Learning Lecture 20: Convolutional Neural Network
SLIDE 1

CS480/680 Machine Learning Lecture 20: Convolutional Neural Network

Zahra Sheikhbahaee March 29, 2020

University of Waterloo CS480/680 Winter 2020 Zahra Sheikhbahaee 1

SLIDE 2

Outline

  • Convolution
  • Zero Padding
  • Stride
  • Weight Sharing
  • Pooling
  • Convolutional neural net architectures: LeNet-5, AlexNet, ResNet, Inception

SLIDE 3

Computer Vision Tasks Using Convolutional Networks

Neural Style Transfer

Figure: (B) The Shipwreck of the Minotaur by J.M.W. Turner, 1805. (C) The Starry Night by Vincent van Gogh, 1889. (D) Der Schrei by Edvard Munch, 1893.

Object Detection

Figure: Faster R-CNN model

Semantic Segmentation

Figure: FCN

SLIDE 4

Convolutional Neural Networks

◮ A convolutional neural network (CNN) is designed to automatically and adaptively learn spatial hierarchies of features through the backpropagation algorithm.
◮ A deficiency of fully connected architectures is that the topology of the input is entirely ignored.
◮ Convolutional neural networks combine three mechanisms:

  • local receptive fields
  • shared weights
  • spatial or temporal subsampling

◮ A CNN is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers.

SLIDE 5

Definition of Convolution

Mathematical definition

g(x, y) = (K ∗ I)(x, y) = Σ_{i=−a}^{a} Σ_{j=−b}^{b} K(i, j) I(x − i, y − j)

Figure: Edge detection with horizontal and vertical filters

  • The rhs image is convolved with a 3 × 3 Sobel filter, which puts slightly more weight on the central pixels. The coefficient matrices for the Sobel filter are

    Gx = ⎡ 1  0  −1 ⎤
         ⎢ 2  0  −2 ⎥
         ⎣ 1  0  −1 ⎦

    and Gy = Gxᵀ.

  • An edge is where the pixel intensity changes in a noticeable way. A good way to express changes is by using derivatives. The Sobel operator approximates the gradient magnitude of an image by G = √(Gx² + Gy²).
SLIDE 6

Detecting Vertical Edges

nin: number of input features
nout: number of output features
k: convolutional kernel size

SLIDE 7

Some Other Kernel Examples

SLIDE 8

Gabor Filters

◮ Gabor filters: common feature maps inspired by the human visual system, used for texture analysis.
◮ A Gabor filter can be viewed as a sinusoidal plane wave of a particular frequency and orientation, modulated by a Gaussian envelope.
◮ Weights: grey → zero, white → positive, black → negative
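A minimal sketch of sampling such a kernel, using only the standard library; the function name, parameters, and defaults are illustrative assumptions, not a reference implementation:

```python
import math

def gabor_kernel(size, theta, lam, sigma, psi=0.0, gamma=1.0):
    """Sample a Gabor filter: a sinusoid at angle theta with wavelength
    lam, modulated by a Gaussian envelope of width sigma."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

k = gabor_kernel(size=7, theta=0.0, lam=4.0, sigma=2.0)
print(k[3][3])  # centre weight: envelope and carrier both peak at 1.0
```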

SLIDE 9

Padding Schemes

Definition of Padding

The number of zeros concatenated at the beginning and at the end of an axis (p). Why is padding important?

  • The shrinking output
  • Throwing away information at the corners of the image

To keep the output the same size as the input, p = (k − 1)/2 padding is needed for the input image (for odd kernel size k).
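The "same" padding rule above is one line of code (a sketch; the helper name is ours):

```python
def same_padding(k):
    """Zero padding that preserves the spatial size for an odd kernel
    size k and unit stride: p = (k - 1) / 2."""
    assert k % 2 == 1, "symmetric same-padding needs an odd kernel size"
    return (k - 1) // 2

print([same_padding(k) for k in (1, 3, 5, 7)])  # [0, 1, 2, 3]
```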

SLIDE 10

Strided Convolution

Definition of stride

The distance between two consecutive positioning of the kernel along axes (s).

(d) 2 × 2 strides (e) unit strides

Padding with p zeros changes the effective input size from nin to nin + 2p. The size of the output is then nout = ⌊(nin − k + 2p)/s⌋ + 1.
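The output-size formula can be sketched and checked against the architectures discussed later in the lecture:

```python
def conv_output_size(n_in, k, p=0, s=1):
    """Spatial output size of a convolution: floor((n_in - k + 2p) / s) + 1."""
    return (n_in - k + 2 * p) // s + 1

# 32x32 input, 5x5 kernel, no padding, unit stride -> 28 (LeNet-5 C1)
print(conv_output_size(32, 5))               # 28
# 227x227 input, 11x11 kernel, stride 4 -> 55 (AlexNet conv1)
print(conv_output_size(227, 11, p=0, s=4))   # 55
# "same" padding: 28x28 input, k=3, p=1 -> 28
print(conv_output_size(28, 3, p=1))          # 28
```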

SLIDE 11

Weight Sharing

– In CNNs, each filter is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map.
– This provides the basis for the invariance of the network outputs to translations and distortions of the input images.
– Weight sharing helps reduce over-fitting because of the reduced number of trainable parameters.
– Modeling local correlations is easy with CNNs through the weight-sharing scheme.
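The parameter saving from weight sharing is easy to quantify (a sketch; the layer sizes are taken from the LeNet-5 example later in the lecture, with biases counted):

```python
def fc_params(n_in, n_out):
    """Fully connected layer: every input connects to every output, plus biases."""
    return n_in * n_out + n_out

def conv_params(k, c_in, c_out):
    """Conv layer: one shared k x k x c_in kernel (plus bias) per output
    channel, independent of the spatial size of the input."""
    return (k * k * c_in + 1) * c_out

# Mapping a 28x28x1 image to 28x28x6 feature maps:
print(fc_params(28 * 28, 28 * 28 * 6))  # 3692640 parameters without sharing
print(conv_params(5, 1, 6))             # 156 parameters with sharing
```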

SLIDE 12

Pooling Layers

Figure: Architecture of CNN: FC is a fully connected layer, ReLU denotes a Rectified Linear Unit, and ci is the number of input channels

  • A convolution layer computes feature response maps that involve multiple channels within some localized spatial region.
  • A pooling layer is restricted to act within just one channel at a time, condensing the activation values in each spatially local region of the currently considered channel.
  • Pooling operations play a role in producing downstream representations that are more robust to the effects of variations in the data while still preserving important motifs.
  • A pooling layer does not have any parameters.
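A single-channel pooling layer can be sketched in a few lines; `pool2d` and the toy input are illustrative:

```python
def pool2d(x, size=2, stride=2, op=max):
    """2D pooling over one channel given as a list of rows. `op` condenses
    each size x size window: max for max pooling, or a mean function for
    average pooling. Note there are no trainable parameters."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [x[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(op(window))
        out.append(row)
    return out

mean = lambda w: sum(w) / len(w)
x = [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 1, 5, 6],
     [2, 2, 7, 8]]
print(pool2d(x, op=max))   # [[4, 2], [2, 8]]
print(pool2d(x, op=mean))  # [[2.5, 1.0], [1.25, 6.5]]
```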

SLIDE 13

Type of Pooling Layers

◮ Max Pooling ◮ Average Pooling

SLIDE 14

Drawback of Max and Average Pooling

◮ Max pooling drawback ◮ Average pooling drawback

SLIDE 15

Convolutional neural net architecture (LeNet-5)

Figure: LeCun et al. 1998

◮ The input is a 32 × 32 grayscale image which passes through the first convolutional layer.
◮ Layer C1 has 6 feature maps (filters) with a 5 × 5 filter size, a stride of one, and no padding. The image dimensions change from 32 × 32 × 1 to 28 × 28 × 6. The layer has 156 trainable parameters.
◮ Layer S2 is a subsampling layer with 6 feature maps. It has a filter size of 2 × 2 and a stride of s = 2. The resulting image dimensions are reduced to 14 × 14 × 6. Layer S2 has only 12 trainable parameters.
◮ Layer C3 is a convolutional layer with 16 feature maps. The filter size is 5 × 5 with a stride of 1, and it has 1516 trainable parameters.
◮ Layer S4 is an average pooling layer with filter size 2 × 2 and a stride of 2. This layer has 16 feature maps with 32 parameters and its output is reduced to 5 × 5 × 16.
◮ The fifth layer C5 is a fully connected convolutional layer with 120 feature maps, each of size 1 × 1. Each of the 120 units in C5 is connected to all the 5 × 5 × 16 nodes in the S4 layer.
◮ The sixth layer is a fully connected layer F6 with 84 units, followed by a layer of Euclidean radial basis function (RBF) units instead of a softmax.
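As a sanity check, the quoted LeNet-5 parameter counts can be reproduced with a little arithmetic (a sketch, assuming one bias per filter and using LeCun's sparse C3 connection table, in which 6 maps see 3 of the S2 maps, 9 maps see 4, and 1 map sees all 6):

```python
c1 = (5 * 5 * 1 + 1) * 6   # 6 filters of 5x5 on 1 input channel, plus biases
s2 = (1 + 1) * 6           # one trainable coefficient + one bias per map
# C3: sparse connections to the 6 S2 maps rather than full connectivity.
c3 = 6 * (5 * 5 * 3 + 1) + 9 * (5 * 5 * 4 + 1) + 1 * (5 * 5 * 6 + 1)
s4 = (1 + 1) * 16          # coefficient + bias for each of the 16 maps
print(c1, s2, c3, s4)      # 156 12 1516 32
```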

SLIDE 16

Convolutional neural net architecture (AlexNet)

Figure: Krizhevsky et al. 2012

◮ It contains 5 convolutional layers and 3 fully connected layers.
◮ The first convolutional layer filters the 227 × 227 × 3 input image with 96 kernels of size 11 × 11 × 3 with a stride of s = 4 pixels.
◮ The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48.

  • Uses ReLU instead of tanh to add non-linearity; it accelerates training by a factor of 6 at the same accuracy.
  • Uses dropout to deal with overfitting.
  • Overlapping pooling to reduce the size of the network.
  • Uses multiple GPUs to train 62.3 million parameters.
  • Employs Local Response Normalization.
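The conv1 geometry above follows directly from the output-size formula (a sketch, assuming one bias per kernel):

```python
def conv_out(n, k, p, s):
    """Output side length of a convolution: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * p) // s + 1

# conv1: 227 x 227 x 3 input, 96 kernels of size 11 x 11 x 3, stride 4
side = conv_out(227, 11, 0, 4)
params = (11 * 11 * 3 + 1) * 96  # weights plus one bias per kernel
print(side, params)  # 55 34944 -> a 55 x 55 x 96 output volume
```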

SLIDE 17

Convolutional neural net architecture (AlexNet)

◮ The parallelization scheme employed in AlexNet puts half of the kernels on each GPU.
◮ The GPUs communicate only in certain layers.
◮ Top 48 kernels on GPU 1: color-agnostic.
◮ Bottom 48 kernels on GPU 2: color-specific.
◮ This scheme reduces the top-1 and top-5 error rates by 1.7% and 1.2%, respectively.

SLIDE 18

Deep Residual Networks

◮ Training deep neural networks with gradient-based optimizers and learning methods can cause vanishing and exploding gradients during backpropagation.
◮ The degradation problem: as the network depth increases, accuracy gets saturated and then degrades rapidly. Adding more layers to a suitably deep model leads to higher training error.
◮ A residual network is a solution to the above-mentioned problems. These networks are easier to optimize, and can gain accuracy from considerably increased depth.
◮ The shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers.

xl and xl+1: input and output of the l-th unit
F: a residual function; h(xl) = xl: an identity mapping; f: ReLU

yl = h(xl) + F(xl, Wl)
xl+1 = f(yl)

If f is also an identity mapping, xl+1 = yl, so we have

xl+1 = xl + F(xl, Wl)

and, unrolled from unit l to unit L,

xL = xl + Σ_{i=l}^{L−1} F(xi, Wi)

Backpropagation then gives

∂ε/∂xl = (∂ε/∂xL)(∂xL/∂xl) = (∂ε/∂xL) (1 + ∂/∂xl Σ_{i=l}^{L−1} F(xi, Wi))
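The identity-shortcut recurrence can be sketched with toy scalar residual functions (hypothetical, for illustration only):

```python
def residual_stack(x, residual_fns):
    """Apply x_{l+1} = x_l + F(x_l) for each residual function in turn."""
    for F in residual_fns:
        x = x + F(x)  # the shortcut adds F's output to the identity path
    return x

# With every residual zero, the stack is an exact identity map, so extra
# residual layers need not degrade what the network can represent.
zero = lambda x: 0.0
print(residual_stack(3.0, [zero] * 50))  # 3.0

# x_L = x_l + the sum of the residuals: four blocks each contributing 1
one = lambda x: 1.0
print(residual_stack(3.0, [one] * 4))    # 7.0
```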

SLIDE 19

Convolutional neural net architecture (ResNet)

Figure: He et al. 2015

◮ 152-layer model for the ImageNet competition with 3.57% top-5 error (better than human performance).
◮ Every residual block has two 3 × 3 convolutional layers.
◮ Periodically, the number of filters is doubled and the feature map is downsampled spatially using a stride of 2, which halves each spatial dimension.
◮ There is an additional conv layer at the beginning.
◮ No fully connected (FC) layers at the end, just a global average pooling layer (only an FC-1000 layer to output the classes).

SLIDE 20

Inception Module (GoogLeNet)

◮ GoogLeNet uses 12× fewer parameters than the winning architecture of Krizhevsky et al. 2012. The codename Inception is inspired by the famous "we need to go deeper" internet meme.
◮ Naively stacking large convolution operations is computationally expensive. In the naive inception module, with 3 different filter sizes (1 × 1, 3 × 3, 5 × 5), the module performs convolutions on an input as well as max pooling.
◮ In the naive structure, even a modest number of 5 × 5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters.

(a) Inception module, naive version (b) Inception module with dimension reductions

SLIDE 21

Inception Module (GoogLeNet)

◮ The solution: apply dimension reductions and projections wherever the computational requirements would otherwise increase too much.
◮ 1 × 1 convolutions are used to compute reductions before the expensive 3 × 3 and 5 × 5 convolutions.
◮ The 1 × 1 convolution is used with a ReLU in order to introduce more non-linearity and increase the representational power of the network.

(c) Inception module, naive version (d) Inception module with dimension reductions
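The saving from a 1 × 1 bottleneck is easy to quantify with multiply-accumulate counts (a sketch; the map size and channel counts below are illustrative assumptions, not figures from the slides):

```python
def conv_cost(h, w, k, c_in, c_out):
    """Multiply-accumulates for a conv layer on an h x w feature map:
    h * w * k * k * c_in * c_out."""
    return h * w * k * k * c_in * c_out

# Hypothetical sizes: a 28x28 map with 192 input channels, 32 output 5x5 filters.
naive = conv_cost(28, 28, 5, 192, 32)
# Bottleneck: a 1x1 conv down to 16 channels, then the 5x5 conv.
reduced = conv_cost(28, 28, 1, 192, 16) + conv_cost(28, 28, 5, 16, 32)
print(naive, reduced)  # 120422400 12443648 -- roughly 10x cheaper
```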

SLIDE 22

Inception Architecture

  • The network is 22 layers deep
  • A 1×1 convolution with 128 filters for dimension reduction and rectified linear activation.
  • A fully connected layer with 1024 units and rectified linear activation.
  • A dropout layer with 70% ratio of dropped outputs.
  • A linear layer with softmax loss as the classifier (predicting the same 1000 classes as the main classifier, but removed at inference time).
  • By adding auxiliary classifiers connected to these intermediate layers, we would expect to encourage discrimination in the lower stages of the classifier, increase the gradient signal that gets propagated back, and provide additional regularization.