SLIDE 1

Advanced Machine Learning Convolutional Neural Networks

Amit Sethi Electrical Engineering, IIT Bombay

SLIDE 2

Learning outcomes for the lecture

  • List benefits of convolution
  • Identify input types suited for convolution
  • List benefits of pooling
  • Identify input types not suited for convolution
  • Write backprop through conv and pool
SLIDE 3

Convolutional layers

[Diagram: inputs x1–x5; a local hidden unit h111 = g(.) sees only a patch of the input; outputs y1 and y2 are computed by f(.)]

Idea: (1) Features are local (2) Their presence/absence is ergodic

Concept by Yann LeCun

Indices cannot be permuted

SLIDE 4

Convolutional layers

[Diagram: as before, with a second local hidden unit h112 applying the same weights at a shifted position]

Idea: (1) Features are local (2) Their presence/absence is stationary

Concept by Yann LeCun

Indices cannot be permuted

SLIDE 5

Convolutional layers

[Diagram: as before, with local hidden units h111, h112, h113 sliding across the input]

Idea: (1) Features are local, (2) Their presence/absence is stationary, (3) GPU implementation enables inexpensive super-computing

LeNet, AlexNet

Indices cannot be permuted

SLIDE 6

Receptive fields of neurons

Source: http://psych.hanover.edu/Krantz/receptive/

  • Levine and Shefner (1991) define a receptive field as an "area in which stimulation leads to response of a particular sensory neuron" (p. 671).

SLIDE 7

The concept of the best stimulus

  • Depending on excitatory and inhibitory connections, there is an optimal stimulus that falls only in the excitatory region
  • On-center retinal ganglion cell example shown here

Source: http://psych.hanover.edu/Krantz/receptive/

SLIDE 8

On-center vs. off-center

Source: https://en.wikipedia.org/wiki/Receptive_field

SLIDE 9

Bar detection example

Source: http://psych.hanover.edu/Krantz/receptive/

SLIDE 10

Gabor filters model simple cell in visual cortex

Source: https://en.wikipedia.org/wiki/Gabor_filter

SLIDE 11

Modeling oriented edges using Gabor

Source: https://en.wikipedia.org/wiki/Gabor_filter

SLIDE 12

Feature maps using Gabor filters

Source: https://en.wikipedia.org/wiki/Gabor_filter

SLIDE 13

Haar filters

Source: http://www.cosy.sbg.ac.at/~hegenbart/

SLIDE 14

More feature maps

Source: http://www.cosy.sbg.ac.at/~hegenbart/

SLIDE 15

Convolution

  • Classical definitions:

$(g * h)(u) = \int_{-\infty}^{\infty} g(u - \nu)\, h(\nu)\, d\nu$

$(g * h)(n) = \sum_{m=-\infty}^{\infty} g(n - m)\, h(m)$

  • Or, one can take the cross-correlation between $g(n)$ and $h(-n)$
  • In 2-D, it would be

$(g \star h)(m, n) = \sum_{a=-\infty}^{\infty} \sum_{b=-\infty}^{\infty} g(a, b)\, h(m + a, n + b)$
  • Fast implementations are available for parallel processing units (PUs), e.g., GPUs
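
A minimal NumPy sketch (not from the slides) of the discrete 2-D definition above; the helper name conv2d_valid and the box-filter example are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(g, h):
    """Discrete 2-D convolution over the 'valid' region.
    Flipping h makes this a true convolution; dropping the flip gives the
    cross-correlation that CNN libraries actually compute."""
    H, W = g.shape
    kH, kW = h.shape
    h_flipped = h[::-1, ::-1]
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(g[i:i + kH, j:j + kW] * h_flipped)
    return out

image = np.random.rand(32, 32)
kernel = np.ones((5, 5)) / 25.0           # simple box (averaging) filter
print(conv2d_valid(image, kernel).shape)  # (28, 28)
```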
SLIDE 16

Convolution animation

Source: http://bmia.bmt.tue.nl/education/courses/fev/course/notebooks/triangleblockconvolution.gif

SLIDE 17

Convolution in 2-D (sharpening filter)

Source: https://upload.wikimedia.org/wikipedia/commons/4/4f/3D_Convolution_Animation.gif

SLIDE 18

Let the network learn conv kernels

SLIDE 19

Number of weights with and without conv.

  • Assume that we want to extract 25 features per pixel
  • Fully connected layer:
    – Input: 32x32x3
    – Hidden: 28x28x25
    – Weights: 32x32x3 x 28x28x25 = 60,211,200
  • With convolutions (weight sharing):
    – Input: 32x32x3
    – Hidden: 28x28x25
    – Weights: 5x5x3 x 25 = 1,875
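
The arithmetic above can be checked directly; a small Python snippet (biases omitted, as on the slide):

```python
# Fully connected: every 32x32x3 input connects to every 28x28x25 hidden unit
fc_weights = (32 * 32 * 3) * (28 * 28 * 25)
# Convolutional: one shared 5x5x3 kernel for each of the 25 feature maps
conv_weights = (5 * 5 * 3) * 25

print(fc_weights)                  # 60211200
print(conv_weights)                # 1875
print(fc_weights // conv_weights)  # 32112 -- roughly 32,000x fewer weights
```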

SLIDE 20

How will backpropagation work?

  • Backpropagation will treat each input patch (not the whole image) as a sample!
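
A minimal sketch of what "each patch is a sample" means for the kernel gradient, assuming a single-channel valid cross-correlation (the form CNN libraries compute); conv_kernel_grad is a hypothetical helper, not from the lecture:

```python
import numpy as np

def conv_kernel_grad(x, delta, kH, kW):
    """Gradient of the loss w.r.t. a shared kernel.
    x: input image; delta: upstream gradient on the output feature map.
    Every input patch contributes one term, as if it were a separate sample."""
    dW = np.zeros((kH, kW))
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            dW += delta[i, j] * x[i:i + kH, j:j + kW]  # one patch, one contribution
    return dW

x = np.random.rand(6, 6)
delta = np.random.rand(4, 4)                   # upstream grad of a 3x3 valid conv
print(conv_kernel_grad(x, delta, 3, 3).shape)  # (3, 3)
```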

SLIDE 21

Feature maps

  • Convolutional layer:
    – Input → a (set of) layer(s)
      • Convolutional filter(s)
      • Bias(es)
      • Nonlinear squashing
    – Output → another (set of) layer(s), a.k.a. feature maps
      • A map of where each feature was detected
      • A shift in the input => a shift in the feature map
  • Is it important to know where exactly the feature was detected?
  • Notion of invariances: translation, scaling, rotation, contrast
SLIDE 22

Pooling is subsampling

Source: "Gradient-based learning applied to document recognition" by Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, in Proc. IEEE, Nov. 1998.

SLIDE 23

Types of pooling

  • Two types of popular pooling methods:
    – Average
    – Max

  • How do these differ?
  • How do gradient computations differ?
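
As a hedged illustration of the gradient question, here is a single-window sketch: max pooling routes the upstream gradient only to the arg-max location, while average pooling spreads it uniformly. pool2x2 is an illustrative name, not a library call:

```python
import numpy as np

def pool2x2(a, mode="max"):
    """Forward value and backward mask for one 2x2 pooling window."""
    if mode == "max":
        out = a.max()
        grad_mask = (a == out).astype(float)        # gradient flows to the max (ties share it here)
    else:
        out = a.mean()
        grad_mask = np.full(a.shape, 1.0 / a.size)  # gradient spread evenly
    return out, grad_mask

window = np.array([[1.0, 3.0], [2.0, 0.5]])
print(pool2x2(window, "max"))  # (3.0, mask selecting the 3.0)
print(pool2x2(window, "avg"))  # (1.625, uniform 0.25 mask)
```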
SLIDE 24

A bi-pyramid approach: Map size decreases, but number of maps increases

Why?

Source: "Gradient-based learning applied to document recognition" by Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, in Proc. IEEE, Nov. 1998.

SLIDE 25

Fully connected layers

  • Multi-layer non-linear decision making

Source: "Gradient-based learning applied to document recognition" by Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, in Proc. IEEE, Nov. 1998.

SLIDE 26

Visualizing weights, conv layer 1

Source: http://cs231n.github.io/understanding-cnn/

SLIDE 27

Visualizing feature map, conv layer 1

Source: http://cs231n.github.io/understanding-cnn/

SLIDE 28

Visualizing weights, conv layer 2

Source: http://cs231n.github.io/understanding-cnn/

SLIDE 29

Visualizing feature map, conv layer 2

Source: http://cs231n.github.io/understanding-cnn/

SLIDE 30

CNN for speech processing

Source: "Convolutional neural networks for speech recognition" by Ossama Abdel-Hamid et al., in IEEE/ACM Trans. ASLP, Oct, 2014

SLIDE 31

CNN for DNA-protein binding

Source: "Convolutional neural network architectures for predicting DNA–protein binding” by Haoyang Zeng et al., Bioinformatics 2016, 32 (12)

SLIDE 32

Convolution and pooling revisited

[Diagram: Input Image * kernel → Feature Map → ReLU → Feature Map → Max → pooled map; overall pipeline: Convolutional Layer → Pooling Layer → FC Layer → Class Probability]

Inputs can be padded so that the output size matches the input size
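
The diagrammed pipeline could be written, for instance, in PyTorch roughly as follows; the layer sizes are illustrative assumptions, not the slide's exact network:

```python
import torch
import torch.nn as nn

# Conv -> ReLU -> MaxPool -> FC -> class probabilities, as in the pipeline above
model = nn.Sequential(
    nn.Conv2d(3, 25, kernel_size=5, padding=2),  # padding keeps the 32x32 size
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(25 * 16 * 16, 10),
    nn.Softmax(dim=1),                           # class probabilities
)

x = torch.randn(1, 3, 32, 32)  # one RGB image
print(model(x).shape)          # torch.Size([1, 10])
```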

SLIDE 33

Variations of the convolutional filter achieve various purposes

  • N-D convolutions generalize over 2-D
  • Stride variation leads to pooling
  • Atrous (dilated) convolutions cover more area with fewer parameters
  • Transposed convolutions increase the feature map size
  • Layer-wise (depthwise) convolutions reduce parameters
  • 1x1 convolutions reduce the number of feature maps
  • Separable convolutions reduce parameters
  • Network-in-network learns a nonlinear convolution

Each of these variations is sketched in code below.
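
A PyTorch sketch of how each variation might be configured; the channel counts are arbitrary assumptions:

```python
import torch.nn as nn

standard   = nn.Conv2d(64, 128, kernel_size=3, padding=1)
strided    = nn.Conv2d(64, 128, kernel_size=3, stride=2)           # acts like pooling
dilated    = nn.Conv2d(64, 128, kernel_size=3, dilation=2)         # atrous: wider field, same weights
transposed = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)  # upsamples the feature map
depthwise  = nn.Conv2d(64, 64, kernel_size=3, groups=64)           # filters each map separately
pointwise  = nn.Conv2d(64, 32, kernel_size=1)                      # 1x1: reduces/mixes feature maps
```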
SLIDE 34

Convolutions in 3-D


SLIDE 35

Convolutions with stride > 1


SLIDE 36

Atrous (dilated) convolutions can increase the receptive field without increasing the number of weights

[Diagram: image pixels overlaid with a 5x5 kernel, a 3x3 kernel, and a 5x5 dilated kernel that has only 3x3 trainable weights]
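
As a worked check of the figure: the effective size of a dilated kernel follows the standard identity $k_{\text{eff}} = k + (k - 1)(d - 1)$; with $k = 3$ and dilation $d = 2$, $k_{\text{eff}} = 3 + 2 \cdot 1 = 5$, so the 3x3 set of trainable weights covers a 5x5 area, as the diagram shows.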

SLIDE 37

Transposed (de-)convolution increases feature map size


SLIDE 38

MobileNet filters each feature map separately

β€œMobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” by Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto Hartwig Adam, 2017

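
A rough parameter-count comparison of this depthwise-separable scheme against a standard convolution; the channel counts are illustrative assumptions:

```python
# MobileNet-style factorization: depthwise 3x3 per input map, then 1x1 pointwise to mix maps
c_in, c_out, k = 64, 128, 3
standard  = k * k * c_in * c_out        # 73728 weights
depthwise = k * k * c_in                # 576: one 3x3 filter per input map
pointwise = c_in * c_out                # 8192: 1x1 filters to combine the maps
print(standard, depthwise + pointwise)  # 73728 vs 8768, roughly 8.4x fewer
```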

SLIDE 39

Using 1x1 convolutions is equivalent to having a fully connected layer

  • This way, a fully convolutional network can be constructed from a regular CNN such as VGG11
  • The number of 1x1 filters is equal to the number of fully connected nodes
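
A minimal sketch of this equivalence in PyTorch, assuming a hypothetical 512-input, 10-output fully connected layer; copying its weights into a 1x1 convolution yields the same outputs:

```python
import torch
import torch.nn as nn

fc   = nn.Linear(512, 10)
conv = nn.Conv2d(512, 10, kernel_size=1)               # one 1x1 filter per FC node
conv.weight.data = fc.weight.data.view(10, 512, 1, 1)  # reuse the FC weights
conv.bias.data   = fc.bias.data

x = torch.randn(1, 512)
y_fc   = fc(x)
y_conv = conv(x.view(1, 512, 1, 1)).view(1, 10)
print(torch.allclose(y_fc, y_conv))  # True
```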

SLIDE 40

1x1 convolutions can also be used to change the number of feature maps

[Diagram: feature maps * 1x1 filters = a reduced set of feature maps, followed by ReLU]

SLIDE 41

Inception uses convolution filters of multiple sizes

Image source: https://ai.googleblog.com/2016/08/improving-inception-and-image.html

SLIDE 42

Separable convolutions

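As a small illustration (not from the slides): a separable k x k kernel is the outer product of two 1-D kernels, so it costs 2k weights instead of k^2 (6 vs. 9 for k = 3). The classic Sobel filter is one such kernel:

```python
import numpy as np

col = np.array([1.0, 2.0, 1.0])   # vertical smoothing filter
row = np.array([1.0, 0.0, -1.0])  # horizontal derivative filter
sobel = np.outer(col, row)        # full 3x3 Sobel kernel from two 1-D passes
print(sobel)
```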

SLIDE 43

Network in network

Source: β€œNetwork in Network” by Min Lin, Qiang Chen, Shuicheng Yan, https://arxiv.org/pdf/1312.4400v3.pdf

  • Instead of a linear filter with a nonlinear squashing function, N-i-N uses an MLP in a convolutional (sliding) fashion

SLIDE 44

Variations of pooling are also available, e.g. stochastic pooling

  • Average pooling (subsampling): $s_j = \frac{1}{|R_j|} \sum_{i \in R_j} a_i$
  • Max pooling: $s_j = \max_{i \in R_j} a_i$
  • Stochastic pooling:
    – Define probabilities from the (non-negative) activations: $p_i = a_i / \sum_{k \in R_j} a_k$
    – Select an activation from the multinomial distribution: $s_j = a_l$, with $l \sim \text{Multinomial}(p_1, \dots, p_{|R_j|})$
    – Backpropagation works just like max pooling
      • Keep track of the location $l$ that was chosen (sampled)
    – During testing, take a probability-weighted average of activations: $s_j = \sum_{i \in R_j} p_i a_i$
Source: β€œStochastic Pooling for Regularization of Deep Convolutional Neural Networks”, by Zeiler and Fergus, in ICLR 2013.

SLIDE 45

Example of stochastic pooling

Source: β€œStochastic Pooling for Regularization of Deep Convolutional Neural Networks”, by Zeiler and Fergus, in ICLR 2013.

SLIDE 46

A standard architecture

  • On a large image with global average pooling

[Diagram: GAP layer]