

SLIDE 1

Convolutional Neural Networks for Image Classification

Marcello Pelillo
University of Venice, Italy
Image and Video Understanding, a.y. 2018/19

SLIDE 2

The Age of “Deep Learning”

SLIDE 3

The Deep Learning “Philosophy”

  • Learn a feature hierarchy all the way from pixels to classifier
  • Each layer extracts features from the output of the previous layer
  • Train all layers jointly
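A minimal PyTorch sketch of this idea (the layer sizes are illustrative, not taken from the slides): each layer consumes the previous layer's output, and a single backward pass trains all layers jointly.

```python
import torch.nn as nn

# Pixels -> features -> classifier, trained jointly: each layer feeds
# the next, and backpropagation updates all layers at once.
# Layer sizes below are illustrative only.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classifier (assumes 32x32 input)
)
```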
SLIDE 4

Performance Improves with More Data

SLIDE 5

Old Idea… Why Now?

  • 1. We have more data: from Lena to ImageNet.
  • 2. We have more computing power; GPUs are really good at this.
  • 3. Last but not least, we have new ideas.
SLIDE 6

Image Classification

Predict a single label (or a distribution over labels, to indicate confidence) for a given image. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3. The 3 represents the three color channels: Red, Green, Blue.

From: A. Karpathy
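A quick NumPy illustration of this representation (the pixel values here are random; the point is the shape and type):

```python
import numpy as np

# A color image: a 3-dimensional array of integers in [0, 255].
# NumPy convention puts height first: (height, width, 3 RGB channels).
image = np.random.randint(0, 256, size=(248, 400, 3), dtype=np.uint8)
print(image.shape, image.dtype)   # (248, 400, 3) uint8
```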

SLIDE 7

Challenges

From: A. Karpathy

SLIDE 8

The Data-Driven Approach

An example training set for four visual categories. In practice we may have thousands of categories and hundreds of thousands of images for each category.

From: A. Karpathy

SLIDE 9

Inspiration from Biology

SLIDE 10

The Visual System as a Hierarchy of Feature Detectors

SLIDE 11

Convolution

SLIDE 12

Convolution
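A minimal NumPy sketch of 2D convolution, written with explicit loops for clarity (library routines such as scipy.ndimage.convolve do the same thing faster). The usage example applies the 3x3 mean filter of the next slide:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution of a grayscale image with a kernel."""
    kh, kw = kernel.shape
    k = np.flipud(np.fliplr(kernel))          # convolution flips the kernel
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output pixel is the weighted sum of a local neighborhood.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

# A 3x3 mean filter: each output pixel is the average of its neighborhood.
image = np.random.rand(8, 8)
mean_kernel = np.ones((3, 3)) / 9.0
smoothed = convolve2d(image, mean_kernel)     # shape (6, 6)
```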

SLIDE 13

Mean Filters

SLIDE 14

Gaussian Filters

SLIDE 15

Gaussian Filters
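A sketch of the standard Gaussian kernel construction (the sizes and sigmas are just examples). Sigma controls the smoothing scale, which is the point of the "Kernel Width Affects Scale" slide below:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel; sigma sets the smoothing scale."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return kernel / kernel.sum()              # normalize to sum to 1

# Larger sigma -> wider effective kernel -> stronger smoothing (coarser scale).
small_scale = gaussian_kernel(5, sigma=1.0)
large_scale = gaussian_kernel(5, sigma=2.0)
```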

SLIDE 16

The Effect of Gaussian Filters

SLIDE 17

The Effect of Gaussian Filters

SLIDE 18

Kernel Width Affects Scale

SLIDE 19

Edge Detection

SLIDE 20

Edge Detection

SLIDE 21

Using Convolution for Edge Detection
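A short sketch of convolution-based edge detection using the standard Sobel kernels (the image here is random placeholder data; gradient magnitude is large where intensity changes sharply):

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels approximate horizontal and vertical intensity derivatives.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

image = np.random.rand(8, 8)
gx = convolve(image, sobel_x)     # horizontal derivative
gy = convolve(image, sobel_y)     # vertical derivative
edges = np.hypot(gx, gy)          # gradient magnitude: large at edges
```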

SLIDE 22

A Variety of Image Filters

Laplacian of Gaussian (LoG) (Marr 1982)

SLIDE 23

A Variety of Image Filters

Gabor filters (directional) (Daugman 1985)

SLIDE 24

A Variety of Image Filters

From: M. Sebag

SLIDE 25

Traditional vs Deep Learning Approach

From: M. Sebag

SLIDE 26

Convolutional Neural Networks (CNNs)

(LeCun 1998) (Krizhevsky et al. 2012)

SLIDE 27

Fully- vs Locally-Connected Networks

Fully-connected: 400,000 hidden units = 16 billion parameters.
Locally-connected: 400,000 hidden units with 10 x 10 receptive fields = 40 million parameters.
Local connections capture local dependencies.

From: M. A. Ranzato
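The slide's arithmetic, worked out under the assumption of a 200x200 input image (40,000 pixels), which is what makes both totals come out as stated:

```python
# Assumed input: a 200x200 image = 40,000 pixels (not stated on the
# slide, but consistent with its totals).
pixels = 200 * 200
hidden = 400_000

fully_connected = hidden * pixels        # 16,000,000,000 -> 16 billion
locally_connected = hidden * (10 * 10)   # 40,000,000     -> 40 million
```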

SLIDE 28

Weight Sharing

We can dramatically reduce the number of parameters by making one reasonable assumption: that if one feature is useful to compute at some spatial position (x1,y1), then it should also be useful to compute at a different position (x2,y2).
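With weight sharing on top of local connectivity, the same filter is reused at every spatial position, so the parameter count drops to the size of the filter itself. A small PyTorch check (single input and output channel, 10x10 field, as in the running example):

```python
import torch.nn as nn

# One shared 10x10 filter applied at every position: the parameter
# count no longer depends on the image size at all.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=10)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)   # 101: a 10x10 kernel plus one bias
```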

SLIDE 29

SLIDE 30

Using Several Trainable Filters

Normally, several filters are packed together and learned automatically during training.

SLIDE 31

Pooling

Max pooling simplifies the network architecture by downsampling the feature maps produced by the filtering operations.
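A short PyTorch illustration (the 96 x 55 x 55 feature-map size is borrowed from AlexNet's first layer; any size works):

```python
import torch
import torch.nn as nn

# 2x2 max pooling keeps the strongest response in each 2x2 block,
# roughly halving the spatial resolution of the feature map.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
feature_map = torch.randn(1, 96, 55, 55)      # batch, channels, H, W
pooled = pool(feature_map)
print(pooled.shape)                           # torch.Size([1, 96, 27, 27])
```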

SLIDE 32

Combining Feature Extraction and Classification

SLIDE 33

AlexNet (2012)

  • 8 layers total
  • Trained on the ImageNet dataset (1000 categories, 1.2M training images, 150k test images)

SLIDE 34

AlexNet Architecture

  • 1st layer: 96 kernels (11 x 11 x 3)
  • Normalized, pooled
  • 2nd layer: 256 kernels (5 x 5 x 48)
  • Normalized, pooled
  • 3rd layer: 384 kernels (3 x 3 x 256)
  • 4th layer: 384 kernels (3 x 3 x 192)
  • 5th layer: 256 kernels (3 x 3 x 192)
  • Followed by 2 fully connected layers, 4096 neurons each
  • Followed by a 1000-way SoftMax layer

650,000 neurons; 60 million parameters.
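For reference, recent torchvision ships an AlexNet implementation; it is a single-stream variant without local response normalization, so it differs in detail from the original two-GPU model, but its parameter count lines up with the slide:

```python
import torchvision.models as models

# torchvision's AlexNet variant (single stream, no LRN).
net = models.alexnet(weights=None)
n_params = sum(p.numel() for p in net.parameters())
print(f"{n_params / 1e6:.0f}M parameters")   # ~61M, close to the slide's 60 million
```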

SLIDE 35

Training on Multiple GPUs

SLIDE 36

Output Layer: Softmax
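The softmax function turns the network's raw class scores into a probability distribution over labels; a minimal NumPy version:

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into a probability distribution."""
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))             # ~[0.66, 0.24, 0.10], sums to 1
```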

SLIDE 37

Rectified Linear Units (ReLUs)

Problem: the sigmoid activation takes values in (0,1). As the gradient is propagated back to the initial layers, it tends to become 0 (the vanishing gradient problem). In practice, this slows down the training of the network's initial layers.
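A quick way to see the contrast: the sigmoid's derivative is at most 0.25 and nearly zero for large inputs, while ReLU's derivative is exactly 1 wherever the unit is active. A minimal NumPy check:

```python
import numpy as np

def sigmoid_grad(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)             # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return (x > 0).astype(float)   # gradient passes through unchanged when active

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid_grad(x))   # [0.0066, 0.25, 0.0066]
print(relu_grad(x))      # [0., 0., 1.]
```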

SLIDE 38

Rectified Linear Units (ReLUs)

SLIDE 39

Mini-batch Stochastic Gradient Descent

Loop:

  • 1. Sample a batch of data
  • 2. Forward prop it through the graph, get loss
  • 3. Backprop to calculate the gradients
  • 4. Update the parameters using the gradient
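A sketch of this loop in PyTorch, with a toy linear model and random batches standing in for a real network and data loader:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                       # stand-in for a real ConvNet
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for step in range(100):
    # 1. sample a batch (random data stands in for a real loader)
    inputs, labels = torch.randn(32, 10), torch.randint(0, 2, (32,))
    outputs = model(inputs)                    # 2. forward prop, get loss
    loss = loss_fn(outputs, labels)
    optimizer.zero_grad()
    loss.backward()                            # 3. backprop the gradients
    optimizer.step()                           # 4. update the parameters
```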
SLIDE 40

Data Augmentation

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. AlexNet uses two forms of data augmentation:

  • The first form consists of generating image translations and horizontal reflections.
  • The second form consists of altering the intensities of the RGB channels in training images.
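An approximation of these two forms with torchvision transforms. The 256/224 sizes match AlexNet's crops; note that AlexNet's original color trick was a PCA-based RGB shift, and ColorJitter is used here only as a stand-in:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.Resize(256),
    T.RandomCrop(224),                 # translations via random 224x224 crops
    T.RandomHorizontalFlip(p=0.5),     # horizontal reflections
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # RGB intensity changes
    T.ToTensor(),
])
```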

SLIDE 41

Dropout

Set the output of each hidden neuron to zero with probability 0.5. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons.
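A minimal PyTorch demonstration. One detail difference: PyTorch uses "inverted" dropout, scaling surviving activations by 1/(1-p) at training time, whereas the AlexNet paper instead halved the outputs at test time:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # zero each unit with probability 0.5

x = torch.ones(8)
drop.train()
print(drop(x))   # random units zeroed; survivors scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))   # at test time dropout is disabled: all ones
```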

SLIDE 42

ImageNet

SLIDE 43

ImageNet Challenges

Deep learning!

SLIDE 44

ImageNet Challenge 2012

SLIDE 45

Revolution of Depth

SLIDE 46

A Hierarchy of Features

From: B. Biggio

SLIDE 47

Layer 1

Each 3x3 block shows the top 9 patches for one filter.
SLIDE 48

Layer 2

SLIDE 49

Layer 3

SLIDE 50

Layer 3

SLIDE 51

SLIDE 52

Feature Analysis

  • A well-trained ConvNet is an excellent feature extractor.
  • Chop the network at a desired layer and use the output as a feature representation to train an SVM on some other dataset (Zeiler and Fergus 2013).
  • Improve further by taking a pre-trained ConvNet and re-training it on a different dataset (fine-tuning); see the sketch below.
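A sketch of both recipes with torchvision (the weights identifier and the 20-class head are illustrative assumptions, not from the slides):

```python
import torch.nn as nn
import torchvision.models as models

# Fine-tuning: take a pre-trained ConvNet and replace its classifier
# head for a new dataset with, say, 20 classes.
net = models.alexnet(weights="IMAGENET1K_V1")
net.classifier[-1] = nn.Linear(4096, 20)     # new 20-way output layer

# For pure feature extraction, freeze the convolutional layers and
# train only the new head (or feed their output to an SVM).
for p in net.features.parameters():
    p.requires_grad = False
```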

SLIDE 53

Other Success Stories of Deep Learning

Today deep learning, in its several manifestations, is being applied in a variety of domains besides computer vision, such as:

  • Speech recognition
  • Optical character recognition
  • Natural language processing
  • Autonomous driving
  • Game playing (e.g., Google’s AlphaGo)

SLIDE 54

References

  • http://neuralnetworksanddeeplearning.com
  • http://deeplearning.stanford.edu/tutorial/
  • http://www.deeplearningbook.org/
  • http://deeplearning.net/

Platforms:

  • Theano
  • PyTorch
  • TensorFlow