Convolutional Neural Networks for Image Classification
Marcello Pelillo, University of Venice, Italy
Image and Video Understanding
a.y. 2018/19
The Age of Deep Learning

The Deep Learning Philosophy: learn a feature hierarchy all the way from pixels to classifier.
The image classification task: predict a single label (or a distribution over labels, to indicate our confidence) for a given image. Humans are really good at this. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3, where the 3 represents the three color channels: Red, Green, Blue.
From: A. Karpathy
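As a quick illustration, here is a minimal NumPy sketch (with a made-up 4x4 random image) of what such an array looks like:

import numpy as np

# A toy 4x4 RGB "image": a 3-D array of integers in [0, 255]
# with shape (height, width, 3); the 3 channels are R, G, B.
img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(img.shape)   # (4, 4, 3)
print(img[0, 0])   # the R, G, B values of the top-left pixel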
An example training set for four visual categories. In practice we may have thousands of categories and hundreds of thousands of images for each category.
From: A. Karpathy
Laplacian of Gaussian (LoG) (Marr 1982)
Gabor filters (directional) (Daugman 1985)
From: M. Sebag
Two milestone CNN architectures: LeNet (LeCun 1998) and AlexNet (Krizhevsky et al. 2012).
Fully connected: 400,000 hidden units = 16 billion parameters.
Locally connected: 400,000 hidden units with 10 x 10 receptive fields = 40 million parameters.
Local connections capture local dependencies.
We can dramatically reduce the number of parameters by making one reasonable assumption: if one feature is useful to compute at some spatial position (x1, y1), then it should also be useful to compute at a different position (x2, y2). Hence all units computing the same feature can share the same weights (weight sharing), which is exactly what a convolutional filter does.
Normally, several filters are packed together into a filter bank and learned automatically during training.
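A back-of-the-envelope Python sketch of the parameter counts above; the 100-filter figure is an illustrative assumption, not from the slides:

# Numbers from the slide: a 200x200 input (40,000 pixels) and
# 400,000 hidden units with 10x10 receptive fields.
inputs = 200 * 200
hidden = 400_000

fully_connected = hidden * inputs      # 16,000,000,000 (16 billion)
locally_connected = hidden * 10 * 10   # 40,000,000 (40 million)

# With weight sharing (convolution), every unit computing the same
# feature reuses one 10x10 filter; with e.g. 100 learned filters
# (an illustrative number), only:
convolutional = 100 * 10 * 10          # 10,000 parameters

print(fully_connected, locally_connected, convolutional)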
Max pooling simplifies the network architecture by downsampling the feature maps produced by the filtering operations, reducing the number of neurons in subsequent layers.
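A minimal NumPy sketch of 2x2 max pooling with stride 2 (the usual setting, assumed here) on a single feature map:

import numpy as np

def max_pool_2x2(fmap):
    # 2x2 max pooling with stride 2 on one feature map (H x W),
    # assuming H and W are even.
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 2, 3, 4]])
print(max_pool_2x2(fmap))
# [[6 2]
#  [2 7]]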
ImageNet (ILSVRC): 1000 categories, 1.2M training images, 150k test images.
AlexNet: 650,000 neurons, 60 million parameters.
Problem: the sigmoid activation takes values in (0,1), and its derivative is everywhere small (at most 0.25). Propagating the gradient back to the initial layers therefore multiplies it by one small factor per layer, so it tends to 0 (the vanishing gradient problem). From a practical perspective, this slows down the training of the initial layers of the network.
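A small NumPy sketch of why: each sigmoid layer contributes a factor of at most 0.25 to the backpropagated gradient, so the gradient shrinks geometrically with depth:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # at most 0.25, attained at x = 0

# Even in the best case (all pre-activations at 0), the gradient
# shrinks geometrically with depth:
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)   # 0.25 ** depth
# At depth 20 the factor is ~9e-13: almost nothing reaches the
# initial layers.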
Training loop:
1. Sample a batch of data.
2. Forward-propagate it through the network to compute the loss.
3. Backpropagate to compute the gradients.
4. Update the weights using the gradients.
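A minimal sketch of this loop, using a plain softmax classifier on random toy data (all names and sizes here are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))       # 1000 samples, 32 features
y = rng.integers(0, 4, size=1000)     # 4 classes
W = np.zeros((32, 4))
lr, batch = 0.1, 64

for step in range(200):
    idx = rng.integers(0, len(X), size=batch)  # 1. sample a batch
    xb, yb = X[idx], y[idx]
    logits = xb @ W                            # 2. forward pass
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(batch), yb] -= 1               # 3. backprop (softmax gradient)
    W -= lr * (xb.T @ p) / batch               # 4. update the weights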
The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. AlexNet uses two forms of data augmentation:
1. generating image translations and horizontal reflections;
2. altering the intensities of the RGB channels in training images.
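A minimal NumPy sketch of the first form (random crops as translations, plus horizontal flips); the sizes are illustrative, not the paper's:

import numpy as np

def augment(img, rng, crop=24):
    # Label-preserving transformations: a random crop (translation)
    # plus a random horizontal reflection. `crop` is illustrative.
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        out = out[:, ::-1]        # horizontal reflection
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(augment(img, rng).shape)    # (24, 24, 3)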
Dropout: set to zero the output of each hidden neuron with probability 0.5. The neurons that are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons.
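A minimal NumPy sketch of the idea, using the now-standard "inverted" dropout variant (which rescales at training time so nothing changes at test time, slightly different from AlexNet's original test-time rescaling):

import numpy as np

def dropout(activations, rng, p=0.5, train=True):
    # Inverted dropout: zero each unit with probability p during
    # training and rescale the survivors by 1/(1-p).
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 8))
print(dropout(h, rng))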
Deep learning!
From: B. Biggio
Each 3x3 block shows the top 9 patches for one learned filter.
Transfer learning: use the learned representation to train an SVM on some other dataset (Zeiler & Fergus 2013).
Alternatively, adapt the pre-trained network to a different dataset by retraining some of its layers (fine-tuning).
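A hedged sketch of both options, assuming PyTorch/torchvision (not necessarily the tools used in the lecture): freeze AlexNet's convolutional features, as in feature extraction, and replace the last layer for a new task:

import torch.nn as nn
import torchvision.models as models

# Load AlexNet with ImageNet-trained weights.
model = models.alexnet(pretrained=True)

# Freeze the convolutional feature extractor ...
for p in model.features.parameters():
    p.requires_grad = False

# ... and replace the last fully connected layer to match a new
# task with, say, 10 categories (an illustrative number).
model.classifier[6] = nn.Linear(4096, 10)
# Training this model now fine-tunes only the unfrozen layers.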
Today deep learning, in its several manifestations, is being applied in a variety of domains besides computer vision.
Platforms: several software frameworks support deep learning, e.g., TensorFlow, PyTorch, Caffe, and Theano.