Convolutional Neural Networks for Image Classification
Marcello Pelillo, University of Venice, Italy
Image and Video Understanding
a.y. 2018/19
The Age of Deep Learning

The Deep Learning Philosophy: learn a feature hierarchy all the way from pixels to classifier.
The image classification task: predict a single label (or a distribution over labels, to indicate our confidence) for a given image. Humans are really good at this. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3, where the 3 represents the three color channels: Red, Green, Blue.
From: A. Karpathy
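As a quick illustration, here is a minimal NumPy sketch (with a made-up 4x4 random image) of what such an array looks like:

import numpy as np

# A toy 4x4 RGB "image": a 3-D array of integers in [0, 255]
# with shape (height, width, 3); the 3 channels are R, G, B.
img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(img.shape)   # (4, 4, 3)
print(img[0, 0])   # the R, G, B values of the top-left pixel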
An example training set for four visual categories. In practice we may have thousands of categories and hundreds of thousands of images for each category.
From: A. Karpathy
Laplacian of Gaussian (LoG) (Marr 1982)
Gabor filters (directional) (Daugman 1985)
From: M. Sebag
Two milestone CNN architectures: LeNet (LeCun 1998) and AlexNet (Krizhevsky et al. 2012).
Fully connected: 400,000 hidden units = 16 billion parameters.
Locally connected: 400,000 hidden units with 10 x 10 receptive fields = 40 million parameters.
Local connections capture local dependencies.
We can dramatically reduce the number of parameters by making one reasonable assumption: if one feature is useful to compute at some spatial position (x1, y1), then it should also be useful to compute at a different position (x2, y2). Hence all units computing the same feature can share the same weights (weight sharing), which is exactly what a convolutional filter does.
Normally, several filters are packed together into a filter bank and learned automatically during training.
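A back-of-the-envelope Python sketch of the parameter counts above; the 100-filter figure is an illustrative assumption, not from the slides:

# Numbers from the slide: a 200x200 input (40,000 pixels) and
# 400,000 hidden units with 10x10 receptive fields.
inputs = 200 * 200
hidden = 400_000

fully_connected = hidden * inputs      # 16,000,000,000 (16 billion)
locally_connected = hidden * 10 * 10   # 40,000,000 (40 million)

# With weight sharing (convolution), every unit computing the same
# feature reuses one 10x10 filter; with e.g. 100 learned filters
# (an illustrative number), only:
convolutional = 100 * 10 * 10          # 10,000 parameters

print(fully_connected, locally_connected, convolutional)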
Max pooling simplifies the network architecture by downsampling the feature maps produced by the filtering operations, reducing the number of neurons in subsequent layers.
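A minimal NumPy sketch of 2x2 max pooling with stride 2 (the usual setting, assumed here) on a single feature map:

import numpy as np

def max_pool_2x2(fmap):
    # 2x2 max pooling with stride 2 on one feature map (H x W),
    # assuming H and W are even.
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 2, 3, 4]])
print(max_pool_2x2(fmap))
# [[6 2]
#  [2 7]]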
ImageNet (ILSVRC): 1000 categories, 1.2M training images, 150k test images.
AlexNet: 650,000 neurons, 60 million parameters.
Problem: the sigmoid activation takes values in (0,1), and its derivative is everywhere small (at most 0.25). Propagating the gradient back to the initial layers therefore multiplies it by one small factor per layer, so it tends to 0 (the vanishing gradient problem). From a practical perspective, this slows down the training of the initial layers of the network.
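A small NumPy sketch of why: each sigmoid layer contributes a factor of at most 0.25 to the backpropagated gradient, so the gradient shrinks geometrically with depth:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # at most 0.25, attained at x = 0

# Even in the best case (all pre-activations at 0), the gradient
# shrinks geometrically with depth:
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)   # 0.25 ** depth
# At depth 20 the factor is ~9e-13: almost nothing reaches the
# initial layers.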
Training loop:
1. Sample a batch of data.
2. Forward-propagate it through the network to compute the loss.
3. Backpropagate to compute the gradients.
4. Update the weights using the gradients.
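A minimal sketch of this loop, using a plain softmax classifier on random toy data (all names and sizes here are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))       # 1000 samples, 32 features
y = rng.integers(0, 4, size=1000)     # 4 classes
W = np.zeros((32, 4))
lr, batch = 0.1, 64

for step in range(200):
    idx = rng.integers(0, len(X), size=batch)  # 1. sample a batch
    xb, yb = X[idx], y[idx]
    logits = xb @ W                            # 2. forward pass
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(batch), yb] -= 1               # 3. backprop (softmax gradient)
    W -= lr * (xb.T @ p) / batch               # 4. update the weights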
The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. AlexNet uses two forms of data augmentation:
1. generating image translations and horizontal reflections;
2. altering the intensities of the RGB channels in training images.
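A minimal NumPy sketch of the first form (random crops as translations, plus horizontal flips); the sizes are illustrative, not the paper's:

import numpy as np

def augment(img, rng, crop=24):
    # Label-preserving transformations: a random crop (translation)
    # plus a random horizontal reflection. `crop` is illustrative.
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        out = out[:, ::-1]        # horizontal reflection
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(augment(img, rng).shape)    # (24, 24, 3)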
Dropout: set to zero the output of each hidden neuron with probability 0.5. The neurons that are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons.
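A minimal NumPy sketch of the idea, using the now-standard "inverted" dropout variant (which rescales at training time so nothing changes at test time, slightly different from AlexNet's original test-time rescaling):

import numpy as np

def dropout(activations, rng, p=0.5, train=True):
    # Inverted dropout: zero each unit with probability p during
    # training and rescale the survivors by 1/(1-p).
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 8))
print(dropout(h, rng))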
Deep learning!
From: B. Biggio
Each 3x3 block shows the top 9 patches for one learned filter.
Transfer learning: use the learned representation to train an SVM on some other dataset (Zeiler & Fergus 2013).
Alternatively, adapt the pre-trained network to a different dataset by retraining some of its layers (fine-tuning).
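A hedged sketch of both options, assuming PyTorch/torchvision (not necessarily the tools used in the lecture): freeze AlexNet's convolutional features, as in feature extraction, and replace the last layer for a new task:

import torch.nn as nn
import torchvision.models as models

# Load AlexNet with ImageNet-trained weights.
model = models.alexnet(pretrained=True)

# Freeze the convolutional feature extractor ...
for p in model.features.parameters():
    p.requires_grad = False

# ... and replace the last fully connected layer to match a new
# task with, say, 10 categories (an illustrative number).
model.classifier[6] = nn.Linear(4096, 10)
# Training this model now fine-tunes only the unfrozen layers.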
Today deep learning, in its several manifestations, is being applied in a variety of domains besides computer vision.
Platforms: several software frameworks support deep learning, e.g., TensorFlow, PyTorch, Caffe, and Theano.