SLIDE 1

Going Deeper with Convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich

PRESENTED BY: KAYLEE YUHAS AND KYLE COFFEY

SLIDE 2

About Neural Networks

  • Neural networks can be used in many different capacities in AI:
  • Object classification, such as with images: given images of 2 different wolves, a network can identify their subspecies
  • Speech recognition
  • Behavioral analysis: through interactive media such as video games, identifying how people respond to different stimuli in various environments and situations
  • This work requires a hefty amount of computational resources to run smoothly
  • Traditional neural network architecture has remained mostly constant
SLIDE 3

How to improve on traditional neural network setups?

  • Increasing a neural network’s performance by increasing its size, while seemingly sound, has severe drawbacks:
  • A larger number of parameters makes the network prone to overfitting
  • A larger network requires more computational resources

[Figure: overfitting illustration (Chabacano, 2008); the green line overfits the training data]
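How fast parameters grow with network width makes this concrete; a quick count for two consecutive fully connected layers (the widths here are hypothetical, not from the paper):

```python
# Parameter count for two consecutive fully connected layers:
# every unit in the first layer connects to every unit in the second,
# so widening both layers grows the parameter count quadratically.

def fc_params(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

base = fc_params(1024, 1024)     # 1,049,600 parameters
doubled = fc_params(2048, 2048)  # 4,196,352 parameters
print(base, doubled, doubled / base)  # ratio is roughly 4x
```

Doubling the width roughly quadruples the parameters between the two layers, which is why naïve scaling both invites overfitting and inflates compute.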

SLIDE 4

How to improve on traditional neural network setups?

  • How can we improve performance without more hardware?
  • By approximating sparse structures using computations on dense matrices, which current hardware handles efficiently
  • This sparse architecture’s name is Inception, a reference to the 2010 film of the same name
  • Introducing sparsity into the architecture by replacing fully connected layers with sparse ones, even inside convolutions, is key
  • This mimics biological systems
SLIDE 5

Inception Architecture: Naïve Version

  • The authors settled on 1x1, 3x3, and 5x5 filter sizes, a “decision based more on convenience than necessity”
  • The module can be repeated spatially for scaling
  • This alignment also avoids patch-alignment issues
  • However, 5x5 convolutions quickly become prohibitively expensive on layers with a large number of filters
  • In short: inputs come from the previous layer and go through the parallel convolutional layers; the pooling path serves to control overfitting by reducing spatial size.
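The expense of 5x5 filters can be sketched with a quick multiply count (the feature-map and filter sizes below are illustrative assumptions, not figures from the slide):

```python
# Multiply count of one convolutional layer (stride 1, 'same' padding):
# every output position computes k*k*c_in multiplies for each of c_out filters.

def conv_multiplies(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    return h * w * (k * k * c_in) * c_out

# Illustrative sizes: a 28x28 feature map with 192 input channels, 32 filters.
cheap = conv_multiplies(28, 28, 192, 32, 1)   # 1x1 convolution
costly = conv_multiplies(28, 28, 192, 32, 5)  # 5x5 convolution
print(costly // cheap)  # 25: the 5x5 does 25x the work per filter
```

The k*k factor is why a module full of 5x5 paths blows up once the channel counts grow.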

SLIDE 6

Inception Architecture: Dimensionality Reduction

  • Computing reductions with 1x1 convolutions before the more expensive 3x3 and 5x5 convolutions tremendously reduces the necessary processing power
  • These dimensionality reductions allow significant increases in the number of units at each stage without a sharp increase in computational cost at later, more complex stages

SLIDE 7

GoogLeNet

  • An incarnation of the Inception architecture that the paper’s authors submitted to the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
  • The network was designed to be efficient enough to run with a low memory footprint on individual devices with limited computational resources.
  • If CNNs are to gain a foothold in private industry, low overhead costs are especially important.
  • Here is a small sample of the GoogLeNet architecture, where you can note the use of dimensionality reduction as opposed to the naïve version.

SLIDE 8

GoogLeNet

  • The entirety of the architecture is far too large to fit legibly in one slide.
SLIDE 9

GoogLeNet

  • The GoogLeNet incarnation of the Inception architecture.
  • “#3x3 reduce” and “#5x5 reduce” stand for the number of 1x1 filters in the reduction layer used before the 3x3 and 5x5 convolutions.
  • While there are many layers, the main goal is to have the final “softmax” layer give “scores” to the image classes
  • e.g. dogs, skin diseases, etc.
  • A loss function determines how good or bad each score is.
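A minimal sketch of how a softmax layer turns scores into class probabilities and how a loss judges them (this is the standard softmax/cross-entropy pairing, not code from the paper; the scores are toy values):

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_idx):
    """Low loss when the true class gets high probability."""
    return -math.log(probs[true_idx])

scores = [2.0, 1.0, 0.1]  # raw scores for 3 hypothetical classes
probs = softmax(scores)
print(cross_entropy(probs, 0) < cross_entropy(probs, 2))  # True
```

The loss is small when the network assigns most probability to the correct class, and large when it does not, which is exactly the signal training minimizes.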

SLIDE 10

GoogLeNet

  • GoogLeNet was 22 layers deep, counting only layers with parameters
  • 27 layers if you count pooling
  • About 100 total building blocks
  • It could be trained to convergence with a few high-end GPUs in about a week, the main limitation being memory usage
  • It was trained to classify images into one of 1,000 leaf-node image categories in the ImageNet hierarchy
  • ImageNet is a large visual database designed specifically for visual recognition software research
  • GoogLeNet performed quite well in this contest

SLIDE 11

GoogLeNet

  • Left: GoogLeNet’s performance at the 2014 ILSVRC, where it came in first place.
  • Right: A breakdown of its classification performance.
  • Using multiple different CNNs and averaging their scores to get a predicted class for an image yields better results than a single CNN; see the entry with 7 CNNs.
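A minimal sketch of this score-averaging idea (toy probabilities and a plain mean, not the paper's actual ensembling code):

```python
def ensemble_predict(per_model_probs):
    """Average each model's class probabilities, then pick the top class."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    avg = [sum(m[c] for m in per_model_probs) / n_models
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Toy outputs from 3 models over 3 classes; the third model disagrees,
# but averaging still favors class 1.
probs = [
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.2, 0.3],
]
winner, avg = ensemble_predict(probs)
print(winner)  # 1
```

Averaging smooths out individual models' mistakes, which is why the 7-CNN ensemble beats any single network.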

SLIDE 12

Summary

  • Convolutional neural networks remain top performers among neural network approaches.
  • The Inception framework allows for large-scale growth while minimizing processing bottlenecks and the “choke points” at which scaling becomes inefficient.
  • It also runs well on machines without powerful hardware.
  • Reducing dimensionality with 1x1 convolutions before the 3x3 and 5x5 convolutions has proven efficient and effective.
  • Further study: is mimicking actual biological conditions universally the best choice for neural network architecture?

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2015.7298594
  • Chabacano. (2008, February). Overfitting. Retrieved April 8, 2017, from https://en.wikipedia.org/wiki/Overfitting