Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich
PRESENTED BY: KAYLEE YUHAS AND KYLE COFFEY
About Neural Networks
Neural networks capitalize on a skill they share with biological brains: responding to different stimuli in various environments and situations.
Simply making networks bigger, while seemingly logically sound, has severe drawbacks:
- More parameters make the network prone to overfitting (the green line in the classic overfitting figure, where a model fits the training data too closely)
- A uniform increase in size demands far more computation; do we just buy more hardware?
- Much of that computation is spent on dense matrices even when the underlying structure is sparse
The resulting architecture is called Inception, a nod to the "we need to go deeper" meme from the 2010 film of the same name. Its premise: replacing dense network components with sparse ones, even inside convolutions, is key. Filter sizes within an Inception module are restricted to 1x1, 3x3, and 5x5, a decision the authors describe as based "more on convenience than necessity."
- This can be repeated spatially for scaling
- This alignment also avoids patch-alignment issues with a large number of filters
In short: inputs come from the previous layer and pass through the parallel convolutional layers, while the pooling layer serves to control size.
By applying 1x1 convolutions before the 3x3 and 5x5 convolutions, the necessary processing power is tremendously reduced.
- These dimensionality reductions allow a significant increase in the number of units at each stage without a sharp increase in computational cost at later, more complex stages
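The savings can be seen with simple arithmetic. The sketch below counts multiply-accumulate operations for a 5x5 convolution with and without a 1x1 "reduce" layer; the specific sizes (a 28x28x192 input, 32 output filters, a 16-filter reduction) are illustrative assumptions, not values taken from the slides.

```python
# Sketch: multiply counts for a 5x5 convolution, naive vs. with a
# 1x1 dimensionality-reduction layer in front. Sizes are assumed
# for illustration only.

def conv_mults(h, w, c_in, c_out, k):
    """Multiplies for a k x k convolution with 'same' padding."""
    return h * w * c_out * (k * k * c_in)

H, W, C_IN, C_OUT, REDUCE = 28, 28, 192, 32, 16

naive = conv_mults(H, W, C_IN, C_OUT, 5)
reduced = conv_mults(H, W, C_IN, REDUCE, 1) + conv_mults(H, W, REDUCE, C_OUT, 5)

print(f"naive 5x5:        {naive:,} mults")      # 120,422,400
print(f"1x1 then 5x5:     {reduced:,} mults")    # 12,443,648
print(f"reduction factor: {naive / reduced:.1f}x")
```

Even with these modest toy sizes, the 1x1 bottleneck cuts the cost by nearly an order of magnitude, which is why the reduction layers make the wider Inception modules affordable.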
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
- If CNNs are to gain a foothold in private industry, having low overhead costs is especially important
Here is a small sample of GoogLeNet, where you can note the use of dimensionality reduction as opposed to the naïve Inception architecture.
The "reduce" columns in the architecture table give the number of 1x1 filters in the reduction layer used before the 3x3 and 5x5 convolutions.
The main goal is for the final "softmax" layer to give "scores" to the image classes, indicating how confident the network is in each prediction.
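A minimal sketch of how softmax turns a network's raw outputs ("logits") into class scores that sum to 1; the three-class logit values are made up for illustration.

```python
# Sketch: softmax turns raw logits into confidence scores summing to 1.
# Logit values are invented for this example.
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # raw outputs for three classes
scores = softmax(logits)
print([round(s, 3) for s in scores])      # highest logit gets the highest score
```

Because the scores are normalized, they can be read as the network's relative confidence across classes, which is what the final layer reports.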
- 22 layers with parameters (27 if you count pooling)
- About 100 total layers (independent building blocks)
- The main limitation would be memory usage
- The classification task uses leaf-node categories from the ImageNet hierarchy
- ImageNet is a large visual database designed for use in visual object recognition software research
- GoogLeNet, with its roughly 100 total layers, performed quite well in this contest: it won the ILSVRC 2014 classification task with a top-5 error rate of 6.67%
- Using multiple different CNNs and averaging their scores to predict a class for an image gives better results than a single CNN; see the 7-CNN ensemble in the results
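The averaging step above can be sketched in a few lines. The per-model score vectors below are invented for illustration; the point is only the mechanism of averaging per-class scores and picking the winner, as in the paper's 7-model ensemble.

```python
# Sketch: ensemble prediction by averaging per-class scores across
# models. The toy scores are invented for illustration.

def ensemble_predict(model_scores):
    """Average per-class scores across models; return (winning class, averages)."""
    n_models = len(model_scores)
    n_classes = len(model_scores[0])
    avg = [sum(scores[c] for scores in model_scores) / n_models
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Three toy models scoring three classes; models 2 and 3 outvote model 1.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
winner, avg = ensemble_predict(scores)
print(winner, [round(a, 2) for a in avg])  # class 1 wins on average
```

Averaging smooths out each individual model's mistakes, which is why the ensemble outperforms any single CNN.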
- The architecture avoids processing bottlenecks, as well as "choke points" where scaling past a certain point would become inefficient
- It also runs well on machines without powerful hardware
- Approximating the optimal sparse structure with readily available dense convolutions has proven efficient and effective
Is this approach the best case for neural network architecture?
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2015.7298594
https://en.wikipedia.org/wiki/Overfitting