DEEP LEARNING
FFR135, Artificial Neural Networks Olof Mogren
Chalmers University of Technology
October 2016
Artificial neural networks
Many layers of abstraction
Outperform traditional methods in, e.g., image recognition
Deep belief nets: layer-wise pretrained Restricted Boltzmann Machines
Real applications from Google, Facebook, Tesla, Microsoft, Apple, and others!
A fast learning algorithm for deep belief nets; Hinton, Osindero, Teh; Neural Computation; 2006
A single perceptron cannot represent functions that are not linearly separable (e.g. XOR)
[Figure: a perceptron: inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feeding the output y]
[Figure: XOR(x0, x1) is not linearly separable, but decomposes as (x0 ∧ ¬x1) ∨ (¬x0 ∧ x1), where each term is linearly separable]
A network with a hidden layer can represent non-linear functions
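To make this concrete, here is a minimal numpy sketch (not from the slides) of a two-layer network with hand-set weights that computes XOR; the hidden units implement x0 ∧ ¬x1 and ¬x0 ∧ x1, and the output unit ORs them:

```python
import numpy as np

def step(a):
    # Heaviside step activation: 1 if a > 0, else 0
    return (a > 0).astype(float)

# Hand-set weights for XOR = (x0 AND NOT x1) OR (NOT x0 AND x1).
# Hidden unit 1 fires for x0 AND NOT x1; hidden unit 2 for NOT x0 AND x1.
W1 = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])
b1 = np.array([-0.5, -0.5])
# The output unit fires if either hidden unit fires (OR).
W2 = np.array([1.0, 1.0])
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)   # hidden layer
    y = step(W2 @ h + b2)             # output layer
    print(x, "->", int(y))            # prints 0, 1, 1, 0
```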
Pre-activation: a = Wx + b
Activation: h = g(a)
[Figure: feed-forward network with inputs and a hidden layer]
Training with stochastic gradient descent (SGD)
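A minimal sketch of what SGD training can look like for a one-hidden-layer network with sigmoid activations and squared-error loss; the data, layer sizes, and learning rate are illustrative assumptions, not the lecture's recipe:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # toy inputs
t = (X.sum(axis=1) > 0).astype(float)    # toy binary targets

# Randomly initialized parameters for a 4 -> 8 -> 1 network.
W1, b1 = rng.normal(scale=0.5, size=(8, 4)), np.zeros(8)
w2, b2 = rng.normal(scale=0.5, size=8), 0.0
lr = 0.1

for epoch in range(50):
    for i in rng.permutation(len(X)):    # one example at a time: "stochastic"
        x, target = X[i], t[i]
        h = sigmoid(W1 @ x + b1)         # forward pass
        y = sigmoid(w2 @ h + b2)
        # Backward pass for the loss L = 0.5 * (y - target)^2.
        dy = (y - target) * y * (1 - y)
        dh = dy * w2 * h * (1 - h)
        w2 -= lr * dy * h;  b2 -= lr * dy
        W1 -= lr * np.outer(dh, x);  b1 -= lr * dh
```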
Details: Exploring Strategies for Training Deep Neural Networks; Larochelle, Bengio, Louradour, Lamblin; JMLR 2009
non-linear transformation of inputs: h = sigmoid(Wx + b)
Local (one-hot) representation: n binary parameters → n values
Distributed representation: n binary parameters → 2^n possible values
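A small check of the count, using one-hot vectors for the local case and full binary vectors for the distributed case (the variable names are mine):

```python
from itertools import product

n = 4
one_hot = [tuple(int(i == j) for j in range(n)) for i in range(n)]
binary = list(product([0, 1], repeat=n))
print(len(one_hot))  # n = 4 values: one per active unit
print(len(binary))   # 2**n = 16 values: units combine
```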
cs231n.stanford.edu
playground.tensorflow.org
Slides adapted from Fei-Fei Li, Andrej Karpathy & Justin Johnson (cs231n)
[Figure: input volume of size 32×32×3 (width × height × depth)]
Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”. Filters always extend the full depth of the input volume.
One number: the result of taking a dot product between the filter and a small 5×5×3 chunk of the image (i.e. a 5·5·3 = 75-dimensional dot product, plus a bias)
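A naive numpy sketch of the whole operation, assuming stride 1 and no padding (the function name conv_layer is mine):

```python
import numpy as np

def conv_layer(volume, filters, biases):
    """Slide each filter over the input volume, computing dot products.

    volume:  (H, W, D) input, e.g. a 32x32x3 image
    filters: (K, F, F, D) bank of K FxFxD filters
    biases:  (K,)
    returns: (H-F+1, W-F+1, K) activation maps
    """
    H, W, D = volume.shape
    K, F, _, _ = filters.shape
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                chunk = volume[i:i+F, j:j+F, :]                # FxFxD chunk
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

image = np.random.randn(32, 32, 3)
maps = conv_layer(image, np.random.randn(1, 5, 5, 3), np.zeros(1))
print(maps.shape)  # (28, 28, 1): one 28x28 activation map
```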
Convolve (slide) over all spatial locations → one 28×28 activation map
A second filter gives a second 28×28 activation map
Convolution layer with 6 filters → 6 activation maps: an output volume of 28×28×6
Preview: a ConvNet is a sequence of convolution layers, interspersed with activation functions:
32×32×3 → CONV + ReLU (e.g. 6 5×5×3 filters) → 28×28×6 → CONV + ReLU (e.g. 10 5×5×6 filters) → 24×24×10 → CONV + ReLU → …
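Continuing the conv_layer sketch above, stacking two such layers reproduces these shapes (a toy check, not the lecture's code):

```python
relu = lambda a: np.maximum(a, 0.0)

h1 = relu(conv_layer(image, np.random.randn(6, 5, 5, 3), np.zeros(6)))
h2 = relu(conv_layer(h1, np.random.randn(10, 5, 5, 6), np.zeros(10)))
print(h1.shape, h2.shape)  # (28, 28, 6) (24, 24, 10)
```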
[Figure: example 5×5 filters (32 total) and their activation maps]
We call the layer convolutional because it is related to convolution: the elementwise multiplication and sum of a filter and the signal (image).
Remember: e.g. a 32×32 input convolved repeatedly with 5×5 filters shrinks the volume spatially (32 → 28 → 24 → …). Shrinking too fast is not good and doesn't work well.
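The standard output-size formula makes the shrinking concrete; the helper below and its name are mine, with optional zero-padding P and stride S:

```python
def conv_output_size(n, f, stride=1, pad=0):
    # Output width/height of a convolution: (N - F + 2P) / S + 1
    return (n - f + 2 * pad) // stride + 1

print(conv_output_size(32, 5))         # 28
print(conv_output_size(28, 5))         # 24
print(conv_output_size(32, 5, pad=2))  # 32: padding preserves the size
```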
56×56×64 → 1×1 CONV with 32 filters → 56×56×32 (each filter has size 1×1×64 and performs a 64-dimensional dot product)
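Since each filter is 1×1, this is just a matrix multiply applied at every spatial position; a minimal numpy check with the shapes above:

```python
import numpy as np

x = np.random.randn(56, 56, 64)   # input volume
W = np.random.randn(64, 32)       # 32 filters, each of size 1x1x64

y = x @ W                          # 64-dim dot product at each of the 56x56 positions
print(y.shape)                     # (56, 56, 32)
```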
Pooling layer: max pool with 2×2 filters and stride 2, downsampling each activation map independently
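A numpy sketch of 2×2 max pooling with stride 2 on a single activation map (the function name is mine):

```python
import numpy as np

def max_pool_2x2(x):
    # Take the max over each non-overlapping 2x2 block (stride 2).
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(a))  # [[ 5.  7.] [13. 15.]]
```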
Regularizing networks: dropout
Improving neural networks by preventing co-adaptation of feature detectors; Hinton, Srivastava, Krizhevsky, Sutskever, Salakhutdinov; arXiv:1207.0580; 2012
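A sketch of dropout at training time, in the common "inverted" variant where activations are rescaled during training rather than at test time (illustrative, not the paper's exact formulation):

```python
import numpy as np

def dropout(h, p=0.5, train=True):
    # Randomly zero each unit with probability p during training;
    # scale the survivors by 1/(1-p) so expectations match test time.
    if not train:
        return h
    mask = (np.random.rand(*h.shape) >= p) / (1.0 - p)
    return h * mask

h = np.ones(8)
print(dropout(h, p=0.5))  # roughly half the units zeroed, survivors scaled to 2.0
```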
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift; Ioffe, Szegedy; arXiv:1502.03167
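The core of the batch normalization forward pass, per feature over a mini-batch; gamma and beta are the learned scale and shift from the paper:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature to zero mean, unit variance over the batch,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 10) * 3.0 + 5.0       # batch of 32, 10 features
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1
```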
[Figure: residual block; the output is F(x) + x via a skip connection]
Deep Residual Learning for Image Recognition; He, Zhang, Ren, Sun; arXiv:1512.03385
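The idea in code: each block learns a residual F(x) that is added back to its input through an identity shortcut (a sketch; f stands for any small stack of layers):

```python
import numpy as np

def residual_block(x, f):
    # Instead of learning a mapping H(x) directly, learn the
    # residual F(x) = H(x) - x and add the identity shortcut back.
    return f(x) + x

relu = lambda a: np.maximum(a, 0.0)
W = np.random.randn(10, 10) * 0.1
x = np.random.randn(10)
print(residual_block(x, lambda v: relu(W @ v)).shape)  # (10,)
```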
ImageNet classification top-5 error (%):

  ILSVRC'10  shallow                 28.2
  ILSVRC'11  shallow                 25.8
  ILSVRC'12  AlexNet (8 layers)      16.4
  ILSVRC'13  (8 layers)              11.7
  ILSVRC'14  VGG (19 layers)          7.3
  ILSVRC'14  GoogleNet (22 layers)    6.7
  ILSVRC'15  ResNet (152 layers)      3.57

Kaiming He, Xiangyu Zhang, Shaoqing Ren & Jian Sun; Deep Residual Learning for Image Recognition; arXiv 2015
In high dimensions, critical points of the loss are far more often saddle points than bad local minima.
The loss surfaces of multilayer networks; Choromanska et al.; AISTATS 2015
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization; Dauphin et al.; NIPS 2014
[Figure: an unrolled recurrent network: inputs x_{t-2}, x_{t-1}, x_t feed a chain of LSTM cells whose final state is used for classification]
[Figure: sequence-to-sequence model: an encoder LSTM reads x1, x2, x3 and a decoder LSTM emits y1, y2, y3]
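For reference, one step of a standard LSTM cell in numpy (a common formulation; the stacked weight layout is an assumption of this sketch):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, b):
    # W maps [x; h] to the stacked gate pre-activations; b is the bias.
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c = f * c + i * np.tanh(g)                     # update the cell state
    h = o * np.tanh(c)                             # expose a gated view of it
    return h, c

d_in, d_h = 8, 16
W = np.random.randn(4 * d_h, d_in + d_h) * 0.1
b = np.zeros(4 * d_h)
h = c = np.zeros(d_h)
for x in np.random.randn(3, d_in):                 # x_{t-2}, x_{t-1}, x_t
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)                                     # (16,)
```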
Sequence to sequence learning with neural networks; Sutskever, Vinyals, Le; NIPS 2014
Neural machine translation by jointly learning to align and translate; Bahdanau, Cho, Bengio; ICLR 2015
results comparable to human translators!
Google's neural machine translation system: Bridging the gap between human and machine translation; Yonghui Wu et al.; arXiv:1609.08144
http://mogren.one/