All You Want To Know About CNNs
Yukun Zhu
Image from http://imgur.com/
Object detection performance, PASCAL VOC 2010 (mean average precision):

○ DPM (2010): 33.4
○ segDPM (2014): 40.4
○ RCNN (2014): 53.7
○ RCNN* (Oct 2014): 62.9
○ segRCNN (Jan 2015): 67.2
○ Fast RCNN (Jun 2015): 70.8
Image from http://cs231n.github.io/neural-networks-1/
Image modified from http://cs231n.github.io/neural-networks-1/
Image and code modified from http://cs231n.github.io/optimization-2/

[Figure: worked backpropagation example on a small circuit feeding into a sigmoid. Forward-pass values on the wires: 2.00, 6.00, 4.00, 1.00, 0.37, 1.37, 0.73. The backward pass starts from a gradient of 1.00 at the output and yields gradients of 0.20 on most wires and 0.40 on one input.]

Each gate only needs its local derivative:

○ f = 1/x, df/dx = -1/x^2
○ f = x + a (here a = 1), df/dx = 1
○ f = e^x, df/dx = e^x
○ f = -x, df/dx = -1
○ f = ax, df/dx = a

The resulting gradient drives the weight update:

○ w(t+1) = w(t) - α∇w
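The forward and backward passes of the sigmoid circuit above can be sketched in a few lines of Python. The input values are illustrative (chosen so the intermediate numbers match the figure: a dot product of 1.00, a sigmoid output of 0.73, a gate gradient of about 0.20):

```python
import math

# Forward pass through f(w, x) = sigmoid(w0*x0 + w1*x1 + w2)
w = [2.0, -3.0, -3.0]   # w0, w1, and bias w2 (illustrative values)
x = [-1.0, -2.0]        # x0, x1

dot = w[0] * x[0] + w[1] * x[1] + w[2]   # = 1.00
f = 1.0 / (1.0 + math.exp(-dot))         # sigmoid output, ~0.73

# Backward pass: chain the local gradients of each gate.
# For the sigmoid, the chained local gradients simplify to (1 - f) * f.
ddot = (1.0 - f) * f                          # ~0.20
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]   # gradients on w0, w1, w2
dx = [w[0] * ddot, w[1] * ddot]               # gradients on x0, x1 (~0.40 on x0)
```

The simplification (1 - f) * f is why the sigmoid is treated as a single gate in practice: its gradient reuses the forward output instead of re-chaining the 1/x, +1, eˣ, and -x gates.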
https://en.wikipedia.org/wiki/Universal_approximation_theorem
○ B separates points.
○ B contains the constant function 1.
○ If f ∈ B then af ∈ B for all constants a ∈ R.
○ If f, g ∈ B, then f + g, max{f, g} ∈ B.
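These closure conditions are what make a family of functions dense in the continuous functions. The classical one-hidden-layer statement (informal, following the Wikipedia article linked above; σ is any suitable non-constant activation) reads:

```latex
% Universal approximation, one hidden layer (informal):
% any continuous f on a compact set K can be uniformly approximated
% by a finite sum of activation units.
\forall f \in C(K),\; \forall \varepsilon > 0,\;
\exists N,\; v_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^d :
\quad
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon
```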
Image from http://cs231n.github.io/convolutional-networks/. See this page for an excellent example of convolution.
[Figure: a 3D activation volume, with axes labeled width × height × depth]
Image from http://cs231n.github.io/convolutional-networks/
Image modified from http://cs231n.github.io/convolutional-networks/

[Figure: layer-by-layer activations of a ConvNet: Conv:1, ReLU:1, Conv:2, ReLU:2, MaxPool:1, Conv:3, …, MaxPool:2, Conv:5, ReLU:5, Conv:6, ReLU:6, MaxPool:3]
Image from http://cs231n.github.io/convolutional-networks/
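The spatial size of each activation volume in stacks like the one above follows one formula: (W - F + 2P) / S + 1 for input width W, filter size F, padding P, stride S. A sketch (the function name is mine; the formula is the standard conv arithmetic):

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a conv layer: input width w, filter size f,
    zero-padding p, stride s. Raises if the filter does not tile evenly."""
    if (w - f + 2 * p) % s != 0:
        raise ValueError("filter does not fit cleanly; adjust padding/stride")
    return (w - f + 2 * p) // s + 1

# A 227x227 input with 11x11 filters, stride 4, no padding (AlexNet conv1):
print(conv_output_size(227, 11, 0, 4))  # -> 55

# 3x3 filters with padding 1 and stride 1 preserve the spatial size:
print(conv_output_size(32, 3, 1, 1))    # -> 32
```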
○ CPU clusters, GPUs, etc.
○ ImageNet: 1.2M images of 1,000 object classes
○ COCO: 300K images of 2M object instances
○ ReLU, dropout, inception, etc.
Krizhevsky, Alex, et al. "ImageNet classification with deep convolutional neural networks." NIPS 2012.
Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
○ A common trick: subtract the mean and divide by the standard deviation.
Image from http://cs231n.github.io/neural-networks-2/
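The mean/std trick can be sketched as follows. The statistics are computed on the training set only and then reused for validation and test data (function names are mine):

```python
import numpy as np

def fit_normalizer(x_train):
    """Compute per-feature mean and std on the training data only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0) + 1e-8  # epsilon avoids division by zero
    return mean, std

def normalize(x, mean, std):
    """Subtract the training mean and divide by the training std."""
    return (x - mean) / std

x_train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]])
mean, std = fit_normalizer(x_train)
x_norm = normalize(x_train, mean, std)
# each column of x_norm now has (approximately) zero mean and unit variance
```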
○ Keep training data balanced ○ Shuffle data before batching
○ Image channel order ○ Tensor storage order
NCHW (channels first):
○ Used in Caffe, Torch, Theano; supported by cuDNN
○ Pros: faster for convolution (FFT, memory access)

NHWC (channels last):
○ Used in TensorFlow; limited support in cuDNN
○ Pros: fast batch normalization, easier batching
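Converting between the two tensor storage orders is a single axis permutation; a sketch with NumPy (the axis names follow the N, C, H, W convention):

```python
import numpy as np

# A batch of 2 RGB images, 4x5 pixels, stored channels-last (NHWC).
nhwc = np.zeros((2, 4, 5, 3))

# NHWC -> NCHW: move the channel axis ahead of the spatial axes.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nchw.shape)  # -> (2, 3, 4, 5)

# NCHW -> NHWC: the inverse permutation.
back = np.transpose(nchw, (0, 2, 3, 1))
print(back.shape)  # -> (2, 4, 5, 3)
```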
○ Stay away from sigmoid (except for the output layer)
○ ReLU preferred
○ Try Leaky ReLU next
○ Use Maxout if most ReLU units die (have zero activation)
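The three recommended activations, sketched in NumPy (the 0.01 leak slope and the two-piece maxout are common choices, not the only ones):

```python
import numpy as np

def relu(x):
    # zero for negative inputs, identity for positive ones
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # a small negative slope keeps gradient flowing for x < 0,
    # so units cannot "die" the way plain ReLU units can
    return np.where(x > 0, x, alpha * x)

def maxout(x, w1, b1, w2, b2):
    # maxout takes the max over two (or more) linear pieces;
    # it generalizes ReLU and never saturates on either side
    return np.maximum(x @ w1 + b1, x @ w2 + b2)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))  # -> [0. 0. 3.]
print(leaky_relu(x))
```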
○ Random initialization with proper variance
○ For ReLU, prefer a small positive bias so units start out active
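A sketch of "random initialization with proper variance" for a ReLU layer, using the He et al. scaling sqrt(2 / fan_in); the 0.01 bias is an illustrative small positive constant, and the function name is mine:

```python
import numpy as np

def init_relu_layer(fan_in, fan_out, seed=0):
    """He initialization: zero-mean Gaussian weights with variance 2/fan_in,
    plus a small positive bias so ReLU units start out active."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.full(fan_out, 0.01)
    return w, b

w, b = init_relu_layer(512, 256)
# the empirical std of w should be close to sqrt(2/512) ~= 0.0625
```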
○ Decrease the learning rate during training
○ Set momentum to 0.8–0.9
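Both knobs, decaying learning rate and momentum near 0.9, can be sketched in a toy loop minimizing f(w) = w² (the step-decay schedule and all values are illustrative):

```python
def sgd_momentum_step(w, grad, velocity, lr, mu=0.9):
    """One momentum update: the velocity accumulates an exponentially
    decaying sum of past gradients, then the weight moves along it."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

w, v, lr = 5.0, 0.0, 0.1
for step in range(200):
    grad = 2.0 * w              # df/dw for f(w) = w^2
    w, v = sgd_momentum_step(w, grad, v, lr)
    if step % 50 == 49:
        lr *= 0.5               # step decay of the learning rate
# w is now close to the minimum at 0
```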
○ For large datasets: set the batch size to whatever fits your memory
○ For smaller datasets: trade off instance randomness against gradient smoothness
○ Optimize your hyperparameters on val and evaluate on test
○ Keep track of training and validation loss during training
○ Do early stopping if training and validation loss diverge
○ Loss doesn’t tell you everything: also look at precision, class-wise precision, and more
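The loss-tracking and early-stopping advice above can be sketched as a patience-based loop. Here train_one_epoch and evaluate are placeholder callables standing in for your training and validation code, not functions from any library:

```python
def train_with_early_stopping(train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs.
    `train_one_epoch()` returns the training loss, `evaluate()` the val loss."""
    best_val, bad_epochs, history = float("inf"), 0, []
    for epoch in range(max_epochs):
        train_loss = train_one_epoch()
        val_loss = evaluate()
        history.append((train_loss, val_loss))  # track both losses
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:          # losses diverged: stop early
                break
    return best_val, history

# Toy run: validation loss improves for 10 "epochs", then gets worse.
fake_val = iter([1.0 - 0.05 * i for i in range(10)] + [2.0] * 20)
best, hist = train_with_early_stopping(lambda: 0.0, lambda: next(fake_val))
```

With patience 5, the loop stops 5 epochs after the best validation loss (0.55 here) instead of running all 30 fake epochs.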
Image from http://www.linkresearchtools.com/
Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." arXiv preprint arXiv:1411.4038 (2014).