Convolutional Neural Networks for Computer Vision
Centrum für Informations- und Sprachverarbeitung
24 November 2015
Computer Vision Group: 5 Postdocs, 24 PhD students
Convolutional Neural Networks for Computer Vision Caner Hazırbaş | vision.in.tum.de
Learning good features automatically from raw data
Google’s cat detection neural network
Input: ‘Pixels’
1st and 2nd layers: ‘Edges’
3rd layer: ‘Object Parts’
4th layer: ‘Objects’
[Figure: third-layer features for faces, cars, airplanes, motorbikes]
Unsupervised Methods
encode decode
Supervised Methods
[Figure: image captioning pipeline. Vision: Deep CNN, followed by Language: Generating RNN. Output captions: "A group of people shopping at an outdoor market." "There are many vegetables at the fruit stand."]
Stochastic Gradient Descent — supervised learning
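A minimal sketch of stochastic gradient descent in the supervised setting, assuming a toy linear model y ≈ w·x + b with squared-error loss. The function name `sgd_fit`, the data, and the learning rate are illustrative and not taken from the slides.

```python
import random

def sgd_fit(samples, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b with plain SGD on squared error (illustrative only)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)              # visit the labelled samples in random order
        for x, y in data:
            err = (w * x + b) - y      # prediction error on a single sample
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * x
            b -= lr * err
    return w, b

# Noise-free labelled data generated from y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = sgd_fit(samples)
```

Each update uses one labelled sample, which is what distinguishes the stochastic variant from full-batch gradient descent.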
Alternatives:
Adam is a gradient-based optimization method (like SGD). It uses “adaptive moment estimation” (m_t, v_t) and can be regarded as a generalization of AdaGrad. The update formulas are:
\[
(m_t)_i = \beta_1 (m_{t-1})_i + (1 - \beta_1)(\nabla L(W_t))_i, \qquad
(v_t)_i = \beta_2 (v_{t-1})_i + (1 - \beta_2)(\nabla L(W_t))_i^2
\]
\[
(W_{t+1})_i = (W_t)_i - \alpha \, \frac{\sqrt{1 - (\beta_2)_i^t}}{1 - (\beta_1)_i^t} \, \frac{(m_t)_i}{\sqrt{(v_t)_i} + \varepsilon}.
\]
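The update formulas above can be sketched in plain Python as follows. This is a sketch under assumptions: β1 and β2 are treated as scalars, the default hyperparameters are common choices rather than values from the slides, and the quadratic test loss and the name `adam_step` are illustrative.

```python
import math

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of parameters; t is the 1-based step count."""
    # Bias-correction factor applied to the step size alpha.
    scale = alpha * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    new_w, new_m, new_v = [], [], []
    for w_i, g_i, m_i, v_i in zip(w, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g_i          # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g_i * g_i    # second-moment estimate
        new_w.append(w_i - scale * m_i / (math.sqrt(v_i) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v

# Illustrative use: minimise L(w) = sum(w_i^2), whose gradient is 2 * w_i.
w = [1.0, -3.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
for t in range(1, 2001):
    grad = [2.0 * w_i for w_i in w]
    w, m, v = adam_step(w, grad, m, v, t, alpha=0.01)
```

On the very first step the ratio of the two moment estimates cancels the gradient scale, so each parameter moves by roughly α regardless of how large its gradient is.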
(e.g. 2D images, 3D video/volumetric images)
and pooling layers
previous layer
[Figure: example CNN with conv1, pool1, conv, pool2 stages; labels include 20 and 50 feature maps, map sizes 15 x 54, 15 x 54, 8 x 27, 4 x 14, and vectors of 500 x 1 and 378 x 1]
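The map sizes labelled in the architecture figure (15 x 54, then 8 x 27, then 4 x 14) are consistent with two pooling stages that halve each axis and round up. The sketch below checks that arithmetic; the 2 x 2 window, stride 2, zero padding, and ceiling rounding are assumptions, since the slide does not state the pooling parameters.

```python
import math

def pooled_size(size, kernel=2, stride=2, pad=0):
    """Output length along one axis for a pooling window, rounding up (assumed)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

def pool_shape(shape, **kwargs):
    """Apply pooled_size to every spatial axis of a feature map shape."""
    return tuple(pooled_size(s, **kwargs) for s in shape)

after_pool1 = pool_shape((15, 54))       # size after the first pooling stage
after_pool2 = pool_shape(after_pool1)    # size after the second pooling stage
```

With floor rounding instead, the first stage would give 7 x 27, which does not match the figure; that is why ceiling rounding is assumed here.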
Convolutional networks take advantage of the properties of natural signals:
Person
Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Thomas Brox Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Daniel Cremers, Patrick van der Smagt
[Figure: histograms of displacement (px) against number of pixels for Flying Chairs and Sintel, shown on linear and log scales]
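A histogram of displacement against number of pixels, as in the plots above, can be computed from a flow field roughly as follows. This is a sketch only: the nested-list flow layout, the 1 px bin width, and the name `displacement_histogram` are assumptions, not the procedure used to produce the plots.

```python
import math
from collections import Counter

def displacement_histogram(flow, bin_width=1.0):
    """Count pixels per displacement-magnitude bin.

    flow: 2D grid (list of rows) of (u, v) displacement vectors in pixels.
    Returns a dict mapping bin index to pixel count.
    """
    counts = Counter()
    for row in flow:
        for u, v in row:
            magnitude = math.hypot(u, v)              # Euclidean length of the flow vector
            counts[int(magnitude // bin_width)] += 1
    return dict(counts)

# Toy 2 x 2 flow field with magnitudes 0, 5, 0.5 and 10 px.
flow = [[(0.0, 0.0), (3.0, 4.0)],
        [(0.5, 0.0), (-6.0, 8.0)]]
hist = displacement_histogram(flow)
```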
Generated Augmented
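FlowNetSimple
[Figure: FlowNetSimple architecture. Layers: conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, refinement, prediction. Channels: 6 (input), 64, 128, 256, 256, 512, 512, 512, 512, 1024. Kernels: 7 x 7, 5 x 5, 5 x 5, 3 x 3. Resolutions: 384 x 512 (input), 192 x 256, 96 x 128; prediction 136 x 320.]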
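The 6 input channels in the FlowNetSimple diagram correspond to two RGB frames stacked along the channel axis. A minimal sketch of that stacking, assuming channel-first nested lists; the helper names `stack_image_pair` and `blank_rgb` are hypothetical.

```python
def stack_image_pair(img1, img2):
    """Concatenate two channel-first images (C x H x W nested lists) along channels."""
    h1, w1 = len(img1[0]), len(img1[0][0])
    h2, w2 = len(img2[0]), len(img2[0][0])
    if (h1, w1) != (h2, w2):
        raise ValueError("both frames must have the same height and width")
    return img1 + img2   # list concatenation appends the channels of img2

def blank_rgb(height, width, value=0.0):
    """Build a constant 3 x H x W image for demonstration purposes."""
    return [[[value] * width for _ in range(height)] for _ in range(3)]

frame1 = blank_rgb(4, 6, value=0.0)
frame2 = blank_rgb(4, 6, value=1.0)
stacked = stack_image_pair(frame1, frame2)
shape = (len(stacked), len(stacked[0]), len(stacked[0][0]))
```

The network then has to work out for itself how to match content between the first three and the last three channels.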
FlowNetCorr
[Figure: FlowNetCorr architecture. Layers: conv1, conv2, conv3, corr, conv_redir, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, refinement, prediction. Channels: 3 (input), 64, 128, 256, 441 (corr), 32 (conv_redir), 473 (combined), 256, 512, 512, 512, 512, 1024. Kernels: 7 x 7, 5 x 5, 1 x 1, 3 x 3. Input 384 x 512; prediction 136 x 320.]
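The channel counts in the FlowNetCorr diagram can be checked with a small sketch of a correlation layer. The 441 channels equal 21 x 21 displacements, which would follow from a maximum displacement of 20 sampled with stride 2; those two parameters are an assumption here, as the slide does not state them. The function names, the single-pixel patches, and the zero handling at borders are likewise illustrative.

```python
def num_displacements(max_disp, stride):
    """Number of correlation channels for a square displacement neighbourhood."""
    per_axis = 2 * (max_disp // stride) + 1
    return per_axis * per_axis

def correlate(f1, f2, max_disp, stride):
    """Correlate channel-last feature maps f1 and f2 (H x W x C nested lists).

    Uses single-pixel patches; neighbours outside the map contribute 0.
    Returns an H x W grid with one score per displacement.
    """
    h, w = len(f1), len(f1[0])
    offsets = range(-max_disp, max_disp + 1, stride)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy in offsets:
                for dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # Dot product of the two feature vectors.
                        scores.append(sum(a * b for a, b in zip(f1[y][x], f2[yy][xx])))
                    else:
                        scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out

corr_channels = num_displacements(20, 2)   # assumed parameters
combined = corr_channels + 32              # corr output plus conv_redir channels

# Toy 2 x 2 maps with 2 feature channels, displacements in {-1, 0, 1}.
f = [[[1, 0], [0, 1]],
     [[1, 1], [0, 0]]]
corr = correlate(f, f, max_disp=1, stride=1)
```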
FlowNetS FlowNetCorr
Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12
Hierarchical Representations
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12
Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox
selection-technology/deep-learning-image-classification/
descent/