Convolutional Neural Networks for Computer Vision, Caner Hazırbaş



SLIDE 1

Convolutional Neural Networks for Computer Vision

Centrum für Informations- und Sprachverarbeitung


  • 24 November 2015

Caner Hazırbaş

SLIDE 2

Convolutional Neural Networks for Computer Vision Caner Hazırbaş | vision.in.tum.de

Computer Vision Group


5 Postdocs, 24 PhD students

SLIDE 3

Research in Computer Vision

SLIDE 4

Convolutional Neural Networks for Computer Vision

SLIDE 5

What is deep learning?

  • Representation learning method: learning good features automatically from raw data
  • Learning representations of data with multiple levels of abstraction

[Figure: Google’s cat detection neural network]

SLIDE 6

Going deeper in the network

  • Input: ‘Pixels’
  • 1st and 2nd layers: ‘Edges’
  • 3rd layer: ‘Object Parts’
  • 4th layer: ‘Objects’

[Figure: third-layer features for faces, cars, airplanes and motorbikes]

SLIDE 7

Deep Learning Methods

Unsupervised Methods

  • Restricted Boltzmann Machines
  • Deep Belief Networks
  • Autoencoders: unsupervised feature extraction/learning

[Figure: autoencoder with encode and decode stages]

SLIDE 8

Deep Learning Methods

Supervised Methods

  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks

[Figure: image captioning pipeline. A vision deep CNN feeds a language-generating RNN, which outputs: “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”]

SLIDE 9

How to train a deep network?

Stochastic Gradient Descent (supervised learning):

  • show the input vectors for a few examples
  • compute the outputs and the errors
  • compute the average gradient over those examples
  • update the weights accordingly
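The four steps above can be sketched on a toy problem. This is a minimal illustration, not the training code behind the slides: the linear model, the squared-error loss, the function name `sgd_linear_fit` and all hyperparameter values are assumptions chosen to keep the example small.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.1, batch_size=4, epochs=500, seed=0):
    """Fit y ~ w * x + b with mini-batch SGD on a squared-error loss.

    Hypothetical helper: model, loss and defaults are illustrative only.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            # 1. show the input vectors for a few examples
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                # 2. compute the output and the error
                err = (w * xs[i] + b) - ys[i]
                grad_w += err * xs[i]
                grad_b += err
            # 3. compute the average gradient over the batch
            grad_w /= len(batch)
            grad_b /= len(batch)
            # 4. update the weights accordingly
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free data generated from y = 2x + 1.
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

On this noise-free data the fitted `w` and `b` should end up close to 2 and 1. With a deep network the same loop applies, except that the gradient comes from backpropagation rather than a closed-form expression.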

SLIDE 10

How to train a deep network?

Alternatives:

  • AdaGrad, AdaDelta, NAG (Nesterov’s Accelerated Gradient)…
  • Adam (now in Caffe - http://caffe.berkeleyvision.org/tutorial/solver.html)

Adam is a gradient-based optimization method (like SGD). It includes an “adaptive moment estimation” $(m_t, v_t)$ and can be regarded as a generalization of AdaGrad. The update formulas are:

$$(m_t)_i = \beta_1 (m_{t-1})_i + (1 - \beta_1)(\nabla L(W_t))_i, \qquad (v_t)_i = \beta_2 (v_{t-1})_i + (1 - \beta_2)(\nabla L(W_t))_i^2$$

$$(W_{t+1})_i = (W_t)_i - \alpha \frac{\sqrt{1 - (\beta_2)_i^t}}{1 - (\beta_1)_i^t} \frac{(m_t)_i}{\sqrt{(v_t)_i} + \varepsilon}.$$

  • D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, 2015
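The update formulas can be written out element by element as below. This is a sketch under assumptions: the function name `adam_step` is hypothetical, the defaults for β1, β2 and ε are the values commonly quoted from the Adam paper rather than anything stated on the slide, and the quadratic toy loss is only there to exercise the update.

```python
import math


def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of weights; t is the 1-based step count.

    Hypothetical helper; default hyperparameters are assumed, not from the slide.
    """
    # Bias-correction factor sqrt(1 - beta2^t) / (1 - beta1^t).
    correction = math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    new_w, new_m, new_v = [], [], []
    for w_i, g_i, m_i, v_i in zip(w, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g_i          # first moment
        v_i = beta2 * v_i + (1.0 - beta2) * g_i ** 2     # second moment
        w_i = w_i - alpha * correction * m_i / (math.sqrt(v_i) + eps)
        new_w.append(w_i)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v


# Toy loss L(W) = sum_i (W_i - target_i)^2, so the gradient is 2 * (W - target).
target = [3.0, -2.0]
weights, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 1001):
    grad = [2.0 * (w_i - t_i) for w_i, t_i in zip(weights, target)]
    weights, m, v = adam_step(weights, grad, m, v, t, alpha=0.05)
```

One property worth noticing: on the very first step the correction factor cancels the moment scaling, so each weight moves by roughly α regardless of the gradient's magnitude.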

SLIDE 11

Convolutional Neural Networks

  • CNNs are designed to process data in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images)
  • A typical architecture is composed of a series of stages: convolutional layers and pooling layers
  • Each unit is connected to local patches in the feature maps of the previous layer

[Figure: example CNN with stages conv1, pool1, conv, pool2]
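One convolution stage followed by one pooling stage can be sketched in plain Python as follows. The function names, the 2 x 2 edge kernel and the toy image are assumptions for illustration. Real frameworks add channels, strides, padding and a nonlinearity, all omitted here.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    Each output unit only sees a local patch of the input, and the
    same kernel weights are reused at every position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out


# Toy 5 x 6 image with a vertical edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = conv2d_valid(image, kernel)   # 4 x 5 feature map
pooled = max_pool2d(features)            # 2 x 2 after pooling
```

The feature map is non-zero only in the column where the edge sits, and pooling keeps that response while halving the resolution.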

SLIDE 12

Key Idea behind Convolutional Networks

Convolutional networks take advantage of the properties of natural signals:

  • local connections
  • shared weights
  • pooling
  • the use of many layers

[Figure: example with the label ‘Person’]
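A quick parameter count shows what local connections and shared weights save. The numbers are illustrative assumptions: a 384 x 512 RGB input, 64 output units for the dense layer, and 64 filters of size 7 x 7 for the convolutional layer. The two layers do not produce outputs of the same shape, so this is a rough comparison only.

```python
def dense_params(in_h, in_w, in_c, units):
    """Every unit is connected to every input value, plus one bias per unit."""
    return in_h * in_w * in_c * units + units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """One small kernel per output feature map, shared across all positions."""
    return kernel_h * kernel_w * in_c * out_c + out_c


dense = dense_params(384, 512, 3, 64)   # 37,748,800 parameters
conv = conv_params(7, 7, 3, 64)         # 9,472 parameters
```

The convolutional layer's parameter count is independent of the image size, which is what makes stacking many layers practical.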

SLIDE 13

FlowNet: Learning Optical Flow with Convolutional Networks

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Thomas Brox, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Daniel Cremers, Patrick van der Smagt

SLIDE 14

Flying Chairs

SLIDE 15

Flying Chairs

[Figure: histograms of displacement (px) against number of pixels for Flying Chairs and Sintel, on linear and log scales]

SLIDE 16

Data Augmentation

[Figure: a generated sample and its augmented version]

  • translation, rotation, scaling, additive Gaussian noise
  • changes in brightness, contrast, gamma and colour

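Some of these augmentations can be sketched on a single grayscale image stored as nested lists with values in [0, 1]. Everything specific here is an assumption: the function name `augment`, the parameter ranges and the order of operations. Rotation, scaling and colour changes are left out to keep the sketch short. For optical flow, the geometric part would also have to be applied consistently to the image pair and the ground-truth flow, which this sketch does not attempt.

```python
import random


def clip01(x):
    """Clamp a value to the valid intensity range [0, 1]."""
    return min(1.0, max(0.0, x))


def augment(image, rng, max_shift=2, noise_std=0.02):
    """Random translation, contrast, brightness, gamma and Gaussian noise.

    Hypothetical helper; all ranges below are illustrative choices.
    """
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Translation: read from the shifted source pixel, zero outside.
            src_r, src_c = r - dy, c - dx
            inside = 0 <= src_r < h and 0 <= src_c < w
            value = image[src_r][src_c] if inside else 0.0
            # Contrast around mid-grey, then a brightness offset.
            value = clip01((value - 0.5) * contrast + 0.5 + brightness)
            # Gamma change, then additive Gaussian noise.
            value = value ** gamma
            value = clip01(value + rng.gauss(0.0, noise_std))
            row.append(value)
        out.append(row)
    return out, (dy, dx)


# Toy 6 x 8 image with a diagonal intensity ramp.
image = [[(r + c) / 12.0 for c in range(8)] for r in range(6)]
augmented, shift = augment(image, random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which helps when debugging a training pipeline.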
SLIDE 17

FlowNetSimple

[Figure: FlowNetSimple architecture. Input 384 x 512 with 6 channels; layers conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6 with 64, 128, 256, 256, 512, 512, 512, 512 and 1024 channels; kernel sizes 7 x 7, 5 x 5, 5 x 5 and 3 x 3; refinement stage; prediction at 136 x 320]

SLIDE 18

FlowNetSimple - Flying Chairs

[Figure: FlowNetSimple architecture diagram, as on slide 17]

SLIDE 19

FlowNetSimple - Sintel

[Figure: FlowNetSimple architecture diagram, as on slide 17]

SLIDE 20

FlowNetCorr

[Figure: FlowNetCorr architecture. Two 384 x 512 inputs with 3 channels each pass through conv1, conv2, conv3 (64, 128, 256 channels); the correlation layer corr (441 channels) is combined with conv_redir (32 channels) into 473 channels; then conv3_1, conv4, conv4_1, conv5, conv5_1, conv6 (256, 512, 512, 512, 512, 1024 channels); kernel sizes 7 x 7, 5 x 5, 1 x 1, 1 x 1 and 3 x 3; refinement stage; prediction at 136 x 320]

SLIDE 21

Correlation Layer

[Figure: correlation layer detail showing corr (441 channels) and conv_redir; kernel sizes 1 x 1, 1 x 1 and 3 x 3]
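The idea of a correlation layer can be sketched as follows: for every location in the first feature map, compare its feature vector with the vectors at nearby displaced locations in the second map. This is a simplified illustration under assumptions (single-pixel patches, stride 1, zero score outside the image, hypothetical function name), not the layer used in FlowNetCorr. The 441 output channels on the slide would match a 21 x 21 grid of displacements, whereas the sketch uses a 3 x 3 grid.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Dot-product correlation between two H x W x C feature maps.

    Returns an H x W x (2 * max_disp + 1)^2 volume and the list of
    (dy, dx) displacements that index its last dimension.
    """
    h, w = len(f1), len(f1[0])
    offsets = [(dy, dx)
               for dy in range(-max_disp, max_disp + 1)
               for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy, dx in offsets:
                y2, x2 = y + dy, x + dx
                if 0 <= y2 < h and 0 <= x2 < w:
                    # Similarity of the two feature vectors.
                    scores.append(sum(a * b for a, b in zip(f1[y][x], f2[y2][x2])))
                else:
                    scores.append(0.0)   # displaced location is outside the map
            row.append(scores)
        out.append(row)
    return out, offsets


# Toy 3 x 3 single-channel maps: a feature at (1, 1) moves one pixel to the right.
f1 = [[[0.0] for _ in range(3)] for _ in range(3)]
f2 = [[[0.0] for _ in range(3)] for _ in range(3)]
f1[1][1] = [1.0]
f2[1][2] = [1.0]

volume, offsets = correlation_layer(f1, f2)
scores = volume[1][1]
best = offsets[max(range(len(scores)), key=scores.__getitem__)]
```

Here `best` is the displacement with the highest correlation at the centre pixel, which for this toy input is one pixel to the right. The network itself does not take such an argmax; later layers consume the whole correlation volume.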

SLIDE 22

FlowNetCorr - Flying Chairs

[Figure: FlowNetCorr architecture diagram, as on slide 20]

SLIDE 23

Simple vs. Corr - Flying Chairs

[Figure: FlowNetCorr architecture diagram, as on slide 20, with side-by-side panels FlowNetS and FlowNetCorr]

SLIDE 24

FlowNetCorr - Sintel

[Figure: FlowNetCorr architecture diagram, as on slide 20]

SLIDE 25

Simple vs. Corr - Sintel

[Figure: FlowNetCorr architecture diagram, as on slide 20, with side-by-side panels FlowNetS and FlowNetCorr]

SLIDE 26

FlowNetSimple + Variational Smoothing

SLIDE 27

FlowNet: Learning Optical Flow with Convolutional Networks

SLIDE 28

References

  • Building High-level Features Using Large Scale Unsupervised Learning
    Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12
  • Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
    Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09
  • ImageNet Classification with Deep Convolutional Neural Networks
    Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12
  • Gradient-based learning applied to document recognition
    Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Proceedings of the IEEE’98
  • FlowNet: Learning Optical Flow with Convolutional Networks
    Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

SLIDE 29

References

  • Google’s cat detection neural network: http://www.resnap.com/image-selection-technology/deep-learning-image-classification/
  • Example auto-encoder: http://nghiaho.com/?p=1765
  • SGD: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/