slide-1
SLIDE 1

Deep Learning in Computer Vision

Deep Learning in Action


24 June ’15

Caner Hazırbaş

slide-2
SLIDE 2

Deep Learning in Computer Vision Caner Hazırbaş | vision.in.tum.de

Computer Vision Group


6 Postdocs, 16 PhD students

slide-3
SLIDE 3

Research in Computer Vision

  • Image-based 3D Reconstruction
  • Shape Analysis
  • Robot Vision
  • RGB-D Vision
  • Image Segmentation
  • Convex Relaxation Methods
  • Visual SLAM
  • Optical Flow

slide-4
SLIDE 4

Deep Learning in Computer Vision

slide-5
SLIDE 5

How to teach a machine?

Figure: image → edges (or any other hand-crafted features) → classifier → “Person”

slide-6
SLIDE 6

How to teach a machine?

Figure: image → edges (or any other hand-crafted features) → classifier → “Person”

Not a good representation

slide-7
SLIDE 7

What is deep learning?

  • Representation learning method: learning good features automatically from raw data
  • Learning representations of data with multiple levels of abstraction

Figure: Google’s cat detection neural network

slide-8
SLIDE 8

Construction of higher levels of abstraction

Figure: a single unit with input weights w1, w2, w3, a bias b (attached to a constant input 1) and a “non-linear” transformation
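The unit in the figure can be sketched in a few lines of Python. This is only an illustration: the sigmoid as the "non-linear" transformation, the function names `neuron` and `layer`, and all numeric values are assumptions, not taken from the slides.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid.

    The sigmoid is an assumed choice; the slide only says "non-linear".
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all look at the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical input and weights, chosen only for illustration.
x = [0.5, -1.0, 2.0]
hidden = layer(x, [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]], [0.0, 0.5])
# Stacking a second layer on top of the first builds a higher level of abstraction.
output = layer(hidden, [[1.0, -1.0]], [0.0])
print(hidden, output)
```

Feeding the outputs of one layer into the next is what lets each level describe the data in terms of the level below it.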

slide-9
SLIDE 9

Going deeper in the network

  • Input: ‘Pixels’
  • 1st and 2nd Layers: ‘Edges’
  • 3rd Layer: ‘Object Parts’
  • 4th Layer: ‘Objects’

Figure (third-layer features): faces, cars, airplanes, motorbikes

slide-10
SLIDE 10

Deep Learning Methods

Unsupervised Methods

  • Restricted Boltzmann Machines
  • Deep Belief Networks
  • Autoencoders: unsupervised feature extraction/learning

Figure: autoencoder (encode → decode)
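A minimal sketch of the encode/decode idea, under strong simplifying assumptions: the autoencoder here is linear, has a single code value as its bottleneck, and is trained with plain full-batch gradient descent on the squared reconstruction error. The toy data, the initial weights, the step count and the learning rate are all hypothetical.

```python
def encode(x, enc_w):
    """Compress a vector to a single code value (the bottleneck)."""
    return sum(w * xi for w, xi in zip(enc_w, x))


def decode(code, dec_w):
    """Reconstruct the vector from the code."""
    return [w * code for w in dec_w]


def reconstruction_loss(data, enc_w, dec_w):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        recon = decode(encode(x, enc_w), dec_w)
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(data)


def train(data, steps=500, lr=0.05):
    """Gradient descent on the reconstruction error; no labels are used."""
    enc_w = [0.1] * len(data[0])
    dec_w = [0.1] * len(data[0])
    for _ in range(steps):
        grad_enc = [0.0] * len(enc_w)
        grad_dec = [0.0] * len(dec_w)
        for x in data:
            code = encode(x, enc_w)
            recon = decode(code, dec_w)
            diff = [2.0 * (r - xi) for r, xi in zip(recon, x)]
            grad_code = sum(d * w for d, w in zip(diff, dec_w))
            for i in range(len(x)):
                grad_dec[i] += diff[i] * code
                grad_enc[i] += grad_code * x[i]
        enc_w = [w - lr * g / len(data) for w, g in zip(enc_w, grad_enc)]
        dec_w = [w - lr * g / len(data) for w, g in zip(dec_w, grad_dec)]
    return enc_w, dec_w


# Toy 3-D points that all lie on one line, so one code value can describe them.
data = [[t * 1.0, t * 0.5, t * -0.5] for t in (-1.0, -0.5, 0.5, 1.0)]
initial = reconstruction_loss(data, [0.1] * 3, [0.1] * 3)
enc_w, dec_w = train(data)
final = reconstruction_loss(data, enc_w, dec_w)
print(initial, final)
```

The code value learned by the encoder acts as the extracted feature. Practical autoencoders add non-linearities and more layers.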

slide-11
SLIDE 11

Deep Learning Methods

Supervised Methods

  • Deep Neural Networks
  • Recurrent Neural Networks
  • Convolutional Neural Networks

Figure: a vision deep CNN feeds a language-generating RNN, which produces the caption “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”

slide-12
SLIDE 12

How to train a deep network?

Stochastic Gradient Descent (supervised learning)

  • Show the input vectors for a few examples
  • Compute the outputs and the errors
  • Compute the average gradient over those examples
  • Update the weights accordingly
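The four steps above can be sketched as a mini-batch SGD loop. To keep it short, the sketch trains a logistic-regression model rather than a deep network. The model, the toy data, the learning rate, the batch size and the names `sgd_train` and `predict` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_train(data, lr=0.5, batch_size=4, epochs=200, seed=0):
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # 1. show a few examples
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, y in batch:
                # 2. compute the output and the error
                pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                err = pred - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
                grad_b += err
            # 3. compute the average gradient over the batch
            grad_w = [g / len(batch) for g in grad_w]
            grad_b /= len(batch)
            # 4. update the weights accordingly
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Toy data: the label is 1 when both coordinates are large.
toy = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([0.4, 0.3], 0), ([0.1, 0.5], 0),
       ([0.9, 0.8], 1), ([1.0, 0.6], 1), ([0.7, 0.9], 1), ([0.8, 1.0], 1)]
w, b = sgd_train(toy)
print([predict(w, b, x) for x, _ in toy])
```

In a deep network the same loop applies, with the gradient for every layer obtained by backpropagation.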

slide-13
SLIDE 13

Convolutional Neural Networks

  • CNNs are designed to process data in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images)
  • A typical architecture is composed of a series of stages: convolutional layers and pooling layers
  • Each unit is connected to local patches in the feature maps of the previous layer

Figure: example CNN with the stages conv1, pool1, conv, pool2 (feature-map sizes 15 x 54, 8 x 27 and 4 x 14 appear in the diagram)
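One convolution stage followed by one pooling stage can be sketched in plain Python as below. The sketch assumes a single-channel image, a "valid" convolution without padding (computed as cross-correlation, as is common in CNN libraries) and non-overlapping max pooling. The image and the kernel are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; each output unit sees only a local patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


# A 4 x 5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
kernel = [[-1, 1]]  # responds where brightness increases from left to right

edges = conv2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool(edges)       # 2 x 2 map after pooling
print(edges)   # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(pooled)  # [[1, 0], [1, 0]]
```

The same small kernel is applied at every position, and pooling halves the resolution while keeping the edge response.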

slide-17
SLIDE 17

Key Idea behind Convolutional Networks

Convolutional networks take advantage of the properties of natural signals:

  • local connections
  • shared weights
  • pooling
  • the use of many layers

Figure: network output “Person”
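A quick parameter count shows how much local connections and shared weights save. The layer sizes used here (a 32 x 32 single-channel input, 20 feature maps, 5 x 5 kernels) are hypothetical and are not taken from the slides.

```python
def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: one small kernel per output map, shared across all positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# 32 x 32 input; a 5 x 5 "valid" convolution yields 28 x 28 outputs per feature map.
n_in = 32 * 32
n_out = 28 * 28 * 20

print(dense_params(n_in, n_out))  # 16072000
print(conv_params(5, 5, 1, 20))   # 520
```

Both layers produce the same number of output units, but the convolutional one needs several orders of magnitude fewer parameters.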

slide-18
SLIDE 18

Pros & Cons

Pros:

  • Best-performing method in many computer vision tasks
  • No need for hand-crafted features
  • Most applicable method for large-scale problems, e.g. classification of 1000 classes
  • Easy parallelization on GPUs

Cons:

  • Needs a huge amount of training data
  • Hard to train (local minima problem, tuning hyper-parameters)
  • Difficult to analyse (to be solved)

slide-19
SLIDE 19

Deep Learning Applications in Computer Vision

slide-20
SLIDE 20


Handwritten Digit Recognition


slide-21
SLIDE 21


ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)


slide-22
SLIDE 22

FlowNet: Learning Optical Flow with Convolutional Networks

In collaboration with the University of Freiburg (lmb.informatik.uni-freiburg.de)

slide-23
SLIDE 23


FlowNet: Learning Optical Flow with Convolutional Networks


slide-24
SLIDE 24

FlowNet: Learning Optical Flow with Convolutional Networks

Figure: network architectures of FlowNetSimple and FlowNetCorr. Both stack convolutional layers (conv1 to conv6) on a 384 x 512 input, followed by refinement and prediction. FlowNetSimple takes a 6-channel input; FlowNetCorr takes 3-channel inputs and additionally contains a correlation layer (corr, 441 channels) and a conv_redir layer (32 channels).

slide-25
SLIDE 25

FlowNet: Learning Optical Flow with Convolutional Networks

Figure: detail of the FlowNetCorr correlation stage (corr with 441 output channels, conv_redir; kernel sizes 1 x 1, 1 x 1, 3 x 3)
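The idea of a correlation layer can be sketched as follows: every feature vector of the first image is compared, by dot product, with the feature vectors of the second image at a grid of displacements. This is a simplified illustration and not the FlowNet implementation. The function names, the zero handling at the image border and the tiny feature maps are assumptions. The slide does not state the displacement range or stride; a 21 x 21 grid of displacements would be consistent with the 441 channels in the figure.

```python
def displacements(max_disp):
    """All (dy, dx) offsets in a (2 * max_disp + 1) x (2 * max_disp + 1) grid."""
    return [(dy, dx)
            for dy in range(-max_disp, max_disp + 1)
            for dx in range(-max_disp, max_disp + 1)]


def correlation(f1, f2, max_disp=1):
    """Correlate each feature vector in f1 with its neighbours in f2.

    f1, f2: feature maps as nested lists indexed [row][col][channel].
    Returns a map indexed [row][col][displacement].
    """
    height, width = len(f1), len(f1[0])
    offsets = displacements(max_disp)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = []
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    scores.append(sum(a * b for a, b in zip(f1[y][x], f2[yy][xx])))
                else:
                    scores.append(0.0)  # assumed: features outside the image count as zero
            row.append(scores)
        out.append(row)
    return out


def one_hot_map(height, width, pos):
    """Toy feature map with a single active feature vector at position pos."""
    return [[[1.0, 0.0] if (y, x) == pos else [0.0, 0.0]
             for x in range(width)] for y in range(height)]


f1 = one_hot_map(2, 3, (0, 0))
f2 = one_hot_map(2, 3, (0, 1))  # same pattern, moved one pixel to the right
corr = correlation(f1, f2)
best = displacements(1)[corr[0][0].index(max(corr[0][0]))]
print(best)  # (0, 1)
```

The strongest response at position (0, 0) is for the offset (0, 1), which is exactly the shift between the two toy feature maps.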

slide-26
SLIDE 26


FlowNet: Learning Optical Flow with Convolutional Networks


slide-27
SLIDE 27

From Image to Caption

Figure: a vision deep CNN feeds a language-generating RNN, which produces the caption “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”

Example generated captions:

  • A woman is throwing a frisbee in a park.
  • A little girl sitting on a bed with a teddy bear.
  • A group of people sitting on a boat in the water.
  • A giraffe standing in a forest with trees in the background.
  • A dog is standing on a hardwood floor.
  • A stop sign is on a road with a mountain in the background.

slide-28
SLIDE 28

Deep Learning in Computer Vision

Questions?

End of Presentation

Caner Hazırbaş | hazirbas@cs.tum.edu

slide-29
SLIDE 29

References

  • Building High-level Features Using Large Scale Unsupervised Learning
    Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12
  • Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
    Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09
  • ImageNet Classification with Deep Convolutional Neural Networks
    Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12
  • Gradient-based learning applied to document recognition
    Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Proceedings of the IEEE’98
  • FlowNet: Learning Optical Flow with Convolutional Networks
    Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

slide-30
SLIDE 30

References

  • Google’s cat detection neural network: http://www.resnap.com/image-selection-technology/deep-learning-image-classification/
  • Example auto-encoder: http://nghiaho.com/?p=1765
  • SGD: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/