Deep learning for dense per-pixel prediction, Chunhua Shen (PowerPoint presentation transcript)



SLIDE 1

Deep learning for dense per-pixel prediction

Chunhua Shen The University of Adelaide, Australia

SLIDE 2

Image understanding

SLIDE 3

Convolutional Neural Networks

Image Classification

Top-5 classification error by ILSVRC year [plot omitted]:

  • [Krizhevsky et al., 2012] 16.4% error (AlexNet)
  • [Zeiler et al., 2013] 11.1% error
  • [Szegedy et al., 2014] 6.6% error (GoogLeNet)
  • [Simonyan et al., 2014] 7.3% error (VGGNet)
  • [He et al., 2015] 3.6% error (ResNet)

SLIDE 4

Google’s best reported results 2016 “Wider or Deeper: Revisiting the ResNet Model for Visual Recognition”, arXiv:1611.10080

SLIDE 5

Image understanding

SLIDE 6

Image understanding

SLIDE 7

SLIDE 8

Depth Estimation From Single Monocular Images

SLIDE 9

Depth Estimation From Single Monocular Images

  • Useful
    – Scene understanding
    – 3D modelling
    – Benefits other vision tasks (e.g., semantic labelling, pose estimation)
  • Challenging
    – No reliable depth cues (e.g., stereo correspondence, motion information)
SLIDE 10

SLIDE 11


Deep convolutional neural fields

SLIDE 12


Prediction examples: NYU v2

Deep convolutional neural fields

SLIDE 13

Prediction examples: Make3D

Deep convolutional neural fields

SLIDE 14

Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields. Fayao Liu, Chunhua Shen, Guosheng Lin. CVPR 2015. http://arxiv.org/abs/1502.07411

Conclusion

  • Deep convolutional neural fields for monocular image depth estimation
  • Combines a deep CNN and a continuous CRF
  • General learning framework
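The combination of a CNN with a continuous CRF admits closed-form inference, since the energy is quadratic in the continuous depth variables. A minimal sketch of that linear-solve inference follows; the chain neighbourhood and the pairwise similarity value are made up for illustration, not taken from the paper.

```python
import numpy as np

# Toy sketch of continuous-CRF inference in deep convolutional neural
# fields. Up to constant factors, the energy over depths y is
#   E(y) = sum_p (y_p - z_p)^2 + sum_{pq} r_pq (y_p - y_q)^2
# where z are the CNN unary predictions and r_pq >= 0 are pairwise
# similarities. The MAP solution is closed form: y* = A^{-1} z with
# A = I + D - R and D = diag(R @ 1).

def dcnf_map(z, R):
    """Closed-form MAP depths for unary predictions z and similarity matrix R."""
    D = np.diag(R.sum(axis=1))
    A = np.eye(len(z)) + D - R
    return np.linalg.solve(A, z)

# 4 superpixels: unary depths and a chain-shaped neighbourhood (toy values)
z = np.array([1.0, 2.0, 2.1, 5.0])
R = np.zeros((4, 4))
for p, q in [(0, 1), (1, 2), (2, 3)]:
    R[p, q] = R[q, p] = 0.5   # hypothetical pairwise similarity

y = dcnf_map(z, R)
print(y)  # depths smoothed towards their neighbours, relative to z
```

Because A is diagonally dominant and positive definite, the solve is always well posed; the pairwise terms pull neighbouring depths together, so outliers in z are moderated.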
SLIDE 15

Motivation:

  • Metric RGB-D data is limited in diversity and quantity.
  • Relative depth has been proven to be an informative cue.
  • Relative depth can be easily acquired from vast stereo videos.

Highlights:

  • A new Relative Depth in Stereo (RDIS) dataset is proposed.
  • Densely labelled relative depth using existing stereo matching methods.
  • State-of-the-art results on benchmark Depth Estimation datasets.

Monocular Depth Estimation with Augmented Ordinal Depth Relationships

SLIDE 16

Overview

  • 1. Acquire relative depth from stereo videos.
  • 2. Pretrain a deep ResNet with relative depths.
  • 3. Finetune the ResNet with metric depths.

[Pipeline figure: steps 1-3 feed a ResNet that outputs the predicted depth]
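Step 2 above (pretraining with relative depths) can be sketched with a standard pairwise ranking loss over sampled pixel pairs; the exact loss used in the paper may differ, so treat this as illustrative only.

```python
import numpy as np

# Hedged sketch of a pairwise loss for relative-depth supervision. For a
# sampled pixel pair (i, j) with ordinal label r = +1 (i farther), -1
# (i closer) or 0 (roughly equal depth), a common choice is:
#   r != 0: log(1 + exp(-r * (d_i - d_j)))   (ranking term)
#   r == 0: (d_i - d_j)^2                    (tie term)

def ordinal_loss(d_i, d_j, r):
    """Loss for predicted depths (d_i, d_j) under ordinal label r."""
    diff = d_i - d_j
    if r == 0:
        return diff ** 2
    return np.log1p(np.exp(-r * diff))

# Predictions that respect the labelled ordering incur a small loss,
# while violations are penalised heavily:
print(ordinal_loss(3.0, 1.0, +1))  # correct order: small loss
print(ordinal_loss(1.0, 3.0, +1))  # violated order: large loss
```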

SLIDE 17

Relative Depth Generation

  • 1. Use the absolute difference (AD) matching cost and the semi-global matching (SGM) method to generate the initial disparity maps.
  • 2. Post-process the disparity maps: correct vague or missing object boundaries, and smooth disparities within objects and the background. This is done by experienced workers from movie production companies.

[Figure: image, initial disparity, post-processed disparity]
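Once disparities are cleaned up, turning them into ordinal labels is straightforward, since disparity is inversely proportional to depth (larger disparity means a closer pixel). The ratio threshold `tau` below is a made-up parameter for illustration, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of deriving relative-depth labels from a
# post-processed disparity map, as in the pipeline above.

def ordinal_label(disp_i, disp_j, tau=1.02):
    """+1 if pixel i is closer than j, -1 if farther, 0 if roughly equal."""
    if disp_i > tau * disp_j:
        return +1
    if disp_j > tau * disp_i:
        return -1
    return 0

disp = np.array([[30.0, 29.8], [10.0, 60.0]])  # toy disparity map
print(ordinal_label(disp[0, 0], disp[1, 0]))   # 30 vs 10: i is closer
print(ordinal_label(disp[0, 0], disp[0, 1]))   # 30 vs 29.8: roughly equal
```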

SLIDE 18

Results

State-of-the-art results on NYUD2

State-of-the-art results on KITTI

SLIDE 19

SLIDE 20

SLIDE 21

Semantic pixel labelling using FCN

SLIDE 22

SLIDE 23

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

SLIDE 24
Existing approaches

  • 1. Standard multi-layer CNNs, such as ResNet (a): produce low-resolution (down-sampled) feature maps; fine structures/details are lost.
  • 2. Dilated convolutions (b): produce high-resolution, high-dimensional feature maps; computationally expensive, with huge memory consumption when generating large-resolution output.
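The trade-off in option 2 can be seen on a 1-D toy signal: a dilated convolution enlarges the receptive field without downsampling, but the feature maps stay full resolution and must be kept in memory. This sketch uses a hand-rolled loop rather than any framework's dilation API.

```python
import numpy as np

# Sketch of a 1-D dilated ("atrous") convolution: filter taps are spaced
# `dilation` samples apart, growing the receptive field per output.

def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D convolution with taps spaced `dilation` apart."""
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of one output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[t] * x[i + t * dilation] for t in range(k))
    return out

x = np.arange(16, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
y2 = dilated_conv1d(x, w, dilation=4)  # receptive field 9, same 3 taps
print(len(y1), len(y2))
```

Note that both outputs are at the input's resolution (no striding); stacking such layers is what makes dense dilated models memory-hungry at high resolution.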

SLIDE 25

Our approach

Exploits various levels of detail at different stages of convolution and fuses them to obtain a high-resolution prediction without the need to maintain large intermediate feature maps.
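The fusion idea can be sketched minimally: a coarse (low-resolution) semantic map is upsampled and summed with a finer map from an earlier stage. Nearest-neighbour upsampling stands in for RefineNet's learned refinement here; the shapes and values are toy assumptions.

```python
import numpy as np

# Minimal sketch of multi-resolution feature fusion: upsample the coarse
# semantics, then add the fine-grained features from an earlier stage.

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

coarse = np.ones((4, 4))           # low-resolution semantic features (toy)
fine = 0.1 * np.ones((8, 8))       # higher-resolution detail features (toy)
fused = upsample2x(coarse) + fine  # high-resolution fused features
print(fused.shape)
```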

SLIDE 26

SLIDE 27
Highlights

  • 1. Exploits features at multiple levels of abstraction for high-resolution output.

RefineNet refines low-resolution (coarse) semantic features with fine-grained low-level features in a recursive manner to generate high-resolution semantic feature maps. Our model is flexible in that it can be cascaded and modified in various ways.

  • 2. Effective gradient propagation with identity mappings through short- and long-range connections.

Our cascaded RefineNets can be effectively trained end-to-end, which is crucial for the best prediction performance. All components in RefineNet employ residual connections with identity mappings, so gradients can be propagated directly through short-range and long-range residual connections, allowing effective and efficient end-to-end training.

  • 3. Chained residual pooling.

We propose a new network component we call “chained residual pooling”, which is able to capture background context from a large image region. It does so by efficiently pooling features with multiple window sizes and fusing them together with residual connections and learnable weights.
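Chained residual pooling can be sketched as follows: pooling blocks run in sequence, so each later block sees already-pooled (hence larger-context) features, and every block's output is added back to the running sum through a residual connection. The window sizes are placeholders, and the learnable fusion weights are omitted (identity).

```python
import numpy as np

# Hedged sketch of chained residual pooling on a single-channel map.

def max_pool_same(f, k):
    """Stride-1 max pooling with window k and edge padding (same size)."""
    p = k // 2
    padded = np.pad(f, p, mode="edge")
    out = np.empty_like(f)
    h, w = f.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def chained_residual_pooling(f, windows=(3, 3)):
    total = f
    cur = f
    for k in windows:            # chained: each block pools the previous one
        cur = max_pool_same(cur, k)
        total = total + cur      # residual fusion (learnable weights omitted)
    return total

f = np.zeros((5, 5)); f[2, 2] = 1.0   # a single activated pixel
out = chained_residual_pooling(f)
print(out.shape)
```

Because the second pool operates on the first pool's output, context from an effectively larger window is gathered at the cost of only small pooling kernels.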

SLIDE 28

Flexible network architectures

SLIDE 29

Experiments

Our source code is available at: https://github.com/guosheng/refinenet

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

15 FPS with 720P input on a single GPU

SLIDE 34

SLIDE 35

SLIDE 36

SLIDE 37

Low-level image processing with very deep FCN

SLIDE 38

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

Denoise

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46

Super-resolution

SLIDE 47

SLIDE 48

SLIDE 49

SLIDE 50

Deblur

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

Enhancing JPEG images

SLIDE 55

SLIDE 56

SLIDE 57

SLIDE 58

Inpainting

SLIDE 59

Inpainting

SLIDE 60

  • Superior results on denoising & super-resolution
  • Many other low-level image processing tasks: deblur, dehaze

Image Restoration Using Very Deep Fully Convolutional Encoder-Decoder Networks with Symmetric Skip Connections, X. Mao, C. Shen, Y. Yang, NIPS 2016.
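The symmetric skip connections in this encoder-decoder can be sketched structurally: encoder feature maps are added to the outputs of their mirrored decoder layers, passing image detail directly to the reconstruction and easing gradient flow in very deep nets. The identity-style `conv`/`deconv` placeholders and the `skip_every` spacing below are illustrative assumptions, not the paper's actual layers.

```python
import numpy as np

# Structural sketch of an encoder-decoder with symmetric skip additions.

def conv(x):    # placeholder for a learned convolutional layer
    return 0.9 * x

def deconv(x):  # placeholder for a learned deconvolutional layer
    return 0.9 * x

def encoder_decoder(x, depth=4, skip_every=2):
    feats = []
    h = x
    for i in range(depth):               # encoder
        h = conv(h)
        if i % skip_every == 0:
            feats.append(h)              # remember for a skip connection
    for i in range(depth):               # decoder (mirrored)
        h = deconv(h)
        if (depth - 1 - i) % skip_every == 0 and feats:
            h = h + feats.pop()          # symmetric skip: add mirror feature
    return h

x = np.ones((8, 8))                      # toy "image"
y = encoder_decoder(x)
print(y.shape)                           # same spatial size as the input
```

The last-in-first-out `feats.pop()` pairs each decoder layer with its mirror-image encoder layer, which is what makes the skips "symmetric".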

SLIDE 61

Thanks. Questions?