

SLIDE 1


RGBD Occlusion Detection via Deep Convolutional Neural Networks

Soumik Sarkar (1,2), Vivek Venugopalan (1), Kishore Reddy (1), Michael Giering (1), Julian Ryde (3), Navdeep Jaitly (4,5)

(1) United Technologies Research Center, East Hartford, CT; (2) Currently with Iowa State University, Ames, IA; (3) United Technologies Research Center, Berkeley, CA; (4) University of Toronto, ON, Canada; (5) Currently with Google Inc., Mountain View, CA


SLIDE 2

United Technologies


Otis · Sikorsky · UTC Climate, Controls & Security · UTC Aerospace Systems · Pratt & Whitney · UTC Propulsion & Aerospace Systems


SLIDE 3

Occlusion detection

  • Occlusion edges help image feature selection: once occlusion boundaries are established, the depth of the region can be determined
  • This is very useful for simultaneous localization and mapping (SLAM) problems in indoor robotics, as well as for object recognition, grasping, and obstacle avoidance in UAV applications


[Figure: a voxel map and the corresponding geometric edges for a hallway]


SLIDE 4


Occlusion detection

  • Occlusion edges depend on the gradient of the depth image, which is very sensitive to noise in the depth map
  • A depth map derived from a single image is very noisy and has large errors
  • In our work, we estimate occlusion edges directly rather than estimating depth first and then calculating occlusion edges; our technique also takes advantage of cues beyond depth that contribute to establishing occlusion edges (a gradient-based labeling sketch follows the figure below)

[Figure: original image, image edge, and range edge]
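For illustration, a minimal sketch of gradient-based occlusion edge labeling from a depth map; the function name and the threshold value (in meters) are assumptions, not the authors' exact procedure:

```python
import numpy as np

def occlusion_edges_from_depth(depth, threshold=0.05):
    """Mark pixels whose local depth gradient exceeds a threshold.

    depth: 2-D array of depth values in meters (e.g., 480x640).
    threshold: assumed depth jump (meters) that counts as an occlusion edge.
    """
    # Gradients along image rows (axis 0) and columns (axis 1).
    gy, gx = np.gradient(depth.astype(np.float64))
    # An occlusion edge is a large discontinuity in depth.
    return np.hypot(gx, gy) > threshold
```

Thresholding the raw gradient like this is exactly what makes the labels noise-sensitive, which motivates learning the edges directly.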


SLIDE 5

Deep Neural Nets and Convolutional Neural Nets

  • Convolutional filters generate feature maps from the data
  • Subsampling or pooling provides dimension reduction and higher-order feature generation

[Figure: deep network schematic showing the input, hidden layers h^m and h^(m+1) with shared weights W, a target layer, and predicted output feature vectors; convolutions as sliding filters followed by max pooling and subsampling]
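A minimal sketch of the convolution + pooling stack described above, written in PyTorch; the layer sizes and filter counts are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    """Toy convolutional network for 32x32 patches with C input channels."""

    def __init__(self, in_channels=4, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # Convolutional filters generate feature maps from the input.
            nn.Conv2d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            # Max pooling subsamples for dimension reduction.
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example: a batch of 8 RGBD patches (4 channels, 32x32 pixels).
logits = PatchCNN(in_channels=4)(torch.randn(8, 4, 32, 32))
```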


SLIDE 6

Occlusion detection from Freiburg dataset

  • Use a readily available dataset from the Computer Vision Group at Technische Universität München (TUM) to demonstrate occlusion edge detection
  • Partition the trajectory into training and test datasets for the neural nets (a sketch of one such split follows below)


Reference: http://vision.in.tum.de/data/datasets/rgbd-dataset/download
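A minimal sketch of one way to partition a recorded trajectory; the contiguous split and the fraction are assumptions, since the paper does not state its exact split, but consecutive video frames are highly correlated, so a random frame-level split would leak near-duplicates between sets:

```python
def split_trajectory(frames, train_fraction=0.7):
    """Split an ordered list of frames into contiguous train/test segments.

    Keeping the split contiguous avoids near-duplicate frames appearing
    in both sets. train_fraction is an assumed value, not from the paper.
    """
    cut = int(len(frames) * train_fraction)
    return frames[:cut], frames[cut:]
```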


SLIDE 7

Problem setup


Input: RGB+D information from consecutive video frames (640x480) captured by a mobile sensor → Deep Convolutional Neural Network → Output: occlusion edges


SLIDE 8

Training and Testing processes

Training process:
  • Partition each frame into 32x32 patch examples
  • Assign center-pixel based labels: occlusion patch vs. no-occlusion patch
  • Train the network on the labeled patches

Testing process:
  • Generate 32x32 patches with a fixed stride
  • Predict the patch (center-pixel) label with the trained network (see the patch-extraction sketch below)
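A minimal sketch of the patch extraction with center-pixel labels; the function name and the exact labeling rule are assumptions, and stride=8 matches one of the settings in the results table:

```python
import numpy as np

def extract_patches(frame, label_map, patch=32, stride=8):
    """Cut a frame into patch examples with center-pixel labels.

    frame: HxWxC array (e.g., 480x640x4 for RGBD).
    label_map: HxW binary array marking occlusion-edge pixels.
    Returns (patches, labels).
    """
    patches, labels = [], []
    h, w = label_map.shape
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(frame[y:y + patch, x:x + patch])
            # Label each patch by the class of its center pixel.
            labels.append(int(label_map[y + patch // 2, x + patch // 2]))
    return np.stack(patches), np.array(labels)
```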


SLIDE 9


Post-processing for occlusion edge reconstruction

Testing and post-processing

  • Generate 32x32 patches with a fixed stride and predict each patch (center-pixel) label with the trained network
  • Take the prediction confidence from the softmax posterior
  • Convert each prediction confidence to a patch-wide label using a Gaussian kernel, parameterized by its Full Width at Half Maximum (FWHM)
  • Fuse the Gaussian labels in a mixture model to generate smooth occlusion edges (see the sketch below)
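A minimal sketch of one plausible Gaussian fusion step; the FWHM value and the normalized-mixture form are assumptions, since the slide only states that Gaussian labels are fused in a mixture model:

```python
import numpy as np

def fuse_patch_confidences(confidences, centers, shape, fwhm=16.0):
    """Spread per-patch confidences over the frame with Gaussian kernels.

    confidences: softmax posterior for the occlusion class, one per patch.
    centers: (row, col) of each patch center.
    shape: output frame shape, e.g. (480, 640).
    fwhm: assumed kernel width in pixels; sigma = FWHM / (2*sqrt(2*ln 2)).
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    acc = np.zeros(shape)
    norm = np.zeros(shape)
    for c, (r0, c0) in zip(confidences, centers):
        # Gaussian weight of every pixel w.r.t. this patch center.
        w = np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma ** 2))
        acc += c * w
        norm += w
    # Normalized mixture of Gaussian-weighted confidences.
    return acc / np.maximum(norm, 1e-12)
```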


SLIDE 10

Experimental setup

  • Nvidia Tesla K40 GPU with 2880 cores and 12 GB device RAM
  • Initial pre-processing divides the dataset into training and test sets and extracts small images (32x32) from the large frames (480x640)
  • Image size fixed at 32x32, with the number of channels depending on the experiment (see the stacking sketch after this list):
      • 4 channels for RGBD
      • 3 channels for RGB
      • 6 channels for RGBD + optical flow (UV)
  • Ground truth consists of edges labelled using only the depth sensor
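A minimal sketch of how the 3-, 4-, and 6-channel inputs could be assembled; the function and argument names are illustrative:

```python
import numpy as np

def stack_channels(rgb, depth=None, flow_uv=None):
    """Assemble the per-experiment input tensor for a 480x640 frame.

    rgb:     HxWx3 color image.
    depth:   HxW depth map, or None for the RGB-only experiment.
    flow_uv: HxWx2 optical flow (U, V), or None.
    Returns HxWxC with C = 3 (RGB), 4 (RGBD), or 6 (RGBDUV).
    """
    channels = [rgb.astype(np.float32)]
    if depth is not None:
        channels.append(depth.astype(np.float32)[..., None])
    if flow_uv is not None:
        channels.append(flow_uv.astype(np.float32))
    return np.concatenate(channels, axis=-1)
```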



SLIDE 11

Optical flow pre-processing
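A minimal sketch of computing the U and V flow channels from consecutive frames with OpenCV's Farneback method; the parameter values are common defaults, not taken from the paper:

```python
import cv2

def flow_uv(prev_bgr, next_bgr):
    """Dense optical flow (U, V) between consecutive BGR frames."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # Returns an HxWx2 array: horizontal (U) and vertical (V) flow.
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```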



SLIDE 12

Results



Data           | Channels | Patch stride | Training dataset | Testing dataset | Test error (%, averaged over epochs 80-100) | Computation time/epoch
RGBD (1 frame) | 4        | 4            | 56354            | 500000          | 15.35                                       | 1m 21s
RGBD (1 frame) | 4        | 8            | 14278            | 316167          | 18.76                                       | 2m 17s
RGB (1 frame)  | 3        | 4            | 56354            | 500000          | 16.43                                       | 1m 2s
RGB (1 frame)  | 3        | 8            | 14278            | 316167          | 18.72                                       | 1m 42s
RGBDUV         | 6        | 4            | 56354            | 500000          | 15.18                                       | 1m 22s

SLIDE 13

Post-processing Results

  • Input: RGBD image (32x32x4), stride 8
  • Input: RGBD image (32x32x4), stride 4

Performance improves with higher granularity of fusion (i.e., smaller stride).


SLIDE 14

Post-processing Results

  • Input: RGB image (32x32x3), stride 8
  • Input: RGB image (32x32x3), stride 4

Performance again improves with higher granularity of fusion, but overall detection confidence deteriorates without the D channel.


SLIDE 15

RGBD and optical flow (RGBDUV) Results



SLIDE 16

Conclusion

  • A deep CNN can extract significant occlusion edge features from the RGB channels alone (i.e., without the depth sensor information); occlusion detection accuracy increases when we introduce optical flow
  • Deep Convolutional Neural Nets (Deep CNNs) were applied for multi-modal fusion in occlusion detection
  • The trade-off between high-resolution patch analysis and frame-level computation time is critical for real-time robotics applications
  • We are currently investigating multiple time-frames of RGB input in order to extract structure from motion



SLIDE 17

Questions
