
Region Merging Driven by Deep Learning for RGB-D Segmentation and Labeling



  1. ICDSC 2019: Region Merging Driven by Deep Learning for RGB-D Segmentation and Labeling
     U. Michieli, M. Camporese, A. Agiollo, G. Pagnutti, P. Zanuttigh
     September 9th, 2019

  2. Outline
     • Semantic Segmentation
     • Proposed Framework
     • Pre-processing
     • Over-segmentation and Classification
     • Merging Phase
     • Results
     • Conclusions and Future Work

  3. Semantic Segmentation
     [Figure: indoor scene labeled with the classes wall, objects, furniture, floor]
     • Segmentation + labeling (pixel-wise classification)
     • Deep learning and consumer depth sensors
     • Very useful for free-navigation systems that explore the surroundings

  4. Semantic Segmentation

  5. Semantic Segmentation

  6. Proposed Framework

  7. Proposed Framework
     Aim: propose a CNN for region merging and refine the boundaries of shapes.
     • Use normalized cuts spectral clustering extended to RGB-D data → but it is biased toward regions of similar size
     • Hence a 2-step procedure:
       • Initial over-segmentation to properly separate objects
       • Region merging procedure to remove the over-segmentation
     • Framework derived from [1], but much faster and simpler
     [1] G. Pagnutti, L. Minto, P. Zanuttigh, "Segmentation and Semantic Labeling of RGBD Data with Convolutional Neural Networks and Surface Fitting", IET Computer Vision, 2017.

  8. Framework of [1]
     [Figure: pipeline of [1].
      Pre-processing: depth data → (x, y, z) point set vectors weighted by 1/σg (geometry); normals computation → orientation vectors weighted by 1/σn; color data, RGB to CIELab conversion → color vectors weighted by 1/σc.
      Over-segmentation and classification: normalized cuts spectral clustering → segments; convolutional neural network (CNN; inputs 320x240x6 and 160x120x6) → segment descriptors.
      Merge phase: compute the similarity of adjacent segments → sort, and discard pairs below the similarity thresholds (on depth, color, normals, NURBS fitting) → select two segments to be joined → NURBS fitting on segment 1, segment 2 and their union → if the surface fitting accuracy improves, keep the union; otherwise discard it.]
     Cons:
     • NURBS fitting is very slow
     • Many hand-tuned thresholds

  9. Proposed Framework
     Pros:
     • Much faster
     • Fewer thresholds
     • Same accuracy

  10. Proposed Framework - Pre-processing
     • 3 channels for the 3D location
     • 3 channels for the surface normals
     • 3 channels for the color representation → CIELab for perceptual uniformity
     • Normalization to achieve a consistent representation across the 3 domains
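
The 9-channel pre-processing can be sketched as below. This is a minimal sketch, not the authors' code: it assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, normals from the cross product of image-space gradients, the standard sRGB-to-CIELab conversion (D65 white), and a normalization that divides each domain by its own standard deviation, mirroring the 1/σ factors in the framework of [1]. The slides do not specify any of these details, and all function names are hypothetical.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) to 3D points (H, W, 3), pinhole model (assumed)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def estimate_normals(points):
    """Unit normals from the cross product of the image-space tangent vectors."""
    du = np.gradient(points, axis=1)  # tangent along image columns
    dv = np.gradient(points, axis=0)  # tangent along image rows
    n = np.cross(du, dv)
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)

def srgb_to_lab(rgb):
    """sRGB values in [0, 1], shape (..., 3), to CIELab with a D65 white point."""
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    t = (lin @ m.T) / np.array([0.95047, 1.0, 1.08883])  # XYZ relative to white
    d = 6.0 / 29.0
    f = np.where(t > d ** 3, np.cbrt(t), t / (3 * d * d) + 4.0 / 29.0)
    return np.stack([116 * f[..., 1] - 16,
                     500 * (f[..., 0] - f[..., 1]),
                     200 * (f[..., 1] - f[..., 2])], axis=-1)

def scale_by_spread(x):
    """Divide one domain by its standard deviation (the assumed 1/sigma factor)."""
    s = float(np.std(x))
    return x / s if s > 0 else x

def build_9d_features(depth, rgb, fx, fy, cx, cy):
    """Per-pixel 9D vector: scaled 3D location, normals and CIELab color."""
    pts = depth_to_points(depth, fx, fy, cx, cy)
    nrm = estimate_normals(pts)
    lab = srgb_to_lab(rgb)
    return np.concatenate([scale_by_spread(pts), scale_by_spread(nrm),
                           scale_by_spread(lab)], axis=-1)
```

For a fronto-parallel plane the normals come out as (0, 0, 1); the sign convention (pointing away from the camera) is a choice of this sketch.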

  11. Proposed Framework – Over-segmentation
     • Over-segmentation with normalized cuts spectral clustering with Nyström acceleration: 9D input
     • CNN for the semantic labeling of each segment and for guiding the region merging process
       • 9 conv layers
       • 15 classes
       • very simple
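
A minimal sketch of normalized cuts with Nyström acceleration on the 9D vectors. It assumes a Gaussian affinity with a hypothetical bandwidth sigma, randomly sampled landmark pixels, the one-shot orthogonalized Nyström extension described by Fowlkes et al., and plain k-means on the leading generalized eigenvectors. It is an illustration of the technique, not the authors' implementation; all function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, Y, sigma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def psd_power(M, p, eps=1e-10):
    """M**p for a symmetric PSD matrix, clipping tiny eigenvalues for stability."""
    w, U = np.linalg.eigh((M + M.T) / 2.0)
    return (U * np.maximum(w, eps) ** p) @ U.T

def nystrom_ncut_embedding(X, n_landmarks, n_vectors, sigma, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    a_idx, b_idx = perm[:n_landmarks], perm[n_landmarks:]
    A = gaussian_affinity(X[a_idx], X[a_idx], sigma)  # landmark-landmark block
    B = gaussian_affinity(X[a_idx], X[b_idx], sigma)  # landmark-rest block
    # Approximate degrees, using C ~= B^T A^-1 B for the never-computed block
    d1 = A.sum(1) + B.sum(1)
    d2 = B.sum(0) + B.T @ (psd_power(A, -1.0) @ B.sum(1))
    s = 1.0 / np.sqrt(np.maximum(np.concatenate([d1, d2]), 1e-12))
    s1, s2 = s[:n_landmarks], s[n_landmarks:]
    A = A * np.outer(s1, s1)  # D^-1/2 W D^-1/2, restricted to the sampled blocks
    B = B * np.outer(s1, s2)
    # One-shot orthogonalization: eigenvectors of the approximated normalized W
    A_isqrt = psd_power(A, -0.5)
    Q = A + A_isqrt @ B @ B.T @ A_isqrt
    lam, U = np.linalg.eigh((Q + Q.T) / 2.0)
    top = np.argsort(lam)[::-1][:n_vectors]
    V = np.vstack([A, B.T]) @ A_isqrt @ U[:, top] / np.sqrt(np.maximum(lam[top], 1e-12))
    emb = np.empty_like(V)
    emb[perm] = V * s[:, None]  # generalized eigenvectors D^-1/2 v, original order
    return emb

def kmeans(Z, k, iters=100):
    """Lloyd iterations with deterministic farthest-point seeding."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([Z[labels == c].mean(0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def oversegment(features, n_segments, n_landmarks=100, sigma=1.0, seed=0):
    """Label map (H, W) from a 9D feature image (H, W, 9)."""
    X = features.reshape(-1, features.shape[-1])
    emb = nystrom_ncut_embedding(X, n_landmarks, n_segments, sigma, seed)
    return kmeans(emb, n_segments).reshape(features.shape[:-1])
```

Only the landmark rows of the affinity matrix are ever computed, which is where the speed-up comes from; on real images the number of segments is set high on purpose, since the merging phase removes the excess.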

  12. Proposed Framework – Region Merging
     • Compute the adjacency map of the segments
     • Compute the similarity between the descriptors of adjacent segments with the Bhattacharyya coefficient:
       $b_{i,j} = \sum_{t} \sqrt{s_i(t)\, s_j(t)}$
       where t ranges over the class scores and $s_i$ are the segment descriptors (≈ PDFs)
     • Sort the list on the basis of $b_{i,j}$
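
The three steps on this slide (adjacency map, Bhattacharyya coefficient, sorted list) can be sketched as follows. The sketch assumes 4-connected adjacency and takes each segment descriptor to be the mean of the semantic CNN's per-pixel softmax over the segment; the slides only say the descriptors are approximately PDFs. Function names are hypothetical.

```python
import numpy as np

def segment_descriptors(labels, softmax):
    """Average the per-pixel class PDFs (H, W, C) over each segment (assumed descriptor)."""
    return {int(s): softmax[labels == s].mean(axis=0) for s in np.unique(labels)}

def adjacency_map(labels):
    """Pairs (i, j), i < j, of segments sharing a 4-connected border."""
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        border = a != b
        for i, j in zip(a[border].tolist(), b[border].tolist()):
            pairs.add((min(i, j), max(i, j)))
    return pairs

def bhattacharyya(p, q):
    """Sum over classes of sqrt(p * q); equals 1 for identical PDFs."""
    return float(np.sum(np.sqrt(p * q)))

def sorted_candidates(labels, softmax):
    """(similarity, i, j) for every adjacent pair, most similar first."""
    desc = segment_descriptors(labels, softmax)
    scored = [(bhattacharyya(desc[i], desc[j]), i, j) for i, j in adjacency_map(labels)]
    return sorted(scored, reverse=True)
```

For a segment map [[0, 0, 1], [2, 2, 1]] with descriptors (0.9, 0.1), (0.8, 0.2) and (0.1, 0.9), the ranking is (0, 1), then (1, 2), then (0, 2).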

  13. Proposed Framework
     Iterative merging procedure:
     • Select the segment pairs with $b_{i,j} > T_{sim}$
     • A CNN classifier decides whether the two segments are joined or not
       • If merged: a new segment for the union is created and the list is updated
       • If not merged: the pair is removed from the list
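
A sketch of the iterative loop in plain Python, under stated assumptions: candidate pairs live in a max-priority queue keyed by the Bhattacharyya coefficient, only pairs above T_sim are queued, a hypothetical callable should_merge(i, j) stands in for the merging CNN, and the descriptor of a union is the size-weighted average of the two PDFs (the slides only say that a new segment is created and the list updated).

```python
import heapq
import math

def bhattacharyya(p, q):
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def merge_regions(desc, size, adjacency, should_merge, t_sim=0.8):
    """Greedy region merging; returns {final segment id: set of original ids}."""
    desc, size = dict(desc), dict(size)
    adj = {k: set(v) for k, v in adjacency.items()}
    members = {k: {k} for k in desc}
    next_id = max(desc) + 1
    heap = []

    def push(i, j):
        c = bhattacharyya(desc[i], desc[j])
        if c > t_sim:  # only candidates above T_sim enter the list
            heapq.heappush(heap, (-c, min(i, j), max(i, j)))

    for i in adj:
        for j in adj[i]:
            if i < j:
                push(i, j)
    while heap:
        _, i, j = heapq.heappop(heap)  # most similar remaining pair
        if i not in members or j not in members:
            continue  # stale entry: one segment was already merged away
        if not should_merge(i, j):
            continue  # rejected pair is simply dropped from the list
        k, next_id = next_id, next_id + 1
        n = size[i] + size[j]
        desc[k] = [(size[i] * a + size[j] * b) / n for a, b in zip(desc[i], desc[j])]
        size[k] = n
        members[k] = members.pop(i) | members.pop(j)
        adj[k] = (adj.pop(i) | adj.pop(j)) - {i, j}
        for m in adj[k]:  # rewire neighbors to the new segment and re-score them
            adj[m] -= {i, j}
            adj[m].add(k)
            push(m, k)
    return members
```

Usage: with segments 0-1-2 in a row, descriptors (0.9, 0.1), (0.85, 0.15), (0.1, 0.9) and an always-accepting classifier, segments 0 and 1 are merged into a new segment 3, while segment 2 stays separate because its similarity to 3 is below 0.8.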

  14. CNN for Region Merging – PDFs
     • Input: the 2 outputs of the softmax layer of the semantic CNN (15 channels for each candidate segment)
     • CNN for classification: 6 conv. layers, symmetric padding, 2x2 max pooling, ReLU
     • Training: 50 epochs, batch size of 32 samples, cross-entropy (CE) and L2 regularization losses, Adam optimizer
     • Hyperparameters: lr = $10^{-4}$, regularization constant = $10^{-5}$, $T_{sim} = 0.8$
     • Training time: about 11 hours on an NVIDIA Titan X GPU
     Architecture (input 560x425x30; 560x425x6 for the normals variant):
       CONV 4@9x9, MAXP 2x2, ReLU → 280x212x4
       CONV 4@7x7, MAXP 2x2, ReLU → 140x106x4
       CONV 4@5x5, MAXP 2x2, ReLU → 70x53x4
       CONV 4@3x3, MAXP 2x2, ReLU → 35x26x4
       CONV 4@3x3, MAXP 2x2, ReLU → 17x13x4
       CONV 2@17x13 → 1x2 → ARGMAX → Not merged / Merged
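
The feature-map sizes in the architecture can be checked with a short trace. This sketch assumes 'same' padding for the first five convolutions (the slides mention symmetric padding), floor division in each 2x2 max pooling, an unpadded final 17x13 convolution, and one bias per filter; the parameter count is therefore an estimate under those assumptions, not a figure from the slides.

```python
def trace_merging_cnn(size=(560, 425), in_ch=30):
    """Feature-map sizes and parameter count of the 6-layer merging CNN."""
    # (filters, kernel_rows, kernel_cols, followed_by_2x2_maxpool)
    layers = [(4, 9, 9, True), (4, 7, 7, True), (4, 5, 5, True),
              (4, 3, 3, True), (4, 3, 3, True), (2, 17, 13, False)]
    (r, c), ch = size, in_ch
    shapes, params = [], 0
    for filters, kr, kc, pool in layers:
        params += ch * kr * kc * filters + filters  # weights + biases (assumed)
        if pool:
            r, c = r // 2, c // 2  # 'same' conv keeps size, 2x2 pooling floors it
        else:
            r, c = r - kr + 1, c - kc + 1  # unpadded last conv collapses to 1x1
        ch = filters
        shapes.append((r, c, ch))
    return shapes, params
```

Under these assumptions the PDF variant (30 input channels) has about 13k parameters and the normals variant (6 input channels) about 5k, which fits the slides' description of a very small classifier.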

  15. CNN for Region Merging – Normals
     • Input: the surface normals of the 2 candidate segments (3 channels each)
     • CNN for classification: same architecture as the PDF variant (6 conv. layers, symmetric padding, 2x2 max pooling, ReLU), with a 560x425x6 input
     • Training: 50 epochs, batch size of 32 samples, CE and L2 regularization losses, Adam optimizer
     • Hyperparameters: lr = $10^{-5}$, regularization constant = 5 · 10^[illegible exponent], $T_{sim} = 0.75$
     • Training time: about 3 hours on an NVIDIA Titan X GPU
     → PDFs give richer descriptions, while normals are faster with limited impact on the final accuracy
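
Both variants feed the same network and differ only in the channel count (2 × 15 for PDFs, 2 × 3 for normals). A sketch of how such an input could be assembled, assuming each candidate's map is kept inside its own segment and zeroed elsewhere; merging_cnn_input is a hypothetical helper, and cropping or resizing to 560x425 is omitted.

```python
import numpy as np

def merging_cnn_input(per_pixel, seg, i, j):
    """Stack the maps of candidate segments i and j into one (H, W, 2C) array.

    per_pixel: (H, W, C) map, e.g. softmax PDFs (C = 15) or normals (C = 3).
    Pixels outside each candidate are zeroed (an assumption of this sketch).
    """
    mask_i = (seg == i)[..., None]
    mask_j = (seg == j)[..., None]
    return np.concatenate([per_pixel * mask_i, per_pixel * mask_j], axis=-1)
```

Keeping the two candidates in separate channel groups, rather than summing them, lets the network see which pixels belong to which segment.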

  16. Experimental Results

  17. NYUDv2 Dataset [2]
     • 1449 depth maps + color images of indoor scenes acquired with a Kinect sensor
     [Figure: RGB, raw depth, GT]
     • Training set: 795 scenes; test set: 654 scenes
     • 894 classes clustered into 15 classes as in [3]; unknown & unlabeled classes excluded
     [2] N. Silberman, D. Hoiem, P. Kohli, R. Fergus, "Indoor Segmentation and Support Inference from RGBD Images", ECCV, Springer, 2012.
     [3] C. Couprie, C. Farabet, L. Najman, Y. LeCun, "Indoor Semantic Segmentation Using Depth Information", ICLR, 2013.

  18. Merging CNN – Ground Truth Generation
     A dataset is needed to train the merging CNN:
     • Randomly select 10 pairs of adjacent segments in each image
     • Assign label 1 if more than 85% of the union of the two segments belongs to the same object in the semantic segmentation ground truth
     • Assign label 0 otherwise
     [Figure: selection of a segment → selection of an adjacent segment → ground truth examination (e.g. "Region appears to be uniform" → label 1) → merging CNN (input 560x425x30, or 560x425x6 for normals) → Merged / Not merged]
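
The labeling rule can be sketched as follows, assuming 4-connected adjacency and using the ground-truth label map to decide whether the union belongs to "the same object"; the function names and the seed parameter are hypothetical.

```python
import random
import numpy as np

def adjacent_pairs(seg):
    """Sorted pairs (i, j), i < j, of segments sharing a 4-connected border."""
    pairs = set()
    for a, b in ((seg[:, :-1], seg[:, 1:]), (seg[:-1, :], seg[1:, :])):
        border = a != b
        pairs.update((min(p, q), max(p, q))
                     for p, q in zip(a[border].tolist(), b[border].tolist()))
    return sorted(pairs)

def merge_label(seg, gt, i, j, thr=0.85):
    """1 if more than `thr` of the union of segments i and j has a single GT label."""
    union = gt[(seg == i) | (seg == j)]
    _, counts = np.unique(union, return_counts=True)
    return int(counts.max() / union.size > thr)

def sample_training_pairs(seg, gt, n_pairs=10, seed=0):
    """Randomly pick up to n_pairs adjacent pairs and label them 0/1."""
    pairs = adjacent_pairs(seg)
    chosen = random.Random(seed).sample(pairs, min(n_pairs, len(pairs)))
    return [(i, j, merge_label(seg, gt, i, j)) for i, j in chosen]
```

For two segments that both lie on one ground-truth object, the whole union shares a label and the pair gets label 1; a pair split half-and-half between two objects reaches only 50% and gets label 0.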

  19. Merging CNN – GT Ambiguities
     Examples of ambiguities in the ground truth:
     • Inconsistent labeling
     • Objects not labeled
     [Figure: example ground-truth maps, one marking a missing label; legend: Bed, Objects, Chair, Furniture, Ceiling, Floor, Picture/Deco, Sofa, Table, Wall, Windows, Books, Monitor/TV, Unknown]

  20. Merging CNN – Results
     [Figure: example segment pairs grouped by prediction (Merge / Not Merged) and ground truth (Merge / Not Merged)]
     • Good over-segmentation (inter-uniformity)

  21. Merging CNN – Results
     [Figure: example segment pairs grouped by prediction (Merge / Not Merged) and ground truth (Merge / Not Merged)]
     • Bad over-segmentation

  22. Qualitative Results
     [Figure: columns Color view, Semantic CNN, Pagnutti et al. [1], Our Approach, Ground Truth; legend: Bed, Objects, Chair, Furniture, Ceiling, Floor, Picture/Deco, Sofa, Table, Wall, Windows, Books, Monitor/TV, Unknown]
