
Depth Sensing and Deep Learning: Grasping and Segmenting 3D Objects

Depth Sensing and Deep Learning: Grasping and Segmenting 3D Objects from Real Depth Images using Synthetic Data. Mike Danielczuk, Jeffrey Mahler, Matthew Matl, Saurabh Gupta, Andrew Lee, Andrew Li, Vishal Satish, Bill DeRose,


  1. Depth Sensing and Deep Learning: Grasping and Segmenting 3D Objects from Real Depth Images using Synthetic Data. Mike Danielczuk, Jeffrey Mahler, Matthew Matl, Saurabh Gupta, Andrew Lee, Andrew Li, Vishal Satish, Bill DeRose, Stephen McKinley, Ken Goldberg

  2. Black Grouse

  3. Imagenet: 14M labeled images, 20K categories

  4. Classical Computer Vision Pipeline. CV experts: 1. Select / develop features: SURF, HoG, SIFT, RIFT, … 2. Add machine learning for multi-class recognition and train classifier. Pipeline: Feature Detection, Extraction (SIFT, HoG, ...) → Classification → Recognition. Classical CV feature definition is domain-specific and time-consuming. Slide from Boris Ginzburg

  5. Imagenet Classification 2012 Slide from Boris Ginzburg

  6. Imagenet 2012 Leaderboard http://www.image-net.org/challenges/LSVRC/2012/results.html

     | N | Error-5 | Algorithm | Team | Authors |
     |---|---------|-----------|------|---------|
     | 1 | 0.153 | Deep Conv. Neural Network | Univ. of Toronto | Krizhevsky et al |
     | 2 | 0.262 | Features + Fisher Vectors + Linear classifier | ISI | Gunji et al |
     | 3 | 0.270 | Features + FV + SVM | OXFORD_VGG | Simonyan et al |
     | 4 | 0.271 | SIFT + FV + PQ + SVM | XRCE/INRIA | Perronnin et al |
     | 5 | 0.300 | Color desc. + SVM | Univ. of Amsterdam | van de Sande et al |

     Slide from Boris Ginzburg

  7. Imagenet 2013 Leaderboard http://www.image-net.org/challenges/LSVRC/2013/results.php

     | N | Error-5 | Algorithm | Team | Authors |
     |---|---------|-----------|------|---------|
     | 1 | 0.117 | Deep Convolutional Neural Network | Clarifai | Zeiler |
     | 2 | 0.129 | Deep Convolutional Neural Networks | Nat.Univ Singapore | Min LIN |
     | 3 | 0.135 | Deep Convolutional Neural Networks | NYU | Zeiler, Fergus |
     | 4 | 0.135 | Deep Convolutional Neural Networks | | Andrew Howard |
     | 5 | 0.137 | Deep Convolutional Neural Networks | Overfeat NYU | Pierre Sermanet et al |

     Slide from Boris Ginzburg

  8. Imagenet Classification 2013 Slide from Boris Ginzburg

  9. Today’s Lecture • (Brief) Intro to Convolutional Neural Networks (CNNs) • Learning Instance Specific Grasping • Learning Grasp Quality CNNs • Learning Instance Segmentation CNNs

  10. Today’s Lecture • (Brief) Intro to Convolutional Neural Networks (CNNs) • Learning Instance Specific Grasping • Learning Grasp Quality CNNs • Learning Instance Segmentation CNNs

  11. Why is Convolution good? • Shared Parameters • Only store filter parameters: 5*5*3 (for kernel) + 1 (for bias) for example • Local Connectivity • Each neuron only corresponds to a local patch of the image (not the whole thing)
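The parameter-sharing point can be checked with a little arithmetic. The sketch below counts parameters for one 5x5x3 filter (the slide's example) and, for contrast, for a single fully connected unit. The 32x32 RGB input size is an assumption chosen for illustration and does not come from the slides.

```python
def conv_params(kernel_h, kernel_w, in_channels, num_filters=1):
    """Weights plus one bias per filter; independent of the image size."""
    return num_filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(image_h, image_w, in_channels, num_units=1):
    """Each unit connects to every input value, plus one bias per unit."""
    return num_units * (image_h * image_w * in_channels + 1)


# One 5x5 filter over 3 input channels, as on the slide: 5*5*3 + 1.
print(conv_params(5, 5, 3))      # 76

# One fully connected unit on an assumed 32x32 RGB image.
print(dense_params(32, 32, 3))   # 3073
```

The filter's count stays at 76 no matter how large the image is, while the fully connected count grows with every added pixel.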

  12. Idea: Learn Features

  13. Today’s Lecture • (Brief) Intro to Convolutional Neural Networks (CNNs) • Learning Instance-Specific Grasping • Learning Grasp Quality CNNs • Learning Instance Segmentation CNNs

  14. Grand Challenge The ability to “grasp millions of different sized and shaped objects… would have significant impact on deployment of robots in factories, in warehouses, and in homes.” - Rod Brooks

  15. 33

  16. 34

  17. Universal Picking: diversely shaped and sized objects

  18. Motivation: Instance-Specific Grasping Desired Object: from Dasani Drops

  19. Grasping under Uncertainty

  20. Physics UNCERTAINTY Perception Control

  21. First Wave: Analytic Methods. R(x, u) ∈ {0, 1}; u* = π(x) = argmax_u R(x, u). Reuleaux, 1876; Hanafusa & Asada, 1977; Nguyen, 1988; Li & Sastry, 1988; Ferrari & Canny, 1992; Bicchi, 1994; Shimoga, 1996; Bicchi & Kumar, 2001; Haas-Heger et al., 2006; Roa & Suarez, 2006; Kruger et al., 2012; Pokorny et al., 2013
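As a rough illustration of a binary analytic reward R(x, u) and the argmax policy, the sketch below uses the classic planar antipodal test for two point contacts with friction: the grasp scores 1 when the line between the contacts lies inside both friction cones. The function names, the contact representation (position plus inward-pointing normal), and the friction coefficient are assumptions made for this example, and it is not the metric used in Dex-Net. Here the state x is implicit in the contact geometry and u is the contact pair.

```python
import math


def angle_between(a, b):
    """Unsigned angle in radians between two non-zero 2D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    return math.acos(max(-1.0, min(1.0, cos)))


def antipodal_reward(p1, n1, p2, n2, mu):
    """Return 1 if the segment p1-p2 lies in both friction cones, else 0.

    p1, p2: distinct contact points; n1, n2: inward-pointing normals;
    mu: Coulomb friction coefficient.
    """
    half_angle = math.atan(mu)
    d12 = (p2[0] - p1[0], p2[1] - p1[1])
    d21 = (-d12[0], -d12[1])
    inside = (angle_between(d12, n1) <= half_angle
              and angle_between(d21, n2) <= half_angle)
    return int(inside)


def best_grasp(candidates, mu):
    """u* = argmax over candidate grasps of the binary reward."""
    return max(candidates, key=lambda u: antipodal_reward(*u, mu))


# Two candidate grasps on a unit square (toy data).
opposite_faces = ((0.0, 0.5), (1.0, 0.0), (1.0, 0.5), (-1.0, 0.0))
adjacent_faces = ((0.0, 0.5), (1.0, 0.0), (0.5, 1.0), (0.0, -1.0))
chosen = best_grasp([adjacent_faces, opposite_faces], mu=0.5)
print(chosen == opposite_faces)
```

Because the reward is binary, ties are common; `max` simply returns the first candidate with the highest score.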

  22. Data-driven Grasping under Uncertainty. Uncertainty in object pose: green grasps are robust to pose errors, red ones are not. Uncertainty in object identity: aggregate pre-computed grasp information from multiple objects in database for robustness to recognition errors. Computations performed over database of ~200 household objects. Ciocarlie, Pantofaru, Hsiao, Bradski, Brook, and Dreyfuss. “A Side of Data with My Robot: Three Datasets for Mobile Manipulation in Human Environments.” IEEE Robotics & Automation Magazine, 2011

  23. Image-Based Grasp Planning Filter Bank Regressor Input Image Proposals Grasp Color Images Point Clouds [Fischinger & Vincze, 2012] [Saxena et al., 2008] [Saxena et al., 2008] [Herzog et al., 2014] [Stark et al., 2008] [Detry et al., 2009] [Oberlin & Tellex, 2015] [Bohg & Kragic, 2010] [Hubner & Kragic, 2010] [ten Pas et al., 2015] [Le et al., 2010] [Boularias et al., 2011] Image sources: Saxena et al., 2008, Pinto & Gupta, 2016

  24. Deep Grasp Planning Convolutional Neural Network Input Image Grasp Color Images Point Clouds [Lenz et al., 2015] [Kappler et al., 2015] [Redmon & Angelova, 2016] [Gualtieri et al., 2016] [Pinto et al., 2016] [Johns et al., 2016] [Levine et al., 2017] [Viereck et al., 2017] Image source: Pinto & Gupta, 2016

  25. Data Sources. Human labeling: 1035 images, 80% success. Lenz et al. “Deep Learning for Detecting Robotic Grasps.” IJRR 2015. Self-supervision: ~1 robot year, 80-90% success. S. Levine et al. “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection.” IJRR 2017.

  26. Dexterity Network (Dex-Net)

  27. Synthetic Point Clouds Synthetic LIDAR Images

  28. Dex-Net 6.7 million examples Positive Negative Negative

  29. Grasp Quality CNN

  30. GQ-CNN for Bins Iter 0 Iter 1 Plan

  31. Mahler, Jeffrey, Matthew Matl, Vishal Satish, Michael Danielczuk, Bill DeRose, Stephen McKinley, and Ken Goldberg. "Learning ambidextrous robot grasping policies." Science Robotics 4, no. 26 (2019): eaau4984.

  32. Motivation: Instance-Specific Grasping Desired Object: from Dasani Drops

  33. One Approach: Segment all objects in the scene, Classify all segments, Grasp segment that matches the target!
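The segment, classify, grasp idea can be sketched as a small selection loop. Everything below is hypothetical scaffolding: the `classify` callable, the string stand-ins for segment masks, and the toy labels and scores are assumptions for illustration and are not taken from the authors' code.

```python
def pick_target_segment(segments, classify, target_label):
    """Return the segment classified as the target with the highest score.

    segments: iterable of segment masks (any objects).
    classify: hypothetical callable mapping a segment to (label, score).
    Returns None when no segment matches the target label.
    """
    best_segment, best_score = None, float("-inf")
    for segment in segments:
        label, score = classify(segment)
        if label == target_label and score > best_score:
            best_segment, best_score = segment, score
    return best_segment


# Toy stand-ins: strings play the role of masks, a dict plays the classifier.
toy_predictions = {
    "seg_a": ("soda can", 0.91),
    "seg_b": ("dasani drops", 0.84),
    "seg_c": ("dasani drops", 0.40),
}
picked = pick_target_segment(toy_predictions, toy_predictions.get, "dasani drops")
print(picked)
```

A grasp planner would then be run on the returned segment only, which is what makes the grasp instance-specific.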

  34. Slide from: Kaiming He

  35. Mask R-CNN

  36. Mask R-CNN • State-of-the-art instance segmentation network • Requires massive hand-labeled datasets for training • Does not generalize to unseen classes. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969).

  37. Automated Dataset Generation • WISDOM-Sim: 50,000 images with 320,000 object labels in just 3.5 hours • 80/20 train/test split for both images and objects • 1600 unique objects, 10,000 unique heaps
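A minimal sketch of an 80/20 split applied separately to objects and images, using the counts from the slide. The seeded shuffle and the integer identifiers are assumptions; the slide does not say how the authors drew the split or how test images relate to test objects.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle deterministically and return (train, test) with an 80/20 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5  # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]


# Counts from the slide: 1600 unique objects, 50,000 images.
train_objects, test_objects = split_80_20(range(1600))
train_images, test_images = split_80_20(range(50_000), seed=1)

print(len(train_objects), len(test_objects))  # 1280 320
print(len(train_images), len(test_images))    # 40000 10000
```

Splitting objects as well as images is what lets the test set measure generalization to objects the network has never seen.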

  38. Sampling Distributions

  39. Data Augmentation

  40. Modal Segmentation Masks

  41. Amodal Segmentation Masks

  42. SD Mask R-CNN • State-of-the-art performance for object instance segmentation on real depth images • No hand-labeling required, trained solely on synthetic depth images • Generalizes to unseen objects • Outperforms point cloud clustering baselines by 15% in average precision and 20% in average recall

  43. COCO Metrics

  44. COCO Metrics. Precision: TP / (TP + FP). Recall: TP / (TP + FN)
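The two formulas translate directly into code. The sketch below assumes the true positive, false positive, and false negative counts have already been obtained by matching predicted masks to ground truth at some IoU threshold. The example counts are made up, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Returns 0.0 for a metric whose denominator is zero (a convention
    chosen for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts: 8 correct detections, 2 spurious, 4 missed objects.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

COCO-style average precision and average recall go one step further and average such values over a range of IoU thresholds rather than using a single one.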

  45. Application: Instance-Specific Grasping Target: Dasani Drops

  46. Thank You! Project Website, Supplementary Material, and Datasets: https://bit.ly/2letCuE Code: https://github.com/BerkeleyAutomation/sd-maskrcnn
