Depth Sensing and Deep Learning: Grasping and Segmenting 3D Objects from Real Depth Images using Synthetic Data


SLIDE 1

Depth Sensing and Deep Learning: Grasping and Segmenting 3D Objects from Real Depth Images using Synthetic Data

Michael Danielczuk, Jeffrey Mahler, Matthew Matl, Saurabh Gupta, Andrew Lee, Andrew Li, Vishal Satish, Bill DeRose, Stephen McKinley, Ken Goldberg

SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6

Black Grouse

SLIDE 7

Imagenet: 14M labeled images, 20K categories

SLIDE 8

Classical Computer Vision Pipeline.

CV experts:
1. Select / develop features: SURF, HoG, SIFT, RIFT, …
2. Add machine learning for multi-class recognition and train a classifier

Pipeline: Feature Extraction (SIFT, HoG, …) → Detection → Classification → Recognition

Classical CV feature definition is domain-specific and time-consuming
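The two steps above can be sketched end-to-end in a toy form. Below is a minimal, self-contained stand-in in plain NumPy: `hog_like_features` is a hypothetical helper (not a real library call) that computes a crude HoG-style orientation histogram, and classification is done by nearest class centroid rather than a trained SVM.

```python
import numpy as np

def hog_like_features(img, n_bins=9):
    """Toy HoG-style descriptor (hypothetical helper): a global histogram
    of gradient orientations, weighted by gradient magnitude. Real
    pipelines use cell/block-normalized HoG or SIFT descriptors."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                   # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-8)                  # L1-normalize

# Step 2: hand the fixed features to a classifier (here, nearest class
# centroid as a stand-in for a trained multi-class SVM).
def nearest_class(feat, centroids):
    return min(centroids, key=lambda c: np.linalg.norm(feat - centroids[c]))
```

The point of the sketch is the division of labor: the feature function is fixed by hand, and only the classifier on top of it is learned, which is exactly what deep networks later replaced.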

Slide from Boris Ginzburg

SLIDE 9

Imagenet Classification 2012

Slide from Boris Ginzburg

SLIDE 10

http://www.image-net.org/challenges/LSVRC/2012/results.html

Imagenet 2012 Leaderboard

N  Error-5  Algorithm                                      Team                Authors
1  0.153    Deep Conv. Neural Network                      Univ. of Toronto    Krizhevsky et al.
2  0.262    Features + Fisher Vectors + Linear classifier  ISI                 Gunji et al.
3  0.270    Features + FV + SVM                            OXFORD_VGG          Simonyan et al.
4  0.271    SIFT + FV + PQ + SVM                           XRCE/INRIA          Perronin et al.
5  0.300    Color desc. + SVM                              Univ. of Amsterdam  van de Sande et al.

Slide from Boris Ginzburg

SLIDE 11

Imagenet 2013 Leaderboard

http://www.image-net.org/challenges/LSVRC/2013/results.php

N  Error-5  Algorithm                                      Team                   Authors
1  0.117    Deep Convolutional Neural Network              Clarifai               Zeiler
2  0.129    Deep Convolutional Neural Networks             Nat. Univ. Singapore   Min Lin
3  0.135    Deep Convolutional Neural Networks             NYU                    Zeiler, Fergus
4  0.135    Deep Convolutional Neural Networks             Andrew Howard          Andrew Howard
5  0.137    Deep Convolutional Neural Networks (Overfeat)  NYU                    Pierre Sermanet et al.

Slide from Boris Ginzburg

SLIDE 12

Slide from Boris Ginzburg

Imagenet Classification 2013

SLIDE 13

Today’s Lecture

  • (Brief) Intro to Convolutional Neural Networks (CNNs)
  • Learning Instance Specific Grasping
  • Learning Grasp Quality CNNs
  • Learning Instance Segmentation CNNs
SLIDE 14

Today’s Lecture

  • (Brief) Intro to Convolutional Neural Networks (CNNs)

  • Learning Instance Specific Grasping
  • Learning Grasp Quality CNNs
  • Learning Instance Segmentation CNNs
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22
SLIDE 23
SLIDE 24
SLIDE 25

Why is Convolution good?

  • Shared Parameters
  • Only store filter parameters: 5*5*3 (for kernel) + 1 (for bias), for example
  • Local Connectivity
  • Each neuron only corresponds to a local patch of the image (not the whole thing)
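As a quick sanity check on the numbers above, the parameter counts can be computed directly. The 5×5×3 kernel is from the slide; the 32×32×3 input size used for the fully-connected comparison is an assumption for illustration.

```python
# Shared parameters: one 5x5x3 filter plus a bias, reused at every
# spatial position of the input.
k_h, k_w, in_ch = 5, 5, 3
conv_params = k_h * k_w * in_ch + 1
print(conv_params)  # 76

# Contrast: a fully connected layer with one neuron per output position
# (assuming a 32x32x3 input) needs a weight for every input value.
h, w = 32, 32
fc_params = (h * w) * (h * w * in_ch + 1)
print(fc_params)  # 3146752
```

76 parameters versus roughly 3.1 million: weight sharing is what makes convolution tractable on images.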

SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

Idea: Learn Features

SLIDE 30

Today’s Lecture

  • (Brief) Intro to Convolutional Neural Networks (CNNs)
  • Learning Instance-Specific Grasping

  • Learning Grasp Quality CNNs
  • Learning Instance Segmentation CNNs
SLIDE 31

SLIDE 32

Grand Challenge

The ability to “grasp millions of different sized and shaped objects… would have significant impact on deployment of robots in factories, in warehouses, and in homes.” - Rod Brooks

SLIDE 33

SLIDE 34

SLIDE 35

Universal Picking: diversely shaped and sized objects
SLIDE 36

Motivation: Instance-Specific Grasping

Desired Object: Dasani Drops

SLIDE 37

Grasping under Uncertainty

SLIDE 38

UNCERTAINTY in Physics, Control, and Perception

SLIDE 39

First Wave: Analytic Methods

REULEAUX, 1876 HANAFUSA & ASADA, 1977 LI & SASTRY, 1988 NGUYEN, 1988 FERRARI & CANNY, 1992 BICCHI, 1994 SHIMOGA, 1996 BICCHI & KUMAR, 2001 ROA & SUAREZ, 2006 KRUGER ET AL., 2012 POKORNY ET AL., 2013 HAAS-HEGER ET AL., 2006

R(x, u) ∈ {0, 1},  u* = π(x) = argmax_u R(x, u)
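The policy above can be made concrete with a toy sketch. Both functions below are hypothetical illustrations: `reward` stands in for an analytic binary quality metric such as force closure, and `policy` simply takes the argmax over a discrete set of candidate grasps.

```python
def reward(x, u):
    """Hypothetical binary grasp metric R(x, u) in {0, 1}: a stand-in
    for an analytic quality measure (e.g. force closure). Here a grasp
    u succeeds iff it lies within 0.1 of the object state x."""
    return 1 if abs(u - x) < 0.1 else 0

def policy(x, candidates):
    """u* = pi(x) = argmax_u R(x, u) over a discrete candidate set."""
    return max(candidates, key=lambda u: reward(x, u))

print(policy(0.52, [0.0, 0.5, 1.0]))  # 0.5
```

The key assumption of these first-wave methods, and the one the lecture goes on to question, is that x (object pose, geometry, friction) is known exactly, so R can be evaluated analytically.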

SLIDE 40

Uncertainty in object identity: aggregate pre-computed grasp information from multiple objects in database for robustness to recognition errors.

Data-driven Grasping under Uncertainty

Ciocarlie, Pantofaru, Hsiao, Bradski, Brook, and Dreyfuss. “A Side of Data with My Robot: Three Datasets for Mobile Manipulation in Human Environments.”, R&A Magazine, 2011

Uncertainty in object pose: green grasps are robust to pose errors, red ones are not.

Computations performed over a database of ~200 household objects.
SLIDE 41

Image-Based Grasp Planning

Pipeline: Input Image → Filter Bank → Regressor → Grasp Proposals. Image sources: Saxena et al., 2008; Pinto & Gupta, 2016

Color Images: [Saxena et al., 2008] [Stark et al., 2008] [Bohg & Kragic, 2010] [Le et al., 2010]

Point Clouds: [Saxena et al., 2008] [Detry et al., 2009] [Hubner & Kragic, 2010] [Boularis et al., 2011] [Fishchinger & Vincze, 2012] [Herzog et al., 2014] [Oberlin & Tellex, 2015] [ten Pas et al., 2015]

SLIDE 42

Deep Grasp Planning

Pipeline: Input Image → Convolutional Neural Network → Grasp

Color Images: [Lenz et al., 2015] [Redmon & Angelova, 2016] [Pinto et al., 2016] [Levine et al., 2017]

Point Clouds: [Kappler et al., 2015] [Gualtieri et al., 2016] [Johns et al., 2016] [Viereck et al., 2017]

Image source: Pinto & Gupta, 2016

SLIDE 43

Data Sources

Human labeling: Lenz et al. “Deep Learning for Detecting Robotic Grasps.” IJRR 2015. 1035 labeled images: 80% grasp success.

Self-supervision: S. Levine et al. “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection.” IJRR 2017. ~1 robot year of trials: 80-90% grasp success.

SLIDE 44

Dexterity Network (Dex-Net)

SLIDE 45

Synthetic LIDAR Images Synthetic Point Clouds

SLIDE 46

Dex-Net: 6.7 million examples, labeled positive or negative.

SLIDE 47

Grasp Quality CNN

SLIDE 48

GQ-CNN for Bins

[Figure: Iter 0, Iter 1, Plan]

SLIDE 49
SLIDE 50
SLIDE 51

Mahler, Jeffrey, Matthew Matl, Vishal Satish, Michael Danielczuk, Bill DeRose, Stephen McKinley, and Ken Goldberg. "Learning ambidextrous robot grasping policies." Science Robotics 4, no. 26 (2019): eaau4984.
SLIDE 52
SLIDE 53
SLIDE 54

Motivation: Instance-Specific Grasping

Desired Object: Dasani Drops

SLIDE 55

One Approach

Segment all objects in the scene, Classify all segments, Grasp the segment that matches the target!
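A minimal sketch of that three-step loop. All three components passed in (`segmenter`, `classifier`, `grasp_planner`) are hypothetical placeholder callables, not any real API from the talk.

```python
def pick_target(depth_image, target_label, segmenter, classifier, grasp_planner):
    """Sketch of the slide's pipeline, with placeholder components:
      segmenter(img)           -> list of object masks in the scene
      classifier(img, mask)    -> class label for one segment
      grasp_planner(img, mask) -> grasp for the chosen segment
    """
    for mask in segmenter(depth_image):                    # 1. segment all objects
        if classifier(depth_image, mask) == target_label:  # 2. classify each segment
            return grasp_planner(depth_image, mask)        # 3. grasp the match
    return None  # target object not visible in the scene
```

Note that step 1 is where the rest of the lecture focuses: a segmenter that works on real depth images of unseen objects is exactly what SD Mask R-CNN provides.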

SLIDE 56

Slide from: Kaiming He

SLIDE 57

Mask R-CNN

SLIDE 58

Mask R-CNN

  • State-of-the-art instance segmentation network
  • Requires massive hand-labeled datasets for training
  • Does not generalize to unseen classes

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

SLIDE 59

Automated Dataset Generation

  • WISDOM-Sim: 50,000 images with 320,000 object labels in just 3.5 hours

  • 80/20 train/test split for both images and objects
  • 1600 unique objects, 10,000 unique heaps
SLIDE 60

SLIDE 61

Sampling Distributions

SLIDE 62

Data Augmentation

SLIDE 63

Modal Segmentation Masks

SLIDE 64

SLIDE 65

Amodal Segmentation Masks

SLIDE 66

SD Mask R-CNN

  • State-of-the-art performance for object instance segmentation on real depth images
  • No hand-labeling required; trained solely on synthetic depth images
  • Generalizes to unseen objects
  • Outperforms point cloud clustering baselines by 15% in average precision and 20% in average recall

SLIDE 67

COCO Metrics

SLIDE 68

COCO Metrics

Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
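These definitions compute directly from true-positive, false-positive, and false-negative counts; the example numbers below are made up for illustration.

```python
def precision(tp, fp):
    """Fraction of predicted masks that match a ground-truth object."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of ground-truth objects covered by a predicted mask."""
    return tp / (tp + fn)

# e.g. 8 of 10 predicted masks are correct, and 4 objects were missed:
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 0.666...
```

The COCO benchmark averages these over a sweep of mask-overlap (IoU) thresholds, which is why the slides report *average* precision and recall.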

SLIDE 69
SLIDE 70
SLIDE 71
SLIDE 72

Application: Instance-Specific Grasping

Target: Dasani Drops

slide-73
SLIDE 73

Thank You!

Project Website, Supplementary Material, and Datasets: https://bit.ly/2letCuE
Code: https://github.com/BerkeleyAutomation/sd-maskrcnn