SLIDE 1

Grasping

SLIDE 2

Team MIT-Princeton @ the Amazon Robotics Challenge

Andy Zeng, Shuran Song, Kuan-Ting Yu, Prem Qu Nair, Isabella Morona, Daolin Ma, Thomas Funkhouser, Alberto Rodriguez, Elliott Donlon, Francois Hogan, Maria Bauza, Orion Taylor, Eudald Romo, Nima Fazeli, Nikhil Dafle, Rachel Holladay, Melody Liu, Weber Liu, Ferran Alet

1st place in stowing task

Ian Taylor, Druck Green

SLIDE 3

From model-based to model-free

Model-based grasping

Pose estimation → Grasp planning

✔ Works well with known objects in structured environments
✘ Can’t handle novel objects in unstructured environments (due to pose estimation)

SLIDE 4

From model-based to model-free

Model-based grasping: Pose estimation → Grasp planning

✔ Works well with known objects in structured environments
✘ Can’t handle novel objects in unstructured environments (due to pose estimation)

Model-free grasping: Visual data → Grasp planning

  • Use local geometric features
  • Ignore object identity
  • End-to-end
  • Motivated by industry

SLIDE 5

Recent work on model-free grasping

Supersizing Self-Supervision

  • L. Pinto and A. Gupta, ‘16

Dex-Net 1.0 - 3.0

  • J. Mahler et al., ‘17

Grasp Pose Detection

  • M. Gualtieri et al., ‘17

Handles clutter and novel objects

SLIDE 6

Recent work on model-free grasping

Supersizing Self-Supervision

  • L. Pinto and A. Gupta, ‘16

Dex-Net 1.0 - 3.0

  • J. Mahler et al., ‘17

Grasp Pose Detection

  • M. Gualtieri et al., ‘17

Handles clutter in tabletop scenarios and novel objects selected beforehand

SLIDE 7

Recent work on model-free grasping

Supersizing Self-Supervision

  • L. Pinto and A. Gupta, ‘16

Dex-Net 1.0 - 3.0

  • J. Mahler et al., ‘17

Grasp Pose Detection

  • M. Gualtieri et al., ‘17

Common limitations: low grasp sample density, small neural network sizes

SLIDE 8

In this talk

  • A model-free grasping method
SLIDE 9

In this talk

  • A model-free grasping method

○ Handles dense clutter in a tabletop bin/box scenario

Rethinking dense clutter:

  • Objects not only tightly packed, but also tossed and stacked on top of each other
  • Objects in corners and on bin edges
SLIDE 10

In this talk

  • A model-free grasping method

○ Handles dense clutter in a tabletop bin/box scenario
○ Works for novel objects of all kinds (i.e. any household object should be fair game)

90-95% grasping accuracy is not enough. Objects without depth data...

SLIDE 11

In this talk

  • A model-free grasping method

○ Handles dense clutter in a tabletop bin/box scenario
○ Works for novel objects of all kinds (i.e. any household object should be fair game)
○ Fast and efficient

Standard: grasp sampling → grasp ranking. Ours: dense pixel-wise predictions.

SLIDE 12

In this talk

  • A model-free grasping method

○ Handles dense clutter in a tabletop bin/box scenario
○ Works for novel objects of all kinds (i.e. any household object should be fair game)
○ Fast and efficient
○ 1st place in the stowing task at the Amazon Robotics Challenge ‘17 (i.e. it works)

“The Beast from the East”: setup and competition footage

SLIDE 13

Overview: multi-affordance grasping

Input: multi-view RGB-D images

SLIDE 14

Overview: multi-affordance grasping

Input: multi-view RGB-D images
Output: dense grasp proposals and affordance scores for 4 primitive grasping behaviors:

  • suction down
  • suction side
  • grasp down
  • flush grasp
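Given one dense affordance map per primitive, picking an action can be as simple as an argmax over primitives and pixels. A minimal sketch of that selection step; the function name and dict layout are illustrative, not from the talk:

```python
import numpy as np

def select_action(affordances):
    """Pick the primitive and pixel with the highest predicted affordance.
    `affordances` maps a primitive name to an H x W score map."""
    best_primitive, best_pixel, best_score = None, None, -np.inf
    for primitive, scores in affordances.items():
        row, col = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[row, col] > best_score:
            best_primitive = primitive
            best_pixel = (row, col)
            best_score = scores[row, col]
    return best_primitive, best_pixel, best_score

# Example with random maps for the four primitives named above.
maps = {name: np.random.rand(480, 640)
        for name in ("suction down", "suction side", "grasp down", "flush grasp")}
print(select_action(maps))
```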

SLIDE 15

Dense pixel-wise affordances with FCNs

Input RGB-D images → fully convolutional ResNet-50
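As a rough illustration of the architecture named on this slide, here is a minimal PyTorch sketch of a fully convolutional ResNet-50 that outputs a dense per-pixel affordance map. It assumes a plain 3-channel input for simplicity; the actual system also feeds depth, and its exact head may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class AffordanceFCN(nn.Module):
    """Fully convolutional ResNet-50 producing a dense per-pixel affordance map."""
    def __init__(self, num_classes=2):
        super().__init__()
        backbone = models.resnet50()
        # Drop the average-pool and fully connected layers so the network
        # stays fully convolutional and accepts any input resolution.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # A 1x1 convolution maps 2048 feature channels to per-class scores.
        self.classifier = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        scores = self.classifier(self.features(x))  # downsampled 32x by ResNet
        # Bilinear upsampling restores input resolution for pixel-wise output.
        return F.interpolate(scores, size=(h, w), mode="bilinear",
                             align_corners=False)

# One forward pass over a 480x640 image yields a 480x640 score map per class.
logits = AffordanceFCN()(torch.rand(1, 3, 480, 640))
suction_affordance = torch.softmax(logits, dim=1)[:, 1]  # P(suctionable) per pixel
```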

SLIDE 16

Dense pixel-wise affordances with FCNs

Input RGB-D images → fully convolutional ResNet-50 → suction down / suction side affordances (❌ / ✓)


SLIDE 18

Dense pixel-wise affordances with FCNs

Input RGB-D images → fully convolutional ResNet-50 → suction down / suction side affordances (❌ / ✓)

What about grasping?

SLIDE 19

Dense pixel-wise affordances with FCNs

Input RGB-D images and RGB-D heightmaps → fully convolutional ResNet-50 → suction down / suction side affordances (❌ / ✓)
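One way such an RGB-D heightmap can be built is by fusing the multi-view point clouds and projecting them top-down over the bin. A sketch under that assumption; all names, the cell resolution, and the grid size are illustrative:

```python
import numpy as np

def build_heightmap(points_xyz, colors, bin_origin, resolution=0.002,
                    grid=(224, 224)):
    """Project a fused multi-view point cloud into a top-down RGB-D heightmap.

    points_xyz: N x 3 points in bin coordinates (meters); colors: N x 3 RGB.
    Keeps the highest point that lands in each grid cell.
    """
    heights = np.zeros(grid, dtype=np.float32)
    colormap = np.zeros(grid + (3,), dtype=np.float32)
    cells = ((points_xyz[:, :2] - bin_origin[:2]) / resolution).astype(int)
    valid = ((cells[:, 0] >= 0) & (cells[:, 0] < grid[0]) &
             (cells[:, 1] >= 0) & (cells[:, 1] < grid[1]))
    for (i, j), z, c in zip(cells[valid], points_xyz[valid, 2], colors[valid]):
        if z > heights[i, j]:          # keep the highest surface per cell
            heights[i, j] = z
            colormap[i, j] = c
    return heights, colormap
```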

SLIDE 20

Dense pixel-wise affordances with FCNs

Input RGB-D images and RGB-D heightmaps → fully convolutional ResNet-50 → suction down / suction side affordances (❌ / ✓) and grasp down / flush grasp affordances (❌ / ✓)

SLIDE 21

Dense pixel-wise affordances with FCNs

Input RGB-D images and RGB-D heightmaps → fully convolutional ResNet-50 → suction down / suction side affordances (❌ / ✓) and grasp down / flush grasp affordances (❌ / ✓)

The fully convolutional ResNet-50 predicts horizontal grasp affordances.
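Because the network only scores horizontal grasps, other gripper angles can be covered by rotating the heightmap itself: rotate the input by each candidate angle, run the same FCN, and rotate the scores back. A sketch of that trick; the helper names and the 16-orientation count are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def grasp_affordances(heightmap, fcn, num_rotations=16):
    """Score horizontal grasps at multiple gripper angles by rotating the
    input heightmap instead of the network. `fcn` is any callable mapping
    an H x W x C heightmap to an H x W score map (hypothetical wrapper)."""
    maps = []
    for i in range(num_rotations):
        angle = 180.0 * i / num_rotations  # parallel-jaw grasps are symmetric
        rotated = rotate(heightmap, angle, axes=(0, 1), reshape=False)
        scores = fcn(rotated)              # network trained on horizontal grasps only
        # Rotate the score map back so all maps share the original frame.
        maps.append(rotate(scores, -angle, axes=(0, 1), reshape=False))
    return np.stack(maps)                  # num_rotations x H x W
```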

SLIDE 22

Training data

  • Manual labeling
  • ~100 different household/office objects

Labels: suctionable areas and parallel-jaw grasps
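With pixel-wise labels like these, training reduces to per-pixel classification. A minimal training-step sketch, reusing the hypothetical AffordanceFCN sketched earlier; the loss choice and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

# Assumes the AffordanceFCN sketch from the FCN slide above is in scope.
model = AffordanceFCN(num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, labels):
    """images: N x 3 x H x W tensors; labels: N x H x W masks
    (0 = background, 1 = suctionable / graspable)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # logits are N x 2 x H x W
    loss.backward()
    optimizer.step()
    return loss.item()
```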

SLIDE 23

Generalization from hardware capabilities

  • High-powered deployable suction
  • Actuated spatula
SLIDE 24

Pros and cons

Advantages:

  • Fast runtime speeds from efficient convolution
SLIDE 25

Pros and cons

Advantages:

  • Fast runtime speeds from efficient convolution
  • Uses both color and depth information
SLIDE 26

Pros and cons

Advantages:

  • Fast runtime speeds from efficient convolution
  • Uses both color and depth information
  • Can leverage large pre-trained networks
  • Higher recall of good grasps

Standard: grasp sampling → grasp ranking. Ours: dense pixel-wise predictions.

SLIDE 27

Pros and cons

Advantages:

  • Fast runtime speeds from efficient convolution
  • Uses both color and depth information
  • Can leverage large pre-trained networks
  • Higher recall of good grasps

Limitations:

  • Considers only top-down parallel-jaw grasps
    ○ Can trivially extend to more grasp angles
  • Limited to grasping behaviors for which you can define affordances (no real planning)
  • Open-loop
SLIDE 28

Future work

Model-based grasping: Pose estimation → Grasp planning

Model-free grasping: Visual data → Grasp planning

SLIDE 29

Future work

Model-based grasping: Pose estimation → Grasp planning

Model-free grasping: Visual data → Grasp planning

How can we improve model-free by making it more like model-based?

SLIDE 30

Future work

Model-based grasping ↔ Model-free grasping

Semantic Scene Completion from a Single Depth Image [Song et al., CVPR ‘17]

SLIDE 31

Takeaways

  • A model-free grasping method

○ FCNs to compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
○ Multiple grasping primitive behaviors → dense clutter in bin/box scenario
○ Multi-view color and depth + diverse training data + robust hardware → handle novel objects of all kinds
○ FCNs for grasping affordance predictions → efficiency and high grasp recall

SLIDE 32

Takeaways

  • A model-free grasping method

○ FCNs to compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
○ Multiple grasping primitive behaviors → dense clutter in bin/box scenario
○ Multi-view color and depth + diverse training data + robust hardware → handle novel objects of all kinds
○ FCNs for grasping affordance predictions → efficiency and high grasp recall

Paper and code are available: arc.cs.princeton.edu

SLIDE 33

Recognition of novel objects without retraining

  • Match real images of novel objects to their product images (available at test time)
  • After isolating an object from the clutter with model-free grasping, perform recognition


SLIDE 34

Cross domain image matching (training)

Observed images and product images → match? (ℓ2 distance ratio loss)

SLIDE 35

Cross domain image matching (training)

Observed images and product images → match? (ℓ2 distance ratio loss; softmax loss for K-Net only)
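One plausible reading of an ℓ2 distance ratio loss is a triplet-style objective on embedding distances: make the observed image closer to its matching product image than to a non-matching one. A sketch of that reading, not necessarily the paper's exact formulation:

```python
import torch

def distance_ratio_loss(observed, match, non_match, eps=1e-8):
    """Triplet-style l2 distance ratio: pull the matching product-image
    embedding toward the observed-image embedding, push the non-matching
    one away. All inputs are N x D embedding batches."""
    d_pos = torch.norm(observed - match, dim=1)
    d_neg = torch.norm(observed - non_match, dim=1)
    # Minimizing the ratio drives d_pos toward 0 relative to d_neg.
    return (d_pos / (d_pos + d_neg + eps)).mean()
```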

SLIDE 36

Cross domain image matching (testing)

Feature embedding space with known and novel objects

SLIDE 37

Cross domain image matching (testing)

Feature embedding space with known and novel objects; input: observed image

SLIDE 38

Cross domain image matching (testing)

Feature embedding space with known and novel objects; input: observed image → match!

SLIDE 39

Cross domain image matching (testing)

Feature embedding space with known and novel objects; input: observed image → match! Built on pre-trained ImageNet features.
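At test time, recognition then becomes a nearest-neighbor lookup in the shared embedding space: embed the observed image and return the product whose embedding is closest. A minimal sketch with illustrative names:

```python
import numpy as np

def match_object(observed_feat, product_feats, product_ids):
    """Return the product whose embedding is nearest to the observed image.

    observed_feat: (D,) embedding of the isolated, observed object.
    product_feats: (K, D) embeddings of all product images (known + novel).
    product_ids:   length-K list of product identifiers.
    """
    dists = np.linalg.norm(product_feats - observed_feat, axis=1)
    return product_ids[int(np.argmin(dists))]
```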