Grasping
Team MIT-Princeton @ the Amazon Robotics Challenge
1st place in stowing task
Andy Zeng, Shuran Song, Kuan-Ting Yu, Prem Qu Nair, Isabella Morona, Daolin Ma, Thomas Funkhouser, Alberto Rodriguez, Elliott Donlon, Francois Hogan, Maria Bauza, Orion Taylor, Eudald Romo, Nima Fazeli, Nikhil Dafle, Rachel Holladay, Melody Liu, Weber Liu, Ferran Alet, Ian Taylor, Druck Green
From model-based to model-free
Model-based grasping: pose estimation → grasp planning
✔ Works well with known objects in structured environments
✘ Can't handle novel objects in unstructured environments (due to pose estimation)
Model-free grasping: visual data → grasp planning
- Uses local geometric features
- Ignores object identity
- End-to-end
- Motivated by industry
Recent work on model-free grasping
- Supersizing Self-Supervision (L. Pinto and A. Gupta, '16)
- Dex-Net 1.0 - 3.0 (J. Mahler et al., '17)
- Grasp Pose Detection (M. Gualtieri et al., '17)
These handle clutter in tabletop scenarios and novel objects selected beforehand.
Common limitations: low grasp sample density, small neural network sizes
In this talk
- A model-free grasping method
○ Handles dense clutter in tabletop bin/box scenarios
  Rethink dense clutter: objects not only tightly packed, but also tossed and stacked on top of each other; objects in corners and on bin edges
○ Works for novel objects of all kinds (i.e. any household object should be fair game)
  90 - 95% grasping accuracy is not enough; objects without depth data...
○ Fast and efficient
  Standard: grasp sampling → grasp ranking; Ours: dense pixel-wise predictions
○ 1st place in the stowing task at the Amazon Robotics Challenge '17 (i.e. it works)
  "The Beast from the East" setup; competition footage
Overview: multi-affordance grasping
Input: multi-view RGB-D images
Output: dense grasp proposals and affordance scores for 4 primitive grasping behaviors:
suction side, suction down, grasp down, flush grasp
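To make the overview concrete, here is a minimal sketch (assumed Python/NumPy, illustrative names, not the authors' code) of how the four dense affordance maps could be reduced to a single action: pick the highest-scoring pixel across all primitives and execute that primitive at that location.

    import numpy as np

    def choose_primitive(affordance_maps):
        """affordance_maps: dict mapping primitive name -> (H, W) array of scores in [0, 1]."""
        best = {}
        for name, m in affordance_maps.items():
            idx = np.unravel_index(np.argmax(m), m.shape)  # best pixel for this primitive
            best[name] = (m[idx], idx)
        name = max(best, key=lambda n: best[n][0])         # highest-confidence primitive overall
        score, (y, x) = best[name]
        return name, score, (y, x)

    # Example with random maps (stand-ins for FCN outputs):
    maps = {p: np.random.rand(480, 640) for p in
            ["suction side", "suction down", "grasp down", "flush grasp"]}
    print(choose_primitive(maps))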
Dense pixel-wise affordances with FCNs
Suction: input RGB-D images → fully convolutional ResNet-50 → dense pixel-wise affordances for suction down and suction side
What about grasping?
Grasping: RGB-D heightmaps (from the input RGB-D images) → fully convolutional ResNet-50 → predicts horizontal grasp affordances for grasp down and flush grasp
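A minimal sketch of this kind of fully convolutional architecture, assuming a PyTorch-style implementation with two ResNet-50 towers (color and depth) fused into dense per-pixel scores; the class, layer sizes, and fusion choices here are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn
    import torchvision

    class AffordanceFCN(nn.Module):
        def __init__(self, num_classes=2):  # e.g. {background, affordance}
            super().__init__()
            # Two ResNet-50 towers: one for color, one for depth (depth replicated
            # to 3 channels); truncated before the fully connected layer so the
            # output stays spatial. Pre-trained ImageNet weights would be loaded here.
            color = torchvision.models.resnet50()
            depth = torchvision.models.resnet50()
            self.color_trunk = nn.Sequential(*list(color.children())[:-2])
            self.depth_trunk = nn.Sequential(*list(depth.children())[:-2])
            # 1x1 convolutions fuse the two towers and predict per-pixel scores.
            self.head = nn.Sequential(
                nn.Conv2d(4096, 512, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(512, num_classes, kernel_size=1),
            )

        def forward(self, rgb, depth3):
            feat = torch.cat([self.color_trunk(rgb), self.depth_trunk(depth3)], dim=1)
            logits = self.head(feat)  # coarse score map (stride 32)
            # Upsample back to input resolution for dense pixel-wise affordances.
            return nn.functional.interpolate(
                logits, size=rgb.shape[-2:], mode="bilinear", align_corners=False)

    # Usage: scores[:, 1] is a dense "affordance here" heatmap.
    model = AffordanceFCN()
    rgb = torch.rand(1, 3, 480, 640)
    depth3 = torch.rand(1, 3, 480, 640)
    scores = torch.softmax(model(rgb, depth3), dim=1)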
Training data
- Manual labeling
- ~100 different household/office objects
- Labels: suctionable areas and parallel-jaw grasps
Generalization from hardware capabilities
- High-powered deployable suction
- Actuated spatula
Pros and cons
Advantages:
- Fast runtime from efficient convolutions
- Uses both color and depth information
- Can leverage fat pre-trained networks
- Higher good-grasp recall (Standard: grasp sampling → grasp ranking; Ours: dense pixel-wise predictions)
Limitations:
- Considers only top-down parallel-jaw grasps
○ Can trivially extend to more grasp angles (see the sketch below)
- Limited to grasping behaviors for which you can define affordances (no real planning)
- Open-loop
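On the "more grasp angles" point: one straightforward extension, sketched below under the assumption of a PyTorch-style setup, is to rotate the input heightmap several times, predict horizontal grasps for each rotation, and rotate the predictions back. The helper predict_horizontal is a hypothetical stand-in for a horizontal-grasp FCN.

    import torch
    import torchvision.transforms.functional as TF

    def multi_angle_affordances(predict_horizontal, heightmap, num_angles=16):
        """heightmap: (1, C, H, W) tensor; predict_horizontal returns a (1, 1, H, W)
        probability map of horizontal grasps. Returns (num_angles, H, W)."""
        maps = []
        for k in range(num_angles):
            angle = 180.0 * k / num_angles           # parallel-jaw grasps repeat every 180°
            rotated = TF.rotate(heightmap, angle)    # rotate input so the target angle is horizontal
            scores = predict_horizontal(rotated)
            maps.append(TF.rotate(scores, -angle)[0, 0])  # rotate prediction back
        return torch.stack(maps)  # argmax over (angle, y, x) gives the best oriented grasp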
Future work
Model-based grasping: pose estimation → grasp planning
Model-free grasping: visual data → grasp planning
How can we improve model-free grasping by making it more like model-based?
Semantic Scene Completion from a Single Depth Image [Song et al., CVPR '17]
Takeaways
- A model-free grasping method
○ FCNs to compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
○ Multiple grasping primitive behaviors → dense clutter in bin/box scenarios
○ Multi-view color and depth + diverse training data + robust hardware → handle novel objects of all kinds
○ FCNs for grasping affordance predictions → efficiency and high grasp recall
Paper and code are available: arc.cs.princeton.edu
Recognition of novel objects without retraining
- Match real images of novel objects to their product images (available at test time)
- After isolating an object from clutter with model-free grasping, perform recognition
Cross-domain image matching (training)
- Observed images vs. product images: match? (ℓ2 distance ratio loss)
- Softmax loss for K-Net only
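As an illustration of the training objective, here is one common form of an ℓ2 distance ratio loss, sketched in PyTorch: pull an observed image toward its matching product image and push it away from a non-matching one. The exact formulation used by the authors is in their paper, so treat the details below as an assumption.

    import torch

    def l2_distance_ratio_loss(obs_feat, pos_prod_feat, neg_prod_feat, eps=1e-8):
        """All inputs: (N, D) embeddings of observed images, matching product
        images, and non-matching product images."""
        d_pos = torch.norm(obs_feat - pos_prod_feat, dim=1)  # L2 distance to the match
        d_neg = torch.norm(obs_feat - neg_prod_feat, dim=1)  # L2 distance to a non-match
        # Softmax over the two distances; drive the "match" probability toward 1.
        p_match = torch.exp(-d_pos) / (torch.exp(-d_pos) + torch.exp(-d_neg) + eps)
        return -torch.log(p_match + eps).mean()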
Cross-domain image matching (testing)
- Input observed image → feature embedding → closest product-image embedding (known or novel object) → match!
- Pre-trained ImageNet features
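At test time, recognition reduces to nearest-neighbor lookup in the embedding space; a minimal sketch (PyTorch, illustrative function names, not the authors' code) follows. Product images of known and novel objects are embedded once, and each observed image is matched to the closest one.

    import torch
    import torch.nn.functional as F

    def recognize(obs_feat, product_feats, product_labels):
        """obs_feat: (D,); product_feats: (K, D); product_labels: list of K names."""
        obs = F.normalize(obs_feat, dim=0)
        prods = F.normalize(product_feats, dim=1)
        sims = prods @ obs                         # cosine similarity to every product image
        return product_labels[int(sims.argmax())]  # label of the closest product image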