Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly (PowerPoint presentation)

SLIDE 1

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

Columbia University

Stanford University

Google Research

SLIDE 2

includes video narration

SLIDE 3

Shape Matching

kit assembly

SLIDE 4

Shape Matching

kit assembly

everyday interactions

SLIDE 8

Towards Flexible Assembly

state-of-the-art robo-kitting solution

Choi et al.

CAD Model

SLIDE 10

Towards Flexible Assembly

state-of-the-art robo-kitting solution

Choi et al.

require prior knowledge and manual engineering

CAD Model

SLIDE 11

Towards Flexible Assembly

state-of-the-art robo-kitting solution

Choi et al.

require prior knowledge and manual engineering

CAD Model

cannot quickly adapt to new objects and settings

SLIDE 12

Towards Flexible Assembly

state-of-the-art robo-kitting solution

can we endow them with generalization abilities?

Choi et al.

require prior knowledge and manual engineering

CAD Model

cannot quickly adapt to new objects and settings

SLIDE 13

Generalizable Assembly

SLIDE 14

Generalizable Assembly

through Shape Matching & Self-Supervision

SLIDE 17

Generalizable Assembly

through Shape Matching & Self-Supervision

never before seen

SLIDE 18

Generalizable Assembly

through Shape Matching & Self-Supervision

Form2Fit

never before seen

SLIDE 19

Generalizable Assembly

through Shape Matching & Self-Supervision

Form2Fit

94% novel configurations

86% novel objects & kits

never before seen

SLIDE 20

Generalizable Assembly

through Shape Matching & Self-Supervision

Form2Fit

94% novel configurations

86% novel objects & kits

never before seen

~12 hours training

SLIDE 21

Key Ideas

Kit Assembly → Shape Matching

SLIDE 22

Key Ideas

Kit Assembly → Shape Matching

learns geometric shape descriptors

SLIDE 24

Key Ideas

Kit Assembly → Shape Matching

learns geometric shape descriptors
generalizes to new shapes

SLIDE 25

Key Ideas

Assembly from Disassembly

64x

learns geometric shape descriptors
generalizes to new shapes

Kit Assembly → Shape Matching

SLIDE 26

Key Ideas

Assembly from Disassembly

fully self-supervised

64x

learns geometric shape descriptors
generalizes to new shapes

Kit Assembly → Shape Matching

SLIDE 27

Key Ideas

Assembly from Disassembly

fully self-supervised

64x

learns geometric shape descriptors
generalizes to new shapes
trial and error

Kit Assembly → Shape Matching

SLIDE 29

Key Ideas

Assembly from Disassembly

64x

fully self-supervised
trial and error
learns geometric shape descriptors
generalizes to new shapes

Kit Assembly → Shape Matching

SLIDE 30

<< rewind

Key Ideas

Assembly from Disassembly

64x

fully self-supervised
trial and error
learns geometric shape descriptors
generalizes to new shapes

Kit Assembly → Shape Matching

SLIDE 31

Method

SLIDE 32

Overview of Form2Fit

grayscale-depth heightmaps are generated from 3D pointcloud data

SLIDE 33

Kit Heightmap, Object Heightmap

Overview of Form2Fit

grayscale-depth heightmaps are generated from 3D pointcloud data
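
The heightmap step can be sketched as a top-down (orthographic) projection of the point cloud onto a pixel grid. This is only a minimal illustration of the depth channel, not the authors' implementation: the function name `pointcloud_to_heightmap`, the workspace bounds, and the pixel size are assumptions, and the grayscale channel mentioned on the slide would be filled in the same way from per-point intensity.

```python
import math


def pointcloud_to_heightmap(points, x_range, y_range, pixel_size):
    """Project (x, y, z) points top-down, keeping the highest z per grid cell.

    Hypothetical helper: bounds and pixel size are illustrative assumptions.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    width = int(round((x_max - x_min) / pixel_size))
    height = int(round((y_max - y_min) / pixel_size))
    heightmap = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        col = math.floor((x - x_min) / pixel_size)
        row = math.floor((y - y_min) / pixel_size)
        # Points outside the workspace bounds are ignored.
        if 0 <= row < height and 0 <= col < width:
            heightmap[row][col] = max(heightmap[row][col], z)
    return heightmap


points = [(0.05, 0.05, 0.1), (0.05, 0.05, 0.3), (0.15, 0.05, 0.2), (1.0, 1.0, 0.5)]
heightmap = pointcloud_to_heightmap(points, (0.0, 0.2), (0.0, 0.1), 0.1)
```

Keeping the maximum height per cell means the top surface of each object is what the networks see, which matches a camera looking straight down.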

SLIDE 34

Overview of Form2Fit

suction network ingests object heightmap and outputs suction heatmap

Kit Heightmap, Object Heightmap

SLIDE 35

Overview of Form2Fit

suction network ingests object heightmap and outputs suction heatmap

Kit Heightmap, Object Heightmap, Suction Network

SLIDE 37

Overview of Form2Fit

place network ingests kit heightmap and outputs place heatmap

Kit Heightmap, Object Heightmap, Suction Network

SLIDE 38

Overview of Form2Fit

place network ingests kit heightmap and outputs place heatmap

Kit Heightmap, Object Heightmap, Suction Network, Place Network

SLIDE 40

Overview of Form2Fit

Kit Heightmap, Object Heightmap, Suction Network, Place Network

corresponding pick and place candidates
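
One simple way to turn a suction or place heatmap into a short list of candidates is to keep the highest-scoring pixels above a threshold. The sketch below assumes that approach; the function name `top_candidates`, the value of `k`, and the threshold are illustrative choices, not values taken from the slides.

```python
def top_candidates(heatmap, k=3, threshold=0.5):
    """Return up to k (row, col) pixels with the highest scores above threshold.

    Hypothetical helper: k and threshold are assumptions for illustration.
    """
    scored = [
        (score, (r, c))
        for r, row in enumerate(heatmap)
        for c, score in enumerate(row)
        if score >= threshold
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pixel for _, pixel in scored[:k]]


suction_heatmap = [[0.1, 0.9], [0.6, 0.2]]
candidates = top_candidates(suction_heatmap)
```

The same function would apply unchanged to the place network's heatmap over the kit.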

SLIDE 46

Overview of Form2Fit

matching network ingests heightmaps and outputs descriptor maps

Kit Heightmap, Object Heightmap, Suction Network, Place Network

SLIDE 47

Overview of Form2Fit

Matching Network

matching network ingests heightmaps and outputs descriptor maps

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

SLIDE 48

Overview of Form2Fit

Matching Network

closer descriptor distances indicate better object-to-placement correspondences

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors
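
The idea that closer descriptors mean better correspondences can be illustrated with a nearest-neighbour lookup. This sketch assumes Euclidean distance and stores descriptors in a dictionary keyed by pixel; both are assumptions for illustration, as are the names and the toy 2-D descriptor values.

```python
import math


def best_place_match(pick_descriptor, place_descriptors):
    """Return the place pixel whose descriptor is closest to the pick descriptor.

    Hypothetical helper: Euclidean distance is an assumed metric.
    """
    return min(
        place_descriptors,
        key=lambda pixel: math.dist(pick_descriptor, place_descriptors[pixel]),
    )


pick_descriptor = [1.0, 0.0]
place_descriptors = {(3, 4): [0.9, 0.1], (7, 2): [-1.0, 0.5]}
match = best_place_match(pick_descriptor, place_descriptors)
```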

SLIDE 53

Overview of Form2Fit

Matching Network

closer descriptor distances indicate better object-to-placement correspondences

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

× 20 × 20

SLIDE 54

Overview of Form2Fit

Matching Network

descriptors are rotation-sensitive

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

× 20 × 20
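
Because the descriptors are rotation-sensitive, one way to search over orientation is to feed the network several rotated copies of a heightmap. The sketch below assumes the "× 20" on the slide means 20 evenly spaced rotations (18º apart), which the slide does not state explicitly. The nearest-neighbour rotation is a stand-in for whatever image rotation the real pipeline uses, and both function names are hypothetical.

```python
import math


def rotation_angles(n=20):
    """Evenly spaced angles in degrees; n=20 is assumed from the '× 20' label."""
    return [i * 360.0 / n for i in range(n)]


def rotate_heightmap(heightmap, angle_deg):
    """Nearest-neighbour rotation about the image centre via inverse mapping."""
    h, w = len(heightmap), len(heightmap[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Find the source pixel that lands on (r, c) after rotation.
            dy, dx = r - cy, c - cx
            src_r = int(round(cy + cos_a * dy - sin_a * dx))
            src_c = int(round(cx + sin_a * dy + cos_a * dx))
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = heightmap[src_r][src_c]
    return out


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
angles = rotation_angles()
flipped = rotate_heightmap(grid, 180.0)
```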

SLIDE 58

Overview of Form2Fit

Matching Network

planner integrates information to produce suction/place poses & end-effector rotation

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

× 20 × 20

SLIDE 59

Overview of Form2Fit

Matching Network

planner integrates information to produce suction/place poses & end-effector rotation

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

× 20

θ

p q

Planner

× 20
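
A planner of this kind could be sketched as an exhaustive search over pick candidates p, place candidates q, and rotations θ, choosing the triple with the smallest descriptor distance. This is an assumption about how the pieces might be combined, not the authors' planner. The function `plan`, the dictionary layout, and the toy descriptors are all hypothetical, and the sketch ignores how pixel coordinates shift when the object heightmap is rotated.

```python
import math


def plan(pick_candidates, place_candidates, kit_descriptors, object_descriptors_by_angle):
    """Return (p, q, theta) with the smallest object-to-kit descriptor distance.

    Hypothetical planner: descriptors are dicts keyed by pixel, one object
    descriptor dict per candidate rotation angle.
    """
    best = None
    for angle, object_descriptors in object_descriptors_by_angle.items():
        for p in pick_candidates:
            for q in place_candidates:
                distance = math.dist(object_descriptors[p], kit_descriptors[q])
                if best is None or distance < best[0]:
                    best = (distance, p, q, angle)
    _, p, q, angle = best
    return p, q, angle


kit_descriptors = {(0, 0): [0.0, 1.0], (5, 5): [1.0, 0.0]}
object_descriptors_by_angle = {
    0.0: {(2, 2): [0.5, 0.5]},
    90.0: {(2, 2): [1.0, 0.1]},
}
p, q, theta = plan([(2, 2)], [(0, 0), (5, 5)], kit_descriptors, object_descriptors_by_angle)
```

In this toy case the 90º descriptor of the object almost coincides with the kit descriptor at (5, 5), so that pose is selected.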

SLIDE 63

Overview of Form2Fit

Matching Network

planner integrates information to produce suction/place poses & end-effector rotation

Kit Heightmap, Object Heightmap, Suction Network, Place Network, pixel-wise descriptors

× 20

θ

p q

Planner

× 20

Pick Position (p), Place Position (q)

SLIDE 65

Data Collection

SLIDE 66

Data Collection

12x 12x

500 disassembly sequences (~8 to 10 hours) for each kit

12x 12x 12x

SLIDE 68

Data Collection from Disassembly

SLIDE 69

Data Collection from Disassembly

suction network predicts a suction candidate

SLIDE 72

Data Collection from Disassembly

SLIDE 73

place pose randomly generated (q, θ)

Data Collection from Disassembly

SLIDE 75

place pose randomly generated (q, θ)

θ

Data Collection from Disassembly
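
Random place-pose generation can be sketched as uniform sampling of a position q and a rotation θ. The workspace bounds, the uniform distribution, and the function name `sample_place_pose` are assumptions made for illustration; the slide only says the pose is random.

```python
import random


def sample_place_pose(x_range, y_range, rng):
    """Sample a place position q and rotation theta (degrees) uniformly.

    Hypothetical helper: bounds and distribution are illustrative assumptions.
    """
    q = (rng.uniform(*x_range), rng.uniform(*y_range))
    theta = rng.uniform(0.0, 360.0)
    return q, theta


rng = random.Random(0)  # seeded so the example is repeatable
q, theta = sample_place_pose((0.0, 0.5), (-0.2, 0.2), rng)
```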

SLIDE 76

kit is secured to table to prevent accidental displacement from bad suction grasps

Data Collection from Disassembly

SLIDE 78

place point ground-truth obtained from suction

Data Collection from Disassembly

SLIDE 81

suction point ground-truth obtained from place

Data Collection from Disassembly
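
The two label swaps above amount to reversing each disassembly step in time: where the robot picked the object out of the kit becomes the place label, and where it set the object down becomes the suction label. The sketch below assumes a simple record per step. The class, the field names, and the negated-rotation convention are hypothetical, not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class DisassemblyStep:
    suction_pixel: tuple  # where the object was picked out of the kit
    place_pixel: tuple    # where the object was set down outside the kit
    place_rotation_deg: float  # rotation applied while placing


def to_assembly_labels(step):
    """Time-reverse one disassembly step into assembly training labels."""
    return {
        "suction_label": step.place_pixel,   # assembly picks where disassembly placed
        "place_label": step.suction_pixel,   # assembly places where disassembly picked
        # Assumed convention: undoing the motion negates the rotation.
        "rotation_deg": (-step.place_rotation_deg) % 360.0,
    }


step = DisassemblyStep(suction_pixel=(10, 12), place_pixel=(40, 55), place_rotation_deg=90.0)
labels = to_assembly_labels(step)
```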

SLIDE 84

Data Collection from Disassembly

SLIDE 85

dense correspondence ground-truth obtained from robot motion

Data Collection from Disassembly
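
Since the robot's motion between pick and place is known, every object pixel in the kit image can be mapped to its location after the move, which gives dense pixel pairs for training. The sketch below assumes a planar rigid motion (rotation about the suction point followed by translation to the place point) expressed directly in pixel coordinates. The function name and the sign convention of the rotation are assumptions.

```python
import math


def dense_correspondences(object_pixels, suction_pixel, place_pixel, rotation_deg):
    """Map each object pixel through an assumed planar rigid motion.

    Returns a list of ((row, col) before, (row, col) after) pairs.
    """
    cos_a = math.cos(math.radians(rotation_deg))
    sin_a = math.sin(math.radians(rotation_deg))
    pairs = []
    for r, c in object_pixels:
        # Offset from the suction point, rotated, then moved to the place point.
        dr, dc = r - suction_pixel[0], c - suction_pixel[1]
        new_r = place_pixel[0] + cos_a * dr - sin_a * dc
        new_c = place_pixel[1] + sin_a * dr + cos_a * dc
        pairs.append(((r, c), (int(round(new_r)), int(round(new_c)))))
    return pairs


object_pixels = [(5, 5), (5, 6)]
unrotated = dense_correspondences(object_pixels, (5, 5), (20, 30), 0.0)
flipped = dense_correspondences(object_pixels, (5, 5), (20, 30), 180.0)
```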

SLIDE 87

Results

SLIDE 88

Varying Initial Conditions

model trained and tested on each kit

12x 12x 12x 12x 12x

SLIDE 94

Generalization to Novel Settings

SLIDE 95

model trained on 2 kits: floss and tape

Generalization to Novel Settings

SLIDE 96

model trained on 2 kits: floss and tape

Generalization to Novel Settings

Individual

64x 64x

SLIDE 97

model trained on 2 kits: floss and tape

Generalization to Novel Settings

Individual, Multiple

64x 64x 64x 64x

SLIDE 98

model trained on 2 kits: floss and tape

Generalization to Novel Settings

Individual, Multiple, Mixture

64x 64x 64x 64x 64x 64x

SLIDE 99

Generalization to Novel Objects/Kits

SLIDE 100

Generalization to Novel Objects/Kits

64x 64x 64x 64x

SLIDE 101

Generalization to Novel Objects/Kits

never before seen animals

4x

SLIDE 103

What Has Form2Fit Learned?

SLIDE 104

Descriptor Visualization

descriptors encode object orientation

SLIDE 105

Descriptor Visualization

descriptors encode object orientation

same orientation

SLIDE 106

Descriptor Visualization

descriptors encode object orientation

same orientation

different rotation

SLIDE 107

descriptors encode spatial correspondence

Descriptor Visualization

SLIDE 108

descriptors encode spatial correspondence

Descriptor Visualization

same points share similar descriptors

SLIDE 109

descriptors encode object identity

Descriptor Visualization

SLIDE 110

descriptors encode object identity

Descriptor Visualization

unique descriptor for different objects

SLIDE 111

Limitations & Future Work

SLIDE 112

Typical Failure Case

180º rotational flips

12x
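
One plausible reading of this failure case, offered here as an assumption rather than the authors' analysis, is that some shapes look identical in a heightmap after a 180º rotation, so geometry alone cannot tell the two orientations apart. The check below illustrates that ambiguity on toy grids; both function names are hypothetical.

```python
def rotate_180(heightmap):
    """Rotate a grid by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in reversed(heightmap)]


def is_ambiguous_under_flip(heightmap):
    """True when the heightmap is unchanged by a 180-degree rotation."""
    return rotate_180(heightmap) == heightmap


rectangle = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
l_shape = [[1, 0], [1, 1]]
rectangle_ambiguous = is_ambiguous_under_flip(rectangle)
l_shape_ambiguous = is_ambiguous_under_flip(l_shape)
```

A rectangular footprint is flagged as ambiguous, while the L-shaped one is not, because its rotated copy differs from the original.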

SLIDE 114

Future Directions

SLIDE 116

Future Directions

restricted to planar manipulations

SLIDE 118

Future Directions

restricted to planar manipulations
can’t handle fully-transparent objects

SLIDE 119

Future Directions

restricted to planar manipulations
can’t handle fully-transparent objects
time-reversal currently restricted to quasi-static environment

SLIDE 120

For details, videos and code, visit: https://form2fit.github.io