Learning to Manipulate from Demonstrations, CS287, November 17, 2015



slide-1
SLIDE 1

Learning to Manipulate from Demonstrations

CS287, November 17, 2015
Sandy Huang
Slides courtesy of Pieter Abbeel

slide-2
SLIDE 2

Personal Robotics Hardware

PR2 (Willow Garage): $400,000, 2009
Baxter (Rethink Robotics): $30,000, 2013
UBR-1 (Unbounded Robotics): $35,000, 2013
?: $2,000?, 2017?

slide-3
SLIDE 3

Challenge Task: Robotic Laundry

[Maitin-Shepard, Cusumano-Towner, Lei, Abbeel, ICRA 2010]

slide-4
SLIDE 4

How About…

slide-5
SLIDE 5

Surgical Knot Tie

[van den Berg, Miller, Duckworth, Humphrey, Wan, Fu, Goldberg, Abbeel, Best Medical Robotics Paper, ICRA 2010]

slide-6
SLIDE 6

• Open loop
• If careful about initial conditions
  • 50% success rate

Surgical Knot Tie

slide-7
SLIDE 7

• Robot has to tie a knot in this rope
• The problem
• Human demonstrated knot-tie in this rope

Learning from Demonstrations

slide-8
SLIDE 8

• Prior work
  • Billard, Calinon and collaborators
    • Gaussian Mixture Models (GMM) and Gaussian Mixture Regression (GMR)
  • Schaal and collaborators
    • Dynamic motion primitives
  • Cakmak, Thomaz and collaborators
    • Human-robot interaction for robot to learn faster
  • Peters and collaborators
    • Stay close to demonstrations distribution while also optimizing reward
• BUT
  • All of these algorithms have underlying representations in terms of coordinates
  • Can we alleviate the need to specify coordinate frames / features and directly adapt to geometry?

Generalizing Trajectories

slide-9
SLIDE 9

[Figure: a training scene with trajectory demonstrations, and a test scene marked "What trajectory here?"]

Cartoon Problem Setting

slide-10
SLIDE 10

[Figure: a training scene with trajectory demonstrations, and a test scene marked "What trajectory here?"]

Cartoon Problem Setting

Samples of $f : \mathbb{R}^3 \to \mathbb{R}^3$

slide-11
SLIDE 11

[Figure: a training scene with trajectory demonstrations, and a test scene marked "What trajectory here?"]

Cartoon Problem Setting

Samples of $f : \mathbb{R}^3 \to \mathbb{R}^3$

slide-12
SLIDE 12

[Figure: a training scene with trajectory demonstrations, and a test scene marked "What trajectory here?"]

Cartoon Problem Setting

Samples of $f : \mathbb{R}^3 \to \mathbb{R}^3$

slide-13
SLIDE 13

[Figure: a training scene with trajectory demonstrations, and a test scene marked "What trajectory here?"]

Cartoon Problem Setting

Samples of $f : \mathbb{R}^3 \to \mathbb{R}^3$

slide-14
SLIDE 14

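• Translations, rotations and scaling are FREE

Learning $f : \mathbb{R}^3 \to \mathbb{R}^3$ from Samples

$$\min_{f \in \{\mathbb{R}^3 \to \mathbb{R}^3\}} \int_{x \in \mathbb{R}^3} \| D^2 f(x) \|_{\mathrm{Frob}}^2 \, dx \quad \text{s.t.} \quad f\big(x^{(i)}_{\mathrm{train}}\big) = x^{(i)}_{\mathrm{test}} \quad \forall i \in 1, \dots, m$$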
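Why these transformations cost nothing can be checked directly from the objective. A short worked step, where the affine map $f(x) = Ax + b$ is notation introduced here for illustration:

```latex
f(x) = A x + b
\;\Rightarrow\;
\frac{\partial^2 f_k}{\partial x_i \, \partial x_j}(x) = 0 \quad \text{for all } i, j, k
\;\Rightarrow\;
\int_{x \in \mathbb{R}^3} \| D^2 f(x) \|_{\mathrm{Frob}}^2 \, dx = 0 .
```

Translations, rotations and scalings are all of this affine form, so the bending penalty never discourages them.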

slide-15
SLIDE 15

• Solution has form:

Wahba, Spline Models for Observational Data. Philadelphia: Society for Industrial and Applied Mathematics. 1990.
Evgeniou, Pontil, Poggio, Regularization Networks and Support Vector Machines. Advances in Computational Mathematics. 2000.
Hastie, Tibshirani, Friedman, Elements of Statistical Learning, Chapter 5. 2008.

Learning $f : \mathbb{R}^3 \to \mathbb{R}^3$ from Samples

$$\min_{f \in \{\mathbb{R}^3 \to \mathbb{R}^3\}} \int_{x \in \mathbb{R}^3} \| D^2 f(x) \|_{\mathrm{Frob}}^2 \, dx \quad \text{s.t.} \quad f\big(x^{(i)}_{\mathrm{train}}\big) = x^{(i)}_{\mathrm{test}} \quad \forall i \in 1, \dots, m$$
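The interpolation problem above can be solved with a single linear system. The following is a minimal sketch, assuming the standard thin plate spline form f(x) = sum_i A_i K(x, x_i) + affine part, with the 3D kernel K(r) = -r. That form, the kernel choice, and the names `tps_kernel`, `fit_tps` and `apply_tps` are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def tps_kernel(a, b):
    """Assumed 3D kernel K(r) = -r between point sets a (n, 3) and b (m, 3)."""
    return -np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def fit_tps(x_train, x_test):
    """Fit an interpolating spline with f(x_train[i]) = x_test[i].

    Assumed form: f(x) = sum_i A[i] * K(x, x_train[i]) + [1, x] @ affine.
    Requires distinct, non-coplanar training points.
    """
    m = x_train.shape[0]
    K = tps_kernel(x_train, x_train)              # (m, m) kernel matrix
    P = np.hstack([np.ones((m, 1)), x_train])     # (m, 4) affine basis [1, x]
    # Block system: interpolation rows, plus side conditions P^T A = 0
    # that keep the affine part out of the kernel weights.
    L = np.zeros((m + 4, m + 4))
    L[:m, :m] = K
    L[:m, m:] = P
    L[m:, :m] = P.T
    rhs = np.zeros((m + 4, 3))
    rhs[:m] = x_test
    sol = np.linalg.solve(L, rhs)
    return x_train, sol[:m], sol[m:]              # centers, A, affine

def apply_tps(model, x):
    """Evaluate the fitted warp at query points x of shape (n, 3)."""
    centers, A, affine = model
    P = np.hstack([np.ones((len(x), 1)), x])
    return tps_kernel(x, centers) @ A + P @ affine
```

When the target points are an affine image of the training points, the kernel weights `A` come out (numerically) zero, which matches the observation that affine maps incur no bending cost.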

slide-16
SLIDE 16

• Thin Plate Spline Robust Point Matching (TPS-RPM) [Chui et al. CVIU 2003]:
  • Variant of Expectation-Maximization (EM); finds locally optimal warp

Finding a Non-Rigid Registration

[Diagram: Initialize, then alternate between "Calculate soft point correspondence matrix" and "Optimize for warp function"]
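The alternation between soft correspondences and warp fitting can be sketched as follows. This is a simplified illustration, not the published TPS-RPM algorithm: it omits outlier handling, and the warp-fitting step is pluggable, with a weighted affine least-squares fit (`fit_affine`) used as a stand-in for the thin plate spline fit. All function names, the annealing schedule, and the default parameters are assumptions.

```python
import numpy as np

def soft_correspondences(warped, targets, temperature, n_sinkhorn=20):
    """Soft assignment M[i, j] between warped source points and target points."""
    d2 = ((warped[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
    M = np.exp(-d2 / (2.0 * temperature)) + 1e-12   # small floor avoids empty rows
    for _ in range(n_sinkhorn):                     # alternate row/column normalization
        M = M / M.sum(axis=1, keepdims=True)
        M = M / M.sum(axis=0, keepdims=True)
    return M

def fit_affine(src, dst, weights):
    """Weighted least-squares affine map; a stand-in for the TPS fitting step."""
    P = np.hstack([src, np.ones((len(src), 1))])
    w = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(w * P, w * dst, rcond=None)
    return lambda pts: np.hstack([pts, np.ones((len(pts), 1))]) @ coef

def rpm_register(src, dst, fit_warp=fit_affine, t_init=0.1, t_final=0.005, rate=0.7):
    """Alternate soft correspondences and warp fitting while lowering the temperature."""
    def warp(pts):                                  # initialize with the identity
        return pts
    temperature = t_init
    while temperature > t_final:
        M = soft_correspondences(warp(src), dst, temperature)
        row_mass = M.sum(axis=1)
        # Each source point is pulled toward a weighted average of the targets.
        virtual_targets = (M @ dst) / row_mass[:, None]
        warp = fit_warp(src, virtual_targets, row_mass)
        temperature *= rate
    return warp
```

Because the procedure only finds a locally optimal warp, the result depends on the initialization and on the starting temperature; a larger `t_init` makes early correspondences fuzzier.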

slide-17
SLIDE 17

• Using non-rigid registration, find a transformation f from training scene to test scene
• Apply f to the demonstrated end-effector trajectory
• Convert the end-effector trajectory to a joint trajectory

Trajectory Transfer Procedure

[J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]
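The second step of the procedure, applying f to an end-effector trajectory, might look like the sketch below. It assumes the trajectory is a sequence of positions and 3x3 rotation matrices, and that orientations are transferred by multiplying with the Jacobian of f and projecting back onto a rotation; that orientation rule and all names here are assumptions for illustration. Converting the result to a joint trajectory (inverse kinematics) is not covered.

```python
import numpy as np

def numerical_jacobian(f, p, eps=1e-5):
    """Central-difference Jacobian of a warp f: R^3 -> R^3 at point p."""
    J = np.zeros((3, 3))
    for k in range(3):
        step = np.zeros(3)
        step[k] = eps
        J[:, k] = (f(p + step) - f(p - step)) / (2.0 * eps)
    return J

def nearest_rotation(A):
    """Project a 3x3 matrix onto the closest rotation matrix using the SVD."""
    U, _, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0:      # avoid returning a reflection
        U[:, -1] *= -1
    return U @ Vt

def transfer_trajectory(f, positions, rotations):
    """Warp end-effector positions with f and re-orthogonalize warped orientations."""
    new_positions = np.array([f(p) for p in positions])
    new_rotations = np.array([
        nearest_rotation(numerical_jacobian(f, p) @ R)
        for p, R in zip(positions, rotations)
    ])
    return new_positions, new_rotations
```

Here `f` is any callable that maps a single 3D point to a 3D point, for example a fitted non-rigid registration.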

slide-18
SLIDE 18

• Knots tied
  • Overhand
  • Figure-eight
  • Double-overhand
  • Square
  • Clove-hitch

Robot Experiments

slide-19
SLIDE 19

Experiment: Knot-Tie

[J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]

slide-20
SLIDE 20

Evaluation

slide-21
SLIDE 21

Experiment: Suturing

[J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, P. Abbeel, IROS 2013]

slide-22
SLIDE 22

• Does not consider joint limits and obstacles when finding the warp function
• Computationally expensive with >100 demonstrations
• Ignores surface normals when finding the warp function
• Only uses geometric information of the objects, not appearance information

Limitations of Trajectory Transfer

slide-23
SLIDE 23

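[Figure: demonstration scene and test scene; the trajectory in the test scene is marked "?"]

Trajectory Transfer: First Step

Step 1:

$$\min_{f \in \text{registration functions}} \; \text{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \text{bending\_energy}(f)$$

$$\tau_f \leftarrow f(\tau_{\text{demo}})$$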

slide-24
SLIDE 24

[Figure: transferred trajectory and feasible trajectory]

Trajectory Transfer: Second Step

Step 2:

$$\min_{\tau \in \text{trajectories}} \; \text{trajectory\_error}(\tau_f, \tau) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$

slide-25
SLIDE 25

Unifying Trajectory Transfer

Two-step optimization

Step 1:

$$\min_{f \in \text{registration functions}} \; \text{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \text{bending\_energy}(f)$$

Step 2:

$$\min_{\tau \in \text{trajectories}} \; \text{trajectory\_error}(f(\tau_{\text{demo}}), \tau) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$

Unified optimization

$$\min_{\substack{f \in \text{registration functions} \\ \tau \in \text{trajectories}}} \; \text{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \text{bending\_energy}(f) + \text{trajectory\_error}(f(\tau_{\text{demo}}), \tau) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$

slide-26
SLIDE 26

Application to Manipulation of Deformable Objects

[Plot: Success Rate versus Degree of Freedom Range Reduction Factor, comparing two-step optimization and unified optimization]

[A. Lee, S. Huang, D. Hadfield-Menell, E. Tzeng, P. Abbeel, IROS 2014]

slide-27
SLIDE 27

• Can be expected to work if the dynamics of the system are approximately covariant under sufficiently smooth warpings.

Theoretical Guarantees

slide-28
SLIDE 28

• Repeat
  • Acquire new point cloud $X_{\text{test}}$
  • Using non-rigid registration, compute the distance between $X_{\text{test}}$ and each point cloud $X_{\text{train},i}$ from demonstrations; let $i^*$ be the index of the closest demonstration
  • If $i^*$ is a “done” state, break
  • Apply trajectory transfer to generate new trajectory

Nearest-Neighbor Policy for Tasks
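The loop above can be sketched in a few lines. Everything robot-specific is passed in as a callable, and all names (`observe`, `registration_cost`, `transfer`, `execute`, and the `"cloud"` / `"done"` keys) are hypothetical placeholders rather than an actual API.

```python
def nearest_neighbor_policy(observe, demos, registration_cost, transfer, execute,
                            max_steps=20):
    """Repeatedly pick the demonstration closest to the current scene and transfer it.

    demos: list of dicts with a "cloud" entry (demonstration point cloud) and a
    "done" flag marking demonstrations recorded in a finished state.
    Returns the number of trajectories executed before reaching a "done" state,
    or None if max_steps is exhausted first.
    """
    for step in range(max_steps):
        x_test = observe()                                    # acquire new point cloud
        costs = [registration_cost(d["cloud"], x_test) for d in demos]
        i_star = min(range(len(demos)), key=costs.__getitem__)
        if demos[i_star]["done"]:                             # nearest neighbor is a "done" state
            return step
        execute(transfer(demos[i_star], x_test))              # trajectory transfer, then act
    return None
```

The `max_steps` cap is an addition for safety in this sketch; the loop on the slide simply repeats until a "done" state is matched.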

slide-29
SLIDE 29

• Doesn’t account for demonstration quality
• Doesn’t prefer moves that make progress
• Doesn’t account for reachability of trajectory

Limitations of the Nearest-Neighbor Policy

slide-30
SLIDE 30

Learning to Choose Better Actions

[D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

slide-31
SLIDE 31

Max-Margin Policy Learning

[D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

slide-32
SLIDE 32

Max-Margin Q-Function Learning

[D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

slide-33
SLIDE 33

Experiments

[D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

slide-34
SLIDE 34

Results in Simulation

[D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

slide-35
SLIDE 35

Evaluation on Knot-Tying

[Bar charts of Success Rate (%). Methods listed in the legend: [Schulman et al. ISRR '13], Max Margin Q-function Estimation, Beam Search (3-3)]

Overhand Knots: 70%, 82%, 88%

Figure 8 Knots: 54%, 63%, 76%

slide-36
SLIDE 36

Motivation for Including Surface Normals

slide-37
SLIDE 37

Standard TPS-RPM Registration

[Figure: test scene and demonstration scene]

slide-38
SLIDE 38

TPS-RPM Registration with Normals

[Figure: test scene and demonstration scene]

[A. Lee, M. Goldstein, S. Barratt, P. Abbeel, ICRA 2015]

slide-39
SLIDE 39

Problem Formulation

slide-40
SLIDE 40

• Only uses geometric information to find non-rigid registration

TPS-RPM: Sensitivity to Initialization

[Figure: Demo and Test]

slide-41
SLIDE 41

• Demonstration selection also only uses geometric information

Geometric Similarity ≠ Semantic Similarity

[Figure: geometrically-similar demonstration configurations and the test configuration]

slide-42
SLIDE 42

• corners-against-background
• edges-against-background
• edges-against-interior
• folds-against-background
• flat interior
• wrinkled interior

Convolutional Neural Net Classification

[S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]

slide-43
SLIDE 43

• = correspondence between source point and target point
• = prior probability that and should be matched
• Define the new point correspondence matrix as
• Normalize so that the rows and columns sum to 1

Leveraging Appearance Information

[Diagram: Initialize, then alternate between "Calculate soft point correspondence matrix" and "Optimize for warp function"]
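One way to read the steps above is sketched below. The slide's own formula for the new correspondence matrix is not reproduced here, so the elementwise product used in this sketch is an assumption, as are the function name `apply_appearance_prior` and the fixed number of normalization passes.

```python
import numpy as np

def apply_appearance_prior(M, prior, n_iters=50):
    """Combine geometric soft correspondences with appearance priors.

    M[i, j]: soft correspondence between source point i and target point j.
    prior[i, j]: prior probability that source point i and target point j match.
    Both are assumed to be square arrays with strictly positive entries.
    """
    combined = M * prior                      # assumed combination: elementwise product
    for _ in range(n_iters):                  # alternate row and column normalization
        combined = combined / combined.sum(axis=1, keepdims=True)
        combined = combined / combined.sum(axis=0, keepdims=True)
    return combined

# Usage: uninformative geometry, priors that favor the diagonal matches.
M = np.full((2, 2), 0.5)
prior = np.array([[0.9, 0.1], [0.1, 0.9]])
new_M = apply_appearance_prior(M, prior)
```

With purely uniform geometric correspondences, the priors alone decide the result, which is the situation where appearance information helps most.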

slide-44
SLIDE 44

Trajectory Transfer + Appearance Priors

Demo Test Without appearance priors With appearance priors

slide-45
SLIDE 45

TPS-RPM with CNN Classification of Pixels

[S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]

slide-46
SLIDE 46

• Unsupervised features in registration
• Reinforcement learning to further improve performance
• Forces and torques (to extend to non-kinematic tasks)
• More data…

Current Directions

slide-47
SLIDE 47

Thank you

slide-48
SLIDE 48

Trajectory Transfer: Toy Example

Demonstration Test

?

Schulman et al. ISRR 2013

slide-49
SLIDE 49

Trajectory Transfer: Toy Example

Demonstration Test

  • 1. Calculate a non-rigid registration

Schulman et al. ISRR 2013

slide-50
SLIDE 50

Trajectory Transfer: Toy Example

Demonstration Test

  • 1. Calculate a non-rigid registration

Schulman et al. ISRR 2013

slide-51
SLIDE 51

Demonstration Test

Trajectory Transfer: Toy Example

  • 1. Calculate a non-rigid registration

Schulman et al. ISRR 2013

slide-52
SLIDE 52

Trajectory Transfer: Toy Example

  • 2. Apply to the demonstrated trajectory

Schulman et al. ISRR 2013

Demonstration Test