Learning to Select Expert Demonstrations for Deformable Object Manipulation

Learning to Select Expert Demonstrations for Deformable Object Manipulation
Dylan Hadfield-Menell, Alex Lee, Sandy Huang, Eric Tzeng, Pieter Abbeel
Workshop on Information-Based Grasp and Manipulation Planning
July 13, 2014
RSS 2014

Vision
• We’d like robots to be able to do lots of things
• Need deformable object manipulation
• Ease of programming

Deformable Object Manipulation
• High-Dimensional, Continuous State and Action Spaces
• Long Time Horizons
• Complex Dynamics
• Example: Knot-Tying with the PR2, where $A \subset \mathbb{R}^{14}$, $S \subset \mathbb{R}^{230}$, $H \approx 100$

Trajectory Transfer
• Planning for deformable object manipulation is a serious challenge
  – Existing methods need substantial improvements before planning becomes tractable
• Solution: Don’t plan!
  – Modify demonstration trajectories to fit the current situation

Trajectory Transfer: Cartoon Problem Setting
• Train situation: trajectory demonstration
• Samples of $f : \mathbb{R}^3 \to \mathbb{R}^3$
• Test situation: what trajectory here?

Transferring a Trajectory
• Demonstration: scene $\{x_i\}$, trajectory $\{p_i\}$
• Test: scene $\{y_i\}$
• Fit transfer function: $f^*$
• Transform trajectory: $f^*(\{p_i\})$
• Trajectory following: $\min_{p'_i} \lVert f^*(p_i) - p'_i \rVert$
• Execution on robot: $\{p'_i\}$
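The fit-then-transform steps on this slide can be sketched in a few lines. The slides do not state the function class for $f^*$ (the cited work uses non-rigid registration), so this sketch swaps in a deliberately simple stand-in: an independent scale and offset per coordinate, fit by least squares. It also assumes point correspondences between the two scenes are already known. The names `fit_axis_affine` and `transfer_trajectory` are hypothetical.

```python
def fit_axis_affine(src_scene, dst_scene):
    """Fit f(x) = a * x + b per coordinate by least squares.

    Assumption: src_scene[i] corresponds to dst_scene[i]. This is a simple
    stand-in for the non-rigid registration used in the cited work.
    """
    n = len(src_scene)
    dims = len(src_scene[0])
    params = []
    for k in range(dims):
        xs = [p[k] for p in src_scene]
        ys = [q[k] for q in dst_scene]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0  # fall back to unit scale
        b = mean_y - a * mean_x
        params.append((a, b))

    def f(point):
        return tuple(a * c + b for (a, b), c in zip(params, point))

    return f


def transfer_trajectory(f, trajectory):
    """Apply the fitted transfer function to every demonstration waypoint."""
    return [f(p) for p in trajectory]


# Demonstration scene {x_i}, test scene {y_i}, demonstration trajectory {p_i}.
train_scene = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
test_scene = [(1.0, 0.0, -1.0), (3.0, 2.0, 1.0), (5.0, 4.0, 3.0)]
demo_trajectory = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5)]

f_star = fit_axis_affine(train_scene, test_scene)
new_trajectory = transfer_trajectory(f_star, demo_trajectory)
```

The transformed waypoints would then be handed to a trajectory-following step on the robot, which this sketch leaves out.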

Example Trajectory Transfer
• J. Schulman, J. Ho, C. Lee, P. Abbeel. ‘Generalization of robotic manipulation through the use of non-rigid registration.’ ISRR 2013.
• J. Schulman, A. Gupta, S. Venkatesan, M. Taylor-Frederick, P. Abbeel. ‘A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario.’ IROS 2013.
• A. Lee, S. Huang, D. Hadfield-Menell, E. Tzeng, P. Abbeel. ‘Unifying scene registration and trajectory optimization for learning from demonstrations with application to manipulation of deformable objects.’ IROS 2014.

Demonstration Library
• Demonstration library $D$: scenes $\{x_i^{(0)}\}$ with trajectories $\{p_i^{(0)}\}$
• Test: scene $\{y_i\}$
• Demonstration selection: $\operatorname{argmax}_{d \in D}$ picks scene $\{x_i^{(j^*)}\}$ and trajectory $\{p_i^{(j^*)}\}$
• Fit transfer function: $f^*$
• Transform trajectory: $f^*(\{p_i^{(j^*)}\})$
• Trajectory following: $\min_{p'_i} \lVert f^*(p_i) - p'_i \rVert$
• Execution on robot: $\{p'_i\}$

How do we select the ‘best’ Demonstration?
• Different demonstrations may have very different results under transfer
  – Selecting the wrong one may move to a state where we don’t have good demonstrations!
• [Schulman et al. ISRR 2013]
  – Select nearest neighbor with respect to rigidity of the transformation
• How to improve on this?
  – Need a framework for demonstration selection!
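Nearest-neighbor selection over a demonstration library can be sketched as a minimization of a registration cost. The slide's heuristic measures the rigidity of the fitted transformation; this sketch does not compute that and instead uses mean squared displacement between corresponding points as a stand-in. The library layout (dicts with `"scene"` and `"trajectory"` keys) and both function names are hypothetical.

```python
def mean_sq_displacement(scene_a, scene_b):
    """Stand-in registration cost: average squared distance between
    corresponding points. Not the rigidity measure named on the slide."""
    total = 0.0
    for a, b in zip(scene_a, scene_b):
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return total / len(scene_a)


def select_demonstration(library, test_scene, cost=mean_sq_displacement):
    """Return the demonstration whose scene is cheapest to match to test_scene."""
    return min(library, key=lambda demo: cost(demo["scene"], test_scene))


library = [
    {"name": "demo_a", "scene": [(0.0, 0.0), (1.0, 0.0)], "trajectory": [(0.0, 1.0)]},
    {"name": "demo_b", "scene": [(0.0, 2.0), (1.0, 2.0)], "trajectory": [(0.0, 3.0)]},
]
test_scene = [(0.1, 1.8), (1.1, 1.9)]

chosen = select_demonstration(library, test_scene)
```

Passing a different `cost` callable swaps in another selection criterion without changing the selection loop.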

Demo + Transfer Method → Policy
• Inputs: demonstration $d$ and a new scene
• Fit transfer function, transform trajectory, then trajectory following: $\min_{p'_i} \lVert f^*(p_i) - p'_i \rVert$
• Output: new trajectory
• Together these define the policy $\pi_d$

Demo + Transfer Method → Policy
• $\pi_d$: new scene → new trajectory → trajectory controller
• $\pi_d$ specifies an option
  – Option = policy + termination condition
  – Selecting an option runs the corresponding policy until the termination condition
• Original (intractable) MDP $M$ + demonstration library $D$ → options MDP $M_D$

$M$ vs $M_D$
• $|A|$: $\mathbb{R}^{14}$ vs $|D| \approx 150$
• $H$: $\approx 100$ vs $\approx 4$
• $|S|$: $\mathbb{R}^{230}$ vs $\mathbb{R}^{230}$

Takeaways
• Heuristic method from ISRR paper is a policy for $M_D$
• Learning policies is something we know how to do
• Can we apply that here?
  – State space is still a challenge
• Solution: use expert knowledge again
  – This time about which demonstrations to transfer

Max-Margin Policy Cloning
[Figure: feature space with axes $\phi_1$ and $\phi_2$; a maximum margin separator divides the expert transfer selection from the suboptimal transfer selections]

Max-Margin Policy Cloning
• Maximize the margin: $\min_w \; w^\top w$
• Prefer expert selections: s.t. $w^\top \phi(s, d_{exp}) \geq w^\top \phi(s, d') + 1; \; \forall s$
Details
• Expert selections gathered by watching multiple transfers from the same state and selecting the ‘best’
• Structured margin to capture similarity between demonstrations
• Slack variables to cope with sub-optimality in choices
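One way to approximate this program is subgradient descent on its hinge-loss form, where the slack variables become hinge penalties. The sketch below is an illustration under stated assumptions, not the authors' solver: it uses a fixed margin of 1 rather than the structured margin, and the names `train_max_margin`, `select`, and the toy features are hypothetical.

```python
def score(w, phi):
    """Linear ranking score w^T phi(s, d)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def train_max_margin(examples, dim, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on (reg/2) * w^T w + sum of hinge violations.

    examples: list of (features, expert), where features maps each candidate
    demonstration to phi(s, d) and expert is the demonstration the expert chose.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for features, expert in examples:
            rivals = [d for d in features if d != expert]
            # Non-expert demonstration that currently scores highest.
            worst = max(rivals, key=lambda d: score(w, features[d]))
            violation = 1.0 + score(w, features[worst]) - score(w, features[expert])
            # Regularization step shrinks w toward zero.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if violation > 0.0:
                # Margin constraint violated: move toward the expert's features.
                w = [wi + lr * (fe - fr)
                     for wi, fe, fr in zip(w, features[expert], features[worst])]
    return w


def select(w, features):
    """Pick the demonstration with the highest learned score."""
    return max(features, key=lambda d: score(w, features[d]))


# Toy data: two states, each with candidate demonstrations and an expert choice.
examples = [
    ({"d0": (1.0, 0.0), "d1": (0.0, 1.0), "d2": (0.5, 0.5)}, "d0"),
    ({"a": (2.0, 1.0), "b": (1.0, 2.0)}, "a"),
]
w = train_max_margin(examples, dim=2)
```

At test time, `select(w, features)` ranks the candidate demonstrations for a new state; the hinge penalty lets training tolerate expert choices that no linear ranking can fully separate.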

Max-Margin Q-function Estimation
• Policy Cloning is good, but has some drawbacks
  – Ranking function has no natural interpretation
  – No direct notion of progress
  – No comparisons between states
• We have a bunch of other information
  – Cost function for MDP, Bellman constraints on value function, etc.
• Solution: modify Max-Margin Policy Cloning to learn an approximate Q-function

Max-Margin Q-function Estimation
• Maximize the margin, minimize Bellman error: $\min_{w, \xi_i} \; w^\top w + \sum_i |\xi_i|$
• s.t. $w^\top \phi(s_i, d_{exp}(s_i)) = w^\top \phi(s_{i+1}, d_{exp}(s_{i+1})) - \gamma C + \xi_i$
• Prefer expert selections: $w^\top \phi(s, d_{exp}(s)) \geq w^\top \phi(s, d') + 1; \; \forall s$
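To make the Bellman term concrete, the sketch below evaluates the slacks $\xi_i$ and the objective for a given weight vector along one expert trajectory. It rearranges the equality constraint as written on the slide, $\xi_i = w^\top \phi(s_i, d_{exp}(s_i)) - w^\top \phi(s_{i+1}, d_{exp}(s_{i+1})) + \gamma C$. It only evaluates the objective and does not solve the program; the function names and the toy numbers are hypothetical.

```python
def q_value(w, phi):
    """Approximate Q-value w^T phi(s, d)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def bellman_slacks(w, expert_features, gamma, cost):
    """Slack xi_i for each consecutive pair along an expert trajectory.

    expert_features[i] is phi(s_i, d_exp(s_i)).
    """
    slacks = []
    for phi_now, phi_next in zip(expert_features, expert_features[1:]):
        slacks.append(q_value(w, phi_now) - q_value(w, phi_next) + gamma * cost)
    return slacks


def objective(w, expert_features, gamma, cost):
    """w^T w plus the summed absolute Bellman slack."""
    margin_term = sum(wi * wi for wi in w)
    bellman_term = sum(abs(x) for x in bellman_slacks(w, expert_features, gamma, cost))
    return margin_term + bellman_term


# One-dimensional toy features where the Q-value rises by gamma * C per step.
features = [(1.0,), (2.0,), (3.0,)]
consistent = objective([1.0], features, gamma=0.5, cost=2.0)
inconsistent = objective([1.0], list(reversed(features)), gamma=0.5, cost=2.0)
```

A weight vector that satisfies the Bellman equality exactly pays only the $w^\top w$ term; any mismatch adds to the objective through $|\xi_i|$.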

Evaluation on Overhand Knot-Tying
• Distribution over initial states
  – Initial states from demonstrations with 10cm perturbations at 7 random locations along rope
• Compare success rate for tying overhand knot on 500 perturbed instances
[Figure: example initial state; samples from perturbed distribution]

Evaluation on Overhand Knot-Tying
Success rate (% of problems solved):
• [Schulman et al. ISRR '13]: 56%
• Max Margin Policy Cloning: 72%
• Max Margin Q-function Estimation: 79%

Search
• We have an estimate of the Q-function
• If we have access to a simulator, we can do a local expansion of the state space graph
• Select the action that maximizes the Q-function at the search horizon
• Large Branching Factor → Beam Search
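The search described above can be sketched as a fixed-width beam over simulated states, returning the first action on the path to the best leaf. The slides give only the width and depth used in the evaluation, so how nodes are ranked inside the beam is an assumption here: each simulated state is valued by its best estimated Q-value. The `simulate` and `q_estimate` callables and the integer toy problem are hypothetical.

```python
def beam_search_action(state, actions, simulate, q_estimate, width=10, depth=2):
    """Return the first action on the path to the highest-valued leaf.

    simulate(state, action) -> next state (assumed deterministic simulator).
    q_estimate(state, action) -> estimated Q-value (higher is better).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    beam = [(None, state)]  # entries: (first action taken, current state)
    candidates = []
    for _ in range(depth):
        candidates = []
        for first, s in beam:
            for a in actions:
                nxt = simulate(s, a)
                # Value a simulated state by its best estimated Q-value.
                value = max(q_estimate(nxt, b) for b in actions)
                candidates.append((value, a if first is None else first, nxt))
        # Keep only the `width` most promising nodes.
        candidates.sort(key=lambda c: c[0], reverse=True)
        candidates = candidates[:width]
        beam = [(first, s) for _, first, s in candidates]
    return candidates[0][1]


# Toy problem: integer states, actions add 1 or 2, states near 6 score highest.
def simulate(s, a):
    return s + a


def q_estimate(s, a):
    return -abs((s + a) - 6)


best_first_action = beam_search_action(0, (1, 2), simulate, q_estimate,
                                       width=10, depth=2)
```

In the demonstration-selection setting, the actions would be the demonstrations in the library and `simulate` would run a transfer in the simulator.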

Evaluation on Overhand Knot-Tying
Success rate (% of problems solved):
• [Schulman et al. '13]: 56%
• Max Margin Policy Cloning: 72%
• Max Margin Q-function Estimation: 79%
• Beam Search (Width 10, Depth 2): 94%

Next Steps
• More difficult tasks
  – More complex knots → longer time horizon
• Other robots
  – Humanoid robot demonstration from motion capture
  – More complicated end effectors
• Transferring more than trajectories?
  – Linear feedback controllers? Arbitrary policies?
