Guided Policy Search, Sergey Levine

Jan 16, 2023

SLIDE 1

Guided Policy Search

Sergey Levine

SLIDE 2

Learning on PR2

SLIDE 3

Shape sorting cube

SLIDE 4

Visuomotor Policies

SLIDE 5

Guided Policy Search

supervised learning

trajectory optimization

SLIDE 6

expectation under current policy

trajectory distribution(s)

Lagrange multiplier
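The three labels on this slide read like annotations on a constrained objective whose equation did not survive in the transcript. As a sketch only, one way such an objective is often written for guided policy search is given below. All symbols here are assumptions rather than content recovered from the slide: a trajectory cost \ell(\tau), trajectory distribution(s) p(\tau), a policy \pi_\theta, and Lagrange multipliers \lambda_t.

```latex
% Constrained problem: the trajectory distribution and the policy must agree.
\min_{\theta,\,p}\; \mathbb{E}_{p(\tau)}\big[\ell(\tau)\big]
\quad \text{s.t.} \quad
p(\mathbf{u}_t \mid \mathbf{x}_t) = \pi_\theta(\mathbf{u}_t \mid \mathbf{x}_t)
\quad \forall t

% A Lagrangian relaxation, with one multiplier per time step.
\mathcal{L}(\theta, p, \lambda) =
\mathbb{E}_{p(\tau)}\big[\ell(\tau)\big]
+ \sum_{t=1}^{T} \lambda_t \,
D_{\mathrm{KL}}\big( p(\mathbf{x}_t)\,\pi_\theta(\mathbf{u}_t \mid \mathbf{x}_t)
\,\big\|\, p(\mathbf{x}_t, \mathbf{u}_t) \big)
```

Under this reading, trajectory optimization updates p, supervised learning updates \theta, and the multipliers \lambda_t are increased where the two still disagree. The exact form used on the slide may differ; see the papers cited on the later slides.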

SLIDE 7

Supervised Learning Objective

SLIDE 8

Trajectory Optimization (without GPS)

SLIDE 9

Trajectory Optimization

SLIDE 10

new

Trajectory Optimization

[see Levine & Abbeel ’14 for details]

SLIDE 11

[see L. et al. NIPS ’14 for details]

SLIDE 12

Trajectory Optimization (with GPS)

SLIDE 13

[see L. et al. NIPS ’14 for details]

SLIDE 14

Instrumented Training

training time

test time

SLIDE 15

~ 92,000 parameters

Chelsea Finn

SLIDE 16

Experimental Tasks

SLIDE 17

Shape sorting cube

SLIDE 18

Hanger

SLIDE 19

Hammer

SLIDE 20

Bottle

SLIDE 21

Locomotion

Igor Mordatch

better trajectory optimization + large scale simulation

SLIDE 22

Darwin Robot

Igor Mordatch

better trajectory optimization + large scale simulation + adaptation to real world dynamics

Mordatch, Mishra, Eppner, Abbeel

SLIDE 23

Guided Policy Search Applications

manipulation

locomotion

with N. Wagener and P. Abbeel

with V. Kumar and E. Todorov

with V. Koltun

aerial vehicles

with G. Kahn, T. Zhang, P. Abbeel

tensegrity robot

with M. Zhang, K. Caluwaerts, P. Abbeel

dexterous hands

SLIDE 24

DAGGER

mixing coefficient β_i: typically 0.0, except when i = 1, then 1.0
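The values on this slide match the usual choice for the DAgger mixing coefficient β_i (1.0 on the first iteration, 0.0 afterwards), which sets how often the expert rather than the learner acts during data collection. The loop can be sketched roughly as follows. Everything concrete here is a hypothetical stand-in chosen for illustration: the 1-D toy dynamics, the linear `expert_policy`, and the least-squares learner are not from the slides.

```python
import numpy as np

def expert_policy(x):
    # Hypothetical expert: a fixed linear feedback law (assumption).
    return -0.8 * x

def beta(i):
    # Mixing coefficient: 1.0 on the first iteration, 0.0 afterwards.
    return 1.0 if i == 1 else 0.0

def rollout(policy, x0, horizon, rng):
    # Toy 1-D dynamics x' = x + u + noise (assumption); returns visited states.
    states, x = [], x0
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x) + 0.01 * rng.standard_normal()
    return np.array(states)

def fit_linear_policy(states, actions):
    # Supervised step: least-squares fit of u = k * x + b.
    design = np.stack([states, np.ones_like(states)], axis=1)
    w, *_ = np.linalg.lstsq(design, actions, rcond=None)
    return lambda x, w=w: w[0] * x + w[1]

def dagger(n_iters=5, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    data_x, data_u = [], []
    learner = lambda x: 0.0  # never executed on iteration 1, since beta(1) = 1
    for i in range(1, n_iters + 1):
        b = beta(i)
        # Execute the expert with probability beta_i, otherwise the learner.
        def mixed(x, b=b, learner=learner):
            return expert_policy(x) if rng.random() < b else learner(x)
        states = rollout(mixed, x0=1.0, horizon=horizon, rng=rng)
        # Label every visited state with the expert's action and aggregate.
        data_x.append(states)
        data_u.append(expert_policy(states))
        learner = fit_linear_policy(np.concatenate(data_x),
                                    np.concatenate(data_u))
    return learner
```

Calling `dagger()` returns the learner fitted on the aggregated dataset. Because the toy expert is itself linear and its labels are noise-free, the fitted policy closely reproduces it; with a real expert and a richer policy class the fit would only be approximate.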

SLIDE 25

DAGGER Video

See http://videolectures.net/aistats2011_ross_reduction/

SLIDE 26

Trajectory Optimization – Dynamics Fitting
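The transcript keeps only this slide's title, so the following is an assumption about what "dynamics fitting" refers to: estimating a separate linear model x_{t+1} ≈ A_t x_t + B_t u_t + c_t at each time step from sampled rollouts. The sketch uses plain least squares and a hypothetical function name. It omits any prior over the dynamics and any noise covariance estimate, both of which a fuller implementation might include.

```python
import numpy as np

def fit_time_varying_linear_dynamics(X, U):
    """Fit x_{t+1} ~ A_t x_t + B_t u_t + c_t independently for each step t.

    X: states, shape (N, T + 1, dx); U: controls, shape (N, T, du),
    where N is the number of sampled rollouts.
    Returns A (T, dx, dx), B (T, dx, du), c (T, dx).
    """
    N, T, du = U.shape
    dx = X.shape[2]
    A = np.zeros((T, dx, dx))
    B = np.zeros((T, dx, du))
    c = np.zeros((T, dx))
    for t in range(T):
        # Regress next states on [x_t, u_t, 1] across the N samples.
        Z = np.hstack([X[:, t], U[:, t], np.ones((N, 1))])
        W, *_ = np.linalg.lstsq(Z, X[:, t + 1], rcond=None)
        # Rows of W stack A_t^T, B_t^T and c_t^T.
        A[t] = W[:dx].T
        B[t] = W[dx:dx + du].T
        c[t] = W[-1]
    return A, B, c
```

Each time step needs at least dx + du + 1 samples for the regression to be well determined, which is one reason a prior shared across time steps can help when rollouts are expensive.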

SLIDE 27

[see L. et al. NIPS ’14 for details]

SLIDE 28

Learned Motion Skills

SLIDE 29

More Visuomotor Experiments

SLIDE 30

Beyond Instrumented Training

training time

test time

Finn, Tan, Duan, Darrell, L., Abbeel ’15

SLIDE 31

Learning Visual State Spaces

SLIDE 32

Visual State Space Experiments