Guided Policy Search Sergey Levine Learning on PR2 Shape sorting - - PowerPoint PPT Presentation



SLIDE 1

Guided Policy Search

Sergey Levine

SLIDE 2

Learning on PR2

SLIDE 3

Shape sorting cube

SLIDE 4

Visuomotor Policies

SLIDE 5

Guided Policy Search

supervised learning

trajectory optimization

SLIDE 6

expectation under current policy

trajectory distribution(s)

Lagrange multiplier
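
The equation these three labels annotate did not survive extraction. As a hedged sketch only, one formulation from the guided policy search literature that involves the same ingredients (an expectation under the policy, per-time-step trajectory distributions, and Lagrange multipliers) is the constrained problem below and its Lagrangian. The symbols are assumptions, not recovered from the slide: p is the trajectory distribution, \pi_\theta the policy, \ell the cost, and \lambda_t the multiplier at time step t.

```latex
\min_{\theta,\,p}\; E_{p(\tau)}\big[\ell(\tau)\big]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\big(p(x_t)\,\pi_\theta(u_t \mid x_t)\,\big\|\,p(x_t, u_t)\big) = 0
\quad \forall t

\mathcal{L}(\theta, p, \lambda)
= E_{p(\tau)}\big[\ell(\tau)\big]
+ \sum_{t=1}^{T} \lambda_t\,
  D_{\mathrm{KL}}\big(p(x_t)\,\pi_\theta(u_t \mid x_t)\,\big\|\,p(x_t, u_t)\big)
```

Under this reading, trajectory optimization updates p, supervised learning updates \theta, and the multipliers \lambda_t are increased wherever the two still disagree.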

SLIDE 7

Supervised Learning Objective
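
The objective itself is not preserved in this transcript. As a minimal illustration of what a supervised step can look like, the sketch below fits a scalar linear policy u = k*x + b to state-action samples by ordinary least squares. For a Gaussian policy with fixed variance, minimizing squared error is the same as maximizing the likelihood of the sampled actions. The function name `fit_linear_policy`, the linear policy class, and the toy data are assumptions made for illustration, not details from the talk.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of u = k * x + b to (state, action) samples."""
    n = len(states)
    mean_x = sum(states) / n
    mean_u = sum(actions) / n
    # Slope is covariance(x, u) divided by variance(x).
    cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in zip(states, actions))
    var_x = sum((x - mean_x) ** 2 for x in states)
    k = cov_xu / var_x
    b = mean_u - k * mean_x
    return k, b


# Toy "guiding" samples from a hypothetical controller u = -2x + 0.5.
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
actions = [-2.0 * x + 0.5 for x in states]
k, b = fit_linear_policy(states, actions)
```

Because the toy actions are an exact linear function of the states, the fit recovers the generating slope and offset up to floating-point error.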

SLIDE 8

Trajectory Optimization (without GPS)
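
The slide's content is not preserved here. As a generic illustration of trajectory optimization, the sketch below solves a scalar finite-horizon LQR problem with a backward Riccati recursion. The dynamics x' = a*x + b*u, the quadratic cost q*x**2 + r*u**2, and all numeric values are assumptions chosen for the example; this is not claimed to be the method shown on the slide.

```python
def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x' = a*x + b*u with cost q*x**2 + r*u**2.

    Returns time-ordered feedback gains K_0 .. K_{T-1}, where u_t = K_t * x_t.
    """
    p = q  # cost-to-go coefficient at the final state
    gains = []
    for _ in range(horizon):
        k = -(b * p * a) / (r + b * p * b)
        # Equivalent to p = q + a^2 p - (a p b)^2 / (r + b^2 p).
        p = q + a * p * a + a * p * b * k
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains


def rollout(x0, a, b, gains):
    """Apply the feedback gains from x0 and return the visited states."""
    xs = [x0]
    for k in gains:
        u = k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


gains = lqr_gains(a=1.0, b=0.5, q=1.0, r=0.1, horizon=20)
traj = rollout(1.0, 1.0, 0.5, gains)
```

With these values every gain is negative, so the controller pushes the state back toward zero over the horizon.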

SLIDE 9

Trajectory Optimization

SLIDE 10

new

old

Trajectory Optimization

[see Levine & Abbeel ’14 for details]

SLIDE 11

[see Levine et al., NIPS ’14, for details]

SLIDE 12

Trajectory Optimization (with GPS)

SLIDE 13

[see Levine et al., NIPS ’14, for details]

SLIDE 14

Instrumented Training

training time

test time

SLIDE 15

~ 92,000 parameters

Chelsea Finn

SLIDE 16

Experimental Tasks

SLIDE 17

Shape sorting cube

SLIDE 18

Hanger

SLIDE 19

Hammer

SLIDE 20

Bottle

SLIDE 21

Locomotion

Igor Mordatch

better trajectory optimization + large scale simulation

SLIDE 22

Darwin Robot

Igor Mordatch

better trajectory optimization + large scale simulation + adaptation to real world dynamics

Mordatch, Mishra, Eppner, Abbeel

SLIDE 23

Guided Policy Search Applications

manipulation (with N. Wagener and P. Abbeel)

dexterous hands (with V. Kumar and E. Todorov)

locomotion (with V. Koltun)

aerial vehicles (with G. Kahn, T. Zhang, P. Abbeel)

tensegrity robot (with M. Zhang, K. Caluwaerts, P. Abbeel)

SLIDE 24

DAGGER

mixing coefficient β_i: typically 0.0, except when i = 1, then 1.0
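
In DAgger, iteration i rolls out a mixture policy that follows the expert with probability β_i and the current learner otherwise, labels every visited state with the expert's action, and refits the learner on all data gathered so far. The note above matches the common schedule β_1 = 1 and β_i = 0 afterwards. The sketch below is a toy instance under stated assumptions: the scalar system, the linear `expert`, the through-origin least-squares learner, and the noise level are all hypothetical and chosen only to keep the loop readable.

```python
import random


def expert(x):
    """Hypothetical expert controller used to label states."""
    return -0.8 * x


def fit(data):
    """Least-squares fit of u = k * x (through the origin) to (x, u) pairs."""
    return sum(x * u for x, u in data) / sum(x * x for x, _ in data)


def dagger(n_iters=5, horizon=20, seed=0):
    rng = random.Random(seed)
    data = []  # aggregated dataset across all iterations
    k = 0.0    # learner parameter; unused in iteration 1 because beta = 1
    for i in range(1, n_iters + 1):
        beta = 1.0 if i == 1 else 0.0
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            u_expert = expert(x)
            data.append((x, u_expert))  # always label with the expert action
            # Execute the expert with probability beta, else the learner.
            u = u_expert if rng.random() < beta else k * x
            x = x + u + rng.gauss(0.0, 0.05)
        k = fit(data)  # retrain on everything collected so far
    return k, len(data)


k_learned, n_samples = dagger()
```

Because the toy expert is itself linear, the learner matches it after the first iteration; the later iterations only show how states visited by the learner are added to the dataset.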

SLIDE 25

DAGGER Video

See http://videolectures.net/aistats2011_ross_reduction/

SLIDE 26

Trajectory Optimization – Dynamics Fitting
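
The slide body is not preserved. As a generic illustration of dynamics fitting, the sketch below estimates a scalar linear model x_next = a*x + b*u from sampled transitions by solving the 2x2 normal equations. The time-invariant scalar model, the function name `fit_linear_dynamics`, and the synthetic data are assumptions for the example, not details recovered from the talk.

```python
import random


def fit_linear_dynamics(transitions):
    """Least-squares fit of x_next = a*x + b*u from (x, u, x_next) tuples."""
    sxx = sum(x * x for x, u, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for x, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    # Solve [[sxx, sxu], [sxu, suu]] @ [a, b] = [sxy, suy] by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


# Synthetic rollout from a hypothetical system with small Gaussian noise.
rng = random.Random(0)
true_a, true_b = 0.9, 0.3
transitions = []
x = 1.0
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    x_next = true_a * x + true_b * u + rng.gauss(0.0, 0.01)
    transitions.append((x, u, x_next))
    x = x_next

a_hat, b_hat = fit_linear_dynamics(transitions)
```

A fitted model of this kind can then stand in for the unknown dynamics inside a trajectory optimizer.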

SLIDE 27

[see Levine et al., NIPS ’14, for details]

SLIDE 28

Learned Motion Skills

SLIDE 29

More Visuomotor Experiments

SLIDE 30

Beyond Instrumented Training

training time

test time

Finn, Tan, Duan, Darrell, Levine, Abbeel ’15

SLIDE 31

Learning Visual State Spaces

SLIDE 32

Visual State Space Experiments