Policy Continuation with Hindsight Inverse Dynamics

SLIDE 1

Policy Continuation with Hindsight Inverse Dynamics

Hao Sun1, Zhizhong Li1, Xiaotong Liu2, Dahua Lin1, Bolei Zhou1

1 The Chinese University of Hong Kong 2 Peking University

sh018@ie.cuhk.edu.hk

SLIDE 2

Goal-Oriented Reward Sparse Tasks

Start Goal

SLIDE 3

Inspirations from Human Learning

  • 1. Learning from failures

[Hindsight Experience Replay, M Andrychowicz et al. 2017]

Aimed Achieved
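
A minimal sketch of the "learning from failures" idea, assuming a hypothetical transition layout of (state, action, next_state, goal) and the simplest relabeling choice, where the final state reached stands in for the goal that was aimed at. The names here are illustrative and do not come from the slides.

```python
from typing import List, Tuple

# Hypothetical transition layout: (state, action, next_state, goal).
Transition = Tuple[float, float, float, float]


def relabel_with_achieved_goal(episode: List[Transition]) -> List[Transition]:
    """Replace the aimed goal with the state the episode actually reached."""
    achieved = episode[-1][2]  # final next_state is treated as the achieved goal
    return [(s, a, s_next, achieved) for (s, a, s_next, _aimed) in episode]


# A failed episode: the agent aimed for 5.0 but only reached 2.0.
failed = [(0.0, 1.0, 1.0, 5.0), (1.0, 1.0, 2.0, 5.0)]
relabeled = relabel_with_achieved_goal(failed)
```

After relabeling, the same transitions describe a successful episode toward the achieved goal, so they can serve as training signal even though the original goal was missed.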

SLIDE 5

Inspirations from Human Learning

  • 1. Learning from failures
  • 2. Extrapolating Success

Learned Extrapolate

SLIDE 6

Our Proposed Method

  • 1. Hindsight (ID → HID)
  • 2. Extrapolate
  • 3. Policy Continuation

SLIDE 7

Equip Inverse Dynamics with Hindsight

Inverse Dynamics:

State Goal

Hindsight Inverse Dynamics:
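
The formulas on this slide did not survive extraction. One hedged way to write the two models, using notation assumed here rather than taken from the slide ($\phi$ for the inverse dynamics model, $m$ for a map from a state to the goal it achieves):

```latex
\begin{align}
a_t &= \phi(s_t, s_{t+1}) && \text{(inverse dynamics)} \\
a_t &= \phi(s_t, g_{t+1}), \quad g_{t+1} = m(s_{t+1}) && \text{(hindsight inverse dynamics)}
\end{align}
```

Under this reading, the only change is that the next state is replaced by the goal it achieves in hindsight, so the model maps a (state, goal) pair to an action.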

SLIDE 8

1-step HID Is Not Enough

Linear Case Non-linear Case 1-step HID
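
A toy sketch of one possible reading of the linear case, assuming hypothetical 1-D dynamics s_next = s + a and a linear policy fitted by least squares on one-step hindsight data. The dynamics, data ranges, and function names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear dynamics (an assumption for illustration): s_next = s + a.
states = rng.uniform(-1.0, 1.0, size=200)
actions = rng.uniform(-0.1, 0.1, size=200)
next_states = states + actions

# Hindsight: the state reached after one step is relabeled as the goal.
goals = next_states

# Fit a linear policy a = w_s * s + w_g * g by least squares.
features = np.stack([states, goals], axis=1)
weights, *_ = np.linalg.lstsq(features, actions, rcond=None)


def policy(state: float, goal: float) -> float:
    """Action proposed by the fitted linear one-step HID policy."""
    return float(weights[0] * state + weights[1] * goal)
```

With these linear dynamics the fit should recover roughly a = g - s, which also holds for goals far outside the one-step training data. When the dynamics are non-linear, or actions are bounded, a one-step fit of this kind carries no such guarantee, which is the situation the slide title points at.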

SLIDE 9

Multi-step Optimality?

Policy Continuation: Test the optimality recursively

step 1 step 2

In 1 step?

SLIDE 10

Multi-step Optimality?

Policy Continuation: Test the optimality recursively

step 1 step 2

In 1 step?

step 1 step k

In less than k-1 steps?
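
The recursive test can be sketched as follows, assuming a hypothetical 1-D world with bounded actions and reading the slide's question as "can the current policy reach this goal in at most k-1 steps?". All function names and the toy dynamics are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable

Policy = Callable[[float, float], float]
Step = Callable[[float, float], float]


def reachable_within(policy: Policy, step: Step, state: float, goal: float,
                     max_steps: int, tol: float = 1e-6) -> bool:
    """Roll out `policy` and report whether `goal` is reached in at most max_steps."""
    for _ in range(max_steps):
        if abs(state - goal) <= tol:
            return True
        state = step(state, policy(state, goal))
    return abs(state - goal) <= tol


def is_k_step_sample(policy: Policy, step: Step, state: float, goal: float,
                     k: int) -> bool:
    """Keep a (state, goal) pair reached in k steps only if the current
    policy cannot already reach the goal in at most k-1 steps."""
    return not reachable_within(policy, step, state, goal, max_steps=k - 1)


# Toy 1-D world with actions clipped to [-1, 1] (illustrative assumption).
def step(state: float, action: float) -> float:
    return state + max(-1.0, min(1.0, action))


def greedy_policy(state: float, goal: float) -> float:
    return goal - state


needs_two_steps = is_k_step_sample(greedy_policy, step, 0.0, 2.0, k=2)
solved_in_one = is_k_step_sample(greedy_policy, step, 0.0, 1.0, k=2)
```

Pairs that pass the test are the ones worth adding to the k-step training data; pairs the shorter-horizon policy already solves are skipped, so the check can be repeated for k = 2, 3, and so on.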

SLIDE 11

East Exhibition Hall B + C #194