SLIDE 1

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn
University of California, Berkeley

SLIDES 2-7

Motivation: a well-specified reward function remains an important assumption for applying RL in practice, both in simulation and in the real world.

  • It is often easier to provide expert demonstrations and learn a reward function using inverse RL.
  • Inverse RL frequently requires a lot of data to learn a generalizable reward.
  • This is due in part to the fundamental ambiguity of reward learning: many different rewards can explain the same demonstrations.

[Figure: simulation and real-world examples]
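For background, this line of work builds on the maximum-entropy IRL model (Ziebart et al., 2008); the equations below are standard background rather than content shown on the slides. Demonstrated trajectories are modeled as exponentially more likely when their reward is higher:

\[
p(\tau \mid \theta) = \frac{\exp\big(r_\theta(\tau)\big)}{Z(\theta)}, \qquad Z(\theta) = \sum_{\tau'} \exp\big(r_\theta(\tau')\big),
\]

and the log-likelihood gradient contrasts the demonstrated trajectory with the model's expected behavior:

\[
\nabla_\theta \log p(\tau \mid \theta) = \nabla_\theta r_\theta(\tau) - \mathbb{E}_{\tau' \sim p(\cdot \mid \theta)}\big[\nabla_\theta r_\theta(\tau')\big].
\]

With only a few demonstrations this objective is underdetermined, which is the ambiguity noted above.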

SLIDES 8-12

Goal: how can agents infer rewards from one or a few demonstrations?

Intuition: demonstrations from previous tasks induce a prior over the space of possible future tasks.

Shared context → efficient adaptation
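One way to make the prior-over-intent intuition concrete (an interpretation consistent with the paper's framing; this equation is not shown on the slides): inferring the reward for a new task can be viewed as posterior inference,

\[
p(\theta \mid \mathcal{D}_{\text{new}}) \propto p(\mathcal{D}_{\text{new}} \mid \theta)\, p(\theta),
\]

where \(p(\mathcal{D}_{\text{new}} \mid \theta)\) is the MaxEnt IRL likelihood above and the prior \(p(\theta)\) is induced by demonstrations from previous tasks. The meta-learned initialization described next plays the role of this prior.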

SLIDES 13-15

Meta-inverse reinforcement learning: using prior task information to accelerate inverse RL.

SLIDES 16-19

Background for our instantiation: model-agnostic meta-learning (MAML), which learns an initialization of the model parameters from which a small number of gradient steps on a new task yields good performance.
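A minimal, self-contained sketch of the MAML inner/outer loop on a toy linear-regression problem; the toy tasks, loss, and names here are illustrative assumptions for exposition, not the authors' code:

```python
import torch

def forward(w, x):
    return x @ w  # toy linear model with parameter vector w

def task_loss(w, x, y):
    return ((forward(w, x) - y) ** 2).mean()

def maml_meta_step(w, tasks, inner_lr=0.1, outer_lr=0.01):
    """One meta-gradient step over a batch of tasks.

    tasks: list of (x_support, y_support, x_query, y_query) tensors.
    """
    meta_loss = 0.0
    for xs, ys, xq, yq in tasks:
        # Inner loop: one gradient step from the shared initialization w.
        grad = torch.autograd.grad(task_loss(w, xs, ys), w, create_graph=True)[0]
        w_adapted = w - inner_lr * grad
        # Outer objective: loss of the adapted parameters on held-out data.
        meta_loss = meta_loss + task_loss(w_adapted, xq, yq)
    # Meta-update: differentiate through the inner adaptation step.
    meta_grad = torch.autograd.grad(meta_loss / len(tasks), w)[0]
    return (w - outer_lr * meta_grad).detach().requires_grad_(True)

# Example usage on randomly generated toy tasks:
torch.manual_seed(0)
w = torch.zeros(5, requires_grad=True)
tasks = []
for _ in range(4):
    w_true = torch.randn(5)
    xs, xq = torch.randn(10, 5), torch.randn(10, 5)
    tasks.append((xs, xs @ w_true, xq, xq @ w_true))
w = maml_meta_step(w, tasks)
```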

SLIDES 20-22

Our approach: Meta Reward and Intention Learning (MANDRIL) applies this adaptation scheme to reward learning, meta-training an initialization of the reward function that can be adapted to a new task from one or a few demonstrations.
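Concretely, the meta-objective takes the MAML form with the IRL loss in both loops (notation follows the standard MAML write-up; the slides do not show the equation):

\[
\min_\theta \sum_{\mathcal{T}_i} \mathcal{L}^{\text{IRL}}_{\mathcal{T}_i}\!\big(\theta - \alpha \nabla_\theta \mathcal{L}^{\text{IRL}}_{\mathcal{T}_i}(\theta)\big),
\]

where \(\mathcal{L}^{\text{IRL}}\) is the negative MaxEnt IRL log-likelihood of a task's demonstrations and \(\alpha\) is the inner-loop step size.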

SLIDES 23-26

Domain 1: SpriteWorld environment

  • Each task is a specific landmark-navigation task.
  • Every task exhibits the same terrain preferences.
  • At evaluation time, landmark positions vary and unseen sprites are used.

[Figure: meta-training tasks vs. evaluation-time tasks]

SLIDES 27-31

Domain 2: First-person navigation (SUNCG)

  • Tasks require learning both navigation (NAV) and picking (PICK).
  • Tasks share a common theme but differ in visual layout and in the specific goal.

[Figure: task illustration and agent view]

SLIDES 32-35

Results: with only a limited number of demonstrations, performance is significantly better than the comparison methods.

SLIDE 36

Results: optimizing the initial weights consistently improves performance across tasks. The success rate improves significantly on both test and unseen house layouts, especially on the harder PICK task.

SLIDES 37-41

The reward function can be adapted with a limited number of demonstrations.
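At test time, adaptation amounts to a few gradient steps from the meta-learned initialization. A sketch reusing the toy `task_loss` from the MAML example above (illustrative, not the authors' implementation):

```python
def adapt(w_meta, demo_x, demo_y, inner_lr=0.1, steps=3):
    # Few-shot adaptation: start from the meta-learned initialization
    # and take a handful of gradient steps on the new task's demonstrations.
    w = w_meta.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(task_loss(w, demo_x, demo_y), w)[0]
        w = (w - inner_lr * grad).detach().requires_grad_(True)
    return w
```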

SLIDE 42

Thanks! Tuesday, Poster #222

Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn