Meta-Reinforcement Learning of Structured Exploration Strategies - PowerPoint PPT Presentation


SLIDE 1

Meta-Reinforcement Learning of Structured Exploration Strategies

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine

SLIDE 2

Human Exploration vs Robot Exploration



SLIDE 7

Exploration Informed by Prior Experience

Desired:

  • Effective exploration for sparse rewards
  • Quick adaptation for new tasks


SLIDE 11
Key Insights in MAESN

  • 1. Explore with random but structured behaviors (exploration)
  • 2. Explicitly train for quick learning on new tasks (adaptation)

(Figure: fast learning on a new task, "Grasp red object")


SLIDE 13

Using Structured Stochasticity

Structured exploration: pick an intention and execute it for the entire episode; explore across different intentions.

(Figure: structured exploration vs per-timestep exploration)

SLIDE 14

Latent Conditioned Policies

Structured stochasticity is introduced through a latent-conditioned policy: sample a latent z ∼ q_ω(·) from the latent space, then condition the policy on it, a_t ∼ π(a_t | s_t, z).

Train latent space to capture prior task distribution
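One way to sketch training the latent space (assumed details, in the style of variational methods: per-task variational parameters, reparameterized sampling, and a KL regularizer toward the prior the policy samples from at test time):

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM = 2

# Hypothetical per-task variational parameters: each training task keeps its
# own q(z) = N(mu, diag(sigma^2)) over the shared latent space.
mu = np.zeros(LATENT_DIM)
log_sigma = np.zeros(LATENT_DIM)

def sample_latent(mu, log_sigma):
    # Reparameterized sample z = mu + sigma * eps, so gradients can flow
    # into the variational parameters during training.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ): the regularizer that keeps
    # per-task latents close to the prior used for exploration on new tasks.
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2.0 * log_sigma)
```

At the prior (mu = 0, sigma = 1) the KL is zero; as a task's latent distribution drifts from the prior, the penalty grows, which is what ties the learned latent space to the prior task distribution.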


SLIDE 19

Beyond capturing task distribution, train for quick adaptation via meta-learning

Meta-Training Latent Spaces

Meta-train the latent space and policy so that one step of RL adapts to each task.

Train with an algorithm based on Model-Agnostic Meta-Learning [1]

[1] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al., ICML 2017
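A toy 1-D stand-in for the MAML-style objective (illustrative only; the actual method adapts each task's latent-distribution parameters with policy-gradient steps): each "task" t has loss L_t(x) = 0.5 (x - t)^2, adaptation is one inner gradient step, and the meta-update descends the post-adaptation loss through the inner step by the chain rule.

```python
import numpy as np

# Toy tasks: each task t_i wants x near t_i, with loss L_i(x) = 0.5 * (x - t_i)^2.
tasks = np.array([-2.0, 0.5, 3.0])
alpha, beta = 0.1, 0.05   # inner (adaptation) and outer (meta) step sizes

x0 = 5.0                  # meta-learned initialization
for _ in range(2000):
    meta_grad = 0.0
    for t in tasks:
        # Inner step: adapt the initialization on this task's loss gradient.
        x_adapted = x0 - alpha * (x0 - t)
        # Outer gradient of L(x_adapted) w.r.t. x0, via the chain rule:
        # dL/dx0 = (x_adapted - t) * d(x_adapted)/dx0 = (x_adapted - t) * (1 - alpha).
        meta_grad += (x_adapted - t) * (1.0 - alpha)
    x0 -= beta * meta_grad / len(tasks)

# x0 converges toward tasks.mean() = 0.5, the initialization from which one
# gradient step makes the most progress on a task drawn from this distribution.
```

The same structure carries over to the slide's picture: the inner "1 step of RL" per task adapts the latents, and the outer update trains the initialization (latent space and policy) so that this single step suffices.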

SLIDE 21

Experiments: Robotic Manipulation

(Videos: random exploration vs MAESN exploration)


SLIDE 24

Experiments: Legged Locomotion

(Videos: random exploration vs MAESN exploration)


SLIDE 26

Quick Learning of New Tasks

  • Learns very quickly
  • Higher asymptotic reward than prior methods
  • Better exploration


SLIDE 30

Thank You!

Sergey Levine Pieter Abbeel YuXuan Liu Russell Mendonca

Please come visit our poster at Room 210 and 230, AB #134. Find the code and paper online at https://sites.google.com/view/meta-explore/