SLIDE 1

Exploration (Part 2)

CS 285

Instructor: Sergey Levine UC Berkeley

SLIDE 2

Recap: what’s the problem?

[figure: two example settings, one where exploration is easy (mostly), one where it is impossible]

Why?

SLIDE 3

Unsupervised learning of diverse behaviors

What if we want to recover diverse behavior without any reward function at all? Why?

➢ Learn skills without supervision, then use them to accomplish goals
➢ Learn sub-skills to use with hierarchical reinforcement learning
➢ Explore the space of possible behaviors

SLIDE 4

An Example Scenario

training time: unsupervised

How can you prepare for an unknown future goal?

SLIDE 5

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 6

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 7

Some useful identities
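A hedged reconstruction of the first of the standard definitions used here, the entropy of a distribution:

```latex
% Entropy: quantifies how broad p(x) is
\mathcal{H}(p(x)) = -\mathbb{E}_{x \sim p(x)}\left[\log p(x)\right]
```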

SLIDE 8

Some useful identities
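A hedged reconstruction of the second standard definition, mutual information, which the rest of the lecture builds on:

```latex
% Mutual information: KL divergence between the joint and the product of marginals
\mathcal{I}(x; y) = D_{\mathrm{KL}}\big(p(x, y) \,\|\, p(x)\,p(y)\big)
                  = \mathbb{E}_{(x,y) \sim p(x,y)}\left[\log \frac{p(x, y)}{p(x)\,p(y)}\right]
                  = \mathcal{H}(p(y)) - \mathcal{H}(p(y \mid x))
```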

SLIDE 9

Information theoretic quantities in RL

state marginal entropy quantifies coverage
“empowerment” (the mutual information between actions and future states) can be viewed as quantifying “control authority” in an information-theoretic way
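In symbols, a hedged reconstruction of the two quantities this slide most likely refers to:

```latex
% State marginal entropy: high when the policy covers many states
\mathcal{H}(p(s)) = -\mathbb{E}_{s \sim p(s)}\left[\log p(s)\right]

% Empowerment: mutual information between the action and the next state
\mathcal{I}(s_{t+1}; a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1} \mid a_t)
```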

SLIDE 10

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 11

An Example Scenario

training time: unsupervised

How can you prepare for an unknown future goal?

SLIDE 12

Learn without any rewards at all

(but there are many other choices)

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19
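A minimal sketch of the loop these slides describe, not the authors' implementation; `vae`, `policy`, `env`, and their methods are illustrative placeholders assuming a VAE-style generative model over observations:

```python
# Minimal sketch of RL with imagined goals (RIG-style); all names are
# illustrative placeholders, not the authors' API.

def unsupervised_goal_training(env, vae, policy, num_iterations, episode_len):
    for _ in range(num_iterations):
        # 1. Imagine a goal: sample a latent from the prior and decode it.
        z_goal = vae.sample_prior()
        goal = vae.decode(z_goal)

        # 2. Attempt the goal with the goal-conditioned policy.
        obs = env.reset()
        trajectory = []
        for _ in range(episode_len):
            action = policy.act(obs, goal)
            obs = env.step(action)
            trajectory.append((obs, action))

        # 3. Use the collected data to update the policy (e.g., with a reward
        #    measuring closeness to the goal in latent space)...
        policy.update(trajectory, goal)

        # 4. ...and to update the generative model with the new states.
        vae.update([obs for obs, _ in trajectory])
```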

SLIDE 13

Learn without any rewards at all

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

SLIDE 14

Learn without any rewards at all

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

SLIDE 15

How do we get diverse goals?

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

SLIDE 16

How do we get diverse goals?

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

SLIDE 17

How do we get diverse goals?

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19


[figure: sampled goals and reached final states; goals get higher entropy over training due to Skew-Fit]
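The mechanism is simple enough to sketch: Skew-Fit re-weights states by their density raised to a negative power before fitting the goal proposal distribution, so rare states get proposed as goals more often. A hedged sketch, where `density_model` and its `log_prob` method are illustrative placeholders:

```python
import numpy as np

def skew_fit_weights(states, density_model, alpha=-1.0):
    """Re-sampling weights proportional to p(s)^alpha, with alpha in [-1, 0).

    Rare states (low p(s)) get up-weighted, so each round of goal proposals
    has higher entropy than the last.
    """
    log_p = np.array([density_model.log_prob(s) for s in states])
    log_w = alpha * log_p                    # log of p(s)^alpha
    w = np.exp(log_w - log_w.max())          # subtract max for stability
    return w / w.sum()                       # normalize to a distribution
```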

SLIDE 18

How do we get diverse goals?

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

SLIDE 19

Reinforcement learning with imagined goals

Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. ’18
Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. ’19

[figure: an RL episode attempting an imagined goal]

SLIDE 20

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 21

Aside: exploration with intrinsic motivation

SLIDE 22

Can we use this for state marginal matching?

Lee*, Eysenbach*, Parisotto*, Xing, Levine, Salakhutdinov. Efficient Exploration via State Marginal Matching
See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration
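In symbols (a reconstruction consistent with the cited paper, not the slide's exact notation): state marginal matching asks the policy's state marginal p_π(s) to match a target distribution p*(s):

```latex
% Objective: match the policy's state marginal to a target distribution
\min_{\pi} \; D_{\mathrm{KL}}\big(p_\pi(s) \,\|\, p^{\star}(s)\big)

% Equivalently, maximize the intrinsic reward
r(s) = \log p^{\star}(s) - \log p_\pi(s)
```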

SLIDE 23

State marginal matching for exploration

[figure: coverage of MaxEnt on actions vs. variants of SMM; SMM gives much better coverage!]

Lee*, Eysenbach*, Parisotto*, Xing, Levine, Salakhutdinov. Efficient Exploration via State Marginal Matching
See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration

SLIDE 24

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 25

Is state entropy really a good objective?


more or less the same thing

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning
See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration
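One hedged way to formalize the argument this slide gestures at: if an adversary picks the test-time goal distribution after we commit to a training procedure, then preparing for the worst case motivates proposing goals with maximum entropy, which is why state covering and worst-case goal-reaching come out as more or less the same objective:

```latex
% The adversary picks p(g) after we commit to a policy; a max-entropy goal
% proposal distribution is the best preparation for the worst case.
\max_{\pi} \; \min_{p(g)} \; \mathbb{E}_{g \sim p(g)}\big[ r(s, g) \big]
```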

SLIDE 26

In this lecture…

➢ Definitions & concepts from information theory
➢ Learning without a reward function by reaching goals
➢ A state distribution-matching formulation of reinforcement learning
➢ Is coverage of valid states a good exploration objective?
➢ Beyond state covering: covering the space of skills

SLIDE 27

Learning diverse skills

π(a|s, z), where z is a task index

Intuition: different skills should visit different state-space regions

Reaching diverse goals is not the same as performing diverse tasks: not all behaviors can be captured by goal-reaching.

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

SLIDE 28

Diversity-promoting reward function

[diagram: a skill z is fed to the policy (agent), which takes actions in the environment; a discriminator (D) observes the resulting states and tries to predict the skill]

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
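The loop in the diagram can be sketched as follows. This is a minimal illustration of a DIAYN-style update, not the authors' code; `policy`, `discriminator`, and `env` and their methods are assumed placeholder interfaces:

```python
import numpy as np

def diayn_iteration(env, policy, discriminator, num_skills, episode_len):
    """One DIAYN-style iteration; all names are illustrative placeholders."""
    z = np.random.randint(num_skills)        # sample a skill from a uniform prior
    obs = env.reset()
    states, rewards = [], []
    for _ in range(episode_len):
        action = policy.act(obs, z)          # skill-conditioned policy
        obs = env.step(action)
        states.append(obs)
        # Diversity-promoting reward: log q(z | s) - log p(z), high when the
        # discriminator can recover the skill from the visited state.
        rewards.append(discriminator.log_prob(z, obs) - np.log(1.0 / num_skills))
    policy.update(states, rewards, z)        # RL on the intrinsic reward
    discriminator.update(states, z)          # supervised: predict z from s
```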

SLIDE 29

Examples of learned tasks

[videos: Cheetah, Ant, Mountain car]

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

SLIDE 30

A connection to mutual information

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
See also: Gregor et al. Variational Intrinsic Control. 2016
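The connection, in symbols: the scheme above can be read as maximizing the mutual information between the skill and the states it visits:

```latex
% Mutual information between skill z and state s
\mathcal{I}(z; s) = \mathcal{H}(z) - \mathcal{H}(z \mid s)
% H(z) is kept high by sampling z from a fixed (e.g., uniform) prior;
% H(z | s) is driven down by making skills distinguishable from the states
% they visit, with the discriminator's log q(z | s) serving as a
% variational lower bound.
```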