Exploration (Part 2)
CS 285, Instructor: Sergey Levine, UC Berkeley
Recap: what’s the problem?
[Figure annotations: one setting is "easy (mostly)", the other "impossible".]
Why?
Unsupervised learning of diverse behaviors
What if we want to recover diverse behaviors without any reward function at all? Why?
➢Learn skills without supervision, then use them to accomplish goals ➢Learn sub-skills to use with hierarchical reinforcement learning ➢Explore the space of possible behaviors
An Example Scenario
Training time: unsupervised. How can you prepare for an unknown future goal?
In this lecture…
➢ Definitions & concepts from information theory ➢ Learning without a reward function by reaching goals ➢ A state distribution-matching formulation of reinforcement learning ➢ Is coverage of valid states a good exploration objective? ➢ Beyond state covering: covering the space of skills
Some useful identities
Information theoretic quantities in RL
State marginal entropy quantifies coverage; the mutual information between actions and future states can be viewed as quantifying "control authority" in an information-theoretic way.
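These quantities can be sanity-checked numerically. A minimal sketch (the function names are mine, not from the lecture) computing entropy and mutual information for discrete distributions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(p_xy):
    """I(x; y) = H(x) + H(y) - H(x, y) for a discrete joint distribution."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

# Independent variables: I(x; y) = 0.
p_indep = np.outer([0.5, 0.5], [0.25, 0.75])
# Perfectly correlated variables: I(x; y) = H(x) = log 2.
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])

print(mutual_information(p_indep))  # ≈ 0
print(mutual_information(p_corr))   # ≈ 0.693 = log 2
```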
Learn without any rewards at all
Idea: train a generative model over states (e.g., a VAE) and use it to propose goals (but there are many other choices).
Nair*, Pong*, Bahl, Dalal, Lin, Levine. Visual Reinforcement Learning with Imagined Goals. '18. Dalal*, Pong*, Lin*, Nair, Bahl, Levine. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. '19.
How do we get diverse goals?
[Figure: goals get higher entropy due to Skew-Fit; each row shows a sampled goal and the final state reached.]
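The skewing step can be illustrated on a toy discrete distribution. A sketch under the simplifying assumption that the state density is known exactly (in Skew-Fit it is estimated by the generative model):

```python
import numpy as np

def skewed_goal_dist(p_visited, alpha=-1.0):
    """Skew-Fit goal distribution: sampling states from p(s) and weighting
    each sample by p(s)**alpha (alpha in [-1, 0)) yields an effective
    distribution proportional to p(s)**(1 + alpha). alpha = -1 gives a
    uniform (max-entropy) goal distribution over visited states."""
    q = p_visited ** (1.0 + alpha)
    return q / q.sum()

def entropy(p):
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

# Visitation distribution heavily concentrated on one state.
p_visited = np.array([0.7, 0.2, 0.05, 0.05])
q = skewed_goal_dist(p_visited, alpha=-1.0)

print(entropy(p_visited))  # ≈ 0.87 nats
print(entropy(q))          # ≈ 1.39 nats = log 4 (uniform)
```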
Reinforcement learning with imagined goals
[Figure: an imagined goal sampled from the generative model, followed by the RL episode that tries to reach it.]
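The loop can be sketched in a toy 1-D environment. Everything here (the point environment, the Gaussian "generative model", the hand-coded policy) is a stand-in for the learned components in the papers above:

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_step(s, g, max_step=0.3):
    """Stand-in for a learned goal-conditioned policy: step toward the goal."""
    return float(np.clip(g - s, -max_step, max_step))

def run_episode(goal, horizon=20):
    """Roll out the goal-conditioned policy in a toy 1-D point environment."""
    s, states = 0.0, []
    for _ in range(horizon):
        s += greedy_step(s, goal)
        states.append(s)
    return states

# Self-supervised loop: imagine a goal from a model of visited states,
# practice reaching it, then use the new data to update the model.
visited = [0.0]
for _ in range(50):
    mu, sigma = np.mean(visited), np.std(visited) + 0.5  # crude generative model
    goal = rng.normal(mu, sigma)   # step 1: imagine a goal
    visited += run_episode(goal)   # steps 2-3: attempt it, collect data

print(min(visited), max(visited))  # coverage grows with no external reward
```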
Aside: exploration with intrinsic motivation
Can we use this for state marginal matching?
Lee*, Eysenbach*, Parisotto*, Xing, Levine, Salakhutdinov. Efficient Exploration via State Marginal Matching See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration
[Figure legend: MaxEnt on actions vs. variants of SMM]
State marginal matching for exploration
[Figure: much better coverage with SMM than with the MaxEnt-on-actions baselines!]
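The state marginal matching reward has a simple form: r(s) = log p*(s) − log pπ(s), whose expected value under pπ is −D_KL(pπ(s) ‖ p*(s)). A toy discrete sketch (the densities here are given; in practice pπ must be estimated):

```python
import numpy as np

def smm_reward(log_p_target, log_p_pi):
    """State marginal matching reward r(s) = log p*(s) - log p_pi(s).
    Maximizing E_{p_pi}[r(s)] minimizes D_KL(p_pi(s) || p*(s))."""
    return log_p_target - log_p_pi

# Uniform target marginal vs. a skewed current policy marginal.
p_target = np.full(4, 0.25)
p_pi = np.array([0.7, 0.2, 0.05, 0.05])

r = smm_reward(np.log(p_target), np.log(p_pi))
print(r)  # under-visited states get positive reward, over-visited negative

# Sanity check: expected reward equals minus the KL divergence.
print((p_pi * r).sum(), -np.sum(p_pi * np.log(p_pi / p_target)))
```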
Is state entropy really a good objective?
[Annotation: the two objectives are more or less the same thing.]
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning. See also: Hazan, Kakade, Singh, Van Soest. Provably Efficient Maximum Entropy Exploration.
Learning diverse skills
Condition the policy on a task index z: π(a | s, z).
Intuition: different skills should visit different state-space regions
Reaching diverse goals is not the same as performing diverse tasks: not all behaviors can be captured by goal-reaching.
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
Diversity-promoting reward function
[Diagram: a skill z is fed to the policy (agent), which takes actions in the environment; a discriminator D observes the resulting states and tries to predict the skill.]
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
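A minimal sketch of the diversity-promoting reward, assuming a fixed discriminator output for a single state (in DIAYN the discriminator is trained jointly with the policy):

```python
import numpy as np

def diversity_reward(log_q_z_given_s, z, log_p_z):
    """DIAYN-style reward r(s, z) = log q(z | s) - log p(z): the policy is
    paid for visiting states from which the discriminator can tell which
    skill is being executed."""
    return log_q_z_given_s[z] - log_p_z

# Discriminator output q(z | s) for one state, over 3 skills; uniform prior.
q = np.array([0.8, 0.15, 0.05])
log_p_z = np.log(1.0 / 3.0)

print(diversity_reward(np.log(q), z=0, log_p_z=log_p_z))  # positive: state identifies skill 0
print(diversity_reward(np.log(q), z=2, log_p_z=log_p_z))  # negative: state does not identify skill 2
```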
Examples of learned tasks
[Videos: Cheetah, Ant, Mountain car]
A connection to mutual information
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need. See also: Gregor et al. Variational Intrinsic Control. 2016
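The connection alluded to here is the standard variational lower bound on the mutual information between the skill z and the states it visits; a sketch of the argument:

```latex
\begin{align*}
\mathcal{I}(z; s) &= \mathcal{H}(z) - \mathcal{H}(z \mid s) \\
  &= \mathbb{E}_{z,s}\big[\log p(z \mid s)\big] - \mathbb{E}_{z}\big[\log p(z)\big] \\
  &\ge \mathbb{E}_{z,s}\big[\log q(z \mid s)\big] - \mathbb{E}_{z}\big[\log p(z)\big],
\end{align*}
```

where the inequality holds because replacing the true posterior p(z | s) with the discriminator q(z | s) subtracts an expected KL divergence, which is nonnegative. Maximizing the reward log q(z | s) − log p(z) therefore maximizes a variational lower bound on I(z; s).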