Self-Supervised Exploration via Disagreement

Deepak Pathak* (UC Berkeley), Dhiraj Gandhi* (CMU), Abhinav Gupta (CMU, FAIR)
ICML 2019
* equal contribution

Exploration – a major challenge!

• Schmidhuber, Jurgen. “A possibility for implementing curiosity and boredom in model building neural controllers”, 1991.
• Schmidhuber, Jurgen. “Formal theory of creativity, fun, and intrinsic motivation (1990–2010)”, 2010.
• Oudeyer, P.-Y. and Kaplan, F. “What is intrinsic motivation? A typology of computational approaches”. Frontiers in Neurorobotics, 2009.
• Poupart et al. “An analytic solution to discrete Bayesian reinforcement learning”. ICML, 2006.
• Lopes et al. “Exploration in model-based reinforcement learning by empirically estimating learning progress”. NIPS, 2012.
• Bellemare et al. “Unifying count-based exploration and intrinsic motivation”. NIPS, 2016.
• Mohamed et al. “Variational information maximisation for intrinsically motivated reinforcement learning”. NIPS, 2015.
• Houthooft et al. “VIME: Variational information maximizing exploration”. NIPS, 2016.
• Gregor et al. “Variational intrinsic control”. ICLR Workshop, 2017.
• Pathak et al. “Curiosity-driven Exploration by Self-supervised Prediction”. ICML, 2017.
• Ostrovski et al. “Count-based exploration with neural density models”. ICML, 2017.
• Burda*, Edwards*, Pathak* et al. “Large-Scale Study of Curiosity-driven Learning”. ICLR, 2019.
• Eysenbach et al. “Diversity is all you need: Learning skills without a reward function”. ICLR, 2019.
• Savinov et al. “Episodic curiosity through reachability”. ICLR, 2019.


• Sample Inefficient: in Simulation and on Real Robots
• “Stuck” in Stochastic Envs: Curiosity Exploration w/ Noisy TV & Remote [Burda*, Edwards*, Pathak* et al., ICLR’19] [Juliani et al., ArXiv’19]

Why inefficient?

Diagram [Pathak et al. ICML, 2017]:

• current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ → next image $x_{t+1}$
• current image $x_t$ and action $a_t$ → Prediction Model $f(x_t, a_t)$ → predicted next image $\hat{x}_{t+1}$
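The prediction-model step in the diagram can be illustrated with a small sketch. This is not the authors' implementation. It assumes low-dimensional state vectors instead of images, a linear map in place of the Prediction Model f(x_t, a_t), and toy linear dynamics, and it treats the squared gap between the predicted and observed next state as the quantity of interest. All names (`predict_next`, `prediction_error`, `normalized_step`, `W_true`) are hypothetical.

```python
import numpy as np

def predict_next(W, x_t, a_t):
    """Linear stand-in for the prediction model f(x_t, a_t) (an assumption)."""
    return W @ np.concatenate([x_t, a_t])

def prediction_error(W, x_t, a_t, x_next):
    """Half squared error between the predicted and observed next state."""
    diff = predict_next(W, x_t, a_t) - x_next
    return 0.5 * float(diff @ diff)

def normalized_step(W, x_t, a_t, x_next, lr=0.5):
    """One gradient step on the error, scaled by the squared input norm
    so that the residual shrinks by a factor (1 - lr) on this sample."""
    inp = np.concatenate([x_t, a_t])
    diff = W @ inp - x_next
    return W - lr * np.outer(diff, inp) / float(inp @ inp)

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2

# Toy linear dynamics, used only to generate an observed next state.
W_true = rng.normal(size=(state_dim, state_dim + action_dim))
W = np.zeros_like(W_true)  # untrained prediction model

x_t = rng.normal(size=state_dim)    # stands in for the current image
a_t = rng.normal(size=action_dim)   # stands in for the action
x_next = W_true @ np.concatenate([x_t, a_t])

err_before = prediction_error(W, x_t, a_t, x_next)
for _ in range(50):
    W = normalized_step(W, x_t, a_t, x_next)
err_after = prediction_error(W, x_t, a_t, x_next)

print(f"error before training: {err_before:.4f}")
print(f"error after training:  {err_after:.2e}")
```

In this toy setting the error on a transition falls as the model is trained on it, so a signal based on prediction error is high for unfamiliar transitions and low for familiar ones.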
