

SLIDE 1

Unsupervised Subgoal Discovery Method for Learning Hierarchical Representations

Jacob Rafati

Ph.D., Electrical Engineering and Computer Science (EECS)
Computational Cognitive Neuroscience Laboratory (CCNL)
http://rafati.net

Co-authored with David C. Noelle

Professor and Chair of Cognitive and Information Sciences
Founding Faculty of EECS & CSE, Director of CCNL
University of California, Merced

Workshop on Structure and Priors in Reinforcement Learning (SPiRL 2019)
7th International Conference on Learning Representations (ICLR 2019)

SLIDE 2

Reinforcement Learning

Reinforcement learning (RL) is learning how to map situations (states) to an agent's decisions (actions) so as to maximize future rewards (the return) through interaction with an unknown environment. Each experience tuple (s, a, r, s′) serves as data.

Sutton and Barto (2017). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition.
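As a concrete illustration of learning from (s, a, r, s′) tuples, here is a minimal sketch of tabular Q-learning; the discrete environment interface (reset/step) is an assumption for illustration, not something specified in the talk.

```python
import numpy as np

# Minimal sketch of tabular Q-learning (illustrative, not the authors' code).
# Assumes a discrete environment exposing reset() -> state and
# step(action) -> (next_state, reward, done), with integer states.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    q = np.zeros((n_states, n_actions))  # action-value table q(s, a)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(q[s]))
            s_next, r, done = env.step(a)
            # TD update driven by the experience tuple (s, a, r, s')
            target = r + (0.0 if done else gamma * np.max(q[s_next]))
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q
```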

SLIDE 3

[Figure: a parameterized function approximator with weights w maps a state s to action-value estimates q(s, a_i; w), the expectation of return (game scores) for each action; sharing parameters across states provides generalization.]
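A minimal sketch of such a parameterized action-value approximator, here as a small fully connected network in PyTorch; the layer sizes and state encoding are illustrative assumptions, not the architecture from the talk.

```python
import torch
import torch.nn as nn

# Illustrative sketch of a parameterized action-value approximator
# q(s, a_i; w): one output head per action (layer sizes are assumptions).
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # q(s, a_i; w) for every action
        )

    def forward(self, state):
        return self.net(state)

# Usage: values = QNetwork(state_dim=4, n_actions=2)(torch.randn(1, 4))
```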

SLIDE 4

Success in easy tasks, failure in more complex tasks

Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.

SLIDE 5

Learning Representations in Model-Free HRL

  • Temporal Abstraction

Learning to operate over different levels of temporal abstraction; learning a meta-policy to choose a proper subgoal.

  • Intrinsic Motivation Learning

Efficiently exploring the state space while learning reusable subpolicies (skills) through intrinsic motivation learning. An intrinsic critic sends intrinsic rewards based on attaining subgoals (a minimal sketch follows this list).

  • Automatic Subgoal Discovery

Automatic subgoal discovery in large-scale tasks with sparse, delayed feedback within a model-free HRL framework.

  • Learning Hierarchical Representations in Model-Free HRL in a Unified Approach

Integration of temporal abstraction, intrinsic motivation learning, and subgoal discovery in one unified algorithm.
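A minimal sketch of the intrinsic critic idea above: the controller receives an intrinsic reward when the current state attains the chosen subgoal. The proximity test and threshold are illustrative assumptions; in general, attainment depends on how subgoals are represented.

```python
import numpy as np

# Illustrative sketch of an intrinsic critic: rewards the controller for
# attaining the current subgoal g. The distance-based test is an assumption.
def intrinsic_reward(state, subgoal, threshold=1.0):
    attained = np.linalg.norm(np.asarray(state) - np.asarray(subgoal)) < threshold
    return (1.0 if attained else 0.0), attained
```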

SLIDE 6

Meta-Controller/Controller Framework

Kulkarni et al. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. NeurIPS.

SLIDE 7

Unsupervised Subgoal Discovery

Properties of a useful subgoal:

  • It is close to a rewarding state.
  • It represents a set of states, at least some of which tend to be along a state transition path to a rewarding state.

Hypothesis: We can use unsupervised learning methods to find useful subgoals based on a memory of the agent's experiences (rewards and visited states), for example:

  • Centroids of K-means clusters (e.g., rooms)
  • Outliers as potential subgoals (e.g., key, box)
  • Boundary of two clusters (e.g., doorway)
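A minimal sketch of this hypothesis using scikit-learn: cluster the visited states with K-means and treat centroids as candidate subgoals, and flag states with anomalous (unusually large) rewards as additional candidates. The memory layout and the reward-based anomaly rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch of unsupervised subgoal discovery over a memory of
# experiences. Assumes `states` is an (N, d) array of visited states and
# `rewards` the (N,) array of rewards received in them.
def discover_subgoals(states, rewards, k=6, z_thresh=3.0):
    # Centroids of K-means clusters as candidate subgoals (e.g., rooms).
    kmeans = KMeans(n_clusters=k, n_init=10).fit(states)
    centroids = kmeans.cluster_centers_

    # Anomalously rewarding states (e.g., a key or a box) as candidates.
    z = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    outliers = states[z > z_thresh]

    return centroids, outliers
```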

SLIDE 8

Unsupervised Subgoal Discovery

[Figure: K-means clustering of visited states for K = 4, K = 6, and K = 8, and anomaly detection of rewarding outlier states.]

SLIDE 9

Unified Model-Free HRL

[Architecture diagram: at each time step t the agent observes state s_t and reward r_t from the environment and emits action a_t, receiving s_{t+1} and r_{t+1}. Experiences e_t are stored in an experience memory D; the subgoal discovery module mines D for a set of candidate subgoals G; the meta-controller selects a subgoal g ∈ G for the controller to pursue.]
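A minimal sketch of how these pieces might fit together in one loop. The meta_policy and controller_policy callables, the env API, and the distance-based attainment test are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of the unified model-free HRL loop: the meta-controller
# picks a subgoal g from the discovered set G, the controller acts under
# intrinsic rewards until g is attained, and every experience is stored in
# the memory D that feeds unsupervised subgoal discovery.
def unified_hrl_episode(env, meta_policy, controller_policy, D, G,
                        attain_radius=1.0):
    s = env.reset()
    done = False
    while not done:
        g = meta_policy(s, G)                # meta-controller picks a subgoal
        attained = False
        while not (done or attained):
            a = controller_policy(s, g)      # controller acts toward g
            s_next, r, done = env.step(a)
            # Intrinsic critic's attainment test (an assumption here).
            attained = np.linalg.norm(
                np.asarray(s_next) - np.asarray(g)) < attain_radius
            r_in = 1.0 if attained else 0.0  # intrinsic reward
            D.append((s, g, a, r, r_in, s_next))  # experience e_t
            s = s_next
        # Updates for the controller (on r_in) and the meta-controller
        # (on the extrinsic return) are omitted in this sketch.
    return D
```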

SLIDE 10

Results — 4-Rooms task

[Plots over 100,000 training episodes: (a) episode return for the unified model-free HRL method with K = 4, 6, and 8 versus regular RL; (b) success in reaching subgoals (%) for K = 4, 6, and 8; (c) success in solving the task (%) for the unified method versus regular RL; (d) state-space coverage rate over the first 1,000 episodes for intrinsic motivation within unified HRL, intrinsic motivation with random subgoal selection, regular RL, and a random walk.]

SLIDE 11

Montezuma’s Revenge

[Figure: sample frames contrasting exploration by a random walk with our method.]

[Plots over 2.5 million training steps: (a) average return over 1000 episodes and (b) success in reaching subgoals (%), for the unified model-free HRL method versus the DeepMind DQN algorithm (Mnih et al., 2015).]

[Figure: subgoal discovery pipeline on game frames: edge detection, bounding boxes, unsupervised subgoal discovery, and the resulting initial subgoals.]
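A minimal sketch of the kind of frame preprocessing named above, using OpenCV: Canny edge detection followed by contour bounding boxes to propose candidate object regions (e.g., the key or the doors) as initial subgoals. The thresholds and the area filter are illustrative assumptions.

```python
import cv2

# Illustrative sketch: propose candidate subgoal regions from a game frame
# via edge detection and bounding boxes (OpenCV 4.x API). Threshold values
# and the minimum-area filter are assumptions, not the paper's settings.
def candidate_subgoal_boxes(frame, min_area=20):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                 # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]   # (x, y, w, h)
    return [b for b in boxes if b[2] * b[3] >= min_area]
```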

SLIDE 12

Neural Correlates of Unsupervised Subgoal Discovery

Strange et al. (2014). Functional organization of the hippocampal longitudinal axis. Nature Reviews Neuroscience, 15(10):655–669.
Chalmers et al. (2016). Computational properties of the hippocampus increase the efficiency of goal-directed foraging through hierarchical reinforcement learning. Frontiers in Computational Neuroscience, 10.
Botvinick et al. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3).
Botvinick, M. and Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences, 369.

  • Temporal abstraction in HRL might map onto regions within the dorsolateral and orbital prefrontal cortex (PFC).
  • More recent discoveries reveal a potential role for medial temporal lobe structures, including the hippocampus, in planning and spatial navigation, utilizing a hierarchical representation of space.
  • There is evidence that the hippocampus serves both model-based and model-free HRL with both flexibility and computational efficiency.
  • Place cells in the dorsal hippocampus represent small regions, while those in the ventral hippocampus represent larger regions.

SLIDE 13

Conclusions

  • We proposed and demonstrated a novel model-free method for subgoal discovery using unsupervised learning over a small memory of the agent's experiences (trajectories).
  • When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment.
  • Intrinsic motivation learning provides an efficient exploration scheme in tasks with sparse rewards, which leads to successful subgoal discovery.
  • We offered a unified approach for learning hierarchical representations in a model-free HRL framework; this method scales to larger problems.

SLIDE 14

Publications

  • Jacob Rafati, David C. Noelle (2019). Unsupervised Subgoal Discovery Method for Learning Hierarchical Representations. In 7th International Conference on Learning Representations (ICLR 2019), Workshop on "Structure & Priors in Reinforcement Learning", New Orleans, LA, USA.
  • Jacob Rafati, David C. Noelle (2019). Unsupervised Methods for Subgoal Discovery During Intrinsic Motivation in Model-Free Hierarchical Reinforcement Learning. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Workshop on Knowledge Extraction from Games, Honolulu, Hawaii, USA.
  • Jacob Rafati, David C. Noelle (2019). Learning Representations in Model-Free Hierarchical Reinforcement Learning. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii.
  • Jacob Rafati, David C. Noelle (2019). Learning Representations in Model-Free Hierarchical Reinforcement Learning. arXiv e-print (arXiv:1810.10096).

SLIDE 15

Questions and Feedback

For paper, code, and slides: http://rafati.net
Email: yrafati@gmail.com
