
Reinforcement Learning in Continuous Environments
64.425 Integrated Seminar: Intelligent Robotics
Oke Martensen
University of Hamburg, Faculty of Mathematics, Informatics and Natural Sciences (MIN), Department of Informatics, Technical Aspects of Multimodal Systems
30 November 2015

Outline
1. Reinforcement Learning in a Nutshell
   Basics of RL
   Standard Approaches
   Motivation: The Continuity Problem
2. RL in Continuous Environments
   Continuous Actor Critic Learning Automaton (CACLA)
   CACLA in Action
3. RL in Robotics
   Conclusion

Classical Reinforcement Learning
Agent := algorithm that learns to interact with the environment.
Environment := the world (including the actor).
Goal: optimize the agent's behaviour with respect to a reward signal.
Sutton and Barto (1998)
The problem is modelled as a Markov Decision Process (MDP): (S, A, R, T), i.e. states, actions, reward function and transition function.

The General Procedure
Policy π := action selection strategy
◮ exploration and exploitation trade-off
◮ e.g. ε-greedy, soft-max, ...
Different ways to model the environment:
◮ value functions V(s), Q(s, a): cumulative discounted reward expected after reaching state s (and after performing action a)
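The two action selection strategies named above can be sketched as follows. This is a minimal illustration, not code from the talk: the function names and the `temperature` parameter are assumptions, and `q_values` is taken to be a plain list holding one Q(s, a) estimate per discrete action.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])

def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; lower temperature is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the soft-max probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With `epsilon = 0` the first strategy is purely exploiting; raising `epsilon` (or `temperature` for soft-max) shifts the trade-off towards exploration.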

Standard Algorithms
Sutton and Barto (1998)
Temporal-difference (TD) learning:
$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$
Numerous algorithms are based on TD learning:
◮ SARSA
◮ Q-Learning
◮ actor-critic methods (details on next slide)
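The TD update can be written out as a small tabular example. The sketch below assumes a five-state random walk with terminal states at both ends and a reward of 1 for leaving on the right; the environment, the function names and the default step size are illustrative choices, not taken from the slides.

```python
import random

def td0_update(V, s, r, s_next, alpha, gamma):
    """Apply one TD(0) update to V[s] in place and return the TD error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

def random_walk_td0(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a 5-state random walk with TD(0)."""
    rng = random.Random(seed)
    n = 5
    V = [0.0] * (n + 2)      # indices 0 and n+1 are terminal and stay at 0
    for _ in range(episodes):
        s = (n + 1) // 2     # every episode starts in the middle state
        while 0 < s < n + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0
            td0_update(V, s, r, s_next, alpha, gamma)
            s = s_next
    return V[1:n + 1]        # values of the non-terminal states only
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the estimates returned by `random_walk_td0()` should increase roughly linearly from left to right.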

Actor-Critic Models
A TD method with a separate memory structure that explicitly represents the policy independently of the value function.
Actor: policy structure
Critic: estimated value function
The critic's output, the TD error, drives all the learning.
◮ computationally cheap action selection
◮ biologically more plausible
Sutton and Barto (1998)
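A minimal tabular sketch of this split, assuming a hypothetical four-state corridor (start at state 0, reward 1 on reaching state 3) and a soft-max actor over action preferences. The critic's TD error updates both the value table and the preference of the action just taken; all names and parameter values are illustrative.

```python
import math
import random

LEFT, RIGHT = 0, 1

def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_actor_critic(episodes=500, alpha=0.1, beta=0.1, gamma=0.9,
                       max_steps=200, seed=0):
    rng = random.Random(seed)
    goal = 3                                    # terminal state
    V = [0.0] * (goal + 1)                      # critic: state values
    prefs = [[0.0, 0.0] for _ in range(goal)]   # actor: action preferences
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choices((LEFT, RIGHT), weights=softmax(prefs[s]))[0]
            s_next = max(0, s - 1) if a == LEFT else s + 1
            r = 1.0 if s_next == goal else 0.0
            delta = r + gamma * V[s_next] - V[s]   # critic's TD error
            V[s] += alpha * delta                  # critic update
            prefs[s][a] += beta * delta            # actor update
            s = s_next
            if s == goal:
                break
    return V, prefs
```

Action selection only needs a soft-max over two stored preferences per state, which is the sense in which it is computationally cheap.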

Why is RL so Cool?
◮ it's how humans learn
◮ sophisticated, hard-to-engineer behaviour
◮ can cope with uncertain, noisy, non-observable conditions
◮ no need for labels
◮ online learning
“The relationship between [robotics and reinforcement learning] has sufficient promise to be likened to that between physics and mathematics” Kober and Peters (2012)

The Continuity Problem
So far: discrete action and state spaces.
Problem: the world isn't discrete.
Example: moving in a grid world
Continuous state spaces have already been investigated extensively. Continuous action spaces, however, remain a problem.

Tackling the Continuity Problem
1. Discretize the spaces, then use regular RL methods
   ◮ e.g. tile coding: group the space into binary features (receptive fields)
   ◮ But: How fine-grained? Where to put the focus? Bad generalization ...
2. Use the parameter vector $\vec{\theta}_t$ of a function approximator for updates
   ◮ often neural networks are used, with the weights as parameters
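Tile coding for a one-dimensional state can be sketched as below. The layout is an assumption made for illustration: each tiling is shifted by a fraction of a tile width and gets one extra tile to absorb the shift, and the helper names are hypothetical.

```python
def tile_features(x, low, high, n_tiles, n_tilings):
    """Return the index of the active binary feature in each tiling."""
    width = (high - low) / n_tiles
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings       # shift tiling t by a fraction of a tile
        idx = int((x - low + offset) / width)
        idx = min(max(idx, 0), n_tiles)      # clamp; one extra tile absorbs the shift
        active.append(t * (n_tiles + 1) + idx)
    return active

def value(weights, active):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(weights[i] for i in active)

# x = 0.0 and x = 0.2 share the tile of the first tiling but not of the second,
# so an update at one point generalizes partially to the other.
print(tile_features(0.0, 0.0, 1.0, 4, 2))  # [0, 5]
print(tile_features(0.2, 0.0, 1.0, 4, 2))  # [0, 6]
```

The questions raised in the first item show up directly as the choices of `n_tiles` and `n_tilings`: finer tiles give more resolution but less generalization per update.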

CACLA: Continuous Actor Critic Learning Automaton
Van Hasselt and Wiering (2007)
◮ learns undiscretized continuous actions in continuous states
◮ model-free
◮ computes updates and actions very fast
◮ easy to implement (cf. pseudocode next slide)

CACLA Algorithm
[Pseudocode figure from Van Hasselt (2011); legend: $\vec{\theta}$ parameter vector, $\vec{\psi}$ feature vector]
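A minimal sketch of one CACLA update as the algorithm is commonly described (Van Hasselt and Wiering, 2007), assuming linear function approximators for both critic and actor and a scalar action. The function names, the step sizes `alpha` and `beta`, and the Gaussian exploration helper are illustrative assumptions and not a transcription of the pseudocode on the slide.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def cacla_step(theta_v, theta_a, psi, action, reward, psi_next,
               alpha=0.1, beta=0.1, gamma=0.9, terminal=False):
    """One CACLA update with critic V(s) = theta_v . psi(s) and
    actor Ac(s) = theta_a . psi(s). Returns the TD error."""
    v_next = 0.0 if terminal else dot(theta_v, psi_next)
    delta = reward + gamma * v_next - dot(theta_v, psi)
    # Critic: ordinary TD(0) step; the gradient of a linear V is psi.
    for i, f in enumerate(psi):
        theta_v[i] += alpha * delta * f
    # Actor: only when the action turned out better than expected,
    # move the actor's output toward the action that was actually taken.
    if delta > 0:
        error = action - dot(theta_a, psi)
        for i, f in enumerate(psi):
            theta_a[i] += beta * error * f
    return delta

def explore(theta_a, psi, sigma, rng=random):
    """Gaussian exploration around the actor's current output."""
    return rng.gauss(dot(theta_a, psi), sigma)
```

Note that the actor update depends only on the sign of the TD error, not on its size, and works in action space directly, which is why no discretization of the actions is needed.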

A bio-inspired model of predictive sensorimotor integration
Zhong et al. (2012)
Latencies in sensory processing make real-time robotics hard; noisy, inaccurate readings may cause failure.
1. Elman network for sensory prediction/filtering (Elman, 1990)
2. CACLA for continuous action generation

Robot Docking & Grasping Behaviour
Zhong et al. (2012)
https://www.youtube.com/watch?v=vF7u18h5IoY
◮ more natural and smooth behaviour
◮ flexible wrt. changes in the action space

Conclusion
Challenges:
◮ problems with high-dimensional/continuous states and actions
◮ only partially observable, noisy environment
◮ uncertainty (e.g. Which state am I actually in?)
◮ hardware/physical system:
   ◮ tedious, time-intensive, costly data generation
   ◮ reproducibility
Solution approaches:
◮ partially observable Markov decision processes (POMDPs)
◮ use of filters: raw observations + uncertainty in estimates

Thanks for your attention! Questions?

References
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2):179–211.
Kober, J. and Peters, J. (2012). Reinforcement Learning in Robotics: A Survey. In Wiering, M. and van Otterlo, M., editors, Reinforcement Learning, volume 12, pages 579–610. Springer Berlin Heidelberg, Berlin, Heidelberg.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge.
Van Hasselt, H. and Wiering, M. (2007). Reinforcement learning in continuous action spaces. In Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on, pages 272–279. IEEE.
Van Hasselt, H. P. (2011). Insights in reinforcement learning. Hado Van Hasselt.
Zhong, J., Weber, C., and Wermter, S. (2012). A predictive network architecture for a robust and smooth robot docking behavior. Paladyn, Journal of Behavioral Robotics, 3(4):172–180.
