SLIDE 1

Breakout Group Reinforcement Learning

FABIAN RUEHLE (UNIVERSITY OF OXFORD) String_Data 2017, Boston 12/01/2017

SLIDE 2
Outline

  • Theoretical introduction (30 minutes)
  • Discussion of code (30 minutes)
  • Solve a version of grid world with SARSA
  • Discussion of RL and its applications to String Theory (30 minutes)

SLIDE 3
How to teach a machine

  • Supervised Learning (SL):
  • provide a set of training tuples [(in_0, out_0), (in_1, out_1), …, (in_n, out_n)]
  • after training, the machine predicts out_i from in_i
  • Unsupervised Learning (UL):
  • only provide a training input set [in_0, in_1, …, in_n]
  • give the machine a task (e.g. cluster the input) without telling it exactly how to do this
  • after training, the machine will perform the self-learned action on in_i
  • Reinforcement Learning (RL):
  • in between SL and UL
  • machine acts autonomously, but actions are reinforced / punished

SLIDE 4

Theoretical introduction

SLIDE 5
Reinforcement Learning - Vocabulary

  • Basic textbooks/literature: [Sutton, Barto ’98 ’17]
  • The “thing that learns” is called agent or worker
  • The “thing that is explored” is called environment
  • The “elements of the environment” are called states or observations
  • The “things that take you from one state to another” are called actions
  • The “thing that tells you how to select the next action” is called policy
  • Actions are executed sequentially in a sequence called (time) steps
  • The “reinforcement” the agent experiences is called reward
  • The “accumulated reward” is called return
  • In RL, an agent performs actions in an environment with the goal to maximize its long-term return

SLIDE 6
Reinforcement Learning - Details

  • We focus on discrete state and action spaces
  • State space: S = {states in environment}
  • Action space: A = {actions to transition between states}
  • total: A
  • for s ∈ S: A(s) = {possible actions in state s}
  • Policy π: Select next action for given state: π : S → A with π(s) = a, a ∈ A(s)
  • Reward R: Reward for taking action a in state s: R : S × A → ℝ with R(s, a) ∈ ℝ
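To make these definitions concrete, here is a minimal Python sketch (an illustration, not the breakout's code; all names are assumptions) of a discrete state space, per-state action sets, a policy, and a reward function:

```python
# Minimal sketch of the objects defined above; all names are illustrative.
states = ["s0", "s1", "s2"]                 # S = {states in environment}

actions = {                                 # A(s) = {possible actions in state s}
    "s0": ["right"],
    "s1": ["left", "right"],
    "s2": ["left"],
}

def policy(s):
    """pi : S -> A(s); a trivial deterministic policy for illustration."""
    return actions[s][0]

def reward(s, a):
    """R : S x A -> R, the reward for taking action a in state s."""
    return 1.0 if (s, a) == ("s1", "right") else -0.1
```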

SLIDE 7
Reinforcement Learning - Details

  • Return: The accumulated reward from current step t:
    G_t = Σ_{k=0}^{∞} γ^k r_{t+k+1} ,  γ ∈ (0, 1]
  • State value function v_π(s): Expected return for s with policy π:
    v_π(s) = E[G_t | s = s_t]
  • Action value function q_π(s, a): Expected return for performing action a in state s with policy π:
    q_π(s, a) = E[G_t | s = s_t, a = a_t]
  • Prediction problem: Given π, predict v_π(s) or q_π(s, a)
  • Control problem: Find optimal policy π* that maximizes v_π(s) or q_π(s, a)
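As a quick illustration of the return formula (a sketch with made-up reward values, not from the slides), the discounted sum can be computed for a finite episode:

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = sum_{k>=0} gamma^k * r_{t+k+1} over a finite list of rewards."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Rewards observed after step t (illustrative values):
print(discounted_return([-0.1, -0.1, 1.0], gamma=0.9))  # -0.1 - 0.09 + 0.81 = 0.62
```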

SLIDE 8
Reinforcement Learning - Details

  • Commonly used policies:
  • greedy: Choose the action that maximizes the action value function:
    π′(s) = argmax_a q(s, a)
  • ε-greedy: Explore different possibilities:
    π′(s) = { choose greedy action in (1 − ε) of cases ; choose random action in ε of cases }
  • We take ε-greedy policy improvement
  • On-policy: Update the policy you are following (e.g. always ε-greedy)
  • Off-policy: Use a different policy for choosing the next action a_{t+1} and for updating q(s_t, a_t)
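An ε-greedy policy is straightforward to implement; the following is a hypothetical helper (not the session's code), selecting from a tabular q stored as a dict:

```python
import random

def epsilon_greedy(q, s, actions, eps=0.1):
    """With probability eps pick a random action; otherwise act greedily on q.
    q is a dict mapping (state, action) -> value (an assumed layout)."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((s, a), 0.0))
```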

SLIDE 9
Reinforcement Learning - SARSA

  • Solving the control problem:
    Δv(s_t) = α[G_t − v(s_t)] ,  Δq(s_t, a_t) = α[G_t − q(s_t, a_t)]
  • α: Learning rate (α = 0 means no update to v(s_t))
  • One step approximation: G_t ≈ r + γ v(s_{t+1})
  • Similar for action value function:
    Δq(s_t, a_t) = α[r + γ q(s_{t+1}, a_{t+1}) − q(s_t, a_t)]
  • Update depends on tuple (s_t, a_t, r, s_{t+1}, a_{t+1}): State-Action-Reward-State-Action, hence SARSA
  • a_{t+1} is the currently best known action for state s_{t+1}
  • Note: SARSA is on-policy
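The SARSA update translates directly into code; this sketch uses the same assumed tabular q layout as the ε-greedy helper above:

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA step for the tuple (s, a, r, s', a')."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)   # G_t ~ r + gamma * q(s', a')
    q[(s, a)] = old + alpha * (target - old)            # dq = alpha * (G_t - q(s, a))
```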

SLIDE 10
Reinforcement Learning - Q-Learning

  • Very similar to SARSA
  • Difference in update:
  • SARSA: Δq(s_t, a_t) = α[r + γ q(s_{t+1}, a_{t+1}) − q(s_t, a_t)]
  • Q-Learning: Δq(s_t, a_t) = α[r + γ max_{a′} q(s_{t+1}, a′) − q(s_t, a_t)]
  • Note: This means that Q-Learning is off-policy
  • SARSA is often found to perform better in practice
  • Q-Learning is proven to converge to the optimal solution
  • Combine with deep NNs: Deep Q-Learning
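For comparison, a sketch of the Q-Learning update: the only change from the SARSA sketch above is that it bootstraps from the best next action rather than the action actually taken (the off-policy step):

```python
def q_learning_update(q, s, a, r, s_next, actions_next, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning step; bootstraps from the max over next actions."""
    old = q.get((s, a), 0.0)
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions_next)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```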

SLIDE 11

Example - Gridworld

[Figure: gridworld maze showing the Worker (“Explorer”), Walls, the Exit, and a Pitfall]

SLIDE 12
Example - Gridworld

  • We will look at a version of grid world:
  • Gridworld is a grid-like maze with walls, pitfalls, and an exit
  • Each state is a point on the grid of the maze
  • The actions are A = {up, down, left, right}
  • Goal: Find the exit (strongly rewarded)
  • Each step is punished mildly (solve the maze quickly)
  • Pitfalls should be avoided (strongly punished)
  • Running into a wall does not change the state
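A tiny environment with exactly these rules could look as follows; this is an illustrative sketch with a made-up maze layout, not the code distributed in the breakout:

```python
# '#' = wall, 'P' = pitfall, 'E' = exit, '.' = empty square (layout is made up).
GRID = ["####",
        "#.E#",
        "#P.#",
        "####"]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return (next_state, reward, done) for one action in the maze."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if GRID[nr][nc] == "#":           # running into a wall: state unchanged
        nr, nc = r, c
    cell = GRID[nr][nc]
    if cell == "E":
        return (nr, nc), 1.0, True    # exit: strongly rewarded
    if cell == "P":
        return (nr, nc), -1.0, True   # pitfall: strongly punished
    return (nr, nc), -0.04, False     # every step mildly punished
```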

SLIDE 13
Gridworld vs String Landscape

  • Walls = Boundaries of the landscape (negative number of branes)
  • Empty square = Consistent point in the landscape which does not correspond to our Universe
  • Pitfalls = Mathematically / physically inconsistent states (anomalies, tadpoles, …)
  • Exit = Standard Model of Particle Physics

SLIDE 14

Coding
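Putting the sketches above together, a minimal SARSA training loop for the gridworld might look like the following; it assumes the hypothetical step, epsilon_greedy, and sarsa_update helpers from the earlier sketches are in scope, and is an outline of the exercise rather than the session's actual code:

```python
ACTIONS = list(MOVES)        # ["up", "down", "left", "right"]
q = {}                       # tabular q(s, a); missing entries default to 0.0

for episode in range(500):
    s = (1, 1)                                   # start on an empty square
    a = epsilon_greedy(q, s, ACTIONS, eps=0.1)
    done = False
    while not done:
        s_next, r, done = step(s, a)
        a_next = epsilon_greedy(q, s_next, ACTIONS, eps=0.1)
        sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9)
        s, a = s_next, a_next
```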

SLIDE 15

Discussion