INF3490 - Biologically Inspired Computing: Reinforcement Learning


SLIDE 1

INF3490 - Biologically inspired computing Reinforcement Learning

Weria Khaksar

October 10, 2018

SLIDE 2

Ghostbusters (1984), The Commuter (2018)

SLIDE 3

"It would be in vain for one Intelligent Being, to set a Rule to the Actions of another, if he had not in his Power, to reward the compliance with, and punish deviation from his Rule, by some Good and Evil, that is not the natural product and consequence of the action itself." (Locke, "Essay", 2.28.6)

"The use of punishments and rewards can at best be a part of the teaching process. Roughly speaking, if the teacher has no other means of communicating to the pupil, the amount of information which can reach him does not exceed the total number of rewards and punishments applied." (Turing (1950), "Computing Machinery and Intelligence")

SLIDE 4

Applications of RL:

From: ”Deconstructing Reinforcement Learning” ICML 2009

SLIDE 5

Examples:

• Barrett WAM robot learning to flip pancakes by reinforcement learning
• Socially Aware Motion Planning with Deep Reinforcement Learning
• Hierarchical Reinforcement Learning for Robot Navigation
• Google DeepMind's Deep Q-learning playing Atari Breakout

SLIDE 6

Last time: Supervised learning

[Diagram: an untrained classifier sees an image and outputs "CAT"; the supervisor answers "No, it was a dog", and the classifier parameters are adjusted.]

SLIDE 7

Supervised learning: Weight updates

Inputs $x_1, x_2, \dots, x_n$ arrive with weights $w_1, w_2, \dots, w_n$.

Activation: $a = \sum_{i=1}^{n} w_i x_i$

Output: $y = \begin{cases} 1 & \text{if } a \ge \theta \\ 0 & \text{if } a < \theta \end{cases}$

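The slide shows only the forward pass. As a reminder of how the known error drives learning here (the point of contrast with RL on the next slides), below is a minimal Python sketch, assuming the standard perceptron rule $w_i \leftarrow w_i + \eta\,(t - y)\,x_i$ from the earlier lectures; the function names and the learning rate default are illustrative.

```python
import numpy as np

def output(x, w, theta):
    """Threshold neuron from the slide: y = 1 if a >= theta else 0."""
    a = np.dot(w, x)                 # activation a = sum_i w_i * x_i
    return 1 if a >= theta else 0

def train_step(x, target, w, theta, eta=0.1):
    """One supervised update (standard perceptron rule, assumed here):
    the known error (target - y) tells each weight which way to move."""
    y = output(x, w, theta)
    return w + eta * (target - y) * x
```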

SLIDE 8

Reinforcement Learning: Infrequent Feedback

[Diagram: the agent plays chess; 50 moves later the only feedback is "You lost", and the chess-playing strategy must be updated.]

SLIDE 9

How do we update our system now? We don’t know the error!


SLIDES 10-12

[Images only.]

SLIDES 13-19

The reinforcement learning problem: State, Action and Reward

[Diagram, built up over several slides: the agent observes the state, chooses an action ("Move piece from J1 to H1"), and receives a reward ("You took an opponent's piece. Reward = 1").]

SLIDE 20

The reinforcement learning problem: State, Action and Reward

• Learning is guided by the reward: an infrequent numerical feedback indicating how well we are doing.
• Problems:
– The reward does not tell us what we should have done!
– The reward may be delayed, and does not always indicate when we made a mistake.

SLIDE 21

The reinforcement learning problem: The Reward Function

• Corresponds to the fitness function of an evolutionary algorithm.
• The reward $r_{t+1}$ is a function of $(s_t, a_t)$.
• The reward is a numeric value. Can be negative ("punishment").
• Can be given throughout the learning episode, or only in the end.
• Goal: Maximize total reward.

SLIDE 22

The reinforcement learning problem: Maximizing total reward

▪ Total reward: $R = \sum_{t=0}^{N-1} r_{t+1}$

▪ Future rewards may be uncertain, and we might care more about rewards that come soon. Therefore, we discount future rewards:

$R = \sum_{t=0}^{\infty} \gamma^{t} r_{t+1}, \qquad 0 \le \gamma \le 1$

or

$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad 0 \le \gamma \le 1$

SLIDE 23

The reinforcement learning problem: Maximizing total reward

▪ Future reward:

$R = r_1 + r_2 + r_3 + \dots + r_N$

$R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_N$

▪ Discount future rewards (the environment is stochastic):

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{N-t} r_N = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \dots)) = r_t + \gamma R_{t+1}$

▪ A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward; the recursion above is exercised in the sketch below.

SLIDE 24

The reinforcement learning problem: Discounted rewards example

t    0.99^t     0.95^t     0.50^t     0.05^t
1    0.990000   0.950000   0.500000   0.050000
2    0.980100   0.902500   0.250000   0.002500
4    0.960596   0.814506   0.062500   0.000006
8    0.922745   0.663420   0.003906   0.000000
16   0.851458   0.440127   0.000015   0.000000
32   0.724980   0.193711   0.000000   0.000000
64   0.525596   0.037524   0.000000   0.000000
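For reference, the table can be reproduced with a couple of lines of Python:

```python
# Reprint the table of gamma**t for the discount factors on the slide.
for t in (1, 2, 4, 8, 16, 32, 64):
    print(t, *(f"{gamma ** t:.6f}" for gamma in (0.99, 0.95, 0.50, 0.05)))
```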

SLIDE 25

The reinforcement learning problem: Discounted rewards example

[Plot: $\gamma^t$ versus time $t$ (up to 60 steps) for $\gamma$ = 0.99, 0.95, 0.50, 0.05.]

SLIDE 26

The reinforcement learning problem: Action Selection

• At each learning stage, the RL algorithm looks at the possible actions and calculates the expected average reward, $Q_{s,t}(a)$.
• Based on $Q_{s,t}(a)$, an action is selected using one of the strategies below (see the sketch that follows):
➢ Greedy strategy: pure exploitation
➢ ε-greedy strategy: exploitation with a little exploration
➢ Soft-max strategy: $P(Q_{s,t}(a)) = \dfrac{e^{Q_{s,t}(a)/\tau}}{\sum_{b} e^{Q_{s,t}(b)/\tau}}$
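A minimal Python sketch of the three strategies over a vector of Q-values (the function names and the defaults for epsilon and tau are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def greedy(q):
    """Pure exploitation: always pick the action with the highest Q-value."""
    return int(np.argmax(q))

def epsilon_greedy(q, epsilon=0.1):
    """Exploit, but with probability epsilon pick a random action (explore)."""
    return int(rng.integers(len(q))) if rng.random() < epsilon else greedy(q)

def softmax(q, tau=1.0):
    """P(a) proportional to exp(Q(a)/tau); tau is the temperature."""
    p = np.exp((q - np.max(q)) / tau)   # subtract max for numerical stability
    return int(rng.choice(len(q), p=p / p.sum()))
```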

SLIDE 27

The reinforcement learning problem: Policy ($\pi$) and Value ($V$)

▪ The set of actions we take defines our policy ($\pi$).
▪ The expected rewards we get in return define our value ($V$).

SLIDE 28

The reinforcement learning problem: Markov Decision Process

• If we only need to know the current state, the problem has the Markov property.
• Without the Markov property:

$\Pr(r_t = r', s_{t+1} = s' \mid s_t, a_t, r_{t-1}, \dots, r_1, s_1, a_1, s_0, a_0)$

• With the Markov property:

$\Pr(r_t = r', s_{t+1} = s' \mid s_t, a_t)$

SLIDE 29

The reinforcement learning problem: Markov Decision Process

[Figure: a simple example of a Markov Decision Process.]

SLIDE 30

The reinforcement learning problem: Value

• The expected future reward is known as the value.
• Two ways to compute the value:
– The value of a state, $V(s)$, averaged over all possible actions in that state (state-value function):
$V(s) = E[r_t \mid s_t = s] = E\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i+1} \,\middle|\, s_t = s\right]$
– The value of a state/action pair, $Q(s, a)$ (action-value function):
$Q(s, a) = E[r_t \mid s_t = s, a_t = a] = E\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i+1} \,\middle|\, s_t = s, a_t = a\right]$
• $Q$ and $V$ are initially unknown, and are learned iteratively as we gain experience (a sampling-based sketch follows below).
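One simple way to estimate a value from experience is Monte Carlo averaging (one option among several; the slides turn to temporal-difference methods next): average the discounted return over sampled episodes. A sketch, assuming a hypothetical environment interface with reset(state) and step(action) -> (next_state, reward, done):

```python
def mc_state_value(env, s0, policy, gamma=0.9, n_episodes=1000):
    """Estimate V(s0) = E[sum_i gamma^i r_{t+i+1} | s_t = s0] by averaging
    the discounted return over many episodes started in s0."""
    total = 0.0
    for _ in range(n_episodes):
        s, done, rewards = env.reset(s0), False, []
        while not done:
            s, r, done = env.step(policy(s))    # hypothetical interface
            rewards.append(r)
        G = 0.0
        for r in reversed(rewards):             # G = r + gamma * G_next
            G = r + gamma * G
        total += G
    return total / n_episodes
```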

SLIDES 31-32

The reinforcement learning problem: The Q-Learning Algorithm

[The algorithm listing is shown as images on these slides; a minimal sketch follows below.]
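A minimal tabular Q-learning sketch in the standard form, with update $Q(s,a) \leftarrow Q(s,a) + \eta\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$; the environment interface (reset/step) and the hyperparameter defaults are assumptions:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               eta=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning (off-policy): bootstrap from the *greedy* next action,
    max_a' Q(s', a'), no matter which action the behaviour policy takes."""
    rng = np.random.default_rng()
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            a = (int(rng.integers(n_actions)) if rng.random() < epsilon
                 else int(np.argmax(Q[s])))
            s2, r, done = env.step(a)            # hypothetical interface
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += eta * (target - Q[s, a])  # move Q(s,a) toward the target
            s = s2
    return Q
```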

SLIDE 33

The reinforcement learning problem: The SARSA Algorithm

[The algorithm listing is shown as an image on this slide; a minimal sketch follows below.]
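A matching sketch of tabular SARSA (same assumptions as the Q-learning sketch above). The one substantive change is that the update bootstraps from $Q(s', a')$ for the action $a'$ the policy actually takes next, which is what makes it on-policy:

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500,
          eta=0.1, gamma=0.9, epsilon=0.1):
    """Tabular SARSA (on-policy): exploration noise enters the targets,
    so the learned values reflect the policy actually being followed."""
    rng = np.random.default_rng()
    Q = np.zeros((n_states, n_actions))

    def policy(s):   # the same epsilon-greedy policy acts and is evaluated
        return (int(rng.integers(n_actions)) if rng.random() < epsilon
                else int(np.argmax(Q[s])))

    for _ in range(episodes):
        s, done = env.reset(), False
        a = policy(s)
        while not done:
            s2, r, done = env.step(a)            # hypothetical interface
            a2 = policy(s2)
            target = r if done else r + gamma * Q[s2, a2]
            Q[s, a] += eta * (target - Q[s, a])
            s, a = s2, a2
    return Q
```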

SLIDE 34

Q-learning example

• Credits: Arjun Chandra

[Figure: the example environment, including a "home" state; details are shown as an image.]
SLIDES 35-59

[Image-only slides: a step-by-step walkthrough of the Q-learning example.]

SLIDE 60

Action selection

[Image only.]

SLIDE 61

On-policy vs off-policy learning

[Figure: the cliff-walking grid, with Start, The Cliff, and Goal.]

• Reward structure: each move: -1; move to cliff: -100.
• Policy: 90% chance of choosing the best action (exploit), 10% chance of choosing a random action (explore).

SLIDE 62

On-policy vs off-policy learning: Q-learning

[Figure: the cliff-walking grid, with Start, The Cliff, and Goal.]

• Always assumes the optimal action -> does not visit the cliff often while learning. Therefore, it does not learn that the cliff is dangerous.
• The resulting path is efficient, but risky.

SLIDE 63

On-policy vs off-policy learning: SARSA

[Figure: the cliff-walking grid, with Start, The Cliff, and Goal.]

• During learning, we more frequently end up in the cliff (due to the 10% chance of exploring in our policy).
• That information propagates to all states, generating a safer plan.

SLIDE 64

Which plan is better?

[Figure: the two learned paths on the cliff-walking grid (Start, The Cliff, Goal); a runnable comparison is sketched below.]

• SARSA (on-policy)
• Q-learning (off-policy)
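To see the difference concretely, here is a minimal cliff-world sketch using the reward structure from slide 61 (each move: -1; falling into the cliff: -100 and back to Start). The 4x12 layout follows Sutton and Barto's classic cliff-walking example, which these slides appear to be based on; it plugs into the hypothetical q_learning and sarsa sketches above:

```python
class CliffWorld:
    """4x12 grid: Start at bottom-left, Goal at bottom-right,
    and the bottom-row cells in between are the cliff."""
    ROWS, COLS = 4, 12
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

    def reset(self):
        self.pos = (self.ROWS - 1, 0)               # Start
        return self._state()

    def step(self, a):
        dr, dc = self.MOVES[a]
        r = min(max(self.pos[0] + dr, 0), self.ROWS - 1)
        c = min(max(self.pos[1] + dc, 0), self.COLS - 1)
        if r == self.ROWS - 1 and 0 < c < self.COLS - 1:   # fell into the cliff
            self.pos = (self.ROWS - 1, 0)                  # back to Start
            return self._state(), -100, False
        self.pos = (r, c)
        done = (r, c) == (self.ROWS - 1, self.COLS - 1)    # reached the Goal
        return self._state(), -1, done

    def _state(self):
        return self.pos[0] * self.COLS + self.pos[1]

env = CliffWorld()
Q_off = q_learning(env, 48, 4, episodes=2000)   # tends to learn the risky edge path
Q_on = sarsa(env, 48, 4, episodes=2000)         # tends to learn the safer detour
```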

SLIDE 65

Using evolution and neural networks in reinforcement learning

MarI/O - Machine Learning for Video Games


SLIDE 66

[Image only.]