RL Lecture 3: Simple Learning Taxonomy and Learning from Interaction




SIMPLE LEARNING TAXONOMY

- Supervised Learning: a "teacher" provides the required response to each input, so the desired behaviour is known. "Costly".
- Unsupervised Learning: the learner looks for patterns in its inputs. No "right" answer.
- Reinforcement Learning: the learner is not told which actions to take, but gets reward/punishment from the environment and adjusts/learns which action to pick next time. Learning to drive a car, hold a conversation, etc.

LEARNING FROM INTERACTION

- With the environment, to achieve some goal.
- A baby playing: no teacher, but a sensorimotor connection to the environment.
- Cause and effect; actions and their consequences; how to achieve goals.
- The environment's response affects our subsequent actions.
- We find out the effects of our actions later.

REINFORCEMENT LEARNING

- Learning a mapping from situations to actions in order to maximise a scalar reward/reinforcement signal.
- But how? Which actions are best? Try out actions to learn which produces the highest reward: trial-and-error search.
- Actions affect the immediate reward, the next situation and all subsequent rewards: delayed effects, delayed reward.
- Situations, actions, goals: sense situations, choose actions to achieve goals.
- The environment is uncertain.

EXPLORATION/EXPLOITATION TRADE-OFF

- High rewards come from repeating previously-well-rewarded actions: EXPLOITATION.
- But we must also try actions not tried before: EXPLORATION.
- We must do both. (This trade-off does not arise in supervised learning.)
- Especially if the task is stochastic, try each action many times per situation to get a reliable estimate of its reward (a sketch follows below).
- Gradually prefer those actions that prove to lead to high reward.
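A minimal sketch (my own, not from the slides) of getting a reliable reward estimate through repeated trials: each action's reward is averaged incrementally over however many times it has been tried. The action set, reward distributions and trial count are hypothetical.

```python
import random

n_actions = 3
true_means = [0.1, 0.5, 0.8]        # hypothetical stochastic task; unknown to the learner
counts = [0] * n_actions            # how often each action has been tried
estimates = [0.0] * n_actions       # running average reward per action

for _ in range(3000):
    a = random.randrange(n_actions)                    # pure exploration here
    r = random.gauss(true_means[a], 1.0)               # noisy (stochastic) reward
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]     # incremental mean

print(estimates)   # approaches true_means, so the learner can gradually prefer action 2
```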

EXAMPLES

- An animal learning to find food and avoid predators.
- A robot trying to learn how to dock with its charging station.
- A backgammon player learning to beat an opponent.
- A football team trying to find strategies to score goals.
- An infant learning to feed itself with a spoon.
- A cornet player learning to produce beautiful sounds.
- A temperature controller keeping FH warm while minimising fuel consumption.

FRAMEWORK

[Figure: the agent-environment loop. The AGENT, in state/situation s_t and having received reward r_t, emits action a_t; the ENVIRONMENT returns reward r_{t+1} and new situation s_{t+1}.]

- The agent in situation s_t chooses action a_t.
- One tick later it is in situation s_{t+1} and gets reward r_{t+1}.

POLICY

    π(s, a) = Pr(a_t = a | s_t = s)

Given that the situation at time t is s, the policy gives the probability that the agent's action will be a. Reinforcement learning = get/find/learn the policy.

EXAMPLE POLICIES

Find the coffee machine: a small map of rooms (start, 1, 2, 3, 4). In each state s the policy assigns a probability to every available action, e.g. π(s, turn left), π(s, straight on), π(s, turn right), π(s, go through door), etc., and these probabilities sum to 1.

Bandit problem: 10 arms; a Q table gives the Q value of each arm. The ε-greedy policy picks the greedy arm a* = argmax_a Q(a) with probability 1 - ε, and otherwise picks an arm at random (a sketch follows after the jargon list below).

JARGON

- Policy: the decision on what action to do in each state, π(s, a).
- Reward function: defines the goal, and what counts as good and bad experience for the learner.
- Value function: predicts reward; an estimate of the total future reward.
- Model of the environment: maps states and actions onto states. If in state s_t we take action a_t, the model predicts s_{t+1} (and sometimes the reward r_{t+1}). Not all agents use models.

The reward function and the environment's model are fixed, external to the agent. The policy, the value function and the estimate of the model are adjusted during learning.
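A minimal sketch (my own, not from the slides) of the ε-greedy policy over a Q table for a 10-armed bandit. The reward distributions, the ε value and the sample-average update shown are assumptions for illustration.

```python
import random

def epsilon_greedy(Q, epsilon=0.1):
    """With probability epsilon pick a random arm; otherwise pick the greedy arm."""
    if random.random() < epsilon:
        return random.randrange(len(Q))                  # explore
    return max(range(len(Q)), key=lambda a: Q[a])        # exploit: argmax_a Q(a)

true_means = [random.gauss(0, 1) for _ in range(10)]     # hypothetical 10-armed bandit
Q = [0.0] * 10                                           # Q table: one value per arm
counts = [0] * 10

for _ in range(5000):
    a = epsilon_greedy(Q, epsilon=0.1)
    r = random.gauss(true_means[a], 1.0)                 # noisy reward from arm a
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]                       # sample-average update

print("best arm (true):  ", max(range(10), key=lambda a: true_means[a]))
print("best arm (learnt):", max(range(10), key=lambda a: Q[a]))
```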

VALUE FUNCTIONS

- How desirable is it to be in a certain state? What is its value?
- The value of a state is (an estimate of) the expected future reward from that state.
- Value vs. reward: long-term vs. immediate. We want actions that lead to states of high value, not necessarily high immediate reward.
- Learn the policy via learning values: once we know the values of states we can choose to go to states of high value.
- cf. GA/GP, which discover a policy directly. Genotypical vs. phenotypical learning? (GA/GP vs. RL.)

GENERAL RL ALGORITHM

1. Initialise the learner's internal state (e.g. Q values, other statistics).
2. Do for a long time:
   - Observe the current world state s.
   - Choose action a using the policy.
   - Execute action a.
   - Let r be the immediate reward and s' the new world state.
   - Update the internal state based on (s, a, r, s') and the previous internal state.
3. Output a policy based on, e.g., the learnt Q values, and follow it.

We need:

- A decision on what constitutes an internal state.
- A decision on what constitutes a world state.
- Sensing of the world state.
- An action-choice mechanism (the policy), usually based on an evaluation function of the current world and internal state.
- A means of executing the action.
- A way of updating the internal state.

The environment (a simulator?) provides:

- Transitions between world states, i.e. the model.
- A reward function.

But of course the learner has to discover what these are while exploring the world. (A sketch of the whole loop follows below.)
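A minimal sketch (my own, with assumed names) of the general RL algorithm above. The env object and its reset/step/actions interface are assumptions for illustration, and the internal-state update shown is only one possible choice (a one-step Q-learning backup), not the only one.

```python
import random

def general_rl(env, n_steps=10000, epsilon=0.1, alpha=0.1, gamma=0.9):
    Q = {}                                        # 1. initialise internal state (Q values)
    def q(s, a):
        return Q.get((s, a), 0.0)

    s = env.reset()                               # sense the initial world state
    for _ in range(n_steps):                      # 2. do for a long time
        # choose action using the policy (here epsilon-greedy over Q)
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: q(s, act))
        r, s_next = env.step(a)                   # execute action; observe r and s'
        # update internal state based on (s, a, r, s')
        best_next = max(q(s_next, act) for act in env.actions)
        Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
        s = s_next
    # 3. output a greedy policy based on the learnt Q values
    return {state: max(env.actions, key=lambda act: q(state, act))
            for (state, _) in Q}
```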

EXAMPLE - 0 AND X

See Sutton and Barto Section 1.4 and Figure 1.1.

Construct a player to play against an imperfect opponent. For each board state s, set up V(s), an estimate of the probability of winning from that state:

- V(s) = 1 for states with three Xs in a row (won);
- V(s) = 0 for states with three Os in a row (lost);
- V(s) = 0.5 initially for the rest.

Play many games.

Move selection:

- Mostly pick the move leading to the state with the highest V(s).
- Sometimes explore.

Value adjustment:

- Back up the value of the state reached after each non-exploratory move to the state preceding the move, e.g.

      V(s_t) ← V(s_t) + α [ V(s_{t+1}) − V(s_t) ]

- Reduce α over time; V then converges to the true probabilities of winning, i.e. the optimal policy. (A sketch of this backup follows below.)
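A minimal sketch (my own, with assumed helper names) of that value backup applied after one game. Here states is the sequence of board states the learner visited, explored[t] marks whether move t was exploratory, and the example values are hypothetical.

```python
def backup_values(V, states, explored, alpha):
    """Move V(s_t) towards V(s_{t+1}) after each non-exploratory move."""
    for t in range(len(states) - 1):
        if explored[t]:
            continue                              # exploratory moves are not backed up
        s, s_next = states[t], states[t + 1]
        V[s] = V[s] + alpha * (V[s_next] - V[s])

# Example: a won game through three hypothetical states, with no exploration.
V = {"s0": 0.5, "s1": 0.5, "s_win": 1.0}
backup_values(V, ["s0", "s1", "s_win"], [False, False], alpha=0.2)
print(V)   # V("s1") moves towards 1.0; over many games the shift propagates back to s0
```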
