MVA-RL Course

Introduction to Reinforcement Learning

A. LAZARIC (SequeL Team @INRIA-Lille)

ENS Cachan - Master 2 MVA

SequeL – INRIA Lille

A Bit of History: From Psychology to Machine Learning

  • A. LAZARIC – Introduction to Reinforcement Learning

Sept 29th, 2015 - 2/14

The law of effect [Thorndike, 1911]

“Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.”

Experimental psychology

◮ Classical (human and) animal conditioning: “the magnitude and timing of the conditioned response changes as a result of the contingency between the conditioned stimulus and the unconditioned stimulus” [Pavlov, 1927].

◮ Operant conditioning (or instrumental conditioning): process by which humans and animals learn to behave in such a way as to obtain rewards and avoid punishments [Skinner, 1938].

Remark: reinforcement denotes any form of conditioning, either positive (rewards) or negative (punishments).

Computational neuroscience

◮ Hebbian learning: development of formal models of how the synaptic weights between neurons are reinforced by simultaneous activation. “Cells that fire together, wire together.” [Hebb, 1961].

◮ Emotions theory: model of how the emotional process can bias the decision process [Damasio, 1994].

◮ Dopamine and basal ganglia model: direct link with motor control and decision-making (e.g., [Doya, 1999]).

Remark: reinforcement denotes the effect of dopamine (and surprise).

Optimal control theory and dynamic programming

◮ Optimal control: formal framework to define optimization methods to derive control policies in continuous-time control problems [Pontryagin and Neustadt, 1962].

◮ Dynamic programming: set of methods used to solve control problems by decomposing them into subproblems so that the optimal solution to the global problem is the conjunction of the solutions to the subproblems [Bellman, 2003].

Remark: reinforcement denotes an objective function to maximize (or minimize).
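
The decomposition into subproblems can be illustrated on a toy problem. The sketch below is not from the slides: the graph, its edge costs, and the names COSTS, cost_to_go and best_path are hypothetical, chosen only to show how the optimal cost from a node is built from the optimal costs of its successors.

```python
from functools import lru_cache

# Hypothetical toy graph (an assumption for illustration, not from the slides):
# COSTS[node][successor] is the cost of moving along that edge; "G" is the goal.
COSTS = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "G": 6},
    "C": {"G": 1},
    "G": {},
}


@lru_cache(maxsize=None)
def cost_to_go(node):
    """Minimal total cost from `node` to the goal "G"."""
    if node == "G":
        return 0
    # Bellman recursion: cost of the first step plus the optimal cost
    # of the remaining subproblem, minimized over the possible first steps.
    return min(cost + cost_to_go(nxt) for nxt, cost in COSTS[node].items())


def best_path(node):
    """Recover an optimal path by acting greedily on the cost-to-go values."""
    path = [node]
    while node != "G":
        node = min(COSTS[node], key=lambda nxt: COSTS[node][nxt] + cost_to_go(nxt))
        path.append(node)
    return path


print(cost_to_go("A"), best_path("A"))
```

Here the objective is a cost to minimize. The graph is acyclic, so the recursion terminates, and memoization ensures each subproblem is solved once.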

Reinforcement learning

Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal in an unknown, uncertain environment. The learner is not told which actions to take, as in most forms of machine learning, but she must discover which actions yield the most reward by trying them (trial-and-error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward).

“Reinforcement Learning: An Introduction”, Sutton and Barto (1998).
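
The trial-and-error part of this definition can be sketched in a few lines. The example below is an illustration under assumptions, not the course's method: it uses an epsilon-greedy rule on a three-action problem with hypothetical reward probabilities, and it covers only immediate rewards, not delayed ones. The function name run_bandit and all parameter values are made up for this sketch.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a multi-armed bandit (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each action has been tried
    estimates = [0.0] * n_arms   # running average reward of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a noisy 0/1 reward; the learner never
        # observes true_means directly, only the rewards of tried actions.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical reward probabilities for three actions.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("action judged best:", best)
```

The learner is never told that the third action is the best one. It finds this out from the rewards it collects, and the occasional random choice keeps it from settling on the first action that happens to pay off.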

A Multi-disciplinary Field

[Figure: diagram placing Reinforcement Learning among related fields and topics: A.I., Clustering, Statistical Learning, Approximation Theory, Learning Theory, Dynamic Programming, Optimal Control, Neuroscience, Psychology, Active Learning, Categorization, Neural Networks, Cognitive Sciences, Applied Math, Automatic Control, Statistics.]

A Machine Learning Paradigm

◮ Supervised learning: an expert (supervisor) provides examples of the right strategy (e.g., classification of clinical images). Supervision is expensive.

◮ Unsupervised learning: different objects are clustered together by similarity (e.g., clustering of images on the basis of their content). No actual performance is optimized.

◮ Reinforcement learning: learning by direct interaction (e.g., autonomous robotics). Minimum level of supervision (reward) and maximization of long-term performance.

The Problems

◮ How to model an RL problem

◮ How to solve an RL problem exactly

◮ How to solve an RL problem incrementally

◮ How to explore efficiently in an RL problem

◮ How to solve an RL problem approximately

Bibliography I

Bellman, R. (2003). Dynamic Programming. Dover Books on Computer Science Series. Dover Publications, Incorporated.

Damasio, A. R. (1994). Descartes’ Error: Emotion, Reason and the Human Brain. Grosset/Putnam.

Doya, K. (1999). What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex? Neural Networks, 12:961–974.

Hebb, D. O. (1961). Distinctive features of learning in the higher animal. In Delafresnaye, J. F., editor, Brain Mechanisms and Learning. Oxford University Press.

Pavlov, I. (1927). Conditioned reflexes. Oxford University Press.

Bibliography II

Pontryagin, L. and Neustadt, L. (1962). The Mathematical Theory of Optimal Processes. Number v. 4 in Classics of Soviet Mathematics. Gordon and Breach Science Publishers.

Skinner, B. F. (1938). The behavior of organisms. Appleton-Century-Crofts.

Thorndike, E. (1911). Animal Intelligence: Experimental Studies. The animal behaviour series. Macmillan.

Reinforcement Learning

Alessandro Lazaric
alessandro.lazaric@inria.fr
sequel.lille.inria.fr