SLIDE 1

EC-RL Course

A Quick Look at the “Reinforcement Learning” course

A. LAZARIC (SequeL Team @INRIA-Lille)

Ecole Centrale - Option DAD

SequeL – INRIA Lille

SLIDE 2

Why

A. LAZARIC – Introduction to Reinforcement Learning

2/18

SLIDE 3

Why: Important Problems

SLIDE 7

Why: Important Problems

◮ Autonomous robotics
  ◮ Elder care
  ◮ Exploration of unknown/dangerous environments
  ◮ Robotics for entertainment

SLIDE 11

Why: Important Problems

◮ Autonomous robotics
◮ Financial applications
  ◮ Trading execution algorithms
  ◮ Portfolio management
  ◮ Option pricing

SLIDE 16

Why: Important Problems

◮ Autonomous robotics
◮ Financial applications
◮ Energy management
  ◮ Energy grid integration
  ◮ Maintenance scheduling
  ◮ Energy market regulation
  ◮ Energy production management

SLIDE 20

Why: Important Problems

◮ Autonomous robotics
◮ Financial applications
◮ Energy management
◮ Recommender systems
  ◮ Web advertising
  ◮ Product recommendation
  ◮ Date matching

SLIDE 25

Why: Important Problems

◮ Autonomous robotics
◮ Financial applications
◮ Energy management
◮ Recommender systems
◮ Social applications
  ◮ Bike sharing optimization
  ◮ Election campaign
  ◮ ER service optimization
  ◮ Resource distribution optimization

SLIDE 26

Why: Important Problems

◮ Autonomous robotics
◮ Financial applications
◮ Energy management
◮ Recommender systems
◮ Social applications
◮ And many more...

SLIDE 27

What

SLIDE 28

What: Decision-Making under Uncertainty

Diagram: the Agent sends an action (actuation) to the Environment; the Environment returns a state (perception) to the Agent.

SLIDE 29

How: Reinforcement Learning

Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them (trial–and–error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward).

“Reinforcement Learning: An Introduction”, Sutton and Barto (1998).
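
The trial-and-error and delayed-reward ideas in the quotation can be illustrated with a small sketch. The code below is not taken from the course: it assumes a hypothetical five-state chain in which only reaching the last state yields a reward, and it uses tabular Q-learning (one standard RL algorithm) with illustrative values for the step size, discount factor, and exploration rate.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only at the goal, i.e. delayed
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore with probability epsilon,
            # otherwise pick a currently best action (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy action in each non-terminal state after learning.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The agent is never told that moving right is correct; it only observes rewards, and the value of the final reward propagates backward to earlier states through the update rule.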

SLIDE 37

How: the Course

Diagram: the Agent sends an action (actuation) to the Environment; the Environment returns a state (perception) to the Agent.

A formal and rigorous treatment of the RL approach to decision-making under uncertainty

SLIDE 42

What: the Highlights of the Course

How do we formalize the agent-environment interaction?
How do we solve an MDP?
How do we solve an MDP “online”?
How do we effectively trade off exploration and exploitation?
How do we solve a “huge” MDP?
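
As a rough illustration of what "solving an MDP" can mean when the transition model is known, here is a minimal value-iteration sketch (a dynamic programming method). The three-state MDP, its action names, its probabilities, and the discount factor are all hypothetical and chosen only for this example; they are not taken from the course material.

```python
# Hypothetical 3-state MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}
GAMMA = 0.9  # illustrative discount factor

def q_value(v, s, a):
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {s: max(q_value(v, s, a) for a in P[s]) for s in P}
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v

v_star = value_iteration()
# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: q_value(v_star, s, a)) for s in P}
```

Unlike the "online" setting, this sketch reads the transition probabilities directly from `P`; when they are unknown, the agent has to estimate values from sampled interaction instead.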

SLIDE 43

Who

Lectures and Practical Sessions
Alessandro LAZARIC
SequeL Team, INRIA-Lille Nord Europe

alessandro.lazaric@inria.fr
researchers.lille.inria.fr/~lazaric/

SLIDE 44

When/What/Where

See the schedule on the website.

SLIDE 45

Evaluation

◮ Three homework assignments (dynamic programming, multi-armed bandit, approximate dynamic programming): 2.5 points each.
◮ Literature review with oral presentation: 12.5 points.

SLIDE 46

Reinforcement Learning

Alessandro Lazaric
alessandro.lazaric@inria.fr
sequel.lille.inria.fr