The Reinforcement Learning Problem Robert Platt Northeastern - - PowerPoint PPT Presentation

the reinforcement learning problem
SMART_READER_LITE
LIVE PREVIEW

The Reinforcement Learning Problem Robert Platt Northeastern - - PowerPoint PPT Presentation

The Reinforcement Learning Problem Robert Platt Northeastern University Agent Action Agent World Observation Reward On a single time step, agent does the following: 1. observe some information 2. select an action to execute 3. take note


slide-1
SLIDE 1

The Reinforcement Learning Problem

Robert Platt Northeastern University

slide-2
SLIDE 2

Agent

Action Observation Reward

On a single time step, agent does the following:

  • 1. observe some information
  • 2. select an action to execute
  • 3. take note of any reward

Goal of agent: select actions that maximize sum of expected future rewards.

Agent World

slide-3
SLIDE 3

Example: rat in a maze

Agent World

Move left/right/up/down Observe position in maze Reward = +1 if get cheese

slide-4
SLIDE 4

Example: robot makes coffee

Agent World

Move robot joints Observe camera image Reward = +1 if coffee in cup

slide-5
SLIDE 5

Example: agent plays pong

Agent World

Joystick command Observe screen pixels Reward = game score

slide-6
SLIDE 6

Reinforcement Learning

Action Observation Reward Agent World

Goal of agent: select actions that maximize sum of expected future rewards. – agent computes a rule for selecting actions to execute

slide-7
SLIDE 7

Reinforcement Learning

Agent World

Goal of agent: select actions that maximize sum of expected future rewards. – agent computes a rule for selecting actions to execute

Joystick command Observe screen pixels Reward = game score

slide-8
SLIDE 8

Model Free Reinforcement Learning

Agent World

Agent learns a strategy for selecting actions based on experience – no prior model of system dynamics, i.e. no prior knowledge

  • f “how the world works”

– no prior model of reward, i.e. no prior knowledge of what actions lead to reward

Joystick command Observe screen pixels Reward = game score

slide-9
SLIDE 9

Distinction Relative to Planning

Agent World

Agent learns a strategy for selecting actions based on prior model – agent is given a model of system dynamics in advance – agent is “told” which states/actions are rewarding or not

Joystick command Observe screen pixels Reward = game score

slide-10
SLIDE 10

RL vs Planning

When to use planning: – when the system is easily modeled – when the system is deteriminstic When to use RL: – hard to model systems – stochastic systems

slide-11
SLIDE 11

RL vs Planning

When to use planning: – when the system is easily modeled – when the system is deteriminstic When to use RL: – hard to model systems – stochastic systems

Ultimately, RL and planning are closely related...