
The Reinforcement Learning Problem, Robert Platt, Northeastern University (PowerPoint presentation)



  1. The Reinforcement Learning Problem (Robert Platt, Northeastern University)

  2. The agent-world loop [diagram: Agent sends an Action to the World; the World returns an Observation and a Reward]. On a single time step, the agent does the following: 1. observe some information; 2. select an action to execute; 3. take note of any reward. Goal of agent: select actions that maximize the sum of expected future rewards.
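
The observe / act / note-reward loop on this slide can be sketched as follows. The `reset`/`step` environment interface and the `RandomAgent` stand-in policy are illustrative assumptions (a Gym-style convention), not part of the slides:

```python
import random

class RandomAgent:
    """Picks an action uniformly at random; a stand-in for a learned policy."""
    def __init__(self, actions):
        self.actions = actions

    def select_action(self, observation):
        return random.choice(self.actions)

    def note_reward(self, reward):
        pass  # a learning agent would update its action-selection rule here

def run_episode(agent, env, steps=10):
    """One episode of the observe / act / note-reward loop from the slide."""
    total_reward = 0.0
    observation = env.reset()          # 1. observe some information
    for _ in range(steps):
        action = agent.select_action(observation)   # 2. select an action
        observation, reward = env.step(action)      # observe again...
        agent.note_reward(reward)                   # 3. take note of any reward
        total_reward += reward
    return total_reward
```

The agent's goal, in these terms, is to choose actions so that `total_reward` (in expectation) is as large as possible.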

  3. Example: rat in a maze. Actions: move left/right/up/down. Observation: position in the maze. Reward: +1 if the rat gets the cheese.
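
The maze example can be written out as a tiny environment. The grid size, start position, and cheese location below are illustrative assumptions; only the action set (left/right/up/down), position observations, and the +1 cheese reward come from the slide:

```python
class RatMaze:
    """Minimal grid maze: observation = (row, col), reward = +1 at the cheese."""
    ACTIONS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

    def __init__(self, rows=4, cols=4, cheese=(3, 3)):
        self.rows, self.cols, self.cheese = rows, cols, cheese

    def reset(self):
        self.pos = (0, 0)  # assumed start corner
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # clamp so the rat cannot walk out of the maze
        r = min(max(self.pos[0] + dr, 0), self.rows - 1)
        c = min(max(self.pos[1] + dc, 0), self.cols - 1)
        self.pos = (r, c)
        reward = 1.0 if self.pos == self.cheese else 0.0
        return self.pos, reward
```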

  4. Example: robot makes coffee. Actions: move robot joints. Observation: camera image. Reward: +1 if coffee is in the cup.

  5. Example: agent plays Pong. Actions: joystick commands. Observation: screen pixels. Reward: game score.

  6. Reinforcement Learning [diagram: Agent acts on the World; the World returns an Observation and a Reward]. Goal of agent: select actions that maximize the sum of expected future rewards; the agent computes a rule for selecting actions to execute.

  7. Reinforcement Learning (Pong example: joystick commands as actions, screen pixels as observations, game score as reward). Goal of agent: select actions that maximize the sum of expected future rewards; the agent computes a rule for selecting actions to execute.

  8. Model-Free Reinforcement Learning (Pong example: joystick commands, screen-pixel observations, game-score reward). The agent learns a strategy for selecting actions based on experience: no prior model of system dynamics, i.e. no prior knowledge of "how the world works", and no prior model of reward, i.e. no prior knowledge of which actions lead to reward.
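
Tabular Q-learning is a standard example of model-free RL in this sense: it learns action values purely from experienced transitions and rewards, without ever being given the dynamics or the reward function. This is a sketch under the same assumed `reset`/`step` environment interface; the slides do not name a particular algorithm:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, steps=50,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn Q[(state, action)] from experience alone (model-free)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(steps):
            # epsilon-greedy: mostly exploit current estimates, sometimes explore
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward = env.step(action)
            # move the estimate toward: reward + discounted best next value
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```

Note that the update uses only the sampled `(state, action, reward, next_state)` tuple: nothing about "how the world works" appears anywhere in the learner.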

  9. Distinction Relative to Planning (same Pong setup). In planning, the agent derives a strategy for selecting actions from a prior model: the agent is given a model of system dynamics in advance, and the agent is "told" which states/actions are rewarding or not.
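
Value iteration illustrates the planning side of this distinction: the dynamics (`transition`) and reward function are handed to the algorithm up front, rather than learned from experience. This is a sketch for a deterministic, known model; the slides do not specify a planning algorithm:

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, iters=100):
    """Planning with a *given* model: transition(s, a) and reward(s, a)
    are known in advance, unlike in model-free RL."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman optimality backup using the known model
        V = {s: max(reward(s, a) + gamma * V[transition(s, a)]
                    for a in actions)
             for s in states}
    # greedy policy with respect to the converged values
    policy = {s: max(actions,
                     key=lambda a: reward(s, a) + gamma * V[transition(s, a)])
              for s in states}
    return V, policy
```

The contrast with the Q-learning sketch is exactly the slide's point: here the agent never interacts with the world at all, because everything it needs is in the model.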

  10. RL vs Planning. When to use RL: systems that are hard to model; stochastic systems. When to use planning: when the system is easily modeled; when the system is deterministic.

  11. RL vs Planning. When to use RL: systems that are hard to model; stochastic systems. When to use planning: when the system is easily modeled; when the system is deterministic. Ultimately, RL and planning are closely related...
