Planning and Learning
Robert Platt, Northeastern University
(some slides/material borrowed from Rich Sutton)

Planning
What do you think of when you think about “planning”?
– often, the word “planning” means a specific class of algorithm
– here, we use “planning” to mean any computational process that uses a model to create or improve a policy

For example: an unusual way to do planning – why does this satisfy our expanded definition?

Planning v. Learning
Often called “model-based RL”

Models in RL
Model: anything the agent can use to predict how the environment will respond to its actions (this is how we defined “model” at the beginning of this course)
Two types of models:
1. Distribution model: a description of all possible outcomes and their probabilities
2. Sample model, a.k.a. a simulation model
– given an (s, a) pair, the sample model returns a next state and reward
– a sample model is often much easier to get than a distribution model
In this section, we’re going to use sample models a lot.
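SB (Section 8.1) illustrates the difference with the sum of a dozen dice: a distribution model produces every possible sum and its probability, while a sample model produces one sum drawn from that distribution. A minimal sketch of both under that toy assumption (the function names are hypothetical):

```python
import random
from collections import Counter
from fractions import Fraction

def sample_model(n_dice, rng):
    """Sample model: run the 'simulation' once and return one possible outcome."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

def distribution_model(n_dice):
    """Distribution model: every possible sum together with its exact probability."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = Counter()
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] += p * Fraction(1, 6)
        dist = dict(nxt)
    return dist

rng = random.Random(0)
print(sample_model(12, rng))        # one sum somewhere in 12..72
print(len(distribution_model(12)))  # 61 possible sums
print(distribution_model(2)[7])     # 1/6
```

Even in this toy case the sample model is a one-liner, while the distribution model has to enumerate and combine every outcome, which is why sample models are often much easier to obtain.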

Planning
An unusual way to do planning:
(Here, we’re using a sample model, but we don’t learn the model.)
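For reference, a standard example of planning with a sample model that is given rather than learned is random-sample one-step tabular Q-planning (SB, Section 8.1): repeatedly pick a state-action pair at random, ask the sample model for a reward and next state, and apply a one-step Q-learning update. The sketch below assumes a hypothetical 5-state corridor as the sample model; the names and step sizes are illustrative choices.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 1)  # hypothetical: step left / step right
N_STATES = 5

def chain_sample_model(s, a):
    """Hypothetical given (not learned) sample model of a 5-state corridor.

    Stepping right from the last state ends the episode with reward 1;
    every other move gives reward 0 and is clipped at the walls.
    """
    if s == N_STATES - 1 and a == 1:
        return 1.0, None, True
    return 0.0, min(max(s + a, 0), N_STATES - 1), False

def q_planning(sample_model, states, actions, n_updates, alpha=0.1, gamma=0.95, seed=0):
    """Random-sample one-step tabular Q-planning."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(n_updates):
        # 1. Select a state and an action at random
        s, a = rng.choice(states), rng.choice(actions)
        # 2. Ask the sample model for a sample reward and next state
        r, s2, done = sample_model(s, a)
        # 3. Apply a one-step tabular Q-learning update to the simulated transition
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

Q = q_planning(chain_sample_model, list(range(N_STATES)), ACTIONS, n_updates=5000)
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
print(greedy)  # [1, 1, 1, 1, 1]: always step right
```

No real experience is used at all, yet the greedy policy improves, which is what makes this count as planning under the expanded definition.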

Dyna-Q
Essentially, perform these two steps continuously:
1. learn model
2. plan using current model estimate
This “model” could be very simple – it could just be a memory of previously experienced transitions
– make predictions based on memory of the most recent previous outcomes in this state/action
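A minimal tabular Dyna-Q sketch, assuming a deterministic environment so the "model" can simply remember the most recent outcome for each (s, a). The corridor environment, parameter defaults, and function names are hypothetical. One simplification relative to SB's algorithm box: planning samples uniformly over remembered (s, a) pairs instead of picking a state and then an action.

```python
import random
from collections import defaultdict

def corridor_step(s, a, length=10):
    """Hypothetical environment: a corridor of `length` cells. Stepping right from the
    last cell ends the episode with reward 1; other moves give 0 and stop at walls."""
    if s == length - 1 and a == 1:
        return 1.0, "goal", True
    return 0.0, min(max(s + a, 0), length - 1), False

def dyna_q(env_step, start_state, actions, episodes, n_planning=5,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q with a 'memory' model of the last outcome per (s, a)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}  # (s, a) -> (r, s2, done): most recently observed outcome

    def epsilon_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(Q[(s, b)] for b in actions)
        return rng.choice([b for b in actions if Q[(s, b)] == best])  # random tie-break

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    steps_per_episode = []
    for _ in range(episodes):
        s, done, steps = start_state, False, 0
        while not done:
            a = epsilon_greedy(s)
            r, s2, done = env_step(s, a)   # real experience
            q_update(s, a, r, s2, done)    # direct RL
            model[(s, a)] = (r, s2, done)  # 1. learn model
            for _ in range(n_planning):    # 2. plan using current model estimate
                ps, pa = rng.choice(list(model))
                q_update(ps, pa, *model[(ps, pa)])
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return Q, steps_per_episode

Q, steps = dyna_q(corridor_step, start_state=0, actions=(-1, 1), episodes=50, n_planning=10)
print(steps[:3], steps[-3:])  # early episodes wander; later ones approach the 10-step minimum
```

Setting `n_planning=0` turns this back into plain one-step Q-learning, which makes it easy to compare the two in the spirit of the maze experiment below.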

Dyna-Q on a Simple Maze

Why does Dyna-Q do so well?
Policies found using Q-learning vs. Dyna-Q halfway through the second episode
– Dyna-Q w/ n=50
– optimal policy after three episodes!

Think-pair-share

What happens if the model changes or is mis-estimated? (SB, Example 8.2)
(Figure annotation: the environment changes here)

Think-pair-share (SB, Example 8.2)
Questions:
– why does Dyna-Q stop getting reward?
– why does it start again?

What is Dyna-Q+?
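In SB (Section 8.3), Dyna-Q+ keeps track of how many real time steps τ have passed since each (s, a) was last tried, and during planning uses the bonus reward r + κ√τ for some small κ. Actions never tried from a visited state may also be used in planning, modeled as leading back to the same state with reward 0. A minimal sketch of such a model, assuming a deterministic environment; the class and method names are hypothetical, and counting untried actions as "last tried at time 0" is an assumption.

```python
import math

class DynaQPlusModel:
    """Tabular deterministic model with the Dyna-Q+ exploration bonus."""

    def __init__(self, actions, kappa=1e-3):
        self.actions = actions
        self.kappa = kappa      # small constant scaling the bonus
        self.t = 0              # number of real time steps so far
        self.model = {}         # (s, a) -> (r, s2), most recent outcome
        self.last_tried = {}    # (s, a) -> real time step at which (s, a) was last tried

    def observe(self, s, a, r, s2):
        """Update the model from one real transition."""
        self.t += 1
        # Untried actions from a visited state may be used in planning; they are
        # modeled as returning to s with reward 0 (assumption: last tried at time 0).
        for b in self.actions:
            if (s, b) not in self.model:
                self.model[(s, b)] = (0.0, s)
                self.last_tried[(s, b)] = 0
        self.model[(s, a)] = (r, s2)
        self.last_tried[(s, a)] = self.t

    def planning_sample(self, s, a):
        """Simulated (reward, next state) for planning, with bonus kappa * sqrt(tau)."""
        r, s2 = self.model[(s, a)]
        tau = self.t - self.last_tried[(s, a)]
        return r + self.kappa * math.sqrt(tau), s2

m = DynaQPlusModel(actions=(0, 1), kappa=0.1)
m.observe("A", 0, 1.0, "B")
for _ in range(3):
    m.observe("B", 0, 0.0, "A")
print(m.planning_sample("A", 0))  # reward 1 + 0.1 * sqrt(3), next state "B"
print(m.planning_sample("A", 1))  # never tried: reward 0.1 * sqrt(4) = 0.2, back to "A"
```

The bonus only affects simulated rewards in planning, so long-untried actions gradually look more attractive and the agent eventually retests them, which is how Dyna-Q+ notices a changed environment.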

Think-pair-share

Dyna-Q

Prioritized Sweeping
Unfocused replay from model – can we do better?

Prioritized Sweeping
Instead of replaying all of these transitions on each iteration, just replay the important ones…
– Which states or state-action pairs should be generated during planning?
– Work backward from states whose values have just changed
– Maintain a priority queue of state-action pairs whose values would change a lot if backed up, prioritized by the size of the change
– When a new backup occurs, insert predecessors according to their priorities
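A minimal sketch of tabular prioritized sweeping for a deterministic environment, following the structure of SB's algorithm (Section 8.4): the priority of (s, a) is the magnitude of its TD error, pairs above a threshold θ go into a priority queue, and every backup of (s, a) re-scores the predecessors of s. The corridor, the default parameters, and the class and method names are hypothetical; outdated heap entries are skipped lazily instead of being removed.

```python
import heapq
import itertools
from collections import defaultdict

def corridor_step(s, a, length=10):
    """Hypothetical corridor: stepping right from the last cell ends the episode with reward 1."""
    if s == length - 1 and a == 1:
        return 1.0, "goal", True
    return 0.0, min(max(s + a, 0), length - 1), False

class PrioritizedSweeping:
    def __init__(self, actions, alpha=0.5, gamma=0.95, theta=1e-4, n=5):
        self.actions = actions
        self.alpha, self.gamma, self.theta, self.n = alpha, gamma, theta, n
        self.Q = defaultdict(float)
        self.model = {}                       # (s, a) -> (r, s2, done)
        self.predecessors = defaultdict(set)  # s2 -> {(s, a) predicted to lead to s2}
        self.pqueue = []                      # max-heap via negated priorities
        self.priority = {}                    # best pending priority per (s, a)
        self.tie = itertools.count()          # tie-breaker so the heap never compares states

    def _td_error(self, s, a):
        r, s2, done = self.model[(s, a)]
        target = r if done else r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return target - self.Q[(s, a)]

    def _push(self, s, a):
        p = abs(self._td_error(s, a))
        # Insert only if above theta and higher than any pending priority for (s, a)
        if p > self.theta and p > self.priority.get((s, a), 0.0):
            self.priority[(s, a)] = p
            heapq.heappush(self.pqueue, (-p, next(self.tie), s, a))

    def observe(self, s, a, r, s2, done):
        """Learn the model from one real transition, then do up to n planning backups."""
        self.model[(s, a)] = (r, s2, done)
        self.predecessors[s2].add((s, a))
        self._push(s, a)
        self.plan()

    def plan(self):
        updates = 0
        while self.pqueue and updates < self.n:
            neg_p, _, s, a = heapq.heappop(self.pqueue)
            if self.priority.get((s, a)) != -neg_p:
                continue  # outdated entry, superseded by a higher-priority push
            del self.priority[(s, a)]
            self.Q[(s, a)] += self.alpha * self._td_error(s, a)
            updates += 1
            # Work backward: pairs leading into s may now have large TD errors
            for ps, pa in self.predecessors[s]:
                self._push(ps, pa)

agent = PrioritizedSweeping(actions=(-1, 1), n=5)
for s in range(10):  # one real pass straight down the corridor
    r, s2, done = corridor_step(s, 1)
    agent.observe(s, 1, r, s2, done)
print([round(agent.Q[(s, 1)], 4) for s in range(10)])
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0255, 0.0536, 0.1128, 0.2375, 0.5]
```

After reaching the goal once, the n=5 backups sweep backward from the goal instead of being spent on transitions whose values would not change.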

Prioritized Sweeping
(Algorithm figure annotations: “TD error”; “what’s this part doing?”)

Prioritized Sweeping: Performance
Both use n=5 backups per environmental interaction

Trajectory sampling
Idea: Dyna-Q, but sample experiences along simulated trajectories, i.e. from the on-policy distribution, rather than uniformly
– is it better to sample uniformly or from the on-policy distribution?
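A minimal sketch contrasting the two ways of choosing planning updates, assuming a known deterministic model stored as a dict. The uniform version draws (s, a) pairs uniformly from the model; the trajectory version simulates ε-greedy episodes in the model from the start state and updates the pairs it passes through. The two-corridor model, function names, and parameters are hypothetical; the second corridor is deliberately unreachable to show where each method spends its updates.

```python
import random
from collections import Counter, defaultdict

def q_update(Q, model, s, a, actions, alpha, gamma):
    """One-step Q-learning update on a transition taken from the model."""
    r, s2, done = model[(s, a)]
    target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return s2, done

def plan_uniform(Q, model, actions, n, rng, alpha=0.1, gamma=0.95):
    """Uniform sampling: each update uses an (s, a) pair drawn uniformly from the model."""
    counts, pairs = Counter(), list(model)
    for _ in range(n):
        s, a = rng.choice(pairs)
        q_update(Q, model, s, a, actions, alpha, gamma)
        counts[(s, a)] += 1
    return counts

def plan_trajectory(Q, model, start, actions, n, rng, epsilon=0.1,
                    alpha=0.1, gamma=0.95, max_len=100):
    """Trajectory sampling: simulate epsilon-greedy episodes in the model from `start`
    and update each pair along the way (the on-policy distribution)."""
    counts, updates = Counter(), 0
    while updates < n:
        s, done, t = start, False, 0
        while not done and t < max_len and updates < n:
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:  # greedy w.r.t. current Q, ties broken randomly
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            counts[(s, a)] += 1
            s, done = q_update(Q, model, s, a, actions, alpha, gamma)
            updates, t = updates + 1, t + 1
    return counts

def corridor_model(offset, length=5):
    """Hypothetical known model: a corridor whose right end pays reward 1 and terminates."""
    model = {}
    for i in range(length):
        for a in (-1, 1):
            if i == length - 1 and a == 1:
                model[(offset + i, a)] = (1.0, "goal", True)
            else:
                model[(offset + i, a)] = (0.0, offset + min(max(i + a, 0), length - 1), False)
    return model

# Two disconnected corridors; only the one containing the start state 0 is reachable.
model = {**corridor_model(0), **corridor_model(100)}
rng = random.Random(0)
uniform_counts = plan_uniform(defaultdict(float), model, (-1, 1), 1000, rng)
traj_counts = plan_trajectory(defaultdict(float), model, 0, (-1, 1), 1000, rng)
print(sum(c for (s, _), c in uniform_counts.items() if s >= 100))  # roughly half the updates
print(sum(c for (s, _), c in traj_counts.items() if s >= 100))     # 0: never reached
```

Trajectory sampling concentrates computation on states the current policy actually visits; whether that beats uniform sampling depends on the problem, which is the question the slide poses.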
