SLIDE 1 CS 573: Artificial Intelligence
Markov Decision Processes
Dan Weld University of Washington
Many slides by Dan Klein & Pieter Abbeel / UC Berkeley (http://ai.berkeley.edu) and some by Mausam & Andrey Kolobov
SLIDE 2
Logistics
§ No class next Tues 2/7 § PS3 – due next Wed § Reinforcement learning starting next Thurs
SLIDE 3
Solving MDPs
§ Value Iteration § Real-Time Dynamic Programming § Policy Iteration § Heuristic Search Methods § Reinforcement Learning
SLIDE 4
Solving MDPs
§ Value Iteration (IHDR) § Real-Time Dynamic Programming (SSP) § Policy Iteration (IHDR) § Heuristic Search Methods (SSP) § Reinforcement Learning (IHDR)
(IHDR = infinite-horizon discounted-reward MDPs; SSP = stochastic shortest-path MDPs)
SLIDE 5
Policy Iteration
1. Policy Evaluation 2. Policy Improvement
SLIDE 6
Part 1 - Policy Evaluation
SLIDE 7 Fixed Policies
§ Expectimax trees max over all actions to compute the optimal values § If we fix some policy π(s), then the tree becomes simpler – only one action per state
§ … though the tree’s value would depend on which policy we fixed
[Figure: expectimax tree maxing over actions a (s → s,a → s') vs. fixed-policy tree following π(s) (s → s,π(s) → s'). Left: do the optimal action. Right: do what π says to do.]
SLIDE 8 Computing Utilities for a Fixed Policy
§ A new basic operation: compute the utility of a state s under a fixed (generally non-optimal) policy § Define the utility of a state s, under a fixed policy π:
Vπ(s) = expected total discounted rewards starting in s and following π
§ Recursive relation (a variation of the Bellman equation): Vπ(s) = Σs' T(s, π(s), s') [ R(s, π(s), s') + γ Vπ(s') ]
SLIDE 9
Example: Policy Evaluation
Always Go Right Always Go Forward
SLIDE 10
Example: Policy Evaluation
Always Go Right Always Go Forward
SLIDE 11
Iterative Policy Evaluation Algorithm
§ How do we calculate the V's for a fixed policy π? § Idea 1: Turn the recursive Bellman equations into updates (like value iteration): Vk+1π(s) ← Σs' T(s, π(s), s') [ R(s, π(s), s') + γ Vkπ(s') ] § Efficiency: O(S²) per iteration § Often converges in far fewer iterations than VI
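To make the update concrete, here is a minimal Python sketch of iterative policy evaluation. The MDP interface – a `transitions(s, a)` function returning `(prob, next_state, reward)` triples, plus `states`, a discount `gamma`, and a tolerance `theta` – is an assumption for illustration, not code from the course.

```python
def evaluate_policy(states, transitions, policy, gamma=0.9, theta=1e-6):
    """Iteratively compute V^pi for a fixed policy pi.

    transitions(s, a) -> iterable of (prob, next_state, reward);
    this interface is assumed for illustration.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman update for the fixed action pi(s) -- no max over actions
            v_new = sum(p * (r + gamma * V[s2])
                        for p, s2, r in transitions(s, policy[s]))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # converged: largest change is below tolerance
            return V
```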
SLIDE 12 Linear Policy Evaluation Algorithm
§ Another way to calculate the V's for a fixed policy π? § Idea 2: Without the maxes, the Bellman equations are just a linear system of equations: Vπ(s) = Σs' T(s, π(s), s') [ R(s, π(s), s') + γ Vπ(s') ] § Solve with Matlab (or your favorite linear-system solver) § S equations, S unknowns ⇒ O(S³) and EXACT! § In large state spaces, still too expensive
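In matrix form the system is (I − γ Tπ) V = Rπ, so any linear solver computes Vπ directly. A minimal numpy sketch, assuming dense arrays `T_pi` (the S×S transition matrix under π) and `R_pi` (expected immediate reward per state) – hypothetical inputs, not from the slides:

```python
import numpy as np

def evaluate_policy_exact(T_pi, R_pi, gamma=0.9):
    """Exact policy evaluation by solving (I - gamma * T_pi) V = R_pi.

    T_pi: (S, S) array, T_pi[s, s2] = P(s2 | s, pi(s))  (assumed dense)
    R_pi: (S,) array of expected immediate rewards under pi
    """
    S = T_pi.shape[0]
    # O(S^3) direct solve; exact up to floating-point error
    return np.linalg.solve(np.eye(S) - gamma * T_pi, R_pi)
```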
SLIDE 13
Policy Iteration
§ Initialize π(s) to random actions § Repeat
§ Step 1: Policy evaluation: calculate utilities of π at each s using a nested loop § Step 2: Policy improvement: update the policy using one-step look-ahead – for each s, what's the best action to execute, assuming the agent then follows π? Let π'(s) = this best action, and set π = π'
§ Until policy doesn’t change
SLIDE 14
Policy Iteration Details
§ Let i = 0 § Initialize πi(s) to random actions § Repeat
§ Step 1: Policy evaluation: § Initialize k = 0; for all s, V0π(s) = 0 § Repeat until Vπ converges § For each state s, Vk+1π(s) = Σs' T(s, πi(s), s') [ R(s, πi(s), s') + γ Vkπ(s') ] § Let k += 1 § Step 2: Policy improvement: § For each state s, πi+1(s) = argmaxa Σs' T(s, a, s') [ R(s, a, s') + γ Vπ(s') ] § If πi == πi+1 then it's optimal; return it. § Else let i += 1
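Putting the two steps together, a compact Python sketch of the full loop, reusing the hypothetical `evaluate_policy` and `transitions` interface from the earlier sketch (assumed, not course-provided):

```python
import random

def policy_iteration(states, actions, transitions, gamma=0.9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: random.choice(actions) for s in states}  # random initial policy
    while True:
        V = evaluate_policy(states, transitions, policy, gamma)  # Step 1
        changed = False
        for s in states:
            # Step 2: one-step look-ahead -- best action assuming
            # the agent follows the current policy afterwards
            best = max(actions,
                       key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in transitions(s, a)))
            if best != policy[s]:
                policy[s] = best
                changed = True
        if not changed:  # policy unchanged => it is optimal
            return policy, V
```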
SLIDE 15
Example
Initialize π0 to "always go right". Perform policy evaluation. Perform policy improvement, iterating through the states.
Has policy changed? Yes! i += 1
SLIDE 16
Example
π1 says "always go up". Perform policy evaluation. Perform policy improvement, iterating through the states.
Has policy changed? No! We have the optimal policy
SLIDE 17
Policy Iteration Properties
§ Policy iteration finds the optimal policy, guaranteed (assuming exact evaluation)! § Often converges (much) faster than value iteration
SLIDE 18 Modified Policy Iteration [van Nunen 76]
§ Initialize π0 as a random [proper] policy § Repeat
Approximate Policy Evaluation: compute Vπn-1 by running only a few iterations of iterative policy evaluation. Policy Improvement: construct πn greedily with respect to Vπn-1
§ Until convergence § return πn
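A sketch of the only change relative to standard policy iteration – capping evaluation at `k_max` sweeps instead of iterating to convergence (same hypothetical interface as above):

```python
def evaluate_policy_truncated(states, transitions, policy, gamma=0.9, k_max=5):
    """Approximate V^pi with only k_max sweeps of iterative policy evaluation."""
    V = {s: 0.0 for s in states}
    for _ in range(k_max):
        # one full Jacobi-style sweep: new values computed from the old V
        V = {s: sum(p * (r + gamma * V[s2])
                    for p, s2, r in transitions(s, policy[s]))
             for s in states}
    return V
```

Substituting this for the exact evaluation step gives modified policy iteration; with k_max = 1 the loop behaves like value iteration, while k_max → ∞ recovers standard policy iteration.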
SLIDE 19
Comparison
§ Both value iteration and policy iteration compute the same thing (all optimal values) § In value iteration:
§ Every iteration updates both the values and (implicitly) the policy § We don’t track the policy, but taking the max over actions implicitly recomputes it § What is the space being searched?
§ In policy iteration:
§ We do fewer iterations § Each one is slower (must compute all of Vπ and then choose the new best π) § What is the space being searched?
§ Both are dynamic programs for planning in MDPs
SLIDE 20 Comparison II
§ Changing the search space. § Policy Iteration
§ Search over policies § Compute the resulting value
§ Value Iteration
§ Search over values § Compute the resulting policy
SLIDE 21
Solving MDPs
§ Value Iteration § Real-Time Dynamic Programming § Policy Iteration § Heuristic Search Methods § Reinforcement Learning