SLIDE 1

Markov Decision Processes: Value Iteration

Pieter Abbeel, UC Berkeley EECS

Markov Decision Process

Assumption: the agent gets to observe the state.

[Agent-environment interaction diagram drawn from Sutton and Barto, Reinforcement Learning: An Introduction, 1998]

SLIDE 2

Markov Decision Process (S, A, T, R, H)

Given:
- S: set of states
- A: set of actions
- T: S x A x S x {0, 1, …, H} → [0, 1], the transition model: T_t(s, a, s') = P(s_{t+1} = s' | s_t = s, a_t = a)
- R: S x A x S x {0, 1, …, H} → ℝ, the reward model: R_t(s, a, s') = reward for the transition (s_t = s, a_t = a, s_{t+1} = s')
- H: horizon over which the agent will act

Goal:
- Find a policy π: S x {0, 1, …, H} → A that maximizes the expected sum of rewards, i.e., max_π E[ Σ_{t=0}^{H} R_t(s_t, a_t, s_{t+1}) | π ]
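As a concrete illustration of this tuple, a small finite-horizon MDP can be written down directly in Python. The states, actions, probabilities, and rewards below are made up for illustration, and T and R are taken to be time-invariant for simplicity (the definition above allows them to depend on t).

```python
# A minimal finite-horizon MDP written as plain Python data.
# All names and numbers are illustrative only.
S = ["s0", "s1"]                     # set of states
A = ["stay", "go"]                   # set of actions
H = 3                                # horizon over which the agent acts

# T[(s, a, s')] = P(s_{t+1} = s' | s_t = s, a_t = a)
T = {
    ("s0", "stay", "s0"): 1.0,
    ("s0", "go",   "s1"): 0.9,
    ("s0", "go",   "s0"): 0.1,
    ("s1", "stay", "s1"): 1.0,
    ("s1", "go",   "s0"): 1.0,
}

# R[(s, a, s')] = reward collected on the transition (s, a, s')
R = {
    ("s0", "go",   "s1"): 1.0,
    ("s1", "stay", "s1"): 0.5,
}

def transition_prob(s, a, sp):
    """P(s' | s, a); unlisted triples have probability 0."""
    return T.get((s, a, sp), 0.0)

def reward(s, a, sp):
    """Reward for the transition (s, a, s'); unlisted triples give 0."""
    return R.get((s, a, sp), 0.0)
```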

Examples

MDP (S, A, T, R, H), goal: max_π E[ Σ_{t=0}^{H} R_t(s_t, a_t, s_{t+1}) | π ]

- Cleaning robot
- Walking robot
- Pole balancing
- Games: Tetris, backgammon
- Server management
- Shortest path problems
- Model for animals, people

SLIDE 3

Canonical Example: Grid World

§ The agent lives in a grid
§ Walls block the agent's path
§ The agent's actions do not always go as planned (a code sketch of this noise model follows this list):
  § 80% of the time, the action North takes the agent North (if there is no wall there)
  § 10% of the time, North takes the agent West; 10% of the time, East
  § If there is a wall in the direction the agent would have been taken, the agent stays put
§ Big rewards come at the end
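A minimal sketch of how this noisy "North" action could be modeled. The grid layout, wall test, and helper names are assumptions for illustration, not part of the slides.

```python
import random

# Hypothetical 3x4 grid, row-major; "#" is a wall, "+1"/"-1" are the big
# terminal rewards at the end. Layout and names are illustrative only.
GRID = [
    [" ", " ", " ", "+1"],
    [" ", "#", " ", "-1"],
    [" ", " ", " ", " "],
]

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
# Intended "North": 80% North, 10% West, 10% East (analogous tables would
# be defined for the other actions).
NOISE = {"N": [("N", 0.8), ("W", 0.1), ("E", 0.1)]}

def blocked(pos):
    """A move is blocked if it leaves the grid or hits a wall cell."""
    r, c = pos
    return not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#"

def step(pos, action):
    """Sample the next position under the 80/10/10 action noise; the agent
    stays put if the sampled direction is blocked by a wall."""
    directions, probs = zip(*NOISE[action])
    actual = random.choices(directions, weights=probs)[0]
    dr, dc = MOVES[actual]
    nxt = (pos[0] + dr, pos[1] + dc)
    return pos if blocked(nxt) else nxt
```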

Grid Futures

[Figure: deterministic vs. stochastic grid world. In the deterministic grid world, each action (E, N, S, W) from a state leads to a single successor state; in the stochastic grid world, the same action can lead to several possible successor states.]

SLIDE 4

Solving MDPs

n In an MDP, we want an optimal policy π*: S x 0:H → A

n A policy π gives an action for each state for each time n An optimal policy maximizes expected sum of rewards n

Contrast: In deterministic, want an optimal plan, or sequence of actions, from start to a goal

t=0 t=1 t=2 t=3 t=4 t=5=H
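A minimal sketch of this contrast, using made-up state and action names: a deterministic plan is just a fixed sequence of actions, while an MDP policy maps (time step, state) to an action so it can react to whichever state the stochastic dynamics actually produce.

```python
# Deterministic setting: an optimal plan is a fixed action sequence.
plan = ["N", "N", "E", "E", "E"]          # illustrative only

# MDP setting: a policy prescribes an action for every state at every
# time step, e.g. pi[(t, s)] = action. Entries are illustrative only.
pi = {
    (0, "s0"): "N", (0, "s1"): "E",
    (1, "s0"): "N", (1, "s1"): "E",
}

def act(pi, t, s):
    """Look up the action the policy prescribes at time t in state s."""
    return pi[(t, s)]
```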

Value Iteration

n Idea:

n

= the expected sum of rewards accumulated when starting from state s and acting optimally for a horizon of i steps

n Algorithm:

n Start with

for all s.

n For i=1, … , H

Given Vi*, calculate for all states s 2 S:

n This is called a value update or Bellman update/back-up

SLIDE 5

Example: Value Iteration

n Information propagates outward from terminal states

and eventually all states have correct value estimates

V2 V3
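A tiny hypothetical chain MDP makes this propagation concrete (reusing the value_iteration sketch above): reward is only received on entering the rightmost state, so after one back-up only the state next to it has nonzero value, and after two back-ups the value has spread one state further out.

```python
# Hypothetical 3-state chain: s0 -> s1 -> s2, reward 1 only for entering s2.
S = ["s0", "s1", "s2"]
A = ["right"]
T = lambda s, a, sp: 1.0 if (s, sp) in {("s0", "s1"), ("s1", "s2"), ("s2", "s2")} else 0.0
R = lambda s, a, sp: 1.0 if (s, sp) == ("s1", "s2") else 0.0

V = value_iteration(S, A, T, R, H=3)
print(V[1])  # {'s0': 0.0, 's1': 1.0, 's2': 0.0}  value appears next to the reward
print(V[2])  # {'s0': 1.0, 's1': 1.0, 's2': 0.0}  and propagates one state outward
```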

SLIDE 6

Practice: Computing Actions

n Which action should we chose from state s:

n Given optimal values V*? n = greedy action with respect to V* n = action choice with one step lookahead w.r.t. V* 11

Today and forthcoming lectures

- Optimal control: provides a general computational approach to tackle control problems.
- Dynamic programming / value iteration
  - Discrete state spaces (DONE!)
  - Discretization of continuous state spaces
  - Linear systems
  - LQR
  - Extensions to nonlinear settings:
    - Local linearization
    - Differential dynamic programming
- Optimal Control through Nonlinear Optimization
  - Open-loop
  - Model Predictive Control
- Examples