SLIDE 1

Finite Markov Decision Processes (MDP)

  • Prof. Kuan-Ting Lai

2020/3/20

SLIDE 2

Markov Decision Process (MDP)

https://en.wikipedia.org/wiki/Markov_decision_process

SLIDE 3

Markov Property

  • The current state captures all relevant information from past states
  • i.e. the process is memoryless
  • Let bygones be bygones (formalized in the equation below)
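
In symbols (the standard statement of the property, added here for completeness): a state St is Markov if and only if

    P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]
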
SLIDE 4

Markov Process

  • A Markov process is a memoryless random process, i.e. a sequence of random states S1, S2, … with the Markov property
  • The transition probability P(s, s’) is the probability of moving from state s to state s’
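
Written out (standard notation): for a current state s and a successor state s',

    P_{ss'} = P[S_{t+1} = s' | S_t = s]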

SLIDE 5

Student Markov Chain

SLIDE 6

Student Markov Chain Episodes
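
A minimal Python sketch of sampling episodes from a Markov chain. The transition probabilities below follow the student chain from Silver's Lecture 2, which this deck is based on; the state names and the sample_episode helper are my own illustration:

    import random

    # Student Markov chain (transition probabilities from Silver's Lecture 2)
    transitions = {
        "Class1":   [("Class2", 0.5), ("Facebook", 0.5)],
        "Class2":   [("Class3", 0.8), ("Sleep", 0.2)],
        "Class3":   [("Pass", 0.6), ("Pub", 0.4)],
        "Facebook": [("Facebook", 0.9), ("Class1", 0.1)],
        "Pass":     [("Sleep", 1.0)],
        "Pub":      [("Class1", 0.2), ("Class2", 0.4), ("Class3", 0.4)],
        "Sleep":    [],  # terminal state: no outgoing transitions
    }

    def sample_episode(start="Class1"):
        """Follow the chain from `start` until a terminal state is reached."""
        state, episode = start, [start]
        while transitions[state]:
            successors, probs = zip(*transitions[state])
            state = random.choices(successors, weights=probs)[0]
            episode.append(state)
        return episode

    print(sample_episode())  # e.g. ['Class1', 'Class2', 'Class3', 'Pass', 'Sleep']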

SLIDE 7

Example: Student Markov Chain Transition Matrix
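
For reference (this slide is a figure in the original deck): a state transition matrix collects the probabilities P_{ss'} for every pair of states, so each row is a probability distribution:

    P = [ P_11 ... P_1n ]
        [  :         :  ]
        [ P_n1 ... P_nn ],   with Σ_{s'} P_{ss'} = 1 for every row s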

SLIDE 8

Adding Reward to Markov Process

  • A Markov reward process is a Markov chain with values.
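
Formally (the standard definition): a Markov reward process is a tuple ⟨S, P, R, γ⟩, where S is the state set, P the transition matrix, R the reward function, and γ the discount factor:

    R_s = E[R_{t+1} | S_t = s],    γ ∈ [0, 1]
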
SLIDE 9

Student MRP

SLIDE 10

Discounted Future Return Gt

  • The discount γ ∈ [0, 1] is the present value of future rewards
    − γ close to 0 leads to “short-sighted” evaluation
    − γ close to 1 leads to “far-sighted” evaluation
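
The return Gt itself (standard definition) is the total discounted reward from time step t onward:

    G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... = Σ_{k=0}^∞ γ^k R_{t+k+1}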

SLIDE 11

Why add a discount factor γ?

  • Uncertainty about the future
  • Avoids infinite returns in cyclic Markov processes
  • Animal/human behaviour shows preference for immediate reward
SLIDE 12

Value Function

  • The value function v(s) estimates the long-term value of state s
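
In equation form (standard MRP definition): the value of state s is the expected return starting from s:

    v(s) = E[G_t | S_t = s]
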
SLIDE 13

Student MRP Returns

  • γ = 1/2
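
As a worked example with γ = 1/2, take the episode Class1, Class2, Class3, Pass, Sleep (one of the sample episodes on Silver's slides), with rewards −2, −2, −2, +10:

    G_1 = −2 + (1/2)(−2) + (1/4)(−2) + (1/8)(10)
        = −2 − 1 − 0.5 + 1.25 = −2.25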

SLIDE 14

State-Value Function for Student MRP (1)

SLIDE 15

State-Value Function for Student MRP (2)

SLIDE 16

State-Value Function for Student MRP (3)

SLIDE 17

Bellman Equation for MRPs

  • The value function can be decomposed into two parts:
    − the immediate reward Rt+1
    − the discounted value of the successor state γ v(St+1)
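
Putting the two parts together gives the Bellman equation for MRPs:

    v(s) = E[R_{t+1} + γ v(S_{t+1}) | S_t = s]
         = R_s + γ Σ_{s'} P_{ss'} v(s')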

SLIDE 18

Backup Diagram for Bellman Equation

SLIDE 19

Calculating the Student MRP Using the Bellman Equation
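
Because the Bellman equation is linear, v = R + γPv can be solved in closed form as v = (I − γP)⁻¹R. A short NumPy sketch for the student MRP; the transition probabilities and rewards follow Silver's example, while the state ordering and γ = 0.9 are my own choices for the illustration:

    import numpy as np

    # State order: C1, C2, C3, Pass, Pub, Facebook, Sleep
    P = np.array([
        [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # Class1
        [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # Class2
        [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # Class3
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Pass
        [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],  # Pub
        [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],  # Facebook
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Sleep (terminal, modeled as a self-loop)
    ])
    R = np.array([-2.0, -2.0, -2.0, 10.0, 1.0, -1.0, 0.0])
    gamma = 0.9

    # Bellman equation in matrix form: v = R + gamma * P v  =>  (I - gamma*P) v = R
    v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
    for name, value in zip(["C1", "C2", "C3", "Pass", "Pub", "FB", "Sleep"], v):
        print(f"{name:>5}: {value:6.2f}")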

SLIDE 20

Markov Decision Process

  • A Markov decision process (MDP) is a Markov reward process with decisions.
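
Formally (standard definition): an MDP is a tuple ⟨S, A, P, R, γ⟩, where the transitions and rewards now also depend on the chosen action a:

    P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]
    R^a_s = E[R_{t+1} | S_t = s, A_t = a]
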
SLIDE 21

Student MDP with Actions

SLIDE 22

Policy

  • MDP policies depend only on the current state, i.e. they are stationary
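
A policy π is a distribution over actions given states (standard definition):

    π(a|s) = P[A_t = a | S_t = s]
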
SLIDE 23

Policies

SLIDE 24

Value Function
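
Two value functions are defined for a fixed policy π (standard definitions): the state-value function vπ(s) is the expected return from s when following π, and the action-value function qπ(s, a) additionally fixes the first action:

    v_π(s) = E_π[G_t | S_t = s]
    q_π(s, a) = E_π[G_t | S_t = s, A_t = a]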

SLIDE 25

State-Value Function for Student MDP

SLIDE 26

Backup Diagram for vπ and qπ
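
These backup diagrams encode the one-step look-ahead relations between vπ and qπ, i.e. the Bellman expectation equations:

    v_π(s) = Σ_a π(a|s) q_π(s, a)
    q_π(s, a) = R^a_s + γ Σ_{s'} P^a_{ss'} v_π(s')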

SLIDE 27

SLIDE 28

SLIDE 29

Bellman Expectation Equation for Student MDP

SLIDE 30

Optimal Value Function
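
In symbols (standard definitions): the optimal value functions maximize over all policies, and v* satisfies the Bellman optimality equation:

    v_*(s) = max_π v_π(s),    q_*(s, a) = max_π q_π(s, a)
    v_*(s) = max_a ( R^a_s + γ Σ_{s'} P^a_{ss'} v_*(s') )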

SLIDE 31

Optimal Value Function for Student MDP

SLIDE 32

Optimal Action-Value Function for Student MDP

SLIDE 33

Reference

  • David Silver, “Lecture 2: Markov Decision Processes,” Reinforcement Learning (https://www.youtube.com/watch?v=lfHX2hHRMVQ&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ&index=2)
  • Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,” 2nd edition, Nov. 2018, Chapter 3