  1. Finite Markov Decision Processes (MDP) Prof. Kuan-Ting Lai 2020/3/20

  2. Markov Decision Process (MDP) https://en.wikipedia.org/wiki/Markov_decision_process

  3. Markov Property • The current state captures all relevant information from past states: P[S_{t+1} | S_t] = P[S_{t+1} | S_1, …, S_t] • i.e. memoryless • Let bygones be bygones

  4. Markov Process • A Markov process is a memoryless random process, i.e. a sequence of random states S_1, S_2, … with the Markov property • The transition probability P(s, s') is the probability of moving from state s to state s'
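
As a companion to the next few Student Markov Chain slides, here is a minimal sketch of a Markov process as a transition matrix plus an episode sampler. The state names and probabilities are not reproduced in this transcript; they are assumed from the Student Markov Chain in David Silver's Lecture 2, so treat the numbers as illustrative.

```python
import numpy as np

# A Markov process as a state list plus a transition matrix P, where
# P[i, j] = P(s_j | s_i).  States and probabilities assumed from the
# Student Markov Chain in Silver's Lecture 2 (illustrative only).
states = ["C1", "C2", "C3", "Pass", "Pub", "FB", "Sleep"]
P = np.array([
    #  C1   C2   C3  Pass  Pub   FB  Sleep
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # C1
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # C2
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # C3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Pass
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],  # Pub
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],  # FB
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Sleep (absorbing)
])

def sample_episode(start="C1", seed=0):
    """Follow P(s, s') from `start` until the absorbing Sleep state."""
    rng = np.random.default_rng(seed)
    s = states.index(start)
    episode = [states[s]]
    while states[s] != "Sleep":
        s = rng.choice(len(states), p=P[s])
        episode.append(states[s])
    return episode

print(sample_episode())  # e.g. ['C1', 'C2', 'C3', ...] ending in 'Sleep'
```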

  5. Student Markov Chain

  6. Student Markov Chain Episodes

  7. Example: Student Markov Chain Transition Matrix

  8. Adding Reward to Markov Process • A Markov reward process (MRP) is a Markov chain with values: a tuple ⟨S, P, R, γ⟩ that adds a reward function R and a discount factor γ to the state set S and transition matrix P.

  9. Student MRP

  10. Discounted Future Return G_t • G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + … = Σ_{k=0}^∞ γ^k R_{t+k+1} • The discount γ ∈ [0,1] is the present value of future rewards − γ close to 0 leads to “short-sighted” evaluation − γ close to 1 leads to “far-sighted” evaluation
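
A one-function sketch of the return: the backward recursion below uses the identity G_t = R_{t+1} + γ G_{t+1}. The sample reward sequence (−2, −2, −2, +10 for the episode C1 → C2 → C3 → Pass) is assumed from Silver's Student MRP example, not taken from this transcript.

```python
def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma*R_{t+2} + ...  via  G_t = R_{t+1} + gamma*G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Rewards along the assumed episode C1 -> C2 -> C3 -> Pass: -2, -2, -2, +10.
print(discounted_return([-2, -2, -2, 10], gamma=0.5))
# -2 - 1 - 0.5 + 1.25 = -2.25
```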

  11. Why add a discount factor γ? • Uncertainty about the future • Avoids infinite returns in cyclic Markov processes • Animal/human behaviour shows a preference for immediate reward

  12. Value Function • The value function v(s) = E[G_t | S_t = s] measures the long-term value of state s
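
One way to make v(s) = E[G_t | S_t = s] concrete is a Monte Carlo estimate: sample many episodes starting from s and average their discounted returns. The sketch below again assumes the Student MRP numbers (transitions and per-state rewards) from Silver's example rather than this transcript.

```python
import numpy as np

# Assumed Student MRP: transition matrix P over
# [C1, C2, C3, Pass, Pub, FB, Sleep] and reward R[s] received on leaving s.
states = ["C1", "C2", "C3", "Pass", "Pub", "FB", "Sleep"]
P = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])
R = np.array([-2.0, -2.0, -2.0, 10.0, 1.0, -1.0, 0.0])

def mc_value(start, gamma=0.9, n_episodes=20_000, seed=0):
    """Estimate v(start) = E[G_t | S_t = start] by averaging sampled returns."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_episodes):
        s, g, discount = states.index(start), 0.0, 1.0
        while states[s] != "Sleep":
            g += discount * R[s]
            discount *= gamma
            s = rng.choice(len(states), p=P[s])
        total += g
    return total / n_episodes

print(mc_value("C3"))  # should approach the exact v(C3), ~4.1 at gamma = 0.9
```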

  13. Student MRP Returns • γ = 1/2

  14. State-Value Function for Student MRP (1)

  15. State-Value Function for Student MRP (2)

  16. State-Value Function for Student MRP (3)

  17. Bellman Equation for MRPs • The value function can be decomposed into two parts: − the immediate reward R_{t+1} − the discounted value of the next state γ v(S_{t+1}) • v(s) = E[R_{t+1} + γ v(S_{t+1}) | S_t = s]
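
Because this decomposition is linear, an MRP value function can be computed in closed form: v = R + γPv rearranges to v = (I − γP)⁻¹R. A sketch with the same assumed Student MRP numbers as above:

```python
import numpy as np

# Solve the MRP Bellman equation v = R + gamma * P v in closed form:
# v = (I - gamma*P)^{-1} R.  Assumed Student MRP numbers; gamma < 1
# keeps (I - gamma*P) invertible despite the absorbing Sleep state.
states = ["C1", "C2", "C3", "Pass", "Pub", "FB", "Sleep"]
P = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])
R = np.array([-2.0, -2.0, -2.0, 10.0, 1.0, -1.0, 0.0])

gamma = 0.9
v = np.linalg.solve(np.eye(len(states)) - gamma * P, R)
for name, value in zip(states, v):
    print(f"{name:>5}: {value:6.2f}")  # C3 should come out around 4.1
```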

  18. Backup Diagram for Bellman Equation

  19. Calculating the Student MRP using the Bellman Equation

  20. Markov Decision Process • A Markov decision process (MDP) is a Markov reward process with decisions: a tuple ⟨S, A, P, R, γ⟩ that adds a finite set of actions A.

  21. Student MDP with Actions

  22. Policy • A policy π(a|s) = P[A_t = a | S_t = s] fully defines the agent's behaviour • MDP policies depend only on the current state, i.e. they are stationary

  23. Policies

  24. Value Function

  25. State-Value Function for Student MDP

  26. Backup Diagram for v_π and q_π

  27. Bellman Expectation Equation for Student MDP
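
The Bellman expectation equation, v_π(s) = Σ_a π(a|s) (R_s^a + γ Σ_{s'} P_{ss'}^a v_π(s')), can be solved by fixed-point iteration (iterative policy evaluation). The sketch below evaluates a uniform random policy π(a|s) = 0.5 on a Student MDP whose action names, rewards, and stochastic Pub transition are assumed from Silver's Lecture 2 example; the slide figure is not reproduced here.

```python
# Student MDP: each action maps to (reward, {next_state: probability}).
# Structure and numbers assumed from Silver's Lecture 2 example.
mdp = {
    "C1":    {"study":    (-2, {"C2": 1.0}),
              "facebook": (-1, {"FB": 1.0})},
    "C2":    {"study":    (-2, {"C3": 1.0}),
              "sleep":    ( 0, {"Sleep": 1.0})},
    "C3":    {"study":    (10, {"Sleep": 1.0}),
              "pub":      ( 1, {"C1": 0.2, "C2": 0.4, "C3": 0.4})},
    "FB":    {"facebook": (-1, {"FB": 1.0}),
              "quit":     ( 0, {"C1": 1.0})},
    "Sleep": {},  # terminal: no actions available
}

def policy_evaluation(mdp, gamma=1.0, tol=1e-8):
    """Iterate the Bellman expectation equation under a uniform policy:
    v(s) <- sum_a pi(a|s) * (R_s^a + gamma * sum_s' P_ss'^a v(s'))."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:              # terminal state keeps value 0
                continue
            pi = 1.0 / len(actions)      # uniform random policy
            new = sum(pi * (r + gamma * sum(p * v[s2] for s2, p in trans.items()))
                      for r, trans in actions.values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v

v_pi = policy_evaluation(mdp)
print({s: round(x, 1) for s, x in v_pi.items()})
# roughly {'C1': -1.3, 'C2': 2.7, 'C3': 7.4, 'FB': -2.3, 'Sleep': 0.0}
```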

  28. Optimal Value Function • v∗(s) = max_π v_π(s) and q∗(s, a) = max_π q_π(s, a) • An MDP is solved once the optimal value function is known
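
The optimal value function satisfies the Bellman optimality equation, v∗(s) = max_a (R_s^a + γ Σ_{s'} P_{ss'}^a v∗(s')); replacing the policy average with a max over actions turns policy evaluation into value iteration. A sketch on the same assumed Student MDP:

```python
# Same assumed Student MDP as in the policy-evaluation sketch above.
mdp = {
    "C1":    {"study":    (-2, {"C2": 1.0}),
              "facebook": (-1, {"FB": 1.0})},
    "C2":    {"study":    (-2, {"C3": 1.0}),
              "sleep":    ( 0, {"Sleep": 1.0})},
    "C3":    {"study":    (10, {"Sleep": 1.0}),
              "pub":      ( 1, {"C1": 0.2, "C2": 0.4, "C3": 0.4})},
    "FB":    {"facebook": (-1, {"FB": 1.0}),
              "quit":     ( 0, {"C1": 1.0})},
    "Sleep": {},  # terminal
}

def value_iteration(mdp, gamma=1.0, tol=1e-8):
    """Iterate the Bellman optimality backup:
    v(s) <- max_a ( R_s^a + gamma * sum_s' P_ss'^a v(s') )."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:              # terminal state keeps value 0
                continue
            new = max(r + gamma * sum(p * v[s2] for s2, p in trans.items())
                      for r, trans in actions.values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v

v_star = value_iteration(mdp)
print(v_star)  # expected: C1 -> 6, C2 -> 8, C3 -> 10, FB -> 6, Sleep -> 0
```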

  29. Optimal Value Function for Student MDP

  30. Optimal Action-Value Function for Student MDP

  31. References • David Silver, Lecture 2: Markov Decision Processes, Reinforcement Learning (https://www.youtube.com/watch?v=lfHX2hHRMVQ&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ&index=2) • Chapter 3, Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,” 2nd edition, Nov. 2018
