
EECS 3401 AI and Logic Prog. Lecture 20


  1. EECS 3401: AI and Logic Prog., Lecture 20. Adapted from official slides for Russell & Norvig, 3rd ed. (Ch. 17). Vitaliy Batusov, vbatusov@cse.yorku.ca, York University, November 30, 2020.

  2. Today: Sequential Decision-Making
     - Required reading: Russell & Norvig, Ch. 17.1–17.3

  3. Context
     - Covered to date: Search; Belief Networks
     - Today: Markov Decision Processes

  4. Basic Idea behind MDP
     - Goal: decision making under uncertainty, with a notion of utility
     - Random variables describe the world (as in Belief Networks)
     - But now the world is again dynamic
     - Transition model: specifies the probability distribution over the latest state variables, given the previous values
     - Markov assumption: the current state depends on only a finite, fixed number of previous states
     - First-order Markov process: the current state depends only on the previous state
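
The transition model and the first-order Markov assumption can be sketched in a few lines of Python. The two-state weather chain below, its state names, and its probabilities are illustrative assumptions, not part of the lecture; the point is only that the next-state distribution is computed from the current one and nothing earlier.

```python
# Hypothetical two-state chain; states and numbers are illustrative
# assumptions, not taken from the lecture.
TRANSITION = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(belief):
    """Push a distribution over states one time step forward.

    First-order Markov assumption: the next-state distribution depends
    only on the current state, not on any earlier history.
    """
    nxt = {s: 0.0 for s in TRANSITION}
    for s, p in belief.items():
        for s2, p2 in TRANSITION[s].items():
            nxt[s2] += p * p2
    return nxt

belief = {"sunny": 1.0, "rainy": 0.0}
for t in range(3):
    belief = step(belief)
    print(t + 1, belief)
```

A higher-order Markov process could be handled the same way by making each "state" a tuple of the last few states.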

  5. Sequential Decision Problems
     [Diagram relating problem classes]
     - Search + explicit actions and subgoals → Planning
     - Search + uncertainty and utility → MDP
     - Planning + uncertainty and utility → Decision-theoretic Planning
     - MDP + uncertain sensing → POMDP

  6. Example MDP
     - States: s ∈ S; actions: a ∈ A
     - Transition model: T(s, a, s′) = P(s′ | s, a), the probability that a in s leads to s′
     - Reward function: R(s) = −0.04 for non-terminal states (small penalty), ±1 for terminal states
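
One way to hold these pieces in code is a small container with `T` and `R` as methods. This is a minimal sketch: the `MDP` class, its field names, and the four-state toy problem are hypothetical and are not the lecture's example world. Only the reward scheme (−0.04 for non-terminal states, ±1 for terminal states) follows the slide.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """Container for the pieces named on the slide: S, A, T and R."""
    states: tuple
    actions: tuple
    transitions: dict            # (s, a) -> {s2: P(s2 | s, a)}
    terminal_rewards: dict       # terminal state -> +1 or -1
    step_reward: float = -0.04   # R(s) for every non-terminal state

    def T(self, s, a, s2):
        """Probability that doing a in s leads to s2 (0 if not listed)."""
        return self.transitions.get((s, a), {}).get(s2, 0.0)

    def R(self, s):
        """Reward for being in state s."""
        return self.terminal_rewards.get(s, self.step_reward)

    def check(self):
        """Every listed (s, a) row must be a probability distribution."""
        return all(abs(sum(row.values()) - 1.0) < 1e-9
                   for row in self.transitions.values())

# Hypothetical toy problem (an assumption for illustration only).
toy = MDP(
    states=("start", "mid", "win", "lose"),
    actions=("safe", "risky"),
    transitions={
        ("start", "safe"): {"mid": 1.0},
        ("start", "risky"): {"win": 0.5, "lose": 0.5},
        ("mid", "safe"): {"win": 0.8, "start": 0.2},
        ("mid", "risky"): {"win": 0.6, "lose": 0.4},
    },
    terminal_rewards={"win": 1.0, "lose": -1.0},
)
print(toy.T("mid", "safe", "win"), toy.R("mid"), toy.R("lose"), toy.check())
```

Storing transitions sparsely, keyed by (state, action), keeps the table small when most outcomes have probability zero.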

  7. Solving MDPs
     - In search problems, the aim is to find an optimal sequence of actions
     - In MDPs, the aim is to find an optimal policy π(s), i.e., the best action for every possible state s
     - The optimal policy maximizes the expected sum of rewards
     - Suppose R(s) = −0.04. Optimal policy: [figure not preserved]
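
To make "a policy that maximizes the expected sum of rewards" concrete, the sketch below scores every policy of a tiny hypothetical MDP and keeps the best one. The states, actions, and probabilities are assumptions made up for illustration. Brute-force enumeration is used only because the example is tiny; it is not the algorithm the lecture goes on to present. Rewards are undiscounted, which is safe here because every policy in this toy problem eventually reaches a terminal state.

```python
from itertools import product

STEP_REWARD = -0.04                       # non-terminal reward, as on the slide
TERMINALS = {"win": 1.0, "lose": -1.0}    # terminal rewards of +1 and -1
NONTERMINALS = ("start", "mid")           # hypothetical states
ACTIONS = ("safe", "risky")               # hypothetical actions
T = {                                     # (s, a) -> {s2: P(s2 | s, a)}
    ("start", "safe"): {"mid": 1.0},
    ("start", "risky"): {"win": 0.5, "lose": 0.5},
    ("mid", "safe"): {"win": 0.8, "start": 0.2},
    ("mid", "risky"): {"win": 0.6, "lose": 0.4},
}

def R(s):
    return TERMINALS.get(s, STEP_REWARD)

def evaluate(policy, sweeps=200):
    """Expected sum of rewards from each state when following `policy`.

    Repeatedly applies U(s) = R(s) + sum_s2 T(s, policy(s), s2) * U(s2),
    with terminal utilities fixed at their rewards.
    """
    U = {s: 0.0 for s in NONTERMINALS}
    U.update(TERMINALS)
    for _ in range(sweeps):
        for s in NONTERMINALS:
            U[s] = R(s) + sum(p * U[s2] for s2, p in T[(s, policy[s])].items())
    return U

def best_policy():
    """Enumerate every policy and keep the one with the best value at 'start'."""
    candidates = (dict(zip(NONTERMINALS, choice))
                  for choice in product(ACTIONS, repeat=len(NONTERMINALS)))
    return max(candidates, key=lambda pi: evaluate(pi)["start"])

pi = best_policy()
print(pi, evaluate(pi)["start"])
```

With these made-up numbers, choosing `safe` in both states comes out ahead, with an expected sum of rewards of about 0.90 from `start`. Note that a policy assigns an action to every non-terminal state, not only to the states on one particular path.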

  8. Risk and Reward

  9. Utility of State Sequences

  10. Utility of States

  11. Utility of States

  12. Dynamic Programming

  13. Value Iteration Algorithm

  14. Convergence

  15. Policy Iteration

  16. Modified Policy Iteration

  17. Partial Observability

  18. Partial Observability
