EECS 3401 AI and Logic Prog. Lecture 20

SLIDE 1

EECS 3401 — AI and Logic Prog. — Lecture 20

Adapted from the official slides for Russell & Norvig, 3rd ed. (Ch. 17)
Vitaliy Batusov
vbatusov@cse.yorku.ca

York University

November 30, 2020

Vitaliy Batusov vbatusov@cse.yorku.ca (YorkU) EECS 3401 Lecture 20 November 30, 2020 1 / 55

SLIDE 2

Today: Sequential Decision-Making
Required reading: Russell & Norvig Ch. 17.1–17.3

SLIDE 3

Context

Covered to date: Search; Belief Networks
Today: Markov Decision Processes

SLIDE 4

Basic Idea behind MDP

Goal: decision making under uncertainty and a notion of utility
Random variables to describe the world (like in Belief Networks)
But now the world is again dynamical
Transition model: specifies the probability distribution over the latest state variables, given the previous values
Markov assumption: current state depends only on a finite fixed number of previous states
First-order Markov process: current state depends only on the last state
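The first-order Markov property can be illustrated with a small simulation. This is only a sketch: the two-state weather model, its probabilities, and the names `TRANSITIONS`, `step`, and `simulate` are hypothetical and do not come from the lecture. The point is that `step` looks only at the current state, never at earlier history.

```python
import random

# Hypothetical transition model (illustration only): P(next state | current state).
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def step(state, rng):
    """Sample the next state from the current state alone (first-order Markov)."""
    successors = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][s] for s in successors]
    return rng.choices(successors, weights=weights, k=1)[0]


def simulate(start, n_steps, rng):
    """Return a trajectory of n_steps transitions beginning at `start`."""
    trajectory = [start]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], rng))
    return trajectory


if __name__ == "__main__":
    print(simulate("sunny", 5, random.Random(0)))
```

A higher-order Markov process would need `step` to receive the last k states instead of one.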

SLIDE 5

Sequential Decision Problems

[Figure: diagram relating Search, Planning, MDP, Decision-theoretic Planning, and POMDP. Edge labels: explicit actions and subgoals; uncertainty and utility; uncertain sensing.]

SLIDE 6

Example MDP

States: s ∈ S, actions: a ∈ A
Transition model: T(s, a, s′) ≡ P(s′ | s, a), the probability that a in s leads to s′
Reward function: R(s) = −0.04 for non-terminal states (small penalty), ±1 for terminal states
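The components listed above map directly onto plain data structures. The sketch below assumes a made-up toy MDP (states `s0`, `s1`, `win`, `lose` and actions `safe`, `risky` are hypothetical, not the lecture's example); only the reward values −0.04 and ±1 are taken from the slide.

```python
from typing import Dict, Tuple

State = str
Action = str

# Hypothetical toy MDP for illustration; not the example used in the lecture.
STATES = ["s0", "s1", "win", "lose"]
ACTIONS = ["safe", "risky"]
TERMINALS: Dict[State, float] = {"win": 1.0, "lose": -1.0}

# T[(s, a)] maps each successor s' to P(s' | s, a).
T: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("s0", "safe"): {"s1": 0.9, "s0": 0.1},
    ("s0", "risky"): {"win": 0.5, "lose": 0.5},
    ("s1", "safe"): {"win": 0.8, "s1": 0.2},
    ("s1", "risky"): {"win": 0.7, "lose": 0.3},
}


def reward(s: State) -> float:
    """R(s): +1 or -1 in terminal states, -0.04 everywhere else."""
    return TERMINALS.get(s, -0.04)


def transition_prob(s: State, a: Action, s_next: State) -> float:
    """T(s, a, s') = P(s' | s, a); successors not listed have probability 0."""
    return T[(s, a)].get(s_next, 0.0)
```

For every non-terminal state and action, the probabilities over successors must sum to 1, which is a useful sanity check when writing a transition model by hand.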

SLIDE 7

Solving MDPs

In search problems, the aim is to find an optimal sequence of actions
In MDPs, the aim is to find an optimal policy π(s)
  I.e., the best action for every possible state s
The optimal policy maximizes the expected sum of rewards
Suppose R(s) = −0.04.
[Figure: optimal policy for R(s) = −0.04]
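To make "a policy that maximizes the expected sum of rewards" concrete, the sketch below evaluates every policy of a tiny MDP and keeps the best one. Everything here is an assumption for illustration: the toy MDP, the names `evaluate` and `best_policy`, and the brute-force enumeration itself, which is feasible only for very small problems and is not the method the lecture develops. Rewards are summed without discounting, which is safe here because every policy in this toy model reaches a terminal state with probability 1.

```python
from itertools import product

# Hypothetical toy MDP (illustration only).
TERMINALS = {"win": 1.0, "lose": -1.0}
NON_TERMINALS = ["s0", "s1"]
ACTIONS = ["safe", "risky"]
T = {
    ("s0", "safe"): {"s1": 0.9, "s0": 0.1},
    ("s0", "risky"): {"win": 0.5, "lose": 0.5},
    ("s1", "safe"): {"win": 0.8, "s1": 0.2},
    ("s1", "risky"): {"win": 0.7, "lose": 0.3},
}


def reward(s):
    """R(s): terminal reward, or -0.04 for non-terminal states."""
    return TERMINALS.get(s, -0.04)


def evaluate(policy, sweeps=200):
    """Expected sum of rewards from each state when following `policy`.

    Repeatedly applies U(s) = R(s) + sum_s' P(s' | s, policy[s]) * U(s').
    """
    U = {s: 0.0 for s in NON_TERMINALS}
    U.update(TERMINALS)  # a terminal state's utility is just its reward
    for _ in range(sweeps):
        for s in NON_TERMINALS:
            successors = T[(s, policy[s])]
            U[s] = reward(s) + sum(p * U[s2] for s2, p in successors.items())
    return U


def best_policy(start="s0"):
    """Enumerate all policies and return the one with the highest utility at `start`."""
    policies = [
        dict(zip(NON_TERMINALS, choice))
        for choice in product(ACTIONS, repeat=len(NON_TERMINALS))
    ]
    return max(policies, key=lambda pi: evaluate(pi)[start])


if __name__ == "__main__":
    pi = best_policy()
    print(pi, evaluate(pi))
```

Note that the result is a mapping from every non-terminal state to an action, not a single action sequence, which is the distinction the slide draws between search and MDPs.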

SLIDE 8

Risk and Reward

SLIDE 9

Utility of State Sequences

SLIDE 10

Utility of States

SLIDE 11

Utility of States

SLIDE 12

Dynamic Programming

SLIDE 13

Value Iteration Algorithm

SLIDE 14

Convergence

SLIDE 15

Policy Iteration

SLIDE 16

Modified Policy Iteration

SLIDE 17

Partial Observability

SLIDE 18

Partial Observability
