Announcements
- Homework 4: MDPs (lead TA: Iris)
- Due Mon 7 Oct at 11:59pm
- Project 2: Multi-Agent Search (lead TA: Zhaoqing)
- Due Thu 10 Oct at 11:59pm
- Office Hours
- Iris: Mon 10.00am-noon, RI 237
- JW: Tue 1.40pm-2.40pm, DG 111
- Zhaoqing: Thu 9.00am-11.00am, HS 202
- Eli: Fri 10.00am-noon, RY 207
CS 4100: Artificial Intelligence
Markov Decision Processes II
Jan-Willem van de Meent, Northeastern University
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Example: Grid World
- A maze-like problem
- The agent lives in a grid
- Walls block the agent’s path
- Noisy movement: actions do not always go as planned
- 80% of the time, the action North takes the agent North (if there is no wall there)
- 10% of the time, North takes the agent West; 10% East
- If there is a wall in the direction the agent would have moved, the agent stays put
- The agent receives rewards each time step
- Small “living” reward each step (can be negative)
- Big rewards come at the end (good or bad)
- Goal: maximize sum of rewards
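To make the noisy movement concrete, here is a minimal Python sketch of the transition model T(s,a,s’) for a grid world like the one above. It assumes states are (x, y) cells with y increasing northward, that walls are given as a set of blocked cells, and that the grid edge behaves like a wall. The slide only spells out the noise for North; the sketch assumes the same 80/10/10 split for every action, slipping to the two perpendicular directions. The function name `transitions` and its parameters are hypothetical, not taken from the CS188 code.

```python
from collections import defaultdict

# Probabilities from the slide: 80% intended direction, 10% each perpendicular slip.
P_INTENDED = 0.8
P_SLIP = 0.1

# Unit moves as (dx, dy); y grows upward, so North is +1 in y.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
# Assumption: each action slips to its two perpendicular directions
# (the slide states this only for North: West 10%, East 10%).
PERPENDICULAR = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def transitions(state, action, width, height, walls):
    """Return {next_state: probability}, i.e. T(s, a, s') for each reachable s'."""
    outcomes = [(action, P_INTENDED)] + [(d, P_SLIP) for d in PERPENDICULAR[action]]
    dist = defaultdict(float)
    x, y = state
    for direction, prob in outcomes:
        dx, dy = MOVES[direction]
        nxt = (x + dx, y + dy)
        off_grid = not (0 <= nxt[0] < width and 0 <= nxt[1] < height)
        # Blocked by a wall or the grid edge: the agent stays put.
        if off_grid or nxt in walls:
            nxt = state
        # Several outcomes can land in the same cell, so probabilities accumulate.
        dist[nxt] += prob
    return dict(dist)
```

For example, on a 4×3 grid with a wall at (1, 1), `transitions((0, 0), "N", 4, 3, {(1, 1)})` gives 0.8 for (0, 1), 0.1 for (1, 0), and 0.1 for staying at (0, 0), because the West slip runs into the grid edge. Returning a dictionary keyed by next state lets outcomes that end in the same cell add up, as happens in a corner.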
Recap: MDPs
- Markov decision processes:
- Set of states S
- Start state s0
- Set of actions A
- Transitions P(s’|s,a) (or T(s,a,s’))
- Rewards R(s,a,s’) (and discount γ)
- MDP quantities so far:
- Policy = Choice of action for each state
- Ut