Reinforcement Learning Lectures 4 and 5
Gillian Hayes 18th January 2007
Gillian Hayes RL Lecture 4/5 18th January 2007
Reinforcement Learning Framework
Rewards, Returns
Environment Dynamics
Components of a Problem
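[Figure: the agent-environment loop. The AGENT (policy, value function) receives state/situation $s_t$ and reward $r_t$ from the ENVIRONMENT and sends action $a_t$; the ENVIRONMENT returns $s_{t+1}$ and $r_{t+1}$.]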
$P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$
Dynamics of Environment
$R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$
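The two quantities above, the transition probabilities $P^a_{ss'}$ and the expected rewards $R^a_{ss'}$, can be stored as lookup tables when the state and action sets are small. The following is a minimal sketch under that assumption. The state names, the numbers, and the helper names `is_valid_dynamics` and `expected_reward` are invented for illustration and do not come from the lecture.

```python
# Hypothetical two-state problem; all names and numbers are illustrative only.
# P[s][a][s2] = Pr{s_{t+1} = s2 | s_t = s, a_t = a}
P = {
    "high": {"search": {"high": 0.75, "low": 0.25}, "wait": {"high": 1.0}},
    "low": {"search": {"high": 0.5, "low": 0.5}, "wait": {"low": 1.0}},
}
# R[s][a][s2] = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s2}
R = {
    "high": {"search": {"high": 2.0, "low": 2.0}, "wait": {"high": 1.0}},
    "low": {"search": {"high": -3.0, "low": 2.0}, "wait": {"low": 1.0}},
}


def is_valid_dynamics(P, tol=1e-9):
    """Check that every (state, action) pair has next-state probabilities summing to 1."""
    return all(
        abs(sum(dist.values()) - 1.0) < tol
        for actions in P.values()
        for dist in actions.values()
    )


def expected_reward(P, R, s, a):
    """Expected immediate reward for doing a in s: sum over s2 of P * R."""
    return sum(p * R[s][a][s2] for s2, p in P[s][a].items())
```

With these tables, `expected_reward(P, R, "low", "search")` averages the reward over both possible next states, weighted by their probabilities.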
$P^a_{ss'}$, $R^a_{ss'}$
[Figure: state-transition diagram with actions "search" and "wait".]
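$$
\begin{aligned}
V^\pi(s) &= E_\pi\{R_t \mid s_t = s\} = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s\Big\} \\
&= E_\pi\Big\{r_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^k r_{t+k+2} \,\Big|\, s_t = s\Big\} \\
&= \sum_a \pi(s,a) \sum_{s'} P^a_{ss'} \Big[R^a_{ss'} + \gamma E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+2} \,\Big|\, s_{t+1} = s'\Big\}\Big] \\
&= \sum_a \pi(s,a) \sum_{s'} P^a_{ss'} \big[R^a_{ss'} + \gamma V^\pi(s')\big]
\end{aligned}
$$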
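The Bellman equation for $V^\pi$ expresses the value of each state in terms of the values of its successors, weighted by $\pi(s,a)$ and $P^a_{ss'}$. One common way to use it, sketched here as an assumption rather than as the lecture's own method, is iterative policy evaluation: apply the right-hand side repeatedly as an update until the values stop changing. The toy problem (states `A` and `B`, actions `stay` and `go`) and the function name `evaluate_policy` are hypothetical.

```python
def evaluate_policy(P, R, pi, gamma, tol=1e-10):
    """Sweep the Bellman equation for V^pi as an update rule until convergence."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # sum_a pi(s,a) sum_s2 P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
            v_new = sum(
                pi[s][a]
                * sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a].items())
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


# Hypothetical toy problem, invented for illustration only.
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}},
}
R = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 0.0}},
    "B": {"stay": {"B": 2.0}},
}
# pi[s][a] = probability of choosing a in s
pi = {"A": {"stay": 0.5, "go": 0.5}, "B": {"stay": 1.0}}

V = evaluate_policy(P, R, pi, gamma=0.5)
```

For this toy problem the equations can also be solved by hand: $V^\pi(B) = 2 + 0.5\,V^\pi(B)$ gives $V^\pi(B) = 4$, and $V^\pi(A) = 0.5\,(1 + 0.5\,V^\pi(A)) + 0.5\,(0 + 0.5 \cdot 4)$ gives $V^\pi(A) = 2$, which the iteration approaches.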
$\cdots\, P^a_{ss'} \big[ R^a_{ss'} + \gamma \cdots \big]$
$V^*(s) = \max_\pi V^\pi(s)$
$Q^*(s,a) = \max_\pi Q^\pi(s,a)$
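$$
\begin{aligned}
V^*(s) &= \max_a E\{r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a\} \\
&= \max_a \sum_{s'} P^a_{ss'} \big[R^a_{ss'} + \gamma V^*(s')\big]
\end{aligned}
$$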
Bellman Optimality Equations 1
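$$
\begin{aligned}
Q^*(s,a) &= E\{r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a\} \\
&= \sum_{s'} P^a_{ss'} \big[R^a_{ss'} + \gamma \max_{a'} Q^*(s', a')\big]
\end{aligned}
$$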
If $R^a_{ss'}$ and $P^a_{ss'}$ are known, then we can solve the equations for $V^*$ (or $Q^*$).
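When the dynamics are known, one standard way to solve the Bellman optimality equation numerically is value iteration: use $V(s) \leftarrow \max_a \sum_{s'} P^a_{ss'}[R^a_{ss'} + \gamma V(s')]$ as an update and repeat until the values settle. The slides do not name a solution method at this point, so this choice is an assumption. The toy problem and the names `backup`, `value_iteration`, and `greedy_policy` are hypothetical.

```python
def backup(P, R, V, gamma, s, a):
    """One-step lookahead: sum_s2 P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])."""
    return sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a].items())


def value_iteration(P, R, gamma, tol=1e-10):
    """Apply the Bellman optimality equation as an update until convergence."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = max(backup(P, R, V, gamma, s, a) for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def greedy_policy(P, R, V, gamma):
    """Pick, in each state, the action with the largest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: backup(P, R, V, gamma, s, a)) for s in P}


# Hypothetical toy problem, invented for illustration only.
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}},
}
R = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 0.0}},
    "B": {"stay": {"B": 3.0}},
}

V_star = value_iteration(P, R, gamma=0.5)
policy = greedy_policy(P, R, V_star, gamma=0.5)
```

Here $V^*(B) = 3 / (1 - 0.5) = 6$, so going from `A` to `B` is worth $0 + 0.5 \cdot 6 = 3$, which beats staying in `A` (worth $1 + 0.5 \cdot 3 = 2.5$). The greedy policy therefore chooses `go` in `A`.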
Bellman Optimality Equations 2
$P^a_{ss'}$, $R^a_{ss'}$
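$P^a_{ss'}$ → probability of going from $s$ to $s'$ if the agent does $a$
$R^a_{ss'}$ → expected reward from doing $a$ in $s$ and reaching $s'$
$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$ → discounted return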
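The discounted sum of future rewards can be computed for a finite reward sequence by working backwards from the last reward, since each partial sum equals the current reward plus $\gamma$ times the sum that follows it. This is a small sketch. The function name `discounted_return` is hypothetical, and a finite list only approximates the infinite sum when later rewards exist.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma^k * rewards[k] by accumulating from the end."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rewards r_{t+1}, r_{t+2}, r_{t+3} with gamma = 0.5:
# 1 + 0.5 * 0 + 0.25 * 2 = 1.5
example = discounted_return([1.0, 0.0, 2.0], 0.5)
```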
Components of an RL Problem