Reinforcement Learning
Robert Platt, Northeastern University
Some images and slides are used from:
1. CS188 UC Berkeley
2. RN, AIMA
Conception of agent
Agent acts on the World; Agent senses the World.
RL conception of agent
Agent takes actions a on the World.
Agent perceives states s and rewards r from the World.
Image: Berkeley CS188 course notes (downloaded Summer 2015)
Figure: MDP transition diagram. State shown: Overheated. Actions: Fast, Slow. Transition probabilities: 0.5, 1.0. Rewards: +1, +2.
https://www.youtube.com/watch?v=goqWX7bC-ZY
Number of times agent reached s' by taking a from s
Set of rewards obtained when reaching s' by taking a from s
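The two quantities above can be turned into an empirical model. The sketch below is one plausible reading, not the slide's own formulas: it assumes the transition estimate is the count divided by the number of times a was taken in s, and the reward estimate is the mean of the observed reward set. The function name `estimate_model` and the experience tuples are made up for illustration.

```python
from collections import defaultdict

def estimate_model(experience):
    """Estimate T(s, a, s') and R(s, a, s') from (s, a, s_next, r) tuples.

    Assumption: T is a normalized count and R is an average of observed rewards.
    """
    counts = defaultdict(int)    # (s, a, s') -> times s' was reached by taking a from s
    totals = defaultdict(int)    # (s, a)     -> times a was taken from s
    rewards = defaultdict(list)  # (s, a, s') -> rewards obtained on that transition
    for s, a, s_next, r in experience:
        counts[(s, a, s_next)] += 1
        totals[(s, a)] += 1
        rewards[(s, a, s_next)].append(r)
    T_hat = {k: n / totals[(k[0], k[1])] for k, n in counts.items()}
    R_hat = {k: sum(rs) / len(rs) for k, rs in rewards.items()}
    return T_hat, R_hat

# Hypothetical experience, invented for this example only.
experience = [
    ("B", "east", "C", -1),
    ("B", "east", "C", -1),
    ("B", "east", "D", -3),
    ("C", "east", "D", -1),
]
T_hat, R_hat = estimate_model(experience)
```

With this data, taking east from B reached C in two of three tries, so the estimated transition probability for that triple is about 2/3.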
Goal: Compute expected age of students in this class
Without P(A), instead collect samples [a1, a2, … aN]
Unknown P(A): “Model Based”. Estimate P(A) from the sample counts, then take the expectation under that estimate. Why does this work? Because eventually you learn the right model.
Unknown P(A): “Model Free”. Average the samples directly. Why does this work? Because samples appear with the right frequencies.
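A small sketch of the two estimators for the expected-age example. The ages are invented, and the function names are hypothetical. The model-based version first builds an empirical distribution and then takes its expectation; the model-free version averages the samples directly.

```python
from collections import Counter

def expected_age_model_based(samples):
    """Estimate P(A) from counts, then compute the expectation under that estimate."""
    n = len(samples)
    p_hat = {age: count / n for age, count in Counter(samples).items()}
    return sum(p * age for age, p in p_hat.items())

def expected_age_model_free(samples):
    """Skip the model: average the samples directly."""
    return sum(samples) / len(samples)

# Hypothetical sample of student ages, invented for this example.
ages = [20, 22, 22, 25, 21]
model_based = expected_age_model_based(ages)
model_free = expected_age_model_free(ages)
```

On the same samples both estimators agree up to floating-point rounding, because weighting each distinct age by its relative frequency is the same sum as the plain average.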
Slide: Berkeley CS188 course notes (downloaded Summer 2015)
Figure: from state s, following π(s), sampled successor states s1', s2', s3'.
Assume: γ = 1, α = 1/2
Observed transitions (state, action, next state, observed reward):
B, east, C, -2
C, east, D, -2
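The two observed transitions can be replayed in code. This is a sketch under stated assumptions: the slides give only γ, α, and the transitions, so the update rule used here, a temporal-difference style running average V(s) ← (1 − α)V(s) + α(r + γV(s')), is assumed. The initial values, including V(D) = 8, are hypothetical and chosen only to make the arithmetic visible.

```python
def td_update(V, s, s_next, r, alpha=0.5, gamma=1.0):
    """Move V[s] toward the one-step sample r + gamma * V[s_next]."""
    sample = r + gamma * V[s_next]
    V[s] = (1 - alpha) * V[s] + alpha * sample
    return V[s]

# Hypothetical initial value estimates (not given in the source text).
V = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 8.0, "E": 0.0}

td_update(V, "B", "C", -2)  # sample = -2 + 0 = -2, so V["B"] = 0.5*0 + 0.5*(-2) = -1.0
td_update(V, "C", "D", -2)  # sample = -2 + 8 = 6,  so V["C"] = 0.5*0 + 0.5*6   = 3.0
```

With α = 1/2 each update lands halfway between the old estimate and the new sample, which is why a single observation moves V(C) from 0 to 3 rather than all the way to 6.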
Figure panels: Exact Q's; Approximate Q's
Gradient descent
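As a minimal illustration of gradient descent, the sketch below fits a single weight w so that w·x approximates y by repeatedly stepping against the gradient of the mean squared error. The one-parameter model, the learning rate, the step count, and the data are all assumptions made for this example and are not taken from the slides.

```python
def gradient_descent(xs, ys, lr=0.05, steps=200):
    """Fit y ≈ w * x by minimizing (1 / 2n) * sum((w * x - y) ** 2)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the loss with respect to w.
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step in the direction that decreases the loss.
        w -= lr * grad
    return w

# Hypothetical data lying exactly on the line y = 2x.
w = gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the data lie exactly on a line through the origin, the learned weight converges toward 2; with noisy data it would instead settle at the least-squares slope.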