Lecture 12: Batch RL
Emma Brunskill
CS234 Reinforcement Learning.
Winter 2018 Slides drawn from Philip Thomas with modifications
Emma Brunskill (CS234 Reinforcement Learning. )Lecture 12: Batch RL Winter 2018 Slides drawn from Philip Thomas / 68
$$\frac{1}{n}\sum_{i=1}^{n}\Big(w_i \sum_{t=1}^{L} \gamma^t R_t^i\Big)$$ in our case.
$$\frac{1}{\sum_{i=1}^{n} w_i}\sum_{i=1}^{n} w_i \sum_{t=1}^{L} \gamma^t R_t^i$$
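Normalizing by the sum of the weights rather than by n gives the weighted form of the importance sampling estimate. The following is a minimal sketch under stated assumptions, not the lecture's own code: the function name `wis_estimate` and the inputs `weights` (one importance weight w_i per trajectory) and `returns` (one discounted return per trajectory) are hypothetical names chosen for illustration.

```python
def wis_estimate(weights, returns):
    """Weighted importance sampling: sum_i w_i * G_i / sum_i w_i.

    weights: per-trajectory importance weights w_i (assumed precomputed).
    returns: per-trajectory discounted returns G_i = sum_t gamma^t R_t^i.
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # No trajectory has support under the evaluation policy.
        raise ValueError("sum of importance weights is zero")
    weighted_sum = sum(w * g for w, g in zip(weights, returns))
    return weighted_sum / total_weight
```

For example, `wis_estimate([3.0, 1.0], [1.0, 5.0])` averages the two returns with relative weights 3 and 1.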
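$$\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{\infty}\Big[\gamma^t \rho_t^i \big(R_t^i - \hat{q}(S_t^i, A_t^i)\big) + \gamma^t \rho_{t-1}^i \hat{v}(S_t^i)\Big],$$
where
$$\rho_t^i = \prod_{\tau=1}^{t} \frac{\pi_e(a_\tau \mid s_\tau)}{\pi_b(a_\tau \mid s_\tau)}.$$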
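A doubly robust estimate of this kind combines per-step importance ratios with approximate value functions. The sketch below is illustrative only and rests on several assumptions: the trajectory layout `(state, action, reward, pi_e_prob, pi_b_prob)`, the callables `q_hat(s, a)` and `v_hat(s)`, and the function name are all hypothetical; time is indexed from t = 1 with the empty product set to 1, and the infinite sum is truncated at each trajectory's length.

```python
def doubly_robust_estimate(trajectories, gamma, q_hat, v_hat):
    """Average over trajectories of
    sum_t gamma^t * [rho_t * (R_t - q_hat(S_t, A_t)) + rho_{t-1} * v_hat(S_t)].

    trajectories: list of lists of (state, action, reward, pi_e_prob, pi_b_prob).
    q_hat, v_hat: approximate action-value and state-value functions (assumed given).
    """
    total = 0.0
    for trajectory in trajectories:
        rho_prev = 1.0  # rho_0: empty product of ratios
        for t, (s, a, r, pe, pb) in enumerate(trajectory, start=1):
            rho = rho_prev * pe / pb  # cumulative ratio up to step t
            total += gamma ** t * (rho * (r - q_hat(s, a)) + rho_prev * v_hat(s))
            rho_prev = rho
    return total / len(trajectories)
```

With `q_hat` and `v_hat` both returning zero, the sketch reduces to a per-step importance-weighted sum of discounted rewards, which is a convenient sanity check.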
$$\frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=1}^{L} \frac{\pi_e(a_t \mid s_t)}{\pi_b(a_t \mid s_t)}\right)\left(\sum_{t=1}^{L} \gamma^t R_t^i\right)$$
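The importance sampling estimate multiplies each trajectory's discounted return by the product of its per-step probability ratios, then averages over the n trajectories. A minimal sketch, assuming each trajectory is stored as a tuple `(pi_e_probs, pi_b_probs, rewards)` of equal-length lists; that layout and the function names are hypothetical, chosen only for illustration.

```python
from math import prod


def discounted_return(rewards, gamma):
    # sum_{t=1}^{L} gamma^t * R_t, using the t = 1 indexing of the formula
    return sum(gamma ** t * r for t, r in enumerate(rewards, start=1))


def importance_weight(pi_e_probs, pi_b_probs):
    # prod_{t=1}^{L} pi_e(a_t | s_t) / pi_b(a_t | s_t)
    return prod(pe / pb for pe, pb in zip(pi_e_probs, pi_b_probs))


def is_estimate(trajectories, gamma):
    """Average of (importance weight) * (discounted return) over trajectories."""
    total = 0.0
    for pi_e_probs, pi_b_probs, rewards in trajectories:
        weight = importance_weight(pi_e_probs, pi_b_probs)
        total += weight * discounted_return(rewards, gamma)
    return total / len(trajectories)
```

When the evaluation and behavior policies assign the same probabilities, every weight is 1 and the estimate is the plain average of the discounted returns.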
$$\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2}$$
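The sample mean and the sample standard deviation with an n−1 denominator can be computed directly from the observations X_1, ..., X_n. The sketch below is a minimal illustration; treating each X_i as one trajectory's importance-weighted return is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def sample_mean(xs):
    # (1/n) * sum_i X_i
    return sum(xs) / len(xs)


def sample_std(xs):
    # sqrt( (1/(n-1)) * sum_i (X_i - mean)^2 ); needs at least two samples
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sample_mean(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
```

The standard library's `statistics.stdev` uses the same n−1 denominator and can serve as a cross-check.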
[Figure: Expected Normalized Return for n = 10000, 30000, 60000, 100000. Series: None, CUT; None, BCa; k-Fold, CUT; k-Fold, BCa. Axis values shown: 0.002715, 0.003832.]
[Figure: Blood Glucose (sugar) diagram. Labels: Eat Carbohydrates, Release Insulin, Hypoglycemia, Hyperglycemia.]
[Figure: two panels, "Probability Policy Changed" and "Probability Policy Worse".]