Lecture 15: Batch RL
Emma Brunskill
CS234 Reinforcement Learning.
Winter 2019. Slides drawn from Philip Thomas, with modifications.
Emma Brunskill (CS234 Reinforcement Learning. )Lecture 15: Batch RL Winter 2019 Slides drawn from Philip Thomas / 72
$$\frac{1}{n}\sum_{i=1}^{n}\Big(w_i \sum_{t=1}^{L} \gamma^t R_t^i\Big)$$ in our case.
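$$\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{\infty} \gamma^t \rho_t^i \big(R_t^i - \hat{q}(S_t^i, A_t^i)\big) + \gamma^t \rho_{t-1}^i \hat{v}(S_t^i),$$
where $$\rho_t^i = \prod_{\tau=1}^{t} \frac{\pi_e(a_\tau \mid s_\tau)}{\pi_b(a_\tau \mid s_\tau)}$$ and $\rho_0^i = 1$.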
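The doubly robust form above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function name `doubly_robust_estimate`, the trajectory format (a list of `(state, action, reward)` tuples), the policy signature `pi(a, s)`, and the toy one-state problem are all assumptions made for the example. Time is indexed from t = 1, with the weight before the first step taken to be 1.

```python
def doubly_robust_estimate(trajectories, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Average over trajectories of
    sum_t gamma^t * (rho_t * (R_t - q_hat(S_t, A_t)) + rho_{t-1} * v_hat(S_t)).

    Hypothetical helper: pi_e(a, s) and pi_b(a, s) return action probabilities;
    q_hat(s, a) and v_hat(s) are approximate value models.
    """
    total = 0.0
    for traj in trajectories:
        rho_prev = 1.0  # rho_0: empty product before the first step (assumed convention)
        for t, (s, a, r) in enumerate(traj, start=1):
            rho = rho_prev * pi_e(a, s) / pi_b(a, s)
            total += gamma ** t * (rho * (r - q_hat(s, a)) + rho_prev * v_hat(s))
            rho_prev = rho
    return total / len(trajectories)


# Toy one-state, two-action problem (made up): reward equals the action taken.
pi_b = lambda a, s: 0.5                     # behavior policy: uniform
pi_e = lambda a, s: 0.8 if a == 1 else 0.2  # evaluation policy
q_hat = lambda s, a: float(a)               # model happens to be exact here
v_hat = lambda s: 0.8                       # expected reward under pi_e

data = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
dr_value = doubly_robust_estimate(data, pi_e, pi_b, q_hat, v_hat)

# With q_hat = v_hat = 0 the same function reduces to per-decision importance sampling.
zero_q = lambda s, a: 0.0
zero_v = lambda s: 0.0
pdis_value = doubly_robust_estimate(data, pi_e, pi_b, zero_q, zero_v)
```

In this toy case the model is exact, so every trajectory contributes the same value and the importance weights cancel out of the correction term; with a poor model, the weighted correction term carries the estimate instead.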
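$$\frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=1}^{L} \frac{\pi_e(a_t \mid s_t)}{\pi_b(a_t \mid s_t)}\right)\left(\sum_{t=1}^{L} \gamma^t R_t^i\right)$$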
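A minimal sketch of the ordinary importance sampling estimate, averaging each trajectory's discounted return weighted by its product of policy ratios. The function name `importance_sampling_estimate`, the trajectory format (a list of `(state, action, reward)` tuples), the policy signature `pi(a, s)`, and the toy data are assumptions made for illustration, not part of the lecture.

```python
from math import prod


def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Mean over trajectories of (importance weight) * (discounted return).

    Hypothetical helper: pi_e(a, s) and pi_b(a, s) return action probabilities.
    """
    total = 0.0
    for traj in trajectories:
        # Product over the trajectory of pi_e / pi_b for the actions actually taken.
        weight = prod(pi_e(a, s) / pi_b(a, s) for s, a, _ in traj)
        # Discounted return, indexing time from t = 1 as in the formula above.
        ret = sum(gamma ** t * r for t, (_, _, r) in enumerate(traj, start=1))
        total += weight * ret
    return total / len(trajectories)


# Toy one-state, two-action problem (made up): reward equals the action taken.
pi_b = lambda a, s: 0.5                     # behavior policy: uniform
pi_e = lambda a, s: 0.8 if a == 1 else 0.2  # evaluation policy

data = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = importance_sampling_estimate(data, pi_e, pi_b)
```

Here the two trajectories receive weights 1.6 and 0.4, so the estimate is (1.6 * 1.0 + 0.4 * 0.0) / 2 = 0.8, which matches the expected reward under the evaluation policy in this toy setup.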
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}_n\big)^2$$
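A minimal sketch of the unbiased sample variance, with the 1/(n-1) normalization, for a list of per-trajectory values X_i. The function name `sample_variance` and the data values are made up for illustration; the result is cross-checked against the standard library's `statistics.variance`, which uses the same normalization.

```python
import statistics


def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical per-trajectory estimates
manual = sample_variance(xs)
library = statistics.variance(xs)
```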
[Figure: Expected Normalized Return for n = 10000, 30000, 60000, 100000. Series: None, CUT; None, BCa; k-Fold, CUT; k-Fold, BCa.]