Lecture 14: Batch RL
Emma Brunskill
CS234 Reinforcement Learning.
Winter 2020 Slides drawn from Philip Thomas with modifications
Emma Brunskill (CS234 Reinforcement Learning. )Lecture 14: Batch RL Winter 2020 Slides drawn from Philip Thomas / 70
*Note: we only went carefully through the slides before slide 34. The remaining slides are kept for those interested.
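Doubly robust estimate:

\[
\frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{\infty}\left[\gamma^t \rho_t^i\left(R_t^i - \hat{q}(S_t^i, A_t^i)\right) + \gamma^t \rho_{t-1}^i\,\hat{v}(S_t^i)\right],
\qquad
\rho_t^i = \prod_{\tau=1}^{t}\frac{\pi_e(a_\tau \mid s_\tau)}{\pi_b(a_\tau \mid s_\tau)}
\]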
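A minimal Python sketch of a doubly robust estimate of this form may make the role of each term easier to see. The function name, the trajectory layout (lists of `(state, action, reward)` tuples), and the `pi_e(a, s)`, `pi_b(a, s)`, `q_hat(s, a)`, `v_hat(s)` callables are assumptions made for illustration; they do not come from the slides. Time is indexed from t = 1, and the importance weight before the first step is taken to be 1 (an empty product).

```python
def dr_estimate(trajectories, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Doubly robust off-policy estimate (illustrative sketch).

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e, pi_b:   hypothetical callables (action, state) -> probability.
    q_hat, v_hat: hypothetical approximate value functions for pi_e.
    """
    total = 0.0
    for traj in trajectories:
        rho_prev = 1.0  # cumulative ratio before the first step (empty product)
        for t, (s, a, r) in enumerate(traj, start=1):
            # Cumulative importance ratio up to and including step t.
            rho = rho_prev * pi_e(a, s) / pi_b(a, s)
            # Importance-weighted model error plus the model's own prediction.
            total += gamma ** t * (rho * (r - q_hat(s, a)) + rho_prev * v_hat(s))
            rho_prev = rho
    return total / len(trajectories)
```

With `q_hat` and `v_hat` identically zero, the sketch reduces to a per-step importance-weighted sum of rewards; with an accurate model, the correction term `r - q_hat(s, a)` is small and the estimate leans on `v_hat`.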
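Importance sampling estimate:

\[
\frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=1}^{L}\frac{\pi_e(a_t \mid s_t)}{\pi_b(a_t \mid s_t)}\right)\left(\sum_{t=1}^{L}\gamma^t R_t^i\right)
\]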
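The importance sampling estimate, a per-trajectory product of policy ratios times the discounted return, averaged over n trajectories, can be sketched in a few lines of Python. The function name, the trajectory layout, and the `pi_e(a, s)` / `pi_b(a, s)` callables are hypothetical choices for this example, not an interface from the lecture.

```python
from math import prod


def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance sampling estimate (illustrative sketch).

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e, pi_b:   hypothetical callables (action, state) -> probability.
    """
    total = 0.0
    for traj in trajectories:
        # Product of per-step likelihood ratios pi_e / pi_b over the trajectory.
        weight = prod(pi_e(a, s) / pi_b(a, s) for s, a, _ in traj)
        # Discounted return, with time indexed from t = 1.
        ret = sum(gamma ** t * r for t, (_, _, r) in enumerate(traj, start=1))
        total += weight * ret
    return total / len(trajectories)
```

For example, with a uniform behavior policy over two actions and an evaluation policy that always picks action 1, a trajectory that took action 0 receives weight 0 and a one-step trajectory that took action 1 receives weight 2.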
\[
\frac{1}{n}\sum_{i=1}^{n}\left(w_i \sum_{t=1}^{L}\gamma^t R_t^i\right) \quad \text{in our case.}
\]
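\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2}
\]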
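As a small illustration of the sample statistics involved, the following Python sketch computes the sample mean and the sample standard deviation with the n − 1 denominator. The function name is hypothetical, and treating each X_i as one trajectory's importance-weighted return is an assumption based on the surrounding slides.

```python
from math import sqrt


def sample_mean_and_std(xs):
    """Return the sample mean and the (n - 1)-normalized sample standard deviation."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples for an n - 1 denominator")
    mean = sum(xs) / n
    # Unbiased sample variance: squared deviations divided by n - 1.
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, sqrt(var)
```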
[Figure: expected normalized return for n = 10000, 30000, 60000, and 100000. Series: None, CUT; None, BCa; k-Fold, CUT; k-Fold, BCa.]