Lecture 12: Fast Reinforcement Learning 1
Emma Brunskill
CS234 Reinforcement Learning
Winter 2020
1 With some slides derived from David Silver
Refresh Your Understanding: Multi-armed Bandits
Select all that are
1 [...] $N_t(a)$ [...] $\log(1/\delta)$
3 [...] $N_t(a)$ [...] $\log(t/\delta)$
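The surviving fragments $N_t(a)$ and $\log(t/\delta)$ are the ingredients of an upper-confidence-bound (UCB) exploration bonus. As a minimal sketch, assuming the common form $\hat{Q}_t(a) + \sqrt{\log(t/\delta) / N_t(a)}$ (the exact expression and constants on the slide did not survive extraction, and the function name `ucb_select` is hypothetical), arm selection might look like this:

```python
import math


def ucb_select(q_hat, counts, t, delta=0.1):
    """Return the index of the arm with the largest upper confidence bound.

    q_hat  : list of empirical mean rewards, one per arm
    counts : list of pull counts N_t(a), one per arm
    t      : current time step (t >= 1)
    delta  : confidence parameter in (0, 1); an assumed default
    """
    # An arm that was never pulled has an unbounded bonus, so pull it first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Assumed bonus form: sqrt(log(t / delta) / N_t(a)).
    scores = [
        q + math.sqrt(math.log(t / delta) / n)
        for q, n in zip(q_hat, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal empirical means, the less-pulled arm gets the larger bonus and is selected, which is the behavior the bonus is meant to produce.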
1 Note: This is a made-up example. These are not the actual expected efficacies of the
$\frac{r_{\max}}{(1-\gamma)\prod_{i=1}^{T}\alpha_i}$, where $\alpha_i$ is the learning rate on step $i$
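To make the expression concrete, here is a minimal sketch that computes an optimistic initial value, assuming the fragments describe $r_{\max} / \big((1-\gamma)\prod_{i=1}^{T}\alpha_i\big)$. The product over learning rates is an assumption about the garbled formula, and the function name `optimistic_initial_value` is hypothetical.

```python
import math


def optimistic_initial_value(r_max, gamma, learning_rates):
    """Compute r_max / ((1 - gamma) * prod(alpha_i)).

    r_max          : largest possible one-step reward
    gamma          : discount factor in [0, 1)
    learning_rates : iterable of alpha_i for steps i = 1..T
                     (assumed to enter as a product)
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    # math.prod of an empty sequence is 1, which recovers r_max / (1 - gamma).
    return r_max / ((1.0 - gamma) * math.prod(learning_rates))
```

Because every learning rate is at most 1, the product can only shrink the denominator, so under this assumption the value is at least as optimistic as the usual $r_{\max}/(1-\gamma)$ initialization.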
1 Be very optimistic until confident that empirical estimates are close to
2 Be optimistic given the information we have
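Recoverable fragments of the slide's equations: $\frac{1}{1-\gamma}$, $(n_{sa}(s_t,a_t)+1)$, $n(s_t,a_t)$, $n_{sa}(s_t,a_t)$, $\beta$, and $n_{sa}(s,a)\ \forall\, s \in S,\ a \in A$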
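The fragments above ($\beta$, the visit counts $n_{sa}(s,a)$, and the value $\frac{1}{1-\gamma}$) are consistent with a model-based update that adds a count-based exploration bonus to every state-action pair. The sketch below assumes the bonus has the form $\beta / \sqrt{n_{sa}(s,a)}$ and that unvisited pairs keep their optimistic initial value; neither detail is confirmed by the surviving text, and `bonus_backup` and its arguments are hypothetical names.

```python
import math


def bonus_backup(q, r_hat, t_hat, n_sa, beta, gamma):
    """One sweep of a Bellman backup with an exploration bonus.

    q     : q[s][a], current optimistic action values
    r_hat : r_hat[s][a], empirical mean reward
    t_hat : t_hat[s][a][s2], empirical transition probabilities
    n_sa  : n_sa[s][a], visit counts
    Assumed update: R_hat + gamma * sum_s2 T_hat * max_a2 Q + beta / sqrt(n_sa).
    """
    n_states, n_actions = len(q), len(q[0])
    # Greedy state values under the current optimistic Q.
    v = [max(row) for row in q]
    new_q = [row[:] for row in q]
    for s in range(n_states):
        for a in range(n_actions):
            if n_sa[s][a] == 0:
                continue  # no data yet: keep the optimistic initial value
            expected_next = sum(
                t_hat[s][a][s2] * v[s2] for s2 in range(n_states)
            )
            bonus = beta / math.sqrt(n_sa[s][a])
            new_q[s][a] = r_hat[s][a] + gamma * expected_next + bonus
    return new_q
```

In use, the sweep would be repeated until the values stop changing, with `q` initialized to $\frac{1}{1-\gamma}$ everywhere. The bonus shrinks as a pair is visited more often, so the agent's optimism fades where it has data.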