Deep Reinforcement Learning
Shan-Hung Wu
shwu@cs.nthu.edu.tw
Department of Computer Science, National Tsing Hua University, Taiwan
Machine Learning
Shan-Hung Wu (CS, NTHU) Deep Reinforcement Learning Machine Learning 1 / 57
Outline
$\ldots\max_{a'} f_{Q^*}(s',a';\Theta)-f_{Q^*}(s,a;\Theta)$
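The fragment above is the temporal-difference (TD) error that Q-learning with a function approximator $f_{Q^*}(s,a;\Theta)$ tries to drive to zero. The sketch below shows one SGD step. It assumes the usual target $R + \gamma\max_{a'} f_{Q^*}(s',a';\Theta)$; the reward $R$ and discount $\gamma$ are not visible in the fragment. It also assumes a linear approximator over one-hot (state, action) features in place of the DNN. All function names are hypothetical.

```python
# Minimal sketch (assumption): f_Q(s, a; theta) = theta[s][a], i.e. a linear
# approximator over one-hot (state, action) features, standing in for the DNN.

def f_q(theta, s, a):
    """Approximate Q*(s, a) with parameters theta."""
    return theta[s][a]

def td_error(theta, s, a, r, s_next, gamma, terminal=False):
    """Assumed target r + gamma * max_a' f_Q(s', a'; theta), minus f_Q(s, a; theta)."""
    target = r if terminal else r + gamma * max(theta[s_next])
    return target - f_q(theta, s, a)

def sgd_step(theta, s, a, r, s_next, gamma, lr, terminal=False):
    """Semi-gradient descent on 0.5 * delta**2, treating the target as a constant."""
    delta = td_error(theta, s, a, r, s_next, gamma, terminal)
    # With one-hot features, d f_Q / d theta[s][a] = 1 and all other partials are 0
    theta[s][a] += lr * delta
    return delta

theta = [[0.0, 0.0], [1.0, 2.0]]  # 2 states x 2 actions
delta = sgd_step(theta, s=0, a=1, r=1.0, s_next=1, gamma=0.5, lr=0.5)
print(delta, theta[0][1])  # 2.0 1.0
```

Treating the target as a constant gives a "semi-gradient" update: only $f_{Q^*}(s,a;\Theta)$ is differentiated, not the bootstrapped term.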
$\ldots\sum_{i}\ldots\max_{a'} f_{Q^*}(s^{(i+1)},a';\Theta)-f_{Q^*}(s^{(i)},a^{(i)};\Theta)$
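The indexed version sums the TD error over stored transitions rather than using only the latest one. The sketch below assumes those transitions come from a fixed-capacity replay memory sampled uniformly at random. Each transition stores its next state `s_next` explicitly; it plays the role of $s^{(i+1)}$. A plain table `q[s][a]` stands in for $f_{Q^*}(s,a;\Theta)$. The class and function names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO memory of transitions (s, a, r, s_next, done)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next, done):
        self.memory.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement; mixing old and new transitions
        # breaks the correlation between consecutive steps in a minibatch
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def minibatch_loss(q, batch, gamma):
    """Mean squared TD error over a minibatch; q[s][a] stands in for f_Q(s, a; Theta)."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(q[s_next])
        total += (target - q[s][a]) ** 2
    return total / len(batch)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t % 2, 0, 1.0, (t + 1) % 2, False)
q = [[0.0, 0.0], [0.0, 0.0]]
batch = buf.sample(2, rng=random.Random(0))
print(len(buf), minibatch_loss(q, batch, gamma=0.5))  # 3 1.0
```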
$\ldots\sum_{i}\ldots\max_{a'} f_{Q^*}(s^{(i+1)},a';\Theta^-)-f_{Q^*}(s^{(i)},a^{(i)};\Theta)$
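Here the bootstrap term uses a separate parameter copy $\Theta^-$, while SGD updates only $\Theta$. A common choice is to refresh $\Theta^-$ from $\Theta$ at fixed intervals. The sketch below assumes that policy: a hard copy every `sync_every` updates, where `sync_every` is a hypothetical parameter name. As a stand-in for the network, it uses a table `q` for $\Theta$ and a table `q_target` for $\Theta^-$.

```python
import copy

def dqn_target(q_target, r, s_next, gamma, done):
    """Bootstrap target r + gamma * max_a' f_Q(s', a'; Theta^-) from the frozen copy."""
    return r if done else r + gamma * max(q_target[s_next])

def train(transitions, n_states, n_actions, gamma=0.5, lr=0.5, sync_every=2):
    """Tabular stand-in for DQN: q plays Theta, q_target plays Theta^-."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_target = copy.deepcopy(q)
    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        delta = dqn_target(q_target, r, s_next, gamma, done) - q[s][a]
        q[s][a] += lr * delta                  # only Theta is updated by SGD
        if step % sync_every == 0:
            q_target = copy.deepcopy(q)        # periodic hard copy Theta^- <- Theta
    return q, q_target

transitions = [(0, 0, 1.0, 1, False), (1, 0, 2.0, 0, True), (0, 0, 1.0, 1, False)]
q, q_target = train(transitions, n_states=2, n_actions=2)
print(q, q_target)  # [[1.0, 0.0], [1.0, 0.0]] [[0.5, 0.0], [1.0, 0.0]]
```

Between syncs the regression target stays fixed, so it does not shift with every gradient step on $\Theta$.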
$\ldots\sum_{i}\ldots\max_{a'} f_{Q^*}(s^{(i+1)},a';\Theta^-)-f_{Q^*}(s^{(i)},a^{(i)};\Theta)$
$\ldots\sum_{i}\ldots f_{Q^*}\big(s^{(i+1)},\arg\max_{a'} f_{Q^*}(s^{(i+1)},a';\Theta);\Theta^-\big)-f_{Q^*}(s^{(i)},a^{(i)};\Theta)$
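The two fragments differ only in how the next action is chosen. In the first, $\Theta^-$ both selects and evaluates $a'$. In the second, $\Theta$ selects $a'$ and $\Theta^-$ evaluates it. The sketch below computes both targets. It assumes a reward $R$ and discount $\gamma$, which the fragments do not show. Python lists stand in for the action values at $s^{(i+1)}$ under each parameter set.

```python
def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda j: values[j])

def dqn_target(r, gamma, q_next_target):
    # max_a' f_Q(s', a'; Theta^-): the target network both selects and evaluates a'
    return r + gamma * max(q_next_target)

def double_dqn_target(r, gamma, q_next_online, q_next_target):
    # argmax_a' f_Q(s', a'; Theta) selects a'; Theta^- then evaluates that action
    a_star = argmax(q_next_online)
    return r + gamma * q_next_target[a_star]

q_online = [1.0, 3.0, 2.0]   # f_Q(s', . ; Theta) at the next state
q_target = [4.0, 0.5, 2.0]   # f_Q(s', . ; Theta^-) at the next state
print(dqn_target(1.0, 0.5, q_target))                   # 3.0
print(double_dqn_target(1.0, 0.5, q_online, q_target))  # 1.25
```

When $\Theta^-$ both selects and evaluates the action, the max favors actions whose values are overestimated by noise. Decoupling selection from evaluation is the mechanism Double DQN uses to reduce that overestimation bias.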
$\ldots P(\tau;\Phi)\,R(\tau)$
$\ldots\sum_{t'=t}\gamma^{t'}R^{(t')}$
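The factor $\sum_{t'=t}\gamma^{t'}R^{(t')}$ counts only the rewards collected at or after step $t$. The sketch below computes this reward-to-go for every $t$ of one trajectory in a single backward pass. It assumes the sum runs to the last step of the episode. It keeps the absolute exponent $\gamma^{t'}$ as written, not $\gamma^{t'-t}$.

```python
def rewards_to_go(rewards, gamma):
    """G[t] = sum_{t'=t}^{T-1} gamma**t' * rewards[t'] (absolute discount exponent t')."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += (gamma ** t) * rewards[t]
        out[t] = running
    return out

print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 0.75, 0.25]
```

The backward pass reuses the running sum, so all $T$ values cost $O(T)$ rather than $O(T^2)$.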
$\ldots\sum_{t'=t}\gamma^{t'}R^{(i,t')}$
$\ldots\sum_{i,t}\ldots$
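The surviving pieces are a per-step reward-to-go $\sum_{t'=t}\gamma^{t'}R^{(i,t')}$ for trajectory $i$ and an update that sums over $i,t$. That matches a REINFORCE-style estimator. The sketch below is a minimal version under these assumptions:

* The policy is tabular softmax, $\pi(a\mid s;\Phi)\propto\exp\Phi[s][a]$, standing in for the DNN.
* The update is gradient ascent, averaged over the sampled trajectories.
* The learning rate is called `lr`, a hypothetical name.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(phi, trajectories, gamma, lr):
    """phi += lr * (1/N) * sum_{i,t} grad log pi(a_it | s_it) * G_it.
    Each trajectory is a list of (s, a, r) tuples."""
    grad = [[0.0] * len(row) for row in phi]
    for traj in trajectories:
        # Reward-to-go with the absolute discount gamma**t'
        g, running = [0.0] * len(traj), 0.0
        for t in reversed(range(len(traj))):
            running += (gamma ** t) * traj[t][2]
            g[t] = running
        for t, (s, a, _) in enumerate(traj):
            probs = softmax(phi[s])
            for b in range(len(probs)):
                # d log softmax(phi[s])[a] / d phi[s][b] = 1[b == a] - pi(b|s)
                grad[s][b] += ((1.0 if b == a else 0.0) - probs[b]) * g[t]
    n = len(trajectories)
    for s in range(len(phi)):
        for b in range(len(phi[s])):
            phi[s][b] += lr * grad[s][b] / n
    return phi

phi = [[0.0, 0.0]]                       # one state, two actions, uniform policy
trajs = [[(0, 1, 1.0)], [(0, 0, 0.0)]]   # action 1 earned reward, action 0 did not
reinforce_update(phi, trajs, gamma=1.0, lr=1.0)
print(phi)  # [[-0.25, 0.25]]
```

The update raises the log-probability of the rewarded action in proportion to its reward-to-go. An action followed by zero reward contributes nothing.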
… $\sum_{t'=t}\gamma^{t'}R^{(i,t')}$ by a DNN and take advantage of its …
$\sum_{t'=t}\gamma^{t'}R^{(i,t')}-b$ estimates $Q^{\pi}(s^{(i,t)},a^{(i,t)})-V^{\pi}(s^{(i,t)})$, the advantage of …
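Subtracting a baseline $b$ that does not depend on the action keeps the policy-gradient estimate unbiased and can reduce its variance. The sketch below uses the simplest baseline: the mean reward-to-go at each time step across the sampled trajectories. It assumes all trajectories have equal length. A learned estimate of $V^{\pi}(s^{(i,t)})$ would take this average's place.

```python
def rewards_to_go(rewards, gamma):
    """G[t] = sum_{t'=t} gamma**t' * rewards[t'] (absolute discount exponent t')."""
    out, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running += (gamma ** t) * rewards[t]
        out[t] = running
    return out

def advantages(batch_rewards, gamma):
    """Return G_{i,t} - b_t and b_t, where b_t averages G_{i,t} over trajectories i.
    Assumes every trajectory in the batch has the same length."""
    gs = [rewards_to_go(r, gamma) for r in batch_rewards]
    horizon = len(gs[0])
    baseline = [sum(g[t] for g in gs) / len(gs) for t in range(horizon)]
    return [[g[t] - baseline[t] for t in range(horizon)] for g in gs], baseline

adv, b = advantages([[1.0, 0.0], [0.0, 2.0]], gamma=0.5)
print(b, adv)  # [1.0, 0.5] [[0.0, -0.5], [0.0, 0.5]]
```

With this baseline, an action gets a positive weight only if what followed it beat the batch average at that step.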