CSCE 496/896 Lecture 7: Reinforcement Learning Stephen Scott Introduction MDPs Q Learning TD Learning DQN Atari Example Go Example
CSCE 496/896 Lecture 7: Reinforcement Learning
Stephen Scott
(Adapted from Paul Quint)
sscott@cse.unl.edu
[Figure: grid world with value labels 100, 90, 81]
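[Figure: grid world with agent R. Initial state: s1, with labeled values 100, 81, 66, 72. Next state: s2, reached by action a_right, with labeled values 100, 90, 81, 66.]

$$\hat{Q}(s_1, a_{\text{right}}) \leftarrow r + \gamma \max_{a'} \hat{Q}(s_2, a') = 0 + 0.9 \max\{66, 81, 100\} = 90$$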
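The update above can be checked numerically. The following is a minimal sketch, not the lecture's own code: the function name `q_update` and the action labels for s2 are hypothetical, and it assumes the deterministic update rule shown above with r = 0 and γ = 0.9.

```python
def q_update(reward, gamma, next_q_values):
    """Deterministic Q-learning update: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * max(next_q_values.values())


# Current estimates for the actions available in s2 (action labels are hypothetical).
q_s2 = {"a_1": 66, "a_2": 81, "a_3": 100}

# New estimate for Q(s1, a_right) with r = 0 and gamma = 0.9.
new_q = q_update(reward=0, gamma=0.9, next_q_values=q_s2)
print(round(new_q, 6))
```

Only the largest estimate in s2 (100) affects the result, so the new value is 0.9 times that estimate.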
Temporal-difference term:

$$r + \gamma \max_{a'} Q_{\theta_t}(s_{t+1}, a') - Q_{\theta_t}(s_t, a_t)$$
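The difference between Q_θt(s_{t+1}, a′) and Q_θt(s_t, a_t) above can be illustrated with a small parameterized Q function. This is a sketch under stated assumptions, not the method from the slides: it assumes the standard Q-learning target r + γ max over a′, a linear approximator, and a semi-gradient weight update. The feature vectors, learning rate `alpha`, and all function names are hypothetical.

```python
def q_value(theta, features):
    """Linear Q estimate (assumed form): dot product of weights and features."""
    return sum(w * x for w, x in zip(theta, features))


def td_error(theta, feats_sa, reward, gamma, next_feats_per_action):
    """r + gamma * max_a' Q_theta(s_next, a') - Q_theta(s, a)."""
    best_next = max(q_value(theta, f) for f in next_feats_per_action)
    return reward + gamma * best_next - q_value(theta, feats_sa)


def semi_gradient_step(theta, feats_sa, delta, alpha):
    """Move each weight along the gradient of Q_theta(s, a), scaled by the TD error."""
    return [w + alpha * delta * x for w, x in zip(theta, feats_sa)]


theta = [0.5, -0.2]                    # hypothetical weights theta_t
feats_sa = [1.0, 0.0]                  # hypothetical features of (s_t, a_t)
next_feats = [[0.0, 1.0], [1.0, 1.0]]  # features of (s_{t+1}, a') for each a'

delta = td_error(theta, feats_sa, reward=1.0, gamma=0.9,
                 next_feats_per_action=next_feats)
theta_next = semi_gradient_step(theta, feats_sa, delta, alpha=0.1)
print(delta, theta_next)
```

A positive error means the current estimate for (s_t, a_t) was too low relative to the bootstrapped target, so the weights on its active features increase.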
AlphaGo AlphaGo Zero AlphaZero
$\sum_b N_r(s, b)$