CS885 Reinforcement Learning Lecture 12: June 8, 2018
Deep Recurrent Q-Networks [GBC] Chap. 10
CS885 Spring 2018 Pascal Poupart 1 University of Waterloo
Outline
Recurrent neural networks
Long short term memory (LSTM) networks
Deep recurrent Q-networks
[Figure: LSTM diagram showing multiplicative (X) units controlled by input gates and forget gates]
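The gating in an LSTM cell can be made concrete with a single time step. This is a minimal sketch, not the lecture's notation. It assumes the standard LSTM formulation with input, forget and output gates plus a tanh candidate. The output gate, the gate names, the `(weight_on_x, weight_on_h, bias)` parameter layout and the scalar hidden size of 1 are all our illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, params: dict) -> tuple:
    """One time step of a scalar LSTM cell (hidden size 1, for illustration).

    params maps each gate name to a hypothetical (weight_on_x, weight_on_h, bias) triple.
    """
    def pre_activation(name: str) -> float:
        wx, wh, b = params[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre_activation("input"))    # input gate: how much new content to write
    f = sigmoid(pre_activation("forget"))   # forget gate: how much old memory to keep
    o = sigmoid(pre_activation("output"))   # output gate: how much memory to expose
    g = math.tanh(pre_activation("cell"))   # candidate content to write
    c = f * c_prev + i * g                  # gated (multiplicative) memory update
    h = o * math.tanh(c)                    # new hidden state / output
    return h, c


# Usage: run the cell over a short sequence, carrying (h, c) forward.
params = {
    "input": (1.0, 0.5, 0.0),
    "forget": (0.0, 0.0, 2.0),   # positive bias: the cell tends to keep its memory
    "output": (1.0, 0.0, 0.0),
    "cell": (1.0, 1.0, 0.0),
}
h, c = 0.0, 0.0
for x in [0.5, -1.0, 0.25]:
    h, c = lstm_step(x, h, c, params)
```

A forget gate close to 1 together with an input gate close to 0 copies the memory cell unchanged from one step to the next. This multiplicative path lets an LSTM retain information over long sequences.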
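Initialize weights $w$ and $\bar{w}$ at random in $[-1, 1]$
Observe current state $s$
Loop
  Execute policy for entire episode
  Add episode $(o_1, a_1, o_2, a_2, o_3, a_3, \ldots, o_T, a_T)$ to experience buffer
  Sample episode from buffer
  Initialize $h_0$
  For $t = 1$ till the end of the episode do
    $\frac{\partial \mathrm{Err}}{\partial w} = \left[ Q_w(\mathrm{RNN}_w(\hat{o}_{1..t}), \hat{a}_t) - \hat{r}_t - \gamma \max_{\hat{a}_{t+1}} Q_{\bar{w}}(\mathrm{RNN}_{\bar{w}}(\hat{o}_{1..t+1}), \hat{a}_{t+1}) \right] \frac{\partial Q_w(\mathrm{RNN}_w(\hat{o}_{1..t}), \hat{a}_t)}{\partial w}$
    Update weights: $w \leftarrow w - \alpha \frac{\partial \mathrm{Err}}{\partial w}$
    Every $c$ steps, update target: $\bar{w} \leftarrow w$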
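The bracketed term in this update is a TD error. The online Q-network scores the observation history up to step t, and the target network scores the history up to step t+1. The sketch below computes that error for every step of one sampled episode. It rests on several assumptions. The networks are hypothetical toys that do not come from the slides: a scalar tanh RNN with parameters `u` and `v`, and a linear Q head with a per-action `slope` and `bias`. Each step is stored as `(o_t, a_t, r_t)` so that the rewards are available. The last step of the episode is treated as terminal. The gradient factor would normally come from automatic differentiation and is not shown.

```python
import math

# Hypothetical toy networks (illustration only):
#   h_t = tanh(u * h_{t-1} + v * o_t)        scalar RNN
#   Q(h, a) = slope[a] * h + bias[a]         linear Q head
# params: {"u": float, "v": float, "slope": [...], "bias": [...]}


def rnn(params: dict, observations: list, h0: float = 0.0) -> float:
    """Run the RNN from h0 over an observation prefix and return the last hidden state."""
    h = h0
    for o in observations:
        h = math.tanh(params["u"] * h + params["v"] * o)
    return h


def q_values(params: dict, h: float) -> list:
    return [s * h + b for s, b in zip(params["slope"], params["bias"])]


def td_errors_full_history(episode: list, w: dict, w_bar: dict, gamma: float) -> list:
    """Bracketed error term for each step t of one sampled episode.

    episode: list of (o_t, a_t, r_t); storing r_t with each step is an assumption.
    """
    observations = [o for o, _, _ in episode]
    errors = []
    for t, (_, a, r) in enumerate(episode):
        # Q_w(RNN_w(o_1..t), a_t): the online RNN re-reads the whole prefix.
        q = q_values(w, rnn(w, observations[: t + 1]))[a]
        if t + 1 < len(episode):
            # max over a_{t+1} of Q_wbar(RNN_wbar(o_1..t+1), a_{t+1})
            h_next = rnn(w_bar, observations[: t + 2])
            target = r + gamma * max(q_values(w_bar, h_next))
        else:
            target = r  # assumption: the last step of the episode is terminal
        errors.append(q - target)
    return errors
```

The RNN re-reads $o_1, \ldots, o_t$ from scratch at every step. An episode of length T therefore costs O(T^2) RNN steps in this formulation.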
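Initialize weights $w$ and $\bar{w}$ at random in $[-1, 1]$
Observe current state $s$
Loop
  Execute policy for entire episode
  Add episode $(o_1, a_1, o_2, a_2, o_3, a_3, \ldots, o_T, a_T)$ to experience buffer
  Sample episode from buffer
  Initialize $h_0$
  For $t = 1$ till the end of the episode do
    $\frac{\partial \mathrm{Err}}{\partial w} = \left[ Q_w(\mathrm{RNN}_w(h_{t-1}, \hat{o}_t), \hat{a}_t) - \hat{r}_t - \gamma \max_{\hat{a}_{t+1}} Q_{\bar{w}}(\mathrm{RNN}_{\bar{w}}(h_{t-1}, \hat{o}_t, \hat{o}_{t+1}), \hat{a}_{t+1}) \right] \frac{\partial Q_w(\mathrm{RNN}_w(h_{t-1}, \hat{o}_t), \hat{a}_t)}{\partial w}$
    $h_t \leftarrow \mathrm{RNN}_{\bar{w}}(h_{t-1}, \hat{o}_t)$
    Update weights: $w \leftarrow w - \alpha \frac{\partial \mathrm{Err}}{\partial w}$
    Every $c$ steps, update target: $\bar{w} \leftarrow w$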
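Carrying the hidden state $h_t$ from step to step avoids re-reading the whole observation prefix. The sketch below shows this under several assumptions. The networks are hypothetical toys that do not come from the slides: a scalar tanh RNN step with parameters `u` and `v`, and a linear Q head with a per-action `slope` and `bias`. Each step is stored as `(o_t, a_t, r_t)`. The last step is terminal. Following the update, $h_t$ is advanced with the target weights. We read the target term $\mathrm{RNN}_{\bar{w}}(h_{t-1}, \hat{o}_t, \hat{o}_{t+1})$ as running the target RNN for two steps from $h_{t-1}$, first over $\hat{o}_t$ and then over $\hat{o}_{t+1}$.

```python
import math

# Hypothetical toy networks (illustration only):
#   one RNN step: h_t = tanh(u * h_{t-1} + v * o_t)
#   linear Q head: Q(h, a) = slope[a] * h + bias[a]


def rnn_step(params: dict, h_prev: float, o: float) -> float:
    return math.tanh(params["u"] * h_prev + params["v"] * o)


def q_values(params: dict, h: float) -> list:
    return [s * h + b for s, b in zip(params["slope"], params["bias"])]


def td_errors_carried_state(episode: list, w: dict, w_bar: dict, gamma: float, h0: float = 0.0) -> list:
    """Bracketed error term per step, carrying h_t instead of re-reading o_1..t.

    episode: list of (o_t, a_t, r_t). h_t is advanced with the target weights w_bar.
    """
    errors = []
    h_prev = h0
    for t, (o, a, r) in enumerate(episode):
        # Q_w(RNN_w(h_{t-1}, o_t), a_t)
        q = q_values(w, rnn_step(w, h_prev, o))[a]
        # h_t <- RNN_wbar(h_{t-1}, o_t)
        h_t = rnn_step(w_bar, h_prev, o)
        if t + 1 < len(episode):
            o_next = episode[t + 1][0]
            # second target-RNN step, over o_{t+1}, then max over a_{t+1}
            target = r + gamma * max(q_values(w_bar, rnn_step(w_bar, h_t, o_next)))
        else:
            target = r  # assumption: the last step of the episode is terminal
        errors.append(q - target)
        h_prev = h_t
    return errors
```

When $w$ and $\bar{w}$ are equal, these errors match the ones obtained by re-running the RNN over $o_1, \ldots, o_t$ at every step. Each episode, however, now costs only O(T) RNN steps.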
Flickering games (missing observations)
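Missing observations can be simulated by blanking frames at random, which forces the agent to rely on its history rather than on the current frame alone. The sketch below is a minimal version of this idea. Its assumptions are that each frame is independently replaced by zeros with probability `p`, that a frame is represented as a list of floats, and that the function name `flicker` is hypothetical. The value `p = 0.5` in the usage line is only an example.

```python
import random


def flicker(frames: list, p: float, rng: random.Random) -> list:
    """Blank each frame (replace it by zeros) independently with probability p.

    frames: list of observations, each a list of floats (hypothetical representation).
    Returns a new list and leaves the input unchanged.
    """
    out = []
    for frame in frames:
        if rng.random() < p:
            out.append([0.0] * len(frame))  # observation missing at this step
        else:
            out.append(list(frame))         # observation delivered as-is
    return out


# Usage: each frame is kept or blanked at random.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
seen = flicker(frames, p=0.5, rng=random.Random(0))
```

A feedforward Q-network that sees a blanked frame has no information about the current state. A recurrent Q-network can still carry an estimate of that state forward in its hidden state.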