

  1. CS885 Reinforcement Learning, Lecture 12: June 8, 2018. Deep Recurrent Q-Networks. Reading: [GBC] Chap. 10. University of Waterloo, CS885 Spring 2018, Pascal Poupart.

  2. Outline
     • Recurrent neural networks
       – Long short term memory (LSTM) networks
     • Deep recurrent Q-networks

  3. Partial Observability
     • Hidden Markov model
       – Initial state distribution: Pr(s_0)
       – Transition probabilities: Pr(s_{t+1} | s_t)
       – Observation probabilities: Pr(o_t | s_t)
     • Belief monitoring:
       Pr(s_t | o_{1..t}) ∝ Pr(o_t | s_t) Σ_{s_{t-1}} Pr(s_t | s_{t-1}) Pr(s_{t-1} | o_{1..t-1})
     [Figure: HMM graphical model with states s_0, s_1, s_2, s_3, s_4 and observations o_1, o_2, o_3, o_4, o_5]
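The belief-monitoring update above can be sketched directly in code. The 2-state transition and observation matrices below are illustrative assumptions, not taken from the slides; the normalization step implements the ∝ in the formula.

```python
import numpy as np

def belief_update(belief, T, O, obs):
    """One step of HMM belief monitoring:
    b_t(s) proportional to Pr(o_t | s) * sum_{s'} Pr(s | s') * b_{t-1}(s').
    belief: b_{t-1}, shape (S,)
    T[i, j] = Pr(s_t = j | s_{t-1} = i)
    O[j, k] = Pr(o_t = k | s_t = j)
    obs: index of the observation received at time t
    """
    predicted = belief @ T          # sum over previous states s_{t-1}
    unnorm = O[:, obs] * predicted  # weight by observation likelihood
    return unnorm / unnorm.sum()    # normalize (the proportionality constant)

# Toy 2-state, 2-observation model (illustrative numbers)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])
b = np.array([0.5, 0.5])            # uniform initial belief Pr(s_0)
for obs in [0, 0, 1]:
    b = belief_update(b, T, O, obs)
print(b)
```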

  4. Recurrent Neural Network (RNN)
     • In RNNs, outputs can be fed back to the network as inputs, creating a recurrent structure
     • HMMs can be simulated and generalized by RNNs
     • RNNs can be used for belief monitoring
       o_t: vector of observations; b_t: belief state
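As a sketch of the recurrence (not the lecture's exact network), a plain tanh RNN can map an observation sequence to a sequence of hidden "belief" vectors; the sizes and random weights below are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, H = 3, 4                         # observation and hidden sizes (arbitrary)
U = rng.normal(0, 0.5, (H, S))      # observation-to-hidden weights
W = rng.normal(0, 0.5, (H, H))     # hidden-to-hidden (recurrent) weights

def rnn_forward(observations):
    """Return the sequence of hidden states, playing the role of
    the belief states b_t on the slide."""
    h = np.zeros(H)
    states = []
    for o in observations:
        h = np.tanh(U @ o + W @ h)  # previous output fed back in: recurrence
        states.append(h)
    return states

obs_seq = [rng.normal(size=S) for _ in range(5)]
beliefs = rnn_forward(obs_seq)
print(len(beliefs), beliefs[-1].shape)
```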

  5. Training
     • Recurrent neural networks are trained by backpropagation on the unrolled network
       – E.g. backpropagation through time
     • Weight sharing:
       – Combine gradients of shared weights into a single gradient
     • Challenges:
       – Gradient vanishing (and explosion)
       – Long range memory
       – Prediction drift
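A minimal worked example of backpropagation through time with weight sharing, assuming a scalar linear RNN (an illustration, not the lecture's network): the shared weight appears at every step, so its gradient is the sum of the per-step contributions, and the repeated multiplication by the weight during the backward pass is exactly the source of vanishing/exploding gradients. A finite-difference estimate checks the result.

```python
# Tiny unrolled linear RNN: h_t = w * h_{t-1} + x_t, loss = 0.5 * (h_T - y)^2

def forward(w, xs):
    h = 0.0
    hs = [h]
    for x in xs:
        h = w * h + x
        hs.append(h)
    return hs

def bptt_grad(w, xs, y):
    hs = forward(w, xs)
    delta = hs[-1] - y             # dLoss/dh_T
    grad = 0.0
    for t in range(len(xs), 0, -1):
        grad += delta * hs[t - 1]  # contribution of the SHARED weight at step t
        delta *= w                 # backprop to h_{t-1}; repeated multiplication
                                   # by w vanishes if |w| < 1, explodes if |w| > 1
    return grad

w, xs, y = 0.5, [1.0, -0.5, 2.0], 1.0
g = bptt_grad(w, xs, y)

# Numerical check against central finite differences
eps = 1e-6
def loss(w_):
    return 0.5 * (forward(w_, xs)[-1] - y) ** 2
g_num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(g, g_num)
```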

  6. Long Short Term Memory (LSTM)
     • Special gated structure to control memorization and forgetting in RNNs
     • Mitigates gradient vanishing
     • Facilitates long term memory

  7. Unrolled Long Short Term Memory
     [Figure: LSTM unrolled across time steps t = 1..T. At each step, an input gate controls what enters the cell, a forget gate controls what is erased from the cell state, and an output gate controls what the hidden state h_t exposes; inputs x_t feed each step.]
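One step of the gated structure in the diagram can be sketched as follows. The concatenated-input parameterization and the weight shapes are common conventions, assumed here rather than specified in the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step with the three gates shown in the unrolled diagram."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)         # forget gate: what to erase from the cell
    i = sigmoid(Wi @ z + bi)         # input gate: what to write to the cell
    o = sigmoid(Wo @ z + bo)         # output gate: what to expose as h
    c_tilde = np.tanh(Wc @ z + bc)   # candidate cell update
    c = f * c_prev + i * c_tilde     # additive update: helps gradients survive
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
X, H = 3, 4                          # input and hidden sizes (arbitrary)
params = tuple(rng.normal(0, 0.5, (H, H + X)) for _ in range(4)) \
       + tuple(np.zeros(H) for _ in range(4))
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):                   # unroll five steps on random inputs
    h, c = lstm_step(rng.normal(size=X), h, c, params)
print(h.shape, c.shape)
```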

  8. Deep Recurrent Q-Network
     • Hausknecht and Stone (2016): Atari games
     • Transition model
       – LSTM network
     • Observation model
       – Convolutional network
     [Figure: DRQN architecture; a convolutional network processes each image frame, and an LSTM carries the recurrent state across frames]

  9. Deep Recurrent Q-Network
     Initialize weights w and w̄ at random in [−1, 1]
     Observe current state s
     Loop
       Execute policy for entire episode
       Add episode (o_1, a_1, o_2, a_2, o_3, a_3, ..., o_T, a_T) to experience buffer
       Sample episode from buffer
       Initialize h_0
       For t = 1 till the end of the episode do
         ∂Err/∂w = [Q_w(o_{1..t}, a_t) − r − γ max_{a'} Q_w̄(o_{1..t+1}, a')] ∂Q_w(o_{1..t}, a_t)/∂w
         Update weights: w ← w − α ∂Err/∂w
       Every c steps, update target: w̄ ← w

  10. Deep Recurrent Q-Network
     Initialize weights w and w̄ at random in [−1, 1]
     Observe current state s
     Loop
       Execute policy for entire episode
       Add episode (o_1, a_1, o_2, a_2, o_3, a_3, ..., o_T, a_T) to experience buffer
       Sample episode from buffer
       Initialize h_0
       For t = 1 till the end of the episode do
         ∂Err/∂w = [Q_w(h_{t−1}, o_t, a_t) − r − γ max_{a'} Q_w̄(h_t, o_{t+1}, a')] ∂Q_w(h_{t−1}, o_t, a_t)/∂w
         h_t ← LSTM_w̄(h_{t−1}, o_t)
         Update weights: w ← w − α ∂Err/∂w
       Every c steps, update target: w̄ ← w
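The loop above can be sketched under heavy simplifying assumptions: the LSTM is replaced by a plain tanh recurrence, the environment and reward are toy stand-ins, and only the output weights theta are trained with a semi-gradient TD step (so no backpropagation through time is needed). All names and sizes (OBS, H, ACTIONS, gamma, alpha, copy_every) are illustrative, not from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, H, ACTIONS = 4, 8, 2
gamma, alpha, copy_every = 0.9, 0.05, 10

W_o = rng.normal(0, 0.3, (H, OBS))   # observation -> hidden (fixed in this sketch)
W_h = rng.normal(0, 0.3, (H, H))     # hidden -> hidden (fixed in this sketch)
theta = np.zeros((ACTIONS, H))       # trained Q output weights (the "w")
theta_bar = theta.copy()             # target-network copy (the "w-bar")

def step_hidden(h, o):
    """Stand-in for the LSTM recurrence: h_t from h_{t-1} and o_t."""
    return np.tanh(W_o @ o + W_h @ h)

def run_episode(T=10):
    """Collect a toy episode of (o_t, a_t, r_t) triples with a random policy."""
    episode = []
    for _ in range(T):
        o = rng.normal(size=OBS)
        a = int(rng.integers(ACTIONS))
        r = float(o[0] > 0)              # toy reward signal
        episode.append((o, a, r))
    return episode

buffer = [run_episode() for _ in range(20)]  # experience buffer of episodes
updates = 0
for _ in range(50):
    episode = buffer[rng.integers(len(buffer))]  # sample episode from buffer
    h = np.zeros(H)                              # initialize h_0
    for t in range(len(episode) - 1):
        o, a, r = episode[t]
        o_next = episode[t + 1][0]
        h_next = step_hidden(h, o)
        # TD error with the target network, mirroring the slide's update
        target = r + gamma * np.max(theta_bar @ step_hidden(h_next, o_next))
        td = (theta[a] @ h_next) - target
        theta[a] -= alpha * td * h_next          # semi-gradient step on theta
        h = h_next
        updates += 1
        if updates % copy_every == 0:
            theta_bar = theta.copy()             # every c steps, copy target
print(theta.shape, updates)
```

Training only the output layer keeps the sketch short; the actual DRQN backpropagates the TD error through the LSTM and the convolutional network as well.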

  11. Results
     Flickering games (missing observations)
