
Neural Map: Structured Memory for Deep Reinforcement Learning



  1. CS 885 Reinforcement Learning
  Neural Map: Structured Memory for Deep Reinforcement Learning
  Emilio Parisotto and Ruslan Salakhutdinov, ICLR 2018
  Presented by Andreas Stöckel, July 6, 2018

  2–5. motivation: navigating partially observable environments (i)
  [Slide image: screenshot of “The Elder Scrolls: Arena”, 1994]

  6–10. motivation: navigating partially observable environments (ii)
  Reinforcement learning setup:
  ▶ State: partially observable s_t
  ▶ Reward: sparse reward r, i.e. only when the agent reaches the goal
  ▶ Actions: discrete a_t
  ▶ Goal: find a policy π(a | s) navigating through arbitrary environments
  Why structured memory?
  ▶ A partially observable environment mandates memory
  ▶ Long short-term memory (LSTM) units tend to forget quickly
  ▶ External memories such as the Differentiable Neural Computer (DNC) have problems with interference
  ⇒ Reduce interference by incorporating locality into the DNC

  11. overview
  I Background
  ▶ Memory Systems
  ▶ Differentiable Neural Computer
  ▶ Asynchronous Advantage Actor-Critic
  II Neural Map Network
  III Empirical Evaluation
  ▶ 2D Goal-Search Environment
  ▶ 3D Doom Environment
  IV Summary & Conclusion

  12. Part I BACKGROUND

  13. external memories for deep neural networks
  ▶ No write operator: memory networks (a convolutional network over the past M states)
  ▶ Write operator: Differentiable Neural Computer (an external memory matrix with differentiable access operations)

  14–15. differentiable neural computer (dnc): read and write
  ▶ External memory matrix M ∈ R^(W×H)
  ▶ Associative memory: associate a context with a value, c_t → v_t; given c̃_t ≈ c_t, retrieve r_t ≈ v_t
  ▶ Read operation: given a read context vector c_t^R,
      r_t = M_t^T c_t^R
  ▶ Write operation: given a write context c_t^W, erase vector e_t, and value v_t,
      M_{t+1} = M_t ∘ (1 − c_t^W e_t^T) + c_t^W v_t^T
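To make the two operations concrete, here is a minimal NumPy sketch of the read and erase/write equations above. The one-hot context vectors are an illustrative assumption; an actual DNC produces soft attention weights over all slots.

```python
# Minimal sketch of DNC-style read and write, following the slide's notation.
import numpy as np

W, H = 8, 16                  # memory: W slots, H channels per slot
M = np.zeros((W, H))

def dnc_read(M, c_r):
    """r_t = M_t^T c_t^R: blend rows of M weighted by the read context."""
    return M.T @ c_r                                   # shape (H,)

def dnc_write(M, c_w, e, v):
    """M_{t+1} = M_t ∘ (1 - c_w e^T) + c_w v^T: erase, then add the value."""
    return M * (1.0 - np.outer(c_w, e)) + np.outer(c_w, v)

# Write a value under a (one-hot, for illustration) write context ...
c_w = np.eye(W)[3]            # address slot 3
v = np.random.randn(H)        # value to store
M = dnc_write(M, c_w, e=np.ones(H), v=v)

# ... and retrieve it again with a matching read context.
r = dnc_read(M, c_r=np.eye(W)[3])
assert np.allclose(r, v)
```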

  16. asynchronous advantage actor-critic (a3c) as baseline
  ▶ REINFORCE policy gradient descent with a value function V^π(s) (actor-critic):
      ∇_π log π(a_t | s_t) (G_t − V^π(s_t)),   where G_t = Σ_{k=0}^∞ γ^k R_{t+k}
  ▶ Here: π(a | s) is a deep neural network
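A short sketch of the quantities in this update, assuming a finite rollout with the sparse reward described earlier; the value estimates below are made-up numbers for illustration.

```python
# Discounted return G_t and the advantage G_t - V(s_t) that scales the
# log-policy gradient in the A3C update above.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """G_t = sum_k gamma^k R_{t+k}, computed backwards over a rollout."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

rewards = np.array([0.0, 0.0, 0.0, 1.0])   # sparse reward at the goal
values = np.array([0.2, 0.3, 0.5, 0.8])    # critic's V(s_t) estimates
advantage = discounted_returns(rewards) - values
# The actor's loss is then -log pi(a_t|s_t) * advantage (advantage held fixed).
```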

  17. Part II NEURAL MAP NETWORK

  18. neural map: overview (i)
  [Architecture diagram: the input state s_t enters via f_NN; the C×H×W neural map M_t is summarized by f_CNN (READ) into r_t, attended over with a softmax (SMAX) to give the context c_t, and updated by f_NN (WRITE) at location (x, y) to produce M_{t+1}; the resulting vectors form o_t, which f_NN^OUT and a final softmax turn into the policy π(a | s).]

  19. neural map: overview (ii)
  Variables: time step t ∈ Z; agent location (x_t, y_t) ∈ R^2; input state s_t ∈ R^n; neural map M_t ∈ R^(C×H×W)
  Operations:
      r_t = read(M_t)
      c_t = context(M_t, s_t, r_t)
      w_{t+1}^(x_t,y_t) = write(s_t, r_t, c_t, M_t^(x_t,y_t))
      M_{t+1} = update(M_t, w_{t+1}^(x_t,y_t))
      o_t = [r_t, c_t, w_{t+1}^(x_t,y_t)]
      π(a | s_t) = SoftMax(NN(o_t))
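Structurally, the operations chain together like this in one time step; a sketch with the operation bodies passed in as functions (the following slides define them) and the map shape as an assumption.

```python
# One neural-map time step, wiring together the operations listed above.
import numpy as np

def neural_map_step(M, s, pos, read, context, write, policy_net):
    """M: neural map of shape (C, H, W); pos: agent location (x_t, y_t)."""
    x, y = pos
    r = read(M)                          # r_t = read(M_t), in R^C
    c = context(M, s, r)                 # c_t = context(M_t, s_t, r_t)
    w_new = write(s, r, c, M[:, y, x])   # w_{t+1}^(x_t,y_t)
    M_next = M.copy()                    # update: overwrite one location
    M_next[:, y, x] = w_new
    o = np.concatenate([r, c, w_new])    # o_t = [r_t, c_t, w_{t+1}]
    return M_next, policy_net(o)         # pi(a|s_t) = SoftMax(NN(o_t))
```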

  20. neural map: read
  ▶ Summarize the memory in a single “read vector”: r_t = f_CNN^READ(M_t) ∈ R^C
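The paper summarizes the map with a convolutional network; as a cheap stand-in (an assumption for illustration, not the actual architecture) one can global-average-pool over the grid:

```python
import numpy as np

def read_map(M):
    """Stand-in for f_CNN^READ: collapse the H x W grid to one R^C vector."""
    return M.mean(axis=(1, 2))   # a real implementation stacks conv layers
```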

  21–24. neural map: context
  ▶ Decode the context c_t based on the input state s_t:
      q_t = W [s_t, r_t] ∈ R^C
      a_t^(x,y) = ⟨q_t, M_t^(x,y)⟩ ∈ R
      α_t^(x,y) = exp(a_t^(x,y)) / Σ_(z,w) exp(a_t^(z,w)) ∈ R
      c_t = Σ_(x,y) α_t^(x,y) M_t^(x,y) ∈ R^C
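In code, the four equations amount to dot-product attention over all map locations; a NumPy sketch, where the projection matrix W_q is an assumed learned parameter:

```python
import numpy as np

def context_read(M, s, r, W_q):
    """Attend over all (x, y) locations of the C x H x W map M."""
    C, H, W = M.shape
    q = W_q @ np.concatenate([s, r])    # q_t = W [s_t, r_t], shape (C,)
    feats = M.reshape(C, H * W)         # column (x, y) is M_t^(x,y)
    a = feats.T @ q                     # a_t^(x,y) = <q_t, M_t^(x,y)>
    alpha = np.exp(a - a.max())         # softmax over locations
    alpha /= alpha.sum()
    return feats @ alpha                # c_t = sum_xy alpha^(x,y) M^(x,y)
```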

  25. neural map: write & update
  ▶ Compute the content of the memory for t + 1 at location (x_t, y_t):
      w_{t+1}^(x_t,y_t) = f_NN^WRITE([s_t, r_t, c_t, M_t^(x_t,y_t)])
      M_{t+1}^(a,b) = w_{t+1}^(x_t,y_t)   if (a, b) = (x_t, y_t)
      M_{t+1}^(a,b) = M_t^(a,b)           if (a, b) ≠ (x_t, y_t)
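A sketch of this hard-location write, with f_NN^WRITE stood in by a single assumed weight matrix (the paper uses a small deep network). Unlike the DNC's soft addressing, the write address here is simply the agent's current position:

```python
import numpy as np

def write_update(M, s, r, c, pos, W_w):
    """Write a new C-vector at the agent's cell; leave all others untouched."""
    x, y = pos
    inp = np.concatenate([s, r, c, M[:, y, x]])
    w_new = np.tanh(W_w @ inp)       # w_{t+1}^(x_t,y_t) = f^WRITE([...])
    M_next = M.copy()
    M_next[:, y, x] = w_new          # M_{t+1} differs only at (x_t, y_t)
    return M_next
```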

  26. neural map: output & policy
  ▶ Compute the policy π(a | s):
      o_t = f_NN^OUT([c_t, r_t, M_{t+1}^(x_t,y_t)])
      π(a | s) = SoftMax(NN(o_t))
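And the final step, sketched with two assumed weight matrices standing in for f_NN^OUT and the policy layer:

```python
import numpy as np

def policy_head(o, W_out, W_pi):
    """Map o_t to action probabilities: pi(a|s) = SoftMax(NN(o_t))."""
    h = np.tanh(W_out @ o)              # f_NN^OUT
    logits = W_pi @ h
    z = np.exp(logits - logits.max())   # numerically stable softmax
    return z / z.sum()
```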

  27–28. neural map: extensions
  ▶ Write operation with Gated Recurrent Units (GRUs): use the GRU gating equations to disable or weaken the memory update.
  ▶ Egocentric navigation: real-world agents do not know their global (x, y) location. Always write to the centre of the memory and translate the memory when the agent moves (see the sketch below).
  ▶ LSTM controller for 3D environments: if W, H are too small, the network gets confused (it reads what it just wrote). ⇒ Use an LSTM to keep track of read/write operations:
      h_t = LSTM(s_t, r_t, c_{t−1}, h_{t−1})
      c_t = context(M_t, h_t)
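A sketch of the egocentric-map bookkeeping described above: the agent always writes to the centre cell, and the whole map is translated opposite to the agent's motion. Wrapping via np.roll is an assumption here; zero-padding the vacated border is an equally valid choice.

```python
import numpy as np

def shift_map(M, dx, dy):
    """Translate the C x H x W map opposite to the agent's motion (dx, dy)."""
    return np.roll(M, shift=(-dy, -dx), axis=(1, 2))

def egocentric_write(M, w_new):
    """The write always happens at the centre cell, which is the agent."""
    C, H, W = M.shape
    M_next = M.copy()
    M_next[:, H // 2, W // 2] = w_new
    return M_next
```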
