Neural Map
Structured Memory for Deep RL
Emilio Parisotto eparisot@andrew.cmu.edu
PhD Student Machine Learning Department Carnegie Mellon University
Most deep learning problems are posed as supervised learning problems. Reinforcement learning instead trains agents from a reward signal in dynamic environments: the agent receives an observation (the state), a deep neural net parameterizes the policy, the policy outputs an action, and the environment returns a reward. The objective is to maximize reward.

[Figure: agent-environment loop — Observation/State → Deep Neural Net → Action → Reward]
Deep RL has had notable successes, e.g. agents that play Doom (Chaplot, Lample, AAAI 2017). But these have largely been reactive tasks, with a decision horizon of less than 1 second. Many interesting environments are partially observable: the goal is to scale deep RL up to partially-observable tasks.
The memory used in current deep RL agents is usually simple (typically a recurrent state, such as an LSTM's). What about a more advanced external memory, such as the Neural Turing Machine (Graves et al., 2014) or the Differentiable Neural Computer (Graves et al., 2016)? These are difficult to train, especially using RL.
Learned External Memory
Suppose an agent is in a simple maze, where a color shown at the start indicates the correct goal. At the start, the agent has no a priori knowledge that it should store the color into memory. To succeed, all of the following must hold:
1. Write the color to memory at the start of the maze.
2. Never overwrite the memory of the color over 'T' time steps.
3. Find and enter the goal.
All conditions must hold, or else the episode is useless for learning.
Solution: Write everything into memory!
Memory-network-based approaches to memory tasks instead, at every time step:
1. Perform a read operation over their memory database.
2. Write the latest percept into memory.
(Oh et al., 2016)

This is easy to learn: the agent never needs to guess what to store in memory. But it can be inefficient: memory grows linearly with episode length, and every read must attend over all stored percepts.
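The read-everything scheme above can be sketched as follows. This is a minimal sketch, not the architecture of Oh et al., 2016: real models embed percepts with learned networks, while the buffer class, `softmax` helper, and one-hot percepts here are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class EpisodicBuffer:
    """Memory-network-style memory: store every percept, read by soft attention.
    Illustrative sketch; note the buffer grows by one slot per time step."""
    def __init__(self):
        self.slots = []

    def write(self, percept):
        # Write the latest percept into memory, unconditionally.
        self.slots.append(np.asarray(percept, dtype=float))

    def read(self, query):
        M = np.stack(self.slots)         # (T, d): one row per stored percept
        scores = M @ np.asarray(query)   # similarity of the query to each slot
        return softmax(scores) @ M       # attention-weighted memory summary

buf = EpisodicBuffer()
for t in range(5):
    buf.write(np.eye(4)[t % 4])          # toy one-hot percepts
out = buf.read(np.eye(4)[2])             # retrieval peaks at the matching percept
```

Because nothing is ever discarded, the read cost grows with the episode length T — the inefficiency noted above.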
Neural Map

Instead, give the memory spatial structure: the Neural Map N_t is a 2D grid of memory cells (a C × H × W feature block) indexed by the agent's position in the environment. At every step the agent performs a global read, a context read, and a local write over N_t, and the resulting features are used to compute the policy.
Global read: r_t = read(N_t). The entire neural map is passed through a deep convolutional network; the output r_t provides a global summary of the map.
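A minimal sketch of the global read, assuming a single convolution layer stands in for the deep convolutional network; the kernel, output projection `w_out`, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 10, 10
N_t = rng.standard_normal((C, H, W))   # the neural map

def global_read(N, kernel, w_out):
    """Sketch of r_t = read(N_t): one valid convolution over the map,
    a ReLU, then a flattening projection into a fixed-size summary."""
    C, H, W = N.shape
    k = kernel.shape[-1]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # convolve: elementwise product summed over channels and window
            out[i, j] = np.sum(N[:, i:i+k, j:j+k] * kernel)
    out = np.maximum(out, 0.0)             # ReLU
    return w_out @ out.ravel()             # r_t: global summary vector

kernel = rng.standard_normal((C, 3, 3))
w_out = rng.standard_normal((32, 8 * 8))   # maps the 8x8 conv output to r_t
r_t = global_read(N_t, kernel, w_out)
```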
Context read: compute a query vector q_t from the current state s_t and the global read r_t. Score q_t against every memory cell of N_t, producing a similarity β_t per cell; normalizing the similarities and taking the convex combination of the similarities β_t and the memory cells N_t yields the context vector c_t.
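The context read can be sketched with softmax attention over map cells; the dimensions and the dot-product similarity are assumptions consistent with the description above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context_read(N, q):
    """Sketch of the context read: score every map cell against the
    query q_t, then return the softmax-weighted (convex) combination
    of cells as the context vector c_t."""
    C, H, W = N.shape
    cells = N.reshape(C, H * W).T      # (H*W, C): one row per map cell
    beta = softmax(cells @ q)          # similarity of q_t to each cell
    c_t = beta @ cells                 # convex combination of cells
    return c_t, beta.reshape(H, W)     # also return the read distribution

rng = np.random.default_rng(1)
N_t = rng.standard_normal((8, 5, 5))
q_t = rng.standard_normal(8)
c_t, beta = context_read(N_t, q_t)
```

The returned `beta` is exactly the spatial read distribution: it shows *where* in the map the query matched.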
Intuitively: the context read uses the query (built from r_t and s_t) to ask whether anything similar to the current situation has been stored anywhere in the map, and retrieves it.

Write: write to the current position in the neural map. Compute a new write vector and replace the cell at the current position with this new vector.
Concretely, the write vector is w_{t+1} = write(s_t, r_t, c_t, N_t^(x_t, y_t)), computed by a deep network; the updated map N_{t+1} is identical to N_t everywhere except at the current position (x_t, y_t), which is replaced by w_{t+1}. A gated, GRU-style write (Chung et al., 2014) can also be used, so the new cell mixes the old contents with what we wrote.
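A sketch of a gated, GRU-style write (after Chung et al., 2014), reduced to a single update gate; the reset gate and the exact parameterization of the full write network are omitted, and the weight matrices are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_write(old_cell, candidate, Wz, Uz):
    """Gated write: an update gate z decides how much of the old cell
    survives versus how much of the new write vector replaces it.
    (A full GRU also has a reset gate; omitted to keep the sketch short.)"""
    z = sigmoid(Wz @ candidate + Uz @ old_cell)     # update gate in (0, 1)
    return (1.0 - z) * old_cell + z * np.tanh(candidate)

rng = np.random.default_rng(2)
C = 8
old = rng.standard_normal(C)       # N_t at the current position (x_t, y_t)
cand = rng.standard_normal(C)      # candidate write vector from the network
Wz = rng.standard_normal((C, C)) * 0.1
Uz = rng.standard_normal((C, C)) * 0.1
w_next = gru_write(old, cand, Wz, Uz)   # w_{t+1}: mixes old cell and new write
```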
Finally, the outputs (r_t, c_t, w_{t+1}) are combined and passed through a network that computes a policy.
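This final step can be sketched as a linear layer plus softmax over the concatenated outputs; the single linear layer `W_pi` and the choice of four actions are illustrative assumptions, not the paper's exact policy head.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def policy(r_t, c_t, w_next, W_pi):
    """Sketch: concatenate the three Neural Map outputs into o_t and map
    o_t to action probabilities with one linear layer + softmax."""
    o_t = np.concatenate([r_t, c_t, w_next])   # combined output features
    return softmax(W_pi @ o_t)                 # distribution over actions

rng = np.random.default_rng(3)
r_t = rng.standard_normal(32)      # global read summary
c_t = rng.standard_normal(8)       # context read vector
w_next = rng.standard_normal(8)    # write vector w_{t+1}
W_pi = rng.standard_normal((4, 48)) * 0.1   # 4 actions, 32+8+8 = 48 features
pi = policy(r_t, c_t, w_next, W_pi)
```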
Experiments: goal-search mazes where the real map (not visible to the agent) is 3xKxK while the input state (partially observable) is only 3x15x3, plus 3D tasks in ViZDoom (Kempka et al., 2016). A central question: does the agent learn to map its environment using the Neural Map?
Contact Information: Emilio Parisotto (eparisot@andrew.cmu.edu)
[Figure: observations, the true state, and the context-read distribution over the map]