FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu: DeepMind
Rene Bidart (rbbidart@uwaterloo.ca), CS885, June 22, 2018
Reinforcement Learning is hard!
How do we make decisions?
No, we reason using hierarchies of abstraction.
Why not use hierarchical structure in policies?
Inspired by the feudal hierarchies of the Middle Ages
Each level has control over the level directly below it, but not over people many layers lower
Reward Hiding:
Lower levels are rewarded for following their manager's commands, not through an external reward
Information Hiding:
Each level only sees information at the granularity of its own level of the hierarchy
[Figure: a stack of three agents passing rewards between levels, with actions sent to the environment]
complex or less obviously hierarchical problems
Manager
Worker
[Figure: the Manager sends goals and rewards to the Worker, the Worker sends actions to the Environment, and the Environment returns rewards]
Architecture
embedding
Goals
Manager sets goals for the Worker in latent space
Shared Dense Embedding
Used by both the Manager and the Worker to produce goals and actions
○ 16 8x8 filters
○ 32 4x4 filters
○ 256 fully connected
○ ReLU
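As a rough check of the sizes involved, the sketch below traces the output shape of each layer of the shared embedding. The strides (4 and 2), the lack of padding, and the 84x84 input resolution are assumptions borrowed from the usual DQN-style Atari setup; they are not stated on this slide. The helper names `conv_out` and `perception_shapes` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def perception_shapes(height: int = 84, width: int = 84) -> list[tuple[str, tuple[int, ...]]]:
    """Shapes through the shared embedding: 16 8x8 filters, 32 4x4 filters, 256 FC units.

    Assumed (not from the slide): strides 4 and 2, no padding, 84x84 input.
    ReLU is assumed to follow every layer; it does not change any shape.
    """
    shapes = []
    h, w = conv_out(height, 8, 4), conv_out(width, 8, 4)
    shapes.append(("conv1", (16, h, w)))
    h, w = conv_out(h, 4, 2), conv_out(w, 4, 2)
    shapes.append(("conv2", (32, h, w)))
    shapes.append(("flatten", (32 * h * w,)))
    shapes.append(("fc", (256,)))
    return shapes


for name, shape in perception_shapes():
    print(name, shape)
# conv1 (16, 20, 20)
# conv2 (32, 9, 9)
# flatten (2592,)
# fc (256,)
```

Under these assumptions the 2592 flattened conv features are compressed into the 256-dimensional vector that both the Manager and the Worker read.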
Manager: Goal embedding
summed over last 10 time steps
environment
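A minimal sketch of the Manager's goal head, assuming that the quantity summed over the last 10 time steps is the Manager's raw recurrent output. The FuN paper then scales the result to unit length (g_t = ĝ_t / ||ĝ_t||), so only the goal's direction matters. The class name `GoalHead` and the 2-D toy vectors are hypothetical.

```python
import math
from collections import deque


class GoalHead:
    """Hypothetical helper: sums the Manager's recent raw outputs and emits a unit-length goal."""

    def __init__(self, horizon: int = 10):
        # Keep only the most recent `horizon` raw outputs; deque drops the oldest.
        self.recent = deque(maxlen=horizon)

    def step(self, raw_output: list[float]) -> list[float]:
        self.recent.append(raw_output)
        # Element-wise sum over the stored outputs.
        pooled = [sum(values) for values in zip(*self.recent)]
        norm = math.sqrt(sum(v * v for v in pooled))
        if norm == 0.0:
            return pooled  # degenerate case: no direction to normalize
        # Only the direction is kept: scale the goal to unit length.
        return [v / norm for v in pooled]


head = GoalHead(horizon=10)
print(head.step([3.0, 4.0]))  # [0.6, 0.8]
```

Normalizing after pooling means a long run of small outputs in one direction and a single large output in that direction produce the same goal.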
Worker: Action Embedding
○ Rows: actions [a]
○ Columns: embedding dimension [k]
Goal embedding: Worker
Goals are projected into the k-dimensional embedding space using a linear transformation ɸ
○ ɸ has no bias, so it can’t produce a constant non-zero vector
○ The Worker can’t ignore the manager’s input, so the manager’s goal will influence the final policy
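The projection can be sketched as w = ɸ(sum of recent goals), with ɸ a plain matrix and no bias term, following the FuN paper's definition of the Worker's goal embedding. The toy sizes (3 goal dimensions, k = 2) and the function name `project_goal` are illustrative assumptions.

```python
def project_goal(phi: list[list[float]], recent_goals: list[list[float]]) -> list[float]:
    """w = phi @ sum(recent goals): a bias-free linear map from goal space to k dims."""
    # Sum the Manager's goals from the last c steps, element-wise.
    pooled = [sum(values) for values in zip(*recent_goals)]
    # Matrix-vector product with no bias: the output changes only through the goals.
    return [sum(p * g for p, g in zip(row, pooled)) for row in phi]


# Toy sizes: 3 goal dimensions projected to k = 2.
phi = [[1.0, 0.0, 2.0],
       [0.0, 1.0, -1.0]]
goals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(project_goal(phi, goals))              # [1.0, 1.0]
print(project_goal(phi, [[0.0, 0.0, 0.0]]))  # [0.0, 0.0]
```

Because there is no bias, a zero goal always maps to zero. With a bias b, the network could learn ɸ = 0 and a fixed non-zero output b, letting the Worker ignore the Manager entirely.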
Action: Worker
Multiply the action embedding matrix (U) with the goal embedding (w)
A softmax over the result gives a distribution over actions
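The Worker's final step can be sketched as a softmax over the logits U w, matching the FuN paper's policy π = SoftMax(U w). In the real model U is produced by the Worker's recurrent network at every step; here the toy matrix, the goal embedding, and the function name `worker_policy` are illustrative assumptions.

```python
import math


def worker_policy(U: list[list[float]], w: list[float]) -> list[float]:
    """pi = softmax(U w): U has one row per action and k columns; w is the k-dim goal embedding."""
    # One logit per action: dot product of that action's embedding row with w.
    logits = [sum(u * x for u, x in zip(row, w)) for row in U]
    # Subtract the max logit before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 3 actions, k = 2.
U = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
w = [2.0, 0.0]
print([round(p, 3) for p in worker_policy(U, w)])
```

Changing only w (the Manager's influence) reweights the same action embeddings, so a new goal shifts probability toward the actions whose rows align with it.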
goal
Actor-critic:
Value function from internal critic:
Reward isn’t truly hierarchical
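The slide's equations did not survive extraction. The FuN paper defines the Worker's intrinsic reward as r^I_t = (1/c) Σ_{i=1..c} d_cos(s_t - s_{t-i}, g_{t-i}) and its advantage as A^D_t = R_t + α R^I_t - V^D_t(x_t), which mixes the environment return into the Worker's signal; this is presumably why the reward isn't truly hierarchical. The sketch below computes both; the function names, the toy trajectory, α = 0.5, and the value estimate are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity d_cos(a, b); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def intrinsic_reward(states: list[list[float]], goals: list[list[float]], t: int, c: int) -> float:
    """r^I_t = (1/c) * sum_{i=1..c} d_cos(s_t - s_{t-i}, g_{t-i}); requires t >= c."""
    total = 0.0
    for i in range(1, c + 1):
        delta = [a - b for a, b in zip(states[t], states[t - i])]
        total += cosine(delta, goals[t - i])
    return total / c


def worker_advantage(extrinsic_return: float, intrinsic_return: float,
                     value: float, alpha: float) -> float:
    """A^D_t = R_t + alpha * R^I_t - V^D_t: the Worker sees both reward streams."""
    return extrinsic_return + alpha * intrinsic_return - value


# Toy 2-D latent trajectory: the Worker moves along +x and every goal points along +x.
c = 3
states = [[float(step), 0.0] for step in range(c + 1)]
goals = [[1.0, 0.0]] * (c + 1)
r_int = intrinsic_reward(states, goals, t=c, c=c)
print(r_int)  # 1.0
# For brevity, the single intrinsic reward stands in for the intrinsic return R^I_t.
print(worker_advantage(extrinsic_return=0.0, intrinsic_return=r_int, value=0.25, alpha=0.5))  # 0.25
```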
Actor-Critic:
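Feasibility: it is easier for the Worker to move in a given direction than to reach an absolute location in state space
Structural Generalization: the same directional goal applies at many different locations in state space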
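Both points follow from how the FuN paper trains the Manager, with a transition policy gradient ∇g_t = A^M_t ∇_θ d_cos(s_{t+c} - s_t, g_t(θ)) where A^M_t = R_t - V^M_t: the Manager is scored on whether the latent state moved in the goal's direction over c steps, not on whether it reached a point. The sketch evaluates only the scalar A^M_t · d_cos(...) (the gradient would come from an autodiff framework) and shows one directional goal scoring identically from two different starting locations. The names and numbers are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity d_cos(a, b); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def manager_score(s_t: list[float], s_t_plus_c: list[float], goal: list[float],
                  ret: float, value: float) -> float:
    """A^M_t * d_cos(s_{t+c} - s_t, g_t) with A^M_t = R_t - V^M_t.

    Only the scalar objective is computed; its gradient with respect to the
    goal would come from an autodiff framework, with the advantage held fixed.
    """
    advantage = ret - value
    displacement = [b - a for a, b in zip(s_t, s_t_plus_c)]
    return advantage * cosine(displacement, goal)


goal = [0.0, 1.0]  # a directional goal: "move along +y" in a 2-D latent space
# The same +2 displacement along y, starting from two different absolute locations.
near = manager_score([0.0, 0.0], [0.0, 2.0], goal, ret=1.0, value=0.5)
far = manager_score([5.0, -3.0], [5.0, -1.0], goal, ret=1.0, value=0.5)
print(near, far)  # 0.5 0.5
```

Moving against the goal flips the sign of the score, and the magnitude of the displacement does not matter, only its direction.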
Dilated RNN [Chang et al. 2017]:
rewards
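The dilated recurrence can be sketched with a toy scalar cell: the hidden state is split into r independent cores, and at step t only core t mod r is updated, so each core ticks once every r steps and spans r times as many environment steps per update. Following the FuN paper's description of its dilated LSTM, the output is pooled over the last r outputs (here, summed). The tanh cell, its scalar weights, and r = 3 are placeholders, not the paper's LSTM.

```python
import math
from collections import deque


class DilatedRNN:
    """Toy dilated recurrence: r cores, and only core (t mod r) updates at step t."""

    def __init__(self, r: int, w_h: float = 0.5, w_x: float = 1.0):
        self.r = r
        self.w_h, self.w_x = w_h, w_x   # placeholder scalar weights
        self.cores = [0.0] * r          # one hidden state per core
        self.t = 0
        self.recent = deque(maxlen=r)   # the last r per-step outputs

    def step(self, x: float) -> float:
        i = self.t % self.r             # the only core that updates at this step
        self.cores[i] = math.tanh(self.w_h * self.cores[i] + self.w_x * x)
        self.recent.append(self.cores[i])
        self.t += 1
        return sum(self.recent)         # pooled (summed) output over the last r steps


rnn = DilatedRNN(r=3)
outputs = [rnn.step(x) for x in [1.0, 0.0, 0.0, 0.0]]
print(outputs)
```

The input seen at step 0 is stored only in core 0, which is next updated at step 3, so that memory is not overwritten by the intervening steps.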
manager is significantly worse
missing in a lot of DL papers.
feeding it into a fully connected network?