SLIDE 1 ETH Zurich – Distributed Computing – www.disco.ethz.ch
Manuel Fritsche
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning
Oliver Richter, Gino Brunner, Roger Wattenhofer
SLIDE 2
Base actions on predictions
SLIDE 3
Reinforcement learning Agent Environment
SLIDE 4
Reinforcement learning
SLIDE 5
How to choose the action?
SLIDE 6
Return value
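The slide text does not preserve a formula, so the following is only a minimal sketch assuming the standard discounted return used in reinforcement learning. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not taken from the talk.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * rewards[k] over one episode.

    `gamma` is an assumed discount factor; the slides do not state a value.
    The sum is accumulated from the last reward backwards.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A single reward of 1.0 arriving two steps in the future, with gamma = 0.5.
example = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
print(example)  # 0.25
```

A reward that arrives later contributes less to the return, which is one reason sparse, delayed rewards give the agent a weak learning signal.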
SLIDE 7
Value function
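The formula on this slide is not preserved either. Assuming the usual textbook definition, the value function is the expected discounted return when starting in state s and following policy π. The symbols γ (discount factor), r (reward) and π (policy) are standard notation assumed here, not copied from the slides.

```latex
R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k},
\qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s \right]
```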
SLIDE 8
Reinforcement learning Agent Environment
SLIDE 9
Sparse reward settings Agent Environment
?
SLIDE 10
Agent Environment
SLIDE 11
Reward the exploration of novel states
SLIDE 12
Reward the exploration of novel states
SLIDE 13
How to find novel states? Make predictions.
SLIDE 14
How to find novel states? Make predictions, get surprised.
SLIDE 15
Curiosity: prediction vs. reality
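One common way to turn the gap between prediction and reality into a reward is the forward-model prediction error in feature space, as in the ICM formulation of Pathak et al. The sketch below assumes that formulation. The function name `curiosity_reward` and the scaling factor `eta` are illustrative, and the exact variant used in the talk may differ.

```python
def curiosity_reward(predicted_features, actual_features, eta=0.01):
    """Intrinsic reward: eta / 2 times the squared prediction error.

    `predicted_features` is what the forward model expected for the next
    state, `actual_features` is what the feature extractor produced for
    the state that was really observed. `eta` is an assumed scaling factor.
    """
    if len(predicted_features) != len(actual_features):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error


# A perfectly predicted state yields no curiosity reward.
print(curiosity_reward([1.0, 2.0], [1.0, 2.0]))           # 0.0
# A surprising state yields a positive reward.
print(curiosity_reward([0.0, 0.0], [3.0, 4.0], eta=1.0))  # 12.5
```

States the agent can already predict well yield little reward, so the agent is pushed towards states it has not yet learned to predict.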
SLIDE 16 Asynchronous Advantage Actor-Critic architecture (A3C)
A3C Network Feature Extractor
A3C
SLIDE 17 Adding curiosity
Forward Model A3C Network Feature Extractor 1 Feature Extractor 2
SLIDE 18 Learning good features
Forward Model Inverse Model A3C Network Feature Extractor 1 Feature Extractor 2 Feature Extractor 2
A3C + ICM (Pathak et al., ICML 2017)
SLIDE 19 Good features for all
Forward Model Inverse Model A3C Network Feature Extractor Feature Extractor
A3C + Pred
SLIDE 20 Adding Value Prediction
Forward Model Inverse Model Feature Extractor Feature Extractor A3C Network A3C Network
A3C + Pred + VPC
SLIDE 21
Value Prediction Consistency
SLIDE 22
Value Prediction Consistency
SLIDE 23
Value Prediction Consistency
SLIDE 24
Let’s see how it works in practice
SLIDE 25
Rewards per episode
SLIDE 26
Rewards per episode
SLIDE 27
Rewards per episode
SLIDE 28
Rewards per episode
SLIDE 29
Thinking bigger
SLIDE 30
Rewards per episode
SLIDE 31
Rewards per episode
SLIDE 32
Rewards per episode
SLIDE 33
Rewards per episode
SLIDE 34
Doom environment
SLIDE 35
Doom Setup
SLIDE 36
Rewards per episode
SLIDE 37
Rewards per episode
SLIDE 38
Rewards per episode
SLIDE 39
Rewards per episode
SLIDE 40
Questions & Answers
?