Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning - PowerPoint PPT Presentation


Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning. Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer. ETH Zurich, Distributed Computing.


SLIDE 1

ETH Zurich – Distributed Computing – www.disco.ethz.ch

Manuel Fritsche


Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer

SLIDE 2

Base actions on predictions

SLIDE 3

Reinforcement learning: Agent ↔ Environment
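The agent-environment loop on this slide can be sketched in a few lines. The toy chain environment and random policy below are illustrative stand-ins, not the environments used in the paper:

```python
import random

class ToyEnv:
    """Hypothetical chain environment: states 0..4, reward only at the right end."""
    def __init__(self):
        self.state = 2

    def step(self, action):  # action in {-1, +1}
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    # placeholder policy: in RL this would be learned
    return random.choice([-1, +1])

def run_episode(env, policy, max_steps=100):
    total_reward = 0.0
    state = env.state
    for _ in range(max_steps):
        action = policy(state)                   # agent acts
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

The loop structure (observe, act, receive reward, repeat) is the same regardless of how the policy is obtained.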

SLIDE 4

Reinforcement learning

SLIDE 5

How to choose the action?

SLIDE 6

Return value

SLIDE 7

Value function
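The return on the previous slide is the discounted sum of future rewards, and the value function is its expectation under the policy. A minimal sketch of the return computation (discount factor 0.99 is a common default, not necessarily the paper's setting):

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    Computed backwards so each reward is discounted once per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A value network is trained to regress V(s) toward such returns (or bootstrapped estimates of them).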

SLIDE 8

Reinforcement learning: Agent ↔ Environment

SLIDE 9

Sparse reward settings: Agent ↔ Environment

?

SLIDE 10

Agent ↔ Environment

SLIDE 11

Reward the exploration of novel states

SLIDE 12

Reward the exploration of novel states

SLIDE 13

How to find novel states? Make predictions.


SLIDE 14

How to find novel states? Make predictions, get surprised.
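"Getting surprised" is typically operationalized as the prediction error of a learned forward model: the worse the model predicts the next state, the more novel that state is, and the larger the intrinsic reward. A hedged sketch, with feature vectors and the scaling factor `eta` as illustrative placeholders for the learned representations in the actual system:

```python
import numpy as np

def intrinsic_reward(predicted_next, actual_next, eta=0.5):
    """Curiosity bonus: scaled squared error between the forward model's
    prediction of the next state features and the features actually observed."""
    return eta * float(np.sum((predicted_next - actual_next) ** 2))
```

States the model already predicts well yield little bonus, so the agent is pushed toward unfamiliar parts of the environment.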

F A

SLIDE 15

Curiosity: prediction vs. reality

SLIDE 16

Asynchronous Advantage Actor-Critic architecture (A3C)
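A3C trains an actor (policy) and a critic (value function) on top of a shared feature extractor. The core per-step losses can be sketched with toy scalars; the real implementation operates on neural-network outputs and adds an entropy bonus, omitted here for brevity:

```python
def a3c_losses(log_prob, value, ret, value_coef=0.5):
    """Actor-critic losses for a single transition.
    log_prob: log pi(a|s) of the taken action (from the actor head)
    value:    V(s) from the critic head
    ret:      observed (bootstrapped) return R"""
    advantage = ret - value                  # A = R - V(s)
    policy_loss = -log_prob * advantage      # push up log-prob of advantageous actions
    value_loss = value_coef * advantage ** 2 # regress V(s) toward the return
    return policy_loss, value_loss
```

In A3C multiple asynchronous workers compute these losses on their own rollouts and apply gradients to shared parameters.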

A3C Network Feature Extractor

A3C

SLIDE 17

Adding curiosity

Forward Model A3C Network Feature Extractor 1 Feature Extractor 2

SLIDE 18

Learning good features

Forward Model Inverse Model A3C Network Feature Extractor 1 Feature Extractor 2 Feature Extractor 2 (Pathak et al., ICML 2017: A3C + ICM)
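The ICM of Pathak et al. learns features via two heads: a forward model predicting the next feature vector from (features, action), and an inverse model predicting the action from (features, next features), which keeps the features focused on what the agent can influence. A numpy sketch with linear placeholder "models" standing in for the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative feature vectors for state s and next state s'
phi, phi_next = rng.normal(size=4), rng.normal(size=4)
action_onehot = np.array([0.0, 1.0])  # the taken action, one-hot

# forward model: (phi, action) -> predicted phi'; its error is the curiosity reward
W_fwd = rng.normal(size=(4, 6))  # hypothetical weights
pred_next = W_fwd @ np.concatenate([phi, action_onehot])
forward_loss = 0.5 * float(np.sum((pred_next - phi_next) ** 2))

# inverse model: (phi, phi') -> action logits; cross-entropy on the taken action
W_inv = rng.normal(size=(2, 8))  # hypothetical weights
logits = W_inv @ np.concatenate([phi, phi_next])
probs = np.exp(logits) / np.sum(np.exp(logits))
inverse_loss = -float(np.log(probs[1]))
```

Both losses are minimized jointly with the A3C objective; the forward loss doubles as the intrinsic reward signal.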

SLIDE 19

Good features for all

Forward Model Inverse Model A3C Network Feature Extractor Feature Extractor

A3C + Pred

SLIDE 20

Adding Value Prediction

Forward Model Inverse Model Feature Extractor Feature Extractor A3C Network A3C Network

A3C + Pred + VPC

SLIDE 21

Value Prediction Consistency
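One plausible reading of value prediction consistency, sketched here as an assumption rather than the paper's exact loss: since the forward model predicts the next state's features, the value of the current state can be regularized toward the bootstrapped value computed through that *predicted* state, tying the value function and the state predictor together:

```python
def vpc_loss(v_state, v_predicted_next, reward, gamma=0.99):
    """Consistency regularizer (hypothetical formulation):
    penalize disagreement between V(s) and r + gamma * V(phi_hat'),
    where phi_hat' are the forward model's predicted next-state features."""
    target = reward + gamma * v_predicted_next
    return 0.5 * (v_state - target) ** 2
```

Such a term would be added to the A3C + ICM objective with its own weighting coefficient.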

SLIDE 22

Value Prediction Consistency

SLIDE 23

Value Prediction Consistency

SLIDE 24

Let’s see how it works in practice

SLIDE 25

Rewards per episode

SLIDE 26

Rewards per episode

SLIDE 27

Rewards per episode

SLIDE 28

Rewards per episode

SLIDE 29

Thinking bigger

SLIDE 30

Rewards per episode

SLIDE 31

Rewards per episode

SLIDE 32

Rewards per episode

SLIDE 33

Rewards per episode

SLIDE 34

Doom environment

SLIDE 35

Doom Setup

SLIDE 36

Rewards per episode

SLIDE 37

Rewards per episode

SLIDE 38

Rewards per episode

SLIDE 39

Rewards per episode

SLIDE 40

Questions & Answers

?