Prefrontal cortex as a Meta-reinforcement learning system, Matthew Botvinick (PowerPoint presentation)



SLIDE 1

Prefrontal cortex as a Meta-reinforcement learning system

Matthew Botvinick DeepMind, London UK Gatsby Computational Neuroscience Unit, UCL

SLIDE 2

Mnih et al, Nature (2015)

SLIDE 3
SLIDE 4

Mnih et al, Nature (2015)

SLIDE 5

Yamins & DiCarlo, 2016

SLIDE 6

Schultz et al, Science (1997)

SLIDE 7

Jaderberg et al., 2016

SLIDE 8

Jaderberg et al., 2016

SLIDE 9

Mante et al., Nature, 2013; Song et al., eLife, 2017

SLIDE 10
SLIDE 11

Lake et al, BBS (2017)

SLIDE 12
SLIDE 13

Harlow, Psychological Review, 1949

“Learning to learn”

SLIDE 14

Harlow, Psychological Review, 1949

[Figure: performance across training episodes]

“Learning to learn”
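The Harlow task rewards "learning to learn": within an episode the same object is always rewarded, so a single informative trial suffices. A minimal simulation of that structure (illustrative only; the win-stay/lose-shift policy is a hypothetical stand-in for a meta-trained agent, and all names are mine):

```python
import random

def harlow_episode(policy, n_trials=6):
    """One Harlow episode: two novel objects, one (chosen at random) rewarded."""
    rewarded = random.randint(0, 1)
    history, correct = [], []
    for _ in range(n_trials):
        choice = policy(history)
        r = 1 if choice == rewarded else 0
        history.append((choice, r))
        correct.append(r)
    return correct

def win_stay_lose_shift(history):
    """An agent that has 'learned to learn': guess on trial 1, then stick/switch."""
    if not history:
        return random.randint(0, 1)
    last_choice, last_reward = history[-1]
    return last_choice if last_reward else 1 - last_choice

random.seed(0)
n_eps = 2000
acc = [0.0] * 6
for _ in range(n_eps):
    for t, r in enumerate(harlow_episode(win_stay_lose_shift)):
        acc[t] += r / n_eps
# Trial 1 accuracy is at chance; trials 2-6 are perfect for this agent,
# mirroring the one-shot curves Harlow saw late in training.
```

The point of the simulation is the shape of the curve, not the policy itself: chance on trial 1, near-ceiling immediately after.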

SLIDE 15

Mnih et al, Nature (2015)

SLIDE 16
SLIDE 17

Jaderberg et al., 2016

SLIDE 18

Jaderberg et al., 2016

SLIDE 19

https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

SLIDE 20
SLIDE 21

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci., 2016; Duan et al., arXiv (2016)
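A minimal sketch of the input construction this architecture relies on, assuming the setup described in Wang et al. (2018): at every step the recurrent (PFC-like) core sees the current observation concatenated with the previous action and reward, which is what allows its recurrent dynamics to implement a learning algorithm. Function and variable names are mine:

```python
def meta_rl_input(obs, prev_action, prev_reward, n_actions):
    """Build the step input for the recurrent core: observation o_t,
    one-hot previous action a_{t-1}, and scalar previous reward r_{t-1}."""
    a_onehot = [0.0] * n_actions
    a_onehot[prev_action] = 1.0
    return list(obs) + a_onehot + [float(prev_reward)]

x = meta_rl_input([0.2, 0.8], prev_action=1, prev_reward=1.0, n_actions=2)
# x == [0.2, 0.8, 0.0, 1.0, 1.0]
```

Without the (a_{t-1}, r_{t-1}) feedback, the network cannot condition its policy on its own recent outcomes, and meta-learning of an RL procedure does not emerge.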

SLIDE 22

[Figure: bandit episodes with arm reward probabilities 0.7/0.4, 0.6/0.9, 0.3/0.1, 0.8/0.7]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 23

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 24

[Figure: cumulative regret by trial, compared against Gittins indices, UCB, and Thompson sampling; Left/Right choice behavior across trials and episodes.]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
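For comparison with the baselines in the figure, a self-contained Thompson sampling run on a two-armed Bernoulli bandit (an illustrative sketch, not the paper's evaluation code; names and parameters are mine):

```python
import random

def thompson_regret(probs, n_trials=100, seed=1):
    """Thompson sampling on a Bernoulli bandit: sample from each arm's
    Beta posterior, pull the argmax, update; return cumulative expected regret."""
    rng = random.Random(seed)
    k = len(probs)
    alpha, beta = [1] * k, [1] * k
    best, regret = max(probs), 0.0
    for _ in range(n_trials):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1 if rng.random() < probs[a] else 0
        alpha[a] += r
        beta[a] += 1 - r
        regret += best - probs[a]
    return regret

reg = thompson_regret([0.7, 0.3], n_trials=100)
# Regret grows quickly at first, then flattens as the posterior
# concentrates on the better arm -- the same qualitative shape
# the trained recurrent agent is compared against.
```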

SLIDE 25

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 26

[Figure: bandit episodes with arm reward probabilities 0.7/0.3, 0.6/0.4, 0.3/0.7, 0.8/0.2]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 27

[Figure: cumulative regret by trial, compared against Gittins indices, UCB, and Thompson sampling, across trials and episodes.]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 28

[Figure: performance across training episodes]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

SLIDE 29

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Volkmann et al., Nature Reviews Neurology, 2010

SLIDE 30

[Figure: matching behavior, log2(C_R/C_L) plotted against log2(R_R/R_L) with axes from -4 to 4, for monkey data and for the model.]

Tsutsui et al., Nature Comms, 2016

Wang et al., Nature Neuroscience (2018)
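The matching analysis can be made concrete: fit the sensitivity s in the generalized matching law log2(C_R/C_L) = s · log2(R_R/R_L) from per-block choice and reward counts. A sketch with made-up counts (function name and data are mine):

```python
from math import log2

def matching_sensitivity(blocks):
    """Fit s in log2(C_R/C_L) = s * log2(R_R/R_L) by least squares through
    the origin. Each block is a tuple (C_R, C_L, R_R, R_L) of choice and
    reward counts; s = 1 is perfect matching, s < 1 is undermatching."""
    xs = [log2(rR / rL) for (_cR, _cL, rR, rL) in blocks]
    ys = [log2(cR / cL) for (cR, cL, _rR, _rL) in blocks]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# A perfectly matching subject (choice ratio equals reward ratio) gives s = 1.
s = matching_sensitivity([(80, 20, 8, 2), (20, 80, 2, 8), (60, 40, 6, 4)])
```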

SLIDE 31

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Wang et al., Nature Neuroscience (2018)

SLIDE 32

[Figure: proportion of recorded units coding a_{t-1}, r_{t-1}, a_{t-1}×r_{t-1}, and v_t (scale 0.1–0.6), alongside correlations of model units with the same variables.]

Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018)

SLIDE 33

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Wang et al., Nature Neuroscience (2018)

SLIDE 34
SLIDE 35

[Figure: cumulative regret by trial, compared against Gittins indices, UCB, and Thompson sampling, across trials and episodes.]

SLIDE 36

[Figure, panels A and B: reward probability, inferred/decoded volatility, and learning rate plotted over 200 steps, with action and feedback traces.]

Behrens et al., Nature Neuroscience, 2007 Wang et al., Nature Neuroscience (2018)
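The Behrens et al. result is that learning rates should, and do, rise with volatility. A toy delta-rule simulation makes the point, using hypothetical stable and volatile schedules (all names and parameter values are mine, not the paper's):

```python
import random

def tracking_error(schedule, alpha, seed=7):
    """Run a delta-rule estimate v of reward probability over a schedule of
    true probabilities; return the mean squared error of v against the truth."""
    rng = random.Random(seed)
    v, sq = 0.5, 0.0
    for p in schedule:
        r = 1 if rng.random() < p else 0
        v += alpha * (r - v)        # Rescorla-Wagner / delta-rule update
        sq += (v - p) ** 2
    return sq / len(schedule)

stable = [0.75] * 400                        # probability never changes
volatile = ([0.8] * 25 + [0.2] * 25) * 8     # probability reverses every 25 steps
# A high learning rate tracks the volatile schedule better; a low one wins
# when the world is stable -- the adaptation Behrens et al. observed in
# humans and Wang et al. decoded from the trained network's activity.
```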

SLIDE 37

Behrens et al., Nature Neuroscience, 2007 Wang et al., Nature Neuroscience (2018)

SLIDE 38

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA).]

Volkmann et al., Nature Reviews Neurology, 2010

SLIDE 39

Bromberg-Martin et al, J Neurophys, 2010

REVERSAL

Wang et al., Nature Neuroscience (2018)

SLIDE 40

[Schematic: recurrent network (PFC) receives observation o_t, previous action a_{t-1}, and previous reward r_{t-1}; it outputs action a_t and value estimate v_t, and is trained by reward prediction error δ (DA). Responses shown separately for Left-rewarded and Right-rewarded conditions.]

Wang et al., Nature Neuroscience (2018)

SLIDE 41
SLIDE 42

Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011

[Figure: two-step task; stage-2 reward prediction errors of the meta-RL network plotted against model-based RPEs, r² = 0.89.]

Model-based RL (from model-free RL)

Wang et al., Nature Neuroscience (2018)
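The model-based benchmark here can be sketched directly: in the two-step task, model-based stage-1 values weight the stage-2 state values by the known transition probabilities, and the stage-2 RPE is reward minus the stage-2 value. An illustrative sketch assuming the standard 0.7/0.3 transition structure (function names are mine):

```python
def model_based_q1(q2, p_common=0.7):
    """Model-based stage-1 action values in the two-step task: action 0
    commonly (p_common) leads to stage-2 state 0 and rarely to state 1;
    action 1 is the mirror image. q2 holds the two stage-2 state values."""
    q_a0 = p_common * q2[0] + (1 - p_common) * q2[1]
    q_a1 = p_common * q2[1] + (1 - p_common) * q2[0]
    return q_a0, q_a1

def stage2_rpe(reward, q2_state):
    """Stage-2 reward prediction error: delta = r - V(s2)."""
    return reward - q2_state

q = model_based_q1([1.0, 0.0])
# q is approximately (0.7, 0.3): stage-1 values weight stage-2 values
# by the transition probabilities, which a purely model-free learner ignores.
```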

SLIDE 43

  • DA blocked upon food reward from large/risky option
  • DA blocked upon food reward from small/certain option
  • DA triggered upon food omission from large/risky option

Wang et al., arXiv, 2018; Stopper et al., Neuron, 2014

Optogenetic manipulation of dopamine

SLIDE 44

Mnih et al, Nature (2015)

SLIDE 45
SLIDE 46
SLIDE 47
  • Richer environments / abstractions (Espeholt et al., arXiv, 2018)
  • Architectural biases (e.g., Raposo et al., NIPS, 2017)
  • Complementary forms of meta-learning (e.g., Fernando et al., under review)
  • Episodic reinstatement (Ritter et al., in press)

Current / Future Work

SLIDE 48

Neuroscience and AI: A virtuous circle

SLIDE 49

Jane Wang Zeb Kurth-Nelson Dharshan Kumaran Chris Summerfield Hubert Soyer Joel Leibo Sam Ritter

Collaborators

Adam Santoro Tim Lillicrap David Barrett Dhruva Tirumala Remi Munos Charles Blundell Demis Hassabis

DeepMind, London UK Gatsby Computational Neuroscience Unit, UCL