[PPT] - Learning to Navigate at City Scale Raia Hadsell Senior Research PowerPoint Presentation

SLIDE 1

[BBH Brazil for Renault / Art: Pedro Utzeri]

Learning to Navigate … at City Scale

Raia Hadsell

Senior Research Scientist

SLIDE 2

Where am I? Where am I going?

Where did I start? How distant is A from B? What is the shortest path from A to B? Have I been here before? How long until we get there?

Navigation

SLIDE 3

Raia Hadsell - Learning to Navigate - 2018

Exploration: Multi-task prediction of sensory data
Representation: Grounding in neuroscience
Memory: One-shot navigation in unseen environment
Real world: Modularity and transfer learning

SLIDE 4

Exploration: Multi-task prediction of sensory data
Representation: Grounding in neuroscience
Memory: One-shot navigation in unseen environment
Real world: Modularity and transfer learning

SLIDE 5

Can we teach agents to explore  partially observed environments?

Piotr Mirowski*, Razvan Pascanu*, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharsh Kumaran and Raia Hadsell

arxiv.org/abs/1611.03673 (ICLR 2017)

Learning to Navigate in Complex Environments

[MIT News / Photo: Mark Ostow]

SLIDE 6

Navigation mazes

[Figure labels: rewards +10 and +1]

Within an episode: fixed goal (static, or randomly changing between episodes), random respawns

[Beattie et al (2016) “DeepMind Lab”, github.com/deepmind/lab]

SLIDE 7

Given sparse rewards… explore and learn spatial knowledge

Accelerate reinforcement learning through auxiliary losses
Derive spatial knowledge from auxiliary tasks:
○ Depth prediction
○ Local loop closure prediction
Assess navigation skills through position decoding
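One way to read "auxiliary losses" is as extra terms added to the reinforcement learning loss. The sketch below assumes depth prediction is framed as classification over quantized depth bins and loop closure as a binary prediction; the function names and the weights `beta_depth` and `beta_loop` are hypothetical, not values from the slides.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between integer class labels and rows of logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean Bernoulli loss for the loop closure prediction."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(target * np.log(prob) + (1 - target) * np.log(1 - prob)).mean()

def total_loss(rl_loss, depth_logits, depth_bins, loop_prob, loop_label,
               beta_depth=1.0, beta_loop=1.0):
    """RL loss plus weighted auxiliary losses (weights are assumptions)."""
    depth_loss = softmax_cross_entropy(depth_logits, depth_bins)
    loop_loss = binary_cross_entropy(loop_prob, loop_label)
    return rl_loss + beta_depth * depth_loss + beta_loop * loop_loss
```

The auxiliary terms give a gradient signal at every step, which is the point when the navigation reward itself is sparse.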

SLIDE 8

Agent training

Advantage actor critic reinforcement learning

[Mnih, Badia et al (2016) “Asynchronous Methods for Deep Reinforcement Learning”]

[Diagram: CNN with value v and policy π outputs; CNN followed by a policy LSTM with v and π outputs]

Agent observes state $s_t$ and takes action $a_t$

Value and policy are updated with an estimate of the policy gradient given by the k-step advantage function $A$

Policy term:

$\nabla_\theta \log \pi(a_t \mid s_t; \theta) \, A(s_t, a_t; \theta_V)$
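The k-step advantage used in the policy term can be sketched as follows. This is a minimal illustration, assuming a rollout of k rewards, the critic's value estimates for the visited states, and a bootstrap value for the state after the last step; the function names and the discount `gamma` are hypothetical.

```python
import numpy as np

def k_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted k-step return (bootstrapped from the critic) minus V(s_t)."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Accumulate returns backwards from the end of the rollout.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns - np.asarray(values, dtype=float)

def policy_loss(log_probs, advantages):
    """Scalar whose gradient w.r.t. the policy parameters matches the
    negated policy term, with the advantages treated as constants."""
    return -np.sum(np.asarray(log_probs) * advantages)
```

In an autodiff framework the advantages would be wrapped in a stop-gradient so that only the log-probabilities carry gradient into the policy parameters.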

SLIDE 9

Navigation agent architectures

[Diagram: agent variants built from a CNN, a policy LSTM (Long Short-Term Memory), and value v and policy π outputs. Labelled inputs: reward_{t-1}, velocity_t, action_{t-1}. Other labels: hiddens, LSTM, depth.]

SLIDE 10

Results on large static mazes

[Plots: reward at goal vs. environment steps]

Importance of auxiliary tasks

Depth prediction as auxiliary task outperforms using depth as inputs

SLIDE 11

Mirowski, Pascanu et al (2017), “Learning to Navigate in Complex Environments”

SLIDE 12

3D, first person environment
partially observed
procedural variations

… but it’s not real

SLIDE 13

SLIDE 14

Exploration: Multi-task prediction of sensory data
Representation: Grounding in neuroscience
Memory: One-shot navigation in unseen environment
Real world: Modularity and transfer learning

SLIDE 15

Can we solve navigation tasks in the real world?

Piotr Mirowski*, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman and Raia Hadsell

arxiv.org/abs/1804.00168

Learning to Navigate in Cities Without a Map

SLIDE 16

Can we solve navigation tasks in the real world?

Street View

SLIDE 17

Street View as an RL environment: StreetLearn

Google Maps graph
Street View image: RGB panoramic image (we crop it and render at 84x84)
Actions: move to the next node, turn left/right

SLIDE 18

New York, London, Paris

14,000 to 60,000 nodes (panoramas) per “city”, covering a range of 3.5-5 km
Discrete action space allows rotating in place and stepping to next node
Multi-city dataset and RL environment will be released later this year
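The graph-plus-discrete-actions setup described above can be sketched as a toy environment. This is not the StreetLearn API: the class name, the edge encoding, and the 90-degree turn increment are all assumptions made for illustration, and image observations are left out.

```python
from dataclasses import dataclass

@dataclass
class PanoGraphEnv:
    """Toy stand-in for a panorama graph with rotate/step actions."""
    edges: dict          # node id -> {heading in degrees: neighbouring node id}
    node: int            # current panorama
    heading: int = 0     # current view direction in degrees
    turn_step: int = 90  # hypothetical rotation increment

    def step(self, action):
        if action == "left":
            self.heading = (self.heading - self.turn_step) % 360
        elif action == "right":
            self.heading = (self.heading + self.turn_step) % 360
        elif action == "forward":
            # Move only if an edge leaves the current node along the heading.
            self.node = self.edges[self.node].get(self.heading, self.node)
        else:
            raise ValueError(f"unknown action: {action}")
        return self.node, self.heading
```

A real environment would also return the cropped 84x84 view for the new node and heading; here only the graph state is tracked.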

SLIDE 19

The Courier Task

SLIDE 20

The Knowledge

Test to get a black cab license in London
Candidates study for 3-4 years
Memorize 25,000 roads and 20,000 named locations
By the time they’ve passed the exam, their hippocampuses are ‘significantly enlarged’.

Woollett & Maguire. 2011. Acquiring “the Knowledge” of London’s Layout Drives Structural Brain Changes. Current Biology

SLIDE 21

SLIDE 22

SLIDE 23

The Courier Task

Random start and target
Navigation without a map
Reward shaped when close to goal (<200 m)
Actions: rotate left, right, or step forward
Inputs for the agent at every time point t:
○ 84x84 RGB image observations
○ landmark-based goal description
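The two task ingredients listed above, a landmark-based goal description and a reward that is shaped near the goal, can be sketched as follows. The slides give only the 200 m radius; the softmax over negative scaled distances, the scale `alpha`, and the linear reward ramp are assumptions chosen for illustration.

```python
import numpy as np

def landmark_goal_code(goal_xy, landmarks_xy, alpha=0.002):
    """Encode a goal as a softmax over negative scaled distances to landmarks."""
    diffs = np.asarray(landmarks_xy, dtype=float) - np.asarray(goal_xy, dtype=float)
    logits = -alpha * np.linalg.norm(diffs, axis=1)
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def shaped_reward(dist_to_goal_m, goal_reward=1.0, radius_m=200.0):
    """Hypothetical linear ramp: zero beyond the radius, goal_reward at the goal."""
    return goal_reward * max(0.0, 1.0 - dist_to_goal_m / radius_m)
```

With this encoding the agent never sees map coordinates, only how close the goal is to each fixed landmark, and nearer landmarks receive larger weights.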

SLIDE 24

Architecture

[Mnih, Badia et al (2016) “Asynchronous Methods for Deep Reinforcement Learning”]

SLIDE 25

Architecture

SLIDE 26

Successful learning on all 3 cities

[Plots: reward at goal vs. environment steps. Panels: New York City (around NYU), Central London]

SLIDE 27

SLIDE 28

SLIDE 29

Analysis of goal acquisition

Examples of 1000-step episodes

Examples of value function for the same target

SLIDE 30

Generalization on new goal areas

[Figure: goal locations held out during training, and landmark locations]

SLIDE 31

Architecture

SLIDE 32

Multi-city modular transfer

Given a sequence of cities (regions of NYC), compare the following: single, joint, modular transfer

Successful navigation in the target city, even though the convnet and policy LSTM are frozen and only the goal LSTM is trained. Moreover, we note that transfer success is correlated with the number of cities seen during pre-training.
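Freezing the convnet and policy LSTM while training only the goal LSTM amounts to restricting the parameter update to one module. A minimal sketch, assuming parameters and gradients are stored per module in dictionaries; the module names, shapes, and plain SGD update are hypothetical stand-ins.

```python
import numpy as np

def sgd_step(params, grads, trainable, lr=0.1):
    """Update only the modules named in `trainable`; leave the rest frozen."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

# Placeholder parameters for the three modules named on the slide.
params = {
    "convnet": np.ones(3),
    "policy_lstm": np.ones(3),
    "goal_lstm": np.ones(3),
}
grads = {name: np.full(3, 2.0) for name in params}

# Transfer to a new city: only the goal LSTM receives updates.
new_params = sgd_step(params, grads, trainable={"goal_lstm"})
```

Because the frozen modules are shared unchanged, whatever city-specific knowledge is learned has to live in the goal LSTM.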

SLIDE 33

SLIDE 34

Learning to navigate in complex environments (ICLR 2017)

Piotr Mirowski*, Razvan Pascanu*, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino,  Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharsh Kumaran and Raia Hadsell

Learning to navigate in cities without a map (NIPS 2018)

Piotr Mirowski*, Matthew Koichi Grimes, Keith Anderson, Denis Teplyashin, Mateusz Malinowski,   Karl Moritz Hermann, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

Many thanks to many collaborators!

www.deepmind.com www.raiahadsell.com
