Learning to Navigate at City Scale
Raia Hadsell, Senior Research Scientist

SLIDE 1

[BBH Brazil for Renault / Art: Pedro Utzeri]

Learning to Navigate … at City Scale

Raia Hadsell

Senior Research Scientist

SLIDE 2

Where am I? Where am I going?

Where did I start? How distant is A from B? What is the shortest path from A to B? Have I been here before? How long until we get there?

Navigation

SLIDE 3

Raia Hadsell - Learning to Navigate - 2018

Exploration

Multi-task prediction

 of sensory data

Representation

Grounding in
 neuroscience

Memory

One-shot navigation
 in unseen environment

Real world

Modularity and 
 transfer learning

SLIDE 4

(Outline repeated)

SLIDE 5

Can we teach agents to explore
 partially observed environments?

Piotr Mirowski*, Razvan Pascanu*, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino,
 Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran and Raia Hadsell

arxiv.org/abs/1611.03673 (ICLR 2017)

Learning to Navigate in Complex Environments

[MIT News / Photo: Mark Ostow]

SLIDE 6

Navigation mazes

Rewards: +10 (goal), +1 (apples)

Within episode: fixed goal (static, or randomly changing between episodes); random respawns.

[Beattie et al (2016)
 “DeepMind Lab”,
 github.com/deepmind/lab]

SLIDE 7

Given sparse rewards… … explore and learn spatial knowledge

  • Accelerate reinforcement learning through auxiliary losses
  • Derive spatial knowledge from auxiliary tasks: depth prediction, local loop closure prediction
  • Assess navigation skills through position decoding
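As an illustrative sketch of how the auxiliary losses might be combined with the RL loss (the weights, the function name, and the use of cross-entropy over quantized depth bins are assumptions for this sketch, not the paper's exact formulation):

```python
import numpy as np

# Illustrative loss weights; assumed values, not the paper's.
BETA_DEPTH, BETA_LOOP = 1.0, 1.0

def total_loss(rl_loss, depth_logits, depth_labels, loop_logit, loop_label):
    # Depth prediction as classification over quantized depth bins.
    probs = np.exp(depth_logits) / np.exp(depth_logits).sum(-1, keepdims=True)
    depth_loss = -np.mean(np.log(probs[np.arange(len(depth_labels)), depth_labels]))
    # Loop-closure: binary logistic loss on "have I been here before?".
    p = 1.0 / (1.0 + np.exp(-loop_logit))
    loop_loss = -(loop_label * np.log(p) + (1 - loop_label) * np.log(1 - p))
    return rl_loss + BETA_DEPTH * depth_loss + BETA_LOOP * loop_loss
```

The auxiliary gradients give the agent dense learning signal even when the navigation reward itself is sparse.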

SLIDE 8

Agent training

[Diagram: CNN → policy LSTM → value v and policy π]

Advantage actor-critic reinforcement learning: the value and policy are updated with an estimate of the policy gradient given by the k-step advantage function A.

[Mnih, Badia et al (2016) “Asynchronous Methods for Deep Reinforcement Learning”]

Policy term:

∇θ log π(at|st; θ) A(st, at; θV)

The agent observes state st and takes action at.
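The k-step advantage in the policy term above can be sketched concretely; this minimal numpy version (function name and framing are mine, not code from the paper) computes the k-step returns and advantages from a short rollout:

```python
import numpy as np

def k_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """R_t = r_t + gamma r_{t+1} + ... + gamma^k V(s_{t+k});
    A(s_t, a_t) = R_t - V(s_t)."""
    returns = np.zeros(len(rewards))
    R = bootstrap_value                      # V(s_{t+k}) from the critic
    for t in reversed(range(len(rewards))):  # accumulate discounted returns
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns, returns - np.asarray(values)
```

Each advantage then scales the corresponding log-probability gradient ∇θ log π(at|st; θ) in the policy update.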

SLIDE 9

Navigation agent architectures

[Diagram: agent variants combining a CNN encoder with one or two Long Short-Term Memory (LSTM) modules; the policy LSTM outputs value v and policy π; additional inputs include reward(t-1), velocity(t) and action(t-1); depth is predicted as an auxiliary output]

SLIDE 10

Results on large static mazes

[Plots: reward at goal vs environment steps; importance of auxiliary tasks; depth prediction as auxiliary task]

  • Depth prediction as an auxiliary task outperforms using depth as an input
SLIDE 11

Mirowski, Pascanu et al (2017), “Learning to Navigate in Complex Environments”

SLIDE 12
  • 3D, first person environment
  • partially observed
  • procedural variations

… but it’s not real

SLIDE 13

SLIDE 14

(Outline repeated)

SLIDE 15

Can we solve navigation tasks in the real world?

Piotr Mirowski*, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, 
 Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, 
 Andrew Zisserman and Raia Hadsell

arxiv.org/abs/1804.00168

Learning to Navigate in Cities Without a Map

SLIDE 16

Can we solve navigation tasks in the real world?

Street View

SLIDE 17

Street View as an RL environment: StreetLearn

  • Google Maps graph + Street View imagery
  • RGB panoramic image (cropped and rendered at 84x84)
  • Actions: move to the next node, turn left/right

SLIDE 18

New York, London, Paris

  • 14,000 to 60,000 nodes (panoramas) per “city”, covering a range of 3.5-5 km
  • Discrete action space allows rotating in place and stepping to next node
  • Multi-city dataset and RL environment will be released later this year
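A toy sketch of the interface described above: the agent sits on a graph of panorama nodes and can rotate in place or step to the neighbour closest to its heading. The class name, the 30-degree turn increment, and the flat coordinate handling are illustrative assumptions, not the released StreetLearn API.

```python
import math

class ToyStreetEnv:
    def __init__(self, coords, edges):
        self.coords = coords                    # node id -> (lat, lng)
        self.adj = {n: [] for n in coords}
        for a, b in edges:                      # undirected street graph
            self.adj[a].append(b)
            self.adj[b].append(a)
        self.node = next(iter(coords))          # start at the first node
        self.heading = 0.0                      # degrees, 0 = "north"

    def _bearing(self, a, b):
        (y1, x1), (y2, x2) = self.coords[a], self.coords[b]
        return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360

    def step(self, action):
        if action == "turn_left":
            self.heading = (self.heading - 30) % 360
        elif action == "turn_right":
            self.heading = (self.heading + 30) % 360
        elif action == "forward":
            # Move to the neighbour with the smallest absolute angular
            # difference between its bearing and the current heading.
            self.node = min(
                self.adj[self.node],
                key=lambda n: abs((self._bearing(self.node, n)
                                   - self.heading + 180) % 360 - 180))
        return self.node, self.heading
```

In the real environment the observation at each node is the rendered panorama crop rather than the node id.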
SLIDE 19

The Courier Task

SLIDE 20

  • Test to get a black cab license in London
  • Candidates study for 3-4 years
  • Memorize 25,000 roads and 20,000 named locations
  • By the time they’ve passed the exam, their hippocampi are ‘significantly enlarged’.

The Knowledge

Woollett & Maguire. 2011. Acquiring ‘‘the Knowledge’’ of London’s Layout Drives Structural Brain Changes. Current Biology

SLIDE 21

SLIDE 22

SLIDE 23

The Courier Task

  • Random start and target
  • Navigation without a map
  • Reward shaped when close to goal (<200m)
  • Actions: rotate left, right, or step forward
  • Inputs for the agent at every time point t:

 ○ 84x84 RGB image observations
 ○ landmark-based goal description
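One way to read the reward description above as code. The slide only states sparse reward at the goal plus shaping within 200 m; the reward magnitude, the goal radius, and the linear schedule below are assumptions for this sketch:

```python
# Assumed constants for the sketch, not the paper's values.
GOAL_REWARD, SHAPING_RADIUS_M = 1.0, 200.0

def courier_reward(dist_to_goal_m, goal_radius_m=10.0):
    if dist_to_goal_m <= goal_radius_m:
        return GOAL_REWARD                       # reached the goal panorama
    if dist_to_goal_m < SHAPING_RADIUS_M:
        # Shaped reward grows linearly as the agent closes on the goal.
        return GOAL_REWARD * (1.0 - dist_to_goal_m / SHAPING_RADIUS_M)
    return 0.0                                   # sparse everywhere else
```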

SLIDE 24

[Mnih, Badia et al (2016) “Asynchronous Methods for Deep Reinforcement Learning”]

Architecture

SLIDE 25

Architecture

SLIDE 26

Successful learning on all 3 cities

[Plots: reward at goal vs environment steps, for New York City around NYU and for Central London]

SLIDE 27

SLIDE 28

SLIDE 29

Examples of 1000-step episodes

Analysis of goal acquisition

Examples of value function for the same target

SLIDE 30

Generalization on new goal areas

[Maps: goal locations held out during training, and landmark locations]

SLIDE 31

Architecture

SLIDE 32

Multi-city modular transfer

Given a sequence of cities (regions of NYC), compare the following training regimes: single city, joint training, and modular transfer.

Successful navigation in the target city, even though the convnet and policy LSTM are frozen and only the goal LSTM is trained. Moreover, transfer success is correlated with the number of cities seen during pre-training.
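The frozen-transfer recipe can be sketched as a parameter update that skips the convnet and policy LSTM. The dict-of-parameters framing and the name prefixes are illustrative; real training code would use gradient-stopping flags or optimizer parameter groups instead.

```python
def transfer_update(params, grads, lr=1e-4,
                    frozen_prefixes=("convnet/", "policy_lstm/")):
    updated = {}
    for name, w in params.items():
        if name.startswith(frozen_prefixes):
            updated[name] = w                     # frozen: reuse pretrained weights
        else:
            updated[name] = w - lr * grads[name]  # plain SGD on the goal LSTM
    return updated
```

Because only the city-specific goal LSTM is trained, adapting to a new city touches a small fraction of the parameters while the visual and motor modules are reused as-is.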

SLIDE 33

SLIDE 34
  • Learning to navigate in complex environments (ICLR 2017)


Piotr Mirowski*, Razvan Pascanu*, Fabio Viola, Hubert Soyer, Andy Ballard, Andrea Banino,
 Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran and Raia Hadsell

  • Learning to navigate in cities without a map (NIPS 2018)


Piotr Mirowski*, Matthew Koichi Grimes, Keith Anderson, Denis Teplyashin, Mateusz Malinowski, 
 Karl Moritz Hermann, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

Many thanks to many collaborators!

www.deepmind.com www.raiahadsell.com