Cognitive Mapping and Planning for Visual Navigation
Saurabh Gupta1,2 James Davidson2 Sergey Levine1,2 Rahul Sukthankar2 Jitendra Malik1,2
1UC Berkeley 2Google
Presented by Kent Sommer
Korea Advanced Institute of Science and Technology
Robot Navigation in Novel Environments
Robot equipped with a first-person camera
Dropped into a novel environment
Navigate in the environment
Classical approach: mapping (LSD-SLAM), then path planning (RRT)
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning, Zhu et al., ICRA 2017
End-to-End Training of Deep Visuomotor Policies, Levine et al., JMLR 2016
Human-level control through deep reinforcement learning, Mnih et al., Nature 2015
Figure: memory-based Q-network architectures (DQN, DRQN, MQN, RMQN, FRMQN). Each passes the observation xt through a CNN to form a context and predicts Q; DQN stacks the frames xt−M ... xt, while MQN, RMQN and FRMQN add an external memory.
Control of Memory, Active Perception, and Action in Minecraft, Oh et al., ICML 2016
Figure: target-driven navigation architecture. Generic siamese layers (ResNet-50, fc (512)) embed the 224x224x3 observation and target images; after embedding fusion, scene-specific layers output policy (4) and value (1) for each of scene #1, scene #2, ..., scene #N.
Mapping architecture (figure): update a multiscale belief about the world in the egocentric coordinate frame
Inputs: past frames and egomotion (e.g. a 90° turn)
Encoder network (ResNet-50), fully connected layers with ReLUs, decoder network with residual connections
Differentiable warping: confidence and belief about the world from the previous time step are warped using egomotion
Combine: the warped and the newly predicted confidence and belief are merged into the updated confidence and belief about the world
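The warp-and-combine step can be illustrated with a small sketch. It is not the presented implementation: it assumes a single-scale square grid, an egomotion that is an exact 90° turn (so the warp reduces to a grid rotation rather than a learned differentiable warp), and a confidence-weighted average as the combine rule. The function names `rotate90_ccw`, `combine` and `mapper_step`, and the rotation direction, are hypothetical choices made for illustration.

```python
def rotate90_ccw(grid):
    """Rotate a square 2D grid (list of lists) 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def combine(belief_prev, conf_prev, belief_new, conf_new, eps=1e-8):
    """Merge two belief maps cell by cell, weighting each by its confidence.

    Assumed rule: confidences add, beliefs are averaged with confidence weights.
    Cells with (near) zero total confidence keep a belief of 0.0.
    """
    n = len(belief_prev)
    belief = [[0.0] * n for _ in range(n)]
    conf = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = conf_prev[r][c] + conf_new[r][c]
            conf[r][c] = total
            if total > eps:
                weighted = (belief_prev[r][c] * conf_prev[r][c]
                            + belief_new[r][c] * conf_new[r][c])
                belief[r][c] = weighted / total
    return belief, conf


def mapper_step(belief_prev, conf_prev, belief_new, conf_new, turned_90=False):
    """One map update: warp the previous map by egomotion, then combine."""
    if turned_90:
        # Assumption: a 90 degree turn re-expresses the old egocentric map
        # by rotating the grid; the sign convention here is arbitrary.
        belief_prev = rotate90_ccw(belief_prev)
        conf_prev = rotate90_ccw(conf_prev)
    return combine(belief_prev, conf_prev, belief_new, conf_new)
```

Because the previous map is stored in the egocentric frame, it has to be re-expressed after every motion before new evidence can be merged, which is why the warp comes before the combine.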
Value iteration: $V_{n+1}(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s') \right]$
1 Aviv Tamar et al. “Value iteration networks”.
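The value-iteration backup on this slide can be sketched in tabular form. This is only an illustration of the recursion over P(s′|s, a) and Vn(s′), not the network used in the talk: the reward table, the discount `gamma`, the transition format and the function name `value_iteration` are assumptions introduced here.

```python
def value_iteration(reward, transitions, gamma=0.9, n_iters=50):
    """Tabular value iteration.

    reward[s][a]       -- immediate reward for taking action a in state s
    transitions[s][a]  -- list of (probability, next_state) pairs, i.e. P(s'|s, a)
    Returns the list of state values after n_iters backups.
    """
    n_states = len(reward)
    values = [0.0] * n_states
    for _ in range(n_iters):
        new_values = []
        for s in range(n_states):
            # Q_n(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) * V_n(s')
            q_values = [
                reward[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
                for a in range(len(reward[s]))
            ]
            # V_{n+1}(s) = max_a Q_n(s, a)
            new_values.append(max(q_values))
        values = new_values
    return values


# Hypothetical 3-state corridor: actions are 0 = left, 1 = right.
# State 2 is an absorbing goal; stepping into it from state 1 pays reward 1.
corridor_reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
corridor_transitions = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: left hits the wall, right goes to 1
    [[(1.0, 0)], [(1.0, 2)]],  # state 1: left goes to 0, right reaches the goal
    [[(1.0, 2)], [(1.0, 2)]],  # state 2: goal, absorbing
]
corridor_values = value_iteration(corridor_reward, corridor_transitions)
```

In the corridor example the values decay with distance from the goal, so acting greedily with respect to them moves the agent toward the goal.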
Trainable using simulated data
3 Image from: John Schulman
Geometric Results: Mean distance to goal location, 75th percentile distance to goal and success rate after executing the policy for 39 time steps.
Semantic Results: Mean distance to goal location, 75th percentile distance to goal and success rate after executing the policy for 39 time steps.
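The three reported metrics can be computed from per-episode final distances as in the sketch below. The success threshold, the linear-interpolation percentile and the names `percentile` and `navigation_metrics` are assumptions for illustration; the slides do not state how success or the percentile was defined.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac


def navigation_metrics(final_distances, success_threshold):
    """Summarise episodes by their distance to the goal after the last step.

    final_distances   -- one distance per episode, measured after the policy
                         has run for its fixed number of time steps
    success_threshold -- hypothetical cutoff: an episode counts as a success
                         if it ends within this distance of the goal
    """
    n = len(final_distances)
    successes = sum(1 for d in final_distances if d <= success_threshold)
    return {
        "mean_distance": sum(final_distances) / n,
        "p75_distance": percentile(final_distances, 75),
        "success_rate": successes / n,
    }
```

For example, `navigation_metrics([0, 1, 2, 3, 10], success_threshold=1.5)` summarises five episodes, one of which ended far from the goal; the 75th percentile is less sensitive to that outlier than the mean.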
Agents exhibit backtracking behavior!
Failure modes: missed, thrashing, tight
Video Demonstration