Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks (PowerPoint PPT Presentation)



SLIDE 1

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang MONASH INFORMATION TECHNOLOGY

SLIDE 2

Outline

  • 1. Embodied Navigation
  • 2. Vision-language Navigation Task
  • 3. Related Works
  • 4. Our Methods
  • 5. Conclusion
SLIDE 3

Embodied Navigation Problem

  • 1. datasets providing 3D assets with semantic annotations
  • 2. simulators render these assets and simulate an embodied agent
  • 3. tasks that define evaluable problems that enable us to benchmark scientific progress
SLIDE 4

Synthetic Image / Real Image

Synthetic Image

Advantages

  • More data
  • Faster rendering

Disadvantages

  • Limited real-world applicability

Real Image

Advantages

  • Close to real applications

Disadvantages

  • Less data
  • Prone to overfitting

Transfer: Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation (Zhu et al., CVPR 2019)

SLIDE 5

Matterport3D

SLIDE 6

Habitat Simulator

A flexible, high-performance 3D simulator with configurable agents, multiple sensors, and generic 3D dataset handling, with built-in support for Matterport3D, Gibson, Replica, and other datasets.

Advantages:

  • Real image
  • Fast rendering
  • Continuous action space

Disadvantage:

  • Low rendering quality
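The reset/step interaction loop such a simulator exposes can be sketched with a toy stand-in. `GridSim` below is a hypothetical 1-D corridor, not Habitat's actual API; it only illustrates how an agent, a simulator, and a success criterion fit together.

```python
class GridSim:
    """Toy simulator: the agent moves along a 1-D corridor toward a goal cell."""
    def __init__(self, length=5):
        self.length = length

    def reset(self):
        self.pos = 0
        return {"pos": self.pos, "goal": self.length - 1}

    def step(self, action):
        # actions: +1 = forward, -1 = backward, 0 = stop
        self.pos = max(0, min(self.length - 1, self.pos + action))
        obs = {"pos": self.pos, "goal": self.length - 1}
        done = action == 0
        success = done and self.pos == self.length - 1
        return obs, done, success

def run_episode(sim, policy, max_steps=20):
    """Standard embodied-navigation loop: reset, act until stop or timeout."""
    obs = sim.reset()
    for _ in range(max_steps):
        action = policy(obs)
        obs, done, success = sim.step(action)
        if done:
            return success
    return False

# A trivial policy: walk forward until at the goal, then stop.
greedy = lambda obs: 0 if obs["pos"] == obs["goal"] else 1
```

Real simulators differ in observation spaces and action sets, but benchmarks are built on exactly this episode structure.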
SLIDE 7

PointGoal Task
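PointGoal navigation is commonly evaluated with success rate and SPL (Success weighted by Path Length, Anderson et al., 2018). A minimal sketch of the SPL computation:

```python
def spl(successes, shortest_lengths, agent_lengths):
    """SPL = mean over episodes of S_i * l_i / max(p_i, l_i),
    where S_i is binary success, l_i the shortest-path length,
    and p_i the length of the path the agent actually took."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, agent_lengths):
        total += s * l / max(p, l)
    return total / len(successes)
```

An agent that succeeds via a path twice the shortest length scores 0.5 on that episode, so SPL rewards efficiency, not just reaching the goal.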

SLIDE 8

ObjectGoal Task

SLIDE 9

Vision Language Navigation (VLN) Task

  • Natural-language instructions
  • More detailed descriptions
  • Requires complex scene understanding

Room-to-room (R2R) dataset

  • 90 houses
  • 7k trajectories
  • 21k instructions

Computer Vision + Natural Language Processing + Reinforcement Learning

SLIDE 10

VLN baseline (seq-to-seq)

Disadvantages:

  • 1. Supervised learning easily overfits
  • 2. Does not sufficiently exploit the panoramic view
  • 3. The action space is redundant
  • 4. Training-testing domain gap
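The baseline can be sketched as an instruction-encoding LSTM plus an attentive action decoder. The PyTorch model below is a minimal illustration with made-up dimensions (e.g. 2048-d visual features), not the exact architecture of the R2R paper:

```python
import torch
import torch.nn as nn

class Seq2SeqAgent(nn.Module):
    """Minimal seq-to-seq VLN sketch: an LSTM encodes the instruction,
    and a decoder LSTM attends over the encoding at every step to
    produce action logits. All hyperparameters are illustrative."""
    def __init__(self, vocab=1000, emb=64, hid=128, n_actions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTMCell(hid + 2048, hid)  # 2048 = assumed image feature dim
        self.attn = nn.Linear(hid, hid)
        self.out = nn.Linear(hid, n_actions)

    def forward(self, instr, img_feats):
        # instr: (B, L) token ids; img_feats: (B, T, 2048) per-step views
        ctx, (h, c) = self.encoder(self.embed(instr))      # ctx: (B, L, hid)
        h, c = h.squeeze(0), c.squeeze(0)
        logits = []
        for t in range(img_feats.size(1)):
            # dot-product attention of the agent state over instruction words
            scores = torch.bmm(ctx, self.attn(h).unsqueeze(2)).squeeze(2)
            att = torch.softmax(scores, dim=1)
            attended = torch.bmm(att.unsqueeze(1), ctx).squeeze(1)
            h, c = self.decoder(torch.cat([attended, img_feats[:, t]], 1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                  # (B, T, n_actions)
```

Training with teacher-forced cross-entropy on these logits is exactly the supervised setup whose overfitting the slide criticizes.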
SLIDE 11

Speaker-Follower Model

Speaker: trajectory to instruction
Follower: instruction to trajectory
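The data-augmentation loop this enables can be sketched as follows; `speaker` and the trajectory sampler are stubs standing in for the trained speaker model and the shortest-path sampler of the actual Speaker-Follower pipeline (Fried et al., 2018):

```python
import random

def sample_trajectories(houses, n):
    """Stub: sample n shortest-path trajectories from the training houses."""
    return [(random.choice(houses), f"traj-{i}") for i in range(n)]

def augment(human_pairs, speaker, houses, n_synthetic):
    """Grow the follower's training set with speaker-labeled trajectories."""
    synthetic = []
    for _, traj in sample_trajectories(houses, n_synthetic):
        synthetic.append((traj, speaker(traj)))  # speaker: trajectory -> instruction
    return human_pairs + synthetic               # follower trains on the union
```

The follower is then trained on both human-annotated and synthetic (trajectory, instruction) pairs, which is the main source of the model's gains.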

SLIDE 12

Speaker-Follower Model

SLIDE 13

Reinforced Cross-Modal Matching (RCM)

Advantages:

  • 1. Uses cross-modal attention
  • 2. Combines RL with supervised learning
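The cross-modal attention idea can be illustrated with a minimal two-stage module: the agent state attends over instruction words, and the attended text then attends over the panoramic views. Dimensions and the exact attention form are placeholders, not RCM's precise equations:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Sketch of RCM-style cross-modal grounding with dot-product attention."""
    def __init__(self, d=128):
        super().__init__()
        self.q_text = nn.Linear(d, d)
        self.q_vis = nn.Linear(d, d)

    @staticmethod
    def attend(query, keys):
        # query: (B, d), keys: (B, N, d) -> attention-weighted sum (B, d)
        scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)
        w = torch.softmax(scores, dim=1)
        return torch.bmm(w.unsqueeze(1), keys).squeeze(1)

    def forward(self, state, words, views):
        text_ctx = self.attend(self.q_text(state), words)   # language attention
        vis_ctx = self.attend(self.q_vis(text_ctx), views)  # visual attention
        return text_ctx, vis_ctx
```

Conditioning the visual attention on the attended text is what makes the grounding cross-modal rather than two independent attentions.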
SLIDE 14

Environmental Dropout (Envdrop)
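The key trick in Envdrop (Tan et al., 2019) is that one dropout mask is sampled per environment and shared across all viewpoints and views, so the perturbed features stay internally consistent, as if rendered from a new house. A NumPy sketch, with the feature-shape convention assumed:

```python
import numpy as np

def environmental_dropout(feats, p=0.5, rng=None):
    """feats: (n_viewpoints, n_views, dim) features of ONE environment.
    One mask over the feature dimension is applied everywhere, unlike
    standard dropout, which samples independently per element."""
    rng = rng or np.random.default_rng()
    mask = (rng.random(feats.shape[-1]) >= p) / (1.0 - p)  # inverted dropout scaling
    return feats * mask  # broadcast: the same channels are dropped at every view
```

Each sampled mask effectively yields an extra training environment, mitigating the small number of houses in R2R.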

SLIDE 15

Self-Supervised Auxiliary Reasoning Tasks


Rich information to explore:

  • Semantics of the route
  • Navigation Progress
  • Vision Language Consistency
  • Room Structure

[Figure: navigation graph with navigation nodes, navigation edges, and feasible edges. Example instruction: "Please turn left and walk through the living room. Exit the room and turn right into the bedroom."]

SLIDE 16

Self-Supervised Auxiliary Reasoning Tasks


We require the agent to:

  • Interpret its actions
  • Reason about the past
  • Align vision and language explicitly
  • Predict the future

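The "reason about the past" requirement is often made concrete as progress estimation: a self-supervised label for how much of the path is complete, e.g. the fraction of the initial distance to the goal already covered. This is one common definition; the paper's exact formulation may differ:

```python
def progress_labels(dist_to_goal):
    """dist_to_goal: remaining geodesic distance at each step, with the
    starting distance d_0 first. Returns a progress label in [0, 1]
    per step; no human annotation is needed, hence self-supervised."""
    d0 = dist_to_goal[0]
    return [1.0 - d / d0 for d in dist_to_goal]
```

The agent's state is trained to regress these labels alongside the main navigation objective.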

SLIDE 17

Self-Supervised Auxiliary Reasoning Tasks

SLIDE 18

Self-Supervised Auxiliary Reasoning Tasks

SLIDE 19

Self-Supervised Auxiliary Reasoning Tasks

SLIDE 20

Self-Supervised Auxiliary Reasoning Tasks

Demo Code
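The demo boils down to training with a weighted sum of the main navigation loss and the four auxiliary reasoning losses. The argument names below (retelling, progress, matching, angle prediction) follow the tasks listed earlier; the default weights are placeholders, not the paper's values:

```python
def total_loss(nav, retell, progress, matching, angle,
               w=(1.0, 1.0, 1.0, 1.0)):
    """Navigation loss plus weighted self-supervised auxiliary losses."""
    return nav + w[0] * retell + w[1] * progress + w[2] * matching + w[3] * angle
```

Because every auxiliary label is derived from the trajectory itself, the extra supervision comes for free at training time and the auxiliary heads can be discarded at inference.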

SLIDE 21

Thank You