Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions
Clara Cannon ccannon@cs.utexas.edu
AGENDA
INTRODUCTION
MARCO ARCHITECTURE
○ Modeling Route Instructions ○ Representing Expected Views and Actions ○ Interleaving Action, Perception, and Modeling ○ Robustness to Errors and Ambiguities ○ Inferring Actions Implicit in Instructions
○ Implicit Action Inference Experiment
Scenario: A host has provided instructions for finding an office in a building you have never visited. Most likely, the instruction set will be incomplete, and you will have to infer certain steps in order to arrive at the correct destination. To resolve ambiguities, you use your knowledge of language, your understanding of spatial actions (e.g. “turn so that your back is facing the pink wall”), and a model of the environment.
instructions
specifications
its environment
6 Modules: 1. Syntax Parser 2. Content Framer 3. Instruction Modeler 4. Executor 5. Robot Controller 6. View Description Matcher
Linguistic Grounding Spatial Grounding
imperative model
combine information across phrases and sentences
knowledge of environment and execute instructions in the context
sensory functions
against sensory observations and world models (expected model against world model)
word/statement to a model of the surface meaning as a nested attribute value matrix
word/statement by deleting punctuation, arbitrary text ordering, inflectional suffixes, and spelling variations
directly models verb-argument structure
representation of the surface meaning of an instruction element to an imperative model containing compound action specifications
prepositions in route instructions and knowledge of how perception and action depend on local spatial configuration in similar environments
a pose/orientation in the environment, given descriptions in the instructions
scenes along the route
location within the relative view of observer, and description of appearance and attributes
actions
○ TURN: changes an agent’s pose/orientation but preserves its location ○ TRAVEL: changes an agent’s location but preserves its pose ○ VERIFY: checks an observation against a description of an expected view ○ DECLARE-GOAL: terminates instruction following by asserting that the agent is at the desired destination
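The four simple actions above can be illustrated with a minimal sketch. It assumes a grid world with four compass headings; the `Pose` class, the function names, and the dictionary-based view description are hypothetical illustrations, not MARCO's actual data structures.

```python
from dataclasses import dataclass

# Assumption: headings are in degrees, clockwise, with 0 pointing "north".
STEP = {0: (0, 1), 90: (1, 0), 180: (0, -1), 270: (-1, 0)}


@dataclass(frozen=True)
class Pose:
    """Hypothetical agent state: a grid location plus a heading."""
    x: int
    y: int
    heading: int


def turn(pose, degrees):
    """TURN: change the orientation, preserve the location."""
    return Pose(pose.x, pose.y, (pose.heading + degrees) % 360)


def travel(pose, steps=1):
    """TRAVEL: change the location, preserve the orientation."""
    dx, dy = STEP[pose.heading]
    return Pose(pose.x + dx * steps, pose.y + dy * steps, pose.heading)


def verify(observation, expected_view):
    """VERIFY: every attribute of the expected view must match the observation."""
    return all(observation.get(k) == v for k, v in expected_view.items())


def declare_goal(pose):
    """DECLARE-GOAL: stop following and assert the current location is the goal."""
    return {"done": True, "claimed_location": (pose.x, pose.y)}
```

For instance, `travel(turn(Pose(0, 0, 0), 90), 2)` leaves the agent at location (2, 0) still facing heading 90, which shows TURN and TRAVEL each preserving what the other changes.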
in route instructions by modeling which simple actions to take under which perceptual or cognitive conditions
specification
translate to pre-conditions, while-conditions, and post-conditions
context and state of the following route instruction
the next
against sensory observations. It treats view descriptions as constraints the observation stream must meet.
enough context to disambiguate
nearest known synonym or abstract hypernym using WordNet
the constituent is ignored and the remaining clause is modeled
a sentence from a set of instructions, it will parse the others.
1. Route instructions contain redundant information 2. Essential information in route instructions is stated using a small variety of content frames for direction movements
knowledge
along and until parameters of a TRAVEL action
○ Pre: the path should be immediately in front of the agent and the chair should be in front, in the distance ○ Post: the chair will be local to the agent
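The pre- and post-conditions of this TRAVEL example can be sketched as a simple check-then-move loop. This is an illustration under stated assumptions: the corridor is modeled as a list of cells, each holding a set of object names, and `travel_until` is a hypothetical helper rather than part of MARCO.

```python
def travel_until(corridor, start, target):
    """Move forward along `corridor` until `target` is local to the agent.

    corridor: list of sets; corridor[i] holds the objects at cell i (assumed model).
    start:    index of the agent's current cell.
    target:   name of the reference object, e.g. "chair".
    """
    ahead = corridor[start + 1:]
    # Pre-condition 1: a path must lie immediately in front of the agent.
    if not ahead:
        raise ValueError("pre-condition failed: no path in front")
    # Pre-condition 2: the target must be visible ahead, in the distance.
    if not any(target in cell for cell in ahead):
        raise ValueError("pre-condition failed: target not visible ahead")

    position = start
    # Keep moving until the post-condition (target is local) holds.
    while target not in corridor[position]:
        position += 1
    return position
```

With `corridor = [set(), set(), {"chair"}, set()]`, calling `travel_until(corridor, 0, "chair")` stops at cell 2, where the chair is local to the agent. When a pre-condition fails, a fuller agent might take an exploratory action instead of raising an error.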
may take exploratory actions to gain information or determine the location of a reference object
navigate with different maps and starting poses
exploratory action in order to orient itself with the environment according to the instruction
route instructions written by 6 human directors
with omissions)
establish baseline
hard to follow, 6: detailed, easy to follow)
Benefits of using VR environment: 1. All route directors had similar exposure to environment 2. Pertinent aspects of environment were known and repeatable 3. Directors learn environment through first person perspective as followers 4. MARCO can navigate same environment as people
from a starting location
reached the desired destination
lost
incorrect inference (e.g. incorrectly identifying the color blue)
participants, (2) full MARCO model, (3) MARCO w/o TURN inference, (4) MARCO w/o TRAVEL inference, (5) MARCO w/o TURN or TRAVEL inference
instructions
inference
expensively replicated than similar works
implicit actions when following natural language route instructions
communicating route information through unfamiliar large scale spaces
action sequencer or an algorithm reasoning on inferred route topology, generalizing methods to larger domains (e.g. cooking, first aid), using similar evaluation techniques for other large-scale instructional tasks