
SLIDE 1

Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions

Clara Cannon ccannon@cs.utexas.edu

SLIDE 2

AGENDA

INTRODUCTION
MARCO ARCHITECTURE

○ Modeling Route Instructions
○ Representing Expected Views and Actions
○ Interleaving Action, Perception, and Modeling
○ Robustness to Errors and Ambiguities
○ Inferring Actions Implicit in Instructions

EVALUATION

○ Implicit Action Inference Experiment

CONCLUSION
DISCUSSION

SLIDE 3

Introduction

Scenario: A host has given you instructions for finding an office in a building you have never visited. Most likely, the instructions will be incomplete, and you will have to infer certain steps in order to arrive at the correct destination. To resolve ambiguities, you draw on your knowledge of language, your understanding of spatial actions (e.g. “turn so that your back is facing the pink wall”), and a model of the environment.

SLIDE 4

What is MARCO?

An agent that follows free-form natural language route instructions
Represents and executes a sequence of compound action specifications
Infers implicit actions
Can perform implicit, explicit, and exploratory actions in its environment
Manually built and hand-tuned

SLIDE 5

MARCO Architecture

6 Modules:
1. Syntax Parser
2. Content Framer
3. Instruction Modeler
4. Executor
5. Robot Controller
6. View Description Matcher

[Figure: architecture diagram; labels: Linguistic Grounding, Spatial Grounding]

SLIDE 6

MARCO Architecture

An example parse tree shows the transformation of text into an imperative model
Syntax Parser: models the surface structure of a word/statement
Content Framer: interprets the surface meaning of a word/statement
Instruction Modeler: applies spatial and linguistic knowledge to combine information across phrases and sentences
Executor: interleaves action and perception; acts to gain knowledge of the environment and executes instructions in the context of the spatial model
Robot Controller: interface to the particular follower's motor and sensory functions
View Description Matcher: checks symbolic view descriptions against sensory observations and world models (expected model against world model)

SLIDE 7

Modeling Route Instructions

The syntax parser parses the raw route instruction text
The content framer translates the surface structure of a word/statement into a model of its surface meaning, represented as a nested attribute-value matrix
The content frame models the nested structure and sense of a word/statement while abstracting away punctuation, arbitrary text ordering, inflectional suffixes, and spelling variations
The syntax parser uses a probabilistic context-free grammar that directly models verb-argument structure
The content framer gets word senses from WordNet
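
To make the idea of a content frame concrete, here is a minimal sketch of a nested attribute-value structure that ignores punctuation, phrase order, inflection, and spelling variants. It is an illustration only, not MARCO's actual framer: the `CANONICAL` table is a hypothetical stand-in for WordNet senses and morphological analysis, the clause is assumed to be already parsed into a verb plus prepositional phrases, and the names `canonical`, `frame_phrase`, and `frame_clause` are invented for this example.

```python
import string

# Hypothetical lookup standing in for WordNet senses and morphological analysis.
CANONICAL = {
    "hallway": "hall", "halls": "hall", "corridor": "hall",
    "chairs": "chair", "goes": "go", "going": "go",
}
DETERMINERS = {"the", "a", "an"}


def canonical(token):
    """Lowercase a token, strip surrounding punctuation, map variants to one form."""
    word = token.lower().strip(string.punctuation)
    return CANONICAL.get(word, word)


def frame_phrase(tokens):
    """Frame a non-empty noun phrase as a head word plus sorted modifiers."""
    words = [canonical(t) for t in tokens]
    words = [w for w in words if w and w not in DETERMINERS]
    return {"head": words[-1], "mods": sorted(words[:-1])}


def frame_clause(verb, prep_phrases):
    """Build a nested attribute-value frame; dict keys make phrase order irrelevant."""
    return {
        "verb": canonical(verb),
        "args": {canonical(prep): frame_phrase(toks) for prep, toks in prep_phrases},
    }


a = frame_clause("Go", [("down", ["the", "Hallway,"]), ("to", ["the", "chairs."])])
b = frame_clause("going", [("to", ["a", "chair"]), ("down", ["the", "hall"])])
print(a == b)  # both surface forms collapse to the same frame
```

Keying the arguments by preposition is one simple way to get order independence; a real framer would need to handle repeated prepositions and unparsed constituents.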

SLIDE 8

Modeling Route Instructions

The instruction modeler translates the content frame’s representation of the surface meaning of an instruction element into an imperative model containing compound action specifications
It infers the model by applying knowledge of verbs and prepositions in route instructions, and knowledge of how perception and action depend on the local spatial configuration in similar environments

SLIDE 9

Representing Expected Views and Actions

A view description represents what the follower expects to see at a pose/orientation in the environment, given the descriptions in the instructions
Instructions tend to describe some distinctive attribute of some scenes along the route
For each expected object, the view description models the object’s type, its location within the observer’s relative view, and a description of its appearance and attributes

SLIDE 10

Representing Expected Views and Actions

Route instructions require at least four low-level simple actions
○ TURN: changes an agent’s pose/orientation but preserves its location
○ TRAVEL: changes an agent’s location but preserves its pose
○ VERIFY: checks an observation against a description of an expected view
○ DECLARE-GOAL: terminates instruction following by asserting that the agent is at the desired destination
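
The four simple actions can be sketched on a toy grid world. This is an assumption-laden illustration, not MARCO's robot controller: the `Pose` class, the four compass headings, the unit grid steps, and the dictionary-based observations are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

HEADINGS = ["N", "E", "S", "W"]  # clockwise order
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


@dataclass(frozen=True)
class Pose:
    x: int
    y: int
    heading: str


def turn(pose, direction):
    """TURN: change the heading ('left' or 'right') but keep the location."""
    delta = 1 if direction == "right" else -1
    index = (HEADINGS.index(pose.heading) + delta) % len(HEADINGS)
    return Pose(pose.x, pose.y, HEADINGS[index])


def travel(pose, steps=1):
    """TRAVEL: change the location along the current heading but keep the heading."""
    dx, dy = STEP[pose.heading]
    return Pose(pose.x + dx * steps, pose.y + dy * steps, pose.heading)


def verify(observation, expected):
    """VERIFY: every expected attribute must be observed with the same value."""
    return all(observation.get(key) == value for key, value in expected.items())


def declare_goal(pose):
    """DECLARE-GOAL: stop following and assert that this pose is the destination."""
    return {"done": True, "final_pose": pose}


start = Pose(0, 0, "N")
print(travel(turn(start, "left"), 2))
print(verify({"front": "hall", "left": "wall"}, {"front": "hall"}))
```

Keeping TURN and TRAVEL as pure functions that return a new `Pose` makes it easy to check the defining property of each action: one leaves the location untouched, the other leaves the heading untouched.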

SLIDE 11

Representing Expected Views and Actions

Compound action specifications capture the commands in route instructions by modeling which simple actions to take under which perceptual or cognitive conditions
Each clause is interpreted as a compound action specification
Adverbs, verb objects, and prepositional phrases translate to pre-conditions, while-conditions, and post-conditions
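
One possible way to represent a compound action specification is a simple action paired with three lists of conditions. The sketch below is hypothetical: the `CompoundAction` class, the dictionary-based state, and the particular conditions chosen for "Go down the hall to the chair" are assumptions for illustration, not MARCO's internal representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, str]               # hypothetical symbolic summary of what the agent perceives
Condition = Callable[[State], bool]


@dataclass
class CompoundAction:
    action: str                                          # simple action to repeat, e.g. "TRAVEL"
    pre: List[Condition] = field(default_factory=list)
    while_: List[Condition] = field(default_factory=list)
    post: List[Condition] = field(default_factory=list)

    def may_start(self, state: State) -> bool:
        """All pre-conditions must hold before the first simple action."""
        return all(cond(state) for cond in self.pre)

    def is_done(self, state: State) -> bool:
        """All post-conditions hold; an empty list never reports completion here."""
        return bool(self.post) and all(cond(state) for cond in self.post)

    def may_continue(self, state: State) -> bool:
        """Keep acting while the while-conditions hold and the goal is not reached."""
        return not self.is_done(state) and all(cond(state) for cond in self.while_)


# "Go down the hall to the chair", with assumed condition assignments.
go_to_chair = CompoundAction(
    action="TRAVEL",
    pre=[lambda s: s.get("front") == "hall"],
    while_=[lambda s: s.get("on") == "hall"],
    post=[lambda s: s.get("at") == "chair"],
)

print(go_to_chair.may_start({"front": "hall", "on": "hall", "at": "nothing"}))
print(go_to_chair.is_done({"front": "wall", "on": "hall", "at": "chair"}))
```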

SLIDE 12

Interleaving Action, Perception, and Modeling

The executor sequences simple actions given the environmental context and the state of following the route instructions
It executes each compound action specification before moving to the next
The robot controller executes simple actions
The view description matcher checks symbolic view descriptions against sensory observations. It treats view descriptions as constraints the observation stream must meet.
MARCO defers handling ambiguity until the environment can provide enough context to disambiguate
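
Treating a view description as a set of constraints on the observation stream might look like the following sketch. Everything here is an assumption for illustration: the `Thing` record, the use of `None` as "unconstrained" (a crude way to leave ambiguity open until the environment resolves it), and the function names are hypothetical, not MARCO's matcher.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thing:
    kind: str                     # object type, e.g. "chair"
    where: Optional[str] = None   # location in the view; None means unconstrained
    color: Optional[str] = None   # appearance attribute; None means unconstrained


def satisfies(seen: Thing, expected: Thing) -> bool:
    """A seen object meets an expectation if every specified field agrees."""
    return (
        seen.kind == expected.kind
        and expected.where in (None, seen.where)
        and expected.color in (None, seen.color)
    )


def matches(view: List[Thing], description: List[Thing]) -> bool:
    """Each expected object must be satisfied by some object in the view."""
    return all(any(satisfies(s, e) for s in view) for e in description)


def first_match(stream: List[List[Thing]], description: List[Thing]) -> Optional[int]:
    """Index of the first observation meeting the description, or None."""
    for index, view in enumerate(stream):
        if matches(view, description):
            return index
    return None


stream = [
    [Thing("wall", "front", "pink")],
    [Thing("hall", "front"), Thing("chair", "front", "blue")],
]
print(first_match(stream, [Thing("chair")]))
print(first_match(stream, [Thing("chair", color="red")]))
```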

SLIDE 13

Robustness to Errors and Ambiguities

When MARCO does not know a word, it searches for its nearest known synonym or abstract hypernym using WordNet
If the content framer encounters a constituent it cannot model, the constituent is ignored and the rest of the clause is modeled
A similar strategy applies to the parser: if it cannot parse one sentence in a set of instructions, it still parses the others
Why these techniques work:
1. Route instructions contain redundant information
2. Essential information in route instructions is stated using a small variety of content frames for directional movements
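
The unknown-word fallback can be illustrated with a toy taxonomy. This sketch does not call WordNet: the `BROADER` table, the `KNOWN` vocabulary, and the function `nearest_known` are hypothetical stand-ins that only show the idea of walking from an unknown word toward a synonym or more abstract term the agent already understands.

```python
from typing import Dict, Optional, Set

# Hypothetical miniature taxonomy: each word points to a synonym or a more abstract term.
BROADER: Dict[str, str] = {
    "armchair": "chair",
    "stool": "chair",
    "chair": "furniture",
    "sofa": "furniture",
    "corridor": "hall",
}

# Hypothetical vocabulary the agent already has models for.
KNOWN: Set[str] = {"chair", "furniture", "hall"}


def nearest_known(word: str) -> Optional[str]:
    """Walk up the taxonomy until a known word is found; None if the chain runs out."""
    seen = set()
    current: Optional[str] = word.lower()
    while current is not None and current not in seen:
        if current in KNOWN:
            return current
        seen.add(current)                 # guard against cycles in the table
        current = BROADER.get(current)
    return None


print(nearest_known("armchair"))
print(nearest_known("sofa"))
print(nearest_known("lamp"))
```

Returning `None` for a word with no known relative matches the slide's second strategy: the caller can then ignore that constituent and model the rest of the clause.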

SLIDE 14

Inferring Actions Implicit in Instructions

Implicit actions are inferred using linguistic and spatial knowledge
Ex: “Go down the hall to the chair”
The language model interprets the phrase structure as the along and until parameters of a TRAVEL action
MARCO infers the conditions of the TRAVEL action as
○ Pre: the path should be immediately in front, and the chair should be in front in the distance
○ Post: the chair will be local to the agent

SLIDE 15

Inferring Actions Implicit in Instructions

If the pre- and post-conditions are not met, the executor may take exploratory actions to gain information or determine the location of a reference object
The figure shows how an instruction is applied to navigate with different maps and starting poses
In some scenarios, the agent must perform an exploratory action to orient itself in the environment according to the instruction
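
A minimal sketch of implicit TURN inference for "Go down the hall to the chair", assuming a toy one-dimensional corridor. The corridor layout, the `front_view` function, and the turn-clockwise exploration strategy are all hypothetical; they only illustrate how an unmet pre-condition can trigger actions the instruction never mentioned.

```python
HEADINGS = ["N", "E", "S", "W"]  # clockwise order
CHAIR_AT = 3  # hypothetical corridor: cells 0..3 running east, chair in the last cell


def front_view(pos, heading):
    """What the agent sees straight ahead in this toy corridor."""
    if heading == "E" and pos < CHAIR_AT:
        return {"path": True, "chair_ahead": True}
    if heading == "W" and pos > 0:
        return {"path": True, "chair_ahead": False}
    return {"path": False, "chair_ahead": False}


def follow_go_to_chair(pos, heading):
    """Execute the TRAVEL, inferring any TURN the text left out."""
    log = []
    if pos == CHAIR_AT:  # post-condition already holds: the chair is local
        return log, pos, heading
    # Pre-condition: a path immediately in front and the chair in front in the distance.
    for _ in range(len(HEADINGS)):
        view = front_view(pos, heading)
        if view["path"] and view["chair_ahead"]:
            break
        # Exploratory, implicit action: turn clockwise and look again.
        heading = HEADINGS[(HEADINGS.index(heading) + 1) % len(HEADINGS)]
        log.append("TURN")
    else:
        return log + ["FAIL"], pos, heading
    # The pre-condition guarantees the agent faces east, so each step moves toward the chair.
    while pos != CHAIR_AT:
        pos += 1
        log.append("TRAVEL")
    return log, pos, heading


print(follow_go_to_chair(0, "N"))
print(follow_go_to_chair(1, "E"))
```

The same instruction yields different action sequences from different starting poses, which is the point the slide's figure makes with different maps.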

SLIDE 16

Evaluation

Evaluated MARCO in 3 environments with a corpus of route instructions written by 6 human directors
The corpus consists of 786 route instruction texts (682 with omissions)
36 human subjects followed the route instructions to establish a baseline
Used a desktop virtual reality environment
Text route instructions were rated 1-6 (1: vague, hard to follow; 6: detailed, easy to follow)

SLIDE 17

Evaluation

Benefits of using a VR environment:
1. All route directors had similar exposure to the environment
2. Pertinent aspects of the environment were known and repeatable
3. Directors learned the environment through a first-person perspective, as followers do
4. MARCO can navigate the same environment as people

SLIDE 18

Evaluation

For testing, the agent was given route instructions as text from a starting location
Success was determined by whether or not the agent reached the desired destination
Did not account for speed of arrival
No explanation of what happens when an agent gets lost
No account of what happens when an agent makes an incorrect inference (e.g. incorrectly identifying the color blue)

SLIDE 19

Implicit Action Inference Experiment

Results for 5 types of follower: (1) human participants, (2) full MARCO model, (3) MARCO w/o TURN inference, (4) MARCO w/o TRAVEL inference, (5) MARCO w/o TURN or TRAVEL inference
Humans were able to find the destination with an overall mean success rate of 69%
MARCO successfully followed 61% of route instructions
TURN inference is more valuable than TRAVEL inference

SLIDE 20

Conclusion

Authors claim this experiment is more easily and less expensively replicated than similar works
This paper uses knowledge of language and space to infer implicit actions when following natural language route instructions
Contributes an assessment of human performance in communicating route information through unfamiliar large-scale spaces
Future work includes: replacing the executor algorithm with a full action sequencer or an algorithm that reasons on inferred route topology; generalizing the methods to larger domains (e.g. cooking, first aid); and using similar evaluation techniques for other large-scale instructional tasks

SLIDE 21

Discussion!