Walk the Talk: Connecting Language, Knowledge, and Action in Route - - PowerPoint PPT Presentation

SLIDE 1

Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions

Clara Cannon ccannon@cs.utexas.edu

SLIDE 2

AGENDA

  • INTRODUCTION
  • MARCO ARCHITECTURE

    ○ Modeling Route Instructions
    ○ Representing Expected Views and Actions
    ○ Interleaving Action, Perception, and Modeling
    ○ Robustness to Errors and Ambiguities
    ○ Inferring Actions Implicit in Instructions

  • EVALUATION

    ○ Implicit Action Inference Experiment

  • CONCLUSION
  • DISCUSSION
SLIDE 3

Introduction

Scenario: A host has given you instructions for finding an office in a building you have never visited. Most likely, the instructions will be incomplete, and you will have to infer certain steps to reach the correct destination. You resolve the ambiguities using your knowledge of language, your understanding of spatial actions (e.g., “turn so that your back is facing the pink wall”), and a model of the environment.

SLIDE 4

What is MARCO?

  • An agent that follows free-form natural language route instructions
  • Represents and executes a sequence of compound action specifications
  • Infers implicit actions
  • Can perform implicit, explicit, and exploratory actions in its environment
  • Manually built and hand-tuned
SLIDE 5

MARCO Architecture

6 Modules:
  1. Syntax Parser
  2. Content Framer
  3. Instruction Modeler
  4. Executor
  5. Robot Controller
  6. View Description Matcher

[Figure: MARCO architecture diagram; labels: Linguistic Grounding, Spatial Grounding]

SLIDE 6

MARCO Architecture

  • An example parse tree shows the transformation of text into an imperative model
  • Syntax Parser: models the surface structure of a word/statement
  • Content Framer: interprets the surface meaning of a word/statement
  • Instruction Modeler: applies spatial and linguistic knowledge to combine information across phrases and sentences
  • Executor: interleaves action and perception; acts to gain knowledge of the environment and executes instructions in the context of the spatial model
  • Robot Controller: interface to the particular follower's motor and sensory functions
  • View Description Matcher: checks symbolic view descriptions against sensory observations and world models (expected model against world model)
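To make the data flow between the modules concrete, here is a toy Python sketch, not MARCO's implementation. It assumes simple keyword matching in place of the probabilistic parser and WordNet, and every function name (`parse`, `frame`, `model`, `execute`, `log_controller`) is hypothetical; the View Description Matcher is left out for brevity.

```python
from __future__ import annotations

# Toy stand-ins for MARCO's modules; every name here is hypothetical.
# The View Description Matcher is omitted to keep the sketch short.


def parse(text: str) -> list[list[str]]:
    """Syntax Parser stand-in: split text into sentences of lowercase tokens."""
    return [s.lower().split() for s in text.split(".") if s.strip()]


def frame(tokens: list[str]) -> dict:
    """Content Framer stand-in: record the verb and a direction word, if any."""
    direction = next((t for t in tokens if t in ("left", "right")), None)
    return {"verb": tokens[0], "direction": direction}


def model(frames: list[dict]) -> list[tuple[str, str | None]]:
    """Instruction Modeler stand-in: map each frame to a simple action."""
    actions = []
    for f in frames:
        if f["verb"] == "turn":
            actions.append(("TURN", f["direction"]))
        elif f["verb"] in ("go", "walk", "move"):
            actions.append(("TRAVEL", None))
    actions.append(("DECLARE-GOAL", None))  # end of the instruction text
    return actions


def log_controller(name: str, arg: str | None) -> str:
    """Robot Controller stand-in: report the action instead of moving a robot."""
    return name if arg is None else f"{name}({arg})"


def execute(actions: list[tuple[str, str | None]], controller=log_controller) -> list[str]:
    """Executor stand-in: hand each simple action to the controller in order."""
    return [controller(name, arg) for name, arg in actions]


frames = [frame(tokens) for tokens in parse("Turn left. Go forward.")]
print(execute(model(frames)))  # ['TURN(left)', 'TRAVEL', 'DECLARE-GOAL']
```

Each stage sees only the previous stage's output, so in this sketch the linguistic stages can be changed without touching the controller.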

SLIDE 7

Modeling Route Instructions

  • The syntax parser parses raw route instruction text
  • The content framer translates the surface structure of a word/statement into a model of its surface meaning, represented as a nested attribute-value matrix
  • The content frame models the nested structure and sense of a word/statement while abstracting away punctuation, arbitrary word order, inflectional suffixes, and spelling variations
  • The syntax parser uses a probabilistic context-free grammar that directly models verb-argument structure
  • The content framer gets word senses from WordNet
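A minimal sketch of the normalization idea, assuming the parser has already split a clause into a verb and its prepositional phrases. The `CANONICAL` table is a hypothetical stand-in for WordNet, and the frame layout (keyed by preposition, with `head` and `modifiers`) is an illustrative assumption rather than MARCO's actual attribute-value matrix. It shows two differently worded clauses collapsing to the same frame.

```python
from __future__ import annotations

import string

# Hypothetical table standing in for WordNet: collapses inflected forms
# and variant spellings/wordings onto one canonical form.
CANONICAL = {
    "went": "go", "goes": "go", "going": "go",
    "halls": "hall", "hallway": "hall",
    "grey": "gray", "chairs": "chair",
}
DETERMINERS = {"the", "a", "an"}  # dropped from the frame (an assumption)


def normalize(word: str) -> str:
    """Lowercase, strip surrounding punctuation, and canonicalize a word."""
    w = word.lower().strip(string.punctuation)
    return CANONICAL.get(w, w)


def content_frame(verb: str, phrases: list[tuple[str, list[str]]]) -> dict:
    """Build a nested attribute-value dict from a verb and its prepositional phrases."""
    frame: dict = {"verb": normalize(verb)}
    for prep, words in phrases:
        nouns = [normalize(w) for w in words]
        nouns = [n for n in nouns if n and n not in DETERMINERS]
        if nouns:  # skip empty phrases rather than fail
            frame[normalize(prep)] = {"head": nouns[-1], "modifiers": nouns[:-1]}
    return frame


a = content_frame("Went", [("down", ["the", "grey", "hallway"]), ("to", ["the", "chair."])])
b = content_frame("go", [("down", ["the", "gray", "hall"]), ("to", ["the", "chair"])])
print(a == b)  # True
```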
SLIDE 8

Modeling Route Instructions

  • The instruction modeler translates the content frame’s representation of the surface meaning of an instruction element into an imperative model containing compound action specifications
  • It infers this model by applying knowledge of verbs and prepositions in route instructions, and knowledge of how perception and action depend on the local spatial configuration in similar environments

SLIDE 9

Representing Expected Views and Actions

  • A view description represents what the follower expects to observe at a pose/orientation in the environment, given descriptions in the instructions
  • Instructions tend to describe some distinctive attribute of certain scenes along the route
  • For each expected object, it models the object’s type, its location within the observer's relative view, and a description of its appearance and attributes
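The sketch below encodes a view description as a list of expected objects, each carrying a type, a location in the observer's view, and appearance attributes. Class and field names are hypothetical, and the matching rule (same kind and location, with the described attributes a subset of the observed ones) is one simple assumption about how such descriptions could be checked as constraints.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExpectedObject:
    """One object the follower expects to see (hypothetical field names)."""
    kind: str                                  # object type, e.g. "wall"
    location: str                              # place in the observer's view, e.g. "front"
    attributes: frozenset[str] = frozenset()   # appearance, e.g. {"pink"}


@dataclass
class ViewDescription:
    """What the follower expects to observe at one pose along the route."""
    objects: list[ExpectedObject] = field(default_factory=list)

    def satisfied_by(self, observed: set[tuple[str, str, frozenset[str]]]) -> bool:
        """Treat each expected object as a constraint: some observed object must
        have the same kind and location and at least the described attributes."""
        return all(
            any(kind == exp.kind and loc == exp.location and exp.attributes <= attrs
                for kind, loc, attrs in observed)
            for exp in self.objects
        )


view = ViewDescription([ExpectedObject("wall", "front", frozenset({"pink"}))])
seen = {("wall", "front", frozenset({"pink", "brick"})), ("chair", "left", frozenset())}
print(view.satisfied_by(seen))  # True
```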

SLIDE 10

Representing Expected Views and Actions

  • Route instructions require at least four low-level simple actions:
    ○ TURN: changes an agent’s pose/orientation but preserves its location
    ○ TRAVEL: changes an agent’s location but preserves its orientation
    ○ VERIFY: checks an observation against a description of an expected view
    ○ DECLARE-GOAL: terminates instruction following by asserting that the agent is at the desired destination
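The four simple actions can be illustrated on a grid, assuming a pose made of integer coordinates and one of four compass headings (both assumptions; MARCO's environments and controller differ). The sketch shows the key invariants: `turn` keeps the location and `travel` keeps the heading. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

HEADINGS = ["N", "E", "S", "W"]  # clockwise order on a grid (an assumption)
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


@dataclass(frozen=True)
class Pose:
    x: int
    y: int
    heading: str


def turn(pose: Pose, quarter_turns: int) -> Pose:
    """TURN: change orientation, keep location (positive = clockwise)."""
    i = (HEADINGS.index(pose.heading) + quarter_turns) % 4
    return replace(pose, heading=HEADINGS[i])


def travel(pose: Pose, steps: int = 1) -> Pose:
    """TRAVEL: change location along the current heading, keep orientation."""
    dx, dy = STEP[pose.heading]
    return replace(pose, x=pose.x + dx * steps, y=pose.y + dy * steps)


def verify(observed: set[str], expected: set[str]) -> bool:
    """VERIFY: check an observation against an expected view description."""
    return expected <= observed


class GoalDeclared(Exception):
    """Raised by DECLARE-GOAL to terminate instruction following."""


def declare_goal(pose: Pose) -> None:
    """DECLARE-GOAL: assert the agent is at the destination and stop."""
    raise GoalDeclared(pose)


p = travel(turn(Pose(0, 0, "N"), 1), 2)
print(p)  # Pose(x=2, y=0, heading='E')
```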

SLIDE 11

Representing Expected Views and Actions

  • Compound action specifications capture the commands in route instructions by modeling which simple actions to take under which perceptual or cognitive conditions
  • Each clause is interpreted as a compound action specification
  • Adverbs, verb objects, and prepositional phrases translate to pre-conditions, while-conditions, and post-conditions
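One way to picture a compound action specification is a simple action plus three condition slots. The sketch assumes a hypothetical preposition-to-slot table (`SLOT_FOR_PREP`) and covers only prepositional phrases, not adverbs or verb objects; it illustrates the idea and is not MARCO's mapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical preposition-to-slot table; a real mapping would be richer.
SLOT_FOR_PREP = {
    "from": "pre",
    "down": "while", "along": "while", "through": "while",
    "to": "post", "until": "post",
}


@dataclass
class CompoundAction:
    action: str                                           # simple action, e.g. "TRAVEL"
    pre: dict[str, str] = field(default_factory=dict)     # must hold before acting
    while_: dict[str, str] = field(default_factory=dict)  # must hold while acting
    post: dict[str, str] = field(default_factory=dict)    # holds once the action ends


def from_frame(action: str, phrases: dict[str, str]) -> CompoundAction:
    """Sort each prepositional phrase of a clause into a condition slot."""
    spec = CompoundAction(action)
    slots = {"pre": spec.pre, "while": spec.while_, "post": spec.post}
    for prep, obj in phrases.items():
        slot = SLOT_FOR_PREP.get(prep)
        if slot is not None:  # prepositions outside the table are ignored
            slots[slot][prep] = obj
    return spec


spec = from_frame("TRAVEL", {"down": "hall", "to": "chair"})
print(spec)
# CompoundAction(action='TRAVEL', pre={}, while_={'down': 'hall'}, post={'to': 'chair'})
```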

SLIDE 12

Interleaving Action, Perception, and Modeling

  • The executor sequences simple actions given the environment context and the current state of instruction following
  • It executes each compound action specification before moving to the next
  • The robot controller executes simple actions
  • The view description matcher checks symbolic view descriptions against sensory observations, treating each view description as constraints the observation stream must meet
  • Handling of ambiguity is deferred until the environment provides enough context to disambiguate
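The interleaving of perception and action can be sketched in a toy one-dimensional corridor: before every TRAVEL step the executor observes and checks the post-condition, and it finishes one specification before starting the next. The `Corridor` world, the `(action, until_object)` encoding of specifications, and the `max_steps` cutoff are hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Corridor:
    """Toy 1-D world: cells[i] is the set of objects visible at position i."""
    cells: list[set[str]]
    pos: int = 0

    def observe(self) -> set[str]:
        """Perception: what the agent can see at its current position."""
        return self.cells[self.pos]

    def step(self) -> bool:
        """Simple TRAVEL action: move one cell forward; False if blocked."""
        if self.pos + 1 >= len(self.cells):
            return False
        self.pos += 1
        return True


def run(world: Corridor, specs: list[tuple[str, str]], max_steps: int = 50) -> list[str]:
    """Execute (action, until_object) specs in order, observing before every step."""
    trace = []
    for action, until in specs:
        steps = 0
        while until not in world.observe():  # post-condition not yet perceived
            if steps >= max_steps or not world.step():
                trace.append(f"{action} until {until}: FAILED")
                return trace
            steps += 1
        trace.append(f"{action} until {until}: {steps} step(s)")
    return trace


hall = Corridor([set(), {"easel"}, set(), {"chair"}, {"lamp"}])
print(run(hall, [("TRAVEL", "easel"), ("TRAVEL", "chair")]))
# ['TRAVEL until easel: 1 step(s)', 'TRAVEL until chair: 2 step(s)']
```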

SLIDE 13

Robustness to Errors and Ambiguities

  • When MARCO does not know a word, it uses WordNet to search for its nearest known synonym or a more abstract hypernym
  • If the content framer encounters a constituent it cannot model, it ignores that constituent and models the rest of the clause
  • A similar strategy applies to the parser: if it cannot parse one sentence in a set of instructions, it still parses the others
  • Why these techniques work:
    1. Route instructions contain redundant information
    2. Essential information in route instructions is stated using a small variety of content frames for directing movement
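The unknown-word fallback can be sketched as a breadth-first search over synonym and hypernym links that stops at the nearest word the modeler knows. The tiny `SYNONYMS` and `HYPERNYMS` tables are hypothetical stand-ins for WordNet, and dropping unresolvable words in `resolve_clause` mirrors, in simplified form, the strategy of ignoring constituents that cannot be modeled.

```python
from __future__ import annotations

from collections import deque

# Hypothetical mini-lexicon standing in for WordNet synonym and hypernym links.
SYNONYMS = {"corridor": ["hallway"], "hallway": ["hall", "corridor"]}
HYPERNYMS = {"armchair": ["chair"], "stool": ["seat"], "chair": ["seat"], "seat": ["furniture"]}
KNOWN = {"hall", "chair", "furniture", "lamp"}  # words the modeler can act on


def nearest_known(word: str) -> str | None:
    """Breadth-first search from `word` to the closest known term,
    trying synonyms before hypernyms at each step."""
    queue, seen = deque([word]), {word}
    while queue:
        w = queue.popleft()
        if w in KNOWN:
            return w
        for nxt in SYNONYMS.get(w, []) + HYPERNYMS.get(w, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


def resolve_clause(words: list[str]) -> list[str]:
    """Map each word to a known term, dropping words that cannot be resolved."""
    resolved = (nearest_known(w) for w in words)
    return [r for r in resolved if r is not None]


print(resolve_clause(["corridor", "xyzzy", "armchair"]))  # ['hall', 'chair']
```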

SLIDE 14

Inferring Actions Implicit in Instructions

  • Implicit actions are inferred using linguistic and spatial knowledge
  • Example: “Go down the hall to the chair”
  • The language model interprets the phrase structure as the along and until parameters of a TRAVEL action
  • MARCO infers the conditions of the TRAVEL action as:
    ○ Pre: the path should be immediately in front, and the chair should be ahead in the distance
    ○ Post: the chair will be local to the agent
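A minimal sketch of the inference for this example, assuming a hypothetical `Travel` record and predicate strings such as `front(...)` and `local(...)` invented for illustration: the along parameter yields a precondition that the path is ahead, and the until parameter yields both a distant-landmark precondition and a post-condition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Travel:
    along: str | None = None   # path to follow, e.g. "hall"
    until: str | None = None   # landmark that ends the motion, e.g. "chair"
    pre: list[str] = field(default_factory=list)
    post: list[str] = field(default_factory=list)


def infer_conditions(t: Travel) -> Travel:
    """Add implicit conditions as hypothetical predicate strings."""
    if t.along:
        t.pre.append(f"front({t.along})")              # path immediately ahead
    if t.until:
        t.pre.append(f"ahead_in_distance({t.until})")  # landmark visible far ahead
        t.post.append(f"local({t.until})")             # landmark next to the agent
    return t


# "Go down the hall to the chair" -> TRAVEL(along="hall", until="chair")
t = infer_conditions(Travel(along="hall", until="chair"))
print(t.pre)   # ['front(hall)', 'ahead_in_distance(chair)']
print(t.post)  # ['local(chair)']
```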

SLIDE 15

Inferring Actions Implicit in Instructions

  • If the pre- and post-conditions are not met, the executor may take exploratory actions to gain information or to locate a reference object
  • The figure shows how an instruction is applied to navigate with different maps and starting poses
  • In some scenarios, the agent must perform an exploratory action to orient itself in the environment as the instruction requires
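Exploratory action can be illustrated as turning in place until the TRAVEL precondition holds. This sketch assumes four compass headings, clockwise quarter turns, and a caller-supplied perception function; the name `orient` and the turn limit are hypothetical, and a real follower could choose its exploration order differently.

```python
from __future__ import annotations

from collections.abc import Callable

HEADINGS = ["N", "E", "S", "W"]  # clockwise order (an assumption)


def orient(heading: str, precondition_holds: Callable[[str], bool],
           max_turns: int = 3) -> tuple[str, int] | None:
    """Exploratory TURNs: rotate 90 degrees clockwise until the precondition
    holds. Returns (heading, turns_taken), or None if no heading works."""
    i = HEADINGS.index(heading)
    for turns in range(max_turns + 1):
        h = HEADINGS[(i + turns) % 4]
        if precondition_holds(h):  # perception stand-in
            return h, turns
    return None


# Toy map (an assumption): only the corridor to the west leads toward the chair.
view = {"N": False, "E": False, "S": False, "W": True}
print(orient("N", view.get))  # ('W', 3)
print(orient("W", view.get))  # ('W', 0)
```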

SLIDE 16

Evaluation

  • MARCO was evaluated in 3 environments with a corpus of route instructions written by 6 human directors
  • The corpus consists of 786 route instruction texts (682 with omissions)
  • 36 human subjects followed the route instructions to establish a baseline
  • A desktop virtual reality environment was used
  • Text route instructions were rated from 1 to 6 (1: vague, hard to follow; 6: detailed, easy to follow)

SLIDE 17

Evaluation

Benefits of using a VR environment:
  1. All route directors had similar exposure to the environment
  2. Pertinent aspects of the environment were known and repeatable
  3. Directors learned the environment from a first-person perspective, as followers do
  4. MARCO can navigate the same environment as people

SLIDE 18

Evaluation

  • For testing, the agent was given text route instructions and a starting location
  • Success was determined by whether the agent reached the desired destination
  • Speed of arrival was not taken into account
  • No explanation of what happens when an agent gets lost
  • No account of what happens when an agent makes an incorrect inference (e.g., incorrectly identifying the color blue)

SLIDE 19

Implicit Action Inference Experiment

  • Results for 5 types of follower: (1) human participants, (2) the full MARCO model, (3) MARCO without TURN inference, (4) MARCO without TRAVEL inference, (5) MARCO without TURN or TRAVEL inference
  • Humans were able to find the destination with an overall mean success rate of 69%
  • MARCO successfully followed 61% of route instructions
  • TURN inference is more valuable than TRAVEL inference

SLIDE 20

Conclusion

  • The authors claim this experiment is easier and less expensive to replicate than similar works
  • This paper uses knowledge of language and space to infer implicit actions when following natural language route instructions
  • It contributes an assessment of human performance in communicating route information through unfamiliar large-scale spaces
  • Future work includes replacing the executor algorithm with a full action sequencer or an algorithm that reasons over the inferred route topology, generalizing the methods to other domains (e.g., cooking, first aid), and applying similar evaluation techniques to other large-scale instructional tasks

SLIDE 21

Discussion!