Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

Alane Suhr and Yoav Artzi

Executing Context-Dependent Instructions
Task: map a sequence of instructions to actions
Existing work vs. today:
• Representations: symbolic (existing work) vs. system actions (today)
• Modeling context
• Learning from exploration

Executing a Sequence of Instructions
[Figure: seven beakers, numbered 1 to 7]
Empty out the leftmost beaker of purple chemical
Then, add the contents of the first beaker to the second
Mix it
Then, drain 1 unit from it
Same for 1 more unit

Problem Setup
• Task: follow a sequence of instructions
• Learning from instructions and the corresponding world states
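The setup above can be made concrete with a small data structure. This is a minimal sketch, assuming a hypothetical encoding in which each beaker is a string of color codes listed bottom to top; `Interaction`, `goal_states`, and `step_contexts` are illustrative names, not from the talk. It shows how each instruction comes with the four inputs listed on the Model slide: the current instruction, the previous instructions, the initial state, and the current state.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical state encoding: one string per beaker, listing color codes
# from bottom to top (e.g. "pp" is two units of purple, "" is empty).
State = Tuple[str, ...]

@dataclass
class Interaction:
    initial_state: State
    instructions: List[str]
    goal_states: List[State]  # annotated world state after each instruction

def step_contexts(interaction: Interaction) -> Iterator[Tuple[str, List[str], State, State]]:
    """Yield (current instruction, previous instructions, initial state, current state)."""
    current = interaction.initial_state
    for i, instruction in enumerate(interaction.instructions):
        yield instruction, interaction.instructions[:i], interaction.initial_state, current
        # Illustrative choice: advance with the annotated state rather than
        # the state produced by the agent's own actions.
        current = interaction.goal_states[i]

example = Interaction(
    initial_state=("g", "pp", "o"),
    instructions=[
        "Empty out the leftmost beaker of purple chemical",
        "Then, add the contents of the first beaker to the second",
    ],
    goal_states=[("g", "", "o"), ("", "g", "o")],
)
for instruction, previous, initial, state in step_contexts(example):
    print(instruction, "| previous:", len(previous), "| state:", state)
```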

Related Work
• Context-dependent language understanding
  • Static environments (e.g., large database): Miller et al. 1996, Zettlemoyer and Collins 2009, Suhr et al. 2018
  • Environments that change over time while instructions are given: Long et al. 2016, Guu et al. 2017, Fried et al. 2018
• Following instructions in isolation; varying levels of supervision: Chen and Mooney 2011, Chen 2012, Artzi and Zettlemoyer 2013, Artzi et al. 2014, Andreas and Klein 2015, Bisk et al. 2016, Misra et al. 2017

Today
1. An attention-based model for generating sequences of system actions that modify the environment
2. An exploration-based learning procedure that avoids biases learned early in training

System Actions
Example instruction: "Mix it" (beakers numbered 1 to 7)
System actions: pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;
• Each beaker is a stack
• Actions are pop and push
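A minimal Python sketch of the beaker-as-stack view, assuming actions are strings of the form `pop k;` and `push k color;` as on the slide. The `execute` function and the list-of-lists state are hypothetical, the beaker contents are made up, and beaker capacity is not modeled.

```python
from typing import List

def execute(state: List[List[str]], action: str) -> List[List[str]]:
    """Apply one system action and return the new state (input is not modified).

    Beakers are 1-indexed stacks of color units, listed bottom to top.
    "pop k;" removes the top unit of beaker k; "push k color;" adds one unit.
    """
    new_state = [list(beaker) for beaker in state]
    tokens = action.rstrip(";").split()
    index = int(tokens[1]) - 1
    if tokens[0] == "pop":
        if not new_state[index]:
            raise ValueError(f"cannot pop from empty beaker {index + 1}")
        new_state[index].pop()
    elif tokens[0] == "push":
        new_state[index].append(tokens[2])
    else:
        raise ValueError(f"unknown action: {action}")
    return new_state

# "Mix it" on beaker 2, with hypothetical contents of three units.
state = [["yellow"], ["red", "red", "blue"]]
for action in ["pop 2;"] * 3 + ["push 2 brown;"] * 3:
    state = execute(state, action)
print(state)  # [['yellow'], ['brown', 'brown', 'brown']]
```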

Meaning Representation
Example instruction: "Mix it"
• Program (high-level representation): mix(prevArg2(2))
• System actions: pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;
• Program vs. system actions: engineering abstractions vs. learning
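To illustrate the trade-off, here is a hypothetical macro that expands a high-level mix into the system actions from the slide. It assumes the target beaker has already been resolved (the reference resolution done by `prevArg2` is not modeled) and that mixing always yields brown units, as in the example; neither assumption comes from the talk, and `compile_mix` is an illustrative name.

```python
from typing import List

def compile_mix(state: List[List[str]], beaker: int) -> List[str]:
    """Expand a hypothetical mix(beaker) into primitive system actions.

    Assumption: mixing empties the 1-indexed beaker unit by unit and refills it
    with the same number of brown units, as in the "Mix it" example.
    """
    units = len(state[beaker - 1])
    return [f"pop {beaker};"] * units + [f"push {beaker} brown;"] * units

actions = compile_mix([["yellow"], ["red", "red", "blue"]], beaker=2)
print(" ".join(actions))
# pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;
```

Writing and maintaining expansions like this is the engineering cost of the program representation; a model that outputs system actions directly has to learn this mapping instead.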

Model
Example: previous instructions "Throw out first beaker" and "Pour sixth beaker into last one"; current instruction "It turns brown"
• Four inputs: current instruction, previous instructions, initial state, current state
• Output: a sequence of actions
• Attend over each input when generating actions

Model: encode the instructions.

Model: encode the states.

Model: initialize the decoder state.

Model: attend over the current instruction.

Model: attend over the previous instructions.

Model: attend over the initial state.

Model: attend over the current state.
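The four attention steps can be sketched as follows. This is a minimal pure-Python illustration using dot-product scoring, with keys doubling as values; the talk does not specify the scoring function, so that choice, the input names, and the toy vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention: softmax-weighted average of the key vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) / total for d in range(dim)]

def attend_all(decoder_state: Sequence[float],
               inputs: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One context vector per input, each computed from the same decoder state."""
    return {name: attend(decoder_state, encodings) for name, encodings in inputs.items()}

# Toy encodings: one vector per token (instructions) or per beaker (states).
inputs = {
    "current_instruction": [[1.0, 0.0], [0.0, 1.0]],
    "previous_instructions": [[0.5, 0.5]],
    "initial_state": [[1.0, 1.0], [0.0, 0.0]],
    "current_state": [[0.0, 2.0]],
}
contexts = attend_all([1.0, 0.0], inputs)
for name, vector in contexts.items():
    print(name, [round(x, 3) for x in vector])
```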

Model: predict an action. An MLP over the decoder state and the attention outputs for the current instruction, previous instructions, initial state, and current state predicts the first action, pop 7.
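A sketch of the prediction step, assuming a one-hidden-layer ReLU MLP over the concatenation of the decoder state and the attention contexts, followed by a softmax over a small action vocabulary. The weights are toy values chosen by hand, `predict_action` is an illustrative name, and the `STOP` action is a hypothetical end-of-sequence token.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def linear(weights: Matrix, bias: Sequence[float], xs: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]

def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_action(features, w1, b1, w2, b2, actions) -> Tuple[str, List[float]]:
    """Score every action with a one-hidden-layer MLP and pick the most likely."""
    hidden = [max(0.0, h) for h in linear(w1, b1, features)]  # ReLU
    probs = softmax(linear(w2, b2, hidden))
    best = max(range(len(actions)), key=lambda i: probs[i])
    return actions[best], probs

# Toy features: in the model these would be the decoder state concatenated
# with the attention contexts; here they are two hand-picked numbers.
features = [1.0, 0.5]
actions = ["pop 7", "push 7 brown", "STOP"]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]
action, probs = predict_action(features, w1, b1, w2, b2, actions)
print(action)  # pop 7
```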

Model: execute the action (pop 7) and update the current state.

Model: attend over the new current state.

Model: the action decoder repeats this loop, generating the remaining pop 7 and push 7 brown actions one at a time and updating the current state after each.
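The generate-execute-update loop can be sketched independently of the network. In this sketch `policy` stands in for the attention model and is a hand-written rule; the tuple-of-tuples state, the `STOP` action, the `max_steps` cap, and the contents of beaker 7 are all hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[Tuple[str, ...], ...]

def decode(policy: Callable[[State, List[str]], str],
           execute: Callable[[State, str], State],
           initial: State, max_steps: int = 20) -> Tuple[List[str], State]:
    """Greedy decoding: predict an action, execute it, update the state, repeat."""
    state, actions = initial, []
    for _ in range(max_steps):
        action = policy(state, actions)
        if action == "STOP":
            break
        state = execute(state, action)  # the next prediction sees the new state
        actions.append(action)
    return actions, state

def toy_execute(state: State, action: str) -> State:
    tokens = action.split()
    index = int(tokens[1]) - 1
    beakers = [list(beaker) for beaker in state]
    if tokens[0] == "pop":
        beakers[index].pop()
    else:
        beakers[index].append(tokens[2])
    return tuple(tuple(beaker) for beaker in beakers)

def toy_policy(state: State, actions: List[str]) -> str:
    """Hand-written stand-in for the model: empty beaker 7, then refill it brown."""
    pops = sum(a.startswith("pop") for a in actions)
    pushes = len(actions) - pops
    if pushes == 0 and state[6]:
        return "pop 7"
    if pushes < pops:
        return "push 7 brown"
    return "STOP"

initial: State = ((),) * 6 + (("o", "r", "r"),)
actions, final = decode(toy_policy, toy_execute, initial)
print(actions)
# ['pop 7', 'pop 7', 'pop 7', 'push 7 brown', 'push 7 brown', 'push 7 brown']
```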

Learning from World State Annotation
• Goal: learn a policy that maps from instructions and environment states to actions
• Approach: learn by exploring the environment and observing rewards; policy gradient with contextual bandit
• Challenge: overcome biases acquired early during learning
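A minimal sketch of a contextual-bandit policy-gradient update for a single decision step. It assumes, as one reading of "single-step reward observation", that the immediate reward of every candidate action can be observed (for instance by simulating each action), so the update uses the exact gradient of the expected immediate reward rather than a sampled estimate. The learning rate, rewards, and logits are toy values, `bandit_update` is an illustrative name, and this is not presented as the authors' exact algorithm.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bandit_update(logits: List[float], rewards: List[float], lr: float = 0.5) -> List[float]:
    """One gradient-ascent step on J = sum_a p(a) * r(a) for softmax logits.

    dJ/dz_b = p(b) * (r(b) - J); only the immediate reward is used (no return
    over future steps), which is what makes this a contextual bandit.
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [z + lr * p * (r - expected) for z, p, r in zip(logits, probs, rewards)]

# Toy step with three candidate actions: the policy starts biased toward
# action 0, but only action 1 is rewarded.
logits = [2.0, 0.0, 0.0]
rewards = [-1.0, 1.0, -1.0]
for _ in range(200):
    logits = bandit_update(logits, rewards)
print([round(p, 3) for p in softmax(logits)])
```

Starting from logits that favor a wrong action, the repeated updates still shift probability mass to the rewarded action, because every action's reward enters the update, not only that of the action the current policy prefers.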
