A Survey of Reinforcement Learning Informed by Natural Language


SLIDE 1

A Survey of Reinforcement Learning Informed by Natural Language

Maria Fabiano

Luketina et al., IJCAI 2019

SLIDE 2

Outline

1. Motivation 2. Background 3. Current Use of Natural Language in RL 4. Trends for Natural Language in RL 5. Future Work 6. Critique

SLIDE 3

Motivation

Current Problems in RL

  • Most real-world tasks require some kind of language processing.
  • RL generalizes poorly even to tasks very similar to those it trains on, which limits its real-world practicality.
  • Previous research has been limited by small corpora or synthetic language.

Solutions with Natural Language

  • Advances in language representation learning allow models to integrate world knowledge from text corpora into decision-making problems.
  • Potential to improve generalization, overcome issues related to data constraints, and take advantage of human priors.

SLIDE 4

Background: RL

Agents learn what actions to take in various states to maximize a cumulative reward.

Goal: Find a policy π(a|s) that maximizes the expected discounted cumulative return.
Applications: continuous control, dialogue, board games, video games
Limitations: real-world use is limited by data requirements and poor generalization
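The goal above can be written out as the standard RL objective (a sketch, writing π for the policy and γ for the discount factor):

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad 0 \le \gamma < 1,
```

where the expectation is over trajectories τ = (s₀, a₀, s₁, …) generated by following π, and the optimal policy is the one maximizing J(π).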

SLIDE 5

Background: Knowledge Transfer

Recent NLP work has shown models transferring syntactic and semantic knowledge to downstream tasks. The same idea can transfer world and task-specific knowledge to sequential decision-making processes:

  • Understanding explicit goals (“go to the door”)
  • Policy constraints (“avoid the scorpion”)
  • Generic information about the reward or policy (“scorpions are fast”)

  • Object affordances (what can be done with an object)

Agents could learn to use NLP and information retrieval to seek information in order to make progress on a task.
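As a concrete (and heavily simplified) illustration of transferring language information into a policy, the sketch below conditions action probabilities on an instruction encoded with word embeddings. The vocabulary, dimensions, and random stand-in for "pretrained" vectors are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for pretrained word embeddings (e.g., GloVe-style
# vectors); random here purely for illustration.
VOCAB = {"go": 0, "to": 1, "the": 2, "door": 3, "avoid": 4, "scorpion": 5}
EMBED = rng.normal(size=(len(VOCAB), 8))  # 8-dim embedding per word

def encode_instruction(text):
    """Average the word embeddings of an instruction like "go to the door"."""
    ids = [VOCAB[w] for w in text.lower().split()]
    return EMBED[ids].mean(axis=0)

def policy(state, instruction_vec, W):
    """Language-conditional policy: action logits from [state; instruction]."""
    x = np.concatenate([state, instruction_vec])
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over actions

n_actions, state_dim = 4, 5
W = rng.normal(size=(n_actions, state_dim + 8))
probs = policy(rng.normal(size=state_dim), encode_instruction("go to the door"), W)
print(probs)  # a probability distribution over the 4 actions
```

In a real system the embedding table would come from a pretrained language model and W would be learned by RL; the point is only the shape of the conditioning.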

SLIDE 6

Current Use of Natural Language in RL

Natural Language in RL

  • Language-conditional: instruction following; rewards from instructions; language in the observation & action space
  • Language-assisted: communicating domain knowledge; structuring policies

SLIDE 7

Current Use of Natural Language in RL

  • Language-conditional: language is part of the task formulation
  • Language-assisted: language facilitates learning

In both cases, language information can be task-independent (e.g., conveying general priors) or task-dependent (e.g., instructions). These categories are not mutually exclusive.

SLIDE 8

Current Use of Natural Language in RL

SLIDE 9

Language-Conditional RL

Language is part of the task

  • Interpret and execute instructions given in language
  • Language is part of the state and action space
  • Often, the full language isn’t needed to solve the problem, but it assists by structuring the policy or providing auxiliary rewards

Instruction Following: high-level instruction sequences (actions, goals, or policies)
Rewards from Instructions: learn a reward function
Observation & Action Space: environments use language for driving the interaction with the agent

SLIDE 10

Language-Conditional: Instruction Following

Instructions can be specific actions, goal states, or desired policies. Effective agents can:

1. Execute the instruction
2. Generalize to unseen instructions

Ties to hierarchical RL (Oh et al., 2017):

  ○ A parameterized skill performs different subtasks
  ○ An objective function makes analogies between similar subtasks to try to learn the entire subtask space
  ○ A meta controller reads the instructions, decides which subtask to perform, and passes subtask parameters to the parameterized skill
  ○ The parameterized skill executes the given subtask
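The meta-controller / parameterized-skill split can be sketched as a toy control flow. All names and the subtask table below are invented for illustration; in Oh et al. both levels are learned neural networks, not hand-written rules:

```python
# Toy two-level hierarchy: a meta controller maps each instruction to a
# (subtask, parameter) pair; a parameterized skill turns that pair into
# primitive actions. Purely illustrative, not the paper's actual model.

def meta_controller(instruction):
    """Read one instruction and choose a parameterized subtask."""
    verb, _, obj = instruction.partition(" ")
    return verb, obj  # e.g., "pick apple" -> ("pick", "apple")

def parameterized_skill(subtask, param):
    """Execute the chosen subtask as a short sequence of primitive actions."""
    if subtask == "goto":
        return [f"move_toward({param})"]
    if subtask == "pick":
        return [f"move_toward({param})", f"grasp({param})"]
    raise ValueError(f"unknown subtask: {subtask}")

for instruction in ["goto door", "pick apple"]:
    subtask, param = meta_controller(instruction)
    print(instruction, "->", parameterized_skill(subtask, param))
```

The analogy-making objective of the paper has no counterpart here; this only shows how instructions decompose into reusable parameterized subtasks.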

SLIDE 11
Language-Conditional: Rewards from Instructions

Use the instructions to induce a reward function

  • To apply instruction following in a broader context, we need a way to automatically evaluate whether an instruction was completed.
  • Common architecture: a reward-learning module learns to ground an instruction to a goal, then generates a reward for a policy-learning module.
  • Use standard IRL or an adversarial process.

  ○ The reward learner is a discriminator that discerns between goal states and visited states. The agent is rewarded for visiting states the discriminator cannot distinguish from goal states.

  • When environment rewards are sparse, instructions can help generate auxiliary rewards to help learn efficiently.
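The adversarial scheme can be sketched in a few lines of numpy: a logistic discriminator is trained to separate goal states from visited states, and the agent's reward rises as its states fool the discriminator. The features, learning rate, and training loop below are illustrative choices, not the method of any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3) * 0.01  # discriminator weights, near-zero init

def D(s, w):
    """Probability the discriminator assigns to s being a goal state."""
    return 1.0 / (1.0 + np.exp(-(w @ s)))

def discriminator_step(goal_states, visited_states, w, lr=0.1):
    """One pass of logistic-regression updates: goals labeled 1, visited 0."""
    for s, y in [(g, 1.0) for g in goal_states] + [(v, 0.0) for v in visited_states]:
        w = w + lr * (y - D(s, w)) * s
    return w

def agent_reward(s, w):
    """Reward is high when the discriminator mistakes s for a goal state."""
    return np.log(D(s, w) + 1e-8)

goals = [np.array([1.0, 1.0, 0.0])]      # toy goal-state features
visited = [np.array([-1.0, 0.0, 1.0])]   # toy visited-state features
for _ in range(50):
    w = discriminator_step(goals, visited, w)

print(agent_reward(goals[0], w) > agent_reward(visited[0], w))  # True
```

In the full loop the policy would then be updated against `agent_reward`, pushing visited states toward goal states while the discriminator keeps adapting.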

SLIDE 12
Language-Conditional: Observation & Action Space

Environments use language to drive interaction with the agent

  • Much more challenging – observation and action spaces grow combinatorially with vocabulary size and grammar complexity

  ○ Cardinal directions (“go north”) vs. relative directions (“go to the blue ball southwest of the green box”)

  • Dialogue systems, QA, VQA, EQA

  ○ Their multiple-choice nature makes these problems similar to instruction following

  • To help create consistent benchmarks in this space, TextWorld generates text games that behave as RL environments
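TextWorld itself generates such games; the hand-rolled stand-in below only illustrates the shape of the interaction loop (text observation in, command string out, scalar reward back). The rooms, commands, and reward scheme are invented, and this is not TextWorld's actual API:

```python
# A minimal text-game environment with an RL-style reset/step interface.
# Observations and actions are both natural-language strings.

class ToyTextEnv:
    def __init__(self):
        self.location = "kitchen"

    def reset(self):
        self.location = "kitchen"
        return "You are in the kitchen. A corridor leads east."

    def step(self, command):
        """Return (observation, reward, done) for a command string."""
        if command == "go east" and self.location == "kitchen":
            self.location = "pantry"
            return "You enter the pantry. A key lies here.", 0.0, False
        if command == "take key" and self.location == "pantry":
            return "You take the key. You win!", 1.0, True
        return "Nothing happens.", 0.0, False

env = ToyTextEnv()
obs = env.reset()
for cmd in ["go east", "take key"]:
    obs, reward, done = env.step(cmd)
print(reward, done)  # 1.0 True
```

An RL agent in this setting must both read the observation text and emit a valid command, which is exactly where vocabulary size and grammar make the spaces combinatorial.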

SLIDE 13

TextWorld Example 1

SLIDE 14

TextWorld Example 2

SLIDE 15

Language-Assisted RL

Language assists the task via transfer learning

Language is not essential to the task, but assists via transfer of knowledge

  • Specifies features, annotates states or entities, describes subtasks
  • Most cases are task-specific
  • Pre-trained embeddings and parsers provide task-independent information
SLIDE 16

Language-Assisted: Communicating Domain Knowledge

  • For more general settings outside instruction following, potentially task-relevant information could be available

  ○ Advice about the policy, information about the environment

  • Unstructured, descriptive language is more available than instructive language

  ○ Must retrieve useful information for a given context
  ○ Must ground that information with respect to observations

  • Narasimhan et al., 2018

  ○ Ground the meaning of text to the dynamics of the environment
  ○ Allows an agent to bootstrap policy learning in a new environment

SLIDE 17

Language-Assisted: Structuring Policies

  • Construct priors on the model by communicating information about the state or dynamics of an environment

  ○ Shape representations into more general abstractions
  ○ Make a representation space more interpretable to humans
  ○ Efficiently structure computations within a model

  • Example: Learning to Compose Neural Networks for Question Answering
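The module-composition idea behind that example can be sketched with plain functions: a layout derived from the language picks which modules run and in what order. The symbolic world, module names, and layouts below are invented for illustration; the real system composes learned neural modules:

```python
# Toy policy structuring via module composition: language determines the
# computation graph, and each step's output feeds the next module.

WORLD = {"red ball": (0, 1), "blue box": (2, 3)}

MODULES = {
    "find": lambda world, arg: world[arg],            # locate a named entity
    "describe": lambda pos: f"at {pos[0]},{pos[1]}",  # report its position
}

def execute(layout, world):
    """Run a linear layout like [("find", "red ball"), ("describe",)]."""
    value = world
    for step in layout:
        name, *args = step
        value = MODULES[name](value, *args)
    return value

print(execute([("find", "red ball"), ("describe",)], WORLD))  # at 0,1
```

The payoff described on the slide is that the structure itself (find-then-describe) transfers across questions, even when the arguments change.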

SLIDE 18

Trends for Natural Language in RL

1. Language-conditional RL is more studied than language-assisted RL
2. Learning from task-dependent text is more common than task-independent
3. Little research has been done on using unstructured, task-dependent text for knowledge transfer
4. Little research on using language structure to build compositional representations and internal plans
5. Synthetically generated languages (instead of natural language) are the standard for instruction following

SLIDE 19

Learning from Text Corpora in the Wild

Task-independent

  • RL systems can’t generalize to language outside of the training distribution without transfer from a language model

  ○ “Fetch a stick” vs. “Return with a stick” vs. “Grab a stick and come back”

  • Would enable agents to better utilize task-dependent corpora

Task-dependent

  • Transfer task-specific corpora and fine-tune a pre-trained information retrieval system. The RL agent queries the retrieval system and uses relevant information.

  ○ Example: game manuals
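The query-a-manual idea can be sketched with plain word-overlap scoring standing in for a fine-tuned retrieval system; the toy "manual" passages are invented for illustration:

```python
# A toy retrieval step: the agent sends a query and gets back the manual
# passage sharing the most words with it. A real system would use learned
# retrieval, but the agent-side interface is the same.

MANUAL = [
    "Scorpions are fast and should be avoided.",
    "Keys open doors of the matching color.",
    "Sticks can be picked up and carried back.",
]

def retrieve(query, corpus):
    """Return the passage with the largest word overlap with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda p: len(q & set(p.lower().rstrip(".").split())))

print(retrieve("what do keys do", MANUAL))
# -> "Keys open doors of the matching color."
```

The retrieved passage would then be fed to the policy as extra context, e.g. concatenated with the observation, closing the loop between retrieval and control.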

SLIDE 20

Diverse Environments with Real Semantics

Progress is currently measured only by instruction-following benchmarks in closed task domains (navigation, object manipulation, etc.) and closed worlds:

  • Small vocabulary sizes
  • Multiple pieces of evidence to ground each word

To generalize, RL needs more diverse environments with complex composition:

  • 3D house simulation
  • Minecraft

A central promise of language in RL is helping agents adapt to new goals, reward functions, and environment dynamics.

SLIDE 21

Future Work

  • Use pre-trained language models to transfer world knowledge
  • Learn from natural text rather than instructions or synthetic language
  • Use more diverse environments with complex composition and real-world semantics
  • Develop standardized environments and evaluations to properly measure progress of natural language and RL integration
  • Build agents that can query knowledge more explicitly and reason with it

  ○ Pre-trained information retrieval systems

SLIDE 22

Critique

“Good”

  • Provides compelling motivation for why RL + NLP is worth studying
  • Challenges the field to take the next step to elevate RL
  • Provides many positive examples that show the feasibility of this work

“Not so Good”

  • More background for RL would have helped

  ○ Q-learning, imitation learning

  • The similarity of multiple-choice QA problems to instruction following deserves more discussion
  • Factors they missed that make it worthwhile to work in this space

  ○ Success in multimodal NLP work
  ○ Success in other modalities with RL

  • Language can inform RL; is the converse true?