Structured Losses Zero-Shot Task Generalization with Multi-Task - - PowerPoint PPT Presentation



SLIDE 1

Structured Losses

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli

Oh et al. 2017 https://arxiv.org/abs/1706.05064

Presented by Belén Saldías belen@mit.edu Friday, November 6, 2020

SLIDE 2

Outline

1. Paper: Oh et al. 2017, 11:35-12:05 (~30 mins)
2. Breakout rooms discussion, 12:05-12:20 (~15 mins)
3. Class discussion, 12:20-12:30 (~10 mins)

2

SLIDE 3

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Oh et al. 2017

  • Problem set up
  • Approach and technical contributions
  • Related work
  • Learning a Parameterized Skill
  • Learning to Execute Sequential Instructions
  • Conclusions & Takeaways
  • Discussion

3

Feel free to raise your blue Zoom hand if you want to add something as the presentation goes along!

SLIDE 4

Motivation: Zero-shot task generalization

4 Oh et al. 2017

Problem: It is infeasible to train a household robot to do every possible combination of instructions.

  • 1. Go to the kitchen
  • 2. Wash dishes
  • 3. Empty the trash can
  • 4. Go to the bedroom

(Figure: task space, with a small seen region inside a larger unseen region.)

Goal: Train the agent on a small set of tasks such that it can generalize over a larger set of tasks without additional training.

SLIDE 5

Motivation: Multi-task Deep Reinforcement Learning (RL)

5 Oh et al. 2017

The agent is required to:

  • Perform many different tasks depending on the given task description.
  • Generalize over unseen task descriptions.

Observation Agent Task Description Action

SLIDE 6

Problem set up

6 Oh et al. 2017

Task: Instruction execution: an agent's task is to execute a given list of instructions described by a simple form of natural language while dealing with unexpected events.

Assumption: Each instruction can be executed by performing one or more high-level subtasks in sequence.

SLIDE 7

Problem set up

7 Oh et al. 2017

Task: Instruction execution: an agent's task is to execute a given list of instructions described by a simple form of natural language while dealing with unexpected events.

Assumption: Each instruction can be executed by performing one or more high-level subtasks in sequence.

Challenges:
  • Generalization: unseen subtasks (skill learning stage); longer sequences of instructions
  • Delayed reward (subtask updater)
  • Interruptions (bonus or emergencies)
  • Memory (loop tasks)

SLIDE 8

Discussion prompts (keep in mind for later)

8

1. What are the limitations of this framework? Why?
2. How does structuring losses inform learned representations?
3. How could common-sense reasoning and information be injected into the model so that we don't rely as much on training analogies?
4. How do you think this architecture would generalize to other specific tasks/scenarios? Why?
5. What are some tasks that the current framework wouldn't be able to generalize to? Why?

SLIDE 9

Approach and technical contributions

9 Oh et al. 2017

The learning problem is divided into two stages

1) Learning parameterized skills to perform subtasks and generalize to unseen subtasks (subtask := several disentangled parameters).
2) Learning to execute instructions using the learned skills.

SLIDE 10

Approach and technical contributions

10 Oh et al. 2017

The learning problem is divided into two stages

1) Learning parameterized skills to perform subtasks and generalize to unseen subtasks (subtask := several disentangled parameters).

How to generalize? A new objective function encourages making analogies between similar subtasks so that the manifold of the subtask space can be learned without experiencing all subtasks. The authors show that the analogy-making objective can generalize successfully.
SLIDE 11

Approach and technical contributions

11 Oh et al. 2017

The learning problem is divided into two stages

2) Learning to execute instructions using the learned skills.

How to generalize? The meta controller's ability to learn when to update a subtask plays a key role in solving the overall problem.

SLIDE 12

Related work

12 Oh et al. 2017

  • Much of previous work has assumed an optimal sequence of subtasks that is fixed during evaluation, using a meta controller and a set of low-level controllers for subtasks.
  • This makes it hard to evaluate the agent's ability to solve previously unseen sequential tasks in a zero-shot fashion unless the agent is trained on the new tasks.
  • Unlike previous work, here instructions are a description of the tasks, and the agent needs to learn to use these descriptions to maximize reward.
  • Most of the recent work on hierarchical RL and deep learning builds an open-loop policy at the high-level controller that waits until the previous subtask is finished to trigger the next subtask.
  • This open-loop approach is not able to handle interruptions, while this work proposes an architecture that can switch its subtask at any time.

(Diagrams: Hierarchical RL, Hierarchical Deep RL)

SLIDE 13

Related work

13 Oh et al. 2017

  • Some previous work aimed at generalization by mapping task descriptions to policies or using sub-networks that are shared across tasks.
  • Andreas et al. (2016) propose a framework to generalize over new sequences of pre-learned tasks.
  • This work proposes a flexible metric-learning method (i.e., analogy-making) that can be applied to various generalization scenarios.
  • This work aims to generalize both to unseen tasks and to unseen sequences of them.
  • Some work has focused on using natural language understanding to map instructions to actions.
  • This work focuses on generalization to sequences of instructions without any supervision for language understanding or for actions.
  • Branavan et al. (2009) tackle a similar problem but with only a single instruction at a time, while the authors' agent works on aligning a list of instructions and its internal state.

(Column labels: Zero-Shot Task Generalization, Instruction execution)

SLIDE 14

Approach

14 Oh et al. 2017

The learning problem is divided into two stages

1) Learning parameterized skills to perform subtasks and generalize to unseen subtasks (subtask := several disentangled parameters).
2) Learning to execute instructions using the learned skills.

SLIDE 15

1) Learning a Parameterized Skill

15 Oh et al. 2017

Object-independent scenario

(Table: Training vs. Testing examples: Pick up (📧), Pick up (🎿), Throw (⚽))

To generalize, the agent assumes:

  • Semantics of each parameter are consistent.
  • Required knowledge: "Pick up ⚽ as you pick up 📧."

Object-dependent scenario

(Table: Training vs. Testing examples: Interact (🍏) = eat, Interact (🍠) = eat, Interact (⚽) = throw)

  • Semantics of a task depend on a combination of parameters (e.g., the target object).
  • Impossible to generalize over unseen combinations without any prior knowledge.
  • Required knowledge: "Interact with 🍠 as you interact with 🍏."

SLIDE 16

1) Learning a Parameterized Skill

16 Oh et al. 2017

Pick up 📧

Representation of task parameters

Deep neural net

CONVx4 + LSTM

SLIDE 17

1) Learning a Parameterized Skill

17 Oh et al. 2017

Pick up 📧

Representation of task parameters

Deep neural net

Analogy making

(Fully-connected output layer)

Aiming to generalize, this introduces knowledge about tasks through analogy-making in the task embedding space.

Actor-Critic

(Fully-connected output layer)

Binary classification

(Fully-connected output layer)

CONVx4 + LSTM
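The diagram on this slide can be read as one shared trunk feeding three fully-connected heads. Below is a minimal numpy sketch of that layout; the dense matrices and their sizes are illustrative stand-ins for the paper's CONVx4 + LSTM trunk, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4-d observation feature, 3-d task embedding,
# 8-d hidden state, 5 actions. All weights are random stand-ins.
W_obs = rng.normal(size=(4, 8))
W_task = rng.normal(size=(3, 8))
W_pi = rng.normal(size=(8, 5))    # actor head
W_v = rng.normal(size=(8, 1))     # critic head
W_term = rng.normal(size=(8, 1))  # termination-prediction head

def shared_trunk(observation, task_embedding):
    """Stand-in for the CONVx4 + LSTM trunk: maps the observation and the
    task-parameter embedding to one shared hidden state."""
    return np.tanh(observation @ W_obs + task_embedding @ W_task)

h = shared_trunk(np.zeros(4), np.ones(3))
policy_logits = h @ W_pi  # actor-critic policy head
value = h @ W_v           # actor-critic value head
term_logit = h @ W_term   # binary termination classifier
```

All three heads read the same hidden state, which is what lets the analogy-making objective on the task embedding shape the policy and termination predictions as well.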

SLIDE 18

1) Learning a Parameterized Skill

18 Oh et al. 2017

Pick up 📧

Representation of task parameters

Deep neural net, trained end-to-end with these three objectives.

Analogy making

(Fully-connected output layer)

Aiming to generalize, this introduces knowledge about tasks through analogy-making in the task embedding space.

Actor-Critic

(Fully-connected output layer)

Binary classification

(Fully-connected output layer)

CONVx4 + LSTM

SLIDE 19

1.1) Learning to Generalize by Analogy-Making

19

Goal: learn correspondence between tasks.

Analogy-making

Object-independent scenario: acquire knowledge about the relationship between different task parameters when learning the task embedding.

[Visit, X] : [Visit, Y] :: [Pick up, X] : [Pick up, Y]

(Diagram: the embedding difference between [Visit, X] and [Visit, Y] equals the difference between [Pick up, X] and the unseen [Pick up, Y].)

Oh et al. 2017

Constraints in embedding space

SLIDE 20

1.1) Learning to Generalize by Analogy-Making

20 Oh et al. 2017

Goal: learn correspondence between tasks.

Analogy-making

Object-independent scenario

[Visit, X] : [Visit, Y] :: [Pick up, X] : [Pick up, Y]

(Diagram: the embedding difference between [Visit, X] and [Visit, Y] equals the difference between [Pick up, X] and the unseen [Pick up, Y].)

Analogy-making (similar to Mikolov et al. (2013)). Prevent trivial solutions and learn differences between tasks.

Constraints in embedding space

SLIDE 21

1.1) Learning to Generalize by Analogy-Making

21 Oh et al. 2017

Goal: learn correspondence between tasks.

Analogy-making

Object-independent scenario

[Visit, X] : [Visit, Y] :: [Pick up, X] : [Pick up, Y]

(Diagram: the embedding difference between [Visit, X] and [Visit, Y] equals the difference between [Pick up, X] and the unseen [Pick up, Y].)

Analogy-making (similar to Mikolov et al. (2013)). Prevent trivial solutions and learn differences between tasks. Weighted sum of these three restrictions is added as a regularizer.

Constraints in embedding space
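The embedding-space constraints above can be sketched as follows. This is a simplified numpy version: the function name, the dict-based embedding lookup, and the margins tau_dis and tau_diff are assumptions, and in training each term is applied to appropriately sampled analogous or non-analogous quadruples rather than all at once:

```python
import numpy as np

def analogy_losses(phi, a, b, c, d, tau_dis=1.0, tau_diff=1.0):
    """Sketch of the three analogy-making terms for a candidate analogy
    [a : b :: c : d]. phi maps a task tuple to its embedding vector;
    tau_dis and tau_diff are illustrative margins."""
    delta_ab = phi[a] - phi[b]
    delta_cd = phi[c] - phi[d]
    gap = np.linalg.norm(delta_ab - delta_cd)

    # Analogous quadruples: the two embedding differences should match.
    l_sim = gap ** 2
    # Non-analogous quadruples: the differences should stay >= tau_dis apart.
    l_dis = max(0.0, tau_dis - gap) ** 2
    # Distinct tasks should keep distinct embeddings (no trivial collapse).
    l_diff = max(0.0, tau_diff - np.linalg.norm(delta_ab)) ** 2
    return l_sim, l_dis, l_diff
```

With embeddings laid out so that changing the object moves every action's embedding by the same vector, the similarity term is zero, which is exactly the structure that lets an unseen [Pick up, Y] be inferred.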

SLIDE 22

1) Learning a Parameterized Skill

22 Oh et al. 2017

Pick up 📧

Representation of task parameters

Analogy making

(Fully-connected output layer)

Actor-Critic

(Fully-connected output layer)

Binary classification

(Fully-connected output layer)

Objectives: analogy-making regularizer (analogy-making head); cross-entropy loss for termination prediction (binary classification head); fine-tuning of the multi-task policy (actor-critic head).
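Putting the three heads' objectives together, end-to-end training reduces to a weighted sum. The sketch below is schematic; the weight names lam_term and lam_analogy are placeholders rather than the paper's published coefficients:

```python
def total_skill_loss(actor_critic_loss, termination_ce_loss, analogy_reg,
                     lam_term=1.0, lam_analogy=0.1):
    """Sketch: the parameterized skill is trained end-to-end on a weighted
    sum of the actor-critic loss, the termination-prediction cross-entropy,
    and the analogy-making regularizer. The weights are placeholders."""
    return actor_critic_loss + lam_term * termination_ce_loss + lam_analogy * analogy_reg
```

The analogy term acts purely as a regularizer on the task embedding, so it shapes the representation without providing any reward signal of its own.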

SLIDE 23

1.1) Learning to Generalize by Analogy-Making

23 Oh et al. 2017

Results

Sets of parameterized tasks:

  • The semantics of the tasks are consistent across all types of target objects. Generalize to unseen configurations of task parameters.
  • Two groups: Group A and Group B. Given the "interact with" action, Group A objects should be picked up, whereas Group B objects should be transformed. To generalize to unseen objects, the agent needs to learn an embedding for the group.
  • A task is defined by an action, an object, and a number: repeat the same subtask a given number of times. The agent is trained on all actions and objects, but not all numbers, and should generalize over unseen numbers.
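The third setting, training on all actions and objects while holding out some repetition counts, can be made concrete with a small split. The action, object, and number values below are invented placeholders:

```python
from itertools import product

# Invented placeholders for actions, objects, and repetition counts.
actions = ["visit", "pick up", "transform"]
objects = ["A1", "A2", "B1"]
numbers = [1, 2, 3, 4, 5, 6, 7]

# Every action and object appears during training, but the larger
# repetition counts are held out for zero-shot evaluation.
train_numbers, test_numbers = numbers[:4], numbers[4:]

train_tasks = set(product(actions, objects, train_numbers))
test_tasks = set(product(actions, objects, test_numbers))
```

Because only the number axis is held out, success on test_tasks isolates the agent's ability to extrapolate over that one parameter.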
SLIDE 24

1.1) Learning to Generalize by Analogy-Making

24 Oh et al. 2017

Environment Implementation details

  • Curriculum training
  • Actor-critic (parameters updated after 8 episodes).

SLIDE 25

1.1) Learning to Generalize by Analogy-Making

25 Oh et al. 2017

Results

SLIDE 26

1.1) Learning to Generalize by Analogy-Making

26 Oh et al. 2017

Results

SLIDE 27

1.1) Learning to Generalize by Analogy-Making

27 Oh et al. 2017

Takeaways

  • When learning a representation of task parameters, it is possible to inject prior knowledge in the form of the analogy-making objective.
  • Analogy-making, in this particular scenario, was crucial for generalizing to unseen task parameters that depend on semantics or context, without needing to experience them.

SLIDE 28

Problem set up

28 Oh et al. 2017

Task: Instruction execution: an agent's task is to execute a given list of instructions described by a simple form of natural language while dealing with unexpected events.

Assumption: Each instruction can be executed by performing one or more high-level subtasks in sequence.

Challenges:
  • Generalization: unseen subtasks (skill learning stage); longer sequences of instructions
  • Delayed reward (subtask updater)
  • Interruptions (bonus or emergencies)
  • Memory (loop tasks)

SLIDE 29

2) Learning to execute instructions

29 Oh et al. 2017

The agent needs to:

1. Execute a sequence of natural language instructions.
  • Read one instruction at a time (pointer).
  • Detect when the current instruction is finished.
  • Memory (keep track of progress -- counts).

SLIDE 30

2) Learning to execute instructions

30 Oh et al. 2017

The agent needs to:

1. Execute a sequence of natural language instructions.
  • Read one instruction at a time (pointer).
  • Detect when the current instruction is finished.
  • Memory (keep track of progress -- counts).
2. Handle unexpected events (e.g., bonus or low battery).
  • Interrupt ongoing subtasks.

Assume:

  • Already trained parameterized skills.
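The pointer-plus-termination loop described above can be sketched as follows; execute_skill is a hypothetical stand-in for a pre-trained parameterized skill that returns True when its subtask terminates:

```python
def run_instructions(instructions, execute_skill):
    """Sketch of the execution loop: keep a pointer into the instruction
    list and advance it only when the current skill signals termination.

    execute_skill(instruction) runs one step of the pre-trained
    parameterized skill and returns True once the subtask it encodes has
    terminated. Assumes every skill eventually terminates.
    """
    pointer = 0
    trace = []  # (pointer, instruction) pairs, for inspection
    while pointer < len(instructions):
        trace.append((pointer, instructions[pointer]))
        if execute_skill(instructions[pointer]):
            pointer += 1  # termination signal: read the next instruction
    return trace
```

The real meta controller additionally keeps the pointer and progress counts in memory, which is what makes loop instructions ("do X three times") executable.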
SLIDE 31

2) Learning to execute instructions

31 Oh et al. 2017

The learning problem is divided into two stages; this is stage 2:

2) Learning to execute instructions using the learned skills.

How to generalize? The meta controller's ability to learn when to update a subtask plays a key role in solving the overall problem.

SLIDE 32

2) Learning to execute instructions

32 Oh et al. 2017

Architecture Meta Controller: reads instructions and passes subtask parameters to the parameterized skill.

SLIDE 33

2) Learning to execute instructions

33 Oh et al. 2017

Architecture Meta Controller: reads instructions and passes subtask parameters to the parameterized skill.

SLIDE 34

2) Learning to execute instructions

34 Oh et al. 2017

Architecture Meta Controller: reads instructions and passes subtask parameters to the parameterized skill. Parameterized skill: executes the given subtask and gives a termination signal to the meta controller.

SLIDE 35

2) Learning to execute instructions

35 Oh et al. 2017

Architecture Meta Controller: reads instructions and passes subtask parameters to the parameterized skill. Parameterized skill: executes the given subtask and gives a termination signal to the meta controller.

SLIDE 36

2.1) Meta Controller Architecture

36 Oh et al. 2017

The meta controller can update its subtask at any time and takes the termination signal as an additional input.

Internal state (progress)

Sentence embedding

SLIDE 37

2.1) Meta Controller Architecture

37 Oh et al. 2017

The meta controller can update its subtask at any time and takes the termination signal as an additional input.

Internal state (progress) Pointer to instructions

Sentence embedding

SLIDE 38

2) Learning to execute instructions

38 Oh et al. 2017

The agent needs to:

1. Execute a sequence of natural language instructions.
  • Read one instruction at a time (pointer).
  • Detect when the current instruction is finished.
  • Memory (keep track of progress -- counts).
2. Handle unexpected events (e.g., bonus or low battery).
  • Interrupt ongoing subtasks.

SLIDE 39

2.2) Learning to Operate at a Large Time-Scale

39 Oh et al. 2017

Open-loop meta controller

  • Update the subtask only when the previous one is finished.
  • Pro: can operate at a larger time scale.
  • Con: cannot handle unexpected events immediately.

(Timeline diagram: an unexpected event arrives mid-subtask.)

SLIDE 40

2.2) Learning to Operate at a Large Time-Scale

40 Oh et al. 2017

Closed-loop meta controller

  • Update the subtask at every step.
  • Pro: can handle unexpected events.
  • Con: needs to make a decision at every time step.

(Timeline diagram.)

SLIDE 41

2.2) Learning to Operate at a Large Time-Scale

41 Oh et al. 2017

Learned time-scale for meta controller

  • Meta controller learns when to

update a subtask. It introduces an internal binary decision which indicates whether to invoke the subtask updater or not (e.g., move the pointer).

  • Pro: can handle unexpected events.
  • Con: can operate at larger time

scale.
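The internal binary decision can be sketched as a gate in front of the subtask updater; update_gate and subtask_updater are hypothetical stand-ins for the learned modules:

```python
def meta_controller_step(update_gate, subtask_updater, prev_subtask, obs, terminated):
    """Sketch of the learned time-scale: a binary internal decision
    chooses whether to invoke the subtask updater at this step.

    update_gate(obs, terminated) -> bool and subtask_updater(obs) -> subtask
    stand in for learned modules; the termination signal from the
    parameterized skill is one of the gate's inputs.
    """
    if update_gate(obs, terminated):
        return subtask_updater(obs)  # update: e.g., move the instruction pointer
    return prev_subtask              # keep acting at the larger time-scale
```

A gate that only fires on the termination signal recovers the open-loop behavior, while a gate that always fires recovers the closed-loop one; learning the gate interpolates between the two.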

SLIDE 42

2.2) Learning to Operate at a Large Time-Scale

42 Oh et al. 2017

Hierarchical dynamic time-scale for meta controller

  • Can capture both long-term and short-term temporal information: low-level units focus on short-term information, while high-level units capture long-term dependencies.

(Timeline diagram.)

SLIDE 43

2) Learning to execute instructions

43 Oh et al. 2017

Experiments & RQs

RQ1) Will the proposed hierarchical architecture outperform a non-hierarchical baseline?
RQ2) How beneficial is the meta controller's ability to learn when to update the subtask?

SLIDE 44

2) Learning to execute instructions

44 Oh et al. 2017

Experiments & RQs

Baseline without parameterized skills: it directly chooses actions without using the parameterized skill, and is also pre-trained on the training set of subtasks.

Compared meta controllers: open-loop, closed-loop, and the proposed hierarchical dynamic controller.

SLIDE 45

2) Learning to execute instructions

45 Oh et al. 2017

Results

(Results legend: Open-loop, Closed-loop, Without parameterized skills, Proposed approach.)

SLIDE 46

2) Learning to execute instructions

46 Oh et al. 2017

Results

(Results legend: Open-loop, Closed-loop, Without parameterized skills, Proposed approach.)

SLIDE 47

2) Learning to execute instructions

47 Oh et al. 2017

Takeaways

  • Overall performance: their agent is able to generalize to longer compositions of seen and unseen instructions by just learning to solve short sequences of a subset of instructions.
  • The proposed controller is key to handling loop instructions, thanks to its ability to determine when to move to the next task (informed by parameterized skills) and to keep progress in memory.
  • Their architecture makes fewer decisions by operating at a large time-scale.
SLIDE 48

Summary

48 Oh et al. 2017

Looking for: zero-shot task generalization capabilities in Reinforcement Learning (RL).

Introduce a new RL problem with two steps:
1. An agent should learn useful skills that solve subtasks.
2. The same agent should learn to execute sequences of tasks using the learned skills.

Required generalization types:

  • Generalize to previously unseen instructions
    ○ New objective which encourages learning correspondences between similar subtasks by making analogies.
  • Generalize to longer sequences of instructions
    ○ Hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions.

SLIDE 49

Takeaways

Oh et al. 2017

  • Explored a type of zero-shot task generalization in RL: parameterized tasks and sequences of instructions.
  • Propose a new problem where an agent is required to execute and generalize over sequences of instructions.
  • We can teach an agent to generalize to new tasks with analogies through metric learning (learning a distance function between objects).
  • Learning when to update subtasks helps when the agent has high-level skills and deals with complex decision problems.

49

SLIDE 50

Discussion prompts

50

1. What are the limitations of this framework? Why?
2. How does structuring losses inform learned representations?
3. How could common-sense reasoning and information be injected into the model so that we don't rely as much on training analogies?
4. How do you think this architecture would generalize to other specific tasks/scenarios? Why?
5. What are some tasks that the current framework wouldn't be able to generalize to? Why?

SLIDE 51

2) Learning to execute instructions

51 Oh et al. 2017

Results

(Results legend: Closed-loop, Without parameterized skills, Proposed approach.)