Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model - PowerPoint PPT Presentation

SLIDE 1

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, and Wojciech Zaremba

OpenAI, San Francisco, CA, USA

Presenter: Hao-Wei Lee

SLIDE 2

Developing a Control Policy for a System

If you have a robot, there are two ways to find a good control policy:

Perform reinforcement learning directly during robot operation, which takes more cost and time.

Perform reinforcement learning on a simulation of the robot.

SLIDE 3

Learn Policies from Simulation?

Policies learned in simulation usually cannot be used directly: simulation often captures only high-level trajectories and ignores the details of physical properties. Can we transfer a learned policy from simulation to the real world?

SLIDE 4

Transfer Learning of Policy

Policies are found in simulation instead of in the real world. A neural network maps the policy learned in the source environment (simulation) to the target environment (real world). Good policies found in one simulation can thus be transferred to many different real-world environments.

SLIDE 5

Variables in Environments

Each environment has its own:

State space S: s ∈ S are the states of the environment.

Action space A: a ∈ A are the actions that can be taken.

Observation space O: o(s) is the observation of the environment in state s.

System forward dynamics T: T(s, a) = s′ determines the new state s′ given an action and the previous state.
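
To make the notation concrete, the sketch below implements such an environment in Python. The identity-plus-noise observation and the toy linear dynamics are illustrative assumptions, not details from the paper.

import numpy as np

class Environment:
    """Minimal sketch of the abstraction above; dynamics and noise are illustrative."""

    def __init__(self, initial_state):
        self.state = np.asarray(initial_state, dtype=float)  # s in S

    def observe(self):
        # Observation function o(s): identity plus small sensor noise (assumed).
        return self.state + np.random.normal(scale=0.01, size=self.state.shape)

    def step(self, action):
        # Forward dynamics T(s, a) = s': a toy linear update stands in for physics.
        self.state = self.state + 0.1 * np.asarray(action, dtype=float)
        return self.observe()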

SLIDE 6

Deep Inverse Dynamics Model

τ−k:: the trajectory consisting of the most recent k observations and k − 1 actions of the target environment.

πsource: a good enough policy in the source environment.

φ: the inverse dynamics model, a neural network that maps the source policy to a target policy.

SLIDE 7

Deep Inverse Dynamics Model

1. Compute the source action asource = πsource(τ−k:) according to the target trajectory.

2. Observe the predicted next observation given τ−k: and asource: ônext = o(Tsource(τ−k:, asource)).

3. Feed ônext and τ−k: to the inverse dynamics model, which produces atarget.
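
Read as code, the three steps form a single control step, sketched below; trajectory stands for τ−k:, and the callables pi_source, T_source, observe, and phi are illustrative stand-ins for πsource, Tsource, o(·), and φ.

def transfer_step(trajectory, pi_source, T_source, observe, phi):
    # 1. Query the source policy on the most recent target trajectory.
    a_source = pi_source(trajectory)

    # 2. Roll the simulated dynamics forward one step to get the desired
    #    next observation o_hat_next.
    o_hat_next = observe(T_source(trajectory, a_source))

    # 3. The inverse dynamics model returns the real-world action expected
    #    to reproduce that observation.
    a_target = phi(trajectory, o_hat_next)
    return a_target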

SLIDE 8

Training of Inverse Dynamics Neural Network I

Given the trajectory of the previous k time steps and the desired observation ok+1, the network outputs the action that leads to the desired observation: φ : (o0, a0, o1, …, ak−1, ok, ok+1) → ak.

Training data are obtained from a preliminary inverse dynamics model φ and a preliminary policy πtarget in the target environment.

Diversity of the training data can be achieved by adding noise to the predefined actions.
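
As a sketch of this data-collection procedure (reusing the Environment interface from the earlier sketch; the Gaussian noise scale noise_std is an assumed value, since the slide only says noise is added):

import numpy as np

def collect_training_data(env, pi_target, k, num_steps, noise_std=0.1):
    data = []
    observations, actions = [env.observe()], []
    for _ in range(num_steps):
        # Preliminary target policy plus exploration noise for data diversity.
        action = pi_target(observations[-1])
        action = action + np.random.normal(scale=noise_std, size=np.shape(action))
        observations.append(env.step(action))
        actions.append(action)
        if len(actions) >= k:
            # Input: the last k observations, the k - 1 actions between them,
            # and the achieved next observation; label: the action that produced it.
            inputs = (observations[-k - 1:-1], actions[-k:-1], observations[-1])
            data.append((inputs, actions[-1]))
    return data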

SLIDE 9

Architecture of the Inverse Dynamics Neural Network

Input: the previous k observations, the previous k − 1 actions, and the desired observation for the next time step.

Output: the action that leads to the desired observation.

Hidden layers: two fully connected layers of 256 units each, each followed by a ReLU activation.
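
A direct transcription of this architecture in PyTorch might look like the sketch below; obs_dim, act_dim, and k are placeholders, since the slide does not state the input dimensions.

import torch.nn as nn

class InverseDynamicsNet(nn.Module):
    def __init__(self, obs_dim, act_dim, k):
        super().__init__()
        # k past observations + the desired next observation, plus k - 1 past actions.
        in_dim = (k + 1) * obs_dim + (k - 1) * act_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),  # hidden layer 1
            nn.Linear(256, 256), nn.ReLU(),     # hidden layer 2
            nn.Linear(256, act_dim),            # action leading to the desired observation
        )

    def forward(self, x):
        # x: the flattened (observations, actions, desired observation) vector.
        return self.net(x)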

SLIDE 10

Simulation 1 to Simulation 2 Transfer I

The experiments are performed in simulators whose environment conditions can be varied. The source and target environments are basically the same model, differing only in gravity or motor noise. The following four models are used for simulation.

Figure: From left to right: Reacher, Hopper, Half-Cheetah, and Humanoid.

SLIDE 11

Simulation 1 to Simulation 2 Transfer II

Variation of Gravity

SLIDE 12

Simulation 1 to Simulation 2 Transfer III

Variation of Motor Noise

SLIDE 13

Simulation to Real Transfer

The real environment is a physical Fetch robot. The ground truth is the observation obtained by directly applying reinforcement learning on the robot. The baseline for comparison is a PD controller.

Figure: The discrepancy between observations under the transferred policy and the ground truth is measured.
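
For reference, a PD controller of the kind used as the baseline computes its command from the position error and a velocity damping term; a minimal sketch with illustrative gains:

def pd_control(position, velocity, target_position, kp=10.0, kd=1.0):
    # Proportional term pushes toward the target; derivative term damps motion.
    # The gains kp and kd are illustrative, not values from the paper.
    error = target_position - position
    return kp * error - kd * velocity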

SLIDE 14

Conclusion

The method successfully adapts complex control policies to the real world.

Observations in the source and target environments are assumed to be the same, which is not always true.

The method can also be applied to simulations in which actions cannot be observed.
