
Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking

Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking. European Control Conference, June 26, 2019. SiQi Zhou¹, Andriy Sarabakha², Erdal Kayacan³, Mohamed K. Helwa¹, and Angela P. Schoellig¹


  1. Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking. European Control Conference, June 26, 2019. SiQi Zhou¹, Andriy Sarabakha², Erdal Kayacan³, Mohamed K. Helwa¹, and Angela P. Schoellig¹. Affiliations: ¹ Dynamic Systems Lab, University of Toronto Institute for Aerospace Studies; ² School of Mechanical and Aerospace Engineering, Nanyang Technological University; ³ Department of Engineering, Aarhus University.

  2. Introduction: Designing control systems for high-accuracy tracking can be challenging. [Diagram: baseline closed-loop system (baseline controller and plant) from desired output to actual output; nonlinearities and unmodeled effects lead to tracking error between the desired trajectory and the actual trajectory.]

  3. Introduction: Neural networks as add-on blocks to enhance ‘black-box’ systems. [Diagram: a DNN offline learning module (source system inverse) takes the desired output and the state and produces the DNN reference for the baseline closed-loop system (baseline controller and plant), which produces the actual output; nonlinearities and unmodeled effects lead to tracking error between the desired trajectory and the actual trajectory.]

  4. Introduction: Neural networks as add-on blocks to enhance ‘black-box’ systems. [Video]

  5. Note: If the video on the previous slide does not play, the full version can be viewed here: https://youtu.be/C_teLkJDq3Y

  6. Introduction: Neural networks as add-on blocks to enhance ‘black-box’ systems. Result: an average of 62% error reduction over 30 test trajectories. [Diagram: DNN offline learning module (source system inverse) in front of the baseline closed-loop system, as on slide 3. Histogram: count vs. % RMS error reduction.]
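
The histogram on slide 6 reports percent RMS error reduction, but the transcript does not give the formula. A minimal sketch of one common way to compute such a figure, assuming the reduction is measured relative to the baseline RMS tracking error; the function names and the sample trajectories are hypothetical and are not data from the talk:

```python
import math

def rms(errors):
    """Root-mean-square of a sequence of tracking errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def rms_error_reduction(desired, baseline, enhanced):
    """Percent reduction in RMS tracking error relative to the baseline.

    Assumption: reduction = 100 * (e_baseline - e_enhanced) / e_baseline.
    """
    e_base = rms([y - d for y, d in zip(baseline, desired)])
    e_enh = rms([y - d for y, d in zip(enhanced, desired)])
    return 100.0 * (e_base - e_enh) / e_base

# Made-up one-dimensional trajectories, for illustration only.
desired = [0.0, 1.0, 2.0, 3.0]
baseline = [0.0, 0.5, 1.5, 2.5]   # lags the desired trajectory by 0.5
enhanced = [0.0, 0.8, 1.8, 2.8]   # lags the desired trajectory by 0.2

print(rms_error_reduction(desired, baseline, enhanced))
```

Averaging this quantity over a set of test trajectories would give a summary number of the kind quoted on the slide.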

  7. Research Question: What if we have a team of robots with different dynamics?

  8. Research Question: Implication of similarity? [Diagram: a source robot, target robots, and three question marks.]

  9. Related Literature: Transfer experience to accelerate learning on new tasks or for new robots. Knowledge transfer: leverage existing data or learned experience to accelerate or improve subsequent learning. [Diagram: knowledge transfer (robotics) split into cross-task transfer and cross-robot transfer.]

  10. Related Literature: Approaches for transferring data across robots. Knowledge transfer: leverage existing data or learned experience to accelerate or improve subsequent learning. Knowledge transfer (robotics) splits into cross-task transfer and cross-robot transfer. Cross-robot approaches: • Alignment-based: map from source to target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017]) • Invariant feature learning: exploiting a common feature space (e.g., [Gupta et al., 2017; Daftry et al., 2016]).

  11. Related Literature: Approaches for transferring data across robots. Knowledge transfer: leverage existing data or learned experience to accelerate or improve subsequent learning. Knowledge transfer (robotics) splits into cross-task transfer and cross-robot transfer. Cross-robot approaches: • Alignment-based: map from source to target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017]) • Invariant feature learning: exploiting a common feature space (e.g., [Gupta et al., 2017; Daftry et al., 2016]). [Illustration for the alignment-based approach: source data and target data.]

  12. Related Literature: Approaches for transferring data across robots. Knowledge transfer: leverage existing data or learned experience to accelerate or improve subsequent learning. Knowledge transfer (robotics) splits into cross-task transfer and cross-robot transfer. Cross-robot approaches: • Alignment-based: map from source to target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017]) • Invariant feature learning: exploiting a common feature space (e.g., [Gupta et al., 2017; Daftry et al., 2016]). [Illustration for invariant feature learning, from [Gupta et al., 2017]: source state and target state, each with an encoder and a decoder.]

  13. Related Literature: Maximizing learning efficiency on physical robots shares a broader interest. Knowledge transfer: leverage existing data or learned experience to accelerate or improve subsequent learning. Related interests: • Sim-to-Real (e.g., [Marco et al., 2017]) • Meta-Learning (e.g., [Finn et al., 2017]) • Modularity (e.g., [Devin et al., 2017]) • … Knowledge transfer (robotics) splits into cross-task transfer and cross-robot transfer. Cross-robot approaches: • Alignment-based: map from source to target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017]) • Invariant feature learning: exploiting a common feature space (e.g., [Gupta et al., 2017; Daftry et al., 2016]).

  14. Contributions: (1) Impromptu knowledge transfer (i.e., without additional a-priori data collection on the robots). (2) Stability analysis of the transfer-enhanced system and its connection to system similarity (linear case). (3) Verification of the knowledge transfer approach in quadrotor impromptu tracking experiments. [Illustration: source and target robots.]

  15. Theoretical Results: Problem definition. Setup: Consider closed-loop source and target systems represented in the following form: [equation not captured in the transcript]. Assumption: The source and the target systems (a) are minimum phase and (b) have well-defined and the same relative degree. Goal: To enhance the target baseline system with a minimal amount of data (re)collection and training. [Diagram: target baseline closed-loop system (baseline controller and plant) from system reference to actual output.]
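
Assumption (b) on slide 15 can be checked numerically when linear models are available. The following is a minimal sketch, assuming discrete-time single-input single-output models of the form x[k+1] = A x[k] + B u[k], y[k] = C x[k] (the slide's own system equations were not captured, so this form is an assumption). The relative degree is then the smallest r with C A^(r-1) B nonzero. All matrices below are hypothetical examples:

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def relative_degree(A, B, C, tol=1e-12):
    """Smallest r >= 1 such that C A^(r-1) B != 0.

    Returns None if no such r <= n exists, in which case the input
    never reaches the output and the relative degree is undefined.
    """
    v = list(B)  # holds A^(r-1) B
    for r in range(1, len(A) + 1):
        if abs(dot(C, v)) > tol:
            return r
        v = mat_vec(A, v)
    return None

# Hypothetical models: position output of a discretized double integrator.
A = [[1.0, 0.1], [0.0, 1.0]]
C = [1.0, 0.0]
B_source = [0.0, 0.1]     # input acts on velocity only
B_target = [0.0, 0.05]    # weaker input gain, same structure
B_other = [0.005, 0.1]    # input also acts directly on position

r_source = relative_degree(A, B_source, C)
r_target = relative_degree(A, B_target, C)
r_other = relative_degree(A, B_other, C)
print(r_source, r_target, r_other)
```

In this example the hypothetical source and target share a relative degree, so assumption (b) would hold for that pair, while the third model would violate it. The minimum-phase assumption (a) is not checked here.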

  16. Theoretical Results: Leveraging the DNN inverse module from the source system. The offline learning module approximates the inverse of the source robot system [CDC 17]; the inverse is approximated by a DNN (when [symbols not captured in the transcript] are unknown). How to leverage the source DNN model? • Update source DNN • Online correction learning. [Diagram: the DNN offline learning module (source system inverse) takes the desired output and the state and produces the DNN reference for the target baseline closed-loop system (baseline controller and plant), which produces the actual output.]
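
To illustrate what an inverse module of this kind does, here is a minimal sketch that replaces the DNN with the exact inverse of a known scalar linear system. The model y[k+1] = A*y[k] + B*u[k] and its parameters are hypothetical stand-ins; in the talk the inverse is learned by a DNN because the system is not known in closed form:

```python
import math

# Hypothetical closed-loop source system: y[k+1] = A*y[k] + B*u[k].
A, B = 0.8, 0.2

def step(y, u):
    """One step of the (assumed) baseline closed-loop system."""
    return A * y + B * u

def inverse_reference(y, yd_next):
    """Reference that makes the next output equal yd_next (exact inverse)."""
    return (yd_next - A * y) / B

def rms_tracking_error(use_inverse, steps=200):
    """RMS error while tracking a slow sinusoid, with or without the inverse."""
    y, sq_err = 0.0, 0.0
    for k in range(steps):
        yd_next = math.sin(0.05 * (k + 1))
        # Baseline: send the desired output directly as the reference.
        u = inverse_reference(y, yd_next) if use_inverse else yd_next
        y = step(y, u)
        sq_err += (yd_next - y) ** 2
    return math.sqrt(sq_err / steps)

print("baseline:", rms_tracking_error(False))
print("with inverse:", rms_tracking_error(True))
```

With the exact inverse, the output matches the desired trajectory up to floating-point rounding, whereas the baseline lags behind it. A learned inverse would only approximate this behavior.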

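  17. Theoretical Results: Using online learning to adapt to the differences. An online learning module for reference adjustments (inverse correction) is added to the DNN offline learning module (source system inverse). Ideal expressions for exact tracking: [equations not captured in the transcript], involving the predicted output of the target system when [symbol not captured] is sent to the system. [Diagram: desired output and state enter the DNN offline learning module, which outputs the DNN reference; the online module (labels: error prediction, gain, reference adaptation, state) adjusts it into the system reference for the target baseline closed-loop system (baseline controller and plant), which produces the actual output.]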

  18. Theoretical Results: Using online learning to adapt to the differences. An online learning module for reference adjustments (inverse correction) is added to the DNN offline learning module (source system inverse). Ideal expressions for exact tracking: [equations not captured in the transcript]. Online training dataset: [not captured in the transcript]. Online learning of the error predictor is based on the latest observations. [Diagram: as on slide 17, with labels error prediction, gain, reference adaptation, state.]
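
The online correction idea can be sketched on a toy problem. The example below is an illustration under strong assumptions and is not the method from the talk: scalar linear source and target systems with hypothetical parameters, a sliding-window least-squares fit as the error predictor (the talk does not specify the learner in this transcript), and a correction gain taken from the fitted model:

```python
import math
from collections import deque

A_S, B_S = 0.8, 0.2   # hypothetical source system parameters
A_T, B_T = 0.9, 0.1   # hypothetical target system parameters

def source_inverse(y, yd_next):
    """Offline module: exact inverse of the (assumed) source system."""
    return (yd_next - A_S * y) / B_S

def fit_two_param(samples):
    """Least-squares fit of y_next ~ th1*y + th2*u over (y, u, y_next) samples.

    Returns None while the data do not determine both parameters.
    """
    s_yy = sum(y * y for y, u, yn in samples)
    s_yu = sum(y * u for y, u, yn in samples)
    s_uu = sum(u * u for y, u, yn in samples)
    s_yn = sum(y * yn for y, u, yn in samples)
    s_un = sum(u * yn for y, u, yn in samples)
    det = s_yy * s_uu - s_yu * s_yu
    if abs(det) < 1e-9:
        return None
    th1 = (s_yn * s_uu - s_un * s_yu) / det
    th2 = (s_un * s_yy - s_yn * s_yu) / det
    return th1, th2

def run(online, steps=200, window=20):
    """RMS tracking error on the target, with or without online correction."""
    y, sq_err = 0.0, 0.0
    data = deque(maxlen=window)          # latest observations only
    for k in range(steps):
        yd_next = math.sin(0.05 * (k + 1))
        u = source_inverse(y, yd_next)   # reference from the source inverse
        model = fit_two_param(data) if online else None
        if model is not None and abs(model[1]) > 1e-6:
            th1, th2 = model
            # Predicted error if the source reference were sent unchanged.
            predicted_error = yd_next - (th1 * y + th2 * u)
            u += predicted_error / th2   # assumed gain: 1 / fitted input gain
        y_next = A_T * y + B_T * u       # target system response
        data.append((y, u, y_next))
        sq_err += (yd_next - y_next) ** 2
        y = y_next
    return math.sqrt(sq_err / steps)

print("source inverse only:", run(online=False))
print("with online correction:", run(online=True))
```

In this noise-free toy setting the fitted model becomes exact after a few samples, so the corrected reference removes almost all of the remaining error. Real systems with noise and nonlinearities would need a more careful predictor and gain choice.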

  19. Theoretical Results: Characterizing similarity between the source and the target systems (linear case). Input-output equation: [equation not captured in the transcript], where [symbol not captured] = state-to-output gain and [symbol not captured] = input-to-output gain. Similarity characterization between target and source: [expressions not captured in the transcript]. [Diagram: DNN offline learning module (source system inverse) and online learning module (inverse correction) in front of the target baseline closed-loop system.]
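
Since the slide's input-output equation was not captured, the following block gives the standard form for a discrete-time linear system with relative degree r, as a sketch of what the two gains could refer to. The state-space symbols A, B, C and the identification of the two terms with the slide's "state-to-output gain" and "input-to-output gain" are assumptions, not taken from the slide:

```latex
\begin{align}
x_{k+1} &= A x_k + B u_k, \qquad y_k = C x_k, \\
y_{k+r} &= \underbrace{C A^{r}}_{\text{state-to-output gain}} x_k
         + \underbrace{C A^{r-1} B}_{\text{input-to-output gain}} u_k, \\
u_k &= \frac{y^{d}_{k+r} - C A^{r} x_k}{C A^{r-1} B}.
\end{align}
```

The second line follows because relative degree r means C A^{i-1} B = 0 for all i < r, so u_k is the first input to affect the output. The third line is the corresponding inverse for a single-input single-output system: the reference that yields the desired output y^d after r steps. Under this reading, two systems would be similar when their two gains are close, but the slide's actual similarity measure is not recoverable from the transcript.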

  20. Theoretical Results: Higher similarity leads to higher tolerances for learning error. Assumptions: (1) The target baseline closed-loop system is input-to-state stable. (2) The offline module corresponds to the source inverse. (3) The error of the online learning module is bounded as [bound not captured in the transcript]. Stability of the overall learning-enhanced target system: when [condition not captured]; when [condition not captured] (i.e., the online learning module is not active). [Diagram: DNN offline learning module (source system inverse) and online learning module (inverse correction) in front of the target baseline closed-loop system.]

  21. Experiments: We test our online learning approach on arbitrary hand drawings. [Figure: samples of arbitrary hand-drawn test trajectories.]

  22. Experiments: We test our online learning approach on arbitrary hand drawings. With offline transfer alone: 38% error reduction. With online transfer: 67% error reduction. [Plots: trajectories in the x and z directions, annotated "compensates for slow response"; path in the x-z plane. Legend: desired, baseline, w/ DNN, w/ online. Illustration: transfer from source to target.]

  23. Experiments: We can effectively reduce the amount of data required for training robots. With offline transfer alone: 46% error reduction. With online transfer: 74% error reduction (comparable to fully-trained DNNs). [Illustration: transfer from a source robot to target robots.]
