Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking

Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking European Control Conference June 26, 2019 SiQi Zhou 1 , Andriy Sarabakha 2 , Erdal Kayacan 3 , Mohamed K. Helwa 1 , and Angela P. Schoellig 1


SLIDE 1

Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking

European Control Conference, June 26, 2019

SiQi Zhou (1), Andriy Sarabakha (2), Erdal Kayacan (3), Mohamed K. Helwa (1), and Angela P. Schoellig (1)

(1) Dynamic Systems Lab, University of Toronto Institute for Aerospace Studies
(2) School of Mechanical and Aerospace Engineering, Nanyang Technological University
(3) Department of Engineering, Aarhus University

SLIDE 2

Introduction

Designing control systems for high-accuracy tracking can be challenging

[Figure: Desired Trajectory vs. Actual Trajectory, with the Tracking Error between them; causes: Nonlinearities, Unmodeled Effects]

[Diagram: Baseline Closed-Loop System (Baseline Controller and Plant), from Desired Output to Actual Output]

SLIDE 3

Introduction

Neural networks as add-on blocks to enhance ‘black-box’ systems

[Figure: Desired Trajectory vs. Actual Trajectory, with the Tracking Error between them; causes: Nonlinearities, Unmodeled Effects]

[Diagram: DNN Offline Learning Module (Source System Inverse) placed ahead of the Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Actual Output]

SLIDE 4

Introduction

Neural networks as add-on blocks to enhance ‘black-box’ systems

SLIDE 5

Note: If the video on the previous slide does not play, the full version can be viewed here: https://youtu.be/C_teLkJDq3Y

SLIDE 6

Introduction

Neural networks as add-on blocks to enhance ‘black-box’ systems

[Figure: Desired Trajectory vs. Actual Trajectory, with the Tracking Error between them; causes: Nonlinearities, Unmodeled Effects]

[Diagram: DNN Offline Learning Module (Source System Inverse) placed ahead of the Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Actual Output]

Average of 62% error reduction over 30 test trajectories

[Plot: Count vs. % RMS Error Reduction]
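A percentage RMS error reduction of the kind quoted on this slide could be computed from logged trajectories roughly as follows. This is a minimal sketch: the slides do not state the exact error definition, so the per-sample RMS used here is an assumption, and the trajectories and function names are hypothetical.

```python
import numpy as np

def rms_error(desired, actual):
    """Root-mean-square of the pointwise tracking error."""
    desired = np.asarray(desired, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((desired - actual) ** 2)))

def percent_rms_reduction(desired, baseline, enhanced):
    """Reduction of the enhanced system's RMS error relative to the baseline, in %."""
    e_base = rms_error(desired, baseline)
    e_enh = rms_error(desired, enhanced)
    return 100.0 * (e_base - e_enh) / e_base

# Synthetic data (not from the experiments): constant tracking offsets.
t = np.linspace(0.0, 2.0 * np.pi, 200)
desired = np.sin(t)
baseline = desired + 0.10   # hypothetical baseline tracking offset
enhanced = desired + 0.04   # hypothetical offset with the add-on module

reduction = percent_rms_reduction(desired, baseline, enhanced)
print(f"{reduction:.1f}% RMS error reduction")
```

Averaging this quantity over a set of test trajectories gives a single summary figure of the form reported above.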

SLIDE 7

Research Question

What if we have a team of robots with different dynamics?

SLIDE 8

Research Question

[Figure: one Source Robot and several Target Robots, each target marked "?"]

Implication of similarity?

SLIDE 9

Related Literature

Transfer experience to accelerate learning on new tasks or for new robots

Knowledge transfer: Leverage existing data or learned experience to accelerate or improve subsequent learning

Knowledge Transfer (Robotics)
  • Cross-Task Transfer
  • Cross-Robot Transfer

SLIDE 10

Related Literature

Approaches for transferring data across robots

Knowledge transfer: Leverage existing data or learned experience to accelerate or improve subsequent learning

Knowledge Transfer (Robotics)
  • Cross-Task Transfer
  • Cross-Robot Transfer
      • Invariant Feature Learning: Exploiting Common Feature Space (e.g., [Gupta et al., 2017; Daftry et al., 2016])
      • Alignment-Based: Map from Source to Target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017])

SLIDE 11

Related Literature

Approaches for transferring data across robots

Knowledge transfer: Leverage existing data or learned experience to accelerate or improve subsequent learning

Knowledge Transfer (Robotics)
  • Cross-Task Transfer
  • Cross-Robot Transfer
      • Invariant Feature Learning: Exploiting Common Feature Space (e.g., [Gupta et al., 2017; Daftry et al., 2016])
      • Alignment-Based: Map from Source to Target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017])

[Figure: Source Data, Target Data]

SLIDE 12

Related Literature

Approaches for transferring data across robots

Knowledge transfer: Leverage existing data or learned experience to accelerate or improve subsequent learning

Knowledge Transfer (Robotics)
  • Cross-Task Transfer
  • Cross-Robot Transfer
      • Invariant Feature Learning: Exploiting Common Feature Space (e.g., [Gupta et al., 2017; Daftry et al., 2016])
      • Alignment-Based: Map from Source to Target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017])

[Figure: Source State and Target State, each with an Encoder and a Decoder [Gupta et al., 2017]]

SLIDE 13

Related Literature

Maximizing learning efficiency on physical robots shares a broader interest

Knowledge transfer: Leverage existing data or learned experience to accelerate or improve subsequent learning

Knowledge Transfer (Robotics)
  • Cross-Task Transfer
  • Cross-Robot Transfer
      • Invariant Feature Learning: Exploiting Common Feature Space (e.g., [Gupta et al., 2017; Daftry et al., 2016])
      • Alignment-Based: Map from Source to Target (e.g., [Bócsi et al., 2013; Helwa & Schoellig, 2017])

Related Interests

  • Sim-to-Real (e.g., [Marco et al., 2017])
  • Meta-Learning (e.g., [Finn et al., 2017])
  • Modularity (e.g., [Devin et al., 2017])

SLIDE 14

Contributions

  • 1. Impromptu knowledge transfer (i.e., without additional a-priori data collection on the robots)
  • 2. Stability analysis of transfer-enhanced system and its connection to system similarity (linear case)
  • 3. Verification of the knowledge transfer approach with quadrotor impromptu tracking experiments

[Figure: Source, Target]

SLIDE 15

Theoretical Results

Problem definition

Setup: Consider closed-loop source and target systems represented in the following form

[Equation: closed-loop system model]

Assumption: The source and the target systems
  • a) are minimum phase
  • b) have well-defined and the same relative degree

Goal: To enhance the target baseline system with minimal amount of data (re)collection and training

[Diagram: Target Baseline Closed-Loop System (Baseline Controller and Plant), from Sys. Ref. to Actual Output]
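For the linear, discrete-time case, the two assumptions on this slide can be checked numerically. The sketch below assumes each closed-loop system is available as a SISO transfer function in z; the coefficients and function names are hypothetical and are not taken from the presentation.

```python
import numpy as np

def relative_degree(num, den):
    """Pole excess of num(z)/den(z): degree(den) - degree(num)."""
    num = np.trim_zeros(np.asarray(num, dtype=float), "f")
    den = np.trim_zeros(np.asarray(den, dtype=float), "f")
    return len(den) - len(num)

def is_minimum_phase(num):
    """True if every zero of num(z) lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(num)) < 1.0))

# Hypothetical closed-loop transfer functions (coefficients in descending powers of z).
source = {"num": [0.2, -0.1], "den": [1.0, -1.2, 0.4]}
target = {"num": [0.3, -0.09], "den": [1.0, -1.0, 0.3]}

assumptions_hold = (
    is_minimum_phase(source["num"])
    and is_minimum_phase(target["num"])
    and relative_degree(source["num"], source["den"])
    == relative_degree(target["num"], target["den"])
)
print("assumptions hold:", assumptions_hold)
```

In discrete time, the pole excess equals the number of steps before an input first affects the output, which is the relative degree referred to in assumption b).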

SLIDE 16

Theoretical Results

Leveraging the DNN inverse module from the source system

Offline Learning Module Approximates Inverse of the Source Robot System [CDC 17]

[Equation: source system inverse], approximated by a DNN (when … and … are unknown)

[Diagram: DNN Offline Learning Module (Source System Inverse) placed ahead of the Target Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Reference, Actual Output]

How to leverage the source DNN model?

  • Update source DNN
  • Online correction learning
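
The idea of an offline module that approximates the inverse of a closed-loop system can be illustrated on a toy example. This is a sketch under strong assumptions: the source system is a hypothetical first-order linear model, and a linear least-squares fit stands in for the DNN used in the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
A_SRC, B_SRC = 0.8, 0.2  # hypothetical source closed-loop dynamics

def source_step(y, u):
    """One step of the toy source system: y(k+1) = a*y(k) + b*u(k)."""
    return A_SRC * y + B_SRC * u

# 1) Collect input-output data from the source system.
u_log = rng.uniform(-1.0, 1.0, size=500)
y_log = np.zeros(501)
for k in range(500):
    y_log[k + 1] = source_step(y_log[k], u_log[k])

# 2) Fit the inverse map (y(k), y(k+1)) -> u(k); least squares replaces the DNN here.
features = np.column_stack([y_log[:-1], y_log[1:]])
weights, *_ = np.linalg.lstsq(features, u_log, rcond=None)

def inverse_reference(y_now, y_desired_next):
    """Reference that should drive the output to y_desired_next in one step."""
    return weights[0] * y_now + weights[1] * y_desired_next

# 3) Track a desired trajectory with and without the inverse module.
y_des = np.sin(0.05 * np.arange(200))

def tracking_rms(use_inverse):
    y, errors = 0.0, []
    for k in range(len(y_des) - 1):
        u = inverse_reference(y, y_des[k + 1]) if use_inverse else y_des[k + 1]
        y = source_step(y, u)
        errors.append(y_des[k + 1] - y)
    return float(np.sqrt(np.mean(np.square(errors))))

rms_baseline = tracking_rms(use_inverse=False)
rms_inverse = tracking_rms(use_inverse=True)
print(rms_baseline, rms_inverse)
```

Because the toy data are noise-free and the model class matches the system, the fitted inverse is essentially exact; a learned DNN inverse on a real robot would only approximate it.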

SLIDE 17

Theoretical Results

Using online learning to adapt to the differences

Online Learning Module for Reference Adjustments

[Equation: online module reference, with terms labeled Adaptation Gain and Error Prediction]

[Diagram: DNN Offline Learning Module (Source System Inverse) and Online Learning Module (Inverse Correction) placed ahead of the Target Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Online Module Ref., Sys. Ref., Actual Output]

Ideal Expressions for Exact Tracking

[Equation: ideal adaptation gain and error prediction]

Predicted output of target system when … is sent to the system

SLIDE 18

Theoretical Results

Using online learning to adapt to the differences

Online Learning Module for Reference Adjustments

[Equation: online module reference, with terms labeled Adaptation Gain and Error Prediction]

[Diagram: DNN Offline Learning Module (Source System Inverse) and Online Learning Module (Inverse Correction) placed ahead of the Target Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Online Module Ref., Sys. Ref., Actual Output]

Ideal Expressions for Exact Tracking

Online Learning of Error Predictor

Online Training Dataset (Based on Latest Observations)
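
One way to picture the online module is as a correction added to the reference produced by the source inverse: a predictor fitted on the latest observations estimates the tracking error the target would incur, and an adaptation gain scales that estimate. The sketch below is an illustration under stated assumptions, not the authors' implementation: both systems are hypothetical first-order linear models, the exact source inverse stands in for the DNN, the predictor is a sliding-window least-squares fit, and the gain `ALPHA` is hand-picked.

```python
from collections import deque
import numpy as np

A_SRC, B_SRC = 0.8, 0.2  # hypothetical source closed-loop dynamics
A_TGT, B_TGT = 0.7, 0.2  # hypothetical target closed-loop dynamics
ALPHA = 4.0              # hand-picked adaptation gain (assumption)
WINDOW = 20              # number of latest observations kept for online fitting

def source_inverse(y_now, y_des_next):
    """Stand-in for the offline DNN: exact inverse of the toy source system."""
    return (y_des_next - A_SRC * y_now) / B_SRC

def target_step(y, u):
    return A_TGT * y + B_TGT * u

def track(y_des, use_online):
    y, errors = 0.0, []
    window = deque(maxlen=WINDOW)  # rows of (y(k), u(k), y(k+1))
    for k in range(len(y_des) - 1):
        u_dnn = source_inverse(y, y_des[k + 1])
        correction = 0.0
        if use_online and len(window) >= 5:
            data = np.array(window)
            # Local model of the target fitted on the latest observations.
            theta, *_ = np.linalg.lstsq(data[:, :2], data[:, 2], rcond=None)
            # Predicted target output if u_dnn alone were sent to the system.
            y_pred = theta[0] * y + theta[1] * u_dnn
            correction = ALPHA * (y_des[k + 1] - y_pred)
        u = u_dnn + correction
        y_next = target_step(y, u)
        window.append((y, u, y_next))
        errors.append(y_des[k + 1] - y_next)
        y = y_next
    return float(np.sqrt(np.mean(np.square(errors))))

y_des = np.sin(0.05 * np.arange(400))
rms_offline = track(y_des, use_online=False)
rms_online = track(y_des, use_online=True)
print(rms_offline, rms_online)
```

In this toy setting the correction cancels the predicted error exactly when `ALPHA` equals `1 / B_TGT`; the value used here is deliberately off to show that a mismatched gain still reduces the error.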

SLIDE 19

Theoretical Results

Characterizing similarity between the source and the target systems

[Diagram: DNN Offline Learning Module (Source System Inverse) and Online Learning Module (Inverse Correction) placed ahead of the Target Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Online Module Ref., Sys. Ref., Actual Output]

Linear Case: … and …, where …

Input-Output Equation: …
  • … = state-to-output gain
  • … = input-to-output gain

Similarity Characterization (Target, Source): …
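
For a linear state-space model x(k+1) = A x(k) + B u(k), y(k) = C x(k) with relative degree r, the input-output equation is y(k+r) = C A^r x(k) + C A^(r-1) B u(k), so C A^r acts as a state-to-output gain and C A^(r-1) B as an input-to-output gain. The sketch below computes both for two hypothetical systems; the matrices are made up, and the comparison quantities at the end are illustrative assumptions rather than the similarity measure defined on the slide.

```python
import numpy as np

def io_gains(A, B, C):
    """Return (r, C A^r, C A^(r-1) B) for a SISO discrete-time linear system."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.asarray(B, dtype=float).reshape(-1, 1)
    C = np.asarray(C, dtype=float).reshape(1, -1)
    CA = C.copy()  # holds C A^(r-1) at iteration r
    for r in range(1, A.shape[0] + 1):
        gain_u = (CA @ B).item()
        if abs(gain_u) > 1e-12:
            return r, CA @ A, gain_u
        CA = CA @ A
    raise ValueError("relative degree is not well defined")

# Hypothetical source and target closed-loop models.
A_src, B_src, C_src = [[0.0, 1.0], [-0.4, 1.2]], [0.0, 1.0], [1.0, 0.0]
A_tgt, B_tgt, C_tgt = [[0.0, 1.0], [-0.3, 1.0]], [0.0, 0.8], [1.0, 0.0]

r_src, state_gain_src, input_gain_src = io_gains(A_src, B_src, C_src)
r_tgt, state_gain_tgt, input_gain_tgt = io_gains(A_tgt, B_tgt, C_tgt)

# Illustrative comparison only: closer to (1.0, 0.0) means more alike.
input_gain_ratio = input_gain_tgt / input_gain_src
state_gain_gap = float(np.linalg.norm(state_gain_tgt - state_gain_src))
print(r_src, r_tgt, input_gain_ratio, state_gain_gap)
```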

SLIDE 20

Theoretical Results

Higher similarity leads to higher tolerances for learning error

Stability of the Overall Learning-Enhanced Target System

[Diagram: DNN Offline Learning Module (Source System Inverse) and Online Learning Module (Inverse Correction) placed ahead of the Target Baseline Closed-Loop System (Baseline Controller and Plant). Signals: Desired Output, State, DNN Ref., Online Module Ref., Sys. Ref., Actual Output]

Assumptions

  • 1. Input-to-state stable
  • 2. Offline module corresponds to the source inverse
  • 3. Error of the online learning module is bounded as …

[Equation: two cases, each given as "when …"; one is annotated "(i.e., online learning module is not active)"]

SLIDE 21

Experiments

We test our online learning approach on arbitrary hand drawings

[Figure: Samples of Arbitrary Hand-Drawn Test Trajectories]

SLIDE 22

Experiments

We test our online learning approach on arbitrary hand drawings

  • With offline transfer alone: 38% error reduction
  • With online transfer: 67% error reduction

[Figure: Path in the x-z Plane; Trajectories in the x and z Directions. Legend: Desired, Baseline, w/ DNN, w/ Online. Annotation: Compensates for slow response]

[Figure: Transfer from Source to Target]

SLIDE 23

Experiments

We can effectively reduce the amount of data required for training robots

  • With offline transfer alone: 46% error reduction
  • With online transfer: 74% error reduction (comparable to fully-trained DNNs)

[Figure: Transfer from Source to Target]

SLIDE 24

Summary

  • DNN inverse for tracking performance enhancement of single robots [ICRA17, CDC17]
  • Online learning approach for impromptu cross-robot transfer of previously trained DNNs
  • Connection between system similarity and stability of target system enhanced with online learning
  • Performance improvement of 74% with online learning in quadrotor impromptu tracking tasks

SLIDE 25

Summary

  • DNN inverse for tracking performance enhancement of single robots [ICRA17, CDC17]
  • Online learning approach for impromptu cross-robot transfer of previously trained DNNs
  • Connection between system similarity and stability of target system enhanced with online learning
  • Performance improvement of 74% with online learning in quadrotor impromptu tracking tasks

[Figure caption: a light-writing by …]