Synthesis of skilled robotic behaviour through human sensorimotor - PowerPoint PPT Presentation

Synthesis of skilled robotic behaviour through human sensorimotor adaptation
Jan Babič, Jožef Stefan Institute, Slovenia

Well studied arm-reaching
[Figure: arm-reaching trajectories with force field OFF and force field ON, before training and after training with the force-field perturbation. Data from Shadmehr and Wise (2005).]
• Catch trials: suddenly turn off the force field to see the effect of training
• Results: the central nervous system forms an internal model to nullify the effect of the force field
Illustration adapted from Milner and Franklin (2005)

Computational theories
• kinematic minimal jerk model (Flash & Hogan 1985)
• minimal torque change model (Uno et al. 1989)
• minimal motor command change model (Nakano et al. 1999)
• combination of control minimization ($\int u^2$) and best performance with signal dependent noise (Harris & Wolpert 1998)
• stochastic optimal control theory (Todorov & Jordan 2002)
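As a concrete instance of the first model in the list: for a point-to-point reach with zero velocity and acceleration at both ends, the minimum-jerk trajectory has a well-known closed form, a fifth-order polynomial in normalized time. The sketch below is illustrative only; the function name and the example numbers are assumptions, not taken from the slides.

```python
def minimum_jerk(x0, xf, duration, t):
    """Position at time t on a minimum-jerk point-to-point trajectory.

    Assumes zero velocity and acceleration at the start and at the end
    of the movement (the usual boundary conditions for this model).
    """
    tau = min(max(t / duration, 0.0), 1.0)       # normalized time, clipped to [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # smooth blend from 0 to 1
    return x0 + (xf - x0) * s


# Hypothetical 0.3 m reach lasting 0.5 s, sampled at five instants
samples = [minimum_jerk(0.0, 0.3, 0.5, 0.5 * i / 4) for i in range(5)]
```

The profile is symmetric in time, so the midpoint of the movement is reached at exactly half the duration.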

What is missing? • arm-reaching paradigm is too constrained • any optimality principle for a functional modality of the brain should be a suboptimal goal to increase fitness (ecological viewpoint) • sensorimotor adaptation from a wider scope of reinforcement learning with subgoals • fitness maximization, injury avoidance, neural energy, memory dependence, cheap and approximate sub-goal solution, …

Our approach
• to expose sensorimotor control mechanisms and the adaptations to danger of falling and injury
• unconstrained whole body motion – squat to standing movements
• non-trivial perturbations
• whole body equivalent to well-studied arm-reaching motion
• same level of complexity
• but, it inherently involves the danger of falling and injury
Babic, J., Oztop, E., Kawato, M. Human motor adaptation in whole body motion. Nature PG: Scientific Reports, 6, 32868 (2016).
Rueckert, E., Čamernik, J., Peters, J., & Babič, J. Probabilistic Movement Models Show that Postural Control Precedes and Predicts Volitional Motor Control. Nature PG: Scientific Reports, 6, 28455 (2016).

Experimental setup
[Figure labels: Display (visual feedback with target COM position and base COM position), Motion capture (COM, marker velocity), 6DOF parallel platform (perturbation generation, platform position)]

Experiment

Adaptation to perturbations
[Figure, panel A "Trajectory Area": trajectory area / m² vs. block number. Panel B "Failed Trials": average number of failed trials vs. block number. Annotations: *, r = .738]
• Very fast adaptation to perturbations
• Perturbed trajectories remained different to unperturbed trajectories
• Failures correspond to adaptation

Adaptation to perturbations
[Figure: eight panels, subjects 1 to 8; normalized vertical displacement / m vs. horizontal displacement / m]
• Inter-subject consistency in re-optimized trajectories

Adaptation mechanism
[Figure: eight panels, subjects 1 to 8; normalized vertical displacement / m vs. horizontal displacement / m]
• Catch trials after adaptation stabilized
• Inter-subject variability in aftereffects
• Active compensation of perturbations

Predictive Component Measure
• Focus on feed-forward mechanisms governing the motion
• How to quantify motion of COM before the feedback mechanisms could alter the motion?
• Introduction of predictive-response measure: $PC = \frac{\dot{X}_{COM}(t)}{F(t)}$, $t = 20\ \mathrm{ms}$
[Diagram: timeline from start of motion, through 20 ms and the point where feedback starts acting, to end of motion. Motion control processes: feedforward mechanisms, feedback mechanisms. Measures: predictive component measure, trajectory area measure.]
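A minimal sketch of how such a measure could be computed from recorded data. It reads the slide's formula as COM velocity divided by perturbation force 20 ms after the start of motion, which matches the units m·s⁻¹·N⁻¹ used for the predictive component elsewhere in the deck. The input layout, the central-difference velocity estimate and all names are assumptions made for illustration.

```python
def predictive_component(com_x, force, dt, t_eval=0.020):
    """Ratio of COM velocity to perturbation force at t_eval seconds.

    com_x : COM positions in metres along the perturbed direction,
            sampled every dt seconds from the start of motion
            (hypothetical input layout).
    force : perturbation force in newtons at the same sample times.
    """
    k = round(t_eval / dt)  # index of the sample closest to t_eval
    if k < 1 or k + 1 >= len(com_x):
        raise ValueError("recording too short for a central difference")
    if force[k] == 0:
        raise ValueError("perturbation force is zero at t_eval")
    velocity = (com_x[k + 1] - com_x[k - 1]) / (2 * dt)  # central difference
    return velocity / force[k]  # units: m s^-1 N^-1


# Synthetic example: COM moving at a constant 0.1 m/s under a 50 N perturbation
dt = 0.005
com_x = [0.1 * i * dt for i in range(10)]
force = [50.0] * 10
pc = predictive_component(com_x, force, dt)
```

Evaluating at 20 ms keeps the measure inside the window before feedback can act, which is the point of the slide's timeline.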

Inter-subject variability
[Figure: scatter plot of trajectory area / m² vs. predictive component / m·s⁻¹·N⁻¹, one point per subject (1 to 8)]
• predictive-response measure is a strong predictor of aftereffects
• subjects used little exploration during the adaptation process

Catch-trial simulations
[Block diagram labels: Adapted COM Motion, Feedforward Controller, Feedback Controller, Gain Parameters, Dynamic Model of Movement System, Joint Angles, Perturbation Switch]
[Figure: two sets of eight panels, subjects 1 to 8; normalized vertical displacement / m vs. horizontal displacement / m]

Summary
• Very fast adaptation to perturbations
• Perturbed trajectories remained different to unperturbed trajectories
• Inter-subject variability in aftereffects
• predictive-response measure is a strong predictor of aftereffects
• subjects used little exploration during the adaptation process
• Combining Sensorimotor Adaptation and Reinforcement Learning

Skill synthesis for autonomy
• For autonomous operation, the key issue is transferring the control policy learnt by the human to the robot
[Diagram labels: Human, Robot, Feedforward Interface, Feedback Interface, Human Motion (m), Motor command (u), Robot state (s), Feedback to human sensory system (f), Learning: ~Adaptive Controller, Learn π: s → u]

Robot skill synthesis
• machine learning techniques are more efficient for supervised learning than for unsupervised learning and optimal control problems
• human brain + supervised learning >> robot skill generation
Illustration adapted from Milner and Franklin (2005)

Body schema is flexible
Figure from (Maravita & Iriki 2004)
• representation for body schema: VIP neurons integrate somatosensory and visual information with visual receptive fields anchored to the hand/arm of the monkey
• Tool use modifies the body schema (Iriki et al. 1996)

Shared control for human-robot interacting tasks
[Block diagram labels: Human, Robot, Control policy, Shared human-robot control interface, Feedback interface; signals: human motor commands, machine motor commands, actual system commands, robot feedback, human feedback]
• The method is based on Locally Weighted Regression (LWR)
• Shared control algorithm delegates the control responsibility between human demonstrator and current robotic skill (control policy)
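The slide names Locally Weighted Regression as the basis of the method but gives no details. As a generic illustration of the technique, not the authors' implementation, the sketch below predicts an output at a query point by fitting a line to nearby samples weighted with a Gaussian kernel. The one-dimensional setting, the bandwidth value and all names are assumptions.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=0.1):
    """Locally weighted linear regression prediction at x_query (1-D)."""
    # Gaussian kernel weights centred on the query point
    w = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("query point is too far from all samples")
    # Weighted means of inputs and outputs
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted slope of the local line through the weighted means
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x_query - mx)


# Hypothetical demonstration data: state -> motor command
states = [i / 10 for i in range(11)]
commands = [2.0 * s + 1.0 for s in states]
u = lwr_predict(states, commands, 0.35)
```

Because only samples near the query carry weight, new demonstrations change the prediction locally, which suits a policy that is refined while the human is still in the loop.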

Force Interaction Task
Peternel, L., Oztop, E., & Babic, J. A Shared Control Method for Online Human-in-the-Loop Robot Learning Based on Locally Weighted Regression. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 2016.

Evolution of robot adaptation

Force Interaction Task
Peternel, L., Petric, T., & Babic, J. Human-in-the-loop approach for teaching robot assembly tasks using impedance control interface. IEEE International Conference on Robotics and Automation (ICRA), Seattle, USA, 2015. p. 1497–1502.

Reactive postural control
Peternel, L., Babic, J. Humanoid robot posture-control learning in real-time based on human sensorimotor learning ability. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 2013. p. 5309–5314.

Responsibility transfer
• The influence weighting algorithm calculates the mean square error (MSE) between the human reaction and the predicted reaction over a period T during the demonstration.
• The maximum MSE is set as a reference for the weighting criterion: $C = \frac{MSE_{total}}{MSE_{max}}$
• The criterion is used to weight the human influence and the influence of the autonomous controller.
• The output that is controlling the robot is calculated by: $y = C\,y_{human} + (1 - C)\,y_{predicted}$
• When the MSE does not improve over N periods, the algorithm disconnects the human from the control loop. At that point the robot is considered trained.
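The weighting scheme on this slide can be sketched in a few lines: the criterion C is the ratio of the current MSE to the maximum MSE, and the robot command is the blend C·y_human + (1 − C)·y_predicted. This is a hedged illustration, not the authors' code. Starting under full human control, tracking the maximum MSE as a running maximum, and treating "does not improve" as "not lower than the best MSE so far" are all assumptions, as are the class and method names.

```python
def mse(human, predicted):
    """Mean square error between two equal-length sequences."""
    return sum((h - p) ** 2 for h, p in zip(human, predicted)) / len(human)


class ResponsibilityTransfer:
    """Blend human and autonomous commands using the weighting criterion C."""

    def __init__(self, patience):
        self.patience = patience        # N: periods without improvement before handover
        self.mse_max = 0.0              # reference: largest MSE seen so far (assumption)
        self.best_mse = float("inf")    # lowest MSE seen so far
        self.stalled = 0                # consecutive periods without improvement
        self.c = 1.0                    # assumed start: full human control
        self.trained = False

    def end_of_period(self, human, predicted):
        """Update C from one period T of human and predicted reactions."""
        if self.trained:
            return self.c
        err = mse(human, predicted)
        self.mse_max = max(self.mse_max, err)
        self.c = err / self.mse_max if self.mse_max > 0 else 0.0
        if err < self.best_mse:
            self.best_mse, self.stalled = err, 0
        else:
            self.stalled += 1
        if self.stalled >= self.patience:
            # Human is disconnected from the loop; the robot is considered trained
            self.c, self.trained = 0.0, True
        return self.c

    def command(self, y_human, y_predicted):
        """Robot command: C * y_human + (1 - C) * y_predicted."""
        return self.c * y_human + (1.0 - self.c) * y_predicted


rt = ResponsibilityTransfer(patience=2)
c1 = rt.end_of_period([1.0, 1.0], [0.0, 0.0])   # large error: human dominates
c2 = rt.end_of_period([1.0, 1.0], [0.5, 0.5])   # prediction improved: C drops
blended = rt.command(2.0, 0.0)
```

As the learned policy's predictions approach the human's reactions, C shrinks and control shifts smoothly toward the autonomous controller before the final handover.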

Responsibility transfer

Human – Robot Physical Collaboration
Peternel, L., Petric, T., Oztop, E., Babic, J. Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach. Autonomous Robots, 2014, vol. 36, p. 123–136.

Autonomy
• Two-layered imitation system
  – First layer extracts the frequency (canonical dynamical system)
  – Second layer learns the waveform (output dynamical system)
• The waveform is learned in real time
• Adaptations:
  – Frequency
  – Phase
  – Amplitude

Co-adaptive control of exoskeletons
• Tight interconnection between human and exoskeleton
• Human adapts muscular activation to the exoskeleton assistance
• Exoskeleton adapts to human motion
Peternel, L., Tomoyuki, N., Petric, T., Ude, A., Morimoto, J., Babic, J. Adaptive control of exoskeleton robots for periodic assistive behaviours based on EMG feedback minimisation. PloS one, 2016, vol. 11, no. 2.

Evolution of trajectories

In collaboration with:
• Erhan Oztop, OZU, Turkey
• Mitsuo Kawato & Jun Morimoto, ATR, Japan
• Luka Peternel, IIT, Italy
• Tadej Petric, JSI, Slovenia
Funding:
• FP7 CoDyCo
• Horizon 2020 SPEXOR
