Synthesis of skilled robotic behaviour through human sensorimotor adaptation

slide-1
SLIDE 1

Synthesis of skilled robotic behaviour through human sensorimotor adaptation

Jan Babič Jožef Stefan Institute Slovenia

slide-2
SLIDE 2

Well-studied arm-reaching

[Figure: arm-reaching trajectories with the force field OFF and ON, before and after training, under a force field perturbation]

Illustration adapted from Milner and Franklin (2005)
Data from Shadmehr and Wise (2005)

Catch trials: suddenly turn off the force field to see the effect of training

Results: the central nervous system forms an internal model to nullify the effect of the force field
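The catch-trial logic can be illustrated with a toy simulation. The sketch below is not the model used in the cited studies: it assumes a 2-D point mass, a velocity-dependent curl force field, a PD feedback controller and a feedforward term that stands in for the learned internal model. All names, gains and parameter values are hypothetical.

```python
import numpy as np

def simulate_reach(field_on, internal_model_on, b=15.0, mass=1.0,
                   kp=200.0, kd=30.0, duration=0.6, dt=0.001, distance=0.1):
    """Return the signed peak lateral (x) deviation of a reach along y."""
    # Assumed curl field: force = curl @ velocity (pushes sideways while moving forward)
    curl = np.array([[0.0, b], [-b, 0.0]])
    pos = np.zeros(2)
    vel = np.zeros(2)
    peak = 0.0
    for k in range(int(round(duration / dt))):
        phase = np.pi * k * dt / duration
        # Raised-cosine reference from (0, 0) to (0, distance)
        ref_pos = np.array([0.0, 0.5 * distance * (1.0 - np.cos(phase))])
        ref_vel = np.array([0.0, 0.5 * distance * np.pi / duration * np.sin(phase)])
        ref_acc = np.array([0.0, 0.5 * distance * (np.pi / duration) ** 2 * np.cos(phase)])
        # Feedforward command; the "internal model" cancels the expected field force
        u_ff = mass * ref_acc
        if internal_model_on:
            u_ff = u_ff - curl @ ref_vel
        # Feedback command (PD on tracking error)
        u_fb = kp * (ref_pos - pos) + kd * (ref_vel - vel)
        f_field = curl @ vel if field_on else np.zeros(2)
        acc = (u_ff + u_fb + f_field) / mass
        vel = vel + acc * dt
        pos = pos + vel * dt
        if abs(pos[0]) > abs(peak):
            peak = pos[0]
    return peak

before_training = simulate_reach(field_on=True, internal_model_on=False)
after_training = simulate_reach(field_on=True, internal_model_on=True)
catch_trial = simulate_reach(field_on=False, internal_model_on=True)
print(before_training, after_training, catch_trial)
```

In this toy setting the catch trial deviates in the direction opposite to the untrained, perturbed reach, which is the aftereffect that signals an internal model.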

slide-3
SLIDE 3

Computational theories

  • kinematic minimal jerk model (Flash & Hogan 1985)
  • minimal torque change model (Uno et al. 1989)
  • minimal motor command change model (Nakano et al. 1999)
  • combination of control minimization ($\int u^2$) and best performance with signal-dependent noise (Harris & Wolpert 1998)
  • stochastic optimal control theory (Todorov & Jordan 2002)
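As a concrete instance of the first theory in the list, the minimum-jerk solution for a point-to-point movement that starts and ends at rest is a fifth-order polynomial in normalized time. The sketch below evaluates it; the function name and the example amplitude and duration are illustrative assumptions.

```python
import numpy as np

def minimum_jerk(x0, xf, duration, n=101):
    """Minimum-jerk position and velocity for a rest-to-rest movement."""
    t = np.linspace(0.0, duration, n)
    tau = t / duration                      # normalized time in [0, 1]
    pos = x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = (xf - x0) / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return t, pos, vel

# Example: a 10 cm reach in 0.5 s (values chosen for illustration only)
t, pos, vel = minimum_jerk(0.0, 0.1, 0.5)
print(pos[-1], vel.max())
```

The velocity profile is bell-shaped and symmetric, with its peak at mid-movement equal to 1.875 times the average velocity.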
slide-4
SLIDE 4

What is missing?

  • arm-reaching paradigm is too constrained
  • any optimality principle for a functional modality of the brain should be a suboptimal goal to increase fitness (ecological viewpoint)
  • sensorimotor adaptation from a wider scope of reinforcement learning with subgoals
  • fitness maximization, injury avoidance, neural energy, memory dependence, cheap and approximate sub-goal solution, …

slide-5
SLIDE 5

Our approach

  • to expose sensorimotor control mechanisms and the adaptations to danger of falling and injury
  • unconstrained whole body motion – squat to standing movements
  • non-trivial perturbations
  • whole body equivalent to well-studied arm-reaching motion
  • same level of complexity
  • but, it inherently involves the danger of falling and injury

Babic, J., Oztop, E., Kawato, M. Human motor adaptation in whole body motion. Nature PG: Scientific Reports 6, 32868, (2016).

Rueckert, E., Čamernik, J., Peters, J., & Babič, J. Probabilistic Movement Models Show that Postural Control Precedes and Predicts Volitional Motor Control. Nature PG: Scientific Reports, 6, 28455 (2016).

slide-6
SLIDE 6

Experimental setup

[Diagram: perturbation generation with a 6DOF parallel platform (platform position), motion capture (marker velocity), and a visual feedback display showing the COM, the base COM position and the target COM position]

slide-7
SLIDE 7

Experiment

slide-8
SLIDE 8

Adaptation to perturbations

[Figure, panels A and B: Trajectory Area (trajectory area / m² versus block number) and Failed Trials (average number of failed trials versus block number); r = .738]

  • Very fast adaptation to perturbations
  • Perturbed trajectories remained different to unperturbed trajectories
  • Failures correspond to adaptation

slide-9
SLIDE 9

Adaptation to perturbations

[Figure: COM trajectories for subjects 1 to 8; axes: horizontal displacement / m and normalized vertical displacement / m]

  • Inter-subject consistency in re-optimized trajectories
slide-10
SLIDE 10

Adaptation mechanism

[Figure: COM trajectories for subjects 1 to 8; axes: horizontal displacement / m and normalized vertical displacement / m]

  • Catch trials after adaptation stabilized
  • Inter-subject variability in aftereffects
  • Active compensation of perturbations
slide-11
SLIDE 11

Predictive Component Measure

[Timeline diagram. Time: start of motion, feedback starts acting, end of motion. Motion control processes: feedforward mechanisms, feedback mechanisms. Measures: predictive component measure (20 ms), trajectory area measure.]

$PC = \dot{X}_{COM}(t) / F(t), \quad t = 20\,\mathrm{ms}$

  • Focus on feed-forward mechanisms governing the motion
  • How to quantify motion of COM before the feedback mechanisms could alter the motion?
  • Introduction of predictive-response measure
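A minimal sketch of how such a predictive measure could be computed from recorded data. It assumes the measure is the COM velocity 20 ms after motion onset divided by the perturbation force at that instant, which is consistent with the unit m⋅s⁻¹⋅N⁻¹ used for the predictive component in this deck. The function name, argument names and the synthetic data are hypothetical.

```python
import numpy as np

def predictive_component(time_s, com_x_m, force_n, t_eval=0.020):
    """COM velocity at t_eval divided by the perturbation force at t_eval.

    Assumed definition; time_s is measured from motion onset, in seconds.
    """
    com_velocity = np.gradient(com_x_m, time_s)        # numerical derivative of COM position
    v = np.interp(t_eval, time_s, com_velocity)         # velocity at 20 ms
    f = np.interp(t_eval, time_s, force_n)              # force at 20 ms
    return v / f

# Synthetic example: constant COM acceleration of 0.4 m/s^2, constant 40 N perturbation
time_s = np.arange(0.0, 0.1, 0.001)
com_x_m = 0.5 * 0.4 * time_s**2
force_n = np.full_like(time_s, 40.0)
pc = predictive_component(time_s, com_x_m, force_n)
print(pc)
```

Evaluating the measure this early in the movement is meant to capture the feedforward contribution, before feedback could alter the motion.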
slide-12
SLIDE 12

Inter-subject variability

[Figure: predictive component / m⋅s⁻¹⋅N⁻¹ and trajectory area / m² for subjects 1 to 8]

  • predictive-response measure is a strong predictor of aftereffects
  • subjects used little exploration during the adaptation process
slide-13
SLIDE 13

Catch-trial simulations

[Block diagram with blocks: Feedforward Controller, Feedback Controller, Dynamic Model of Movement System, Perturbation Switch; labels: Joint Angles, Gain Parameters, Adapted COM Motion]

[Two figures: COM trajectories for subjects 1 to 8; axes: horizontal displacement / m and normalized vertical displacement / m]

slide-14
SLIDE 14
Summary

  • Very fast adaptation to perturbations
  • Perturbed trajectories remained different to unperturbed trajectories
  • Inter-subject variability in aftereffects
  • predictive-response measure is a strong predictor of aftereffects
  • subjects used little exploration during the adaptation process
  • Combining Sensorimotor Adaptation and Reinforcement Learning

slide-15
SLIDE 15

Skill synthesis for autonomy

  • For autonomous operation, the key issue is transferring the control policy learnt by the human to the robot

[Diagram: the human acts as an adaptive controller, connected to the robot through a feedforward interface and a feedback interface. Signals: human motion (m), motor command (u), robot state (s), feedback to human sensory system (f)]

Robot Learning: Learn π: s → u

slide-16
SLIDE 16

Robot skill synthesis

illustration adapted from Milner and Franklin (2005)

machine learning techniques are more efficient for supervised than unsupervised learning and optimal control problems

human brain + supervised learning >> robot skill generation

slide-17
SLIDE 17

Body schema is flexible

  • representation for body schema: VIP neurons integrate somatosensory and visual information with visual receptive fields anchored to the hand/arm of the monkey

  • Tool use modifies the body schema (Iriki et al. 1996)

Figure from (Maravita & Iriki 2004)

slide-18
SLIDE 18

Shared control for human-robot interacting tasks

[Diagram: HUMAN and ROBOT connected through a HUMAN-ROBOT INTERFACE and a FEEDBACK INTERFACE; the SHARED CONTROL SYSTEM combines HUMAN MOTOR COMMANDS with MACHINE MOTOR COMMANDS from the ROBOT CONTROL POLICY into the ACTUAL COMMANDS; FEEDBACK returns to the human]

  • The method is based on Locally Weighted Regression (LWR)
  • Shared control algorithm delegates the control responsibility between human demonstrator and current robotic skill (control policy)
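For readers unfamiliar with LWR, the sketch below shows a generic locally weighted linear regression that maps a robot state s to a motor command u from demonstration pairs. It is a textbook batch formulation with a Gaussian kernel, not the online implementation from the cited work; the function name, bandwidth, ridge term and synthetic demonstration data are assumptions.

```python
import numpy as np

def lwr_predict(query, states, commands, bandwidth=0.1, ridge=1e-6):
    """Predict a command at `query` with a locally weighted affine fit."""
    query = np.atleast_1d(query).astype(float)
    # Gaussian weights: demonstrations close to the query state count more
    sq_dist = np.sum((states - query) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_dist / bandwidth**2)
    # Affine local model: append a constant column for the offset
    design = np.hstack([states, np.ones((len(states), 1))])
    weighted_design_t = design.T * weights
    gram = weighted_design_t @ design + ridge * np.eye(design.shape[1])
    beta = np.linalg.solve(gram, weighted_design_t @ commands)
    return np.append(query, 1.0) @ beta

# Synthetic "demonstration": 1-D state, command is a smooth nonlinear function of it
states = np.linspace(-1.0, 1.0, 41).reshape(-1, 1)
commands = np.sin(2.0 * states[:, 0])
u = lwr_predict(0.3, states, commands)
print(u, np.sin(0.6))
```

A smaller bandwidth makes the fit more local and reduces bias, at the cost of relying on fewer demonstration samples per query.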

slide-19
SLIDE 19

Force Interaction Task

Peternel, L., Oztop, E., & Babic, J. A Shared Control Method for Online Human-in-the-Loop Robot Learning Based on Locally Weighted Regression. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 2016.

slide-20
SLIDE 20

Evolution of robot adaptation

slide-21
SLIDE 21

Peternel, L., Petric, T., & Babic, J. Human-in-the-loop approach for teaching robot assembly tasks using impedance control interface. IEEE International Conference on Robotics and Automation (ICRA), Seattle, USA, 2015. p. 1497–1502.

Force Interaction Task

slide-22
SLIDE 22

Reactive postural control

Peternel, L., Babic, J. Humanoid robot posture-control learning in real-time based on human sensorimotor learning ability. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 2013. p. 5309-5314.

slide-23
SLIDE 23
Responsibility transfer

  • The influence weighting algorithm calculates the mean square error (MSE) between the human reaction and the predicted reaction over a period T during the demonstration.
  • The maximum MSE is set as a reference for the weighting criterion:

$C = \frac{MSE_{total}}{MSE_{max}}$

  • The criterion is used to weight the human influence and the influence of the autonomous controller.
  • The output that is controlling the robot is calculated by:

$y = C\,y_{human} + (1 - C)\,y_{predicted}$

  • When the MSE does not improve over N periods, the algorithm disconnects the human from the control loop.
  • At that point the robot is considered trained.
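The weighting and hand-over rules on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the criterion C is taken as the MSE of the current period divided by the largest MSE seen so far, the output is the C-weighted blend of the human and predicted reactions, and "does not improve" is read as no new minimum within the last N periods. All function names and the tolerance are hypothetical.

```python
import numpy as np

def influence_weight(y_human, y_predicted, mse_max):
    """Return (C, mse, updated mse_max) for one period of length T."""
    error = np.asarray(y_human, dtype=float) - np.asarray(y_predicted, dtype=float)
    mse = float(np.mean(error**2))
    mse_max = max(mse_max, mse)              # the maximum MSE is the reference
    c = mse / mse_max if mse_max > 0 else 0.0
    return c, mse, mse_max

def blend(c, y_human, y_predicted):
    """Robot command: C weights the human, (1 - C) the autonomous controller."""
    return c * y_human + (1.0 - c) * y_predicted

def should_disconnect(mse_history, n_periods, tol=1e-6):
    """True when the MSE has not improved over the last n_periods periods."""
    if len(mse_history) <= n_periods:
        return False
    best_before = min(mse_history[:-n_periods])
    best_recent = min(mse_history[-n_periods:])
    return best_recent >= best_before - tol

# Early period: the prediction is poor, so the human has full influence
c1, mse1, mse_max = influence_weight([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0.0)
# Later period: the prediction has improved, so the human influence shrinks
c2, mse2, mse_max = influence_weight([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], mse_max)
y = blend(c2, 1.0, 0.5)
print(c1, c2, y)
```

With this reading, responsibility shifts gradually to the autonomous controller as its predictions approach the human reaction.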

slide-24
SLIDE 24

Responsibility transfer

slide-25
SLIDE 25

Human – Robot Physical Collaboration

Peternel, L., Petric, T., Oztop, E., Babic, J. Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach. Autonomous Robots, 2014, vol. 36, p. 123-136.

slide-26
SLIDE 26
Autonomy

  • Two-layered imitation system
    – First layer extracts the frequency (canonical dynamic system)
    – Second layer learns the waveform (output dynamic system)
  • The waveform is learned in real-time
  • Adaptations:
    – Frequency
    – Phase
    – Amplitude

slide-27
SLIDE 27
Co-adaptive control of exoskeletons

  • Tight interconnection between human and exoskeleton
  • Human adapts muscular activation to the exoskeleton assistance
  • Exoskeleton adapts to human motion

Peternel, L., Noda, T., Petric, T., Ude, A., Morimoto, J., Babic, J. Adaptive control of exoskeleton robots for periodic assistive behaviours based on EMG feedback minimisation. PLoS ONE, 2016, vol. 11, no. 2.

slide-28
SLIDE 28

Evolution of trajectories

slide-29
SLIDE 29

In collaboration with:

  • Erhan Oztop, OZU, Turkey
  • Mitsuo Kawato & Jun Morimoto, ATR, Japan
  • Luka Peternel, IIT, Italy
  • Tadej Petric, JSI, Slovenia

Funding: FP7 CoDyCo, Horizon 2020 SPEXOR