Synthesis of skilled robotic behaviour through human sensorimotor adaptation
Jan Babič
Jožef Stefan Institute, Slovenia
Well-studied arm reaching
[Figure: arm-reaching trajectories before and after training, with the force field OFF and ON (force field perturbation). Illustration adapted from Milner and Franklin (2005); data from Shadmehr and Wise (2005).]
- Catch trials: the force field is suddenly turned off to reveal the effect of training
- Result: the central nervous system (CNS) forms an internal model that cancels the effect of the force field
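The logic of a catch trial can be illustrated with a toy trial-by-trial learning model. This is a minimal sketch with assumed parameters, not a model from the cited studies: a single internal estimate of the force field is updated in proportion to each trial's error, and a catch trial switches the field off while the estimate is still in place.

```python
def simulate_adaptation(n_trials=30, field=1.0, retention=0.95,
                        learning_rate=0.3, catch_trial=None):
    """Toy single-state model of trial-by-trial force-field adaptation.

    error[n]      = field[n] - estimate[n]   (deviation left uncompensated)
    estimate[n+1] = retention * estimate[n] + learning_rate * error[n]

    All parameter values are hypothetical, not fitted to any data.
    Returns the list of per-trial errors.
    """
    estimate = 0.0          # internal model of the force field
    errors = []
    for n in range(n_trials):
        # Catch trial: the force field is switched off without warning
        applied = 0.0 if n == catch_trial else field
        error = applied - estimate
        errors.append(error)
        estimate = retention * estimate + learning_rate * error
    return errors


if __name__ == "__main__":
    errs = simulate_adaptation(catch_trial=25)
    print(f"first trial:               {errs[0]:+.3f}")
    print(f"last trial before catch:   {errs[24]:+.3f}")
    print(f"catch trial (aftereffect): {errs[25]:+.3f}")
```

The error decays over the training trials, and the catch trial yields an error of opposite sign: the aftereffect that exposes the learned internal model. The names `retention` and `learning_rate` and their values are hypothetical.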
Computational theories
- minimum-jerk kinematic model (Flash & Hogan 1985)
- minimum torque-change model (Uno et al. 1989)
- minimum motor-command-change model (Nakano et al. 1999)
- combination of control-effort minimization ($\int u^2\,dt$) and best performance under signal-dependent noise (Harris & Wolpert 1998)
- stochastic optimal control theory (Todorov & Jordan 2002)
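The minimum-jerk model has a well-known closed form for a point-to-point movement that starts and ends at rest, $x(t) = x_0 + (x_f - x_0)(10\tau^3 - 15\tau^4 + 6\tau^5)$ with $\tau = t/T$. A short NumPy sketch of that profile (the function name and sampling are illustrative choices):

```python
import numpy as np


def minimum_jerk(x0, xf, duration, n_samples=101):
    """Minimum-jerk point-to-point trajectory (Flash & Hogan 1985).

    x(t) = x0 + (xf - x0) * (10 tau^3 - 15 tau^4 + 6 tau^5),  tau = t / duration
    Returns sample times, positions and velocities.
    """
    t = np.linspace(0.0, duration, n_samples)
    tau = t / duration
    shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    # Time derivative of the shape polynomial (chain rule: d tau / dt = 1 / duration)
    dshape = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration
    x = x0 + (xf - x0) * shape
    v = (xf - x0) * dshape
    return t, x, v


t, x, v = minimum_jerk(0.0, 0.3, 1.0)   # 30 cm reach in 1 s
```

The velocity profile is bell-shaped and peaks at mid-movement at $1.875\,(x_f - x_0)/T$.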
What is missing?
- arm-reaching paradigm is too constrained
- any optimality principle for a functional modality of the brain should be viewed as a sub-goal that serves to increase fitness (ecological viewpoint)
- sensorimotor adaptation should be viewed from the wider scope of reinforcement learning with sub-goals
- fitness maximization, injury avoidance, neural energy, memory
dependence, cheap and approximate sub-goal solution, …
Our approach
- to expose sensorimotor control mechanisms and their adaptation to the danger of falling and injury
- unconstrained whole-body motion: squat-to-stand movements
- non-trivial perturbations
- whole-body equivalent of the well-studied arm-reaching motion
- same level of complexity
- but it inherently involves the danger of falling and injury
Babic, J., Oztop, E., & Kawato, M. Human motor adaptation in whole body motion. Scientific Reports, 6, 32868 (2016).
Rueckert, E., Čamernik, J., Peters, J., & Babič, J. Probabilistic Movement Models Show that Postural Control Precedes and Predicts Volitional Motor Control. Scientific Reports, 6, 28455 (2016).
Experimental setup
[Figure: experimental setup. Perturbation generation with a 6-DOF parallel platform; motion capture of marker velocity, platform position and centre of mass (COM); visual feedback display of the base and target COM positions.]
Experiment
Adaptation to perturbations
[Figure: trajectory area / m² and average number of failed trials versus block number (panels A and B); r = .738.]
- Very fast adaptation to perturbations
- Perturbed trajectories remained different from unperturbed trajectories
- Failures correspond to adaptation
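The slides do not spell out how the trajectory area is computed. One plausible reading, used here only as an assumption, is the area enclosed between a trial's COM path in the sagittal plane and a reference path (for example the mean unperturbed trajectory), with both paths expressed as horizontal displacement over vertical COM position. A minimal sketch:

```python
import numpy as np


def resample_on_height(z_samples, x_samples, z_grid):
    """Hypothetical helper: resample a COM path onto a common vertical grid.

    np.interp needs increasing z_samples, which holds for a monotonic
    squat-to-stand rise (an assumption; trim or filter the data otherwise).
    """
    return np.interp(z_grid, z_samples, x_samples)


def trajectory_area(z_grid, x_trial, x_reference):
    """Area (m^2) between a trial's COM path and a reference COM path.

    z_grid      -- common vertical COM positions (m), increasing
    x_trial     -- horizontal COM displacement of the trial at z_grid (m)
    x_reference -- horizontal COM displacement of the reference at z_grid (m)
    Trapezoidal rule on the absolute horizontal difference.
    """
    z = np.asarray(z_grid, dtype=float)
    gap = np.abs(np.asarray(x_trial, dtype=float)
                 - np.asarray(x_reference, dtype=float))
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(z)))


z = np.linspace(0.0, 0.5, 51)
area = trajectory_area(z, np.full_like(z, 0.01), np.zeros_like(z))
```

A constant 1 cm offset over a 0.5 m rise gives 0.005 m². Real trials with different sample counts would first be brought onto one vertical grid, as `resample_on_height` does.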
Adaptation to perturbations
[Figure: COM trajectories of subjects 1–8, horizontal displacement / m versus normalized vertical displacement.]
- Inter-subject consistency in re-optimized trajectories
Adaptation mechanism
[Figure: COM trajectories of subjects 1–8, horizontal displacement / m versus normalized vertical displacement.]
- Catch trials were applied after adaptation had stabilized
- Inter-subject variability in aftereffects
- Active compensation of perturbations
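[Figure: timeline of a trial (start of motion, feedback starts acting, end of motion) aligned with the motion control processes and the measures. Until feedback starts acting, the motion is governed by feedforward mechanisms, quantified by the predictive component measure at 20 ms; the trajectory area measure covers the whole motion, including feedback mechanisms.]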
Predictive Component Measure
$PC = \dot{X}_{\mathrm{COM}}(t) \,/\, F(t), \quad t = 20\ \mathrm{ms}$
- Focus on feedforward mechanisms governing the motion
- How can the motion of the COM be quantified before feedback mechanisms could alter it?
- Introduction of a predictive-response measure
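A minimal sketch of evaluating the predictive component from sampled data, assuming that t is measured from perturbation onset, that X_COM is the horizontal COM position and that F is the perturbation force (the axis unit m⋅s⁻¹⋅N⁻¹ suggests a velocity-to-force ratio). The finite-difference velocity and linear interpolation are illustrative choices:

```python
import numpy as np


def predictive_component(time, x_com, force, t_eval=0.020):
    """PC = dX_COM/dt (t_eval) / F(t_eval), in m s^-1 N^-1.

    time  -- sample times (s), assumed to start at perturbation onset
    x_com -- horizontal COM position (m)
    force -- perturbation force (N), an assumed interpretation of F
    """
    time = np.asarray(time, dtype=float)
    # Central differences on possibly non-uniform sample times
    v_com = np.gradient(np.asarray(x_com, dtype=float), time)
    v_at = np.interp(t_eval, time, v_com)
    f_at = np.interp(t_eval, time, np.asarray(force, dtype=float))
    if np.isclose(f_at, 0.0):
        raise ValueError("force is ~0 at t_eval; PC is undefined")
    return float(v_at / f_at)


time = np.linspace(0.0, 0.1, 101)       # 1 kHz samples over 100 ms
pc = predictive_component(time, 0.5 * time, np.full_like(time, 10.0))
```

With a COM moving at a constant 0.5 m/s under a constant 10 N force, the function returns 0.05 m⋅s⁻¹⋅N⁻¹.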
Inter-subject variability
[Figure: trajectory area / m² against predictive component / m⋅s⁻¹⋅N⁻¹ for subjects 1–8.]
- predictive-response measure is a strong predictor of aftereffects
- subjects used little exploration during the adaptation process
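How strongly one per-subject measure predicts another can be checked with a least-squares line and Pearson's r. The sketch below uses NumPy's `polyfit` and `corrcoef`; it illustrates the computation only and does not reproduce the statistics reported in the study.

```python
import numpy as np


def fit_predictor(predictor, outcome):
    """Least-squares line outcome ~ slope * predictor + intercept, plus Pearson r."""
    predictor = np.asarray(predictor, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    slope, intercept = np.polyfit(predictor, outcome, deg=1)
    r = np.corrcoef(predictor, outcome)[0, 1]
    return float(slope), float(intercept), float(r)
```

Called with one predictive-component value and one trajectory area per subject, an r near ±1 would indicate a strong linear relation between the two measures.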
Catch-trial simulations
[Figure: block diagram of the catch-trial simulation: feedforward controller, feedback controller with gain parameters, perturbation switch, dynamic model of the movement system, joint angles and adapted COM motion.]
[Figure: COM trajectories of subjects 1–8 (two panels), horizontal displacement / m versus normalized vertical displacement.]
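The structure of the block diagram (feedforward and feedback controllers acting on a movement model, with a perturbation switch) can be mimicked by a deliberately simplified model. In this sketch the movement system is a 1-D point mass for the horizontal COM instead of a joint-space model, the feedforward controller is assumed to be perfectly adapted, and all numbers are hypothetical:

```python
import numpy as np


def simulate_trial(perturbation_on, feedforward_on, mass=70.0, kp=500.0,
                   kd=100.0, amplitude=50.0, duration=1.0, dt=0.001):
    """Toy 1-D simulation of horizontal COM motion in one trial.

    Returns the horizontal COM displacement x(t) in metres.
    Mass, gains and the force profile are hypothetical.
    """
    n_steps = int(round(duration / dt))
    x, v = 0.0, 0.0
    trajectory = np.empty(n_steps)
    for k in range(n_steps):
        t = k * dt
        # Hypothetical perturbation force profile (N)
        expected = amplitude * np.sin(np.pi * t / duration)
        f_pert = expected if perturbation_on else 0.0   # perturbation switch
        f_ff = -expected if feedforward_on else 0.0     # adapted feedforward command
        f_fb = -kp * x - kd * v                         # feedback controller (gains)
        a = (f_pert + f_ff + f_fb) / mass               # point-mass movement system
        v += a * dt                                     # semi-implicit Euler step
        x += v * dt
        trajectory[k] = x
    return trajectory


if __name__ == "__main__":
    naive = simulate_trial(perturbation_on=True, feedforward_on=False)
    adapted = simulate_trial(perturbation_on=True, feedforward_on=True)
    catch = simulate_trial(perturbation_on=False, feedforward_on=True)
    for name, traj in [("naive", naive), ("adapted", adapted), ("catch", catch)]:
        print(f"{name:8s} peak |x| = {np.max(np.abs(traj)):.4f} m")
```

With the perturbation on, the adapted feedforward command cancels it; switching the perturbation off (a simulated catch trial) leaves the feedforward command acting alone, so the COM deviates opposite to the naive response, and the feedback gains determine how large that aftereffect becomes.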
Summary
- Very fast adaptation to perturbations
- Perturbed trajectories remained different from unperturbed trajectories
- Inter-subject variability in aftereffects
- predictive-response measure is a strong predictor of aftereffects
- subjects used little exploration during the adaptation process
- Combining Sensorimotor Adaptation and Reinforcement Learning
Skill synthesis for autonomy
- For autonomous operation, the key issue is transferring the control policy learnt by the human to the robot
[Figure: the human acts as an adaptive controller. Human motion (m) passes through a feedforward interface as the motor command (u) to the robot; the robot state (s) returns through a feedback interface to the human sensory system (f).]
Robot learning: learn π: s → u
Robot skill synthesis
Illustration adapted from Milner and Franklin (2005)
Machine learning techniques are more efficient for supervised learning than for unsupervised learning and optimal control problems
human brain + supervised learning >> robot skill generation
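Once the human controls the robot through the interfaces, the recorded pairs of robot state s and motor command u form a supervised data set for π: s → u. A minimal sketch assuming a linear policy fitted by ridge-regularised least squares; the slides do not prescribe the regressor, and the function names are hypothetical:

```python
import numpy as np


def fit_linear_policy(states, commands, ridge=1e-6):
    """Fit u ~ [s, 1] @ W by ridge-regularised least squares.

    states   -- (N, ds) robot states recorded with the human in the loop
    commands -- (N, du) motor commands the human produced in those states
    Returns W with shape (ds + 1, du); the last row is the bias.
    """
    S = np.asarray(states, dtype=float)
    U = np.asarray(commands, dtype=float)
    S1 = np.hstack([S, np.ones((S.shape[0], 1))])   # append a constant bias input
    A = S1.T @ S1 + ridge * np.eye(S1.shape[1])
    return np.linalg.solve(A, S1.T @ U)


def policy(W, state):
    """Learned policy pi: s -> u."""
    return np.append(np.asarray(state, dtype=float), 1.0) @ W
```

A nonlinear regressor, such as the Locally Weighted Regression used in the shared-control method, can replace the linear fit without changing this data flow.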
Body schema is flexible
- Neural representation of the body schema: VIP neurons integrate somatosensory and visual information, with visual receptive fields anchored to the hand/arm of the monkey
- Tool use modifies the body schema (Iriki et al. 1996)
Figure from (Maravita & Iriki 2004)
Shared control for human-robot interacting tasks
[Figure: shared control between robot and human. Machine motor commands from the robot control policy and human motor commands from the human-robot interface are combined by the shared control system into the actual commands; feedback returns to the human through the feedback interface.]
- The method is based on Locally Weighted Regression (LWR)
- The shared control algorithm divides control responsibility between the human demonstrator and the current robotic skill (control policy)
Force Interaction Task
Peternel, L., Oztop, E., & Babic, J. A Shared Control Method for Online Human-in-the-Loop Robot Learning Based on Locally Weighted Regression. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 2016.
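The core prediction step of locally weighted regression can be sketched as follows: a Gaussian kernel weights each training sample by its distance to the query, and a weighted linear fit centred on the query gives the prediction. This is a generic batch version with an assumed kernel and bandwidth, not the incremental implementation of the cited method:

```python
import numpy as np


def lwr_predict(X, Y, x_query, bandwidth=0.5, ridge=1e-8):
    """Locally weighted linear regression: predict the output at x_query.

    X -- (N,) or (N, d) training inputs (e.g. robot states)
    Y -- (N,) or (N, k) training outputs (e.g. motor commands)
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Y = np.asarray(Y, dtype=float)
    xq = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Gaussian kernel weight of each sample, based on distance to the query
    w = np.exp(-0.5 * np.sum((X - xq) ** 2, axis=1) / bandwidth ** 2)
    # Inputs centred on the query plus a bias column: the intercept is the prediction
    Xc = np.hstack([X - xq, np.ones((X.shape[0], 1))])
    WXc = Xc * w[:, None]
    A = Xc.T @ WXc + ridge * np.eye(Xc.shape[1])   # sum_i w_i x_i x_i^T
    beta = np.linalg.solve(A, WXc.T @ Y)           # weighted least-squares solution
    return beta[-1]
```

For exactly linear data the prediction is exact; for curved data, the bandwidth trades bias against variance.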
Evolution of robot adaptation
Peternel, L., Petric, T., & Babic, J. Human-in-the-loop approach for teaching robot assembly tasks using impedance control interface. IEEE International Conference on Robotics and Automation (ICRA), Seattle, USA, 2015. p. 1497–1502.
Force Interaction Task
Reactive postural control
Peternel, L., Babic, J. Humanoid robot posture-control learning in real-time based on human sensorimotor learning ability. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 2013. p. 5309-5314.
- The influence weighting algorithm calculates the mean square error (MSE) between the human reaction and the predicted reaction over a period T during the demonstration.
- The maximum MSE is set as a reference for the weighting criterion:
  $C = \mathrm{MSE}_{\mathrm{total}} / \mathrm{MSE}_{\mathrm{max}}$
- The criterion is used to weight the influence of the human against the influence of the autonomous controller.
- The output that controls the robot is calculated by:
  $y = C\,y_{\mathrm{human}} + (1 - C)\,y_{\mathrm{predicted}}$
- When the MSE does not improve over N periods, the algorithm disconnects the human from the control loop.
- At that point the robot is considered trained.
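The influence-weighting rules can be collected into a small class. A minimal sketch, assuming one `update` call per period T, that "improvement" means a strictly lower MSE than the best so far, and that the human has full control before the first period; the class and parameter names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class InfluenceWeighting:
    """Human/robot blending:
        C = MSE_total / MSE_max
        y = C * y_human + (1 - C) * y_predicted
    `patience` plays the role of N; its default value is hypothetical.
    """
    patience: int = 5
    mse_max: float = 0.0
    best_mse: float = float("inf")
    stale_periods: int = 0
    c: float = 1.0               # assumption: human in full control at the start
    human_connected: bool = True

    def update(self, human, predicted):
        """Process one period T of recorded reactions (non-empty); return C."""
        mse = sum((h - p) ** 2 for h, p in zip(human, predicted)) / len(human)
        self.mse_max = max(self.mse_max, mse)        # reference for the criterion
        self.c = mse / self.mse_max if self.mse_max > 0 else 0.0
        if mse < self.best_mse:                      # MSE improved
            self.best_mse = mse
            self.stale_periods = 0
        else:
            self.stale_periods += 1
            if self.stale_periods >= self.patience:  # no improvement for N periods
                self.human_connected = False         # robot is considered trained
        return self.c

    def blend(self, y_human, y_predicted):
        """Command sent to the robot."""
        c = self.c if self.human_connected else 0.0
        return c * y_human + (1.0 - c) * y_predicted
```

Once `human_connected` is false, `blend` returns the learned controller's output alone.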
Responsibility transfer
Human – Robot Physical Collaboration
Peternel, L., Petric, T., Oztop, E., Babic, J. Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach. Autonomous Robots, 2014, vol. 36, p. 123-136.
- Two-layered imitation system
  - first layer extracts the frequency (canonical dynamical system)
  - second layer learns the waveform (output dynamical system)
- The waveform is learned in real time
- Adaptations: frequency, phase, amplitude
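The two-layer structure can be sketched in simplified form. Here the frequency that the first layer would extract is assumed to be known, so the canonical system only integrates the phase; the second layer represents one period of the waveform as a normalised sum of von Mises kernels over phase and updates the weights online with a normalised gradient step. Kernel type, count and learning rate are assumptions, and the cited work may use different components:

```python
import math


class PeriodicWaveformLearner:
    """Second layer: online learning of one period of a waveform versus phase.

    y_hat(phi) = sum_i psi_i(phi) w_i / sum_i psi_i(phi)
    psi_i(phi) = exp(h * (cos(phi - c_i) - 1))   (von Mises kernels)
    """

    def __init__(self, n_kernels=20, width=10.0, learning_rate=0.5):
        self.centers = [2.0 * math.pi * i / n_kernels for i in range(n_kernels)]
        self.width = width
        self.learning_rate = learning_rate
        self.weights = [0.0] * n_kernels

    def _activations(self, phi):
        return [math.exp(self.width * (math.cos(phi - c) - 1.0)) for c in self.centers]

    def predict(self, phi):
        psi = self._activations(phi)
        return sum(p * w for p, w in zip(psi, self.weights)) / sum(psi)

    def update(self, phi, y_demo):
        """Normalised gradient step towards the demonstrated value at phase phi."""
        psi = self._activations(phi)
        total = sum(psi)
        error = y_demo - self.predict(phi)
        for i, p in enumerate(psi):
            self.weights[i] += self.learning_rate * (p / total) * error
        return error


def learn_online(learner, demo, frequency_hz, dt, duration):
    """Stand-in for the first layer: integrate the phase at a known frequency."""
    omega = 2.0 * math.pi * frequency_hz
    phi = 0.0
    for _ in range(int(round(duration / dt))):
        learner.update(phi, demo(phi))
        phi = (phi + omega * dt) % (2.0 * math.pi)   # canonical system: dphi/dt = omega
    return learner


demo = lambda phi: math.sin(phi) + 0.3 * math.cos(2 * phi)   # synthetic demonstration
learner = learn_online(PeriodicWaveformLearner(), demo, 0.5, 0.01, 60.0)
```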
Autonomy
Co-adaptive control of exoskeletons
- Tight interconnection between human and exoskeleton
- Human adapts muscular activation to the exoskeleton assistance
- Exoskeleton adapts to human motion
Peternel, L., Noda, T., Petric, T., Ude, A., Morimoto, J., Babic, J. Adaptive control of exoskeleton robots for periodic assistive behaviours based on EMG feedback minimisation. PLoS ONE, 2016, vol. 11, no. 2.
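The idea of exoskeleton adaptation driven by EMG feedback minimisation can be illustrated with a toy iterative rule. This sketch is hypothetical throughout: the human is assumed to supply instantly whatever joint torque the exoskeleton does not, a signed EMG proxy is taken to equal that muscle torque, and each cycle the exoskeleton adds a fraction `gain` of the proxy to its periodic assistance profile:

```python
import math


def co_adaptation(required_torque, gain=0.3, cycles=10):
    """Toy per-cycle update of a periodic assistive torque profile.

    required_torque -- joint torque (N m) needed in each phase bin of one cycle
    Returns the mean absolute EMG proxy recorded in each cycle.
    """
    assist = [0.0] * len(required_torque)
    history = []
    for _ in range(cycles):
        # Hypothetical human contribution = torque the exoskeleton does not supply
        emg = [req - a for req, a in zip(required_torque, assist)]
        history.append(sum(abs(e) for e in emg) / len(emg))
        # Exoskeleton takes over part of the remaining muscle effort in every bin
        assist = [a + gain * e for a, e in zip(assist, emg)]
    return history


profile = [10.0 * math.sin(2 * math.pi * i / 50) for i in range(50)]
history = co_adaptation(profile)
```

Under these assumptions the mean EMG proxy shrinks by a factor (1 − gain) per cycle; in the real co-adaptive setting, the human's own muscular adaptation interacts with this update.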