PhD thesis proposal
Presented by: Leonel D. Rozo C.
Advisors: Carme Torras, Pablo Jiménez
Barcelona, Spain
September 29th, 2008
Robot coaching of manipulation tasks using haptics and vision
Main objective
To provide robots with manipulation skills acquired
Specific objectives
To analyze (and adapt) different learning algorithms based
those that best suit the manipulation task features.
Incremental learning
Fast learning
Robust learning
To identify the relevant features in the manipulation tasks
from sensory information, with the aim of including them as input to the learning stage.
What to imitate?
To develop a set-up where robot learning of manipulation
tasks by demonstration will take place. It will be composed of a robot (the learner) teleoperated through a haptic device driven by a human user (the coach).
To fuse haptic and visual information for improving and
speeding up the learning stage.
Introduction
Why should robots learn?
Two main approaches exist for endowing robots with learning capabilities:
Self-learning
Learning from examples
LbD – History and concepts
Learning by demonstration
Symbolic approaches
Exact reproduction of the demonstrated task (playback)
State-action-state representation
Unsuitable approach when uncertainty appears
If-then rules
(A. Billard et al. 2008)
Machine learning inclusion in programming by demonstration
Supervised methods
A training dataset composed of labelled inputs and desired outputs is given.
Goal: given a new input, to predict its corresponding output
Some methods are:
Artificial neural networks
Decision trees
Bayesian statistics
Gaussian process regression
Nearest neighbour
Support vector machines
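The supervised setting described above (labelled inputs, desired outputs, prediction for a new input) can be illustrated with the simplest method in the list, nearest neighbour. This is a minimal sketch, not code from the thesis: the function name, the (position, force) readings and the contact-state labels are all hypothetical.

```python
import math


def nearest_neighbour_predict(train_inputs, train_outputs, query):
    """Return the output paired with the training input closest to `query`."""
    best_index = min(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], query),  # Euclidean distance
    )
    return train_outputs[best_index]


# Hypothetical toy data: (position, force) readings labelled with a contact state.
train_inputs = [(0.0, 0.1), (0.5, 2.0), (1.0, 5.0)]
train_outputs = ["free motion", "contact", "insertion"]

# Predict the label of a new, unseen reading.
prediction = nearest_neighbour_predict(train_inputs, train_outputs, (0.6, 2.2))
```

The query is closest to the second training sample, so the prediction takes that sample's label.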
Unsupervised methods
An input dataset is presented but no feedback about it is given
Goal: finding a representation of particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns
Imitation learning
What is imitation?
Biological inspiration
Learning to do an act from witnessing it (Thorndike).
Robotics
Imitation takes place when an agent learns a behaviour from observing the execution of that behaviour by a teacher (Bakker and Kuniyoshi, 1996).
Current challenges
(P. Bakker & Y. Kuniyoshi, 1996)
Movement primitives (MP)
Inductive approach
MP are sequences of actions that accomplish a complete goal-directed behaviour and allow a compact state-action representation (S. Schaal, 1999).
Movement primitives (MP)
Biological inspiration
A behaviour-based control approach (Mataric)
To use a control system based on a set of behaviours (MP), which are real-time processes that take inputs from sensors or other behaviours and send output commands to effectors or other behaviours.
How to interpret and understand observed behaviours?
How to integrate the perception and motion control systems to reconstruct what was observed?
(Computational Neuroscience and Humanoid Robotics Department, ATR laboratories)
Control policies
The motor control problem can be conceived as finding a task-specific
control policy
Imitation learning can be defined as the problem of how control policies can
be learned by observing a demonstration:
Imitation by direct policy learning
Imitation by learning policies from demonstrated trajectories
Imitation by model-based policy learning
Policy equation labels: motor commands, policy, states, algorithm parameters
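The four labels above appear to annotate a policy equation that did not survive extraction. One common way to write such a policy is given here as an assumption, not as the slide's exact notation: the motor commands u are produced by a policy π from the states x, with α standing for the adjustable algorithm parameters. All symbols are chosen for illustration.

```latex
\[
\mathbf{u} = \pi(\mathbf{x}, \boldsymbol{\alpha})
\]
```

Under this reading, the three imitation approaches listed above differ in how the demonstration is used to obtain π or its parameters α.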
What to imitate? – Learning invariances over demonstrations
Finding those features of the task that are relevant to the reproduction
Those that appear most consistently across different demonstrations of the task, i.e., the invariants in time (Billard et al., 2004)
Categorization of the human actions (Dillmann, 2004):
Performative
Commenting
Commanding
Imitation task: observation process, execution process (Dillmann, 2004)
Improving imitation learning
A task learned from imitation can be improved, corrected or refined
in two ways:
By using reinforcement learning
The given demonstrations confine the search in the state-action space to a smaller subspace, which means RL is focused on those areas covered by the demonstration data
This approach is based on a self-improvement process, where the robot improves the learned skill by interacting with its environment
(A. Billard et al. 2008)
By using active teaching
The learned action from imitation is corrected or refined through teacher’s support
The information goes from the teacher to the robot
The information flow is bidirectional because a social activity is being carried out
toward benchmarks for improved learning. Interaction Studies, 8(3):441-464, 2007.
Incremental learning
Whenever new data are generated, these should be included in the learning framework:
New demonstrations
Corrections
Refinements
It is necessary to work with learning algorithms that meet at least the following requirements:
Online learning
Inexpensive computations
Robustness against the interference problem
Fast learning in high-dimensional state-action spaces
Locally weighted learning
LWL methods approximate nonlinear functions by means of piecewise linear
models
Memory-based
Locally weighted regression – LWR
Locally weighted partial least squares - LWPLS
Non-memory-based
Receptive field weighted regression – RFWR
Locally weighted projection regression – LWPR
LWPR is an incremental learning algorithm that is able to deal with high-dimensional data streams. In addition, it is computationally cheap and numerically robust.
Shortcoming: too many open parameters to be manually tuned
(S. Schaal & C. Atkeson, 1998) (S. Vijayakumar & S. Schaal, 2000)
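The idea behind locally weighted regression, approximating a nonlinear function with a linear model fitted around each query point, can be sketched as follows. This is a minimal memory-based illustration for scalar inputs, not the implementation used in the cited works. The Gaussian kernel, the `bandwidth` parameter and the function name are assumptions made for the example. The bandwidth is also a concrete instance of an open parameter that has to be tuned by hand.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.2):
    """Locally weighted linear regression for scalar inputs.

    Fits a weighted straight line around `query` (Gaussian kernel weights)
    and returns the value of that line at `query`.
    """
    weights = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    cov_xy = sum(
        w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys)
    )
    # Fall back to a locally constant model if the weighted inputs are degenerate.
    slope = cov_xy / var_x if var_x > 1e-12 else 0.0
    return mean_y + slope * (query - mean_x)


# Stored training data (memory-based: all samples are kept).
xs = [i * 0.1 for i in range(64)]
ys = [math.sin(x) for x in xs]

estimate = lwr_predict(xs, ys, query=1.5)
```

Because every stored sample is revisited for each query, the cost grows with the amount of data, which is the motivation for the non-memory-based variants listed above.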
LWL-based Bayesian learning
These methods address the problem of manually tuning the open
parameters in LWL algorithms
Bayesian locally weighted regression – BLWR
It treats all open parameters probabilistically and learns the appropriate local regime for each linearization problem, based on the LWR approach.
It is a Bayesian formulation of spatially local adaptive kernels for LWR
Randomly varying coefficient – RVC
Probabilistic method based on the paradigm of Bayesian probabilistic online learning
It treats each open parameter in LWPR as a probability distribution
Gaussian processes
Incremental GMM
Direct update method
It is based on the temporal coherence properties of data streams
It is assumed that the data vary smoothly in time, so that the GMM parameters can be adjusted when new data are observed
Reformulating the problem for a generic observation of multiple datapoints
Generative method
It uses Expectation-Maximization performed on data generated by GMR
Sparse online Gaussian processes - SOGP
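The common thread of the incremental methods above is that model parameters are adjusted as each new sample arrives, without storing or revisiting earlier data. The sketch below shows that principle in its simplest form, using Welford's running update for a single scalar Gaussian. It is not the direct update method or the generative method for GMMs named above, and the class name and sensor values are hypothetical.

```python
class RunningGaussian:
    """Mean and variance of a scalar stream, updated one sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._sum_sq_dev = 0.0  # running sum of squared deviations from the mean

    def update(self, sample):
        """Fold one new sample into the estimate (Welford's algorithm)."""
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self._sum_sq_dev += delta * (sample - self.mean)

    @property
    def variance(self):
        """Population variance; zero before any sample has been seen."""
        return self._sum_sq_dev / self.count if self.count else 0.0


model = RunningGaussian()
for force_reading in [1.0, 2.0, 3.0, 4.0]:  # hypothetical sensor values
    model.update(force_reading)
```

Each update costs constant time and memory, which matches the online-learning and inexpensive-computation requirements stated earlier.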
Coaching
It can be divided into two processes:
Imitation learning
Observation
Execution
Active teaching
Observation and evaluation
Corrections and refinements
It allows ...
to acquire new knowledge
to focus attention on relevant task features
to give a strategy for correction
to help iteratively define the characteristics of a successful outcome
(A. Billard et al. 2008)
LbD – Entire systems
Systems based on vision
Manipulation tasks
Playing air hockey
Gestures
Human motion
Optimization criteria
Bayesian methods
HMM
PCA
Gaussian processes
Learning basketball official’s signals
Motion sensors
Preprocessing stage by using PCA
Actions are encoded in a probabilistic way by using GMM
GMR is applied for reconstructing a general form for the signals
in a humanoid robot. 2007
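The GMR step mentioned above can be written compactly. The following is the standard form of Gaussian mixture regression, given as a sketch. The notation is an assumption made for illustration: a joint GMM over an input x and an output y with K components, priors p_k, means (μ_k^x, μ_k^y) and covariance blocks Σ_k^{xx}, Σ_k^{yx}.

```latex
\[
h_k(x) = \frac{p_k \,\mathcal{N}(x;\ \mu_k^{x}, \Sigma_k^{xx})}
              {\sum_{j=1}^{K} p_j \,\mathcal{N}(x;\ \mu_j^{x}, \Sigma_j^{xx})},
\qquad
\hat{y}(x) = \sum_{k=1}^{K} h_k(x)\left[\mu_k^{y}
  + \Sigma_k^{yx}\left(\Sigma_k^{xx}\right)^{-1}\left(x - \mu_k^{x}\right)\right]
\]
```

Each component contributes a local linear estimate of y, and the responsibilities h_k(x) blend these estimates into one smooth generalized signal.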
Systems based
Assembly tasks
Virtual environments
Optimization criteria
Fuzzy logic
HMM
LWR
Neural networks
Virtual environments
Learning the peg-in-hole insertion task
Virtual scene where user manipulates the peg by moving a haptic device
A preprocessing stage based on Dillmann's criteria was carried out to remove noise from the training data
The position and orientation of the peg, and the forces/torques generated, compose the input training data
An HMM was applied to identify and estimate the contact states
During the physical implementation, LWR was used for learning the trajectory in each state of the insertion procedure
(S. Dong & F. Naghdy, 2007)
Teleoperated systems
Manipulation tasks
Grasping tasks
Playing soccer
Robot-assisted surgery
Decision trees
Nearest neighbor
Bayesian methods
Manifold learning
Neural networks
Gaussian processes
LWL
LWPR
Scaffolding
Haptically guided teleoperation for learning manipulation tasks
A mobile robot manipulator with a camera placed on its end-effector
The human user teleoperates the robot through a haptic device
He/she is haptically guided by using information provided by the vision system
Hierarchical feedforward neural networks are used in the learning stage
Backpropagation
Learning to play soccer
Robot dog learns a variety of tasks related to robot soccer
Training data is composed of vision and proprioceptive information
Learning process is carried out by using:
LWPR
SOGP
(A. Howard & C. Park, 2007) (D. Grollman & O. Jenkins, 2008)
Columns: Vision (Robot, Coach), Haptics (Robot, Coach)
Bentivegna et al. (2004): X X(p)
Bentivegna et al. (2002): X X
Calinon et al. (2007-1): X X
Calinon et al. (2007-2): X X
Calinon et al. (2007-3): X X X(p)
Chen et al. (2002): X X X
Dong et al. (2007): X X X
Fernández et al. (2003): X X
Grollman et al. (2008): X X X(p)
Howard et al. (2007): X X X(p) X
Jenkins et al. (2006): X X
Kaiser et al. (1996): X X
Kang et al. (1995): X X X
Kuniyoshi et al. (1994): X X
Lockerd et al. (2004): X X
Mayer et al. (2008): X X X
Peters II et al. (2003): X X(p)
Riley et al. (2006): X X
Shon et al. (2007): X X
To exploit haptic and visual information fusion in robot
Providing the robot
with haptic and vision senses for improving and speeding up the learning stage
Haptics and vision will be the information sources for generating
the training data
Providing the human user with haptic feedback for getting or
improving a bidirectional information flow
Better samples of the tasks
Providing the coach
with virtual guides through haptic feedback
The demonstrations will be restricted to those that are relevant to
the task
To develop a learning algorithm that complies with all
Both haptic and vision information will be taken into account
in the input training data
What to imitate?
Providing a suitable incremental learning algorithm for the
coaching structure
When new samples are given
When corrections and refinements are carried out
How to imitate?
1. To develop a setup composed of a robot teleoperated by a human provided with visual and haptic feedback
2. To review the state of the art on learning by demonstration
3. To study and implement incremental learning algorithms
4. To propose an incremental learning algorithm for the Coaching framework
5. To study and implement robot Coaching in manipulation tasks
6. To propose a suitable robot Coaching framework for learning manipulation skills
7. To carry out experimental tests
8. To write and defend the PhD thesis
Most learning by demonstration frameworks do not allow the demonstrated task to be improved
Reinforcement learning
Active teaching
Incremental learning methods are necessary to achieve a suitable coaching framework
It is necessary to take the manipulation task features into account
Fast learning
Robust learning
Few works have studied and integrated vision and haptics in a perception system for both the robot and the coach
Improving the bidirectional information flow
Relevant variables in manipulation can be taken into account (Forces, torques, positions, velocities, etc.)