http://lasa.epfl.ch
Programming by Demonstration: Some recent challenges
Aude G. Billard, LASA Laboratory, EPFL
Generalizing: Learning a control law
Learning a control law that ensures that you reach the target even if perturbed, and that you follow particular dynamics
Learn a control law from examples
[Figure: demonstrated trajectories in the ($x_1$, $x_2$) plane]
Time-invariant DS with stable attractor:
$\dot{x} = f(x)$
$\dot{x}^* = f(x^*) = 0$
$x^*$: target
Make $N$ observations of the state of the system: $\{x^i, \dot{x}^i\}$, $i = 1 \ldots N$.
Learn $p(x, \dot{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x, \dot{x}; \mu^k, \Sigma^k)$: joint density (mixture of Gaussians) describing the distribution of velocity in state space.
$p(x) \sim \mathcal{N}(x; \mu, \Sigma)$
Compute (analytical expression for $f$):
$f(x) = E\{p(\dot{x} | x)\} = \sum_{k=1}^{K} h_k(x) (A^k x + b^k)$
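The regression step above can be sketched for a one-dimensional state as follows. This is only an illustration under stated assumptions: the 1-D setting, the helper names (`gauss`, `gmr_velocity`) and the hand-picked mixture parameters are hypothetical and do not come from the slides. It uses the standard Gaussian conditioning identities $A^k = \Sigma^k_{\dot{x}x} (\Sigma^k_{xx})^{-1}$ and $b^k = \mu^k_{\dot{x}} - A^k \mu^k_x$, with $h_k(x)$ the normalized responsibility of component $k$.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmr_velocity(x, components):
    """Return E{p(x_dot | x)} = sum_k h_k(x) * (A_k * x + b_k).

    Each component is a dict with prior w, means mu_x and mu_xd, the
    variance of x (s_xx) and the covariance of x_dot with x (s_xdx).
    """
    # h_k(x): prior times marginal likelihood of x, normalized over k.
    weights = [c["w"] * gauss(x, c["mu_x"], c["s_xx"]) for c in components]
    total = sum(weights)
    velocity = 0.0
    for wk, c in zip(weights, components):
        a_k = c["s_xdx"] / c["s_xx"]        # A_k = Sigma_xdx * Sigma_xx^-1
        b_k = c["mu_xd"] - a_k * c["mu_x"]  # b_k = mu_xd - A_k * mu_x
        velocity += (wk / total) * (a_k * x + b_k)
    return velocity

# Hypothetical two-component model, values chosen by hand (target at x* = 0).
# Both components satisfy mu_xd = A_k * mu_x, so b_k = 0 and f(0) = 0.
components = [
    {"w": 0.5, "mu_x": 1.0, "mu_xd": -1.0, "s_xx": 0.5, "s_xdx": -0.5},
    {"w": 0.5, "mu_x": 3.0, "mu_xd": -6.0, "s_xx": 1.0, "s_xdx": -2.0},
]

for x in (-1.0, 0.0, 2.0):
    print(x, gmr_velocity(x, components))
```

With these values the predicted velocity always points toward the origin, because every local slope $A^k$ is negative and every offset $b^k$ is zero.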
Determine the parameters of the density by solving a constrained optimization problem (maximize likelihood under stability constraints).
Stability constraints in terms of Gaussian parameters (written with the target $x^*$ at the origin), $\forall k = 1, \ldots, K$:
a) $\mu^k_{\dot{x}} = \Sigma^k_{\dot{x}x} (\Sigma^k_{xx})^{-1} \mu^k_x$
b) $\Sigma^k_{\dot{x}x} (\Sigma^k_{xx})^{-1} + (\Sigma^k_{xx})^{-1} (\Sigma^k_{\dot{x}x})^T \prec 0$
Khansari and Billard, SEDS, IEEE TRO 2011
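To make the role of the two constraints concrete, here is a minimal sketch, assuming a 2-D state, the target at the origin (so every $b^k = 0$) and hand-picked matrices $A^k$. The function names, the matrices and the mixing weights are hypothetical; this is not the SEDS optimizer, only a check of condition (b) followed by a forward-Euler simulation of the resulting dynamics.

```python
import math

def symmetric_part_is_negative_definite(a):
    """Check A + A^T < 0 for a 2x2 matrix using leading principal minors."""
    s00 = 2.0 * a[0][0]
    s01 = a[0][1] + a[1][0]
    s11 = 2.0 * a[1][1]
    return s00 < 0.0 and (s00 * s11 - s01 * s01) > 0.0

def mixing_weights(x, centers):
    """Hypothetical positive weights h_k(x) that sum to one."""
    raw = [math.exp(-0.5 * ((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2))
           for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def velocity(x, mats, centers):
    """f(x) = sum_k h_k(x) * A_k x (b_k = 0 since the target is the origin)."""
    h = mixing_weights(x, centers)
    vx = sum(hk * (a[0][0] * x[0] + a[0][1] * x[1]) for hk, a in zip(h, mats))
    vy = sum(hk * (a[1][0] * x[0] + a[1][1] * x[1]) for hk, a in zip(h, mats))
    return [vx, vy]

def simulate(x0, mats, centers, dt=0.01, steps=2000):
    """Forward-Euler integration of x_dot = f(x)."""
    x = list(x0)
    for _ in range(steps):
        v = velocity(x, mats, centers)
        x = [x[0] + dt * v[0], x[1] + dt * v[1]]
    return x

# Hand-picked example matrices; both satisfy condition (b).
mats = [[[-1.0, 2.0], [-2.0, -1.0]], [[-3.0, 0.5], [0.0, -0.5]]]
centers = [[2.0, 2.0], [0.0, 0.0]]

print([symmetric_part_is_negative_definite(a) for a in mats])
final = simulate([2.0, 2.5], mats, centers)
print(math.hypot(final[0], final[1]))  # distance to the target after 20 s
```

Because each $A^k + (A^k)^T$ is negative definite and the weights are positive, the squared distance to the target decreases along the flow, so the simulated state ends close to the origin.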
Generalizing: Learning a control law
Other examples of complex dynamics that can be estimated through SEDS.
[Figure: learned dynamics in the ($x_1$, $x_2$) plane]
Stability at attractor
Khansari and Billard, SEDS, IEEE TRO 2011
Learning motion with non-zero velocity at target
Kronander, Khansari and Billard, IROS 2011, JTSC Best Paper Award
Extend the SEDS model with modulation in speed at target
Learning coupling across dynamical systems
Learn stable control laws for the arm and for the fingers separately. Couple the two systems so that they adapt adequately to perturbations.
Shukla and Billard, RSS 2011
Catching Objects in Flight
What to Imitate?
Learning a skill is more than simply replaying a trajectory: it requires understanding what the skill is. To learn this, one needs to show several demonstrations, so that the learner can generalize across sets of examples.
Billard et al, Rob. and Aut. Systems, 2005; Calinon et al. IEEE SMC 2007
How to Imitate?
[Figure: demonstrator and imitator]
→ Find the closest solution according to some cost function
Key Idea: The world is uncertain; learn about its uncertainty through probabilistic modeling of information.
Statistical model of the data: $p(x, \dot{x}) = \sum_{i=1}^{K} w_i \, \mathcal{N}(x, \dot{x}; \mu^i, \Sigma^i)$
The expectation $E\{p(\dot{x} | x)\}$ gives a reference trajectory.
Computing the variance $\mathrm{var}\{p(\dot{x} | x)\}$ provides crucial information.
[Figure: axes $x$, $\dot{x}$]
Generalizing
Computing the variance provides crucial information. The variance
→ provides a notion of the feasible space of solutions
→ is used to compute a new path in the face of changes in the context
This makes it possible to generate new trajectories that depart from the reference trajectory while remaining within the total variance.
Generalizing
Cost function: $H(\dot{x}) = \dot{x} - \hat{\dot{x}}$, where $\hat{\dot{x}} = E\{p(\dot{x} | x)\}$
$\min H(\dot{x})$ subject to $\dot{x} = J \dot{\theta}$ (inverse kinematics)
[Figure: axes $x$, $\dot{x}$; one point is labeled "Impossible solution"]
Generalizing
Cost function: $H(\dot{x}) = (\dot{x} - \hat{\dot{x}})^T \Sigma_{\dot{x}}^{-1} (\dot{x} - \hat{\dot{x}})$, where $\Sigma_{\dot{x}} = \mathrm{var}\{p(\dot{x} | x)\}$
$\min H(\dot{x})$ subject to $\dot{x} = J \dot{\theta}$ (inverse kinematics)
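A small numeric sketch may help show why weighting by the inverse variance matters. It assumes a two-dimensional velocity, a diagonal covariance and two hand-picked candidate velocities; all values and function names are hypothetical, and the inverse-kinematics constraint is left out. Both candidates deviate from the reference by the same Euclidean distance, but one does so along a direction where the demonstrations varied a lot.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def weighted_cost(xd, xd_ref, cov):
    """H = (xd - xd_ref)^T cov^-1 (xd - xd_ref)."""
    inv = invert_2x2(cov)
    d = [xd[0] - xd_ref[0], xd[1] - xd_ref[1]]
    return (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))

xd_ref = [1.0, 0.0]               # reference velocity E{p(x_dot | x)}
cov = [[4.0, 0.0], [0.0, 0.25]]   # var{p(x_dot | x)}: large variance on axis 0

cand_a = [2.0, 0.0]  # deviates by 1 along the high-variance axis
cand_b = [1.0, 1.0]  # deviates by 1 along the low-variance axis

cost_a = weighted_cost(cand_a, xd_ref, cov)
cost_b = weighted_cost(cand_b, xd_ref, cov)
print(cost_a, cost_b)
```

The deviation along the high-variance axis is penalized far less than the one along the low-variance axis, which is how the variance encodes the feasible space of solutions.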
Adaptive Grasping
Grasping is usually solved by searching for the optimal placement of fingers onto an object. Knowing the extent to which one can adapt this grasp is useful for safe manipulation.
Learn how to comply with external perturbations while maintaining a firm grasp.
Sauser, Argall and Billard, Autonomous Robots, 2012
Adaptive Grasping
Teaching through tactile sensing
Adaptive Grasping
Learn a probabilistic mapping $p(s, \phi, \theta)$ between the contact signature of the object (normal force $s$ and tactile response $\phi$) and the fingers' posture $\theta$.
Make $N$ observations of the state of the system: $\xi^i = \{s^i, \phi^i, \theta^i\} \in \mathbb{R}^{47}$, $i = 1 \ldots N$.
$p(\xi) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\xi; \mu^k, \Sigma^k)$: joint density (mixture of Gaussians) describing the observations and the correlation across the variables of the system.
The model can be used to predict the appropriate joint posture when perceiving a change in contact signature:
$\hat{\theta} = E\{p(\theta | s, \phi)\}$, $\hat{s} = E\{p(s | \theta, \phi)\}$
Adaptive Grasping
After Training
Adaptive Grasping
Another Example
Adaptive Manipulation
Teaching through teleoperation, using an interface for direct joint motion transfer (Xsens motion sensors)
Refining knowledge using a tactile interface (5 touchpads mounted on the robot's arm and wrist)
Adaptive Manipulation
Reuse: To avoid re-learning a new task from scratch when the new task bears similarities with the old task
Adaptive Manipulation
Reuse preserves the variability learned in the previous task. This may be a drawback → Use tactile feedback to adapt this variability locally
[Figure: before reuse / after reuse]
Adaptive Manipulation
Reuse: One more example
Teaching robots to be less stiff
Being stiff is not always good → How to teach a robot to relax…
Low stiffness when carrying the liquid; high stiffness when pouring the liquid
Kronander and Billard, ICRA 2012
Teaching robots to be less stiff
Shaking the robot: A natural method to teach a robot to relax.
Teaching robots to be less stiff
PD control law to follow a desired trajectory $\tilde{x}_t$: $u = -K (x_t - \tilde{x}_t) - D (\dot{x}_t - \dot{\tilde{x}}_t)$, with the damping $D$ chosen relative to $K$ so that the system is critically damped.
Adjust stiffness at each time step: $K_t (x_t - \tilde{x}_t)$
Record the perturbation $\Delta x$ from the current position $x_t$. Set the stiffness profile inversely proportional to the variance of the perturbation (the more variation, the less stiff):
Covariance matrix: $\Sigma = \Delta x \, \Delta x^T$
Eigenvalue decomposition: $\Sigma = U \Lambda U^T \Rightarrow K_t \sim U \Lambda^{-1} U^T$
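The stiffness rule above can be sketched for a 2-D position as follows. Several points are assumptions made for illustration: the covariance is averaged over a handful of recorded perturbations, the proportionality constant is a free `gain`, the perturbation values are invented, and the function names are hypothetical. The eigendecomposition of the symmetric 2x2 covariance is written in closed form so that the sketch needs only the standard library.

```python
import math

def covariance_2d(deltas):
    """Average of the outer products dx dx^T over recorded perturbations.

    Returns the entries (a, b, c) of Sigma = [[a, b], [b, c]].
    """
    n = float(len(deltas))
    a = sum(d[0] * d[0] for d in deltas) / n
    b = sum(d[0] * d[1] for d in deltas) / n
    c = sum(d[1] * d[1] for d in deltas) / n
    return a, b, c

def stiffness_from_covariance(a, b, c, gain=1.0):
    """K = gain * U diag(1 / lambda) U^T for Sigma = [[a, b], [b, c]].

    Requires Sigma to be full rank (both eigenvalues strictly positive).
    """
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam1, lam2 = mean + radius, mean - radius   # eigenvalues of Sigma
    theta = 0.5 * math.atan2(2.0 * b, a - c)    # angle of the principal axis
    u1 = (math.cos(theta), math.sin(theta))     # eigenvector for lam1
    u2 = (-math.sin(theta), math.cos(theta))    # eigenvector for lam2
    k = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in ((lam1, u1), (lam2, u2)):
        for i in range(2):
            for j in range(2):
                k[i][j] += gain * u[i] * u[j] / lam
    return k

# Invented perturbations: the teacher shook the robot mostly along axis 0.
deltas = [(0.2, 0.0), (-0.2, 0.0), (0.0, 0.02), (0.0, -0.02)]
a, b, c = covariance_2d(deltas)
K = stiffness_from_covariance(a, b, c)
print(K)
```

The axis that was shaken the most ends up with the lowest stiffness, matching "the more variation, the less stiff".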
Teaching robots to be less stiff
After training, the robot adapts naturally when required and remains stiff when required.
Learning from Bad Demonstrations
High coherence across trials → high confidence
Little coherence across trials → low confidence
[Figure: axes: angular position (radian) of robot's wrist, velocity]
- Search around the demonstrations
- Reproduce only parts where all demonstrators agreed
- Avoid regions with high uncertainty
Grollman and Billard, ICRA 2012, Best Paper Award Cognitive Robotics
Conclusion
Learning from human demonstration is first and foremost about generalizing
- Learning a generic control law
- Learning feasible regions of the state space
Observing human demonstration is not sufficient to perform the task
- Extracting key features from demonstrations
- Using these to adapt the trajectory
Demonstrations do not need to be perfect solutions to the task
→ Learning from bad demonstrations provides crucial information on what is key to performing the task.
→ It is more useful to know several feasible solutions to the task than a single optimal one.