

SLIDE 1

Jan Peters | Intelligent Autonomous Systems @ TU Darmstadt|Robot Learning @ MPI-IS|

Machine learning for tactile manipulation

Jan Peters with Filipe Veiga, Herke van Hoof, Oliver Kroemer, Roberto Calandra, Tucker Hermans, Yevgen Chebotar, Yilei Zheng, Zhengkun Yi

Intelligent Autonomous Systems
  • Dept. of Computer Science, Technische Universität Darmstadt

Interdepartmental Robot Learning Group
  • Depts. of Autonomous Motion and Empirical Inference, Max Planck Institute for Intelligent Systems

All work in this talk is part of the EU FP7 ICT Project “Tactile Manipulation” (TACMAN).

SLIDE 2

Why don’t we have personal robots yet?

Manipulation appears so easy:

  • all open loop
  • all hard coded
  • recently: just add a little vision

SLIDE 3

So what was hard-coded and pretended in this TV commercial of the first robot ever?

Efficient teaching of robots by imitation & RL

We need to learn to act using tactile sensing!

  • Predict slip and use it for gripping
  • Predict slip and control it
  • Predict material properties
  • Efficient tactile exploration of surfaces

SLIDE 4

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Grip Control by Slip Control Efficient teaching

SLIDE 5

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Efficient teaching Grip Control by Slip Control

SLIDE 6

What can we learn to recognize from tactile interaction?

  • Object and material recognition is crucial for manipulation
  • Allows recognition at the point of contact and of the type of possible interaction
  • Allows for the absence of accurate models and vision

➡ How much about material properties can we predict from tactile sensing?

Predict material properties

Tucker Hermans, Janine Hölscher

Hoelscher, J.; Peters, J.; Hermans, T. (2015). Evaluation of Interactive Object Recognition with Tactile Sensing, Proceedings of the International Conference on Humanoid Robots.
SLIDE 7

Experiment – Object Recognition

  • Collect data for every object
  • Extract features
  • Select training and validation data
  • Train a classifier
  • Use features to predict classification

SLIDE 8

49 Objects: Plastic, Sponge, Wood, Stone, Ceramic, Fabric, Paper, Metal, Miscellaneous

SLIDE 9

Confusion Matrix

SLIDE 10

Confusion Matrix

SLIDE 11

Materials classification

Accuracy: 97.55% for 49 objects with textbook methods

Material classification, movement concatenation, linear SVM

SLIDE 12

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Efficient teaching Grip Control by Slip Control

SLIDE 13

How can we efficiently explore through touch?

  • Accurate object shape knowledge provides important information for complex tasks such as grasping.
  • Vision-based methods suffer from limitations such as the available illumination and are not applicable when the object is not visible or occluded.
  • When modeling an object using tactile sensors, touching the object surface at a fixed grid of points can be sample inefficient.

➡ Efficiently exploring such object knowledge is key for many tasks!

Efficient tactile exploration of surfaces

Yi, Z.; Calandra, R.; Veiga, F.; van Hoof, H.; Hermans, T.; Zhang, Y.; Peters, J. (2016). Active Tactile Object Exploration with Gaussian Processes, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS).

Zhengkun Yi

SLIDE 14

Efficient Active Tactile Object Exploration with Gaussian Processes

  • Use Gaussian processes to model object surfaces.
  • Choose the squared exponential as covariance function.
  • Inspired by Bayesian optimization (BO).
  • The acquisition function is defined as the predicted standard deviation.
  • Use DIRECT (Jones et al., 1993) to find an approximately optimal solution.
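A minimal sketch of this exploration loop, assuming a known synthetic surface in place of a real object and a candidate grid in place of DIRECT's global search. The GP model and the predicted-standard-deviation acquisition follow the slide; everything else is illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def surface(p):
    # Synthetic object surface height over (x, y); the "robot" can only
    # observe it by touching at a queried point.
    return np.sin(3 * p[:, 0]) * np.cos(2 * p[:, 1])

P = rng.uniform(0, 1, size=(5, 2))   # initial touch points
z = surface(P)

# Candidate grid standing in for DIRECT's global search
g = np.linspace(0, 1, 25)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

kernel = RBF(length_scale=0.2, length_scale_bounds="fixed")  # squared exponential
for _ in range(15):
    gp = GaussianProcessRegressor(kernel=kernel).fit(P, z)
    _, std = gp.predict(grid, return_std=True)
    nxt = grid[np.argmax(std)]       # acquisition: predicted standard deviation
    P = np.vstack([P, nxt])
    z = np.append(z, surface(nxt[None]))

gp = GaussianProcessRegressor(kernel=kernel).fit(P, z)
corr = np.corrcoef(gp.predict(grid), surface(grid))[0, 1]
print(f"correlation with true surface after {len(P)} touches: {corr:.3f}")
```

Because the predictive standard deviation collapses at already-touched points, the loop keeps choosing poorly explored regions, which is what makes it more sample efficient than a fixed grid.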

SLIDE 15

Experiment: Using a Real Robot

Experimental setup with a BioTac sensor:

  • The object is fixed to a vertical surface.
  • The rectangular zones are to be reconstructed.
SLIDE 16

Experiment: Fast convergence…

[Figure: correlation coefficient between the true and reconstructed functions vs. number of touches]

The correlation coefficient converges to 1 much faster when using the active touch approach.

SLIDE 17

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Efficient teaching Grip Control by Slip Control

SLIDE 18

How can we learn to control slip from touch?

  • Tactile sensing gives direct insight into the state of the object and does not suffer from occlusions.
  • Discrete events greatly influence manipulation tasks.
  • Slip between fingertip and object often results in task failure.

➡ Can we predict the onset of slip and prevent/control it before it happens?

Predict slip and control it

Veiga, F.F.; van Hoof, H.; Peters, J.; Hermans, T. (2015). Stabilizing Novel Objects by Learning to Predict Tactile Slip, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS).

Filipe Veiga

SLIDE 19

Slip prediction as classification problem

Slip prediction:
  • as classification: c_{t+τ_f} = f(φ(x_{1:t})), with c_{t+τ_f} ∈ {c_slip, c_non-slip}
  • with prediction horizon τ_f
  • τ_f = 0 corresponds to plain slip detection

Features for classification:
  • Single element feature: φ(x_{1:t}) = x_t
  • Delta feature: φ(x_{1:t}) = [x_t, Δx_t], with Δx_t = x_t − x_{t−1}
  • Time window feature: φ(x_{1:t}) = x_{t−τ:t}

Classifiers:
  • Linear SVM, Random Forests
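The three feature maps and the classification setup can be sketched as follows. The tactile time series here is synthetic (a random walk with an injected high-variance "slip" segment), and the labels, horizon, and taxel count are illustrative stand-ins for the real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

def single(x, t):                 # phi(x_1:t) = x_t
    return x[t]

def delta(x, t):                  # phi(x_1:t) = [x_t, x_t - x_{t-1}]
    return np.concatenate([x[t], x[t] - x[t - 1]])

def window(x, t, tau=3):          # phi(x_1:t) = x_{t-tau:t}
    return x[t - tau:t + 1].ravel()

# Synthetic tactile stream (19 taxels) with a "slip" segment of
# high-variance readings; labels mark samples tau_f steps ahead.
T, D, tau_f = 200, 19, 5
x = rng.normal(size=(T, D)).cumsum(axis=0) * 0.01
slip = np.zeros(T, dtype=bool)
slip[120:140] = True
x[120:140] += rng.normal(scale=2.0, size=(20, D))

t_idx = np.arange(4, T - tau_f)
X = np.stack([window(x, t) for t in t_idx])
y = slip[t_idx + tau_f]           # classify slip tau_f steps into the future

clf = RandomForestClassifier(random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Shifting the labels by τ_f is the entire difference between detection and prediction: the classifier input is the same, only the target moves into the future.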

SLIDE 20

Experiments in slip prediction

[Panels: data set, data collection, slip prediction; plot of F-score vs. prediction horizon τ_f]

No significant drop in performance!

SLIDE 21

Grip stabilization by slip prediction

  • Features φ(x_{1:t}) based on tactile information
  • Slip prediction as binary classification: c_{t+τ_f} = f(φ(x_{1:t}))
  • Grip stabilization control based on the slip signal:

      F_N[t+1] = F_N[t] + σ F̂_N[t]   if slip
      F_N[t+1] = F_N[t]               otherwise

SLIDE 22

Grip Stabilization Experiment

  • Objects pinched against a vertical table.
  • Robot attempts to move away from the table with random velocity.
  • Grip stabilization controller triggers when slip occurs.
  • Controller stays active until the object is successfully stabilized or a maximum time duration is reached.

SLIDE 23

Grip Stabilization Experiment

[Figure: stabilization success rate vs. prediction horizon]

Slip prediction horizon greatly increases stabilization success rate.

SLIDE 24

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Grip Control by Slip Control Efficient teaching

SLIDE 25

Can modular grip control be composed by locally acting fingers?

Grip control during grasping and manipulation is difficult. Most approaches have a monolithic view of the hand:

  ✓ Explicit finger coordination.
  • More complex feedback-based controllers.
  • Harder planning problem.

➡ How can we grasp objects using simpler feedback controllers while still assuring finger coordination?

Predict slip and use it for gripping

Filipe Veiga

Veiga, F.F.; Edin, B.; Peters, J. (2016). Modular Finger Control through Tactile Feedback for In-Hand Object Stabilization. arXiv. https://arxiv.org/pdf/1612.08202.pdf
SLIDE 26

Edin’s insight and the “Edin-Veiga hypothesis”

Edin’s insight (paraphrased):
  • While human grasp planning is global, human grip control is local in every finger; the fingers “communicate” through the object.
  • Humans appear to control slip!
  • It does not really matter whether the fingers belong to different humans.

“Edin-Veiga hypothesis”:
  1. A human and a robot finger can hold objects well together!
  2. Modular grip control can be accomplished through tactile sensing-based independent finger grip control.

Benoni Edin, neuroscientist at Umeå University

SLIDE 27

A human and a robot finger can hold objects well together!
SLIDE 28

Modular grip control by tactile sensing-based independent finger grip control

  • Fingers are controlled independently with no explicit coordination.
  • Tactile-based slip predictors give feedback to each finger.
  • Each finger attempts to locally stabilize the object.
  • Finger coordination emerges from the tactile feedback.
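A toy sketch of this modularity argument: each finger below runs its own stand-in slip predictor and force update and never talks to the others; the only coupling is a shared disturbance acting on all fingertips through the object. The variance-threshold predictor and all constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

class Finger:
    """Independent finger controller: its own slip predictor, its own
    force update, no communication with the other fingers."""
    def __init__(self, sigma=0.2):
        self.force = 1.0
        self.sigma = sigma

    def predicts_slip(self, tactile):
        # Stand-in for a learned slip classifier: flag high variance.
        return tactile.std() > 1.0

    def step(self, tactile):
        if self.predicts_slip(tactile):
            self.force += self.sigma * self.force  # local force increase
        return self.force

fingers = [Finger() for _ in range(3)]
forces = [f.force for f in fingers]
for t in range(20):
    # A disturbance acts on the object and is felt by every fingertip;
    # the object is the only "communication channel" between fingers.
    scale = 2.0 if 5 <= t < 10 else 0.1
    forces = [f.step(rng.normal(scale=scale, size=19)) for f in fingers]

print("final forces:", [round(f, 2) for f in forces])
```

Every finger independently raises its force during the disturbance window and holds afterwards, so the grasp tightens in a coordinated way without any finger-to-finger messaging.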

SLIDE 29

Preliminary results on modular multi-finger grip control

  • Multiple stable grasps with different robots.
  • Grasps using 2, 3, 4 and 5 fingers.
  • Grip control of unknown objects.
  • Experiments with the four-finger Allegro hand and the five-finger DLR hand.

[Videos: 2, 3, 4 and 5 finger grip control]

SLIDE 30

Results on modular multi-finger grip control

[Videos: two, three and four fingers]

SLIDE 31

Results on modular multi-finger grip control

SLIDE 32

How can we learn to act using tactile sensing?

  • 1. What can we learn to recognize from tactile interaction?
  • 2. How can we efficiently explore through touch?
  • 3. How can we learn to control slip from touch?
  • 4. Can we obtain modular grip control from single finger slip control?
  • 5. How can we self-improve manipulation?

Property recognition Tactile exploration Predict & Control Slip Grasp Control by Slip Control Efficient teaching

SLIDE 33

How can we self-improve manipulation?

Learning continuous policies for robots with high-dimensional sensing is hard:

  • Small data sets
  • Noisy data
  • Continuous states and actions
  • High-dimensional state representation

➡ Appropriate policy search methods are needed!

Efficient teaching of robots by imitation & RL

Herke van Hoof, Oliver Kroemer, Yevgen Chebotar

van Hoof, H.; Hermans, T.; Neumann, G.; Peters, J. (2015). Learning Robot In-Hand Manipulation with Tactile Features, Proceedings of the International Conference on Humanoid Robots (HUMANOIDS).

SLIDE 34

RL with small batches

Maximizing return on small, noisy datasets? → overfitting and unstable behavior, overly complex policies

[Figure: action probabilities and Q-values of overfitted policies]

SLIDE 35

Relative entropy policy search

RL objective with bounded update:

    max_{π(a|s)} E_{a,s}[R_{a,s}]
    s.t.  KL( π(a|s) μ_π(s) || q(a,s) ) < ε

Instead of a greedy update: smooth updates that bound the divergence with respect to the previous distribution q.

Peters, J.; Muelling, K.; Altun, Y. (2010). Relative Entropy Policy Search, Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI).

SLIDE 36

RL and REPS

Lagrangian optimization yields a sample-based policy:

    δ(a, s, V) = R_{a,s} + E_{s′}[ V(s′) | s, a ] − V(s)

    π(a_i | s_i) ∝ q(a_i, s_i) exp( δ(a_i, s_i, V) / η )
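The exponential reweighting that this policy update implies can be sketched on synthetic samples. The quadratic reward, linear value features, and the fixed η below are all illustrative; in REPS proper, η comes from minimizing the dual so that the KL bound holds.

```python
import numpy as np

rng = np.random.default_rng(4)

# Samples (s, a, R) drawn from the previous policy q
n = 500
s = rng.normal(size=n)
a = rng.normal(size=n)
R = -(a - 0.5 * s) ** 2                     # best action: a = 0.5 s

# Simple value function V(s) = theta^T [s, 1], fit by least squares;
# the Bellman error delta then acts as an advantage estimate.
Phi = np.stack([s, np.ones(n)], axis=1)
theta, *_ = np.linalg.lstsq(Phi, R, rcond=None)
delta = R - Phi @ theta

# REPS-style exponential reweighting of the samples
eta = 0.5
w = np.exp((delta - delta.max()) / eta)
w /= w.sum()

# The weighted samples define the new policy, e.g. its mean action near s = 1:
near = np.abs(s - 1.0) < 0.3
mean_action = np.average(a[near], weights=w[near])
print("mean action near s = 1:", round(mean_action, 2), "(optimum 0.5)")
```

Because the weights are exponential in δ/η rather than a hard argmax, the new policy shifts toward high-advantage actions while staying close to q, which is exactly the smooth update the bound enforces.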

SLIDE 37

RL and REPS

Lagrangian optimization yields a sample-based policy:

    δ(a, s, V) = R_{a,s} + E_{s′}[ V(s′) | s, a ] − V(s)

    π(a_i | s_i) ∝ q(a_i, s_i) exp( δ(a_i, s_i, V) / η )

SLIDE 38

Scraping Experiment

Initial policy vs. final policy:

  • Robot should adapt motor primitives to disturbances
  • Sensory (tactile) coupling
  • Improved robustness
  • Reward: similarity to the demonstrated sensory signal

Chebotar, Y.; Kroemer, O.; Peters, J. (2014). Learning Robot Tactile Sensing for Object Manipulation, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems.

SLIDE 39

Good Features for Self-Improving Tactile Stabilization

Feature discovery:
  • Good tactile features are difficult to get
  • Modify auto-encoders to learn features and exploit structure

Experiment:
  • Robot manipulates a 2-DoF platform
  • Input only from tactile sensors
  • 19 taxels × 12 time steps: 228 dimensions
  • Goal: move the platform to the origin

van Hoof, H.; Chen, N.; Karl, M.; van der Smagt, P.; Peters, J. (2016). Stable Reinforcement Learning with Autoencoders for Tactile and Visual Data, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems.

[Figure: average reward vs. number of roll-outs on the stabilization task, comparing a VAE with exploring policy, a VAE with exploiting policy, and raw input with an exploring policy]

SLIDE 40

RL and REPS: Limitations

  • The expectation over next states is approximated through single samples, which assumes no noise
  • V and the policy are assumed linear in designed features
  • Manual feature design in high dimensions is strenuous
  • Grids of features in high dimensions do not exploit manifolds

➡ These limitations need to be overcome by non-parametric approaches!

van Hoof, H.; Peters, J.; Neumann, G. (2015). Learning of Non-Parametric Control Policies with High-Dimensional State Features, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS).

SLIDE 41

Nonparametric policies

  • The Bellman error and the Lagrangian parameters define the policy
  • The policy is only defined at sampled state-action pairs
  • Fit a weighted Gaussian process as a non-parametric policy:

    π(a_i | s_i) ∝ q(a_i, s_i) exp( δ(a_i, s_i, V) / η )

SLIDE 42

Pendulum swing-up with control noise

Compared: REPS NP + samples, REPS NP + model, REPS feature + model.

  • With control noise, a model is needed
  • Non-parametric features represent the space evenly
  • A grid (in simulation) works well in low dimensions
  • Non-bounded methods are unstable

[Figure legend: NP value iteration on-policy, NP value iteration off-policy, grid]

SLIDE 43

Image-based pendulum swing-up

SLIDE 44

Learned in-hand manipulation

SLIDE 45

THE REal-MAKE: The actors, …

Allegro Hand (Wessling Robotics/DLR), FFH Hand (KUKA/DLR), Mitsubishi PA-10 arms, Right-Hand Robotics Reflex Hand

SLIDE 46

THE REal-MAKE: The actors, the sensors, …

Syntouch BioTac

Yi, Z.; Peters, J.; Zhang, Y. (submitted). A Bioinspired Tactile Sensor for Surface Roughness Discrimination.

Kroemer, O.; Lampert, C.H.; Peters, J. (2011). Learning Dynamic Tactile Sensing with Robust Vision-based Training, IEEE Transactions on Robotics.

Chebotar, Y.; Kroemer, O.; Peters, J. (2014). Learning Robot Tactile Sensing for Object Manipulation, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems.

SLIDE 47

THE REal-MAKE: The actors, the sensors, the STARS, …

Filipe Veiga, Tucker Hermans, Janine Hölscher, Roberto Calandra, Oliver Kroemer, Zhengkun Yi, Herke van Hoof, Yevgen Chebotar

SLIDE 48

THE REal-MAKE: The actors, the sensors, the STARS, … and THE END

Take-home messages:

  • 1. We can predict object identity, roughness, slip, … from tactile sensing.
  • 2. Efficient tactile surface exploration can be obtained through BO.
  • 3. Slip prediction enables slip control!
  • 4. Multi-finger grip control can be composed from many single-finger slip controllers.
  • 5. Efficient reinforcement learning can enable adaptation to the task.

THE END

SLIDE 49

Contact Distributions

  • Want to learn which contacts afford a desired interaction
  • Need to represent sets of contacts between objects
  • Variable number of unordered contacts
  • Compare distributions of contacts, not individual ones

[Panels: Pushing; Pushing & Grasping]

SLIDE 50

Contact Distribution Kernels

Six-dimensional contacts x: 3D positions and 3D forces. Evaluated different contact distribution representations:

  • Bag of Features with an exponential χ² kernel
  • Bhattacharyya kernel
  • Normalized expected likelihood kernel
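Of these, the Bhattacharyya kernel has a convenient closed form when each contact set is summarized by a single Gaussian; the sketch below works under that assumption (the synthetic contact sets and the regularization constant are illustrative).

```python
import numpy as np

def bhattacharyya_kernel(X, Y, eps=1e-6):
    """Bhattacharyya kernel between two contact sets, each row a 6-D
    contact (3D position, 3D force): fit a Gaussian to each set and
    evaluate  k(p, q) = integral of sqrt(p(x) q(x)) dx  in closed form."""
    mu1, mu2 = X.mean(axis=0), Y.mean(axis=0)
    d = X.shape[1]
    S1 = np.cov(X.T) + eps * np.eye(d)   # regularize for small contact sets
    S2 = np.cov(Y.T) + eps * np.eye(d)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    # Bhattacharyya distance between the two Gaussians
    db = (diff @ np.linalg.solve(S, diff) / 8.0
          + 0.5 * np.log(np.linalg.det(S)
                         / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.exp(-db)                   # equals 1 for identical distributions

rng = np.random.default_rng(6)
contacts_a = rng.normal(size=(30, 6))
contacts_b = rng.normal(loc=2.0, size=(30, 6))
print(bhattacharyya_kernel(contacts_a, contacts_a))  # 1.0 for identical sets
print(bhattacharyya_kernel(contacts_a, contacts_b))  # well below 1: dissimilar
```

Because the kernel compares distributions rather than individual contacts, it is invariant to the number and ordering of contacts, which is the requirement stated above.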

SLIDE 51

Blind Grasping Experiment

  • Predict successful grasps from tactile data before lift
  • 200 grasps across 50 objects using the ReflexHand

[Figure: accuracy (0.5–0.8) vs. number of training samples (10–100) for the Bhattacharyya, exponential χ², NEL and BoF kernels]

Kroemer, O.; Peters, J. (2014). Predicting Object Interactions from Contact Distributions, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS).