SLIDE 1

A Bayesian Approach to Empirical Local Linearization for Robotics

Jo-Anne Ting1, Aaron D’Souza2, Sethu Vijayakumar3, Stefan Schaal1

1University of Southern California, 2Google, Inc., 3University of Edinburgh

ICRA 2008, May 23, 2008

SLIDE 2

Outline

  • Motivation
  • Past & related work
  • Bayesian locally weighted regression
  • Experimental results
  • Conclusions
SLIDE 3

Motivation

  • Locally linear methods have been shown to be useful for robot control (e.g., learning internal models of high-dimensional systems for feedforward control, or local linearizations for optimal control & reinforcement learning).
  • A key problem is to find the “right” size of the local region for a linearization, as in locally weighted regression.
  • Existing methods* either use cross-validation techniques or complex statistical hypothesis tests, or require significant manual parameter tuning for good & stable performance.

*e.g., supersmoothing (Friedman, 1984), LWPR (Vijayakumar et al., 2005), (Fan & Gijbels, 1992 & 1995)


SLIDE 4

Outline

  • Motivation
  • Past & related work
  • Bayesian locally weighted regression
  • Experimental results
  • Conclusions
SLIDE 5

Quick Review of Locally Weighted Regression

  • Given a nonlinear regression problem, y = f(x) + ε, our goal is to approximate a locally linear model at each query point xq in order to make the prediction yq = b^T xq (a minimal sketch follows below).
  • We compute the measure of locality for each data sample with a spatial weighting kernel K, e.g., wi = K(xi, xq, h).
  • If we can find the “right” local regime for each xq, nonlinear function approximation may be solved accurately and efficiently.
  • Previous methods may: i) be sensitive to initial values, ii) require tuning/setting of open parameters, iii) be computationally involved.

[Figure: nonlinear function of X vs. Y, with the spatial weighting kernel shown around a query point]
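For concreteness, here is a minimal locally weighted regression sketch in Python/NumPy. The Gaussian weighting kernel, the fixed bandwidth h, and the small ridge term are illustrative assumptions, not the paper's learned kernel:

```python
import numpy as np

def lwr_predict(X, y, x_q, h=0.3):
    """Locally weighted linear regression prediction at a query point x_q.

    X: (N, d) inputs, y: (N,) outputs, x_q: (d,) query, h: kernel bandwidth.
    A Gaussian weighting kernel is assumed here purely for illustration.
    """
    # Locality weights w_i = K(x_i, x_q, h)
    w = np.exp(-0.5 * np.sum((X - x_q) ** 2, axis=1) / h ** 2)

    # Augment with a bias column so the local model is affine
    Xa = np.hstack([X, np.ones((len(X), 1))])

    # Weighted least squares: b = (Xa^T W Xa)^{-1} Xa^T W y (small ridge for stability)
    WXa = Xa * w[:, None]
    b = np.linalg.solve(Xa.T @ WXa + 1e-8 * np.eye(Xa.shape[1]), WXa.T @ y)

    # Local linear prediction y_q = b^T [x_q; 1]
    return np.append(x_q, 1.0) @ b
```

The open question, as the slide notes, is how to choose the bandwidth h for each query point, which is exactly what the Bayesian formulation on the following slides addresses.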

SLIDE 6

Outline

  • Motivation
  • Past & related work
  • Bayesian locally weighted regression
  • Experimental results
  • Conclusions
SLIDE 7

Bayesian Locally Weighted Regression

  • Our variational Bayesian algorithm:
    i. Learns both b and the optimal h
    ii. Handles high-dimensional data
    iii. Associates a scalar indicator weight wi with each data sample
  • We assume the following prior distributions (a small numerical sketch of the weight model follows the equations below):

[Graphical model: plates over data samples i = 1,..,N and input dimensions m = 1,..,d, with observed yi and xim, indicator weights wim, regression coefficients bm, noise variance σ², and bandwidths hm with hyperparameters a_hm, b_hm, n, σ²_N]

    p(yi | xi) ~ Normal(b^T xi, σ²)
    p(b | σ²) ~ Normal(0, σ² Σb,0)
    p(σ²) ~ Scaled-Inv-χ²(n, σ²_N)

where each data sample has a weight wi:

    wi = ∏_{m=1}^{d} wim,   where p(wim) ~ Bernoulli( 1 / (1 + (xim − xqm)^r hm) )
    hm ~ Gamma(a_hm, b_hm)
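A minimal numerical sketch of the weight model above (Python/NumPy). It uses point values of the bandwidths hm rather than their Gamma posterior, and it assumes an even exponent r (r = 2 here) so the kernel stays symmetric and positive; both choices are illustrative assumptions, not the paper's inference:

```python
import numpy as np

def expected_weights(X, x_q, h, r=2):
    """Per-sample weights w_i under the multiplicative Bernoulli weight model.

    Each w_im has success probability 1 / (1 + (x_im - x_qm)^r * h_m), and
    w_i is the product over the d input dimensions.
    X: (N, d) inputs, x_q: (d,) query point, h: (d,) bandwidths, r: even exponent.
    """
    p_im = 1.0 / (1.0 + (X - x_q) ** r * h)   # (N, d) expected indicator weights
    return np.prod(p_im, axis=1)               # (N,) expected sample weights
```

Note the behaviour this encodes: a large hm narrows the kernel along dimension m, while hm → 0 drives that dimension's weights toward 1 everywhere, i.e., a flat kernel.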

SLIDE 8

Inference Procedure

  • We can treat this as an EM learning problem (Dempster & Laird, ’77):

    Maximize L, where L = ∑_{i=1}^{N} log p(yi, wi, b, z, h | xi), which expands to

    L = ∑_{i=1}^{N} wi log p(yi | xi, b, σ²) + ∑_{i=1}^{N} ∑_{m=1}^{d} log p(wim) + log p(b | σ²) + log p(σ²) + log p(h)

  • We use a variational factorial approximation of the true joint posterior distribution* (e.g., Ghahramani & Beal, ’00) and a variational approximation on concave/convex functions, as suggested by (Jaakkola & Jordan, ’00), to get analytically tractable inference.

  *Q(b, z, h) = Q(b, z) Q(h)
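For completeness, a brief hedged sketch of the step that makes this tractable: with the factorized posterior Q, Jensen's inequality gives a lower bound on the marginal log likelihood in terms of the complete-data log likelihood L above, and the EM-style updates ascend this bound (the exact bound with the intermediate variables z and the convex/concave bounds is in the paper).

```latex
% Variational lower bound implied by the factorization Q(b, z, h) = Q(b, z) Q(h):
% E_Q[L] is the expected complete-data log likelihood and H[Q] the entropy of Q.
\log p(\mathbf{y} \mid \mathbf{X}) \;\ge\; \mathbb{E}_{Q}[\,L\,] \;+\; \mathcal{H}[\,Q\,]
```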

SLIDE 9

Important Things to Note

  • For each local model, our algorithm:
    i. Learns the optimal bandwidth value, h (i.e., the “appropriate” local regime)
    ii. Is linear in the number of input dimensions per EM iteration (for an extended model with intermediate hidden variables, z, introduced for fast computation)
    iii. Provides a natural framework to incorporate prior knowledge of the strong (or weak) presence of noise

SLIDE 10

Outline

  • Motivation
  • Past & related work
  • Bayesian locally weighted regression
  • Experimental results
  • Conclusions
SLIDE 11

Experimental Results: Synthetic data

[Figures (X vs. Y): function with discontinuity + N(0, 0.3025) output noise; function with increasing curvature + N(0, 0.01) output noise]

SLIDE 12

Experimental Results: Synthetic data

[Figures: function with peak + N(0, 0.09) output noise; straight line (notice “flat” kernels are learnt)]

SLIDE 13

Experimental Results: Synthetic data

2D “cross” function* + N(0, 0.01) output noise

[Figures: target function, Gaussian Process regression, Kernel Shaping, and Kernel Shaping: learnt kernels]

*Training data has 500 samples and mean-zero noise with variance of 0.01 added to outputs.
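For readers who want to reproduce a similar setup, a sketch of the training data generation. The functional form of the 2D “cross” benchmark and the input range [-1, 1]² are assumptions based on its common use in the LWPR literature, not taken from this slide; only the sample count and noise variance come from the footnote:

```python
import numpy as np

def cross_function(x1, x2):
    """A commonly used form of the 2D 'cross' benchmark (constants are assumptions)."""
    return np.maximum.reduce([
        np.exp(-10.0 * x1 ** 2),
        np.exp(-50.0 * x2 ** 2),
        1.25 * np.exp(-5.0 * (x1 ** 2 + x2 ** 2)),
    ])

# 500 training samples with mean-zero output noise of variance 0.01 (std 0.1)
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(500, 2))
y_train = cross_function(X_train[:, 0], X_train[:, 1]) + rng.normal(0.0, 0.1, size=500)
```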

SLIDE 14

Experimental Results: Robot arm data

  • Given a kinematics problem for a 7 DOF robot arm, we want to estimate the Jacobian, J, for the purpose of establishing that the algorithm does the right thing for each local regression problem:

    p = f(θ), where the input θ consists of the 7 arm joint angles and p = [x y z]^T is the resulting position of the arm’s end effector in Cartesian space

    dp/dt = (df(θ)/dθ) (dθ/dt) = J (dθ/dt), i.e., J = df(θ)/dθ is the quantity to estimate (a generic regression sketch follows below)

  • For a particular local linearization problem, we compare the estimated Jacobian using BLWR, JBLWR, to the:
    • Analytically computed Jacobian, JA
    • Estimated Jacobian using locally weighted regression, JLWR (where the optimal distance metric is found with cross-validation)
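To make the estimation target concrete, here is a generic sketch of recovering J from sampled joint velocities and end-effector velocities by (locally) weighted least squares. This is a plain regression illustration, not the paper's Bayesian estimator; the locality weights w would come from a kernel centred on the query posture:

```python
import numpy as np

def estimate_jacobian(theta_dot, p_dot, w=None):
    """Estimate J in p_dot ≈ J @ theta_dot by weighted least squares.

    theta_dot: (N, 7) joint velocities, p_dot: (N, 3) end-effector velocities,
    w: optional (N,) locality weights around the query posture (defaults to 1).
    """
    if w is None:
        w = np.ones(theta_dot.shape[0])
    A = theta_dot * w[:, None]                       # weighted inputs
    # Solve (Theta^T W Theta) J^T = Theta^T W P, with a small ridge for stability
    JT = np.linalg.solve(theta_dot.T @ A + 1e-8 * np.eye(theta_dot.shape[1]),
                         A.T @ p_dot)
    return JT.T                                      # (3, 7) estimated Jacobian
```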

SLIDE 15

Angular & Magnitude Differences of Jacobians

  • We compare each of the estimated Jacobian matrices, JLWR & JBLWR, with the analytically computed Jacobian, JA.
  • Specifically, we calculate the angular & magnitude differences between the row vectors of the Jacobian matrices (e.g., between the 1st row vector of JBLWR and the 1st row vector of JA); see the sketch below.
  • Observations:
    • BLWR & LWR (with an optimally tuned distance metric) perform similarly
    • The problem is ill-conditioned and not as easy to solve as it may appear
    • Angular differences for J2 are large, but the magnitudes of the vectors are small
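A small sketch of the comparison metrics (Python/NumPy): the angle and the absolute magnitude difference between corresponding row vectors of two Jacobians:

```python
import numpy as np

def row_differences(J_a, J_est):
    """Angular (degrees) and absolute magnitude differences between
    corresponding row vectors of two Jacobian matrices."""
    angles, mags = [], []
    for a, b in zip(J_a, J_est):
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
        mags.append(abs(np.linalg.norm(a) - np.linalg.norm(b)))
    return np.array(angles), np.array(mags)
```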

SLIDE 16

Outline

  • Motivation
  • Past & related work
  • Bayesian locally weighted regression
  • Experimental results
  • Conclusions
SLIDE 17

Conclusions

  • We have a Bayesian formulation of spatially locally adaptive kernels that:
    i. Learns the optimal bandwidth value, h (i.e., the “appropriate” local regime)
    ii. Is computationally efficient
    iii. Provides a natural framework to incorporate prior knowledge of the noise level
  • Extensions to high-dimensional data with redundant & irrelevant input dimensions, an incremental version, embedding in other nonlinear methods, etc. are ongoing.
SLIDE 18

Angular & Magnitude Differences of Jacobians

Between analytical Jacobian JA & inferred Jacobian JBLWR:

  Ji | ∠JA,i − ∠JBLWR,i | abs(|JA,i| − |JBLWR,i|) | |JA,i| | |JBLWR,i|
  J1 | 19 degrees        | 0.1129                  | 0.5280 | 0.6464
  J2 | 79 degrees        | 0.2353                  | 0.2780 | 0.0427
  J3 | 25 degrees        | 0.1071                  | 0.4687 | 0.5758

Between analytical Jacobian JA & inferred Jacobian of LWR (with D = 0.1), JLWR:

  Ji | ∠JA,i − ∠JLWR,i | abs(|JA,i| − |JLWR,i|) | |JA,i| | |JLWR,i|
  J1 | 16 degrees       | 0.1182                 | 0.5280 | 0.6411
  J2 | 85 degrees       | 0.2047                 | 0.2780 | 0.0734
  J3 | 27 degrees       | 0.1216                 | 0.4687 | 0.5903

Observations:
  i) BLWR & LWR (with an optimally tuned D) perform similarly
  ii) The problem is ill-conditioned (the condition number is very high, ~1e5)
  iii) Angular differences for J2 are large, but the magnitudes of the vectors are small