CSE-571: Robotics, Lecture 10/6/16: Gaussians and Gaussian Process Regression


SLIDE 1

SLIDE 2

Slide from Pieter Abbeel

• Gaussian with mean (µ) and standard deviation (σ):

$$X \sim N(\mu,\, \sigma^2)$$

SLIDE 3

$$X \sim N(\mu,\, \sigma^2),\quad Y = aX + b \;\Rightarrow\; Y \sim N(a\mu + b,\; a^2\sigma^2)$$

SLIDE 4

$$X_1 \sim N(\mu_1,\, \sigma_1^2),\; X_2 \sim N(\mu_2,\, \sigma_2^2) \;\Rightarrow\; p(X_1)\cdot p(X_2) \sim N\!\left(\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\,\mu_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\,\mu_2,\; \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}\right)$$
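A quick numerical check (added here, not part of the slides) of the two identities above: sampling verifies the linear transformation rule, and the closed-form product rule is evaluated directly; all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 1.5, 3.0, -1.0

# Y = aX + b  =>  Y ~ N(a*mu + b, a^2 * sigma^2)
x = rng.normal(mu, sigma, size=1_000_000)
y = a * x + b
print(y.mean(), a * mu + b)        # ~5.0 vs 5.0
print(y.var(), a**2 * sigma**2)    # ~20.25 vs 20.25

# Product of two Gaussian densities is (proportional to) a Gaussian:
mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 0.5
mu12 = (s2**2 * mu1 + s1**2 * mu2) / (s1**2 + s2**2)
var12 = 1.0 / (1.0 / s1**2 + 1.0 / s2**2)
print(mu12, var12)                 # 1.6, 0.2
```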

SLIDE 5

$$p(x) = N(\mu,\, \Sigma), \quad x = \begin{pmatrix} x_a \\ x_b \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}$$

$$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \; e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$

Picture from [Bishop: Pattern Recognition and Machine Learning, 2006]

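A small sketch (my addition, assuming scipy is available) that evaluates the density above both from the explicit formula and via scipy.stats.multivariate_normal; the test point is arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -0.5])

d = len(mu)
diff = x - mu
explicit = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)

print(explicit, multivariate_normal(mu, Sigma).pdf(x))  # the two should agree
```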

SLIDE 6

Slide from Pieter Abbeel

• µ = [1; 0], Σ = [1 0; 0 1]
• µ = [-.5; 0], Σ = [1 0; 0 1]
• µ = [-1; -1.5], Σ = [1 0; 0 1]

SLIDE 7

Slide from Pieter Abbeel

• µ = [0; 0], Σ = [1 0; 0 1]
• µ = [0; 0], Σ = [.6 0; 0 .6]
• µ = [0; 0], Σ = [2 0; 0 2]

SLIDE 8

Slide from Pieter Abbeel

• µ = [0; 0], Σ = [1 0; 0 1]
• µ = [0; 0], Σ = [1 0.5; 0.5 1]
• µ = [0; 0], Σ = [1 0.8; 0.8 1]

SLIDE 9

Slide from Pieter Abbeel

• µ = [0; 0], Σ = [1 -0.5; -0.5 1]
• µ = [0; 0], Σ = [1 -0.8; -0.8 1]
• µ = [0; 0], Σ = [3 0.8; 0.8 1]

SLIDE 10

• Marginalizing a joint Gaussian results in a Gaussian:

$$p(x_a) = \int p(x_a, x_b)\, dx_b = N(\mu_a,\, \Sigma_{aa})$$

• Conditioning also leads to a Gaussian. For the joint distribution

$$p\!\left(\begin{bmatrix} x_a \\ x_b \end{bmatrix}\right) = N\!\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix},\, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right)$$

the conditional is

$$p(x_a \mid x_b) = N(\mu_{a|b},\, \Sigma_{a|b})$$

$$\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b), \qquad \Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$$

The conditional mean combines the prior mean of a with the cross-covariance times the deviation of the observed value x_b from its prior mean; the conditional covariance is the prior variance of a minus a shrink term (≥ 0).

Pictures from [Bishop: Pattern Recognition and Machine Learning, 2006]
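A minimal numpy sketch (my addition, not from the slides) of the conditioning formulas above; the numbers are made up for illustration.

```python
import numpy as np

mu_a, mu_b = np.array([0.0]), np.array([1.0])
S_aa = np.array([[2.0]])
S_ab = np.array([[0.8]])
S_bb = np.array([[1.0]])
x_b = np.array([2.0])  # observed value of x_b

# Marginal: p(x_a) = N(mu_a, S_aa) -- just drop the b block.

# Conditional: p(x_a | x_b) = N(mu_cond, S_cond)
K = S_ab @ np.linalg.inv(S_bb)      # "gain" on the observation
mu_cond = mu_a + K @ (x_b - mu_b)   # prior mean shifted by the evidence
S_cond = S_aa - K @ S_ab.T          # prior variance minus the shrink term
print(mu_cond, S_cond)              # [0.8], [[1.36]]
```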

SLIDE 11

SLIDE 12

• Modeling the relationship between real-valued variables in data
  - Sensor models, dynamics models, stock markets, etc.
• Two broad classes of models:
  - Parametric: learn a model of the data, then use the model to make new predictions (e.g., linear, non-linear, neural networks)
  - Non-parametric: keep the data around and use it to make new predictions (e.g., nearest-neighbor methods, locally weighted regression, Gaussian processes)

SLIDE 13

[Figure: parametric model fits to a training set: linear, degree-4, and degree-8 polynomials]

• Idea: summarize the data using a learned model (see the fitting sketch below)
  - Linear, polynomial, neural networks, etc.
• Computationally efficient; trades off model complexity vs. generalization
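An illustrative sketch (added, not from the slides) of fitting the parametric models in the figure with numpy's polynomial least squares; the toy sine data stands in for the slide's training set.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)
y = np.sin(X) + 0.2 * rng.standard_normal(X.shape)  # toy training set

for degree in (1, 4, 8):  # linear, polynomial-4, polynomial-8
    coeffs = np.polyfit(X, y, degree)         # learn the model parameters
    y_hat = np.polyval(coeffs, X)             # predictions from the model
    print(degree, np.mean((y - y_hat) ** 2))  # training error drops with degree
```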

SLIDE 14

• Idea: use the nearest neighbor's prediction, with some interpolation (see the sketch below)
  - Non-parametric: keeps all the data
  - E.g., 1-NN, NN with linear interpolation
• Easy, but needs a lot of data
  - Best you can do in the limit of infinite data
• Computationally expensive in high dimensions

[Figure: non-parametric model fits to a training set: 1-NN and NN with linear interpolation]
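A minimal sketch (added, not from the slides) of the two predictors in the figure, for scalar inputs; the tiny dataset is hypothetical.

```python
import numpy as np

def predict_1nn(X_train, y_train, x):
    """Return the target of the single closest training point."""
    return y_train[np.argmin(np.abs(X_train - x))]

def predict_nn_linear(X_train, y_train, x):
    """Linearly interpolate between the two neighboring training points."""
    order = np.argsort(X_train)
    return np.interp(x, X_train[order], y_train[order])

X_train = np.array([1.0, 3.0, 4.0, 7.0])
y_train = np.array([0.5, 1.5, -1.0, 2.0])
print(predict_1nn(X_train, y_train, 3.4))        # 1.5 (nearest is x=3)
print(predict_nn_linear(X_train, y_train, 3.4))  # 0.5 (between x=3 and x=4)
```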

SLIDE 15

• Idea: interpolate based on "close" training data (see the sketch below)
  - Closeness defined using a "kernel" function
  - Test output is a weighted interpolation of training outputs
  - E.g., locally weighted regression, Gaussian processes
• Can model arbitrary (smooth) functions
  - Need to keep around some (maybe all) training data

[Figure: smooth non-parametric model fits to a training set: LWR-NN, GP mean, and GP variance]
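A minimal sketch (added, not from the slides) of kernel-weighted interpolation: each test output is a weighted average of training outputs, with weights from a Gaussian closeness kernel. This is the simple Nadaraya-Watson form; locally weighted regression instead fits a local linear model with these weights.

```python
import numpy as np

def kernel_regression(X_train, y_train, x, length_scale=1.0):
    w = np.exp(-0.5 * ((X_train - x) / length_scale) ** 2)  # closeness weights
    return np.sum(w * y_train) / np.sum(w)  # weighted interpolation of outputs

X_train = np.array([1.0, 3.0, 4.0, 7.0])
y_train = np.array([0.5, 1.5, -1.0, 2.0])
print(kernel_regression(X_train, y_train, 3.5, length_scale=0.5))
```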

SLIDE 16

SLIDE 17

• Non-parametric regression model
• Distribution over functions
• Fully specified by training data, mean, and covariance functions
• Covariance given by a "kernel" which measures the distance of inputs in kernel space

SLIDE 18

• Given inputs (x) and targets (y):

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, y)$$

• GPs model the targets as a noisy function of the inputs:

$$y_i = f(x_i) + \varepsilon; \quad \varepsilon \sim N(0,\, \sigma_n^2)$$

• Formally, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution:

$$f(x) \sim GP(m(x),\, k(x, x'))$$

$$m(x) = E[f(x)], \qquad k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]$$

SLIDE 19

• Given a (finite) set of inputs (X), a GP models the outputs (y) as jointly Gaussian:

$$P(y \mid X) = N(m(X),\, K(X, X) + \sigma_n^2 I)$$

$$m = \begin{pmatrix} m(x_1) \\ m(x_2) \\ \vdots \\ m(x_n) \end{pmatrix}, \qquad K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$$

The $\sigma_n^2 I$ term accounts for observation noise.

• Usually we assume a zero-mean prior
  - Can define other mean functions (constant, polynomials, etc.)

SLIDE 20

• The covariance matrix (K) is defined through the "kernel" function:
  - Specifies the covariance of the outputs as a function of the inputs
• Example: squared exponential (SE) kernel
  - Covariance decreases with distance in input space
  - Similar input points will have similar outputs

$$k(x, x') = \sigma_f^2 \, e^{-\frac{1}{2}(x - x') W (x - x')^T}$$
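A small sketch (added, not from the slides) of this SE kernel, here with a scalar weight W, and of the Gram matrix K from Slide 19.

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, w=1.0):
    """k(x, x') = sigma_f^2 * exp(-0.5 * (x - x') W (x - x')^T), scalar W."""
    diff = np.atleast_1d(x1) - np.atleast_1d(x2)
    return sigma_f**2 * np.exp(-0.5 * w * diff @ diff)

def gram_matrix(X, kernel):
    """K[i, j] = k(x_i, x_j) for all pairs of training inputs."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

X = np.array([0.0, 0.5, 2.0])
K = gram_matrix(X, se_kernel)
print(K)  # nearby inputs give entries near sigma_f^2; distant ones near 0
```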

SLIDE 21

• GP prior: outputs jointly zero-mean Gaussian:

$$P(y \mid X) = N(0,\, K + \sigma_n^2 I)$$

Pictures from [Bishop: PRML, 2006]

SLIDE 22

• Training data:

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, y)$$

• Test pair ($y_*$ unknown): $\{x_*, y_*\}$

• GP outputs are jointly Gaussian:

$$P(y, y_* \mid X, x_*) = N(\mu, \Sigma); \qquad P(y \mid X) = N(0,\, K + \sigma_n^2 I)$$

• Recall the conditional of a joint Gaussian:

$$p(x_a \mid x_b) = N(\mu_{a|b}, \Sigma_{a|b}), \quad \mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b), \quad \Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$$

• Conditioning on y gives the GP prediction:

$$P(y_* \mid x_*, y, X) = N(\mu_*, \sigma_*^2)$$

$$\mu_* = k_*^T \left(K + \sigma_n^2 I\right)^{-1} y, \qquad \sigma_*^2 = k_{**} - k_*^T \left(K + \sigma_n^2 I\right)^{-1} k_*$$

where $k_*[i] = k(x_*, x_i)$ and $k_{**} = k(x_*, x_*)$.
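A minimal numpy sketch (added, not from the slides) of these prediction equations, using a 1-D SE kernel; the data and hyperparameters are illustrative.

```python
import numpy as np

def se_kernel(a, b, sigma_f=1.0, w=1.0):
    """Squared exponential kernel for scalar inputs (broadcasts over arrays)."""
    return sigma_f**2 * np.exp(-0.5 * w * (a - b) ** 2)

def gp_predict(X, y, x_star, sigma_n=0.1):
    n = len(X)
    K = se_kernel(X[:, None], X[None, :]) + sigma_n**2 * np.eye(n)
    k_star = se_kernel(X, x_star)     # k*[i] = k(x*, x_i)
    k_ss = se_kernel(x_star, x_star)  # k** = k(x*, x*)
    mu_star = k_star @ np.linalg.solve(K, y)                # posterior mean
    var_star = k_ss - k_star @ np.linalg.solve(K, k_star)   # posterior variance
    return mu_star, var_star

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(X)
print(gp_predict(X, y, 1.5))  # mean near sin(1.5), small variance
```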

SLIDE 23

SLIDE 24

• Noise standard deviation ($\sigma_n$)
  - Affects how strongly a new observation changes the predictions (and covariance)
• Kernel (choose based on the data)
  - SE, exponential, Matérn, etc.
• Kernel hyperparameters, e.g. for the SE kernel

$$k(x, x') = \sigma_f^2 \, e^{-\frac{1}{2}(x - x') W (x - x')^T}$$

  - Length scale W (how fast the function changes)
  - Scale factor $\sigma_f^2$ (how large the function variance is)

SLIDE 25

$$k(x, x') = \theta_0 \exp\!\left(-\frac{\theta_1}{2} \|x - x'\|^2\right) + \theta_2 + \theta_3\, x^T x'$$

Pictures from [Bishop: PRML, 2006]

SLIDE 26

• Maximize the data log-likelihood:

$$\theta^* = \arg\max_\theta \, p(y \mid X, \theta), \qquad \theta = \langle l,\, \sigma_f^2,\, \sigma_n^2 \rangle$$

$$\log p(y \mid X, \theta) = -\frac{1}{2} y^T \left(K + \sigma_n^2 I\right)^{-1} y \;-\; \frac{1}{2} \log \left|K + \sigma_n^2 I\right| \;-\; \frac{n}{2} \log 2\pi$$

• Compute derivatives w.r.t. the parameters
• Optimize using conjugate gradient descent
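A small sketch (added, not from the slides) of this log-likelihood, computed via a Cholesky factorization for numerical stability; a gradient-based optimizer (the slides use conjugate gradients) would maximize it over the hyperparameters. The example K and y are made up.

```python
import numpy as np

def log_marginal_likelihood(K, y, sigma_n):
    n = len(y)
    Ky = K + sigma_n**2 * np.eye(n)
    # Cholesky gives a stable solve and log-determinant.
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sigma_n^2 I)^-1 y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))   # = 0.5 * log det(Ky)
            - 0.5 * n * np.log(2 * np.pi))

K = np.array([[1.0, 0.5], [0.5, 1.0]])
y = np.array([0.3, -0.2])
print(log_marginal_likelihood(K, y, sigma_n=0.1))
```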

SLIDE 27

SLIDE 28

• Learn hyperparameters via numerical methods
• Learn the noise model at the same time

SLIDE 29

• System:
  - Commercial blimp envelope with custom gondola
  - XScale-based computer with Bluetooth connectivity
  - Two main motors plus a tail motor (3D control)
  - Ground truth obtained via a VICON motion capture system

SLIDE 30

• 12-D state = [position, rotation, translational velocity, rotational velocity]
• Describes the evolution of the state as an ODE:

$$\frac{d}{dt}\begin{bmatrix} p \\ \xi \\ v \\ \omega \end{bmatrix} = \begin{bmatrix} R_b^e(\xi)\, v \\ H(\xi)\, \omega \\ M^{-1}\left(\sum \mathrm{Forces} - \omega \times M v\right) \\ J^{-1}\left(\sum \mathrm{Torques} - \omega \times J \omega\right) \end{bmatrix}$$

• Forces/torques considered: buoyancy, gravity, drag, thrust
• 16 parameters are learned by optimization on ground-truth motion capture data

SLIDE 31

• Use the ground-truth state to extract dynamics data (states s, controls c, and resulting state changes Δs):

$$D_S = \{[s_1, c_1, \Delta s_1],\; [s_2, c_2, \Delta s_2],\; \ldots\}$$

[Diagram: state sequence s1 → s2 → s3 under controls c1, c2, with state changes Δs1, Δs2]

• Learn the model using Gaussian process regression
  - Learns the process noise inherent in the system

SLIDE 32

• Combine the GP model with the parametric model: train the GP on the residual between the observed state change and the parametric prediction f([s, c]):

$$D_X = \{[s_1, c_1, \Delta s_1 - f([s_1, c_1])],\; \ldots\}$$

• Advantages:
  - Captures aspects of the system not considered by the parametric model
  - Learns a noise model in the same way as GP-only models
  - Higher accuracy for the same amount of training data
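A small sketch (my addition, with a hypothetical stand-in for the parametric model f) of how the enhanced-GP training targets are formed; the data values are made up.

```python
import numpy as np

def parametric_model(s, c):
    """Hypothetical stand-in for the physics-based state-change prediction."""
    return 0.9 * s + 0.1 * c  # placeholder dynamics, not the actual model

S = np.array([0.0, 1.0, 2.0])    # states
C = np.array([0.5, 0.5, 0.5])    # controls
dS = np.array([0.2, 1.1, 1.9])   # observed state changes

# Training targets are the residuals: delta_s - f([s, c]).
residuals = dS - parametric_model(S, C)

# A GP is then fit to (inputs=[s, c], targets=residuals); its prediction is
# added back to the parametric model's output when propagating the state.
print(residuals)
```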

SLIDE 33

Dynamics model error (1800 training points; mean error over 900 test points; dynamics model predictions 0.25 s ahead):

Propagation method | pos (mm) | rot (deg) | vel (mm/s) | rotvel (deg/s)
Param              | 3.3      | 0.5       | 14.6       | 1.5
GPonly             | 1.8      | 0.2       | 9.8        | 1.1
EGP                | 1.6      | 0.2       | 9.6        | 1.3

SLIDE 34

• Heteroscedastic (state-dependent) noise
• Non-stationary GPs
• Coupled outputs
• Sparse GPs
  - Online: decide whether or not to accept a new point
  - Remove points
  - Optimize a small set of points
• Classification
  - Laplace approximation
  - No closed-form solution; sampling

SLIDE 35

• GPs provide a flexible modeling framework
• They take data noise and the uncertainty due to data sparsity into account
• Combination with parametric models increases accuracy and reduces the need for training data
• Computational complexity is a key problem

SLIDE 36

• Website: http://www.gaussianprocess.org/
• GP book: http://www.gaussianprocess.org/gpml/
• GPLVM: http://inverseprobability.com/fgplvm/
• GPDM: http://www.dgp.toronto.edu/~jmwang/gpdm/
• Bishop book: http://research.microsoft.com/en-us/um/people/cmbishop/prml/