10/6/16 CSE-571: Robotics 2
Slide from Pieter Abbeel
¡ Gaussian with mean (µ) and standard deviation (σ):

X ~ N(µ, σ²)

¡ A linear transformation of a Gaussian is Gaussian:

X ~ N(µ, σ²), Y = aX + b  ⇒  Y ~ N(aµ + b, a²σ²)

¡ The product of two Gaussian densities is (proportional to) a Gaussian:

X1 ~ N(µ1, σ1²), X2 ~ N(µ2, σ2²)  ⇒
p(X1)·p(X2) ~ N( (σ2²/(σ1² + σ2²)) µ1 + (σ1²/(σ1² + σ2²)) µ2 ,  1/(1/σ1² + 1/σ2²) )
10/6/16 CSE-571: Robotics 5
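The transformation and fusion rules above can be checked numerically. This is a minimal sketch; the sample size and the example means and variances are arbitrary:

```python
import numpy as np

# Linear transform: X ~ N(mu, sigma^2), Y = aX + b  =>  Y ~ N(a*mu + b, a^2 sigma^2)
rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 0.5, 3.0, -1.0
y = a * rng.normal(mu, sigma, size=200_000) + b
print(y.mean(), y.var())  # ~ a*mu + b = 5.0 and a^2 * sigma^2 = 2.25

# Product of two Gaussian densities (e.g. fusing two measurements of one quantity):
# precision-weighted mean and harmonic sum of variances, as on the slide
def fuse(mu1, var1, mu2, var2):
    var12 = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu12 = var12 * (mu1 / var1 + mu2 / var2)
    return mu12, var12

print(fuse(0.0, 1.0, 2.0, 1.0))  # equal variances -> mean halfway, variance halved
```

The `fuse` form with precisions 1/σ² is algebraically identical to the slide's (σ2² µ1 + σ1² µ2)/(σ1² + σ2²) expression.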
p(x) = N(µ, Σ),  x = [xa; xb],  µ = [µa; µb],  Σ = [Σaa Σab; Σba Σbb]

p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )
Picture from [Bishop: Pattern Recognition and Machine Learning, 2006]
§ µ = [1; 0],  Σ = [1 0; 0 1]
§ µ = [-.5; 0],  Σ = [1 0; 0 1]
§ µ = [-1; -1.5],  Σ = [1 0; 0 1]
10/6/16 CSE-571: Robotics 6
Slide from Pieter Abbeel
10/6/16 CSE-571: Robotics 7
Slide from Pieter Abbeel
§ µ = [0; 0],  Σ = [1 0; 0 1]
§ µ = [0; 0],  Σ = [.6 0; 0 .6]
§ µ = [0; 0],  Σ = [2 0; 0 2]
10/6/16 CSE-571: Robotics 8
Slide from Pieter Abbeel
§ µ = [0; 0],  Σ = [1 0; 0 1]
§ µ = [0; 0],  Σ = [1 0.5; 0.5 1]
§ µ = [0; 0],  Σ = [1 0.8; 0.8 1]
§ µ = [0; 0],  Σ = [1 -0.5; -0.5 1]
§ µ = [0; 0],  Σ = [1 -0.8; -0.8 1]
§ µ = [0; 0],  Σ = [3 0.8; 0.8 1]
10/6/16 CSE-571: Robotics 9
Slide from Pieter Abbeel

p(xa) = ∫ p(xa, xb) dxb

p(xa) = N(µa, Σaa)
Pictures from [Bishop: PRML, 2006]
10/6/16 CSE-571: Robotics 10
p(xa | xb) = N(µa|b, Σa|b)

µa|b = µa + Σab Σbb⁻¹ (xb − µb)

Σa|b = Σaa − Σab Σbb⁻¹ Σba
¡ Marginalizing a joint Gaussian results in a Gaussian
¡ Conditioning also leads to a Gaussian
§ In the conditional mean: µb is the prior mean of b, xb the observed value, Σbb the prior variance of b, and Σab the cross-covariance
§ In the conditional covariance: Σaa is the prior variance of a, and Σab Σbb⁻¹ Σba is the shrink term (≥ 0)
p([xa; xb]) = N( [µa; µb], [Σaa Σab; Σba Σbb] )
10/6/16 CSE-571: Robotics 11
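The conditioning formulas translate directly into NumPy. The function name and the 1-D example values below are illustrative only; the correlation 0.8 matches one of the covariance examples above:

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_b):
    """Conditional of a joint Gaussian, per the slide formulas:
    mu_{a|b} = mu_a + S_ab S_bb^-1 (x_b - mu_b)
    S_{a|b}  = S_aa - S_ab S_bb^-1 S_ba
    """
    G = S_ab @ np.linalg.inv(S_bb)   # gain mapping the "surprise" in b onto a
    mu_cond = mu_a + G @ (x_b - mu_b)
    S_cond = S_aa - G @ S_ab.T       # S_ba = S_ab^T for a valid covariance
    return mu_cond, S_cond

# Unit variances, correlation 0.8; observe xb = 1
mu_a, mu_b = np.zeros(1), np.zeros(1)
S_aa = np.array([[1.0]]); S_bb = np.array([[1.0]]); S_ab = np.array([[0.8]])
m, S = condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_b=np.array([1.0]))
print(m, S)  # mean 0.8; variance shrinks from 1 to 1 - 0.8^2 = 0.36
```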
¡ Modeling the relationship between real-valued variables in data
▪ Sensor models, dynamics models, stock market, etc.
¡ Two broad classes of models:
§ Parametric: learn a model of the data, use the model to make new predictions
▪ E.g.: linear, non-linear, neural networks, etc.
§ Non-parametric: keep the data around and use it to make new predictions
▪ E.g.: nearest-neighbor methods, locally weighted regression, Gaussian processes, etc.
10/6/16 CSE-571: Robotics 12
[Figure: parametric models on a training set — linear, degree-4, and degree-8 polynomial fits]
10/6/16 CSE-571: Robotics 13
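The complexity-vs-generalization tradeoff in the figure can be reproduced with ordinary least-squares polynomial fits. The noisy-sine data here is a stand-in, not the slide's training set:

```python
import numpy as np

# Toy data: noisy sine, a classic regression test function
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

# Fit parametric models of increasing complexity; since the models are nested,
# training error can only go down as degree grows (generalization may not)
rmses = {}
for deg in (1, 4, 8):
    coeffs = np.polyfit(x, y, deg)                # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    rmses[deg] = float(np.sqrt(np.mean(resid**2)))
    print(deg, rmses[deg])
```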
¡ Idea: summarize data using a learned model
§ Linear, polynomial, neural networks, etc.
¡ Computationally efficient; trades off complexity vs. generalization
10/6/16 CSE-571: Robotics 14
¡ Idea: use the nearest neighbor's prediction (with some interpolation)
§ Non-parametric, keeps all data
§ Ex: 1-NN, NN with linear interpolation
¡ Easy, but needs a lot of data
§ Best you can do in the limit of infinite data
¡ Computationally expensive in high dimensions
[Figure: non-parametric models on a training set — 1-NN and NN with linear interpolation]
10/6/16 CSE-571: Robotics 15
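Both variants above fit in a few lines for 1-D inputs. The toy data is made up for illustration:

```python
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    """1-nearest-neighbor regression: return the output of the closest
    training input (no interpolation at all)."""
    i = np.argmin(np.abs(X_train - x_query))
    return y_train[i]

X_train = np.array([1.0, 2.0, 4.0, 8.0])
y_train = np.array([0.5, 1.5, 0.0, 2.0])

print(one_nn_predict(X_train, y_train, 3.9))    # nearest input is 4.0 -> 0.0
# NN with linear interpolation between the two bracketing neighbors:
print(np.interp(3.0, X_train, y_train))          # midway between 1.5 and 0.0 -> 0.75
```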
¡ Idea: interpolate based on "close" training data
§ Closeness defined using a "kernel" function
§ Test output is a weighted interpolation of training outputs
§ Locally weighted regression, Gaussian processes
¡ Can model arbitrary (smooth) functions
§ Need to keep around some (maybe all) training data
[Figure: smooth non-parametric models on a training set — LWR-NN, GP mean, and GP variance]
10/6/16 CSE-571: Robotics 16
¡ Non-parametric regression model
¡ Distribution over functions
¡ Fully specified by training data, mean, and covariance functions
¡ Covariance given by a "kernel" which measures the distance of inputs in kernel space
10/6/16 17 CSE-571: Robotics
¡ Given inputs (x) and targets (y):

D = {(x1, y1), (x2, y2), …, (xn, yn)} = (X, y)

¡ GPs model the targets as a noisy function of the inputs:

yi = f(xi) + ε,  ε ~ N(0, σn²)

¡ Formally, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution:

f(x) ~ GP(m(x), k(x, x'))
m(x) = E[f(x)]
k(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]

10/6/16 18 CSE-571: Robotics
¡ Given a (finite) set of inputs X, the GP models the outputs y as jointly Gaussian:

P(y | X) = N(m(X), K(X, X) + σn²I)    (σn²I is the noise term)

m = [m(x1); m(x2); …; m(xn)]
K = [k(x1,x1) … k(x1,xn); k(x2,x1) … k(x2,xn); …; k(xn,x1) … k(xn,xn)]

¡ Usually, we assume a zero-mean prior
▪ Can define other mean functions (constant, polynomials, etc.)

10/6/16 19 CSE-571: Robotics
¡ The covariance matrix (K) is defined through the "kernel" function:
§ Specifies the covariance of the outputs as a function of the inputs
¡ Example: Squared Exponential (SE) kernel
§ Covariance decreases with distance in input space
§ Similar input points will have similar outputs

k(x, x') = σf² exp( −½ (x − x') W (x − x')ᵀ )

10/6/16 CSE-571: Robotics 20
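A scalar version of the SE kernel shows the covariance decaying with input distance; here W is taken as a single scalar (an inverse squared length scale), which is one simple choice:

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, w=1.0):
    """Squared exponential kernel for scalar inputs:
    k(x, x') = sigma_f^2 * exp(-0.5 * w * (x - x')^2)."""
    return sigma_f**2 * np.exp(-0.5 * w * (x1 - x2) ** 2)

# Nearby inputs -> strongly correlated outputs; distant inputs -> nearly independent
for d in (0.0, 1.0, 3.0):
    print(d, se_kernel(0.0, d))
```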
10/6/16 CSE-571: Robotics 21
Pictures from [Bishop: PRML, 2006]
¡ GP prior: outputs jointly zero-mean Gaussian:

P(y | X) = N(0, K + σn²I)
¡ Training data: D = {(x1, y1), …, (xn, yn)} = (X, y)
¡ Test pair (y* unknown): {x*, y*}
¡ GP outputs are jointly Gaussian:

P(y, y* | X, x*) = N(µ, Σ);  P(y | X) = N(0, K + σn²I)

¡ Conditioning on y:

P(y* | x*, y, X) = N(µ*, σ*²)
µ* = k*ᵀ (K + σn²I)⁻¹ y
σ*² = k** − k*ᵀ (K + σn²I)⁻¹ k*
where k*[i] = k(x*, xi) and k** = k(x*, x*)

Recall the conditional of a joint Gaussian:
p(xa | xb) = N(µa|b, Σa|b)
µa|b = µa + Σab Σbb⁻¹ (xb − µb)
Σa|b = Σaa − Σab Σbb⁻¹ Σba

10/6/16 22 CSE-571: Robotics
10/6/16 CSE-571: Robotics 23
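The prediction equations translate almost line-for-line into NumPy. This sketch assumes 1-D inputs, a zero mean, and an SE kernel with unit hyperparameters; none of these choices come from the slides:

```python
import numpy as np

def se_kernel(a, b, sigma_f=1.0, length=1.0):
    # SE kernel matrix between two sets of 1-D inputs
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, x_star, sigma_n=0.1):
    """Zero-mean GP regression, following the slide:
    mu*  = k*^T (K + sigma_n^2 I)^-1 y
    var* = k** - k*^T (K + sigma_n^2 I)^-1 k*
    """
    K = se_kernel(X, X) + sigma_n**2 * np.eye(len(X))
    k_star = se_kernel(X, x_star)              # k*[i] = k(x_i, x*)
    alpha = np.linalg.solve(K, y)              # (K + sn^2 I)^-1 y
    mu = k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = se_kernel(x_star, x_star) - k_star.T @ v
    return mu, np.diag(var)

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
mu, var = gp_predict(X, y, np.array([0.5]))
print(mu, var)  # mean near sin(0.5); variance small this close to training data
```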
¡ Noise standard deviation (σn)
§ Affects how likely a new observation changes predictions (and covariance)
¡ Kernel (choose based on data)
§ SE, Exponential, Matérn, etc.
¡ Kernel hyperparameters, e.g. for the SE kernel

k(x, x') = σf² exp( −½ (x − x') W (x − x')ᵀ )

▪ Length scale W (how fast the function changes)
▪ Scale factor σf² (how large the function variance is)

10/6/16 CSE-571: Robotics 24
10/6/16 CSE-571: Robotics 25
k(x, x') = θ0 exp( −(θ1/2) ‖x − x'‖² ) + θ2 + θ3 xᵀx'
Pictures from [Bishop: PRML, 2006]
¡ Maximize the data log likelihood over the hyperparameters θ = ⟨l, σf², σn²⟩:

θ* = argmax_θ p(y | X, θ)

log p(y | X, θ) = −½ yᵀ (K + σn²I)⁻¹ y − ½ log|K + σn²I| − (n/2) log 2π

¡ Compute derivatives w.r.t. the parameters
¡ Optimize using conjugate gradient descent

10/6/16 26 CSE-571: Robotics
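The log likelihood itself is a short function. For brevity this sketch scores a coarse grid over the length scale rather than running conjugate gradient descent, and the (length, σf, σn) packing of θ is just one common convention:

```python
import numpy as np

def log_marginal_likelihood(theta, X, y):
    """Zero-mean GP data log likelihood from the slide:
    -1/2 y^T (K + sn^2 I)^-1 y - 1/2 log|K + sn^2 I| - n/2 log 2pi."""
    length, sigma_f, sigma_n = theta
    n = len(X)
    d2 = (X[:, None] - X[None, :]) ** 2
    K = sigma_f**2 * np.exp(-0.5 * d2 / length**2) + sigma_n**2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * y @ np.linalg.solve(K, y) - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

X = np.linspace(0, 5, 10)
y = np.sin(X)
# Grid search as a stand-in for gradient-based optimization of the length scale
thetas = [(l, 1.0, 0.1) for l in (0.1, 1.0, 10.0)]
best = max(thetas, key=lambda t: log_marginal_likelihood(np.array(t), X, y))
print(best)  # the medium length scale should explain the smooth sine best
```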
10/6/16 CSE-571: Robotics 27
- Learn hyperparameters via numerical methods
- Learn noise model at the same time
28 10/6/16 CSE-571: Robotics
- System:
- Commercial blimp envelope with custom gondola
- XScale-based computer with Bluetooth connectivity
- Two main motors plus a tail motor (3D control)
- Ground truth obtained via a VICON motion capture system
29 10/6/16 CSE-571: Robotics
¡ 12-D state = [pos, rot, transvel, rotvel]
¡ Describes the evolution of the state as an ODE:

d/dt [p; ξ; v; ω] = [ Rᵉᵇ v;  H(ξ) ω;  M⁻¹(ΣForces − ω × Mv);  J⁻¹(ΣTorques − ω × Jω) ]

¡ Forces/torques considered: buoyancy, gravity, drag, thrust
¡ 16 parameters are learned by optimization on ground-truth motion capture data

10/6/16 CSE-571: Robotics 30
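A single Euler step through an ODE of this form might look as follows. The mass/inertia matrices, forces, and torques here are placeholders, not the 16 learned blimp parameters, and orientation is propagated with a rotation matrix rather than the slide's H(ξ) Euler-angle kinematics:

```python
import numpy as np

def blimp_state_derivative(p, R, v, w, forces, torques, M, J):
    """One evaluation of a rigid-body ODE of the slide's form (body-frame v, w):
    p' = R v,  R' = R [w]_x,  v' = M^-1 (F - w x Mv),  w' = J^-1 (T - w x Jw)."""
    wx = np.array([[0, -w[2], w[1]],
                   [w[2], 0, -w[0]],
                   [-w[1], w[0], 0]])          # skew-symmetric cross-product matrix
    p_dot = R @ v
    R_dot = R @ wx
    v_dot = np.linalg.solve(M, forces - np.cross(w, M @ v))
    w_dot = np.linalg.solve(J, torques - np.cross(w, J @ w))
    return p_dot, R_dot, v_dot, w_dot

# Euler-integrate one 0.25 s step (the slide's prediction horizon) from rest,
# with placeholder unit mass/inertia and a small forward thrust
M, J = np.eye(3), np.eye(3)
p, R, v, w = np.zeros(3), np.eye(3), np.zeros(3), np.zeros(3)
f, t = np.array([0.1, 0.0, 0.0]), np.zeros(3)
dp, dR, dv, dw = blimp_state_derivative(p, R, v, w, f, t, M, J)
p, v = p + 0.25 * dp, v + 0.25 * dv
print(v)  # picks up 0.25 * 0.1 = 0.025 along the body x axis
```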
- Use ground truth state to extract:
– Dynamics data: state change given state and control,

D_S = {([s1, c1], Δs1), ([s2, c2], Δs2), …},  where Δsi = si+1 − si

- Learn model using Gaussian process regression
– Learns the process noise inherent in the system
31 10/6/16 CSE-571: Robotics
- Combine GP model with parametric model: train the GP on the residuals of the parametric model f,

D_X = {([s1, c1], Δs1 − f([s1, c1])), …}

- Advantages
– Captures aspects of the system not considered by the parametric model
– Learns a noise model in the same way as GP-only models
– Higher accuracy for the same amount of training data
32 10/6/16 CSE-571: Robotics
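Constructing the enhanced-GP training set amounts to subtracting the parametric prediction from the observed state change. All names and the 1-D data below are hypothetical:

```python
import numpy as np

def residual_targets(states, controls, parametric_f):
    """Build D_X = { ([s_i, c_i], ds_i - f([s_i, c_i])) } with ds_i = s_{i+1} - s_i.
    The GP then only has to model what the parametric model f misses."""
    inputs, targets = [], []
    for i in range(len(states) - 1):
        x = np.concatenate([states[i], controls[i]])
        ds = states[i + 1] - states[i]
        inputs.append(x)
        targets.append(ds - parametric_f(x))   # residual the GP must explain
    return np.array(inputs), np.array(targets)

# Hypothetical 1-D demo: a parametric model that underestimates the true change
states = [np.array([0.0]), np.array([1.0]), np.array([2.5])]
controls = [np.array([1.0]), np.array([1.0])]
f = lambda x: np.array([0.8])                  # placeholder parametric prediction
Xr, Yr = residual_targets(states, controls, f)
print(Yr)  # residuals 1.0 - 0.8 = 0.2 and 1.5 - 0.8 = 0.7
```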
33

Dynamics model error

Propagation method | pos (mm) | rot (deg) | vel (mm/s) | rotvel (deg/s)
Param              | 3.3      | 0.5       | 14.6       | 1.5
GPonly             | 1.8      | 0.2       | 9.8        | 1.1
EGP                | 1.6      | 0.2       | 9.6        | 1.3

- 1800 training points, mean error over 900 test points
- For the dynamics model, 0.25 sec predictions
10/6/16 CSE-571: Robotics
¡ Heteroscedastic (state-dependent) noise
¡ Non-stationary GPs
¡ Coupled outputs
¡ Sparse GPs
§ Online: decide whether or not to accept a new point
§ Remove points
§ Optimize a small set of points
¡ Classification
§ Laplace approximation
§ No closed-form solution; sampling
10/6/16 CSE-571: Robotics 34
¡ GPs provide a flexible modeling framework
¡ Take data noise and uncertainty due to data sparsity into account
¡ Combination with parametric models increases accuracy and reduces the need for training data
¡ Computational complexity is a key problem
10/6/16 35 CSE-571: Robotics
¡ Website: http://www.gaussianprocess.org/
¡ GP book: http://www.gaussianprocess.org/gpml/
¡ GPLVM: http://inverseprobability.com/fgplvm/
¡ GPDM: http://www.dgp.toronto.edu/~jmwang/gpdm/
¡ Bishop book: http://research.microsoft.com/en-us/um/people/cmbishop/prml/
10/6/16 36 CSE-571: Robotics