Gaussians and Gaussian Process Regression (CSE-571: Robotics, 10/6/16)
  1. Slide from Pieter Abbeel
     • Gaussian with mean ($\mu$) and standard deviation ($\sigma$)

  2. Linear transformation of a Gaussian:
     $$X \sim N(\mu, \sigma^2),\quad Y = aX + b \;\Rightarrow\; Y \sim N(a\mu + b,\; a^2\sigma^2)$$
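As a quick illustration of this rule, here is a minimal Monte Carlo check in numpy; the particular values of $\mu$, $\sigma$, $a$, and $b$ are arbitrary choices for the example, not from the slides.

```python
import numpy as np

# Illustrative values (not from the slides): X ~ N(2, 0.5^2), Y = 3X - 1.
rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 0.5, 3.0, -1.0

x = rng.normal(mu, sigma, size=1_000_000)
y = a * x + b

print(y.mean())  # close to a*mu + b = 5.0
print(y.std())   # close to |a|*sigma = 1.5
```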

  3. Multiplying two Gaussian densities:
     $$\left.\begin{aligned} X_1 &\sim N(\mu_1, \sigma_1^2) \\ X_2 &\sim N(\mu_2, \sigma_2^2) \end{aligned}\right\} \Rightarrow\; p(X_1) \cdot p(X_2) \sim N\!\left(\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\,\mu_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\,\mu_2,\;\; \left(\sigma_1^{-2} + \sigma_2^{-2}\right)^{-1}\right)$$
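This product rule is what makes Gaussian measurement fusion (e.g., a Kalman filter update) so cheap. A minimal sketch of it as a function, assuming scalar Gaussians parameterized by mean and variance:

```python
def fuse_gaussians(mu1, var1, mu2, var2):
    """Mean and variance of the (normalized) product of two 1-D Gaussians."""
    mean = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    var = 1.0 / (1.0 / var1 + 1.0 / var2)  # equals var1*var2 / (var1 + var2)
    return mean, var

# Fusing two noisy estimates of the same quantity: the result lies between
# them, weighted toward the more confident (lower-variance) one.
print(fuse_gaussians(0.0, 1.0, 2.0, 4.0))  # (0.4, 0.8)
```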

  4. Picture from [Bishop: Pattern Recognition and Machine Learning, 2006]
     Multivariate Gaussian $p(x) = N(\mu, \Sigma)$, partitioned as
     $$x = \begin{pmatrix} x_a \\ x_b \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}$$
     with density
     $$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\; e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$
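The density formula translates directly to code. A small numpy sketch (using `np.linalg.solve` instead of an explicit inverse for numerical stability):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density given above."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^-1 (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
print(mvn_pdf(np.array([0.5, -0.2]), mu, Sigma))
```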

  5. Slide from Pieter Abbeel
     [Figure: three unit-covariance Gaussians at different means]
     • $\mu = [1; 0]$, $\Sigma = [1\ 0;\ 0\ 1]$
     • $\mu = [-0.5; 0]$, $\Sigma = [1\ 0;\ 0\ 1]$
     • $\mu = [-1; -1.5]$, $\Sigma = [1\ 0;\ 0\ 1]$

  6. Slide from Pieter Abbeel
     [Figure: zero-mean Gaussians with isotropic covariances of different scale]
     • $\mu = [0; 0]$, $\Sigma = [1\ 0;\ 0\ 1]$
     • $\mu = [0; 0]$, $\Sigma = [0.6\ 0;\ 0\ 0.6]$
     • $\mu = [0; 0]$, $\Sigma = [2\ 0;\ 0\ 2]$

  7. Slide from Pieter Abbeel
     [Figure: zero-mean Gaussians with increasing positive correlation]
     • $\mu = [0; 0]$, $\Sigma = [1\ 0;\ 0\ 1]$
     • $\mu = [0; 0]$, $\Sigma = [1\ 0.5;\ 0.5\ 1]$
     • $\mu = [0; 0]$, $\Sigma = [1\ 0.8;\ 0.8\ 1]$

  8. Slide from Pieter Abbeel
     [Figure: zero-mean Gaussians with negative correlation and unequal variances]
     • $\mu = [0; 0]$, $\Sigma = [1\ -0.5;\ -0.5\ 1]$
     • $\mu = [0; 0]$, $\Sigma = [1\ -0.8;\ -0.8\ 1]$
     • $\mu = [0; 0]$, $\Sigma = [3\ 0.8;\ 0.8\ 1]$

  9. Pictures from [Bishop: PRML, 2006]
     • Marginalizing a joint Gaussian results in a Gaussian:
     $$p(x_a) = \int p(x_a, x_b)\, dx_b; \qquad p\!\left(\begin{bmatrix} x_a \\ x_b \end{bmatrix}\right) = N\!\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right) \;\Rightarrow\; p(x_a) = N(\mu_a, \Sigma_{aa})$$
     • Conditioning also leads to a Gaussian, $p(x_a \mid x_b) = N(\mu_{a|b}, \Sigma_{a|b})$, with
     $$\mu_{a|b} = \underbrace{\mu_a}_{\text{prior mean}} + \underbrace{\Sigma_{ab}}_{\text{cross-covariance}} \Sigma_{bb}^{-1} (\underbrace{x_b}_{\text{observed value}} - \mu_b), \qquad \Sigma_{a|b} = \underbrace{\Sigma_{aa}}_{\text{prior variance (a)}} - \underbrace{\Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}}_{\text{shrink term } (\geq 0)}$$
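Since these conditioning formulas reappear verbatim in the GP prediction slide later, here is a direct numpy transcription (a sketch; it assumes the blocks are given as numpy arrays and uses an explicit inverse for clarity):

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_b):
    """p(x_a | x_b) for jointly Gaussian (x_a, x_b), per the formulas above."""
    gain = S_ab @ np.linalg.inv(S_bb)     # Sigma_ab Sigma_bb^-1
    mu_cond = mu_a + gain @ (x_b - mu_b)  # prior mean shifted by observation
    S_cond = S_aa - gain @ S_ab.T         # prior variance minus shrink term
    return mu_cond, S_cond
```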

  10. [Figure-only slide]

  11. • Modeling the relationship between real-valued variables in data
       – Sensor models, dynamics models, the stock market, etc.
     • Two broad classes of models:
       – Parametric: learn a model of the data, then use the model to make new predictions (e.g., linear, non-linear, neural networks)
       – Non-parametric: keep the data around and use it to make new predictions (e.g., nearest-neighbor methods, locally weighted regression, Gaussian processes)

  12. Parametric models
     • Idea: Summarize the data using a learned model
       – Linear, polynomial, neural networks, etc.
     • Computationally efficient; trade off model complexity vs. generalization
     [Figure: training set fit by linear and polynomial models]

  13. Non-parametric models
     • Idea: Use the nearest neighbor's prediction (with some interpolation)
       – Non-parametric: keeps all the data
       – E.g., 1-NN, NN with linear interpolation
     • Easy, but needs a lot of data
       – Best you can do in the limit of infinite data
     • Computationally expensive in high dimensions
     [Figure: training set with 1-NN and NN-linear fits]
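A minimal 1-NN regression sketch in numpy (a hypothetical helper, not code from the course):

```python
import numpy as np

def predict_1nn(X_train, y_train, x_query):
    """Return the target of the training input closest to the query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(dists)]
```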

  14. Smooth non-parametric models
     • Idea: Interpolate based on "close" training data
       – Closeness defined using a "kernel" function
       – Test output is a weighted interpolation of training outputs
       – E.g., locally weighted regression, Gaussian processes
     • Can model arbitrary (smooth) functions
       – Need to keep around some (maybe all) training data
     [Figure: training set with LWR-NN, GP, and GP-Var fits]
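For concreteness, the simplest version of this idea is a kernel-weighted average of the training outputs (a Nadaraya-Watson smoother; full locally weighted regression fits a local linear model instead). A sketch, with the kernel form and length scale as assumptions:

```python
import numpy as np

def kernel_smooth(X_train, y_train, x_query, length_scale=1.0):
    """Predict by a kernel-weighted average of training outputs."""
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-0.5 * sq_dists / length_scale ** 2)  # "closeness" weights
    return w @ y_train / w.sum()
```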

  15. [Figure-only slide]

  16. • Non-parametric regression model
     • Distribution over functions
     • Fully specified by training data, mean, and covariance functions
     • Covariance given by a "kernel" which measures the distance of inputs in kernel space

  17. • Given inputs (x) and targets (y):
     $$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, y)$$
     • GPs model the targets as a noisy function of the inputs:
     $$y_i = f(x_i) + \varepsilon; \qquad \varepsilon \sim N(0, \sigma_n^2)$$
     • Formally, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution:
     $$f(x) \sim GP(m(x), k(x, x'))$$
     $$m(x) = E[f(x)]$$
     $$k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]$$

  18. • Given a (finite) set of inputs $X$, a GP models the outputs $y$ as jointly Gaussian (with noise $\sigma_n^2 I$):
     $$P(y \mid X) = N\!\left(m(X),\; K(X, X) + \sigma_n^2 I\right)$$
     $$m = \begin{pmatrix} m(x_1) \\ m(x_2) \\ \vdots \\ m(x_n) \end{pmatrix}, \qquad K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$$
     • Usually, we assume a zero-mean prior
       – Can define other mean functions (constant, polynomials, etc.)
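Constructing the mean vector and Gram matrix is mechanical. A sketch, with the kernel passed in as a function (the zero mean and the noise value are assumptions for the example):

```python
import numpy as np

def gp_joint(X, kernel, noise_var=0.01):
    """Zero-mean vector and covariance K(X, X) + sigma_n^2 I over outputs y."""
    n = len(X)
    m = np.zeros(n)  # zero-mean prior
    K = np.array([[kernel(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    return m, K + noise_var * np.eye(n)
```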

  19. • The covariance matrix ($K$) is defined through the "kernel" function:
       – Specifies the covariance of the outputs as a function of the inputs
     • Example: squared exponential (SE) kernel
       – Covariance decreases with distance in input space
       – Similar input points will have similar outputs
     $$k(x, x') = \sigma_f^2\, e^{-\frac{1}{2}(x - x')^T W (x - x')}$$
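In code, with $W$ taken to be a diagonal matrix of inverse squared length scales (a common convention; the slide does not pin down $W$'s exact form):

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, lengthscale=1.0):
    """Squared exponential kernel with W = I / lengthscale^2 (an assumption)."""
    diff = (np.atleast_1d(x1) - np.atleast_1d(x2)) / lengthscale
    return sigma_f ** 2 * np.exp(-0.5 * float(diff @ diff))
```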

  20. Pictures from [Bishop: PRML, 2006]
     • GP prior: outputs are jointly zero-mean Gaussian:
     $$P(y \mid X) = N(0,\; K + \sigma_n^2 I)$$
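One way to visualize the prior is to draw joint samples over a dense grid of inputs; each draw is a plausible function under the prior. A sketch using the `se_kernel` above (the grid, jitter, and sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
Xs = np.linspace(0.0, 10.0, 100)
K = np.array([[se_kernel(a, b) for b in Xs] for a in Xs])
# Small jitter keeps the covariance numerically positive definite.
samples = rng.multivariate_normal(np.zeros(len(Xs)),
                                  K + 1e-8 * np.eye(len(Xs)), size=3)
```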

  21. • Training data: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, y)$
     • Test pair ($y_*$ unknown): $\{x_*, y_*\}$
     • GP outputs are jointly Gaussian:
     $$P(y, y_* \mid X, x_*) = N(\mu, \Sigma); \qquad P(y \mid X) = N(0, K + \sigma_n^2 I)$$
     • Conditioning on $y$ (recall the conditional Gaussian: $p(x_a \mid x_b) = N(\mu_{a|b}, \Sigma_{a|b})$ with $\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$):
     $$P(y_* \mid x_*, y, X) = N(\mu_*, \sigma_*^2)$$
     $$\mu_* = k_*^T \left(K + \sigma_n^2 I\right)^{-1} y$$
     $$\sigma_*^2 = k_{**} - k_*^T \left(K + \sigma_n^2 I\right)^{-1} k_*$$
     $$k_*[i] = k(x_*, x_i); \qquad k_{**} = k(x_*, x_*)$$
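These prediction equations are a direct instance of the conditional-Gaussian formulas from slide 9. A numpy sketch of prediction at a single test input, using a Cholesky factorization rather than an explicit inverse (standard practice; the helper names are mine):

```python
import numpy as np

def gp_predict(X, y, x_star, kernel, noise_var):
    """GP posterior mean and variance at one test input x_star."""
    n = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    k_star = np.array([kernel(x_star, a) for a in X])
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^-1 y
    v = np.linalg.solve(L, k_star)
    mu_star = k_star @ alpha                   # k_*^T (K + s^2 I)^-1 y
    var_star = kernel(x_star, x_star) - v @ v  # k_** - k_*^T (...)^-1 k_*
    return mu_star, var_star
```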

  22. [Figure-only slide]

  23. • Noise standard deviation ($\sigma_n$)
       – Affects how strongly a new observation changes the predictions (and covariance)
     • Kernel (chosen based on the data)
       – SE, exponential, Matérn, etc.
     • Kernel hyperparameters, e.g. for the SE kernel $k(x, x') = \sigma_f^2\, e^{-\frac{1}{2}(x - x')^T W (x - x')}$:
       – Length scale (how fast the function changes)
       – Scale factor (how large the function variance is)

  24. Pictures from [Bishop: PRML, 2006]
     $$k(x, x') = \theta_0 \exp\!\left(-\frac{\theta_1}{2}\, \| x - x' \|^2\right) + \theta_2 + \theta_3\, x^T x'$$

  25. • Maximize the data log likelihood:
     $$\theta^* = \operatorname*{argmax}_{\theta}\; p(y \mid X, \theta)$$
     $$\log p(y \mid X, \theta) = -\tfrac{1}{2}\, y^T \left(K + \sigma_n^2 I\right)^{-1} y - \tfrac{1}{2} \log \left| K + \sigma_n^2 I \right| - \tfrac{n}{2} \log 2\pi$$
     • Compute derivatives w.r.t. the parameters $\theta = \langle \sigma_n^2, \sigma_f^2, l \rangle$
     • Optimize using conjugate gradient descent
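The log likelihood itself is cheap to evaluate once a Cholesky factor is available, since $\log|K + \sigma_n^2 I| = 2\sum_i \log L_{ii}$. A sketch (gradients and the conjugate-gradient loop are omitted; in practice one would hand this objective to an optimizer):

```python
import numpy as np

def log_marginal_likelihood(X, y, kernel, noise_var):
    """log p(y | X, theta) as on the slide, via a Cholesky factor."""
    n = len(y)
    K = np.array([[kernel(a, b) for b in X] for a in X]) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha                # data-fit term
            - np.sum(np.log(np.diag(L)))    # 0.5 * log|K + s^2 I|
            - 0.5 * n * np.log(2 * np.pi))  # normalization
```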

  26. [Figure-only slide]

  27. • Learn hyperparameters via numerical methods
     • Learn the noise model at the same time

  28. System:
     • Commercial blimp envelope with custom gondola
     • XScale-based computer with Bluetooth connectivity
     • Two main motors with tail motor (3D control)
     • Ground truth obtained via a VICON motion capture system

  29. $$\dot{s} = \frac{d}{dt}\begin{bmatrix} p \\ \xi \\ v \\ \omega \end{bmatrix} = \begin{bmatrix} R_b^e\, v \\ H(\xi)\, \omega \\ M^{-1}\left(\sum \text{Forces} - \omega \times M v\right) \\ J^{-1}\left(\sum \text{Torques} - \omega \times J \omega\right) \end{bmatrix}$$
     • 12-D state = [pos, rot, transvel, rotvel]
     • Describes the evolution of the state as an ODE
     • Forces/torques considered: buoyancy, gravity, drag, thrust
     • 16 parameters are learned by optimization on ground-truth motion capture data

  30. [Figure: state sequence $s_1, s_2, s_3, \ldots$ with controls $c_1, c_2, \ldots$ and state changes $\Delta s_1, \Delta s_2, \ldots$]
     • Use the ground-truth state to extract dynamics data:
     $$D_S = \left\{ \langle [s_1, c_1], \Delta s_1 \rangle,\; \langle [s_2, c_2], \Delta s_2 \rangle,\; \ldots \right\}$$
     • Learn the model using Gaussian process regression
       – Learns the process noise inherent in the system

  31. [Figure: one transition $s_1 \to s_2$ with control $c_1$, comparing $\Delta s_1$ to the parametric prediction $f([s_1, c_1])$]
     • Combine the GP model with the parametric model $f$ by training the GP on the parametric model's residuals:
     $$D_X = \left\{ \langle [s_1, c_1],\; \Delta s_1 - f([s_1, c_1]) \rangle,\; \ldots \right\}$$
     • Advantages:
       – Captures aspects of the system not considered by the parametric model
       – Learns a noise model in the same way as GP-only models
       – Higher accuracy for the same amount of training data
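A sketch of how the residual training set might be assembled; `parametric_f` stands in for the learned blimp ODE model's one-step state-change prediction, whose exact interface is not given in the slides:

```python
import numpy as np

def residual_targets(states, controls, deltas, parametric_f):
    """GP targets: the part of each state change the parametric model misses."""
    return np.array([
        ds - parametric_f(np.concatenate([s, c]))  # Delta s - f([s, c])
        for s, c, ds in zip(states, controls, deltas)
    ])
```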
