

  1. CS 287 Advanced Robotics (Fall 2019) Lecture 13: Kalman Smoother, Maximum A Posteriori, Maximum Likelihood, Expectation Maximization Pieter Abbeel UC Berkeley EECS

  2. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  3. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  4. Overview
  - Filtering: estimate the current state x_t from the observations so far, z_0, …, z_t. [Figure: HMM chain x_0 → … → x_{t-1} → x_t with observations z_0, …, z_{t-1}, z_t]
  - Smoothing: estimate every state x_t from the full observation sequence z_0, …, z_T. [Figure: HMM chain x_0 → … → x_t → … → x_T with observations z_0, …, z_T]
  - Note: by now it should be clear that the "u" (control input) variables don't really change anything conceptually, so we leave them out to have fewer symbols in our equations.

  5. Filtering
  - Generally, recursively compute the filtering distribution P(x_t | z_0, …, z_t) as each new measurement arrives (recursion below).
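  The recursion on this slide was an image in the original; using the a_t defined on slide 8, the standard forward recursion it refers to is:

    a_t(x_t) = P(x_t, z_0, …, z_t)
    a_0(x_0) = P(x_0) P(z_0 | x_0)
    a_{t+1}(x_{t+1}) = P(z_{t+1} | x_{t+1}) Σ_{x_t} P(x_{t+1} | x_t) a_t(x_t)
    P(x_t | z_0, …, z_t) ∝ a_t(x_t)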

  6. Smoothing
  - Generally, recursively compute:
  - Forward: (same as the filter)
  - Backward:
  - Combine:
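  The backward and combination equations on this slide were images; using the a_t and b_t defined on slide 8, the standard forms are:

    b_t(x_t) = P(z_{t+1}, …, z_T | x_t)
    b_T(x_T) = 1
    b_t(x_t) = Σ_{x_{t+1}} P(x_{t+1} | x_t) P(z_{t+1} | x_{t+1}) b_{t+1}(x_{t+1})
    Combine: P(x_t | z_0, …, z_T) ∝ a_t(x_t) b_t(x_t)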

  7. Complete Smoother Algorithm
  - Forward pass (= filter): compute a_t(x_t) for t = 0, …, T.
  - Backward pass: compute b_t(x_t) for t = T, …, 0.
  - Combine: P(x_t | z_0, …, z_T) ∝ a_t(x_t) b_t(x_t).
  - Note 1: one forward + backward pass yields the smoothed estimate for all times t.
  - Note 2: find P(x_t | z_0, …, z_T) by renormalizing a_t(x_t) b_t(x_t) over x_t.

  8. Pairwise Posterior
  - Find P(x_t, x_{t+1} | z_0, …, z_T).
  - Recall: a_t(x_t) = P(x_t, z_0, …, z_t) and b_t(x_t) = P(z_{t+1}, …, z_T | x_t).
  - So we can readily compute:
    P(x_t, x_{t+1}, z_0, …, z_T)
      = P(x_t, z_0, …, z_t) P(x_{t+1} | x_t, z_0, …, z_t) P(z_{t+1} | x_{t+1}, x_t, z_0, …, z_t) P(z_{t+2}, …, z_T | x_{t+1}, x_t, z_0, …, z_{t+1})   (chain rule)
      = P(x_t, z_0, …, z_t) P(x_{t+1} | x_t) P(z_{t+1} | x_{t+1}) P(z_{t+2}, …, z_T | x_{t+1})   (Markov assumptions)
      = a_t(x_t) P(x_{t+1} | x_t) P(z_{t+1} | x_{t+1}) b_{t+1}(x_{t+1})   (definitions of a, b)

  9. Exercise
  - Find: [target expression was an image in the original]

  10. Kalman Smoother
  - = the smoother algorithm just covered, for the particular case in which P(x_{t+1} | x_t) and P(z_t | x_t) are linear Gaussians.
  - We already know how to compute the forward pass (= Kalman filtering).
  - Backward pass:
  - Combination:

  11. Kalman Smoother Backward Pass
  - Exercise: work out the integral for b_t.
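  The slide leaves the backward pass as an exercise. For reference, one standard closed form is the Rauch-Tung-Striebel (RTS) smoother, which propagates the smoothed mean and covariance backward; notation: μ_{t|t}, Σ_{t|t} are the filtered mean and covariance, and the model is x_{t+1} = A x_t + w_t as in the Matlab example on slide 12:

    Predict:  μ_{t+1|t} = A μ_{t|t},   Σ_{t+1|t} = A Σ_{t|t} A' + Σ_w
    Gain:     L_t = Σ_{t|t} A' Σ_{t+1|t}^{-1}
    Mean:     μ_{t|T} = μ_{t|t} + L_t (μ_{t+1|T} - μ_{t+1|t})
    Cov:      Σ_{t|T} = Σ_{t|t} + L_t (Σ_{t+1|T} - Σ_{t+1|t}) L_t'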

  12. Matlab Code: Data Generation Example
    % Linear Gaussian system: x(t+1) = A x(t) + w(t),  z(t) = C x(t) + v(t)
    T = 300;                            % horizon length (value not shown on the slide)
    A = [0.99 0.0074; -0.0136 0.99];  C = [1 1; -1 1];
    x = zeros(2,T);  z = zeros(2,T);
    x(:,1) = [-3; 2];
    Sigma_w = diag([.3 .7]);  Sigma_v = [2 .05; .05 1.5];
    w = sqrtm(Sigma_w)*randn(2,T);  v = sqrtm(Sigma_v)*randn(2,T);
    z(:,1) = C*x(:,1) + v(:,1);
    for t = 1:T-1
      x(:,t+1) = A*x(:,t) + w(:,t);
      z(:,t+1) = C*x(:,t+1) + v(:,t+1);
    end
    % now recover the state from the measurements
    P_0 = diag([100 100]);  x0 = [0; 0];   % prior on the initial state
    % run Kalman filter and smoother here (a sketch follows below)
    % + plot
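  The slide only leaves a placeholder for the filter/smoother step. Below is a minimal sketch of that step, assuming the variables from the data-generation script above and the RTS form given after slide 11; the intermediate variable names (mu_f, P_f, mu_p, P_p, mu_s, P_s) are mine, not from the lecture.

    % Kalman filter (forward pass) followed by RTS smoother (backward pass).
    mu_f = zeros(2,T);  P_f = zeros(2,2,T);   % filtered means/covariances
    mu_p = zeros(2,T);  P_p = zeros(2,2,T);   % one-step predictions
    mu_pred = x0;  P_pred = P_0;
    for t = 1:T
      mu_p(:,t) = mu_pred;  P_p(:,:,t) = P_pred;
      % measurement update
      K = P_pred*C' / (C*P_pred*C' + Sigma_v);
      mu_f(:,t)  = mu_pred + K*(z(:,t) - C*mu_pred);
      P_f(:,:,t) = (eye(2) - K*C)*P_pred;
      % time update (prediction for t+1)
      mu_pred = A*mu_f(:,t);
      P_pred  = A*P_f(:,:,t)*A' + Sigma_w;
    end
    % backward (RTS) pass
    mu_s = mu_f;  P_s = P_f;
    for t = T-1:-1:1
      L = P_f(:,:,t)*A' / P_p(:,:,t+1);
      mu_s(:,t)  = mu_f(:,t)  + L*(mu_s(:,t+1)  - mu_p(:,t+1));
      P_s(:,:,t) = P_f(:,:,t) + L*(P_s(:,:,t+1) - P_p(:,:,t+1))*L';
    end
    plot(1:T, x(1,:), 'k', 1:T, mu_f(1,:), 'b', 1:T, mu_s(1,:), 'r');
    legend('true x_1', 'filtered', 'smoothed');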

  13. Kalman Filter/Smoother Example

  14. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  15. Overview
  - Filtering: estimate x_t from z_0, …, z_t.
  - Smoothing: estimate each x_t from z_0, …, z_T.
  - MAP: find the single most likely state sequence x_0, …, x_T given z_0, …, z_T.
  [Figure: the same HMM chain drawn for each of the three problems]

  16. MAP Sequence
  - Goal: the most likely state sequence, argmax over x_0, …, x_T of P(x_0, …, x_T | z_0, …, z_T).
  - Naively solving by enumerating all possible combinations of x_0, …, x_T is exponential in T.
  - Generally, it can instead be computed recursively (the recursion is spelled out after slide 17).

  17. MAP --- Complete Algorithm
  - Runs in O(T n^2) time for n discrete states and horizon T.
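  The recursion itself was an image on slides 16-17. The standard max-product (Viterbi) recursion that achieves the O(T n^2) bound, in the same notation as the smoother, is:

    m_0(x_0) = P(x_0) P(z_0 | x_0)
    m_{t+1}(x_{t+1}) = P(z_{t+1} | x_{t+1}) max_{x_t} P(x_{t+1} | x_t) m_t(x_t)
    (record the maximizing x_t at each step, then backtrack from argmax_{x_T} m_T(x_T))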

  18. Kalman Filter (aka Linear Gaussian) Setting
  - Summations → integrals. But: we can't enumerate over all instantiations.
  - However, we can still find the solution efficiently:
  - The joint conditional P(x_{0:T} | z_{0:T}) is a multivariate Gaussian.
  - For a multivariate Gaussian, the most likely instantiation equals the mean.
  - → we just need to find the mean of P(x_{0:T} | z_{0:T}).
  - The marginal conditionals P(x_t | z_{0:T}) are Gaussians with mean equal to the mean of x_t under the joint conditional, so it suffices to find all marginal conditionals.
  - We already know how to do so: the marginal conditionals can be computed by running the Kalman smoother.
  - Alternatively: solve a convex optimization problem.

  19. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  20. Thumbtack
  - Let θ = P(up), 1-θ = P(down).
  - How to determine θ?
  - Empirical estimate: 8 up, 2 down → θ = 8/10 = 0.8.

  21. http://web.me.com/todd6ton/Site/Classroom_Blog/Entries/2009/10/7_A_Thumbtack_Experiment.html

  22. Maximum Likelihood
  - θ = P(up), 1-θ = P(down).
  - Observe: a sequence with 8 up, 2 down.
  - The likelihood of the observation sequence depends on θ.
  - Maximum likelihood finds the θ that maximizes this likelihood.
  - → extrema at θ = 0, θ = 1, θ = 0.8.
  - → inspection of each extremum yields θ_ML = 0.8.
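  The algebra was an image in the original; filling it in for the 8-up, 2-down data:

    L(θ) = θ^8 (1-θ)^2
    dL/dθ = θ^7 (1-θ) (8(1-θ) - 2θ) = θ^7 (1-θ) (8 - 10θ)
    Setting dL/dθ = 0 gives θ ∈ {0, 1, 0.8};  L(0) = L(1) = 0, so θ_ML = 0.8.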

  23. Maximum Likelihood
  - More generally, consider a binary-valued random variable with θ = P(1), 1-θ = P(0); assume we observe n_1 ones and n_0 zeros.
  - Likelihood: L(θ) = θ^{n_1} (1-θ)^{n_0}.
  - Derivative: dL/dθ = θ^{n_1 - 1} (1-θ)^{n_0 - 1} (n_1 (1-θ) - n_0 θ).
  - Hence the extrema are θ ∈ {0, 1, n_1/(n_0+n_1)}, and θ = n_1/(n_0+n_1) is the maximum.
  - = empirical counts: the fraction of ones observed.

  24. Log-likelihood
  - The function log(x) is a monotonically increasing function of x.
  - Hence, for any positive-valued function f: argmax_θ f(θ) = argmax_θ log f(θ).
  - It is often more convenient to optimize the log-likelihood rather than the likelihood.
  - Example: log(θ^{n_1} (1-θ)^{n_0}) = n_1 log θ + n_0 log(1-θ).

  25. Log-likelihood ↔ Likelihood
  - Reconsider the thumbtacks: 8 up, 2 down.
  - Likelihood: θ^8 (1-θ)^2, which is not concave.
  - Log-likelihood: 8 log θ + 2 log(1-θ), which is concave.
  - Definition: a function f is concave if and only if f(λ x_1 + (1-λ) x_2) ≥ λ f(x_1) + (1-λ) f(x_2) for all x_1, x_2 and λ ∈ [0,1].
  - Concave functions are generally easier to maximize than non-concave functions.

  26. Concavity and Convexity
  - f is convex if and only if f(λ x_1 + (1-λ) x_2) ≤ λ f(x_1) + (1-λ) f(x_2) for all x_1, x_2 and λ ∈ [0,1]; convex functions are "easy" to minimize.
  - f is concave if and only if f(λ x_1 + (1-λ) x_2) ≥ λ f(x_1) + (1-λ) f(x_2); concave functions are "easy" to maximize.
  [Figure: a convex and a concave curve, each with the chord between (x_1, f(x_1)) and (x_2, f(x_2))]

  27. ML for Multinomial
  - Consider having received samples (derivation below).
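  The derivation was an image; the standard result it leads to, with n_k the number of samples taking value k out of K possible outcomes, is:

    L(θ) = Π_k θ_k^{n_k},  subject to Σ_k θ_k = 1
    ℓ(θ) = Σ_k n_k log θ_k
    θ_k,ML = n_k / Σ_j n_j   (via a Lagrange multiplier for the constraint)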

  28. ML for Fully Observed HMM
  - Given samples of complete trajectories (both states and observations).
  - Dynamics model: P(x_{t+1} | x_t).
  - Observation model: P(z_t | x_t).
  - → the likelihood decouples into independent ML (counting) problems, one for each conditional distribution P(x_{t+1} | x_t = i) and each P(z_t | x_t = i).

  29. ML for Exponential Distribution
  (Figure source: Wikipedia)
  - Consider having received samples: 3.1, 8.2, 1.7.
  [Figure: log-likelihood as a function of λ]

  30. ML for Exponential Distribution
  (Figure source: Wikipedia)
  - Consider having received samples (general case; derivation below).
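  The derivation was an image; filling it in for a general sample z^(1), …, z^(m), and then for the numbers on slide 29:

    p(z; λ) = λ e^{-λz}   (z ≥ 0)
    ℓ(λ) = Σ_i log p(z^(i); λ) = m log λ - λ Σ_i z^(i)
    dℓ/dλ = m/λ - Σ_i z^(i) = 0   →   λ_ML = m / Σ_i z^(i) = 1 / (sample mean)
    For 3.1, 8.2, 1.7:  λ_ML = 3 / 13.0 ≈ 0.23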

  31. Uniform
  - Consider having received samples (derivation below).
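  The slide's derivation was an image. Assuming the model is Uniform[a, b] (my assumption; the slide may instead fix one endpoint), the ML estimate is forced to the extremes of the data:

    p(z; a, b) = 1/(b-a)  for z ∈ [a, b], else 0
    L(a, b) = (b-a)^{-m}  if all m samples lie in [a, b], else 0
    L grows as the interval shrinks → a_ML = min_i z^(i),  b_ML = max_i z^(i)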

  32. ML for Gaussian
  - Consider having received samples (derivation below).
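  Filling in the standard derivation the slide shows as images, for samples z^(1), …, z^(m):

    ℓ(µ, σ^2) = -(m/2) log(2π σ^2) - (1/(2σ^2)) Σ_i (z^(i) - µ)^2
    ∂ℓ/∂µ = 0   →   µ_ML = (1/m) Σ_i z^(i)
    ∂ℓ/∂σ^2 = 0 →   σ^2_ML = (1/m) Σ_i (z^(i) - µ_ML)^2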

  33. ML for Conditional Gaussian
  - Equivalently:
  - More generally:
  (the equations were images in the original; a sketch of the setup follows below)
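  A sketch of the standard setup these slides refer to, in my own notation (the slide's equations were not preserved), for a scalar output y conditioned on an input vector x:

    y = θ' x + v,  v ~ N(0, σ^2)   i.e.   P(y | x; θ) = N(y; θ' x, σ^2)
    ℓ(θ) = Σ_i log P(y^(i) | x^(i); θ) = const - (1/(2σ^2)) Σ_i (y^(i) - θ' x^(i))^2
    → maximizing the likelihood = minimizing squared error (least squares):
    θ_ML = (Σ_i x^(i) x^(i)')^{-1} (Σ_i x^(i) y^(i))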

  34. ML for Conditional Gaussian

  35. ML for Conditional Multivariate Gaussian

  36. Aside: Key Identities for Derivation on Previous Slide

  37. ML Estimation in Fully Observed Linear Gaussian Bayes Filter Setting
  - Consider the linear Gaussian setting: x_{t+1} = A x_t + w_t,  z_t = C x_t + v_t.
  - Fully observed, i.e., both the states x_0, …, x_T and the observations z_0, …, z_T are given.
  - → Two separate ML estimation problems for conditional multivariate Gaussians:
  - 1: estimate A and Σ_w from the pairs (x_t, x_{t+1}).
  - 2: estimate C and Σ_v from the pairs (x_t, z_t).
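  One standard form of the resulting estimates, following the conditional multivariate Gaussian ML result (the slide's own images were not preserved):

    A_ML = (Σ_{t=0}^{T-1} x_{t+1} x_t') (Σ_{t=0}^{T-1} x_t x_t')^{-1}
    Σ_w,ML = (1/T) Σ_{t=0}^{T-1} (x_{t+1} - A_ML x_t)(x_{t+1} - A_ML x_t)'
    C_ML = (Σ_{t=0}^{T} z_t x_t') (Σ_{t=0}^{T} x_t x_t')^{-1}
    Σ_v,ML = (1/(T+1)) Σ_{t=0}^{T} (z_t - C_ML x_t)(z_t - C_ML x_t)'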

  38. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  39. Priors --- Thumbtack
  - Let θ = P(up), 1-θ = P(down).
  - How to determine θ?
  - ML estimate: 5 up, 0 down → θ_ML = 5/5 = 1.
  - Laplace estimate: add a fake count of 1 for each outcome → θ = (5+1)/(5+0+2) = 6/7.

  40. Priors --- Thumbtack
  - Alternatively, consider θ to be a random variable.
  - Prior: P(θ) = C θ(1-θ).
  - Measurements: P(x | θ).
  - Posterior: P(θ | x) ∝ P(x | θ) P(θ).
  - Maximum a posteriori (MAP) estimation: find the θ that maximizes the posterior.
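  Filling in the computation the slide shows as an image, for n_1 ups and n_0 downs:

    P(θ | x) ∝ θ^{n_1} (1-θ)^{n_0} · C θ(1-θ) ∝ θ^{n_1+1} (1-θ)^{n_0+1}
    θ_MAP = (n_1 + 1) / (n_1 + n_0 + 2)
    e.g. 5 up, 0 down: θ_MAP = 6/7, the same as the Laplace estimate, which is why the prior C θ(1-θ) corresponds to adding one fake count per outcome.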

  41. Priors --- Beta Distribution
  (Figure source: Wikipedia)

  42. Priors --- Dirichlet Distribution
  - Generalizes the Beta distribution to K outcomes.
  - The MAP estimate corresponds to adding fake counts n_1, …, n_K.

  43. MAP for Mean of Univariate Gaussian
  - Assume the variance is known. (Can be extended to also find the MAP for the variance.)
  - Prior:
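  The prior and posterior were images; the standard setup is a Gaussian prior on the mean (notation mine):

    Prior: µ ~ N(µ_0, σ_0^2);  data: z^(i) ~ N(µ, σ^2), i = 1, …, m
    log posterior = const - (µ - µ_0)^2/(2σ_0^2) - Σ_i (z^(i) - µ)^2/(2σ^2)
    µ_MAP = (µ_0/σ_0^2 + Σ_i z^(i)/σ^2) / (1/σ_0^2 + m/σ^2)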

  44. MAP for Univariate Conditional Linear Gaussian
  - Assume the variance is known. (Can be extended to also find the MAP for the variance.)
  - Prior: [Interpret!]

  45. MAP for Univariate Conditional Linear Gaussian: Example
  [Figure: true model, samples, and the ML and MAP fits]

  46. Cross Validation
  - The choice of prior will heavily influence the quality of the result.
  - Fine-tune the choice of prior through cross-validation:
  1. Split the data into a "training" set and a "validation" set.
  2. For a range of priors:
     - Train: compute θ_MAP on the training set.
     - Cross-validate: evaluate performance on the validation set by evaluating the likelihood of the validation data under the θ_MAP just found.
  3. Choose the prior with the highest validation score.
     - For this prior, compute θ_MAP on the (training + validation) set.
  - Typical training/validation splits:
     - 1-fold: 70/30, random split.
     - 10-fold: partition into 10 sets; average performance over the 10 runs in which each set in turn is the validation set and the other 9 are the training set.

  47. Outline
  - Kalman smoothing
  - Maximum a posteriori sequence
  - Maximum likelihood
  - Maximum a posteriori parameters
  - Expectation maximization

  48. Mixture of Gaussians
  - Generally: (model equations were images; see below)
  - Example:
  - ML objective: given data z^(1), …, z^(m), maximize the log-likelihood of the data.
  - Setting the derivatives w.r.t. θ, µ, Σ equal to zero does not allow us to solve for their ML estimates in closed form.
  - But we can evaluate the objective → we can, in principle, perform local optimization. In this lecture: the "EM" algorithm, which is typically used to efficiently optimize this objective (locally).
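  The model itself was an image; the standard K-component mixture-of-Gaussians model and ML objective it refers to are:

    P(x = k) = θ_k,   P(z | x = k) = N(z; µ_k, Σ_k)
    P(z) = Σ_k θ_k N(z; µ_k, Σ_k)
    max over θ, µ, Σ of  Σ_i log ( Σ_k θ_k N(z^(i); µ_k, Σ_k) )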

  49. Expectation Maximization (EM)
  - Example model: each z is generated from one of two Gaussians, selected by a hidden variable x.
  - Goal: given data z^(1), …, z^(m) (but no x^(i) observed), find maximum likelihood estimates of µ_1, µ_2.
  - EM basic idea: if the x^(i) were known → two easy-to-solve separate ML problems.
  - EM iterates over:
  - E-step: for i = 1, …, m, fill in the missing data x^(i) according to what is most likely given the current model θ.
  - M-step: run ML for the completed data, which gives a new model θ.
  (A sketch of this loop in Matlab follows.)
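  A minimal sketch of this loop for the two-Gaussian example, assuming unit variances and equal mixing weights so that only µ_1, µ_2 are learned (those simplifications are mine). The hard-assignment E-step below matches the "fill in the most likely x^(i)" description on the slide; replacing it with soft responsibilities gives standard EM.

    % EM-style alternation for a two-Gaussian mixture with unknown means.
    % Unit variances and equal weights assumed (simplification, not from the slide).
    z = [randn(1,100)-2, randn(1,100)+3];   % synthetic data from two clusters
    mu = [min(z), max(z)];                  % crude initialization
    for iter = 1:50
      % E-step: assign each point to the most likely component
      % (with unit variances and equal weights, that is the nearest mean)
      [~, x] = min(abs([z - mu(1); z - mu(2)]), [], 1);
      % M-step: ML estimate of each mean from the completed data
      for k = 1:2
        if any(x == k), mu(k) = mean(z(x == k)); end
      end
    end
    disp(mu)   % should be close to [-2, 3]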

  50. EM Derivation
  - EM solves a maximum likelihood problem of the form: max over θ of Σ_i log P(z^(i); θ), where the log involves a sum over the unobserved x.
  - θ: parameters of the probabilistic model we try to find; x: unobserved variables; z: observed variables.
  - Key tool: Jensen's inequality.
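  The derivation steps were images; the standard lower bound via Jensen's inequality that the slide builds on is, for any distribution Q(x):

    log P(z; θ) = log Σ_x Q(x) · P(x, z; θ)/Q(x)
                ≥ Σ_x Q(x) log ( P(x, z; θ)/Q(x) )     (Jensen's inequality: log is concave)
    E-step: set Q(x) = P(x | z; θ_old), which makes the bound tight at θ_old.
    M-step: maximize the bound over θ.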
