CS 287 Advanced Robotics (Fall 2019)
Lecture 13: Kalman Smoother, Maximum A Posteriori, Maximum Likelihood, Expectation Maximization
Pieter Abbeel, UC Berkeley EECS


SLIDE 1

CS 287 Advanced Robotics (Fall 2019)
Lecture 13: Kalman Smoother, Maximum A Posteriori, Maximum Likelihood, Expectation Maximization

Pieter Abbeel, UC Berkeley EECS

SLIDE 2

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 3

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 4

Overview
• Filtering: compute P(xt | z0, …, zt)
• Smoothing: compute P(xt | z0, …, zT)
• Note: by now it should be clear that the "u" (control) variables don't really change anything conceptually, and we are going to leave them out so that fewer symbols appear in our equations.

[Figure: HMM graphical models X0 → … → Xt-1 → Xt → Xt+1 → … → XT with observations z0, …, zT; filtering conditions on z0, …, zt, smoothing on z0, …, zT.]

SLIDE 5

SLIDE 6

Filtering
• Generally, recursively compute:
  at(xt) = P(xt, z0, …, zt)
  a0(x0) = P(x0) P(z0 | x0)
  at+1(xt+1) = P(zt+1 | xt+1) Σxt P(xt+1 | xt) at(xt)

SLIDE 7

Smoothing
• Generally, recursively compute:
• Forward (same as filter): at(xt) = P(xt, z0, …, zt)
• Backward: bt(xt) = P(zt+1, …, zT | xt), with bT(xT) = 1 and
  bt(xt) = Σxt+1 P(xt+1 | xt) P(zt+1 | xt+1) bt+1(xt+1)
• Combine: P(xt | z0, …, zT) ∝ at(xt) bt(xt)

SLIDE 8

Complete Smoother Algorithm
• Forward pass (= filter): compute at(xt) for t = 0, …, T
• Backward pass: compute bt(xt) for t = T, …, 0
• Combine: at(xt) bt(xt) = P(xt, z0, …, zT)

Note 1: one forward+backward pass yields the result for all times t. Note 2: find P(xt | z0, …, zT) by renormalizing at(xt) bt(xt) over xt.
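For concreteness, a minimal discrete-state sketch of the three steps in Matlab, under assumed (hypothetical) variable names: an n×n transition matrix P_trans with P_trans(i,j) = P(xt+1 = j | xt = i), observation likelihoods lik(i,t) = P(zt | xt = i), and a prior p0 over the initial state (time indices start at 1, Matlab-style):

  a = zeros(n,T);  b = zeros(n,T);
  a(:,1) = p0 .* lik(:,1);                      % a_1(x) = P(x) P(z_1 | x)
  for t = 1:T-1                                 % forward pass (= filter)
      a(:,t+1) = lik(:,t+1) .* (P_trans' * a(:,t));
  end
  b(:,T) = ones(n,1);                           % b_T(x) = 1
  for t = T-1:-1:1                              % backward pass
      b(:,t) = P_trans * (lik(:,t+1) .* b(:,t+1));
  end
  post = a .* b;                                % combine, then renormalize
  post = post ./ sum(post, 1);                  % post(:,t) = P(x_t | z_1, ..., z_T)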

SLIDE 9

Pairwise Posterior
• Find P(xt, xt+1 | z0, …, zT)
• Recall:
  at(xt) = P(xt, z0, …, zt)
  bt(xt) = P(zt+1, …, zT | xt)
• So we can readily compute:
  P(xt, xt+1, z0, …, zT)
    = P(xt, z0, …, zt) P(xt+1 | xt, z0, …, zt) P(zt+1 | xt+1, xt, z0, …, zt) P(zt+2, …, zT | xt+1, xt, z0, …, zt+1)   (chain rule)
    = P(xt, z0, …, zt) P(xt+1 | xt) P(zt+1 | xt+1) P(zt+2, …, zT | xt+1)   (Markov assumptions)
    = at(xt) P(xt+1 | xt) P(zt+1 | xt+1) bt+1(xt+1)   (definitions of a, b)
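Continuing the discrete-state sketch from the smoother slide (same hypothetical names), the pairwise posterior for a given t is a single outer product:

  pair = (a(:,t) * (lik(:,t+1) .* b(:,t+1))') .* P_trans;  % entry (i,j) ∝ a_t(i) P(x_{t+1}=j | x_t=i) P(z_{t+1} | j) b_{t+1}(j)
  pair = pair / sum(pair(:));                              % renormalize: P(x_t = i, x_{t+1} = j | z_1, ..., z_T)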

SLIDE 10

Exercise
• Find

SLIDE 11

Kalman Smoother
• = the smoother algorithm just covered, for the particular case when P(xt+1 | xt) and P(zt | xt) are linear Gaussians
• We already know how to compute the forward pass (= Kalman filtering)
• Backward pass: the same recursion for bt, with the summation replaced by an integral
• Combination: a product of Gaussians in xt

SLIDE 12

Kalman Smoother Backward Pass
• Exercise: work out the integral for bt

SLIDE 13

Matlab Code: Data Generation Example

  A = [0.99 0.0074; -0.0136 0.99];  C = [1 1; -1 1];     % dynamics and observation matrices
  T = 100;                                               % horizon (value not shown on the slide)
  x(:,1) = [-3; 2];                                      % initial state
  Sigma_w = diag([.3 .7]);  Sigma_v = [2 .05; .05 1.5];  % process / measurement noise covariances
  w = sqrtm(Sigma_w) * randn(2,T);                       % process noise samples
  v = sqrtm(Sigma_v) * randn(2,T);                       % measurement noise samples
  for t = 1:T-1
      x(:,t+1) = A*x(:,t) + w(:,t);
      z(:,t)   = C*x(:,t) + v(:,t);
  end
  % now recover the state from the measurements
  P_0 = diag([100 100]);  x0 = [0; 0];                   % prior on x0: mean x0, covariance P_0
  % run Kalman filter and smoother here (a sketch follows below)
  % + plot
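A minimal sketch of the missing filter/smoother step, under the model above. It uses the Rauch-Tung-Striebel form of the backward pass, which returns the same smoothed marginals as the forward-backward combination covered earlier; since z was only generated for t = 1, …, T-1, the final time step gets no measurement update:

  xf = zeros(2,T);  Pf = zeros(2,2,T);          % filtered means and covariances
  xp = x0;  Pp = P_0;                           % prediction for t = 1
  for t = 1:T-1
      K = Pp*C' / (C*Pp*C' + Sigma_v);          % Kalman gain
      xf(:,t)   = xp + K*(z(:,t) - C*xp);       % measurement update
      Pf(:,:,t) = (eye(2) - K*C)*Pp;
      xp = A*xf(:,t);                           % time update
      Pp = A*Pf(:,:,t)*A' + Sigma_w;
  end
  xf(:,T) = xp;  Pf(:,:,T) = Pp;                % no measurement at t = T
  xs = xf;  Ps = Pf;                            % backward (smoothing) pass
  for t = T-1:-1:1
      Pp = A*Pf(:,:,t)*A' + Sigma_w;            % predicted covariance
      L = Pf(:,:,t)*A' / Pp;                    % smoother gain
      xs(:,t)   = xf(:,t) + L*(xs(:,t+1) - A*xf(:,t));
      Ps(:,:,t) = Pf(:,:,t) + L*(Ps(:,:,t+1) - Pp)*L';
  end
  plot(1:T, x(1,:), 1:T, xf(1,:), 1:T, xs(1,:));
  legend('true', 'filtered', 'smoothed');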

SLIDE 14

Kalman Filter/Smoother Example

SLIDE 15

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 16

Overview
• Filtering: P(xt | z0, …, zt)
• Smoothing: P(xt | z0, …, zT)
• MAP: argmax over x0, …, xT of P(x0, …, xT | z0, …, zT)

[Figure: the same HMM graphical models as before, one per inference problem.]

SLIDE 17

MAP Sequence
• Generally: find the sequence x0, …, xT maximizing P(x0, …, xT | z0, …, zT)
• Naively solving by enumerating all possible combinations of x0, …, xT is exponential in T

SLIDE 18

MAP --- Complete Algorithm
• Recursively compute mt(xt) = max over x0, …, xt-1 of P(x0, …, xt, z0, …, zt):
  m0(x0) = P(x0) P(z0 | x0)
  mt+1(xt+1) = P(zt+1 | xt+1) max over xt of P(xt+1 | xt) mt(xt)
• Backtrack the maximizing choices to recover the MAP sequence
• O(T n²) for n possible values of each xt
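A minimal Matlab sketch of this algorithm (the Viterbi recursion), reusing the hypothetical discrete-state variables from the smoother sketch (P_trans, lik, p0); in practice one would work in log space to avoid underflow:

  m = zeros(n,T);  ptr = zeros(n,T);
  m(:,1) = p0 .* lik(:,1);
  for t = 2:T
      [best, ptr(:,t)] = max(P_trans .* m(:,t-1), [], 1);  % max over x_{t-1}, with argmax pointers
      m(:,t) = lik(:,t) .* best';
  end
  [~, xmap(T)] = max(m(:,T));                              % backtrack to recover the MAP sequence
  for t = T-1:-1:1
      xmap(t) = ptr(xmap(t+1), t+1);
  end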

SLIDE 19

Kalman Filter (aka Linear Gaussian) Setting
• Summations → integrals
• But: can't enumerate over all instantiations
• However, we can still find the solution efficiently:
  • the joint conditional P(x0:T | z0:T) is a multivariate Gaussian
  • for a multivariate Gaussian the most likely instantiation equals the mean → we just need to find the mean of P(x0:T | z0:T)
  • the marginal conditionals P(xt | z0:T) are Gaussians with mean equal to the mean of xt under the joint conditional, so it suffices to find all marginal conditionals
  • we already know how to do so: marginal conditionals can be computed by running the Kalman smoother
• Alternatively: solve a convex optimization problem

SLIDE 20

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 21

Thumbtack
• Let θ = P(up), 1-θ = P(down)
• How to determine θ?
• Empirical estimate: 8 up, 2 down → θ = 8/10 = 0.8

SLIDE 22

http://web.me.com/todd6ton/Site/Classroom_Blog/Entries/2009/10/7_A_Thumbtack_Experiment.html

SLIDE 23

SLIDE 24

Maximum Likelihood
• θ = P(up), 1-θ = P(down)
• Observe: 8 up, 2 down
• Likelihood of the observation sequence depends on θ: L(θ) = θ^8 (1-θ)^2
• Maximum likelihood finds θML = argmax over θ of L(θ)
  → extrema at θ = 0, θ = 1, θ = 0.8
  → inspection of each extremum yields θML = 0.8
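A quick numeric sanity check in Matlab, maximizing the likelihood by brute force over a grid:

  th = 0:0.001:1;
  Lik = th.^8 .* (1-th).^2;   % likelihood of 8 up, 2 down
  [~, i] = max(Lik);
  th(i)                       % returns 0.800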

SLIDE 25

Maximum Likelihood
• More generally, consider a binary-valued random variable with θ = P(1), 1-θ = P(0); assume we observe n1 ones and n0 zeros
• Likelihood: L(θ) = θ^n1 (1-θ)^n0
• Derivative of the log-likelihood: d/dθ [n1 log θ + n0 log(1-θ)] = n1/θ - n0/(1-θ)
• Hence we have for the extrema: θ = n1/(n0+n1)
• n1/(n0+n1) is the maximum
• → the ML estimate = empirical counts (the observed fraction of ones)

SLIDE 26

Log-likelihood
• The function log(x) is a monotonically increasing function of x
• Hence for any (positive-valued) function f: argmax over θ of f(θ) = argmax over θ of log f(θ)
• Often more convenient to optimize the log-likelihood rather than the likelihood
• Example: log L(θ) = n1 log θ + n0 log(1-θ)

SLIDE 27

Log-likelihood ↔ Likelihood
• Reconsider thumbtacks: 8 up, 2 down
• Likelihood: L(θ) = θ^8 (1-θ)^2 (not concave)
• Log-likelihood: log L(θ) = 8 log θ + 2 log(1-θ) (concave)
• Definition: A function f is concave if and only if f(λx1 + (1-λ)x2) ≥ λ f(x1) + (1-λ) f(x2) for all x1, x2 and all λ ∈ [0,1]
• Concave functions are generally easier to maximize than non-concave functions

SLIDE 28

Concavity and Convexity
• f is concave if and only if f(λx1 + (1-λ)x2) ≥ λ f(x1) + (1-λ) f(x2): "easy" to maximize
• f is convex if and only if f(λx1 + (1-λ)x2) ≤ λ f(x1) + (1-λ) f(x2): "easy" to minimize

[Figure: for a concave f the chord between (x1, f(x1)) and (x2, f(x2)) lies below the graph; for a convex f it lies above.]

SLIDE 29

ML for Multinomial
• Consider having received samples x(1), …, x(m), each taking one of K values; let nk = number of samples with value k
• ML estimate: θk = nk / m (again the empirical counts)

SLIDE 30

ML for Fully Observed HMM
• Given samples: complete state and observation sequences (x0, …, xT; z0, …, zT)
• Dynamics model: P(xt+1 | xt)
• Observation model: P(zt | xt)
• → Independent ML problems for each conditional P(xt+1 | xt = i) and each P(zt | xt = i)

SLIDE 31

ML for Exponential Distribution
• Consider having received samples 3.1, 8.2, 1.7
• p(x; λ) = λ e^(-λx), so the log-likelihood is ll(λ) = m log λ - λ Σi x(i)
• Setting the derivative to zero: m/λ - Σi x(i) = 0 → λML = m / Σi x(i) = 3/13 ≈ 0.23

Source: wikipedia
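The same answer as a one-line check in Matlab:

  lambda_ml = 1 / mean([3.1 8.2 1.7])   % = 3/13, approximately 0.2308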

SLIDE 32

ML for Exponential Distribution
• Consider having received samples x(1), …, x(m)
• In general: λML = m / Σi x(i), derived as on the previous slide

Source: wikipedia

SLIDE 33

Uniform
• Consider having received samples x(1), …, x(m) from Uniform([0, θ])
• The likelihood is (1/θ)^m if θ ≥ maxi x(i) (and 0 otherwise), which is decreasing in θ → θML = maxi x(i)

SLIDE 34

ML for Gaussian
• Consider having received samples x(1), …, x(m)
• μML = (1/m) Σi x(i);  σ²ML = (1/m) Σi (x(i) - μML)²
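In Matlab (with a hypothetical sample vector xs); note that the ML variance uses 1/m, whereas Matlab's var defaults to the unbiased 1/(m-1):

  xs = [3.1 8.2 1.7];                  % any samples
  mu_ml = mean(xs);
  sigma2_ml = mean((xs - mu_ml).^2);   % equals var(xs, 1), the 1/m version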

SLIDE 35

ML for Conditional Gaussian
• Model: y ~ N(θx, σ²); equivalently: y = θx + w with w ~ N(0, σ²)
• More generally: the mean is a linear function of the input → the ML estimate of θ is the least-squares fit to the samples

SLIDE 36

ML for Conditional Gaussian

SLIDE 37

ML for Conditional Multivariate Gaussian

SLIDE 38

Aside: Key Identities for Derivation on Previous Slide

SLIDE 39

ML Estimation in Fully Observed Linear Gaussian Bayes Filter Setting
• Consider the Linear Gaussian setting: xt+1 = A xt + wt, wt ~ N(0, Σw); zt = C xt + vt, vt ~ N(0, Σv)
• Fully observed, i.e., given x0, …, xT and z0, …, zT
• → Two separate ML estimation problems for conditional multivariate Gaussians:
  • 1: estimate A and Σw from the pairs (xt, xt+1)
  • 2: estimate C and Σv from the pairs (xt, zt)
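A minimal sketch of the two estimation problems in Matlab, assuming the fully observed x and z from the data generation example (z exists for t = 1, …, T-1 there); each is an ordinary least-squares fit plus an empirical residual covariance:

  X1 = x(:,1:T-1);  X2 = x(:,2:T);
  A_ml = X2*X1' / (X1*X1');                             % argmin_A of sum ||x_{t+1} - A x_t||^2
  Sigma_w_ml = (X2 - A_ml*X1)*(X2 - A_ml*X1)' / (T-1);  % residual covariance
  Z = z(:,1:T-1);  X = x(:,1:T-1);
  C_ml = Z*X' / (X*X');
  Sigma_v_ml = (Z - C_ml*X)*(Z - C_ml*X)' / (T-1);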

SLIDE 40

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 41

Priors --- Thumbtack
• Let θ = P(up), 1-θ = P(down)
• How to determine θ?
• ML estimate: 5 up, 0 down → θML = 1
• Laplace estimate: add a fake count of 1 for each outcome → θ = (5+1)/(5+1+0+1) = 6/7

SLIDE 42

Priors --- Thumbtack
• Alternatively, consider θ to be a random variable
• Prior: P(θ) = C θ(1-θ)
• Measurements: P(x | θ)
• Posterior: P(θ | x) ∝ P(θ) P(x | θ)
• Maximum A Posteriori (MAP) estimation = find the θ that maximizes the posterior
  → for 5 up, 0 down: posterior ∝ θ(1-θ) θ^5 = θ^6 (1-θ), maximized at θMAP = 6/7

SLIDE 43

Priors --- Beta Distribution

Figure source: Wikipedia

SLIDE 44

Priors --- Dirichlet Distribution
• Generalizes the Beta distribution
• MAP estimate corresponds to adding fake counts n1, …, nK

SLIDE 45

MAP for Mean of Univariate Gaussian
• Assume variance known. (Can be extended to also find MAP for variance.)
• Prior: μ ~ N(μ0, σ0²)

SLIDE 46

MAP for Univariate Conditional Linear Gaussian
• Assume variance known. (Can be extended to also find MAP for variance.)
• Prior: Gaussian prior on the coefficient θ

[Interpret!]

SLIDE 47

MAP for Univariate Conditional Linear Gaussian: Example

[Figure legend: TRUE (solid), samples (dots), ML fit, MAP fit.]

SLIDE 48

Cross Validation
• Choice of prior will heavily influence quality of result
• Fine-tune choice of prior through cross-validation:
  1. Split data into a "training" set and a "validation" set
  2. For a range of priors:
     • Train: compute θMAP on the training set
     • Cross-validate: evaluate performance on the validation set by evaluating the likelihood of the validation data under the θMAP just found
  3. Choose the prior with the highest validation score
     • For this prior, compute θMAP on the (training+validation) set
• Typical training/validation splits:
  • 1-fold: 70/30, random split
  • 10-fold: partition into 10 sets; average performance with each set being the validation set and the other 9 being the training set

SLIDE 49

Outline
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization

SLIDE 50

Mixture of Gaussians
• Generally: p(z) = Σk θk N(z; μk, Σk)
• Example: mixture of two Gaussians
• ML objective: given data z(1), …, z(m), maximize Σi log Σk θk N(z(i); μk, Σk)
• Setting derivatives w.r.t. θ, μ, Σ equal to zero does not allow solving for their ML estimates in closed form
• We can evaluate the objective → we can in principle perform local optimization. In this lecture: the "EM" algorithm, which is typically used to efficiently optimize the objective (locally)

SLIDE 51

Expectation Maximization (EM)
• Example: mixture of two Gaussians
• Model: a hidden component x(i) ∈ {1, 2} selects which Gaussian generates z(i)
• Goal:
  • Given data z(1), …, z(m) (but no x(i) observed)
  • Find maximum likelihood estimates of μ1, μ2
• EM basic idea: if the x(i) were known → two easy-to-solve separate ML problems
• EM iterates over:
  • E-step: For i = 1, …, m, fill in the missing data x(i) according to what is most likely given the current model θ
  • M-step: run ML for the completed data, which gives a new model θ

SLIDE 52

SLIDE 53

EM Derivation
• EM solves a Maximum Likelihood problem of the form:
  maxθ log P(z; θ) = maxθ log Σx P(x, z; θ)
  θ: parameters of the probabilistic model we try to find
  x: unobserved variables
  z: observed variables

Jensen's Inequality

SLIDE 54

Jensen's inequality
• For a concave function f: f(E[X]) ≥ E[f(X)]

[Figure illustration: P(X = x1) = 1-λ, P(X = x2) = λ, so E[X] = (1-λ)x1 + λx2; the graph of f at E[X] lies above the chord value E[f(X)].]

SLIDE 55

EM Derivation (ctd)

EM Algorithm: Iterate
  1. E-step: Compute q(x) = P(x | z; θ)
  2. M-step: Compute θ ← argmaxθ' Σx q(x) log [ P(x, z; θ') / q(x) ]

Jensen's inequality: log Σx q(x) [P(x, z; θ)/q(x)] ≥ Σx q(x) log [P(x, z; θ)/q(x)]; equality holds when P(x, z; θ)/q(x) is a constant, which is achieved for q(x) = P(x | z; θ).

• M-step optimization can be done efficiently in most cases
• E-step is usually the more expensive step
• It does not fill in the missing data x with hard values, but finds a distribution q(x)

SLIDE 56

EM Derivation (ctd)
• The M-step objective is upper-bounded by the true objective
• The M-step objective equals the true objective at the current parameter estimate
• → Improvement in the true objective is at least as large as the improvement in the M-step objective

SLIDE 57

EM 1-D Example --- 2 iterations
• Estimate a 1-d mixture of two Gaussians with unit variance:
• One parameter μ; μ1 = μ - 7.5, μ2 = μ + 7.5

SLIDE 58

EM for Mixture of Gaussians
• X ~ Multinomial distribution, P(X = k; θ) = θk
• Z | X = k ~ N(μk, Σk)
• Observed: z(1), z(2), …, z(m)

SLIDE 59

EM for Mixture of Gaussians
• E-step: compute responsibilities wk(i) = P(x(i) = k | z(i); θ, μ, Σ)
• M-step: weighted ML updates
  θk = (1/m) Σi wk(i)
  μk = [Σi wk(i) z(i)] / [Σi wk(i)]
  Σk = [Σi wk(i) (z(i) - μk)(z(i) - μk)ᵀ] / [Σi wk(i)]
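A minimal EM sketch for a mixture of K Gaussians in Matlab, assuming the data are the columns of a d×m matrix Z (hypothetical name); mvnpdf requires the Statistics and Machine Learning Toolbox:

  [d, m] = size(Z);  K = 2;
  mu = Z(:, randi(m, 1, K));                 % initialize means at random data points
  Sig = repmat(eye(d), 1, 1, K);
  th = ones(1, K) / K;
  w = zeros(K, m);
  for iter = 1:50
      for k = 1:K                            % E-step: w(k,i) = P(x(i) = k | z(i))
          w(k,:) = th(k) * mvnpdf(Z', mu(:,k)', Sig(:,:,k))';
      end
      w = w ./ sum(w, 1);
      for k = 1:K                            % M-step: weighted ML updates
          nk = sum(w(k,:));
          th(k) = nk / m;
          mu(:,k) = Z * w(k,:)' / nk;
          D = Z - mu(:,k);
          Sig(:,:,k) = (D .* w(k,:)) * D' / nk;
      end
  end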

SLIDE 60

ML Objective HMM
• Given samples: observation sequences z0, …, zT only (states xt unobserved)
• Dynamics model: P(xt+1 | xt)
• Observation model: P(zt | xt)
• ML objective: maximize log P(z0, …, zT) = log Σx0, …, xT P(x0, …, xT, z0, …, zT)
→ No simple decomposition into independent ML problems for each P(xt+1 | xt) and each P(zt | xt)
→ No closed-form solution found by setting derivatives equal to zero

SLIDE 61

EM for HMM --- M-step
• → θ and γ computed from "soft" counts, i.e., expected counts under the posteriors from the E-step
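For the dynamics model, for instance, the soft-count update is one line in Matlab, assuming the E-step produced post(i,t) = P(xt = i | z0:T) and pairP(i,j,t) = P(xt = i, xt+1 = j | z0:T) (hypothetical names):

  % expected transition counts divided by expected visit counts
  P_trans_new = sum(pairP, 3) ./ sum(post(:,1:T-1), 2);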

SLIDE 62

EM for HMM --- E-step
• No need to find the full conditional joint P(x0, …, xT | z0, …, zT)
• Run the smoother to find: the marginals P(xt | z0, …, zT) and the pairwise posteriors P(xt, xt+1 | z0, …, zT)

SLIDE 63

ML Objective for Linear Gaussians
• Linear Gaussian setting: xt+1 = A xt + wt, wt ~ N(0, Σw); zt = C xt + vt, vt ~ N(0, Σv)
• Given: z0, …, zT only (states unobserved)
• ML objective: maximize log P(z0, …, zT) over A, C, Σw, Σv
• EM derivation: same as HMM

SLIDE 64

EM for Linear Gaussians --- E-Step
• Forward: Kalman filter
• Backward: Kalman smoother, yielding the marginal and pairwise posteriors needed in the M-step

SLIDE 65

EM for Linear Gaussians --- M-step

SLIDE 66

EM for Linear Gaussians --- The Log-likelihood
• When running EM, it can be good to keep track of the log-likelihood score: it is supposed to increase every iteration
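A sketch of how this score can be accumulated inside the Kalman filter loop via the innovations (the prediction error decomposition), using the variable names from the filter sketch on the earlier Matlab slide:

  % before the loop: ll = 0;  inside the loop, before the measurement update:
  S = C*Pp*C' + Sigma_v;                       % innovation covariance
  innov = z(:,t) - C*xp;                       % innovation at time t
  ll = ll - 0.5*(2*log(2*pi) + log(det(S)) + innov'*(S\innov));   % 2-d measurement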

SLIDE 67

EM for Extended Kalman Filter Setting
• As the linearization is only an approximation, when performing the updates we might end up with parameters that result in a lower (rather than higher) log-likelihood score
• → Solution: instead of updating the parameters to the newly estimated ones, interpolate between the previous parameters and the newly estimated ones. Perform a "line search" to find the setting that achieves the highest log-likelihood score.

SLIDE 68

Summary
• Kalman smoothing
• Maximum a posteriori sequence
• Maximum likelihood
• Maximum a posteriori parameters
• Expectation maximization