Optimal Control and Dynamic Programming - 4SC000, Q2 2017-2018, Duarte Antunes

Outline: stochastic dynamic programming and linear quadratic control; output feedback linear quadratic control; separation principle; Kalman filter; LQG design.


SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Outline

  • Stochastic dynamic programming and linear quadratic control
  • Output feedback linear quadratic control
  • Separation principle
  • Kalman filter
  • LQG design
SLIDE 3

Stochastic formulation

Dynamic model: $x_{k+1} = f_k(x_k, u_k, w_k)$

Stochastic disturbances: $w_k \in \mathbb{R}^{n_w}$. We assume that the stochastic disturbances have zero mean and are statistically independent (white).

Cost: $\sum_{k=0}^{h-1} g_k(x_k, u_k, w_k) + g_h(x_h)$

Find a policy $\pi = \{\mu_0, \dots, \mu_{h-1}\}$, $u_k = \mu_k(x_k)$, that minimizes $J_\pi(x_0) = E\big[\sum_{k=0}^{h-1} g_k(x_k, \mu_k(x_k), w_k) + g_h(x_h)\big]$.

SLIDE 4

Stochastic dynamic programming algorithm

Start with $J_h(x_h) = g_h(x_h)$ for every $x_h \in X_h$ and, for each decision stage, starting from the last and moving backwards, $k \in \{h-1, h-2, \dots, 0\}$, compute $J_k$ and $\mu_k$ from the DP equation

$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\big[g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k))\big]$

where $\mu_k(x_k) = u_k$ and $u_k$ is the minimizer in the DP equation. Then $\{\mu_0, \dots, \mu_{h-1}\}$ is an optimal policy.

To be more precise we can write $E_{w_k}\big[g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \mid x_k\big]$ on the right-hand side of the DP equation, which reinforces the fact that $x_k$ is assumed to be constant while computing the expected value and taking the min.
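To make the recursion concrete, the following is a minimal Matlab sketch of the stochastic DP algorithm for a problem with finite state, input and disturbance sets; the sizes and the lookup tables f, g, gh are illustrative placeholders, not part of the lecture example.

% Minimal sketch of stochastic DP over finite sets (illustrative sizes and
% placeholder tables; the MATLAB index k corresponds to stage k-1).
nx = 5; nu = 3; nw = 2; h = 10;
pw = [0.5 0.5];                          % disturbance probabilities
f  = randi(nx, nx, nu, nw);              % placeholder dynamics table f_k(x,u,w)
g  = rand(nx, nu, nw);                   % placeholder stage costs g_k(x,u,w)
gh = rand(nx, 1);                        % placeholder terminal costs g_h(x)
J  = zeros(nx, h+1); mu = zeros(nx, h);  % cost-to-go and policy
J(:, h+1) = gh;                          % J_h(x_h) = g_h(x_h)
for k = h:-1:1                           % backwards over the decision stages
    for x = 1:nx
        Ju = zeros(nu, 1);
        for u = 1:nu                     % expected cost of each candidate input
            for w = 1:nw
                Ju(u) = Ju(u) + pw(w)*(g(x,u,w) + J(f(x,u,w), k+1));
            end
        end
        [J(x,k), mu(x,k)] = min(Ju);     % DP equation: min over u of E_w[...]
    end
end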

SLIDE 5

Linear quadratic control

Dynamic model: $x_{k+1} = A_k x_k + B_k u_k + w_k$, where the disturbances $w_k$ are zero mean and white, $E[w_k] = 0$.

Cost function: $E\Big[\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h\Big]$

Find a policy $\pi = \{\mu_0, \dots, \mu_{h-1}\}$, $u_k = \mu_k(x_k)$, $k \in \{0, \dots, h-1\}$, which minimizes the cost function.

SLIDE 6

Optimal policy

Theorem: The optimal policy is $u_k = K_k x_k$, $k \in \{0, \dots, h-1\}$, where

$K_k = -\big(B_k^\top P_{k+1} B_k + R_k\big)^{-1}\big(S_k^\top + B_k^\top P_{k+1} A_k\big)$

and the matrices $P_k$ are obtained from the Riccati iterations, initialized with $P_h = Q_h$ and run backwards for $k \in \{h-1, \dots, 0\}$:

$P_k = A_k^\top P_{k+1} A_k + Q_k - (S_k + A_k^\top P_{k+1} B_k)\big(B_k^\top P_{k+1} B_k + R_k\big)^{-1}(S_k^\top + B_k^\top P_{k+1} A_k)$

The resulting expected cost is given by

$J_0(x_0) = x_0^\top P_0 x_0 + \sum_{k=0}^{h-1} \mathrm{trace}\big(P_{k+1} E[w_k w_k^\top]\big)$
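A minimal Matlab sketch of these Riccati iterations for time-invariant matrices; the system, weights, horizon and disturbance covariance below are illustrative values, not the lecture's example.

% Backward Riccati iterations for the finite-horizon LQR (illustrative data).
A = [1 0.2; 0 1]; B = [0.02; 0.2];
Q = eye(2); R = 1; S = zeros(2,1); Qh = eye(2);
W = 0.01*eye(2);                        % assumed E[w_k w_k']
h = 50; x0 = [1; 0];
P = Qh; Jconst = 0; K = cell(h,1);      % P starts at P_h = Q_h
for k = h:-1:1                          % MATLAB index k corresponds to stage k-1
    K{k} = -(B'*P*B + R) \ (S' + B'*P*A);
    Jconst = Jconst + trace(P*W);       % accumulates trace(P_{k+1} E[w_k w_k'])
    P = A'*P*A + Q - (S + A'*P*B)*((B'*P*B + R)\(S' + B'*P*A));
end
J0 = x0'*P*x0 + Jconst                  % expected cost J_0(x_0)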

SLIDE 7

Discussion

  • The optimal policy is exactly the same as in the deterministic case ($w_k = 0$); to obtain the expected cost we just need to add a constant.
  • Recall the approaches to cope with stochastic disturbances:
  • open loop: use the decisions of the optimal path
  • closed-loop: use the policy from deterministic DP
  • closed-loop: use the policy from stochastic DP
  • The result of the previous slide says that when the dynamic model is linear and the cost is quadratic, the deterministic DP and the stochastic DP yield the same policy.
  • When this happens we say that we have certainty equivalence.

SLIDE 8

Discussion

  • The result carries through if we consider an infinite horizon cost

$E\Big[\sum_{k=0}^{\infty} x_k^\top Q x_k + 2 x_k^\top S u_k + u_k^\top R u_k\Big]$

under the same assumptions that we have considered in the deterministic case (optimal policy $u_k = \bar{K} x_k$, where $\bar{K} = -\big(B^\top \bar{P} B + R\big)^{-1}\big(S^\top + B^\top \bar{P} A\big)$ and $\bar{P}$ satisfies the algebraic Riccati equation).

  • However, now we need the extra assumption that the cost is bounded. If this is the case (e.g. if $w_\ell = 0$, $\ell > L > 0$), it is given by

$x_0^\top \bar{P} x_0 + \sum_{k=0}^{\infty} \mathrm{trace}\big(\bar{P} E[w_k w_k^\top]\big)$

  • If the cost is not bounded we can consider an alternative (average) cost

$\lim_{T \to \infty} \frac{1}{T}\, E\Big[\sum_{k=0}^{T-1} x_k^\top Q x_k + 2 x_k^\top S u_k + u_k^\top R u_k\Big]$

which converges to $\mathrm{trace}(\bar{P} W)$, assuming $W = E[w_k w_k^\top]$ for all $k$.

SLIDE 9

Example

Double integrator considered previously, with $\tau = 0.2$:

$x_{k+1} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \tau^2/2 \\ \tau \end{bmatrix} u_k$

Cost: $\sum_{k=0}^{h-1}\big(x_k^\top Q x_k + u_k^\top R u_k\big) + x_h^\top Q_h x_h$, with $Q = \mathrm{diag}(1, 1)$ and $R = 1$.

Optimal policy: $u_k = K x_k$, $k \geq 0$, with $K = \begin{bmatrix} -0.8412 & -1.54 \end{bmatrix}$.

[Plots: optimal control input $u(t)$ and states $y(t)$, $v(t)$ over $t \in [0, 10]$ for $x_0 = [y_0\ \ v_0]^\top = [1\ \ 0]^\top$.]
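As a check, this stationary gain can be reproduced with the Control System Toolbox function dlqr; this is a sketch, and note that dlqr returns the gain for $u = -Kx$, hence the sign change.

% Double-integrator LQR gain via dlqr (Control System Toolbox).
tau = 0.2;
A = [1 tau; 0 1]; B = [tau^2/2; tau];
Q = diag([1 1]); R = 1;
K = -dlqr(A, B, Q, R)   % should approximately reproduce the gain quoted above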

SLIDE 10

Effects of disturbances

Consider now a disturbance at time $2\,\mathrm{s}$ ($k = 10$):

$x_{k+1} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \tau^2/2 \\ \tau \end{bmatrix} u_k + w_k, \qquad w_k = \begin{cases} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, & \text{if } k \in \mathbb{N}_0 \setminus \{10\} \\ \begin{bmatrix} 0 \\ a \end{bmatrix}, & \text{if } k = 10 \end{cases}$

where $a$ is a uniform random variable in the interval $[-0.5,\ 0.5]$.

[Plots: closed-loop and open-loop responses $y(t)$, $v(t)$, $u(t)$ over $t \in [0, 20]$.]

SLIDE 11

Computing the expected cost

We can compute the expected cost based on the expression

$x_0^\top \bar{P} x_0 + \sum_{k=0}^{\infty} \mathrm{trace}\big(\bar{P} E[w_k w_k^\top]\big)$

where $\bar{P}$ is the solution of the algebraic Riccati equation, which for the considered parameters is

$\bar{P} = \begin{bmatrix} 9.1890 & 5.0249 \\ 5.0249 & 9.2324 \end{bmatrix}$

Since $w_k = [0\ \ 0]^\top$ if $k \in \mathbb{N}_0 \setminus \{10\}$ and $w_k = [0\ \ a]^\top$ if $k = 10$, and using the fact that

$E[a^2] = \int_{-0.5}^{0.5} t^2\, dt = 1/12$

the expected cost is

$x_0^\top \bar{P} x_0 + \mathrm{trace}\Big(\bar{P} \begin{bmatrix} 0 & 0 \\ 0 & E[a^2] \end{bmatrix}\Big)$

which, for the initial condition $x_0 = [1\ \ 0]^\top$ of the example, gives the cost $9.95$.
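A short Matlab sketch of this computation, using dlqr to obtain $\bar{P}$ (its second output is the solution of the algebraic Riccati equation); the values mirror the example above.

% Expected cost via the algebraic Riccati solution (sketch).
tau = 0.2;
A = [1 tau; 0 1]; B = [tau^2/2; tau];
Q = diag([1 1]); R = 1;
[~, Pbar] = dlqr(A, B, Q, R);   % Pbar solves the algebraic Riccati equation
x0 = [1; 0];
Ea2 = 1/12;                     % E[a^2] for a uniform on [-0.5, 0.5]
Jexp = x0'*Pbar*x0 + trace(Pbar*diag([0 Ea2]))   % compare with the 9.95 above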

SLIDE 12

Computing the expected value

Alternatively, we can simulate the system several times, computing the cost for each simulation and then averaging (Monte Carlo method).

[Plots: two sample simulations of $y(t)$, $v(t)$, $u(t)$; the costs of the two shown simulations are 9.534 and 9.097. Average cost of 5000 simulations: 9.95.]
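A Monte Carlo sketch of this computation; the terminal weight $Q_h = Q$ and the simulation horizon (long enough for the state to settle) are assumptions, not taken from the slides.

% Monte Carlo estimate of the expected cost (sketch).
tau = 0.2;
A = [1 tau; 0 1]; B = [tau^2/2; tau];
Q = diag([1 1]); R = 1;
K = -dlqr(A, B, Q, R);                   % stationary LQR gain, u = K*x
x0 = [1; 0]; h = 100; Nsim = 5000;
costs = zeros(Nsim, 1);
for s = 1:Nsim
    x = x0; J = 0;
    for k = 0:h-1
        u = K*x;
        J = J + x'*Q*x + u'*R*u;
        w = [0; 0];
        if k == 10, w = [0; rand - 0.5]; end   % uniform disturbance at k = 10
        x = A*x + B*u + w;
    end
    costs(s) = J + x'*Q*x;               % terminal cost with Qh = Q (assumed)
end
mean(costs)                              % should be close to the 9.95 above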

SLIDE 13

Outline

  • Stochastic dynamic programming and linear quadratic control
  • Output feedback linear quadratic control
  • Separation principle
  • Kalman filter
  • LQG design
SLIDE 14

Problem formulation

Dynamic model (initial state unknown): $x_{k+1} = f_k(x_k, u_k, w_k)$

Output equation (output noise $n_k$): $y_k = h_k(x_k, u_{k-1}, n_k)$

Information set: $I_0 = (y_0)$, $\quad I_k = (y_0, y_1, \dots, y_k, u_0, u_1, \dots, u_{k-1})$, $k \geq 1$

Find a policy $\pi = \{\mu_0, \dots, \mu_{h-1}\}$, $u_k = \mu_k(I_k)$, which minimizes the cost

$J_\pi = E\big[\sum_{k=0}^{h-1} g_k(x_k, \mu_k(I_k), w_k) + g_h(x_h)\big]$

We assume that the initial state, the stochastic disturbances and the output noise have zero mean, are statistically independent and have a known probability distribution.

SLIDE 15

First approach

We can reformulate a partial information optimal control problem as a standard full information optimal control problem by considering the state to be the information set. The problem is that the state space dimension increases exponentially:

$I_0 = (y_0), \quad I_1 = (y_0, y_1, u_0), \quad I_2 = (y_0, y_1, y_2, u_0, u_1)$

[Figure: tree over stages 0, 1, 2 in the measurement space, showing how the information set and the associated uncertainty grow with each new measurement and input.]

SLIDE 16

Second approach

It is possible to show that the knowledge of the probability distribution of the state given the information obtained so far, denoted by $P_{x_k|I_k}$, is sufficient to determine optimal decisions: $u_k = \mu_k(P_{x_k|I_k})$. The state space dimension is now fixed. However, it is typically not easy to store $P_{x_k|I_k}$.

[Figure: probability space over stages 0, 1, 2 with the beliefs $P_{x_0|I_0}$, $P_{x_1|I_1}$, $P_{x_2|I_2}$.]

SLIDE 17

Problem formulation

Dynamic model: $x_{k+1} = A x_k + B u_k + w_k$, with the disturbances $w_k$ zero mean and independent, $E[w_k] = 0$

Output equation: $y_k = C x_k + n_k$, with the noise $n_k$ zero mean and independent

Information set: $I_k = (y_0, y_1, \dots, y_k, u_0, u_1, \dots, u_{k-1})$, $\quad u_k = \mu_k(I_k)$

Cost function: $E\Big[\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h\Big]$

Find a policy $\pi = \{\mu_0, \dots, \mu_{h-1}\}$, $k \in \{0, \dots, h-1\}$, which minimizes the cost function.

SLIDE 18

Problem formulation

  • The initial condition $x_0$ follows a Gaussian (or Normal) distribution with mean $\bar{x}_0$ and covariance $\bar{\Phi}_0 = E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^\top]$, denoted $x_0 \sim N(\bar{x}_0, \bar{\Phi}_0)$:

$f(x) = \frac{1}{\sqrt{(2\pi)^{n}\det(\bar{\Phi}_0)}}\, e^{-\frac{1}{2}(x - \bar{x}_0)^\top \bar{\Phi}_0^{-1}(x - \bar{x}_0)}, \qquad \mathrm{Prob}[x_0 \in A] = \int_A f(x)\, dx$

with $n$ the dimension of $x_0$. Example: $\bar{x}_0 = [1\ \ 2]^\top$, $\bar{\Phi}_0 = \begin{bmatrix} 2 & 0.5 \\ 0.5 & 2 \end{bmatrix}$ [plot of the corresponding probability density over $(x_1, x_2)$].

  • The disturbance variables $w_k$ are independent random variables following a Gaussian distribution with zero mean and covariance $W = E[w_k w_k^\top]$, i.e., $w_k \sim N(0, W)$.

  • The noise variables $n_k$ are independent random variables following a Gaussian distribution with zero mean and covariance $V = E[n_k n_k^\top]$, i.e., $n_k \sim N(0, V)$.

  • For simplicity we consider time-invariant matrices for the model and cost.

SLIDE 19

Outline

  • Stochastic dynamic programming and linear quadratic control
  • Output feedback linear quadratic control
  • Separation principle
  • Kalman filter
  • LQG design
SLIDE 20

General solution structure

From our general discussion, we know that we can first estimate $P_{x_k|I_k}$ and then make decisions (control).

[Block diagram: process/plant driven by $u_k$ and the disturbance $w_k$, with measured output $y_k$ corrupted by the noise $n_k$; an estimator produces $P_{x_k|I_k}$, which a decision maker/controller uses to compute $u_k$.]

SLIDE 21

Solution

It turns out that:

  • $P_{x_k|I_k}$ is Gaussian, characterized by its mean $\hat{x}_{k|k}$ and covariance $\Phi_{k|k} = E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^\top \mid I_k]$, which can be obtained by the Kalman filter.
  • The decision maker is the linear quadratic controller with the state replaced by the estimate, $u_k = K_k \hat{x}_{k|k}$, where the gains $K_k$ are obtained from the Riccati equations.

[Block diagram: plant with disturbance $w_k$ and measurement noise $n_k$; a Kalman filter produces $\hat{x}_{k|k}$ and the controller applies $u_k = K_k \hat{x}_{k|k}$.]

SLIDE 22

Discussion

  • This is one of the most beautiful and most useful results in control theory! The controller is called the linear quadratic Gaussian (LQG) compensator.
  • Separation principle: LQG = LQR + LQE*.
  • As we have seen, the LQR is the optimal controller when full state feedback is available.
  • For a linear system with the dynamic and output models in slide 14 and known input, the Kalman filter can be shown to be the optimal estimator, in the sense that for any vector $z_k = F x_k$ there is no other estimator achieving a smaller variance $E[(z_k - F \hat{x}_{k|k})^2 \mid I_k]$.
  • The LQG is the optimal controller among all possible controllers (including non-linear controllers) when the noise, disturbances and initial conditions are Gaussian (non-Gaussian noise is mentioned later).
  • We now discuss the Kalman filter in detail, and later give an example of the separation theorem.

*LQE: linear quadratic estimator, another term for the Kalman filter.

SLIDE 23

Discussion

  • The separation principle also holds for the infinite horizon problem where we consider the average cost

$\lim_{T \to \infty} \frac{1}{T} \sum_{k=0}^{T-1} E\Big[\begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix}\Big]$

  • In this case the observer is the so-called stationary Kalman filter and the control is the LQR controller with a static gain, $u_k = \bar{K} \hat{x}_k$.

[Block diagram: plant with disturbance $w_k$ and noise $n_k$; a stationary Kalman filter produces the estimate $\hat{x}_k$ used by the static controller.]

SLIDE 24

Outline

  • Stochastic dynamic programming and linear quadratic control
  • Output feedback linear quadratic control
  • Separation principle
  • Kalman filter
  • LQG design
SLIDE 25

Kalman filter

Dynamic model: $x_{k+1} = A x_k + B u_k + w_k$

Output equation: $y_k = C x_k + n_k$

Information set: $I_k = (y_0, y_1, \dots, y_k, u_0, u_1, \dots, u_{k-1})$

Given: $w_k \sim N(0, W)$, $n_k \sim N(0, V)$, $x_0 \sim N(\bar{x}_0, \bar{\Phi}_0)$

Then $P_{x_k|I_k} \sim N(\hat{x}_{k|k}, \Phi_{k|k})$ is Gaussian, with mean and covariance iteratively computed, for $k \geq 0$, by:

$L_k = \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1}$

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k (y_k - C \hat{x}_{k|k-1})$

$\Phi_{k|k} = \Phi_{k|k-1} - \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1} C \Phi_{k|k-1}$

$\hat{x}_{k+1|k} = A \hat{x}_{k|k} + B u_k$

$\Phi_{k+1|k} = A \Phi_{k|k} A^\top + W$

Initial conditions: $\hat{x}_{0|-1} = \bar{x}_0$, $\quad \Phi_{0|-1} = \bar{\Phi}_0$
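A minimal Matlab sketch of these iterations; the system matrices, noise covariances, input and measurement sequences are illustrative placeholders.

% Time-varying Kalman filter iterations (sketch with placeholder data).
A = [1 0.1; 0 1]; B = [0.005; 0.1]; C = [1 0];
W = 0.01*eye(2); V = 0.1^2;
xbar0 = [0; 0]; Phi0 = eye(2);
h = 50; u = zeros(1, h); y = randn(1, h);    % placeholder input/measurements
xpred = xbar0; Phipred = Phi0;               % xhat_{0|-1} and Phi_{0|-1}
for k = 1:h                                  % MATLAB index k corresponds to stage k-1
    L      = Phipred*C' / (C*Phipred*C' + V);                        % L_k
    xest   = xpred + L*(y(k) - C*xpred);                             % xhat_{k|k}
    Phiest = Phipred - Phipred*C'*((C*Phipred*C' + V)\(C*Phipred));  % Phi_{k|k}
    xpred  = A*xest + B*u(k);                                        % xhat_{k+1|k}
    Phipred = A*Phiest*A' + W;                                       % Phi_{k+1|k}
end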

SLIDE 26

Justification

Why is $P_{x_k|I_k} \sim N(\hat{x}_{k|k}, \Phi_{k|k})$ Gaussian? Bayes' rule! For simplicity consider the initial stage, suppose $x_0 \in \mathbb{R}$, $y_0 = x_0 + n_0$, and let the a priori probability distributions be

$f_{x_0}(x_0) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{(x_0 - \bar{x}_0)^2}{2\sigma_0^2}}, \qquad \bar{x}_0 = 1, \quad \sigma_0 = 1/3$

$f_{y_0|x_0}(y_0) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\, e^{-\frac{(y_0 - x_0)^2}{2\sigma_n^2}}, \qquad \sigma_n = 0.1$

Think of $x_0$ as the position of a robot and $y_0$ as a sensor measurement.

[Plots: the prior density of $x_0$ and the measurement density centered at $x_0$.]

SLIDE 27

Justification

Then, from Bayes' rule,

$\mathrm{Prob}[x_0 \in A \mid y_0 \in B] = \alpha\, \mathrm{Prob}[y_0 \in B \mid x_0 \in A]\, \mathrm{Prob}[x_0 \in A]$

or, in terms of probability densities,

$f_{x_0|y_0}(x_0) = \alpha\, f_{y_0|x_0}(y_0)\, f_{x_0}(x_0) = \alpha\, \frac{1}{\sqrt{2\pi}\,\sigma_n}\, e^{-\frac{(y_0 - x_0)^2}{2\sigma_n^2}}\, \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{(x_0 - \bar{x}_0)^2}{2\sigma_0^2}} = \frac{1}{\sqrt{2\pi}\,\sigma_{0|0}}\, e^{-\frac{(x_0 - \hat{x}_{0|0})^2}{2\sigma_{0|0}^2}}$

which is also Gaussian! Here $\alpha$ is a function of $\bar{x}_0, y_0$, and (after some calculations)

$\hat{x}_{0|0} = \bar{x}_0 + L_0 (y_0 - \bar{x}_0), \qquad L_0 = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_n^2}, \qquad \sigma_{0|0}^2 = \sigma_0^2 - \frac{\sigma_0^4}{\sigma_0^2 + \sigma_n^2}$
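A scalar Matlab sketch of this update with the values of the example; the measurement value below is an arbitrary illustration.

% Scalar Bayesian update: prior N(xbar0, s0^2) fused with y0 = x0 + n0.
xbar0 = 1; s0 = 1/3; sn = 0.1;
y0  = 1.2;                              % example measurement (illustrative)
L0  = s0^2/(s0^2 + sn^2);               % gain L_0
x00 = xbar0 + L0*(y0 - xbar0);          % posterior mean xhat_{0|0}
s00 = sqrt(s0^2 - s0^4/(s0^2 + sn^2));  % posterior standard deviation sigma_{0|0}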
SLIDE 28

Justification

Based on the measurement $y_0$ we update our belief on $x_0$: from the prior $P_{x_0}$ and the measurement model $P_{y_0|x_0}$ we obtain the posterior $P_{x_0|I_0} = P_{x_0|y_0}$.

[Plots: prior density ($\bar{x}_0 = 1$, $\sigma_0 = 1/3$), measurement density ($\sigma_n = 0.1$), and the resulting posteriors for the measurements $y = 1.2$, $y = 3$ and $y = 0.5$.]

SLIDE 29

Justification

What about $P_{x_1|I_1}$? Since $x_1 = a x_0 + b u_0 + w_0$ is a sum of Gaussian random variables when conditioned on $u_0, y_0$, the distribution $P_{x_1|u_0, y_0}$ is Gaussian.

  • Apply Bayes' rule again to prove that $P_{x_1|I_1}$ is Gaussian.

If we continue this procedure we obtain that $P_{x_k|I_k}$ is Gaussian for every $k$.

SLIDE 30

Remarks

  • The Kalman filter can be divided into two steps with the following interpretation.

  • Prediction step: based on the previous estimate, predict in open loop the state at the next step.

$\hat{x}_{k+1|k} = A \hat{x}_{k|k} + B u_k, \qquad \Phi_{k+1|k} = A \Phi_{k|k} A^\top + W$

  • Correction/innovation/update step: based on the new measurement, update the estimate using the innovation $y_k - C \hat{x}_{k|k-1}$.

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k (y_k - C \hat{x}_{k|k-1}), \qquad \Phi_{k|k} = \Phi_{k|k-1} - \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1} C \Phi_{k|k-1}$

  • Note that in the innovation step the covariance decreases and in the prediction step the covariance (typically) increases.

  • An example is provided in the appendix.

SLIDE 31

Remarks

  • Albeit counter-intuitive, the covariance matrices do not depend on the measurements nor on the control input, and can therefore be computed a priori.

  • (Duality) The equation to update $\Phi_{k+1|k}$,

$\Phi_{k+1|k} = A \Phi_{k|k-1} A^\top - A \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1} C \Phi_{k|k-1} A^\top + W$

coincides with the Riccati equations from last lecture (with $S = 0$) if we replace $(A, B, Q, R) \to (A^\top, C^\top, W, V)$.

  • In particular we know that, as $k \to \infty$, $\Phi_{k+1|k} \to \bar{\Phi}$ and $L_k \to \bar{L}$, where

$\bar{\Phi} = A \bar{\Phi} A^\top - A \bar{\Phi} C^\top (C \bar{\Phi} C^\top + V)^{-1} C \bar{\Phi} A^\top + W, \qquad \bar{L} = \bar{\Phi} C^\top (C \bar{\Phi} C^\top + V)^{-1}$

  • Thus, the equations for the state estimator become

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + \bar{L} (y_k - C \hat{x}_{k|k-1}), \qquad \hat{x}_{k+1|k} = A \hat{x}_{k|k} + B u_k$

  • $A + \tilde{L} C$, where $\tilde{L} = A \bar{L}$, is stable if $(A, C)$ is observable*.

*Less stringent conditions for convergence can be inferred from the condition of the dual problem (slide 34, II_1).
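The duality can also be exploited numerically. Below is a sketch, assuming the Control System Toolbox: applying dlqr to the dual data $(A^\top, C^\top, W, V)$ returns, as its second output, the stationary prediction covariance $\bar{\Phi}$; the matrices are illustrative.

% Stationary Kalman filter gain via the dual LQR problem (sketch).
A = [1 0.1; 0 1]; C = [1 0];
W = 0.01*eye(2); V = 0.1^2;              % illustrative noise covariances
[~, Phibar] = dlqr(A', C', W, V);        % Phibar solves the dual Riccati equation
Lbar   = Phibar*C' / (C*Phibar*C' + V);  % stationary gain Lbar
Ltilde = A*Lbar;                         % Ltilde = A*Lbar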

SLIDE 32

Historical note

Rudolf E. Kalman (1930-2016). Kalman's ideas on filtering were initially met with vast skepticism, so much so that he was forced to publish his results first in a mechanical engineering journal, rather than in electrical engineering or systems engineering. Kalman had more success in presenting his ideas while visiting Stanley F. Schmidt at the NASA Ames Research Center in 1960. This led to the use of Kalman filters during the Apollo program, and furthermore in the NASA Space Shuttle, in Navy submarines, and in unmanned aerospace vehicles and weapons such as cruise missiles (source: Wikipedia).

[Photo: Rudolf E. Kalman.]

SLIDE 33

Outline

  • Stochastic dynamic programming and linear quadratic control
  • Output feedback linear quadratic control
  • Separation principle
  • Kalman filter
  • LQG design
SLIDE 34

LQG design

Dynamic model: $x_{k+1} = A x_k + B u_k + w_k$

Output equation: $y_k = C x_k + n_k$

Given: $w_k \sim N(0, W)$, $n_k \sim N(0, V)$, $x_0 \sim N(\bar{x}_0, \bar{\Phi}_0)$

Cost function: $\lim_{T \to \infty} \frac{1}{T} \sum_{k=0}^{T-1} E\Big[\begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix}\Big]$

Obtain the observer gain

$\bar{\Phi} = A \bar{\Phi} A^\top - A \bar{\Phi} C^\top (C \bar{\Phi} C^\top + V)^{-1} C \bar{\Phi} A^\top + W, \qquad \bar{L} = \bar{\Phi} C^\top (C \bar{\Phi} C^\top + V)^{-1}$

and the controller gain

$\bar{P} = A^\top \bar{P} A + Q - (S + A^\top \bar{P} B)\big(B^\top \bar{P} B + R\big)^{-1}(S^\top + B^\top \bar{P} A), \qquad \bar{K} = -\big(B^\top \bar{P} B + R\big)^{-1}\big(S^\top + B^\top \bar{P} A\big)$

LQG controller:

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + \bar{L} (y_k - C \hat{x}_{k|k-1}), \qquad \hat{x}_{k+1|k} = A \hat{x}_{k|k} + B u_k, \qquad u_k = \bar{K} \hat{x}_{k|k}$

SLIDE 35

Remarks

  • According to our convention, $u_k$ depends on $y_k$. This means that we assume that at instant $k$ the sensor measurement $y_k$ is acquired, the control input is computed, and the actuation is updated.

  • Since there are inevitably computation delays, it is common practice to make $u_k = \bar{K} \hat{x}_{k|k-1}$, which can be computed between time steps $k$ and $k+1$.

  • In the latter case we only need to run the following equations, where $\tilde{L} = A \bar{L}$ (see the sketch after this list):

$u_k = \bar{K} \hat{x}_{k|k-1}, \qquad \hat{x}_{k+1|k} = A \hat{x}_{k|k-1} + B u_k + \tilde{L} (y_k - C \hat{x}_{k|k-1})$

  • Using either implementation, one can show that the closed loop is stable due to the stability of $A + \tilde{L} C$ and $A + B \bar{K}$.

  • To use our implementation with the Matlab function kalman.m use the option 'current'; for this other implementation use 'delayed'.
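A self-contained Matlab sketch of the 'delayed' implementation for an illustrative double-integrator plant; the gains below stand in for $\bar{K}$ and $\tilde{L}$ and are computed with dlqr for concreteness (they are not the lecture's pendulum design).

% 'Delayed' LQG implementation: u_k computed from xhat_{k|k-1} only (sketch).
tau = 0.1;
A = [1 tau; 0 1]; B = [tau^2/2; tau]; C = [1 0];
Kbar = -dlqr(A, B, eye(2), 1);                   % stationary LQR gain
[~, Phibar] = dlqr(A', C', 0.01*eye(2), 0.01);   % dual Riccati solution
Ltilde = A*(Phibar*C'/(C*Phibar*C' + 0.01));     % Ltilde = A*Lbar
x = [1; 0]; xhat = [0; 0];                       % true state and xhat_{0|-1}
kend = 100; sigman = 0.1;
for k = 1:kend
    y = C*x + sigman*randn;                      % measurement y_k
    u = Kbar*xhat;                               % u_k = Kbar*xhat_{k|k-1}
    xhat = A*xhat + B*u + Ltilde*(y - C*xhat);   % xhat_{k+1|k}
    x = A*x + B*u;                               % plant update (disturbance omitted)
end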

SLIDE 36

Inverted pendulum

Linearized model (see [1, p. 32]) of a cart of mass $M$ at position $x$ carrying an inverted pendulum of mass $m$, inertia $I$ and length $\ell$, with friction coefficient $b$ and input force $u$:

$(I + m\ell^2)\ddot{\theta} - mg\ell\,\theta = m\ell\,\ddot{x}, \qquad (M + m)\ddot{x} + b\dot{x} - m\ell\,\ddot{\theta} = u$

State space, with $q = (I + m\ell^2)(M + m) - m^2\ell^2$:

$\frac{d}{dt}\begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\frac{(I + m\ell^2)b}{q} & \frac{m^2 g \ell^2}{q} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -\frac{m\ell b}{q} & \frac{mg\ell(M + m)}{q} & 0 \end{bmatrix}\begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{I + m\ell^2}{q} \\ 0 \\ \frac{m\ell}{q} \end{bmatrix} u(t)$

Sampled output, with sampling period $\tau$:

$y_k = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x(k\tau) \\ \dot{x}(k\tau) \\ \theta(k\tau) \\ \dot{\theta}(k\tau) \end{bmatrix} + n_k, \qquad n_k \sim N(0, \sigma_n)$

[1] Feedback control of dynamic systems, Franklin, Powell, Emami-Naeini

SLIDE 37

Matlab implementation

% Model definition
clear all, close all, clc
rand('seed',1)
% definition of the continuous-time model
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 -(m*l*b)/p m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];
Cc = [1 0 0 0];
% discretization
n = 4; tau = 0.1;
sysd = c2d(ss(Ac,Bc,Cc,0),tau);
A = sysd.a; B = sysd.b; C = sysd.c;

% Controller synthesis (LQG control)
Q = diag([1 1 1 1]); S = zeros(4,1); R = 1;
K = dlqr(A,B,Q,R,S); K = -K;
W = diag([1 1 1 1]); V = 1;
[~,~,~,L] = kalman(ss(sysd.a,[sysd.b eye(n)],sysd.c,[sysd.d zeros(1,n)],tau),W,V,'current');

SLIDE 38

Matlab implementation

% Simulation
kend = 10/tau;
x0 = [1 0 0 0]';
x(:,1) = x0; xhat_1(:,1) = x0;
sigman = 0.0001;
for k=1:kend
    y(:,k) = C*x(:,k)+sigman*randn(1);
    xhat(:,k) = xhat_1(:,k) + L*(y(:,k)-C*xhat_1(:,k));
    u(:,k) = K*xhat(:,k);
    xhat_1(:,k+1) = A*xhat(:,k) + B*u(:,k);
    x(:,k+1) = A*x(:,k)+B*u(:,k);
end
plot((1:kend)*tau,u), figure,
plot((1:kend)*tau,x(3,1:end-1)), figure,
plot((1:kend)*tau,x(1,1:end-1)),

SLIDE 39

Time responses

$Q = I$, $S = 0$, $R = 1$, $\tau = 0.1$, $W = I$, $V = 1$, $\sigma_n = 0.0001$, $x_0 = [1\ \ 0\ \ 0\ \ 0]^\top$

[Plots: time responses of the cart position $y$, pendulum angle $\theta$ and input $u$ over $t \in [0, 10]$.]

SLIDE 40

Tuning

  • In practice the matrices $W = E[w_k w_k^\top]$ and $V = E[n_k n_k^\top]$ are used as tuning knobs to improve the state estimation.

  • To facilitate the tuning, these matrices are typically chosen as diagonal matrices.

  • If the filter is too "nervous", i.e. it reacts very fast and consequently the estimates are very noisy, pick weights that trust the sensors less and the model more, that is, increase $V$ and decrease $W$.

  • If it reacts too slowly (although it filters out noise), it adds a delay in the control loop, which is in general bad. In this case decrease $V$ and increase $W$.

SLIDE 41

Concluding remarks

  • We considered output feedback linear quadratic control. In this case, and assuming that the disturbances are Gaussian:
  • Use the Kalman filter to compute $P_{x_k|I_k}$
  • Use the optimal linear state-feedback controller (LQR, infinite horizon cost)
  • Output feedback controller = LQR + LQE (separation principle)

Summary: after this lecture, you should be able to:

  • Obtain the optimal linear quadratic control law when the linear plant is subject to disturbances and compute the corresponding expected cost.
  • Execute and implement the time-varying Kalman filter equations and compute the stationary solution of the Kalman filter.
  • Determine the optimal output feedback controller for a linear system minimizing the expected value of a finite-horizon or infinite-horizon quadratic cost.

SLIDE 42

Appendix A

Example: Kalman filter

SLIDE 43

Example: Kalman filter

Consider a mass-spring system with unitary mass, $\ddot{y}(t) = -k_s y(t)$, with natural period of one second ($w_n = 2\pi$), and suppose that we wish to obtain an estimate of the state $x(t) = \begin{bmatrix} y(t) \\ \dot{y}(t) \end{bmatrix}$ at a rate $\tau = 0.1$ from sensor measurements of the position:

$\dot{x}(t) = \underbrace{\begin{bmatrix} 0 & 1 \\ -w_n^2 & 0 \end{bmatrix}}_{A_c}\, x(t)$

SLIDE 44

Discretization

To this effect we discretize the system, obtaining

$x_{k+1} = \underbrace{\begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix}}_{A = e^{A_c \tau}} x_k + \underbrace{\begin{bmatrix} 0 \\ \sigma_\omega d_k \end{bmatrix}}_{w_k}$

where the $d_k$ are independent Gaussian random variables with unitary variance modeling disturbances. The output equation is

$y_k = \underbrace{\begin{bmatrix} 1 & 0 \end{bmatrix}}_{C} x_k + \underbrace{\sigma_n v_k}_{n_k}$

where the $v_k$ are independent Gaussian random variables with unitary variance. Thus,

$W = E[w_k w_k^\top] = \begin{bmatrix} 0 & 0 \\ 0 & \sigma_\omega^2 \end{bmatrix}, \qquad V = E[n_k n_k^\top] = \sigma_n^2$

with $\sigma_\omega = 0.5$ and $\sigma_n = 0.1$.

SLIDE 45

Iterations

Assume that the covariance of the initial state, $\bar{\Phi}_0 = E[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^\top]$, is given by

$\Phi_{0|-1} = \bar{\Phi}_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Then we can iterate

$L_k = \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1}$

$\Phi_{k|k} = \Phi_{k|k-1} - \Phi_{k|k-1} C^\top (C \Phi_{k|k-1} C^\top + V)^{-1} C \Phi_{k|k-1}$

$\Phi_{k+1|k} = A \Phi_{k|k} A^\top + W$

and obtain the gains.

SLIDE 46

Iterations

For $k = 0$:

$L_0 = \Phi_{0|-1} C^\top \big(C \Phi_{0|-1} C^\top + 0.01\big)^{-1} = \begin{bmatrix} 0.9901 \\ 0 \end{bmatrix}$

$\Phi_{0|0} = \Phi_{0|-1} - \Phi_{0|-1} C^\top \big(C \Phi_{0|-1} C^\top + 0.01\big)^{-1} C \Phi_{0|-1} = \begin{bmatrix} 0.0099 & 0 \\ 0 & 1 \end{bmatrix}$

$\Phi_{1|0} = \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix} \Phi_{0|0} \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix}^\top + \begin{bmatrix} 0 & 0 \\ 0 & 0.25 \end{bmatrix} = \begin{bmatrix} 0.0152 & 0.0461 \\ 0.0461 & 1.0396 \end{bmatrix}$
SLIDE 47

Iterations

For $k = 1$:

$L_1 = \Phi_{1|0} C^\top \big(C \Phi_{1|0} C^\top + 0.01\big)^{-1} = \begin{bmatrix} 0.6037 \\ 1.8271 \end{bmatrix}$

$\Phi_{1|1} = \Phi_{1|0} - \Phi_{1|0} C^\top \big(C \Phi_{1|0} C^\top + 0.01\big)^{-1} C \Phi_{1|0} = \begin{bmatrix} 0.0060 & 0.0183 \\ 0.0183 & 0.9553 \end{bmatrix}$

$\Phi_{2|1} = \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix} \Phi_{1|1} \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix}^\top + \begin{bmatrix} 0 & 0 \\ 0 & 0.25 \end{bmatrix} = \begin{bmatrix} 0.0151 & 0.0599 \\ 0.0599 & 0.8484 \end{bmatrix}$

For $k = 2$: ...

SLIDE 48

Expected value

Suppose that the mean of the initial estimate is $\hat{x}_{0|-1} = \bar{x}_0 = [1.3\ \ 0]^\top$ (the true initial condition used in the example has position $1$). We can then iterate

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L_k (y_k - C \hat{x}_{k|k-1}), \qquad \hat{x}_{k+1|k} = A \hat{x}_{k|k} + B u_k$

SLIDE 49

Iterations

For an output $y_0 = 1.0979$:

$\hat{x}_{0|0} = \begin{bmatrix} 1.3 \\ 0 \end{bmatrix} + \begin{bmatrix} 0.9901 \\ 0 \end{bmatrix}(1.0979 - 1.3) = \begin{bmatrix} 1.0999 \\ 0 \end{bmatrix}, \qquad \hat{x}_{1|0} = \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix}\begin{bmatrix} 1.0999 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.8899 \\ -4.0623 \end{bmatrix}$

For an output $y_1 = 0.7542$:

$\hat{x}_{1|1} = \begin{bmatrix} 0.8899 \\ -4.0623 \end{bmatrix} + \begin{bmatrix} 0.6037 \\ 1.8271 \end{bmatrix}(0.7542 - 0.8899) = \begin{bmatrix} 0.8080 \\ -4.3102 \end{bmatrix}, \qquad \hat{x}_{2|1} = \begin{bmatrix} 0.8090 & 0.0935 \\ -3.6932 & 0.8090 \end{bmatrix}\begin{bmatrix} 0.8080 \\ -4.3102 \end{bmatrix} = \begin{bmatrix} 0.2504 \\ -6.4710 \end{bmatrix}$

SLIDE 50

Iterations

[Phase-plane plot ($y$, $\dot{y}$): the true state $x$ and the estimates $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$ for $k = 0, 1, 2, \dots, 9$.]

SLIDE 51

Iterations

[Plots for $k = 0$ and $k = 1$: confidence ellipses (level sets) associated with the covariances $\Phi_{k|k-1}$ and $\Phi_{k|k}$, centered at the corresponding estimates.]