4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes

Outline
- Stochastic dynamic programming and linear quadratic control
- Output feedback linear quadratic control
- Separation principle
- Kalman filter
- LQG
Dynamic model
xk+1 = fk(xk, uk, wk)

Stochastic disturbances
wk ∈ R^{nw}. We assume that the stochastic disturbances have zero mean and are statistically independent (white).

Cost
Find a policy π = {µ0, …, µh−1}, uk = µk(xk), that minimizes
Jπ(x0) = E[ Σ_{k=0}^{h−1} gk(xk, µk(xk), wk) + gh(xh) ].
Start with Jh(xh) = gh(xh) for every xh ∈ Xh and, for each decision stage k ∈ {h−1, h−2, …, 0}, starting from the last and moving backwards, compute Jk and µk from the DP equation
Jk(xk) = min_{uk ∈ Uk(xk)} E_{wk}[ gk(xk, uk, wk) + Jk+1(fk(xk, uk, wk)) ],
where µk(xk) = uk is the minimizer in the DP equation. Then {µ0, …, µh−1} is an optimal policy.
To be more precise we can write E_{wk}[ gk(xk, uk, wk) + Jk+1(fk(xk, uk, wk)) | xk ] on the right-hand side of the DP equation, which reinforces the fact that xk is assumed to be constant while computing the expected value and taking the min.
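The backward recursion can be sketched on a toy problem. The binary state/action system, the costs, and the fair-coin disturbance below are made-up illustrations (not from the lecture); only the structure of the recursion follows the DP equation above.

```python
# Sketch of the stochastic DP recursion on a made-up toy problem:
# binary state and action, fair-coin disturbance, horizon h = 2.
h = 2
states, actions, dists = [0, 1], [0, 1], [0, 1]
p_w = {0: 0.5, 1: 0.5}                      # disturbance distribution (white)

def f(x, u, w):  return (x + u + w) % 2     # dynamics f_k
def g(x, u, w):  return x + u               # stage cost g_k
def g_h(x):      return float(x)            # terminal cost

J = {x: g_h(x) for x in states}             # J_h(x_h) = g_h(x_h)
policy = []
for k in range(h - 1, -1, -1):              # k = h-1, ..., 0
    Jk, mu = {}, {}
    for x in states:
        # DP equation: J_k(x) = min_u E_w[g(x,u,w) + J_{k+1}(f(x,u,w))]
        costs = {u: sum(p_w[w] * (g(x, u, w) + J[f(x, u, w)]) for w in dists)
                 for u in actions}
        mu[x] = min(costs, key=costs.get)
        Jk[x] = costs[mu[x]]
    policy.insert(0, mu)
    J = Jk

print(J)        # -> {0: 1.0, 1: 2.0}
print(policy)   # -> [{0: 0, 1: 0}, {0: 0, 1: 0}]
```

Note that the expectation over wk is taken inside the minimization over uk, exactly as in the DP equation.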
Dynamic model
xk+1 = Ak xk + Bk uk + ωk, where the ωk are zero mean and white, E[ωk] = 0.

Cost function
E[ Σ_{k=0}^{h−1} [xk^T uk^T] [Qk Sk; Sk^T Rk] [xk; uk] + xh^T Qh xh ]

Find a policy π = {µ0, …, µh−1}, uk = µk(xk), k ∈ {0, …, h−1}, which minimizes this cost.
Theorem: The optimal policy is uk = Kk xk, k ∈ {0, …, h−1}, with
Kk = −(Bk^T Pk+1 Bk + Rk)^{−1} (Sk^T + Bk^T Pk+1 Ak),
where the Pk are computed by the Riccati iterations, for k ∈ {h−1, …, 0},
Pk = Ak^T Pk+1 Ak + Qk − (Sk + Ak^T Pk+1 Bk)(Bk^T Pk+1 Bk + Rk)^{−1}(Sk^T + Bk^T Pk+1 Ak),
initialized with Ph = Qh, and the resulting expected cost is given by
J0(x0) = x0^T P0 x0 + Σ_{k=0}^{h−1} trace(Pk+1 E[wk wk^T]).
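The theorem translates directly into code. A minimal sketch, using the double-integrator data from the example that follows (τ = 0.2; Q = I, R = 1, S = 0 and the horizon h = 20 are assumed values for illustration). As a consistency check, with wk = 0 the cost of simulating uk = Kk xk equals x0^T P0 x0 exactly.

```python
import numpy as np

# Backward Riccati iterations for the stochastic LQR theorem above.
# System and weights: double integrator (assumed Q = I, R = 1, S = 0).
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.zeros((2, 1))

h = 20
P = Q.copy()                                  # P_h = Q_h (here Q_h = Q)
gains = []
for k in range(h - 1, -1, -1):
    Kk = -np.linalg.solve(B.T @ P @ B + R, S.T + B.T @ P @ A)   # K_k
    # P_k = A'PA + Q - (S + A'PB)(B'PB + R)^{-1}(S' + B'PA)
    P = A.T @ P @ A + Q + (S + A.T @ P @ B) @ Kk
    gains.insert(0, Kk)

x0 = np.array([[1.0], [0.0]])
J0 = float(x0.T @ P @ x0)    # expected cost; trace terms vanish since w_k = 0 here

# Consistency check: simulate the closed loop with w_k = 0 and sum the stage costs
x, cost = x0.copy(), 0.0
for k in range(h):
    u = gains[k] @ x
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    x = A @ x + B @ u
cost += float(x.T @ Q @ x)   # terminal cost x_h' Q_h x_h
print(J0, cost)              # the two values coincide
```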
Remark: since the model is linear and the cost is quadratic, the deterministic DP (obtained by setting wk = 0) and the stochastic DP yield the same policy; to obtain the expected cost we just need to add a constant.
Under the same assumptions that we have considered in the deterministic case, the stationary policy uk = K̄xk is optimal, where
K̄ = −(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A)
and P̄ satisfies the algebraic Riccati equation.
If wℓ = 0 for ℓ > L > 0, the total cost
E[ Σ_{k=0}^{∞} xk^T Q xk + 2 xk^T S uk + uk^T R uk ]
is given by x0^T P̄ x0 + Σ_{k=0}^{∞} trace(P̄ E[wk wk^T]).
Otherwise, one considers the average cost
lim_{T→∞} (1/T) E[ Σ_{k=0}^{T−1} xk^T Q xk + 2 xk^T S uk + uk^T R uk ],
which converges to trace(P̄ W), assuming W = E[wk wk^T] for all k.
Example: double integrator considered previously,
xk+1 = [1 τ; 0 1] xk + [τ²/2; τ] uk, τ = 0.2,
with cost Σ_{k=0}^{h−1} (xk^T Q xk + uk^T R uk) + xh^T Qh xh and Q = I.
Optimal policy: uk = K xk, k ≥ 0, with K = [−0.8412 −1.54].
[Figures: states y(t), v(t) and optimal control input u(t) for x0 = [y0 v0]^T = [1 0]^T.]
Consider now a disturbance at time 2 sec (k = 10),
xk+1 = [1 τ; 0 1] xk + [τ²/2; τ] uk + wk, with wk = [0; a] if k = 10 and wk = 0 otherwise,
where a is a uniform random variable in the interval [−0.5, 0.5].
[Figures: closed-loop and open-loop responses y(t), v(t), u(t).]
We can compute the expected cost based on the expression
x0^T P̄ x0 + Σ_{k=0}^{∞} trace(P̄ E[wk wk^T]),
where P̄ is the solution of the algebraic Riccati equation, which for the considered parameters is
P̄ = [9.1890 5.0249; 5.0249 9.2324].
Since x0 = [1 0]^T and E[wk wk^T] = [0 0; 0 E[a²]] for k = 10 (and zero otherwise), where
E[a²] = ∫_{−0.5}^{0.5} t² dt = 1/12,
we obtain the cost x0^T P̄ x0 + trace(P̄ [0 0; 0 E[a²]]) ≈ 9.95.
Alternatively, we can simulate the system several times, computing the cost for each simulation and then averaging (Monte Carlo method).
[Figures: two sample simulations y(t), v(t), u(t); Simulation 1 cost: 9.534, Simulation 2 cost: 9.097.]
Average cost of 5000 simulations: 9.95.
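The Monte Carlo comparison can be sketched as follows. P̄ and the gain are recomputed by iterating the Riccati recursion to a fixed point (Q = I and R = 1 are assumed for this example), so both the analytic value and the simulated average come out of the code rather than being copied from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Double integrator with a single disturbance at k = 10, a ~ U[-0.5, 0.5].
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
Q, R = np.eye(2), np.array([[1.0]])

# Solve the algebraic Riccati equation by iterating to a fixed point (S = 0)
P = np.eye(2)
for _ in range(2000):
    P = A.T @ P @ A + Q - (A.T @ P @ B) @ np.linalg.solve(B.T @ P @ B + R,
                                                          B.T @ P @ A)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

x0 = np.array([[1.0], [0.0]])
analytic = float(x0.T @ P @ x0) + P[1, 1] / 12      # E[a^2] = 1/12

def run_cost(a, T=150):
    """Simulate u_k = K x_k with the disturbance [0; a] entering at k = 10."""
    x, cost = x0.copy(), 0.0
    for k in range(T):
        u = K @ x
        cost += float(x.T @ Q @ x + u.T @ R @ u)
        x = A @ x + B @ u
        if k == 10:
            x = x + np.array([[0.0], [a]])
    return cost

avg = np.mean([run_cost(rng.uniform(-0.5, 0.5)) for _ in range(2000)])
print(avg, analytic)    # compare with the 9.95 reported above
```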
Dynamic model
xk+1 = fk(xk, uk, wk), with the initial state unknown.

Output equation
yk = hk(xk, uk−1, nk), with output noise nk.
We assume that the initial state, the stochastic disturbances and the output noise have zero mean, are statistically independent and have a known probability distribution.

Information set
I0 = (y0), Ik = (y0, y1, …, yk, u0, u1, …, uk−1), k ≥ 1.

Cost
Find a policy π = {µ0, …, µh−1}, uk = µk(Ik), which minimizes
Jπ = E[ Σ_{k=0}^{h−1} gk(xk, µk(Ik), wk) + gh(xh) ].

We can reformulate a partial information optimal control problem as a standard full information optimal control problem by considering the state to be the information set,
I0 = (y0), I1 = (y0, y1, u0), I2 = (y0, y1, y2, u0, u1), …
The problem is that the state-space dimension increases exponentially with the stage.

It is possible to show that the knowledge of the probability distribution of the state given the information obtained so far, denoted by P_{xk|Ik}, is sufficient to determine optimal decisions, uk = µk(P_{xk|Ik}). The state-space dimension is now fixed. However, it is typically not easy to store P_{xk|Ik}.
Output feedback linear quadratic control

Dynamic model
xk+1 = A xk + B uk + ωk, with ωk zero mean and independent, E[ωk] = 0.

Output equation
yk = C xk + nk, with noise nk zero mean and independent, with covariance V = E[nk nk^T].

Information set
Ik = (y0, y1, …, yk, u0, u1, …, uk−1).

Cost function
E[ Σ_{k=0}^{h−1} [xk^T uk^T] [Q S; S^T R] [xk; uk] + xh^T Qh xh ]

Find a policy π = {µ0, …, µh−1}, uk = µk(Ik), k ∈ {0, …, h−1}, which minimizes this cost.
We further assume a Gaussian setting:
- the initial state x0 is Gaussian with mean x̄0 and covariance Φ̄0 = E[(x0 − x̄0)(x0 − x̄0)^T], denoted x0 ∼ N(x̄0, Φ̄0); its density is
f(x) = (1/√((2π)^n det Φ̄0)) e^{−(1/2)(x − x̄0)^T Φ̄0^{−1}(x − x̄0)}, Prob[x0 ∈ A] = ∫_A f(x) dx.
Example: x̄0 = [1 2]^T, Φ̄0 = [2 0.5; 0.5 2]. [Figure: probability density over (x1, x2).]
- the disturbances ωk are Gaussian with zero mean and covariance W = E[ωk ωk^T], i.e., wk ∼ N(0, W);
- the noise nk is Gaussian with zero mean and covariance V, i.e., nk ∼ N(0, V).
[Block diagram: process/plant xk+1 = A xk + B uk + wk, yk = C xk + nk, followed by an estimator producing P_{xk|Ik} and a decision maker/controller producing uk.]
From our general discussion, we know that we can first estimate P_{xk|Ik} and then make decisions (control).
It turns out that:
- P_{xk|Ik} is Gaussian with mean x̂k|k and covariance Φk|k = E[(xk − x̂k|k)(xk − x̂k|k)^T | Ik], which can be obtained by the Kalman filter;
- the optimal control is uk = Kk x̂k|k, where the gains Kk are obtained from the Riccati equations.
[Block diagram: plant, Kalman filter producing the estimate x̂k|k, and controller uk = Kk x̂k|k.]
The controller is called the linear quadratic Gaussian (LQG) compensator.
- It is optimal among all controllers (including non-linear controllers) when the noise, disturbances and initial conditions are Gaussian (non-Gaussian noise is mentioned later).
- The fact that the estimator and the controller can be designed separately is known as the separation theorem.
- When only output feedback is available and the input is known, the Kalman filter* can be shown to be the optimal estimator in the sense that, for any vector F and output zk = F xk, there is no other estimator achieving a smaller output variance E[(zk − F x̂k|k)² | Ik].
*LQE, linear quadratic estimator, is another term for the Kalman filter.
[Block diagram: plant, stationary Kalman filter producing x̂k, and controller uk = K̄ x̂k.]
When we consider the average cost
lim_{T→∞} (1/T) Σ_{k=0}^{T−1} E[ [xk^T uk^T] [Q S; S^T R] [xk; uk] ],
the control is the LQR controller with static gains and the estimator is the stationary Kalman filter.
Kalman filter

Dynamic model: xk+1 = A xk + B uk + ωk
Output equation: yk = C xk + nk
Information set: Ik = (y0, y1, …, yk, u0, u1, …, uk−1)
Given: wk ∼ N(0, W), nk ∼ N(0, V), x0 ∼ N(x̄0, Φ̄0).
Then: P_{xk|Ik} ∼ N(x̂k|k, Φk|k) is Gaussian with mean and covariance iteratively computed, for k ≥ 0, by
Lk = Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1)
Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1
x̂k+1|k = A x̂k|k + B uk
Φk+1|k = A Φk|k A^T + W
Initial condition: x̂0|−1 = x̄0, Φ0|−1 = Φ̄0.
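The recursions can be transcribed directly. The 2-state system and noise levels below are made up for illustration; the checks are structural facts of the filter: the measurement update never increases the covariance trace, and the covariance recursion converges.

```python
import numpy as np

# Direct transcription of the Kalman filter recursions above on a made-up
# 2-state example (A, B, C, W, V below are illustrative, not from the slides).
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
W = 0.01 * np.eye(2)                 # E[w_k w_k^T]
V = np.array([[0.04]])               # E[n_k n_k^T]

xhat = np.zeros((2, 1))              # xhat_{0|-1} = xbar_0
Phi = np.eye(2)                      # Phi_{0|-1} = Phibar_0
x = np.array([[0.5], [-0.2]])        # true (unknown) state
traces = []
for k in range(200):
    u = np.zeros((1, 1))                                  # no input applied here
    y = C @ x + rng.normal(0.0, 0.2, (1, 1))              # y_k = C x_k + n_k
    L = Phi @ C.T @ np.linalg.inv(C @ Phi @ C.T + V)      # L_k
    xhat = xhat + L @ (y - C @ xhat)                      # update: xhat_{k|k}
    Phi_upd = Phi - L @ C @ Phi                           # Phi_{k|k}
    traces.append((np.trace(Phi), np.trace(Phi_upd)))
    xhat = A @ xhat + B @ u                               # predict: xhat_{k+1|k}
    Phi = A @ Phi_upd @ A.T + W                           # Phi_{k+1|k}
    w = rng.multivariate_normal([0.0, 0.0], W).reshape(2, 1)
    x = A @ x + B @ u + w
print(traces[-1])
```

Note that Phi − L C Phi equals the stated formula for Φk|k, since L is precisely Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}.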
Why is P_{xk|Ik} ∼ N(x̂k|k, Φk|k) Gaussian? Bayes' rule! For simplicity consider the initial stage, suppose x0 ∈ R, y0 = x0 + n0, and let the a priori probability distributions be
f_{x0}(x0) = (1/(√(2π) σ0)) e^{−(x0 − x̄0)²/(2σ0²)}, x̄0 = 1, σ0 = 1/3,
f_{y0|x0}(y0) = (1/(√(2π) σn)) e^{−(y0 − x0)²/(2σn²)}, σn = 0.1.
Think of x0 as the position of a robot and y0 as a sensor measurement. [Figures: the two densities.]
Then, from Bayes' rule, Prob[x0 ∈ A | y0 ∈ B] = α Prob[y0 ∈ B | x0 ∈ A] Prob[x0 ∈ A], and
f_{x0|y0}(x0) = α f_{y0|x0}(y0) f_{x0}(x0) = α (1/(√(2π) σn)) e^{−(y0 − x0)²/(2σn²)} (1/(√(2π) σ0)) e^{−(x0 − x̄0)²/(2σ0²)} = (1/(√(2π) σ0|0)) e^{−(x0 − x̂0|0)²/(2σ0|0²)},
also Gaussian! After some calculations (α depends on x̄0, y0),
x̂0|0 = x̄0 + L0 (y0 − x̄0), L0 = σ0²/(σ0² + σn²), σ0|0² = σ0² − σ0⁴/(σ0² + σn²).
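These formulas can be checked numerically against a brute-force application of Bayes' rule on a grid, with the σ0 = 1/3, σn = 0.1 above and one of the example measurements (y0 = 1.2):

```python
import numpy as np

# Closed-form fusion (formulas above) vs. brute-force Bayes on a grid.
xbar0, s0, sn = 1.0, 1.0 / 3.0, 0.1
y0 = 1.2

L0 = s0**2 / (s0**2 + sn**2)
xhat = xbar0 + L0 * (y0 - xbar0)             # posterior mean
var = s0**2 - s0**4 / (s0**2 + sn**2)        # posterior variance

# Brute force: posterior proportional to likelihood * prior, normalized on a grid
x = np.linspace(-2.0, 4.0, 200001)
dx = x[1] - x[0]
prior = np.exp(-((x - xbar0) ** 2) / (2 * s0**2))
lik = np.exp(-((y0 - x) ** 2) / (2 * sn**2))
post = prior * lik
post /= post.sum() * dx
mean_num = (x * post).sum() * dx
var_num = ((x - mean_num) ** 2 * post).sum() * dx
print(xhat, mean_num)    # ~1.1835 in both cases
print(var, var_num)
```

Because σn is much smaller than σ0, the posterior mean is pulled almost all the way to the measurement (L0 ≈ 0.917).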
Based on the measurement y0 we update our belief on x0, going from the prior P_{x0} to the posterior P_{x0|y0} = P_{x0|I0} via the likelihood P_{y0|x0}.
[Figures: posterior densities for the measurements y = 1.2, y = 3 and y = 0.5, with σ0 = 1/3, σn = 0.1.]
What about x1? Since x1 = a x0 + b u0 + w0, the distribution P_{x1|I1} = P_{x1|u0,y0} is therefore also Gaussian. If we continue this procedure we obtain that P_{xk|Ik} is Gaussian for every k.
The Kalman filter has a prediction/update interpretation.
- Prediction step: based on the previous estimate, predict in open loop the state at the next step,
x̂k+1|k = A x̂k|k + B uk, Φk+1|k = A Φk|k A^T + W.
In the prediction step the covariance (typically) increases.
- Update step: based on the new measurement, update the estimate,
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1), Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1.
The term yk − C x̂k|k−1 is called the innovation.
The covariance iterations do not depend on the measurements and on the control input and can be computed a priori. Combining the prediction and update steps,
Φk+1|k = A Φk|k−1 A^T − A Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1 A^T + W,
which coincides with the Riccati equations from last lecture (with S = 0) if we replace (A, B, Q, R) → (A^T, C^T, W, V). Moreover, Φk+1|k → Φ̄ and Lk → L̄ as k → ∞, where
Φ̄ = A Φ̄ A^T − A Φ̄ C^T (C Φ̄ C^T + V)^{−1} C Φ̄ A^T + W, L̄ = Φ̄ C^T (C Φ̄ C^T + V)^{−1}.
The resulting stationary filter,
x̂k|k = x̂k|k−1 + L̄ (yk − C x̂k|k−1), x̂k+1|k = A x̂k|k + B uk,
has error dynamics governed by A − L̃C, where L̃ = A L̄, which is stable if (A, C) is observable*.
*Less stringent conditions for convergence can be inferred from the condition of the dual problem (slide 34, II_1).
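A sketch of computing Φ̄ and L̄ by simply running the covariance recursion to convergence, using the mass-spring system from the appendix example (V = 0.01, i.e. σn = 0.1, is an assumed value, consistent with the numbers used there). The checks confirm the fixed point and the stability of the error dynamics A − L̃C.

```python
import numpy as np

# Stationary Kalman filter gains by iterating the one-step covariance recursion.
# A, C, W are from the appendix mass-spring example; V = 0.01 is assumed.
A = np.array([[0.8090, 0.0935], [-3.6932, 0.8090]])
C = np.array([[1.0, 0.0]])
W = np.diag([0.0, 0.25])          # sigma_w = 0.5
V = np.array([[0.01]])

Phi = np.eye(2)                   # any PSD initial condition works
for _ in range(500):
    G = np.linalg.solve(C @ Phi @ C.T + V, C @ Phi @ A.T)
    Phi = A @ Phi @ A.T - (A @ Phi @ C.T) @ G + W

Lbar = Phi @ C.T @ np.linalg.inv(C @ Phi @ C.T + V)
Ltilde = A @ Lbar
rho = max(abs(np.linalg.eigvals(A - Ltilde @ C)))   # spectral radius, error dynamics
print(Phi)
print(Lbar.ravel(), rho)
```

Even though A itself has eigenvalues on the unit circle (undamped oscillator), the filter's error dynamics are strictly stable, as asserted below.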
Historical note
Rudolf E. Kálmán (1930–2016). Kálmán's ideas on filtering were initially met with vast skepticism, so much so that he was forced to do the first publication of his results in mechanical engineering, rather than in electrical engineering or systems engineering. Kálmán had more success in presenting his ideas, however, while visiting Stanley F. Schmidt at the NASA Ames Research Center in 1960. This led to the use of Kálmán filters during the Apollo program, and furthermore, in the NASA Space Shuttle, in Navy submarines, and in unmanned aerospace vehicles and weapons, such as cruise missiles (source: Wikipedia).
LQG controller

Dynamic model: xk+1 = A xk + B uk + ωk
Output equation: yk = C xk + nk
Given: wk ∼ N(0, W), nk ∼ N(0, V), x0 ∼ N(x̄0, Φ̄0).

Cost function
lim_{T→∞} (1/T) Σ_{k=0}^{T−1} E[ [xk^T uk^T] [Q S; S^T R] [xk; uk] ]

Obtain:
Observer gains
Φ̄ = A Φ̄ A^T − A Φ̄ C^T (C Φ̄ C^T + V)^{−1} C Φ̄ A^T + W, L̄ = Φ̄ C^T (C Φ̄ C^T + V)^{−1}
Controller gains
P̄ = A^T P̄ A + Q − (S + A^T P̄ B)(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A), K̄ = −(B^T P̄ B + R)^{−1}(S^T + B^T P̄ A)

LQG controller:
x̂k|k = x̂k|k−1 + L̄ (yk − C x̂k|k−1)
uk = K̄ x̂k|k
x̂k+1|k = A x̂k|k + B uk
In this implementation, at each time instant k the sensor measurement yk is acquired, the control input uk = K̄ x̂k|k is computed, and the actuation is updated. An alternative implementation uses the 'delayed' estimate,
uk = K̄ x̂k|k−1, x̂k+1|k = A x̂k|k−1 + B uk + L̃ (yk − C x̂k|k−1), L̃ = A L̄,
which can be computed between time steps k and k + 1. In both cases the closed loop is stable, since A − L̃C and A + B K̄ are stable.
Example: inverted pendulum on a cart (cart mass M, position x, friction coefficient b, input force u; pendulum mass m, inertia I, length ℓ, angle θ):
(M + m) ẍ + b ẋ − mℓ θ̈ = u, (I + mℓ²) θ̈ − mgℓ θ = mℓ ẍ.
Linearized model in state-space form (see [1, p. 32]), with q = (I + mℓ²)(M + m) − m²ℓ²:
d/dt [x; ẋ; θ; θ̇] = [0 1 0 0; 0 −(I+mℓ²)b/q m²gℓ²/q 0; 0 0 0 1; 0 −mℓb/q mgℓ(M+m)/q 0] [x; ẋ; θ; θ̇] + [0; (I+mℓ²)/q; 0; mℓ/q] u(t).
Discretizing with sampling period τ, the position is measured with noise:
yk = [1 0 0 0] [x(kτ); ẋ(kτ); θ(kτ); θ̇(kτ)] + nk, nk ∼ N(0, σn).
[1] Feedback control of dynamic systems, Franklin, Powell, Emami-Naeini.
Model definition and controller synthesis (MATLAB):

clear all, close all, clc
rand('seed',1)
% definition of the continuous-time model
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 -(m*l*b)/p m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];
Cc = [1 0 0 0];
% discretization
n = 4; tau = 0.1;
sysd = c2d(ss(Ac,Bc,Cc,0),tau);
A = sysd.a; B = sysd.b; C = sysd.c;
% LQG control
Q = diag([1 1 1 1]); S = zeros(4,1); R = 1;
K = dlqr(A,B,Q,R,S); K = -K;   % dlqr returns the gain for u = -Kx
W = diag([1 1 1 1]); V = 1;
[~,~,~,L] = kalman(ss(sysd.a,[sysd.b eye(n)],sysd.c,[sysd.d zeros(1,n)],tau),W,V,'current');
Simulation (MATLAB):

% simulation
kend = 10/tau;
x0 = [1 0 0 0]';
x(:,1) = x0; xhat_1(:,1) = x0;
sigman = 0.0001;
for k=1:kend
    y(:,k) = C*x(:,k)+sigman*randn(1);
    xhat(:,k) = xhat_1(:,k) + L*(y(:,k)-C*xhat_1(:,k));
    u(:,k) = K*xhat(:,k);
    xhat_1(:,k+1) = A*xhat(:,k) + B*u(:,k);
    x(:,k+1) = A*x(:,k)+B*u(:,k);
end
plot((1:kend)*tau,u), figure,
plot((1:kend)*tau,x(3,1:end-1)), figure,
plot((1:kend)*tau,x(1,1:end-1))
Simulation results for Q = I, S = 0, R = 1, τ = 0.1, W = I, V = 1, σn = 0.0001 and x0 = [1 0 0 0]^T.
[Figures: y, θ and u as functions of t ∈ [0, 10].]
The covariances V = E[nk nk^T] and W = E[ωk ωk^T] can be seen as tuning knobs to improve the state estimation. They are typically chosen as diagonal matrices.
- If the estimates are very noisy, pick weights so as to trust the sensors less and the model more, that is, increase V and decrease W.
- In the opposite case, the estimator reacts slowly and introduces lag in the control loop, which is in general bad. In this case decrease V and increase W.
Summary
After this lecture, you should be able to:
- solve stochastic optimal control problems subject to disturbances and compute the corresponding expected cost;
- compute the conditional distribution P_{xk|Ik} with the Kalman filter, under the assumption that the disturbances are Gaussian, and compute the stationary solution of the Kalman filter;
- design LQG controllers minimizing the expected value of a finite-horizon or infinite-horizon quadratic cost.
Example: Kalman filter

Consider a mass-spring system with unitary mass, ÿ(t) = −ks y(t), and state x(t) = [y(t); ẏ(t)]. We wish to obtain an estimate of the state at a rate τ = 0.1 from sensor measurements of the position, where wn := √ks = 2π:
ẋ(t) = [0 1; −wn² 0] x(t) = Ac x(t).
To this effect we discretize the system, obtaining
xk+1 = [0.8090 0.0935; −3.6932 0.8090] xk + [0; σω] dk = A xk + wk, A = e^{Ac τ},
where the dk are independent Gaussian random variables with unitary variance modeling disturbances, and σω = 0.5. The output equation is
yk = [1 0] xk + σn vk = C xk + nk,
where the vk are independent Gaussian random variables with unitary variance and σn = 0.1. Thus
V = E[nk nk^T] = σn², W = E[ωk ωk^T] = [0 0; 0 σω²].
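The discretization can be checked: for the undamped oscillator Ac the matrix exponential has a closed form, and a truncated power series gives the same matrix. This is a verification sketch, not part of the original example.

```python
import numpy as np

# Check A = exp(Ac * tau) for Ac = [[0, 1], [-wn^2, 0]], wn = 2*pi, tau = 0.1.
wn, tau = 2.0 * np.pi, 0.1
Ac = np.array([[0.0, 1.0], [-wn**2, 0.0]])

# Closed form for the harmonic oscillator: [[cos, sin/wn], [-wn*sin, cos]]
s, c = np.sin(wn * tau), np.cos(wn * tau)
A_closed = np.array([[c, s / wn], [-wn * s, c]])

# Independent check: truncated power series of the matrix exponential
A_series, term = np.eye(2), np.eye(2)
for k in range(1, 30):
    term = term @ Ac * (tau / k)
    A_series = A_series + term

print(np.round(A_closed, 4))   # matches the matrix used above
```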
Assume that the covariance of the initial state, Φ̄0 = E[(x0 − x̄0)(x0 − x̄0)^T], is given by Φ0|−1 = Φ̄0 = [1 0; 0 1], and obtain the gains from
Lk = Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1}
Φk|k = Φk|k−1 − Φk|k−1 C^T (C Φk|k−1 C^T + V)^{−1} C Φk|k−1
Φk+1|k = A Φk|k A^T + W.

For k = 0:
L0 = [1; 0](1 + 0.01)^{−1} = [0.9901; 0]
Φ0|0 = [1 0; 0 1] − [1; 0](1 + 0.01)^{−1}[1 0] = [0.0099 0; 0 1]
Φ1|0 = A Φ0|0 A^T + [0 0; 0 0.25] = [0.0152 0.0461; 0.0461 1.0396]
For k = 1:
L1 = [0.0152; 0.0461](0.0152 + 0.01)^{−1} = [0.6037; 1.8271]
Φ1|1 = [0.0152 0.0461; 0.0461 1.0396] − [0.0152; 0.0461](0.0152 + 0.01)^{−1}[0.0152 0.0461] = [0.0060 0.0183; 0.0183 0.9553]
Φ2|1 = A Φ1|1 A^T + [0 0; 0 0.25] = [0.0151 0.0599; 0.0599 0.8484]
For k = 2: …
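The iterations can be reproduced programmatically. A is rebuilt from wn = 2π and τ = 0.1 rather than from the rounded entries (which lose the fourth decimal), and V = 0.01 (σn = 0.1) is an assumed value consistent with the numbers in this example.

```python
import numpy as np

# Reproduce the covariance/gain iterations of the mass-spring example.
wn, tau = 2.0 * np.pi, 0.1
s, c = np.sin(wn * tau), np.cos(wn * tau)
A = np.array([[c, s / wn], [-wn * s, c]])   # exact e^{Ac tau}
C = np.array([[1.0, 0.0]])
W = np.diag([0.0, 0.25])
V = np.array([[0.01]])

Phi_pred = np.eye(2)            # Phi_{0|-1} = Phibar_0
Ls, Phis = [], []
for k in range(2):
    L = Phi_pred @ C.T @ np.linalg.inv(C @ Phi_pred @ C.T + V)   # L_k
    Ls.append(L)
    Phi_upd = Phi_pred - L @ C @ Phi_pred                        # Phi_{k|k}
    Phi_pred = A @ Phi_upd @ A.T + W                             # Phi_{k+1|k}
    Phis.append(Phi_pred)

print(Ls[0].ravel(), Ls[1].ravel())   # ~[0.9901, 0] and ~[0.6037, 1.8271]
print(Phis[0])                        # Phi_{1|0}
print(Phis[1])                        # Phi_{2|1}
```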
Suppose that the true initial condition is x0 = [1; 0] and that the mean of the initial estimate is x̂0|−1 = x̄0 = [1.3; 0]. The estimates are computed from
x̂k|k = x̂k|k−1 + Lk (yk − C x̂k|k−1), x̂k+1|k = A x̂k|k (there is no input in this example).
For the first measured output y0:
x̂0|0 = [1.3; 0] + [0.9901; 0](y0 − 1.3) = [1.0999; 0]
x̂1|0 = A x̂0|0 = [0.8090 0.0935; −3.6932 0.8090][1.0999; 0] = [0.8899; −4.0623]
For an output y1 = 0.7542:
x̂1|1 = [0.8899; −4.0623] + [0.6037; 1.8271](0.7542 − 0.8899) = [0.8080; −4.3102]
x̂2|1 = A x̂1|1 = [0.2504; −6.4710]
A8
1 2
2 4 6 8 10
x xk|k−1 xk|k × +
˙ y
k = 0 k = 1 k = 2 k = 9
[Figure: confidence ellipses associated with Φk|k−1 and Φk|k for k = 0 and k = 1.]
pΦk|kx|kxk = 1}