4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Part III
Continuous-time optimal control problems
1
Recap

                          Discrete optimization problems         Stage decision problems
  Formulation             Transition diagram                     Dynamic system & additive cost function
  DP algorithm            Graphical DP algorithm & DP equation   DP equation
  Partial information     Bayesian inference & decisions         Kalman filter and separation principle
                          based on prob. distribution
  Alternative algorithms  Dijkstra's algorithm                   Static optimization
2
Introduce optimal control concepts for continuous-time optimal control problems:

                          Stage decision problems                  Continuous-time control problems
  Formulation             Discrete-time system & additive cost     Differential equations & additive cost
  DP algorithm            DP equation                              Hamilton-Jacobi-Bellman equation
  Partial information     Kalman filter and separation principle   Continuous-time Kalman filter and
                                                                   separation principle
  Alternative algorithms  Static optimization                      Pontryagin's maximum principle

And analyze frequency-domain properties of continuous-time LQR/LQG.
3
Consider a continuous-time system described by a differential equation on the interval t ∈ [0, T],

    ẋ(t) = f(x(t), u(t)),   x(0) = x₀,   t ∈ [0, T],

with the additive cost function

    ∫_0^T g(x(t), u(t)) dt + g_T(x(T)),

where x(t) ∈ ℝⁿ and u(t) ∈ U ⊆ ℝᵐ. The functions f and g may also depend explicitly on time, f(t, x(t), u(t)), g(t, x(t), u(t)).
4
A path is a pair (u(t), x(t)), t ∈ [0, T], where u(t) is a control input function and x(t) is the corresponding solution to the differential equation

    ẋ(t) = f(x(t), u(t)),   x(0) = x₀,   t ∈ [0, T].

The control inputs are functions defined on intervals which shape the derivative of the state (and thus determine its evolution). A path is optimal if it minimizes the cost

    ∫_0^T g(x(t), u(t)) dt + g_T(x(T)),

where the terminal cost is incurred at t = T at the terminal state x(T).
5
A policy u(t) = μ(t, x(t)), t ∈ [0, T], specifies the control input as a function of time and state. The cost-to-go of a state x(t) = x̄ at time t under policy μ is

    J(t, x̄) = ∫_t^T g(x(s), μ(s, x(s))) ds + g_T(x(T)),

where ẋ(s) = f(x(s), u(s)), x(t) = x̄, s ∈ [t, T]. The optimal cost-to-go

    J(t, x̄) = min_u ∫_t^T g(x(s), u(s)) ds + g_T(x(T))

coincides with the cost of the optimal path of the problem started at x̄ at time t.
6
The Hamilton-Jacobi-Bellman (HJB) equation shall allow us to compute optimal policies, and the Pontryagin's maximum principle (PMP) shall allow us to compute optimal paths.

General idea: discretize the continuous-time control problem with a given discretization step (previously the sampling period), apply DP to the resulting stage decision problem, and take the limit as the discretization step converges to zero:

    CT control problem --(discretization, step τ)--> stage decision problem
                       --(DT DP)--> optimal path and policy
                       --(taking the limit τ → 0)--> optimal path and policy (CT DP)
7
How to charge the capacitor in an RC circuit with minimum energy loss in the resistor? The circuit consists of a voltage source u in series with a resistor R and a capacitor C; x denotes the capacitor voltage and i the current.

Dynamic model:

    ẋ(t) = (1/(RC)) (u(t) − x(t))

Let us consider R = C = T = x_desired = 1, with terminal constraint x(T) = x_desired and initial condition x(0) = 0. The energy dissipated in the resistor is the cost to be minimized:

    min_{u(t)}  ∫_0^T (x(t) − u(t))² / R dt
8
Dynamic model, cost function, and discretization. Take discretization times t_k = kτ, k ∈ {0, …, h}, with discretization step τ such that hτ = T. Holding u(t) = u_k := u(t_k) constant for t ∈ [t_k, t_{k+1}), the state evolves as

    x(t) = e^{−(t−t_k)} x_k + (1 − e^{−(t−t_k)}) u_k,   t ∈ [t_k, t_{k+1}),

where x_k := x(t_k), which at the discretization times gives

    x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k.

Cost function:

    ∫_0^1 (x(t) − u(t))² dt
      = Σ_{k=0}^{h−1} ∫_{t_k}^{t_{k+1}} (e^{−(t−t_k)} x_k + (1 − e^{−(t−t_k)}) u_k − u_k)² dt
      = Σ_{k=0}^{h−1} ∫_{t_k}^{t_{k+1}} e^{−2(t−t_k)} dt (x_k − u_k)²
      = Σ_{k=0}^{h−1} ((1 − e^{−2τ})/2) (x_k − u_k)².
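As a quick numerical cross-check (a sketch of mine, not part of the slides; variable names are my own), the exact discretization and the closed-form stage-cost coefficient (1 − e^{−2τ})/2 can be verified against direct quadrature:

```python
import numpy as np

# Exact discretization of the RC model (R = C = 1): holding u constant over
# [t_k, t_k + tau), x_{k+1} = exp(-tau)*x_k + (1 - exp(-tau))*u_k, and the
# stage cost int_{t_k}^{t_k+tau} (x(t) - u_k)^2 dt = (1 - exp(-2 tau))/2 * (x_k - u_k)^2.
tau = 0.2

def step(x, u):
    a = np.exp(-tau)
    return a * x + (1 - a) * u

def stage_cost(x, u):
    return (1 - np.exp(-2 * tau)) / 2 * (x - u) ** 2

# The stage-cost coefficient equals int_0^tau exp(-2 s) ds; check by a midpoint sum.
n = 100_000
s = (np.arange(n) + 0.5) * (tau / n)
quad = np.sum(np.exp(-2 * s)) * (tau / n)
coef = (1 - np.exp(-2 * tau)) / 2
```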
9
The framework of stage decision problems does not take terminal constraints into account. Thus we apply a trick: a final control input u(1) is applied at the terminal time and held for ∆ seconds, setting the state to the desired terminal value at time 1 + ∆,

    x(1 + ∆) = e^{−∆} x(1) + (1 − e^{−∆}) u(1).

Since we require x(1 + ∆) = 1, this terminal control input is given by

    u(1) = (1 − e^{−∆} x(1)) / (1 − e^{−∆}).
10
The following cost approximates the original one that we are interested in:

    ∫_0^{1+∆} (x(t) − u(t))² dt
      = ∫_0^1 (x(t) − u(t))² dt + ∫_1^{1+∆} (x(t) − u(t))² dt
      = (Σ_{k=0}^{h−1} ((1 − e^{−2τ})/2) (x_k − u_k)²) + ((1 − e^{−2∆})/2) (x_h − u_h)²
      = (Σ_{k=0}^{h−1} ((1 − e^{−2τ})/2) (x_k − u_k)²) + γ(∆)(x_h − 1)²,

where the terminal cost coefficient is

    γ(∆) = (1 − e^{−2∆}) / (2 (1 − e^{−∆})²).

Note that γ(∆) → ∞ as ∆ → 0, but γ(∆)(x_h − 1)² → 0 if x_h → 1.
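A small numerical sanity check of this identity (my own sketch; γ(∆) and ∆ as defined above):

```python
import numpy as np

# Check of the terminal-cost trick: with u_h chosen to meet the terminal
# constraint, the stage cost over [1, 1 + Delta] equals gamma(Delta)*(x_h - 1)^2.
Delta = 0.01
gam = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)

xh = 0.7                                                  # arbitrary test value
uh = (1 - np.exp(-Delta) * xh) / (1 - np.exp(-Delta))     # terminal input
x_end = np.exp(-Delta) * xh + (1 - np.exp(-Delta)) * uh   # = x(1 + Delta)
lhs = (1 - np.exp(-2 * Delta)) / 2 * (xh - uh) ** 2       # last stage cost
rhs = gam * (xh - 1) ** 2                                 # terminal penalty form
```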
11
Applying DP,

    J_h(x_h) = γ(∆)(x_h − 1)²
    J_k(x_k) = min_{u_k} ((1 − e^{−2τ})/2)(x_k − u_k)² + J_{k+1}(e^{−τ} x_k + (1 − e^{−τ}) u_k),

results in

    u_k = K_k x_k + α_k,   J_k(x_k) = θ_k x_k² + γ_k x_k + β_k,

obtained from Riccati equations.

Example: τ = 0.2, ∆ = 0.01. [Figure: plots of the resulting x(t) and u(t) versus time t ∈ [0, 1].]
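The backward recursion can be sketched in a few lines. Since each J_k is a quadratic θ_k x² + γ_k x + β_k, it can be recovered exactly by fitting a parabola through three evaluation points (the function and variable names below are mine, not from the slides):

```python
import numpy as np

# Backward DP for the capacitor problem (R = C = T = 1, terminal trick with Delta).
tau, Delta = 0.2, 0.01
h = round(1 / tau)
a, b = np.exp(-tau), 1 - np.exp(-tau)        # x_{k+1} = a*x_k + b*u_k
c = (1 - np.exp(-2 * tau)) / 2               # stage-cost coefficient
gD = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)  # gamma(Delta)

coeffs = [None] * (h + 1)
coeffs[h] = np.array([gD, -2 * gD, gD])      # J_h(x) = gD*(x - 1)^2

def minimize_stage(theta, gam, beta, x):
    # argmin_u of c*(x-u)^2 + theta*(a*x+b*u)^2 + gam*(a*x+b*u) + beta (closed form)
    u = (2 * (c - theta * a * b) * x - gam * b) / (2 * (c + theta * b ** 2))
    xn = a * x + b * u
    return u, c * (x - u) ** 2 + theta * xn ** 2 + gam * xn + beta

for k in range(h - 1, -1, -1):               # backward pass: fit J_k exactly
    th, ga, be = coeffs[k + 1]
    xs = np.array([0.0, 1.0, 2.0])
    Js = np.array([minimize_stage(th, ga, be, x)[1] for x in xs])
    coeffs[k] = np.polyfit(xs, Js, 2)        # [theta_k, gamma_k, beta_k]

# Forward pass from x_0 = 0: the trajectory should head towards x_h ~ 1.
x, xs_traj = 0.0, [0.0]
for k in range(h):
    u, _ = minimize_stage(*coeffs[k + 1], x)
    x = a * x + b * u
    xs_traj.append(x)
```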
12
The solution seems to be converging to x(t) = t, u(t) = 1 + t. Later we will prove this.

[Figures: plots of x(t) and u(t) versus time t ∈ [0, 1] for (τ, ∆) = (0.05, 0.01), (0.01, 0.01), (0.01, 0.001).]
13
Alternatively, we can pose a static optimization problem, which can handle constraints:

    min_{u_0,…,u_{h−1}}  Σ_{k=0}^{h−1} ((1 − e^{−2τ})/2)(x_k − u_k)²
    s.t.  x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k,   x_0 = 0,   x_h = 1.

Lagrangian:

    L(x_1, u_0, λ_1, …, x_{h−1}, u_{h−1}, λ_h)
      = Σ_{k=0}^{h−1} ((1 − e^{−2τ})/2)(x_k − u_k)²
      + Σ_{k=0}^{h−1} λ_{k+1}(e^{−τ} x_k + (1 − e^{−τ}) u_k − x_{k+1}).

Necessary optimality conditions amount to solving a linear system (when the input is unconstrained):

    ∂L/∂x_k = 0:       λ_k = (1 − e^{−2τ})(x_k − u_k) + λ_{k+1} e^{−τ},       k ∈ {1, …, h − 1}
    ∂L/∂u_k = 0:       0 = −(1 − e^{−2τ})(x_k − u_k) + λ_{k+1}(1 − e^{−τ}),   k ∈ {0, …, h − 1}
    ∂L/∂λ_{k+1} = 0:   x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k,               k ∈ {0, …, h − 1}

together with the boundary conditions x_0 = 0, x_h = 1.
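Since the stationarity conditions are linear in (x_k, u_k, λ_k), the optimal discretized path is found by assembling and solving one linear system; a sketch (indexing scheme and names are my own):

```python
import numpy as np

tau = 0.01
h = round(1 / tau)
a, b = np.exp(-tau), 1 - np.exp(-tau)
c2 = 1 - np.exp(-2 * tau)

# Unknowns: x_1..x_{h-1}, u_0..u_{h-1}, lam_1..lam_h  (x_0 = 0, x_h = 1 fixed).
n = (h - 1) + h + h
ix = lambda k: k - 1                  # column of x_k,   1 <= k <= h-1
iu = lambda k: (h - 1) + k            # column of u_k,   0 <= k <= h-1
il = lambda k: (h - 1) + h + k - 1    # column of lam_k, 1 <= k <= h
xfix = {0: 0.0, h: 1.0}               # fixed boundary values of the state

A = np.zeros((n, n))
rhs = np.zeros(n)
row = 0

# dL/dx_k = 0, k = 1..h-1:  lam_k - c2*x_k + c2*u_k - a*lam_{k+1} = 0
for k in range(1, h):
    A[row, il(k)] = 1.0
    A[row, ix(k)] = -c2
    A[row, iu(k)] = c2
    A[row, il(k + 1)] = -a
    row += 1

# dL/du_k = 0, k = 0..h-1:  -c2*x_k + c2*u_k + b*lam_{k+1} = 0
for k in range(h):
    if k in xfix:
        rhs[row] += c2 * xfix[k]      # fixed x_k moves to the right-hand side
    else:
        A[row, ix(k)] = -c2
    A[row, iu(k)] = c2
    A[row, il(k + 1)] = b
    row += 1

# dynamics: x_{k+1} - a*x_k - b*u_k = 0, k = 0..h-1
for k in range(h):
    if k + 1 in xfix:
        rhs[row] -= xfix[k + 1]
    else:
        A[row, ix(k + 1)] = 1.0
    if k in xfix:
        rhs[row] += a * xfix[k]
    else:
        A[row, ix(k)] = -a
    A[row, iu(k)] = -b
    row += 1

z = np.linalg.solve(A, rhs)
xs = np.concatenate(([0.0], z[:h - 1], [1.0]))
us = z[h - 1:2 * h - 1]
```

For small τ the solution should approach the limit observed before, x(t) = t and u(t) = 1 + t.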
14
Again, the solution seems to be converging to x(t) = t, u(t) = 1 + t.

[Figures: plots of x(t) and u(t) versus time t ∈ [0, 1] for τ = 0.2, τ = 0.05, τ = 0.01.]
15
We will follow this discretization approach (a rigorous approach can be found in Bertsekas' book) to derive the counterpart of DP for continuous-time control problems, which is the Hamilton-Jacobi-Bellman equation. From it we will also derive the Pontryagin's maximum principle. This will allow us to solve the problem of charging a capacitor, and solve many other problems.

    CT control problem --(discretization, step τ)--> stage decision problem
                       --(DT DP / DT PMP)--> optimal path and policy
                       --(taking the limit τ → 0)--> optimal path and policy (CT DP / CT PMP)
16
Dynamic model: ẋ(t) = f(x(t), u(t)), x(0) = x₀, t ∈ [0, T]. Take discretization times t_k = kτ, with discretization step τ such that hτ = T, and let x_k = x(kτ), u_k = u(kτ). An exact expression is not available after discretization, as in the linear case, but the Euler approximation

    x_{k+1} = x_k + τ f(x_k, u_k)

will suffice. Cost function:

    ∫_0^T g(x(t), u(t)) dt + g_T(x(T))  ≈  Σ_{k=0}^{h−1} g(x_k, u_k) τ + g_h(x_h),   g_h(x) = g_T(x), ∀x.
17
DP equations for the resulting stage decision problem:

    J_h(x_h) = g_h(x_h)
    J_k(x_k) = min_{u_k ∈ U} g(x_k, u_k) τ + J_{k+1}(x_k + τ f(x_k, u_k)),   k ∈ {h − 1, …, 0}.

For convenience let us define

    J̄(t, x) = J_k(x),   t ∈ [kτ, (k + 1)τ),   J̄(hτ, x) = J_h(x).

Then the dynamic programming algorithm can be written as

    J̄(hτ, x) = g_h(x),   ∀x
    J̄(kτ, x) = min_{u ∈ U} g(x, u) τ + J̄((k + 1)τ, x + τ f(x, u)),   k ∈ {h − 1, …, 0},   ∀x.
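When no closed form is available, this recursion can be run numerically on a state grid. A rough illustrative sketch (my own choices: the capacitor dynamics ẋ = u − x, running cost g = (x − u)², a stiff quadratic terminal penalty in place of the terminal constraint, and linear interpolation of J between grid points):

```python
import numpy as np

# Grid-based DP: J_k(x) = min_u g(x,u)*tau + J_{k+1}(x + tau*f(x,u)).
tau = 0.02
h = round(1 / tau)
xgrid = np.linspace(-0.5, 1.5, 201)      # state grid
ugrid = np.linspace(-0.5, 2.5, 301)      # control candidates
gam = 100.0                              # stiff terminal penalty weight

f = lambda x, u: u - x
g = lambda x, u: (x - u) ** 2

J = gam * (xgrid - 1.0) ** 2             # J_h: penalize missing x(T) = 1
for k in range(h - 1, -1, -1):
    Jnew = np.empty_like(J)
    for i, x in enumerate(xgrid):
        xn = x + tau * f(x, ugrid)       # candidate next states, one per u
        tot = g(x, ugrid) * tau + np.interp(xn, xgrid, J)
        Jnew[i] = tot.min()
    J = Jnew

J0 = float(np.interp(0.0, xgrid, J))     # approximate optimal cost from x(0) = 0
```

For the capacitor example the optimal cost from x(0) = 0 is close to 1 (the exact limit computed later in the lecture), so J0 should land near that value up to discretization error.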
18
Using a first-order Taylor series expansion,

    J̄((k + 1)τ, x + τ f(x, u)) = J̄(kτ, x) + τ ( ∂/∂t J̄(kτ, x) + ∂/∂x J̄(kτ, x) f(x, u) ) + o(τ),

and replacing in the DP algorithm, we obtain

    J̄(kτ, x) = min_{u ∈ U} g(x, u) τ + J̄(kτ, x) + τ ( ∂/∂t J̄(kτ, x) + ∂/∂x J̄(kτ, x) f(x, u) ) + o(τ).

Assuming that (wishful thinking...) as τ → 0, J̄(t, x) converges to a continuously differentiable function, then cancelling J̄(kτ, x) on both sides, dividing by τ, and letting τ → 0 yields

    0 = min_{u ∈ U} g(x, u) + ∂/∂t J̄(t, x) + ∂/∂x J̄(t, x) f(x, u).
19
Suppose that V(t, x) is continuously differentiable in t and x, and is such that it satisfies the Hamilton-Jacobi-Bellman equation

    0 = min_{u ∈ U} g(x, u) + ∂/∂t V(t, x) + ∂/∂x V(t, x) f(x, u),   ∀t, x,

with V(T, x) = g_T(x). Suppose also that u = μ(t, x) attains the minimum in the HJB equation for all t, x. Then V(t, x) coincides with the optimal cost-to-go J(t, x) and μ(t, x) coincides with the optimal policy.
20
Remarks:
- The HJB partial differential equation is just an extension of the DP algorithm.
- The derivation was informal; in particular, it assumed that the cost-to-go is differentiable.
- A rigorous proof of the theorem does not rely on the discrete-time approach. It can be found in Bertsekas' book, pag. 111.
- In general the HJB equation is hard to solve analytically.
- We will solve it next for linear systems and solve the previous problem of charging a capacitor.
21
For the simple problem* with

    dynamics:  ẋ(t) = u(t),   u(t) ∈ U := [−1, 1],   t ∈ [0, T]
    cost:      (1/2)(x(T))²,

the HJB equation is

    0 = min_{u ∈ [−1,1]} ∂/∂t V(t, x) + ∂/∂x V(t, x) u

with the terminal condition V(T, x) = (1/2)x². Approach: find a candidate for optimality and check that it satisfies HJB.

* example taken from Bertsekas' book, p. 112
22
There is an obvious candidate for optimality: move the state towards zero as quickly as possible,

    μ*(t, x) = −sign(x) = { 1 if x < 0,   0 if x = 0,   −1 if x > 0 },

and for an initial time t and initial state x, the cost is given by

    J*(t, x) = (1/2)(max{0, |x| − (T − t)})².

[Figure: J*(t, x) versus x, equal to zero for x between −(T − t) and T − t, and quadratic outside.]
23
This function satisfies the terminal condition of the HJB theorem, J*(T, x) = (1/2)x², and, since

    ∂/∂t J*(t, x) = max{0, |x| − (T − t)},   ∂/∂x J*(t, x) = sign(x) max{0, |x| − (T − t)},

it satisfies the HJB equation

    0 = min_{u ∈ [−1,1]} [1 + sign(x) u] max{0, |x| − (T − t)},

where the minimum in the HJB equation is achieved by u = μ*(t, x) = −sign(x) (not unique when |x(t)| ≤ T − t). Then this is an optimal policy.
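The verification can also be spot-checked numerically with finite differences (an illustrative sketch of mine; the sample points are chosen away from the kinks of J*):

```python
import numpy as np

# Check that J*(t,x) = 0.5*max(0, |x| - (T - t))^2 satisfies
# 0 = min_{u in [-1,1]} [ dJ*/dt + dJ*/dx * u ]   (the running cost is zero here).
T = 1.0

def J(t, x):
    return 0.5 * max(0.0, abs(x) - (T - t)) ** 2

eps = 1e-6
us = np.linspace(-1.0, 1.0, 2001)
residuals = []
for t, x in [(0.2, 1.7), (0.5, -1.2), (0.3, 0.1), (0.9, 0.4)]:
    dJdt = (J(t + eps, x) - J(t - eps, x)) / (2 * eps)   # central differences
    dJdx = (J(t, x + eps) - J(t, x - eps)) / (2 * eps)
    residuals.append(float(np.min(dJdt + dJdx * us)))    # HJB residual
```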
24
Dynamic model:

    ẋ(t) = A x(t) + B u(t),   x(0) = x₀.

Cost function:

    x(T)ᵀ Q_T x(T) + ∫_0^T ( x(t)ᵀ Q x(t) + 2 x(t)ᵀ S u(t) + u(t)ᵀ R u(t) ) dt,

with weight matrix [Q S; Sᵀ R] (assumed positive semidefinite, with R positive definite, as is standard). Inspired by the fact that a discretization-based approach would result in quadratic costs-to-go, let us try V(t, x) = xᵀ P(t) x. If such a function satisfies the HJB equations, it is the cost-to-go! The HJB equation reads

    0 = min_{u ∈ ℝᵐ} [ xᵀQx + 2xᵀSu + uᵀRu + ∂V(t, x)/∂t + ∂V(t, x)/∂x (Ax + Bu) ],   V(T, x) = xᵀ Q_T x.
25
The HJB equation then takes the form

    0 = min_{u ∈ ℝᵐ} [ xᵀQx + 2xᵀSu + uᵀRu + xᵀ Ṗ(t) x + 2xᵀP(t)Ax + 2xᵀP(t)Bu ].

To obtain the minimum, differentiate with respect to u and equate to zero,

    2(BᵀP(t) + Sᵀ)x + 2Ru = 0   ⇒   u = −R⁻¹(BᵀP(t) + Sᵀ)x,

which leads to

    0 = xᵀ( Ṗ(t) + P(t)A + AᵀP(t) − (P(t)B + S)R⁻¹(BᵀP(t) + Sᵀ) + Q )x,   ∀x,

which is only satisfied if

    Ṗ(t) = −( P(t)A + AᵀP(t) − (P(t)B + S)R⁻¹(BᵀP(t) + Sᵀ) + Q ),   P(T) = Q_T.

We have concluded that if P(t) satisfies this Riccati equation, then J(t, x) = xᵀP(t)x is the cost-to-go and μ(t, x) = K(t)x, with K(t) = −R⁻¹(BᵀP(t) + Sᵀ), is the optimal policy.
26
Finite horizon

The optimal control policy for the problem

    min_u  ∫_0^T ( x(t)ᵀQx(t) + 2x(t)ᵀSu(t) + u(t)ᵀRu(t) ) dt + x(T)ᵀQ_T x(T)
    s.t.   ẋ(t) = A x(t) + B u(t),   x(0) = x₀

is u(t) = K(t)x(t), K(t) = −R⁻¹(BᵀP(t) + Sᵀ), where P(t) is the unique solution of the Riccati equation

    Ṗ(t) = −( P(t)A + AᵀP(t) − (P(t)B + S)R⁻¹(BᵀP(t) + Sᵀ) + Q ),   P(T) = Q_T.

Moreover, the optimal cost-to-go is given by x₀ᵀ P(0) x₀.
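In practice P(t) is obtained by integrating the Riccati ODE backwards from P(T) = Q_T. A fixed-step RK4 sketch (the double-integrator plant and the weights are illustrative choices of mine, with S = 0):

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])     # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); S = np.zeros((2, 1))
QT = np.eye(2)
T, steps = 5.0, 5000
dt = T / steps

def Pdot(P):
    # dP/dt = -(P A + A' P - (P B + S) R^{-1} (B' P + S') + Q)
    M = P @ B + S
    return -(P @ A + A.T @ P - M @ np.linalg.solve(R, M.T) + Q)

P = QT.copy()
for _ in range(steps):                      # integrate from t = T down to t = 0
    k1 = Pdot(P)
    k2 = Pdot(P - 0.5 * dt * k1)            # minus: time runs backwards
    k3 = Pdot(P - 0.5 * dt * k2)
    k4 = Pdot(P - dt * k3)
    P = P - dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

K0 = -np.linalg.solve(R, B.T @ P + S.T)     # optimal gain at t = 0
```

For a long horizon, P(0) approaches the constant solution of the algebraic Riccati equation (for this plant, [[√3, 1], [1, √3]]).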
27
Infinite horizon

The reasoning follows from arguments similar to those used in the context of stage decision problems. Assuming (A, B) controllable and [Q S; Sᵀ R] positive semidefinite, the optimal policy for the problem

    min_u  ∫_0^∞ ( x(t)ᵀQx(t) + 2x(t)ᵀSu(t) + u(t)ᵀRu(t) ) dt
    s.t.   ẋ(t) = A x(t) + B u(t),   x(0) = x₀

is u(t) = Kx(t), K = −R⁻¹(BᵀP + Sᵀ), where P is the unique positive definite solution to the algebraic Riccati equation

    0 = PA + AᵀP − (PB + S)R⁻¹(BᵀP + Sᵀ) + Q.

Moreover, the closed-loop matrix (A + BK) has all its eigenvalues on the left-half complex plane and the optimal cost-to-go is given by x₀ᵀ P x₀.
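The algebraic Riccati equation can be solved numerically; one classical route (a numpy-only sketch, illustrative double-integrator plant with no cross term S) is via the stable invariant subspace of the associated Hamiltonian matrix:

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])

Rinv = np.linalg.inv(R)
H = np.block([[A, -B @ Rinv @ B.T],      # Hamiltonian matrix
              [-Q, -A.T]])
w, V = np.linalg.eig(H)
stable = V[:, w.real < 0]                # eigenvectors of the stable eigenvalues
X1, X2 = stable[:2, :], stable[2:, :]
P = np.real(X2 @ np.linalg.inv(X1))      # P = X2 X1^{-1}

K = -Rinv @ B.T @ P                      # u = K x
residual = P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P + Q
```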
28
Applying a trick allows us to cast our problem in the standard LQR formulation: augment the state with a constant auxiliary state y(t) = 1 (ẏ(t) = 0, y(0) = 1), so that the artificial terminal penalty γ(x(1) − 1)² becomes quadratic in the augmented state.

Dynamic model:

    d/dt [x(t); y(t)] = [−1 0; 0 0][x(t); y(t)] + [1; 0] u(t),   [x(0); y(0)] = [x₀; 1],

so A = [−1 0; 0 0] and B = [1; 0]. Cost function:

    ∫_0^1 (x(t) − u(t))² dt + γ(x(1) − 1)²
      = ∫_0^1 ( [x y][1 0; 0 0][x; y] + 2[x y][−1; 0]u + uᵀu ) dt
        + [x(1) y(1)][γ −γ; −γ γ][x(1); y(1)],

so Q = [1 0; 0 0], S = [−1; 0], R = 1, Q_T = [γ −γ; −γ γ].
29
Riccati equations:

    Ṗ(t) = −( P(t)A + AᵀP(t) − (P(t)B + S)R⁻¹(BᵀP(t) + Sᵀ) + Q ),   P(T) = Q_T,

with P(t) = [p₁(t) p₂(t); p₂(t) p₃(t)]. Substituting A = [−1 0; 0 0], B = [1; 0], S = [−1; 0], R = 1, Q = [1 0; 0 0] gives

    ṗ₁(t) = 2p₁(t) + (p₁(t) − 1)² − 1 = p₁(t)²
    ṗ₂(t) = p₂(t) + p₂(t)(p₁(t) − 1) = p₁(t)p₂(t)
    ṗ₃(t) = p₂(t)²

with p₁(1) = −p₂(1) = p₃(1) = γ, whose solution is (solution method not addressed here)

    p₁(t) = −p₂(t) = p₃(t) = 1 / (1 + 1/γ − t).
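A quick check (my own sketch) that p₁(t) = 1/(1 + 1/γ − t) solves ṗ₁ = p₁² with p₁(1) = γ, by integrating the scalar ODE backwards in time with Euler steps:

```python
import numpy as np

gamma = 100.0
p1 = lambda t: 1.0 / (1.0 + 1.0 / gamma - t)   # claimed closed-form solution

dt = 1e-5
p = gamma                  # terminal condition p1(1) = gamma
t = 1.0
while t > 0.5:             # integrate backwards to t = 0.5
    p = p - dt * p ** 2    # dp/dt = p^2, stepping t downwards
    t -= dt
```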
30
Optimal policy:

    u(t) = −R⁻¹(BᵀP(t) + Sᵀ)[x(t); y(t)] = −[p₁(t) − 1   p₂(t)][x(t); 1] = −(p₁(t) − 1)x(t) + p₁(t),

using p₂(t) = −p₁(t), with p₁(t) = 1/(1 + 1/γ − t). Optimal path for x(0) = 0: the closed loop is

    ẋ(t) = −x(t) + u(t) = −p₁(t)(x(t) − 1),

whose solution is

    x(t) = (t − (1 + 1/γ)) / (1 + 1/γ) + 1 = t / (1 + 1/γ).

Letting the parameter of the artificial terminal cost converge to zero, ∆ → 0 (γ → ∞), we obtain

    x(t) = t,   u(t) = 1 + t.
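Simulating the closed loop for a large γ confirms the limit numerically (a sketch of mine using plain Euler integration, which happens to track the exact flow closely for this particular p₁(t)):

```python
import numpy as np

gamma = 1e4
c = 1.0 + 1.0 / gamma
p1 = lambda t: 1.0 / (c - t)

dt = 1e-4
ts = np.arange(0.0, 1.0, dt)
x = 0.0
xs, us = [], []
for t in ts:
    u = -(p1(t) - 1.0) * x + p1(t)   # optimal policy, with p2 = -p1
    xs.append(x); us.append(u)
    x += dt * (-x + u)               # closed loop x' = -x + u = -p1*(x - 1)
xs, us = np.array(xs), np.array(us)
```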
31
Remarks:
- In general, an analytical solution is very hard to find.
- For linear systems with quadratic costs, the HJB equation boils down to ordinary differential equations (Riccati equations).
- A numerical solution is viable when the dimension of the state-space is small, as in our example.
- The Pontryagin's maximum principle, discussed next, provides different conditions which can be applied to more cases.
- The advantages of having a policy are exactly the same as for stage decision problems.
32
Summary: After this lecture you should be able to:
- solve continuous-time optimal control problems with linear dynamics and quadratic cost (Riccati equations);
- solve such problems also when the horizon is infinite (LQR in continuous-time).