4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Outline
- Static optimization approach to PMP
- Linear systems, quadratic cost, terminal constraints
- Shooting method
- Recap: continuous-time optimal control (appendix)
Dynamic model
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
Cost function
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T))
The goal in this lecture is to find an optimal path (u(t), x(t)) using a new tool: Pontryagin's maximum principle (PMP).

We derive the PMP via the discretization approach using static optimization; the continuous-time approach (calculus of variations) is discussed in the appendix and in the next lecture.
(Diagram: CT control problem → [discretization, step τ] → stage decision problem → optimal path and policy → [taking the limit τ → 0] → optimal path and policy solve the optimal control problem.)
Dynamic model
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
Cost function
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T))
Discretization with step τ such that hτ = T, at discretization times t_k = kτ:
    x_{k+1} = x_k + τ f(x_k, u_k),  x_k = x(kτ),  u_k = u(kτ)
    Σ_{k=0}^{h−1} g(x_k, u_k)τ + g_h(x_h),  g_h(x) = g_T(x), ∀x
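The discretization above can be sketched numerically. A minimal Python sketch (an added illustration, not part of the slides), where `f`, `g`, `gT` stand for the problem data:

```python
# Forward-Euler discretization of the optimal control problem:
# x_{k+1} = x_k + tau*f(x_k, u_k), cost = sum_k g(x_k, u_k)*tau + gT(x_h).
def discretized_cost(f, g, gT, x0, u_seq, tau):
    x, cost = x0, 0.0
    for u in u_seq:
        cost += g(x, u) * tau     # running cost of the current stage
        x = x + tau * f(x, u)     # Euler step of the dynamics
    return cost + gT(x)           # terminal cost at x_h

# Example: scalar system x' = -x (control has no effect), cost int_0^1 x(t)^2 dt;
# the exact value is (1 - e^{-2})/2 ~ 0.4323.
h, tau = 1000, 0.001
cost = discretized_cost(lambda x, u: -x, lambda x, u: x**2, lambda x: 0.0,
                        1.0, [0.0] * h, tau)
```

As τ → 0 the discretized cost converges to the continuous-time cost, which is exactly the limit taken in the slides that follow.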
The Lagrangian is given by
    L(x, u, λ) = Σ_{k=0}^{h−1} g(x_k, u_k)τ + g_h(x_h) + Σ_{k=0}^{h−1} λ_{k+1}ᵀ(x_k + τ f(x_k, u_k) − x_{k+1})
where x = (x_1, x_2, …, x_h), u = (u_0, u_1, …, u_{h−1}), λ = (λ_1, …, λ_h), λ_i ∈ Rⁿ.
Then, the optimal solution (optimal path) must satisfy
    ∂L(x, u, λ)/∂x_k = 0, k ∈ {1, …, h}
    ∂L(x, u, λ)/∂u_k = 0, k ∈ {0, …, h−1}
    ∂L(x, u, λ)/∂λ_k = 0, k ∈ {1, …, h}
For the problem
    x_{k+1} = f_k(x_k, u_k),  Σ_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h):
Variables
    x_k = (x_{1,k}, x_{2,k}, …, x_{n,k}), x_{i,k} ∈ R
    u_k = (u_{1,k}, u_{2,k}, …, u_{m,k}), u_{i,k} ∈ R
    λ_k = (λ_{1,k}, λ_{2,k}, …, λ_{n,k}), λ_{i,k} ∈ R
Functions
    f_k : Rⁿ × Rᵐ → Rⁿ, f_k(x_k, u_k) = (f_{1,k}(x_k, u_k), …, f_{n,k}(x_k, u_k))
    g_k : Rⁿ × Rᵐ → R,  g_h : Rⁿ → R
Derivatives
    ∂g_h/∂x_h(x_h) = [∂g_h/∂x_{1,h}  ∂g_h/∂x_{2,h}  …  ∂g_h/∂x_{n,h}] (a row vector)
Derivatives
    ∂f_k/∂x_k(x_k, u_k) ∈ Rⁿˣⁿ, with entry (i, j) equal to ∂f_{i,k}/∂x_{j,k}(x_k, u_k)
    ∂f_k/∂u_k(x_k, u_k) ∈ Rⁿˣᵐ, with entry (i, j) equal to ∂f_{i,k}/∂u_{j,k}(x_k, u_k)
    ∂g_k/∂x_k(x_k, u_k) = [∂g_k/∂x_{1,k}  …  ∂g_k/∂x_{n,k}] ∈ R^{1×n}
    ∂g_k/∂u_k(x_k, u_k) = [∂g_k/∂u_{1,k}  …  ∂g_k/∂u_{m,k}] ∈ R^{1×m}
Writing out the stationarity conditions:
    ∂L(x, u, λ)/∂λ_k = 0, k ∈ {1, …, h}:
        x_k = x_{k−1} + τ f(x_{k−1}, u_{k−1}), i.e., (x_k − x_{k−1})/τ = f(x_{k−1}, u_{k−1})
    ∂L(x, u, λ)/∂u_k = 0, k ∈ {0, …, h−1}:
        (∂g/∂u_k)(x_k, u_k)τ + λ_{k+1}ᵀ(∂f/∂u_k)(x_k, u_k)τ = 0
    ∂L(x, u, λ)/∂x_k = 0, k ∈ {1, …, h−1}:
        (∂g/∂x_k)(x_k, u_k)τ + λ_{k+1}ᵀ(I + τ(∂f/∂x_k)(x_k, u_k)) − λ_kᵀ = 0
        ⟺ (∂g/∂x_k)(x_k, u_k) + λ_{k+1}ᵀ(∂f/∂x_k)(x_k, u_k) = −(λ_{k+1}ᵀ − λ_kᵀ)/τ
    ∂L(x, u, λ)/∂x_h = 0:
        (∂g_h/∂x_h)(x_h) − λ_hᵀ = 0
Let λ̄(t) = λ_k, t ∈ [kτ, (k+1)τ). Assuming that (wishful thinking…), as τ → 0, λ̄(t) converges to a continuously differentiable function, then
    (∂g/∂x_k)(x_k, u_k) + λ_{k+1}ᵀ(∂f/∂x_k)(x_k, u_k) = −(λ_{k+1}ᵀ − λ_kᵀ)/τ
becomes, as τ → 0,
    λ̄̇(t) = −(∂f/∂x)ᵀλ̄(t) − (∂g/∂x)ᵀ.
Moreover, naturally (x_k − x_{k−1})/τ = f(x_{k−1}, u_{k−1}) becomes ẋ(t) = f(x(t), u(t)) as τ → 0, and from
    (∂g/∂u_k)(x_k, u_k)τ + λ_{k+1}ᵀ(∂f/∂u_k)(x_k, u_k)τ = 0
we also have, as τ → 0,
    (∂g/∂u)(x(t), u(t)) + λ̄(t)ᵀ(∂f/∂u)(x(t), u(t)) = 0.
Finally, from (∂g_T/∂x_h)(x_h) − λ_hᵀ = 0, as τ → 0,
    λ̄(T) = (∂g_T/∂x(x(T)))ᵀ.
(no state or input constraints, free terminal state)
If (u*(t), x*(t)) is an optimal path for the continuous-time optimal control problem, then there exists a function λ(t), t ∈ [0, T], called the co-state, such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x̄0 (given)
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    λ(T) = (∂g_T/∂x(x*(T)))ᵀ (terminal constraint for the co-state)
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
The formal proof of this result is quite involved and uses arguments radically different from the intuitive arguments that we have used. Still, the discretization approach allows us to reason about the conditions appearing in the theorem.

Consider now the problem with a constrained terminal state:
    min_u ∫₀ᵀ g(x(t), u(t))dt
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
    x(T) = x̄_f
Following the discretization + static optimization approach, we obtain the same necessary equations for optimality except the condition (∂g_T/∂x_h)(x_h) − λ_hᵀ = 0, which disappears since the terminal state is now fixed. In fact, the next result holds.
(no state or input constraints, constrained terminal state)
If (u*(t), x*(t)) is an optimal path for the continuous-time optimal control problem with terminal constraint x(T) = x̄_f, then there exists a function λ(t), t ∈ [0, T], such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x̄0 (given), x(T) = x̄_f
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
Note that, contrary to the previous case, there is no constraint on the terminal value of the co-state.
Consider a problem similar to a linear quadratic regulation problem for a scalar system, but where the additive control input enters through a nonlinear function ℓ:
    ẋ(t) = a x(t) + ℓ(u(t))
    min ½(∫₀ᵀ (q x(t)² + r u(t)²)dt + g_T x(T)²)
PMP equations
    ẋ(t) = a x(t) + ℓ(u(t))
    λ̇(t) = −a λ(t) − q x(t)
    r u(t) + λ(t) dℓ/du(u(t)) = 0
Boundary conditions
    x(0) = 1, λ(T) = g_T x(T)
If q = 0, r = 1, g_T = 1, ℓ(u) = −log(u), T = 1, a = −1, the equations become
    ẋ(t) = −x(t) − log(u(t))
    λ̇(t) = λ(t), with boundary conditions x(0) = 1, λ(1) = x(1),
so λ(t) = e^{t−1} x(1), and the control equation
    u(t) − λ(t)/u(t) = 0 ⟹ u(t) = e^{(t−1)/2} √x(1)
(*only the positive root makes sense). Replacing,
    ẋ(t) = −x(t) − (t−1)/2 − ½ log(x(1)).
If we integrate the state equation from zero to T = 1 (variation of constants formula) we can obtain the value of x(1):
    x(1) = e^{−1} x(0) + ∫₀¹ e^{−(1−s)}(−s/2 + ½(1 − log(x(1))))ds
which, with x(0) = 1 and evaluating the integrals, gives the implicit equation
    x(1) = ½e^{−1} + ½(1 − log(x(1)))(1 − 1/e)
whose numerical solution is x(1) ≈ 0.6407. Replacing in the formulas above we get the optimal path.
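As a numerical check (an added sketch, not part of the slides), the implicit equation obtained from the variation of constants formula, x(1) = ½e⁻¹ + ½(1 − log x(1))(1 − 1/e), can be solved by fixed-point iteration, since the right-hand side is a contraction near the solution:

```python
import math

# Solve x1 = 0.5*exp(-1) + 0.5*(1 - log(x1))*(1 - 1/e) by fixed-point iteration.
def rhs(x1):
    return 0.5 * math.exp(-1) + 0.5 * (1 - math.log(x1)) * (1 - 1 / math.e)

x1 = 1.0                 # initial guess
for _ in range(100):     # contraction factor ~0.5, so this converges fast
    x1 = rhs(x1)
```

The iterate settles at x(1) ≈ 0.6407, and the residual |rhs(x1) − x1| is at machine precision.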
Consider the third equation of the PMP,
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0,
which is sometimes called the control equation. Note that these are in fact m equations and, at least implicitly, we can write u as a function of x, λ:
    u = h(x, λ).
Replacing in the state and co-state equations we obtain 2n differential equations in x, λ:
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ, t ≥ 0
with a known initial condition x(0) but unknown λ(0). The latter equation is also known as the adjoint equation.
The terminal state may also be only partially constrained, x_i(T) = x̄_i for i ∈ C, in which case the co-states λ_i, i ∈ C, are free and the terminal conditions of the PMP hold only for the remaining co-state variables:
    λ_j(T) = ∂g_T/∂x_j(x(T)), j ∉ C.
An example using this fact will be discussed later.
In special cases (e.g. linear dynamics and quadratic cost) it is possible to solve these equations as a function of λ(0) and then pick λ(0) to satisfy the terminal conditions
    x(T) = x̄_f or λ(T) = (∂g_T/∂x(x*(T)))ᵀ.
In general one must resort to a numerical method (e.g. the shooting method discussed later, which actually tries to guess λ(0) and see if the terminal conditions are met).
(u*(t), x*(t)) is an optimal path candidate if there exists λ(t), t ∈ [0, T], such that
    State eq.:   ẋ*(t) = f(x*(t), u*(t))
    Adjoint eq.: λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    Control eq.: (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
The boundary conditions depend on the constraints on the terminal state:
    x*(0) = x̄0, λ(T) = (∂g_T/∂x(x*(T)))ᵀ  (no terminal state constraints)
    x*(0) = x̄0, x*(T) = x̄  (terminal state fully constrained)
    x*(0) = x̄0, x*_i(T) = x̄_i for i ∈ C, λ_j(T) = ∂g_T/∂x_j(x*(T)) for j ∉ C  (only some components of the state are constrained)
If we define the Hamiltonian
    H(x, u, λ) = g(x, u) + λᵀf(x, u)
the conditions of the PMP take the following elegant form:
    ẋ*(t) = (∂H/∂λ(x*(t), u*(t), λ(t)))ᵀ
    λ̇(t) = −(∂H/∂x(x*(t), u*(t), λ(t)))ᵀ
    ∂H/∂u(x*(t), u*(t), λ(t)) = 0
Moreover,
    d/dt H(x*(t), u*(t), λ(t)) = (∂H/∂x + λ̇(t)ᵀ)f(x*(t), u*(t)) + ∂H/∂u du/dt = 0
and therefore the Hamiltonian remains constant along optimal paths!
The condition ∂H/∂u(x*(t), u*(t), λ(t)) = 0 states that the function
    u(t) → H(x*(t), u(t), λ(t))
has a stationary point as a function of the control input when we fix the state and the co-state of the optimal path. When this stationary point is a minimum, this is often called the Pontryagin's minimum principle. In the original formulation the sign conventions are such that the stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used. Since the latter designation is more common we will use it in the course.
Dynamic model
    ẋ(t) = A x(t) + B u(t), x(0) = x0
Cost function
    ½(x(T)ᵀQ_T x(T) + ∫₀ᵀ (x(t)ᵀQ x(t) + 2 x(t)ᵀS u(t) + u(t)ᵀR u(t))dt)
PMP necessary conditions for optimality
    ẋ(t) = A x(t) + B u(t), x(0) = x0
    λ̇(t) = −Aᵀλ(t) − (Q x(t) + S u(t)), λ(T) = Q_T x(T)
    Bᵀλ(t) + Sᵀx(t) + R u(t) = 0 ⟹ u(t) = −R⁻¹(Bᵀλ(t) + Sᵀx(t))
Given a continuous-time optimal control problem with linear dynamic model and quadratic cost, replacing u(t) = −R⁻¹(Bᵀλ(t) + Sᵀx(t)) in the state and adjoint equations yields
    [ẋ(t); λ̇(t)] = H [x(t); λ(t)],
    H = [A − BR⁻¹Sᵀ, −BR⁻¹Bᵀ; −(Q − SR⁻¹Sᵀ), −(Aᵀ − SR⁻¹Bᵀ)]
so that [x(T); λ(T)] = e^{HT} [x(0); λ(0)], with x(0) known and one of the following:
    (i) x(T) known
    (ii) λ(T) = Q_T x(T)
    (iii) x_i(T), i ∈ C, known and λ_j(T) = (Q_T x(T))_j, j ∉ C
This leads to a linear system with 2n equations and 2n unknowns, which allows us to obtain λ(0).
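As an illustration of this linear-system solve (an added sketch, not from the slides), take the minimum-energy double integrator ẋ₁ = x₂, ẋ₂ = u with Q = 0, S = 0, R = 1, cost ½∫₀¹u²dt, x(0) = (0, 0) and x(T) = (1, 0). Here the Hamiltonian matrix is nilpotent, so e^{HT} is an exact finite series, and solving H₁₂λ(0) = x(T) is a 2×2 linear system:

```python
# Double integrator: A = [[0,1],[0,0]], B = [0,1], Q = 0, R = 1, T = 1.
# Hamiltonian matrix M = [[A, -B*B^T],[-Q, -A^T]] is nilpotent (M^4 = 0),
# so expm(M*T) = I + M*T + M^2*T^2/2 + M^3*T^3/6 exactly.
T = 1.0
M = [[0, 1,  0,  0],
     [0, 0,  0, -1],
     [0, 0,  0,  0],
     [0, 0, -1,  0]]
I4 = [[float(i == j) for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

Phi, term, coef = I4, I4, 1.0
for p in range(1, 4):                       # finite matrix-exponential series
    term = matmul(term, M)
    coef *= T / p
    Phi = [[Phi[i][j] + coef * term[i][j] for j in range(4)] for i in range(4)]

# x(T) = Phi11*x(0) + Phi12*lambda(0); x(0) = 0, so solve Phi12*lam0 = xf
a, b = Phi[0][2], Phi[0][3]
c, d = Phi[1][2], Phi[1][3]
xf = (1.0, 0.0)
det = a * d - b * c
lam0 = ((d * xf[0] - b * xf[1]) / det, (-c * xf[0] + a * xf[1]) / det)
u0 = -lam0[1]    # u(t) = -B^T*lambda(t), so u(0) = -lambda_2(0)
```

Here λ₂(t) = λ₂(0) − λ₁(0)t is linear, so u(t) = −λ₂(t) recovers the well-known minimum-energy control u(t) = 6 − 12t for this rest-to-rest transfer.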
How to charge the capacitor in an RC circuit with minimum energy loss in the resistor? With u the source voltage, x the capacitor voltage, and i the current through the resistor R,
    ẋ(t) = (1/RC)(u(t) − x(t))
    min_{u(t)} ∫₀ᵀ (x(t) − u(t))²/R dt
    x(T) = x_desired
Let us consider R = C = T = x_desired = 1 and x(0) = 0.
Hamiltonian
    H(x, u, λ) = (x − u)² + λ(−x + u)
PMP equations
    ∂H/∂u(x, u, λ) = 0: −2(x − u) + λ = 0 ⟹ u = x − λ/2
    λ̇ = −∂H/∂x = −2(x − u) + λ = 0 (by the control equation), so λ(t) = λ(0)
    ẋ = −x + u = −λ(0)/2 ⟹ x(t) = −(λ(0)/2)t + x(0), with x(0) = 0
Impose boundary conditions
    1 = x(1) = −λ(0)/2 ⟹ λ(0) = −2
Optimal solution (as derived in the previous lecture)
    u(t) = 1 + t, x(t) = t
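The optimal pair can be verified numerically; a quick Python sketch (an added check, not part of the slides), integrating ẋ = −x + u with the candidate control u(t) = 1 + t by forward Euler:

```python
# Integrate x' = -x + u(t) with the candidate optimal control u(t) = 1 + t;
# the PMP solution predicts x(t) = t, hence x(1) = 1.
steps, T = 10000, 1.0
tau = T / steps
x, t = 0.0, 0.0
for _ in range(steps):
    x += tau * (-x + (1.0 + t))   # Euler step with u(t) = 1 + t
    t += tau
```

Since the exact trajectory x(t) = t is linear, the Euler iterates track it essentially exactly and x reaches 1 at t = 1.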
How to move a crane from rest at point A to rest at point B in a fixed amount of time T with minimum energy?
    (x, θ, ẋ, θ̇)(0) = (x_A, 0, 0, 0), (x, θ, ẋ, θ̇)(T) = (x_B, 0, 0, 0)
    min ∫₀ᵀ u(t)²dt
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 (m*l*b)/p -m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];  % last entry truncated in the source; m*l/p assumed (standard cart-pendulum model)
Qc = zeros(4,4); Rc = 1; n = 4; T = 1;
x0 = [0 0 0 0]'; xf = [1 0 0 0]';
% 1) Hamiltonian
H = expm([Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*T);
H11 = H(1:n,1:n); H12 = H(1:n,n+1:2*n);
H21 = H(n+1:2*n,1:n); H22 = H(n+1:2*n,n+1:2*n);
% 2) obtain lambda0 from xf = H11*x0 + H12*lambda0
lambda0 = H12\(xf-H11*x0);
% 3) obtain x, lambda, u at times k*tau
tau = 0.01; N = round(T/tau);
x = zeros(4,N+1); lambda = zeros(4,N+1); u = zeros(1,N+1);
for k = 1:N+1
    XL = expm((k-1)*[Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*tau)*[x0; lambda0];
    x(:,k) = XL(1:4); lambda(:,k) = XL(5:8);
    u(:,k) = -inv(Rc)*Bc'*lambda(:,k);
end
plot((0:N)*tau,u), xlabel('t'), ylabel('u'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(3,:)), xlabel('t'), ylabel('\theta'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(1,:)), xlabel('t'), ylabel('x'), grid on, set(gca,'Fontsize',16)
xA = 0, xB = 1, T = 1
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 1], produced by the code above.]
xA = 0, xB = 1, T = 2
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 2], produced by the code above.]
xA = 0, xB = 1, T = 10
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 10], produced by the code above.]
The PMP yields 2n differential equations
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ, t ≥ 0
with a known initial condition x(0) but unknown λ(0), and λ(0) must then be obtained by imposing the terminal boundary conditions
    x(T) = x̄_f or λ(T) = (∂g_T/∂x(x*(T)))ᵀ.
The class of problems for which we can do this analytically by solving the differential equations is small, and we in general need numerical methods. The shooting method is one of the simplest such methods and will allow us to solve some problems where n is small.
[14] John Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, 2010
Main idea. Problem: find λ(0) such that x(T) = x̄_f (or another boundary condition) is met after integrating
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ
with initial conditions x(0) and λ(0). Integrate the equations for several guesses λ⁽¹⁾(0), λ⁽²⁾(0), λ⁽³⁾(0), … of the initial co-state ("shots"), and from these shots pick the one that satisfies x(T) = x̄_f.
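A minimal Python sketch of this idea (added here, not part of the slides), applied to the RC-circuit example above, where the adjoint equation gives λ̇ = 0 and the control equation gives u = x − λ/2: guess λ(0), integrate, and bisect on the terminal miss x(T) − x̄_f:

```python
# Shooting for the RC example: x' = -x + u, u = x - lam/2, lam' = 0,
# boundary conditions x(0) = 0, x(T) = 1. Guess lam(0) and bisect.
def x_at_T(lam0, T=1.0, steps=1000):
    x, lam = 0.0, lam0
    tau = T / steps
    for _ in range(steps):
        u = x - lam / 2        # control equation u = h(x, lam)
        x += tau * (-x + u)    # state equation; adjoint: lam' = 0
    return x

def shoot(target=1.0, lo=-10.0, hi=10.0):
    # bisection on the miss distance x(T) - target
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (x_at_T(lo) - target) * (x_at_T(mid) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam0 = shoot()
```

The bisection converges to λ(0) = −2, the value obtained analytically for this problem.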
Example (solved next; a different solution for a similar problem is available in Bryson & Ho's book, Sec. 2.4). Consider a particle of mass m acted upon by a thrust force of magnitude m, described by the following equations, where the angle β(t) determines the thrust direction:
    u̇ = cos(β), v̇ = sin(β), ẋ = u, ẏ = v.
Suppose that the initial position of the particle at time t = 0 is (y(0), x(0)) = (0, 0) and the initial velocity is zero. We wish to transfer the particle to a path parallel to the x-axis, a distance h away, in a given time T, arriving with the maximum value of x(T). We do not care what the final velocity u(T) is. Suppose that h = 1 and T = 3.
Notation for PMP
    state x = (x, y, u, v), control u = β, ẋ = f(x, u), f(x, u) = (u, v, cos(β), sin(β))
Optimal control problem
    min −x(T), i.e., g_T(x(T)) = −x(T), g(x, u) = 0
Hamiltonian
    H(x, u, λ) = λ_x u + λ_y v + λ_u cos(β) + λ_v sin(β)
Co-state
    λ = (λ_x, λ_y, λ_u, λ_v)ᵀ
Terminal conditions (only two terminal states are specified - two additional constraints on the terminal co-states associated with the unrestricted states)
    y(T) = h, v(T) = 0
    λ_x(T) = ∂g_T/∂x(x(T)) = −1, λ_u(T) = ∂g_T/∂u(x(T)) = 0
Note that the control equation
    ∂H/∂u(x, u, λ) = 0: −λ_u sin(β) + λ_v cos(β) = 0 ⟹ β = arctan(λ_v/λ_u)
and the co-state equation λ̇(t) = −[∂H/∂x]ᵀ = −(∂H/∂x, ∂H/∂y, ∂H/∂u, ∂H/∂v)ᵀ give
    (λ̇_x(t), λ̇_y(t), λ̇_u(t), λ̇_v(t)) = (0, 0, −λ_x(t), −λ_y(t)).
Using cos(arctan(s)) = 1/√(1 + s²) and sin(arctan(s)) = s/√(1 + s²), the state equation becomes
    (ẋ(t), ẏ(t), u̇(t), v̇(t)) = (u(t), v(t), cos(arctan(λ_v(t)/λ_u(t))), sin(arctan(λ_v(t)/λ_u(t))))
        = (u(t), v(t), 1/√(1 + (λ_v(t)/λ_u(t))²), (λ_v(t)/λ_u(t))/√(1 + (λ_v(t)/λ_u(t))²)).
A direct application of the shooting method would lead us to search in a four-dimensional space (initial condition of the co-state). However, we already know that:
    λ̇_x(t) = 0 and λ_x(T) = −1, so λ_x(0) = −1;
    λ̇_u(t) = −λ_x(t) = 1 and λ_u(3) = 0, so λ_u(t) = t − 3 and λ_u(0) = −3;
    λ̇_y(t) = 0, so λ_y(t) = λ_y(0), and λ̇_v(t) = −λ_y(t), so λ_v(t) = λ_v(0) − λ_y(0)t.
Therefore we just need to search for λ_y(0), λ_v(0) to satisfy y(T) = h, v(T) = 0 after integrating
    v̇(t) = s(t)/√(1 + s(t)²), with s(t) = (λ_v(0) − λ_y(0)t)/(t − 3)
    ẏ(t) = v(t).
The shooting method thus boils down to finding λ_y(0), λ_v(0) such that v(T) = 0 and y(T) = 1 after integrating
    v̇(t) = s(t)/√(1 + s(t)²), with s(t) = (λ_v(0) − λ_y(0)t)/(t − 3)
    ẏ(t) = v(t)
from 0 to T = 3. One option is to grid the search space (λ_y(0), λ_v(0)).
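A rough Python sketch of the grid search (an added illustration, not part of the slides). It evaluates the stationarity condition on the branch sin β = λ_v/√(λ_u² + λ_v²), cos β = λ_u/√(λ_u² + λ_v²), which is the branch consistent with the trajectories shown in the slides, and scores each guess by its terminal miss:

```python
import math

# Integrate v' = lv/sqrt(lu^2 + lv^2), y' = v from t = 0 to T = 3,
# with lu(t) = t - 3 and lv(t) = lv0 - ly0*t, and return the terminal miss
# |y(3) - 1| + |v(3)| for a guess (ly0, lv0) of the unknown initial co-states.
def miss(ly0, lv0, T=3.0, steps=3000):
    tau = T / steps
    y = v = t = 0.0
    for _ in range(steps):
        lu = t - 3.0
        lv = lv0 - ly0 * t
        v += tau * lv / math.sqrt(lu * lu + lv * lv)
        y += tau * v
        t += tau
    return abs(y - 1.0) + abs(v)

# Coarse grid over the search space (ly0, lv0); pick the best shot.
grid = [(i / 10, j / 10) for i in range(0, 21) for j in range(0, 21)]
best = min(grid, key=lambda p: miss(*p))
```

A coarse grid like this localizes the solution, which can then be refined with a finer grid or a root-finding method.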
Wrong values of λ_y(0), λ_v(0) (terminal conditions not met): λ_y(0) = 1, λ_v(0) = 1.
[Figure: plots of y(t) and v(t) for t ∈ [0, 3].]
Right values of λ_y(0), λ_v(0) (terminal conditions met): λ_y(0) = 0.62, λ_v(0) = 1.22.
[Figure: plots of y(t) and v(t) for t ∈ [0, 3].]
Summary. After this lecture you should be able to:
- derive the PMP from static optimization via discretization;
- apply the PMP to problems with free, fully constrained, or partially constrained terminal states;
- solve linear-quadratic problems with terminal constraints;
- apply the shooting method.
Appendix
In the lecture we followed the discretization approach to the PMP. That approach was not mathematically sound. Here we follow the continuous-time approach (calculus of variations). This approach is mathematically sound, although here we only illustrate it and do not go into depth.
(Diagram: a CT control problem can be tackled directly, yielding the CT PMP and CT DP, or discretized with step τ into a stage decision problem, yielding the DT PMP and DT DP; taking the limit τ → 0 recovers the optimal path and policy of the CT control problem.)
Control system
    ẋ(t) = f(x(t), u(t)), x(0) = x0, with x(t) ∈ Rⁿ and u ∈ U ⊂ Rᵐ for all t ≥ 0
Cost functional to be minimized
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T)), with x(T) free
Assumptions
    f, g, ∂f/∂x, ∂g/∂x are continuous;
    f is Lipschitz in x: ∃M such that |f(x1, u) − f(x2, u)| ≤ M|x1 − x2| for all (x1, u), (x2, u) ∈ D ⊂ Rⁿ × U.
Why is this problem hard? We are minimizing over functions, and defining a local minimum requires comparing one function with another, which is not easy in infinite-dimensional spaces (it depends on the norm!):
    u* ∈ V is a local minimum of f : V → R if there exists ε > 0 such that for all u ∈ V such that ‖u − u*‖ < ε we have f(u*) ≤ f(u)
(before, V ≡ Rⁿ; now V is a space of functions on [a, b]). For example, two functions u, u* can be close in the 2-norm
    ‖u* − u‖₂ = √(∫ₐᵇ |u(t) − u*(t)|²dt)
but not close in the 0-norm (sup-norm)
    ‖u* − u‖₀ = max_{t∈[a,b]} |u(t) − u*(t)|.
*Think about a piecewise constant approximation with increasingly many bins, or of the Fourier series approximation, which requires an infinite number of parameters.
δJ|y : V → R is the first variation of J at y if for all η and all α we have
    J(y + αη) = J(y) + δJ|y(η)α + o(α).
(A stronger notion requires J(y + η) = J(y) + δJ|y(η) + o(‖η‖).)
[Figure: sketch on [a, b] of y, y + η, and the first-order approximation J(y) + δJ|y(η) of J(y + η).]
All we need to know is that if y is a (local) minimum then δJ|y(η) = 0, where η is determined by the constraints in the problem (it is a free function in [a, b] if there are no constraints). For costs of the form J(y) = ∫ₐᵇ L(y(t))dt,
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(y(t)) η(t) dt.
Calculus of variations is a branch of mathematics addressing the following problem: among all differentiable functions y : [a, b] → R satisfying y(a) = y0, y(b) = y1, find minima of
    J(y) := ∫ₐᵇ L(x, y(x), y′(x))dx.
For calculus of variations we will consider only functions taking values in R, although we could also consider y : [a, b] → Rⁿ.
Example (brachistochrone): find a path between two points in a vertical plane such that a particle sliding without friction along this path takes the shortest possible time to travel from one point to the other. The initial kinetic and potential energy is zero, so conservation of energy gives
    m v²/2 = m g y, with g = 9.8,
determining the velocity as a function of y. The total time is then the integral of the arc-length over the velocity:
    ∫ₐᵇ √(1 + (y′(x))²) / √(2 g y(x)) dx.
Calculus of variations: among all differentiable functions y : [a, b] → R satisfying y(a) = y0, y(b) = y1, find minima of
    J(y) := ∫ₐᵇ L(x, y(x), y′(x))dx.
Optimal control formulation: let u = y′, t = x, x = y. Find a continuous function u that solves the optimal control problem
    ẏ(t) = u(t), J(u) = ∫ₐᵇ L(t, y(t), u(t))dt.
Remarks: L is continuous and twice continuously differentiable with respect to u, y. These smoothness assumptions will be used first for calculus of variations and later for the more general optimal control problem.
Consider the class of perturbed trajectories y(t) + αη(t) such that η(a) = η(b) = 0. The first variation can be concluded by expanding the cost as a first-order Taylor series w.r.t. α (with u(x) = y′(x)):
    J(y + αη) = ∫ₐᵇ L(x, y(x) + αη(x), y′(x) + αη′(x))dx
              = ∫ₐᵇ L(x, y(x), y′(x)) + (∂L/∂y)(x, y(x), y′(x))αη(x) + (∂L/∂u)(x, y(x), y′(x))αη′(x) dx
from which we conclude that the first variation is
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(x, y(x), y′(x))η(x) + (∂L/∂u)(x, y(x), y′(x))η′(x) dx.
Integrating the last term in the integral by parts (with u(x) = y′(x)):
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(x, y(x), y′(x))η(x) − (d/dx (∂L/∂u)(x, y(x), y′(x)))η(x) dx + [(∂L/∂u)(x, y(x), y′(x))η(x)]ₐᵇ.
The last term is zero due to η(a) = η(b) = 0. Setting the first variation to zero we get
    ∫ₐᵇ ((∂L/∂y)(x, y(x), y′(x)) − d/dx (∂L/∂u)(x, y(x), y′(x)))η(x)dx = 0,
and for this equation to hold for every disturbance η we must have
    (∂L/∂y)(x, y(x), y′(x)) = d/dx (∂L/∂u)(x, y(x), y′(x))   (Euler–Lagrange equation)
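As a quick sanity check of the Euler–Lagrange equation (an added example, not in the slides), take the arc-length functional, whose minimizers should be straight lines:

```latex
% Shortest path: L(x, y, u) = \sqrt{1 + u^2}, with u = y'.
% Euler--Lagrange: \partial L/\partial y = \frac{d}{dx}\,\partial L/\partial u.
\[
\frac{\partial L}{\partial y} = 0
\qquad\Longrightarrow\qquad
\frac{d}{dx}\,\frac{\partial L}{\partial u}
  = \frac{d}{dx}\,\frac{y'(x)}{\sqrt{1+(y'(x))^2}} = 0 ,
\]
\[
\text{so } \frac{y'}{\sqrt{1+(y')^2}} \text{ is constant, hence } y' \text{ is constant:}
\qquad y(x) = y_0 + \frac{y_1 - y_0}{b - a}\,(x - a).
\]
```

The boundary conditions y(a) = y0, y(b) = y1 then fix the line uniquely.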
For calculus of variations u(x) = y′(x), so a perturbation δ(x) of u(x) implies the perturbation ∫ₐˣ δ(s)ds of y(x). For the optimal control problem
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T)), ẋ(t) = f(x(t), u(t)), x(0) = x0
the perturbations x = x* + αη, u = u* + αξ not only must be given for multi-dimensional functions η, ξ, but must be compatible with the model (see Liberzon's book, page 91), and therefore we need to introduce the co-state ("Lagrange multipliers"). The derivation is given next, using similar arguments to the ones used for calculus of variations.
Let us rewrite the cost as
    J(u) = ∫₀ᵀ (g(x(t), u(t)) + λ(t)ᵀ(f(x(t), u(t)) − ẋ(t)))dt + g_T(x(T))
         = ∫₀ᵀ (H(x(t), u(t), λ(t)) − λ(t)ᵀẋ(t))dt + g_T(x(T))
where λ(t) ∈ Rⁿ is the co-state and
    H(x(t), u(t), λ(t)) = g(x(t), u(t)) + λ(t)ᵀf(x(t), u(t))
is an important function called the Hamiltonian. For x = x* + αη, u = u* + αξ, the first variation (using again integration by parts) is (see Liberzon's book)
    δJ|u*(ξ) = ∫₀ᵀ (ηᵀ(λ̇ + [∂H/∂x(x*, u*, λ)]ᵀ) + ξᵀ[∂H/∂u(x*, u*, λ)]ᵀ)dt + η(T)ᵀ([∂g_T/∂x(x*(T))]ᵀ − λ(T)).
Setting the first variation to zero, we conclude that it can only be zero if
    λ̇ = −(∂H/∂x(x*, u*, λ))ᵀ, λ(T) = [∂g_T/∂x(x*(T))]ᵀ
    ∂H/∂u(x*, u*, λ) = 0
and the dynamic equation can be written as
    ẋ* = (∂H/∂λ(x*, u*, λ))ᵀ
where H(x(t), u(t), λ(t)) = g(x(t), u(t)) + λ(t)ᵀf(x(t), u(t)). This is the PMP that we saw before!
Let (u*(t), x*(t)) be an optimal path for the continuous-time optimal control problem. Then there exists a function λ(t), t ∈ [0, T], called the co-state, such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x0 (given)
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ, λ(T) = (∂g_T/∂x(x*(T)))ᵀ
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
Moreover,
    d/dt H(x*(t), u*(t), λ(t)) = (∂H/∂x + λ̇(t)ᵀ)f(x*(t), u*(t)) + ∂H/∂u du/dt = 0
and therefore the Hamiltonian remains constant along optimal paths. When the stationary point of u(t) → H(x*(t), u(t), λ(t)) defined by ∂H/∂u(x*(t), u*(t), λ(t)) = 0 is a minimum, this is often called the Pontryagin's minimum principle; in the original formulation the stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used. The full version of the PMP allows for more general terminal conditions (the time is not necessarily fixed, and the terminal state can be fixed) and holds under milder conditions (e.g. the actuation may be discontinuous).