Optimal Control
4SC000 Q2 2017-2018 Duarte Antunes
Recap: continuous-time optimal control problems

Dynamic model: ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]

Cost: ∫_0^T g(x(t), u(t)) dt + gT(x(T))
Dynamic model: ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]

Cost: ∫_0^T g(x(t), u(t)) dt + gT(x(T))

Terminal conditions: either a terminal cost gT(x(T)) or a terminal state constraint x(T) = x̄
Necessary conditions: (u*(t), x*(t)) is an optimal path candidate if there exists λ(t), t ∈ [0, T], such that

State eq.: ẋ*(t) = f(x*(t), u*(t))
Adjoint eq.: λ̇(t) = −(∂f/∂x (x*(t), u*(t)))ᵀ λ(t) − (∂g/∂x (x*(t), u*(t)))ᵀ
Control eq.: (∂f/∂u (x*(t), u*(t)))ᵀ λ(t) + (∂g/∂u (x*(t), u*(t)))ᵀ = 0

Boundary conditions:
- no terminal state constraints: x*(0) = x̄0 and λ(T) = (∂gT/∂x (x*(T)))ᵀ
- terminal constraints: x*(0) = x̄0 and x*(T) = x̄
- partially constrained: xi(T) = x̄i for i ∈ C
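As a small self-contained illustration of these conditions (my own example, not from the slides): take ẋ = u, g(x, u) = (x² + u²)/2, T = 1, x(0) = 1, free terminal state with gT = 0, so λ(1) = 0. The conditions give u = −λ and λ̇ = −x, a two-point boundary value problem that can be solved by shooting on λ(0); the analytic answer is λ(0) = tanh(1).

```python
import math

def shoot(lam0, n=1000):
    """Integrate the optimality system x' = -lam, lam' = -x
    (state eq. with u = -lam, adjoint eq.) from x(0) = 1; return lam(1)."""
    h = 1.0 / n
    x, lam = 1.0, lam0
    for _ in range(n):
        # classical RK4 step for the coupled linear system
        k1x, k1l = -lam, -x
        k2x, k2l = -(lam + h / 2 * k1l), -(x + h / 2 * k1x)
        k3x, k3l = -(lam + h / 2 * k2l), -(x + h / 2 * k2x)
        k4x, k4l = -(lam + h * k3l), -(x + h * k3x)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        lam += h / 6 * (k1l + 2 * k2l + 2 * k3l + k4l)
    return lam

# Shooting: find lam0 such that lam(1) = 0 (secant method).
a, b = 0.0, 1.0
fa, fb = shoot(a), shoot(b)
for _ in range(20):
    if fb == fa:
        break
    c = b - fb * (b - a) / (fb - fa)
    a, fa = b, fb
    b, fb = c, shoot(c)
    if abs(fb) < 1e-12:
        break
print(b)  # ≈ tanh(1) ≈ 0.76159
```

Since this optimality system is linear, the shooting map is affine in λ(0) and the secant method converges in one step; for nonlinear f the same scheme applies with more iterations.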
The goals for today are to derive the continuous-time Pontryagin maximum principle (CT PMP) and to handle free terminal time problems; this requires a different mathematical framework in continuous time.

[Diagram: a continuous-time control problem is discretized with step τ into a stage decision problem, to which the discrete-time PMP and DP apply, yielding the optimal path and policy; taking the limit τ → 0 recovers the CT PMP and CT DP for the original CT control problem.]
Control system: ẋ(t) = f(x(t), u(t)), x0 = x(t0), with x(t) ∈ Rⁿ and u ∈ U ⊂ Rᵐ for all t

Cost functional to be minimized: J(u) = ∫_0^T g(x(t), u(t)) dt, x(0) = x0

Assumptions: f, g, ∂f/∂x and ∂g/∂x are continuous, and f is Lipschitz in x: there exists L such that |f(x1, u) − f(x2, u)| ≤ L|x1 − x2| for all (x1, u), (x2, u) ∈ D, where D ⊂ Rⁿ × U. We take t0 = 0, and write T for the terminal time and xf := x(T) for the terminal state.
The terminal time T and terminal state xf can belong to some set. Defining a target set S ⊂ [0, ∞) × Rⁿ and letting T be the smallest time such that (T, xf) ∈ S, the cases of interest are:
- fixed time, fixed endpoint: S = {T} × {xf}
- fixed time, free endpoint: S = {T} × Rⁿ
- fixed time, endpoint in a set S1 ⊂ Rⁿ: S = {T} × S1
- free time, free endpoint: S = [0, ∞) × Rⁿ
- free time, fixed endpoint: S = [0, ∞) × {x1}
- free time, endpoint in a set: S = [0, ∞) × S1
Here S1 is a surface, S1 = {x ∈ Rⁿ : h1(x) = · · · = h_{n−k}(x) = 0}, where hi : Rⁿ → R.

The framework below handles free-time, fixed terminal state problems (S = [0, ∞) × {x1}), but also fixed-time problems, by considering time as a state, x_{n+1} = t:

d/dt [x(t); x_{n+1}(t)] = [f(x(t), u(t)); 1], [x(0); x_{n+1}(0)] = [x0; 0]

Then, for this new problem, a fixed terminal time t1 corresponds to S = [0, ∞) × S1 × {t1} or S = [0, ∞) × {x1} × {t1}.
If the cost includes a terminal term, J(u) = ∫_0^T g(x(t), u(t)) dt + gT(xf), note that

gT(xf) = gT(x(0)) + ∫_0^T (d/dt) gT(x(t)) dt

where

(d/dt) gT(x(t)) = (∂gT/∂x)(x(t)) ẋ(t) = (∂gT/∂x)(x(t)) f(x(t), u(t))

and so

J(u) = ∫_0^T ḡ(x(t), u(t)) dt + constant, with ḡ(x, u) = g(x, u) + (∂gT/∂x)(x) f(x, u)

Hence there is no loss of generality in our problem formulation (running cost only, with ẋ(t) = f(x(t), u(t)), x(0) = x0).
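This identity can be checked numerically. A minimal sketch (the system, costs and input signal are my choices): f(x, u) = u, g(x, u) = u², gT(x) = x², x(0) = 1, u(t) = cos t; both ways of writing the cost give the same value.

```python
import math

# x' = u, x(0) = 1; running cost g = u^2, terminal cost gT(x) = x^2.
# Absorbed running cost: gbar(x, u) = g + (dgT/dx) f = u^2 + 2*x*u.
n = 100000
h = 1.0 / n
x, J1, J2 = 1.0, 0.0, 0.0
for i in range(n):
    u = math.cos(i * h)
    J1 += u * u * h                    # integral of g
    J2 += (u * u + 2.0 * x * u) * h    # integral of gbar
    x += u * h                         # Euler step of the dynamics
J1 += x * x     # + gT(x(T))
J2 += 1.0       # + gT(x(0)), the constant
print(J1, J2)   # the two values agree up to discretization error
```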
Consider an optimal control problem with target set S = [0, ∞) × S1. Let u* : [0, T] → U be an optimal control (in the global sense) and let x* : [0, T] → Rⁿ be the corresponding optimal state trajectory. Then there exist a function p* : [0, T] → Rⁿ and a constant p0* ≤ 0 satisfying (p0*, p*(t)) ≠ (0, 0) for all t ∈ [0, T] and having the following properties:

1) x* and p* satisfy

ẋ* = ∂H/∂p (x*, u*, p*, p0*)
ṗ* = −∂H/∂x (x*, u*, p*, p0*)

with boundary conditions x*(0) = x0 and x*(T) ∈ S1, where H : Rⁿ × U × Rⁿ × R → R is the Hamiltonian, defined as H(x, u, p, p0) = pᵀf(x, u) + p0 g(x, u).

2) For each fixed t, the function u → H(x*(t), u, p*(t), p0*) has a global maximum at u = u*(t), i.e.,

H(x*(t), u*(t), p*(t), p0*) ≥ H(x*(t), u, p*(t), p0*)

holds for all t ∈ [0, T] and all u ∈ U.

3) H(x*(t), u*(t), p*(t), p0*) = 0 for all t ∈ [0, T].

4) The vector p*(T) is orthogonal to the tangent space to S1 at x*(T), i.e., ⟨p*(T), d⟩ = 0 for all d such that ⟨d, (∂hi/∂x (x*(T)))ᵀ⟩ = 0 for all i.
Remarks:
- A path corresponding to a local maximum or a local minimum (but not a global optimum) cannot in general satisfy these conditions; thus these are necessary conditions for global optimality.
- The running cost is multiplied by the constant p0* ≤ 0; typically p0* = −1, which is just a convention, except in degenerate cases which we will not address (in which case p0* = 0).
- With p0* = −1, the Hamiltonian H(x, u, p, −1) = pᵀf(x, u) − g(x, u) is obtained by multiplying by −1 our previous definition: with p(t) = −λ(t), p : [0, T] → Rⁿ, we have Ĥ(x, u, λ) = λᵀf(x, u) + g(x, u) = −pᵀf(x, u) + g(x, u) = −H(x, u, p, −1).
Condition 1 coincides with the equations for the state and the co-state that we have obtained before. In our previous notation:

ẋ* = ∂H/∂p (x*, u*, p*, p0*) ↔ ẋ* = f(x*, u*)  (as in previous lectures)
ṗ* = −∂H/∂x (x*, u*, p*, p0*) ↔ λ̇* = −(∂f/∂x (x*, u*))ᵀ λ* − (∂g/∂x (x*, u*))ᵀ  (as in previous lectures)

Condition 2 requires u → H(x*(t), u, p*(t), p0*) to be maximized; in our previous notation this would correspond to a minimum. If there are no input constraints and f, g are differentiable with respect to u (our previous setting), then we must have

∂H/∂u (x*, u, p*, p0*)|_{u=u*} = 0, i.e., (∂f/∂u (x*, u*))ᵀ λ* + (∂g/∂u (x*, u*))ᵀ = 0  (as in previous lectures)

Thus condition 2 is more general and shall allow us to handle input constraints.
Condition 3 is more general and shall allow us to handle free-time problems, for which the Hamiltonian must be zero. Considering time as a state, x_{n+1} = t,

d/dt [x(t); x_{n+1}(t)] = [f(x(t), u(t)); 1], [x(0); x_{n+1}(0)] = [x0; 0]

with target set S = [0, ∞) × {x1} × {t1}, the Hamiltonian of the new problem, denoted by H̄, is given by

H̄ = [pᵀ p_{n+1}] [f; 1] + p0 g = H + p_{n+1}

Since ṗ_{n+1}(t) = −∂H̄/∂x_{n+1} = 0, p_{n+1}(t) is constant, and H̄ = 0 implies that H = −p_{n+1} is constant along the optimal trajectory (where either the terminal time is free, in which case H = 0, or the terminal time is fixed, in which case H is a possibly nonzero constant).
Condition 4, with

S1 = {x ∈ Rⁿ : h1(x) = · · · = h_{n−k}(x) = 0},  hi : Rⁿ → R

requires ⟨p*(T), d⟩ = 0 for all d such that ⟨d, (∂hi/∂x (x*(T)))ᵀ⟩ = 0 for all i. For a fixed terminal state, h(x(T)) = x(T) − xf = 0 ⇒ d = 0 ⇒ p*(T) free (as in previous lectures); for a free terminal state, every d is tangent, so p*(T) = 0.
Alternatively, for a free terminal state with a terminal cost, recall that

J(u) = ∫_0^T g(x(t), u(t)) dt + gT(xf) = ∫_0^T ḡ(x(t), u(t)) dt + constant,  ḡ(x, u) = g(x, u) + (∂gT/∂x)(x) f(x, u)

The Hamiltonian of this modified problem (with p0 = −1) is

H̄(x, p̄, u) = p̄ᵀf(x, u) − (g(x, u) + (∂gT/∂x)(x) f(x, u))

and condition 4 for a free terminal state gives p̄(T) = 0, which translates into

p(T) = −(∂gT/∂x (x(T)))ᵀ  (as in previous lectures)

where p = p̄ − (∂gT/∂x (x))ᵀ can be shown to satisfy condition 1 of the theorem with H(x, u, p, −1) = pᵀf(x, u) − g(x, u). Thus condition 4 is more general.
Pontryagin's maximum principle was developed by Pontryagin's school in the Soviet Union in the late 1950s. It was presented to the wider research community at the first IFAC World Congress in Moscow in 1960.
Lev Pontryagin (1908-1988) was a Soviet mathematician. He was born in Moscow and lost his eyesight due to a stove explosion when he was 14. Despite his blindness he was able to become one of the greatest mathematicians of the 20th century, partially with the help of his mother Tatyana Andreevna, who read mathematical books and papers to him (source: Wikipedia).
Principle of least action: the motion of a conservative system, from time 0 to time T, is such that the integral

I = ∫_0^T L(u, q) dt

has a stationary value (typically a minimum), where L = T(u, q) − V(q) is the Lagrangian of the system:
- T: kinetic energy of the system
- V: potential energy of the system
- q: generalized coordinate vector (state of the system)
- u = q̇: generalized velocity vector
Treating u = q̇ as the control input (state eq.), the Hamiltonian is then

H = L + λᵀu

Consequently, the Euler-Lagrange equations are

λ̇ = −∂H/∂q = −∂L/∂q  (costate eq.)
0 = ∂H/∂u = ∂L/∂u + λ  (control eq.)

Combining these equations (using u = q̇) we obtain

d/dt (∂L/∂q̇) − ∂L/∂q = 0

which are Lagrange's equations of motion for a conservative system.
If L is not an explicit function of time, the Hamiltonian is constant:

H = L − (∂L/∂u) u = T − V − (∂T/∂u) u = const

(using λ = −∂L/∂u). One can show that the kinetic energy satisfies (∂T/∂u) u = 2T (think of T = (1/2)mv², for which (∂T/∂v) v = mv² = 2T). Hence we have

−H = T + V = const

that is, the kinetic plus potential energy is constant during the motion. Interesting: nature governs the motion of bodies with optimal control!
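The conservation of T + V can be checked numerically; below is a minimal sketch for a unit mass on a unit spring (L = ½q̇² − ½q², my example), integrated with RK4.

```python
# Mass-spring system: q'' = -q (m = k = 1); energy E = 0.5*v^2 + 0.5*q^2.
def rk4_step(q, v, h):
    def f(q, v):
        return v, -q                 # (q', v')
    k1 = f(q, v)
    k2 = f(q + h / 2 * k1[0], v + h / 2 * k1[1])
    k3 = f(q + h / 2 * k2[0], v + h / 2 * k2[1])
    k4 = f(q + h * k3[0], v + h * k3[1])
    q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, v

q, v, h = 1.0, 0.0, 0.001
E0 = 0.5 * v * v + 0.5 * q * q
drift = 0.0
for _ in range(10000):               # simulate t in [0, 10]
    q, v = rk4_step(q, v, h)
    drift = max(drift, abs(0.5 * v * v + 0.5 * q * q - E0))
print(drift)  # tiny: kinetic plus potential energy is conserved
```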
Find a path between two points in a vertical plane such that a particle sliding without friction along this path takes the shortest possible time to travel from one point to the other.
Johann Bernoulli posed the problem of the brachistochrone to the readers of Acta Eruditorum in June 1696. Five mathematicians responded with solutions: Isaac Newton, Jakob Bernoulli (Johann's brother), Gottfried Leibniz, Ehrenfried Walther von Tschirnhaus and Guillaume de l'Hôpital. Four of the solutions (excluding l'Hôpital's) were published in the same edition of the journal as Johann Bernoulli's. In his paper, Jakob Bernoulli showed that the solution is a cycloid. According to Newtonian scholar Tom Whiteside, Newton found the problem in his mail, in a letter from Johann Bernoulli, when he arrived home from the mint at 4 p.m., stayed up all night to solve it, and mailed the solution by the next post. This story gives some idea of Newton's power, since Johann Bernoulli took two weeks to solve it. In an attempt to outdo his brother, Jakob Bernoulli created a harder version of the brachistochrone problem; in solving it, he developed new methods that were refined by Leonhard Euler into what the latter called (in 1766) the calculus of variations. Joseph-Louis Lagrange did further work that resulted in modern infinitesimal calculus. (adapted from Wikipedia)

Johann Bernoulli (1667-1748) was a Swiss mathematician, known for his contributions to infinitesimal calculus and for educating Leonhard Euler. He was a professor of mathematics at the University of Groningen.
Let us relabel x → t, y → x (so the horizontal coordinate plays the role of time). The initial kinetic and potential energy is zero, so conservation of energy, mv²/2 = mgx, determines the velocity v as a function of x. The total time is then the integral of the arc-length over the velocity:

∫_a^b (√(1 + ẋ(t)²) / √(2g x(t))) dt,  g = 9.8
Dynamic model: ẋ(t) = u(t)

Cost function: ∫_a^b L(x(t), u(t)) dt, with L(x, u) = √(1 + u²) / √(2gx)

Terminal constraints: x(a) = 0, x(b) = xb

Hamiltonian: H(x, u, λ) = L(x, u) + λu

Optimality conditions: ∂H/∂u = 0 and H = constant, which lead to

√(1 + u*(t)²) / √(2g x*(t)) − u*(t)² / (√(1 + u*(t)²) √(2g x*(t))) = constant

which simplifies to (1 + u*(t)²) x*(t) = C for some constant C.
The solutions to the resulting differential equation

ẋ*(t) = √((C − x*(t)) / x*(t))

are cycloids, determined implicitly by

t(θ) = a + c(θ − sin θ)
x(θ) = c(1 − cos θ)

with C = 2c. These equations describe the curve traced by a point on a circle of radius c as this circle rolls without slipping on the horizontal axis.
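This can be verified directly: along the cycloid, dx/dt = (dx/dθ)/(dt/dθ) = sin θ/(1 − cos θ), which equals √((C − x)/x) with C = 2c. A quick numerical check (c = 1 is an arbitrary choice):

```python
import math

c = 1.0          # rolling-circle radius (arbitrary choice)
C = 2.0 * c      # constant in the ODE x' = sqrt((C - x)/x)
for k in range(1, 30):
    theta = 0.1 + (math.pi - 0.2) * k / 30   # theta in (0, pi)
    x = c * (1.0 - math.cos(theta))
    dx_dtheta = c * math.sin(theta)
    dt_dtheta = c * (1.0 - math.cos(theta))
    lhs = dx_dtheta / dt_dtheta              # dx/dt along the cycloid
    rhs = math.sqrt((C - x) / x)
    assert abs(lhs - rhs) < 1e-9
print("cycloid satisfies the brachistochrone ODE")
```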
How to move a motion system described by a linear equation from point A to point B with minimum energy?

ẋ(t) = Ax(t) + Bu(t), x(0) = x0, x(T) = xdesired

min_u ∫_0^T g(u(t)) dt

Assumption: g(u) is convex, u ∈ R.
Hamiltonian: H(x, u, λ) = g(u) + λᵀ(Ax + Bu)

Control eq.: ∂H/∂u (x, u, λ) = ∂g/∂u (u) + λᵀB = 0, so u = g_u⁻¹(−Bᵀλ), where g_u := ∂g/∂u is invertible since g is convex.

Costate eq.: λ̇ = −[∂H/∂x (x, u, λ)]ᵀ = −Aᵀλ, so λ(t) = e^{−Aᵀt} λ(0).

State eq.: ẋ = Ax + Bu, so

x(t) = e^{At} x(0) + ∫_0^t e^{A(t−s)} B g_u⁻¹(−Bᵀ e^{−Aᵀs} λ(0)) ds

Imposing x(T) = xdesired we obtain a system of equations from which we can compute λ(0).
Example: consider the double integrator, A = [0 1; 0 0], B = [0; 1], with g(u) = (1/4)u⁴, T = 1, moved from x(0) = [0; 0]ᵀ to x(1) = [1; 0]ᵀ. Here g_u(u) = u³, so

g_u⁻¹(y) = sign(y) y^{1/3},  sign(y) = 1 if y > 0, −1 if y < 0, 0 if y = 0

Substituting in the expression for x(t), with e^{−Aᵀs} = [1 0; −s 1] and hence −Bᵀ e^{−Aᵀs} λ(0) = s λ1(0) − λ2(0):

x(1) = e^{A} x(0) + ∫_0^1 e^{A(1−s)} B g_u⁻¹(s λ1(0) − λ2(0)) ds,  e^{A(1−s)} B = [1 − s; 1]

Imposing the boundary condition and solving for λ(0) gives λ1(0) = 2λ2(0) = 203.54; in particular, the control changes sign at t = 1/2.
[Plots: the optimal position x1 = p, velocity x2 = dp/dt, and control u versus time t; the closed-form expressions contain terms in (1 − 2t)^{4/3} and (2t − 1)^{7/3} with a switch at t = 1/2.]
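The example can be sanity-checked by forward simulation. The sketch below assumes my reconstruction of the boundary conditions (x(0) = [0, 0]ᵀ, x(1) = [1, 0]ᵀ): with λ1(0) = 2λ2(0), the input takes the form u(t) = c · sign(1 − 2t) |1 − 2t|^{1/3} with c = |λ2(0)|^{1/3}, and imposing x1(1) = 1 gives c = 14/3 analytically (so |λ1(0)| = 2c³ ≈ 203, consistent with the number above up to rounding).

```python
# Double integrator x1' = x2, x2' = u with the quartic-cost optimal input
# u(t) = c * sign(1 - 2t) * |1 - 2t|**(1/3); c = 14/3 follows from x1(1) = 1.
# (The boundary conditions x(0) = [0, 0], x(1) = [1, 0] are my reconstruction.)
c = 14.0 / 3.0
n = 100000
h = 1.0 / n
x1 = x2 = 0.0
for i in range(n):
    t = (i + 0.5) * h                   # midpoint of the step
    s = 1.0 - 2.0 * t
    u = c * (1.0 if s > 0 else -1.0) * abs(s) ** (1.0 / 3.0)
    x1 += x2 * h
    x2 += u * h
print(x1, x2)  # ≈ 1.0, 0.0: the target state is reached
```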
Special case: suppose that g(u) = (1/2)u², the initial condition is zero, x(0) = 0, and T is large (T → ∞):

min ∫_0^T (1/2) u(t)² dt subject to ẋ(t) = Ax(t) + Bu(t), x(T) = xdesired

Then g_u⁻¹ is the identity and the optimal control input is u(t) = −Bᵀ e^{−Aᵀt} λ(0), where λ(0) can be obtained from

x(T) = ∫_0^T e^{A(T−s)} B Bᵀ e^{−Aᵀs} ds λ(0) = ∫_0^T e^{Ar} B Bᵀ e^{Aᵀr} dr e^{−AᵀT} λ(0)  (change of variables r = T − s)

i.e., λ(0) = e^{AᵀT} W(T)⁻¹ x(T), where W(T) = ∫_0^T e^{Ar} B Bᵀ e^{Aᵀr} dr.
Optimal cost:

∫_0^T (1/2) u(t)² dt = (1/2) x(T)ᵀ W(T)⁻¹ [∫_0^T e^{A(T−r)} B Bᵀ e^{Aᵀ(T−r)} dr] W(T)⁻¹ x(T) = (1/2) x(T)ᵀ W(T)⁻¹ x(T)

since the bracketed integral equals W(T). If A is Hurwitz and (A, B) is controllable, we can take the limit T → ∞:

W∞ = ∫_0^∞ e^{Ar} B Bᵀ e^{Aᵀr} dr

which is the controllability gramian, the solution to 0 = A W∞ + W∞ Aᵀ + B Bᵀ.
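The gramian and the optimal cost are easy to compute numerically. In this sketch the matrices A (Hurwitz) and B are my example; the Lyapunov equation is solved with a Kronecker-product trick, and the result is checked against the closed form Wij = Bi Bj / (−(λi + λj)), valid for diagonal A.

```python
import numpy as np

A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])      # Hurwitz (example choice)
B = np.array([[1.0],
              [1.0]])            # (A, B) controllable

# Solve A W + W A^T + B B^T = 0 via vec(): (kron(A, I) + kron(I, A)) w = -vec(B B^T)
n = A.shape[0]
I = np.eye(n)
K = np.kron(A, I) + np.kron(I, A)
W = np.linalg.solve(K, -(B @ B.T).reshape(-1)).reshape(n, n)
print(W)  # ≈ [[1/2, 1/3], [1/3, 1/4]]

# Minimum energy to reach x_desired over an infinite horizon: 0.5 x^T W^-1 x
x_desired = np.array([1.0, 0.0])
cost = 0.5 * x_desired @ np.linalg.solve(W, x_desired)
print(cost)  # ≈ 9.0
```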
Consider the problem

min T = ∫_0^T 1 dt

subject to

ẋ(t) = Ax(t) + Bu(t), t ≥ 0, x(0) = x0, x(T) = x̄

with u(t) ∈ Rᵐ constrained by −c ≤ ui(t) ≤ c, i ∈ {1, . . . , m} (below we take x̄ = 0 and (A, B) controllable).
Applying the maximum principle with p0* = −1, the Hamiltonian is

H(x(t), u(t), p(t), −1) = p(t)ᵀ(Ax(t) + Bu(t)) − 1

or, in terms of λ(t) = −p(t),

Ĥ(x(t), u(t), λ(t)) = λ(t)ᵀ(Ax(t) + Bu(t)) + 1

The state and costate equations ẋ = ∂Ĥ/∂λ and λ̇ = −∂Ĥ/∂x give ẋ(t) = Ax(t) + Bu(t) and λ̇(t) = −Aᵀλ(t), with terminal constraints x(0) = x0, x(T) = x̄ and λ(0), λ(T) free; maximizing H over u corresponds to minimizing Ĥ over u.
Taking c = 1, the function u → Ĥ(x(t), u(t), λ(t)) is minimized when

u(t) = −sign(λ(t)ᵀB) = { −1 if λ(t)ᵀB > 0;  1 if λ(t)ᵀB ≤ 0 }

(when λ(t)ᵀB = 0, any u(t) ∈ [−1, 1] achieves the minimum). In particular, since Ĥ(x(t), u(t), λ(t)) = 0 along the optimal trajectory, at t = T (where x(T) = 0) this gives |λ(T)ᵀB| = 1. The costate evolves according to λ̇(t) = −Aᵀλ(t).
The expression

u(t) = −sign(λ(t)ᵀB) = { −1 if λ(t)ᵀB > 0;  1 if λ(t)ᵀB ≤ 0 }

shows that the input always takes either the maximum or the minimum value until the state reaches the origin. The solution procedure is then: search over λ(0), propagate λ̇(t) = −Aᵀλ(t), apply u(t) = −sign(λ(t)ᵀB) in ẋ(t) = Ax(t) + Bu(t), and pick λ(0) such that we obtain x(T) = 0.
Example: suppose that we wish to bring an object with a given initial position and velocity (x1(0), x2(0)) to rest at position zero in minimum time, subject to a constraint on the magnitude of the force:

min T = ∫_0^T 1 dt,  −1 ≤ u(t) ≤ 1 for all t,  x1(T) = 0, x2(T) = 0

The dynamics are ẋ1(t) = x2(t), ẋ2(t) = u(t), i.e., ẋ(t) = Ax(t) + Bu(t) with A = [0 1; 0 0] and B = [0; 1].
If {u*(t) | t ∈ [0, T]} is an optimal control trajectory, u*(t) must minimize the Hamiltonian Ĥ for each t, i.e.,

u*(t) = argmin_{−1 ≤ u ≤ 1} [1 + λ1(t) x2*(t) + λ2(t) u]

Therefore

u*(t) = { 1 if λ2(t) < 0;  −1 if λ2(t) ≥ 0 }

The equations describing the costate are λ̇1(t) = 0 and λ̇2(t) = −λ1(t), so λ1(t) = c1 and λ2(t) = c2 − c1 t for some constants c1, c2, and we must have |λ2(T)| = 1. Since λ2 is affine in time, there are only four possibilities.
[Figure: the four possibilities (A-D) for λ2(t) = c2 − c1 t subject to |λ2(T)| = 1 — positive throughout, negative throughout, or crossing zero once in either direction — together with the corresponding u*(t), which is constant (±1) or switches once between −1 and 1.]
Possibilities A and C correspond to cases where the origin is reached by keeping the actuation constant (u(t) = 1 or u(t) = −1), which happens only for special initial conditions:

u(t) = 1:  0 = e^{AT} x0 + ∫_0^T e^{A(T−s)} B ds, i.e., x0 = −∫_0^T e^{−As} B ds = [T²/2; −T]
u(t) = −1: 0 = e^{AT} x0 + ∫_0^T e^{A(T−s)} B (−1) ds, i.e., x0 = ∫_0^T e^{−As} B ds = [−T²/2; T]

for some T ≥ 0; these initial conditions trace out two parabolic arcs in the (x1, x2) plane.
For constant input u(t) = ξ, ξ ∈ {+1, −1},

x1(t) = x1(0) + x2(0) t + ξ t²/2
x2(t) = x2(0) + ξ t

so that

x1(t) − (1/(2ξ)) x2(t)² = x1(0) − (1/(2ξ)) x2(0)² = constant

i.e., the trajectories are parabolas in the (x1, x2) plane (one family for u = 1, another for u = −1). The switching curve is the set of points from which the origin is reached with constant input (the two parabolic arcs above). For an initial condition (x1(0), x2(0)) not lying on this curve, we know (from the PMP costate) that the control input can only switch once: trajectories switch when they hit the switching curve, and then follow it to the origin.
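This logic can be written as a state feedback and simulated (the specific feedback form, u = −sign(x1 + ½ x2|x2|) off the curve, is my phrasing of the switching-curve rule): from an arbitrary initial condition the trajectory switches once on the curve and reaches the origin.

```python
# Time-optimal feedback for x1' = x2, x2' = u, |u| <= 1 (double integrator).
def u_opt(x1, x2):
    s = x1 + 0.5 * x2 * abs(x2)          # s = 0 on the switching curve
    if s > 0:
        return -1.0                      # above the curve: full braking
    if s < 0:
        return 1.0                       # below the curve: full acceleration
    return -1.0 if x2 > 0 else 1.0       # on the curve: ride it to the origin

x1, x2, h = 1.0, 1.0, 1e-4
closest = float("inf")
for _ in range(50000):                   # simulate t in [0, 5]
    u = u_opt(x1, x2)
    x1, x2 = x1 + h * x2, x2 + h * u
    closest = min(closest, (x1 * x1 + x2 * x2) ** 0.5)
print(closest)  # the trajectory passes (essentially) through the origin
```

For the initial condition (1, 1), the analytic minimum time is 1 + 2√1.5 ≈ 3.45, which is inside the simulated horizon.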
How to move a (linearized model of a) quadcopter from one hovering position x0 to another one xT in minimum time?

ẋ(t) = Ax(t) + Bu(t), x(0) = x0, x(T) = xdesired, |u(t)| ≤ L, min T
Linearized model (y-axis): with thrust T, mass m, inertia Iθ, and gravity g,

ÿ = (1/m) T sin(θ) ≈ g θ  (near hover, T ≈ mg)
z̈ = (1/m) T cos(θ) − g
θ̈ = (1/Iθ) τθ

With state x = [y  ẏ  θ  θ̇]ᵀ and input u = τθ/Iθ, this is ẋ(t) = Ax(t) + Bu(t) with

A = [0 1 0 0; 0 0 g 0; 0 0 0 1; 0 0 0 0],  B = [0; 0; 0; 1]

boundary conditions x(0) = [0 0 0 0]ᵀ and x(T) = [ȳ 0 0 0]ᵀ, and constraint |u(t)| ≤ L.
The costate satisfies λ(t) = e^{−Aᵀt} λ(0):

[λ1(t); λ2(t); λ3(t); λ4(t)] = [1 0 0 0; −t 1 0 0; g t²/2 −g t 1 0; −g t³/6 g t²/2 −t 1] λ(0)

In particular, λ4(t) = λ4(0) − λ3(0) t + g λ2(0) t²/2 − g λ1(0) t³/6 =: a3 t³ + a2 t² + a1 t + a0, so the optimal control is

u(t) = −L sign(λ(t)ᵀB) = −L sign(λ4(t)) = −L sign(a3 t³ + a2 t² + a1 t + a0)

A polynomial of order 3 has at most 3 roots, so there are at most three sign changes in the optimal control input.
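This structure is easy to confirm numerically: integrating λ̇ = −Aᵀλ for the quadcopter chain (the numerical value of λ(0) is an arbitrary choice of mine; g = 9.8) reproduces the cubic λ4(t) exactly, since RK4 is exact for polynomial solutions of degree ≤ 4.

```python
import numpy as np

g = 9.8
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0,   g, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
lam0 = np.array([1.0, -2.0, 0.5, 3.0])   # arbitrary initial costate

def lam4_cubic(t):
    # Fourth component of exp(-A^T t) lam0, written out by hand
    return lam0[3] - lam0[2] * t + g * lam0[1] * t**2 / 2 - g * lam0[0] * t**3 / 6

def f(y):
    return -A.T @ y                      # costate dynamics

lam, h, t = lam0.copy(), 1e-3, 0.0
for _ in range(1000):                    # integrate over t in [0, 1] with RK4
    k1 = f(lam)
    k2 = f(lam + h / 2 * k1)
    k3 = f(lam + h / 2 * k2)
    k4 = f(lam + h * k3)
    lam = lam + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h
print(abs(lam[3] - lam4_cubic(t)))  # ~0: lam4(t) is exactly this cubic
```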
u(t) = −L sign(a3 t³ + a2 t² + a1 t + a0)

[Figure: candidate control signals u(t) switching between −L and L at times T1, T2, T3 before the terminal time T.]

Conclusion: the search over λ(0) is equivalent to a search over the switching times Ti (and the sign of the control input).
For u = τθ/Iθ with |u(t)| ≤ L, Iθ = 0.002, L = 50, ȳ = 1:

T1 = 0.1378, T2 = 0.4705, T3 = 0.8032, T* = 0.941

[Plots: the optimal (bang-bang) control u versus time t.]
[Plots: the corresponding velocity v and position y versus time t.]
For u = τθ/Iθ with |u(t)| ≤ L, Iθ = 0.002, ȳ = 1 and the larger bound L = 100:

T1 = 0.1158, T2 = 0.39558, T3 = 0.6752, T* = 0.7910

[Plots: the optimal (bang-bang) control u versus time t.]
[Plots: the corresponding velocity v and position y versus time t.]
Summary

- Computing optimal controls for continuous-time systems is far from trivial.
- Candidate optimal paths can be obtained with the Pontryagin maximum principle.
- Minimum-time control of linear systems leads to bang-bang solutions, characterized by switching curves.

After this lecture you should be able to apply the maximum principle to continuous-time optimal control problems, such as minimum-energy and minimum-time problems, for systems with a low-dimensional state.