Optimal Control and Dynamic Programming
4SC000 Q2 2017-2018
Duarte Antunes



SLIDE 2

Outline

  • Static optimization approach to PMP
  • Linear systems, quadratic cost, terminal constraints
  • Shooting method
SLIDE 3

Recap: continuous-time optimal control problem

Dynamic model: $\dot{x}(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

The goal in this lecture is to find an optimal path $(u(t), x(t))$ using a new tool: Pontryagin's maximum principle.

SLIDE 4

Recap

  • Today we will informally derive a simple version of Pontryagin's maximum principle via the discretization approach, using static optimization.
  • The direct approach (in continuous time) is much more elaborate; it is briefly discussed in the appendix (calculus of variations) and in the next lecture.

[Diagram: the CT control problem is discretized with step $\tau$ into a stage decision problem; solving it gives an optimal path and policy, and taking the limit $\tau \to 0$ recovers the optimal path and policy that solve the CT control problem.]

SLIDE 5

Recall discretization

Dynamic model: $\dot{x}(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

Discretization times: $t_k = k\tau$, with discretization step $\tau$ such that $h\tau = T$.

Discretized model: $x_{k+1} = x_k + \tau f(x_k, u_k)$, with $x_k = x(k\tau)$, $u_k = u(k\tau)$.

Discretized cost: $\sum_{k=0}^{h-1} g(x_k, u_k)\tau + g_h(x_h)$, where $g_h(x) = g_T(x)$ for all $x$.
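The forward-Euler discretization above is easy to reproduce in code. A minimal Python sketch (the course's own tooling is Matlab; the test system $\dot{x} = -x$, $x(0) = 1$ is an illustrative choice, not from the slides) showing that the scheme's error shrinks roughly like $\tau$:

```python
import math

def euler_rollout(f, x0, u_of_t, T, h):
    """Apply x_{k+1} = x_k + tau*f(x_k, u_k), tau = T/h, u_k = u(k*tau)."""
    tau = T / h
    x = x0
    for k in range(h):
        x = x + tau * f(x, u_of_t(k * tau))
    return x

# Scalar system xdot = -x + u with u = 0: exact solution is x(T) = x0*exp(-T)
f = lambda x, u: -x + u
for h in (10, 100, 1000):
    xT = euler_rollout(f, 1.0, lambda t: 0.0, 1.0, h)
    print(h, abs(xT - math.exp(-1.0)))   # error shrinks roughly like tau = 1/h
```

Refining $\tau$ by a factor of 10 reduces the terminal error by roughly the same factor, which is the first-order accuracy the limiting argument on the next slides relies on.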

SLIDE 6

Method of the Lagrange multipliers

Let $x = (x_1, x_2, \dots, x_h)$, $u = (u_0, u_1, \dots, u_{h-1})$, $\lambda = (\lambda_1, \dots, \lambda_h)$, $\lambda_i \in \mathbb{R}^n$. The Lagrangian is given by

$L(x, u, \lambda) = \sum_{k=0}^{h-1} g(x_k, u_k)\tau + g_h(x_h) + \sum_{k=0}^{h-1} \lambda_{k+1}^\top (x_k + \tau f(x_k, u_k) - x_{k+1})$

Then, the optimal solution (optimal path) must satisfy

$\frac{\partial L(x, u, \lambda)}{\partial x_k} = 0, \quad k \in \{1, \dots, h\}$

$\frac{\partial L(x, u, \lambda)}{\partial u_k} = 0, \quad k \in \{0, \dots, h-1\}$

$\frac{\partial L(x, u, \lambda)}{\partial \lambda_k} = 0, \quad k \in \{1, \dots, h\}$

SLIDE 7

Recall dimensions

For the problem $x_{k+1} = f_k(x_k, u_k)$ with cost $\sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$:

Variables

$x_k = (x_{1,k}, \dots, x_{n,k})^\top$, $x_{i,k} \in \mathbb{R}$; $u_k = (u_{1,k}, \dots, u_{m,k})^\top$, $u_{i,k} \in \mathbb{R}$; $\lambda_k = (\lambda_{1,k}, \dots, \lambda_{n,k})^\top$, $\lambda_{i,k} \in \mathbb{R}$

Functions

$f_k : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $f_k(x_k, u_k) = (f_{1,k}(x_k, u_k), \dots, f_{n,k}(x_k, u_k))^\top$; $g_k : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$; $g_h : \mathbb{R}^n \to \mathbb{R}$

Derivatives

$\frac{\partial}{\partial x_h} g_h(x_h) = \left[ \frac{\partial g_h}{\partial x_{1,h}} \;\; \frac{\partial g_h}{\partial x_{2,h}} \;\; \cdots \;\; \frac{\partial g_h}{\partial x_{n,h}} \right]$ (a row vector)

SLIDE 8

Recall dimensions

Derivatives

$\frac{\partial}{\partial x_k} f_k(x_k, u_k) = \begin{bmatrix} \frac{\partial f_{1,k}}{\partial x_{1,k}} & \cdots & \frac{\partial f_{1,k}}{\partial x_{n,k}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{n,k}}{\partial x_{1,k}} & \cdots & \frac{\partial f_{n,k}}{\partial x_{n,k}} \end{bmatrix} \in \mathbb{R}^{n \times n}$

$\frac{\partial}{\partial u_k} f_k(x_k, u_k) = \begin{bmatrix} \frac{\partial f_{1,k}}{\partial u_{1,k}} & \cdots & \frac{\partial f_{1,k}}{\partial u_{m,k}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{n,k}}{\partial u_{1,k}} & \cdots & \frac{\partial f_{n,k}}{\partial u_{m,k}} \end{bmatrix} \in \mathbb{R}^{n \times m}$

$\frac{\partial}{\partial x_k} g_k(x_k, u_k) = \left[ \frac{\partial g_k}{\partial x_{1,k}} \;\; \cdots \;\; \frac{\partial g_k}{\partial x_{n,k}} \right] \in \mathbb{R}^{1 \times n}$

$\frac{\partial}{\partial u_k} g_k(x_k, u_k) = \left[ \frac{\partial g_k}{\partial u_{1,k}} \;\; \cdots \;\; \frac{\partial g_k}{\partial u_{m,k}} \right] \in \mathbb{R}^{1 \times m}$

(all derivatives evaluated at $(x_k, u_k)$)

SLIDE 9

Optimality conditions

$\frac{\partial L(x, u, \lambda)}{\partial \lambda_k} = 0$, $k \in \{1, \dots, h\}$:

$x_k = x_{k-1} + \tau f(x_{k-1}, u_{k-1})$, i.e., $\frac{x_k - x_{k-1}}{\tau} = f(x_{k-1}, u_{k-1})$

$\frac{\partial L(x, u, \lambda)}{\partial u_k} = 0$, $k \in \{0, \dots, h-1\}$:

$\frac{\partial}{\partial u_k} g(x_k, u_k)\tau + \lambda_{k+1}^\top \frac{\partial}{\partial u_k} f(x_k, u_k)\tau = 0$

$\frac{\partial L(x, u, \lambda)}{\partial x_k} = 0$, $k \in \{1, \dots, h-1\}$:

$\frac{\partial}{\partial x_k} g(x_k, u_k)\tau + \lambda_{k+1}^\top \left(I + \frac{\partial}{\partial x_k} f(x_k, u_k)\tau\right) - \lambda_k^\top = 0$, which can be rewritten as

$\frac{\partial}{\partial x_k} g(x_k, u_k) + \lambda_{k+1}^\top \frac{\partial}{\partial x_k} f(x_k, u_k) = -\frac{\lambda_{k+1}^\top - \lambda_k^\top}{\tau}$

$\frac{\partial L(x, u, \lambda)}{\partial x_h} = 0$:

$\frac{\partial}{\partial x_h} g_h(x_h) - \lambda_h^\top = 0$
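The stationarity conditions above double as a recipe for computing cost gradients: for any (not necessarily optimal) input sequence, running the multiplier recursion backwards yields $\partial J / \partial u_k$, which vanishes exactly at the optimum. A Python sketch (illustrative; the scalar system $\dot{x} = -x + u$ with running cost $(x^2+u^2)/2$ and terminal cost $x^2/2$ is my choice, not from the slides) that checks this against finite differences:

```python
import math

# Discretized problem: x_{k+1} = x_k + tau*f(x_k,u_k), J = sum g*tau + g_h(x_h)
tau, h, x0 = 0.01, 100, 1.0
f = lambda x, u: -x + u
fx = lambda x, u: -1.0          # df/dx
fu = lambda x, u: 1.0           # df/du
g = lambda x, u: 0.5 * (x * x + u * u)
gx = lambda x, u: x
gu = lambda x, u: u
gh = lambda x: 0.5 * x * x
ghx = lambda x: x

def rollout(u):
    x = [x0]
    for k in range(h):
        x.append(x[k] + tau * f(x[k], u[k]))
    return x

def cost(u):
    x = rollout(u)
    return sum(g(x[k], u[k]) * tau for k in range(h)) + gh(x[h])

def adjoint_gradient(u):
    """dJ/du_k from the multiplier recursion of slide 9:
    lam_h = gh'(x_h);  lam_k = gx*tau + lam_{k+1}*(1 + fx*tau);
    dJ/du_k = gu*tau + lam_{k+1}*fu*tau (zero at the optimum)."""
    x = rollout(u)
    lam = [0.0] * (h + 1)
    lam[h] = ghx(x[h])
    for k in range(h - 1, 0, -1):
        lam[k] = gx(x[k], u[k]) * tau + lam[k + 1] * (1 + fx(x[k], u[k]) * tau)
    return [gu(x[k], u[k]) * tau + lam[k + 1] * fu(x[k], u[k]) * tau
            for k in range(h)]

u = [math.sin(0.1 * k) for k in range(h)]   # an arbitrary input sequence
grad = adjoint_gradient(u)

# finite-difference check of a few components
for k in (0, 50, 99):
    eps = 1e-6
    up, um = list(u), list(u)
    up[k] += eps; um[k] -= eps
    fd = (cost(up) - cost(um)) / (2 * eps)
    assert abs(fd - grad[k]) < 1e-8
```

The backward recursion is exactly the $\partial L / \partial x_k$ condition solved for $\lambda_k$, which is why it reproduces the finite-difference gradient to rounding accuracy.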

SLIDE 10

Taking the limit τ → 0

Let $\bar{\lambda}(t) = \lambda_k$, $t \in [k\tau, (k+1)\tau)$. Assuming that (wishful thinking....), as $\tau \to 0$, $\bar{\lambda}(t)$ converges to a continuously differentiable function, then

$\frac{\partial}{\partial x_k} g(x_k, u_k) + \lambda_{k+1}^\top \frac{\partial}{\partial x_k} f(x_k, u_k) = -\frac{\lambda_{k+1}^\top - \lambda_k^\top}{\tau} \;\xrightarrow{\tau \to 0}\; \dot{\bar{\lambda}}(t) = -\left(\frac{\partial f}{\partial x}\right)^{\!\top} \bar{\lambda}(t) - \left(\frac{\partial g}{\partial x}\right)^{\!\top}$

Moreover, naturally,

$\frac{x_k - x_{k-1}}{\tau} = f(x_{k-1}, u_{k-1}) \;\xrightarrow{\tau \to 0}\; \dot{x}(t) = f(x(t), u(t))$

and we also have

$\frac{\partial}{\partial u_k} g(x_k, u_k)\tau + \lambda_{k+1}^\top \frac{\partial}{\partial u_k} f(x_k, u_k)\tau = 0 \;\xrightarrow{\tau \to 0}\; \frac{\partial}{\partial u} g(x(t), u(t)) + \bar{\lambda}(t)^\top \frac{\partial}{\partial u} f(x(t), u(t)) = 0$

$\frac{\partial}{\partial x_h} g_T(x_h) - \lambda_h^\top = 0 \;\xrightarrow{\tau \to 0}\; \bar{\lambda}(T) = \left(\frac{\partial}{\partial x} g_T(x(T))\right)^{\!\top}$

SLIDE 11

Pontryagin's maximum principle (no state constraints, no input constraints, free terminal state)

If $(u^*(t), x^*(t))$ is an optimal path for the continuous-time optimal control problem, then there exists a function $\lambda(t)$, $t \in [0, T]$, denoted the co-state, such that

$\dot{x}^*(t) = f(x^*(t), u^*(t)), \quad x(0) = \bar{x}_0$ (given)

$\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) - \left(\frac{\partial}{\partial x} g(x^*(t), u^*(t))\right)^{\!\top}$

$\left(\frac{\partial}{\partial u} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) + \left(\frac{\partial}{\partial u} g(x^*(t), u^*(t))\right)^{\!\top} = 0$

$\lambda(T) = \left(\frac{\partial}{\partial x} g_T(x^*(T))\right)^{\!\top}$ (terminal constraint for the co-state)

SLIDE 12

Discussion

  • The previous result is a special case of Pontryagin's maximum principle.
  • The formal proof of Pontryagin's maximum principle is very elaborate and uses arguments radically different from the intuitive arguments that we have used.
  • However, the intuition provided by static optimization is very useful to reason about the conditions appearing in the theorem.
  • For example, consider the following problem with a constrained terminal state:

$\min_u \int_0^T g(x(t), u(t))\,dt$

$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T], \quad x(T) = \bar{x}_f$

Following the discretization + static optimization approach, we obtain the same necessary conditions for optimality, except $\frac{\partial}{\partial x_h} g_T(x_h) - \lambda_h^\top = 0$, since the terminal state is now constant. In fact, the next result holds.

SLIDE 13

Pontryagin's maximum principle (no state constraints, no input constraints, constrained terminal state)

If $(u^*(t), x^*(t))$ is an optimal path for the continuous-time optimal control problem with terminal constraint $x(T) = \bar{x}_f$, then there exists a function $\lambda(t)$, $t \in [0, T]$, such that

$\dot{x}^*(t) = f(x^*(t), u^*(t)), \quad x(0) = \bar{x}_0$ (given), $\quad x(T) = \bar{x}_f$

$\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) - \left(\frac{\partial}{\partial x} g(x^*(t), u^*(t))\right)^{\!\top}$

$\left(\frac{\partial}{\partial u} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) + \left(\frac{\partial}{\partial u} g(x^*(t), u^*(t))\right)^{\!\top} = 0$

Note that, contrary to the previous case, there is no constraint on the terminal value of the co-state.

SLIDE 14

Example

Consider a problem similar to a linear quadratic regulation problem for a scalar system, but where the additive control input enters through a nonlinear function $\ell$:

$\dot{x}(t) = a x(t) + \ell(u(t))$

$\min \tfrac{1}{2}\left(\int_0^T q x(t)^2 + r u(t)^2 \, dt + g_T\, x(T)^2\right)$

PMP equations:

$\dot{x}(t) = a x(t) + \ell(u(t))$

$\dot{\lambda}(t) = -a \lambda(t) - q x(t)$

$r u(t) + \lambda(t) \frac{d\ell(u(t))}{du} = 0$

Boundary conditions: $x(0) = 1$, $\lambda(T) = g_T\, x(T)$.

SLIDE 15

Example

If $q = 0$, $r = 1$, $g_T = 1$, $\ell(u) = -\log(u)$, $T = 1$, $a = -1$, the PMP equations become

$\dot{x}(t) = -x(t) - \log(u(t))$

$\dot{\lambda}(t) = \lambda(t)$

$u(t) - \lambda(t)\frac{1}{u(t)} = 0$

with boundary conditions $x(0) = 1$, $\lambda(1) = x(1)$. From the co-state equation, $\lambda(t) = e^{t-1}x(1)$, and from the control equation, $u(t) = e^{\frac{t-1}{2}}\sqrt{x(1)}$ (*only the positive root makes sense*). Then

$\dot{x}(t) = -x(t) - \frac{t-1}{2} - \frac{1}{2}\log(x(1))$

If we integrate the state equation from zero to $T = 1$ (variation of constants formula) we can obtain the value of $x(1)$:

$x(1) = e^{-1}\underbrace{x(0)}_{1} + \int_0^1 e^{-(1-s)}\left(-\frac{s}{2} + \frac{1}{2}(1 - \log(x(1)))\right)ds$

Evaluating the integrals, this reduces to the fixed-point equation

$x(1) = \frac{1}{2}\left(1 - \left(1 - \frac{1}{e}\right)\log(x(1))\right)$

whose solution is $x(1) \approx 0.6407$. Replacing in the formulas above we get the optimal path.
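Solving for the terminal state numerically is straightforward once the variation-of-constants integral is evaluated by quadrature. A Python sketch (illustrative; the course's tooling is Matlab) that bisects the residual of the integral equation above:

```python
import math

def x1_residual(z):
    """F(z) = z - [e^{-1} x(0) + int_0^1 e^{-(1-s)} (-s/2 + (1 - log z)/2) ds],
    evaluated with Simpson quadrature; the optimal x(1) is the root of F."""
    n = 200                      # even number of Simpson subintervals
    hq = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        s = i * hq
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * math.exp(-(1 - s)) * (-s / 2 + 0.5 * (1 - math.log(z)))
    integral = total * hq / 3
    return z - (math.exp(-1) * 1.0 + integral)

# bisection on [0.1, 1]: F(0.1) < 0 < F(1), and F is increasing
a, b = 0.1, 1.0
for _ in range(60):
    m = 0.5 * (a + b)
    if x1_residual(a) * x1_residual(m) <= 0:
        b = m
    else:
        a = m
x1 = 0.5 * (a + b)
print(x1)   # approx 0.6407
```

Carrying the $e^{-1}x(0)$ term through the algebra matters here: the quadrature agrees with the closed-form fixed-point equation $x(1) = \tfrac{1}{2}(1 - (1 - \tfrac{1}{e})\log x(1))$, whose root is about $0.6407$.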

SLIDE 16

Discussion

  • To solve the PMP equations one should start by expressing $u$ as a function, say $u = h(x, \lambda)$, of $x, \lambda$ from

$\left(\frac{\partial}{\partial u} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) + \left(\frac{\partial}{\partial u} g(x^*(t), u^*(t))\right)^{\!\top} = 0$

which is sometimes called the control equation. Note that these are in fact $m$ equations and, at least implicitly, we can write $u$ as a function of $x, \lambda$.

  • Then one must integrate the $2n$ differential equations for $x, \lambda$

$\dot{x}(t) = f(x(t), h(x(t), \lambda(t)))$

$\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} f(x(t), h(x(t), \lambda(t)))\right)^{\!\top} \lambda(t) - \left(\frac{\partial}{\partial x} g(x(t), h(x(t), \lambda(t)))\right)^{\!\top}$

for $t \geq 0$, with a known initial condition $x(0)$. The latter equation is also known as the adjoint equation.

  • However, the initial condition $\lambda(0)$ of the co-state is unknown!

SLIDE 17

Discussion

  • It is also possible that only the terminal values of some state variables are constrained, $x_i(T) = \bar{x}_i$, $i \in C$, in which case the corresponding $\lambda_i(T)$ are free and the terminal conditions of the PMP hold only for the remaining co-state variables:

$\lambda_j(T) = \frac{\partial}{\partial x_j} g_T(x(T)), \quad j \notin C$

An example using this fact will be discussed later.

  • In certain examples (e.g. the toy example just discussed, and linear systems with quadratic cost) it is possible to solve these equations as a function of $\lambda(0)$ and then pick $\lambda(0)$ to satisfy the terminal conditions $x(T) = \bar{x}_f$ or $\lambda(T) = \frac{\partial}{\partial x} g_T(x^*(T))$.

  • However, for most applications this is not possible and one must resort to a numerical method (e.g. the shooting method discussed later, which simply tries to guess $\lambda(0)$ and checks whether the terminal conditions are met).

SLIDE 18

Summary

$(u^*(t), x^*(t))$ is an optimal path candidate if $\exists \lambda(t)$, $t \in [0, T]$, s.t.

State eq.: $\dot{x}^*(t) = f(x^*(t), u^*(t))$

Adjoint eq.: $\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) - \left(\frac{\partial}{\partial x} g(x^*(t), u^*(t))\right)^{\!\top}$

Control eq.: $\left(\frac{\partial}{\partial u} f(x^*(t), u^*(t))\right)^{\!\top} \lambda(t) + \left(\frac{\partial}{\partial u} g(x^*(t), u^*(t))\right)^{\!\top} = 0$

The boundary conditions depend on the constraints on the terminal state:

  • (no terminal state constraints) $x^*(0) = \bar{x}_0$ and $\lambda(T) = \left(\frac{\partial}{\partial x} g_T(x^*(T))\right)^{\!\top}$
  • (terminal state fully constrained) $x^*(0) = \bar{x}_0$ and $x^*(T) = \bar{x}$
  • (only some components of the state are constrained) $x^*(0) = \bar{x}_0$, $x^*_i(T) = \bar{x}_i$, $i \in C$, and $\lambda_j(T) = \frac{\partial}{\partial x_j} g_T(x^*(T))$, $j \notin C$

SLIDE 19

Hamiltonian

If we define the Hamiltonian

$H(x, u, \lambda) = g(x, u) + \lambda^\top f(x, u)$

the conditions of the PMP take the following elegant form:

$\dot{x}^*(t) = \left(\frac{\partial}{\partial \lambda} H(x^*(t), u^*(t), \lambda(t))\right)^{\!\top}$

$\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} H(x^*(t), u^*(t), \lambda(t))\right)^{\!\top}$

$\frac{\partial}{\partial u} H(x^*(t), u^*(t), \lambda(t)) = 0$

Moreover,

$\frac{d}{dt} H(x^*(t), u^*(t), \lambda(t)) = \left(\frac{\partial H}{\partial x} + \frac{d}{dt}\lambda(t)^\top\right) f(x^*(t), u^*(t)) + \frac{\partial H}{\partial u}\frac{d}{dt}u = 0$

and therefore the Hamiltonian remains constant along optimal paths!

SLIDE 20

Discussion

  • The condition $\frac{\partial}{\partial u} H(x^*(t), u^*(t), \lambda(t)) = 0$ indicates that the function $u(t) \mapsto H(x^*(t), u(t), \lambda(t))$ has a stationary point as a function of the control input when we fix the state and the co-state of the optimal path.
  • In fact one can prove that such a stationary point is a minimum, and thus this is often called Pontryagin's minimum principle.
  • A slightly different definition of the Hamiltonian would entail that this stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used.
  • Since the nomenclature Pontryagin's maximum principle is more common, we will use it in the course.

SLIDE 21

Outline

  • Static optimization approach to PMP
  • Linear systems, quadratic cost, terminal constraints
  • Shooting method
SLIDE 22

19

Linear systems, quadratic cost

Dynamic model Cost function ˙ x(t) = Ax(t) + Bu(t) x(0) = x0 PMP necessary conditions for optimality 1 2(x(T)|QT x(T) + Z T (x(t)|Qx(t) + 2x(t)|Su(t) + u(t)|Ru(t))dt) ˙ λ(t) = −A|λ(t) − (Qx(t) + S|u(t)) λ(T) = QT x(T) u(t) = −R−1(B|λ(t) + S|x(t)) B|λ(t) + S|x(t) + u(t)|R = 0 ˙ x(t) = Ax(t) + Bu(t) x(0) = x0

SLIDE 23

Linear systems, quadratic cost

Given a continuous-time optimal control problem with linear dynamic model and quadratic cost:

  • I. Write the linear differential equations characterizing the optimal state and co-state, obtained by replacing $u(t) = -R^{-1}(B^\top \lambda(t) + S^\top x(t))$ in the state and adjoint equations:

$\begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} = \underbrace{\begin{bmatrix} A - BR^{-1}S^\top & -BR^{-1}B^\top \\ -(Q - SR^{-1}S^\top) & -(A^\top - SR^{-1}B^\top) \end{bmatrix}}_{H} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}$

  • II. Impose the boundary conditions on $\begin{bmatrix} x(T) \\ \lambda(T) \end{bmatrix} = e^{HT} \begin{bmatrix} x(0) \\ \lambda(0) \end{bmatrix}$: $x(0)$ known, and one of the following:

(i) $\lambda(T) = Q_T x(T)$
(ii) $x(T)$ known
(iii) $x_i(T)$, $i \in C$, known, and $\lambda_j(T) = \frac{\partial}{\partial x_j}\left(\tfrac{1}{2} x(T)^\top Q_T x(T)\right)$, $j \notin C$

This leads to a linear system with $2n$ equations and $2n$ unknowns, which allows us to obtain $\lambda(0)$.

  • III. Compute the optimal state, co-state, and input:

$\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} = e^{Ht} \begin{bmatrix} x(0) \\ \lambda(0) \end{bmatrix}, \quad u(t) = -R^{-1}(B^\top \lambda(t) + S^\top x(t)), \quad t \in [0, T]$
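For a scalar system with $S = 0$, the matrix exponential of the 2x2 Hamiltonian matrix has a closed form, since $H^2 = (a^2 + q b^2 / r)I$, so the three steps fit in a few lines. A Python sketch (illustrative; the course code is Matlab, and the numbers $a = -1$, $b = q = r = 1$, $x_0 = 0$, $x_f = 1$, $T = 1$ are my choices, not from the slides):

```python
import math

# Scalar LQ problem: xdot = a x + b u, cost 1/2 int (q x^2 + r u^2) dt,
# terminal constraint x(T) = xf (case (ii) on the slide).
a, b, q, r = -1.0, 1.0, 1.0, 1.0
x0, xf, T = 0.0, 1.0, 1.0

# Step I: Hamiltonian matrix H = [[a, -b^2/r], [-q, -a]]; note H @ H = mu^2 * I
mu = math.sqrt(a * a + q * b * b / r)

def expH(t):
    """e^{Ht} = cosh(mu t) I + (sinh(mu t)/mu) H, valid because H^2 = mu^2 I."""
    c, s = math.cosh(mu * t), math.sinh(mu * t) / mu
    return [[c + s * a, s * (-b * b / r)],
            [s * (-q), c - s * a]]

# Step II: impose x(T) = xf, i.e. xf = E11 x0 + E12 lam0, and solve for lam0
E = expH(T)
lam0 = (xf - E[0][0] * x0) / E[0][1]

# Step III: optimal state, co-state, and input at any t
def trajectory(t):
    Et = expH(t)
    x = Et[0][0] * x0 + Et[0][1] * lam0
    lam = Et[1][0] * x0 + Et[1][1] * lam0
    u = -(b / r) * lam
    return x, lam, u

x_T, lam_T, _ = trajectory(T)
print(x_T)   # hits the terminal constraint x(T) = 1 up to rounding
```

The Matlab code on a later slide follows exactly this recipe for the 4-state crane, using `expm` for the 8x8 Hamiltonian matrix instead of the closed form.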

SLIDE 24

Example

How to charge the capacitor in an RC circuit with minimum energy loss in the resistor?

[Circuit diagram: voltage source $u$, resistor $R$ with current $i$, capacitor $C$ with voltage $x$.]

$\dot{x}(t) = \frac{1}{RC}(u(t) - x(t))$

$\min_{u(t)} \int_0^T \frac{(x(t) - u(t))^2}{R}\,dt, \quad x(0) = 0, \quad x(T) = x_{desired}$

Let us consider $R = C = T = x_{desired} = 1$.

SLIDE 25

Charging a capacitor

Hamiltonian:

$H(x, u, \lambda) = (x - u)^2 + \lambda(-x + u)$

PMP equations:

$\frac{\partial}{\partial u} H(x, u, \lambda) = 0: \quad -2(x - u) + \lambda = 0 \;\Rightarrow\; u = x - \frac{\lambda}{2}$

$\dot{\lambda} = -\frac{\partial}{\partial x} H: \quad \dot{\lambda} = -2(x - u) + \lambda = 0$

$\dot{x} = -x + u = -\frac{\lambda(0)}{2} \;\Rightarrow\; x(t) = -\frac{\lambda(0)}{2}t + \underbrace{x(0)}_{=0}$

Impose boundary conditions:

$1 = x(1) = -\frac{\lambda(0)}{2} \;\Rightarrow\; \lambda(0) = -2$

Optimal solution (as derived in a previous lecture):

$u(t) = 1 + t, \quad x(t) = t$
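A quick numerical check (illustrative Python sketch) that the pair $u(t) = 1 + t$, $x(t) = t$ satisfies the dynamics and the terminal constraint, and that the Hamiltonian stays constant along the path (here $H \equiv -1$), as slide 19 predicts:

```python
# Verify the optimal path of the RC example: integrating xdot = -x + u with
# u(t) = 1 + t should give x(t) = t, and H = (x-u)^2 + lambda*(-x+u) with
# lambda = -2 should be constant along the path.
n = 10000
dt = 1.0 / n
x = 0.0
hams = []
for k in range(n):
    t = k * dt
    u = 1.0 + t
    hams.append((x - u) ** 2 + (-2.0) * (-x + u))
    x += dt * (-x + u)            # forward Euler step of xdot = -x + u

assert abs(x - 1.0) < 1e-3            # x(1) = 1: terminal constraint met
assert max(hams) - min(hams) < 1e-3   # Hamiltonian approximately constant
```

Along this path the Euler update is exact ($\dot{x} = -t + 1 + t = 1$), so both checks pass to rounding accuracy.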

SLIDE 26

Moving a crane

How to move a crane from rest at point A to rest at point B in a fixed amount of time $T$ with minimum energy?

$(x, \theta, \dot{x}, \dot{\theta})(0) = (x_A, 0, 0, 0), \qquad (x, \theta, \dot{x}, \dot{\theta})(T) = (x_B, 0, 0, 0)$

$\min \int_0^T u(t)^2\,dt$

SLIDE 27

Matlab code

m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 (m*l*b)/p -m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; -m*l/p];
Qc = zeros(4,4); Rc = 1; n = 4; T = 1;
x0 = [0 0 0 0]'; xf = [1 0 0 0]';

% 1 Hamiltonian
H = expm([Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*T);
H11 = H(1:n,1:n);       H12 = H(1:n,n+1:2*n);
H21 = H(n+1:2*n,1:n);   H22 = H(n+1:2*n,n+1:2*n);

% 2 obtain lambda0
lambda0 = H12\(xf-H11*x0);

% 3 obtain x, lambda, u at times k*tau
tau = 0.01; N = round(T/tau);
x = zeros(4,N+1); lambda = zeros(4,N+1); u = zeros(1,N+1);
for k = 1:N+1
    XL = expm((k-1)*[Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*tau)*[x0; lambda0];
    x(:,k) = XL(1:4);
    lambda(:,k) = XL(5:8);
    u(:,k) = -inv(Rc)*Bc'*lambda(:,k);
end
plot((0:N)*tau,u), xlabel('t'), ylabel('u'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(3,:)), xlabel('t'), ylabel('\theta'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(1,:)), xlabel('t'), ylabel('x'), grid on, set(gca,'Fontsize',16)

SLIDE 28

Results

[Plots of the input $u$ and the states $x$ and $\theta$ versus $t \in [0, 1]$ for $x_A = 0$, $x_B = 1$, $T = 1$; the control magnitude peaks near $\pm 50$.]

SLIDE 29

Results

[Plots of the input $u$ and the states $x$ and $\theta$ versus $t \in [0, 2]$ for $x_A = 0$, $x_B = 1$, $T = 2$; the control magnitude now peaks near $\pm 2$.]

SLIDE 30

Results

[Plots of the input $u$ and the states $x$ and $\theta$ versus $t \in [0, 10]$ for $x_A = 0$, $x_B = 1$, $T = 10$; the control magnitude now peaks near $\pm 0.08$.]

SLIDE 31

Outline

  • Static optimization approach to PMP
  • Linear systems, quadratic cost, terminal constraints
  • Shooting method

28

Discussion

  • The crucial step is to integrate differential equations

2n ˙ x(t) = f(x(t), h(x(t), λ(t))) ˙ λ(t) = −( ∂

∂xf(x(t), h(x(t), λ(t)))|λ(t) − ( ∂ ∂xg(x(t), h(x(t), λ(t)))|

with a known initial condition , but unknown x(0) t ≥ 0

  • We have seen some cases where we could solve this as a function of

and then obtain by imposing terminal the boundary conditions λ(0) λ(0) λ(0)

x(T) = ¯ xf or λ(T) =

∂ ∂xgT (x(T)∗)

  • However, the number of applications where we solve explicitly the

differential equations is small, and we in general need numerical methods.

  • We present next the shooting method which simply tries to guess ,

and will allow us solve some problems where is small. λ(0) n

  • To know more about numerical methods to solve the PMP equations see

[14] John Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, 2010

SLIDE 33

Shooting method

Problem: find $\lambda(0)$ such that $x(T) = \bar{x}_f$ (or another boundary condition) is met after integrating

$\dot{x}(t) = f(x(t), h(x(t), \lambda(t)))$

$\dot{\lambda}(t) = -\left(\frac{\partial}{\partial x} f(x(t), h(x(t), \lambda(t)))\right)^{\!\top} \lambda(t) - \left(\frac{\partial}{\partial x} g(x(t), h(x(t), \lambda(t)))\right)^{\!\top}$

with initial conditions $x(0)$ and $\lambda(0)$.

Main idea: integrate the equations for several guesses $\lambda^{(1)}(0), \lambda^{(2)}(0), \dots$ ("shots"), plot the resulting trajectories on $[0, T]$, and from these shots pick the $\lambda^{(i)}(0)$ that satisfies $x(T) = \bar{x}_f$.

SLIDE 34

Example

Consider a particle of mass $m$ acted upon by a thrust force of magnitude $m$, described by the following equations, where the angle $\beta(t)$ determines the thrust direction:

$\dot{x} = u, \quad \dot{y} = v, \quad \dot{u} = \cos(\beta), \quad \dot{v} = \sin(\beta)$

Suppose that the initial position of the particle at time $t = 0$ is $(x(0), y(0)) = (0, 0)$ and the initial velocity is zero. We wish to transfer the particle to a path parallel to the x-axis, a distance $h$ away, in a given time $T$, arriving with the maximum value of $u(T)$. We do not care about the final values of $x(T)$ and of the vertical velocity $v(T)$. Suppose that $h = 1$ and $T = 3$.

(Solved next; a different solution for a similar problem is available in Bryson & Ho's book, Sec. 2.4.)

SLIDE 35

Formulation

Notation for PMP:

$\dot{\mathrm{x}} = f(\mathrm{x}, \mathrm{u}), \quad \mathrm{x} = \begin{bmatrix} x \\ y \\ u \\ v \end{bmatrix}, \quad \mathrm{u} = \beta, \quad f(\mathrm{x}, \mathrm{u}) = \begin{bmatrix} u \\ v \\ \cos(\beta) \\ \sin(\beta) \end{bmatrix}$

Optimal control problem: $\min \underbrace{-x(T)}_{g_T(x(T))}$, with $g(\mathrm{x}, \mathrm{u}) = 0$.

Hamiltonian:

$H(\mathrm{x}, \mathrm{u}, \lambda) = \lambda_x u + \lambda_y v + \lambda_u \cos(\beta) + \lambda_v \sin(\beta)$

Co-state: $\lambda = \begin{bmatrix} \lambda_x & \lambda_y & \lambda_u & \lambda_v \end{bmatrix}^\top$

Terminal conditions (only two terminal states are specified, giving two additional constraints on the terminal co-states associated with the unrestricted states):

$y(T) = h, \quad v(T) = 0, \quad \lambda_x(T) = \frac{\partial g_T(x(T))}{\partial x} = -1, \quad \lambda_u(T) = \frac{\partial g_T(x(T))}{\partial u} = 0$

SLIDE 36

Applying PMP

  • 1. Control eq.: express the control input in terms of the state and co-state

$\frac{\partial H}{\partial \mathrm{u}}(\mathrm{x}, \mathrm{u}, \lambda) = 0: \quad -\lambda_u \sin(\beta) + \lambda_v \cos(\beta) = 0 \;\Rightarrow\; \beta = \arctan\left(\frac{\lambda_v}{\lambda_u}\right)$

  • 2. Write the adjoint and state equations and replace the expression for the control input:

$\dot{\lambda}(t) = -\left[\frac{\partial}{\partial \mathrm{x}} H\right]^{\!\top}: \quad \begin{bmatrix} \dot{\lambda}_x(t) \\ \dot{\lambda}_y(t) \\ \dot{\lambda}_u(t) \\ \dot{\lambda}_v(t) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -\lambda_x(t) \\ -\lambda_y(t) \end{bmatrix}$

$\begin{bmatrix} \dot{x}(t) \\ \dot{y}(t) \\ \dot{u}(t) \\ \dot{v}(t) \end{bmatrix} = \begin{bmatrix} u(t) \\ v(t) \\ \cos(\arctan(\lambda_v(t)/\lambda_u(t))) \\ \sin(\arctan(\lambda_v(t)/\lambda_u(t))) \end{bmatrix} = \begin{bmatrix} u(t) \\ v(t) \\ \frac{1}{\sqrt{1 + (\lambda_v(t)/\lambda_u(t))^2}} \\ \frac{\lambda_v(t)/\lambda_u(t)}{\sqrt{1 + (\lambda_v(t)/\lambda_u(t))^2}} \end{bmatrix}$

Note that

$\cos(\arctan(z)) = \frac{1}{\sqrt{1 + z^2}}, \qquad \sin(\arctan(z)) = \frac{z}{\sqrt{1 + z^2}}$
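A small numeric sanity check (illustrative Python sketch) of the two trigonometric identities used above, and of the fact that the stationarity condition is satisfied exactly at $\beta = \arctan(\lambda_v / \lambda_u)$:

```python
import math

# Check cos(arctan(z)) = 1/sqrt(1+z^2), sin(arctan(z)) = z/sqrt(1+z^2), and
# that dH/dbeta = -lam_u*sin(beta) + lam_v*cos(beta) vanishes at the arctan.
for lam_u, lam_v in [(-3.0, 1.22), (1.0, 2.0), (-0.5, -0.7)]:
    z = lam_v / lam_u
    beta = math.atan(z)
    assert abs(math.cos(beta) - 1 / math.sqrt(1 + z * z)) < 1e-12
    assert abs(math.sin(beta) - z / math.sqrt(1 + z * z)) < 1e-12
    assert abs(-lam_u * math.sin(beta) + lam_v * math.cos(beta)) < 1e-12
```

Note that $\arctan$ only pins down $\beta$ up to a multiple of $\pi$; the branch that actually minimizes the Hamiltonian has to be selected separately.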

SLIDE 37

Shooting method

A direct application of the shooting method would lead us to search in a four-dimensional space (the initial condition of the co-state). However, we already know that:

$\dot{\lambda}_x(t) = 0, \;\; \lambda_x(T) = -1 \;\Rightarrow\; \lambda_x(0) = -1$

$\dot{\lambda}_u(t) = -\lambda_x(t) = 1, \;\; \lambda_u(3) = 0 \;\Rightarrow\; \lambda_u(t) = t - 3, \;\; \lambda_u(0) = -3$

Moreover, $\lambda_v(t) = \lambda_v(0) - \lambda_y(0)t$, and then

$\dot{v}(t) = \frac{\frac{\lambda_v(0) - \lambda_y(0)t}{t - 3}}{\sqrt{1 + \left(\frac{\lambda_v(0) - \lambda_y(0)t}{t - 3}\right)^2}}, \qquad \dot{y}(t) = v(t)$

Therefore we just need to search for $\lambda_y(0), \lambda_v(0)$ to satisfy $y(T) = h$, $v(T) = 0$.

SLIDE 38

Shooting method

$\dot{v}(t) = \frac{\frac{\lambda_v(0) - \lambda_y(0)t}{t - 3}}{\sqrt{1 + \left(\frac{\lambda_v(0) - \lambda_y(0)t}{t - 3}\right)^2}}, \qquad \dot{y}(t) = v(t)$

The shooting method boils down to finding $\lambda_y(0), \lambda_v(0)$ that satisfy $v(T) = 0$ and $y(T) = 1$ after integrating from $0$ to $T = 3$. One option is to grid the $(\lambda_y(0), \lambda_v(0))$ search space.
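The two-unknown search can also be automated. A Python sketch (illustrative, not the course's Matlab code; the branch of $\arctan$ is fixed here as $\sin\beta = \lambda_v/\sqrt{\lambda_u^2 + \lambda_v^2}$, an assumption that lets the thrust initially point towards the target path) that starts from a coarse guess of the kind the gridding would produce and polishes it with a finite-difference Newton iteration:

```python
import math

def integrate(ly0, lv0, n=300, T=3.0):
    """RK4 integration of ydot = v, vdot = sin(beta(t)) on [0, T], with
    lam_u(t) = t - T, lam_v(t) = lv0 - ly0*t (branch assumption as above)."""
    def vdot(t):
        lu, lv = t - T, lv0 - ly0 * t
        return lv / max(math.hypot(lu, lv), 1e-12)
    dt = T / n
    y = v = 0.0
    for k in range(n):
        t = k * dt
        k1v = vdot(t);            k1y = v
        k2v = vdot(t + dt / 2);   k2y = v + dt / 2 * k1v
        k3v = vdot(t + dt / 2);   k3y = v + dt / 2 * k2v
        k4v = vdot(t + dt);       k4y = v + dt * k3v
        y += dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y, v

def residual(p):
    y, v = integrate(p[0], p[1])
    return [y - 1.0, v - 0.0]     # want y(3) = 1 and v(3) = 0

# damped Newton iteration with a finite-difference Jacobian
p = [0.6, 1.2]                    # coarse starting guess, e.g. from gridding
for _ in range(20):
    r = residual(p)
    eps = 1e-6
    J = [[(residual([p[0] + (eps if j == 0 else 0.0),
                     p[1] + (eps if j == 1 else 0.0)])[i] - r[i]) / eps
          for j in range(2)] for i in range(2)]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    dp0 = (r[0] * J[1][1] - r[1] * J[0][1]) / det
    dp1 = (r[1] * J[0][0] - r[0] * J[1][0]) / det
    scale = max(1.0, math.hypot(dp0, dp1) / 0.5)   # damp large steps
    p[0] -= dp0 / scale
    p[1] -= dp1 / scale

y3, v3 = integrate(p[0], p[1])
print(p, y3, v3)   # boundary conditions y(3) = 1, v(3) = 0 met
```

Under the stated branch assumption the root lands in the same region as the slides' grid estimates for $(\lambda_y(0), \lambda_v(0))$; the Newton polish then drives the boundary residuals to numerical zero, which a grid alone cannot do.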

SLIDE 39

Shooting method

Wrong values of $\lambda_y(0), \lambda_v(0)$ (terminal conditions not met): $\lambda_y(0) = 1$, $\lambda_v(0) = 1$.

[Plots of $y$ and $v$ versus time $t \in [0, 3]$ for this shot.]

SLIDE 40

Shooting method

Right values of $\lambda_y(0), \lambda_v(0)$ (terminal conditions met): $\lambda_y(0) = 0.62$, $\lambda_v(0) = 1.22$.

[Plots of $y$ and $v$ versus time $t \in [0, 3]$: $y$ reaches $1$ and $v$ returns to $0$ at $t = 3$.]

SLIDE 41

Concluding remarks

  • We have informally derived a simple version of the PMP using static optimization.
  • We have seen how to solve continuous-time optimal control problems with terminal constraints.
  • The shooting method is a simple numerical method to solve low-order PMP problems.

Summary: after this lecture you should be able to:

  • Apply the PMP to solve continuous-time problems with quadratic cost, linear dynamics and terminal constraints.
  • Apply the PMP to non-linear problems and use the shooting method.
SLIDE 42

Discussion

  • In this lecture we have used an informal discretization approach to derive a simplified version of the PMP. This approach was not mathematically sound.
  • In the appendix we briefly show how to address the direct approach (in continuous time). This approach is mathematically sound, although here we only illustrate it and do not go into depth.

[Diagram: the CT control problem (CT PMP / CT DP) is discretized with step $\tau$ into a stage decision problem (DT PMP / DT DP); solving it gives an optimal path and policy, and taking the limit $\tau \to 0$ recovers the optimal path and policy of the CT problem.]

SLIDE 43

Problem formulation

Control system: $\dot{x}(t) = f(x(t), u(t))$, $x(t) \in \mathbb{R}^n$, $u \in U \subset \mathbb{R}^m$ (the control input may be constrained)

Cost functional to be minimized: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

  • Initial time and initial state $x(0) = x_0$ are fixed.
  • Final state $x(T)$ can be free or fixed.

Assumptions

  • $f, g, \frac{\partial f}{\partial x}, \frac{\partial g}{\partial x}$ are continuous.
  • Lipschitz property: for every bounded set $D \subset \mathbb{R}^n \times U$, $\exists M$ s.t. for all $(x_1, u), (x_2, u) \in D$:

$|f(x_1, u) - f(x_2, u)| \leq M|x_1 - x_2|$
slide-44
SLIDE 44

A3

Why is this problem hard?

  • To parameterize a function we need an infinite number of parameters*.
  • To define the minimum we need to define what we mean by a function being close to

another which is not easy in infinite dimension spaces (depends on the norm!):

*Think about a piecewise constant approximation with increasingly many bins, or of the Fourier series approximation which requires an infinite number of parameters.

Infinite dimensional optimization

a b t is a local minimum of if there exists such that for all such that we have (before ,now space of functions in ) ✏ > 0 ku u∗k < ✏ f(u∗) ≤ f(u) u u∗ close in the 2-norm not close in the 1-norm u u∗ [a, b] δ ku∗ uk0 = max

t∈[a,b] |u(t) u∗(t)|

ku∗ uk0 = qR b

a |u(t) u∗(t)|2dt

u∗ ∈ V u ∈ V f(u) : V → ∞ V ≡ Rn

SLIDE 45

First variation

  • There is not a unique definition of "gradient" (first variation) in infinite dimensional spaces.
  • (Gateaux derivative): A linear functional $\delta J|_y : V \to \mathbb{R}$ is called the first variation of $J$ at $y$ if for all $\eta$ and all $\alpha$ we have

$J(y + \alpha\eta) = J(y) + \delta J|_y(\eta)\,\alpha + o(\alpha)$

  • (Fréchet derivative): A linear functional $\delta J|_y : V \to \mathbb{R}$ is called the first variation of $J$ at $y$ if for all $\eta$ we have

$J(y + \eta) = J(y) + \delta J|_y(\eta) + o(\|\eta\|)$

[Figure: $J(y + \eta)$ as a function of $\eta$, together with the linear approximation $J(y) + \delta J|_y(\eta)$.]
slide-46
SLIDE 46

A5

Sufficient optimality condition

  • Using similar arguments as for finite dimensional spaces, one can conclude that a first-
  • rder necessary condition for optimality is: for all admissible perturbations , we must have
  • Reasonings for constrained optimization are similar.
  • The difficulty here is obtaining an expression for the first variation and for admissible .
  • However, for optimal control problem it is not difficult and the key fact that we need to

know is that if then where is determined by the constraints in the problem (it is a free function in if there are no constraints). η δJ|y(η) = 0 δJ|y(η) = R b

a ∂ ∂yL(y(t))η(t)dt

J(y) = R b

a L(y(t))dt

η η [a, b]

SLIDE 47

Calculus of variations

Branch of mathematics addressing the following problem: among all differentiable functions $y : [a, b] \to \mathbb{R}$ satisfying $y(a) = y_0$, $y(b) = y_1$, find minima of

$J(y) := \int_a^b L(x, y(x), y'(x))\,dx$

For calculus of variations we will consider only functions taking values in $\mathbb{R}$, although we could also consider $y : [a, b] \to \mathbb{R}^n$.

SLIDE 48

Example: Brachistochrone

Find a path between two points in a vertical plane such that a particle sliding without friction along this path takes the shortest possible time to travel from one point to the other. The initial kinetic and potential energy is zero, and we must have the following equation determining the velocity as a function of $y$ (with $g = 9.8$):

$\frac{mv^2}{2} = mgy$

Then the total time is the integral of the arc-length over the velocity:

$\int_a^b \sqrt{\frac{1 + (y'(x))^2}{2gy(x)}}\,dx$

SLIDE 49

Calculus of variations and optimal control

Calculus of variations: among all differentiable functions $y : [a, b] \to \mathbb{R}$ satisfying $y(a) = y_0$, $y(b) = y_1$, find minima of $J(y) := \int_a^b L(x, y(x), y'(x))\,dx$.

Optimal control formulation: let $u = y'$, $t = x$, $x = y$. Find a continuous function $u$ that solves the optimal control problem

$\dot{y}(t) = u(t), \qquad J(u) = \int_a^b L(t, y(t), u(t))\,dt$

Remarks

  • We need stronger assumptions than the ones we started with, namely that the control $u$ is continuous and that $L$ is twice continuously differentiable with respect to $u$, $y$.
  • We will use this trick to solve the Brachistochrone problem with optimal control.
  • We provide next necessary conditions of optimality (based on the gradient) for calculus of variations, and later for the more general optimal control problem.
slide-50
SLIDE 50

A9

Necessary conditions for optimality

Consider the class of perturbed trajectories such that The first variation can be concluded by expanding the cost as a first-order Taylor series w.r.t From which we conclude that the first variation is y(t) + αη(t) η(a) = η(b) = 0 α J(y + αη) = Z b

a

L(x, y(x) + αη(x), y0(x) + αη0(x))dx J(y+αη) = Z b

a

L(x, y(x), y0(x))+ ∂ ∂y L(x, y(x), y0(x))αη(x)+ ∂ ∂uL(x, y(x), y0(x))αη0(x)dx δJ|y(η) = Z b

a

∂ ∂y L(x, y(x), y0(x))η(x) + ∂ ∂uL(x, y(x), y0(x))η0(x)dx u(x) = y0(x)

SLIDE 51

Necessary conditions for optimality

Integrating by parts the last term in the integral (with $u(x) = y'(x)$):

$\delta J|_y(\eta) = \int_a^b \frac{\partial}{\partial y} L(x, y(x), y'(x))\,\eta(x) - \frac{d}{dx}\frac{\partial}{\partial u} L(x, y(x), y'(x))\,\eta(x)\,dx + \left[\frac{\partial}{\partial u} L(x, y(x), y'(x))\,\eta(x)\right]_a^b$

The last term is zero due to $\eta(a) = \eta(b) = 0$. Setting the first variation to zero we get

$\int_a^b \left(\frac{\partial}{\partial y} L(x, y(x), y'(x)) - \frac{d}{dx}\frac{\partial}{\partial u} L(x, y(x), y'(x))\right)\eta(x)\,dx = 0$

For this equation to hold for every perturbation $\eta$ we must have the Euler-Lagrange equation:

$\frac{\partial}{\partial y} L(x, y(x), y'(x)) = \frac{d}{dx}\frac{\partial}{\partial u} L(x, y(x), y'(x))$
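A numerical illustration (Python sketch; the example $L = \sqrt{1 + u^2}$, the arc-length functional, is my choice, not from the slides): here the Euler-Lagrange equation gives $\frac{d}{dx}\left(y'/\sqrt{1 + y'^2}\right) = 0$, i.e. $y'$ constant, so the straight line should beat any perturbed path with the same boundary values:

```python
import math

def arc_length(y, a=0.0, b=1.0, n=2000):
    """J(y) = int_a^b sqrt(1 + y'(x)^2) dx, approximated with finite
    difference slopes on a uniform grid (the length of a polyline)."""
    hx = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * hx, a + (i + 1) * hx
        slope = (y(x1) - y(x0)) / hx
        total += math.sqrt(1 + slope * slope) * hx
    return total

straight = lambda x: 2 * x                  # line from (0, 0) to (1, 2)
J_star = arc_length(straight)               # equals sqrt(5)

# Perturbations vanishing at both endpoints, as required (eta(a) = eta(b) = 0)
for amp in (0.05, 0.2, 0.5):
    perturbed = lambda x, A=amp: 2 * x + A * math.sin(math.pi * x)
    assert arc_length(perturbed) > J_star   # every admissible perturbation costs more
```

The discrete check is robust because a polyline between the same endpoints is always at least as long as the straight segment, with equality only for the line itself.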

SLIDE 52

Discussion

  • Since $u(x) = y'(x)$, for calculus of variations we considered a perturbation $\eta(x)$ for $y(x)$, and this implies a perturbation $\eta'(x)$ for $u(x)$.
  • We could also consider a perturbation $\delta(x)$ for $u(x)$ and immediately obtain a perturbation $\int_a^x \delta(s)\,ds$ for $y(x)$.
  • However, for optimal control problems the perturbations $x = x^* + \alpha\eta$, $u = u^* + \alpha\xi$ not only must be given for multi-dimensional functions $\eta, \xi$, but must also be compatible with the model $\dot{x}(t) = f(x(t), u(t))$ (see Liberzon's book, page 91).
  • The derivation for the optimal control problem

$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \qquad \int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

is given next, using similar arguments to the ones used for calculus of variations.

  • Note, however, that this is now a constrained problem (due to the dynamic equation) and therefore we need to introduce the co-state ("Lagrange multipliers").
slide-53
SLIDE 53

A12

Derivation of PMP

Let us rewrite the cost as is an important function called the Hamiltonian. where is the co-state and λ(t) ∈ Rn The first variation (using again integration by parts) is (see Liberzon’s book) J(u) = Z T (g(x(t), u(t)) + λ(t)|(f(x(t), u(t)) − ˙ x(t))dt + gT (x(T)) = Z T H(x(t), u(t), λ(t)) − λ(t)| ˙ x(t)dt + gT (x(T)) H(x(t), u(t), λ(t)) = g(x(t), u(t)) + λ(t)|f(x(t), u(t)) x = x∗ + αη u = u∗ + αξ

δJ|u∗(ξ) = Z T (η|( ˙ λ+[ ∂ ∂xH(x∗, u∗, λ)]|)+ξ|[ ∂ ∂uH(x∗, u∗, λ)]|dt+η(T)|([ ∂ ∂x(T)gT (x∗(T))]|−λ(T))

SLIDE 54

Derivation of PMP

Setting the first variation to zero, we conclude that it can only be zero if

$\dot{\lambda} = -\left(\frac{\partial}{\partial x} H(x^*, u^*, \lambda)\right)^{\!\top}, \qquad \frac{\partial}{\partial u} H(x^*, u^*, \lambda) = 0, \qquad \lambda(T) = \left[\frac{\partial}{\partial x(T)} g_T(x^*(T))\right]^{\!\top}$

and the dynamic equation can be written as

$\dot{x}^* = \left(\frac{\partial}{\partial \lambda} H(x^*, u^*, \lambda^*)\right)^{\!\top}$

where $H(x(t), u(t), \lambda(t)) = g(x(t), u(t)) + \lambda(t)^\top f(x(t), u(t))$. This is the PMP that we saw before!
slide-55
SLIDE 55

A14

Pontryagin’s maximum principle

Let be an optimal path for the continuous-time optimal control problem. Then there exists a function , denoted by co-state, such that (u∗(t), x∗(t)) λ(t), t ∈ [0, T] ˙ x∗(t) = f(x∗(t), u∗(t)) ˙ λ(t) = −( ∂

∂xf(x∗(t), u∗(t)))|λ(t) − ( ∂ ∂xg(x∗(t), u∗(t)))| ∂ ∂uf(x∗(t), u∗(t))|λ(t) + ∂ ∂ug(x∗(t), u∗(t))| = 0

(given) λ(T) =

∂ ∂xgT (x(T)∗)

x(0) = x0

SLIDE 56

Discussion

  • We have

$\frac{d}{dt} H(x^*(t), u^*(t), \lambda(t)) = \left(\frac{\partial H}{\partial x} + \frac{d}{dt}\lambda(t)^\top\right) f(x^*(t), u^*(t)) + \frac{\partial H}{\partial u}\frac{d}{dt}u = 0$

and therefore the Hamiltonian remains constant along optimal paths.

  • The condition $\frac{\partial}{\partial u} H(x^*(t), u^*(t), \lambda(t)) = 0$ indicates that the function $u(t) \mapsto H(x^*(t), u(t), \lambda(t))$ has a stationary point.
  • In fact one can prove that such a stationary point is a minimum, and thus this is often called Pontryagin's minimum principle.
  • A slightly different definition of the Hamiltonian would entail that this stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used.
  • This is a special case of Pontryagin's maximum principle, which establishes a similar result under more general terminal conditions (the final time is not necessarily fixed, and the terminal state can be fixed) and under milder assumptions (e.g. the control may be discontinuous).