Optimal Control and Dynamic Programming, 4SC000, Q2 2017-2018, Duarte Antunes



SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Part III

Continuous-time optimal control problems

SLIDE 3

Recap

Comparison of the two problem classes seen so far (discrete optimization problems / stage decision problems):

  • Formulation: transition diagram / dynamic system & additive cost function
  • DP algorithm: graphical DP algorithm & DP equation / DP equation
  • Partial information: - / Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle
  • Alternative algorithms: Dijkstra's algorithm / static optimization

SLIDE 4

Goals of part III

  • Introduce optimal control concepts for continuous-time optimal control problems
  • Analyze frequency-domain properties of continuous-time LQR/LQG

Comparison across the three problem classes (discrete optimization problems / stage decision problems / continuous-time control problems):

  • Formulation: transition diagram / discrete-time system & additive cost function / differential equations & additive cost function
  • DP algorithm: graphical DP algorithm & DP equation / DP equation / Hamilton-Jacobi-Bellman equation
  • Partial information: - / Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle / continuous-time Kalman filter and separation principle
  • Alternative algorithms: Dijkstra's algorithm / static optimization / Pontryagin's maximum principle

SLIDE 5

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 6

Continuous-time optimal control problems

Dynamic model: $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

The goal is to find an optimal path and an optimal policy.

Assumptions:
  • $x(t) \in \mathbb{R}^n$, $u(t) \in U \subseteq \mathbb{R}^m$
  • The differential equation has a unique solution in $[0, T]$
  • We assume that $f, g$ do not explicitly depend on time for simplicity; we could consider $f(t, x(t), u(t))$ and $g(t, x(t), u(t))$

SLIDE 7

Optimal path

  • A path $(u(t), x(t))$, $t \in [0, T]$, consists of a control input $u(t)$ and a corresponding solution $x(t)$ of the differential equation $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$.
  • A path is said to be optimal if there is no other path with a smaller cost $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$.
  • Choosing the control input can be seen as making decisions in infinitesimal time intervals which shape the derivative of the state (and thus determine its evolution up to the terminal state $x(T)$ at $t = T$).

SLIDE 8

Optimal policy

  • A policy is a function $\mu$ which maps states into actions at every time: $u(t) = \mu(t, x(t))$, $t \in [0, T]$.
  • A policy $\mu$ is said to be optimal if, for every state $x(t) = \bar x$ at every time $t$, the cost $\int_t^T g(x(s), \mu(s, x(s)))\,ds + g_T(x(T))$ coincides with the cost of the optimal path of the problem $\dot x(s) = f(x(s), u(s))$, $x(t) = \bar x$, $s \in [t, T]$, with cost $\int_t^T g(x(s), u(s))\,ds + g_T(x(T))$.
  • We denote the cost of the latter problem by $J(t, \bar x)$, the optimal cost-to-go.

SLIDE 9

Approach

  • Dynamic programming (DP) will allow us to compute optimal policies and optimal paths, and Pontryagin's maximum principle (PMP) will allow us to compute optimal paths.
  • However, obtaining these results in continuous time (CT) is mathematically involved.
  • To gain intuition, in both cases we will first discretize the problem as a function of the discretization step $\tau$ (previously the sampling period), apply DP, and take the limit as the discretization step converges to zero.

Schematically: CT control problem → (discretization, step $\tau$) → stage decision problem → (DT DP) → optimal path and policy → (taking the limit $\tau \to 0$) → CT DP → optimal path and policy of the CT problem.

SLIDE 10

Example

How to charge the capacitor in an RC circuit with minimum energy loss in the resistor?

[Figure: RC circuit with voltage source $u$, resistor $R$, current $i$, and capacitor $C$ with voltage $x$]

$\dot x(t) = \frac{1}{RC}(u(t) - x(t))$

$\min_{u(t)} \int_0^T \frac{(x(t) - u(t))^2}{R}\,dt$

subject to $x(0) = 0$ and the terminal constraint $x(T) = x_{\mathrm{desired}}$. Let us consider $R = C = T = x_{\mathrm{desired}} = 1$.

SLIDE 11

Discretization

Discretization times $t_k = k\tau$, $k = 0, \ldots, h$, with $h\tau = T$ and discretization step $\tau$.

Dynamic model: for $t \in [t_k, t_{k+1})$, with piecewise-constant input, the exact solution is
$x(t) = e^{-(t - t_k)} \underbrace{x(t_k)}_{x_k} + (1 - e^{-(t - t_k)}) \underbrace{u(t_k)}_{u_k}$
so that
$x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$

Cost function:
$\int_0^1 (x(t) - u(t))^2\,dt = \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} \left(e^{-(t - t_k)} x_k + (1 - e^{-(t - t_k)}) u_k - u_k\right)^2 dt = \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} e^{-2(t - t_k)}\,dt\,(x_k - u_k)^2 = \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$
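As a sanity check on the exact discretization above, the following sketch (Python; the horizon, step, and random input are illustrative choices) compares the continuous cost integral, computed by fine quadrature, against the closed-form sum $\sum_k \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$ for an arbitrary piecewise-constant input.

```python
import numpy as np

# Exact discretization of x' = -x + u with piecewise-constant input:
#   x_{k+1} = e^{-tau} x_k + (1 - e^{-tau}) u_k
rng = np.random.default_rng(0)
tau, h = 0.1, 10                      # h * tau = T = 1
u = rng.standard_normal(h)            # arbitrary piecewise-constant input
x = np.zeros(h + 1)
for k in range(h):
    x[k + 1] = np.exp(-tau) * x[k] + (1 - np.exp(-tau)) * u[k]

# Closed-form cost from the slide.
closed = np.sum((1 - np.exp(-2 * tau)) / 2 * (x[:h] - u) ** 2)

# Fine midpoint quadrature of the continuous cost, using the exact x(t)
# inside each sampling interval.
N = 1000                              # sub-samples per interval
cost = 0.0
for k in range(h):
    s = np.linspace(0, tau, N, endpoint=False) + tau / (2 * N)
    xt = np.exp(-s) * x[k] + (1 - np.exp(-s)) * u[k]
    cost += np.sum((xt - u[k]) ** 2) * (tau / N)
```

The two numbers agree to quadrature accuracy, confirming that the discretization introduces no approximation error for piecewise-constant inputs.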

SLIDE 12

From terminal constraint to terminal cost

The framework of stage decision problems does not take terminal constraints into account. Thus we apply a trick: a final control input $u(1)$ is applied at the terminal time, setting the state to the desired terminal value after $\Delta$ seconds, $x(1 + \Delta) = 1$. Since
$x(1 + \Delta) = e^{-\Delta} x(1) + (1 - e^{-\Delta}) u(1)$
this terminal control input is given by
$u(1) = \frac{1 - e^{-\Delta} x(1)}{1 - e^{-\Delta}}$

[Figure: trajectory $x(t)$ reaching $1$ at time $1 + \Delta$]

SLIDE 13

From terminal constraint to terminal cost

The following cost approximates the original one that we are interested in:
$\int_0^{1+\Delta} (x(t) - u(t))^2\,dt = \int_0^1 (x(t) - u(t))^2\,dt + \int_1^{1+\Delta} (x(t) - u(t))^2\,dt = \left(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\right) + \frac{1 - e^{-2\Delta}}{2}(x_h - u_h)^2 = \left(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\right) + \gamma(\Delta)(x_h - 1)^2$

where the last step uses the terminal control input $u_h = \frac{1 - e^{-\Delta} x_h}{1 - e^{-\Delta}}$ and the terminal cost weight
$\gamma(\Delta) = \frac{1 - e^{-2\Delta}}{2(1 - e^{-\Delta})^2}$

Note that $\gamma(\Delta) \to \infty$ as $\Delta \to 0$, but $\gamma(\Delta)(x_h - 1)^2 \to 0$ if $x_h \to 1$.

SLIDE 14

Dynamic programming

Applying DP:
$J_h(x_h) = \gamma(\Delta)(x_h - 1)^2$
$J_k(x_k) = \min_{u_k} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + J_{k+1}\left(e^{-\tau} x_k + (1 - e^{-\tau}) u_k\right)$

Results in
$u_k = K_k x_k + \alpha_k, \qquad J_k(x_k) = \theta_k x_k^2 + \gamma_k x_k + \beta_k$
obtained from Riccati equations.

Example: $\tau = 0.2$, $\Delta = 0.01$. [Plots: $x(t)$ and $u(t)$ over $t \in [0, 1]$]
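The recursion above can be run numerically. A minimal sketch (Python; $\tau = 0.01$ and $\Delta = 0.001$ are illustrative choices, and recovering the quadratic coefficients by fitting three exact samples is a convenience rather than part of the method) propagates the quadratic cost-to-go backward and simulates the closed loop forward from $x_0 = 0$:

```python
import numpy as np

# Backward DP for the discretized RC problem:
#   x_{k+1} = a x_k + b u_k, stage cost c (x_k - u_k)^2,
#   terminal cost gam (x_h - 1)^2.
tau, Delta = 0.01, 0.001
h = round(1 / tau)                        # h steps, h * tau = T = 1
a, b = np.exp(-tau), 1 - np.exp(-tau)
c = (1 - np.exp(-2 * tau)) / 2
gam = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)

coef = np.array([gam, -2 * gam, gam])     # J_h(x) = gam (x-1)^2 as [theta, gamma_k, beta]
policies = []
for k in range(h - 1, -1, -1):
    th, gc = coef[0], coef[1]
    # Minimizer of c (x-u)^2 + J_{k+1}(a x + b u), quadratic in u:
    ustar = lambda x, th=th, gc=gc: ((c - th * a * b) * x - gc * b / 2) / (c + th * b ** 2)
    V = lambda x: c * (x - ustar(x)) ** 2 + np.polyval(coef, a * x + b * ustar(x))
    policies.append(ustar)
    # J_k is again quadratic; recover its coefficients from three exact samples.
    coef = np.polyfit([-1.0, 0.0, 1.0], [V(-1.0), V(0.0), V(1.0)], 2)
policies.reverse()

x = np.zeros(h + 1)                       # forward pass from x_0 = 0
for k in range(h):
    x[k + 1] = a * x[k] + b * policies[k](x[k])
```

The resulting trajectory is close to the limit discussed on the next slides, $x(t) = t$ with $u(0) \approx 1$.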

SLIDE 15

Taking the limit $\tau \to 0$

Seems to be converging to $x(t) = t$, $u(t) = 1 + t$. Later we will prove this.

[Plots: $x(t)$ and $u(t)$ for $(\tau = 0.05, \Delta = 0.01)$, $(\tau = 0.01, \Delta = 0.01)$, and $(\tau = 0.01, \Delta = 0.001)$]

SLIDE 16

Static optimization

$\min_{u_0, \ldots, u_{h-1}} \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$
s.t. $x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$, $x_0 = 0$, $x_h = 1$

This is a static optimization problem, which can handle constraints. Lagrangian:

$L(x_1, u_0, \lambda_1, \ldots, x_{h-1}, u_{h-1}, \lambda_h) = \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + \sum_{k=0}^{h-1} \lambda_{k+1}\left(e^{-\tau} x_k + (1 - e^{-\tau}) u_k - x_{k+1}\right)$

The necessary optimality conditions amount to solving a linear system:
$\frac{\partial L}{\partial x_k} = 0$: $\lambda_k = (1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1} e^{-\tau}$, $k \in \{1, \ldots, h-1\}$
$\frac{\partial L}{\partial u_k} = 0$: $0 = -(1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1}(1 - e^{-\tau})$, $k \in \{0, \ldots, h-1\}$
$\frac{\partial L}{\partial \lambda_{k+1}} = 0$: $x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$, $k \in \{0, \ldots, h-1\}$
with $x_0 = 0$, $x_h = 1$.
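This linear system can be assembled and solved directly. A sketch (Python; $\tau = 0.1$ is an illustrative choice, and $x_0 = 0$, $x_h = 1$ are eliminated rather than kept as unknowns). A noteworthy outcome, which can be checked by hand from the conditions, is that the multipliers are constant and the solution satisfies $x_k = k\tau$ exactly, matching the limit $x(t) = t$.

```python
import numpy as np

tau = 0.1
h = round(1 / tau)                       # h * tau = T = 1
aE, bE = np.exp(-tau), 1 - np.exp(-tau)
c = 1 - np.exp(-2 * tau)                 # coefficient in the stationarity conditions

# Unknowns z = [x_1..x_{h-1}, u_0..u_{h-1}, lam_1..lam_h].
n = (h - 1) + h + h
A = np.zeros((n, n))
rhs = np.zeros(n)
ix = lambda k: k - 1                     # x_k,   1 <= k <= h-1
iu = lambda k: (h - 1) + k               # u_k,   0 <= k <= h-1
il = lambda k: (h - 1) + h + (k - 1)     # lam_k, 1 <= k <= h
row = 0
for k in range(1, h):                    # dL/dx_k = 0
    A[row, ix(k)] += c
    A[row, iu(k)] -= c
    A[row, il(k + 1)] += aE
    A[row, il(k)] -= 1.0
    row += 1
for k in range(h):                       # dL/du_k = 0
    if k >= 1:
        A[row, ix(k)] -= c
    A[row, iu(k)] += c
    A[row, il(k + 1)] += bE
    row += 1
for k in range(h):                       # dynamics, dL/dlam_{k+1} = 0
    if k + 1 <= h - 1:
        A[row, ix(k + 1)] -= 1.0
    else:
        rhs[row] += 1.0                  # x_h = 1 moved to the right-hand side
    if k >= 1:
        A[row, ix(k)] += aE
    A[row, iu(k)] += bE
    row += 1
z = np.linalg.solve(A, rhs)
xs = np.concatenate(([0.0], z[:h - 1], [1.0]))
```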

SLIDE 17

Taking the limit $\tau \to 0$

Again, seems to be converging to $x(t) = t$, $u(t) = 1 + t$.

[Plots: $x(t)$ and $u(t)$ for $\tau = 0.2$, $\tau = 0.05$, and $\tau = 0.01$]

SLIDE 18

Discussion

  • In this lecture we follow this discretization approach (the more formal continuous-time approach can be found in Bertsekas' book) to derive the counterpart of DP for continuous-time control problems, which is the Hamilton-Jacobi-Bellman equation.
  • Later we will use both the discretization approach and the continuous-time approach to derive Pontryagin's maximum principle.
  • With such tools we will be able to establish the optimal solution for charging the capacitor, and solve many other problems.

Schematically: CT control problem → (discretization, step $\tau$) → stage decision problem → (DT DP, DT PMP) → optimal path and policy → (taking the limit $\tau \to 0$) → CT DP, CT PMP → optimal path and policy of the CT problem.

SLIDE 19

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 20

Discretization approach

Discretization times $t_k = k\tau$, $k = 0, \ldots, h$, with $h\tau = T$ and discretization step $\tau$; $x_k = x(k\tau)$, $u_k = u(k\tau)$.

Dynamic model: $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$, discretized as
$x_{k+1} = x_k + \tau f(x_k, u_k)$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$, discretized as
$\sum_{k=0}^{h-1} g(x_k, u_k)\tau + g_h(x_h), \qquad g_h(x) = g_T(x), \ \forall x$

  • Note that these are approximate discretizations. We could have considered exact discretization, as in the linear case, but this approximation will suffice.

SLIDE 21

Dynamic programming

DP equations for the resulting stage decision problem:
$J_h(x_h) = g_h(x_h)$
$J_k(x_k) = \min_{u_k \in U}\, g(x_k, u_k)\tau + J_{k+1}\left(x_k + \tau f(x_k, u_k)\right), \quad k \in \{h-1, \ldots, 0\}$

For convenience let us define
$\bar J(t, x) = J_k(x), \quad t \in [k\tau, (k+1)\tau), \qquad \bar J(h\tau, x) = J_h(x), \ \forall x$

Then the dynamic programming algorithm can be written as
$\bar J(h\tau, x) = g_h(x), \ \forall x$
$\bar J(k\tau, x) = \min_{u \in U}\, g(x, u)\tau + \bar J\left((k+1)\tau, x + \tau f(x, u)\right), \quad k \in \{h-1, \ldots, 0\}, \ \forall x$
SLIDE 22

Taking the limit $\tau \to 0$

Using a first-order Taylor series expansion,
$\bar J\left((k+1)\tau, x + \tau f(x, u)\right) = \bar J(k\tau, x) + \tau\left(\frac{\partial}{\partial t}\bar J(k\tau, x) + \frac{\partial}{\partial x}\bar J(k\tau, x) f(x, u)\right) + o(\tau)$

and replacing in the DP algorithm, we obtain
$\bar J(k\tau, x) = \min_{u \in U}\, g(x, u)\tau + \bar J(k\tau, x) + \tau\left(\frac{\partial}{\partial t}\bar J(k\tau, x) + \frac{\partial}{\partial x}\bar J(k\tau, x) f(x, u)\right) + o(\tau)$

Assuming that (wishful thinking...) as $\tau \to 0$, $\bar J(t, x)$ converges to a continuously differentiable function, then cancelling $\bar J(k\tau, x)$ on both sides, dividing by $\tau$, and letting $\tau \to 0$ yields
$0 = \min_{u \in U}\, g(x, u) + \frac{\partial}{\partial t}\bar J(t, x) + \frac{\partial}{\partial x}\bar J(t, x) f(x, u)$

SLIDE 23

Theorem (HJB)

Suppose that $V(t, x)$ is continuously differentiable in $t$ and $x$, and is such that it satisfies the Hamilton-Jacobi-Bellman equation
$0 = \min_{u \in U}\, g(x, u) + \frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x) f(x, u), \quad \forall t, x$
with $V(T, x) = g_T(x)$.

Suppose also that $u = \mu(t, x)$ attains the minimum in the HJB equation for all $t, x$.

Then $V(t, x)$ coincides with the optimal cost-to-go $J(t, x)$ and $\mu(t, x)$ coincides with the optimal policy.

SLIDE 24

Discussion

  • The HJB equation is a partial differential equation.
  • The intuitive arguments provided before show that this partial differential equation is just an extension of the DP algorithm.
  • The bottleneck of such intuitive arguments is how to establish that the cost-to-go is differentiable.
  • The formal proof uses a different argument, following a continuous-time approach. It can be found in Bertsekas' book, p. 111.
  • Partial differential equations are in general very hard to solve analytically.
  • We are going to apply the HJB equation first to a simple example, then to linear systems, and solve the previous problem of charging a capacitor.

SLIDE 25

Example

For the simple problem*
dynamics: $\dot x(t) = u(t)$, $u(t) \in U := [-1, 1]$, $t \in [0, T]$
cost: $\frac{1}{2}(x(T))^2$
the HJB equation is
$0 = \min_{u \in [-1, 1]} \frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x)\,u$
with the terminal condition $V(T, x) = \frac{1}{2}x^2$.

Approach: find a candidate for optimality and check that it satisfies HJB.

* Example taken from Bertsekas' book, p. 112.

SLIDE 26

Example

There is an obvious candidate for optimality: move the state towards zero as quickly as possible,
$\mu^*(t, x) = -\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x > 0, \end{cases}$
and for an initial time $t$ and initial state $x$, the cost is given by
$J^*(t, x) = \frac{1}{2}\left(\max\{0, |x| - (T - t)\}\right)^2$

[Figure: optimal trajectories in the $(t, x)$ plane, moving at unit rate towards zero between the lines $x = T - t$ and $x = -(T - t)$]
slide-27
SLIDE 27

23

Example

This function satisfies the terminal condition of the HJB theorem J∗(T, x) = 1 2x2 satisfies the HJB equation 0 = min

u∈[−1,1][1 + sgn(x)u]max{0, |x| − (T − t)}

µ ∗ (t, x) = u = −sign(x) where the minimum in the HJB equation is achieved by (not unique when ) |x(t)| ≤ T − t Then this is an optimal policy.

∂ ∂xJ∗(t, x) = sign(x) max{0, |x| − (T − t)} ∂ ∂tJ∗(t, x) = max{0, |x| − (T − t)}
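This analytic solution can be cross-checked with grid-based discretized DP. A sketch (Python; the grid range $[-2, 2]$, the state step equal to $\tau$ so that $x + \tau u$ stays on the grid, and the restriction of $U$ to $\{-1, 0, 1\}$ are assumed simplifications; the restriction is harmless here since the optimal input only takes those values):

```python
import numpy as np

# Grid DP for dx/dt = u, u in {-1, 0, 1}, stage cost g = 0,
# terminal cost x^2 / 2 (the example above), horizon T = 1.
T, tau = 1.0, 0.01
h = round(T / tau)
xg = np.arange(-200, 201) * tau            # state grid on [-2, 2], step = tau
J = 0.5 * xg ** 2                          # J_h(x) = g_T(x) = x^2 / 2
for k in range(h):
    Jp = np.concatenate((J[1:], J[-1:]))   # value after u = +1 (edge clamped)
    Jm = np.concatenate((J[:1], J[:-1]))   # value after u = -1 (edge clamped)
    J = np.minimum(np.minimum(Jm, J), Jp)  # Bellman backup with g = 0

J_analytic = 0.5 * np.maximum(0.0, np.abs(xg) - T) ** 2
```

Because each control moves the state by exactly one grid cell, the numerical cost-to-go at $t = 0$ matches $J^*(0, x) = \frac{1}{2}(\max\{0, |x| - T\})^2$ on the grid.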

SLIDE 28

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 29

Linear systems, quadratic cost

Dynamic model: $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$

Cost function: $x(T)^\top Q_T x(T) + \int_0^T \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt$, with $\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} > 0$

Inspired by the fact that a discretization-based approach would result in quadratic costs-to-go, let us try $V(t, x) = x^\top P(t) x$. If such a function satisfies the HJB equation, it is the cost-to-go!

HJB:
$0 = \min_{u \in \mathbb{R}^m}\left[x^\top Q x + 2x^\top S u + u^\top R u + \frac{\partial V(t, x)}{\partial t} + \frac{\partial V(t, x)}{\partial x}(Ax + Bu)\right], \qquad V(T, x) = x^\top Q_T x$
SLIDE 30

Linear systems, quadratic cost

The HJB equation then takes the form
$0 = \min_{u \in \mathbb{R}^m}\left[x^\top Q x + 2x^\top S u + u^\top R u + x^\top \dot P(t) x + 2x^\top P(t)Ax + 2x^\top P(t)Bu\right]$

To obtain the minimum, differentiate with respect to $u$ and equate to zero,
$2(B^\top P(t) + S^\top)x + 2Ru = 0 \quad \Rightarrow \quad u = -R^{-1}(B^\top P(t) + S^\top)x$
which leads to
$0 = x^\top\left(\dot P(t) + P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right)x, \quad \forall x$
which is only satisfied if
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$

We have concluded that if $P(t)$ satisfies this Riccati equation, then $J(t, x) = x^\top P(t)x$ is the cost-to-go and $\mu(t, x) = K(t)x$, with $K(t) = -R^{-1}(B^\top P(t) + S^\top)$, is the optimal policy.

SLIDE 31

Finite horizon quadratic control

Finite horizon. The optimal control policy for the problem
$\min_u \int_0^T \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt + x(T)^\top Q_T x(T)$
subject to $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$, is
$u(t) = K(t)x(t), \qquad K(t) = -R^{-1}(B^\top P(t) + S^\top)$
where $P(t)$ is the unique solution of the Riccati equation
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$

Moreover, the optimal cost is given by $x_0^\top P(0) x_0$.
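The Riccati differential equation can be integrated backward from $P(T) = Q_T$ with a standard ODE solver. A sketch (Python/SciPy; the double-integrator $A$, $B$ and weights $Q = I$, $R = 1$, $S = 0$, $Q_T = 0$, $T = 10$ are assumed for illustration; with $S = 0$ the quadratic term reduces to $P B R^{-1} B^\top P$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of P' = -(P A + A'P - P B R^{-1} B' P + Q), P(T) = QT,
# for an assumed double-integrator example.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.array([[1.0]])                 # R = 1, so R^{-1} = 1
QT = np.zeros((2, 2))
T = 10.0

def riccati_rhs(t, p):
    P = p.reshape(2, 2)
    dP = -(P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P + Q)
    return dP.ravel()

# Integrate from t = T back to t = 0 (note the decreasing time span).
sol = solve_ivp(riccati_rhs, [T, 0.0], QT.ravel(), rtol=1e-9, atol=1e-11)
P0 = sol.y[:, -1].reshape(2, 2)
K0 = -Rinv @ B.T @ P0                    # optimal gain at t = 0
```

For this long horizon $P(0)$ is close to the stationary solution of the corresponding algebraic Riccati equation, which for these weights works out by hand to $\begin{bmatrix}\sqrt{3} & 1 \\ 1 & \sqrt{3}\end{bmatrix}$.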

SLIDE 32

Linear Quadratic Regulator

Infinite horizon. Suppose $(A, B)$ is controllable and $\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} > 0$. The optimal policy for the problem
$\min_u \int_0^\infty \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt$
subject to $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$, is
$u(t) = Kx(t), \qquad K = -R^{-1}(B^\top P + S^\top)$
where $P$ is the unique positive definite solution to the algebraic Riccati equation
$0 = PA + A^\top P - (PB + S)R^{-1}(B^\top P + S^\top) + Q$

Moreover, the closed-loop matrix $(A + BK)$ has all its eigenvalues in the left-half complex plane and the optimal cost is given by $x_0^\top P x_0$. The reasoning follows from arguments similar to those used in the context of stage decision problems.
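The algebraic Riccati equation can be solved numerically, for instance with SciPy. A sketch for the same assumed double-integrator example with $S = 0$ (`scipy.linalg.solve_continuous_are` also accepts a cross-weight through its optional `s` argument):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solves A'P + PA - P B R^{-1} B' P + Q = 0 for the stabilizing P.
P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.inv(R) @ B.T @ P          # u = K x
cl_eigs = np.linalg.eigvals(A + B @ K)   # closed-loop eigenvalues
```

The returned $P$ is the stabilizing solution, so all closed-loop eigenvalues lie in the left-half plane, as the theorem states.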

SLIDE 33

Charging a capacitor

Applying a trick allows us to cast our problem, $\dot x(t) = -x(t) + u(t)$ with cost $\int_0^1 (x(t) - u(t))^2\,dt + \gamma(x(1) - 1)^2$, in the standard LQR formulation: augment the state with a constant $y(t)$, $\dot y(t) = 0$, $y(0) = 1$.

Dynamic model:
$\begin{bmatrix} \dot x(t) \\ \dot y(t) \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}}_{A} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}_{B} u(t), \qquad \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} x_0 \\ 1 \end{bmatrix}$

Cost function:
$\int_0^1 \begin{bmatrix} x(t) & y(t) \end{bmatrix} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}}_{Q} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} + 2\begin{bmatrix} x(t) & y(t) \end{bmatrix} \underbrace{\begin{bmatrix} -1 \\ 0 \end{bmatrix}}_{S} u(t) + \underbrace{1}_{R}\,u(t)^2\, dt + \begin{bmatrix} x(1) & y(1) \end{bmatrix} \underbrace{\begin{bmatrix} \gamma & -\gamma \\ -\gamma & \gamma \end{bmatrix}}_{Q_T} \begin{bmatrix} x(1) \\ y(1) \end{bmatrix}$
SLIDE 34

Riccati equations

The Riccati equations
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$
with $P(t) = \begin{bmatrix} p_1(t) & p_2(t) \\ p_2(t) & p_3(t) \end{bmatrix}$ boil down to
$\begin{bmatrix} \dot p_1 & \dot p_2 \\ \dot p_2 & \dot p_3 \end{bmatrix} = -\left( \begin{bmatrix} p_1 & p_2 \\ p_2 & p_3 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} p_1 & p_2 \\ p_2 & p_3 \end{bmatrix} - \begin{bmatrix} p_1 - 1 \\ p_2 \end{bmatrix} \begin{bmatrix} p_1 - 1 & p_2 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right)$
or equivalently to the non-linear differential equations
$\dot p_1(t) = 2p_1(t) + (p_1(t) - 1)^2 - 1 = p_1(t)^2$
$\dot p_2(t) = p_2(t) + p_2(t)(p_1(t) - 1) = p_1(t)p_2(t)$
$\dot p_3(t) = p_2(t)^2$
with $p_1(1) = -p_2(1) = p_3(1) = \gamma$, whose solution is (solution method not addressed here)
$p_1(t) = -p_2(t) = p_3(t) = \frac{1}{1 + \frac{1}{\gamma} - t}$

SLIDE 35

Optimal policy and optimal path

Optimal policy (using $p_2(t) = -p_1(t)$ and $y(t) = 1$):
$u(t) = -R^{-1}(B^\top P(t) + S^\top)\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} -(p_1(t) - 1) & -p_2(t) \end{bmatrix}\begin{bmatrix} x(t) \\ 1 \end{bmatrix} = -(p_1(t) - 1)x(t) + p_1(t) = -p_1(t)(x(t) - 1) + x(t)$

Optimal path for $x(0) = 0$: the closed loop is
$\dot x(t) = -x(t) + u(t) = -p_1(t)(x(t) - 1), \qquad p_1(t) = \frac{1}{1 + \frac{1}{\gamma} - t}$
whose solution is
$x(t) = \frac{t - (1 + \frac{1}{\gamma})}{1 + \frac{1}{\gamma}} + 1 = \frac{t}{1 + \frac{1}{\gamma}}$

Letting the parameter $\Delta$ of the artificial terminal cost converge to zero ($\gamma \to \infty$) we obtain
$x(t) = t, \qquad u(t) = 1 + t$
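Both closed-form expressions can be checked numerically. A sketch (Python/SciPy; $\gamma = 100$ is an illustrative value) integrates $\dot p_1 = p_1^2$ backward from $p_1(1) = \gamma$ and the closed loop $\dot x = -p_1(t)(x - 1)$ forward from $x(0) = 0$, comparing against $p_1(t) = 1/(1 + 1/\gamma - t)$ and $x(t) = t/(1 + 1/\gamma)$:

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma = 100.0
p1 = lambda t: 1.0 / (1.0 + 1.0 / gamma - t)

# Riccati check: integrate p1' = p1^2 backward from p1(1) = gamma.
rsol = solve_ivp(lambda t, p: p ** 2, [1.0, 0.0], [gamma],
                 t_eval=np.linspace(1.0, 0.0, 11), rtol=1e-10, atol=1e-12)

# Closed-loop path: x' = -p1(t) (x - 1), x(0) = 0.
xsol = solve_ivp(lambda t, x: -p1(t) * (x - 1.0), [0.0, 1.0], [0.0],
                 t_eval=np.linspace(0.0, 1.0, 11), rtol=1e-10, atol=1e-12)
```

The numerical solutions match the closed forms, and $x(1) = \gamma/(\gamma + 1)$ approaches the desired terminal value $1$ as $\gamma \to \infty$.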

SLIDE 36

Discussion

  • The HJB equation is a partial differential equation and an analytical solution is very hard to find.
  • For problems with linear models and quadratic costs, computing the optimal policy and optimal paths involves solving non-linear differential equations (Riccati equations).
  • We were able to solve these Riccati equations since the dimension of the state-space in our example was small.
  • The approach based on Pontryagin's maximum principle will lead to different conditions which can be applied to more cases.
  • We will later consider stochastic disturbances, but the advantages of having a policy are exactly the same as for stage decision problems.

SLIDE 37

Concluding remarks

  • The counterpart of DP for stage-decision problems is the HJB equation.
  • This is a partial differential equation, very hard to solve in general.
  • However, for linear systems we can solve it, and this leads to the Riccati equations.
  • As for discrete-time optimal control problems, this leads to an algebraic Riccati equation (LQR in continuous time) when the horizon is infinite.

Summary: after this lecture you should be able to:

  • Compute the optimal policy and optimal path for problems with a linear model and finite-horizon quadratic cost (Riccati equations).
  • Compute the optimal policy for problems with linear models and infinite-horizon quadratic cost.
  • Solve the algebraic Riccati equation analytically when the dimension of the state-space is small.