SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Motivation

• Last two lectures we discussed two alternative methods to DP for stage decision problems: discretization and static optimization.

• Discretization allows us to obtain an optimal policy (approximately) when the dimension of the problem (state and input dimension) is small, and static optimization allows us to obtain an optimal path when the problem is convex:

   dimension / convexity | DP (discretization) | Static optimization
   small, convex         | applicable (n ≤ 3)  | applicable
   small, non-convex     | applicable (n ≤ 3)  | —
   large, convex         | —                   | applicable
   large, non-convex     | —                   | —

In this lecture:

• we show how to obtain optimal policies using static optimization and certainty equivalent control (lec 3, slide 33), applying this to solve linear quadratic control with input constraints.

• we discuss related approximate dynamic programming techniques such as MPC and rollout, applying them to a non-convex, large-dimensional problem (control of switched linear systems).

SLIDE 3

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 4

Challenge

Iterating the dynamic programming algorithm for stage decision problems is hard since it is hard to compute the costs-to-go (see e.g. slide 11 of lecture 5):

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k))

For each x_k it is hard to minimize the function g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) over u_k, and it is hard to obtain an expression for the cost-to-go J_{k+1}, which is typically unknown (known only for the terminal stage).

• Moreover:
  • discretization is only possible if the state and input spaces are not large.
  • static optimization only assures optimality for convex problems and only allows us to compute optimal paths.
• How can we obtain a (sub)optimal policy?
SLIDE 5

Idea I - Certainty equivalent control

Compute the optimal path online starting from the current state x_k and apply the first decision. Repeat this procedure at every stage, for the current state (x_{k+1} at stage k + 1).

[Figure: optimization over the remaining horizon, from stage k to h, repeated from stage k + 1.]

SLIDE 6

Idea II - MPC

Similar to Idea I but compute decisions only in a (short) horizon H: compute the optimal path over the horizon starting from the current state x_k and apply the first decision. Repeat this procedure in a receding/rolling horizon way, for the current state x_{k+1}.

• The optimization problem to solve at each stage is then much simpler and makes the algorithm feasible to run online. This is the fundamental idea of Model Predictive Control (MPC).

• Note that this is (in general) not optimal, not even for problems without disturbances.

[Figure: optimization over the window from stage k to k + H, shifted by one stage at stage k + 1.]

SLIDE 7

Idea III - Rollout

Similar to Ideas I and II but after the optimization horizon use a base policy: compute the optimal path over the horizon starting from the current state x_k (assuming the base policy is applied afterwards) and apply the first decision of the optimization procedure. Repeat this procedure in a receding/rolling horizon way, for the current state x_{k+1}.

• The optimization problem to solve at each stage is then much simpler and makes the algorithm feasible to run online. This is the fundamental idea of Rollout.

• Note that this is (in general) not optimal, not even for problems without disturbances.

[Figure: optimization over the window from stage k to k + H, with the base policy used from k + H to h.]

SLIDE 8

Approximate dynamic programming

Approximate the cost-to-go and use the solution of the following equation as the control input (in general different from the optimal control input!):

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(f_k(x_k, u_k))

Here J̃_{k+1} approximates the cost-to-go J_{k+1}, which is typically unknown, whereas J̃_{k+1} is known by construction. For each state x_k we just need to minimize the function g_k(x_k, u_k) + J̃_{k+1}(f_k(x_k, u_k)) over u_k and compute one action u*_k. More general than the previous ideas! A minimal sketch of this one-step minimization is given below.
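As a concrete illustration, a minimal Matlab sketch of one ADP step for a scalar input chosen from a finite grid; the handles g, f, Jtilde and the grid ugrid are illustrative assumptions, not part of the lecture code:

function ustar = adp_step(g, f, Jtilde, xk, ugrid)
% One ADP step: minimize g(x,u) + Jtilde(f(x,u)) over a grid of candidate inputs
vals = arrayfun(@(u) g(xk, u) + Jtilde(f(xk, u)), ugrid);
[~, imin] = min(vals);
ustar = ugrid(imin);   % (sub)optimal action, computed for the current state only
end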

SLIDE 9

Discussion

• Ideas I, II, III can be seen as special cases of approximate dynamic programming.

• Idea I - for problems with disturbances, the true cost (requiring the computation of expected values) is approximated by a deterministic cost. For problems without disturbances, Idea I yields the same policy as DP.

• Idea II - the approximation is the deterministic cost over a short horizon. This is an approximation even for problems without disturbances.

• Idea III - the approximation is the cost of the base policy when H = 1, and it is the cost over a short horizon plus the cost of the base policy if the horizon is larger than one. Again this is an approximation even for problems without disturbances.

• Although we mainly consider finite horizon costs, the ideas extend naturally to infinite horizon costs.

We formalize these statements next.

SLIDE 10

Certainty equivalent control

Certainty equivalent control is the following implicit policy for a stage decision problem:

1. At each stage k (initially k = 0), assuming that the full state x_k is measured, solve the following problem (for initial condition x_k), assuming no disturbances (w_k = 0, w_{k+1} = 0, ...):

      min sum_{ℓ=k}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)
      s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, 0),  ℓ ∈ {k, k+1, ..., h−1}

   and obtain estimates of the optimal control inputs u_k, u_{k+1}, ..., u_{h−1}.

2. Take the first decision u_k and apply it to the process x_{k+1} = f_k(x_k, u_k, w_k) (the system will evolve to a state x_{k+1} in general different from the predicted one due to disturbances). Repeat (go to 1) for k = 1, 2, ....

SLIDE 11

Certainty equivalent control

For problems with no disturbances, this is equivalent to

   min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})
   s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, 0),  ℓ ∈ {k, k+1, ..., h−1}   (1)

where

   J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{h−1}} sum_{ℓ=k+1}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)   s.t. (1), initial condition x_{k+1}

and we select the first u_k and apply it. Note that, for problems without disturbances, the J̃_k are the optimal costs-to-go and satisfy the DP equation

   J̃_k(x_k) = min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1}).

For problems with stochastic disturbances x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, w_ℓ), ℓ ∈ {k, k+1, ..., h−1}, the J̃_k are approximations of the optimal costs-to-go

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + E[J_{k+1}(x_{k+1})],

with E[J_{k+1}(x_{k+1})] approximated by J̃_{k+1}(x_{k+1}).

SLIDE 12

Model Predictive Control

• At each time k consider the optimal control problem only in a horizon H:

      min_{u_k, u_{k+1}, ..., u_{k+H−1}} sum_{ℓ=k}^{k+H−1} g_ℓ(x_ℓ, u_ℓ)
      s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ),  ℓ ∈ {k, k+1, ..., k+H−1}, initial condition x_k   (1)

  and select the first control input u_k resulting from this optimization.

• Equivalent to

      min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})   s.t. (1)

  where

      J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{k+H−1}} sum_{ℓ=k+1}^{k+H−1} g_ℓ(x_ℓ, u_ℓ)   s.t. (1), ℓ ∈ {k+1, k+2, ..., k+H−1}.

SLIDE 13

Model Predictive Control

• Note that at the last decision stages (k > h − H) the cost function is slightly different for finite horizon problems, including also the terminal cost:

      min_{u_k, ..., u_{h−1}} sum_{ℓ=k}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)   s.t. (1), ℓ ∈ {k, k+1, ..., h−1}.

• There are several variants of Model Predictive Control and in particular some variants use a terminal constraint (this is useful to prove stability). For example, impose that the state after the horizon must be zero, x_{k+H} = 0 (for finite horizon problems at the last stages, x_h = 0).

SLIDE 14

Rollout

• At each time k consider the optimal control problem in a horizon H, assuming that after the horizon a base policy u_ℓ = μ̄_ℓ(x_ℓ) is used:

      min_{u_k, u_{k+1}, ..., u_{k+H−1}} sum_{ℓ=k}^{k+H−1} g_ℓ(x_ℓ, u_ℓ) + sum_{ℓ=k+H}^{h−1} g_ℓ(x_ℓ, μ̄_ℓ(x_ℓ)) + g_h(x_h)
      s.t. (1), ℓ ∈ {k, k+1, ..., k+H−1}, initial condition x_k

  and select the first control input u_k resulting from this optimization. Similar to MPC, but with a base policy used after the horizon.

• Equivalent to

      min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})   s.t. (1)

  where

      J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{k+H−1}} sum_{ℓ=k+1}^{k+H−1} g_ℓ(x_ℓ, u_ℓ) + sum_{ℓ=k+H}^{h−1} g_ℓ(x_ℓ, μ̄_ℓ(x_ℓ)) + g_h(x_h)   s.t. (1), ℓ ∈ {k+1, k+2, ..., k+H−1}.

SLIDE 15

Further remarks on ADP

• The quality of an approximation is measured by how "good" u*_k is, that is, how close it is to the optimal input, and not by how "good" the approximation J̃_{k+1}(x_{k+1}) of J_{k+1}(x_{k+1}) is.

• Decisions u*_k only need to be computed (in real time) for the value of the present state (we do not need to iterate the cost-to-go).

• There are several variants to approximate the costs-to-go.

• Due to the heuristic nature of the approximation, it is very hard to quantify when a specific approximation method is good and to establish formal results.

• For example, we have seen that the optimal policy for the infinite horizon linear quadratic regulator problem makes the closed loop of a linear system stable. This is typically very hard to establish for approximate methods.

SLIDE 16

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 17

Problem formulation

Dynamic model: x_{k+1} = A x_k + B u_k (+ w_k for possible disturbances), with u_k ∈ R (for simplicity).

Input constraints: u_k ∈ [−c, c] for all k (given c > 0).

Cost function: sum_{k=0}^{h−1} x_k' Q x_k + u_k' R u_k + x_h' Q_h x_h.

Find a policy π = {μ_0, ..., μ_{h−1}}, u_k = μ_k(x_k), which minimizes the cost function. Let us focus first on obtaining an optimal path; then we will find a control policy with certainty equivalence and MPC.

SLIDE 18

Quadratic programming formulation

Define x̄ = [x_1' x_2' ... x_h']' and ū = [u_0' u_1' ... u_{h−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_0 + B̄ ū, with

   Ā = [A; A^2; ...; A^h],
   B̄ = [B 0 ... 0; AB B ... 0; A^2B AB B ...; ...; A^{h−1}B A^{h−2}B ... AB B]   (block lower triangular).

Cost function: (Ā x_0 + B̄ ū)' Q̄ (Ā x_0 + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q, Q_h) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

This is a standard quadratic programming problem (see, e.g., Convex Optimization, Stephen Boyd and Lieven Vandenberghe), implemented in Matlab with quadprog.m. Convex problem! This already allows us to obtain an optimal path; a small worked instance is sketched below.

SLIDE 19

Example

Double integrator (model: slide 20, Lec II_1). Parameters: x_0 = [1 0]', τ = 0.2, Q = Q_h = I, R = 1, h = 50; constraints u_k ∈ [−L, L], L = 0.25; state x = [y v]' (position and velocity).

Optimal path with no constraints (slide 30, Lec 5): run the matlab code on slides 25-28 with parameters optpolicy=1, fdisturbances=0.

Optimal path with constraints: run with parameters optpolicy=2, fdisturbances=0 (or optpolicy=3, fdisturbances=0!).

[Plots: position y(t), velocity v(t) and input u(t) versus time, for the unconstrained and constrained cases.]

SLIDE 20

Example

What if there are disturbances? The resulting behaviour is far from the desired one!

*Run the matlab code on slides 25-28 with parameters optpolicy=2, fdisturbances=1.*

We want an optimal policy! CEC! (and MPC)

[Plots: y(t), v(t) and u(t) when the precomputed open-loop path is applied under disturbances.]

SLIDE 21

Certainty equivalent control

At each stage k define x̄ = [x_{k+1}' x_{k+2}' ... x_h']' and ū = [u_k' u_{k+1}' ... u_{h−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_k + B̄ ū, with

   Ā = [A; A^2; ...; A^{h−k}],
   B̄ = [B 0 ... 0; AB B ... 0; ...; A^{h−1−k}B A^{h−2−k}B ... AB B].

Cost function: sum_{ℓ=k}^{h−1} x_ℓ' Q x_ℓ + u_ℓ' R u_ℓ + x_h' Q_h x_h = (Ā x_k + B̄ ū)' Q̄ (Ā x_k + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q, Q_h) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

Take the first component u_k of ū and apply it to the plant.

SLIDE 22

Example

Considering disturbances affecting the system, x_{k+1} = A x_k + B u_k + w_k (see the code on slide 25 for details on the disturbances), and applying certainty equivalent control we obtain:

*Run the matlab code on slides 25-28 with parameters optpolicy=3, fdisturbances=1.*

[Plots: y(t), v(t) and u(t) under certainty equivalent control with disturbances.]

SLIDE 23

Discussion

• Although quadprog.m is quite efficient, in problems with a high sampling rate or a long time horizon of interest it might not be possible to run quadprog.m online, and this motivates MPC.

• Another motivation for MPC is that the model is typically not extremely accurate, and considering a short horizon might even lead to better results.

• We shall try MPC imposing that the final state after the prediction horizon is zero. See Bertsekas' book, Ch. 6, for the close connection with Rollout (after the horizon the base policy is to apply zero control input!).

• CEC allowed us to transform an algorithm that computes optimal paths (using quadprog.m) into an algorithm that computes optimal policies.

• The fact that it works so well follows from the fact that the problem is convex: a problem with inequality constraints is convex if the cost is convex and the set of feasible solutions is convex, which can be shown to be the case for this problem.

SLIDE 24

MPC

At each stage k define x̄ = [x_{k+1}' x_{k+2}' ... x_{k+H−1}']' and ū = [u_k' u_{k+1}' ... u_{k+H−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_k + B̄ ū, with

   Ā = [A; A^2; ...; A^{H−1}],
   B̄ = [B 0 ... 0; AB B ... 0; A^2B AB B ...; ...; A^{H−2}B A^{H−3}B ... B].

Cost function: sum_{ℓ=k}^{k+H−1} x_ℓ' Q x_ℓ + u_ℓ' R u_ℓ = (Ā x_k + B̄ ū)' Q̄ (Ā x_k + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

Take the first component u_k of ū and apply it to the plant.

SLIDE 25

Example

First iteration (no disturbances). The plots show the state and input predicted at this iteration. Applying the first control input u_0 = −0.25 leads to the state x_1 = [0.995 −0.05]', and the process is repeated.

*Run the matlab code on slides 25-28 with parameters optpolicy=4, fdisturbances=0.*

[Plots: predicted y(t), v(t) and u(t) at the first iteration.]

SLIDE 26

Example

Note that the end result is different from the one predicted at iteration 0, because the state and input predictions change over time.

*Run the matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=0.*

[Plots: end result, y(t), v(t) and u(t).]

SLIDE 27

Example

Results with disturbances.

*Run the matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=1.*

[Plots: y(t), v(t) and u(t) under MPC with disturbances.]

SLIDE 28

Matlab code

% Model definition
optpolicy = 1;
fdisturbances = 0;
% double integrator model
Ac = [0 1; 0 0]; Bc = [0; 1]; tau = 0.2;
Q = [1 0; 0 1]; S = [0; 0]; R = 0.01; QT = [1 0; 0 1];
n = 2; m = 1; x0 = [1; 0]; c = 0.25;
sigmaw = 0.01;  % disturbance level
H = 25;         % prediction horizon for mpc
h = 50;         % simulation horizon
sysd = c2d(ss(Ac,Bc,zeros(1,n),0),tau);
A = sysd.a; B = sysd.b; C = sysd.c;
% Preliminaries for the policies
switch optpolicy
    case 1  % finite-horizon LQR gains via Riccati recursion
        P{h+1} = QT;
        for k = h:-1:1
            P{k} = A'*P{k+1}*A + Q - ...
                (S+A'*P{k+1}*B)*pinv(R+B'*P{k+1}*B)*(S'+B'*P{k+1}*A);
            K{k} = -pinv(R+B'*P{k+1}*B)*(S'+B'*P{k+1}*A);
        end
    case 2  % CEC without disturbances: one QP over the full horizon
        U2 = quadconstrainedcontrol(A,B,Q,R,c,h,x0,1);
    case 4  % MPC without disturbances: one QP over horizon H
        U3 = [quadconstrainedcontrol(A,B,Q,R,c,H,x0,2); zeros(h-H,1)];
end

SLIDE 29

Matlab code

% Simulate system
t = tau*(0:1:h);
x = zeros(n,h); u = zeros(1,h);
x(:,1) = x0;
randn('seed',15); % used for the results in the slides
if fdisturbances == 1
    gainnodisturbances = 1;
else
    gainnodisturbances = 0;
end
cost = 0;
for k = 1:h
    switch optpolicy
        case 1 % LQR
            u(:,k) = K{k}*x(:,k);
        case 2 % CEC no disturbances
            u(:,k) = U2(k);
        case 3 % CEC: re-solve the QP from the current state
            [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1);
            u(:,k) = u_(1);
        case 4 % MPC no disturbances
            u(:,k) = U3(k);
        case 5 % MPC
            if k <= h-H-1
                [u_] = quadconstrainedcontrol(A,B,Q,R,c,H,x(:,k),2);
                u(:,k) = u_(1);
            else % near the end: shrink the horizon, drop the terminal constraint
                [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1);
                u(:,k) = u_(1);
            end
    end
    w(:,k) = B*sigmaw*randn(1)*gainnodisturbances;
    x(:,k+1) = A*x(:,k) + B*u(:,k) + w(:,k);
    cost = cost + x(:,k)'*Q*x(:,k) + u(:,k)'*R*u(:,k);
end
v = x(1,:);

SLIDE 30

Matlab code

% Continuous-time simulation and plots
N = 1000; ts = tau/N; nl = h*N;
uc = kron(u,ones(1,N)); % zero-order hold of the discrete input
sysd = c2d(ss(Ac,Bc,zeros(1,size(Ac,1)),zeros(1,size(Bc,2))),ts);
Ad = sysd.a; Bd = sysd.b;
xc = zeros(n,nl+1); xc(:,1) = x(:,1);
for k = 1:h
    xc(:,(k-1)*N+1) = x(:,k);
    for l = 1:N
        xc(:,(k-1)*N+l+1) = Ad*xc(:,(k-1)*N+l) + Bd*u(:,k);
    end
end
tc = ts*(0:nl);
figure(1)
subplot(1,3,1)
plot(tc,xc(1,:))
set(gca,'Fontsize',18)
xlabel('t'); ylabel('y(t)'); grid on
axis([0 h*tau -0.2 1])
subplot(1,3,2)
plot(tc,xc(2,:))
set(gca,'Fontsize',18)
xlabel('t'); ylabel('v(t)'); grid on
subplot(1,3,3)
plot(tc(1:end-1),uc)
set(gca,'Fontsize',18)
xlabel('t'); ylabel('u(t)'); grid on

SLIDE 31

Matlab code

% Key function
function [u] = quadconstrainedcontrol(A,B,Q,R,L,h,x0,opt)
% opt = 1: CEC, opt = 2: MPC (with terminal constraint x_h = 0)
n = size(A,1); m = size(B,2);
% define the stacked matrices Abar, Bbar, Qbar, Rbar
Abar = zeros(n*h,n); Bbar = zeros(n*h,m*h);
Qbar = zeros(n*h,n*h); Rbar = zeros(m*h,m*h);
for i = 1:h
    Abar(1+(i-1)*n:n*i,:) = A^i;
    for j = 1:i
        Bbar(1+(i-1)*n:n*i,1+(j-1)*m:m*j) = A^(i-j)*B;
    end
    Qbar(1+(i-1)*n:n*i,1+(i-1)*n:n*i) = Q;
    Rbar(1+(i-1)*m:m*i,1+(i-1)*m:m*i) = R;
end
switch opt
    case 1 % CEC: input bounds only
        u = quadprog(Rbar+Bbar'*Qbar*Bbar,((Bbar'*Qbar*Abar)*x0)', ...
            [eye(m*h);-eye(m*h)],[L*ones(m*h,1);L*ones(m*h,1)],[],[]);
    case 2 % MPC: input bounds plus terminal equality constraint
        u = quadprog(Rbar+Bbar'*Qbar*Bbar,((Bbar'*Qbar*Abar)*x0)', ...
            [eye(m*h);-eye(m*h)],[L*ones(m*h,1);L*ones(m*h,1)], ...
            Bbar(end-n+1:end,:),-A^h*x0);
end
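A short usage example (values taken from the model definition on slide 25): computing the constrained optimal input sequence over the full horizon and applying its first element:

% CEC call over the full horizon h = 50 from x0 = [1;0]
sysd = c2d(ss([0 1;0 0],[0;1],zeros(1,2),0),0.2);
U = quadconstrainedcontrol(sysd.a, sysd.b, eye(2), 0.01, 0.25, 50, [1;0], 1);
u0 = U(1); % first decision, applied to the plant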

SLIDE 32

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 33

Switched linear systems

Dynamic model: x_{k+1} = A_{σ_k} x_k, where the control input is the switching signal u_k = σ_k ∈ {1, ..., m} and x_k ∈ R^n.

Cost function: sum_{k=0}^{h−1} x_k' Q_{σ_k} x_k + x_h' Q̄_h x_h.

Goal: find an optimal path (or policy) for the control input for a given initial condition x_0.

• Applying the dynamic programming algorithm would lead to a combinatorial problem.

• Let us apply approximate dynamic programming.

SLIDE 34

CEC

1. At step k compute all the possible control sequences (σ_k, σ_{k+1}, σ_{k+2}, ..., σ_{h−1}).

2. Compute the cost sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, σ̃_{k+2}, ..., σ̃_{h−1}) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

Still combinatorial! A brute-force sketch of one such step is given below.

SLIDE 35

MPC

1. At step k compute all possible control sequences in the prediction horizon: (σ_k, σ_{k+1}, ..., σ_{k+H−1}) if 0 ≤ k ≤ h − H, and (σ_k, σ_{k+1}, ..., σ_{h−1}) otherwise.

2. Compute the cost sum_{ℓ=k}^{k+H−1} x_ℓ' Q_{σ_ℓ} x_ℓ if 0 ≤ k ≤ h − H, and sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h otherwise, for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, ...) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

SLIDE 36

Rollout

Pick a base policy (σ̄_0, σ̄_1, ..., σ̄_{h−1}) and start with k = 0, H > 0.

1. At step k compute all the control sequences with σ_k, ..., σ_{k+H−1} free, taking the form (σ_k, ..., σ_{k+H−1}, σ̄_0, σ̄_1, ..., σ̄_{h−1−k−H}) if 0 ≤ k ≤ h − H, and (σ_k, σ_{k+1}, ..., σ_{h−1}) otherwise.

2. Compute the cost sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, ..., σ̃_{h−1}) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

A sketch of one rollout step follows.

SLIDE 37

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 38

Mixing

How to mix two fluids in minimum time*?

[Schematic: camera images feed the control law σ_k ∈ {1, ..., 4}, which drives the actuators.] Actuation: 4 possible rotations, decided once every h seconds.

*Mixing and switching, V.S. Dolk MSc thesis

SLIDE 39

Periodic open loop policy

The basic mechanism for effective mixing is repetitive stretching & folding. This basic mechanism can be seen as a periodic open-loop strategy. Can a closed-loop strategy perform better?

SLIDE 40

Model

Divide the domain into N subdomains and denote by x^i_k, i ∈ {1, ..., N}, the concentration of fluid in subdomain i at time kτ. Then the following equation describes how the concentrations evolve as a function of the control action:

   x_{k+1} = A_{σ_k} x_k,  σ_k ∈ {1, ..., 4}.

SLIDE 41

Cost function

• The concentrations of the subdomains will all be equal to the given constant c = (1/N)[1 1 ... 1] x_0 when homogeneity is achieved. This means that the following error will converge to zero:

   e_k := x_k − c [1; 1; ...; 1].

• The intensity of segregation is defined as e_k' e_k and measures how far the concentration is from homogeneity.

• We are interested in minimizing the intensity of segregation for every time step: sum_{k=0}^{h} e_k' e_k.
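In Matlab this quantity is a one-liner; a sketch, assuming x0 and xk are the length-N concentration vectors:

% Intensity of segregation at stage k
c  = mean(x0);          % homogeneous concentration, c = (1/N)*ones(1,N)*x0
ek = xk - c*ones(N,1);  % error with respect to homogeneity
Ik = ek'*ek;            % intensity of segregation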

SLIDE 42

Results

• The periodic sequence consists of alternating a rotation of the outer cylinder and a clockwise rotation of the inner cylinder.

• The rollout strategy consists of optimizing over a horizon of length 4, assuming that afterwards this periodic policy is used.

[Figures: mixing patterns obtained with the periodic and rollout strategies.]

SLIDE 43

Results

• An MPC approach with horizon H = 2 boils down to picking the decision minimizing the (squared) norm of the error at the next iteration (minimum error "greedy" strategy):

   argmin_{σ_k} e_k' e_k + e_{k+1}' e_{k+1} = argmin_{σ_k} e_k' e_k + e_k' A_{σ_k}' A_{σ_k} e_k = argmin_{σ_k} e_k' A_{σ_k}' A_{σ_k} e_k,

  using e_{k+1}' e_{k+1} = e_k' A_{σ_k}' A_{σ_k} e_k.

• In this case MPC with H = 2 leads to worse results.
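As a sketch, this greedy step is a single line in Matlab; Asig again denotes an assumed cell array with the four mode matrices and ek the current error vector:

% Greedy (H = 2) MPC step for the mixing problem
[~, sigma_k] = min(arrayfun(@(j) ek'*(Asig{j}'*Asig{j})*ek, 1:4));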

SLIDE 44

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 45

Introduction

• We show next that for switched linear systems x_{k+1} = A_{σ_k} x_k we can write explicit rollout and MPC policies.

• Although finite-horizon costs can be considered, this is actually easier if we consider infinite-horizon costs sum_{k=0}^{∞} x_k' Q_{σ_k} x_k.

• Note that certainty equivalent control in the case of infinite horizon switched systems problems would entail computing an infinite number of options, and is therefore not viable.

SLIDE 46

Cost of base policy

• Suppose that the base policy corresponds to a constant switching σ_k = j for all k, so that x_{k+1} = A_j x_k. Then it is easy to show that

   sum_{k=ℓ}^{∞} x_k' Q_j x_k = x_ℓ' M_j x_ℓ,

  where M_j = sum_{k=0}^{∞} (A_j')^k Q_j (A_j)^k is the solution to the linear system A_j' M_j A_j − M_j = −Q_j.

• If the base policy corresponds to periodic switching, a similar expression can be obtained. For example, if m = 2, σ_k ∈ {1, 2} and (σ_ℓ, σ_{ℓ+1}, σ_{ℓ+2}, σ_{ℓ+3}, σ_{ℓ+4}, ...) = (1, 2, 1, 2, 1, ...), then

   sum_{k=ℓ}^{∞} x_k' Q_{σ_k} x_k = x_ℓ' M̄_1 x_ℓ,

  where A_1' M̄_2 A_1 − M̄_1 = −Q_1 and A_2' M̄_1 A_2 − M̄_2 = −Q_2.

SLIDE 47

Explicit rollout policy

Note that in the prediction horizon there are m^H options to be considered at each iteration. Suppose that we index all these options by i ∈ {1, 2, 3, ..., m^H}. Then each i corresponds to a unique set of free schedules (σ_k, σ_{k+1}, ..., σ_{k+H−1}) = (σ̄^i_0, σ̄^i_1, σ̄^i_2, ..., σ̄^i_{H−1}) and a unique set of costs can be obtained:

   x_k' P_i x_k = sum_{ℓ=k}^{k+H−1} x_ℓ' Q_{σ_ℓ} x_ℓ + sum_{ℓ=k+H}^{∞} x_ℓ' Q_{σ̄_ℓ} x_ℓ.

Then an explicit expression for the rollout policy is

   σ_k = π(ι(x_k)),  ι(x_k) = argmin_i x_k' P_i x_k,

where π is a map that extracts the first component of the scheduling associated with i: π(i) = σ̄^i_0.

SLIDE 48

Example

If m = 2 and H = 2:

   i | (σ̄^i_0, σ̄^i_1) | π(i) | P_i
   1 | (1, 1)          | 1    | P_1 = Q_1 + A_1' Q_1 A_1 + (A_1 A_1)' M̄ (A_1 A_1)
   2 | (1, 2)          | 1    | P_2 = Q_1 + A_1' Q_2 A_1 + (A_2 A_1)' M̄ (A_2 A_1)
   3 | (2, 1)          | 2    | P_3 = Q_2 + A_2' Q_1 A_2 + (A_1 A_2)' M̄ (A_1 A_2)
   4 | (2, 2)          | 2    | P_4 = Q_2 + A_2' Q_2 A_2 + (A_2 A_2)' M̄ (A_2 A_2)

SLIDE 49

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 50

Resource-aware control

How to efficiently control several dynamical systems via a wireless network by deciding how to schedule transmissions (only one user can transmit at a given time)?

[Schematic: one controller communicating with the actuators of two inverted pendulums, with angles θ_1, θ_2 and torque τ_1.]

SLIDE 51

Model

• The two processes are identical and modeled by x^i_{k+1} = Ā x^i_k + B̄ u^i_k, i ∈ {1, 2}.

• Each system has an identical cost J_i = sum_{k=0}^{∞} (x^i_k)' Q̄ x^i_k + (u^i_k)' R̄ u^i_k, i ∈ {1, 2}, and we wish to minimize J_1 + J_2.

• If each system could update the control law at every time instant, the optimal control law would take the form u^i_k = K_i x^i_k (LQR problem).

• Instead, since each system can only update the control input at mutually exclusive times, we have

   u^1_k = K_1 x^1_k if σ_k = 1, and u^1_k = v^1_k if σ_k = 2,
   u^2_k = K_2 x^2_k if σ_k = 2, and u^2_k = v^2_k if σ_k = 1,

  where v^i_k = u^i_{k−1}.

SLIDE 52

Model

If we let x_k = [x^1_k; v^1_k; x^2_k; v^2_k], we can write the model as x_{k+1} = A_{σ_k} x_k with cost sum_{k=0}^{∞} x_k' Q_{σ_k} x_k, where (since the systems are identical, K_1 = K_2 = K)

   A_1 = blkdiag([Ā + B̄K 0; K 0], [Ā B̄; 0 I]),
   A_2 = blkdiag([Ā B̄; 0 I], [Ā + B̄K 0; K 0]),
   Q_1 = blkdiag(Q̄ + K' R̄ K, 0, Q̄, R̄),
   Q_2 = blkdiag(Q̄, R̄, Q̄ + K' R̄ K, 0).
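A sketch of this construction in Matlab, assuming the single-pendulum matrices Abar, Bbar, K, Qbar, Rbar are defined; the placement of the zero blocks, omitted in the slide's compact notation, is spelled out here:

nx = size(Abar,1); nu = size(Bbar,2);
S1 = [Abar + Bbar*K, zeros(nx,nu); K, zeros(nu,nu)]; % subsystem that updates
S0 = [Abar, Bbar; zeros(nu,nx), eye(nu)];            % subsystem that holds
A1 = blkdiag(S1, S0);  % sigma_k = 1: system 1 updates, system 2 holds
A2 = blkdiag(S0, S1);  % sigma_k = 2: roles reversed
Q1 = blkdiag(Qbar + K'*Rbar*K, zeros(nu), Qbar, Rbar);
Q2 = blkdiag(Qbar, Rbar, Qbar + K'*Rbar*K, zeros(nu));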

SLIDE 53

Example

Parameters: Ā = e^{A_c τ}, B̄ = ∫_0^τ e^{A_c s} ds B_c, Q̄ = I, R̄ = 0.01, τ = 0.1; periodic base policy.

Cost of periodic (base): 16.4974. Cost of rollout: 12.4969.

[Plots: first and second components of x^1_k and x^2_k versus time t = kτ, for rollout and for the base policy.]

SLIDE 54

Example

Given the initial condition, paying more attention to system 2 leads to better results! The rollout schedule over the first stages is

   k:   1 2 3 4 5 6 7 8 9 10
   σ_k: 2 2 2 1 2 1 2 1 2 1

Cost of periodic (base): 16.4974. Cost of rollout: 12.4969.

[Plots: control inputs u^1_k and u^2_k versus time t = kτ, for rollout and for the base policy.]

SLIDE 55

Another resource-aware control problem

How to control the two arms of a SCARA robot with only one amplifier to track a trajectory for the tip with minimum error?* Only torque τ_1 or τ_2 can be applied at each time instant.

[Schematic of the robot (angles θ_1, θ_2, torques τ_1, τ_2) and plots of the applied torques over time.]

*Control of a SCARA robot with a reduced number of amplifiers, Internship report TU/e, VDL, Yuri Steinbuch, 2015

SLIDE 56

Concluding remarks

To summarize:

• When discretization is not feasible, approximate dynamic programming (e.g. CEC, rollout, MPC) can be a practical way to provide a policy. However, it is suboptimal.

• One must check case by case whether it can be used - for switched linear systems (SLS) it works very well (although suboptimal!).

After this lecture you should be able to:

• Apply ADP techniques to find suboptimal policies for stage decision problems.

• Apply ADP techniques to control switched linear systems and linear quadratic problems with input constraints.

   dimension / convexity | DP (discretization) | Static optimization | ADP
   small, convex         | applicable (n ≤ 3)  | applicable          | ?
   small, non-convex     | applicable (n ≤ 3)  | —                   | ?
   large, convex         | —                   | applicable          | ?
   large, non-convex     | —                   | —                   | ? (SLS example)