4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte - - PowerPoint PPT Presentation
Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes Motivation Last two lectures we discussed two alternative methods to DP for stage decision problems: discretisation and static optimization. Discretization
4SC000 Q2 2017-2018
Duarte Antunes
1
discretisation and static optimization.
problem (state and input dimension) is small and static optimisation allows to obtain an optimal path when the problem is convex dimension convexity DP discretization Static
small convex small non-convex large convex large non-convex
n ≤ 3 n ≤ 3
control (lec 3, slide 33) applying this to solve linear quadratic control with input constraints.
applying it to a non-convex, large dimensional problem (control of switched linear systems). In this lecture:
2
Jk(xk) = minuk gk(xk, uk) + Jk+1(fk(xk, uk)) Iterating the dynamic programming algorithm for stage decision problems is hard since it is hard to compute the costs-to-go (see e.g. slide 11 of lecture 5). Jk+1(xk+1) xk+1 uk Jk(xk) gk(xk, uk) + Jk+1(fk(xk, uk)) For each hard to minimize this function of xk
Hard to obtain an expression for the cost-to-go xk Typically unknown (known
uk
compute optimal paths.
3
Compute optimal path online starting from the current state and apply the first decision xk xk+1 Repeat this procedure at every stage, for the current state
time/stage k
h
time/stage
h
k + 1
4
Similar to Idea I but compute decisions only in a (short) horizon . . Compute optimal path over horizon starting from the current state and apply first decision xk xk+1 Repeat this procedure in a receding/rolling horizon way
feasible to run online. This is the fundamental idea of Model Predictive Control (MPC).
time/stage k
h
time/stage
k + H h
k + 1
k + H
5
Similar to Ideas I II but after the optimization horizon use a base policy. Apply the first decision of the optimization procedure. xk
feasible to run online. This is the fundamental idea of Rollout.
Repeat this procedure in a receding/rolling horizon way
time/stage k
k + H h
time/stage
k + H h
k + 1
xk+1
Approximate the cost-to go and use the solution of the following equation as the control input (in general different from optimal control input!)
Jk(xk) = minuk gk(xk, uk) + ˜ Jk+1(fk(xk, uk))
Jk+1(xk+1)
xk+1 uk
xk
For each state just need to minimize this function and compute one action uk
gk(xk, uk) + ˜ Jk+1(fk(xk, uk))
u∗
k
u∗
k
Typically unknown known by construction ˜
Jk+1(xk+1)
uk More general than previous ideas!
6
7
expected values) is approximated by a deterministic cost. For problems without disturbances idea I yields the same policy as DP .
approximation even for problems without disturbances. H = 1
Again this is an approximation even for problems without disturbances.
horizon costs. We formalize these statements next.
8
Certainty equivalent control is the following implicit policy for a stage decision problem:
following problem (for initial condition ), assuming no disturbances and obtain estimates of the optimal control inputs
(the system will evolve to in general different from due to disturbances). Repeat (go to 1.) xk k k = 0 xk xk+1 = fk(xk, uk, wk) x`+1 = f`(x`, u`, 0) (wk = 0, wk+1 = 0, . . . ) uk, uk+1, . . . , uh−1 uk k = 0 k = 1 1 2 h − 1 h 3 1 2 h − 1 h u1 u0 uh−1 u1 u2 x2 x1 x2 x2 xk+1 = fk(xk, uk, 0) xk+1 = fk(xk, , wk) uk Ph−1
k=0 gk(xk, uk) + gh(xh)
Ph−1
`=k g`(x`, u`) + gh(xh)
` ∈ {k, k + 1, . . . , h − 1}
9
For problems with no disturbances, this is equivalent to minuk gk(xk, uk) + ˜ Jk+1(xk+1) x`+1 = f`(x`, u`, 0) ` ∈ {k, k + 1, . . . , h − 1} s.t. ˜ Jk+1(xk+1) = min
uk+1,...,uk+H−1 k+H−1
X
`=k+1
g`(x`, u`) and select the first and apply it. uk s.t. (1) initial condition xk+1 Note that, for problems without disturbances, are the optimal costs-to-go and satisfy the DP equation ˜ Jk ˜ Jk(xk) = min
uk gk(xk, uk) + ˜
Jk+1(xk+1) For problems with stochastic disturbances ˜ Jk are approximations of optimal costs-to-go Jk(xk) = min
uk gk(xk, uk) + E[Jk+1(xk+1)]
Approximated by ˜ Jk+1(xk+1) x`+1 = f`(x`, u`, w`) ` ∈ {k, k + 1, . . . , h − 1}
10
horizon
time/stage k
k + H h
and select the first control input resulting from this optimization. H min
uk,uh+1,...,uk+H−1 k+H−1
X
`=k
g`(x`, u`) uk k
s.t.
xk
(1) ` ∈ {k, k + 1, . . . , k + H − 1}
x`+1 = f`(x`, u`)
uk gk(xk, uk) + ˜
Jk+1(xk+1)
s.t.
(1)
` ∈ {k + 1, k + 2, . . . , k + H − 1}
˜ Jk+1(xk+1) = min
uk+1,...,uk+H−1 k+H−1
X
`=k+1
g`(x`, u`)
11
the cost function is slightly different for finite horizon problems, including also the terminal cost
particular some variants use a terminal constraint (this is useful to prove stability). For example, impose that the state after the horizon must be zero (for finite-horizon problems at the last stages ). min
uk,uh+1,...,uk+H−1 k+H−1
X
`=k
g`(x`, u`)
s.t.
+gh(xh) xk+H = 0 xh = 0 k ∈ {h − H, h − H − 1, . . . , h}
(1)
` ∈ {k, k + 1, . . . , h − 1}
12
time/stage k
k + H h
and select the first control input resulting from this optimization. H uk
uk = ¯ µ(xk)
assuming that after the horizon a base policy is used
s.t.
(1)
xk
` ∈ {k, k + 1, . . . , k + H − 1}
+
h−1
X
`=k+H
g`(x`, ¯ µ`(x`)) + gh(xh)
min
uk,uk+1,...,uk+H−1 k+H−1
X
`=k
g`(x`, u`)
min
uk gk(xk, uk) + ˜
Jk+1(xk+1)
s.t.
(1) ` ∈ {k + 1, k + 2, . . . , k + H − 1}
+
h−1
X
`=k+H
g`(x`, ¯ µ`(x`)) + gh(xh) ˜ Jk+1(xk+1) = min
uk+1,...,uk+H−1 k+H−1
X
`=k+1
g`(x`, u`)
13
how close it is from optimal and not by how “good” is the approximation of
the present state (do not need to iterate the cost-to-go).
quantify when a specific approximation method is good and to establish formal results.
horizon problem linear quadratic regulator problem makes the closed-loop of a linear system stable. This is typically very hard to establish for approximate methods.
u∗
k
u∗
k
˜ Jk+1(xk+1) Jk+1(xk+1)
14
π = {µ0, . . . , µh−1} uk = µk(xk)
Dynamic model Cost function
xk+1 = Axk + Buk uk ∈ R uk ∈ [−c, c] ∀k
h−1
X
k=0
x|
kQxk + u| kRuk + x| hQhxh
Input constraints Find a policy which minimizes the cost function Let us focus first on obtaining an optimal path and then we will find a control policy with certainty equivalence and MPC c > 0 (given ) (for simplicity) Possible disturbances +wk
15
Define Dynamic model (equality constraints) Cost function Inequality Constraints ¯ x = ⇥x|
1
x|
2
. . . x|
h
⇤| ¯ u = ⇥u| u|
1
. . . u|
h−1
⇤| ¯ x = ¯ Ax0 + ¯ B¯ u
¯ A = A A2 . . . Ah
¯ B = B . . . AB B . . . A2B AB B . . . A3B A2B AB B . . . . . . ... ... ... ... . . . Ah−1B Ah−2B Ah−3B . . . AB B ¯ Q = Q . . . Q . . . . . . ... ... ... . . . . . . Qh ¯ R = R . . . R . . . . . . ... ... ... . . . . . . R
( ¯ Ax0 + ¯ B¯ u)| ¯ Q( ¯ Ax0 + ¯ B¯ u) + ¯ u| ¯ R¯ u I −I
u ≤ c 2 6 6 4 1 1 . . . 1 3 7 7 5 Standard quadratic programming problem (see, e.g., Convex optimization, Stephen Boyd, Lieven Vandenberghe) implemented in Matlab with quadprog.m. Convex problem! This allow us already to obtain an optimal path.
16
Double integrator (model slide 20, II_1) Optimal path no constraints (slide 30, Lec 5), parameters
x0 = ⇥ 1 ⇤|
τ = 0.2
Optimal path with constraints
parameters optpolicy=2, fdisturbances=0 (or optpolicy=3, fdisturbances=0 !)
† †
*run matlab code on slides 25-28 with parameters optpolicy=1, fdisturbances=0 *
5 10
t
0.5 1
y(t)
5 10
t
v(t)
5 10
t
2
u(t)
5 10
t
0.5 1
y(t)
5 10
t
v(t)
5 10
t
0.2 0.4
u(t)
Q = Qh = I, R = 1
h = 50 uk ∈ [−L, L], L = 0.25 x = ⇥y v⇤| Position Velocity
17
What if there are disturbances? Desired behaviour far from expected!
*run matlab code on slides 25-28 with parameters optpolicy=2, fdisturbances=1 *
We want an optimal policy! CEC! (and MPC)
5 10
t
0.5 1
y(t)
5 10
t
0.2
v(t)
5 10
t
0.2 0.4
u(t)
18
At each stage define Dynamic model (equality constraints) Cost function Inequality Constraints
¯ Q = Q . . . Q . . . . . . ... ... ... . . . . . . Qh ¯ R = R . . . R . . . . . . ... ... ... . . . . . . R
I −I
u ≤ c 2 6 6 4 1 1 . . . 1 3 7 7 5
h−1
X
`=k
x|
` Qx` + u| ` Ru` + x| hQhxh =
¯ B = B . . . AB B . . . A2B AB B . . . A3B A2B AB B . . . . . . ... ... ... ... . . . Ah−1−kB Ah−2−kB Ah−3−kB . . . AB B
¯ A = A A2 . . . Ah−k
Take the first component of and apply it to the plant. ¯ u uk , that is, ¯ x = ⇥x|
k+1
x|
k+2
. . . x|
h
⇤| ¯ u = ⇥u|
k
u|
k+1
. . . u|
k+H−1
⇤| ( ¯ Axk + ¯ B¯ u)| ¯ Q( ¯ Axk + ¯ B¯ u) + ¯ u| ¯ R¯ u ¯ x = ¯ Axk + ¯ B¯ u
19
Considering disturbances affecting the system (see code slide 25 for details on disturbances) and applying certainty equivalence control we obtain xk+1 = Axk + Buk
*run matlab code on slides 25-28 with parameters optpolicy=3, fdisturbances=1
+wk
5 10
t
0.5 1
y(t)
5 10
t
0.2
v(t)
5 10
t
0.2 0.4
u(t)
time horizon of interest it might not possible to run quadprog.m and this motivates MPC.
considering short horizon might even lead to better results.
Bertsekas book, Ch. 6, for the close connection with Rollout (after the horizon the base policy is apply zero control input!)
algorithm that computes optimal policies.
with inequality constraints is convex if the cost is convex and the set of feasible solutions is convex, which can be shown to be the case for this problem.
20
21
At each stage define Dynamic model (equality constraints) Cost function Inequality Constraints
¯ R = R . . . R . . . . . . ... ... ... . . . . . . R
I −I
u ≤ c 2 6 6 4 1 1 . . . 1 3 7 7 5 Take the first component of and apply it to the plant. ¯ u uk , that is, ¯ x = ⇥x|
k+1
x|
k+2
. . . x|
k+H−1
⇤| ¯ x = ¯ Axk + ¯ B¯ u ( ¯ Axk + ¯ B¯ u)| ¯ Q( ¯ Axk + ¯ B¯ u) + ¯ u| ¯ R¯ u ¯ u = ⇥u|
k
u|
k+1
. . . u|
k+H−1
⇤|
¯ A = A A2 . . . AH−1
¯ B = B . . . AB B . . . A2B AB B . . . . . . ... ... ... ... . . . AH−2B AH−3B . . . . . . B
¯ Q = Q . . . Q . . . . . . ... ... ... . . . . . . Q k+H−1
X
`=k
x|
` Qx` + u| ` Ru` =
22
First iteration (no disturbances). This is the state and input that is predicted Applying the first control input leads to the state
*run matlab code on slides 25-28 with parameters optpolicy=4, fdisturbances=0
u0 = −0.25 x1 = 0.995 −0.05
5 10
t
0.5 1
y(t)
5 10
t
0.2
v(t)
5 10
t
0.2 0.4
u(t)
23
Note that the end result is different from the one predicted at iteration 0 because the state and input predictions change over time.
*run matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=0
End result
5 10
t
0.5 1
y(t)
5 10
t
v(t)
5 10
t
0.2 0.4
u(t)
24
Results with disturbances
*run matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=1
5 10
t
0.5 1
y(t)
5 10
t
0.2
v(t)
5 10
t
0.2 0.4
u(t)
25
fdisturbances = 0; % double integrator model Ac = [0 1;0 0]; Bc = [0;1]; tau = 0.2; Q = [1 0; 0 1]; S = [0;0]; R = 0.01; QT = [1 0; 0 1]; n = 2; m = 1; x0 = [1;0]; c = 0.25; sigmaw = 0.01; % disturbance level H = 25; % prediction horizon for mpc h = 50; % simulation horizon sysd = c2d(ss(Ac,Bc,zeros(1,n),0),tau); A = sysd.a; B = sysd.b; C = sysd.c; % preliminaries for the policies switch optpolicy case 1 P{h+1} = QT; for k = h:-1:1 % Riccati equations P{k} = A'*P{k+1}*A + Q - …(S+A’*P{k+1}*B)*pinv(R+B’*P{k+1}*B)*(S'+B'*P{k+1}*A); K{k} = -pinv(R+B'*P{k+1}*B)*(S'+B'*P{k+1}*A); end case 2 U2 = quadconstrainedcontrol(A,B,Q,R,c,h,x0,1); case 4 U3 = [quadconstrainedcontrol(A,B,Q,R,c,H,x0,2); zeros(h-H,1)]; end
Model definition Preliminaries
26
t = tau*(0:1:h); x = zeros(n,h); u = zeros(1,h); x(:,1) = x0; randn('seed',15); % used for results in slides if fdisturbances == 1 gainnodisturbances = 1; else gainnodisturbances = 0; end cost = 0 for k=1:h switch optpolicy case 1 % LQR u(:,k) = K{k}*x(:,k); case 2 % CEC no disturbances u(:,k) = U2(k); case 3 % CEC [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1); u(:,k) = u_(1); case 4 % MPC no disturbances u(:,k) = U3(k); case 5 % MPC if k <= h-H-1 [u_] = quadconstrainedcontrol(A,B,Q,R,c,H,x(:,k),2); u(:,k) = u_(1); else [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1); u(:,k) = u_(1); end end w(:,k) = B*sigmaw*randn(1)*gainnodisturbances; x(:,k+1) = A*x(:,k) + B*u(:,k) + w(:,k); cost = cost + x(:,k)'*Q*x(:,k) + u(:,k)'*R*u(:,k); end v = x(1,:);
Simulate system
27
N = 1000; ts = tau/N; nl = h*N; uc = kron(u,ones(1,N)); sysd = c2d(ss(Ac,Bc,zeros(1,size(Ac,1)),zeros(1,size(Bc,2))),ts); Ad = sysd.a; Bd = sysd.b; xc = zeros(n,nl+1); xc(:,1) = x(:,1); for k = 1:h xc(:,(k-1)*N+1) = x(:,k); for l=1:N xc(:,(k-1)*N+l+1) = Ad*xc(:,(k-1)*N+l)+Bd*u(:,k); end end tc = ts*(0:nl); figure(1) subplot(1,3,1) plot(tc,xc(1,:)) set(gca,'Fontsize',18) xlabel('t') ylabel('y(t)') grid on axis([0 h*tau -0.2 1]) subplot(1,3,2) plot(tc(1:end),xc(2,:)) set(gca,'Fontsize',18) xlabel('t') ylabel('v(t)') grid on subplot(1,3,3) plot(tc(1:end-1),uc) set(gca,'Fontsize',18) grid on set(gca,'Fontsize',18) xlabel('t') ylabel('u(t)')
Continuous-time simulation and plots
28
function [u] = quadconstrainedcontrol(A,B,Q,R,L,h,x0,opt) % opt = 1, CEC, opt = 2, MPC n = size(A,1); m = size(B,2); % define matrices M, N Abar = zeros(n*h,n); Bbar = zeros(n*h,m*h); Qbar = zeros(n*h,n*h); Rbar = zeros(m*h,m*h); for i = 1:h Abar(1+(i-1)*n:n*i,:) = A^i; for j = 1:i Bbar(1+(i-1)*n:n*i,1+(j-1)*m:m*j) = A^(i-j)*B; end Qbar(1+(i-1)*n:n*i,1+(i-1)*n:n*i) = Q; Rbar(1+(i-1)*m:m*i,1+(i-1)*m:m*i) = R; end switch opt case 1 % CEC u = quadprog(Rbar+Bbar’*Qbar*Bbar,((Bbar’*Qbar*Abar)*x0)’,… [eye(m*h);-eye(m*h)],[L*ones(m*h,1);L*ones(m*h,1)],[],[]); case 2 % MPC u = quadprog(Rbar+Bbar’*Qbar*Bbar,((Bbar’*Qbar*Abar)*x0)’,[eye(m*h);-eye(m*h)],… [L*ones(m*h,1);L*ones(m*h,1)],Bbar(end-n+1:end,:),-A^h*x0); end
Key function
29
xk+1 = Aσkxk Dynamic model Cost function
combinatorial problem.
Goal: Find an optimal path (or policy) for the control input for a given initial condition uk = σk x0
σk = {1, . . . , m} xk ∈ Rn
h−1
X
k=0
x|
kQσkxk + x| h ¯
Qhxh
30
sequences, and pick a sequence with minimum cost.
k σk = ˜ σk k = k + 1 (σk, σk+1, σk+2, . . . , σh−1) (˜ σk, ˜ σk+1, ˜ σk+2, . . . , ˜ σh−1) k 6= h Still combinatorial! Ph−1
`=k x| ` Q`x` + x| h ¯
Qhxh
31
horizon
for each of these control sequences and pick a sequence with minimum cost.
if . k (σk, σk+1, σk+2, . . . , σH−1) (σk, σk+1, σk+2, . . . , σh−1) (˜ σk, ˜ σk+1, ˜ σk+2, . . . , ˜ σh−1) if
if if
(˜ σk, ˜ σk+1, ˜ σk+2, . . . , ˜ σH−1) σk = ˜ σk k = k + 1 k 6= h
0 ≤ k ≤ h − H 0 ≤ k ≤ h − H 0 ≤ k ≤ h − H
Ph−1
`=k x| ` Q`x` + x| h ¯
Qhxh Pk+H−1
`=k
x|
` Q`x`
32
taking the form
sequences, and pick a sequence with minimum cost.
if . k Pick a base policy and start with k = 0 , H > 0 σk = ˜ σk (¯ σ0, ¯ σ1, . . . , ¯ σh−1) (σk, . . . , σk+H−1,¯ σ0, ¯ σ1, . . . , ¯ σh−1−k−H) σk, . . . , σk+H−1
(σk, σk+1, . . . , σh−1) if k = k + 1 k 6= h (˜ σk, ˜ σk+1, ˜ σk+2, . . . , ˜ σh−1)
0 ≤ k ≤ h − H
Ph−1
`=k x| ` Q`x` + x| h ¯
Qhxh
33
Camera images Actuators Actuation: 4 possible rotations decided once every h seconds
σk ∈ {1, . . . , 4}
*Mixing and switching, V.S. Dolk MSc thesis
34
Basic mechanism for effective mixing is repetitive stretching & folding
Basic mechanism can be seen as a periodic open-loop strategy Can a closed-loop strategy perform better?
35
k+1 = Aσkxk
Divide the domain into subdomains and denote by the concentration of fluid a time N i ∈ {1, . . . , N}
xi
k
kτ x1 x2 x3
Then the following equation describes how the concentrations evolve as a function of the control action
xk+1 = Aσkxk
σk ∈ {1, . . . , 4}
36
Model
and measures how far is the concentration from homogeneity.
when homogeneity is achieved. This means that the following error will converge to zero
c e|
kek
ek := xk − c 1 1 . . . 1
Ph
k=0 e| kek
c = 1
N
⇥1 1 . . . 1⇤ x0
37
Periodic Rollout
and a clockwise rotation of the inner cylinder.
that afterwards this periodic policy is used.
38
the (squared) norm of the error at the next iteration (minimum error “greedy” strategy) H = 2
argminσke|
kek+e| k+1ek+1 = argminσke| kek+e| kA| σkAσkek = argminσk e| kA| σkAσkek
| {z }
e|
k+1ek+1
H = 2
39
MPC policies
infinite-horizon costs
problems would entail computing an infinite number of options and therefore it is not viable. xk+1 = Aσkxk P∞
k=0 x| kQσkxk
40
is easy to show that σk = j xk+1 = Ajxk ∀k when where is the solution to the linear system
A|
j M jAj − M j = −Qj
M j = P∞
k=0(A| j )kQj(Aj)k
P∞
k=` x| kQjxk = x| ` M jx`
m = 2, σk ∈ {1, 2} (σ`, σ`+1, σ`+2, σ`+3, σ`+4, . . . ) = (1, 2, 1, 2, 1, . . . ) then where P∞
k=` x| kQkxk = x| ` ¯
M 1x` A|
1 ¯
M 2A1 − ¯ M 1 = −Q1 A|
2 ¯
M 1A2 − ¯ M 2 = −Q2
41
Note that in the prediction horizon there are Suppose that we index all these options Then each corresponds to a unique set of free schedules at each iteration mH options to be considered and a unique set of costs can be obtained x|
kP ixk = k+H−1
X
`=k
x|
` Q`x` + ∞
X
`=k+H
x|
` Q¯ `x`
i i ∈ {1, 2, 3, . . . , mH} (σk, σk+1, . . . , σk+H−1) = (¯ σi
0, ¯
σi
1, ¯
σi
2, . . . , ¯
σi
H−1)
Then an explicit expression for the rollout policy is σk = π(ι(xk)) ι(xk) = arg min
i
x|
kP ixk
where is a map that extracts the first component of the scheduling associated with π π(i) = ¯ σi i
42
If and : m = 2 H = 2 i 1 2 3 4 ¯ σ0 ¯ σ1 π(i) Pi 1 1 1 1 1 1 2 2 2 2 2 2 P2 = Q1 + A|
1Q2A1 + (A2A1)| ¯
M(A2A1) P1 = Q1 + A|
1Q1A1 + (A1A1)| ¯
M(A1A1) P 4 = Q2 + A|
2Q2A2 + (A2A2)| ¯
M(A2A2) P 3 = Q2 + A|
2Q1A2 + (A1A2)| ¯
M(A1A2) P i P 1 P 2 P 3 P 4
43
How to efficiently control several dynamical systems via a wireless network by deciding how to schedule transmissions? Controller Actuator 1 (only one user can transmit at a given time) Inverted pendulum 1 Inverted pendulum II Actuator II θ1 θ2 τ1
44
xi
k+1 = ¯
Axi
k + ¯
Bui
k
i ∈ {1, 2}
would take the form (LQR problem) ui
k = Kixi k
we have u1
k =
( K1x1
k, if σk = 1
v1
k, if σk = 2
u2
k =
( K2x2
k, if σk = 2
v2
k, if σk = 1
vi
k = ui k−1
i ∈ {1, 2} and we wish to minimize J1 + J2 Ji =
∞
X
k=0
(xi
k)| ¯
Qxi
k + (ui k)| ¯
Rui
k
45
xk = x1
k
v1
k
x2
k
v2
k
we can write where xk+1 = Aσkxk P∞
k=0 x| kQσkxk
model cost A1 = ¯ A + ¯ BK K ¯ A ¯ B I A2 = ¯ A ¯ B I ¯ A + ¯ BK K Q1 = ¯ Q + K| ¯ RK ¯ Q ¯ R Q2 = ¯ Q ¯ R ¯ Q + K| ¯ RK
46
base policy ¯ B = R τ
0 eAcsdsBc
¯ A = eAcτ Ac = 1 3
1
Q = I ¯ R = 0.01 τ = 0.1 x0 = −0.1 1 Parameters
1 2 3 4 5
1 2 3 4 5
0.05 0.1 0.15 0.2 1 2 3 4 5 0.2 0.4 0.6 0.8 1 1.2
cost of periodic (base): 16.4974 cost of rollout: 12.4969 rollout base policy time t = kτ time t = kτ time t = kτ time t = kτ first component of first component of second component of second component of x2
k
x2
k
x1
k
x1
k
1 2 3 4 5
0.2 0.4
47
cost of periodic (base): 16.4974 cost of rollout: 12.4969 Given the initial condition paying more attention to system 2 leads to better results! 2 2 2 1 2 1 2 1 2 1 1 2 3 4 5 6 7 8 9 10 σk k
1 2 3 4 5
2 4 1 2 3 4 5
0.5 1 1.5 2
rollout base policy time t = kτ time t = kτ control input control input u1
k
u2
k
48
Only torque or can be applied at each time instant τ1 τ2 θ2 θ1 τ1 τ2
2 4 6 8 10
0.2 0.4 0.6 0.8 1 2 4 6 8 10
0.5 1
Schematic τ1 τ2 time
*Control of a SCARA robot with a reduced number of amplifiers, Internship report TU/e, VDL, Yuri Steinbuch, 2015
49
MPC) can be a practical way to provide a policy. However, it is suboptimal.
with input constraints. dimension convexity DP discretizat Static
ADP ADP SLS small convex
small non- convex
large convex
large non- convex