A Series of Lectures on Approximate Dynamic Programming
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems Massachusetts Institute of Technology
Lucca, Italy, June 2017
Bertsekas (M.I.T.) Approximate Dynamic Programming 1 / 29
$\sum^{N-1} \cdots$
$\min_{\pi} J_\pi(x_0)$
$\min_{u_k \in U_k(x_k)} E\{\cdots\}$
Let $\mu_k^*(x_k)$ minimize in the right side above for each $x_k$ and $k$. Then the policy
$\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ is optimal
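The optimality statement above rests on a backward recursion over the stages. As a rough illustration only, here is a minimal Python sketch for a problem with finitely many states, controls, and disturbance values. The function name `backward_dp`, its argument names, and the toy inventory problem (capacity, demand probabilities, costs, lost unmet demand) are all assumptions made for this example and do not come from the slides.

```python
def backward_dp(states, controls, disturbances, f, g, g_terminal, N):
    """Finite-horizon DP by backward recursion (illustrative sketch).

    disturbances is a list of (w, probability) pairs.
    """
    J = [dict() for _ in range(N + 1)]   # J[k][x]: optimal cost-to-go at stage k
    mu = [dict() for _ in range(N)]      # mu[k][x]: a minimizing control at stage k
    for x in states:
        J[N][x] = g_terminal(x)
    for k in range(N - 1, -1, -1):
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in controls(k, x):
                # Expected stage cost plus cost-to-go of the next state.
                expected = sum(
                    p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                    for w, p in disturbances
                )
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J[k][x], mu[k][x] = best_cost, best_u
    return J, mu


# Hypothetical toy inventory problem (all numbers are made up).
CAPACITY = 2
N = 2
states = list(range(CAPACITY + 1))
disturbances = [(0, 0.5), (1, 0.5)]      # demand value and its probability


def controls(k, x):
    return range(CAPACITY - x + 1)       # order at most up to capacity


def f(k, x, u, w):
    return max(x + u - w, 0)             # assumption: unmet demand is lost


def g(k, x, u, w):
    return u + 3 * max(w - x - u, 0)     # ordering cost plus shortage penalty


def g_terminal(x):
    return 0.0


J, mu = backward_dp(states, controls, disturbances, f, g, g_terminal, N)
print(J[0], mu[0])
```

The list `mu` plays the role of the policy $\{\mu_0^*, \ldots, \mu_{N-1}^*\}$: each `mu[k]` maps a state to a control that attains the minimum at stage `k`. Exhaustive enumeration like this is only practical for very small problems, which is the motivation for the approximations discussed in the rest of the lecture.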
$\min_{u_k \in U_k(x_k)} E\{\cdots\}$
$\min_{u_k,\,\mu_{k+1},\ldots,\mu_{k+\ell-1}} E\Big\{ \cdots \sum^{k+\ell-1} \cdots \Big\}$
$\min_{u_k,\,\mu_{k+1},\ldots,\mu_{k+\ell-1}} E\Big\{ \cdots \sum^{k+\ell-1} \cdots \Big\}$
[Figure: State $x_k$ → Feature Extraction → Feature Vector $\phi_k(x_k)$ → Linear Cost Approximator $r_k'\phi_k(x_k)$]
[Figure: Position Evaluator: Feature Extraction (Features: Material Balance, Mobility, Safety, etc.) → Weighting of Features → Score]
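$(x_k^s, \beta_k^s)$, $s = 1, \ldots, q$, where
$$\beta_k^s = \min_{u \in U_k(x_k^s)} E\Big\{ g_k(x_k^s, u, w_k) + \tilde J_{k+1}\big(f_k(x_k^s, u, w_k), r_{k+1}\big) \Big\}$$
$(x_k^s, \beta_k^s)$, $s = 1, \ldots, q$
$$\sum_{s=1}^{q} \big(\tilde J_k(x_k^s, r_k) - \beta_k^s\big)^2 + \gamma \|r_k - \bar r\|^2$$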
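To make the regularized least squares fit concrete, the sketch below assumes a linear feature-based architecture $\tilde J_k(x, r) = r'\phi(x)$, in which case the minimizer has the closed form $(\Phi'\Phi + \gamma I)\,r = \Phi'\beta + \gamma \bar r$, where the rows of $\Phi$ are the feature vectors of the sample states. The helper names `solve_linear` and `fit_regularized`, the choice of features $(1, x)$, and the sample data are hypothetical. The code is kept dependency-free for illustration; a numerical library would normally be used for the linear solve.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_regularized(features, targets, gamma, r_bar):
    """Minimize sum_s (r'phi_s - beta_s)^2 + gamma * ||r - r_bar||^2 over r."""
    m = len(r_bar)
    # Normal equations: (Phi'Phi + gamma I) r = Phi'beta + gamma r_bar.
    A = [
        [sum(phi[i] * phi[j] for phi in features) + (gamma if i == j else 0.0)
         for j in range(m)]
        for i in range(m)
    ]
    b = [
        sum(phi[i] * beta for phi, beta in zip(features, targets)) + gamma * r_bar[i]
        for i in range(m)
    ]
    return solve_linear(A, b)


# Hypothetical samples: features phi(x) = (1, x), targets beta = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
features = [[1.0, x] for x in xs]
targets = [1.0 + 2.0 * x for x in xs]
r_bar = [0.0, 0.0]

r_plain = fit_regularized(features, targets, gamma=0.0, r_bar=r_bar)
r_reg = fit_regularized(features, targets, gamma=10.0, r_bar=r_bar)
print(r_plain, r_reg)
```

With `gamma=0.0` the fit recovers the weights that generated the targets, while a positive `gamma` pulls the weights toward `r_bar`. In the sequential scheme, each target would itself come from a minimization involving the already fitted next-stage approximation; that step is omitted here.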
[Figure: State $x$ → $y(x)$ → Linear Layer $Ay(x) + b$ with parameter $v = (A, b)$ → Nonlinear → $\phi_1(x, v), \phi_2(x, v), \ldots, \phi_m(x, v)$ → Linear Weighting with parameter $r$ → Cost Approximation]
$\min_{A,b,r} \sum_{s=1}^{q} \cdots$
[Figure: State $x$ → alternating Linear Layer and Nonlinear stages → Linear Weighting with parameter $r$]
$\min_{u \in U_{k+1}(x_{k+1})} \cdots$
$\big((x_k^s, u_k^s), \beta_k^s\big)$
$\beta_k^s$ is a
$(x_k^s, u_k^s)$. [No need to compute $E\{\cdot\}$]
$\min_{u \in U_k(x_k)} \cdots$
$\min_{u_k,\,\mu_{k+1},\ldots,\mu_{k+\ell-1}} E\Big\{ \cdots \sum^{k+\ell-1} \cdots \Big\}$
k at time k
[Figure: Coupled Subsystems, with controls $u_k^1, \ldots, u_k^5$]
$u_k = (u_k^1, \ldots, u_k^n)$, with $u_k^i$ corresponding to the $i$th subsystem
◮ Start with subsystem 1, optimize over $(u_k^1, \ldots, u_{N-1}^1)$, with all future controls of other subsystems held at nominal values $(\tilde u_k^i, \ldots, \tilde u_{N-1}^i)$
◮ Fix the nominal values of subsystem 1 to the optimal sequence thus obtained
◮ Repeat for all subsystems $i = 2, \ldots, n$ (with intermediate adjustment of the nominal values)
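The one-subsystem-at-a-time procedure in the bullets above can be sketched as follows. This is a simplified, deterministic illustration: each subsystem's control sequence is found by brute-force enumeration over a small grid, and the function name `optimize_one_at_a_time`, the cost function, the grid, and the targets are all hypothetical choices made for the example.

```python
from itertools import product


def optimize_one_at_a_time(nominal, control_values, total_cost):
    """Optimize each subsystem's control sequence in turn.

    nominal[i] is the nominal control sequence of subsystem i. While one
    subsystem is optimized, the others stay at their current nominal values.
    """
    plan = [list(seq) for seq in nominal]
    for i in range(len(plan)):
        horizon = len(plan[i])
        # Start from the current plan so the cost can never increase.
        best_seq, best_cost = plan[i], total_cost(plan)
        for seq in product(control_values, repeat=horizon):
            trial = plan[:i] + [list(seq)] + plan[i + 1:]
            c = total_cost(trial)
            if c < best_cost:
                best_seq, best_cost = list(seq), c
        # Fix the nominal values of subsystem i to the sequence just found.
        plan[i] = best_seq
    return plan


# Hypothetical coupled cost: the summed controls should track a target at
# each time step, with a small penalty on control effort.
targets = [3, 1]


def total_cost(plan):
    coupling = sum(
        (sum(seq[t] for seq in plan) - targets[t]) ** 2
        for t in range(len(targets))
    )
    effort = 0.1 * sum(u * u for seq in plan for u in seq)
    return coupling + effort


nominal = [[0, 0], [0, 0]]
plan = optimize_one_at_a_time(nominal, (0, 1, 2), total_cost)
print(plan, total_cost(plan))
```

Each pass solves a problem whose size depends on one subsystem only, at the price of possibly missing the jointly optimal controls. Updating `plan[i]` before moving to the next subsystem corresponds to the intermediate adjustment of the nominal values.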
[Figure: Constraint Relaxation]
$x_k = (x_k^1, \ldots, x_k^n)$, $u_k = (u_k^1, \ldots, u_k^n)$, $w_k = (w_k^1, \ldots, w_k^n)$, with $(x_k^i, u_k^i, w_k^i)$
$(u_k^1, \ldots, u_k^n) \in U$,
$u_k^i \in U_i$, $u_k^1 + \cdots + u_k^n \le b_k$
k be the optimal
$\tilde J_{k+1}^1(x_{k+1}^1) + \cdots + \tilde J_{k+1}^n(x_{k+1}^n)$
$x_k^i$, $u_k^i$, $w_k^i$: the amounts stored, produced, and demanded of product $i$ at time $k$
$x_k = (x_k^1, \ldots, x_k^n)$, where $x_{k+1}^i = x_k^i + u_k^i - w_k^i$
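The per-product dynamics above are simple enough to simulate directly. The following sketch applies the update $x_{k+1}^i = x_k^i + u_k^i - w_k^i$ componentwise; the function names and the numbers are made up for illustration. Negative stock is left as is (it can be read as backlogged demand), since the formula itself imposes no clamping.

```python
def inventory_step(x, u, w):
    """One step of x_{k+1}^i = x_k^i + u_k^i - w_k^i for every product i."""
    return tuple(xi + ui - wi for xi, ui, wi in zip(x, u, w))


def simulate(x0, productions, demands):
    """Roll the dynamics forward over given production and demand sequences."""
    trajectory = [tuple(x0)]
    for u, w in zip(productions, demands):
        trajectory.append(inventory_step(trajectory[-1], u, w))
    return trajectory


# Two products, two time steps (hypothetical numbers).
trajectory = simulate(
    x0=(5, 3),
    productions=[(2, 0), (1, 4)],
    demands=[(1, 4), (3, 2)],
)
print(trajectory)
```

Each product evolves independently given its own production and demand, so any coupling between products has to enter through constraints on the production vector rather than through the dynamics.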
[Figure: AGGREGATE SYSTEM. Original system states $x$, $y$ and Aggregate States $i$, $j$, linked by Disaggregation ($d_{xi}$) and Aggregation ($\phi_{jy}$)]