4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Introduction

Outline
- Course info
- Introduction to optimal control and applications
- Dynamic programming algorithm
Teaching staff
Ruben di Filippo (r.di.filippo@student.tue.nl)
Grading
assignments 1 and 2, respectively. Peer assessment of homework 3 should be sent by Feb 4th.
1
*https://coursework.mathworks.com/, students will receive an email to register
Question hours
Monday: 12h30-13h30, Eelco, GEM-Z 0.55
Tuesday: 17h30-18h30, Ruben, GEM-Z 0.55
Wednesday: 15h45-17h30, BZ, Paviljoen U46
Thursday: 12h45-13h45, Duarte, GEM-Z -1.139
Friday: 10h45-12h30, BZ, Paviljoen U46
Since there are question hours every working day, no further appointments will be scheduled and we try to avoid answering questions via e-mail.
[Calendar: weekly grids for November, December, January & February marking lectures L1-L16, guided self-study sessions BZ1-BZ16, and the exam]

Lectures (L): Wednesdays 13h45-15h30 (LUNA 1.050), Fridays 8h45-10h30 (Paviljoen B2).
Guided self-study (BZ): Wednesdays 15h45-17h30 (Paviljoen U46), Fridays 10h45-12h30 (Paviljoen U46).
Deadlines: PSI: Dec 5th, 23h45; PSII: Jan 11th, 23h45; PSIII: Feb 4th, 23h45.
Exam: February 1st, 13h30-16h30; retake: April 12, 18h00-21h00.
* In BZ7 and BZ14 the grade of each group member for homeworks 1 and 2, respectively, will be discussed.
Main course material
D. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Volume I, 2005. ISBN 1-886529-08-6, Chapters 1-6*
Further reading
A. E. Bryson and Yu-Chi Ho, Applied Optimal Control, CRC Press, 1975, ISBN-13: 978-0891162285, Chapters 1-3
D. Liberzon, Calculus of Variations and Optimal Control Theory, Princeton University Press, 2012. ISBN-13: 978-0-691-15187-8, Chapters 1-4
Cambridge University Press, 2016, ISBN-13: 978-0521862059, Chapters 1-12, 14
3 *Slides and video lectures available at http://www.athenasc.com/dpbook.html
4
Topics and lectures:
I. Discrete optimization problems
II. Stage decision problems
III. Continuous-time optimal control problems
Lectures 15 & 16: Revision/sample exam
5
Systems and control oriented programs
Q1 System theory for control, Q2 Optimal control, Q3 Model predictive control
Other programs
The course is also relevant for automotive applications, such as optimization of powertrains, optimal power management in hybrid vehicles, etc.
6
Matlab
Optimization
Convex Optimization, S. Boyd and L. Vandenberghe, available at http://stanford.edu/~boyd/cvxbook/.
System theory
Familiarity with notions such as controllability is useful. See, e.g., Linear Systems Theory, João Hespanha, Princeton University Press, 2009.
Probability theory
7
Optimality
Optimality refers to minimizing or maximizing a given criterion (e.g., minimize the energy consumption of a refrigerator, minimize the fuel consumption of a car, etc.).
Optimal control
Optimal control is concerned with making optimal decisions for a dynamical system over a time period in order to reach final and intermediate goals.
It is also used in many other contexts (e.g., economics, computer science, and game theory).
8
Static optimization
A single decision u is chosen to minimize a cost J(u) (e.g., find the straight line which best fits data, etc.).

[Figure: cost J as a function of the decision u, with minimizer u∗ and minimum J(u∗)]
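As a concrete sketch of a static optimization problem, the least-squares line fit mentioned above can be solved in closed form. The data points below are hypothetical, chosen to lie exactly on a line:

```python
# Hypothetical data points; fit y = a*x + b minimizing J(a, b) = sum (a*x + b - y)^2,
# a textbook static optimization problem with a closed-form minimizer.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly on the line y = 2x + 1

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Setting the gradient of J(a, b) to zero gives the normal equations
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
print(a, b)  # 2.0 1.0
```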
9
Optimal control
A sequence of decisions is made over time, possibly in the presence of disturbances (playing chess, etc.).

[Figure: disturbance realizations θ(0), θ(1), θ(2), θ(3); a different disturbance θ̃(2) at time t = 2 yields θ̃(t) ≠ θ(t) for t ≥ 2]
10
Dynamic model
Cost function
Goal: find a control policy which minimizes the cost
11
Three classes of problems will be considered in the course:
- Discrete optimization problems*: discrete time, discrete state space
- Stage decision problems: discrete time, general state space
- Continuous-time optimal control problems: continuous time, general state space
Some applications are discussed next; more applications are listed in the references.
12
Operational research, management, finance
Computer Science
Other fields
The next slides address some applications treated in the course; we will also consider cases where uncertainty is present.
Aerospace
Traditional process control
13
[Figure: transition diagram with stages 0, 1, …, h−1, h; states at each stage, transition costs c^k_ij, and terminal costs c^h_i]

A discrete optimization problem is specified by a transition diagram with decision stages: at each stage k there are decisions for each state which lead to states at the next stage; c^k_ij denotes the cost of moving from state i at stage k to state j at stage k+1. At the terminal stage h the cost c^h_i depends only on the state i.
14
Challenges

(i) Determine an optimal path for a given initial state which minimizes the sum of costs incurred at every stage (including the terminal stage).
(ii) Determine an optimal policy specifying for each state the first decision of the optimal path from that state to the terminal stage.

[Figure: example transition diagram with edge costs]
How to manage the supply of products in a shop? Overstock is prejudicial (physical space limitations, technological obsolescence, etc.).
15
16
What is the shortest distance from Bucharest to Lugoj?
[Figure: road map of Romania with distances between cities (Arad, Bucharest, Craiova, Lugoj, Sibiu, Timisoara, …)]
Road map of Romania
*Source: Artificial Intelligence: A Modern Approach, Stuart J. Russell, Peter Norvig, 3rd edition, 2016
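This shortest-route question can be answered with a standard shortest-path computation. The edge list below is an assumption, reproduced from the road map in the cited book; a minimal Dijkstra sketch:

```python
import heapq

# Edge list assumed from the road map of Romania in Russell & Norvig;
# distances in km, roads are undirected.
edges = [("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
         ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Timisoara", "Lugoj", 111),
         ("Lugoj", "Mehadia", 70), ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
         ("Craiova", "RimnicuVilcea", 146), ("Craiova", "Pitesti", 138),
         ("RimnicuVilcea", "Sibiu", 80), ("RimnicuVilcea", "Pitesti", 97),
         ("Sibiu", "Fagaras", 99), ("Fagaras", "Bucharest", 211),
         ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
         ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
         ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
         ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87)]

graph = {}
for a, b, w in edges:
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))

def dijkstra(src, dst):
    """Shortest distance from src to dst using a priority queue."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node]:
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return float("inf")

print(dijkstra("Bucharest", "Lugoj"))  # 504
```

The shortest route goes Bucharest, Pitesti, Craiova, Dobreta, Mehadia, Lugoj (101 + 138 + 120 + 75 + 70 = 504 km), under the assumed edge list.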
17
What is the shortest path for a robot to go from point A to B?
18
How to make profit in expectation in a game such as blackjack? As portrayed in the movie ‘21’ the MIT blackjack team had an answer to this problem (using optimal control?). The same movie unveils the principle to achieve this, using a famous game show problem https://www.youtube.com/watch?v=Zr_xWfThjJ0
19
Dynamic model

x_{k+1} = f_k(x_k, u_k), k ∈ {0, …, h − 1}

Cost function

∑_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h)

Goals

(i) find a policy π = {µ_0, …, µ_{h−1}}, with u_k = µ_k(x_k), that leads to the minimum cost for every initial condition.
(ii) find a path {(x_0, u_0), (x_1, u_1), …, (x_{h−1}, u_{h−1})} that leads to the minimum cost for a given initial condition.
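To make this notation concrete, the sketch below evaluates the cost of a given policy on a toy scalar system. The model, the costs, and the policy u_k = −0.5 x_k are all hypothetical choices for illustration (the policy is not claimed to be optimal):

```python
# Hypothetical data: scalar system x_{k+1} = x_k + u_k with quadratic costs,
# controlled by the (not necessarily optimal) policy u_k = mu_k(x_k) = -0.5 x_k.
h = 3

def fdyn(k, x, u):   # dynamic model f_k
    return x + u

def g(k, x, u):      # stage cost g_k
    return x**2 + u**2

def gh(x):           # terminal cost g_h
    return x**2

def mu(k, x):        # policy mu_k
    return -0.5 * x

def cost_of_policy(x0):
    """Evaluate sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h) along the closed loop."""
    x, total = x0, 0.0
    for k in range(h):
        u = mu(k, x)
        total += g(k, x, u)
        x = fdyn(k, x, u)
    return total + gh(x)

print(cost_of_policy(2.0))  # 6.625
```

Goal (i) above asks for the policy µ minimizing this quantity for every x0 at once; goal (ii) only for the best state/input path from one given x0.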
20
Generalization of discrete optimization problems considering general state and input spaces, e.g., Rⁿ.
[Figure: stage decision problem over stages 0, 1, …, h; the state x_k lies in the state space X_k, with stage costs g_0(x_0, u_0), g_1(x_1, u_1), …, g_{h−1}(x_{h−1}, u_{h−1}) and terminal cost g_h(x_h)]
21
[Figure: digital control loop: physical system with sensors and actuators, connected to a digital controller through analog-to-digital (A/D) and digital-to-analog (D/A) converters]
Prime application: how to design a digital controller for a physical system? Several variants: full state is available or only an output, system can have disturbances or not, etc.
22
[Figure: control law mapping camera images to actuators; actuation: 4 possible rotations, decided once every h seconds]
How to mix two fluids in minimum time?
23
Given a unicycle robot with constraints on speed and rotation rate, controlled digitally, how to perform a curve maneuver in minimum time?
24
Dynamic model

ẋ(t) = f(t, x(t), u(t)), x(0) = x_0, t ∈ [0, h]

Cost function

∫_0^h g(t, x(t), u(t)) dt + g_h(x(h))

Goals

(i) find a policy u(t) = µ(t, x(t)) that leads to the minimum cost for every initial condition.
(ii) find an input trajectory u(t), t ∈ [0, h], that leads to the minimum cost for a given initial condition.
25
[Figure: state trajectory in Rⁿ from x(0) at t = 0 to x(h) at t = h, evolving according to ẋ(t) = f(t, x(t), u(t))]
Most applications in control systems: motion control, aerospace, etc.
26
How to move a motion system described by a linear equation from point A to point B with minimum energy?

ẋ(t) = Ax(t) + Bu(t), x(0) = x_0, x(T) = x_desired

min_u ∫_0^T g(u(t)) dt

[Figure: planar motion system with forces F_x, F_y and coordinates x, y]
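A discretized sketch of this minimum-energy transfer for a double integrator (a mass driven by a force), with g(u) = u². The sample time, horizon, and target below are assumptions; the least-norm input is computed from the normal equations:

```python
# Discretized sketch of minimum-energy point-to-point transfer for a double
# integrator; model, horizon, and target are assumptions for illustration.
dt, N = 0.1, 20                    # sample time and number of steps
# ZOH discretization of x' = v, v' = u:
# [p, v]_{k+1} = A [p, v]_k + B u_k, A = [[1, dt], [0, 1]], B = [dt^2/2, dt]
target = (1.0, 0.0)                # reach position 1 with zero velocity from rest

# Column k of the reachability matrix G: A^(N-1-k) B = [dt^2 (0.5 + m), dt]
cols = [((0.5 + (N - 1 - k)) * dt * dt, dt) for k in range(N)]

# Minimum-energy (least-norm) input: u = G^T (G G^T)^{-1} target,
# with the 2x2 matrix G G^T inverted by hand.
g11 = sum(c[0] * c[0] for c in cols)
g12 = sum(c[0] * c[1] for c in cols)
g22 = sum(c[1] * c[1] for c in cols)
det = g11 * g22 - g12 * g12
y1 = ( g22 * target[0] - g12 * target[1]) / det
y2 = (-g12 * target[0] + g11 * target[1]) / det
u = [c[0] * y1 + c[1] * y2 for c in cols]

# Simulate to confirm the target is reached (up to floating-point error)
p, v = 0.0, 0.0
for uk in u:
    p, v = p + dt * v + 0.5 * dt * dt * uk, v + dt * uk
print(round(p, 6), round(v, 6))
```

Since the final state equals G u = G Gᵀ(G Gᵀ)⁻¹ target, the simulated (p, v) lands on (1, 0) up to round-off.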
27
How to move a (linearized model of a) quadcopter from one hovering position to another one in minimum time?

ẋ(t) = Ax(t) + Bu(t), x(0) = x_0, x(T) = x_desired, min T
28
Hybrid electric vehicles have a battery where energy can be stored (e.g. during braking). Given a drive cycle, how to design the power split between the battery and the internal combustion engine to minimize fuel consumption?
29
Dynamic programming:
- applies to all three classes of problems.
- yields functions we call policies or control laws.
- can cope with disturbances.
- as we shall see, other methods might be more efficient for some problems.
30
Example: shortest route from Eindhoven to Paris passes through Antwerp. Then the piece of the route from Antwerp to Paris is the shortest route between the two cities.
31
The tail of an optimal path is also optimal
Consider the state x_j at stage j belonging to the optimal path. Then the tail of the path from stage j to stage h is itself optimal for the discrete optimization problem with initial stage j, initial state x_j, and final stage h.

[Figure: optimal path from stage 0 to stage h, and its tail from stage j to stage h]
32
Proof by contradiction: let cost = cost[0,j) + cost[j,h] be the cost of the optimal path. If the tail were not optimal, there would exist a tail with cost'[j,h] < cost[j,h]. But then cost[0,j) + cost'[j,h] < cost[0,j) + cost[j,h], i.e., a path with smaller cost from stage 0 to stage h: contradiction!
33
The dynamic programming algorithm for discrete optimization problems:

(1) Start at the final decision stage and denote the terminal cost by the cost-to-go at stage h: J_h(i) = c^h_i.
(2) For every state i at stage k = h − 1 compute the optimal action as follows:
J_k(i) = min over available actions j of [ c^k_ij + J_{k+1}(state at stage k+1 when j is picked) ].
Denote the minimum by the cost-to-go at stage k, J_k(i).
(3) Repeat (2) for stages k ∈ {h − 2, h − 3, …, 1, 0}, moving backwards.

Then the function which maps each state to the action obtained in (2) is an optimal policy.

Main idea

Reuse the optimal policy and paths from stage j to stage h when computing those from stage j − 1 (principle of optimality).
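The three steps above can be sketched in a few lines. The transition diagram below is a small hypothetical example; its costs are assumptions for illustration:

```python
def dp(stage_costs, terminal_costs):
    """Backward dynamic programming over a transition diagram.

    stage_costs: list over stages k = 0..h-1 of dicts {(i, j): c^k_ij}
    terminal_costs: dict {i: c^h_i} at stage h
    Returns (J, policy): J[k][i] is the cost-to-go, policy[k][i] the optimal action.
    """
    h = len(stage_costs)
    J = [None] * (h + 1)
    policy = [None] * h
    J[h] = dict(terminal_costs)                      # step (1)
    for k in range(h - 1, -1, -1):                   # steps (2)-(3), backwards
        J[k], policy[k] = {}, {}
        states = {i for (i, _) in stage_costs[k]}
        for i in states:
            best_j, best = None, float("inf")
            for (ii, j), c in stage_costs[k].items():
                # minimize transition cost plus cost-to-go at the next stage
                if ii == i and c + J[k + 1][j] < best:
                    best_j, best = j, c + J[k + 1][j]
            J[k][i], policy[k][i] = best, best_j
    return J, policy

# Hypothetical 2-stage diagram: stage costs c^k_ij and terminal costs c^2_i
stage_costs = [
    {(1, 1): 2, (1, 2): 1},                        # stage 0
    {(1, 1): 5, (1, 2): 2, (2, 1): 4, (2, 2): 2},  # stage 1
]
terminal_costs = {1: 0, 2: 4}
J, policy = dp(stage_costs, terminal_costs)
print(J[1][1], J[1][2], J[0][1])  # 5 4 5, e.g. J_1(1) = min{5 + 0, 2 + 4} = 5
```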
34
Iteration 1 - Stage 3: cost-to-go values for states 1, 2, 3:
min{5 + 0, 2 + 4} = 5, min{4 + 0, 2 + 4} = 4, min{0 + 4, 5 + 0} = 4

[Figure: transition diagram annotated with the stage-3 cost-to-go values]
35
Iteration 2 - Stage 2: cost-to-go values for states 1, 2, 3, 4:
min{4 + 5, 3 + 4} = 7, min{1 + 5, 3 + 4} = 6, 1 + 4 = 5, min{2 + 4, 5 + 4} = 6

[Figure: transition diagram annotated with the stage-2 cost-to-go values]
36
Iteration 3 - Stage 1: cost-to-go values for states 1, 2, 3:
min{4 + 6, 3 + 5} = 8, min{0 + 6, 1 + 6} = 6, min{1 + 7, 3 + 6, 1 + 6} = 7

[Figure: transition diagram annotated with the stage-1 cost-to-go values]
37
Iteration 4 - Stage 0: cost-to-go values for states 1, 2:
min{2 + 7, 1 + 6} = 7, min{1 + 8, 4 + 6} = 9

[Figure: transition diagram annotated with the stage-0 cost-to-go values]
38
Optimal policy

Cost-to-go at stage 0: 7 and 9.

[Figure: initial transition diagram and the decisions selected by the optimal policy]

For each state, one decision attains the minimum in the computation of the cost-to-go. That decision is precisely the decision specified by the optimal policy.

Optimal path

The optimal path from a given initial state is obtained by following the optimal policy stage by stage.
39
If more than one option has the same cost while running the dynamic programming algorithm, simply pick one of the options; at the end one still obtains an optimal policy. The optimal policy and the optimal paths may not be unique.

Example: at stage 1, state 2, both decisions have the same cost 3, yielding two optimal paths and two optimal policies.

[Figure: example transition diagram with the two optimal paths and the two optimal policies]
Cost function: ∑_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h)
Controlling the supply of one product
x_k: number of items (inventory), u_k: supply, d_k: demand, N: capacity, g_h: terminal cost, p: selling price, c: purchase price, c_1: storage cost, c_tr: transportation price
40
x_{k+1} = max{x_k + u_k − d_k, 0}, u_k ∈ {0, 1, …, N − x_k}
g_k(x_k, u_k) = c_1(x_k) + c·u_k + c_tr·‖u_k‖_0 − p·min{d_k, x_k + u_k}
where ‖u_k‖_0 = 0 if u_k = 0 and 1 if u_k ≠ 0
41
Transition diagram
[Figure: transition diagram for the inventory problem over stages 0 to h with states 0 to N; the supplies determine the transitions, e.g., u_k = 0, u_k = d_k, or u_k = N − i from state i]
42
Cost function: ∑_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h)
x_k: number of items, u_k: supply; p = 10 (selling price), c = 5 (purchase price), c_tr = 0.5 (transportation price), N = 4 (capacity), h = 4 (number of stages), demands d_0 = d_1 = 2, d_2 = d_3 = 1, storage cost c_1(i) = 0.2i for i ∈ {0, …, N}, terminal cost g_4(i) = −r_{i+1} for i ∈ {0, …, 4}
What are the optimal supplies for a zero initial inventory?
r = [0 4.8 9.6 14.4 19.2]
x_{k+1} = max{x_k + u_k − d_k, 0}, u_k ∈ {0, 1, …, N − x_k}
g_k(x_k, u_k) = c_1(x_k) + c·u_k + c_tr·‖u_k‖_0 − p·min{d_k, x_k + u_k}
[Figure: backward DP iterations over stages 2-4; cost-to-go at stage 3: −4.5, −9.8, −14.4, −19, −23.6 for states 0-4, and terminal costs 0, −4.8, −9.6, −14.4, −19.2]
43
[Figure: transition diagram over stages 0-4 with demands d_0 = d_1 = 2, d_2 = d_3 = 1 and the optimal supplies u_0, u_1, u_2, u_3]

Cost-to-go for states 0-4:
Stage 0: −28.4, −33.2, −38.5, −43.1, −48.1
Stage 1: −18.9, −23.7, −28.9, −33.7, −38.6
Stage 2: −9.3, −14.3, −19.4, −23.8, −28.2
Stage 3: −4.5, −9.8, −14.4, −19, −23.6
Stage 4 (terminal): 0, −4.8, −9.6, −14.4, −19.2
44
Cost for a zero initial inventory: −28.4
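The backward recursion for this example can be reproduced in a short script following the data above (Python stands in here for the Matlab used in the course):

```python
# Data from the inventory example
h, N = 4, 4                      # number of stages, capacity
d = [2, 2, 1, 1]                 # demands d0, d1, d2, d3
p, c, ctr = 10, 5, 0.5           # selling, purchase, transportation prices
r = [0, 4.8, 9.6, 14.4, 19.2]    # terminal cost g4(i) = -r[i]

def g(k, x, u):                  # stage cost: storage + purchase + transport - sales
    return 0.2 * x + c * u + (ctr if u > 0 else 0) - p * min(d[k], x + u)

def f(k, x, u):                  # dynamics x_{k+1} = max{x_k + u_k - d_k, 0}
    return max(x + u - d[k], 0)

# Backward dynamic programming
J = [dict() for _ in range(h + 1)]
mu = [dict() for _ in range(h)]  # optimal policy
for x in range(N + 1):
    J[h][x] = -r[x]
for k in range(h - 1, -1, -1):
    for x in range(N + 1):
        u_best, J_best = min(((u, g(k, x, u) + J[k + 1][f(k, x, u)])
                              for u in range(N - x + 1)), key=lambda t: t[1])
        J[k][x], mu[k][x] = J_best, u_best

print(round(J[0][0], 1))  # -28.4, the cost for a zero initial inventory
```

Note that mu[3] reacts to the realized inventory at stage 3, which is exactly what distinguishes a policy from a fixed input sequence.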
45
Historical note
‘I was intrigued by dynamic programming. It was clear to me that there was a good deal [...] could either be a traditional intellectual, or a modern intellectual using the results of my research for the problems of contemporary society.’
46
Summary

- Optimal control is concerned with making optimal decisions over time for a dynamical system described by a state.
- Three classes of problems (discrete optimization, stage decision, continuous-time control), many applications.

After this lecture, you should be able to apply the dynamic programming algorithm to discrete optimization problems.
Inventory control with uncertainty
A1
What if the demand is d_2 = 2 instead of the expected d_2 = 1?

Then the state at stage 3 is x_3 = 0 instead of x_3 = 1, and the optimal policy selects u_3 = 1 instead of u_3 = 0.

[Figure: transition diagram for stages 2-4 with the relevant cost-to-go values (−9.3, −4.5, −9.8, −14.4) and terminal costs (−9.6, −4.8)]
In the context of the example of slides 40-44
A2
Costs if d_2 = 2:
- keeping the precomputed input sequence (u_3 = 0): g_0(0,4) + g_1(2,0) + g_2(0,2) + g_3(0,0) + g_4(0) = 0.5 − 19.6 − 9.5 + 0 + 0 = −28.6
- following the optimal policy (u_3 = 1): g_0(0,4) + g_1(2,0) + g_2(0,2) + g_3(0,1) + g_4(0) = 0.5 − 19.6 − 9.5 − 4.5 + 0 = −33.1

Expected cost if Prob[d_2 = 1] = 0.5 and Prob[d_2 = 2] = 0.5:
- input sequence: 0.5 × (−28.4) + 0.5 × (−28.6) = −28.5
- optimal policy: 0.5 × (−28.4) + 0.5 × (−33.1) = −30.75
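These expected costs can be checked by simulating the two strategies over both demand scenarios. The fixed input sequence (4, 0, 2, 0) and the stage-3 policy rule are taken from the example; the script itself is a sketch, not course-supplied code:

```python
def g(x, u, dk):   # stage cost with p = 10, c = 5, ctr = 0.5, storage 0.2x
    return 0.2 * x + 5 * u + (0.5 if u > 0 else 0) - 10 * min(dk, x + u)

def run(use_policy, d2):
    """Total cost over one demand scenario, with or without stage-3 feedback."""
    d = [2, 2, d2, 1]
    seq = [4, 0, 2, 0]               # optimal input sequence for nominal demands
    x, total = 0, 0.0
    for k in range(4):
        if use_policy and k == 3:
            u = 1 if x == 0 else 0   # optimal stage-3 policy from the example
        else:
            u = seq[k]
        total += g(x, u, d[k])
        x = max(x + u - d[k], 0)
    return total - 4.8 * x           # terminal cost g4(i) = -4.8 i

exp_sequence = 0.5 * run(False, 1) + 0.5 * run(False, 2)
exp_policy = 0.5 * run(True, 1) + 0.5 * run(True, 2)
print(round(exp_sequence, 2), round(exp_policy, 2))  # -28.5 -30.75
```

The policy does better in expectation because it reacts to the realized inventory at stage 3 instead of committing to u_3 in advance.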
References and applications
Textbooks

[1] D. Bertsekas, Dynamic Programming and Optimal Control, 3rd edition, Athena Scientific, Vol. I and II, 2005
[2] A. E. Bryson and Y.-C. Ho, Applied Optimal Control, CRC Press, 1975
[3] D. Liberzon, Calculus of Variations and Optimal Control Theory, Princeton University Press, 2012
[4] M. Athans and P. L. Falb, Optimal Control, McGraw Hill, New York, 1966. Reprinted by Dover in 2006
[5] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods, Prentice Hall, New Jersey, 1990. Reprinted by Dover in 2007
[6] F. L. Lewis, D. Vrabie, V. L. Syrmos, Optimal Control, 3rd edition, John Wiley & Sons, 2012
[7] D. E. Kirk, Optimal Control Theory: An Introduction, Dover Books on Electrical Engineering, 2004
[8] R. Bellman, Dynamic Programming, Princeton University Press, 1957
[9] K. Zhou, J. C. Doyle, and K. Glover, Robust and Optimal Control, Prentice Hall, New Jersey, 1996
[10] G. Chen, G. Chen, S.-H. Hsu, Linear Stochastic Control Systems, CRC Press, 1995
[11] M. H. Davis, Linear Estimation and Stochastic Control, Chapman and Hall, 1977
[12] P. Whittle, Optimal Control: Basics and Beyond, John Wiley and Sons Ltd, 1996
[13] D. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, 1996
Numerical methods

[14] J. Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, 2010
Applications

Aerospace
[A1] A. E. Bryson, Jr., Applications of optimal control theory in aerospace engineering, Journal of Spacecraft and Rockets
[A2] J. M. Longuski, J. J. Guzmán, J. E. Prussing, Optimal Control with Aerospace Applications, Springer, 2014

Biomedicine and sequence alignment of DNA
[A3] G. W. Swan, Applications of Optimal Control Theory in Biomedicine, Marcel Dekker, New York, 1984
[A4] S. R. Eddy, What is dynamic programming?, Nature Biotechnology, 22, 909-910, 2004
[A5] M. G. Neubert, Marine reserves and optimal harvesting, Ecology Letters, 6:843-849, 2003

Power systems
[A6] G. S. Christensen, M. E. El-Hawary, and S. A. Soliman, Optimal Control Applications in Electric Power Systems, New York, 1987

Operational research and inventory control
[A7] M. H. Davis, Markov Models and Optimization, Chapman and Hall/CRC, 1993
[A8] A. Bensoussan, Dynamic Programming and Inventory Control, IOS Press, 2011

Finance and economics
[A9] S. P. Sethi and G. L. Thompson, Optimal Control Theory: Applications to Management Science and Economics, Springer, New York, 2nd edition, 2005

Computer science and scheduling problems
[A10] A. Lew and H. Mauch, Dynamic Programming: A Computational Tool, Springer Verlag, 2007

Automotive
[A11] B. de Jager, T. van Keulen, J. Kessels, Optimal Control of Hybrid Vehicles, Springer, 2013
Seminal papers

[S1] Anonymous, "Letter sent to Charles Montague, President of the Royal Society, where two mathematical problems proposed by the celebrated Johann Bernoulli are solved", Acta Eruditorum, (1697) 223
[S2] R. Bellman, The theory of dynamic programming, Bull. Amer. Math. Soc. 60 (1954), no. 6, 503-515
[S3] R. E. Kalman, Contributions to the theory of optimal control, Bol. Soc. Mat. Mexicana, 5:102-119, 1960. Reprinted in Control Theory: Twenty-Five Seminal Papers (1931-1981), T. Basar, editor, IEEE Press, New York, 2001, pages 149-166
[S4] L. S. Pontryagin et al., The Mathematical Theory of Optimal Processes, Interscience, New York, 1962
[S5] J. C. Doyle, K. Glover, P. P. Khargonekar, and B. A. Francis, State-space solutions to standard H2 and Hinf control problems, IEEE Transactions on Automatic Control, 1989

History

[H1] A. E. Bryson, Jr., Optimal control - 1950-1985, IEEE Control Systems Magazine, 1996
[H2] H. J. Sussmann and J. C. Willems, 300 Years of Optimal Control: From the Brachystochrone to the Maximum Principle, 1997
[H3] R. E. Bellman, Eye of the Hurricane: An Autobiography, 1984
[H4] S. Dreyfus, Richard Bellman on the birth of dynamic programming, Operations Research, 2002