4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Outline
Shortest paths in graphs: dynamic programming, Dijkstra's and A* algorithms
Certainty equivalent control
1
[Figure: example weighted graph with nodes 1 to 6 and edge weights 1, 3, 3, 4, 5, 6, 7, 8]
Weighted Graph
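Nodes: $V := \{1, \dots, n\}$
Edges: $E := \{(i_1, j_1), \dots, (i_r, j_r) \mid i_1, \dots, i_r, j_1, \dots, j_r \in V\}$
Weights: $w_{ij} \geq 0$ for each $(i, j) \in E$
Undirected graph: $w_{ij} = w_{ji}$. Directed graph: $w_{ij}$ and $w_{ji}$ may differ (or only one of the two arrows may exist).
[Figure: undirected and directed versions of the example graph]
Example: $E = \{(3, 6), (2, 3), \dots\}$, $w_{36} = 7$, $w_{23} = 5$, ...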
2
Graphs model networks (road, social, transportation, etc.) and can be found in numerous applications.
3
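Shortest path problem: find a path from an initial node to a destination node in a weighted graph with minimum length (the sum of the weights of the edges along the path).
[Figure: the example graph with an initial and a final node; the shortest path between them has minimum length 11]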
Can we use the DP algorithm to find the shortest path?
4
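The DP algorithm can be interpreted as computing the shortest path from the nodes at stage $0$ to the node at stage $h + 1$ of the following weighted graph:
[Figure: nodes $n_{ki}$ at stages $0, 1, \dots, h$; arrows between consecutive stages weighted by the transition costs $c^k_{ij}$; arrows from the stage-$h$ nodes to an artificial node at an artificial stage $h + 1$, weighted by $c^h_i$]
Conversely, the DP algorithm can be applied to a general weighted graph to provide the shortest path, as we show next.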
5
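Given a weighted graph with $n$ nodes, construct a transition diagram with $h = n - 1$ stages, in which the states $x_k$ at each stage $k$ are the nodes of the graph and the cost of moving from state $i$ to state $j$ is $c^k_{ij} = w_{ij}$, with $w_{ii} = 0$ and $w_{ij} = \infty$ if there is no edge from $i$ to $j$. Since the weights are nonnegative, there is always a shortest path that visits each node at most once, so $n - 1$ stages suffice; setting $w_{ii} = 0$ lets paths with fewer edges wait at a node at no cost.
[Figure: a 4-node weighted graph and the corresponding transition diagram (stages $k$, states $x_k = 1, \dots, 4$), from the initial node to the destination; missing edges have weight $\infty$]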
6
Apply the DP algorithm to this transition diagram.
The cost-to-go of state $x_k = i$ at stage $k$ is the length of the shortest path from node $i$ to the destination using at most $n - 1 - k$ edges; in particular, the costs-to-go at the initial stage are the optimal costs for each initial condition.
[Figure: DP costs-to-go of each state at each stage, from the initial stage to the destination]
The implementation can be made more efficient, and one does not need to first construct the transition diagram. Moreover, one can stop when the costs-to-go remain unchanged.
7
Another example for an undirected graph
[Figure: the 6-node example graph and a table of the DP costs-to-go at stages $k = 1, \dots, 5$ for states $x_k = 1, \dots, 6$]
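The backward DP recursion can be run directly on the graph, without building the transition diagram first. Below is a minimal Python sketch, assuming the 6-node example graph has edge weights $w_{12} = 1$, $w_{13} = 8$, $w_{14} = 6$, $w_{23} = 5$, $w_{24} = 3$, $w_{36} = 7$, $w_{45} = 3$, $w_{56} = 4$ (read off from the Dijkstra iteration table on slide 21) and destination node 6; the function name `dp_costs_to_go` is hypothetical.

```python
import math

def dp_costs_to_go(n, weights, dest):
    """Backward DP on the transition diagram of an undirected graph with nodes 1..n.

    weights maps (i, j) to w_ij; missing edges get w_ij = inf and w_ii = 0,
    as in the construction of the transition diagram. Returns the cost-to-go
    of every node, i.e., its shortest distance to dest.
    """
    def w(i, j):
        if i == j:
            return 0.0
        return weights.get((i, j), weights.get((j, i), math.inf))

    nodes = range(1, n + 1)
    # Last stage: zero cost at the destination, infinite cost elsewhere.
    J = {i: (0.0 if i == dest else math.inf) for i in nodes}
    for _ in range(n - 1):  # h = n - 1 stages, processed backwards
        J_new = {i: min(w(i, j) + J[j] for j in nodes) for i in nodes}
        if J_new == J:      # costs-to-go unchanged: stop early
            break
        J = J_new
    return J


edges = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (3, 6): 7, (4, 5): 3, (5, 6): 4}
J = dp_costs_to_go(6, edges, dest=6)
print(J)  # {1: 11.0, 2: 10.0, 3: 7.0, 4: 7.0, 5: 4.0, 6: 0.0}
```

Here the costs-to-go stop changing after four stages, so the early stop skips the last one; the cost-to-go of node 1 is the minimum length 11 found before.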
8
What is the shortest distance from Bucharest to Lugoj?
[Figure: road map of Romania with the cities Arad, Bucharest, Craiova, Dobreta, Eforie, Fagaras, Giurgiu, Hirsova, Iasi, Lugoj, Mehadia, Neamt, Oradea, Pitesti, Rimnicu Vilcea, Sibiu, Timisoara, Urziceni, Vaslui and Zerind, and the road distances (km) between neighbouring cities]
Road map of Romania
9
504 km (Route: Bucharest, Pitesti, Craiova, Dobreta, Mehadia, and Lugoj)
10
What is the shortest path for a robot to go from point A to B?
[Figure: environment with obstacles, start point A and goal point B]
11
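Grid the environment: the grid points are the nodes, and it costs 1 unit to move horizontally or vertically to a neighbouring node and $\sqrt{2}$ units to move diagonally.
[Figure: grid nodes with edge weights 1 (horizontal and vertical moves), $\sqrt{2}$ (diagonal moves) and $\infty$ (moves into obstacles)]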
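The grid model can be sketched as follows, computing the costs-to-go with the DP update repeated until nothing changes. This is a minimal Python sketch under assumptions that are not in the slides: a small hypothetical occupancy grid (`#` marks an obstacle), 8-connected moves, diagonal moves allowed next to obstacle corners, and in-place (Gauss-Seidel style) updates. Obstacle cells are simply left out, which is equivalent to giving the moves into them weight $\infty$.

```python
import math

# Hypothetical occupancy grid: '#' = obstacle, 'A' = start, 'B' = goal.
GRID = [
    "A....",
    ".###.",
    "....B",
]

def moves(grid, r, c):
    """Yield (neighbour, cost) for the free 8-neighbours of cell (r, c).

    Straight moves cost 1 and diagonal moves cost sqrt(2); moves off the grid
    or into obstacles (weight infinity) are skipped.
    """
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if ((dr, dc) != (0, 0) and 0 <= nr < len(grid)
                    and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#"):
                yield (nr, nc), (math.sqrt(2) if dr and dc else 1.0)

def grid_costs_to_go(grid, goal_char="B"):
    """Cost-to-go of every free cell, repeating the DP update until it stops changing."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, ch in enumerate(row) if ch != "#"]
    goal = next(cell for cell in cells if grid[cell[0]][cell[1]] == goal_char)
    J = {cell: (0.0 if cell == goal else math.inf) for cell in cells}
    changed = True
    while changed:
        changed = False
        for cell in cells:
            if cell == goal:
                continue
            best = min((w + J[nb] for nb, w in moves(grid, *cell)),
                       default=math.inf)
            if best < J[cell]:  # in-place update of the cost-to-go
                J[cell] = best
                changed = True
    return J


J = grid_costs_to_go(GRID)
print(round(J[(0, 0)], 2))  # cost-to-go from A is 4 + sqrt(2), printed as 5.41
```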
12
[Figure: grid environment with points A and B]
What is the shortest path for a robot to go from point A to B?
13
Simpler example to show the costs-to-go
[Figure: grid with the cost-to-go written in each cell: 0.00 at the destination and values such as 1.00, 1.41, 2.00, 2.41, ... growing with the distance to the destination]
Side remark: the cost-to-go can be viewed as a Lyapunov function, and the policy can be obtained by following the direction of maximum decrease of this function.
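The remark can be illustrated with a minimal Python sketch, assuming a hypothetical grid like the one above (8-connected moves with costs 1 and $\sqrt{2}$). The costs-to-go are computed with DP sweeps, and the robot then moves from each cell to the neighbour that minimises move cost plus cost-to-go; this is our choice of how to make "direction of maximum decrease" precise. Along the resulting path the cost-to-go strictly decreases, as a Lyapunov function should.

```python
import math

# Hypothetical grid: '#' = obstacle; start at (0, 0), goal at (2, 4).
GRID = ["A....",
        ".###.",
        "....B"]
FREE = {(r, c) for r, row in enumerate(GRID)
        for c, ch in enumerate(row) if ch != "#"}

def moves(cell):
    """Free 8-neighbours of a cell, with cost 1 (straight) or sqrt(2) (diagonal)."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nb = (r + dr, c + dc)
            if (dr, dc) != (0, 0) and nb in FREE:
                yield nb, (math.sqrt(2) if dr and dc else 1.0)

def costs_to_go(goal):
    """DP sweeps; |FREE| sweeps are enough for the costs-to-go to converge."""
    J = {cell: math.inf for cell in FREE}
    J[goal] = 0.0
    for _ in range(len(FREE)):
        for cell in FREE - {goal}:
            J[cell] = min([J[cell]] + [w + J[nb] for nb, w in moves(cell)])
    return J

def follow_policy(start, goal, J):
    """Move to the neighbour minimising move cost + cost-to-go until the goal."""
    path = [start]
    while path[-1] != goal:
        nb, _ = min(moves(path[-1]), key=lambda m: m[1] + J[m[0]])
        path.append(nb)
    return path


J = costs_to_go((2, 4))
path = follow_policy((0, 0), (2, 4), J)
print(path)
```

Ties between equally good neighbours are broken by iteration order, so the printed path is one of possibly several optimal paths.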
14
How can we design a shortest path from A to B when the obstacles are moving?
[Figure: obstacle positions at times $t = 0, 1, 2, \dots, T$, with the initial and final positions of the robot]
15
[Figure: the grid at times $t = 0, 1, 2, \dots, T$]
16
[Figure: grid nodes at times $t = 0, 1, 2, \dots$]
Initial node: the initial node at time $t = 0$. Final node: the final node at time $t = T$.
Example
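[Figure: arrows from a node at time $t$ to nodes at time $t + 1$, with weights 1, $\sqrt{2}$, or $\infty$ when the target node is occupied by an obstacle]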
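Since every arrow of this time-expanded graph goes from time $t$ to time $t + 1$, the shortest path can be computed with a backward DP over time. Below is a minimal Python sketch; the 3 × 3 grid, the obstacle trajectory `OBSTACLES` (assumed known in advance), and the option of waiting in place at an assumed cost of 0 (`wait_cost`) are hypothetical choices, not taken from the slides. Moves into a cell occupied at the next time step get weight $\infty$.

```python
import math

ROWS, COLS = 3, 3
# Hypothetical moving obstacle: occupied cells at each time (none after t = 2).
OBSTACLES = {0: {(0, 1)}, 1: {(1, 1)}, 2: {(2, 1)}}

def moves(cell, wait_cost=0.0):
    """Cells reachable in one time step and their costs: staying costs
    wait_cost (an assumption), straight moves 1 and diagonal moves sqrt(2)."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if 0 <= nr < ROWS and 0 <= nc < COLS:
                if (dr, dc) == (0, 0):
                    cost = wait_cost
                else:
                    cost = math.sqrt(2) if dr and dc else 1.0
                yield (nr, nc), cost

def time_expanded_cost(start, goal, T):
    """Shortest path from node (start, t = 0) to node (goal, t = T).

    Backward DP over the time-expanded graph with nodes (cell, t); arrows into
    a cell that is occupied at the next time step have weight infinity.
    """
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)]
    J = {cell: (0.0 if cell == goal else math.inf) for cell in cells}  # t = T
    for t in range(T - 1, -1, -1):
        blocked = OBSTACLES.get(t + 1, set())
        J = {cell: min((w + J[nb] for nb, w in moves(cell) if nb not in blocked),
                       default=math.inf)
             for cell in cells}
    return J[start]


print(time_expanded_cost((1, 0), (1, 2), T=2))  # detour around the obstacle: 2 * sqrt(2)
print(time_expanded_cost((1, 0), (1, 2), T=3))  # wait one step, then go straight: 2.0
```

With one extra time step, waiting lets the obstacle pass, so the straight path becomes feasible.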
17
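DP can be quite inefficient when computing an optimal path in cases where exploring only a few nodes would be enough, as in the graph below. This motivates Dijkstra's and the A* algorithms, discussed next.
[Figure: graph with nodes $1, 2, \dots, n$; edges (1, 2) and (2, 3) have weight 1 and all other edges have weight $> 2$; initial node 1, destination node 3]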
18
Main ideas
At each iteration, explore the node in the wavefront with the smallest distance to the origin.
[Figure: animation of Dijkstra's algorithm; source: Wikipedia]
19
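Initialization
Set $d_p = 0$, $d_i = \infty$ for all $i \in V - \{p\}$, and OPEN $= \{p\}$, where $p$ is the initial node and $t$ the destination node.
Steps
1. Remove from OPEN the node $i$ with the smallest $d_i$. If $i = t$, terminate.
2. For a node $j$ with an arrow from $i$ to $j$: if $d_i + w_{ij} < d_j$, set $d_j = d_i + w_{ij}$ and $\beta(j) = i$, and add $j$ to OPEN (if it is not there already).
Execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$, then return to step 1.
Optimal path
$\beta(i)$ is the node preceding $i$ along the optimal path (discovered so far) leading to the initial node. The optimal path $(i_0, i_1, \dots, i_L)$, with $i_0 = p$ and $i_L = t$, is obtained backwards: $i_{L-1} = \beta(t)$, ..., $i_0 = \beta(i_1)$, i.e., $i_{\ell-1} = \beta(i_\ell)$ for $\ell \in \{1, 2, \dots, L\}$.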
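The steps above can be sketched in Python as follows. This is a minimal sketch, not the lecture's reference code: OPEN is stored as a binary heap (`heapq`) in which an improved node is pushed again and outdated copies are skipped when removed, a common implementation choice; the adjacency-list representation `adj` is an assumption. The example uses the 6-node graph with the edge weights read off from the iteration table on slide 21.

```python
import heapq
import math

def dijkstra(adj, p, t):
    """Dijkstra's algorithm following the steps above.

    adj maps every node i to a list of arrows (j, w_ij) with w_ij >= 0.
    Returns (d_t, optimal path from p to t), or (inf, None) if t is unreachable.
    """
    d = {i: math.inf for i in adj}   # d_i = inf for i in V - {p}
    d[p] = 0.0
    beta = {}                        # beta(j): predecessor of j
    open_heap = [(0.0, p)]           # OPEN, ordered by d_i
    while open_heap:
        d_i, i = heapq.heappop(open_heap)   # step 1: smallest d_i in OPEN
        if d_i > d[i]:
            continue                 # outdated copy of a node whose d_i improved
        if i == t:                   # destination removed from OPEN: terminate
            path = [t]
            while path[-1] != p:     # i_{l-1} = beta(i_l)
                path.append(beta[path[-1]])
            return d_i, path[::-1]
        for j, w_ij in adj[i]:       # step 2 for every arrow from i to j
            if d_i + w_ij < d[j]:
                d[j] = d_i + w_ij
                beta[j] = i
                heapq.heappush(open_heap, (d[j], j))
    return math.inf, None


edges = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (3, 6): 7, (4, 5): 3, (5, 6): 4}
adj = {i: [] for i in range(1, 7)}
for (i, j), w in edges.items():      # undirected graph: arrows both ways
    adj[i].append((j, w))
    adj[j].append((i, w))
print(dijkstra(adj, 1, 6))  # (11.0, [1, 2, 4, 5, 6])
```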
20
[Figure: the same graph, with initial node 1 and destination node 3]
Dijkstra's algorithm requires only three iterations for this example.
Iteration | Pairs $(i, d_i)$, $i \in$ OPEN
1 | (1, 0)
2 | (2, 1) + other pairs pertaining to node 1
3 | (3, 2) + other pairs pertaining to nodes 1 and 2
The destination/final node is removed from OPEN: terminate. Here $\beta(2) = 1$ and $\beta(3) = 2$.
21
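[Figure: the 6-node example graph]
Iteration | Pairs $(i, d_i)$, $i \in$ OPEN | Updates of $\beta$ when processing the removed node
1 | (1, 0) | $\beta(2) = 1$, $\beta(3) = 1$, $\beta(4) = 1$
2 | (2, 1), (3, 8), (4, 6) | $\beta(3) = 2$, $\beta(4) = 2$
3 | (3, 6), (4, 4) | $\beta(5) = 4$
4 | (3, 6), (5, 7) | $\beta(6) = 3$
5 | (5, 7), (6, 13) | $\beta(6) = 5$
6 | (6, 11) | destination removed from OPEN: terminate
Optimal path (from end to start)
$(6, \beta(6), \beta(\beta(6)), \dots, 1) = (6, 5, 4, 2, 1)$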
22
What is the shortest distance from Bucharest to Lugoj?
23
Shortest path from Bucharest to Lugoj
Iteration | Pairs $\{i, d_i\}$, $i \in$ OPEN
1 | {Lugoj, 0}
2 | {Mehadia, 70}, {Timisoara, 111}
3 | {Timisoara, 111}, {Dobreta, 145}
4 | {Dobreta, 145}, {Arad, 229}
5 | {Arad, 229}, {Craiova, 265}
6 | {Craiova, 265}, {Sibiu, 369}, {Zerind, 304}
7 | {Sibiu, 369}, {Zerind, 304}, {Pitesti, 403}, {Rimnicu Vilcea, 411}
8 | {Sibiu, 369}, {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}
9 | {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}, {Fagaras, 468}
10 | {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Fagaras, 468}
11 | {Rimnicu Vilcea, 411}, {Fagaras, 468}, {Bucharest, 504}
12 | {Fagaras, 468}, {Bucharest, 504}
13 | {Bucharest, 504}
From this data we can obtain $\beta$ and compute the optimal path.
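The table can be reproduced with a short Python sketch. The road list is our transcription of the 23 distances on the map; the pairing of distances with roads follows the usual version of this map and is consistent with every value in the table above, but it should be checked against the figure. The sketch also counts how many nodes are removed from OPEN, to compare with the 13 rows of the table.

```python
import heapq

# Road distances (km), transcribed from the map of Romania.
ROADS = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Timisoara", "Lugoj", 111),
    ("Lugoj", "Mehadia", 70), ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
    ("Craiova", "Rimnicu Vilcea", 146), ("Craiova", "Pitesti", 138),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Rimnicu Vilcea", "Pitesti", 97), ("Fagaras", "Bucharest", 211),
    ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
    ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
    ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
    ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87),
]

def build_adj(roads):
    adj = {}
    for a, b, w in roads:  # undirected: one arrow in each direction
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

def dijkstra(adj, p, t):
    """Returns (d_t, optimal path, number of nodes removed from OPEN)."""
    d, beta = {p: 0}, {}   # nodes not in d have d_i = infinity
    open_heap, removed = [(0, p)], 0
    while open_heap:
        d_i, i = heapq.heappop(open_heap)
        if d_i > d[i]:
            continue  # outdated copy of an improved node
        removed += 1
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(beta[path[-1]])
            return d_i, path[::-1], removed
        for j, w in adj[i]:
            if d_i + w < d.get(j, float("inf")):
                d[j], beta[j] = d_i + w, i
                heapq.heappush(open_heap, (d[j], j))
    return float("inf"), None, removed


dist, path, removed = dijkstra(build_adj(ROADS), "Lugoj", "Bucharest")
print(dist, removed)  # 504 13
print(path)  # ['Lugoj', 'Mehadia', 'Dobreta', 'Craiova', 'Pitesti', 'Bucharest']
```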
24
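In the A* algorithm, as in Dijkstra's algorithm, nodes are explored from a wavefront, but an estimate $h(i)$ of the distance to the destination for each node $i$ is also taken into account when picking the node to be explored next: the next node is chosen based on $d_i + h(i)$.
If the heuristic $h(i)$, $i \in V$, (i) is a lower bound on the distance from $i$ to the destination and (ii) is such that $h(i) \leq w_{ij} + h(j)$ for every $(i, j) \in E$, then an optimal path is found. Otherwise, there are no optimality guarantees.
A* is equivalent to Dijkstra's algorithm with the modified weights $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$ instead of $w_{ij}$; these modified weights are nonnegative under these assumptions.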
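A short worked equation shows why the modified weights leave the optimal path unchanged: along any path $(i_0, i_1, \dots, i_L)$ from $i_0 = p$ to $i_L = t$, the $h$ terms telescope.

```latex
\sum_{\ell=1}^{L} \bar{w}_{i_{\ell-1} i_\ell}
  = \sum_{\ell=1}^{L} \left( w_{i_{\ell-1} i_\ell} + h(i_\ell) - h(i_{\ell-1}) \right)
  = \sum_{\ell=1}^{L} w_{i_{\ell-1} i_\ell} + h(t) - h(p)
```

Every path from $p$ to $t$ is therefore shifted by the same constant $h(t) - h(p)$, so the same path is optimal for both sets of weights, and condition (ii) gives $\bar{w}_{ij} \geq 0$, which Dijkstra's algorithm requires. Likewise, the shortest modified distance to any node $i$ is $d_i + h(i) - h(p)$, so ranking nodes by it is the same as ranking them by $d_i + h(i)$. For instance, with $h$ the straight-line distance to Bucharest, the road distance 504 from Lugoj to Bucharest becomes $504 + h(\text{Bucharest}) - h(\text{Lugoj}) = 504 + 0 - 244 = 260$ with the modified weights.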
25
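Initialization
Set $d_p = 0$, $d_i = \infty$ for all $i \in V - \{p\}$, and OPEN $= \{p\}$.
Steps
1. Remove from OPEN the node $i$ with the smallest $d_i + h(i)$. If $i = t$, terminate.
2. For a node $j$ with an arrow from $i$ to $j$: if $d_i + w_{ij} < d_j$, set $d_j = d_i + w_{ij}$ and $\beta(j) = i$, and add $j$ to OPEN (if it is not there already).
Execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$, then return to step 1.
(Same algorithm as Dijkstra's algorithm except for the use of $d_i + h(i)$ in step 1; the optimal path is obtained from $\beta$ as in slide 19.)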
26
A* is typically much faster if we have a good heuristic (which might not be easy to find, especially if we require it to satisfy the two conditions discussed before).
[Figure: Dijkstra's algorithm vs. A*; source: Wikipedia]
27
[Figure: road map of Romania]
Heuristic $h(i)$: straight-line distance from city $i$ to Bucharest
Arad 366, Bucharest 0, Craiova 160, Dobreta 242, Eforie 161, Fagaras 178, Giurgiu 77, Hirsova 151, Iasi 226, Lugoj 244, Mehadia 241, Neamt 234, Oradea 380, Pitesti 98, Rimnicu Vilcea 193, Sibiu 253, Timisoara 329, Urziceni 80, Vaslui 199, Zerind 374
28
Iteration | Pairs $\{i, d_i\}$ in OPEN
1 | {Lugoj, 0}
2 | {Mehadia, 67}, {Timisoara, 196}
3 | {Timisoara, 196}, {Dobreta, 143}
4 | {Timisoara, 196}, {Craiova, 181}
5 | {Timisoara, 196}, {Pitesti, 257}, {Rim. Vilcea, 360}
6 | {Pitesti, 257}, {Rim. Vilcea, 360}, {Arad, 351}
7 | {Rim. Vilcea, 360}, {Arad, 351}, {Bucharest, 260}
Here the values $d_i$ are computed with the modified weights $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$ instead of $w_{ij}$. From this data we can obtain $\beta$ and compute the optimal path.
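The A* run in the table can be reproduced with a minimal Python sketch: nodes are removed from OPEN in order of $d_i + h(i)$ with the original weights, which ranks them the same way as Dijkstra's algorithm on the modified weights $\bar{w}_{ij}$, since the two quantities differ only by the constant $h(p)$. The road list is our transcription of the map (the pairing of distances with roads follows the usual version of this map and should be checked against the figure); `H` holds the straight-line distances from the table on slide 27.

```python
import heapq

ROADS = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Timisoara", "Lugoj", 111),
    ("Lugoj", "Mehadia", 70), ("Mehadia", "Dobreta", 75), ("Dobreta", "Craiova", 120),
    ("Craiova", "Rimnicu Vilcea", 146), ("Craiova", "Pitesti", 138),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Rimnicu Vilcea", "Pitesti", 97), ("Fagaras", "Bucharest", 211),
    ("Pitesti", "Bucharest", 101), ("Bucharest", "Giurgiu", 90),
    ("Bucharest", "Urziceni", 85), ("Urziceni", "Hirsova", 98),
    ("Hirsova", "Eforie", 86), ("Urziceni", "Vaslui", 142),
    ("Vaslui", "Iasi", 92), ("Iasi", "Neamt", 87),
]
# Straight-line distance to Bucharest, used as the heuristic h(i).
H = {"Arad": 366, "Bucharest": 0, "Craiova": 160, "Dobreta": 242, "Eforie": 161,
     "Fagaras": 178, "Giurgiu": 77, "Hirsova": 151, "Iasi": 226, "Lugoj": 244,
     "Mehadia": 241, "Neamt": 234, "Oradea": 380, "Pitesti": 98,
     "Rimnicu Vilcea": 193, "Sibiu": 253, "Timisoara": 329, "Urziceni": 80,
     "Vaslui": 199, "Zerind": 374}

def a_star(roads, h, p, t):
    """Remove from OPEN the node with the smallest d_i + h(i).

    Returns (d_t, optimal path, number of nodes removed from OPEN).
    """
    adj = {}
    for a, b, w in roads:  # undirected graph
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    d, beta = {p: 0}, {}
    open_heap, removed = [(h[p], p)], 0
    while open_heap:
        f_i, i = heapq.heappop(open_heap)
        if f_i > d[i] + h[i]:
            continue  # outdated copy of an improved node
        removed += 1
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(beta[path[-1]])
            return d[t], path[::-1], removed
        for j, w in adj[i]:
            if d[i] + w < d.get(j, float("inf")):
                d[j], beta[j] = d[i] + w, i
                heapq.heappush(open_heap, (d[j] + h[j], j))
    return float("inf"), None, removed


dist, path, removed = a_star(ROADS, H, "Lugoj", "Bucharest")
# Road distance, nodes removed, and the distance with the modified weights.
print(dist, removed, dist + H["Bucharest"] - H["Lugoj"])  # 504 7 260
```

A* removes 7 nodes from OPEN here, against 13 for Dijkstra's algorithm on the same query.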
29
If the graph is too large to initialise $d_i = \infty$ for all nodes, we can still run the algorithm if we keep track of a list of closed nodes (removed in step 1, see slide 19) so that they are not visited again (on slide 19 this is ensured by the test $d_i + w_{ij} < d_j$).
There are many other shortest path algorithms, e.g., breadth-first search and depth-first search (see label correcting methods in Bertsekas's book, Ch. 2).
An algorithm related to these two remarks is the Rapidly-exploring Random Tree (RRT) (see LaValle's book).
30
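Dijkstra's and A* algorithms are computationally more efficient than DP to compute optimal paths.
DP computes the optimal paths from every node to the destination. This is inefficient when we are interested in only one optimal path.
Unlike Dijkstra's and A* algorithms, DP extends to problems with disturbances (see lecture 2).
Dijkstra's algorithm can also be used to compute the policy obtained in the first lecture with DP (see appendix A).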
31
Historical note
What's the shortest way to travel from Rotterdam to Groningen? It is the algorithm for the shortest path, which I designed in about 20 minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a 20-minute invention. In fact, it was published in 1959, three years later. The publication is still quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. Without pencil and paper you are almost forced to avoid all avoidable complexities. Eventually that algorithm became, to my great amazement, one of the cornerstones of my fame. Edsger W. Dijkstra (1930-2002)
32
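Without disturbances, an optimal path from the initial stage to the final stage $h$ of a transition diagram can be computed with methods to compute shortest paths in graphs (e.g., Dijkstra's, A*). Computing optimal paths from each state to the last stage yields a policy (see also the example in appendix A).
[Figure: transition diagram from the initial stage to the final stage $h$, with an optimal path and the resulting policy]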
33
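[Figure: transition diagram from the initial stage to the final stage $h$]
With disturbances: compute (online!) the optimal path from the current state and apply its first decision; after a disturbance, recompute; after another disturbance, recompute again.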
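The recompute-and-apply loop can be sketched in Python as follows. Everything specific here is hypothetical: the 6-node example graph (edge weights read off from the table on slide 21), destination node 6, and a disturbance modelled as a function that may move the system to a different node than the one chosen. Dijkstra's algorithm is rerun from the current node at every step, and only the first decision of the computed path is applied.

```python
import heapq
import math

EDGES = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (3, 6): 7, (4, 5): 3, (5, 6): 4}
ADJ = {i: [] for i in range(1, 7)}
for (a, b), w in EDGES.items():   # undirected graph
    ADJ[a].append((b, w))
    ADJ[b].append((a, w))

def shortest_path(p, t):
    """Dijkstra's algorithm; returns the node sequence of an optimal path."""
    d, beta, heap = {p: 0}, {}, [(0, p)]
    while heap:
        d_i, i = heapq.heappop(heap)
        if d_i > d[i]:
            continue  # outdated copy of an improved node
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(beta[path[-1]])
            return path[::-1]
        for j, w in ADJ[i]:
            if d_i + w < d.get(j, math.inf):
                d[j], beta[j] = d_i + w, i
                heapq.heappush(heap, (d[j], j))
    raise ValueError("destination not reachable")

def certainty_equivalent(start, goal, disturbance, max_steps=20):
    """Recompute (online) the optimal path ignoring future disturbances,
    apply only its first decision, and let the disturbance act."""
    x, trajectory = start, [start]
    for k in range(max_steps):
        if x == goal:
            break
        planned_next = shortest_path(x, goal)[1]   # first decision
        x = disturbance(k, planned_next)           # actual next node
        trajectory.append(x)
    return trajectory


# Hypothetical disturbance: at step 0 the system ends up at node 3 instead of 2.
def disturbance(k, planned_next):
    return 3 if k == 0 else planned_next

print(certainty_equivalent(1, 6, disturbance))  # [1, 3, 6]
```

Without the disturbance, the same loop simply follows the optimal path 1, 2, 4, 5, 6.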
34
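Recomputing optimal paths online in this way can be applied to problems with the disturbances considered in the previous lecture.
Whereas the policy obtained with DP is explicit, the (equivalent) one obtained by recomputing optimal paths online is implicit and requires online computations!
This strategy is called certainty equivalent control and is closely related to model predictive control (to be addressed later).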
[Diagram relating: optimal paths via DP or Dijkstra's (more efficient); DP (might be computationally hard); Dijkstra's offline (less efficient); Dijkstra's online (requires online computations); certainty equivalent control (Dijkstra's); stochastic DP]
35
Summary
(except stochastic DP!) - (dis)advantages depend on the application (e.g. can we use online computations?).
After this lecture, you should be able to:
Compute optimal paths in graphs with DP, Dijkstra and A*.
Solving a DP problem with Dijkstra’s algorithm
[Figure: initial transition diagram]
Consider the same initial transition diagram considered in the first lecture and follow steps 1-3:
1. Relabel the nodes.
[Figure: transition diagram with nodes relabelled 1 to 15]
2. Add a terminal node, with a cost to arrive to it at the final stage coinciding with the terminal cost.
3. Run Dijkstra's algorithm from each initial state to the terminal node and keep track, for each initial state, of the first decision of the optimal path (this is the optimal policy).
Iteration Pairs (i, di), i ∈OPEN 1 2 3 4 5 (1, 0) Initial state 1 (3, 2), (4, 1) (3, 2), (7, 1), (8, 2) (3, 2), (10, 2), (11, 4), (8, 2) (6, 3), (10, 2), (11, 4), (8, 2) (6, 3), (10, 2), (11, 4), (12, 7) (6, 3), (14, 4), (13, 7), (11, 4), (12, 7) (15, 4), (13, 7), (11, 4), (12, 7) (14, 4), (13, 7), (11, 4), (12, 7) 6 7 8 It is clear that if we consider the states 4, 7, 10 as initial states the transitions(arrows) along this path are also optimal first decisions for these initial states (belong to the optimal policy) Belongs to the optimal policy (15, 8), (13, 7), (11, 4), (12, 7) (15, 8), (13, 6), (12, 7) (15, 6), (12, 7) 9 10 11
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Iteration Pairs (i, di), i ∈OPEN 1 2 3 4 5 Initial state 6 7 It is clear that if we consider the states 5, 9, 12, 14 as initial states the transitions(arrows) along this path are also optimal first decisions for these initial states (belong to the optimal policy) Belongs to the optimal policy (2, 0) (4, 4), (5, 1) (4, 4), (9, 4), (8, 5) (7, 4), (9, 4), (8, 5) (10, 5), (11, 7) (9, 4), (8, 5) (10, 5), (11, 7) (12, 5), (8, 5) (10, 5), (11, 7) (14, 5), (8, 5), (13, 10) 2 (10, 5), (11, 7), (15, 9), (8, 5), (13, 10) (10, 5), (11, 7), (15, 9), (13, 10) (11, 7), (15, 9), (13, 10) (15, 9), (13, 10) 8 9 10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Iteration Pairs (i, di), i ∈OPEN 1 2 3 4 5 Initial state 6 7 Belongs to the optimal policy 3 It is clear that if we consider the states 8, 11, 13 as initial states the transitions(arrows) along this path are also optimal first decisions for these initial states (belong to the optimal policy) (3, 0) (6, 1), (7, 3), (8, 1) (10, 5), (11, 4) (7, 3), (8, 1) (10, 5), (11, 3) (7, 3), (12, 6) (10, 4), (11, 3) (12, 6) (10, 4), (13, 7), (14, 6) (12, 6) (13, 7), (14, 6) (12, 6) (13, 7), (15, 10), (12, 6) (15, 7), (12, 6) 8 (15, 7), (14, 6) (15, 7) 9 10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Iteration Pairs (i, di), i ∈OPEN 1 2 3 4 5 Initial state Belongs to the optimal policy It is clear that if we consider the states 8, 11, 13 as initial states the transitions(arrows) along this path are also optimal first decisions for these initial states (belong to the optimal policy) 6 (6, 0) (10, 4), (11, 3) (10, 4), (13, 7), (14, 5) (13, 7), (14, 5) (13, 7), (15, 9) (13, 7)
Optimal policy
Combining the first decisions leading to the end stage for each node, we obtain the optimal policy (the same one obtained with the DP algorithm in the first lecture).
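The procedure of this appendix can be sketched in Python: each (stage, state) pair is a node, a terminal node `"end"` is reached from every final-stage state at a cost equal to its terminal cost, and Dijkstra's algorithm is run once per initial state, keeping the first decision of the optimal path. The two-state, two-stage diagram below is a hypothetical example, not the diagram from the first lecture.

```python
import heapq
import itertools
import math

STATES = [1, 2]
# Hypothetical transition costs: COSTS[k][(i, j)] = cost of arrow i -> j at stage k.
COSTS = [{(1, 1): 1, (1, 2): 3, (2, 1): 2, (2, 2): 1},
         {(1, 1): 1, (1, 2): 5, (2, 1): 3, (2, 2): 1}]
TERMINAL = {1: 3, 2: 0}       # terminal cost g(x) at the last stage h = 2
h = len(COSTS)

def arrows(node):
    """Outgoing arrows of a node (k, x); final-stage states lead to 'end'."""
    k, x = node
    if k == h:
        yield "end", TERMINAL[x]
    else:
        for (i, j), c in COSTS[k].items():
            if i == x:
                yield (k + 1, j), c

def dijkstra_to_end(x0):
    """Optimal cost from (0, x0) to the terminal node and the first decision."""
    counter = itertools.count()   # tie-breaker so nodes are never compared
    d, beta = {(0, x0): 0}, {}
    heap = [(0, next(counter), (0, x0))]
    while heap:
        d_i, _, i = heapq.heappop(heap)
        if d_i > d[i]:
            continue  # outdated copy of an improved node
        if i == "end":
            path = ["end"]
            while path[-1] != (0, x0):
                path.append(beta[path[-1]])
            path.reverse()            # (0, x0), (1, x1), ..., (h, x_h), "end"
            return d_i, path[1][1]    # first decision: the state x1 at stage 1
        for j, c in arrows(i):
            if d_i + c < d.get(j, math.inf):
                d[j], beta[j] = d_i + c, i
                heapq.heappush(heap, (d[j], next(counter), j))
    return math.inf, None


for x0 in STATES:
    cost, first = dijkstra_to_end(x0)
    print(x0, cost, first)  # prints "1 4 2", then "2 2 2"
```

A backward DP recursion on the same hypothetical diagram gives the same optimal costs (4 and 2) and the same first decisions.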
DP with terminal constraints
B1
Suppose that we want to reach a given state at the final stage of a transition diagram starting at a given initial state with minimum cost (as opposed to simply reaching the final stage).
[Figure: transition diagram with an initial state and a desired terminal state]
Since a transition diagram is simply a weighted graph, we can apply graph search methods, and in particular repeat the trick just used to apply DP.
B2
1. Relabel the nodes.
[Figure: relabelled nodes 1 to 7]
2. Transform the graph into a transition diagram (weighted graph with final and terminal nodes).
[Figure: the relabelled weighted graph with nodes 1 to 7 and the resulting transition diagram; missing edges have weight $\infty$]
B3
[Figure: DP applied to the resulting transition diagram]
By inspection we can see that this is the only
Conclusion: if there is a terminal constraint, apply DP as usual, except at the last stage: for each state with an arrow to the desired terminal state, set its cost-to-go to be the terminal cost of the desired terminal node plus the cost of such arrow. If the state has no such arrow, set the cost-to-go to infinity.
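A minimal Python sketch of this conclusion, assuming a hypothetical two-state, two-stage transition diagram: keeping the terminal cost of the desired state, setting the other terminal costs to infinity, and then running the usual backward DP gives, one stage before the end, exactly the costs-to-go described above (terminal cost plus arrow cost, or infinity when there is no arrow to the desired state).

```python
import math

STATES = [1, 2]
# Hypothetical arrow costs: COSTS[k][(i, j)]; missing arrows cost infinity.
COSTS = [{(1, 1): 1, (1, 2): 4, (2, 1): 2, (2, 2): 1},
         {(1, 1): 1, (1, 2): 5, (2, 1): 3, (2, 2): 1}]
TERMINAL = {1: 0, 2: 0}

def backward_dp(costs, terminal_cost):
    """Standard backward DP; returns stage-0 costs-to-go and the policy per stage."""
    J, policy = dict(terminal_cost), []
    for c in reversed(costs):
        mu = {i: min(STATES, key=lambda j: c.get((i, j), math.inf) + J[j])
              for i in STATES}
        J = {i: c.get((i, mu[i]), math.inf) + J[mu[i]] for i in STATES}
        policy.insert(0, mu)
    return J, policy

def with_terminal_constraint(terminal_cost, desired):
    """Keep the terminal cost of the desired state; every other state gets infinity."""
    return {x: (g if x == desired else math.inf) for x, g in terminal_cost.items()}


J_free, _ = backward_dp(COSTS, TERMINAL)
J_con, policy = backward_dp(COSTS, with_terminal_constraint(TERMINAL, desired=2))
print(J_free[1], J_con[1], policy[0][1])  # 2 5 2
```

Starting from state 1, the unconstrained optimum (cost 2) stays in state 1, while the constrained one (cost 5) moves to the desired terminal state 2 right away.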