Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte - - PowerPoint PPT Presentation



slide-1
SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

slide-2
SLIDE 2

Outline

  • Shortest paths in graphs
  • Dynamic programming
  • Dijkstra’s and A* algorithms
  • Certainty equivalent control
slide-3
SLIDE 3

Graph

1

Weighted Graph

  • Nodes: $V := \{1, \dots, n\}$
  • Edges: $(i, j) \in E$, with $E := \{(i_1, j_1), \dots, (i_r, j_r) \mid i_1, \dots, i_r, j_1, \dots, j_r \in V\}$
  • Weights: $w_{ij} \geq 0$
  • Undirected if $w_{ij} = w_{ji}$

[Figure: a 6-node weighted graph, drawn in an undirected and in a directed version.]

Example (from the figure): $E = \{(3, 6), (2, 3), \dots\}$, $w_{36} = 7$, $w_{23} = 5$, $\dots$

slide-4
SLIDE 4

Applications

2

Graphs model networks (road, social, transportation, etc.) and can be found in numerous applications

slide-5
SLIDE 5

Shortest path problem

3

[Figure: the 6-node weighted graph with the initial and final nodes marked; the minimum length is 11.]

Find a path from an initial node to a destination node in a weighted graph, with minimum length (sum of the weights of its edges).

Can we use the DP algorithm to find the shortest path?

slide-6
SLIDE 6

Discussion

4

  • Computing an optimal path in a transition diagram can be seen as computing the shortest path from the nodes at stage $0$ to the artificial node at stage $h + 1$ of the following weighted graph:

[Figure: transition diagram with stages $0, 1, \dots, h-1, h$ and transition costs $c^k_{ij}$, extended with an artificial stage $h + 1$ that contains a single artificial node, reached from the states at stage $h$ with costs $c^h_i$.]

  • For graphs with this structure we already know how to use DP to compute shortest paths.
  • Adjustments are needed for general graphs (e.g. cycles may occur), but DP can still be used to provide the shortest path, as we show next.

slide-7
SLIDE 7

Dynamic programming formulation

5

Given a weighted graph construct a transition diagram:

  • $h = n - 1$ stages, $n$ states at decision stages and only the destination at the terminal stage.
  • Make $c^k_{ij} = w_{ij}$, $w_{ij} = \infty$ if there is no link from $i$ to $j$, and $w_{ii} = 0$.

[Figure: a 4-node weighted graph with the initial and destination nodes marked, and the corresponding transition diagram; missing links have cost $\infty$.]

slide-8
SLIDE 8


Dynamic programming solution

6

Apply the DP algorithm to this transition diagram.

  • Costs-to-go at a stage $k$ are the costs of the shortest path with $n - 1 - k$ hops. In particular, costs-to-go at the initial stage are the optimal costs for each initial condition.
  • To find an optimal path, follow the policy for a given initial state.
  • The cost-to-go at the initial stage of a given state is infinite if there is no path from that initial state to the destination.

[Figure: transition diagram of the 4-node example annotated with the costs-to-go at each stage.]

The implementation can be made more efficient and one does not need to first construct the transition diagram. Moreover, one can stop when the costs-to-go remain unchanged.
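A minimal Python sketch of this DP recursion, assuming the 6-node undirected example graph used throughout the lecture. The edge weights are read off the Dijkstra trace in Example II (they agree with $w_{36} = 7$, $w_{23} = 5$ and the minimum length 11); the function names are hypothetical and not part of the slides. The loop works directly on the weights, without building the transition diagram, and stops as soon as the costs-to-go no longer change.

```python
import math

INF = math.inf

# Undirected 6-node example graph. The weights are read off the Dijkstra
# trace in Example II, not taken from the (unavailable) figure.
EDGES = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}


def build_weights(n, edges):
    """Return w[i][j] with w_ii = 0 and w_ij = inf when there is no link."""
    w = {i: {j: (0 if i == j else INF) for j in range(1, n + 1)}
         for i in range(1, n + 1)}
    for (i, j), wij in edges.items():
        w[i][j] = wij
        w[j][i] = wij  # undirected graph: w_ij = w_ji
    return w


def dp_shortest_paths(n, w, dest):
    """Backward DP recursion; returns (costs-to-go, policy) for every node."""
    nodes = range(1, n + 1)
    cost = {i: (0 if i == dest else INF) for i in nodes}  # terminal stage
    policy = {i: None for i in nodes}
    for _ in range(n - 1):  # at most h = n - 1 stages
        new_cost = {}
        for i in nodes:
            best = min(nodes, key=lambda j: w[i][j] + cost[j])
            new_cost[i] = w[i][best] + cost[best]
            if new_cost[i] < cost[i]:  # record a decision only if it improves
                policy[i] = best
        if new_cost == cost:  # costs-to-go unchanged: stop early
            break
        cost = new_cost
    return cost, policy


def follow_policy(policy, start, dest):
    """Follow the policy from start; returns None if dest is unreachable."""
    path = [start]
    while path[-1] != dest:
        nxt = policy[path[-1]]
        if nxt is None:
            return None
        path.append(nxt)
    return path


costs, policy = dp_shortest_paths(6, build_weights(6, EDGES), dest=6)
print(costs)
print(follow_policy(policy, 1, 6))
```

Note that the recursion returns the cost-to-go and a decision for every node, not only for the initial node.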

slide-9
SLIDE 9


Example

7

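Another example for an undirected graph

[Figure: the 6-node undirected weighted graph and its transition diagram, with node 6 as the destination.]

Costs-to-go of each state $x_k$ at each decision stage $k$ (the destination, node 6, has cost-to-go 0):

State   k = 0   k = 1   k = 2   k = 3   k = 4
1       11      11      13      15      ∞
2       10      10      10      12      ∞
3       7       7       7       7       7
4       7       7       7       7       ∞
5       4       4       4       4       4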

slide-10
SLIDE 10

8

Shortest paths in road networks

What is the shortest distance from Bucharest to Lugoj?

[Figure: road map of Romania with the cities Arad, Bucharest, Craiova, Dobreta, Eforie, Fagaras, Giurgiu, Hirsova, Iasi, Lugoj, Mehadia, Neamt, Oradea, Pitesti, Rimnicu Vilcea, Sibiu, Timisoara, Urziceni, Vaslui and Zerind, and the road distances between neighbouring cities.]

slide-11
SLIDE 11

9

Shortest paths in road networks

504 km (Route: Bucharest, Pitesti, Craiova, Dobreta, Mehadia, and Lugoj)

slide-12
SLIDE 12

10

Robot path planning

What is the shortest path for a robot to go from point A to B?

[Figure: grid map with obstacles and the points A and B.]

slide-13
SLIDE 13

11

Assumptions

  • It takes $1$ distance unit to move horizontally or vertically between adjacent nodes and $\sqrt{2}$ units to move diagonally.
  • Distances to obstacle nodes are infinite.
  • The distance between two diagonally adjacent nodes that are both adjacent to the same obstacle node is infinite.

[Figure: grid cells annotated with the distances $1$, $\sqrt{2}$ and $\infty$.]

slide-14
SLIDE 14

12

Robot path planning

What is the shortest path for a robot to go from point A to B?

[Figure: grid map with the points A and B.]

slide-15
SLIDE 15

13

Robot path planning

Simpler example to show the costs-to-go

[Figure: small grid map in which every free cell is labelled with its cost-to-go to the destination (0.00 at the destination, then 1.00, 1.41, 2.00, 2.41, ... further away).]

Side remark: the cost-to-go can be viewed as a Lyapunov function, and the policy can be obtained by following the direction of maximum decrease of this function.

slide-16
SLIDE 16

14

Time-varying graphs

How to design a shortest path from A to B when the obstacles are moving?

[Figure: snapshots of the grid at $t = 0$, $t = 1$, $t = 2$, ..., $t = T$, with the initial and final positions marked.]

slide-17
SLIDE 17

15

Time-varying graphs

  1. Consider the set of static graphs, one for each time step $t = 0, 1, 2, \dots, T$.

slide-18
SLIDE 18

16

Time-varying graphs

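  2. Build a time-invariant graph in 3D.
  3. Compute the shortest path for the 3D graph.

Initial node: initial node at time $t = 0$. Final node: final node at time $t = T$.

Example

[Figure: 3D graph obtained by stacking the grids for $t = 0, 1, 2, \dots$; the edges between consecutive time layers have weights $1$, $\sqrt{2}$ or $\infty$.]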
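The construction of the time-invariant graph can be sketched as follows. This is only an illustration under stated assumptions: nodes of the stacked graph are pairs (cell, time), edges only go from layer $t$ to layer $t + 1$, waiting in place is allowed at a cost `wait_cost` (a hypothetical parameter, not specified in the slides), and collisions while crossing paths with an obstacle are ignored. The helper names `time_expanded_graph`, `shortest_path` and `free` are hypothetical.

```python
import heapq
import math


def time_expanded_graph(neighbors, free, T, wait_cost=0.0):
    """Stack the static graphs for t = 0..T into one graph over (cell, t).

    neighbors: dict cell -> {adjacent cell: distance}
    free(cell, t): True if the cell is not occupied by an obstacle at time t
    wait_cost: cost of staying in place for one time step (an assumption)
    """
    adj = {(c, T): {} for c in neighbors}  # last layer has no outgoing edges
    for t in range(T):
        for c in neighbors:
            out = {}
            if free(c, t):
                if free(c, t + 1):
                    out[(c, t + 1)] = wait_cost
                for c2, dist in neighbors[c].items():
                    if free(c2, t + 1):  # blocked cells get no edge (weight inf)
                        out[(c2, t + 1)] = dist
            adj[(c, t)] = out
    return adj


def shortest_path(adj, start, goal):
    """Plain Dijkstra search on the stacked graph; returns (cost, path)."""
    d, beta, heap, closed = {start: 0.0}, {}, [(0.0, start)], set()
    while heap:
        di, i = heapq.heappop(heap)
        if i in closed:
            continue
        closed.add(i)
        if i == goal:
            path = [goal]
            while path[-1] != start:
                path.append(beta[path[-1]])
            return di, path[::-1]
        for j, wij in adj[i].items():
            if di + wij < d.get(j, math.inf):
                d[j], beta[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    return math.inf, None


# Corridor with cells 0, 1, 2; an obstacle occupies cell 1 at time t = 1 only.
corridor = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}


def free(cell, t):
    return not (cell == 1 and t == 1)


graph = time_expanded_graph(corridor, free, T=3)
cost, path = shortest_path(graph, (0, 0), (2, 3))
print(cost, path)
```

In this small corridor the robot has to wait one step at cell 0 before moving, which a search on a single static graph could not express.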

slide-19
SLIDE 19

Outline

  • Shortest paths in graphs
  • Dynamic programming
  • Dijkstra’s and A* algorithms
  • Certainty equivalent control
slide-20
SLIDE 20

17

Discussion

  • DP can be quite inefficient when computing one optimal path is enough.
  • For shortest path problems in graphs, there are many alternative algorithms. We describe next Dijkstra’s and the A* algorithms.
  • Figure example: DP searches the full space, which is not necessary to compute the optimal path.

[Figure: graph with nodes $1, 2, \dots, n$ in which the edges on the path from the initial node to the destination have weight 1 and all other edges have weight $> 2$.]
slide-21
SLIDE 21

18

Dijkstra’s algorithm

Main ideas

  • Iteratively generate shorter paths from the origin to every node.
  • Update a list of nodes (wavefront) which can be explored next.
  • New nodes are added to the wavefront based on the cost: neighbors of the node with the smallest distance to the origin.

[Figure: illustration of the wavefront; source: Wikipedia.]

slide-22
SLIDE 22

19

Dijkstra’s algorithm

Initialization

  • $d_i = \infty$ for $i \in V - \{p\}$, $d_p = 0$, and OPEN $= \{p\}$ ($p$: initial node, $t$: final node).

Steps

  1. Remove a node $i$ from OPEN with the minimum estimate $d_i$. If $i = t$ stop; otherwise execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.
  2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\beta(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise do not update $d_j$, $\beta(j)$.
  3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.

Optimal path

  • To keep track of the shortest paths it suffices to save, for every node $i$, the next node $\beta(i)$ along the optimal path (discovered so far) leading to the initial node.
  • The optimal path is then given by $(i_0, i_1, \dots, i_L)$ for $i_L = t$, $i_{L-1} = \beta(t)$, $\dots$, $i_0 = \beta(i_1)$, or equivalently $i_{\ell-1} = \beta(i_\ell)$, $\ell \in \{1, 2, \dots, L\}$, where $L$ is such that $i_0 = p$.
  • If OPEN is empty at a given step of the algorithm then there is no path to the destination $t$.
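The steps above can be sketched in Python as follows. This is one possible implementation, not the one used in the lecture: OPEN is kept in a binary heap (`heapq`), so instead of updating $d_j$ inside OPEN, a new entry is pushed and outdated entries are skipped when they are popped. The graph is the 6-node example, with weights read off the trace in Example II; the name `dijkstra` and the adjacency-dict layout are assumptions.

```python
import heapq
import math


def dijkstra(adj, p, t):
    """adj[i] = {j: w_ij}. Returns (d_t, path from p to t) or (inf, None)."""
    d = {p: 0}            # nodes missing from d implicitly have d_i = inf
    beta = {}             # beta[j]: predecessor of j on the best path so far
    open_heap = [(0, p)]  # OPEN, ordered by the estimate d_i
    closed = set()
    while open_heap:
        di, i = heapq.heappop(open_heap)   # step 1: minimum estimate
        if i in closed:                    # outdated duplicate entry
            continue
        closed.add(i)
        if i == t:                         # destination removed: stop
            path = [t]
            while path[-1] != p:           # i_{l-1} = beta(i_l)
                path.append(beta[path[-1]])
            return di, path[::-1]
        for j, wij in adj.get(i, {}).items():
            if di + wij < d.get(j, math.inf):   # step 2
                d[j] = di + wij
                beta[j] = i
                heapq.heappush(open_heap, (d[j], j))
    return math.inf, None                  # OPEN empty: no path to t


# 6-node undirected example graph (weights read off the trace in Example II).
EDGES = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
adj = {}
for (i, j), wij in EDGES.items():
    adj.setdefault(i, {})[j] = wij
    adj.setdefault(j, {})[i] = wij

print(dijkstra(adj, 1, 6))
```

The heap is a convenience for finding the minimum estimate quickly; a plain list scanned for its minimum would follow the slide more literally at a higher cost per iteration.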

slide-23
SLIDE 23

20

Example I

[Figure: graph with nodes $1, 2, \dots, n$; the edges from node 1 to node 2 and from node 2 to node 3 have weight 1, all other edges have weight $> 2$; node 1 is the initial node and node 3 the destination.]

Dijkstra’s algorithm requires only three iterations for this example.

Iteration   Pairs $(i, d_i)$, $i \in$ OPEN                                          Update
            (1, 0)
1           (2, 1) + other pairs pertaining to other neighbors of node 1           $\beta(2) = 1$
2           (3, 2) + other pairs pertaining to other neighbors of nodes 1 & 2      $\beta(3) = 2$
3           Destination/final node removed from OPEN: terminate

slide-24
SLIDE 24

21

Example II

[Figure: the 6-node undirected weighted graph.]

Iteration   Pairs $(i, d_i)$, $i \in$ OPEN   Update
            (1, 0)
1           (2, 1), (3, 8), (4, 6)          $\beta(2) = 1$, $\beta(3) = 1$, $\beta(4) = 1$
2           (3, 6), (4, 4)                  $\beta(3) = 2$, $\beta(4) = 2$
3           (3, 6), (5, 7)                  $\beta(5) = 4$
4           (5, 7), (6, 13)                 $\beta(6) = 3$
5           (6, 11)                         $\beta(6) = 5$

Optimal path (from end to start)

$(6, \beta(6), \beta(\beta(6)), \dots, 1) = (6, 5, 4, 2, 1)$

slide-25
SLIDE 25

22

Shortest paths in road networks

What is the shortest distance from Bucharest to Lugoj?

[Figure: road map of Romania with the road distances between neighbouring cities.]

slide-26
SLIDE 26

23

Example III

Shortest path from Bucharest to Lugoj

Iteration   Pairs $\{i, d_i\}$, $i \in$ OPEN
            {Lugoj, 0}
1           {Mehadia, 70}, {Timisoara, 111}
2           {Timisoara, 111}, {Dobreta, 145}
3           {Dobreta, 145}, {Arad, 229}
4           {Arad, 229}, {Craiova, 265}
5           {Craiova, 265}, {Sibiu, 369}, {Zerind, 304}
6           {Sibiu, 369}, {Zerind, 304}, {Pitesti, 403}, {Rimnicu Vilcea, 411}
7           {Sibiu, 369}, {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}
8           {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}, {Fagaras, 468}
9           {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Fagaras, 468}
10          {Rimnicu Vilcea, 411}, {Fagaras, 468}, {Bucharest, 504}
11          {Fagaras, 468}, {Bucharest, 504}
12          {Bucharest, 504}

From this data we can obtain $\beta$ and compute the optimal path.

slide-27
SLIDE 27

24

A*

  • Similar to Dijkstra’s algorithm, but an estimate (heuristic) $h(i)$, $i \in V$, of the distance to the destination for each node is also taken into account when picking the node to be explored next. New nodes are added to the wavefront based on $d_i + h(i)$.
  • If the heuristic is (i) no larger than the optimal cost from that node to the destination and (ii) such that $h(i) \leq w_{ij} + h(j)$ for every $i, j$, then the optimal path is found. Otherwise there are no optimality guarantees.
  • To run the A* algorithm under the two heuristic assumptions:
  1. Change the weights to $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$.
  2. Run Dijkstra’s algorithm and get the optimal path.
  3. Obtain the optimal cost in the original graph with weights $w_{ij}$.
  • The general algorithm is given next, which works with or without these assumptions.
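A short derivation (not taken from the slides) of why the reweighting in steps 1 to 3 works: along any path $(i_0, i_1, \dots, i_L)$ from the initial node $p = i_0$ to the destination $t = i_L$, the heuristic terms telescope.

```latex
\sum_{\ell=1}^{L} \bar{w}_{i_{\ell-1} i_\ell}
  = \sum_{\ell=1}^{L} \bigl( w_{i_{\ell-1} i_\ell} + h(i_\ell) - h(i_{\ell-1}) \bigr)
  = \sum_{\ell=1}^{L} w_{i_{\ell-1} i_\ell} + h(t) - h(p)
```

Every path from $p$ to $t$ is therefore shifted by the same constant $h(t) - h(p)$, so the shortest path is the same in both graphs, and the original cost is recovered by adding $h(p) - h(t)$ to the reweighted cost. Condition (ii) is exactly $\bar{w}_{ij} \geq 0$, which is the requirement $w_{ij} \geq 0$ of the weighted graphs considered here. By the same telescoping argument, the estimate of node $i$ in the reweighted graph equals $d_i + h(i) - h(p)$, so Dijkstra's algorithm on the reweighted graph removes nodes in the order of $d_i + h(i)$.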

slide-28
SLIDE 28

25

A*

Initialization

  • $d_i = \infty$ for $i \in V - \{p\}$, $d_p = 0$, and OPEN $= \{p\}$ ($p$: initial node, $t$: final node).

Steps

  1. Remove a node $i$ from OPEN with the minimum $d_i + h(i)$. If $i = t$ stop; otherwise execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.
  2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\beta(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise do not update $d_j$, $\beta(j)$.
  3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.

(Same algorithm as Dijkstra’s algorithm except for the selection criterion $d_i + h(i)$ in step 1; same remarks to find the optimal path as in slide 19.)
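A hedged Python sketch of these steps, again with OPEN stored in a heap (an implementation choice, not part of the slides). No closed list is used, so a node whose estimate improves later is placed in OPEN again, as in step 2. For illustration only, the heuristic is taken to be the exact cost-to-go of the 6-node example (11, 10, 7, 7, 4, 0), which satisfies both heuristic conditions; in practice such a perfect heuristic is not available. The names `a_star`, `h_exact` and `h_zero` are hypothetical.

```python
import heapq
import math


def a_star(adj, h, p, t):
    """adj[i] = {j: w_ij}, h(i) = heuristic. Returns (d_t, path, expanded)."""
    d = {p: 0}
    beta = {}
    open_heap = [(h(p), p)]            # OPEN, ordered by d_i + h(i)
    expanded = []                      # nodes removed from OPEN, in order
    while open_heap:
        f, i = heapq.heappop(open_heap)
        if f > d[i] + h(i):            # outdated entry: d_i improved since
            continue
        expanded.append(i)
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(beta[path[-1]])
            return d[t], path[::-1], expanded
        for j, wij in adj.get(i, {}).items():
            if d[i] + wij < d.get(j, math.inf):
                d[j] = d[i] + wij
                beta[j] = i
                heapq.heappush(open_heap, (d[j] + h(j), j))
    return math.inf, None, expanded


# 6-node undirected example graph (weights read off the trace in Example II).
EDGES = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
adj = {}
for (i, j), wij in EDGES.items():
    adj.setdefault(i, {})[j] = wij
    adj.setdefault(j, {})[i] = wij

h_exact = {1: 11, 2: 10, 3: 7, 4: 7, 5: 4, 6: 0}.get  # illustrative heuristic


def h_zero(i):
    return 0  # with h = 0 the algorithm reduces to Dijkstra's algorithm


print(a_star(adj, h_exact, 1, 6))
print(a_star(adj, h_zero, 1, 6))
```

With the exact cost-to-go as heuristic only the nodes on the optimal path are removed from OPEN, whereas with $h = 0$ node 3 is explored as well.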

slide-29
SLIDE 29

26

A* and Dijkstra’s algorithm

A* is typically much faster if we have a good heuristic (which might not be easy to find, especially if we require it to satisfy the two conditions discussed before).

[Figure: nodes explored by Dijkstra’s algorithm and by A*; source: Wikipedia.]

slide-30
SLIDE 30

27

Example

[Figure: road map of Romania with the road distances between neighbouring cities.]

Straight-line distance to Bucharest, $h(i)$: Neamt 234, Iasi 226, Vaslui 199, Urziceni 80, Hirsova 151, Eforie 161, Bucharest (the destination), Giurgiu 77, Pitesti 98, Craiova 160, Fagaras 178, Sibiu 253, Rimnicu Vilcea 193, Lugoj 244, Mehadia 241, Dobreta 242, Timisoara 329, Arad 366, Zerind 374, Oradea 380.

slide-31
SLIDE 31

28

Example

  1. Change the weights to $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$.
  2. Run Dijkstra’s algorithm and get the optimal path.
  3. Obtain the optimal cost in the original graph with weights $w_{ij}$.

Iteration   Pairs $\{i, d_i\}$ in OPEN
            {Lugoj, 0}
1           {Mehadia, 67}, {Timisoara, 196}
2           {Timisoara, 196}, {Dobreta, 143}
3           {Timisoara, 196}, {Craiova, 181}
4           {Timisoara, 196}, {Pitesti, 257}, {Rim. Vilcea, 360}
5           {Pitesti, 257}, {Rim. Vilcea, 360}, {Arad, 351}
6           {Rim. Vilcea, 360}, {Arad, 351}, {Bucharest, 260}

From this data we can obtain $\beta$ and compute the optimal path.
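As a check on the table, here is the arithmetic for the first iteration and for step 3. It uses the road distances from Lugoj found in Example III (70 to Mehadia, 111 to Timisoara) and the straight-line distances $h(i)$ of the previous slide. The value $h(\text{Bucharest}) = 0$ is an assumption: the slide lists no number for Bucharest, but it is the straight-line distance from Bucharest to itself.

```latex
\begin{aligned}
\bar{w}_{\text{Lugoj},\,\text{Mehadia}}
  &= 70 + h(\text{Mehadia}) - h(\text{Lugoj}) = 70 + 241 - 244 = 67, \\
\bar{w}_{\text{Lugoj},\,\text{Timisoara}}
  &= 111 + h(\text{Timisoara}) - h(\text{Lugoj}) = 111 + 329 - 244 = 196, \\
\text{cost in the original graph}
  &= 260 + h(\text{Lugoj}) - h(\text{Bucharest}) = 260 + 244 - 0 = 504.
\end{aligned}
```

The result agrees with the 504 km obtained with Dijkstra's algorithm on the original weights, while only 6 iterations are listed here instead of 12.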

slide-32
SLIDE 32

29

Discussion

  • For large graphs one cannot even store all the nodes and initialise $d_i$, but we can still run the algorithm if we keep track of a list of closed nodes (removed in step 1, see slide 19) so that they are not visited again (on slide 19 this is assured by $d_i + w_{ij} < d_j$).
  • If optimality is not needed, there are many more graph search algorithms, e.g., breadth-first search and depth-first search (see label correcting methods in Bertsekas’s book, Ch. 2).
  • For robot motion planning, Dijkstra and A* are in general naive:
      • construct nodes as we move along (so the graph is only implicit);
      • random placement of nodes is in general better.
  • A popular method that improves upon the previous methods based on these two remarks is the rapidly-exploring random tree (RRT) (see LaValle’s book).

slide-33
SLIDE 33

30

Discussion

  • Dijkstra’s algorithm and other search algorithms (e.g. A*) are typically computationally more efficient than DP to compute optimal paths.
  • DP explores every node, providing the optimal paths from every node to the destination. This is inefficient when we are interested in only one optimal path.
  • Why then dynamic programming? It provides a policy, which allows us to cope with disturbances (see lecture 2).
  • We discuss next how to use Dijkstra’s algorithm to provide the optimal policy in real time (online).
  • In Appendix A, Dijkstra’s algorithm is used to obtain the same optimal policy obtained in the first lecture with DP.
  • Thus, again, why DP then? Stochastic DP! + other advantages.
slide-34
SLIDE 34

31

Edsger W. Dijkstra

Historical note

  • Edsger W. Dijkstra was a professor at TU/Eindhoven from 1962 to 1984.

"What's the shortest way to travel from Rotterdam to Groningen? It is the algorithm for the shortest path, which I designed in about 20 minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a 20-minute invention. In fact, it was published in 1959, three years later. The publication is still quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. Without pencil and paper you are almost forced to avoid all avoidable complexities. Eventually that algorithm became, to my great amazement, one of the cornerstones of my fame."

Edsger W. Dijkstra (1930-2002)

slide-35
SLIDE 35

Outline

  • Shortest paths in graphs
  • Dynamic programming
  • Dijkstra’s and A* algorithms
  • Certainty equivalent control
slide-36
SLIDE 36

32

Shortest paths in graphs and policies

  • A transition diagram is just a weighted graph, and therefore we can compute optimal paths with methods to compute shortest paths in graphs (e.g. Dijkstra’s, A*).

[Figure: transition diagram from the initial stage to the final stage $h$.]

  • Doing this for every stage and every state, and taking the first decision of the optimal paths, we obtain the optimal policy! (The function that, for each state, gives the first decision of the optimal path from that state to the last stage.)

[Figure: example of the resulting optimal policy (see also Appendix A).]

  • However, this is typically computationally less efficient than DP.

slide-37
SLIDE 37

33

Certainty equivalent control

  • Yet, we can implement the method just described (using, e.g., Dijkstra’s algorithm) online:
  1. Compute the optimal path for the initial state and take the first decision.
  2. If no disturbance occurred, use the next decision along the optimal path; otherwise recompute (online!) and apply the first decision.

[Figure: transition diagram from the initial stage to the final stage $h$; each disturbance triggers a recomputation of the optimal path.]
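The online procedure can be sketched as a replanning loop. This is an illustration under assumptions, not the lecture's implementation: it runs on the 6-node example graph rather than on a staged transition diagram, and the disturbance is modelled by a hypothetical function `disturb(step, intended)` that returns the node actually reached after a decision. The names `shortest_path` and `run_online` are also hypothetical.

```python
import heapq
import math


def shortest_path(adj, p, t):
    """Dijkstra's algorithm; returns the node list from p to t, or None."""
    d, beta, heap, closed = {p: 0}, {}, [(0, p)], set()
    while heap:
        di, i = heapq.heappop(heap)
        if i in closed:
            continue
        closed.add(i)
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(beta[path[-1]])
            return path[::-1]
        for j, wij in adj[i].items():
            if di + wij < d.get(j, math.inf):
                d[j], beta[j] = di + wij, i
                heapq.heappush(heap, (d[j], j))
    return None


def run_online(adj, start, dest, disturb):
    """Certainty equivalent control: replan only after a disturbance.

    disturb(step, intended) -> node actually reached (hypothetical model).
    Returns (visited nodes, number of shortest-path computations).
    """
    state, plan, visited, replans, step = start, None, [start], 0, 0
    while state != dest:
        if plan is None or plan[0] != state:   # off the stored path
            plan = shortest_path(adj, state, dest)
            replans += 1
            if plan is None:
                raise ValueError("destination unreachable from %r" % state)
        intended = plan[1]                     # next decision along the path
        state = disturb(step, intended)        # the disturbance may deviate
        plan = plan[1:]
        visited.append(state)
        step += 1
    return visited, replans


# 6-node undirected example graph (weights read off the trace in Example II).
EDGES = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
         (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
adj = {}
for (i, j), wij in EDGES.items():
    adj.setdefault(i, {})[j] = wij
    adj.setdefault(j, {})[i] = wij

# No disturbance, then a disturbance that pushes the first move to node 3.
print(run_online(adj, 1, 6, lambda step, intended: intended))
print(run_online(adj, 1, 6, lambda step, intended: 3 if step == 0 else intended))
```

Without disturbances a single shortest-path computation suffices; each disturbance costs one additional computation at run time, which is the price of not storing an explicit policy.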

slide-38
SLIDE 38

34

Discussion

  • Doing this, we end up with the same policy as DP when the disturbances considered in the previous lecture are neglected.
  • The policy obtained with DP is explicit, whereas this new (equivalent) one is implicit and requires online computations!
  • In the literature this (equivalent) policy is called certainty equivalent control and is closely related to model predictive control (to be addressed later).

  • To summarize:
      • Optimal paths: DP, or Dijkstra’s (more efficient).
      • Optimal policies: DP (might be computationally hard), Dijkstra’s offline (less efficient), or Dijkstra’s online (requires online computations), i.e. certainty equivalent control.
      • Stochastic DP: stochastic DP only.

slide-39
SLIDE 39

35

Concluding remarks

Summary

  • DP can be used to solve shortest path problems in graphs.
  • Discussed alternative methods: Dijkstra’s and A*.
  • Introduced certainty equivalent control.
  • Main message: there are other methods to compute optimal paths and optimal policies (except for stochastic DP!); their (dis)advantages depend on the application (e.g. can we use online computations?).

After this lecture, you should be able to:

  • Compute the shortest path in a graph with DP, Dijkstra and A*.

slide-40
SLIDE 40

Appendix A

Solving a DP problem with Dijkstra’s algorithm

slide-41
SLIDE 41

Example

Consider the same initial transition diagram considered in the first lecture and follow steps 1-3.

[Figure: initial transition diagram with its transition costs and terminal costs.]

  1. Add an artificial terminal node, with a cost to arrive to it at the final stage coinciding with the terminal cost.

[Figure: the extended transition diagram with nodes labelled 1 to 15.]

  3. Compute the optimal paths (using Dijkstra’s algorithm) for each state to the artificial terminal node and keep track, for each initial state, of the first decision of the optimal path (this is the optimal policy).

slide-42
SLIDE 42

Example

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Initial state: 1

Iterations 1 to 11, pairs $(i, d_i)$, $i \in$ OPEN: (1, 0) (3, 2), (4, 1) (3, 2), (7, 1), (8, 2) (3, 2), (10, 2), (11, 4), (8, 2) (6, 3), (10, 2), (11, 4), (8, 2) (6, 3), (10, 2), (11, 4), (12, 7) (6, 3), (14, 4), (13, 7), (11, 4), (12, 7) (15, 4), (13, 7), (11, 4), (12, 7) (14, 4), (13, 7), (11, 4), (12, 7) (15, 8), (13, 7), (11, 4), (12, 7) (15, 8), (13, 6), (12, 7) (15, 6), (12, 7)

[Figure: the optimal path from the initial state is highlighted and labelled "Belongs to the optimal policy".]

It is clear that if we consider the states 4, 7, 10 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).

slide-43
SLIDE 43

Example

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Initial state: 2

Iterations 1 to 10, pairs $(i, d_i)$, $i \in$ OPEN: (2, 0) (4, 4), (5, 1) (4, 4), (9, 4), (8, 5) (7, 4), (9, 4), (8, 5) (10, 5), (11, 7) (9, 4), (8, 5) (10, 5), (11, 7) (12, 5), (8, 5) (10, 5), (11, 7) (14, 5), (8, 5), (13, 10) (10, 5), (11, 7), (15, 9), (8, 5), (13, 10) (10, 5), (11, 7), (15, 9), (13, 10) (11, 7), (15, 9), (13, 10) (15, 9), (13, 10)

[Figure: the optimal path from the initial state is highlighted and labelled "Belongs to the optimal policy".]

It is clear that if we consider the states 5, 9, 12, 14 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).

slide-44
SLIDE 44

Example

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Initial state: 3

Iterations 1 to 10, pairs $(i, d_i)$, $i \in$ OPEN: (3, 0) (6, 1), (7, 3), (8, 1) (10, 5), (11, 4) (7, 3), (8, 1) (10, 5), (11, 3) (7, 3), (12, 6) (10, 4), (11, 3) (12, 6) (10, 4), (13, 7), (14, 6) (12, 6) (13, 7), (14, 6) (12, 6) (13, 7), (15, 10), (12, 6) (15, 7), (12, 6) (15, 7), (14, 6) (15, 7)

[Figure: the optimal path from the initial state is highlighted and labelled "Belongs to the optimal policy".]

It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).

slide-45
SLIDE 45

Example

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Initial state: 6

Iterations 1 to 5, pairs $(i, d_i)$, $i \in$ OPEN: (6, 0) (10, 4), (11, 3) (10, 4), (13, 7), (14, 5) (13, 7), (14, 5) (13, 7), (15, 9) (13, 7)

[Figure: the optimal path from the initial state is highlighted and labelled "Belongs to the optimal policy".]

It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).

slide-46
SLIDE 46

Optimal policy

Example

Combining the first decisions leading to the end stage for each node we obtain the optimal policy (the same obtained with the DP algorithm in the first lecture)

slide-47
SLIDE 47

Appendix B

DP with terminal constraints

slide-48
SLIDE 48

B1

DP with terminal constraints

Suppose that we want to reach a given state at the final stage of a transition diagram, starting at a given initial state, with minimum cost (as opposed to simply reaching the final stage).

[Figure: transition diagram with the initial state and the desired terminal state marked.]

Since a transition diagram is simply a weighted graph, we can apply graph search methods, and in particular repeat the trick just used to apply DP.

slide-49
SLIDE 49

B2

DP with terminal constraints

  1. Relabel nodes.

[Figure: weighted graph with final and terminal nodes, relabelled 1 to 7.]

  2. Transform the graph into a transition diagram.

[Figure: the resulting transition diagram; missing links have cost $\infty$.]

  3. Apply DP.

[Figure: the transition diagram annotated with the costs-to-go.]

slide-50
SLIDE 50

B3

DP with terminal constraints

[Figure: the part of the transition diagram that leads to the desired terminal state, with its costs-to-go.]

By inspection we can see that this is the only part that matters.

Conclusion: if there is a terminal constraint:

  • Remove the arrows from nodes at the final decision stage that do not lead to the desired terminal state.
  • For each state, choose the arrow with minimum cost and set the cost-to-go of that node to be the terminal cost of the desired terminal node plus the cost of that arrow. If the state has no arrows, set the cost-to-go to infinity.
  • Apply DP.