4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Part I: Discrete optimization problems

Outline:
- Dynamic programming formalism
- Stochastic dynamic programming
- Applications
- Recap
[Figure: transition diagram of a discrete optimization problem: a trellis with stages 0 through h; stage k has n_k nodes, the cost of moving from node i at stage k to node j at stage k + 1 is c^k_{ij}, and the terminal cost at node i of stage h is c^h_i. A numerical example with arc costs and the resulting optimal costs is also shown.]
The dynamic programming algorithm provides a policy from which an optimal path can be obtained. Policies are crucial to cope with disturbances.
The problem can be written in the following formalism.

Dynamic model:

    x_{k+1} = f_k(x_k, u_k),  k ∈ {0, …, h − 1}

Cost:

    Σ_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h)

State: x_0 ∈ {1, …, n_0}, x_1 ∈ {1, …, n_1}, …, x_h ∈ {1, …, n_h}
Action: u_0 ∈ {1, …, m_{0,x_0}}, u_1 ∈ {1, …, m_{1,x_1}}, …
Cost: g_0(x_0, u_0) = c^0_{x_0,u_0}, g_1(x_1, u_1) = c^1_{x_1,u_1}, …, g_h(x_h) = c^h_{x_h}
Dynamic programming algorithm in the new formalism
Start with J_h(i) = g_h(i) for every i ∈ X_h and, for each decision stage k ∈ {h − 1, h − 2, …, 0}, starting from the last and moving backwards, compute J_k and µ_k as

    J_k(i) = min_{j ∈ U_k(i)}  g_k(i, j) + J_{k+1}(f_k(i, j)),   U_k(i) := {1, …, m_{k,i}}

and µ_k(i) = j, where j is the minimizer in the dynamic programming (DP) equation, so that

    J_k(i) = g_k(i, µ_k(i)) + J_{k+1}(f_k(i, µ_k(i))).

Then {µ_0, …, µ_{h−1}} is an optimal policy.
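The backward recursion above can be sketched in code. A minimal sketch, where the data layout (states and actions as dicts indexed by stage, f and g as Python callables) is an assumption made for illustration:

```python
def dp_backward(h, states, actions, f, g, g_terminal):
    """Return cost-to-go tables J[k][i] and a policy mu[k][i].

    states[k]     : iterable of states at stage k (k = 0..h)
    actions[k]    : dict state -> iterable of admissible actions U_k(i)
    f(k, i, j)    : next state f_k(i, j)
    g(k, i, j)    : stage cost g_k(i, j)
    g_terminal(i) : terminal cost g_h(i)
    """
    J = [dict() for _ in range(h + 1)]
    mu = [dict() for _ in range(h)]
    for i in states[h]:                      # J_h(i) = g_h(i)
        J[h][i] = g_terminal(i)
    for k in range(h - 1, -1, -1):           # k = h-1, ..., 0
        for i in states[k]:
            best_j, best = None, float("inf")
            for j in actions[k][i]:          # DP equation: immediate + future
                cost = g(k, i, j) + J[k + 1][f(k, i, j)]
                if cost < best:
                    best, best_j = cost, j
            J[k][i], mu[k][i] = best, best_j
    return J, mu
```

The returned `mu` is a policy: for every stage and state it records a minimizing action, not just the optimal path from one initial state.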
The DP equation expresses the trade-off between the immediate and the future cost:

    J_k(x_k) = min_{u_k ∈ U_k(x_k)}  g(x_k, u_k) + J_{k+1}(f(x_k, u_k))

where g(x_k, u_k) is the immediate (stage) cost and J_{k+1}(f(x_k, u_k)) is the future cost. As shown before, the algorithm provides the optimal policy; the proof also applies to discrete optimization problems.
Move a robot from an initial stage to a final stage in minimum time.

At a regular node the robot can move up, straight, or down; otherwise, there is only one option (see figures). It takes 1 time unit to move horizontally from stage to stage and √2 time units to move diagonally; c extra time units are paid every time an obstacle or a wall is hit.

[Figure: grid with initial and final stages and upper and lower walls; between stages i and i + 1 the moves up, straight, and down cost √2, 1, and √2 at regular nodes, and √2 + c or 1 + c when a wall or an obstacle is hit.]
This problem can be written in the DP framework using a transition diagram obtained from the rules of the problem.
Number the nodes 1 (lower wall) to n (upper wall) at each stage. Then

    J_h(i) = 0,   i ∈ {1, …, n}

and for k ∈ {h − 1, h − 2, …, 1, 0}:

    J_k(1) = √2 + c + J_{k+1}(2)                    (lower wall)
    J_k(n) = √2 + c + J_{k+1}(n − 1)                (upper wall)
    J_k(i) = 1 + c + J_{k+1}(i)                     (obstacle node, i ∈ {2, …, n − 1})
    J_k(i) = min{1 + J_{k+1}(i), √2 + J_{k+1}(i + 1), √2 + J_{k+1}(i − 1)}
                                                    (not an obstacle node, i ∈ {2, …, n − 1})
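This recursion is easy to implement. A sketch, assuming the obstacle map is encoded as a predicate `obstacle(k, i)` (this encoding is an illustrative choice, not part of the original problem statement):

```python
import math

def robot_cost_to_go(n, h, c, obstacle):
    """Cost-to-go J[k][i] for the robot corridor (lanes 1..n, stages 0..h)."""
    s2 = math.sqrt(2.0)
    J = [[0.0] * (n + 1) for _ in range(h + 1)]   # J[h][i] = 0 for all lanes
    for k in range(h - 1, -1, -1):
        for i in range(1, n + 1):
            if i == 1:                       # lower wall: forced diagonal bounce
                J[k][i] = s2 + c + J[k + 1][2]
            elif i == n:                     # upper wall
                J[k][i] = s2 + c + J[k + 1][n - 1]
            elif obstacle(k, i):             # obstacle node: pay c, go straight
                J[k][i] = 1 + c + J[k + 1][i]
            else:                            # regular node: straight / up / down
                J[k][i] = min(1 + J[k + 1][i],
                              s2 + J[k + 1][i + 1],
                              s2 + J[k + 1][i - 1])
    return J
```

For an obstacle-free corridor, an interior lane simply counts down h − k, while a wall lane pays √2 + c once to rejoin the interior.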
[Figure: grid of cost-to-go values J_k(i) computed with this recursion for c = 4; free rows count down 13.00, 12.00, …, 1.00, 0.00 toward the final stage, while wall and obstacle rows show values such as 17.41, 13.41, 8.41, 5.41 containing the √2 terms.]
The optimal path is computed assuming no disturbances; a policy is needed to cope with disturbances. Without any assumption on the disturbances it would not be possible to define optimal decisions, since these would depend on future realizations of the disturbances.
Consider that at position A there might be a disturbance making the robot move down one extra position.

[Figure: at position A, each decision (up, straight, down) has two possible outcomes, depending on whether the disturbance occurs; the corresponding one-stage move costs are 1, √2, and √5.]
If we knew the future disturbance value, we would pick ‘up’ if ‘disturbance’, ‘straight’ if ‘no disturbance’. Thus, if we assume nothing about the disturbances there is no optimal decision at position A.
Cost-to-go at position A:

    decision      no disturbance   disturbance
    'up'          11.4 + √2        7.83 + 1
    'straight'    7.83 + 1         12.2 + √2
    'down'        12.2 + √2        11.8 + √5
There are two assumptions that make optimal decisions well-defined:
- Stochastic: assume a probabilistic description of the disturbances and define optimal policies as the ones that minimize the expected cost. The dynamic programming framework can be extended to provide this policy.
- Worst-case: assume only that the disturbances belong to a given set and minimize the worst-case cost. This setting is tackled in the framework of game theory and will not be addressed in the course.
For the toy robot problem consider the stochastic characterization Prob[disturbance] = 0.2, Prob[no disturbance] = 0.8. The expected cost-to-go at position A is then:

    decision      no disturbance   disturbance   expected cost
    'up'          11.4 + √2        7.83 + 1      (11.4 + √2)0.8 + (7.83 + 1)0.2 = 12.0174
    'straight'    7.83 + 1         12.2 + √2     (7.83 + 1)0.8 + (12.2 + √2)0.2 = 9.79
    'down'        12.2 + √2        11.8 + √5     (12.2 + √2)0.8 + (11.8 + √5)0.2 = 13.69

The optimal decision is now well-defined: pick 'straight'.
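A quick check of the expected costs in the table (the values 11.4, 7.83, 12.2, 11.8 are the cost-to-go numbers read from the grid):

```python
import math

# Expected one-step-plus-cost-to-go of each decision at position A,
# with Prob[no disturbance] = 0.8 and Prob[disturbance] = 0.2.
s2, s5 = math.sqrt(2), math.sqrt(5)
outcomes = {             # decision: (cost if no disturbance, cost if disturbance)
    "up":       (11.4 + s2, 7.83 + 1),
    "straight": (7.83 + 1, 12.2 + s2),
    "down":     (12.2 + s2, 11.8 + s5),
}
expected = {d: 0.8 * a + 0.2 * b for d, (a, b) in outcomes.items()}
best = min(expected, key=expected.get)
```

Evaluating `expected` reproduces the three numbers in the table (up to rounding), and `best` is 'straight'.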
Alternatively, consider the worst-case cost at position A:

    decision      no disturbance   disturbance   worst-case cost
    'up'          11.4 + √2        7.83 + 1      11.4 + √2
    'straight'    7.83 + 1         12.2 + √2     12.2 + √2
    'down'        12.2 + √2        11.8 + √5     11.8 + √5

The optimal decision is now well-defined: pick 'up'. This is a safe policy (at least we are guaranteed 11.4 + √2).
The dynamic model and the cost now take the form

    x_{k+1} = f_k(x_k, u_k, w_k)

    Σ_{k=0}^{h−1} g_k(x_k, u_k, w_k) + g_h(x_h),

where the state and input live in the same finite spaces defined before (slide 3), and the disturbances belong to a finite set w_k ∈ W_k(i, j) := {1, …, ω_{i,j,k}} when x_k = i, u_k = j, characterized by

    p^ℓ_{k,i,j} := Prob[w_k = ℓ | x_k = i, u_k = j],   ℓ ∈ W_k(i, j).

Note that both the state and the cost are now random variables.
Given a policy π = {µ_0, …, µ_{h−1}}, with u_k = µ_k(x_k), the closed loop satisfies

    x_{k+1} = f_k(x_k, µ_k(x_k), w_k)

and the cost

    J_π(x_0) = E[ Σ_{k=0}^{h−1} g_k(x_k, µ_k(x_k), w_k) + g_h(x_h) ]

is a real number for each initial condition x_0. The goal is to find a policy that minimizes the cost J_π(x_0).
Start with J_h(i) = g_h(i) for every i ∈ X_h and, for each decision stage k ∈ {h − 1, h − 2, …, 0}, starting from the last and moving backwards, compute J_k and µ_k as

    J_k(i) = min_{j ∈ U_k(i)} Σ_{ℓ ∈ W_k(i,j)} p^ℓ_{k,i,j} ( g_k(i, j, ℓ) + J_{k+1}(f_k(i, j, ℓ)) )   (DP eq.)

and µ_k(i) = j, where j is the minimizer in the DP equation. Then {µ_0, …, µ_{h−1}} is an optimal policy.

Each function J_ℓ is now the expected cost-to-go when x_ℓ = i:

    J_ℓ(i) = E[ Σ_{k=ℓ}^{h−1} g_k(x_k, µ_k(x_k), w_k) + g_h(x_h) | x_ℓ = i ]

(the shorthand J_ℓ(i) = E[ Σ_{k=ℓ}^{h−1} g_k(x_k, µ_k(x_k), w_k) + g_h(x_h) ] is typically also used). In particular, J_0(i) is the expected cost for a given initial condition x_0 = i.
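The stochastic DP algorithm differs from the deterministic one only in taking an expectation inside the minimization. A minimal sketch, with the disturbance distribution supplied as a callable `outcomes(k, i, j)` returning (probability, disturbance) pairs (an illustrative encoding, not notation from the slides):

```python
def stochastic_dp(h, states, actions, outcomes, f, g, g_terminal):
    """Backward recursion for a finite stochastic problem.

    outcomes(k, i, j) : iterable of (p, w) pairs, p = Prob[w_k = w | x_k=i, u_k=j]
    f(k, i, j, w)     : next state f_k(i, j, w)
    g(k, i, j, w)     : stage cost g_k(i, j, w)
    """
    J = [dict() for _ in range(h + 1)]
    mu = [dict() for _ in range(h)]
    for i in states[h]:
        J[h][i] = g_terminal(i)
    for k in range(h - 1, -1, -1):
        for i in states[k]:
            best_j, best = None, float("inf")
            for j in actions[k][i]:
                # expected immediate cost plus expected cost-to-go
                exp_cost = sum(p * (g(k, i, j, w) + J[k + 1][f(k, i, j, w)])
                               for p, w in outcomes(k, i, j))
                if exp_cost < best:
                    best, best_j = exp_cost, j
            J[k][i], mu[k][i] = best, best_j
    return J, mu
```

Setting every disturbance set to a single outcome with probability 1 recovers the deterministic algorithm.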
In this setting the dynamic model has the so-called Markov property: the probability that the process moves to a new state depends only on the current state and action; given the current state and action, the next state is conditionally independent of all previous states and actions,

    E[f(x_{k+1}) | x_k, u_k, x_{k−1}, u_{k−1}, …, x_0, u_0] = E[f(x_{k+1}) | x_k, u_k]

for every function f. One can thus view a Markov decision process as a discrete optimization problem with stochastic disturbances.
Consider the deterministic problem introduced before, but now assume that for regular nodes (no obstacles) the evolution of the state is stochastic, for every stage and state: a decision to move straight succeeds with probability 1 − 2p and results in a diagonal move up or down with probability p each; a decision to move up (down) succeeds with probability 1 − 2p and results in a straight move with probability 2p.
Stochastic version:

    J_h(i) = 0,   i ∈ {1, …, n}

and for k ∈ {h − 1, …, 0} the wall and obstacle updates are unchanged:

    J_k(1) = √2 + c + J_{k+1}(2)
    J_k(n) = √2 + c + J_{k+1}(n − 1)
    J_k(i) = 1 + c + J_{k+1}(i)   (obstacle node, i ∈ {2, …, n − 1})

while for a regular node i ∈ {2, …, n − 1}:

    J_k(i) = min{ (1 − 2p)(1 + J_{k+1}(i)) + p(√2 + J_{k+1}(i + 1)) + p(√2 + J_{k+1}(i − 1)),
                  2p(1 + J_{k+1}(i)) + (1 − 2p)(√2 + J_{k+1}(i + 1)),
                  2p(1 + J_{k+1}(i)) + (1 − 2p)(√2 + J_{k+1}(i − 1)) }

Compare with the deterministic recursion computed before:

    J_k(i) = min{ 1 + J_{k+1}(i), √2 + J_{k+1}(i + 1), √2 + J_{k+1}(i − 1) }
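The stochastic recursion for the robot can be sketched along the same lines as the deterministic one. As before, encoding the map as a predicate `obstacle(k, i)` is an illustrative assumption:

```python
import math

def stochastic_robot(n, h, c, p, obstacle):
    """Expected cost-to-go J[k][i]; decisions at regular nodes succeed w.p. 1-2p."""
    s2 = math.sqrt(2.0)
    J = [[0.0] * (n + 1) for _ in range(h + 1)]
    for k in range(h - 1, -1, -1):
        for i in range(1, n + 1):
            if i == 1:                        # walls and obstacles: unchanged
                J[k][i] = s2 + c + J[k + 1][2]
            elif i == n:
                J[k][i] = s2 + c + J[k + 1][n - 1]
            elif obstacle(k, i):
                J[k][i] = 1 + c + J[k + 1][i]
            else:                             # regular node: expected costs
                straight = ((1 - 2 * p) * (1 + J[k + 1][i])
                            + p * (s2 + J[k + 1][i + 1])
                            + p * (s2 + J[k + 1][i - 1]))
                up = 2 * p * (1 + J[k + 1][i]) + (1 - 2 * p) * (s2 + J[k + 1][i + 1])
                down = 2 * p * (1 + J[k + 1][i]) + (1 - 2 * p) * (s2 + J[k + 1][i - 1])
                J[k][i] = min(straight, up, down)
    return J
```

With p = 0 the three expected costs collapse to the deterministic ones, so the function reproduces the earlier table.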
With p = 0.1 and c = 4:

[Figure: grid of expected cost-to-go values for the stochastic robot problem with p = 0.1, c = 4; e.g., the free rows now count down 14.62, 13.51, …, 1.08, 0.00, reflecting the chance of unintended diagonal moves.]
Comparing the stochastic DP solution with the deterministic one, the optimal policy coping with uncertainty is intuitive: go around the obstacles! A drawback of the stochastic approach is the difficulty in obtaining a stochastic characterization of the disturbances, which can limit its use in practical control applications.
One can also compare with an open-loop strategy, in which the sequence of decisions is fixed at the initial state assuming no disturbances (open-loop decisions are applied only at regular nodes; at obstacle and wall nodes the options are unique).
Optimal stopping problems arise in management, finance, games, operational research and other areas requiring decisions in the presence of uncertainty. In these problems, at each stage one must decide either to stop or to continue a given process (possibly at a given cost). In games, the option to stop can sometimes bias the game in favor of one of the players. Continuous-time optimal stopping problems are mathematically much more involved (see, e.g., M. H. A. Davis, Markov Models and Optimization, Chapman and Hall/CRC, 1993).
Suppose two players (A and B) both have zero points at the initial stage and repetitively play two fair games:
- Game 1: a fair coin is tossed; if heads, player A gains 1 point (player B loses 1 point); otherwise A loses 1 point (player B gains 1 point).
- Game 2: a fair coin is tossed; if heads, player A gains 2 points (player B loses 2 points); otherwise A loses 2 points (player B gains 2 points).
If after h stages a player has L or L + 1 points, that player receives L money units from the other player; otherwise the players do not win or lose anything. Can player A gain money in expectation by deciding at each stage which coin to toss? Consider h = 2, L = 3.
Dynamic model: x_{k+1} = f(x_k, u_k, w_k), where x_k is the number of points of player A, u_k ∈ {1, 2} is the chosen coin, and w_k ∈ {−1, 1} (tails/heads), for k ∈ {0, …, h − 1}:

    f(x_k, u_k, w_k) = x_k + 1  if u_k = 1 and w_k = 1
                       x_k + 2  if u_k = 2 and w_k = 1
                       x_k − 1  if u_k = 1 and w_k = −1
                       x_k − 2  if u_k = 2 and w_k = −1
                       x_k      if x_k ∈ {L, L + 1, −L, −(L + 1)}

(in the last case the game has ended, but we wait for the last stage).

Cost: g(x_k, u_k) = 0 for every x_k, u_k, and

    min Σ_{k=0}^{h−1} g(x_k, u_k) + g_h(x_h)   (note that profit = −cost)

with

    g_h(x_h) = −L  if x_h = L or x_h = L + 1
                L  if x_h = −L or x_h = −(L + 1)
                0  otherwise
DP algorithm: start with J_h(x_h) = g_h(x_h), i.e.,

    J_h(x_h) = −L  if x_h = L or x_h = L + 1
                L  if x_h = −L or x_h = −(L + 1)
                0  otherwise

and for k ∈ {h − 1, h − 2, …, 0},

    J_k(x_k) = min_{u_k} E[ g(x_k, u_k) + J_{k+1}(x_{k+1}) | x_k ].

Using the expression for f(x_k, u_k, w_k) and g = 0, we obtain

    J_k(x_k) = J_{k+1}(x_k),  if x_k ∈ {−(L + 1), −L, L, L + 1}
    J_k(x_k) = min{ (J_{k+1}(x_k + 1) + J_{k+1}(x_k − 1))/2,
                    (J_{k+1}(x_k + 2) + J_{k+1}(x_k − 2))/2 },  otherwise.
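The recursion above is simple enough to iterate directly. A sketch with exact rational arithmetic (the function name and the use of `Fraction` are implementation choices for illustration); it yields J_0(0) = −3/4, i.e., an expected profit of 3/4 for player A:

```python
from fractions import Fraction

def coin_game_cost(h=2, L=3):
    """Cost-to-go J_0 for the coin game (default h = 2, L = 3 as in the example)."""
    def g_h(x):                          # terminal cost (profit = -cost for A)
        if x in (L, L + 1):
            return -L
        if x in (-L, -(L + 1)):
            return L
        return 0
    span = range(-(L + 1), L + 2)
    J = {x: Fraction(g_h(x)) for x in span}      # J_h
    half = Fraction(1, 2)
    for _ in range(h):                           # k = h-1, ..., 0
        Jn = {}
        for x in span:
            if abs(x) >= L:                      # game already decided: absorb
                Jn[x] = J[x]
            else:                                # choose the better coin
                coin1 = half * (J[x + 1] + J[x - 1])
                coin2 = half * (J[x + 2] + J[x - 2])
                Jn[x] = min(coin1, coin2)
        J = Jn
    return J
```

Using `Fraction` keeps the halves exact, so the optimal value comes out as −3/4 rather than a float approximation.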
Iterating this equation we get the costs-to-go and optimal decisions indicated in the diagram. For the zero initial condition the average cost is −3/4 and so the profit is 3/4.

[Figure: decision tree over stages k = 0, 1, 2 with decisions u_0, u_1; circles correspond to states and are labeled with the number of points at a given stage, with the cost-to-go on top and the chosen coin (1 or 2) next to them.]
Juliet is taking her boyfriend Romeo to watch a movie in a crowded local cinema. They are late and need to find a parking space along a long avenue.
They can park k places away from the cinema, for k ∈ {1, …, h}; space k is free with probability p_k = 1 − α^k, for some 0 < α < 1, and walking from space k to the cinema takes ck minutes, for a given c. If they reach the end of the avenue without parking, they must use the garage at the cinema, which typically takes C minutes. When should they park (stop searching) to minimize the expected delay?

[Figure: stages 0 through h; at each stage the state is F (the current space is free), O (occupied), or S (already parked), with terminal state E; the decisions are P (park) and DP (don't park). Parking k places away costs ck (c, 2c, 3c, …) and the garage costs C.]
With p_k = 1 − α^k and q_k = α^k:

Stage h: J_h(E) = 0.

Stage h − 1: J_{h−1}(O) = C,  J_{h−1}(F) = min{C, c}.

Stage h − 2:

    J_{h−2}(O) = p_1 J_{h−1}(F) + q_1 J_{h−1}(O) = p_1 min{C, c} + q_1 C
    J_{h−2}(F) = min{p_1 J_{h−1}(F) + q_1 J_{h−1}(O), 2c} = min{p_1 min{C, c} + q_1 C, 2c}

…

Stage h − (k + 1):

    J_{h−(k+1)}(O) = p_k J_{h−k}(F) + q_k J_{h−k}(O)
    J_{h−(k+1)}(F) = min{p_k J_{h−k}(F) + q_k J_{h−k}(O), c(k + 1)}
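The stage-by-stage recursion above can be iterated in a few lines. A sketch (the return layout, a dict keyed by the distance k from the cinema, is an illustrative choice):

```python
def parking(h, C, c, alpha):
    """Return {k: (J_F, J_O, park_if_free)} for k places from the cinema."""
    res = {}
    JF, JO = min(C, c), C                      # stage h-1: one place away
    res[1] = (JF, JO, c <= C)
    for k in range(1, h):                      # build stage h-(k+1): k+1 places away
        p = 1 - alpha ** k                     # next space free w.p. p_k
        cont = p * JF + (1 - p) * JO           # expected cost of driving on
        JO = cont
        JF = min(cont, c * (k + 1))            # if free: park now or continue
        res[k + 1] = (JF, JO, c * (k + 1) <= cont)
    return res
```

Running it with the slide's numbers (h = 4, C = 3, c = 1, α = 0.8) reproduces the values 1, 3, 2, 2.6 and 2.384, and the rule "park when one or two places away".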
For h = 4, C = 3, c = 1, α = 0.8, iterating the recursion gives J(F) = 1, J(O) = 3 one place away; J(F) = 2, J(O) = 2.6 two places away; and J(F) = J(O) = 2.384 three and four places away. Simple optimal rule: park at the first free space when one or two parking places away from the cinema.

[Figure: state diagram over stages 0 through 4 with the cost-to-go values and the optimal decisions (P / DP) indicated.]
Summary

Closed-loop policies can improve (in terms of expected cost) on open-loop policies, and even on the closed-loop policy obtained from the deterministic problem. After this lecture, you should be able to formulate discrete optimization problems with stochastic disturbances and solve them with the stochastic dynamic programming algorithm.
Appendix A: Stochastic dynamic programming for the inventory control example

Applying the dynamic programming algorithm to the inventory control example (see Lecture 1) with uncertain demand at stage 2, Prob[d_2 = 0] = 0.5, Prob[d_2 = 1] = 0.5, we obtain the costs-to-go and decisions shown in the diagram.

[Figure: trellis over stages 0 through 4 with states 1-4, the decisions u_0, …, u_3, and costs-to-go such as −45.45, −35.9, −25.5 at stage 0.]

The costs-to-go and decisions at the later stages match the ones obtained before in the deterministic case. At stage 2 we apply the dynamic programming algorithm considering the expected costs-to-go:

    0.5(0.4 − 14.4) + 0.5(−9.6 − 9.8) = −16.7
    0.5(0.9 − 19) + 0.5(10.9 − 23.6) = −15.4
    0.5(5.9 − 19) + 0.5(4.1 − 14.4) = −15.8

This shows that for state 0, u_2 = 2 (expected cost-to-go −16.7) is better than u_2 = 1, and it is actually the optimal decision.
Another stopping time problem: asset selling

When should one accept an offer for an asset, e.g., a house, in order to maximize the terminal expected revenue?*

Assumptions: offers w_0, w_1, …, w_{N−1} arrive at stages 0 through N − 1, with w_k ∈ {a_1, a_2, …, a_M} and Prob[w_k = a_i] = p_i; accepted money can be invested at interest rate r; the asset must be sold by stage N, so the last offer w_{N−1} is accepted if the asset is still unsold.

*see Bertsekas' book, sec. 4.4

Let x_k for k ≥ 1 equal the most recent offer w_{k−1}, and pick a terminal state x_k = T to denote that the house has already been sold. Let also u_k = 1 if one decides to accept offer w_{k−1}, k ≥ 1, and x_0 := 0 (zero initial offer). Then:

    x_{k+1} = T    if x_k = T, or if x_k ≠ T and u_k = 1 (sell)
              w_k  otherwise

and the expected selling price is

    E_{w_k}[ g_N(x_N) + Σ_{k=0}^{N−1} g_k(x_k, u_k, w_k) ]

where

    g_N(x_N) = x_N if x_N ≠ T,  0 if x_N = T
    g_k(x_k, u_k, w_k) = (1 + r)^{N−k} x_k  if x_k ≠ T and u_k = 1 (sell),  0 otherwise.
Applying the dynamic programming algorithm we conclude that

    J_N(x_N) = x_N if x_N ≠ T,  0 if x_N = T

    J_k(x_k) = max[ (1 + r)^{N−k} x_k, E[J_{k+1}(w_k)] ]  if x_k ≠ T
               0                                          if x_k = T

Policy: accept the offer if x_k > α_k and reject it if x_k ≤ α_k, where

    α_k = E[J_{k+1}(w_k)] / (1 + r)^{N−k} = Σ_{i=1}^{M} p_i J_{k+1}(a_i) / (1 + r)^{N−k}.
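The thresholds α_k can be computed backwards from the offer distribution alone, since J_{k+1} only needs to be evaluated at the possible offers a_i. A sketch (the function name and argument layout are illustrative):

```python
def asset_thresholds(offers, probs, r, N):
    """Thresholds alpha_k: accept offer x_k iff x_k > alpha_k."""
    # E[J_N(w_{N-1})]: at the last stage every offer is worth its face value
    EJ = sum(p * a for a, p in zip(offers, probs))
    alphas = {}
    for k in range(N - 1, -1, -1):
        alphas[k] = EJ / (1 + r) ** (N - k)
        # E[J_k(w_{k-1})]: for each possible offer a, J_k(a) = max[(1+r)^{N-k} a, EJ]
        EJ = sum(p * max((1 + r) ** (N - k) * a, EJ)
                 for a, p in zip(offers, probs))
    return alphas
```

For r = 0 the thresholds are nonincreasing in k: with more stages remaining one can afford to be pickier, and as the deadline approaches the acceptable offer drops toward the mean.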