SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Motivation

• Last two lectures we discussed two alternative methods to DP for stage decision problems: discretization and static optimization.

• Discretization allows us to obtain an optimal policy (approximately) when the dimension of the problem (state and input dimension) is small, and static optimization allows us to obtain an optimal path when the problem is convex:

   dimension / convexity | DP (discretization) | Static optimization
   small, convex         | applicable (n ≤ 3)  | applicable
   small, non-convex     | applicable (n ≤ 3)  | —
   large, convex         | —                   | applicable
   large, non-convex     | —                   | —

In this lecture:

• we show how to obtain optimal policies using static optimization and certainty equivalent control (lec 3, slide 33), applying this to solve linear quadratic control with input constraints.

• we discuss related approximate dynamic programming techniques such as MPC and rollout, applying them to a non-convex, large-dimensional problem (control of switched linear systems).

SLIDE 3

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 4

Challenge

Iterating the dynamic programming algorithm for stage decision problems is hard since it is hard to compute the costs-to-go (see e.g. slide 11 of lecture 5):

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k))

For each x_k it is hard to minimize the function g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) over u_k, and it is hard to obtain an expression for the cost-to-go J_{k+1}, which is typically unknown (known only for the terminal stage).

• Moreover:
  • discretization is only possible if the state and input spaces are not large.
  • static optimization only assures optimality for convex problems and only allows us to compute optimal paths.
• How can we obtain a (sub)optimal policy?
SLIDE 5

Idea I - Certainty equivalent control

Compute the optimal path online starting from the current state x_k and apply the first decision. Repeat this procedure at every stage, for the current state (x_{k+1} at stage k + 1).

[Figure: optimization over the remaining horizon, from stage k to h, repeated from stage k + 1.]

SLIDE 6

Idea II - MPC

Similar to Idea I but compute decisions only in a (short) horizon H: compute the optimal path over the horizon starting from the current state x_k and apply the first decision. Repeat this procedure in a receding/rolling horizon way, for the current state x_{k+1}.

• The optimization problem to solve at each stage is then much simpler and makes the algorithm feasible to run online. This is the fundamental idea of Model Predictive Control (MPC).

• Note that this is (in general) not optimal, not even for problems without disturbances.

[Figure: optimization over the window from stage k to k + H, shifted by one stage at stage k + 1.]

SLIDE 7

Idea III - Rollout

Similar to Ideas I and II but after the optimization horizon use a base policy: compute the optimal path over the horizon starting from the current state x_k (assuming the base policy is applied afterwards) and apply the first decision of the optimization procedure. Repeat this procedure in a receding/rolling horizon way, for the current state x_{k+1}.

• The optimization problem to solve at each stage is then much simpler and makes the algorithm feasible to run online. This is the fundamental idea of Rollout.

• Note that this is (in general) not optimal, not even for problems without disturbances.

[Figure: optimization over the window from stage k to k + H, with the base policy used from k + H to h.]

SLIDE 8

Approximate dynamic programming

Approximate the cost-to-go and use the solution of the following equation as the control input (in general different from the optimal control input!):

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(f_k(x_k, u_k))

Here J̃_{k+1} approximates the cost-to-go J_{k+1}, which is typically unknown, whereas J̃_{k+1} is known by construction. For each state x_k we just need to minimize the function g_k(x_k, u_k) + J̃_{k+1}(f_k(x_k, u_k)) over u_k and compute one action u*_k. More general than the previous ideas! A minimal sketch of this one-step minimization is given below.
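As a concrete illustration, a minimal Matlab sketch of one ADP step for a scalar input chosen from a finite grid; the handles g, f, Jtilde and the grid ugrid are illustrative assumptions, not part of the lecture code:

function ustar = adp_step(g, f, Jtilde, xk, ugrid)
% One ADP step: minimize g(x,u) + Jtilde(f(x,u)) over a grid of candidate inputs
vals = arrayfun(@(u) g(xk, u) + Jtilde(f(xk, u)), ugrid);
[~, imin] = min(vals);
ustar = ugrid(imin);   % (sub)optimal action, computed for the current state only
end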

SLIDE 9

Discussion

• Ideas I, II, III can be seen as special cases of approximate dynamic programming.

• Idea I - for problems with disturbances, the true cost (requiring the computation of expected values) is approximated by a deterministic cost. For problems without disturbances, Idea I yields the same policy as DP.

• Idea II - the approximation is the deterministic cost over a short horizon. This is an approximation even for problems without disturbances.

• Idea III - the approximation is the cost of the base policy when H = 1, and it is the cost over a short horizon plus the cost of the base policy if the horizon is larger than one. Again this is an approximation even for problems without disturbances.

• Although we mainly consider finite horizon costs, the ideas extend naturally to infinite horizon costs.

We formalize these statements next.

SLIDE 10

Certainty equivalent control

Certainty equivalent control is the following implicit policy for a stage decision problem:

1. At each stage k (initially k = 0), assuming that the full state x_k is measured, solve the following problem (for initial condition x_k), assuming no disturbances (w_k = 0, w_{k+1} = 0, ...):

      min sum_{ℓ=k}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)
      s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, 0),  ℓ ∈ {k, k+1, ..., h−1}

   and obtain estimates of the optimal control inputs u_k, u_{k+1}, ..., u_{h−1}.

2. Take the first decision u_k and apply it to the process x_{k+1} = f_k(x_k, u_k, w_k) (the system will evolve to a state x_{k+1} in general different from the predicted one due to disturbances). Repeat (go to 1) for k = 1, 2, ....

SLIDE 11

Certainty equivalent control

For problems with no disturbances, this is equivalent to

   min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})
   s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, 0),  ℓ ∈ {k, k+1, ..., h−1}   (1)

where

   J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{h−1}} sum_{ℓ=k+1}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)   s.t. (1), initial condition x_{k+1}

and we select the first u_k and apply it. Note that, for problems without disturbances, the J̃_k are the optimal costs-to-go and satisfy the DP equation

   J̃_k(x_k) = min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1}).

For problems with stochastic disturbances x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ, w_ℓ), ℓ ∈ {k, k+1, ..., h−1}, the J̃_k are approximations of the optimal costs-to-go

   J_k(x_k) = min_{u_k} g_k(x_k, u_k) + E[J_{k+1}(x_{k+1})],

with E[J_{k+1}(x_{k+1})] approximated by J̃_{k+1}(x_{k+1}).

SLIDE 12

Model Predictive Control

• At each time k consider the optimal control problem only in a horizon H:

      min_{u_k, u_{k+1}, ..., u_{k+H−1}} sum_{ℓ=k}^{k+H−1} g_ℓ(x_ℓ, u_ℓ)
      s.t. x_{ℓ+1} = f_ℓ(x_ℓ, u_ℓ),  ℓ ∈ {k, k+1, ..., k+H−1}, initial condition x_k   (1)

  and select the first control input u_k resulting from this optimization.

• Equivalent to

      min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})   s.t. (1)

  where

      J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{k+H−1}} sum_{ℓ=k+1}^{k+H−1} g_ℓ(x_ℓ, u_ℓ)   s.t. (1), ℓ ∈ {k+1, k+2, ..., k+H−1}.

SLIDE 13

Model Predictive Control

• Note that at the last decision stages (k > h − H) the cost function is slightly different for finite horizon problems, including also the terminal cost:

      min_{u_k, ..., u_{h−1}} sum_{ℓ=k}^{h−1} g_ℓ(x_ℓ, u_ℓ) + g_h(x_h)   s.t. (1), ℓ ∈ {k, k+1, ..., h−1}.

• There are several variants of Model Predictive Control and in particular some variants use a terminal constraint (this is useful to prove stability). For example, impose that the state after the horizon must be zero, x_{k+H} = 0 (for finite horizon problems at the last stages, x_h = 0).

SLIDE 14

Rollout

• At each time k consider the optimal control problem in a horizon H, assuming that after the horizon a base policy u_ℓ = μ̄_ℓ(x_ℓ) is used:

      min_{u_k, u_{k+1}, ..., u_{k+H−1}} sum_{ℓ=k}^{k+H−1} g_ℓ(x_ℓ, u_ℓ) + sum_{ℓ=k+H}^{h−1} g_ℓ(x_ℓ, μ̄_ℓ(x_ℓ)) + g_h(x_h)
      s.t. (1), ℓ ∈ {k, k+1, ..., k+H−1}, initial condition x_k

  and select the first control input u_k resulting from this optimization. Similar to MPC, but with a base policy used after the horizon.

• Equivalent to

      min_{u_k} g_k(x_k, u_k) + J̃_{k+1}(x_{k+1})   s.t. (1)

  where

      J̃_{k+1}(x_{k+1}) = min_{u_{k+1},...,u_{k+H−1}} sum_{ℓ=k+1}^{k+H−1} g_ℓ(x_ℓ, u_ℓ) + sum_{ℓ=k+H}^{h−1} g_ℓ(x_ℓ, μ̄_ℓ(x_ℓ)) + g_h(x_h)   s.t. (1), ℓ ∈ {k+1, k+2, ..., k+H−1}.

SLIDE 15

Further remarks on ADP

• The quality of an approximation is measured by how "good" u*_k is, that is, how close it is to the optimal input, and not by how "good" the approximation J̃_{k+1}(x_{k+1}) of J_{k+1}(x_{k+1}) is.

• Decisions u*_k only need to be computed (in real time) for the value of the present state (we do not need to iterate the cost-to-go).

• There are several variants to approximate the costs-to-go.

• Due to the heuristic nature of the approximation, it is very hard to quantify when a specific approximation method is good and to establish formal results.

• For example, we have seen that the optimal policy for the infinite horizon linear quadratic regulator problem makes the closed loop of a linear system stable. This is typically very hard to establish for approximate methods.

SLIDE 16

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 17

Problem formulation

Dynamic model: x_{k+1} = A x_k + B u_k (+ w_k for possible disturbances), with u_k ∈ R (for simplicity).

Input constraints: u_k ∈ [−c, c] for all k (given c > 0).

Cost function: sum_{k=0}^{h−1} x_k' Q x_k + u_k' R u_k + x_h' Q_h x_h.

Find a policy π = {μ_0, ..., μ_{h−1}}, u_k = μ_k(x_k), which minimizes the cost function. Let us focus first on obtaining an optimal path; then we will find a control policy with certainty equivalence and MPC.

SLIDE 18

Quadratic programming formulation

Define x̄ = [x_1' x_2' ... x_h']' and ū = [u_0' u_1' ... u_{h−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_0 + B̄ ū, with

   Ā = [A; A^2; ...; A^h],
   B̄ = [B 0 ... 0; AB B ... 0; A^2B AB B ...; ...; A^{h−1}B A^{h−2}B ... AB B]   (block lower triangular).

Cost function: (Ā x_0 + B̄ ū)' Q̄ (Ā x_0 + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q, Q_h) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

This is a standard quadratic programming problem (see, e.g., Convex Optimization, Stephen Boyd and Lieven Vandenberghe), implemented in Matlab with quadprog.m. Convex problem! This already allows us to obtain an optimal path; a small worked instance is sketched below.

SLIDE 19

Example

Double integrator (model: slide 20, Lec II_1). Parameters: x_0 = [1 0]', τ = 0.2, Q = Q_h = I, R = 1, h = 50; constraints u_k ∈ [−L, L], L = 0.25; state x = [y v]' (position and velocity).

Optimal path with no constraints (slide 30, Lec 5): run the matlab code on slides 25-28 with parameters optpolicy=1, fdisturbances=0.

Optimal path with constraints: run with parameters optpolicy=2, fdisturbances=0 (or optpolicy=3, fdisturbances=0!).

[Plots: position y(t), velocity v(t) and input u(t) versus time, for the unconstrained and constrained cases.]

SLIDE 20

Example

What if there are disturbances? The resulting behaviour is far from the desired one!

*Run the matlab code on slides 25-28 with parameters optpolicy=2, fdisturbances=1.*

We want an optimal policy! CEC! (and MPC)

[Plots: y(t), v(t) and u(t) when the precomputed open-loop path is applied under disturbances.]

SLIDE 21

Certainty equivalent control

At each stage k define x̄ = [x_{k+1}' x_{k+2}' ... x_h']' and ū = [u_k' u_{k+1}' ... u_{h−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_k + B̄ ū, with

   Ā = [A; A^2; ...; A^{h−k}],
   B̄ = [B 0 ... 0; AB B ... 0; ...; A^{h−1−k}B A^{h−2−k}B ... AB B].

Cost function: sum_{ℓ=k}^{h−1} x_ℓ' Q x_ℓ + u_ℓ' R u_ℓ + x_h' Q_h x_h = (Ā x_k + B̄ ū)' Q̄ (Ā x_k + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q, Q_h) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

Take the first component u_k of ū and apply it to the plant.

SLIDE 22

Example

Considering disturbances affecting the system, x_{k+1} = A x_k + B u_k + w_k (see the code on slide 25 for details on the disturbances), and applying certainty equivalent control we obtain:

*Run the matlab code on slides 25-28 with parameters optpolicy=3, fdisturbances=1.*

[Plots: y(t), v(t) and u(t) under certainty equivalent control with disturbances.]

SLIDE 23

Discussion

• Although quadprog.m is quite efficient, in problems with a high sampling rate or a long time horizon of interest it might not be possible to run quadprog.m online, and this motivates MPC.

• Another motivation for MPC is that the model is typically not extremely accurate, and considering a short horizon might even lead to better results.

• We shall try MPC imposing that the final state after the prediction horizon is zero. See Bertsekas' book, Ch. 6, for the close connection with Rollout (after the horizon the base policy is to apply zero control input!).

• CEC allowed us to transform an algorithm that computes optimal paths (using quadprog.m) into an algorithm that computes optimal policies.

• The fact that it works so well follows from the fact that the problem is convex: a problem with inequality constraints is convex if the cost is convex and the set of feasible solutions is convex, which can be shown to be the case for this problem.

SLIDE 24

MPC

At each stage k define x̄ = [x_{k+1}' x_{k+2}' ... x_{k+H−1}']' and ū = [u_k' u_{k+1}' ... u_{k+H−1}']'.

Dynamic model (equality constraints): x̄ = Ā x_k + B̄ ū, with

   Ā = [A; A^2; ...; A^{H−1}],
   B̄ = [B 0 ... 0; AB B ... 0; A^2B AB B ...; ...; A^{H−2}B A^{H−3}B ... B].

Cost function: sum_{ℓ=k}^{k+H−1} x_ℓ' Q x_ℓ + u_ℓ' R u_ℓ = (Ā x_k + B̄ ū)' Q̄ (Ā x_k + B̄ ū) + ū' R̄ ū, with Q̄ = blkdiag(Q, ..., Q) and R̄ = blkdiag(R, ..., R).

Inequality constraints: [I; −I] ū ≤ c [1; 1; ...; 1].

Take the first component u_k of ū and apply it to the plant.

SLIDE 25

Example

First iteration (no disturbances). The plots show the state and input predicted at this iteration. Applying the first control input u_0 = −0.25 leads to the state x_1 = [0.995 −0.05]', and the process is repeated.

*Run the matlab code on slides 25-28 with parameters optpolicy=4, fdisturbances=0.*

[Plots: predicted y(t), v(t) and u(t) at the first iteration.]

SLIDE 26

Example

Note that the end result is different from the one predicted at iteration 0, because the state and input predictions change over time.

*Run the matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=0.*

[Plots: end result, y(t), v(t) and u(t).]

SLIDE 27

Example

Results with disturbances.

*Run the matlab code on slides 25-28 with parameters optpolicy=5, fdisturbances=1.*

[Plots: y(t), v(t) and u(t) under MPC with disturbances.]

SLIDE 28

Matlab code

% Model definition
optpolicy = 1;
fdisturbances = 0;
% double integrator model
Ac = [0 1; 0 0]; Bc = [0; 1]; tau = 0.2;
Q = [1 0; 0 1]; S = [0; 0]; R = 0.01; QT = [1 0; 0 1];
n = 2; m = 1; x0 = [1; 0]; c = 0.25;
sigmaw = 0.01;  % disturbance level
H = 25;         % prediction horizon for mpc
h = 50;         % simulation horizon
sysd = c2d(ss(Ac,Bc,zeros(1,n),0),tau);
A = sysd.a; B = sysd.b; C = sysd.c;
% Preliminaries for the policies
switch optpolicy
    case 1  % finite-horizon LQR gains via Riccati recursion
        P{h+1} = QT;
        for k = h:-1:1
            P{k} = A'*P{k+1}*A + Q - ...
                (S+A'*P{k+1}*B)*pinv(R+B'*P{k+1}*B)*(S'+B'*P{k+1}*A);
            K{k} = -pinv(R+B'*P{k+1}*B)*(S'+B'*P{k+1}*A);
        end
    case 2  % CEC without disturbances: one QP over the full horizon
        U2 = quadconstrainedcontrol(A,B,Q,R,c,h,x0,1);
    case 4  % MPC without disturbances: one QP over horizon H
        U3 = [quadconstrainedcontrol(A,B,Q,R,c,H,x0,2); zeros(h-H,1)];
end

SLIDE 29

Matlab code

% Simulate system
t = tau*(0:1:h);
x = zeros(n,h); u = zeros(1,h);
x(:,1) = x0;
randn('seed',15); % used for the results in the slides
if fdisturbances == 1
    gainnodisturbances = 1;
else
    gainnodisturbances = 0;
end
cost = 0;
for k = 1:h
    switch optpolicy
        case 1 % LQR
            u(:,k) = K{k}*x(:,k);
        case 2 % CEC no disturbances
            u(:,k) = U2(k);
        case 3 % CEC: re-solve the QP from the current state
            [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1);
            u(:,k) = u_(1);
        case 4 % MPC no disturbances
            u(:,k) = U3(k);
        case 5 % MPC
            if k <= h-H-1
                [u_] = quadconstrainedcontrol(A,B,Q,R,c,H,x(:,k),2);
                u(:,k) = u_(1);
            else % near the end: shrink the horizon, drop the terminal constraint
                [u_] = quadconstrainedcontrol(A,B,Q,R,c,h+1-k,x(:,k),1);
                u(:,k) = u_(1);
            end
    end
    w(:,k) = B*sigmaw*randn(1)*gainnodisturbances;
    x(:,k+1) = A*x(:,k) + B*u(:,k) + w(:,k);
    cost = cost + x(:,k)'*Q*x(:,k) + u(:,k)'*R*u(:,k);
end
v = x(1,:);

SLIDE 30

Matlab code

% Continuous-time simulation and plots
N = 1000; ts = tau/N; nl = h*N;
uc = kron(u,ones(1,N)); % zero-order hold of the discrete input
sysd = c2d(ss(Ac,Bc,zeros(1,size(Ac,1)),zeros(1,size(Bc,2))),ts);
Ad = sysd.a; Bd = sysd.b;
xc = zeros(n,nl+1); xc(:,1) = x(:,1);
for k = 1:h
    xc(:,(k-1)*N+1) = x(:,k);
    for l = 1:N
        xc(:,(k-1)*N+l+1) = Ad*xc(:,(k-1)*N+l) + Bd*u(:,k);
    end
end
tc = ts*(0:nl);
figure(1)
subplot(1,3,1)
plot(tc,xc(1,:))
set(gca,'Fontsize',18)
xlabel('t'); ylabel('y(t)'); grid on
axis([0 h*tau -0.2 1])
subplot(1,3,2)
plot(tc,xc(2,:))
set(gca,'Fontsize',18)
xlabel('t'); ylabel('v(t)'); grid on
subplot(1,3,3)
plot(tc(1:end-1),uc)
set(gca,'Fontsize',18)
xlabel('t'); ylabel('u(t)'); grid on

SLIDE 31

Matlab code

% Key function
function [u] = quadconstrainedcontrol(A,B,Q,R,L,h,x0,opt)
% opt = 1: CEC, opt = 2: MPC (with terminal constraint x_h = 0)
n = size(A,1); m = size(B,2);
% define the stacked matrices Abar, Bbar, Qbar, Rbar
Abar = zeros(n*h,n); Bbar = zeros(n*h,m*h);
Qbar = zeros(n*h,n*h); Rbar = zeros(m*h,m*h);
for i = 1:h
    Abar(1+(i-1)*n:n*i,:) = A^i;
    for j = 1:i
        Bbar(1+(i-1)*n:n*i,1+(j-1)*m:m*j) = A^(i-j)*B;
    end
    Qbar(1+(i-1)*n:n*i,1+(i-1)*n:n*i) = Q;
    Rbar(1+(i-1)*m:m*i,1+(i-1)*m:m*i) = R;
end
switch opt
    case 1 % CEC: input bounds only
        u = quadprog(Rbar+Bbar'*Qbar*Bbar,((Bbar'*Qbar*Abar)*x0)', ...
            [eye(m*h);-eye(m*h)],[L*ones(m*h,1);L*ones(m*h,1)],[],[]);
    case 2 % MPC: input bounds plus terminal equality constraint
        u = quadprog(Rbar+Bbar'*Qbar*Bbar,((Bbar'*Qbar*Abar)*x0)', ...
            [eye(m*h);-eye(m*h)],[L*ones(m*h,1);L*ones(m*h,1)], ...
            Bbar(end-n+1:end,:),-A^h*x0);
end
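A short usage example (values taken from the model definition on slide 25): computing the constrained optimal input sequence over the full horizon and applying its first element:

% CEC call over the full horizon h = 50 from x0 = [1;0]
sysd = c2d(ss([0 1;0 0],[0;1],zeros(1,2),0),0.2);
U = quadconstrainedcontrol(sysd.a, sysd.b, eye(2), 0.01, 0.25, 50, [1;0], 1);
u0 = U(1); % first decision, applied to the plant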

SLIDE 32

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
SLIDE 33

Switched linear systems

Dynamic model: x_{k+1} = A_{σ_k} x_k, where the control input is the switching signal u_k = σ_k ∈ {1, ..., m} and x_k ∈ R^n.

Cost function: sum_{k=0}^{h−1} x_k' Q_{σ_k} x_k + x_h' Q̄_h x_h.

Goal: find an optimal path (or policy) for the control input for a given initial condition x_0.

• Applying the dynamic programming algorithm would lead to a combinatorial problem.

• Let us apply approximate dynamic programming.

SLIDE 34

CEC

1. At step k compute all the possible control sequences (σ_k, σ_{k+1}, σ_{k+2}, ..., σ_{h−1}).

2. Compute the cost sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, σ̃_{k+2}, ..., σ̃_{h−1}) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

Still combinatorial! A brute-force sketch of one such step is given below.

SLIDE 35

MPC

1. At step k compute all possible control sequences in the prediction horizon: (σ_k, σ_{k+1}, ..., σ_{k+H−1}) if 0 ≤ k ≤ h − H, and (σ_k, σ_{k+1}, ..., σ_{h−1}) otherwise.

2. Compute the cost sum_{ℓ=k}^{k+H−1} x_ℓ' Q_{σ_ℓ} x_ℓ if 0 ≤ k ≤ h − H, and sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h otherwise, for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, ...) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

SLIDE 36

Rollout

Pick a base policy (σ̄_0, σ̄_1, ..., σ̄_{h−1}) and start with k = 0, H > 0.

1. At step k compute all the control sequences with σ_k, ..., σ_{k+H−1} free, taking the form (σ_k, ..., σ_{k+H−1}, σ̄_0, σ̄_1, ..., σ̄_{h−1−k−H}) if 0 ≤ k ≤ h − H, and (σ_k, σ_{k+1}, ..., σ_{h−1}) otherwise.

2. Compute the cost sum_{ℓ=k}^{h−1} x_ℓ' Q_{σ_ℓ} x_ℓ + x_h' Q̄_h x_h for each of these control sequences, and pick a sequence (σ̃_k, σ̃_{k+1}, ..., σ̃_{h−1}) with minimum cost.

3. Apply the first element σ_k = σ̃_k, set k = k + 1 and go again to step 1 if k ≠ h.

A sketch of one rollout step follows.

SLIDE 37

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 38

Mixing

How to mix two fluids in minimum time*?

[Schematic: camera images feed the control law σ_k ∈ {1, ..., 4}, which drives the actuators.] Actuation: 4 possible rotations, decided once every h seconds.

*Mixing and switching, V.S. Dolk MSc thesis

SLIDE 39

Periodic open loop policy

The basic mechanism for effective mixing is repetitive stretching & folding. This basic mechanism can be seen as a periodic open-loop strategy. Can a closed-loop strategy perform better?

SLIDE 40

Model

Divide the domain into N subdomains and denote by x^i_k, i ∈ {1, ..., N}, the concentration of fluid in subdomain i at time kτ. Then the following equation describes how the concentrations evolve as a function of the control action:

   x_{k+1} = A_{σ_k} x_k,  σ_k ∈ {1, ..., 4}.

SLIDE 41

Cost function

• The concentrations of the subdomains will all be equal to the given constant c = (1/N)[1 1 ... 1] x_0 when homogeneity is achieved. This means that the following error will converge to zero:

   e_k := x_k − c [1; 1; ...; 1].

• The intensity of segregation is defined as e_k' e_k and measures how far the concentration is from homogeneity.

• We are interested in minimizing the intensity of segregation for every time step: sum_{k=0}^{h} e_k' e_k.
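In Matlab this quantity is a one-liner; a sketch, assuming x0 and xk are the length-N concentration vectors:

% Intensity of segregation at stage k
c  = mean(x0);          % homogeneous concentration, c = (1/N)*ones(1,N)*x0
ek = xk - c*ones(N,1);  % error with respect to homogeneity
Ik = ek'*ek;            % intensity of segregation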

SLIDE 42

Results

• The periodic sequence consists of alternating a rotation of the outer cylinder and a clockwise rotation of the inner cylinder.

• The rollout strategy consists of optimizing over a horizon of length 4, assuming that afterwards this periodic policy is used.

[Figures: mixing patterns obtained with the periodic and rollout strategies.]

SLIDE 43

Results

• An MPC approach with horizon H = 2 boils down to picking the decision minimizing the (squared) norm of the error at the next iteration (minimum error "greedy" strategy):

   argmin_{σ_k} e_k' e_k + e_{k+1}' e_{k+1} = argmin_{σ_k} e_k' e_k + e_k' A_{σ_k}' A_{σ_k} e_k = argmin_{σ_k} e_k' A_{σ_k}' A_{σ_k} e_k,

  using e_{k+1}' e_{k+1} = e_k' A_{σ_k}' A_{σ_k} e_k.

• In this case MPC with H = 2 leads to worse results.
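As a sketch, this greedy step is a single line in Matlab; Asig again denotes an assumed cell array with the four mode matrices and ek the current error vector:

% Greedy (H = 2) MPC step for the mixing problem
[~, sigma_k] = min(arrayfun(@(j) ek'*(Asig{j}'*Asig{j})*ek, 1:4));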

SLIDE 44

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 45

Introduction

• We show next that for switched linear systems x_{k+1} = A_{σ_k} x_k we can write explicit rollout and MPC policies.

• Although finite-horizon costs can be considered, this is actually easier if we consider infinite-horizon costs sum_{k=0}^{∞} x_k' Q_{σ_k} x_k.

• Note that certainty equivalent control in the case of infinite horizon switched systems problems would entail computing an infinite number of options, and is therefore not viable.

SLIDE 46

Cost of base policy

• Suppose that the base policy corresponds to a constant switching σ_k = j for all k, so that x_{k+1} = A_j x_k. Then it is easy to show that

   sum_{k=ℓ}^{∞} x_k' Q_j x_k = x_ℓ' M_j x_ℓ,

  where M_j = sum_{k=0}^{∞} (A_j')^k Q_j (A_j)^k is the solution to the linear system A_j' M_j A_j − M_j = −Q_j.

• If the base policy corresponds to periodic switching, a similar expression can be obtained. For example, if m = 2, σ_k ∈ {1, 2} and (σ_ℓ, σ_{ℓ+1}, σ_{ℓ+2}, σ_{ℓ+3}, σ_{ℓ+4}, ...) = (1, 2, 1, 2, 1, ...), then

   sum_{k=ℓ}^{∞} x_k' Q_{σ_k} x_k = x_ℓ' M̄_1 x_ℓ,

  where A_1' M̄_2 A_1 − M̄_1 = −Q_1 and A_2' M̄_1 A_2 − M̄_2 = −Q_2.

SLIDE 47

Explicit rollout policy

Note that in the prediction horizon there are m^H options to be considered at each iteration. Suppose that we index all these options by i ∈ {1, 2, 3, ..., m^H}. Then each i corresponds to a unique set of free schedules (σ_k, σ_{k+1}, ..., σ_{k+H−1}) = (σ̄^i_0, σ̄^i_1, σ̄^i_2, ..., σ̄^i_{H−1}) and a unique set of costs can be obtained:

   x_k' P_i x_k = sum_{ℓ=k}^{k+H−1} x_ℓ' Q_{σ_ℓ} x_ℓ + sum_{ℓ=k+H}^{∞} x_ℓ' Q_{σ̄_ℓ} x_ℓ.

Then an explicit expression for the rollout policy is

   σ_k = π(ι(x_k)),  ι(x_k) = argmin_i x_k' P_i x_k,

where π is a map that extracts the first component of the scheduling associated with i: π(i) = σ̄^i_0.

SLIDE 48

Example

If m = 2 and H = 2:

   i | (σ̄^i_0, σ̄^i_1) | π(i) | P_i
   1 | (1, 1)          | 1    | P_1 = Q_1 + A_1' Q_1 A_1 + (A_1 A_1)' M̄ (A_1 A_1)
   2 | (1, 2)          | 1    | P_2 = Q_1 + A_1' Q_2 A_1 + (A_2 A_1)' M̄ (A_2 A_1)
   3 | (2, 1)          | 2    | P_3 = Q_2 + A_2' Q_1 A_2 + (A_1 A_2)' M̄ (A_1 A_2)
   4 | (2, 2)          | 2    | P_4 = Q_2 + A_2' Q_2 A_2 + (A_2 A_2)' M̄ (A_2 A_2)

SLIDE 49

Outline

  • Approximate dynamic programming
  • Linear quadratic control with inequality constraints
  • Control of switched systems
  • Mixing fluids
  • Explicit versus implicit policies
  • Resource-aware control
SLIDE 50

Resource-aware control

How to efficiently control several dynamical systems via a wireless network by deciding how to schedule transmissions (only one user can transmit at a given time)?

[Schematic: one controller communicating with the actuators of two inverted pendulums, with angles θ_1, θ_2 and torque τ_1.]

SLIDE 51

Model

• The two processes are identical and modeled by x^i_{k+1} = Ā x^i_k + B̄ u^i_k, i ∈ {1, 2}.

• Each system has an identical cost J_i = sum_{k=0}^{∞} (x^i_k)' Q̄ x^i_k + (u^i_k)' R̄ u^i_k, i ∈ {1, 2}, and we wish to minimize J_1 + J_2.

• If each system could update the control law at every time instant, the optimal control law would take the form u^i_k = K_i x^i_k (LQR problem).

• Instead, since each system can only update the control input at mutually exclusive times, we have

   u^1_k = K_1 x^1_k if σ_k = 1, and u^1_k = v^1_k if σ_k = 2,
   u^2_k = K_2 x^2_k if σ_k = 2, and u^2_k = v^2_k if σ_k = 1,

  where v^i_k = u^i_{k−1}.

SLIDE 52

Model

If we let x_k = [x^1_k; v^1_k; x^2_k; v^2_k], we can write the model as x_{k+1} = A_{σ_k} x_k with cost sum_{k=0}^{∞} x_k' Q_{σ_k} x_k, where (since the systems are identical, K_1 = K_2 = K)

   A_1 = blkdiag([Ā + B̄K 0; K 0], [Ā B̄; 0 I]),
   A_2 = blkdiag([Ā B̄; 0 I], [Ā + B̄K 0; K 0]),
   Q_1 = blkdiag(Q̄ + K' R̄ K, 0, Q̄, R̄),
   Q_2 = blkdiag(Q̄, R̄, Q̄ + K' R̄ K, 0).
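A sketch of this construction in Matlab, assuming the single-pendulum matrices Abar, Bbar, K, Qbar, Rbar are defined; the placement of the zero blocks, omitted in the slide's compact notation, is spelled out here:

nx = size(Abar,1); nu = size(Bbar,2);
S1 = [Abar + Bbar*K, zeros(nx,nu); K, zeros(nu,nu)]; % subsystem that updates
S0 = [Abar, Bbar; zeros(nu,nx), eye(nu)];            % subsystem that holds
A1 = blkdiag(S1, S0);  % sigma_k = 1: system 1 updates, system 2 holds
A2 = blkdiag(S0, S1);  % sigma_k = 2: roles reversed
Q1 = blkdiag(Qbar + K'*Rbar*K, zeros(nu), Qbar, Rbar);
Q2 = blkdiag(Qbar, Rbar, Qbar + K'*Rbar*K, zeros(nu));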

SLIDE 53

Example

Parameters: Ā = e^{A_c τ}, B̄ = ∫_0^τ e^{A_c s} ds B_c, Q̄ = I, R̄ = 0.01, τ = 0.1; periodic base policy.

Cost of periodic (base): 16.4974. Cost of rollout: 12.4969.

[Plots: first and second components of x^1_k and x^2_k versus time t = kτ, for rollout and for the base policy.]

SLIDE 54

Example

Given the initial condition, paying more attention to system 2 leads to better results! The rollout schedule over the first stages is

   k:   1 2 3 4 5 6 7 8 9 10
   σ_k: 2 2 2 1 2 1 2 1 2 1

Cost of periodic (base): 16.4974. Cost of rollout: 12.4969.

[Plots: control inputs u^1_k and u^2_k versus time t = kτ, for rollout and for the base policy.]

SLIDE 55

Another resource-aware control problem

How to control the two arms of a SCARA robot with only one amplifier to track a trajectory for the tip with minimum error?* Only torque τ_1 or τ_2 can be applied at each time instant.

[Schematic of the robot (angles θ_1, θ_2, torques τ_1, τ_2) and plots of the applied torques over time.]

*Control of a SCARA robot with a reduced number of amplifiers, Internship report TU/e, VDL, Yuri Steinbuch, 2015

SLIDE 56

Concluding remarks

To summarize:

• When discretization is not feasible, approximate dynamic programming (e.g. CEC, rollout, MPC) can be a practical way to provide a policy. However, it is suboptimal.

• One must check case by case whether it can be used - for switched linear systems (SLS) it works very well (although suboptimal!).

After this lecture you should be able to:

• Apply ADP techniques to find suboptimal policies for stage decision problems.

• Apply ADP techniques to control switched linear systems and linear quadratic problems with input constraints.

   dimension / convexity | DP (discretization) | Static optimization | ADP
   small, convex         | applicable (n ≤ 3)  | applicable          | ?
   small, non-convex     | applicable (n ≤ 3)  | —                   | ?
   large, convex         | —                   | applicable          | ?
   large, non-convex     | —                   | —                   | ? (SLS example)