Optimal Control and Dynamic Programming, 4SC000, Q2 2017-2018, Duarte Antunes



SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Part III

Continuous-time optimal control problems

SLIDE 3

Recap

Comparison of the two problem classes seen so far (discrete optimization problems / stage decision problems):

  • Formulation: transition diagram / dynamic system & additive cost function
  • DP algorithm: graphical DP algorithm & DP equation / DP equation
  • Partial information: - / Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle
  • Alternative algorithms: Dijkstra's algorithm / static optimization

SLIDE 4

Goals of part III

  • Introduce optimal control concepts for continuous-time optimal control problems
  • Analyze frequency-domain properties of continuous-time LQR/LQG

Comparison across the three problem classes (discrete optimization problems / stage decision problems / continuous-time control problems):

  • Formulation: transition diagram / discrete-time system & additive cost function / differential equations & additive cost function
  • DP algorithm: graphical DP algorithm & DP equation / DP equation / Hamilton-Jacobi-Bellman equation
  • Partial information: - / Bayesian inference & decisions based on prob. distribution; Kalman filter and separation principle / continuous-time Kalman filter and separation principle
  • Alternative algorithms: Dijkstra's algorithm / static optimization / Pontryagin's maximum principle

SLIDE 5

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 6

Continuous-time optimal control problems

Dynamic model: $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$

The goal is to find an optimal path and an optimal policy.

Assumptions:
  • $x(t) \in \mathbb{R}^n$, $u(t) \in U \subseteq \mathbb{R}^m$
  • The differential equation has a unique solution in $[0, T]$
  • We assume that $f, g$ do not explicitly depend on time for simplicity; we could consider $f(t, x(t), u(t))$ and $g(t, x(t), u(t))$

SLIDE 7

Optimal path

  • A path $(u(t), x(t))$, $t \in [0, T]$, consists of a control input $u(t)$ and a corresponding solution $x(t)$ of the differential equation $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$.
  • A path is said to be optimal if there is no other path with a smaller cost $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$.
  • Choosing the control input can be seen as making decisions in infinitesimal time intervals which shape the derivative of the state (and thus determine its evolution up to the terminal state $x(T)$ at $t = T$).

SLIDE 8

Optimal policy

  • A policy is a function $\mu$ which maps states into actions at every time: $u(t) = \mu(t, x(t))$, $t \in [0, T]$.
  • A policy $\mu$ is said to be optimal if, for every state $x(t) = \bar x$ at every time $t$, the cost $\int_t^T g(x(s), \mu(s, x(s)))\,ds + g_T(x(T))$ coincides with the cost of the optimal path of the problem $\dot x(s) = f(x(s), u(s))$, $x(t) = \bar x$, $s \in [t, T]$, with cost $\int_t^T g(x(s), u(s))\,ds + g_T(x(T))$.
  • We denote the cost of the latter problem by $J(t, \bar x)$, the optimal cost-to-go.

SLIDE 9

Approach

  • Dynamic programming (DP) will allow us to compute optimal policies and optimal paths, and Pontryagin's maximum principle (PMP) will allow us to compute optimal paths.
  • However, obtaining these results in continuous time (CT) is mathematically involved.
  • To gain intuition, in both cases we will first discretize the problem as a function of the discretization step $\tau$ (previously the sampling period), apply DP, and take the limit as the discretization step converges to zero.

Schematically: CT control problem → (discretization, step $\tau$) → stage decision problem → (DT DP) → optimal path and policy → (taking the limit $\tau \to 0$) → CT DP → optimal path and policy of the CT problem.

SLIDE 10

Example

How to charge the capacitor in an RC circuit with minimum energy loss in the resistor?

[Figure: RC circuit with voltage source $u$, resistor $R$, current $i$, and capacitor $C$ with voltage $x$]

$\dot x(t) = \frac{1}{RC}(u(t) - x(t))$

$\min_{u(t)} \int_0^T \frac{(x(t) - u(t))^2}{R}\,dt$

subject to $x(0) = 0$ and the terminal constraint $x(T) = x_{\mathrm{desired}}$. Let us consider $R = C = T = x_{\mathrm{desired}} = 1$.

SLIDE 11

Discretization

Discretization times $t_k = k\tau$, $k = 0, \ldots, h$, with $h\tau = T$ and discretization step $\tau$.

Dynamic model: for $t \in [t_k, t_{k+1})$, with piecewise-constant input, the exact solution is
$x(t) = e^{-(t - t_k)} \underbrace{x(t_k)}_{x_k} + (1 - e^{-(t - t_k)}) \underbrace{u(t_k)}_{u_k}$
so that
$x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$

Cost function:
$\int_0^1 (x(t) - u(t))^2\,dt = \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} \left(e^{-(t - t_k)} x_k + (1 - e^{-(t - t_k)}) u_k - u_k\right)^2 dt = \sum_{k=0}^{h-1} \int_{t_k}^{t_{k+1}} e^{-2(t - t_k)}\,dt\,(x_k - u_k)^2 = \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$
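As a sanity check on the exact discretization above, the following sketch (Python; the horizon, step, and random input are illustrative choices) compares the continuous cost integral, computed by fine quadrature, against the closed-form sum $\sum_k \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$ for an arbitrary piecewise-constant input.

```python
import numpy as np

# Exact discretization of x' = -x + u with piecewise-constant input:
#   x_{k+1} = e^{-tau} x_k + (1 - e^{-tau}) u_k
rng = np.random.default_rng(0)
tau, h = 0.1, 10                      # h * tau = T = 1
u = rng.standard_normal(h)            # arbitrary piecewise-constant input
x = np.zeros(h + 1)
for k in range(h):
    x[k + 1] = np.exp(-tau) * x[k] + (1 - np.exp(-tau)) * u[k]

# Closed-form cost from the slide.
closed = np.sum((1 - np.exp(-2 * tau)) / 2 * (x[:h] - u) ** 2)

# Fine midpoint quadrature of the continuous cost, using the exact x(t)
# inside each sampling interval.
N = 1000                              # sub-samples per interval
cost = 0.0
for k in range(h):
    s = np.linspace(0, tau, N, endpoint=False) + tau / (2 * N)
    xt = np.exp(-s) * x[k] + (1 - np.exp(-s)) * u[k]
    cost += np.sum((xt - u[k]) ** 2) * (tau / N)
```

The two numbers agree to quadrature accuracy, confirming that the discretization introduces no approximation error for piecewise-constant inputs.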

SLIDE 12

From terminal constraint to terminal cost

The framework of stage decision problems does not take terminal constraints into account. Thus we apply a trick: a final control input $u(1)$ is applied at the terminal time, setting the state to the desired terminal value after $\Delta$ seconds, $x(1 + \Delta) = 1$. Since
$x(1 + \Delta) = e^{-\Delta} x(1) + (1 - e^{-\Delta}) u(1)$
this terminal control input is given by
$u(1) = \frac{1 - e^{-\Delta} x(1)}{1 - e^{-\Delta}}$

[Figure: trajectory $x(t)$ reaching $1$ at time $1 + \Delta$]

SLIDE 13

From terminal constraint to terminal cost

The following cost approximates the original one that we are interested in:
$\int_0^{1+\Delta} (x(t) - u(t))^2\,dt = \int_0^1 (x(t) - u(t))^2\,dt + \int_1^{1+\Delta} (x(t) - u(t))^2\,dt = \left(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\right) + \frac{1 - e^{-2\Delta}}{2}(x_h - u_h)^2 = \left(\sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2\right) + \gamma(\Delta)(x_h - 1)^2$

where the last step uses the terminal control input $u_h = \frac{1 - e^{-\Delta} x_h}{1 - e^{-\Delta}}$ and the terminal cost weight
$\gamma(\Delta) = \frac{1 - e^{-2\Delta}}{2(1 - e^{-\Delta})^2}$

Note that $\gamma(\Delta) \to \infty$ as $\Delta \to 0$, but $\gamma(\Delta)(x_h - 1)^2 \to 0$ if $x_h \to 1$.

SLIDE 14

Dynamic programming

Applying DP:
$J_h(x_h) = \gamma(\Delta)(x_h - 1)^2$
$J_k(x_k) = \min_{u_k} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + J_{k+1}\left(e^{-\tau} x_k + (1 - e^{-\tau}) u_k\right)$

Results in
$u_k = K_k x_k + \alpha_k, \qquad J_k(x_k) = \theta_k x_k^2 + \gamma_k x_k + \beta_k$
obtained from Riccati equations.

Example: $\tau = 0.2$, $\Delta = 0.01$. [Plots: $x(t)$ and $u(t)$ over $t \in [0, 1]$]
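The recursion above can be run numerically. A minimal sketch (Python; $\tau = 0.01$ and $\Delta = 0.001$ are illustrative choices, and recovering the quadratic coefficients by fitting three exact samples is a convenience rather than part of the method) propagates the quadratic cost-to-go backward and simulates the closed loop forward from $x_0 = 0$:

```python
import numpy as np

# Backward DP for the discretized RC problem:
#   x_{k+1} = a x_k + b u_k, stage cost c (x_k - u_k)^2,
#   terminal cost gam (x_h - 1)^2.
tau, Delta = 0.01, 0.001
h = round(1 / tau)                        # h steps, h * tau = T = 1
a, b = np.exp(-tau), 1 - np.exp(-tau)
c = (1 - np.exp(-2 * tau)) / 2
gam = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)

coef = np.array([gam, -2 * gam, gam])     # J_h(x) = gam (x-1)^2 as [theta, gamma_k, beta]
policies = []
for k in range(h - 1, -1, -1):
    th, gc = coef[0], coef[1]
    # Minimizer of c (x-u)^2 + J_{k+1}(a x + b u), quadratic in u:
    ustar = lambda x, th=th, gc=gc: ((c - th * a * b) * x - gc * b / 2) / (c + th * b ** 2)
    V = lambda x: c * (x - ustar(x)) ** 2 + np.polyval(coef, a * x + b * ustar(x))
    policies.append(ustar)
    # J_k is again quadratic; recover its coefficients from three exact samples.
    coef = np.polyfit([-1.0, 0.0, 1.0], [V(-1.0), V(0.0), V(1.0)], 2)
policies.reverse()

x = np.zeros(h + 1)                       # forward pass from x_0 = 0
for k in range(h):
    x[k + 1] = a * x[k] + b * policies[k](x[k])
```

The resulting trajectory is close to the limit discussed on the next slides, $x(t) = t$ with $u(0) \approx 1$.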

SLIDE 15

Taking the limit $\tau \to 0$

Seems to be converging to $x(t) = t$, $u(t) = 1 + t$. Later we will prove this.

[Plots: $x(t)$ and $u(t)$ for $(\tau = 0.05, \Delta = 0.01)$, $(\tau = 0.01, \Delta = 0.01)$, and $(\tau = 0.01, \Delta = 0.001)$]

SLIDE 16

Static optimization

$\min_{u_0, \ldots, u_{h-1}} \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2$
s.t. $x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$, $x_0 = 0$, $x_h = 1$

This is a static optimization problem, which can handle constraints. Lagrangian:

$L(x_1, u_0, \lambda_1, \ldots, x_{h-1}, u_{h-1}, \lambda_h) = \sum_{k=0}^{h-1} \frac{1 - e^{-2\tau}}{2}(x_k - u_k)^2 + \sum_{k=0}^{h-1} \lambda_{k+1}\left(e^{-\tau} x_k + (1 - e^{-\tau}) u_k - x_{k+1}\right)$

The necessary optimality conditions amount to solving a linear system:
$\frac{\partial L}{\partial x_k} = 0$: $\lambda_k = (1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1} e^{-\tau}$, $k \in \{1, \ldots, h-1\}$
$\frac{\partial L}{\partial u_k} = 0$: $0 = -(1 - e^{-2\tau})(x_k - u_k) + \lambda_{k+1}(1 - e^{-\tau})$, $k \in \{0, \ldots, h-1\}$
$\frac{\partial L}{\partial \lambda_{k+1}} = 0$: $x_{k+1} = e^{-\tau} x_k + (1 - e^{-\tau}) u_k$, $k \in \{0, \ldots, h-1\}$
with $x_0 = 0$, $x_h = 1$.
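This linear system can be assembled and solved directly. A sketch (Python; $\tau = 0.1$ is an illustrative choice, and $x_0 = 0$, $x_h = 1$ are eliminated rather than kept as unknowns). A noteworthy outcome, which can be checked by hand from the conditions, is that the multipliers are constant and the solution satisfies $x_k = k\tau$ exactly, matching the limit $x(t) = t$.

```python
import numpy as np

tau = 0.1
h = round(1 / tau)                       # h * tau = T = 1
aE, bE = np.exp(-tau), 1 - np.exp(-tau)
c = 1 - np.exp(-2 * tau)                 # coefficient in the stationarity conditions

# Unknowns z = [x_1..x_{h-1}, u_0..u_{h-1}, lam_1..lam_h].
n = (h - 1) + h + h
A = np.zeros((n, n))
rhs = np.zeros(n)
ix = lambda k: k - 1                     # x_k,   1 <= k <= h-1
iu = lambda k: (h - 1) + k               # u_k,   0 <= k <= h-1
il = lambda k: (h - 1) + h + (k - 1)     # lam_k, 1 <= k <= h
row = 0
for k in range(1, h):                    # dL/dx_k = 0
    A[row, ix(k)] += c
    A[row, iu(k)] -= c
    A[row, il(k + 1)] += aE
    A[row, il(k)] -= 1.0
    row += 1
for k in range(h):                       # dL/du_k = 0
    if k >= 1:
        A[row, ix(k)] -= c
    A[row, iu(k)] += c
    A[row, il(k + 1)] += bE
    row += 1
for k in range(h):                       # dynamics, dL/dlam_{k+1} = 0
    if k + 1 <= h - 1:
        A[row, ix(k + 1)] -= 1.0
    else:
        rhs[row] += 1.0                  # x_h = 1 moved to the right-hand side
    if k >= 1:
        A[row, ix(k)] += aE
    A[row, iu(k)] += bE
    row += 1
z = np.linalg.solve(A, rhs)
xs = np.concatenate(([0.0], z[:h - 1], [1.0]))
```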

SLIDE 17

Taking the limit $\tau \to 0$

Again, seems to be converging to $x(t) = t$, $u(t) = 1 + t$.

[Plots: $x(t)$ and $u(t)$ for $\tau = 0.2$, $\tau = 0.05$, and $\tau = 0.01$]

SLIDE 18

Discussion

  • In this lecture we follow this discretization approach (the more formal continuous-time approach can be found in Bertsekas' book) to derive the counterpart of DP for continuous-time control problems, which is the Hamilton-Jacobi-Bellman equation.
  • Later we will use both the discretization approach and the continuous-time approach to derive Pontryagin's maximum principle.
  • With such tools we will be able to establish the optimal solution for charging the capacitor, and solve many other problems.

Schematically: CT control problem → (discretization, step $\tau$) → stage decision problem → (DT DP, DT PMP) → optimal path and policy → (taking the limit $\tau \to 0$) → CT DP, CT PMP → optimal path and policy of the CT problem.

SLIDE 19

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 20

Discretization approach

Discretization times $t_k = k\tau$, $k = 0, \ldots, h$, with $h\tau = T$ and discretization step $\tau$; $x_k = x(k\tau)$, $u_k = u(k\tau)$.

Dynamic model: $\dot x(t) = f(x(t), u(t))$, $x(0) = x_0$, $t \in [0, T]$, discretized as
$x_{k+1} = x_k + \tau f(x_k, u_k)$

Cost function: $\int_0^T g(x(t), u(t))\,dt + g_T(x(T))$, discretized as
$\sum_{k=0}^{h-1} g(x_k, u_k)\tau + g_h(x_h), \qquad g_h(x) = g_T(x), \ \forall x$

  • Note that these are approximate discretizations. We could have considered exact discretization, as in the linear case, but this approximation will suffice.

SLIDE 21

Dynamic programming

DP equations for the resulting stage decision problem:
$J_h(x_h) = g_h(x_h)$
$J_k(x_k) = \min_{u_k \in U}\, g(x_k, u_k)\tau + J_{k+1}\left(x_k + \tau f(x_k, u_k)\right), \quad k \in \{h-1, \ldots, 0\}$

For convenience let us define
$\bar J(t, x) = J_k(x), \quad t \in [k\tau, (k+1)\tau), \qquad \bar J(h\tau, x) = J_h(x), \ \forall x$

Then the dynamic programming algorithm can be written as
$\bar J(h\tau, x) = g_h(x), \ \forall x$
$\bar J(k\tau, x) = \min_{u \in U}\, g(x, u)\tau + \bar J\left((k+1)\tau, x + \tau f(x, u)\right), \quad k \in \{h-1, \ldots, 0\}, \ \forall x$
SLIDE 22

Taking the limit $\tau \to 0$

Using a first-order Taylor series expansion,
$\bar J\left((k+1)\tau, x + \tau f(x, u)\right) = \bar J(k\tau, x) + \tau\left(\frac{\partial}{\partial t}\bar J(k\tau, x) + \frac{\partial}{\partial x}\bar J(k\tau, x) f(x, u)\right) + o(\tau)$

and replacing in the DP algorithm, we obtain
$\bar J(k\tau, x) = \min_{u \in U}\, g(x, u)\tau + \bar J(k\tau, x) + \tau\left(\frac{\partial}{\partial t}\bar J(k\tau, x) + \frac{\partial}{\partial x}\bar J(k\tau, x) f(x, u)\right) + o(\tau)$

Assuming that (wishful thinking...) as $\tau \to 0$, $\bar J(t, x)$ converges to a continuously differentiable function, then cancelling $\bar J(k\tau, x)$ on both sides, dividing by $\tau$, and letting $\tau \to 0$ yields
$0 = \min_{u \in U}\, g(x, u) + \frac{\partial}{\partial t}\bar J(t, x) + \frac{\partial}{\partial x}\bar J(t, x) f(x, u)$

SLIDE 23

Theorem (HJB)

Suppose that $V(t, x)$ is continuously differentiable in $t$ and $x$, and is such that it satisfies the Hamilton-Jacobi-Bellman equation
$0 = \min_{u \in U}\, g(x, u) + \frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x) f(x, u), \quad \forall t, x$
with $V(T, x) = g_T(x)$.

Suppose also that $u = \mu(t, x)$ attains the minimum in the HJB equation for all $t, x$.

Then $V(t, x)$ coincides with the optimal cost-to-go $J(t, x)$ and $\mu(t, x)$ coincides with the optimal policy.

SLIDE 24

Discussion

  • The HJB equation is a partial differential equation.
  • The intuitive arguments provided before show that this partial differential equation is just an extension of the DP algorithm.
  • The bottleneck of such intuitive arguments is how to establish that the cost-to-go is differentiable.
  • The formal proof uses a different argument, following a continuous-time approach. It can be found in Bertsekas' book, p. 111.
  • Partial differential equations are in general very hard to solve analytically.
  • We are going to apply the HJB equation first to a simple example, then to linear systems, and solve the previous problem of charging a capacitor.

SLIDE 25

Example

For the simple problem*
dynamics: $\dot x(t) = u(t)$, $u(t) \in U := [-1, 1]$, $t \in [0, T]$
cost: $\frac{1}{2}(x(T))^2$
the HJB equation is
$0 = \min_{u \in [-1, 1]} \frac{\partial}{\partial t}V(t, x) + \frac{\partial}{\partial x}V(t, x)\,u$
with the terminal condition $V(T, x) = \frac{1}{2}x^2$.

Approach: find a candidate for optimality and check that it satisfies HJB.

* Example taken from Bertsekas' book, p. 112.

SLIDE 26

Example

There is an obvious candidate for optimality: move the state towards zero as quickly as possible,
$\mu^*(t, x) = -\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x > 0, \end{cases}$
and for an initial time $t$ and initial state $x$, the cost is given by
$J^*(t, x) = \frac{1}{2}\left(\max\{0, |x| - (T - t)\}\right)^2$

[Figure: optimal trajectories in the $(t, x)$ plane, moving at unit rate towards zero between the lines $x = T - t$ and $x = -(T - t)$]
slide-27
SLIDE 27

23

Example

This function satisfies the terminal condition of the HJB theorem J∗(T, x) = 1 2x2 satisfies the HJB equation 0 = min

u∈[−1,1][1 + sgn(x)u]max{0, |x| − (T − t)}

µ ∗ (t, x) = u = −sign(x) where the minimum in the HJB equation is achieved by (not unique when ) |x(t)| ≤ T − t Then this is an optimal policy.

∂ ∂xJ∗(t, x) = sign(x) max{0, |x| − (T − t)} ∂ ∂tJ∗(t, x) = max{0, |x| − (T − t)}
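This analytic solution can be cross-checked with grid-based discretized DP. A sketch (Python; the grid range $[-2, 2]$, the state step equal to $\tau$ so that $x + \tau u$ stays on the grid, and the restriction of $U$ to $\{-1, 0, 1\}$ are assumed simplifications; the restriction is harmless here since the optimal input only takes those values):

```python
import numpy as np

# Grid DP for dx/dt = u, u in {-1, 0, 1}, stage cost g = 0,
# terminal cost x^2 / 2 (the example above), horizon T = 1.
T, tau = 1.0, 0.01
h = round(T / tau)
xg = np.arange(-200, 201) * tau            # state grid on [-2, 2], step = tau
J = 0.5 * xg ** 2                          # J_h(x) = g_T(x) = x^2 / 2
for k in range(h):
    Jp = np.concatenate((J[1:], J[-1:]))   # value after u = +1 (edge clamped)
    Jm = np.concatenate((J[:1], J[:-1]))   # value after u = -1 (edge clamped)
    J = np.minimum(np.minimum(Jm, J), Jp)  # Bellman backup with g = 0

J_analytic = 0.5 * np.maximum(0.0, np.abs(xg) - T) ** 2
```

Because each control moves the state by exactly one grid cell, the numerical cost-to-go at $t = 0$ matches $J^*(0, x) = \frac{1}{2}(\max\{0, |x| - T\})^2$ on the grid.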

SLIDE 28

Outline

  • Problem formulation and approach
  • Hamilton Jacobi Bellman equation
  • Linear quadratic regulator
SLIDE 29

Linear systems, quadratic cost

Dynamic model: $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$

Cost function: $x(T)^\top Q_T x(T) + \int_0^T \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt$, with $\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} > 0$

Inspired by the fact that a discretization-based approach would result in quadratic costs-to-go, let us try $V(t, x) = x^\top P(t) x$. If such a function satisfies the HJB equation, it is the cost-to-go!

HJB:
$0 = \min_{u \in \mathbb{R}^m}\left[x^\top Q x + 2x^\top S u + u^\top R u + \frac{\partial V(t, x)}{\partial t} + \frac{\partial V(t, x)}{\partial x}(Ax + Bu)\right], \qquad V(T, x) = x^\top Q_T x$
SLIDE 30

Linear systems, quadratic cost

The HJB equation then takes the form
$0 = \min_{u \in \mathbb{R}^m}\left[x^\top Q x + 2x^\top S u + u^\top R u + x^\top \dot P(t) x + 2x^\top P(t)Ax + 2x^\top P(t)Bu\right]$

To obtain the minimum, differentiate with respect to $u$ and equate to zero,
$2(B^\top P(t) + S^\top)x + 2Ru = 0 \quad \Rightarrow \quad u = -R^{-1}(B^\top P(t) + S^\top)x$
which leads to
$0 = x^\top\left(\dot P(t) + P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right)x, \quad \forall x$
which is only satisfied if
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$

We have concluded that if $P(t)$ satisfies this Riccati equation, then $J(t, x) = x^\top P(t)x$ is the cost-to-go and $\mu(t, x) = K(t)x$, with $K(t) = -R^{-1}(B^\top P(t) + S^\top)$, is the optimal policy.

SLIDE 31

Finite horizon quadratic control

Finite horizon. The optimal control policy for the problem
$\min_u \int_0^T \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt + x(T)^\top Q_T x(T)$
subject to $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$, is
$u(t) = K(t)x(t), \qquad K(t) = -R^{-1}(B^\top P(t) + S^\top)$
where $P(t)$ is the unique solution of the Riccati equation
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$

Moreover, the optimal cost is given by $x_0^\top P(0) x_0$.
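The Riccati differential equation can be integrated backward from $P(T) = Q_T$ with a standard ODE solver. A sketch (Python/SciPy; the double-integrator $A$, $B$ and weights $Q = I$, $R = 1$, $S = 0$, $Q_T = 0$, $T = 10$ are assumed for illustration; with $S = 0$ the quadratic term reduces to $P B R^{-1} B^\top P$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of P' = -(P A + A'P - P B R^{-1} B' P + Q), P(T) = QT,
# for an assumed double-integrator example.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.array([[1.0]])                 # R = 1, so R^{-1} = 1
QT = np.zeros((2, 2))
T = 10.0

def riccati_rhs(t, p):
    P = p.reshape(2, 2)
    dP = -(P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P + Q)
    return dP.ravel()

# Integrate from t = T back to t = 0 (note the decreasing time span).
sol = solve_ivp(riccati_rhs, [T, 0.0], QT.ravel(), rtol=1e-9, atol=1e-11)
P0 = sol.y[:, -1].reshape(2, 2)
K0 = -Rinv @ B.T @ P0                    # optimal gain at t = 0
```

For this long horizon $P(0)$ is close to the stationary solution of the corresponding algebraic Riccati equation, which for these weights works out by hand to $\begin{bmatrix}\sqrt{3} & 1 \\ 1 & \sqrt{3}\end{bmatrix}$.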

SLIDE 32

Linear Quadratic Regulator

Infinite horizon. Suppose $(A, B)$ is controllable and $\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} > 0$. The optimal policy for the problem
$\min_u \int_0^\infty \left(x(t)^\top Q x(t) + 2x(t)^\top S u(t) + u(t)^\top R u(t)\right)dt$
subject to $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$, is
$u(t) = Kx(t), \qquad K = -R^{-1}(B^\top P + S^\top)$
where $P$ is the unique positive definite solution to the algebraic Riccati equation
$0 = PA + A^\top P - (PB + S)R^{-1}(B^\top P + S^\top) + Q$

Moreover, the closed-loop matrix $(A + BK)$ has all its eigenvalues in the left-half complex plane and the optimal cost is given by $x_0^\top P x_0$. The reasoning follows from arguments similar to those used in the context of stage decision problems.
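The algebraic Riccati equation can be solved numerically, for instance with SciPy. A sketch for the same assumed double-integrator example with $S = 0$ (`scipy.linalg.solve_continuous_are` also accepts a cross-weight through its optional `s` argument):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solves A'P + PA - P B R^{-1} B' P + Q = 0 for the stabilizing P.
P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.inv(R) @ B.T @ P          # u = K x
cl_eigs = np.linalg.eigvals(A + B @ K)   # closed-loop eigenvalues
```

The returned $P$ is the stabilizing solution, so all closed-loop eigenvalues lie in the left-half plane, as the theorem states.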

SLIDE 33

Charging a capacitor

Applying a trick allows us to cast our problem, $\dot x(t) = -x(t) + u(t)$ with cost $\int_0^1 (x(t) - u(t))^2\,dt + \gamma(x(1) - 1)^2$, in the standard LQR formulation: augment the state with a constant $y(t)$, $\dot y(t) = 0$, $y(0) = 1$.

Dynamic model:
$\begin{bmatrix} \dot x(t) \\ \dot y(t) \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}}_{A} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} + \underbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}_{B} u(t), \qquad \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} x_0 \\ 1 \end{bmatrix}$

Cost function:
$\int_0^1 \begin{bmatrix} x(t) & y(t) \end{bmatrix} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}}_{Q} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} + 2\begin{bmatrix} x(t) & y(t) \end{bmatrix} \underbrace{\begin{bmatrix} -1 \\ 0 \end{bmatrix}}_{S} u(t) + \underbrace{1}_{R}\,u(t)^2\, dt + \begin{bmatrix} x(1) & y(1) \end{bmatrix} \underbrace{\begin{bmatrix} \gamma & -\gamma \\ -\gamma & \gamma \end{bmatrix}}_{Q_T} \begin{bmatrix} x(1) \\ y(1) \end{bmatrix}$
SLIDE 34

Riccati equations

The Riccati equations
$\dot P(t) = -\left(P(t)A + A^\top P(t) - (P(t)B + S)R^{-1}(B^\top P(t) + S^\top) + Q\right), \qquad P(T) = Q_T$
with $P(t) = \begin{bmatrix} p_1(t) & p_2(t) \\ p_2(t) & p_3(t) \end{bmatrix}$ boil down to
$\begin{bmatrix} \dot p_1 & \dot p_2 \\ \dot p_2 & \dot p_3 \end{bmatrix} = -\left( \begin{bmatrix} p_1 & p_2 \\ p_2 & p_3 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} p_1 & p_2 \\ p_2 & p_3 \end{bmatrix} - \begin{bmatrix} p_1 - 1 \\ p_2 \end{bmatrix} \begin{bmatrix} p_1 - 1 & p_2 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right)$
or equivalently to the non-linear differential equations
$\dot p_1(t) = 2p_1(t) + (p_1(t) - 1)^2 - 1 = p_1(t)^2$
$\dot p_2(t) = p_2(t) + p_2(t)(p_1(t) - 1) = p_1(t)p_2(t)$
$\dot p_3(t) = p_2(t)^2$
with $p_1(1) = -p_2(1) = p_3(1) = \gamma$, whose solution is (solution method not addressed here)
$p_1(t) = -p_2(t) = p_3(t) = \frac{1}{1 + \frac{1}{\gamma} - t}$

SLIDE 35

Optimal policy and optimal path

Optimal policy (using $p_2(t) = -p_1(t)$ and $y(t) = 1$):
$u(t) = -R^{-1}(B^\top P(t) + S^\top)\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} -(p_1(t) - 1) & -p_2(t) \end{bmatrix}\begin{bmatrix} x(t) \\ 1 \end{bmatrix} = -(p_1(t) - 1)x(t) + p_1(t) = -p_1(t)(x(t) - 1) + x(t)$

Optimal path for $x(0) = 0$: the closed loop is
$\dot x(t) = -x(t) + u(t) = -p_1(t)(x(t) - 1), \qquad p_1(t) = \frac{1}{1 + \frac{1}{\gamma} - t}$
whose solution is
$x(t) = \frac{t - (1 + \frac{1}{\gamma})}{1 + \frac{1}{\gamma}} + 1 = \frac{t}{1 + \frac{1}{\gamma}}$

Letting the parameter $\Delta$ of the artificial terminal cost converge to zero ($\gamma \to \infty$) we obtain
$x(t) = t, \qquad u(t) = 1 + t$
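Both closed-form expressions can be checked numerically. A sketch (Python/SciPy; $\gamma = 100$ is an illustrative value) integrates $\dot p_1 = p_1^2$ backward from $p_1(1) = \gamma$ and the closed loop $\dot x = -p_1(t)(x - 1)$ forward from $x(0) = 0$, comparing against $p_1(t) = 1/(1 + 1/\gamma - t)$ and $x(t) = t/(1 + 1/\gamma)$:

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma = 100.0
p1 = lambda t: 1.0 / (1.0 + 1.0 / gamma - t)

# Riccati check: integrate p1' = p1^2 backward from p1(1) = gamma.
rsol = solve_ivp(lambda t, p: p ** 2, [1.0, 0.0], [gamma],
                 t_eval=np.linspace(1.0, 0.0, 11), rtol=1e-10, atol=1e-12)

# Closed-loop path: x' = -p1(t) (x - 1), x(0) = 0.
xsol = solve_ivp(lambda t, x: -p1(t) * (x - 1.0), [0.0, 1.0], [0.0],
                 t_eval=np.linspace(0.0, 1.0, 11), rtol=1e-10, atol=1e-12)
```

The numerical solutions match the closed forms, and $x(1) = \gamma/(\gamma + 1)$ approaches the desired terminal value $1$ as $\gamma \to \infty$.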

SLIDE 36

Discussion

  • The HJB equation is a partial differential equation and an analytical solution is very hard to find.
  • For problems with linear models and quadratic costs, computing the optimal policy and optimal paths involves solving non-linear differential equations (Riccati equations).
  • We were able to solve these Riccati equations since the dimension of the state-space in our example was small.
  • The approach based on Pontryagin's maximum principle will lead to different conditions which can be applied to more cases.
  • We will later consider stochastic disturbances, but the advantages of having a policy are exactly the same as for stage decision problems.

SLIDE 37

Concluding remarks

  • The counterpart of DP for stage-decision problems is the HJB equation.
  • This is a partial differential equation, very hard to solve in general.
  • However, for linear systems we can solve it, and this leads to the Riccati equations.
  • As for discrete-time optimal control problems, this leads to an algebraic Riccati equation (LQR in continuous time) when the horizon is infinite.

Summary: after this lecture you should be able to:

  • Compute the optimal policy and optimal path for problems with a linear model and finite-horizon quadratic cost (Riccati equations).
  • Compute the optimal policy for problems with linear models and infinite-horizon quadratic cost.
  • Solve the algebraic Riccati equation analytically when the dimension of the state-space is small.