4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Outline
- Static optimization approach to PMP
- Linear systems, quadratic cost, terminal constraints
- Shooting method
- Recap: continuous-time optimal control (appendix)
Dynamic model
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
Cost function
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T))
The goal in this lecture is to find an optimal path (u(t), x(t)) using a new tool: Pontryagin's maximum principle (PMP).

We derive the PMP via the discretization approach using static optimization; the continuous-time approach (calculus of variations) is discussed in the appendix and in the next lecture.
(Diagram: CT control problem → [discretization, step τ] → stage decision problem → optimal path and policy → [taking the limit τ → 0] → optimal path and policy solve the optimal control problem.)
Dynamic model
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
Cost function
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T))
Discretization with step τ such that hτ = T, at discretization times t_k = kτ:
    x_{k+1} = x_k + τ f(x_k, u_k),  x_k = x(kτ),  u_k = u(kτ)
    Σ_{k=0}^{h−1} g(x_k, u_k)τ + g_h(x_h),  g_h(x) = g_T(x), ∀x
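The discretization above can be sketched numerically. A minimal Python sketch (an added illustration, not part of the slides), where `f`, `g`, `gT` stand for the problem data:

```python
# Forward-Euler discretization of the optimal control problem:
# x_{k+1} = x_k + tau*f(x_k, u_k), cost = sum_k g(x_k, u_k)*tau + gT(x_h).
def discretized_cost(f, g, gT, x0, u_seq, tau):
    x, cost = x0, 0.0
    for u in u_seq:
        cost += g(x, u) * tau     # running cost of the current stage
        x = x + tau * f(x, u)     # Euler step of the dynamics
    return cost + gT(x)           # terminal cost at x_h

# Example: scalar system x' = -x (control has no effect), cost int_0^1 x(t)^2 dt;
# the exact value is (1 - e^{-2})/2 ~ 0.4323.
h, tau = 1000, 0.001
cost = discretized_cost(lambda x, u: -x, lambda x, u: x**2, lambda x: 0.0,
                        1.0, [0.0] * h, tau)
```

As τ → 0 the discretized cost converges to the continuous-time cost, which is exactly the limit taken in the slides that follow.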
The Lagrangian is given by
    L(x, u, λ) = Σ_{k=0}^{h−1} g(x_k, u_k)τ + g_h(x_h) + Σ_{k=0}^{h−1} λ_{k+1}ᵀ(x_k + τ f(x_k, u_k) − x_{k+1})
where x = (x_1, x_2, …, x_h), u = (u_0, u_1, …, u_{h−1}), λ = (λ_1, …, λ_h), λ_i ∈ Rⁿ.
Then, the optimal solution (optimal path) must satisfy
    ∂L(x, u, λ)/∂x_k = 0, k ∈ {1, …, h}
    ∂L(x, u, λ)/∂u_k = 0, k ∈ {0, …, h−1}
    ∂L(x, u, λ)/∂λ_k = 0, k ∈ {1, …, h}
For the problem
    x_{k+1} = f_k(x_k, u_k),  Σ_{k=0}^{h−1} g_k(x_k, u_k) + g_h(x_h):
Variables
    x_k = (x_{1,k}, x_{2,k}, …, x_{n,k}), x_{i,k} ∈ R
    u_k = (u_{1,k}, u_{2,k}, …, u_{m,k}), u_{i,k} ∈ R
    λ_k = (λ_{1,k}, λ_{2,k}, …, λ_{n,k}), λ_{i,k} ∈ R
Functions
    f_k : Rⁿ × Rᵐ → Rⁿ, f_k(x_k, u_k) = (f_{1,k}(x_k, u_k), …, f_{n,k}(x_k, u_k))
    g_k : Rⁿ × Rᵐ → R,  g_h : Rⁿ → R
Derivatives
    ∂g_h/∂x_h(x_h) = [∂g_h/∂x_{1,h}  ∂g_h/∂x_{2,h}  …  ∂g_h/∂x_{n,h}] (a row vector)
Derivatives
    ∂f_k/∂x_k(x_k, u_k) ∈ Rⁿˣⁿ, with entry (i, j) equal to ∂f_{i,k}/∂x_{j,k}(x_k, u_k)
    ∂f_k/∂u_k(x_k, u_k) ∈ Rⁿˣᵐ, with entry (i, j) equal to ∂f_{i,k}/∂u_{j,k}(x_k, u_k)
    ∂g_k/∂x_k(x_k, u_k) = [∂g_k/∂x_{1,k}  …  ∂g_k/∂x_{n,k}] ∈ R^{1×n}
    ∂g_k/∂u_k(x_k, u_k) = [∂g_k/∂u_{1,k}  …  ∂g_k/∂u_{m,k}] ∈ R^{1×m}
Writing out the stationarity conditions:
    ∂L(x, u, λ)/∂λ_k = 0, k ∈ {1, …, h}:
        x_k = x_{k−1} + τ f(x_{k−1}, u_{k−1}), i.e., (x_k − x_{k−1})/τ = f(x_{k−1}, u_{k−1})
    ∂L(x, u, λ)/∂u_k = 0, k ∈ {0, …, h−1}:
        (∂g/∂u_k)(x_k, u_k)τ + λ_{k+1}ᵀ(∂f/∂u_k)(x_k, u_k)τ = 0
    ∂L(x, u, λ)/∂x_k = 0, k ∈ {1, …, h−1}:
        (∂g/∂x_k)(x_k, u_k)τ + λ_{k+1}ᵀ(I + τ(∂f/∂x_k)(x_k, u_k)) − λ_kᵀ = 0
        ⟺ (∂g/∂x_k)(x_k, u_k) + λ_{k+1}ᵀ(∂f/∂x_k)(x_k, u_k) = −(λ_{k+1}ᵀ − λ_kᵀ)/τ
    ∂L(x, u, λ)/∂x_h = 0:
        (∂g_h/∂x_h)(x_h) − λ_hᵀ = 0
Let λ̄(t) = λ_k, t ∈ [kτ, (k+1)τ). Assuming that (wishful thinking…), as τ → 0, λ̄(t) converges to a continuously differentiable function, then
    (∂g/∂x_k)(x_k, u_k) + λ_{k+1}ᵀ(∂f/∂x_k)(x_k, u_k) = −(λ_{k+1}ᵀ − λ_kᵀ)/τ
becomes, as τ → 0,
    λ̄̇(t) = −(∂f/∂x)ᵀλ̄(t) − (∂g/∂x)ᵀ.
Moreover, naturally (x_k − x_{k−1})/τ = f(x_{k−1}, u_{k−1}) becomes ẋ(t) = f(x(t), u(t)) as τ → 0, and from
    (∂g/∂u_k)(x_k, u_k)τ + λ_{k+1}ᵀ(∂f/∂u_k)(x_k, u_k)τ = 0
we also have, as τ → 0,
    (∂g/∂u)(x(t), u(t)) + λ̄(t)ᵀ(∂f/∂u)(x(t), u(t)) = 0.
Finally, from (∂g_T/∂x_h)(x_h) − λ_hᵀ = 0, as τ → 0,
    λ̄(T) = (∂g_T/∂x(x(T)))ᵀ.
(no state or input constraints, free terminal state)
If (u*(t), x*(t)) is an optimal path for the continuous-time optimal control problem, then there exists a function λ(t), t ∈ [0, T], called the co-state, such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x̄0 (given)
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    λ(T) = (∂g_T/∂x(x*(T)))ᵀ (terminal constraint for the co-state)
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
The formal proof of this result is quite involved and uses arguments radically different from the intuitive arguments that we have used. Still, the discretization approach allows us to reason about the conditions appearing in the theorem.

Consider now the problem with a constrained terminal state:
    min_u ∫₀ᵀ g(x(t), u(t))dt
    ẋ(t) = f(x(t), u(t)), x(0) = x0, t ∈ [0, T]
    x(T) = x̄_f
Following the discretization + static optimization approach, we obtain the same necessary equations for optimality except the condition (∂g_T/∂x_h)(x_h) − λ_hᵀ = 0, which disappears since the terminal state is now fixed. In fact, the next result holds.
(no state or input constraints, constrained terminal state)
If (u*(t), x*(t)) is an optimal path for the continuous-time optimal control problem with terminal constraint x(T) = x̄_f, then there exists a function λ(t), t ∈ [0, T], such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x̄0 (given), x(T) = x̄_f
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
Note that, contrary to the previous case, there is no constraint on the terminal value of the co-state.
Consider a problem similar to a linear quadratic regulation problem for a scalar system, but where the additive control input enters through a nonlinear function ℓ:
    ẋ(t) = a x(t) + ℓ(u(t))
    min ½(∫₀ᵀ (q x(t)² + r u(t)²)dt + g_T x(T)²)
PMP equations
    ẋ(t) = a x(t) + ℓ(u(t))
    λ̇(t) = −a λ(t) − q x(t)
    r u(t) + λ(t) dℓ/du(u(t)) = 0
Boundary conditions
    x(0) = 1, λ(T) = g_T x(T)
If q = 0, r = 1, g_T = 1, ℓ(u) = −log(u), T = 1, a = −1, the equations become
    ẋ(t) = −x(t) − log(u(t))
    λ̇(t) = λ(t), with boundary conditions x(0) = 1, λ(1) = x(1),
so λ(t) = e^{t−1} x(1), and the control equation
    u(t) − λ(t)/u(t) = 0 ⟹ u(t) = e^{(t−1)/2} √x(1)
(*only the positive root makes sense). Replacing,
    ẋ(t) = −x(t) − (t−1)/2 − ½ log(x(1)).
If we integrate the state equation from zero to T = 1 (variation of constants formula) we can obtain the value of x(1):
    x(1) = e^{−1} x(0) + ∫₀¹ e^{−(1−s)}(−s/2 + ½(1 − log(x(1))))ds
which, with x(0) = 1 and evaluating the integrals, gives the implicit equation
    x(1) = ½e^{−1} + ½(1 − log(x(1)))(1 − 1/e)
whose numerical solution is x(1) ≈ 0.6407. Replacing in the formulas above we get the optimal path.
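As a numerical check (an added sketch, not part of the slides), the implicit equation obtained from the variation of constants formula, x(1) = ½e⁻¹ + ½(1 − log x(1))(1 − 1/e), can be solved by fixed-point iteration, since the right-hand side is a contraction near the solution:

```python
import math

# Solve x1 = 0.5*exp(-1) + 0.5*(1 - log(x1))*(1 - 1/e) by fixed-point iteration.
def rhs(x1):
    return 0.5 * math.exp(-1) + 0.5 * (1 - math.log(x1)) * (1 - 1 / math.e)

x1 = 1.0                 # initial guess
for _ in range(100):     # contraction factor ~0.5, so this converges fast
    x1 = rhs(x1)
```

The iterate settles at x(1) ≈ 0.6407, and the residual |rhs(x1) − x1| is at machine precision.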
Consider the third equation of the PMP,
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0,
which is sometimes called the control equation. Note that these are in fact m equations and, at least implicitly, we can write u as a function of x, λ:
    u = h(x, λ).
Replacing in the state and co-state equations we obtain 2n differential equations in x, λ:
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ, t ≥ 0
with a known initial condition x(0) but unknown λ(0). The latter equation is also known as the adjoint equation.
The terminal state may also be only partially constrained, x_i(T) = x̄_i for i ∈ C, in which case the co-states λ_i, i ∈ C, are free and the terminal conditions of the PMP hold only for the remaining co-state variables:
    λ_j(T) = ∂g_T/∂x_j(x(T)), j ∉ C.
An example using this fact will be discussed later.
In special cases (e.g. linear dynamics and quadratic cost) it is possible to solve these equations as a function of λ(0) and then pick λ(0) to satisfy the terminal conditions
    x(T) = x̄_f or λ(T) = (∂g_T/∂x(x*(T)))ᵀ.
In general one must resort to a numerical method (e.g. the shooting method discussed later, which actually tries to guess λ(0) and see if the terminal conditions are met).
(u*(t), x*(t)) is an optimal path candidate if there exists λ(t), t ∈ [0, T], such that
    State eq.:   ẋ*(t) = f(x*(t), u*(t))
    Adjoint eq.: λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ
    Control eq.: (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
The boundary conditions depend on the constraints on the terminal state:
    x*(0) = x̄0, λ(T) = (∂g_T/∂x(x*(T)))ᵀ  (no terminal state constraints)
    x*(0) = x̄0, x*(T) = x̄  (terminal state fully constrained)
    x*(0) = x̄0, x*_i(T) = x̄_i for i ∈ C, λ_j(T) = ∂g_T/∂x_j(x*(T)) for j ∉ C  (only some components of the state are constrained)
If we define the Hamiltonian
    H(x, u, λ) = g(x, u) + λᵀf(x, u)
the conditions of the PMP take the following elegant form:
    ẋ*(t) = (∂H/∂λ(x*(t), u*(t), λ(t)))ᵀ
    λ̇(t) = −(∂H/∂x(x*(t), u*(t), λ(t)))ᵀ
    ∂H/∂u(x*(t), u*(t), λ(t)) = 0
Moreover,
    d/dt H(x*(t), u*(t), λ(t)) = (∂H/∂x + λ̇(t)ᵀ)f(x*(t), u*(t)) + ∂H/∂u du/dt = 0
and therefore the Hamiltonian remains constant along optimal paths!
The condition ∂H/∂u(x*(t), u*(t), λ(t)) = 0 states that the function
    u(t) → H(x*(t), u(t), λ(t))
has a stationary point as a function of the control input when we fix the state and the co-state of the optimal path. When this stationary point is a minimum, this is often called the Pontryagin's minimum principle. In the original formulation the sign conventions are such that the stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used. Since the latter designation is more common we will use it in the course.
Dynamic model
    ẋ(t) = A x(t) + B u(t), x(0) = x0
Cost function
    ½(x(T)ᵀQ_T x(T) + ∫₀ᵀ (x(t)ᵀQ x(t) + 2 x(t)ᵀS u(t) + u(t)ᵀR u(t))dt)
PMP necessary conditions for optimality
    ẋ(t) = A x(t) + B u(t), x(0) = x0
    λ̇(t) = −Aᵀλ(t) − (Q x(t) + S u(t)), λ(T) = Q_T x(T)
    Bᵀλ(t) + Sᵀx(t) + R u(t) = 0 ⟹ u(t) = −R⁻¹(Bᵀλ(t) + Sᵀx(t))
Given a continuous-time optimal control problem with linear dynamic model and quadratic cost, replacing u(t) = −R⁻¹(Bᵀλ(t) + Sᵀx(t)) in the state and adjoint equations yields
    [ẋ(t); λ̇(t)] = H [x(t); λ(t)],
    H = [A − BR⁻¹Sᵀ, −BR⁻¹Bᵀ; −(Q − SR⁻¹Sᵀ), −(Aᵀ − SR⁻¹Bᵀ)]
so that [x(T); λ(T)] = e^{HT} [x(0); λ(0)], with x(0) known and one of the following:
    (i) x(T) known
    (ii) λ(T) = Q_T x(T)
    (iii) x_i(T), i ∈ C, known and λ_j(T) = (Q_T x(T))_j, j ∉ C
This leads to a linear system with 2n equations and 2n unknowns, which allows us to obtain λ(0).
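As an illustration of this linear-system solve (an added sketch, not from the slides), take the minimum-energy double integrator ẋ₁ = x₂, ẋ₂ = u with Q = 0, S = 0, R = 1, cost ½∫₀¹u²dt, x(0) = (0, 0) and x(T) = (1, 0). Here the Hamiltonian matrix is nilpotent, so e^{HT} is an exact finite series, and solving H₁₂λ(0) = x(T) is a 2×2 linear system:

```python
# Double integrator: A = [[0,1],[0,0]], B = [0,1], Q = 0, R = 1, T = 1.
# Hamiltonian matrix M = [[A, -B*B^T],[-Q, -A^T]] is nilpotent (M^4 = 0),
# so expm(M*T) = I + M*T + M^2*T^2/2 + M^3*T^3/6 exactly.
T = 1.0
M = [[0, 1,  0,  0],
     [0, 0,  0, -1],
     [0, 0,  0,  0],
     [0, 0, -1,  0]]
I4 = [[float(i == j) for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

Phi, term, coef = I4, I4, 1.0
for p in range(1, 4):                       # finite matrix-exponential series
    term = matmul(term, M)
    coef *= T / p
    Phi = [[Phi[i][j] + coef * term[i][j] for j in range(4)] for i in range(4)]

# x(T) = Phi11*x(0) + Phi12*lambda(0); x(0) = 0, so solve Phi12*lam0 = xf
a, b = Phi[0][2], Phi[0][3]
c, d = Phi[1][2], Phi[1][3]
xf = (1.0, 0.0)
det = a * d - b * c
lam0 = ((d * xf[0] - b * xf[1]) / det, (-c * xf[0] + a * xf[1]) / det)
u0 = -lam0[1]    # u(t) = -B^T*lambda(t), so u(0) = -lambda_2(0)
```

Here λ₂(t) = λ₂(0) − λ₁(0)t is linear, so u(t) = −λ₂(t) recovers the well-known minimum-energy control u(t) = 6 − 12t for this rest-to-rest transfer.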
How to charge the capacitor in an RC circuit with minimum energy loss in the resistor? With u the source voltage, x the capacitor voltage, and i the current through the resistor R,
    ẋ(t) = (1/RC)(u(t) − x(t))
    min_{u(t)} ∫₀ᵀ (x(t) − u(t))²/R dt
    x(T) = x_desired
Let us consider R = C = T = x_desired = 1 and x(0) = 0.
Hamiltonian
    H(x, u, λ) = (x − u)² + λ(−x + u)
PMP equations
    ∂H/∂u(x, u, λ) = 0: −2(x − u) + λ = 0 ⟹ u = x − λ/2
    λ̇ = −∂H/∂x = −2(x − u) + λ = 0 (by the control equation), so λ(t) = λ(0)
    ẋ = −x + u = −λ(0)/2 ⟹ x(t) = −(λ(0)/2)t + x(0), with x(0) = 0
Impose boundary conditions
    1 = x(1) = −λ(0)/2 ⟹ λ(0) = −2
Optimal solution (as derived in the previous lecture)
    u(t) = 1 + t, x(t) = t
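The optimal pair can be verified numerically; a quick Python sketch (an added check, not part of the slides), integrating ẋ = −x + u with the candidate control u(t) = 1 + t by forward Euler:

```python
# Integrate x' = -x + u(t) with the candidate optimal control u(t) = 1 + t;
# the PMP solution predicts x(t) = t, hence x(1) = 1.
steps, T = 10000, 1.0
tau = T / steps
x, t = 0.0, 0.0
for _ in range(steps):
    x += tau * (-x + (1.0 + t))   # Euler step with u(t) = 1 + t
    t += tau
```

Since the exact trajectory x(t) = t is linear, the Euler iterates track it essentially exactly and x reaches 1 at t = 1.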
How to move a crane from rest at point A to rest at point B in a fixed amount of time T with minimum energy?
    (x, θ, ẋ, θ̇)(0) = (x_A, 0, 0, 0), (x, θ, ẋ, θ̇)(T) = (x_B, 0, 0, 0)
    min ∫₀ᵀ u(t)²dt
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m)-m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 (m*l*b)/p -m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];  % last entry truncated in the source; m*l/p assumed (standard cart-pendulum model)
Qc = zeros(4,4); Rc = 1; n = 4; T = 1;
x0 = [0 0 0 0]'; xf = [1 0 0 0]';
% 1) Hamiltonian
H = expm([Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*T);
H11 = H(1:n,1:n); H12 = H(1:n,n+1:2*n);
H21 = H(n+1:2*n,1:n); H22 = H(n+1:2*n,n+1:2*n);
% 2) obtain lambda0 from xf = H11*x0 + H12*lambda0
lambda0 = H12\(xf-H11*x0);
% 3) obtain x, lambda, u at times k*tau
tau = 0.01; N = round(T/tau);
x = zeros(4,N+1); lambda = zeros(4,N+1); u = zeros(1,N+1);
for k = 1:N+1
    XL = expm((k-1)*[Ac -Bc*inv(Rc)*Bc'; -Qc -Ac']*tau)*[x0; lambda0];
    x(:,k) = XL(1:4); lambda(:,k) = XL(5:8);
    u(:,k) = -inv(Rc)*Bc'*lambda(:,k);
end
plot((0:N)*tau,u), xlabel('t'), ylabel('u'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(3,:)), xlabel('t'), ylabel('\theta'), grid on, set(gca,'Fontsize',16)
figure, plot((0:N)*tau,x(1,:)), xlabel('t'), ylabel('x'), grid on, set(gca,'Fontsize',16)
xA = 0, xB = 1, T = 1
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 1], produced by the code above.]
xA = 0, xB = 1, T = 2
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 2], produced by the code above.]
xA = 0, xB = 1, T = 10
[Figure: plots of u(t), θ(t), and x(t) for t ∈ [0, 10], produced by the code above.]
The PMP yields 2n differential equations
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ, t ≥ 0
with a known initial condition x(0) but unknown λ(0), and λ(0) must then be obtained by imposing the terminal boundary conditions
    x(T) = x̄_f or λ(T) = (∂g_T/∂x(x*(T)))ᵀ.
The class of problems for which we can do this analytically by solving the differential equations is small, and we in general need numerical methods. The shooting method is one of the simplest such methods and will allow us to solve some problems where n is small.
[14] John Betts, Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, SIAM, 2010
Main idea. Problem: find λ(0) such that x(T) = x̄_f (or another boundary condition) is met after integrating
    ẋ(t) = f(x(t), h(x(t), λ(t)))
    λ̇(t) = −(∂f/∂x(x(t), h(x(t), λ(t))))ᵀλ(t) − (∂g/∂x(x(t), h(x(t), λ(t))))ᵀ
with initial conditions x(0) and λ(0). Integrate the equations for several guesses λ⁽¹⁾(0), λ⁽²⁾(0), λ⁽³⁾(0), … of the initial co-state ("shots"), and from these shots pick the one that satisfies x(T) = x̄_f.
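A minimal Python sketch of this idea (added here, not part of the slides), applied to the RC-circuit example above, where the adjoint equation gives λ̇ = 0 and the control equation gives u = x − λ/2: guess λ(0), integrate, and bisect on the terminal miss x(T) − x̄_f:

```python
# Shooting for the RC example: x' = -x + u, u = x - lam/2, lam' = 0,
# boundary conditions x(0) = 0, x(T) = 1. Guess lam(0) and bisect.
def x_at_T(lam0, T=1.0, steps=1000):
    x, lam = 0.0, lam0
    tau = T / steps
    for _ in range(steps):
        u = x - lam / 2        # control equation u = h(x, lam)
        x += tau * (-x + u)    # state equation; adjoint: lam' = 0
    return x

def shoot(target=1.0, lo=-10.0, hi=10.0):
    # bisection on the miss distance x(T) - target
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (x_at_T(lo) - target) * (x_at_T(mid) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam0 = shoot()
```

The bisection converges to λ(0) = −2, the value obtained analytically for this problem.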
Example (solved next; a different solution for a similar problem is available in Bryson & Ho's book, Sec. 2.4). Consider a particle of mass m acted upon by a thrust force of magnitude m, described by the following equations, where the angle β(t) determines the thrust direction:
    u̇ = cos(β), v̇ = sin(β), ẋ = u, ẏ = v.
Suppose that the initial position of the particle at time t = 0 is (y(0), x(0)) = (0, 0) and the initial velocity is zero. We wish to transfer the particle to a path parallel to the x-axis, a distance h away, in a given time T, arriving with the maximum value of x(T). We do not care what the final velocity u(T) is. Suppose that h = 1 and T = 3.
Notation for PMP
    state x = (x, y, u, v), control u = β, ẋ = f(x, u), f(x, u) = (u, v, cos(β), sin(β))
Optimal control problem
    min −x(T), i.e., g_T(x(T)) = −x(T), g(x, u) = 0
Hamiltonian
    H(x, u, λ) = λ_x u + λ_y v + λ_u cos(β) + λ_v sin(β)
Co-state
    λ = (λ_x, λ_y, λ_u, λ_v)ᵀ
Terminal conditions (only two terminal states are specified - two additional constraints on the terminal co-states associated with the unrestricted states)
    y(T) = h, v(T) = 0
    λ_x(T) = ∂g_T/∂x(x(T)) = −1, λ_u(T) = ∂g_T/∂u(x(T)) = 0
Note that the control equation
    ∂H/∂u(x, u, λ) = 0: −λ_u sin(β) + λ_v cos(β) = 0 ⟹ β = arctan(λ_v/λ_u)
and the co-state equation λ̇(t) = −[∂H/∂x]ᵀ = −(∂H/∂x, ∂H/∂y, ∂H/∂u, ∂H/∂v)ᵀ give
    (λ̇_x(t), λ̇_y(t), λ̇_u(t), λ̇_v(t)) = (0, 0, −λ_x(t), −λ_y(t)).
Using cos(arctan(s)) = 1/√(1 + s²) and sin(arctan(s)) = s/√(1 + s²), the state equation becomes
    (ẋ(t), ẏ(t), u̇(t), v̇(t)) = (u(t), v(t), cos(arctan(λ_v(t)/λ_u(t))), sin(arctan(λ_v(t)/λ_u(t))))
        = (u(t), v(t), 1/√(1 + (λ_v(t)/λ_u(t))²), (λ_v(t)/λ_u(t))/√(1 + (λ_v(t)/λ_u(t))²)).
A direct application of the shooting method would lead us to search in a four-dimensional space (initial condition of the co-state). However, we already know that:
    λ̇_x(t) = 0 and λ_x(T) = −1, so λ_x(0) = −1;
    λ̇_u(t) = −λ_x(t) = 1 and λ_u(3) = 0, so λ_u(t) = t − 3 and λ_u(0) = −3;
    λ̇_y(t) = 0, so λ_y(t) = λ_y(0), and λ̇_v(t) = −λ_y(t), so λ_v(t) = λ_v(0) − λ_y(0)t.
Therefore we just need to search for λ_y(0), λ_v(0) to satisfy y(T) = h, v(T) = 0 after integrating
    v̇(t) = s(t)/√(1 + s(t)²), with s(t) = (λ_v(0) − λ_y(0)t)/(t − 3)
    ẏ(t) = v(t).
The shooting method thus boils down to finding λ_y(0), λ_v(0) such that v(T) = 0 and y(T) = 1 after integrating
    v̇(t) = s(t)/√(1 + s(t)²), with s(t) = (λ_v(0) − λ_y(0)t)/(t − 3)
    ẏ(t) = v(t)
from 0 to T = 3. One option is to grid the search space (λ_y(0), λ_v(0)).
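A rough Python sketch of the grid search (an added illustration, not part of the slides). It evaluates the stationarity condition on the branch sin β = λ_v/√(λ_u² + λ_v²), cos β = λ_u/√(λ_u² + λ_v²), which is the branch consistent with the trajectories shown in the slides, and scores each guess by its terminal miss:

```python
import math

# Integrate v' = lv/sqrt(lu^2 + lv^2), y' = v from t = 0 to T = 3,
# with lu(t) = t - 3 and lv(t) = lv0 - ly0*t, and return the terminal miss
# |y(3) - 1| + |v(3)| for a guess (ly0, lv0) of the unknown initial co-states.
def miss(ly0, lv0, T=3.0, steps=3000):
    tau = T / steps
    y = v = t = 0.0
    for _ in range(steps):
        lu = t - 3.0
        lv = lv0 - ly0 * t
        v += tau * lv / math.sqrt(lu * lu + lv * lv)
        y += tau * v
        t += tau
    return abs(y - 1.0) + abs(v)

# Coarse grid over the search space (ly0, lv0); pick the best shot.
grid = [(i / 10, j / 10) for i in range(0, 21) for j in range(0, 21)]
best = min(grid, key=lambda p: miss(*p))
```

A coarse grid like this localizes the solution, which can then be refined with a finer grid or a root-finding method.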
Wrong values of λ_y(0), λ_v(0) (terminal conditions not met): λ_y(0) = 1, λ_v(0) = 1.
[Figure: plots of y(t) and v(t) for t ∈ [0, 3].]
Right values of λ_y(0), λ_v(0) (terminal conditions met): λ_y(0) = 0.62, λ_v(0) = 1.22.
[Figure: plots of y(t) and v(t) for t ∈ [0, 3].]
Summary. After this lecture you should be able to:
- derive the PMP from static optimization via discretization;
- apply the PMP to problems with free, fully constrained, or partially constrained terminal states;
- solve linear-quadratic problems with terminal constraints;
- apply the shooting method.
Appendix
In the lecture we followed the discretization approach to the PMP. That approach was not mathematically sound. Here we follow the continuous-time approach (calculus of variations). This approach is mathematically sound, although here we only illustrate it and do not go into depth.
(Diagram: a CT control problem can be tackled directly, yielding the CT PMP and CT DP, or discretized with step τ into a stage decision problem, yielding the DT PMP and DT DP; taking the limit τ → 0 recovers the optimal path and policy of the CT control problem.)
Control system
    ẋ(t) = f(x(t), u(t)), x(0) = x0, with x(t) ∈ Rⁿ and u ∈ U ⊂ Rᵐ for all t ≥ 0
Cost functional to be minimized
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T)), with x(T) free
Assumptions
    f, g, ∂f/∂x, ∂g/∂x are continuous;
    f is Lipschitz in x: ∃M such that |f(x1, u) − f(x2, u)| ≤ M|x1 − x2| for all (x1, u), (x2, u) ∈ D ⊂ Rⁿ × U.
Why is this problem hard? We are minimizing over functions, and defining a local minimum requires comparing one function with another, which is not easy in infinite-dimensional spaces (it depends on the norm!):
    u* ∈ V is a local minimum of f : V → R if there exists ε > 0 such that for all u ∈ V such that ‖u − u*‖ < ε we have f(u*) ≤ f(u)
(before, V ≡ Rⁿ; now V is a space of functions on [a, b]). For example, two functions u, u* can be close in the 2-norm
    ‖u* − u‖₂ = √(∫ₐᵇ |u(t) − u*(t)|²dt)
but not close in the 0-norm (sup-norm)
    ‖u* − u‖₀ = max_{t∈[a,b]} |u(t) − u*(t)|.
*Think about a piecewise constant approximation with increasingly many bins, or of the Fourier series approximation, which requires an infinite number of parameters.
δJ|y : V → R is the first variation of J at y if for all η and all α we have
    J(y + αη) = J(y) + δJ|y(η)α + o(α).
(A stronger notion requires J(y + η) = J(y) + δJ|y(η) + o(‖η‖).)
[Figure: sketch on [a, b] of y, y + η, and the first-order approximation J(y) + δJ|y(η) of J(y + η).]
All we need to know is that if y is a (local) minimum then δJ|y(η) = 0, where η is determined by the constraints in the problem (it is a free function in [a, b] if there are no constraints). For costs of the form J(y) = ∫ₐᵇ L(y(t))dt,
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(y(t)) η(t) dt.
Calculus of variations is a branch of mathematics addressing the following problem: among all differentiable functions y : [a, b] → R satisfying y(a) = y0, y(b) = y1, find minima of
    J(y) := ∫ₐᵇ L(x, y(x), y′(x))dx.
For calculus of variations we will consider only functions taking values in R, although we could also consider y : [a, b] → Rⁿ.
Example (brachistochrone): find a path between two points in a vertical plane such that a particle sliding without friction along this path takes the shortest possible time to travel from one point to the other. The initial kinetic and potential energy is zero, so conservation of energy gives
    m v²/2 = m g y, with g = 9.8,
determining the velocity as a function of y. The total time is then the integral of the arc-length over the velocity:
    ∫ₐᵇ √(1 + (y′(x))²) / √(2 g y(x)) dx.
Calculus of variations: among all differentiable functions y : [a, b] → R satisfying y(a) = y0, y(b) = y1, find minima of
    J(y) := ∫ₐᵇ L(x, y(x), y′(x))dx.
Optimal control formulation: let u = y′, t = x, x = y. Find a continuous function u that solves the optimal control problem
    ẏ(t) = u(t), J(u) = ∫ₐᵇ L(t, y(t), u(t))dt.
Remarks: L is continuous and twice continuously differentiable with respect to u, y. These smoothness assumptions will be used first for calculus of variations and later for the more general optimal control problem.
Consider the class of perturbed trajectories y(t) + αη(t) such that η(a) = η(b) = 0. The first variation can be concluded by expanding the cost as a first-order Taylor series w.r.t. α (with u(x) = y′(x)):
    J(y + αη) = ∫ₐᵇ L(x, y(x) + αη(x), y′(x) + αη′(x))dx
              = ∫ₐᵇ L(x, y(x), y′(x)) + (∂L/∂y)(x, y(x), y′(x))αη(x) + (∂L/∂u)(x, y(x), y′(x))αη′(x) dx
from which we conclude that the first variation is
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(x, y(x), y′(x))η(x) + (∂L/∂u)(x, y(x), y′(x))η′(x) dx.
Integrating the last term in the integral by parts (with u(x) = y′(x)):
    δJ|y(η) = ∫ₐᵇ (∂L/∂y)(x, y(x), y′(x))η(x) − (d/dx (∂L/∂u)(x, y(x), y′(x)))η(x) dx + [(∂L/∂u)(x, y(x), y′(x))η(x)]ₐᵇ.
The last term is zero due to η(a) = η(b) = 0. Setting the first variation to zero we get
    ∫ₐᵇ ((∂L/∂y)(x, y(x), y′(x)) − d/dx (∂L/∂u)(x, y(x), y′(x)))η(x)dx = 0,
and for this equation to hold for every disturbance η we must have
    (∂L/∂y)(x, y(x), y′(x)) = d/dx (∂L/∂u)(x, y(x), y′(x))   (Euler–Lagrange equation)
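As a quick sanity check of the Euler–Lagrange equation (an added example, not in the slides), take the arc-length functional, whose minimizers should be straight lines:

```latex
% Shortest path: L(x, y, u) = \sqrt{1 + u^2}, with u = y'.
% Euler--Lagrange: \partial L/\partial y = \frac{d}{dx}\,\partial L/\partial u.
\[
\frac{\partial L}{\partial y} = 0
\qquad\Longrightarrow\qquad
\frac{d}{dx}\,\frac{\partial L}{\partial u}
  = \frac{d}{dx}\,\frac{y'(x)}{\sqrt{1+(y'(x))^2}} = 0 ,
\]
\[
\text{so } \frac{y'}{\sqrt{1+(y')^2}} \text{ is constant, hence } y' \text{ is constant:}
\qquad y(x) = y_0 + \frac{y_1 - y_0}{b - a}\,(x - a).
\]
```

The boundary conditions y(a) = y0, y(b) = y1 then fix the line uniquely.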
For calculus of variations u(x) = y′(x), so a perturbation δ(x) of u(x) implies the perturbation ∫ₐˣ δ(s)ds of y(x). For the optimal control problem
    ∫₀ᵀ g(x(t), u(t))dt + g_T(x(T)), ẋ(t) = f(x(t), u(t)), x(0) = x0
the perturbations x = x* + αη, u = u* + αξ not only must be given for multi-dimensional functions η, ξ, but must be compatible with the model (see Liberzon's book, page 91), and therefore we need to introduce the co-state ("Lagrange multipliers"). The derivation is given next, using similar arguments to the ones used for calculus of variations.
Let us rewrite the cost as
    J(u) = ∫₀ᵀ (g(x(t), u(t)) + λ(t)ᵀ(f(x(t), u(t)) − ẋ(t)))dt + g_T(x(T))
         = ∫₀ᵀ (H(x(t), u(t), λ(t)) − λ(t)ᵀẋ(t))dt + g_T(x(T))
where λ(t) ∈ Rⁿ is the co-state and
    H(x(t), u(t), λ(t)) = g(x(t), u(t)) + λ(t)ᵀf(x(t), u(t))
is an important function called the Hamiltonian. For x = x* + αη, u = u* + αξ, the first variation (using again integration by parts) is (see Liberzon's book)
    δJ|u*(ξ) = ∫₀ᵀ (ηᵀ(λ̇ + [∂H/∂x(x*, u*, λ)]ᵀ) + ξᵀ[∂H/∂u(x*, u*, λ)]ᵀ)dt + η(T)ᵀ([∂g_T/∂x(x*(T))]ᵀ − λ(T)).
Setting the first variation to zero, we conclude that it can only be zero if
    λ̇ = −(∂H/∂x(x*, u*, λ))ᵀ, λ(T) = [∂g_T/∂x(x*(T))]ᵀ
    ∂H/∂u(x*, u*, λ) = 0
and the dynamic equation can be written as
    ẋ* = (∂H/∂λ(x*, u*, λ))ᵀ
where H(x(t), u(t), λ(t)) = g(x(t), u(t)) + λ(t)ᵀf(x(t), u(t)). This is the PMP that we saw before!
Let (u*(t), x*(t)) be an optimal path for the continuous-time optimal control problem. Then there exists a function λ(t), t ∈ [0, T], called the co-state, such that
    ẋ*(t) = f(x*(t), u*(t)), x(0) = x0 (given)
    λ̇(t) = −(∂f/∂x(x*(t), u*(t)))ᵀλ(t) − (∂g/∂x(x*(t), u*(t)))ᵀ, λ(T) = (∂g_T/∂x(x*(T)))ᵀ
    (∂f/∂u(x*(t), u*(t)))ᵀλ(t) + (∂g/∂u(x*(t), u*(t)))ᵀ = 0
Moreover,
    d/dt H(x*(t), u*(t), λ(t)) = (∂H/∂x + λ̇(t)ᵀ)f(x*(t), u*(t)) + ∂H/∂u du/dt = 0
and therefore the Hamiltonian remains constant along optimal paths. When the stationary point of u(t) → H(x*(t), u(t), λ(t)) defined by ∂H/∂u(x*(t), u*(t), λ(t)) = 0 is a minimum, this is often called the Pontryagin's minimum principle; in the original formulation the stationary point would be a maximum, and therefore in some literature the nomenclature Pontryagin's maximum principle is used. The full version of the PMP allows for more general terminal conditions (the time is not necessarily fixed, and the terminal state can be fixed) and holds under milder conditions (e.g. the actuation may be discontinuous).