  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

  2. Outline

• Static optimization approach to PMP
• Linear systems, quadratic cost, terminal constraints
• Shooting method

  3. Recap: continuous-time optimal control problem

Dynamic model
$$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T]$$

Cost function
$$\int_0^T g(x(t), u(t))\, dt + g_T(x(T))$$

The goal in this lecture is to find an optimal path $(u(t), x(t))$ using a new tool: Pontryagin's maximum principle.

  4. Recap: continuous-time approach vs. discretization approach

Continuous-time approach: solve the CT optimal control problem directly, obtaining an optimal path and policy.

Discretization approach: discretize the CT problem with step $\tau$ to obtain a stage decision problem, solve that problem for an optimal path and policy, and then take the limit $\tau \to 0$.

• Today we will informally derive a simple version of Pontryagin's maximum principle via the discretization approach, using static optimization.
• The direct approach (in continuous time) is much more elaborate; it is briefly discussed in the appendix (calculus of variations) and in the next lecture.

  5. Recall discretization

Discretization times $t_k = k\tau$, with discretization step $\tau$ chosen so that $h\tau = T$.

Dynamic model
$$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T]$$
is discretized as
$$x_{k+1} = x_k + \tau f(x_k, u_k), \quad x_k = x(k\tau), \quad u_k = u(k\tau)$$

Cost function
$$\int_0^T g(x(t), u(t))\, dt + g_T(x(T))$$
is discretized as
$$\sum_{k=0}^{h-1} g(x_k, u_k)\, \tau + g_h(x_h), \quad g_h(x) = g_T(x), \ \forall x$$
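The Euler discretization above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical data (the scalar choices $f(x,u) = -x + u$, $g(x,u) = x^2 + u^2$, $g_T = 0$ are mine, not from the slides):

```python
import math

# Hypothetical scalar instance (not from the slides): f(x,u) = -x + u,
# g(x,u) = x^2 + u^2, g_T(x) = 0.
def discretize_and_cost(f, g, gT, x0, u_seq, tau):
    """Roll out x_{k+1} = x_k + tau*f(x_k, u_k) and accumulate the Riemann sum
    of the running cost, plus the terminal cost g_h(x_h) = g_T(x_h)."""
    x, cost = x0, 0.0
    for u in u_seq:
        cost += g(x, u) * tau
        x = x + tau * f(x, u)
    return x, cost + gT(x)

T, tau = 1.0, 0.001
h = int(round(T / tau))                  # so that h * tau = T
xh, J = discretize_and_cost(lambda x, u: -x + u,
                            lambda x, u: x**2 + u**2,
                            lambda x: 0.0,
                            1.0, [0.0] * h, tau)
# with u = 0 the rollout approximates x(t) = e^{-t}, and the cost
# approximates the integral of e^{-2t} over [0, 1]
```

As $\tau$ shrinks, the rollout and the Riemann sum converge to the continuous-time trajectory and cost.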

  6. Method of Lagrange multipliers

The Lagrangian is given by
$$L(x, u, \lambda) = \sum_{k=0}^{h-1} g(x_k, u_k)\, \tau + g_h(x_h) + \sum_{k=0}^{h-1} \lambda_{k+1}^\top \left( x_k + \tau f(x_k, u_k) - x_{k+1} \right)$$
where $\lambda_i \in \mathbb{R}^n$ and
$$x = (x_1, x_2, \ldots, x_h), \quad \lambda = (\lambda_1, \ldots, \lambda_h), \quad u = (u_0, u_1, \ldots, u_{h-1})$$

Then the optimal solution (optimal path) must satisfy
$$\frac{\partial L(x,u,\lambda)}{\partial x_k} = 0, \quad k \in \{1, \ldots, h\}$$
$$\frac{\partial L(x,u,\lambda)}{\partial u_k} = 0, \quad k \in \{0, \ldots, h-1\}$$
$$\frac{\partial L(x,u,\lambda)}{\partial \lambda_k} = 0, \quad k \in \{1, \ldots, h\}$$
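The stationarity conditions of this Lagrangian can be checked symbolically on a tiny instance. The sketch below (assuming SymPy is available) uses $h = 2$ stages with scalar state and input, and the hypothetical choices $f(x,u) = ax + bu$, $g(x,u) = x^2 + u^2$, $g_h(x) = x^2$, none of which come from the slides:

```python
import sympy as sp

# Tiny instance: h = 2 stages, scalar state/input, with hypothetical
# f(x,u) = a*x + b*u, g(x,u) = x**2 + u**2, g_h(x) = x**2 (illustration only).
tau, a, b = sp.symbols('tau a b')
x0, x1, x2 = sp.symbols('x0 x1 x2')
u0, u1 = sp.symbols('u0 u1')
l1, l2 = sp.symbols('l1 l2')           # multipliers lambda_1, lambda_2

f = lambda x, u: a * x + b * u
g = lambda x, u: x**2 + u**2

# Lagrangian: running cost + terminal cost + multipliers times the dynamics
L = (g(x0, u0) * tau + g(x1, u1) * tau + x2**2
     + l1 * (x0 + tau * f(x0, u0) - x1)
     + l2 * (x1 + tau * f(x1, u1) - x2))

cond_x1 = sp.diff(L, x1)   # expect 2*x1*tau + l2*(1 + a*tau) - l1
cond_x2 = sp.diff(L, x2)   # expect 2*x2 - l2 (terminal boundary condition)
```

The two derivatives reproduce, in miniature, the costate recursion and the terminal condition derived on the next slides.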

  7. Recall dimensions

For the problem $x_{k+1} = f_k(x_k, u_k)$ with cost $\sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$:

Variables
$$x_k = \begin{bmatrix} x_{1,k} \\ x_{2,k} \\ \vdots \\ x_{n,k} \end{bmatrix}, \quad u_k = \begin{bmatrix} u_{1,k} \\ u_{2,k} \\ \vdots \\ u_{m,k} \end{bmatrix}, \quad \lambda_k = \begin{bmatrix} \lambda_{1,k} \\ \lambda_{2,k} \\ \vdots \\ \lambda_{n,k} \end{bmatrix}, \qquad x_{i,k} \in \mathbb{R}, \ u_{i,k} \in \mathbb{R}, \ \lambda_{i,k} \in \mathbb{R}$$

Functions
$$g_k : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}, \quad g_h : \mathbb{R}^n \to \mathbb{R}, \quad f_k : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n, \quad f_k(x_k, u_k) = \begin{bmatrix} f_{1,k}(x_k, u_k) \\ f_{2,k}(x_k, u_k) \\ \vdots \\ f_{n,k}(x_k, u_k) \end{bmatrix}$$

Derivatives
$$\frac{\partial}{\partial x_h} g_h(x_h) = \begin{bmatrix} \frac{\partial}{\partial x_{1,h}} g_h(x_h) & \frac{\partial}{\partial x_{2,h}} g_h(x_h) & \cdots & \frac{\partial}{\partial x_{n,h}} g_h(x_h) \end{bmatrix}$$

  8. Recall dimensions

Derivatives (all partial derivatives evaluated at $(x_k, u_k)$):
$$\frac{\partial}{\partial x_k} f_k(x_k, u_k) = \begin{bmatrix} \frac{\partial f_{1,k}}{\partial x_{1,k}} & \frac{\partial f_{1,k}}{\partial x_{2,k}} & \cdots & \frac{\partial f_{1,k}}{\partial x_{n,k}} \\ \frac{\partial f_{2,k}}{\partial x_{1,k}} & \frac{\partial f_{2,k}}{\partial x_{2,k}} & \cdots & \frac{\partial f_{2,k}}{\partial x_{n,k}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{n,k}}{\partial x_{1,k}} & \frac{\partial f_{n,k}}{\partial x_{2,k}} & \cdots & \frac{\partial f_{n,k}}{\partial x_{n,k}} \end{bmatrix} \in \mathbb{R}^{n \times n}$$
$$\frac{\partial}{\partial u_k} f_k(x_k, u_k) = \begin{bmatrix} \frac{\partial f_{1,k}}{\partial u_{1,k}} & \frac{\partial f_{1,k}}{\partial u_{2,k}} & \cdots & \frac{\partial f_{1,k}}{\partial u_{m,k}} \\ \frac{\partial f_{2,k}}{\partial u_{1,k}} & \frac{\partial f_{2,k}}{\partial u_{2,k}} & \cdots & \frac{\partial f_{2,k}}{\partial u_{m,k}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{n,k}}{\partial u_{1,k}} & \frac{\partial f_{n,k}}{\partial u_{2,k}} & \cdots & \frac{\partial f_{n,k}}{\partial u_{m,k}} \end{bmatrix} \in \mathbb{R}^{n \times m}$$
$$\frac{\partial}{\partial x_k} g_k(x_k, u_k) = \begin{bmatrix} \frac{\partial g_k}{\partial x_{1,k}} & \frac{\partial g_k}{\partial x_{2,k}} & \cdots & \frac{\partial g_k}{\partial x_{n,k}} \end{bmatrix} \in \mathbb{R}^{1 \times n}, \quad \frac{\partial}{\partial u_k} g_k(x_k, u_k) = \begin{bmatrix} \frac{\partial g_k}{\partial u_{1,k}} & \frac{\partial g_k}{\partial u_{2,k}} & \cdots & \frac{\partial g_k}{\partial u_{m,k}} \end{bmatrix} \in \mathbb{R}^{1 \times m}$$

  9. Optimality conditions

From $\frac{\partial L(x,u,\lambda)}{\partial x_k} = 0$, $k \in \{1, \ldots, h-1\}$:
$$\frac{\partial}{\partial x_k} g(x_k, u_k)\, \tau + \lambda_{k+1}^\top \left( I + \frac{\partial}{\partial x_k} f(x_k, u_k)\, \tau \right) - \lambda_k^\top = 0$$
which can be rearranged as
$$\frac{\lambda_{k+1}^\top - \lambda_k^\top}{\tau} = -\left( \frac{\partial}{\partial x_k} g(x_k, u_k) + \lambda_{k+1}^\top \frac{\partial}{\partial x_k} f(x_k, u_k) \right)$$

From $\frac{\partial L(x,u,\lambda)}{\partial x_h} = 0$:
$$\frac{\partial}{\partial x_h} g_h(x_h) - \lambda_h^\top = 0$$

From $\frac{\partial L(x,u,\lambda)}{\partial u_k} = 0$, $k \in \{0, \ldots, h-1\}$:
$$\frac{\partial}{\partial u_k} g(x_k, u_k)\, \tau + \lambda_{k+1}^\top \frac{\partial}{\partial u_k} f(x_k, u_k)\, \tau = 0$$

From $\frac{\partial L(x,u,\lambda)}{\partial \lambda_k} = 0$, $k \in \{1, \ldots, h\}$:
$$x_k = x_{k-1} + \tau f(x_{k-1}, u_{k-1}), \quad \text{i.e.,} \quad \frac{x_k - x_{k-1}}{\tau} = f(x_{k-1}, u_{k-1})$$
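One way to see these conditions in action: the $\partial L / \partial x_k$ equation is a backward (adjoint) recursion for $\lambda$, and the $\partial L / \partial u_k$ expression is the gradient of the discretized cost with respect to $u_k$. A sketch under hypothetical scalar data (my own choices, not the slides'), validating the adjoint gradient against a finite difference:

```python
# Hypothetical scalar data: f(x,u) = a*x + u, g(x,u) = 0.5*(x^2 + u^2),
# g_h(x) = 0.5*x^2 (chosen only so every derivative is easy to write down).
a, tau, h = -1.0, 0.01, 100

def rollout(u_seq, x0=1.0):
    """Forward pass: x_{k+1} = x_k + tau * f(x_k, u_k)."""
    xs = [x0]
    for u in u_seq:
        xs.append(xs[-1] + tau * (a * xs[-1] + u))
    return xs

def cost(u_seq, x0=1.0):
    xs = rollout(u_seq, x0)
    return sum(0.5 * (x**2 + u**2) * tau
               for x, u in zip(xs[:-1], u_seq)) + 0.5 * xs[-1]**2

def gradient(u_seq, x0=1.0):
    """Backward pass: lambda_h = dg_h/dx(x_h); then
    lambda_k = lambda_{k+1}*(1 + tau*df/dx) + tau*dg/dx, and the reduced
    gradient is dJ/du_k = tau*(dg/du + lambda_{k+1}*df/du)."""
    xs = rollout(u_seq, x0)
    lam = xs[-1]                                   # dg_h/dx at x_h
    grads = [0.0] * h
    for k in reversed(range(h)):
        grads[k] = tau * (u_seq[k] + lam * 1.0)    # dg/du = u, df/du = 1
        lam = lam * (1 + tau * a) + tau * xs[k]    # costate recursion
    return grads

u = [0.1] * h
g_adj = gradient(u)
eps = 1e-6
u_pert = list(u); u_pert[5] += eps
g_fd = (cost(u_pert) - cost(u)) / eps              # finite-difference check
```

The adjoint and finite-difference gradients agree, which is exactly what the stationarity conditions assert.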

  10. Taking the limit $\tau \to 0$

Let $\bar{\lambda}(t) = \lambda_k$, $t \in [k\tau, (k+1)\tau)$. Assuming that (wishful thinking...), as $\tau \to 0$, $\bar{\lambda}(t)$ converges to a continuously differentiable function, then
$$\frac{\lambda_{k+1}^\top - \lambda_k^\top}{\tau} = -\left( \frac{\partial}{\partial x_k} g(x_k, u_k) + \lambda_{k+1}^\top \frac{\partial}{\partial x_k} f(x_k, u_k) \right) \ \xrightarrow{\ \tau \to 0\ }\ \dot{\bar{\lambda}}(t) = -\left( \frac{\partial f}{\partial x} \right)^{\!\top} \bar{\lambda}(t) - \left( \frac{\partial g}{\partial x} \right)^{\!\top}$$

Moreover, naturally,
$$\frac{x_k - x_{k-1}}{\tau} = f(x_{k-1}, u_{k-1}) \ \xrightarrow{\ \tau \to 0\ }\ \dot{x}(t) = f(x(t), u(t))$$

and we also have
$$\frac{\partial}{\partial u_k} g(x_k, u_k)\, \tau + \lambda_{k+1}^\top \frac{\partial}{\partial u_k} f(x_k, u_k)\, \tau = 0 \ \xrightarrow{\ \tau \to 0\ }\ \frac{\partial}{\partial u} g(x(t), u(t)) + \bar{\lambda}(t)^\top \frac{\partial}{\partial u} f(x(t), u(t)) = 0$$
$$\frac{\partial}{\partial x_h} g_T(x_h) - \lambda_h^\top = 0 \ \xrightarrow{\ \tau \to 0\ }\ \bar{\lambda}(T) = \left( \frac{\partial}{\partial x} g_T(x(T)) \right)^{\!\top}$$
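The claimed limit can be illustrated numerically on a case solvable by hand. The data below are hypothetical (mine, not the slides'): $\partial f/\partial x = a = -1$, $\partial g/\partial x = 0$, $T = 1$, $\lambda(T) = 1$, so the costate ODE reads $\dot{\lambda} = \lambda$ and gives $\lambda(0) = e^{-1}$:

```python
import math

# As tau -> 0, the recursion lambda_k = lambda_{k+1}*(1 + tau*df/dx) + tau*dg/dx
# approaches the costate ODE lambda_dot = -(df/dx)*lambda - dg/dx.
# Hand-solvable case: df/dx = a = -1, dg/dx = 0, T = 1, lambda(T) = 1,
# so the ODE is lambda_dot = lambda and lambda(0) = e^{-1}.
def discrete_costate_at_0(tau, a=-1.0, T=1.0, lam_T=1.0):
    h = int(round(T / tau))
    lam = lam_T
    for _ in range(h):
        lam = lam * (1 + tau * a)     # backward sweep; dg/dx = 0 here
    return lam

coarse = discrete_costate_at_0(0.1)
fine = discrete_costate_at_0(0.0001)
exact = math.exp(-1.0)
```

Refining the step shrinks the gap to the continuous-time costate, consistent with the informal limit argument.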

  11. Pontryagin's maximum principle (no state and no input constraints, free terminal state)

If $(u^*(t), x^*(t))$ is an optimal path for the continuous-time optimal control problem, then there exists a function $\lambda(t)$, $t \in [0, T]$, called the co-state, such that
$$\dot{x}^*(t) = f(x^*(t), u^*(t)), \quad x(0) = \bar{x}_0 \ \text{(given)}$$
$$\dot{\lambda}(t) = -\left( \frac{\partial}{\partial x} f(x^*(t), u^*(t)) \right)^{\!\top} \lambda(t) - \left( \frac{\partial}{\partial x} g(x^*(t), u^*(t)) \right)^{\!\top}$$
$$\lambda(T) = \left( \frac{\partial}{\partial x} g_T(x^*(T)) \right)^{\!\top} \quad \text{(terminal constraint for the co-state)}$$
$$\left( \frac{\partial}{\partial u} g(x^*(t), u^*(t)) \right)^{\!\top} + \left( \frac{\partial}{\partial u} f(x^*(t), u^*(t)) \right)^{\!\top} \lambda(t) = 0$$

  12. Discussion

• The previous result is a special case of Pontryagin's maximum principle.
• The formal proof of Pontryagin's maximum principle is very elaborate and uses arguments radically different from the intuitive arguments that we have used.
• However, the intuition provided by static optimization is very useful for reasoning about the conditions appearing in the theorem.
• For example, consider the following problem:
$$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T], \quad x(T) = \bar{x}_f$$
$$\min_u \ \int_0^T g(x(t), u(t))\, dt$$

Following the discretization + static optimization approach, we obtain the same necessary conditions for optimality, except
$$\frac{\partial}{\partial x_h} g_T(x_h) - \lambda_h^\top = 0$$
which no longer applies, since the terminal state is now fixed. In fact, the next result holds.

  13. Pontryagin's maximum principle (no state and no input constraints, constrained terminal state)

If $(u^*(t), x^*(t))$ is an optimal path for the continuous-time optimal control problem with terminal constraint $x(T) = \bar{x}_f$, then there exists a function $\lambda(t)$, $t \in [0, T]$, such that
$$\dot{x}^*(t) = f(x^*(t), u^*(t)), \quad x(0) = \bar{x}_0 \ \text{(given)}, \quad x(T) = \bar{x}_f$$
$$\dot{\lambda}(t) = -\left( \frac{\partial}{\partial x} f(x^*(t), u^*(t)) \right)^{\!\top} \lambda(t) - \left( \frac{\partial}{\partial x} g(x^*(t), u^*(t)) \right)^{\!\top}$$
$$\left( \frac{\partial}{\partial u} g(x^*(t), u^*(t)) \right)^{\!\top} + \left( \frac{\partial}{\partial u} f(x^*(t), u^*(t)) \right)^{\!\top} \lambda(t) = 0$$

Note that, contrary to the previous case, there is no constraint on the terminal value of the co-state.
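The theorem leaves a two-point boundary value problem: the state is pinned at both ends while the co-state is free. This is what the shooting method (mentioned in the outline) addresses: guess $\lambda(0)$, integrate state and co-state forward, and adjust $\lambda(0)$ until $x(T) = \bar{x}_f$. A sketch on a hypothetical scalar instance of my choosing ($\dot{x} = -x + u$, $g = u^2/2$, $x(0) = 1$, $x(1) = 0.5$), where PMP gives $\dot{\lambda} = \lambda$ and $u = -\lambda$, and the answer is known in closed form:

```python
import math

# Shooting sketch for the terminal-constraint PMP system; hypothetical
# instance: xdot = -x + u, g = u^2/2, x(0) = 1, x(1) = 0.5. PMP gives
# lambda_dot = lambda and u = -lambda, so the only unknown is lambda(0).
a, x0, xf, T, n = -1.0, 1.0, 0.5, 1.0, 2000

def terminal_miss(lam0):
    """Integrate state/costate forward with Euler; return x(T) - xf."""
    dt = T / n
    x, lam = x0, lam0
    for _ in range(n):
        u = -lam                      # stationarity: dg/du + lam*df/du = 0
        x += dt * (a * x + u)
        lam += dt * (-a * lam)        # costate ODE; dg/dx = 0 here
    return x - xf

# secant iteration on the scalar shooting equation terminal_miss(lam0) = 0
l0, l1 = 0.0, 1.0
for _ in range(50):
    f0, f1 = terminal_miss(l0), terminal_miss(l1)
    if abs(f1) < 1e-10:
        break
    l0, l1 = l1, l1 - f1 * (l1 - l0) / (f1 - f0)

# closed form for this linear instance: lambda(0) = (e^{-1}*x0 - xf)/sinh(1)
lam_exact = (math.exp(-1.0) * x0 - xf) / math.sinh(1.0)
```

Because this instance is linear in $\lambda(0)$, the secant update lands on the root of the discretized problem in a single step; nonlinear problems need more iterations and a good initial guess.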

  14. Example

Consider a problem similar to a linear quadratic regulation problem for a scalar system, but where the additive control input enters through a nonlinear function $\ell$:
$$\dot{x}(t) = a x(t) + \ell(u(t)), \quad x(0) = 1$$
$$\min \ \tfrac{1}{2} \left( \int_0^T q x(t)^2 + r u(t)^2\, dt + g_T\, x(T)^2 \right)$$

PMP equations
$$\dot{x}(t) = a x(t) + \ell(u(t))$$
$$\dot{\lambda}(t) = -a \lambda(t) - q x(t)$$
$$r u(t) + \lambda(t) \frac{d\ell}{du}(u(t)) = 0$$

Boundary conditions
$$x(0) = 1, \quad \lambda(T) = g_T\, x(T)$$

  15. Example

If $q = 0$, $g_T = 1$, $a = -1$, $\ell(u) = -\log(u)$, $r = 1$, $T = 1$, the PMP equations become
$$\dot{x}(t) = -x(t) - \log(u(t)), \quad \dot{\lambda}(t) = \lambda(t), \quad u(t) - \frac{\lambda(t)}{u(t)} = 0$$
with boundary conditions $x(0) = 1$, $\lambda(1) = x(1)$. From the co-state equation, $\lambda(t) = e^{t-1} x(1)$, and therefore*
$$u(t) = \sqrt{x(1)}\, e^{\frac{t-1}{2}}$$
Substituting into the state equation,
$$\dot{x}(t) = -x(t) - \frac{t-1}{2} - \frac{\log(x(1))}{2} \qquad (1)$$
If we integrate the state equation (1) from zero to $T = 1$ (variation of constants formula), we can obtain the value of $x(1)$:
$$x(1) = e^{-1} x(0) + \int_0^1 e^{-(1-s)} \left( -\frac{s}{2} + \frac{1}{2}\left( 1 - \log(x(1)) \right) \right) ds$$
which leads to the scalar equation
$$x(1) = \frac{1}{2} \left( 1 - \log(x(1)) \right) \left( 1 - \frac{1}{e} \right), \quad \text{giving} \quad x(1) = 0.5215$$
Replacing in the formulas above, we get the optimal path.

*Only the positive root $u(t) = \sqrt{\lambda(t)}$ makes sense.
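The scalar equation for $x(1)$ has no closed-form solution, but it can be solved numerically. A quick check by fixed-point iteration (the iteration scheme, starting point, and iteration count are my own choices):

```python
import math

# Fixed-point iteration for x(1) = (1/2) * (1 - log x(1)) * (1 - 1/e);
# the starting point and iteration count are arbitrary choices.
x = 1.0
for _ in range(100):
    x = 0.5 * (1.0 - math.log(x)) * (1.0 - math.exp(-1.0))
# x settles near the slide's value x(1) = 0.5215
```

The iteration converges because the map is a contraction near the solution (its derivative there has magnitude below one).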
