

  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

  2. Recall
     • Stage decision problems: formulation via transition diagram; graphical DP algorithm & DP equation; Bayesian inference & decisions based on probability distributions; alternative algorithm: Dijkstra's algorithm.
     • Discrete optimization problems: discrete-time system & additive cost function; DP algorithm & DP equation (Bellman equation); Kalman filter and separation principle; alternative algorithm: static optimization.
     • Continuous-time control problems: differential equations & additive cost function; Hamilton-Jacobi-Bellman equation; continuous-time Kalman filter and separation principle; alternative algorithm: Pontryagin's maximum principle (PMP).
     Today: the continuous-time Kalman filter and separation principle, and a new topic - frequency domain properties of LQR.

  3. Outline
     • Linear quadratic control, Kalman filter, separation principle
     • Frequency domain properties of LQR

  4. Linear quadratic control
     The analogous problem to linear quadratic control for continuous-time systems would be
     $$\min_{u(t)=\mu(t,x(t))} E\Big[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\,dt + x(T)^\top Q_T x(T)\Big]$$
     subject to
     $$\dot x(t) = A x(t) + B u(t) + w(t).$$
     However, how does one define disturbances for continuous-time systems? It is quite challenging! White noise disturbances are one of the few ways to define disturbances without 'memory' for continuous-time systems.

  5. White noise
     Let us start with a scalar white noise process $\omega(t) \in \mathbb{R}$. It has very interesting (or strange!) properties:
     • it is continuous but not differentiable anywhere
     • the integral over a finite interval is infinite
     • it does not exist in nature
     • even for a small time interval $\delta$, $\omega(t)$ and $\omega(t+\delta)$ are uncorrelated, that is, the autocorrelation is zero: $R(\tau) = E[\omega(t)\omega(t+\tau)] = 0$ for $\tau \neq 0$
     • when $\tau = 0$, $E[\omega(t)\omega(t)] = E[\omega(t)^2] = \infty$ (infinite power)
     • the autocorrelation is therefore a Dirac delta function, $R(\tau) = a\,\delta(\tau)$, and the scalar white noise process is characterized by the amplitude $a$.
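     The infinite-power property can be illustrated numerically: a sampled approximation of white noise with amplitude $a$ must have variance $a/\tau$ on a grid with step $\tau$, so its sample power blows up as the grid is refined. A minimal MATLAB sketch (not part of the original slides; all variable names are ours):

         % Band-limited approximation of white noise with amplitude a = 1:
         % on a grid with step tau, each sample has variance a/tau, so the
         % autocorrelation tends to a*delta(tau) as tau -> 0.
         a = 1;
         for tau = [1e-1 1e-2 1e-3]
             w = sqrt(a/tau)*randn(1e5,1);   % sampled stand-in for omega(t)
             fprintf('tau = %5.0e   sample power E[w^2] ~ %10.1f\n', tau, mean(w.^2))
         end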

  6. Random walk
     The integral of white noise is called a random walk or the Wiener process,
     $$\dot x(t) = w(t),$$
     and it is more intuitive and easier to handle mathematically (block diagram: $w(t) \rightarrow \tfrac{1}{s} \rightarrow x(t)$).
     • $x(t)$ now has finite power
     • $x(t)$ and $x(t+\tau)$ are correlated
     • we shall assume that $w(t)$ is Gaussian for each fixed time, and this implies that $x(t)$ is also Gaussian for each fixed time.
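     As a quick illustration, a Wiener process path can be simulated by summing independent Gaussian increments of variance $\tau$; a minimal MATLAB sketch (ours, not from the slides):

         % Euler simulation of the Wiener process xdot(t) = w(t) on [0, 1]:
         % the increment of x over a step of length tau is Gaussian with variance tau.
         tau = 1e-3;  N = round(1/tau);
         x = cumsum(sqrt(tau)*randn(N,1));        % one random-walk sample path
         plot(tau*(1:N), x), xlabel('time'), ylabel('x(t)')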

  7. Discussion
     • In a similar way to the Wiener process, the solution to the stochastic differential equation
       $$\dot x(t) = A x(t) + B u(t) + w(t)$$
       is more intuitive than white noise.
     • If $x(t) \in \mathbb{R}^n$ and $w(t) \in \mathbb{R}^n$, we assume that $w(t) = N \bar w(t)$, where
       $$\bar w(t) = \begin{bmatrix} \bar w_1(t) & \bar w_2(t) & \dots & \bar w_p(t) \end{bmatrix}^\top$$
       and the $\bar w_i(t)$ are uncorrelated scalar Gaussian white noise processes. Thus $E[\bar w(t)\bar w(t+\tau)^\top] = I\,\delta(\tau)$ and
       $$E[w(t) w(t+\tau)^\top] = N N^\top \delta(\tau) =: W \delta(\tau).$$
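     A sampled version of such a vector disturbance can be drawn as follows; this MATLAB sketch assumes an $n \times p$ matrix N and a grid step tau are given (the names are ours):

         % Sampled vector white noise with intensity W = N*N':
         % w = N*wbar, where wbar has independent entries of variance 1/tau.
         pdim = size(N,2);                        % number of scalar noise channels
         w = N*randn(pdim,1)/sqrt(tau);           % E[w*w'] = (N*N')/tau -> W*delta(tau)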

  8. Discussion
     • It is possible to prove that the discretized system, with $x_k := x(t_k)$, $t_k = k\tau$, and $u(t) = u_k$ for $t \in [t_k, t_{k+1})$, takes the form
       $$x_{k+1} = A_d x_k + B_d u_k + w_k,$$
       where, as before, $A_d = e^{A\tau}$ and $B_d = \int_0^\tau e^{As} B\,ds$, and the $w_k$ are zero-mean independent Gaussian random variables with covariance
       $$E[w_k w_k^\top] = \int_0^\tau e^{As} W e^{A^\top s}\,ds.$$
     • The cost can also be written in terms of the discrete-time variables, and it is also a quadratic function.
     • Since the optimal control policy for such a system would be the same linear state feedback control law as for the deterministic version of the problem, the next results come as no surprise.
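     For reference, $A_d$, $B_d$, and the covariance of $w_k$ can all be computed with matrix exponentials of augmented matrices (Van Loan's method handles the covariance integral). A MATLAB sketch, assuming A, B, W, and tau are given:

         % Exact discretization of xdot = A*x + B*u + w, E[w(t)w(t+s)'] = W*delta(s).
         n = size(A,1);  m = size(B,2);
         F  = expm([A B; zeros(m,n+m)]*tau);      % top blocks give [Ad Bd]
         Ad = F(1:n,1:n);  Bd = F(1:n,n+1:n+m);
         G  = expm([-A W; zeros(n) A']*tau);      % Van Loan's method
         Qd = Ad*G(1:n,n+1:2*n);                  % E[w_k*w_k'] = int_0^tau e^{As} W e^{A's} ds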

  9. Finite horizon linear quadratic control
     The optimal control law for the problem
     $$\min_{u(t)=\mu(t,x(t))} E\Big[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\,dt + x(T)^\top Q_T x(T)\Big], \quad Q > 0, \; R > 0,$$
     subject to $\dot x(t) = A x(t) + B u(t) + w(t)$, where $w(t)$ is zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$,
     is $u(t) = K(t)x(t)$, where $K(t) = -R^{-1} B^\top P(t)$ and
     $$\dot P(t) = -(A^\top P(t) + P(t) A - P(t) B R^{-1} B^\top P(t) + Q), \qquad P(T) = Q_T, \; t \in [0,T).$$
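     Numerically, $P(t)$ can be obtained by integrating this Riccati differential equation backward from $P(T) = Q_T$; a MATLAB sketch using ode45 (which accepts a decreasing time span), assuming A, B, Q, R, QT, and T are given:

         % Backward integration of Pdot = -(A'P + PA - P*B*R^{-1}*B'*P + Q), P(T) = QT.
         n = size(A,1);
         rhs = @(t,p) reshape(-(A'*reshape(p,n,n) + reshape(p,n,n)*A ...
               - reshape(p,n,n)*B*(R\(B'*reshape(p,n,n))) + Q), [], 1);
         [ts,Ps] = ode45(rhs, [T 0], QT(:));      % integrate from t = T down to t = 0
         P0 = reshape(Ps(end,:), n, n);           % P(0); the gain is K(t) = -R\(B'*P(t))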

  10. Infinite horizon linear quadratic control
      The optimal control law for the problem
      $$\min_{u(t)=\mu(x(t))} \lim_{T\to\infty} \frac{1}{T}\, E\Big[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\,dt\Big], \quad Q > 0, \; R > 0,$$
      subject to $\dot x(t) = A x(t) + B u(t) + w(t)$ with $(A,B)$ controllable, where $w(t)$ is zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$,
      is $u(t) = K x(t)$ with $K = -R^{-1} B^\top P$, where $P$ is the unique positive definite solution to the (continuous-time) algebraic Riccati equation
      $$A^\top P + P A - P B R^{-1} B^\top P + Q = 0.$$
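      In MATLAB the steady-state solution is returned directly by care (or lqr); a sketch, assuming A, B, Q, R are given, with the sign flipped to match the convention $u(t) = Kx(t)$ used here:

          % Steady-state LQR gain from the continuous-time algebraic Riccati equation.
          [P,~,G] = care(A,B,Q,R);    % solves A'P + PA - P*B*R^{-1}*B'*P + Q = 0, G = R\(B'*P)
          K = -G;                     % care/lqr use the convention u = -G*x; here u = K*x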

  11. Output feedback linear quadratic control
      Problem formulation:
      $$\min_{u(t)=\mu(t,I(t))} E\Big[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\,dt + x(T)^\top Q_T x(T)\Big]$$
      $$\dot x(t) = A x(t) + B u(t) + w(t), \qquad y(t) = C x(t) + n(t)$$
      • $I(t) = \{y(s), u(s) \mid s \in [0,t)\}$ - information set
      • $w(t)$ - zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$
      • $n(t)$ - zero-mean Gaussian white noise with $E[n(t)n(t+\tau)^\top] = V\delta(\tau)$
      • $x(0)$ - Gaussian random vector with mean $\bar x_0$ and covariance $\bar\Phi_0$

  12. Discussion
      • The solution to this problem is analogous to the discrete-time case.
      • In particular, there is also a separation principle: the optimal controller consists of an optimal estimator (Kalman filter) + an optimal controller (LQR).
      • The derivation of the Kalman filter (which in continuous time is known as the Kalman-Bucy filter) and of this result is mathematically quite involved.
      • We simply state the results next, without further justification.

  13. Kalman-Bucy filter
      Consider the problem of finding an estimator $\hat x$ for the state of
      $$\dot x(t) = A x(t) + B u(t) + w(t), \qquad y(t) = C x(t) + n(t),$$
      as a function of the information set $I(t)$, which includes the measurements $y(t)$, where $w(t)$ and $n(t)$ are zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$ and $E[n(t)n(t+\tau)^\top] = V\delta(\tau)$, and the initial state is a Gaussian random variable with mean $\bar x_0$ and covariance $\bar\Phi_0$.
      The optimal estimator, in the sense that it minimizes
      $$c^\top E[(\hat x(t) - x(t))(\hat x(t) - x(t))^\top \mid I(t)]\, c$$
      for any constant vector $c$, is the Kalman-Bucy filter
      $$\dot{\hat x}(t) = A \hat x(t) + B u(t) + L(t)(y(t) - C \hat x(t)), \qquad \hat x(0) = \bar x_0,$$
      $$\dot\Phi(t) = A \Phi(t) + \Phi(t) A^\top + W - \Phi(t) C^\top V^{-1} C \Phi(t), \qquad \Phi(0) = E[(x(0)-\bar x_0)(x(0)-\bar x_0)^\top] = \bar\Phi_0,$$
      $$L(t) = \Phi(t) C^\top V^{-1}, \qquad t \geq 0.$$
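      The steady-state filter gain can be computed with MATLAB's lqe, which solves the corresponding algebraic Riccati equation; a sketch, assuming A, C, W, V are given:

          % Steady-state Kalman-Bucy gain: solves A*Phi + Phi*A' + W - Phi*C'*V^{-1}*C*Phi = 0.
          n = size(A,1);
          [L,Phi] = lqe(A, eye(n), C, W, V);   % L = Phi*C'/V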

  14. LQG - Separation principle
      The optimal control input for the output feedback linear quadratic optimal control problem is
      $$u(t) = K(t)\hat x(t), \qquad \dot{\hat x}(t) = A \hat x(t) + B u(t) + L(t)(y(t) - C \hat x(t)),$$
      where
      $$K(t) = -R^{-1} B^\top P(t), \qquad \dot P(t) = -(A^\top P(t) + P(t) A - P(t) B R^{-1} B^\top P(t) + Q), \quad P(T) = Q_T, \; t \in [0,T),$$
      $$L(t) = \Phi(t) C^\top V^{-1}, \qquad \dot\Phi(t) = A \Phi(t) + \Phi(t) A^\top + W - \Phi(t) C^\top V^{-1} C \Phi(t), \quad \Phi(0) = E[(x(0)-\bar x_0)(x(0)-\bar x_0)^\top] = \bar\Phi_0, \; t \in [0,T).$$

  15. LQG - Separation principle
      If, instead of the finite-horizon cost, we consider
      $$\min_{u(t)=\mu(t,I(t))} \lim_{T\to\infty} \frac{1}{T}\, E\Big[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\,dt\Big], \qquad (1)$$
      the optimal control input for the output feedback linear quadratic optimal control problem with cost (1) is
      $$u(t) = K \hat x(t), \qquad \dot{\hat x}(t) = A \hat x(t) + B u(t) + L(y(t) - C \hat x(t)),$$
      where
      $$K = -R^{-1} B^\top P, \qquad A^\top P + P A - P B R^{-1} B^\top P + Q = 0,$$
      $$L = \Phi C^\top V^{-1}, \qquad A \Phi + \Phi A^\top + W - \Phi C^\top V^{-1} C \Phi = 0.$$
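      The two steady-state gains can be assembled into a single dynamic output-feedback compensator; a MATLAB sketch, assuming A, B, C, Q, R, W, V are given:

          % Steady-state LQG compensator: LQR gain + Kalman-Bucy gain.
          n = size(A,1);
          K = -lqr(A,B,Q,R);                       % controller gain, u = K*xhat
          L = lqe(A, eye(n), C, W, V);             % estimator gain
          % xhatdot = (A + B*K - L*C)*xhat + L*y,   u = K*xhat
          comp = ss(A + B*K - L*C, L, K, zeros(size(K,1), size(L,2)));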

  16. Inverted pendulum example
      For the model provided in Lecture II_1, slide 32 (state feedback, for simplicity), let us compare discrete-time and continuous-time gains:

          clear all, close all, clc
          % definition of the continuous-time model
          m = 0.2;
          M = 1;
          b = 0.05;
          I = 0.01;
          g = 9.8;
          l = 0.5;
          p = (I+m*l^2)*(M+m)-m^2*l^2;
          Ac = [0 1 0 0;
                0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
                0 0 0 1;
                0 -(m*l*b)/p m*g*l*(M+m)/p 0];
          Bc = [0;
                (I+m*l^2)/p;
                0;
                m*l/p];
          Q = diag([1 1 1 1]);
          S = zeros(4,1);
          R = 1;
          % discretization
          n = 4;
          tau = 0.01;
          sysd = c2d(ss(Ac,Bc,zeros(1,n),0),tau);
          A = sysd.a; B = sysd.b;
          % LQR control discrete time
          K = dlqr(A,B,Q,R,S); K = -K;
          % continuous-time
          Kc = lqr(Ac,Bc,Q,R,S); Kc = -Kc;

  17. Inverted pendulum example
      Continuous-time gains (policy $u(t) = K_c x(t)$):
      $$K_c = \begin{bmatrix} 1.0000 & 2.3674 & -33.1623 & -7.8509 \end{bmatrix}$$
      Discrete-time gains (policy $u_k = K x_k$):
      $$\tau = 0.1: \quad K = \begin{bmatrix} 0.5955 & 1.4650 & -25.3322 & -5.9529 \end{bmatrix}$$
      $$\tau = 0.01: \quad K = \begin{bmatrix} 0.9495 & 2.2551 & -32.1930 & -7.6156 \end{bmatrix}$$
      $$\tau = 0.001: \quad K = \begin{bmatrix} 0.9948 & 2.3559 & -33.0632 & -7.8269 \end{bmatrix}$$
      (converging to the continuous-time gains, as expected)
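      This convergence can be reproduced with a short loop reusing Ac, Bc, Q, R, and S from slide 16; a sketch:

          % Discrete-time LQR gains for decreasing sampling periods.
          for tau = [0.1 0.01 0.001]
              sysd = c2d(ss(Ac,Bc,zeros(1,4),0), tau);
              K = -dlqr(sysd.a, sysd.b, Q, R, S);
              fprintf('tau = %6.3f   K = [% .4f % .4f % .4f % .4f]\n', tau, K)
          end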
