Optimal Control Theory
The theory
- Optimal control theory is a mature mathematical discipline
which provides algorithms to solve various control problems
- The elaborate mathematical machinery behind optimal control
models is rarely exposed to the computer animation community
- Most controllers designed in practice are theoretically
suboptimal
- There is an excellent tutorial by Dr. Emo Todorov
(http://www.cs.washington.edu/homes/todorov/papers/optimality_chapter.pdf)
Standard problem
- Find an action sequence (u_0, u_1, ..., u_{n−1}) and corresponding
state sequence (x_0, x_1, ..., x_n) minimizing the total cost
- The initial state (x_0) and the destination state (x_n) are given
Discrete control
[Figure: a weighted graph of discrete states; each edge (x, u) is labeled with its transition cost, e.g. $120, $150, $200, ..., $500]
- The next state and the immediate cost of each transition are given by next(x, u) and cost(x, u)
Dynamic programming
- Bellman optimality principle:
- If a given state-action sequence is optimal and we remove
the first state and action, the remaining sequence is also optimal
- The choice of optimal actions in the future is independent
of the past actions which led to the present state
- The optimal state-action sequences can be constructed by
starting at the final state and extending backwards
Optimal value function
- v(x) = “minimal total cost for completing the task starting from
state x”
- Find optimal actions:
- 1. Consider every action available at the current state
- 2. Add its immediate cost to the optimal value of the resulting
next state
- 3. Choose an action for which the sum is minimal
Optimal value function
- Mathematically, a value function, or cost-to-go function, can
be defined as
  v(x) = min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
where U(x) is the set of actions available in state x
Optimal control policy
- A mapping from states to actions is called a control policy or
control law
- Once we have a control policy, we can start at any state and
reach the destination state by following the control policy
- The optimal control policy satisfies
  π(x) = argmin_{u∈U(x)} [cost(x, u) + v(next(x, u))]
- Its corresponding optimal value function satisfies
  v(x) = min_{u∈U(x)} [cost(x, u) + v(next(x, u))]
Value iteration
- Bellman equations cannot be solved in a single pass if the state
transitions are cyclic
- Value iteration starts with a guess v^(0) of the optimal value
function and constructs a sequence of improved guesses:
  v^(i+1)(x) = min_{u∈U(x)} [cost(x, u) + v^(i)(next(x, u))]
as sketched below
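For concreteness, here is a minimal value-iteration sketch in Python; the graph, action names, and dollar costs are hypothetical stand-ins for the figure's example, with next_state and cost playing the roles of next(x, u) and cost(x, u). The last step extracts the greedy policy exactly as in steps 1-3 above.

import math

# Hypothetical transition graph: next_state[(x, u)] and cost[(x, u)]
next_state = {("A", "toB"): "B", ("A", "toC"): "C",
              ("B", "toD"): "D", ("C", "toD"): "D"}
cost = {("A", "toB"): 250.0, ("A", "toC"): 120.0,
        ("B", "toD"): 150.0, ("C", "toD"): 350.0}
states, goal = ["A", "B", "C", "D"], "D"

# Initial guess v(0): zero at the goal, infinity elsewhere
v = {x: 0.0 if x == goal else math.inf for x in states}
for _ in range(100):
    # v(i+1)(x) = min_u [cost(x, u) + v(i)(next(x, u))]
    for x in states:
        if x == goal:
            continue
        options = [cost[(s, u)] + v[nxt]
                   for (s, u), nxt in next_state.items() if s == x]
        if options:
            v[x] = min(options)

# Greedy policy: pick the action whose immediate cost plus the
# optimal value of the resulting state is minimal
policy = {}
for x in states:
    if x == goal:
        continue
    acts = [u for (s, u) in next_state if s == x]
    policy[x] = min(acts, key=lambda u: cost[(x, u)] + v[next_state[(x, u)]])
print(v, policy)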
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
- Differential dynamic program (DDP)
Continuous control
- State space and control space are continuous
- Dynamics of the system:
- Continuous time: ẋ = f(x, u)
- Discrete time: x_{k+1} = x_k + Δ · f(x_k, u_k)
- Objective function: J(x(·), u(·)) = ∫₀^{t_f} ℓ(x(t), u(t), t) dt + h(x(t_f))
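To make the continuous-to-discrete correspondence concrete, here is a small sketch; the damped-pendulum dynamics f(x, u), its coefficients, and the step size dt are illustrative assumptions, not anything from the slides.

import numpy as np

def f(x, u):
    # Continuous-time dynamics x_dot = f(x, u) for a damped pendulum:
    # x = (angle, angular velocity), u is a torque
    theta, omega = x
    return np.array([omega, u - 0.1 * omega - 9.8 * np.sin(theta)])

def step(x, u, dt=0.01):
    # Discrete-time dynamics x_{k+1} = x_k + dt * f(x_k, u_k) (Euler)
    return x + dt * f(x, u)

x = np.array([0.5, 0.0])
for k in range(500):          # roll out under zero control
    x = step(x, 0.0)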
HJB equation
- The HJB equation is a nonlinear PDE with respect to the unknown
function v:
  −v_t(x, t) = min_{u∈U(x)} [ℓ(x, u, t) + f(x, u)ᵀ v_x(x, t)]
- An optimal control π(x, t) is a value of u which achieves the
minimum in the HJB equation:
  π(x, t) = argmin_{u∈U(x)} [ℓ(x, u, t) + f(x, u)ᵀ v_x(x, t)]
Numerical solution
- Non-linear differential equations do not always have classical
solutions which satisfy them everywhere
- Numerical methods guarantee convergence, but they rely on a
discretization of the state space, whose size grows exponentially with the state dimension
- Nevertheless, the HJB equations have motivated a number of
methods for approximate solution
Parametric value function
- Consider a parametric approximation ṽ(x; θ) to the optimal value
function, and its derivative ṽ_x(x; θ) with respect to x
- Choose a large enough set of states and evaluate the right-hand
side of the HJB equation using the approximate value function
- Adjust θ so that ṽ(x; θ) gets closer to the resulting target values,
as sketched below
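One way to realize this scheme is fitted value iteration with a value function linear in hand-chosen features, sketched below. Here a one-step time-discretized backup stands in for the HJB right-hand side, and the dynamics f, cost rate ell, features phi, and control grid are all invented for illustration.

import numpy as np

dt = 0.05
def f(x, u):    # hypothetical dynamics: a double integrator
    return np.array([x[1], u])
def ell(x, u):  # hypothetical quadratic cost rate
    return x @ x + 0.1 * u * u
def phi(x):     # quadratic features for v(x; theta) = theta . phi(x)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2, 1.0])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # a large enough set of states
controls = np.linspace(-2.0, 2.0, 21)       # coarse grid standing in for U(x)
theta = np.zeros(4)

for _ in range(50):
    # Evaluate the (time-discretized) right-hand side of HJB at each
    # sampled state, using the current approximation for the next state
    targets = np.array([min(ell(x, u) * dt + phi(x + dt * f(x, u)) @ theta
                            for u in controls) for x in X])
    # Adjust theta (least squares) so v(x; theta) moves toward the targets
    Phi = np.array([phi(x) for x in X])
    theta = np.linalg.lstsq(Phi, targets, rcond=None)[0]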
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
- Differential dynamic program (DDP)
- Optimal control theory is based on two fundamental ideas:
dynamic programming and the maximum principle
- The maximum principle solves optimal control for a
deterministic dynamic system with boundary conditions
- The maximum principle casts trajectory optimization as a set of
ODEs, under optimality conditions and boundary conditions
- It escapes the "curse of dimensionality" because it only solves for
the optimal trajectory, not the entire policy. However, for specific problem classes, the control policy can still be obtained.
Maximum principle
Derive from Lagrange multipliers
- minimize Σ_{k=0}^{n−1} ℓ(x_k, u_k) + h(x_n)
  subject to f(x_k, u_k) − x_{k+1} = 0, 0 ≤ k ≤ n − 1
The Lagrangian
- First consider a generic equality-constrained problem:
  minimize f(x) subject to Ax = b
- The Lagrangian associated with this problem is
  L(x, ν) = f(x) + Σ_{i=1}^{p} ν_i (a_iᵀ x − b_i)
- Optimality conditions: x* is optimal iff there exists a ν* such that
  ∇f(x*) + Aᵀ ν* = 0 and A x* = b
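These conditions are easy to check numerically. The sketch below solves the KKT linear system for a hypothetical quadratic objective f(x) = ½ xᵀ P x + qᵀ x with one linear constraint; the numbers P, q, A, b are made up.

import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # f(x) = 1/2 x^T P x + q^T x
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])               # single constraint x1 + x2 = 1
b = np.array([1.0])

# KKT system: grad f(x*) + A^T nu* = 0 and A x* = b, i.e.
# [[P, A^T], [A, 0]] [x*; nu*] = [-q; b]
KKT = np.block([[P, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:2], sol[2:]

print(P @ x_star + q + A.T @ nu_star)    # grad f(x*) + A^T nu* ~ 0
print(A @ x_star - b)                    # A x* - b ~ 0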
Geometric interpretation
- At the optimal point, the gradient of the objective function is a
linear combination of the gradients of the constraints
- The projection of the gradient of the objective function onto the
constraint hyperplane is zero at the optimal point
[Figure: level sets of f(x) near f(x*) with the constraint hyperplane and its normal a_i, illustrating ∇f(x*) + Aᵀ ν* = 0 and A x* = b]
Derive from Lagrange multipliers
- minimize Σ_{k=0}^{n−1} ℓ(x_k, u_k) + h(x_n)
  subject to f(x_k, u_k) − x_{k+1} = 0, 0 ≤ k ≤ n − 1
- The maximum principle can be expressed with the Hamiltonian function
  H(x, u, λ) = ℓ(x, u) + λᵀ f(x, u)
Hamiltonian expression
- state equation: x_{k+1} = f(x_k, u_k)
- costate equation: λ_k = ℓ_x(x_k, u_k) + f_x(x_k, u_k)ᵀ λ_{k+1}
- optimality condition: ℓ_u(x_k, u_k) + f_u(x_k, u_k)ᵀ λ_{k+1} = 0
- Plugging the Hamiltonian back into the Lagrangian gives the
boundary condition λ_n = h_x(x_n)
- Given a control sequence, use state equation to get the
corresponding state sequence.
- Then iterate co-state equation backward in time to get
Lagrange multiplier (co-state) sequence.
- Evaluate the gradient of H wrt u at each time step, and improve
the control sequence with any gradient descent algorithm. Go back to step 1, or exit if converged.
Solving optimal trajectory
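Here is a minimal sketch of steps 1-3, assuming linear dynamics and quadratic costs so that the derivatives ℓ_x, ℓ_u, f_x, f_u are trivial; the matrices, weights, and step size are illustrative choices.

import numpy as np

n, lr = 50, 0.05
A = np.array([[1.0, 0.1], [0.0, 1.0]])      # f(x, u) = A x + B u
B = np.array([0.0, 0.1])
Q, r, Qf = 0.1 * np.eye(2), 0.01, np.eye(2) # l = (x'Qx + r u^2)/2, h = x'Qf x/2
x0 = np.array([1.0, 0.0])
u = np.zeros(n)                             # initial control sequence

for _ in range(500):
    # 1. State equation forward: given u, get the state sequence
    x = [x0]
    for k in range(n):
        x.append(A @ x[k] + B * u[k])
    # 2. Costate equation backward: lam_k = l_x(x_k) + f_x^T lam_{k+1},
    #    with boundary condition lam_n = h_x(x_n) = Qf x_n
    lam = [None] * (n + 1)
    lam[n] = Qf @ x[n]
    for k in range(n - 1, -1, -1):
        lam[k] = Q @ x[k] + A.T @ lam[k + 1]
    # 3. Gradient of H wrt u at each step: H_u = r u_k + B^T lam_{k+1};
    #    improve the controls by plain gradient descent (small step for stability)
    grad = np.array([r * u[k] + B @ lam[k + 1] for k in range(n)])
    u = u - lr * grad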
Special case
- Optimal control laws can rarely be obtained in closed form. One
notable exception is the LQR case, where the dynamics are linear and the costs are quadratic.
- Discrete control: Bellman equations
- Continuous control: HJB equations
- Maximum principle
- Linear quadratic regulator (LQR)
- Differential dynamic program (DDP)
Linear quadratic regulator
- Most optimal control problems do not have closed-form solutions.
One exception is the LQR case.
- LQR is the class of problems whose dynamics are linear and whose
cost is quadratic:
- dynamics: ẋ = A x + B u
- cost rate: ℓ(x, u) = ½ uᵀ R u + ½ xᵀ Q x
- final cost: h(x) = ½ xᵀ Q_f x
- R is symmetric positive definite, and Q and Q_f are symmetric
- A, B, R, Q can be made time-varying
Optimal value function
- For an LQR problem, the optimal value function is quadratic in
x and can be expressed as
  v(x, t) = ½ xᵀ V(t) x, where V(t) is a symmetric matrix
- We can obtain the ODE of V(t) via the HJB equation:
  −V̇(t) = Q − V(t) B R⁻¹ Bᵀ V(t) + V(t) A + Aᵀ V(t),
with boundary condition V(t_f) = Q_f
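As a sketch, this Riccati ODE can be integrated backward from the boundary condition with plain Euler steps; the matrices and horizon below are made-up numbers.

import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])      # illustrative continuous dynamics
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.eye(1), 10.0 * np.eye(2)
tf, dt = 5.0, 0.001

V = Qf                                       # boundary condition V(t_f) = Q_f
for _ in range(int(tf / dt)):                # march backward from t_f to 0
    Vdot = -(Q - V @ B @ np.linalg.inv(R) @ B.T @ V + V @ A + A.T @ V)
    V = V - dt * Vdot                        # V(t - dt) ~ V(t) - dt * V'(t)
# V now approximates V(0); the optimal cost-to-go at t = 0 is 1/2 x' V x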
Discrete LQR
- LQR is defined as follows when time is discretized:
- dynamics: x_{k+1} = A x_k + B u_k
- cost rate: ½ u_kᵀ R u_k + ½ x_kᵀ Q x_k
- final cost: ½ x_nᵀ Q_f x_n
- Let n = t_f / Δ; the correspondence to the continuous-time problem is
  A ↔ I + Δ A, B ↔ Δ B, Q ↔ Δ Q, R ↔ Δ R
Optimal value function
- We derive the optimal value function from the Bellman equation
- Again, the optimal value function is quadratic in x and changes
over time: v_k(x) = ½ xᵀ V_k x, with V_n = Q_f
- Plugging into the Bellman equation, we obtain a recursive relation
for V_k:
  V_k = Q + Aᵀ (V_{k+1} − V_{k+1} B (R + Bᵀ V_{k+1} B)⁻¹ Bᵀ V_{k+1}) A
- The optimal control law is linear in x:
  u_k = −(R + Bᵀ V_{k+1} B)⁻¹ Bᵀ V_{k+1} A x_k = −L_k x_k
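A sketch of this backward pass: run the Riccati recursion from V_n = Q_f down to V_0, storing the gains L_k, then roll out the linear policy u_k = −L_k x_k from any initial state. The specific matrices are illustrative.

import numpy as np

n = 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = 0.1 * np.eye(2), 0.01 * np.eye(1), 10.0 * np.eye(2)

V = Qf                                    # V_n = Q_f
L = [None] * n
for k in range(n - 1, -1, -1):            # recursive relation for V_k
    # L_k = (R + B' V_{k+1} B)^{-1} B' V_{k+1} A
    L[k] = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    # V_k = Q + A' V_{k+1} (A - B L_k)
    V = Q + A.T @ V @ (A - B @ L[k])

# Follow the optimal linear policy u_k = -L_k x_k from any initial state
x = np.array([[1.0], [0.0]])
for k in range(n):
    x = A @ x + B @ (-L[k] @ x)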