
SLIDE 1

Optimal Control Theory

SLIDE 2

The theory

  • Optimal control theory is a mature mathematical discipline which provides algorithms to solve various control problems
  • The elaborate mathematical machinery behind optimal control models is rarely exposed to the computer animation community
  • Most controllers designed in practice are theoretically suboptimal
  • There is an excellent tutorial by Dr. Emo Todorov (http://www.cs.washington.edu/homes/todorov/papers/optimality_chapter.pdf)

SLIDE 3

Standard problem

  • Find an action sequence (u0, u1, ..., un−1) and corresponding state sequence (x0, x1, ..., xn) minimizing the total cost
  • The initial state (x0) and the destination state (xn) are given

SLIDE 4

Discrete control

[Figure: a graph of discrete states whose transitions carry costs between $120 and $500]

The problem is specified by a transition function next(x, u) and an immediate cost function cost(x, u).

SLIDE 5

Dynamic programming

  • Bellman optimality principle: if a given state-action sequence is optimal and we remove the first state and action, the remaining sequence is also optimal
  • The choice of optimal actions in the future is independent of the past actions which led to the present state
  • The optimal state-action sequences can be constructed by starting at the final state and extending backwards

SLIDE 6

Optimal value function

  • v(x) = “minimal total cost for completing the task starting from state x”
  • Find optimal actions (a minimal sketch follows below):
  • 1. Consider every action available at the current state
  • 2. Add its immediate cost to the optimal value of the resulting next state
  • 3. Choose an action for which the sum is minimal
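A minimal Python sketch of this three-step rule, assuming a hypothetical representation of the slide-4 graph as dictionaries next_state[(x, u)] and cost[(x, u)], with the optimal values v already computed:

    def best_action(x, actions, next_state, cost, v):
        """Pick the action minimizing immediate cost plus the optimal
        value of the resulting next state."""
        return min(actions[x],
                   key=lambda u: cost[(x, u)] + v[next_state[(x, u)]])
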
SLIDE 7

Optimal value function

  • Mathematically, the value function, or cost-to-go function, can be defined as
    $v(x) = \min_{u \in U(x)} \big( \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \big)$, with v = 0 at the destination state

SLIDE 8

Optimal control policy

  • A mapping from states to actions is called a control policy or control law
  • Once we have a control policy, we can start at any state and reach the destination state by following it
  • The optimal control policy satisfies
    $\pi(x) = \arg\min_{u \in U(x)} \big( \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \big)$
  • Its corresponding optimal value function satisfies
    $v(x) = \min_{u \in U(x)} \big( \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \big)$

SLIDE 9

Value iteration

  • The Bellman equations cannot be solved in a single pass if the state transitions are cyclic
  • Value iteration starts with a guess $v^{(0)}$ of the optimal value function and constructs a sequence of improved guesses
    $v^{(i+1)}(x) = \min_{u \in U(x)} \big( \mathrm{cost}(x, u) + v^{(i)}(\mathrm{next}(x, u)) \big)$
    (a sketch follows below)
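A minimal sketch of this iteration, reusing the hypothetical next_state/cost dictionaries from the earlier sketch; v starts at zero and is repeatedly improved by the Bellman backup until no value changes:

    def value_iteration(states, actions, next_state, cost, dest, tol=1e-9):
        v = {x: 0.0 for x in states}        # initial guess v^(0)
        while True:
            delta = 0.0
            for x in states:
                if x == dest:
                    continue                # v(destination) stays 0
                new = min(cost[(x, u)] + v[next_state[(x, u)]]
                          for u in actions[x])
                delta = max(delta, abs(new - v[x]))
                v[x] = new
            if delta < tol:                 # no guess changed: converged
                return v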

SLIDE 10
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
  • Differential dynamic programming (DDP)
SLIDE 11

Continuous control

  • The state space and control space are continuous
  • Dynamics of the system:
  • Continuous time: $\dot{x} = f(x, u)$
  • Discrete time: $x_{k+1} = f(x_k, u_k)$
  • Objective function: $J = h(x(t_f)) + \int_0^{t_f} \ell(x, u, t)\, dt$
SLIDE 12

HJB equation

  • The HJB equation is a nonlinear PDE with respect to the unknown value function v:
    $-v_t(x, t) = \min_{u \in U(x)} \big( \ell(x, u, t) + f(x, u)^\top v_x(x, t) \big)$
  • An optimal control π(x, t) is a value of u which achieves the minimum in the HJB equation:
    $\pi(x, t) = \arg\min_{u \in U(x)} \big( \ell(x, u, t) + f(x, u)^\top v_x(x, t) \big)$

SLIDE 13

Numerical solution

  • Non-linear differential equations do not always have classic solutions which satisfy them everywhere
  • Numerical methods guarantee convergence, but they rely on a discretization of the state space, which grows exponentially with the state space dimension
  • Nevertheless, the HJB equations have motivated a number of methods for approximate solution

SLIDE 14

Parametric value function

  • Consider a parametric approximation $\tilde v(x; \theta)$ to the optimal value function, and its derivative $\tilde v_x(x; \theta)$ with respect to x
  • Choose a large enough set of states and evaluate the right-hand side of the HJB equation using the approximated value function
  • Adjust θ so that $\tilde v$ gets closer to the target values (a fitting sketch follows below)
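One way to realize this fit, as a sketch: assume a linear-in-parameters model $\tilde v(x; \theta) = \phi(x)^\top \theta$, where the feature map phi and the HJB-derived target values hjb_target are hypothetical user-supplied callables; θ is then one least-squares solve.

    import numpy as np

    def fit_theta(states, phi, hjb_target):
        """Least-squares fit of theta so that phi(x) . theta matches
        the HJB-derived target value at each sampled state."""
        Phi = np.array([phi(x) for x in states])        # N x d features
        y = np.array([hjb_target(x) for x in states])   # N target values
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return theta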
SLIDE 15
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
  • Differential dynamic programming (DDP)
SLIDE 16

Maximum principle

  • Optimal control theory is based on two fundamental ideas: dynamic programming and the maximum principle
  • The maximum principle solves the optimal control problem for a deterministic dynamic system with boundary conditions
  • The maximum principle casts trajectory optimization as a set of ODEs, subject to optimality conditions on the control and to the boundary conditions
  • It escapes the “curse of dimensionality” because it only solves for the optimal trajectory and not the entire policy. However, for specific problem classes, the control policy can be obtained.

SLIDE 17

Derivation from Lagrange multipliers

minimize $h(x_n) + \sum_{k=0}^{n-1} \ell(x_k, u_k)$
subject to $f(x_k, u_k) - x_{k+1} = 0, \quad 0 \le k \le n - 1$

SLIDE 18

The Lagrangian

  • The model problem: minimize $f(x)$ subject to $Ax = b$
  • The Lagrangian associated with this problem is
    $L(x, \nu) = f(x) + \sum_{i=1}^{p} \nu_i (a_i^\top x - b_i)$
  • Optimality conditions: $x^*$ is optimal iff there exists a $\nu^*$ such that
    $\nabla f(x^*) + A^\top \nu^* = 0, \qquad A x^* = b$
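These conditions can be checked numerically. A toy sketch, assuming the quadratic objective f(x) = ½‖x‖² (so ∇f(x) = x), for which x* and ν* come from one linear KKT solve:

    import numpy as np

    # Toy equality-constrained problem: minimize 0.5*||x||^2  s.t.  Ax = b.
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, 2.0])
    n, p = A.shape[1], A.shape[0]

    # KKT system: grad f(x*) + A^T nu* = 0  and  A x* = b.
    K = np.block([[np.eye(n), A.T],
                  [A, np.zeros((p, p))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
    x_star, nu_star = sol[:n], sol[n:]

    assert np.allclose(x_star + A.T @ nu_star, 0)  # grad f(x*) + A^T nu* = 0
    assert np.allclose(A @ x_star, b)              # A x* = b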

SLIDE 19

Geometric interpretation

  • At the optimal point, the gradient of the objective function is a linear combination of the gradients of the constraints
  • The projection of the gradient of the objective function onto the constraint hyperplane is zero at the optimal point

[Figure: level sets of f(x) around f(x∗) and the constraint hyperplane with normals $a_i$, illustrating $\nabla f(x^*) + A^\top \nu^* = 0$, $A x^* = b$]

SLIDE 20

[Figure: the gradient of the objective F(x) expressed with the constraint gradients ∇C1 and ∇C2 at the optimum]

SLIDE 21

Derivation from Lagrange multipliers

minimize $h(x_n) + \sum_{k=0}^{n-1} \ell(x_k, u_k)$
subject to $f(x_k, u_k) - x_{k+1} = 0, \quad 0 \le k \le n - 1$

The maximum principle can be expressed in terms of a Hamiltonian function.

SLIDE 22

Hamiltonian expression

  • Define the Hamiltonian $H(x_k, u_k, \lambda_{k+1}) = \ell(x_k, u_k) + \lambda_{k+1}^\top f(x_k, u_k)$
  • State equation: $x_{k+1} = f(x_k, u_k)$
  • Costate equation: $\lambda_k = H_x = \ell_x(x_k, u_k) + f_x(x_k, u_k)^\top \lambda_{k+1}$
  • Optimality condition: $H_u = \ell_u(x_k, u_k) + f_u(x_k, u_k)^\top \lambda_{k+1} = 0$
  • Plugging the Hamiltonian back into the Lagrangian gives the boundary condition $\lambda_n = h_x(x_n)$

SLIDE 23
Solving optimal trajectory

  • 1. Given a control sequence, use the state equation to get the corresponding state sequence
  • 2. Iterate the costate equation backward in time to get the Lagrange multiplier (costate) sequence
  • 3. Evaluate the gradient of H with respect to u at each time step, and improve the control sequence with any gradient descent algorithm; go back to step 1, or exit if converged (a sketch follows below)
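A sketch of one pass of this procedure, assuming hypothetical user-supplied callables (returning NumPy arrays) for the dynamics f(x, u), its Jacobians fx and fu, the cost-rate gradients lx and lu, and the final-cost gradient hx:

    import numpy as np

    def improve_controls(x0, us, f, fx, fu, lx, lu, hx, step=1e-2):
        n = len(us)
        # 1. forward pass: the state equation gives the state sequence
        xs = [x0]
        for k in range(n):
            xs.append(f(xs[k], us[k]))
        # 2. backward pass: the costate (Lagrange multiplier) sequence
        lam = [None] * (n + 1)
        lam[n] = hx(xs[n])
        for k in reversed(range(n)):
            lam[k] = lx(xs[k], us[k]) + fx(xs[k], us[k]).T @ lam[k + 1]
        # 3. gradient of H = l + lam^T f w.r.t. u, then one descent step
        return [us[k] - step * (lu(xs[k], us[k]) + fu(xs[k], us[k]).T @ lam[k + 1])
                for k in range(n)]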

SLIDE 24
Special case

  • Optimal control laws can rarely be obtained in closed form. One notable exception is the LQR case, where the dynamics are linear and the costs are quadratic
  • LQR is the class of problems whose dynamics function is linear and whose cost function is quadratic
  • dynamics: $\dot{x} = A x + B u$
  • cost rate: $\ell(x, u) = \tfrac{1}{2} (x^\top Q x + u^\top R u)$
  • final cost: $h(x(t_f)) = \tfrac{1}{2} x(t_f)^\top Q_f\, x(t_f)$

SLIDE 25

Optimal value function

  • We derive the optimal value function from the Bellman equation
  • Again, the optimal value function is quadratic in x and changes over time
  • Plugging into the Bellman equation, we obtain a recursive relation for $V_k$
  • The optimal control law is linear in x

SLIDE 26
  • Discrete control: Bellman equations
  • Continuous control: HJB equations
  • Maximum principle
  • Linear quadratic regulator (LQR)
  • Differential dynamic programming (DDP)
SLIDE 27
Linear quadratic regulator

  • Most optimal control problems do not have closed-form solutions. One exception is the LQR case
  • LQR is the class of problems whose dynamics function is linear and whose cost function is quadratic
  • dynamics: $\dot{x} = A x + B u$
  • cost rate: $\ell(x, u) = \tfrac{1}{2} (x^\top Q x + u^\top R u)$
  • final cost: $h(x(t_f)) = \tfrac{1}{2} x(t_f)^\top Q_f\, x(t_f)$
  • R is symmetric positive definite, and Q and Qf are symmetric
  • A, B, R, Q can be made time-varying

SLIDE 28

Optimal value function

  • For an LQR problem, the optimal value function is quadratic in x and can be expressed as $v(x, t) = \tfrac{1}{2} x^\top V(t)\, x$, where V(t) is a symmetric matrix
  • We can obtain the ODE of V(t) via the HJB equation:
    $-\dot{V}(t) = A^\top V + V A - V B R^{-1} B^\top V + Q, \qquad V(t_f) = Q_f$
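This Riccati ODE does not depend on x, so it can be integrated backward in time numerically. A sketch using a plain backward Euler step, with whatever matrices and horizon the problem supplies:

    import numpy as np

    def riccati_backward(A, B, Q, R, Qf, tf, dt=1e-3):
        """Integrate -dV/dt = A^T V + V A - V B R^{-1} B^T V + Q
        backward from the boundary condition V(tf) = Qf."""
        V = Qf.copy()
        Rinv = np.linalg.inv(R)
        for _ in range(int(round(tf / dt))):
            rhs = A.T @ V + V @ A - V @ B @ Rinv @ B.T @ V + Q
            V = V + dt * rhs              # one step from t back to t - dt
        return V                          # approximately V(0)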

SLIDE 29

Discrete LQR

  • LQR is defined as follows when time is discretized with step Δ
  • dynamics: $x_{k+1} = A_k x_k + B_k u_k$
  • cost rate: $\tfrac{1}{2} (x_k^\top Q_k x_k + u_k^\top R_k u_k)$
  • final cost: $\tfrac{1}{2} x_n^\top Q_f x_n$
  • Let $n = t_f / \Delta$; the correspondence to the continuous-time problem is $A_k = I + \Delta A$, $B_k = \Delta B$, $Q_k = \Delta Q$, $R_k = \Delta R$

SLIDE 30

Optimal value function

  • We derive the optimal value function from the Bellman equation
  • Again, the optimal value function is quadratic in x and changes over time: $v_k(x) = \tfrac{1}{2} x^\top V_k x$ with $V_n = Q_f$
  • Plugging into the Bellman equation, we obtain a recursive relation for $V_k$:
    $V_k = Q_k + A_k^\top V_{k+1} (A_k - B_k K_k)$
  • The optimal control law is linear in x: $u_k = -K_k x_k$ with $K_k = (R_k + B_k^\top V_{k+1} B_k)^{-1} B_k^\top V_{k+1} A_k$ (a sketch follows below)
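A minimal sketch of this backward Riccati recursion, using whatever matrices the discretized problem supplies; the V-update here is the standard algebraic simplification of the quadratic Bellman backup:

    import numpy as np

    def discrete_lqr(A, B, Q, R, Qf, n):
        """Backward Riccati recursion: returns the feedback gains K_k
        (u_k = -K_k x_k) and the initial value matrix V_0."""
        V = Qf
        gains = []
        for _ in range(n):
            K = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
            V = Q + A.T @ V @ (A - B @ K)   # recursive relation for V_k
            gains.append(K)
        gains.reverse()                      # gains[k] applies at step k
        return gains, V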