SLIDE 1

Optimal Control & Viscosity Solutions

Tutorial Slides from Banff International Research Station Workshop 11w5086: Advancing Numerical Methods for Viscosity Solutions and Applications

Ian M. Mitchell
mitchell@cs.ubc.ca
http://www.cs.ubc.ca/~mitchell

University of British Columbia Department of Computer Science

February 2011

SLIDE 2

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 3

Control Theory

∙ Control theory is the mathematical study of methods to steer the evolution of a dynamic system to achieve desired goals
  ∙ For example, stability or tracking a reference
∙ Optimal control is a branch of control theory that seeks to steer the evolution so as to optimize a specific objective functional
  ∙ There are close connections with the calculus of variations
∙ Mathematical study of control requires predictive models of the system evolution
  ∙ Assume Markovian models: everything relevant to the future evolution of the system is captured in the current state
  ∙ There are many classes of models, but we will talk primarily about deterministic, continuous state, continuous time systems
  ∙ Other continuous models: stochastic DEs, delay DEs, differential algebraic equations, differential inclusions, . . .
  ∙ Other classes of dynamic evolution: discrete time (e.g. discrete event), discrete state (e.g. Markov chains), . . .

SLIDE 4

System Models

∙ Deterministic, continuous state, continuous time systems are often modeled with ordinary differential equations (ODEs)
  $$\dot{x}(t) = \frac{dx(t)}{dt} = f(x(t), u(t))$$
  with state $x(t) \in \mathbb{R}^{d_x}$, input $u \in \mathcal{U} \subseteq \mathbb{R}^{d_u}$, and initial condition $x(0) = x_0$
∙ To ensure that trajectories are well-posed (they exist and are unique), it is typically assumed that $f$ is bounded and Lipschitz continuous with respect to $x$ for fixed $u$
∙ The field of system identification studies how to determine $f$
∙ An important subclass of system dynamics is linear
  $$\dot{x}(t) = A x + B u \quad \text{with } A \in \mathbb{R}^{d_x \times d_x} \text{ and } B \in \mathbb{R}^{d_x \times d_u}$$
∙ Unless specifically described as “nonlinear control,” most engineering control theory (academic and practical) assumes linear systems
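To make the model concrete, here is a minimal simulation sketch (an illustration added to these notes, not from the slides): it integrates linear dynamics $\dot{x} = Ax + Bu$ for a hypothetical double-integrator plant under an assumed state feedback input, using `scipy.integrate.solve_ivp`.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical example system: a double integrator written as dx/dt = A x + B u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

def u_of_t_x(t, x):
    # Any measurable input signal is allowed; here a simple state feedback is assumed.
    return np.array([-x[0] - 1.5 * x[1]])

def dynamics(t, x):
    return A @ x + B @ u_of_t_x(t, x)

sol = solve_ivp(dynamics, t_span=(0.0, 10.0), y0=[1.0, 0.0], max_step=0.01)
print("final state:", sol.y[:, -1])
```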

SLIDE 5

Optimal Control Objectives

∙ Choose an input signal
  $$u(\cdot) \in \mathfrak{U} \triangleq \{ u : [0, \infty) \to \mathcal{U} \mid u(\cdot) \text{ is measurable} \}$$
  to minimize the cost functional $J(x, u(\cdot))$ or $J(x, t, u(\cdot))$
∙ Many possible cost functionals exist, such as:
  ∙ Finite horizon: given horizon $T > 0$, running cost $\ell$ and terminal cost $g$
    $$J(x(t), t, u(\cdot)) \triangleq \int_t^T \ell(x(s), u(s))\, ds + g(x(T))$$
  ∙ Minimum time: given target set $\mathcal{T} \subset \mathbb{R}^{d_x}$
    $$J(x_0, u(\cdot)) \triangleq \begin{cases} \min\{ t \mid x(t) \in \mathcal{T} \}, & \text{if } \{ t \mid x(t) \in \mathcal{T} \} \neq \emptyset; \\ +\infty, & \text{otherwise} \end{cases}$$
  ∙ Discounted infinite horizon: given discount factor $\lambda > 0$ and running cost $\ell$
    $$J(x_0, u(\cdot)) \triangleq \int_0^\infty \ell(x(s), u(s)) e^{-\lambda s}\, ds$$
∙ Alternatively, “maximize payoff functionals” or “optimize objective functionals”
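As a rough illustration (not part of the slides), the finite horizon cost of a particular input signal can be approximated by simulating the ODE and accumulating the running cost with a simple quadrature rule; the dynamics, costs, horizon and input below are placeholders.

```python
import numpy as np

def finite_horizon_cost(f, ell, g, x0, u_signal, T, dt=1e-3):
    """Approximate J(x0, 0, u(.)) = int_0^T ell(x, u) ds + g(x(T)) with forward Euler."""
    x, J, t = np.asarray(x0, dtype=float), 0.0, 0.0
    while t < T:
        u = u_signal(t)
        J += ell(x, u) * dt            # left-endpoint quadrature of the running cost
        x = x + dt * f(x, u)           # Euler step of dx/dt = f(x, u)
        t += dt
    return J + g(x)                    # add the terminal cost

# Placeholder problem: scalar dynamics f(x,u) = u, quadratic costs, constant input.
J = finite_horizon_cost(
    f=lambda x, u: np.array([u]),
    ell=lambda x, u: x[0] ** 2 + u ** 2,
    g=lambda x: x[0] ** 2,
    x0=[1.0], u_signal=lambda t: -0.5, T=2.0)
print(J)
```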

SLIDE 6

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 7

Value Functions

∙ The value function specifies the best possible value of the cost functional starting from each state (and possibly time)
  $$V(x) = \inf_{u(\cdot) \in \mathfrak{U}} J(x, u(\cdot)) \qquad \text{or} \qquad V(x, t) = \inf_{u(\cdot) \in \mathfrak{U}} J(x, t, u(\cdot))$$
∙ The infimum may not be achievable
∙ If the infimum is attained then the (possibly non-unique) optimal input is often designated $u^*(\cdot)$, and sometimes the corresponding optimal trajectory is designated $x^*(\cdot)$
∙ Intuitively, to find the best trajectory from a point $x$, go to a neighbour $\hat{x}$ of $x$ which minimizes the sum of the cost from $x$ to $\hat{x}$ and the cost to go from $\hat{x}$
∙ This intuition is formalized in the dynamic programming principle

SLIDE 8

Dynamic Programming Principle

∙ For concreteness, we assume a finite horizon objective with horizon $T$, running cost $\ell(x, u)$ and terminal cost $g(x)$
∙ Dynamic Programming Principle (DPP): for each $h > 0$ small enough that $t + h < T$
  $$V(x, t) = \inf_{u(\cdot) \in \mathfrak{U}} \left[ \int_t^{t+h} \ell(x(s), u(s))\, ds + V(x(t+h), t+h) \right]$$
∙ A similar DPP can be formulated for other objective functionals
∙ Proof [Evans, chapter 10.3.2] in two parts: for any $\epsilon > 0$
  ∙ Show that $V(x, t) \le \inf_{u(\cdot)} \left[ \int_t^{t+h} \ell(x(s), u(s))\, ds + V(x(t+h), t+h) \right] + \epsilon$
  ∙ Show that $V(x, t) \ge \inf_{u(\cdot)} \left[ \int_t^{t+h} \ell(x(s), u(s))\, ds + V(x(t+h), t+h) \right] - \epsilon$
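To make the principle concrete, here is a minimal numerical sketch (an illustration added to these notes, not from the slides): it applies the same backward recursion, $V(x, t) \approx \min_u \left[ h\,\ell(x,u) + V(x + h f(x,u), t+h) \right]$, on a 1D state grid with linear interpolation; the dynamics, costs, grid and input set are all placeholder assumptions.

```python
import numpy as np

# Placeholder 1D problem: dx/dt = u, u in {-1, 0, 1}, running cost ell = x^2 + u^2/10,
# terminal cost g = x^2, horizon T, Euler time step h.
xs = np.linspace(-2.0, 2.0, 201)          # state grid
us = np.array([-1.0, 0.0, 1.0])           # coarse input grid
T, h = 1.0, 0.01

f = lambda x, u: u
ell = lambda x, u: x**2 + 0.1 * u**2
V = xs**2                                  # V(x, T) = g(x)

for _ in range(int(round(T / h))):         # march backward from t = T to t = 0
    # For each candidate input, cost-to-go = h*ell + V at the Euler successor state.
    candidates = [h * ell(xs, u) + np.interp(xs + h * f(xs, u), xs, V) for u in us]
    V = np.min(np.stack(candidates), axis=0)   # DPP: take the best input pointwise

print("V(0, 0) ≈", V[np.argmin(np.abs(xs))])
```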

SLIDE 9

Proof of DPP (upper bound part 1)

Consider $V(\hat{x}, t)$
∙ Choose any $u_1(\cdot)$ and define the trajectory
  $$\dot{x}_1(s) = f(x_1(s), u_1(s)) \text{ for } s > t \quad \text{and} \quad x_1(t) = \hat{x}$$
∙ Fix $\epsilon > 0$ and choose $u_2(\cdot)$ such that
  $$V(x_1(t+h), t+h) + \epsilon \ge \int_{t+h}^T \ell(x_2(s), u_2(s))\, ds + g(x_2(T))$$
  where $\dot{x}_2(s) = f(x_2(s), u_2(s))$ for $s > t + h$ and $x_2(t+h) = x_1(t+h)$
∙ Define a new control
  $$u_3(s) = \begin{cases} u_1(s), & \text{if } s \in [t, t+h); \\ u_2(s), & \text{if } s \in [t+h, T] \end{cases}$$
  which gives rise to the trajectory $\dot{x}_3(s) = f(x_3(s), u_3(s))$ for $s > t$ and $x_3(t) = \hat{x}$

SLIDE 10

Proof of DPP (upper bound part 2)

∙ By uniqueness of solutions of ODEs
  $$x_3(s) = \begin{cases} x_1(s), & \text{if } s \in [t, t+h]; \\ x_2(s), & \text{if } s \in [t+h, T] \end{cases}$$
∙ Consequently
  $$\begin{aligned} V(\hat{x}, t) \le J(\hat{x}, t, u_3(\cdot)) &= \int_t^T \ell(x_3(s), u_3(s))\, ds + g(x_3(T)) \\ &= \int_t^{t+h} \ell(x_1(s), u_1(s))\, ds + \int_{t+h}^T \ell(x_2(s), u_2(s))\, ds + g(x_2(T)) \\ &\le \int_t^{t+h} \ell(x_1(s), u_1(s))\, ds + V(x_1(t+h), t+h) + \epsilon \end{aligned}$$
∙ Since $u_1(\cdot)$ was arbitrary, it must be that
  $$V(\hat{x}, t) \le \inf_{u(\cdot) \in \mathfrak{U}} \left[ \int_t^{t+h} \ell(x(s), u(s))\, ds + V(x(t+h), t+h) \right] + \epsilon$$

SLIDE 11

Proof of DPP (lower bound)

∙ Fix $\epsilon > 0$ and choose $u_4(\cdot)$ such that
  $$V(\hat{x}, t) \ge \int_t^T \ell(x_4(s), u_4(s))\, ds + g(x_4(T)) - \epsilon$$
  where $\dot{x}_4(s) = f(x_4(s), u_4(s))$ for $s > t$ and $x_4(t) = \hat{x}$
∙ From the definition of the value function
  $$V(x_4(t+h), t+h) \le \int_{t+h}^T \ell(x_4(s), u_4(s))\, ds + g(x_4(T))$$
∙ Consequently
  $$V(\hat{x}, t) \ge \int_t^{t+h} \ell(x_4(s), u_4(s))\, ds + V(x_4(t+h), t+h) - \epsilon \ge \inf_{u(\cdot) \in \mathfrak{U}} \left[ \int_t^{t+h} \ell(x(s), u(s))\, ds + V(x(t+h), t+h) \right] - \epsilon$$

SLIDE 12

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 13

A Formal Derivation of the Hamilton-Jacobi PDE (part 1)

∙ Assume that $V(x, t)$ is smooth
∙ Start from the rearranged DPP
  $$\inf_{u(\cdot) \in \mathfrak{U}} \left[ V(x(t+h), t+h) - V(x, t) + \int_t^{t+h} \ell(x(s), u(s))\, ds \right] = 0$$
∙ Divide through by $h > 0$
  $$\inf_{u(\cdot) \in \mathfrak{U}} \left[ \frac{V(x(t+h), t+h) - V(x, t)}{h} + \frac{1}{h} \int_t^{t+h} \ell(x(s), u(s))\, ds \right] = 0$$
∙ Let $h \to 0$
  $$\inf_{u(\cdot) \in \mathfrak{U}} \left[ \frac{d}{dt} V(x, t) + \ell(x(t), u(t)) \right] = 0$$
∙ Apply the chain rule to the first term
  $$\inf_{u(\cdot) \in \mathfrak{U}} \left[ D_t V(x, t) + D_x V(x, t) \cdot \frac{d}{dt} x(t) + \ell(x(t), u(t)) \right] = 0$$

SLIDE 14

A Formal Derivation of the Hamilton-Jacobi PDE (part 2)

∙ Introduce the system dynamics $\dot{x} = f(x, u)$
  $$\inf_{u(\cdot) \in \mathfrak{U}} \left[ D_t V(x, t) + D_x V(x, t) \cdot f(x(t), u(t)) + \ell(x(t), u(t)) \right] = 0$$
∙ Observe that the only dependence on $u(\cdot) \in \mathfrak{U}$ is through $u(t) = u \in \mathcal{U}$
  $$\inf_{u \in \mathcal{U}} \left[ D_t V(x, t) + D_x V(x, t) \cdot f(x, u) + \ell(x, u) \right] = 0$$
∙ If $\mathcal{U}$ is compact, the infimum becomes a minimum
∙ Arrive at the (time-dependent) Hamilton-Jacobi(-Bellman) PDE
  $$D_t V(x, t) + H(x, D_x V(x, t)) = 0 \quad \text{with Hamiltonian} \quad H(x, p) = \inf_{u \in \mathcal{U}} \left[ p \cdot f(x, u) + \ell(x, u) \right]$$
  and terminal conditions (choose $t = T$ in the definition of $V$) $V(x, T) = g(x)$
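Connecting to the numerical theme of the workshop, below is a rough sketch (added to these notes, not from the slides) of a first-order monotone marching scheme for this terminal value HJB PDE in the special case $f(x, u) = u$, $u \in [-1, 1]$, $\ell \equiv 0$, where $H(x, p) = -|p|$ and the exact viscosity solution is $V(x, t) = \min_{|y - x| \le T - t} g(y)$; the grid, horizon and terminal cost are placeholders.

```python
import numpy as np

# Assumed special case (illustration only): dx/dt = u, |u| <= 1, ell = 0,
# so H(x, p) = min_{|u|<=1} p*u = -|p| and the HJB PDE is V_t - |V_x| = 0.
xs = np.linspace(-2.0, 2.0, 401)
dx = xs[1] - xs[0]
T = 0.5
dt = 0.5 * dx                        # CFL condition dt <= dx for this scheme

V = np.abs(xs) - 1.0                 # terminal condition V(x, T) = g(x)

t = T
while t > 0:
    step = min(dt, t)
    # Upwind (monotone) approximation of |V_x| suited to this Hamiltonian.
    Vm = np.roll(V, 1); Vm[0] = V[0]        # V_{i-1}, boundary value copied
    Vp = np.roll(V, -1); Vp[-1] = V[-1]     # V_{i+1}, boundary value copied
    grad = np.maximum.reduce([(V - Vm) / dx, (V - Vp) / dx, np.zeros_like(V)])
    V = V - step * grad              # backward march: V(., t - dt) = V(., t) - dt*|V_x|
    t -= step

# Exact viscosity solution for comparison: V(x, 0) = min_{|y-x| <= T} g(y).
exact = np.maximum(np.abs(xs) - T, 0.0) - 1.0
print("max error:", np.max(np.abs(V - exact)))
```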

SLIDE 15

No Classical Solutions

∙ Unfortunately, even for smooth terminal conditions, running cost and dynamics, the solution of the HJ PDE may not remain differentiable for all time
∙ A rigorous derivation must take into account that the value function may not be differentiable, and that the optimal input and/or trajectory may not be unique or may not exist
∙ The search for well-posed weak solutions included the vanishing viscosity solution
  ∙ For $\epsilon > 0$, the semilinear or quasilinear parabolic PDE
    $$D_t V(x, t) + H(t, x, D_x V(x, t)) = \epsilon \Delta V(x, t)$$
    has a smooth solution for all time
  ∙ The vanishing viscosity solution is the limiting solution as $\epsilon \to 0$
  ∙ Unfortunately, it does not always exist

SLIDE 16

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 17

Viscosity Solutions

∙ Crandall & Lions (1983) propose the “viscosity solution”
  ∙ Under reasonable conditions there exists a unique viscosity solution
  ∙ Anywhere that $V$ is differentiable, it solves the HJ PDE in the classical sense
  ∙ If there exists a vanishing viscosity solution, then it is the same as the viscosity solution
∙ The original definition has been supplanted by an equivalent definition from Crandall, Evans & Lions (1984): $V(x, t)$ is a viscosity solution of the terminal value HJ PDE
  $$D_t V(x, t) + H(x, D_x V(x, t)) = 0 \qquad V(x, T) = g(x)$$
  if $V$ satisfies the terminal conditions and for each smooth $\phi(x, t)$
  ∙ if $V(x, t) - \phi(x, t)$ has a local maximum then
    $$D_t \phi(x, t) + H(x, D_x \phi(x, t)) \ge 0$$
  ∙ if $V(x, t) - \phi(x, t)$ has a local minimum then
    $$D_t \phi(x, t) + H(x, D_x \phi(x, t)) \le 0$$
∙ For an initial value HJ PDE, reverse the inequalities
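As a concrete illustration of the definition (an added example under simple assumptions, not from the slides), take $f(x, u) = u$ with $u \in [-1, 1]$, $\ell \equiv 0$ and $g(x) = |x|$, so that $H(x, p) = \min_{|u| \le 1} p \cdot u = -|p|$ and the value function is

$$V(x, t) = \min_{|y - x| \le T - t} |y| = \max(|x| - (T - t),\, 0).$$

Wherever $V$ is differentiable it satisfies $D_t V - |D_x V| = 0$ classically (both terms vanish in the flat region, and $1 - 1 = 0$ elsewhere). Along the kink $|x| = T - t$ (say $x > 0$), no smooth $\phi$ can touch $V$ from above, so the local maximum condition is vacuous there; a smooth $\phi$ touching from below has $(D_t \phi, D_x \phi) = (\lambda, \lambda)$ for some $\lambda \in [0, 1]$, and indeed $D_t \phi + H(x, D_x \phi) = \lambda - |\lambda| = 0 \le 0$, exactly as the local minimum condition requires.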

SLIDE 18

Assumptions and Bounds

∙ Assume that the dynamics, running cost and terminal cost are bounded and Lipschitz continuous: there exists a constant $C$ such that for fixed $u$
  $$\begin{aligned} |f(x, u)| &\le C & |f(x, u) - f(\hat{x}, u)| &\le C |x - \hat{x}| \\ |\ell(x, u)| &\le C & |\ell(x, u) - \ell(\hat{x}, u)| &\le C |x - \hat{x}| \\ |g(x)| &\le C & |g(x) - g(\hat{x})| &\le C |x - \hat{x}| \end{aligned}$$
∙ This assumption implies continuity properties for the Hamiltonian, but more generally we could assume such properties directly: there exists a constant $C$ such that
  $$|H(x, p) - H(x, \hat{p})| \le C |p - \hat{p}| \qquad |H(x, p) - H(\hat{x}, p)| \le C |x - \hat{x}| (1 + |p|)$$
∙ Then it can be shown that the value function is bounded and Lipschitz continuous: there exists a constant $C$ such that
  $$|V(x, t)| \le C \qquad |V(x, t) - V(\hat{x}, \hat{t})| \le C (|x - \hat{x}| + |t - \hat{t}|)$$

SLIDE 19

Proof: Value Function is the Viscosity Solution (terminal condition and local maximum part 1)

∙ From the definition of the value function and objective functional
  $$V(x, T) = \inf_{u(\cdot)} J(x, T, u(\cdot)) = \inf_{u(\cdot)} \left[ \int_T^T \ell(x(s), u(s))\, ds + g(x(T)) \right] = g(x)$$
∙ Choose smooth $\phi$ and assume that $V - \phi$ has a local maximum at $(\hat{x}, \hat{t})$
∙ Then we must show
  $$D_t \phi(\hat{x}, \hat{t}) + \min_{u \in \mathcal{U}} \left[ D_x \phi(\hat{x}, \hat{t}) \cdot f(\hat{x}, u) + \ell(\hat{x}, u) \right] \ge 0$$
∙ Since $V - \phi$ has a local maximum, choose $\delta > 0$ such that for all $|x - \hat{x}| + |t - \hat{t}| \le \delta$
  $$(V - \phi)(x, t) \le (V - \phi)(\hat{x}, \hat{t})$$
∙ The proof proceeds by contradiction: if the inequality is false then there exist $\hat{u} \in \mathcal{U}$ and $\xi > 0$ such that for all $|x - \hat{x}| + |t - \hat{t}| \le \delta$ we have
  $$D_t \phi(x, t) + D_x \phi(x, t) \cdot f(x, \hat{u}) + \ell(x, \hat{u}) \le -\xi < 0$$
∙ Choose the constant control $u(\cdot) = \hat{u}$ and define the corresponding trajectory
  $$\dot{x}(s) = f(x(s), \hat{u}) \text{ for } s > \hat{t} \quad \text{and} \quad x(\hat{t}) = \hat{x}$$
SLIDE 20

Proof: Value Function is the Viscosity Solution (local maximum part 2)

∙ Continuing the contradiction argument for the case where $V - \phi$ has a local maximum at $(\hat{x}, \hat{t})$
∙ Choose $h \in (0, \delta]$ small enough that $|x(s) - \hat{x}| \le \delta$ for $s \in [\hat{t}, \hat{t} + h]$, so that
  $$D_t \phi(x(s), s) + D_x \phi(x(s), s) \cdot f(x(s), \hat{u}) + \ell(x(s), \hat{u}) \le -\xi$$
∙ Because $V - \phi$ has a local maximum
  $$V(x(\hat{t}+h), \hat{t}+h) - V(\hat{x}, \hat{t}) \le \phi(x(\hat{t}+h), \hat{t}+h) - \phi(\hat{x}, \hat{t}) = \int_{\hat{t}}^{\hat{t}+h} \frac{d}{ds} \phi(x(s), s)\, ds = \int_{\hat{t}}^{\hat{t}+h} D_t \phi(x(s), s) + D_x \phi(x(s), s) \cdot f(x(s), \hat{u})\, ds$$
∙ From the DPP
  $$V(\hat{x}, \hat{t}) \le \int_{\hat{t}}^{\hat{t}+h} \ell(x(s), \hat{u})\, ds + V(x(\hat{t}+h), \hat{t}+h)$$
∙ Therefore we arrive at the contradiction
  $$0 \le \int_{\hat{t}}^{\hat{t}+h} D_t \phi(x(s), s) + D_x \phi(x(s), s) \cdot f(x(s), \hat{u}) + \ell(x(s), \hat{u})\, ds \le -\xi h$$

SLIDE 21

Proof: Value Function is the Viscosity Solution (local minimum part 1)

∙ Choose smooth $\phi$ and assume that $V - \phi$ has a local minimum at $(\hat{x}, \hat{t})$
∙ Then we must show
  $$D_t \phi(\hat{x}, \hat{t}) + \min_{u \in \mathcal{U}} \left[ D_x \phi(\hat{x}, \hat{t}) \cdot f(\hat{x}, u) + \ell(\hat{x}, u) \right] \le 0$$
∙ Since $V - \phi$ has a local minimum, choose $\delta > 0$ such that for all $|x - \hat{x}| + |t - \hat{t}| \le \delta$
  $$(V - \phi)(x, t) \ge (V - \phi)(\hat{x}, \hat{t})$$
∙ The proof proceeds by contradiction: if the inequality is false then there exists $\xi > 0$ such that for all $\hat{u} \in \mathcal{U}$ and $|x - \hat{x}| + |t - \hat{t}| \le \delta$ we have
  $$D_t \phi(x, t) + D_x \phi(x, t) \cdot f(x, \hat{u}) + \ell(x, \hat{u}) \ge \xi > 0$$
∙ For any control $u(\cdot) \in \mathfrak{U}$, define the corresponding trajectory $\dot{x}(s) = f(x(s), u(s))$ for $s > \hat{t}$ with $x(\hat{t}) = \hat{x}$, and choose $h \in (0, \delta]$ small enough that $|x(s) - \hat{x}| \le \delta$ for $s \in [\hat{t}, \hat{t} + h]$ (possible since $f$ is bounded)

SLIDE 22

Proof: Value Function is the Viscosity Solution (local minimum part 2)

∙ Continuing the contradiction argument for the case where $V - \phi$ has a local minimum at $(\hat{x}, \hat{t})$
∙ Because $V - \phi$ has a local minimum
  $$V(x(\hat{t}+h), \hat{t}+h) - V(\hat{x}, \hat{t}) \ge \phi(x(\hat{t}+h), \hat{t}+h) - \phi(\hat{x}, \hat{t}) = \int_{\hat{t}}^{\hat{t}+h} \frac{d}{ds} \phi(x(s), s)\, ds = \int_{\hat{t}}^{\hat{t}+h} D_t \phi(x(s), s) + D_x \phi(x(s), s) \cdot f(x(s), u(s))\, ds$$
∙ From the DPP we can choose a control $u(\cdot) \in \mathfrak{U}$ such that
  $$V(\hat{x}, \hat{t}) \ge \int_{\hat{t}}^{\hat{t}+h} \ell(x(s), u(s))\, ds + V(x(\hat{t}+h), \hat{t}+h) - \frac{\xi h}{2}$$
∙ Therefore we arrive at the contradiction
  $$\frac{\xi h}{2} \ge \int_{\hat{t}}^{\hat{t}+h} D_t \phi(x(s), s) + D_x \phi(x(s), s) \cdot f(x(s), u(s)) + \ell(x(s), u(s))\, ds \ge \xi h$$

SLIDE 23

Synthesizing an Optimal Control

∙ Given the (viscosity) solution $V(x, t)$, the optimal control is
  $$u^*(x, t) \in \arg\min_{u \in \mathcal{U}} \left[ D_x V(x, t) \cdot f(x, u) + \ell(x, u) \right]$$
∙ Such a control is called a time-dependent feedback control since it depends on the current time and state
∙ The optimal choice may not be unique
∙ Issues arise when $V(x, t)$ is not differentiable, the gradient is zero and/or the Hamiltonian is (locally) independent of the input
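A rough sketch of how such a feedback might be extracted from a value function stored on a grid (an illustration with placeholder dynamics, cost and value function, not from the slides): approximate $D_x V$ by finite differences, then take the argmin over a sampled set of inputs.

```python
import numpy as np

def feedback_from_value(V, xs, ts, f, ell, us):
    """Return u*(x, t) extracted from a gridded value function V[i, k] ~ V(xs[i], ts[k])."""
    DxV = np.gradient(V, xs, axis=0)                    # finite-difference gradient in x
    def u_star(x, t):
        i = np.argmin(np.abs(xs - x))                   # nearest grid node (crude)
        k = np.argmin(np.abs(ts - t))
        scores = [DxV[i, k] * f(x, u) + ell(x, u) for u in us]
        return us[int(np.argmin(scores))]               # argmin over sampled inputs
    return u_star

# Placeholder example: dx/dt = u, ell = x^2 + u^2, and a made-up quadratic V.
xs = np.linspace(-2, 2, 81); ts = np.linspace(0, 1, 11)
V = xs[:, None] ** 2 * (1 + ts[None, :])                # stand-in for a computed V(x, t)
u_star = feedback_from_value(V, xs, ts, f=lambda x, u: u,
                             ell=lambda x, u: x**2 + u**2,
                             us=np.linspace(-1, 1, 21))
print(u_star(0.5, 0.0))
```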

SLIDE 24

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 25

Hamilton-Jacobi Equations for Discounted Infinite Horizon

∙ Given discount factor $\lambda > 0$ and running cost $\ell$, the objective is
  $$J(x_0, u(\cdot)) = \int_0^\infty \ell(x(s), u(s)) e^{-\lambda s}\, ds$$
∙ The value function $V(x) = \inf_{u(\cdot) \in \mathfrak{U}} J(x, u(\cdot))$ satisfies the dynamic programming principle
  $$V(x) = \inf_{u(\cdot) \in \mathfrak{U}} \left[ \int_0^h \ell(x(s), u(s)) e^{-\lambda s}\, ds + V(x(h)) e^{-\lambda h} \right]$$
  and the static HJ PDE
  $$\lambda V(x) - \min_{u \in \mathcal{U}} \left[ D_x V(x) \cdot f(x, u) + \ell(x, u) \right] = 0 \quad \text{for } x \in \mathbb{R}^{d_x}$$
∙ Another relatively well behaved problem
  ∙ Similar results to the finite horizon problem: the viscosity solution $V(x)$ is bounded and continuous but not necessarily differentiable
  ∙ The optimal feedback input is time-independent
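One common discretization of this problem (sketched here as an illustration under placeholder assumptions, not prescribed by the slides) is the semi-Lagrangian fixed point iteration $V(x) \leftarrow \min_u \left[ h\,\ell(x,u) + e^{-\lambda h} V(x + h f(x,u)) \right]$, which is a contraction with factor $e^{-\lambda h}$.

```python
import numpy as np

# Placeholder 1D discounted problem: dx/dt = u, ell = x^2 + u^2, lambda = 1.
xs = np.linspace(-2.0, 2.0, 201)
us = np.linspace(-1.0, 1.0, 21)
lam, h = 1.0, 0.05

f = lambda x, u: u
ell = lambda x, u: x**2 + u**2

V = np.zeros_like(xs)
for _ in range(2000):                         # fixed-point iteration (contraction factor e^{-lam*h})
    candidates = [h * ell(xs, u) + np.exp(-lam * h) * np.interp(xs + h * f(xs, u), xs, V)
                  for u in us]
    V_new = np.min(np.stack(candidates), axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V(0) ≈", V[np.argmin(np.abs(xs))])
```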

SLIDE 26

Hamilton-Jacobi Equations for Minimum Time

∙ Given target $\mathcal{T}$, the objective is
  $$J(x_0, u(\cdot)) = \begin{cases} \min\{ t \mid x(t) \in \mathcal{T} \}, & \text{if } \{ t \mid x(t) \in \mathcal{T} \} \neq \emptyset; \\ +\infty, & \text{otherwise} \end{cases}$$
∙ Let $\Omega = \{ x \mid V(x) < \infty \}$ be the set of states that give rise to trajectories which can reach the target set in finite time
∙ The value function $V(x) = \inf_{u(\cdot) \in \mathfrak{U}} J(x, u(\cdot))$ satisfies the dynamic programming principle for $x \in \Omega$
  $$V(x) = \inf_{u(\cdot) \in \mathfrak{U}} \left[ h + V(x(h)) \right] \quad \text{if } h < V(x)$$
  and the static boundary value HJ PDE
  $$H(x, D_x V(x)) = \min_{u \in \mathcal{U}} \left[ D_x V(x) \cdot f(x, u) + 1 \right] = 0 \quad \text{for } x \in \Omega \setminus \mathcal{T}, \qquad V(x) = 0 \quad \text{for } x \in \mathcal{T}$$
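As a quick illustration of this PDE (an added example, not from the slides), consider the simple dynamics $f(x, u) = u$ with $\mathcal{U} = \{ u : |u| \le 1 \}$. Then $\min_{|u| \le 1} D_x V(x) \cdot u = -|D_x V(x)|$, and the minimum time PDE reduces to the Eikonal equation

$$|D_x V(x)| = 1 \quad \text{for } x \in \Omega \setminus \mathcal{T}, \qquad V(x) = 0 \quad \text{for } x \in \mathcal{T},$$

whose viscosity solution is the distance to the target set.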

SLIDE 27

Small Time Local Controllability and the Static HJ PDE

∙ A system is small time locally controllable (STLC) at a state $x$ if the set of states which give rise to trajectories which reach $x$ contains $x$ in its interior for all positive times
  ∙ Intuitively, the system can move in any direction
  ∙ Many important types of system are not STLC
∙ If the dynamics are STLC everywhere then the static HJ PDE is relatively well behaved: the viscosity solution $V(x)$ is bounded and continuous (but not necessarily differentiable) and $\Omega = \mathbb{R}^{d_x}$
∙ If the dynamics are not STLC then there may not be a bounded continuous viscosity solution which solves the PDE and/or $\Omega$ must be determined

SLIDE 28

Disturbance Parameters

Sometimes the dynamics are influenced by additional parameters
$$\dot{x} = f(x, u, v)$$
where $v \in \mathbb{R}^{d_v}$ are not known and are not controllable. There are two typical ways of treating these disturbance inputs
∙ Stochastic: $v(t) \sim \mathcal{V}$ where $\mathcal{V}$ is some distribution
  ∙ Modelled by stochastic differential equations (SDEs) in the continuous case, or various probabilistic models in discrete settings (Markov chains, discrete state Poisson processes, etc.)
  ∙ Optimal control of SDEs leads to Fokker-Planck or Kolmogorov PDEs: second order versions of the HJ PDE
∙ Bounded value: $v(t) \in \mathcal{V}$ where $\mathcal{V} \subseteq \mathbb{R}^{d_v}$ is a specified set
  ∙ Modelled by standard ODEs with multiple inputs
  ∙ Robust or worst-case treatment of the disturbance input is modelled by two player zero sum games and HJ PDEs with nonconvex Hamiltonians
SLIDE 29

Dynamics, Objective Functional and Player Knowledge in Differential Games

∙ Dynamics and objective functional are almost the same as in the single input case; for example
  $$\dot{x}(t) = f(x(t), u(t), v(t)) \qquad J(x(t), t, u(\cdot), v(\cdot)) = \int_t^T \ell(x(s), u(s), v(s))\, ds + g(x(T))$$
  ∙ The control input $u(\cdot) \in \mathfrak{U}$ attempts to minimize
  ∙ The disturbance input $v(\cdot) \in \mathfrak{V}$ attempts to maximize
∙ In a differential game setting, how much does each player know about the other’s choice of input?
  ∙ A non-anticipative strategy allows one player to know the other player’s current input value
  ∙ However, the player with the additional knowledge must declare their strategy (reaction to every input) in advance
  ∙ For example, the disturbance can be given the advantage by permitting it a non-anticipative strategy $\gamma$
    $$\gamma \in \Gamma(t) = \left\{ \zeta : \mathfrak{U} \to \mathfrak{V} \,\middle|\, u(r) = \hat{u}(r) \text{ for almost every } r \in [t, T] \implies \zeta[u](r) = \zeta[\hat{u}](r) \text{ for almost every } r \in [t, T] \right\}$$
SLIDE 30

Hamilton-Jacobi(-Isaacs) Equations for Differential Games

∙ The value function is then an optimization over the appropriate strategy and input signal; for example
  $$V(x, t) = \sup_{\gamma \in \Gamma(t)} \inf_{u(\cdot) \in \mathfrak{U}} J(x, t, u(\cdot), \gamma[u(\cdot)](\cdot))$$
∙ This choice is called the upper value function because the maximizing disturbance is given the advantage of the non-anticipative strategy
  ∙ A dual lower value function can be defined
  ∙ If the upper and lower value functions are equivalent, then both optimal inputs can be synthesized without strategies as pure state feedback
∙ The value function satisfies the DPP
  $$V(x, t) = \sup_{\gamma \in \Gamma(t)} \inf_{u(\cdot) \in \mathfrak{U}} \left[ \int_t^{t+h} \ell(x(s), u(s), \gamma[u](s))\, ds + V(x(t+h), t+h) \right]$$
  and the HJ PDE
  $$D_t V(x, t) + H(x, D_x V(x, t)) = 0 \qquad H(x, p) = \min_{u \in \mathcal{U}} \max_{v \in \mathcal{V}} \left[ p \cdot f(x, u, v) + \ell(x, u, v) \right]$$
∙ The optimization in the Hamiltonian requires no special treatment of strategies, but it is nonconvex

SLIDE 31

Fokker-Planck or Kolmogorov Equations for Optimal Stochastic Control

∙ For system dynamics given by the (Itô) stochastic ordinary differential equation (SDE)
  $$dx(t) = f(x(t), u(t))\, dt + \sigma(x(t))\, dW(t)$$
  where the (controlled) “drift term” $f$ is the same as in the deterministic ODE case, and the “diffusion term” providing the stochastic disturbance combines $\sigma : \mathbb{R}^{d_x} \to \mathbb{R}^{d_x \times d_W}$ and a $d_W$ dimensional Wiener process $W(t)$
∙ For the finite horizon objective, the value function satisfies a Fokker-Planck or backward Kolmogorov PDE
  $$D_t V(x, t) + \min_{u \in \mathcal{U}} \left[ D_x V(x, t) \cdot f(x, u) + \ell(x, u) \right] + \tfrac{1}{2} \operatorname{Tr}\!\left[ \sigma(x) \sigma^T(x) D_x^2 V(x, t) \right] = 0$$
∙ If $d_W = d_x$ and $\sigma$ is full rank then the PDE is semilinear or quasilinear and under mild assumptions has a classical solution
∙ Otherwise the PDE is degenerate parabolic and a viscosity solution is the appropriate weak solution
∙ Note that the solution evolution is no longer governed entirely by characteristics
SLIDE 32

Other Control Applications with HJ PDEs

∙ State estimation / observation
  ∙ In most real systems we can only observe sensor outputs—the true state is not directly observable
  ∙ State estimation can be formulated as various types of HJ PDE, depending on the noise model
  ∙ Optimal control subject to state uncertainty can be formulated as an infinite dimensional HJ equation
∙ Optimal stopping times
  ∙ In some problems the control (or disturbance) can choose the stopping time
  ∙ Can be formulated as a variational inequality; for example, for a finite horizon objective functional with stopping / terminal cost $g(x)$
    $$\max \left[ D_t V(x, t) + H(x, D_x V(x, t)),\; V(x, t) - g(x) \right] = 0$$
∙ Reachability
  ∙ Next set of slides
∙ Many more. . .

SLIDE 33

Outline

∙ Optimal control: models of system dynamics and objective functionals
∙ The value function and the dynamic programming principle
∙ A formal derivation of the Hamilton-Jacobi(-Bellman) equation
∙ Viscosity solutions and a rigorous derivation
∙ Other types of Hamilton-Jacobi equations in control
∙ Optimal control problems with analytic solutions
∙ References

SLIDE 34

Finite Horizon: LQR Formulation

∙ In the Linear Quadratic Regulator (LQR) problem
  ∙ The dynamics are linear
    $$\dot{x} = A x + B u \quad \text{with } u \in \mathcal{U} = \mathbb{R}^{d_u}$$
  ∙ The finite horizon objective is quadratic
    $$J(x, t, u(\cdot)) = x^T(T) Q_f x(T) + \int_t^T x^T(s) Q x(s) + u^T(s) R u(s)\, ds$$
    where $Q_f = Q_f^T \ge 0$, $Q = Q^T \ge 0$, and $R = R^T > 0$ are the terminal state cost, the running state cost, and the input cost matrices respectively
∙ It can be shown that the value function is quadratic in the state
  $$V(x, t) = \inf_{u(\cdot) \in \mathfrak{U}} J(x, t, u(\cdot)) = x^T P(t) x$$

SLIDE 35

Finite Horizon: LQR Solution (part 1)

∙ An analytic solution can be constructed from a dynamic programming argument
∙ Start at state $\hat{x}$ and take $u(s) = \hat{u}$ fixed over a small time interval $s \in [t, t+h]$
∙ Cost incurred is
  $$\int_t^{t+h} x^T(s) Q x(s) + u^T(s) R u(s)\, ds \approx h (\hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u})$$
∙ State after that time period is $x(t+h) \approx \hat{x} + h (A \hat{x} + B \hat{u})$
∙ Value function at that new state is
  $$\begin{aligned} V(x(t+h), t+h) &= x^T(t+h) P(t+h) x(t+h) \\ &\approx (\hat{x} + h (A \hat{x} + B \hat{u}))^T (P(t) + h \dot{P}(t)) (\hat{x} + h (A \hat{x} + B \hat{u})) \\ &\approx \hat{x}^T P(t) \hat{x} + h \left( (A \hat{x} + B \hat{u})^T P(t) \hat{x} + \hat{x}^T P(t) (A \hat{x} + B \hat{u}) + \hat{x}^T \dot{P}(t) \hat{x} \right) \end{aligned}$$

SLIDE 36

Finite Horizon: LQR Solution (part 2)

∙ Dynamic programming derivation of the LQR solution
∙ Dynamic programming principle
  $$V(\hat{x}, t) = \min_{u(\cdot) \in \mathfrak{U}} \left[ \int_t^{t+h} x^T(s) Q x(s) + u^T(s) R u(s)\, ds + V(x(t+h), t+h) \right]$$
  $$\hat{x}^T P(t) \hat{x} = \min_{\hat{u} \in \mathbb{R}^{d_u}} \left[ h (\hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u}) + \hat{x}^T P(t) \hat{x} + h \left( (A \hat{x} + B \hat{u})^T P(t) \hat{x} + \hat{x}^T P(t) (A \hat{x} + B \hat{u}) + \hat{x}^T \dot{P}(t) \hat{x} \right) \right]$$
  $$0 = \min_{\hat{u} \in \mathbb{R}^{d_u}} h \left[ \hat{x}^T Q \hat{x} + \hat{u}^T R \hat{u} + (A \hat{x} + B \hat{u})^T P(t) \hat{x} + \hat{x}^T P(t) (A \hat{x} + B \hat{u}) + \hat{x}^T \dot{P}(t) \hat{x} \right]$$
∙ Set the derivative with respect to $\hat{u}$ to zero to find the optimal $\hat{u}$
  $$2 h (\hat{u}^T R + \hat{x}^T P(t) B) = 0 \qquad \Longrightarrow \qquad \hat{u}^* = -R^{-1} B^T P(t) \hat{x}$$
∙ Substitute $\hat{u}^*$ into the dynamic programming equation and solve for $\dot{P}(t)$ to find the Riccati differential equation
  $$-\dot{P}(t) = A^T P(t) + P(t) A - P(t) B R^{-1} B^T P(t) + Q$$
  with terminal condition $P(T) = Q_f$
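A minimal numerical sketch of this terminal value Riccati ODE (an illustration with a placeholder plant, not from the slides): integrate in the backward time variable $\tau = T - t$, so that $dP/d\tau = A^T P + P A - P B R^{-1} B^T P + Q$ with $P(\tau = 0) = Q_f$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder double-integrator plant and quadratic costs (illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
T = 10.0

def riccati_rhs(tau, p_flat):
    # Backward time tau = T - t, so dP/dtau = A'P + PA - P B R^{-1} B' P + Q.
    P = p_flat.reshape(2, 2)
    dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
    return dP.ravel()

sol = solve_ivp(riccati_rhs, (0.0, T), Qf.ravel(), max_step=0.01)
P0 = sol.y[:, -1].reshape(2, 2)      # P(t = 0), after integrating T time units backward
print(P0)
```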

SLIDE 37

(In)Finite Horizon: Steady State LQR

∙ In conclusion: the LQR value function is $V(x, t) = x^T P(t) x$ where $P(t)$ is the solution to a terminal value (matrix) ODE
∙ In practice, $P(t)$ and $\hat{u}^*$ rapidly converge to steady state values
  ∙ Solve the (continuous time) algebraic Riccati equation for the steady state $P$
    $$A^T P + P A - P B R^{-1} B^T P + Q = 0$$
  ∙ Time-independent state feedback given by
    $$u(t) = K x(t) \quad \text{where } K = -R^{-1} B^T P$$
∙ See Stanford’s EE363: Linear Dynamical Systems (Stephen Boyd) http://www.stanford.edu/class/ee363/
  ∙ This and several more derivations are given in lecture notes 4 (Continuous LQR)
  ∙ Other lectures discuss discrete time, the Kalman filter (e.g. LQR for state estimation), . . .
∙ See any textbook on “state space” / “modern” control
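For reference, a minimal numerical sketch of the steady state computation (an illustration, not from the slides; the A, B, Q, R below are placeholders for a double-integrator-like plant), using SciPy's continuous algebraic Riccati equation solver.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder plant (double integrator) and quadratic costs.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)      # steady state P: A'P + PA - PBR^{-1}B'P + Q = 0
K = -np.linalg.solve(R, B.T @ P)          # feedback gain, u = K x with K = -R^{-1} B' P
print("P =", P)
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))
```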

SLIDE 38

Minimum Time: Double Integrator

∙ The double integrator is one of the simplest systems which is not STLC
∙ System states are position $x_1$ and velocity $x_2$, and the input is the acceleration $u \in \mathcal{U} = [-1, +1]$
  $$f(x, u) = \begin{bmatrix} x_2 \\ u \end{bmatrix}$$
∙ If the target is the origin
  $$V(x) = \begin{cases} x_2 + \sqrt{4 x_1 + 2 x_2^2}, & \text{if } x_1 > -\tfrac{1}{2} x_2 |x_2|; \\ -x_2 + \sqrt{-4 x_1 + 2 x_2^2}, & \text{if } x_1 < -\tfrac{1}{2} x_2 |x_2|; \\ |x_2|, & \text{if } x_1 = -\tfrac{1}{2} x_2 |x_2| \end{cases}$$
∙ The dynamics are small time controllable at the origin, so the value function is continuous, but not Lipschitz continuous
  ∙ Optimal trajectories / characteristics travel along the curve where Lipschitz continuity fails
  ∙ If the target set is not a circle, the value function is discontinuous
∙ See Optimal Control, Athans & Falb (1966) or Applied Optimal Control, Bryson & Ho (1975) or many others
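A quick sketch that evaluates this minimum time formula (an illustration added to these notes; it assumes the reconstruction above with switching curve $x_1 = -\tfrac{1}{2} x_2 |x_2|$), together with two sanity checks that can be verified by hand.

```python
import numpy as np

def min_time_double_integrator(x1, x2):
    """Analytic minimum time to the origin for dx1/dt = x2, dx2/dt = u, |u| <= 1."""
    s = x1 + 0.5 * x2 * abs(x2)        # sign of s locates the state relative to the switching curve
    if s > 0:
        return x2 + np.sqrt(4.0 * x1 + 2.0 * x2 ** 2)
    elif s < 0:
        return -x2 + np.sqrt(-4.0 * x1 + 2.0 * x2 ** 2)
    else:
        return abs(x2)

# Sanity checks: rest-to-rest over unit distance takes 2 time units (accelerate, then brake),
# and starting on the switching curve the remaining time is |x2|.
print(min_time_double_integrator(1.0, 0.0))    # expect 2.0
print(min_time_double_integrator(0.5, -1.0))   # on the curve, expect 1.0
```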

SLIDE 39

Known Solutions for More Complex Dynamics (part 1)

∙ Unicycle model:
  $$x = \begin{bmatrix} x_1 \\ x_2 \\ \theta \end{bmatrix} \qquad \dot{x} = \begin{bmatrix} v \cos\theta \\ v \sin\theta \\ \omega \end{bmatrix}$$
  where $(x_1, x_2)$ is position in the plane, $\theta$ is heading, $v$ is linear velocity and $\omega$ is angular velocity
∙ Dubins’ car: unicycle with fixed positive linear velocity and bounded angular velocity
  ∙ Alternative viewpoint: unicycle with a minimum turn radius
  ∙ Minimum time to reach is generally discontinuous
  ∙ Extensive study of the combinatorial aspects of optimal paths in the robotics literature: optimal paths take CCC or CSC forms, where C is a minimum radius left or right arc of a circle (possibly of zero length) and S is a straight segment
  ∙ For example, see Bui, Boissonnat, Souères & Laumond, “Shortest Path Synthesis for Dubins Non-holonomic Robot,” ICRA 1994

SLIDE 40

Known Solutions for More Complex Dynamics (part 2)

∙ Game of two identical vehicles: collision avoidance between two adversarial Dubins’ cars
  ∙ Solved in a relative coordinate system, so the state space remains three dimensional
  ∙ The reachability problem becomes a two player zero sum differential game, which becomes an HJI PDE
  ∙ Analytic optimal trajectories can also be enumerated and points on the boundary of the reachable set determined
  ∙ Optimal characteristics both converge and diverge, causing challenges for Lagrangian approaches
  ∙ More details in a subsequent set of slides
  ∙ See Mitchell, “Games of Two Identical Vehicles,” Stanford University Department of Aeronautics and Astronautics Report (SUDAAR) 740 (2001)
∙ In summary, there is no shortage of toy optimal control problems with analytic solutions
∙ On the other hand, there is no shortage of real optimal control problems without analytic solutions

SLIDE 41

Viscosity Solution & Control References

∙ Crandall & Lions (1983): original publication
∙ Crandall, Evans & Lions (1984): current formulation
∙ Evans & Souganidis (1984): for differential games
∙ Crandall, Ishii & Lions (1992): “User’s guide” for viscosity solutions of degenerate elliptic and parabolic equations (dense reading)
∙ Viscosity Solutions & Applications, Springer Lecture Notes in Mathematics (1995), featuring Bardi, Crandall, Evans, Soner & Souganidis (Capuzzo-Dolcetta & Lions, eds.)
∙ Optimal Control & Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Bardi & Capuzzo-Dolcetta (1997)
∙ Partial Differential Equations, Evans (3rd ed, 2002)
