SLIDE 1

A near model-free method for solving the Hamilton-Jacobi-Bellman equation in high dimensions

Mathias Oster, Leon Sallandt, Reinhold Schneider

Technische Universität Berlin, ICODE Workshop on numerical solutions of HJB equations

10.01.2020

SLIDE 2

Motivation and Ingredients

Aim: Calculate optimal feedback laws (via HJB) for controlled PDEs.

Ingredients:

1. Reformulate the HJB equation as an operator equation.
2. Use Monte Carlo integration for least squares approximation.
3. Use a nonlinear, smooth ansatz space: HT/TT, i.e. tree-based tensors.

Mathias Oster (TU Berlin) Solve HJB in high-dimensions ICODE 2 / 21

SLIDE 3

Classical optimal control problem

Optimal control problem: find $u \in L^2(0, \infty)$ such that

$$\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$

subject to

$$\dot{x} = f(x, u), \qquad x \in \Omega \subset \mathbb{R}^n, \qquad x(0) = x_0.$$

1. Note that the differential equation can be high-dimensional.
2. Linear ODE and quadratic cost → Riccati equation.
3. Nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB) equation.
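The linear-quadratic case can be made concrete in one dimension: with $\dot{x} = ax + bu$ and the cost above, the quadratic ansatz $v(x) = p x^2$ reduces the stationary HJB equation to the scalar algebraic Riccati equation $\frac{1}{2} + 2ap - 2(b^2/\lambda)p^2 = 0$. A minimal sketch; the parameter values are illustrative, not from the talk:

```python
import math

# Scalar LQR: dynamics x' = a*x + b*u, cost ∫ 1/2 x^2 + lam/2 u^2 ds.
# With the ansatz v(x) = p*x^2, the HJB equation
#   min_u [ 1/2 x^2 + lam/2 u^2 + v'(x)(a x + b u) ] = 0
# reduces to the algebraic Riccati equation 1/2 + 2*a*p - 2*(b^2/lam)*p^2 = 0.
a, b, lam = 1.0, 1.0, 0.1

# Positive root of the Riccati equation (stabilizing solution):
p = lam * (a + math.sqrt(a**2 + b**2 / lam)) / (2 * b**2)

# Feedback u = -(1/lam) * b * v'(x) = gain * x:
gain = -2 * p * b / lam

closed_loop = a + b * gain                       # must be negative (stable)
residual = 0.5 + 2 * a * p - 2 * (b**2 / lam) * p**2
```

The positive root is the one that yields a stable closed loop, which is why it is selected.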

SLIDE 4

Feedback control problem

Define a feedback law $\alpha(x(t)) = u(t)$ and rephrase:

$$\min_\alpha J_\alpha(x) = \min_\alpha \int_0^\infty \underbrace{\frac{1}{2}\|x(s, \alpha)\|_{\mathbb{R}^n}^2 + \frac{\lambda}{2}|(\alpha(x))(s)|^2}_{=: r_\alpha(x)} \, ds.$$

Our goal: find an optimal feedback law $\alpha^*(x) = u$. Define the value function

$$v(x) := \inf_\alpha J_\alpha(x) \in \mathbb{R}.$$

Idea: if $v$ is differentiable, the feedback law is given by

$$\alpha(x) = -\frac{1}{\lambda} D_x v(x) \circ D_u f(x, u)$$

(easy to calculate!).

SLIDE 5

The HJB equation

The value function obeys

$$\inf_\alpha \left[ f(x, \alpha(x)) \cdot \nabla v(x) + r_\alpha(x) \right] = 0.$$

The HJB equation is highly nonlinear and potentially high-dimensional! But for a fixed policy $\alpha(x)$ it reduces to a linear equation: defining $\mathcal{L}_\alpha := -f(x, \alpha) \cdot \nabla$, we get

$$\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0.$$

SLIDE 6

Method of characteristics

Linearized HJB: $\mathcal{L}_\alpha v_\alpha(x) - r_\alpha(x) = 0$. Using the method of characteristics we obtain

$$\dot{x}(t) = f(x, \alpha), \qquad v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$

which we call the Bellman-like equation.
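On a toy system where everything is explicit, the Bellman-like identity can be checked numerically. A sketch assuming the dynamics $\dot{x} = -x$ with the policy already inserted, $r(x) = \frac{1}{2}x^2$, and the known value function $v(x) = x^2/4$; all concrete choices are illustrative:

```python
import numpy as np

# Check  v(x(0)) = ∫_0^tau r(x(t)) dt + v(x(tau))  on the toy dynamics
# x' = -x (policy already inserted), r(x) = 1/2 x^2, v(x) = x^2/4.
def flow(x0, t):
    return x0 * np.exp(-t)        # exact solution of x' = -x

def r(x):
    return 0.5 * x**2

def v(x):
    return 0.25 * x**2

x0, tau = 1.3, 0.7
ts = np.linspace(0.0, tau, 2001)
vals = r(flow(x0, ts))

# Trapezoidal rule for the running-cost integral:
integral = float(np.sum(0.5 * (vals[1:] + vals[:-1])) * (ts[1] - ts[0]))

rhs = integral + v(flow(x0, tau))   # right-hand side of the identity
lhs = v(x0)                         # left-hand side
```

Since $v$ is the exact value function of this toy problem, both sides agree up to quadrature error.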

SLIDE 7

Reformulation as Operator Equation

Consider the Koopman operator

$$K_\tau^\alpha : L_{\mathrm{loc},\infty}(\Omega) \to L_{\mathrm{loc},\infty}(\Omega), \qquad K_\tau^\alpha[g](x) = g(x(\tau)).$$

Rewrite the Bellman-like equation: for all $x \in \Omega$,

$$v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),$$

as

$$(\mathrm{Id} - K_\tau^\alpha)[v](x) = \underbrace{\int_0^\tau K_t^\alpha r(x) \, dt}_{=: R_\tau^\alpha(x)}.$$

SLIDE 8

Policy iteration

Policy iteration uses a sequence of linearized HJB equations.

Algorithm (Policy iteration)

Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:

1. Find $v_{i+1}$ such that $(\mathrm{Id} - K_\tau^{\alpha_i}) v_{i+1}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0$.
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u)$.
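On the scalar LQR problem both steps can be carried out in closed form, which gives a compact sanity check: each policy-evaluation step is a scalar Lyapunov equation. This is an illustrative stand-in for the operator equation, with made-up numbers:

```python
import math

# Policy iteration for the scalar LQR problem x' = a*x + b*u with cost
# ∫ 1/2 x^2 + lam/2 u^2 ds.  For a linear policy u = k*x, the value
# function is v_k(x) = p*x^2 where p solves the Lyapunov equation
#   2*p*(a + b*k) + (1/2 + lam/2 * k^2) = 0.
a, b, lam = 1.0, 1.0, 0.1
k = -2.0                      # stabilizing initial feedback u = k*x

for _ in range(30):
    # Policy evaluation (scalar Lyapunov equation):
    p = -(0.5 + 0.5 * lam * k**2) / (2.0 * (a + b * k))
    # Policy improvement: u = -(1/lam) * b * v'(x)  =>  k = -2*p*b/lam
    k = -2.0 * p * b / lam

# Compare with the positive root of the algebraic Riccati equation:
p_riccati = lam * (a + math.sqrt(a**2 + b**2 / lam)) / (2 * b**2)
```

The iterates converge rapidly (policy iteration is a Newton-type method) to the Riccati solution, and each intermediate policy stays stabilizing.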

SLIDE 9

Least squares ansatz

Problem: we need to solve $(\mathrm{Id} - K_\tau^{\alpha_i}) v_{\alpha_{i+1}}(\cdot) - R_\tau^{\alpha_i}(\cdot) = 0$.

Idea: solve on a suitable ansatz space $S$:

$$v_{\alpha_{i+1}} = \operatorname*{arg\,min}_{v \in S} \underbrace{\left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2}_{= \int_\Omega |(\mathrm{Id} - K_\tau^{\alpha_i}) v(x) - R_\tau^{\alpha_i}(x)|^2 \, dx}.$$

SLIDE 10

Projected Policy iteration

Algorithm (Projected policy iteration)

Initialize with a stabilizing feedback $\alpha_0$. Iterate until convergence:

1. Find
$$v_{i+1} = \operatorname*{arg\,min}_{v \in S} \left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2.$$
2. Update the policy according to $\alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u)$.
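Once $S$ is spanned by finitely many basis functions, step 1 becomes a linear least-squares problem. A sketch of one projected policy-evaluation step on a 1D toy problem (dynamics $\dot{x} = -x$ with the policy already inserted, exact value function $x^2/4$; the basis, sample count, and all constants are illustrative):

```python
import numpy as np

# One projected policy-evaluation step as a linear least-squares fit.
# Ansatz space S: polynomials v(x) = sum_m c_m x^m on a 1D toy problem
# with dynamics x' = -x and running cost r(x) = 1/2 x^2.
rng = np.random.default_rng(2)
tau = 0.5

def transport(x):             # exact flow of x' = -x over time tau
    return x * np.exp(-tau)

def reward(x):                # R_tau(x) = ∫_0^tau 1/2 (x e^{-t})^2 dt
    return 0.25 * x**2 * (1.0 - np.exp(-2 * tau))

xs = rng.uniform(-1.0, 1.0, 500)
degrees = np.arange(5)        # basis 1, x, x^2, x^3, x^4

# Design matrix of (Id - K_tau) applied to each basis function:
Phi = xs[:, None] ** degrees - transport(xs)[:, None] ** degrees
coeffs, *_ = np.linalg.lstsq(Phi, reward(xs), rcond=None)
```

Since the exact value function $x^2/4$ lies in the ansatz space, the fit recovers the coefficient 1/4 on $x^2$ (the constant basis function is annihilated by $\mathrm{Id} - K_\tau$, so `lstsq` returns its minimum-norm coefficient, zero).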

SLIDE 11

Variational Monte-Carlo

Approximate by Monte Carlo quadrature:

$$\left\|(\mathrm{Id} - K_\tau^{\alpha_i}) v(\cdot) - R_\tau^{\alpha_i}(\cdot)\right\|_{L^2(\Omega)}^2 \approx \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2,$$

$$v_{n,s}^* = \operatorname*{arg\,min}_{v \in S} \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2.$$

Proposition ([Eigel, Schneider et al., 19])

Let $\varepsilon > 0$ be such that $\inf_{v_s \in S} \|v^* - v_s\|_{L^2(\Omega)}^2 \le \varepsilon$. Then

$$\mathbb{P}\left[\|v^* - v_{n,s}^*\|_{L^2(\Omega)}^2 > \varepsilon\right] \le c_1(\varepsilon)\, e^{-c_2(\varepsilon)\, n}$$

with $c_1, c_2 > 0$: exponential decay in the number of samples.
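The Monte Carlo step itself is just an empirical mean over uniform samples. A sketch estimating $\|g\|_{L^2}^2$ for a known test function $g$, a stand-in for the Bellman residual; everything here is illustrative:

```python
import numpy as np

# Monte Carlo estimate of ||g||^2_{L2(Omega)} for Omega = [0, 1] and a
# known test function g (stand-in for the Bellman residual).
rng = np.random.default_rng(0)

def g(x):
    return x**2 - x   # exact ||g||^2 = ∫_0^1 (x^2 - x)^2 dx = 1/30

n = 200_000
xs = rng.uniform(0.0, 1.0, n)
mc_estimate = float(np.mean(g(xs)**2))
exact = 1.0 / 30.0
```

With $n = 2 \cdot 10^5$ samples the relative error is already well below a percent, consistent with the $O(n^{-1/2})$ Monte Carlo rate.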

SLIDE 12

Solving the VMC equation

$$\operatorname*{arg\,min}_{v \in S} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R_\tau^{\alpha_i}(x_j)\right|^2.$$

1. $v(x_j)$: evaluate $v$ at the samples $x_j$.
2. $K_\tau^{\alpha_i} v(x_j)$: evaluate $v$ at the transported samples (transported with policy $\alpha_i$).
3. $R_\tau^{\alpha_i}(x_j)$: approximate the reward by the trapezoidal rule.

What do we need for solving the equation?

A model-free solution is possible; only a black-box solver for the ODE is needed.

What do we need for updating the policy?

We need $D_u f(x, u)$, i.e. the derivative of the right-hand side with respect to the control.
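All three evaluations go through a black-box time stepper. A sketch with a hand-rolled RK4 integrator on the toy dynamics $\dot{x} = -x$ (policy already inserted), $r(x) = \frac{1}{2}x^2$, and candidate $v(x) = x^2/4$; all choices are illustrative, and since this $v$ happens to be the exact value function, the empirical Bellman residual should vanish:

```python
import numpy as np

# Evaluating the three ingredients with a black-box ODE solver.
def f(x):
    return -x                    # closed-loop dynamics (policy inserted)

def rk4_step(x, h):              # one step of the black-box time stepper
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def r(x):
    return 0.5 * x**2

def v(x):                        # candidate from the ansatz space
    return 0.25 * x**2

tau, steps = 0.5, 500
h = tau / steps

x = 2.0                          # a sample x_j
traj = [x]
for _ in range(steps):
    traj.append(rk4_step(traj[-1], h))
traj = np.array(traj)

Kv = v(traj[-1])                 # K_tau v(x_j): v at the transported sample
# R_tau(x_j): trapezoidal rule along the trajectory
R = h * (0.5 * r(traj[0]) + r(traj[1:-1]).sum() + 0.5 * r(traj[-1]))
bellman_residual = v(x) - Kv - R   # (Id - K_tau) v(x_j) - R_tau(x_j)
```

Nothing here uses the right-hand side analytically: the stepper could be replaced by any simulator, which is the "near model-free" point.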

SLIDE 13

Possible ansatz spaces

- Full linear space of polynomials
- Low-rank tensor manifolds
- Deep neural networks

Here used: the low-rank tensor train (TT-tensor) manifold, which offers

- a Riemannian manifold structure,
- an explicit representation of the tangent space,
- convergence theory for optimization algorithms.

SLIDE 14

Tensor Trains

Consider one-dimensional polynomial bases $\Pi_i = (1, x_i, x_i^2, x_i^3, \dots, x_i^k)$ and the tensor product $\Pi = \bigotimes_{i=1}^n \Pi_i$.

$\dim(\Pi) = (k+1)^n$, which is huge if $n \gg 0$. Reduce the size of the ansatz space by considering a nonlinear subset $M \subset \Pi$.

[Tensor network diagram: $v(x)$ is given by a chain of cores $A_1, A_2, A_3, A_4$, each contracted with the polynomial features of $x_1, x_2, x_3, x_4$ and connected by ranks $r_1, r_2, r_3$.]
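Evaluating a TT-parametrized $v$ at a point needs only a sweep of small matrix-vector products, at cost $O(n\,k\,r^2)$ rather than $(k+1)^n$. A sketch with random cores; shapes, ranks, and helper names are made up:

```python
import numpy as np

# Evaluate v(x) in tensor-train format: cores A_i of shape
# (r_{i-1}, k+1, r_i), and v(x) = A_1[x_1] A_2[x_2] ... A_n[x_n],
# where A_i[x_i] contracts the middle mode with (1, x_i, ..., x_i^k).
rng = np.random.default_rng(1)

n, k = 4, 3
ranks = [1, 2, 3, 2, 1]            # boundary ranks are 1
cores = [rng.standard_normal((ranks[i], k + 1, ranks[i + 1]))
         for i in range(n)]

def tt_eval(cores, x):
    m = np.ones((1, 1))
    for i, core in enumerate(cores):
        feats = x[i] ** np.arange(core.shape[1])   # (1, x_i, ..., x_i^k)
        m = m @ np.tensordot(core, feats, axes=([1], [0]))
    return float(m[0, 0])

# Reference: contract the train into the full (k+1)^n coefficient
# tensor (feasible only because n is tiny here).
full = cores[0]
for core in cores[1:]:
    full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
full = np.squeeze(full, axis=(0, full.ndim - 1))

x = np.array([0.3, -1.2, 0.5, 2.0])
feats = [xi ** np.arange(k + 1) for xi in x]
ref = float(np.einsum('abcd,a,b,c,d->', full, *feats))
val = tt_eval(cores, x)
```

The sweep and the full contraction agree, while only the sweep scales to large $n$.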

SLIDE 15

Cost functional

Modify the cost functional:

$$R_N(v) = \frac{1}{n} \sum_{j=1}^n \left|(\mathrm{Id} - K_\tau^{\alpha_i}) v(x_j) - R(x_j)\right|^2 + \underbrace{|v(0)|^2 + |\nabla v(0)|^2}_{\text{vanishes in the exact case}} + \underbrace{\mu \|v\|_{H^1(\Omega)}^2}_{\text{regularizer}}.$$

SLIDE 16

Example: Schloegl-like equation

Consider a Schlögl-like system with Neumann boundary conditions, cf. [Dolgov, Kalise, Kunisch, 19]. Solve for $x \in \Omega = L^2(-1, 1)$:

$$\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,$$

subject to

$$\dot{x}(t) = \sigma \Delta x(t) + x(t)^3 + \chi_\omega u(t), \qquad x(0) = x_0,$$

where $\chi_\omega$ is the characteristic function of $\omega = [-0.4, 0.4]$. After discretization in space (finite differences):

$$\begin{pmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_n \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} x_1^3 \\ \vdots \\ x_n^3 \end{pmatrix} + u \begin{pmatrix} \vdots \\ 1 \\ \vdots \end{pmatrix},$$

the last vector being the discretized indicator $\chi_\omega$.
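The discretized system is straightforward to assemble; a sketch with a mirrored-ghost-point Neumann stencil, where the grid size, $\sigma$, and the variable names are illustrative placeholders:

```python
import numpy as np

# Finite-difference discretization of x' = sigma*Lap(x) + x^3 + chi_w*u
# on (-1, 1) with homogeneous Neumann boundary conditions.
n = 32
grid = np.linspace(-1.0, 1.0, n)
h = grid[1] - grid[0]
sigma = 0.2

# Second-difference Laplacian, mirrored at the ends (ghost points):
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = -2.0
    if i > 0:
        A[i, i - 1] = 1.0
    if i < n - 1:
        A[i, i + 1] = 1.0
A[0, 1] = 2.0      # ghost-point mirror at the left boundary
A[-1, -2] = 2.0    # ghost-point mirror at the right boundary
A *= sigma / h**2

chi = ((grid >= -0.4) & (grid <= 0.4)).astype(float)  # indicator of omega

def rhs(x, u):
    # A @ x: diffusion; x**3: reaction; chi * u: distributed control
    return A @ x + x**3 + chi * u
```

The mirrored stencil makes constant states lie in the kernel of the discrete Laplacian, which is the discrete analogue of the Neumann condition.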

SLIDE 17

Example: Schloegl-like equation

TT Degrees of Freedom

Full space: $5^{32}$ degrees of freedom. Reduced to ≈ 5000 in the TT representation.

SLIDE 18

Example: Schloegl-like equation

(a) Initial values $x_0$ and $x_1$ (profiles on $[-1, 1]$).

(b) Generated cost and squared Bellman (least squares) error; blue is Riccati, orange is $V_{L^2}$, green is $V_{H^1}$. Generated costs:

          x0     x1
Riccati   2.85   5.74
V_L2      2.15   2.88
V_H1      2.12   2.83

Figure: Generated cost and least squares error for the different initial values.

SLIDE 19

Example: Schloegl-like equation

(a) Generated controls over time, initial value $x_0$ (Riccati, $V_{L^2}$, $V_{H^1}$).

(b) Generated controls over time, initial value $x_1$ (Riccati, $V_{L^2}$, $V_{H^1}$).

Figure: The generated controls for different initial values.

SLIDE 20

What do we need for optimization

We only need

- a discretization of the flow $\Phi$ (black box),
- the derivative of the right-hand side $f(x, u)$ with respect to the control (easy if linear),
- the cost functional

to solve the equation and generate a feedback law.

Thank you for your attention

SLIDE 21

References and related work

Sergey Dolgov, Dante Kalise, and Karl Kunisch. A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. arXiv:1908.01533, Aug 2019.

Martin Eigel, Reinhold Schneider, Philipp Trunschke, and Sebastian Wolf. Variational Monte Carlo: bridging concepts of machine learning and high-dimensional partial differential equations. Advances in Computational Mathematics, Oct 2019.

Mathias Oster, Leon Sallandt, and Reinhold Schneider. Approximating the stationary Hamilton-Jacobi-Bellman equation by hierarchical tensor products, 2019.
