SLIDE 1

Stability of Feedback Equilibrium Solutions for Noncooperative Differential Games

Alberto Bressan

Department of Mathematics, Penn State University

Alberto Bressan (Penn State) Noncooperative Games 1 / 33

SLIDE 2

Differential games: the PDE approach

The search for equilibrium solutions to noncooperative differential games in feedback form leads to a nonlinear system of Hamilton-Jacobi PDEs for the value functions.

Main focus: existence and stability of the solutions to these PDEs.

SLIDE 3

Any system can be locally approximated by a linear one:

$$\dot x = f(x, u_1, u_2), \qquad \dot x = Ax + B_1 u_1 + B_2 u_2$$

Any cost functional can be locally approximated by a quadratic one.

Main issue: Assume that an equilibrium solution is found, to an approximating game with linear dynamics and quadratic costs. Does the original nonlinear game also have an equilibrium solution, close to the linear-quadratic one?

Two cases:
(i) finite time horizon
(ii) infinite time horizon

SLIDE 4

An example - approximating a Cauchy problem for a PDE

$$u_t = u_{xx}, \qquad u(0, x) = \varphi(x)$$

Approximate the initial data: $\varphi(x) \approx ax^2 + bx + c$

Conclude: $u(t, x) \approx ax^2 + bx + c + 2at$

CORRECT for $t \ge 0$, WRONG for $t < 0$
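The asymmetry between forward and backward time can be made concrete with a small sketch. It is an illustration under stated assumptions, not part of the original slides: the function names are hypothetical, and the perturbation of the data is taken to be a single Fourier mode $\sin(kx)$, which evolves as $e^{-k^2 t}\sin(kx)$ under $u_t = u_{xx}$.

```python
import math

def quadratic_solution(a, b, c, t, x):
    """u(t, x) = a x^2 + b x + c + 2 a t, the exact solution for quadratic data."""
    return a * x * x + b * x + c + 2.0 * a * t

def heat_residual(a, b, c, t, x, h=1e-3):
    """Central-difference estimate of u_t - u_xx for the quadratic solution."""
    u = lambda s, y: quadratic_solution(a, b, c, s, y)
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)
    return u_t - u_xx

def mode_amplification(k, t):
    """Factor multiplying the mode sin(k x) of the data at time t,
    since exp(-k^2 t) sin(k x) solves u_t = u_xx."""
    return math.exp(-k * k * t)

if __name__ == "__main__":
    # The quadratic formula solves the equation for every t, forward or backward.
    print("residual:", heat_residual(1.5, -2.0, 0.3, 0.7, 1.1))
    # An error in the data at frequency k is damped forward in time
    # and amplified without bound (as k grows) backward in time.
    for k in (1, 5, 10):
        print(k, mode_amplification(k, 0.1), mode_amplification(k, -0.1))
```

The quadratic formula itself is an exact solution for all $t$; what fails for $t < 0$ is the approximation step, because the high-frequency part of $\varphi(x) - (ax^2 + bx + c)$ is amplified by $e^{k^2 |t|}$.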

SLIDE 5

Noncooperative differential games with finite horizon

Finite Horizon Games

$x \in \mathbb{R}^n$: state of the system
$u_1, u_2$: controls implemented by the players

Dynamics:

$$\dot x(t) = f(x, u_1, u_2), \qquad x(\tau) = y$$

Goal of i-th player: maximize

$$J_i(\tau, y, u_1, u_2) \doteq \psi_i\big(x(T)\big) - \int_\tau^T L_i\big(x(t), u_1(t), u_2(t)\big)\, dt$$

= [terminal payoff] - [integral of a running cost]

SLIDE 6

Seek: Nash equilibrium solutions in feedback form $u_i = u_i^*(t, x)$

  • Given the strategy $u_2 = u_2^*(t, x)$ adopted by the second player, for every initial data $(\tau, y)$, the assignment $u_1 = u_1^*(t, x)$ is a feedback solution to the optimal control problem for the first player:

$$\max_{u_1(\cdot)} \left\{ \psi_1(x(T)) - \int_\tau^T L_1\big(x, u_1, u_2^*(t, x)\big)\, dt \right\}$$

subject to

$$\dot x = f\big(x, u_1, u_2^*(t, x)\big), \qquad x(\tau) = y$$

  • Similarly, $u_2 = u_2^*(t, x)$ should provide a solution to the optimal control problem for the second player, given that $u_1 = u_1^*(t, x)$.

SLIDE 7

The system of PDEs for the value functions

$V_i(\tau, y)$ = value function for the i-th player (= expected payoff, if game starts at $\tau, y$)

Assume:

$$\begin{cases} f(x, u_1, u_2) = f_1(x, u_1) + f_2(x, u_2) \\ L_i(x, u_1, u_2) = L_{i1}(x, u_1) + L_{i2}(x, u_2) \end{cases}$$

Optimal feedback controls:

$$u_i^* = u_i^*(t, x, \nabla V_i) = \arg\max_{\omega} \big\{ \nabla V_i(t, x) \cdot f_i(x, \omega) - L_{ii}(x, \omega) \big\}$$

The value functions satisfy a system of PDEs

$$\partial_t V_i + \nabla V_i \cdot f(x, u_1^*, u_2^*) = L_i(x, u_1^*, u_2^*), \qquad i = 1, 2$$

with terminal condition: $V_i(T, x) = \psi_i(x)$

SLIDE 8

Systems of Hamilton-Jacobi PDEs

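Finite horizon game

$$\begin{cases} \partial_t V_1 = H^{(1)}(x, \nabla V_1, \nabla V_2), \\ \partial_t V_2 = H^{(2)}(x, \nabla V_1, \nabla V_2), \end{cases} \qquad \begin{cases} V_1(T, x) = \psi_1(x) \\ V_2(T, x) = \psi_2(x) \end{cases}$$

(backward Cauchy problem, with terminal conditions)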

SLIDE 9

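Test well-posedness by locally linearizing the equations

$$\partial_t V_i = H^{(i)}(x, \nabla V_1, \nabla V_2), \qquad i = 1, 2$$

Perturbed solution: $V_i^{(\varepsilon)} = V_i + \varepsilon Z_i + o(\varepsilon)$

Differentiating $H^{(i)}(x, p_1, p_2)$, obtain a linear equation satisfied by $Z_i$:

$$\partial_t Z_i = \frac{\partial H^{(i)}}{\partial p_1}(x, \nabla V_1, \nabla V_2) \cdot \nabla Z_1 + \frac{\partial H^{(i)}}{\partial p_2}(x, \nabla V_1, \nabla V_2) \cdot \nabla Z_2$$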

SLIDE 10

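Freezing the coefficients at a point $(\bar x, \nabla V_1(\bar x), \nabla V_2(\bar x))$, one obtains a linear system with constant coefficients

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}_t + \sum_{j=1}^n A_j \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}_{x_j} = 0 \tag{1}$$

Each $A_j$ is a 2 × 2 matrix.

For a given vector $\xi = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n$, consider the matrix

$$A(\xi) \doteq \sum_j \xi_j A_j$$

Definition 1. The system (1) is hyperbolic if there exists a constant $C$ such that

$$\sup_{\xi \in \mathbb{R}^n} \big\| \exp iA(\xi) \big\| \le C$$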

SLIDE 11

Computing solutions in terms of the Fourier transform, which is an isometry on $L^2$, the above definition is motivated by

Theorem 1. The system (1) is hyperbolic if and only if the corresponding Cauchy problem is well posed in $L^2(\mathbb{R}^n)$.

$$\big\| Z(t) \big\|_{L^2} = \big\| \widehat Z(t) \big\|_{L^2} \le \sup_{\xi \in \mathbb{R}^n} \big\| \exp(-iA(\xi)) \big\| \cdot \big\| \widehat Z(0) \big\|_{L^2} = \sup_{\xi \in \mathbb{R}^n} \big\| \exp iA(\xi) \big\| \cdot \big\| Z(0) \big\|_{L^2}$$

Lemma 1 (necessary condition). If the system (1) is hyperbolic, then for every $\xi \in \mathbb{R}^n$ the matrix $A(\xi)$ has a basis of eigenvectors $r_1, r_2$, with real eigenvalues $\lambda_1, \lambda_2$ (not necessarily distinct).

Lemma 2 (sufficient condition). Assume that, for $|\xi| = 1$, the matrices $A(\xi)$ can be diagonalized in terms of a real, invertible matrix $R(\xi)$ continuously depending on $\xi$. Then the system (1) is hyperbolic.
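To see what Definition 1 and Lemma 1 rule out, the norm of $\exp(i\,sA)$ can be tabulated along a ray $s \mapsto sA$. The sketch below is only an illustration: the three 2 × 2 matrices are hypothetical test cases rather than matrices from a specific game, the Frobenius norm stands in for the operator norm, and the matrix exponential is a truncated Taylor series that is adequate only for moderate entries.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_i(A, terms=80):
    """Truncated Taylor series for exp(iA), for a real 2x2 matrix A
    with entries of moderate size (about 10 or less in absolute value)."""
    M = [[1j * A[i][j] for j in range(2)] for i in range(2)]
    total = [[1 + 0j, 0j], [0j, 1 + 0j]]
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = [[z / n for z in row] for row in mat_mul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

def frobenius_norm(X):
    return math.sqrt(sum(abs(z) ** 2 for row in X for z in row))

def norm_along_ray(A, s):
    """|| exp(i s A) ||. Since A(xi) is linear in xi, scaling A by s
    corresponds to evaluating at the frequency s * xi."""
    return frobenius_norm(exp_i([[s * a for a in row] for row in A]))

real_diagonalizable = [[0.0, 1.0], [1.0, 0.0]]   # eigenvalues +1 and -1
complex_eigenvalues = [[0.0, 1.0], [-1.0, 0.0]]  # eigenvalues +i and -i
jordan_block = [[0.0, 1.0], [0.0, 0.0]]          # eigenvalue 0, one eigenvector

if __name__ == "__main__":
    for s in (1.0, 2.0, 4.0, 8.0):
        print(s,
              norm_along_ray(real_diagonalizable, s),
              norm_along_ray(complex_eigenvalues, s),
              norm_along_ray(jordan_block, s))
```

In the first case the norm stays constant, in the second it grows exponentially in $s$, and in the third it grows linearly; only the first is compatible with a uniform bound $C$.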

SLIDE 12

A class of differential games

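dynamics: $\dot x = f_1(x, u_1) + f_2(x, u_2)$

payoffs:

$$J_i = \psi_i(x(T)) - \int_0^T \big[ L_{i1}(x, u_1) + L_{i2}(x, u_2) \big]\, dt, \qquad i = 1, 2$$

Value functions satisfy

$$\begin{cases} \partial_t V_1 + \nabla V_1 \cdot f(x, u_1^\sharp, u_2^\sharp) = L_1(x, u_1^\sharp, u_2^\sharp) \\ \partial_t V_2 + \nabla V_2 \cdot f(x, u_1^\sharp, u_2^\sharp) = L_2(x, u_1^\sharp, u_2^\sharp) \end{cases}$$

$$u_i^\sharp = u_i^\sharp(x, \nabla V_i) = \arg\max_{\omega} \big\{ \nabla V_i \cdot f_i(x, \omega) - L_{ii}(x, \omega) \big\}, \qquad i = 1, 2$$

$$V_1(T, x) = \psi_1(x), \qquad V_2(T, x) = \psi_2(x)$$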

SLIDE 13

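$f = (f_1, \ldots, f_n)$, $\nabla V_i = p_i = (p_{i1}, \ldots, p_{in})$

Evolution of a perturbation:

$$Z_{i,t} + \sum_{k=1}^n f_k\, Z_{i,x_k} + \sum_{k=1}^n \sum_{j=1}^2 \left( \nabla V_i \cdot \frac{\partial f}{\partial u_j} - \frac{\partial L_i}{\partial u_j} \right) \left( \frac{\partial u_j^\sharp}{\partial p_{1k}}\, Z_{1,x_k} + \frac{\partial u_j^\sharp}{\partial p_{2k}}\, Z_{2,x_k} \right) = 0$$

Maximality conditions $\Longrightarrow$

$$\nabla V_1 \cdot \frac{\partial f}{\partial u_1} - \frac{\partial L_1}{\partial u_1} = 0, \qquad \nabla V_2 \cdot \frac{\partial f}{\partial u_2} - \frac{\partial L_2}{\partial u_2} = 0$$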

SLIDE 14

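Evolution of a first order perturbation

$$\begin{pmatrix} Z_{1,t} \\ Z_{2,t} \end{pmatrix} + \sum_{k=1}^n A_k \begin{pmatrix} Z_{1,x_k} \\ Z_{2,x_k} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

where the 2 × 2 matrices $A_k$ are given by

$$A_k = \begin{pmatrix} f_k & \left( \nabla V_1 \cdot \dfrac{\partial f}{\partial u_2} - \dfrac{\partial L_1}{\partial u_2} \right) \dfrac{\partial u_2^\sharp}{\partial p_{2k}} \\[2ex] \left( \nabla V_2 \cdot \dfrac{\partial f}{\partial u_1} - \dfrac{\partial L_2}{\partial u_1} \right) \dfrac{\partial u_1^\sharp}{\partial p_{1k}} & f_k \end{pmatrix}$$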

SLIDE 15

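$$A(\xi) = \sum_{k=1}^n \begin{pmatrix} f_k \xi_k & \left( \nabla V_1 \cdot \dfrac{\partial f}{\partial u_2} - \dfrac{\partial L_1}{\partial u_2} \right) \dfrac{\partial u_2^\sharp}{\partial p_{2k}}\, \xi_k \\[2ex] \left( \nabla V_2 \cdot \dfrac{\partial f}{\partial u_1} - \dfrac{\partial L_2}{\partial u_1} \right) \dfrac{\partial u_1^\sharp}{\partial p_{1k}}\, \xi_k & f_k \xi_k \end{pmatrix}$$

HYPERBOLICITY $\Longrightarrow$ $A(\xi)$ has real eigenvalues for every $\xi$

$$v = (v_1, \ldots, v_n), \qquad v_k \doteq \left( \nabla V_1 \cdot \frac{\partial f}{\partial u_2} - \frac{\partial L_1}{\partial u_2} \right) \frac{\partial u_2^\sharp}{\partial p_{2k}}$$

$$w = (w_1, \ldots, w_n), \qquad w_k \doteq \left( \nabla V_2 \cdot \frac{\partial f}{\partial u_1} - \frac{\partial L_2}{\partial u_1} \right) \frac{\partial u_1^\sharp}{\partial p_{1k}}$$

HYPERBOLICITY $\Longrightarrow$ $(v \cdot \xi)(w \cdot \xi) \ge 0$ for all $\xi \in \mathbb{R}^n$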

SLIDE 16

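HYPERBOLICITY $\Longrightarrow$ $(v \cdot \xi)(w \cdot \xi) \ge 0$ for all $\xi \in \mathbb{R}^n$

TRUE if $v, w$ are linearly dependent, with same orientation.
FALSE if $v, w$ are linearly independent.

[Figure: two sketches of the vectors v and w together with a vector ξ]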
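The FALSE case can be checked constructively. Writing $\hat v = v/|v|$, $\hat w = w/|w|$ and $c = \hat v \cdot \hat w$, the choice $\xi = \hat v - \hat w$ gives $(v \cdot \xi)(w \cdot \xi) = -|v|\,|w|\,(1 - c)^2$, which is negative unless $v$ and $w$ point in the same direction. The sketch below implements this choice; the function names and the particular construction of $\xi$ are illustrative assumptions, not taken from the slides.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def violating_direction(v, w, tol=1e-12):
    """Return a vector xi with (v . xi)(w . xi) < 0, or None when no such
    xi exists (v or w vanishes, or w is a positive multiple of v)."""
    nv, nw = math.sqrt(dot(v, v)), math.sqrt(dot(w, w))
    if nv < tol or nw < tol:
        return None  # the product is identically zero
    xi = [a / nv - b / nw for a, b in zip(v, w)]
    if math.sqrt(dot(xi, xi)) < tol:
        return None  # same orientation: the product is >= 0 for every xi
    return xi

if __name__ == "__main__":
    v, w = (1.0, 0.0), (0.0, 1.0)          # linearly independent
    xi = violating_direction(v, w)
    print(xi, dot(v, xi) * dot(w, xi))      # negative product
    print(violating_direction((1.0, 2.0), (2.0, 4.0)))  # None
```

Note that the construction also returns a violating direction when $w$ is a negative multiple of $v$, in line with the requirement of "same orientation".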

SLIDE 17

In one space dimension, the Cauchy problem can be well posed for a large set of data.

In several space dimensions, generically the system is not hyperbolic, and the Cauchy problem is ill posed.

A.B., W. Shen, Small BV solutions of hyperbolic non-cooperative differential games, SIAM J. Control Optim. 43 (2004), 194–215.
A.B., W. Shen, Semi-cooperative strategies for differential games, Intern. J. Game Theory 32 (2004), 561–593.
A.B., Noncooperative differential games, Milan J. Math. 79 (2011), 357–427.

SLIDE 18

Differential games in infinite time horizon

Dynamics: $\dot x = f(x, u_1, u_2)$, $x(0) = x_0$

$u_1, u_2$: controls implemented by the players

Goal of i-th player: maximize

$$J_i \doteq \int_0^{+\infty} e^{-\gamma t}\, \Psi_i\big(x(t), u_1(t), u_2(t)\big)\, dt$$

(running payoff, exponentially discounted in time)

SLIDE 19

A special case

Dynamics: $\dot x = f(x) + g_1(x) u_1 + g_2(x) u_2$

Player i seeks to minimize:

$$J_i = \int_0^{\infty} e^{-\gamma t} \left( \phi_i(x(t)) + \frac{u_i^2(t)}{2} \right) dt$$

SLIDE 20

A system of PDEs for the value functions

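The value functions $V_1, V_2$ for the two players satisfy the system of H-J equations

$$\begin{cases} \gamma V_1 = (f \cdot \nabla V_1) - \frac{1}{2}(g_1 \cdot \nabla V_1)^2 - (g_2 \cdot \nabla V_1)(g_2 \cdot \nabla V_2) + \phi_1 \\[1ex] \gamma V_2 = (f \cdot \nabla V_2) - \frac{1}{2}(g_2 \cdot \nabla V_2)^2 - (g_1 \cdot \nabla V_1)(g_1 \cdot \nabla V_2) + \phi_2 \end{cases}$$

Optimal feedback controls: $u_i^*(x) = -\nabla V_i(x) \cdot g_i(x)$, $i = 1, 2$

nonlinear, implicit!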
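The feedback formula comes from minimizing, over player i's own control $u$, the part of the Hamiltonian that depends on it, namely $(\nabla V_i \cdot g_i)\, u + u^2/2$. A brute-force check of that one-variable minimization is sketched below; the function names, the grid, and the sample value of $\nabla V_i \cdot g_i$ are illustrative assumptions.

```python
def own_control_term(u, p_dot_g):
    """Part of player i's Hamiltonian that depends on its own control u:
    (grad V_i . g_i) * u + u^2 / 2, with p_dot_g = grad V_i . g_i."""
    return p_dot_g * u + 0.5 * u * u

def grid_minimizer(p_dot_g, lo=-10.0, hi=10.0, steps=20001):
    """Minimize own_control_term over a uniform grid on [lo, hi]."""
    best_u, best_val = lo, own_control_term(lo, p_dot_g)
    for n in range(steps):
        u = lo + (hi - lo) * n / (steps - 1)
        val = own_control_term(u, p_dot_g)
        if val < best_val:
            best_u, best_val = u, val
    return best_u

if __name__ == "__main__":
    p_dot_g = 1.7
    # The grid minimizer should agree with -p_dot_g up to the grid spacing.
    print(grid_minimizer(p_dot_g), -p_dot_g)
```

Substituting $u_i = -\nabla V_i \cdot g_i$ back into the Hamiltonian produces the terms $-\frac{1}{2}(g_i \cdot \nabla V_i)^2$ and $-(g_j \cdot \nabla V_i)(g_j \cdot \nabla V_j)$ of the system above.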

SLIDE 21

Linear - Quadratic games

Assume that the dynamics is linear:

$$\dot x = (Ax + b_0) + b_1 u_1 + b_2 u_2, \qquad x(0) = y$$

and the cost functions are quadratic:

$$J_i = \int_0^{+\infty} e^{-\gamma t} \left( a_i \cdot x + x^T P_i x + \frac{u_i^2}{2} \right) dt$$

Then the system of PDEs has a special solution of the form

$$V_i(x) = k_i + \beta_i \cdot x + x^T \Gamma_i x, \qquad i = 1, 2 \tag{$*$}$$

Optimal controls: $u_i^*(x) = -(\beta_i + 2 x^T \Gamma_i) \cdot b_i$

To find this solution, it suffices to determine the coefficients $k_i, \beta_i, \Gamma_i$ by solving a system of algebraic equations.
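As a minimal sketch of what "solving a system of algebraic equations" means, consider a scalar, fully symmetric game: $\dot x = a_0 x + b_0 + b(u_1 + u_2)$ with running cost $Rx + Sx^2 + u_i^2/2$ for both players, $b \neq 0$ and $S \ge 0$. These symmetry assumptions, the function names, and the parametrization $V(x) = c + \beta x + \frac{1}{2} k x^2$ (so that $V'(x) = kx + \beta$) are choices made for this illustration only. Inserting the ansatz into the H-J system and matching powers of $x$ gives one quadratic equation for $k$, then linear equations for $\beta$ and $c$.

```python
import math

def symmetric_lq_coefficients(a0, b0, b, gamma, R, S):
    """Coefficients (c, beta, k) of V(x) = c + beta*x + k*x^2/2 for a scalar
    symmetric game (assumption: both players share b, R, S; b != 0, S >= 0)."""
    # x^2 terms: (3/2) b^2 k^2 + (gamma/2 - a0) k - S = 0, take the root k >= 0
    qa, qb, qc = 1.5 * b * b, 0.5 * gamma - a0, -S
    k = (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)
    # x^1 terms: beta * (gamma - a0 + 3 b^2 k) = b0 k + R
    beta = (b0 * k + R) / (gamma - a0 + 3.0 * b * b * k)
    # x^0 terms: gamma * c = b0 beta - (3/2) b^2 beta^2
    c = (b0 * beta - 1.5 * b * b * beta * beta) / gamma
    return c, beta, k

def hj_residual(x, coeffs, a0, b0, b, gamma, R, S):
    """gamma V - [ f V' - (1/2)(b V')^2 - (b V')(b V') + phi ] at the point x."""
    c, beta, k = coeffs
    V = c + beta * x + 0.5 * k * x * x
    xi = beta + k * x
    f = a0 * x + b0
    phi = R * x + S * x * x
    return gamma * V - (f * xi - 0.5 * (b * xi) ** 2 - (b * xi) ** 2 + phi)

if __name__ == "__main__":
    params = dict(a0=0.3, b0=-0.2, b=0.8, gamma=1.0, R=0.5, S=1.2)
    coeffs = symmetric_lq_coefficients(**params)
    print("c, beta, k =", coeffs)
    print("residuals:", [hj_residual(x, coeffs, **params) for x in (-2.0, 0.0, 3.0)])
```

In the general, non-symmetric case the equations for the quadratic coefficients of the two players are coupled, and the scalar quadratic above is replaced by a system of two quadratic equations.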

SLIDE 22

Validity of linear-quadratic approximations ?

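Assume the dynamics is almost linear

$$\dot x = f_0(x) + g_1(x) u_1 + g_2(x) u_2 \approx (Ax + b_0) + b_1 u_1 + b_2 u_2, \qquad x(0) = y$$

and the cost functions are almost quadratic

$$J_i = \int_0^{+\infty} e^{-\gamma t} \left( \phi_i(x) + \frac{u_i^2}{2} \right) dt \approx \int_0^{+\infty} e^{-\gamma t} \left( a_i \cdot x + x^T P_i x + \frac{u_i^2}{2} \right) dt$$

Is it true that the nonlinear game has a feedback solution close to that of the linear-quadratic game?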

SLIDE 23

One-dimensional, linear-quadratic games in infinite time horizon

$$\dot x = (a_0 x + b_0) + b_1 u_1 + b_2 u_2$$

The ODE for the derivatives of the value functions $\xi_i = V_i'$ takes the form

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} \xi_1' \\ \xi_2' \end{pmatrix} = \begin{pmatrix} \psi_1(x, \xi_1, \xi_2) \\ \psi_2(x, \xi_1, \xi_2) \end{pmatrix}, \qquad A_{ij} = A_{ij}(x, \xi_1, \xi_2)$$

The map $(x, \xi_1, \xi_2) \mapsto \det A(x, \xi_1, \xi_2)$ is a homogeneous quadratic polynomial.

An affine solution exists: $\xi_1^*(x) = k_1 x + \beta_1$, $\xi_2^*(x) = k_2 x + \beta_2$

The map $x \mapsto \det A(x, \xi_1^*(x), \xi_2^*(x))$ is a quadratic polynomial.
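The slides do not list the entries $A_{ij}$. One way to obtain them, offered here only as an assumption, is to differentiate the H-J system for the special case $\dot x = f + g_1 u_1 + g_2 u_2$ with $f = a_0 x + b_0$, $g_i = b_i$ constant, which gives $A_{11} = A_{22} = (a_0 x + b_0) - b_1^2 \xi_1 - b_2^2 \xi_2$, $A_{12} = -b_2^2 \xi_1$, $A_{21} = -b_1^2 \xi_2$. The sketch below uses these hypothetical entries to check numerically that the determinant is quadratic along any affine curve $x \mapsto (k_1 x + \beta_1, k_2 x + \beta_2)$, and homogeneous of degree two when $b_0 = 0$.

```python
def matrix_A(x, xi1, xi2, a0, b0, b1, b2):
    """Assumed entries, obtained by differentiating the scalar H-J system
    with f = a0*x + b0 and constant g_i = b_i (not stated on the slides)."""
    diag = (a0 * x + b0) - b1 * b1 * xi1 - b2 * b2 * xi2
    return [[diag, -b2 * b2 * xi1],
            [-b1 * b1 * xi2, diag]]

def det_A(x, xi1, xi2, a0, b0, b1, b2):
    A = matrix_A(x, xi1, xi2, a0, b0, b1, b2)
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def third_difference(k1, beta1, k2, beta2, params, x, h=0.5):
    """Third forward difference of x -> det A along an affine curve.
    It vanishes exactly when the restriction is a polynomial of degree <= 2."""
    g = lambda s: det_A(s, k1 * s + beta1, k2 * s + beta2, *params)
    return g(x + 3 * h) - 3 * g(x + 2 * h) + 3 * g(x + h) - g(x)

if __name__ == "__main__":
    params = (0.4, 0.0, 1.0, 0.7)  # a0, b0, b1, b2 (hypothetical values)
    print(third_difference(0.6, -0.3, 0.9, 0.2, params, x=-1.0))
    # Homogeneity of degree two when b0 = 0:
    print(det_A(2.0, -1.0, 3.0, *params) * 4.0, det_A(4.0, -2.0, 6.0, *params))
```

The check applies to every affine curve, so it does not by itself identify the coefficients $k_i, \beta_i$ of the actual equilibrium.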

SLIDE 24

Stability w.r.t. perturbations

(A.B. - Khai Nguyen, 2016)

  • on a bounded interval $\Omega = [a, b]$ (positively invariant for the feedback dynamics)
  • on the whole real line

[Figure: graph of ξ*(x) in the (x, ξ) space, together with the region Γ⁻]

$$\Gamma^- = \big\{ (x, \xi_1, \xi_2)\,;\ \det A(x, \xi_1, \xi_2) \le 0 \big\}$$

SLIDE 25

Stability over a bounded interval

Easy case: $\det A(x, \xi_1^*(x), \xi_2^*(x)) \neq 0$ for all $x \in \mathbb{R}$.

$$\begin{pmatrix} \xi_1' \\ \xi_2' \end{pmatrix} = A^{-1}(x, \xi_1, \xi_2) \begin{pmatrix} \psi_1(x, \xi_1, \xi_2) \\ \psi_2(x, \xi_1, \xi_2) \end{pmatrix}$$

The linear-quadratic game has a 2-parameter family of Nash equilibrium solutions in feedback form. One is affine, the others are nonlinear.

All of the above solutions are stable w.r.t. small nonlinear perturbations of the dynamics and the cost functions.

[Figure: graph of ξ*(x) in the (x, ξ1, ξ2) space]

SLIDE 26

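Case 2: $\det A$ vanishes at two points $\bar x_1 < \bar x_2$

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} \xi_1' \\ \xi_2' \end{pmatrix} = \begin{pmatrix} \psi_1(x, \xi_1, \xi_2) \\ \psi_2(x, \xi_1, \xi_2) \end{pmatrix}$$

Equivalent system:

$$\begin{cases} A_{11}\, d\xi_1 + A_{12}\, d\xi_2 - \psi_1\, dx = 0 \\ A_{21}\, d\xi_1 + A_{22}\, d\xi_2 - \psi_2\, dx = 0 \end{cases}$$

Setting:

$$v \doteq \begin{pmatrix} -\psi_1 \\ A_{11} \\ A_{12} \end{pmatrix}, \qquad w \doteq \begin{pmatrix} -\psi_2 \\ A_{21} \\ A_{22} \end{pmatrix},$$

we seek continuously differentiable functions $x \mapsto (\xi_1(x), \xi_2(x))$ whose graph is obtained by concatenating trajectories of the system

$$\begin{pmatrix} \dot x \\ \dot \xi_1 \\ \dot \xi_2 \end{pmatrix} = v \times w = \begin{pmatrix} A_{11} A_{22} - A_{12} A_{21} \\ A_{22} \psi_1 - A_{12} \psi_2 \\ A_{11} \psi_2 - A_{21} \psi_1 \end{pmatrix}.$$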
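The following sketch checks the cross-product formula on sample numbers and illustrates why it is useful: the first component of $v \times w$ is $\det A$, so where $\det A \neq 0$ the ratios $\dot\xi_i / \dot x$ reproduce $A^{-1}\psi$, while where $\det A = 0$ the field has no $x$-component. The function names and the numerical values of $A$ and $\psi$ are hypothetical.

```python
def cross(v, w):
    """Cross product of two vectors in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def direction_field(A, psi):
    """(x_dot, xi1_dot, xi2_dot) = v x w, with v = (-psi1, A11, A12)
    and w = (-psi2, A21, A22)."""
    v = (-psi[0], A[0][0], A[0][1])
    w = (-psi[1], A[1][0], A[1][1])
    return cross(v, w)

if __name__ == "__main__":
    # Regular point: det A = 5, and (xi1', xi2') = A^{-1} psi.
    x_dot, xi1_dot, xi2_dot = direction_field([[2.0, 1.0], [1.0, 3.0]], (1.0, 2.0))
    print(x_dot, xi1_dot / x_dot, xi2_dot / x_dot)
    # Singular point: det A = 0, so the field has zero x-component.
    print(direction_field([[1.0, 2.0], [2.0, 4.0]], (1.0, 0.5)))
```

Parametrizing by the field $v \times w$ instead of by $x$ keeps the system well defined across the set where $\det A = 0$, which is what allows trajectories to be concatenated there.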

SLIDE 27

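$$\det \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = 0 \quad \text{on } \Sigma_1 \cup \Sigma_2$$

Under generic conditions on the coefficients of the linear-quadratic problem, there exist two curves $\gamma_1 \subset \Sigma_1$, $\gamma_2 \subset \Sigma_2$ such that

$v \times w = 0$ on $\gamma_1$ and on $\gamma_2$
$v \times w$ is vertical on $\Sigma_1 \setminus \gamma_1$ and on $\Sigma_2 \setminus \gamma_2$

[Figure: the surfaces Σ1, Σ2 with the curves γ1, γ2 and the graph of ξ*(x), in the (x, ξ1, ξ2) space]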

SLIDE 28

Three generic cases

[Figure: three panels showing the curves γ1, γ2, captioned "unique solution", "one-parameter family of solutions", and "two-parameter family of solutions"]

SLIDE 29

The saddle-saddle case

[Figure: sketch in the (x, ξ1, ξ2) space with the surfaces Σ1, Σ2, the curves γ1, γ2, the points x̄1, x̄2, the graph of ξ*(x), and further labels M and P]

Under generic assumptions, a unique solution exists

SLIDE 30

Stability under perturbations, on the entire real line

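(A.B., K. Nguyen, 2016)

$$\dot x = a_0 x + f_0(x) + \big(b_1 + h_1(x)\big) u_1 + \big(b_2 + h_2(x)\big) u_2$$

$$J_i = \int_0^{+\infty} e^{-\gamma t} \left( R_i x + S_i x^2 + \eta_i(x) + \frac{u_i^2}{2} \right) dt$$

Under generic assumptions on the coefficients $a_0, b_1, b_2, \ldots$, for any $\varepsilon > 0$ there exists $\delta > 0$ such that the following holds. If the perturbations satisfy

$$\|f_0\|_{C^2} + \|\eta_1\|_{C^2} + \|\eta_2\|_{C^2} + \|h_1\|_{C^1} + \|h_2\|_{C^1} \le \delta,$$

then the perturbed equations for $\xi_1 = V_1'$, $\xi_2 = V_2'$ admit a solution such that

$$|\xi_1(x) - \xi_1^*(x)| + |\xi_2(x) - \xi_2^*(x)| \le \varepsilon (1 + |x|) \qquad \text{for all } x \in \mathbb{R}$$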

SLIDE 31

Future work: x ∈ R2

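$V_i = V_i(x_1, x_2)$

$$\begin{cases} \gamma V_1 = (f \cdot \nabla V_1) - \frac{1}{2}(g_1 \cdot \nabla V_1)^2 - (g_2 \cdot \nabla V_1)(g_2 \cdot \nabla V_2) + \phi_1 \\[1ex] \gamma V_2 = (f \cdot \nabla V_2) - \frac{1}{2}(g_2 \cdot \nabla V_2)^2 - (g_1 \cdot \nabla V_1)(g_1 \cdot \nabla V_2) + \phi_2 \end{cases}$$

  • Linearize around the affine solution of a L-Q game
  • Determine if this linearized PDE is elliptic, hyperbolic, or mixed type
  • Construct solutions to the perturbed nonlinear PDE

[Figure: the (x1, x2) plane divided into a hyperbolic region and an elliptic region]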

SLIDE 32

1956

Happy Birthday Jean Michel !!
