SLIDE 1

A multilevel stochastic gradient algorithm for PDE-constrained optimal control problems under uncertainty

Fabio Nobile

CSQI – Institute of Mathematics, EPFL, Switzerland

Joint work with: M. Martin (Criteo, Grenoble), S. Krumscheid (RWTH Aachen), P. Tsilifis (EPFL)

RICAM Special Semester on Optimization, Workshop 3 "Optimization and Inversion under Uncertainty", November 11-15, 2019, Linz, Austria

SLIDE 2

Outline

1. Problem setting – quadratic optimal control problem
2. Discretization by finite elements + Monte Carlo
3. Deterministic (CG) iterative solvers versus Stochastic Gradient
4. Multilevel stochastic gradient algorithms
5. Conclusions

SLIDE 3

Section 1: Problem setting – quadratic optimal control problem

SLIDE 4

Problem setting

(Ω, F, P): complete probability space. D ⊂ R^d: physical domain. Throughout, ‖·‖ = ‖·‖_{L²(D)}.

Forward problem:
$$
\begin{cases}
-\mathrm{div}\big(a(x,\omega)\nabla y(x,\omega)\big) = g(x) + u(x), & \text{for a.e. } x \in D,\ \omega \in \Omega,\\
y(x,\omega) = 0, & \text{for a.e. } x \in \partial D,\ \omega \in \Omega,
\end{cases}
\qquad (*)
$$
with a(·, ω) a random field s.t. 0 < a_min ≤ a(x, ω) ≤ a_max for all (x, ω) ∈ D × Ω. ⟹ random solution ω ↦ y(·, ω) ∈ H¹₀(D); in particular, y ∈ L²_P(Ω; H¹₀(D)).

u ∈ L²(D): control function.

Optimal control problem:
$$
\min_{\substack{u \in L^2(D)\\ y \in L^2_P(\Omega; H^1_0(D))}} \tilde J(u, y) := \frac{1}{2}\,\mathbb{E}_\omega\big[\|y(\cdot,\omega) - y_{\mathrm{target}}\|^2\big] + \frac{\beta}{2}\|u\|^2, \quad \text{subject to } (*)
$$


SLIDE 9

Reduced functional

(Stochastic) affine solution operator y_ω : L²(D) → H¹₀(D): for every ω ∈ Ω, u ↦ y_ω(u), solution of
$$
-\mathrm{div}\big(a(\cdot,\omega)\nabla y_\omega(u)\big) = g + u \ \text{ in } D, \qquad y_\omega(u) = 0 \ \text{ on } \partial D.
$$

Reduced functional: min_{u ∈ L²(D)} J(u), with
$$
J(u) = \mathbb{E}_\omega[f(u,\omega)], \qquad f(u,\omega) = \frac{1}{2}\|y_\omega(u) - y_{\mathrm{target}}\|^2 + \frac{\beta}{2}\|u\|^2.
$$

Adjoint-based gradient computation:
$$
\nabla_u f(u,\omega) = \beta u + p_\omega(u), \qquad \nabla_u J(u) = \beta u + \mathbb{E}_\omega[p_\omega(u)],
$$
where p_ω(u) solves the adjoint problem, for every ω ∈ Ω,
$$
-\mathrm{div}\big(a(\cdot,\omega)\nabla p_\omega(u)\big) = y_\omega(u) - y_{\mathrm{target}} \ \text{ in } D, \qquad p_\omega(u) = 0 \ \text{ on } \partial D.
$$

Lipschitz and strong convexity properties of ∇_u f: for all u₁, u₂ ∈ L²(D) and ω ∈ Ω,
$$
\|\nabla_u f(u_1,\omega) - \nabla_u f(u_2,\omega)\| \le L\|u_1 - u_2\|, \qquad L = \beta + \frac{C_P^4}{a_{\min}^2},
$$
$$
\langle \nabla_u f(u_1,\omega) - \nabla_u f(u_2,\omega),\, u_1 - u_2 \rangle_{L^2(D)} \ge \frac{\ell}{2}\|u_1 - u_2\|^2, \qquad \ell = 2\beta.
$$

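The adjoint-based gradient is easy to prototype. Below is a minimal sketch on a centered finite-difference discretization of the 1D analogue of the problem (the talk uses P_r finite elements on D ⊂ R^d); the function names and the entire 1D setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def solve_dirichlet(a_mid, rhs, h):
    """Solve -(a y')' = rhs on (0,1) with y(0) = y(1) = 0 by centered finite differences.
    a_mid: coefficient at the n cell midpoints; rhs: values at the n-1 interior nodes."""
    main = (a_mid[:-1] + a_mid[1:]) / h**2        # tridiagonal stiffness matrix
    off = -a_mid[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, rhs)

def grad_f(u, a_mid, g, y_target, beta, h):
    """One sampled gradient: grad_u f(u, omega) = beta*u + p_omega(u)."""
    y = solve_dirichlet(a_mid, g + u, h)          # primal solve for y_omega(u)
    p = solve_dirichlet(a_mid, y - y_target, h)   # adjoint solve (the operator is symmetric)
    return beta * u + p
```

One sampled gradient thus costs exactly one primal and one adjoint solve, which is the cost model used throughout the talk.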

SLIDE 13

Section 2: Discretization by finite elements + Monte Carlo

SLIDE 14

Finite dimensional approximation

Finite element approximation of the PDE: for all u ∈ L²(D) and ω ∈ Ω, u ↦ y^h_ω(u) solves
$$
\int_D a(\cdot,\omega)\,\nabla y^h_\omega(u)\cdot\nabla v_h = \int_D (g+u)\,v_h, \qquad \forall v_h \in Y^r_h,
$$
where Y^r_h is the space of continuous P_r finite element functions vanishing on the boundary.

Monte Carlo approximation of the expectation: ω_i iid ∼ P, i = 1, …, N,
$$
J(u) = \mathbb{E}_\omega[f(u,\omega)] \approx \frac{1}{N}\sum_{i=1}^N f(u,\omega_i).
$$

Discrete optimal control problem:
$$
\min_{u \in Y^r_h} J_{h,N}(u) := \frac{1}{N}\sum_{i=1}^N f^h(u,\omega_i) = \frac{1}{N}\sum_{i=1}^N \frac{1}{2}\big\|y^h_{\omega_i}(u) - y_{\mathrm{target}}\big\|^2 + \frac{\beta}{2}\|u\|^2.
$$

Remark: even when minimizing over u ∈ L²(D), the unique minimizer u^{h,N} belongs to Y^r_h (the sensitivity condition on the next slide gives β u^{h,N} = −(1/N)∑_i p^h_{ω_i}, and each p^h_{ω_i} ∈ Y^r_h).

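Continuing the hypothetical 1D sketch above: the full (sample-average) gradient of J_{h,N} is β u + (1/N) ∑_i p_{ω_i}(u). The coefficient below is a 1D cut of the random field used later in the numerical results; drawing ξ as standard normal is purely my choice for illustration.

```python
def coefficient(xi, x_mid, theta=0.5):
    """1D cut of the talk's random field a(x, xi), evaluated at cell midpoints."""
    return 1.0 + np.exp(theta * (xi[0]*np.cos(1.1*np.pi*x_mid) + xi[1]*np.cos(1.2*np.pi*x_mid)
                                 + xi[2]*np.sin(1.3*np.pi*x_mid) + xi[3]*np.sin(1.4*np.pi*x_mid)))

def grad_J_hN(u, xi_samples, x_mid, g, y_target, beta, h):
    """Sample-average gradient of J_{h,N}: beta*u + (1/N) sum_i p_{omega_i}(u)."""
    p_sum = np.zeros_like(u)
    for xi in xi_samples:                                  # the N fixed Monte Carlo samples
        a_mid = coefficient(xi, x_mid)
        y = solve_dirichlet(a_mid, g + u, h)               # primal solve for sample i
        p_sum += solve_dirichlet(a_mid, y - y_target, h)   # adjoint solve for sample i
    return beta * u + p_sum / len(xi_samples)
```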

SLIDE 18

Optimality conditions

Primal problems:
$$
\int_D a(\cdot,\omega_i)\,\nabla y^h_{\omega_i}\cdot\nabla v_h = \int_D (g + u^{h,N})\,v_h, \qquad \forall v_h \in Y^r_h,\ i = 1,\dots,N,
$$
adjoint problems:
$$
\int_D a(\cdot,\omega_i)\,\nabla v_h\cdot\nabla p^h_{\omega_i} = \int_D (y^h_{\omega_i} - y_{\mathrm{tar}})\,v_h, \qquad \forall v_h \in Y^r_h,\ i = 1,\dots,N,
$$
sensitivity:
$$
\int_D \Big(\beta u^{h,N} + \frac{1}{N}\sum_{i=1}^N p^h_{\omega_i}\Big)\, v_h = 0, \qquad \forall v_h \in Y^r_h.
$$

Algebraic system:
$$
\begin{pmatrix}
A_1 & & & & & & -M\\
& \ddots & & & & & \vdots\\
& & A_N & & & & -M\\
-M & & & A_1^T & & & \\
& \ddots & & & \ddots & & \\
& & -M & & & A_N^T & \\
& & & M & \cdots & M & \beta N M
\end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_N \\ p_1 \\ \vdots \\ p_N \\ u \end{pmatrix}
=
\begin{pmatrix} g \\ \vdots \\ g \\ -y_{\mathrm{tar}} \\ \vdots \\ -y_{\mathrm{tar}} \\ 0 \end{pmatrix}
$$


SLIDE 20

Section 3: Deterministic (CG) iterative solvers versus Stochastic Gradient

SLIDE 21

Reduced algebraic system

Several approaches can be used to solve this coupled system: [Kouri-Heinkenschloss et al. 2013], [VanBarel-Vandewalle 2018], [Borzì-vonWinckel 2011].

Eliminating (y₁, …, y_N) and (p₁, …, p_N) and introducing the block matrices
$$
\mathbf{A} = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_N \end{pmatrix}, \qquad
\mathbf{M} = \begin{pmatrix} M & & \\ & \ddots & \\ & & M \end{pmatrix}, \qquad
\mathbb{1} = \begin{pmatrix} \mathrm{Id} \\ \vdots \\ \mathrm{Id} \end{pmatrix}
$$
leads to a reduced system G u = ξ with matrix
$$
G = \beta M + \frac{1}{N}\,\mathbb{1}^T \mathbf{M} \mathbf{A}^{-T} \mathbf{M} \mathbf{A}^{-1} \mathbf{M}\, \mathbb{1}.
$$

The matrix G is spd and Cond(G) = O(β⁻¹), independent of h and N.

The reduced system can be solved efficiently by, e.g., conjugate gradient. Denoting by u^{h,N}_j the j-th iterate,
$$
\|u^{h,N} - u^{h,N}_j\| \le C\rho^j, \qquad \rho = \frac{\sqrt{\mathrm{Cond}(G)} - 1}{\sqrt{\mathrm{Cond}(G)} + 1}.
$$

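Since applying G to a vector only requires solving with each A_i and A_i^T, the reduced system is naturally solved matrix-free. A sketch in the same hypothetical 1D setting, with the mass matrix lumped to M = h·Id for brevity and scipy driving the Krylov loop:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def assemble(a_mid, h):
    """Stiffness matrix A_i of the 1D operator (same scheme as solve_dirichlet above)."""
    main = (a_mid[:-1] + a_mid[1:]) / h**2
    off = -a_mid[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def apply_G(v, A_list, beta, h):
    """Matrix-free action of G = beta*M + (1/N) 1^T M A^{-T} M A^{-1} M 1 (lumped M = h*Id)."""
    acc = np.zeros_like(v)
    for A in A_list:
        y = np.linalg.solve(A, h * v)       # A_i^{-1} (M v): one primal solve
        p = np.linalg.solve(A.T, h * y)     # A_i^{-T} (M y): one adjoint solve
        acc += h * p                        # apply M once more
    return beta * h * v + acc / len(A_list)

# usage sketch (n-1 interior unknowns, N samples):
#   A_list = [assemble(coefficient(rng.standard_normal(4), x_mid), h) for _ in range(N)]
#   G = LinearOperator((n - 1, n - 1), matvec=lambda v: apply_G(v, A_list, beta, h))
#   u, info = cg(G, xi_rhs)                # xi_rhs: right-hand side of the reduced system
```

Each CG iteration therefore costs 2N PDE solves, consistent with the work model on the next slide.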

SLIDE 25

Deterministic approach

Use a standard (deterministic) iterative method (e.g. CG) to solve the fully discrete system.

Error splitting, assuming smooth solutions y_ω(u⋆), p_ω(u⋆) ∈ H^{r+1}(D):
$$
\mathbb{E}\big[\|u_\star - u^{h,N}_j\|^2\big] \le \underbrace{C_1\rho^{2j}}_{\text{CG error}} + \underbrace{\frac{C_2}{N}}_{\text{MC error}} + \underbrace{C_3 h^{2r+2}}_{\text{FE error}}.
$$

Cost to compute u^{h,N}_j: assume that the cost of solving one PDE is O(h^{-γd}) (with γ ∈ (1, 3]). ⟹ Work to compute u^{h,N}_j: Work ∼ j N h^{-γd}.

Complexity analysis. Balancing the error contributions, ρ^j ∼ N^{-1/2} ∼ h^{r+1} ∼ tol:
$$
\mathrm{Work}(tol) \lesssim \underbrace{tol^{-2}}_{\text{MC}} \cdot \underbrace{tol^{-\frac{\gamma d}{r+1}}}_{\text{FE}} \cdot \underbrace{\log tol^{-1}}_{\text{solver}}.
$$

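Spelling out the balance (each factor comes from solving one of the three relations for j, N, or h):
$$
N \sim tol^{-2}, \qquad h \sim tol^{\frac{1}{r+1}} \ \Rightarrow\ h^{-\gamma d} \sim tol^{-\frac{\gamma d}{r+1}}, \qquad j \sim \frac{\log tol^{-1}}{\log \rho^{-1}},
$$
hence Work ∼ j N h^{−γd} ∼ tol^{−2−γd/(r+1)} log tol⁻¹.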

SLIDE 28

Stochastic gradient (Robbins-Monro)

Instead of introducing the Monte Carlo approximation upfront and then solving the discrete problem by a deterministic iterative solver, we can apply a stochastic gradient method to the problem that is still continuous in probability:
$$
u^h_{j+1} = u^h_j - \tau_j \nabla_u f^h(u^h_j, \omega_j) = (1 - \tau_j\beta)\,u^h_j - \tau_j\, p^h_{\omega_j}(u^h_j), \qquad \omega_j \overset{iid}{\sim} P.
$$

Learning rate: τ_j = τ₀/(j + α).

Proposition [Martin-Nobile-Krumscheid 2018]. Assuming y_ω(u⋆), p_ω(u⋆) ∈ H^{r+1}(D), for any α ∈ ℝ₊ and τ₀ > 1/(2β), there exist D₁, D₂ > 0 independent of j and h s.t.
$$
\text{SG convergence:}\quad \mathbb{E}[\|u^h_\star - u^h_j\|^2] \le D_1 j^{-1},
\qquad
\text{Error splitting:}\quad \mathbb{E}[\|u_\star - u^h_j\|^2] \le 2D_1 j^{-1} + D_2 h^{2r+2}.
$$

Complexity: Work(tol) ≲ tol⁻² · tol^{−γd/(r+1)} (no log terms!)

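A minimal SG loop in the same hypothetical 1D setting, reusing solve_dirichlet and coefficient from the sketches above; note that each iteration needs a single primal/adjoint solve, versus the 2N solves per CG iteration earlier.

```python
def sg(u0, x_mid, g, y_target, beta, h, n_iter, tau0, alpha, seed=0):
    """Robbins-Monro stochastic gradient: one fresh sample omega_j per iteration."""
    rng = np.random.default_rng(seed)
    u = u0.copy()
    for j in range(1, n_iter + 1):
        a_mid = coefficient(rng.standard_normal(4), x_mid)  # omega_j ~ P
        y = solve_dirichlet(a_mid, g + u, h)                # primal solve
        p = solve_dirichlet(a_mid, y - y_target, h)         # adjoint solve
        tau = tau0 / (j + alpha)                            # learning rate tau_j
        u = (1 - tau * beta) * u - tau * p                  # SG update from the slide
    return u
```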

SLIDE 32

Numerical results

Optimal control problem:
$$
\min_{u \in L^2(D)} \frac{1}{2}\mathbb{E}_\omega\big[\|y_\omega(u) - y_{\mathrm{target}}\|^2\big] + \frac{\beta}{2}\|u\|^2
\quad \text{subject to} \quad
-\mathrm{div}\big(a(\cdot,\omega)\nabla y_\omega(u)\big) = g + u \ \text{in } D, \quad y_\omega(u) = 0 \ \text{on } \partial D.
$$

Problem parameters: D = (0,1)², g = 1, y_target(x₁, x₂) = sin(πx₁) sin(πx₂), β = 10⁻⁴,
$$
a(x_1, x_2, \xi) = 1 + \exp\big\{\theta\big(\xi_1\cos(1.1\pi x_1) + \xi_2\cos(1.2\pi x_1) + \xi_3\sin(1.3\pi x_2) + \xi_4\sin(1.4\pi x_2)\big)\big\}.
$$

[Figure: three realizations of the random coefficient a]

SLIDE 33

Numerical results – SG convergence

[Figure: E‖u^h_j − u^h_⋆‖ versus the iteration counter j (log-log) for SG on a fixed mesh, together with the E + std curve; least-squares fit: error = 10^{0.16547} · j^{−0.48555}, matching the predicted O(j^{−1/2}) decay of the root mean squared error.]

Mean L² error as a function of the iteration counter, estimated by sample average over 100 independent realizations. Fixed mesh size h = 2⁻⁴, P1 finite elements.

SLIDE 34

Numerical results – complexity of CG versus SG

[Figure: error versus computational work W (log-log, work model γ = 1) for SG and CG, showing the E‖u − u⋆‖ and E + std curves for each; reference complexity: error ≈ W^{−1/3}.]

Complexity plot for CG and SG (average over 20 repetitions).

SLIDE 35

Section 4: Multilevel stochastic gradient algorithms

SLIDE 36

Multilevel stochastic gradient

Let Y^r_{h₀} ⊂ Y^r_{h₁} ⊂ … ⊂ Y^r_{h_L} be a sequence of finer and finer FE spaces.

Idea: in the stochastic gradient algorithm, replace the single evaluation ∇_u f^h(u_j, ω_j) by a multilevel approximation of the expectation [Heinrich 1998], [Giles 2008]:
$$
E^{\mathrm{MLMC}}_{L,\mathbf{N}}[\nabla_u f(u_j,\cdot)] = \sum_{\ell=0}^{L} \frac{1}{N_\ell}\sum_{i=1}^{N_\ell}\Big(\nabla_u f^{h_\ell}(u_j, \omega^{(i,\ell)}_j) - \nabla_u f^{h_{\ell-1}}(u_j, \omega^{(i,\ell)}_j)\Big)
$$
(with the convention ∇_u f^{h_{-1}} = 0), with ω^{(i,ℓ)}_j iid ∼ P, drawn independently between levels and at each iteration.

L controls the bias of the estimator (FE error on level h_L):
$$
\mathbb{E}\big[E^{\mathrm{MLMC}}_{L,\mathbf{N}}[\nabla_u f(u_j,\cdot)]\big] - \mathbb{E}[\nabla_u f(u_j,\cdot)] = \mathbb{E}\big[\nabla_u f^{h_L}(u_j,\cdot) - \nabla_u f(u_j,\cdot)\big].
$$

N = (N₀, …, N_L) controls the variance of the estimator (MC error):
$$
\mathrm{Var}\big[E^{\mathrm{MLMC}}_{L,\mathbf{N}}[\nabla_u f(u_j,\cdot)]\big] = \sum_{\ell=0}^{L} \frac{\mathrm{Var}\big[\nabla_u f^{h_\ell}(u_j,\cdot) - \nabla_u f^{h_{\ell-1}}(u_j,\cdot)\big]}{N_\ell}.
$$

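A sketch of the multilevel estimator in the 1D toy setting: levels are nested grids with h_ℓ = h₀ 2^{-ℓ}, each correction couples a fine and a coarse solve through the same sample ω^{(i,ℓ)}_j, and piecewise-linear interpolation stands in for the nested FE spaces Y^r_{h_ℓ}. As before, all names and the grid-transfer choices are mine.

```python
def grids(level, n0=8):
    """Interior nodes, cell midpoints, and mesh size of level l, h_l = 1/(n0 * 2^l)."""
    n = n0 * 2**level
    h = 1.0 / n
    return np.linspace(h, 1 - h, n - 1), np.linspace(h/2, 1 - h/2, n), h

def grad_level(u_ref, ytar_ref, x_ref, xi, level, g, beta, n0=8):
    """Sampled gradient grad_u f^{h_l}(u, omega), prolonged back to the reference grid x_ref."""
    x, x_mid, h = grids(level, n0)
    u = np.interp(x, x_ref, u_ref)                   # restrict the control to level l
    ytar = np.interp(x, x_ref, ytar_ref)
    a_mid = coefficient(xi, x_mid)
    y = solve_dirichlet(a_mid, g + u, h)             # primal solve on level l
    p = solve_dirichlet(a_mid, y - ytar, h)          # adjoint solve on level l
    return np.interp(x_ref, x, beta * u + p)         # prolong (piecewise-linear)

def mlmc_gradient(u_ref, ytar_ref, x_ref, L, N, g, beta, rng, n0=8):
    """E^MLMC_{L,N}[grad_u f(u, .)]: telescopic sum of coupled fine/coarse corrections."""
    est = np.zeros_like(u_ref)
    for l in range(L + 1):
        for _ in range(N[l]):
            xi = rng.standard_normal(4)              # omega^{(i,l)}_j, shared by the level pair
            corr = grad_level(u_ref, ytar_ref, x_ref, xi, l, g, beta, n0)
            if l > 0:                                # convention: grad_u f^{h_{-1}} = 0
                corr -= grad_level(u_ref, ytar_ref, x_ref, xi, l - 1, g, beta, n0)
            est += corr / N[l]
    return est
```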

SLIDE 40

Multilevel stochastic gradient algorithm – first version

$$
u_{j+1} = u_j - \tau_j\, E^{\mathrm{MLMC}}_{L_j,\mathbf{N}_j}[\nabla_u f(u_j,\cdot)]
= (1 - \tau_j\beta)\,u_j - \tau_j \sum_{\ell=0}^{L_j} \frac{1}{N_{\ell,j}} \sum_{i=1}^{N_{\ell,j}} \Big(p^{h_\ell}(u_j, \omega^{(i,\ell)}_j) - p^{h_{\ell-1}}(u_j, \omega^{(i,\ell)}_j)\Big)
$$

Learning rate τ_j = τ₀/(j + α). Notice that u_{j+1} ∈ Y^r_{h_{L_j}} for all j.

We allow L and N to depend on the iteration counter j (see the sketch after this slide). How to choose them optimally?

To recover the optimal control u⋆ in the limit, we need L_j → ∞ as j → ∞. On the other hand, N does not need to go to ∞.

A similar approach was proposed in [Dereich-MuellerGronbach 2015] for an abstract optimization problem: different working assumptions, but similar results.

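The first-version MLSG iteration then reads as follows, with the schedules for L_j and N_{ℓ,j} left as user-supplied callables (their optimal form is given on the complexity slide below). For simplicity the control is kept on a fixed reference grid, whereas on the slides u_{j+1} lives in the growing space Y^r_{h_{L_j}}.

```python
def mlsg(u0, ytar_ref, x_ref, g, beta, n_iter, tau0, alpha, L_of_j, N_of_j, seed=0, n0=8):
    """MLSG, first version: SG step with the sampled gradient replaced by an MLMC estimator
    whose finest level L_j and per-level sample sizes N_{l,j} grow with j."""
    rng = np.random.default_rng(seed)
    u = u0.copy()
    for j in range(1, n_iter + 1):
        tau = tau0 / (j + alpha)                    # learning rate tau_j
        Lj = L_of_j(j)                              # L_j -> infinity as j -> infinity
        Nj = N_of_j(j, Lj)                          # N need not go to infinity
        u = u - tau * mlmc_gradient(u, ytar_ref, x_ref, Lj, Nj, g, beta, rng, n0)
    return u
```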

SLIDE 44

Convergence analysis

Bias and variance terms:
$$
\varepsilon_j = \big\|\mathbb{E}[\nabla_u f^{h_{L_j}}(u_\star,\cdot) - \nabla_u f(u_\star,\cdot)]\big\|, \qquad
\sigma_j^2 = \mathbb{E}\Big[\big\|E^{\mathrm{MLMC}}_{L_j,\mathbf{N}_j}[\nabla_u f(u_\star,\cdot)] - \mathbb{E}[\nabla_u f^{h_{L_j}}(u_\star,\cdot)]\big\|^2\Big].
$$

Proposition [Martin-Nobile-Tsilifis 2019]. There exist λ, μ > 0 independent of h and j such that
$$
\mathbb{E}[\|u_{j+1} - u_\star\|^2] \le c_j\,\mathbb{E}[\|u_j - u_\star\|^2] + \lambda\,\tau_j^2\sigma_j^2 + \mu\,\tau_j\varepsilon_j^2,
\qquad c_j \sim 1 - \frac{\tau_0\beta}{j+\alpha} \quad \Big(\Rightarrow\ \textstyle\prod_{k=0}^{j} c_k \sim j^{-\tau_0\beta}\Big).
$$

Optimal error balance: τ²_j σ²_j ∼ τ_j ε²_j ⟹ ε²_j ∼ σ²_j / j: the squared bias should decrease faster than the variance!


SLIDE 47

Complexity analysis: optimal bias/variance decrease

Take τ₀ > 1/β and η > 2:
$$
\varepsilon_j^2 \sim (j+\alpha)^{1-\eta} \ \Longrightarrow\ L_j \sim \frac{\eta-1}{2r+2}\,\log(j+\alpha), \qquad
\sigma_j^2 \sim (j+\alpha)^{2-\eta} \ \Longrightarrow\ N_{\ell,j} \sim 2^{-\ell\frac{2r+2+\gamma d}{2}}\,(j+\alpha)^{\eta-2}.
$$

Proposition [Martin-Nobile-Tsilifis 2019]. The work to achieve a mean squared error E[‖u_j − u⋆‖²] = O(tol²) is bounded by
$$
\mathrm{Work}(tol) \lesssim
\begin{cases}
tol^{-2}, & 2r+2 > \gamma d,\ \ \eta \ge 2 + \frac{\gamma d}{2r+2-\gamma d},\ \ \tau_0 > \frac{\eta-1}{\beta},\\[4pt]
tol^{-2\left(1+\frac{1}{\tau_0\beta}\right)}\,|\log tol|^{3+\frac{1}{\tau_0\beta}}, & 2r+2 = \gamma d,\ \ \eta = \tau_0\beta + 1,\\[4pt]
tol^{-2\left(\frac{\gamma d}{2r+2}+\frac{1}{\tau_0\beta}\right)}\,|\log tol|^{\frac{\gamma d}{2r+2}+\frac{1}{\tau_0\beta}}, & 2r+2 < \gamma d,\ \ \eta = \tau_0\beta + 1.
\end{cases}
$$
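The schedules of the proposition as code, for a mesh hierarchy h_ℓ ∝ 2^{-ℓ}; the ceilings and the base-2 logarithm (consistent with ε_j ∼ 2^{-L_j(r+1)}) are my reading of the asymptotic "∼" relations.

```python
import math

def L_schedule(j, eta, r, alpha=1.0):
    """L_j ~ (eta-1)/(2r+2) * log2(j+alpha), giving eps_j^2 ~ h_{L_j}^{2r+2} ~ (j+alpha)^(1-eta)."""
    return max(0, math.ceil((eta - 1) / (2*r + 2) * math.log2(j + alpha)))

def N_schedule(j, L, eta, r, gamma, d, alpha=1.0):
    """N_{l,j} ~ 2^{-l(2r+2+gamma*d)/2} * (j+alpha)^(eta-2), giving sigma_j^2 ~ (j+alpha)^(2-eta)."""
    return [max(1, math.ceil(2.0**(-l * (2*r + 2 + gamma*d) / 2) * (j + alpha)**(eta - 2)))
            for l in range(L + 1)]
```

These plug directly into the mlsg sketch above, e.g. L_of_j = lambda j: L_schedule(j, eta, r) and N_of_j = lambda j, L: N_schedule(j, L, eta, r, gamma, d).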

SLIDE 48

Multilevel stochastic gradient algorithm – second version

The multilevel stochastic gradient algorithm (MLSG) is in between a stochastic gradient and a full gradient: at each iteration, we compute an approximation E^{MLMC}_{L,N}[∇_u f(u_j, ·)] of increasing accuracy (ε_j, σ_j → 0 as j → ∞).

Alternative idea (inspired by the "unbiased MLMC estimator" of [Rhee-Glynn 2015]): at each iteration, sample randomly one level ℓ_j and one realization ω_j, and compute the difference ∇_u f^{h_{ℓ_j}}(u_j, ω_j) − ∇_u f^{h_{ℓ_j−1}}(u_j, ω_j).

Randomized multilevel stochastic gradient (RMLSG) (only for the case 2r + 2 > γd):
$$
u_{j+1} = u_j - \frac{\tau_j}{p_{\ell_j}}\Big(\nabla_u f^{h_{\ell_j}}(u_j,\omega_j) - \nabla_u f^{h_{\ell_j-1}}(u_j,\omega_j)\Big),
\qquad \ell_j \overset{iid}{\sim} \{p_\ell\}_{\ell=0}^{L_j},\quad \omega_j \overset{iid}{\sim} P,
$$
with τ_j = τ₀/(j + α) and p_ℓ ∝ 2^{-ℓ(2r+2+γd)/2} a probability mass function on {0, …, L_j}.

Bias: ε_j = ‖E[∇_u f^{h_{L_j}}(u_j, ·) − ∇_u f(u_j, ·)]‖ (same as for MLSG).

If one takes L_j = ∞ for all j, then the estimator is unbiased. Unfortunately, we were not able to prove convergence in this setting. Alternatively, decrease the bias as ε²_j ∼ (j + α)^{1−η}.

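A sketch of the RMLSG update in the same toy setting, with p_ℓ ∝ 2^{-ℓ(2r+2+γd)/2} truncated to {0, …, L_j} and renormalized; it reuses grad_level and L_schedule from the sketches above.

```python
def rmlsg(u0, ytar_ref, x_ref, g, beta, n_iter, tau0, alpha, eta, r, gamma, d, seed=0, n0=8):
    """RMLSG: per iteration, draw ONE level l_j ~ {p_l} and ONE sample omega_j, and apply the
    importance-weighted correction (tau_j / p_{l_j}) * (grad f^{h_{l_j}} - grad f^{h_{l_j-1}})."""
    rng = np.random.default_rng(seed)
    u = u0.copy()
    for j in range(1, n_iter + 1):
        tau = tau0 / (j + alpha)                           # learning rate tau_j
        Lj = L_schedule(j, eta, r, alpha)                  # truncation level L_j
        w = 2.0 ** (-(2*r + 2 + gamma*d) / 2 * np.arange(Lj + 1))
        p = w / w.sum()                                    # p_l on {0, ..., L_j}
        l = int(rng.choice(Lj + 1, p=p))                   # random level l_j
        xi = rng.standard_normal(4)                        # single sample omega_j ~ P
        corr = grad_level(u, ytar_ref, x_ref, xi, l, g, beta, n0)
        if l > 0:
            corr -= grad_level(u, ytar_ref, x_ref, xi, l - 1, g, beta, n0)
        u = u - (tau / p[l]) * corr
    return u
```

On expectation, this single-sample correction reweighted by 1/p_{ℓ_j} has the same mean as the full MLMC estimator with finest level L_j, which is why the bias term is the same as for MLSG.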

SLIDE 54

Complexity analysis of RMLSG: optimal bias decrease

Proposition [Martin-Nobile-Tsilifis 2019]. Assume 2r + 2 > γd and take η = 2, τ₀ > 1/β. Then the expected work to achieve a mean squared error E[‖u_j − u⋆‖²] = O(tol²) satisfies E[Work(tol)] ≲ tol⁻², and
$$
\frac{\sqrt{\mathrm{Var}[\mathrm{Work}(tol)]}}{\mathbb{E}[\mathrm{Work}(tol)]} \lesssim tol^{\frac{3(2r+2)-\gamma d}{2(2r+2)}} \xrightarrow{\ tol \to 0\ } 0.
$$

SLIDE 55

Numerical results – MLSG complexity

[Figure, left: mean error E‖u_j − u⋆‖ versus iteration counter (log-log) for MLSG, with the maximal levels used and the E + std curve; least-squares fit slope −1.0684. Right: cost W versus mean error; least-squares fit slope −2.2276, i.e. Work ∼ tol^{−2.23}, close to the predicted tol⁻².]

Cost W versus mean error (average over 10 repetitions).

SLIDE 56

Numerical results – RMLSG complexity

[Figure, left: mean error versus iteration counter (log-log) for RMLSG, with the levels chosen by the algorithm, the error along one realization, and the E + std curve; fit slope −0.48247. Right: expected cost E[W] versus mean error, averaged over 20 realizations; LS-fit slope −2.1571, again close to the predicted E[Work(tol)] ≲ tol⁻².]

Cost W versus mean error (average over 20 repetitions).

SLIDE 57

Section 5: Conclusions

SLIDE 58

Conclusions and future work

We have analyzed and compared several algorithms to solve a PDE-constrained quadratic optimal control problem. Our outcome is that, in the context of a Monte Carlo approximation, the stochastic gradient algorithm has a slightly better complexity than a deterministic solver such as CG. The multilevel versions, on the other hand, have a substantially better complexity.

Our analysis of SG and MLSG uses only the (uniform) Lipschitz and strong convexity properties of the random cost functional. As such, our results may generalize to other problems, e.g.:

  • general linear second order elliptic equations with random coefficients
  • boundary control
  • non-linear (convex and Lipschitz) cost functionals

Extension of the results to more involved risk measures (other than expectation) is in progress.

In the case of only a few random variables, a deterministic quadrature (e.g. full tensor or sparse quadrature) may be more accurate than Monte Carlo. In this case, stochastic gradient algorithms can still be effective if properly modified (see SAG/SAGA versions). Analysis available in [Martin-Nobile 2018].


SLIDE 65

Thank you for your attention!

SLIDE 66

References

Martin, M.C. Stochastic approximation methods for PDE constrained optimal control problems with uncertain parameters. PhD thesis no. 7233, EPFL, March 2019. doi.org/10.5075/epfl-thesis-7233

Martin, M., Krumscheid, S., and Nobile, F. Analysis of stochastic gradient methods for PDE-constrained optimal control problems with uncertain parameters. MATHICSE Technical Report 04.2018. doi.org/10.5075/epfl-MATHICSE-263568

Martin, M., Nobile, F., and Tsilifis, P. Multilevel stochastic gradient methods for PDE-constrained optimal control problems with uncertain parameters. In preparation.

Martin, M. and Nobile, F. PDE-constrained optimal control problems with uncertain parameters using SAGA. arXiv:1810.13378.