

SLIDE 1

Stochastic constrained optimization in Hilbert spaces with applications

Georg Ch. Pflug/C. Geiersbach March 27, 2019


SLIDE 2

Iteration methods

Problems:
◮ Finding roots of equations: given f(·), find a root x∗ such that f(x∗) = 0.
◮ Finding optima of functions: given f(·), find a candidate for an optimum, i.e. x∗ such that ∇f(x∗) = 0.


SLIDE 3

Newton (1669). Iterative solution method for the equation f(x) = x³ − 2x − 5 = 0.

Raphson (1690). General version, for root finding and for optimization:

x_{n+1} = x_n − f(x_n)/f′(x_n),   x_{n+1} = x_n − [∇²f(x_n)]⁻¹ ∇f(x_n).

von Mises, Pollaczek-Geiringer (1929). Constant stepsize:

x_{n+1} = x_n − t · f(x_n), which converges if t ≤ [sup_x f′(x)]⁻¹,
x_{n+1} = x_n − t · ∇f(x_n), which converges if t < 1/λ_max, with λ_max the maximal eigenvalue of ∇²f(x).

Decreasing stepsize:

x_{n+1} = x_n − t_n f(x_n),   x_{n+1} = x_n − t_n ∇f(x_n),
with t_n ≥ 0, t_n → 0, Σ_n t_n = ∞.
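As a quick illustration (a minimal sketch of my own, not from the slides), both iterations in Python on Newton's example f(x) = x³ − 2x − 5:

```python
# Newton-Raphson vs. the constant-step iteration on f(x) = x^3 - 2x - 5 = 0.

def f(x):
    return x**3 - 2*x - 5

def fprime(x):
    return 3*x**2 - 2

# Newton-Raphson: x_{n+1} = x_n - f(x_n)/f'(x_n); quadratic convergence
x = 2.0
for _ in range(6):
    x -= f(x) / fprime(x)
print("Newton-Raphson:", x)        # approx. 2.0946

# von Mises/Pollaczek-Geiringer: x_{n+1} = x_n - t*f(x_n) with small fixed t;
# linear convergence, here t = 0.05 <= 1/sup f'(x) near the root
x, t = 2.0, 0.05
for _ in range(200):
    x -= t * f(x)
print("Constant step: ", x)        # approx. the same root
```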


SLIDE 4

Rosen (1960). Gradient projection for optimization under linear equality constraints:

min{f(x) : Ax = b},
x_{n+1} = x_n − t_n (I − A⊤(AA⊤)⁻¹A) ∇f(x_n).

Goldstein (1964). Gradient projection for optimization under convex constraints:

min{f(x) : x ∈ C (convex)},
x_{n+1} = π_C(x_n − t_n ∇f(x_n)),
where π_C is the projection onto the convex set C.
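A short sketch of Rosen's projected step in Python (the quadratic objective and the single constraint are my own illustrations):

```python
# Rosen (1960): gradient projection for min f(x) s.t. Ax = b, using the
# projector P = I - A^T (A A^T)^{-1} A onto the nullspace of A.
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])                     # constraint x1 + x2 + x3 = 1
b = np.array([1.0])
P = np.eye(3) - A.T @ np.linalg.solve(A @ A.T, A)   # projection onto ker(A)

c = np.array([3.0, 0.0, 0.0])
def grad_f(x):                                      # f(x) = 1/2 ||x - c||^2
    return x - c

x = np.array([1.0, 0.0, 0.0])                       # feasible start: Ax = b
t = 0.5
for _ in range(50):
    x = x - t * P @ grad_f(x)                       # the step never leaves {Ax = b}
print(x, A @ x - b)                                 # approx. (7/3, -2/3, -2/3), residual 0
```

Goldstein's convex-constrained variant replaces the nullspace projector by π_C; a box-constrained example appears with the PSG algorithm below.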


SLIDE 5

Stochastic iterations

f resp. ∇f is only observable together with noise, i.e. as f_ω(·) resp. ∇f_ω(·), where

E[f_ω(x)] = f(x) + bias, resp. E[∇f_ω(x)] = ∇f(x) + bias.

Robbins-Monro (1951), Ermoliev (1967-1976). Stochastic (quasi-)gradients:

X_{n+1} = X_n − t_n f_{ω_n}(X_n),   X_{n+1} = X_n − t_n ∇f_{ω_n}(X_n).

Gupal (1974), Kushner (1974). Stochastic (quasi-)gradient projection:

X_{n+1} = π_C(X_n − t_n ∇f_{ω_n}(X_n)).
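A toy Robbins-Monro sketch (the root-finding problem and the noise model are my own illustration):

```python
# Robbins-Monro (1951): find the root of f(x) = x - 1 from noisy evaluations
# f(x) + noise, with decreasing steps t_n = 1/n.
import numpy as np

rng = np.random.default_rng(0)
X = 5.0
for n in range(1, 10001):
    t_n = 1.0 / n                        # t_n >= 0, sum t_n = inf, sum t_n^2 < inf
    noisy_f = (X - 1.0) + rng.normal()   # unbiased noisy observation of f(X)
    X -= t_n * noisy_f
print(X)                                 # close to the root x* = 1
```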


SLIDE 6

The projected stochastic quasigradient method

While more sophisticated methods (Armijo line search, level set methods, mirror descent methods, operator splitting) have been developed and become popular for deterministic optimization, the good old gradient search is still nearly the only method for stochastic optimization. Stochastic optimization is applied in two different cases: (1) for problems of huge dimension, where subproblems of smaller dimension are generated by random selection; (2) for intrinsically stochastic problems, where external risk factors have to be considered. Problems of type (1) include e.g. digital image classification and restoration, speech recognition, deep machine learning using neural networks, and deterministic shape optimization. In this talk, we discuss a problem of type (2): shape optimization in an intrinsically random environment.


SLIDE 7

Projected Stochastic Gradient (PSG) Algorithm in Hilbert spaces

Let H be a Hilbert space with inner product ⟨·,·⟩ and norm ‖·‖, and let π_C : H → C denote the projection onto C.

Problem: min_{u∈C} { j(u) = E[J_ω(u)] }.

The PSG Algorithm:

◮ Initialization: u_0 ∈ H
◮ For n = 0, 1, ...: generate an independent ω_n, choose t_n > 0, and set
u_{n+1} := π_C(u_n − t_n g_n(ω_n))
with stochastic gradient g_n.

Possible choices for the stochastic gradient:

◮ Single realization: g_n = ∇J_{ω_n}(u_n)
◮ Batch method: g_n = (1/m_n) Σ_{i=1}^{m_n} ∇J_{ω_n,i}(u_n)
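A finite-dimensional toy sketch of the PSG algorithm (H = R^d stands in for the Hilbert space; the quadratic J_ω, the box C and all constants are my own choices), showing both stochastic-gradient variants:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5                                           # H = R^d
mean = np.full(d, 2.0)                          # omega ~ N(mean, I)

def proj_C(u):                                  # pi_C for C = [-1, 1]^d
    return np.clip(u, -1.0, 1.0)

def grad_J(u, omega):                           # J_omega(u) = 1/2 ||u - omega||^2
    return u - omega                            # so j(u) = E[J_omega(u)] up to a constant

u = np.zeros(d)
for n in range(1, 2001):
    t_n = 1.0 / n
    m_n = 4                                     # batch method; m_n = 1 is the single realization
    omegas = rng.normal(loc=mean, size=(m_n, d))
    g_n = np.mean([grad_J(u, w) for w in omegas], axis=0)
    u = proj_C(u - t_n * g_n)
print(u)                                        # approx. all ones = proj_C(mean)
```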


SLIDE 8

Illustration

Figure: Left: projection onto the tangent space. Right: projection onto the constraint set.
Figure: Left: line search? Right: a stationary point.


SLIDE 9

Assumptions for Convergence

1. ∅ ≠ C ⊂ H is closed and convex.
2. J_ω is convex and continuously Fréchet differentiable for a.e. ω ∈ Ω on a neighborhood of C ⊂ H.
3. j is bounded below by some j̄ ∈ R and finitely valued over C.
4. Robbins-Monro step sizes: t_n ≥ 0, Σ_{n=0}^∞ t_n = ∞, Σ_{n=0}^∞ t_n² < ∞.
5. ∇J_{ω_n}(u_n) = ∇j(u_n) + w_{n+1} + r_{n+1} with an increasing filtration {F_n} such that
(i) w_n and r_n are F_n-measurable;
(ii) E[w_n | F_n] = 0;
(iii) Σ_{n=0}^∞ t_n ess sup ‖r_n‖ < ∞;
(iv) ∃ M₁, M₂: E[‖∇J_{ω_n}(u_n)‖² | F_n] ≤ M₁ + M₂ ‖u_n‖².


SLIDE 10

Convergence Results

Theorem ((Geiersbach and G.P.) Weak Convergence in Probability for General Convex Objective)

Under Assumptions 1-5 it holds for the PSG algorithm and S := {w ∈ C : j(w) = j(ũ)}, where ũ is a minimizer of j:

1. {‖u_n − ũ‖} converges a.s. for all ũ ∈ S,
2. {j(u_n)} converges a.s. and lim_{n→∞} j(u_n) = j(ũ),
3. {u_n} converges weakly a.s. and lim_n u_n ∈ S. This is stronger than "any weak cluster point of (u_n) lies in S"!

Corollary (A.s. Strong Convergence for Strongly Convex Objective)

Given Assumptions 1-5, assume in addition that j is strongly convex. Then {u_n} converges strongly a.s. to the unique optimum ū.


SLIDE 11

Efficiency Estimates in the Strongly Convex Case

If j is strongly convex with growth parameter μ and t_n = θ/(n + ν) for θ > 1/(2μ) and ν ≥ K₁, then there are computable constants K₁, K₂ such that the expected error in the control at step n satisfies

E[‖u_n − ū‖²] ≤ K₂ / (n + ν),

and the expected error in the objective satisfies

E[j(u_n) − j(ū)] ≤ L K₂ / (2(n + ν)),

where L is the Lipschitz constant of ∇j. This generalizes a result by Nemirovski et al. (2009).


SLIDE 12

Efficiency Estimates in the General Convex Case I

Polyak and Juditsky (1992), Ruppert (1992): convergence improvement by taking larger stepsizes and averaging. Define

γ_k := t_k / Σ_{ℓ=1}^N t_ℓ  and  ũ_1^N := Σ_{k=1}^N γ_k u_k.

Let D_S be a bound such that sup_{u∈S} ‖u_0 − u‖ ≤ D_S. We can show that

E[j(ũ_1^N) − j(ū)] ≤ ( D_S² + R Σ_{k=1}^N t_k² ) / ( 2 Σ_{k=1}^N t_k )

with a computable constant R. With the constant stepsize policy t_n = D_S R^{−1/2} N^{−1/2} for a fixed number of iterations n = 1, ..., N we get the efficiency estimate

E[j(ũ_1^N) − j(ū)] ≤ D_S √R / √N.
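A small sketch of the averaging bookkeeping (function names and array shapes are mine):

```python
import numpy as np

def averaged_iterate(iterates, stepsizes):
    """u_tilde = sum_k gamma_k u_k with gamma_k = t_k / sum_l t_l."""
    t = np.asarray(stepsizes, dtype=float)
    U = np.asarray(iterates, dtype=float)   # shape (N, d): one iterate per row
    gamma = t / t.sum()                     # weights sum to 1
    return gamma @ U

def constant_stepsize(D_S, R, N):
    """t_n = D_S * R^(-1/2) * N^(-1/2), fixed over the horizon N."""
    return D_S / np.sqrt(R * N)
```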


SLIDE 13

Efficiency Estimates in the General Convex Case II

With the choice of a variable stepsize t_n = θ D_S / √(nR) we can show that

E[j(ũ_1^n) − j(ū)] = O(log n / √n),

and if one starts averaging only after N₁ steps, with N₁ = [rn], one also gets

E[j(ũ_{N₁}^n) − j(ū)] = O(1/√n).

These bounds are extensions of Nemirovski et al. (2009).


SLIDE 14

A PDE-constrained problem: Optimal Control of a Stationary Heat Source

min_{u∈C} E[J_ω(u)] = E[ (1/2)‖y − y_D‖²_{L²(D)} + (λ/2)‖u‖²_{L²(D)} ]

s.t. −∇ · (a(x,ω)∇y(x,ω)) = u(x), (x,ω) ∈ D × Ω,
y(x,ω) = 0, (x,ω) ∈ ∂D × Ω,

C = {u ∈ L²(D) : u_a(x) ≤ u(x) ≤ u_b(x) a.e. x ∈ D}.

◮ Random (positive) conductivity a(x,ω) ∈ (a_min, a_max)
◮ Random temperature y = y(x,ω), controlled by the deterministic source density u = u(x)
◮ Deterministic target distribution y_D = y_D(x) ∈ L²(D)
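To make the model concrete, a minimal legacy-FEniCS/Python sketch of one state solve and one evaluation of J_ω(u) (the slides only say FEniCS/Python is used; the mesh size, the sample value, and all variable names are my own):

```python
from fenics import *

mesh = UnitSquareMesh(32, 32)                    # illustrative mesh
V = FunctionSpace(mesh, "P", 1)                  # piecewise linear elements
bc = DirichletBC(V, Constant(0.0), "on_boundary")

lam = 2.0
a_val = Constant(2.0)                            # one sample of the conductivity a(omega)
u = interpolate(Expression("sin(2*pi*x[0])*sin(2*pi*x[1])", degree=2), V)   # a control
yD = interpolate(Expression("sin(pi*x[0])*sin(pi*x[1])", degree=2), V)      # a target

# State equation: -div(a grad y) = u in D, y = 0 on the boundary
y, v = Function(V), TestFunction(V)
solve(a_val*dot(grad(TrialFunction(V)), grad(v))*dx == u*v*dx, y, bc)

# Sampled objective J_omega(u) = 1/2 ||y - yD||^2 + lam/2 ||u||^2
print(assemble(0.5*(y - yD)**2*dx + 0.5*lam*u**2*dx))
```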


SLIDE 15

The Problem Satisfies Convergence Assumptions

◮ ∅ ≠ C ⊂ H is closed and convex.
◮ J_ω is convex and continuously Fréchet differentiable for a.e. ω ∈ Ω on a neighborhood of C ⊂ H.
◮ j is bounded below by some j̄ ∈ R and finite-valued over C.
◮ Robbins-Monro step sizes: t_n ≥ 0, Σ_{n=0}^∞ t_n = ∞, Σ_{n=0}^∞ t_n² < ∞.
◮ For a fixed realization ω, there exists a unique solution y(·,ω) ∈ H¹₀(D) to the PDE constraint, and
‖y(·,ω)‖_{L²(D)} ≤ C₁ ‖u‖_{L²(D)}.
◮ ∇J_ω(u) = λu − p(·,ω), where p(·,ω) solves the adjoint PDE
∫_D a(x,ω) ∇v · ∇p dx = ∫_D (y_D − y(·,ω)) v dx  ∀v ∈ H¹₀(D),
with the bound ‖p(·,ω)‖_{L²(D)} ≤ C₂ ‖y_D − y(·,ω)‖_{L²(D)}.
◮ ‖∇J_ω(u)‖_{L²(D)} ≤ λ‖u‖_{L²(D)} + C₂(‖y_D‖_{L²(D)} + C₁‖u‖_{L²(D)}).
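Putting the slide's formulas together, a self-contained sketch of one projected stochastic gradient step (state solve, adjoint solve, gradient λu − p, projection); clipping the nodal coefficients of a P1 function is my simple way of realizing π_C for the box constraints:

```python
import numpy as np
from fenics import *

mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "P", 1)
bc = DirichletBC(V, Constant(0.0), "on_boundary")
v = TestFunction(V)

lam, t_n = 2.0, 0.1                              # lambda and one stepsize (my values)
a_val = Constant(2.0)                            # one sample a(omega_n)
yD = interpolate(Expression("sin(2*pi*x[0])*sin(2*pi*x[1])", degree=2), V)
u = interpolate(Constant(0.0), V)                # current control iterate u_n

# State: -div(a grad y) = u in D, y = 0 on the boundary
y = Function(V)
solve(a_val*dot(grad(TrialFunction(V)), grad(v))*dx == u*v*dx, y, bc)

# Adjoint: int_D a grad(v).grad(p) dx = int_D (yD - y) v dx for all v in H^1_0(D)
p = Function(V)
solve(a_val*dot(grad(TrialFunction(V)), grad(v))*dx == (yD - y)*v*dx, p, bc)

# Stochastic gradient grad J_omega(u_n) = lam*u_n - p(., omega_n), projected step
g = lam*u.vector().get_local() - p.vector().get_local()
u.vector().set_local(np.clip(u.vector().get_local() - t_n*g, -1.0, 1.0))
u.vector().apply("insert")
```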


SLIDE 16

Setup for Simulation: Strongly Convex Case

◮ Deterministic optimum: ū = −(1/λ) sin(2πx₁) sin(2πx₂)
◮ y_D = (8π²ā − 1/(8λπ²ā)) sin(2πx₁) sin(2πx₂)
◮ D = [0,1] × [0,1]
◮ C = {u ∈ L²(D) | −1 ≤ u(x) ≤ 1 ∀x ∈ D}
◮ a(x,ω) = a(ω), a constant drawn from a truncated normal distribution on the interval [0.5, 3.5] with mean ā = 2.0 and standard deviation σ = 0.25
◮ t_n = 1/(3n)

Computations with FEniCS / Python on a uniform triangular mesh (piecewise linear elements), h_min ≈ 0.013.
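One way to draw the conductivity sample (a sketch using scipy, my choice of tooling):

```python
from scipy.stats import truncnorm

a_bar, sigma, lo, hi = 2.0, 0.25, 0.5, 3.5
# scipy parametrizes the truncation bounds in standard-normal units
dist = truncnorm((lo - a_bar)/sigma, (hi - a_bar)/sigma, loc=a_bar, scale=sigma)
a_sample = dist.rvs(random_state=0)              # one realization a(omega)
```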


SLIDE 17

Simulation: Strongly Convex Case

Simulation values for N = 100 iterations: λ = 2, ā = 2, σ = 0.25

Figure: u_0 = (3/2) sin(πx₁) sin(πx₂) (left) and u_100 (right).


SLIDE 18

Simulation: Strongly Convex Case

Reference solution computed after N = 10,000 steps on a finer mesh (15,681 nodes, h_min ≈ 6.6 · 10⁻³). Convergence behavior observed for one trajectory.

Figure: J_{ω_n}(u_n) − j̄ = O(n^{−0.99}) (left) and ‖u_n − ū‖_{L²(D)} = O(n^{−0.76}) (right).


SLIDE 19

Setup for Simulation: (non-strongly) Convex Case

Modified problem with λ = 0:

min_{u∈U} E[J(u,ω)] = E[ (1/2)‖y − y_D‖²_{L²(D)} ]

s.t. −∇ · (a(ω)∇y(x,ω)) = u(x) + e_D(x), (x,ω) ∈ D × Ω,
y(x,ω) = 0, (x,ω) ∈ ∂D × Ω.

◮ Deterministic optimum: ū = −sign(−sin(2πx₁) sin(2πx₂))
◮ y_D(x) = sin(πx₁) sin(πx₂) + 2 sin(2πx₁) sin(2πx₂)
◮ e_D(x) = 4π² sin(πx₁) sin(πx₂) + sign(−(1/(8π²)) sin(2πx₁) sin(2πx₂))
◮ t_n = θ/(M √100) with θ = 10, M = 0.835
◮ Steps using averaging: ũ_1^N = Σ_{k=1}^N γ_k u_k with γ_k = τ_k / Σ_{l=1}^N τ_l


SLIDE 20

Simulation: (non-strongly) Convex Case

Simulation values for N = 100 iterations: u_0 = 0, λ = 0, ā = 2, σ = 0.25


SLIDE 21

Simulation: (non-strongly) Convex Case: Convergence

Reference solution computed after N = 10,000 steps on a finer mesh (15,681 nodes, h_min ≈ 6.6 · 10⁻³). Convergence behavior observed for one trajectory.

Figure: J_{ω_n}(u_n) − j̄ = O(n^{−0.50}) (left) and ‖u_n − ū‖_{L²(D)} = O(n^{−0.26}) (right).


SLIDE 22

Nonconvex case: Assumptions

1. ∅ ≠ C ⊂ H is closed and convex.
2. J_ω is continuously Fréchet differentiable for a.e. ω ∈ Ω on a neighborhood of C ⊂ H.
3. j is bounded below by some j̄ ∈ R and finite-valued over C.
4. For an increasing filtration {F_n}: (i) u_n is F_n-measurable; (ii) ∃ M > 0 : E[‖v_n‖² | F_n] ≤ M² for all n ∈ N. → No bias term.
5. The sequence {a_n} of symmetric, positive definite bilinear forms a_n : H × H → R satisfies, for κ > 0, K > 0 and all n ∈ N₀,
κ‖u‖² ≤ ‖u‖²_{a_n} ≤ K‖u‖²  ∀u ∈ H.
6. J′_ω is L(ω)-Lipschitz continuous for a.e. ω ∈ Ω and L := sup_{ω∈Ω} L(ω) < ∞. → Required in the absence of convexity.


SLIDE 23

Projection Subproblem with Variable Metric

Projection subproblem:

min_{v∈C} (1/2)‖v − (u_n − ∇J_{ω_n}(u_n))‖²  ⇔  min_{v∈C} (1/2)‖v − (u_n − ∇J_{ω_n}(u_n))‖²_{a_n}

Lemma

Given Assumptions 1, 2 and 5, the projection subproblem has a unique solution v̄_n = π_C^{(a_n)}(u_n − ∇J_{ω_n}(u_n)) ∈ C for every n. Furthermore, it holds for v_n := v̄_n − u_n that

⟨∇J_{ω_n}(u_n), v_n⟩ ≤ −κ‖v_n‖².  (1)
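In finite dimensions the subproblem is a box-constrained quadratic program. A hedged sketch (the metric matrix, the point w and the solver choice are my own; the experiments later in the talk use a primal/dual active set method instead):

```python
# min_{v in C} 1/2 ||v - w||^2_{a_n} with C = [-1, 1]^d and a_n given by an SPD matrix.
import numpy as np
from scipy.optimize import minimize

d = 4
An = np.diag([1.0, 2.0, 3.0, 4.0])         # SPD matrix representing the metric a_n
w = np.array([1.5, -2.0, 0.3, 0.9])        # w = u_n - grad J_{omega_n}(u_n)

def fun(v):
    r = v - w
    return 0.5 * r @ An @ r

def jac(v):
    return An @ (v - w)

res = minimize(fun, np.zeros(d), jac=jac, method="L-BFGS-B", bounds=[(-1.0, 1.0)]*d)
v_bar = res.x                              # pi_C^{(a_n)}(w)
print(v_bar)
```

For a diagonal metric and box constraints the solution reduces to componentwise clipping; the solver route carries over to non-diagonal a_n.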


SLIDE 24

Convergence Result

Theorem

Given Assumptions 1-6, the algorithm for nonconvex optimization with stepsizes satisfying Σ t_n = ∞ and Σ t_n² < ∞ has the following properties:

◮ The sequence {E[j(u_n)]} converges, and the sequence {v_n} fulfills lim inf_{n→∞} E‖v_n‖² = 0.
◮ If C is bounded, then with probability 1 any accumulation point u of {u_n} is a stationary point of j.
◮ If in addition the second derivative of j is bounded, then lim_{n→∞} E‖v_n‖² = 0.


SLIDE 25

Illustration

Figure: Left: the step v_n. Right: the criterion v_n = 0.

In the nonconvex case, several disconnected stationary points may be present.


SLIDE 26

Application to Shape Optimization

Problem from structural topology optimization: minimization of the expected compliance of a shape in a domain D under uncertain forcing on the boundary.

Figure: A shape O ⊂ D ⊂ R² with a Neumann boundary Γ_N (derivatives are given) and a Dirichlet boundary Γ_D (function values are given). The boundary forcing is a random function g(·,ω) : Γ_N → R².

The shape is represented by a (smooth) function −1 ≤ u ≤ 1 on the domain D, where 1 represents the shape and −1 represents the void. The shape as a set is recovered by thresholding (defuzzification).


SLIDE 27

Phase Field Relaxation for Shapes

The function u(x), x ∈ D is called the phase field.

Figure: Shape O (left) and phase field representation u (right)

The shape's elasticity is encoded in its tensor A. The tensor is extended to the whole optimization domain D by the function

A_ε(u) := ( (1−ε²)/4 · u² + (1−ε²)/2 · u + (1+3ε²)/4 ) · A.
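A one-line check of the endpoint values of this interpolation (factor 1 at u = 1, i.e. full material stiffness, and factor ε² at u = −1, i.e. a softly penalized void):

```python
def stiffness_factor(u, eps):
    # (1-eps^2)/4 * u^2 + (1-eps^2)/2 * u + (1+3*eps^2)/4, from the slide
    return (1 - eps**2)/4 * u**2 + (1 - eps**2)/2 * u + (1 + 3*eps**2)/4

eps = 0.03
print(stiffness_factor(1.0, eps), stiffness_factor(-1.0, eps))  # 1.0 and eps^2 = 0.0009
```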


SLIDE 28

Mathematical Formulation

min_{u∈C} E[ ∫_{Γ_N} g(x,ω) · y(x,ω) ds ] + γ ∫_D ( (ε/2)|∇u|² + (1/(2ε))(1 − u²) ) dx

s.t. ∫_D A(u) e(y(x,ω)) : e(v) dx = ∫_{Γ_N} g(x,ω) · v ds  ∀(v,ω) ∈ V × Ω,

C = {u ∈ H¹(D) : −1 ≤ u(x) ≤ 1 ∀x ∈ D, ∫_D 1_{u≥0} dx = m},

where V := {v ∈ H¹(D)^d : Tv = 0 on Γ_D}, e(y) = (1/2)(∇y + ∇y⊤), A : B = Σ_{i,j=1}^2 A_{ij}B_{ij}, and m is the total area of the shape.


SLIDE 29

Generation of Stochastic Descent Direction

Projection-type subproblem:

min_{v∈C} (1/2)‖v − (u_n − ∇J_{ω_n}(u_n))‖²_{a_n}

Formally this is equivalent to solving the problem

min_{v∈C} (1/2)‖v − u_n‖²_{a_n} + J′_{ω_n}(u_n)[v − u_n].

It holds that

J′_ω(u)v = −∫_D A′(u) e(y(x,ω)) : e(y(x,ω)) v dx + γ ∫_D ( ε ∇u · ∇v − (1/ε) u v ) dx,

and y(·,ω) solves the state equation

∫_D A(u) e(y) : e(v) dx = ∫_{Γ_N} g(x,ω) · v ds  ∀v ∈ V.  (2)


SLIDE 30

The Algorithm

◮ Initialization: u_0 ∈ C, ρ ∈ (0,1), c ∈ (0, min{1, κ})
◮ For n = 0, 1, ...:
1. Generate ω_n ∈ Ω.
2. Compute y_n as the solution to the state equation (2).
3. Compute v̄_n as the solution to the projection subproblem.
4. Set v_n := v̄_n − u_n.
5. Determine t_n := ρ^{m_n} with the minimal m_n ∈ N₀ such that J_{ω_n}(u_n + t_n v_n) ≤ J_{ω_n}(u_n) − c t_n ‖v_n‖².
6. Set u_{n+1} := u_n + t_n v_n.

This algorithm is used only for the initial steps. When oscillatory behavior is detected, we switch to a Robbins-Monro type algorithm.
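A minimal sketch of the backtracking in step 5 (the one-dimensional objective is illustrative, not the shape functional):

```python
import numpy as np

def backtrack(J, u, v, rho=0.5, c=0.1, m_max=50):
    """Smallest m_n with t_n = rho^{m_n} and J(u + t v) <= J(u) - c t ||v||^2."""
    J_u, norm_v_sq = J(u), np.dot(v, v)
    t = 1.0                                   # rho^0
    for _ in range(m_max):
        if J(u + t*v) <= J_u - c*t*norm_v_sq:
            return t
        t *= rho
    return t                                  # fall back to the last trial step

J = lambda u: np.dot(u, u)                    # illustrative objective
u = np.array([2.0]); v = -2*u                 # descent direction -grad J
t_n = backtrack(J, u, v)
u_next = u + t_n*v                            # accepted step
```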


SLIDE 31

Setup for Simulations

◮ u_0 = −0.15 = m
◮ D = [0,1] × [0,2]
◮ A_{ijkl} = 2μ δ_{ik}δ_{jl} + λ δ_{ij}δ_{kl}, with λ = μ = 1000
◮ Ginzburg-Landau constants ε = 0.03, γ = 0.02
◮ a_n(y,v) = (1/s_n) ∫_D ∇y · ∇v dx with s_n ∈ [5, 500]:
◮ Deterministic case: if the line search succeeds in one step, set s_n := (4/3) s_{n−1}, otherwise s_n := (3/4) s_{n−1}.
◮ Random case: if the line search succeeded in one step for each of the last ten iterations, set s_n := (4/3) s_{n−1}, otherwise s_n := (3/4) s_{n−1}.
◮ Projection subproblem: primal/dual active set method

Computations with FEniCS / Python on a triangular mesh (h_min = 0.0149).


SLIDE 32

Numerical Simulations - Deterministic Experiments

Termination condition: ‖v_n‖ < 0.001

Figure: u_N for g = (−25, −50)⊤ after 370 iterations.
Figure: u_N for g = (−25, 50)⊤ after 270 iterations.


SLIDE 33

Numerical Simulations - Random Experiment 1

Figure: u_N for N = 1000, random g at uniform angles between (−25, 50)⊤ and (−25, −50)⊤.

A limit shape of a single run of the algorithm.


SLIDE 34

Numerical Simulations - Random Experiment 2

Figure: u_N for N = 1000, random g at uniform angles between (−25, 50)⊤ and (−25, −50)⊤.

Another limit shape from another run of the algorithm.


SLIDE 35

Summary and Remarks

Results:

◮ Convergence results for the stochastic gradient algorithm in Hilbert spaces. Work in progress on different stepsize rules in the nonconvex case.
◮ Discussion of efficiency estimates and stepsize rules in the convex case.
◮ Demonstration of the algorithms on problems in PDE-constrained optimization.
