Stochastic constrained optimization in Hilbert spaces with applications


  1. Stochastic constrained optimization in Hilbert spaces with applications. Georg Ch. Pflug / C. Geiersbach, March 27, 2019.

  2. Iteration methods. Two related problems:
     ◮ Finding roots of equations: given $f(\cdot)$, find a root $x^*$, i.e. a point such that $f(x^*) = 0$.
     ◮ Finding optima of functions: given $f(\cdot)$, find a candidate for an optimum, i.e. $x^*$ such that $\nabla f(x^*) = 0$.

  3. Newton (1669): iterative solution method for the equation $f(x) = x^3 - 2x - 5 = 0$. Raphson (1690): the general version.
     ◮ Newton-Raphson: $x_{n+1} = x_n - f(x_n)/f'(x_n)$ for root finding, $x_{n+1} = x_n - [\nabla^2 f(x_n)]^{-1} \nabla f(x_n)$ for optimization.
     ◮ von Mises, Pollaczek-Geiringer (1929), constant step size: $x_{n+1} = x_n - t \cdot f(x_n)$, which converges if $t \le [\sup_x f'(x)]^{-1}$, resp. $x_{n+1} = x_n - t \cdot \nabla f(x_n)$, which converges if $t < 1/\lambda_{\max}$, with $\lambda_{\max}$ the maximal eigenvalue of $\nabla^2 f(x)$.
     ◮ Decreasing step sizes: $x_{n+1} = x_n - t_n f(x_n)$ resp. $x_{n+1} = x_n - t_n \nabla f(x_n)$ with $t_n \ge 0$, $t_n \to 0$, $\sum_n t_n = \infty$.
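To make the two historical iterations concrete, here is a minimal Python sketch of the Newton-Raphson step and of the fixed-step variant, run on Newton's own example $f(x) = x^3 - 2x - 5 = 0$; the starting point, the constant step $t$ and the tolerances are illustrative choices, not taken from the slides.

```python
def f(x):
    return x**3 - 2*x - 5

def fprime(x):
    return 3*x**2 - 2

def newton_raphson(x, tol=1e-12, max_iter=50):
    """Newton (1669) / Raphson (1690): x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def fixed_step(x, t=0.05, tol=1e-10, max_iter=100_000):
    """von Mises / Pollaczek-Geiringer (1929): x_{n+1} = x_n - t * f(x_n).

    Converges only locally and only for a small enough constant step t.
    """
    for _ in range(max_iter):
        step = t * f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

print(newton_raphson(2.0))  # ~2.0945514815, reached in a handful of iterations
print(fixed_step(2.0))      # the same root, but after many more iterations
```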

  4. Rosen (1960): gradient projection for optimization under linear equality constraints, $\min\{f(x) : Ax = b\}$:
     $x_{n+1} = x_n - t_n \,(I - A^\top (A A^\top)^{-1} A)\, \nabla f(x_n)$.
     Goldstein (1964): gradient projection for optimization under convex constraints, $\min\{f(x) : x \in C \text{ (convex)}\}$:
     $x_{n+1} = \pi_C(x_n - t_n \nabla f(x_n))$, where $\pi_C$ is the projection onto the convex set $C$.
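As a concrete illustration of both projection ideas, here is a small numpy sketch on the toy quadratic $f(x) = \tfrac{1}{2}\|x - c\|^2$; the data $c$, $A$, $b$, the box $C = [-1, 1]^3$ and the step size $0.1$ are invented for the example.

```python
import numpy as np

c = np.array([3.0, -1.0, 2.0])
grad = lambda x: x - c                    # gradient of f(x) = 0.5 * ||x - c||^2

# Rosen (1960): step along the gradient projected onto the null space of A.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
P = np.eye(3) - A.T @ np.linalg.solve(A @ A.T, A)   # I - A^T (A A^T)^{-1} A

x = np.array([1.0, 0.0, 0.0])             # feasible start: A x = b
for _ in range(500):
    x = x - 0.1 * P @ grad(x)
print("Rosen:", x, "residual:", A @ x - b)    # -> [2, -2, 1], residual ~ 0

# Goldstein (1964): project the full gradient step back onto the convex set C,
# here the box C = [-1, 1]^3, whose projection is a componentwise clip.
pi_C = lambda x: np.clip(x, -1.0, 1.0)

x = np.zeros(3)
for _ in range(500):
    x = pi_C(x - 0.1 * grad(x))
print("Goldstein:", x)                    # -> [1, -1, 1]
```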

  5. Stochastic iterations. $f$ resp. $\nabla f$ is only observable together with noise, i.e. as $f_\omega(\cdot)$ resp. $\nabla f_\omega(\cdot)$, with $E[f_\omega(x)] = f(x) + \text{bias}$ resp. $E[\nabla f_\omega(x)] = \nabla f(x) + \text{bias}$.
     ◮ Robbins-Monro (1951): $X_{n+1} = X_n - t_n f_{\omega_n}(X_n)$.
     ◮ Ermoliev (1967-1976), stochastic (quasi-)gradients: $X_{n+1} = X_n - t_n \nabla f_{\omega_n}(X_n)$.
     ◮ Gupal (1974), Kushner (1974), stochastic (quasi-)gradient projection: $X_{n+1} = \pi_C(X_n - t_n \nabla f_{\omega_n}(X_n))$.
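A one-dimensional illustration of the Robbins-Monro iteration, in which only noisy evaluations $f_\omega(x) = f(x) + \text{noise}$ are available and the step sizes $t_n = 1/(n+1)$ satisfy the classical conditions; the target $f(x) = x - 2$ and the unit Gaussian noise are invented for the example.

```python
import random

def f_noisy(x):
    """One noisy observation f_omega(x) of f(x) = x - 2 (root at x* = 2)."""
    return (x - 2.0) + random.gauss(0.0, 1.0)

x = 0.0
for n in range(200_000):
    t_n = 1.0 / (n + 1)      # t_n >= 0, sum t_n = infinity, sum t_n^2 < infinity
    x = x - t_n * f_noisy(x)

print(x)                     # close to the root x* = 2 despite the noise
```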

  6. The projected stochastic quasi-gradient method. While more sophisticated methods (Armijo line search, level-set methods, mirror descent, operator splitting) have been developed and become popular for deterministic optimization, the good old gradient search is still nearly the only method for stochastic optimization. Stochastic optimization is applied in two different settings: (1) problems of huge dimension, where subproblems of smaller dimension are generated by random selection; (2) intrinsically stochastic problems, where external risk factors have to be taken into account. Problems of type (1) include, e.g., digital image classification and restoration, speech recognition, deep machine learning using neural networks, and deterministic shape optimization. In this talk, we discuss a problem of type (2): shape optimization in an intrinsically random environment.

  7. The Projected Stochastic Gradient (PSG) algorithm in Hilbert spaces. Let $H$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, and let $\pi_C : H \to C$ denote the projection onto $C$.
     Problem: $\min_{u \in C} \{ j(u) = E[J_\omega(u)] \}$.
     The PSG algorithm:
     ◮ Initialization: $u_0 \in H$.
     ◮ For $n = 0, 1, \ldots$: generate an independent $\omega_n$, choose $t_n > 0$, and set $u_{n+1} := \pi_C(u_n - t_n g_n(\omega_n))$ with a stochastic gradient $g_n$.
     Possible choices for the stochastic gradient:
     ◮ Single realization: $g_n = \nabla J_{\omega_n}(u_n)$.
     ◮ Batch method: $g_n = \frac{1}{m_n} \sum_{i=1}^{m_n} \nabla J_{\omega_n, i}(u_n)$.
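A minimal numpy sketch of the PSG iteration on a discretized stand-in for $L^2(0,1)$: here $J_\omega(u) = \tfrac{1}{2}\|u - z_\omega\|^2$ with a random target $z_\omega$, and $C$ is the pointwise box $-0.5 \le u(x) \le 0.5$. The target, the noise model, the grid size and the step sizes are invented toy choices, not the speakers' setting.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
xgrid = np.linspace(0.0, 1.0, m)
z_mean = np.sin(2 * np.pi * xgrid)          # E[z_omega]

def grad_J(u, omega):
    """Stochastic gradient of J_omega(u) = 0.5 * ||u - z_omega||^2."""
    z_omega = z_mean + 0.3 * omega          # omega ~ N(0, I) perturbs the target
    return u - z_omega

pi_C = lambda u: np.clip(u, -0.5, 0.5)      # projection onto the box C

u = np.zeros(m)                             # u_0
for n in range(5000):
    t_n = 2.0 / (n + 10)                    # Robbins-Monro step sizes
    g_n = grad_J(u, rng.standard_normal(m)) # single-realization choice; a batch
                                            # method would average several
                                            # independent realizations here
    u = pi_C(u - t_n * g_n)

# u now approximates pi_C(E[z_omega]) = clip(sin(2*pi*x), -0.5, 0.5)
```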

  8. Illustration. [Figures] Left: projection onto the tangent space; right: projection onto the constraint set. Left: line search? Right: a stationary point.

  9. Assumptions for convergence.
     1. $\emptyset \ne C \subset H$ is closed and convex.
     2. $J_\omega$ is convex and continuously Fréchet differentiable for a.e. $\omega \in \Omega$ on a neighborhood of $C \subset H$.
     3. $j$ is bounded below by $\bar{j} \in \mathbb{R}$ and finitely valued on $C$.
     4. Robbins-Monro step sizes: $t_n \ge 0$, $\sum_{n=0}^{\infty} t_n = \infty$, $\sum_{n=0}^{\infty} t_n^2 < \infty$.
     5. $\nabla J_{\omega_n}(u_n) = \nabla j(u_n) + w_{n+1} + r_{n+1}$ with an increasing filtration $\{\mathcal{F}_n\}$ such that (i) $w_n$ and $r_n$ are $\mathcal{F}_n$-measurable; (ii) $E[w_n \mid \mathcal{F}_n] = 0$; (iii) $\sum_{n=0}^{\infty} t_n \operatorname{ess\,sup} \|r_n\| < \infty$; (iv) there exist $M_1, M_2$ such that $E[\|\nabla J_{\omega_n}(u_n)\|^2 \mid \mathcal{F}_n] \le M_1 + M_2 \|u_n\|^2$.

  10. Convergence results.
     Theorem (Geiersbach and G.P.: weak convergence in probability for a general convex objective). Under Assumptions 1-5, the PSG algorithm satisfies, with $S := \{ w \in C : j(w) = j(\tilde{u}) \}$ for a minimizer $\tilde{u}$ of $j$:
     1. $\{\|u_n - \tilde{u}\|\}$ converges a.s. for all $\tilde{u} \in S$;
     2. $\{j(u_n)\}$ converges a.s. and $\lim_{n \to \infty} j(u_n) = j(\tilde{u})$;
     3. $\{u_n\}$ converges weakly a.s. and $\lim_n u_n \in S$.
     This is stronger than "any weak cluster point of $(u_n)$ lies in $S$"!
     Corollary (a.s. strong convergence for a strongly convex objective). Given Assumptions 1-5, assume in addition that $j$ is strongly convex. Then $\{u_n\}$ converges strongly a.s. to the unique optimum $\bar{u}$.

  11. Efficiency estimates in the strongly convex case. If $j$ is strongly convex with growth parameter $\mu$ and $t_n = \theta/(n + \nu)$ with $\theta > 1/(2\mu)$ and $\nu \ge K_1$, then there are computable constants $K_1, K_2$ such that the expected error in the control at step $n$ satisfies
     $E[\|u_n - \bar{u}\|] \le \sqrt{\dfrac{K_2}{n + \nu}}$
     and the expected error in the objective at step $n$ satisfies
     $E[j(u_n) - j(\bar{u})] \le \dfrac{L K_2}{2(n + \nu)}$,
     where $L$ is the Lipschitz constant for $j$. This generalizes a result by Nemirovski et al. (2009).
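A rough Monte Carlo check of the $O(1/\sqrt{n})$ behaviour of $E[\|u_n - \bar{u}\|]$ on the scalar toy problem $j(u) = \tfrac{\mu}{2}(u - 1)^2$ with additive gradient noise; the values of $\mu$, $\theta$, $\nu$ and the noise level are guessed here, respecting only the slide's condition $\theta > 1/(2\mu)$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, theta, nu = 2.0, 1.0, 10.0        # toy strongly convex problem, theta > 1/(2*mu)
u_star = 1.0                          # its minimizer

for n_steps in (100, 400, 1600):
    errs = []
    for _ in range(500):              # Monte Carlo estimate of E[|u_n - u*|]
        u = 0.0
        for n in range(n_steps):
            g = mu * (u - u_star) + rng.normal(0.0, 1.0)   # noisy gradient
            u -= theta / (n + nu) * g                      # t_n = theta / (n + nu)
        errs.append(abs(u - u_star))
    print(n_steps, np.mean(errs))     # roughly halves each time n is quadrupled
```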

  12. Efficiency estimates in the general convex case I. Polyak and Juditsky (1992), Ruppert (1992): convergence can be improved by taking larger step sizes and averaging. Define
     $\gamma_k := t_k \big/ \sum_{\ell=1}^{N} t_\ell$ and $\tilde{u}_1^N = \sum_{k=1}^{N} \gamma_k u_k$.
     Let $D_S$ be a bound such that $\sup_{u \in S} \|u_0 - u\| \le D_S$. We can show that
     $E[j(\tilde{u}_1^N) - j(\bar{u})] \le \dfrac{D_S + R \sum_{k=1}^{N} t_k^2}{2 \sum_{k=1}^{N} t_k}$
     with a computable constant $R$. With the constant step-size policy $t_n = D_S K^{-1/2} N^{-1/2}$ for a fixed number of iterations $n = 1, \ldots, N$, we get the efficiency estimate
     $E[j(\tilde{u}_1^N) - j(\bar{u})] \le \dfrac{D_S \sqrt{R}}{\sqrt{N}}$.
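A sketch of iterate averaging under the constant step-size policy $t_n = D_S K^{-1/2} N^{-1/2}$ on a noisy scalar toy problem; the constants $D_S$ and $K$, the objective and the box are invented, so the run only illustrates that the averaged iterate is typically more accurate than the last one.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000                             # fixed number of iterations
D_S, K = 1.0, 4.0                      # guessed problem constants
t = D_S / np.sqrt(K * N)               # constant step t_n = D_S * K^(-1/2) * N^(-1/2)

pi_C = lambda u: np.clip(u, -2.0, 2.0)                 # projection onto C = [-2, 2]
grad = lambda u: (u - 1.0) + rng.normal(0.0, 1.0)      # noisy gradient of 0.5*(u-1)^2

u, u_avg = 0.0, 0.0
for k in range(1, N + 1):
    u = pi_C(u - t * grad(u))
    u_avg += (u - u_avg) / k           # running average; with a constant step the
                                       # weights gamma_k = t_k / sum_l t_l are uniform
print("last iterate:    ", u)
print("averaged iterate:", u_avg)      # typically closer to the minimizer u* = 1
```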

  13. Efficiency estimates in the general convex case II. With the variable step-size choice $t_n = \theta D_S / \sqrt{nR}$ we can show that
     $E[j(\tilde{u}_1^n) - j(\bar{u})] = O\left(\dfrac{\log n}{\sqrt{n}}\right)$,
     and if one starts the averaging only after $N_1 = [rn]$ steps, one can also get
     $E[j(\tilde{u}_{N_1}^n) - j(\bar{u})] = O\left(\dfrac{1}{\sqrt{n}}\right)$.
     These bounds are extensions of Nemirovski et al. (2009).

  14. A PDE-constrained problem: optimal control of a stationary heat source.
     $\min_{u \in C} E[J_\omega(u)] = E\left[ \dfrac{1}{2} \|y - y_D\|_{L^2(D)}^2 + \dfrac{\lambda}{2} \|u\|_{L^2(D)}^2 \right]$
     subject to
     $-\nabla \cdot (a(x, \omega) \nabla y(x, \omega)) = u(x)$, $(x, \omega) \in D \times \Omega$,
     $y(x, \omega) = 0$, $(x, \omega) \in \partial D \times \Omega$,
     $C = \{ u \in L^2(D) : u_a(x) \le u(x) \le u_b(x) \text{ a.e. } x \in D \}$.
     ◮ Random (positive) conductivity $a(x, \omega) \in (a_{\min}, a_{\max})$.
     ◮ Random temperature $y = y(x, \omega)$, controlled by the deterministic source density $u = u(x)$.
     ◮ Deterministic target distribution $y_D = y_D(x) \in L^2(D)$.

  15. The problem satisfies the convergence assumptions.
     ◮ $\emptyset \ne C \subset H$ is closed and convex.
     ◮ $J_\omega$ is convex and continuously Fréchet differentiable for a.e. $\omega \in \Omega$ on a neighborhood of $C \subset H$.
     ◮ $j$ is bounded below by $\bar{j} \in \mathbb{R}$ and finitely valued on $C$.
     ◮ Robbins-Monro step sizes: $t_n \ge 0$, $\sum_{n=0}^{\infty} t_n = \infty$, $\sum_{n=0}^{\infty} t_n^2 < \infty$.
     ◮ For a fixed realization $\omega$ there exists a unique solution $y(\cdot, \omega) \in H_0^1(D)$ of the PDE constraint, with $\|y(\cdot, \omega)\|_{L^2(D)} \le C_1 \|u\|_{L^2(D)}$.
     ◮ $\nabla J_\omega(u) = \lambda u - p(\cdot, \omega)$, where $p(\cdot, \omega)$ solves the adjoint PDE
       $\int_D a(x, \omega)\, \nabla v \cdot \nabla p \, dx = \int_D (y_D - y(\cdot, \omega))\, v \, dx \quad \forall v \in H_0^1(D)$,
       with the bound $\|p(\cdot, \omega)\|_{L^2(D)} \le C_2 \|y_D - y(\cdot, \omega)\|_{L^2(D)}$.
     ◮ $\|\nabla J_\omega(u)\|_{L^2(D)} \le \lambda \|u\|_{L^2(D)} + C_2 (\|y_D\|_{L^2(D)} + C_1 \|u\|_{L^2(D)})$.
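Pulling slides 14 and 15 together, here is a rough one-dimensional finite-difference sketch of the PSG iteration for this control problem, using the adjoint-based gradient $\nabla J_\omega(u) = \lambda u - p(\cdot, \omega)$ stated above. The domain $D = (0,1)$, the random conductivity model, the target $y_D$, the bounds $u_a, u_b$, the regularization $\lambda$ and the step sizes are all invented toy choices; the speakers' actual numerical setup is not given on these slides.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 64                                   # interior grid points on D = (0, 1)
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
y_D = np.sin(np.pi * x)                  # deterministic target temperature
lam = 0.1                                # regularization weight lambda
u_a, u_b = -0.3, 0.3                     # pointwise control bounds

def stiffness(a_mid):
    """Finite-difference matrix for -(a(x) y'(x))' with zero Dirichlet data;
    a_mid holds the conductivity at the m + 1 cell interfaces."""
    K = np.zeros((m, m))
    for i in range(m):
        K[i, i] = (a_mid[i] + a_mid[i + 1]) / h**2
        if i > 0:
            K[i, i - 1] = -a_mid[i] / h**2
        if i < m - 1:
            K[i, i + 1] = -a_mid[i + 1] / h**2
    return K

def stochastic_gradient(u):
    # one realization of a random (here spatially constant) conductivity in (1, 3)
    a_mid = (2.0 + rng.uniform(-1.0, 1.0)) * np.ones(m + 1)
    K = stiffness(a_mid)
    y = np.linalg.solve(K, u)            # state:   -(a y')' = u
    p = np.linalg.solve(K, y_D - y)      # adjoint: -(a p')' = y_D - y  (K is symmetric)
    return lam * u - p                   # gradient formula from this slide

pi_C = lambda u: np.clip(u, u_a, u_b)    # projection onto the box constraints

u = np.zeros(m)
for n in range(500):
    t_n = 10.0 / (n + 10)                # Robbins-Monro step sizes
    u = pi_C(u - t_n * stochastic_gradient(u))
# u is now an approximation of the optimal deterministic heat source density
```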
