

  1. A GLOBALLY LINEARLY CONVERGENT METHOD FOR LARGE-SCALE POINTWISE QUADRATICALLY SUPPORTABLE CONVEX-CONCAVE SADDLE POINT PROBLEMS. Russell Luke (Timo Aspelmeier, Charitha, Ron Shefi), Universität Göttingen. LCCC Workshop, Large-Scale and Distributed Optimization, June 14-16, 2017, Lund University.

  2. Outline: Prelude, Analysis, Applications, References

  3. STimulated Emission Depletion

  4. STimulated Emission Depletion ≈ 3 nm per pixel

  5. Statistical Image Denoising/Deconvolution
$$\min_{x\in\mathbb{R}^n}\ f(x) \quad\text{subject to}\quad g_\epsilon(Ax) \le 0,$$
where $f$ is convex, piecewise linear-quadratic, $A: \mathbb{R}^n \to \mathbb{R}^n$, and
$$g_\epsilon: \mathbb{R}^n \to \mathbb{R}^m \ (\text{here } m = 2), \qquad g_\epsilon\colon v \mapsto \left(g_1(v) - \epsilon_1,\ g_2(v) - \epsilon_2,\ \dots,\ g_m(v) - \epsilon_m\right)^T$$
is convex and smooth.

  6. Statistical Image Denoising/Deconvolution (same formulation as Slide 5), with the driving question: what is the scientific content of processed images?
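One way to connect this constrained problem to the composite template $\min_x f(x) + g(A^T x)$ used on the later slides is to absorb the statistical constraint into an indicator function; this framing is mine, not stated on the slide:

```latex
% Constraint as an indicator: with C the feasible values of Ax,
\min_{x\in\mathbb{R}^n} f(x) \ \text{ s.t. }\ g_\epsilon(Ax)\le 0
\quad\equiv\quad
\min_{x\in\mathbb{R}^n} f(x) + \iota_C(Ax),
\qquad C := \{ v \in \mathbb{R}^n \mid g_\epsilon(v)\le 0 \}.
```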

  7. Goals: Solve $0 \in F(x)$ for $F: E \rightrightarrows E$ with $E$ a Euclidean space.
◮ #1. Convergence (with a posteriori error bounds) of Picard iterations $x^{k+1} \in Tx^k$, where $\mathrm{Fix}\,T \approx \mathrm{zer}\,F$ (a minimal numerical illustration follows this list).
◮ #2. Algorithms:
  ◮ (Non)convex optimization: ADMM/Douglas-Rachford
  ◮ Saddle-point problems: Proximal Alternating Predictor-Corrector (PAPC)
◮ #3. Applications:
  ◮ Image denoising/deconvolution
  ◮ Phase retrieval
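As a concrete, minimal illustration of a Picard iteration (my example, not from the deck): the prox of the $\ell_1$ norm is firmly nonexpansive, and its fixed point set is $\mathrm{argmin}\,\|\cdot\|_1 = \{0\}$, so the iteration contracts to that zero set:

```python
import numpy as np

def soft_threshold(x, lam):
    """prox of lam*||.||_1: a firmly nonexpansive map T."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Picard iteration x^{k+1} = T x^k; here Fix T = argmin ||.||_1 = {0}
x = np.random.randn(5)
for k in range(200):
    x_next = soft_threshold(x, 0.1)
    if np.linalg.norm(x_next - x) < 1e-12:   # a posteriori stopping test
        break
    x = x_next
print(k, x)   # reaches the fixed point 0 in finitely many steps
```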

  8. Building blocks (a small sketch of these objects follows this list)
◮ Resolvent: $(\mathrm{Id} + \lambda F)^{-1}$
◮ Prox operator: for a function $f: X \to \mathbb{R}$, define
$$\mathrm{prox}_{M,f}(x) := \mathop{\mathrm{argmin}}_y \left\{ f(y) + \tfrac{1}{2}\|y - x\|_M^2 \right\}$$
◮ Proximal reflector: $R_{M,f} := 2\,\mathrm{prox}_{M,f} - \mathrm{Id}$
◮ Projector: if $f = \iota_\Omega$ for $\Omega \subset X$ closed and nonempty, then $\mathrm{prox}_{M,f}(x) = P_\Omega x$, where
$$P_\Omega x := \{\bar{x}\in\Omega \mid \|x - \bar{x}\| = \mathrm{dist}(x,\Omega)\}, \qquad \mathrm{dist}(x,\Omega) := \inf_{y\in\Omega}\|x - y\|_M.$$
◮ Reflector: if $f = \iota_\Omega$ for some closed, nonempty set $\Omega \subset X$, then $R_\Omega := 2P_\Omega - \mathrm{Id}$.
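A minimal NumPy sketch of these building blocks for the Euclidean case $M = \mathrm{Id}$, with $\lambda\|\cdot\|_1$ and a box $\Omega$ as my choice of concrete instances:

```python
import numpy as np

def prox_l1(x, lam=1.0):
    """prox_{M,f} with M = Id and f = lam*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proj_box(x, lo=-1.0, hi=1.0):
    """P_Omega for Omega = [lo, hi]^n, i.e. the prox of its indicator."""
    return np.clip(x, lo, hi)

def reflect(P, x):
    """Reflector R_Omega = 2 P_Omega - Id, built from any projector P."""
    return 2.0 * P(x) - x

x = np.array([2.0, -0.3, 0.7])
print(prox_l1(x, 0.5))        # [1.5, 0. , 0.2]
print(reflect(proj_box, x))   # [0. , -0.3, 0.7]
```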

  9. Optimization
$$p^* = \min_{x\in\mathbb{R}^n}\ \left\{ f(x) + \sum_{i=1}^{I} g_i(A_i^T x) =: f(x) + g(A^T x) \right\}. \qquad (P)$$
Reformulations:
Augmented Lagrangian:
$$\min_{x\in\mathbb{R}^n}\ \min_{v\in\mathbb{R}^m}\ f(x) + \langle x, Ab\rangle - \langle b, v\rangle + g(v) + \tfrac{1}{2}\|A^T x - v\|_M^2 \qquad (L)$$
Saddle-point:
$$\min_{x\in\mathbb{R}^n}\ \max_{y\in\mathbb{R}^m}\ K(x,y) := f(x) + \langle A^T x, y\rangle - g^*(y). \qquad (M)$$
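The passage from (P) to (M) is just biconjugation; a one-line derivation (a standard fact, spelled out here for the record):

```latex
% Fenchel–Moreau: for g proper, lsc, convex, g = g**, so
g(A^T x) = \sup_{y\in\mathbb{R}^m}\left\{ \langle A^T x, y\rangle - g^*(y) \right\}
\;\Longrightarrow\;
\min_{x} \left\{ f(x) + g(A^T x) \right\}
 = \min_{x}\max_{y}\ \underbrace{f(x) + \langle A^T x, y\rangle - g^*(y)}_{K(x,y)}.
```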

  10. Algorithms: ADMM
Initialization. Choose $\eta > 0$ and $(x^0, v^0, b^0)$.
General Step ($k = 0, 1, \dots$):
$$x^{k+1} \in \mathop{\mathrm{argmin}}_x \left\{ f(x) + \langle b^k, Ax\rangle + \tfrac{\eta}{2}\|Ax - v^k\|^2 \right\}; \qquad (1a)$$
$$v^{k+1} \in \mathop{\mathrm{argmin}}_v \left\{ g(v) - \langle b^k, v\rangle + \tfrac{\eta}{2}\|Ax^{k+1} - v\|^2 \right\}; \qquad (1b)$$
$$b^{k+1} = b^k + \eta\left(Ax^{k+1} - v^{k+1}\right). \qquad (1c)$$
In the convex setting, the points generated by ADMM can be computed from the corresponding points of the Douglas-Rachford iteration $y^{k+1} \in Ty^k$ ($k\in\mathbb{N}$) for
$$T := \tfrac{1}{2}\left(R_{\eta B} R_{\eta D} + \mathrm{Id}\right) = J_{\eta B}\left(2 J_{\eta D} - \mathrm{Id}\right) + \left(\mathrm{Id} - J_{\eta D}\right),$$
where $B := \partial\left(f^* \circ (-A^T)\right)$ and $D := \partial g^*$.
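A runnable sketch of steps (1a)-(1c) for one concrete instance of my choosing, $f(x) = \tfrac{1}{2}\|x - c\|^2$ and $g = \lambda\|\cdot\|_1$, so that (1a) is a linear solve and (1b) is soft-thresholding:

```python
import numpy as np

def soft(u, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def admm(A, c, lam=0.1, eta=1.0, iters=500):
    """Steps (1a)-(1c) for min_x 0.5||x - c||^2 + lam*||Ax||_1."""
    m, n = A.shape
    x, v, b = np.zeros(n), np.zeros(m), np.zeros(m)
    H = np.eye(n) + eta * A.T @ A          # quadratic term of the x-subproblem (1a)
    for _ in range(iters):
        x = np.linalg.solve(H, c - A.T @ b + eta * A.T @ v)   # (1a): linear solve
        v = soft(A @ x + b / eta, lam / eta)                  # (1b): prox of g
        b = b + eta * (A @ x - v)                             # (1c): multiplier update
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
c = rng.standard_normal(5)
print(admm(A, c))
```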

  11. Algorithms: Proximal Alternating Predictor-Corrector (PAPC) [Drori, Sabach & Teboulle, 2015]
Initialization: Let $(x^0, y^0) \in \mathbb{R}^n\times\mathbb{R}^m$, and choose the parameters $\tau$ and $\sigma$ to satisfy
$$\tau \in \left(0, \tfrac{1}{L_f}\right], \qquad 0 < \tau\sigma \le \tfrac{1}{\|A^T A\|}.$$
Main Iteration: for $k = 1, 2, \dots$, update $x^k, y^k$ as follows:
$$p^k = x^{k-1} - \tau\left(\nabla f(x^{k-1}) + A y^{k-1}\right);$$
for $i = 1, \dots, I$:
$$y_i^k = \mathrm{prox}_{\sigma, g_i^*}\left(y_i^{k-1} + \sigma A_i^T p^k\right);$$
$$x^k = x^{k-1} - \tau\left(\nabla f(x^{k-1}) + A y^k\right).$$
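A minimal NumPy sketch of PAPC for a single block ($I = 1$), again with my own instance $f(x) = \tfrac{1}{2}\|x - c\|^2$ (so $L_f = 1$) and $g = \lambda\|\cdot\|_1$, whose conjugate's prox is the projection onto an $\ell_\infty$-ball:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 5, 8, 0.5
A = rng.standard_normal((n, m))     # A y lives in R^n, A^T x in R^m, as on the slide
c = rng.standard_normal(n)

grad_f = lambda x: x - c                        # f(x) = 0.5||x - c||^2, L_f = 1
prox_gstar = lambda u: np.clip(u, -lam, lam)    # g = lam*||.||_1  =>  g* = iota_{[-lam,lam]^m}

tau = 0.9                                        # tau <= 1/L_f
sigma = 1.0 / (tau * np.linalg.norm(A.T @ A, 2)) # tau*sigma <= 1/||A^T A||

x, y = np.zeros(n), np.zeros(m)
for k in range(2000):
    p = x - tau * (grad_f(x) + A @ y)        # predictor step
    y = prox_gstar(y + sigma * (A.T @ p))    # dual prox step
    x = x - tau * (grad_f(x) + A @ y)        # corrector step
print(x)  # approximately argmin_x 0.5||x - c||^2 + lam*||A^T x||_1
```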

  12. Outline: Prelude, Analysis, Applications, References

  13. Key abstract properties
Almost firm nonexpansiveness: $T: E \rightrightarrows E$ is pointwise almost firmly nonexpansive at $y$ with violation $\varepsilon$ when
$$\|x^+ - y^+\|^2 \le \tfrac{\varepsilon}{2}\|x - y\|^2 + \langle x^+ - y^+,\, x - y\rangle$$
for all $x^+ \in Tx$ and all $y^+ \in Ty$ whenever $x \in U$.
Metric subregularity (Ioffe; Azé; Dontchev & Rockafellar): $\Phi: E \rightrightarrows Y$ is metrically regular on $U\times V \subset E\times Y$ relative to $\Lambda\subset E$ if there exists a $\kappa > 0$ such that
$$\mathrm{dist}\left(x,\, \Phi^{-1}(y)\cap\Lambda\right) \le \kappa\,\mathrm{dist}\left(y, \Phi(x)\right) \qquad (2)$$
holds for all $x \in U\cap\Lambda$ and $y \in V$. When the set $V$ consists of a single point, $V = \{\bar y\}$, then $\Phi$ is said to be metrically subregular for $\bar y$ on $U$ relative to $\Lambda\subset E$.
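For intuition (my illustration, not from the deck): projectors onto convex sets satisfy the first inequality with violation $\varepsilon = 0$, i.e. they are firmly nonexpansive, which a quick randomized check confirms:

```python
import numpy as np

# The projector onto a convex set is firmly nonexpansive: the slide's
# inequality holds with violation eps = 0. Randomized sanity check.
rng = np.random.default_rng(2)
proj = lambda z: np.clip(z, -1.0, 1.0)     # P_Omega for the box [-1, 1]^n

eps, ok = 0.0, True
for _ in range(10000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    xp, yp = proj(x), proj(y)
    lhs = np.linalg.norm(xp - yp) ** 2
    rhs = 0.5 * eps * np.linalg.norm(x - y) ** 2 + np.dot(xp - yp, x - y)
    ok &= bool(lhs <= rhs + 1e-12)
print(ok)   # True
```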

  14. Abstract results
Linear convergence [L., Nguyen & Tam, 2017]: Let $g = \iota_\Omega$ for $\Omega\subset\mathbb{R}^n$ semi-algebraic and let $f:\mathbb{R}^n\to\mathbb{R}$ be linear-quadratic convex. Let $(x^k)_{k\in\mathbb{N}}$ be iterates of the Douglas-Rachford algorithm and let $\Lambda = \mathrm{aff}\left((x^k)_{k\in\mathbb{N}}\right)$. If $T_{DR} - \mathrm{Id}$ is metrically subregular at all points $x \in \mathrm{Fix}\,T_{DR}\cap\Lambda \neq \emptyset$ relative to $\Lambda$, then for all $x^0$ close enough to $\mathrm{Fix}\,T_{DR}\cap\Lambda$, the sequence $(x^k)$ converges linearly to a point in $\mathrm{Fix}\,T\cap\Lambda$ with constant at most
$$c = \sqrt{1 + \varepsilon - \tfrac{1}{\kappa^2}} < 1,$$
where $\kappa$ is the constant of metric subregularity for $T_{DR} - \mathrm{Id}$ on some neighborhood $U$ containing the sequence and $\varepsilon$ is the violation of almost firm nonexpansiveness on the neighborhood $U$.

  15. Polyhedrality $\Rightarrow$ metric subregularity: If $T$ is polyhedral and $\mathrm{Fix}\,T\cap\Lambda$ consists of isolated points, then $\mathrm{Id} - T$ is metrically subregular at $x$ relative to $\Lambda$.

  16. Application: ADMM/Douglas-Rachford
Linear convergence of polyhedral DR/ADMM [Aspelmeier, Charitha, L., 2016]: Let $f: U \to \mathbb{R}\cup\{+\infty\}$ and $g: V \to \mathbb{R}$ be proper, lsc, convex, piecewise linear-quadratic functions and $T$ the corresponding Douglas-Rachford fixed point mapping. Suppose that, for some affine subspace $W$, $\mathrm{Fix}\,T\cap W$ is an isolated point $\{\bar y\}$. Then the Douglas-Rachford sequence $(y^k)_{k\in\mathbb{N}}$ converges linearly to $\bar y$ with rate bounded above by $\sqrt{1 - \kappa^{-2}}$, where $\kappa > 0$ is a constant of metric subregularity of $\mathrm{Id} - T$ at $\bar y$ for the neighborhood $O$. Moreover, the sequence $(b^k, v^k)_{k\in\mathbb{N}}$ generated by the ADMM algorithm converges linearly to $(\bar b, \bar v)$, and the primal ADMM sequence $(x^k)_{k\in\mathbb{N}}$ converges to a solution of (P).

  17. Remark
Compare to linear convergence under strong monotonicity: Let $f$ and $g$ be proper, lsc and convex. Suppose there exists a solution to $0 \in \partial\left(f^*\circ(-A^T)\right)(x) + \partial g^*(x)$, where $A$ is an injective linear mapping. Suppose further that, on some neighborhood of $\bar y$, $g$ is strongly convex with constant $\mu$ and $\partial g$ is $\beta$-inverse strongly monotone for some $\beta > 0$. Then any DR sequence initiated on this neighborhood converges linearly to a point in $\mathrm{Fix}\,T$ with rate at least
$$K = \left(1 - \tfrac{2\eta\beta\mu^2}{(\mu + \eta)^2}\right)^{1/2} < 1.$$
[Lions & Mercier, 1979]
See also He & Yuan (2012); Boley (2013); Hesse & L. (2013); Bauschke, Bello Cruz, Nghia, Phan & Wang (2014); Bauschke & Noll (2014); Hesse, Neumann & L. (2014); Patrinos, Stella & Bemporad (2014); Giselsson (2015 ×2).

  18. Strong monotonicity: nice when you have it...
◮ TV: $f(x) := \|\nabla x\|_1$
◮ modified Huber:
$$f_\alpha(t) = \begin{cases} \dfrac{(t+\epsilon)^2 - \epsilon^2}{2\alpha} & \text{if } 0 \le t \le \alpha - \epsilon,\\[4pt] \dfrac{(t-\epsilon)^2 - \epsilon^2}{2\alpha} & \text{if } -\alpha + \epsilon \le t \le 0,\\[4pt] |t| - \dfrac{(\alpha - \epsilon)^2}{2\alpha} & \text{if } |t| > \alpha - \epsilon. \end{cases}$$
(The constant in the last branch is the one that makes $f_\alpha$ continuously differentiable at $t = \pm(\alpha - \epsilon)$.)
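A direct transcription of $f_\alpha$ for numerical use (assuming $0 < \epsilon < \alpha$; the parameter values are my choice):

```python
import numpy as np

def modified_huber(t, alpha=1.0, eps=0.1):
    """Modified Huber f_alpha from the slide (assumes 0 < eps < alpha).
    Quadratic with an eps-shifted vertex near 0, linear in the tails;
    the tail constant -(alpha - eps)^2/(2*alpha) makes f_alpha C^1."""
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    mid_pos = (t >= 0) & (t <= alpha - eps)
    mid_neg = (t < 0) & (t >= -(alpha - eps))
    tail = np.abs(t) > alpha - eps
    out[mid_pos] = ((t[mid_pos] + eps) ** 2 - eps ** 2) / (2 * alpha)
    out[mid_neg] = ((t[mid_neg] - eps) ** 2 - eps ** 2) / (2 * alpha)
    out[tail] = np.abs(t[tail]) - (alpha - eps) ** 2 / (2 * alpha)
    return out

print(modified_huber(np.linspace(-2, 2, 9)))
```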

  19. Beyond monotonicity
Pointwise quadratically supportable functions:
(i) $\varphi: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is pointwise quadratically supportable at $y$ if it is subdifferentially regular there and there exist a neighborhood $V$ of $y$ and a $\mu > 0$ such that
$$\varphi(x) \ge \varphi(y) + \langle v, x - y\rangle + \tfrac{\mu}{2}\|x - y\|^2 \qquad (\forall v\in\partial\varphi(y))\ (\forall x\in V).$$
(ii) $\varphi: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is strongly coercive at $y$ if it is subdifferentially regular on $V$ and there exist a neighborhood $V$ of $y$ and a constant $\mu > 0$ such that
$$\varphi(x) \ge \varphi(z) + \langle v, x - z\rangle + \tfrac{\mu}{2}\|x - z\|^2 \qquad (\forall v\in\partial\varphi(z))\ (\forall x, z\in V).$$
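To see what the inequality demands (my illustration): at $y = 0$ with $v = \varphi'(0) = 0$, the function $t\mapsto t^2$ admits a quadratic support with any $\mu \le 2$, while $t\mapsto t^4$ admits none, since $t^4 < \tfrac{\mu}{2}t^2$ near $0$ for every $\mu > 0$:

```python
import numpy as np

# PQS check at y = 0 (v = phi'(0) = 0): does phi(t) >= (mu/2) t^2 hold near 0?
t = np.linspace(-0.5, 0.5, 1001)
mu = 1.0
print(np.all(t**2 >= 0.5 * mu * t**2))   # True:  t^2 is PQS at 0 (any mu <= 2 works)
print(np.all(t**4 >= 0.5 * mu * t**2))   # False: t^4 is not PQS at 0 for any mu > 0
```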

  20. Strong convexity
Compare to: (pointwise) strongly convex functions
(i) $\varphi: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is pointwise strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu > 0$ such that, for all $\tau\in(0,1)$,
$$\varphi(\tau x + (1-\tau)y) \le \tau\varphi(x) + (1-\tau)\varphi(y) - \tfrac{1}{2}\mu\tau(1-\tau)\|x - y\|^2 \qquad \forall x\in V.$$
(ii) $\varphi: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu > 0$ such that, for all $\tau\in(0,1)$,
$$\varphi(\tau x + (1-\tau)z) \le \tau\varphi(x) + (1-\tau)\varphi(z) - \tfrac{1}{2}\mu\tau(1-\tau)\|x - z\|^2 \qquad \forall x, z\in V.$$

  21. Relations
◮ {str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}

  22. Relations
◮ {str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}
◮ {ptws str cvx fncts at x̄} ⊂ {ptws quadr supportable fncts at x̄}
◮ {ptws str mon fncts at x̄} ⊂ {ptws quadr supportable fncts at x̄}
◮ $f$ ptws quadratically supportable at x̄ $\not\Rightarrow$ $f$ convex (see the example below)
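A concrete witness for the last non-implication (my example, not from the deck): $f(t) = t^2 - t^4$ is pointwise quadratically supportable at $0$ (take $\mu = 3/2$ on $V = (-1/2, 1/2)$) but not convex, since $f''(t) = 2 - 12t^2 < 0$ for $|t| > 1/\sqrt{6}$:

```python
import numpy as np

# f(t) = t^2 - t^4: PQS at 0, yet nonconvex.
t = np.linspace(-0.5, 0.5, 1001)
f = t**2 - t**4
# PQS inequality at y = 0 (v = f'(0) = 0) with mu = 3/2:
print(np.all(f >= 0.75 * t**2))                       # True on V = [-1/2, 1/2]
# Nonconvexity: the second derivative goes negative away from 0.
print((2 - 12 * np.linspace(-1, 1, 201)**2).min())    # -10.0 < 0
```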

  23. Linear Convergence of PAPC
Recall PAPC (Slide 11), with parameters $\tau \in \left(0, \tfrac{1}{L_f}\right]$ and $0 < \tau\sigma \le \tfrac{1}{\|A^T A\|}$, applied to the saddle-point problem
$$\min_{x\in\mathbb{R}^n}\ \max_{y\in\mathbb{R}^m}\ K(x,y) := f(x) + \langle A^T x, y\rangle - g^*(y). \qquad (M)$$
