SLIDE 1

A GLOBALLY LINEARLY CONVERGENT METHOD FOR LARGE-SCALE POINTWISE QUADRATICALLY SUPPORTABLE CONVEX-CONCAVE SADDLE POINT PROBLEMS

Russell Luke (Timo Aspelmeier, Charitha, Ron Shefi)

Universität Göttingen

LCCC Workshop, Large-Scale and Distributed Optimization, June 14-16, 2017, Lund University

SLIDE 2

Outline

Prelude · Analysis · Applications · References

SLIDE 3

STimulated Emission Depletion (STED)

SLIDE 4

STimulated Emission Depletion (STED)

≈ 3 nm per pixel

SLIDE 5

Statistical Image Denoising/Deconvolution

$$\min_{x\in\mathbb{R}^n}\ f(x) \quad\text{subject to}\quad g_\epsilon(Ax)\le 0$$

where $f$ is convex, piecewise linear-quadratic, $A:\mathbb{R}^n\to\mathbb{R}^n$, and $g_\epsilon:\mathbb{R}^n\to\mathbb{R}^m: v\mapsto\left(g_1(v)-\epsilon_1,\, g_2(v)-\epsilon_2,\,\dots,\, g_m(v)-\epsilon_m\right)^T$ is convex and smooth.

SLIDE 6

Statistical Image Denoising/Deconvolution

$$\min_{x\in\mathbb{R}^n}\ f(x) \quad\text{subject to}\quad g_\epsilon(Ax)\le 0$$

where $f$ is convex, piecewise linear-quadratic, $A:\mathbb{R}^n\to\mathbb{R}^n$, and $g_\epsilon:\mathbb{R}^n\to\mathbb{R}^m: v\mapsto\left(g_1(v)-\epsilon_1,\, g_2(v)-\epsilon_2,\,\dots,\, g_m(v)-\epsilon_m\right)^T$ is convex and smooth.

What is the scientific content of processed images?

SLIDE 7

Goals

Solve $0\in F(x)$ for $F:\mathcal{E}\rightrightarrows\mathcal{E}$ with $\mathcal{E}$ a Euclidean space.

◮ #1. Convergence (with a posteriori error bounds) of Picard iterations: $x^{k+1}\in Tx^k$ where $\operatorname{Fix}T\approx\operatorname{zer}F$

◮ #2. Algorithms:
  ◮ (Non)convex optimization: ADMM/Douglas–Rachford
  ◮ Saddle-point problems: Proximal Alternating Predictor-Corrector (PAPC)

◮ #3. Applications:
  ◮ Image denoising/deconvolution
  ◮ Phase retrieval

SLIDE 8

Building blocks

◮ Resolvent: $(\mathrm{Id}+\lambda F)^{-1}$

◮ Prox operator: for a function $f:X\to\mathbb{R}$, define
$$\operatorname{prox}_{M,f}(x) := \operatorname*{argmin}_y \left\{ f(y)+\tfrac12\|y-x\|_M^2 \right\}$$

◮ Proximal reflector: $R_{M,f} := 2\operatorname{prox}_{M,f}-\mathrm{Id}$

◮ Projector: if $f=\iota_\Omega$ for $\Omega\subset X$ closed and nonempty, then $\operatorname{prox}_{M,f}(x)=P_\Omega x$, where
$$P_\Omega x := \left\{\bar{x}\in\Omega \,\middle|\, \|x-\bar{x}\|_M=\operatorname{dist}(x,\Omega)\right\},\qquad \operatorname{dist}(x,\Omega):=\inf_{y\in\Omega}\|x-y\|_M.$$

◮ Reflector: if $f=\iota_\Omega$ for some closed, nonempty set $\Omega\subset X$, then $R_\Omega := 2P_\Omega-\mathrm{Id}$
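Not on the slide: a minimal numerical sketch of these building blocks for the Euclidean metric ($M=\mathrm{Id}$), taking $f=\lambda\|\cdot\|_1$ for the prox and $\Omega$ a box for the projector/reflector; the functions and test values are illustrative only.

```python
import numpy as np

def prox_l1(x, lam):
    # prox of f(y) = lam*||y||_1 with M = Id: soft thresholding, the
    # unique minimizer of lam*||y||_1 + 0.5*||y - x||^2.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_box(x, lo, hi):
    # P_Omega for Omega = [lo, hi]^n; this is the prox of the indicator iota_Omega.
    return np.clip(x, lo, hi)

def reflect_box(x, lo, hi):
    # R_Omega := 2*P_Omega - Id.
    return 2.0 * project_box(x, lo, hi) - x

x = np.array([1.5, -0.2, 0.7])
print(prox_l1(x, 0.5))           # [1.  0.  0.2]
print(project_box(x, 0.0, 1.0))  # [1.  0.  0.7]
print(reflect_box(x, 0.0, 1.0))  # [0.5 0.2 0.7]
```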

SLIDE 9

Optimization

$$p^* = \min_x\left\{\, f(x)+\sum_{i=1}^{I} g_i(A_i^T x) =: f(x)+g(A^T x) \;:\; x\in\mathbb{R}^n \right\}. \tag{P}$$

Reformulations:

Augmented Lagrangian:
$$\min_{x\in\mathbb{R}^n}\ \min_{v\in\mathbb{R}^m}\ f(x)+\langle x, Ab\rangle-\langle b,v\rangle+g(v)+\tfrac12\|A^T x-v\|_M^2 \tag{L}$$

Saddle-point:
$$\min_{x\in\mathbb{R}^n}\ \max_{y\in\mathbb{R}^m}\ \left\{\, K(x,y) := f(x)+\langle A^T x,y\rangle-g^*(y) \,\right\}. \tag{M}$$
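The passage from (P) to (M) is standard convex duality, not spelled out on the slide: since $g$ is proper, lsc and convex, Fenchel–Moreau gives $g=g^{**}$, so

$$g(A^T x)=\sup_{y\in\mathbb{R}^m}\left\{\langle A^T x, y\rangle - g^*(y)\right\}
\quad\Longrightarrow\quad
\min_x\ f(x)+g(A^T x)=\min_{x\in\mathbb{R}^n}\ \max_{y\in\mathbb{R}^m}\ K(x,y).$$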

SLIDE 10

Algorithms

ADMM

Initialization: Choose $\eta>0$ and $(x^0,v^0,b^0)$.

General Step ($k=0,1,\dots$):
$$x^{k+1}\in\operatorname*{argmin}_x\left\{ f(x)+\langle b^k,Ax\rangle+\tfrac{\eta}{2}\|Ax-v^k\|^2 \right\}; \tag{1a}$$
$$v^{k+1}\in\operatorname*{argmin}_v\left\{ g(v)-\langle b^k,v\rangle+\tfrac{\eta}{2}\|Ax^{k+1}-v\|^2 \right\}; \tag{1b}$$
$$b^{k+1}=b^k+\eta\left(Ax^{k+1}-v^{k+1}\right). \tag{1c}$$

In the convex setting, the points in ADMM can be computed from the corresponding points in Douglas–Rachford, $y^{k+1}\in Ty^k$ ($k\in\mathbb{N}$), for
$$T := \tfrac12\left(R_{\eta B}R_{\eta D}+\mathrm{Id}\right) = J_{\eta B}\left(2J_{\eta D}-\mathrm{Id}\right)+\left(\mathrm{Id}-J_{\eta D}\right),$$
where $B := \partial\left(f^*\circ(-A^T)\right)$ and $D := \partial g^*$.
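Not the imaging solver of the talk: a toy, closed-form instance of steps (1a)-(1c), assuming $f(x)=\tfrac12\|x-c\|^2$ and $g=\iota_{\mathbb{R}^m_+}$, i.e. ADMM for $\min_x \tfrac12\|x-c\|^2$ subject to $Ax\ge 0$; real applications substitute the appropriate prox operators.

```python
import numpy as np

# Toy ADMM instance of (1a)-(1c): f(x) = 0.5*||x - c||^2, g the indicator
# of the nonnegative orthant, so both argmin steps have closed forms.
rng = np.random.default_rng(0)
n, m, eta = 5, 4, 1.0
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)

x, v, b = np.zeros(n), np.zeros(m), np.zeros(m)
H = np.eye(n) + eta * A.T @ A              # Hessian of the x-subproblem
for k in range(200):
    x = np.linalg.solve(H, c - A.T @ b + eta * A.T @ v)   # (1a)
    v = np.maximum(A @ x + b / eta, 0.0)                  # (1b): prox of g
    b = b + eta * (A @ x - v)                             # (1c)

print("primal residual ||Ax - v||:", np.linalg.norm(A @ x - v))
```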

SLIDE 11

Algorithms

Proximal Alternating Predictor-Corrector (PAPC) [Drori, Sabach & Teboulle, 2015]

Initialization: Let $(x^0,y^0)\in\mathbb{R}^n\times\mathbb{R}^m$, and choose the parameters $\tau$ and $\sigma$ to satisfy
$$\tau\in\left(0,\tfrac{1}{L_f}\right),\qquad 0<\tau\sigma\le\frac{1}{\|A^T A\|}.$$

Main Iteration: for $k=1,2,\dots$ update $x^k$, $y^k$ as follows:
$$p^k = x^{k-1}-\tau\left(\nabla f(x^{k-1})+Ay^{k-1}\right);$$
$$y_i^k = \operatorname{prox}_{\sigma,g_i^*}\left(y_i^{k-1}+\sigma A_i^T p^k\right),\qquad i=1,\dots,I;$$
$$x^k = x^{k-1}-\tau\left(\nabla f(x^{k-1})+Ay^k\right).$$
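A minimal runnable sketch of the iteration for a single block ($I=1$), assuming $f(x)=\tfrac12\|x-c\|^2$ (so $L_f=1$) and $g=\|\cdot\|_1$, whose conjugate $g^*$ is the indicator of the unit $\ell_\infty$-ball, making $\operatorname{prox}_{\sigma,g^*}$ a componentwise clip; the test problem is illustrative only.

```python
import numpy as np

# PAPC for min_x 0.5*||x - c||^2 + ||A^T x||_1 in saddle form (M):
# f(x) = 0.5*||x - c||^2 (L_f = 1), g = ||.||_1, prox_{sigma,g*} = clip.
rng = np.random.default_rng(1)
n, m = 6, 8
A = rng.standard_normal((n, m))        # A maps R^m -> R^n, so A^T x is in R^m
c = rng.standard_normal(n)

grad_f = lambda x: x - c
tau = 0.9                                           # tau in (0, 1/L_f)
sigma = 1.0 / (tau * np.linalg.norm(A.T @ A, 2))    # tau*sigma = 1/||A^T A||

x, y = np.zeros(n), np.zeros(m)
for k in range(500):
    p = x - tau * (grad_f(x) + A @ y)               # predictor step
    y = np.clip(y + sigma * (A.T @ p), -1.0, 1.0)   # prox_{sigma,g*}
    x = x - tau * (grad_f(x) + A @ y)               # corrector step

print("objective:", 0.5 * np.linalg.norm(x - c)**2 + np.linalg.norm(A.T @ x, 1))
```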

SLIDE 12

Outline

Prelude · Analysis · Applications · References

SLIDE 13

Key abstract properties

Almost firm nonexpansiveness

$T:\mathcal{E}\rightrightarrows\mathcal{E}$ is pointwise almost firmly nonexpansive at $y$ when
$$\|x^+-y^+\|^2 \;\le\; \tfrac{\varepsilon}{2}\|x-y\|^2 + \langle x^+-y^+,\,x-y\rangle$$
for all $x^+\in Tx$ and all $y^+\in Ty$ whenever $x\in U$.

Metric subregularity (Ioffe; Azé; Dontchev & Rockafellar)

$\Phi:\mathcal{E}\rightrightarrows\mathcal{Y}$ is metrically regular on $U\times V\subset\mathcal{E}\times\mathcal{Y}$ relative to $\Lambda\subset\mathcal{E}$ if there exists $\kappa>0$ such that
$$\operatorname{dist}\left(x,\Phi^{-1}(y)\cap\Lambda\right) \;\le\; \kappa\operatorname{dist}\left(y,\Phi(x)\right) \tag{2}$$
holds for all $x\in U\cap\Lambda$ and $y\in V$. When the set $V$ consists of a single point, $V=\{\bar{y}\}$, then $\Phi$ is said to be metrically subregular for $\bar{y}$ on $U$ relative to $\Lambda\subset\mathcal{E}$.
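A toy instance of (2), not from the slides: $\Phi(x)=|x|$ on $\mathbb{R}$ is metrically subregular for $\bar{y}=0$ on $U=\mathbb{R}$ relative to $\Lambda=\mathbb{R}$ with $\kappa=1$, since $\Phi^{-1}(0)=\{0\}$ and

$$\operatorname{dist}\left(x,\Phi^{-1}(0)\cap\Lambda\right)=|x|=\operatorname{dist}\left(0,\Phi(x)\right),$$

so (2) holds with equality.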

SLIDE 14

Abstract results

Linear convergence [L., Nguyen & Tam, 2017]

Let $g=\iota_\Omega$ for $\Omega\subset\mathbb{R}^n$ semi-algebraic and let $f:\mathbb{R}^n\to\mathbb{R}$ be linear-quadratic convex. Let $(x^k)_{k\in\mathbb{N}}$ be iterates of the Douglas–Rachford algorithm and let $\Lambda=\operatorname{aff}(x^k)$. If $T_{DR}-\mathrm{Id}$ is metrically subregular at all points $x\in\operatorname{Fix}T_{DR}\cap\Lambda\neq\emptyset$ relative to $\Lambda$, then for all $x^0$ close enough to $\operatorname{Fix}T_{DR}\cap\Lambda$, the sequence $(x^k)$ converges linearly to a point in $\operatorname{Fix}T\cap\Lambda$ with constant at most
$$c=\sqrt{1+\varepsilon-1/\kappa^2}<1,$$
where $\kappa$ is the constant of metric subregularity for $T_{DR}-\mathrm{Id}$ on some neighborhood $U$ containing the sequence, and $\varepsilon$ is the violation of almost firm nonexpansiveness on the neighborhood $U$.

SLIDE 15

Polyhedrality ⟹ metric subregularity

If $T$ is polyhedral and $\operatorname{Fix}T\cap\Lambda$ consists of isolated points, then $\mathrm{Id}-T$ is metrically subregular at $\bar{x}$ relative to $\Lambda$ for each $\bar{x}\in\operatorname{Fix}T\cap\Lambda$.

SLIDE 16

Application: ADMM/Douglas-Rachford

Linear convergence of polyhedral DR/ADMM [Aspelmeier, Charitha, L., 2016]

Let $f:U\to\mathbb{R}\cup\{+\infty\}$ and $g:V\to\mathbb{R}$ be proper, lsc, convex, piecewise linear-quadratic functions and $T$ the corresponding Douglas–Rachford fixed point mapping. Suppose that, for some affine subspace $W$, $\operatorname{Fix}T\cap W$ is an isolated point $\{\bar{y}\}$. Then the Douglas–Rachford sequence $(y^k)_{k\in\mathbb{N}}$ converges linearly to $\bar{y}$ with rate bounded above by $\sqrt{1-\kappa^{-2}}$, where $\kappa>0$ is a constant of metric subregularity of $\mathrm{Id}-T$ at $\bar{y}$ for the neighborhood $O$. Moreover, the sequence $(b^k,v^k)_{k\in\mathbb{N}}$ generated by the ADMM algorithm converges linearly to $(\bar{b},\bar{v})$, and the primal ADMM sequence $(x^k)_{k\in\mathbb{N}}$ converges to a solution of (P).

SLIDE 17

Remark

Compare to: Linear convergence with strong monotonicity [Lions & Mercier, 1979]

Let $f$ and $g$ be proper, lsc and convex. Suppose there exists a solution to
$$0\in\left(\partial\left(f^*\circ(-A^T)\right)+\partial g^*\right)(x),$$
where $A$ is an injective linear mapping. Suppose further that, on some neighborhood of $\bar{y}$, $g$ is strongly convex with constant $\mu$ and $\partial g$ is $\beta$-inverse strongly monotone for some $\beta>0$. Then any DR sequence initiated on this neighborhood converges linearly to a point in $\operatorname{Fix}T$ with rate at least
$$K=\left(1-\frac{2\eta\beta\mu^2}{(\mu+\eta)^2}\right)^{1/2}<1.$$

See also He & Yuan (2012); Boley (2013); Hesse & L. (2013); Bauschke, Bello Cruz, Nghia, Phan & Wang (2014); Bauschke & Noll (2014); Hesse, Neumann & L. (2014); Patrinos, Stella & Bemporad (2014); Giselsson (2015 ×2).

SLIDE 18

Strong monotonicity: nice when you have it...

◮ TV: $f(x) := \|\nabla x\|_1$

◮ modified Huber:
$$f_\alpha(t) = \begin{cases} \dfrac{(t+\epsilon)^2-\epsilon^2}{2\alpha} & \text{if } 0\le t\le\alpha-\epsilon,\\[2ex] \dfrac{(t-\epsilon)^2-\epsilon^2}{2\alpha} & \text{if } -\alpha+\epsilon\le t\le 0,\\[2ex] |t|+\epsilon-\dfrac{\epsilon^2+\alpha^2}{2\alpha} & \text{if } |t|>\alpha-\epsilon. \end{cases}$$
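A direct transcription of $f_\alpha$ into code (a sketch assuming $0<\epsilon<\alpha$; the parameter values below are only for a continuity check at the glue point $|t|=\alpha-\epsilon$):

```python
import numpy as np

def modified_huber(t, alpha, eps):
    # Piecewise f_alpha: quadratic near 0, linear in the tails, glued
    # continuously at |t| = alpha - eps (assumes 0 < eps < alpha).
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    core = np.abs(t) <= alpha - eps
    pos, neg = core & (t >= 0), core & (t < 0)
    out[pos] = ((t[pos] + eps) ** 2 - eps ** 2) / (2 * alpha)
    out[neg] = ((t[neg] - eps) ** 2 - eps ** 2) / (2 * alpha)
    tail = ~core
    out[tail] = np.abs(t[tail]) + eps - (eps ** 2 + alpha ** 2) / (2 * alpha)
    return out

alpha, eps = 1.0, 0.1
t0 = alpha - eps   # both sides agree: (alpha^2 - eps^2)/(2*alpha) = 0.495
print(modified_huber([t0 - 1e-9, t0 + 1e-9], alpha, eps))
```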
SLIDE 19

Beyond monotonicity

Pointwise quadratically supportable functions

(i) $\varphi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is pointwise quadratically supportable at $y$ if it is subdifferentially regular there and there exist a neighborhood $V$ of $y$ and a $\mu>0$ such that
$$(\forall v\in\partial\varphi(y))\qquad \varphi(x)\ \ge\ \varphi(y)+\langle v,x-y\rangle+\tfrac{\mu}{2}\|x-y\|^2,\quad\forall x\in V.$$

(ii) $\varphi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is strongly coercive at $y$ if it is subdifferentially regular on $V$ and there exist a neighborhood $V$ of $y$ and a constant $\mu>0$ such that
$$(\forall v\in\partial\varphi(z))\qquad \varphi(x)\ \ge\ \varphi(z)+\langle v,x-z\rangle+\tfrac{\mu}{2}\|x-z\|^2,\quad\forall x,z\in V.$$

SLIDE 20

Strong convexity

Compare to: (pointwise) strongly convex functions

(i) $\varphi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is pointwise strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu>0$ such that, $(\forall\tau\in(0,1))$
$$\varphi\left(\tau x+(1-\tau)y\right)\ \le\ \tau\varphi(x)+(1-\tau)\varphi(y)-\tfrac12\mu\tau(1-\tau)\|x-y\|^2,\quad\forall x\in V.$$

(ii) $\varphi:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu>0$ such that, $(\forall\tau\in(0,1))$
$$\varphi\left(\tau x+(1-\tau)z\right)\ \le\ \tau\varphi(x)+(1-\tau)\varphi(z)-\tfrac12\mu\tau(1-\tau)\|x-z\|^2,\quad\forall x,z\in V.$$

SLIDE 21

Relations

{str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}

SLIDE 22

Relations

{str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}

{ptws str cvx fncts at $\bar{x}$} ⊂ {ptws quadr supportable fncts at $\bar{x}$}
{ptws str mon fncts at $\bar{x}$} ⊂ {ptws quadr supportable fncts at $\bar{x}$}
$f$ ptws quadratically supportable at $\bar{x}$ $\;\not\Rightarrow\;$ $f$ convex
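An example certifying the last non-implication (not on the slide): $\varphi(x)=x^2-x^4$ is not convex on $\mathbb{R}$ ($\varphi''(x)=2-12x^2<0$ for $|x|>1/\sqrt{6}$), yet it is pointwise quadratically supportable at $\bar{x}=0$, since $\partial\varphi(0)=\{0\}$ and, with $\mu=1$,

$$\varphi(x)=x^2-x^4\ \ge\ \varphi(0)+\tfrac{\mu}{2}x^2 \qquad\text{for all } |x|\le\tfrac{1}{\sqrt{2}}.$$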

SLIDE 23

Linear Convergence of PAPC

Recall PAPC:

Initialization: Let $(x^0,y^0)\in\mathbb{R}^n\times\mathbb{R}^m$, and choose the parameters $\tau$ and $\sigma$ to satisfy
$$\tau\in\left(0,\tfrac{1}{L_f}\right),\qquad 0<\tau\sigma\le\frac{1}{\|A^T A\|}.$$

Main Iteration: for $k=1,2,\dots$ update $x^k$, $y^k$ as follows:
$$p^k = x^{k-1}-\tau\left(\nabla f(x^{k-1})+Ay^{k-1}\right);$$
$$y_i^k = \operatorname{prox}_{\sigma,g_i^*}\left(y_i^{k-1}+\sigma A_i^T p^k\right),\qquad i=1,\dots,I;$$
$$x^k = x^{k-1}-\tau\left(\nabla f(x^{k-1})+Ay^k\right).$$

Saddle-point:
$$\min_{x\in\mathbb{R}^n}\ \max_{y\in\mathbb{R}^m}\ \left\{\, K(x,y) := f(x)+\langle A^T x,y\rangle-g^*(y) \,\right\}. \tag{M}$$
SLIDE 24

Convergence to unique solutions

Q-linear convergence of PAPC: For $f$ convex, pointwise quadratically supportable at all saddle-point solutions and differentiable with Lipschitz gradient, $g$ convex and $A$ full rank, the sequence $\{(x^k,y^k)\}_{k\in\mathbb{N}}$ generated by the PAPC algorithm is Q-linearly convergent to the saddle-point solution with respect to a weighted Euclidean norm depending on $\sigma$, $\tau$ and $A$.

Uniqueness of saddle points: Under the same assumptions, the set of saddle points is a singleton.

SLIDE 25

Outline

Prelude · Analysis · Applications · References

SLIDE 26

Statistical Image Denoising/Deconvolution

$$\min_{x\in\mathbb{R}^n}\ f(x)\ \ \text{subject to}\ \ g_\epsilon(Ax)\le 0 \qquad\longrightarrow\qquad \min_{x\in\mathbb{R}^n}\ f(x)+\rho\max\{g_\epsilon(Ax)\}$$
(exact regularization)

◮ Solve with ADMM = Douglas–Rachford on the dual [Aspelmeier, Charitha, L., 2016].
◮ Solve with the Proximal Alternating Predictor-Corrector (primal-dual for the saddle-point model) [L., Shefi, 2017].

SLIDE 27

Structural assumptions

Reconstruct the estimator $\bar{x}$ of the observed signal $b$ that is a solution to the convex optimization problem:
$$\inf_{x\in X}\ f(x) \quad\text{s.t.}\quad \max_{s\in S,\,\nu\in G}\ \omega_s\left((Ax-b)_\nu\right)\ \le\ q \tag{3}$$

The following blanket assumptions on the problem's data hold throughout:

Assumptions
(i) The set of optimal solutions for problem (P), denoted $X^*$, is nonempty.
(ii) The function $f:\mathbb{R}^n\to\mathbb{R}$ is convex and continuously differentiable with Lipschitz continuous gradient $\nabla f$ (constant $L_f$), and pointwise quadratically supportable at points in $X^*$.
(iii) $g_i:\mathbb{R}^{m_i}\to(-\infty,+\infty]$, $i=1,\dots,I$, is proper, lsc, and convex.
(iv) The linear mappings $A_i:\mathbb{R}^{m_i}\to\mathbb{R}^n$, $i=1,\dots,I$, are full rank, that is, $\sigma_{\min}^2(A_i)=\lambda_{\min}(A_i^T A_i)>0$.

SLIDE 28

ADMM with exact penalization

[Figure: convergence history of ADMM with exact penalization — penalty parameter $\rho_k$, objective values $\rho_k\,\theta(F_q(v^k))$, $\alpha J(u^k)$ and their sum, successive-iterate differences, constraint residual $\|Au^{(k,i)}-v^{(k,i)}\|^2/n$, and active set size versus outer iteration $k$ and inner iteration $i$.]

(about 1 week CPU time)

SLIDE 29

ADMM with exact penalization

What you can say about the reconstruction: under the assumption that the later iterates are indeed in the region of local linear convergence, and with exact evaluation of the prox mappings, the observed convergence rate is $c=0.9997$, which yields an a posteriori upper estimate of the pixelwise error of about $8.9062\times 10^{-4}$, or 3 digits of accuracy at each pixel, for the computed solution to
$$\min_{x\in\mathbb{R}^n}\ f(x)\quad\text{subject to}\quad F_\epsilon(Ax)\le 0.$$
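The a posteriori estimate follows from the usual geometric-series bound for a sequence converging linearly with rate $c$ (assuming the iterates are already in the linear regime, so that successive differences contract by $c$):

$$\|x^k-\bar{x}\|\ \le\ \sum_{j\ge k}\|x^{j+1}-x^j\|\ \le\ \frac{c}{1-c}\,\|x^k-x^{k-1}\|,$$

so the observed rate and the last step length together give the pixelwise error bound quoted above.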

SLIDE 30

PAPC with exact constraints

SLIDE 31

PAPC with exact constraints

(about 2 hours CPU time)

SLIDE 32

PAPC with exact constraints

What you can say about the reconstruction: with an estimated convergence rate of $c=0.9993$ for the Huber objective, this corresponds to an a posteriori upper estimate of the error at iteration $k=800$ of $2.4\times 10^{-3}$. With an estimated convergence rate of $c=0.9962$ for the quadratic objective function, this corresponds to an a posteriori upper estimate of the error at iteration $k=800$ of $1.5\times 10^{-3}$, about two digits of accuracy at each pixel, for the computed solution to
$$\min_{x\in\mathbb{R}^n}\ f(x)\quad\text{subject to}\quad F_\epsilon(Ax)\le 0.$$

SLIDE 33

Blind Phase Retrieval

Ptychographic Imaging [Hegerl & Hoppe, 1970] [Institute for X-Ray Physics, Göttingen]
SLIDE 34

Blind Phase Retrieval

Mathematical Model: Let $\mathcal{F}:\mathbb{C}^n\to\mathbb{C}^n$ denote the discrete Fourier transform. Given $b_j\in\mathbb{R}^n_+$ and the linear shift operator $S_j:\mathbb{C}^n\to\mathbb{C}^n$, find $x,y\in\mathbb{C}^n$ satisfying
$$\left|\left(\mathcal{F}\left(S_j(x)\odot y\right)\right)_l\right| = b_{jl},\qquad (j=1,2,\dots,m)\ (l=1,2,\dots,n).$$
[Hesse, L., Sabach, Tam (2015)]

Typical problem sizes: $n=9.6\times 10^5$, $m=400$ $\Longrightarrow$ $3.86\times 10^8$ nonlinear equations in $3.86\times 10^6$ unknowns. Algorithms must be simple (no parameters) and must say more than the standard techniques can say.
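Not the ProxToolbox implementation: a minimal sketch of the forward model in one dimension, with circular shifts for $S_j$ and the FFT as $\mathcal{F}$; the sizes and the Gaussian probe are illustrative only.

```python
import numpy as np

def ptycho_data(x, y, shifts):
    # b_{jl} = |F(S_j(x) ⊙ y)_l|: Fourier magnitudes of the object x,
    # circularly shifted by s and multiplied pointwise by the probe y.
    return np.array([np.abs(np.fft.fft(np.roll(x, s) * y)) for s in shifts])

n = 64
rng = np.random.default_rng(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # unknown object
y = np.exp(-np.linspace(-3, 3, n) ** 2).astype(complex)    # localized probe
b = ptycho_data(x, y, shifts=range(0, n, 8))               # m = 8 patterns
print(b.shape)   # (8, 64): m*n nonlinear equations in the unknowns (x, y)
```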

SLIDE 35

ProxToolbox http://num.math.uni-goettingen.de/proxtoolbox (Python version coming soon!)

SLIDE 36

Outline

Prelude · Analysis · Applications · References

SLIDE 37

References I

◮ T. Aspelmeier, Charitha, D. R. Luke, "Local Linear Convergence of the ADMM/Douglas–Rachford Algorithms without Strong Convexity and Application to Statistical Imaging," SIAM J. Imaging Sciences (2016).

◮ D. R. Luke, H. T. Nguyen, M. K. Tam, "Quantitative Convergence Analysis of Iterated Expansive, Set-Valued Mappings," submitted (arXiv:1605.05725).

◮ D. R. Luke, R. Shefi, "A Globally Linearly Convergent Method for Pointwise Quadratically Supportable Convex-Concave Saddle Point Problems," J. Math. Anal. Appl. (2017).