

  1. Douglas-Rachford Splitting for Infeasible, Unbounded, and Pathological Problems. Yanli Liu, Ernest Ryu, Wotao Yin (UCLA Math). US-Mexico Workshop on Optimization and its Applications, Jan 8–12, 2018.

  2. Background

  3. What is “splitting”?
  • Sun-Tzu (400 BC)
  • Caesar: “divide-n-conquer” (100–44 BC)
  • Principle of computing: reduce a problem to simpler subproblems
  • Example: find x ∈ C1 ∩ C2 → project onto C1 and C2 alternately
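As a concrete instance of alternating projection, here is a minimal numpy sketch; the two sets below (a unit ball and a halfspace) are illustrative stand-ins for C1 and C2, not from the slides:

```python
import numpy as np

# Alternating projections to find a point in C1 ∩ C2, where
# C1 = unit ball and C2 = halfspace {x : a^T x >= 1} (illustrative sets).
a = np.array([1.0, 1.0])

def proj_ball(x):            # projection onto the unit Euclidean ball
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

def proj_halfspace(x):       # projection onto {x : a^T x >= 1}
    gap = 1 - a @ x
    return x if gap <= 0 else x + gap * a / (a @ a)

x = np.array([5.0, -3.0])
for _ in range(100):
    x = proj_halfspace(proj_ball(x))
print(x)  # a point (approximately) in both sets
```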

  4. Basic principles of splitting
  split:
  • x/y directions
  • linear from nonlinear
  • smooth from nonsmooth
  • spectral from spatial
  • convection from diffusion
  • composite operators
  • (I − λ(A + B))^{-1} into (I − λA)^{-1} and (I − λB)^{-1}
  Also:
  • domain decomposition
  • block-coordinate descent
  • column generation, Benders decomposition, etc.

  5. Operator splitting pipeline
  1. Formulate 0 ∈ A(x) + B(x), where A and B are operators, possibly set-valued
  2. Operator splitting: get a fixed-point operator T: z^{k+1} ← T z^k; applying T reduces to computing A and B successively
  3. Correctness and convergence:
  • a fixed point z* = T z* recovers a solution x*
  • T is contractive or, more weakly, averaged

  6. Example: constrained minimization
  • C is a convex set, f is a differentiable convex function:
    minimize_x f(x) subject to x ∈ C
  • equivalent inclusion problem: 0 ∈ N_C(x) + ∇f(x), where N_C is the normal cone of C
  • projected gradient method: x^{k+1} ← proj_C ∘ (I − γ∇f)(x^k), where the composition proj_C ∘ (I − γ∇f) is the fixed-point operator T
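A minimal numpy sketch of the projected gradient iteration above; the quadratic objective, the box constraint C, and the step size γ are illustrative choices:

```python
import numpy as np

# Projected gradient for minimize f(x) = 0.5*||x - p||^2 over the box C = [0,1]^n.
p = np.array([2.0, -0.5, 0.3])           # illustrative target
grad = lambda x: x - p                   # ∇f
proj_C = lambda x: np.clip(x, 0.0, 1.0)  # projection onto the box

gamma = 0.5                              # step size in (0, 2/L), with L = 1 here
x = np.zeros(3)
for _ in range(200):
    x = proj_C(x - gamma * grad(x))      # x^{k+1} = proj_C((I - γ∇f) x^k)
print(x)  # -> [1.0, 0.0, 0.3], the projection of p onto C
```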

  7. Convergence

  8. Contractive operator
  • definition: T is contractive if, for some L ∈ [0, 1),
    ‖Tx − Ty‖ ≤ L ‖x − y‖, ∀ x, y
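A small numpy demonstration of the definition, using an illustrative affine map whose linear part is a rotation scaled by 0.5, so the contraction factor L = 0.5 is exact:

```python
import numpy as np

# A contraction on R^2: T(x) = 0.5*Q x + b, where Q is a rotation,
# so ||Tx - Ty|| = 0.5*||x - y|| exactly (L = 0.5 < 1).
th = 0.7
Q = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
b = np.array([1.0, 2.0])
T = lambda x: 0.5 * (Q @ x) + b

xstar = np.linalg.solve(np.eye(2) - 0.5 * Q, b)  # the unique fixed point
x = np.array([10.0, -10.0])
for k in range(5):
    x = T(x)
    print(k, np.linalg.norm(x - xstar))  # error shrinks by factor 0.5 per step
```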

  9. Between L = 1 and L < 1
  • L < 1 ⇒ geometric convergence
  • L = 1 ⇒ iterates are bounded, but may diverge
  • Some algorithms have L = 1 and still converge:
    • alternating projection (von Neumann)
    • gradient descent
    • proximal-point algorithm
    • operator splitting algorithms

  10. Averaged operator
  • residual operator: R := I − T; hence Rx* = 0 ⇔ x* = Tx*
  • averaged operator: for some η > 0,
    ‖Tx − Ty‖² ≤ ‖x − y‖² − η ‖Rx − Ry‖², ∀ x, y
  • interpretation: set y to be a fixed point; then the distance to y improves by the amount of the fixed-point residual
  • property¹: if T has a fixed point, then x^{k+1} ← T x^k converges weakly to a fixed point
  ¹ Krasnosel'skii'57, Mann'56
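A small numpy demonstration of the gap between L = 1 and averagedness: a plane rotation is nonexpansive but its plain iteration never converges, while its averaged version (a Krasnosel'skii-Mann iteration) does. The angle and the averaging parameter are illustrative:

```python
import numpy as np

# A nonexpansive map with L = 1: rotation about the origin.
# Plain iteration x <- Rx just orbits the unit circle; the averaged map
# T = (1-a)I + aR converges to the fixed point 0.
th = 0.5
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
a = 0.5

x = y = np.array([1.0, 0.0])
for _ in range(500):
    x = R @ x                      # L = 1: stays on the unit circle forever
    y = (1 - a) * y + a * (R @ y)  # averaged: contracts toward the fixed point
print(np.linalg.norm(x), np.linalg.norm(y))  # ~1.0 vs ~0.0
```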

  11. Why called “averaged”?
  Lemma: for α ∈ (0, 1), T is α-averaged if, and only if, there exists a nonexpansive (1-Lipschitz) map T′ such that T = (1 − α) I + α T′.

  12. Composition of averaged operators
  Useful theorem:
  • T1, T2 nonexpansive ⇒ T1 ∘ T2 nonexpansive
  • T1, T2 averaged ⇒ T1 ∘ T2 averaged (though the averagedness constants get worse)

  13. How to get an averaged-operator composition?

  14. Forward-backward splitting
  • derive:
    0 ∈ Ax + Bx
    ⇔ x − Bx ∈ x + Ax
    ⇔ (I − B) x ∈ (I + A) x
    ⇔ x = (I + A)^{-1} (I − B) x,
    where (I − B) is the forward step, (I + A)^{-1} is the backward step, and their composition is the operator T_FBS
  • Although (I + A) may be set-valued, (I + A)^{-1} is single-valued!

  15. • forward-backward splitting (FBS) operator (Mercier'79): for γ > 0,
    T_FBS := (I + γA)^{-1} ∘ (I − γB)
  • key properties:
    • if A is maximally monotone², then (I + γA)^{-1} is 1/2-averaged
    • if B is β-cocoercive³ and γ ∈ (0, 2β), then (I − γB) is averaged
  • conclusion: T_FBS is averaged, thus if a fixed point exists, x^{k+1} ← T_FBS(x^k) converges
  ² ⟨Ax − Ay, x − y⟩ ≥ 0, ∀ x, y
  ³ ⟨Bx − By, x − y⟩ ≥ β ‖Bx − By‖², ∀ x, y
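A minimal numpy sketch of T_FBS on a lasso instance, taking A = ∂(λ‖·‖₁), whose resolvent is soft-thresholding, and B the gradient of the least-squares term, which is 1/‖M‖²-cocoercive; the data is random and illustrative:

```python
import numpy as np

# Forward-backward splitting (proximal gradient) for the lasso:
#   minimize 0.5*||Mx - y||^2 + lam*||x||_1
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
lam = 1.0

L = np.linalg.norm(M, 2) ** 2          # Lipschitz constant of B = M^T(Mx - y)
gamma = 1.0 / L                        # step size in (0, 2/L)
prox = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0)  # (I + γA)^{-1}

x = np.zeros(10)
for _ in range(2000):
    x = prox(x - gamma * M.T @ (M @ x - y), gamma * lam)  # x <- T_FBS(x)
print(x)
```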

  16. Major operator splitting schemes for 0 ∈ Ax + Bx
  • forward-backward (Mercier'79): (maximally monotone) + (cocoercive)
  • Douglas-Rachford (Lions-Mercier'79): (maximally monotone) + (maximally monotone)
  • forward-backward-forward (Tseng'00): (maximally monotone) + (Lipschitz & monotone)
  • three-operator (Davis-Yin'15): (maximally monotone) + (maximally monotone) + (cocoercive)
  • use a non-Euclidean metric (Condat-Vu'13): for (maximally monotone ∘ A), where A is a bounded linear operator

  17. DRS for optimization
    minimize_x f(x) + g(x)
  • f, g are proper closed convex, may be non-differentiable
  • DRS iteration: z^{k+1} = T_DRS(z^k), i.e.,
    x^{k+1/2} = prox_{γf}(z^k)
    x^{k+1} = prox_{γg}(2 x^{k+1/2} − z^k)
    z^{k+1} = z^k + (x^{k+1} − x^{k+1/2})
  • z^k → z* and x^k, x^{k+1/2} → x* if
    • primal and dual solutions exist, and
    • −∞ < p* = d* < ∞
  • otherwise, ‖z^k‖ → ∞
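A minimal numpy sketch of the displayed DRS iteration; f and g below (a quadratic and an ℓ1 term, both with closed-form proxes) are illustrative choices that satisfy the convergence conditions, so z^k and x^k converge:

```python
import numpy as np

# DRS for minimize f(x) + g(x) with f(x) = 0.5*||x - p||^2 and
# g(x) = lam*||x||_1 (illustrative; each has a cheap prox).
p = np.array([3.0, 0.2, -1.5])
lam, gamma = 1.0, 1.0

prox_f = lambda z: (z + gamma * p) / (1 + gamma)
prox_g = lambda z: np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0)

z = np.zeros(3)
for _ in range(200):
    x_half = prox_f(z)             # x^{k+1/2} = prox_{γf}(z^k)
    x = prox_g(2 * x_half - z)     # x^{k+1}   = prox_{γg}(2 x^{k+1/2} - z^k)
    z = z + (x - x_half)           # z^{k+1}   = z^k + (x^{k+1} - x^{k+1/2})
print(x)  # soft-thresholded p: [2.0, 0.0, -0.5]
```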

  18. New results

  19. Overview
  • pathological conic programs, even small ones, can cripple existing solvers
  • proposed: use DRS
    • to identify infeasible, unbounded, and pathological problems
    • to compute “certificates” when they exist
    • to “restore feasibility”
  • under the hood: understanding divergent DRS iterates

  20. Linear programming
  • standard form:
    p⋆ = min c^T x subject to Ax = b (x ∈ L), x ≥ 0 (x ∈ R_+)
  • every LP is in exactly one of the 3 cases:
    1) p⋆ finite ⇔ ∃ primal solution ⇔ ∃ primal-dual solution pair
    2) p⋆ = −∞: problem is feasible, unbounded ⇔ ∃ improving direction⁴
    3) p⋆ = +∞: problem is infeasible ⇔ dist(L, R_+) > 0 ⇔ ∃ strict separating hyperplane⁵
  • cases 2) and 3) arise, e.g., during branch-and-bound
  • existing solvers are reliable
  ⁴ u is an improving direction if c^T u < 0 and x + αu is feasible for all feasible x and all α > 0.
  ⁵ {x : h^T x = β} strictly separates two sets L and K if h^T x < β < h^T y for all x ∈ L, y ∈ K.

  21. Conic programming
  • standard form: K is a closed convex cone
    p⋆ = min c^T x subject to Ax = b (x ∈ L), x ∈ K
  • every problem is in one of the 7 cases:
    1) p⋆ finite: 1a) has a primal-dual solution pair, 1b) has a primal solution only, 1c) no primal solution
    2) p⋆ = −∞: 2a) has an improving direction, 2b) no improving direction
    3) p⋆ = +∞: 3a) dist(L, K) > 0 ⇔ has a strict separating hyperplane; 3b) dist(L, K) = 0 ⇔ no strict separating hyperplane
  • all “b” and “c” cases are pathological
  • even nearly pathological problems can fail existing solvers

  22. Example 1
  • 3-variable problem:
    minimize x1 subject to x2 = 1, 2 x2 x3 ≥ x1², x2, x3 ≥ 0
    (the last two constraints form a rotated second-order cone)
  • belongs to case 2b):
    • feasible
    • p⋆ = −∞, by letting x3 → ∞ and x1 → −∞
    • no improving direction⁶
  • existing solvers⁷:
    • SDPT3: “Failed”, p⋆ not reported
    • SeDuMi: “Inaccurate/Solved”, p⋆ = −175514
    • Mosek: “Inaccurate/Unbounded”, p⋆ = −∞
  ⁶ reason: any improving direction u has the form (u1, 0, u3), but the cone constraint gives 2 u2 u3 = 0 ≥ u1², so u1 = 0, which implies c^T u = 0 (not improving).
  ⁷ using their default settings
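A short numpy check of the slide's argument: the points x(t) = (−t, 1, t²/2) below are feasible for every t, yet the objective x1 = −t decreases without bound (the parameter values are illustrative):

```python
import numpy as np

# Tracing Example 1 numerically: x(t) = (-t, 1, t**2/2) satisfies
# x2 = 1, 2*x2*x3 = t**2 >= x1**2, and x2, x3 >= 0, while the
# objective x1 = -t -> -infinity. Feasible and unbounded,
# but with no improving direction.
for t in [1e1, 1e3, 1e5]:
    x = np.array([-t, 1.0, t**2 / 2])
    assert x[1] == 1 and 2 * x[1] * x[2] >= x[0]**2 and x[1] >= 0 and x[2] >= 0
    print("objective c^T x =", x[0])
```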

  23. Example 2
  • 3-variable problem:
    minimize 0 subject to [0 1 1; 1 0 0] x = [0; 1] (x ∈ L), x3 ≥ √(x1² + x2²) (x ∈ K)
  • belongs to case 3b):
    • infeasible⁸
    • dist(L, K) = 0⁹
    • no strict separating hyperplane
  • existing solvers¹⁰:
    • SDPT3: “Infeasible”, p⋆ = ∞
    • SeDuMi: “Solved”, p⋆ = 0
    • Mosek: “Failed”, p⋆ not reported
  ⁸ x ∈ L implies x = [1, −α, α]^T for some α ∈ R, which always violates the second-order cone constraint.
  ⁹ dist(L, K) ≤ ‖[1, −α, α] − [1, −α, (α² + 1)^{1/2}]‖ → 0 as α → ∞.
  ¹⁰ using their default settings
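A short numpy check of footnotes 8 and 9: the points below trace L and the boundary of K, and the computed gap √(α² + 1) − α shrinks to 0 even though the sets never meet (the values of α are illustrative):

```python
import numpy as np

# Tracing Example 2 numerically: x(a) = (1, -a, a) runs through L, and
# y(a) = (1, -a, sqrt(a**2 + 1)) lies on the second-order cone K, so
# dist(L, K) <= ||x(a) - y(a)|| = sqrt(a**2 + 1) - a -> 0, while L ∩ K = ∅.
for a in [1e0, 1e2, 1e4]:
    x = np.array([1.0, -a, a])                 # in L
    y = np.array([1.0, -a, np.hypot(1.0, a)])  # on the boundary of K
    print(a, np.linalg.norm(x - y))            # shrinks toward 0
```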

  24. Conic DRS
    minimize c^T x subject to Ax = b, x ∈ K
    ⇔ minimize f(x) + g(x), where f(x) = c^T x + δ_{Ax=b}(x) and g(x) = δ_K(x)
  • cone K is nonempty closed convex¹¹, matrix A has full row rank
  • each iteration: projection onto {x : Ax = b}, then projection onto K
  • per-iteration cost: O(n² + cost(proj_K)) with prefactorized AA^T
  • prior work: Wen-Goldfarb-Yin'09 for SDP
  • we know: if not case 1a), DRS diverges; but how?
  ¹¹ not necessarily self-dual
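A sketch of one possible implementation of this conic DRS in Python, specialized to K = R^n_+ so that proj_K is a clip; the random data, the Cholesky prefactorization of AA^T via scipy, and the construction of c to guarantee case 1a) are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Conic DRS with f(x) = c^T x + δ_{Ax=b}(x) and g(x) = δ_K(x), so
# prox_{γf}(z) = proj_{Ax=b}(z - γc) and prox_{γg} = proj_K.
rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))        # full row rank (illustrative data)
b = A @ rng.random(n)                  # b chosen so the problem is feasible
c = A.T @ rng.standard_normal(m) + rng.random(n)  # dual-feasible c => case 1a)
gamma = 1.0

AAt = cho_factor(A @ A.T)              # factorize once; O(n^2) per iteration after

def proj_affine(w):                    # projection onto {x : Ax = b}
    return w - A.T @ cho_solve(AAt, A @ w - b)

proj_K = lambda w: np.maximum(w, 0)    # projection onto the nonnegative orthant

z = np.zeros(n)
for _ in range(5000):
    x_half = proj_affine(z - gamma * c)   # prox_{γf}(z^k)
    x = proj_K(2 * x_half - z)            # prox_{γg}(2 x^{k+1/2} - z^k)
    z = z + (x - x_half)
print(x)  # approaches an LP solution since case 1a) holds here
```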

  25. What happens during divergence?
  • iteration: z^{k+1} = T(z^k), where T is averaged
  • general theorem¹²: z^k − z^{k+1} → v = Proj_{ran(I−T)}(0)
  • v is “the best approximation to a fixed point of T”
  ¹² Pazy'71, Baillon-Bruck-Reich'78
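A numerical illustration of the theorem, assuming an infeasible feasibility problem (f and g are indicators of two disjoint illustrative sets): the iterates z^k run off to infinity, yet the differences z^k − z^{k+1} settle to a vector v whose norm equals dist(C1, C2):

```python
import numpy as np

# DRS on an infeasible feasibility problem (f = δ_{C1}, g = δ_{C2}) with
# C1 = unit ball and C2 = {x : x1 >= 2}, which are disjoint. The iterates
# z^k diverge, but z^k - z^{k+1} approaches a fixed vector v with
# ||v|| = dist(C1, C2) = 1.
def proj_C1(w):                        # projection onto the unit ball
    n = np.linalg.norm(w)
    return w if n <= 1 else w / n

def proj_C2(w):                        # projection onto {x : x1 >= 2}
    return np.array([max(w[0], 2.0), *w[1:]])

z = np.array([0.3, 5.0])
for k in range(3000):
    x_half = proj_C1(z)
    x = proj_C2(2 * x_half - z)
    z_new = z + (x - x_half)
    diff = z - z_new                   # z^k - z^{k+1}
    z = z_new
print(np.linalg.norm(z), diff)  # ||z^k|| grows; z^k - z^{k+1} -> v = (-1, 0)
```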

  26. Our results (Liu-Ryu-Yin'17)
  • proof simplification
  • new rate of convergence: ‖z^k − z^{k+1}‖ ≤ ‖v‖ + ǫ + O(1/√(k+1))
  • for conic programs, a workflow using three simultaneous DRS instances:
    1) the original DRS
    2) the same DRS with c = 0
    3) the same DRS with b = 0
  • most pathological cases are identified
  • for unbounded problems (case 2a), compute an improving direction
  • for infeasible problems (case 3a), compute a strict separating hyperplane
  • for all infeasible problems, minimally alter b to restore strong feasibility

  27. Decision flow (figure)
