

  1. Semi-smooth Newton Type Methods for Composite Convex Programs
     Zaiwen Wen
     Beijing International Center for Mathematical Research, Peking University
     wenzw@pku.edu.cn

  2. Outline
     1 Composite convex programs
     2 Semi-smoothness of proximal mapping
     3 Semi-smooth Newton methods based on the primal: approach, numerical results
     4 Semi-smooth Newton method based on the dual (SDPNAL)

  3. Composite convex program
     Consider the following composite convex program
         min_{x ∈ R^n}  f(x) + h(x),
     where f and h are convex and h is differentiable, but f may not be.
     Many applications:
     - Sparse and low-rank optimization: f(x) = ‖x‖₁ or ‖X‖_* and many other forms.
     - Regularized risk minimization: h(x) = Σᵢ hᵢ(x) is a loss function of some misfit and f is a regularization term.
     - Constrained programs: f is the indicator function of a convex set.

  4. A General Recipe
     Goal: study approaches that bridge the gap between first-order and second-order type methods for composite convex programs.
     Key observations:
     - Many popular first-order methods are equivalent to fixed-point iterations x^{k+1} = T(x^k).
       Advantages: easy to implement; converge quickly to a solution of moderate accuracy. Disadvantage: slow tail convergence.
     - The original problem is equivalent to the system F(x) := (I − T)(x) = 0.
     - Newton-type methods apply, since F(x) is semi-smooth in many cases.
     - The computational cost can be controlled reasonably well.

  5. An SDP from Electronic Structure Calculation
     System: BeO.
     [Figure: error versus iteration count for (a) ADMM, CPU: 2003s, and (b) semi-smooth Newton, CPU: 635s.]

  6. Operator splitting and fixed-point algorithms
     Examples:
     - forward-backward splitting (FBS)
     - Douglas-Rachford splitting (DRS)
     - Peaceman-Rachford splitting (PRS)
     - alternating direction method of multipliers (ADMM)
     Advantages: easy to implement; converge quickly to a solution of moderate accuracy.
     Disadvantage: slow tail convergence.

  7. Forward-backward splitting (FBS)
     Consider min_{x ∈ R^n} f(x) + h(x). The proximal mapping of f is defined by
         prox_{tf}(x) := argmin_{u ∈ R^n} { f(u) + (1/(2t)) ‖u − x‖₂² }.
     The proximal gradient method, or FBS, is the iteration
         x^{k+1} = prox_{tf}(x^k − t∇h(x^k)),  k = 0, 1, ...,
     which is equivalent to the fixed-point iteration x^{k+1} = T_FBS(x^k), where
         T_FBS := prox_{tf} ∘ (I − t∇h).
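The slides contain no code, so here is a minimal sketch of FBS, specialized to the hypothetical LASSO instance f(x) = μ‖x‖₁ and h(x) = (1/2)‖Ax − b‖₂², for which prox_{tf} is componentwise soft-thresholding; all names and parameter choices are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal mapping of tau * ||x||_1: componentwise shrinkage.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fbs_lasso(A, b, mu, t, iters=500):
    # FBS: x^{k+1} = prox_{tf}(x^k - t * grad h(x^k)) with
    # f(x) = mu * ||x||_1 and h(x) = 0.5 * ||Ax - b||^2.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad_h = A.T @ (A @ x - b)                 # gradient of the smooth part
        x = soft_threshold(x - t * grad_h, t * mu)
    return x
```

A standard step-size choice is 0 < t ≤ 1/L with L = λ_max(AᵀA), which also keeps F_FBS = I − T_FBS monotone by the conditions listed on slide 16.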

  8. Douglas-Rachford splitting (DRS)
     DRS is the following update:
         x^{k+1} = prox_{th}(z^k),
         y^{k+1} = prox_{tf}(2x^{k+1} − z^k),
         z^{k+1} = z^k + y^{k+1} − x^{k+1}.
     It is equivalent to the fixed-point iteration z^{k+1} = T_DRS(z^k), where
         T_DRS := I + prox_{tf} ∘ (2 prox_{th} − I) − prox_{th}.
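A minimal sketch of one DRS pass, assuming the two proximal mappings are supplied as callables (hypothetical names):

```python
def drs_step(z, prox_tf, prox_th):
    # One Douglas-Rachford iteration z^{k+1} = T_DRS(z^k).
    x = prox_th(z)            # x^{k+1} = prox_{th}(z^k)
    y = prox_tf(2 * x - z)    # y^{k+1} = prox_{tf}(2 x^{k+1} - z^k)
    return z + y - x          # z^{k+1} = z^k + y^{k+1} - x^{k+1}
```

Note that the fixed-point residual is F_DRS(z) = z − T_DRS(z) = x − y, which is exactly the quantity the semi-smooth Newton method below drives to zero.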

  9. Alternating direction method of multipliers (ADMM)
     Consider the linearly constrained program
         min_{x₁ ∈ R^{n₁}, x₂ ∈ R^{n₂}}  f₁(x₁) + f₂(x₂)   s.t.  A₁x₁ + A₂x₂ = b.
     The dual problem is
         min_{w ∈ R^m}  d₁(w) + d₂(w),
     where d₁(w) := f₁*(A₁ᵀw) and d₂(w) := f₂*(A₂ᵀw) − bᵀw.
     ADMM applied to the primal is equivalent to DRS applied to the dual.
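For concreteness, a sketch of scaled-form ADMM for this problem; the two block minimizers are assumed to be supplied by the user (hypothetical names):

```python
import numpy as np

def admm(argmin_x1, argmin_x2, A1, A2, b, rho=1.0, iters=300):
    # Scaled-form ADMM for min f1(x1) + f2(x2) s.t. A1 x1 + A2 x2 = b.
    # argmin_x1(v) returns argmin_{x1} f1(x1) + (rho/2)*||A1 x1 + v||^2,
    # and argmin_x2(v) the analogous minimizer over x2.
    x1 = np.zeros(A1.shape[1])
    x2 = np.zeros(A2.shape[1])
    u = np.zeros(b.shape[0])                 # scaled multiplier w/rho
    for _ in range(iters):
        x1 = argmin_x1(A2 @ x2 - b + u)
        x2 = argmin_x2(A1 @ x1 - b + u)
        u = u + A1 @ x1 + A2 @ x2 - b        # multiplier update
    return x1, x2, rho * u
```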

  10. Outline
      1 Composite convex programs
      2 Semi-smoothness of proximal mapping
      3 Semi-smooth Newton methods based on the primal: approach, numerical results
      4 Semi-smooth Newton method based on the dual (SDPNAL)

  11. Semi-smooth Newton-type method
      We solve the system F(z) = 0, where F(z) = z − T(z) and T(z) is a fixed-point mapping.
      - Fixed-point algorithms suffer from slow tail convergence and may not be suitable when high accuracy is required.
      - F(z) fails to be differentiable in many interesting applications,
      - but F(z) is (strongly) semi-smooth and monotone.
      - This motivates a semi-smooth Newton type method.

  12. Semi-smoothness
      Let F: O → R^m be locally Lipschitz continuous. The B-subdifferential of F at x is defined by
          ∂_B F(x) := { lim_{k→∞} F′(x^k) | x^k ∈ D_F, x^k → x },
      where D_F is the set of points at which F is differentiable. The set ∂F(x) = co(∂_B F(x)) is called Clarke's generalized Jacobian.
      We say that F is semi-smooth at x ∈ O if
      - F is directionally differentiable at x, and
      - for any d ∈ O and J ∈ ∂F(x + d),
            ‖F(x + d) − F(x) − Jd‖ = o(‖d‖) as d → 0.
      F is said to be strongly semi-smooth at x ∈ O if F is semi-smooth and, for any d ∈ O and J ∈ ∂F(x + d),
            ‖F(x + d) − F(x) − Jd‖ = O(‖d‖²) as d → 0.
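As a one-dimensional check (not on the slide): take F(x) = |x|. Then F is differentiable away from 0 with F′(x) = sign(x), so ∂_B F(0) = {−1, +1} and ∂F(0) = [−1, 1]. For any d ≠ 0 and J = sign(d) ∈ ∂F(0 + d), we get F(0 + d) − F(0) − Jd = |d| − sign(d)d = 0, so |x| is strongly semi-smooth at 0, as any piecewise affine function is.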

  13. Semi-smoothness
      (Strong) semi-smoothness is closed under scalar multiplication, summation and composition.
      A vector-valued function is (strongly) semi-smooth if and only if each of its component functions is (strongly) semi-smooth.
      Examples of semi-smooth functions:
      - smooth functions
      - all convex functions (thus norms)
      - piecewise differentiable functions
      Examples of strongly semi-smooth functions:
      - differentiable functions with Lipschitz gradients
      - the norm ‖·‖_p, for every p ∈ [1, ∞]
      - piecewise affine functions

  14. Semi-smoothness of proximal mappings
      Many commonly seen proximal mappings are semi-smooth. Examples:
      - The proximal mapping of the ℓ₁-norm ‖x‖₁ (or the ℓ∞-norm ‖x‖∞) is strongly semi-smooth.
      - The projection¹ onto a polyhedral set is piecewise linear and hence strongly semi-smooth.
      - The projections onto symmetric cones have been proved to be strongly semi-smooth.
      - In many applications, the proximal mapping can be shown to be piecewise C¹ and hence semi-smooth.
      ¹ The proximal mapping of the indicator function of a closed set is the metric projection onto this set.
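A sketch of this for the ℓ₁ case: the proximal mapping of τ‖x‖₁ is soft-thresholding, and one element of its B-subdifferential is a diagonal 0/1 matrix (names are illustrative):

```python
import numpy as np

def prox_l1(x, tau):
    # Proximal mapping of tau * ||x||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def jacobian_prox_l1(x, tau):
    # One element of the B-subdifferential of prox_l1 at x: a diagonal
    # matrix with entry 1 where |x_i| > tau and 0 where |x_i| < tau.
    # At the kinks |x_i| = tau both 0 and 1 arise as limits of
    # derivatives; the strict inequality below picks 0.
    return np.diag((np.abs(x) > tau).astype(float))
```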

  15. Some concepts on monotonicity
      A mapping F: R^n → R^n is said to be monotone if
          ⟨x − y, F(x) − F(y)⟩ ≥ 0  for all x, y ∈ R^n.
      A mapping F: R^n → R^n is called strongly monotone with modulus c > 0 if
          ⟨x − y, F(x) − F(y)⟩ ≥ c‖x − y‖₂²  for all x, y ∈ R^n.
      F is said to be cocoercive with modulus β > 0 if
          ⟨x − y, F(x) − F(y)⟩ ≥ β‖F(x) − F(y)‖₂²  for all x, y ∈ R^n.

  16. Monotone mappings
      Monotonicity properties of F_FBS = I − T_FBS and F_DRS = I − T_DRS:
      (i) Suppose that ∇h is cocoercive with modulus β > 0; then F_FBS is monotone if 0 < t ≤ 2β.
      (ii) Suppose that ∇h is strongly monotone with modulus c > 0 and Lipschitz with constant L > 0; then F_FBS is strongly monotone if 0 < t < 2c/L².
      (iii) Suppose that h ∈ C², H(x) := ∇²h(x) is positive semidefinite for any x ∈ R^n, and λ̄ = max_x λ_max(H(x)) < ∞. Then F_FBS is monotone if 0 < t ≤ 2/λ̄.
      (iv) The fixed-point mapping F_DRS is monotone.
      (v) For a monotone and Lipschitz continuous mapping F: R^n → R^n and any x ∈ R^n, each element of ∂_B F(x) is positive semidefinite.

  17. Outline
      1 Composite convex programs
      2 Semi-smoothness of proximal mapping
      3 Semi-smooth Newton methods based on the primal: approach, numerical results
      4 Semi-smooth Newton method based on the dual (SDPNAL)

  18. Semi-smooth Newton system
      Take J_k ∈ ∂_B F(z^k), which is positive semidefinite. The regularized Newton's method solves
          (J_k + μ_k I) d = −F_k,
      where F_k = F(z^k), μ_k = λ_k‖F_k‖ and λ_k > 0 is a regularization parameter.
      Solve the linear system inexactly: with residual
          r_k := (J_k + μ_k I) d^k + F_k,
      seek a step d^k that solves the system approximately such that
          ‖r_k‖ ≤ τ min{1, λ_k‖F_k‖ · ‖d^k‖},
      where 0 < τ < 1 is some constant.
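A minimal sketch of the inexact regularized Newton step, assuming a user-supplied matrix-vector product J_mv(v) = J_k v (hypothetical name); capping the CG iterations stands in for the residual test above:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_step(F_k, J_mv, lam, cg_iters=50):
    # Inexactly solve (J_k + mu_k I) d = -F_k with mu_k = lam * ||F_k||.
    # J_k is positive semidefinite, so the shifted system is positive
    # definite and conjugate gradients applies.
    n = F_k.shape[0]
    mu = lam * np.linalg.norm(F_k)
    A = LinearOperator((n, n), matvec=lambda v: J_mv(v) + mu * v)
    d, _ = cg(A, -F_k, maxiter=cg_iters)  # truncated CG = inexact solve
    return d
```

A practical implementation would monitor r_k = (J_k + μ_k I)d^k + F_k and stop CG once ‖r_k‖ ≤ τ min{1, λ_k‖F_k‖·‖d^k‖}.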

  19. Semi-smooth Newton method
      Select 0 < ν < 1, 0 < η₁ ≤ η₂ < 1, 1 < γ₁ ≤ γ₂ and λ > 0.
      Compute a trial point u^k = z^k + d^k and define the ratio
          ρ_k = −⟨F(u^k), d^k⟩ / ‖d^k‖²_F.
      Update the point:
          z^{k+1} = u^k, if ‖F(u^k)‖_F ≤ ν max_{max(1, k−ζ+1) ≤ j ≤ k} ‖F(z^j)‖_F,  [Newton step]
          z^{k+1} = z^k, otherwise.  [failed step]
      Update the regularization parameter:
          λ_{k+1} ∈ (λ, λ_k),          if ρ_k ≥ η₂,
          λ_{k+1} ∈ [λ_k, γ₁λ_k],      if η₁ ≤ ρ_k < η₂,
          λ_{k+1} ∈ (γ₁λ_k, γ₂λ_k],    otherwise.
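A sketch of this acceptance/regularization logic; the parameter values and the window handling are illustrative choices, not taken from the slides:

```python
import numpy as np

def ssn_update(z, u, d, F_u, res_hist, lam,
               nu=0.9, eta1=0.25, eta2=0.75,
               gamma1=2.0, gamma2=4.0, lam_min=1e-8):
    # One acceptance and lambda update of the semi-smooth Newton method.
    # res_hist holds the last zeta residual norms ||F(z^j)||_F.
    rho = -np.dot(F_u, d) / np.dot(d, d)
    z_next = u if np.linalg.norm(F_u) <= nu * max(res_hist) else z
    if rho >= eta2:                       # very successful: decrease lambda
        lam_next = max(lam_min, 0.5 * lam)
    elif rho >= eta1:                     # successful: keep or mildly grow
        lam_next = gamma1 * lam
    else:                                 # unsuccessful: grow lambda
        lam_next = gamma2 * lam
    return z_next, lam_next
```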

  20. Ensuring global convergence I
      If the residual F is not reduced sufficiently, or certain other conditions are not met, switch to a first-order method. Note that the fixed-point iteration z^{k+1} = T(z^k) = z^k − F(z^k) is itself a first-order method.
      Can we construct a better point from the Newton step?
      References:
      - X. Xiao, Y. Li, Z. Wen, L. Zhang, A Regularized Semi-Smooth Newton Method with Projection Steps for Composite Convex Programs, Journal of Scientific Computing, 2018, Vol. 76, No. 1, pp. 364-389.
      - Y. Li, Z. Wen, C. Yang, Y. Yuan, A Semi-smooth Newton Method for Semidefinite Programs and its Applications in Electronic Structure Calculations, SIAM Journal on Scientific Computing, 2018, Vol. 40, No. 6, A4131-A4157.

  21. Ensuring global convergence II: projection step
      If d^k = 0, then z^k is an optimal solution. Otherwise, consider the trial point u^k = z^k + d^k. If d^k is small enough, then
          ⟨F(u^k), z^k − u^k⟩ = −⟨F(u^k), d^k⟩ > 0.
      By monotonicity of F, for any optimal solution z*,
          ⟨F(u^k), z* − u^k⟩ ≤ 0.
      Therefore the hyperplane
          H_k := { z ∈ R^n | ⟨F(u^k), z − u^k⟩ = 0 }
      strictly separates z^k from the solution set Z*.

  22. Ensuring global convergence II: projection step
      Define the ratio
          ρ_k = −⟨F(u^k), d^k⟩ / ‖d^k‖².
      If ρ_k is large enough, set
          z^{k+1} = z^k − (⟨F(u^k), z^k − u^k⟩ / ‖F(u^k)‖²) F(u^k),
      which is the projection of z^k onto the hyperplane H_k.
      If ρ_k is too small, set z^{k+1} = z^k and increase the regularization parameter.
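The projection itself is one line; a minimal sketch:

```python
import numpy as np

def projection_step(z, u, F_u):
    # Project z^k onto the hyperplane H_k = { z : <F(u^k), z - u^k> = 0 },
    # which strictly separates z^k from the solution set.
    return z - (np.dot(F_u, z - u) / np.dot(F_u, F_u)) * F_u
```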
