Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence. Yi Xu (1), Qi Qi (1), Qihang Lin (1), Rong Jin (2), Tianbao Yang (1). (1) The University of Iowa; (2) Damo Academy at Alibaba. June 12, 2019.


  1. Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence

     Yi Xu (1), Qi Qi (1), Qihang Lin (1), Rong Jin (2), Tianbao Yang (1)

     (1) The University of Iowa; (2) Damo Academy at Alibaba

     June 12, 2019, ICML, Long Beach, CA

     Slide footer: Yi Xu (CS@UI), SSDC, June 12, 2019

  2. Non-Convex and Non-smooth Optimization

     A family of non-convex non-smooth optimization problems:

     $$\min_{x \in \mathbb{R}^d} F(x) := g(x) - h(x) + r(x), \tag{1}$$

     - $g(\cdot)$, $h(\cdot)$: real-valued lower-semicontinuous convex
     - $r(\cdot)$: proper lower-semicontinuous

     with $g(x) = \mathbb{E}_{\xi}[g(x;\xi)]$, $h(x) = \mathbb{E}_{\varsigma}[h(x;\varsigma)]$.

     - Finite-sum (a special case): $g(x) = \frac{1}{n_1}\sum_{i=1}^{n_1} g_i(x)$, $h(x) = \frac{1}{n_2}\sum_{j=1}^{n_2} h_j(x)$.

     It covers many applications:

     - Non-Convex Sparsity-Promoting Regularizers: LSP, MCP, SCAD, capped $\ell_1$, transformed $\ell_1$
     - Weakly convex
     - Least-squares Regression with $\ell_{1-2}$ Regularization
     - Positive-Unlabeled (PU) Learning
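To make problem (1) concrete, the sketch below writes least-squares regression with $\ell_{1-2}$ regularization in the form $g - h + r$. The slide only names this application; the particular split used here ($g$ the averaged squared loss, $r = \lambda\|x\|_1$, $h = \lambda\|x\|_2$), the helper name `make_l1_minus_l2_problem`, and the random data are assumptions made for illustration.

```python
import numpy as np

def make_l1_minus_l2_problem(A, b, lam):
    """Return g, h, r and F = g - h + r (an assumed split, see lead-in)."""
    def g(x):
        # Finite-sum convex smooth loss: (1 / 2n) * sum_i (a_i^T x - b_i)^2
        return 0.5 * np.mean((A @ x - b) ** 2)

    def h(x):
        # Convex term that is subtracted: lam * ||x||_2
        return lam * np.linalg.norm(x, 2)

    def r(x):
        # Convex non-smooth regularizer: lam * ||x||_1
        return lam * np.linalg.norm(x, 1)

    def F(x):
        return g(x) - h(x) + r(x)

    return g, h, r, F

# Hypothetical synthetic data, only to exercise the functions.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
g, h, r, F = make_l1_minus_l2_problem(A, b, lam=0.1)

x = rng.standard_normal(10)
penalty = r(x) - h(x)  # lam * (||x||_1 - ||x||_2), never negative

e = np.zeros(10)
e[3] = 2.0
one_sparse_penalty = r(e) - h(e)  # vanishes for a 1-sparse vector
```

The penalty $\lambda(\|x\|_1 - \|x\|_2)$ is non-convex as a whole, yet each piece is convex, which is what lets the problem fit the DC template.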

  3. Main Goal

     Critical Point: a point $\bar{x}$ s.t.

     $$\partial h(\bar{x}) \cap \hat{\partial}(g + r)(\bar{x}) \neq \emptyset.$$

     - $\hat{\partial} f(x)$: Fréchet subgradient; $\partial f(x)$: limiting subgradient

     An $\epsilon$-Critical Point: a point $\bar{x}$ s.t.

     $$\operatorname{dist}(\partial h(\bar{x}), \hat{\partial}(g + r)(\bar{x})) \le \epsilon.$$

     - If $g + r$ is non-differentiable, finding an $\epsilon$-critical point is challenging.
     - An example: $g = |x|$, $h = r = 0$, then $\operatorname{dist}(0, \partial |x|) = 1$ when $x \neq 0$.

     Goal: finding a Nearly $\epsilon$-Critical Point $x$: if there exists $\bar{x}$ such that

     $$\|x - \bar{x}\| \le O(\epsilon), \quad \operatorname{dist}(\partial h(\bar{x}), \hat{\partial}(g + r)(\bar{x})) \le \epsilon. \tag{2}$$
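The slide's example can be worked through to show why the "nearly" relaxation helps. The specific point $x = \epsilon/2$ with $0 < \epsilon < 1$ is chosen here purely for illustration; it is not taken from the slides.

```latex
\[
g(x) = |x|,\quad h = r = 0:\qquad
\partial |x| = \{\operatorname{sign}(x)\}\ \ (x \neq 0),\qquad
\partial |0| = [-1, 1].
\]
\[
x = \epsilon/2:\qquad
\operatorname{dist}\bigl(0, \partial |x|\bigr) = 1 > \epsilon
\quad\Longrightarrow\quad x \text{ is not } \epsilon\text{-critical}.
\]
\[
\bar{x} = 0:\qquad
\|x - \bar{x}\| = \epsilon/2 \le O(\epsilon),\qquad
\operatorname{dist}\bigl(\partial h(\bar{x}), \hat{\partial}(g + r)(\bar{x})\bigr)
= \operatorname{dist}\bigl(\{0\}, [-1, 1]\bigr) = 0 \le \epsilon .
\]
```

So $x = \epsilon/2$ satisfies condition (2) with $\bar{x} = 0$ even though it fails the $\epsilon$-critical test however close it is to the minimizer.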

  7. Stagewise Stochastic DC algorithm (SSDC-$\mathcal{A}$)

     When $r(x)$ is convex, assume that the proximal mapping of $r(x)$ can be easily computed:

     $$\operatorname{prox}_{\eta r}(y) = \arg\min_{x \in \mathbb{R}^d} \frac{1}{2\eta}\|x - y\|^2 + r(x).$$

     Basic idea: solve a convex majorant function stage by stage.

     Stagewise Stochastic DC (SSDC) Algorithm [1, 2, 3]

     1: for $k = 1, \dots, K$ do

     2: $\quad F^{\gamma}_{x_k}(x) = g(x) + r(x) - \bigl(h(x_k) + \partial h(x_k)^{\top}(x - x_k)\bigr) + \frac{\gamma}{2}\|x - x_k\|^2$

     3: $\quad x_{k+1} = \mathcal{A}(F^{\gamma}_{x_k})$

     4: end for

     - $\mathcal{A}$: a stochastic algorithm (e.g., SPG, AdaGrad, SVRG) applied to $F^{\gamma}_{x_k}(x)$
     - Find $x_{k+1}$ s.t. $\mathbb{E}\bigl[F^{\gamma}_{x_k}(x_{k+1}) - \min_{x \in \mathbb{R}^d} F^{\gamma}_{x_k}(x)\bigr] \le c_k$.

     [1] Dinh, T. P., Souad, E. B. North-Holland Mathematics Studies, pp. 249-271, 1986.

     [2] Thi, H. A. L., Le, H. M., Phan, D. N., and Tran, B. In ICML, pp. 3394-3403, 2017.

     [3] Wen, B., Chen, X., and Pong, T. K. Computational Optimization and Applications, 69(2):297-324, 2018.
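The stagewise loop can be sketched as follows, using stochastic proximal gradient (SPG) as the inner solver $\mathcal{A}$ on an $\ell_{1-2}$-regularized least-squares instance. This is a minimal illustration, not the authors' implementation: the problem split ($g$ the averaged squared loss, $r = \lambda\|x\|_1$, $h = \lambda\|x\|_2$), the constant step size `eta`, the fixed `iters` per stage, the iterate averaging, and all function names are assumptions. The slides do not give the schedules for $\gamma$, the step size, or the accuracy $c_k$.

```python
import numpy as np

def soft_threshold(y, t):
    """Proximal mapping of t * ||.||_1 evaluated at y."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def subgrad_l2(x, lam):
    """One subgradient of h(x) = lam * ||x||_2 (0 is a valid choice at x = 0)."""
    nrm = np.linalg.norm(x)
    return lam * x / nrm if nrm > 0 else np.zeros_like(x)

def objective(A, b, lam, x):
    """F(x) = g(x) - h(x) + r(x) for the assumed l1-2 least-squares split."""
    return (0.5 * np.mean((A @ x - b) ** 2)
            - lam * np.linalg.norm(x, 2) + lam * np.linalg.norm(x, 1))

def spg_stage(A, b, lam, x_k, v_k, gamma, iters, eta, rng):
    """Inner solver: SPG on the convex majorant F^gamma_{x_k}; returns the averaged iterate."""
    x = x_k.copy()
    avg = np.zeros_like(x)
    for _ in range(iters):
        i = rng.integers(A.shape[0])  # sample one component g_i
        # Stochastic gradient of g, minus the linearization of h, plus the proximal term.
        grad = A[i] * (A[i] @ x - b[i]) - v_k + gamma * (x - x_k)
        x = soft_threshold(x - eta * grad, eta * lam)  # prox step on r
        avg += x
    return avg / iters

def ssdc(A, b, lam, gamma=1.0, stages=20, iters=500, eta=0.01, seed=0):
    """Outer loop: rebuild the majorant around x_k, then solve it approximately."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(stages):
        v = subgrad_l2(x, lam)  # element of the subdifferential of h at x_k
        x = spg_stage(A, b, lam, x, v, gamma, iters, eta, rng)
    return x

# Hypothetical synthetic data for a quick run.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = ssdc(A, b, lam=0.05)
```

Because each stage objective is strongly convex (through the $\frac{\gamma}{2}\|x - x_k\|^2$ term), any standard stochastic convex solver could replace `spg_stage` without changing the outer loop.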

  8. Summary of Results ($r$ is convex)

     Table: Summary of results for finding a (nearly) $\epsilon$-critical point of the problem (1)

     | $g$ | $h$ | $r$ | Algorithm $\mathcal{A}$ | Complexity |
     |-----|-----|-----|-------------------------|------------|
     | - | SM | CX | SPG, AdaGrad | $O(1/\epsilon^4)$ |
     | SM | SM | CX | SVRG | $O(n/\epsilon^2)$ |
     | SM | - | CX, SM | SPG, AdaGrad | $O(1/\epsilon^4)$ |
     | SM | - | CX, SM | SVRG | $O(n/\epsilon^2)$ |

     SM: smooth; CX: convex. $n$: the total number of components in a finite-sum problem.

  9. Non-Smooth Non-Convex Regularization

     When $r(x)$ is non-convex, the challenge is the presence of the non-smooth non-convex function $r$.

     The Moreau envelope of $r$ ($\mu > 0$) is a DC function [4]:

     $$r_{\mu}(x) = \min_{y \in \mathbb{R}^d} \left\{ \frac{1}{2\mu}\|y - x\|^2 + r(y) \right\} = \frac{1}{2\mu}\|x\|^2 - \underbrace{\max_{y \in \mathbb{R}^d} \left\{ \frac{1}{\mu} y^{\top} x - \frac{1}{2\mu}\|y\|^2 - r(y) \right\}}_{R_{\mu}(x)}.$$

     Key idea: solving the following DC problem,

     $$\min_{x \in \mathbb{R}^d} F_{\mu}(x) := g(x) - h(x) + \frac{1}{2\mu}\|x\|^2 - R_{\mu}(x).$$

     [4] Liu, T., Pong, T. K., and Takeda, A. Mathematical Programming, 2018.
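The DC structure of the Moreau envelope can be checked numerically in one dimension. The sketch below assumes the capped $\ell_1$ penalty $r(y) = \min(|y|, \theta)$ as the non-convex regularizer and replaces the min and max over $\mathbb{R}$ with a search over a finite grid; the grid bounds, $\mu$, and $\theta$ are arbitrary illustrative choices.

```python
import numpy as np

mu, theta = 0.5, 1.0
ys = np.linspace(-5.0, 5.0, 4001)       # grid standing in for y in R
r = np.minimum(np.abs(ys), theta)       # capped l1: non-convex and non-smooth

def moreau_env(x):
    """r_mu(x) = min_y { (y - x)^2 / (2 mu) + r(y) }, approximated on the grid."""
    return np.min((ys - x) ** 2 / (2 * mu) + r)

def R_mu(x):
    """R_mu(x) = max_y { y * x / mu - y^2 / (2 mu) - r(y) }, approximated on the grid."""
    return np.max(ys * x / mu - ys ** 2 / (2 * mu) - r)

xs = np.linspace(-2.0, 2.0, 201)
env = np.array([moreau_env(x) for x in xs])
R = np.array([R_mu(x) for x in xs])

# DC identity from the slide: r_mu(x) = x^2 / (2 mu) - R_mu(x).
gap = np.max(np.abs(env - (xs ** 2 / (2 * mu) - R)))

# R_mu is a pointwise max of functions affine in x, hence convex:
# its second differences on an evenly spaced grid should be non-negative.
second_diff = R[2:] - 2 * R[1:-1] + R[:-2]
```

Both $\frac{1}{2\mu}\|x\|^2$ and $R_{\mu}$ are convex, so $r_{\mu}$ is a difference of convex functions even though $r$ itself is not convex.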

  10. Summary of Results ($r$ is non-convex)

      Table: Summary of results for finding a (nearly) $\epsilon$-critical point of the problem (1)

      | $g$ | $h$ | $r$ | Algorithm $\mathcal{A}$ | Complexity |
      |-----|-----|-----|-------------------------|------------|
      | SM | SM | NC, NS, LP | SPG | $O(1/\epsilon^8)$ |
      | SM | SM | NC, NS, FV, LB | SPG | $O(1/\epsilon^{12})$ |
      | SM | SM | NC, NS, LP | SVRG | $O(n/\epsilon^8)$ |
      | SM | SM | NC, NS, FV, LB | SVRG | $O(n/\epsilon^6)$ |
      | SM | SM | NC, NS, FVC | SVRG | $O(n/\epsilon^6)$ |

      SM: smooth; CX: convex; NC: non-convex; NS: non-smooth; LP: Lipschitz continuous function; LB: lower bounded over $\mathbb{R}^d$; FV: finite-valued over $\mathbb{R}^d$; FVC: finite-valued over a compact set.

      Thank You! Poster #109, Pacific Ballroom, 06:30-09:00 PM
