Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence
Yi Xu¹, Qi Qi¹, Qihang Lin¹, Rong Jin², Tianbao Yang¹
¹ The University of Iowa   ² DAMO Academy at Alibaba
ICML, Long Beach, CA, June 12, 2019
Yi Xu (CS@UI) SSDC June 12, 2019 1 / 7

Non-Convex and Non-smooth Optimization

A family of non-convex non-smooth optimization problems:
$$\min_{x \in \mathbb{R}^d} F(x) := g(x) - h(x) + r(x) \tag{1}$$
◮ $g(\cdot)$, $h(\cdot)$: real-valued, lower-semicontinuous, convex
◮ $r(\cdot)$: proper, lower-semicontinuous
$g(x) = \mathbb{E}_{\xi}[g(x; \xi)]$, $h(x) = \mathbb{E}_{\varsigma}[h(x; \varsigma)]$
◮ Finite-sum (a special case): $g(x) = \frac{1}{n_1}\sum_{i=1}^{n_1} g_i(x)$, $h(x) = \frac{1}{n_2}\sum_{j=1}^{n_2} h_j(x)$.

It covers many applications:
◮ Non-convex sparsity-promoting regularizers: LSP, MCP, SCAD, capped $\ell_1$, transformed $\ell_1$
◮ Weakly convex
◮ Least-squares regression with $\ell_{1-2}$ regularization
◮ Positive-Unlabeled (PU) learning
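The following sketch shows how two of the listed applications fit form (1). The decompositions are standard identities, but this is our illustration and the helper names are hypothetical. Least-squares regression with the $\ell_{1-2}$ regularizer $\lambda(\|x\|_1 - \|x\|_2)$ admits one possible split (the slides do not fix one): $g(x) = \frac{1}{2n}\|Ax - b\|^2 + \lambda\|x\|_1$, $h(x) = \lambda\|x\|_2$, and $r = 0$. The capped $\ell_1$ penalty satisfies $\min(|t|, \theta) = |t| - \max(|t| - \theta, 0)$, which is a difference of two convex functions.

```python
import math


def l1(x):
    return sum(abs(v) for v in x)


def l2(x):
    return math.sqrt(sum(v * v for v in x))


def least_squares(A, b, x):
    # (1 / (2n)) * ||Ax - b||^2, with A stored as a list of rows
    n = len(A)
    return sum((sum(a * v for a, v in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b)) / (2 * n)


def F_l12(A, b, x, lam):
    # F(x) = g(x) - h(x) with g = least squares + lam*||x||_1, h = lam*||x||_2, r = 0
    g = least_squares(A, b, x) + lam * l1(x)
    h = lam * l2(x)
    return g - h


def capped_l1(x, theta):
    # Non-convex penalty: sum_i min(|x_i|, theta)
    return sum(min(abs(v), theta) for v in x)


def capped_l1_dc(x, theta):
    # Same penalty written as convex minus convex: ||x||_1 - sum_i max(|x_i| - theta, 0)
    return l1(x) - sum(max(abs(v) - theta, 0.0) for v in x)


x = [3.0, -0.5, 1.0]
print(capped_l1(x, 1.0), capped_l1_dc(x, 1.0))  # 2.5 2.5
```

Both pieces of each split are convex. That is all the SSDC framework needs from $g$ and $h$.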

Main Goal

Critical point: a point $\bar{x}$ such that $\partial h(\bar{x}) \cap \hat{\partial}(g + r)(\bar{x}) \neq \emptyset$.
◮ $\hat{\partial} f(x)$: Fréchet subgradient; $\partial f(x)$: limiting subgradient

ε-critical point: a point $\bar{x}$ such that $\operatorname{dist}(\partial h(\bar{x}), \hat{\partial}(g + r)(\bar{x})) \le \epsilon$.
◮ If $g + r$ is non-differentiable, finding an ε-critical point is challenging.
◮ An example: $g = |x|$, $h = r = 0$; then $\operatorname{dist}(0, \partial|x|) = 1$ whenever $x \neq 0$.

Goal: find a nearly ε-critical point $x$, i.e., a point for which there exists $\bar{x}$ such that
$$\|x - \bar{x}\| \le O(\epsilon), \quad \operatorname{dist}(\partial h(\bar{x}), \hat{\partial}(g + r)(\bar{x})) \le \epsilon. \tag{2}$$
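The $|x|$ example can be checked directly. The sketch below is our illustration and its helper names are hypothetical. It represents the subdifferential of $|x|$ as an interval and takes the constant hidden in $O(\epsilon)$ to be 1. For ε < 1, no $x \neq 0$ is ε-critical. Every $x$ with $|x| \le \epsilon$ is nearly ε-critical, with witness $\bar{x} = 0$.

```python
def subdiff_abs(x):
    # Subdifferential of |x| as a closed interval (lo, hi)
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)


def dist_to_interval(p, interval):
    # Distance from the scalar p to the interval [lo, hi]
    lo, hi = interval
    return max(lo - p, p - hi, 0.0)


def is_eps_critical(x, eps):
    # h = 0, so its subdifferential is {0}: need dist(0, subdiff |x|) <= eps
    return dist_to_interval(0.0, subdiff_abs(x)) <= eps


def is_nearly_eps_critical(x, eps, c=1.0):
    # Witness xbar = 0 (the only critical point); c stands in for the O(eps) constant
    xbar = 0.0
    return abs(x - xbar) <= c * eps and is_eps_critical(xbar, eps)


print(is_eps_critical(1e-3, 0.1), is_nearly_eps_critical(1e-3, 0.1))  # False True
```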

Stagewise Stochastic DC Algorithm (SSDC-$\mathcal{A}$)

When $r(x)$ is convex, assume that the proximal mapping of $r(x)$ can be easily computed:
$$\operatorname{prox}_{\eta r}(y) = \arg\min_{x \in \mathbb{R}^d} \frac{1}{2\eta}\|x - y\|^2 + r(x).$$
Basic idea: minimize a convex majorant of $F$ stage by stage.

Stagewise Stochastic DC (SSDC) Algorithm [1, 2, 3]
1: for $k = 1, \dots, K$ do
2:   $F^{\gamma}_{x_k}(x) = g(x) + r(x) - \big(h(x_k) + \partial h(x_k)^\top (x - x_k)\big) + \frac{\gamma}{2}\|x - x_k\|^2$
3:   $x_{k+1} = \mathcal{A}(F^{\gamma}_{x_k})$
4: end for

◮ In step 2, $\partial h(x_k)$ denotes any subgradient of $h$ at $x_k$. Because $h$ is convex, this linearization underestimates $h$, so $F^{\gamma}_{x_k} \ge F$ with equality at $x_k$, and $F^{\gamma}_{x_k}$ is $\gamma$-strongly convex.
◮ $\mathcal{A}$: a stochastic algorithm (e.g., SPG, AdaGrad, SVRG) applied to $F^{\gamma}_{x_k}(x)$
◮ Find $x_{k+1}$ such that $\mathbb{E}\big[F^{\gamma}_{x_k}(x_{k+1}) - \min_{x \in \mathbb{R}^d} F^{\gamma}_{x_k}(x)\big] \le c_k$.

[1] Dinh, T. P., and Souad, E. B. North-Holland Mathematics Studies, pp. 249-271, 1986.
[2] Thi, H. A. L., Le, H. M., Phan, D. N., and Tran, B. In ICML, pp. 3394-3403, 2017.
[3] Wen, B., Chen, X., and Pong, T. K. Computational Optimization and Applications, 69(2):297-324, 2018.
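Below is a minimal sketch of SSDC with SPG (proximal stochastic gradient) as $\mathcal{A}$. It assumes the least-squares $\ell_{1-2}$ instance:

- $g(x) = \frac{1}{2n}\|Ax - b\|^2$, with one row sampled per step;
- $h(x) = \lambda\|x\|_2$;
- $r(x) = \lambda\|x\|_1$, whose proximal mapping is soft-thresholding.

The fixed inner step size `eta`, the stage and iteration counts, and the iterate averaging are illustrative choices, not the schedule analyzed in the paper. All function names are hypothetical.

```python
import math
import random


def soft_threshold(y, t):
    # prox of t * ||.||_1, applied coordinate-wise
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in y]


def norm2(x):
    return math.sqrt(sum(v * v for v in x))


def objective(A, b, x, lam):
    # F(x) = g(x) - h(x) + r(x) with g = least squares, h = lam*||x||_2, r = lam*||x||_1
    n = len(A)
    g = sum((sum(a * v for a, v in zip(row, x)) - bi) ** 2
            for row, bi in zip(A, b)) / (2 * n)
    return g - lam * norm2(x) + lam * sum(abs(v) for v in x)


def spg_stage(A, b, x_k, v_k, lam, gamma, eta, iters, rng):
    # Proximal SGD on F^gamma_{x_k}(x) = g(x) + r(x) - v_k^T x + gamma/2 ||x - x_k||^2 (+ const)
    x = list(x_k)
    total = [0.0] * len(x)
    for _ in range(iters):
        i = rng.randrange(len(A))  # sample one component g_i
        resid = sum(a * v for a, v in zip(A[i], x)) - b[i]
        grad = [resid * a - vk + gamma * (xj - xkj)
                for a, vk, xj, xkj in zip(A[i], v_k, x, x_k)]
        x = soft_threshold([xj - eta * gj for xj, gj in zip(x, grad)], eta * lam)
        total = [s + xj for s, xj in zip(total, x)]
    return [s / iters for s in total]  # averaged inner iterate


def ssdc_spg(A, b, lam, gamma=1.0, stages=20, eta=0.1, iters=200, seed=0):
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(stages):
        nx = norm2(x)
        # A subgradient of h(x) = lam*||x||_2 at x_k (0 is valid at the origin)
        v = [lam * xj / nx for xj in x] if nx > 0 else [0.0] * len(x)
        x = spg_stage(A, b, x, v, lam, gamma, eta, iters, rng)
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b = [2.0, -1.0, 2.0, -1.0]
x = ssdc_spg(A, b, lam=0.1)
print(objective(A, b, [0.0, 0.0], 0.1), objective(A, b, x, 0.1))
```

Each stage uses $h$ only through one subgradient at the anchor $x_k$. The inner loop therefore solves a strongly convex problem and never evaluates $h$ again. That is why any stochastic convex solver can play the role of $\mathcal{A}$.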

Summary of Results ($r$ is convex)

Table: Summary of results for finding a (nearly) ε-critical point of problem (1)
Algorithm $\mathcal{A}$ | g  | h  | r      | Complexity
SPG, AdaGrad            | -  | SM | CX     | O(1/ε^4)
SVRG                    | SM | SM | CX     | O(n/ε^2)
SPG, AdaGrad            | SM | -  | CX, SM | O(1/ε^4)
SVRG                    | SM | -  | CX, SM | O(n/ε^2)
SM: smooth; CX: convex. n: the total number of components in a finite-sum problem.
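Compare the SPG/AdaGrad and SVRG rows, ignoring the constants and logarithmic factors hidden in $O(\cdot)$. The SVRG bound $n/\epsilon^2$ is smaller than the SPG/AdaGrad bound $1/\epsilon^4$ exactly when $n < 1/\epsilon^2$. The hypothetical helper below makes this back-of-the-envelope comparison explicit. Actual costs depend on constants the table does not show.

```python
def cheaper_inner_solver(n, eps):
    # Compare the table's bounds with all hidden constants set to 1 (an assumption)
    spg_adagrad = 1.0 / eps ** 4  # O(1 / eps^4), independent of n
    svrg = n / eps ** 2           # O(n / eps^2), finite-sum setting
    return "SVRG" if svrg < spg_adagrad else "SPG/AdaGrad"


print(cheaper_inner_solver(10_000, 1e-3))      # SVRG (1e10 < 1e12)
print(cheaper_inner_solver(10_000_000, 1e-3))  # SPG/AdaGrad (1e13 > 1e12)
```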

Non-Smooth Non-Convex Regularization

When $r(x)$ is non-convex, the challenge is the presence of the non-smooth non-convex function $r$. For $\mu > 0$, the Moreau envelope of $r$ is a DC function [4]:
$$r_\mu(x) = \min_{y \in \mathbb{R}^d} \Big\{\frac{1}{2\mu}\|y - x\|^2 + r(y)\Big\} = \frac{1}{2\mu}\|x\|^2 - \underbrace{\max_{y \in \mathbb{R}^d} \Big\{\frac{1}{\mu} y^\top x - \frac{1}{2\mu}\|y\|^2 - r(y)\Big\}}_{R_\mu(x)}.$$
Here $R_\mu$ is convex, being a pointwise maximum of affine functions of $x$.

Key idea: solve the following DC problem:
$$\min_{x \in \mathbb{R}^d} F_\mu(x) := g(x) - h(x) + \frac{1}{2\mu}\|x\|^2 - R_\mu(x).$$

[4] Liu, T., Pong, T. K., and Takeda, A. Mathematical Programming, 2018.
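Two facts make this usable within SSDC. First, the identity can be checked term by term, since $\frac{1}{2\mu}\|y - x\|^2 = \frac{1}{2\mu}\|x\|^2 - \frac{1}{\mu}y^\top x + \frac{1}{2\mu}\|y\|^2$. Second, if $y^*$ attains the maximum defining $R_\mu(x)$ (equivalently, $y^* \in \operatorname{prox}_{\mu r}(x)$), then $y^*/\mu$ is a subgradient of $R_\mu$ at $x$.

The sketch below checks both facts in one dimension by brute force over a grid. The $\ell_0$-type penalty $r(y) = 0.5 \cdot \mathbb{1}[y \neq 0]$, the grid, and the helper names are our illustrative assumptions.

```python
def moreau_and_R(r, x, mu, grid):
    # r_mu(x) = min_y (y - x)^2 / (2 mu) + r(y), restricted to a finite grid of y values
    env = min((y - x) ** 2 / (2 * mu) + r(y) for y in grid)
    # R_mu(x) = max_y y x / mu - y^2 / (2 mu) - r(y), over the same grid
    R = max(y * x / mu - y * y / (2 * mu) - r(y) for y in grid)
    return env, R


def subgrad_R(r, x, mu, grid):
    # y* minimizing the envelope objective maximizes the R_mu objective, so y*/mu is a subgradient
    y_star = min(grid, key=lambda y: (y - x) ** 2 / (2 * mu) + r(y))
    return y_star / mu


def r(y):
    return 0.5 * (y != 0)  # hypothetical l0-type penalty with weight 0.5


grid = [i / 100 for i in range(-300, 301)]  # y in [-3, 3] with step 0.01, contains 0 exactly
mu = 1.0
for x in (2.0, 0.5, -1.3):
    env, R = moreau_and_R(r, x, mu, grid)
    # env and x^2 / (2 mu) - R agree up to rounding; the last value is a subgradient of R_mu
    print(x, env, x * x / (2 * mu) - R, subgrad_R(r, x, mu, grid))
```

For this penalty, the prox is hard thresholding. The subgradient of $R_\mu$ is therefore $x/\mu$ when $x^2 > 2\mu \cdot 0.5$ and $0$ otherwise.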

Summary of Results ($r$ is non-convex)

Table: Summary of results for finding a (nearly) ε-critical point of problem (1)
Algorithm $\mathcal{A}$ | g  | h  | r              | Complexity
SPG                     | SM | SM | NC, NS, LP     | O(1/ε^8)
SPG                     | SM | SM | NC, NS, FV, LB | O(1/ε^12)
SVRG                    | SM | SM | NC, NS, LP     | O(n/ε^8)
SVRG                    | SM | SM | NC, NS, FV, LB | O(n/ε^6)
SVRG                    | SM | SM | NC, NS, FVC    | O(n/ε^6)
SM: smooth; CX: convex; NC: non-convex; NS: non-smooth; LP: Lipschitz continuous function; LB: lower bounded over $\mathbb{R}^d$; FV: finite-valued over $\mathbb{R}^d$; FVC: finite-valued over a compact set.

Thank You! Poster #109, Pacific Ballroom, 06:30-09:00 PM