

SLIDE 1

Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number

Zaiyi Chen, Yi Xu, Haoyuan Hu, Tianbao Yang

zaiyi.czy@alibaba-inc.com

2019-06-10

SLIDE 2

Overview

1. Introduction
2. Katalyst Algorithm and Theoretical Guarantee
3. Experiments

SLIDE 3

Problem Definition

$$\min_{x \in \mathbb{R}^d} \; \phi(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) + \psi(x) \qquad (1)$$

For this problem we can obtain a better gradient complexity, with respect to the sample size $n$ and the accuracy $\epsilon$, via variance-reduced (SVRG-type) methods (Johnson & Zhang, 2013). We name the proposed algorithm Katalyst after Katyusha (Allen-Zhu, 2017) and Catalyst (Lin et al., 2015).
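To make the SVRG-type variance reduction mentioned above concrete, here is a minimal NumPy sketch of the variance-reduced stochastic gradient such methods use for the smooth part of (1). The function names and the toy least-squares components are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Sketch of an SVRG-type variance-reduced gradient estimator for the smooth
# part of phi(x) = (1/n) sum_i f_i(x) + psi(x). The interface grad_fi(i, x)
# and the toy data below are assumptions made for illustration.

def svrg_gradient(grad_fi, x, snapshot, full_grad_at_snapshot, i):
    """Unbiased, variance-reduced estimate of (1/n) sum_i grad f_i(x)."""
    return grad_fi(i, x) - grad_fi(i, snapshot) + full_grad_at_snapshot

# Toy usage with least squares f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 100, 5
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
grad_fi = lambda i, x: (A[i] @ x - b[i]) * A[i]

x_tilde = np.zeros(d)                                  # snapshot point
full_grad = (A.T @ (A @ x_tilde - b)) / n              # full gradient at snapshot
i = rng.integers(n)                                    # random component
g = svrg_gradient(grad_fi, x_tilde + 0.1, x_tilde, full_grad, i)
```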

SLIDE 5

Assumptions

{f_i} are L-smooth; ψ can be non-smooth but convex; φ is µ-weakly convex.

Definition 1 (L-smoothness)

A function f is Lipschitz smooth with constant L if its gradient is Lipschitz continuous with constant L, that is, $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for all $x, y \in \mathbb{R}^d$.

Definition 2 (Weak convexity)

A function φ is µ-weakly convex if $\phi(x) + \frac{\mu}{2}\|x\|^2$ is convex.
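As a concrete illustration of Definition 2, the following small check numerically probes µ-weak convexity by testing midpoint convexity of φ(x) + (µ/2)‖x‖² on random pairs of points. The toy φ below is an assumption for illustration, not an objective from the paper.

```python
import numpy as np

# Numerically probe mu-weak convexity of a toy phi: check that
# g(x) = phi(x) + (mu/2)||x||^2 satisfies midpoint convexity on random pairs.
# phi(x) = -0.5*||x||^2 + |x_1| (1-weakly convex) is an illustrative choice.

def is_weakly_convex(phi, mu, dim=3, trials=10_000, seed=0, tol=1e-10):
    rng = np.random.default_rng(seed)
    g = lambda x: phi(x) + 0.5 * mu * np.dot(x, x)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        if g(0.5 * (x + y)) > 0.5 * (g(x) + g(y)) + tol:
            return False                     # midpoint convexity violated
    return True

phi = lambda x: -0.5 * np.dot(x, x) + abs(x[0])
print(is_weakly_convex(phi, mu=1.0))   # True: phi + 0.5*||x||^2 = |x_1| is convex
print(is_weakly_convex(phi, mu=0.5))   # False: mu = 0.5 is too small here
```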

SLIDE 6

Comparisons with Related Work

Table 1: Comparison of gradient complexities of variance-reduction-based algorithms for finding an ǫ-stationary point of (1). ∗ marks a result that is only valid when L/µ ≤ √n.

| Algorithms | L/µ ≥ Ω(n) | L/µ ≤ O(n) | Non-smooth ψ |
| --- | --- | --- | --- |
| SAGA (Reddi et al., 2016) | O(n^{2/3} L/ǫ²) | O(n^{2/3} L/ǫ²) | Yes |
| RapGrad (Lan & Yang, 2018) | O(√(nLµ)/ǫ²) | O((µn + √(nLµ))/ǫ²) | indicator function |
| SVRG (Reddi et al., 2016) | O(n^{2/3} L/ǫ²) | O(n^{2/3} L/ǫ²) | Yes |
| Natasha1 (Allen-Zhu, 2017) | NA | O(n^{2/3} L^{2/3} µ^{1/3}/ǫ²) | Yes |
| RepeatSVRG (Allen-Zhu, 2017) | O(n^{3/4} √(Lµ)/ǫ²) | O((µn + n^{3/4} √(Lµ))/ǫ²) | Yes |
| 4WD-Catalyst (Paquette et al., 2018) | O(nL/ǫ²) | O(nL/ǫ²) | Yes |
| SPIDER (Fang et al., 2018) | O(√n L/ǫ²) | O(√n L/ǫ²) | No |
| SNVRG (Zhou et al., 2018) | O(√n L/ǫ²) | O(√n L/ǫ²) | No |
| Katalyst (this work) | O(√(nLµ)/ǫ²) | O((µn + L)/ǫ²) | Yes |

Our bound is shown to be optimal, up to a logarithmic factor, by a recent lower-bound result (Zhou & Gu, 2019).

SLIDE 7

Overview

1. Introduction
2. Katalyst Algorithm and Theoretical Guarantee
3. Experiments

SLIDE 8

Interpretation - Our Basic Idea

[Figure: illustration of the basic idea, Step 1: moving from x0 to x1. Plot data omitted.]

SLIDE 9

Interpretation - Our Basic Idea

[Figure: illustration of the basic idea, Step > 1: moving from x1 to x2. Plot data omitted.]

SLIDE 10

A Unified Framework

Meta Algorithm

Algorithm 1: Stagewise-SA(w0, {ηs}, µ, {ws})
Input: a non-increasing sequence {w_s}, x_0 ∈ dom(ψ), γ = (2µ)^{-1}
1: for s = 1, ..., S do
2:     f_s(·) = φ(·) + (1/(2γ)) ‖· − x_{s−1}‖²
3:     x_s = Katyusha(f_s, x_{s−1}, K_s, µ, L + µ)   // x_s is usually an averaged solution
4: end for
Output: x_τ, where τ is randomly chosen from {0, ..., S} with probabilities p_τ = w_{τ+1} / Σ_{k=0}^{S} w_{k+1}, τ = 0, ..., S.

Each stage objective decomposes into a finite-sum part and a composite part:

$$f_s(x) = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\Big( f_i(x) + \frac{\mu}{2}\|x - x_{s-1}\|^2 \Big)}_{\hat f_i(x)} \;+\; \underbrace{\frac{\gamma^{-1} - \mu}{2}\|x - x_{s-1}\|^2 + \psi(x)}_{\hat \psi(x)}$$
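The Python sketch below mirrors the structure of Algorithm 1: each stage adds a proximal term centered at the previous iterate and passes the resulting better-conditioned problem to an inner solver, and the output stage is sampled with the stated weights. The `inner_solver` placeholder stands in for Katyusha; its interface and everything else here are schematic assumptions, not the authors' code.

```python
import numpy as np

# Schematic sketch of Algorithm 1 (Stagewise-SA). `inner_solver` stands in for
# Katyusha applied to the regularized stage objective; its interface is an
# assumption made for illustration.

def stagewise_sa(phi_grad, psi_prox, x0, mu, L, S, K, inner_solver,
                 alpha=1.0, seed=0):
    gamma = 1.0 / (2.0 * mu)
    xs = [np.asarray(x0, dtype=float)]
    for s in range(1, S + 1):
        x_prev = xs[-1]
        # Stage objective f_s(x) = phi(x) + (1/(2*gamma)) ||x - x_prev||^2,
        # which is (1/gamma - mu)-strongly convex (= mu here) when phi is
        # mu-weakly convex.
        fs_grad = lambda x, x_prev=x_prev: phi_grad(x) + (x - x_prev) / gamma
        xs.append(inner_solver(fs_grad, psi_prox, x_prev, K, sigma=mu, L=L + mu))
    # Output x_tau with tau drawn proportionally to w_{tau+1} = (tau+1)^alpha.
    w = np.array([(t + 1) ** alpha for t in range(S + 1)])
    tau = np.random.default_rng(seed).choice(S + 1, p=w / w.sum())
    return xs[tau]
```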

SLIDE 11

Algorithm

Algorithm 2: Katyusha(f, x_0, K, σ, L)
Initialize: τ_2 = 1/2, τ_1 = min{√(nσ/(3L)), 1/2}, η = 1/(3τ_1 L), θ = 1 + ησ, m = ⌈log(2τ_1 + 2/θ − 1)/log θ⌉ + 1, y_0 = ζ_0 = x̃_0 ← x_0
1: for k = 0, ..., K − 1 do
2:     u_k = ∇f̂(x̃_k)
3:     for t = 0, ..., m − 1 do
4:         j = km + t
5:         x_{j+1} = τ_1 ζ_j + τ_2 x̃_k + (1 − τ_1 − τ_2) y_j
6:         ∇̃_{j+1} = u_k + ∇f̂_i(x_{j+1}) − ∇f̂_i(x̃_k)
7:         ζ_{j+1} = argmin_ζ { (1/(2η)) ‖ζ − ζ_j‖² + ⟨∇̃_{j+1}, ζ⟩ + ψ̂(ζ) }
8:         y_{j+1} = argmin_y { (3L/2) ‖y − x_{j+1}‖² + ⟨∇̃_{j+1}, y⟩ + ψ̂(y) }
9:     end for
10:    x̃_{k+1} = (Σ_{t=0}^{m−1} θ^t y_{km+t+1}) / (Σ_{t=0}^{m−1} θ^t)
11: end for
Output: x̃_K
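For concreteness, below is a minimal NumPy sketch of Algorithm 2 for a stage objective f̂(x) = (1/n) Σ_i f̂_i(x) + ψ̂(x), with ψ̂ handled through its proximal operator (lines 7 and 8 above reduce to prox steps). The interfaces `grad_i(i, x)` and `prox_psi(v, step)` and the uniform component sampling are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of Algorithm 2 (Katyusha) for
# f_hat(x) = (1/n) sum_i fhat_i(x) + psi_hat(x).
# grad_i(i, x) returns grad fhat_i(x); prox_psi(v, step) is prox of step*psi_hat.

def katyusha(grad_i, prox_psi, x0, n, K, sigma, L, seed=0):
    rng = np.random.default_rng(seed)
    tau2 = 0.5
    tau1 = min(np.sqrt(n * sigma / (3.0 * L)), 0.5)
    eta = 1.0 / (3.0 * tau1 * L)
    theta = 1.0 + eta * sigma
    m = int(np.ceil(np.log(2.0 * tau1 + 2.0 / theta - 1.0) / np.log(theta))) + 1
    x_tilde = np.asarray(x0, dtype=float)
    y, zeta = x_tilde.copy(), x_tilde.copy()
    for _ in range(K):
        # Full gradient of the smooth part at the snapshot (line 2).
        u = np.mean([grad_i(i, x_tilde) for i in range(n)], axis=0)
        ys = []
        for _ in range(m):
            x = tau1 * zeta + tau2 * x_tilde + (1.0 - tau1 - tau2) * y  # line 5
            i = rng.integers(n)
            g = u + grad_i(i, x) - grad_i(i, x_tilde)   # variance-reduced gradient
            zeta = prox_psi(zeta - eta * g, eta)        # line 7: mirror step
            y = prox_psi(x - g / (3.0 * L), 1.0 / (3.0 * L))  # line 8: gradient step
            ys.append(y)
        w = theta ** np.arange(m)                        # line 10: weighted average
        x_tilde = (w[:, None] * np.array(ys)).sum(0) / w.sum()
    return x_tilde
```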

SLIDE 12

Theory

Theorem 3

Let w_s = s^α with α > 0, γ = 1/(2µ), L̂ = L + µ, σ = µ, and in each call of Katyusha let τ_1 = min{√(nσ/(3L̂)), 1/2}, step size η = 1/(3τ_1 L̂), τ_2 = 1/2, θ = 1 + ησ, and K_s = ⌈log(D_s)/(m log θ)⌉, m = ⌈log(2τ_1 + 2/θ − 1)/log θ⌉ + 1, where D_s = max{4L̂/µ, L̂³/µ³, L̂² s/µ²}. Then

$$\max\{\mathbb{E}[\|\nabla \phi_\gamma(x_{\tau+1})\|^2],\, \mathbb{E}[L^2\|x_{\tau+1} - z_{\tau+1}\|^2]\} \le \frac{34\mu\Delta_\phi(\alpha+1)}{S+1} + \frac{98\mu\Delta_\phi(\alpha+1)}{(S+1)^\alpha}\,\mathbb{I}_{\alpha<1},$$

where z = prox_{γφ}(x) and τ is randomly chosen from {0, ..., S} with probabilities p_τ = w_{τ+1}/Σ_{k=0}^{S} w_{k+1}, τ = 0, ..., S. Furthermore, the total gradient complexity for finding x_{τ+1} such that max(E[‖∇φ_γ(x_{τ+1})‖²], L² E[‖x_{τ+1} − z_{τ+1}‖²]) ≤ ǫ² is

$$N(\epsilon) = \begin{cases} O\!\left(\big(\mu n + \sqrt{n\mu L}\big)\log\!\big(\tfrac{L}{\mu\epsilon}\big)\,\tfrac{1}{\epsilon^2}\right), & n \ge \tfrac{3L}{4\mu},\\[4pt] O\!\left(\sqrt{nL\mu}\,\log\!\big(\tfrac{L}{\mu\epsilon}\big)\,\tfrac{1}{\epsilon^2}\right), & n \le \tfrac{3L}{4\mu}. \end{cases}$$

SLIDE 13

Theory

Theorem 4

Suppose ψ = 0. With the same parameter values as in Theorem 3, except that K = ⌈log(D)/(m log θ)⌉ where D = max(48L̂/µ, 2L̂³/µ³), the total gradient complexity for finding x_{τ+1} such that E[‖∇φ(x_{τ+1})‖²] ≤ ǫ² is

$$N(\epsilon) = \begin{cases} O\!\left(\big(\mu n + \sqrt{n\mu L}\big)\log\!\big(\tfrac{L}{\mu}\big)\,\tfrac{1}{\epsilon^2}\right), & n \ge \tfrac{3L}{4\mu},\\[4pt] O\!\left(\sqrt{nL\mu}\,\log\!\big(\tfrac{L}{\mu}\big)\,\tfrac{1}{\epsilon^2}\right), & n \le \tfrac{3L}{4\mu}. \end{cases}$$

SLIDE 14

Overview

1. Introduction
2. Katalyst Algorithm and Theoretical Guarantee
3. Experiments

SLIDE 15

Experiments I

Squared hinge loss with a log-sum penalty (LSP) or a transformed ℓ1 penalty (TL1).
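For reference, here is a sketch of these non-convex objectives: squared hinge loss plus a TL1 or LSP penalty. The slide does not give the exact penalty parameterizations, so the forms below (standard ones from the sparse-regularization literature) and their parameters a and θ are assumptions for illustration.

```python
import numpy as np

# Sketch of the Experiments I objectives: squared hinge loss plus a non-convex
# sparsity penalty. The TL1/LSP parameterizations are standard forms and are
# assumptions; the slide does not specify the constants used.

def squared_hinge(w, A, y):
    # y in {-1, +1}; loss_i = max(0, 1 - y_i * a_i^T w)^2
    margins = np.maximum(0.0, 1.0 - y * (A @ w))
    return np.mean(margins ** 2)

def tl1_penalty(w, lam, a=1.0):
    # Transformed l1: lam * sum_j (a + 1)|w_j| / (a + |w_j|)
    return lam * np.sum((a + 1.0) * np.abs(w) / (a + np.abs(w)))

def lsp_penalty(w, lam, theta=1.0):
    # Log-sum penalty: lam * sum_j log(1 + |w_j| / theta)
    return lam * np.sum(np.log1p(np.abs(w) / theta))

def objective(w, A, y, lam, penalty=tl1_penalty):
    return squared_hinge(w, A, y) + penalty(w, lam)
```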

[Figure 1: Comparison of different algorithms (Katalyst, proxSVRG, proxSVRG-mb, 4WD-Catalyst) for the two tasks (TL1 and LSP regularization) on the rcv1 and realsim datasets, with λ = 1/n and λ = 0.1/n. Each panel plots log10(objective) against the number of gradients/n; plot data omitted.]

SLIDE 16

Experiments II

We use the smoothed SCAD penalty given in (Lan & Yang, 2018):

$$R_{\lambda,\gamma,\epsilon}(x) = \begin{cases} \lambda (x^2 + \epsilon)^{1/2}, & \text{if } (x^2+\epsilon)^{1/2} \le \lambda,\\[4pt] \dfrac{2\gamma\lambda (x^2+\epsilon)^{1/2} - (x^2+\epsilon) - \lambda^2}{2(\gamma-1)}, & \text{if } \lambda < (x^2+\epsilon)^{1/2} < \gamma\lambda,\\[4pt] \dfrac{\lambda^2(\gamma+1)}{2}, & \text{otherwise}, \end{cases}$$

where γ > 2, λ > 0, and ǫ > 0. The problem is then

$$\min_{x \in \mathbb{R}^d} \; \phi(x) := \frac{1}{2n}\sum_{i=1}^{n} (a_i^\top x - b_i)^2 + \frac{\rho}{2}\sum_{i=1}^{d} R_{\lambda,\gamma,\epsilon}(x_i).$$
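The piecewise definition above translates directly into code; the vectorized sketch below transcribes R_{λ,γ,ǫ} as written (a transcription of the formula for illustration, not the authors' code).

```python
import numpy as np

# Vectorized transcription of the smoothed SCAD penalty R_{lambda,gamma,eps}
# defined above (gamma > 2, lam > 0, eps > 0).

def smoothed_scad(x, lam, gamma, eps):
    x = np.asarray(x, dtype=float)
    t = np.sqrt(x ** 2 + eps)                    # (x^2 + eps)^{1/2}
    small = lam * t                              # region: t <= lam
    mid = (2 * gamma * lam * t - (x ** 2 + eps) - lam ** 2) / (2 * (gamma - 1))
    large = np.full_like(t, lam ** 2 * (gamma + 1) / 2)   # region: t >= gamma*lam
    return np.where(t <= lam, small, np.where(t < gamma * lam, mid, large))

# Example: the full regularizer (rho/2) * sum_i R(x_i) on a toy vector.
x = np.linspace(-3, 3, 7)
rho, lam, gamma, eps = 0.1, 0.5, 3.0, 1e-3
reg = 0.5 * rho * smoothed_scad(x, lam, gamma, eps).sum()
```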

SLIDE 17

Experiments II.1

[Figure 2: Theoretical performances of RapGrad and Katalyst; plot data omitted.]

SLIDE 18

Experiments II.2

[Figure 3: Empirical performances of RapGrad and Katalyst with early termination; plot data omitted.]

SLIDE 19

The End

SLIDE 20

References I

Allen-Zhu, Z. Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter. In Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 89–97, 2017.

Allen-Zhu, Z. Katyusha: The first direct acceleration of stochastic gradient methods. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp. 1200–1205, 2017.

Fang, C., Li, C. J., Lin, Z., and Zhang, T. SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. In NeurIPS, pp. 687–697, 2018.

Johnson, R. and Zhang, T. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS, pp. 315–323, 2013.

Lan, G. and Yang, Y. Accelerated stochastic algorithms for nonconvex finite-sum and multi-block optimization. CoRR, abs/1805.05411, 2018.

SLIDE 21

References II

Lin, H., Mairal, J., and Harchaoui, Z. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems, pp. 3384–3392, 2015.

Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., and Harchaoui, Z. Catalyst for gradient-based nonconvex optimization. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84, pp. 613–622, 2018.

Reddi, S. J., Sra, S., Póczos, B., and Smola, A. J. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In Advances in Neural Information Processing Systems, pp. 1145–1153, 2016.

Zhou, D. and Gu, Q. Lower bounds for smooth nonconvex finite-sum optimization. arXiv preprint arXiv:1901.11224, 2019.

Zhou, D., Xu, P., and Gu, Q. Stochastic nested variance reduced gradient descent for nonconvex optimization. In NeurIPS, pp. 3925–3936, 2018.
