

1. Acceleration of SVRG and Katyusha X by Inexact Preconditioning
Yanli Liu, Fei Feng, and Wotao Yin
University of California, Los Angeles
ICML 2019
Outline: Introduction | iPreSVRG & iPreKatX | Experiments | Theoretical Speedup | Conclusions

2. Background
We focus on solving
    minimize_{x ∈ R^d}  F(x) = f(x) + ψ(x) = (1/n) ∑_{i=1}^n f_i(x) + ψ(x),
where f(x) is strongly convex and smooth, ψ(x) is convex and can be non-differentiable, n is large, and d = o(n).
Examples: Lasso, logistic regression, PCA, ...
Common solvers: SVRG, Katyusha X (a Nesterov-accelerated SVRG), SAGA, SDCA, ...
Challenge: as first-order methods, they all suffer from ill-conditioning.

3. In this talk
We propose to accelerate SVRG and Katyusha X by simple yet effective preconditioning. The acceleration is demonstrated both theoretically and numerically (7× runtime speedup on average).

4. iPreSVRG
SVRG update:
    w_{t+1} = argmin_{y ∈ R^d} { ψ(y) + (1/(2η)) ‖y − w_t‖² + ⟨∇̃_t, y⟩ },
where ∇̃_t is a variance-reduced stochastic gradient of f = (1/n) ∑_i f_i.
Inexact Preconditioned SVRG (iPreSVRG):
    w_{t+1} ≈ argmin_{y ∈ R^d} { ψ(y) + (1/(2η)) ‖y − w_t‖²_M + ⟨∇̃_t, y⟩ }
The preconditioner M ≻ 0 approximates the Hessian of f. The subproblem is solved highly inexactly by applying FISTA a fixed number of times. This acceleration technique also applies to Katyusha X.
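The update above can be sketched in NumPy for the case ψ = λ‖·‖₁. This is a minimal illustration, not the authors' implementation: the function name, the fixed FISTA iteration count, and the use of a dense preconditioner are all choices made here for clarity.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ipresvrg_step(w, g, M, eta, lam, n_fista=5):
    """One inexact preconditioned proximal step (a sketch of the iPreSVRG
    update): approximately solve
        min_y  lam*||y||_1 + (1/(2*eta)) * ||y - w||_M^2 + <g, y>
    by a fixed, small number of FISTA iterations.
    Here g plays the role of the variance-reduced gradient and M is the
    preconditioner (M positive definite)."""
    L = np.linalg.eigvalsh(M).max() / eta   # Lipschitz constant of the smooth part
    y = w.copy()
    z = w.copy()                            # FISTA extrapolation point
    theta = 1.0
    for _ in range(n_fista):
        grad = M @ (z - w) / eta + g        # gradient of the smooth part at z
        y_new = soft_threshold(z - grad / L, lam / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta**2)) / 2
        z = y_new + (theta - 1) / theta_new * (y_new - y)
        y, theta = y_new, theta_new
    return y
```

With M = I the subproblem is solved exactly in one pass and the step reduces to the ordinary proximal SVRG update y = prox_{ηλ‖·‖₁}(w − ηg); a nontrivial M is what injects the second-order information.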

5. Choosing M for Lasso
    minimize_{x ∈ R^d}  (1/(2n)) ‖Ax − b‖₂² + λ₁ ‖x‖₁ + λ₂ ‖x‖₂²
Two choices of M for Lasso:
1. When d is small, we choose M₁ = (1/n) AᵀA; this is the exact Hessian of the first part.
2. When d is large and AᵀA is almost diagonally dominant, we choose M₂ = (1/n) diag(AᵀA) + αI, where α > 0.
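The two choices can be formed directly from the data matrix. A minimal sketch; the function name and the default value of α are assumptions (the slide only requires α > 0), and for large d one would store only the diagonal of M₂ rather than a dense matrix.

```python
import numpy as np

def lasso_preconditioners(A, alpha=1e-2):
    """Sketch of the two preconditioner choices for the Lasso smooth part
    f(x) = (1/(2n)) * ||Ax - b||^2, whose Hessian is (1/n) A^T A:
      M1 = (1/n) A^T A                   -- exact Hessian (small d)
      M2 = (1/n) diag(A^T A) + alpha*I   -- diagonal surrogate (large d)"""
    n, d = A.shape
    AtA = A.T @ A / n
    M1 = AtA
    M2 = np.diag(np.diag(AtA)) + alpha * np.eye(d)
    return M1, M2
```

M₂ keeps the subproblem cheap: with a diagonal preconditioner the M-weighted proximal step separates coordinate-wise, while the αI term guarantees M₂ ≻ 0 even if some column of A is zero.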

6. Lasso results
Figure 1: australian dataset¹, d = 14, M = M₁, 10× runtime speedup.
Figure 2: w1a.t dataset¹, d = 300, M = M₂, 5× runtime speedup.
¹ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

7. Choosing M for Logistic Regression
    minimize_{x ∈ R^d}  (1/n) ∑_{i=1}^n ln(1 + exp(−b_i · a_iᵀ x)) + λ₁ ‖x‖₁ + λ₂ ‖x‖₂²
Let B = diag(b) A = diag(b) (a₁, a₂, ..., a_n)ᵀ. Two choices of M for logistic regression:
1. When d is small, we choose M₁ = (1/(4n)) BᵀB; this is approximately the Hessian of the first part.
2. When d is large and BᵀB is almost diagonally dominant, we choose M₂ = (1/(4n)) diag(BᵀB) + αI, where α > 0.
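The logistic analogues follow the same pattern; the 1/(4n) factor comes from the fact that the true Hessian is (1/n) BᵀDB with D diagonal and D_ii = s_i(1 − s_i) ≤ 1/4 for sigmoid values s_i, so M₁ replaces D by its upper bound. As before, the function name and default α are assumptions of this sketch.

```python
import numpy as np

def logistic_preconditioners(A, b, alpha=1e-2):
    """Sketch of the preconditioners for the logistic smooth part
    f(x) = (1/n) sum_i ln(1 + exp(-b_i * a_i^T x)), with B = diag(b) A:
      M1 = (1/(4n)) B^T B                   -- Hessian upper-bound surrogate (small d)
      M2 = (1/(4n)) diag(B^T B) + alpha*I   -- diagonal surrogate (large d)"""
    n, d = A.shape
    B = b[:, None] * A          # diag(b) @ A without forming diag(b)
    S = B.T @ B / (4 * n)
    M1 = S
    M2 = np.diag(np.diag(S)) + alpha * np.eye(d)
    return M1, M2
```

Note that when the labels satisfy b_i ∈ {±1}, BᵀB equals AᵀA, so M₁ is simply a rescaled version of the Lasso preconditioner.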

8. Logistic regression results
Figure 3: australian dataset, d = 14, M = M₁, 6× runtime speedup.
Figure 4: w1a.t dataset, d = 300, M = M₂, 4× runtime speedup.

9. Theoretical Speedup
Theorem 1. Let C₁(m, ε) and C′₁(m, ε) be the gradient complexities of SVRG and iPreSVRG, respectively, to reach ε-suboptimality, where m is the epoch length.
1. When κ_f > n^{1/2} and κ_f < n² d⁻², we have
       min_{m ≥ 1} C′₁(m, ε) / min_{m ≥ 1} C₁(m, ε) ≤ O(n^{1/2} / κ_f).
2. When κ_f > n^{1/2} and κ_f > n² d⁻², we have
       min_{m ≥ 1} C′₁(m, ε) / min_{m ≥ 1} C₁(m, ε) ≤ O(d / √(n κ_f)).
iPreKatX has a similar speedup.
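A quick consistency check on the two regimes (assuming the exponents as reconstructed from the slide text, and dropping constants): the two bounds coincide exactly at the threshold κ_f = n²/d², which is why the theorem splits there.

```python
import math

def bound_case1(n, kappa):
    # Regime n^(1/2) < kappa < n^2/d^2: ratio <= O(sqrt(n)/kappa).
    return math.sqrt(n) / kappa

def bound_case2(n, d, kappa):
    # Regime kappa > n^2/d^2: ratio <= O(d/sqrt(n*kappa)).
    return d / math.sqrt(n * kappa)

# At kappa = n^2/d^2 both expressions reduce to d^2 / n^(3/2).
n, d = 10_000, 50
kappa_star = n**2 / d**2
assert abs(bound_case1(n, kappa_star) - bound_case2(n, d, kappa_star)) < 1e-12
```

Both ratios are below 1 throughout their regimes (case 1 needs κ_f > √n, case 2 needs d < √(n κ_f)), which is the sense in which preconditioning gives a provable speedup.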

10. Conclusions
1. In this work, we apply inexact preconditioning to SVRG and Katyusha X.
2. With appropriate preconditioners and fast subproblem solvers, we obtain significant speedups in both theory and practice.
Poster: Today 6:30 PM – 9:00 PM, Pacific Ballroom #192
Code: https://github.com/uclaopt/IPSVRG

