 
Curvature-Exploiting Acceleration of Elastic Net Computation
Vien V. Mai and Mikael Johansson
KTH - Royal Institute of Technology
The elastic net problem

Workhorse in ML and modern statistics:

$$\operatorname*{minimize}_{x \in \mathbb{R}^d} \quad \frac{1}{2n}\|Ax - b\|_2^2 + \frac{\gamma_2}{2}\|x\|_2^2 + \gamma_1\|x\|_1$$

Special instances: $\gamma_1 = 0 \Rightarrow$ Ridge regression; $\gamma_2 = 0 \Rightarrow$ Lasso.

In many real-world data sets, the Hessian of the smooth part,

$$\nabla^2 f(x) = \frac{1}{n}A^\top A + \gamma_2 I = C + \gamma_2 I,$$

has a rapidly decaying spectrum.

V. V. Mai (KTH) ICML 2019
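To make the objective and the Hessian of its smooth part concrete, here is a minimal NumPy sketch. The function names and the synthetic data (columns scaled geometrically so that the spectrum of $C$ decays) are assumptions made for illustration; they are not taken from the authors' code.

```python
import numpy as np


def elastic_net_objective(A, b, x, gamma1, gamma2):
    """Evaluate (1/2n)||Ax - b||_2^2 + (gamma2/2)||x||_2^2 + gamma1*||x||_1."""
    n = A.shape[0]
    residual = A @ x - b
    smooth = residual @ residual / (2.0 * n) + 0.5 * gamma2 * (x @ x)
    return smooth + gamma1 * np.abs(x).sum()


def smooth_hessian(A, gamma2):
    """Hessian of the smooth part: C + gamma2 * I, with C = A^T A / n."""
    n, d = A.shape
    return A.T @ A / n + gamma2 * np.eye(d)


# Synthetic data (illustrative only): column scales decay geometrically,
# so the eigenvalues of C decay quickly as well.
rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d)) * (0.5 ** np.arange(d))
b = rng.standard_normal(n)

H = smooth_hessian(A, gamma2=1e-3)
eigenvalues = np.linalg.eigvalsh(H)[::-1]  # reorder to decreasing
print("five largest Hessian eigenvalues:", eigenvalues[:5])
print("objective at x = 0:",
      elastic_net_objective(A, b, np.zeros(d), gamma1=1e-2, gamma2=1e-3))
```

Setting `gamma1=0` gives the ridge objective and `gamma2=0` the Lasso objective, matching the special instances listed on the slide.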
Related work

Deterministic first-order methods:
• PGD: $O\left(dn\,\kappa \log\frac{1}{\epsilon}\right)$
• FISTA: $O\left(dn\sqrt{\kappa}\,\log\frac{1}{\epsilon}\right)$
where $\kappa = \dfrac{\lambda_1(C + \gamma_2 I)}{\lambda_d(C + \gamma_2 I)}$.

Stochastic first-order methods:
• ProxSVRG: $O\left(d\,(n + \tilde{\kappa})\log\frac{1}{\epsilon}\right)$
• Katyusha: $O\left(d\,(n + \sqrt{n\tilde{\kappa}})\log\frac{1}{\epsilon}\right)$
where $\tilde{\kappa} = \dfrac{\operatorname{tr}(C + \gamma_2 I)}{\lambda_d(C + \gamma_2 I)}$.

Challenge: exploit second-order information despite non-smoothness.
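The two condition numbers can be computed directly from the spectrum of $C + \gamma_2 I$. The sketch below is illustrative only: the helper names are hypothetical, and `leading_terms` drops constants and the shared $\log(1/\epsilon)$ factor, so its values indicate scaling rather than predicted run-times.

```python
import numpy as np


def condition_numbers(A, gamma2):
    """Return (kappa, kappa_tilde) for H = A^T A / n + gamma2 * I."""
    n, d = A.shape
    H = A.T @ A / n + gamma2 * np.eye(d)
    eigs = np.linalg.eigvalsh(H)          # ascending order
    kappa = eigs[-1] / eigs[0]            # lambda_1 / lambda_d
    kappa_tilde = np.trace(H) / eigs[0]   # tr(H) / lambda_d
    return kappa, kappa_tilde


def leading_terms(n, d, kappa, kappa_tilde):
    """Complexity bounds with constants and the log(1/eps) factor dropped."""
    return {
        "PGD": d * n * kappa,
        "FISTA": d * n * np.sqrt(kappa),
        "ProxSVRG": d * (n + kappa_tilde),
        "Katyusha": d * (n + np.sqrt(n * kappa_tilde)),
    }


# Synthetic data with a decaying spectrum (an assumption for illustration).
rng = np.random.default_rng(0)
n, d = 500, 30
A = rng.standard_normal((n, d)) * (0.7 ** np.arange(d))

kappa, kappa_tilde = condition_numbers(A, gamma2=1e-4)
for name, value in leading_terms(n, d, kappa, kappa_tilde).items():
    print(f"{name:>9}: {value:.3e}")
```

Since the trace is the sum of all eigenvalues, $\kappa \le \tilde{\kappa} \le d\,\kappa$ always holds; the gap between the two is small exactly when the spectrum decays quickly.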
Main contribution

Novel 2nd-order optimization algorithm computes an $\varepsilon$-optimal solution in time

$$O\left(d\,(n + c\,\tilde{\kappa})\log\frac{1}{\varepsilon}\right)$$

Stochastic first-order methods have $c = 1$; our method has

$$c = \frac{r\lambda_r + \sum_{i>r}\lambda_i}{\sum_{i=1}^{r}\lambda_i + \sum_{i>r}\lambda_i} \ll 1$$

Dramatic improvement when $C$ has a rapidly decaying spectrum.
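The factor $c$ is easy to evaluate for a given spectrum. The sketch below assumes that $\lambda_1 \ge \dots \ge \lambda_d$ are the eigenvalues sorted in decreasing order and that $r$ is a rank parameter between 1 and $d$; the slide does not spell out either point, and the function name is hypothetical.

```python
import numpy as np


def speedup_factor(eigenvalues, r):
    """c = (r*lam_r + sum_{i>r} lam_i) / (sum_{i<=r} lam_i + sum_{i>r} lam_i)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # decreasing
    if not 1 <= r <= lam.size:
        raise ValueError("r must satisfy 1 <= r <= number of eigenvalues")
    tail = lam[r:].sum()                 # sum over i > r
    numerator = r * lam[r - 1] + tail    # lam[r - 1] is lambda_r (1-based)
    denominator = lam[:r].sum() + tail   # equals the trace
    return numerator / denominator


flat = np.ones(100)                  # no decay: nothing to exploit
decaying = 0.5 ** np.arange(100)     # geometric decay (illustrative)

print("flat spectrum,     r = 10:", speedup_factor(flat, 10))
print("decaying spectrum, r = 10:", speedup_factor(decaying, 10))
```

Because $r\lambda_r \le \sum_{i \le r} \lambda_i$ for a sorted spectrum, $c \le 1$ always, with equality when the top $r$ eigenvalues are all equal.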
Proposed algorithm

Two building blocks:
1. Approximation of smooth Hessian using randomized block Lanczos
2. Proximal Newton method with stochastic gradients
   • Exploits finite-sum structure
   • Uses momentum acceleration to increase mini-batch size
   • Makes clever use of error control and warm start
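The first building block can be illustrated with a rough sketch. Two assumptions are made here that the slide does not confirm: randomized subspace iteration is used as a simpler stand-in for randomized block Lanczos, and the Hessian is approximated in the low-rank-plus-scaled-identity form $H = V\,\mathrm{diag}(s_i - s_r)\,V^\top + (s_r + \gamma_2) I$, where $(s_i, V)$ are approximate top-$r$ eigenpairs of $C$. All function names are hypothetical. The point of this form is that $H^{-1}v$ costs only $O(dr)$ operations.

```python
import numpy as np


def top_eigenpairs(C, r, n_iter=10, seed=0):
    """Approximate the top-r eigenpairs of a symmetric PSD matrix C.

    Uses randomized subspace iteration (a stand-in for block Lanczos).
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((C.shape[0], r)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(C @ Q)       # power step + re-orthonormalize
    # Rayleigh-Ritz: solve the small r x r eigenproblem on span(Q).
    s, W = np.linalg.eigh(Q.T @ C @ Q)
    order = np.argsort(s)[::-1]          # decreasing order
    return s[order], Q @ W[:, order]


def approx_hessian_inverse(s, V, gamma2):
    """Return v -> H^{-1} v for H = V diag(s - s_r) V^T + (s_r + gamma2) I."""
    sigma = s[-1] + gamma2               # eigenvalue outside span(V)
    weights = 1.0 / (s + gamma2) - 1.0 / sigma

    def apply(v):
        # H^{-1} = I/sigma + V diag(1/(s_i + gamma2) - 1/sigma) V^T
        return v / sigma + V @ (weights * (V.T @ v))

    return apply


# Illustrative use on synthetic data with a decaying spectrum.
rng = np.random.default_rng(1)
n, d, r, gamma2 = 300, 40, 5, 1e-3
A = rng.standard_normal((n, d)) * (0.6 ** np.arange(d))
C = A.T @ A / n

s, V = top_eigenpairs(C, r)
solve = approx_hessian_inverse(s, V, gamma2)
step = solve(rng.standard_normal(d))     # a preconditioned direction
print("approximate top eigenvalues of C:", s)
```

A proximal Newton step would then minimize a quadratic model built from this `H` plus the $\ell_1$ term; that subproblem is not shown here.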
Experimental results: suboptimality vs. iteration counts

[Figure: four panels (gisette-scale, australian, cina0, real-sim). x-axis: Epoch (0 to 100); y-axis: Suboptimality (log scale). Methods compared: FISTA, Katyusha, ProxSVRG, BCD, Ours.]
Experimental results: suboptimality vs. run-times

[Figure: four panels (gisette-scale with b = 500, australian, real-sim with b = 2000, cina0). x-axis: Time [s]; y-axis: Suboptimality (log scale). Legend entries recovered: BCD, Ours.]
Thank you!

Please come visit our poster at: Room Pacific Ballroom #196
Code: https://github.com/vienmai/elasticnet