
Curvature-Exploiting Acceleration of Elastic Net Computation


  1. Curvature-Exploiting Acceleration of Elastic Net Computation. Vien V. Mai and Mikael Johansson, KTH - Royal Institute of Technology.

  2. The elastic net problem

     Workhorse in ML and modern statistics:

     $$\underset{x \in \mathbb{R}^d}{\text{minimize}} \quad \frac{1}{2n}\|Ax - b\|_2^2 + \frac{\gamma_2}{2}\|x\|_2^2 + \gamma_1 \|x\|_1$$

     Special instances: $\gamma_1 = 0 \Rightarrow$ Ridge regression; $\gamma_2 = 0 \Rightarrow$ Lasso.

     In many real-world data sets, the Hessian of the smooth part,

     $$\nabla^2 f(x) = \frac{1}{n} A^\top A + \gamma_2 I = C + \gamma_2 I,$$

     has a rapidly decaying spectrum.

     V. V. Mai (KTH), ICML 2019
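As a minimal sketch of the problem on this slide, the objective and the Hessian of its smooth part can be evaluated with NumPy. The function names, the synthetic data, and the chosen values of `gamma1` and `gamma2` are illustrative assumptions, not taken from the slides or the authors' code.

```python
import numpy as np

def elastic_net_objective(A, b, x, gamma1, gamma2):
    """(1/(2n))||Ax - b||_2^2 + (gamma2/2)||x||_2^2 + gamma1 ||x||_1."""
    n = A.shape[0]
    residual = A @ x - b
    smooth = residual @ residual / (2 * n) + 0.5 * gamma2 * (x @ x)
    return smooth + gamma1 * np.abs(x).sum()

def smooth_hessian(A, gamma2):
    """Hessian of the smooth part: C + gamma2 * I with C = A^T A / n."""
    n, d = A.shape
    C = A.T @ A / n
    return C + gamma2 * np.eye(d)

# Hypothetical synthetic data: column scales shrink geometrically so that
# the spectrum of C decays quickly, mimicking the situation on the slide.
rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d)) * (0.5 ** np.arange(d))
b = rng.standard_normal(n)

H = smooth_hessian(A, gamma2=1e-3)
eigvals = np.linalg.eigvalsh(H)[::-1]  # eigenvalues in decreasing order
print(eigvals[:5], eigvals[-1])
```

Setting `gamma1 = 0` in `elastic_net_objective` gives the ridge objective, and `gamma2 = 0` gives the Lasso objective.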

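  3. Related work

     Deterministic first-order methods:
     • PGD: $O\left(dn\kappa \log\frac{1}{\epsilon}\right)$
     • FISTA: $O\left(dn\sqrt{\kappa} \log\frac{1}{\epsilon}\right)$
     where $\kappa = \dfrac{\lambda_1(C + \gamma_2 I)}{\lambda_d(C + \gamma_2 I)}$

     Stochastic first-order methods:
     • ProxSVRG: $O\left(d(n + \tilde{\kappa}) \log\frac{1}{\epsilon}\right)$
     • Katyusha: $O\left(d\left(n + \sqrt{n\tilde{\kappa}}\right) \log\frac{1}{\epsilon}\right)$
     where $\tilde{\kappa} = \dfrac{\operatorname{tr}(C + \gamma_2 I)}{\lambda_d(C + \gamma_2 I)}$

     Challenge: exploit second-order information despite non-smoothness.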
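The two condition numbers in these bounds can be computed directly from the matrix $C + \gamma_2 I$. The sketch below assumes that matrix is available as a dense symmetric positive definite array; the helper name `condition_numbers` and the demo matrix are hypothetical.

```python
import numpy as np

def condition_numbers(H):
    """Return (kappa, kappa_tilde) for a symmetric positive definite H.

    kappa       = lambda_1(H) / lambda_d(H)
    kappa_tilde = tr(H) / lambda_d(H)
    """
    eigvals = np.linalg.eigvalsh(H)  # ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    kappa = lam_max / lam_min
    kappa_tilde = np.trace(H) / lam_min
    return kappa, kappa_tilde

# Small illustrative matrix with eigenvalues 4, 2, 1:
# here kappa = 4/1 and kappa_tilde = (4 + 2 + 1)/1.
H_demo = np.diag([4.0, 2.0, 1.0])
kappa, kappa_tilde = condition_numbers(H_demo)
print(kappa, kappa_tilde)
```

Because the trace is the sum of the eigenvalues, $\kappa \le \tilde{\kappa} \le d\kappa$ always holds; $\tilde{\kappa}$ is close to $\kappa$ when the spectrum decays quickly and close to $d\kappa$ when it is flat.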

  4. Main contribution

     Novel 2nd-order optimization algorithm computes an $\epsilon$-optimal solution in time

     $$O\left(d(n + c\tilde{\kappa}) \log\frac{1}{\epsilon}\right)$$

     Stochastic first-order methods have $c = 1$; our method has

     $$c = \frac{r\lambda_r + \sum_{i>r} \lambda_i}{\sum_{i=1}^{r} \lambda_i + \sum_{i>r} \lambda_i} \ll 1$$

     Dramatic improvement when $C$ has a rapidly decaying spectrum.
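To get a feel for the factor $c$, the sketch below evaluates the formula on two made-up spectra. It assumes the $\lambda_i$ are eigenvalues sorted in decreasing order (the slide does not say whether they belong to $C$ or to $C + \gamma_2 I$), and the function name `curvature_factor` and the choice of `r` are hypothetical.

```python
import numpy as np

def curvature_factor(eigvals, r):
    """c = (r * lambda_r + sum_{i>r} lambda_i) / sum_i lambda_i."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # decreasing order
    if not 1 <= r <= lam.size:
        raise ValueError("r must satisfy 1 <= r <= number of eigenvalues")
    head = lam[:r].sum()   # sum of the r largest eigenvalues
    tail = lam[r:].sum()   # sum of the remaining eigenvalues
    return (r * lam[r - 1] + tail) / (head + tail)

# Illustrative spectra (assumed, not from the slides).
flat = np.ones(50)                 # no decay at all
decaying = 0.5 ** np.arange(50)    # geometric decay

c_flat = curvature_factor(flat, r=10)
c_decay = curvature_factor(decaying, r=10)
print(c_flat, c_decay)
```

For a flat spectrum the numerator equals the denominator, so $c = 1$ and nothing is gained; with geometric decay the top-$r$ eigenvalues dominate the denominator and $c$ becomes small.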

  5. Proposed algorithm

     Two building blocks:
     1. Approximation of the smooth Hessian using randomized block Lanczos
     2. Proximal Newton method with stochastic gradients
     • Exploits finite-sum structure
     • Uses momentum acceleration to increase mini-batch size
     • Makes clever use of error control and warm start
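The first building block can be illustrated with a simplified stand-in. The sketch below is not the authors' implementation: it builds a randomized block Krylov subspace with explicit QR orthonormalization and a Rayleigh-Ritz step, which spans the same subspace a block Lanczos recurrence would, but without the three-term recurrence. The function name, the parameters `r` and `q`, and the test matrix are all assumptions made for illustration.

```python
import numpy as np

def randomized_block_krylov(C, r, q=6, seed=0):
    """Approximate the top-r eigenpairs of a symmetric PSD matrix C.

    Builds span{C P, C^2 P, ..., C^q P} from a random d x r block P,
    then extracts Ritz pairs. Requires q * r <= d.
    """
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    block = rng.standard_normal((d, r))
    blocks = []
    for _ in range(q):
        block = C @ block
        # Orthonormalize each block for numerical stability; the column
        # span is unchanged, so the Krylov subspace stays the same.
        block, _ = np.linalg.qr(block)
        blocks.append(block)
    Q, _ = np.linalg.qr(np.hstack(blocks))  # orthonormal basis, d x (q*r)
    T = Q.T @ C @ Q                         # projected matrix
    T = (T + T.T) / 2                       # remove rounding asymmetry
    w, V = np.linalg.eigh(T)
    top = np.argsort(w)[::-1][:r]           # indices of the r largest Ritz values
    return w[top], Q @ V[:, top]

# Hypothetical test matrix with a rapidly decaying spectrum.
rng = np.random.default_rng(1)
d = 30
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
spectrum = 0.2 ** np.arange(d)
C = (U * spectrum) @ U.T                    # C = U diag(spectrum) U^T

approx_vals, approx_vecs = randomized_block_krylov(C, r=3, q=6)
true_top = spectrum[:3]
print(approx_vals, true_top)
```

Only matrix-block products with `C` are needed, so with $C = \frac{1}{n}A^\top A$ the same idea can be applied without forming `C` explicitly by multiplying with `A` and then `A.T`.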

  6. Experimental results: suboptimality vs. iteration counts. [Figure: four panels (gisette-scale, australian, cina0, real-sim) showing suboptimality on a log scale against epochs (0 to 100) for FISTA, Katyusha, ProxSVRG, BCD, and Ours.]

  7. Experimental results: suboptimality vs. run-times. [Figure: four panels (gisette-scale with b = 500, australian, real-sim with b = 2000, cina0) showing suboptimality on a log scale against time in seconds; legend entries recovered: BCD, Ours.]

  8. Thank you! Please come visit our poster at: Room Pacific Ballroom #196. Code: https://github.com/vienmai/elasticnet
