Curvature-Exploiting Acceleration of Elastic Net Computation



SLIDE 1

Curvature-Exploiting Acceleration of Elastic Net Computation

Vien V. Mai and Mikael Johansson

KTH - Royal Institute of Technology

SLIDE 2

The elastic net problem

Workhorse in ML and modern statistics:

$$\underset{x \in \mathbb{R}^d}{\text{minimize}} \quad \frac{1}{2n}\|Ax - b\|_2^2 + \frac{\gamma_2}{2}\|x\|_2^2 + \gamma_1\|x\|_1$$

  • Special instances: $\gamma_1 = 0$ ⇒ ridge regression; $\gamma_2 = 0$ ⇒ lasso
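As a concrete reading of the objective and its two special instances, here is a minimal NumPy sketch. The function name `elastic_net_objective` and its argument order are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def elastic_net_objective(A, b, x, gamma1, gamma2):
    """Evaluate (1/(2n))||Ax - b||_2^2 + (gamma2/2)||x||_2^2 + gamma1*||x||_1.

    A is an (n, d) data matrix, b an (n,) target vector, x a (d,) iterate.
    """
    n = A.shape[0]
    residual = A @ x - b
    # Smooth part: least-squares loss plus the squared l2 penalty.
    smooth = residual @ residual / (2 * n) + 0.5 * gamma2 * (x @ x)
    # Non-smooth part: the l1 penalty.
    return smooth + gamma1 * np.abs(x).sum()
```

Calling it with `gamma1=0.0` gives the ridge regression objective, and with `gamma2=0.0` the lasso objective.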

In many real-world data sets, the Hessian of the smooth part, $\nabla^2 f(x) = \frac{1}{n}A^\top A + \gamma_2 I = C + \gamma_2 I$, has a rapidly decaying spectrum.
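One way to check this spectral decay on a given data set is to form the Hessian explicitly and look at how much of its trace the leading eigenvalues carry. The sketch below assumes the dimension d is small enough for a dense eigendecomposition; the helper names `hessian_spectrum` and `top_r_share` are hypothetical.

```python
import numpy as np

def hessian_spectrum(A, gamma2):
    """Eigenvalues of C + gamma2*I, with C = A^T A / n, in decreasing order."""
    n, d = A.shape
    C = A.T @ A / n
    H = C + gamma2 * np.eye(d)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    return np.linalg.eigvalsh(H)[::-1]

def top_r_share(eigs, r):
    """Fraction of the trace carried by the r largest eigenvalues."""
    lam = np.sort(np.asarray(eigs, dtype=float))[::-1]
    return lam[:r].sum() / lam.sum()
```

A share close to 1 for small r indicates the kind of rapidly decaying spectrum the slide refers to.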

  • V. V. Mai (KTH)

ICML 2019 2 / 8

SLIDE 3

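Related work

Deterministic first-order methods:

  • PGD: $O\left(dn\kappa \log\frac{1}{\epsilon}\right)$
  • FISTA: $O\left(dn\sqrt{\kappa} \log\frac{1}{\epsilon}\right)$
  • $\kappa = \dfrac{\lambda_1(C+\gamma_2 I)}{\lambda_d(C+\gamma_2 I)}$

Stochastic first-order methods:

  • ProxSVRG: $O\left(d\,(n + \tilde{\kappa}) \log\frac{1}{\epsilon}\right)$
  • Katyusha: $O\left(d\,(n + \sqrt{n\tilde{\kappa}}) \log\frac{1}{\epsilon}\right)$
  • $\tilde{\kappa} = \dfrac{\operatorname{tr}(C+\gamma_2 I)}{\lambda_d(C+\gamma_2 I)}$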

Challenge: exploit second-order information despite non-smoothness.
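The two condition numbers in the complexity bounds above can be computed directly for a small problem. This is a minimal sketch assuming a dense eigendecomposition is affordable; `condition_numbers` is a hypothetical helper name.

```python
import numpy as np

def condition_numbers(A, gamma2):
    """Return (kappa, kappa_tilde) for H = A^T A / n + gamma2*I.

    kappa       = lambda_1(H) / lambda_d(H)   (largest over smallest eigenvalue)
    kappa_tilde = tr(H) / lambda_d(H)         (trace over smallest eigenvalue)
    """
    n, d = A.shape
    H = A.T @ A / n + gamma2 * np.eye(d)
    eigs = np.linalg.eigvalsh(H)  # ascending order
    kappa = eigs[-1] / eigs[0]
    kappa_tilde = np.trace(H) / eigs[0]
    return kappa, kappa_tilde
```

Because the trace is the sum of all d eigenvalues, the two quantities always satisfy kappa ≤ kappa_tilde ≤ d · kappa.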

SLIDE 4

Main contribution

A novel second-order optimization algorithm that computes an $\epsilon$-optimal solution in time

$$O\left(d\,(n + c\,\tilde{\kappa}) \log\frac{1}{\epsilon}\right)$$

Stochastic first-order methods have $c = 1$; our method has

$$c = \frac{r\lambda_r + \sum_{i>r}\lambda_i}{\sum_{i=1}^{r}\lambda_i + \sum_{i>r}\lambda_i} \ll 1$$

Dramatic improvement when $C$ has a rapidly decaying spectrum.
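To get a feel for the factor c, the sketch below evaluates the formula for a given list of eigenvalues and a rank parameter r. The slide does not say which matrix the $\lambda_i$ belong to, so the function takes the eigenvalues as input and only assumes they are ordered from largest to smallest, as in the definition of $\kappa$. The name `speedup_factor` is hypothetical.

```python
import numpy as np

def speedup_factor(eigs, r):
    """c = (r*lambda_r + sum_{i>r} lambda_i) / (sum_{i<=r} lambda_i + sum_{i>r} lambda_i)."""
    lam = np.sort(np.asarray(eigs, dtype=float))[::-1]  # decreasing order
    if not 1 <= r <= lam.size:
        raise ValueError("r must satisfy 1 <= r <= number of eigenvalues")
    head = lam[:r].sum()   # sum of the r largest eigenvalues
    tail = lam[r:].sum()   # sum of the remaining eigenvalues
    return (r * lam[r - 1] + tail) / (head + tail)
```

For the spectrum (8, 4, 2, 1), the formula gives c = 1 at r = 1, c = 11/15 at r = 2, and c = 4/15 at r = 4, so c shrinks as more of the decaying spectrum is captured.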

SLIDE 5

Proposed algorithm

Two building blocks:

  1. Approximation of smooth Hessian using randomized block Lanczos
  2. Proximal Newton method with stochastic gradients
  • Exploits finite-sum structure
  • Uses momentum acceleration to increase mini-batch size
  • Makes clever use of error control and warm start
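To illustrate the first building block, here is a rough sketch of how the leading eigenpairs of a symmetric positive semidefinite matrix such as C can be approximated from matrix products alone. It uses a randomized block Krylov iteration with Rayleigh-Ritz extraction, which targets the same block Krylov subspace as block Lanczos but is not the Lanczos recurrence and is not the authors' implementation. The function name and the defaults for `q` and `oversample` are assumptions made for illustration.

```python
import numpy as np

def block_krylov_eig(matvec, d, r, q=5, oversample=5, seed=0):
    """Approximate the r largest eigenpairs of a symmetric PSD d x d matrix.

    matvec(X) must return C @ X for a (d, k) block X; C is never formed here.
    """
    rng = np.random.default_rng(seed)
    b = r + oversample  # block size, slightly larger than the target rank
    block, _ = np.linalg.qr(rng.standard_normal((d, b)))
    blocks = []
    for _ in range(q):
        # Apply C, then re-orthonormalise the block for numerical stability.
        block, _ = np.linalg.qr(matvec(block))
        blocks.append(block)
    # Orthonormal basis of the block Krylov subspace; the most-iterated
    # block is placed first so that its span is resolved most accurately.
    Q, _ = np.linalg.qr(np.hstack(blocks[::-1]))
    # Rayleigh-Ritz: project C onto the subspace and solve the small problem.
    T = Q.T @ matvec(Q)
    T = (T + T.T) / 2  # remove rounding asymmetry
    evals, evecs = np.linalg.eigh(T)
    top = np.argsort(evals)[::-1][:r]
    return evals[top], Q @ evecs[:, top]
```

For C = AᵀA / n, one would pass something like `lambda X: A.T @ (A @ X) / n`, so that each application costs two passes over the data and the d × d matrix is never stored. The block size times `q` must not exceed d.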

SLIDE 6

Experimental results: suboptimality vs. iteration counts

[Figure: suboptimality (log scale) versus epoch on four data sets: gisette-scale, australian, cina0, and real-sim. Methods compared: FISTA, Katyusha, ProxSVRG, BCD, Ours.]

SLIDE 7

Experimental results: suboptimality vs. run-times

[Figure: suboptimality (log scale) versus time [s] on four data sets: gisette-scale (b = 500), australian, cina0, and real-sim (b = 2000). Legend entries recovered: BCD, Ours.]

SLIDE 8

Thank you!

Please come visit our poster at: Room Pacific Ballroom #196

Code: https://github.com/vienmai/elasticnet
