Support Vector Elastic Network: Quan Zhou, Wenlin Chen, Shiji Song, et al. (PowerPoint presentation)

SLIDE 1

Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen

Support Vector Elastic Network

“Sven the Terrible”

SLIDE 2

Traditional Computer Science

Traditional CS: Data + Program → Computer → Output

SLIDE 3

Machine Learning

Traditional CS: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program

SLIDE 4

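Support Vector Machines

Decision function: $\mathbf{w}^\top\mathbf{x}$

$$\min_{\mathbf{w}}\; \underbrace{\tfrac{1}{2}\|\mathbf{w}\|_2^2}_{\text{L2 regularization}} \;+\; C\sum_{i=1}^{n} \underbrace{\max\big(0,\; 1 - y_i\,\mathbf{w}^\top\mathbf{x}_i\big)^2}_{\text{squared hinge loss}}$$

  • 14,644 citations
  • Published in ML journals
  • "Usable" means MATLAB
  • "Fast" means parallel

Many GPU implementations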
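A minimal Python sketch of evaluating the SVM objective above for a fixed weight vector, to make the two terms concrete. The function name `svm_objective` and the list-of-lists data layout are illustrative assumptions, not part of the talk; no bias term is used, matching the formula.

```python
def svm_objective(w, X, y, C):
    """0.5 * ||w||_2^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2 (no bias term)."""
    reg = 0.5 * sum(wk * wk for wk in w)  # L2 regularization
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wk * xk for wk, xk in zip(w, xi))
        loss += max(0.0, 1.0 - margin) ** 2  # squared hinge loss
    return reg + C * loss


# First point has margin exactly 1 (no loss); second has margin 0.5 (loss 0.25).
print(svm_objective([0.5, -0.5], [[2.0, 0.0], [0.0, 1.0]], [1, -1], 1.0))
```

Points with margin at least 1 contribute nothing, which is why the SVM dual ends up with many zero dual variables.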

SLIDE 6

Elastic Net/Lasso

$$\min_{\beta}\; \|X\beta - \mathbf{y}\|_2^2 + \lambda_2\|\beta\|_2^2 \quad \text{such that} \quad \|\beta\|_1 \le t$$

  • 13,856 citations
  • Published in stats journals
  • "Usable" means R
  • "Fast" means Fortran

Zero GPU implementations
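A small Python sketch of the Elastic Net form used here: the L2 penalty stays in the objective while the L1 term is a hard budget t. The names `elastic_net_objective` and `within_l1_budget` are hypothetical helpers for illustration only.

```python
def elastic_net_objective(beta, X, y, lam2):
    """||X beta - y||_2^2 + lam2 * ||beta||_2^2 (X given as a list of rows)."""
    resid = [sum(xj * bj for xj, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]
    return sum(r * r for r in resid) + lam2 * sum(b * b for b in beta)


def within_l1_budget(beta, t):
    """Feasibility check for the constraint ||beta||_1 <= t."""
    return sum(abs(b) for b in beta) <= t


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
# beta = (1, 2) fits y exactly, so only the L2 penalty remains,
# but it violates an L1 budget of t = 1.
print(elastic_net_objective([1.0, 2.0], X, y, 0.1), within_l1_budget([1.0, 2.0], 1.0))
```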

SLIDE 9

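Elastic Net/Lasso

$$\min_{\beta}\; \|X\beta - \mathbf{y}\|_2^2 + \lambda_2\|\beta\|_2^2 \quad \text{such that} \quad \|\beta\|_1 \le t$$

[Figure: SVEN (GPU) regularization path, coefficients βi plotted against the L1 budget t.]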

SLIDE 10

Elastic Net vs. SVM

Elastic Net
  + interpretable
  - slow
  - does not scale

SVM
  + parallel
  + scales to large data
  + multi-platform
  - not interpretable

SLIDE 11

Reductions

Problem A → Problem B → Solution B → Solution A

Elastic Net (input X, Y) → SVM (input Xnew, Ynew) → SVM solution α → Elastic Net solution β
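Why the reduction can work, sketched from the MATLAB code on the following slide. This reconstruction assumes the L1 constraint is active at the optimum (the unconstrained solution has $\|\beta\|_1 > t$); the paper's own derivation may be organized differently. Write β through a point α on the probability simplex, with $\mathbf{x}_j$ the j-th column of X:

```latex
\beta = t\,(\alpha^{+} - \alpha^{-}), \qquad \alpha^{\pm} \ge 0, \qquad
\sum_{j=1}^{p} \big(\alpha^{+}_j + \alpha^{-}_j\big) = 1

X\beta - \mathbf{y}
  = t\Big[\sum_{j=1}^{p} \alpha^{+}_j \big(\mathbf{x}_j - \tfrac{\mathbf{y}}{t}\big)
        - \sum_{j=1}^{p} \alpha^{-}_j \big(\mathbf{x}_j + \tfrac{\mathbf{y}}{t}\big)\Big]
  = t \sum_{i=1}^{2p} \alpha_i\, \hat{y}_i\, \hat{\mathbf{x}}_i,

\hat{\mathbf{x}}_j = \mathbf{x}_j - \tfrac{\mathbf{y}}{t},\ \hat{y}_j = +1, \qquad
\hat{\mathbf{x}}_{p+j} = \mathbf{x}_j + \tfrac{\mathbf{y}}{t},\ \hat{y}_{p+j} = -1
```

Because $\|\alpha^{+} - \alpha^{-}\|_2^2 \le \|\alpha\|_2^2$, with equality when no feature has both parts nonzero, minimizing $\|\sum_i \alpha_i \hat{y}_i \hat{\mathbf{x}}_i\|_2^2 + \lambda_2\|\alpha\|_2^2$ over the simplex yields the Elastic Net solution. The squared-hinge SVM dual with $C = 1/(2\lambda_2)$ minimizes one half of this quadratic minus $\sum_i \alpha_i$ over $\alpha \ge 0$; its minimizer is a positive multiple of the simplex minimizer, which is why the code divides by sum(alpha).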

SLIDE 12

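Reductions

    function beta = SVEN(X, Y, t, lambda)
    % Reduce the Elastic Net to a squared-hinge-loss linear SVM, solve it, map back.
    [~, p] = size(X);
    % Each feature j yields two SVM points in R^n: X(:,j) - Y/t (label +1), X(:,j) + Y/t (label -1).
    Xnew = [bsxfun(@minus, X, Y./t), bsxfun(@plus, X, Y./t)]';
    Ynew = [ones(p,1); -ones(p,1)];
    C = 1/(2*lambda);
    % LIBLINEAR-style options: L2-regularized L2-loss SVM ('-s 1'), quiet mode ('-q').
    model = trainsvmGPU(Ynew, sparse(Xnew), ['-q -s 1 -c ' num2str(C)]);
    % Recover the dual variables from the primal weights (model.w(:) works for row or column w).
    alpha = C * max(1 - Ynew.*(Xnew*model.w(:)), 0);
    % Normalize alpha and map it back to the Elastic Net coefficients.
    beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);
    end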
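A self-contained Python sketch of the same pipeline, useful for checking the reduction on toy data. It is a stand-in under stated assumptions: `trainsvmGPU` is replaced by a plain dual coordinate descent solver for the L2-regularized squared-hinge linear SVM without a bias term (the formulation LIBLINEAR selects with `-s 1`), and α is read directly from that solver rather than recovered from model.w. The names `build_svm_data`, `train_l2loss_svm_dual`, and `sven` are hypothetical.

```python
def build_svm_data(X, Y, t):
    """Turn an Elastic Net instance (X: n rows x p features, Y: n targets)
    into 2p SVM training points in R^n, mirroring Xnew/Ynew on the slide."""
    n, p = len(X), len(X[0])
    Xnew, Ynew = [], []
    for shift, label in ((-1.0, 1.0), (1.0, -1.0)):
        for j in range(p):
            # Column j of X, shifted by -Y/t (label +1) or +Y/t (label -1).
            Xnew.append([X[r][j] + shift * Y[r] / t for r in range(n)])
            Ynew.append(label)
    return Xnew, Ynew


def train_l2loss_svm_dual(Xs, ys, C, epochs=2000):
    """Dual coordinate descent for
    min_w 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2  (no bias term)."""
    m, d = len(Xs), len(Xs[0])
    diag = 1.0 / (2.0 * C)  # diagonal term the squared hinge adds to the dual Hessian
    alpha = [0.0] * m
    w = [0.0] * d  # kept equal to sum_i alpha_i * y_i * x_i
    qbar = [sum(v * v for v in x) + diag for x in Xs]
    for _ in range(epochs):
        for i in range(m):
            x, y = Xs[i], ys[i]
            grad = y * sum(wk * xk for wk, xk in zip(w, x)) - 1.0 + diag * alpha[i]
            # Exact minimization along coordinate i, clipped at alpha_i >= 0.
            new_alpha = max(alpha[i] - grad / qbar[i], 0.0)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for k in range(d):
                    w[k] += delta * y * x[k]
                alpha[i] = new_alpha
    return w, alpha


def sven(X, Y, t, lam2, epochs=2000):
    """Elastic Net via the SVM reduction: build the SVM data, solve the dual,
    then map the normalized dual variables back to beta."""
    p = len(X[0])
    Xnew, Ynew = build_svm_data(X, Y, t)
    _, alpha = train_l2loss_svm_dual(Xnew, Ynew, C=1.0 / (2.0 * lam2), epochs=epochs)
    total = sum(alpha)
    return [t * (alpha[j] - alpha[p + j]) / total for j in range(p)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [1.0, 2.0, 3.0]
print(sven(X, Y, t=1.0, lam2=0.1))
```

For this toy problem the constrained Elastic Net optimum with t = 1 and λ2 = 0.1 works out by hand to β = (1/22, 21/22), and the sketch should return values close to that. Because of the normalization, the result always satisfies ||β||_1 ≤ t, so the reduction is only meaningful when t is below the L1 norm of the unconstrained solution.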

SLIDE 13

Results

[Figure: equivalence of regularization paths. Two panels, glmnet and SVEN (GPU), each plotting coefficients βi against the L1 budget t.]

SLIDE 14

Results

n ≫ d datasets. Running time: O(d²)

[Figure: runtime of glmnet, SVEN (CPU), Shotgun, and L1_Ls ("Other alg. runtime (sec)") plotted against "SVEN (GPU) runtime (sec)" on log axes, with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower". Panels: MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900].]

SLIDE 15

Results

d ≫ n datasets. Running time: O(n²)

[Figure: runtime of glmnet, SVEN (CPU), Shotgun, and L1_Ls ("Other alg. runtime (sec)") plotted against "SVEN (GPU) runtime (sec)" on log axes, with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower". Panels: GLI85 [n=85, p=22283], arcene [n=900, p=10000], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812].]

SLIDE 16

Conclusion

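Elastic Net and SVM are equivalent problems: the Elastic Net reduces to a squared-hinge-loss SVM. Many optimizations previously available only for SVMs, such as GPU solvers, now apply to the Elastic Net. This leads to the fastest Elastic Net solver we are aware of.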

SLIDE 17

Questions?

“Sven the Nice?”