Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen
Support Vector Elastic Network
“Sven the Terrible”
Traditional CS: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
SVM: linear classifier $w^\top x$
$$\min_{w} \; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i (w^\top x_i)\bigr)^2$$
L2 regularization. Squared hinge loss.
14644 citations
Published in ML journals
"Usable" means MATLAB
"Fast" means parallel
Many GPU implementations
Elastic Net:
$$\min_{\beta} \; \|X\beta - y\|_2^2 + \lambda_2 \|\beta\|_2^2 \quad \text{such that } |\beta|_1 \le t$$
13856 citations
Published in stats journals
"Usable" means R
"Fast" means Fortran
Zero GPU implementations
[Figure: SVEN (GPU) regularization path. x-axis: L1 budget t; y-axis: coefficients βi.]
+ interpretable + parallel + scales to large data + multi-platform
Elastic Net → SVM
Problem A (Elastic Net): input X, Y; output β.
Problem B (SVM): input Xnew, Ynew; output α.
Problem A → Problem B → Solution B → Solution A
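function beta = SVEN(X,Y,t,lambda)
% Reduce the Elastic Net (X, Y, t, lambda) to an SVM, solve it, and map the solution back.
[n,p] = size(X);
% Build the SVM problem: 2p inputs in n dimensions, one pair per feature.
Xnew = [bsxfun(@minus,X,Y./t) bsxfun(@plus,X,Y./t)]';
Ynew = [ones(p,1); -ones(p,1)];
C = 1/(2*lambda);
% Train the SVM first; the lines below use model.w.
model = trainsvmGPU(Ynew,sparse(Xnew),['-q -s 1 -c ' num2str(C)]);
% Recover alpha from the hinge-loss residuals, then rescale it to beta.
alpha = C * max(1 - Ynew.*(Xnew*model.w),0);
beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);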
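Why the statements in SVEN recover an Elastic Net solution can be sketched in a few lines. This is a reconstruction, not text from the slides. It assumes that the L1 budget is tight at the optimum and that no coefficient uses both its positive and negative part. The symbols $\hat\beta$, $\hat Z$, $Q$ and $s$ are introduced here for illustration only.

```latex
% Assumption: |\beta|_1 = t. Split \beta into positive and negative parts and rescale by t.
\beta = t\,(\hat\beta_{1:p} - \hat\beta_{p+1:2p}), \qquad
\hat\beta \ge 0, \qquad \sum_{i=1}^{2p} \hat\beta_i = 1 .

% Since the entries of \hat\beta sum to one, y = y\,\mathbf{1}^\top \hat\beta, and therefore
X\beta - y = t\,\hat Z \hat\beta, \qquad
\hat Z = \Bigl[\; X - \tfrac{1}{t}\, y \mathbf{1}^\top, \;\; -\bigl(X + \tfrac{1}{t}\, y \mathbf{1}^\top\bigr) \Bigr] .

% The columns of \hat Z are the rows of Xnew multiplied by their labels Ynew.
% Assumption: at most one of \hat\beta_i, \hat\beta_{p+i} is nonzero, so \|\beta\|_2^2 = t^2 \|\hat\beta\|_2^2.
% Up to the constant factor t^2, the Elastic Net becomes
\min_{\hat\beta \ge 0,\; \mathbf{1}^\top \hat\beta = 1} \; Q(\hat\beta), \qquad
Q(\hat\beta) = \|\hat Z \hat\beta\|_2^2 + \lambda_2 \|\hat\beta\|_2^2 .

% Dual of the squared-hinge SVM (no bias term) on (Xnew, Ynew), scaled by 2:
\min_{\alpha \ge 0} \; \|\hat Z \alpha\|_2^2 + \frac{1}{2C} \|\alpha\|_2^2 - 2\,\mathbf{1}^\top \alpha
\;=\; \min_{\alpha \ge 0} \; Q(\alpha) - 2\,\mathbf{1}^\top \alpha
\qquad \text{for } C = \frac{1}{2\lambda_2} .

% Write \alpha = s\,\hat\beta with s = \mathbf{1}^\top \alpha \ge 0 and \hat\beta on the simplex:
Q(\alpha) - 2\,\mathbf{1}^\top \alpha = s^2 Q(\hat\beta) - 2s ,
\qquad \min_{s \ge 0} \bigl( s^2 Q(\hat\beta) - 2s \bigr) = -\frac{1}{Q(\hat\beta)} .

% The best direction \hat\beta therefore minimizes Q on the simplex, which gives
\hat\beta = \frac{\alpha}{\mathbf{1}^\top \alpha}, \qquad
\beta = \frac{t\,(\alpha_{1:p} - \alpha_{p+1:2p})}{\sum_{i=1}^{2p} \alpha_i} .
```

The last line matches the final assignment to beta in SVEN, and the choice $C = 1/(2\lambda_2)$ matches C = 1/(2*lambda). Because alpha is normalized by its sum, any constant factor in the expression for alpha cancels.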
[Figure: Equivalence of regularization path. Two panels, Glmnet and SVEN (GPU). x-axis: L1 budget t; y-axis: coefficients βi.]
[Figure: runtime comparison on n>>d datasets. Panels: MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900]. x-axis: SVEN (GPU) runtime (sec); y-axis: other alg. runtime (sec). Methods compared: glmnet, SVEN (CPU), Shotgun, L1_Ls. Each panel marks the regions "SVEN (GPU) faster" and "SVEN (GPU) slower".]
n>>d datasets. Running time: $O(d^2)$
Or…
[Figure: runtime comparison on d>>n datasets. Panels: GLI85 [n=85, p=22283], arcene [n=900, p=10000], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812]. x-axis: SVEN (GPU) runtime (sec); y-axis: other alg. runtime (sec). Methods compared: glmnet, SVEN (CPU), Shotgun, L1_Ls. Each panel marks the regions "SVEN (GPU) faster" and "SVEN (GPU) slower".]
d>>n datasets. Running time: $O(n^2)$
Elastic Net and SVM are equivalent problems. Many optimizations only for SVM now apply to Elastic Net. This leads to the fastest Elastic Net solver we are aware of.
“Sven the Nice?”