Coefficient path

[Figure: two "Coefficient path" panels plotting standardized coefficients for x1–x5 against lambda, with steps marked along each path.]
glmpath package
- max_β ℓ(β) s.t. ‖β‖₁ ≤ t
- Predictor-corrector methods in convex optimization are used.
- Computes the exact path at the junctions (in t) where the active set changes, evaluated at a sequence of index points t.
- Implemented in the R package glmpath.
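For a rough feel of such a path, here is a minimal Python sketch that refits an L1-penalized logistic regression over a grid of penalty values and reports the active set at each, assuming scikit-learn. Unlike glmpath's predictor-corrector algorithm this only approximates the exact path at grid points, and the data are synthetic.

```python
# Sketch: approximate L1-regularized GLM coefficient path on a grid.
# glmpath computes the exact path between active-set junctions; here we
# simply refit at each penalty value and watch coefficients enter/leave.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                 # 5 predictors, as in the figure
beta_true = np.array([2.0, -1.5, 0.0, 1.0, 0.0])
y = (X @ beta_true + 0.5 * rng.standard_normal(100) > 0).astype(int)

for C in np.logspace(-2, 2, 9):                   # larger C = weaker L1 penalty
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    coef = model.coef_.ravel()
    active = np.flatnonzero(np.abs(coef) > 1e-10)  # the current active set
    print(f"C={C:8.3f}  active set={active.tolist()}  coefs={np.round(coef, 2)}")
```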
Path algorithms for the SVM
- The two-class SVM classifier

  f(X) = α₀ + Σᵢ₌₁ᴺ αᵢ yᵢ K(X, xᵢ)

  can be seen to have a quadratic penalty and piecewise-linear loss. As the cost parameter C is varied, the Lagrange multipliers αᵢ change piecewise-linearly.
- This allows the entire regularization path to be traced exactly.
- The active set is determined by the points sitting exactly on the margin (the "elbow").
[Figure: three snapshots of the SVM path. Left: 12 points, 6 per class, separated (Step: 17, Error: 0, Elbow Size: 2, Loss: 0). Middle: Mixture Data, Radial Kernel, Gamma=1.0 (Step: 623, Error: 13, Elbow Size: 54, Loss: 30.46). Right: Mixture Data, Radial Kernel, Gamma=5 (Step: 483, Error: 1, Elbow Size: 90, Loss: 1.01).]
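A minimal sketch of the same idea on a grid, assuming scikit-learn's SVC rather than the exact-path algorithm: refit at several values of C and watch the support and margin ("elbow") sets change. The data are synthetic.

```python
# Sketch: track support vectors and the elbow set as C varies.
# The exact path moves piecewise-linearly in the alphas; this grid
# version just refits at a few C values to show the sets changing.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    svc = SVC(kernel="rbf", gamma=1.0, C=C).fit(X, y)
    alpha = np.abs(svc.dual_coef_).ravel()   # 0 < alpha_i <= C for support vectors
    elbow = int(np.sum(alpha < C - 1e-8))    # alpha_i < C: exactly on the margin
    print(f"C={C:7.2f}  support vectors={alpha.size:3d}  elbow size={elbow:3d}")
```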
SVM as a regularization method
[Figure: loss as a function of the margin yf(x), comparing the support vector (hinge) loss with the binomial log-likelihood loss.]
With f(x) = xᵀβ + β₀ and yᵢ ∈ {−1, 1}, consider

  min_{β₀, β} Σᵢ₌₁ᴺ [1 − yᵢ f(xᵢ)]₊ + (λ/2)‖β‖²

This hinge-loss criterion is equivalent to the SVM, with λ monotone in the cost parameter C. Compare with

  min_{β₀, β} Σᵢ₌₁ᴺ log(1 + e^{−yᵢ f(xᵢ)}) + (λ/2)‖β‖²

This is the binomial deviance loss, and the solution is "ridged" linear logistic regression.
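For intuition, a short sketch tabulating the two losses above as functions of the margin yf(x), computed directly from the formulas; it reproduces the comparison shown in the figure.

```python
# Sketch: hinge loss vs. binomial deviance as functions of the margin yf(x).
import numpy as np

margins = np.linspace(-3.0, 3.0, 7)            # the margin yf(x)
hinge = np.maximum(0.0, 1.0 - margins)         # SVM hinge loss [1 - yf(x)]+
deviance = np.log(1.0 + np.exp(-margins))      # binomial deviance (log) loss

for m, h, d in zip(margins, hinge, deviance):
    print(f"yf(x)={m:5.1f}   hinge={h:5.2f}   deviance={d:5.2f}")
```

Both losses penalize small or negative margins, but the hinge loss is exactly zero beyond yf(x) = 1, which is what produces the sparse support-vector solution.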
The Need for Regularization
[Figure: Test Error Curves − SVM with Radial Kernel. Four panels (γ = 5, 1, 0.5, 0.1) plot test error (roughly 0.20–0.35) against C = 1/λ on a log scale from 0.1 to 1000.]
- γ is a kernel parameter: K(x, z) = exp(−γ‖x − z‖²).
- λ (or C) is a regularization parameter, which has to be determined by some means such as cross-validation, as sketched below.
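A minimal sketch of that tuning step, assuming scikit-learn: a cross-validated grid search over C and γ for a radial-kernel SVM on synthetic data with a nonlinear decision boundary.

```python
# Sketch: choose C and gamma by 5-fold cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.2, 1, -1)   # nonlinear boundary

param_grid = {"C": [0.1, 1, 10, 100, 1000], "gamma": [0.1, 0.5, 1.0, 5.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("best:", search.best_params_, f" cv accuracy={search.best_score_:.3f}")
```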