SLIDE 1
SVM-flexible discriminant analysis
Huimin Peng
November 20, 2014

Outline
- SVM
- Nonlinear SVM
- SVM = penalization method
- discriminant analysis
- FDA: flexible discriminant analysis
- PDA: penalized discriminant analysis
- MDA: mixture discriminant analysis
SLIDE 2
SLIDE 3
Reference: Elements of statistical learning chapter 12
SVM
Define a hyperplane that separates the observations: {x : f(x) = x^T β + β_0 = 0}. The optimization problem is

\min_{\beta,\beta_0} \; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i
\quad \text{subject to} \quad \xi_i \ge 0,\; y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i \;\; \forall i,

where the slack ξ_i is the proportional amount by which observation i lies on the wrong side of its margin; ξ_i > 1 means a misclassification. C is the tuning parameter: a large C strongly penalizes positive ξ_i and leads to a wiggly boundary, while a small C leads to a smoother boundary.
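The effect of C can be seen directly on the fitted slacks. A minimal sketch on hypothetical toy data (the slides use R's mda/SVM tooling; this uses scikit-learn purely for illustration), recovering ξ_i = [1 − y_i f(x_i)]_+ from the fitted decision function:

```python
# Sketch: soft-margin SVM on two Gaussian clouds; the total slack
# shrinks as the tuning parameter C grows. Toy data, not from the slides.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # slack xi_i = [1 - y_i f(x_i)]_+ at the fitted solution
    xi = np.maximum(0.0, 1 - y * clf.decision_function(X))
    print(C, round(xi.sum(), 2))  # total slack decreases with C
```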
Huimin Peng | NCSU — Department of Statistics 3/22
SLIDE 4
The solution is

\hat{\beta} = \sum_{i=1}^{N}\hat{\alpha}_i y_i x_i.

The decision function is

\hat{G}(x) = \operatorname{sign}[\hat{f}(x)] = \operatorname{sign}[x^T\hat{\beta} + \hat{\beta}_0].
Huimin Peng | NCSU — Department of Statistics 4/22
SLIDE 5
Nonlinear

Input features are transformed feature vectors h(x_i) = (h_1(x_i), h_2(x_i), ..., h_M(x_i)). The classifier has the same form:

\hat{G}(x) = \operatorname{sign}(\hat{f}(x)) = \operatorname{sign}\left(h(x)^T\hat{\beta} + \hat{\beta}_0\right).

The solution function is

\hat{f}(x) = \sum_{i=1}^{N}\hat{\alpha}_i y_i K(x, x_i) + \hat{\beta}_0,

where K(x, x') = \langle h(x), h(x') \rangle is the kernel, so the transformation h never needs to be computed explicitly.
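The dual form above can be checked numerically: a fitted kernel SVM stores exactly the α_i y_i for its support vectors, so the decision function can be rebuilt by hand. A sketch with scikit-learn and an RBF kernel on hypothetical toy data:

```python
# Sketch: reconstruct f(x) = sum_i alpha_i y_i K(x, x_i) + beta_0 from a
# fitted RBF SVM and compare with the library's own decision_function.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1, 1, -1)  # circular boundary

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

# K(x, x_i): every point against every support vector x_i
d2 = ((X[:, None, :] - clf.support_vectors_[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * d2)
# dual_coef_ holds alpha_i * y_i for the support vectors
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

print(np.allclose(f_manual, clf.decision_function(X)))  # → True
```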
Huimin Peng | NCSU — Department of Statistics 5/22
SLIDE 6
SVM = penalization method

If f(x) = h(x)^T β + β_0, then solve

\min_{\beta_0,\beta} \; \sum_{i=1}^{N}\left[1 - y_i f(x_i)\right]_+ + \frac{\lambda}{2}\|\beta\|^2. \tag{1}

This is a "loss + penalty" criterion: hinge loss plus a ridge penalty. With λ = 1/C it yields the same solution as

\min_{\beta,\beta_0} \; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i
\quad \text{subject to} \quad \xi_i \ge 0,\; y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i \;\; \forall i.
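Because (1) is just an unconstrained "loss + penalty" problem, it can be attacked directly with subgradient descent. A rough Pegasos-style sketch on hypothetical, origin-centered toy data (so the intercept is dropped), using the averaged form (1/N)·Σ hinge + (λ/2)‖β‖²:

```python
# Sketch: minimize the hinge-loss-plus-ridge criterion by Pegasos-style
# subgradient descent. Toy data; no intercept since classes are symmetric.
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1.0] * 50 + [1.0] * 50)

lam = 0.1
beta = np.zeros(2)
for t in range(1, 2001):
    eta = 1.0 / (lam * t)              # standard Pegasos step size
    viol = y * (X @ beta) < 1          # points with positive hinge loss
    grad = lam * beta - (y[viol, None] * X[viol]).sum(0) / len(X)
    beta -= eta * grad

acc = (np.sign(X @ beta) == y).mean()
print(acc)
```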
Huimin Peng | NCSU — Department of Statistics 6/22
SLIDE 7
discriminant analysis

LDA: linear discriminant analysis
QDA: quadratic discriminant analysis
FDA: flexible discriminant analysis
PDA: penalized discriminant analysis
MDA: mixture discriminant analysis
R package: mda

Classes: G = {1, 2, ..., K} (K classes). Score function: θ : G → R^1.
SLIDE 8
The score function assigns scores to the classes. Given training data (g_i, x_i), i = 1, 2, ..., N, solve

\min_{\beta,\theta} \; \sum_{i=1}^{N}\left(\theta(g_i) - x_i^T\beta\right)^2,

with the scores normalized (e.g. \frac{1}{N}\sum_i \theta(g_i)^2 = 1) to rule out the trivial solution θ ≡ 0, β = 0.
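With the class scores held fixed, the inner minimization over β is just ordinary least squares on the scored labels. A sketch on hypothetical two-class data, with scores fixed at ±1:

```python
# Sketch: optimal scoring with fixed scores theta(g) = -1/+1 for two
# classes; beta is then an ordinary least-squares fit. Toy data.
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.5, 1, (40, 2)), rng.normal(1.5, 1, (40, 2))])
g = np.array([0] * 40 + [1] * 40)
theta = np.where(g == 0, -1.0, 1.0)         # scores assigned to the classes

Xc = np.column_stack([np.ones(len(X)), X])  # intercept column
beta, *_ = np.linalg.lstsq(Xc, theta, rcond=None)
pred = (Xc @ beta > 0).astype(int)          # classify by sign of fitted score
print((pred == g).mean())
```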
SLIDE 9
FDA: flexible discriminant analysis

More generally, build L ≤ K − 1 sets of independent scorings for the class labels, θ_1, θ_2, ..., θ_L, fitted by L linear maps η_l(X) = X^T β_l, l = 1, 2, ..., L. We can generalize the η_l(x) = x^T β_l to more flexible, nonparametric fits, adding a regularizer J appropriate for the chosen form of nonparametric regression:

\mathrm{ASR}\left(\{\theta_l,\eta_l\}_{l=1}^{L}\right) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\left(\theta_l(g_i) - \eta_l(x_i)\right)^2 + \lambda J(\eta_l)\right].

In the mda package:

fda(formula, data, weights, theta, dimension, eps, method, keep.fitted, ...)
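The gain from a flexible η can be seen on a class boundary a linear map cannot capture. A sketch on hypothetical data, with polynomial features plus a small ridge penalty standing in for λJ(η) (scikit-learn, not the mda package):

```python
# Sketch of the FDA idea: regress +/-1 class scores on a *flexible* fit,
# here degree-2 polynomial features with a ridge penalty. Toy data with a
# circular class boundary, which a purely linear eta cannot separate.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
g = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)  # circular boundary
theta = np.where(g == 1, 1.0, -1.0)                # class scores

eta = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-3))
eta.fit(X, theta)
pred = (eta.predict(X) > 0).astype(int)
print((pred == g).mean())
```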
SLIDE 10
SLIDE 11
data(iris)
irisfit <- fda(Species ~ ., data = iris)
confusion(irisfit, iris)
confusion(predict(irisfit, iris), iris$Species)
plot(irisfit)
coef(irisfit)
posteriors <- predict(irisfit, type = "post")
confusion(softmax(posteriors), iris[, "Species"])
marsfit <- fda(Species ~ ., data = iris, method = mars)
marsfit2 <- update(marsfit, degree = 2)  # include interactions up to 2nd degree
marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2])  # start from the fitted coefficients in marsfit
SLIDE 12
> coef(irisfit)
                   [,1]        [,2]
Intercept    -2.1264786 -6.72910343
Sepal.Length -0.8377979  0.02434685
Sepal.Width  -1.5500519  2.18649663
Petal.Length  2.2235596 -0.94138258
Petal.Width   2.8389936  2.86801283
SLIDE 13
Figure 1: plot(irisfit)
SLIDE 14
Figure 2: plot(marsfit)
SLIDE 15
Figure 3: plot(marsfit1)
SLIDE 16
Figure 4: plot(marsfit2)
SLIDE 17
penalized discriminant analysis

Quadratic penalty on the coefficients:

\mathrm{ASR}\left(\{\theta_l,\beta_l\}_{l=1}^{L}\right) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\left(\theta_l(g_i) - h^T(x_i)\beta_l\right)^2 + \lambda\,\beta_l^T\Omega\,\beta_l\right].

The choice of Ω depends on the problem setting; here η_l(x) = h(x)^T β_l. In the mda package:

gen.ridge(x, y, weights, lambda=1, omega, df, ...)
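For fixed scores, the penalized criterion has the familiar generalized-ridge closed form β_l = (HᵀH + λΩ)⁻¹Hᵀθ_l. A NumPy sketch on hypothetical data with Ω = I (the simplest choice; gen.ridge itself supports a general omega):

```python
# Sketch: the generalized-ridge solve behind penalized discriminant
# analysis, beta = (H'H + lam*Omega)^{-1} H'theta, with Omega = I.
import numpy as np

rng = np.random.default_rng(5)
H = rng.normal(size=(60, 4))       # basis expansion of the inputs
theta = rng.normal(size=60)        # scored class labels (toy values)
lam, Omega = 2.0, np.eye(4)

beta = np.linalg.solve(H.T @ H + lam * Omega, H.T @ theta)

# Compare with the unpenalized least-squares fit: ridge shrinks toward 0
beta_ols, *_ = np.linalg.lstsq(H, theta, rcond=None)
print(np.linalg.norm(beta) < np.linalg.norm(beta_ols))  # → True
```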
SLIDE 18
mixture discriminant analysis

A Gaussian mixture model for the kth class has density

P(X \mid G = k) = \sum_{r=1}^{R_k}\pi_{kr}\,\phi(X;\mu_{kr},\Sigma), \tag{2}

where \sum_{r=1}^{R_k}\pi_{kr} = 1 and R_k is the number of subclasses (prototypes) used for class k. Incorporating the class prior probabilities Π_k, the posterior is

P(G = k \mid X = x) = \frac{\sum_{r=1}^{R_k}\pi_{kr}\,\phi(x;\mu_{kr},\Sigma)\,\Pi_k}{\sum_{l=1}^{K}\sum_{r=1}^{R_l}\pi_{lr}\,\phi(x;\mu_{lr},\Sigma)\,\Pi_l}.

In the mda package:

mda(formula, data, subclasses, sub.df, tot.df, dimension, eps, iter, weights, method, keep.fitted, trace, ...)
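The posterior above is plain Bayes' rule over mixture densities. A sketch with SciPy on hypothetical fitted parameters (2 classes, 2 subclasses each, shared covariance Σ = I; all means, weights, and priors here are made up for illustration):

```python
# Sketch: MDA class posterior = (mixture density * class prior) normalized
# over classes. Hypothetical parameters, shared identity covariance.
import numpy as np
from scipy.stats import multivariate_normal

mu = {0: [np.array([-2.0, 0.0]), np.array([0.0, -2.0])],
      1: [np.array([2.0, 0.0]), np.array([0.0, 2.0])]}
pi = {0: [0.5, 0.5], 1: [0.5, 0.5]}   # subclass mixing proportions pi_kr
Pi = {0: 0.5, 1: 0.5}                 # class priors Pi_k

def class_density(x, k):
    """P(X = x | G = k): Gaussian mixture over the subclasses of class k."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=np.eye(2))
               for w, m in zip(pi[k], mu[k]))

def posterior(x, k):
    """P(G = k | X = x) by Bayes' rule."""
    den = sum(class_density(x, l) * Pi[l] for l in (0, 1))
    return class_density(x, k) * Pi[k] / den

x = np.array([1.5, 0.5])
print(posterior(x, 0) + posterior(x, 1))  # → 1.0
```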
SLIDE 19
SLIDE 20
data(iris)
irisfit <- mda(Species ~ ., data = iris)
mfit <- mda(Species ~ ., data = iris, subclasses = 2)
coef(mfit)

> coef(mfit)
                   [,1]        [,2]      [,3]      [,4]
Intercept     6.8563935 -15.1565801 -1.454555 -2.535648
Sepal.Length  0.5545477   1.3506122  1.016966  2.945456
Sepal.Width   1.5867703   2.4658435 -1.345301 -2.562105
Petal.Length -3.2435199   0.3621319  1.341652 -2.921295
Petal.Width  -2.3003933  -1.3635028 -4.516518  3.448416
SLIDE 21
Figure 5: plot(mfit)
SLIDE 22