

SLIDE 1

SVM-flexible discriminant analysis

Huimin Peng
November 20, 2014

SLIDE 2

Outline

- SVM
- Nonlinear SVM
- SVM = penalization method
- Discriminant analysis
- FDA: flexible discriminant analysis
- Penalized discriminant analysis
- Mixture discriminant analysis

SLIDE 3

Reference: The Elements of Statistical Learning, Chapter 12.

SVM

Define a hyperplane that separates the observations: $\{x : f(x) = x^T\beta + \beta_0 = 0\}$. The optimization problem is

$$\min_{\beta,\beta_0}\; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to}\quad \xi_i \ge 0,\; y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i\;\; \forall i,$$

where $\xi_i$ is the proportional amount by which the prediction $f(x_i)$ falls on the wrong side of its margin, and $C$ is the tuning parameter. A large $C$ discourages positive $\xi_i$ and leads to a wiggly boundary; a small $C$ leads to a smoother boundary.
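As a concrete illustration (not from the original slides), here is a minimal R sketch using the e1071 package, whose svm() solves exactly this soft-margin problem; its cost argument plays the role of C.

library(e1071)  # assumption: e1071's svm() as the soft-margin solver

# Two-class subset of iris so a single separating hyperplane applies
iris2 <- droplevels(subset(iris, Species != "setosa"))

# Large C: slack is expensive, so few xi_i > 0 are tolerated (narrow margin)
fit_large <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 100)

# Small C: slack is cheap, giving a wider margin and a smoother boundary
fit_small <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 0.01)

# With small C, many more points end up as support vectors
c(large = fit_large$tot.nSV, small = fit_small$tot.nSV)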


SLIDE 4

The solution is

$$\hat\beta = \sum_{i=1}^{N}\hat\alpha_i y_i x_i,$$

where the $\hat\alpha_i$ solve the dual problem and satisfy $0 \le \hat\alpha_i \le C$. The decision function is

$$\hat G(x) = \operatorname{sign}[\hat f(x)] = \operatorname{sign}[x^T\hat\beta + \hat\beta_0].$$
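A hedged sketch of the dual-to-primal bookkeeping, again with e1071 (an assumption, not the slides' tooling): in an e1071 fit, coefs stores $\hat\alpha_i y_i$ for the support vectors, index stores their rows, and rho equals $-\hat\beta_0$.

# scale = FALSE keeps the algebra on the raw inputs
fit <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 1, scale = FALSE)

X <- as.matrix(iris2[, 1:4])
beta_hat  <- drop(t(fit$coefs) %*% X[fit$index, ])  # sum_i alpha_i y_i x_i over support vectors
beta0_hat <- -fit$rho                               # e1071 stores rho = -beta_0

# f(x) = x^T beta + beta_0; sign(f) is the decision rule G(x)
f_hat <- drop(X %*% beta_hat) + beta0_hat
table(predicted = sign(f_hat), truth = iris2$Species)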


SLIDE 5

Nonlinear

Input features are transformed feature vectors $h(x_i) = (h_1(x_i), h_2(x_i), \cdots, h_M(x_i))$. Similarly, the classifier is

$$\hat G(x) = \operatorname{sign}(\hat f(x)) = \operatorname{sign}\left[h(x)^T\hat\beta + \hat\beta_0\right].$$

The solution function is

$$\hat f(x) = \sum_{i=1}^{N}\hat\alpha_i y_i K(x, x_i) + \hat\beta_0.$$
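Continuing the e1071 sketch (again an assumption): switching the kernel is the only change needed, and predict() can return the kernel-expanded decision values $\hat f(x)$ directly.

# Radial-basis kernel: f(x) = sum_i alpha_i y_i K(x, x_i) + beta_0
fit_rbf <- svm(Species ~ ., data = iris2, kernel = "radial", cost = 1)

pred <- predict(fit_rbf, iris2, decision.values = TRUE)
head(attr(pred, "decision.values"))  # the fitted f(x_i)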


SLIDE 6

SVM = Penalization method

If $f(x) = h(x)^T\beta + \beta_0$, consider

$$\min_{\beta_0,\beta}\; \sum_{i=1}^{N}[1 - y_i f(x_i)]_+ + \frac{\lambda}{2}\|\beta\|^2. \tag{1}$$

This is a "loss + penalty" criterion: the first term is the hinge loss, the second a ridge penalty. With $\lambda = 1/C$, (1) gives the same solution as

$$\min_{\beta,\beta_0}\; \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to}\quad \xi_i \ge 0,\; y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i\;\; \forall i.$$
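A minimal numeric check of the equivalence, reusing the scale = FALSE linear fit from the slide-4 sketch (an illustration I'm adding, not the slides' code): with cost = 1 we have $\lambda = 1/C = 1$.

# Code the labels as +/-1; flip the sign of y if y * f_hat is negative for
# most correctly classified points (libsvm's internal +1/-1 labeling depends
# on the order in which classes appear in the data).
y <- ifelse(iris2$Species == levels(iris2$Species)[1], 1, -1)

hinge   <- pmax(0, 1 - y * f_hat)   # [1 - y_i f(x_i)]_+
penalty <- 0.5 * sum(beta_hat^2)    # (lambda/2) ||beta||^2 with lambda = 1
sum(hinge) + penalty                # criterion (1) evaluated at the SVM solution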


SLIDE 7


discriminant analysis

- LDA: linear discriminant analysis
- QDA: quadratic discriminant analysis
- FDA: flexible discriminant analysis
- PDA: penalized discriminant analysis
- MDA: mixture discriminant analysis

R package: mda. Classes: $\mathcal{G} = \{1, 2, \cdots, K\}$ ($K$ classes). Score function: $\theta : \mathcal{G} \to \mathbb{R}^1$.


SLIDE 8

Assign scores to the classes. Training data: $(g_i, x_i)$, $i = 1, 2, \cdots, N$. Optimization:

$$\min_{\beta,\theta}\; \sum_{i=1}^{N}\left(\theta(g_i) - x_i^T\beta\right)^2.$$
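For intuition (a simplification I'm adding, not the slides' method): with $K = 2$ classes and the scores $\theta$ fixed at $\pm 1$ rather than optimized, the criterion reduces to ordinary least squares of the scored labels on $x$.

iris2 <- droplevels(subset(iris, Species != "setosa"))  # two classes

theta  <- ifelse(iris2$Species == levels(iris2$Species)[1], 1, -1)
ls_fit <- lm(theta ~ ., data = iris2[, 1:4])  # minimizes sum_i (theta(g_i) - x_i^T beta)^2

table(predicted = sign(fitted(ls_fit)), truth = iris2$Species)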


SLIDE 9


FDA: flexible discriminant analysis

More generally, build $L \le K - 1$ sets of independent scorings for the class labels, $\theta_1, \theta_2, \cdots, \theta_L$, with corresponding linear maps $\eta_l(X) = X^T\beta_l$, $l = 1, 2, \cdots, L$. We can generalize $\eta_l(x) = x^T\beta_l$ to more flexible, nonparametric fits, adding a regularizer $J$ appropriate for some forms of nonparametric regression:

$$\mathrm{ASR}(\{\theta_l, \eta_l\}_{l=1}^{L}) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\left(\theta_l(g_i) - \eta_l(x_i)\right)^2 + \lambda J(\eta_l)\right].$$

fda(formula, data, weights, theta, dimension, eps, method, keep.fitted, ...)



SLIDE 11

library(mda)  # provides fda(), mda(), mars, confusion(), softmax()

data(iris)
irisfit <- fda(Species ~ ., data = iris)
confusion(irisfit, iris)
confusion(predict(irisfit, iris), iris$Species)
plot(irisfit)
coef(irisfit)
posteriors <- predict(irisfit, type = "post")
confusion(softmax(posteriors), iris[, "Species"])

marsfit  <- fda(Species ~ ., data = iris, method = mars)
marsfit2 <- update(marsfit, degree = 2)                    # include interactions up to 2nd degree
marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2])  # start from the fitted coefs in marsfit


SLIDE 12

> coef(irisfit)
                   [,1]        [,2]
Intercept     2.1264786 -6.72910343
Sepal.Length -0.8377979  0.02434685
Sepal.Width   1.5500519  2.18649663
Petal.Length  2.2235596 -0.94138258
Petal.Width   2.8389936  2.86801283


SLIDE 13


Figure 1: plot(irisfit)


SLIDE 14


Figure 2: plot(marsfit)


SLIDE 15


Figure 3: plot(marsfit1)


SLIDE 16


Figure 4: plot(marsfit2)


SLIDE 17


penalized discriminant analysis

Quadratic penalty on the coefficients:

$$\mathrm{ASR}(\{\theta_l, \beta_l\}_{l=1}^{L}) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\left(\theta_l(g_i) - h^T(x_i)\beta_l\right)^2 + \lambda\,\beta_l^T\Omega\,\beta_l\right].$$

The choice of $\Omega$ depends on the problem setting. Here $\eta_l(x) = h(x)^T\beta_l$.

gen.ridge(x, y, weights, lambda = 1, omega, df, ...)
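A minimal usage sketch: in the mda package, penalized discriminant analysis is obtained by giving fda() the ridge-type regression method gen.ridge; I'm assuming here that lambda is forwarded to gen.ridge through fda's "..." argument.

library(mda)

# Ridge-penalized scoring regression: lambda scales the beta^T Omega beta penalty
pdafit <- fda(Species ~ ., data = iris, method = gen.ridge, lambda = 10)
confusion(pdafit, iris)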


SLIDE 18


mixture discriminant analysis

A Gaussian mixture model for the $k$th class has density

$$P(X \mid G = k) = \sum_{r=1}^{R_k}\pi_{kr}\,\phi(X; \mu_{kr}, \Sigma), \tag{2}$$

where $\sum_{r=1}^{R_k}\pi_{kr} = 1$ and $R_k$ is the number of subclasses (prototypes) used for class $k$. Incorporating the class prior probabilities $\Pi_k$:

$$P(G = k \mid X = x) = \frac{\sum_{r=1}^{R_k}\pi_{kr}\,\phi(x; \mu_{kr}, \Sigma)\,\Pi_k}{\sum_{l=1}^{K}\sum_{r=1}^{R_l}\pi_{lr}\,\phi(x; \mu_{lr}, \Sigma)\,\Pi_l}.$$

mda(formula, data, subclasses, sub.df, tot.df, dimension, eps, iter, weights, method, keep.fitted, trace, ...)



SLIDE 20

data(iris)
irisfit <- mda(Species ~ ., data = iris)
mfit <- mda(Species ~ ., data = iris, subclasses = 2)

> coef(mfit)
                   [,1]        [,2]      [,3]      [,4]
Intercept     6.8563935 -15.1565801 -1.454555 -2.535648
Sepal.Length  0.5545477   1.3506122  1.016966  2.945456
Sepal.Width   1.5867703   2.4658435 -1.345301 -2.562105
Petal.Length -3.2435199   0.3621319  1.341652 -2.921295
Petal.Width   2.3003933   1.3635028 -4.516518  3.448416
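As a follow-up sketch (not on the original slide): predict() on an mda fit can return the subclass-mixture posteriors $P(G = k \mid X = x)$ from the formula above.

post <- predict(mfit, iris, type = "posterior")  # P(G = k | X = x) per class
head(post)

confusion(predict(mfit, iris), iris$Species)     # hard classifications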


SLIDE 21


Figure 5: plot(mfit)


SLIDE 22

Questions?