SVM – flexible discriminant analysis


  1. SVM-flexible discriminant analysis Huimin Peng November 20, 2014

  2. Outline
     - SVM
     - Nonlinear SVM
     - SVM = penalization method
     - Discriminant analysis
     - FDA: flexible discriminant analysis
     - PDA: penalized discriminant analysis
     - MDA: mixture discriminant analysis

  3. Reference: Elements of Statistical Learning, Chapter 12
     SVM. Define a hyperplane that separates the observations: $\{x : f(x) = x^T \beta + \beta_0 = 0\}$. The optimization problem is
     $$\min_{\beta, \beta_0} \; \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^N \xi_i
       \quad \text{subject to } \xi_i \ge 0, \; y_i(x_i^T \beta + \beta_0) \ge 1 - \xi_i \;\; \forall i,$$
     where $\xi_i$ is the proportional amount by which the prediction $f(x_i)$ falls on the wrong side of its margin (in the equivalent constrained formulation, $\sum_i \xi_i$ is bounded by a constant). $C$ is the tuning parameter: a large $C$ discourages positive $\xi_i$ and leads to a wiggly boundary, while a small $C$ leads to a smoother boundary.
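A minimal sketch, not from the slides, of how the cost parameter behaves in practice; it uses the e1071 package's svm() on a two-class subset of iris, and the package choice and all settings are assumptions for illustration only.

     library(e1071)

     data(iris)
     # Two-class subset so there is a single linear decision boundary
     iris2 <- droplevels(subset(iris, Species != "setosa"))

     # Small cost: many margin violations tolerated (smoother, larger-margin boundary);
     # large cost: violations penalized heavily, so the fit follows the training data closely
     fit_soft <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 0.01)
     fit_hard <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 100)

     # Training error for each choice of the cost parameter
     mean(predict(fit_soft, iris2) != iris2$Species)
     mean(predict(fit_hard, iris2) != iris2$Species)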

  4. Reference: Elements of Statistical Learning, Chapter 12
     The solution is
     $$\hat{\beta} = \sum_{i=1}^N \hat{\alpha}_i y_i x_i.$$
     The decision function is
     $$\hat{G}(x) = \operatorname{sign}[\hat{f}(x)] = \operatorname{sign}[x^T \hat{\beta} + \hat{\beta}_0].$$
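As an illustration (again with e1071, an assumption rather than part of the slides), the fitted decision values play the role of $\hat{f}(x)$, and taking their sign reproduces the class assignment; note that e1071 scales inputs internally, so these values equal $x^T \hat{\beta} + \hat{\beta}_0$ only up to that scaling.

     library(e1071)
     data(iris)
     iris2 <- droplevels(subset(iris, Species != "setosa"))
     fit   <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 1)

     pred  <- predict(fit, iris2, decision.values = TRUE)
     f_hat <- attr(pred, "decision.values")   # fitted f_hat(x) for each observation
     table(sign(f_hat), iris2$Species)        # sign(f_hat) versus the true class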

  5. Reference: Elements of Statistical Learning, Chapter 12
     Nonlinear. Input features (transformed feature vectors):
     $$h(x_i) = (h_1(x_i), h_2(x_i), \dots, h_M(x_i)).$$
     Similarly, the classifier is
     $$\hat{G}(x) = \operatorname{sign}(\hat{f}(x)) = \operatorname{sign}[h(x)^T \hat{\beta} + \hat{\beta}_0].$$
     The solution function is
     $$\hat{f}(x) = \sum_{i=1}^N \hat{\alpha}_i y_i K(x, x_i) + \hat{\beta}_0.$$
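A short illustration of the kernel form (again using e1071, which is an assumption, not something shown on the slides): the same fitting call with a radial basis kernel playing the role of $K(x, x_i)$.

     library(e1071)
     data(iris)
     # Radial basis kernel; gamma controls the kernel width
     fit_rbf <- svm(Species ~ ., data = iris, kernel = "radial", gamma = 0.5, cost = 1)
     table(predict(fit_rbf, iris), iris$Species)   # training confusion matrix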

  6. Reference: Elements of Statistical Learning, Chapter 12
     SVM = penalization method. If $f(x) = h(x)^T \beta + \beta_0$, then
     $$\min_{\beta_0, \beta} \; \sum_{i=1}^N [1 - y_i f(x_i)]_+ + \frac{\lambda}{2}\|\beta\|^2. \qquad (1)$$
     This is the loss + penalty form. It gives the same solution as
     $$\min_{\beta, \beta_0} \; \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^N \xi_i
       \quad \text{subject to } \xi_i \ge 0, \; y_i(x_i^T \beta + \beta_0) \ge 1 - \xi_i \;\; \forall i.$$
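To make the hinge-loss-plus-ridge reading of (1) concrete, here is a small sketch; the helper name and all numbers below are hypothetical, not from the slides.

     # Hypothetical helper: objective (1) for labels y in {-1, +1} and feature matrix X
     hinge_objective <- function(beta0, beta, X, y, lambda) {
       f     <- beta0 + as.vector(X %*% beta)     # f(x_i) = x_i^T beta + beta_0
       hinge <- pmax(0, 1 - y * f)                # [1 - y_i f(x_i)]_+
       sum(hinge) + (lambda / 2) * sum(beta^2)    # loss + penalty
     }

     # Tiny worked example with made-up numbers
     X <- matrix(c(1, 2, -1, -2, 0.5, 1.5), ncol = 2)
     y <- c(1, -1, 1)
     hinge_objective(beta0 = 0.1, beta = c(0.5, -0.25), X = X, y = y, lambda = 1)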

  7. Reference: Elements of Statistical Learning, Chapter 12
     Discriminant analysis:
     - LDA: linear discriminant analysis
     - QDA: quadratic discriminant analysis
     - FDA: flexible discriminant analysis
     - PDA: penalized discriminant analysis
     - MDA: mixture discriminant analysis
     R package: mda.
     Classes: $\mathcal{G} = \{1, 2, \dots, K\}$ ($K$ classes).
     Score function: $\theta : \mathcal{G} \mapsto \mathbb{R}^1$.

  8. Reference: Elements of Statistical Learning, Chapter 12
     Assign scores to the classes. Training data: $(g_i, x_i), \; i = 1, 2, \dots, N$.
     Optimization:
     $$\min_{\beta, \theta} \sum_{i=1}^N \left(\theta(g_i) - x_i^T \beta\right)^2.$$
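The full optimal-scoring procedure alternates between updating the scores $\theta$ and the coefficients $\beta$. As a sketch (the fixed scores below are made up, not from the slides), the $\beta$-step for fixed scores is just a least-squares regression of the scored response on $x$:

     data(iris)
     # Hypothetical fixed scores theta(g) for the three classes
     scores  <- c(setosa = -1, versicolor = 0, virginica = 1)
     theta_g <- scores[as.character(iris$Species)]

     # Least-squares fit of theta(g_i) on x_i (an intercept is included as usual)
     beta_step <- lm(theta_g ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                     data = iris)
     coef(beta_step)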

  9. Reference: Elements of Statistical Learning, Chapter 12
     FDA: flexible discriminant analysis. More generally, build $L \le K - 1$ sets of independent scorings for the class labels, $\theta_1, \theta_2, \dots, \theta_L$, and $L$ corresponding linear maps $\eta_l(X) = X^T \beta_l, \; l = 1, 2, \dots, L$.
     We can generalize $\eta_l(x) = x^T \beta_l$ to more flexible, nonparametric fits and add a regularizer $J$ appropriate for some forms of nonparametric regression:
     $$ASR\left(\{\theta_l, \eta_l\}_{l=1}^L\right) = \frac{1}{N} \sum_{l=1}^L \left[ \sum_{i=1}^N \left(\theta_l(g_i) - \eta_l(x_i)\right)^2 + \lambda J(\eta_l) \right].$$
     fda(formula, data, weights, theta, dimension, eps, method, keep.fitted, ...)

  10. Reference: Elements of Statistical Learning, Chapter 12

  11. Reference: Elements of Statistical Learning, Chapter 12
     library(mda)
     data(iris)
     irisfit <- fda(Species ~ ., data = iris)
     confusion(irisfit, iris)
     confusion(predict(irisfit, iris), iris$Species)
     plot(irisfit)
     coef(irisfit)
     posteriors <- predict(irisfit, type = "post")
     confusion(softmax(posteriors), iris[, "Species"])
     marsfit  <- fda(Species ~ ., data = iris, method = mars)
     marsfit2 <- update(marsfit, degree = 2)                    # include interactions up to 2nd degree
     marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2])  # start from the fitted coefficients in marsfit

  12. Reference: Elements of Statistical Learning, Chapter 12
     > coef(irisfit)
                        [,1]        [,2]
     Intercept    -2.1264786 -6.72910343
     Sepal.Length -0.8377979  0.02434685
     Sepal.Width  -1.5500519  2.18649663
     Petal.Length  2.2235596 -0.94138258
     Petal.Width   2.8389936  2.86801283

  13. Reference: Elements of Statistical Learning, Chapter 12
     Figure 1: plot(irisfit)

  14. Reference: Elements of Statistical Learning, Chapter 12
     Figure 2: plot(marsfit)

  15. Reference: Elements of Statistical Learning, Chapter 12
     Figure 3: plot(marsfit1)

  16. Reference: Elements of Statistical Learning, Chapter 12
     Figure 4: plot(marsfit2)

  17. Reference: Elements of Statistical Learning, Chapter 12
     Penalized discriminant analysis. Quadratic penalty on the coefficients:
     $$ASR\left(\{\theta_l, \beta_l\}_{l=1}^L\right) = \frac{1}{N} \sum_{l=1}^L \left[ \sum_{i=1}^N \left(\theta_l(g_i) - h^T(x_i)\beta_l\right)^2 + \lambda \beta_l^T \Omega \beta_l \right].$$
     The choice of $\Omega$ depends on the problem setting. Here $\eta_l(x) = h(x)^T \beta_l$.
     gen.ridge(x, y, weights, lambda=1, omega, df, ...)
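One possible way to run this with the mda package (not shown on the slides, and the passing of lambda through fda's "..." argument to gen.ridge is an assumption) is to hand gen.ridge to fda() as the regression method:

     library(mda)
     data(iris)
     # Penalized discriminant analysis: ridge-type penalty with lambda = 10 (arbitrary value)
     pdafit <- fda(Species ~ ., data = iris, method = gen.ridge, lambda = 10)
     confusion(pdafit, iris)   # training confusion matrix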

  18. Reference: Elements of Statistical Learning, Chapter 12
     Mixture discriminant analysis. A Gaussian mixture model for the kth class has density
     $$P(X \mid G = k) = \sum_{r=1}^{R_k} \pi_{kr} \, \phi(X; \mu_{kr}, \Sigma), \qquad (2)$$
     where $\sum_{r=1}^{R_k} \pi_{kr} = 1$ and $R_k$ is the number of mixture components (subclasses) in class $k$.
     Incorporating the class prior probabilities $\Pi_k$:
     $$P(G = k \mid X = x) = \frac{\sum_{r=1}^{R_k} \pi_{kr} \, \phi(x; \mu_{kr}, \Sigma) \, \Pi_k}{\sum_{l=1}^K \sum_{r=1}^{R_l} \pi_{lr} \, \phi(x; \mu_{lr}, \Sigma) \, \Pi_l}.$$
     mda(formula, data, subclasses, sub.df, tot.df, dimension, eps, iter, weights, method, keep.fitted, trace, ...)
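A toy one-dimensional illustration of the posterior formula above; all parameters below are made up, and a univariate normal density stands in for $\phi(\cdot; \mu, \Sigma)$.

     phi   <- function(x, mu, sd) dnorm(x, mean = mu, sd = sd)
     pi_kr <- list(k1 = c(0.4, 0.6), k2 = c(0.5, 0.5))   # mixing proportions per class
     mu_kr <- list(k1 = c(-2, 0),    k2 = c(1, 3))       # subclass means per class
     Pi_k  <- c(k1 = 0.5, k2 = 0.5)                      # class priors
     sigma <- 1                                          # common (scalar) covariance

     x <- 0.5
     num <- sapply(names(Pi_k), function(k)
       Pi_k[[k]] * sum(pi_kr[[k]] * phi(x, mu_kr[[k]], sigma)))
     num / sum(num)   # P(G = k | X = x) for each class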

  19. Reference: Elements of Statistical Learning, Chapter 12

  20. Reference: Elements of Statistical Learning, Chapter 12
     data(iris)
     irisfit <- mda(Species ~ ., data = iris)
     mfit <- mda(Species ~ ., data = iris, subclasses = 2)
     > coef(mfit)
                        [,1]        [,2]      [,3]      [,4]
     Intercept     6.8563935 -15.1565801 -1.454555 -2.535648
     Sepal.Length  0.5545477   1.3506122  1.016966  2.945456
     Sepal.Width   1.5867703   2.4658435 -1.345301 -2.562105
     Petal.Length -3.2435199   0.3621319  1.341652 -2.921295
     Petal.Width  -2.3003933  -1.3635028 -4.516518  3.448416
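As with the fda fits, the mfit object from the code above can be checked on the training data; the "posterior" type argument for predict is an assumption about predict.mda, not something shown on the slides.

     confusion(predict(mfit, iris), iris$Species)            # training confusion matrix for mfit
     posteriors <- predict(mfit, iris, type = "posterior")   # assumed predict type
     head(posteriors)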

  21. Reference: Elements of Statistical Learning, Chapter 12
     Figure 5: plot(mfit)

  22. Questions?
