Learning From Data Lecture 23 SVMs: Maximizing the Margin


  1. Learning From Data Lecture 23, SVMs: Maximizing the Margin. Topics: A Better Hyperplane; Maximizing the Margin; Link to Regularization. M. Magdon-Ismail, CSCI 4100/6100.

  2. recap: Linear Models, RBFs, Neural Networks.

Linear model with nonlinear transform: $h(\mathbf{x}) = \theta\big(w_0 + \sum_{j=1}^{\tilde d} w_j \Phi_j(\mathbf{x})\big)$.

Neural network: $h(\mathbf{x}) = \theta\big(w_0 + \sum_{j=1}^{m} w_j\,\theta(\mathbf{v}_j^{\mathrm t}\mathbf{x})\big)$, fit by gradient descent.

k-RBF-network: $h(\mathbf{x}) = \theta\big(w_0 + \sum_{j=1}^{k} w_j\,\phi(\lVert\mathbf{x} - \boldsymbol\mu_j\rVert)\big)$, centers fit by k-means.

Neural network: a generalization of the linear model obtained by adding layers. Support vector machine: a more 'robust' linear model.
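
For concreteness, here is a minimal numpy sketch (not from the lecture) of the three hypothesis forms, taking the output nonlinearity $\theta$ to be sign and assuming the weights, hidden-layer matrix, and RBF centers are already given:

```python
import numpy as np

theta = np.sign  # hard-threshold output nonlinearity

def linear_transform_h(x, w0, w, Phi):
    # h(x) = theta( w0 + sum_j w_j Phi_j(x) ); Phi returns the d~-dimensional feature vector
    return theta(w0 + w @ Phi(x))

def neural_net_h(x, w0, w, V):
    # h(x) = theta( w0 + sum_j w_j theta(v_j^t x) ); row j of V is v_j (fit by gradient descent)
    return theta(w0 + w @ theta(V @ x))

def rbf_net_h(x, w0, w, mu, phi=lambda r: np.exp(-r**2)):
    # h(x) = theta( w0 + sum_j w_j phi(||x - mu_j||) ); row j of mu is center mu_j (fit by k-means)
    return theta(w0 + w @ phi(np.linalg.norm(mu - x, axis=1)))
```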

  3. Which Separator Do You Pick? Being robust to noise (measurement error) is good (remember regularization).

  4. Robustness to Noisy Data. Being robust to noise (measurement error) is good (remember regularization).

  5. Thicker Cushion Means More Robustness. We call such hyperplanes fat.

  6. Two Crucial Questions. 1. Can we efficiently find the fattest separating hyperplane? 2. Is a fatter hyperplane better than a thin one?

  7. Pulling Out the Bias.

Before: $\mathbf{x} \in \{1\} \times \mathbb{R}^d$, $\mathbf{w} \in \mathbb{R}^{d+1}$, with $\mathbf{x} = (1, x_1, \ldots, x_d)^{\mathrm t}$ and $\mathbf{w} = (w_0, w_1, \ldots, w_d)^{\mathrm t}$; the leading component $w_0$ plays the role of the bias. Signal $= \mathbf{w}^{\mathrm t}\mathbf{x}$.

Now: $\mathbf{x} \in \mathbb{R}^d$, $b \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^d$, with $\mathbf{x} = (x_1, \ldots, x_d)^{\mathrm t}$ and $\mathbf{w} = (w_1, \ldots, w_d)^{\mathrm t}$; the bias $b$ is kept separate. Signal $= \mathbf{w}^{\mathrm t}\mathbf{x} + b$.
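
A tiny sanity-check sketch (my own illustration, with arbitrary random weights) showing that the two conventions compute the same signal:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)        # a point in R^d, d = 3
w_aug = rng.standard_normal(4)    # old convention: w in R^{d+1}, w_aug[0] is the bias w0

signal_before = w_aug @ np.concatenate(([1.0], x))  # x augmented with a leading 1
b, w = w_aug[0], w_aug[1:]                          # new convention: bias pulled out
signal_now = w @ x + b

assert np.isclose(signal_before, signal_now)  # same signal, different bookkeeping
```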

  8. Separating the Data. A hyperplane $h = (b, \mathbf{w})$ separates the data means $y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) > 0$ for all $n$; that is, $\mathbf{w}^{\mathrm t}\mathbf{x}_n + b > 0$ when $y_n = +1$ and $\mathbf{w}^{\mathrm t}\mathbf{x}_n + b < 0$ when $y_n = -1$. By rescaling the weights and bias, we may take $\min_{n=1,\ldots,N} y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) = 1$ (renormalize the weights so that the size of the signal $\mathbf{w}^{\mathrm t}\mathbf{x} + b$ is meaningful).
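
A minimal sketch of this renormalization (the function name and interface are my own, not the lecture's):

```python
import numpy as np

def renormalize(b, w, X, y):
    # Rescale (b, w) so that min_n y_n (w^t x_n + b) = 1. Dividing by a positive
    # constant leaves the hyperplane itself unchanged, so this is pure normalization.
    rho = np.min(y * (X @ w + b))
    assert rho > 0, "(b, w) must separate the data first"
    return b / rho, w / rho
```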

  9. Distance to the Hyperplane. $\mathbf{w}$ is normal to the hyperplane: for any $\mathbf{x}_1, \mathbf{x}_2$ on the hyperplane, $\mathbf{w}^{\mathrm t}(\mathbf{x}_2 - \mathbf{x}_1) = \mathbf{w}^{\mathrm t}\mathbf{x}_2 - \mathbf{w}^{\mathrm t}\mathbf{x}_1 = -b + b = 0$ (because $\mathbf{w}^{\mathrm t}\mathbf{x} = -b$ on the hyperplane). With unit normal $\mathbf{u} = \mathbf{w}/\lVert\mathbf{w}\rVert$,

$\mathrm{dist}(\mathbf{x}, h) = |\mathbf{u}^{\mathrm t}(\mathbf{x} - \mathbf{x}_1)| = \frac{1}{\lVert\mathbf{w}\rVert}\,|\mathbf{w}^{\mathrm t}\mathbf{x} - \mathbf{w}^{\mathrm t}\mathbf{x}_1| = \frac{1}{\lVert\mathbf{w}\rVert}\,|\mathbf{w}^{\mathrm t}\mathbf{x} + b|.$
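
The resulting formula in one line of numpy (my own sketch):

```python
import numpy as np

def dist_to_hyperplane(x, b, w):
    # dist(x, h) = |w^t x + b| / ||w||: project x - x1 onto the unit normal u = w / ||w||
    return abs(w @ x + b) / np.linalg.norm(w)
```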

  10. Fatness of a Separating Hyperplane. Since $|\mathbf{w}^{\mathrm t}\mathbf{x}_n + b| = |y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b)| = y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b)$ for a separating $h$,

$\mathrm{dist}(\mathbf{x}_n, h) = \frac{1}{\lVert\mathbf{w}\rVert}\, y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b).$

Fatness = distance to the closest point:

$\text{fatness} = \min_n \mathrm{dist}(\mathbf{x}_n, h) = \frac{1}{\lVert\mathbf{w}\rVert}\,\min_n y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) = \frac{1}{\lVert\mathbf{w}\rVert},$

where the last step uses the separation condition $\min_n y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) = 1$. This quantity is the margin $\gamma(h)$.
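
The margin computation as a short sketch (again my own helper, assuming a separating `(b, w)`):

```python
import numpy as np

def margin(b, w, X, y):
    # For a separating h, dist(x_n, h) = y_n (w^t x_n + b) / ||w||, so the fatness
    # is the minimum over n. After renormalizing so min_n y_n (w^t x_n + b) = 1,
    # this reduces to gamma(h) = 1 / ||w||.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)
```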

  11. Maximizing the Margin. The margin is $\gamma(h) = 1/\lVert\mathbf{w}\rVert$ (the bias $b$ does not appear here). Maximizing $\gamma(h)$ is therefore equivalent to:

minimize over $b, \mathbf{w}$: $\tfrac{1}{2}\mathbf{w}^{\mathrm t}\mathbf{w}$, subject to $\min_{n=1,\ldots,N} y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) = 1$.

  12. Maximizing the Margin: Equivalent Form. The min-constraint can be relaxed to per-point inequalities:

minimize over $b, \mathbf{w}$: $\tfrac{1}{2}\mathbf{w}^{\mathrm t}\mathbf{w}$, subject to $y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) \ge 1$ for $n = 1, \ldots, N$.

(The two programs have the same solution: if every inequality were strict at the optimum, $(b, \mathbf{w})$ could be scaled down, decreasing $\mathbf{w}^{\mathrm t}\mathbf{w}$ while keeping all constraints satisfied, a contradiction. So the closest points attain equality.)
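
Before developing the QP machinery of the coming slides, this inequality-constrained program can be handed directly to a general-purpose solver. A minimal sketch using scipy's SLSQP on the toy data set introduced on the next slide (the variable packing `u = [b, w]` is my own choice):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data set from the next slide.
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])

# u = [b, w1, ..., wd]; minimize (1/2) w^t w subject to y_n (w^t x_n + b) >= 1.
res = minimize(lambda u: 0.5 * u[1:] @ u[1:],
               x0=np.zeros(X.shape[1] + 1),
               constraints=[{"type": "ineq",  # scipy convention: fun(u) >= 0
                             "fun": lambda u: y * (X @ u[1:] + u[0]) - 1.0}],
               method="SLSQP")
b_star, w_star = res.x[0], res.x[1:]
print(b_star, w_star)  # should approach (-1, [1, -1]), as derived on the next slide
```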

  13. Example: Our Toy Data Set.

$X = \begin{pmatrix} 0 & 0 \\ 2 & 2 \\ 2 & 0 \\ 3 & 0 \end{pmatrix}, \qquad y = \begin{pmatrix} -1 \\ -1 \\ +1 \\ +1 \end{pmatrix}.$

The constraints $y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) \ge 1$ read:
(i) $-b \ge 1$
(ii) $-(2w_1 + 2w_2 + b) \ge 1$
(iii) $2w_1 + b \ge 1$
(iv) $3w_1 + b \ge 1$

(i) and (iii) give $w_1 \ge 1$; (ii) and (iii) give $w_2 \le -1$. So $\tfrac{1}{2}(w_1^2 + w_2^2) \ge 1$, with equality attained at $b = -1$, $w_1 = 1$, $w_2 = -1$.

Optimal hyperplane: $g(\mathbf{x}) = \mathrm{sign}(x_1 - x_2 - 1)$, with margin $1/\lVert\mathbf{w}^*\rVert = 1/\sqrt{2} \approx 0.707$. For data points (i), (ii), and (iii), $y_n(\mathbf{w}^{*\mathrm t}\mathbf{x}_n + b^*) = 1$: these are the support vectors.
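
A quick numerical check of the hand-derived solution (my own verification, not from the slides):

```python
import numpy as np

X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
b_star, w_star = -1.0, np.array([1.0, -1.0])

print(y * (X @ w_star + b_star))     # [1. 1. 1. 2.]: (i), (ii), (iii) are tight -> support vectors
print(1.0 / np.linalg.norm(w_star))  # margin 1/sqrt(2) ~ 0.707
```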

  14. Quadratic Programming. The generic QP:

minimize over $\mathbf{u} \in \mathbb{R}^q$: $\tfrac{1}{2}\mathbf{u}^{\mathrm t} Q \mathbf{u} + \mathbf{p}^{\mathrm t}\mathbf{u}$, subject to $A\mathbf{u} \ge \mathbf{c}$.

We write the solution as $\mathbf{u}^* \leftarrow \mathrm{QP}(Q, \mathbf{p}, A, \mathbf{c})$. ($Q = 0$ is linear programming.)
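
The lecture treats $\mathrm{QP}(Q, \mathbf{p}, A, \mathbf{c})$ as a black box. As a hedged stand-in, here is a sketch built on scipy's general-purpose SLSQP solver; a dedicated QP package would be the standard choice in practice, and the function name `QP` simply mirrors the slide's notation:

```python
import numpy as np
from scipy.optimize import minimize

def QP(Q, p, A, c):
    # Minimize (1/2) u^t Q u + p^t u subject to A u >= c.
    # Not a true QP method, but adequate for small problems like those here.
    res = minimize(fun=lambda u: 0.5 * u @ Q @ u + p @ u,
                   jac=lambda u: Q @ u + p,
                   x0=np.zeros(len(p)),
                   constraints=[{"type": "ineq", "fun": lambda u: A @ u - c}],
                   method="SLSQP")
    return res.x
```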

  15. Maximum Margin Hyperplane is a QP. Stack the variables as $\mathbf{u} = \begin{pmatrix} b \\ \mathbf{w} \end{pmatrix} \in \mathbb{R}^{d+1}$. Then

$\tfrac{1}{2}\mathbf{w}^{\mathrm t}\mathbf{w} = \tfrac{1}{2}\mathbf{u}^{\mathrm t}\begin{pmatrix} 0 & \mathbf{0}_d^{\mathrm t} \\ \mathbf{0}_d & I_d \end{pmatrix}\mathbf{u} \;\Longrightarrow\; Q = \begin{pmatrix} 0 & \mathbf{0}_d^{\mathrm t} \\ \mathbf{0}_d & I_d \end{pmatrix}, \quad \mathbf{p} = \mathbf{0}_{d+1},$

and each constraint $y_n(\mathbf{w}^{\mathrm t}\mathbf{x}_n + b) \ge 1$ is $y_n\,(1 \;\; \mathbf{x}_n^{\mathrm t})\,\mathbf{u} \ge 1$, so

$A = \begin{pmatrix} y_1 & y_1\mathbf{x}_1^{\mathrm t} \\ \vdots & \vdots \\ y_N & y_N\mathbf{x}_N^{\mathrm t} \end{pmatrix}, \qquad \mathbf{c} = \mathbf{1}_N = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$
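
Assembling these matrices from the data in numpy (a sketch; the helper name is mine):

```python
import numpy as np

def svm_qp_data(X, y):
    # Assemble Q, p, A, c for u = [b; w]: Q = diag(0, I_d), p = 0_{d+1},
    # row n of A is y_n [1, x_n^t], and c = 1_N.
    N, d = X.shape
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)
    p = np.zeros(d + 1)
    A = y[:, None] * np.hstack([np.ones((N, 1)), X])
    c = np.ones(N)
    return Q, p, A, c
```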

  16. Back to Our Example. Exercise: with the toy data above ($X$ and $y$ as on slide 13), show that

$Q = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \mathbf{p} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad A = \begin{pmatrix} -1 & 0 & 0 \\ -1 & -2 & -2 \\ 1 & 2 & 0 \\ 1 & 3 & 0 \end{pmatrix}, \quad \mathbf{c} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}.$

Use your QP-solver to obtain $(b^*, w_1^*, w_2^*) = (-1, 1, -1)$.
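
Solving the exercise numerically with the same SLSQP stand-in as before (self-contained; "your QP-solver" could equally be any real QP package):

```python
import numpy as np
from scipy.optimize import minimize

Q = np.diag([0., 1., 1.])
p = np.zeros(3)
A = np.array([[-1.,  0.,  0.],
              [-1., -2., -2.],
              [ 1.,  2.,  0.],
              [ 1.,  3.,  0.]])
c = np.ones(4)

u = minimize(lambda u: 0.5 * u @ Q @ u + p @ u, np.zeros(3),
             constraints=[{"type": "ineq", "fun": lambda u: A @ u - c}],
             method="SLSQP").x
print(np.round(u, 6))  # expect (b*, w1*, w2*) = (-1, 1, -1)
```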

  17. Primal QP Algorithm for Linear-SVM.
1: Let $\mathbf{p} = \mathbf{0}_{d+1}$ be the $(d+1)$-vector of zeros and $\mathbf{c} = \mathbf{1}_N$ the $N$-vector of ones. Construct matrices $Q$ and $A$, where

$Q = \begin{pmatrix} 0 & \mathbf{0}_d^{\mathrm t} \\ \mathbf{0}_d & I_d \end{pmatrix}, \qquad A = \begin{pmatrix} y_1 & y_1\mathbf{x}_1^{\mathrm t} \\ \vdots & \vdots \\ y_N & y_N\mathbf{x}_N^{\mathrm t} \end{pmatrix} \leftarrow \text{the signed data matrix}.$

2: $\begin{pmatrix} b^* \\ \mathbf{w}^* \end{pmatrix} = \mathbf{u}^* \leftarrow \mathrm{QP}(Q, \mathbf{p}, A, \mathbf{c})$.
3: The final hypothesis is $g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{*\mathrm t}\mathbf{x} + b^*)$.
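
Pulling the pieces together, an end-to-end sketch of the three steps, with scipy's SLSQP again standing in for the QP black box (function name and usage are my own):

```python
import numpy as np
from scipy.optimize import minimize

def svm_primal_train(X, y):
    # Step 1: construct p, c, Q and the signed data matrix A.
    N, d = X.shape
    p, c = np.zeros(d + 1), np.ones(N)
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)
    A = y[:, None] * np.hstack([np.ones((N, 1)), X])
    # Step 2: u* <- QP(Q, p, A, c), here via a general-purpose solver.
    u = minimize(lambda u: 0.5 * u @ Q @ u + p @ u, np.zeros(d + 1),
                 constraints=[{"type": "ineq", "fun": lambda u: A @ u - c}],
                 method="SLSQP").x
    b, w = u[0], u[1:]
    # Step 3: the final hypothesis g(x) = sign(w*^t x + b*).
    return lambda x: np.sign(w @ x + b)

# Usage on the toy data set:
g = svm_primal_train(np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]]),
                     np.array([-1., -1., 1., 1.]))
print(g(np.array([2.5, 0.])))  # +1: this point lies on the positive side of x1 - x2 - 1 = 0
```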

  18. Example: SVM vs PLA. [Figure: comparison of $E_{\mathrm{out}}$(SVM) against $E_{\mathrm{out}}$(PLA); the $E_{\mathrm{out}}$ axis runs from 0 to 0.08.] PLA's outcome depends on the ordering of the data (e.g., a random ordering), whereas the maximum-margin hyperplane is unique.

  19. Link to Regularization.

Regularization: minimize $E_{\mathrm{in}}(\mathbf{w})$ subject to $\mathbf{w}^{\mathrm t}\mathbf{w} \le C$.
Optimal hyperplane: minimize $\mathbf{w}^{\mathrm t}\mathbf{w}$ subject to $E_{\mathrm{in}} = 0$.

The two problems swap objective and constraint: regularization caps $\mathbf{w}^{\mathrm t}\mathbf{w}$ and minimizes the error, while the optimal hyperplane insists on zero error and minimizes $\mathbf{w}^{\mathrm t}\mathbf{w}$.
