

SLIDE 1

Learning From Data Lecture 23 SVM’s: Maximizing the Margin

  • A Better Hyperplane
  • Maximizing the Margin
  • Link to Regularization

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

recap: Linear Models, RBFs, Neural Networks

Linear Model with Nonlinear Transform:
  h(x) = θ( w0 + Σ_{j=1..d̃} w_j Φ_j(x) )

Neural Network:
  h(x) = θ( w0 + Σ_{j=1..m} w_j θ(v_j^t x) )        (fit by gradient descent)

k-RBF-Network:
  h(x) = θ( w0 + Σ_{j=1..k} w_j φ(||x − µ_j||) )     (centers µ_j from k-means)

Neural Network: a generalization of the linear model obtained by adding layers.
Support Vector Machine: a more ‘robust’ linear model.
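For concreteness, here is a small NumPy sketch of the three hypothesis forms above (my own illustration, not from the lecture); taking θ = tanh and a Gaussian bump for φ is an assumption made just for this sketch.

import numpy as np

theta = np.tanh                                    # soft threshold (illustrative choice)

def linear_model(x, w0, w, Phi):
    # h(x) = theta(w0 + sum_j w_j * Phi_j(x)); Phi maps x to the nonlinear features
    return theta(w0 + w @ Phi(x))

def neural_net(x, w0, w, V):
    # one hidden layer: h(x) = theta(w0 + sum_j w_j * theta(v_j^t x)); V has rows v_j
    return theta(w0 + w @ theta(V @ x))

def rbf_network(x, w0, w, centers, r=1.0):
    # h(x) = theta(w0 + sum_j w_j * phi(||x - mu_j||)); phi = Gaussian bump of scale r
    phi = np.exp(-0.5 * (np.linalg.norm(x - centers, axis=1) / r) ** 2)
    return theta(w0 + w @ phi)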


Which separator to pick? − →

SLIDE 3

Which Separator Do You Pick?



Robustness to noise − →

SLIDE 4

Robustness to Noisy Data

Being robust to noise (measurement error) is good (remember regularization).


Thicker cushion means more robust − →

SLIDE 5

Thicker Cushion Means More Robustness

We call such hyperplanes fat.


Two crucial questions − →

SLIDE 6

Two Crucial Questions

1. Can we efficiently find the fattest separating hyperplane?
2. Is a fatter hyperplane better than a thin one?


Pulling out the bias − →

SLIDE 7

Pulling Out the Bias

Before:  x ∈ {1} × R^d,  w ∈ R^(d+1)

  x = (1, x_1, ..., x_d)^t,   w = (w_0, w_1, ..., w_d)^t,   signal = w^t x.

Now:  x ∈ R^d,  b ∈ R,  w ∈ R^d

  x = (x_1, ..., x_d)^t,   w = (w_1, ..., w_d)^t,   bias b,   signal = w^t x + b.


Separating the data − →

SLIDE 8

Separating The Data

w^t x_n + b > 0 on one side of the hyperplane;  w^t x_n + b < 0 on the other.

Hyperplane h = (b, w).

h separates the data means:  y_n(w^t x_n + b) > 0 for all n.

By rescaling the weights and bias, we can take  min_{n=1,...,N} y_n(w^t x_n + b) = 1.

(Renormalize the weights so that the size of the signal w^t x + b is meaningful.)
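As a concrete illustration (mine, not the lecture's), this renormalization divides b and w by min_n y_n(w^t x_n + b), which is positive for any separating hyperplane, so that the minimum becomes exactly 1:

import numpy as np

def rescale_separator(b, w, X, y):
    # X: N x d data matrix, y: length-N vector of +/-1 labels
    rho = np.min(y * (X @ w + b))       # min_n y_n (w^t x_n + b); > 0 if (b, w) separates
    assert rho > 0, "(b, w) does not separate the data"
    return b / rho, w / rho             # now min_n y_n (w^t x_n + b) = 1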


Distance to the hyperplane − →

SLIDE 9

Distance to the Hyperplane

Let x_1, x_2 be any two points on the hyperplane h, and let x be the point whose distance to h we want.

w is normal to the hyperplane:
  w^t(x_2 − x_1) = w^t x_2 − w^t x_1 = −b + b = 0    (because w^t x = −b on the hyperplane)

Unit normal: u = w / ||w||.

  dist(x, h) = |u^t(x − x_1)| = (1/||w||) · |w^t x − w^t x_1| = (1/||w||) · |w^t x + b|.


Fatness of a separating hyperplane − →

SLIDE 10

Fatness of a Separating Hyperplane

dist(x, h) = (1/||w||) · |w^t x + b|

Fatness = distance to the closest data point.

Since h separates the data, |w^t x_n + b| = |y_n(w^t x_n + b)| = y_n(w^t x_n + b), so

  dist(x_n, h) = (1/||w||) · y_n(w^t x_n + b).

Fatness = min_n dist(x_n, h)
        = (1/||w||) · min_n y_n(w^t x_n + b)      ←− the separation condition sets this min to 1
        = 1/||w||                                 ←− the margin γ(h)
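These two formulas translate directly into code; a minimal NumPy sketch (illustrative only):

import numpy as np

def dist_to_hyperplane(x, b, w):
    # dist(x, h) = |w^t x + b| / ||w||
    return abs(w @ x + b) / np.linalg.norm(w)

def margin(b, w, X, y):
    # gamma(h) = min_n y_n (w^t x_n + b) / ||w||; equals 1/||w|| under the
    # normalization min_n y_n (w^t x_n + b) = 1 from the previous slide
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)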


Maximizing the margin − →

SLIDE 11

Maximizing the Margin

margin γ(h) = 1/||w||        ←− the bias b does not appear here

Maximizing the margin is therefore the optimization problem

  minimize_{b,w}   (1/2) w^t w
  subject to:      min_{n=1,...,N} y_n(w^t x_n + b) = 1.


Equivalent form − →

SLIDE 12

Maximizing the Margin

margin γ(h) = 1/||w||

The constraint  min_{n=1,...,N} y_n(w^t x_n + b) = 1  can be replaced by N separate inequalities
without changing the optimum:

  minimize_{b,w}   (1/2) w^t w
  subject to:      y_n(w^t x_n + b) ≥ 1    for n = 1, . . . , N.

(If every inequality held with slack, so that min_n y_n(w^t x_n + b) > 1, then b and w could be
scaled down, reducing w^t w; so at the optimum the minimum is attained with value exactly 1.)


Example – our toy data set − →

SLIDE 13

Example – Our Toy Data Set

The constraints y_n(w^t x_n + b) ≥ 1 for the toy data

  X = [ 0 0          y = [ −1
        2 2                −1
        2 0                +1
        3 0 ]              +1 ]

read

  −b ≥ 1                        (i)
  −(2w_1 + 2w_2 + b) ≥ 1        (ii)
  2w_1 + b ≥ 1                  (iii)
  3w_1 + b ≥ 1                  (iv)

(i) and (iii) give  w_1 ≥ 1;   (ii) and (iii) give  w_2 ≤ −1.
So (1/2)(w_1^2 + w_2^2) ≥ 1, with equality at (b, w_1, w_2) = (−1, 1, −1).

Optimal hyperplane:  g(x) = sign(x_1 − x_2 − 1)
margin:  1/||w*|| = 1/√2 ≈ 0.707.

For data points (i), (ii) and (iii),  y_n(w*^t x_n + b*) = 1  — these are the support vectors.

[Figure: the toy data with the optimal hyperplane x_1 − x_2 − 1 = 0 and margin 0.707.]
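A quick numerical check of this solution (my own sketch, not part of the slides):

import numpy as np

X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., +1., +1.])
b_star, w_star = -1.0, np.array([1.0, -1.0])

signals = y * (X @ w_star + b_star)       # y_n (w*^t x_n + b*) for each point
print(signals)                            # [1. 1. 1. 2.] -> points (i)-(iii) are support vectors
print(1 / np.linalg.norm(w_star))         # margin = 1/sqrt(2) ~ 0.707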


Quadratic programming − →

SLIDE 14

Quadratic Programming

Quadratic programming solves problems of the form

  minimize_{u ∈ R^q}   (1/2) u^t Q u + p^t u
  subject to:          A u ≥ c

  u* ← QP(Q, p, A, c)

(Q = 0 gives linear programming.)
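The lecture treats QP(Q, p, A, c) as a black box. As one possible stand-in, here is a hedged sketch built on SciPy's general-purpose SLSQP optimizer; the function name QP and the choice of solver are assumptions for illustration, and a dedicated QP package would normally be preferred.

import numpy as np
from scipy.optimize import minimize

def QP(Q, p, A, c):
    # Solve: minimize (1/2) u^t Q u + p^t u   subject to   A u >= c
    Q, p, A, c = (np.asarray(M, dtype=float) for M in (Q, p, A, c))
    obj  = lambda u: 0.5 * u @ Q @ u + p @ u
    grad = lambda u: Q @ u + p                             # gradient (Q assumed symmetric)
    cons = {'type': 'ineq', 'fun': lambda u: A @ u - c}    # 'ineq' means fun(u) >= 0
    res = minimize(obj, x0=np.zeros(len(p)), jac=grad,
                   constraints=[cons], method='SLSQP')
    return res.x

# usage: u_star = QP(Q, p, A, c)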


Maximum margin hyperplane is QP − →

SLIDE 15

Maximum Margin Hyperplane is QP

The maximum-margin problem

  minimize_{b,w}   (1/2) w^t w
  subject to:      y_n(w^t x_n + b) ≥ 1    for n = 1, . . . , N

matches the QP template

  minimize_{u ∈ R^q}   (1/2) u^t Q u + p^t u
  subject to:          A u ≥ c

with  u = [ b ; w ] ∈ R^(d+1).

Objective:

  (1/2) w^t w = (1/2) [ b  w^t ] [ 0    0_d^t ] [ b ]  =  (1/2) u^t Q u
                                 [ 0_d  I_d   ] [ w ]

  ⇒  Q = [ 0    0_d^t ]      p = 0_{d+1}
         [ 0_d  I_d   ]

Constraints:

  y_n(w^t x_n + b) ≥ 1   ≡   [ y_n   y_n x_n^t ] u ≥ 1

  ⇒  A = [ y_1   y_1 x_1^t ]      c = [ 1 ]
         [ ...   ...       ]          [...]
         [ y_N   y_N x_N^t ]          [ 1 ]


Back to our example − →

SLIDE 16

Back To Our Example

Exercise:

For the toy data set

  X = [ 0 0          y = [ −1
        2 2                −1
        2 0                +1
        3 0 ]              +1 ]

the constraints y_n(w^t x_n + b) ≥ 1 read

  −b ≥ 1                        (i)
  −(2w_1 + 2w_2 + b) ≥ 1        (ii)
  2w_1 + b ≥ 1                  (iii)
  3w_1 + b ≥ 1                  (iv)

Show that

  Q = [ 0 0 0 ]      p = [ 0 ]      A = [ −1   0   0 ]      c = [ 1 ]
      [ 0 1 0 ]          [ 0 ]          [ −1  −2  −2 ]          [ 1 ]
      [ 0 0 1 ]          [ 0 ]          [  1   2   0 ]          [ 1 ]
                                        [  1   3   0 ]          [ 1 ]

Use your QP-solver to obtain  (b*, w_1*, w_2*) = (−1, 1, −1).
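For reference, a self-contained numerical solution of the exercise using the same SciPy-based stand-in for the QP-solver (the expected output is the (b*, w_1*, w_2*) quoted above, up to solver tolerance):

import numpy as np
from scipy.optimize import minimize

# u = (b, w1, w2)
Q = np.diag([0., 1., 1.])
p = np.zeros(3)
A = np.array([[-1.,  0.,  0.],    # (i)
              [-1., -2., -2.],    # (ii)
              [ 1.,  2.,  0.],    # (iii)
              [ 1.,  3.,  0.]])   # (iv)
c = np.ones(4)

res = minimize(lambda u: 0.5 * u @ Q @ u + p @ u, x0=np.zeros(3), method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': lambda u: A @ u - c}])
print(np.round(res.x, 3))         # expected (approximately): [-1.  1. -1.]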


Primal QP algorithm − →

SLIDE 17

Primal QP algorithm for linear-SVM

1: Let p = 0_{d+1} be the (d + 1)-vector of zeros and c = 1_N the N-vector of ones.
   Construct the matrices Q and A, where

     Q = [ 0    0_d^t ]          A = [ y_1  —y_1 x_1^t— ]
         [ 0_d  I_d   ]              [ ...       ...    ]    ←− the ‘signed data matrix’
                                     [ y_N  —y_N x_N^t— ]

2: Return  [ b* ; w* ] = u* ← QP(Q, p, A, c).

3: The final hypothesis is  g(x) = sign(w*^t x + b*).
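Putting the three steps together, here is a sketch of the primal linear-SVM algorithm in NumPy/SciPy. The QP step again uses the general-purpose SLSQP stand-in rather than a dedicated QP solver, and the function names are mine, not the course's.

import numpy as np
from scipy.optimize import minimize

def svm_linear_primal(X, y):
    # Hard-margin linear SVM (primal QP). X: N x d separable data, y: labels in {-1, +1}.
    N, d = X.shape
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)                                   # step 1: build Q, p, A, c
    p = np.zeros(d + 1)
    A = np.hstack([y[:, None], y[:, None] * X])             # the 'signed data matrix'
    c = np.ones(N)
    obj  = lambda u: 0.5 * u @ Q @ u + p @ u
    cons = {'type': 'ineq', 'fun': lambda u: A @ u - c}     # A u >= c
    u = minimize(obj, np.zeros(d + 1), jac=lambda u: Q @ u + p,
                 constraints=[cons], method='SLSQP').x      # step 2: u* <- QP(Q, p, A, c)
    return u[0], u[1:]                                      # b*, w*

def g(x, b, w):
    return np.sign(w @ x + b)                               # step 3: final hypothesis

# usage on the toy data set from the earlier slides
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
b_star, w_star = svm_linear_primal(X, y)
print(round(b_star, 3), np.round(w_star, 3))                # approximately -1.0 [ 1. -1.]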


Example: SVM vs PLA − →

SLIDE 18

Example: SVM vs PLA

[Figure: distribution of E_out for PLA over different orderings of the data, compared with E_out(SVM); E_out scale 0.02–0.08.]

PLA depends on the ordering of the data (e.g. a random ordering).
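A rough recreation of this kind of comparison (my own setup and parameter choices, not the experiment behind the figure): run PLA on a random ordering of separable data, fit an (approximately) hard-margin linear SVM with scikit-learn's SVC using a large C, and estimate E_out on a large held-out sample.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_separable_data(N, d=2):
    # random linear target; resample until both classes appear
    w_f, b_f = rng.standard_normal(d), rng.standard_normal()
    while True:
        X = rng.uniform(-1, 1, size=(N, d))
        y = np.sign(X @ w_f + b_f)
        if 0 < np.sum(y == 1) < N:
            return X, y, (w_f, b_f)

def pla(X, y, max_passes=1000):
    # perceptron learning algorithm; the update order is a fresh random permutation each pass
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_passes):
        mistakes = 0
        for n in rng.permutation(N):
            if y[n] * (X[n] @ w + b) <= 0:
                w, b = w + y[n] * X[n], b + y[n]
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

X, y, (w_f, b_f) = make_separable_data(40)
X_test = rng.uniform(-1, 1, size=(100000, 2))
y_test = np.sign(X_test @ w_f + b_f)

w_p, b_p = pla(X, y)
E_pla = np.mean(np.sign(X_test @ w_p + b_p) != y_test)

svm = SVC(kernel='linear', C=1e6).fit(X, y)      # large C approximates the hard margin
E_svm = np.mean(svm.predict(X_test) != y_test)

print(f"E_out(PLA) ~ {E_pla:.3f}    E_out(SVM) ~ {E_svm:.3f}")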


Link to regularization

SLIDE 19

Link to Regularization

Regularization:

  minimize_w   E_in(w)
  subject to:  w^t w ≤ C.

                      optimal hyperplane        regularization
  minimize            w^t w                     E_in
  subject to          E_in = 0                  w^t w ≤ C
