Learning From Data, Lecture 25: The Kernel Trick


  1. Learning From Data, Lecture 25: The Kernel Trick
     Learning with only inner products. The Kernel.
     M. Magdon-Ismail, CSCI 4100/6100

  2. Recap: Large Margin is Better; Controlling Overfitting; Non-Separable Data

     Soft-margin SVM:
     \[
     \begin{aligned}
     \underset{b,\mathbf{w},\boldsymbol{\xi}}{\text{minimize}}\quad & \tfrac12\mathbf{w}^t\mathbf{w} + C\sum_{n=1}^{N}\xi_n\\
     \text{subject to}\quad & y_n(\mathbf{w}^t\mathbf{x}_n + b) \ge 1 - \xi_n,\ \ \xi_n \ge 0,\quad n = 1,\dots,N
     \end{aligned}
     \]

     [Figure: \(E_{\text{out}}\) (0.04 to 0.08) versus \(\gamma(\text{random hyperplane})/\gamma(\text{SVM})\) on \([0,1]\): larger margin gives lower \(E_{\text{out}}\), with the SVM best.]

     Theorem. \(d_{\mathrm{vc}}(\gamma) \le \lceil R^2/\gamma^2 \rceil + 1\).

     \[
     E_{\mathrm{cv}} \le \frac{\#\,\text{support vectors}}{N}
     \]

     [Figures: decision boundaries for \(\Phi_2\) + SVM, \(\Phi_3\) + SVM, and \(\Phi_3\) + pseudoinverse algorithm.]

     A complex hypothesis that does not overfit because it is 'simple': controlled by only a few support vectors.

  3. Recall: Mechanics of the Nonlinear Transform

     The X-space is \(\mathbb{R}^d\); the Z-space is \(\mathbb{R}^{\tilde d}\):
     \[
     \mathbf{x} = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix}
     \ \xrightarrow{\ \Phi\ }\
     \mathbf{z} = \Phi(\mathbf{x}) = \begin{bmatrix} 1 \\ \Phi_1(\mathbf{x}) \\ \vdots \\ \Phi_{\tilde d}(\mathbf{x}) \end{bmatrix}
     \]

     1. Original data: \(\mathbf{x}_1,\dots,\mathbf{x}_N\) with labels \(y_1,\dots,y_N\), \(\mathbf{x}_n \in \mathcal{X}\) (no weights yet; \(d_{\mathrm{vc}} = d + 1\) in X-space).
     2. Transform the data: \(\mathbf{z}_n = \Phi(\mathbf{x}_n) \in \mathcal{Z}\); the labels are unchanged.
     3. Separate the data in Z-space: \(\tilde g(\mathbf{z}) = \mathrm{sign}(\tilde{\mathbf{w}}^t\mathbf{z})\) with weights \(\tilde{\mathbf{w}} = (w_0, w_1, \dots, w_{\tilde d})^t\); \(d_{\mathrm{vc}} = \tilde d + 1\).
     4. Classify in X-space (via '\(\Phi^{-1}\)'): \(g(\mathbf{x}) = \tilde g(\Phi(\mathbf{x})) = \mathrm{sign}(\tilde{\mathbf{w}}^t\Phi(\mathbf{x}))\).

     We have to transform the data to the Z-space.
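
     As a concrete instance of steps 2-4, a minimal sketch with a second-order transform in \(\mathbb{R}^2\); the particular transform, data points, and weights below are illustrative choices of mine, not from the slides.

```python
import numpy as np

def phi(x):
    """A 2nd-order polynomial transform R^2 -> Z-space (illustrative)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

# step 2: transform the data (labels unchanged)
X = np.array([[0.5, -1.0], [1.5, 2.0]])
Z = np.array([phi(x) for x in X])

# step 3: suppose some algorithm produced weights w_tilde in Z-space
# (hypothetical weights: the circle x1^2 + x2^2 = 1 as separator)
w_tilde = np.array([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0])

# step 4: classify in X-space via g(x) = sign(w_tilde' phi(x))
g = lambda x: np.sign(w_tilde @ phi(x))
print([g(x) for x in X])
```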

  4. This Lecture

     How to use nonlinear transforms without physically transforming the data to Z-space.

  5. Primal Versus Dual

     Primal:
     \[
     \begin{aligned}
     \underset{b,\mathbf{w}}{\text{minimize}}\quad & \tfrac12\mathbf{w}^t\mathbf{w}\\
     \text{subject to}\quad & y_n(\mathbf{w}^t\mathbf{x}_n + b) \ge 1,\quad n = 1,\dots,N
     \end{aligned}
     \]
     Final hypothesis: \(g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^t\mathbf{x} + b)\); there are \(d + 1\) optimization variables \((b, \mathbf{w})\).

     Dual:
     \[
     \begin{aligned}
     \underset{\boldsymbol\alpha}{\text{minimize}}\quad & \tfrac12\sum_{n,m=1}^{N}\alpha_n\alpha_m y_n y_m(\mathbf{x}_n^t\mathbf{x}_m) - \sum_{n=1}^{N}\alpha_n\\
     \text{subject to}\quad & \sum_{n=1}^{N}\alpha_n y_n = 0,\quad \alpha_n \ge 0,\ n = 1,\dots,N
     \end{aligned}
     \]
     Recover the primal solution from the support vectors (the \(\mathbf{x}_s\) with \(\alpha^*_s > 0\)):
     \[
     \mathbf{w}^* = \sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n,\qquad b^* = y_s - \mathbf{w}^{*t}\mathbf{x}_s,
     \]
     so
     \[
     g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{*t}\mathbf{x} + b^*) = \mathrm{sign}\Big(\sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n^t(\mathbf{x} - \mathbf{x}_s) + y_s\Big).
     \]
     There are \(N\) optimization variables \(\boldsymbol\alpha\).

  6. Primal Versus Dual: Matrix-Vector Form

     The dual in matrix-vector form:
     \[
     \begin{aligned}
     \underset{\boldsymbol\alpha}{\text{minimize}}\quad & \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha \qquad (\mathrm{G}_{nm} = y_n y_m\mathbf{x}_n^t\mathbf{x}_m)\\
     \text{subject to}\quad & \mathbf{y}^t\boldsymbol\alpha = 0,\quad \boldsymbol\alpha \ge \mathbf{0}
     \end{aligned}
     \]
     The primal and the recovery of \(\mathbf{w}^*\), \(b^*\), and \(g(\mathbf{x})\) are as on the previous slide: \(d + 1\) primal variables versus \(N\) dual variables.
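
     In code, G is one line: the signed data matrix times its own transpose. A minimal numpy sketch (the function and variable names are mine):

```python
import numpy as np

def gram(X, y):
    """G[n, m] = y_n * y_m * (x_n . x_m); the rows of Xs are the
    signed data y_n * x_n^t, so G = Xs @ Xs.T."""
    Xs = y[:, None] * X
    return Xs @ Xs.T
```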

  7. Deriving the Dual: The Lagrangian

     \[
     \mathcal{L}(b, \mathbf{w}, \boldsymbol\alpha) = \tfrac12\mathbf{w}^t\mathbf{w} + \sum_{n=1}^{N}\alpha_n\big(1 - y_n(\mathbf{w}^t\mathbf{x}_n + b)\big)
     \]
     The \(\alpha_n\) are the Lagrange multipliers attached to the constraints. We minimize \(\mathcal{L}\) w.r.t. \((b, \mathbf{w})\), unconstrained, and maximize w.r.t. \(\boldsymbol\alpha \ge \mathbf{0}\).

     Intuition:
     • \(1 - y_n(\mathbf{w}^t\mathbf{x}_n + b) > 0 \Rightarrow \alpha_n \to \infty\) gives \(\mathcal{L} \to \infty\).
     • So choosing \((b, \mathbf{w})\) to minimize \(\mathcal{L}\) forces \(1 - y_n(\mathbf{w}^t\mathbf{x}_n + b) \le 0\).
     • \(1 - y_n(\mathbf{w}^t\mathbf{x}_n + b) < 0 \Rightarrow \alpha_n = 0\) (to maximize \(\mathcal{L}\) w.r.t. \(\alpha_n\)): these are the non-support vectors.

     Conclusion: at the optimum, \(\alpha_n\big(y_n(\mathbf{w}^t\mathbf{x}_n + b) - 1\big) = 0\), so \(\mathcal{L} = \tfrac12\mathbf{w}^t\mathbf{w}\) is minimized and the constraints \(1 - y_n(\mathbf{w}^t\mathbf{x}_n + b) \le 0\) are satisfied. Formally: use the KKT conditions to transform the primal.
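
     For reference, the KKT conditions the slide invokes, written out in full (standard convex-optimization conditions; the labels are the usual ones, not on the slide):
     \[
     \begin{aligned}
     &\text{stationarity:} && \nabla_{\mathbf{w}}\mathcal{L} = \mathbf{0},\quad \partial\mathcal{L}/\partial b = 0;\\
     &\text{primal feasibility:} && y_n(\mathbf{w}^t\mathbf{x}_n + b) \ge 1;\\
     &\text{dual feasibility:} && \alpha_n \ge 0;\\
     &\text{complementary slackness:} && \alpha_n\big(y_n(\mathbf{w}^t\mathbf{x}_n + b) - 1\big) = 0.
     \end{aligned}
     \]
     The stationarity conditions are exactly what the next slide computes.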

  8. Unconstrained Minimization w.r.t. \((b, \mathbf{w})\)

     \[
     \mathcal{L} = \tfrac12\mathbf{w}^t\mathbf{w} - \sum_{n=1}^{N}\alpha_n\big(y_n(\mathbf{w}^t\mathbf{x}_n + b) - 1\big)
     \]
     Set \(\partial\mathcal{L}/\partial b = 0\):
     \[
     \frac{\partial\mathcal{L}}{\partial b} = -\sum_{n=1}^{N}\alpha_n y_n = 0 \;\Longrightarrow\; \sum_{n=1}^{N}\alpha_n y_n = 0.
     \]
     Set \(\partial\mathcal{L}/\partial\mathbf{w} = \mathbf{0}\):
     \[
     \frac{\partial\mathcal{L}}{\partial\mathbf{w}} = \mathbf{w} - \sum_{n=1}^{N}\alpha_n y_n\mathbf{x}_n = \mathbf{0} \;\Longrightarrow\; \mathbf{w} = \sum_{n=1}^{N}\alpha_n y_n\mathbf{x}_n.
     \]
     Substitute back into \(\mathcal{L}\):
     \[
     \mathcal{L} = \tfrac12\mathbf{w}^t\mathbf{w} - \mathbf{w}^t\sum_{n=1}^{N}\alpha_n y_n\mathbf{x}_n - b\sum_{n=1}^{N}\alpha_n y_n + \sum_{n=1}^{N}\alpha_n
     = -\tfrac12\sum_{n,m=1}^{N}\alpha_n\alpha_m y_n y_m\mathbf{x}_n^t\mathbf{x}_m + \sum_{n=1}^{N}\alpha_n.
     \]
     Maximizing this w.r.t. \(\boldsymbol\alpha \ge \mathbf{0}\) is equivalent to the dual:
     \[
     \underset{\boldsymbol\alpha}{\text{minimize}}\ \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha \quad (\mathrm{G}_{nm} = y_n y_m\mathbf{x}_n^t\mathbf{x}_m)
     \quad\text{subject to}\quad \mathbf{y}^t\boldsymbol\alpha = 0,\ \boldsymbol\alpha \ge \mathbf{0}.
     \]
     Then \(\mathbf{w} = \sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n\), and
     \[
     \alpha_s > 0 \;\Longrightarrow\; y_s(\mathbf{w}^t\mathbf{x}_s + b) - 1 = 0 \;\Longrightarrow\; b = y_s - \mathbf{w}^t\mathbf{x}_s.
     \]

  9. Example: Our Toy Data Set

     Data and signed data matrix:
     \[
     \mathrm{X} = \begin{bmatrix} 0 & 0\\ 2 & 2\\ 2 & 0\\ 3 & 0 \end{bmatrix},\quad
     \mathbf{y} = \begin{bmatrix} -1\\ -1\\ +1\\ +1 \end{bmatrix}
     \;\longrightarrow\;
     \mathrm{X}_s = \begin{bmatrix} 0 & 0\\ -2 & -2\\ 2 & 0\\ 3 & 0 \end{bmatrix}
     \;\longrightarrow\;
     \mathrm{G} = \mathrm{X}_s\mathrm{X}_s^t = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 8 & -4 & -6\\ 0 & -4 & 4 & 6\\ 0 & -6 & 6 & 9 \end{bmatrix}
     \]

     Feed the dual SVM to a quadratic-programming solver \(\mathrm{QP}(\mathrm{Q}, \mathbf{p}, \mathrm{A}, \mathbf{c})\), which solves \(\min_{\mathbf{u}}\ \tfrac12\mathbf{u}^t\mathrm{Q}\mathbf{u} + \mathbf{p}^t\mathbf{u}\) subject to \(\mathrm{A}\mathbf{u} \ge \mathbf{c}\):
     \[
     \mathbf{u} = \boldsymbol\alpha,\quad \mathrm{Q} = \mathrm{G},\quad \mathbf{p} = -\mathbf{1}_N,\quad
     \mathrm{A} = \begin{bmatrix} \mathbf{y}^t\\ -\mathbf{y}^t\\ \mathrm{I}_N \end{bmatrix},\quad \mathbf{c} = \mathbf{0}_{N+2}.
     \]
     The solver returns
     \[
     \boldsymbol\alpha^* = \begin{bmatrix} \tfrac12\\[2pt] \tfrac12\\[2pt] 1\\ 0 \end{bmatrix}
     \;\Longrightarrow\;
     \mathbf{w} = \sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n = \begin{bmatrix} 1\\ -1 \end{bmatrix},\qquad
     b = y_1 - \mathbf{w}^t\mathbf{x}_1 = -1,
     \]
     i.e. the separator \(x_1 - x_2 - 1 = 0\), with margin \(\gamma = 1/\lVert\mathbf{w}\rVert = 1/\sqrt2\).

     Non-support vectors \(\Rightarrow \alpha_n = 0\); only support vectors can have \(\alpha_n > 0\).
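
     A quick numerical check of this solution (a sketch; the variable names are mine):

```python
import numpy as np

X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = np.array([0.5, 0.5, 1.0, 0.0])   # alpha* from the slide

w = (alpha * y) @ X                       # w* = sum_n alpha_n y_n x_n -> [1, -1]
b = y[0] - w @ X[0]                       # b* = y_1 - w' x_1 -> -1
print(w, b)
print(alpha @ y)                          # dual constraint y' alpha = 0
print(y * (X @ w + b))                    # all >= 1; equals 1 on the support vectors
print(1 / np.linalg.norm(w))              # margin gamma = 1/sqrt(2)
```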

  10. Dual QP Algorithm for Hard-Margin Linear SVM

     1: Input: \(\mathrm{X}\), \(\mathbf{y}\).
     2: Let \(\mathbf{p} = -\mathbf{1}_N\) be the \(N\)-vector of minus ones and \(\mathbf{c} = \mathbf{0}_{N+2}\) the \((N+2)\)-vector of zeros. Construct the matrices
     \[
     \mathrm{X}_s = \begin{bmatrix} \text{---}\ y_1\mathbf{x}_1^t\ \text{---}\\ \vdots\\ \text{---}\ y_N\mathbf{x}_N^t\ \text{---} \end{bmatrix}
     \ \text{(signed data matrix)},\qquad
     \mathrm{Q} = \mathrm{X}_s\mathrm{X}_s^t,\qquad
     \mathrm{A} = \begin{bmatrix} \mathbf{y}^t\\ -\mathbf{y}^t\\ \mathrm{I}_{N\times N} \end{bmatrix}.
     \]
     3: \(\boldsymbol\alpha^* \leftarrow \mathrm{QP}(\mathrm{Q}, \mathbf{p}, \mathrm{A}, \mathbf{c})\). (Some packages allow equality and bound constraints and can solve this type of QP directly.)
     4: Return
     \[
     \mathbf{w}^* = \sum_{\alpha^*_n > 0}\alpha^*_n y_n\mathbf{x}_n,\qquad
     b^* = y_s - \mathbf{w}^{*t}\mathbf{x}_s \quad (\alpha^*_s > 0).
     \]
     5: The final hypothesis is \(g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{*t}\mathbf{x} + b^*)\).

     This solves: \(\min_{\boldsymbol\alpha}\ \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha\) subject to \(\mathbf{y}^t\boldsymbol\alpha = 0,\ \boldsymbol\alpha \ge \mathbf{0}\).
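
     A minimal sketch of this algorithm, assuming the cvxopt QP solver (the lecture's QP(Q, p, A, c) routine is abstract; cvxopt takes inequality constraints as G alpha <= h plus a separate equality constraint, so the +/-y^t rows are not needed). The function name and the tiny ridge added to Q for numerical stability are my choices:

```python
import numpy as np
from cvxopt import matrix, solvers

def dual_svm(X, y, ridge=1e-8):
    """Hard-margin SVM via the dual QP:
    min (1/2) a'Qa - 1'a  s.t.  y'a = 0, a >= 0,  Q[n,m] = y_n y_m x_n'x_m."""
    N = X.shape[0]
    Xs = y[:, None] * X                     # signed data matrix, rows y_n x_n'
    Q = Xs @ Xs.T + ridge * np.eye(N)       # small ridge: Q is often singular
    # cvxopt solves min (1/2)a'Pa + q'a  s.t.  Ga <= h, Aa = b
    P, q = matrix(Q), matrix(-np.ones(N))
    G, h = matrix(-np.eye(N)), matrix(np.zeros(N))   # encodes a >= 0
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)     # encodes y'a = 0
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = Xs.T @ alpha                        # w* = sum_n alpha_n y_n x_n
    s = np.argmax(alpha)                    # a support vector (alpha_s > 0)
    bias = y[s] - X[s] @ w                  # b* = y_s - w'x_s
    return alpha, w, bias

# The toy data set from slide 9
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha, w, bias = dual_svm(X, y)
print(alpha.round(3), w.round(3), round(bias, 3))
# expected, up to solver tolerance: alpha ~ [0.5, 0.5, 1, 0], w ~ [1, -1], b ~ -1
```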

  11. Primal Versus Dual (Non-Separable)

     Primal:
     \[
     \begin{aligned}
     \underset{b,\mathbf{w},\boldsymbol\xi}{\text{minimize}}\quad & \tfrac12\mathbf{w}^t\mathbf{w} + C\sum_{n=1}^{N}\xi_n\\
     \text{subject to}\quad & y_n(\mathbf{w}^t\mathbf{x}_n + b) \ge 1 - \xi_n,\ \ \xi_n \ge 0,\quad n = 1,\dots,N
     \end{aligned}
     \]
     with \(g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^t\mathbf{x} + b)\) and \(N + d + 1\) optimization variables \((b, \mathbf{w}, \boldsymbol\xi)\).

     Dual:
     \[
     \begin{aligned}
     \underset{\boldsymbol\alpha}{\text{minimize}}\quad & \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha\\
     \text{subject to}\quad & \mathbf{y}^t\boldsymbol\alpha = 0,\quad C \ge \boldsymbol\alpha \ge \mathbf{0}
     \end{aligned}
     \]
     with
     \[
     \mathbf{w}^* = \sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n,\qquad b^* = y_s - \mathbf{w}^{*t}\mathbf{x}_s \quad (C > \alpha^*_s > 0),
     \]
     \[
     g(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{*t}\mathbf{x} + b^*) = \mathrm{sign}\Big(\sum_{n=1}^{N}\alpha^*_n y_n\mathbf{x}_n^t(\mathbf{x} - \mathbf{x}_s) + y_s\Big),
     \]
     and \(N\) optimization variables \(\boldsymbol\alpha\). The only change from the separable dual is the upper bound \(\alpha_n \le C\).
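
     In the QP, only the constraint set changes: stack an upper bound on top of the positivity constraint. A sketch reusing the shapes from the hard-margin code above (again assuming cvxopt; names mine):

```python
import numpy as np
from cvxopt import matrix, solvers

def dual_svm_soft(X, y, C, ridge=1e-8):
    """Soft-margin dual: min (1/2)a'Qa - 1'a  s.t.  y'a = 0, 0 <= a <= C."""
    N = X.shape[0]
    Xs = y[:, None] * X
    P, q = matrix(Xs @ Xs.T + ridge * np.eye(N)), matrix(-np.ones(N))
    # box constraint 0 <= alpha <= C, written as G alpha <= h
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = Xs.T @ alpha
    # b* needs a margin support vector: C > alpha_s > 0
    s = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0][0]
    return alpha, w, y[s] - X[s] @ w
```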

  12. Dual SVM is an Inner Product Algorithm: X-Space

     \[
     \underset{\boldsymbol\alpha}{\text{minimize}}\ \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha
     \quad\text{subject to}\quad \mathbf{y}^t\boldsymbol\alpha = 0,\ C \ge \boldsymbol\alpha \ge \mathbf{0},
     \qquad \mathrm{G}_{nm} = y_n y_m(\mathbf{x}_n^t\mathbf{x}_m)
     \]
     \[
     g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{\alpha^*_n > 0}\alpha^*_n y_n(\mathbf{x}_n^t\mathbf{x}) + b^*\Big),\qquad
     b^* = y_s - \sum_{\alpha^*_n > 0}\alpha^*_n y_n(\mathbf{x}_n^t\mathbf{x}_s)\quad (C > \alpha^*_s > 0).
     \]
     The data enter only through inner products.

  13. Dual SVM is an Inner Product Algorithm: Z-Space

     \[
     \underset{\boldsymbol\alpha}{\text{minimize}}\ \tfrac12\boldsymbol\alpha^t\mathrm{G}\boldsymbol\alpha - \mathbf{1}^t\boldsymbol\alpha
     \quad\text{subject to}\quad \mathbf{y}^t\boldsymbol\alpha = 0,\ C \ge \boldsymbol\alpha \ge \mathbf{0},
     \qquad \mathrm{G}_{nm} = y_n y_m(\mathbf{z}_n^t\mathbf{z}_m)
     \]
     \[
     g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{\alpha^*_n > 0}\alpha^*_n y_n(\mathbf{z}_n^t\mathbf{z}) + b^*\Big),\qquad
     b^* = y_s - \sum_{\alpha^*_n > 0}\alpha^*_n y_n(\mathbf{z}_n^t\mathbf{z}_s)\quad (C > \alpha^*_s > 0).
     \]
     Can we compute \(\mathbf{z}^t\mathbf{z}'\) without needing \(\mathbf{z} = \Phi(\mathbf{x})\), i.e. without visiting Z-space? Can we compute \(\mathbf{z}^t\mathbf{z}'\) efficiently?

  14. Dual SVM is an Inner Product Algorithm (continued)

     The slide repeats the Z-space program of slide 13: everything the dual SVM needs from the data is the inner product \(\mathbf{z}^t\mathbf{z}'\). Computing it without visiting Z-space is the job of the Kernel, the next topic.
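
     As a preview of that answer, here is the standard second-order polynomial example (my choice of illustration, not yet defined on these slides): for the scaled transform below, \(\Phi(\mathbf{x})^t\Phi(\mathbf{x}') = (1 + \mathbf{x}^t\mathbf{x}')^2\), so the Z-space inner product is computed entirely in X-space.

```python
import numpy as np

def phi2(x):
    """An explicit 2nd-order transform of x in R^2, with scalings chosen
    so its inner product matches the polynomial kernel below."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(2), rng.standard_normal(2)
lhs = phi2(x) @ phi2(xp)        # z'z', computed by visiting Z-space
rhs = (1 + x @ xp) ** 2         # computed in X-space: O(d) work
print(np.isclose(lhs, rhs))     # True
```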
