
Lecture 26: Support Vector Classification, Unsupervised Learning - PowerPoint PPT Presentation

Lecture 26: Support Vector Classification, Unsupervised Learning. Instructor: Prof. Ganesh Ramakrishnan. October 27, 2016.


1. Lecture 26: Support Vector Classification, Unsupervised Learning. Instructor: Prof. Ganesh Ramakrishnan. October 27, 2016.

2. Support Vector Classification

3. Motivation
   ▶ Perceptron does not find the best separating hyperplane; it finds any separating hyperplane.
   ▶ In case the initial w does not classify all the examples, the separating hyperplane corresponding to the final w∗ will often pass through an example.
   ▶ The separating hyperplane does not provide enough breathing space – this is what SVMs address, and we already saw that for regression!
   ▶ We now quickly do the same for classification.

4. Support Vector Classification: Separable Case
   ▶ w⊤φ(x) + b ≥ +1 for y = +1
   ▶ w⊤φ(x) + b ≤ −1 for y = −1
   ▶ w, φ ∈ ℝ^m
   ▶ There is a large margin separating the +ve and −ve examples.
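As a quick illustration (not part of the slides), a minimal numpy sketch that checks these separable-case constraints for a hand-picked (w, b) on made-up 2-D data, taking φ to be the identity map:

```python
import numpy as np

# Toy 2-D data; phi is taken to be the identity map (assumption for illustration).
X = np.array([[2.0, 2.0], [3.0, 1.0],       # positive examples
              [-2.0, -1.0], [-1.0, -3.0]])  # negative examples
y = np.array([+1, +1, -1, -1])

# A hand-picked separating hyperplane w^T phi(x) + b.
w = np.array([1.0, 1.0])
b = 0.0

scores = X @ w + b
# Separable-case constraints: score >= +1 for y = +1 and score <= -1 for y = -1,
# i.e. y * score >= 1 for every example.
print(scores)                    # [ 4.  4. -3. -4.]
print(np.all(y * scores >= 1))   # True: this (w, b) satisfies the hard-margin constraints
```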

5. Support Vector Classification: Non-separable Case
   ▶ When the examples are not linearly separable, we need to consider the slackness ξ_i (always +ve) of each example x^(i) (how far a misclassified point is from the separating hyperplane):
     w⊤φ(x^(i)) + b ≥ +1 − ξ_i (for y^(i) = +1)
     w⊤φ(x^(i)) + b ≤ −1 + ξ_i (for y^(i) = −1)
   ▶ Multiplying both sides by y^(i), we get: y^(i)(w⊤φ(x^(i)) + b) ≥ 1 − ξ_i, ∀ i = 1, . . . , n
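A minimal sketch (again with φ as the identity map and made-up data and (w, b)) of the smallest nonnegative slack that satisfies the constraint for each example, ξ_i = max(0, 1 − y^(i)(w⊤φ(x^(i)) + b)):

```python
import numpy as np

# Toy data that the chosen (w, b) does NOT separate; phi = identity (assumption).
X = np.array([[2.0, 2.0], [0.2, 0.1], [-2.0, -1.0], [1.0, 0.5]])
y = np.array([+1, +1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

# Smallest nonnegative slack satisfying y_i (w^T phi(x_i) + b) >= 1 - xi_i:
#   xi_i = max(0, 1 - y_i (w^T phi(x_i) + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)  # [0.  0.7 0.  2.5]: 0 beyond the margin, 0 < xi <= 1 inside it, xi > 1 if misclassified
```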

6. Maximize the margin
   ▶ We maximize the margin (φ(x+) − φ(x−))⊤ [w / ∥w∥].
   ▶ Here, x+ and x− lie on the boundaries of the margin.
   ▶ Recall that w is perpendicular to the separating surface.
   ▶ We project the vectors φ(x+) and φ(x−) on w, and normalize by ∥w∥ as we are only concerned with the direction of w and not its magnitude.

7. Simplifying the margin expression
   ▶ Maximize the margin (φ(x+) − φ(x−))⊤ [w / ∥w∥].
   ▶ At x+: y+ = +1, ξ+ = 0; hence (w⊤φ(x+) + b) = 1 … (1)
   ▶ At x−: y− = −1, ξ− = 0; hence −(w⊤φ(x−) + b) = 1 … (2)
   ▶ Adding (2) to (1): w⊤(φ(x+) − φ(x−)) = 2
   ▶ Thus, the margin expression to maximize is: 2 / ∥w∥
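A small numerical sanity check (hypothetical w, b and margin-boundary points, φ = identity) that the projection of φ(x+) − φ(x−) onto w/∥w∥ does equal 2/∥w∥:

```python
import numpy as np

# Hypothetical (w, b) and two points lying exactly on the margin boundaries (phi = identity).
w, b = np.array([1.0, 1.0]), 0.0
x_plus  = np.array([0.5, 0.5])    # w^T x_plus  + b = +1
x_minus = np.array([-0.5, -0.5])  # w^T x_minus + b = -1

# Projection of (phi(x+) - phi(x-)) onto the unit normal w / ||w||:
margin = (x_plus - x_minus) @ (w / np.linalg.norm(w))
print(margin, 2 / np.linalg.norm(w))  # both equal 2 / ||w|| = sqrt(2) here
```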

8. Formulating the objective
   ▶ Problem at hand: find w∗, b∗ that maximize the margin.
     (w∗, b∗) = arg max_{w,b} 2/∥w∥ s.t. y^(i)(w⊤φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀ i = 1, . . . , n
   ▶ However, as ξ_i → ∞, 1 − ξ_i → −∞. Thus, with arbitrarily large values of ξ_i, the constraints become easily satisfiable for any w, which defeats the purpose.
   ▶ Hence, we also want to minimize the ξ_i's, e.g., minimize ∑ ξ_i.

9. Objective
     (w∗, b∗, ξ∗_i) = arg min_{w,b,ξ_i} (1/2)∥w∥² + C ∑_{i=1}^n ξ_i
     s.t. y^(i)(w⊤φ(x^(i)) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, ∀ i = 1, . . . , n
   ▶ Instead of maximizing 2/∥w∥, we minimize (1/2)∥w∥², since (1/2)∥w∥² is monotonically decreasing with respect to 2/∥w∥.
   ▶ C determines the trade-off between the error ∑ ξ_i and the margin 2/∥w∥.
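This primal can be transcribed almost verbatim into a convex solver. A minimal sketch, assuming cvxpy is available and taking φ as the identity map (the data and the value of C are made up):

```python
import numpy as np
import cvxpy as cp

# Made-up 2-D data with overlapping classes; phi = identity (assumption for illustration).
X = np.array([[2.0, 2.0], [1.5, 2.5], [0.3, -0.2],
              [0.0, 0.0], [0.5, -0.5], [-1.0, -2.0]])
y = np.array([+1, +1, +1, -1, -1, -1])
n, d = X.shape
C = 1.0  # trade-off between the total slack and the margin

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n, nonneg=True)

# min (1/2)||w||^2 + C * sum_i xi_i   s.t.  y_i (w^T x_i + b) >= 1 - xi_i,  xi_i >= 0
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)              # learned hyperplane
print(np.round(xi.value, 3))         # nonzero slacks mark margin violations
print(2 / np.linalg.norm(w.value))   # resulting margin 2 / ||w||
```

A larger C penalizes slack more heavily and shrinks the margin; a smaller C tolerates more violations in exchange for a wider margin.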

10. Support Vector Machines: Dual Objective

11. Two Approaches to Showing the Kernelized Form of the Dual
   ▶ Approach 1: The Reproducing Kernel Hilbert Space and Representer Theorem (generalized from the derivation of Kernel Logistic Regression, Tutorial 7, Problem 3). See http://qwone.com/~jason/writing/kernel.pdf for a list of kernelized objectives.
   ▶ Approach 2: Derive using first principles (provided for completeness in Tutorial 9).
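For orientation (it is not derived on this slide), the kernelized dual that these approaches lead to is the standard soft-margin SVM dual, in which the data enter only through the kernel K:

```latex
% Standard soft-margin SVM dual (stated for reference; derivation deferred to the two approaches above)
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y^{(i)} y^{(j)} \, K\!\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y^{(i)} = 0
```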

12. Approach 1: Special Case of Representer Theorem & Reproducing Kernel Hilbert Space (RKHS)
   ▶ Generalized from the derivation of Kernel Logistic Regression, Tutorial 7, Problem 3. See http://qwone.com/~jason/writing/kernel.pdf for a list of kernelized objectives.
   ▶ Let X be the space of examples such that D = {x^(1), x^(2), . . . , x^(m)} ⊆ X and, for any x ∈ X, K(., x) : X → ℜ.
   ▶ (Optional; proof provided in the optional slide deck at the end) The solution f∗ ∈ H (Hilbert space) to the problem
     f∗ = argmin_{f∈H} ∑_{i=1}^m E(f(x^(i)), y^(i)) + Ω(∥f∥_K)
     can always be written as f∗(x) = ∑_{i=1}^m α_i K(x, x^(i)), provided Ω(∥f∥_K) is a monotonically increasing function of ∥f∥_K. H is the Hilbert space and K(., x) : X → ℜ is called the Reproducing (RKHS) Kernel.

13. Approach 1: Special Case of Representer Theorem & Reproducing Kernel Hilbert Space (RKHS) (contd.)
   ▶ (Optional) The solution f∗ ∈ H (Hilbert space) to the problem
     f∗ = argmin_{f∈H} ∑_{i=1}^m E(f(x^(i)), y^(i)) + Ω(∥f∥_K)
     can always be written as f∗(x) = ∑_{i=1}^m α_i K(x, x^(i)), provided Ω(∥f∥_K) is a monotonically increasing function of ∥f∥_K.
   ▶ More specifically, if f(x) = w⊤φ(x) + b and K(x′, x) = φ⊤(x)φ(x′), then the solution w∗ ∈ ℜ^n to the problem
     (w∗, b∗) = argmin_{w,b} ∑_{i=1}^m E(w⊤φ(x^(i)) + b, y^(i)) + Ω(∥w∥²)
     can always be written as φ⊤(x)w∗ + b = ∑_{i=1}^m α_i K(x, x^(i)), provided Ω(∥w∥²) is a monotonically increasing function of ∥w∥². ℜ^(n+1) is the Hilbert space and K(., x) : X → ℜ is the Reproducing (RKHS) Kernel.
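To see this representer form in practice, a small sketch (assuming scikit-learn is available; the data are made up) verifying that a fitted kernel SVM's decision function is exactly a kernel expansion over its support vectors, with sklearn's dual_coef_ holding the products y^(i) α_i:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Made-up data; any binary dataset works for this check.
rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

# Decision function as a kernel expansion over the support vectors:
#   f(x) = sum_i (y_i * alpha_i) K(x, x_i) + b,   where dual_coef_ stores y_i * alpha_i.
X_test = rng.randn(5, 2)
K = rbf_kernel(X_test, clf.support_vectors_, gamma=gamma)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(X_test)))  # True
```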
