  1. Support Vector Machines & Kernels, Lecture 6. David Sontag, New York University. Slides adapted from Luke Zettlemoyer, Carlos Guestrin, and Vibhav Gogate.

  2. Dual SVM derivation (1) – the linearly separable case. Original optimization problem: min_{w,b} ½||w||² subject to y_j (w·x_j + b) ≥ 1 for all j. One Lagrange multiplier α_j ≥ 0 per example constraint. Lagrangian: L(w, b, α) = ½||w||² − Σ_j α_j [y_j (w·x_j + b) − 1]. Our goal now is to solve: min_{w,b} max_{α≥0} L(w, b, α).

  3. Dual SVM derivation (2) – the linearly separable case. (Primal) min_{w,b} max_{α≥0} L(w, b, α). Swap min and max: (Dual) max_{α≥0} min_{w,b} L(w, b, α). Slater’s condition from convex optimization guarantees that these two optimization problems are equivalent!

  4. Dual SVM derivation (3) – the linearly separable case. (Dual) Can solve for the optimal w, b as a function of α: setting ∂L/∂w = w − Σ_j α_j y_j x_j = 0 gives w = Σ_j α_j y_j x_j; setting ∂L/∂b = −Σ_j α_j y_j = 0 gives Σ_j α_j y_j = 0. Substituting these values back in (and simplifying), we obtain: (Dual) max_{α≥0} Σ_j α_j − ½ Σ_j Σ_k α_j α_k y_j y_k (x_j · x_k), subject to Σ_j α_j y_j = 0. The sums run over all training examples; the α_j, y_j are scalars, and x_j · x_k is a dot product.

  5. Dual SVM derivation (3) – the linearly separable case, continued. So, in the dual formulation we will solve for α directly! • w and b are computed from α (if needed), via w = Σ_j α_j y_j x_j.
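The recovery of w and b from the dual variables can be sketched in NumPy. The data and the multipliers α below are illustrative toy values (for this symmetric two-point problem, α = 0.25 for both points happens to be the dual optimum), not anything from the slides:

```python
import numpy as np

# Hypothetical toy data: two linearly separable points.
X = np.array([[2.0, 2.0], [0.0, 0.0]])   # training inputs x_j
y = np.array([1.0, -1.0])                # labels y_j

# Suppose a dual solver returned these multipliers alpha_j.
alpha = np.array([0.25, 0.25])

# Stationarity condition: w = sum_j alpha_j * y_j * x_j
w = (alpha * y) @ X

# Any support vector j (alpha_j > 0) has a tight constraint,
# y_j (w . x_j + b) = 1, which gives b = y_j - w . x_j.
j = np.argmax(alpha)
b = y[j] - w @ X[j]

print(w, b)   # -> [0.5 0.5] -1.0
```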

  6. Dual SVM derivation (3) – the linearly separable case. From the Lagrangian, α_j > 0 for some j implies the j-th constraint is tight. We use this to obtain b: (1) y_j (w·x_j + b) = 1; (2) multiplying both sides by y_j and using y_j² = 1, w·x_j + b = y_j; (3) b = y_j − w·x_j.

  7. Classification rule using the dual solution: ŷ = sign(Σ_j α_j y_j (x_j · x) + b), i.e., a dot product of the feature vector of the new example x with each of the support vectors.
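The dual classification rule can be sketched as follows; the support vectors, multipliers, and bias are illustrative values, and the kernel is the plain dot product (the linearly separable case):

```python
import numpy as np

def predict(x_new, X_sv, y_sv, alpha_sv, b, kernel):
    """Dual rule: sign( sum_j alpha_j y_j K(x_j, x_new) + b ).
    Only support vectors (alpha_j > 0) need to appear in the sum."""
    scores = np.array([kernel(x_j, x_new) for x_j in X_sv])
    return np.sign(alpha_sv * y_sv @ scores + b)

# Linear kernel: just the ordinary dot product.
linear = lambda u, v: u @ v

# Hypothetical support vectors from a solved dual problem.
X_sv = np.array([[2.0, 2.0], [0.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.25, 0.25])
b = -1.0

print(predict(np.array([3.0, 3.0]), X_sv, y_sv, alpha_sv, b, linear))   # -> 1.0
```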

  8. Dual for the non-separable case. Primal: min_{w,b,ξ} ½||w||² + C Σ_j ξ_j subject to y_j (w·x_j + b) ≥ 1 − ξ_j and ξ_j ≥ 0. Dual: same as before, but now solve for α with 0 ≤ α_j ≤ C. What changed? • Added an upper bound of C on α_j! • Intuitive explanation: without slack, α_j → ∞ when constraints are violated (points misclassified); the upper bound of C limits the α_j, so misclassifications are allowed.

  9. Support vectors. • Complementary slackness conditions: α*_j [y_j (w*·x_j + b) − 1 + ξ_j] = 0 for all j. • Support vectors: points x_j such that α*_j > 0 (includes all j such that y_j (w*·x_j + b) = 1, but also additional points where y_j (w*·x_j + b) < 1, i.e., points with slack). • Note: the SVM dual solution may not be unique!

  10. Dual SVM interpretation: sparsity. (Figure: hyperplanes w·x + b = +1, w·x + b = 0, w·x + b = −1.) The final solution tends to be sparse: α_j = 0 for most j, so we don’t need to store those points to compute w or make predictions. Non-support vectors: α_j = 0; moving them will not change w. Support vectors: α_j > 0.

  11. SVM with kernels. • Never compute features explicitly!!! – Compute dot products in closed form: K(x_j, x) = φ(x_j) · φ(x). Predict with: ŷ = sign(Σ_j α_j y_j K(x_j, x) + b). • O(n²) time in the size of the dataset to compute the objective – much work on speeding this up.
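The O(n²) cost comes from the kernel (Gram) matrix, which touches every pair of training examples. A minimal sketch, using a Gaussian kernel on made-up data:

```python
import numpy as np

def gram_matrix(X, kernel):
    """Pairwise kernel evaluations -- the O(n^2) object in the dual."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# Gaussian kernel (sigma is a free width parameter).
rbf = lambda u, v, sigma=1.0: np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
K = gram_matrix(X, rbf)
# K is symmetric with ones on the diagonal, since K(x, x) = exp(0) = 1.
```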

  12. Quadratic kernel [Tommi Jaakkola]

  13. Quadratic kernel. Feature mapping given by (for 2-D inputs): φ(x) = (x₁², √2 x₁x₂, x₂²), so that φ(u) · φ(v) = (u · v)². [Cynthia Rudin]
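The quadratic-kernel identity can be checked numerically; the vectors below are arbitrary:

```python
import numpy as np

# Explicit feature map for the quadratic kernel in 2-D:
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so phi(u).phi(v) = (u.v)^2.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

lhs = (u @ v) ** 2        # kernel trick: one dot product, then square
rhs = phi(u) @ phi(v)     # explicit features: never needed in practice

print(lhs, rhs)           # -> 1.0 1.0 (since u.v = 3 - 2 = 1)
```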

  14. Common kernels. • Polynomials of degree exactly d: K(u, v) = (u · v)^d. • Polynomials of degree up to d: K(u, v) = (u · v + 1)^d. • Gaussian kernels: K(u, v) = exp(−||u − v||² / 2σ²), with the Euclidean distance squared in the exponent. • And many others: a very active area of research! (e.g., structured kernels that use dynamic programming to evaluate, string kernels, …)
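The common kernels above, written as functions (function names are illustrative; the constant c in the up-to-degree-d kernel supplies the lower-order terms):

```python
import numpy as np

def poly_exact(u, v, d):            # polynomials of degree exactly d
    return (u @ v) ** d

def poly_up_to(u, v, d, c=1.0):     # polynomials of degree up to d
    return (u @ v + c) ** d

def gaussian(u, v, sigma=1.0):      # squared Euclidean distance in exponent
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])           # u . v = 1
print(poly_exact(u, v, 2), poly_up_to(u, v, 2), gaussian(u, u))
```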

  15. Gaussian kernel. (Figure: decision function level sets, i.e., points where w·φ(x) = r for some r, with the support vectors highlighted.) [Cynthia Rudin] [mblondel.org]

  16. Kernel algebra. Q: How would you prove that the “Gaussian kernel” is a valid kernel? A: Expand the Euclidean norm as ||x − z||² = ||x||² − 2 x·z + ||z||², then apply (e) from above. To see that the remaining factor is a kernel, use the Taylor series expansion of the exponential, together with repeated application of (a), (b), and (c). The feature mapping is infinite dimensional! [Justin Domke]
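Written out (assuming the slide's closure rules (a)–(c) cover sums, scalar multiples, and products of kernels, and (e) covers multiplying a kernel by f(x)f(z), as in standard treatments), the argument is:

```latex
% Expand the squared Euclidean norm:
\|x - z\|^2 = \|x\|^2 - 2\,x \cdot z + \|z\|^2,
% so the Gaussian kernel factors as
\exp\!\Big(-\tfrac{\|x - z\|^2}{2\sigma^2}\Big)
  = \underbrace{\exp\!\Big(-\tfrac{\|x\|^2}{2\sigma^2}\Big)
    \exp\!\Big(-\tfrac{\|z\|^2}{2\sigma^2}\Big)}_{\text{handled by rule (e)}}
    \cdot \exp\!\Big(\tfrac{x \cdot z}{\sigma^2}\Big).
% Taylor-expanding the last factor:
\exp\!\Big(\tfrac{x \cdot z}{\sigma^2}\Big)
  = \sum_{k=0}^{\infty} \frac{(x \cdot z)^k}{\sigma^{2k}\, k!},
% an infinite nonnegative combination of polynomial kernels (x . z)^k,
% hence a kernel by rules (a)-(c); the feature map is infinite dimensional.
```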

  17. Overfitting? • Huge feature space with kernels: should we worry about overfitting? – The SVM objective seeks a solution with large margin. • Theory says that large margin leads to good generalization (we will see this in a couple of lectures). – But everything overfits sometimes!!! – Can control by: • setting C, • choosing a better kernel, • varying parameters of the kernel (width of the Gaussian, etc.).
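The "width of the Gaussian" knob can be made concrete with a small NumPy illustration (toy data; the extreme widths are chosen purely to show the two limits): as σ shrinks, the Gram matrix approaches the identity, so every point is similar only to itself and the kernel machine can memorize the training set; as σ grows, all points look alike.

```python
import numpy as np

def gram(X, sigma):
    """Gaussian-kernel Gram matrix for all pairs of rows of X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.random.default_rng(1).normal(size=(6, 2))

K_narrow = gram(X, sigma=0.01)    # ~ identity: maximal capacity, overfits
K_wide = gram(X, sigma=1000.0)    # ~ all ones: everything looks the same

print(np.round(K_narrow, 2))
print(np.round(K_wide, 2))
```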
