SLIDE 1

Support Vector Machines & Kernels Lecture 6

David Sontag New York University

Slides adapted from Luke Zettlemoyer and Carlos Guestrin, and Vibhav Gogate

SLIDE 2

Dual SVM derivation (1) – the linearly separable case

Original optimization problem (rewrite the constraints as "≤ 0" inequalities):

Lagrangian (one Lagrange multiplier per example):

Our goal now is to solve:
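The equations themselves appear on the slide as images; in the standard form of this derivation they are:

  Primal:      min over w, b of  ½||w||²   subject to   1 − yj(w·xj + b) ≤ 0  for all j

  Lagrangian:  L(w, b, α) = ½||w||² + Σj αj (1 − yj(w·xj + b)),   with αj ≥ 0

  Goal:        min over w, b   max over α ≥ 0   of  L(w, b, α)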

SLIDE 3

Dual SVM derivation (2) – the linearly separable case

Swap min and max. Slater’s condition from convex optimization guarantees that these two optimization problems are equivalent!

(Primal) (Dual)
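Written out in the standard form (the slide shows these as images):

  (Primal)  min over w, b   max over α ≥ 0   of  L(w, b, α)        (Dual)  max over α ≥ 0   min over w, b   of  L(w, b, α)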

SLIDE 4

Dual SVM derivation (3) – the linearly separable case

Can solve for optimal w, b as function of α:

∂L/∂w = w − Σj αj yj xj = 0   ⇒   w = Σj αj yj xj

Substituting these values back in (and simplifying), we obtain the dual problem:

(Dual) — the sums run over all training examples, xi · xj is a dot product of feature vectors, and the αj and yj are scalars
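In the standard form of the derivation (the slide shows it as an image), the dual for the linearly separable case is:

  max over α of  Σj αj − ½ Σi Σj αi αj yi yj (xi · xj)   subject to   αj ≥ 0 for all j,   Σj αj yj = 0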

SLIDE 5

Dual SVM derivation (3) – the linearly separable case

Can solve for optimal w, b as function of α:

∂L/∂w = w − Σj αj yj xj = 0   ⇒   w = Σj αj yj xj

So, in dual formulation we will solve for α directly!

  • w and b are computed from α (if needed)

Substituting these values back in (and simplifying), we obtain the same dual problem as on the previous slide.

SLIDE 6

Dual SVM derivation (3) – the linearly separable case

Lagrangian:

αj > 0 for some j implies the constraint for example j is tight. We use this to obtain b: (1) (2) (3)
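The three numbered steps appear on the slide as images; the standard derivation is: (1) pick any j with αj > 0, so the constraint is tight: yj(w·xj + b) = 1; (2) multiply both sides by yj and use yj² = 1 to get b = yj − w·xj; (3) substitute w = Σi αi yi xi, giving b = yj − Σi αi yi (xi · xj).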

SLIDE 7

Classification rule using dual solution

Using the dual solution, classification requires only dot products between the feature vector of the new example and the support vectors.
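In standard form (the slide’s equation is an image), the classification rule is:

  ŷ(x) = sign( Σj αj yj (xj · x) + b )

Only the support vectors (αj > 0) contribute to the sum.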

SLIDE 8

Dual for the non-separable case

Primal:

Solve for w,b,α:

Dual:
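In standard soft-margin form (the slide shows these as images), the two problems are:

  Primal:  min over w, b, ξ of  ½||w||² + C Σj ξj   subject to   yj(w·xj + b) ≥ 1 − ξj,   ξj ≥ 0

  Dual:    max over α of  Σj αj − ½ Σi Σj αi αj yi yj (xi · xj)   subject to   0 ≤ αj ≤ C,   Σj αj yj = 0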

What changed?

  • Added upper bound of C on αi!
  • Intuitive explanation:
      • Without slack, αi → ∞ when constraints are violated (points misclassified)
      • Upper bound of C limits the αi, so misclassifications are allowed
SLIDE 9

Support vectors

  • Complementary slackness conditions (written out below):
  • Support vectors: points xj such that

(includes all j such that …, but also additional points where …)

  • Note: the SVM dual solution may not be unique!

α*j = 0  ∧  yj(w* · xj + b) ≤ 1
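For reference, the standard soft-margin complementary slackness conditions (restated here; not necessarily in the slide’s exact notation) are αj (1 − ξj − yj(w·xj + b)) = 0 and (C − αj) ξj = 0. In particular, αj > 0 ⇒ yj(w·xj + b) = 1 − ξj, and αj < C ⇒ ξj = 0.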

SLIDE 10

Dual SVM interpretation: Sparsity

[Figure: maximum-margin separator w.x + b = 0 with margin boundaries w.x + b = +1 and w.x + b = -1]

Support Vectors:

  • αj > 0

Non-support Vectors:

  • αj = 0
  • moving them will not change w

Final solution tends to be sparse:

  • αj = 0 for most j
  • don’t need to store these points to compute w or make predictions

SLIDE 11

SVM with kernels

  • Never compute features explicitly!!!

    – Compute dot products in closed form

  • O(n²) time in size of dataset to compute objective

    – much work on speeding this up

Predict with:
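The prediction rule itself appears on the slide as an image; in kernel form it is ŷ(x) = sign( Σj αj yj K(xj, x) + b ), summing only over the support vectors. A minimal sketch in Python, assuming the αj, b, and support vectors have already been obtained by solving the dual (the function and variable names here are hypothetical, not from the slides):

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def predict(x_new, support_x, support_y, support_alpha, b, kernel=gaussian_kernel):
    # Kernelized decision rule: sign( sum_j alpha_j * y_j * K(x_j, x_new) + b ).
    # Only the support vectors (alpha_j > 0) need to be stored and summed over.
    score = sum(a * y * kernel(xj, x_new)
                for a, y, xj in zip(support_alpha, support_y, support_x))
    return 1 if score + b >= 0 else -1

Because the kernel is evaluated in closed form, the feature mapping is never computed explicitly, which is the point of this slide.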

SLIDE 12

Quadratic kernel

[Tommi Jaakkola]

SLIDE 13

Quadratic kernel

[Cynthia Rudin]

Feature mapping given by:
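The mapping itself appears on the slide as an image. For reference (the slide may use (x·z)² rather than (1 + x·z)²), the quadratic kernel K(x, z) = (1 + x·z)² on two-dimensional inputs corresponds to the explicit feature map

  φ(x) = (1, √2 x1, √2 x2, x1², x2², √2 x1 x2),

since φ(x)·φ(z) = 1 + 2x1z1 + 2x2z2 + x1²z1² + x2²z2² + 2x1x2z1z2 = (1 + x·z)².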

SLIDE 14

Common kernels

  • Polynomials of degree exactly d
  • Polynomials of degree up to d
  • Gaussian kernels
  • And many others: very active area of research!

(e.g., structured kernels that use dynamic programming to evaluate, string kernels, …)

(the exponent of the Gaussian kernel contains the Euclidean distance, squared)
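In their standard forms (the slide shows the formulas as images): polynomials of degree exactly d: K(u, v) = (u·v)^d; polynomials of degree up to d: K(u, v) = (1 + u·v)^d; Gaussian kernel: K(u, v) = exp(−||u − v||² / (2σ²)).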

SLIDE 15

Gaussian kernel

[Cynthia Rudin] [mblondel.org]

[Figure: decision surface of a Gaussian-kernel SVM, with support vectors highlighted and level sets, i.e. w.x = r for some r]

SLIDE 16

Kernel algebra

[Justin Domke]

Q: How would you prove that the “Gaussian kernel” is a valid kernel?

A: Expand the Euclidean norm, then apply rule (e) from the slide’s list of kernel-composition rules.
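In standard notation (the slide’s equations are images): ||u − v||² = ||u||² − 2 u·v + ||v||², so

  exp(−||u − v||² / (2σ²)) = exp(−||u||² / (2σ²)) · exp(u·v / σ²) · exp(−||v||² / (2σ²)),

which has the form f(u) · K′(u, v) · f(v) with K′(u, v) = exp(u·v / σ²); rule (e) presumably covers products of this form.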

To see that this is a kernel, use the Taylor series expansion of the exponential, together with repeated application of (a), (b), and (c):
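Sketch (the standard argument, not reproduced from the slide): exp(u·v / σ²) = Σk≥0 (u·v)^k / (k! σ^(2k)), a sum of nonnegative multiples of powers of the linear kernel u·v, so repeated use of the sum, scaling, and product rules shows it is a valid kernel.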

The feature mapping is infinite dimensional!

SLIDE 17

Overfitting?

  • Huge feature space with kernels: should we worry about overfitting?

– SVM objective seeks a solution with large margin

  • Theory says that large margin leads to good generalization

(we will see this in a couple of lectures)

– But everything overfits sometimes!!!

– Can control by:

  • Setting C
  • Choosing a better Kernel
  • Varying parameters of the Kernel (width of Gaussian, etc.)