Support vector machines (SVMs)
Lecture 5
David Sontag, New York University

Soft margin SVM

[Figure: hyperplanes w · x + b = +1, w · x + b = 0, w · x + b = −1, with "slack variables" ξ_1, ξ_2, ξ_3, ξ_4 marked on individual points]

Slack penalty C > 0:
• C = ∞: minimizes upper bound on 0-1 loss
• C ≈ 0: points with ξ_i = 0 have big margin
• Select using cross-validation

Support vectors: data points for which the constraints are binding
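As a small illustration of the slack variables on this slide, the sketch below computes ξ_i = max(0, 1 − y_i (w · x_i + b)) for a fixed hyperplane. The toy points, the hyperplane (w, b), and the helper name `slack_variables` are assumptions made for illustration; they are not from the lecture, and (w, b) is not claimed to be the optimal separator.

```python
import numpy as np

def slack_variables(w, b, X, y):
    """Slack xi_i = max(0, 1 - y_i (w . x_i + b)) for each row x_i of X."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical toy data: four 2-D points with labels in {-1, +1}.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Hypothetical hyperplane, chosen by hand (not fitted).
w = np.array([1.0, 0.0])
b = 0.0

xi = slack_variables(w, b, X, y)

# Points with xi_i = 0 lie on or outside the margin; points with
# 0 < xi_i <= 1 are inside the margin but on the correct side;
# points with xi_i > 1 are misclassified.
total_penalty = lambda C: C * xi.sum()
```

Here the second point sits inside the margin (ξ = 0.5) and the fourth is misclassified (ξ = 1.5), so a larger C makes this hyperplane more expensive.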

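Soft margin SVM

QP form: [equation not preserved in this transcript]
More "natural" form: [equation not preserved; its two terms are labelled "Regularization term" and "Empirical loss"]
Equivalent if: [condition not preserved]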
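The equations on this slide are not present in the transcript. A standard way to write the two forms, consistent with the objective used later in the lecture, would be the following. This is a reconstruction under that assumption, not a copy of the slide; note that the later slides drop the bias term b.

```latex
% QP form (constrained), with slack variables \xi_i:
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0

% "Natural" form (unconstrained): regularization term + empirical (hinge) loss
\min_{w,\,b}\ \underbrace{\frac{\lambda}{2}\|w\|^2}_{\text{regularization}}
  + \underbrace{\frac{1}{m}\sum_{i=1}^{m} \max\{0,\ 1 - y_i\,(w \cdot x_i + b)\}}_{\text{empirical loss}}

% Dividing the natural form by \lambda shows the two agree when
C = \frac{1}{m\lambda}
```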

Subgradient (for non-differentiable functions)
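For reference, the usual definition of a subgradient of a convex function, with the hinge loss as the example that matters for this lecture. The notation (f, g, z) is chosen here for illustration.

```latex
% g is a subgradient of a convex function f at w if, for all w',
f(w') \;\ge\; f(w) + g \cdot (w' - w)

% Example: the hinge function h(z) = \max\{0,\ 1 - z\} is not differentiable at z = 1.
% Its subgradients are
\partial h(z) =
\begin{cases}
  \{-1\}   & z < 1 \\
  [-1,\ 0] & z = 1 \\
  \{0\}    & z > 1
\end{cases}
```

Where f is differentiable, the gradient is the only subgradient; at a kink, any value in the set may be used in a subgradient step.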

(Sub)gradient descent of SVM objective

Step size: [formula not preserved in this transcript]
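A minimal sketch of full-batch subgradient descent on the SVM objective (λ/2)||w||² + (1/m) Σ_i max{0, 1 − y_i w · x_i}. The step size schedule η_t = 1/(λt) is an assumption borrowed from the Pegasos slides, since the step size formula on this slide is missing; the toy data and function names are also hypothetical.

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """(lam/2)||w||^2 + mean hinge loss, with no bias term."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * lam * (w @ w) + hinge.mean()

def svm_subgradient(w, X, y, lam):
    """One valid subgradient of the objective at w."""
    active = (y * (X @ w)) < 1.0  # points with insufficient margin
    # Each active point contributes -y_i x_i; the others contribute 0.
    return lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)

def subgradient_descent(X, y, lam=0.1, steps=200):
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        eta = 1.0 / (lam * t)  # assumed schedule, not taken from this slide
        w = w - eta * svm_subgradient(w, X, y, lam)
    return w

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = subgradient_descent(X, y)
```

Because the objective is not differentiable everywhere, individual steps need not decrease it; the comparison that matters is between the final objective and the starting one.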

The Pegasos Algorithm

General framework:
  Initialize: w_1 = 0, t = 0
  While not converged
    t = t + 1
    Choose a stepsize, η_t
    Choose a direction, p_t
    Go!
    Test for convergence
  Output: w_{t+1}

Pegasos Algorithm (from homework):
  Initialize: w_1 = 0, t = 0
  For iter = 1, 2, …, 20
    For j = 1, 2, …, |data|
      t = t + 1
      η_t = 1/(tλ)
      If y_j (w_t · x_j) < 1
        w_{t+1} = (1 − η_t λ) w_t + η_t y_j x_j
      Else
        w_{t+1} = (1 − η_t λ) w_t
  Output: w_{t+1}

The Pegasos Algorithm

Pegasos Algorithm (from homework), with each update written as a step along a direction:
  Initialize: w_1 = 0, t = 0
  For iter = 1, 2, …, 20
    For j = 1, 2, …, |data|
      t = t + 1
      η_t = 1/(tλ)
      If y_j (w_t · x_j) < 1
        w_{t+1} = w_t − η_t (λ w_t − y_j x_j)
      Else
        w_{t+1} = w_t − η_t λ w_t
  Output: w_{t+1}

Convergence choice: fixed number of iterations, T = 20 · |data|
Stepsize choice: initialize with 1/λ; decays with 1/t
Direction choice: stochastic approximation to the subgradient
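The slides show the update in two forms, a scaled form and a step form. Expanding the step form shows they are the same, and substituting the stepsize shows how much the weight vector is shrunk at each iteration:

```latex
% Margin violated, y_j (w_t \cdot x_j) < 1:
w_{t+1} = w_t - \eta_t\,(\lambda w_t - y_j x_j)
        = (1 - \eta_t \lambda)\, w_t + \eta_t\, y_j x_j

% Otherwise:
w_{t+1} = w_t - \eta_t \lambda\, w_t = (1 - \eta_t \lambda)\, w_t

% With \eta_t = 1/(t\lambda):
1 - \eta_t \lambda = 1 - \frac{1}{t} = \frac{t-1}{t}
```

In particular, at t = 1 the factor is 0, so the first update discards w_1 and sets w_2 from the first data point alone.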

Subgradient calculation

Objective:
  $\frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_i \max\{0,\ 1 - y_i\, w \cdot x_i\}$

Stochastic approx:
  $\frac{\lambda}{2}\|w\|^2 + \max\{0,\ 1 - y_i\, w \cdot x_i\}$
  for a randomly chosen data point i (in the assignment the choice of i is not random: easier to debug and compare between students).

(sub)gradient:
  $\lambda w + \frac{d}{dw}\max\{0,\ 1 - y_i\, w \cdot x_i\}$

[Figure: hinge loss plotted against y_i w · x_i, zero for y_i w · x_i ≥ 1]

  y_i w · x_i     d/dw max{0, 1 − y_i w · x_i}
  < 1             −y_i x_i
  > 1             0
  = 1             0

(sub)gradient:
  if y_i w · x_i < 1:  λ w − y_i x_i
  else:                λ w + 0
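The case analysis above can be checked numerically. The sketch below implements the stochastic subgradient for one data point and compares it with a central finite difference at points away from the kink y_i w · x_i = 1, where the objective is differentiable. The data point, λ, and function names are hypothetical values chosen for illustration.

```python
import numpy as np

def stochastic_objective(w, x_i, y_i, lam):
    """(lam/2)||w||^2 + hinge loss on the single point (x_i, y_i)."""
    return 0.5 * lam * (w @ w) + max(0.0, 1.0 - y_i * (w @ x_i))

def stochastic_subgradient(w, x_i, y_i, lam):
    """lam*w - y_i*x_i if the margin is violated, else lam*w."""
    if y_i * (w @ x_i) < 1.0:
        return lam * w - y_i * x_i
    return lam * w

def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

lam = 0.5
x_i, y_i = np.array([1.0, -2.0]), 1.0
w_violating = np.array([0.1, 0.2])  # y_i w.x_i = -0.3, margin violated
w_satisfied = np.array([3.0, 0.5])  # y_i w.x_i = 2.0, margin satisfied

f = lambda v: stochastic_objective(v, x_i, y_i, lam)
g_violating = stochastic_subgradient(w_violating, x_i, y_i, lam)
g_satisfied = stochastic_subgradient(w_satisfied, x_i, y_i, lam)
```

At exactly y_i w · x_i = 1 the finite difference is not meaningful; the code follows the slide's convention of using 0 for the hinge term there.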

The Pegasos Algorithm

Direction choice: stochastic approximation to the subgradient
  if y_j (w_t · x_j) < 1:  p_t = λ w_t − y_j x_j
  else:                    p_t = λ w_t + 0

Go: update w_{t+1} = w_t − η_t p_t
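Putting the stepsize, direction, and update together, a minimal Python sketch of the homework version of Pegasos might look as follows. It follows the slide's pseudocode (fixed 20 passes, points visited in order rather than at random, no bias term); the function name, default λ, and toy data are assumptions for illustration.

```python
import numpy as np

def pegasos(X, y, lam=0.1, epochs=20):
    """Pegasos as on the slide: T = epochs * |data| steps, in data order."""
    m, d = X.shape
    w = np.zeros(d)  # w_1 = 0
    t = 0
    for _ in range(epochs):
        for j in range(m):
            t += 1
            eta = 1.0 / (t * lam)  # stepsize: starts at 1/lam, decays as 1/t
            # Direction: stochastic approximation to the subgradient.
            if y[j] * (w @ X[j]) < 1.0:
                p = lam * w - y[j] * X[j]
            else:
                p = lam * w
            w = w - eta * p  # Go: w_{t+1} = w_t - eta_t * p_t
    return w

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = pegasos(X, y)
predictions = np.sign(X @ w)
```

Visiting points in a fixed order matches the assignment's choice noted on the subgradient slide; sampling j at random would match the stochastic description more literally.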

Why is this algorithm interesting?

• Simple to implement, state-of-the-art results.
  – Notice the similarity to the Perceptron algorithm! Algorithmic differences: updates if insufficient margin, scales weight vector, and has a learning rate.
• Since it is based on stochastic gradient descent, its running time guarantees are probabilistic.
• Highlights interesting tradeoffs between running time and data.
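To make the comparison with the Perceptron concrete, the sketch below places a single step of each algorithm side by side. The Perceptron step shown is the standard mistake-driven update; the example point, λ, and t are hypothetical.

```python
import numpy as np

def perceptron_step(w, x, y):
    """Update only on a mistake (y w.x <= 0); no scaling, no learning rate."""
    if y * (w @ x) <= 0.0:
        return w + y * x
    return w

def pegasos_step(w, x, y, lam, t):
    """Update on insufficient margin (y w.x < 1); always scale w; use eta_t."""
    eta = 1.0 / (t * lam)
    scaled = (1.0 - eta * lam) * w
    if y * (w @ x) < 1.0:  # margin test uses w before scaling
        return scaled + eta * y * x
    return scaled

# A point that is correctly classified but inside the margin (y w.x = 0.5).
w = np.array([1.0, 0.0])
x, y = np.array([0.5, 0.0]), 1.0

w_perceptron = perceptron_step(w, x, y)               # unchanged: no mistake
w_pegasos = pegasos_step(w, x, y, lam=0.1, t=10)      # shrinks w, then adds eta*y*x
```

On this point the Perceptron leaves w alone, while Pegasos both shrinks w and moves it toward the point, which is the "insufficient margin" difference named in the bullet above.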

Much faster than previous methods

• 3 datasets (provided by Joachims)
  – Reuters CCAT (800K examples, 47k features)
  – Physics ArXiv (62k examples, 100k features)
  – Covertype (581k examples, 54 features)

Training time (in seconds):

                  Pegasos   SVM-Perf   SVM-Light
  Reuters            2         77        20,075
  Covertype          6         85        25,514
  Astro-Physics      2          5            80
