

SLIDE 1

Sebastian Nowozin and Christoph Lampert – Structured Models in Computer Vision – Part 5. Structured SVMs

Part 5: Structured Support Vector Machines

Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011

1 / 56


SLIDE 3

Problem (Loss-Minimizing Parameter Learning)

Let d(x, y) be the (unknown) true data distribution. Let D = {(x1, y1), . . . , (xN, yN)} be i.i.d. samples from d(x, y). Let φ : X × Y → R^D be a feature function. Let ∆ : Y × Y → R be a loss function.

◮ Find a weight vector w∗ that leads to minimal expected loss

E_(x,y)∼d(x,y) {∆(y, f(x))}   for   f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Pro:

◮ We directly optimize for the quantity of interest: the expected loss.
◮ No expensive-to-compute partition function Z shows up.

Con:

◮ We need to know the loss function already at training time.
◮ We can't use probabilistic reasoning to find w∗.

3 / 56

SLIDE 4

Reminder: learning by regularized risk minimization

For the compatibility function g(x, y; w) := ⟨w, φ(x, y)⟩, find w∗ that minimizes

E_(x,y)∼d(x,y) ∆(y, argmax_y g(x, y; w)).

Two major problems:

◮ d(x, y) is unknown
◮ argmax_y g(x, y; w) maps into a discrete space
→ ∆(y, argmax_y g(x, y; w)) is discontinuous, piecewise constant

4 / 56

SLIDE 5

Task: min_w E_(x,y)∼d(x,y) ∆(y, argmax_y g(x, y; w)).

Problem 1:

◮ d(x, y) is unknown

Solution:

◮ Replace E_(x,y)∼d(x,y)[·] with the empirical estimate (1/N) Σ_(xn,yn) [·]
◮ To avoid overfitting: add a regularizer, e.g. λ‖w‖².

New task:

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ∆(yn, argmax_y g(xn, y; w)).

5 / 56

SLIDE 6

Task: min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ∆(yn, argmax_y g(xn, y; w)).

Problem 2:

◮ ∆(y, argmax_y g(x, y; w)) is discontinuous with respect to w.

Solution:

◮ Replace ∆(y, y′) with a well-behaved surrogate ℓ(x, y, w)
◮ Typically: ℓ is an upper bound to ∆, continuous and convex with respect to w.

New task:

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ℓ(xn, yn, w)

6 / 56

SLIDE 7

Regularized Risk Minimization

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ℓ(xn, yn, w)

Regularization + Loss on training data

7 / 56

SLIDE 8

Regularized Risk Minimization

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ℓ(xn, yn, w)

Regularization + Loss on training data

Hinge loss: maximum margin training

ℓ(xn, yn, w) := max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

8 / 56
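In code, this loss is a maximum of finitely many affine functions of w. A minimal sketch for a toy multiclass setup (the feature map, data, and all names below are invented for illustration, not from the slides):

```python
import numpy as np

def structured_hinge_loss(w, phi, delta, x_n, y_n, labels):
    """max_y [ Delta(y_n, y) + <w, phi(x,y)> - <w, phi(x,y_n)> ]."""
    score_true = w @ phi(x_n, y_n)
    return max(delta(y_n, y) + w @ phi(x_n, y) - score_true for y in labels)

# Toy multiclass setup: Y = {0, 1, 2}, phi(x, y) stacks x into slot y.
K, D = 3, 2
def phi(x, y):
    out = np.zeros(K * D)
    out[y * D:(y + 1) * D] = x
    return out

delta = lambda y, yp: float(y != yp)  # 0/1 loss
w = np.zeros(K * D)
x_n, y_n = np.array([1.0, -1.0]), 0
print(structured_hinge_loss(w, phi, delta, x_n, y_n, range(K)))  # with w = 0: loss = 1.0
```

Since the y = yn term contributes exactly 0, the maximum is never negative, matching the upper-bound property proven on the next slide.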
SLIDE 9

Regularized Risk Minimization

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ℓ(xn, yn, w)

Regularization + Loss on training data

Hinge loss: maximum margin training

ℓ(xn, yn, w) := max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

◮ ℓ is a maximum over linear functions → continuous, convex.
◮ ℓ bounds ∆ from above.

Proof: Let ȳ = argmax_y g(xn, y, w). Since ȳ maximizes g, we have g(xn, ȳ, w) − g(xn, yn, w) ≥ 0, so

∆(yn, ȳ) ≤ ∆(yn, ȳ) + g(xn, ȳ, w) − g(xn, yn, w)
         ≤ max_{y∈Y} [ ∆(yn, y) + g(xn, y, w) − g(xn, yn, w) ]

9 / 56
SLIDE 10

Regularized Risk Minimization

min_w  λ‖w‖² + (1/N) Σ_{n=1}^{N} ℓ(xn, yn, w)

Regularization + Loss on training data

Hinge loss: maximum margin training

ℓ(xn, yn, w) := max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

Alternative:

Logistic loss: probabilistic training

ℓ(xn, yn, w) := log Σ_{y∈Y} exp( ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ )

10 / 56
SLIDE 11

Structured Output Support Vector Machine

min_w  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

Conditional Random Field

min_w  ‖w‖²/(2σ²) + Σ_{n=1}^{N} log Σ_{y∈Y} exp( ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ )

CRFs and SSVMs have more in common than usually assumed:

◮ both do regularized risk minimization
◮ log Σ_y exp(·) can be interpreted as a soft-max

11 / 56
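The soft-max remark can be made concrete: log Σ_y exp(s_y) is a smooth upper bound on max_y s_y, and with a temperature T it sharpens to the hard max as T → 0. A small numeric sketch (the scores are arbitrary):

```python
import math

scores = [2.0, -1.0, 0.5]  # per-label values inside max / log-sum-exp

hard_max = max(scores)
soft_max = math.log(sum(math.exp(s) for s in scores))
print(hard_max, round(soft_max, 3))  # soft_max >= hard_max

# T * log sum exp(s / T) approaches max(s) as T -> 0:
for T in (1.0, 0.1, 0.01):
    print(T, T * math.log(sum(math.exp(s / T) for s in scores)))
```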

SLIDE 12

Solving the Training Optimization Problem Numerically

Structured Output Support Vector Machine:

min_w  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

Unconstrained optimization, convex, non-differentiable objective.

12 / 56

SLIDE 13

Structured Output SVM (equivalent formulation):

min_{w,ξ}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N,

max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ] ≤ ξn

N non-linear constraints, convex, differentiable objective.

13 / 56

SLIDE 14

Structured Output SVM (also equivalent formulation):

min_{w,ξ}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N,

∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ≤ ξn,   for all y ∈ Y

N·|Y| linear constraints, convex, differentiable objective.

14 / 56

SLIDE 15

Example: Multiclass SVM

◮ Y = {1, 2, . . . , K},   ∆(y, y′) = 1 for y ≠ y′, 0 otherwise.
◮ φ(x, y) = ( ⟦y = 1⟧ φ(x), ⟦y = 2⟧ φ(x), . . . , ⟦y = K⟧ φ(x) )

Solve:

min_{w,ξ}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N,

⟨w, φ(xn, yn)⟩ − ⟨w, φ(xn, y)⟩ ≥ 1 − ξn   for all y ∈ Y \ {yn}.

Classification: f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Crammer–Singer Multiclass SVM

[K. Crammer, Y. Singer: "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines", JMLR, 2001]

15 / 56

SLIDE 16

Example: Hierarchical SVM

Hierarchical Multiclass Loss: ∆(y, y′) := (1/2)·(distance in tree)
∆(cat, cat) = 0, ∆(cat, dog) = 1, ∆(cat, bus) = 2, etc.

Solve:

min_{w,ξ}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N,

⟨w, φ(xn, yn)⟩ − ⟨w, φ(xn, y)⟩ ≥ ∆(yn, y) − ξn   for all y ∈ Y.

[L. Cai, T. Hofmann: "Hierarchical Document Categorization with Support Vector Machines", ACM CIKM, 2004]
[A. Binder, K.-R. Müller, M. Kawanabe: "On taxonomies for multi-class image categorization", IJCV, 2011]

16 / 56

SLIDE 17

Solving the Training Optimization Problem Numerically

We can solve S-SVM training like CRF training:

min_w  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} max_{y∈Y} [ ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩ ]

◮ continuous
◮ unconstrained
◮ convex
◮ non-differentiable

→ we can't use gradient descent directly.
→ we'll have to use subgradients.

17 / 56

SLIDE 18

Definition

Let f : R^D → R be a convex, not necessarily differentiable, function. A vector v ∈ R^D is called a subgradient of f at w0 if

f(w) ≥ f(w0) + ⟨v, w − w0⟩   for all w.

[Figure: a convex function f(w) with a supporting linear lower bound f(w0) + ⟨v, w − w0⟩ touching it at w0]

18 / 56


SLIDE 21

Definition

Let f : R^D → R be a convex, not necessarily differentiable, function. A vector v ∈ R^D is called a subgradient of f at w0 if

f(w) ≥ f(w0) + ⟨v, w − w0⟩   for all w.

[Figure: at a kink of f, several supporting lines f(w0) + ⟨v, w − w0⟩ exist, one per subgradient v]

For differentiable f, the gradient v = ∇f(w0) is the only subgradient.

21 / 56
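The defining inequality is easy to test numerically. A 1-D sketch for f(w) = |w|, where every v ∈ [−1, 1] is a subgradient at the kink w0 = 0 (helper names are invented for illustration):

```python
def is_subgradient(f, v, w0, test_points):
    """Check f(w) >= f(w0) + v * (w - w0) on a grid (1-D case)."""
    return all(f(w) >= f(w0) + v * (w - w0) - 1e-12 for w in test_points)

f = abs
grid = [i / 10 for i in range(-50, 51)]
print(is_subgradient(f, 0.5, 0.0, grid))   # True: any v in [-1, 1] works at the kink
print(is_subgradient(f, 1.5, 0.0, grid))   # False: the line cuts above f somewhere
```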

SLIDE 22

Subgradient descent works basically like gradient descent:

Subgradient Descent Minimization – minimize F(w)

◮ require: tolerance ε > 0, stepsizes ηt
◮ wcur ← 0
◮ repeat
  ◮ v ← any subgradient of F at wcur
  ◮ wcur ← wcur − ηt v
◮ until F changed less than ε
◮ return wcur

Converges to the global minimum, but rather inefficient if F is non-differentiable.

[Shor, "Minimization methods for non-differentiable functions", Springer, 1985.]

22 / 56
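A minimal 1-D instance of the loop above, minimizing the non-differentiable F(w) = |w − 3| (the function and the stepsize schedule ηt = 1/t are illustrative choices):

```python
def subgradient_descent(subgrad, w0=0.0, T=2000):
    """Fixed-iteration subgradient descent with diminishing stepsizes eta_t = 1/t."""
    w = w0
    for t in range(1, T + 1):
        v = subgrad(w)          # any subgradient at the current point
        w = w - (1.0 / t) * v
    return w

# F(w) = |w - 3|: subgradient is sign(w - 3), anything in [-1, 1] at the kink.
subgrad = lambda w: 1.0 if w > 3.0 else (-1.0 if w < 3.0 else 0.0)
w_star = subgradient_descent(subgrad)
print(round(w_star, 2))  # → 3.0
```

The iterate overshoots the kink and oscillates around it, but the oscillation amplitude shrinks with the stepsize, which is why diminishing stepsizes are needed for non-differentiable objectives.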

SLIDE 23

Computing a subgradient:

min_w  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ℓⁿ(w)

with ℓⁿ(w) = max_y ℓⁿ_y(w), and

ℓⁿ_y(w) := ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩

[Figure: each ℓⁿ_y is a line in w; ℓⁿ is their pointwise maximum]

For each y ∈ Y, ℓⁿ_y(w) is a linear function, so ℓⁿ(w) = max_y ℓⁿ_y(w) is a convex, piecewise-linear maximum over all y ∈ Y.

23 / 56


SLIDE 30

Computing a subgradient:

min_w  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ℓⁿ(w)

with ℓⁿ(w) = max_y ℓⁿ_y(w), and

ℓⁿ_y(w) := ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩

[Figure: at w0, the line of the maximizing y touches ℓⁿ; its slope is a subgradient]

Subgradient of ℓⁿ at w0: find a maximal (active) y, then use v = ∇ℓⁿ_y(w0) = φ(xn, y) − φ(xn, yn).

30 / 56

SLIDE 31

Subgradient Descent S-SVM Training

input training pairs {(x1, y1), . . . , (xN, yN)} ⊂ X × Y,
input feature map φ(x, y), loss function ∆(y, y′), regularizer C,
input number of iterations T, stepsizes ηt for t = 1, . . . , T

1: w ← 0
2: for t = 1, . . . , T do
3:   for n = 1, . . . , N do
4:     ŷ ← argmax_{y∈Y} ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩
5:     vn ← φ(xn, ŷ) − φ(xn, yn)
6:   end for
7:   w ← w − ηt (w + (C/N) Σ_n vn)
8: end for
output prediction function f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Observation: each update of w needs 1 argmax-prediction per example.

31 / 56
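A runnable sketch of this loop on a toy two-class problem (the feature map, data, and the stepsize ηt = 1/t are illustrative choices, not from the slides; the update subtracts the full subgradient w + (C/N) Σn vn of the objective):

```python
import numpy as np

def ssvm_subgradient_train(X, Y, phi, delta, labels, C=1.0, T=200):
    """Subgradient descent on 1/2 ||w||^2 + C/N sum_n max_y [Delta + <w, dphi>]."""
    N = len(X)
    w = np.zeros(len(phi(X[0], Y[0])))
    for t in range(1, T + 1):
        V = np.zeros_like(w)
        for x_n, y_n in zip(X, Y):
            # loss-augmented argmax: the active y of example n
            y_hat = max(labels, key=lambda y: delta(y_n, y) + w @ phi(x_n, y))
            # its subgradient contribution v_n
            V += phi(x_n, y_hat) - phi(x_n, y_n)
        # step along the negative subgradient, eta_t = 1/t
        w -= (1.0 / t) * (w + (C / N) * V)
    return w

# Toy problem: two classes of 1-D points, phi stacks x into slot y.
phi = lambda x, y: np.array([x, 0.0]) if y == 0 else np.array([0.0, x])
delta = lambda y, yp: float(y != yp)
X, Y = [1.0, 2.0, -1.0, -2.0], [0, 0, 1, 1]
w = ssvm_subgradient_train(X, Y, phi, delta, labels=range(2))
pred = lambda x: max(range(2), key=lambda y: w @ phi(x, y))
print([pred(x) for x in X])  # → [0, 0, 1, 1]
```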

SLIDE 32

We can use the same tricks as for CRFs, e.g. stochastic updates:

Stochastic Subgradient Descent S-SVM Training

input training pairs {(x1, y1), . . . , (xN, yN)} ⊂ X × Y,
input feature map φ(x, y), loss function ∆(y, y′), regularizer C,
input number of iterations T, stepsizes ηt for t = 1, . . . , T

1: w ← 0
2: for t = 1, . . . , T do
3:   (xn, yn) ← randomly chosen training example pair
4:   ŷ ← argmax_{y∈Y} ∆(yn, y) + ⟨w, φ(xn, y)⟩ − ⟨w, φ(xn, yn)⟩
5:   w ← w − ηt (w + (C/N) [φ(xn, ŷ) − φ(xn, yn)])
6: end for
output prediction function f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Observation: each update of w needs only 1 argmax-prediction (but we'll need many iterations until convergence).

32 / 56

SLIDE 33

Solving the Training Optimization Problem Numerically

We can solve an S-SVM like a linear SVM. One of the equivalent formulations was:

min_{w∈R^D, ξ∈R^N_+}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N,

⟨w, φ(xn, yn)⟩ − ⟨w, φ(xn, y)⟩ ≥ ∆(yn, y) − ξn,   for all y ∈ Y.

Introduce feature vectors δφ(xn, yn, y) := φ(xn, yn) − φ(xn, y).

33 / 56


SLIDE 36

Solve

min_{w∈R^D, ξ∈R^N_+}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N, for all y ∈ Y,

⟨w, δφ(xn, yn, y)⟩ ≥ ∆(yn, y) − ξn.

This has the same structure as an ordinary SVM!

◮ quadratic objective
◮ linear constraints

Question: Can't we use an ordinary SVM/QP solver?
Answer: Almost! We could, if there weren't N·|Y| constraints.

◮ E.g. 100 binary 16 × 16 images: ≈ 10^79 constraints

36 / 56


SLIDE 39

Solution: working set training

◮ It's enough if we enforce the active constraints; the others will be fulfilled automatically.
◮ We don't know which ones are active for the optimal solution.
◮ But it's likely to be only a small number (this can of course be formalized).

Keep a set of potentially active constraints and update it iteratively:

Working Set Training

◮ Start with working set S = ∅ (no constraints)
◮ Repeat until convergence:
  ◮ Solve the S-SVM training problem with only the constraints from S
  ◮ Check whether the solution violates any constraint of the full set
  ◮ if no: we found the optimal solution, terminate.
  ◮ if yes: add the most violated constraints to S, iterate.

Good practical performance and theoretical guarantees:

◮ polynomial-time convergence to within ε of the global optimum

[Tsochantaridis et al.: "Large Margin Methods for Structured and Interdependent Output Variables", JMLR, 2005.]

39 / 56

SLIDE 40

Working Set S-SVM Training

input training pairs {(x1, y1), . . . , (xN, yN)} ⊂ X × Y,
input feature map φ(x, y), loss function ∆(y, y′), regularizer C

1: S ← ∅
2: repeat
3:   (w, ξ) ← solution to the QP with only the constraints from S
4:   for n = 1, . . . , N do
5:     ŷ ← argmax_{y∈Y} ∆(yn, y) + ⟨w, φ(xn, y)⟩
6:     if ŷ ≠ yn then
7:       S ← S ∪ {(xn, ŷ)}
8:     end if
9:   end for
10: until S doesn't change anymore.
output prediction function f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Observation: each update of w needs 1 argmax-prediction per example (but we solve globally for the next w, not by local steps).

40 / 56
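The working-set loop can be sketched end-to-end. For illustration, the exact QP solve of line 3 is replaced by an approximate subgradient minimizer of the restricted problem; that substitution, the toy data, and the feature map are all invented for this example:

```python
import numpy as np

def solve_restricted(S, data, phi, delta, C, D, iters=500):
    """Approximate stand-in for the restricted QP: subgradient descent on
    1/2||w||^2 + C/N sum_n max(0, max_{y in S[n]} [Delta + <w, phi(x,y) - phi(x,y_n)>])."""
    N, w = len(data), np.zeros(D)
    for t in range(1, iters + 1):
        g = w.copy()                      # gradient of the regularizer
        for n, (x_n, y_n) in enumerate(data):
            cands = S.get(n, set())
            if not cands:
                continue
            y_hat = max(cands, key=lambda y: delta(y_n, y) + w @ phi(x_n, y))
            if delta(y_n, y_hat) + w @ (phi(x_n, y_hat) - phi(x_n, y_n)) > 0:
                g += (C / N) * (phi(x_n, y_hat) - phi(x_n, y_n))  # hinge active
        w -= (1.0 / t) * g
    return w

def working_set_train(data, phi, delta, labels, C=1.0, max_rounds=20):
    D = len(phi(data[0][0], data[0][1]))
    S, w = {}, np.zeros(D)
    for _ in range(max_rounds):
        w = solve_restricted(S, data, phi, delta, C, D)
        changed = False
        for n, (x_n, y_n) in enumerate(data):
            y_hat = max(labels, key=lambda y: delta(y_n, y) + w @ phi(x_n, y))
            if y_hat != y_n and y_hat not in S.get(n, set()):
                S.setdefault(n, set()).add(y_hat)   # add violated constraint
                changed = True
        if not changed:
            break                                   # S stopped changing: done
    return w

# Toy problem: two 1-D classes, phi stacks x into slot y.
phi = lambda x, y: np.array([x, 0.0]) if y == 0 else np.array([0.0, x])
delta = lambda y, yp: float(y != yp)
data = [(1.0, 0), (2.0, 0), (-1.0, 1), (-2.0, 1)]
w = working_set_train(data, phi, delta, labels=range(2))
print([max(range(2), key=lambda y: w @ phi(x, y)) for x, _ in data])  # → [0, 0, 1, 1]
```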


SLIDE 42

One-Slack Formulation of S-SVM (equivalent to the ordinary S-SVM formulation by ξ = (1/N) Σ_n ξn):

min_{w∈R^D, ξ∈R_+}  (1/2)‖w‖² + Cξ

subject to, for all (ŷ1, . . . , ŷN) ∈ Y × · · · × Y,

Σ_{n=1}^{N} [ ∆(yn, ŷn) + ⟨w, φ(xn, ŷn)⟩ − ⟨w, φ(xn, yn)⟩ ] ≤ Nξ.

|Y|^N linear constraints, convex, differentiable objective. We blew up the constraint set even further:

◮ 100 binary 16 × 16 images: ≈ 10^7700 constraints (instead of ≈ 10^79).

42 / 56

SLIDE 43

Working Set One-Slack S-SVM Training

input training pairs {(x1, y1), . . . , (xN, yN)} ⊂ X × Y,
input feature map φ(x, y), loss function ∆(y, y′), regularizer C

1: S ← ∅
2: repeat
3:   (w, ξ) ← solution to the QP with only the constraints from S
4:   for n = 1, . . . , N do
5:     ŷn ← argmax_{y∈Y} ∆(yn, y) + ⟨w, φ(xn, y)⟩
6:   end for
7:   S ← S ∪ { ((x1, . . . , xN), (ŷ1, . . . , ŷN)) }
8: until S doesn't change anymore.
output prediction function f(x) = argmax_{y∈Y} ⟨w, φ(x, y)⟩.

Often faster convergence: we add one strong constraint per iteration instead of N weak ones.

43 / 56
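Line 5 applied to every example yields the single joint constraint (ŷ1, . . . , ŷN) added per iteration. A sketch of that cutting-plane computation (the toy setup and all names are invented for illustration):

```python
import numpy as np

def most_violated_joint_constraint(w, data, phi, delta, labels):
    """One-slack cutting plane: per-example loss-augmented argmax,
    collected into a single joint constraint (y_hat_1, ..., y_hat_N)."""
    y_hats, margin_sum = [], 0.0
    for x_n, y_n in data:
        y_hat = max(labels, key=lambda y: delta(y_n, y) + w @ phi(x_n, y))
        y_hats.append(y_hat)
        margin_sum += delta(y_n, y_hat) + w @ (phi(x_n, y_hat) - phi(x_n, y_n))
    return tuple(y_hats), float(margin_sum)  # constraint: margin_sum <= N * xi

phi = lambda x, y: np.array([x, 0.0]) if y == 0 else np.array([0.0, x])
delta = lambda y, yp: float(y != yp)
data = [(1.0, 0), (-1.0, 1)]
w = np.zeros(2)
print(most_violated_joint_constraint(w, data, phi, delta, range(2)))  # → ((1, 0), 2.0)
```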

SLIDE 44

We can solve an S-SVM like a non-linear SVM: compute the Lagrangian dual

◮ min becomes max,
◮ the original (primal) variables w, ξ disappear,
◮ new (dual) variables αny: one per constraint of the original problem.

Dual S-SVM problem

max_{α∈R^{N|Y|}_+}  Σ_{n=1,...,N} Σ_{y∈Y} αny ∆(yn, y) − (1/2) Σ_{y,ȳ∈Y} Σ_{n,n̄=1,...,N} αny αn̄ȳ ⟨δφ(xn, yn, y), δφ(xn̄, yn̄, ȳ)⟩

subject to, for n = 1, . . . , N,   Σ_{y∈Y} αny ≤ C/N.

N linear constraints, convex, differentiable objective, N·|Y| variables.

44 / 56

SLIDE 45

We can kernelize:

◮ Define a joint kernel function k : (X × Y) × (X × Y) → R,

k((x, y), (x̄, ȳ)) = ⟨φ(x, y), φ(x̄, ȳ)⟩.

◮ k measures similarity between two (input, output)-pairs.
◮ We can express the optimization in terms of k:

⟨δφ(xn, yn, y), δφ(xn̄, yn̄, ȳ)⟩
  = ⟨φ(xn, yn) − φ(xn, y), φ(xn̄, yn̄) − φ(xn̄, ȳ)⟩
  = ⟨φ(xn, yn), φ(xn̄, yn̄)⟩ − ⟨φ(xn, yn), φ(xn̄, ȳ)⟩ − ⟨φ(xn, y), φ(xn̄, yn̄)⟩ + ⟨φ(xn, y), φ(xn̄, ȳ)⟩
  = k((xn, yn), (xn̄, yn̄)) − k((xn, yn), (xn̄, ȳ)) − k((xn, y), (xn̄, yn̄)) + k((xn, y), (xn̄, ȳ))
  =: K_{nn̄yȳ}

45 / 56
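For any explicit feature map, the four-term expansion can be checked numerically (the random table below stands in for φ and is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
phi_table = rng.standard_normal((2, 3, 4))   # phi(x, y) for x in {0,1}, y in {0,1,2}
phi = lambda x, y: phi_table[x, y]
k = lambda xy, xy2: float(phi(*xy) @ phi(*xy2))   # joint kernel <phi, phi>

def dphi(x, y_true, y):
    return phi(x, y_true) - phi(x, y)

xn, yn, y = 0, 1, 2
xm, ym, yb = 1, 0, 1
lhs = float(dphi(xn, yn, y) @ dphi(xm, ym, yb))
rhs = (k((xn, yn), (xm, ym)) - k((xn, yn), (xm, yb))
       - k((xn, y), (xm, ym)) + k((xn, y), (xm, yb)))
print(abs(lhs - rhs) < 1e-12)  # True
```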

SLIDE 46

Kernelized S-SVM problem:

max_{α∈R^{N|Y|}_+}  Σ_{n=1,...,N} Σ_{y∈Y} αny ∆(yn, y) − (1/2) Σ_{y,ȳ∈Y} Σ_{n,n̄=1,...,N} αny αn̄ȳ K_{nn̄yȳ}

subject to, for n = 1, . . . , N,   Σ_{y∈Y} αny ≤ C/N.

◮ too many variables: train with a working set of αny.

Kernelized prediction function:

f(x) = argmax_{y∈Y} Σ_{n,y′} αny′ [ k((xn, yn), (x, y)) − k((xn, y′), (x, y)) ]

46 / 56

SLIDE 47

What do "joint kernel functions" look like?

k((x, y), (x̄, ȳ)) = ⟨φ(x, y), φ(x̄, ȳ)⟩.

As in graphical models, things are easier if φ decomposes w.r.t. factors:

◮ φ(x, y) = ( φF(x, yF) )_{F∈F}

Then the kernel k decomposes into a sum over factors:

k((x, y), (x̄, ȳ)) = ⟨ (φF(x, yF))_{F∈F}, (φF(x̄, ȳF))_{F∈F} ⟩
  = Σ_{F∈F} ⟨φF(x, yF), φF(x̄, ȳF)⟩
  = Σ_{F∈F} kF((x, yF), (x̄, ȳF))

We can define kernels for each factor (e.g. nonlinear).

47 / 56

SLIDE 48

Example: figure-ground segmentation with grid structure. (x, y) = (image, binary segmentation mask)

Typical kernels: arbitrary in x, linear (or at least simple) w.r.t. y:

◮ Unary factors:

kp((xp, yp), (x′p, y′p)) = k(xp, x′p) ⟦yp = y′p⟧

with k(xp, x′p) a local image kernel, e.g. χ² or histogram intersection

◮ Pairwise factors:

kpq((yp, yq), (y′p, y′q)) = ⟦yp = y′p⟧ ⟦yq = y′q⟧

More powerful than all-linear, and argmax-prediction is still possible.

48 / 56

SLIDE 49

Example: object localization. (x, y) = (image, bounding box given by left, top, right, bottom coordinates)

Only one factor, which includes all of x and y:

k((x, y), (x′, y′)) = kimage(x|y, x′|y′)

with kimage an image kernel and x|y the image region within box y. argmax-prediction is as difficult as object localization with a kimage-SVM.

49 / 56


SLIDE 51

Summary – S-SVM Learning

Given:

◮ training set {(x1, y1), . . . , (xN, yN)} ⊂ X × Y
◮ loss function ∆ : Y × Y → R.

Task: learn a parameter vector w for f(x) := argmax_y ⟨w, φ(x, y)⟩ that minimizes the expected loss on future data.

The S-SVM solution is derived from the maximum-margin framework:

◮ enforce the correct output to be better than all others by a margin:

⟨w, φ(xn, yn)⟩ ≥ ∆(yn, y) + ⟨w, φ(xn, y)⟩   for all y ∈ Y.

◮ convex optimization problem, but non-differentiable
◮ many equivalent formulations → different training algorithms
◮ training needs repeated argmax prediction, but no probabilistic inference

51 / 56

SLIDE 52

Extra I: Beyond Fully Supervised Learning

So far, training was fully supervised: all variables were observed. In real life, some variables are unobserved even during training:

◮ missing labels in training data
◮ latent variables, e.g. part location
◮ latent variables, e.g. part occlusion
◮ latent variables, e.g. viewpoint

52 / 56


SLIDE 54

Three types of variables:

◮ x ∈ X always observed,
◮ y ∈ Y observed only during training,
◮ z ∈ Z never observed (latent).

Decision function: f(x) = argmax_{y∈Y} max_{z∈Z} ⟨w, φ(x, y, z)⟩

Maximum Margin Training with Maximization over Latent Variables

Solve:

min_{w,ξ}  (1/2)‖w‖² + (C/N) Σ_{n=1}^{N} ξn

subject to, for n = 1, . . . , N, for all y ∈ Y,

∆(yn, y) + max_{z∈Z} ⟨w, φ(xn, y, z)⟩ − max_{z∈Z} ⟨w, φ(xn, yn, z)⟩ ≤ ξn

Problem: this is not a convex problem → it can have local minima

[C. Yu, T. Joachims: "Learning Structural SVMs with Latent Variables", ICML, 2009]
similar idea: [Felzenszwalb, McAllester, Ramanan: "A Discriminatively Trained, Multiscale, Deformable Part Model", CVPR 2008]

54 / 56

SLIDE 55

Structured Learning is full of Open Research Questions

◮ How to train faster?
  ◮ CRFs need many runs of probabilistic inference,
  ◮ SSVMs need many runs of argmax-prediction.
◮ How to reduce the necessary amount of training data?
  ◮ semi-supervised learning? transfer learning?
◮ How can we better understand different loss functions?
  ◮ when to use probabilistic training, when maximum margin?
  ◮ CRFs are "consistent", SSVMs are not. Is this relevant?
◮ Can we understand structured learning with approximate inference?
  ◮ often computing ∇L(w) or argmax_y ⟨w, φ(x, y)⟩ exactly is infeasible.
  ◮ can we guarantee good results even with approximate inference?
◮ More and new applications!

55 / 56

SLIDE 56

Lunch-Break

Continuing at 13:30. Slides available at http://www.nowozin.net/sebastian/cvpr2011tutorial/

56 / 56