Support vector machines and kernels Thurs April 20 Kristen Grauman - PDF document

4/20/2017 Support vector machines and kernels Thurs April 20 Kristen Grauman UT Austin Last time • Sliding window object detection wrap-up • Attentional cascade • Applications / examples • Pros and cons • Supervised classification continued • Nearest neighbors Today • Supervised classification continued • Nearest neighbors (wrap up) • Support vector machines • HoG pedestrians example • Kernels • Multi-class from binary classifiers • Pyramid match kernels • Evaluation • Scoring an object detector • Scoring a multi-class recognition system 1

4/20/2017 Nearest Neighbor classification • Assign label of nearest training data point to each test data point Black = negative Novel test example Red = positive Closest to a positive example from the training set, so classify it as positive. from Duda et al. Voronoi partitioning of feature space for 2-category 2D data K-Nearest Neighbors classification • For a new point, find the k closest points from training data • Labels of the k points “vote” to classify k = 5 Black = negative If query lands here, the 5 Red = positive NN consist of 3 negatives and 2 positives, so we classify it as negative. Source: D. Lowe Window-based models: Three case studies Boosting + face SVM + person NN + scene Gist detection classification detection e.g., Hays & Efros e.g., Dalal & Triggs Viola & Jones 2

4/20/2017 Where in the World? [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] 6+ million geotagged photos by 109,788 photographers Annotated by Flickr users Which scene properties are relevant? • Gist scene descriptor • Color Histograms - L*A*B* 4x14x14 histograms • Texton Histograms – 512 entry, filter bank based • Line Features – Histograms of straight line stats 3

4/20/2017 Im2gps: Scene Matches [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] Im2gps: Scene Matches [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] 4

4/20/2017 [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] Scene Matches [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] 5

4/20/2017 Quantitative Evaluation Test Set … The Importance of Data [Hays and Efros. im2gps : Estimating Geographic Information from a Single Image. CVPR 2008.] Nearest neighbors: pros and cons • Pros : – Simple to implement – Flexible to feature / distance choices – Naturally handles multi-class cases – Can do well in practice with enough representative data • Cons: – Large search problem to find nearest neighbors – Storage of data – Must know we have a meaningful distance function Kristen Grauman 6

4/20/2017 Window-based models: Three case studies Boosting + face NN + scene Gist SVM + person detection detection classification e.g., Hays & Efros e.g., Dalal & Triggs Viola & Jones Linear classifiers Linear classifiers • Find linear function to separate positive and negative examples    x positive : x w b 0 i i    x negative : x w b 0 i i Which line is best? 7

4/20/2017 Support Vector Machines (SVMs) • Discriminative classifier based on optimal separating line (for 2d case) • Maximize the margin between the positive and negative training examples Support vector machines • Want line that maximizes the margin.     x positive ( y 1) : x w b 1 i i i       x negative ( y 1) : x w b 1 i i i x i w   b   1 For support, vectors, Support vectors Margin C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998 Support vector machines • Want line that maximizes the margin.     x positive ( y 1) : x w b 1 i i i       x negative ( y 1) : x w b 1 i i i     x i w b 1 For support, vectors,   | x w b | Distance between point i and line: || w || For support vectors:  b  w Τ x 1 1  1 2  M    Support vectors Margin M w w w w w 8

4/20/2017 Support vector machines • Want line that maximizes the margin.     x positive ( y 1) : x w b 1 i i i       x negative ( y 1) : x w b 1 i i i     x i w b 1 For support, vectors, | x  w  b | Distance between point i and line: || w || Therefore, the margin is 2 / || w || Support vectors Margin M Finding the maximum margin line 1. Maximize margin 2/|| w || 2. Correctly classify all training data points:     x positive ( y 1) : x w b 1 i i i       x negative ( y 1) : x w b 1 i i i Quadratic optimization problem : 1 w T w Minimize 2 Subject to y i ( w · x i + b ) ≥ 1 Finding the maximum margin line    w i y x • Solution: i i i learned Support weight vector C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998 9

4/20/2017 Finding the maximum margin line    w i y x • Solution: i i i b = y i – w · x i (for any support vector)        w x b y x x b i i i i • Classification function:    f ( x ) sign ( w x b)        sign y x x b i i i i If f(x) < 0, classify as negative, if f(x) > 0, classify as positive C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998 • CVPR 2005 • 18,317 citations HoG descriptor Dalal & Triggs, CVPR 2005 10

4/20/2017 Person detection with HoG’s & linear SVM’s • Map each grid cell in the input window to a histogram counting the gradients per orientation. • Train a linear SVM using training set of pedestrian vs. non-pedestrian windows. Dalal & Triggs, CVPR 2005 Person detection with HoGs & linear SVMs • Histograms of Oriented Gradients for Human Detection, Navneet Dalal, Bill Triggs, International Conference on Computer Vision & Pattern Recognition - June 2005 • http://lear.inrialpes.fr/pubs/2005/DT05/ Understanding classifier mistakes 11

4/20/2017 Carl Vondrick http://web.mit.edu/vondrick/ihog/slides.pdf HOGgles: Visualizing Object Detection Features Carl Vondrick, MIT; Aditya Khosla; Tomasz Malisiewicz; Antonio Torralba, MIT http://web.mit.edu/vondrick/ihog/slides.pdf HOGGLES: Visualizing Object Detection Features HOGgles: Visualizing Object Detection Features Carl Vondrick, MIT; Aditya Khosla; Tomasz Malisiewicz; Antonio Torralba, MIT http://web.mit.edu/vondrick/ihog/slides.pdf 12

4/20/2017 HOGGLES: Visualizing Object Detection Features HOGgles: Visualizing Object Detection Features Carl Vondrick, MIT; Aditya Khosla; Tomasz Malisiewicz; Antonio Torralba, MIT http://web.mit.edu/vondrick/ihog/slides.pdf HOGGLES: Visualizing Object Detection Features HOGgles: Visualizing Object Detection Features; Carl Vondrick, MIT; Aditya Khosla; Tomasz Malisiewicz; Antonio Torralba, MIT http://web.mit.edu/vondrick/ihog/slides.pdf HOGGLES: Visualizing Object Detection Features HOGgles: Visualizing Object Detection Features; ICCV 2013 Carl Vondrick, MIT; Aditya Khosla; Tomasz Malisiewicz; Antonio Torralba, MIT http://web.mit.edu/vondrick/ihog/slides.pdf 13

4/20/2017 HOGGLES: Visualizing Object Detection Features http://web.mit.edu/vondrick/ihog/ Questions • What if the data is not linearly separable? Non-linear SVMs  Datasets that are linearly separable with some noise work out great: x 0  But what are we going to do if the dataset is just too hard? x 0  How about … mapping data to a higher-dimensional space: x 2 0 x 14

4/20/2017 Non-linear SVMs: feature spaces  General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: Φ: x→ φ( x ) Slide from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html Nonlinear SVMs • The kernel trick : instead of explicitly computing the lifting transformation φ ( x ), define a kernel function K such that K ( x i , x j j ) = φ ( x i ) · φ ( x j ) • This gives a nonlinear decision boundary in the original feature space:    y K ( x , x ) b i i i i “Kernel trick”: Example 2-dimensional vectors x=[ x 1 x 2 ]; let K (x i ,x j )=(1 + x i T x j ) 2 Need to show that K (x i ,x j )= φ(x i ) T φ(x j ): K (x i ,x j )=(1 + x i T x j ) 2 , = 1+ x i1 2 x j1 2 + 2 x i1 x j1 x i2 x j2 + x i2 2 x j2 2 + 2 x i1 x j1 + 2 x i2 x j2 = [1 x i1 2 √ 2 x i1 x i2 x i2 2 √ 2 x i1 √ 2 x i2 ] T [1 x j1 2 √ 2 x j1 x j2 x j2 2 √ 2 x j1 √ 2 x j2 ] = φ(x i ) T φ(x j ), where φ(x) = [1 x 1 2 √ 2 x 1 x 2 x 2 2 √ 2 x 1 √ 2 x 2 ] 15

Support vector machines and kernels Thurs April 20 Kristen Grauman - PDF document

4/20/2017 Support vector machines and kernels Thurs April 20 Kristen Grauman UT Austin Last time Sliding window object detection wrap-up Attentional cascade Applications / examples Pros and cons Supervised classification

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Kernel Machines Steven J Zeil Old Dominion Univ. Fall 2010 1 Support Vector Machines Kernel

RBF Kernels: Generating a complex dataset DataCamp Support Vector Machines in R A bit about RBF

? 17.10.2018 3 17.10.2018 4 Support Vector Machines (SVM): Background Support Vector Machines

Overview: Kernels for Sequences and Graphs String Kernels 8 Example Sequence Classification

The Gray Code Kernels The Gray Code Kernels The Gray Code Kernels Gil Ben-Artzi Hagit Hel-Or

Support Vector Machines Preview What is a support vector machine? The perceptron revisited

Support Vector Machines October 16, 2018 Support Vector Machines October 16, 2018 1 / 31

Relevance Vector Machines Jukka Lankinen LUT February 21, 2011 Jukka Lankinen Relevance Vector

Support Vector Machines & Kernels Lecture 5 David Sontag New York University Slides adapted

Beta kernels and transformed kernels applications to copulas and quantiles Arthur Charpentier

Vector addition: The zero vector The D -vector whose entries are all zero is the zero vector ,

Kernels on structures Andrea Passerini passerini@disi.unitn.it Machine Learning Kernels on

Support Vector Machines 290N, 2014 Support Vector Machines (SVM) Supervised learning

SUPPORT VECTOR MACHINES SUPPORT VECTOR MACHINES Matthieu R Bloch Tuesday, February 25, 2020 1

Support Vector Machines Support Vector Machines CSC 411 Tutorial April 1, 2015 Tutor: Shenlong

Advanced Natural Language Processing: Background and Overview Regina Barzilay and Michael

Semantic Knowledge Acquisition using Frequency Based Patterns Roy Schwartz and Ari Rappoport

r t t

Wissenschaftliches Arbeiten 193.052, SS 2020, 2.0h (3 ECTS) Philipp Erler

Lessons Learned Lessons Learned From From Lessons Learned Lessons Learned From From

Distributed Representations CMSC 473/673 UMBC September 27 th , 2017 Some slides adapted from

Distributional Lexical Semantics CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT

A New Investment Opportunity 0 7 . 1 0 . 2 0 1 9 E x p o R e a l , M u n i c h Porto Macro

Support vector machines and kernels Thurs April 20 Kristen Grauman - PDF document

4/20/2017 Support vector machines and kernels Thurs April 20 Kristen Grauman UT Austin Last time Sliding window object detection wrap-up Attentional cascade Applications / examples Pros and cons Supervised classification

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Kernel Machines Steven J Zeil Old Dominion Univ. Fall 2010 1 Support Vector Machines Kernel

RBF Kernels: Generating a complex dataset DataCamp Support Vector Machines in R A bit about RBF

? 17.10.2018 3 17.10.2018 4 Support Vector Machines (SVM): Background Support Vector Machines

Overview: Kernels for Sequences and Graphs String Kernels 8 Example Sequence Classification

The Gray Code Kernels The Gray Code Kernels The Gray Code Kernels Gil Ben-Artzi Hagit Hel-Or

Support Vector Machines Preview What is a support vector machine? The perceptron revisited

Support Vector Machines October 16, 2018 Support Vector Machines October 16, 2018 1 / 31

Relevance Vector Machines Jukka Lankinen LUT February 21, 2011 Jukka Lankinen Relevance Vector

Support Vector Machines &amp; Kernels Lecture 5 David Sontag New York University Slides adapted

Beta kernels and transformed kernels applications to copulas and quantiles Arthur Charpentier

Vector addition: The zero vector The D -vector whose entries are all zero is the zero vector ,

Kernels on structures Andrea Passerini passerini@disi.unitn.it Machine Learning Kernels on

Support Vector Machines 290N, 2014 Support Vector Machines (SVM) Supervised learning

SUPPORT VECTOR MACHINES SUPPORT VECTOR MACHINES Matthieu R Bloch Tuesday, February 25, 2020 1

Support Vector Machines Support Vector Machines CSC 411 Tutorial April 1, 2015 Tutor: Shenlong

Advanced Natural Language Processing: Background and Overview Regina Barzilay and Michael

Semantic Knowledge Acquisition using Frequency Based Patterns Roy Schwartz and Ari Rappoport

r t t

Wissenschaftliches Arbeiten 193.052, SS 2020, 2.0h (3 ECTS) Philipp Erler

Lessons Learned Lessons Learned From From Lessons Learned Lessons Learned From From

Distributed Representations CMSC 473/673 UMBC September 27 th , 2017 Some slides adapted from

Distributional Lexical Semantics CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT

A New Investment Opportunity 0 7 . 1 0 . 2 0 1 9 E x p o R e a l , M u n i c h Porto Macro

Support Vector Machines & Kernels Lecture 5 David Sontag New York University Slides adapted