Support vector machines and kernels

Support vector machines and kernels. Thurs April 20, Kristen Grauman, UT Austin.

  1. Support vector machines and kernels
     Thurs April 20, Kristen Grauman, UT Austin

     Last time
     • Sliding window object detection wrap-up
     • Attentional cascade
     • Applications / examples
     • Pros and cons
     • Supervised classification continued
     • Nearest neighbors

     Today
     • Supervised classification continued
     • Nearest neighbors (wrap-up)
     • Support vector machines
     • HoG pedestrians example
     • Kernels
     • Multi-class from binary classifiers
     • Pyramid match kernels
     • Evaluation
     • Scoring an object detector
     • Scoring a multi-class recognition system

  2. Nearest Neighbor classification
     • Assign the label of the nearest training data point to each test data point.
     • Example (from Duda et al.): black = negative, red = positive; a novel test example whose closest training point is positive is classified as positive.
     • Voronoi partitioning of feature space for 2-category 2D data.

     K-Nearest Neighbors classification
     • For a new point, find the k closest points from the training data.
     • The labels of the k points "vote" to classify the new point (see the sketch below).
     • Example with k = 5: if the query's five nearest neighbors are 3 negatives and 2 positives, classify it as negative. (Source: D. Lowe)

     Window-based models: three case studies
     • Boosting + face detection (e.g., Viola & Jones)
     • SVM + person detection (e.g., Dalal & Triggs)
     • NN + scene Gist classification (e.g., Hays & Efros)
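A minimal sketch of the k-NN voting rule above, in Python with NumPy; the toy data, the Euclidean distance, and the +1 / -1 label convention are assumptions made for illustration, not part of the slides.

    import numpy as np

    def knn_classify(query, X_train, y_train, k=5):
        """Classify `query` by a majority vote of its k nearest training points."""
        # Euclidean distance from the query to every training point
        dists = np.linalg.norm(X_train - query, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote over their +1 / -1 labels
        return 1 if y_train[nearest].sum() > 0 else -1

    # Toy 2D data: red = positive (+1), black = negative (-1)
    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                        [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
    y_train = np.array([-1, -1, -1, 1, 1, 1])

    print(knn_classify(np.array([0.95, 0.9]), X_train, y_train, k=5))  # +1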

  3. Where in the World?
     [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]
     • 6+ million geotagged photos by 109,788 photographers, annotated by Flickr users.

     Which scene properties are relevant?
     • Gist scene descriptor
     • Color histograms: L*a*b*, 4x14x14 bins (a sketch of this descriptor follows below)
     • Texton histograms: 512 entries, filter-bank based
     • Line features: histograms of straight-line statistics
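As one concrete example of the descriptors listed above, here is a hedged sketch of an L*a*b* color histogram using scikit-image and NumPy. The bin layout (4 bins for L*, 14 for a*, 14 for b*) is one reading of the "4x14x14" note on the slide, and the channel value ranges are assumptions.

    import numpy as np
    from skimage import io, color

    def lab_color_histogram(image_path, bins=(4, 14, 14)):
        """Joint 3D color histogram over the L*, a*, b* channels."""
        rgb = io.imread(image_path)
        lab = color.rgb2lab(rgb)  # L* in [0, 100]; a*, b* roughly in [-128, 127]
        hist, _ = np.histogramdd(
            lab.reshape(-1, 3),
            bins=bins,
            range=((0, 100), (-128, 127), (-128, 127)),
        )
        hist = hist.ravel()
        return hist / hist.sum()  # normalize so images of different sizes are comparable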

  4. Im2gps: scene matches (example retrievals).
     [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]

  5. Scene matches (further example retrievals).
     [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]

  6. Quantitative evaluation on the test set, and the importance of data.
     [Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]

     Nearest neighbors: pros and cons
     • Pros:
       - Simple to implement
       - Flexible to feature / distance choices
       - Naturally handles multi-class cases
       - Can do well in practice with enough representative data
     • Cons:
       - Large search problem to find nearest neighbors
       - Storage of data
       - Must know we have a meaningful distance function
     (Kristen Grauman)

  7. Window-based models: three case studies
     • Boosting + face detection (e.g., Viola & Jones)
     • SVM + person detection (e.g., Dalal & Triggs)
     • NN + scene Gist classification (e.g., Hays & Efros)

     Linear classifiers
     • Find a linear function to separate the positive and negative examples:
       x_i positive:  x_i · w + b ≥ 0
       x_i negative:  x_i · w + b < 0
     • Which line is best? (See the sketch below: many lines separate the data.)
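To make the "which line is best?" question concrete, this small sketch shows two quite different lines (w, b) that both separate the same toy data; the data and the specific lines are invented for illustration, and the margin criterion on the following slides is what distinguishes between them.

    import numpy as np

    # Toy 2D data: two well-separated clusters
    X = np.array([[0.0, 0.0], [0.5, 0.2], [2.0, 2.0], [2.2, 1.8]])
    y = np.array([-1, -1, 1, 1])

    def separates(w, b):
        """True if sign(x_i . w + b) matches y_i for every training point."""
        return bool(np.all(np.sign(X @ w + b) == y))

    # Both candidate lines classify the training data perfectly
    print(separates(np.array([1.0, 1.0]), -2.5))   # True
    print(separates(np.array([5.0, 0.1]), -6.0))   # True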

  8. Support Vector Machines (SVMs)
     • Discriminative classifier based on the optimal separating line (for the 2D case).
     • Maximize the margin between the positive and negative training examples.

     Support vector machines
     • Want the line that maximizes the margin:
       x_i positive (y_i = +1):  x_i · w + b ≥ 1
       x_i negative (y_i = −1):  x_i · w + b ≤ −1
       For support vectors:  x_i · w + b = ±1
     • Distance between point x_i and the line:  |x_i · w + b| / ||w||
     • For support vectors that distance is 1 / ||w||, so the margin is M = 2 / ||w|| (numeric check below).
     (C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998)
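A quick numeric check of the distance and margin formulas above, with NumPy; the weight vector, bias, and support-vector point are made-up values chosen so that x · w + b = 1.

    import numpy as np

    w = np.array([2.0, 1.0])   # hypothetical weight vector
    b = -3.0                   # hypothetical bias

    def distance_to_boundary(x, w, b):
        """Geometric distance from point x to the hyperplane w . x + b = 0."""
        return abs(np.dot(w, x) + b) / np.linalg.norm(w)

    x_support = np.array([1.0, 2.0])   # w . x + b = 2*1 + 1*2 - 3 = 1, so on the margin
    print(distance_to_boundary(x_support, w, b))   # 1 / ||w||
    print(2.0 / np.linalg.norm(w))                 # margin M = 2 / ||w||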

  9. Support vector machines
     • Want the line that maximizes the margin:
       x_i positive (y_i = +1):  x_i · w + b ≥ 1
       x_i negative (y_i = −1):  x_i · w + b ≤ −1
       For support vectors:  x_i · w + b = ±1
     • Distance between point x_i and the line:  |x_i · w + b| / ||w||
     • Therefore, the margin is 2 / ||w||.

     Finding the maximum margin line
     1. Maximize the margin 2 / ||w||.
     2. Correctly classify all training data points:
        x_i positive (y_i = +1):  x_i · w + b ≥ 1
        x_i negative (y_i = −1):  x_i · w + b ≤ −1
     • Quadratic optimization problem:
       minimize (1/2) wᵀw  subject to  y_i (w · x_i + b) ≥ 1
     • Solution:  w = Σ_i α_i y_i x_i, where the α_i are learned weights and the x_i with nonzero α_i are the support vectors (see the scikit-learn sketch below).
     (C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998)
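To connect the optimization problem above to practice, here is a hedged sketch that fits a (nearly) hard-margin linear SVM with scikit-learn on toy data and checks that the learned w equals the dual-form solution w = Σ_i α_i y_i x_i; the toy data and the large C value are assumptions, not part of the slides.

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data
    X = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
                  [2.0, 2.0], [2.2, 1.8], [1.8, 2.4]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    # A very large C approximates the hard-margin problem:
    # minimize (1/2) w^T w  subject to  y_i (w . x_i + b) >= 1
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))

    # dual_coef_ stores alpha_i * y_i for each support vector, so
    # w should equal sum_i alpha_i y_i x_i
    w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
    print(np.allclose(w, w_from_dual))  # True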

  10. Finding the maximum margin line
      • Solution:  w = Σ_i α_i y_i x_i
        b = y_i − w · x_i  (for any support vector)
        w · x + b = Σ_i α_i y_i (x_i · x) + b
      • Classification function:
        f(x) = sign(w · x + b) = sign(Σ_i α_i y_i (x_i · x) + b)
        If f(x) < 0, classify as negative; if f(x) > 0, classify as positive (sketch below).
      (C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998)

      HoG descriptor
      • Histograms of Oriented Gradients for Human Detection, Dalal & Triggs, CVPR 2005 (18,317 citations).
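Continuing in the same spirit (self-contained, toy data assumed), b can be recovered from any support vector as b = y_i − w · x_i, and the classification function applied directly:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
                  [2.0, 2.0], [2.2, 1.8], [1.8, 2.4]])
    y = np.array([-1, -1, -1, 1, 1, 1])
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]
    i = clf.support_[0]            # index of one support vector
    b = y[i] - np.dot(w, X[i])     # b = y_i - w . x_i for any support vector

    def f(x):
        """Classification function f(x) = sign(w . x + b)."""
        return np.sign(np.dot(w, x) + b)

    print(f(np.array([2.1, 2.0])))   # +1 -> classify as positive
    print(f(np.array([0.1, 0.3])))   # -1 -> classify as negative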

  11. Person detection with HoGs & linear SVMs
      • Map each grid cell in the input window to a histogram counting the gradients per orientation.
      • Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows (see the pipeline sketch below).
      • Histograms of Oriented Gradients for Human Detection, Navneet Dalal and Bill Triggs, International Conference on Computer Vision & Pattern Recognition, June 2005. http://lear.inrialpes.fr/pubs/2005/DT05/

      Understanding classifier mistakes
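A rough sketch of the pipeline described above, using scikit-image's HoG implementation and scikit-learn's linear SVM. The 64x128 window size and HoG parameters follow common Dalal & Triggs style defaults, but the exact settings, the C value, and the pedestrian_windows / background_windows inputs are assumptions for illustration.

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.svm import LinearSVC

    def hog_descriptor(window):
        """HoG for one detection window: per-cell histograms of gradient orientations."""
        window = resize(window, (128, 64))   # rows x cols (height x width)
        return hog(window, orientations=9,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    def train_person_detector(pedestrian_windows, background_windows):
        """pedestrian_windows / background_windows: lists of grayscale window images."""
        X = np.array([hog_descriptor(w) for w in pedestrian_windows + background_windows])
        y = np.array([1] * len(pedestrian_windows) + [0] * len(background_windows))
        return LinearSVC(C=0.01).fit(X, y)

    # At test time, slide a window over the image and score each location with
    # clf.decision_function([hog_descriptor(window)]).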

  12. HOGgles: Visualizing Object Detection Features
      Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba (MIT)
      http://web.mit.edu/vondrick/ihog/slides.pdf

  13. HOGgles: Visualizing Object Detection Features (continued)
      Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba (MIT), ICCV 2013
      http://web.mit.edu/vondrick/ihog/slides.pdf

  14. HOGgles: Visualizing Object Detection Features
      http://web.mit.edu/vondrick/ihog/

      Questions
      • What if the data is not linearly separable?

      Non-linear SVMs
      • Datasets that are linearly separable with some noise work out great.
      • But what are we going to do if the dataset is just too hard?
      • How about mapping the data to a higher-dimensional space, e.g. x → (x, x²), so that it becomes separable there? (See the sketch below.)
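A minimal sketch of the lifting idea above: 1D points that no single threshold can separate become linearly separable after the mapping x → (x, x²). The data values are made up for illustration.

    import numpy as np
    from sklearn.svm import SVC

    # 1D data: the positives surround the negatives, so no threshold on x separates them
    x = np.array([-3.0, -2.5, 2.5, 3.0, -0.5, 0.0, 0.7])
    y = np.array([1, 1, 1, 1, -1, -1, -1])

    # Lift each point to (x, x^2); a line in this 2D feature space now separates the classes
    X_lifted = np.column_stack([x, x ** 2])

    clf = SVC(kernel="linear", C=100.0).fit(X_lifted, y)
    print(clf.score(X_lifted, y))  # 1.0: perfectly separable after the lift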

  15. Non-linear SVMs: feature spaces
      • General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x).
      (Slide from Andrew Moore's tutorial: http://www.autonlab.org/tutorials/svm.html)

      Nonlinear SVMs
      • The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that
        K(x_i, x_j) = φ(x_i) · φ(x_j)
      • This gives a nonlinear decision boundary in the original feature space:
        Σ_i α_i y_i K(x_i, x) + b

      "Kernel trick": example
      • 2-dimensional vectors x = [x_1, x_2]; let K(x_i, x_j) = (1 + x_iᵀ x_j)².
      • Need to show that K(x_i, x_j) = φ(x_i)ᵀ φ(x_j):
        K(x_i, x_j) = (1 + x_iᵀ x_j)²
                    = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
                    = [1, x_i1², √2 x_i1 x_i2, x_i2², √2 x_i1, √2 x_i2]ᵀ [1, x_j1², √2 x_j1 x_j2, x_j2², √2 x_j1, √2 x_j2]
                    = φ(x_i)ᵀ φ(x_j),
        where φ(x) = [1, x_1², √2 x_1 x_2, x_2², √2 x_1, √2 x_2]  (numeric check below).
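A quick numeric check of the identity derived above, using randomly drawn 2D vectors (the random test points are just for verification):

    import numpy as np

    def phi(x):
        """Explicit lifting whose inner product equals the degree-2 polynomial kernel."""
        x1, x2 = x
        return np.array([1.0, x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2,
                         np.sqrt(2) * x1, np.sqrt(2) * x2])

    def K(xi, xj):
        return (1.0 + np.dot(xi, xj)) ** 2

    rng = np.random.default_rng(0)
    xi, xj = rng.normal(size=2), rng.normal(size=2)
    print(np.isclose(K(xi, xj), phi(xi) @ phi(xj)))  # True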
