CS 4476 Computer Vision: Introduction to Object Recognition

  1. CS 4476: Computer Vision Introduction to Object Recognition Guest Lecturer: Judy Hoffman Slides by Lana Lazebnik except where indicated otherwise

  2. Introduction to recognition Source: Charley Harper

  3. Outline § Overview: recognition tasks § Statistical learning approach § Classic / shallow pipeline: “bag of features” representation; classifiers (nearest neighbor, linear, SVM) § Deep pipeline: neural networks

  4. Common Recognition Tasks Adapted from Fei-Fei Li

  5. Image Classification and Tagging: What is this an image of? • outdoor • mountains • city • Asia • Lhasa • … Adapted from Fei-Fei Li

  6. Object Detection: Localize! Find the pedestrians. Adapted from Fei-Fei Li

  7. Activity Recognition: What are they doing? • walking • shopping • rolling a cart • sitting • talking • … Adapted from Fei-Fei Li

  8. Semantic Segmentation Label Every Pixel Adapted from Fei-Fei Li

  9. Semantic Segmentation: Label every pixel. Example pixel labels: sky, mountain, building, tree, lamp, umbrella, person, market stall, ground. Adapted from Fei-Fei Li

  10. Detection, semantic and instance segmentation: image classification vs. object detection vs. semantic segmentation vs. instance segmentation

  11. Image Description This is a busy street in an Asian city. Mountains and a large palace or fortress loom in the background. In the foreground, we see colorful souvenir stalls and people walking around and shopping. One person in the lower left is pushing an empty cart, and a couple of people in the middle are sitting, possibly posing for a photograph. Adapted from Fei-Fei Li

  12. Image classification

  13. The statistical learning framework Apply a prediction function to a feature representation of the image to get the desired output: f(apple image) = “apple”, f(tomato image) = “tomato”, f(cow image) = “cow”

  14. The statistical learning framework: z = g(y), where y is the feature representation, g is the prediction function, and z is the output label (e.g., “apple”). Training: given a labeled training set {(y_1, z_1), …, (y_N, z_N)}, learn the prediction function g by minimizing prediction error on the training set. Testing: given an unlabeled test instance y, predict the output label as z = g(y).
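As a concrete (if simplified) illustration of this framework, the sketch below learns a prediction function g from labeled feature vectors and applies it to a test instance. The nearest-centroid rule, feature values, and labels are placeholders, not part of the lecture; any trainable classifier could play the role of g.

```python
import numpy as np

def train(train_feats, train_labels):
    """Learn g: store one mean feature vector (centroid) per class."""
    labels = np.asarray(train_labels)
    return {c: train_feats[labels == c].mean(axis=0) for c in set(train_labels)}

def predict(centroids, y):
    """g(y): return the label of the nearest class centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(y - centroids[c]))

# Toy usage with made-up 2-D features
X_train = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8]])
z_train = ["apple", "apple", "tomato", "tomato"]
g = train(X_train, z_train)
print(predict(g, np.array([0.95, 0.15])))  # expected: "apple"
```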

  15. Steps (training): training images → image features → training with training labels → learned model. Slide credit: D. Hoiem

  16. Steps (training and testing): Training: training images → image features → training with training labels → learned model. Testing: test image → image features → learned model → prediction (e.g., “apple”). Slide credit: D. Hoiem

  17. “Classic” recognition pipeline: image pixels → feature representation → trainable classifier → class label. • Hand-crafted feature representation • Off-the-shelf trainable classifier

  18. “Classic” representation: Bag of features

  19. Motivation 1: Part-based models Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)

  20. Motivation 2: Texture models Texton histogram “Texton dictionary” Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  21. Motivation 3: Bags of words Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

  22. Motivation 3: Bags of words Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) US Presidential Speeches Tag Cloud http://chir.ag/projects/preztags/

  25. Bag of features: Outline 1. Extract local features 2. Learn “visual vocabulary” 3. Quantize local features using visual vocabulary 4. Represent images by frequencies of “visual words”

  26. 1. Local feature extraction Sample patches and extract descriptors

  27. 2. Learning the visual vocabulary … Extracted descriptors from the training set Slide credit: Josef Sivic

  28. 2. Learning the visual vocabulary … Clustering Slide credit: Josef Sivic

  29. 2. Learning the visual vocabulary Visual vocabulary … Clustering Slide credit: Josef Sivic

  30. Recall: K-means clustering. Goal: minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k: D(X, M) = Σ_{cluster k} Σ_{point i in cluster k} (x_i − m_k)². Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: assign each feature to the nearest center, then recompute each cluster center as the mean of all features assigned to it
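A minimal NumPy sketch of the K-means procedure described on this slide (random initialization, assign-to-nearest, recompute means). Function and parameter names are my own; a real vocabulary would typically be learned with an optimized library implementation.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """X: (N, d) array of descriptors; returns K cluster centers and assignments."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random initialization
    for _ in range(n_iters):
        # Assign each feature to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Recompute each center as the mean of the features assigned to it
        new_centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, assign
```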

  31. Recall: Visual vocabularies … Appearance codebook Source: B. Leibe

  32. Bag of features: Outline 1. Extract local features 2. Learn “visual vocabulary” 3. Quantize local features using visual vocabulary 4. Represent images by frequencies of “visual words”
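To make steps 3 and 4 concrete, here is a small sketch (my own helper, not from the slides) that quantizes an image's local descriptors against a learned vocabulary and builds the normalized word-frequency histogram.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """descriptors: (N, d) local features; vocabulary: (K, d) visual words."""
    # Step 3: quantize each descriptor to its nearest visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # Step 4: represent the image by visual-word frequencies
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```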

  33. Spatial pyramids level 0 Lazebnik, Schmid & Ponce (CVPR 2006)

  34. Spatial pyramids level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006)

  35. Spatial pyramids level 0 level 1 level 2 Lazebnik, Schmid & Ponce (CVPR 2006)
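A rough sketch of the spatial pyramid idea, assuming we already have the (x, y) location and visual-word index of every local feature: at level l the image is split into a 2^l × 2^l grid, a bag-of-words histogram is built per cell, and all cell histograms are concatenated (the per-level weighting of Lazebnik et al. is omitted here for brevity).

```python
import numpy as np

def spatial_pyramid(points, words, vocab_size, img_w, img_h, levels=3):
    """points: (N, 2) feature locations; words: (N,) visual-word indices."""
    feats = []
    for level in range(levels):
        n = 2 ** level                        # grid is n x n at this level
        cell_w, cell_h = img_w / n, img_h / n
        for gy in range(n):
            for gx in range(n):
                in_cell = ((points[:, 0] // cell_w == gx) &
                           (points[:, 1] // cell_h == gy))
                feats.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(feats).astype(float)
```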

  36. Spatial pyramids Scene classification results

  37. Spatial pyramids Caltech101 classification results

  38. “Classic” recognition pipeline: image pixels → feature representation → trainable classifier → class label. • Hand-crafted feature representation • Off-the-shelf trainable classifier

  39. Classifiers: Nearest neighbor. f(x) = label of the training example nearest to x. • All we need is a distance or similarity function for our inputs • No training required! (Figure: training examples from class 1, training examples from class 2, and a test example.)

  40. Functions for comparing histograms: • L1 distance: D(h1, h2) = Σ_{i=1}^{N} |h1(i) − h2(i)| • χ² distance: D(h1, h2) = Σ_{i=1}^{N} (h1(i) − h2(i))² / (h1(i) + h2(i)) • Quadratic distance (cross-bin distance): D(h1, h2) = Σ_{i,j} A_ij (h1(i) − h2(j))² • Histogram intersection (similarity function): I(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))
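The NumPy sketches below implement the four comparison functions above; h1 and h2 are equal-length 1-D histograms, A is the bin-similarity matrix of the quadratic distance, and the small epsilon (my addition) guards against empty bins in the χ² denominator.

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    # A[i, j] weights the cross-bin term between bin i of h1 and bin j of h2
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()
```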

  41. K-nearest neighbor classifier • For a new point, find the k closest points from the training data • Vote for the class label with the labels of the k points (figure: k = 5; what is the label for x?)
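A compact sketch of the k-NN rule just described, with Euclidean distance as a default (any of the histogram distances above could be passed in instead); the function and argument names are placeholders.

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_feats, train_labels, k=5, dist=None):
    """Return the majority label among the k training points closest to x."""
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)
    d = np.array([dist(x, t) for t in train_feats])
    nearest = np.argsort(d)[:k]                       # indices of the k closest points
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```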

  42. Quiz: K-nearest neighbor classifier. Which classifier is more robust to outliers? Credit: Andrej Karpathy, http://cs231n.github.io/classification/

  43. K-nearest neighbor classifier Credit: Andrej Karpathy, http://cs231n.github.io/classification/

  44. Linear classifiers Find a linear function to separate the classes: f(x) = sgn(w · x + b)
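The decision rule is a dot product followed by a sign, as sketched below. The perceptron update is included only as one classic (non-SVM) way of fitting w and b on separable data; the slides' question of how best to choose the separator is answered by the SVM named in the outline.

```python
import numpy as np

def linear_predict(x, w, b):
    """f(x) = sgn(w · x + b): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if np.dot(w, x) + b >= 0 else -1

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Fit w, b on linearly separable data with labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified point -> update
                w, b = w + lr * yi * xi, b + lr * yi
    return w, b
```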

  45. Visualizing linear classifiers Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/

  46. Nearest neighbor vs. linear classifiers. Nearest neighbors • Pros: simple to implement; decision boundaries not necessarily linear; works for any number of classes; nonparametric method • Cons: need a good distance function; slow at test time. Linear models • Pros: low-dimensional parametric representation; very fast at test time • Cons: works for two classes; how to train the linear function?; what if the data is not linearly separable?

  47. Linear classifiers When the data is linearly separable, there may be more than one separator (hyperplane) Which separator is best?

  48. Review: Neural Networks http://playground.tensorflow.org/

  49. “Deep” recognition pipeline: image pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier. • Learn a feature hierarchy from pixels to classifier • Each layer extracts features from the output of the previous layer • Train all layers jointly
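A schematic of the deep pipeline, in which each layer re-represents the previous layer's output and a simple classifier sits on top; the weight shapes and ReLU nonlinearity are illustrative choices, and in practice all layers are trained jointly as described on the next slides.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def deep_features(x, layers):
    """layers: list of (W, b); each layer extracts features from the previous output."""
    h = x
    for W, b in layers:
        h = relu(W @ h + b)
    return h

def classify(x, layers, W_out, b_out):
    """Simple classifier on top of the learned feature hierarchy."""
    scores = W_out @ deep_features(x, layers) + b_out
    return int(np.argmax(scores))
```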

  50. “Deep” vs. “shallow” (SVMs) learning

  51. Training of multi-layer networks • Find network weights w to minimize the prediction loss between the true and estimated labels of the training examples: E(w) = Σ_i l(y_i, z_i; w) • Update weights by gradient descent: w ← w − α ∂E/∂w

  52. Training of multi-layer networks • Find network weights w to minimize the prediction loss between the true and estimated labels of the training examples: E(w) = Σ_i l(y_i, z_i; w) • Update weights by gradient descent: w ← w − α ∂E/∂w • Back-propagation: gradients are computed in the direction from the output to the input layers and combined using the chain rule • Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time, cycling through the training examples in random order over multiple epochs
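A hedged sketch of this training loop using PyTorch's autograd for the back-propagation step; the architecture, loss, learning rate, and data loader are placeholders rather than anything specified in the lecture.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()                     # plays the role of l(y_i, z_i; w)
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # w <- w - alpha * dE/dw

def train(loader, epochs=10):
    for _ in range(epochs):                 # multiple passes over the training set
        for y_batch, z_batch in loader:     # small batches in random order
            opt.zero_grad()
            loss = loss_fn(model(y_batch), z_batch)
            loss.backward()                 # back-propagation via the chain rule
            opt.step()                      # stochastic gradient descent update
```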

  53. Network with a single hidden layer • Neural networks with at least one hidden layer are universal function approximators
