SLIDE 1 CS 4476: Computer Vision
Introduction to Object Recognition
Guest Lecturer: Judy Hoffman
Slides by Lana Lazebnik except where indicated otherwise
SLIDE 2
SLIDE 3 Introduction to recognition
Source: Charley Harper
SLIDE 4
Outline
§ Overview: recognition tasks
§ Statistical learning approach
§ Classic / shallow pipeline
   § “Bag of features” representation
   § Classifiers: nearest neighbor, linear, SVM
§ Deep pipeline
   § Neural networks
SLIDE 5 Common Recognition Tasks
Adapted from Fei-Fei Li
SLIDE 6 Image Classification and Tagging
Adapted from Fei-Fei Li
- outdoor
- mountains
- city
- Asia
- Lhasa
- …
What is this an image of?
SLIDE 7 Object Detection
Adapted from Fei-Fei Li
find pedestrians
Localize!
SLIDE 8 Activity Recognition
Adapted from Fei-Fei Li
- walking
- shopping
- rolling a cart
- sitting
- talking
- …
What are they doing?
SLIDE 9 Semantic Segmentation
Adapted from Fei-Fei Li
Label Every Pixel
SLIDE 10 Semantic Segmentation
Adapted from Fei-Fei Li
[Figure: every pixel labeled, e.g. sky, mountain, building, tree, umbrella, person, lamp, market stall, ground]
Label Every Pixel
SLIDE 11 Detection, semantic and instance segmentation
[Figure comparing image classification, object detection, semantic segmentation, and instance segmentation]
SLIDE 12 Image Description
Adapted from Fei-Fei Li
This is a busy street in an Asian city. Mountains and a large palace or fortress loom in the background. In the foreground, we see colorful souvenir stalls and people walking around and shopping. One person in the lower left is pushing an empty cart, and a couple of people in the middle are sitting, possibly posing for a photograph.
SLIDE 13
Image classification
SLIDE 14 The statistical learning framework
Apply a prediction function to a feature representation of the image to get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
SLIDE 15 The statistical learning framework
Training
- Given a labeled training set {(𝒚₁, 𝑧₁), …, (𝒚ₙ, 𝑧ₙ)}
- Learn the prediction function 𝑔 by minimizing the prediction error on the training set
Testing
- Given an unlabeled test instance 𝒚 (its feature representation), predict the output label with the learned function: 𝑧 = 𝑔(𝒚), e.g., “apple”
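As a concrete (hypothetical) instance of this framework, here is a minimal numpy sketch where 𝑔 is a nearest-class-mean rule; the toy features and the choice of classifier are illustrative assumptions, not part of the slides:

```python
import numpy as np

def train(train_feats, train_labels):
    """Learn g by storing the mean feature vector of each class."""
    classes = sorted(set(train_labels))
    means = np.stack([train_feats[np.array(train_labels) == c].mean(axis=0)
                      for c in classes])
    return classes, means

def predict(model, y):
    """Apply g to an unlabeled test instance y: z = g(y)."""
    classes, means = model
    return classes[np.argmin(np.linalg.norm(means - y, axis=1))]

# Usage with made-up 2-D features:
feats = np.array([[1.0, 0.2], [0.9, 0.3], [0.1, 1.0], [0.2, 0.8]])
labels = ["apple", "apple", "tomato", "tomato"]
model = train(feats, labels)
print(predict(model, np.array([0.95, 0.25])))  # -> "apple"
```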
SLIDE 16 Steps
Training
[Diagram (training): training images + training labels → image features → training → learned model]
Slide credit: D. Hoiem
SLIDE 17 Steps
[Diagram (training): training images + training labels → image features → training → learned model]
[Diagram (testing): test image → image features → learned model → prediction: “apple”]
Slide credit: D. Hoiem
SLIDE 18 “Classic” recognition pipeline
[Pipeline: image pixels → feature representation → trainable classifier → class label]
- Hand-crafted feature representation
- Off-the-shelf trainable classifier
SLIDE 19
“Classic” representation: Bag of features
SLIDE 20 Motivation 1: Part-based models
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
SLIDE 21 Motivation 2: Texture models
[Figure: “texton dictionary” and texton histogram]
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
SLIDE 22
Motivation 3: Bags of words
Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
SLIDE 23 US Presidential Speeches Tag Cloud http://chir.ag/projects/preztags/
Motivation 3: Bags of words
Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)
SLIDE 26 Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
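A numpy sketch of steps 3-4 for a single image, assuming descriptors have already been extracted (step 1) and a vocabulary of cluster centers has been learned (step 2, covered below); both arrays are placeholders:

```python
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual word and
    return a normalized histogram of word frequencies.
    descriptors: M x D array of local features from one image.
    vocabulary:  K x D array of cluster centers (visual words)."""
    # Pairwise squared distances between descriptors and words: M x K
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                    # step 3: quantization
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                     # step 4: word frequencies
```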
SLIDE 27
- 1. Local feature extraction
Sample patches and extract descriptors
SLIDE 28
- 2. Learning the visual vocabulary
…
Slide credit: Josef Sivic
Extracted descriptors from the training set
SLIDE 29
- 2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
SLIDE 30
- 2. Learning the visual vocabulary
Clustering
…
Visual vocabulary
Slide credit: Josef Sivic
SLIDE 31 Recall: K-means clustering
Goal: minimize the sum of squared Euclidean distances between features xᵢ and their nearest cluster centers mₖ. Algorithm:
- Randomly initialize K cluster centers
- Iterate until convergence:
- Assign each feature to the nearest center
- Recompute each cluster center as the mean of all features assigned to it
D(X, M) = Σₖ Σ_{xᵢ ∈ cluster k} ‖xᵢ − mₖ‖²
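A minimal numpy sketch of this algorithm; initializing centers by sampling K random data points and capping the number of iterations are implementation choices, not part of the slide:

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    """Minimize D(X, M) = sum_k sum_{x_i in cluster k} ||x_i - m_k||^2."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(iters):
        # Assign each feature to the nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Recompute each center as the mean of its assigned features
        new_centers = np.stack(
            [X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
             for k in range(K)])
        if np.allclose(new_centers, centers):
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, assign
```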
SLIDE 32 Recall: Visual vocabularies
…
Source: B. Leibe
Appearance codebook
SLIDE 33 Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
SLIDE 34 Spatial pyramids
level 0 Lazebnik, Schmid & Ponce (CVPR 2006)
SLIDE 35 Spatial pyramids
level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006)
SLIDE 36 Spatial pyramids
level 0 level 1 level 2 Lazebnik, Schmid & Ponce (CVPR 2006)
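A sketch of building the pyramid representation for one image, under stated assumptions: each local feature has already been quantized to a visual-word index, positions are normalized to [0, 1), and the per-level weighting from the CVPR 2006 paper is omitted for brevity:

```python
import numpy as np

def spatial_pyramid(words, xy, K, levels=3):
    """Concatenate visual-word histograms over increasingly fine grids.
    words: (M,) visual-word index per local feature.
    xy:    (M, 2) feature positions, normalized to [0, 1).
    K:     vocabulary size; levels=3 gives levels 0, 1, 2."""
    hists = []
    for level in range(levels):
        n = 2 ** level                          # n x n grid at this level
        cell = (xy * n).astype(int).clip(0, n - 1)
        for cx in range(n):
            for cy in range(n):
                mask = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                hists.append(np.bincount(words[mask], minlength=K))
    return np.concatenate(hists).astype(float)
```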
SLIDE 37
Spatial pyramids
Scene classification results
SLIDE 38
Spatial pyramids
Caltech101 classification results
SLIDE 39 “Classic” recognition pipeline
[Pipeline: image pixels → feature representation → trainable classifier → class label]
- Hand-crafted feature representation
- Off-the-shelf trainable classifier
SLIDE 40 Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x
- All we need is a distance or similarity function for our inputs
Test example Training examples from class 1 Training examples from class 2
SLIDE 41 Functions for comparing histograms
- L1 distance: D(h₁, h₂) = Σᵢ₌₁ᴺ |h₁(i) − h₂(i)|
- χ² distance: D(h₁, h₂) = Σᵢ₌₁ᴺ (h₁(i) − h₂(i))² / (h₁(i) + h₂(i))
- Quadratic distance (cross-bin distance): D(h₁, h₂) = Σᵢ,ⱼ Aᵢⱼ (h₁(i) − h₂(j))²
- Histogram intersection (similarity function): I(h₁, h₂) = Σᵢ₌₁ᴺ min(h₁(i), h₂(i))
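The four measures in numpy, for histograms stored as equal-length 1-D arrays; the epsilon guarding the χ² denominator against empty bins is an implementation assumption:

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    # A[i, j] weights the comparison between bin i of h1 and bin j of h2
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()

def histogram_intersection(h1, h2):
    # Similarity, not distance: larger values mean more alike
    return np.minimum(h1, h2).sum()
```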
SLIDE 42 K-nearest neighbor classifier
- For a new point, find the k closest points from the training data
- Assign the class label by majority vote among those k points
k = 5
What is the label for x?
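A minimal sketch of this voting rule in numpy; the L1 distance used here is one choice of comparison function (any of the histogram measures above could be substituted):

```python
import numpy as np

def knn_classify(x, train_feats, train_labels, k=5):
    """Vote among the labels of the k training points closest to x."""
    dists = np.abs(train_feats - x).sum(axis=1)   # L1 distance to all points
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)       # majority label
```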
SLIDE 43 Quiz: K-nearest neighbor classifier
Which classifier is more robust to outliers?
Credit: Andrej Karpathy, http://cs231n.github.io/classification/
SLIDE 44 K-nearest neighbor classifier
Credit: Andrej Karpathy, http://cs231n.github.io/classification/
SLIDE 45
Linear classifiers
Find a linear function to separate the classes: f(x) = sgn(w · x + b)
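The decision rule itself is a one-liner; the weight vector w and bias b are placeholders that a training procedure (e.g., the SVM mentioned in the outline) would supply:

```python
import numpy as np

def linear_classify(x, w, b):
    """f(x) = sgn(w . x + b): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```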
SLIDE 46 Visualizing linear classifiers
Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/
SLIDE 47 Nearest neighbor vs. linear classifiers
Nearest Neighbors
+ Simple to implement
+ Decision boundaries not necessarily linear
+ Works for any number of classes
+ Nonparametric method
– Need good distance function
– Slow at test time
Linear Models
+ Low-dimensional parametric representation
+ Very fast at test time
– Works for two classes
– How to train the linear function?
– What if data is not linearly separable?
SLIDE 48 Linear classifiers
When the data is linearly separable, there may be more than one separator (hyperplane)
Which separator is best?
SLIDE 49 Review: Neural Networks
http://playground.tensorflow.org/
SLIDE 50 “Deep” recognition pipeline
- Learn a feature hierarchy from pixels to classifier
- Each layer extracts features from the output of the previous layer
[Pipeline: image pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier]
SLIDE 51
“Deep” vs. “shallow” (SVMs) Learning
SLIDE 52
Training of multi-layer networks
- Find network weights to minimize the prediction loss between true and estimated labels of training examples:
  𝐹(𝐱) = Σᵢ 𝑚(𝐲ᵢ, 𝑧ᵢ; 𝐱)
- Update weights by gradient descent: 𝐱 ← 𝐱 − 𝛼 ∂𝐹/∂𝐱
[Figure: loss surface over two weights, w1 and w2]
SLIDE 53
Training of multi-layer networks
- Find network weights to minimize the prediction loss between true and estimated labels of training examples:
  𝐹(𝐱) = Σᵢ 𝑚(𝐲ᵢ, 𝑧ᵢ; 𝐱)
- Update weights by gradient descent: 𝐱 ← 𝐱 − 𝛼 ∂𝐹/∂𝐱
- Back-propagation: gradients are computed in the direction from output to input layers and combined using the chain rule
- Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time, cycling through the training examples in random order over multiple epochs
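A minimal sketch combining these pieces: a one-hidden-layer network trained by stochastic gradient descent with manual back-propagation. The tanh activation, squared-error loss, toy data, and learning rate are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: input y (2-D) -> hidden layer (tanh) -> scalar prediction
W1, b1 = rng.normal(scale=0.1, size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=8), 0.0

def forward(y):
    h = np.tanh(W1 @ y + b1)
    return h, W2 @ h + b2

# Toy regression data standing in for (feature, label) pairs
Y = rng.normal(size=(100, 2))
Z = np.sin(Y[:, 0]) + 0.5 * Y[:, 1]

alpha, epochs = 0.05, 200
for epoch in range(epochs):
    for i in rng.permutation(len(Y)):       # random order, one example at a time
        y, z = Y[i], Z[i]
        h, pred = forward(y)
        # Back-propagation: chain rule from output toward input
        d_pred = 2 * (pred - z)             # d(loss)/d(pred), loss = (pred - z)^2
        gW2, gb2 = d_pred * h, d_pred
        d_h = d_pred * W2 * (1 - h ** 2)    # through the tanh nonlinearity
        gW1, gb1 = np.outer(d_h, y), d_h
        # Gradient descent update: x <- x - alpha * dF/dx
        W2 -= alpha * gW2; b2 -= alpha * gb2
        W1 -= alpha * gW1; b1 -= alpha * gb1
```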
SLIDE 54 Network with a single hidden layer
- Neural networks with at least one hidden layer are universal function approximators
SLIDE 55 Network with a single hidden layer
Hidden layer size and network capacity:
Source: http://cs231n.github.io/neural-networks-1/
SLIDE 56 Regularization
- It is common to add a penalty (e.g., quadratic) on weight magnitudes to the objective function:
  𝐹(𝐱) = Σᵢ 𝑚(𝐲ᵢ, 𝑧ᵢ; 𝐱) + 𝜇‖𝐱‖²
– Quadratic penalty encourages network to use all of its inputs “a little” rather than a few inputs “a lot”
Source: http://cs231n.github.io/neural-networks-1/
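In code, the quadratic penalty touches exactly two places: the objective and its gradient. A sketch, with μ as a hypothetical small constant:

```python
import numpy as np

mu = 1e-4  # regularization strength (hypothetical value)

def regularized_loss(data_loss, weights):
    # F(x) = sum_i m(y_i, z_i; x) + mu * ||x||^2
    return data_loss + mu * np.sum(weights ** 2)

def regularized_gradient(data_grad, weights):
    # d/dx of mu * ||x||^2 is 2 * mu * x, added to the data-loss gradient
    return data_grad + 2 * mu * weights
```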
SLIDE 57 Dealing with multiple classes
- If we need to classify inputs into C different classes, we put C units in the last layer to produce C one-vs.-others scores 𝑔₁, 𝑔₂, …, 𝑔_C
- Apply the softmax function to convert these scores to probabilities:
  softmax(𝑔₁, …, 𝑔_C) = ( exp(𝑔₁) / Σⱼ exp(𝑔ⱼ), …, exp(𝑔_C) / Σⱼ exp(𝑔ⱼ) )
  If one of the inputs is much larger than the others, the corresponding softmax value will be close to 1 and the others close to 0
- Use the log likelihood (cross-entropy) loss:
  𝑚(𝐲ᵢ, 𝑧ᵢ; 𝐱) = −log 𝑄_𝐱(𝑧ᵢ | 𝐲ᵢ), where 𝑄_𝐱(𝑧ᵢ | 𝐲ᵢ) is the softmax probability the network assigns to the true class 𝑧ᵢ
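Both formulas in numpy. Subtracting the max score before exponentiating is a standard numerical-stability trick, an addition not stated on the slide:

```python
import numpy as np

def softmax(g):
    """Convert C one-vs.-others scores into probabilities that sum to 1."""
    e = np.exp(g - g.max())        # shift scores for numerical stability
    return e / e.sum()

def cross_entropy_loss(g, z):
    """-log of the probability assigned to the true class index z."""
    return -np.log(softmax(g)[z])

scores = np.array([2.0, 0.5, 9.0])
print(softmax(scores))                 # largest score -> probability near 1
print(cross_entropy_loss(scores, 2))   # small loss when true class scores high
```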
SLIDE 58 Neural networks: Pros and cons
+ Flexible and general function approximation framework
+ Can build extremely powerful models by adding more layers
– Hard to analyze theoretically (e.g., training is prone to local optima)
– Huge amounts of training data and computing power may be required to get good performance
– The space of implementation choices is huge (network architectures, parameters)
SLIDE 59 Best practices for training classifiers
- Goal: obtain a classifier with good generalization, i.e., performance on never-before-seen data
1. Learn parameters on the training set
2. Tune hyperparameters (implementation choices) on the held-out validation set
3. Evaluate performance on the test set
– Crucial: do not peek at the test set when iterating steps 1 and 2!
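A sketch of this protocol; the split fractions and the train_fn / accuracy_fn callables are hypothetical stand-ins for any concrete classifier:

```python
import numpy as np

def split(X, Z, seed=0, train=0.7, val=0.15):
    """Shuffle once, then carve out train / validation / test sets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    a, b = int(train * len(X)), int((train + val) * len(X))
    tr, va, te = idx[:a], idx[a:b], idx[b:]
    return (X[tr], Z[tr]), (X[va], Z[va]), (X[te], Z[te])

def select_model(train_set, val_set, hyperparams, train_fn, accuracy_fn):
    """Steps 1-2: fit on the training set, pick hyperparameters on validation.
    The test set is never touched until the final evaluation (step 3)."""
    return max(hyperparams,
               key=lambda hp: accuracy_fn(train_fn(train_set, hp), val_set))
```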
SLIDE 60 Bias-variance tradeoff
- Prediction error of learning algorithms has two main components:
- Bias: error due to simplifying model assumptions
- Variance: error due to randomness of training set
- The bias-variance tradeoff can be controlled by turning “knobs” that determine model complexity
[Figure: a high-bias, low-variance fit vs. a low-bias, high-variance fit]
SLIDE 61 Underfitting and overfitting
- Underfitting: training and test error are both high
– Model does an equally poor job on the training and the test set
– The model is too “simple” to represent the data, or the model is not trained well
- Overfitting: Training error is low but test error is high
– Model fits irrelevant characteristics (noise) in the training data
– Model is too complex, or the amount of training data is insufficient
[Figure: underfitting vs. good tradeoff vs. overfitting]