
Lecture 6 - Fei-Fei Li, Jonathan Krause 1

Lecture 6: Introduction to Detection

Jonathan Krause

Goal

  • Locate objects in images

Typically represent objects by bounding boxes. People have tried rotated bounding boxes before.

Variants: Pedestrian Detection

Leibe et al., 2005

This is a pretty big subfield of vision

Variants: Face Detection

Another big subfield of vision

Variants: Instance Detection

Lowe 2004

Fun fact: This is what SIFT was originally designed for

Variants: Multi-Class Detection

Application: Tagging People

[Figure: faces tagged "Putin" and "Obama"]

Application: Autonomous Driving

Huval et al., 2015

Application: Robotics

Lai et al., 2012

Application: Tracking

Berclaz et al., 2011

Application: Segmentation

Hariharan et al., 2014

Outline

  1. Sliding Window Methods
  2. Region-based Methods
  3. Extra Topics

Outline

  1. Sliding Window Methods
     1. Overview
     2. Viola-Jones Face Detection
     3. HOG
     4. Exemplar SVM
     5. DPM
  2. Region-based Methods
  3. Extra Topics

Getting Started: Kitten Detection

Goal: Detect all kittens

Checking Windows for Kittens

Run a classifier at each sliding window

[Figure sequence: the window steps across the image; the classifier answers "No" at each position shown]

Sliding Windows

Evaluate every bounding box position

Aspect Ratio and Scale

  • Even if we search all 2D positions, we still don’t know the aspect ratio or scale.
  • Solution: Multiple aspect ratios and multi-scale
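A minimal sketch of what searching over position, scale, and aspect ratio can look like. The function name, the base window size, the scale and aspect-ratio lists, and the stride are all assumptions chosen for illustration, not values from the lecture.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), right/bottom edge exclusive


def sliding_windows(img_w: int, img_h: int,
                    base: int = 64,                   # assumed base window side
                    scales=(1.0, 2.0),                # assumed scale set
                    aspect_ratios=(0.5, 1.0, 2.0),    # assumed width / height ratios
                    stride_frac: float = 0.5) -> Iterator[Box]:
    """Yield candidate boxes at several scales and aspect ratios."""
    for s in scales:
        for ar in aspect_ratios:
            # Keep the window area near (base * s)^2 while changing its shape.
            w = max(1, round(base * s * ar ** 0.5))
            h = max(1, round(base * s / ar ** 0.5))
            if w > img_w or h > img_h:
                continue
            step_x = max(1, int(w * stride_frac))
            step_y = max(1, int(h * stride_frac))
            for y0 in range(0, img_h - h + 1, step_y):
                for x0 in range(0, img_w - w + 1, step_x):
                    yield (x0, y0, x0 + w, y0 + h)


windows = list(sliding_windows(482, 348))
```

A detector would run its classifier on every box in `windows`; even this coarse grid shows how quickly the number of windows grows with each extra scale or aspect ratio.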

Viola Jones Face Detector

  • Extremely fast
  • Very accurate (at the time)

Viola, Jones. 2001

Viola Jones

Viola, Jones. 2001

Key Idea: Boosting on weak classifiers

Haar Filters

Viola, Jones. 2001

Simple patterns of lightness and darkness

Haar Filters w/Integral Images

[Figure: a filter applied to an image; the filter decomposes into smaller filters]

Response at a single location: [figure: the response written as sums and differences of top-left sums]

Only need to compute sum of top-left responses (DP)!
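A small sketch of the integral-image trick: once the table of top-left sums is built in one pass, any box sum needs four lookups, and a two-rectangle Haar response needs two box sums. The function names and the toy image are assumptions for illustration; only one of the several Haar patterns is shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img rows [0, y) and columns [0, x); padded with zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above (previous row of the table) plus the running row sum.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in columns [x0, x1) and rows [y0, y1) from four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half of a w x h window (w assumed even)."""
    half = w // 2
    left = box_sum(ii, x, y, x + half, y + h)
    right = box_sum(ii, x + half, y, x + w, y + h)
    return left - right


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))                   # 6 + 7 + 10 + 11 = 34
print(haar_two_rect_horizontal(ii, 0, 0, 4, 3))  # 33 - 45 = -12
```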

Viola Jones: Weak Classifiers

Viola, Jones. 2001

Each Haar filter is a weak classifier

[Figure: the two best Haar filters, labeled "Top classifier" and "Second best"]

Combining Weak Classifiers

Viola, Jones. 2001

AdaBoost:

  • $h_t(x)$: binary classifier on Haar filter t
  • $\alpha_t$: learned weight on classifier t
  • AdaBoost classifier: $H(x) = \mathrm{sign}\left(\sum_t \alpha_t h_t(x)\right)$
  • Minimizes loss: $\sum_i \exp\left(-y_i \sum_t \alpha_t h_t(x_i)\right)$
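A rough sketch of discrete AdaBoost, where each weak classifier thresholds one precomputed filter response and the final classifier is a weighted vote. The exhaustive threshold search, the function names, and the toy data are assumptions for illustration; the actual Viola-Jones training procedure differs in its details.

```python
import math


def train_adaboost(features, labels, rounds):
    """features[i][j]: response of filter j on example i; labels[i] in {-1, +1}.
    Returns a list of (alpha, filter_index, threshold, polarity)."""
    n = len(labels)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Pick the threshold classifier with the lowest weighted error.
        for j in range(len(features[0])):
            for thr in sorted({f[j] for f in features}):
                for pol in (+1, -1):
                    preds = [pol if f[j] >= thr else -pol for f in features]
                    err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight the examples this weak classifier got wrong, then renormalize.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, f):
    """Sign of the weighted vote of the weak classifiers."""
    score = sum(a * (pol if f[j] >= thr else -pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


# Toy data: two hypothetical filter responses per example.
features = [[1, 5], [2, 1], [3, 4], [4, 2]]
labels = [-1, -1, 1, 1]
model = train_adaboost(features, labels, rounds=2)
```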

Cascade

Viola, Jones. 2001

Reject negatives quickly

Should remind you of TLD
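The early-rejection idea can be sketched as a loop over stages ordered from cheapest to most expensive. The stage functions and thresholds below are made-up stand-ins; in a real cascade each stage would be a boosted classifier.

```python
def cascade_classify(window, stages):
    """stages: list of (score_fn, threshold), cheapest first.
    Returns (accepted, number_of_stages_evaluated)."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i  # rejected early; later stages never run
    return True, len(stages)


# Hypothetical stages operating on a flat list of pixel values.
stages = [
    (lambda w: sum(w) / len(w), 0.2),   # cheap check: mean value
    (lambda w: max(w) - min(w), 0.5),   # costlier check: contrast
]

print(cascade_classify([0.0, 0.1, 0.1], stages))  # (False, 1)
print(cascade_classify([0.1, 0.9, 0.5], stages))  # (True, 2)
```

Because most windows in an image are negatives, most of them exit at the first stage, which is where the runtime savings come from.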

Viola Jones Summary

  • Fast at runtime
  • Takes a long time to train
  • Very accurate (at the time)
  • Inspired other detection methods

HOG

  • Histograms of Oriented Gradients
  • Designed for Pedestrian Detection
  • Really just good feature engineering

Dalal, Triggs. 2005

HOG

  • Lots of feature engineering…

Dalal, Triggs. 2005

More feature engineering

Dalal, Triggs. 2005

But it works

Dalal, Triggs. 2005

[Figure panels: avg. gradient; max pos. SVM weight; min neg. SVM weight; HOG; pos SVM weights; neg SVM weights]

Exemplar SVM

  • Key idea: Train a separate SVM for each positive training example (on HOG features!).

Malisiewicz et al. 2011

Exemplar SVM

  • Q: But wait, isn’t that going to be horribly slow?
  • A: Yep! Much slower than a single SVM. No one I know of actually uses this. However…
  • Can transfer metadata (segmentations!)

Malisiewicz et al. 2011

Exemplar SVM Examples

Malisiewicz et al. 2011

Deformable Part Models

  • (sneak preview of student presentation)
  • Similar to SVM on HOG, but also with parts (latent SVM)
  • State of the art for several years

Sliding Window Summary

  • Evaluate classifier at many positions
  • Dominant detection paradigm until ~2 years ago
  • Boosting, SVM, and DPM

Outline

  1. Sliding Window Methods
  2. Region-based Methods
     1. Motivation
     2. Region Proposals
     3. R-CNN
  3. Extra Topics

Sliding Window Problem: Efficiency

Q: How many bounding boxes in this 482 x 348 image? A: 6,999,078,138 (about 7 billion)

Can’t classify 7 billion windows; even millions is slow. Can we massively cut down this number (e.g. to 1000s)?
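The count can be reproduced by choosing two distinct pixel columns for the left and right edges and two distinct pixel rows for the top and bottom edges. That this is how the number on the slide was derived is an inference from the arithmetic; the helper name is made up.

```python
from math import comb


def count_boxes(width, height):
    """Boxes whose left/right columns and top/bottom rows are distinct pixels."""
    return comb(width, 2) * comb(height, 2)


print(count_boxes(482, 348))  # 6999078138, i.e. about 7 billion
```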

Detection on Regions

  • Generate detection proposals (typically ~2000)
  • Classify each region with a much stronger classifier
  • More or less taken over modern detection

van de Sande et al., 2011

Region Proposals

  • Sliding window or grouping pixels
  • May or may not output score
  • Varying amount of control over number of regions

“What makes for effective detection proposals?”. Hosang, Benenson, Dollar, Schiele. 2015

Objectness

  • Sliding window
  • Score based on a bunch of heuristic features

Alexe, Deselaers, Ferrari. 2010

Selective Search

  • Felzenszwalb superpixels
  • Merge based on color features
  • Most common method in use

van de Sande et al., 2011

Edge Boxes

  • Structured decision forest for object boundaries
  • Coarse sliding windows with location refinement
  • Seems fast and accurate, but time will tell

Zitnick, Dollar. 2014

Evaluating Region Proposals

  • What fraction of ground truth bounding boxes do they recover?
  • How many proposals does it take?
  • At what IoU overlap threshold?

“What makes for effective detection proposals?”. Hosang, Benenson, Dollar, Schiele. 2015
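The three questions above map onto one small function: recall of ground-truth boxes, for the first N proposals, at a chosen IoU threshold. This is a sketch with assumed names and box format (x0, y0, x1, y1); it is not the evaluation code from the cited paper.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(gt_boxes, proposals, iou_threshold=0.7, max_proposals=None):
    """Fraction of ground-truth boxes covered by at least one of the first
    max_proposals proposals at the given IoU threshold."""
    kept = proposals if max_proposals is None else proposals[:max_proposals]
    if not gt_boxes:
        return 0.0
    hit = sum(1 for g in gt_boxes if any(iou(g, p) >= iou_threshold for p in kept))
    return hit / len(gt_boxes)


# Toy example: two objects, three proposals in ranked order.
gt = [(0, 0, 10, 10), (20, 20, 40, 40)]
props = [(0, 0, 10, 9), (100, 100, 120, 120), (22, 22, 40, 40)]

print(proposal_recall(gt, props))                      # 1.0
print(proposal_recall(gt, props, max_proposals=1))     # 0.5
print(proposal_recall(gt, props, iou_threshold=0.85))  # 0.5
```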

In Practice

  • Recall at IoU threshold=0.7 predicts detection performance well
  • Most people use ~2000 regions produced with Selective Search (a few seconds/image)
  • Edge Boxes looks promising

Aside: Classification

  • Most detectors, region proposal methods in particular, reduce detection to repeated classification
  • Let’s take a look at a few key ideas in classification

Classification: Bag of Words

Early 2000s

[Figure: pipeline from Descriptors to Codebook to Histogram (codeword frequency) to SVM]

Offline: Cluster descriptors in training images

Note: No spatial information
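A minimal sketch of the histogram step, assuming the codebook has already been built offline (for example by clustering training descriptors). The 2-D descriptors and codeword values are toy numbers, and the helper names are made up; the resulting histogram is what would be fed to the SVM.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def sq_dist(c):
        return sum((d - v) ** 2 for d, v in zip(descriptor, c))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def bow_histogram(descriptors, codebook):
    """Normalized codeword-frequency histogram; descriptor positions are ignored."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]   # hypothetical cluster centres
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

Shuffling `descriptors` leaves `hist` unchanged, which is the "no spatial information" limitation in concrete form.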

Classification: Spatial Pyramid

2006 and onward

[Figure: spatial pyramid histograms feeding one big SVM]

Lazebnik et al. 2006

Classification

2010 and on

  • Sparse Coding (LLC: Locality-constrained Linear Coding)
  • Represent descriptor with more than one codeword
  • Fisher Vectors
  • Represent difference between descriptor and codewords (very roughly)
  • A little better, still used sometimes

Wang et al. 2010; Perronnin et al. 2010

2012

  • In 2012 neural networks started working [Krizhevsky et al. 2012]

Russakovsky et al. 2015

Neural Nets

  • Learn the whole pipeline (pixels to classes) from scratch.
  • Many layers of (learned) intermediate features
  • Will see more in student presentation

Krizhevsky et al. 2012

R-CNN

  • R-CNN = Selective Search + CNN
  • That’s it.

Girshick et al. 2014

R-CNN Details

  • Need region to fit input size of CNN
  • Region warping method: add context, pad with zero, or warp (warp works the best)

Girshick et al. 2014

R-CNN Details

  • Context around region
  • 0 or 16 pixels (in CNN reference frame); 16 works the best

Girshick et al. 2014

R-CNN Details

  • CNN Layer is important
  • fc6 best?

Girshick et al. 2014

R-CNN Details

  • Fine-tuning on PASCAL (CNN trained on ILSVRC)
  • It helps, and may make another layer better

Girshick et al. 2014

R-CNN Details

  • Bounding box regression
  • Regress from CNN features to bounding box
  • Helps quite a bit

Girshick et al. 2014
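A sketch of the regression targets, assuming the usual parameterization (center offsets scaled by proposal size, log ratios for width and height), which is how I recall the R-CNN paper setting it up. The regressor itself, which maps CNN features to these four numbers, is not shown; the function names are made up.

```python
import math


def to_center(box):
    """(x0, y0, x1, y1) -> (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return x0 + w / 2, y0 + h / 2, w, h


def regression_targets(proposal, ground_truth):
    """Targets the regressor is trained to predict for this proposal."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(ground_truth)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_deltas(proposal, deltas):
    """Invert regression_targets: move and resize the proposal by predicted deltas."""
    px, py, pw, ph = to_center(proposal)
    tx, ty, tw, th = deltas
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


proposal = (10.0, 10.0, 50.0, 30.0)
ground_truth = (12.0, 8.0, 60.0, 40.0)
targets = regression_targets(proposal, ground_truth)
refined = apply_deltas(proposal, targets)  # recovers ground_truth up to rounding
```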

R-CNN Details

  • Train SVM on top of CNN features
  • Be careful about which are positives and which are negatives (use the IoU overlap!)
  • Hard negative mining for efficiency.

Girshick et al. 2014

Outline

  1. Sliding Window Methods
  2. Region-based Methods
  3. Extra Topics

Evaluation

  • Typically done with Average Precision (AP)
  • When considering multiple classes, use mean (across classes) Average Precision (mAP)

Bounding box is correct if IoU >= 0.5. Be careful about handling multiple detections.
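A sketch of AP for one class, showing the "multiple detections" caveat: each ground-truth box can be matched once, so a second detection of the same object counts as a false positive. This uses the plain (non-interpolated) average of precision at each true positive, which is an assumption; benchmark toolkits typically use an interpolated variant. mAP is then the mean of this value over classes.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_threshold=0.5):
    """detections: list of (score, box) for one class."""
    matched = [False] * len(gt_boxes)
    hits = []  # True = true positive, False = false positive, in score order
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            overlap = iou(box, g)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold and not matched[best_j]:
            matched[best_j] = True
            hits.append(True)
        else:
            hits.append(False)  # poor overlap, or a duplicate of a matched object
    # Average the precision at each rank where a new ground-truth box is found.
    ap, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank
    return ap / len(gt_boxes) if gt_boxes else 0.0


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
ap = average_precision(dets, gt)  # (1/1 + 2/3) / 2; the 0.8 box is a duplicate
```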

Context

  • Surroundings can provide information
  • Many methods use a weak version of this

What object is this?

R-CNN uses a weak version of context

Non-maximal Suppression

  • Turn multiple detections into one
  • Common approach: merge bounding boxes with >= 0.5 (or some threshold) IoU, keep the higher scoring box.
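A sketch of the common greedy variant: take the highest-scoring box, drop everything that overlaps it by at least the threshold, and repeat. The names and the (score, box) format are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (score, box). Returns the kept detections, best first."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


dets = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # IoU 81/119 with the first box, so it is suppressed
    (0.7, (50, 50, 60, 60)),
]
kept = non_max_suppression(dets)
```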

OverFeat

  • Efficient sliding windows with CNNs

Sermanet et al. 2013

For CNNs, you can reuse a lot of computation of the first layers

Fast R-CNN

  • Very new, reuses most CNN computation across regions

Girshick. 2015

Hot off the presses: a couple of weeks old

Multibox

  • Try to learn the region proposals

Erhan et al. 2014

Going to have a guest speaker talk about this

Detection Challenges: PASCAL

  • 20 Object Categories, thousands of images
  • 2007-2012
  • Was the dataset for a long time.

Detection Challenges: ILSVRC

  • 200 Object Categories, 100,000s of images
  • 2013-current
  • Not all images fully annotated.