The 2006 PASCAL Visual Object Classes Challenge Mark Everingham - - PowerPoint PPT Presentation



slide-1
SLIDE 1

The 2006 PASCAL Visual Object Classes Challenge

Mark Everingham Luc Van Gool Chris Williams Andrew Zisserman

slide-2
SLIDE 2

Challenge

  • Ten object classes

– bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep

  • Classification

– Predict whether at least one object of a given class is present

  • Detection

– Predict bounding boxes of objects of a given class

slide-3
SLIDE 3

Competitions

  • Train on the supplied data

– Which methods perform best given specified training data?

  • Train on any (non-test) data

– How well do state-of-the-art methods perform on these problems?
– Which methods perform best?

slide-4
SLIDE 4

Dataset

  • Images taken from three sources

– Personal photos contributed by Edinburgh/Oxford
– Microsoft Research Cambridge images
– Images taken from “flickr” photo-sharing website

  • Annotation

– Bounding box
– Viewpoint: front, rear, left, right, unspecified
– “Truncated” flag: Bounding box ≠ object extent
– “Difficult” flag: Objects ignored in challenge

slide-5
SLIDE 5

Examples

Bicycle Car Bus Cat Cow Dog Motorbike Horse Person Sheep

slide-6
SLIDE 6

Annotation Procedure

  • All annotation performed in a single session in a single location by seven annotators

  • Detailed guidelines decided beforehand

– What to label

  • Not excessive motion blur, poor illumination etc.
  • Object size, “recognisability”, level of occlusion
  • “Close-fitting occluders” e.g. snow/mud treated as object
  • Through glass, mirrors, pictures: label, reflections (=occlusion)
  • Non-photorealistic pictures: don’t label

– Viewpoint
– Bounding box e.g. don’t extend greatly for few pixels
– Truncation: significant amount of object outside bounding box

  • “Difficult” flag set afterwards by a single annotator examining individual objects in isolation

slide-7
SLIDE 7

Dataset Statistics

            train         val           trainval      test
            img    obj    img    obj    img    obj    img    obj
Bicycle     127    161    143    162    270    323    268    326
Bus         93     118    81     117    174    235    180    233
Car         271    427    282    427    553    854    544    854
Cat         192    214    194    215    386    429    388    429
Cow         102    156    104    157    206    313    197    315
Dog         189    211    176    211    365    422    370    423
Horse       129    164    118    162    247    326    254    324
Motorbike   118    138    117    137    235    275    234    274
Person      319    577    347    579    666    1156   675    1153
Sheep       119    211    132    210    251    421    238    422
Total       1277   2377   1341   2377   2618   4754   2686   4753

slide-8
SLIDE 8

Participation

  • 22 participants submitted results

– 14 different institutions

  • 28 different methods

– 19 for classification task only
– 4 for detection task only
– 5 for classification and detection

slide-9
SLIDE 9
1. Classification Task

Predict whether at least one object of a given class is present

slide-10
SLIDE 10

Evaluation

  • Receiver Operating Characteristic (ROC)

– Area Under Curve (AUC)

[Figure: example ROC curve (true positive rate vs. false positive rate); the AUC is the area under the curve]
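The AUC measure can be illustrated with a small sketch. It assumes each test image receives a real-valued confidence and a 0/1 ground-truth label, and it uses the rank-sum identity (AUC equals the probability that a positive image outscores a negative one) rather than integrating the curve explicitly. The function name and the example scores are hypothetical, not taken from the challenge software.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the fraction of (positive, negative) image pairs in which the
    positive image has the higher confidence; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical confidences for six test images (1 = class present).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

A value of 0.5 corresponds to chance and 1.0 to a perfect ranking of class images above non-class images.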

slide-11
SLIDE 11

Methods

  • Bag of words: 15/20 (75%)
  • Correspondence-based
  • Classification of individual patches/regions
  • Local classification of “concepts”
  • Graph neural network
  • Classification by detection

– Generalized Hough transform
– “Star” constellation model
– Sliding-window classifier

slide-12
SLIDE 12

“Bag of words” Methods

  • Local regions are extracted from the image
  • Region appearance is described by a descriptor
  • Descriptors are quantized into “visual words”
  • Image is represented as a histogram of visual words

  • Classifier is trained to output class/non-class

[Diagram: Region Selection → Region Description → Vector Quantization → Histogram → Classifier]

slide-13
SLIDE 13

Region Selection

  • “Sparse” methods based on interest points

– Scale invariant: Harris-Laplace, Laplacian, DoG
– Affine invariant: Hessian-Affine, MSER
– Wavelets

  • “Dense” methods

– Multi-scale (overlapping) grid

  • Other methods

– Random position and scale patches with feedback from classifier
– Segmented regions

  • Combination of multiple methods


slide-14
SLIDE 14

Region Description

  • SIFT
  • PCA on vector of pixel values
  • Haar wavelets
  • Grey-level moments and invariants
  • Colour and colour histograms
  • Shape context
  • Texture moments, texton histograms
  • Position in spatial pyramid


slide-15
SLIDE 15

Vector Quantization

  • Single codebook
  • Multiple codebooks: per class, per region type, per descriptor type

  • K-means, LBG clustering
  • Supervised clustering
  • Random cluster centres + selection by validation

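As a rough illustration of the k-means option listed above, the sketch below builds a codebook with plain Lloyd iterations. It is a minimal sketch only: the function name, the random initialisation, the fixed iteration count and the toy 2-D "descriptors" are all assumptions for illustration, not details of any submitted method.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centres (the codebook)."""
    rng = random.Random(seed)
    # Initialise the centres with k distinct training descriptors.
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each descriptor to its nearest centre (squared L2).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres


# Toy 2-D descriptors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codebook = sorted(kmeans(points, k=2))
```

Real descriptors such as SIFT are 128-dimensional and codebooks typically hold hundreds or thousands of centres, so a practical system would use an optimised clustering library.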

slide-16
SLIDE 16

Histogramming

  • “Continuous valued”

– Record frequency of each visual word

  • Binary valued

– Record only presence/absence of each visual word

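The quantization and histogramming steps can be sketched together. The example assumes a codebook is already available and assigns each descriptor to its nearest centre; the `binary` switch and the normalisation of the frequency histogram are illustrative choices, and all names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest to the descriptor (squared L2)."""
    best, best_dist = 0, float("inf")
    for k, centre in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best, best_dist = k, dist
    return best


def bag_of_words(descriptors, codebook, binary=False):
    """Histogram of visual words for one image.

    binary=False: relative frequency of each visual word.
    binary=True:  presence/absence of each visual word.
    """
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    if binary:
        return [1.0 if h > 0 else 0.0 for h in hist]
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy codebook of three visual words and four region descriptors.
toy_codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
toy_descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (0.0, 0.2)]
freq_hist = bag_of_words(toy_descriptors, toy_codebook)
binary_hist = bag_of_words(toy_descriptors, toy_codebook, binary=True)
```

The resulting fixed-length vector is what the classifier stage receives, regardless of how many regions were extracted from the image.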

slide-17
SLIDE 17

Classifier

  • Non-linear SVM: χ² kernel

– Single classifier
– Classifier per pyramid level

  • Linear

– Logistic regression/iterative scaling
– Linear SVM
– Least angle regression

  • Other

– Linear programming boosting

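The slide names the χ² kernel without giving its form. One commonly used definition for histograms is K(x, y) = exp(−γ Σ_i (x_i − y_i)² / (x_i + y_i)); the sketch below assumes that form, and the parameter `gamma` and the function names are illustrative rather than taken from any entry.

```python
import math


def chi2_distance(h1, h2):
    """Chi-squared distance between two non-negative histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return total


def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-squared kernel (assumed form; gamma is a free parameter)."""
    return math.exp(-gamma * chi2_distance(h1, h2))


# Two toy visual-word histograms over three words.
hist_a = [0.5, 0.5, 0.0]
hist_b = [0.25, 0.25, 0.5]
similarity = chi2_kernel(hist_a, hist_b)
```

A kernel matrix built this way over the training histograms can be passed to any SVM implementation that accepts precomputed kernels.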

slide-18
SLIDE 18

Other Methods

  • Correspondence-based: Find nearest neighbour region in training images (with geometric context) and vote by class of training image

  • Classification of individual patches/regions: Classify patches and accumulate class confidence over patches in the image

– Nearest neighbour, boosting, self-organizing map

  • Graph neural network: Segment image into a fixed number of regions and classify based on region descriptors and neighbour relations

slide-19
SLIDE 19

Classification by Detection

  • Detect objects of particular class in the image

– Generalized Hough transform
– “Star” Constellation model
– Sliding-window classifier

  • Assign maximum detection confidence as image classification confidence

  • More in-line with human intuition: “There is a car here therefore the image contains a car”
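The rule for turning detections into an image-level score is simple enough to state in code. This is a minimal sketch assuming each detection is a (bounding box, confidence) pair; the fallback score for images with no detections is an assumption, since the slide does not say how that case was handled.

```python
def image_confidence(detections, no_detection_score=float("-inf")):
    """Image-level classification confidence from object detections.

    detections: list of (box, confidence) pairs for one class in one image.
    Returns the maximum detection confidence, or a fallback if none fired.
    """
    return max((conf for _box, conf in detections), default=no_detection_score)


# Two hypothetical car detections in one image.
car_detections = [((0, 0, 10, 10), 0.3), ((5, 5, 20, 20), 0.8)]
car_score = image_confidence(car_detections)
```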

slide-20
SLIDE 20

Classification Results

Competition 1: Train on VOC data

slide-21
SLIDE 21

Participants

Entries by class (× in the participation grid):

  • All ten classes: AP06_Batra, AP06_Lee, Cambridge, INRIA_Larlus, INRIA_Marszalek, INRIA_Nowak, MUL, QMUL, RWTH, Siena, TKK, UVA, XRCE
  • All classes except motorbike: INRIA_Moosmann
  • Car, dog and sheep only: INSARouen
  • No entries: ENSMP, INRIA_Douze, INRIA_Laptev, TUD, KUL, MIT_Fergus, MIT_Torralba

slide-22
SLIDE 22

[Figure: ROC curves (true positive rate vs. false positive rate), AUC per method in parentheses]

QMUL_HSLS (0.977), QMUL_LSPCH (0.975), INRIA_Marszalek (0.971), INRIA_Nowak (0.971), XRCE (0.967), INRIA_Moosmann (0.957), UVA_big5 (0.945), INRIA_Larlus (0.943), TKK (0.943), RWTH_GMM (0.942), RWTH_SparseHists (0.935), RWTH_DiscHist (0.930), MUL_1v1 (0.928), MUL_1vALL (0.914), UVA_weibull (0.910), AP06_Lee (0.897), INSARouen (0.895), Cambridge (0.887), Siena (0.842), AP06_Batra (0.833)

Competition 1: Car

  • All methods
slide-23
SLIDE 23

[Figure: ROC curves zoomed to false positive rate up to 0.2 and true positive rate from 0.8, AUC per method in parentheses]

QMUL_HSLS (0.977), QMUL_LSPCH (0.975), INRIA_Marszalek (0.971), INRIA_Nowak (0.971), XRCE (0.967)

Competition 1: Car

  • Top 5 methods by AUC
slide-24
SLIDE 24

[Figure: ROC curves (true positive rate vs. false positive rate), AUC per method in parentheses]

XRCE (0.863), QMUL_LSPCH (0.855), INRIA_Marszalek (0.845), QMUL_HSLS (0.845), INRIA_Nowak (0.814), TKK (0.781), INRIA_Moosmann (0.780), RWTH_SparseHists (0.776), UVA_big5 (0.774), RWTH_DiscHist (0.764), INRIA_Larlus (0.736), UVA_weibull (0.723), MUL_1v1 (0.718), RWTH_GMM (0.718), Cambridge (0.715), Siena (0.660), AP06_Lee (0.622), MUL_1vALL (0.616), AP06_Batra (0.550)

Competition 1: Person

  • All methods
slide-25
SLIDE 25

[Figure: ROC curves zoomed to false positive rate up to 0.4 and true positive rate from 0.6, AUC per method in parentheses]

XRCE (0.863), QMUL_LSPCH (0.855), INRIA_Marszalek (0.845), QMUL_HSLS (0.845), INRIA_Nowak (0.814)

Competition 1: Person

  • Top 5 methods by AUC
slide-26
SLIDE 26

AUC by Method and Class

Method            bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
AP06_Batra        0.791     0.637     0.833     0.733     0.756     0.644     0.607     0.672     0.550     0.792
AP06_Lee          0.845     0.916     0.897     0.859     0.838     0.766     0.694     0.829     0.622     0.875
Cambridge         0.873     0.864     0.887     0.822     0.850     0.768     0.754     0.844     0.715     0.866
INRIA_Larlus      0.903     0.948     0.943     0.870     0.880     0.743     0.850     0.890     0.736     0.892
INRIA_Marszalek   0.929     0.984     0.971     0.922     0.938     0.856     0.908     0.964     0.845     0.944
INRIA_Moosmann    0.903     0.933     0.957     0.883     0.895     0.825     0.824     -         0.780     0.930
INRIA_Nowak       0.924     0.973     0.971     0.906     0.892     0.797     0.904     0.961     0.814     0.940
INSARouen         -         -         0.895     -         -         0.764     -         -         -         0.869
MUL_1vALL         0.857     0.852     0.914     0.562     0.632     0.584     0.525     0.831     0.616     0.758
MUL_1v1           0.864     0.945     0.928     0.826     0.789     0.764     0.733     0.906     0.718     0.872
QMUL_HSLS         0.944     0.984     0.977     0.936     0.936     0.874     0.922     0.966     0.845     0.946
QMUL_LSPCH        0.948     0.981     0.975     0.937     0.938     0.876     0.926     0.969     0.855     0.956
RWTH_DiscHist     0.874     0.955     0.930     0.879     0.910     0.799     0.854     0.938     0.764     0.906
RWTH_GMM          0.882     0.935     0.942     0.866     0.856     0.825     0.802     0.905     0.718     0.892
RWTH_SparseHists  0.863     0.941     0.935     0.883     0.883     0.704     0.844     0.858     0.776     0.907
Siena             0.671     0.749     0.842     0.696     0.774     0.677     0.644     0.701     0.660     0.768
TKK               0.857     0.928     0.943     0.871     0.892     0.811     0.806     0.908     0.781     0.900
UVA_big5          0.897     0.929     0.945     0.845     0.862     0.785     0.806     0.923     0.774     0.885
UVA_weibull       0.855     0.880     0.910     0.818     0.849     0.762     0.759     0.888     0.723     0.811
XRCE              0.943     0.978     0.967     0.933     0.940     0.866     0.925     0.957     0.863     0.951

slide-27
SLIDE 27

Ranking by AUC per Class

Method            bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
AP06_Batra        18        19        20        17        18        19        18        18        19        18
AP06_Lee          17        14        16        12        15        12        16        16        17        13
Cambridge         11        16        18        15        13        11        14        14        15        16
INRIA_Larlus      6         7         8         10        10        16        7         11        11        10
INRIA_Marszalek   4         1         3         4         2         4         4         3         3         4
INRIA_Moosmann    7         11        6         6         6         5         9         -         7         6
INRIA_Nowak       5         5         4         5         7         9         5         4         5         5
INSARouen         -         -         17        -         -         13        -         -         -         15
MUL_1vALL         14        17        14        19        19        20        19        15        18        20
MUL_1v1           12        8         13        14        16        14        15        9         13        14
QMUL_HSLS         2         2         1         2         4         2         3         2         4         3
QMUL_LSPCH        1         3         2         1         3         1         1         1         2         1
RWTH_DiscHist     10        6         12        8         5         8         6         6         10        8
RWTH_GMM          9         10        10        11        12        6         12        10        14        11
RWTH_SparseHists  13        9         11        7         9         17        8         13        8         7
Siena             19        18        19        18        17        18        17        17        16        19
TKK               15        13        9         9         8         7         10        8         6         9
UVA_big5          8         12        7         13        11        10        11        7         9         12
UVA_weibull       16        15        15        16        14        15        13        12        12        17
XRCE              3         4         5         3         1         3         2         5         1         2
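A per-class ranking of this kind can be derived from the AUC values with a short sketch. The function name is hypothetical, and tie handling is an assumption: equal AUCs keep their input order here, whereas the original ranking presumably used unrounded values.

```python
def rank_methods(auc_by_method):
    """Rank methods for one class by decreasing AUC (1 = best).

    Ties keep their input order, because Python's sort is stable.
    """
    ordered = sorted(auc_by_method, key=auc_by_method.get, reverse=True)
    return {method: rank for rank, method in enumerate(ordered, start=1)}


# Top five "car" AUCs from the table of AUC by method and class.
car_auc = {
    "INRIA_Marszalek": 0.971,
    "INRIA_Nowak": 0.971,
    "QMUL_HSLS": 0.977,
    "QMUL_LSPCH": 0.975,
    "XRCE": 0.967,
}
car_ranks = rank_methods(car_auc)
```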

slide-28
SLIDE 28

AUC by Class

[Figure: bar chart of maximum and median AUC for each class: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep]

slide-29
SLIDE 29

“ANOVA” Analysis

  • Explain AUC as a function of method i and class j:

$\mathrm{AUC}(i,j) = \alpha_i + \beta_j + \mu$

Method            α
QMUL_LSPCH        0.074
QMUL_HSLS         0.071
XRCE              0.070
INRIA_Marszalek   0.064
INRIA_Nowak       0.046
RWTH_DiscHist     0.019
TKK               0.007
INRIA_Larlus      0.003
UVA_big5          0.003
RWTH_GMM          0.000
RWTH_SparseHists  -0.003
MUL_1v1           -0.028
UVA_weibull       -0.037
Cambridge         -0.038
AP06_Lee          -0.048
Siena             -0.144
MUL_1vALL         -0.149
AP06_Batra        -0.161

Class             β
car               0.047
bus               0.029
motorbike         0.003
sheep             0.000
bicycle           -0.008
cow               -0.025
cat               -0.039
horse             -0.089
dog               -0.109
person            -0.138

[Figure: AUC(i,j) plotted by class and method]
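The additive model can be fitted in a few lines when the method-by-class table is complete. The sketch below is an assumption-laden illustration: it uses sum-to-zero constraints, for which the least-squares estimates are the grand mean and the row and column mean offsets. The slide does not state which constraint was used, and its tabulated effects look as if they are measured against a reference method and class, so the numbers here are not expected to match the slide. The 3×3 excerpt is taken from the table of AUC by method and class.

```python
def fit_additive(table):
    """Least-squares fit of table[i][j] ~ mu + alpha[i] + beta[j].

    Assumes a complete table and sum-to-zero effects, in which case the
    estimates are the grand mean and the row/column mean offsets.
    """
    rows, cols = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (rows * cols)
    alpha = [sum(row) / cols - mu for row in table]
    beta = [sum(table[i][j] for i in range(rows)) / rows - mu
            for j in range(cols)]
    return mu, alpha, beta


# Rows: QMUL_LSPCH, XRCE, Siena. Columns: car, dog, person.
auc_excerpt = [
    [0.975, 0.876, 0.855],
    [0.967, 0.866, 0.863],
    [0.842, 0.677, 0.660],
]
mu, alpha, beta = fit_additive(auc_excerpt)
```

With missing cells (as for INRIA_Moosmann and INSARouen) the row and column means no longer give the least-squares solution, and a general linear solver would be needed.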

slide-30
SLIDE 30

Median ranked images: Bicycle

  • Highest ranked class images
  • Highest ranked non-Microsoft class images
slide-31
SLIDE 31

Median ranked images: Bicycle

  • Lowest ranked class images
  • Highest ranked non-class images
slide-32
SLIDE 32

Median ranked images: Bus

  • Highest ranked class images
slide-33
SLIDE 33

Median ranked images: Bus

  • Lowest ranked class images
  • Highest ranked non-class images
slide-34
SLIDE 34

Median ranked images: Car

  • Highest ranked class images
  • Highest ranked non-Microsoft class images
slide-35
SLIDE 35

Median ranked images: Car

  • Lowest ranked class images
  • Highest ranked non-class images
slide-36
SLIDE 36

Median ranked images: Cat

  • Highest ranked class images
slide-37
SLIDE 37

Median ranked images: Cat

  • Lowest ranked class images
  • Highest ranked non-class images
slide-38
SLIDE 38

Median ranked images: Cow

  • Highest ranked class images
  • Highest ranked non-Microsoft class images
slide-39
SLIDE 39

Median ranked images: Cow

  • Lowest ranked class images
  • Highest ranked non-class images
slide-40
SLIDE 40

Median ranked images: Dog

  • Highest ranked class images
slide-41
SLIDE 41

Median ranked images: Dog

  • Lowest ranked class images
  • Highest ranked non-class images
slide-42
SLIDE 42

Median ranked images: Horse

  • Highest ranked class images
slide-43
SLIDE 43

Median ranked images: Horse

  • Lowest ranked class images
  • Highest ranked non-class images
slide-44
SLIDE 44

Median ranked images: Motorbike

  • Highest ranked class images
slide-45
SLIDE 45

Median ranked images: Motorbike

  • Lowest ranked class images
  • Highest ranked non-class images
slide-46
SLIDE 46

Median ranked images: Person

  • Highest ranked class images
slide-47
SLIDE 47

Median ranked images: Person

  • Lowest ranked class images
  • Highest ranked non-class images
slide-48
SLIDE 48

Median ranked images: Sheep

  • Highest ranked class images
  • Highest ranked non-Microsoft class images
slide-49
SLIDE 49

Median ranked images: Sheep

  • Lowest ranked class images
  • Highest ranked non-class images
slide-50
SLIDE 50

Classification Results

Competition 2: Train on own data

slide-51
SLIDE 51

Participants

Entries by class (× in the participation grid):

  • KUL: motorbike
  • MIT_Fergus: car, motorbike
  • MIT_Torralba: car
  • No entries from the other participants

slide-52
SLIDE 52

[Figure: ROC curves (true positive rate vs. false positive rate), AUC per method in parentheses]

MIT_Fergus (0.763), MIT_Torralba (0.745)

Competition 2: Train on own data

  • Class “car”
  • Max AUC trained on VOC data: 0.977 vs. 0.763 here

slide-53
SLIDE 53

[Figure: ROC curves (true positive rate vs. false positive rate), AUC per method in parentheses]

MIT_Fergus (0.821), KUL (0.797)

Competition 2: Train on own data

  • Class “motorbike”
  • Max AUC trained on VOC data: 0.969 vs. 0.821 here

slide-54
SLIDE 54

Conclusions?

  • Best results obtained by “bag of words” model

– Number of small variations on basic bag of words model giving small differences in performance
– Less diverse than VOC2005: χ² kernel

  • Seemingly better results than VOC2005 test2

– More balanced mix of close-up/distant images?

  • Qualitative observations

– Bias towards close-up views
– Exploitation of context? bicycle/railings
– Bias towards particular image composition? cats/dogs
– Not always intuitive confusions? motorbike/bicycle

slide-55
SLIDE 55
2. Detection Task

Predict bounding boxes of objects of a given class

slide-56
SLIDE 56

Participants

Entries by class (× in the participation grid):

  • All ten classes: Cambridge, TKK
  • ENSMP: car, cow
  • INRIA_Douze: bicycle, bus, car, cow, motorbike, person, sheep
  • INRIA_Laptev: bicycle, cow, horse, motorbike, person
  • TUD: motorbike, person
  • No entries from the other participants

slide-57
SLIDE 57

Detection Results

Competition 3: Train on VOC data

slide-58
SLIDE 58

AP by Method and Class

Method        bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
Cambridge     0.249     0.138     0.254     0.151     0.149     0.118     0.091     0.178     0.030     0.131
ENSMP         -         -         0.398     -         0.159     -         -         -         -         -
INRIA_Douze   0.414     0.117     0.444     -         0.212     -         -         0.390     0.164     0.251
INRIA_Laptev  0.440     -         -         -         0.224     -         0.140     0.318     0.114     -
TUD           -         -         -         -         -         -         -         0.153     0.074     -
TKK           0.303     0.169     0.222     0.160     0.252     0.113     0.137     0.265     0.039     0.227

Rank by AP per Class

Method        bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
Cambridge     4         2         3         2         5         1         3         4         5         3
ENSMP         -         -         2         -         4         -         -         -         -         -
INRIA_Douze   2         3         1         -         3         -         -         1         1         1
INRIA_Laptev  1         -         -         -         2         -         1         2         2         -
TUD           -         -         -         -         -         -         -         5         3         -
TKK           3         1         4         1         1         2         2         3         4         2

slide-59
SLIDE 59

Programme

  • 15:00 - 15:30 Overview of the 2006 VOC Challenge. Mark Everingham, University of Oxford.
  • 15:30 - 15:55 TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. John Winn, Microsoft Research Cambridge.
  • 15:55 - 16:20 Object Detection using Histograms of Oriented Gradients. Navneet Dalal, INRIA Rhône-Alpes.
  • 16:20 - 16:35 Break.
  • 16:35 - 17:00 Local Features and Kernels for Classification of Object Categories. Jianguo Zhang, Queen Mary University of London.
  • 17:00 - 17:30 The MUSCLE / ImageCLEF Image Retrieval Evaluation Campaigns. Allan Hanbury, Vienna University of Technology.
  • 17:30 - 18:00 Conclusions & Discussion.
slide-60
SLIDE 60

Recap: Classification Task

slide-61
SLIDE 61

[Figure: ROC curves (true positive rate vs. false positive rate), AUC per method in parentheses]

QMUL_HSLS (0.977), QMUL_LSPCH (0.975), INRIA_Marszalek (0.971), INRIA_Nowak (0.971), XRCE (0.967), INRIA_Moosmann (0.957), UVA_big5 (0.945), INRIA_Larlus (0.943), TKK (0.943), RWTH_GMM (0.942), RWTH_SparseHists (0.935), RWTH_DiscHist (0.930), MUL_1v1 (0.928), MUL_1vALL (0.914), UVA_weibull (0.910), AP06_Lee (0.897), INSARouen (0.895), Cambridge (0.887), Siena (0.842), AP06_Batra (0.833)

Competition 1: Car

  • All methods
slide-62
SLIDE 62

AUC by Class

[Figure: bar chart of maximum and median AUC for each class: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep]

slide-63
SLIDE 63
2. Detection Task

Predict bounding boxes of objects of a given class

slide-64
SLIDE 64

Evaluation

  • Correct detection: 50% overlap in bounding boxes

– Multiple detections considered as (one true + ) false positives

  • Precision/Recall

– Average Precision (AP) as defined by TREC

  • Mean precision interpolated at recall = 0,0.1,…,0.9,1

[Figure: example precision/recall curve, showing measured and interpolated precision]
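The two evaluation rules above can be sketched as follows. The overlap function assumes that "50% overlap" means area of intersection divided by area of union, with boxes given as (xmin, ymin, xmax, ymax) in continuous coordinates. The AP function assumes detections are already sorted by decreasing confidence and already marked true or false positive (so duplicate detections of one object arrive as false positives). All names are illustrative and this is not the challenge's evaluation code.

```python
def overlap_ratio(a, b):
    """Intersection area over union area for boxes (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def average_precision(is_true_positive, num_objects):
    """Mean interpolated precision at recall 0, 0.1, ..., 1.

    is_true_positive: flags for detections sorted by decreasing confidence.
    num_objects: number of ground-truth objects of the class.
    """
    tp = 0
    recalls, precisions = [], []
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += 1 if hit else 0
        recalls.append(tp / num_objects)
        precisions.append(tp / rank)
    total = 0.0
    for k in range(11):
        level = k / 10
        # Interpolated precision: best precision at any recall >= level.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable, default=0.0)
    return total / 11


# Three ranked detections against two ground-truth objects.
example_ap = average_precision([True, False, True], num_objects=2)
example_overlap = overlap_ratio((0, 0, 10, 10), (2, 0, 12, 10))
```

In the example, the second box overlaps the first by two thirds and would count as a correct detection under the 50% threshold.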

slide-65
SLIDE 65

Methods

  • Sliding-window classifiers

– Assign confidence to windows over an image pyramid
– Non-maximum suppression to obtain bounding boxes
– Multi-view methods: one classifier per view
– Classifiers and features

  • Linear SVM classifier with SIFT-like spatial/orientation histogram
  • Boosted classifier with spatial orientation histogram features
  • Boosted classifier with pixel-level features
  • Boosted classifier with template correlation features shared across views
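Non-maximum suppression can be sketched in its common greedy form: keep the highest-scoring window, discard windows that overlap it too much, and repeat. The slides do not say which variant each entry used, so the greedy scheme, the overlap measure and the 0.5 threshold below are assumptions, and the names are hypothetical.

```python
def iou(a, b):
    """Intersection over union for boxes (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(detections, max_overlap=0.5):
    """Greedy NMS over (box, confidence) pairs; returns the kept detections."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a window only if it does not overlap a stronger kept window.
        if all(iou(box, kept_box) <= max_overlap for kept_box, _ in kept):
            kept.append((box, conf))
    return kept


# Two heavily overlapping windows and one separate window.
windows = [
    ((0, 0, 10, 10), 0.9),
    ((1, 0, 11, 10), 0.8),
    ((20, 20, 30, 30), 0.7),
]
final_boxes = non_max_suppression(windows)
```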

slide-66
SLIDE 66

Methods

  • Generalized Hough transform

– Vector-quantized regions around interest points “vote” for centre of object by a non-parametric distribution
– Single-view and multi-view methods

  • Multi-view method reinforces votes by “overlapping” views
  • Constellation model

– Gaussian distribution with full covariance over position and appearance of small number of regions, detected at interest points

slide-67
SLIDE 67

Methods

  • Pixel-wise classification

– Boosted classifier assigns class to each pixel based on spatial neighbourhood. Bounding boxes are derived from connected components of same class

  • Classification of segmented regions

– Regions from a segmentation algorithm classified and bounding boxes derived from region classification
– Region confidence is combination of global (image) and local (region) confidence
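Deriving bounding boxes from a pixel-wise labelling can be sketched with a breadth-first search over connected components. The sketch assumes the per-pixel class labels are already available as a 2-D grid and uses 4-connectivity; both the connectivity and the function name are assumptions, since the slides do not give these details.

```python
from collections import deque


def boxes_from_labels(label_map, target):
    """Bounding boxes (xmin, ymin, xmax, ymax) of 4-connected components
    of pixels whose label equals target."""
    height, width = len(label_map), len(label_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if label_map[y][x] != target or seen[y][x]:
                continue
            # Start a new component and grow it breadth-first.
            seen[y][x] = True
            queue = deque([(y, x)])
            xmin = xmax = x
            ymin = ymax = y
            while queue:
                cy, cx = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and label_map[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes


# Toy 3x4 label map: 1 = object class, 0 = background.
toy_labels = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
toy_boxes = boxes_from_labels(toy_labels, target=1)
```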

slide-68
SLIDE 68

Detection Results

Competition 3: Train on VOC data

slide-69
SLIDE 69

Participants

Entries by class (× in the participation grid):

  • All ten classes: Cambridge, TKK
  • ENSMP: car, cow
  • INRIA_Douze: bicycle, bus, car, cow, motorbike, person, sheep
  • INRIA_Laptev: bicycle, cow, horse, motorbike, person
  • TUD: motorbike, person
  • No entries from the other participants

slide-70
SLIDE 70

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Laptev (0.440), INRIA_Douze (0.414), TKK (0.303), Cambridge (0.249)

Competition 3: Train on VOC data

  • Class “bicycle”
slide-71
SLIDE 71

[Figure: precision/recall curves, AP per method in parentheses]

TKK (0.169), Cambridge (0.138), INRIA_Douze (0.117)

Competition 3: Train on VOC data

  • Class “bus”
slide-72
SLIDE 72

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Douze (0.444), ENSMP (0.398), Cambridge (0.254), TKK (0.222)

Competition 3: Train on VOC data

  • Class “car”
slide-73
SLIDE 73

[Figure: precision/recall curves, AP per method in parentheses]

TKK (0.160), Cambridge (0.151)

Competition 3: Train on VOC data

  • Class “cat”
slide-74
SLIDE 74

[Figure: precision/recall curves, AP per method in parentheses]

TKK (0.252), INRIA_Laptev (0.224), INRIA_Douze (0.212), ENSMP (0.159), Cambridge (0.149)

Competition 3: Train on VOC data

  • Class “cow”
slide-75
SLIDE 75

[Figure: precision/recall curves, AP per method in parentheses]

Cambridge (0.118), TKK (0.113)

Competition 3: Train on VOC data

  • Class “dog”
slide-76
SLIDE 76

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Laptev (0.140), TKK (0.137), Cambridge (0.091)

Competition 3: Train on VOC data

  • Class “horse”
slide-77
SLIDE 77

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Douze (0.390), INRIA_Laptev (0.318), TKK (0.265), Cambridge (0.178), TUD (0.153)

Competition 3: Train on VOC data

  • Class “motorbike”
slide-78
SLIDE 78

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Douze (0.164), INRIA_Laptev (0.114), TUD (0.074), TKK (0.039), Cambridge (0.030)

Competition 3: Train on VOC data

  • Class “person”
slide-79
SLIDE 79

[Figure: precision/recall curves, AP per method in parentheses]

INRIA_Douze (0.251), TKK (0.227), Cambridge (0.131)

Competition 3: Train on VOC data

  • Class “sheep”
slide-80
SLIDE 80

AP by Method and Class

Method        bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
Cambridge     0.249     0.138     0.254     0.151     0.149     0.118     0.091     0.178     0.030     0.131
ENSMP         -         -         0.398     -         0.159     -         -         -         -         -
INRIA_Douze   0.414     0.117     0.444     -         0.212     -         -         0.390     0.164     0.251
INRIA_Laptev  0.440     -         -         -         0.224     -         0.140     0.318     0.114     -
TUD           -         -         -         -         -         -         -         0.153     0.074     -
TKK           0.303     0.169     0.222     0.160     0.252     0.113     0.137     0.265     0.039     0.227

Rank by AP per Class

Method        bicycle   bus       car       cat       cow       dog       horse     motorbike person    sheep
Cambridge     4         2         3         2         5         1         3         4         5         3
ENSMP         -         -         2         -         4         -         -         -         -         -
INRIA_Douze   2         3         1         -         3         -         -         1         1         1
INRIA_Laptev  1         -         -         -         2         -         1         2         2         -
TUD           -         -         -         -         -         -         -         5         3         -
TKK           3         1         4         1         1         2         2         3         4         2

slide-81
SLIDE 81

AP by Class

[Figure: bar chart of maximum and median AP for each class: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep]

slide-82
SLIDE 82

Detection Results

Competition 4: Train on own data

slide-83
SLIDE 83

Participants

Entries by class (× in the participation grid):

  • INRIA_Douze: person
  • KUL: motorbike
  • MIT_Fergus: car, motorbike
  • MIT_Torralba: car
  • No entries from the other participants

slide-84
SLIDE 84

[Figure: precision/recall curves, AP per method in parentheses]

MIT_Torralba (0.217), MIT_Fergus (0.160)

Competition 4: Train on own data

  • Class “car”
  • Max AP trained on VOC data: 0.444 vs. 0.217 here

slide-85
SLIDE 85

[Figure: precision/recall curves, AP per method in parentheses]

KUL (0.229), MIT_Fergus (0.159)

Competition 4: Train on own data

  • Class “motorbike”
  • Max AP trained on VOC data: 0.390 vs. 0.229 here

slide-86
SLIDE 86

[Figure: precision/recall curve, AP in parentheses]

INRIA_Douze (0.162)

Competition 4: Train on own data

  • Class “person”
  • Max AP trained on VOC data: 0.164 vs. 0.162 here (same method)

slide-87
SLIDE 87

Conclusions?

  • Much more challenging than classification task
  • No overall winner but sliding-window methods tended to give best results

  • Generalized Hough transform method gave poor results compared to VOC2005

– Greater viewpoint variation? Lack of SVM stage?

  • For “person” class, use of own training data changed results little cf. VOC2005

– Sufficiently large training set? Extremely difficult?

slide-88
SLIDE 88

Overall Conclusions?

  • Classification: Variety of methods with predominance of “bag of words”

– Some re-introduction of spatial information

  • Results on less rigid/non-manmade classes (dogs, people) worse than “traditional” cars, motorbikes

– Bias towards classes with distinctive local appearance?

  • Hard to distinguish between many classification methods

– Usefulness of this task exhausted?

  • Detection: Sliding-window methods gave better results than more explicit modelling of object “parts”

  • Still much progress to be made

– Unconstrained viewpoint etc. remain very challenging